Cluster Analysis Paper for International Students
Type: Assignment · Published: 2014-09-16 · Word count: 4,657 · Language: English · Region: South Africa
Keywords: regression analysis; cluster analysis; natural group structure; boards of directors
Abstract: This is a cluster analysis paper in the field of business management. Cluster analysis is a technique that can address our research problem, as it allows the user to identify the natural underlying structure of a complex dataset. In doing so, we can distinguish types of firms, particular board organisations, and levels of firm quality.
1.1.1.1.2. Bayesian Information Criterion
When using the EM algorithm to maximize the likelihood of the parameters, a phenomenon known as overfitting (Hand et al., 2001) may occur. In this situation, additional parameters are chosen to increase the likelihood obtained, resulting in an overly complex model which fits the data too closely. As such, we validate our cluster solutions with the Bayesian Information Criterion (BIC), as it is formulated to avoid overfitting. Typically, one can choose the model that maximizes the integrated likelihood given by

$$p(x \mid m, K) = \int p(x \mid \theta_{m,K}, m, K)\,\pi(\theta_{m,K} \mid m, K)\,d\theta_{m,K},$$

where $p(x \mid m, K)$ is the integrated likelihood, $p(x \mid \theta_{m,K}, m, K)$ is the likelihood, $\theta_{m,K}$ is the parameter space of the model $m$ with $K$ clusters, and $\pi(\theta_{m,K} \mid m, K)$ is a non-informative or weakly informative prior distribution on $\theta_{m,K}$ for this model. Following that, the asymptotic approximation of the integrated likelihood, valid under regularity conditions (Schwarz, 1978), is given as

$$\log p(x \mid m, K) \approx \log p(x \mid \hat{\theta}_{m,K}, m, K) - \frac{\nu_{m,K}}{2} \log n,$$

where $\hat{\theta}_{m,K}$ is the maximum likelihood estimate of $\theta_{m,K}$ and $\nu_{m,K}$ is the number of free parameters in model $m$ with $K$ clusters. Finally, this results in the minimization of the BIC criterion given by

$$\mathrm{BIC}(m, K) = -2\,\ell_{\max}(m, K) + \nu_{m,K} \log n,$$

where $\ell_{\max}(m, K)$ is the maximum log-likelihood for $m$ and $K$.
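To make the trade-off concrete, the following minimal sketch computes the BIC formula above for two candidate mixture models. The log-likelihood values and parameter counts here are illustrative numbers, not results from the paper's data: they show how a small gain in fit can be outweighed by the complexity penalty.

```python
import math

def bic(max_log_likelihood, n_free_params, n_obs):
    """BIC = -2 * (maximum log-likelihood) + (free parameters) * log(n).
    Lower values indicate a better fit/complexity trade-off."""
    return -2.0 * max_log_likelihood + n_free_params * math.log(n_obs)

# Toy comparison (illustrative values): the K=3 model fits slightly better,
# but its extra parameters are penalised more than the likelihood gain.
n = 500
bic_k2 = bic(max_log_likelihood=-1210.0, n_free_params=5, n_obs=n)
bic_k3 = bic(max_log_likelihood=-1205.0, n_free_params=8, n_obs=n)
# bic_k2 < bic_k3, so the simpler K=2 model is preferred here.
```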
1.1.1.1.3. Consensus Clustering
Consensus clustering, as its name suggests, is employed to find a consensus solution that is in agreement with several cluster solutions obtained through multiple clustering algorithms. As individual clustering algorithms have associated shortcomings, for instance in K-means clustering one has to specify the number of clusters a priori, consensus clustering can help eliminate such shortcomings by combining solutions. Further, consensus cluster solutions are less sensitive to noise, outliers, or sample variations (Nguyen and Caruana, 2008). Shen et al. (2007) also contend that combining clustering results from multiple methods is more likely to expose the underlying natural group structure and trends present in the dataset. As a result, consensus clustering helps to increase the quality and robustness of clustering results (Strehl and Ghosh, 2002).
Intuitively, a superior aggregate solution should share as much information as possible with the individual cluster solutions, as clusters which remain relatively similar across multiple algorithm runs are likely to reflect the actual group structure underlying the dataset. Accordingly, in our analysis we make use of the Meta-Clustering Algorithm (MCLA) (Strehl and Ghosh, 2002), which seeks to maximize the average mutual information shared between pairs of cluster solutions. Before illustrating MCLA, we first explain the transformation of a cluster solution into a hypergraph representation, shown in Figure 4.4. Each cluster solution is transformed into a binary membership indicator matrix Hq with a column for each cluster (denoted as hyperedge hi). For each hi, 1 indicates that the vertex corresponding to the row belongs to that hyperedge and 0 indicates otherwise.
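The hypergraph transformation described above can be sketched as follows. This is a minimal illustration of the indicator-matrix construction, not the paper's implementation; the label vector is an invented toy example.

```python
def to_indicator_matrix(labels):
    """Transform one cluster solution (a label vector) into a binary
    membership indicator matrix H: one column (hyperedge) per cluster,
    one row per object. H[i][j] == 1 iff object i belongs to cluster j."""
    clusters = sorted(set(labels))
    return [[1 if lab == c else 0 for c in clusters] for lab in labels]

# One cluster solution over five objects, with two clusters labelled 0 and 1:
# objects 0, 1, 4 fall in cluster 0; objects 2, 3 fall in cluster 1.
H = to_indicator_matrix([0, 0, 1, 1, 0])
```

Each column of H is one hyperedge hi; stacking the Hq matrices from several cluster solutions side by side yields the full hypergraph representation.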
MCLA is based on clustering the clusters themselves by first constructing a meta-graph. After the transformation into the hypergraph representation, MCLA views all hyperedges as vertices of another regular, undirected graph, the meta-graph. Edges are weighted by the similarity between vertices, measured by the Jaccard measure. Next, the meta-graph is partitioned into k balanced meta-clusters using METIS (Karypis and Kumar, 1998). Thereafter, the hyperedges in each of the k meta-clusters are collapsed into a single meta-hyperedge.
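The Jaccard edge weights of the meta-graph can be computed as in the sketch below: for two binary hyperedges, the weight is the size of the intersection of the objects they contain divided by the size of their union. The hyperedges used here are invented toy columns, not data from the paper.

```python
def jaccard(h_a, h_b):
    """Jaccard similarity between two binary hyperedges (columns of H):
    |objects in both| / |objects in either|."""
    inter = sum(1 for a, b in zip(h_a, h_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(h_a, h_b) if a == 1 or b == 1)
    return inter / union if union else 0.0

# Two hyperedges from different cluster solutions over the same 5 objects:
# they share objects 0 and 1; together they cover objects 0, 1, 2, 4.
w = jaccard([1, 1, 0, 0, 1], [1, 1, 1, 0, 0])  # 2 / 4 = 0.5
```

These pairwise weights populate the meta-graph that METIS then partitions into the k balanced meta-clusters.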