
at x_i arises from it using the maximum a posteriori (MAP) principle.

The MAP principle is given by,

where:

The EM algorithm seeks maximum likelihood estimates by iterating the expectation and maximization steps.

The mth iteration of the Expectation step is given by,

and the maximum likelihood estimate of the mth iteration (denoted by m) is updated using the conditional probabilities as conditional mixing weights. This leads to the Maximization step as given by,

where:

As in K-means clustering, different initial parameter values may lead to different local maxima of the likelihood function. To make it more likely that we reach either the global maximum or a local maximum close to it, for each k (k = 2 to 35) we run the algorithm 500 times from 500 random sets of initial parameters. For each k, the optimal solution is the cluster solution that attains the highest likelihood.
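The E-step, M-step, random-restart and MAP-assignment procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the study: it assumes univariate Gaussian components, uses 20 restarts instead of 500 to keep the run short, adds a small variance floor to avoid degenerate components, and runs on synthetic data. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution (assumed component form)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Expectation step: posterior membership probabilities and log-likelihood."""
    posteriors, log_lik = [], 0.0
    for x in data:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint) + 1e-300  # guard against underflow to zero
        posteriors.append([j / total for j in joint])
        log_lik += math.log(total)
    return posteriors, log_lik


def m_step(data, posteriors, k, min_var=1e-6):
    """Maximization step: re-estimate parameters with posteriors as weights."""
    n = len(data)
    weights, means, variances = [], [], []
    for j in range(k):
        resp = [p[j] for p in posteriors]
        nj = sum(resp) + 1e-12
        mean = sum(r * x for r, x in zip(resp, data)) / nj
        var = sum(r * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        weights.append(nj / n)
        means.append(mean)
        variances.append(max(var, min_var))  # variance floor (an added safeguard)
    return weights, means, variances


def fit_em(data, k, rng, max_iter=200, tol=1e-8):
    """One EM run from a random start; returns (log_likelihood, parameters)."""
    grand_mean = sum(data) / len(data)
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / len(data), 1e-6)
    weights = [1.0 / k] * k
    means = rng.sample(data, k)  # random initial means drawn from the data
    variances = [grand_var] * k
    prev = -math.inf
    for _ in range(max_iter):
        posteriors, log_lik = e_step(data, weights, means, variances)
        if log_lik - prev < tol:  # stop once the likelihood no longer improves
            break
        prev = log_lik
        weights, means, variances = m_step(data, posteriors, k)
    _, log_lik = e_step(data, weights, means, variances)
    return log_lik, (weights, means, variances)


def best_of_restarts(data, k, n_restarts=20, seed=0):
    """Repeat EM from random starts and keep the run with the highest likelihood."""
    rng = random.Random(seed)
    return max((fit_em(data, k, rng) for _ in range(n_restarts)), key=lambda r: r[0])


def map_assign(data, params):
    """MAP rule: assign each point to the component with the largest posterior."""
    posteriors, _ = e_step(data, *params)
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


# Synthetic example: two well-separated groups of 100 points each.
demo_rng = random.Random(1)
data = [demo_rng.gauss(0.0, 1.0) for _ in range(100)]
data += [demo_rng.gauss(10.0, 1.0) for _ in range(100)]
log_lik, params = best_of_restarts(data, k=2)
labels = map_assign(data, params)
```

In an analysis like the one described, `best_of_restarts` would be called once for each candidate k, and the retained log-likelihoods would then feed into the choice of the number of clusters.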

1.1.1.1.2. Cluster Validation

Because cluster analysis is an unsupervised technique, cluster validation is needed to evaluate its results objectively and quantitatively.

The main validation objectives are:

1. Determining the clustering tendency of the data;

2. Determining the number of clusters;

3. Evaluating how well a cluster result represents the natural group structure underlying the data, based on information intrinsic to the data alone (i.e. internal validation) (Handl, Knowles, & Kell, 2005);

4. Evaluating cluster results by comparison with known class labels that correspond to the natural group structure underlying the data (i.e. external validation) (Handl, Knowles, & Kell, 2005).

Because clustering techniques are known to find clusters even when there is no underlying cluster structure, objective 1 is fundamental to cluster analysis. Objective 2 is imperative because the number of clusters is an essential parameter in two clustering techniques that we employ. To the best of our knowledge, this is the first time a cluster analysis has been performed on the market for directors. Hence, there are no established class labels that correspond to the natural cluster structure, and we evaluate our results using internal validation measures only.

Assessment of Clustering Tendency

Unlike the other validation steps, the assessment of clustering tendency is carried out before the data are actually clustered. In our study, we use self-organizing maps (SOMs) to assess the clustering tendency of our data. The SOM Toolbox for Matlab (Vesanto et al., 1999) is employed to perform SOM training and visualization. An SOM consists of neurons organized on a regular low-dimensional grid. Each neuron is represented by a weight vector of the same dimension as the input vectors. Adjacent neurons are connected by a neighbourhood relation, which dictates the topology, or structure, of the map. The SOM training algorithm moves the weight vectors so that neurons with similar weight vectors are grouped together on the map. Visualization of the SOM is performed through the U-matrix. By visualizing distances between neighbouring map units, the U-matrix allows the creation of