Abstract: This paper applies cluster analysis to corporate management research. Cluster analysis is a technique that can address our research question: it enables users to identify the natural underlying structure of complex data sets. In doing so, we can distinguish types of firms as well as their particular board organization and quality levels.
based on new cluster memberships.
Iterate through steps 2 and 3 until convergence.
The algorithm converges when the cluster memberships of the data points remain unchanged. At that point, other widely used convergence criteria, such as the computed centroids and the sum of squared distances from the data points to their centroids, also remain constant.
K-means clustering iteratively reassigns objects to clusters, seeking to minimize the sum of squared distances, denoted by J, between each object and its cluster centroid. The sum of squared distances Ji for the ith cluster, denoted Ci, is given by,

$$J_i = \sum_{x \in C_i} \lVert x - y_i \rVert^2$$

where: ||x − yi||² is the squared Euclidean distance from object x in Ci to its centroid yi.

The sum of squared distances over all k clusters is given by,

$$J = \sum_{i=1}^{k} J_i = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - y_i \rVert^2$$
In step 1, different sets of initial centroids can ultimately lead to different local minima of J, whereas we would like to find the cluster solution that attains the global minimum. Trying every possible set of initial centroids would guarantee this, but it is computationally expensive and thus not viable. As an alternative, we repeat K-means clustering (for k = 2 to 35) 500 times with 500 random sets of initial centroids, retaining the solution that is either the global minimum or at least the local minimum closest to the global minimum among those found.
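As a concrete sketch of this restart procedure (a minimal NumPy illustration of standard Lloyd's-algorithm k-means, not the implementation used in the study), each run starts from random centroids and the run with the lowest J is kept:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """One k-means run; returns (labels, centroids, J)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Step 2: assign each object to its nearest centroid
        # (squared Euclidean distance).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # memberships unchanged: converged
        labels = new_labels
        # Step 3: recompute each centroid from its new members.
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    # J: sum of squared distances of all objects to their centroids.
    J = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, J

def best_of_restarts(X, k, n_restarts=500):
    """Repeat k-means with random initial centroids; keep the lowest J."""
    return min((kmeans(X, k, seed=s) for s in range(n_restarts)),
               key=lambda run: run[2])
```

Keeping the minimum-J run over many restarts is what makes the solution a global minimum, or at least the best of the local minima encountered.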
Expectation Maximization for Gaussian Mixture Model
In the Gaussian Mixture Model (GMM), the Expectation Maximization (EM) algorithm seeks the maximum likelihood estimates of the mixture parameters when the model depends on unobserved latent variables.
The main steps of the EM method are (Dempster et al., 1977):
1. Initialize the parameters (mean and variance) of each of the k Gaussian distributions.
2. Using the probability density function of the Gaussian distribution, calculate the probability density of each feature vector under each of the k clusters.
3. With the probability densities calculated in step 2, re-compute the parameters of each of the k Gaussian distributions.
4. Repeat steps 2 and 3 until convergence.
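These four steps can be sketched for a one-dimensional mixture (a minimal NumPy illustration; the quantile-based initialization and fixed iteration budget are our assumptions, and the paper itself uses MIXMOD for Matlab):

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """EM for a univariate Gaussian mixture; returns (p, mu, var)."""
    # Step 1: initialize proportions, means, and variances.
    p = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # Step 2 (E-step): density of each point under each component,
        # weighted by the mixing proportions.
        dens = p * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
                 / np.sqrt(2 * np.pi * var)
        t = dens / dens.sum(axis=1, keepdims=True)   # responsibilities
        # Step 3 (M-step): re-estimate the parameters of each Gaussian.
        nk = t.sum(axis=0)
        p = nk / len(x)
        mu = (t * x[:, None]).sum(axis=0) / nk
        var = (t * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    # Step 4: a fixed iteration budget stands in for a convergence test;
    # a fuller implementation would stop when the log-likelihood stabilizes.
    return p, mu, var
```

On well-separated data the estimated means converge to the true component means, with the mixing proportions summing to one.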
We perform EM clustering using MIXMOD for Matlab (Biernacki et al., 2006); its statistical documentation is summarized as follows. Clustering with mixture models partitions the n objects into K clusters denoted by labels z = (z1, …, zn), with zi = (zi1, …, ziK) and zik = 1 or 0 according to whether xi is assigned to the kth cluster or not. In a mixture model where the n independent vectors of a dataset are represented by x = {x1, …, xn}, each xi arises from a probability distribution with density,

$$f(x_i \mid \theta) = \sum_{k=1}^{K} p_k\, h(x_i \mid \lambda_k)$$
where: pk is the mixing proportion of the kth component (0 < pk < 1 for all k = 1, …, K and p1 + … + pK = 1)
h(.|λk) is the d-dimensional distribution parameterized by λk.
As such, we can show how each xi arises from a probability distribution with density in a GMM by replacing h(·|λk) with the associated d-dimensional Gaussian density with mean μk and variance matrix Σk,

$$f(x_i \mid \theta) = \sum_{k=1}^{K} p_k\, (2\pi)^{-d/2}\, \lvert \Sigma_k \rvert^{-1/2} \exp\!\left(-\tfrac{1}{2}(x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k)\right)$$

where:
θ = (p1, …, pK, μ1, …, μK, Σ1, …, ΣK) is the vector of the mixture parameters.
Clusters can be derived from the maximum likelihood estimates of the mixture parameters, obtained using the Expectation Maximization (EM) algorithm. The maximum likelihood estimate of the GMM maximizes the log-likelihood,

$$L(\theta \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \ln\!\left(\sum_{k=1}^{K} p_k\, h(x_i \mid \mu_k, \Sigma_k)\right)$$
Each xi is assigned to the cluster that provides the largest conditional probability that xi arises from it,

$$t_k(x_i \mid \theta) = \frac{p_k\, h(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} p_j\, h(x_i \mid \mu_j, \Sigma_j)}$$
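As an illustration of this assignment rule, the following sketch uses hypothetical fitted parameters p, mu, and var (chosen for the example, not values from the study) and assigns each point to the component with the largest conditional probability, here for the univariate case:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    # Univariate Gaussian density (d = 1 keeps the sketch short).
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def map_assign(x, p, mu, var):
    """Assign each x_i to the component with the largest
    conditional (posterior) probability."""
    dens = p * gauss_pdf(x[:, None], mu, var)   # p_k * h(x_i | mu_k, var_k)
    t = dens / dens.sum(axis=1, keepdims=True)  # conditional probabilities
    return t.argmax(axis=1)

# Hypothetical fitted two-component mixture, for illustration only.
p = np.array([0.5, 0.5])
mu = np.array([0.0, 5.0])
var = np.array([1.0, 1.0])
x = np.array([-0.3, 0.2, 4.8, 5.1])
print(map_assign(x, p, mu, var))  # -> [0 0 1 1]
```

With equal proportions and variances, as here, the rule reduces to assigning each point to the component with the nearest mean.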