Cluster Analysis Paper for International Students
Type: Assignment · Published: 2014-09-16 · Word count: 4,657 · Language: English · Region: South Africa
Keywords: regression analysis; cluster analysis; natural group structure; boards of directors
Abstract: This is a cluster analysis paper in the field of business management. Cluster analysis is a technique that can address our research problem, as it allows the user to identify the natural underlying structure of a complex dataset. In doing so, we can distinguish types of firms, particular board organisations, and levels of firm quality.
1.1.1.1.2. Bayesian Information Criterion
When using the EM algorithm to maximize the likelihood of the parameters, a phenomenon known as overfitting (Hand et al., 2001) may occur. In this situation, additional parameters are chosen to increase the likelihood obtained, resulting in an overly complex model which fits the data too closely. As such, we validate our cluster solutions with the Bayesian Information Criterion (BIC), as it is formulated to avoid overfitting. Typically, one can choose the model that maximizes the integrated likelihood given by

$$p(x \mid m, K) = \int p(x \mid \theta_{m,K}, m, K)\,\pi(\theta_{m,K} \mid m, K)\,d\theta_{m,K},$$

where $p(x \mid m, K)$ is the integrated likelihood, $p(x \mid \theta_{m,K}, m, K)$ is the likelihood, $\theta_{m,K}$ is the parameter space of the model $m$ with $K$ clusters, and $\pi(\theta_{m,K} \mid m, K)$ is a non-informative or weakly informative prior distribution on $\theta_{m,K}$ for this model. Following that, the asymptotic approximation of the integrated likelihood, valid under regularity conditions (Schwarz, 1978), is given as

$$\log p(x \mid m, K) \approx \log p(x \mid \hat{\theta}_{m,K}, m, K) - \frac{\nu_{m,K}}{2} \log n,$$

where $\hat{\theta}_{m,K}$ is the maximum likelihood estimate of $\theta_{m,K}$ and $\nu_{m,K}$ is the number of free parameters in model $m$ with $K$ clusters. Finally, this results in the minimization of the BIC criterion given by

$$\mathrm{BIC}(m, K) = -2\,\ell_{\max}(m, K) + \nu_{m,K} \log n,$$

where $\ell_{\max}(m, K)$ is the maximum log-likelihood for $m$ and $K$.
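To make the trade-off concrete, the following minimal sketch computes the BIC formula above for two candidate mixture models. The log-likelihood values and parameter counts here are illustrative numbers, not results from the paper's data: they show how a small gain in fit can be outweighed by the complexity penalty.

```python
import math

def bic(max_log_likelihood, n_free_params, n_obs):
    """BIC = -2 * (maximum log-likelihood) + (free parameters) * log(n).
    Lower values indicate a better fit/complexity trade-off."""
    return -2.0 * max_log_likelihood + n_free_params * math.log(n_obs)

# Toy comparison (illustrative values): the K=3 model fits slightly better,
# but its extra parameters are penalised more than the likelihood gain.
n = 500
bic_k2 = bic(max_log_likelihood=-1210.0, n_free_params=5, n_obs=n)
bic_k3 = bic(max_log_likelihood=-1205.0, n_free_params=8, n_obs=n)
# bic_k2 < bic_k3, so the simpler K=2 model is preferred here.
```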
1.1.1.1.3. Consensus Clustering
Consensus clustering, as its name suggests, is employed to find a consensus solution that is in agreement with several cluster solutions obtained through multiple clustering algorithms. As individual clustering algorithms have associated shortcomings, for instance in K-means clustering one has to specify the number of clusters a priori, consensus clustering can help eliminate such shortcomings by combining solutions. Further, consensus cluster solutions are less sensitive to noise, outliers, or sample variations (Nguyen and Caruana, 2008). Shen et al. (2007) also contend that combining clustering results from multiple methods is more likely to expose the underlying natural group structure and trends present in the dataset. As a result, consensus clustering helps to increase the quality and robustness of clustering results (Strehl and Ghosh, 2002).
Intuitively, a superior aggregate solution should share as much information as possible with the individual cluster solutions, as clusters which remain relatively similar across multiple algorithm runs are likely to reflect the actual group structure underlying the dataset. Accordingly, in our analysis we make use of the Meta-Clustering Algorithm (MCLA) (Strehl and Ghosh, 2002), which seeks to maximize the average mutual information shared between pairs of cluster solutions. Before illustrating MCLA, we first explain the transformation of a cluster solution into a hypergraph representation, shown in Figure 4.4. Each cluster solution is transformed into a binary membership indicator matrix Hq with a column for each cluster (denoted as hyperedge hi). For each hi, 1 indicates that the vertex corresponding to the row belongs to that hyperedge and 0 indicates otherwise.
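The hypergraph transformation described above can be sketched as follows. This is a minimal illustration of the indicator-matrix construction, not the paper's implementation; the label vector is an invented toy example.

```python
def to_indicator_matrix(labels):
    """Transform one cluster solution (a label vector) into a binary
    membership indicator matrix H: one column (hyperedge) per cluster,
    one row per object. H[i][j] == 1 iff object i belongs to cluster j."""
    clusters = sorted(set(labels))
    return [[1 if lab == c else 0 for c in clusters] for lab in labels]

# One cluster solution over five objects, with two clusters labelled 0 and 1:
# objects 0, 1, 4 fall in cluster 0; objects 2, 3 fall in cluster 1.
H = to_indicator_matrix([0, 0, 1, 1, 0])
```

Each column of H is one hyperedge hi; stacking the Hq matrices from several cluster solutions side by side yields the full hypergraph representation.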
MCLA is based on clustering the clusters themselves by first constructing a meta-graph. After the transformation into the hypergraph representation, MCLA views all hyperedges as vertices of another regular, undirected graph, the meta-graph. Edges are weighted by the similarity between vertices, measured by the Jaccard measure. Next, the meta-graph is partitioned into k balanced meta-clusters using METIS (Karypis and Kumar, 1998). Thereafter, the hyperedges in each of the k meta-clusters are collapsed into a single meta-hyperedge.
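The Jaccard edge weights of the meta-graph can be computed as in the sketch below: for two binary hyperedges, the weight is the size of the intersection of the objects they contain divided by the size of their union. The hyperedges used here are invented toy columns, not data from the paper.

```python
def jaccard(h_a, h_b):
    """Jaccard similarity between two binary hyperedges (columns of H):
    |objects in both| / |objects in either|."""
    inter = sum(1 for a, b in zip(h_a, h_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(h_a, h_b) if a == 1 or b == 1)
    return inter / union if union else 0.0

# Two hyperedges from different cluster solutions over the same 5 objects:
# they share objects 0 and 1; together they cover objects 0, 1, 2, 4.
w = jaccard([1, 1, 0, 0, 1], [1, 1, 1, 0, 0])  # 2 / 4 = 0.5
```

These pairwise weights populate the meta-graph that METIS then partitions into the k balanced meta-clusters.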