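Abstract: This is a paper on cluster analysis in enterprise management. Cluster analysis is a technique suited to our research question: it enables users to identify the natural underlying structure of complex datasets. In doing so, we can distinguish types of enterprises, specific board-of-directors organizations, and the quality levels of companies.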
a clustering tendency map. In a clustering tendency map, high U-matrix values (dark-coloured hexagons) indicate possible cluster borders, while uniform areas of low values (light-coloured hexagons) indicate possible clusters. Figure 4.3 illustrates a map with high clustering tendency and a map with low clustering tendency.
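The values behind such a map can be sketched as follows. This is a minimal illustration, assuming a rectangular grid with 4-connected neighbours rather than the hexagonal lattice used in the map, and assuming the SOM weight vectors are already trained; the function name `u_matrix` and the toy grid are hypothetical. Nodes on the border between two groups of similar weight vectors get high values, which is what the dark hexagons represent.

```python
import math

def u_matrix(weights):
    """U-matrix for a SOM whose nodes sit on a rectangular grid (an assumption).

    weights[r][c] is the trained weight vector of the node at row r, column c.
    Each cell is the mean Euclidean distance from that node's weight vector to
    the weight vectors of its 4-connected grid neighbours.
    """
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[r + dr][c + dc])
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat


# Hypothetical 2x4 map: the two left columns of nodes sit near (0, 0) and the
# two right columns near (10, 10), so the cluster border lies between columns 1 and 2.
grid = [[[0, 0], [0, 0], [10, 10], [10, 10]] for _ in range(2)]
umat = u_matrix(grid)
for row in umat:
    print([round(v, 2) for v in row])  # each row prints [0.0, 4.71, 4.71, 0.0]
```

The high values in the two middle columns mark the border; the low values at both ends mark the two possible clusters.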
Internal Validation
We mainly use internal validation indices to evaluate the fitness of a clustering solution. Fitness measures are associated with the geometrical properties of clusters (i.e. compactness, separation and connectedness). We rely on these properties because most clustering methods discover the underlying group structure in the data by optimizing them (Johnson, 1967; Dempster et al., 1977; Kaufman and Rousseeuw, 1990; Handl and Knowles, 2006). Internal validation indices also allow us to find the optimal number of clusters (k), indicated by the clustering solution with the highest quality. For Hierarchical Clustering and K-means clustering, we use the program CVAP (Wang et al., 2009) to validate our clustering solutions with two different indices, Average Silhouette Width and C-Index, to ensure that our clustering results are robust to the choice of validation measure.
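To make two of these geometrical properties concrete, here is a minimal sketch assuming centroid-based definitions: compactness as the mean distance from each object to its own cluster centroid, and separation as the smallest distance between two cluster centroids. These are illustrative choices, not the indices CVAP computes, and connectedness is not covered; all function names are hypothetical.

```python
import math
from collections import defaultdict

def centroids(points, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: [sum(axis) / len(members) for axis in zip(*members)]
            for lab, members in groups.items()}

def compactness(points, labels):
    """Mean distance from each object to its own cluster centroid (lower is more compact)."""
    cents = centroids(points, labels)
    return sum(math.dist(p, cents[lab]) for p, lab in zip(points, labels)) / len(points)

def separation(points, labels):
    """Smallest distance between two cluster centroids (higher is better separated).

    Requires at least two clusters.
    """
    cents = list(centroids(points, labels).values())
    return min(math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])


# Hypothetical data: two tight clusters far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(compactness(points, labels), separation(points, labels))  # 1.0 10.0
```

A good solution scores low on the first measure and high on the second; the composite indices below combine both ideas into a single number.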
Average Silhouette Width
Average Silhouette Width is a composite index that measures both the compactness and the separation of clusters (Kaufman and Rousseeuw, 1990). The silhouette width compares the similarity between an object and the other objects in its own cluster with the similarity between that object and the objects in a neighbour cluster. The neighbour cluster $N(X_i)$ of an object $X_i$ in cluster $C(X_i)$ is defined as the cluster, among all clusters other than $C(X_i)$, whose objects have the shortest average distance to $X_i$. The neighbour cluster $N(X_i)$ is given by

$$N(X_i) = \arg\min_{C_k \neq C(X_i)} \frac{1}{|C_k|} \sum_{X_j \in C_k} d(X_i, X_j)$$

where:
$X_i$ is an object in the dataset
$d(X_i, X_j)$ is the distance between the two objects $X_i$ and $X_j$

The silhouette width of $X_i$, denoted $S_i$, is given by

$$S_i = \frac{b(X_i) - a(X_i)}{\max\{a(X_i),\, b(X_i)\}}$$

where:
$a(X_i)$ is the average distance between $X_i$ and the other objects in its own cluster $C(X_i)$
$b(X_i)$ is the average distance between $X_i$ and the objects in its neighbour cluster $N(X_i)$

The silhouette width $S_i$ ranges from -1 to 1. When $S_i$ is close to 1, $X_i$ is well matched to its own cluster and the clustering solution gives good clusters. When $S_i$ is close to 0, $X_i$ lies near the border between two clusters and could plausibly be assigned to either. When $S_i$ is close to -1, $X_i$ is likely assigned to the wrong cluster. The Average Silhouette Width (AS) over all $m$ objects in the dataset is given by

$$AS = \frac{1}{m} \sum_{i=1}^{m} S_i$$
Thus, the best clustering solution, and with it the optimal number of clusters (k), is the one with the largest AS.
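The silhouette definitions and the "largest AS wins" rule can be sketched in plain Python. This is a minimal illustration, not CVAP's implementation: it assumes Euclidean distance for $d(X_i, X_j)$, assigns $S_i = 0$ to objects in singleton clusters (the usual convention, since $a(X_i)$ is undefined there), and uses hypothetical function names and toy data.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width S_i of every object, with Euclidean distance as d(X_i, X_j).

    Requires at least two clusters. Objects in singleton clusters get S_i = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # a(X_i): mean distance to the other objects in X_i's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(X_i): mean distance to the neighbour cluster N(X_i), i.e. the
        # other cluster with the smallest mean distance to X_i
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette_width(points, labels):
    """AS: the mean of S_i over all m objects."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


def best_k(points, solutions):
    """Given candidate solutions as {k: labels}, return the k with the largest AS."""
    return max(solutions, key=lambda k: average_silhouette_width(points, solutions[k]))


# Hypothetical data: three tight pairs of points along the x-axis.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
solutions = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
print(best_k(points, solutions))  # 3
```

Here the three-cluster solution has an AS of roughly 0.80 against roughly 0.59 for two clusters, so k = 3 is selected.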
C Index
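C Index (Hubert and Levin, 1976) is a cluster similarity measure: it compares the total within-cluster dissimilarity with the smallest and largest totals that the same number of pairs could have, so its value lies between 0 and 1 and lower values indicate more compact clusters. The best clustering solution is identified as the solution with the lowest value. C Index (C) is given by

$$C = \frac{S - S_{\min}}{S_{\max} - S_{\min}}$$

where:
$S$ is the sum of the pairwise dissimilarities over all pairs of objects that belong to the same cluster
If there are $n$ such within-cluster pairs, then $S_{\min}$ is the sum of the $n$ smallest pairwise dissimilarities among all pairs of objects in the dataset
Similarly, $S_{\max}$ is the sum of the $n$ largest pairwise dissimilarities among all pairs of objects in the dataset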
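A minimal sketch of the C Index formula follows, assuming Euclidean distance as the dissimilarity; the function name `c_index` and the toy data are hypothetical, and this is not CVAP's own code. It sorts every pairwise dissimilarity once, then takes the $n$ smallest and $n$ largest, where $n$ is the number of within-cluster pairs.

```python
import math
from itertools import combinations

def c_index(points, labels):
    """C = (S - S_min) / (S_max - S_min), with Euclidean dissimilarities."""
    pairs = list(combinations(range(len(points)), 2))
    all_d = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    # Dissimilarities between objects that share a cluster; S is their sum
    within = [math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    n = len(within)
    if n == 0:
        raise ValueError("C-Index is undefined when every cluster is a singleton")
    s_min = sum(all_d[:n])   # sum of the n smallest dissimilarities in the dataset
    s_max = sum(all_d[-n:])  # sum of the n largest dissimilarities in the dataset
    if s_max == s_min:
        raise ValueError("C-Index is undefined when all dissimilarities are equal")
    return (sum(within) - s_min) / (s_max - s_min)


# Hypothetical data: three tight pairs of points along the x-axis.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
print(c_index(points, [0, 0, 1, 1, 2, 2]))  # 0.0
print(c_index(points, [0, 1, 0, 1, 0, 1]))  # roughly 0.68
```

The natural three-pair grouping reaches the minimum value of 0 because its within-cluster pairs are exactly the closest pairs in the dataset.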
In CVAP (Wang et al., 2009), however, the optimal k is identified at the steepest knee of the index curve, that is, at the greatest jump in the index value between two consecutive values of k.
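The steepest-knee rule can be sketched as below. CVAP's exact procedure is not given here, so this assumes the knee is the larger k of the consecutive pair with the greatest absolute change in the index; the function name `steepest_knee` and the score values are hypothetical.

```python
def steepest_knee(index_by_k):
    """Return the k at the far end of the largest jump between consecutive k values.

    index_by_k maps each candidate k to its validity-index value.
    """
    ks = sorted(index_by_k)
    if len(ks) < 2:
        raise ValueError("need index values for at least two values of k")
    # Absolute change in the index from each k to the next candidate k
    jumps = {ks[t + 1]: abs(index_by_k[ks[t + 1]] - index_by_k[ks[t]])
             for t in range(len(ks) - 1)}
    return max(jumps, key=jumps.get)


# Hypothetical C Index values for k = 2..5: a sharp drop, then a plateau.
scores = {2: 0.60, 3: 0.20, 4: 0.18, 5: 0.17}
print(steepest_knee(scores))  # 3
```

Unlike simply taking the lowest value (which would pick k = 5 here), this rule favours the point after which adding clusters brings little further improvement.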
Bayesian Infor