Abstract: This paper applies cluster analysis to business management. Cluster analysis is a technique well suited to our research questions: it enables the user to uncover the natural underlying structure of a complex data set. In doing so, we can distinguish types of firms along with their particular board organization and level of corporate quality.
Single linkage clustering computes the similarity of two clusters as the similarity of their most similar members, whereas complete linkage clustering measures the similarity of two clusters as the similarity of their most dissimilar members. In our analysis, we choose Ward's method (Ward, 1963), a method distinct from both of the aforementioned methods, as our linkage function. Ward's method chooses each successive merging step by the criterion of minimizing the increase in the error sum of squares (ESS) at each step. The ESS of a set X of NX values is given by the functional relation,

ESS(X) = Σi=1..NX |xi − x̄X|²

where: x̄X is the mean of the NX values in X;
|.| is the absolute value of a scalar value or the norm (the 'length') of a vector.
The linkage function giving the distance between clusters X and Y is,

d(X, Y) = ESS(XY) − [ESS(X) + ESS(Y)]

where: XY is the combined cluster resulting from merging clusters X and Y;
ESS(.) is the error sum of squares described above.
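Ward's criterion above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names `ess` and `ward_linkage` and the sample clusters are our own.

```python
import numpy as np

def ess(X):
    """Error sum of squares: squared distances of each row from the cluster mean."""
    centroid = X.mean(axis=0)
    return float(((X - centroid) ** 2).sum())

def ward_linkage(X, Y):
    """Increase in ESS caused by merging clusters X and Y (Ward's criterion)."""
    XY = np.vstack([X, Y])
    return ess(XY) - (ess(X) + ess(Y))

# At each agglomeration step, the pair of clusters with the smallest
# ward_linkage value would be merged.
X = np.array([[1.0, 2.0], [1.5, 1.8]])
Y = np.array([[5.0, 8.0], [6.0, 8.5]])
merge_cost = ward_linkage(X, Y)
```

Note that the merge cost is always non-negative: combining two clusters can never reduce the total within-cluster scatter.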
In addition to a linkage function, a metric for measuring the distance between two objects is required. In our study, the Squared Euclidean Distance (SED) is chosen as the distance metric for both Hierarchical Clustering and K Means clustering. If two objects, x1 and x2, in Euclidean n-space are given by x1 = (x11, x12, …, x1n) and x2 = (x21, x22, …, x2n), then the SED between these two objects is,

SED(x1, x2) = Σi=1..n (x1i − x2i)²
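The SED formula is straightforward to compute; the following one-line sketch (the function name `sed` is ours) makes the definition concrete:

```python
def sed(x1, x2):
    """Squared Euclidean Distance between two n-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(x1, x2))

# Componentwise: (1-4)^2 + (2-6)^2 + (3-3)^2 = 9 + 16 + 0 = 25
d = sed((1, 2, 3), (4, 6, 3))
```

Unlike the plain Euclidean distance, SED omits the square root, which progressively weights larger component differences more heavily.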
While Agglomerative Hierarchical Clustering (AHC) does not require the user to specify the number of clusters, k, a priori, a drawback of AHC is that it is subject to input order instability. In steps 2 and 3 of an AHC, a problem arises when two pairs of clusters are both calculated to have the smallest distance value. “In such cases arbitrary [italics added] decisions must be made” (Sneath & Sokal, 1973) to choose the pair of clusters that will be merged. These arbitrary decisions extend to computer programs (Spaans & Van der Kloot, 2005), and as a result, different input orders of objects in the proximity matrix can produce significantly different cluster solutions (Van der Kloot et al., 2005). To avoid this pitfall, we employ PermuCLUSTER for SPSS (Van der Kloot et al., 2005). This program repeats AHC a user-specified number of times, permuting the rows and columns of the proximity matrix each time. Thereafter, it evaluates the quality of each AHC solution using a goodness-of-fit measure (SSDIFN) given by,

SSDIFN = Σi<j (dij − cij)² / Σi<j dij²
where: dij are the distances between the objects in the original proximity matrix;
cij are the distances between the objects in the AHC tree.
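As a sketch of this goodness-of-fit measure, the following function (our own reconstruction, not PermuCLUSTER's code; the name `ssdifn` follows the paper's label) computes the normalized sum of squared differences between the original distances and the tree distances, where a smaller value indicates a better fit:

```python
import numpy as np

def ssdifn(d, c):
    """Normalized sum of squared differences between the original proximity
    distances d and the cophenetic (tree) distances c; 0 means a perfect fit."""
    d = np.asarray(d, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(((d - c) ** 2).sum() / (d ** 2).sum())
```

Both `d` and `c` would be the condensed (upper-triangular) forms of the respective distance matrices.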
In our analysis, we first employ PermuCLUSTER with the number of AHC repetitions set at 500 and evaluate the resultant optimal solutions. Thereafter, we validate the cluster solutions (k = 2 to 35) of the optimal solution.
K Means clustering
In K Means clustering, K refers to the number of clusters, which, though unknown a priori, has to be specified by the user. Each cluster has a centroid, usually computed as the mean of the variable vectors in that cluster. Each object is assigned to the cluster of its nearest centroid.
The basic process of the K Means clustering (MacQueen, 1967) is:
Determine initial centroids.
Find the closest centroid to each object and assign the object to the cluster associated with this centroid.
Recalculate the centroid for each of the new clusters.
Repeat steps 2 and 3 until the cluster assignments no longer change.
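The steps above can be sketched as follows. This is a minimal illustration under our own assumptions (random initial centroids, squared Euclidean distance, a fixed iteration cap), not MacQueen's original implementation:

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Plain K Means: assign each object to its nearest centroid, then
    recompute centroids as cluster means, until assignments stabilize."""
    rng = np.random.default_rng(seed)
    # Step 1: initial centroids chosen as k distinct objects at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each object to the cluster of its closest centroid
        # (squared Euclidean distance to every centroid).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = np.array(
            [points[labels == j].mean(axis=0) for j in range(k)]
        )
        # Stop once the centroids (and hence assignments) no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

A production implementation would also handle clusters that become empty during iteration; this sketch omits that case for brevity.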