CHAPTER TWO: LITERATURE REVIEW

We employ cluster analysis to address our research question because it allows users to identify the natural group structure underlying complex datasets. In doing so, we can identify the type of firm and the level of firm quality with which a particular group of directors is associated. Specifically, we expect directors sitting on the boards of high-quality firms to be grouped together, and likewise directors sitting on the boards of low-quality firms. Figure 4.1 shows a flowchart illustrating the methodology employed in this study. After cluster analysis, we perform regression analysis. Using the relation found in the regression analysis, we compute the predicted number of directorships for every director included in our analysis. Finally, we compare the actual and predicted numbers of directorships for the directors in each group obtained from the earlier cluster analysis. This comparison allows us to directly address the hypotheses developed in Chapter Three.

1.1.1. Cluster Analysis

Cluster analysis is an unsupervised learning technique that allows users to explore complex datasets by identifying the natural group structures underlying the data (Everitt, 1993; Jain et al., 1999; Duda et al., 2001; Hastie et al., 2001). In essence, cluster analysis partitions objects into groups such that the similarity between objects in the same group and the dissimilarity between groups are both maximized (Hand et al., 2001). Each object is represented by a vector of multi-dimensional variables, and the similarity or dissimilarity between two objects is measured as the distance between the vectors that represent them.
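To make the notion of distance between objects concrete, the following is a minimal sketch that builds a pairwise distance matrix. It assumes Euclidean distance and already-scaled variables; the text does not state which distance measure is used, and the function names and sample vectors are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between two objects given as equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(objects):
    """Symmetric matrix holding the distance between every pair of objects."""
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean_distance(objects[i], objects[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical two-variable vectors for three objects (illustrative only).
objects = [
    [0.0, 0.0],
    [3.0, 4.0],
    [0.0, 1.0],
]
distances = pairwise_distances(objects)
```

A small entry in the matrix marks two objects as similar, so a clustering algorithm would tend to place them in the same group. In practice the variables usually need to be standardized first so that no single variable dominates the distance.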

There are many ways of performing cluster analysis, and it is highly improbable that two different algorithms will produce completely identical partitions of the data. Further, each algorithm has potential shortcomings; no single algorithm can be considered the best. Hence, rather than choosing one particular algorithm, we follow Shen et al. (2007) and perform cluster analysis with three conceptually different algorithms (Hierarchical Clustering, K-Means Clustering, and Expectation Maximization for a Gaussian Mixture Model), and subsequently employ consensus clustering to obtain a single consensus solution. This methodology allows us to avoid the potential shortcomings of any single clustering method, improving the reliability of our cluster solutions. Figure 4.2 shows a flowchart illustrating the steps we take to perform cluster analysis.
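The idea of combining several partitions into one consensus solution can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Shen et al. (2007), which is not described here: it uses a co-association matrix (the share of partitions in which two objects fall in the same cluster) and links objects that co-cluster in more than half of the partitions. The label vectors stand in for the output of the three algorithms and are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same objects")
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for p in partitions if p[i] == p[j])
        matrix[i][j] = matrix[j][i] = agree / len(partitions)
    return matrix


def consensus_labels(partitions, threshold=0.5):
    """Group objects linked by co-association above the threshold.

    Assumption: consensus groups are the connected components of the
    graph whose edges join pairs with co-association > threshold.
    """
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical cluster labels for six objects from three algorithms.
# Label values are arbitrary: the first two partitions are identical
# up to renaming, and the third disagrees on the object at index 2.
hierarchical = [0, 0, 0, 1, 1, 1]
k_means = [1, 1, 1, 0, 0, 0]
gaussian_mixture = [0, 0, 1, 1, 1, 1]

partitions = [hierarchical, k_means, gaussian_mixture]
consensus = consensus_labels(partitions)
```

Because the co-association matrix depends only on whether two objects share a cluster, it is unaffected by the arbitrary label numbering each algorithm produces. The 0.5 threshold corresponds to a simple majority vote across the three partitions and is a design choice of this sketch.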

There are two main ways of clustering directors in our study: by individual director characteristics or by individual firm characteristics. Firm characteristics can differentiate between directors because they define the type of firm with which a director is associated. For instance, two di