Jolliffe I. Principal Component Analysis (2nd ed., Springer, 2002)
9.2. Cluster Analysis

Before looking at examples of the uses just described of PCA in cluster analysis, we discuss a rather different way in which cluster analysis can be used and its connections with PCA. So far we have discussed cluster analysis on observations or individuals, but in some circumstances it is desirable to divide variables, rather than observations, into groups. In fact, by far the earliest book on cluster analysis (Tryon, 1939) was concerned with this type of application. Provided that a suitable measure of similarity between variables can be defined—the correlation coefficient is an obvious candidate—methods of cluster analysis used for observations can be readily adapted for variables.

One connection with PCA is that when the variables fall into well-defined clusters then, as discussed in Section 3.8, there will be one high-variance PC and, except in the case of 'single-variable' clusters, one or more low-variance PCs associated with each cluster of variables. Thus, PCA will identify the presence of clusters among the variables, and can be thought of as a competitor to standard cluster analysis of variables. The use of PCA in this way is fairly common in climatology (see, for example, Cohen (1983); White et al. (1991); Romero et al. (1999)). In an analysis of a climate variable recorded at stations over a large geographical area, the loadings of the PCs at the various stations can be used to divide the area into regions with high loadings on each PC. In fact, this regionalization procedure is usually more effective if the PCs are rotated (see Section 11.1), so that most analyses are done using rotated loadings.

Identifying clusters of variables may be of general interest in investigating the structure of a data set but, more specifically, if we wish to reduce the number of variables without sacrificing too much information, then we could retain one variable from each cluster.
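Adapting observation-based clustering to variables, as just described, needs only a dissimilarity derived from the correlation coefficient. A minimal sketch in Python (not from the text: the toy data, the 1 − |r| dissimilarity, and the choice of average linkage are all illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Hypothetical toy data with two well-defined clusters of variables:
# columns 0-2 share one latent factor, columns 3-5 share another.
f1, f2 = rng.normal(size=(2, 200))
X = np.column_stack([f1 + 0.3 * rng.normal(size=200) for _ in range(3)] +
                    [f2 + 0.3 * rng.normal(size=200) for _ in range(3)])

# Use the correlation coefficient as the similarity between variables,
# turned into a dissimilarity d = 1 - |r|.
R = np.corrcoef(X, rowvar=False)
D = 1.0 - np.abs(R)
np.fill_diagonal(D, 0.0)

# Standard (average-linkage) hierarchical clustering, applied to variables.
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # variables 0-2 get one label, variables 3-5 the other
```

With data of this kind, a PCA of the same correlation matrix would show one high-variance PC per cluster of variables, in line with the connection to Section 3.8 noted above.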
This is essentially the idea behind some of the variable selection techniques based on PCA that were described in Section 6.3.

Hastie et al. (2000) describe a novel clustering procedure for 'variables' which uses PCA applied in a genetic context. They call their method 'gene shaving.' Their data consist of p = 4673 gene expression measurements for n = 48 patients, and the objective is to classify the 4673 genes into groups that have coherent expressions. The first PC is found for these data and a proportion of the genes (typically 10%) having the smallest absolute inner products with this PC are deleted (shaved). PCA followed by shaving is repeated for the reduced data set, and this procedure continues until ultimately only one gene remains. A nested sequence of subsets of genes results from this algorithm and an optimality criterion is used to decide which set in the sequence is best. This gives the first cluster of genes. The whole procedure is then repeated after centering the data with respect to the 'average gene expression' in the first cluster, to give a second cluster, and so on.

Another way of constructing clusters of variables, which simultaneously finds the first PC within each cluster, is proposed by Vigneau and Qannari
(2001). Suppose that the p variables are divided into G groups or clusters, and that x_g denotes the vector of variables in the gth group, g = 1, 2, ..., G. Vigneau and Qannari (2001) seek vectors a_{11}, a_{21}, ..., a_{G1} that maximize ∑_{g=1}^{G} var(a′_{g1} x_g), where var(a′_{g1} x_g) is the sample variance of the linear function a′_{g1} x_g. This sample variance is clearly maximized by the first PC for the variables in the gth group, but simultaneously we wish to find the partition of the variables into G groups for which the sum of these variances is maximized. An iterative procedure is presented by Vigneau and Qannari (2001) for solving this problem.

The formulation of the problem assumes that variables with large squared correlations with the first PC in a cluster should be assigned to that cluster. Vigneau and Qannari consider two variations of their technique. In the first, the signs of the correlations between variables and PCs are important; only those variables with large positive correlations with a PC should be in its cluster. In the second, relationships with external variables are taken into account.

9.2.1 Examples

Only one example will be described in detail here, although a number of other examples that have appeared elsewhere will be discussed briefly. In many of the published examples where PCs have been used in conjunction with cluster analysis, there is no clear-cut cluster structure, and cluster analysis has been used as a dissection technique. An exception is the well-known example given by Jeffers (1967), which was discussed in the context of variable selection in Section 6.4.1. The data consist of 19 variables measured on 40 aphids and, when the 40 observations are plotted with respect to the first two PCs, there is a strong suggestion of four distinct groups; refer to Figure 9.3, on which convex hulls (see Section 5.1) have been drawn around the four suspected groups.
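The alternating procedure of Vigneau and Qannari described earlier in this section can be sketched as follows (an illustrative implementation, not the authors' algorithm verbatim: the initialization, stopping rule, empty-group safeguard, and the name cluster_variables are choices made here). Each pass finds the first PC within every current group and then reassigns each variable to the group whose first PC it has the largest squared correlation with:

```python
import numpy as np

def cluster_variables(X, G, n_iter=100):
    """Alternate between (1) the first PC within each current group of
    variables and (2) reassigning each variable to the group whose first PC
    it has the largest squared correlation with, until the partition of the
    p variables into G groups stops changing."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    groups = np.arange(p) % G                      # simple deterministic start
    for _ in range(n_iter):
        comps = np.empty((n, G))
        for g in range(G):
            cols = Xc[:, groups == g]
            if cols.shape[1] == 0:                 # safeguard: revive an empty
                cols = Xc[:, [g % p]]              # group with a single column
            _, _, vt = np.linalg.svd(cols, full_matrices=False)
            comps[:, g] = cols @ vt[0]             # scores on the group's first PC
        # squared correlations between every variable and every group PC
        r = np.corrcoef(np.hstack([Xc, comps]), rowvar=False)[:p, p:]
        new_groups = np.argmax(r ** 2, axis=1)
        if np.array_equal(new_groups, groups):
            break
        groups = new_groups
    return groups

# Hypothetical toy data: variables 0-2 follow one latent factor, 3-5 another.
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 300))
X = np.column_stack([f1 + 0.3 * rng.normal(size=300) for _ in range(3)] +
                    [f2 + 0.3 * rng.normal(size=300) for _ in range(3)])
groups = cluster_variables(X, G=2)
print(groups)  # variables 0-2 in one group, variables 3-5 in the other
```

Like other alternating schemes, this converges only to a local optimum of the criterion ∑_g var(a′_{g1} x_g), so in practice several starting partitions would be tried.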
It is likely that the four groups indicated on Figure 9.3 correspond to four different species of aphids; these four species cannot be readily distinguished using only one variable at a time, but the plot with respect to the first two PCs clearly distinguishes the four populations.

The example introduced in Section 1.1 and discussed further in Section 5.1.1, which has seven physical measurements on 28 students, also shows (in Figures 1.3, 5.1) how a plot with respect to the first two PCs can distinguish two groups, in this case men and women. There is, unlike the aphid data, a small amount of overlap between groups, and if the PC plot is used to identify, rather than verify, a cluster structure, then it is likely that some misclassification between sexes will occur. A simple but specialized use of PC scores, one PC at a time, to classify seabird communities is described by Huettmann and Diamond (2001).

In the situation where cluster analysis is used for dissection, the aim of a two-dimensional plot with respect to the first two PCs will almost always be