Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu
12.07.2015 Views

9.2. Cluster Analysis 215

Figure 9.3. Aphids: plot with respect to the first two PCs showing four groups corresponding to species.

to verify that a given dissection ‘looks’ reasonable, rather than to attempt to identify clusters. An early example of this type of use was given by Moser and Scott (1961), in their Figure 9.2. The PCA in their study, which has already been mentioned in Section 4.2, was a stepping stone on the way to a cluster analysis of 157 British towns based on 57 variables. The PCs were used both in the construction of a distance measure, and as a means of displaying the clusters in two dimensions.

Principal components are used in cluster analysis in a similar manner in other examples discussed in Section 4.2, details of which can be found in Jolliffe et al. (1980, 1982a, 1986), Imber (1977) and Webber and Craig (1978). Each of these studies is concerned with demographic data, as is the example described next in detail.

Demographic Characteristics of English Counties

In an unpublished undergraduate dissertation, Stone (1984) considered a cluster analysis of 46 English counties. For each county there were 12

216 9. Principal Components Used with Other Multivariate Techniques

Table 9.1. Demographic variables used in the analysis of 46 English counties.

 1. Population density—numbers per hectare
 2. Percentage of population aged under 16
 3. Percentage of population above retirement age
 4. Percentage of men aged 16–65 who are employed
 5. Percentage of men aged 16–65 who are unemployed
 6. Percentage of population owning their own home
 7. Percentage of households which are ‘overcrowded’
 8. Percentage of employed men working in industry
 9. Percentage of employed men working in agriculture
10. (Length of public roads)/(area of county)
11. (Industrial floor space)/(area of county)
12. (Shops and restaurant floor space)/(area of county)

Table 9.2. Coefficients and variances for the first four PCs: English counties data.

Component number                1       2       3       4

Variable  1                   0.35   −0.19    0.29    0.06
Variable  2                   0.02    0.60   −0.03    0.22
Variable  3                  −0.11   −0.52   −0.27   −0.36
Variable  4                  −0.30    0.07    0.59   −0.03
Variable  5                   0.31    0.05   −0.57    0.07
Variable  6                  −0.29    0.09   −0.07   −0.59
Variable  7                   0.38    0.04    0.09    0.08
Variable  8                   0.13    0.50   −0.14   −0.34
Variable  9                  −0.25   −0.17   −0.28    0.51
Variable 10                   0.37   −0.09    0.09   −0.18
Variable 11                   0.34    0.02   −0.00   −0.24
Variable 12                   0.35   −0.20    0.24    0.07

Eigenvalue                    6.27    2.53    1.16    0.96
Cumulative percentage
of total variation            52.3    73.3    83.0    90.9
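The last two rows of Table 9.2 are linked by simple arithmetic, which the following Python sketch illustrates. It assumes that the PCA was carried out on the correlation matrix of the 12 variables, so that the total variation equals 12; this assumption is not stated in the excerpt, although the tabulated figures are consistent with it. The function name cumulative_percentages is hypothetical and chosen only for this illustration.

```python
# Eigenvalues of the first four PCs, copied from Table 9.2.
eigenvalues = [6.27, 2.53, 1.16, 0.96]

# ASSUMPTION (not stated in the excerpt): the PCA used the correlation
# matrix, so the total variation equals the number of variables, 12.
n_variables = 12


def cumulative_percentages(eigs, total):
    """Return the running percentage of total variation for each PC."""
    running = 0.0
    result = []
    for eig in eigs:
        running += eig
        result.append(100.0 * running / total)
    return result


cum = cumulative_percentages(eigenvalues, n_variables)
for k, pct in enumerate(cum, start=1):
    print(f"first {k} PC(s): {pct:.2f}% of total variation")
```

Because the eigenvalues in the table are rounded to two decimal places, the recomputed percentages can differ from the tabulated ones by about 0.1 in the last digit.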

