Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

9.2. Cluster Analysis

demographic variables, which are listed in Table 9.1. The objective of Stone’s analysis, namely dissection of local authority areas into clusters, was basically the same as that in other analyses by Imber (1977), Webber and Craig (1978) and Jolliffe et al. (1986), but these various analyses differ in the variables used and in the local authorities considered. For example, Stone’s list of variables is shorter than those of the other analyses, although it includes some variables not considered by any of the others. Also, Stone’s list of local authorities includes large metropolitan counties such as Greater London, Greater Manchester and Merseyside as single entities, whereas these large authorities are subdivided into smaller areas in the other analyses. A comparison of the clusters obtained from several different analyses is given by Jolliffe et al. (1986).

As in other analyses of local authorities, PCA is used in Stone’s analysis in two ways: first, to summarize and explain the major sources of variation in the data, and second, to provide a visual display on which to judge the adequacy of the clustering.

Table 9.2 gives the coefficients and variances for the first four PCs using the correlation matrix for Stone’s data. It is seen that the first two components account for 73% of the total variation, but that most of the relevant rules of Section 6.1 would retain four components (the fifth eigenvalue is 0.41).

There are fairly clear interpretations for each of the first three PCs. The first PC provides a contrast between urban and rural areas, with positive coefficients for variables that are high in urban areas, such as densities of population, roads, and industrial and retail floor space; negative coefficients occur for owner occupation, percentage of employed men in agriculture, and overall employment level, which at the time of the study tended to be higher in rural areas.
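Table 9.2 rests on an eigendecomposition of the correlation matrix: the eigenvalues are the PC variances, and because the eigenvalues of a p x p correlation matrix sum to p, each eigenvalue divided by p is that PC's share of the total variation. The minimal NumPy sketch below shows that step, three simple retention rules, and a helper that lists the variables dominating a PC's coefficients. It is not Stone's analysis or the book's code: the function names, the data, the 70% cumulative target and the 0.7 eigenvalue cutoff are illustrative assumptions, chosen as common examples of the kind of rule Section 6.1 covers.

```python
import numpy as np


def correlation_pca(X):
    """PCA of the correlation matrix of X (rows = areas, columns = variables).

    Returns the PC variances (eigenvalues) in decreasing order and the
    matching coefficient vectors as the columns of a p x p matrix.
    """
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh because R is symmetric
    order = np.argsort(eigvals)[::-1]     # largest variance first
    return eigvals[order], eigvecs[:, order]


def n_retained(eigvals, cum_target=0.7, cutoff=0.7):
    """Number of PCs kept under three simple rules (thresholds are assumptions)."""
    eigvals = np.asarray(eigvals, dtype=float)
    cum_share = np.cumsum(eigvals) / eigvals.sum()  # cumulative proportion of variation
    return {
        # smallest m such that the first m PCs reach cum_target of the total
        "cumulative": int(min(np.searchsorted(cum_share, cum_target) + 1, len(eigvals))),
        # Kaiser: eigenvalue above the average, which is 1 for a correlation matrix
        "kaiser": int(np.sum(eigvals > 1.0)),
        # a lower eigenvalue cutoff, e.g. 0.7
        "cutoff": int(np.sum(eigvals > cutoff)),
    }


def top_coefficients(coefs, names, k, n=3):
    """The n variables with the largest |coefficient| in PC k (0-based).

    The overall sign of a PC is arbitrary; only the relative signs of the
    coefficients within one PC (e.g. urban-high vs rural-high variables) matter.
    """
    idx = np.argsort(np.abs(coefs[:, k]))[::-1][:n]
    return [(names[i], round(float(coefs[i, k]), 2)) for i in idx]


# Hypothetical data: 46 "areas" x 6 variables, v1..v3 sharing a common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=46)
X = rng.normal(size=(46, 6))
X[:, :3] += 5 * factor[:, None]
names = ["v1", "v2", "v3", "v4", "v5", "v6"]

eigvals, coefs = correlation_pca(X)
print(np.round(eigvals / eigvals.sum(), 2))  # proportion of variation per PC
print(top_coefficients(coefs, names, k=0))    # variables dominating PC1

# Invented eigenvalues (NOT the values behind Table 9.2)
print(n_retained([3.0, 1.5, 0.9, 0.75, 0.41, 0.3, 0.14]))
# -> {'cumulative': 3, 'kaiser': 2, 'cutoff': 4}
```

With the invented eigenvalues the three rules return 3, 2 and 4 components, which illustrates how a cumulative-percentage view (such as the 73% captured by two PCs in Stone's data) and an eigenvalue-size rule can point to different numbers of components.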
The main contrast for component 2 is between the percentages of the population below school-leaving age and above retirement age. This component is therefore a measure of the age of the population in each county, and it identifies, at one extreme, the south coast retirement areas. The third PC contrasts employment and unemployment rates. This contrast is also present in the first urban versus rural PC, so that the third PC is measuring variation in employment/unemployment rates within rural areas and within urban areas, rather than between the two types of area.

Turning now to the cluster analysis of the data, Stone (1984) examines several different clustering methods, and also considers the analysis with and without Greater London, which is very different from any other area, but whose omission produces surprisingly little change. Figure 9.4 shows the position of the 46 counties with respect to the first two PCs, with the four-cluster solution obtained using complete-linkage cluster analysis (see Gordon, 1999, p. 85) indicated by different symbols for different clusters. The results for complete-linkage are fairly similar to those found by several of the other clustering methods investigated.

In the four-cluster solution, the single observation at the bottom left of