12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



294 11. Rotation and Interpretation of Principal Components

bread ($x_1$), vegetables ($x_2$) and milk ($x_6$). However, a question that arises is whether this truncated version of PC2 is the best linear approximation to PC2 using only these three variables. The answer is an emphatic ‘No.’ The best linear approximation in a least squares sense is obtained by regressing PC2 on $x_1$, $x_2$ and $x_6$. This gives $\hat{\hat{z}}_2 = 0.55x_1 - 0.27x_2 + 0.73x_6$, which differs notably in interpretation from the truncated component $\hat{z}_2$, as the expenditure on vegetables is now contrasted, rather than averaged, with expenditure on bread and milk. Furthermore $\hat{\hat{z}}_2$ has a correlation of 0.964 with PC2, whereas the correlation between PC2 and $\hat{z}_2$ is 0.766. Hence the truncated component $\hat{z}_2$ not only gives a misleading interpretation of PC2 in terms of $x_1$, $x_2$ and $x_6$, but also gives an inferior approximation compared to $\hat{\hat{z}}_2$.

To illustrate Cadima and Jolliffe’s (1995) second point, the third PC in the same example has coefficients 0.40, −0.29, −0.34, 0.07, 0.38, −0.23 and 0.66 on the 7 variables. If the 4 variables $x_1, x_3, x_5, x_7$ are kept, the best linear approximation to PC3 has coefficients −0.06, −0.48, 0.41, 0.68, respectively, so that $x_1$ looks much less important. The correlation of the approximation with PC3 is increased to 0.979, compared to 0.773 for the truncated component. Furthermore, although $x_1$ has the second highest coefficient in PC3, it is not a member of the 3-variable subset $\{x_3, x_5, x_7\}$ that best approximates PC3. This subset does almost as well as the best 4-variable subset, achieving a correlation of 0.975 with PC3.

Ali et al. (1985) also note that using loadings to interpret PCs can be misleading, and suggest examining correlations between variables and PCs instead. In a correlation matrix-based PCA, these PC-variable correlations are equal to the loadings when the normalization $\tilde{a}_k$ is used (see Section 2.3 for the population version of this result).
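Two points made above can be checked numerically: a truncated PC is generally a poorer approximation to the full PC than the least squares fit on the same retained variables, and in correlation matrix PCA the PC-variable correlations equal the loadings under the normalization $\tilde{a}_k$. The following is a minimal sketch on synthetic data, not the expenditure data of the example. The data, the choice of three retained variables, and all names (`z_trunc`, `z_reg`, `a_tilde`, and so on) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 500 observations on 7 correlated variables.
n, p = 500, 7
mixing = rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ mixing

# Correlation-matrix PCA: standardize, then eigendecompose R.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]             # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 1                                         # second PC (0-based index)
a_k = eigvecs[:, k]                           # unit-length loadings
z_k = Xs @ a_k                                # PC scores

# Retain the three variables with the largest absolute loadings (assumption).
subset = np.argsort(np.abs(a_k))[-3:]

# Truncated component: keep the original loadings, drop the other variables.
z_trunc = Xs[:, subset] @ a_k[subset]

# Least squares approximation: regress the PC on the retained variables.
# No intercept is needed because both sides are centered.
coef, *_ = np.linalg.lstsq(Xs[:, subset], z_k, rcond=None)
z_reg = Xs[:, subset] @ coef


def corr(u, v):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(u, v)[0, 1])


r_trunc = corr(z_trunc, z_k)
r_reg = corr(z_reg, z_k)

# Loadings rescaled so that a_tilde' a_tilde equals the k-th eigenvalue.
a_tilde = np.sqrt(eigvals[k]) * a_k
pc_var_corr = np.array([corr(Xs[:, j], z_k) for j in range(p)])
identity_holds = bool(np.allclose(pc_var_corr, a_tilde))

print("retained variables (0-based):", sorted(subset.tolist()))
print("truncated loadings :", np.round(a_k[subset], 2))
print("regression coefs   :", np.round(coef, 2))
print("corr(truncated, PC):", round(r_trunc, 3))
print("corr(regression, PC):", round(r_reg, 3))
print("PC-variable correlations equal a_tilde:", identity_holds)
```

The regression fit maximizes correlation with the PC over all linear combinations of the retained variables, so its correlation can never fall below that of the truncated component. How large the gap is, and whether the coefficient signs change as they do in the example, depends on the data.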
The use of PCA in regionalization studies in climatology, as discussed in Section 9.2, often uses these correlations to define and interpret clusters. However, Cadima and Jolliffe (1995) show that neither the absolute size of loadings in a PC nor the absolute size of correlations between variables and components gives reliable guidance on which subset of variables best approximates a PC (see also Rencher (1995, Section 12.8.3)). Cadima and Jolliffe (2001) give further examples of this phenomenon in the context of variable selection (see Section 6.3).

Richman and Gong (1999) present an extensive study, which starts from the explicit premise that loadings greater than some threshold are retained, while those below the threshold are set to zero. Their objective is to find an optimal value for this threshold in a spatial atmospheric science context. The loadings they consider are, in fact, based on the normalization $\tilde{a}_k$, so for correlation matrix PCA they are correlations between PCs and variables. Rotated PCs and covariance matrix PCs are also included in the study. Richman and Gong (1999) use biserial correlation to measure how well the loadings in a truncated PC (or truncated rotated PC) match the pattern of correlations between the spatial location whose loading on the PC (rotated PC) is largest, and all other spatial locations in a data set. The optimum threshold is the one for which this match is best. Results
