Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
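2. Properties of Population Principal Components
2.3. Principal Components Using a Correlation Matrix

Figure 2.1. Contours of constant probability based on $\Sigma_1 = \begin{pmatrix} 80 & 44 \\ 44 & 80 \end{pmatrix}$.

Figure 2.2. Contours of constant probability based on $\Sigma_2 = \begin{pmatrix} 8000 & 440 \\ 440 & 80 \end{pmatrix}$.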
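As a numerical check on this example, the sketch below eigen-decomposes the two covariance matrices of Figures 2.1 and 2.2 and their correlation matrices. It assumes NumPy, which the text does not mention, and the helper names `cov_to_corr` and `principal_components` are hypothetical. $\Sigma_2$ is exactly what $\Sigma_1$ becomes when the first variable is multiplied by 10 (its variance scales by 100 and its covariance with $x_2$ by 10), so the two matrices describe the same variables in different units.

```python
import numpy as np

# Covariance matrices from the captions of Figures 2.1 and 2.2.
# SIGMA2 equals SIGMA1 with the first variable multiplied by 10
# (variance scaled by 100, covariance scaled by 10).
SIGMA1 = np.array([[80.0, 44.0], [44.0, 80.0]])
SIGMA2 = np.array([[8000.0, 440.0], [440.0, 80.0]])


def cov_to_corr(sigma):
    """Standardize a covariance matrix: divide entry (i, j) by sd_i * sd_j."""
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)


def principal_components(matrix):
    """Eigen-decompose a symmetric matrix.

    Returns the eigenvalues (PC variances) in decreasing order and the
    matching unit-length eigenvectors (PC coefficients) as columns.
    """
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; make each first coefficient non-negative.
    signs = np.where(eigvecs[0, :] < 0, -1.0, 1.0)
    return eigvals, eigvecs * signs


for name, sigma in [("Sigma1", SIGMA1), ("Sigma2", SIGMA2)]:
    cov_vals, cov_vecs = principal_components(sigma)
    cor_vals, cor_vecs = principal_components(cov_to_corr(sigma))
    print(name)
    print("  covariance PCs:  variances", np.round(cov_vals, 2),
          "share of PC1", round(cov_vals[0] / cov_vals.sum(), 3))
    print("    PC1 coefficients", np.round(cov_vecs[:, 0], 3))
    print("  correlation PCs: variances", np.round(cor_vals, 2))
    print("    PC1 coefficients", np.round(cor_vecs[:, 0], 3))
```

Run as a script, it should show the covariance-based first PC of $\Sigma_2$ carrying over 99% of the total variance with coefficients close to (1, 0), that is, essentially $x_1$ alone. Both covariance matrices standardize to the same correlation matrix (off-diagonal element 0.55), whose PCs are (1, 1)/√2 and (1, −1)/√2 whatever units are chosen.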
the original variables rearranged in decreasing order of the size of their variances. Also, the first few PCs account for little of the off-diagonal elements of Σ in this case (see Property A3 above). In most circumstances, such a transformation to PCs is of little value, and it will not occur if the correlation, rather than covariance, matrix is used.

The example has shown that it is unwise to use PCs on a covariance matrix when x consists of measurements of different types, unless there is a strong conviction that the units of measurement chosen for each element of x are the only ones that make sense. Even when this condition holds, using the covariance matrix will not provide very informative PCs if the variables have widely differing variances. Furthermore, with covariance matrices and non-commensurable variables the PC scores are difficult to interpret: what does it mean to add a temperature to a weight? For correlation matrices, the standardized variates are all dimensionless and can be happily combined to give PC scores (Legendre and Legendre, 1983, p. 129).

Another problem with the use of covariance matrices is that informal comparison of the results from different analyses is more difficult than with correlation matrices. Sizes of variances of PCs have the same implications for different correlation matrices of the same dimension, because the PC variances of any p × p correlation matrix sum to p, but this does not hold for different covariance matrices. Also, patterns of coefficients in PCs can be readily compared for different correlation matrices to see whether the two correlation matrices give similar PCs, whereas informal comparisons are often much trickier for covariance matrices. Formal methods for comparing PCs from different covariance matrices are, however, available (see Section 13.5).

The use of covariance matrices does have one general advantage over correlation matrices, and a particular advantage seen in a special case. The general advantage is that statistical inference regarding population PCs based on sample PCs is easier for covariance matrices than for correlation matrices, as will be discussed in Section 3.7. This is relevant when PCA is used in a context where statistical inference is important. However, in practice, it is more common to use PCA as a descriptive, rather than an inferential, tool, and then the potential advantage of covariance matrix PCA is irrelevant.

The second advantage of covariance matrices holds in the special case when all elements of x are measured in the same units. It can then be argued that standardizing the elements of x to give correlations is equivalent to making an arbitrary choice of measurement units. This argument of arbitrariness can also be applied more generally to the use of correlation matrices, but when the elements of x are measurements of different types, the choice of measurement units leading to a covariance matrix is even more arbitrary, so that the correlation matrix is again preferred.

Standardizing the variables may be thought of as an attempt to remove the problem of scale dependence from PCA. Another way of doing this is to compute PCs of the logarithms of the original data (Flury, 1997, Section 8.4), though this is only feasible and sensible for restricted types of data,