Jolliffe, I. T. Principal Component Analysis, 2nd edition. Springer, 2002. 518 pp.


Figure 2.1. Contours of constant probability based on $\Sigma_1 = \begin{pmatrix} 80 & 44 \\ 44 & 80 \end{pmatrix}$.

Figure 2.2. Contours of constant probability based on $\Sigma_2 = \begin{pmatrix} 8000 & 440 \\ 440 & 80 \end{pmatrix}$.
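To see numerically what the two figures depict, here is a minimal NumPy sketch (not part of the book) that computes the PCs of the two covariance matrices and checks that they share the same correlation matrix; the helper name `pcs` is our own choice.

```python
import numpy as np

# Covariance matrices from Figures 2.1 and 2.2 above.
sigma1 = np.array([[80.0, 44.0],
                   [44.0, 80.0]])
sigma2 = np.array([[8000.0, 440.0],
                   [440.0,   80.0]])

def pcs(cov):
    """Eigenvalues and eigenvectors (columns) of a covariance matrix,
    sorted so the largest-variance PC comes first."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

for name, cov in [("Sigma_1", sigma1), ("Sigma_2", sigma2)]:
    vals, vecs = pcs(cov)
    print(name, "PC variances:", vals)
    print(name, "first PC direction:", vecs[:, 0])
# For Sigma_2 the first PC is almost exactly the high-variance variable,
# whereas for Sigma_1 both variables contribute equally.

# Both matrices have the same correlation matrix (off-diagonal 0.55),
# so correlation-based PCs are identical in the two cases.
d1 = np.sqrt(np.diag(sigma1))
d2 = np.sqrt(np.diag(sigma2))
print(np.allclose(sigma1 / np.outer(d1, d1), sigma2 / np.outer(d2, d2)))  # True
```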

the original variables rearranged in decreasing order of the size of their variances. Also, the first few PCs account for little of the off-diagonal elements of Σ in this case (see Property A3 above). In most circumstances, such a transformation to PCs is of little value, and it will not occur if the correlation, rather than covariance, matrix is used.

The example has shown that it is unwise to use PCs on a covariance matrix when x consists of measurements of different types, unless there is a strong conviction that the units of measurement chosen for each element of x are the only ones that make sense. Even when this condition holds, using the covariance matrix will not provide very informative PCs if the variables have widely differing variances. Furthermore, with covariance matrices and non-commensurable variables the PC scores are difficult to interpret: what does it mean to add a temperature to a weight? For correlation matrices, the standardized variates are all dimensionless and can be happily combined to give PC scores (Legendre and Legendre, 1983, p. 129).

Another problem with the use of covariance matrices is that it is more difficult than with correlation matrices to compare informally the results from different analyses. Sizes of variances of PCs have the same implications for different correlation matrices of the same dimension, but not for different covariance matrices. Also, patterns of coefficients in PCs can be readily compared for different correlation matrices to see if the two correlation matrices are giving similar PCs, whereas informal comparisons are often much trickier for covariance matrices. Formal methods for comparing PCs from different covariance matrices are, however, available (see Section 13.5).

The use of covariance matrices does have one general advantage over correlation matrices, and a particular advantage seen in a special case. The general advantage is that statistical inference regarding population PCs based on sample PCs is easier for covariance matrices than for correlation matrices, as will be discussed in Section 3.7. This is relevant when PCA is used in a context where statistical inference is important. However, in practice, it is more common to use PCA as a descriptive, rather than an inferential, tool, and then the potential advantage of covariance matrix PCA is irrelevant.

The second advantage of covariance matrices holds in the special case when all elements of x are measured in the same units. It can then be argued that standardizing the elements of x to give correlations is equivalent to making an arbitrary choice of measurement units. This argument of arbitrariness can also be applied more generally to the use of correlation matrices, but when the elements of x are measurements of different types, the choice of measurement units leading to a covariance matrix is even more arbitrary, so that the correlation matrix is again preferred.

Standardizing the variables may be thought of as an attempt to remove the problem of scale dependence from PCA. Another way of doing this is to compute PCs of the logarithms of the original data (Flury, 1997, Section 8.4), though this is only feasible and sensible for restricted types of data,
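To illustrate the points above numerically, here is a minimal sketch (not from the book) on hypothetical, strictly positive two-column data: covariance-matrix PCA is dominated by the large-variance column, correlation-matrix PCA amounts to covariance PCA on the standardized (dimensionless) variables, and PCA of the logged data is the alternative mentioned at the end of the passage. All variable names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical strictly positive data with two columns on very different scales
# (generated by exponentiating correlated normals so that taking logs is valid).
X = np.exp(rng.multivariate_normal(mean=[6.0, 2.0],
                                   cov=[[0.50, 0.15],
                                        [0.15, 0.10]],
                                   size=500))

def pca_eig(S):
    """Eigenvalues/eigenvectors of a symmetric matrix, largest variance first."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# 1. Covariance-matrix PCA: the first PC is dominated by the large-variance column.
cov_vals, cov_vecs = pca_eig(np.cov(X, rowvar=False))

# 2. Correlation-matrix PCA: identical to covariance PCA on standardized columns,
#    so the dimensionless standardized variates can be combined into PC scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cor_vals, cor_vecs = pca_eig(np.corrcoef(X, rowvar=False))
scores = Z @ cor_vecs  # correlation-based PC scores

# 3. PCA of the logarithms, another way to remove scale dependence
#    (only sensible for strictly positive data).
log_vals, log_vecs = pca_eig(np.cov(np.log(X), rowvar=False))

print("covariance PC variances:  ", cov_vals)
print("correlation PC variances: ", cor_vals)
print("log-data PC variances:    ", log_vals)
```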

