12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

3.4 Principal Components with Equal and/or Zero Variances

The problems that arise when some of the eigenvalues of a population covariance matrix are zero and/or equal were discussed in Section 2.4; similar considerations hold when dealing with a sample.

In practice, exactly equal non-zero eigenvalues are extremely rare. Even if the underlying population covariance or correlation matrix has a pattern that gives equal eigenvalues, sampling variation will almost always ensure that the sample eigenvalues are unequal. It should be noted, however, that nearly equal eigenvalues need careful attention. The subspace spanned by a set of nearly equal eigenvalues that are well-separated from all other eigenvalues is well-defined and stable, but individual PC directions within that subspace are unstable (see Section 10.3). This has implications for deciding how many components to retain (Section 6.1), for assessing which observations are influential (Section 10.2) and for deciding which components to rotate (Section 11.1).

With carefully selected variables, PCs with zero variances are a relatively rare occurrence. When $q$ zero eigenvalues do occur for a sample covariance or correlation matrix, the implication is that the points $x_1, x_2, \ldots, x_n$ lie in a $(p - q)$-dimensional subspace of $p$-dimensional space. This means that there are $q$ separate linear functions of the $p$ original variables having constant values for each of the observations $x_1, x_2, \ldots, x_n$. Ideally, constant relationships between the variables should be detected before doing a PCA, and the number of variables reduced so as to avoid them. However, prior detection will not always be possible, and the zero-variance PCs will enable any unsuspected constant relationships to be detected.
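
The link between zero eigenvalues and constant linear functions can be checked numerically. The following is a minimal NumPy sketch, not taken from the text: the function name `constant_relationships`, the relative tolerance `tol`, and the simulated data (four variables constrained to sum to 1, plus one unrelated variable) are all assumptions made for illustration.

```python
import numpy as np


def constant_relationships(X, tol=1e-10):
    """Eigen-decompose the sample covariance matrix of X (n x p).

    Returns all eigenvalues (ascending) and the eigenvectors whose
    eigenvalues are numerically zero relative to the largest one.
    `tol` is an illustrative threshold, not a recommended value.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)             # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # symmetric solver, ascending order
    is_zero = eigvals <= tol * eigvals.max()
    return eigvals, eigvecs[:, is_zero]


# Simulated data (hypothetical): 36 observations on p = 5 variables.
rng = np.random.default_rng(0)
n = 36
parts = rng.dirichlet(alpha=[2.0, 2.0, 2.0, 2.0], size=n)  # each row sums to 1
extra = rng.normal(size=(n, 1))                            # unrelated variable
X = np.hstack([parts, extra])

eigvals, null_vecs = constant_relationships(X)
q = null_vecs.shape[1]          # number of zero-variance PCs found

# Each zero-variance PC is a linear function that is constant over the sample.
scores = X @ null_vecs
spread = np.ptp(scores, axis=0)  # max minus min of each function, per column

print("eigenvalues:", eigvals)
print("number of zero-variance PCs (q):", q)
print("coefficients of the constant linear function(s):")
print(null_vecs)
print("range of each function over the sample:", spread)
```

In this sketch the single zero-variance PC has (up to sign and normalization) equal coefficients on the four constrained variables and a coefficient of essentially zero on the fifth, which recovers the constraint built into the simulated data. A correlation matrix could be used instead by passing `np.corrcoef(X, rowvar=False)` in place of the covariance matrix.
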
Similarly, PCs with very small, but non-zero, variances will define near-constant linear relationships. Finding such near-constant relationships may be of considerable interest. In addition, low-variance PCs have a number of more specific potential uses, as will be discussed at the end of Section 3.7 and in Sections 6.3, 8.4, 8.6 and 10.1.

3.4.1 Example

Here we consider a second set of blood chemistry data, this time consisting of 16 variables measured on 36 patients. In fact, these observations and those discussed in the previous section are both subsets of the same larger data set. In the present subset, four of the variables, $x_1, x_2, x_3, x_4$, sum to 1.00 for 35 patients and to 0.99 for the remaining patient, so that $x_1 + x_2 + x_3 + x_4$ is nearly constant. The last (sixteenth) PC for the correlation matrix has variance less than 0.001, much smaller than the fifteenth, and is (rounding coefficients to the nearest 0.1) $0.7x^*_1 + 0.3x^*_2 + 0.7x^*_3 + 0.1x^*_4$, with all of the other 12 variables having negligible coefficients. Thus, the
