Jolliffe, I. T. Principal Component Analysis, 2nd edition. Springer, 2002.
3 Mathematical and Statistical Properties of Sample Principal Components

The first part of this chapter is similar in structure to Chapter 2, except that it deals with properties of PCs obtained from a sample covariance (or correlation) matrix, rather than from a population covariance (or correlation) matrix. The first two sections of the chapter, as in Chapter 2, describe, respectively, many of the algebraic and geometric properties of PCs. Most of the properties discussed in Chapter 2 are almost the same for samples as for populations. They will be mentioned again, but only briefly. There are, in addition, some properties that are relevant only to sample PCs, and these will be discussed more fully.

The third and fourth sections of the chapter again mirror those of Chapter 2. The third section discusses, with an example, the choice between correlation and covariance matrices, while the fourth section looks at the implications of equal and/or zero variances among the PCs, and illustrates the potential usefulness of the last few PCs in detecting near-constant relationships between the variables.

The last five sections of the chapter cover material having no counterpart in Chapter 2. Section 3.5 discusses the singular value decomposition, which could have been included in Section 3.1 as an additional algebraic property. However, the topic is sufficiently important to warrant its own section, as it provides a useful alternative approach to some of the theory surrounding PCs, and also gives an efficient practical method for actually computing PCs.

The sixth section looks at the probability distributions of the coefficients and variances of a set of sample PCs, in other words, the probability distributions of the eigenvectors and eigenvalues of a sample covariance matrix.
The seventh section then goes on to show how these distributions may be used to make statistical inferences about the population PCs, based on sample PCs.

Section 3.8 demonstrates how the approximate structure and variances of PCs can sometimes be deduced from patterns in the covariance or correlation matrix. Finally, in Section 3.9 we discuss models that have been proposed for PCA. The material could equally well have been included in Chapter 2, but because the idea of maximum likelihood estimation arises in some of the models we include it in the present chapter.

3.1 Optimal Algebraic Properties of Sample Principal Components

Before looking at the properties themselves, we need to establish some notation. Suppose that we have $n$ independent observations on the $p$-element random vector $\mathbf{x}$; denote these $n$ observations by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Let $\tilde{z}_{i1} = \mathbf{a}_1'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose the vector of coefficients $\mathbf{a}_1$ to maximize the sample variance
$$\frac{1}{n-1}\sum_{i=1}^{n}(\tilde{z}_{i1} - \bar{z}_1)^2$$
subject to the normalization constraint $\mathbf{a}_1'\mathbf{a}_1 = 1$. Next let $\tilde{z}_{i2} = \mathbf{a}_2'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose $\mathbf{a}_2$ to maximize the sample variance of the $\tilde{z}_{i2}$ subject to the normalization constraint $\mathbf{a}_2'\mathbf{a}_2 = 1$, and subject also to the $\tilde{z}_{i2}$ being uncorrelated with the $\tilde{z}_{i1}$ in the sample. Continuing this process in an obvious manner, we have a sample version of the definition of PCs given in Section 1.1. Thus $\mathbf{a}_k'\mathbf{x}$ is defined as the $k$th sample PC, $k = 1, 2, \ldots, p$, and $\tilde{z}_{ik}$ is the score for the $i$th observation on the $k$th PC. If the derivation in Section 1.1 is followed through, but with sample variances and covariances replacing population quantities, then it turns out that the sample variance of the PC scores for the $k$th sample PC is $l_k$, the $k$th largest eigenvalue of the sample covariance matrix $\mathbf{S}$ for $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, and $\mathbf{a}_k$ is the corresponding eigenvector, for $k = 1, 2, \ldots, p$.

Define the $(n \times p)$ matrices $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{Z}}$ to have $(i,k)$th elements equal to the value of the $k$th element $\tilde{x}_{ik}$ of $\mathbf{x}_i$, and to $\tilde{z}_{ik}$, respectively. Then $\tilde{\mathbf{Z}}$ and $\tilde{\mathbf{X}}$ are related by $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$, where $\mathbf{A}$ is the $(p \times p)$ orthogonal matrix whose $k$th column is $\mathbf{a}_k$.

If the mean of each element of $\mathbf{x}$ is known to be zero, then $\mathbf{S} = \frac{1}{n}\tilde{\mathbf{X}}'\tilde{\mathbf{X}}$. It is far more usual for the mean of $\mathbf{x}$ to be unknown, and in this case the $(j,k)$th element of $\mathbf{S}$ is
$$\frac{1}{n-1}\sum_{i=1}^{n}(\tilde{x}_{ij} - \bar{x}_j)(\tilde{x}_{ik} - \bar{x}_k),$$
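As a concrete illustration of these results, the following short sketch uses Python with NumPy (a choice made here for illustration, not anything prescribed by the text; the simulated data and variable names are hypothetical). It forms the sample covariance matrix $\mathbf{S}$, extracts its eigenvalues $l_k$ and eigenvectors $\mathbf{a}_k$, computes the scores $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$, and checks that the sample variance of the scores on the $k$th PC equals $l_k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4                                # n observations on a p-element vector x
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # illustrative correlated data

# Sample covariance matrix S with the usual 1/(n-1) divisor (mean of x unknown)
Xc = X - X.mean(axis=0)                      # column means play the role of x-bar_j
S = (Xc.T @ Xc) / (n - 1)

# Eigendecomposition of S: eigenvalues l_k and eigenvectors a_k (columns of A).
# eigh returns eigenvalues in ascending order, so reverse both to get
# l_1 >= l_2 >= ... >= l_p, matching the convention used in the text.
l, A = np.linalg.eigh(S)
l, A = l[::-1], A[:, ::-1]

# PC scores: Z = X A, whose (i, k)th element is the score of observation i on PC k
Z = X @ A

# Sample variance of the scores on the kth PC equals l_k (variances are
# unaffected by the uncentred X, since the score means are subtracted),
# and A is orthogonal: A'A = I.
assert np.allclose(Z.var(axis=0, ddof=1), l)
assert np.allclose(A.T @ A, np.eye(p))
```

Using `eigh` rather than a general eigensolver is natural here because $\mathbf{S}$ is symmetric, which also guarantees real eigenvalues and orthonormal eigenvectors; the same scores could equally be obtained via the singular value decomposition of the centred data matrix, as discussed in Section 3.5.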