Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
3.1. Optimal Algebraic Properties of Sample Principal Components

where

    \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij}, \qquad j = 1, 2, \ldots, p.

The matrix S can therefore be written as

    S = \frac{1}{n-1} X'X,    (3.1.1)

where X is an (n × p) matrix with (i, j)th element (\tilde{x}_{ij} - \bar{x}_j); the representation (3.1.1) will be very useful in this and subsequent chapters. The notation x_{ij} will be used to denote the (i, j)th element of X, so that x_{ij} is the value of the jth variable, measured about its mean \bar{x}_j, for the ith observation. A final notational point is that it will be convenient to define the matrix of PC scores as

    Z = XA,    (3.1.2)

rather than as it was in the earlier definition. These PC scores will have exactly the same variances and covariances as those given by \tilde{Z}, but will have zero means, rather than means \bar{z}_k, k = 1, 2, \ldots, p.

Another point to note is that the eigenvectors of \frac{1}{n-1} X'X and X'X are identical, and the eigenvalues of \frac{1}{n-1} X'X are simply \frac{1}{n-1} times the eigenvalues of X'X. Because of these relationships it will be convenient in some places below to work in terms of eigenvalues and eigenvectors of X'X, rather than directly with those of S.

Turning to the algebraic properties A1–A5 listed in Section 2.1, define

    y_i = B'x_i \qquad \text{for } i = 1, 2, \ldots, n,    (3.1.3)

where B, as in Properties A1, A2, A4, A5, is a (p × q) matrix whose columns are orthonormal. Then Properties A1, A2, A4, A5 still hold, but with the sample covariance matrix of the observations y_i, i = 1, 2, \ldots, n, replacing \Sigma_y, and with the matrix A now defined as having kth column a_k, with A_q, A_q^*, respectively, representing its first and last q columns. Proofs in all cases are similar to those for populations, after making appropriate substitutions of sample quantities in place of population quantities, and will not be repeated. Property A5 reappears as Property G3 in the next section, and a proof will be given there.

The spectral decomposition, Property A3, also holds for samples, in the form

    S = l_1 a_1 a_1' + l_2 a_2 a_2' + \cdots + l_p a_p a_p'.    (3.1.4)

The statistical implications of this expression, and of the other algebraic properties A1, A2, A4, A5, are virtually the same as for the corresponding population properties in Section 2.1, except that they must now be viewed in a sample context.
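The relationships above are easy to check numerically. The following NumPy sketch (with an arbitrary simulated data matrix; all variable names are illustrative) verifies that S and X'X share eigenvectors with eigenvalues scaled by 1/(n-1), that the spectral decomposition (3.1.4) reconstructs S exactly, and that the PC scores Z = XA in (3.1.2) have zero column means:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X_raw = rng.normal(size=(n, p))
X = X_raw - X_raw.mean(axis=0)          # column-centred data matrix

S = X.T @ X / (n - 1)                   # sample covariance, eq. (3.1.1)

# X'X and S share eigenvectors; eigenvalues differ by the factor 1/(n-1)
evals_XtX, A = np.linalg.eigh(X.T @ X)
evals_S, _ = np.linalg.eigh(S)
assert np.allclose(evals_S, evals_XtX / (n - 1))

# Spectral decomposition (3.1.4): S = sum_k l_k a_k a_k'
l = evals_S
S_rebuilt = sum(l[k] * np.outer(A[:, k], A[:, k]) for k in range(p))
assert np.allclose(S_rebuilt, S)

# PC scores (3.1.2): Z = XA has zero column means, since X is centred
Z = X @ A
assert np.allclose(Z.mean(axis=0), 0)
```

Working with `X.T @ X` rather than S, as the text suggests, avoids repeated division by n - 1 while leaving the eigenvectors unchanged.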
In the case of sample correlation matrices, one further reason can be put forward for interest in the last few PCs, as found by Property A2. Raveh (1985) argues that the inverse R^{-1} of a correlation matrix is of greater interest in some situations than R. It may then be more important to approximate R^{-1} than R in a few dimensions. If this is done using the spectral decomposition (Property A3) of R^{-1}, then the first few terms will correspond to the last few PCs, since the eigenvectors of R and R^{-1} are the same, except that their order is reversed. The rôle of the last few PCs will be discussed further in Sections 3.4 and 3.7, and again in Sections 6.3, 8.4, 8.6 and 10.1.

One further property, which is concerned with the use of principal components in regression, will now be discussed. Standard terminology from regression is used and will not be explained in detail (see, for example, Draper and Smith (1998)). An extensive discussion of the use of principal components in regression is given in Chapter 8.

Property A7. Suppose now that X, defined as above, consists of n observations on p predictor variables x measured about their sample means, and that the corresponding regression equation is

    y = X\beta + \epsilon,    (3.1.5)

where y is the vector of n observations on the dependent variable, again measured about the sample mean. (The notation y for the dependent variable has no connection with the usage of y elsewhere in the chapter, but is standard in regression.) Suppose that X is transformed by the equation Z = XB, where B is a (p × p) orthogonal matrix. The regression equation can then be rewritten as

    y = Z\gamma + \epsilon,

where \gamma = B^{-1}\beta. The usual least squares estimator for \gamma is \hat{\gamma} = (Z'Z)^{-1} Z'y. Then the elements of \hat{\gamma} have, successively, the smallest possible variances if B = A, the matrix whose kth column is the kth eigenvector of X'X, and hence the kth eigenvector of S. Thus Z consists of values of the sample principal components for x.

Proof.
From standard results in regression (Draper and Smith, 1998, Section 5.2), the covariance matrix of the least squares estimator \hat{\gamma} is proportional to

    (Z'Z)^{-1} = (B'X'XB)^{-1} = B^{-1}(X'X)^{-1}(B')^{-1} = B'(X'X)^{-1}B,

as B is orthogonal. We require tr(B_q'(X'X)^{-1}B_q), q = 1, 2, \ldots, p, to be minimized, where B_q consists of the first q columns of B. But, replacing \Sigma_y by (X'X)^{-1} in Property A2 of Section 2.1 shows that B_q must consist of
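The key computation in Property A7 can also be illustrated numerically: with B = A, the matrix Z'Z is diagonal with the eigenvalues l_k of X'X on its diagonal, so the variances of the elements of \hat{\gamma} are proportional to 1/l_k and increase down the ordering. The sketch below uses simulated data (names and the chosen \beta are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                   # predictors centred, as in (3.1.5)

# B = A: eigenvectors of X'X, ordered by decreasing eigenvalue
evals, A = np.linalg.eigh(X.T @ X)       # eigh returns ascending order
order = np.argsort(evals)[::-1]
evals, A = evals[order], A[:, order]

Z = X @ A                                # sample PC scores
ZtZ = Z.T @ Z

# With B = A, Z'Z = diag(l_1, ..., l_p), so (Z'Z)^{-1} is diagonal and
# var(gamma_hat_k) is proportional to 1/l_k, increasing in k
assert np.allclose(ZtZ, np.diag(evals))
var_gamma = 1.0 / evals
assert np.all(np.diff(var_gamma) >= 0)

# Least squares for gamma: since A is orthogonal, gamma = A' beta
beta = np.array([0.5, -1.0, 2.0])
y = X @ beta + rng.normal(scale=0.1, size=n)
y = y - y.mean()
gamma_hat = np.linalg.solve(ZtZ, Z.T @ y)
assert np.allclose(gamma_hat, A.T @ beta, atol=0.1)
```

Because Z'Z is diagonal, the least squares problem for \gamma decouples into p independent scalar regressions, one per principal component.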