Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)


3.1. Optimal Algebraic Properties of Sample Principal Components

where

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij}, \qquad j = 1, 2, \ldots, p.$$

The matrix $S$ can therefore be written as

$$S = \frac{1}{n-1} X'X, \tag{3.1.1}$$

where $X$ is an $(n \times p)$ matrix with $(i, j)$th element $(\tilde{x}_{ij} - \bar{x}_j)$; the representation (3.1.1) will be very useful in this and subsequent chapters. The notation $x_{ij}$ will be used to denote the $(i, j)$th element of $X$, so that $x_{ij}$ is the value of the $j$th variable measured about its mean $\bar{x}_j$ for the $i$th observation. A final notational point is that it will be convenient to define the matrix of PC scores as

$$Z = XA, \tag{3.1.2}$$

rather than as it was in the earlier definition. These PC scores will have exactly the same variances and covariances as those given by $\tilde{Z}$, but will have zero means, rather than means $\bar{z}_k$, $k = 1, 2, \ldots, p$.

Another point to note is that the eigenvectors of $\frac{1}{n-1} X'X$ and $X'X$ are identical, and the eigenvalues of $\frac{1}{n-1} X'X$ are simply $\frac{1}{n-1}$ times the eigenvalues of $X'X$. Because of these relationships it will be convenient in some places below to work in terms of eigenvalues and eigenvectors of $X'X$, rather than directly with those of $S$.

Turning to the algebraic properties A1–A5 listed in Section 2.1, define

$$y_i = B'x_i \quad \text{for } i = 1, 2, \ldots, n, \tag{3.1.3}$$

where $B$, as in Properties A1, A2, A4, A5, is a $(p \times q)$ matrix whose columns are orthonormal. Then Properties A1, A2, A4, A5 still hold, but with the sample covariance matrix of the observations $y_i$, $i = 1, 2, \ldots, n$, replacing $\Sigma_y$, and with the matrix $A$ now defined as having $k$th column $a_k$, with $A_q$, $A^*_q$, respectively, representing its first and last $q$ columns. Proofs in all cases are similar to those for populations, after making appropriate substitutions of sample quantities in place of population quantities, and will not be repeated. Property A5 reappears as Property G3 in the next section and a proof will be given there.

The spectral decomposition, Property A3, also holds for samples in the form

$$S = l_1 a_1 a'_1 + l_2 a_2 a'_2 + \cdots + l_p a_p a'_p. \tag{3.1.4}$$

The statistical implications of this expression, and the other algebraic properties, A1, A2, A4, A5, are virtually the same as for the corresponding population properties in Section 2.1, except that they must now be viewed in a sample context.
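Relations (3.1.1)–(3.1.4) are easy to verify numerically. The following is a minimal sketch, not from the book: it assumes NumPy is available, and its variable names simply mirror the notation of this section. It centres a random data matrix, forms $S$ and the PC scores $Z = XA$, and checks the zero means, the variances $l_k$, the eigenvalue scaling between $S$ and $X'X$, and the spectral decomposition (3.1.4).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X_tilde = rng.normal(size=(n, p))           # raw data matrix, elements x~_ij
X = X_tilde - X_tilde.mean(axis=0)          # column-centred matrix, elements x_ij

S = X.T @ X / (n - 1)                       # sample covariance matrix, eq. (3.1.1)

# Eigendecomposition of S; sort so that l_1 >= l_2 >= ... >= l_p.
l, A = np.linalg.eigh(S)
order = np.argsort(l)[::-1]
l, A = l[order], A[:, order]

Z = X @ A                                   # PC scores, eq. (3.1.2)
assert np.allclose(Z.mean(axis=0), 0)       # scores have zero means
assert np.allclose(Z.T @ Z / (n - 1), np.diag(l))  # variances l_k, zero covariances

# Eigenvalues of X'X are (n - 1) times those of S; eigenvectors coincide.
l_gram = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
assert np.allclose(l_gram, (n - 1) * l)

# Spectral decomposition, eq. (3.1.4): S = sum_k l_k a_k a_k'.
S_rebuilt = sum(l[k] * np.outer(A[:, k], A[:, k]) for k in range(p))
assert np.allclose(S, S_rebuilt)
```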

In the case of sample correlation matrices, one further reason can be put forward for interest in the last few PCs, as found by Property A2. Raveh (1985) argues that the inverse $R^{-1}$ of a correlation matrix is of greater interest in some situations than $R$. It may then be more important to approximate $R^{-1}$ than $R$ in a few dimensions. If this is done using the spectral decomposition (Property A3) of $R^{-1}$, then the first few terms will correspond to the last few PCs, since the eigenvectors of $R$ and $R^{-1}$ are the same, except that their order is reversed. The rôle of the last few PCs will be discussed further in Sections 3.4 and 3.7, and again in Sections 6.3, 8.4, 8.6 and 10.1.

One further property, which is concerned with the use of principal components in regression, will now be discussed. Standard terminology from regression is used and will not be explained in detail (see, for example, Draper and Smith (1998)). An extensive discussion of the use of principal components in regression is given in Chapter 8.

Property A7. Suppose now that $X$, defined as above, consists of $n$ observations on $p$ predictor variables $x$ measured about their sample means, and that the corresponding regression equation is

$$y = X\beta + \epsilon, \tag{3.1.5}$$

where $y$ is the vector of $n$ observations on the dependent variable, again measured about the sample mean. (The notation $y$ for the dependent variable has no connection with the usage of $y$ elsewhere in the chapter, but is standard in regression.) Suppose that $X$ is transformed by the equation $Z = XB$, where $B$ is a $(p \times p)$ orthogonal matrix. The regression equation can then be rewritten as

$$y = Z\gamma + \epsilon,$$

where $\gamma = B^{-1}\beta$. The usual least squares estimator for $\gamma$ is $\hat{\gamma} = (Z'Z)^{-1}Z'y$. Then the elements of $\hat{\gamma}$ have, successively, the smallest possible variances if $B = A$, the matrix whose $k$th column is the $k$th eigenvector of $X'X$, and hence the $k$th eigenvector of $S$. Thus $Z$ consists of values of the sample principal components for $x$.

Proof. From standard results in regression (Draper and Smith, 1998, Section 5.2) the covariance matrix of the least squares estimator $\hat{\gamma}$ is proportional to

$$(Z'Z)^{-1} = (B'X'XB)^{-1} = B^{-1}(X'X)^{-1}(B')^{-1} = B'(X'X)^{-1}B,$$

as $B$ is orthogonal. We require $\operatorname{tr}(B'_q (X'X)^{-1} B_q)$, $q = 1, 2, \ldots, p$, to be minimized, where $B_q$ consists of the first $q$ columns of $B$. But, replacing $\Sigma_y$ by $(X'X)^{-1}$ in Property A2 of Section 2.1 shows that $B_q$ must consist of the last $q$ columns of a matrix whose $k$th column is the $k$th eigenvector of $(X'X)^{-1}$. Since the eigenvectors of $(X'X)^{-1}$ are those of $X'X$ with their order reversed, $B_q$ must therefore have as its columns the first $q$ eigenvectors of $X'X$, and as this holds for each $q = 1, 2, \ldots, p$, the result follows.
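Property A7 can likewise be illustrated numerically. The sketch below is my own illustration under the same NumPy assumption, not code from the book: it compares $\operatorname{tr}(B'_q (X'X)^{-1} B_q)$ for $B = A$ against the same quantity for a random orthogonal $B$; the principal component choice yields the smaller value, and hence the smaller total variance for the first $q$ elements of $\hat{\gamma}$, for every $q$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                          # predictors about their sample means

G_inv = np.linalg.inv(X.T @ X)               # (X'X)^{-1}

# B = A: eigenvectors of X'X, ordered by decreasing eigenvalue.
evals, A = np.linalg.eigh(X.T @ X)
A = A[:, np.argsort(evals)[::-1]]

# A random orthogonal competitor B (QR decomposition of a Gaussian matrix).
B, _ = np.linalg.qr(rng.normal(size=(p, p)))

for q in range(1, p + 1):
    var_pc = np.trace(A[:, :q].T @ G_inv @ A[:, :q])
    var_other = np.trace(B[:, :q].T @ G_inv @ B[:, :q])
    print(q, var_pc <= var_other + 1e-12)    # prints True for every q
```

With $B = A$ the trace reduces to the sum of the reciprocals of the $q$ largest eigenvalues of $X'X$, which is the smallest value attainable over matrices $B_q$ with orthonormal columns.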
