Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

3.2. Geometric Properties of Sample Principal Components

... transformation for which $B = A_q$ minimizes the distortion in the configuration as measured by $\|YY' - XX'\|$, where $\|\cdot\|$ denotes Euclidean norm and $Y$ is a matrix with $(i, j)$th element $\tilde{y}_{ij} - \bar{y}_j$.

Proof. $Y = XB$, so $YY' = XBB'X'$ and $\|YY' - XX'\| = \|XBB'X' - XX'\|$. A matrix result given by Rao (1973, p. 63) states that if $F$ is a symmetric matrix of rank $p$ with spectral decomposition

$$F = f_1 \phi_1 \phi_1' + f_2 \phi_2 \phi_2' + \cdots + f_p \phi_p \phi_p',$$

and $G$ is a matrix of rank $q$

3. Properties of Sample Principal Components

$$
\begin{aligned}
&= \sum_{k=q+1}^{p} l_k \, \|a_k a_k'\| \\
&= \sum_{k=q+1}^{p} l_k \left[ \sum_{i=1}^{p} \sum_{j=1}^{p} (a_{ki} a_{kj})^2 \right] \\
&= \sum_{k=q+1}^{p} l_k \left[ \sum_{i=1}^{p} a_{ki}^2 \sum_{j=1}^{p} a_{kj}^2 \right] \\
&= \sum_{k=q+1}^{p} l_k,
\end{aligned}
$$

as $a_k' a_k = 1$, $k = 1, 2, \ldots, p$.

Property G4 is very similar to another optimality property of PCs, discussed in terms of the so-called RV-coefficient by Robert and Escoufier (1976). The RV-coefficient was introduced as a measure of the similarity between two configurations of $n$ data points, as described by $XX'$ and $YY'$. The distance between the two configurations is defined by Robert and Escoufier (1976) as

$$\left\| \frac{XX'}{\{\mathrm{tr}(XX')^2\}^{1/2}} - \frac{YY'}{\{\mathrm{tr}(YY')^2\}^{1/2}} \right\|, \tag{3.2.1}$$

where the divisors of $XX'$, $YY'$ are introduced simply to standardize the representation of each configuration in the sense that

$$\left\| \frac{XX'}{\{\mathrm{tr}(XX')^2\}^{1/2}} \right\| = \left\| \frac{YY'}{\{\mathrm{tr}(YY')^2\}^{1/2}} \right\| = 1.$$

It can then be shown that (3.2.1) equals $[2(1 - \mathrm{RV}(X, Y))]^{1/2}$, where the RV-coefficient is defined as

$$\mathrm{RV}(X, Y) = \frac{\mathrm{tr}(XY'YX')}{\{\mathrm{tr}(XX')^2 \, \mathrm{tr}(YY')^2\}^{1/2}}. \tag{3.2.2}$$

Thus, minimizing the distance measure (3.2.1) which, apart from standardizations, is the same as the criterion of Property G4, is equivalent to maximization of $\mathrm{RV}(X, Y)$. Robert and Escoufier (1976) show that several multivariate techniques can be expressed in terms of maximizing $\mathrm{RV}(X, Y)$ for some definition of $X$ and $Y$. In particular, if $Y$ is restricted to be of the form $Y = XB$, where $B$ is a $(p \times q)$ matrix such that the columns of $Y$ are uncorrelated, then maximization of $\mathrm{RV}(X, Y)$ leads to $B = A_q$, that is $Y$ consists of scores on the first $q$ PCs. We will meet the RV-coefficient again in Chapter 6 in the context of variable selection.

Property G5. The algebraic derivation of sample PCs reduces to finding, successively, vectors $a_k$, $k = 1, 2, \ldots, p$, that maximize $a_k' S a_k$ subject to $a_k' a_k = 1$, and subject to $a_k' a_l = 0$ for $l$
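The relationship between the distance (3.2.1) and the RV-coefficient (3.2.2) can be checked numerically. The following is a minimal sketch in plain Python, not taken from the text: the helper names (`configuration`, `rv_coefficient`, `configuration_distance`) and the random test data are illustrative assumptions. The numerator of (3.2.2) is computed here as $\mathrm{tr}(XX'YY')$, which is the cross term obtained when (3.2.1) is squared and expanded; treating this as the intended reading of the numerator when $X$ and $Y$ have different numbers of columns is also an assumption. The matrix $B$ is arbitrary rather than $A_q$, since the identity should hold for any $Y$.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def frobenius_norm(a):
    # Euclidean norm of a matrix: square root of the sum of squared elements.
    return math.sqrt(sum(v * v for row in a for v in row))


def center_columns(a):
    # Subtract each column mean, so that X holds deviations from the mean.
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]
    return [[v - m for v, m in zip(row, means)] for row in a]


def configuration(x):
    # The (n x n) configuration matrix XX'.
    return matmul(x, transpose(x))


def rv_coefficient(x, y):
    cx, cy = configuration(x), configuration(y)
    numerator = trace(matmul(cx, cy))  # tr(XX'YY'), assumed reading of (3.2.2)
    denominator = math.sqrt(trace(matmul(cx, cx)) * trace(matmul(cy, cy)))
    return numerator / denominator


def configuration_distance(x, y):
    # Distance (3.2.1) between the two standardized configurations.
    cx, cy = configuration(x), configuration(y)
    sx = math.sqrt(trace(matmul(cx, cx)))
    sy = math.sqrt(trace(matmul(cy, cy)))
    diff = [[u / sx - v / sy for u, v in zip(rx, ry)] for rx, ry in zip(cx, cy)]
    return frobenius_norm(diff)


# Hypothetical data: n = 8 observations on p = 4 variables, reduced to q = 2.
random.seed(0)
n, p, q = 8, 4, 2
X = center_columns([[random.gauss(0, 1) for _ in range(p)] for _ in range(n)])
B = [[random.gauss(0, 1) for _ in range(q)] for _ in range(p)]
Y = matmul(X, B)

rv = rv_coefficient(X, Y)
dist = configuration_distance(X, Y)
# The two printed distances should agree up to rounding error.
print(rv, dist, math.sqrt(max(0.0, 2.0 * (1.0 - rv))))
```

Because the sketch avoids an eigendecomposition, it verifies only the identity between (3.2.1) and (3.2.2), not the claim that $B = A_q$ maximizes $\mathrm{RV}(X, Y)$.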

