Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

2. Properties of Population Principal Components

The cross-product terms disappear because of the independence of $x_1$, $x_2$, and hence of $y_1$, $y_2$. Now, for $i = 1, 2$, we have
$$
\begin{aligned}
E[(y_i - B'\mu)'(y_i - B'\mu)] &= E\{\operatorname{tr}[(y_i - B'\mu)'(y_i - B'\mu)]\} \\
&= E\{\operatorname{tr}[(y_i - B'\mu)(y_i - B'\mu)']\} \\
&= \operatorname{tr}\{E[(y_i - B'\mu)(y_i - B'\mu)']\} \\
&= \operatorname{tr}(B'\Sigma B).
\end{aligned}
$$
But $\operatorname{tr}(B'\Sigma B)$ is maximized when $B = A_q$, from Property A1, and the present criterion has been shown above to be $2\operatorname{tr}(B'\Sigma B)$. Hence Property G2 is proved. ✷

There is a closely related property whose geometric interpretation is more tenuous, namely that, with the same definitions as in Property G2,
$$
\det\{E[(y_1 - y_2)(y_1 - y_2)']\}
$$
is maximized when $B = A_q$ (see McCabe (1984)). This property says that $B = A_q$ makes the generalized variance of $(y_1 - y_2)$ as large as possible. Generalized variance may be viewed as an alternative measure of how far apart $y_1$ and $y_2$ are in $q$-dimensional space, though a less intuitively obvious measure than expected squared Euclidean distance.

Finally, Property G2 can be reversed in the sense that if $E[(y_1 - y_2)'(y_1 - y_2)]$ or $\det\{E[(y_1 - y_2)(y_1 - y_2)']\}$ is to be minimized, then this can be achieved by taking $B = A_q^*$.

The properties given in this section and in the previous one show that covariance matrix PCs satisfy several different optimality criteria, but the list of criteria covered is by no means exhaustive; for example, Devijver and Kittler (1982, Chapter 9) show that the first few PCs minimize representation entropy and the last few PCs minimize population entropy. Diamantaras and Kung (1996, Section 3.4) discuss PCA in terms of maximizing mutual information between $x$ and $y$. Further optimality criteria are given by Hudlet and Johnson (1982), McCabe (1984) and Okamoto (1969). The geometry of PCs is discussed at length by Treasure (1986).

The property of self-consistency is useful in a non-linear extension of PCA (see Section 14.1.2). For two $p$-variate random vectors $x$, $y$, the vector $y$ is self-consistent for $x$ if $E(x \mid y) = y$. Flury (1997, Section 8.4) shows that if $x$ is a $p$-variate random vector with a multivariate normal or elliptical distribution, and $y$ is the orthogonal projection of $x$ onto the $q$-dimensional subspace spanned by the first $q$ PCs for $x$, then $y$ is self-consistent for $x$. Tarpey (1999) uses self-consistency of principal components after linear transformation of the variables to characterize elliptical distributions.
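The trace criterion behind Property G2 can be checked numerically. The sketch below is not from the book; it is a minimal NumPy illustration, with an arbitrary covariance matrix $\Sigma$ generated for the example and $A_q$ taken as the eigenvectors of $\Sigma$ for the $q$ largest eigenvalues. It compares $\operatorname{tr}(B'\Sigma B)$, which is half the expected squared distance $E[(y_1-y_2)'(y_1-y_2)]$, at $B = A_q$ against random orthonormal choices of $B$.

```python
# Minimal numerical sketch (illustrative only): among p x q matrices B with
# orthonormal columns, tr(B' Sigma B) is largest when B = A_q, the matrix of
# eigenvectors of Sigma belonging to the q largest eigenvalues.
import numpy as np

rng = np.random.default_rng(0)   # hypothetical example setup
p, q = 5, 2

# Arbitrary covariance matrix Sigma for illustration.
M = rng.normal(size=(p, p))
Sigma = M @ M.T

# A_q: eigenvectors for the q largest eigenvalues of Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
A_q = eigvecs[:, ::-1][:, :q]

def trace_criterion(B):
    """tr(B' Sigma B), i.e. half the expected squared distance between y1 and y2."""
    return np.trace(B.T @ Sigma @ B)

best = trace_criterion(A_q)

# Random orthonormal competitors never exceed the value attained at A_q.
for _ in range(1000):
    B, _ = np.linalg.qr(rng.normal(size=(p, q)))   # random orthonormal columns
    assert trace_criterion(B) <= best + 1e-10

print("tr(A_q' Sigma A_q) =", best)
print("sum of q largest eigenvalues =", eigvals[::-1][:q].sum())  # equal, as expected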

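The self-consistency property can also be illustrated by simulation. The sketch below is again not from the book: it assumes a zero-mean multivariate normal $x$ with an arbitrary covariance matrix, forms $y = A_q A_q' x$, and, since $(x, y)$ is jointly Gaussian, checks that the best linear predictor of $x$ from $y$ (estimated by least squares) coincides with $y$ itself, which is what $E(x \mid y) = y$ requires in this case. The sample size `n` and seed are arbitrary choices for the illustration.

```python
# Monte Carlo sketch (illustrative only) of self-consistency: for zero-mean
# Gaussian x, the projection y = A_q A_q' x onto the span of the first q PCs
# satisfies E(x | y) = y. For jointly Gaussian (x, y), E(x | y) is the best
# linear predictor of x from y, so the fitted coefficient matrix should
# approach the projector P = A_q A_q'.
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 2, 100_000

M = rng.normal(size=(p, p))
Sigma = M @ M.T                              # arbitrary covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)
A_q = eigvecs[:, ::-1][:, :q]                # first q PC directions
P = A_q @ A_q.T                              # orthogonal projector onto their span

x = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # samples of x
y = x @ P                                                  # y_i = P x_i

# Least-squares (minimum-norm) fit of x on y; y spans only a q-dimensional
# subspace, so lstsq's pseudo-inverse handles the rank deficiency.
C, *_ = np.linalg.lstsq(y, x, rcond=None)
x_hat = y @ C

print("max |C - P|        :", np.abs(C - P).max())       # small, shrinks as n grows
print("max |E(x|y)_hat - y|:", np.abs(x_hat - y).max())  # ~ 0 up to Monte Carlo error
```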