Jolliffe, I. Principal Component Analysis, 2nd ed. Springer, 2002.

2.2. Geometric Properties of Population Principal Components

for most other properties of PCs no distributional assumptions are required. However, the property will be discussed further in connection with Property G5 in Section 3.2, where we see that it has some relevance even without the assumption of multivariate normality. Property G5 looks at the sample version of the ellipsoids $x'\Sigma x = \text{const}$. Because $\Sigma$ and $\Sigma^{-1}$ share the same eigenvectors, it follows that the principal axes of the ellipsoids $x'\Sigma x = \text{const}$ are the same as those of $x'\Sigma^{-1} x = \text{const}$, except that their order is reversed (this eigenvector relationship is checked numerically in the first sketch following the proof below).

We digress slightly here to note that some authors imply, or even state explicitly, as do Qian et al. (1994), that PCA needs multivariate normality. This text takes a very different view and considers PCA as a mainly descriptive technique. It will become apparent that many of the properties and applications of PCA and related techniques described in later chapters, as well as the properties discussed in the present chapter, have no need for explicit distributional assumptions. It cannot be disputed that linearity and covariances/correlations, both of which play a central rôle in PCA, have especial relevance when distributions are multivariate normal, but this does not detract from the usefulness of PCA when data have other forms. Qian et al. (1994) describe what might be considered an additional property of PCA, based on minimum description length or stochastic complexity (Rissanen and Yu, 2000), but as they use it to define a somewhat different technique, we defer discussion to Section 14.4.

Property G2. Suppose that $x_1$, $x_2$ are independent random vectors, both having the same probability distribution, and that $x_1$, $x_2$ are both subjected to the same linear transformation
$$ y_i = B' x_i, \quad i = 1, 2. $$
If $B$ is a $(p \times q)$ matrix with orthonormal columns chosen to maximize $E[(y_1 - y_2)'(y_1 - y_2)]$, then $B = A_q$, using the same notation as before.

Proof. This result could be viewed as a purely algebraic property, and, indeed, the proof below is algebraic. The property is, however, included in the present section because it has a geometric interpretation. This is that the expected squared Euclidean distance, in a $q$-dimensional subspace, between two vectors of $p$ random variables with the same distribution, is made as large as possible if the subspace is defined by the first $q$ PCs.

To prove Property G2, first note that $x_1$, $x_2$ have the same mean $\mu$ and covariance matrix $\Sigma$. Hence $y_1$, $y_2$ also have the same mean and covariance matrix, $B'\mu$, $B'\Sigma B$ respectively. Then
$$
\begin{aligned}
E[(y_1 - y_2)'(y_1 - y_2)]
&= E\{[(y_1 - B'\mu) - (y_2 - B'\mu)]'[(y_1 - B'\mu) - (y_2 - B'\mu)]\} \\
&= E[(y_1 - B'\mu)'(y_1 - B'\mu)] + E[(y_2 - B'\mu)'(y_2 - B'\mu)],
\end{aligned}
$$
the cross-product terms vanishing because $y_1$ and $y_2$ are independent with common mean $B'\mu$.
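The claim above, that $\Sigma$ and $\Sigma^{-1}$ share eigenvectors with the order of the eigenvalues reversed, is easy to verify numerically. The following is a minimal sketch, not from the text: the matrix Sigma and all variable names are illustrative, and it relies only on NumPy's eigh, which returns eigenvalues in ascending order.

```python
import numpy as np

# Minimal sketch: Sigma is an arbitrary illustrative covariance matrix,
# not one taken from the text.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # symmetric positive definite

vals, vecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
inv_vals, inv_vecs = np.linalg.eigh(np.linalg.inv(Sigma))

# Eigenvalues of Sigma^{-1} are reciprocals of those of Sigma, so ascending
# order for Sigma^{-1} corresponds to descending order for Sigma.
assert np.allclose(inv_vals, 1.0 / vals[::-1])

# After reversing one ordering, the eigenvectors agree up to sign, so the
# two families of ellipsoids share the same principal axes.
for j in range(4):
    assert np.isclose(abs(vecs[:, ::-1][:, j] @ inv_vecs[:, j]), 1.0)
print("Same principal axes; eigenvalue order reversed.")
```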
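For Property G2 itself, each term in the final display equals $\mathrm{tr}(B'\Sigma B)$, so the criterion reduces to $E[(y_1 - y_2)'(y_1 - y_2)] = 2\,\mathrm{tr}(B'\Sigma B)$. The sketch below (again illustrative, not from the text; Sigma, p, q are arbitrary choices) compares this criterion at $B = A_q$ with its value at randomly drawn orthonormal bases; $A_q$ should never be beaten.

```python
import numpy as np

# Hedged sketch of Property G2: among (p x q) matrices B with orthonormal
# columns, E[(y1 - y2)'(y1 - y2)] = 2 tr(B' Sigma B) is maximized at B = A_q,
# whose columns are the eigenvectors of the q largest eigenvalues of Sigma.
rng = np.random.default_rng(1)
p, q = 6, 2
M = rng.standard_normal((p, p))
Sigma = M @ M.T                          # illustrative covariance matrix

vals, vecs = np.linalg.eigh(Sigma)
A_q = vecs[:, ::-1][:, :q]               # first q principal axes

def expected_sq_dist(B):
    # E[(y1 - y2)'(y1 - y2)] for independent x1, x2 with covariance Sigma
    return 2.0 * np.trace(B.T @ Sigma @ B)

best = expected_sq_dist(A_q)
for _ in range(1000):
    # Random orthonormal q-column basis via reduced QR decomposition.
    B, _ = np.linalg.qr(rng.standard_normal((p, q)))
    assert expected_sq_dist(B) <= best + 1e-9
print(f"2 tr(A_q' Sigma A_q) = {best:.4f} is never exceeded.")
```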
