12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


3. Properties of Sample Principal Components

spherical contours of equal probability, assuming multivariate normality, and the last $(p - q)$ PCs are therefore not individually uniquely defined. By testing $H_{0q}$ for various values of $q$ it can be decided how many PCs are distinguishable from ‘noise’ and are therefore worth retaining. This idea for deciding how many components to retain will be discussed critically in Section 6.1.4. It is particularly relevant if a model similar to those described in Section 3.9 is assumed for the data.

A test statistic for $H_{0q}$ against a general alternative $H_{1q}$ can be found by assuming multivariate normality and constructing a likelihood ratio (LR) test. The test statistic takes the form

$$Q = \left\{ \prod_{k=q+1}^{p} l_k \Bigg/ \left[ \sum_{k=q+1}^{p} l_k / (p - q) \right]^{p-q} \right\}^{n/2}.$$

The exact distribution of $Q$ is complicated, but we can use the well-known general result from statistical inference concerning LR tests, namely that $-2\ln(Q)$ has, approximately, a $\chi^2$ distribution with degrees of freedom equal to the difference between the number of independently varying parameters under $H_{0q} \cup H_{1q}$ and under $H_{0q}$. Calculating the number of degrees of freedom is non-trivial (Mardia et al., 1979, p. 235), but it turns out to be $\nu = \frac{1}{2}(p - q + 2)(p - q - 1)$, so that approximately, under $H_{0q}$,

$$n \left[ (p - q)\ln(\bar{l}) - \sum_{k=q+1}^{p} \ln(l_k) \right] \sim \chi^2_{\nu}, \qquad (3.7.6)$$

where

$$\bar{l} = \sum_{k=q+1}^{p} \frac{l_k}{p - q}.$$

In fact, the approximation can be improved if $n$ is replaced by $n' = n - (2p + 11)/6$, so $H_{0q}$ is rejected at significance level $\alpha$ if

$$n' \left[ (p - q)\ln(\bar{l}) - \sum_{k=q+1}^{p} \ln(l_k) \right] \geq \chi^2_{\nu;\alpha}.$$

Another, more complicated, improvement to the approximation is given by Srivastava and Khatri (1979, p. 294). The test is easily adapted so that the null hypothesis defines equality of any subset of $(p - q)$ consecutive eigenvalues, not necessarily the smallest (Flury, 1997, Section 8.6). Another modification is to test whether the last $(p - q)$ eigenvalues follow a linear trend (Bentler and Yuan, 1998).
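As an illustration of the likelihood ratio test for equality of the last (p − q) eigenvalues, the following is a minimal Python sketch of the statistic in (3.7.6) and its degrees of freedom ν. The function name `equal_tail_eigenvalue_stat`, its arguments, and the example eigenvalues are hypothetical and do not come from the text. The sketch assumes the eigenvalues l_1, ..., l_p are supplied in decreasing order and are strictly positive, and it leaves the comparison with the critical value χ²(ν; α) to the reader, who would look that value up in tables or obtain it from a statistics library.

```python
import math


def equal_tail_eigenvalue_stat(eigenvalues, q, n, corrected=True):
    """Return (statistic, nu) for H_0q: the last p - q eigenvalues are equal.

    eigenvalues : sample eigenvalues l_1 >= ... >= l_p (assumed sorted, positive)
    q           : number of leading eigenvalues left unconstrained
    n           : the quantity n appearing in (3.7.6)
    corrected   : if True, replace n by n' = n - (2p + 11)/6
    """
    p = len(eigenvalues)
    m = p - q  # number of eigenvalues hypothesised to be equal
    if m < 2:
        raise ValueError("need p - q >= 2 eigenvalues under test")
    tail = eigenvalues[q:]
    if min(tail) <= 0:
        raise ValueError("eigenvalues under test must be strictly positive")

    # l-bar: arithmetic mean of the last p - q eigenvalues
    l_bar = sum(tail) / m
    # (p - q) ln(l-bar) - sum of ln(l_k), k = q+1..p
    bracket = m * math.log(l_bar) - sum(math.log(l_k) for l_k in tail)

    scale = n - (2 * p + 11) / 6 if corrected else n
    # nu = (p - q + 2)(p - q - 1) / 2; the product is always even
    nu = (m + 2) * (m - 1) // 2
    return scale * bracket, nu


# Hypothetical eigenvalues, for illustration only.
stat, nu = equal_tail_eigenvalue_stat([4.0, 2.0, 1.0], q=1, n=100)
print(stat, nu)
```

Under this sketch, H_0q would be rejected at level α when the returned statistic is at least the upper-α point of the χ² distribution with `nu` degrees of freedom. The bracketed term is zero exactly when the tested eigenvalues coincide and grows as they spread apart.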
The relevance of this linear-trend null hypothesis will be discussed in Section 6.1.4.

A special case of the test of the null hypothesis $H_{0q}$ occurs when $q = 0$, in which case $H_{0q}$ is equivalent to all the variables being independent and
