12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

118 6. Choosing a Subset of Principal Components or Variables

[Boot]strap versions of these rules are used by Jackson (1993) and are discussed further in Section 6.1.5. Stauffer et al. (1985) informally compare scree plots from a number of ecological data sets with corresponding plots from random data sets of the same size. They incorporate bootstrap confidence intervals (see Section 6.1.5) but their main interest is in the stability of the eigenvalues (see Section 10.3) rather than the choice of $m$. Preisendorfer and Mobley’s (1988) Rule N, described in Section 6.1.7, also uses ideas similar to parallel analysis.

Turning to the LEV diagram, an example of which is given in Section 6.2.2 below, one of the earliest published descriptions was in Craddock and Flood (1969), although, like the scree graph, it had been used routinely for some time before this. Craddock and Flood argue that, in meteorology, eigenvalues corresponding to ‘noise’ should decay in a geometric progression, and such eigenvalues will therefore appear as a straight line on the LEV diagram. Thus, to decide on how many PCs to retain, we should look for a point beyond which the LEV diagram becomes, approximately, a straight line. This is the same procedure as in Cattell’s interpretation of the scree graph, but the results are different, as we are now plotting $\log(l_k)$ rather than $l_k$. To justify Craddock and Flood’s procedure, Farmer (1971) generated simulated data with various known structures (or no structure). For purely random data, with all variables uncorrelated, Farmer found that the whole of the LEV diagram is approximately a straight line.
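The link between geometric decay and a straight line on the LEV diagram can be checked numerically. The sketch below is only an illustration: the eigenvalues are invented (two large leading values followed by a tail with constant ratio 0.8), and the use of second differences of $\log(l_k)$ as a straightness check is an assumption made here, not a procedure taken from Craddock and Flood or Farmer.

```python
import math

def lev_values(eigenvalues):
    """Return log(l_k) for each eigenvalue, i.e. the heights plotted on an LEV diagram."""
    return [math.log(l) for l in eigenvalues]

def second_differences(values):
    """Second differences v[k+2] - 2*v[k+1] + v[k]; zero wherever the points are collinear."""
    return [values[k + 2] - 2 * values[k + 1] + values[k]
            for k in range(len(values) - 2)]

# Hypothetical eigenvalues: two 'structured' components, then a geometric noise tail.
noise_tail = [1.0 * 0.8 ** k for k in range(6)]
eigenvalues = [12.0, 6.0] + noise_tail

lev = lev_values(eigenvalues)
curvature = second_differences(lev)

for k, c in enumerate(curvature, start=1):
    print(f"points {k}..{k + 2}: second difference = {c:+.4f}")
```

In this toy case the second differences are clearly non-zero while the leading eigenvalues are involved and vanish (up to rounding) once all three points lie in the geometric tail, which is the point beyond which the diagram is a straight line. Real eigenvalues would be noisier, so a visual reading of the plot remains the procedure described in the text.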
Furthermore, he showed that if structures of various dimensions are introduced, then the LEV diagram is useful in indicating the correct dimensionality, although real examples, of course, give much less clear-cut results than those of simulated data.

6.1.4 The Number of Components with Unequal Eigenvalues and Other Hypothesis Testing Procedures

In Section 3.7.3 a test, sometimes known as Bartlett’s test, was described for the null hypothesis

$$H_{0,q}: \lambda_{q+1} = \lambda_{q+2} = \cdots = \lambda_p$$

against the general alternative that at least two of the last $(p-q)$ eigenvalues are unequal. It was argued that using this test for various values of $q$, it can be discovered how many of the PCs contribute substantial amounts of variation, and how many are simply ‘noise.’ If $m$, the required number of PCs to be retained, is defined as the number of PCs that are not noise, then the test is used sequentially to find $m$.

$H_{0,p-2}$ is tested first, that is $\lambda_{p-1} = \lambda_p$, and if $H_{0,p-2}$ is not rejected then $H_{0,p-3}$ is tested. If $H_{0,p-3}$ is not rejected, $H_{0,p-4}$ is tested next, and this sequence continues until $H_{0,q}$ is first rejected at $q = q^*$, say. The value of $m$ is then taken to be $q^* + 1$ (or $q^* + 2$ if $q^* = p - 2$). There are a number of disadvantages to this procedure, the first of which is that equation (3.7.6)
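The sequential use of the test described above (start at $q = p-2$, decrease $q$ until the first rejection at $q^*$, then set $m$) can be sketched as a short loop. This is a minimal sketch under stated assumptions: the function name `choose_m` and the callable `rejects` are hypothetical, the decision rule used in the demonstration is a toy stand-in and not Bartlett's test statistic, and returning 0 when no hypothesis is ever rejected is an assumption, since the text does not cover that case.

```python
from typing import Callable, Sequence

def choose_m(p: int, rejects: Callable[[int], bool]) -> int:
    """Sequentially test H_{0,q} for q = p-2, p-3, ..., 0.

    `rejects(q)` is a hypothetical callable that returns True when
    H_{0,q}: lambda_{q+1} = ... = lambda_p is rejected.
    """
    if p < 2:
        raise ValueError("need at least two eigenvalues to test equality")
    for q in range(p - 2, -1, -1):
        if rejects(q):
            # First rejection at q* = q: m = q* + 1, or q* + 2 when q* = p - 2.
            return q + 2 if q == p - 2 else q + 1
    # Assumption (not from the text): with no rejection, no PC is
    # distinguished from noise.
    return 0

def toy_rule(eigenvalues: Sequence[float], ratio: float = 1.5) -> Callable[[int], bool]:
    """Toy stand-in for a real test: reject H_{0,q} when the largest and smallest
    of the last p-q eigenvalues differ by more than `ratio`. NOT Bartlett's test."""
    def rejects(q: int) -> bool:
        tail = eigenvalues[q:]
        return max(tail) / min(tail) > ratio
    return rejects

# Hypothetical sample eigenvalues: two distinct leading values, three equal ones.
eigs = [5.0, 3.0, 1.0, 1.0, 1.0]
m = choose_m(len(eigs), toy_rule(eigs))
print("retained components m =", m)
```

With these invented eigenvalues, $H_{0,3}$ and $H_{0,2}$ are not rejected and $H_{0,1}$ is, so $q^* = 1$ and $m = 2$. Passing the decision rule in as a callable keeps the stopping logic separate from whatever test statistic and significance level are actually used.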
