Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)
6.1. How Many Principal Components?

…is based on the assumption of multivariate normality for x, and is only approximately true even then. The second problem is that unless $H_{0,p-2}$ is rejected, there are several tests to be done, so that the overall significance level of the sequence of tests is not the same as the individual significance level of each test. Furthermore, it is difficult to get even an approximate idea of the overall significance level, because the number of tests done is not fixed but random, and the tests are not independent of each other. It follows that, although the testing sequence suggested above can be used to estimate m, it is dangerous to treat the procedure as a formal piece of statistical inference, as significance levels are usually unknown. The reverse sequence $H_{00}, H_{01}, \dots$ can be used instead until the first non-rejection occurs (Jackson, 1991, Section 2.6), but this suffers from similar problems.

The procedure could be added to the list of ad hoc rules, but it has one further, more practical, disadvantage, namely that in nearly all real examples it tends to retain more PCs than are really necessary. Bartlett (1950), in introducing the procedure for correlation matrices, refers to it as testing how many of the PCs are statistically significant, but 'statistical significance' in the context of these tests does not imply that a PC accounts for a substantial proportion of the total variation. For correlation matrices, Jolliffe (1970) found that the rule often corresponds roughly to choosing a cut-off $l^*$ of about 0.1 to 0.2 in the method of Section 6.1.2. This is much smaller than is recommended in that section, and occurs because defining unimportant PCs as those with variances equal to that of the last PC is not necessarily a sensible way of finding m. If this definition is acceptable, as it may be if the model of Tipping and Bishop (1999a) (see Section 3.9) is assumed, for example, then the sequential testing procedure may produce satisfactory results, but it is easy to construct examples where the method gives silly answers. For instance, if there is one near-constant relationship among the elements of x, with a much smaller variance than any other PC, then the procedure rejects $H_{0,p-2}$ and declares that all PCs need to be retained, regardless of how nearly equal the next few eigenvalues are. A code sketch of the procedure, and of this failure mode, is given below.

The method of this section is similar in spirit to, though more formalized than, one formulation of the scree graph. Looking for the first 'shallow' slope in the graph corresponds to looking for the first of two consecutive eigenvalues that are nearly equal. The scree graph differs from the formal testing procedure in that it starts from the largest eigenvalue and compares consecutive eigenvalues two at a time, whereas the tests start with the smallest eigenvalues and compare blocks of two, three, four and so on. Another difference is that the 'elbow' point is retained in Cattell's formulation of the scree graph, but excluded in the testing procedure. The scree graph is also more subjective but, as has been stated above, the objectivity of the testing procedure is something of an illusion.

Cattell's original formulation of the scree graph differs from the above, since it is the differences $l_{k-1} - l_k$, rather than $l_k$, which must be equal beyond the elbow; a toy numerical reading of this formulation is sketched after the first code example below.
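To make the sequential procedure concrete, here is a minimal sketch in Python (not from the book) of the testing sequence described above, assuming Bartlett's chi-squared approximation for $H_{0,m}$: the smallest $q = p - m$ eigenvalues are equal. The plain multiplier n in the statistic is an assumption; published versions often use n − 1 or a Lawley-type correction.

```python
# A minimal sketch (not from the book) of the sequential testing
# procedure, assuming Bartlett's chi-squared approximation for
# H_{0,m}: the smallest q = p - m eigenvalues are equal.
import numpy as np
from scipy.stats import chi2

def bartlett_statistic(eigvals, m, n):
    """Likelihood-ratio statistic for H_{0,m}; eigvals sorted descending.

    Assumption: the plain multiplier n is used; many accounts use
    n - 1 or a Lawley-type correction instead.
    """
    tail = np.asarray(eigvals, dtype=float)[m:]   # l_{m+1}, ..., l_p
    q = tail.size
    stat = n * (q * np.log(tail.mean()) - np.log(tail).sum())
    df = 0.5 * (q - 1) * (q + 2)                  # chi-squared degrees of freedom
    return stat, df

def sequential_choice(eigvals, n, alpha=0.05):
    """Test H_{0,p-2}, H_{0,p-3}, ... and stop at the first rejection."""
    p = len(eigvals)
    for m in range(p - 2, -1, -1):                # blocks of 2, 3, ..., p
        stat, df = bartlett_statistic(eigvals, m, n)
        if stat > chi2.ppf(1 - alpha, df):
            return m + 1                          # retain m + 1 PCs
    return 0                                      # no rejection: all eigenvalues plausibly equal

# The failure mode described above: one near-constant relationship
# (a very small last eigenvalue) forces rejection of H_{0,p-2}, so
# p - 1 of the p PCs are retained even though several of the
# preceding eigenvalues are nearly equal.
eigvals = [4.0, 1.02, 1.01, 1.00, 0.99, 0.001]    # hypothetical eigenvalues
print(sequential_choice(eigvals, n=200))          # -> 5
```

In this hypothetical example, dropping the 0.001 eigenvalue lets the sequence run down through the near-equal block without rejecting, so the procedure returns 1 instead of nearly all PCs, which is exactly the sensitivity to a single small eigenvalue that the text criticizes.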

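For contrast, the sketch below gives one toy reading (not from the book) of Cattell's difference-based formulation: beyond the elbow, the consecutive differences $l_{k-1} - l_k$ should be roughly equal. The tolerance used here is an arbitrary illustrative threshold, not a published rule.

```python
# A toy illustration (not from the book) of Cattell's difference-based
# reading of the scree graph: beyond the elbow, the consecutive
# differences l_{k-1} - l_k should be roughly equal (the 'scree').
import numpy as np

def first_equal_differences(eigvals, tol=0.1):
    """Return the number of PCs to retain, counting the elbow itself.

    `tol` is an arbitrary illustrative threshold, not a published rule.
    """
    l = np.asarray(eigvals, dtype=float)
    d = l[:-1] - l[1:]                      # differences l_k - l_{k+1}
    for k in range(len(d) - 1):
        # stop at the first point after which differences level off
        if np.all(np.abs(d[k:] - d[k:].mean()) <= tol):
            return k + 1                    # elbow retained, per Cattell
    return len(l)

print(first_equal_differences([4.0, 2.5, 1.0, 0.7, 0.5, 0.3]))  # -> 3
```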