12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

122 6. Choosing a Subset of Principal Components or Variables

for choosing m. To decide on whether to include the mth PC, Wold (1978) examines the ratio

$$R = \frac{\mathrm{PRESS}(m)}{\sum_{i=1}^{n} \sum_{j=1}^{p} \left( {}_{(m-1)}\tilde{x}_{ij} - x_{ij} \right)^{2}}. \tag{6.1.4}$$

This compares the prediction error sum of squares after fitting m components with the sum of squared differences between observed and estimated data points based on all the data, using (m − 1) components. If R < 1, [...] W > 1, then inclusion of the mth PC is worthwhile, although this cut-off at unity is to be interpreted with some flexibility. It is certainly not appropriate to stop adding PCs as soon as (6.1.5) first falls below unity, because the criterion is not necessarily a monotonic decreasing function of m. Because the ordering of the population eigenvalues may not be the same as that of the sample eigenvalues, especially if consecutive eigenvalues are close, Krzanowski (1987a) considers orders of the components different from those implied by the sample eigenvalues. For the well-known alate adelges data set (see Section 6.4), Krzanowski (1987a) retains components 1–4 in a straightforward implementation of W, but he keeps only components 1, 2, 4 when reorderings are allowed. In an example with a large number (100) of variables, Krzanowski and Kline (1995) use W in the context of factor analysis and simply take the number of components with W greater than a threshold, regardless of their position in the ordering of eigenvalues, as an indicator of the number of factors to retain. For example, the result where W exceeds 0.9 for components 1, 2, 4, 18 and no others is taken to indicate that a 4-factor solution is appropriate.

It should be noted that although the criteria described in this section are somewhat less ad hoc than those of Sections 6.1.1–6.1.3, there is still no real attempt to set up a formal significance test to decide on m. Some progress has been made by Krzanowski (1983) in investigating the sampling distribution of W using simulated data.
He points out that there are two sources of variability to be considered in constructing such a distribution; namely the variability due to different sample covariance matrices S for a fixed population covariance matrix Σ and the variability due to
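
The ratio R in (6.1.4) and the two ways of reading a sequence of criterion values discussed above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than anything stated on this page: the data matrix X is taken to be column-centred, the estimates based on all the data are taken to be the truncated SVD fit, PRESS(m) is supplied by the caller (the cross-validation that produces it is not reproduced), and all function names and the example W values are hypothetical.

```python
import numpy as np


def rank_k_residual_ss(X, k):
    """Sum of squared differences between X and its rank-k SVD fit.

    Assumption: X is already column-centred, and the rank-k truncated SVD
    plays the role of the estimate based on all the data.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return float(np.sum((X_hat - X) ** 2))


def wold_ratio(X, press_m, m):
    """R as in (6.1.4): PRESS(m) divided by the (m - 1)-component residual SS.

    press_m must be computed elsewhere by cross-validation.
    """
    return press_m / rank_k_residual_ss(X, m - 1)


def components_above_threshold(w_values, threshold=1.0):
    """1-based indices of every component whose criterion value exceeds the
    threshold, regardless of its position in the eigenvalue ordering."""
    return [i + 1 for i, w in enumerate(w_values) if w > threshold]


def first_drop_below(w_values, threshold=1.0):
    """Number of components kept if one stops at the first value that does
    not exceed the threshold (the rule the text warns against)."""
    for i, w in enumerate(w_values):
        if w <= threshold:
            return i
    return len(w_values)


# Invented criterion values, for illustration only (not from any data set).
w_example = [3.2, 1.8, 0.7, 1.4, 0.3]
print(components_above_threshold(w_example))  # [1, 2, 4]
print(first_drop_below(w_example))            # 2
```

Because the invented sequence is not monotonic, stopping at the first value below the cut-off would keep two components, whereas looking at every position flags three. This only counts values above a cut-off; it does not reproduce the reordering procedure of Krzanowski (1987a).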
