Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

6.1. How Many Principal Components?

needs to be estimated; an obvious estimate is the average of the (p − q) smallest eigenvalues of S.

Besse and de Falguerolles (1993) start from the same fixed effects model and concentrate on the special case just noted. They modify the loss function to become

    L_q = (1/2) ‖P_q − P̂_q‖²,    (6.1.9)

where P̂_q = A_q A_q′, A_q is the (p × q) matrix whose kth column is the kth eigenvector of S, P_q is the quantity corresponding to P̂_q for the true q-dimensional subspace F_q, and ‖·‖ denotes Euclidean norm. The loss function L_q measures the distance between the subspace F_q and its estimate F̂_q spanned by the columns of A_q.

The risk function that Besse and de Falguerolles (1993) seek to minimize is R_q = E[L_q]. As with f_q, R_q must be estimated, and Besse and de Falguerolles (1993) compare four computationally intensive ways of doing so, three of which were suggested by Besse (1992), building on ideas from Daudin et al. (1988, 1989). Two are bootstrap methods; one is based on bootstrapping residuals from the q-dimensional model, while the other bootstraps the data themselves. A third procedure uses a jackknife estimate and the fourth, which requires considerably less computational effort, constructs an approximation to the jackknife.

Besse and de Falguerolles (1993) simulate data sets according to the fixed effects model, with p = 10, q = 4 and varying levels of the noise variance σ². Because q and σ² are known, the true value of R_q can be calculated. The four procedures outlined above are compared with the traditional scree graph and Kaiser's rule, together with boxplots of scores for each principal component. In the latter case a value m is sought such that the boxplots are much less wide for components (m + 1), (m + 2), ..., p than they are for components 1, 2, ..., m.

As the value of σ² increases, all of the criteria, new or old, deteriorate in their performance.
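The loss (6.1.9) can be sketched numerically: each subspace is represented by the orthogonal projector onto the span of an orthonormal basis, and the loss is half the squared Frobenius distance between the two projectors. The function and variable names below are illustrative, not from the original text.

```python
import numpy as np

def subspace_loss(A_true, A_hat):
    """L_q = (1/2) * ||P_q - P_hat_q||^2, where each P is the orthogonal
    projector A A' onto the column space of an orthonormal basis A."""
    P_true = A_true @ A_true.T
    P_hat = A_hat @ A_hat.T
    return 0.5 * np.sum((P_true - P_hat) ** 2)

# Illustrative use: A_hat holds the q leading eigenvectors of a sample
# covariance matrix S (np.linalg.eigh returns eigenvalues in ascending order).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
S = np.cov(X, rowvar=False)
_, eigvecs = np.linalg.eigh(S)
A_hat = eigvecs[:, ::-1][:, :2]   # first q = 2 eigenvectors of S
```

For orthonormal bases, the loss is 0 when the two subspaces coincide and reaches its maximum, q, when they are mutually orthogonal.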
Even the true value of R_q does not take its minimum value at q = 4, although q = 4 gives a local minimum in all the simulations. Bootstrapping of residuals is uninformative regarding the value of q, but the other three new procedures each have strong local minima at q = 4. All methods have uninteresting minima at q = 1 and at q = p, but the jackknife techniques also have minima at q = 6, 7 which become more pronounced as σ² increases. The traditional methods correctly choose q = 4 for small σ², but become less clear as σ² increases.

The plots of the risk estimates are very irregular, and both Besse (1992) and Besse and de Falguerolles (1993) note that they reflect the important feature of stability of the subspaces retained. Many studies of stability (see, for example, Sections 10.2, 10.3, 11.1 and Besse, 1992) show that pairs of consecutive eigenvectors are unstable if their corresponding eigenvalues are of similar size. In a similar way, Besse and de Falguerolles' (1993) risk
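One simple reading of "bootstrapping the data themselves" can be sketched as follows: take the q-dimensional eigenvector subspace fitted to the full sample as a stand-in for the truth, refit on bootstrap resamples of the rows, and average the subspace loss. This is a rough illustration of the idea, not Besse and de Falguerolles' (1993) exact procedure; all names are assumptions.

```python
import numpy as np

def bootstrap_risk(X, q, n_boot=200, seed=0):
    """Crude bootstrap estimate of R_q = E[L_q]: average the loss between
    the full-sample q-dimensional eigenvector subspace of S and the
    subspaces refitted on bootstrap resamples of the rows of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def leading_vecs(M):
        # q leading eigenvectors of the sample covariance matrix
        _, vecs = np.linalg.eigh(np.cov(M, rowvar=False))
        return vecs[:, ::-1][:, :q]

    A_full = leading_vecs(X)
    P_full = A_full @ A_full.T
    losses = []
    for _ in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]   # resample rows with replacement
        A_b = leading_vecs(Xb)
        losses.append(0.5 * np.sum((P_full - A_b @ A_b.T) ** 2))
    return float(np.mean(losses))
```

Scanning this estimate over q = 1, ..., p − 1 and looking for interior local minima mirrors the way the risk plots are read in the text.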

estimates depend on the reciprocal of the difference between l_m and l_{m+1} where, as before, m is the number of PCs retained. The usual implementations of the rules of Sections 6.1.1, 6.1.2 ignore the size of gaps between eigenvalues and hence do not take stability into account. However, it is advisable when using Kaiser's rule or one of its modifications, or a rule based on cumulative variance, to treat the threshold with flexibility, and be prepared to move it if it does not correspond to a good-sized gap between eigenvalues.

Besse and de Falguerolles (1993) also examine a real data set with p = 16 and n = 60. Kaiser's rule chooses m = 5, and the scree graph suggests either m = 3 or m = 5. The bootstrap and jackknife criteria behave similarly to each other. Ignoring the uninteresting minimum at m = 1, all four methods choose m = 3, although there are strong secondary minima at m = 8 and m = 5.

Another model-based rule is introduced by Bishop (1999) and, even though one of its merits is said to be that it avoids cross-validation, it seems appropriate to mention it here. Bishop (1999) proposes a Bayesian framework for Tipping and Bishop's (1999a) model, which was described in Section 3.9. Recall that under this model the covariance matrix underlying the data can be written as BB′ + σ²I_p, where B is a (p × q) matrix. The prior distribution of B in Bishop's (1999) framework allows q to take its maximum possible value (= p − 1) under the model. However, if the posterior distribution assigns small values to all elements of a column b_k of B, then that dimension is removed. The mode of the posterior distribution can be found using the EM algorithm.

Jackson (1993) discusses two bootstrap versions of 'parallel analysis,' which was described in general terms in Section 6.1.3.
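The advice to move a Kaiser-type threshold toward a good-sized eigenvalue gap can be made concrete with a small check: apply the threshold, then flag the cut if the gap l_m − l_{m+1} at the cut is small relative to the other gaps. This is a sketch of the heuristic, under assumed names and an assumed definition of "good-sized" (here, at least half the mean gap).

```python
import numpy as np

def kaiser_with_gap_check(eigenvalues, threshold=1.0, gap_factor=2.0):
    """Apply a Kaiser-type threshold to eigenvalues and report whether the
    cut lands at a reasonable gap: l_m - l_{m+1} at least mean_gap / gap_factor.
    Returns (m, cut_is_at_good_gap)."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    m = int(np.sum(l > threshold))
    if m == 0 or m == len(l):
        return m, True                      # no interior cut to assess
    gap = l[m - 1] - l[m]                   # gap between l_m and l_{m+1}
    mean_gap = np.mean(np.abs(np.diff(l)))
    return m, bool(gap >= mean_gap / gap_factor)
```

When the check fails, the text's recommendation is to be prepared to move the threshold (e.g. Jolliffe's 0.7 instead of 1) so that the cut falls at a clear gap, since unstable subspaces arise when consecutive eigenvalues are of similar size.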
The first, which is a modification of Kaiser's rule defined in Section 6.1.2, uses bootstrap samples from a data set to construct confidence limits for the population eigenvalues (see Section 3.7.2). Only those components for which the corresponding 95% confidence interval lies entirely above 1 are retained. Unfortunately, although this criterion is reasonable as a means of deciding the number of factors in a factor analysis (see Chapter 7), it is inappropriate in PCA. This is because it will not retain PCs dominated by a single variable whose correlations with all the other variables are close to zero. Such variables are generally omitted from a factor model, but they provide information not available from other variables and so should be retained if most of the information in X is to be kept. Jolliffe's (1972) suggestion of reducing Kaiser's threshold from 1 to around 0.7 reflects the fact that we are dealing with PCA and not factor analysis. A bootstrap rule designed with PCA in mind would retain all those components for which the 95% confidence interval for the corresponding eigenvalue does not lie entirely below 1.

A second bootstrap approach suggested by Jackson (1993) finds 95% confidence intervals for both eigenvalues and eigenvector coefficients. To
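The contrast between the two retention rules can be sketched with a percentile bootstrap for correlation-matrix eigenvalues: the factor-analysis-style rule keeps a component only if its interval lies entirely above the threshold, whereas the PCA-minded rule keeps it unless the interval lies entirely below. Function names and the percentile construction are illustrative assumptions, not Jackson's (1993) exact recipe.

```python
import numpy as np

def bootstrap_eigenvalue_ci(X, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for the ordered eigenvalues of the
    correlation matrix of X (rows = observations)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        boot[b] = np.sort(np.linalg.eigvalsh(np.corrcoef(Xb, rowvar=False)))[::-1]
    lo = np.quantile(boot, alpha / 2, axis=0)
    hi = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lo, hi

def retention_rules(lo, hi, threshold=1.0):
    fa_rule = lo > threshold    # CI entirely above the threshold
    pca_rule = hi >= threshold  # CI not entirely below the threshold
    return fa_rule, pca_rule
```

By construction the PCA-minded rule retains every component the factor-analysis-style rule does, and possibly more; combining it with the reduced threshold 0.7 mirrors Jolliffe's (1972) suggestion.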

