Jolliffe I. Principal Component Analysis (2nd ed., Springer, 2002)
6.1. How Many Principal Components?

needs to be estimated; an obvious estimate is the average of the (p − q) smallest eigenvalues of S.

Besse and de Falguerolles (1993) start from the same fixed effects model and concentrate on the special case just noted. They modify the loss function to become

    L_q = ½ ‖P_q − P̂_q‖²,        (6.1.9)

where P̂_q = A_q A′_q, A_q is the (p × q) matrix whose kth column is the kth eigenvector of S, P_q is the quantity corresponding to P̂_q for the true q-dimensional subspace F_q, and ‖·‖ denotes Euclidean norm. The loss function L_q measures the distance between the subspace F_q and its estimate F̂_q spanned by the columns of A_q.

The risk function that Besse and de Falguerolles (1993) seek to minimize is R_q = E[L_q]. As with f_q, R_q must be estimated, and Besse and de Falguerolles (1993) compare four computationally intensive ways of doing so, three of which were suggested by Besse (1992), building on ideas from Daudin et al. (1988, 1989). Two are bootstrap methods; one is based on bootstrapping residuals from the q-dimensional model, while the other bootstraps the data themselves. A third procedure uses a jackknife estimate and the fourth, which requires considerably less computational effort, constructs an approximation to the jackknife.

Besse and de Falguerolles (1993) simulate data sets according to the fixed effects model, with p = 10, q = 4 and varying levels of the noise variance σ². Because q and σ² are known, the true value of R_q can be calculated. The four procedures outlined above are compared with the traditional scree graph and Kaiser's rule, together with boxplots of scores for each principal component. In the latter case a value m is sought such that the boxplots are much less wide for components (m + 1), (m + 2), ..., p than they are for components 1, 2, ..., m.

As the value of σ² increases, all of the criteria, new or old, deteriorate in their performance.
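Because (6.1.9) compares projection matrices rather than the eigenvector matrices themselves, the loss does not depend on which orthonormal basis is chosen for either subspace. The following NumPy sketch illustrates the loss, together with a deliberately crude data-bootstrap estimate of R_q in the spirit of (but far simpler than) the procedures compared by Besse and de Falguerolles (1993); the function names, and the plug-in of the full-sample eigenvectors for the unknown true subspace, are illustrative assumptions, not part of the original treatment.

```python
import numpy as np

def subspace_loss(A_true, A_est):
    """Loss (6.1.9): half the squared Euclidean (Frobenius) norm of the
    difference between the projectors onto the true q-dimensional
    subspace and its estimate. Columns of each A must be orthonormal;
    the choice of basis within each subspace does not affect the value."""
    P_true = A_true @ A_true.T
    P_est = A_est @ A_est.T
    return 0.5 * np.linalg.norm(P_true - P_est, "fro") ** 2

def leading_eigenvectors(X, q):
    """The q eigenvectors of the sample covariance matrix S of X
    belonging to the q largest eigenvalues (columns of A_q)."""
    S = np.cov(X, rowvar=False)
    _, vecs = np.linalg.eigh(S)      # eigh returns ascending order
    return vecs[:, ::-1][:, :q]      # reorder to take the q largest

def bootstrap_risk(X, q, n_boot=200, seed=0):
    """Crude plug-in estimate of R_q = E[L_q]: bootstrap the data
    themselves and, since the true subspace is unknown in practice,
    measure each bootstrap subspace against the full-sample estimate."""
    rng = np.random.default_rng(seed)
    A_full = leading_eigenvectors(X, q)
    n = X.shape[0]
    losses = [
        subspace_loss(A_full,
                      leading_eigenvectors(X[rng.integers(0, n, n)], q))
        for _ in range(n_boot)
    ]
    return float(np.mean(losses))
```

In the simulations described in the text the true subspace is known, so `A_full` could be replaced by the true A_q and R_q computed directly; with real data only plug-in or resampling estimates of this kind are available.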
Even the true value of R_q does not take its minimum value at q = 4, although q = 4 gives a local minimum in all the simulations. Bootstrapping of residuals is uninformative regarding the value of q, but the other three new procedures each have strong local minima at q = 4. All methods have uninteresting minima at q = 1 and at q = p, but the jackknife techniques also have minima at q = 6, 7 which become more pronounced as σ² increases. The traditional methods correctly choose q = 4 for small σ², but become less clear as σ² increases.

The plots of the risk estimates are very irregular, and both Besse (1992) and Besse and de Falguerolles (1993) note that they reflect the important feature of stability of the subspaces retained. Many studies of stability (see, for example, Sections 10.2, 10.3, 11.1 and Besse, 1992) show that pairs of consecutive eigenvectors are unstable if their corresponding eigenvalues are of similar size. In a similar way, Besse and de Falguerolles' (1993) risk
estimates depend on the reciprocal of the difference between l_m and l_{m+1} where, as before, m is the number of PCs retained. The usual implementations of the rules of Sections 6.1.1, 6.1.2 ignore the size of gaps between eigenvalues and hence do not take stability into account. However, it is advisable when using Kaiser's rule or one of its modifications, or a rule based on cumulative variance, to treat the threshold with flexibility, and be prepared to move it, if it does not correspond to a good-sized gap between eigenvalues.

Besse and de Falguerolles (1993) also examine a real data set with p = 16 and n = 60. Kaiser's rule chooses m = 5, and the scree graph suggests either m = 3 or m = 5. The bootstrap and jackknife criteria behave similarly to each other. Ignoring the uninteresting minimum at m = 1, all four methods choose m = 3, although there are strong secondary minima at m = 8 and m = 5.

Another model-based rule is introduced by Bishop (1999) and, even though one of its merits is said to be that it avoids cross-validation, it seems appropriate to mention it here. Bishop (1999) proposes a Bayesian framework for Tipping and Bishop's (1999a) model, which was described in Section 3.9. Recall that under this model the covariance matrix underlying the data can be written as BB′ + σ²I_p, where B is a (p × q) matrix. The prior distribution of B in Bishop's (1999) framework allows B to have its maximum possible number of columns, q (= p − 1), under the model. However, if the posterior distribution assigns small values to all elements of a column b_k of B, then that dimension is removed. The mode of the posterior distribution can be found using the EM algorithm.

Jackson (1993) discusses two bootstrap versions of 'parallel analysis,' which was described in general terms in Section 6.1.3.
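As one concrete, and entirely illustrative, reading of the advice above about treating thresholds with flexibility, Kaiser's rule can be applied with Jolliffe's (1972) lower cut-off of 0.7 and the cut then moved to a nearby sizeable gap between consecutive eigenvalues, so that the retained subspace does not split a pair of eigenvectors with nearly equal eigenvalues. The threshold of 0.7 follows the text, but the minimum-gap value of 0.2 and the move-to-nearest-gap heuristic are assumed for illustration, not prescriptions from the original.

```python
import numpy as np

def kaiser_with_gap(eigvals, threshold=0.7, min_gap=0.2):
    """Kaiser-style retention rule with a flexible cut-off: choose m
    by the nominal threshold, then, if the eigenvalue gap at that cut
    is small (unstable neighbouring eigenvectors), move the cut to the
    nearest position with a 'good-sized' gap. `min_gap` is an assumed,
    data-dependent tuning value."""
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    m = int(np.sum(l > threshold))            # nominal Kaiser choice
    if m == 0 or m == len(l):
        return m
    gaps = l[:-1] - l[1:]                     # gaps between neighbours
    if gaps[m - 1] >= min_gap:                # cut already at a real gap
        return m
    # Otherwise search for the nearest cut with an adequate gap.
    candidates = [k for k in range(1, len(l)) if gaps[k - 1] >= min_gap]
    if not candidates:
        return m
    return min(candidates, key=lambda k: abs(k - m))
```

For eigenvalues such as 3.0, 1.5, 0.72, 0.70, 0.69, 0.1 the nominal rule retains three components, but the gap between the third and fourth eigenvalues is tiny, so the cut moves back to the clear gap after the second component.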
The first, which is a modification of Kaiser's rule defined in Section 6.1.2, uses bootstrap samples from a data set to construct confidence limits for the population eigenvalues (see Section 3.7.2). Only those components for which the corresponding 95% confidence interval lies entirely above 1 are retained. Unfortunately, although this criterion is reasonable as a means of deciding the number of factors in a factor analysis (see Chapter 7), it is inappropriate in PCA. This is because it will not retain PCs dominated by a single variable whose correlations with all the other variables are close to zero. Such variables are generally omitted from a factor model, but they provide information not available from other variables and so should be retained if most of the information in X is to be kept. Jolliffe's (1972) suggestion of reducing Kaiser's threshold from 1 to around 0.7 reflects the fact that we are dealing with PCA and not factor analysis. A bootstrap rule designed with PCA in mind would retain all those components for which the 95% confidence interval for the corresponding eigenvalue does not lie entirely below 1.

A second bootstrap approach suggested by Jackson (1993) finds 95% confidence intervals for both eigenvalues and eigenvector coefficients. To