Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)

(ii) the columns of $U$ are those eigenvectors of $XX'$ that correspond to non-zero eigenvalues, normalized so that $U'U = I_p$.

Then $X\beta$ can be rewritten $ULA'\beta = U\delta$, where $\delta = LA'\beta$, so that $\beta = AL^{-1}\delta$. The least squares estimator for $\delta$ is $\hat{\delta} = (U'U)^{-1}U'y = U'y$, leading to $\hat{\beta} = AL^{-1}\hat{\delta}$.

The relationship between $\gamma$, defined earlier, and $\delta$ is straightforward, namely

$$\gamma = A'\beta = A'(AL^{-1}\delta) = (A'A)L^{-1}\delta = L^{-1}\delta,$$

so that setting a subset of elements of $\delta$ equal to zero is equivalent to setting the same subset of elements of $\gamma$ equal to zero. This result means that the SVD can provide an alternative computational approach for estimating PC regression equations, which is an advantage, as efficient algorithms exist for finding the SVD of a matrix (see Appendix A1).

Interpretation of the results of a PC regression can also be aided by using the SVD, as illustrated by Mandel (1982) for artificial data (see also Nelder (1985)).
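This computational route can be made concrete in a few lines of NumPy. The sketch below is illustrative only, with synthetic data and an assumed rule for the retained subset $M$; it follows the identities $\hat{\delta} = U'y$ and $\hat{\beta} = AL^{-1}\hat{\delta}$, with $A$ and the singular values taken from `np.linalg.svd`.

```python
import numpy as np

# Synthetic, centred data purely for illustration.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.standard_normal(n)
y -= y.mean()

# SVD of X: X = U L A' in the text's notation (L diagonal, A orthogonal).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = Vt.T

# Since U'U = I, the least squares estimator of delta is simply U'y,
# and beta_hat = A L^{-1} delta_hat recovers ordinary least squares.
delta_hat = U.T @ y
beta_hat = A @ (delta_hat / s)

# PC regression: zero the elements of delta_hat (equivalently, of
# gamma_hat) outside a chosen subset M before transforming back.
M = np.arange(p - 1)                 # e.g. delete the last component
beta_pcr = A[:, M] @ (delta_hat[M] / s[M])
```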
8.2 Strategies for Selecting Components in Principal Component Regression

When choosing the subset $M$ in equation (8.1.12) there are two partially conflicting objectives. In order to eliminate large variances due to multicollinearities it is essential to delete all those components whose variances are very small but, at the same time, it is undesirable to delete components that have large correlations with the dependent variable $y$. One strategy for choosing $M$ is simply to delete all those components whose variances are less than $l^*$, where $l^*$ is some cut-off level. The choice of $l^*$ is rather arbitrary, but when dealing with correlation matrices, where the average value of the eigenvalues is 1, a value of $l^*$ somewhere in the range 0.01 to 0.1 seems to be useful in practice; a sketch of this rule follows.
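Continuing the synthetic example above, the cut-off rule might look as follows; the standardization step and the particular value of $l^*$ are assumptions made for the illustration.

```python
# Retain components whose variances exceed a cut-off l*.  With
# standardized predictors, X'X/(n-1) is the correlation matrix, so its
# eigenvalues l_k = s_k**2/(n-1) average exactly 1.
Xs = X / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
l = s**2 / (n - 1)                   # component variances l_1 >= ... >= l_p

l_star = 0.05                        # arbitrary cut-off in the 0.01-0.1 range
M = l > l_star                       # boolean mask for the retained subset
print(f"retaining {M.sum()} of {p} components")
```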
An apparently more sophisticated way of choosing $l^*$ is to look at so-called variance inflation factors (VIFs) for the $p$ predictor variables. The VIF for the $j$th variable when using standardized variables is defined as $c_{jj}/\sigma^2$, which equals the $j$th diagonal element of $(X'X)^{-1}$ (Marquardt, 1970), where $c_{jj}$ is the variance of the $j$th element of the least squares estimator for $\beta$. If all the variables are uncorrelated, then all the VIFs are equal to 1, but if severe multicollinearities exist then the VIFs for $\hat{\beta}$ will be very large for those variables involved in the multicollinearities. By successively deleting the last few terms in (8.1.8), the VIFs for the resulting biased estimators will be reduced; deletion continues until all VIFs are below some desired level.
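Both the definition and the effect of deletion can be checked numerically. In the sketch below, which continues the example, the post-deletion expression $\mathrm{VIF}_j = \sum_{k \in M} a_{jk}^2 / l_k$ is a restatement of the variance of the biased estimator under the standardization assumed above, not a formula quoted from this page.

```python
# VIFs for standardized variables: the jth VIF is the jth diagonal
# element of the inverse correlation matrix (Marquardt, 1970).
R = Xs.T @ Xs / (n - 1)              # correlation matrix of the predictors
vif_ols = np.diag(np.linalg.inv(R))  # VIFs of the least squares estimator

# After deleting the components outside M, the variance of the biased
# estimator gives VIF_j = sum over retained k of a_jk**2 / l_k.
A = Vt.T                             # columns are the eigenvectors a_k
vif_pcr = (A[:, M]**2 / l[M]).sum(axis=1)
print(np.round(vif_ols, 2))
print(np.round(vif_pcr, 2))          # never exceeds vif_ols elementwise
```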
