Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

8. Principal Components in Regression Analysis

so that $\tilde{\beta}$ is preferred to $\hat{\beta}$ if the expected Euclidean distance between $X\tilde{\beta}$ (the estimate of $y$) and $X\beta$ (the expected value of $y$) is smaller than the corresponding distance between $X\hat{\beta}$ and $X\beta$. An alternative MSE criterion is to look at the distance between each estimate of $y$ and the actual, rather than expected, value of $y$. Thus $\tilde{\beta}$ is preferred to $\hat{\beta}$ if
$$E[(X\tilde{\beta} - y)'(X\tilde{\beta} - y)] \le E[(X\hat{\beta} - y)'(X\hat{\beta} - y)].$$
Substituting $y = X\beta + \varepsilon$ it follows that
$$E[(X\tilde{\beta} - y)'(X\tilde{\beta} - y)] = E[(X\tilde{\beta} - X\beta)'(X\tilde{\beta} - X\beta)] + n\sigma^2,$$
with a similar expression for $\hat{\beta}$. At first sight, it seems that this second criterion is equivalent to the first. However, $\sigma^2$ is unknown and, although it can be estimated, we may get different estimates when the equation is fitted using $\tilde{\beta}$, $\hat{\beta}$, respectively.

Hill et al. (1977) consider several other criteria; further details may be found in their paper, which also describes connections between the various decision rules for choosing $M$ and gives illustrative examples. They argue that the choice of PCs should not be based solely on the size of their variances, but little advice is offered on which of their criteria gives an overall 'best' trade-off between variance and bias; rather, separate circumstances are identified in which each may be the most appropriate.

Gunst and Mason (1979) also consider integrated MSE of predictions as a criterion for comparing different regression estimators. Friedman and Montgomery (1985) prefer to use the predictive ability for individual observations, rather than averaging this ability over a distribution of potential observations as is done by Gunst and Mason (1979).

Another way of comparing predicted and observed values of $y$ is by means of cross-validation. Mertens et al. (1995) use a version of PRESS, defined in equation (6.1.3), as a criterion for deciding how many PCs to retain in PC regression.
Their criterion is
$$\sum_{i=1}^{n} (y_i - \hat{y}_{M(i)})^2,$$
where $\hat{y}_{M(i)}$ is the estimate of $y_i$ obtained from a PC regression based on a subset $M$ and using the data matrix $X_{(i)}$, which is $X$ with its $i$th row deleted. They have an efficient algorithm for computing all PCAs with each observation deleted in turn, though the algebra that it uses is applicable only to covariance, not correlation, matrices. Mainly for reasons of convenience, they also restrict their procedure to implementing (8.1.10), rather than the more general (8.1.12).

Yet another approach to deletion of PCs that takes into account both variance and bias is given by Lott (1973). This approach simply calculates
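As a rough illustration of the leave-one-out criterion above, the sketch below simply refits a covariance-matrix (centred-data) PC regression with each observation deleted in turn; this naive recomputation is what Mertens et al.'s updating algorithm avoids, and the function name and interface are assumptions of this sketch, not theirs:

```python
import numpy as np

def press_pc_regression(X, y, M):
    """Naive leave-one-out PRESS for PC regression retaining the PCs in M.

    Computes sum_i (y_i - yhat_{M(i)})^2, where each yhat_{M(i)} comes from
    a PC regression fitted to X with its ith row deleted.  PCs are taken
    from the centred data (covariance matrix), as in Mertens et al.
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        mu = Xi.mean(axis=0)              # centre the training data
        Xc = Xi - mu
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        A = Vt[M].T                       # loadings of the retained PCs
        Z = Xc @ A                        # PC scores of the training rows
        gamma, *_ = np.linalg.lstsq(Z, yi - yi.mean(), rcond=None)
        z_new = (X[i] - mu) @ A           # score the held-out observation
        y_hat = yi.mean() + z_new @ gamma
        press += (y[i] - y_hat) ** 2
    return press
```

Comparing the returned value across candidate subsets $M$ (for example, the first PC alone versus the first two) then gives a data-driven rule for how many PCs to retain.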
