Jolliffe I. Principal Component Analysis (2nd ed., Springer, 2002)
8.4. Variations on Principal Component Regression

are given by

$$f_k = -\delta_{0k}\,\eta_y\,\tilde{l}_k^{-1}\Bigl(\sum_{M_{LR}} \delta_{0k}^2\,\tilde{l}_k^{-1}\Bigr)^{-1}, \qquad (8.4.2)$$

where $\eta_y^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2$, and $\delta_{0k}$, $\tilde{l}_k$ are as defined above. Note that the least squares estimator $\hat{\boldsymbol{\beta}}$ can also be written in the form (8.4.1) if $M_{LR}$ in (8.4.1) and (8.4.2) is taken to be the full set of PCs.

The full derivation of this expression for $f_k$ is fairly lengthy, and can be found in Webster et al. (1974). It is interesting to note that $f_k$ is proportional to the size of the coefficient of $y$ in the $k$th PC, and inversely proportional to the variance of the $k$th PC; both of these relationships are intuitively reasonable.

In order to choose the subset $M_{LR}$ it is necessary to decide not only how small the eigenvalues must be in order to indicate multicollinearities, but also how large the coefficient of $y$ must be in order to indicate a predictive multicollinearity. Again, these are arbitrary choices, and ad hoc rules have been used, for example, by Gunst et al. (1976). A more formal procedure for identifying non-predictive multicollinearities is described by White and Gunst (1979), but its derivation is based on asymptotic properties of the statistics used in latent root regression.

Gunst et al. (1976) compared $\hat{\boldsymbol{\beta}}_{LR}$ and $\hat{\boldsymbol{\beta}}$ in terms of MSE, using a simulation study, for cases of only one multicollinearity, and found that $\hat{\boldsymbol{\beta}}_{LR}$ showed substantial improvement over $\hat{\boldsymbol{\beta}}$ when the multicollinearity is non-predictive. However, in cases where the single multicollinearity had some predictive value, the results were, unsurprisingly, less favourable to $\hat{\boldsymbol{\beta}}_{LR}$. Gunst and Mason (1977a) reported a larger simulation study, which compared PC, latent root, ridge and shrinkage estimators, again on the basis of MSE. Overall, latent root estimators did well in many, but not all, situations studied, as did PC estimators, but no simulation study can ever be exhaustive, and different conclusions might be drawn for other types of simulated data.

Hawkins (1973) also proposed finding PCs for the enlarged set of $(p+1)$ variables, but he used the PCs in a rather different way from that of latent root regression as defined above. The idea here is to use the PCs themselves, or rather a rotated version of them, to decide upon a suitable regression equation. Any PC with a small variance gives a relationship between $y$ and the predictor variables whose sum of squared residuals orthogonal to the fitted plane is small. Of course, in regression it is squared residuals in the $y$-direction, rather than orthogonal to the fitted plane, which are to be minimized (see Section 8.6), but the low-variance PCs can nevertheless be used to suggest low-variability relationships between $y$ and the predictor variables. Hawkins (1973) goes further by suggesting that it may be more fruitful to look at rotated versions of the PCs, instead of the PCs themselves, in order to indicate low-variance relationships. This is done
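As a concrete illustration of the latent root construction above, the sketch below (in Python, which the book does not use) computes eigenvalues $\tilde{l}_k$ and eigenvectors $\boldsymbol{\delta}_k$ of the correlation matrix of the enlarged set of $p+1$ variables, forms $f_k$ from (8.4.2), and combines the predictor parts of the retained eigenvectors as in (8.4.1). The function name and the threshold values `eps` and `tau` are hypothetical placeholders standing in for the arbitrary, ad hoc choices the text describes; this is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def latent_root_estimator(X, y, eps=0.05, tau=0.10):
    """Sketch of the latent root estimator via (8.4.1)-(8.4.2).

    X : (n, p) predictor matrix; y : (n,) response.
    eps : eigenvalue threshold signalling a multicollinearity (arbitrary).
    tau : |delta_0k| threshold below which the multicollinearity is
          treated as non-predictive (also arbitrary, as the text notes).
    """
    # eta_y^2 = sum_i (y_i - ybar)^2, as defined in the text.
    eta_y = np.sqrt(np.sum((y - y.mean()) ** 2))

    # Standardize y and the predictors to unit sum of squares, so that
    # Z'Z is the correlation matrix of the enlarged set of p+1 variables.
    y_std = (y - y.mean()) / eta_y
    Xc = X - X.mean(axis=0)
    X_std = Xc / np.sqrt(np.sum(Xc ** 2, axis=0))
    Z = np.column_stack([y_std, X_std])

    # Eigenvalues ~l_k and eigenvectors delta_k of the enlarged matrix;
    # delta_0k is the coefficient of y in the kth PC (y is column 0).
    l_tilde, V = np.linalg.eigh(Z.T @ Z)
    delta0 = V[0, :]

    # M_LR: discard only PCs flagged as NON-predictive multicollinearities,
    # i.e. both a small eigenvalue and a small coefficient of y.
    in_M = ~((l_tilde < eps) & (np.abs(delta0) < tau))

    # f_k from (8.4.2).
    denom = np.sum(delta0[in_M] ** 2 / l_tilde[in_M])
    f = -delta0[in_M] * eta_y / (l_tilde[in_M] * denom)

    # Combine the predictor parts of the retained eigenvectors; with in_M
    # all True this reproduces least squares on the standardized
    # predictors, as noted after (8.4.2).
    return V[1:, in_M] @ f
```

Note how the two intuitive relationships mentioned in the text appear directly in the expression for `f`: each $f_k$ scales with $\delta_{0k}$, the coefficient of $y$ in the $k$th PC, and with $\tilde{l}_k^{-1}$, the reciprocal of that PC's variance.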
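Hawkins' use of the low-variance PCs can be illustrated in the same framework. The hypothetical helper below omits the rotation step that Hawkins recommends and simply solves the smallest-variance PC of the enlarged variable set for $y$: since $\delta_{0k}\,y + \sum_j \delta_{jk}x_j$ has small variance when $\tilde{l}_k$ is small, setting that PC to zero suggests a low-variability relationship for $y$ on the standardized scale. This is only a sketch of the idea, not Hawkins' full procedure.

```python
import numpy as np

def lowest_variance_relationship(X, y):
    """Coefficients of the x's in the relationship for y suggested by the
    smallest-variance PC of the enlarged (p+1)-variable set (hypothetical
    helper; no rotation applied)."""
    y_std = (y - y.mean()) / np.sqrt(np.sum((y - y.mean()) ** 2))
    Xc = X - X.mean(axis=0)
    X_std = Xc / np.sqrt(np.sum(Xc ** 2, axis=0))
    Z = np.column_stack([y_std, X_std])

    l_tilde, V = np.linalg.eigh(Z.T @ Z)
    k = np.argmin(l_tilde)  # smallest-variance PC (eigh sorts ascending)

    # Setting delta_0k * y + sum_j delta_jk * x_j to zero and solving for y
    # gives y ~ -(1/delta_0k) * sum_j delta_jk * x_j on the standardized scale.
    return -V[1:, k] / V[0, k]
```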
