Jolliffe, I. T. (2002). Principal Component Analysis, 2nd edition. Springer.


8.4. Variations on Principal Component Regression

The coefficients $f_k$ in (8.4.1) are given by

$$ f_k = -\delta_{0k}\,\eta_y\,\tilde{l}_k^{-1}\Bigl(\sum_{M_{LR}} \delta_{0k}^{2}\,\tilde{l}_k^{-1}\Bigr)^{-1}, \qquad (8.4.2) $$

where $\eta_y^{2} = \sum_{i=1}^{n}(y_i - \bar{y})^{2}$, and $\delta_{0k}$, $\tilde{l}_k$ are as defined above. Note that the least squares estimator $\hat{\beta}$ can also be written in the form (8.4.1) if $M_{LR}$ in (8.4.1) and (8.4.2) is taken to be the full set of PCs.

The full derivation of this expression for $f_k$ is fairly lengthy, and can be found in Webster et al. (1974). It is interesting to note that $f_k$ is proportional to the size of the coefficient of $y$ in the $k$th PC, and inversely proportional to the variance of the $k$th PC; both of these relationships are intuitively reasonable.

In order to choose the subset $M_{LR}$ it is necessary to decide not only how small the eigenvalues must be in order to indicate multicollinearities, but also how large the coefficient of $y$ must be in order to indicate a predictive multicollinearity. Again, these are arbitrary choices, and ad hoc rules have been used, for example, by Gunst et al. (1976). A more formal procedure for identifying non-predictive multicollinearities is described by White and Gunst (1979), but its derivation is based on asymptotic properties of the statistics used in latent root regression.

Gunst et al. (1976) compared $\hat{\beta}_{LR}$ and $\hat{\beta}$ in terms of MSE, using a simulation study, for cases of only one multicollinearity, and found that $\hat{\beta}_{LR}$ showed substantial improvement over $\hat{\beta}$ when the multicollinearity is non-predictive. However, in cases where the single multicollinearity had some predictive value, the results were, unsurprisingly, less favourable to $\hat{\beta}_{LR}$. Gunst and Mason (1977a) reported a larger simulation study, which compared PC, latent root, ridge and shrinkage estimators, again on the basis of MSE. Overall, latent root estimators did well in many, but not all, situations studied, as did PC estimators, but no simulation study can ever be exhaustive, and different conclusions might be drawn for other types of simulated data.
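As an illustration only, the following Python sketch assembles the latent root estimator from the quantities just described. It is not code from the book: it assumes that (8.4.1), which is not reproduced on this page, has the form $\hat{\beta}_{LR} = \sum_{M_{LR}} f_k \delta_k$, with $\delta_k$ denoting the predictor part of the $k$th eigenvector of the correlation matrix of $(y, x_1, \ldots, x_p)$; the cut-offs `var_tol` and `pred_tol` stand in for the arbitrary, ad hoc choices mentioned above; and the returned coefficients apply to the unit-length scaled predictors (no intercept or back-scaling is shown).

```python
import numpy as np

def latent_root_estimator(X, y, var_tol=0.05, pred_tol=0.10):
    """Illustrative latent root regression fit.

    var_tol flags small eigenvalues (multicollinearities); pred_tol flags
    small |delta_0k|, in which case the multicollinearity is treated as
    non-predictive and the corresponding PC is dropped from M_LR.
    Both cut-offs are purely illustrative.
    """
    # Centre y and the predictors and scale them to unit length, so the
    # eigenanalysis below is of the correlation matrix of (y, x1, ..., xp).
    yc = y - y.mean()
    eta_y = np.sqrt((yc ** 2).sum())              # eta_y^2 = sum (y_i - ybar)^2
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    A = np.column_stack([yc / eta_y, Xs])

    eigval, eigvec = np.linalg.eigh(A.T @ A)      # l_k and the eigenvectors
    delta0 = eigvec[0, :]                          # coefficient of y in each PC
    delta = eigvec[1:, :]                          # predictor part of each PC

    # M_LR: drop PCs whose small variance AND small |delta_0k| indicate a
    # non-predictive multicollinearity.
    in_MLR = ~((eigval < var_tol) & (np.abs(delta0) < pred_tol))

    # Equation (8.4.2).
    f = -delta0[in_MLR] * eta_y / eigval[in_MLR]
    f /= (delta0[in_MLR] ** 2 / eigval[in_MLR]).sum()

    # Assumed form of (8.4.1): beta_LR = sum over M_LR of f_k * delta_k
    # (coefficients for the unit-length scaled predictors).
    return delta[:, in_MLR] @ f
```

Given an $(n, p)$ array `X` and a response `y`, `latent_root_estimator(X, y)` returns the $p$ latent root coefficients for the scaled predictors; with `var_tol` set to zero, so that no PC is excluded, the result should coincide with the least squares coefficients for the scaled variables, as noted in the text.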

Hawkins (1973) also proposed finding PCs for the enlarged set of $(p+1)$ variables, but he used the PCs in a rather different way from that of latent root regression as defined above. The idea here is to use the PCs themselves, or rather a rotated version of them, to decide upon a suitable regression equation. Any PC with a small variance gives a relationship between $y$ and the predictor variables whose sum of squared residuals orthogonal to the fitted plane is small. Of course, in regression it is squared residuals in the $y$-direction, rather than orthogonal to the fitted plane, which are to be minimized (see Section 8.6), but the low-variance PCs can nevertheless be used to suggest low-variability relationships between $y$ and the predictor variables. Hawkins (1973) goes further by suggesting that it may be more fruitful to look at rotated versions of the PCs, instead of the PCs themselves, in order to indicate low-variance relationships. This is done by rescaling and then using varimax rotation (see Chapter 7), which has the effect of transforming the PCs to a different set of uncorrelated variables. These variables are, like the PCs, linear functions of the original $(p+1)$ variables, but their coefficients are mostly close to zero or a long way from zero, with relatively few intermediate values. There is no guarantee, in general, that any of the new variables will have particularly large or particularly small variances, as they are chosen by simplicity of structure of their coefficients, rather than for their variance properties. However, if only one or two of the coefficients for $y$ are large, as should often happen with varimax rotation, then Hawkins (1973) shows that the corresponding transformed variables will have very small variances, and therefore suggest low-variance relationships between $y$ and the predictor variables. Other possible regression equations may be found by substitution of one subset of predictor variables in terms of another, using any low-variability relationships between predictor variables that are suggested by the other rotated PCs.

The above technique is advocated by Hawkins (1973) and by Jeffers (1981) as a means of selecting which variables should appear in the regression equation (see Section 8.5), rather than as a way of directly estimating their coefficients in the regression equation, although the technique could be used for the latter purpose. Daling and Tamura (1970) also discussed rotation of PCs in the context of variable selection, but their PCs were for the predictor variables only.
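A small, self-contained Python sketch may make the mechanics of the rotation step described above concrete. It is not Hawkins' (1973) procedure: the rescaling step is omitted (so the rotated combinations are not, in general, uncorrelated), a standard varimax routine is applied directly to the eigenvectors of the correlation matrix of $(y, x_1, \ldots, x_p)$, and the function and variable names are purely illustrative. What it mimics is the final inspection: combinations with a large coefficient on $y$ and a small variance point to low-variance relationships between $y$ and the predictors.

```python
import numpy as np

def varimax(A, gamma=1.0, n_iter=100, tol=1e-6):
    """Standard varimax rotation of a coefficient matrix A
    (rows = variables, columns = components)."""
    p, k = A.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(n_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(
            A.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L)))
        )
        R = u @ vt
        d = s.sum()
        if d_old != 0.0 and d <= d_old * (1.0 + tol):
            break
        d_old = d
    return A @ R

def rotated_relationships(X, y):
    """Rotate the PC coefficients of the augmented (y, x1, ..., xp) set
    and report the variance and y-coefficient of each rotated combination."""
    corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)
    _, eigvec = np.linalg.eigh(corr)       # columns are PC coefficient vectors
    B = varimax(eigvec)                     # rotated coefficient vectors
    variances = np.diag(B.T @ corr @ B)     # variance of each rotated combination
    y_coeff = B[0, :]                       # coefficient of y in each combination
    # Combinations with large |y_coeff| and small variance suggest
    # low-variance relationships between y and the predictors.
    return B, variances, y_coeff
```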

In a later paper, Hawkins and Eplett (1982) propose another variant of latent root regression, one which can be used to find low-variability relationships between $y$ and the predictor variables efficiently, and which can also be used in variable selection. This method replaces the rescaling and varimax rotation of Hawkins' earlier method by a sequence of rotations leading to a set of relationships between $y$ and the predictor variables that are simpler to interpret than in the previous method. This simplicity is achieved because the matrix of coefficients defining the relationships has non-zero entries only in its lower-triangular region. Despite the apparent complexity of the new method, it is also computationally simple to implement. The covariance (or correlation) matrix $\tilde{\Sigma}$ of $y$ and all the predictor variables is factorized using a Cholesky factorization

$$ \tilde{\Sigma} = DD', $$

where $D$ is lower-triangular. The matrix of coefficients defining the relationships is then proportional to $D^{-1}$, which is also lower-triangular. To find $D$ it is not necessary to calculate PCs based on $\tilde{\Sigma}$, which makes the links between the method and PCA rather more tenuous than those between PCA and latent root regression. The next section discusses variable selection in regression using PCs, and because all three variants of latent root regression described above can be used in variable selection, they will all be discussed further in that section.
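A companion sketch of the Cholesky construction, again in Python with illustrative names, shows how little computation is involved. The sequence of rotations that Hawkins and Eplett apply on top of this factorization is not reproduced, and placing $y$ first in the variable ordering is an assumption made here purely for illustration, since the interpretation of individual rows of $D^{-1}$ depends on how the variables are ordered.

```python
import numpy as np

def cholesky_relationships(X, y):
    """Sketch of the factorisation Sigma_tilde = D D' used by
    Hawkins and Eplett (1982); X is (n, p), y is (n,)."""
    A = np.column_stack([y, X])       # y first, then the predictors (illustrative ordering)
    sigma = np.cov(A, rowvar=False)   # covariance matrix of (y, x1, ..., xp)
    D = np.linalg.cholesky(sigma)     # lower-triangular, sigma = D @ D.T
    D_inv = np.linalg.inv(D)          # also lower-triangular
    # Each row of D_inv defines a linear combination of (y, x1, ..., xp);
    # the matrix of coefficients of the relationships is proportional to D_inv.
    return D, D_inv
```

As the text notes, no eigenanalysis of $\tilde{\Sigma}$ is needed at any point, which is what makes the link between this variant and PCA comparatively tenuous.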
