Jolliffe I., Principal Component Analysis, 2nd edition, Springer, 2002 (excerpt from pages 185–186).


ridge regression, although this latter conclusion is disputed by S. Wold in the published discussion that follows the article.

Naes and Isaksson (1992) use a locally weighted version of PC regression in the calibration of spectroscopic data. PCA is done on the predictor variables, and to form a predictor for a particular observation only the k observations closest to the chosen observation in the space of the first m PCs are used. These k observations are given weights in a regression of the dependent variable on the first m PCs whose values decrease as distance from the chosen observation increases. The values of m and k are chosen by cross-validation, and the technique is shown to outperform both PC regression and PLS (a small illustrative sketch is given below, just before Section 8.5).

Bertrand et al. (2001) revisit latent root regression, and replace the PCA of the matrix of (p + 1) variables formed by y together with X by the equivalent PCA of y together with the PC scores Z. This makes it easier to identify predictive and non-predictive multicollinearities, and gives a simple expression for the MSE of the latent root estimator. Bertrand et al. (2001) present their version of latent root regression as an alternative to PLS or PC regression for near infrared spectroscopic data.

Marx and Smith (1990) extend PC regression from linear models to generalized linear models. Straying further from ordinary PCA, Li et al. (2000) discuss principal Hessian directions, which utilize a variety of generalized PCA (see Section 14.2.2) in a regression context. These directions are used to define splits in a regression tree, where the objective is to find directions along which the regression surface ‘bends’ as much as possible. A weighted covariance matrix S_W is calculated for the predictor variables, where the weights are residuals from a multiple regression of y on all the predictor variables. Given the (unweighted) covariance matrix S, their derivation of the first principal Hessian direction is equivalent to finding the first eigenvector in a generalized PCA of S_W with metric Q = S^{-1} and D = (1/n)I_n, in the notation of Section 14.2.2.
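The first principal Hessian direction just described can be computed without the general machinery of Section 14.2.2, because a generalized PCA of S_W with metric Q = S^{-1} amounts, on this reading, to an ordinary eigen-decomposition of S^{-1} S_W. The following is a minimal sketch assuming only numpy; the function name first_phd is illustrative, and the code follows the verbal description above rather than any code of Li et al. (2000).

    import numpy as np

    def first_phd(X, y):
        """Sketch of the first principal Hessian direction.

        Residuals from a least squares regression of y on all predictors are
        used as weights in a weighted covariance matrix S_W; the leading
        eigenvector of S^{-1} S_W (a generalized PCA of S_W with metric
        Q = S^{-1}) is returned.
        """
        n, p = X.shape
        Xc = X - X.mean(axis=0)                    # centred predictors

        # Residuals from a multiple regression of y on all the predictor variables.
        X1 = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ coef

        S = Xc.T @ Xc / n                          # unweighted covariance matrix S
        S_w = (Xc.T * resid) @ Xc / n              # residual-weighted covariance S_W

        # Leading eigenvector of S^{-1} S_W, ordered by absolute eigenvalue.
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S, S_w))
        first = np.argmax(np.abs(eigvals))
        return np.real(eigvecs[:, first])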

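The sketch promised above illustrates the locally weighted PC regression of Naes and Isaksson (1992). It is an illustration under stated assumptions rather than the authors' procedure: only numpy is used, the weight function is a simple inverse-distance choice, and m and k are passed in as fixed arguments, whereas the paper chooses both by cross-validation; the name local_pcr_predict is likewise illustrative.

    import numpy as np

    def local_pcr_predict(X, y, x_new, m, k):
        """Sketch of a locally weighted PC regression prediction for x_new.

        PCA is done on the predictor variables; the k training observations
        closest to x_new in the space of the first m PCs receive weights that
        decrease with distance, and y is regressed locally on those m PCs.
        """
        x_mean = X.mean(axis=0)
        Xc = X - x_mean

        # PCA of the predictors via the SVD; rows of Vt are PC loading vectors.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        A = Vt[:m].T                               # loadings of the first m PCs
        Z = Xc @ A                                 # PC scores of the training data
        z_new = (x_new - x_mean) @ A               # scores of the new observation

        # The k observations closest to x_new in the space of the first m PCs.
        dist = np.linalg.norm(Z - z_new, axis=1)
        nearest = np.argsort(dist)[:k]
        w = 1.0 / (dist[nearest] + 1e-8)           # weights decreasing with distance

        # Weighted least squares regression of y on the first m PCs (with intercept).
        Zk = np.column_stack([np.ones(k), Z[nearest]])
        W = np.diag(w)
        gamma = np.linalg.solve(Zk.T @ W @ Zk, Zk.T @ W @ y[nearest])
        return np.concatenate([[1.0], z_new]) @ gamma

In the spirit of the text, m and k would then be chosen by cross-validation over a grid of candidate values.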
8.5 Variable Selection in Regression Using Principal Components

Principal component regression, latent root regression, and other biased regression estimates keep all the predictor variables in the model, but change the estimates from least squares estimates in a way that reduces the effects of multicollinearity. As mentioned in the introductory section of this chapter, an alternative way of dealing with multicollinearity problems is to use only a subset of the predictor variables. Among the very many possible methods of selecting a subset of variables, a few use PCs.

As noted in the previous section, the procedures due to Hawkins (1973) and Hawkins and Eplett (1982) can be used in this way. Rotation of the PCs produces a large number of near-zero coefficients for the rotated variables, so that in low-variance relationships involving y (if such low-variance relationships exist) only a subset of the predictor variables will have coefficients substantially different from zero. This subset forms a plausible selection of variables to be included in a regression model. There may be other low-variance relationships between the predictor variables alone, again with relatively few coefficients far from zero. If such relationships exist, and involve some of the same variables as are in the relationship involving y, then substitution will lead to alternative subsets of predictor variables. Jeffers (1981) argues that in this way it is possible to identify all good subregressions using Hawkins’ (1973) original procedure. Hawkins and Eplett (1982) demonstrate that their newer technique, incorporating Cholesky factorization, can do even better than the earlier method. In particular, for an example that is analysed by both methods, two subsets of variables selected by the first method are shown to be inappropriate by the second.

Principal component regression and latent root regression may also be used in an iterative manner to select variables. Consider, first, PC regression and suppose that β̃ given by (8.1.12) is the proposed estimator for β. Then it is possible to test whether or not subsets of the elements of β̃ are significantly different from zero, and those variables whose coefficients are found to be not significantly non-zero can then be deleted from the model. Mansfield et al. (1977), after a moderate amount of algebra, construct the appropriate tests for estimators of the form (8.1.10), that is, where the PCs deleted from the regression are restricted to be those with the smallest variances. Provided that the true coefficients of the deleted PCs are zero and that normality assumptions are valid, the appropriate test statistics are F-statistics, reducing to t-statistics if only one variable is considered at a time. A corresponding result will also hold for the more general form of estimator (8.1.12).

Although the variable selection procedure could stop at this stage, it may be more fruitful to use an iterative procedure, similar to that suggested by Jolliffe (1972) for variable selection in another (non-regression) context (see Section 6.3, method (i)). The next step in such a procedure is to perform a PC regression on the reduced set of variables, and then see if any further variables can be deleted from the reduced set, using the same reasoning as before. This process is repeated, until eventually no more variables are deleted. Two variations on this iterative procedure are described by Mansfield et al. (1977). The first is a stepwise procedure that first looks for the best single variable to delete, then the best pair of variables, one of which is the best single variable, then the best triple of variables, which includes the best pair, and so on. The procedure stops when the test for zero regression coefficients on the subset of excluded variables first gives a significant result. The second variation is to delete only one variable at each stage, and then recompute the PCs using the reduced set of variables, rather than allowing the deletion of several variables before the PCs are recomputed.
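The second variation can be made concrete. The sketch below, which assumes numpy and centred data, retains the m largest-variance PCs as in (8.1.10), deletes at each stage the single variable whose coefficient has the smallest absolute t-statistic, and recomputes the PCs on the reduced set before testing again. The t-statistics use the standard variance expression for the PC regression estimator rather than the exact tests constructed by Mansfield et al. (1977), and the function names and the critical value t_crit are illustrative.

    import numpy as np

    def pcr_t_statistics(X, y, m):
        """t-statistics of PC regression coefficients using the m largest-variance PCs.

        X and y are assumed centred, so the estimator has the form (8.1.10):
        the PCs deleted from the regression are those with the smallest variances.
        """
        n, p = X.shape
        l, A = np.linalg.eigh(X.T @ X)             # eigenvalues/vectors of X'X
        order = np.argsort(-l)[:m]
        l, A = l[order], A[:, order]               # keep the m largest variances

        beta = A @ ((A.T @ X.T @ y) / l)           # PC regression estimator
        resid = y - X @ beta
        s2 = resid @ resid / (n - m - 1)           # residual variance estimate
        var_beta = s2 * (A**2 / l).sum(axis=1)     # diagonal of s2 * A diag(1/l) A'
        return beta / np.sqrt(var_beta)

    def iterative_pcr_selection(X, y, m, t_crit=2.0):
        """Delete one variable at a time until every remaining |t| exceeds t_crit."""
        X = X - X.mean(axis=0)
        y = y - y.mean()
        keep = list(range(X.shape[1]))
        while len(keep) > m:
            t = pcr_t_statistics(X[:, keep], y, m)
            worst = int(np.argmin(np.abs(t)))
            if abs(t[worst]) >= t_crit:            # all coefficients significant: stop
                break
            del keep[worst]                        # delete the variable, then recompute PCs
        return keep

In practice t_crit would be taken from the appropriate t distribution, and m itself might be reconsidered as variables are removed.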

