Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

8.5. Variable Selection in Regression Using Principal Components

According to Mansfield et al. (1977), this second variation gives, for several examples, an improved performance for the selected variables compared with subsets selected by the other possibilities. Only one example is described in detail in their paper, and this will be discussed further in the final section of the present chapter. In this example, they adapt their method still further by discarding a few low-variance PCs before attempting any selection of variables.

A different iterative procedure is described by Boneh and Mendieta (1994). The method works on standardized variables, and hence on the correlation matrix. The first step is to do a PC regression and choose M to contain those PCs that contribute significantly to the regression. Significance is judged mainly by the use of t-tests. However, a modification is used for the PCs with the smallest variance, as Mason and Gunst (1985) have shown that t-tests have reduced power for such PCs.

Each of the p predictor variables is then regressed on the PCs in M, and the variable with the smallest residual sum of squares is selected. At subsequent stages in the iteration, suppose that a set Q of q variables has been selected and that Q̄ is the complement of Q, consisting of (p − q) variables. The variables in Q̄ are individually regressed on all the variables in Q, and a vector of residuals is found for each variable in Q̄. Principal components are then found for the (p − q) residual variables, and the dependent variable y is regressed on these (p − q) PCs, together with the q variables in Q. If none of the PCs contributes significantly to this regression, the procedure stops. Otherwise, each of the residual variables is regressed on the significant PCs, and the variable for which the residual sum of squares is smallest is selected. As well as these forward selection steps, Boneh and Mendieta's (1994) procedure includes backward looks, in which previously selected variables can be deleted from (and never allowed to return to) Q. Deletion of a variable occurs if its contribution is sufficiently diminished by the later inclusion of other variables. Boneh and Mendieta (1994) claim that, using cross-validation, their method often does better than its competitors with respect to prediction error.

A similar procedure to that of Mansfield et al. (1977) for PC regression can be constructed for latent root regression, this time leading to approximate F-statistics (see Gunst and Mason (1980, p. 339)). Such a procedure is described and illustrated by Webster et al. (1974) and Gunst et al. (1976).

Baskerville and Toogood (1982) also suggest that the PCs appearing in latent root regression can be used to select subsets of the original predictor variables. Their procedure divides the predictor variables into four groups on the basis of their coefficients in the PCs, where each of the groups has a different degree of potential usefulness in the regression equation. The first group of predictor variables they define consists of 'isolated' variables, which are virtually uncorrelated with y and with all other predictor variables; such variables can clearly be deleted. The second and third groups contain variables that are involved in nonpredictive and predictive multi-
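The forward steps of Boneh and Mendieta's (1994) procedure described above can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not a reproduction of their method: it omits the backward "looks", the modification for the smallest-variance PCs, and any cross-validation; significance is judged by ordinary two-sided t-tests at a fixed level; and the function names, the level alpha, and the use of the SVD to obtain the PC scores are choices made here rather than taken from the original paper.

```python
import numpy as np
from scipy import stats


def significant_cols(Z, y, W=None, alpha=0.05):
    """Regress y on [intercept, Z, W]; return indices of the columns of Z
    whose coefficients pass a two-sided t-test at level alpha."""
    n, k = Z.shape
    blocks = [np.ones((n, 1)), Z]
    if W is not None and W.shape[1] > 0:
        blocks.append(W)
    X = np.hstack(blocks)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid @ resid) / dof
    cov = sigma2 * np.linalg.pinv(X.T @ X)
    t = beta[1:1 + k] / np.sqrt(np.diag(cov)[1:1 + k])
    pvals = 2.0 * stats.t.sf(np.abs(t), dof)
    return np.where(pvals < alpha)[0]


def residual_ss(Z, x):
    """Residual sum of squares from regressing x on the columns of Z."""
    X = np.column_stack([np.ones(len(x)), Z])
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    r = x - X @ beta
    return float(r @ r)


def forward_pc_selection(X, y, alpha=0.05):
    """Forward selection steps of a Boneh-Mendieta-style procedure (sketch)."""
    # Work on standardized variables, i.e. on the correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n, p = Xs.shape
    selected, remaining = [], list(range(p))

    while remaining:
        Q = Xs[:, selected] if selected else np.empty((n, 0))
        if selected:
            # Residuals of each unselected variable after regression on Q
            # (variables are centred, so no intercept is needed here).
            H = Q @ np.linalg.pinv(Q)
            R = Xs[:, remaining] - H @ Xs[:, remaining]
        else:
            # First pass: use the standardized variables themselves.
            R = Xs[:, remaining]

        # PC scores of the (residual) variables, dropping null components.
        _, s, Vt = np.linalg.svd(R, full_matrices=False)
        scores = (R @ Vt.T)[:, s > 1e-10]

        # Regress y on these PCs together with the selected variables;
        # stop when no PC coefficient is significant.
        sig = significant_cols(scores, y, W=Q, alpha=alpha)
        if sig.size == 0:
            break

        # Select the unselected variable best explained by the significant PCs.
        Zsig = scores[:, sig]
        rss = [residual_ss(Zsig, R[:, i]) for i in range(R.shape[1])]
        best = remaining[int(np.argmin(rss))]
        selected.append(best)
        remaining.remove(best)

    return selected
```

In use, forward_pc_selection(X, y) returns the indices of the selected columns of X. The backward "looks" of the full procedure would additionally re-test each previously selected variable's contribution after every inclusion and delete (permanently) any variable whose contribution has become negligible.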
