Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
8. Principal Components in Regression Analysis

called continuum regression, in which least squares and PC regression are two extremes of the class, and PLS lies halfway along the continuum in between them. As well as different algorithms and interpretations, PLS is sometimes known by a quite different name, albeit with the same acronym, in the field of statistical process control (see Section 13.7). Martin et al. (1999), for example, refer to it as 'projection to latent structure.'

Lang et al. (1998) define another general class of regression estimates, called cyclic subspace regression, which includes both PC regression and PLS as special cases. The nature of the special cases within this framework shows that PLS uses information from the directions of all the eigenvectors of X′X, whereas PC regression, by definition, uses information from only a chosen subset of these directions.

Naes and Helland (1993) propose a compromise between PC regression and PLS, which they call restricted principal component regression (RPCR). The motivation behind the method lies in the idea of components (where 'component' means any linear function of the predictor variables x) or subspaces that are 'relevant' for predicting y. An m-dimensional subspace M in the space of the predictor variables is strongly relevant if the linear functions of x defining the (p − m)-dimensional subspace M̄, orthogonal to M, are uncorrelated with y and with the linear functions of x defining M. Using this definition, if an m-dimensional relevant subspace exists it can be obtained by taking the first component found by PLS as the first component in this subspace, followed by (m − 1) components, which can be considered as PCs in the space orthogonal to the first PLS component. Naes and Helland (1993) show, in terms of predictive ability, that when PC regression and PLS differ considerably in performance, RPCR tends to be close to the better of the two. Asymptotic comparisons between PLS, RPCR and PC regression (with M restricted to contain the first m integers) are made by Helland and Almøy (1994). Their conclusions are that PLS is preferred in many circumstances, although in some cases PC regression is a better choice.

A number of other comparisons have been made between least squares, PC regression, PLS and other biased regression techniques, and adaptations involving one or more of the biased methods have been suggested. A substantial proportion of this literature is in chemometrics, in particular concentrating on the analysis of spectroscopic data. Naes et al. (1986) find that PLS tends to be superior to PC regression, although only the rule based on (8.1.10) is considered for PC regression. For near infrared spectroscopy data, the researchers also find that results are improved by pre-processing the data using an alternative technique, which they call multiple scatter correction, rather than simple centering. Frank and Friedman (1993) give an extensive comparative discussion of PLS and PC regression, together with other strategies for overcoming the problems caused by multicollinearity. From simulations and other considerations they conclude that the two techniques are superior to variable selection but inferior to
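The RPCR construction described above is concrete enough to sketch in code. The following is a minimal illustration, not Naes and Helland's own implementation: it assumes centred data, uses the standard weight w ∝ X′y for the first PLS direction (univariate y), and the function name rpcr_components is introduced here purely for the example.

import numpy as np

def rpcr_components(X, y, m):
    """Sketch of restricted PC regression (RPCR).

    First component: the leading PLS component (weight proportional
    to X'y). Remaining m - 1 components: principal components of X
    after the first PLS score has been projected out.
    """
    X = X - X.mean(axis=0)              # centre predictors
    y = y - y.mean()                    # centre response

    # First PLS direction and score for univariate y
    w1 = X.T @ y
    w1 /= np.linalg.norm(w1)
    t1 = X @ w1

    # Deflate X: remove the part of each column explained by t1,
    # leaving the subspace orthogonal to the first PLS component
    X1 = X - np.outer(t1, t1 @ X) / (t1 @ t1)

    # PCs of the deflated matrix via the SVD; rows of Vt are loadings
    _, _, Vt = np.linalg.svd(X1, full_matrices=False)
    T_pc = X1 @ Vt[: m - 1].T           # scores on the top m - 1 PCs

    # Regress y on all m components together
    T = np.column_stack([t1, T_pc])
    gamma, *_ = np.linalg.lstsq(T, y, rcond=None)
    return T, gamma

# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=50)
T, gamma = rpcr_components(X, y, m=3)
print(T.shape, gamma.shape)             # (50, 3) (3,)

With m = 1 the sketch reduces to one-component PLS, while replacing t1 by the leading PC score would give ordinary m-component PC regression; RPCR sits between the two, which matches the predictive behaviour reported by Naes and Helland (1993).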
