Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)

8.2. Selecting Components in Principal Component Regression

… positively correlated with each regressor variable, it is strongly correlated with the first PC. Hadi and Ling (1998) (see also Cuadras (1998)) define PC regression in terms of equation (8.1.10), and argue that the technique is flawed because predictive low-variance PCs may be excluded. With the more general definition of PC regression, based on (8.1.12), this criticism disappears.

In contrast to selection based solely on size of variance, the opposite extreme is to base selection only on the values of t-statistics measuring the (independent) contribution of each PC to the regression equation. This, too, has its pitfalls. Mason and Gunst (1985) showed that t-tests for low-variance PCs have reduced power compared with those for high-variance components, so low-variance PCs are less likely to be selected. A compromise between selection on the basis of variance and selection on the outcome of t-tests is to delete PCs sequentially, starting with the smallest variance, then the next smallest, and so on; deletion stops when the first significant t-value is reached (a sketch of this strategy is given below). Such a strategy is likely to retain more PCs than are really necessary.

Hill et al. (1977) give a comprehensive discussion of various, more sophisticated, strategies for deciding which PCs to delete from the regression equation. Their criteria are of two main types, depending on whether the primary objective is to get $\tilde{\beta}$ close to $\beta$, or to get $X\tilde{\beta}$, the estimate of $y$, close to $y$ or to $E(y)$. In the first case, estimation of $\beta$ is the main interest; in the second it is prediction of $y$ that is the chief concern. Whether or not $\tilde{\beta}$ is an improvement on $\hat{\beta}$ is determined, for several of the criteria, by looking at mean square error (MSE), so that variance and bias are both taken into account.

More specifically, two criteria of the first type are suggested, the 'weak' and 'strong' criteria. The weak criterion, due to Wallace (1972), prefers $\tilde{\beta}$ to $\hat{\beta}$ if $\mathrm{tr}[\mathrm{MSE}(\tilde{\beta})] \le \mathrm{tr}[\mathrm{MSE}(\hat{\beta})]$, where $\mathrm{MSE}(\tilde{\beta})$ is the matrix $E[(\tilde{\beta} - \beta)(\tilde{\beta} - \beta)']$, with a similar definition for the matrix $\mathrm{MSE}(\hat{\beta})$. This simply means that $\tilde{\beta}$ is preferred when the expected squared Euclidean distance between $\tilde{\beta}$ and $\beta$ is smaller than that between $\hat{\beta}$ and $\beta$.

The strong criterion insists that

$$\mathrm{MSE}(c'\tilde{\beta}) \le \mathrm{MSE}(c'\hat{\beta})$$

for every non-zero $p$-element vector $c$, where

$$\mathrm{MSE}(c'\tilde{\beta}) = E[(c'\tilde{\beta} - c'\beta)^2],$$

with, again, a similar definition for $\mathrm{MSE}(c'\hat{\beta})$ (an equivalent matrix form of this criterion is derived below).

Among the criteria of the second type (where prediction of $y$, rather than estimation of $\beta$, is the main concern) considered by Hill et al. (1977), there are again two that use MSE. The first is also due to Wallace (1972) and is again termed a 'weak' criterion. It prefers $\tilde{\beta}$ to $\hat{\beta}$ if

$$E[(X\tilde{\beta} - X\beta)'(X\tilde{\beta} - X\beta)] \le E[(X\hat{\beta} - X\beta)'(X\hat{\beta} - X\beta)],$$
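To make the sequential-deletion compromise concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not code from the book: the function name, the significance level, the degrees of freedom convention, and the simulated data are all assumptions. The sketch exploits the orthogonality of PC scores, so the coefficient of $y$ on each PC and its t-statistic can be computed component by component; PCs are then deleted from the smallest variance upwards until the first significant t-value is met.

```python
import numpy as np
from scipy import stats

def pcr_sequential_deletion(X, y, alpha=0.05):
    """PC regression with sequential deletion: drop PCs from the
    smallest variance upwards, stopping at the first significant
    t-statistic (illustrative names and defaults)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # centre the regressors
    yc = y - y.mean()                    # centre the response
    # SVD of the centred data: columns of Vt.T are the PC loadings,
    # Z = Xc @ Vt.T holds the PC scores, with Z'Z = diag(s**2).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T
    # Scores are orthogonal, so the least squares coefficients of y
    # on the PCs are obtained one component at a time.
    gamma = (Z.T @ yc) / s**2
    resid = yc - Z @ gamma
    sigma2 = resid @ resid / (n - p - 1)  # residual variance, full model
    # t-statistic for each PC coefficient: se(gamma_k) = sigma / s_k.
    t_stats = gamma * s / np.sqrt(sigma2)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, n - p - 1)
    # Walk from the smallest-variance PC (last column) upwards; stop
    # at the first significant t-value, deleting everything below it.
    keep = p
    for k in range(p - 1, -1, -1):
        if abs(t_stats[k]) >= t_crit:
            break
        keep = k
    # Transform the retained PC coefficients back to the original
    # regressors to obtain beta_tilde.
    beta_tilde = Vt.T[:, :keep] @ gamma[:keep]
    return beta_tilde, keep

# Illustrative use on simulated, nearly collinear data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)  # near-collinear column
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=100)
beta_tilde, n_kept = pcr_sequential_deletion(X, y)
print(n_kept, beta_tilde)
```

As the text warns, this rule retains every PC with variance at least as large as that of the first significant one, whether or not the intervening components are themselves significant, which is why it tends to keep more PCs than are really necessary.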

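One step the text leaves implicit is why the strong criterion is a condition on the MSE matrices themselves. The following derivation is standard and is added here for clarity; it is not part of the original page. Since $c'\tilde{\beta} - c'\beta = c'(\tilde{\beta} - \beta)$,

$$\mathrm{MSE}(c'\tilde{\beta}) = E\big[\big(c'(\tilde{\beta}-\beta)\big)^2\big] = c'\,E\big[(\tilde{\beta}-\beta)(\tilde{\beta}-\beta)'\big]\,c = c'\,\mathrm{MSE}(\tilde{\beta})\,c,$$

and likewise for $\hat{\beta}$, so the strong criterion holds if and only if

$$c'\big[\mathrm{MSE}(\hat{\beta}) - \mathrm{MSE}(\tilde{\beta})\big]\,c \ge 0 \quad \text{for all } c \ne 0,$$

that is, if and only if the difference $\mathrm{MSE}(\hat{\beta}) - \mathrm{MSE}(\tilde{\beta})$ is positive semi-definite.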