12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



examples given by Tanaka and Mori (1997) the decision on when to stop deleting variables appears to be rather subjective.

Mori et al. (1999) propose that the subsets selected in modified PCA are also assessed by means of a PRESS criterion, similar to that defined in equation (6.1.3), except that ${}_m\tilde{x}_{ij}$ is replaced by the prediction of $x_{ij}$ found from modified PCA with the ith observation omitted. Mori et al. (2000) demonstrate a procedure in which the PRESS criterion is used directly to select variables, rather than as a supplement to another criterion. Tanaka and Mori (1997) show how to evaluate the influence of variables on parameters in a PCA (see Section 10.2 for more on influence), and Mori et al. (2000) implement and illustrate a backward-elimination variable selection algorithm in which variables with the smallest influence are successively removed.

Hawkins and Eplett (1982) describe a method which can be used for selecting a subset of variables in regression; their technique and an earlier one introduced by Hawkins (1973) are discussed in Sections 8.4 and 8.5. Hawkins and Eplett (1982) note that their method is also potentially useful for selecting a subset of variables in situations other than multiple regression, but, as with the RV-coefficient, no numerical example is given in the original paper. Krzanowski (1987a,b) describes a methodology, using principal components together with Procrustes rotation, for selecting subsets of variables. As his main objective is preserving ‘structure’ such as groups in the data, we postpone detailed discussion of his technique until Section 9.2.2.

6.4 Examples Illustrating Variable Selection

Two examples are presented here; two other relevant examples are given in Section 8.7.

6.4.1 Alate adelges (Winged Aphids)

These data were first presented by Jeffers (1967) and comprise 19 different variables measured on 40 winged aphids. A description of the variables, together with the correlation matrix and the coefficients of the first four PCs based on the correlation matrix, is given by Jeffers (1967) and will not be reproduced here. For 17 of the 19 variables all of the correlation coefficients are positive, reflecting the fact that 12 variables are lengths or breadths of parts of each individual, and some of the other (discrete) variables also measure aspects of the size of each aphid. Not surprisingly, the first PC based on the correlation matrix accounts for a large proportion (73.0%) of the total variation, and this PC is a measure of overall size of each aphid. The second PC, accounting for 12.5% of total variation, has its
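
The percentages of total variation quoted in this example come from the eigenvalues of the correlation matrix: each eigenvalue divided by the number of variables gives the share of variation for the corresponding PC. The following is a minimal sketch in Python with NumPy, under stated assumptions. The data are synthetic stand-ins (a common 'size' factor plus noise on 6 variables), not the Jeffers (1967) aphid measurements, and the function name `pc_variance_proportions` is hypothetical.

```python
import numpy as np

def pc_variance_proportions(X):
    """PCA on the correlation matrix of X (rows = observations).

    Returns the eigenvalues in descending order, the proportion of total
    variation accounted for by each PC, and the matching eigenvectors.
    """
    R = np.corrcoef(X, rowvar=False)        # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    proportions = eigvals / eigvals.sum()   # the sum equals p for a correlation matrix
    return eigvals, proportions, eigvecs

# Synthetic data (an assumption for illustration only): 40 observations on
# 6 variables that all share one underlying 'size' factor.
rng = np.random.default_rng(0)
size = rng.normal(size=(40, 1))
X = size + 0.5 * rng.normal(size=(40, 6))

eigvals, proportions, eigvecs = pc_variance_proportions(X)

# Eigenvector signs are arbitrary; orient the first PC so its coefficients
# sum to a positive value.
first_pc = eigvecs[:, 0] * np.sign(eigvecs[:, 0].sum())

print("Proportion of total variation per PC:", np.round(proportions, 3))
print("First PC coefficients:", np.round(first_pc, 3))
```

When all variables are positively correlated, as in this synthetic setup, the first PC has coefficients of the same sign on every variable, which is why such a component is read as a measure of overall size.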
