Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)

8.7. Examples of Principal Components in Regression

…could be measured fairly straightforwardly without destroying the props. The variables are listed by Jeffers (1967, 1981) and the correlation matrix for all 14 variables is reproduced in Table 8.2. In his original paper, Jeffers (1967) used PC regression to predict y from the 13 variables. The coefficients of the variables for each of the PCs are given in Table 8.3. The pattern of correlations in Table 8.2 is not easy to interpret; nor is it simple to deduce the form of the first few PCs from the correlation matrix. However, Jeffers (1967) was able to interpret the first six PCs.

Also given in Table 8.3 are the variances of each component, the percentage of total variation accounted for by each component, the coefficients γ_k in a regression of y on the PCs, and the values of t-statistics measuring the importance of each PC in the regression.

Judged solely on the basis of size of variance, it appears that the last three, or possibly four, PCs should be deleted from the regression. However, looking at the values of γ_k and the corresponding t-statistics, it can be seen that the twelfth component is relatively important as a predictor of y, despite the fact that it accounts for only 0.3% of the total variation in the predictor variables. Jeffers (1967) retained only the first, second, third, fifth and sixth PCs in his regression equation, whereas Mardia et al. (1979, p. 246) suggest that the seventh, eighth and twelfth PCs should also be included.

This example has been used by various authors to illustrate techniques of variable selection, and some of the results are given in Table 8.4. Jeffers (1981) used Hawkins' (1973) variant of latent root regression to select subsets of five, six or seven regressor variables. After varimax rotation, only one of the rotated components has a substantial coefficient for compressive strength, y. This rotated component has five other variables with large coefficients, and it is suggested that these should be included in the regression equation for y; two further variables with moderate coefficients might also be included. One of the five variables definitely selected by this method is quite difficult to measure, and one of the other rotated components suggests that it can be replaced by another, more readily measured, variable. However, this substitution causes a substantial drop in the squared multiple correlation for the five-variable regression equation, from 0.695 to 0.581.

Mansfield et al. (1977) used an iterative method, based on PC regression and described above in Section 8.5, to select a subset of variables for these data. The procedure is fairly lengthy, as only one variable is deleted at each iteration, but the F-criterion used to decide whether to delete an extra variable jumps from 1.1 to 7.4 between the fifth and sixth iterations, giving a clear-cut decision to delete five variables, that is, to retain eight variables. The iterative procedure of Boneh and Mendieta (1994) also selects eight variables. As can be seen from Table 8.4, these eight-variable subsets have a large degree of overlap with the subsets found by Jeffers (1981).

Jolliffe (1973) also found subsets of the 13 variables, using various methods, but the variables in this case were chosen to reproduce the relationships between the regressor variables, rather than to predict y as well as possible.
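To make the calculations behind Table 8.3 concrete, the following is a minimal sketch in Python of PC regression with per-component t-statistics. It uses synthetic data, since the pitprop correlation matrix of Table 8.2 is not reproduced here, and all names (n, p, gamma, t) are illustrative rather than taken from Jeffers (1967).

    import numpy as np

    # Synthetic stand-in for the pitprop data: n observations, p predictors.
    rng = np.random.default_rng(0)
    n, p = 180, 13
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + rng.standard_normal(n)

    # Standardize the predictors, since the PCs come from the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCs from the eigendecomposition of the correlation matrix,
    # sorted by decreasing variance.
    eigvals, A = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, A = eigvals[order], A[:, order]
    W = Z @ A                      # component scores (columns are uncorrelated)

    # Because the score columns are orthogonal, each gamma_k in the
    # regression of y on the PCs can be estimated independently.
    ssw = (W ** 2).sum(axis=0)
    gamma = (W.T @ (y - y.mean())) / ssw

    # Usual OLS t-statistics for the gamma_k.
    resid = y - y.mean() - W @ gamma
    s2 = (resid ** 2).sum() / (n - p - 1)
    t = gamma / np.sqrt(s2 / ssw)

    # Analogues of the Table 8.3 columns: variance, % of total variation,
    # gamma_k, and t-statistic for each component.
    for k in range(p):
        print(f"PC{k + 1:2d}: var = {eigvals[k]:6.3f} "
              f"({100 * eigvals[k] / p:5.1f}%), "
              f"gamma = {gamma[k]:7.3f}, t = {t[k]:6.2f}")

Comparing the ordering of components by variance with their ordering by |t| makes the point of the example directly: the two orderings need not agree, which is how a component explaining only 0.3% of the predictor variation can still earn a significant t-statistic as a predictor of y.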
