Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

8. Principal Components in Regression Analysis

Table 8.6. Eigenvalues of the correlation matrix and order of importance in predicting y for the household formation data.

PC number  Eigenvalue  Order of importance   PC number  Eigenvalue  Order of importance
                       in predicting y                              in predicting y
    1        8.62        1                      15        0.24       17
    2        6.09        4                      16        0.21       25
    3        3.40        2                      17        0.18       16
    4        2.30        8                      18        0.14       10
    5        1.19        9                      19        0.14        7
    6        1.06        3                      20        0.10       21
    7        0.78       13                      21        0.10       28
    8        0.69       22                      22        0.07        6
    9        0.58       20                      23        0.07       18
   10        0.57        5                      24        0.05       12
   11        0.46       11                      25        0.04       14
   12        0.36       15                      26        0.03       27
   13        0.27       24                      27        0.02       19
   14        0.25       23                      28        0.003      26

Although this was not the purpose of the original project, the objective considered here is to predict the final variable (average annual total income per adult) from the other 28. This objective is a useful one, as information on income is often difficult to obtain accurately, and predictions from other, more readily available, variables would be valuable. The results presented below were given by Garnham (1979) in an unpublished M.Sc. dissertation, and further details of the regression analysis can be found in that source. A full description of the project from which the data are taken is available in Bassett et al. (1980).

Most regression problems with as many as 28 regressor variables have multicollinearities, and the current example is no exception. Looking at the list of variables in Table 8.5 it is clear, even without detailed definitions, that there are groups of variables that are likely to be highly correlated. For example, several variables relate to type of household, whereas another group of variables considers rates of employment in various types of job. Table 8.6, giving the eigenvalues of the correlation matrix, confirms that there are multicollinearities; some of the eigenvalues are very small.

Consider now PC regression and some of the strategies that can be used to select a subset of PCs to be included in the regression.
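As a concrete illustration of one such strategy, the following is a minimal sketch of PC regression with the small-variance deletion rule: standardize the regressors, take the eigendecomposition of their correlation matrix, discard PCs whose eigenvalues fall below a cut-off (0.10 is used here, in line with the discussion below), and regress y on the retained PC scores. The data are synthetic, not the household formation data of the chapter.

```python
# Sketch of PC regression with a small-variance cut-off on the eigenvalues
# of the correlation matrix. Synthetic data, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=n)   # induce a multicollinearity
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Standardize X so that Z'Z/(n-1) is the sample correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = Z.T @ Z / (n - 1)

# Eigenvalues and eigenvectors of the correlation matrix,
# sorted by decreasing variance.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Delete PCs whose variance falls below the cut-off l* = 0.10.
keep = eigvals >= 0.10
W = Z @ eigvecs[:, keep]                        # scores of the retained PCs

# Least-squares regression of y on the retained PCs (plus an intercept),
# and the resulting squared multiple correlation coefficient R^2.
Wc = np.column_stack([np.ones(n), W])
coef, *_ = np.linalg.lstsq(Wc, y, rcond=None)
fitted = Wc @ coef
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"retained {keep.sum()} of {p} PCs, R^2 = {r2:.3f}")
```

In this toy example the near-duplicate pair of columns produces one very small eigenvalue, so exactly one PC is discarded; as the chapter's discussion makes clear, deleting by variance alone ignores how well each PC predicts y, which is why Table 8.6 also reports an order of importance in predicting y.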
Deleting components with small variance, with a cut-off of about l* = 0.10, implies that between seven and nine components can be left out. Sequential deletion of PCs with the smallest variances using t-statistics at each stage suggests that only six PCs can be deleted. However, from the point of view of R², the squared multiple correlation coefficient, deletion of eight or more might
