12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


4.2. The Elderly at Home

Table 4.3. Variables used in the PCA for the elderly at home.

 1. Age                                        11. Separate kitchen
 2. Sex                                        12. Hot water
 3. Marital status                             13. Car or van ownership
 4. Employed                                   14. Number of elderly in household
 5. Birthplace                                 15. Owner occupier
 6. Father’s birthplace                        16. Council tenant
 7. Length of residence in present household   17. Private tenant
 8. Density: persons per room                  18. Lives alone
 9. Lavatory                                   19. Lives with spouse or sibling
10. Bathroom                                   20. Lives with younger generation

inferential techniques that depend on assumptions such as multivariate normality (see Section 3.7) are not invoked, there is no real necessity for the variables to have any particular distribution. Admittedly, correlations or covariances, on which PCs are based, have particular relevance for normal random variables, but they are still valid for discrete variables provided that the possible values of the discrete variables have a genuine interpretation. Variables should not be defined with more than two possible values, unless the values have a valid meaning relative to each other. If 0, 1, 3 are possible values for a variable, then the values 1 and 3 must really be twice as far apart as the values 0 and 1. Further discussion of PCA and related techniques for discrete variables is given in Section 13.1.

It is widely accepted that old people who have only just passed retirement age are different from the ‘very old,’ so that it might be misleading to deal with all 2622 individuals together. Hunt (1978), too, recognized possible differences between age groups by taking a larger proportion of elderly whose age was 75 or over in her sample, compared to those between 65 and 74, than is present in the population as a whole. It was therefore decided to analyse the two age groups 65–74 and 75+ separately, and part of each analysis consisted of a PCA on the correlation matrices for the 20 variables listed in Table 4.3. It would certainly not be appropriate to use the covariance matrix here, where the variables are of several different types.

It turned out that for both age groups as many as 11 PCs could be reasonably well interpreted, in the sense that not too many coefficients were far from zero. Because there are relatively few strong correlations among the 20 variables, the effective dimensionality of the 20 variables is around 10 or 11, a much less substantial reduction than occurs when there are large correlations between most of the variables (see Sections 4.3 and 6.4, for example). Eleven PCs accounted for 85.0% and 86.6% of the total variation for the 65–74 and 75+ age groups, respectively.
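The contrast between correlation-based and covariance-based PCA for variables of mixed types can be illustrated with a short Python sketch. The data below are synthetic stand-ins, not the survey data discussed above; the function name correlation_pca, the sample size, and the way the 20 weakly correlated variables are generated are all assumptions made purely for illustration, so the printed percentages will not match the 85.0% and 86.6% reported in the text.

```python
import numpy as np

def correlation_pca(X):
    """PCA on the correlation matrix of X (rows = individuals, columns = variables).

    Returns eigenvalues (largest first), the matching eigenvectors as columns,
    and the cumulative proportion of total variation.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, eigvecs, cum_share

# Synthetic data (an assumption, NOT the survey): 20 weakly correlated variables,
# a mix of continuous and 0/1 indicator columns on very different scales.
rng = np.random.default_rng(0)
n, p = 1000, 20
common = rng.normal(size=(n, 1))
Z = 0.4 * common + rng.normal(size=(n, p))
X = Z.copy()
X[:, 10:] = (Z[:, 10:] > 0).astype(float)    # last ten columns become binary
X[:, 0] = 70 + 5 * Z[:, 0]                   # an age-like column in years

eigvals, eigvecs, cum_share = correlation_pca(X)
print("proportion of variation in first 11 PCs:", round(float(cum_share[10]), 3))

# Changing the units of one variable leaves the correlation-based PCA unchanged.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 12.0                     # e.g. years converted to months
eigvals_rescaled, _, _ = correlation_pca(X_rescaled)

# With the covariance matrix, the variable with the largest variance dominates.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
cov_share_first = cov_eigvals[-1] / cov_eigvals.sum()
print("covariance PCA, share of first PC:", round(float(cov_share_first), 3))
```

Because each standardized variable contributes unit variance, the eigenvalues of the correlation matrix sum to the number of variables (20 here), and weak correlations spread that total fairly evenly across many components. In the covariance-based version the first component mostly reflects whichever variable happens to be measured on the largest scale.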
