12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

11.3. Simplified Approximations to Principal Components

to integer values. The difference is that the latter methods provide simpler alternatives to PCA, whereas Bibby’s (1980) suggestion approximates the PCs. Both types of simplification increase the interpretability of individual components, but comparisons between components are more difficult when integers are used, because different components have different values of $a_k' a_k$.

It is possible to test whether a single simplified (rounded or otherwise) PC is a plausible ‘population’ PC using the result in equation (3.7.5) (see Jackson (1991, Section 7.4)) but the lack of orthogonality between a set of simplified PCs means that it is not possible for them to simultaneously represent population PCs.

Green (1977) investigates a different effect of rounding in PCA. Instead of looking at the direct impact on the PCs, he looks at the proportions of variance accounted for in each individual variable by the first m PCs, and examines by how much these proportions are reduced by rounding. He concludes that changes due to rounding are small, even for quite severe rounding, and recommends rounding to the nearest 0.1 or even 0.2, as this will increase interpretability with little effect on other aspects of the analysis.

It is fairly common practice in interpreting a PC to ignore (set to zero), either consciously or subconsciously, the variables whose coefficients have the smallest absolute values for that principal component. A second stage then focuses on these ‘truncated’ components to see whether the pattern of non-truncated coefficients can be interpreted as a simple weighted average or contrast for the non-ignored variables. Cadima and Jolliffe (1995) show that the first ‘truncation’ step does not always do what might be expected and should be undertaken with caution.
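The kind of comparison attributed to Green (1977) above can be sketched numerically. The following is a minimal illustration under stated assumptions, not Green's own procedure: the data are synthetic, the helper name `variance_explained_per_variable` is hypothetical, and "proportion of variance accounted for in each variable" is taken to mean the R² from regressing that variable on the scores of the first m (possibly rounded) components.

```python
import numpy as np

def variance_explained_per_variable(X, A):
    """R^2 of each column of X when regressed on the component scores X @ A."""
    Xc = X - X.mean(axis=0)
    Z = Xc @ A                                   # component scores, shape (n, m)
    coef, *_ = np.linalg.lstsq(Z, Xc, rcond=None)
    resid = Xc - Z @ coef
    return 1.0 - (resid ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)

# Synthetic data (an assumption for illustration): two latent factors plus noise.
rng = np.random.default_rng(0)
n, p, m = 200, 6, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize: correlation-matrix PCA

# Loadings of the first m PCs from the eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
A = eigvecs[:, np.argsort(eigvals)[::-1][:m]]

exact = variance_explained_per_variable(X, A)
for step in (0.1, 0.2):
    A_rounded = np.round(A / step) * step        # round loadings to the nearest `step`
    rounded = variance_explained_per_variable(X, A_rounded)
    print(f"step={step}: reduction per variable =", np.round(exact - rounded, 4))
```

Because the variables are standardized, the summed proportions for the exact PCs cannot be smaller than those for the rounded components; for an individual variable the change can go either way.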
In particular, the truncation step can be considered as choosing a subset of variables (those not truncated) with which to approximate a PC. Cadima and Jolliffe (1995) show that

  • for the chosen subset of variables, the linear combination given by the coefficients in the untruncated PC may be far from the optimal linear approximation to that PC using those variables;
  • a different subset of the same size may provide a better approximation.

As an illustration of the first point, consider an example given by Cadima and Jolliffe (1995) using data presented by Lebart et al. (1982). In this example there are seven variables measuring yearly expenditure of groups of French families on 7 types of foodstuff. The loadings on the variables in the second PC to two decimal places are 0.58, 0.41, −0.10, −0.11, −0.24, 0.63 and 0.14, so a truncated version of this component is

$\hat{z}_2 = 0.58x_1 + 0.41x_2 + 0.63x_6.$

Thus, PC2 can be interpreted as a weighted average of expenditure on
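The two points made by Cadima and Jolliffe (1995) can be checked numerically on any data set. The sketch below is an illustration under stated assumptions: it uses synthetic data rather than the Lebart et al. (1982) expenditure data, which are not reproduced here, and the helper `multiple_correlation` is a hypothetical name. It compares (i) the correlation between PC2 and its truncated version, (ii) the best least-squares approximation to PC2 using the same retained variables, and (iii) the best approximation over all subsets of the same size.

```python
import itertools
import numpy as np

def multiple_correlation(y, Xsub):
    """Correlation between y and its least-squares fit on the columns of Xsub."""
    coef, *_ = np.linalg.lstsq(Xsub, y, rcond=None)
    return np.corrcoef(y, Xsub @ coef)[0, 1]

# Synthetic correlated data (an assumption): 7 variables, keep k = 3 of them.
rng = np.random.default_rng(1)
n, p, k = 300, 7, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # centered and scaled columns

# Loadings and scores of the second PC.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
a2 = eigvecs[:, np.argsort(eigvals)[::-1][1]]
z2 = X @ a2

# Truncation: keep the k loadings of largest absolute value, zero the rest.
kept = np.sort(np.argsort(np.abs(a2))[-k:])
a2_trunc = np.zeros(p)
a2_trunc[kept] = a2[kept]

r_truncated = np.corrcoef(z2, X @ a2_trunc)[0, 1]   # truncated PC vs. PC2
r_optimal = multiple_correlation(z2, X[:, kept])    # best fit on the same subset

# Exhaustive search over all subsets of size k (35 subsets for p = 7, k = 3).
best_subset, r_best = max(
    ((s, multiple_correlation(z2, X[:, list(s)]))
     for s in itertools.combinations(range(p), k)),
    key=lambda pair: pair[1],
)

print("kept variables:", kept, "r_truncated:", r_truncated, "r_optimal:", r_optimal)
print("best subset:", best_subset, "r_best:", r_best)
```

By construction `r_optimal` is at least `r_truncated`, and `r_best` is at least `r_optimal`; how large the gaps are depends on the data, which is the reason truncation is described as needing caution.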
