Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


11. Rotation and Interpretation of Principal Components

In practice, it has been found that c = 0 usually gives the best balance between simplicity and retention of variance. Examples of this technique's application are now given.

Mediterranean SST

Returning to the Mediterranean SST example of Section 11.1.2, Figures 11.2–11.5 show that for autumn the simple components using c = 0 are very simple indeed. In the figures the normalization a′_k a_k = 1 is used to aid comparisons between methods, but when converted to integers all coefficients in the first simple component are equal to 1. The second component is a straightforward contrast between east and west, with all coefficients equal to +1 or −1. The results for winter are slightly less simple. The first simple component has four grid boxes with coefficients equal to 2 and the remaining 12 coefficients equal to 1, and the second component has coefficients proportional to 3, 4, 5 and 6 in absolute value. The first two simple components account for 70.1% and 67.4% of total variation in autumn and winter, respectively, compared to 78.2% and 71.0% for the first two PCs.

Pitprops

Here we revisit the pitprop data, originally analysed by Jeffers (1967) and discussed in Section 8.7.1. Tables 11.3 and 11.4 give the coefficients and cumulative variance for the first and fourth simple components for these data. Also given in the tables is corresponding information for SCoT and SCoTLASS. The first simple component is, as in the SST example, very simple, with all its coefficients proportional to +1, 0 or −1. Its loss of variance compared to the first PC is non-trivial, though not large. The second component (not shown) is also simple, the third (also not shown) is less so, and the fourth (Table 11.4) is by no means simple, reflecting the pattern that higher variance simple components are simpler than later ones.
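The central idea of a simple component, replacing a PC's loadings with values proportional to small integers such as +1, 0 and −1 at some cost in variance, can be illustrated with a toy sketch. Note this is not the actual simple-components algorithm with its tuning parameter c; the crude sign-thresholding rule and the simulated data below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] += X[:, 0]          # induce some correlation between variables
X -= X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                    # leading PC loading vector, unit norm

# Crude "simplification" (hypothetical rule, not Vines' method): replace
# each loading by its sign if non-negligible, zero otherwise, then
# renormalize so the vector again satisfies a'a = 1.
simple = np.sign(pc1) * (np.abs(pc1) > 0.2)
simple = simple / np.linalg.norm(simple)

var_pc = pc1 @ S @ pc1            # variance of the true first PC
var_simple = simple @ S @ simple  # variance retained by the simple version
print(f"variance retained: {100 * var_simple / var_pc:.1f}% of PC1")
```

Because the first PC maximizes variance over all unit-norm loading vectors, the simplified vector necessarily retains less variance; the question studied in this section is how small that loss can be kept while gaining interpretability.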
The cumulative loss of variance over 4 components compared to PCA is similar to that over 2 components in the SST example.

11.2.2 Components Based on the LASSO

Tibshirani (1996) considers the difficulties associated with interpreting multiple regression equations with many predictor variables. As discussed in Chapter 8, these problems may occur due to the instability of the regression coefficients in the presence of collinearity, or may simply be a consequence of the large number of variables included in the regression equation. Alternatives to least squares regression that tackle the instability are of two main types. Biased regression methods such as PC regression keep all variables in the regression equation but typically shrink some of the regression coefficients towards zero (see Section 8.3). On the other hand, variable selection procedures (Section 8.5) choose a subset of variables and keep only
