12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


5.3. Biplots

Table 5.2. First two PCs: 100 km running data.

                                           Component 1   Component 2
Coefficients
  First 10 km                                 −0.30          0.45
  Second 10 km                                −0.30          0.45
  Third 10 km                                 −0.33          0.34
  Fourth 10 km                                −0.34          0.20
  Fifth 10 km                                 −0.34         −0.06
  Sixth 10 km                                 −0.35         −0.16
  Seventh 10 km                               −0.31         −0.27
  Eighth 10 km                                −0.31         −0.30
  Ninth 10 km                                 −0.31         −0.29
  Tenth 10 km                                 −0.27         −0.40
Eigenvalue                                     7.24          1.28
Cumulative percentage of total variation       72.4          85.3

The first PC measures the overall speed of the runners, and the second contrasts those runners who slow down substantially during the course of the race with those runners who maintain a more even pace. Together, the first two PCs account for more than 85% of the total variation in the data.

The adapted α = 0 biplot for these data is shown in Figure 5.4. As with the previous example, the plot using α = 1 is not very satisfactory because the vectors corresponding to the variables are all very close to the centre of the plot. Figure 5.4 shows that with α = 0 we have the opposite extreme: the vectors corresponding to the variables and the points corresponding to the observations are completely separated. As a compromise, Figure 5.5 gives the biplot with α = 1/2, which at least has approximately the same degree of spread for variables and observations. As with α = 0, the plot has been modified from the straightforward factorization corresponding to α = 1/2. The g_i have been multiplied, and the h_j divided, by (n − 1)^{1/4}, so that we have a compromise between α = 1 and the adapted version of α = 0. The adapted plot with α = 1/2 is still not entirely satisfactory, but even an arbitrary rescaling of observations and/or variables, as suggested by Digby and Kempton (1987, Section 3.2), would still have all the vectors corresponding to variables within a very narrow sector of the plot. This is unavoidable for data that, as in the present case, have large correlations between all variables.
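The relation between the eigenvalues and the cumulative percentages in Table 5.2 can be checked with a few lines of arithmetic. This is a rough sketch that takes the first two eigenvalues to be 7.24 and 1.28 and assumes, although the passage does not say so, that the PCA was done on the correlation matrix of the ten split times, so that the total variation is 10. The variable names are illustrative only.

```python
# First two eigenvalues, taken from Table 5.2.
eigenvalues = [7.24, 1.28]

# Assumption: correlation-matrix PCA of 10 standardized variables,
# so the eigenvalues of all 10 PCs sum to 10.
total_variance = 10

cumulative = []
running = 0.0
for ev in eigenvalues:
    running += ev
    cumulative.append(100 * running / total_variance)

print([round(c, 1) for c in cumulative])  # prints [72.4, 85.2]
```

Under that assumption the first PC alone accounts for 72.4% of the variation. The second figure comes out at 85.2 rather than the tabulated 85.3, presumably because the eigenvalues in the table are themselves rounded to two decimals.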
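The effect of the different choices of α, and of the (n − 1) rescalings, can be explored numerically. The following is a minimal NumPy sketch, not the author's code. It assumes the usual SVD-based biplot factorization X = U L A′ with G = U L^α and H = A L^(1−α), which is not spelled out in this passage. The function name biplot_factors, its rescale parameter and the synthetic, highly correlated data (standing in for the 100 km times, which are not reproduced here) are all hypothetical.

```python
import numpy as np


def biplot_factors(X, alpha, rescale=1.0, m=2):
    """Return (G, H) whose product G @ H.T is the rank-m approximation
    of the column-centred X. Rows of G are the g_i (observations),
    rows of H are the h_j (variables)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :m] * s[:m] ** alpha            # G = U L^alpha
    H = Vt[:m].T * s[:m] ** (1.0 - alpha)    # H = A L^(1 - alpha)
    # Multiplying G and dividing H by the same constant leaves G @ H.T unchanged.
    return G * rescale, H / rescale


# Hypothetical data: 80 observations on 10 strongly correlated variables.
rng = np.random.default_rng(0)
n, p = 80, 10
X = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns

settings = [
    ("alpha = 1", 1.0, 1.0),
    ("alpha = 0, adapted", 0.0, (n - 1) ** 0.5),
    ("alpha = 1/2, adapted", 0.5, (n - 1) ** 0.25),
]
for label, alpha, c in settings:
    G, H = biplot_factors(X, alpha, rescale=c)
    g_len = np.linalg.norm(G, axis=1).mean()
    h_len = np.linalg.norm(H, axis=1).mean()
    print(f"{label:22s} mean |g_i| = {g_len:.2f}   mean |h_j| = {h_len:.2f}")
```

Comparing the mean lengths of the g_i and h_j for each setting gives a rough idea of how far the observation points and variable vectors sit from the origin. When all m = p components are kept, the adapted α = 0 version satisfies H @ H.T equal to the sample covariance matrix, which is why angles between the h_j reflect correlations between variables.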
The tight bunching of the vectors simply reflects large correlations, but it is interesting to note that the ordering of the vectors around their sector corresponds almost exactly to their position within the race. (The ordering is the same for both diagrams, but to avoid congestion, this fact has not been indicated on Figure 5.5.) With hindsight, this is not surprising as times in one part of the race are
