12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

5. Graphical Representation of Data Using Principal Components

interobservation dissimilarities or distances, and do not consider directly the elements of $\mathbf{X}$ or $\mathbf{S}$.

We have now seen readily interpretable properties of both the $\mathbf{g}_i^*$ and the $\mathbf{h}_j^*$ separately for the biplot when $\alpha = 0$, but there is a further property, valid for any value of $\alpha$, which shows that the plots of the $\mathbf{g}_i^*$ and $\mathbf{h}_j^*$ can be usefully superimposed rather than simply considered separately.

From the relationship $x_{ij} = \mathbf{g}_i'\mathbf{h}_j$, it follows that $x_{ij}$ is represented by the projection of $\mathbf{g}_i$ onto $\mathbf{h}_j$. Remembering that $x_{ij}$ is the value for the $i$th observation of the $j$th variable measured about its sample mean, values of $x_{ij}$ close to zero, which correspond to observations close to the sample mean of the $j$th variable, will only be achieved if $\mathbf{g}_i$ and $\mathbf{h}_j$ are nearly orthogonal. Conversely, observations for which $x_{ij}$ is a long way from zero will have $\mathbf{g}_i$ lying in a similar direction to $\mathbf{h}_j$. The relative positions of the points defined by the $\mathbf{g}_i$ and $\mathbf{h}_j$, or their approximations in two dimensions, the $\mathbf{g}_i^*$ and $\mathbf{h}_j^*$, will therefore give information about which observations take large, average and small values on each variable.

Turning to the biplot with $\alpha = 1$, the properties relating to $\mathbf{g}_i$ and $\mathbf{h}_j$ separately are different from those for $\alpha = 0$. With $\alpha = 1$ we have

\[ \mathbf{G} = \mathbf{U}\mathbf{L}, \qquad \mathbf{H}' = \mathbf{A}', \]

and instead of $(\mathbf{g}_h - \mathbf{g}_i)'(\mathbf{g}_h - \mathbf{g}_i)$ being proportional to the Mahalanobis distance between $\mathbf{x}_h$ and $\mathbf{x}_i$, it is now equal to the Euclidean distance. This follows because

\[ (\mathbf{x}_h - \mathbf{x}_i)'(\mathbf{x}_h - \mathbf{x}_i) = (\mathbf{g}_h - \mathbf{g}_i)'\mathbf{H}'\mathbf{H}(\mathbf{g}_h - \mathbf{g}_i) = (\mathbf{g}_h - \mathbf{g}_i)'\mathbf{A}'\mathbf{A}(\mathbf{g}_h - \mathbf{g}_i) = (\mathbf{g}_h - \mathbf{g}_i)'(\mathbf{g}_h - \mathbf{g}_i). \]

Therefore, if we prefer a plot on which the distance between $\mathbf{g}_h^*$ and $\mathbf{g}_i^*$ is a good approximation to Euclidean, rather than Mahalanobis, distance between $\mathbf{x}_h$ and $\mathbf{x}_i$ then the biplot with $\alpha = 1$ will be preferred to $\alpha = 0$.
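
As a small numerical check of the two properties above (x_ij as the inner product of g_i and h_j, and preservation of Euclidean distance when α = 1), the following sketch builds the α = 1 biplot factors for a made-up data set. It is only an illustration under stated assumptions: the data values are invented; there are just two variables, so the two-dimensional biplot is exact rather than an approximation; and the eigenvectors of the 2×2 matrix X'X are obtained from a closed-form rotation angle instead of a general SVD routine. It uses G = XA, which follows from X = GH' with H = A orthogonal.

```python
import math

# Hypothetical data: 5 observations on 2 variables (invented for illustration).
raw = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0], [10.0, 8.0]]
n = len(raw)

# Centre each variable about its sample mean, as assumed for x_ij in the text.
means = [sum(row[j] for row in raw) / n for j in range(2)]
X = [[row[j] - means[j] for j in range(2)] for row in raw]

# Elements of X'X, which is proportional to the sample covariance matrix.
a = sum(x[0] * x[0] for x in X)
b = sum(x[0] * x[1] for x in X)
c = sum(x[1] * x[1] for x in X)

# For a symmetric 2x2 matrix the eigenvectors form a rotation by theta;
# the first column of A is the direction of maximum variance.
theta = 0.5 * math.atan2(2.0 * b, a - c)
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# alpha = 1: G = UL = XA (row i is g_i) and H = A (row j is h_j).
G = [[sum(x[j] * A[j][k] for j in range(2)) for k in range(2)] for x in X]
H = A


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


# Largest error in recovering x_ij as the inner product g_i'h_j.
max_recon_err = max(abs(X[i][j] - dot(G[i], H[j]))
                    for i in range(n) for j in range(2))

# Largest gap between squared distances among the x_i and among the g_i.
max_dist_err = max(abs(sq_dist(X[h], X[i]) - sq_dist(G[h], G[i]))
                   for h in range(n) for i in range(n))

print(max_recon_err, max_dist_err)
```

Both printed values should be of the order of floating-point rounding error, since A is orthogonal. With more than two variables the plotted g_i* and h_j* would only approximate these relationships.
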
Note that using Mahalanobis distance emphasizes the distance between the observations in the direction of the low-variance PCs and downweights distances in the direction of high-variance PCs, when compared with Euclidean distance (see Section 10.1).

Another interesting property of the biplot with $\alpha = 1$ is that the positions of the $\mathbf{g}_i^*$ are identical to those given by a straightforward plot with respect to the first two PCs, as described in Section 5.1. It follows from equation (5.3.3) and Section 3.5 that we can write

\[ x_{ij} = \sum_{k=1}^{r} z_{ik} a_{jk}, \]

where $z_{ik} = u_{ik} l_k^{1/2}$ is the value of the $k$th PC for the $i$th observation. But $\alpha = 1$ implies that $\mathbf{G} = \mathbf{U}\mathbf{L}$, so the $k$th element of $\mathbf{g}_i$ is $u_{ik} l_k^{1/2} = z_{ik}$.
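
Returning to the remark on Mahalanobis versus Euclidean distance, a short sketch may make the weighting concrete. It assumes the standard identity that, expressed in PC scores, the squared Mahalanobis distance is the sum of squared score differences each divided by the variance of that PC, whereas the squared Euclidean distance leaves the differences unweighted. The PC variances and score vectors below are invented for illustration and do not come from any data set in the text.

```python
# Assumed variances of PC1 (high variance) and PC2 (low variance).
pc_var = [9.0, 0.25]


def sq_euclidean(zh, zi):
    """Squared Euclidean distance between two vectors of PC scores."""
    return sum((a - b) ** 2 for a, b in zip(zh, zi))


def sq_mahalanobis(zh, zi, variances):
    """Squared Mahalanobis distance, using the fact that PCs are uncorrelated
    so each squared score difference is simply scaled by 1 / variance."""
    return sum((a - b) ** 2 / v for a, b, v in zip(zh, zi, variances))


# Hypothetical observations: one unit step from the origin along each PC.
origin = [0.0, 0.0]
along_pc1 = [1.0, 0.0]  # differs only in the high-variance direction
along_pc2 = [0.0, 1.0]  # differs only in the low-variance direction

for label, z in (("PC1", along_pc1), ("PC2", along_pc2)):
    print(label,
          "Euclidean:", sq_euclidean(origin, z),
          "Mahalanobis:", sq_mahalanobis(origin, z, pc_var))
```

Under these assumed variances the two steps are equally long in Euclidean terms, but the step along the low-variance PC counts 4 in squared Mahalanobis terms while the step along the high-variance PC counts only 1/9.
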
