Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

5. Graphical Representation of Data Using Principal Components

If $\alpha = 0$, then $\mathbf{G} = \mathbf{U}$ and $\mathbf{H}' = \mathbf{L}\mathbf{A}'$, or $\mathbf{H} = \mathbf{A}\mathbf{L}$. This means that
$$
\mathbf{X}'\mathbf{X} = (\mathbf{G}\mathbf{H}')'(\mathbf{G}\mathbf{H}') = \mathbf{H}\mathbf{G}'\mathbf{G}\mathbf{H}' = \mathbf{H}\mathbf{U}'\mathbf{U}\mathbf{H}' = \mathbf{H}\mathbf{H}',
$$
because the columns of $\mathbf{U}$ are orthonormal. The product $\mathbf{h}_j'\mathbf{h}_k$ is therefore equal to $(n-1)$ multiplied by the covariance $s_{jk}$ between the $j$th and $k$th variables, and $\mathbf{h}_j^{*\prime}\mathbf{h}_k^*$, where $\mathbf{h}_j^*$, $j = 1, 2, \dots, p$, are as defined above, provides an approximation to $(n-1)s_{jk}$. The lengths $\mathbf{h}_j'\mathbf{h}_j$ of the vectors $\mathbf{h}_j$, $j = 1, 2, \dots, p$, are proportional to the variances of the variables $x_1, x_2, \dots, x_p$, and the cosines of the angles between the $\mathbf{h}_j$ represent correlations between variables. Plots of the $\mathbf{h}_j^*$ therefore provide a two-dimensional picture (usually an approximation, but often a good one) of the elements of the covariance matrix $\mathbf{S}$, and such plots are advocated by Corsten and Gabriel (1976) as a means of comparing the variance–covariance structures of several different data sets. An earlier paper by Gittins (1969), which is reproduced in Bryant and Atchley (1975), also gives plots of the $\mathbf{h}_j^*$, although it does not discuss their formal properties.

Not only do the $\mathbf{h}_j$ have a ready graphical interpretation when $\alpha = 0$, but the $\mathbf{g}_i$ also have the satisfying property that the Euclidean distance between $\mathbf{g}_h$ and $\mathbf{g}_i$ in the biplot is proportional to the Mahalanobis distance between the $h$th and $i$th observations in the complete data set.
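The $\alpha = 0$ interpretation can be checked numerically. A minimal sketch in NumPy (the data matrix and its dimensions are made up for illustration), forming $\mathbf{G} = \mathbf{U}$ and $\mathbf{H} = \mathbf{A}\mathbf{L}$ from the SVD of a column-centred $\mathbf{X}$ and confirming that $\mathbf{H}\mathbf{H}' = \mathbf{X}'\mathbf{X} = (n-1)\mathbf{S}$, and that the cosines between the $\mathbf{h}_j$ equal the sample correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)          # column-centre so that X'X = (n-1)S

# SVD: X = U L A', with orthonormal columns in U and A, L diagonal
U, l, At = np.linalg.svd(X, full_matrices=False)
L = np.diag(l)
A = At.T

# alpha = 0 biplot factors: G = U (row markers), H = A L (column markers)
G = U
H = A @ L

S = np.cov(X, rowvar=False)     # sample covariance, divisor n-1

# h_j' h_k = (n-1) s_jk, i.e. H H' = X'X = (n-1) S
assert np.allclose(H @ H.T, X.T @ X)
assert np.allclose(H @ H.T, (n - 1) * S)

# cosines of angles between the h_j equal the correlations r_jk
norms = np.linalg.norm(H, axis=1)
cosines = (H @ H.T) / np.outer(norms, norms)
assert np.allclose(cosines, np.corrcoef(X, rowvar=False))
```

In a plotted biplot only the first two columns of $\mathbf{G}$ and $\mathbf{H}$ are used (giving the starred quantities $\mathbf{g}_i^*$, $\mathbf{h}_j^*$), so these identities hold only approximately there; with all $p$ columns retained, as above, they are exact.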
The Mahalanobis distance between two observations $\mathbf{x}_h$, $\mathbf{x}_i$, assuming that $\mathbf{X}$ has rank $p$ so that $\mathbf{S}^{-1}$ exists, is defined as
$$
\delta_{hi}^2 = (\mathbf{x}_h - \mathbf{x}_i)'\mathbf{S}^{-1}(\mathbf{x}_h - \mathbf{x}_i), \tag{5.3.5}
$$
and is often used as an alternative to the Euclidean distance
$$
d_{hi}^2 = (\mathbf{x}_h - \mathbf{x}_i)'(\mathbf{x}_h - \mathbf{x}_i).
$$
Whereas Euclidean distance treats all variables on an equal footing, which essentially assumes that all variables have equal variances and are uncorrelated, Mahalanobis distance gives relatively less weight to variables with large variances and to groups of highly correlated variables.

To prove this Mahalanobis distance interpretation, rewrite (5.3.2) as
$$
\mathbf{x}_i' = \mathbf{g}_i'\mathbf{H}', \qquad i = 1, 2, \dots, n,
$$
and substitute in (5.3.5) to give
$$
\delta_{hi}^2 = (\mathbf{g}_h - \mathbf{g}_i)'\mathbf{H}'\mathbf{S}^{-1}\mathbf{H}(\mathbf{g}_h - \mathbf{g}_i)
= (n-1)(\mathbf{g}_h - \mathbf{g}_i)'\mathbf{L}\mathbf{A}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{A}\mathbf{L}(\mathbf{g}_h - \mathbf{g}_i), \tag{5.3.6}
$$
as $\mathbf{H}' = \mathbf{L}\mathbf{A}'$ and $\mathbf{S}^{-1} = (n-1)(\mathbf{X}'\mathbf{X})^{-1}$.
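This Mahalanobis distance property can also be verified numerically. Since $(\mathbf{X}'\mathbf{X})^{-1} = \mathbf{A}\mathbf{L}^{-2}\mathbf{A}'$, the matrix product in (5.3.6) collapses to the identity, so $\delta_{hi}^2 = (n-1)\,\|\mathbf{g}_h - \mathbf{g}_i\|^2$ when all $p$ columns of $\mathbf{G}$ are retained. A minimal sketch (NumPy; data and the chosen pair of observations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 3
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)          # column-centred, full rank p with probability 1

U, l, At = np.linalg.svd(X, full_matrices=False)
G = U                           # alpha = 0: the g_i are the rows of U

S_inv = np.linalg.inv(np.cov(X, rowvar=False))

h, i = 0, 1                     # any pair of observations
d_mahal = (X[h] - X[i]) @ S_inv @ (X[h] - X[i])    # delta^2_hi, eq. (5.3.5)
d_biplot = (n - 1) * np.sum((G[h] - G[i]) ** 2)    # (n-1) x squared Euclidean
                                                    # distance between g_h, g_i

assert np.allclose(d_mahal, d_biplot)
```

Again, a two-dimensional biplot uses only the first two columns of $\mathbf{G}$, so in practice the plotted distances approximate, rather than reproduce, the Mahalanobis distances.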
