Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

9. Principal Components Used with Other Multivariate Techniques

Euclidean distance based on the first two PCs gives a very close approximation to Euclidean distance based on all four variables, but it gives roughly three times as much weight to the first PC as to the second. Alternatively, if the first two PCs are renormalized to have equal weight, this implies that we are treating the one measurement of cloudbase height as being equal in importance to the three measurements of temperature.

In general, if Euclidean distance is calculated using all p renormalized PCs, then this is equivalent to calculating the Mahalanobis distance for the original variables (see Section 10.1, below equation (10.1.2), for a proof of the corresponding property for Mahalanobis distances of observations from sample means, rather than between pairs of observations). Mahalanobis distance is yet another plausible dissimilarity measure, which takes into account the variances and covariances between the elements of x. Næs and Isaksson (1991) give an example of (fuzzy) clustering in which the distance measure is based on Mahalanobis distance, but is truncated to exclude the last few PCs when the variances of these are small and unstable.

Regardless of the similarity or dissimilarity measure adopted, PCA has a further use in cluster analysis, namely to give a two-dimensional representation of the observations (see also Section 5.1). Such a two-dimensional representation can give a simple visual means of either detecting or verifying the existence of clusters, as noted by Rao (1964), provided that most of the variation, and in particular the between-cluster variation, falls in the two-dimensional subspace defined by the first two PCs.

Of course, the same problem can arise as in discriminant analysis, namely that the between-cluster variation may be in directions other than those of the first two PCs, even if these two PCs account for nearly all of the total variation. However, this behaviour is generally less likely in cluster analysis, as the PCs are calculated for the whole data set, not within-groups. As pointed out in Section 9.1, if between-cluster variation is much greater than within-cluster variation, such PCs will often successfully reflect the cluster structure. It is, in any case, frequently impossible to calculate within-group PCs in cluster analysis, as the group structure is usually unknown a priori.

It can be argued that there are often better directions than PCs in which to view the data in order to 'see' structure such as clusters. Projection pursuit includes a number of ideas for finding such directions, and will be discussed in Section 9.2.2. However, the examples discussed below illustrate that plots with respect to the first two PCs can give suitable two-dimensional representations on which to view the cluster structure if a clear structure exists. Furthermore, in the case where there is no clear structure, but it is required to dissect the data using cluster analysis, there can be no real objection to the use of a plot with respect to the first two PCs. If we wish to view the data in two dimensions in order to see whether a set of clusters given by some procedure 'looks sensible,' then the first two PCs give the best possible representation in two dimensions in the sense defined by Property G3 of Section 3.2.
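To make the equivalence stated above concrete: if the sample covariance matrix S has eigendecomposition S = A L A' (columns of A the eigenvectors, L the diagonal matrix of eigenvalues), then the renormalized PC scores are z* = L^(-1/2) A'(x - xbar), and the squared Euclidean distance between two such score vectors reduces to (x_i - x_j)' S^(-1) (x_i - x_j), the squared Mahalanobis distance between the original observations. The following minimal sketch, which is not from the text (the toy data and all variable names are invented for illustration), checks this numerically with NumPy.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))  # toy data matrix, p = 4

# Sample covariance and its eigendecomposition S = A L A'
S = np.cov(X, rowvar=False)
eigvals, A = np.linalg.eigh(S)        # columns of A are eigenvectors of S

# Scores on all p PCs, then renormalize each PC to unit variance
Z = (X - X.mean(axis=0)) @ A
Z_star = Z / np.sqrt(eigvals)

# Squared Euclidean distance between observations 0 and 1 in renormalized PC space
d_pc = np.sum((Z_star[0] - Z_star[1]) ** 2)

# Squared Mahalanobis distance between the same pair on the original variables
diff = X[0] - X[1]
d_mahal = diff @ np.linalg.solve(S, diff)

assert np.isclose(d_pc, d_mahal)      # the two distances agree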
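Similarly, the two-dimensional PC representation discussed above is straightforward to produce in practice. The sketch below (again illustrative only, with synthetic data and invented names) generates two artificial groups in four dimensions and plots them against the first two PCs of the whole data set; because the between-cluster variation dominates the within-cluster variation here, the cluster structure is clearly visible in the plot.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Two synthetic clusters in p = 4 dimensions, well separated
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(60, 4))
cluster_b = rng.normal(loc=4.0, scale=1.0, size=(60, 4))
X = np.vstack([cluster_a, cluster_b])

# First two PCs of the whole data set (not within-group PCs)
Xc = X - X.mean(axis=0)
_, A = np.linalg.eigh(np.cov(X, rowvar=False))   # eigenvalues in ascending order
Z2 = Xc @ A[:, ::-1][:, :2]                      # scores on the two largest-variance PCs

plt.scatter(Z2[:, 0], Z2[:, 1], s=10)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Observations plotted against the first two PCs")
plt.show()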
