Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

9. Principal Components Used with Other Multivariate Techniques

Because of the connections between PCs and canonical variates, Mardia et al. (1979, p. 344) refer to canonical discriminant analysis as the analogue for grouped data of PCA for ungrouped data. Note also that with an appropriate choice of metric, generalized PCA as defined in Section 14.2.2 is equivalent to a form of discriminant analysis.

No examples have been given in detail in this section, although some have been mentioned briefly. An interesting example, in which the objective is to discriminate between different carrot cultivars, is presented by Horgan et al. (2001). Two types of data are available, namely the positions of ‘landmarks’ on the outlines of the carrots and the brightness of each pixel in a cross-section of each carrot. The two data sets are subjected to separate PCAs, and a subset of PCs taken from both analyses is used to construct a discriminant function.

9.2 Cluster Analysis

In cluster analysis, it is required to divide a set of observations into groups or clusters in such a way that most pairs of observations that are placed in the same group are more similar to each other than are pairs of observations that are placed into two different clusters. In some circumstances, it may be expected or hoped that there is a clear-cut group structure underlying the data, so that each observation comes from one of several distinct populations, as in discriminant analysis. The objective then is to determine this group structure where, in contrast to discriminant analysis, there is little or no prior information about the form that the structure takes. Cluster analysis can also be useful when there is no clear group structure in the data. In this case, it may still be desirable to segment or dissect (using the terminology of Kendall (1966)) the observations into relatively homogeneous groups, as observations within the same group may be sufficiently similar to be treated identically for the purpose of some further analysis, whereas this would be impossible for the whole heterogeneous data set. There are very many possible methods of cluster analysis, and several books have appeared on the subject, for example Aldenderfer and Blashfield (1984), Everitt et al. (2001), Gordon (1999). Most methods can be used either for detection of clear-cut groups or for dissection/segmentation, although there is increasing interest in mixture models, which explicitly assume the existence of clusters (see Section 9.2.3).

The majority of cluster analysis techniques require a measure of similarity or dissimilarity between each pair of observations, and PCs have been used quite extensively in the computation of one type of dissimilarity. If the p variables that are measured for each observation are quantitative and in similar units, then an obvious measure of dissimilarity between two observations is the Euclidean distance between the observations in the p-
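
As a small illustration of the Euclidean distance used as a dissimilarity between pairs of observations, the following sketch computes the full matrix of pairwise distances for n observations measured on p variables. It is only a minimal example under stated assumptions: the function name `euclidean_dissimilarity` and the data matrix `X` are hypothetical and do not come from the text, and the variables are assumed to be quantitative and in similar units, as the passage requires.

```python
import numpy as np


def euclidean_dissimilarity(X):
    """Return the n x n matrix of Euclidean distances between the rows of X.

    X is an (n, p) array: n observations on p quantitative variables.
    (Hypothetical helper, written for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    # Broadcasting gives every pairwise difference, shape (n, n, p).
    diff = X[:, None, :] - X[None, :, :]
    # Sum squared differences over the p variables, then take the root.
    return np.sqrt((diff ** 2).sum(axis=-1))


# Made-up data: 4 observations on p = 2 variables in similar units.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0],
              [6.0, 8.0]])

D = euclidean_dissimilarity(X)
print(D)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form of input that most distance-based clustering methods expect.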
