Jolliffe, I. Principal Component Analysis, 2nd ed., Springer, 2002 (pp. 219–220)



…county cluster in the top right of the plot splits into three groups containing 13, 10 and 4 counties, with some overlap between them.

This example is typical of many in which cluster analysis is used for dissection. Examples like that of Jeffers' (1967) aphids, where a very clear-cut and previously unknown cluster structure is uncovered, are relatively unusual, although another illustration is given by Blackith and Reyment (1971, p. 155). In their example, a plot of the observations with respect to the second and third (out of seven) PCs shows a very clear separation into two groups. It is probable that in many circumstances 'projection-pursuit' methods, which are discussed next, will provide a better two-dimensional space in which to view the results of a cluster analysis than that defined by the first two PCs. However, if dissection rather than discovery of a clear-cut cluster structure is the objective of a cluster analysis, then there is likely to be little improvement over a plot with respect to the first two PCs.

9.2.2 Projection Pursuit

As mentioned earlier in this chapter, it may be possible to find low-dimensional representations of a data set that are better than the first few PCs at displaying 'structure' in the data. One approach to doing this is to define structure as 'interesting' and then construct an index of 'interestingness,' which is successively maximized. This is the idea behind projection pursuit, with different indices leading to different displays. If 'interesting' is defined as 'large variance,' it is seen that PCA is a special case of projection pursuit. However, the types of structure of interest are often clusters or outliers, and there is no guarantee that the high-variance PCs will find such features. The term 'projection pursuit' dates back to Friedman and Tukey (1974), and a great deal of work was done in the early 1980s. This is described at length in three key papers: Friedman (1987), Huber (1985), and Jones and Sibson (1987). The last two both include extensive discussion, in addition to the paper itself. Some techniques are good at finding clusters, whereas others are better at detecting outliers.

Most projection pursuit techniques start from the premise that the least interesting structure is multivariate normality, so that deviations from normality form the basis of many indices. There are measures based on skewness and kurtosis, on entropy, on looking for deviations from uniformity in transformed data, and on finding 'holes' in the data. More recently, Foster (1998) suggested looking for directions of high density, after 'sphering' the data to remove linear structure. Sphering operates by transforming the variables $x$ to $z = S^{-1/2}(x - \bar{x})$, which is equivalent to converting to PCs, which are then standardized to have zero mean and unit variance. Friedman (1987) also advocates sphering as a first step in his version of projection pursuit. After identifying the high-density directions for the sphered data, Foster (1998) uses the inverse transformation to discover the nature of the interesting structures in terms of the original variables.
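To make this concrete, here is a minimal Python/NumPy sketch of sphering followed by a crude projection pursuit step. The function names and the random-direction search are illustrative assumptions, not details from the text; the index combines squared skewness and excess kurtosis of a projection, in the spirit of the moment-based measures mentioned above (cf. Jones and Sibson, 1987), rather than reproducing any particular published index.

```python
import numpy as np

def sphere(X):
    """Sphere the data: z = S^{-1/2}(x - xbar), giving identity covariance.

    Equivalent to converting to PCs and standardizing them to zero mean
    and unit variance.
    """
    Xc = X - X.mean(axis=0)                      # centre at the sample mean
    S = np.cov(X, rowvar=False)                  # sample covariance matrix S
    lam, V = np.linalg.eigh(S)                   # S = V diag(lam) V'
    S_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T  # symmetric square root of S^{-1}
    return Xc @ S_inv_sqrt

def moment_index(y):
    """Departure from normality of a 1-d projection via skewness/kurtosis."""
    y = (y - y.mean()) / y.std()
    skew = np.mean(y ** 3)                       # third standardized moment
    kurt = np.mean(y ** 4) - 3.0                 # excess kurtosis
    return skew ** 2 + 0.25 * kurt ** 2          # zero for exactly normal data

def pursue(Z, n_starts=500, seed=None):
    """Crude pursuit: maximize the index over random unit directions."""
    rng = np.random.default_rng(seed)
    best_val, best_a = -np.inf, None
    for _ in range(n_starts):
        a = rng.standard_normal(Z.shape[1])
        a /= np.linalg.norm(a)                   # unit-length projection vector
        val = moment_index(Z @ a)
        if val > best_val:
            best_val, best_a = val, a
    return best_a, best_val
```

A serious implementation would replace the random search with numerical optimization over the unit sphere, but the overall structure (sphere first, then maximize an index of non-normality over projections) is the same.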

Projection pursuit indices usually seek out deviations from multivariate normality. Bolton and Krzanowski (1999) show that if normality holds then PCA finds directions for which the maximized likelihood is minimized. They interpret this result as PCA choosing interesting directions to be those for which normality is least likely, thus providing a link with the ideas of projection pursuit.

A different projection pursuit technique with an implicit assumption of normality is based on the fixed effects model of Section 3.9. Recall that the model postulates that, apart from an error term $e_i$ with $\mathrm{var}(e_i) = \frac{\sigma^2}{w_i}\Gamma$, the variables $x$ lie in a $q$-dimensional subspace. To find the best-fitting subspace, $\sum_{i=1}^{n} w_i \|x_i - z_i\|^2_M$ is minimized for an appropriately chosen metric $M$. For multivariate normal $e_i$ the optimal choice for $M$ is $\Gamma^{-1}$. Given a structure of clusters in the data, all $w_i$ equal, and $e_i$ describing variation within clusters, Caussinus and Ruiz (1990) suggest a robust estimate of $\Gamma$, defined by

$$\hat{\Gamma} = \frac{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} K\!\left[\|x_i - x_j\|^2_{S^{-1}}\right](x_i - x_j)(x_i - x_j)'}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} K\!\left[\|x_i - x_j\|^2_{S^{-1}}\right]}, \qquad (9.2.1)$$

where $K[\cdot]$ is a decreasing positive real function (Caussinus and Ruiz, 1990, use $K[d] = e^{-\beta d/2}$ for $\beta > 0$) and $S$ is the sample covariance matrix. The best fit is then given by finding eigenvalues and eigenvectors of $S\hat{\Gamma}^{-1}$, which is a type of generalized PCA (see Section 14.2.2). A short numerical sketch of this procedure is given at the end of this section.

There is a similarity here with canonical discriminant analysis (Section 9.1), which finds eigenvalues and eigenvectors of $S_b S_w^{-1}$, where $S_b$, $S_w$ are between- and within-group covariance matrices. In Caussinus and Ruiz's (1990) form of projection pursuit, $S$ is the overall covariance matrix, and $\hat{\Gamma}$ is an estimate of the within-group covariance matrix. Equivalent results would be obtained if $S$ were replaced by an estimate of between-group covariance, so that the only real difference from canonical discriminant analysis is that the groups are known in the latter case but are unknown in projection pursuit. Further theoretical details and examples of Caussinus and Ruiz's technique can be found in Caussinus and Ruiz-Gazen (1993, 1995). The choice of values for $\beta$ is discussed, and values in the range 0.5 to 3.0 are recommended.

There is a link between Caussinus and Ruiz-Gazen's technique and the mixture models of Section 9.2.3. In discussing theoretical properties of their technique, they consider a framework in which clusters arise from a mixture of multivariate normal distributions. The $q$ dimensions of the underlying model correspond to $q$ clusters and $\Gamma$ represents 'residual' or within-group covariance.

Although not projection pursuit as such, Krzanowski (1987b) also looks for low-dimensional representations of the data that preserve structure, but in the context of variable selection. Plots are made with respect to the first two PCs calculated from only a subset of the variables. A criterion for…
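As noted above, here is a minimal Python/NumPy sketch of the Caussinus and Ruiz estimate (9.2.1) followed by the eigenanalysis of $S\hat{\Gamma}^{-1}$. It assumes all $w_i$ equal and uses the kernel $K[d] = e^{-\beta d/2}$ quoted above, with $\beta = 2.0$ chosen purely for illustration from the recommended 0.5 to 3.0 range; the function name and the pairwise loop are implementation choices, not details from the text.

```python
import numpy as np

def caussinus_ruiz(X, beta=2.0, n_dirs=2):
    """Directions from eq. (9.2.1): eigenvectors of S Gamma_hat^{-1}.

    Gamma_hat is a kernel-weighted average of (x_i - x_j)(x_i - x_j)'
    over all pairs; distant pairs are down-weighted, so Gamma_hat acts
    as a within-group covariance estimate without known group labels.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    S_inv = np.linalg.inv(S)
    num = np.zeros((p, p))
    den = 0.0
    for i in range(n - 1):
        diff = X[i] - X[i + 1:]                           # x_i - x_j for all j > i
        d2 = np.einsum('kp,pq,kq->k', diff, S_inv, diff)  # ||x_i - x_j||^2_{S^{-1}}
        w = np.exp(-beta * d2 / 2.0)                      # K[d] = exp(-beta d / 2)
        num += (w[:, None] * diff).T @ diff               # weighted outer products
        den += w.sum()
    Gamma_hat = num / den
    # Eigenanalysis of S Gamma_hat^{-1}. The eigenvalues are real and
    # positive (a product of two symmetric positive definite matrices),
    # though np.linalg.eig may return tiny imaginary parts numerically.
    vals, vecs = np.linalg.eig(S @ np.linalg.inv(Gamma_hat))
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_dirs]].real
```

The parallel with canonical discriminant analysis shows up in the final eigenanalysis: $S$ plays the role of an overall (or between-group) covariance matrix and $\hat{\Gamma}$ that of $S_w$, but no group labels are needed.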
