12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


9. Principal Components Used with Other Multivariate Techniques

Projection pursuit indices usually seek out deviations from multivariate normality. Bolton and Krzanowski (1999) show that if normality holds then PCA finds directions for which the maximized likelihood is minimized. They interpret this result as PCA choosing interesting directions to be those for which normality is least likely, thus providing a link with the ideas of projection pursuit.

A different projection pursuit technique with an implicit assumption of normality is based on the fixed effects model of Section 3.9. Recall that the model postulates that, apart from an error term $e_i$ with $\mathrm{var}(e_i) = \frac{\sigma^2}{w_i}\Gamma$, the variables $x$ lie in a $q$-dimensional subspace. To find the best-fitting subspace, $\sum_{i=1}^{n} w_i \|x_i - z_i\|_M^2$ is minimized for an appropriately chosen metric $M$. For multivariate normal $e_i$ the optimal choice for $M$ is $\Gamma^{-1}$. Given a structure of clusters in the data, all $w_i$ equal, and $e_i$ describing variation within clusters, Caussinus and Ruiz (1990) suggest a robust estimate of $\Gamma$, defined by

$$\hat{\Gamma} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} K\left[\|x_i - x_j\|_{S^{-1}}^2\right] (x_i - x_j)(x_i - x_j)'}{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} K\left[\|x_i - x_j\|_{S^{-1}}^2\right]}, \qquad (9.2.1)$$

where $K[\cdot]$ is a decreasing positive real function (Caussinus and Ruiz, 1990, use $K[d] = e^{-\beta d/2}$ for $\beta > 0$) and $S$ is the sample covariance matrix. The best fit is then given by finding eigenvalues and eigenvectors of $S\hat{\Gamma}^{-1}$, which is a type of generalized PCA (see Section 14.2.2). There is a similarity here with canonical discriminant analysis (Section 9.1), which finds eigenvalues and eigenvectors of $S_b S_w^{-1}$, where $S_b$, $S_w$ are between and within-group covariance matrices. In Caussinus and Ruiz’s (1990) form of projection pursuit, $S$ is the overall covariance matrix, and $\hat{\Gamma}$ is an estimate of the within-group covariance matrix.
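The estimate in (9.2.1) and the eigenanalysis of $S\hat{\Gamma}^{-1}$ can be sketched numerically as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. The function names `robust_gamma` and `generalized_pca`, the default `beta=2.0`, and the use of a Cholesky factor to turn the problem into a symmetric eigenproblem are assumptions made for the example. The kernel is taken to be $K[d] = e^{-\beta d/2}$.

```python
import numpy as np


def robust_gamma(X, beta=2.0):
    """Return (S, Gamma_hat) for an (n, p) data matrix X, following (9.2.1).

    Assumes the kernel K[d] = exp(-beta * d / 2); beta=2.0 is an
    illustrative default, not a recommendation from the text.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = np.cov(X, rowvar=False)              # sample covariance matrix S
    S_inv = np.linalg.inv(S)
    i, j = np.triu_indices(n, k=1)           # all pairs with i < j
    D = X[i] - X[j]                          # rows are x_i - x_j
    # Squared distance ||x_i - x_j||^2 in the metric S^{-1}, one per pair.
    d = np.einsum("ij,jk,ik->i", D, S_inv, D)
    k = np.exp(-0.5 * beta * d)              # kernel weights K[d]
    # Weighted sum of outer products divided by the sum of weights.
    gamma_hat = (D * k[:, None]).T @ D / k.sum()
    return S, gamma_hat


def generalized_pca(X, beta=2.0):
    """Eigenvalues and eigenvectors of S Gamma_hat^{-1}, plus projections.

    With Gamma_hat = L L', the matrix L^{-1} S L^{-T} is symmetric and has
    the same eigenvalues as S Gamma_hat^{-1}. If y is one of its
    eigenvectors, then u = L y is an eigenvector of S Gamma_hat^{-1}.
    """
    X = np.asarray(X, dtype=float)
    S, gamma_hat = robust_gamma(X, beta)
    L = np.linalg.cholesky(gamma_hat)
    A = np.linalg.solve(L, np.linalg.solve(L, S).T)   # L^{-1} S L^{-T}
    A = 0.5 * (A + A.T)                      # remove rounding asymmetry
    evals, Y = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    evals, Y = evals[order], Y[:, order]
    U = L @ Y                                # eigenvectors of S Gamma_hat^{-1}
    V = np.linalg.solve(L.T, Y)              # Gamma_hat^{-1} U
    scores = (X - X.mean(axis=0)) @ V        # coordinates in the metric Gamma_hat^{-1}
    return evals, U, scores
```

Calling `generalized_pca(X, beta)` on an (n, p) array returns the eigenvalues in decreasing order, the matching eigenvectors as columns of `U`, and the projected data. The pairwise sums make the cost grow as n(n-1)/2, so this sketch is only practical for moderate n.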
Equivalent results would be obtained if $S$ were replaced by an estimate of between-group covariance, so that the only real difference from canonical discriminant analysis is that the groups are known in the latter case but are unknown in projection pursuit.

Further theoretical details and examples of Caussinus and Ruiz’s technique can be found in Caussinus and Ruiz-Gazen (1993, 1995). The choice of values for $\beta$ is discussed, and values in the range 0.5 to 3.0 are recommended.

There is a link between Caussinus and Ruiz-Gazen’s technique and the mixture models of Section 9.2.3. In discussing theoretical properties of their technique, they consider a framework in which clusters arise from a mixture of multivariate normal distributions. The $q$ dimensions of the underlying model correspond to $q$ clusters and $\Gamma$ represents ‘residual’ or within-group covariance.

Although not projection pursuit as such, Krzanowski (1987b) also looks for low-dimensional representations of the data that preserve structure, but in the context of variable selection. Plots are made with respect to the first two PCs calculated from only a subset of the variables. A criterion for
