Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

9. Principal Components Used with Other Multivariate Techniques

distance between two samples (see equation (9.1.3)), between an observation and a sample mean (see Section 10.1, below equation (10.1.2)), and between an observation and a population mean.

If we take a subset of the original $p$ variables, then the discriminatory power of the subset can be measured by the Mahalanobis distance between the two populations in the subspace defined by the subset of variables. Chang (1983) shows that this is also true if $\Sigma^{-1}$ is replaced in (9.1.1) by $\Psi^{-1}$, where $\Psi = \Sigma + \pi(1-\pi)(\mu_1 - \mu_2)(\mu_1 - \mu_2)'$ and $\pi$ is the probability of an observation coming from the $i$th population, $i = 1, 2$. The matrix $\Psi$ is the overall covariance matrix for $x$, ignoring the group structure. Chang (1983) shows further that the Mahalanobis distance based on the $k$th PC of $\Psi$ is a monotonic, increasing function of $\theta_k = [\alpha_k'(\mu_1 - \mu_2)]^2 / \lambda_k$, where $\alpha_k$, $\lambda_k$ are, as usual, the vector of coefficients in the $k$th PC and the variance of the $k$th PC, respectively. Therefore, the PC with the largest discriminatory power is the one that maximizes $\theta_k$; this will not necessarily correspond to the first PC, which maximizes $\lambda_k$. Indeed, if $\alpha_1$ is orthogonal to $(\mu_1 - \mu_2)$, as in Figure 9.2, then the first PC has no discriminatory power at all. Chang (1983) gives an example in which low-variance PCs are important discriminators, and he also demonstrates that a change of scaling for the variables, for example in going from a covariance to a correlation matrix, can change the relative importance of the PCs. Townshend (1984) has an example from remote sensing in which there are seven variables corresponding to seven spectral bands. The seventh PC accounts for less than 0.1% of the total variation, but is important in discriminating between different types of land cover.

The quantity $\theta_k$ is also identified as an important parameter for discriminant analysis by Dillon et al. (1989) and Kshirsagar et al. (1990) but, like Chang (1983), they do not examine the properties of its sample analogue $\hat{\theta}_k$, where $\hat{\theta}_k$ is defined as $[a_k'(\bar{x}_1 - \bar{x}_2)]^2 / l_k$, with obvious notation. Jolliffe et al. (1996) show that a statistic which is a function of $\hat{\theta}_k$ has a $t$-distribution under the null hypothesis that the $k$th PC has equal means for the two populations.

The results of Chang (1983) and Jolliffe et al. (1996) are for two groups only, but Devijver and Kittler (1982, Section 9.6) suggest that a similar quantity to $\hat{\theta}_k$ should be used when more groups are present. Their statistic is $\tilde{\theta}_k = a_k' S_b a_k / l_k$, where $S_b$ is proportional to a quantity that generalizes $(\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)'$, namely $\sum_{g=1}^{G} n_g (\bar{x}_g - \bar{x})(\bar{x}_g - \bar{x})'$, where $n_g$ is the number of observations in the $g$th group and $\bar{x}$ is the overall mean of all the observations. A difference between $\hat{\theta}_k$ and $\tilde{\theta}_k$ is seen in the following: $a_k$, $l_k$ are eigenvectors and eigenvalues, respectively, for the overall covariance matrix in the formula for $\hat{\theta}_k$, but for the within-group covariance matrix $S_w$ for $\tilde{\theta}_k$. Devijver and Kittler (1982) advocate ranking the PCs in terms of $\tilde{\theta}_k$ and deleting those components for which $\tilde{\theta}_k$ is smallest. Once again, this ranking will typically diverge from the ordering based on size of variance.
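The two ranking statistics above lend themselves to a brief numerical illustration. The following is a minimal sketch, assuming NumPy; the function names are illustrative, and the use of the pooled within-group estimate for $S_w$ is an assumption of this sketch, not a prescription from Chang (1983) or Devijver and Kittler (1982).

```python
import numpy as np


def theta_hat_two_group(X1, X2):
    """Sample analogue theta_hat_k = [a_k'(xbar_1 - xbar_2)]^2 / l_k,
    with a_k, l_k the eigenvectors/eigenvalues of the overall covariance
    matrix of the pooled data, ignoring group structure."""
    X = np.vstack([X1, X2])
    S = np.cov(X, rowvar=False)              # overall covariance matrix
    l, A = np.linalg.eigh(S)                 # eigenvalues l_k, eigenvectors a_k (columns)
    order = np.argsort(l)[::-1]              # order PCs by decreasing variance
    l, A = l[order], A[:, order]
    d = X1.mean(axis=0) - X2.mean(axis=0)    # xbar_1 - xbar_2
    return (A.T @ d) ** 2 / l                # theta_hat_k, k = 1, ..., p


def theta_tilde_multi_group(groups):
    """Multi-group analogue theta_tilde_k = a_k' S_b a_k / l_k, where
    a_k, l_k come from the within-group covariance matrix S_w (taken here
    as the pooled within-group estimate -- an assumption of this sketch)
    and S_b = sum_g n_g (xbar_g - xbar)(xbar_g - xbar)'."""
    X = np.vstack(groups)
    xbar = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for Xg in groups:
        n_g = Xg.shape[0]
        xbar_g = Xg.mean(axis=0)
        S_w += (n_g - 1) * np.cov(Xg, rowvar=False)
        S_b += n_g * np.outer(xbar_g - xbar, xbar_g - xbar)
    S_w /= X.shape[0] - len(groups)          # pooled within-group covariance
    l, A = np.linalg.eigh(S_w)
    order = np.argsort(l)[::-1]
    l, A = l[order], A[:, order]
    return np.diag(A.T @ S_b @ A) / l        # theta_tilde_k, k = 1, ..., p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two artificial groups whose means differ only along the
    # lowest-variance direction (third variable).
    X1 = rng.normal(size=(100, 3)) * [5.0, 1.0, 0.2]
    X2 = rng.normal(size=(100, 3)) * [5.0, 1.0, 0.2] + [0.0, 0.0, 0.5]
    print(theta_hat_two_group(X1, X2))       # largest entry need not be the first PC
    print(theta_tilde_multi_group([X1, X2]))
```

Because the group mean difference in this toy example lies along the lowest-variance direction, the largest $\hat{\theta}_k$ is typically attached to the last PC rather than the first, echoing the point above that low-variance PCs can carry most of the discriminatory power.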
