Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

10.2. Influential Observations in a Principal Component Analysis

change to a covariance matrix may change one of the eigenvalues without affecting the others, but that this cannot happen for a correlation matrix. For a correlation matrix the sum of the eigenvalues is a constant, so that if one of them is changed there must be compensatory changes in at least one of the others.

Expressions for $I(\mathbf{x}; \boldsymbol{\alpha}_k)$ are more complicated than those for $I(\mathbf{x}; \lambda_k)$; for example, for covariance matrices we have

$$
I(\mathbf{x}; \boldsymbol{\alpha}_k) = -z_k \sum_{\substack{h=1 \\ h \neq k}}^{p} z_h \boldsymbol{\alpha}_h (\lambda_h - \lambda_k)^{-1} \tag{10.2.4}
$$

compared with (10.2.2) for $I(\mathbf{x}; \lambda_k)$. A number of comments can be made concerning (10.2.4) and the corresponding expression for correlation matrices, which is

$$
I(\mathbf{x}; \boldsymbol{\alpha}_k) = \sum_{\substack{h=1 \\ h \neq k}}^{p} \boldsymbol{\alpha}_h (\lambda_h - \lambda_k)^{-1} \sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} \alpha_{hi} \alpha_{kj} I(\mathbf{x}; \rho_{ij}). \tag{10.2.5}
$$

First, and perhaps most important, the form of the expression is completely different from that for $I(\mathbf{x}; \lambda_k)$. It is possible for an observation to be influential for $\lambda_k$ but not for $\boldsymbol{\alpha}_k$, and vice versa. This behaviour is illustrated by the examples in Section 10.2.1 below.

A second, related point is that for covariance matrices $I(\mathbf{x}; \boldsymbol{\alpha}_k)$ depends on all of the PCs, $z_1, z_2, \ldots, z_p$, unlike $I(\mathbf{x}; \lambda_k)$, which depends just on $z_k$. The dependence is quadratic, but involves only cross-product terms $z_j z_k$, $j \neq k$, and not linear or squared terms. The general shape of the influence curves $I(\mathbf{x}; \boldsymbol{\alpha}_k)$ is hyperbolic for both covariance and correlation matrices, but the details of the functions are different. The dependence of both (10.2.4) and (10.2.5) on eigenvalues is through $(\lambda_h - \lambda_k)^{-1}$.
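Equation (10.2.4) is straightforward to evaluate numerically from a sample eigendecomposition. The following is a minimal sketch, not from the book itself: the function name and interface are illustrative, the eigenvalues are assumed distinct and sorted in decreasing order, and population quantities are replaced by their sample counterparts.

```python
import numpy as np

def influence_on_eigenvector(x, mean, eigvals, eigvecs, k):
    """Evaluate (10.2.4) for a covariance matrix:
        I(x; alpha_k) = -z_k * sum_{h != k} z_h * alpha_h / (lambda_h - lambda_k)

    eigvals : eigenvalues lambda_1 >= ... >= lambda_p (assumed distinct)
    eigvecs : p x p matrix whose columns are the eigenvectors alpha_1, ..., alpha_p
    """
    z = eigvecs.T @ (x - mean)          # PC scores z_h = alpha_h' (x - mean)
    p = len(eigvals)
    infl = np.zeros(p)
    for h in range(p):
        if h != k:
            infl -= z[k] * z[h] * eigvecs[:, h] / (eigvals[h] - eigvals[k])
    return infl

# Illustrative use with simulated data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
mean = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]       # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
infl = influence_on_eigenvector(X[0], mean, eigvals, eigvecs, k=0)
```

Note that, since (10.2.4) is a linear combination of the eigenvectors $\boldsymbol{\alpha}_h$, $h \neq k$, the resulting influence vector is orthogonal to $\boldsymbol{\alpha}_k$ itself, which provides a simple sanity check on an implementation.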
This means that influence, and hence changes to $\boldsymbol{\alpha}_k$ resulting from small perturbations to the data, tend to be large when $\lambda_k$ is close to $\lambda_{k-1}$ or to $\lambda_{k+1}$.

A final point is that, unlike regression, the influence of different observations in PCA is approximately additive; that is, the presence of one observation does not affect the influence of another (Calder (1986), Tanaka and Tarumi (1987)).

To show that theoretical influence functions are relevant to sample data, predictions from the theoretical influence function can be compared with the sample influence function, which measures the actual changes caused by deleting one observation at a time from a data set. The theoretical influence function typically contains unknown parameters, and these must be replaced by equivalent sample quantities in such comparisons. This gives what Critchley (1985) calls the empirical influence function. He also considers a third sample-based influence function, the deleted empirical influence function, in which the unknown quantities in the theoretical influence function are estimated using a sample from which the observation
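The sample influence function described above can be sketched as a leave-one-out computation. This is an illustrative implementation, not the book's: the function name is hypothetical, and the $(n-1)$ scaling of the deletion change is one common convention rather than the only one.

```python
import numpy as np

def sample_influence_on_eigenvalue(X, k):
    """Leave-one-out (sample) influence of each observation on lambda_k.

    Uses the convention I_i = (n - 1) * (lambda_k - lambda_k^(-i)), where
    lambda_k^(-i) is the k-th covariance eigenvalue with observation i deleted.
    (The (n - 1) scaling is an assumption; conventions vary.)
    """
    n = X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
    out = np.empty(n)
    for i in range(n):
        X_del = np.delete(X, i, axis=0)                       # drop observation i
        lam_del = np.sort(np.linalg.eigvalsh(np.cov(X_del, rowvar=False)))[::-1]
        out[i] = (n - 1) * (lam[k] - lam_del[k])
    return out

# Illustrative use: an artificial outlier should dominate the influence
# on the first eigenvalue.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X[0] = [8.0, 8.0, 8.0]                                        # planted outlier
infl = sample_influence_on_eigenvalue(X, k=0)
```

Comparing these leave-one-out values with evaluations of the theoretical expressions (with sample quantities substituted for the unknown parameters) is precisely the comparison between the sample and empirical influence functions discussed above.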
