Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
10.2. Influential Observations in a Principal Component Analysis

A change to a covariance matrix may change one of the eigenvalues without affecting the others, but this cannot happen for a correlation matrix. For a correlation matrix the sum of the eigenvalues is a constant, so that if one of them is changed there must be compensatory changes in at least one of the others.

Expressions for I(x; α_k) are more complicated than those for I(x; λ_k); for example, for covariance matrices we have

$$I(x; \alpha_k) = -z_k \sum_{h \neq k}^{p} z_h \alpha_h (\lambda_h - \lambda_k)^{-1} \qquad (10.2.4)$$

compared with (10.2.2) for I(x; λ_k). A number of comments can be made concerning (10.2.4) and the corresponding expression for correlation matrices, which is

$$I(x; \alpha_k) = \sum_{h \neq k}^{p} \alpha_h (\lambda_h - \lambda_k)^{-1} \sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} \alpha_{hi} \alpha_{kj} \, I(x; \rho_{ij}). \qquad (10.2.5)$$

First, and perhaps most important, the form of the expression is completely different from that for I(x; λ_k). It is possible for an observation to be influential for λ_k but not for α_k, and vice versa. This behaviour is illustrated by the examples in Section 10.2.1 below.

A second, related point is that for covariance matrices I(x; α_k) depends on all of the PCs, z_1, z_2, ..., z_p, unlike I(x; λ_k), which depends just on z_k. The dependence is quadratic, but involves only cross-product terms z_j z_k, j ≠ k, and not linear or squared terms. The general shape of the influence curves I(x; α_k) is hyperbolic for both covariance and correlation matrices, but the details of the functions are different. The dependence of both (10.2.4) and (10.2.5) on the eigenvalues is through (λ_h − λ_k)^{-1}.
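Equation (10.2.4) can be verified numerically. The sketch below (a minimal illustration on simulated data, assuming NumPy; the data, sample size and variable names are all hypothetical) replaces the population quantities in (10.2.4) by sample estimates and checks that the result matches the first-order change in α_k when the sample covariance matrix is contaminated slightly in the direction of an observation x:

```python
import numpy as np

# Hypothetical data: n = 200 observations on p = 4 variables.
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)                # sample covariance matrix
lam, A = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]              # eigenvalues in decreasing order
lam, A = lam[order], A[:, order]

def influence_alpha(x, k):
    """Equation (10.2.4) with sample estimates in place of population
    quantities: I(x; alpha_k) = -z_k sum_{h != k} z_h alpha_h (lam_h - lam_k)^{-1}."""
    z = A.T @ (x - mu)                     # PC scores of x
    I = np.zeros(p)
    for h in range(p):
        if h != k:
            I -= z[k] * z[h] * A[:, h] / (lam[h] - lam[k])
    return I

# First-order check: contaminate S slightly in the direction of x.
# The derivative of the covariance functional under contamination at x
# is (x - mu)(x - mu)' - S.
x, k, eps = X[0], 0, 1e-7
d = x - mu
S_eps = S + eps * (np.outer(d, d) - S)
lam_e, A_e = np.linalg.eigh(S_eps)
A_e = A_e[:, np.argsort(lam_e)[::-1]]
A_e[:, k] *= np.sign(A[:, k] @ A_e[:, k])  # eigenvectors have arbitrary sign
err = np.max(np.abs((A_e[:, k] - A[:, k]) / eps - influence_alpha(x, k)))
print(err)                                 # should be near zero
```

The divisors (λ_h − λ_k) in the loop make visible why influence blows up when neighbouring eigenvalues are close.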
This means that influence, and hence changes to α_k resulting from small perturbations to the data, tend to be large when λ_k is close to λ_{k−1} or to λ_{k+1}.

A final point is that, unlike regression, the influence of different observations in PCA is approximately additive; that is, the presence of one observation does not affect the influence of another (Calder, 1986; Tanaka and Tarumi, 1987).

To show that theoretical influence functions are relevant to sample data, predictions from the theoretical influence function can be compared with the sample influence function, which measures the actual changes caused by deleting one observation at a time from a data set. The theoretical influence function typically contains unknown parameters, and these must be replaced by equivalent sample quantities in such comparisons. This gives what Critchley (1985) calls the empirical influence function. He also considers a third sample-based influence function, the deleted empirical influence function, in which the unknown quantities in the theoretical influence function are estimated using a sample from which the observation
whose influence is to be assessed is omitted. The first example given in Section 10.2.1 below illustrates that the empirical influence function can give a good approximation to the sample influence function for moderate sample sizes. Critchley (1985) compares the various influence functions from a more theoretical viewpoint.

A considerable amount of work was done in the late 1980s and early 1990s on influence functions in multivariate analysis, some of which extends the basic results for PCA given earlier in this section. Benasseni in France and Tanaka and co-workers in Japan were particularly active in various aspects of influence and sensitivity for a wide range of multivariate techniques. Some of their work on sensitivity will be discussed further in Section 10.3.

Tanaka (1988) extends earlier work on influence in PCA in two related ways. The first is to consider explicitly the situation where there are equal eigenvalues; equations (10.2.4) and (10.2.5) break down in this case. Secondly, he considers influence functions for subspaces spanned by subsets of PCs, not simply individual PCs. Specifically, if A_q is a matrix whose columns are a subset of q eigenvectors, and Λ_q is the diagonal matrix of corresponding eigenvalues, Tanaka (1988) finds expressions for I(x; A_q Λ_q A_q') and I(x; A_q A_q').
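The agreement between the empirical influence function and the sample influence function can be illustrated numerically. The sketch below (simulated data with one planted outlier; it assumes NumPy and takes the covariance-matrix expression I(x; λ_k) = z_k² − λ_k for (10.2.2), with sample estimates substituted) predicts the change in the largest eigenvalue on deleting the outlier and compares it with the change actually produced:

```python
import numpy as np

# Simulated data with a planted influential observation at index 0.
rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p)) @ np.diag([3.0, 2.0, 1.0, 0.5])
X[0] = np.array([15.0, 0.0, 0.0, 0.0])      # outlier along the first axis

mu = X.mean(axis=0)
lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]

# Empirical influence on lambda_k: I(x; lambda_k) = z_k^2 - lambda_k,
# with population quantities replaced by sample estimates.
k = 0
z = A.T @ (X[0] - mu)
I_lam = z[k] ** 2 - lam[k]

# Deleting one observation corresponds to contamination weight -1/(n-1),
# so the predicted change in lambda_k is -I_lam / (n - 1).
predicted = -I_lam / (n - 1)

# Sample influence: the change actually produced by the deletion.
S_del = np.cov(np.delete(X, 0, axis=0), rowvar=False)
lam_del = np.sort(np.linalg.eigvalsh(S_del))[::-1]
actual = lam_del[k] - lam[k]
print(predicted, actual)                     # the two should roughly agree
```

With a single strongly influential observation and a moderate sample size, the first-order prediction tracks the deletion result to within a few percent, which is the behaviour the first example of Section 10.2.1 reports.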
In discussing a general strategy for analysing influence in multivariate methods, Tanaka (1995) suggests that groups of observations with similar patterns of influence across a set of parameters may be detected by means of a PCA of the empirical influence functions for each parameter.

Benasseni (1990) examines a number of measures for comparing principal component subspaces computed with and without one of the observations. After eliminating some possible measures, such as the RV-coefficient (Robert and Escoufier, 1976) and Yanai's generalized coefficient of determination (Yanai, 1980), for being too insensitive to perturbations, he settles on

$$\rho_{1(i)} = 1 - \sum_{k=1}^{q} \frac{\|a_k - P_{(i)} a_k\|}{q}$$

and

$$\rho_{2(i)} = 1 - \sum_{k=1}^{q} \frac{\|a_{k(i)} - P a_{k(i)}\|}{q},$$

where a_k, a_{k(i)} are eigenvectors with and without the ith observation, P, P_{(i)} are projection matrices onto the subspaces derived with and without the ith observation, and the summation is over the q eigenvectors within the subspace of interest. Benasseni (1990) goes on to find expressions for the theoretical influence functions for these two quantities, which can then be used to compute empirical influences.

Reducing the comparison of two subspaces to a single measure inevitably leads to a loss of information about the structure of the differences be-
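Benasseni's two measures are straightforward to compute directly. The following minimal sketch (assuming NumPy; the simulated data, sample size and function name are illustrative, with the subspace of interest taken to be the span of the q leading eigenvectors of the sample covariance matrix) builds the two projection matrices and evaluates ρ_{1(i)} and ρ_{2(i)}:

```python
import numpy as np

def benasseni_rho(X, i, q):
    """Benasseni's (1990) rho_1(i) and rho_2(i), comparing the leading
    q-dimensional PC subspaces computed with and without observation i.
    Both lie in [0, 1]; values near 1 indicate a stable subspace."""
    def leading_eigvecs(Y):
        lam, A = np.linalg.eigh(np.cov(Y, rowvar=False))
        return A[:, np.argsort(lam)[::-1][:q]]   # q leading eigenvectors
    A_full = leading_eigvecs(X)                  # a_k, all observations
    A_del = leading_eigvecs(np.delete(X, i, axis=0))   # a_k(i), obs i removed
    P = A_full @ A_full.T                        # projector onto full-data subspace
    P_i = A_del @ A_del.T                        # projector onto deleted subspace
    rho1 = 1 - sum(np.linalg.norm(A_full[:, k] - P_i @ A_full[:, k])
                   for k in range(q)) / q
    rho2 = 1 - sum(np.linalg.norm(A_del[:, k] - P @ A_del[:, k])
                   for k in range(q)) / q
    return rho1, rho2

# Simulated example: a non-influential observation barely moves the subspace.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ np.diag([4.0, 3.0, 2.0, 1.0, 0.5])
rho1, rho2 = benasseni_rho(X, i=0, q=2)
print(rho1, rho2)            # both close to 1 for a typical observation
```

Note that the measures depend only on the subspaces, not on the arbitrary signs of individual eigenvectors, since each term projects an eigenvector onto the other subspace.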