Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
10.2. Influential Observations in a Principal Component Analysis

influential depends on the analysis being done on the data set; observations that are influential for one type of analysis or parameter of interest may not be so for a different analysis or parameter. This behaviour is evident in PCA where observations that are influential for the coefficients of a PC are not necessarily influential for the variance of that PC, and vice versa. We have seen in Section 10.1 that PCA can be used to search for influential observations in a regression analysis. The present section concentrates on looking for observations that are influential for some aspect of a PCA, either the variances, the coefficients (loadings) or the PCs themselves (the PC scores).

The intuitive definition of the influence of an observation on a statistic, such as the kth eigenvalue $l_k$ or eigenvector $a_k$ of a sample covariance matrix, is simply the change in $l_k$ or $a_k$, perhaps renormalized in some way, when the observation is deleted from the sample. For example, the sample influence function for the ith observation on a quantity $\hat{\theta}$, which might be $l_k$ or $a_k$, is defined by Critchley (1985) as $(n-1)(\hat{\theta} - \hat{\theta}_{(i)})$, where n is the sample size and $\hat{\theta}_{(i)}$ is the quantity corresponding to $\hat{\theta}$ when the ith observation is omitted from the sample. Gnanadesikan and Kettenring (1972) suggested similar leave-one-out statistics for the correlation coefficient and for $\sum_{k=1}^{p} l_k$. The problems with influence defined in this manner are: there may be no closed form for the influence function; the influence needs to be computed afresh for each different sample. Various other definitions of sample influence have been proposed (see, for example, Cook and Weisberg, 1982, Section 3.4; Critchley, 1985); some of these have closed-form expressions for regression coefficients (Cook and Weisberg, 1982, Section 3.4), but not for the statistics of interest in PCA.
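The leave-one-out definition above can be illustrated numerically. The following is a minimal sketch, assuming NumPy and taking $\hat{\theta}$ to be the vector of eigenvalues of the sample covariance matrix; the function names and the simulated data with one planted outlier are hypothetical and are not taken from the text.

```python
import numpy as np

def eigenvalues(X):
    """Eigenvalues of the sample covariance matrix of X, largest first."""
    S = np.cov(X, rowvar=False)
    return np.linalg.eigvalsh(S)[::-1]

def sample_influence_eigenvalues(X):
    """Return an (n, p) array whose row i is (n - 1) * (l - l_(i)),
    where l_(i) are the eigenvalues with observation i deleted."""
    n, p = X.shape
    l_full = eigenvalues(X)
    out = np.empty((n, p))
    for i in range(n):
        l_minus_i = eigenvalues(np.delete(X, i, axis=0))
        out[i] = (n - 1) * (l_full - l_minus_i)
    return out

# Hypothetical data: 50 standard normal observations with one gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [8.0, 8.0, 8.0]

infl = sample_influence_eigenvalues(X)
# Observation with the largest absolute influence on the first eigenvalue.
most_influential = int(np.argmax(np.abs(infl[:, 0])))
```

The loop recomputes the eigendecomposition n times, which makes concrete the drawback noted above: the influence has to be computed afresh for each sample.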
Alternatively, a theoretical influence function may be defined that can be expressed as a once-and-for-all formula and, provided that sample sizes are not too small, can be used to estimate the influence of actual and potential observations within a sample.

To define the theoretical influence function, suppose that y is a p-variate random vector, and let y have cumulative distribution function (c.d.f.) $F(y)$. If $\theta$ is a vector of parameters of the distribution of y (such as $\lambda_k$, $\alpha_k$, respectively, the kth eigenvalue and eigenvector of the covariance matrix of y) then $\theta$ can be written as a functional of $F$. Now let $F(y)$ be perturbed to become

$$\tilde{F}(y) = (1 - \varepsilon)F(y) + \varepsilon\delta_x,$$

where 0
10. Outlier Detection, Influential Observations and Robust Estimation

Alternatively, if $\tilde{\theta}$ is expanded about $\theta$ as a power series in $\varepsilon$, that is

$$\tilde{\theta} = \theta + c_1\varepsilon + c_2\varepsilon^2 + \cdots, \tag{10.2.1}$$

then the influence function is the coefficient $c_1$ of $\varepsilon$ in this expansion.

Some of the above may appear somewhat abstract, but in many situations an expression can be derived for $I(x; \theta)$ without too much difficulty and, as we shall see in examples below, $I(x; \theta)$ can provide valuable guidance about the influence of individual observations in samples.

The influence functions for $\lambda_k$ and $\alpha_k$ are given by Radhakrishnan and Kshirsagar (1981) and by Critchley (1985) for the case of covariance matrices. Critchley (1985) also discusses various sample versions of the influence function and considers the coefficients of $\varepsilon^2$, as well as $\varepsilon$, in the expansion (10.2.1). Pack et al. (1988) give the main results for correlation matrices, which are somewhat different in character from those for covariance matrices. Calder (1986) can be consulted for further details.

For covariance matrices, the theoretical influence function for $\lambda_k$ can be written very simply as

$$I(x; \lambda_k) = z_k^2 - \lambda_k, \tag{10.2.2}$$

where $z_k$ is the value of the kth PC for the given value of x, that is, $z_k$ is the kth element of z, where $z = A'x$, using the same notation as in earlier chapters. Thus, the influence of an observation on $\lambda_k$ depends only on its score on the kth component; an observation can be extreme on any or all of the other components without affecting $\lambda_k$. This illustrates the point made earlier that outlying observations need not necessarily be influential for every part of an analysis.

For correlation matrices, $I(x; \lambda_k)$ takes a different form, which can be written most conveniently as

$$I(x; \lambda_k) = \sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} \alpha_{ki}\alpha_{kj}\, I(x; \rho_{ij}), \tag{10.2.3}$$

where $\alpha_{kj}$ is the jth element of $\alpha_k$, $I(x; \rho_{ij}) = -\tfrac{1}{2}\rho_{ij}(x_i^2 + x_j^2) + x_i x_j$, and $x_i$, $x_j$ are elements of x standardized to zero mean and unit variance.
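As a rough numerical illustration of (10.2.2) and (10.2.3), the sketch below evaluates both expressions at every observation of a data matrix, with the population quantities $\lambda_k$, $\alpha_k$ and $\rho_{ij}$ replaced by their sample counterparts. This substitution, the centring of x about the sample mean, the use of divisor n for variances, and the function names are all assumptions of the sketch rather than anything prescribed in the text.

```python
import numpy as np

def influence_cov(X):
    """Empirical version of (10.2.2): entry (m, k) is z_mk**2 - l_k."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]       # covariance with divisor n (assumption)
    l, A = np.linalg.eigh(S)
    l, A = l[::-1], A[:, ::-1]       # order eigenvalues largest first
    Z = Xc @ A                       # PC scores: z = A'x for each observation
    return Z**2 - l

def influence_corr(X):
    """Empirical version of (10.2.3): entry (m, k) is I(x_m; lambda_k)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance
    n, p = Xs.shape
    R = Xs.T @ Xs / n                # sample correlation matrix
    _, A = np.linalg.eigh(R)
    A = A[:, ::-1]                   # columns are a_1, a_2, ..., largest first
    sq = Xs**2
    # I_rho[m, i, j] = -0.5 * r_ij * (x_i^2 + x_j^2) + x_i * x_j
    I_rho = (-0.5 * R * (sq[:, :, None] + sq[:, None, :])
             + Xs[:, :, None] * Xs[:, None, :])
    I_rho[:, np.arange(p), np.arange(p)] = 0.0  # the double sum excludes i == j
    # Sum over i != j of a_ki * a_kj * I(x; rho_ij) for each observation and k
    return np.einsum("ik,mij,jk->mk", A, I_rho, A)

# Hypothetical correlated data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
ic = influence_cov(X)
ir = influence_corr(X)
```

With divisor n throughout, each column of both arrays averages to zero over the sample, which gives a quick sanity check on an implementation. Column k of `ic` depends only on the kth PC score, whereas column k of `ir` draws on all elements of the standardized observation.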
$I(x; \rho_{ij})$ is the influence function for the correlation coefficient $\rho_{ij}$, which is given by Devlin et al. (1975). The expression (10.2.3) is relatively simple, and it shows that investigation of the influence of an observation on the correlation coefficients is useful in determining the influence of the observation on $\lambda_k$.

There is a corresponding expression to (10.2.3) for covariance matrices that expresses $I(x; \lambda_k)$ in terms of influence functions for the elements of the covariance matrix. However, when $I(x; \lambda_k)$ in (10.2.3) is written in terms of x, or the PCs, by substituting for $I(x; \rho_{ij})$, it cannot be expressed in as simple a form as in (10.2.2). In particular, $I(x; \lambda_k)$ now depends on $z_j$, $j = 1, 2, \ldots, p$, and not just on $z_k$. This result reflects the fact that a