Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

10.2. Influential Observations in a Principal Component Analysis

influential depends on the analysis being done on the data set; observations that are influential for one type of analysis or parameter of interest may not be so for a different analysis or parameter. This behaviour is evident in PCA where observations that are influential for the coefficients of a PC are not necessarily influential for the variance of that PC, and vice versa. We have seen in Section 10.1 that PCA can be used to search for influential observations in a regression analysis. The present section concentrates on looking for observations that are influential for some aspect of a PCA, either the variances, the coefficients (loadings) or the PCs themselves (the PC scores).

The intuitive definition of the influence of an observation on a statistic, such as the kth eigenvalue $l_k$ or eigenvector $a_k$ of a sample covariance matrix, is simply the change in $l_k$ or $a_k$, perhaps renormalized in some way, when the observation is deleted from the sample. For example, the sample influence function for the ith observation on a quantity $\hat{\theta}$, which might be $l_k$ or $a_k$, is defined by Critchley (1985) as $(n-1)(\hat{\theta} - \hat{\theta}_{(i)})$, where $n$ is the sample size and $\hat{\theta}_{(i)}$ is the quantity corresponding to $\hat{\theta}$ when the ith observation is omitted from the sample. Gnanadesikan and Kettenring (1972) suggested similar leave-one-out statistics for the correlation coefficient and for $\sum_{k=1}^{p} l_k$. The problems with influence defined in this manner are: there may be no closed form for the influence function; the influence needs to be computed afresh for each different sample. Various other definitions of sample influence have been proposed (see, for example, Cook and Weisberg, 1982, Section 3.4; Critchley, 1985); some of these have closed-form expressions for regression coefficients (Cook and Weisberg, 1982, Section 3.4), but not for the statistics of interest in PCA.
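The leave-one-out definition above, $(n-1)(\hat{\theta} - \hat{\theta}_{(i)})$, can be sketched numerically for the eigenvalues $l_k$. The following is only an illustration under stated assumptions: it uses NumPy, the function names `eigenvalues_desc` and `sample_influence_eigenvalues` are hypothetical, and the data (including the planted outlier) are simulated rather than taken from the text. It also makes the second drawback concrete, since the whole loop has to be rerun for every new sample.

```python
import numpy as np

def eigenvalues_desc(data):
    """Eigenvalues l_1 >= ... >= l_p of the sample covariance matrix."""
    cov = np.cov(data, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order

def sample_influence_eigenvalues(data):
    """Row i holds (n - 1) * (l_k - l_k(i)) for k = 1..p (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    full = eigenvalues_desc(data)
    influence = np.empty((n, p))
    for i in range(n):
        reduced = np.delete(data, i, axis=0)  # omit the ith observation
        influence[i] = (n - 1) * (full - eigenvalues_desc(reduced))
    return influence

# Simulated data (an assumption, not from the text) with one planted outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
x[0] = [8.0, 8.0, 8.0]
infl = sample_influence_eigenvalues(x)
most_influential = int(np.argmax(np.abs(infl[:, 0])))
print("observation with largest influence on l_1:", most_influential)
```

The cost is one eigendecomposition per observation, which is part of the motivation for a closed-form alternative.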
Alternatively, a theoretical influence function may be defined that can be expressed as a once-and-for-all formula and, provided that sample sizes are not too small, can be used to estimate the influence of actual and potential observations within a sample.

To define the theoretical influence function, suppose that $y$ is a p-variate random vector, and let $y$ have cumulative distribution function (c.d.f.) $F(y)$. If $\theta$ is a vector of parameters of the distribution of $y$ (such as $\lambda_k$, $\alpha_k$, respectively, the kth eigenvalue and eigenvector of the covariance matrix of $y$) then $\theta$ can be written as a functional of $F$. Now let $F(y)$ be perturbed to become

$$\tilde{F}(y) = (1 - \varepsilon)F(y) + \varepsilon\delta_x,$$

where 0

10. Outlier Detection, Influential Observations and Robust Estimation

Alternatively, if $\tilde{\theta}$ is expanded about $\theta$ as a power series in $\varepsilon$, that is

$$\tilde{\theta} = \theta + c_1\varepsilon + c_2\varepsilon^2 + \cdots, \tag{10.2.1}$$

then the influence function is the coefficient $c_1$ of $\varepsilon$ in this expansion.

Some of the above may appear somewhat abstract, but in many situations an expression can be derived for $I(x; \theta)$ without too much difficulty and, as we shall see in examples below, $I(x; \theta)$ can provide valuable guidance about the influence of individual observations in samples.

The influence functions for $\lambda_k$ and $\alpha_k$ are given by Radhakrishnan and Kshirsagar (1981) and by Critchley (1985) for the case of covariance matrices. Critchley (1985) also discusses various sample versions of the influence function and considers the coefficients of $\varepsilon^2$, as well as $\varepsilon$, in the expansion (10.2.1). Pack et al. (1988) give the main results for correlation matrices, which are somewhat different in character from those for covariance matrices. Calder (1986) can be consulted for further details.

For covariance matrices, the theoretical influence function for $\lambda_k$ can be written very simply as

$$I(x; \lambda_k) = z_k^2 - \lambda_k, \tag{10.2.2}$$

where $z_k$ is the value of the kth PC for the given value of $x$, that is, $z_k$ is the kth element of $z$, where $z = A'x$, using the same notation as in earlier chapters. Thus, the influence of an observation on $\lambda_k$ depends only on its score on the kth component; an observation can be extreme on any or all of the other components without affecting $\lambda_k$. This illustrates the point made earlier that outlying observations need not necessarily be influential for every part of an analysis.

For correlation matrices, $I(x; \lambda_k)$ takes a different form, which can be written most conveniently as

$$I(x; \lambda_k) = \sum_{i=1}^{p} \sum_{\substack{j=1 \\ i \neq j}}^{p} \alpha_{ki}\alpha_{kj}\, I(x; \rho_{ij}), \tag{10.2.3}$$

where $\alpha_{kj}$ is the jth element of $\alpha_k$, $I(x; \rho_{ij}) = -\tfrac{1}{2}\rho_{ij}(x_i^2 + x_j^2) + x_i x_j$, and $x_i$, $x_j$ are elements of $x$ standardized to zero mean and unit variance.
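Expressions (10.2.2) and (10.2.3) are simple enough to evaluate directly. The sketch below is a hedged illustration, not the author's code: the function names are hypothetical, NumPy is assumed, the covariance or correlation matrix is treated as known, and in the covariance case $x$ is centred by a supplied mean before the scores $z = A'x$ are formed (an assumption made here for convenience).

```python
import numpy as np

def influence_eigenvalue_cov(x, mean, cov, k):
    """Eq. (10.2.2): I(x; lambda_k) = z_k**2 - lambda_k, with k zero-based."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder, largest first
    lam, A = eigvals[order], eigvecs[:, order]
    z = A.T @ (np.asarray(x, dtype=float) - mean)  # PC scores z = A'x (x centred)
    return z[k] ** 2 - lam[k]

def influence_eigenvalue_corr(x_std, corr, k):
    """Eq. (10.2.3): sum over i != j of alpha_ki * alpha_kj * I(x; rho_ij)."""
    x_std = np.asarray(x_std, dtype=float)      # already standardized elements
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    alpha_k = eigvecs[:, order[k]]
    sq = x_std ** 2
    # I(x; rho_ij) = -0.5 * rho_ij * (x_i^2 + x_j^2) + x_i * x_j
    infl_rho = -0.5 * corr * (sq[:, None] + sq[None, :]) + np.outer(x_std, x_std)
    np.fill_diagonal(infl_rho, 0.0)             # the sum excludes i == j
    return alpha_k @ infl_rho @ alpha_k

# Covariance case: changing the second coordinate leaves I(x; lambda_1) alone.
cov = np.diag([4.0, 1.0])
a = influence_eigenvalue_cov([3.0, 5.0], np.zeros(2), cov, k=0)
b = influence_eigenvalue_cov([3.0, 100.0], np.zeros(2), cov, k=0)

# Correlation case with a hypothetical 2 x 2 correlation matrix.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
c0 = influence_eigenvalue_corr([1.0, 1.0], corr, k=0)
c1 = influence_eigenvalue_corr([1.0, 1.0], corr, k=1)
print(a, b, c0, c1)
```

In the covariance example the two calls differ only in the second coordinate of $x$, which does not enter $z_1$, so they return the same influence on $\lambda_1$. In the correlation example the two eigenvalue influences cancel, as one would expect given that the eigenvalues of a correlation matrix always sum to $p$.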
$I(x; \rho_{ij})$ is the influence function for the correlation coefficient $\rho_{ij}$, which is given by Devlin et al. (1975). The expression (10.2.3) is relatively simple, and it shows that investigation of the influence of an observation on the correlation coefficients is useful in determining the influence of the observation on $\lambda_k$.

There is a corresponding expression to (10.2.3) for covariance matrices that expresses $I(x; \lambda_k)$ in terms of influence functions for the elements of the covariance matrix. However, when $I(x; \lambda_k)$ in (10.2.3) is written in terms of $x$, or the PCs, by substituting for $I(x; \rho_{ij})$, it cannot be expressed in as simple a form as in (10.2.2). In particular, $I(x; \lambda_k)$ now depends on $z_j$, $j = 1, 2, \ldots, p$, and not just on $z_k$. This result reflects the fact that a

