Jolliffe, I. T. Principal Component Analysis (2nd ed., Springer, 2002)
10.2. Influential Observations in a Principal Component Analysis

... between the two. Ramsier (1991) introduces a graphical method, using ideas similar to those of Andrews' curves (see Section 5.6), in which curves representing subspaces with and without individual observations omitted are displayed. It is not easy to deduce exactly how a particular change in subspace structure is reflected in differences between curves. However, curves plotted with an observation missing that are close to the curve for the full data set imply negligible influence on the subspace for that observation, and similarly shaped curves for different omitted observations suggest that these observations have similar detailed effects on the subspace's structure.

Krzanowski (1987a) notes that the algorithm used by Eastment and Krzanowski (1982) to decide how many components to retain (see Section 6.1.5) calculates elements of the singular value decomposition of the data matrix X with individual observations or variables omitted. It can therefore be used to evaluate the influence of each observation on the subspace spanned by however many components it has been decided to keep. Mertens et al. (1995) similarly use a cross-validation algorithm to give easily computed expressions for the sample influence of observations on the eigenvalues of a covariance matrix. They also provide a closed form expression for the angle between an eigenvector of that matrix using all the data and the corresponding eigenvector when an observation is omitted. An example illustrating the use of these expressions for spectroscopic data is given by Mertens (1998), together with some discussion of the relationships between measures of influence and outlyingness.

Wang and Nyquist (1991) provide a number of algebraic results, proofs and approximations relating eigenvalues and eigenvectors of covariance matrices with and without the removal of one of the n observations. Hadi and Nyquist (1993) give improved approximations for eigenvalues, and Wang and Liski (1993) extend the results for both eigenvalues and eigenvectors to the situation where more than one observation is removed. Comparisons are made with Critchley's (1985) results for the special case of a single deleted observation.

Brooks (1994) uses simulation to address the question of when an apparently influential observation can be declared 'significant.' He points out that the sample influence function (which he confusingly refers to as the 'empirical influence function') for an observation x_i depends on all the other observations in the sample, so that simulation of repeated values of this function under some null hypothesis requires whole samples to be generated. The theoretical influence function can be evaluated more easily because equations (10.2.2)-(10.2.5) depend only on the eigenvalues and eigenvectors of the correlation or covariance matrix together with the single observation whose influence is to be assessed. Thus, if the sample correlation or covariance matrix is used as a surrogate for the corresponding population matrix, it is only necessary to simulate individual observations, rather than a whole new sample, in order to generate a value for this influence function. Of course, any simulation study requires some assumption about the ...
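The leave-one-out quantities discussed above can be illustrated by direct computation. The sketch below is not the closed-form or cross-validation algorithm of Mertens et al. (1995) or Wang and Nyquist (1991); it simply recomputes the eigendecomposition of the covariance matrix with each observation deleted and reports the change in each eigenvalue together with the angle between the leading eigenvector and its full-data counterpart. The data matrix X and the use of the covariance (rather than correlation) matrix are illustrative assumptions.

```python
import numpy as np

def loo_influence(X):
    """Brute-force leave-one-out influence on covariance eigenvalues
    and on the leading eigenvector (angle in degrees)."""
    n, p = X.shape
    # Full-data covariance matrix and its eigendecomposition
    S = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    dlambda = np.empty((n, p))   # change in each eigenvalue when observation i is removed
    angle1 = np.empty(n)         # angle between leading eigenvectors (degrees)
    for i in range(n):
        Xi = np.delete(X, i, axis=0)
        Si = np.cov(Xi, rowvar=False)
        li, vi = np.linalg.eigh(Si)
        oi = np.argsort(li)[::-1]
        li, vi = li[oi], vi[:, oi]
        dlambda[i] = eigvals - li
        # Angle between leading eigenvectors; the sign of an eigenvector is arbitrary
        cosang = abs(eigvecs[:, 0] @ vi[:, 0])
        angle1[i] = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return dlambda, angle1

# Example with simulated data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
dlam, ang = loo_influence(X)
print("Most influential observation for the first eigenvalue:",
      int(np.argmax(np.abs(dlam[:, 0]))))
print("Largest rotation of the leading eigenvector (degrees):", ang.max())
```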
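Brooks' (1994) point that only individual observations need be simulated can also be sketched for the eigenvalue case. The code below assumes the standard influence function for the kth eigenvalue of a covariance matrix, I(x; lambda_k) = z_k^2 - lambda_k with z_k = a_k'(x - mu); the exact expressions to use are those in equations (10.2.2)-(10.2.5), which are not reproduced in this excerpt. Observations are simulated from a normal distribution with the sample mean and covariance matrix used as surrogates for the population quantities, and observed influence values are compared with the simulated null distribution; the normality assumption and the 99th-percentile cut-off are illustrative choices, not part of the original treatment.

```python
import numpy as np

def eigenvalue_influence(x, mean, eigvals, eigvecs):
    """Influence of a single observation x on the covariance eigenvalues,
    assuming I(x; lambda_k) = z_k**2 - lambda_k with z_k = a_k'(x - mean)."""
    z = eigvecs.T @ (x - mean)
    return z**2 - eigvals

def simulate_null_influence(S, mean, eigvals, eigvecs, n_sim=10000, seed=0):
    """Simulate influence values for single observations drawn from N(mean, S),
    using the sample matrices as surrogates for the population ones."""
    rng = np.random.default_rng(seed)
    sims = rng.multivariate_normal(mean, S, size=n_sim)
    Z = (sims - mean) @ eigvecs           # principal component scores
    return Z**2 - eigvals                 # n_sim x p matrix of influence values

# Illustrative use: flag observations whose influence on the first
# eigenvalue is extreme relative to the simulated null distribution.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
mean = X.mean(axis=0)
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

observed = np.array([eigenvalue_influence(x, mean, eigvals, eigvecs) for x in X])
null = simulate_null_influence(S, mean, eigvals, eigvecs)
threshold = np.quantile(np.abs(null[:, 0]), 0.99)   # 99th percentile under the null
print("Observations flagged for the first eigenvalue:",
      np.where(np.abs(observed[:, 0]) > threshold)[0])
```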
