12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



254 10. Outlier Detection, Influential Observations and Robust Estimation

distribution from which the simulated data are generated. Brooks (1994) gives an example in which 1000 simulated observations are generated from a 7-variable multivariate normal distribution whose parameters are those estimated from the available sample of data. Empirical distributions of the estimated theoretical influence functions for eigenvalues and eigenvectors of the correlation matrix are constructed from the simulated observations by computing the values of these functions for each of the 1000 observations. The actual values of the functions for the same 7 anatomical variables as were discussed in Sections 5.1.1 and 10.1, but for a different sample of 44 students, are then compared to these distributions. Observations whose values are in the upper 5% (1%) tails of the distributions can be considered to be ‘significantly influential’ at the 5% (1%) level. Brooks (1994) finds that, on this basis, 10 of the 44 observations are significant at the 5% level for more than one of the 7 eigenvalues and/or eigenvectors. Brooks (1994) uses the same reasoning to investigate ‘significantly influential’ observations in a two-dimensional principal component subspace based on Tanaka’s (1988) influence functions $I(x; A_q \Lambda_q A_q')$ and $I(x; A_q A_q')$.

10.2.1 Examples

Two examples are now given, both using data sets that have been discussed earlier. In the first example we examine the usefulness of expressions for theoretical influence in predicting the actual effect of omitting observations for the data on artistic qualities of painters described in Section 5.1.1. As a second illustration, we follow up the suggestion, made in Section 10.1, that an outlier is largely responsible for the form of the second PC in the student anatomical data.

Artistic Qualities of Painters

We consider again the set of four subjectively assessed variables for 54 painters, which was described by Davenport and Studdert-Kennedy (1972) and discussed in Section 5.1.1. Tables 10.2 and 10.3 give some comparisons between the values of the influence functions obtained from expressions such as (10.2.2), (10.2.3), (10.2.4) and (10.2.5) by substituting sample quantities $l_k$, $a_{kj}$, $r_{ij}$ in place of the unknown $\lambda_k$, $\alpha_{kj}$, $\rho_{ij}$, and the actual changes observed in eigenvalues and eigenvectors when individual observations are omitted. The information given in Table 10.2 relates to PCs derived from the covariance matrix; Table 10.3 gives corresponding results for the correlation matrix. Some further explanation is necessary of exactly how the numbers in these two tables are derived.

First, the ‘actual’ changes in eigenvalues are precisely that: the differences between eigenvalues with and without a particular observation included in the analysis. The tables give the four largest and four smallest such changes for each PC, and identify those observations for which
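The comparison described in this example, between estimated influence and the actual change in an eigenvalue when an observation is omitted, can be sketched in Python for the covariance-matrix case. This is a minimal illustration under stated assumptions, not the computation behind Tables 10.2 and 10.3. It assumes that the eigenvalue influence function takes the form $z_k^2 - \lambda_k$, where $z_k$ is the score of the observation on the kth PC; that form is not shown in this excerpt and is only presumed to be what (10.2.2) refers to. The function names are hypothetical, and the data are synthetic normal draws with the same dimensions (54 observations, 4 variables) as the painters data, not the painters data themselves.

```python
import numpy as np


def covariance_pca(X):
    """Return eigenvalues (descending), eigenvectors (as columns) and PC scores."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)        # sample covariance, divisor n - 1
    evals, evecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, Xc @ evecs


def estimated_eigenvalue_influence(X):
    """Assumed form z_ik**2 - l_k, with sample quantities substituted.

    Row i, column k holds the estimated influence of observation i
    on the kth eigenvalue of the covariance matrix.
    """
    evals, _, scores = covariance_pca(X)
    return scores**2 - evals


def actual_eigenvalue_changes(X):
    """Eigenvalues with all observations minus eigenvalues with row i omitted."""
    full, _, _ = covariance_pca(X)
    n, p = X.shape
    changes = np.empty((n, p))
    for i in range(n):
        reduced, _, _ = covariance_pca(np.delete(X, i, axis=0))
        changes[i] = full - reduced
    return changes


# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(54, 4)) * np.array([4.0, 3.0, 2.0, 1.0])

estimated = estimated_eigenvalue_influence(X)
actual = actual_eigenvalue_changes(X)

# Four smallest and four largest actual changes for the first PC,
# together with the estimated influence of the same observations.
order = np.argsort(actual[:, 0])
for label, idx in (("smallest", order[:4]), ("largest", order[-4:])):
    print(label, "changes, PC1: observations", idx)
    print("  actual   :", actual[idx, 0])
    print("  estimated:", estimated[idx, 0])
```

The estimated influences and the actual changes are on different scales: to first order, omitting one of n observations changes an eigenvalue by roughly the influence divided by n - 1, so it is the ordering of the observations, rather than the raw magnitudes, that is directly comparable in this sketch.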
