12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


10. Outlier Detection, Influential Observations and Robust Estimation

Trace Element Concentrations

These data, which are discussed by Hawkins and Fatti (1984), consist of measurements of the log concentrations of 12 trace elements in 75 rock-chip samples. In order to detect outliers, Hawkins and Fatti simply look at the values for each observation on each variable, on each PC, and on transformed and rotated PCs. To decide whether an observation is an outlier, a cut-off is defined assuming normality and using a Bonferroni bound with significance level 0.01. On the original variables, only two observations satisfy this criterion for outliers, but the number of outliers increases to seven if (unrotated) PCs are used. Six of these seven outlying observations are extreme on one of the last four PCs, and each of these low-variance PCs accounts for less than 1% of the total variation. The PCs are thus again detecting observations whose correlation structure differs from the bulk of the data, rather than those that are extreme on individual variables. Indeed, one of the 'outliers' on the original variables is not detected by the PCs.

When transformed and rotated PCs are considered, nine observations are declared to be outliers, including all those detected by the original variables and by the unrotated PCs. There is a suggestion, then, that transformation and rotation of the PCs as advocated by Hawkins and Fatti (1984) provide an even more powerful tool for detecting outliers.

Epidemiological Data

Bartkowiak et al. (1988) use PCs in a number of ways to search for potential outliers in a large epidemiological data set consisting of 2433 observations on 7 variables. They examine the first two and last two PCs from both correlation and covariance matrices. In addition, some of the variables are transformed to have distributions closer to normality, and the PCAs are repeated after transformation.
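The cut-off rule described for the trace element data can be illustrated with a minimal sketch. It assumes that every coordinate (an original variable or a PC score) has been standardized to zero mean and unit variance, that the test is two-sided, and that the Bonferroni bound is spread over all observation-by-coordinate comparisons. The text does not say how Hawkins and Fatti (1984) counted the comparisons, so that choice, along with the function names `bonferroni_cutoff` and `flag_outliers`, is an assumption made here for illustration only.

```python
from statistics import NormalDist


def bonferroni_cutoff(n_tests, alpha=0.01):
    """Two-sided cut-off for |z| under normality.

    The overall significance level alpha is divided equally over
    n_tests comparisons (Bonferroni bound). How many comparisons to
    count is an assumption, not something stated in the text.
    """
    per_test = alpha / n_tests
    return NormalDist().inv_cdf(1.0 - per_test / 2.0)


def flag_outliers(z_scores, alpha=0.01):
    """Return indices of observations with at least one extreme coordinate.

    z_scores is a list of rows, one per observation; each row holds
    standardized values (original variables or PC scores scaled to
    unit variance).
    """
    n_tests = sum(len(row) for row in z_scores)
    cut = bonferroni_cutoff(n_tests, alpha)
    return [i for i, row in enumerate(z_scores)
            if any(abs(z) > cut for z in row)]


# Dimensions of the trace element data: 75 observations, 12 coordinates.
cutoff = bonferroni_cutoff(75 * 12, alpha=0.01)
```

Applying `flag_outliers` once to the standardized variables and once to the standardized PC scores would give the kind of comparison reported above, where the two sets of flagged observations need not coincide.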
The researchers report that (unlike Garnham's (1979) analysis of the household formation data) the potential outliers found by the various analyses overlap only slightly. Different analyses are capable of identifying different potential outliers.

10.2 Influential Observations in a Principal Component Analysis

Outliers are generally thought of as observations that in some way are atypical of a data set but, depending on the analysis done, removal of an outlier may or may not have a substantial effect on the results of that analysis. Observations whose removal does have a large effect are called 'influential,' and, whereas most influential observations are outliers in some respect, outliers need not be at all influential. Also, whether or not an observation is
