Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

242 10. Outlier Detection, Influential Observations and Robust Estimation

study of methods for detecting multivariate outliers. It did well compared to other methods in some circumstances, particularly when there are multiple outliers and p is not too large.

Before turning to examples, recall that an example in which outliers are detected using PCs in a rather different way was given in Section 5.6. In that example, Andrews’ curves (Andrews, 1972) were computed using PCs and some of the observations stood out as different from the others when plotted as curves. Further examination of these different observations showed that they were indeed ‘outlying’ in some respects, compared to the remaining observations.

10.1.1 Examples

In this section one example will be discussed in some detail, while three others will be described more briefly.

Anatomical Measurements

A set of seven anatomical measurements on 28 students was discussed in Section 5.1.1 and it was found that on a plot of the first two PCs (Figures 1.3, 5.1) there was an extreme observation on the second PC. When the measurements of this individual were examined in detail, it was found that he had an anomalously small head circumference. Whereas the other 27 students all had head girths in the narrow range 21–24 cm, this student (no. 16) had a measurement of 19 cm. It is impossible to check whether this was an incorrect measurement or whether student 16 indeed had an unusually small head (his other measurements were close to average), but it is clear that this observation would be regarded as an ‘outlier’ according to most definitions of the term.

This particular outlier is detected on the second PC, and it was suggested above that any outliers detected by high-variance PCs are usually detectable on examination of individual variables; this is indeed the case here. Another point concerning this observation is that it is so extreme on the second PC that it may be suspected that it alone is largely responsible for the direction of this PC. This question will be investigated at the end of Section 10.2, which deals with influential observations.

Figure 1.3 indicates one other possible outlier at the extreme left of the diagram. This turns out to be the largest student in the class: 190 cm (6 ft 3 in) tall, with all measurements except head girth at least as large as all other 27 students. There is no suspicion here of any incorrect measurements.

Turning now to the last few PCs, we hope to detect any observations which are ‘outliers’ with respect to the correlation structure of the data. Figure 10.3 gives a plot of the scores of the observations for the last two PCs, and Table 10.1 gives the values of $d^2_{1i}$, $d^2_{2i}$ and $d_{4i}$, defined in equations (10.1.1), (10.1.2) and (10.1.4), respectively, for the six ‘most extreme’
