Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

10.1. Detection of Outliers Using Principal Components

…variables, and will often be extreme with respect to one or both of these variables looked at individually.

By contrast, the last few PCs may detect outliers that are not apparent with respect to the original variables. A strong correlation structure between variables implies that there are linear functions of the variables with small variances compared to the variances of the original variables. In the simple height-and-weight example described above, height and weight have a strong positive correlation, so it is possible to write

$$x_2 = \beta x_1 + \varepsilon,$$

where $x_1$, $x_2$ are height and weight measured about their sample means, $\beta$ is a positive constant, and $\varepsilon$ is a random variable with a much smaller variance than $x_1$ or $x_2$. Therefore the linear function

$$x_2 - \beta x_1$$

has a small variance, and the last (in this case the second) PC in an analysis of $x_1$, $x_2$ has a similar form, namely $a_{22}x_2 - a_{12}x_1$, where $a_{12}, a_{22} > 0$. Calculation of the value of this second PC for each observation will detect observations such as (175 cm, 25 kg) that are outliers with respect to the correlation structure of the data, though not necessarily with respect to individual variables. Figure 10.2 shows a plot of the data from Figure 10.1 with respect to the PCs derived from the correlation matrix. The outlying observation is 'average' for the first PC, but very extreme for the second.

This argument generalizes readily when the number of variables $p$ is greater than two: by examining the values of the last few PCs, we may be able to detect observations that violate the correlation structure imposed by the bulk of the data, but that are not necessarily aberrant with respect to individual variables. Of course, it is possible that, if the sample size is relatively small or if a few observations are sufficiently different from the rest, the outlier(s) may so strongly influence the last few PCs that these PCs now reflect mainly the position of the outlier(s) rather than the structure of the majority of the data. One way of avoiding this masking or camouflage of outliers is to compute PCs leaving out one (or more) observations and then calculate, for the deleted observations, the values of the last PCs based on the reduced data set. To do this for each observation is a heavy computational burden, but it might be worthwhile in small samples, where such camouflaging is, in any case, more likely to occur. Alternatively, if PCs are estimated robustly (see Section 10.4), then the influence of outliers on the last few PCs should be reduced, and it may be unnecessary to repeat the analysis with each observation deleted.

A series of scatterplots of pairs of the first few and last few PCs may be useful in identifying possible outliers. One way of presenting each PC separately is as a set of parallel boxplots. These have been suggested as a means of deciding how many PCs to retain (see Section 6.1.5), but they may also be useful for flagging potential outliers (Besse, 1994).
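The two ideas above, scoring observations on the last correlation-matrix PC and the leave-one-out check against masking, are easy to sketch numerically. Below is a minimal Python sketch using only NumPy. The data values are invented in the spirit of Figure 10.1, not taken from it, and the median/MAD cut-off is one reasonable flagging rule chosen for illustration, not a rule prescribed by the text.

```python
import numpy as np

# Illustrative height (cm) / weight (kg) data in the spirit of Figure 10.1
# (values invented for this sketch): seven observations follow the usual
# positive height-weight correlation; the last, (175 cm, 25 kg), violates it.
X = np.array([
    [160.0, 52.0],
    [165.0, 58.0],
    [170.0, 65.0],
    [172.0, 68.0],
    [175.0, 72.0],
    [180.0, 78.0],
    [185.0, 85.0],
    [175.0, 25.0],  # outlier with respect to the correlation structure
])

# Standardize, since the PCs here are derived from the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigendecomposition of the correlation matrix. np.linalg.eigh returns
# eigenvalues in ascending order, so column 0 of `vecs` is the last
# (smallest-variance) PC -- here proportional to a22*x2 - a12*x1.
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
last_pc_scores = Z @ vecs[:, 0]

# Flag observations whose last-PC score is extreme. A median/MAD rule is
# used (an assumption of this sketch, not the book's rule) so the outlier
# does not inflate the cut-off it is judged against.
dev = np.abs(last_pc_scores - np.median(last_pc_scores))
cutoff = 3.0 * 1.4826 * np.median(dev)
print("flagged:", np.where(dev > cutoff)[0])  # -> [7]

def loo_last_pc_score(X, i):
    """Leave-one-out check against masking: fit the PCs without
    observation i, then score observation i on the last PC of the
    reduced data set, as described in the passage above."""
    rest = np.delete(X, i, axis=0)
    mu, sd = rest.mean(axis=0), rest.std(axis=0, ddof=1)
    _, v = np.linalg.eigh(np.corrcoef(rest, rowvar=False))
    return ((X[i] - mu) / sd) @ v[:, 0]

loo = np.array([loo_last_pc_score(X, i) for i in range(len(X))])
print("leave-one-out last-PC scores:", np.round(loo, 2))
```

On this toy sample the eighth observation's height is unremarkable and even its last-PC score would be only borderline against a standard-deviation cut-off that it itself inflates; the leave-one-out scores make the discrepancy much sharper, because the reduced fit no longer has the outlier pulling the last PC toward itself.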
