12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


366 13. Principal Component Analysis for Special Types of Data

taken by a candidate are therefore ‘missing.’ Shibayama (1990) devises a method for producing a linear combination of the examination scores that represents the overall performance of each candidate. When p′ = p the method is equivalent to PCA.

Anderson et al. (1983) report a method that they attribute to Dear (1959), which is not for dealing with missing values in a PCA, but which uses PCA to impute missing data in a more general context. The idea seems to be to first substitute zeros for any missing cells in the data matrix, and then find the SVD of this matrix. Finally, the leading term in the SVD, corresponding to the first PC, is used to approximate the missing values. If the data matrix is column-centred, this is a variation on using means of variables in place of missing values. Here there is the extra SVD step that adjusts the mean values using information from other entries in the data matrix.

Finally, note that there is a similarity of purpose in robust estimation of PCs (see Section 10.4) to that present in handling missing data. In both cases we identify particular observations which we cannot use in unadjusted form, either because they are suspiciously extreme (in robust estimation), or because they are not given at all (missing values). To completely ignore such observations may throw away valuable information, so we attempt to estimate ‘correct’ values for the observations in question. Similar techniques may be relevant in each case. For example, we noted above the possibility of imputing missing values for a particular observation by regressing the missing variables on the variables present for that observation, an idea that dates back at least to Beale and Little (1975), Frane (1976) and Gleason and Staelin (1975) (see Jackson (1991, Section 14.1.5)).
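
Two of the imputation ideas described above can be made concrete with a short sketch. What follows is a minimal illustration in Python with NumPy, written under stated assumptions, and is not the implementation of any of the cited authors. The function names `impute_rank_one_svd` and `impute_by_regression` are hypothetical. Missing cells are assumed to be coded as NaN. In the first function the matrix is column-centred using the observed entries, so that a substituted zero stands for the column mean. In the second the regression is fitted from the complete cases only, which is one simple choice among several.

```python
import numpy as np


def impute_rank_one_svd(X):
    """Zero-fill missing cells, then replace them by the leading SVD term.

    Assumes NaN marks a missing cell and that every column has at least
    one observed value. Centring on observed column means is an assumption
    of this sketch.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    # After centring, a zero in a missing cell is equivalent to mean imputation.
    centred = np.where(missing, 0.0, X - col_means)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    # Leading term of the SVD, corresponding to the first PC.
    rank_one = s[0] * np.outer(U[:, 0], Vt[0])
    filled = X.copy()
    filled[missing] = (rank_one + col_means)[missing]
    return filled


def impute_by_regression(X):
    """Regress the missing variables on the variables present, row by row.

    Assumes NaN marks a missing cell and that at least two complete rows
    exist, from which the mean vector and covariance matrix are estimated.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = X[~missing.any(axis=1)]
    mu = complete.mean(axis=0)
    S = np.atleast_2d(np.cov(complete, rowvar=False))
    filled = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        m = missing[i]   # variables missing for this observation
        o = ~m           # variables present for this observation
        if not o.any():
            filled[i] = mu  # nothing observed: fall back to the means
            continue
        # Regression coefficients: solve S_oo @ coef = S_om.
        coef = np.linalg.lstsq(S[np.ix_(o, o)], S[np.ix_(o, m)], rcond=None)[0]
        filled[i, m] = mu[m] + (X[i, o] - mu[o]) @ coef
    return filled
```

Both functions leave observed entries untouched and change only the missing cells. The rank-one step could in principle be iterated or extended to more SVD terms, but the sketch stops at the single leading term, as in the description above.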
A similar idea, namely robust regression of the variables on each other, is included in Devlin et al.’s (1981) study of robust estimation of PCs (see Section 10.4).

13.7 PCA in Statistical Process Control

The topic of this section, finding outliers, is closely linked to that of Section 10.1, and many of the techniques used are based on those described in that section. However, the literature on using PCA in multivariate statistical process control (SPC) is sufficiently extensive to warrant its own section. In various manufacturing processes improved technology means that greater numbers of variables are now measured in order to monitor whether or not a process is ‘in control.’ It has therefore become increasingly relevant to use multivariate techniques for control purposes, rather than simply to monitor each variable separately.

The main ways in which PCA is used in this context are (Martin et al., 1999):
