Jolliffe, I. Principal Component Analysis, 2nd ed., Springer, 2002, 518 pp.

13. Principal Component Analysis for Special Types of Data

of the mean and covariance matrix in the presence of missing values, under the assumption of multivariate normality. Little and Rubin (1987, Section 8.2) describe three versions of the EM algorithm for solving this problem (a sketch of the basic E- and M-steps is given below); a number of other authors, for example Anderson (1957) and De Ligny et al. (1981), tackled the same problem earlier by less efficient means.

The multivariate normal assumption is a restrictive one, and Little (1988) relaxes it by adapting the EM algorithm to find MLEs when the data are from a multivariate t-distribution or from a mixture of two multivariate normals with different covariance matrices. He calls these 'robust' methods for dealing with missing data because they assume longer-tailed distributions than the multivariate normal. Little (1988) conducts a simulation study, the results of which demonstrate that his robust MLEs cope well with missing data, compared to other methods discussed earlier in this section. However, the simulation study is limited to multivariate normal data, and to data from distributions that are similar to those assumed by the robust MLEs. It is not clear that the good performance of the robust MLEs would be repeated for other distributions. Little and Rubin (1987, Section 8.3) also extend their multivariate normal procedures to deal with covariance matrices on which some structure is imposed. Whilst this may be appropriate for factor analysis, it is less relevant for PCA.

Another adaptation of the EM algorithm for estimation of covariance matrices, the regularized EM algorithm, is given by Schneider (2001). It is particularly useful when the number of variables exceeds the number of observations. Schneider (2001) adds a diagonal matrix to the current estimate of the covariance matrix before inverting the matrix, a similar idea to that used in ridge regression (see the second sketch below).

Tipping and Bishop (1999a) take the idea of maximum likelihood estimation using the EM algorithm further. They suggest an iterative algorithm in which their EM procedure for estimating the probabilistic PCA model (Section 3.9) is combined with Little and Rubin's (1987) methodology for estimating the parameters of a multivariate normal distribution in the presence of missing data. The PCs are estimated directly, rather than by going through the intermediate step of estimating the covariance or correlation matrix (the third sketch below outlines the complete-data core of this procedure). An example in which data are randomly deleted from a data set is used by Tipping and Bishop (1999a) to illustrate their procedure.

In the context of satellite-derived sea surface temperature measurements with missing data caused by cloud cover, Houseago-Stokes and Challenor (2001) compare Tipping and Bishop's procedure with a standard interpolation technique followed by PCA on the interpolated data. The two procedures give similar results, but the new method is computationally much more efficient. This is partly because only the first few PCs are found, and because they are calculated directly, without the intermediate step of estimating the covariance matrix. Houseago-Stokes and Challenor note that the quality of interpolated data using probabilistic PCA depends on the number of components q in the model. In the absence
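To make the EM approach concrete, here is a minimal numpy sketch of maximum likelihood estimation of the mean and covariance matrix with missing values under multivariate normality. It illustrates the idea behind Little and Rubin's (1987, Section 8.2) algorithms rather than reproducing any of their three versions exactly; the function name, the NaN coding of missing entries, and the convergence rule are assumptions made for the example.

```python
import numpy as np

def em_mvn_missing(X, n_iter=100, tol=1e-6):
    """ML estimation of the mean and covariance of a multivariate
    normal from data with missing entries coded as NaN, via EM.
    Illustrative sketch, not Little and Rubin's exact formulation."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Initialize from available-case moments.
    mu = np.nanmean(X, axis=0)
    Xfill = np.where(miss, mu, X)
    Sigma = np.cov(Xfill, rowvar=False, bias=True)
    for _ in range(n_iter):
        Xhat = np.where(miss, 0.0, X)
        C = np.zeros((p, p))  # sum of conditional covariances of missing blocks
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            Soo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
            # E-step: conditional mean of the missing entries given the observed...
            Xhat[i, m] = mu[m] + Sigma[np.ix_(m, o)] @ Soo_inv @ (X[i, o] - mu[o])
            # ...and conditional covariance of the missing block.
            C[np.ix_(m, m)] += (Sigma[np.ix_(m, m)]
                                - Sigma[np.ix_(m, o)] @ Soo_inv @ Sigma[np.ix_(o, m)])
        # M-step: re-estimate the mean and covariance from the completed moments.
        mu_new = Xhat.mean(axis=0)
        Sigma_new = (Xhat - mu_new).T @ (Xhat - mu_new) / n + C / n
        done = np.max(np.abs(Sigma_new - Sigma)) < tol
        mu, Sigma = mu_new, Sigma_new
        if done:
            break
    return mu, Sigma
```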
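Schneider's (2001) regularization can be indicated in a few lines. The sketch below shows only the ridge-like idea described above; the particular diagonal matrix D and the fixed parameter h are illustrative assumptions, whereas Schneider chooses the regularization adaptively.

```python
import numpy as np

def regularized_inverse(Sigma, h):
    """Invert a (possibly ill-conditioned) covariance estimate after
    inflating it with a diagonal matrix, as in ridge regression.
    Useful when the number of variables exceeds the number of
    observations, so that Sigma itself is singular."""
    D = np.diag(np.diag(Sigma))  # illustrative choice of the diagonal matrix
    return np.linalg.inv(Sigma + h * D)
```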
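Finally, the complete-data core of Tipping and Bishop's (1999a) EM procedure for the probabilistic PCA model can be sketched as follows. Their missing-data algorithm interleaves these updates with conditional expectations of the missing entries, in the manner of the first sketch; the version here assumes complete data, and the names are ours. Note that only q x q matrices are inverted, which is why no p x p covariance matrix is ever formed.

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for the probabilistic PCA model x = W z + mu + eps,
    z ~ N(0, I_q), eps ~ N(0, sigma2 * I_p), on complete data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((p, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores; the matrix
        # M = W'W + sigma2*I is only q x q, so it is cheap to invert.
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ M_inv                   # E[z_i], one row per observation
        Ezz = n * sigma2 * M_inv + Ez.T @ Ez  # sum over i of E[z_i z_i']
        # M-step: update the loadings, then the noise variance
        # (the simplified variance formula uses the freshly updated W).
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2) - np.sum((Xc @ W) * Ez)) / (n * p)
    return W, mu, sigma2
```

At convergence the columns of W span the same subspace as the first q eigenvectors of the sample covariance matrix, so the leading PCs can be read off from an orthonormalization of W; this is what allows the PCs to be estimated directly, as the text notes.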
