12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


14. Generalizations and Adaptations of Principal Component Analysis

be used to obtain improved estimates of the coefficients B in the equation predicting y from x. Kloek and Mennes (1960) examine a number of ways in which PCs of w or PCs of the residuals obtained from regressing w on x or PCs of the combined vector containing all elements of w and x, can be used as ‘instrumental variables’ in order to obtain improved estimates of the coefficients B.

14.4 Alternatives to Principal Component Analysis for Non-Normal Distributions

We have noted several times that for many purposes it is not necessary to assume any particular distribution for the variables x in a PCA, although some of the properties of Chapters 2 and 3 rely on the assumption of multivariate normality.

One way of handling possible non-normality, especially if the distribution has heavy tails, is to use robust estimation of the covariance or correlation matrix, or of the PCs themselves. The estimates may be designed to allow for the presence of aberrant observations in general, or may be based on a specific non-normal distribution with heavier tails, as in Baccini et al. (1996) (see Section 10.4). In inference, confidence intervals or tests of hypothesis may be constructed without any need for distributional assumptions using the bootstrap or jackknife (Section 3.7.2). The paper by Dudziński et al. (1995), which was discussed in Section 10.3, investigates the effect of non-normality on repeatability of PCA, albeit in a small simulation study.

Another possibility is to assume that the vector x of random variables has a known distribution other than the multivariate normal. A number of authors have investigated the case of elliptical distributions, of which the multivariate normal is a special case.
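
As a rough illustration of the distribution-free route mentioned above, the following sketch computes percentile bootstrap intervals for the eigenvalues of a sample covariance matrix. It is not taken from the text: the function name `bootstrap_eigenvalue_ci`, the simulated data (a multivariate t with 5 degrees of freedom, one heavy-tailed member of the elliptical family), and the choices of 500 resamples and a 95% level are all assumptions made for the example. Ordered bootstrap eigenvalues are known to be biased, so the intervals should be read as indicative only.

```python
import numpy as np


def bootstrap_eigenvalue_ci(X, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for covariance eigenvalues.

    Hypothetical helper for illustration; X has one observation per row.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        # Resample observations (rows) with replacement.
        idx = rng.integers(0, n, size=n)
        S = np.cov(X[idx], rowvar=False)
        # eigvalsh returns ascending eigenvalues; reverse to descending.
        boot[b] = np.linalg.eigvalsh(S)[::-1]
    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    return lower, upper


# Assumed example data: multivariate t with 5 degrees of freedom, built as a
# normal vector divided by the square root of an independent chi-square / df.
rng = np.random.default_rng(1)
n, p, df = 200, 3, 5
Z = rng.standard_normal((n, p)) * np.sqrt([4.0, 2.0, 1.0])
w = rng.chisquare(df, size=n) / df
X = Z / np.sqrt(w)[:, None]

# Sample eigenvalues (descending) and their bootstrap intervals.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
lower, upper = bootstrap_eigenvalue_ci(X)
for k in range(p):
    print(f"eigenvalue {k + 1}: {eigvals[k]:.3f}  "
          f"interval ({lower[k]:.3f}, {upper[k]:.3f})")
```

No normality assumption enters the calculation; the only inputs are the observed rows of the data matrix. A robust covariance estimator could be substituted for `np.cov` inside the loop to combine the two approaches described above.
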
For example, Waternaux (1984) considers the usual test statistic for the null hypothesis H_{0q}, as defined in Section 6.1.4, of equality of the last (p−q) eigenvalues of the covariance matrix. She shows that, with an adjustment for kurtosis, the same asymptotic distribution for the test statistic is valid for all elliptical distributions with finite fourth moments. Jensen (1986) takes this further by demonstrating that for a range of hypotheses relevant to PCA, tests based on a multivariate normal assumption have identical level and power for all distributions with ellipsoidal contours, even those without second moments. Things get more complicated outside the class of elliptical distributions, as shown by Waternaux (1984) for H_{0q}.

Jensen (1987) calls the linear functions of x that successively maximize ‘scatter’ of a conditional distribution, where conditioning is on previously derived linear functions, principal variables. Unlike McCabe’s (1984) usage of the same phrase, these ‘principal variables’ are not a subset of the original
