12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

14. Generalizations and Adaptations of Principal Component Analysis

Reyment and Jöreskog (1993, Section 8.7) discuss an application of the method (which they refer to as Imbrie's Q-mode method) in a similar context concerning the abundance of various marine micro-organisms in cores taken at a number of sites on the seabed. The same authors also suggest that this type of analysis is relevant for data where the p variables are amounts of p chemical constituents in n soil or rock samples. If the degree to which two samples have the same proportions of each constituent is considered to be an important index of similarity between samples, then the similarity measure implied by non-centred PCA is appropriate (Reyment and Jöreskog, 1993, Section 5.4). An alternative approach if proportions are of interest is to reduce the data to compositional form (see Section 13.3).

The technique of empirical orthogonal teleconnections (van den Dool et al., 2000), described in Section 11.2.3, operates on uncentred data. Here matters are confused by referring to uncentred sums of squares and cross-products as 'variances' and 'correlations.' Devijver and Kittler (1982, Section 9.3) use similar misleading terminology in a population derivation and discussion of uncentred PCA.

Doubly centred PCA was proposed by Buckland and Anderson (1985) as another method of analysis for data that consist of species counts at various sites. They argue that centred PCA of such data may be dominated by a 'size' component, which measures the relative abundance of the various species. It is possible to simply ignore the first PC, and concentrate on later PCs, but an alternative is provided by double centering, which 'removes' the 'size' PC. The same idea has been suggested in the analysis of size/shape data (see Section 13.2).
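As a rough illustration of double centering, the sketch below subtracts row means and column means from a sites-by-species table and adds back the grand mean. The data are simulated counts invented purely for illustration, and the helper name `double_center` is hypothetical rather than taken from any of the cited works. The check at the end confirms that every row and column of the doubly centred matrix sums to zero, so that the covariance matrix has an eigenvalue that is numerically zero.

```python
import numpy as np


def double_center(X):
    """Subtract row means and column means, then add back the grand mean."""
    X = np.asarray(X, dtype=float)
    row_means = X.mean(axis=1, keepdims=True)
    col_means = X.mean(axis=0, keepdims=True)
    return X - row_means - col_means + X.mean()


rng = np.random.default_rng(0)

# Hypothetical counts: 6 sites (rows) by 4 species (columns),
# with species of very different overall abundance.
counts = rng.poisson(lam=[20.0, 5.0, 50.0, 10.0], size=(6, 4))

Z = double_center(counts)

# After double centering, each row and each column sums to zero.
row_sums = Z.sum(axis=1)
col_sums = Z.sum(axis=0)

# Covariance matrix of the doubly centred data and its eigenvalues
# (eigvalsh returns them in ascending order).
S = Z.T @ Z / (Z.shape[0] - 1)
eigenvalues = np.linalg.eigvalsh(S)

print("max |row sum|:", np.abs(row_sums).max())
print("max |column sum|:", np.abs(col_sums).max())
print("eigenvalues:", eigenvalues)
```

Because each row of `Z` sums to zero, the vector of ones is an eigenvector of `S` with eigenvalue zero, which is why the smallest eigenvalue printed is zero up to rounding error.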
Double centering introduces a component with zero eigenvalue, because the constraint $x_{i1} + x_{i2} + \cdots + x_{ip} = 0$ now holds for all i. A further alternative for removing the 'size' effect of different abundances of different species is, for some such data sets, to record only whether a species is present or absent at each site, rather than the actual counts for each species.

In fact, what is being done in double centering is the same as Mandel's (1971, 1972) approach to data in a two-way analysis of variance (see Section 13.4). It removes main effects due to rows/observations/sites, and due to columns/variables/species, and concentrates on the interaction between species and sites. In the regression context, Hoerl et al. (1985) suggest that double centering can remove 'non-essential ill-conditioning,' which is caused by the presence of a row (observation) effect in the original data. Kazmierczak (1985) advocates a logarithmic transformation of data, followed by double centering. This gives a procedure that is invariant to pre- and post-multiplication of the data matrix by diagonal matrices. Hence it is invariant to different weightings of observations and to different scalings of the variables.

One reason for the suggestion of both non-centred and doubly-centred PCA for counts of species at various sites is perhaps that it is not entirely
