12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

348  13. Principal Component Analysis for Special Types of Data

(i) It was noted above that the constraint (13.3.1) introduces a negative bias to the correlations between the elements of x, so that any notion of ‘independence’ between variables will not imply zero correlations. A number of ideas have been put forward concerning what should constitute ‘independence,’ and what ‘null correlations’ are implied, for compositional data. Aitchison (1982) presents arguments in favour of a definition of independence in terms of the structure of the covariance matrix of v^(j) (see his equations (4.1) and (5.1)). With this definition, the PCs based on v (or v^(j)) for a set of ‘independent’ variables are simply the elements of v (or v^(j)) arranged in descending size of their variances. This is equivalent to what happens in PCA for ‘ordinary’ data with independent variables.

(ii) There is a tractable class of probability distributions for v^(j) and for linear contrasts of the elements of v^(j), but there is no such tractable class for linear contrasts of the elements of x when x is restricted by the constraint (13.3.1).

(iii) Because the log-ratio transformation removes the effect of the constraint on the interpretation of covariance, it is possible to define distances between separate observations of v in a way that is not possible with x.

(iv) It is easier to examine the variability of subcompositions (subsets of x renormalized to sum to unity) compared to that of the whole composition, if the comparison is done in terms of v rather than x.

Aitchison (1983) provides examples in which the proposed PCA of v is considerably superior to a PCA of x. This seems to be chiefly because there is curvature inherent in many compositional data sets; the proposed analysis is very successful in uncovering correct curved axes of maximum variation, whereas the usual PCA, which is restricted to linear functions of x, is not.
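
The contrast between a PCA of x and a PCA of v can be sketched numerically. What follows is only a minimal illustration, not Aitchison's own computation. It assumes that v is the centred log-ratio log[x/g(x)], with g(x) the geometric mean of the elements of x (the definition is given earlier in the section, outside this excerpt). The data are made up (log-normal values renormalized to sum to unity), and the helper names clr and pca are hypothetical.

```python
import numpy as np


def clr(x):
    """Centred log-ratio of each row: log(x_ij) minus the row mean of log(x_ij).

    Subtracting the mean log is the same as dividing by the geometric mean g(x).
    Requires strictly positive entries; zeros give -inf.
    """
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def pca(data):
    """PCA via the sample covariance matrix.

    Returns eigenvalues (PC variances) in descending order and the
    matching eigenvectors as columns.
    """
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Made-up compositional data: n = 50 observations, p = 4 parts.
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(50, 4))
x = raw / raw.sum(axis=1, keepdims=True)  # each row sums to unity

v = clr(x)

var_x, vec_x = pca(x)  # usual PCA, linear in x
var_v, vec_v = pca(v)  # PCA of the log-ratio transformed data

# Point (iii): a Euclidean distance between two observations, computed in v.
dist_01 = np.linalg.norm(v[0] - v[1])

print("PC variances for x:", var_x)
print("PC variances for v:", var_v)
print("distance between observations 0 and 1 in v:", dist_01)
```

In both analyses the last PC has essentially zero variance: the elements of x sum to unity and, under the assumed definition, the elements of v sum to zero, so each covariance matrix is singular.
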
However, Aitchison’s (1983) proposal does not necessarily make much difference to the results of a PCA, as is illustrated in the example given below in Section 13.3.1. Aitchison (1986, Chapter 8) covers similar material to Aitchison (1983), although more detail is given, including examples, of the analysis of subdecompositions.

A disadvantage of Aitchison’s (1983) approach is that it cannot handle zeros for any of the x_j (see equation (13.3.2)). One possibility is to omit from the analysis any variables which have zeros, though discarding information in this way is undesirable. Alternatively, any zeros can be replaced by a small positive number, but the results are sensitive to the choice of that number. Bacon-Shone (1992) proposes an approach to compositional data based on ranks, which allows zeros to be present. The values of x_ij, i = 1, 2, ..., n; j = 1, 2, ..., p are ranked either within rows or within columns or across the whole data matrix, and the data values are then
