13.3. Principal Component Analysis for Compositional Data

At first sight, it might seem that no real difficulty is implied by the condition

    x_{i1} + x_{i2} + \cdots + x_{ip} = 1,                                (13.3.1)

which holds for each observation. If a PCA is done on x, there will be a PC with zero eigenvalue identifying the constraint. This PC can be ignored because it is entirely predictable from the form of the data, and the remaining PCs can be interpreted as usual. A counter to this argument is that correlations and covariances, and hence PCs, cannot be interpreted in the usual way when the constraint is present. In particular, the constraint (13.3.1) introduces a bias towards negative values among the correlations, so that a set of compositional variables that are 'as independent as possible' will not all have zero correlations between them.

One way of overcoming this problem is to do the PCA on a subset of (p-1) of the p compositional variables, but this idea has the unsatisfactory feature that the choice of which variable to leave out is arbitrary, and different choices will lead to different PCs. For example, suppose that two variables have much larger variances than the other (p-2) variables. If a PCA is based on the covariance matrix for (p-1) of the variables, then the result will vary considerably, depending on whether the omitted variable has a large or small variance. Furthermore, there remains the restriction that the (p-1) chosen variables must sum to no more than unity, so that the interpretation of correlations and covariances is still not straightforward.

The alternative suggested by Aitchison (1983) is to replace x by v = \log[x / g(x)], where g(x) = (\prod_{i=1}^{p} x_i)^{1/p} is the geometric mean of the elements of x. Thus, the jth element of v is

    v_j = \log x_j - \frac{1}{p} \sum_{i=1}^{p} \log x_i,    j = 1, 2, \ldots, p.    (13.3.2)

A PCA is then done for v rather than x. There is one zero eigenvalue whose eigenvector is the isometric vector with all elements equal; the remaining eigenvalues are positive and, because the corresponding eigenvectors are orthogonal to the final eigenvector, they define contrasts (that is, linear functions whose coefficients sum to zero) for the \log x_j.

Aitchison (1983) also shows that these same functions can equivalently be found by basing a PCA on the non-symmetric set of variables v^{(j)}, where

    v^{(j)} = \log[x^{(j)} / x_j]                                         (13.3.3)

and x^{(j)} is the (p-1)-vector obtained by deleting the jth element x_j from x. The idea of transforming to logarithms before doing the PCA can, of course, be used for data other than compositional data (see also Section 13.2).
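To make the transformation concrete, here is a minimal sketch in Python/NumPy, not taken from Jolliffe or Aitchison: the Dirichlet toy data and all variable names are illustrative assumptions. It applies equation (13.3.2) to each row and then runs an ordinary covariance-based PCA on v.

```python
import numpy as np

# Toy compositional data: n = 50 observations on p = 4 parts; each row is
# strictly positive and sums to one, so constraint (13.3.1) holds exactly.
rng = np.random.default_rng(0)
x = rng.dirichlet(alpha=[2.0, 3.0, 4.0, 5.0], size=50)

# Log-ratio transformation (13.3.2): v_j = log x_j - (1/p) * sum_i log x_i,
# i.e. v = log(x / g(x)) with g(x) the geometric mean of each row.
log_x = np.log(x)
v = log_x - log_x.mean(axis=1, keepdims=True)

# PCA on v: eigendecomposition of the sample covariance matrix.
cov = np.cov(v, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first

print(np.round(eigvals, 6))             # last eigenvalue is (numerically) zero
print(np.round(eigvecs.sum(axis=0), 6))
# Column sums of the eigenvector matrix: approximately 0 for the leading
# eigenvectors (they are contrasts of the log x_j) and +/- sqrt(p) for the
# final one, which is proportional to the isometric vector (1, 1, ..., 1).

scores = v @ eigvecs[:, :-1]            # PC scores, dropping the null direction
```

Because each row of v sums to zero by construction, the all-ones direction is annihilated exactly, which reproduces the zero eigenvalue and the contrast property described above.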

However, there are a number of particular advantages of the log-ratio transformation (13.3.2), or equivalently (13.3.3), for compositional data. These include the following, which are discussed further by Aitchison (1983):

(i) It was noted above that the constraint (13.3.1) introduces a negative bias to the correlations between the elements of x, so that any notion of 'independence' between variables will not imply zero correlations. A number of ideas have been put forward concerning what should constitute 'independence', and what 'null correlations' are implied, for compositional data. Aitchison (1982) presents arguments in favour of a definition of independence in terms of the structure of the covariance matrix of v^{(j)} (see his equations (4.1) and (5.1)). With this definition, the PCs based on v (or v^{(j)}) for a set of 'independent' variables are simply the elements of v (or v^{(j)}) arranged in descending order of their variances. This is equivalent to what happens in PCA for 'ordinary' data with independent variables.

(ii) There is a tractable class of probability distributions for v^{(j)} and for linear contrasts of the elements of v^{(j)}, but there is no such tractable class for linear contrasts of the elements of x when x is restricted by the constraint (13.3.1).

(iii) Because the log-ratio transformation removes the effect of the constraint on the interpretation of covariance, it is possible to define distances between separate observations of v in a way that is not possible with x.

(iv) It is easier to examine the variability of subcompositions (subsets of x renormalized to sum to unity) compared to that of the whole composition, if the comparison is done in terms of v rather than x.

Aitchison (1983) provides examples in which the proposed PCA of v is considerably superior to a PCA of x. This seems to be chiefly because there is curvature inherent in many compositional data sets; the proposed analysis is very successful in uncovering correct curved axes of maximum variation, whereas the usual PCA, which is restricted to linear functions of x, is not. However, Aitchison's (1983) proposal does not necessarily make much difference to the results of a PCA, as is illustrated in the example given below in Section 13.3.1. Aitchison (1986, Chapter 8) covers similar material to Aitchison (1983), although more detail is given, including examples, of the analysis of subcompositions.

A disadvantage of Aitchison's (1983) approach is that it cannot handle zeros for any of the x_j (see equation (13.3.2)). One possibility is to omit from the analysis any variables which have zeros, though discarding information in this way is undesirable. Alternatively, any zeros can be replaced by a small positive number, but the results are sensitive to the choice of that number, as the sketch below illustrates.
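The sensitivity to the replacement value is easy to demonstrate. The following sketch (the delta values and the small data matrix are arbitrary illustrations, not taken from the text) replaces the zero by delta, renormalizes each row, and repeats the log-ratio PCA of (13.3.2):

```python
import numpy as np

def clr_pca_eigvals(x):
    """Eigenvalues of a PCA on v = log(x / g(x)), as in equation (13.3.2)."""
    log_x = np.log(x)
    v = log_x - log_x.mean(axis=1, keepdims=True)
    return np.sort(np.linalg.eigvalsh(np.cov(v, rowvar=False)))[::-1]

# A composition containing a zero part: (13.3.2) is undefined there,
# since log 0 = -infinity.
x = np.array([[0.5, 0.3, 0.2, 0.0],
              [0.4, 0.3, 0.2, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.3, 0.4, 0.2, 0.1],
              [0.2, 0.3, 0.3, 0.2]])

# Replace zeros by a small positive delta, then renormalize rows to sum to one.
for delta in (1e-3, 1e-6):
    x_rep = np.where(x == 0.0, delta, x)
    x_rep /= x_rep.sum(axis=1, keepdims=True)
    print(delta, np.round(clr_pca_eigvals(x_rep), 3))
# The leading eigenvalues (and hence the PCs) change markedly with delta,
# which is the sensitivity noted above.
```

Rank-based alternatives, such as the proposal described next, avoid this arbitrary choice entirely.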

Bacon-Shone (1992) proposes an approach to compositional data based on ranks, which allows zeros to be present. The values of x_{ij}, i = 1, 2, ..., n; j = 1, 2, ..., p, are ranked either within rows or within columns or across the whole data matrix, and the data values are then
