Jolliffe I. Principal Component Analysis (2nd ed., Springer, 2002)
13.3. Principal Component Analysis for Compositional Data

At first sight, it might seem that no real difficulty is implied by the condition

    x_{i1} + x_{i2} + \cdots + x_{ip} = 1,    (13.3.1)

which holds for each observation. If a PCA is done on x, there will be a PC with zero eigenvalue identifying the constraint. This PC can be ignored because it is entirely predictable from the form of the data, and the remaining PCs can be interpreted as usual. A counter to this argument is that correlations and covariances, and hence PCs, cannot be interpreted in the usual way when the constraint is present. In particular, the constraint (13.3.1) introduces a bias towards negative values among the correlations, so that a set of compositional variables that are 'as independent as possible' will not all have zero correlations between them.

One way of overcoming this problem is to do the PCA on a subset of (p−1) of the p compositional variables, but this idea has the unsatisfactory feature that the choice of which variable to leave out is arbitrary, and different choices will lead to different PCs. For example, suppose that two variables have much larger variances than the other (p−2) variables. If a PCA is based on the covariance matrix for (p−1) of the variables, then the result will vary considerably, depending on whether the omitted variable has a large or small variance. Furthermore, there remains the restriction that the (p−1) chosen variables must sum to no more than unity, so that the interpretation of correlations and covariances is still not straightforward.

The alternative that is suggested by Aitchison (1983) is to replace x by v = \log[x/g(x)], where g(x) = \left(\prod_{i=1}^{p} x_i\right)^{1/p} is the geometric mean of the elements of x. Thus, the jth element of v is

    v_j = \log x_j - \frac{1}{p}\sum_{i=1}^{p} \log x_i,    j = 1, 2, \ldots, p.    (13.3.2)

A PCA is then done for v rather than x.
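The transformation (13.3.2) and the resulting PCA can be sketched numerically. The snippet below is an illustrative sketch only (the simulated data, sample sizes, and seed are invented for the example): it applies the centred log-ratio transform to simulated compositions and runs a PCA on v, checking that one eigenvalue is zero and that its eigenvector has equal elements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated compositional data: n observations of p parts, each row sums to 1.
n, p = 100, 4
raw = rng.gamma(shape=2.0, size=(n, p))
x = raw / raw.sum(axis=1, keepdims=True)

# Centred log-ratio transform: v_j = log x_j - (1/p) * sum_i log x_i  (13.3.2)
logx = np.log(x)
v = logx - logx.mean(axis=1, keepdims=True)

# PCA of v via an eigendecomposition of its covariance matrix.
cov = np.cov(v, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# One eigenvalue is (numerically) zero; its eigenvector is proportional to
# the equal-elements vector, reflecting the constraint sum_j v_j = 0.
zero_vec = eigvecs[:, 0]
print(eigvals[0])                        # ~0
print(zero_vec / zero_vec[0])            # ~[1, 1, 1, 1]
# The remaining eigenvectors, being orthogonal to it, have coefficients
# summing to zero, i.e. they define contrasts of the log x_j.
print(eigvecs[:, 1:].sum(axis=0))        # ~[0, 0, 0]
```

Note that each row of v sums to zero by construction, which is what makes the covariance matrix singular with the equal-elements null vector.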
There is one zero eigenvalue whose eigenvector is the isometric vector with all elements equal; the remaining eigenvalues are positive and, because the corresponding eigenvectors are orthogonal to the final eigenvector, they define contrasts (that is, linear functions whose coefficients sum to zero) for the log x_j.

Aitchison (1983) also shows that these same functions can equivalently be found by basing a PCA on the non-symmetric set of variables v^{(j)}, where

    v^{(j)} = \log[x^{(j)}/x_j]    (13.3.3)

and x^{(j)} is the (p−1)-vector obtained by deleting the jth element x_j from x. The idea of transforming to logarithms before doing the PCA can, of course, be used for data other than compositional data (see also Section 13.2). However, there are a number of particular advantages of the log-ratio transformation (13.3.2), or equivalently (13.3.3), for compositional data. These include the following, which are discussed further by Aitchison (1983).
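A small numerical check illustrates why linear functions of v^{(j)} yield the same kind of functions as the PCA of v. In the sketch below (hypothetical data; the deleted index j and the coefficients b are arbitrary choices for illustration), any linear function of v^{(j)} is rewritten as a linear function of the log x_i whose coefficients sum to zero, i.e. a contrast:

```python
import numpy as np

rng = np.random.default_rng(2)
p, j = 5, 4                        # delete part j (here the last) for v^{(j)}
x = rng.dirichlet(np.ones(p))      # one composition, sums to 1

# alr-type transform (13.3.3): v^{(j)}_i = log(x_i / x_j) for i != j
v_j = np.log(np.delete(x, j) / x[j])

# An arbitrary linear function b' v^{(j)} ...
b = rng.normal(size=p - 1)
# ... equals a linear function of log x with coefficient -sum(b) on log x_j:
coeffs = np.insert(b, j, -b.sum())
print(coeffs.sum())                              # ~0: a contrast
print(np.allclose(b @ v_j, coeffs @ np.log(x)))  # True
```

So every linear function of v^{(j)} is a contrast of the log x_i, the same class of functions produced by the PCA of v.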
(i) It was noted above that the constraint (13.3.1) introduces a negative bias to the correlations between the elements of x, so that any notion of 'independence' between variables will not imply zero correlations. A number of ideas have been put forward concerning what should constitute 'independence,' and what 'null correlations' are implied, for compositional data. Aitchison (1982) presents arguments in favour of a definition of independence in terms of the structure of the covariance matrix of v^{(j)} (see his equations (4.1) and (5.1)). With this definition, the PCs based on v (or v^{(j)}) for a set of 'independent' variables are simply the elements of v (or v^{(j)}) arranged in descending size of their variances. This is equivalent to what happens in PCA for 'ordinary' data with independent variables.

(ii) There is a tractable class of probability distributions for v^{(j)} and for linear contrasts of the elements of v^{(j)}, but there is no such tractable class for linear contrasts of the elements of x when x is restricted by the constraint (13.3.1).

(iii) Because the log-ratio transformation removes the effect of the constraint on the interpretation of covariance, it is possible to define distances between separate observations of v in a way that is not possible with x.

(iv) It is easier to examine the variability of subcompositions (subsets of x renormalized to sum to unity) compared to that of the whole composition, if the comparison is done in terms of v rather than x.

Aitchison (1983) provides examples in which the proposed PCA of v is considerably superior to a PCA of x. This seems to be chiefly because there is curvature inherent in many compositional data sets; the proposed analysis is very successful in uncovering correct curved axes of maximum variation, whereas the usual PCA, which is restricted to linear functions of x, is not.
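The negative correlation bias noted in (i) is easy to reproduce. The following sketch (with invented gamma-distributed components) closes a set of independent positive variables so that each row sums to one, and shows that the correlations between the resulting parts are systematically negative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Start from independent positive variables ...
n, p = 5000, 4
w = rng.gamma(shape=3.0, size=(n, p))

# ... then impose the constraint (13.3.1) by closing each row to sum to 1.
x = w / w.sum(axis=1, keepdims=True)

# Despite the independence of w, closure induces negative correlations in x.
corr = np.corrcoef(x, rowvar=False)
off_diag = corr[~np.eye(p, dtype=bool)]
print(off_diag.mean())                   # clearly negative
```

Since each row of x sums to a constant, the covariance of any x_i with the sum of all parts is zero, so some covariances (and correlations) must be negative; with symmetric components like these, all of them are.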
However, Aitchison's (1983) proposal does not necessarily make much difference to the results of a PCA, as is illustrated in the example given below in Section 13.3.1. Aitchison (1986, Chapter 8) covers similar material to Aitchison (1983), although more detail is given, including examples, of the analysis of subcompositions.

A disadvantage of Aitchison's (1983) approach is that it cannot handle zeros for any of the x_j (see equation (13.3.2)). One possibility is to omit from the analysis any variables which have zeros, though discarding information in this way is undesirable. Alternatively, any zeros can be replaced by a small positive number, but the results are sensitive to the choice of that number. Bacon-Shone (1992) proposes an approach to compositional data based on ranks, which allows zeros to be present. The values of x_{ij}, i = 1, 2, \ldots, n; j = 1, 2, \ldots, p are ranked either within rows or within columns or across the whole data matrix, and the data values are then