Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
13.3. Principal Component Analysis for Compositional Data

replaced by their ranks, which range from 1 to p, from 1 to n, or from 1 to np, respectively, depending on the type of ranking. These ranks are then scaled within each row so that each row sum equals 1, as is true for the original data.

Bacon-Shone (1992) does not use PCA on these rank-transformed data, but Baxter (1993) does. He looks at several approaches for a number of compositional examples from archaeology, and demonstrates that for typical archaeological data, which often include zeros, Bacon-Shone's (1992) procedure is unsatisfactory because it is too sensitive to the ranked zeros. Baxter (1993) also shows that both Aitchison's and Bacon-Shone's approaches can be misleading when there are small but non-zero elements in the data. He claims that simply ignoring the compositional nature of the data and performing PCA on the original data is often a more informative alternative in archaeology than these approaches.

Kaciak and Sheahan (1988) advocate the use of uncentred PCA (see Section 14.2.3), apparently without a log transformation, for the analysis of compositional data, and use it in a market segmentation example.

13.3.1 Example: 100 km Running Data

In Sections 5.3 and 12.3.3, a data set was discussed which consisted of times taken for each of ten 10 km sections by 80 competitors in a 100 km race. If, instead of recording the actual time taken in each section, we look at the proportion of the total time taken for each section, the data then become compositional in nature. A PCA was carried out on these compositional data, and so was a modified analysis as proposed by Aitchison (1983). The coefficients and variances for the first two PCs are given for the unmodified and modified analyses in Tables 13.1 and 13.2, respectively. It can be seen that the PCs defined in Tables 13.1 and 13.2 have very similar coefficients, with angles between corresponding vectors of coefficients equal to 8° for both first and second PCs. This similarity continues with later PCs. The first PC is essentially a linear contrast between times early and late in the race, whereas the second PC is a 'quadratic' contrast with times early and late in the race contrasted with those in the middle.

Comparison of Tables 13.1 and 13.2 with Table 5.2 shows that converting the data to compositional form has removed the first (overall time) component, but the coefficients for the second PC in Table 5.2 are very similar to those of the first PC in Tables 13.1 and 13.2. This correspondence continues to later PCs, with the third, fourth, ... PCs for the 'raw' data having similar coefficients to those of the second, third, ... PCs for the compositional data.
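The two data transformations compared in this example can be sketched as follows. This is a minimal illustration, not the analysis behind Tables 13.1 and 13.2: the section times are made up, the function names `closure` and `centred_log_ratio` are hypothetical, and the sketch assumes that Aitchison's (1983) technique works with centred log-ratios, v_j = log x_j − (1/p) Σ log x_i, which the text above does not spell out.

```python
import math

def closure(times):
    """Convert raw section times to proportions of the total time."""
    total = sum(times)
    return [t / total for t in times]

def centred_log_ratio(proportions):
    """Log of each part minus the mean of the logs over the row.

    Assumed form of Aitchison's transformation; requires strictly
    positive parts, so zeros would need separate treatment.
    """
    logs = [math.log(x) for x in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Made-up times (minutes) for the ten 10 km sections of one competitor.
times = [45.0, 46.0, 47.5, 49.0, 52.0, 55.0, 58.0, 61.0, 63.0, 60.0]

props = closure(times)          # compositional row: sums to 1
clr = centred_log_ratio(props)  # log-ratio row: sums to 0

# A competitor who is uniformly twice as slow has the same composition.
props_slow = closure([2 * t for t in times])

print("sum of proportions:", sum(props))
print("sum of centred log-ratios:", sum(clr))
print("same composition when all times double:",
      all(abs(a - b) < 1e-12 for a, b in zip(props, props_slow)))
```

Doubling every section time leaves the proportions unchanged, which is one way to see why converting the data to compositional form removes the overall-time component. A PCA would then be applied to the matrix of such rows, one row per competitor.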
13. Principal Component Analysis for Special Types of Data

Table 13.1. First two PCs: 100 km compositional data.

                         Coefficients   Coefficients
                         Component 1    Component 2
First 10 km                  0.42           0.19
Second 10 km                 0.44           0.18
Third 10 km                  0.44           0.00
Fourth 10 km                 0.40          −0.23
Fifth 10 km                  0.05          −0.56
Sixth 10 km                 −0.18          −0.53
Seventh 10 km               −0.20          −0.15
Eighth 10 km                −0.27          −0.07
Ninth 10 km                 −0.24           0.30
Tenth 10 km                 −0.27           0.41

Eigenvalue                   4.30           2.31
Cumulative percentage
of total variation           43.0           66.1

Table 13.2. First two PCs: Aitchison's (1983) technique for 100 km compositional data.

                         Coefficients   Coefficients
                         Component 1    Component 2
First 10 km                  0.41           0.19
Second 10 km                 0.44           0.17
Third 10 km                  0.42          −0.06
Fourth 10 km                 0.36          −0.31
Fifth 10 km                 −0.04          −0.57
Sixth 10 km                 −0.25          −0.48
Seventh 10 km               −0.24          −0.08
Eighth 10 km                −0.30          −0.01
Ninth 10 km                 −0.24           0.30
Tenth 10 km                 −0.25           0.43

Eigenvalue                   4.38           2.29
Cumulative percentage
of total variation           43.8           66.6
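The angle between corresponding coefficient vectors, quoted in the text as 8° for both PCs, can be checked from the rounded coefficients printed in Tables 13.1 and 13.2. The sketch below is only an illustration: the helper name `angle_degrees` is hypothetical, and because the inputs are rounded to two decimals the result can only approximate the published figure.

```python
import math

def angle_degrees(a, b):
    """Angle between two coefficient vectors, in degrees.

    The absolute value of the dot product is used because the sign
    of a principal component is arbitrary.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = min(1.0, abs(dot) / (norm_a * norm_b))
    return math.degrees(math.acos(cos_theta))

# Coefficients copied from Table 13.1 (compositional data).
t1_pc1 = [0.42, 0.44, 0.44, 0.40, 0.05, -0.18, -0.20, -0.27, -0.24, -0.27]
t1_pc2 = [0.19, 0.18, 0.00, -0.23, -0.56, -0.53, -0.15, -0.07, 0.30, 0.41]

# Coefficients copied from Table 13.2 (Aitchison's technique).
t2_pc1 = [0.41, 0.44, 0.42, 0.36, -0.04, -0.25, -0.24, -0.30, -0.24, -0.25]
t2_pc2 = [0.19, 0.17, -0.06, -0.31, -0.57, -0.48, -0.08, -0.01, 0.30, 0.43]

angle_pc1 = angle_degrees(t1_pc1, t2_pc1)
angle_pc2 = angle_degrees(t1_pc2, t2_pc2)

print(f"PC1 angle: {angle_pc1:.1f} degrees")
print(f"PC2 angle: {angle_pc2:.1f} degrees")
```

With the two-decimal coefficients, both angles come out close to the 8° stated in the text.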