Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


13.3. Principal Component Analysis for Compositional Data

replaced by their ranks, which range from 1 to p, from 1 to n, or from 1 to np, respectively, depending on the type of ranking. These ranks are then scaled within each row so that each row sum equals 1, as is true for the original data.

Bacon-Shone (1992) does not use PCA on these rank-transformed data, but Baxter (1993) does. He looks at several approaches for a number of compositional examples from archaeology, and demonstrates that for typical archaeological data, which often include zeros, Bacon-Shone’s (1992) procedure is unsatisfactory because it is too sensitive to the ranked zeros. Baxter (1993) also shows that both Aitchison’s and Bacon-Shone’s approaches can be misleading when there are small but non-zero elements in the data. He claims that simply ignoring the compositional nature of the data and performing PCA on the original data is often a more informative alternative in archaeology than these approaches.

Kaciak and Sheahan (1988) advocate the use of uncentred PCA (see Section 14.2.3), apparently without a log transformation, for the analysis of compositional data, and use it in a market segmentation example.

13.3.1 Example: 100 km Running Data

In Sections 5.3 and 12.3.3, a data set was discussed which consisted of times taken for each of ten 10 km sections by 80 competitors in a 100 km race. If, instead of recording the actual time taken in each section, we look at the proportion of the total time taken for each section, the data then become compositional in nature. A PCA was carried out on these compositional data, and so was a modified analysis as proposed by Aitchison (1983). The coefficients and variances for the first two PCs are given for the unmodified and modified analyses in Tables 13.1 and 13.2, respectively.

It can be seen that the PCs defined in Tables 13.1 and 13.2 have very similar coefficients, with angles between corresponding vectors of coefficients equal to 8° for both first and second PCs. This similarity continues with later PCs. The first PC is essentially a linear contrast between times early and late in the race, whereas the second PC is a ‘quadratic’ contrast with times early and late in the race contrasted with those in the middle.

Comparison of Tables 13.1 and 13.2 with Table 5.2 shows that converting the data to compositional form has removed the first (overall time) component, but the coefficients for the second PC in Table 5.2 are very similar to those of the first PC in Tables 13.1 and 13.2. This correspondence continues to later PCs, with the third, fourth, ... PCs for the ‘raw’ data having similar coefficients to those of the second, third, ... PCs for the compositional data.
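As a rough illustration of the two data preparations compared in this example, the sketch below converts one runner's section times to proportions of total time and then applies a centred log-ratio transformation, which is assumed here to be the transformation underlying Aitchison's (1983) modified analysis. The function names `closure` and `clr` and the split times are hypothetical and are not taken from the book's data set; the PCA step itself is omitted.

```python
import math

def closure(times):
    """Express each section time as a proportion of the total (sums to 1)."""
    total = sum(times)
    return [t / total for t in times]

def clr(composition):
    """Centred log-ratio: log of each part minus the mean of the logs.

    Assumed here to be the transformation behind Aitchison's (1983) approach.
    Requires strictly positive parts; zeros would need separate handling.
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical split times (minutes) for one runner over ten 10 km sections.
example_times = [47.0, 46.5, 48.0, 50.5, 53.0, 55.5, 58.0, 60.0, 61.5, 59.0]

proportions = closure(example_times)   # compositional form of the row
log_ratios = clr(proportions)          # input to the modified analysis

# A uniformly slower runner (all times scaled by 1.2) has the same composition,
# which is why the overall-time component disappears from the PCA.
slower = closure([1.2 * t for t in example_times])
```

Because every row of proportions sums to 1 and every row of log-ratios sums to 0, information about a runner's overall speed is no longer present in either version of the data.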

13. Principal Component Analysis for Special Types of Data

Table 13.1. First two PCs: 100 km compositional data.

                         Coefficients    Coefficients
                         Component 1     Component 2
First 10 km                  0.42            0.19
Second 10 km                 0.44            0.18
Third 10 km                  0.44            0.00
Fourth 10 km                 0.40           −0.23
Fifth 10 km                  0.05           −0.56
Sixth 10 km                 −0.18           −0.53
Seventh 10 km               −0.20           −0.15
Eighth 10 km                −0.27           −0.07
Ninth 10 km                 −0.24            0.30
Tenth 10 km                 −0.27            0.41

Eigenvalue                   4.30            2.31
Cumulative percentage
of total variation           43.0            66.1

Table 13.2. First two PCs: Aitchison’s (1983) technique for 100 km compositional data.

                         Coefficients    Coefficients
                         Component 1     Component 2
First 10 km                  0.41            0.19
Second 10 km                 0.44            0.17
Third 10 km                  0.42           −0.06
Fourth 10 km                 0.36           −0.31
Fifth 10 km                 −0.04           −0.57
Sixth 10 km                 −0.25           −0.48
Seventh 10 km               −0.24           −0.08
Eighth 10 km                −0.30           −0.01
Ninth 10 km                 −0.24            0.30
Tenth 10 km                 −0.25            0.43

Eigenvalue                   4.38            2.29
Cumulative percentage
of total variation           43.8            66.6
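The angle quoted in the text between corresponding coefficient vectors can be checked approximately from the two-decimal values printed in Tables 13.1 and 13.2. The following is a minimal sketch; the helper name `angle_degrees` is hypothetical, and because the printed coefficients are rounded, the computed angles only approximate the published figure.

```python
import math

def angle_degrees(u, v):
    """Angle in degrees between two coefficient vectors.

    The absolute value of the dot product is used because the sign of a
    principal component's coefficient vector is arbitrary.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.degrees(math.acos(cos_theta))

# Coefficients as printed in Table 13.1 (PCA on the proportions).
pc1_unmodified = [0.42, 0.44, 0.44, 0.40, 0.05, -0.18, -0.20, -0.27, -0.24, -0.27]
pc2_unmodified = [0.19, 0.18, 0.00, -0.23, -0.56, -0.53, -0.15, -0.07, 0.30, 0.41]

# Coefficients as printed in Table 13.2 (Aitchison's technique).
pc1_aitchison = [0.41, 0.44, 0.42, 0.36, -0.04, -0.25, -0.24, -0.30, -0.24, -0.25]
pc2_aitchison = [0.19, 0.17, -0.06, -0.31, -0.57, -0.48, -0.08, -0.01, 0.30, 0.43]

angle_pc1 = angle_degrees(pc1_unmodified, pc1_aitchison)
angle_pc2 = angle_degrees(pc2_unmodified, pc2_aitchison)
print(f"PC1 angle: {angle_pc1:.1f} degrees, PC2 angle: {angle_pc2:.1f} degrees")
```

Both angles should come out close to the 8° reported for the first and second PCs, with small differences attributable to rounding in the printed tables.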

