Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

4. Interpreting Principal Components: Examples

4.1. Anatomical Measurements

Table 4.1. First three PCs: student anatomical measurements.

Component number                    1        2        3

Women
  Coefficients
    Hand                         0.33     0.56     0.03
    Wrist                        0.26     0.62     0.11
    Height                       0.40    −0.44    −0.00
    Forearm                      0.41    −0.05    −0.55
    Head                         0.27    −0.19     0.80
    Chest                        0.45    −0.26    −0.12
    Waist                        0.47     0.03    −0.03
  Eigenvalue                     3.72     1.37     0.97
  Cumulative percentage
  of total variation             53.2     72.7     86.5

Men
  Coefficients
    Hand                         0.23     0.62     0.64
    Wrist                        0.29     0.53    −0.42
    Height                       0.43    −0.20     0.04
    Forearm                      0.33    −0.53     0.38
    Head                         0.41    −0.09    −0.51
    Chest                        0.44     0.08    −0.01
    Waist                        0.46    −0.07     0.09
  Eigenvalue                     4.17     1.26     0.66
  Cumulative percentage
  of total variation             59.6     77.6     87.0

The PCA was done on the correlation matrix, even though it could be argued that, since all measurements are made in the same units, the covariance matrix might be more appropriate (see Sections 2.3 and 3.3). The correlation matrix was preferred because it was desired to treat all variables on an equal footing: the covariance matrix gives greater weight to larger, and hence more variable, measurements, such as height and chest girth, and less weight to smaller measurements such as wrist girth and hand length. Some of the results of the PC analyses, done separately for women and men, are given in Tables 4.1 and 4.2.

It can be seen that the form of the first two PCs is similar for the two sexes, with some similarity, too, for the third PC. Bearing in mind the small sample sizes, and the consequent large sampling variation in PC coefficients, it seems that the major sources of variation in the measurements, as given by the first three PCs, are similar for each sex. A combined PCA using all 28 observations therefore seems appropriate, in order to get better estimates of the first three PCs. It is, of course, possible that later PCs are different for the two sexes, and that combining all 28 observations will obscure such differences. However, if we are interested solely in interpreting the first few, high variance, PCs, then this potential problem is likely to be relatively unimportant.

Table 4.2. Simplified version of the coefficients in Table 4.1.

Component number      1      2      3

Women
  Hand                +      +
  Wrist               +      +
  Height              +      −
  Forearm             +             −
  Head                +     (−)     +
  Chest               +     (−)
  Waist               +

Men
  Hand                +      +      +
  Wrist               +      +      −
  Height              +     (−)
  Forearm             +      −      +
  Head                +             −
  Chest               +
  Waist               +

Before we attempt to interpret the PCs, some explanation of Table 4.2 is necessary. Typically, computer packages that produce PCs give the coefficients to several decimal places. When we interpret PCs, as with other types of tabular data, it is usually only the general pattern of the coefficients that is really of interest, not values to several decimal places, which may give a false impression of precision. Table 4.1 gives only two decimal places and Table 4.2 simplifies still further. A + or − in Table 4.2 indicates a coefficient whose absolute value is greater than half the maximum coefficient (again in absolute value) for the relevant PC; the sign of the coefficient is also indicated. Similarly, a (+) or (−) indicates a coefficient whose absolute value is between a quarter and a half of the largest absolute value for the PC of interest.

There are, of course, many ways of constructing a simplified version of the PC coefficients in Table 4.1. For example, another possibility is to rescale the coefficients in each PC so that the maximum value is ±1, and tabulate only the values of the coefficients, rounded to one decimal place, whose absolute values are above a certain cut-off, say 0.5 or 0.7. Values of coefficients below the cut-off are omitted, leaving blank
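The two simplification schemes described above can be illustrated with a short sketch in Python. It is not taken from the book: the function names `simplify_signs` and `rescale_with_cutoff` are hypothetical, the input is the women's coefficients as printed to two decimal places in Table 4.1, and the handling of values lying exactly on a threshold (and the choice to apply the cut-off before rounding) is an assumption.

```python
# Women's coefficients for the first three PCs, copied from Table 4.1.
VARIABLES = ["Hand", "Wrist", "Height", "Forearm", "Head", "Chest", "Waist"]
WOMEN = [
    [0.33, 0.56, 0.03],
    [0.26, 0.62, 0.11],
    [0.40, -0.44, -0.00],
    [0.41, -0.05, -0.55],
    [0.27, -0.19, 0.80],
    [0.45, -0.26, -0.12],
    [0.47, 0.03, -0.03],
]


def column_maxima(coefficients):
    """Largest absolute coefficient within each PC (each column)."""
    n_pcs = len(coefficients[0])
    return [max(abs(row[j]) for row in coefficients) for j in range(n_pcs)]


def simplify_signs(coefficients):
    """Code each coefficient as '+', '-', '(+)', '(-)' or '' (blank).

    '+'/'-'     : absolute value greater than half the column maximum.
    '(+)'/'(-)' : absolute value between a quarter and a half of it.
    Treatment of exact ties is an assumption of this sketch.
    """
    maxima = column_maxima(coefficients)
    coded = []
    for row in coefficients:
        coded_row = []
        for value, largest in zip(row, maxima):
            sign = "+" if value > 0 else "-"
            size = abs(value)
            if size > largest / 2:
                coded_row.append(sign)
            elif size >= largest / 4:
                coded_row.append("(" + sign + ")")
            else:
                coded_row.append("")
        coded.append(coded_row)
    return coded


def rescale_with_cutoff(coefficients, cutoff=0.5):
    """Rescale each PC so its largest absolute coefficient is 1, keep only
    rescaled values above the cut-off, and round those to one decimal place.
    Omitted entries are returned as None (a blank in the printed table).
    """
    maxima = column_maxima(coefficients)
    table = []
    for row in coefficients:
        scaled = [value / largest for value, largest in zip(row, maxima)]
        table.append([round(s, 1) if abs(s) > cutoff else None for s in scaled])
    return table


if __name__ == "__main__":
    for name, symbols in zip(VARIABLES, simplify_signs(WOMEN)):
        print(f"{name:<8}" + "".join(f"{s:^6}" for s in symbols))
```

Applied to the women's coefficients, `simplify_signs` gives the same pattern as the women's half of Table 4.2. Because it works from the two-decimal values, it may disagree with the printed table where a coefficient falls on a threshold; for the men, Hand on the first PC is 0.23, exactly half of the maximum 0.46, yet Table 4.2 shows a +, presumably because the original coding used unrounded values.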

