
Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



3. Properties of Sample Principal Components

Table 3.1. Correlations and standard deviations for eight blood chemistry variables.

Correlation matrix (n = 72)

            rblood   plate  wblood    neut   lymph   bilir  sodium  potass
rblood       1.000
plate        0.290   1.000
wblood       0.202   0.415   1.000
neut        −0.055   0.285   0.419   1.000
lymph       −0.105  −0.376  −0.521  −0.877   1.000
bilir       −0.252  −0.349  −0.441  −0.076   0.206   1.000
sodium      −0.229  −0.164  −0.145   0.023   0.034   0.192   1.000
potass       0.058  −0.129  −0.076  −0.131   0.151   0.077   0.423   1.000

Standard deviations
             0.371  41.253   1.935   0.077   0.071   4.037   2.732   0.297

straightforward relationship between the PCs obtained from a correlation matrix and those based on the corresponding covariance matrix. The main purpose of the present section is to give an example illustrating some of the properties of PCs based on sample covariance and correlation matrices.

The data for this example consist of measurements on 8 blood chemistry variables for 72 patients in a clinical trial. The correlation matrix for these data, together with the standard deviations of each of the eight variables, is given in Table 3.1. Two main points emerge from Table 3.1. First, there are considerable differences in the standard deviations, caused mainly by differences in scale for the eight variables, and, second, none of the correlations is particularly large in absolute value, apart from the value of −0.877 for NEUT and LYMPH.

The large differences in standard deviations give a warning that there may be considerable differences between the PCs for the correlation and covariance matrices. That this is indeed true can be seen in Tables 3.2 and 3.3, which give coefficients for the first four components, based on the correlation and covariance matrices respectively. For ease of comparison, the coefficients are rounded to the nearest 0.2. The effect of such severe rounding is investigated for this example in Section 10.3.

Each of the first four PCs for the correlation matrix has moderate-sized coefficients for several of the variables, whereas the first four PCs for the covariance matrix are each dominated by a single variable. The first component is a slight perturbation of the single variable PLATE, which has the largest variance; the second component is almost the same as the variable BILIR with the second highest variance; and so on. In fact, this pattern continues for the fifth and sixth components, which are not shown in Table 3.3. Also, the relative percentages of total variation accounted for by each component closely mirror the variances of the corresponding variables.
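
The contrast described above can be checked numerically from the summary statistics in Table 3.1 alone. The following is a minimal sketch, not the author's own computation: it assumes NumPy is available, rebuilds the covariance matrix from the published correlations and standard deviations (so it inherits their three-decimal rounding), and uses a hypothetical helper named principal_components. Coefficient signs are arbitrary, and the results need not match Tables 3.2 and 3.3 exactly, since those tables are rounded to the nearest 0.2.

```python
import numpy as np

names = ["rblood", "plate", "wblood", "neut", "lymph", "bilir", "sodium", "potass"]

# Lower triangle of the correlation matrix, copied from Table 3.1 (n = 72).
lower = [
    [1.000],
    [0.290, 1.000],
    [0.202, 0.415, 1.000],
    [-0.055, 0.285, 0.419, 1.000],
    [-0.105, -0.376, -0.521, -0.877, 1.000],
    [-0.252, -0.349, -0.441, -0.076, 0.206, 1.000],
    [-0.229, -0.164, -0.145, 0.023, 0.034, 0.192, 1.000],
    [0.058, -0.129, -0.076, -0.131, 0.151, 0.077, 0.423, 1.000],
]
# Standard deviations from the last row of Table 3.1.
sd = np.array([0.371, 41.253, 1.935, 0.077, 0.071, 4.037, 2.732, 0.297])

# Fill the lower triangle, then mirror it to obtain the full symmetric matrix R.
R = np.zeros((8, 8))
for i, row in enumerate(lower):
    R[i, : i + 1] = row
R = R + R.T - np.diag(np.diag(R))

# Covariance matrix implied by R and the standard deviations: S[i, j] = sd[i] * sd[j] * R[i, j].
S = R * np.outer(sd, sd)


def principal_components(matrix):
    """Hypothetical helper: eigenvalues and eigenvectors of a symmetric
    matrix, sorted by decreasing eigenvalue (columns are PC coefficients)."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


cor_vals, cor_vecs = principal_components(R)
cov_vals, cov_vecs = principal_components(S)

# Variable with the largest absolute coefficient in each of the first four covariance PCs.
dominant = [names[np.argmax(np.abs(cov_vecs[:, k]))] for k in range(4)]

# Share of total variation per covariance PC, and share of total variance per variable.
cov_share = cov_vals / cov_vals.sum()
var_share = sd**2 / (sd**2).sum()

print("Dominant variable in covariance PCs 1-4:", dominant)
print("Covariance PC shares of variation:", np.round(cov_share[:4], 4))
print("Largest variable shares of variance:", np.round(np.sort(var_share)[::-1][:4], 4))
print("Correlation PC 1 coefficients:", np.round(cor_vecs[:, 0], 2))
```

Comparing cov_share with the sorted var_share shows how closely the covariance-based components track the individual variances, while the coefficients in cor_vecs are spread over several variables.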
