Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
6.4. Examples Illustrating Variable Selection    147

both among the group of dominant variables for the second PC, and variable 13 (tibia length 3) has the largest coefficient of any variable for PC1.

Comparisons can be made regarding how well Jolliffe’s and McCabe’s selections perform with respect to the criteria (6.3.4) and (6.3.5). For (6.3.5), Jolliffe’s choices are closer to optimality than McCabe’s, achieving values of 0.933 and 0.945 for four variables, compared to 0.907 and 0.904 for McCabe, whereas the optimal value is 0.948. Discrepancies are generally larger but more variable for criterion (6.3.4). For example, the B2 selection of three variables achieves a value of only 0.746 compared with the optimal value of 0.942, which is attained by B4. Values for McCabe’s selections are intermediate (0.838, 0.880).

Regarding the choice of m, the l_k criterion of Section 6.1.2 was found by Jolliffe (1972), using simulation studies, to be appropriate for methods B2 and B4, with a cut-off close to l* = 0.7. In the present example the criterion suggests m = 3, as l_3 = 0.75 and l_4 = 0.50. Confirmation that m should be this small is given by the criterion t_m of Section 6.1.1. Two PCs account for 85.4% of the variation, three PCs give 89.4% and four PCs contribute 92.0%, from which Jeffers (1967) concludes that two PCs are sufficient to account for most of the variation. However, Jolliffe (1973) also looked at how well other aspects of the structure of data are reproduced for various values of m.
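The arithmetic behind the two rules can be checked with a short sketch that uses only the figures quoted above. It applies the cut-off l* = 0.7 to the quoted eigenvalues and compares the increments in t_m with the share of variation each eigenvalue implies. The value p = 19 and the use of the correlation matrix are assumptions, as this passage does not state them, and all variable names are illustrative.

```python
# Figures quoted in the text for the aphid example.
L_STAR = 0.7                            # cut-off l* suggested by Jolliffe (1972)
quoted_l = {3: 0.75, 4: 0.50}           # l_3 and l_4
quoted_t = {2: 85.4, 3: 89.4, 4: 92.0}  # cumulative percentages t_2, t_3, t_4

# ASSUMPTION (not stated in this passage): the PCA uses the correlation
# matrix of p = 19 variables, so each PC accounts for 100 * l_k / p percent.
P_ASSUMED = 19

# Eigenvalues are ordered l_1 >= l_2 >= ..., so the rule keeps every PC
# up to the last index whose eigenvalue still exceeds the cut-off.
m_cutoff = max(k for k, l in quoted_l.items() if l > L_STAR)

# Cross-check: the step t_k - t_(k-1) should match 100 * l_k / p up to
# the rounding of the published figures.
steps = {k: quoted_t[k] - quoted_t[k - 1] for k in quoted_l}
implied = {k: 100 * l / P_ASSUMED for k, l in quoted_l.items()}
consistent = all(abs(steps[k] - implied[k]) < 0.1 for k in quoted_l)

print("m suggested by the l_k rule:", m_cutoff)
for k in sorted(quoted_l):
    print(f"PC{k}: step in t_m = {steps[k]:.1f}%, 100*l_k/p = {implied[k]:.2f}%")
print("quoted figures mutually consistent:", consistent)
```

Under these assumptions the third and fourth PCs each add only a few percentage points to t_m, which is why the two rules point towards a small m.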
For example, the form of the PCs and the division into four distinct groups of aphids (see Section 9.2 for further discussion of this aspect) were both examined and found to be noticeably better reproduced for m = 4 than for m = 2 or 3, so it seems that the criteria of Sections 6.1.1 and 6.1.2 might be relaxed somewhat when very small values of m are indicated, especially when coupled with small values of n, the sample size. McCabe (1982) notes that four or five of the original variables are necessary in order to account for as much variation as the first two PCs, confirming that m = 4 or 5 is probably appropriate here.

Tanaka and Mori (1997) suggest, on the basis of their two criteria and using a backward elimination algorithm, that seven or nine variables should be kept, rather more than Jolliffe (1973) or McCabe (1982). If only four variables are retained, Tanaka and Mori’s (1997) analysis keeps variables 5, 6, 14, 19 according to the RV-coefficient, and variables 5, 14, 17, 18 using residuals from regression. At least three of the four variables overlap with choices made in Table 6.4. On the other hand, the selection rule based on influential variables suggested by Mori et al. (2000) retains variables 2, 4, 12, 13 in a 4-variable subset, a quite different selection from those of the other methods.

6.4.2 Crime Rates

These data were given by Ahamad (1967) and consist of measurements of the crime rate in England and Wales for 18 different categories of crime (the variables) for the 14 years, 1950–63. The sample size n = 14 is very small, and is in fact smaller than the number of variables. Furthermore, the data are time series, and the 14 observations are not independent (see Chapter 12), so that the effective sample size is even smaller than 14. Leaving aside this potential problem and other criticisms of Ahamad’s analysis (Walker, 1967), subsets of variables that are selected using the correlation matrix by the same methods as in Table 6.4 are shown in Table 6.5.

148    6. Choosing a Subset of Principal Components or Variables

Table 6.5. Subsets of selected variables, crime rates.
(Each row corresponds to a selected subset with × denoting a selected variable.)

Variables: 1 3 4 5 7 8 10 13 14 16 17

McCabe, using criterion (a)
    Three variables, best:          × × ×
    Three variables, second best:   × × ×
    Four variables, best:           × × × ×
    Four variables, second best:    × × × ×
Jolliffe, using criteria B2, B4
    Three variables, B2:            × × ×
    Three variables, B4:            × × ×
    Four variables, B2:             × × × ×
    Four variables, B4:             × × × ×
Criterion (6.3.4)
    Three variables:                × × ×
    Four variables:                 × × × ×
Criterion (6.3.5)
    Three variables:                × × ×
    Four variables:                 × × × ×

There is a strong similarity between the correlation structure of the present data set and that of the previous example. Most of the variables considered increased during the time period considered, and the correlations between these variables are large and positive. (Some elements of the correlation matrix given by Ahamad (1967) are incorrect; Jolliffe (1970) gives the correct values.)

The first PC based on the correlation matrix therefore has large coefficients on all these variables; it measures an ‘average crime rate’ calculated largely from 13 of the 18 variables, and accounts for 71.7% of the total variation. The second PC, accounting for 16.1% of the total variation, has large coefficients on the five variables whose behaviour over the 14 years is ‘atypical’ in one way or another. The third PC, accounting for 5.5% of the total variation, is dominated by the single variable ‘homicide,’ which stayed almost constant compared with the trends in other variables over the period of study.
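As a rough illustration of how the percentages just quoted feed into the cumulative criterion t_m, the following sketch accumulates them and also converts each one to the eigenvalue it implies. It relies on the fact that the eigenvalues of a correlation matrix sum to the number of variables, here 18. The variable names are illustrative only, and the implied eigenvalues are approximate because the published percentages are rounded.

```python
from itertools import accumulate

P = 18                   # number of crime categories (variables), from the text
pct = [71.7, 16.1, 5.5]  # percentage of total variation for PCs 1-3, from the text

# t_m: cumulative percentage of total variation for the first m PCs
t = [round(x, 1) for x in accumulate(pct)]

# For a correlation matrix the eigenvalues sum to P, so l_k = P * pct_k / 100
implied_l = [round(P * x / 100, 2) for x in pct]

for m, (tm, lk) in enumerate(zip(t, implied_l), start=1):
    print(f"m={m}: t_m={tm}%  implied l_k={lk}")
```

The first two PCs together account for 87.8% of the total variation and the first three for 93.3%.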
On the basis of t_m only two or three PCs are necessary,