As well as these intuitive justifications, Kaiser (1960) put forward a number of other reasons for a cut-off at $l_k = 1$. It must be noted, however, that most of the reasons are pertinent to factor analysis (see Chapter 7), rather than PCA, although Kaiser refers to PCs in discussing one of them.

It can be argued that a cut-off at $l_k = 1$ retains too few variables. Consider a variable which, in the population, is more-or-less independent of all other variables. In a sample, such a variable will have small coefficients in $(p-1)$ of the PCs but will dominate one of the PCs, whose variance $l_k$ will be close to 1 when using the correlation matrix. As the variable provides information independent of the other variables, it would be unwise to delete it. However, deletion will occur if Kaiser's rule is used and if, due to sampling variation, $l_k < 1$. It is therefore advisable to choose a cut-off $l^*$ lower than 1, to allow for sampling variation. Jolliffe (1972) suggested, based on simulation studies, that $l^* = 0.7$ is roughly the correct level. Further discussion of this cut-off level will be given with respect to examples in Sections 6.2 and 6.4.

The rule just described is specifically designed for correlation matrices, but it can easily be adapted for covariance matrices by taking as a cut-off $l^*$ the average value $\bar{l}$ of the eigenvalues or, better, a somewhat lower cut-off such as $l^* = 0.7\bar{l}$. For covariance matrices with widely differing variances, however, this rule and the one based on $t_k$ from Section 6.1.1 retain very few (arguably, too few) PCs, as will be seen in the examples of Section 6.2.

An alternative way of looking at the sizes of individual variances is to use the so-called broken stick model. If we have a stick of unit length, broken at random into $p$ segments, then it can be shown that the expected length of the $k$th longest segment is
$$l_k^* = \frac{1}{p}\sum_{j=k}^{p}\frac{1}{j}.$$
One way of deciding whether the proportion of variance accounted for by the $k$th PC is large enough for that component to be retained is to compare the proportion with $l_k^*$. Principal components for which the proportion exceeds $l_k^*$ are then retained, and all other PCs deleted. Tables of $l_k^*$ are available for various values of $p$ and $k$ (see, for example, Legendre and Legendre (1983, p. 406)).
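The broken stick thresholds are simple to compute directly rather than read from tables. The following is a minimal Python sketch (not from the book) that evaluates $l_k^*$ and applies the retention rule described above to a set of hypothetical correlation-matrix eigenvalues; Jolliffe's (1972) cut-off of 0.7 is included for comparison. The function names and eigenvalues are illustrative assumptions only.

```python
import numpy as np

def broken_stick_thresholds(p):
    """Expected length of the kth longest of p random segments of a unit
    stick: l*_k = (1/p) * sum_{j=k}^{p} 1/j, for k = 1, ..., p."""
    inv = 1.0 / np.arange(1, p + 1)            # 1/1, 1/2, ..., 1/p
    return np.cumsum(inv[::-1])[::-1] / p      # tail sums divided by p

def retain_broken_stick(eigenvalues):
    """Retain PCs whose proportion of total variance exceeds l*_k."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    proportions = l / l.sum()
    return proportions > broken_stick_thresholds(len(l))

# Hypothetical eigenvalues of a 6-variable correlation matrix (they sum to p = 6)
eigs = [2.9, 1.6, 0.75, 0.4, 0.2, 0.15]
print(retain_broken_stick(eigs))    # [ True  True False False False False]
print(np.array(eigs) > 0.7)         # cut-off l* = 0.7 retains three PCs instead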
6.1.3 The Scree Graph and the Log-Eigenvalue Diagram

The first two rules described above usually involve a degree of subjectivity in the choice of cut-off levels, $t^*$ and $l^*$ respectively. The scree graph, which was discussed and named by Cattell (1966) but which was already in common use, is even more subjective in its usual form, as it involves looking at a plot of $l_k$ against $k$ (see Figure 6.1, which is discussed in detail in Section 6.2) and deciding at which value of $k$ the slopes of lines joining the plotted points are 'steep' to the left of $k$, and 'not steep' to the right. This value of $k$, defining an 'elbow' in the graph, is then taken to be the number of components $m$ to be retained. The graph's name derives from the similarity of its typical shape to the accumulation of loose rubble, or scree, at the foot of a mountain slope. An alternative to the scree graph, which was developed in atmospheric science, is to plot $\log(l_k)$, rather than $l_k$, against $k$; this is known as the log-eigenvalue (or LEV) diagram (see Farmer (1971), Maryon (1979)). A plotting sketch for both displays is given below.

[Figure 6.1. Scree graph for the correlation matrix: blood chemistry data.]

In introducing the scree graph, Cattell (1966) gives a somewhat different formulation from that above, and presents strong arguments that when it is used in factor analysis it is entirely objective and should produce the 'correct' number of factors (see Cattell and Vogelmann (1977) for a large number of examples). In fact, Cattell (1966) views the rule as a means of deciding upon an upper bound to the true number of factors in a factor analysis after rotation (see Chapter 7). He did not seem to envisage its use in PCA, although it has certainly been widely adopted for that purpose. The way in which Cattell (1966) formulates the rule goes beyond a simple change of slope from 'steep' to 'shallow'. He looks for the point beyond
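To make the two displays concrete, here is a minimal matplotlib sketch (not from the book) that draws a scree graph and the corresponding LEV diagram for a set of hypothetical eigenvalues; in practice the $l_k$ would be the eigenvalues of the sample correlation or covariance matrix.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical eigenvalues, ordered from largest to smallest
l = np.array([3.4, 1.8, 1.1, 0.6, 0.45, 0.35, 0.2, 0.1])
k = np.arange(1, len(l) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Scree graph: look for the 'elbow' where the slope changes from steep to shallow
ax1.plot(k, l, "o-")
ax1.set(xlabel="component number k", ylabel="eigenvalue l_k", title="Scree graph")

# LEV diagram: the same eigenvalues plotted on a log scale
ax2.plot(k, np.log(l), "o-")
ax2.set(xlabel="component number k", ylabel="log(l_k)", title="LEV diagram")

plt.tight_layout()
plt.show()
```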