Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)



6.1. How Many Principal Components?

As well as these intuitive justifications, Kaiser (1960) put forward a number of other reasons for a cut-off at l_k = 1. It must be noted, however, that most of the reasons are pertinent to factor analysis (see Chapter 7), rather than PCA, although Kaiser refers to PCs in discussing one of them.

It can be argued that a cut-off at l_k = 1 retains too few variables. Consider a variable which, in the population, is more-or-less independent of all other variables. In a sample, such a variable will have small coefficients in (p − 1) of the PCs but will dominate one of the PCs, whose variance l_k will be close to 1 when using the correlation matrix. As the variable provides information independent of the other variables, it would be unwise to delete it. However, deletion will occur if Kaiser's rule is used and if, due to sampling variation, l_k < 1. It is therefore advisable to choose a cut-off l* lower than 1, to allow for sampling variation. Jolliffe (1972) suggested, based on simulation studies, that l* = 0.7 is roughly the correct level. Further discussion of this cut-off level will be given with respect to examples in Sections 6.2 and 6.4.

The rule just described is specifically designed for correlation matrices, but it can easily be adapted for covariance matrices by taking as a cut-off l* the average value l̄ of the eigenvalues or, better, a somewhat lower cut-off such as l* = 0.7 l̄. For covariance matrices with widely differing variances, however, this rule and the one based on t_k from Section 6.1.1 retain very few (arguably, too few) PCs, as will be seen in the examples of Section 6.2.

An alternative way of looking at the sizes of individual variances is to use the so-called broken stick model.
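As a small illustration (not from the book), the lowered cut-off l* = 0.7 on correlation-matrix eigenvalues might be sketched in Python as follows; the data matrix and the function name `kaiser_retain` are hypothetical.

```python
import numpy as np

def kaiser_retain(X, cutoff=0.7):
    """Count PCs whose correlation-matrix eigenvalues exceed `cutoff`.

    Kaiser's original rule uses cutoff = 1; Jolliffe (1972) suggests
    0.7 to allow for sampling variation.
    """
    R = np.corrcoef(X, rowvar=False)                # p x p correlation matrix
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # l_1 >= ... >= l_p
    return int(np.sum(eigvals > cutoff)), eigvals

# Synthetic example: 200 observations on 4 variables, two of them correlated.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
X[:, 1] += X[:, 0]                                  # make variables 0 and 1 correlated
m, eigvals = kaiser_retain(X)
```

Because the trace of a p × p correlation matrix is p, the eigenvalues here always sum to 4, and the largest is at least 1, so at least one PC is always retained.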
If we have a stick of unit length, broken at random into p segments, then it can be shown that the expected length of the kth longest segment is

    l*_k = (1/p) Σ_{j=k}^{p} (1/j).

One way of deciding whether the proportion of variance accounted for by the kth PC is large enough for that component to be retained is to compare the proportion with l*_k. Principal components for which the proportion exceeds l*_k are then retained, and all other PCs deleted. Tables of l*_k are available for various values of p and k (see, for example, Legendre and Legendre (1983, p. 406)).

6.1.3 The Scree Graph and the Log-Eigenvalue Diagram

The first two rules described above usually involve a degree of subjectivity in the choice of cut-off levels, t* and l* respectively. The scree graph, which was discussed and named by Cattell (1966) but which was already in common use, is even more subjective in its usual form, as it involves looking at a plot of l_k against k (see Figure 6.1, which is discussed in detail in Section 6.2) and deciding at which value of k the slopes of lines joining the plotted points are 'steep' to the left of k, and 'not steep' to the right. This value of k, defining an 'elbow' in the graph, is then taken to be the number of components m to be retained. Its name derives from the similarity of its typical shape to that of the accumulation of loose rubble, or scree, at the foot of a mountain slope. An alternative to the scree graph, which was developed in atmospheric science, is to plot log(l_k), rather than l_k, against k; this is known as the log-eigenvalue (or LEV) diagram (see Farmer (1971), Maryon (1979)).

[Figure 6.1. Scree graph for the correlation matrix: blood chemistry data.]

In introducing the scree graph, Cattell (1966) gives a somewhat different formulation from that above, and presents strong arguments that when it is used in factor analysis it is entirely objective and should produce the 'correct' number of factors (see Cattell and Vogelmann (1977) for a large number of examples). In fact, Cattell (1966) views the rule as a means of deciding upon an upper bound to the true number of factors in a factor analysis after rotation (see Chapter 7). He did not seem to envisage its use in PCA, although it has certainly been widely adopted for that purpose. The way in which Cattell (1966) formulates the rule goes beyond a simple change of slope from 'steep' to 'shallow'. He looks for the point beyond
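The two retention devices above can be sketched in Python (not from the book): the broken-stick thresholds l*_k, and a crude automated stand-in for reading the elbow off a scree graph. The function names are illustrative, and `scree_elbow` is a rough heuristic of my own (largest flattening of slope), not Cattell's formulation; in practice the scree graph is read by eye.

```python
import numpy as np

def broken_stick(p):
    """Expected lengths of the k-th longest of p random segments of a
    unit-length stick: l*_k = (1/p) * sum_{j=k}^{p} 1/j, k = 1, ..., p."""
    inv = 1.0 / np.arange(1, p + 1)
    return np.cumsum(inv[::-1])[::-1] / p    # reversed cumsum gives the tail sums

def broken_stick_retain(eigvals):
    """Retain PCs whose proportion of total variance exceeds l*_k."""
    props = np.asarray(eigvals) / np.sum(eigvals)
    return int(np.sum(props > broken_stick(len(props))))

def scree_elbow(eigvals):
    """Crude stand-in for reading the elbow off a scree graph: the 1-based
    k at which the slope of l_k against k flattens most sharply."""
    diffs = np.diff(eigvals)                   # successive slopes (negative)
    return int(np.argmax(np.diff(diffs))) + 2  # curvature index -> 1-based k
```

For p = 3 the thresholds are (11/18, 5/18, 2/18) ≈ (0.611, 0.278, 0.111), which sum to 1, as the pieces of the whole stick must.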

