Jolliffe, I. T. Principal Component Analysis (2nd ed.). Springer, 2002. 518 pp.


6.1. How Many Principal Components?

decide on $m$, two criteria need to be satisfied. First, the confidence intervals for $\lambda_m$ and $\lambda_{m+1}$ should not overlap, and second, no component should be retained unless it has at least two coefficients whose confidence intervals exclude zero. This second requirement is again relevant for factor analysis, but not PCA. With regard to the first criterion, it has already been noted that avoiding small gaps between $l_m$ and $l_{m+1}$ is desirable because it reduces the likelihood of instability in the retained components.

6.1.6 Partial Correlation

For PCA based on a correlation matrix, Velicer (1976) suggested that the partial correlations between the $p$ variables, given the values of the first $m$ PCs, may be used to determine how many PCs to retain. The criterion proposed is the average of the squared partial correlations

$$
V = \sum_{\substack{i=1 \\ i \neq j}}^{p} \sum_{j=1}^{p} \frac{(r^{*}_{ij})^{2}}{p(p-1)},
$$

where $r^{*}_{ij}$ is the partial correlation between the $i$th and $j$th variables, given the first $m$ PCs. The statistic $r^{*}_{ij}$ is defined as the correlation between the residuals from the linear regression of the $i$th variable on the first $m$ PCs, and the residuals from the corresponding regression of the $j$th variable on these $m$ PCs. It therefore measures the strength of the linear relationship between the $i$th and $j$th variables after removing the common effect of the first $m$ PCs.

The criterion $V$ first decreases, and then increases, as $m$ increases, and Velicer (1976) suggests that the optimal value of $m$ corresponds to the minimum value of the criterion. As with Jackson's (1993) bootstrap rules of Section 6.1.5, and for the same reasons, this criterion is plausible as a means of deciding the number of factors in a factor analysis, but it is inappropriate in PCA. Numerous other rules have been suggested in the context of factor analysis (Reddon, 1984, Chapter 3). Many are subjective, although some, such as parallel analysis (see Sections 6.1.3, 6.1.5), attempt a more objective approach.
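Velicer's criterion can be computed without running $p$ pairs of regressions, using the fact that for correlation-matrix PCA the residuals from regressing the standardised variables on the first $m$ PCs have covariance $R - A_m A_m^{\top}$, where $A_m$ contains the first $m$ loading vectors. The following numpy sketch is illustrative only; the function name and interface are not from the text.

```python
import numpy as np

def velicer_v(X):
    """Velicer's (1976) average squared partial correlation V(m), m = 1..p-1.

    X is an (n, p) data matrix. PCA is taken on the correlation matrix,
    so the columns are standardised first. Illustrative sketch only.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise columns
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    v = []
    for m in range(1, p):
        # Residual covariance after removing the first m PCs: R - A_m A_m^T,
        # where A_m holds the loadings (eigenvectors scaled by sqrt eigenvalues).
        A = eigvecs[:, :m] * np.sqrt(eigvals[:m])
        C = R - A @ A.T
        d = np.sqrt(np.diag(C))
        Rstar = C / np.outer(d, d)                     # partial correlations r*_ij
        # Average the squared off-diagonal entries (diagonal of Rstar is 1).
        v.append(((Rstar ** 2).sum() - p) / (p * (p - 1)))
    return np.array(v)
```

Velicer's rule would then retain `np.argmin(velicer_v(X)) + 1` components, the $m$ at which $V$ attains its minimum.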
Few are relevant to, or useful for, PCA unless they are modified in some way.

Beltrando (1990) gives a sketchy description of what appears to be another selection rule based on partial correlations. Instead of choosing $m$ so that the average squared partial correlation is minimized, Beltrando (1990) selects the $m$ for which the number of statistically significant elements in the matrix of partial correlations is minimized.

6.1.7 Rules for an Atmospheric Science Context

As mentioned in Section 4.3, PCA has been widely used in meteorology and climatology to summarize data that vary both spatially and temporally, and a number of rules for selecting a subset of PCs have been put forward with this context very much in mind. The LEV diagram, discussed in Section 6.1.3, is one example, as is Beltrando's (1990) method in Section 6.1.6, but there are many others. In the fairly common situation where different observations correspond to different time points, Preisendorfer and Mobley (1988) suggest that important PCs will be those for which there is a clear pattern, rather than pure randomness, present in their behaviour through time. The important PCs can then be discovered by forming a time series of each PC, and testing which time series are distinguishable from white noise. Many tests are available for this purpose in the time series literature, and Preisendorfer and Mobley (1988, Sections 5g–5j) discuss the use of a number of them. This type of test is perhaps relevant in cases where the set of multivariate observations forms a time series (see Chapter 12), as in many atmospheric science applications, but in the more usual (non-meteorological) situation where the observations are independent, such techniques are irrelevant, as the values of the PCs for different observations will also be independent. There is therefore no natural ordering of the observations, and if they are placed in a sequence, they should necessarily look like a white noise series.

Chapter 5 of Preisendorfer and Mobley (1988) gives a thorough review of selection rules used in atmospheric science. In Sections 5c–5e they discuss a number of rules similar in spirit to the rules of Sections 6.1.3 and 6.1.4 above. They are, however, derived from consideration of a physical model, based on spring-coupled masses (Section 5b), where it is required to distinguish signal (the important PCs) from noise (the unimportant PCs). The details of the rules are, as a consequence, somewhat different from those of Sections 6.1.3 and 6.1.4. Two main ideas are described.
The first, called Rule A4 by Preisendorfer and Mobley (1988), has a passing resemblance to Bartlett's test of equality of eigenvalues, which was defined and discussed in Sections 3.7.3 and 6.1.4. Rule A4 assumes that the last $(p-q)$ population eigenvalues are equal, and uses the asymptotic distribution of the average of the last $(p-q)$ sample eigenvalues to test whether the common population value is equal to $\lambda_0$. Choosing an appropriate value for $\lambda_0$ introduces a second step into the procedure and is a weakness of the rule.

Rule N, described in Section 5d of Preisendorfer and Mobley (1988), is popular in atmospheric science. It is similar to the techniques of parallel analysis, discussed in Sections 6.1.3 and 6.1.5, and involves simulating a large number of uncorrelated sets of data of the same size as the real data set which is to be analysed, and computing the eigenvalues of each simulated data set. To assess the significance of the eigenvalues for the real data set, the eigenvalues are compared to percentiles derived empirically from the simulated data. The suggested rule keeps any components whose eigenvalues lie above the 95% level in the cumulative distribution of the simulated data. A disadvantage is that if the first eigenvalue for the data is very large, it makes it difficult for later eigenvalues to exceed their own
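The simulation step of Rule N can be sketched in a few lines of numpy. This is a minimal sketch, not Preisendorfer and Mobley's exact recipe: the number of simulations, the use of Gaussian noise for the uncorrelated reference data, and the function name are all assumptions made for illustration.

```python
import numpy as np

def rule_n(X, n_sim=500, level=95, seed=0):
    """Sketch of a Rule N-style test: retain component k if its eigenvalue
    exceeds the chosen percentile of the k-th eigenvalues obtained from
    simulated uncorrelated data of the same size. Illustrative assumptions:
    Gaussian reference data, 95th percentile, n_sim replicates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise columns
    data_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]

    # Eigenvalues of correlation matrices of uncorrelated data sets of the
    # same size (n observations, p variables) as the real data.
    sim_eigs = np.empty((n_sim, p))
    for b in range(n_sim):
        Y = rng.normal(size=(n, p))
        sim_eigs[b] = np.sort(np.linalg.eigvalsh(np.corrcoef(Y, rowvar=False)))[::-1]

    # Keep components whose eigenvalue exceeds the simulated percentile.
    thresholds = np.percentile(sim_eigs, level, axis=0)
    return np.nonzero(data_eigs > thresholds)[0] + 1   # 1-based component indices
```

The disadvantage noted in the text shows up directly here: a dominant first sample eigenvalue depresses the remaining eigenvalues (they must still sum to $p$), so later components struggle to clear their thresholds.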

