12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

6.1. How Many Principal Components?

95% thresholds. It may therefore be better to look at the size of second and subsequent eigenvalues only with respect to smaller, not larger, eigenvalues. This could be achieved by removing the first term in the singular value decomposition (SVD) (3.5.2), and viewing the original second eigenvalue as the first eigenvalue in the analysis of this residual matrix. If the second eigenvalue is above its 95% threshold in this analysis, we subtract a second term from the SVD, and so on. An alternative idea, noted in Preisendorfer and Mobley (1988, Section 5f), is to simulate from a given covariance or correlation structure in which not all the variables are uncorrelated.

If the data are time series, with autocorrelation between successive observations, Preisendorfer and Mobley (1988) suggest calculating an 'equivalent sample size', n*, allowing for the autocorrelation. The simulations used to implement Rule N are then carried out with sample size n*, rather than the actual sample size, n. They also note that both Rules A4 and N tend to retain too few components, and therefore recommend choosing a value for m that is the larger of the two values indicated by these rules. In Section 5k Preisendorfer and Mobley (1988) provide rules for the case of vector-valued fields.

Like Besse and de Falguerolles (1993) (see Section 6.1.5) North et al. (1982) argue strongly that a set of PCs with similar eigenvalues should either all be retained or all excluded. The size of gaps between successive eigenvalues is thus an important consideration for any decision rule, and North et al. (1982) provide a rule-of-thumb for deciding whether gaps are too small to split the PCs on either side of the gap.

The idea of using simulated data to assess significance of eigenvalues has also been explored by other authors, for example, Farmer (1971) (see also Section 6.1.3 above), Cahalan (1983) and, outside the meteorological context, Mandel (1972), Franklin et al. (1995) and the parallel analysis literature.

Other methods have also been suggested in the atmospheric science literature. For example, Jones et al. (1983), Briffa et al. (1986) use a criterion for correlation matrices, which they attribute to Guiot (1981). In this method PCs are retained if their cumulative eigenvalue product exceeds one. This technique retains more PCs than most of the other procedures discussed earlier, but Jones et al. (1983) seem to be satisfied with the results it produces. Preisendorfer and Mobley (1982, Part IV) suggest a rule that considers retaining subsets of m PCs not necessarily restricted to the first m. This is reasonable if the PCs are to be used for an external purpose, such as regression or discriminant analysis (see Chapter 8, Section 9.1), but is not really relevant if we are merely interested in accounting for as much of the variation in x as possible. Richman and Lamb (1987) look specifically at the case where PCs are rotated (see Section 11.1), and give a rule for choosing m based on the patterns in rotated eigenvectors.

North and Wu (2001), in an application of PCA to climate change detection, use a modification of the percentage of variation criterion of
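
As an illustration of the cumulative eigenvalue product criterion attributed above to Guiot (1981), the following is a minimal sketch in Python with NumPy. It assumes the rule is applied to the eigenvalues of a correlation matrix sorted in decreasing order, and that PC k is retained while the product of the first k eigenvalues stays above one. The function name `guiot_retained` and the example eigenvalues are hypothetical, chosen only for illustration; they are not taken from the text or from the cited papers.

```python
import numpy as np


def guiot_retained(eigenvalues):
    """Count leading PCs whose cumulative eigenvalue product exceeds one.

    Assumption: eigenvalues come from a correlation matrix; they are
    sorted here into decreasing order before the product is formed.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative_product = np.cumprod(lam)
    m = 0
    for value in cumulative_product:
        if value > 1.0:
            m += 1
        else:
            # With decreasing eigenvalues the product cannot rise above
            # one again after it has fallen to one or below.
            break
    return m


# Hypothetical eigenvalues of a 5-variable correlation matrix (sum = 5).
example = [2.0, 1.5, 0.8, 0.5, 0.2]
print(guiot_retained(example))  # cumulative products: 2.0, 3.0, 2.4, 1.2, 0.24
```

For these made-up eigenvalues the sketch retains four components, whereas only two eigenvalues exceed one, which is consistent with the remark that this technique tends to retain more PCs than most of the other procedures.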
