12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

130 6. Choosing a Subset of Principal Components or Variables

Section 6.1.1. They use instead the percentage of ‘signal’ accounted for, although the PCA is done on a covariance matrix other than that associated with the signal (see Section 12.4.3). Buell (1978) advocates stability with respect to different degrees of approximation of a continuous spatial field by discrete points as a criterion for choosing m. Section 13.3.4 of von Storch and Zwiers (1999) is dismissive of selection rules.

6.1.8 Discussion

Although many rules have been examined in the last seven subsections, the list is by no means exhaustive. For example, in Section 5.1 we noted that superimposing a minimum spanning tree on a plot of the observations with respect to the first two PCs gives a subjective indication of whether or not a two-dimensional representation is adequate. It is not possible to give definitive guidance on which rules are best, but we conclude this section with a few comments on their relative merits. First, though, we discuss a small selection of the many comparative studies that have been published.

Reddon (1984, Section 3.9) describes nine such studies, mostly from the psychological literature, but all are concerned with factor analysis rather than PCA. A number of later studies in the ecological, psychological and meteorological literatures have examined various rules on both real and simulated data sets. Simulation of multivariate data sets can always be criticized as unrepresentative, because they can never explore more than a tiny fraction of the vast range of possible correlation and covariance structures. Several of the published studies, for example Grossman et al. (1991), Richman (1988), are particularly weak in this respect, looking only at simulations where all p of the variables are uncorrelated, a situation which is extremely unlikely to be of much interest in practice.
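To make the uncorrelated case concrete, the following is a minimal sketch, not a reproduction of any of the cited studies. It simulates p uncorrelated variables and applies two of the rules discussed in this chapter, Kaiser's rule and the broken stick rule, to the eigenvalues of the sample correlation matrix. The function names, the sample size, the cut-off of 1 for Kaiser's rule, and the convention of stopping the broken stick rule at the first component that fails are all assumptions made for illustration.

```python
import numpy as np


def broken_stick(p):
    """Expected proportions for a unit stick broken at random into p pieces.

    The k-th largest expected piece is (1/p) * sum_{j=k}^{p} 1/j.
    """
    return np.array(
        [np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)]
    )


def kaiser_count(eigvals, cutoff=1.0):
    """Number of eigenvalues above the cut-off (assumed here to be 1)."""
    return int(np.sum(np.asarray(eigvals) > cutoff))


def broken_stick_count(eigvals):
    """Number of leading PCs whose variance share exceeds the broken stick value.

    Assumption: counting stops at the first component that does not exceed
    its broken stick proportion.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    shares = eigvals / eigvals.sum()
    expected = broken_stick(len(eigvals))
    m = 0
    for share, exp in zip(shares, expected):
        if share <= exp:
            break
        m += 1
    return m


# Hypothetical simulation: n observations on p uncorrelated variables.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))

# Eigenvalues of the sample correlation matrix, largest first.
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

print("eigenvalues:", np.round(eigvals, 3))
print("Kaiser's rule retains:", kaiser_count(eigvals))
print("broken stick retains:", broken_stick_count(eigvals))
```

Because the population correlation matrix here is the identity, any component retained reflects sampling variation only. A simulation of this kind therefore says little about how a rule behaves when genuine correlation structure is present.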
Another weakness of several psychology-based studies is their confusion between PCA and factor analysis. For example, Zwick and Velicer (1986) state that ‘if PCA is used to summarize a data set each retained component must contain at least two substantial loadings.’ If the word ‘summarize’ implies a descriptive purpose the statement is nonsense, but in the simulation study that follows all their ‘components’ have three or more large loadings. With this structure, based on factor analysis, it is no surprise that Zwick and Velicer (1986) conclude that some of the rules they compare, which were designed with descriptive PCA in mind, retain ‘too many’ factors.

Jackson (1993) investigates a rather broader range of structures, including up to 12 variables in up to 3 correlated groups, as well as the completely uncorrelated case. The range of stopping rules is also fairly wide, including: Kaiser’s rule; the scree graph; the broken stick rule; the proportion of total variance; tests of equality of eigenvalues; and Jackson’s two bootstrap procedures described in Section 6.1.5. Jackson (1993) concludes that the broken stick and bootstrapped eigenvalue-eigenvector rules give the best results in his study. However, as with the reasoning used to develop his
