Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

6. Choosing a Subset of Principal Components or Variables

…ance. Mandel's results are based on simulation studies, and although exact results have been produced by some authors, they are only for limited special cases. For example, Krzanowski (1979a) gives exact results for $m = 1$ and $p = 3$ or 4, again under the assumptions of normality, independence and equal variances for all variables. These assumptions mean that the results can be used to determine whether or not all variables are independent, but are of little general use in determining an 'optimal' cut-off for $t_m$. Sugiyama and Tong (1976) describe an approximate distribution for $t_m$ which does not assume independence or equal variances, and which can be used to test whether $l_1, l_2, \ldots, l_m$ are compatible with any given structure for $\lambda_1, \lambda_2, \ldots, \lambda_m$, the corresponding population variances. However, the test still assumes normality and it is only approximate, so it is not clear how useful it is in practice for choosing an appropriate value of $m$.

Huang and Tseng (1992) describe a 'decision procedure for determining the number of components' based on $t_m$. Given a proportion of population variance $\tau$, which one wishes to retain, and the true minimum number of population PCs $m_\tau$ that achieves this, Huang and Tseng (1992) develop a procedure for finding a sample size $n$ and a threshold $t^*$ having a prescribed high probability of choosing $m = m_\tau$. It is difficult to envisage circumstances where this would be of practical value.

A number of other criteria based on $\sum_{k=m+1}^{p} l_k$ are discussed briefly by Jackson (1991, Section 2.8.11). In situations where some desired residual variation can be specified, as sometimes happens for example in quality control (see Section 13.7), Jackson (1991, Section 2.8.5) advocates choosing $m$ such that the absolute, rather than percentage, value of $\sum_{k=m+1}^{p} l_k$ first falls below the chosen threshold.

6.1.2 Size of Variances of Principal Components

The previous rule is equally valid whether a covariance or a correlation matrix is used to compute the PCs. The rule described in this section is constructed specifically for use with correlation matrices, although it can be adapted for some types of covariance matrices. The idea behind the rule is that if all elements of $\mathbf{x}$ are independent, then the PCs are the same as the original variables and all have unit variances in the case of a correlation matrix. Thus any PC with variance less than 1 contains less information than one of the original variables and so is not worth retaining. The rule, in its simplest form, is sometimes called Kaiser's rule (Kaiser, 1960) and retains only those PCs whose variances $l_k$ exceed 1. If the data set contains groups of variables having large within-group correlations, but small between-group correlations, then there is one PC associated with each group whose variance is $> 1$, whereas any other PCs associated with the group have variances $< 1$ (see Section 3.8). Thus, the rule will generally retain one, and only one, PC associated with each such group of variables, which seems to be a reasonable course of action for data of this type.
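The two retention rules just described, Jackson's absolute residual-variance threshold and Kaiser's rule, are simple enough to state directly in code. The sketch below is not from the book; it is a minimal illustration in Python, assuming only that the sample PC variances (eigenvalues) $l_1 \ge l_2 \ge \cdots \ge l_p$ are already available. The function names and the example eigenvalues are hypothetical, chosen to mimic a correlation matrix whose variables fall into two highly correlated groups.

```python
import numpy as np


def retain_by_absolute_residual(eigenvalues, threshold):
    """Jackson-style rule: smallest m such that the absolute residual
    variance sum_{k=m+1}^p l_k first falls below `threshold`."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # l_1 >= ... >= l_p
    p = len(l)
    for m in range(p + 1):
        if l[m:].sum() < threshold:
            return m
    return p  # threshold never reached: retain all p components


def retain_by_kaiser(eigenvalues, cutoff=1.0):
    """Kaiser's rule for a correlation matrix: count the PCs whose
    variances l_k exceed the cutoff (1 in the simplest form)."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > cutoff))


# Illustrative eigenvalues of a 6 x 6 correlation matrix in which the
# variables form two highly correlated groups (numbers are made up).
l = np.array([2.8, 2.1, 0.4, 0.3, 0.25, 0.15])
print(retain_by_kaiser(l))                  # 2 -- one PC per correlated group
print(retain_by_absolute_residual(l, 1.2))  # 2 -- residual variance 1.1 < 1.2
```

On these illustrative eigenvalues Kaiser's rule retains one PC per correlated group, which is the behaviour described above and in Section 3.8; the absolute-residual rule depends entirely on the threshold chosen for the application at hand.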
