Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

6. Choosing a Subset of Principal Components or Variables

Cadima et al. (2002) compare various algorithms for finding good subsets according to the measures (6.3.4) and (6.3.5), and also with respect to the RV-coefficient, which is discussed briefly below (see also Section 3.2). Two versions of simulated annealing, a genetic algorithm and a restricted improvement algorithm are compared with a number of stepwise algorithms on a total of fourteen data sets. The results show a general inferiority of the stepwise methods, but no single algorithm outperforms all the others. Cadima et al. (2002) recommend using simulated annealing or a genetic algorithm to provide a starting point for a restricted improvement algorithm, which then refines the solution (a sketch of this two-stage strategy is given at the end of this section). They make the interesting point that for large p the number of candidate subsets is so large that, for criteria whose range of values is bounded, it is almost inevitable that there are many solutions that are very close to optimal. For instance, in one of their examples, with p = 62, they find 800 solutions corresponding to a population size of 800 in their genetic algorithm. The best of these has a value 0.8079 for the criterion (6.3.5), but the worst is 0.8060, less than 0.3% smaller. Of course, it is possible that the global optimum is much greater than the best of these 800, but it seems highly unlikely.

Al-Kandari (1998) provides an extensive study of a large number of variable selection methods. The ideas of Jolliffe (1972, 1973) and McCabe (1984) are compared with a variety of new methods, based on loadings in the PCs, on correlations of the PCs with the variables, and on versions of McCabe's (1984) principal variables that are constructed from correlation, rather than covariance, matrices. The methods are compared on simulated data with a wide range of covariance or correlation structures, and on various real data sets chosen to have covariance/correlation structures similar to those of the simulated data. On the basis of the results of these analyses, it is concluded that few of the many techniques considered are uniformly inferior to other methods, and none is uniformly superior. The 'best' method varies, depending on the covariance or correlation structure of a data set. It also depends on the 'measure of efficiency' used to determine how good a subset of variables is, as noted also by Cadima and Jolliffe (2001). In assessing which subsets of variables are best, Al-Kandari (1998) additionally takes into account the interpretability of the PCs based on the subset, relative to the PCs based on all p variables (see Section 11.3).

Al-Kandari (1998) also discusses the distinction between criteria used to choose subsets of variables and criteria used to evaluate how good a chosen subset is. The latter are her 'measures of efficiency', and ideally these same criteria should be used to choose subsets in the first place. However, this may be computationally infeasible, so a suboptimal but computationally straightforward criterion is used to do the choosing instead. Some of Al-Kandari's (1998) results are reported in Al-Kandari and Jolliffe (2001) for covariance, but not correlation, matrices.

King and Jackson (1999) combine some of the ideas of the present section with some from Section 6.1. Their main objective is to select a subset
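As a rough illustration of the two-stage strategy that Cadima et al. (2002) recommend, the following Python sketch pairs a simulated-annealing search with a greedy swap-based refinement, using the RV-coefficient as the criterion to maximize. The rv_coefficient function follows the standard Robert and Escoufier (1976) definition; whether this matches the exact variant computed by Cadima et al. (2002) is not guaranteed. The function names, the one-swap neighbourhood, the geometric cooling schedule and the defaults for n_iter, t0 and cooling are illustrative assumptions rather than details taken from the paper, and this is not an implementation of criteria (6.3.4) or (6.3.5).

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV-coefficient (Robert & Escoufier, 1976) between two
    column-centred data matrices with the same number of rows."""
    S, T = X @ X.T, Y @ Y.T
    return np.trace(S @ T) / np.sqrt(np.trace(S @ S) * np.trace(T @ T))

def anneal_subset(X, k, n_iter=5000, t0=1.0, cooling=0.999, seed=None):
    """Simulated-annealing search for a k-variable subset scoring highly
    on the RV-coefficient with the full matrix (a sketch; the schedule
    and neighbourhood are illustrative choices, not Cadima et al.'s)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    Xc = X - X.mean(axis=0)                  # centre columns once
    subset = rng.choice(p, size=k, replace=False)
    score = rv_coefficient(Xc, Xc[:, subset])
    best, best_score, t = subset.copy(), score, t0
    for _ in range(n_iter):
        cand = subset.copy()                 # propose a one-variable swap
        cand[rng.integers(k)] = rng.choice(np.setdiff1d(np.arange(p), subset))
        cand_score = rv_coefficient(Xc, Xc[:, cand])
        # accept uphill moves always, downhill with probability exp(delta/t)
        if cand_score > score or rng.random() < np.exp((cand_score - score) / t):
            subset, score = cand, cand_score
            if score > best_score:
                best, best_score = subset.copy(), score
        t *= cooling                         # geometric cooling
    return np.sort(best), best_score

def improve_subset(X, subset):
    """Greedy refinement: keep making single-variable swaps while any swap
    raises the criterion (a loose reading of 'restricted improvement';
    the published algorithm differs in its details)."""
    Xc = X - X.mean(axis=0)
    subset = np.asarray(subset).copy()
    score = rv_coefficient(Xc, Xc[:, subset])
    improved = True
    while improved:
        improved = False
        for i in range(len(subset)):
            for j in np.setdiff1d(np.arange(X.shape[1]), subset):
                cand = subset.copy()
                cand[i] = j
                s = rv_coefficient(Xc, Xc[:, cand])
                if s > score:
                    subset, score, improved = cand, s, True
    return np.sort(subset), score
```

Used together, anneal_subset supplies the starting point and improve_subset refines it, for example start, _ = anneal_subset(X, k=5, seed=0) followed by best, value = improve_subset(X, start). Because the RV-coefficient is bounded above by 1, many near-optimal subsets can be expected when p is large, which is exactly the point Cadima et al. (2002) make about the 800 near-tied solutions in their p = 62 example.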
