Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

cda.psych.uiuc.edu

6 Choosing a Subset of Principal Components or Variables

In this chapter two separate, but related, topics are considered, both of which are concerned with choosing a subset of variables. In the first section, the choice to be examined is how many PCs adequately account for the total variation in x. The major objective in many applications of PCA is to replace the p elements of x by a much smaller number m of PCs, which nevertheless discard very little information. It is crucial to know how small m can be taken without serious information loss. Various rules, many ad hoc, have been proposed for determining a suitable value of m, and these are discussed in Section 6.1. Examples of their use are given in Section 6.2.

Using m PCs instead of p variables considerably reduces the dimensionality of the problem when m ≪ p, but usually the values of all p variables are still needed in order to calculate the PCs, as each PC is likely to be a function of all p variables. It might be preferable if, instead of using m PCs, we could use m, or perhaps slightly more, of the original variables to account for most of the variation in x. The question arises of how to compare the information contained in a subset of variables with that in the full data set. Different answers to this question lead to different criteria and different algorithms for choosing the subset. In Section 6.3 we concentrate on methods that either use PCA to choose the variables or aim to reproduce the PCs in the full data set with a subset of variables, though other variable selection techniques are also mentioned briefly. Section 6.4 gives two examples of the use of variable selection methods.

All of the variable selection methods described in the present chapter are appropriate when the objective is to describe variation within x as well as possible. Variable selection when x is a set of regressor variables in a regression analysis, or a set of predictor variables in a discriminant analysis, is a different type of problem, as criteria external to x must be considered. Variable selection in regression is the subject of Section 8.5. The related problem of choosing which PCs to include in a regression analysis or a discriminant analysis is discussed in Sections 8.2 and 9.1, respectively.

6.1 How Many Principal Components?

In this section we present a number of rules for deciding how many PCs should be retained in order to account for most of the variation in x (or in the standardized variables x* in the case of a correlation matrix-based PCA).

In some circumstances the last few, rather than the first few, PCs are of interest, as was discussed in Section 3.4 (see also Sections 3.7, 6.3, 8.4, 8.6 and 10.1). In the present section, however, the traditional idea of trying to reduce dimensionality by replacing the p variables by the first m PCs (m
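The individual rules are not reproduced in this excerpt, so the following is only an illustrative sketch of one commonly used rule: retain the smallest m for which the first m PCs account for at least a chosen fraction of the total variation. It assumes NumPy, computes the PC variances as eigenvalues of the covariance matrix (or of the correlation matrix, i.e. the standardized variables x*, when `standardize=True`), and uses hypothetical names `pc_variances` and `choose_m` with an arbitrary default threshold of 0.8. None of these choices is taken from the book.

```python
import numpy as np


def pc_variances(X, standardize=False):
    """Variances of the PCs (eigenvalues), largest first.

    X is an (n, p) data matrix with observations in rows.
    standardize=True gives a correlation matrix-based PCA (variables x*).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each variable
    if standardize:
        # Divide by sample standard deviations so the covariance
        # matrix of Xc becomes the correlation matrix of X.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    S = np.cov(Xc, rowvar=False)  # p x p covariance (or correlation) matrix
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues;
    # reverse so the first entry is the variance of the first PC.
    return np.linalg.eigvalsh(S)[::-1]


def choose_m(variances, threshold=0.8):
    """Smallest m whose first m PCs reach `threshold` of the total variation.

    The threshold is an ad hoc choice (assumption), not a value from the text.
    """
    variances = np.asarray(variances, dtype=float)
    cumulative = np.cumsum(variances) / variances.sum()
    # Index of the first cumulative proportion >= threshold, turned into a count.
    m = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding leaving the last proportion just below 1.
    return min(m, len(variances))
```

For example, `choose_m([5, 3, 1, 1], threshold=0.75)` returns 2, because the first two PCs account for 80% of the total variation. Changing the threshold changes m directly, which is one reason such rules are described as ad hoc.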
