Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)



6. Choosing a Subset of Principal Components or Variables

methods, including some based on cluster analyses of variables (see Section 9.2), were also examined but, as these do not use the PCs to select variables, they are not described here. Three main types of method using PCs were examined.

(i) Associate one variable with each of the last m₁* (= p − m₁) PCs and delete those m₁* variables. This can either be done once only or iteratively. In the latter case a second PCA is performed on the m₁ remaining variables, and a further set of m₂* variables is deleted, if appropriate. A third PCA can then be done on the p − m₁* − m₂* variables, and the procedure is repeated until no further deletions are considered necessary. The choice of m₁*, m₂*, ... is based on a criterion determined by the size of the eigenvalues lₖ.

The reasoning behind this method is that small eigenvalues correspond to near-constant relationships among a subset of variables. If one of the variables involved in such a relationship is deleted (a fairly obvious choice for deletion is the variable with the highest coefficient in absolute value in the relevant PC), little information is lost. To decide on how many variables to delete, the criterion lₖ is used as described in Section 6.1.2. The criterion tₘ of Section 6.1.1 was also tried by Jolliffe (1972), but shown to be less useful.

(ii) Associate a set of m* variables en bloc with the last m* PCs, and then delete these variables. Jolliffe (1970, 1972) investigated this type of method, with the m* variables either chosen to maximize sums of squares of coefficients in the last m* PCs or to be those m* variables that are best predicted by regression on the first m = p − m* PCs. Choice of m* is again based on the sizes of the lₖ.
Such methods were found to be unsatisfactory, as they consistently failed to select an appropriate subset for some simple correlation structures.

(iii) Associate one variable with each of the first m PCs, namely the variable, not already chosen, with the highest coefficient in absolute value in each successive PC. These m variables are retained, and the remaining m* = p − m are deleted. The arguments leading to this approach are twofold. First, it is an obvious complementary approach to (i) and, second, in cases where there are groups of highly correlated variables it is designed to select just one variable from each group. This will happen because there will be exactly one high-variance PC associated with each group (see Section 3.8). The approach is a plausible one, as a single variable from each group should preserve most of the information given by that group when all variables in the group are highly correlated.

In Jolliffe (1972) comparisons were made, using simulated data, between non-iterative versions of method (i) and method (iii), called methods B2 and B4 respectively, and with several other subset selection methods that did not use the PCs. The results showed that the PC methods B2 and B4 retained the
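The non-iterative form of method (iii) is simple enough to sketch in a few lines. The following is a minimal illustration, not code from the text: the function name `select_vars_b4`, the use of the sample correlation matrix for the PCA, and the toy data are all assumptions made for the example.

```python
import numpy as np

def select_vars_b4(X, m):
    """Single-pass sketch of method (iii): associate one variable with
    each of the first m PCs and retain it; delete the rest."""
    # PCs from the sample correlation matrix; eigh returns eigenvalues
    # in ascending order, so reorder columns by decreasing eigenvalue.
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

    retained = []
    for k in range(m):  # successive PCs, highest variance first
        coeffs = np.abs(eigvecs[:, k])
        coeffs[retained] = -1.0  # a variable may be chosen only once
        retained.append(int(np.argmax(coeffs)))
    return sorted(retained)

# Toy data with two groups of highly correlated variables (3 + 2):
# the method should pick exactly one variable from each group.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=400), rng.normal(size=400)
cols = [z1 + 0.01 * rng.normal(size=400) for _ in range(3)]
cols += [z2 + 0.01 * rng.normal(size=400) for _ in range(2)]
X = np.column_stack(cols)
keep = select_vars_b4(X, 2)
```

On such data each group contributes one high-variance PC, so the variable with the largest absolute coefficient in each successive PC comes from a different group, matching the argument above.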
