12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


6. Choosing a Subset of Principal Components or Variables

to compare what is being optimized here with the approaches described earlier.

  • The RV-coefficient compares linear combinations of subsets of variables with the full set of variables.
  • Some methods, such as those of Jolliffe (1970, 1972, 1973), compare principal components of subsets of variables with principal components from the full set.
  • Some approaches, such as McCabe’s (1984) principal variables, simply compare subsets of the variables with the full set of variables.
  • Some criteria, such as Yanai’s generalized coefficient of determination, compare subspaces spanned by a subset of variables with subspaces spanned by a subset of PCs, as in Cadima and Jolliffe (2001).

No examples are presented by Robert and Escoufier (1976) of how their method works in practice. However, Gonzalez et al. (1990) give a stepwise algorithm for implementing the procedure and illustrate it with a small example (n = 49; p = 6). The example is small enough for all subsets of each size to be evaluated. Only for m = 1, 2, 3 does the stepwise algorithm give the best subset with respect to RV, as identified by the full search. Escoufier (1986) provides further discussion of the properties of the RV-coefficient when used in this context.

Tanaka and Mori (1997) also use the RV-coefficient, as one of two criteria for variable selection. They consider the same linear combinations $M'X_1$ of a given set of variables as Robert and Escoufier (1976), and call these linear combinations modified principal components. Tanaka and Mori (1997) assess how well a subset reproduces the full set of variables by means of the RV-coefficient. They also have a second form of ‘modified’ principal components, constructed by minimizing the trace of the residual covariance matrix obtained by regressing $X$ on $M'X_1$.
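As a rough illustration of the two criteria just mentioned, the Python sketch below computes the RV-coefficient between a data matrix and a set of linear combinations of a subset of its variables, together with the trace of the residual covariance matrix obtained by regressing the full data on those combinations. The RV formula used, tr(S_xy S_yx) / sqrt(tr(S_xx^2) tr(S_yy^2)), is the standard definition and is not stated in the text. The function names, the synthetic data, the choice of subset and the arbitrary weight matrix M are all assumptions made for illustration, and no attempt is made here to find the optimizing M.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV-coefficient between two data matrices that share the same rows."""
    Xc = X - X.mean(axis=0)          # column-centre both matrices
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc                  # cross-product matrices; the common
    Sxx = Xc.T @ Xc                  # 1/(n - 1) factor cancels in the ratio
    Syy = Yc.T @ Yc
    num = np.trace(Sxy @ Sxy.T)
    den = np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))
    return num / den


def residual_trace(X, Z):
    """Trace of the residual covariance matrix after regressing X on Z."""
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)   # least-squares fit
    resid = Xc - Zc @ coef
    return np.trace(resid.T @ resid) / (X.shape[0] - 1)


# Synthetic data only; the sizes merely echo the small example (n = 49, p = 6).
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 6))
X1 = X[:, :3]                        # hypothetical subset of m = 3 variables
M = rng.normal(size=(3, 2))          # arbitrary weights, NOT the optimizing M
Z = X1 @ M                           # rows hold the combinations M'x_1

rv = rv_coefficient(X, Z)
res = residual_trace(X, Z)
total = np.trace(np.cov(X, rowvar=False))   # total variance of the full set

print(f"RV = {rv:.3f}, residual trace = {res:.3f}, total variance = {total:.3f}")
```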
This latter formulation is similar to Rao’s (1964) PCA of instrumental variables (see Section 14.3). The difference between Tanaka and Mori’s (1997) instrumental variable approach and that of Rao (1964) is that Rao attempts to predict $X_2$, the $(n \times (p - m))$ complementary matrix to $X_1$, using linear functions of $X_1$, whereas Tanaka and Mori try to predict the full matrix $X$.

Both of Tanaka and Mori’s modified PCAs solve the same eigenequation

$$(S_{11}^2 + S_{12}S_{21})a = l S_{11} a, \tag{6.3.6}$$

with obvious notation, but differ in the way that the quality of a subset is measured. For the instrumental variable approach, the criterion is proportional to $\sum_{k=1}^{m} l_k$, whereas for the components derived via the RV-coefficient, quality is based on $\sum_{k=1}^{m} l_k^2$, where $l_k$ is the $k$th largest eigenvalue in the solution of (6.3.6). A backward elimination method is used to delete variables until some threshold is reached, although in the
