12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


6.3. Selecting a Subset of Variables

‘best’ subsets more often than the other methods considered, but they also selected ‘bad,’ as opposed to ‘good’ or ‘moderate’, subsets more frequently than the other methods. Method B4 was most extreme in this respect; it selected ‘best’ and ‘bad’ subsets more frequently than any other method, and ‘moderate’ or ‘good’ subsets less frequently.

Similarly, for various real data sets Jolliffe (1973) found that none of the variable selection methods was uniformly best, but several of them, including B2 and B4, found reasonable subsets in most cases.

McCabe (1984) adopted a somewhat different approach to the variable selection problem. He started from the fact that, as has been seen in Chapters 2 and 3, PCs satisfy a number of different optimality criteria. A subset of the original variables that optimizes one of these criteria is termed a set of principal variables by McCabe (1984). Property A1 of Sections 2.1, 3.1, is uninteresting as it simply leads to a subset of variables whose variances are largest, but other properties lead to one of these four criteria:

(a) Minimize $\prod_{j=1}^{m^*} \theta_j$

(b) Minimize $\sum_{j=1}^{m^*} \theta_j$

(c) Minimize $\sum_{j=1}^{m^*} \theta_j^2$

(d) Minimize $\sum_{j=1}^{m^-} \rho_j^2$

where $\theta_j$, $j = 1, 2, \ldots, m^*$, are the eigenvalues of the conditional covariance (or correlation) matrix of the $m^*$ deleted variables, given the values of the $m$ selected variables, and $\rho_j$, $j = 1, 2, \ldots, m^- = \min(m, m^*)$, are the canonical correlations between the set of $m^*$ deleted variables and the set of $m$ selected variables.

Consider, for example, Property A4 of Sections 2.1 and 3.1, where $\det(\Sigma_y)$ (or $\det(S_y)$ for samples) is to be maximized. In PCA, $y$ consists of orthonormal linear functions of $x$; for principal variables $y$ is a subset of $x$. From a well-known result concerning partitioned matrices, $\det(\Sigma) = \det(\Sigma_y)\,\det(\Sigma_{y^* \cdot y})$, where $\Sigma_{y^* \cdot y}$ is the matrix of conditional covariances for those variables not in $y$, given the value of $y$. Because $\Sigma$, and hence $\det(\Sigma)$, is fixed for a given random vector $x$, maximizing $\det(\Sigma_y)$ is equivalent to minimizing $\det(\Sigma_{y^* \cdot y})$. Now $\det(\Sigma_{y^* \cdot y}) = \prod_{j=1}^{m^*} \theta_j$, so that Property A4 becomes McCabe’s criterion (a) when deriving principal variables. Other properties of Chapters 2 and 3 can similarly be shown to be equivalent to
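The determinant argument for criterion (a) can be checked numerically. The following is a minimal sketch, not taken from the book: the function names `conditional_covariance` and `mccabe_criteria`, the simulated data, and the choice of five variables with two selected are all assumptions made for illustration. It computes the conditional covariance matrix as a Schur complement, evaluates criteria (a) to (c) from its eigenvalues, and confirms that the subset minimizing (a) is the one maximizing the determinant of the selected block.

```python
from itertools import combinations

import numpy as np


def conditional_covariance(sigma, selected):
    """Covariance of the deleted variables given the selected ones.

    This is the Schur complement of the selected block in sigma.
    """
    p = sigma.shape[0]
    chosen = set(selected)
    sel = np.array(sorted(chosen))
    dele = np.array([j for j in range(p) if j not in chosen])
    s_yy = sigma[np.ix_(sel, sel)]      # selected block
    s_dy = sigma[np.ix_(dele, sel)]     # cross-covariances
    s_dd = sigma[np.ix_(dele, dele)]    # deleted block
    return s_dd - s_dy @ np.linalg.solve(s_yy, s_dy.T)


def mccabe_criteria(sigma, selected):
    """Criteria (a), (b), (c) from the eigenvalues theta_j."""
    theta = np.linalg.eigvalsh(conditional_covariance(sigma, selected))
    return {
        "a": float(np.prod(theta)),      # product of eigenvalues
        "b": float(np.sum(theta)),       # sum of eigenvalues
        "c": float(np.sum(theta ** 2)),  # sum of squared eigenvalues
    }


# Simulated data purely for illustration (hypothetical, not from the book).
rng = np.random.default_rng(0)
sigma = np.cov(rng.normal(size=(50, 5)), rowvar=False)

selected = [0, 2]
cond = conditional_covariance(sigma, selected)
det_full = np.linalg.det(sigma)
det_sel = np.linalg.det(sigma[np.ix_(selected, selected)])
det_cond = np.linalg.det(cond)

# Partitioned-matrix identity: det(Sigma) = det(Sigma_y) * det(Sigma_{y*.y})
identity_holds = bool(np.isclose(det_full, det_sel * det_cond))
crit = mccabe_criteria(sigma, selected)

# Exhaustive search over all 2-variable subsets.
subsets = list(combinations(range(5), 2))
best_a = min(subsets, key=lambda s: mccabe_criteria(sigma, s)["a"])
best_det = max(subsets, key=lambda s: np.linalg.det(sigma[np.ix_(s, s)]))
```

Because $\det(\Sigma)$ is the same for every subset, the two searches rank subsets in exactly opposite order, so `best_a` and `best_det` should coincide (barring numerical ties). Exhaustive search is only practical for a small number of variables; it is used here solely to make the equivalence visible.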
