Jolliffe, I., Principal Component Analysis, 2nd ed. (Springer, 2002)

6.3. Selecting a Subset of Variables

variable selection in factor analysis. Cadima and Jolliffe (2001) show that Yanai's coefficient can be written as

$$\operatorname{corr}(\mathbf{P}_q, \mathbf{P}_m) = \sqrt{\frac{1}{q}\sum_{k=1}^{q} r_{km}^{2}}, \qquad (6.3.4)$$

where $r_{km}$ is the multiple correlation between the $k$th PC and the set of $m$ selected variables.

The second indicator examined by Cadima and Jolliffe (2001) is again a matrix correlation, this time between the data matrix $\mathbf{X}$ and the matrix formed by orthogonally projecting $\mathbf{X}$ onto the space spanned by the $m$ selected variables. It can be written

$$\operatorname{corr}(\mathbf{X}, \mathbf{P}_m \mathbf{X}) = \sqrt{\frac{\sum_{k=1}^{p} \lambda_k r_{km}^{2}}{\sum_{k=1}^{p} \lambda_k}}. \qquad (6.3.5)$$

It turns out that this measure is equivalent to the second of McCabe's (1984) criteria defined above (see also McCabe (1986)). Cadima and Jolliffe (2001) discuss a number of other interpretations, and relationships between their measures and previous suggestions in the literature. Both indicators (6.3.4) and (6.3.5) are weighted averages of the squared multiple correlations between each PC and the set of selected variables. In the second measure, the weights are simply the eigenvalues of S, and hence the variances of the PCs. For the first indicator the weights are positive and equal for the first q PCs, but zero otherwise. Thus the first indicator ignores PCs outside the chosen q-dimensional subspace when assessing closeness, but it also gives less weight than the second indicator to the PCs with the very largest variances relative to those with intermediate variances.

Cadima and Jolliffe (2001) discuss algorithms for finding good subsets of variables and demonstrate the use of the two measures on three examples, one of which is large (p = 62) compared to those typically used for illustration. The examples show that the two measures can lead to quite different optimal subsets, implying that it is necessary to know what aspect of a subspace it is most desirable to preserve before choosing a subset of variables to achieve this. They also show that

• the algorithms usually work efficiently in cases where numbers of variables are small enough to allow comparisons with an exhaustive search;

• as discussed elsewhere (Section 11.3), choosing variables on the basis of the size of coefficients or loadings in the PCs' eigenvectors can be inadvisable;

• to match the information provided by the first q PCs it is often only necessary to keep (q + 1) or (q + 2) variables.

For data sets in which p is too large to conduct an exhaustive search for the optimal subset, algorithms that can find a good subset are needed.
