Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

6.1. How Many Principal Components?

(1982), the number of terms in the estimate for $\mathbf{X}$, corresponding to the number of PCs, is successively taken as 1, 2, ..., and so on, until overall prediction of the $x_{ij}$ is no longer significantly improved by the addition of extra terms (PCs). The number of PCs to be retained, $m$, is then taken to be the minimum number necessary for adequate prediction.

Using the SVD, $x_{ij}$ can be written, as in equations (3.5.2), (5.3.3),
$$x_{ij} = \sum_{k=1}^{r} u_{ik}\, l_k^{1/2} a_{jk},$$
where $r$ is the rank of $\mathbf{X}$. (Recall that, in this context, $l_k$, $k = 1, 2, \ldots, p$, are eigenvalues of $\mathbf{X}'\mathbf{X}$, rather than of $\mathbf{S}$.)

An estimate of $x_{ij}$, based on the first $m$ PCs and using all the data, is
$$_m\tilde{x}_{ij} = \sum_{k=1}^{m} u_{ik}\, l_k^{1/2} a_{jk}, \tag{6.1.1}$$
but what is required is an estimate based on a subset of the data that does not include $x_{ij}$. This estimate is written
$$_m\hat{x}_{ij} = \sum_{k=1}^{m} \hat{u}_{ik}\, \hat{l}_k^{1/2} \hat{a}_{jk}, \tag{6.1.2}$$
where $\hat{u}_{ik}$, $\hat{l}_k$, $\hat{a}_{jk}$ are calculated from suitable subsets of the data. The sum of squared differences between predicted and observed $x_{ij}$ is then
$$\mathrm{PRESS}(m) = \sum_{i=1}^{n} \sum_{j=1}^{p} \left( {}_m\hat{x}_{ij} - x_{ij} \right)^2. \tag{6.1.3}$$

The notation PRESS stands for PREdiction Sum of Squares, and is taken from the similar concept in regression, due to Allen (1974). All of the above is essentially common to both Wold (1978) and Eastment and Krzanowski (1982); they differ in how a subset is chosen for predicting $x_{ij}$, and in how (6.1.3) is used for deciding on $m$.

Eastment and Krzanowski (1982) use an estimate $\hat{a}_{jk}$ in (6.1.2) based on the data set with just the $i$th observation $\mathbf{x}_i$ deleted; $\hat{u}_{ik}$ is calculated with only the $j$th variable deleted, and $\hat{l}_k$ combines information from the two cases with the $i$th observation and the $j$th variable deleted, respectively. Wold (1978), on the other hand, divides the data into $g$ blocks, where he recommends that $g$ should be between four and seven and must not be a divisor of $p$, and that no block should contain the majority of the elements in any row or column of $\mathbf{X}$.
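As a concrete numerical illustration of equations (6.1.1) and (6.1.3), the sketch below (using NumPy; the toy matrix and its dimensions are invented for illustration) computes the rank-$m$ estimate of (6.1.1) and the corresponding sum of squared differences. Because all the data are used, this is the in-sample residual rather than the cross-validated PRESS($m$) of (6.1.3), which requires each $x_{ij}$ to be predicted from data excluding it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))  # toy data: n = 20 observations, p = 5 variables

# SVD of X: the singular values s_k equal l_k^{1/2}, where the l_k are
# eigenvalues of X'X (not of the covariance matrix S).
U, s, At = np.linalg.svd(X, full_matrices=False)

def rank_m_estimate(m):
    """Estimate of x_ij from the first m PCs, using all the data (6.1.1)."""
    return (U[:, :m] * s[:m]) @ At[:m, :]

# Sum of squared differences between the rank-m fit and X: the in-sample
# analogue of (6.1.3).  True PRESS(m) would instead predict each x_ij from
# a subset of the data that does not include it.
residuals = [np.sum((rank_m_estimate(m) - X) ** 2) for m in range(1, 6)]
```

The in-sample residual necessarily decreases with every added term (each term removes $l_k$ from the residual), which is exactly why the cross-validated PRESS($m$) is needed to decide when added PCs stop improving genuine prediction.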
Quantities equivalent to $\hat{u}_{ik}$, $\hat{l}_k$ and $\hat{a}_{jk}$ are calculated $g$ times, once with each block of data deleted, and the estimates formed with the $h$th block deleted are then used to predict the data in the $h$th block, $h = 1, 2, \ldots, g$.

With respect to the choice of $m$, Wold (1978) and Eastment and Krzanowski (1982) each use a (different) function of PRESS($m$) as a criterion
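The cross-validated prediction in the spirit of Eastment and Krzanowski can be sketched as follows. This is a simplified, non-authoritative sketch: the sign alignment against the full-data SVD and the geometric-mean combination of the two deleted-fit singular values are illustrative assumptions, not the exact combination used in Eastment and Krzanowski (1982).

```python
import numpy as np

def press_ek(X, m):
    """PRESS(m) as in (6.1.3), with each x_ij predicted from data excluding
    it: a-hat from X with observation i deleted, u-hat from X with variable
    j deleted (a sketch in the spirit of Eastment and Krzanowski, 1982)."""
    n, p = X.shape
    U_full, _, At_full = np.linalg.svd(X, full_matrices=False)

    def fit_drop_row(i):
        # SVD with observation i deleted gives the loading estimates a-hat_jk.
        _, s, At = np.linalg.svd(np.delete(X, i, axis=0), full_matrices=False)
        for k in range(m):  # resolve SVD sign ambiguity against the full fit
            if At[k] @ At_full[k] < 0:
                At[k] *= -1
        return s, At

    def fit_drop_col(j):
        # SVD with variable j deleted gives the score estimates u-hat_ik.
        U, s, _ = np.linalg.svd(np.delete(X, j, axis=1), full_matrices=False)
        for k in range(m):
            if U[:, k] @ U_full[:, k] < 0:
                U[:, k] *= -1
        return U, s

    row_fits = [fit_drop_row(i) for i in range(n)]
    col_fits = [fit_drop_col(j) for j in range(p)]

    total = 0.0
    for i in range(n):
        si, Ati = row_fits[i]
        for j in range(p):
            Uj, sj = col_fits[j]
            # l-hat_k^{1/2} taken as the geometric mean of the two deleted-fit
            # singular values -- an illustrative choice, not the paper's.
            pred = sum(Uj[i, k] * np.sqrt(si[k] * sj[k]) * Ati[k, j]
                       for k in range(m))
            total += (pred - X[i, j]) ** 2
    return total

rng = np.random.default_rng(1)
X = rng.standard_normal((12, 5))  # toy data for illustration
val = press_ek(X, 2)
```

Unlike the in-sample residual, this quantity need not decrease as $m$ grows, which is what makes it usable for choosing $m$.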
