12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


6.1. How Many Principal Components?

that is, the sum of the variances of the PCs is equal to the sum of the variances of the elements of $\mathbf{x}$. The obvious definition of 'percentage of variation accounted for by the first $m$ PCs' is therefore

$$t_m = 100 \sum_{k=1}^{m} l_k \Big/ \sum_{j=1}^{p} s_{jj} = 100 \sum_{k=1}^{m} l_k \Big/ \sum_{k=1}^{p} l_k,$$

which reduces to

$$t_m = \frac{100}{p} \sum_{k=1}^{m} l_k$$

in the case of a correlation matrix.

Choosing a cut-off $t^*$ somewhere between 70% and 90% and retaining $m$ PCs, where $m$ is the smallest integer for which $t_m > t^*$, provides a rule which in practice preserves in the first $m$ PCs most of the information in $\mathbf{x}$. The best value for $t^*$ will generally become smaller as $p$ increases, or as $n$, the number of observations, increases. Although a sensible cutoff is very often in the range 70% to 90%, it can sometimes be higher or lower depending on the practical details of a particular data set. For example, a value greater than 90% will be appropriate when one or two PCs represent very dominant and rather obvious sources of variation. Here the less obvious structures beyond these could be of interest, and to find them a cut-off higher than 90% may be necessary. Conversely, when $p$ is very large choosing $m$ corresponding to 70% may give an impractically large value of $m$ for further analyses. In such cases the threshold should be set somewhat lower.

Using the rule is, in a sense, equivalent to looking at the spectral decomposition of the covariance (or correlation) matrix $\mathbf{S}$ (see Property A3 of Sections 2.1, 3.1), or the SVD of the data matrix $\mathbf{X}$ (see Section 3.5). In either case, deciding how many terms to include in the decomposition in order to get a good fit to $\mathbf{S}$ or $\mathbf{X}$ respectively is closely related to looking at $t_m$, because an appropriate measure of lack-of-fit of the first $m$ terms in either decomposition is $\sum_{k=m+1}^{p} l_k$.
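The cumulative-percentage rule described above can be illustrated with a minimal sketch. The function names `cumulative_percentage` and `choose_m`, the default cut-off of 80%, and the eigenvalues in the example are all hypothetical choices made for illustration; only the definition of $t_m$ and the "smallest $m$ with $t_m > t^*$" rule come from the text.

```python
from itertools import accumulate


def cumulative_percentage(eigenvalues):
    """Return [t_1, ..., t_p], where t_m = 100 * (l_1 + ... + l_m) / (l_1 + ... + l_p)."""
    ordered = sorted(eigenvalues, reverse=True)  # PC variances, largest first
    total = sum(ordered)
    return [100.0 * partial / total for partial in accumulate(ordered)]


def choose_m(eigenvalues, cutoff=80.0):
    """Return the smallest m for which t_m exceeds the cut-off t* (in percent)."""
    for m, t_m in enumerate(cumulative_percentage(eigenvalues), start=1):
        if t_m > cutoff:
            return m
    return len(eigenvalues)  # reached only when cutoff >= 100


# Hypothetical eigenvalues for p = 5 variables; they sum to 5, as the
# eigenvalues of a correlation matrix must, so t_m = (100 / p) * sum(l_1..l_m).
l = [2.5, 1.25, 0.75, 0.25, 0.25]
t = cumulative_percentage(l)   # [50.0, 75.0, 90.0, 95.0, 100.0]
m = choose_m(l, cutoff=80.0)   # 3, because t_2 = 75 <= 80 < t_3 = 90
```

Because the rule uses a strict inequality, a cut-off of exactly 90% in this example would retain four PCs rather than three, which shows how sensitive $m$ can be to the choice of $t^*$.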
This measure of lack-of-fit follows because

$$\sum_{i=1}^{n} \sum_{j=1}^{p} ({}_m\tilde{x}_{ij} - x_{ij})^2 = (n-1) \sum_{k=m+1}^{p} l_k,$$

(Gabriel, 1978) and $\| {}_m\mathbf{S} - \mathbf{S} \| = \sum_{k=m+1}^{p} l_k$ (see the discussion of Property G4 in Section 3.2), where ${}_m\tilde{x}_{ij}$ is the rank $m$ approximation to $x_{ij}$ based on the SVD as given in equation (3.5.3), and ${}_m\mathbf{S}$ is the sum of the first $m$ terms of the spectral decomposition of $\mathbf{S}$.

A number of attempts have been made to find the distribution of $t_m$, and hence to produce a formal procedure for choosing $m$, based on $t_m$. Mandel (1972) presents some expected values for $t_m$ for the case where all variables are independent, normally distributed, and have the same vari-
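The identity attributed to Gabriel (1978) above can be checked with a short derivation. This is only a sketch under stated assumptions: $\mathbf{X}$ is taken to be column-centred with $\mathbf{S} = \mathbf{X}^{\top}\mathbf{X}/(n-1)$, and the symbols $\sigma_k$, $u_k$, $a_k$ (singular values and left and right singular vectors, with $\sigma_k = 0$ beyond the rank of $\mathbf{X}$) are notation introduced here and need not match that of equation (3.5.3).

```latex
\begin{align*}
\mathbf{X} &= \sum_{k=1}^{p} \sigma_k\, u_k a_k^{\top}, \qquad
{}_m\tilde{\mathbf{X}} = \sum_{k=1}^{m} \sigma_k\, u_k a_k^{\top}
  && \text{(SVD and its rank-$m$ truncation)} \\
\sum_{i=1}^{n}\sum_{j=1}^{p} \bigl({}_m\tilde{x}_{ij} - x_{ij}\bigr)^2
  &= \Bigl\| \sum_{k=m+1}^{p} \sigma_k\, u_k a_k^{\top} \Bigr\|_F^2
   = \sum_{k=m+1}^{p} \sigma_k^2
  && \text{(the $u_k$ and the $a_k$ are orthonormal)} \\
\mathbf{S} = \frac{1}{n-1}\, \mathbf{X}^{\top}\mathbf{X}
  &= \sum_{k=1}^{p} \frac{\sigma_k^2}{n-1}\, a_k a_k^{\top}
  \;\Longrightarrow\; l_k = \frac{\sigma_k^2}{n-1}
  && \text{(spectral decomposition of $\mathbf{S}$)} \\
\sum_{i=1}^{n}\sum_{j=1}^{p} \bigl({}_m\tilde{x}_{ij} - x_{ij}\bigr)^2
  &= (n-1) \sum_{k=m+1}^{p} l_k .
\end{align*}
```

Dividing the last line by $(n-1)\sum_{k=1}^{p} l_k$ gives $1 - t_m/100$, so the residual sum of squares of the rank $m$ fit, expressed as a fraction of the total, is the proportion of variation not accounted for by the first $m$ PCs.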
