Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

6. Choosing a Subset of Principal Components or Variables

for large n, PRESS(m) and W are almost equivalent to the much simpler quantities

$$\sum_{k=m+1}^{p} l_k \quad\text{and}\quad \frac{l_m}{\sum_{k=m+1}^{p} l_k},$$

respectively. However, Gabriel (personal communication) notes that this conclusion holds only for large sample sizes.

In Section 3.9 we introduced the fixed effects model. A number of authors have used this model as a basis for constructing rules to determine m, with some of the rules relying on the resampling ideas associated with the bootstrap and jackknife. Recall that the model assumes that the rows $\mathbf{x}_i$ of the data matrix are such that $E(\mathbf{x}_i) = \mathbf{z}_i$, where $\mathbf{z}_i$ lies in a q-dimensional space $F_q$. If $\mathbf{e}_i$ is defined as $(\mathbf{x}_i - \mathbf{z}_i)$, then $E(\mathbf{e}_i) = \mathbf{0}$ and $\mathrm{var}(\mathbf{e}_i) = \frac{\sigma^2}{w_i}\boldsymbol{\Gamma}$, where $\boldsymbol{\Gamma}$ is a positive definite symmetric matrix and the $w_i$ are positive scalars whose sum is unity. For fixed q, the quantity

$$\sum_{i=1}^{n} w_i \|\mathbf{x}_i - \mathbf{z}_i\|_M^2, \tag{6.1.6}$$

given in equation (3.9.1), is to be minimized in order to estimate $\sigma^2$, the $\mathbf{z}_i$ and $F_q$ ($\boldsymbol{\Gamma}$ and the $w_i$ are assumed known). The current selection problem is not only to estimate the unknown parameters, but also to find q. We wish our choice of m, the number of components retained, to coincide with the true value of q, assuming that such a value exists.

To choose m, Ferré (1990) attempts to find q so that it minimizes the loss function

$$f_q = E\Big[\sum_{i=1}^{n} w_i \|\mathbf{z}_i - \hat{\mathbf{z}}_i\|_{\boldsymbol{\Gamma}^{-1}}^2\Big], \tag{6.1.7}$$

where $\hat{\mathbf{z}}_i$ is the projection of $\mathbf{x}_i$ onto $F_q$. The criterion $f_q$ cannot be calculated, but must be estimated, and Ferré (1990) shows that a good estimate of $f_q$ is

$$\hat{f}_q = \sum_{k=q+1}^{p} \hat{\lambda}_k + \sigma^2\Big[2q(n+q-p) - np + 2(p-q) + 4\sum_{l=1}^{q}\sum_{k=q+1}^{p}\frac{\hat{\lambda}_l}{\hat{\lambda}_l - \hat{\lambda}_k}\Big], \tag{6.1.8}$$

where $\hat{\lambda}_k$ is the kth largest eigenvalue of $\mathbf{V}\boldsymbol{\Gamma}^{-1}$ and

$$\mathbf{V} = \sum_{i=1}^{n} w_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'.$$

In the special case where $\boldsymbol{\Gamma} = \mathbf{I}_p$ and $w_i = \frac{1}{n}$, $i = 1, \ldots, n$, we have $\mathbf{V}\boldsymbol{\Gamma}^{-1} = \frac{(n-1)}{n}\mathbf{S}$, and $\hat{\lambda}_k = \frac{(n-1)}{n}l_k$, where $l_k$ is the kth largest eigenvalue of the sample covariance matrix $\mathbf{S}$. In addition, $\hat{\mathbf{z}}_i$ is the projection of $\mathbf{x}_i$ onto the space spanned by the first q PCs. The residual variance $\sigma^2$ still
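In the special case $\boldsymbol{\Gamma} = \mathbf{I}_p$, $w_i = \frac{1}{n}$, the estimate (6.1.8) reduces to a computation on the eigenvalues of the sample covariance matrix. A minimal sketch in NumPy, assuming the residual variance $\sigma^2$ is supplied by the user rather than estimated (the function name `ferre_criterion` is ours, not from the text):

```python
import numpy as np

def ferre_criterion(X, sigma2):
    """Evaluate Ferre's (1990) estimate f_hat_q of equation (6.1.8)
    for q = 1, ..., p-1, in the special case Gamma = I_p, w_i = 1/n.

    sigma2 is the residual variance, treated here as known; in practice
    it must itself be estimated. Assumes distinct eigenvalues, so the
    denominators lambda_l - lambda_k are nonzero.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)                  # sample covariance matrix S
    l = np.sort(np.linalg.eigvalsh(S))[::-1]     # l_1 >= ... >= l_p
    lam = (n - 1) / n * l                        # eigenvalues of V Gamma^{-1}
    f_hat = {}
    for q in range(1, p):
        tail = lam[q:].sum()                     # sum_{k=q+1}^p lambda_hat_k
        # double sum: sum_{l=1}^q sum_{k=q+1}^p lambda_l / (lambda_l - lambda_k)
        ratios = lam[:q, None] / (lam[:q, None] - lam[None, q:])
        f_hat[q] = tail + sigma2 * (
            2 * q * (n + q - p) - n * p + 2 * (p - q) + 4 * ratios.sum()
        )
    return f_hat
```

The retained dimension m would then be taken as the q minimizing $\hat{f}_q$, e.g. `min(f_hat, key=f_hat.get)`.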
