12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

14. Generalizations and Adaptations of Principal Component Analysis

solved is then to successively find p-variate vectors $\boldsymbol{\phi}^{(k)}$, $k = 1, 2, \ldots$, whose elements are $\phi^{(k)}_j(x_j)$, which minimize

$$\operatorname{var}\Big[\sum_{j=1}^{p} \phi^{(k)}_j(x_j)\Big]$$

subject to $\sum_{j=1}^{p} \operatorname{var}[\phi^{(k)}_j(x_j)] = 1$ and, for $k > 1$, $k > l$,

$$\sum_{j=1}^{p} \operatorname{cov}[\phi^{(k)}_j(x_j), \phi^{(l)}_j(x_j)] = 0.$$

As with linear PCA, this reduces to an eigenvalue problem. The main choice to be made is the set of functions $\phi(\cdot)$ over which the optimization takes place. In an example, Donnell et al. (1994) use splines, but their theoretical results are quite general, and they discuss other, more sophisticated smoothers. They identify two main uses for low-variance additive principal components: to fit additive implicit equations to data, and to identify the presence of ‘concurvities,’ which play the same rôle and cause the same problems in additive regression as collinearities do in linear regression.

Principal curves are included in the same section as additive principal components, despite the insistence by Donnell and coworkers, in their response to Flury's discussion of their paper, that the two are very different. One difference is that, although the range of functions allowed in additive principal components is wide, an equation is found relating the variables via the functions $\phi_j(x_j)$, whereas a principal curve is just that, a smooth curve with no need for a parametric equation. A second difference is that additive principal components concentrate on low-variance relationships, while principal curves minimize variation orthogonal to the curve.

There is nevertheless a similarity between the two techniques, in that both replace an optimal line or plane produced by linear PCA with an optimal non-linear curve or surface. In the case of principal curves, a smooth one-dimensional curve is sought that passes through the ‘middle’ of the data set.
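The additive principal component problem stated at the start of this section becomes concrete once each $\phi_j$ is restricted to a fixed finite basis. The sketch below assumes a centred cubic polynomial basis for every variable (a swapped-in choice; Donnell et al. (1994) use splines in their example) and solves the resulting generalized eigenvalue problem with SciPy's `scipy.linalg.eigh`. The function names `centred_basis` and `additive_pcs` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np
from scipy.linalg import eigh


def centred_basis(x, degree=3):
    """Centred polynomial basis z, z^2, ..., z^degree for one variable.

    Assumption: a polynomial basis stands in for the splines used by
    Donnell et al. (1994); any fixed basis is handled the same way.
    """
    z = (x - x.mean()) / x.std()  # standardise so the powers are well conditioned
    B = np.column_stack([z ** d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)  # centre columns so each phi_j(x_j) has mean zero


def additive_pcs(X, degree=3):
    """Additive PCs with phi_j(x_j) = B_j(x_j) @ a_j for each variable j.

    var[sum_j phi_j]                = a^T C a   (C: covariance of all basis columns)
    sum_j var[phi_j]                = a^T D a   (D: block-diagonal part of C)
    sum_j cov[phi_j^(k), phi_j^(l)] = a_k^T D a_l
    Minimising a^T C a subject to a^T D a = 1 gives C a = mu D a;
    the smallest mu corresponds to the first (lowest-variance) additive PC.
    """
    n, p = X.shape
    B = np.hstack([centred_basis(X[:, j], degree) for j in range(p)])
    C = B.T @ B / n
    D = np.zeros_like(C)
    for j in range(p):
        s = slice(j * degree, (j + 1) * degree)
        D[s, s] = C[s, s]  # keep only within-variable covariance blocks
    # eigh returns ascending eigenvalues and eigenvectors with A.T @ D @ A = I,
    # which enforces both the unit-variance and the orthogonality constraints.
    mu, A = eigh(C, D)
    return mu, A, D


# Usage: x3 is (up to noise) an additive function of x1; x2 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 ** 2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

mu, A, D = additive_pcs(X)
print("smallest eigenvalue:", mu[0])  # near zero: x3 - x1^2 is almost constant
```

Because the eigenvectors come back normalized so that $A^\top D A = I$, the unit-variance constraint and the orthogonality condition between successive components hold automatically. The smallest eigenvalues flag near-constant additive combinations, which is how low-variance additive PCs can reveal implicit equations or concurvities.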
With an appropriate definition of ‘middle,’ the first PC gives the best straight line through the middle of the data, and principal curves generalize this using the idea of self-consistency, which was introduced at the end of Section 2.2. We saw there that, for p-variate random vectors $\mathbf{x}$, $\mathbf{y}$, the vector of random variables $\mathbf{y}$ is self-consistent for $\mathbf{x}$ if $E[\mathbf{x} \mid \mathbf{y}] = \mathbf{y}$. Consider a smooth curve in the p-dimensional space defined by $\mathbf{x}$. The curve can be written $\mathbf{f}(\lambda)$, where $\lambda$ defines the position along the curve, and the vector $\mathbf{f}(\lambda)$ contains the values of the elements of $\mathbf{x}$ for a given value of $\lambda$. A curve $\mathbf{f}(\lambda)$ is self-consistent, that is, a principal curve, if

$$E[\mathbf{x} \mid \mathbf{f}^{-1}(\mathbf{x}) = \lambda] = \mathbf{f}(\lambda),$$

where $\mathbf{f}^{-1}(\mathbf{x})$ is the value of $\lambda$ for which $\|\mathbf{x} - \mathbf{f}(\lambda)\|$ is minimized. What this means intuitively is that, for any given value of $\lambda$, say $\lambda_0$, the average of all values of $\mathbf{x}$ that have $\mathbf{f}(\lambda_0)$ as their closest point on the curve is precisely $\mathbf{f}(\lambda_0)$.
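The self-consistency condition above can be checked numerically for a given curve. The sketch below is a minimal illustration, assuming the curve is supplied as a Python function of $\lambda$ and that $\mathbf{f}^{-1}(\mathbf{x})$ is approximated by a grid search over finitely many $\lambda$ values; it groups points by their projection index and compares the mean of $\mathbf{x}$ with the mean of $\mathbf{f}(\lambda)$ in each group. It only tests a candidate curve and does not fit one. The names `project`, `self_consistency_gap` and `circle`, and the simulated ring of data, are hypothetical.

```python
import numpy as np


def project(X, f, grid):
    """Approximate f^{-1}(x): the lambda on `grid` minimising ||x - f(lambda)||."""
    curve = f(grid)  # (m, p) points along the curve
    d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)  # (n, m) squared distances
    return grid[d2.argmin(axis=1)]


def self_consistency_gap(X, f, grid, n_bins=12):
    """Discretised check of E[x | f^{-1}(x) = lambda] = f(lambda).

    Points are grouped by their projection index lambda; within each group
    the mean of x is compared with the mean of f(lambda). Returns the largest
    discrepancy, which should be near zero for a principal curve.
    """
    lam = project(X, f, grid)
    edges = np.linspace(grid[0], grid[-1], n_bins + 1)
    idx = np.clip(np.digitize(lam, edges) - 1, 0, n_bins - 1)
    gaps = []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip empty groups
        gaps.append(np.linalg.norm(X[mask].mean(axis=0) - f(lam[mask]).mean(axis=0)))
    return max(gaps)


def circle(radius):
    """Hypothetical test curve: f(lambda) = radius * (cos lambda, sin lambda)."""
    return lambda lam: radius * np.column_stack([np.cos(lam), np.sin(lam)])


# Usage: points scattered radially (mean radius 1) around the unit circle.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=3000)
r = 1 + 0.1 * rng.normal(size=3000)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
grid = np.linspace(0, 2 * np.pi, 360, endpoint=False)

print(self_consistency_gap(X, circle(1.0), grid))  # small: unit circle is self-consistent
print(self_consistency_gap(X, circle(0.7), grid))  # much larger: radius-0.7 circle is not
```

Because the noise here is purely radial with mean zero, the points closest to any $\mathbf{f}(\lambda_0)$ on the unit circle average back to $\mathbf{f}(\lambda_0)$, whereas for the smaller circle they average to a point roughly 0.3 further out.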