Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

14.1. Additive Principal Components and Principal Curves

It follows from the discussion in Section 2.2 that for multivariate normal and elliptical distributions the first principal component defines a principal curve, though there may also be other principal curves which are different from the first PC. Hence, non-linear principal curves may be thought of as a generalization of the first PC for other probability distributions. The discussion so far has been in terms of probability distributions, but a similar idea can be defined for samples. In this case a curve is fitted iteratively, alternating between 'projection' and 'conditional-expectation' steps. In a projection step, the closest point on the current curve is found for each observation in the sample, and the conditional-expectation step then calculates the average of the observations closest to each point on the curve. These averages form a new curve to be used in the next projection step. In a finite data set there will usually be at most one observation corresponding to a given point on the curve, so some sort of smoothing is required to form the averages. Hastie and Stuetzle (1989) provide details of some possible smoothing schemes, together with examples. They also discuss the possibility of extension from curves to higher-dimensional surfaces. (A schematic implementation of this iteration is sketched at the end of this section.)

Tarpey (1999) describes a 'lack-of-fit' test that can be used to decide whether or not a principal curve is simply the first PC. The test involves the idea of principal points which, for populations, are defined as follows. Suppose that x is a p-variate random vector and y is a discrete p-variate random vector, taking only the k values y1, y2, ..., yk. If y is such that E[‖x − y‖²] is minimized over all possible choices of the k values for y, then y1, y2, ..., yk are the k principal points for the distribution of x. There is a connection with self-consistency, as y is self-consistent for x in this case. Flury (1993) discusses several methods for finding principal points in a sample (see the second sketch at the end of this section).

There is another link between principal points and principal components, namely that if x has a multivariate normal or elliptical distribution, and the principal points y1, y2, ..., yk for the distribution lie in a q-dimensional (q < p) subspace, then that subspace is identical to the one spanned by the vectors of coefficients defining the first q PCs of x (Flury, 1995, Theorem 2.3). Tarpey (2000) introduces the idea of parallel principal axes, which are parallel hyperplanes orthogonal to the axis defined by the first PC, intersecting that axis at the principal points of the marginal distribution of x along the axis. He shows that self-consistency of parallel principal axes characterizes multivariate normal distributions.

14.1.3 Non-Linearity Using Neural Networks

A considerable amount of work has been done on PCA in the context of neural networks. Indeed, there is a book on the subject (Diamantaras and Kung, 1996) which gives a good overview. Here we describe only those developments that provide non-linear extensions of PCA. Computational matters are discussed in Appendix A1, and other aspects of the PCA/neural networks interface are discussed later in the chapter.
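To make the projection/conditional-expectation iteration described earlier in this section concrete, the following is a minimal sketch, not Hastie and Stuetzle's (1989) actual algorithm: the curve is represented by a fixed set of nodes initialized along the first PC, the projection step assigns each observation to its nearest node, and a simple Gaussian kernel over node indices stands in for their smoothing schemes. The function name fit_principal_curve and the node-count and bandwidth defaults are illustrative choices, not from the text.

```python
import numpy as np

def fit_principal_curve(X, n_nodes=50, n_iter=20, bandwidth=0.1):
    """Alternate projection and smoothed conditional-expectation steps.

    The curve is a polyline of n_nodes points in the same space as X,
    initialised along the first principal component.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                                  # first-PC scores
    t = np.linspace(scores.min(), scores.max(), n_nodes)
    curve = X.mean(axis=0) + np.outer(t, Vt[0])          # initial curve

    for _ in range(n_iter):
        # Projection step: index of the closest curve node for each x.
        d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        lam = d2.argmin(axis=1)

        # Conditional-expectation step with kernel smoothing: each node
        # becomes a weighted average of the observations whose projections
        # fall near it (raw averages would rest on at most one point each).
        grid = np.arange(n_nodes)
        w = np.exp(-0.5 * ((grid[None, :] - lam[:, None])
                           / (bandwidth * n_nodes)) ** 2)
        w /= w.sum(axis=0)                               # columns sum to 1
        curve = w.T @ X                                  # updated nodes
    return curve

# Noisy arc: the first PC is a straight line, but the fitted
# curve can follow the bend.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))
curve = fit_principal_curve(X)
```

On data of this kind the first PC alone remains a straight line through the cloud, whereas the fitted nodes bend with the arc, which is the sense in which principal curves generalize the first PC.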

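For a sample, the defining property of principal points, minimizing the average of the squared distance from each observation to the nearest of the k candidate points, is exactly the k-means clustering criterion, so a Lloyd-type iteration gives one simple way to approximate sample principal points. It is only one possible approach (Flury, 1993, considers several methods), and the names below are illustrative. Note how the update step enforces the self-consistency condition mentioned above: each point becomes the mean of the observations closest to it.

```python
import numpy as np

def sample_principal_points(X, k, n_iter=100, seed=0):
    """Approximate the k principal points of a sample: the points
    y_1, ..., y_k minimising the average of min_j ||x - y_j||^2.
    """
    rng = np.random.default_rng(seed)
    y = X[rng.choice(len(X), size=k, replace=False)]     # initial guesses
    for _ in range(n_iter):
        # Assign every observation to its nearest candidate point.
        d2 = ((X[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Self-consistency update: each point becomes the mean of the
        # observations for which it is the closest point.
        new_y = np.array([X[nearest == j].mean(axis=0)
                          if np.any(nearest == j) else y[j]
                          for j in range(k)])
        if np.allclose(new_y, y):                        # converged
            break
        y = new_y
    return y

# For a bivariate normal distribution the two principal points lie on
# the first-PC axis (cf. Flury, 1995, Theorem 2.3 with q = 1); the
# sample estimates below should do so approximately.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=1000)
pts = sample_principal_points(X, k=2)
```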