Jolliffe, I. (2002). Principal Component Analysis, 2nd edition. New York: Springer.
in some way to make the optimization problem tractable. One choice is to use step functions, which leads back towards Gifi's (1990) system of non-linear PCA. Besse and Ferraty (1995) favour an approach based on splines. They contrast their proposal, in which flexibility of the functional transformation is controlled by the choice of smoothing parameters, with earlier spline-based procedures controlled by the number and positioning of knots (see, for example, van Rijckevorsel (1988) and Winsberg (1988)). Using splines as Besse and Ferraty do is equivalent to adding a roughness penalty function to the quantity to be minimized. This is similar to Besse et al.'s (1997) approach to analysing functional data described in Section 12.3.4 using equation (12.3.6).

As with Gifi's (1990) non-linear PCA, Besse and Ferraty's (1995) proposal is implemented by means of an alternating least squares algorithm and, as in Besse and de Falguerolles (1993) for the linear case (see Section 6.1.5), bootstrapping of residuals from a q-dimensional model is used to decide on the best fit. Here, instead of simply using the bootstrap to choose q, simultaneous optimization with respect to q and with respect to the smoothing parameters which determine the function f(x) is needed. At this stage it might be asked 'where is the PCA in all this?' The name 'PCA' is still appropriate because the q-dimensional subspace is determined by an optimal set of q linear functions of the vector of transformed random variables f(x), and it is these linear functions that are the non-linear PCs.

14.1.2 Additive Principal Components and Principal Curves

Fowlkes and Kettenring (1985) note that one possible objective for transforming data before performing a PCA is to find near-singularities in the transformed data.
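As a small illustration of Fowlkes and Kettenring's idea, the sketch below (a hand-rolled NumPy example, not code from the paper) transforms two nonlinearly related variables and reads a near-constant relationship off the last PC of the correlation matrix of the transformed data. For brevity the 'oracle' transformation is plugged in by hand rather than found by minimizing the determinant of the correlation matrix, as the paper proposes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x2 is a cubic function of x1 plus noise, so no linear
# combination of (x1, x2) themselves has variance very close to zero
n = 400
x1 = rng.uniform(-1.0, 1.0, n)
x2 = x1**3 + 0.05 * rng.normal(size=n)

# Transformed variables f_j(x_j); in practice the f_j would be chosen
# to minimize the determinant of their correlation matrix -- here the
# 'oracle' transform f_1(x1) = x1**3 is supplied for illustration
F = np.column_stack([x1**3, x2])
F = (F - F.mean(axis=0)) / F.std(axis=0)

# Eigendecomposition of the correlation matrix of the transformed data;
# np.linalg.eigh returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(np.corrcoef(F.T))
print(evals[0])     # smallest eigenvalue: a near-singularity
print(evecs[:, 0])  # its PC is roughly f_1(x1) - f_2(x2), up to sign
```

The last PC has coefficients of nearly equal size and opposite sign, exposing the near-constant relationship f_1(x1) − f_2(x2) ≈ 0 that linear PCA of the raw variables would miss.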
In other words, x' = (x_1, x_2, ..., x_p) is transformed to f'(x) = (f_1(x_1), f_2(x_2), ..., f_p(x_p)), and we are interested in finding linear functions a'f(x) of f(x) for which var[a'f(x)] ≈ 0. Fowlkes and Kettenring (1985) suggest looking for a transformation that minimizes the determinant of the correlation matrix of the transformed variables. The last few PCs derived from this correlation matrix should then identify the required near-constant relationships, if any exist.

A similar idea underlies additive principal components, which are discussed in detail by Donnell et al. (1994). The additive principal components take the form ∑_{j=1}^p φ_j(x_j) instead of ∑_{j=1}^p a_j x_j in standard PCA, and, as with Fowlkes and Kettenring (1985), interest centres on components for which var[∑_{j=1}^p φ_j(x_j)] is small. To define a non-linear analogue of PCA there is a choice of either an algebraic definition that minimizes variance, or a geometric definition that optimizes expected squared distance from the additive manifold ∑_{j=1}^p φ_j(x_j) = const. Once we move away from linear PCA, the two definitions lead to different solutions, and Donnell et al. (1994) choose to minimize variance. The optimization problem to be solved is then to successively find p-variate vectors φ^(k), k = 1, 2, ..., whose elements are φ_j^(k)(x_j), which minimize

    var[∑_{j=1}^p φ_j^(k)(x_j)]

subject to ∑_{j=1}^p var[φ_j^(k)(x_j)] = 1 and, for k > 1 and l < k,

    ∑_{j=1}^p cov[φ_j^(k)(x_j), φ_j^(l)(x_j)] = 0.

As with linear PCA, this reduces to an eigenvalue problem. The main choice to be made is the set of functions φ(·) over which optimization is to take place. In an example Donnell et al. (1994) use splines, but their theoretical results are quite general and they discuss other, more sophisticated, smoothers. They identify two main uses for low-variance additive principal components, namely to fit additive implicit equations to data and to identify the presence of 'concurvities,' which play the same rôle and cause the same problems in additive regression as do collinearities in linear regression.

Principal curves are included in the same section as additive principal components despite the insistence by Donnell and coworkers, in a response to discussion of their paper by Flury, that they are very different. One difference is that although the range of functions allowed in additive principal components is wide, an equation is found relating the variables via the functions φ_j(x_j), whereas a principal curve is just that, a smooth curve, with no necessity for a parametric equation. A second difference is that additive principal components concentrate on low-variance relationships, while principal curves minimize variation orthogonal to the curve.

There is nevertheless a similarity between the two techniques, in that both replace an optimum line or plane produced by linear PCA by an optimal non-linear curve or surface. In the case of principal curves, a smooth one-dimensional curve is sought that passes through the 'middle' of the data set.
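A minimal sketch of the eigenvalue problem arising here, under the assumption that each φ_j is restricted to a small polynomial basis (Donnell et al. use splines; the polynomial basis is chosen purely for brevity). The covariance matrix S of all basis functions gives the objective, and its block-diagonal part D encodes the constraint ∑_j var[φ_j(x_j)] = 1, so the minimum-variance additive component is the smallest generalized eigenvector of (S, D):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy data with an exact additive relation x1**2 - x2 = 0 plus noise:
# phi_1(x1) = x1**2, phi_2(x2) = -x2 is a near-zero-variance additive
# component (a 'concurvity')
n = 500
x1 = rng.uniform(-1.0, 1.0, n)
x2 = x1**2 + 0.01 * rng.normal(size=n)

def poly_basis(x, degree=3):
    """Centred polynomial basis for one variable (columns x, x^2, x^3)."""
    B = np.column_stack([x**d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)

bases = [poly_basis(x1), poly_basis(x2)]
m = bases[0].shape[1]
B = np.hstack(bases)          # n x (p*m) matrix of all basis columns
S = B.T @ B / n               # covariance of the stacked basis functions

# Block-diagonal matrix encoding the constraint sum_j var[phi_j] = 1
D = np.zeros_like(S)
for j in range(len(bases)):
    sl = slice(j * m, (j + 1) * m)
    D[sl, sl] = S[sl, sl]

# Smallest generalized eigenvalue of S c = lambda D c; eigh returns
# eigenvalues in ascending order, normalized so that c' D c = 1
evals, evecs = eigh(S, D)
c = evecs[:, 0]
component = B @ c             # values of sum_j phi_j(x_j) at the data
print(evals[0])               # near zero: the concurvity is recovered
```

The eigenvalue itself equals the variance of the fitted additive component, so a value near zero signals an additive implicit equation satisfied almost exactly by the data.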
With an appropriate definition of 'middle,' the first PC gives the best straight line through the middle of the data, and principal curves generalize this using the idea of self-consistency, which was introduced at the end of Section 2.2. We saw there that, for p-variate random vectors x, y, the vector of random variables y is self-consistent for x if E[x|y] = y. Consider a smooth curve in the p-dimensional space defined by x. The curve can be written f(λ), where λ defines the position along the curve, and the vector f(λ) contains the values of the elements of x for a given value of λ. A curve f(λ) is self-consistent, that is, a principal curve, if E[x | f^{-1}(x) = λ] = f(λ), where f^{-1}(x) is the value of λ for which ‖x − f(λ)‖ is minimized. What this means intuitively is that, for any given value of λ, say λ_0, the average of all values of x that have f(λ_0) as their closest point on the curve is precisely f(λ_0).
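The self-consistency condition suggests the alternating algorithm of Hastie and Stuetzle (1989): with the curve fixed, project each point to get its λ; with the λ values fixed, estimate f(λ) by averaging (smoothing) the x values of points with nearby λ. The sketch below is a deliberately crude version of that iteration, with a moving-average smoother and a discrete nearest-point projection standing in for the scatterplot smoothers of the original algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy arc: points scattered about a smooth one-dimensional curve
# that no straight line (the first PC) can follow
n = 300
t = rng.uniform(0.0, np.pi, n)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(n, 2))

def moving_average(Y, window):
    """Average each row of Y with its neighbours (Y already ordered)."""
    out = np.empty_like(Y)
    for i in range(len(Y)):
        lo, hi = max(0, i - window), min(len(Y), i + window + 1)
        out[i] = Y[lo:hi].mean(axis=0)
    return out

# Initialise lambda with scores on the first linear PC: the best
# straight line through the 'middle' of the data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
lam = Xc @ Vt[0]

for _ in range(8):
    order = np.argsort(lam)
    # conditional-expectation step: f(lam) is the local average of the
    # x values of points whose lam is nearby
    curve = moving_average(X[order], window=15)
    # projection step: lam becomes the arc-length position of the
    # nearest point on the current curve
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    lam = arc[np.argmin(d2, axis=1)]

# The fitted curve should hug the unit circle from which the data were
# generated: its points all have radius close to 1
radii = np.linalg.norm(curve, axis=1)
print(round(radii.mean(), 3), round(radii.std(), 3))
```

At convergence the curve approximately satisfies the self-consistency condition: each curve point is the average of the data points that project onto it.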