Jolliffe I.T., Principal Component Analysis (2nd ed., Springer, 2002)


…in some way to make the optimization problem tractable. One choice is to use step functions, which leads back towards Gifi's (1990) system of non-linear PCA. Besse and Ferraty (1995) favour an approach based on splines. They contrast their proposal, in which flexibility of the functional transformation is controlled by the choice of smoothing parameters, with earlier spline-based procedures controlled by the number and positioning of knots (see, for example, van Rijckevorsel (1988) and Winsberg (1988)). Using splines as Besse and Ferraty do is equivalent to adding a roughness penalty function to the quantity to be minimized. This is similar to Besse et al.'s (1997) approach to analysing functional data described in Section 12.3.4 using equation (12.3.6).

As with Gifi's (1990) non-linear PCA, Besse and Ferraty's (1995) proposal is implemented by means of an alternating least squares algorithm and, as in Besse and de Falguerolles (1993) for the linear case (see Section 6.1.5), bootstrapping of residuals from a q-dimensional model is used to decide on the best fit. Here, instead of simply using the bootstrap to choose q, simultaneous optimization with respect to q and with respect to the smoothing parameters which determine the function f(x) is needed. At this stage it might be asked 'where is the PCA in all this?' The name 'PCA' is still appropriate because the q-dimensional subspace is determined by an optimal set of q linear functions of the vector of transformed random variables f(x), and it is these linear functions that are the non-linear PCs.

14.1.2 Additive Principal Components and Principal Curves

Fowlkes and Kettenring (1985) note that one possible objective for transforming data before performing a PCA is to find near-singularities in the transformed data. In other words, $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ is transformed to $\mathbf{f}'(\mathbf{x}) = (f_1(x_1), f_2(x_2), \ldots, f_p(x_p))$, and we are interested in finding linear functions $\mathbf{a}'\mathbf{f}(\mathbf{x})$ of $\mathbf{f}(\mathbf{x})$ for which $\operatorname{var}[\mathbf{a}'\mathbf{f}(\mathbf{x})] \approx 0$. Fowlkes and Kettenring (1985) suggest looking for a transformation that minimizes the determinant of the correlation matrix of the transformed variables. The last few PCs derived from this correlation matrix should then identify the required near-constant relationships, if any exist.

A similar idea underlies additive principal components, which are discussed in detail by Donnell et al. (1994). The additive principal components take the form $\sum_{j=1}^{p} \phi_j(x_j)$ instead of $\sum_{j=1}^{p} a_j x_j$ in standard PCA and, as with Fowlkes and Kettenring (1985), interest centres on components for which $\operatorname{var}[\sum_{j=1}^{p} \phi_j(x_j)]$ is small. To define a non-linear analogue of PCA there is a choice of either an algebraic definition that minimizes variance, or a geometric definition that optimizes expected squared distance from the additive manifold $\sum_{j=1}^{p} \phi_j(x_j) = \text{const}$. Once we move away from linear PCA, the two definitions lead to different solutions, and Donnell et al. (1994) choose to minimize variance.
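To make the low-variance idea concrete, here is a minimal numerical sketch (not from the book; the simulated data, the hand-picked transformations and all variable names are invented for illustration). After each variable is transformed separately, the smallest eigenvalue of the correlation matrix of the transformed variables is close to zero, and its eigenvector identifies a near-constant combination $\mathbf{a}'\mathbf{f}(\mathbf{x})$ of the kind Fowlkes and Kettenring (1985) aim to uncover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with a hidden non-linear near-singularity:
# exp(x1) - x2 is nearly constant, but no linear combination of x1, x2, x3 is.
n = 500
x1 = rng.normal(size=n)
x2 = np.exp(x1) + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Hypothetical per-variable transformations f_j, fixed by hand for illustration
# (in practice they would be estimated, e.g. by splines or step functions).
transforms = [np.exp, lambda v: v, lambda v: v]
Z = np.column_stack([f(X[:, j]) for j, f in enumerate(transforms)])

# Standardize, form the correlation matrix, and inspect its smallest eigenvalues.
Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
R = np.corrcoef(Zs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues returned in ascending order

print("eigenvalues of R:", np.round(eigvals, 4))

# An eigenvalue near zero flags a near-constant linear combination of the
# transformed variables, i.e. the near-singularity being sought.
a = eigvecs[:, 0]
low_var_component = Zs @ a
print("variance of flagged component:", low_var_component.var())
```

In practice, of course, the transformations would themselves be chosen by the optimization rather than fixed in advance, which is exactly what the additive principal component problem below formalizes.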

The optimization problem to be solved is then to successively find $p$-variate vectors $\boldsymbol{\phi}^{(k)}$, $k = 1, 2, \ldots$, whose elements are $\phi_j^{(k)}(x_j)$, which minimize
$$\operatorname{var}\Bigl[\sum_{j=1}^{p} \phi_j^{(k)}(x_j)\Bigr]$$
subject to $\sum_{j=1}^{p} \operatorname{var}[\phi_j^{(k)}(x_j)] = 1$ and, for $k > 1$, $l < k$,
$$\sum_{j=1}^{p} \operatorname{cov}\bigl[\phi_j^{(k)}(x_j), \phi_j^{(l)}(x_j)\bigr] = 0.$$
As with linear PCA, this reduces to an eigenvalue problem. The main choice to be made is the set of functions $\phi(\cdot)$ over which optimization is to take place. In an example Donnell et al. (1994) use splines, but their theoretical results are quite general and they discuss other, more sophisticated, smoothers. They identify two main uses for low-variance additive principal components, namely to fit additive implicit equations to data and to identify the presence of 'concurvities,' which play the same rôle and cause the same problems in additive regression as do collinearities in linear regression.

Principal curves are included in the same section as additive principal components despite the insistence by Donnell and coworkers, in a response to discussion of their paper by Flury, that they are very different. One difference is that, although the range of functions allowed in additive principal components is wide, an equation is found relating the variables via the functions $\phi_j(x_j)$, whereas a principal curve is just that, a smooth curve with no necessity for a parametric equation. A second difference is that additive principal components concentrate on low-variance relationships, while principal curves minimize variation orthogonal to the curve.

There is nevertheless a similarity between the two techniques, in that both replace an optimum line or plane produced by linear PCA by an optimal non-linear curve or surface. In the case of principal curves, a smooth one-dimensional curve is sought that passes through the 'middle' of the data set. With an appropriate definition of 'middle,' the first PC gives the best straight line through the middle of the data, and principal curves generalize this using the idea of self-consistency, which was introduced at the end of Section 2.2. We saw there that, for $p$-variate random vectors $\mathbf{x}$, $\mathbf{y}$, the vector of random variables $\mathbf{y}$ is self-consistent for $\mathbf{x}$ if $E[\mathbf{x} \mid \mathbf{y}] = \mathbf{y}$. Consider a smooth curve in the $p$-dimensional space defined by $\mathbf{x}$. The curve can be written $\mathbf{f}(\lambda)$, where $\lambda$ defines the position along the curve, and the vector $\mathbf{f}(\lambda)$ contains the values of the elements of $\mathbf{x}$ for a given value of $\lambda$. A curve $\mathbf{f}(\lambda)$ is self-consistent, that is, a principal curve, if $E[\mathbf{x} \mid \mathbf{f}^{-1}(\mathbf{x}) = \lambda] = \mathbf{f}(\lambda)$, where $\mathbf{f}^{-1}(\mathbf{x})$ is the value of $\lambda$ for which $\|\mathbf{x} - \mathbf{f}(\lambda)\|$ is minimized. What this means intuitively is that, for any given value of $\lambda$, say $\lambda_0$, the average of all values of $\mathbf{x}$ that have $\mathbf{f}(\lambda_0)$ as their closest point on the curve is precisely $\mathbf{f}(\lambda_0)$.
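The self-consistency condition suggests a simple fitting strategy, sketched below (an illustration written for this discussion, not the original principal-curve estimation algorithm of Hastie and Stuetzle, and with a crude running-mean smoother standing in for a proper scatterplot smoother; the function and variable names are invented). Starting from the first linear PC, it alternates two steps: smooth each coordinate of $\mathbf{x}$ against the current $\lambda$ values to approximate $E[\mathbf{x} \mid \lambda]$, then re-project each point onto the resulting discretized curve to update its $\lambda$.

```python
import numpy as np

def principal_curve(X, n_iter=10, span=0.1):
    """Rough sketch of a principal-curve fit by alternating smoothing and
    projection. X is an (n, p) data matrix; the curve is represented by the
    fitted values at the data points, ordered by lambda."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # Initialize lambda with the first linear PC scores (the best straight line).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = Xc @ Vt[0]

    f = np.empty_like(Xc, dtype=float)
    window = max(int(span * n), 5)
    for _ in range(n_iter):
        order = np.argsort(lam)
        # Conditional-expectation step: smooth each coordinate against lambda
        # with a simple running mean, approximating E[x | lambda].
        for j in range(p):
            smoothed = np.convolve(Xc[order, j], np.ones(window) / window, mode="same")
            f[order, j] = smoothed
        # Projection step: lambda for each point is the arc-length position of
        # its nearest point on the current (discretized) curve.
        curve = f[order]
        seg = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(curve, axis=0), axis=1))]
        nearest = np.argmin(
            ((Xc[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2), axis=1
        )
        lam = seg[nearest]
    return f + X.mean(axis=0), lam

# Usage: noisy points around a half-circle, a shape no straight PC line can follow.
rng = np.random.default_rng(1)
t = rng.uniform(0, np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(300, 2))
curve_points, lam = principal_curve(X)
```

Each pass of the loop mirrors the definition above: the smoothing step averages all observations whose closest point on the curve is near a given $\lambda_0$, and the projection step recomputes which point of the curve is closest to each observation.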
