Jolliffe I. Principal Component Analysis (2nd ed., Springer, 2002)
14.1. Additive Principal Components and Principal Curves

Since Kramer’s paper appeared, a number of authors in the neural network literature have noted limitations to his procedure and have suggested alternatives or modifications (see, for example, Jia et al., 2000), although it is now used in a range of disciplines including climatology (Monahan, 2001). Dong and McAvoy (1996) propose an algorithm that combines the principal curves of Section 14.1.2 (Hastie and Stuetzle, 1989) with the autoassociative neural network set-up of Kramer (1991). Principal curves alone do not allow the calculation of ‘scores’ with respect to the curves for new observations, but their combination with a neural network enables such quantities to be computed.

An alternative approach, based on a so-called input-training net, is suggested by Tan and Mavrovouniotis (1995). In such networks, the inputs are not fixed, but are trained along with the other parameters of the network. With a single input the results of the algorithm are equivalent to principal curves, but with a larger number of inputs there is increased flexibility to go beyond the additive model underlying principal curves.

Jia et al. (2000) use Tan and Mavrovouniotis’s (1995) input-training net, but have an ordinary linear PCA as a preliminary step. The non-linear algorithm is then conducted on the first m linear PCs, where m is chosen to be sufficiently large, ensuring that only PCs with very small variances are excluded. Jia and coworkers suggest that around 97% of the total variance should be retained to avoid discarding dimensions that might include important non-linear variation. The non-linear components are used in process control (see Section 13.7), and in an example they give improved fault detection compared to linear PCs (Jia et al., 2000).
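The preliminary step used by Jia et al. (2000), retaining the smallest number m of linear PCs that accounts for a given proportion (around 97%) of the total variance, can be sketched as follows. This is a minimal NumPy illustration; the data matrix and the exact threshold are placeholders, not taken from their example.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.97):
    """Smallest m such that the first m PCs of X retain
    at least `threshold` of the total variance."""
    Xc = X - X.mean(axis=0)            # column-centre the data
    # Singular values of the centred matrix; their squares are
    # proportional to the variances of the PCs
    s = np.linalg.svd(Xc, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)   # cumulative proportion of variance
    return int(np.searchsorted(cum, threshold) + 1)

# Illustrative data: three strong directions plus small noise in 10 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X += 0.01 * rng.normal(size=(200, 10))
m = n_components_for_variance(X, 0.97)
```

With data of this kind m recovers the number of dominant linear directions, and the subsequent non-linear analysis would then be run on the first m PC scores only.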
The preliminary step reduces the dimensionality of the data from 37 variables to 12 linear PCs, whilst retaining 98% of the variation.

Kambhatla and Leen (1997) introduce non-linearity in a different way, using a piecewise-linear or ‘local’ approach. The p-dimensional space defined by the possible values of x is partitioned into Q regions, and linear PCs are then found separately for each region. Kambhatla and Leen (1997) note that this local PCA provides a faster algorithm than a global non-linear neural network. A clustering algorithm is used to define the Q regions. Roweis and Saul (2000) describe a locally linear embedding algorithm that also generates local linear reconstructions of observations, this time based on a set of ‘neighbours’ of each observation. Tarpey (2000) implements a similar but more restricted idea. He looks separately at the first PCs within two regions defined by the sign of the first PC for the whole data set, as a means of determining the presence of non-linear structure in the data.

14.1.4 Other Aspects of Non-Linearity

We saw in Section 5.3 that biplots can provide an informative way of displaying the results of a PCA. Modifications of these ‘classical’ biplots to become non-linear are discussed in detail by Gower and Hand (1996, Chapter 6), and a shorter description is given by Krzanowski and Marriott (1994, Chapter 8). The link between non-linear biplots and PCA is somewhat tenuous, so we introduce them only briefly. Classical biplots are based on the singular value decomposition of the data matrix X, and provide a best possible rank 2 approximation to X in a least squares sense (Section 3.5). The distances between observations in the 2-dimensional space of the biplot with α = 1 (see Section 5.3) give optimal approximations to the corresponding Euclidean distances in p-dimensional space (Krzanowski and Marriott, 1994). Non-linear biplots replace Euclidean distance by other distance functions. In plots thus produced the straight lines or arrows representing variables in the classical biplot are replaced by curved trajectories. Different trajectories are used to interpolate positions of observations on the plots and to predict values of the variables given the plotting position of an observation. Gower and Hand (1996) give examples of interpolation biplot trajectories but state that they ‘do not yet have an example of prediction nonlinear biplots.’

Tenenbaum et al. (2000) describe an algorithm in which, as with non-linear biplots, distances between observations other than Euclidean distance are used in a PCA-related procedure. Here so-called geodesic distances are approximated by finding the shortest paths in a graph connecting the observations to be analysed. These distances are then used as input to what seems to be principal coordinate analysis, a technique which is related to PCA (see Section 5.2).

14.2 Weights, Metrics, Transformations and Centerings

Various authors have suggested ‘generalizations’ of PCA. We have met examples of this in the direction of non-linearity in the previous section. A number of generalizations introduce weights or metrics on either observations or variables or both.
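The local PCA of Kambhatla and Leen (1997) can be sketched as follows: a clustering algorithm partitions the observations into Q regions, and the first linear PCs are then fitted separately within each region. This is a minimal NumPy illustration; the plain k-means routine and the choices of Q and of one component per region are illustrative assumptions, not their implementation.

```python
import numpy as np

def kmeans(X, Q, iters=50, seed=0):
    """Plain k-means: partition the observations into Q regions."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), Q, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        for q in range(Q):
            if np.any(labels == q):
                centres[q] = X[labels == q].mean(axis=0)
    return labels

def local_pca(X, Q=2, k=1):
    """First k linear PCs fitted separately within each of Q regions."""
    labels = kmeans(X, Q)
    components = {}
    for q in range(Q):
        Xq = X[labels == q]
        if len(Xq) == 0:
            continue                      # guard against an empty region
        Xq = Xq - Xq.mean(axis=0)         # centre within the region
        _, _, Vt = np.linalg.svd(Xq, full_matrices=False)
        components[q] = Vt[:k]            # rows are the region's PC loadings
    return labels, components

# Two well-separated clouds in two dimensions
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
labels, comps = local_pca(X, Q=2, k=1)
```

Each region thus gets its own local linear summary, which is what makes the approach piecewise-linear rather than globally non-linear.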
The related topics of weights and metrics make up two of the three parts of the present section; the third is concerned with different ways of transforming or centering the data.

14.2.1 Weights

We start with a definition of generalized PCA which was given by Greenacre (1984, Appendix A). It can be viewed as introducing either weights or metrics into the definition of PCA. Recall the singular value decomposition (SVD) of the (n × p) data matrix X defined in equation (3.5.1), namely

X = ULA′.    (14.2.1)
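The decomposition (14.2.1) is easy to verify numerically. The sketch below computes the SVD of a column-centred data matrix with NumPy; here U and A have orthonormal columns, L is the diagonal matrix of singular values, and the data are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)                 # column-centre, as in PCA

# SVD: X = U L A', with U (n x p), L (p x p) diagonal, A (p x p)
U, l, At = np.linalg.svd(X, full_matrices=False)
L = np.diag(l)
A = At.T

# Columns of A are the PC loadings; XA = UL gives the PC scores
scores = X @ A
```

Generalized PCA in Greenacre's sense then amounts to carrying out this decomposition after weighting the rows and/or columns of X, which is why the SVD is the natural starting point.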