Jolliffe, I.T., Principal Component Analysis (2nd edition, Springer, 2002), pp. 381–382.

Since Kramer’s paper appeared, a number of authors in the neural network literature have noted limitations to his procedure and have suggested alternatives or modifications (see, for example, Jia et al., 2000), although it is now used in a range of disciplines including climatology (Monahan, 2001). Dong and McAvoy (1996) propose an algorithm that combines the principal curves of Section 14.1.2 (Hastie and Stuetzle, 1989) with the autoassociative neural network set-up of Kramer (1991). Principal curves alone do not allow the calculation of ‘scores’ with respect to the curves for new observations, but their combination with a neural network enables such quantities to be computed.

An alternative approach, based on a so-called input-training net, is suggested by Tan and Mavrovouniotis (1995). In such networks, the inputs are not fixed, but are trained along with the other parameters of the network. With a single input the results of the algorithm are equivalent to principal curves, but with a larger number of inputs there is increased flexibility to go beyond the additive model underlying principal curves.

Jia et al. (2000) use Tan and Mavrovouniotis’s (1995) input-training net, but have an ordinary linear PCA as a preliminary step. The non-linear algorithm is then conducted on the first m linear PCs, where m is chosen to be sufficiently large, ensuring that only PCs with very small variances are excluded. Jia and coworkers suggest that around 97% of the total variance should be retained to avoid discarding dimensions that might include important non-linear variation (a short sketch of this variance-based selection rule is given at the end of this subsection). The non-linear components are used in process control (see Section 13.7), and in an example they give improved fault detection compared to linear PCs (Jia et al., 2000). The preliminary step reduces the dimensionality of the data from 37 variables to 12 linear PCs, whilst retaining 98% of the variation.

Kambhatla and Leen (1997) introduce non-linearity in a different way, using a piecewise-linear or ‘local’ approach. The p-dimensional space defined by the possible values of x is partitioned into Q regions, and linear PCs are then found separately for each region. Kambhatla and Leen (1997) note that this local PCA provides a faster algorithm than a global non-linear neural network. A clustering algorithm is used to define the Q regions; a second sketch at the end of this subsection illustrates the idea. Roweis and Saul (2000) describe a locally linear embedding algorithm that also generates local linear reconstructions of observations, this time based on a set of ‘neighbours’ of each observation. Tarpey (2000) implements a similar but more restricted idea. He looks separately at the first PCs within two regions defined by the sign of the first PC for the whole data set, as a means of determining the presence of non-linear structure in the data.
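To make the variance-based choice of m concrete, here is a minimal Python sketch (not from Jolliffe’s text): it retains the smallest number of linear PCs whose cumulative variance reaches a threshold, taking the 97% figure of Jia et al. (2000) as the target. The simulated data and the use of scikit-learn are illustrative assumptions.

```python
# Minimal sketch: choose m as the smallest number of linear PCs whose
# cumulative variance reaches a threshold (97%, following Jia et al., 2000),
# then keep only those PC scores for a subsequent non-linear step.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))                   # 5 underlying factors
W = rng.normal(size=(5, 37))
X = latent @ W + 0.1 * rng.normal(size=(200, 37))    # 37 observed variables

pca = PCA().fit(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)
m = int(np.searchsorted(cum_var, 0.97)) + 1          # smallest m with >= 97%

scores = pca.transform(X)[:, :m]                     # first m linear PC scores
print(f"retained m = {m} PCs ({cum_var[m - 1]:.1%} of total variance)")
```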

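Similarly, the local PCA of Kambhatla and Leen (1997) can be sketched in a few lines. This is an illustrative outline only: the choice of k-means as the clustering algorithm, Q = 3 regions and two PCs per region are assumptions for the example, not prescriptions from their paper. In use, a new observation would typically be assigned to its nearest region and projected onto that region’s PCs, which is what makes the approach piecewise-linear.

```python
# Illustrative piecewise-linear ('local') PCA: a clustering algorithm
# partitions the data into Q regions, then linear PCs are found per region.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                # toy data with p = 5 variables

Q = 3                                        # number of regions (assumed)
labels = KMeans(n_clusters=Q, n_init=10, random_state=1).fit_predict(X)

# One linear PCA per region, keeping two components in each
local_pcas = {q: PCA(n_components=2).fit(X[labels == q]) for q in range(Q)}
for q, fit in local_pcas.items():
    print(f"region {q}: variance ratios {fit.explained_variance_ratio_}")
```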
14.1.4 Other Aspects of Non-Linearity

We saw in Section 5.3 that biplots can provide an informative way of displaying the results of a PCA. Modifications of these ‘classical’ biplots to become non-linear are discussed in detail by Gower and Hand (1996, Chapter 6), and a shorter description is given by Krzanowski and Marriott (1994, Chapter 8). The link between non-linear biplots and PCA is somewhat tenuous, so we introduce them only briefly. Classical biplots are based on the singular value decomposition of the data matrix X, and provide a best possible rank 2 approximation to X in a least squares sense (Section 3.5). The distances between observations in the 2-dimensional space of the biplot with α = 1 (see Section 5.3) give optimal approximations to the corresponding Euclidean distances in p-dimensional space (Krzanowski and Marriott, 1994). Non-linear biplots replace Euclidean distance by other distance functions. In plots thus produced, the straight lines or arrows representing variables in the classical biplot are replaced by curved trajectories. Different trajectories are used to interpolate positions of observations on the plots and to predict values of the variables given the plotting position of an observation. Gower and Hand (1996) give examples of interpolation biplot trajectories but state that they ‘do not yet have an example of prediction nonlinear biplots.’

Tenenbaum et al. (2000) describe an algorithm in which, as with non-linear biplots, distances between observations other than Euclidean distance are used in a PCA-related procedure. Here so-called geodesic distances are approximated by finding the shortest paths in a graph connecting the observations to be analysed. These distances are then used as input to what seems to be principal coordinate analysis, a technique which is related to PCA (see Section 5.2).
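A minimal sketch of this idea follows; it is illustrative, not Tenenbaum et al.’s implementation. Geodesic distances are approximated by shortest paths through a k-nearest-neighbour graph, and the resulting distance matrix is subjected to classical scaling (principal coordinate analysis). The neighbourhood size, the simulated data and the particular scipy and scikit-learn routines are assumptions.

```python
# Illustrative sketch of geodesic-distance 'PCA': shortest paths through a
# neighbourhood graph approximate geodesic distances, which are then given
# to classical scaling (principal coordinate analysis).
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                # toy data

G = kneighbors_graph(X, n_neighbors=7, mode='distance')   # k = 7 assumed
D = shortest_path(G, method='D', directed=False)          # geodesic estimates
# (assumes the neighbourhood graph is connected, so all distances are finite)

# Classical scaling: double-centre the squared distances, then eigendecompose
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigval, eigvec = np.linalg.eigh(B)           # eigenvalues in ascending order
coords = eigvec[:, -2:] * np.sqrt(np.maximum(eigval[-2:], 0.0))  # 2-D embedding
```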

14.2 Weights, Metrics, Transformations and Centerings

Various authors have suggested ‘generalizations’ of PCA. We have met examples of this in the direction of non-linearity in the previous section. A number of generalizations introduce weights or metrics on either observations or variables or both. The related topics of weights and metrics make up two of the three parts of the present section; the third is concerned with different ways of transforming or centering the data.

14.2.1 Weights

We start with a definition of generalized PCA which was given by Greenacre (1984, Appendix A). It can be viewed as introducing either weights or metrics into the definition of PCA. Recall the singular value decomposition (SVD) of the (n × p) data matrix X defined in equation (3.5.1), namely

    X = ULA′.                                (14.2.1)
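In the notation of equation (14.2.1), the following short numpy sketch (illustrative only) computes U, L and A for a column-centred data matrix and checks that ULA′ reconstructs X; the column-centring and the toy data are conventions assumed for the example.

```python
# Illustrative check of equation (14.2.1), X = ULA': U and A have orthonormal
# columns and L is the diagonal matrix of singular values.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
X = X - X.mean(axis=0)                       # column-centring, as usual in PCA

U, sv, At = np.linalg.svd(X, full_matrices=False)
L = np.diag(sv)                              # singular values on the diagonal
A = At.T

assert np.allclose(U @ L @ A.T, X)           # X = ULA' up to rounding error
```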
