Jolliffe, I.T. Principal Component Analysis, 2nd edition. Springer, 2002.

14.2 Weights, Metrics, Transformations and Centerings

…M in (3.9.1)) is equal to $\Gamma^{-1}$ (see Besse (1994b)). Furthermore, it can be shown (Besse, 1994b) that $\Gamma^{-1}$ is approximately optimal even without the assumption of multivariate normality. Optimality is defined here as finding $Q$ for which

$$E\left[\frac{1}{n}\sum_{i=1}^{n}\|z_i - \hat{z}_i\|_A^2\right]$$

is minimized, where $A$ is any given Euclidean metric. The matrix $Q$ enters this expression because $\hat{z}_i$ is the $Q$-orthogonal projection of $x_i$ onto the optimal $q$-dimensional subspace.

Of course, the model is often a fiction, and even when it might be believed, $\Gamma$ will typically not be known. There are, however, certain types of data where plausible estimators exist for $\Gamma$. One is the case where the data fall into groups or clusters. If the groups are known, then within-group variation can be used to estimate $\Gamma$, and generalized PCA is equivalent to a form of discriminant analysis (Besse, 1994b). In the case of unknown clusters, Caussinus and Ruiz (1990) use a form of generalized PCA as a projection pursuit technique to find such clusters (see Section 9.2.2). Another form of generalized PCA is used by the same authors to look for outliers in a data set (Section 10.1).

Besse (1988) searches for an 'optimal' metric in a less formal manner. In the context of fitting splines to functional data, he suggests several families of metrics that combine elements of closeness between vectors with closeness between their smoothness. A family is indexed by a parameter playing a similar rôle to $\lambda$ in equation (12.3.6), which governs smoothness. The optimal value of $\lambda$, and hence the optimal metric, is chosen to give the most clear-cut decision on how many PCs to retain.

Thacker (1996) independently came up with a similar approach, which he refers to as metric-based PCA. He assumes that associated with a set of $p$ variables $x$ is a covariance matrix $E$ for errors or uncertainties. If $S$ is the covariance matrix of $x$, then rather than finding $a'x$ that maximizes $a'Sa$, it may be more relevant to maximize $\frac{a'Sa}{a'Ea}$. This reduces to solving the eigenproblem

$$S a_k = l_k E a_k \qquad (14.2.6)$$

for $k = 1, 2, \ldots, p$. Second, third, and subsequent $a_k$ are subject to the constraints $a_h' E a_k = 0$ for $h < k$.
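Equation (14.2.6) is a generalized symmetric eigenproblem, so it can be solved with standard linear-algebra routines. The following minimal sketch (not from the book) uses simulated data and an illustrative diagonal error covariance $E$; it relies on SciPy's `eigh`, which handles the generalized problem and normalizes the eigenvectors so that the constraints $a_h' E a_k = 0$ hold.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Simulated data: n observations on p variables (all values illustrative).
n, p = 500, 5
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
S = np.cov(X, rowvar=False)               # covariance matrix S of x
E = np.diag(rng.uniform(0.5, 2.0, p))     # assumed error/uncertainty covariance

# Solve S a_k = l_k E a_k (equation (14.2.6)); eigh returns eigenvalues in
# ascending order, so reverse to put the largest ratio first.
l, A = eigh(S, E)
l, A = l[::-1], A[:, ::-1]

# The eigenvectors satisfy a_h' E a_k = 0 for h != k; eigh normalizes them
# so that A' E A = I.
assert np.allclose(A.T @ E @ A, np.eye(p), atol=1e-8)

# The leading vector maximizes a'Sa / a'Ea, and the maximum equals l_1.
a1 = A[:, 0]
print(l[0], (a1 @ S @ a1) / (a1 @ E @ a1))
```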

Metric-based PCA, as defined by Thacker (1996), corresponds to the triple $(X, E^{-1}, \frac{1}{n}I_n)$, and $E$ plays the same rôle as does $\Gamma$ in the fixed effects model. Tipping and Bishop's (1999a) model (Section 3.9) can be fitted as a special case with $E = \sigma^2 I_p$. In this case the $a_k$ are simply eigenvectors of $S$.

Consider a model for $x$ in which $x$ is the sum of a signal and an independent noise term, so that the overall covariance matrix can be decomposed as $S = S_S + S_N$, where $S_S$, $S_N$ are constructed from signal and noise, respectively. If $S_N = E$, then $S_S = S - E$ and

$$S_S a_k = S a_k - E a_k = l_k E a_k - E a_k = (l_k - 1) E a_k,$$

so the $a_k$ are also eigenvectors of the signal covariance matrix, using the metric defined by $E^{-1}$ (a numerical check of this identity is sketched at the end of this section). Hannachi (2000) demonstrates equivalences between

• Thacker's technique;
• a method that finds a linear function of $x$ that minimizes the probability density of noise for a fixed value of the probability density of the data, assuming both densities are multivariate normal;
• maximization of the signal-to-noise ratio as defined by Allen and Smith (1997).

Diamantaras and Kung (1996, Section 7.2) discuss maximization of the signal-to-noise ratio in a neural network context using what they call 'oriented PCA.' Their optimization problem is again equivalent to that of Thacker (1996). The fingerprint techniques in Section 12.4.3 also analyse signal-to-noise ratios, but in that case the signal is defined as a squared expectation, rather than in terms of a signal covariance matrix.

Because any linear transformation of $x$ affects both the numerator and denominator of the ratio in the same way, Thacker's (1996) technique shares with canonical variate analysis and CCA an invariance to the units of measurement. In particular, unlike PCA, the results from covariance and correlation matrices are equivalent.

14.2.3 Transformations and Centering

Data may be transformed in a variety of ways before PCA is carried out, and we have seen a number of instances of this elsewhere in the book. Transformations are often used as a way of producing non-linearity (Section 14.1) and are a frequent preprocessing step in the analysis of special types of data. For example, discrete data may be ranked (Section 13.1), and size/shape data, compositional data and species abundance data (Sections 13.2, 13.3, 13.8) may each be log-transformed before a PCA is done. The log transformation is particularly common, and its properties, with and without standardization, are illustrated by Baxter (1995) using a number of examples from archaeology.
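As a brief illustration of this preprocessing step (not taken from the book), the following Python sketch log-transforms simulated positive-valued data, standardizes, and then performs an ordinary PCA; the data and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative positive-valued data, e.g. species abundances or sizes.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 6))

# Log-transform, then standardize; with standardization the subsequent
# PCA is effectively an analysis of the correlation matrix of log(X).
Z = np.log(X)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

# Ordinary PCA via the eigendecomposition of the (correlation) matrix.
R = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

scores = Z @ eigvecs                 # principal component scores
print(eigvals / eigvals.sum())       # proportion of variance per component
```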

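Finally, returning to the signal-plus-noise decomposition earlier in this section, the identity $S_S a_k = (l_k - 1) E a_k$ is easy to verify numerically. A minimal sketch, assuming simulated covariance matrices with $S_N = E$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)

# Illustrative decomposition S = S_S + S_N with S_N = E (assumed setup).
p = 4
B = rng.standard_normal((p, p))
S_S = B @ B.T                          # signal covariance (positive semi-definite)
E = np.diag(rng.uniform(0.5, 2.0, p))  # noise covariance, taken as S_N
S = S_S + E

# Eigenvectors of S a_k = l_k E a_k ...
l, A = eigh(S, E)

# ... are also eigenvectors of the signal covariance in the metric E^{-1}:
# S_S a_k = (l_k - 1) E a_k for every k.
for k in range(p):
    assert np.allclose(S_S @ A[:, k], (l[k] - 1.0) * (E @ A[:, k]), atol=1e-8)
print("verified: S_S a_k = (l_k - 1) E a_k for all k")
```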