Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
14.2. Weights, Metrics, Transformations and Centerings

M in (3.9.1)) is equal to Γ⁻¹ (see Besse (1994b)). Furthermore, it can be shown (Besse, 1994b) that Γ⁻¹ is approximately optimal even without the assumption of multivariate normality. Optimality is defined here as finding Q for which

E[ (1/n) Σᵢ₌₁ⁿ ‖zᵢ − ẑᵢ‖²_A ]

is minimized, where A is any given Euclidean metric. The matrix Q enters this expression because ẑᵢ is the Q-orthogonal projection of xᵢ onto the optimal q-dimensional subspace.

Of course, the model is often a fiction, and even when it might be believed, Γ will typically not be known. There are, however, certain types of data where plausible estimators exist for Γ. One is the case where the data fall into groups or clusters. If the groups are known, then within-group variation can be used to estimate Γ, and generalized PCA is equivalent to a form of discriminant analysis (Besse, 1994b). In the case of unknown clusters, Caussinus and Ruiz (1990) use a form of generalized PCA as a projection pursuit technique to find such clusters (see Section 9.2.2). Another form of generalized PCA is used by the same authors to look for outliers in a data set (Section 10.1).

Besse (1988) searches for an 'optimal' metric in a less formal manner. In the context of fitting splines to functional data, he suggests several families of metric that combine elements of closeness between vectors with closeness between their smoothness. A family is indexed by a parameter playing a similar rôle to λ in equation (12.3.6), which governs smoothness. The optimal value of λ, and hence the optimal metric, is chosen to give the most clear-cut decision on how many PCs to retain.

Thacker (1996) independently came up with a similar approach, which he refers to as metric-based PCA. He assumes that associated with a set of p variables x is a covariance matrix E for errors or uncertainties.
If S is the covariance matrix of x, then rather than finding a′x that maximizes a′Sa, it may be more relevant to maximize the ratio a′Sa / a′Ea. This reduces to solving the eigenproblem

S aₖ = lₖ E aₖ,   (14.2.6)

for k = 1, 2, ..., p. Second, third, and subsequent aₖ are subject to the constraints a′ₕ E aₖ = 0 for h < k.
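Equation (14.2.6) is a generalized symmetric eigenproblem, which standard numerical libraries solve directly. A minimal sketch (the function name `metric_based_pca` is illustrative, not from the text; it assumes E is positive definite):

```python
import numpy as np
from scipy.linalg import eigh


def metric_based_pca(S, E):
    """Solve S a_k = l_k E a_k, as in equation (14.2.6).

    S : (p, p) symmetric covariance matrix of x.
    E : (p, p) symmetric positive definite covariance matrix of
        errors or uncertainties.

    Returns eigenvalues l_k in descending order and a matrix whose
    columns are the corresponding a_k.
    """
    # scipy's generalized symmetric eigensolver normalizes the
    # eigenvectors so that A' E A = I, which enforces the
    # E-orthogonality constraints a_h' E a_k = 0 for h != k.
    l, A = eigh(S, E)
    order = np.argsort(l)[::-1]  # eigh returns ascending order
    return l[order], A[:, order]
```

Note that with E = σ²Iₚ the solutions reduce to ordinary eigenvectors of S, matching the special case discussed below.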
14. Generalizations and Adaptations of Principal Component Analysis

Metric-based PCA, as defined by Thacker (1996), corresponds to the triple (X, E⁻¹, (1/n)Iₙ), and E plays the same rôle as does Γ in the fixed effects model. Tipping and Bishop's (1999a) model (Section 3.9) can be fitted as a special case with E = σ²Iₚ. In this case the aₖ are simply eigenvectors of S.

Consider a model for x in which x is the sum of a signal and an independent noise term, so that the overall covariance matrix can be decomposed as S = S_S + S_N, where S_S, S_N are constructed from signal and noise, respectively. If S_N = E, then S_S = S − E and

S_S aₖ = S aₖ − E aₖ = lₖ E aₖ − E aₖ = (lₖ − 1) E aₖ,

so the aₖ are also eigenvectors of the signal covariance matrix, using the metric defined by E⁻¹. Hannachi (2000) demonstrates equivalences between

• Thacker's technique;
• a method that finds a linear function of x that minimizes the probability density of noise for a fixed value of the probability density of the data, assuming both densities are multivariate normal;
• maximization of signal to noise ratio as defined by Allen and Smith (1997).

Diamantaras and Kung (1996, Section 7.2) discuss maximization of signal to noise ratio in a neural network context using what they call 'oriented PCA.' Their optimization problem is again equivalent to that of Thacker (1996). The fingerprint techniques in Section 12.4.3 also analyse signal to noise ratios, but in that case the signal is defined as a squared expectation, rather than in terms of a signal covariance matrix.

Because any linear transformation of x affects both the numerator and denominator of the ratio in the same way, Thacker's (1996) technique shares with canonical variate analysis and CCA an invariance to the units of measurement.
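The identity S_S aₖ = (lₖ − 1) E aₖ can be verified numerically. A sketch with randomly generated (hypothetical) signal and noise covariance matrices:

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical covariance matrices for illustration only:
# S_S is the signal part, E the noise part, and S = S_S + E.
rng = np.random.default_rng(0)
p = 4
B = rng.standard_normal((p, p))
E = B @ B.T + p * np.eye(p)   # noise covariance (positive definite)
C = rng.standard_normal((p, p))
S_S = C @ C.T                 # signal covariance (positive semi-definite)
S = S_S + E                   # overall covariance

# Solve S a_k = l_k E a_k, as in (14.2.6).
l, A = eigh(S, E)

# Each a_k should also satisfy S_S a_k = (l_k - 1) E a_k.
residual = max(
    np.abs(S_S @ A[:, k] - (l[k] - 1) * (E @ A[:, k])).max()
    for k in range(p)
)
```

Since S_S is positive semi-definite, this also implies every lₖ ≥ 1, with lₖ − 1 measuring signal strength in the metric defined by E⁻¹.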
In particular, unlike PCA, the results from covariance and correlation matrices are equivalent.

14.2.3 Transformations and Centering

Data may be transformed in a variety of ways before PCA is carried out, and we have seen a number of instances of this elsewhere in the book. Transformations are often used as a way of producing non-linearity (Section 14.1) and are a frequent preprocessing step in the analysis of special types of data. For example, discrete data may be ranked (Section 13.1) and size/shape data, compositional data and species abundance data (Sections 13.2, 13.3, 13.8) may each be log-transformed before a PCA is done. The log transformation is particularly common and its properties, with and without standardization, are illustrated by Baxter (1995) using a number of examples from archaeology.
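As a sketch of this preprocessing step (the function name and its interface are illustrative, not from the text), a log transformation followed by an ordinary covariance- or correlation-based PCA might look like:

```python
import numpy as np


def pca_after_log(X, standardize=False):
    """PCA of log-transformed data, a common preprocessing step for
    compositional or species-abundance data.

    X : (n, p) array of strictly positive observations.
    standardize : if True, standardize the log data first, so the
        analysis is based on the correlation rather than the
        covariance matrix.

    Returns eigenvalues (descending) and an orthonormal matrix of
    eigenvectors (loadings) as columns.
    """
    Z = np.log(X)
    Z = Z - Z.mean(axis=0)            # column-center the log data
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)
    C = Z.T @ Z / (Z.shape[0] - 1)    # sample covariance/correlation
    l, V = np.linalg.eigh(C)
    order = np.argsort(l)[::-1]
    return l[order], V[:, order]
```

The `standardize` flag reflects the with/without-standardization distinction studied by Baxter (1995).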