Jolliffe, I.T. (2002). Principal Component Analysis, 2nd edition. Springer.
14.2. Weights, Metrics, Transformations and Centerings

PC framework above unless the w_ij can be written as products w_ij = ω_i φ_j, i = 1, 2, ..., n; j = 1, 2, ..., p, although this method involves similar ideas. The examples given by Gabriel and Zamir (1979) can be expressed as contingency tables, so that correspondence analysis rather than PCA may be more appropriate, and Greenacre (1984), too, develops generalized PCA as an offshoot of correspondence analysis (he shows that another special case of the generalized SVD (14.2.2) produces correspondence analysis, a result which was discussed further in Section 13.1). The idea of weighting could, however, be used in PCA for any type of data, provided that suitable weights can be defined.

Gabriel and Zamir (1979) suggest a number of ways in which special cases of their weighted analysis may be used. As noted in Section 13.6, it can accommodate missing data by giving zero weight to missing elements of X. Alternatively, the analysis can be used to look for 'outlying cells' in a data matrix. This can be achieved using ideas similar to those introduced in Section 6.1.5 in the context of choosing how many PCs to retain. Any particular element x_ij of X is estimated by least squares based on a subset of the data that does not include x_ij. This rank m estimate, x̂_ij(m), is readily found by equating to zero a subset of weights in (14.2.5), including w_ij. The difference between x_ij and x̂_ij(m) provides a better measure of the 'outlyingness' of x_ij, compared to the remaining elements of X, than does the difference between x_ij and a rank m estimate, x̃_ij(m), based on the SVD for the entire matrix X. This result follows because x̂_ij(m) is not affected by x_ij, whereas x_ij contributes to the estimate x̃_ij(m). Commandeur et al.
(1999) describe how to introduce weights for both variables and observations into Meulman's (1986) distance approach to nonlinear multivariate data analysis (see Section 14.1.1).

In the standard atmospheric science set-up, in which variables correspond to spatial locations, weights may be introduced to take account of uneven spacing between the locations where measurements are taken. The weights reflect the size of the area for which a particular location (variable) is the closest point. This type of weighting may also be necessary when the locations are regularly spaced on a latitude/longitude grid. The areas of the corresponding grid cells decrease towards the poles, and allowance should be made for this if the latitudinal spread of the data is moderate or large. An obvious strategy is to assign to the grid cells weights that are proportional to their areas. However, if there is a strong positive correlation within cells, it can be argued that doubling the area, for example, does not double the amount of independent information, and that weights should reflect this. Folland (1988) implies that weights should be proportional to (Area)^c, where c is between 1/2 and 1. Hannachi and O'Neill (2001) weight their data by the cosine of latitude.

Buell (1978) and North et al. (1982) derive weights for irregularly spaced atmospheric data by approximating a continuous version of PCA, based on an equation similar to (12.3.1).
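To make these weighting ideas concrete, the weighted least squares criterion can be minimised by alternating 'criss-cross' regressions on the rows and columns of X. The sketch below (illustrative numpy code, not taken from the book; the function name and iteration count are assumptions) fits a rank m approximation A B′ to X under a general weight matrix W. Giving a cell zero weight removes it from the fit entirely, so its fitted value is a least squares estimate based only on the remaining data, as described above for missing or suspect cells; row-and-column weights w_ij = ω_i φ_j (with φ_j proportional, say, to grid-cell area) are a special case.

```python
import numpy as np

def weighted_low_rank(X, W, m, n_iter=500, seed=0):
    """Rank-m fit minimising sum_ij W[i,j] * (X[i,j] - (A @ B.T)[i,j])**2
    by alternating weighted least squares regressions.
    Cells with zero weight (missing or suspect values) never enter the fit."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((p, m))
    A = np.zeros((n, m))
    sw = np.sqrt(W)                       # weights enter least squares via their square roots
    X0 = np.where(W > 0, X, 0.0)          # neutralise unweighted cells (which may be NaN)
    for _ in range(n_iter):
        for i in range(n):                # regress each row of X on B
            A[i] = np.linalg.lstsq(sw[i, :, None] * B, sw[i] * X0[i], rcond=None)[0]
        for j in range(p):                # regress each column of X on A
            B[j] = np.linalg.lstsq(sw[:, j, None] * A, sw[:, j] * X0[:, j], rcond=None)[0]
    return A, B
```

With W a matrix of ones this is just the rank m SVD approximation; zeroing the weight of a single cell gives the 'leave-that-cell-out' estimate whose discrepancy from the observed value measures the cell's outlyingness.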
14. Generalizations and Adaptations of Principal Component Analysis

14.2.2 Metrics

The idea of defining PCA with respect to a metric or an inner product dates back at least to Dempster (1969, Section 7.6). Following the publication of Cailliez and Pagès (1976) it became, together with an associated 'duality diagram,' a popular view of PCA in France in the 1980s (see, for example, Caussinus, 1986; Escoufier, 1987). In this framework, PCA is defined in terms of a triple (X, Q, D), the three elements of which are:

• the matrix X is the (n × p) data matrix, which is usually but not necessarily column-centred;

• the (p × p) matrix Q defines a metric on the p variables, so that the distance between two observations x_j and x_k is (x_j − x_k)′Q(x_j − x_k);

• the (n × n) matrix D is usually diagonal, and its diagonal elements consist of a set of weights for the n observations. It can, however, be more general, for example when the observations are not independent, as in time series (Caussinus, 1986; Escoufier, 1987).

The usual definition of covariance-based PCA has Q = I_p, the identity matrix, and D = (1/n)I_n, though to get the sample covariance matrix with divisor (n − 1) it is necessary to replace n by (n − 1) in the definition of D, leading to a set of 'weights' which do not sum to unity. Correlation-based PCA is achieved either by standardizing X, or by taking Q to be the diagonal matrix whose jth diagonal element is the reciprocal of the variance of the jth variable, j = 1, 2, ..., p.

Implementation of PCA with a general triple (X, Q, D) is readily achieved by means of the generalized SVD, described in Section 14.2.1, with Φ and Ω from that section equal to Q and D from this section. The coefficients of the generalized PCs are given in the columns of the matrix B defined by equation (14.2.2). Alternatively, they can be found from an eigenanalysis of X′DXQ or XQX′D (Escoufier, 1987).

A number of particular generalizations of the standard form of PCA fit within this framework.
For example, Escoufier (1987) shows that, in addition to the cases already noted, it can be used to transform variables; to remove the effect of an observation by putting it at the origin; to look at subspaces orthogonal to a subset of variables; to compare sample and theoretical covariance matrices; and to derive correspondence and discriminant analyses. Maurin (1987) examines how the eigenvalues and eigenvectors of a generalized PCA change when the matrix Q in the triple is changed.

The framework also has connections with the fixed effects model of Section 3.9. In that model, the observations x_i are such that x_i = z_i + e_i, where z_i lies in a q-dimensional subspace and e_i is an error term with zero mean and covariance matrix (σ²/w_i)Γ. Maximum likelihood estimation of the model, assuming a multivariate normal distribution for e, leads to a generalized PCA, where D is diagonal with elements w_i and Q (which is denoted