Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

9.3. Canonical Correlation Analysis and Related Techniques 227

by Stewart and Love (1968), and is an index of the average proportion of the variance of the variables in one set that is reproducible from the variables in the other set. One immediate difference from both CCA and maximum covariance analysis is that it does not view the two sets of variables symmetrically. One set is treated as response variables and the other as predictor variables, and the results of the analysis are different depending on the choice of which set contains responses. For convenience, in what follows $\mathbf{x}_{p_1}$ and $\mathbf{x}_{p_2}$ consist of responses and predictors, respectively.

Stewart and Love’s (1968) redundancy index, given a pair of canonical variates, can be expressed as the product of two terms. These terms are the squared canonical correlation and the variance of the canonical variate for the response set. It is clear that a different value results if the rôles of predictor and response variables are reversed. The redundancy coefficient can be obtained by regressing each response variable on all the predictor variables and then averaging the $p_1$ squared multiple correlations from these regressions. This has a link to the interpretation of PCA given in the discussion of Property A6 in Chapter 2, and was used by van den Wollenberg (1977) and Thacker (1999) to introduce two slightly different techniques.

In van den Wollenberg’s (1977) redundancy analysis, linear functions $\mathbf{a}'_{k2}\mathbf{x}_{p_2}$ of $\mathbf{x}_{p_2}$ are found that successively maximize their average squared correlation with the elements of the response set $\mathbf{x}_{p_1}$, subject to the vectors of loadings $\mathbf{a}_{12}, \mathbf{a}_{22}, \ldots$ being orthogonal. It turns out (van den Wollenberg, 1977) that finding the required linear functions is achieved by solving the equation

$$\mathbf{R}_{xy}\mathbf{R}_{yx}\mathbf{a}_{k2} = l_k\mathbf{R}_{xx}\mathbf{a}_{k2}, \tag{9.3.2}$$

where $\mathbf{R}_{xx}$ is the correlation matrix for the predictor variables, $\mathbf{R}_{xy}$ is the matrix of correlations between the predictor and response variables, and $\mathbf{R}_{yx}$ is the transpose of $\mathbf{R}_{xy}$.
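
The two computations just described can be checked numerically. The sketch below is a minimal Python/NumPy illustration, not van den Wollenberg’s or Stewart and Love’s own code; the function names `redundancy_index` and `redundancy_analysis` and the simulated data are hypothetical. The redundancy coefficient is computed literally, by regressing each response on all predictors (with an intercept) and averaging the $p_1$ squared multiple correlations. Equation (9.3.2) is solved as a generalized symmetric eigenproblem through a Cholesky factor of $\mathbf{R}_{xx}$. Because eigenvectors of (9.3.2) are determined only up to scale, each is scaled so that $\mathbf{a}'_{k2}\mathbf{R}_{xx}\mathbf{a}_{k2} = 1$ (an assumed normalization). Under that scaling, $l_k$ is the sum, and $l_k/p_1$ the average, of the squared correlations between the derived variable (built from standardized predictors) and the responses. The eigenvalues of (9.3.2) sum to $\mathrm{tr}(\mathbf{R}_{xx}^{-1}\mathbf{R}_{xy}\mathbf{R}_{yx})$, which is the sum of the $p_1$ squared multiple correlations, so $\sum_k l_k / p_1$ should reproduce the redundancy coefficient.

```python
import numpy as np


def redundancy_index(X, Y):
    """Average squared multiple correlation of each column of Y (responses)
    regressed, with an intercept, on all columns of X (predictors)."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    ss_res = ((Y - design @ coef) ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def redundancy_analysis(X, Y):
    """Solve R_xy R_yx a = l R_xx a, equation (9.3.2), with X the predictors.
    Returns eigenvalues in descending order and loadings as columns, each
    scaled so that a' R_xx a = 1 (loadings apply to standardized X)."""
    p = X.shape[1]
    R = np.corrcoef(np.column_stack([X, Y]), rowvar=False)
    Rxx, Rxy = R[:p, :p], R[:p, p:]
    # Reduce to a symmetric standard problem with Rxx = L L^T:
    # (L^-1 Rxy Ryx L^-T) w = l w, then a = L^-T w.
    Linv = np.linalg.inv(np.linalg.cholesky(Rxx))
    l, W = np.linalg.eigh(Linv @ Rxy @ Rxy.T @ Linv.T)
    order = np.argsort(l)[::-1]
    return l[order], Linv.T @ W[:, order]


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                       # predictors x_{p2}
Y = X[:, :2] + 0.3 * rng.standard_normal((200, 2))      # responses x_{p1}

l, A = redundancy_analysis(X, Y)
# Sum of all eigenvalues divided by p1 equals the redundancy coefficient.
print(redundancy_index(X, Y), l.sum() / Y.shape[1])
# Reversing the roles of the two sets gives a different value.
print(redundancy_index(Y, X))
```

Swapping the arguments, as in `redundancy_analysis(Y, X)`, reverses the rôles of the two sets. The second `print` shows the corresponding asymmetry of the redundancy coefficient itself.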

A linear function of $\mathbf{x}_{p_1}$ can be found by reversing the rôles of predictor and response variables, and hence replacing $x$ by $y$ and vice versa, in equation (9.3.2).

Thacker (1999) also considers a linear function $z_1 = \mathbf{a}'_{12}\mathbf{x}_{p_2}$ of the predictors $\mathbf{x}_{p_2}$. Again $\mathbf{a}_{12}$ is chosen to maximize $\sum_{j=1}^{p_1} r_{1j}^2$, where $r_{1j}$ is the correlation between $z_1$ and the $j$th response variable. The variable $z_1$ is called the first principal predictor by Thacker (1999). Second, third, ... principal predictors are defined by maximizing the same quantity, subject to the constraint that each principal predictor must be uncorrelated with all previous principal predictors. Thacker (1999) shows that the vectors of loadings $\mathbf{a}_{12}, \mathbf{a}_{22}, \ldots$ are solutions of the equation

$$\mathbf{S}_{xy}[\mathrm{diag}(\mathbf{S}_{yy})]^{-1}\mathbf{S}_{yx}\mathbf{a}_{k2} = l_k\mathbf{S}_{xx}\mathbf{a}_{k2}, \tag{9.3.3}$$

where $\mathbf{S}_{xx}$, $\mathbf{S}_{yy}$, $\mathbf{S}_{xy}$ and $\mathbf{S}_{yx}$ are covariance matrices defined analogously to the correlation matrices $\mathbf{R}_{xx}$, $\mathbf{R}_{yy}$, $\mathbf{R}_{xy}$ and $\mathbf{R}_{yx}$ above. The eigenvalue $l_k$ corresponding to $\mathbf{a}_{k2}$ is equal to the sum of squared correlations $\sum_{j=1}^{p_1} r_{kj}^2$ between $\mathbf{a}'_{k2}\mathbf{x}_{p_2}$ and each of the response variables $x_j$.

228 9. Principal Components Used with Other Multivariate Techniques

The difference between principal predictors and redundancy analysis is that the principal predictors are uncorrelated, whereas the derived variables in redundancy analysis are correlated but have vectors of loadings that are orthogonal. The presence of correlation in redundancy analysis may be regarded as a drawback, and van den Wollenberg (1977) suggests using the first few derived variables from redundancy analysis as input to CCA. This will then produce uncorrelated canonical variates whose variances are unlikely to be small. The possibility of using the first few PCs from each set as input to CCA was mentioned above, as was the disadvantage that excluded low-variance PCs might contain strong inter-set correlations. As low-variance directions are unlikely to be of interest in redundancy analysis, using the first few PCs as input seems to be far safer in this case and is another option.

It is of interest to note the similarity between equations (9.3.2), (9.3.3) and the eigenequation whose solution gives the loadings $\mathbf{a}_{k2}$ on $\mathbf{x}_{p_2}$ for canonical correlation analysis, namely

$$\mathbf{S}_{xy}\mathbf{S}_{yy}^{-1}\mathbf{S}_{yx}\mathbf{a}_{k2} = l_k\mathbf{S}_{xx}\mathbf{a}_{k2}, \tag{9.3.4}$$

using the present notation. Wang and Zwiers (2001) solve a version of (9.3.2) with covariance matrices replacing correlation matrices, by first solving the eigenequation

$$\mathbf{S}_{yx}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}\mathbf{b}_{k2} = l_k\mathbf{b}_{k2}, \tag{9.3.5}$$

and then setting $\mathbf{a}_{k2} = l_k^{-1/2}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}\mathbf{b}_{k2}$. This is equivalent to a PCA of the covariance matrix $\mathbf{S}_{yx}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}$ of the predicted values of the response variables obtained from a multivariate regression on the predictor variables. Multivariate regression is discussed further in Section 9.3.4.

Van den Wollenberg (1977) notes that PCA is a special case of redundancy analysis (and principal predictors, but not CCA) when $\mathbf{x}_{p_1}$ and $\mathbf{x}_{p_2}$ are the same (see also Property A6 in Chapter 2).
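
Equations (9.3.3), (9.3.4) and (9.3.5) can be compared side by side in a short NumPy sketch. This is an illustration under stated assumptions rather than Thacker’s or Wang and Zwiers’ implementations. The helper names (`generalized_eig`, `cross_covariances`, `principal_predictors`, `cca_predictor_loadings`, `wang_zwiers`) and the simulated data are hypothetical, and sample covariances from `np.cov` play the role of the $\mathbf{S}$ matrices. Each generalized problem $\mathbf{A}\mathbf{v} = l\mathbf{B}\mathbf{v}$ is reduced to a symmetric one through a Cholesky factor of $\mathbf{B}$, which scales the loadings so that $\mathbf{v}'\mathbf{B}\mathbf{v} = 1$. Under that scaling the principal predictors have unit variance and are mutually uncorrelated, and the eigenvalues of (9.3.4) are squared canonical correlations. The last lines set $\mathbf{x}_{p_1}$ equal to $\mathbf{x}_{p_2}$. In this special case principal predictors reduce to a PCA of the correlation matrix, and the Wang and Zwiers construction reduces to a PCA of the covariance matrix.

```python
import numpy as np


def generalized_eig(A, B):
    """Solve A v = l B v (A symmetric, B positive definite) via the Cholesky
    factor B = L L^T. Eigenvalues descending; columns scaled so v' B v = 1."""
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    l, W = np.linalg.eigh(Linv @ A @ Linv.T)
    order = np.argsort(l)[::-1]
    return l[order], Linv.T @ W[:, order]


def cross_covariances(X, Y):
    """Sample covariance blocks S_xx, S_yy, S_xy (x = predictors, y = responses)."""
    p = X.shape[1]
    S = np.cov(np.column_stack([X, Y]), rowvar=False)
    return S[:p, :p], S[p:, p:], S[:p, p:]


def principal_predictors(X, Y):
    """Equation (9.3.3): S_xy [diag(S_yy)]^-1 S_yx a = l S_xx a."""
    Sxx, Syy, Sxy = cross_covariances(X, Y)
    return generalized_eig(Sxy @ np.diag(1.0 / np.diag(Syy)) @ Sxy.T, Sxx)


def cca_predictor_loadings(X, Y):
    """Equation (9.3.4): S_xy S_yy^-1 S_yx a = l S_xx a, where each l is a
    squared canonical correlation."""
    Sxx, Syy, Sxy = cross_covariances(X, Y)
    return generalized_eig(Sxy @ np.linalg.solve(Syy, Sxy.T), Sxx)


def wang_zwiers(X, Y):
    """Covariance version of (9.3.2) via (9.3.5): PCA of S_yx S_xx^-1 S_xy (the
    covariance matrix of the regression predictions), then
    a_k = l_k^(-1/2) S_xx^-1 S_xy b_k. Assumes every l_k > 0."""
    Sxx, _, Sxy = cross_covariances(X, Y)
    l, B = np.linalg.eigh(Sxy.T @ np.linalg.solve(Sxx, Sxy))
    order = np.argsort(l)[::-1]
    l, B = l[order], B[:, order]
    return l, np.linalg.solve(Sxx, Sxy @ B) / np.sqrt(l)


rng = np.random.default_rng(1)
X = rng.standard_normal((300, 3))                                     # predictors
Y = X @ rng.standard_normal((3, 2)) + rng.standard_normal((300, 2))   # responses

l_pp, A_pp = principal_predictors(X, Y)
l_cca, A_cca = cca_predictor_loadings(X, Y)
l_wz, A_wz = wang_zwiers(X, Y)

# Special case x_{p1} = x_{p2}.
l_pp_same, _ = principal_predictors(X, X)   # eigenvalues of the correlation matrix
l_wz_same, _ = wang_zwiers(X, X)            # eigenvalues of the covariance matrix
l_cca_same, _ = cca_predictor_loadings(X, X)
print(l_pp_same, np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1])
```

In the same special case every eigenvalue of (9.3.4) equals 1, because $\mathbf{S}_{xx}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xx} = \mathbf{S}_{xx}$. The CCA eigenequation therefore carries no PCA information, which is consistent with its exclusion above.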

Muller (1981) shows that redundancy analysis is equivalent to orthogonally rotating the results of a multivariate regression analysis. DeSarbo and Jedidi (1986) give a number of other properties, together with modifications and extensions, of redundancy analysis.

9.3.4 Other Techniques for Relating Two Sets of Variables

A number of other techniques for relating two sets of variables were noted in Section 8.4. They include separate PCAs on the two groups of variables, followed by the calculation of a regression equation to predict, again separately, each PC from one set from the PCs in the other set. Another way of using PCA is to concatenate the two sets of variables and find PCs for the combined set of $(p_1 + p_2)$ variables. This is sometimes known as combined PCA, and is one of the methods that Bretherton et al. (1992) compare with

