Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
9.3. Canonical Correlation Analysis and Related Techniques

by Stewart and Love (1968), and is an index of the average proportion of the variance of the variables in one set that is reproducible from the variables in the other set. One immediate difference from both CCA and maximum covariance analysis is that it does not view the two sets of variables symmetrically. One set is treated as response variables and the other as predictor variables, and the results of the analysis are different depending on the choice of which set contains responses. For convenience, in what follows $x_{p_1}$ and $x_{p_2}$ consist of responses and predictors, respectively.

Stewart and Love's (1968) redundancy index, given a pair of canonical variates, can be expressed as the product of two terms. These terms are the squared canonical correlation and the variance of the canonical variate for the response set. It is clear that a different value results if the rôles of predictor and response variables are reversed. The redundancy coefficient can be obtained by regressing each response variable on all the predictor variables and then averaging the $p_1$ squared multiple correlations from these regressions. This has a link to the interpretation of PCA given in the discussion of Property A6 in Chapter 2, and was used by van den Wollenberg (1977) and Thacker (1999) to introduce two slightly different techniques.

In van den Wollenberg's (1977) redundancy analysis, linear functions $a'_{k2} x_{p_2}$ of $x_{p_2}$ are found that successively maximize their average squared correlation with the elements of the response set $x_{p_1}$, subject to the vectors of loadings $a_{12}, a_{22}, \ldots$ being orthogonal. It turns out (van den Wollenberg, 1977) that finding the required linear functions is achieved by solving the equation

$$R_{xy} R_{yx} a_{k2} = l_k R_{xx} a_{k2}, \qquad (9.3.2)$$

where $R_{xx}$ is the correlation matrix for the predictor variables, $R_{xy}$ is the matrix of correlations between the predictor and response variables, and $R_{yx}$ is the transpose of $R_{xy}$.
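As an illustration of equation (9.3.2), the following is a minimal NumPy sketch that solves the generalized eigenproblem numerically. The function name `redundancy_directions`, the data layout (rows are observations, `X` holds predictors, `Y` holds responses) and the Cholesky-based reduction to a symmetric eigenproblem are assumptions made for this example, not part of the original text. With the scaling used here each derived variable has unit variance, so the eigenvalue $l_k$ equals the sum of squared correlations between the derived variable and the responses, and $l_k / p_1$ is the average squared correlation.

```python
import numpy as np

def redundancy_directions(X, Y):
    """Solve R_xy R_yx a = l R_xx a for predictors X (n x p2) and responses Y (n x p1).

    Returns eigenvalues in decreasing order and the matching loading vectors
    as the columns of A. Assumes R_xx is positive definite.
    """
    p2 = X.shape[1]
    # Joint correlation matrix, partitioned into predictor and cross blocks.
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    R_xx = R[:p2, :p2]
    R_xy = R[:p2, p2:]
    # With R_xx = L L^T and a = L^{-T} v, the problem becomes the ordinary
    # symmetric eigenproblem (L^{-1} R_xy)(L^{-1} R_xy)^T v = l v.
    L = np.linalg.cholesky(R_xx)
    M = np.linalg.solve(L, R_xy)
    evals, V = np.linalg.eigh(M @ M.T)
    order = np.argsort(evals)[::-1]
    evals, V = evals[order], V[:, order]
    A = np.linalg.solve(L.T, V)  # map back: a = L^{-T} v
    return evals, A

# Small synthetic demonstration (the data are made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
Y_demo = X_demo[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(200, 3))
evals_demo, A_demo = redundancy_directions(X_demo, Y_demo)
avg_sq_corr = evals_demo / Y_demo.shape[1]  # average squared correlation per function
```

The loadings apply to standardized predictors, because the equation is written in terms of correlation matrices.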
A linear function of $x_{p_1}$ can be found by reversing the rôles of predictor and response variables, and hence replacing x by y and vice versa, in equation (9.3.2).

Thacker (1999) also considers a linear function $z_1 = a'_{12} x_{p_2}$ of the predictors $x_{p_2}$. Again $a_{12}$ is chosen to maximize $\sum_{j=1}^{p_1} r_{1j}^2$, where $r_{1j}$ is the correlation between $z_1$ and the jth response variable. The variable $z_1$ is called the first principal predictor by Thacker (1999). Second, third, ... principal predictors are defined by maximizing the same quantity, subject to the constraint that each principal predictor must be uncorrelated with all previous principal predictors. Thacker (1999) shows that the vectors of loadings $a_{12}, a_{22}, \ldots$ are solutions of the equation

$$S_{xy} [\mathrm{diag}(S_{yy})]^{-1} S_{yx} a_{k2} = l_k S_{xx} a_{k2}, \qquad (9.3.3)$$

where $S_{xx}$, $S_{yy}$, $S_{xy}$ and $S_{yx}$ are covariance matrices defined analogously to the correlation matrices $R_{xx}$, $R_{yy}$, $R_{xy}$ and $R_{yx}$ above. The eigenvalue $l_k$ corresponding to $a_{k2}$ is equal to the sum of squared correlations $\sum_{j=1}^{p_1} r_{kj}^2$
between $a'_{k2} x_{p_2}$ and each of the variables $x_j$. The difference between principal predictors and redundancy analysis is that the principal predictors are uncorrelated, whereas the derived variables in redundancy analysis are correlated but have vectors of loadings that are orthogonal. The presence of correlation in redundancy analysis may be regarded as a drawback, and van den Wollenberg (1977) suggests using the first few derived variables from redundancy analysis as input to CCA. This will then produce uncorrelated canonical variates whose variances are unlikely to be small. The possibility of using the first few PCs from each set as input to CCA was mentioned above, as was the disadvantage that excluded low-variance PCs might contain strong inter-set correlations. As low-variance directions are unlikely to be of interest in redundancy analysis, using the first few PCs as input seems to be far safer in this case and is another option.

It is of interest to note the similarity between equations (9.3.2), (9.3.3) and the eigenequation whose solution gives the loadings $a_{k2}$ on $x_{p_2}$ for canonical correlation analysis, namely

$$S_{xy} S_{yy}^{-1} S_{yx} a_{k2} = l_k S_{xx} a_{k2}, \qquad (9.3.4)$$

using the present notation. Wang and Zwiers (2001) solve a version of (9.3.2) with covariance matrices replacing correlation matrices, by first solving the eigenequation

$$S_{yx} S_{xx}^{-1} S_{xy} b_{k2} = l_k b_{k2}, \qquad (9.3.5)$$

and then setting $a_{k2} = l_k^{-1/2} S_{xx}^{-1} S_{xy} b_{k2}$. This is equivalent to a PCA of the covariance matrix $S_{yx} S_{xx}^{-1} S_{xy}$ of the predicted values of the response variables obtained from a multivariate regression on the predictor variables. Multivariate regression is discussed further in Section 9.3.4.

Van den Wollenberg (1977) notes that PCA is a special case of redundancy analysis (and principal predictors, but not CCA) when $x_{p_1}$ and $x_{p_2}$ are the same (see also Property A6 in Chapter 2).
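The two-step route attributed to Wang and Zwiers (2001) can be sketched numerically as follows. This is an illustrative NumPy sketch under stated assumptions, not their implementation: the function name `two_step_redundancy`, the data layout (rows are observations) and the tolerance used to discard near-zero eigenvalues are all choices made for this example. It forms the covariance matrix of the fitted values, $S_{yx} S_{xx}^{-1} S_{xy}$, takes its eigenvectors $b_{k2}$ as in (9.3.5), and maps them to $a_{k2} = l_k^{-1/2} S_{xx}^{-1} S_{xy} b_{k2}$.

```python
import numpy as np

def two_step_redundancy(X, Y):
    """Covariance version of (9.3.2) via the eigenequation (9.3.5).

    X is n x p2 (predictors), Y is n x p1 (responses).
    Returns eigenvalues l_k, eigenvectors b_k (columns of Bvecs) and
    loadings a_k (columns of A). Assumes S_xx is nonsingular.
    """
    p2 = X.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    S_xx, S_xy = S[:p2, :p2], S[:p2, p2:]
    B = np.linalg.solve(S_xx, S_xy)   # regression coefficients S_xx^{-1} S_xy
    C = S_xy.T @ B                    # S_yx S_xx^{-1} S_xy, covariance of fitted values
    C = (C + C.T) / 2                 # remove rounding asymmetry before eigh
    evals, Bvecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    evals, Bvecs = evals[order], Bvecs[:, order]
    keep = evals > 1e-10 * evals[0]   # drop numerically zero eigenvalues
    evals, Bvecs = evals[keep], Bvecs[:, keep]
    A = (B @ Bvecs) / np.sqrt(evals)  # a_k = l_k^{-1/2} S_xx^{-1} S_xy b_k
    return evals, Bvecs, A
```

Substituting the returned `A` into $S_{xy} S_{yx} a_{k2} = l_k S_{xx} a_{k2}$ confirms that the two-step solution satisfies the covariance form of (9.3.2). With this scaling each $a'_{k2} x_{p_2}$ has unit sample variance.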
Muller (1981) shows that redundancy analysis is equivalent to orthogonally rotating the results of a multivariate regression analysis. DeSarbo and Jedidi (1986) give a number of other properties, together with modifications and extensions, of redundancy analysis.

9.3.4 Other Techniques for Relating Two Sets of Variables

A number of other techniques for relating two sets of variables were noted in Section 8.4. They include separate PCAs on the two groups of variables, followed by the calculation of a regression equation to predict, again separately, each PC from one set from the PCs in the other set. Another way of using PCA is to concatenate the two sets of variables and find PCs for the combined set of $(p_1 + p_2)$ variables. This is sometimes known as combined PCA, and is one of the methods that Bretherton et al. (1992) compare with