Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)

9.3. Canonical Correlation Analysis and Related Techniques

correlation between $\mathbf{a}'_{k1}\mathbf{x}_{p_1}$ and $\mathbf{a}'_{k2}\mathbf{x}_{p_2}$ is maximized, subject to $\mathbf{a}'_{k1}\mathbf{x}_{p_1}$, $\mathbf{a}'_{k2}\mathbf{x}_{p_2}$ both being uncorrelated with $\mathbf{a}'_{jh}\mathbf{x}_{p_h}$, $j = 1, 2, \ldots, (k-1)$; $h = 1, 2$. The name of the technique is confusingly similar to 'canonical variate analysis,' which is used in discrimination (see Section 9.1). In fact, there is a link between the two techniques (see, for example, Gittins, 1985, Chapter 4; Mardia et al., 1979, Exercise 11.5.4), but this will not be discussed in detail here. Because of this link, the view of canonical discriminant analysis as a two-stage PCA, noted by Campbell and Atchley (1981) and discussed in Section 9.1, is also a valid perspective for CCA. Although CCA treats the two sets of variables $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$ on an equal footing, it can still be used, as in the example of Section 9.3.2, if one set is clearly a set of responses while the other is a set of predictors. However, alternatives such as multivariate regression and other techniques discussed in Sections 8.4, 9.3.3 and 9.3.4 may be more appropriate in this case.

A number of authors have suggested that there are advantages in calculating PCs $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ separately for $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$ and then performing the CCA on $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ rather than $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$. Indeed, the main derivation of CCA given by Preisendorfer and Mobley (1988, Chapter 8) is in terms of the PCs for the two groups of variables. If $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ consist of all $p_1$, $p_2$ PCs, respectively, then the results using $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ are equivalent to those for $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$. This follows as $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ are exact linear functions of $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$, respectively, and, conversely, $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$ are exact linear functions of $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$, respectively. We are looking for 'optimal' linear functions of $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$, but this is equivalent to searching for 'optimal' linear functions of $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$, so we have the same analysis as that based on $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$.

Muller (1982) argues that using $\mathbf{z}_{p_1}$, $\mathbf{z}_{p_2}$ instead of $\mathbf{x}_{p_1}$, $\mathbf{x}_{p_2}$ can make some of the theory behind CCA easier to understand, and that it can help in interpreting the results of such an analysis. He also illustrates the use of PCA as a preliminary dimension-reducing technique by performing CCA based on just the first few elements of $\mathbf{z}_{p_1}$ and $\mathbf{z}_{p_2}$. Von Storch and Zwiers (1999, Section 14.1.6) note computational advantages in working with the PCs and also suggest using only the first few PCs to construct the canonical variates. This works well in the example given by Muller (1982), but cannot be expected to do so in general, for reasons similar to those already discussed in the contexts of regression (Chapter 8) and discriminant analysis (Section 9.1). There is simply no reason why those linear functions of $\mathbf{x}_{p_1}$ that are highly correlated with linear functions of $\mathbf{x}_{p_2}$ should necessarily be in the subspace spanned by the first few PCs of $\mathbf{x}_{p_1}$; they could equally well be related to the last few PCs of $\mathbf{x}_{p_1}$. The fact that a linear function of $\mathbf{x}_{p_1}$ has a small variance, as do the last few PCs, in no way prevents it from having a high correlation with some linear function of $\mathbf{x}_{p_2}$. As well as suggesting the use of PCs in CCA, Muller (1982) describes the closely related topic of using canonical correlation analysis to compare sets of PCs. This will be discussed further in Section 13.5.
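A minimal numerical sketch of the two points above, written in Python with NumPy only; the helper names `canonical_correlations` and `pc_scores` are introduced here for illustration and are not from the text. The canonical correlations are computed in the standard way, as the singular values of $\mathbf{S}_{xx}^{-1/2}\mathbf{S}_{xy}\mathbf{S}_{yy}^{-1/2}$, where $\mathbf{S}$ denotes sample covariance matrices. The first check confirms that CCA on the full sets of PCs reproduces CCA on the original variables; the second constructs a case where the correlated direction lies along the last, smallest-variance PC of the first set, so that CCA restricted to the leading PCs misses it.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations of two data matrices (rows = observations),
    computed as the singular values of S_xx^{-1/2} S_xy S_yy^{-1/2}."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / (n - 1), Y.T @ Y / (n - 1), X.T @ Y / (n - 1)

    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)      # symmetric inverse square root
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    return np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)

def pc_scores(X):
    """All PC scores of X: an exact, invertible linear transform of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

rng = np.random.default_rng(0)
n = 500

# 1. CCA on the full sets of PCs equals CCA on the original variables.
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 4))
Y[:, 0] += 0.8 * X[:, 0]                    # induce some cross-correlation
print(np.allclose(canonical_correlations(X, Y),
                  canonical_correlations(pc_scores(X), pc_scores(Y))))  # True

# 2. A small-variance direction can carry the correlation, so CCA on
#    only the first few PCs can miss it entirely.
u = rng.standard_normal(n)
X2 = 10.0 * rng.standard_normal((n, 5))     # high-variance noise columns
X2[:, -1] = 0.1 * u                         # tiny-variance column tied to Y2
Y2 = (u + 0.1 * rng.standard_normal(n)).reshape(-1, 1)

print(canonical_correlations(X2, Y2)[0])                    # near 1: full CCA finds it
print(canonical_correlations(pc_scores(X2)[:, :2], Y2)[0])  # near 0: leading PCs miss it
```

The second check is the numerical counterpart of the argument in the final paragraph: a linear function of $\mathbf{x}_{p_1}$ with small variance is in no way prevented from being the one that is highly correlated with a linear function of $\mathbf{x}_{p_2}$.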
