12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



9. Principal Components Used with Other Multivariate Techniques

An interesting connection between PCA and CCA is given by considering the problem of minimizing $\operatorname{var}[a_1' x_{p_1} - a_2' x_{p_2}]$. If constraints $a_1' \Sigma_{11} a_1 = a_2' \Sigma_{22} a_2 = 1$ are added to this problem, where $\Sigma_{11}$, $\Sigma_{22}$ are the covariance matrices for $x_{p_1}$, $x_{p_2}$, respectively, we obtain the first pair of canonical variates. If, instead, the constraint $a_1' a_1 + a_2' a_2 = 1$ is added, the coefficients $(a_1', a_2')'$ define the last PC for the vector of random variables $x = (x_{p_1}', x_{p_2}')'$. There has been much discussion in the literature of a variety of connections between multivariate techniques, including PCA and CCA. Gittins (1985, Sections 4.8, 5.6, 5.7) gives numerous references. In the special case where $p_1 = p_2$ and the same variables are measured in both $x_{p_1}$ and $x_{p_2}$, perhaps at different time periods or for matched pairs of individuals, Flury and Neuenschwander (1995) demonstrate a theoretical equivalence between the canonical variates and a common principal component model (see Section 13.5) when the latter model holds.

9.3.2 Example of CCA

Jeffers (1978, p. 136) considers an example with 15 variables measured on 272 sand and mud samples taken from various locations in Morecambe Bay, on the north west coast of England. The variables are of two types: eight variables are chemical or physical properties of the sand or mud samples, and seven variables measure the abundance of seven groups of invertebrate species in the samples. The relationships between the two groups of variables, describing environment and species, are of interest, so that canonical correlation analysis is an obvious technique to use.

Table 9.3 gives the coefficients for the first two pairs of canonical variates, together with the correlations between each pair (the canonical correlations). The definitions of each variable are not given here (see Jeffers (1978, pp. 103, 107)).
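The connection between PCA and CCA described before Section 9.3.2 can be checked numerically. What follows is a minimal sketch in Python with NumPy and is not part of the original text. It uses synthetic data, since the Morecambe Bay measurements are not reproduced here; the 272 samples and the blocks of eight and seven variables only mirror the layout of the example. The helper names `inv_sqrt` and `cca` are hypothetical, and CCA is computed by one standard route (whitening each block, then a singular value decomposition), which is an assumption rather than the method used by Jeffers. Because the quantity minimized is the variance of a difference, the unit-norm minimizer is the last PC's coefficient vector with the sign of its second block reversed.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def cca(X1, X2):
    """Canonical correlations, coefficient matrices and joint sample covariance."""
    p1 = X1.shape[1]
    S = np.cov(np.hstack([X1, X2]), rowvar=False)
    S11, S12, S22 = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    R1, R2 = inv_sqrt(S11), inv_sqrt(S22)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, rho, Vt = np.linalg.svd(R1 @ S12 @ R2, full_matrices=False)
    return rho, R1 @ U, R2 @ Vt.T, S


# Synthetic stand-in data (an assumption, NOT the Morecambe Bay data):
# two latent factors shared by an 8-variable block and a 7-variable block.
rng = np.random.default_rng(0)
n, p1, p2 = 272, 8, 7
z = rng.standard_normal((n, 2))
X1 = z @ rng.standard_normal((2, p1)) + rng.standard_normal((n, p1))
X2 = z @ rng.standard_normal((2, p2)) + rng.standard_normal((n, p2))

rho, A1, A2, S = cca(X1, X2)
a1, a2 = A1[:, 0], A2[:, 0]          # first pair of canonical coefficients

# var[a1'x1 - a2'x2] = b' S b with b = (a1', -a2')'.
b = np.concatenate([a1, -a2])
var_diff_cca = b @ S @ b             # under a1'S11 a1 = a2'S22 a2 = 1

# Under a1'a1 + a2'a2 = 1 the minimizer is the eigenvector of S with the
# smallest eigenvalue, i.e. the last PC; eigh returns eigenvalues ascending.
evals, evecs = np.linalg.eigh(S)
v = evecs[:, 0]                      # v = (a1', -a2')'
var_diff_pc = v @ S @ v

print("first two canonical correlations:", rho[:2])
print("min variance, canonical constraints:", var_diff_cca)
print("min variance, unit-norm constraint: ", var_diff_pc)
```

With the unit-variance constraints the expansion of the variance reduces to $2 - 2 a_1' \Sigma_{12} a_2$, so minimizing it is the same as maximizing the correlation between the two linear combinations, and the minimum equals $2 - 2\rho_1$ for first canonical correlation $\rho_1$. With the unit-norm constraint the minimum is the smallest eigenvalue of the covariance matrix of $x$.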
The first canonical variate for species is dominated by a single species. The corresponding canonical variate for the environmental variables involves non-trivial coefficients for four of the variables, but is not difficult to interpret (Jeffers, 1978, p. 138). The second pair of canonical variates has fairly large coefficients for three species and three environmental variables.

Jeffers (1978, pp. 105–109) also looks at PCs for the environmental and species variables separately, and concludes that four and five PCs, respectively, are necessary to account for most of the variation in each group. He goes on to look, informally, at the between-group correlations for each set of retained PCs.

Instead of simply looking at the individual correlations between PCs for different groups, an alternative is to do a canonical correlation analysis based only on the retained PCs, as suggested by Muller (1982). In the present example this analysis gives values of 0.420 and 0.258 for the first two canonical correlations, compared with 0.559 and 0.334 when all the variables are used. The first two canonical variates for the environmental
