12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


of dividing the jth column of X by the standard deviation of $x_j$ to give a correlation biplot, here the jth column is divided by the mean of $x_j$. Of course, this only makes sense for certain types of non-negative variables, but Underhill (1990) shows that for such variables the resulting biplot gives a useful view of the data and variables. The cosines of the angles between the $h_j^*$ still provide approximations to the correlations between variables, but the lengths of the vectors $h_j^*$ now give information on the variability of the $x_j$ relative to their means.

Finally, the biplot can be adapted to cope with missing values by introducing weights $w_{ij}$ for each observation $x_{ij}$ when approximating $x_{ij}$ by $g_i^{*\prime} h_j^*$. A weight of zero is given to missing values and a unit weight to those values which are present. The appropriate values for $g_i^*$, $h_j^*$ can be calculated using an algorithm which handles general weights, due to Gabriel and Zamir (1979). For a more general discussion of missing data in PCA see Section 13.6.

5.4 Correspondence Analysis

The technique commonly called correspondence analysis has been ‘rediscovered’ many times in several different guises with various names, such as ‘reciprocal averaging’ or ‘dual scaling.’ Greenacre (1984) provides a comprehensive treatment of the subject; in particular his Section 1.3 and Chapter 4 discuss, respectively, the history and the various different approaches to the topic. Benzécri (1992) is also comprehensive, and more recent, but its usefulness is limited by a complete lack of references to other sources. Two shorter texts, which concentrate on the more practical aspects of correspondence analysis, are Clausen (1998) and Greenacre (1993).

The name ‘correspondence analysis’ is derived from the French ‘analyse des correspondances’ (Benzécri, 1980). Although, at first sight, correspondence analysis seems unrelated to PCA it can be shown that it is, in fact, equivalent to a form of PCA for discrete (generally nominal) variables (see Section 13.1). The technique is often used to provide a graphical representation of data in two dimensions. The data are normally presented in the form of a contingency table, but because of this graphical usage the technique is introduced briefly in the present chapter. Further discussion of correspondence analysis and various generalizations of the technique, together with its connections to PCA, is given in Sections 13.1, 14.1 and 14.2.

Suppose that a set of data is presented in the form of a two-way contingency table, in which a set of n observations is classified according to its values on two discrete random variables. Thus the information available is the set of frequencies $\{n_{ij},\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c\}$, where $n_{ij}$ is the number of observations that take the ith value for the first (row) variable
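
The data layout described above can be illustrated with a minimal Python sketch that tabulates the frequencies $n_{ij}$ from a list of paired classifications. The observations, the level names and the helper `contingency_table` are all hypothetical, chosen only for illustration, and the sketch assumes that the index j runs over the values of the second (column) variable.

```python
from collections import Counter


def contingency_table(pairs, row_levels, col_levels):
    """Return the r x c table of frequencies n_ij as a list of lists.

    pairs      : iterable of (row_value, col_value), one per observation
    row_levels : the r possible values of the first (row) variable
    col_levels : the c possible values of the second (column) variable
    """
    counts = Counter(pairs)  # pairs that never occur count as zero
    return [[counts[(a, b)] for b in col_levels] for a in row_levels]


# Hypothetical data: 8 observations, each classified on two discrete variables.
observations = [
    ("blue", "fair"), ("blue", "fair"), ("blue", "dark"),
    ("brown", "dark"), ("brown", "dark"), ("brown", "fair"),
    ("green", "fair"), ("brown", "dark"),
]
row_levels = ["blue", "brown", "green"]  # r = 3
col_levels = ["fair", "dark"]            # c = 2

table = contingency_table(observations, row_levels, col_levels)

# Marginal totals and the grand total n follow directly from the n_ij.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

for level, row in zip(row_levels, table):
    print(level, row)
```

The row and column totals are computed here only because they fall out of the table directly; the sketch stops at tabulation and does not attempt the correspondence analysis itself.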
