12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

5. Graphical Representation of Data Using Principal Components

and the jth value for the second (column) variable. Let N be the (r × c) matrix whose (i, j)th element is n_ij.

There are a number of seemingly different approaches, all of which lead to correspondence analysis; Greenacre (1984, Chapter 4) discusses these various possibilities in some detail. Whichever approach is used, the final product is a sequence of pairs of vectors (f_1, g_1), (f_2, g_2), ..., (f_q, g_q), where f_k, k = 1, 2, ..., are r-vectors of scores or coefficients for the rows of N, and g_k, k = 1, 2, ..., are c-vectors of scores or coefficients for the columns of N. These pairs of vectors are such that the first q such pairs give a ‘best-fitting’ representation in q dimensions, in a sense defined in Section 13.1, of the matrix N, and of its rows and columns. It is common to take q = 2. The rows and columns can then be plotted on a two-dimensional diagram; the coordinates of the ith row are the ith elements of f_1, f_2, i = 1, 2, ..., r, and the coordinates of the jth column are the jth elements of g_1, g_2, j = 1, 2, ..., c.

Such two-dimensional plots cannot in general be compared in any direct way with plots made with respect to PCs or classical biplots, as N is a different type of data matrix from that used for PCs or their biplots. However, Greenacre (1984, Sections 9.6 and 9.10) gives examples where correspondence analysis is done with an ordinary (n × p) data matrix, X replacing N. This is only possible if all variables are measured in the same units. In these circumstances, correspondence analysis produces a simultaneous two-dimensional plot of the rows and columns of X, which is precisely what is done in a biplot, but the two analyses are not the same.

Both the classical biplot and correspondence analysis determine the plotting positions for rows and columns of X from the singular value decomposition (SVD) of a matrix (see Section 3.5).
For the classical biplot, the SVD is calculated for the column-centred matrix X, but in correspondence analysis, the SVD is found for a matrix of residuals, after subtracting ‘expected values assuming independence of rows and columns’ from X/n (see Section 13.1). The effect of looking at residual (or interaction) terms is (Greenacre, 1984, p. 288) that all the dimensions found by correspondence analysis represent aspects of the ‘shape’ of the data, whereas in PCA the first PC often simply represents ‘size’ (see Sections 4.1, 13.2). Correspondence analysis provides one way in which a data matrix may be adjusted in order to eliminate some uninteresting feature such as ‘size,’ before finding an SVD and hence ‘PCs.’ Other possible adjustments are discussed in Sections 13.2 and 14.2.3.

As with the biplot and its choice of α, there are several different ways of plotting the points corresponding to rows and columns in correspondence analysis. Greenacre and Hastie (1987) give a good description of the geometry associated with the most usual of these plots. Whereas the biplot may approximate Euclidean or Mahalanobis distances between rows, in correspondence analysis the points are often plotted to optimally approximate so-called χ² distances (see Greenacre (1984), Benzécri (1992)).
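
The contrast between the two SVDs described above can be illustrated with a short NumPy sketch. It is not taken from the text: the function names, the small table N, and the scaling choices are assumptions made for illustration. In particular, the residuals are standardized by the square roots of the row and column masses and the points are returned in so-called principal coordinates, which is one common convention; the precise definition is deferred in the text to Section 13.1 and may differ in detail.

```python
import numpy as np


def classical_biplot(X, q=2, alpha=1.0):
    """Row and column markers from the SVD of the column-centred matrix.

    alpha splits the singular values between rows and columns; the
    default of 1.0 is an illustrative choice, not a recommendation.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # column-centre X
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rows = U[:, :q] * s[:q] ** alpha           # plotting positions for rows
    cols = Vt[:q].T * s[:q] ** (1.0 - alpha)   # plotting positions for columns
    return rows, cols


def correspondence_analysis(N, q=2):
    """Row and column scores from the SVD of a matrix of residuals.

    Assumes all row and column totals of N are positive.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                            # N/n, the table of proportions
    r = P.sum(axis=1)                          # row masses
    c = P.sum(axis=0)                          # column masses
    resid = P - np.outer(r, c)                 # subtract expected values under independence
    S = resid / np.sqrt(np.outer(r, c))        # assumed standardization of the residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U[:, :q] * s[:q]) / np.sqrt(r)[:, None]   # columns are f_1, ..., f_q
    G = (Vt[:q].T * s[:q]) / np.sqrt(c)[:, None]   # columns are g_1, ..., g_q
    return F, G, s


# Hypothetical 4 x 3 contingency table, used only to exercise the functions.
N = np.array([[20, 5, 10],
              [10, 15, 5],
              [5, 10, 20],
              [8, 8, 8]])

F, G, s = correspondence_analysis(N, q=2)      # coordinates for a 2-D plot
rows, cols = classical_biplot(N, q=2)          # same matrix treated as ordinary data
```

Under these assumed scalings, Euclidean distances between rows of F computed with all dimensions retained reproduce the χ² distances between row profiles, and the squared singular values sum to the usual χ² statistic divided by n. With q = 2 the plotted distances are approximations to those quantities.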
