Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

13.1. Principal Component Analysis for Discrete Data

The first PC maximizes the (weighted) sum of squared correlations between the PC and each of the p variables. The weights are unity for correlation-based PCA and equal to the sample variances for covariance-based PCA. These results follow from the sample version of Property A6 in Section 2.3 and the discussion that follows the property.

Korhonen and Siljamäki (1998) define the first ordinal principal component as the ranking of the n observations for which the sum of squared rank correlation coefficients between the ordinal PC and each of the p variables is maximized. They suggest that either Spearman's rank correlation or Kendall's τ could be used as the 'rank correlation' in this definition. The first ordinal PC can be computed when the data themselves are ordinal, but it can also be obtained for continuous data, and may be useful if an optimal ordering of the data, rather than a continuous derived variable that maximizes variance, is of primary interest. Computationally, the first ordinal principal component may be found by an exhaustive search for small examples, but the optimization problem is non-trivial for moderate or large data sets. Korhonen and Siljamäki (1998) note that the development of a fast algorithm for the procedure is a topic for further research, as is the question of whether it is useful and feasible to look for 2nd, 3rd, ... ordinal PCs.

Baba (1995) uses the simpler procedure for ranked data of calculating rank correlations and conducting an ordinary PCA on the resulting (rank) correlation matrix. It is shown that useful results may be obtained for some types of data.

Returning to contingency tables, the usual 'adaptation' of PCA to such data is correspondence analysis. This technique is the subject of the remainder of this section.
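Baba's procedure of running an ordinary PCA on a rank correlation matrix can be sketched in a few lines. Everything below (the function name, the random test data) is illustrative rather than taken from the text; ordinal ranks via a double argsort stand in for Spearman ranking and are adequate only for tie-free data.

```python
import numpy as np

def rank_correlation_pca(X):
    """PCA on the Spearman rank correlation matrix of X (n samples x p variables).

    A minimal sketch of the Baba (1995) procedure described above; the
    function name and interface are illustrative, not from the original.
    """
    # Replace each column by its ranks (a double argsort gives ordinal
    # ranks 0..n-1; average ranks would be needed to handle ties).
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Spearman's rank correlation is the Pearson correlation of the ranks.
    R = np.corrcoef(ranks, rowvar=False)
    # Ordinary PCA: eigendecomposition of the (rank) correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
eigvals, eigvecs = rank_correlation_pca(data)
```

As with any correlation-based PCA, the eigenvalues sum to p (here 4), the trace of the rank correlation matrix.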
Correspondence analysis was introduced in Section 5.4 as a graphical technique, but it is appropriate to discuss it in a little more detail here because, as well as being used as a graphical means of displaying contingency table data (see Section 5.4), the technique has been described by some authors (for example, De Leeuw and van Rijckevorsel, 1980) as a form of PCA for nominal data. To see how this description is valid, consider, as in Section 5.4, a data set of n observations arranged in a two-way contingency table, with $n_{ij}$ denoting the number of observations that take the $i$th value for the first (row) variable and the $j$th value for the second (column) variable, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$. Let $\mathbf{N}$ be the $(r \times c)$ matrix with $(i,j)$th element $n_{ij}$, and define
$$
\mathbf{P} = \frac{1}{n}\mathbf{N}, \quad \mathbf{r} = \mathbf{P}\mathbf{l}_c, \quad \mathbf{c} = \mathbf{P}'\mathbf{l}_r, \quad \mathbf{X} = \mathbf{P} - \mathbf{r}\mathbf{c}',
$$
where $\mathbf{l}_c$, $\mathbf{l}_r$ are vectors of $c$ and $r$ elements, respectively, with all elements unity. If the variable defining the rows of the contingency table is independent of the variable defining the columns, then the matrix of 'expected counts' is given by $n\mathbf{r}\mathbf{c}'$. Thus, $\mathbf{X}$ is a matrix of the residuals that remain when the 'independence' model is fitted to $\mathbf{P}$.

The generalized singular value decomposition (SVD) of $\mathbf{X}$ is defined by
$$
\mathbf{X} = \mathbf{V}\mathbf{M}\mathbf{B}', \qquad (13.1.1)
$$
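The construction of P, r, c, and X can be illustrated numerically. The contingency table below is invented for the sketch, and the scaling used to obtain the generalized SVD from an ordinary SVD (the metrics standard in correspondence analysis) is an assumption here, since the excerpt has not yet defined the normalization of V and B.

```python
import numpy as np

# Invented 3x3 contingency table of counts, for illustration only.
N = np.array([[20., 10.,  5.],
              [ 8., 15., 12.],
              [ 2.,  5., 23.]])
n = N.sum()

P = N / n                  # matrix of relative frequencies, P = (1/n) N
r = P.sum(axis=1)          # row margins,    r = P l_c
c = P.sum(axis=0)          # column margins, c = P' l_r
X = P - np.outer(r, c)     # residuals from the 'independence' model

# X has zero row and column sums by construction.  The generalized SVD
# X = V M B' is obtained from the ordinary SVD of the row/column-scaled
# matrix diag(r)^{-1/2} X diag(c)^{-1/2} (an assumed, though standard,
# choice of metrics for correspondence analysis).
S = np.diag(r ** -0.5) @ X @ np.diag(c ** -0.5)
U, M, At = np.linalg.svd(S)
V = np.diag(r ** 0.5) @ U
B = np.diag(c ** 0.5) @ At.T
```

Undoing the scaling confirms the factorization: V diag(M) B' reproduces X exactly, and the leading singular value in M measures the dominant departure from independence.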
