12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

340 13. Principal Component Analysis for Special Types of Data

and $x'_2 = x_1$. There is also a third possibility, namely $x''_1 = x'_1$, $x''_2 = x_2$. For $p > 2$ variables there are many more possible permutations of this type, and Cox (1972) suggests that an alternative to PCA might be to transform to independent binary variables using such permutations. Bloomfield (1974) investigates Cox's suggestion in more detail, and presents an example having four variables. In examples involving two variables we can write

$$x'_1 = x_1 + x_2 \pmod{2},$$

using the notation above.

For more than two variables, not all permutations can be written in this way, but Bloomfield restricts his attention to those permutations which can. Thus, for a set of $p$ binary variables $x_1, x_2, \ldots, x_p$, he considers transformations to $z_1, z_2, \ldots, z_p$ such that, for $k = 1, 2, \ldots, p$, we have either

$$z_k = x_j \quad \text{for some } j,$$

or

$$z_k = x_i + x_j \pmod{2} \quad \text{for some } i, j,\ i \neq j.$$

He is thus restricting attention to linear transformations of the variables (as in PCA) and the objective in this case is to choose a transformation that simplifies the structure between variables. The data can be viewed as a contingency table, and Bloomfield (1974) interprets a simpler structure as one that reduces high order interaction terms between variables. This idea is illustrated on a 4-variable example, and several transformations are examined, but (unlike PCA) there is no algorithm for finding a unique 'best' transformation.

A second special type of discrete data occurs when, for each observation, only ranks and not actual values are given for each variable. For such data, all the columns of the data matrix $X$ have the same sum, so that the data are constrained in a way similar to that which holds for compositional data (see Section 13.3).
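
Both constructions above are easy to check numerically. The following is a minimal Python sketch, not Bloomfield's or Cox's own procedure: the function names `mod2_transform` and `column_sums`, the 0-based indexing, the `spec` encoding and the small rank matrix are all assumptions made for illustration. It applies a transformation of the form $z_k = x_j$ or $z_k = x_i + x_j$ (modulo 2) to binary data, and confirms that columns of untied ranks share a common sum.

```python
from itertools import product


def mod2_transform(rows, spec):
    """Apply a transformation of the type described above to binary data.

    rows: iterable of tuples of 0/1 values (x_1, ..., x_p).
    spec: one entry per new variable z_k; either an int j (z_k = x_j)
          or a pair (i, j) with i != j (z_k = x_i + x_j modulo 2).
          Indices are 0-based here, an illustrative convention.
    """
    out = []
    for x in rows:
        z = []
        for s in spec:
            if isinstance(s, int):
                z.append(x[s])
            else:
                i, j = s
                if i == j:
                    raise ValueError("i and j must differ")
                z.append((x[i] + x[j]) % 2)
        out.append(tuple(z))
    return out


def column_sums(matrix):
    """Return the sum of each column of a row-major matrix."""
    return [sum(col) for col in zip(*matrix)]


# All four patterns of two binary variables (x_1, x_2).
patterns = list(product((0, 1), repeat=2))

# z_1 = x_1 + x_2 (modulo 2), z_2 = x_2.
transformed = mod2_transform(patterns, [(0, 1), 1])

# The transformation only reorders the four possible patterns.
is_permutation = sorted(transformed) == sorted(patterns)

# Hypothetical rank data: n = 4 observations, 3 variables, no ties.
# Each column is a permutation of 1..4.
ranks = [(1, 3, 2), (2, 1, 4), (3, 4, 1), (4, 2, 3)]
rank_sums = column_sums(ranks)

print(transformed)
print(is_permutation)
print(rank_sums)
```

In the two-variable case the transformation maps the four possible patterns onto themselves in a different order, which is why it can be regarded as a permutation. For the rank data, every column sums to $n(n+1)/2 = 10$, which is the constraint referred to above.
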
Gower (1967) discusses some geometric implications that follow from the constraints imposed by this type of ranked data and by compositional data.

Another possible adaptation of PCA to discrete data is to replace variances and covariances by measures of dispersion and association that are more relevant to discrete variables. For the particular case of contingency table data, many different measures of association have been suggested (Bishop et al., 1975, Chapter 11). It is also possible to define measures of variation other than variance for such data, for example Gini's measure (see Bishop et al., 1975, Section 11.3.4). An approach of this sort is proposed by Korhonen and Siljamäki (1998) for ordinal data. The objective they tackle is to find an 'optimal' ordering or ranking of the multivariate data so that, instead of assigning a score on a continuum to each of $n$ observations, what is required is a rank between 1 and $n$. They note that the
