Jolliffe, I. Principal Component Analysis, 2nd ed. Springer, 2002 (518 pp.)

…ulations, and also examines how PCs from the different populations can be compared. Section 13.6 discusses possible ways of dealing with missing data in a PCA, and Section 13.7 describes the use of PCs in statistical process control. Finally, Section 13.8 covers, rather briefly, a number of other types of data. These include vector or directional data, data presented as intervals, species abundance data and large data sets.

13.1 Principal Component Analysis for Discrete Data

When PCA is used as a descriptive technique, there is no reason for the variables in the analysis to be of any particular type. At one extreme, x may have a multivariate normal distribution, in which case all the relevant inferential results mentioned in Section 3.7 can be used. At the opposite extreme, the variables could be a mixture of continuous, ordinal or even binary (0/1) variables. It is true that variances, covariances and correlations have especial relevance for multivariate normal x, and that linear functions of binary variables are less readily interpretable than linear functions of continuous variables. However, the basic objective of PCA, namely to summarize most of the 'variation' present in the original set of p variables using a smaller number of derived variables, can be achieved regardless of the nature of the original variables.

For data in which all variables are binary, Gower (1966) points out that PCA does provide a plausible low-dimensional representation. This follows because PCA is equivalent to a principal coordinate analysis based on the commonly used definition of similarity between two individuals (observations) as the proportion of the p variables for which the two individuals take the same value (see Section 5.2). Cox (1972), however, suggests an alternative to PCA for binary data.
His idea, which he calls 'permutational principal components,' is based on the fact that a set of data consisting of p binary variables can be expressed in a number of different but equivalent ways. As an example, consider the following two variables from a cloud-seeding experiment:

$$
\begin{aligned}
x_1 &= \begin{cases} 1 & \text{if rain falls in seeded area,}\\ 0 & \text{if no rain falls in seeded area} \end{cases}\\[6pt]
x_2 &= \begin{cases} 1 & \text{if rain falls in control area,}\\ 0 & \text{if no rain falls in control area.} \end{cases}
\end{aligned}
$$

Instead of $x_1, x_2$ we could define

$$
x_1' = \begin{cases} 0 & \text{if both areas have rain or both areas dry}\\ 1 & \text{if one area has rain, the other is dry} \end{cases}
$$
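
The recoding in Cox's example can be made concrete. From the definitions of x1, x2 and x′1, x′1 = 0 exactly when x1 = x2, so x′1 is the sum of x1 and x2 modulo 2 (their exclusive-or). The NumPy sketch below uses made-up outcomes for six hypothetical days. Pairing x′1 with x1 is an illustrative assumption only, not necessarily the second variable Cox goes on to define; it is chosen to show that the recoded pair carries the same information as the original pair.

```python
import numpy as np

# Hypothetical outcomes for six days (illustrative data, not from the experiment).
x1 = np.array([1, 1, 0, 0, 1, 0])  # 1 if rain falls in seeded area
x2 = np.array([1, 0, 1, 0, 0, 0])  # 1 if rain falls in control area

# x1' = 0 if both areas have rain or both are dry, 1 if exactly one has rain,
# i.e. x1' is the sum of x1 and x2 modulo 2 (exclusive-or).
x1_prime = (x1 + x2) % 2

# Illustrative pairing (an assumption): keeping x1 alongside x1' loses nothing,
# because x2 is recovered as (x1 + x1') mod 2.
x2_recovered = (x1 + x1_prime) % 2

print(x1_prime)                           # [0 1 1 0 1 0]
print(np.array_equal(x2_recovered, x2))   # True
```

Because (x1, x2) can be recovered from (x1, x′1), the two pairs are different but equivalent descriptions of the same data. Their variances and covariances differ, however, so a PCA computed on one pair will not in general coincide with a PCA computed on the other.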

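Gower's (1966) observation that PCA of binary data is equivalent to a principal coordinate analysis based on the simple matching similarity (the proportion of the p variables on which two individuals take the same value) can be checked numerically. The sketch below is an illustration under stated assumptions, not Gower's own derivation: it uses NumPy, a randomly generated 0/1 data matrix (hypothetical data), covariance-based PCA on column-centred data, and classical principal coordinate analysis with the assumed dissimilarity d_ij^2 = p(1 − s_ij). For 0/1 data this is the number of mismatches, i.e. the squared Euclidean distance between rows, so the double-centred matrix equals the inner-product matrix of the centred data and the principal coordinates match the PC scores up to the sign of each column.

```python
import numpy as np

# Hypothetical data: 20 individuals measured on p = 5 binary (0/1) variables.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 5)).astype(float)
n, p = X.shape
k = 2  # number of components / coordinates to compare

# Covariance-based PCA: scores of the column-centred data on the first k PCs.
Xc = X - X.mean(axis=0)
_, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt[:k].T

# Simple matching similarity: proportion of the p variables on which
# individuals i and j take the same value.
S = (X[:, None, :] == X[None, :, :]).mean(axis=2)

# Principal coordinate analysis with the assumed dissimilarity
# d_ij^2 = p * (1 - s_ij), the number of mismatches, which for 0/1 data
# equals the squared Euclidean distance between rows i and j.
D2 = p * (1.0 - S)
J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
G = -0.5 * J @ D2 @ J                 # double-centred inner-product matrix
evals, evecs = np.linalg.eigh(G)      # eigh returns eigenvalues in ascending order
top = np.argsort(evals)[::-1][:k]     # indices of the k largest eigenvalues
pco_coords = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))

# Each eigenvector is defined only up to sign, so align columns before comparing.
signs = np.sign(np.sum(pc_scores * pco_coords, axis=0))
print(np.allclose(G, Xc @ Xc.T))  # True
print(np.allclose(pc_scores, pco_coords * signs))
```

Choosing d_ij^2 = 2(1 − s_ij) instead would multiply every coordinate by the same constant, sqrt(2/p), so the configuration is unchanged apart from scale. The column-by-column comparison assumes the leading eigenvalues are distinct; with tied eigenvalues the individual axes are not uniquely defined in either method.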