Jolliffe, I. T. Principal Component Analysis (2nd ed., Springer, 2002)
1.1. Definition and Derivation of Principal Components

Figure 1.2. Plot of the 50 observations from Figure 1.1 with respect to their PCs z_1, z_2.

Figure 1.1 gives a plot of 50 observations on two highly correlated variables x_1, x_2. There is considerable variation in both variables, though rather more in the direction of x_2 than x_1. If we transform to PCs z_1, z_2, we obtain the plot given in Figure 1.2. It is clear that there is greater variation in the direction of z_1 than in either of the original variables, but very little variation in the direction of z_2. More generally, if a set of p (> 2) variables has substantial correlations among them, then the first few PCs will account for most of the variation in the original variables. Conversely, the last few PCs identify directions in which there is very little variation; that is, they identify near-constant linear relationships among the original variables.

As a taster of the many examples to come later in the book, Figure 1.3 provides a plot of the values of the first two principal components in a 7-variable example. The data presented here consist of seven anatomical measurements on 28 students, 11 women and 17 men. This data set, and similar ones for other groups of students, are discussed in more detail in Sections 4.1 and 5.1. The important thing to note here is that the first two PCs account for 80 percent of the total variation in the data set, so that the 2-dimensional picture of the data given in Figure 1.3 is a reasonably faithful representation of the positions of the 28 observations in 7-dimensional space.

Figure 1.3. Student anatomical measurements: plots of 28 students with respect to their first two PCs. × denotes women; ◦ denotes men.

It is also clear from the figure that the first PC, which, as we shall see later, can be interpreted as a measure of the overall size of each student, does a good job of separating the women and men in the sample.

Having defined PCs, we need to know how to find them. Consider, for the moment, the case where the vector of random variables x has a known covariance matrix Σ. This is the matrix whose (i, j)th element is the (known) covariance between the ith and jth elements of x when i ≠ j, and the variance of the jth element of x when i = j. The more realistic case, where Σ is unknown, follows by replacing Σ with a sample covariance matrix S (see Chapter 3). It turns out that, for k = 1, 2, ..., p, the kth PC is given by z_k = α_k′x, where α_k is an eigenvector of Σ corresponding to its kth largest eigenvalue λ_k. Furthermore, if α_k is chosen to have unit length (α_k′α_k = 1), then var(z_k) = λ_k, where var(z_k) denotes the variance of z_k.

The following derivation of these results is the standard one given in many multivariate textbooks; it may be skipped by readers who are mainly interested in the applications of PCA. Such readers could also skip
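The eigenvector recipe above is easy to check numerically. The following is a minimal sketch in Python with NumPy, not the book's own computation: it simulates 50 observations on two highly correlated variables (stand-ins for the data of Figure 1.1, not the actual values), takes the unit-length eigenvectors of the sample covariance matrix S as the coefficient vectors α_k, and confirms that the variance of each score z_k equals the corresponding eigenvalue λ_k, with almost all of the variation concentrated in z_1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 50 observations on two highly correlated variables x1, x2
# (illustrative stand-ins for the data plotted in Figure 1.1).
n = 50
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # strongly correlated with x1
X = np.column_stack([x1, x2])

# Sample covariance matrix S (unbiased, divisor n - 1).
S = np.cov(X, rowvar=False)

# Eigendecomposition: columns of A are unit-length eigenvectors alpha_k.
# eigh returns eigenvalues in ascending order, so reverse to descending
# so that lam[0] is the largest eigenvalue lambda_1.
lam, A = np.linalg.eigh(S)
lam, A = lam[::-1], A[:, ::-1]

# PC scores: z_k = alpha_k' (x - mean), one column per component.
Z = (X - X.mean(axis=0)) @ A

# var(z_k) equals the kth largest eigenvalue lambda_k.
print(np.allclose(Z.var(axis=0, ddof=1), lam))  # True

# Nearly all the variation lies in the direction of z1.
print(lam[0] / lam.sum())  # close to 1
```

Note that the scores are computed from mean-centred data, which leaves their variances unchanged; `eigh` is used rather than `eig` because S is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.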