Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
Preface to the First Edition

Principal component analysis is probably the oldest and best known of the techniques of multivariate analysis. It was first introduced by Pearson (1901), and developed independently by Hotelling (1933). Like many multivariate methods, it was not widely used until the advent of electronic computers, but it is now well entrenched in virtually every statistical computer package.

The central idea of principal component analysis is to reduce the dimensionality of a data set in which there are a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This reduction is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix. Thus, the definition and computation of principal components are straightforward but, as will be seen, this apparently simple technique has a wide variety of different applications, as well as a number of different derivations. Any feelings that principal component analysis is a narrow subject should soon be dispelled by the present book; indeed some quite broad topics which are related to principal component analysis receive no more than a brief mention in the final two chapters.

Although the term ‘principal component analysis’ is in common usage, and is adopted in this book, other terminology may be encountered for the same technique, particularly outside of the statistical literature. For example, the phrase ‘empirical orthogonal functions’ is common in meteorology, and in other fields the term ‘factor analysis’ may be used when ‘principal component analysis’ is meant. References to ‘eigenvector analysis’ or ‘latent vector analysis’ may also camouflage principal component analysis. Finally, some authors refer to principal components analysis rather than principal component analysis. To save space, the abbreviations PCA and PC will be used frequently in the present text.

The book should be useful to readers with a wide variety of backgrounds. Some knowledge of probability and statistics, and of matrix algebra, is necessary, but this knowledge need not be extensive for much of the book. It is expected, however, that most readers will have had some exposure to multivariate analysis in general before specializing to PCA. Many textbooks on multivariate analysis have a chapter or appendix on matrix algebra, e.g. Mardia et al. (1979, Appendix A), Morrison (1976, Chapter 2), Press (1972, Chapter 2), and knowledge of a similar amount of matrix algebra will be useful in the present book.

After an introductory chapter which gives a definition and derivation of PCA, together with a brief historical review, there are three main parts to the book. The first part, comprising Chapters 2 and 3, is mainly theoretical and some small parts of it require rather more knowledge of matrix algebra and vector spaces than is typically given in standard texts on multivariate analysis. However, it is not necessary to read all of these chapters in order to understand the second, and largest, part of the book. Readers who are mainly interested in applications could omit the more theoretical sections, although Sections 2.3, 2.4, 3.3, 3.4 and 3.8 are likely to be valuable to most readers; some knowledge of the singular value decomposition which is discussed in Section 3.5 will also be useful in some of the subsequent chapters.

This second part of the book is concerned with the various applications of PCA, and consists of Chapters 4 to 10 inclusive. Several chapters in this part refer to other statistical techniques, in particular from multivariate analysis. Familiarity with at least the basic ideas of multivariate analysis will therefore be useful, although each technique is explained briefly when it is introduced.

The third part, comprising Chapters 11 and 12, is a mixture of theory and potential applications. A number of extensions, generalizations and uses of PCA in special circumstances are outlined. Many of the topics covered in these chapters are relatively new, or outside the mainstream of statistics and, for several, their practical usefulness has yet to be fully explored. For these reasons they are covered much more briefly than the topics in earlier chapters.

The book is completed by an Appendix which contains two sections. The first section describes some numerical algorithms for finding PCs, and the second section describes the current availability of routines for performing PCA and related analyses in five well-known computer packages.
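
The central idea stated in this preface, that computing principal components reduces to an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix, can be illustrated with a minimal sketch. The code below is not taken from the book: it assumes NumPy, uses the sample covariance matrix as the symmetric matrix, and runs on synthetic data invented for the illustration. The function name `principal_components` and all variable names are hypothetical.

```python
import numpy as np


def principal_components(X):
    """Illustrative PCA via eigendecomposition of the sample covariance matrix.

    X is an (n observations) x (p variables) array. Returns the eigenvalues in
    descending order, the matching eigenvectors as columns, and the PC scores.
    """
    Xc = X - X.mean(axis=0)                # centre each variable
    S = np.cov(Xc, rowvar=False)           # p x p symmetric positive-semidefinite matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh is for symmetric matrices; ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so the largest variance comes first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Xc @ eigvecs                  # the new, uncorrelated variables (the PCs)
    return eigvals, eigvecs, scores


# Synthetic data (an assumption for this sketch): two strongly interrelated
# variables driven by a common factor z, plus one unrelated variable.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([
    z + 0.1 * rng.normal(size=(200, 1)),
    2.0 * z + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigvals, eigvecs, scores = principal_components(X)
explained = eigvals / eigvals.sum()        # proportion of total variation per PC
print("proportion of variation per PC:", np.round(explained, 3))
```

In this sketch the covariance matrix of `scores` is diagonal up to rounding error, which is the sense in which the principal components are uncorrelated, and the entries of `explained` are ordered so that the first few components account for most of the variation.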