Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)
3.6 Probability Distributions for Sample Principal Components

A considerable amount of mathematical effort has been expended on deriving probability distributions, mostly asymptotic, for the coefficients in the sample PCs and for the variances of sample PCs or, equivalently, on finding distributions for the eigenvectors and eigenvalues of a sample covariance matrix. For example, the first issue of Journal of Multivariate Analysis in 1982 contained three papers, totalling 83 pages, on the topic. In recent years there has probably been less theoretical work on probability distributions directly connected to PCA; this may simply reflect the fact that there is little remaining still to do. The distributional results that have been derived suffer from three drawbacks:

(i) they usually involve complicated mathematics;
(ii) they are mostly asymptotic;
(iii) they are often based on the assumption that the original set of variables has a multivariate normal distribution.

Despite these drawbacks, the distributional results are useful in some circumstances, and a selection of the main circumstances is given in this section. Their use in inference about the population PCs, given sample PCs, is discussed in the next section.

Assume that x ∼ N(µ, Σ), that is, x has a p-variate normal distribution with mean µ and covariance matrix Σ. Although µ need not be given, Σ is assumed known. Then

\[
(n-1)S \sim W_p(\Sigma,\, n-1),
\]

that is, (n − 1)S has the so-called Wishart distribution with parameters Σ, (n − 1) (see, for example, Mardia et al. (1979, Section 3.4)).
Therefore, investigation of the sampling properties of the coefficients and variances of the sample PCs is equivalent to looking at sampling properties of eigenvectors and eigenvalues of Wishart random variables.

The density function of a matrix V that has the W_p(Σ, n − 1) distribution is

\[
c\,|V|^{(n-p-2)/2} \exp\left\{-\tfrac{1}{2}\operatorname{tr}(\Sigma^{-1}V)\right\},
\]

where

\[
c^{-1} = 2^{p(n-1)/2}\,\pi^{p(p-1)/4}\,|\Sigma|^{(n-1)/2} \prod_{j=1}^{p} \Gamma\!\left(\frac{n-j}{2}\right),
\]

and various properties of Wishart random variables have been thoroughly investigated (see, for example, Srivastava and Khatri, 1979, Chapter 3).
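The Wishart result above can be illustrated numerically. The following sketch (an illustration added here, not part of the original text; the choices of Σ, n and the seed are arbitrary) draws repeated samples from a multivariate normal distribution and checks that the average of (n − 1)S is close to its Wishart mean, (n − 1)Σ:

```python
import numpy as np

# Illustrative check: for x ~ N(mu, Sigma), (n-1)S has the W_p(Sigma, n-1)
# distribution, whose mean is (n-1)*Sigma.  We verify the mean by Monte Carlo.
rng = np.random.default_rng(0)
p, n, reps = 3, 50, 2000
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
mu = np.zeros(p)

acc = np.zeros((p, p))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    S = np.cov(X, rowvar=False)          # unbiased sample covariance matrix
    acc += (n - 1) * S
mean_W = acc / reps                      # Monte Carlo estimate of E[(n-1)S]
```

Over 2000 replications the entrywise agreement with (n − 1)Σ is typically within a few percent, consistent with (n − 1)S being an unbiased "Wishart" estimate of (n − 1)Σ.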
Let l_k, a_k, for k = 1, 2, ..., p, be the eigenvalues and eigenvectors of S, respectively, and let λ_k, α_k, for k = 1, 2, ..., p, be the eigenvalues and eigenvectors of Σ, respectively. Also, let l, λ be the p-element vectors consisting of the l_k and λ_k, respectively, and let the jth elements of a_k, α_k be a_{kj}, α_{kj}, respectively. [The notation a_{jk} was used for the jth element of a_k in the previous section, but it seems more natural to use a_{kj} in this and the next section. We also revert to using l_k to denote the kth eigenvalue of S rather than that of X′X.] The best known and simplest results concerning the distribution of the l_k and the a_k assume, usually quite realistically, that λ_1 > λ_2 > ··· > λ_p > 0; in other words, all the population eigenvalues are positive and distinct. Then the following results hold asymptotically:

(i) all of the l_k are independent of all of the a_k;

(ii) l and the a_k are jointly normally distributed;

(iii)
\[
E(l) = \lambda, \qquad E(a_k) = \alpha_k, \quad k = 1, 2, \ldots, p;
\tag{3.6.1}
\]

(iv)
\[
\operatorname{cov}(l_k, l_{k'}) =
\begin{cases}
\dfrac{2\lambda_k^2}{n-1}, & k = k', \\[1ex]
0, & k \neq k',
\end{cases}
\tag{3.6.2}
\]

\[
\operatorname{cov}(a_{kj}, a_{k'j'}) =
\begin{cases}
\dfrac{\lambda_k}{n-1} \displaystyle\sum_{\substack{l=1 \\ l \neq k}}^{p} \dfrac{\lambda_l \alpha_{lj} \alpha_{lj'}}{(\lambda_l - \lambda_k)^2}, & k = k', \\[3ex]
-\dfrac{\lambda_k \lambda_{k'} \alpha_{kj} \alpha_{k'j'}}{(n-1)(\lambda_k - \lambda_{k'})^2}, & k \neq k'.
\end{cases}
\tag{3.6.3}
\]

An extension of the above results to the case where some of the λ_k may be equal to each other, though still positive, is given by Anderson (1963), and an alternative proof to that of Anderson can be found in Srivastava and Khatri (1979, Section 9.4.1).

It should be stressed that the above results are asymptotic and therefore only approximate for finite samples. Exact results are available, but only for a few special cases, such as when Σ = I (Srivastava and Khatri, 1979, p. 86) and, more generally, for l_1, l_p, the largest and smallest eigenvalues (Srivastava and Khatri, 1979, p. 205).
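Result (3.6.2) can also be checked by simulation. The sketch below (an added illustration, not from the original text; the eigenvalues, sample size and seed are arbitrary choices) uses a diagonal Σ, so the population eigenvalues λ_k are known exactly, and compares the Monte Carlo variance of the largest sample eigenvalue l_1 with the asymptotic value 2λ_1²/(n − 1):

```python
import numpy as np

# Numerical sketch of (3.6.2): for distinct population eigenvalues and large n,
# var(l_k) is approximately 2*lambda_k^2/(n-1).  Sigma is diagonal so that its
# eigenvalues are the diagonal entries.
rng = np.random.default_rng(1)
lam = np.array([6.0, 3.0, 1.0])          # distinct population eigenvalues
Sigma = np.diag(lam)
p, n, reps = 3, 500, 3000

l1 = np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)
    l1[r] = np.linalg.eigvalsh(S)[-1]    # largest sample eigenvalue

empirical = l1.var()
theory = 2 * lam[0] ** 2 / (n - 1)       # asymptotic variance from (3.6.2)
```

With n = 500 and well-separated eigenvalues, the empirical and asymptotic variances typically agree to within a few percent.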
In addition, better but more complicated approximations to the distributions of l and the a_k can be found in the general case (see Srivastava and Khatri, 1979, Section 9.4; Jackson, 1991, Sections 4.2, 4.5; and the references cited in these sources). One specific point regarding the better approximations is that E(l_1) > λ_1 and E(l_p) < λ_p.
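The finite-sample bias just noted, with the largest sample eigenvalue overestimating λ_1 and the smallest underestimating λ_p, is easy to see by simulation. The following sketch (my own illustration, not from the original text; the small n is chosen deliberately to make the bias visible) estimates E(l_1) and E(l_p) for a known diagonal Σ:

```python
import numpy as np

# Illustration of finite-sample bias in the extreme sample eigenvalues:
# on average l_1 exceeds lambda_1 and l_p falls below lambda_p.
rng = np.random.default_rng(2)
lam = np.array([3.0, 2.0, 1.0])          # population eigenvalues
Sigma = np.diag(lam)
p, n, reps = 3, 10, 5000                 # small n makes the bias pronounced

largest = np.empty(reps)
smallest = np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    ev = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    largest[r], smallest[r] = ev[-1], ev[0]
```

With n = 10, the mean of l_1 is markedly above λ_1 = 3 and the mean of l_p markedly below λ_p = 1, which is the spreading of sample eigenvalues relative to population eigenvalues that the better approximations capture.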