Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)



3.6 Probability Distributions for Sample Principal Components

A considerable amount of mathematical effort has been expended on deriving probability distributions, mostly asymptotic, for the coefficients in the sample PCs and for the variances of sample PCs or, equivalently, on finding distributions for the eigenvectors and eigenvalues of a sample covariance matrix. For example, the first issue of Journal of Multivariate Analysis in 1982 contained three papers, totalling 83 pages, on the topic. In recent years there has probably been less theoretical work on probability distributions directly connected to PCA; this may simply reflect the fact that little still remains to be done. The distributional results that have been derived suffer from three drawbacks:

(i) they usually involve complicated mathematics;

(ii) they are mostly asymptotic;

(iii) they are often based on the assumption that the original set of variables has a multivariate normal distribution.

Despite these drawbacks, the distributional results are useful in some circumstances, and a selection of the main circumstances is given in this section. Their use in inference about the population PCs, given sample PCs, is discussed in the next section.

Assume that x ∼ N(µ, Σ), that is, x has a p-variate normal distribution with mean µ and covariance matrix Σ. Although µ need not be given, Σ is assumed known. Then

    (n − 1)S ∼ W_p(Σ, n − 1),

that is, (n − 1)S has the so-called Wishart distribution with parameters Σ, (n − 1) (see, for example, Mardia et al. (1979, Section 3.4)).
Therefore, investigation of the sampling properties of the coefficients and variances of the sample PCs is equivalent to looking at sampling properties of eigenvectors and eigenvalues of Wishart random variables. The density function of a matrix V that has the W_p(Σ, n − 1) distribution is

    c |V|^{(n−p−2)/2} exp{−(1/2) tr(Σ^{−1} V)},

where

    c^{−1} = 2^{p(n−1)/2} π^{p(p−1)/4} |Σ|^{(n−1)/2} ∏_{j=1}^{p} Γ((n − j)/2),

and various properties of Wishart random variables have been thoroughly investigated (see, for example, Srivastava and Khatri, 1979, Chapter 3).
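As an illustrative sketch (not from the text), the equivalence stated above can be checked numerically: the matrix (n − 1)S built from a multivariate normal sample and a direct draw from W_p(Σ, n − 1) share the same distribution, and in particular the same mean (n − 1)Σ. The specific Σ, n, p and the use of scipy.stats.wishart below are arbitrary choices for the demonstration.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)
p, n, reps = 3, 50, 2000
# An arbitrary positive-definite covariance matrix for illustration.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)

# Route 1: build (n - 1) * S from multivariate normal samples and average.
W_from_S = np.zeros((p, p))
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W_from_S += (n - 1) * np.cov(X, rowvar=False)
W_from_S /= reps

# Route 2: draw directly from the Wishart distribution W_p(Sigma, n - 1).
W_direct = wishart.rvs(df=n - 1, scale=Sigma, size=reps, random_state=rng).mean(axis=0)

# Both Monte Carlo means should be near E[W_p(Sigma, n - 1)] = (n - 1) * Sigma.
print(np.max(np.abs(W_from_S - (n - 1) * Sigma)))
print(np.max(np.abs(W_direct - (n - 1) * Sigma)))
```

Eigen-analysis of S is then eigen-analysis of a Wishart matrix rescaled by 1/(n − 1), which is what makes Wishart theory relevant to sample PCs.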

3. Properties of Sample Principal Components

Let l_k, a_k, for k = 1, 2, ..., p, be the eigenvalues and eigenvectors of S, respectively, and let λ_k, α_k, for k = 1, 2, ..., p, be the eigenvalues and eigenvectors of Σ, respectively. Also, let l, λ be the p-element vectors consisting of the l_k and λ_k, respectively, and let the jth elements of a_k, α_k be a_{kj}, α_{kj}, respectively. [The notation a_{jk} was used for the jth element of a_k in the previous section, but it seems more natural to use a_{kj} in this and the next section. We also revert to using l_k to denote the kth eigenvalue of S rather than that of X′X.] The best known and simplest results concerning the distribution of the l_k and the a_k assume, usually quite realistically, that λ_1 > λ_2 > ··· > λ_p > 0; in other words, all the population eigenvalues are positive and distinct. Then the following results hold asymptotically:

(i) all of the l_k are independent of all of the a_k;

(ii) l and the a_k are jointly normally distributed;

(iii)

    E(l) = λ,  E(a_k) = α_k,  k = 1, 2, ..., p;  (3.6.1)

(iv)

    cov(l_k, l_{k′}) = 2λ_k²/(n − 1)  if k = k′,
                       0              if k ≠ k′;  (3.6.2)

    cov(a_{kj}, a_{k′j′}) =
        (λ_k/(n − 1)) ∑_{l=1, l≠k}^{p} λ_l α_{lj} α_{lj′}/(λ_l − λ_k)²  if k = k′,
        −λ_k λ_{k′} α_{kj} α_{k′j′} / [(n − 1)(λ_k − λ_{k′})²]          if k ≠ k′.  (3.6.3)

An extension of the above results to the case where some of the λ_k may be equal to each other, though still positive, is given by Anderson (1963), and an alternative proof to that of Anderson can be found in Srivastava and Khatri (1979, Section 9.4.1).

It should be stressed that the above results are asymptotic and therefore only approximate for finite samples. Exact results are available, but only for a few special cases, such as when Σ = I (Srivastava and Khatri, 1979, p. 86) and, more generally, for l_1, l_p, the largest and smallest eigenvalues (Srivastava and Khatri, 1979, p. 205).
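A minimal simulation (an illustration, not part of the text) can check (3.6.1) and (3.6.2): taking Σ diagonal with distinct eigenvalues, so that the α_k are the coordinate axes, the sample eigenvalues l_k should average close to λ_k and have variance close to 2λ_k²/(n − 1). The eigenvalues 6, 3, 1 and the sample sizes below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 500, 2000
lam = np.array([6.0, 3.0, 1.0])   # distinct population eigenvalues (assumed)
Sigma = np.diag(lam)              # so the alpha_k are the coordinate axes

ls = np.empty((reps, p))
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)
    # eigenvalues of S in decreasing order, matching l_1 >= ... >= l_p
    ls[r] = np.sort(np.linalg.eigvalsh(S))[::-1]

# (3.6.1): E(l) ~ lambda;  (3.6.2): var(l_k) ~ 2 * lambda_k^2 / (n - 1)
print(ls.mean(axis=0))                # close to [6, 3, 1]
print(ls.var(axis=0) * (n - 1) / 2)   # close to lambda squared: [36, 9, 1]
```

With n = 500 the agreement is already good; shrinking n makes the finite-sample error in these asymptotic formulas visible.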
In addition, better but more complicated approximations can be found to the distributions of l and the a_k in the general case (see Srivastava and Khatri, 1979, Section 9.4; Jackson, 1991, Sections 4.2, 4.5; and the references cited in these sources). One specific point regarding the better approximations is that E(l_1) > λ_1 and E(l_p) < λ_p.
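The direction of these biases is easy to see in a small simulation (again illustrative, with arbitrary eigenvalues 4, 2, 1 and a deliberately small n): averaged over many samples, the largest sample eigenvalue overshoots λ_1 and the smallest undershoots λ_p.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 3, 20, 4000
lam = np.array([4.0, 2.0, 1.0])   # illustrative distinct eigenvalues (assumed)
Sigma = np.diag(lam)

l1 = np.empty(reps)
lp = np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    ev = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    l1[r], lp[r] = ev[-1], ev[0]  # largest and smallest sample eigenvalues

print(l1.mean())   # exceeds lambda_1 = 4
print(lp.mean())   # falls below lambda_p = 1
```

Intuitively, l_1 maximizes the sample variance over all directions, so sampling noise can only push it upward on average, and symmetrically l_p is pushed downward.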

