Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

3 Mathematical and Statistical Properties of Sample Principal Components

The first part of this chapter is similar in structure to Chapter 2, except that it deals with properties of PCs obtained from a sample covariance (or correlation) matrix, rather than from a population covariance (or correlation) matrix. The first two sections of the chapter, as in Chapter 2, describe, respectively, many of the algebraic and geometric properties of PCs. Most of the properties discussed in Chapter 2 are almost the same for samples as for populations. They will be mentioned again, but only briefly. There are, in addition, some properties that are relevant only to sample PCs, and these will be discussed more fully.

The third and fourth sections of the chapter again mirror those of Chapter 2. The third section discusses, with an example, the choice between correlation and covariance matrices, while the fourth section looks at the implications of equal and/or zero variances among the PCs, and illustrates the potential usefulness of the last few PCs in detecting near-constant relationships between the variables.

The last five sections of the chapter cover material having no counterpart in Chapter 2. Section 3.5 discusses the singular value decomposition, which could have been included in Section 3.1 as an additional algebraic property. However, the topic is sufficiently important to warrant its own section, as it provides a useful alternative approach to some of the theory surrounding PCs, and also gives an efficient practical method for actually computing PCs.

The sixth section looks at the probability distributions of the coefficients and variances of a set of sample PCs, in other words, the probability distributions of the eigenvectors and eigenvalues of a sample covariance matrix.

The seventh section then goes on to show how these distributions may be used to make statistical inferences about the population PCs, based on sample PCs.

Section 3.8 demonstrates how the approximate structure and variances of PCs can sometimes be deduced from patterns in the covariance or correlation matrix. Finally, in Section 3.9 we discuss models that have been proposed for PCA. The material could equally well have been included in Chapter 2, but because the idea of maximum likelihood estimation arises in some of the models we include it in the present chapter.

3.1 Optimal Algebraic Properties of Sample Principal Components

Before looking at the properties themselves, we need to establish some notation. Suppose that we have $n$ independent observations on the $p$-element random vector $\mathbf{x}$; denote these $n$ observations by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Let $\tilde{z}_{i1} = \mathbf{a}_1'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose the vector of coefficients $\mathbf{a}_1'$ to maximize the sample variance

$$\frac{1}{n-1}\sum_{i=1}^{n}\left(\tilde{z}_{i1} - \bar{z}_1\right)^2$$

subject to the normalization constraint $\mathbf{a}_1'\mathbf{a}_1 = 1$. Next let $\tilde{z}_{i2} = \mathbf{a}_2'\mathbf{x}_i$, $i = 1, 2, \ldots, n$, and choose $\mathbf{a}_2'$ to maximize the sample variance of the $\tilde{z}_{i2}$ subject to the normalization constraint $\mathbf{a}_2'\mathbf{a}_2 = 1$, and subject also to the $\tilde{z}_{i2}$ being uncorrelated with the $\tilde{z}_{i1}$ in the sample. Continuing this process in an obvious manner, we have a sample version of the definition of PCs given in Section 1.1. Thus $\mathbf{a}_k'\mathbf{x}$ is defined as the $k$th sample PC, $k = 1, 2, \ldots, p$, and $\tilde{z}_{ik}$ is the score for the $i$th observation on the $k$th PC.
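A minimal numerical sketch of this sequential definition, assuming NumPy and a small synthetic data set (the data, the seed and the helper `score_variance` are illustrative, not from the text). It checks that the sample variance of the scores $\mathbf{a}'\mathbf{x}_i$ equals $\mathbf{a}'\mathbf{S}\mathbf{a}$, so the maximization depends on the data only through $\mathbf{S}$. It then takes the coefficient vectors to be the eigenvectors of $\mathbf{S}$, ordered by decreasing eigenvalue, as the text goes on to justify, and confirms that no random unit vector beats $\mathbf{a}_1$ and that the scores on different PCs are uncorrelated in the sample.

```python
import numpy as np

# Illustrative synthetic data (n observations on a p-element vector x); not from the text.
rng = np.random.default_rng(0)
n, p = 200, 3
mix = np.array([[2.0, 0.0, 0.0],
                [0.8, 1.0, 0.0],
                [0.3, 0.5, 0.4]])
X = rng.normal(size=(n, p)) @ mix  # row i holds the observation x_i'


def score_variance(X, a):
    """Sample variance, with divisor n - 1, of the scores z_i = a'x_i."""
    z = X @ a
    return np.sum((z - z.mean()) ** 2) / (len(z) - 1)


S = np.cov(X, rowvar=False)  # sample covariance matrix (divisor n - 1)

# For any coefficient vector a, the sample variance of the scores is a'Sa.
a = rng.normal(size=p)
assert np.isclose(score_variance(X, a), a @ S @ a)

# Candidate coefficient vectors: eigenvectors of S, ordered by decreasing eigenvalue.
l, A = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
l, A = l[::-1], A[:, ::-1]

# No random unit vector gives a larger score variance than a_1.
best = score_variance(X, A[:, 0])
for _ in range(1000):
    b = rng.normal(size=p)
    b /= np.linalg.norm(b)  # normalization constraint b'b = 1
    assert score_variance(X, b) <= best + 1e-10

# Scores for all p PCs at once: column k holds z_ik = a_k'x_i.
Z = X @ A
C = np.cov(Z, rowvar=False)
print(np.round(C, 6))  # diagonal: decreasing variances; off-diagonal: ~0 (uncorrelated)
```

Rows of `X` play the role of the observations $\mathbf{x}_i'$, so `X @ A` stacks every score $\tilde{z}_{ik}$ in one matrix. The random search is only a sanity check on the maximization, not a practical way of finding PCs.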
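If the derivation in Section 1.1 is followed through, but with sample variances and covariances replacing population quantities, then it turns out that the sample variance of the PC scores for the $k$th sample PC is $l_k$, the $k$th largest eigenvalue of the sample covariance matrix $\mathbf{S}$ for $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, and $\mathbf{a}_k$ is the corresponding eigenvector for $k = 1, 2, \ldots, p$.

Define the $(n \times p)$ matrices $\tilde{\mathbf{X}}$ and $\tilde{\mathbf{Z}}$ to have $(i, k)$th elements equal to the value of the $k$th element $\tilde{x}_{ik}$ of $\mathbf{x}_i$, and to $\tilde{z}_{ik}$, respectively. Then $\tilde{\mathbf{Z}}$ and $\tilde{\mathbf{X}}$ are related by $\tilde{\mathbf{Z}} = \tilde{\mathbf{X}}\mathbf{A}$, where $\mathbf{A}$ is the $(p \times p)$ orthogonal matrix whose $k$th column is $\mathbf{a}_k$.

If the mean of each element of $\mathbf{x}$ is known to be zero, then $\mathbf{S} = \frac{1}{n}\tilde{\mathbf{X}}'\tilde{\mathbf{X}}$. It is far more usual for the mean of $\mathbf{x}$ to be unknown, and in this case the $(j, k)$th element of $\mathbf{S}$ is

$$\frac{1}{n-1}\sum_{i=1}^{n}\left(\tilde{x}_{ij} - \bar{x}_j\right)\left(\tilde{x}_{ik} - \bar{x}_k\right),$$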
