12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

Most of the properties described in this chapter have sample counterparts. Some have greater relevance in the sample context, but it is more convenient to introduce them here, rather than in Chapter 3.

2.1 Optimal Algebraic Properties of Population Principal Components and Their Statistical Implications

Consider again the derivation of PCs given in Chapter 1, and denote by $\mathbf{z}$ the vector whose $k$th element is $z_k$, the $k$th PC, $k = 1, 2, \ldots, p$. (Unless stated otherwise, the $k$th PC will be taken to mean the PC with the $k$th largest variance, with corresponding interpretations for the ‘$k$th eigenvalue’ and ‘$k$th eigenvector.’) Then

$$\mathbf{z} = \mathbf{A}'\mathbf{x}, \tag{2.1.1}$$

where $\mathbf{A}$ is the orthogonal matrix whose $k$th column, $\boldsymbol{\alpha}_k$, is the $k$th eigenvector of $\boldsymbol{\Sigma}$. Thus, the PCs are defined by an orthonormal linear transformation of $\mathbf{x}$. Furthermore, we have directly from the derivation in Chapter 1 that

$$\boldsymbol{\Sigma}\mathbf{A} = \mathbf{A}\boldsymbol{\Lambda}, \tag{2.1.2}$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix whose $k$th diagonal element is $\lambda_k$, the $k$th eigenvalue of $\boldsymbol{\Sigma}$, and $\lambda_k = \operatorname{var}(\boldsymbol{\alpha}_k'\mathbf{x}) = \operatorname{var}(z_k)$. Two alternative ways of expressing (2.1.2) that follow because $\mathbf{A}$ is orthogonal will be useful later, namely

$$\mathbf{A}'\boldsymbol{\Sigma}\mathbf{A} = \boldsymbol{\Lambda} \tag{2.1.3}$$

and

$$\boldsymbol{\Sigma} = \mathbf{A}\boldsymbol{\Lambda}\mathbf{A}'. \tag{2.1.4}$$

The orthonormal linear transformation of $\mathbf{x}$, (2.1.1), which defines $\mathbf{z}$, has a number of optimal properties, which are now discussed in turn.

Property A1. For any integer $q$, $1 \le q \le p$, consider the orthonormal linear transformation

$$\mathbf{y} = \mathbf{B}'\mathbf{x}, \tag{2.1.5}$$

where $\mathbf{y}$ is a $q$-element vector and $\mathbf{B}'$ is a $(q \times p)$ matrix, and let $\boldsymbol{\Sigma}_y = \mathbf{B}'\boldsymbol{\Sigma}\mathbf{B}$ be the variance-covariance matrix for $\mathbf{y}$. Then the trace of $\boldsymbol{\Sigma}_y$, denoted $\operatorname{tr}(\boldsymbol{\Sigma}_y)$, is maximized by taking $\mathbf{B} = \mathbf{A}_q$, where $\mathbf{A}_q$ consists of the first $q$ columns of $\mathbf{A}$.
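The identities (2.1.2) to (2.1.4) and the statement of Property A1 can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original text. The covariance matrix `Sigma`, the dimensions `p` and `q`, and the random seed are all made-up values chosen for illustration. The comparison against randomly drawn orthonormal matrices is only a sanity check of Property A1, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, an assumption of this sketch

# Hypothetical population covariance matrix Sigma (p = 4), constructed to be
# symmetric positive definite; any such matrix would serve equally well.
p, q = 4, 2
M = rng.standard_normal((p, p))
Sigma = M @ M.T + p * np.eye(p)

# np.linalg.eigh returns eigenvalues in ascending order, so reorder them so
# that the kth column of A belongs to the kth largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
A = eigvecs[:, order]
Lambda = np.diag(lam)

# Identities (2.1.2), (2.1.3) and (2.1.4).
assert np.allclose(Sigma @ A, A @ Lambda)
assert np.allclose(A.T @ Sigma @ A, Lambda)
assert np.allclose(A @ Lambda @ A.T, Sigma)

# Property A1: with B = A_q, tr(B' Sigma B) equals the sum of the q largest
# eigenvalues of Sigma.
A_q = A[:, :q]
best_trace = np.trace(A_q.T @ Sigma @ A_q)
assert np.isclose(best_trace, lam[:q].sum())

# Sanity check: random (p x q) matrices with orthonormal columns, obtained
# from a reduced QR factorization, should never give a larger trace.
random_traces = []
for _trial in range(1000):
    B, _ = np.linalg.qr(rng.standard_normal((p, q)))
    random_traces.append(np.trace(B.T @ Sigma @ B))
max_random_trace = max(random_traces)

print("trace with B = A_q:       ", best_trace)
print("largest trace, random B:  ", max_random_trace)
```

The eigenvectors are sorted explicitly because the text takes the kth PC to be the one with the kth largest variance, whereas `np.linalg.eigh` orders eigenvalues from smallest to largest.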
