24.06.2015 Views

Notes on Principal Component Analysis

Notes on Principal Component Analysis

Notes on Principal Component Analysis

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

etween the subject’s row vector and the vector defined by the particular<br />

linear combinati<strong>on</strong> of the two eigenvectors. These approximating<br />

vectors in p dimensi<strong>on</strong>s are all in a plane defined by all linear<br />

combinati<strong>on</strong>s of the two eigenvectors. For a rank 1 approximati<strong>on</strong>,<br />

we merely have a multiple of the first eigenvector (in p dimensi<strong>on</strong>s)<br />

as the approximating vector for a subject’s row vector.<br />

What’s going <strong>on</strong> in “subject space”: Suppose we begin by looking<br />

at a rank 1 approximati<strong>on</strong> of Z n×p ≈ U n×1 D 1×1 V ′ 1×p. The<br />

j th column (i.e., variable) of Z is a point in n-dimensi<strong>on</strong>al “subject<br />

space”, and is approximated by a multiple of the scores <strong>on</strong> the first<br />

comp<strong>on</strong>ent, (UD) n×1 . The multiple used is the j th element of the<br />

1 × p vector of first comp<strong>on</strong>ent weights, V ′ 1×p. Thus, each column<br />

of the n × p approximating matrix, U n×1 D 1×1 V ′ 1×p, is a multiple of<br />

the same vector giving the scores <strong>on</strong> the first comp<strong>on</strong>ent. In other<br />

words, we represent each column (variable) by a multiple of <strong>on</strong>e specific<br />

vector, where the multiple represents where the projecti<strong>on</strong> lies<br />

<strong>on</strong> this <strong>on</strong>e single vector (the term “projecti<strong>on</strong>” is used because of the<br />

least-squares property of the approximati<strong>on</strong>). For a rank 2 approximati<strong>on</strong>,<br />

each column variable in Z is represented by a point in the<br />

plane defined by all linear combinati<strong>on</strong>s of the two comp<strong>on</strong>ent score<br />

columns in U n×2 D 2×2 ; the point in that plane is determined by the<br />

weights in the j th column of V ′ 2×p. Alternatively, Z is approximated<br />

by the sum of two n×p matrices defined by columns being multiples<br />

of the first or sec<strong>on</strong>d comp<strong>on</strong>ent scores.<br />

As a way of illustrating a graphical way of representing principal<br />

comp<strong>on</strong>ents of a data matrix (through a biplot), suppose we have<br />

the rank 2 approximati<strong>on</strong>, Z n×p ≈ U n×2 D 2×2 V ′ 2×p, and c<strong>on</strong>sider<br />

13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!