Notes on Principal Component Analysis
For example, varimax is one method for constructing B*. The columns of B*, when normalized to unit length, define r linear composites of the observable variables, where the sum of squares within a column of B* defines the variance for that composite. The composites are still orthogonal.
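As a concrete illustration, the classic iterative varimax algorithm can be sketched in a few lines of NumPy. This is a minimal implementation of the standard varimax criterion (not code from these notes); the loading matrix `B` below is made-up example data:

```python
import numpy as np

def varimax(B, tol=1e-8, max_iter=100):
    """Rotate a p x r loading matrix B by an orthogonal matrix chosen
    to maximize the varimax criterion; returns the rotated loadings."""
    p, r = B.shape
    R = np.eye(r)          # current orthogonal rotation
    crit = 0.0
    for _ in range(max_iter):
        L = B @ R          # loadings under the current rotation
        # Gradient step expressed via an SVD (standard varimax update)
        u, s, vt = np.linalg.svd(
            B.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        )
        R = u @ vt         # nearest orthogonal matrix to the gradient
        crit_new = s.sum()
        if crit_new < crit * (1 + tol):   # criterion stopped improving
            break
        crit = crit_new
    return B @ R

# Hypothetical 4 x 2 loading matrix with two fuzzy clusters of variables
B = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9],
              [0.3, 0.8]])
B_rot = varimax(B)
```

Because the rotation is orthogonal, the row sums of squares (the communalities) of `B_rot` match those of `B`; only the distribution of variance across columns changes.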
0.3 Principal Components in Terms of the Data Matrix
For convenience, suppose we transform our n × p data matrix X into the z-score data matrix Z, and assuming n > p, let the SVD of Z be Z_{n×p} = U_{n×p} D_{p×p} V'_{p×p}. Note that the p × p correlation matrix

R = (1/n) Z'Z = (1/n)(V D U')(U D V') = V((1/n) D²) V'.
So, the rows of V' are the principal component weights. Also,

ZV = U D V'V = U D.

In other words, (UD)_{n×p} are the scores for the n subjects on the p principal components.
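These identities are easy to check numerically. A minimal sketch with NumPy, using randomly generated data (the matrix `X` below is an assumption, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # hypothetical n x p data matrix
n, p = X.shape

# z-score each column (population standard deviation, matching R = Z'Z/n)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# thin SVD: Z = U D V'
U, d, Vt = np.linalg.svd(Z, full_matrices=False)

# correlation matrix two ways: R = (1/n) Z'Z = V (D^2/n) V'
R_direct = (Z.T @ Z) / n
R_svd = Vt.T @ np.diag(d**2 / n) @ Vt
print(np.allclose(R_direct, R_svd))          # the factorizations agree

# component scores: ZV = UD
scores = Z @ Vt.T
print(np.allclose(scores, U * d))            # U * d broadcasts D's diagonal
```

Note that `np.linalg.svd` returns V' (as `Vt`) rather than V, which matches the notation above: the rows of `Vt` are the principal component weights.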
What’s going on in “variable” space: Suppose we look at a rank-2 approximation Z_{n×p} ≈ U_{n×2} D_{2×2} V'_{2×p}. The ith subject’s row data vector sits somewhere in p-dimensional “variable” space; it is approximated by a linear combination of the two eigenvectors (which gives another point in p dimensions), where the weights used in the linear combination come from the ith row of (UD)_{n×2}. Because we do least squares, we are minimizing the squared Euclidean distances