Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)

14.6.3 Regression Components, Sweep-out Components and Extended Components

Ottestad (1975) proposed an alternative to PCA that he called regression components. He developed this for standardized variables, and hence it is a correlation-based analysis. The new variables, or regression components, $y_1, y_2, \ldots, y_p$ are defined in terms of the original (standardized) variables $x_1, x_2, \ldots, x_p$ as
$$y_1 = x_1,\quad y_2 = x_2 - b_{21}x_1,\quad y_3 = x_3 - b_{31}x_1 - b_{32}x_2,\quad \ldots,\quad y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \ldots - b_{p(p-1)}x_{p-1},$$
where $b_{jk}$ is the regression coefficient of $x_k$ in a regression of $x_j$ on all other variables on the right-hand side of the equation defining $y_j$. It should be stressed that the labelling in these defining equations has been chosen for simplicity to correspond to the order in which the $y$ variables are defined. It will usually be different from the labelling of the data as originally recorded. The $x$ variables can be selected in any order to define the $y$ variables, and the objective of the technique is to choose a best order from the $p!$ possibilities. This is done by starting with $y_p$, for which $x_p$ is chosen to be the original variable that has maximum multiple correlation with the other $(p-1)$ variables. The next variable $x_{p-1}$, from which $y_{p-1}$ is defined, minimizes $(1 + b_{p(p-1)})^2(1 - R^2)$, where $R^2$ denotes the multiple correlation of $x_{p-1}$ with $x_{p-2}, x_{p-3}, \ldots, x_1$, and so on until only $x_1$ is left.

The reasoning behind the method, which gives uncorrelated components, is that it provides results that are simpler to interpret than PCA in the examples that Ottestad (1975) studies. However, orthogonality of vectors of coefficients and successive variance maximization are both lost. Unlike the techniques described in Chapter 11, no explicit form of simplicity is targeted, and neither is there any overt attempt to limit variance loss, so the method is quite different from PCA.

A variation on the same theme is proposed by Atiqullah and Uddin (1993). They also produce new variables $y_1, y_2, \ldots, y_p$ from a set of measured variables $x_1, x_2, \ldots, x_p$ in a sequence $y_1 = x_1$, $y_2 = x_2 - b_{21}x_1$, $y_3 = x_3 - b_{31}x_1 - b_{32}x_2$, \ldots, $y_p = x_p - b_{p1}x_1 - b_{p2}x_2 - \ldots - b_{p(p-1)}x_{p-1}$, but for a different set of $b_{kj}$. Although the details are not entirely clear, it appears that, unlike Ottestad's (1975) method, the ordering in the sequence is not determined by statistical criteria, but simply corresponds to the labels on the original $x$ variables. Atiqullah and Uddin (1993) transform the covariance matrix for the $x$ variables into upper triangular form, with diagonal elements equal to unity. The elements of this matrix above the diagonal are then the $b_{kj}$. As with Ottestad's method, the new variables, called sweep-out components, are uncorrelated.

Rather than compare variances of $y_1, y_2, \ldots, y_p$, which do not sum to $\sum_{j=1}^{p}\operatorname{var}(x_j)$, both Ottestad (1975) and Atiqullah and Uddin (1993) de-
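For a fixed ordering of the variables, the defining equations above amount to taking each $y_j$ as the residual from regressing the standardized $x_j$ on the variables that precede it, which is why successive components are uncorrelated. The following Python/NumPy sketch illustrates that construction under that assumption; the function name regression_components and the use of a pre-chosen column ordering (rather than Ottestad's search over the $p!$ possible orderings) are illustrative choices, not part of the original description.

```python
import numpy as np

def regression_components(X):
    """Sequential regression components for a data matrix X (n x p)
    whose columns are assumed to be in the chosen order.

    y_1 = x_1; for j > 1, y_j is the residual from regressing the
    standardized x_j on the standardized x_1, ..., x_{j-1}, so the
    components are uncorrelated in the sample.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # correlation-based
    n, p = Z.shape
    Y = np.empty_like(Z)
    Y[:, 0] = Z[:, 0]
    for j in range(1, p):
        preceding = Z[:, :j]
        # least-squares coefficients b_{jk}, k = 1, ..., j-1
        b, *_ = np.linalg.lstsq(preceding, Z[:, j], rcond=None)
        Y[:, j] = Z[:, j] - preceding @ b
    return Y

# Quick check on random data: sample correlations between distinct
# components should be numerically zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
Y = regression_components(X)
print(np.round(np.corrcoef(Y, rowvar=False), 6))
```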

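Since the details of the sweep-out construction are, as noted above, not entirely clear, the following is only one plausible reading: factor the covariance matrix as $S = LDL'$ with $L$ unit lower triangular, so that $U = L'$ is upper triangular with unit diagonal and its off-diagonal entries play the role of the $b_{kj}$, and set $y = L^{-1}x$, which has a diagonal covariance matrix. The function name sweepout_components and the Cholesky-based route to the factorisation are assumptions made for illustration, not taken from Atiqullah and Uddin (1993). With the ordering fixed to the original labels, the coefficients obtained this way coincide with the sequential regression coefficients of the previous sketch.

```python
import numpy as np

def sweepout_components(X):
    """A hypothetical reading of sweep-out components: factor the
    covariance matrix as S = L D L', with L unit lower triangular, and
    transform each centred observation by inv(L).  Then cov(y) = D is
    diagonal, so the components are uncorrelated, and U = L' is an
    upper triangular form of S with unit diagonal from which the
    coefficients can be read off.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    C = np.linalg.cholesky(S)           # S = C C', C lower triangular
    L = C / np.diag(C)                  # rescale columns: unit diagonal
    Y = np.linalg.solve(L, Xc.T).T      # y = inv(L) x for each observation
    return Y, L.T                       # components and triangular factor U

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
Y, U = sweepout_components(X)
print(np.round(np.cov(Y, rowvar=False), 3))  # approximately diagonal
```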