12.07.2015 Views

Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



14. Generalizations and Adaptations of Principal Component Analysis

1984, Chapter 5). This technique has at its core the idea of assigning scores to each category of each variable. It can be shown that if a PCA is done on the correlation matrix of these scores, the first PC is equivalent to the first non-trivial multiple correspondence analysis dimension (Bekker and de Leeuw, 1988). These authors give further discussion of the relationships between the different varieties of non-linear PCA.

Mori et al. (1998) combine the Gifi approach with the procedure described by Tanaka and Mori (1997) for selecting a subset of variables (see Section 6.3). Using the optimal values of the variables $c_j$ found by minimizing (14.1.3), variables are selected in the same way as in Tanaka and Mori (1997). The results can be thought of as either an extension of Tanaka and Mori’s method to qualitative data, or as a simplification of Gifi’s non-linear PCA by using only a subset of variables.

An approach that overlaps with, but differs from, the main Gifi ideas underlying non-linear PCA is described by Meulman (1986). Categorical data are again transformed to give optimal scores or values for each category of each variable, and simultaneously a small number of optimal dimensions is found within which to represent these scores. The ‘non-linearity’ of the technique becomes more obvious when a continuous variable is fitted into this framework by first dividing its range of values into a finite number of categories and then assigning a value to each category. The non-linear transformation is thus a step function.
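To make the step-function remark concrete, the following minimal Python sketch divides the range of a continuous variable into categories and assigns one value to each category. The cut points and category scores are hypothetical numbers chosen for illustration only; they are not optimal scores obtained by minimizing (14.1.3) or any other loss function, and the function name `step_transform` is not taken from the literature cited here.

```python
from bisect import bisect_right


def step_transform(values, cut_points, category_scores):
    """Replace each continuous value by the score of the category it falls in.

    cut_points must be sorted in increasing order; k cut points define
    k + 1 categories, so k + 1 scores are required.
    """
    if len(category_scores) != len(cut_points) + 1:
        raise ValueError("need exactly len(cut_points) + 1 category scores")
    transformed = []
    for value in values:
        # bisect_right counts the cut points that are <= value,
        # which is the index of the category containing the value.
        category = bisect_right(cut_points, value)
        transformed.append(category_scores[category])
    return transformed


# Hypothetical illustration: three cut points give four categories.
cut_points = [1.0, 3.0, 5.0]
category_scores = [-1.2, -0.3, 0.4, 1.6]  # arbitrary, not optimized
x = [0.2, 1.5, 2.7, 3.9, 5.1]

print(step_transform(x, cut_points, category_scores))
# prints [-1.2, -0.3, -0.3, 0.4, 1.6]
```

The values 1.5 and 2.7 fall in the same category and so receive the same score: the transformed variable is constant within each category and jumps at the cut points. In the methods discussed in the text, the category scores would be chosen by optimizing a criterion rather than fixed in advance as they are here.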
Meulman’s (1986) proposal, which is known as the distance approach to nonlinear multivariate data analysis, differs from the main Gifi (1990) framework by using different optimality criteria (loss functions) instead of (14.1.3). Gifi’s (1990) algorithms concentrate on the representation of the variables in the analysis, so that representation of the objects (observations) can be suboptimal. The distance approach directly approximates distances between objects. Krzanowski and Marriott (1994, Chapter 8) give a readable introduction to, and an example of, the distance approach.

An example of Gifi non-linear PCA applied in an agricultural context and involving a mixture of categorical and numerical variables is given by Kroonenberg et al. (1997). Michailidis and de Leeuw (1998) discuss various aspects of stability for Gifi-based methods, and Verboon (1993) describes a robust version of a Gifi-like procedure.

A sophisticated way of replacing the variables by functions of the variables, and hence incorporating non-linearity, is described by Besse and Ferraty (1995). It is based on an adaptation of the fixed effects model which was introduced in Section 3.9. The adaptation is that, whereas before we had $E(\mathbf{x}_i) = \mathbf{z}_i$, now $E[\mathbf{f}(\mathbf{x}_i)] = \mathbf{z}_i$, where $\mathbf{f}(\mathbf{x}_i)$ is a $p$-dimensional vector of functions of $\mathbf{x}_i$. As before, $\mathbf{z}_i$ lies in a $q$-dimensional subspace $F_q$, but $\operatorname{var}(\mathbf{e}_i)$ is restricted to be $\sigma^2 \mathbf{I}_p$. The quantity to be minimized is similar to (3.9.1) with $\mathbf{x}_i$ replaced by $\mathbf{f}(\mathbf{x}_i)$. In the current problem it is necessary to choose $q$ and then optimize with respect to the $q$-dimensional subspace $F_q$ and with respect to the functions $\mathbf{f}(\cdot)$. The functions must be restricted
