Jolliffe I., Principal Component Analysis (2nd ed., Springer, 2002)

14. Generalizations and Adaptations of Principal Component Analysis

… networks interface are covered briefly in Section 14.6.1.

Diamantaras and Kung (1996, Section 6.6) give a general definition of non-linear PCA as minimizing

    $E[\|\mathbf{x} - g(h(\mathbf{x}))\|^2]$,    (14.1.4)

where $\mathbf{y} = h(\mathbf{x})$ is a $q\,(<p)$-dimensional function of $\mathbf{x}$ and $g(\mathbf{y})$ is a $p$-dimensional function of $\mathbf{y}$. The functions $g(\cdot)$, $h(\cdot)$ are chosen from some given sets of non-linear functions so as to minimize (14.1.4). When $g(\cdot)$ and $h(\cdot)$ are restricted to be linear functions, it follows from Property A5 of Section 2.1 that minimizing (14.1.4) gives the usual (linear) PCs.

Diamantaras and Kung (1996, Section 6.6.1) note that for some types of network, allowing non-linear functions leads to no improvement in minimizing (14.1.4) compared to the linear case. Kramer (1991) describes a network for which improvement does occur. There are two parts to the network: one that creates the components $z_k$ from the $p$ variables $x_j$, and a second that approximates the $p$ variables given a reduced set of $m\,(<p)$ components. The components are constructed from the variables by means of the formula

    $z_k = \sum_{l=1}^{N} w_{lk2}\,\sigma\!\Big(\sum_{j=1}^{p} w_{jl1} x_j + \theta_l\Big)$,

where

    $\sigma\!\Big(\sum_{j=1}^{p} w_{jl1} x_j + \theta_l\Big) = \Big[1 + \exp\Big(-\sum_{j=1}^{p} w_{jl1} x_j - \theta_l\Big)\Big]^{-1}$,    (14.1.5)

in which $w_{lk2}$, $w_{jl1}$, $\theta_l$, $j = 1, 2, \ldots, p$; $k = 1, 2, \ldots, m$; $l = 1, 2, \ldots, N$, are constants to be chosen, and $N$ is the number of nodes in the hidden layer. A similar equation relates the estimated variables $\hat{x}_j$ to the components $z_k$, and Kramer (1991) combines both relationships into a single network. The objective is to find the values of all the unknown constants so as to minimize the Euclidean norm of the matrix of residuals formed by estimating $n$ values of each $x_j$ by the corresponding values of $\hat{x}_j$. This is therefore a special case of Diamantaras and Kung's general formulation with $g(\cdot)$, $h(\cdot)$ both restricted to the class of non-linear functions defined by (14.1.5).

For Kramer's network, $m$ and $N$ need to be chosen, and he discusses various strategies for doing this, including the use of information criteria such as AIC (Akaike, 1974) and the comparison of errors in training and test sets to avoid overfitting. In the approach just described, $m$ components are calculated simultaneously, but Kramer (1991) also discusses a sequential version in which one component at a time is extracted. Two examples are given of very different sizes. One is a two-variable artificial example in which non-linear PCA finds a built-in non-linearity. The second is from chemical engineering with 100 variables, and again non-linear PCA appears to be superior to its linear counterpart.
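To make the construction concrete, the following is a minimal sketch of Kramer's autoassociative network in Python/NumPy: a sigmoid hidden layer of $N$ nodes feeding $m$ linear components $z_k$ as in (14.1.5), mirrored by a decoder of the same form that produces $\hat{x}_j$. The synthetic data, the parameter packing, and the use of a general-purpose optimizer with numerical gradients are assumptions of this sketch, not details of Kramer's own training procedure.

```python
# A minimal sketch of Kramer's (1991) autoassociative network, following
# eq. (14.1.5).  The synthetic data, parameter packing, and use of
# scipy.optimize.minimize with numerical gradients are assumptions of
# this sketch, not Kramer's procedure.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
p, m, N, n = 5, 2, 4, 200   # variables, components (m < p), hidden nodes, obs.

# Artificial data with a built-in non-linearity, in the spirit of
# Kramer's two-variable example (p = 5 here for illustration).
t = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([t, t**2, np.sin(np.pi * t),
                     t + 0.05 * rng.standard_normal(n), t**3])
X -= X.mean(axis=0)          # centre the variables

def sigmoid(s):
    # sigma(s) = [1 + exp(-s)]^(-1), as in (14.1.5)
    return 1.0 / (1.0 + np.exp(-s))

def unpack(v):
    """Split the flat parameter vector into encoder/decoder weights."""
    sizes = [p * N, N, N * m, m * N, N, N * p]
    parts, i = [], 0
    for sz in sizes:
        parts.append(v[i:i + sz])
        i += sz
    W1, b1, W2, V1, c1, V2 = parts
    return (W1.reshape(p, N), b1, W2.reshape(N, m),
            V1.reshape(m, N), c1, V2.reshape(N, p))

def reconstruct(v, X):
    W1, b1, W2, V1, c1, V2 = unpack(v)
    Z = sigmoid(X @ W1 + b1) @ W2      # components z_k, eq. (14.1.5)
    Xhat = sigmoid(Z @ V1 + c1) @ V2   # decoder of the same form gives x-hat
    return Z, Xhat

def loss(v):
    _, Xhat = reconstruct(v, X)
    return np.sum((X - Xhat) ** 2)     # squared norm of the residual matrix

n_par = p * N + N + N * m + m * N + N + N * p
res = minimize(loss, 0.1 * rng.standard_normal(n_par), method="L-BFGS-B")
print("non-linear residual SS:", loss(res.x))
```

The residual sum of squares printed at the end is the quantity Kramer minimizes; to judge whether non-linearity helps, it should be compared with the linear figure for the same $m$, computed in the next sketch.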

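The linear special case can be checked directly. In the sketch below (arbitrary correlated synthetic data, an assumption; any centred $n \times p$ data matrix works), the optimal linear pair $h(\mathbf{x}) = A'\mathbf{x}$ and $g(\mathbf{y}) = A\mathbf{y}$ takes $A$ to be the matrix of the first $m$ PC loadings, obtained here from the SVD; by the Eckart-Young theorem no other linear pair achieves a smaller residual sum of squares, consistent with Property A5 cited above.

```python
# A sanity check of the linear special case: with g(.) and h(.) linear,
# the best rank-m reconstruction uses the first m PC loadings, obtained
# here from the SVD.  The synthetic data are an assumption; any centred
# n x p data matrix works.
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 200, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
A = Vt[:m].T                  # p x m matrix of leading PC loadings
Xhat = (Xc @ A) @ A.T         # g(h(x)) with h(x) = A'x, g(y) = Ay
print("linear (PCA) residual SS:", np.sum((Xc - Xhat) ** 2))
print("Eckart-Young lower bound:", np.sum(s[m:] ** 2))  # the two agree
```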