Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)
9.2. Cluster Analysis

choosing a subset of k variables is based on how closely a Procrustes rotation of the configuration of scores on the first q PCs for a selected subset of variables matches the corresponding configuration based on the first q PCs for the full set of variables. It is shown that the visibility of group structure may be enhanced by plotting with respect to PCs that are calculated from only a subset of variables. The selected variables differ from those chosen by the methods of Section 6.3 (Krzanowski, 1987b; Jmel, 1992), which illustrates again that different selection rules are needed, depending on the purpose for which the variables are chosen (Jolliffe, 1987a). (The first sketch at the end of this section illustrates the Procrustes-based selection idea.)

Some types of projection pursuit are far more computationally demanding than PCA, and for large data sets an initial reduction of dimension may be necessary before they can be implemented. In such cases, Friedman (1987) suggests reducing dimensionality by retaining only the high-variance PCs from a data set and conducting the projection pursuit on those (see the second sketch below). Caussinus (1987) argues that an initial reduction of dimensionality using PCA may be useful even when there are no computational problems.

9.2.3 Mixture Models

Cluster analysis traditionally consisted of a wide variety of rather ad hoc descriptive techniques, with little in the way of statistical underpinning. The consequence was that it was fine for dissection, but less satisfactory for deciding whether clusters actually existed and, if so, how many there were. An attractive alternative approach is to model the cluster structure by a mixture model, in which the probability density function (p.d.f.) for the vector of variables x is expressed as

    f(x; θ) = Σ_{g=1}^{G} π_g f_g(x; θ_g),        (9.2.2)

where G is the number of clusters, π_g is the probability of an observation coming from the gth cluster, f_g(x; θ_g) is the p.d.f. in the gth cluster, and θ′ = (θ′_1, θ′_2, ..., θ′_G) is a vector of parameters that must be estimated. A particular form needs to be assumed for each p.d.f. f_g(x; θ_g), the most usual choice being multivariate normality, in which θ_g consists of the mean vector μ_g and the covariance matrix Σ_g for the gth cluster.

The problem of fitting a model such as (9.2.2) is difficult, even for small values of p and G, so the approach was largely ignored at the time when many clustering algorithms were developed. Later advances in theory, in computational sophistication, and in computing speed made it possible for versions of (9.2.2) to be developed and fitted, especially in the univariate case (see, for example, Titterington et al. (1985); McLachlan and Basford (1988); Böhning (1999)). However, many multivariate problems are still intractable because of the large number of parameters that need to be estimated. For example, a multivariate normal mixture with full covariance matrices has (G − 1) + Gp + Gp(p + 1)/2 free parameters, a number that grows quadratically with the dimension p.
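To make the Procrustes-based selection idea concrete, the following minimal sketch scores every k-variable subset by the Procrustes disparity between its q-PC score configuration and that of the full data, and keeps the best subset. The exhaustive search, the toy data, and all names here are illustrative assumptions, not Krzanowski's (1987b) exact algorithm.

```python
# A sketch of Procrustes-based variable selection (illustrative, not the
# published algorithm): pick the k-variable subset whose first-q-PC score
# configuration best matches that of the full variable set.
from itertools import combinations

import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

def procrustes_score(X, cols, q):
    """Disparity between q-PC scores of the full data and of X[:, cols]."""
    full_scores = PCA(n_components=q).fit_transform(X)
    sub_scores = PCA(n_components=q).fit_transform(X[:, list(cols)])
    _, _, disparity = procrustes(full_scores, sub_scores)
    return disparity

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # toy data: n = 100, p = 6
k, q = 3, 2                            # keep k variables, compare first q PCs
best = min(combinations(range(X.shape[1]), k),
           key=lambda cols: procrustes_score(X, cols, q))
print("selected variables:", best)
```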
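The second sketch illustrates Friedman's suggestion of retaining only the high-variance PCs before running projection pursuit. A simple |excess kurtosis| projection index stands in for Friedman's (1987) actual index, chosen purely for brevity; the data, dimensions, and optimizer settings are illustrative assumptions.

```python
# Projection pursuit after PCA reduction (hedged sketch): reduce to a few
# high-variance PCs, then search that space for a 1-d projection that
# maximizes a simple "interestingness" index (|excess kurtosis| here).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kurtosis
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))             # toy data: n = 500, p = 20
Z = PCA(n_components=5).fit_transform(X)   # keep the 5 high-variance PCs

def neg_index(w, Z):
    """Negative |excess kurtosis| of the 1-d projection Z @ w (unit w)."""
    w = w / np.linalg.norm(w)
    return -abs(kurtosis(Z @ w))

res = minimize(neg_index, x0=rng.normal(size=Z.shape[1]), args=(Z,),
               method="Nelder-Mead")
w = res.x / np.linalg.norm(res.x)
print("interesting direction in PC space:", np.round(w, 3))
```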
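Finally, a minimal sketch of fitting the normal mixture (9.2.2) by maximum likelihood via the EM algorithm, assuming scikit-learn's GaussianMixture; G = 3 and the toy data are illustrative. The last lines print the free-parameter count discussed above, which shows why large p quickly becomes problematic.

```python
# Fitting a G-component multivariate normal mixture (9.2.2) by EM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Toy data: three well-separated clusters in p = 2 dimensions.
X = np.vstack([rng.normal(loc=m, size=(100, 2)) for m in (0.0, 3.0, 6.0)])

G = 3
gm = GaussianMixture(n_components=G, covariance_type="full").fit(X)
print("mixing proportions pi_g:", np.round(gm.weights_, 3))
print("cluster means mu_g:\n", np.round(gm.means_, 2))

# Free parameters: (G - 1) mixing proportions + G*p means
# + G*p*(p+1)/2 covariance terms.
p = X.shape[1]
print("free parameters:", (G - 1) + G * p + G * p * (p + 1) // 2)
```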
