
Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

60 3. Properties of Sample Principal Components

where M denotes a metric (see Section 14.2.2) and may be related to Γ. This statement of the model generalizes the usual form of PCA, for which w_i = 1/n, i = 1, 2, ..., n and M = I_p, to allow different weights on the observations and a choice of metric. When M = Γ^(-1), and the distribution of the x_i is multivariate normal, the estimates obtained by minimizing (3.9.1) are maximum likelihood estimates (Besse, 1994b). An interesting aspect of the fixed effects model is that it moves away from the idea of a sample of identically distributed observations whose covariance or correlation structure is to be explored, to a formulation in which the variation among the means of the observations is the feature of interest.

Tipping and Bishop (1999a) describe a model in which column-centred observations x_i are independent normally distributed random variables with zero means and covariance matrix BB′ + σ²I_p, where B is a (p × q) matrix. We shall see in Chapter 7 that this is a special case of a factor analysis model. The fixed effects model also has links to factor analysis and, indeed, de Leeuw (1986) suggests in discussion of Caussinus (1986) that the model is closer to factor analysis than to PCA. Similar models date back to Young (1941).

Tipping and Bishop (1999a) show that, apart from a renormalization of columns, and the possibility of rotation, the maximum likelihood estimate of B is the matrix A_q of PC coefficients defined earlier (see also de Leeuw (1986)). The MLE for σ² is the average of the smallest (p − q) eigenvalues of the sample covariance matrix S. Tipping and Bishop (1999a) fit their model using the EM algorithm (Dempster et al. (1977)), treating the unknown underlying components as 'missing values.' Clearly, the complication of the EM algorithm is not necessary once we realise that we are dealing with PCA, but it has advantages when the model is extended to cope with genuinely missing data or to mixtures of distributions (see Sections 13.6, 9.2.3). Bishop (1999) describes a Bayesian treatment of Tipping and Bishop's (1999a) model. The main objective in introducing a prior distribution for B appears to be as a means of deciding on its dimension q (see Section 6.1.5).

Roweis (1997) also uses the EM algorithm to fit a model for PCA. His model is more general than Tipping and Bishop's, with the error covariance matrix allowed to take any form, rather than being restricted to σ²I_p. In this respect it is more similar to the fixed effects model with equal weights, but differs from it by not specifying different means for different observations. Roweis (1997) notes that a full PCA, with all p PCs, is obtained from his model in the special case where the covariance matrix is σ²I_p and σ² → 0. He refers to the analysis based on Tipping and Bishop's (1999a) model with σ² > 0 as sensible principal component analysis.

Martin (1988) considers another type of probability-based PCA, in which each of the n observations has a probability distribution in p-dimensional space centred on it, rather than being represented by a single point. In
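The closed-form maximum likelihood estimates quoted above (B̂ built from the leading PC coefficients A_q, and σ̂² as the average of the smallest p − q eigenvalues of S) can be computed directly, without the EM algorithm. The following is a minimal NumPy sketch; the data dimensions, the simulated X, and the variable names are illustrative assumptions, not part of Tipping and Bishop's (1999a) presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative column-centred data: n observations in p dimensions.
n, p, q = 500, 6, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X -= X.mean(axis=0)

# Sample covariance matrix S and its eigendecomposition.
S = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(S)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# MLE of sigma^2: average of the smallest (p - q) eigenvalues of S.
sigma2_hat = eigvals[q:].mean()

# MLE of B, up to column renormalization and rotation: the PC
# coefficient matrix A_q with columns scaled by sqrt(lambda_j - sigma^2).
A_q = eigvecs[:, :q]
B_hat = A_q @ np.diag(np.sqrt(eigvals[:q] - sigma2_hat))

# The implied model covariance BB' + sigma^2 I_p reproduces the q
# leading eigenvalues of S exactly; the remainder are flattened to
# sigma2_hat.
C_hat = B_hat @ B_hat.T + sigma2_hat * np.eye(p)
```

This makes concrete why EM is unnecessary for the basic model: the eigendecomposition of S already delivers the MLE. EM becomes worthwhile only for the extensions mentioned in the text, such as genuinely missing data or mixtures.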
