Jolliffe, I.T., Principal Component Analysis (2nd edition, Springer, 2002)


Table 3.5. Principal components based on the correlation matrix of Table 3.4

Component number        1      2      3      4      5      6      7      8      9     10

Coefficients
V1                    0.3   −0.2    0.2   −0.5    0.3    0.1   −0.1   −0.0   −0.6    0.2
V2                    0.4   −0.2    0.2   −0.5    0.3    0.0   −0.1   −0.0    0.7   −0.3
V3                    0.4   −0.1   −0.1   −0.0   −0.7    0.5   −0.2    0.0    0.1    0.1
V4                    0.4   −0.1   −0.1   −0.0   −0.4   −0.7    0.3   −0.0   −0.1   −0.1
V5                    0.3   −0.2    0.1    0.5    0.2    0.2   −0.0   −0.1   −0.2   −0.6
V6                    0.3   −0.2    0.2    0.5    0.2   −0.1   −0.0    0.1    0.2    0.6
V7                    0.3    0.3   −0.5   −0.0    0.2    0.3    0.7    0.0   −0.0    0.0
V8                    0.3    0.3   −0.5    0.1    0.2   −0.2   −0.7   −0.0   −0.0   −0.0
V9                    0.2    0.5    0.4    0.0   −0.1    0.0   −0.0    0.7   −0.0   −0.1
V10                   0.2    0.5    0.4    0.0   −0.1    0.0    0.0   −0.7    0.0    0.0

Percentage of total
variation explained  52.3   20.4   11.0    8.5    5.0    1.0    0.9    0.6    0.2    0.2

3.9 Models for Principal Component Analysis

There is a variety of interpretations of what is meant by a model in the context of PCA. Mandel (1972) considers the retention of m PCs, based on the SVD (3.5.3), as implicitly using a model. Caussinus (1986) discusses three types of ‘model.’ The first is a ‘descriptive algebraic model,’ which in its simplest form reduces to the SVD. It can also be generalized to include a choice of metric, rather than simply using a least squares approach. Such generalizations are discussed further in Section 14.2.2. This model has no random element, so there is no idea of expectation or variance. Hence it corresponds to Pearson’s geometric view of PCA, rather than to Hotelling’s variance-based approach.

Caussinus’s (1986) second type of model introduces probability distributions and corresponds to Hotelling’s definition. Once again, the ‘model’ can be generalized by allowing a choice of metric.

The third type of model described by Caussinus is the so-called fixed effects model (see also Esposito (1998)). In this model we assume that the rows $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ of $X$ are independent random variables, such that $E(\mathbf{x}_i) = \mathbf{z}_i$, where $\mathbf{z}_i$ lies in a $q$-dimensional subspace, $F_q$. Furthermore, if $\mathbf{e}_i = \mathbf{x}_i - \mathbf{z}_i$, then $E(\mathbf{e}_i) = \mathbf{0}$ and $\mathrm{var}(\mathbf{e}_i) = \frac{\sigma^2}{w_i}\Gamma$, where $\Gamma$ is a positive definite symmetric matrix and the $w_i$ are positive scalars whose sum is 1. Both $\Gamma$ and the $w_i$ are assumed to be known, but $\sigma^2$, the $\mathbf{z}_i$ and the subspace $F_q$ all need to be estimated. This is done by minimizing

$$\sum_{i=1}^{n} w_i \|\mathbf{x}_i - \mathbf{z}_i\|_M^2, \qquad\qquad (3.9.1)$$

where $M$ denotes a metric (see Section 14.2.2) and may be related to $\Gamma$. This statement of the model generalizes the usual form of PCA, for which $w_i = \frac{1}{n}$, $i = 1, 2, \ldots, n$, and $M = I_p$, to allow different weights on the observations and a choice of metric. When $M = \Gamma^{-1}$, and the distribution of the $\mathbf{x}_i$ is multivariate normal, the estimates obtained by minimizing (3.9.1) are maximum likelihood estimates (Besse, 1994b). An interesting aspect of the fixed effects model is that it moves away from the idea of a sample of identically distributed observations whose covariance or correlation structure is to be explored, to a formulation in which the variation among the means of the observations is the feature of interest.

Tipping and Bishop (1999a) describe a model in which column-centred observations $\mathbf{x}_i$ are independent normally distributed random variables with zero means and covariance matrix $BB' + \sigma^2 I_p$, where $B$ is a $(p \times q)$ matrix. We shall see in Chapter 7 that this is a special case of a factor analysis model. The fixed effects model also has links to factor analysis and, indeed, de Leeuw (1986) suggests in discussion of Caussinus (1986) that the model is closer to factor analysis than to PCA. Similar models date back to Young (1941).

Tipping and Bishop (1999a) show that, apart from a renormalization of columns, and the possibility of rotation, the maximum likelihood estimate of $B$ is the matrix $A_q$ of PC coefficients defined earlier (see also de Leeuw (1986)). The MLE for $\sigma^2$ is the average of the smallest $(p - q)$ eigenvalues of the sample covariance matrix $S$. Tipping and Bishop (1999a) fit their model using the EM algorithm (Dempster et al., 1977), treating the unknown underlying components as ‘missing values.’ Clearly, the complication of the EM algorithm is not necessary once we realise that we are dealing with PCA, but it has advantages when the model is extended to cope with genuinely missing data or to mixtures of distributions (see Sections 13.6 and 9.2.3). Bishop (1999) describes a Bayesian treatment of Tipping and Bishop’s (1999a) model. The main objective in introducing a prior distribution for $B$ appears to be as a means of deciding on its dimension $q$ (see Section 6.1.5).

Roweis (1997) also uses the EM algorithm to fit a model for PCA. His model is more general than Tipping and Bishop’s, with the error covariance matrix allowed to take any form, rather than being restricted to $\sigma^2 I_p$. In this respect it is more similar to the fixed effects model with equal weights, but differs from it by not specifying different means for different observations. Roweis (1997) notes that a full PCA, with all $p$ PCs, is obtained from his model in the special case where the covariance matrix is $\sigma^2 I_p$ and $\sigma^2 \to 0$. He refers to the analysis based on Tipping and Bishop’s (1999a) model with $\sigma^2 > 0$ as sensible principal component analysis.

Martin (1988) considers another type of probability-based PCA, in which each of the $n$ observations has a probability distribution in $p$-dimensional space centred on it, rather than being represented by a single point.
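A compact way to see what minimizing (3.9.1) involves: after rescaling the rows of $X$ by $\sqrt{w_i}$ and the columns by a symmetric square root of $M$, the criterion becomes an ordinary least squares rank-$q$ approximation problem, which a truncated SVD solves (Eckart–Young). The sketch below is a minimal illustration of this route under those assumptions; it is not code from any of the papers cited, and the function and variable names are ours.

```python
import numpy as np

def fixed_effects_pca(X, w, M, q):
    """Minimize sum_i w_i ||x_i - z_i||^2_M over z_i lying in a
    q-dimensional subspace, via a truncated SVD of W^(1/2) X M^(1/2).

    X : (n, p) data matrix with rows x_i
    w : (n,) positive weights summing to 1
    M : (p, p) positive definite metric matrix
    q : dimension of the fitted subspace F_q
    """
    # Symmetric square root of the metric M, and its inverse.
    evals, evecs = np.linalg.eigh(M)
    M_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    M_half_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

    # Rescale rows by sqrt(w_i) and columns by the metric.
    Y = np.sqrt(w)[:, None] * (X @ M_half)

    # Best rank-q approximation of Y (Eckart-Young theorem).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_q = (U[:, :q] * s[:q]) @ Vt[:q]

    # Undo the rescaling to recover the fitted means z_i.
    Z = (Y_q / np.sqrt(w)[:, None]) @ M_half_inv
    return Z

# With w_i = 1/n and M = I_p the fit reduces to ordinary PCA:
# Z is then the rank-q PC reconstruction of the column-centred data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X -= X.mean(axis=0)
Z = fixed_effects_pca(X, np.full(100, 1 / 100), np.eye(10), q=2)
```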

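For Tipping and Bishop's (1999a) model, the ML estimates described above can be computed directly from the eigendecomposition of $S$: $\hat{\sigma}^2$ is the average of the $(p-q)$ smallest eigenvalues, and $\hat{B}$ is $A_q$ with column $j$ scaled by $(\lambda_j - \hat{\sigma}^2)^{1/2}$, the ‘renormalization of columns’ referred to in the text (any orthogonal rotation of $\hat{B}$ gives the same likelihood). A minimal sketch, assuming column-centred data:

```python
import numpy as np

def ppca_mle(X, q):
    """Closed-form ML estimates for the Tipping-Bishop model
    x_i ~ N(0, B B' + sigma^2 I_p), from column-centred data X (n, p)."""
    n, p = X.shape
    S = (X.T @ X) / n                           # sample covariance matrix
    evals, evecs = np.linalg.eigh(S)            # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

    # sigma^2: average of the (p - q) smallest eigenvalues of S.
    sigma2 = evals[q:].mean()

    # B: first q eigenvectors (the PC coefficients A_q), columns
    # renormalized by (lambda_j - sigma^2)^(1/2); unique up to rotation.
    B = evecs[:, :q] * np.sqrt(evals[:q] - sigma2)
    return B, sigma2

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
X -= X.mean(axis=0)
B_hat, sigma2_hat = ppca_mle(X, q=2)
```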
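The EM iteration itself, with the component scores playing the role of the ‘missing values,’ is a pair of regression-type updates. A minimal sketch, following the update equations in Tipping and Bishop (1999a); the function name and the fixed iteration count are ours:

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM algorithm for the probabilistic PCA model of Tipping and
    Bishop (1999a), treating the component scores as missing values."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(p, q))        # initial (p x q) loading matrix
    sigma2 = 1.0                       # initial noise variance

    for _ in range(n_iter):
        # E-step: posterior moments of the scores z_i given x_i.
        Minv = np.linalg.inv(B.T @ B + sigma2 * np.eye(q))   # (q, q)
        Ez = X @ B @ Minv                                    # posterior means
        # sum_i E[z_i z_i'] = n sigma^2 M^{-1} + Ez' Ez
        Ezz = n * sigma2 * Minv + Ez.T @ Ez

        # M-step: regression-type updates for B and sigma^2.
        B_new = (X.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(X**2)
                  - 2.0 * np.sum(Ez * (X @ B_new))
                  + np.trace(Ezz @ B_new.T @ B_new)) / (n * p)
        B = B_new

    return B, sigma2

# Usage (X column-centred, as above):
# B_hat, sigma2_hat = ppca_em(X, q=2)
```

As the text notes, the eigendecomposition route is simpler for this model; the value of the EM form is that the E-step extends directly to genuinely missing entries in $X$ and to mixtures of such models.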
