Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


7.2. Estimation of the Factor Model

… a sample estimate of the matrix of partial correlations is calculated. The determinant of this matrix attains its maximum value of unity when all its off-diagonal elements are zero, so maximizing this determinant is one way of attempting to minimize the absolute values of the partial correlations. This maximization problem leads to the MLEs, but here they appear regardless of whether or not multivariate normality holds.

The procedure suggested by Rao (1955) is based on canonical correlation analysis (see Section 9.3) between x and f. He looks, successively, for pairs of linear functions {a′_{k1}x, a′_{k2}f} that have maximum correlation subject to being uncorrelated with previous pairs. The factor loadings are then proportional to the elements of the a_{k2}, k = 1, 2, ..., m, which in turn leads to the same loadings as for the MLEs based on the assumption of multivariate normality (Rao, 1955). As with the criterion based on partial correlations, no distributional assumptions are necessary for Rao's canonical analysis.

In a way, the behaviour of the partial correlation and canonical correlation criteria parallels the phenomenon in regression where the least squares criterion is valid regardless of the distribution of the error terms, but if the errors are normally distributed then least squares estimators have the added attraction of maximizing the likelihood function.

An alternative but popular way of getting initial estimates for Λ is to use the first m PCs. If z = A′x is the vector consisting of all p PCs, with A defined to have α_k, the kth eigenvector of Σ, as its kth column as in (2.1.1), then x = Az because of the orthogonality of A.
If A is partitioned into its first m and last (p − m) columns, with a similar partitioning of the rows of z, then

    x = (A_m | A*_{p−m}) (z_m′, z*_{p−m}′)′          (7.2.3)
      = A_m z_m + A*_{p−m} z*_{p−m}
      = Λf + e,

where Λ = A_m, f = z_m and e = A*_{p−m} z*_{p−m}.

Equation (7.2.3) looks very much like the factor model (7.1.2), but it violates a basic assumption of the factor model, because the elements of e in (7.2.3) are not usually uncorrelated. Despite the apparently greater sophistication of using the sample version of A_m as an initial estimator, compared with crude techniques such as centroid estimates, its theoretical justification is really no stronger.

As well as the straightforward use of PCs to estimate Λ, many varieties of factor analysis use modifications of this approach; this topic will be discussed further in the next section.
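The violated assumption can be checked numerically. The sketch below (NumPy; the covariance matrix Σ is invented for illustration, not taken from the book) takes Λ = A_m from the first m eigenvectors of Σ as in (7.2.3) and shows that the covariance matrix of e = A*_{p−m} z*_{p−m} has nonzero off-diagonal elements, so e does not satisfy the factor-model requirement that Ψ be diagonal.

```python
import numpy as np

# Illustrative covariance matrix for p = 4 variables (made-up numbers,
# not from the book), with one strong common source of variation.
Sigma = np.array([[1.0, 0.7, 0.6, 0.5],
                  [0.7, 1.0, 0.6, 0.5],
                  [0.6, 0.6, 1.0, 0.4],
                  [0.5, 0.5, 0.4, 1.0]])
p, m = 4, 1

# Eigendecomposition Sigma = A diag(l) A'; columns of A are the alpha_k.
evals, A = np.linalg.eigh(Sigma)
order = np.argsort(evals)[::-1]      # eigh returns ascending order; reverse it
evals, A = evals[order], A[:, order]

A_m = A[:, :m]
Lambda = A_m                         # initial loading estimate, Λ = A_m as in (7.2.3)

# Covariance matrix of e = A*_{p-m} z*_{p-m}; its off-diagonal elements
# are generally nonzero, unlike the diagonal Psi of the factor model.
Cov_e = A[:, m:] @ np.diag(evals[m:]) @ A[:, m:].T
off_diag = Cov_e - np.diag(np.diag(Cov_e))
print(np.abs(off_diag).max())        # typically well above zero: e is correlated
```

Because x = Az exactly, the two terms A_m z_m and e together reproduce Σ perfectly; the failure is only in the correlation structure of e, which is the point made in the text.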

7. Principal Component Analysis and Factor Analysis

7.3 Comparisons and Contrasts Between Factor Analysis and Principal Component Analysis

As mentioned in Section 7.1, a major distinction between factor analysis and PCA is that there is a definite model underlying factor analysis, but for most purposes no model is assumed in PCA. Section 7.2 concluded by describing the most common way in which PCs are used in factor analysis. Further connections and contrasts between the two techniques are discussed in the present section, but first we revisit the ‘models’ that have been proposed for PCA. Recall from Section 3.9 that Tipping and Bishop (1999a) describe a model in which x has covariance matrix BB′ + σ²I_p, where B is a (p × q) matrix. Identifying B with Λ, and q with m, it is clear that this model is equivalent to a special case of equation (7.2.1) in which Ψ = σ²I_p, so that all p specific variances are equal.

De Leeuw (1986) refers to a generalization of Tipping and Bishop’s (1999a) model, in which σ²I_p is replaced by a general covariance matrix for the error terms in the model, as the (random factor score) factor analysis model. This model is also discussed by Roweis (1997). A related model, in which the factors are assumed to be fixed rather than random, corresponds to Caussinus’s (1986) fixed effects model, which he also calls the ‘fixed factor scores model.’ In such models, variability amongst individuals is mainly due to different means rather than to individuals’ covariance structure, so they are distinctly different from the usual factor analysis framework.

Both factor analysis and PCA can be thought of as trying to represent some aspect of the covariance matrix Σ (or correlation matrix) as well as possible, but PCA concentrates on the diagonal elements, whereas in factor analysis the interest is in the off-diagonal elements. To justify this statement, consider first PCA.
The objective is to maximize ∑_{k=1}^m var(z_k) or, as ∑_{k=1}^p var(z_k) = ∑_{j=1}^p var(x_j), to account for as much as possible of the sum of the diagonal elements of Σ. As discussed after Property A3 in Section 2.1, the first m PCs will in addition often do a good job of explaining the off-diagonal elements of Σ, which means that PCs can frequently provide an adequate initial solution in a factor analysis. However, this is not the stated purpose of PCA and will not hold universally. Turning now to factor analysis, consider the factor model (7.1.2) and the corresponding equation (7.2.1) for Σ. It is seen that, as Ψ is diagonal, the common factor term Λf in (7.1.2) accounts completely for the off-diagonal elements of Σ in the perfect factor model, but there is no compulsion for the diagonal elements to be well explained by the common factors. The elements ψ_j, j = 1, 2, ..., p, of Ψ will all be low if all of the variables have considerable common variation, but if a variable x_j is almost independent of all other variables, then ψ_j = var(e_j) will be almost as large as var(x_j). Thus, factor analysis concentrates on explaining only the off-diagonal elements of …
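This contrast can be made concrete with a small numerical sketch (NumPy; the loadings and specific variances below are invented for illustration). Σ is built from a perfect one-factor model Σ = ΛΛ′ + Ψ in which variable 4 is almost independent of the rest: the common-factor term ΛΛ′ reproduces the off-diagonal elements of Σ exactly, while ψ_4 accounts for nearly all of var(x_4).

```python
import numpy as np

# Perfect one-factor model Sigma = Lambda Lambda' + Psi (illustrative
# numbers, not from the book). Variable 4 barely loads on the factor,
# so its specific variance psi_4 is almost the whole of var(x_4).
Lambda = np.array([[0.9], [0.8], [0.7], [0.05]])
Psi = np.diag([0.19, 0.36, 0.51, 0.9975])   # chosen so that var(x_j) = 1
Sigma = Lambda @ Lambda.T + Psi

# The common-factor term accounts completely for the off-diagonal elements:
common = Lambda @ Lambda.T
off = ~np.eye(4, dtype=bool)
print(np.allclose(Sigma[off], common[off]))   # True: exact by construction

# ...but the diagonal need not be well explained; for the nearly
# independent variable, psi_4 / var(x_4) is close to 1:
print(Psi[3, 3] / Sigma[3, 3])                # 0.9975
```

In the PCA direction, by contrast, the first PCs of this Σ would be chosen to maximize explained total variance (the diagonal sum), with no guarantee of reproducing the off-diagonal structure exactly.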

