Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
7.2. Estimation of the Factor Model

a sample estimate of the matrix of partial correlations is calculated. The determinant of this matrix will attain its maximum value of unity when all its off-diagonal elements are zero, so that maximizing this determinant is one way of attempting to minimize the absolute values of the partial correlations. This maximization problem leads to the MLEs, but here they appear regardless of whether or not multivariate normality holds.

The procedure suggested by Rao (1955) is based on canonical correlation analysis (see Section 9.3) between x and f. He looks, successively, for pairs of linear functions {a′_{k1}x, a′_{k2}f} that have maximum correlation subject to being uncorrelated with previous pairs. The factor loadings are then proportional to the elements of the a_{k2}, k = 1, 2, ..., m, which in turn leads to the same loadings as for the MLEs based on the assumption of multivariate normality (Rao, 1955). As with the criterion based on partial correlations, no distributional assumptions are necessary for Rao's canonical analysis.

In a way, the behaviour of the partial correlation and canonical correlation criteria parallels the phenomenon in regression where the least squares criterion is valid regardless of the distribution of error terms, but if errors are normally distributed then least squares estimators have the added attraction of maximizing the likelihood function.

An alternative but popular way of getting initial estimates for Λ is to use the first m PCs. If z = A′x is the vector consisting of all p PCs, with A defined to have α_k, the kth eigenvector of Σ, as its kth column as in (2.1.1), then x = Az because of the orthogonality of A.
If A is partitioned into its first m and last (p − m) columns, with a similar partitioning of the rows of z, then

    x = (A_m | A*_{p−m}) [ z_m
                           z*_{p−m} ]                    (7.2.3)
      = A_m z_m + A*_{p−m} z*_{p−m}
      = Λf + e,

where Λ = A_m, f = z_m and e = A*_{p−m} z*_{p−m}.

Equation (7.2.3) looks very much like the factor model (7.1.2), but it violates a basic assumption of the factor model, because the elements of e in (7.2.3) are not usually uncorrelated. Despite the apparently greater sophistication of using the sample version of A_m as an initial estimator, compared with crude techniques such as centroid estimates, its theoretical justification is really no stronger.

As well as the straightforward use of PCs to estimate Λ, many varieties of factor analysis use modifications of this approach; this topic will be discussed further in the next section.
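The PC-based initialization just described, and the reason its residual e violates the factor model, can be sketched numerically. This is an illustrative simulation in NumPy; the data, dimensions, and variable names are invented for the sketch and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented illustrative data: n observations on p correlated variables.
n, p, m = 500, 5, 2
mixing = rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ mixing
X -= X.mean(axis=0)                      # centre the variables

S = np.cov(X, rowvar=False)              # sample covariance matrix
eigvals, A = np.linalg.eigh(S)           # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort to descending
eigvals, A = eigvals[order], A[:, order]

Z = X @ A                                # z = A'x for every observation
Lambda0 = A[:, :m]                       # initial loading estimate: Λ = A_m

# Residual e = A*_{p-m} z*_{p-m}: the part of x left over after m PCs.
E = X - Z[:, :m] @ Lambda0.T

# Its covariance, A*_{p-m} diag(l_{m+1}, ..., l_p) A*'_{p-m}, is generally
# NOT diagonal -- this is the violated factor-model assumption, since the
# model requires the specific-factor covariance Ψ to be diagonal.
resid_cov = np.cov(E, rowvar=False)
off_diag = resid_cov[~np.eye(p, dtype=bool)]
```

With real data, X would be the observed (centred) data matrix; the point of the sketch is that cov(e) retains off-diagonal structure, whereas the factor model requires the specific variances in Ψ to be uncorrelated.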
7. Principal Component Analysis and Factor Analysis

7.3 Comparisons and Contrasts Between Factor Analysis and Principal Component Analysis

As mentioned in Section 7.1, a major distinction between factor analysis and PCA is that there is a definite model underlying factor analysis, but for most purposes no model is assumed in PCA. Section 7.2 concluded by describing the most common way in which PCs are used in factor analysis. Further connections and contrasts between the two techniques are discussed in the present section, but first we revisit the ‘models’ that have been proposed for PCA. Recall from Section 3.9 that Tipping and Bishop (1999a) describe a model in which x has covariance matrix BB′ + σ²I_p, where B is a (p × q) matrix. Identifying B with Λ, and q with m, it is clear that this model is equivalent to a special case of equation (7.2.1) in which Ψ = σ²I_p, so that all p specific variances are equal.

De Leeuw (1986) refers to a generalization of Tipping and Bishop's (1999a) model, in which σ²I_p is replaced by a general covariance matrix for the error terms in the model, as the (random factor score) factor analysis model. This model is also discussed by Roweis (1997). A related model, in which the factors are assumed to be fixed rather than random, corresponds to Caussinus's (1986) fixed effects model, which he also calls the ‘fixed factor scores model.’ In such models, variability amongst individuals is mainly due to different means rather than to individuals' covariance structure, so they are distinctly different from the usual factor analysis framework.

Both factor analysis and PCA can be thought of as trying to represent some aspect of the covariance matrix Σ (or correlation matrix) as well as possible, but PCA concentrates on the diagonal elements, whereas in factor analysis the interest is in the off-diagonal elements. To justify this statement, consider first PCA.
The objective is to maximize ∑_{k=1}^{m} var(z_k) or, as ∑_{k=1}^{p} var(z_k) = ∑_{j=1}^{p} var(x_j), to account for as much as possible of the sum of diagonal elements of Σ. As discussed after Property A3 in Section 2.1, the first m PCs will in addition often do a good job of explaining the off-diagonal elements of Σ, which means that PCs can frequently provide an adequate initial solution in a factor analysis. However, this is not the stated purpose of PCA and will not hold universally. Turning now to factor analysis, consider the factor model (7.1.2) and the corresponding equation (7.2.1) for Σ. It is seen that, as Ψ is diagonal, the common factor term Λf in (7.1.2) accounts completely for the off-diagonal elements of Σ in the perfect factor model, but there is no compulsion for the diagonal elements to be well explained by the common factors. The elements ψ_j, j = 1, 2, ..., p, of Ψ will all be low if all of the variables have considerable common variation, but if a variable x_j is almost independent of all other variables, then ψ_j = var(e_j) will be almost as large as var(x_j). Thus, factor analysis concentrates on explaining only the off-diagonal elements of