Jolliffe, I. T. (2002). Principal Component Analysis, 2nd edition. Springer.

If a distribution other than the multivariate normal is assumed, distributional results for PCs will typically become less tractable. Jackson (1991, Section 4.8) gives a number of references that examine the non-normal case. In addition, for non-normal distributions a number of alternatives to PCs can reasonably be suggested (see Sections 13.1, 13.3 and 14.4).

Another deviation from the assumptions underlying most of the distributional results arises when the n observations are not independent. The classic examples of this are when the observations correspond to adjacent points in time (a time series) or in space. Another situation where non-independence occurs is found in sample surveys, where survey designs are often more complex than simple random sampling, and induce dependence between observations (see Skinner et al. (1986)). PCA for non-independent data, especially time series, is discussed in detail in Chapter 12.

As a complete contrast to the strict assumptions made in most work on the distributions of PCs, Efron and Tibshirani (1993, Section 7.2) look at the use of the 'bootstrap' in this context. The idea is, for a particular sample of n observations x_1, x_2, ..., x_n, to take repeated random samples of size n from the distribution that has P[x = x_i] = 1/n, i = 1, 2, ..., n, calculate the PCs for each sample, and build up empirical distributions for PC coefficients and variances. These distributions rely only on the structure of the sample, and not on any predetermined assumptions. Care needs to be taken in comparing PCs from different bootstrap samples because of possible reordering and/or sign switching in the PCs from different samples. Failure to account for these phenomena is likely to give misleadingly wide distributions for PC coefficients, and distributions for PC variances that may be too narrow.
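As a rough illustration of this resampling scheme (a sketch added here, not taken from Efron and Tibshirani or from the text), the code below draws bootstrap samples of size n with replacement, recomputes the sample PCs from each bootstrap covariance matrix, and flips the sign of each bootstrap eigenvector to match the PCs of the original sample before accumulating the empirical distributions. The function name bootstrap_pcs, the number of replicates and the simple sign-matching rule are assumptions for the example; reordering of components with similar variances is not handled.

import numpy as np

def bootstrap_pcs(X, n_boot=1000, seed=0):
    """Empirical distributions of PC coefficients and variances via the bootstrap.

    X is an (n x p) data matrix. Eigenvectors of each bootstrap covariance
    matrix are matched in sign to those of the full sample, to guard against
    the sign switching mentioned in the text.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Reference PCs from the original sample (columns are a_1, ..., a_p).
    _, ref = np.linalg.eigh(np.cov(X, rowvar=False))
    ref = ref[:, ::-1]                      # order by decreasing eigenvalue
    coefs = np.empty((n_boot, p, p))
    variances = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)    # sample of size n with P[x = x_i] = 1/n
        evals, evecs = np.linalg.eigh(np.cov(X[idx], rowvar=False))
        evals, evecs = evals[::-1], evecs[:, ::-1]
        # Flip each bootstrap eigenvector so it points the same way as the reference.
        signs = np.sign(np.sum(evecs * ref, axis=0))
        signs[signs == 0] = 1.0
        coefs[b] = evecs * signs
        variances[b] = evals
    return coefs, variances

Percentile intervals for individual PC coefficients or variances can then be read off the corresponding entries of the returned arrays.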

3.7 Inference Based on Sample Principal Components

The distributional results outlined in the previous section may be used to make inferences about population PCs, given the sample PCs, provided that the necessary assumptions are valid. The major assumption that x has a multivariate normal distribution is often not satisfied and the practical value of the results is therefore limited. It can be argued that PCA should only ever be done for data that are, at least approximately, multivariate normal, for it is only then that 'proper' inferences can be made regarding the underlying population PCs. As already noted in Section 2.2, this is a rather narrow view of what PCA can do, as it is a much more widely applicable tool whose main use is descriptive rather than inferential. It can provide valuable descriptive information for a wide variety of data, whether the variables are continuous and normally distributed or not. The majority of applications of PCA successfully treat the technique as a purely descriptive tool, although Mandel (1972) argued that retaining m PCs in an analysis implicitly assumes a model for the data, based on (3.5.3). There has recently been an upsurge of interest in models related to PCA; this is discussed further in Section 3.9.

Although the purely inferential side of PCA is a very small part of the overall picture, the ideas of inference can sometimes be useful and are discussed briefly in the next three subsections.

3.7.1 Point Estimation

The maximum likelihood estimator (MLE) for Σ, the covariance matrix of a multivariate normal distribution, is not S, but ((n − 1)/n)S (see, for example, Press (1972, Section 7.1) for a derivation). This result is hardly surprising, given the corresponding result for the univariate normal. If λ, l, α_k, a_k and related quantities are defined as in the previous section, then the MLEs of λ and α_k, k = 1, 2, ..., p, can be derived from the MLE of Σ and are equal to λ̂ = ((n − 1)/n)l and α̂_k = a_k, k = 1, 2, ..., p, assuming that the elements of λ are all positive and distinct. The MLEs are the same in this case as the estimators derived by the method of moments. The MLE for λ_k is biased but asymptotically unbiased, as is the MLE for Σ. As noted in the previous section, l itself, as well as λ̂, is a biased estimator for λ, but 'corrections' can be made to reduce the bias.

In the case where some of the λ_k are equal, the MLE for their common value is simply the average of the corresponding l_k, multiplied by (n − 1)/n. The MLEs of the α_k corresponding to equal λ_k are not unique; the (p × q) matrix whose columns are MLEs of α_k corresponding to equal λ_k can be multiplied by any (q × q) orthogonal matrix, where q is the multiplicity of the eigenvalues, to get another set of MLEs.

Most often, point estimates of λ, α_k are simply given by l, a_k, and they are rarely accompanied by standard errors. An exception is Flury (1997, Section 8.6). Jackson (1991, Sections 5.3, 7.5) goes further and gives examples that not only include estimated standard errors, but also estimates of the correlations between elements of l and between elements of a_k and a_k'. The practical implications of these (sometimes large) correlations are discussed in Jackson's examples. Flury (1988, Sections 2.5, 2.6) gives a thorough discussion of asymptotic inference for functions of the variances and coefficients of covariance-based PCs.

If multivariate normality cannot be assumed, and if there is no obvious alternative distributional assumption, then it may be desirable to use a 'robust' approach to the estimation of the PCs: this topic is discussed in Section 10.4.
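To make the point estimates above concrete, here is a minimal sketch (not from the book) that computes l and the a_k from the unbiased sample covariance matrix S and rescales the eigenvalues by (n − 1)/n to obtain the MLEs λ̂_k under multivariate normality with distinct eigenvalues. The function name pc_point_estimates and the simulated data are assumptions for the example.

import numpy as np

def pc_point_estimates(X):
    """Sample PC variances l_k and coefficients a_k, plus the MLEs under normality.

    S uses the unbiased divisor (n - 1); the MLE of each eigenvalue is the
    corresponding l_k multiplied by (n - 1)/n, while the MLEs of the
    coefficient vectors coincide with the a_k (eigenvalues assumed distinct).
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)             # (p x p) sample covariance matrix S
    l, A = np.linalg.eigh(S)                # eigenvalues ascending, eigenvectors in columns
    l, A = l[::-1], A[:, ::-1]              # reorder so l_1 >= l_2 >= ... >= l_p
    lam_mle = (n - 1) / n * l               # MLEs of the population eigenvalues
    return l, A, lam_mle

# Purely illustrative usage with simulated data.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[4.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=100)
l, A, lam_mle = pc_point_estimates(X)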
