Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)


8.6 Functional and Structural Relationships

Consider the case where there are (p + 1) variables $x_0, x_1, x_2, \ldots, x_p$ that have a linear functional relationship (Kendall and Stuart, 1979, p. 416)

$$\sum_{j=0}^{p} \beta_j x_j = \text{const} \tag{8.6.1}$$

between them, but which are all subject to measurement error, so that we actually have observations on $\xi_0, \xi_1, \xi_2, \ldots, \xi_p$, where

$$\xi_j = x_j + e_j, \qquad j = 0, 1, 2, \ldots, p,$$

and $e_j$ is a measurement error term. The distinction between 'functional' and 'structural' relationships is that $x_1, x_2, \ldots, x_p$ are taken as fixed in the former but are random variables in the latter. We have included (p + 1) variables in order to keep a parallel with the case of linear regression with dependent variable y and p predictor variables $x_1, x_2, \ldots, x_p$, but there is no reason here to treat any one variable differently from the remaining p. On the basis of n observations on $\xi_j$, $j = 0, 1, 2, \ldots, p$, we wish to estimate the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ in the relationship (8.6.1). If the $e_j$ are assumed to be normally distributed, and (the ratios of) their variances are known, then maximum likelihood estimation of $\beta_0, \beta_1, \ldots, \beta_p$ leads to the coefficients of the last PC from the covariance matrix of $\xi_0/\sigma_0, \xi_1/\sigma_1, \ldots, \xi_p/\sigma_p$, where $\sigma_j^2 = \operatorname{var}(e_j)$. This holds for both functional and structural relationships. If there is no information about the variances of the $e_j$, and the $x_j$ are distinct, then no formal estimation procedure is possible, but if it is expected that the measurement errors of all (p + 1) variables are of similar variability, then a reasonable procedure is to use the last PC of $\xi_0, \xi_1, \ldots, \xi_p$.

If replicate observations are available for each $x_j$, they can be used to estimate $\operatorname{var}(e_j)$. In this case, Anderson (1984) shows that the maximum likelihood estimates for functional, but not structural, relationships are given by solving an eigenequation, similar to a generalized PCA in which the PCs of between-$x_j$ variation are found with respect to a metric based on within-$x_j$ variation (see Section 14.2.2). Even if there is no formal requirement to estimate a relationship such as (8.6.1), the last few PCs are still of interest in finding near-constant linear relationships among a set of variables, as discussed in Section 3.4.

When the last PC is used to estimate a 'best-fitting' relationship between a set of (p + 1) variables, we are finding the p-dimensional hyperplane for which the sum of squares of perpendicular distances of the observations from the hyperplane is minimized. This was, in fact, one of the objectives of Pearson's (1901) original derivation of PCs (see Property G3 in Section 3.2). By contrast, if one of the (p + 1) variables y is a dependent variable and the remaining p are predictor variables, then the 'best-fitting' hyperplane, in the least squares sense, minimizes the sum of squares of the distances in the y direction of the observations from the hyperplane and leads to a different relationship.
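To make the last-PC estimator of (8.6.1) concrete, here is a minimal sketch in Python (not code from the book) for the known-error-variance case described above: each observed variable is scaled by its error standard deviation, the eigenvector of the covariance matrix with the smallest eigenvalue is extracted, and its coefficients are mapped back to the original scale. The simulated relationship $2x_0 + x_1 - 3x_2 = 4$ and the NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a functional relationship 2*x0 + 1*x1 - 3*x2 = 4 (so p + 1 = 3).
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
x0 = (4.0 - 1.0 * x1 + 3.0 * x2) / 2.0           # exact relationship

sigma = np.array([0.3, 0.5, 0.4])                # known error std devs
xi = np.column_stack([x0, x1, x2]) + rng.normal(0, sigma, (n, 3))

# Last PC of the covariance matrix of xi_j / sigma_j.
scaled = xi / sigma
cov = np.cov(scaled, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
a = eigvecs[:, 0]                                # direction of smallest variance

# sum_j a_j (xi_j / sigma_j) is nearly constant, so beta_j is
# proportional to a_j / sigma_j on the original scale.
beta = a / sigma
beta /= beta[0] / 2.0                            # fix scale/sign so beta_0 = 2
print("estimated beta:", beta)                   # approximately [2, 1, -3]
```

Because the eigenvector is defined only up to scale and sign, the coefficients need a normalization such as the one in the final step before they can be compared with the true values.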
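The contrast drawn in the final paragraph between the two 'best-fitting' hyperplanes can be seen in the simplest case, p = 1. The following sketch (again using simulated data as an illustrative assumption) fits a line both ways: the last PC minimizes perpendicular distances, as in Pearson's (1901) derivation, while ordinary least squares minimizes distances in the y direction, and the two slopes differ.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 2, 200)
y = 1.5 * x + rng.normal(0, 1, 200)       # noise in y only, for illustration
X = np.column_stack([x, y])
Xc = X - X.mean(axis=0)

# Orthogonal fit: the normal vector of the line is the last PC.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
normal = vt[-1]                           # direction of smallest variance
slope_pc = -normal[0] / normal[1]         # line: normal . (x, y) = const

# Ordinary least squares fit of y on x, for comparison.
slope_ols = np.polyfit(x, y, 1)[0]

print(f"orthogonal (last-PC) slope: {slope_pc:.3f}")
print(f"OLS slope:                  {slope_ols:.3f}")
```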

A different way of using PCs in investigating structural relationships is illustrated by Rao (1964). In his example there are 20 variables corresponding to measurements of 'absorbance' made by a spectrophotometer at 20 different wavelengths. There are 54 observations of the 20 variables, corresponding to nine different spectrophotometers, each used under three conditions on two separate days. The aim is to relate the absorbance measurements to wavelengths; both are subject to measurement error, so that a structural relationship, rather than straightforward regression analysis, is of interest. In this example, the first PCs, rather than the last, proved to be useful in investigating aspects of the structural relationship. Examination of the values of the first two PCs for the 54 observations identified systematic differences between spectrophotometers in the measurement errors for wavelength. Other authors have used similar, but rather more complicated, ideas based on PCs for the same type of data. Naes (1985) refers to the problem as one of multivariate calibration (see also Martens and Naes (1989)) and investigates an estimate (which uses PCs) of some chemical or physical quantity, given a number of spectrophotometer measurements. Sylvestre et al. (1974) take as their objective the identification and estimation of mixtures of two or more overlapping curves in spectrophotometry, and again use PCs in their procedure.

8.7 Examples of Principal Components in Regression

Early examples of PC regression include those given by Kendall (1957, p. 71), Spurrell (1963) and Massy (1965). Examples of latent root regression in one form or another, and its use in variable selection, are given by Gunst et al. (1976), Gunst and Mason (1977b), Hawkins (1973), Baskerville and Toogood (1982) and Hawkins and Eplett (1982). In Gunst and Mason (1980, Chapter 10) PC regression, latent root regression and ridge regression are all illustrated, and can therefore be compared, for the same data set. In the present section we discuss two examples illustrating some of the techniques described in this chapter.

8.7.1 Pitprop Data

No discussion of PC regression would be complete without the example given originally by Jeffers (1967) concerning strengths of pitprops, which has since been analysed by several authors. The data consist of 14 variables which were measured for each of 180 pitprops cut from Corsican pine timber. The objective is to construct a prediction equation for one of the variables (compressive strength y) using the values of the other 13 variables. These other 13 variables are physical measurements on the pitprops that could be measured fairly straightforwardly without destroying the props.
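Before turning to the pitprop results, the following sketch shows the basic PC regression computation that analyses of this kind rest on: regress y on the first k PCs of the predictors, then map the fitted coefficients back to the original variables. The data here are simulated stand-ins whose dimensions match Jeffers' example (180 observations, 13 predictors); the choice of k = 4 retained components and the NumPy implementation are illustrative assumptions, not the analysis from the book.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 180, 13, 4                      # 180 props, 13 predictors, keep 4 PCs

X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # induce near-collinearity
y = X @ rng.normal(size=p) + rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCs of the predictors via the SVD of the centred data matrix;
# rows of Vt are the PC loading vectors.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                         # scores on the first k PCs

# Least squares regression of y on the PC scores, then back-transform
# the coefficients to the scale of the original 13 variables.
gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
beta_pcr = Vt[:k].T @ gamma
print("PCR coefficients:", np.round(beta_pcr, 3))
```

Deleting the low-variance PCs (here, components k + 1 onwards) is what stabilizes the coefficient estimates when the predictors are nearly collinear, at the cost of some bias.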

