Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



12. PCA for Time Series and Other Non-Independent Data
12.4. PCA and Non-Independent Data—Some Additional Topics

series, rather than being restricted to real series. It turns out (Brillinger, 1981, p. 344) that

\[
B'_u = \frac{1}{2\pi} \int_0^{2\pi} \tilde{B}(\lambda)\, e^{iu\lambda} \, d\lambda, \qquad
C_u = \frac{1}{2\pi} \int_0^{2\pi} \tilde{C}(\lambda)\, e^{iu\lambda} \, d\lambda,
\]

where $\tilde{C}(\lambda)$ is a $(p \times q)$ matrix whose columns are the first $q$ eigenvectors of the matrix $F(\lambda)$ given in (12.1.4), and $\tilde{B}(\lambda)$ is the conjugate transpose of $\tilde{C}(\lambda)$.

The $q$ series that form the elements of $z_t$ are called the first $q$ PC series of $x_t$. Brillinger (1981, Sections 9.3, 9.4) discusses various properties and estimates of these PC series, and gives an example in Section 9.6 on monthly temperature measurements at 14 meteorological stations. Principal component analysis in the frequency domain has also been used on economic time series, for example on Dutch provincial unemployment data (Bartels, 1977, Section 7.7).

There is a connection between frequency domain PCs and PCs defined in the time domain (Brillinger, 1981, Section 9.5). The connection involves Hilbert transforms and hence, as noted in Section 12.2.3, frequency domain PCA has links to HEOF analysis. Define the vector of variables $y_t^H(\lambda) = (x_t'(\lambda), x_t^{H\prime}(\lambda))'$, where $x_t(\lambda)$ is the contribution to $x_t$ at frequency $\lambda$ (Brillinger, 1981, Section 4.6), and $x_t^H(\lambda)$ is its Hilbert transform. Then the covariance matrix of $y_t^H(\lambda)$ is proportional to

\[
\begin{bmatrix}
\mathrm{Re}(F(\lambda)) & \mathrm{Im}(F(\lambda)) \\
-\mathrm{Im}(F(\lambda)) & \mathrm{Re}(F(\lambda))
\end{bmatrix},
\]

where the functions Re(.), Im(.) denote the real and imaginary parts, respectively, of their argument. A PCA of $y_t^H$ gives eigenvalues that are the eigenvalues of $F(\lambda)$ with a corresponding pair of eigenvectors

\[
\begin{bmatrix}
\mathrm{Re}(\tilde{C}_j(\lambda)) \\
\mathrm{Im}(\tilde{C}_j(\lambda))
\end{bmatrix},
\qquad
\begin{bmatrix}
-\mathrm{Im}(\tilde{C}_j(\lambda)) \\
\mathrm{Re}(\tilde{C}_j(\lambda))
\end{bmatrix},
\]

where $\tilde{C}_j(\lambda)$ is the $j$th column of $\tilde{C}(\lambda)$.

Horel (1984) interprets HEOF analysis as frequency domain PCA averaged over all frequency bands. When a single frequency of oscillation dominates the variation in a time series, the two techniques become the same.
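The eigenvalue relation between the real block matrix above and F(λ) can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated Hermitian matrix as a hypothetical stand-in for F(λ) at a single frequency; it is not an estimate of any spectral matrix from the text. Whether the eigenvector pair is built from an eigenvector of F(λ) or of its complex conjugate depends on the sign convention behind (12.1.4), which is not reproduced in this section; the sketch uses the conjugate, which has the same eigenvalues and makes the identity hold for the block layout as written.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3  # hypothetical number of series

# Hypothetical stand-in for a spectral matrix F(lambda) at one frequency:
# any Hermitian, positive semi-definite matrix serves for this check.
Z = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
F = Z @ Z.conj().T

# Real, symmetric 2p x 2p block matrix with the layout given in the text.
M = np.block([[F.real, F.imag],
              [-F.imag, F.real]])

eig_F = np.linalg.eigvalsh(F)  # p real eigenvalues, ascending
eig_M = np.linalg.eigvalsh(M)  # 2p real eigenvalues, ascending

# Each eigenvalue of F appears exactly twice among the eigenvalues of M.
assert np.allclose(eig_M, np.repeat(eig_F, 2))

# Eigenvector pair for the largest eigenvalue. ASSUMPTION: c is taken as
# an eigenvector of conj(F); see the note on sign conventions above.
vals, vecs = np.linalg.eigh(F.conj())
l, c = vals[-1], vecs[:, -1]
v1 = np.concatenate([c.real, c.imag])
v2 = np.concatenate([-c.imag, c.real])

# Both stacked real vectors are eigenvectors of M with eigenvalue l,
# and they are orthogonal to each other.
assert np.allclose(M @ v1, l * v1)
assert np.allclose(M @ v2, l * v2)
assert abs(v1 @ v2) < 1e-10
```

The doubling of each eigenvalue is why the PCA of the stacked real vector yields eigenvectors in pairs: one complex eigenvector carries the same information as two real ones.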
The averaging over frequencies of HEOF analysis is presumably the reason behind Plaut and Vautard’s (1994) claim that it is less good than MSSA at distinguishing propagating patterns with different frequencies.

Preisendorfer and Mobley (1988) describe a number of ways in which PCA is combined with a frequency domain approach. Their Section 4e discusses the use of PCA after a vector field has been transformed into the frequency domain using Fourier analysis, and for scalar-valued fields their Chapter 12 examines various combinations of real and complex-valued harmonic analysis with PCA.

Stoffer (1999) describes a different type of frequency domain PCA, which he calls the spectral envelope. Here a PCA is done on the spectral matrix $F(\lambda)$ relative to the time domain covariance matrix $\Gamma_0$. This is a form of generalized PCA for $F(\lambda)$ with $\Gamma_0$ as a metric (see Section 14.2.2), and leads to solving the eigenequation $[F(\lambda) - l(\lambda)\Gamma_0]\,a(\lambda) = 0$ for varying angular frequency $\lambda$. Stoffer (1999) advocates the method as a way of discovering whether the $p$ series $x_1(t), x_2(t), \ldots, x_p(t)$ share common signals and illustrates its use on two data sets involving pain perception and blood pressure.

The idea of cointegration is important in econometrics. It has a technical definition, but can essentially be described as follows. Suppose that the elements of the $p$-variate time series $x_t$ are stationary after, but not before, differencing. If there are one or more vectors $\alpha$ such that $\alpha' x_t$ is stationary without differencing, the $p$ series are cointegrated. Tests for cointegration based on the variances of frequency domain PCs have been put forward by a number of authors. For example, Cubadda (1995) points out problems with previously defined tests and suggests a new one.

12.4.2 Growth Curves and Longitudinal Data

A common type of data that take the form of curves, even if they are not necessarily recorded as such, consists of measurements of growth for animals or children. Some curves such as heights are monotonically increasing, but others such as weights need not be. The idea of using principal components to summarize the major sources of variation in a set of growth curves dates back to Rao (1958), and several of the examples in Ramsay and Silverman (1997) are of this type. Analyses of growth curves are often concerned with predicting future growth, and one way of doing this is to use principal components as predictors.
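The idea of using principal components of growth curves as predictors of later growth can be illustrated with a small simulation. This is only a sketch under stated assumptions: the heights are simulated rather than real, the measurement ages, target age, and all variable names are hypothetical, and ordinary least squares on the first two PC scores stands in for whatever regression a particular study might use. It is not the generalized PC regression of Rao (1987).

```python
import numpy as np

rng = np.random.default_rng(1)
n_children = 200
ages = np.arange(1, 9)   # hypothetical measurement ages 1..8
target_age = 12          # hypothetical age at which height is predicted

# Simulated (not real) heights in cm: each child has its own starting
# level and growth rate, plus measurement noise.
level = rng.normal(75.0, 3.0, size=n_children)
rate = rng.normal(6.0, 0.5, size=n_children)
noise = rng.normal(0.0, 0.5, size=(n_children, ages.size))
X = level[:, None] + rate[:, None] * (ages - 1) + noise
y = level + rate * (target_age - 1) + rng.normal(0.0, 0.5, size=n_children)

# Hold out the last 50 children to assess prediction.
train = np.arange(n_children) < 150
X_train, X_test = X[train], X[~train]
y_train, y_test = y[train], y[~train]

# PCA of the training curves via SVD of the column-centred data matrix.
mean_curve = X_train.mean(axis=0)
_, s, Vt = np.linalg.svd(X_train - mean_curve, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Scores on the first q PCs, using training loadings for both subsets.
q = 2
loadings = Vt[:q].T
scores_train = (X_train - mean_curve) @ loadings
scores_test = (X_test - mean_curve) @ loadings

# Regress later height on the PC scores (with an intercept).
A_train = np.column_stack([np.ones(len(y_train)), scores_train])
A_test = np.column_stack([np.ones(len(y_test)), scores_test])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
rmse = np.sqrt(np.mean((A_test @ beta - y_test) ** 2))

print("variance explained by first two PCs:", explained[:q].sum())
print("test RMSE (cm):", rmse)
```

Because the simulated curves vary in only two systematic ways (level and rate), two PCs capture nearly all the variation; real growth data would typically need the number of components to be chosen from the data.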
A form of generalized PC regression developed for this purpose is described by Rao (1987).

Caussinus and Ferré (1992) use PCA in a different type of analysis of growth curves. They consider a 7-parameter model for a set of curves, and estimate the parameters of the model separately for each curve. These 7-parameter estimates are then taken as values of 7 variables to be analyzed by PCA. A two-dimensional plot in the space of the first two PCs gives a representation of the relative similarities between members of the set of curves. Because the parameters are not estimated with equal precision, a weighted version of PCA is used, based on the fixed effects model described in Section 3.9.

Growth curves constitute a special case of longitudinal data, also known as ‘repeated measures,’ where measurements are taken on a number of individuals at several different points of time. Berkey et al. (1991) use PCA to model such data, calling their model a ‘longitudinal principal compo-

