Jolliffe, I. Principal Component Analysis (2nd ed., Springer, 2002)


and a typical row of the matrix is
\[
\mathbf{x}'_i = (x_{i1},\, x_{(i+1)1},\, \ldots,\, x_{(i+m-1)1},\, x_{i2},\, \ldots,\, x_{(i+m-1)2},\, \ldots,\, x_{(i+m-1)p}), \qquad i = 1, 2, \ldots, n',
\]
where $x_{ij}$ is the value of the measured variable at the $i$th time point and the $j$th spatial location, and $m$ plays the same rôle in MSSA as $p$ does in SSA. The covariance matrix for this data matrix has the form
\[
\begin{bmatrix}
S_{11} & S_{12} & \cdots & S_{1p} \\
S_{21} & S_{22} & \cdots & S_{2p} \\
\vdots & \vdots &        & \vdots \\
S_{p1} & S_{p2} & \cdots & S_{pp}
\end{bmatrix},
\]
where $S_{kk}$ is an $(m \times m)$ covariance matrix at various lags for the $k$th variable (location), with the same structure as the covariance matrix in an SSA of that variable. The off-diagonal matrices $S_{kl}$, $k \neq l$, have $(i,j)$th element equal to the covariance between locations $k$ and $l$ at time lag $|i - j|$.

Plaut and Vautard (1994) claim that the 'fundamental property' of MSSA is its ability to detect oscillatory behaviour in the same manner as SSA, but rather than an oscillation of a single series the technique finds oscillatory spatial patterns. Furthermore, it is capable of finding oscillations with the same period but different spatially orthogonal patterns, and oscillations with the same spatial pattern but different periods.

The same problem of ascertaining 'significance' arises for MSSA as in SSA. Allen and Robertson (1996) tackle this problem in a similar manner to that adopted by Allen and Smith (1996) for SSA. The null hypothesis here extends one-dimensional 'red noise' to a set of $p$ independent AR(1) processes. A general multivariate AR(1) process is not appropriate as it can itself exhibit oscillatory behaviour, as exemplified in POP analysis (Section 12.2.2).

MSSA extends SSA from one time series to several, but if the number of time series $p$ is large, it can become unmanageable. A solution, which is used by Benzi et al. (1997), is to carry out PCA on the $(n \times p)$ data matrix, and then implement SSA separately on the first few PCs. Alternatively for large $p$, MSSA is often performed on the first few PCs instead of the variables themselves, as in Plaut and Vautard (1994).

Although MSSA is a natural extension of SSA, it is also equivalent to extended empirical orthogonal function (EEOF) analysis, which was introduced independently of SSA by Weare and Nasstrom (1982). Barnett and Hasselmann (1979) give an even more general analysis, in which different meteorological variables, as well as or instead of different time lags, may be included at the various locations. When different variables replace different time lags, the temporal correlation in the data is no longer taken into account, so further discussion is deferred to Section 14.5.
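To make the structure of the MSSA data matrix and its block lag-covariance matrix concrete, the following short sketch builds both with NumPy. The function names (mssa_trajectory, mssa_cov) and the toy data are illustrative assumptions, not part of the book or of any standard library; the sketch assumes a single variable observed at $p$ locations, as above.

```python
# Minimal sketch (illustrative, not from the book): the MSSA trajectory matrix
# and its (mp x mp) block lag-covariance matrix.
import numpy as np

def mssa_trajectory(X, m):
    """Stack m lagged copies of each of the p series in the (n x p) matrix X.

    Row i of the result is
    (x_{i,1}, ..., x_{i+m-1,1}, x_{i,2}, ..., x_{i+m-1,2}, ..., x_{i+m-1,p}),
    so it has n' = n - m + 1 rows and m*p columns.
    """
    n, p = X.shape
    n_prime = n - m + 1
    blocks = [
        np.column_stack([X[i:i + n_prime, j] for i in range(m)])  # (n' x m) lags of series j
        for j in range(p)
    ]
    return np.hstack(blocks)                                      # (n' x m*p)

def mssa_cov(X, m):
    """Block covariance matrix: diagonal blocks S_kk are the (m x m) lag
    covariances of series k; off-diagonal blocks S_kl hold cross-covariances
    between locations k and l at lags 0, ..., m-1."""
    T = mssa_trajectory(X, m)
    return np.cov(T, rowvar=False)                                # (m*p x m*p)

# Toy example: p = 3 series of length n = 200, window length m = 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)).cumsum(axis=0)
S = mssa_cov(X, 20)
eigvals, eigvecs = np.linalg.eigh(S)          # extended EOFs and their variances
print(S.shape, eigvals[::-1][:5])             # leading MSSA eigenvalues
```

An eigenanalysis of the resulting $(mp \times mp)$ matrix then plays the same part in MSSA as the eigenanalysis of the $(m \times m)$ lag-covariance matrix does in an SSA of a single series.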

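The Monte Carlo comparison against a null of $p$ independent AR(1) ('red noise') processes can also be sketched numerically. What follows is a simplified illustration of the idea, not the procedure published by Allen and Robertson (1996): each series gets a crude AR(1) fit, surrogate data sets are generated from those fits, and the surrogate lag-covariance matrices are projected onto the data's extended EOFs for comparison. All function names, the fitting method, and the choice of 100 surrogates are assumptions made for the example.

```python
# Simplified sketch (not the published algorithm): comparing the data's MSSA
# spectrum with surrogates drawn from p independent AR(1) processes.
import numpy as np

def lag_cov_blocks(X, m):
    """(m*p x m*p) covariance of the MSSA trajectory matrix of X (as above)."""
    n, p = X.shape
    n_prime = n - m + 1
    T = np.hstack([
        np.column_stack([X[i:i + n_prime, j] for i in range(m)])
        for j in range(p)
    ])
    return np.cov(T, rowvar=False)

def fit_ar1(x):
    """Crude AR(1) fit: lag-1 regression coefficient and innovation s.d."""
    x = x - x.mean()
    phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    sigma = np.std(x[1:] - phi * x[:-1])
    return phi, sigma

def ar1_surrogate(phi, sigma, n, rng):
    """Generate one AR(1) ('red noise') surrogate series of length n."""
    e = rng.normal(scale=sigma, size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)).cumsum(axis=0)           # toy data
m, n_surr = 20, 100

S = lag_cov_blocks(X, m)
evals, E = np.linalg.eigh(S)                                # data eigenvalues / extended EOFs

# Project each surrogate covariance onto the data EOFs and keep the diagonal;
# data eigenvalues well above the surrogate range suggest structure beyond red noise.
params = [fit_ar1(X[:, j]) for j in range(X.shape[1])]
surr_diag = np.empty((n_surr, S.shape[0]))
for s in range(n_surr):
    Xs = np.column_stack([ar1_surrogate(phi, sig, X.shape[0], rng) for phi, sig in params])
    surr_diag[s] = np.diag(E.T @ lag_cov_blocks(Xs, m) @ E)

upper = np.percentile(surr_diag, 97.5, axis=0)
print(np.sum(evals > upper), "eigenvalues exceed the 97.5% surrogate level")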
The general technique, including both time lags and several variables, is referred to as multivariate EEOF (MEEOF) analysis by Mote et al. (2000), who give an example of the technique for five variables, and compare the results to those of separate EEOFs (MSSAs) for each variable. Mote and coworkers note that it is possible that some of the dominant MEEOF patterns may not be dominant in any of the individual EEOF analyses, and this may be viewed as a disadvantage of the method. On the other hand, MEEOF analysis has the advantage of showing directly the connections between patterns for the different variables. Discussion of the properties of MSSA and MEEOF analysis is ongoing (in addition to Mote et al. (2000), see Monahan et al. (1999), for example). Compagnucci et al. (2001) propose yet another variation on the same theme. In their analysis, the PCA is done on the transpose of the matrix used in MSSA, a so-called T-mode instead of S-mode analysis (see Section 14.5). Compagnucci et al. call their technique principal sequence pattern analysis.

12.2.2 Principal Oscillation Pattern (POP) Analysis

SSA, MSSA, and other techniques described in this chapter can be viewed as special cases of PCA, once the variables have been defined in a suitable way. With the chosen definition of the variables, the procedures perform an eigenanalysis of a covariance matrix. POP analysis is different, but it is described briefly here because its results are used for similar purposes to those of some of the PCA-based techniques for time series included elsewhere in the chapter. Furthermore, its core is an eigenanalysis, albeit not on a covariance matrix.

POP analysis was introduced by Hasselmann (1988). Suppose that we have the usual $(n \times p)$ matrix of measurements on a meteorological variable, taken at $n$ time points and $p$ spatial locations. POP analysis has an underlying assumption that the $p$ time series can be modelled as a multivariate first-order autoregressive process. If $\mathbf{x}'_t$ is the $t$th row of the data matrix, we have
\[
(\mathbf{x}_{t+1} - \boldsymbol{\mu}) = \boldsymbol{\Upsilon}(\mathbf{x}_t - \boldsymbol{\mu}) + \boldsymbol{\epsilon}_t, \qquad t = 1, 2, \ldots, (n-1), \tag{12.2.1}
\]
where $\boldsymbol{\Upsilon}$ is a $(p \times p)$ matrix of constants, $\boldsymbol{\mu}$ is a vector of means for the $p$ variables, and $\boldsymbol{\epsilon}_t$ is a multivariate white noise term. Standard results from multivariate regression analysis (Mardia et al., 1979, Chapter 6) lead to estimation of $\boldsymbol{\Upsilon}$ by $\hat{\boldsymbol{\Upsilon}} = \mathbf{S}_1 \mathbf{S}_0^{-1}$, where $\mathbf{S}_0$ is the usual sample covariance matrix for the $p$ variables, and $\mathbf{S}_1$ has $(i,j)$th element equal to the sample covariance between the $i$th and $j$th variables at lag 1. POP analysis then finds the eigenvalues and eigenvectors of $\hat{\boldsymbol{\Upsilon}}$. The eigenvectors are known as principal oscillation patterns (POPs) and denoted $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_p$. The quantities $z_{t1}, z_{t2}, \ldots, z_{tp}$, which can be used to reconstitute $\mathbf{x}_t$ as $\sum_{k=1}^{p} z_{tk}\mathbf{p}_k$, are called the POP coefficients. They play a similar rôle in POP analysis to that of PC scores in PCA.

One obvious question is why this technique is called principal oscillation pattern analysis. Because $\hat{\boldsymbol{\Upsilon}}$ is not symmetric, it typically has a mixture of real eigenvalues and complex eigenvalues occurring in conjugate pairs.
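A minimal numerical sketch of the estimation step just described, assuming centred data: form $\mathbf{S}_0$ and the lag-1 matrix $\mathbf{S}_1$, estimate $\hat{\boldsymbol{\Upsilon}} = \mathbf{S}_1\mathbf{S}_0^{-1}$, and take its eigendecomposition. The function name pop_analysis and the toy data are assumptions for the example; the handling and normalization of complex-conjugate POP pairs is omitted.

```python
# Minimal sketch (illustrative names, not from the book): POP analysis as the
# eigendecomposition of Upsilon_hat = S1 @ inv(S0).
import numpy as np

def pop_analysis(X):
    n, p = X.shape
    Xc = X - X.mean(axis=0)                       # centre each series
    S0 = (Xc.T @ Xc) / (n - 1)                    # lag-0 sample covariance
    S1 = (Xc[1:].T @ Xc[:-1]) / (n - 1)           # lag-1: (i,j) = cov(x_{t+1,i}, x_{t,j})
    Upsilon_hat = S1 @ np.linalg.inv(S0)
    eigvals, P = np.linalg.eig(Upsilon_hat)       # columns of P are the POPs p_1, ..., p_p
    Z = np.linalg.solve(P, Xc.T).T                # POP coefficients: x_t = sum_k z_tk p_k
    return eigvals, P, Z

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4)).cumsum(axis=0)  # toy data: 4 series, 500 time points
eigvals, P, Z = pop_analysis(X)
print(np.round(eigvals, 3))                        # complex-conjugate pairs indicate oscillatory POPs
```

Because $\hat{\boldsymbol{\Upsilon}}$ is generally not symmetric, np.linalg.eig (rather than eigh) is used, and the returned eigenvalues and POPs may be complex.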
