Jolliffe, I.T. (2002). Principal Component Analysis, 2nd edition. Springer.
12.1. Introduction

...series is almost a pure oscillation with angular frequency λ₀, then f(λ) is large for λ close to λ₀ and near zero elsewhere. This behaviour is signalled in the autocovariances by a large value of γ_k at k = k₀, where k₀ is the period of oscillation corresponding to angular frequency λ₀ (that is, k₀ = 2π/λ₀), and small values elsewhere.

Because there are two different but equivalent functions (12.1.1) and (12.1.2) expressing the second-order behaviour of a time series, there are two different types of analysis of time series, namely in the time domain using (12.1.1) and in the frequency domain using (12.1.2).

Consider now a time series that consists not of a single variable, but p variables. The definitions (12.1.1), (12.1.2) generalize readily to

    Γ_k = E[(x_i − µ)(x_{i+k} − µ)′],            (12.1.3)

and

    F(λ) = (1/2π) ∑_{k=−∞}^{∞} Γ_k e^{−ikλ},     (12.1.4)

where µ = E[x_i]. The mean µ is now a p-element vector, and Γ_k, F(λ) are (p × p) matrices.

Principal component analysis operates on a covariance or correlation matrix, but in time series we can calculate not only covariances between variables measured at the same time (the usual definition of covariance, given by the matrix Γ₀ defined in (12.1.3)), but also covariances between variables at different times, as measured by Γ_k, k ≠ 0. This is in contrast to the more usual situation where our observations x₁, x₂, ... are independent, so that any covariances between elements of x_i, x_j are zero when i ≠ j. In addition to the choice of which Γ_k to examine, the fact that the covariances have an alternative representation in the frequency domain means that there are several different ways in which PCA can be applied to time series data.

Before looking at specific techniques, we define the terms 'white noise' and 'red noise.' A white noise series is one whose terms are all identically distributed and independent of each other. Its spectrum is flat, like that of white light; hence its name.
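The time-domain/frequency-domain duality described above can be checked numerically. The sketch below (not from the book; the series and noise level are illustrative choices) generates a near-pure oscillation with angular frequency λ₀ = 2π/k₀ and verifies that its periodogram peaks at λ₀ while the sample autocovariances γ_k are large at lag k₀ and take a trough at lag k₀/2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
k0 = 64                           # period of the oscillation
lam0 = 2 * np.pi / k0             # angular frequency lambda_0 = 2*pi/k0
t = np.arange(n)
x = np.cos(lam0 * t) + 0.2 * rng.standard_normal(n)
xc = x - x.mean()

# Periodogram: squared modulus of the FFT, normalised by 2*pi*n.
# For a near-pure oscillation it is large only near lambda_0.
I = np.abs(np.fft.rfft(xc)) ** 2 / (2 * np.pi * n)
freqs = 2 * np.pi * np.arange(len(I)) / n   # angular frequencies in [0, pi]
peak = freqs[np.argmax(I)]                  # should sit at lambda_0

def gamma(k):
    """Sample autocovariance at lag k: (1/n) sum (x_i - xbar)(x_{i+k} - xbar)."""
    return np.dot(xc[:n - k], xc[k:]) / n

# gamma_{k0} is large and positive; half a period away it is large and negative.
print(peak, gamma(k0), gamma(k0 // 2))
```

The peak lands exactly at λ₀ here because k₀ divides n, so λ₀ is one of the Fourier frequencies; with an incommensurate period the peak would fall in the nearest frequency bin.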
Red noise is equivalent to a series that follows a positively autocorrelated first-order autoregressive model

    x_t = φ x_{t−1} + ε_t,    t = ..., 0, 1, 2, ...,

where φ is a constant such that 0 < φ < 1 and ε_t is a white noise series.
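A quick simulation makes the 'red' character of this model concrete (a numerical sketch, not from the book; φ = 0.7 and the series length are arbitrary choices). Successive terms are positively correlated, with lag-1 autocorrelation near φ, and the spectrum is concentrated at low frequencies, the analogue of red light:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 50_000
eps = rng.standard_normal(n)                # white noise innovations
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi ** 2)       # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]          # x_t = phi * x_{t-1} + eps_t

xc = x - x.mean()
rho1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)   # lag-1 autocorrelation ~ phi

# Compare spectral mass at low vs high angular frequencies:
# for red noise the low-frequency band dominates.
I = np.abs(np.fft.rfft(xc)) ** 2
m = len(I) // 10
low, high = I[1:m].mean(), I[-m:].mean()

print(rho1, low / high)
```

For this model the theoretical autocorrelation at lag k is φ^k, and the spectrum f(λ) ∝ 1/(1 − 2φ cos λ + φ²) decreases monotonically from λ = 0 to λ = π, which is what the low/high band ratio reflects.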
12. PCA for Time Series and Other Non-Independent Data

...encountered in this area, with observations corresponding to times and variables to spatial position, they are not necessarily restricted to such data.

Time series are usually measured at discrete points in time, but sometimes the series are curves. The analysis of such data is known as functional data analysis (functional PCA is the subject of Section 12.3). The final section of the chapter collects together a number of largely unconnected ideas and references concerning PCA in the context of time series and other non-independent data.

12.2 PCA-Related Techniques for (Spatio-)Temporal Atmospheric Science Data

It was noted in Section 4.3 that, for a common type of data in atmospheric science, the use of PCA, more often referred to as empirical orthogonal function (EOF) analysis, is widespread. The data concerned consist of measurements of some variable, for example, sea level pressure, temperature, ..., at p spatial locations (usually points on a grid) at n different times. The measurements at different spatial locations are treated as variables and the time points play the rôle of observations. An example of this type was given in Section 4.3. It is clear that, unless the observations are well separated in time, there is likely to be correlation between measurements at adjacent time points, so that we have non-independence between observations. Several techniques have been developed for use in atmospheric science that take account of correlation in both time and space, and these will be described in this section. First, however, we start with the simpler situation where there is a single time series. Here we can use a principal component-like technique, called singular spectrum analysis (SSA), to analyse the autocorrelation in the series.
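The EOF setup just described — p spatial locations as variables, n time points as observations — amounts to a singular value decomposition of the column-centred data matrix. The sketch below illustrates this on a synthetic field (the field, noise level, and dimensions are illustrative assumptions, not data from the book):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 30                                # n time points, p grid locations

# Synthetic field: one dominant spatial pattern whose amplitude varies in time,
# plus observational noise.
pattern = np.sin(np.linspace(0, np.pi, p))
amplitude = rng.standard_normal(n)
X = amplitude[:, None] * pattern[None, :] + 0.1 * rng.standard_normal((n, p))

Xc = X - X.mean(axis=0)                       # remove the time mean at each location
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

eofs = Vt                                     # rows: spatial patterns (EOFs)
pcs = U * s                                   # columns: PC time series
expl = s ** 2 / np.sum(s ** 2)                # fraction of variance per EOF

# The leading EOF recovers the planted spatial pattern (up to sign).
corr = abs(np.corrcoef(eofs[0], pattern)[0, 1])
print(expl[0], corr)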
SSA is described in Section 12.2.1, as is its extensionto several time series, multichannel singular spectrum analysis (MSSA).Suppose that a set of p series follows a multivariate first-order autoregressivemodel in which the values of the series at time t are linearly relatedto the values at time (t − 1), except for a multivariate white noise term.An estimate of the matrix defining the linear relationship can be subjectedto an eigenanalysis, giving insight into the structure of the series. Such ananalysis is known as principal oscillation pattern (POP) analysis, and isdiscussed in Section 12.2.2.One idea underlying POP analysis is that there may be patterns in themaps comprising our data set, which travel in space as time progresses, andthat POP analysis can help to find such patterns. Complex (Hilbert) empiricalorthogonal functions (EOFs), which are described in Section 12.2.3,are designed to achieve the same objective. Detection of detailed oscillatorybehaviour is also the aim of multitaper frequency-domain singularvalue decomposition, which is the subject of Section 12.2.4.
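The POP idea sketched above — fit a multivariate AR(1) model and eigenanalyse the estimated coefficient matrix — can be illustrated in a few lines. This is a minimal sketch on simulated data, not the book's worked analysis: the least-squares estimate is Â = C₁C₀⁻¹, where C₀ and C₁ are the sample lag-0 and lag-1 covariance matrices, and a complex eigenvalue pair λ = r·e^{±iθ} of Â corresponds to a damped oscillatory pattern with period 2π/θ and damping rate r:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 2 * np.pi / 12                  # true oscillation: period 12 time steps
r = 0.9                                 # damping per step
A = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

# Simulate the multivariate AR(1) model x_t = A x_{t-1} + white noise.
n = 20_000
x = np.zeros((n, 2))
for t in range(1, n):
    x[t] = A @ x[t - 1] + rng.standard_normal(2)

xc = x - x.mean(axis=0)
C0 = xc[:-1].T @ xc[:-1] / (n - 1)      # lag-0 covariance
C1 = xc[1:].T @ xc[:-1] / (n - 1)       # lag-1 covariance
A_hat = C1 @ np.linalg.inv(C0)          # least-squares AR(1) coefficient matrix

lam = np.linalg.eigvals(A_hat)          # POP eigenvalues (complex pair here)
period = 2 * np.pi / abs(np.angle(lam[0]))
damping = abs(lam[0])
print(period, damping)                  # close to 12 and 0.9
```

The eigenvectors of Â (the POPs themselves) give the spatial patterns associated with each oscillation; with more than one complex pair, each pair describes a separate travelling pattern.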