Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)



4. Interpreting Principal Components: Examples

Interpretations of the first 11 PCs for the two age groups are given in Table 4.4, together with the percentage of total variation accounted for by each PC. The variances of corresponding PCs for the two age groups differ very little, and there are similar interpretations for several pairs of PCs, for example the first, second, sixth and eighth. In other cases there are groups of PCs involving the same variables, but in different combinations for the two age groups, for example the third, fourth and fifth PCs. Similarly, the ninth and tenth PCs involve the same variables for the two age groups, but the order of the PCs is reversed.

Principal component analysis has also been found useful in other demographic studies, one of the earliest being that described by Moser and Scott (1961). In this study, there were 57 demographic variables measured for 157 British towns. A PCA of these data showed that, unlike the elderly data, dimensionality could be vastly reduced; there are 57 variables, but as few as four PCs account for 63% of the total variation. These PCs also have ready interpretations as measures of social class, population growth from 1931 to 1951, population growth after 1951, and overcrowding.

Similar studies have been done on local authority areas in the UK by Imber (1977) and Webber and Craig (1978) (see also Jolliffe et al. (1986)). In each of these studies, as well as Moser and Scott (1961) and the ‘elderly at home’ project, the main objective was to classify the local authorities, towns or elderly individuals, and the PCA was done as a prelude to, or as part of, cluster analysis.
The use of PCA in cluster analysis is discussed further in Section 9.2, but the PCA in each study mentioned here provided useful information, separate from the results of the cluster analysis. For example, Webber and Craig (1978) used 40 variables, and they were able to interpret the first four PCs as measuring social dependence, family structure, age structure and industrial employment opportunity. These four components accounted for 29.5%, 22.7%, 12.0% and 7.4% of total variation, respectively, so that 71.6% of the total variation is accounted for in four interpretable dimensions.

4.3 Spatial and Temporal Variation in Atmospheric Science

Principal component analysis provides a widely used method of describing patterns of pressure, temperature, or other meteorological variables over a large spatial area. For example, Richman (1983) stated that, over the previous 3 years, more than 60 applications of PCA, or similar techniques, had appeared in meteorological/climatological journals. More recently, 53 out of 215 articles in the 1999 and 2000 volumes of the International Journal of Climatology used PCA in some form. No other statistical technique came close to this 25% rate of usage. The example considered in detail in this section is taken from Maryon (1979) and is concerned with sea level atmospheric pressure fields, averaged over half-month periods, for most of the Northern Hemisphere. There were 1440 half-months, corresponding to 60 years between 1900 and 1974, excluding the years 1916–21, 1940–48 when data were inadequate. The pressure fields are summarized by estimating average pressure at p = 221 grid points covering the Northern Hemisphere so that the data set consists of 1440 observations on 221 variables. Data sets of this size, or larger, are commonplace in atmospheric science, and a standard procedure is to replace the variables by a few large-variance PCs. The eigenvectors that define the PCs are often known as empirical orthogonal functions (EOFs) in the meteorological or climatological literature, and the values of the PCs (the PC scores) are sometimes referred to as amplitude time series (Rasmusson et al., 1981) or, confusingly, as coefficients (Maryon, 1979) or EOF coefficients (von Storch and Zwiers, 1999, Chapter 13). Richman (1986) distinguishes between EOF analysis and PCA, with the former having unit-length eigenvectors and the latter having eigenvectors renormalized, as in (2.3.2), to have lengths proportional to their respective eigenvalues. Other authors, such as von Storch and Zwiers (1999), treat PCA and EOF analysis as synonymous.

For each PC, there is a coefficient (in the usual sense of the word), or loading, for each variable, and because variables are gridpoints (geographical locations) it is possible to plot each loading (coefficient) on a map at its corresponding gridpoint, and then draw contours through geographical locations having the same coefficient values.
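
The steps just described (centre the gridpoint data, extract the leading eigenvectors, compute PC scores, and lay the loadings back onto the grid) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch on synthetic data, not Maryon's pressure fields or method: the function name `eof_analysis`, the 13 × 17 grid shape (chosen solely because 13 × 17 = 221), the two artificial spatial patterns, and the rescaling of each eigenvector by the square root of its eigenvalue are all assumptions made for the example.

```python
import numpy as np

def eof_analysis(X, n_components):
    """Hypothetical helper: leading EOFs, PC scores and all eigenvalues.

    X has one row per observation (e.g. half-month) and one column per
    variable (e.g. gridpoint).
    """
    Xc = X - X.mean(axis=0)                 # centre each gridpoint series
    # Rows of Vt are unit-length eigenvectors of the sample covariance matrix.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)       # variances of the PCs
    eofs = Vt[:n_components].T              # shape (p, k), unit-length columns
    scores = Xc @ eofs                      # shape (n, k), PC scores
    return eofs, scores, eigvals

# Synthetic stand-in for the data: 1440 observations on 221 gridpoints.
rng = np.random.default_rng(0)
n_obs, n_lat, n_lon = 1440, 13, 17          # grid shape is an assumption
p = n_lat * n_lon
lat = np.linspace(-1.0, 1.0, n_lat)[:, None]
lon = np.linspace(-1.0, 1.0, n_lon)[None, :]
pattern1 = (lat + 0.0 * lon).ravel()        # artificial north-south pattern
pattern2 = (0.0 * lat + lon).ravel()        # artificial east-west pattern
X = (3.0 * rng.normal(size=(n_obs, 1)) * pattern1
     + 2.0 * rng.normal(size=(n_obs, 1)) * pattern2
     + 0.5 * rng.normal(size=(n_obs, p)))

eofs, scores, eigvals = eof_analysis(X, n_components=2)

# Proportion of total variation accounted for by each retained PC.
explained = eigvals[:2] / eigvals.sum()

# Unit-length eigenvectors versus rescaled ones whose squared length equals
# the eigenvalue (an assumed convention for this sketch).
scaled = eofs * np.sqrt(eigvals[:2])

# Loadings of the second PC arranged on the grid, ready for contouring.
loading_map = eofs[:, 1].reshape(n_lat, n_lon)
```

The array `loading_map` holds one coefficient per gridpoint in its geographical position, so it could be passed directly to a contouring routine to produce a map of the kind discussed in the text.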
The map representation can greatly aid interpretation, as is illustrated in Figure 4.1.

This figure, which comes from Maryon (1979), gives the map of coefficients, arbitrarily renormalized to give ‘round numbers’ on the contours, for the second PC from the pressure data set described above, and is much easier to interpret than would be the corresponding table of 221 coefficients. Half-months having large positive scores for this PC will tend to have high values of the variables, that is high pressure values, where coefficients on the map are positive, and low values of the variables (low pressure values) at gridpoints where coefficients are negative. In Figure 4.1 this corresponds to low pressure in the polar regions and high pressure in the subtropics, leading to situations where there is a strong westerly flow in high latitudes at most longitudes. This is known as strong zonal flow, a reasonably frequent meteorological phenomenon, and the second PC therefore contrasts half-months with strong zonal flow with those of opposite character. Similarly, the first PC (not shown) has one of its extremes identified as corresponding to an intense high pressure area over Asia and such situations are again a fairly frequent occurrence, although only in winter.

Several other PCs in Maryon’s (1979) study can also be interpreted as corresponding to recognizable meteorological situations, especially when coefficients are plotted in map form. The use of PCs to summarize pressure fields and other meteorological or climatological fields has been found

