Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)
Interpretations of the first 11 PCs for the two age groups are given in Table 4.4, together with the percentage of total variation accounted for by each PC. The variances of corresponding PCs for the two age groups differ very little, and there are similar interpretations for several pairs of PCs, for example the first, second, sixth and eighth. In other cases there are groups of PCs involving the same variables, but in different combinations for the two age groups, for example the third, fourth and fifth PCs. Similarly, the ninth and tenth PCs involve the same variables for the two age groups, but the order of the PCs is reversed.

Principal component analysis has also been found useful in other demographic studies, one of the earliest being that described by Moser and Scott (1961). In this study, there were 57 demographic variables measured for 157 British towns. A PCA of these data showed that, unlike the elderly data, dimensionality could be vastly reduced; there are 57 variables, but as few as four PCs account for 63% of the total variation. These PCs also have ready interpretations as measures of social class, population growth from 1931 to 1951, population growth after 1951, and overcrowding.

Similar studies have been done on local authority areas in the UK by Imber (1977) and Webber and Craig (1978) (see also Jolliffe et al. (1986)). In each of these studies, as well as Moser and Scott (1961) and the ‘elderly at home’ project, the main objective was to classify the local authorities, towns or elderly individuals, and the PCA was done as a prelude to, or as part of, cluster analysis.
The use of PCA in cluster analysis is discussed further in Section 9.2, but the PCA in each study mentioned here provided useful information, separate from the results of the cluster analysis. For example, Webber and Craig (1978) used 40 variables, and they were able to interpret the first four PCs as measuring social dependence, family structure, age structure and industrial employment opportunity. These four components accounted for 29.5%, 22.7%, 12.0% and 7.4% of total variation, respectively, so that 71.6% of the total variation is accounted for in four interpretable dimensions.

4.3 Spatial and Temporal Variation in Atmospheric Science

Principal component analysis provides a widely used method of describing patterns of pressure, temperature, or other meteorological variables over a large spatial area. For example, Richman (1983) stated that, over the previous 3 years, more than 60 applications of PCA, or similar techniques, had appeared in meteorological/climatological journals. More recently, 53 out of 215 articles in the 1999 and 2000 volumes of the International Journal of Climatology used PCA in some form. No other statistical technique came close to this 25% rate of usage. The example considered in detail in this
section is taken from Maryon (1979) and is concerned with sea level atmospheric pressure fields, averaged over half-month periods, for most of the Northern Hemisphere. There were 1440 half-months, corresponding to 60 years between 1900 and 1974, excluding the years 1916–21 and 1940–48, when data were inadequate. The pressure fields are summarized by estimating average pressure at p = 221 grid points covering the Northern Hemisphere, so that the data set consists of 1440 observations on 221 variables. Data sets of this size, or larger, are commonplace in atmospheric science, and a standard procedure is to replace the variables by a few large-variance PCs. The eigenvectors that define the PCs are often known as empirical orthogonal functions (EOFs) in the meteorological or climatological literature, and the values of the PCs (the PC scores) are sometimes referred to as amplitude time series (Rasmusson et al., 1981) or, confusingly, as coefficients (Maryon, 1979) or EOF coefficients (von Storch and Zwiers, 1999, Chapter 13). Richman (1986) distinguishes between EOF analysis and PCA, with the former having unit-length eigenvectors and the latter having eigenvectors renormalized, as in (2.3.2), to have lengths proportional to their respective eigenvalues. Other authors, such as von Storch and Zwiers (1999), treat PCA and EOF analysis as synonymous.

For each PC, there is a coefficient (in the usual sense of the word), or loading, for each variable, and because variables are gridpoints (geographical locations) it is possible to plot each loading (coefficient) on a map at its corresponding gridpoint, and then draw contours through geographical locations having the same coefficient values.
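
The terminology above (EOFs, PC scores, renormalized eigenvectors, loadings laid out on a grid) can be illustrated with a minimal sketch. It is not Maryon's (1979) computation: the data are synthetic random numbers with the same shape (1440 observations on 221 variables), the function name eof_analysis is hypothetical, the 13 x 17 grid is an arbitrary layout chosen only because 13 x 17 = 221, and the renormalization is assumed to scale each eigenvector by the square root of its eigenvalue, so that its squared length equals the eigenvalue.

```python
import numpy as np

def eof_analysis(X):
    """Covariance-matrix PCA of an (n x p) data matrix via the SVD.

    Returns the eigenvalues, the unit-length eigenvectors (EOFs) as
    columns, and the PC scores (the 'amplitude time series').
    """
    Xc = X - X.mean(axis=0)                       # centre each variable (gridpoint)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)             # eigenvalues of the sample covariance matrix
    eofs = Vt.T                                   # column k is the k-th unit-length EOF
    scores = Xc @ eofs                            # one score per observation per PC
    return eigvals, eofs, scores

# Synthetic stand-in for the pressure data (assumption: random values only).
rng = np.random.default_rng(0)
n, p = 1440, 221
X = rng.normal(size=(n, p))

eigvals, eofs, scores = eof_analysis(X)

# Assumed renormalization: scale each eigenvector by sqrt(eigenvalue),
# so that its squared length equals the eigenvalue.
scaled = eofs * np.sqrt(eigvals)

# Proportion of total variation accounted for by each PC.
explained = eigvals / eigvals.sum()

# Loadings of the second PC arranged on a hypothetical 13 x 17 grid,
# ready for contouring; a real grid would follow the gridpoint geography.
second_pc_map = eofs[:, 1].reshape(13, 17)
```

With real gridded data, second_pc_map would be passed to a contouring routine to give a map of coefficients; only the scaling of the contour values differs between the unit-length and renormalized versions.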
The map representation can greatly aid interpretation, as is illustrated in Figure 4.1. This figure, which comes from Maryon (1979), gives the map of coefficients, arbitrarily renormalized to give ‘round numbers’ on the contours, for the second PC from the pressure data set described above, and is much easier to interpret than would be the corresponding table of 221 coefficients. Half-months having large positive scores for this PC will tend to have high values of the variables, that is high pressure values, where coefficients on the map are positive, and low values of the variables (low pressure values) at gridpoints where coefficients are negative. In Figure 4.1 this corresponds to low pressure in the polar regions and high pressure in the subtropics, leading to situations where there is a strong westerly flow in high latitudes at most longitudes. This is known as strong zonal flow, a reasonably frequent meteorological phenomenon, and the second PC therefore contrasts half-months with strong zonal flow with those of opposite character. Similarly, the first PC (not shown) has one of its extremes identified as corresponding to an intense high pressure area over Asia and such situations are again a fairly frequent occurrence, although only in winter.

Several other PCs in Maryon’s (1979) study can also be interpreted as corresponding to recognizable meteorological situations, especially when coefficients are plotted in map form. The use of PCs to summarize pressure fields and other meteorological or climatological fields has been found