Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)

cda.psych.uiuc.edu

5. Graphical Representation of Data Using Principal Components
5.1. Plotting Two or Three Principal Components

[Figure 5.1 (b). Student anatomical measurements: plot of the first two PCs for 28 students with minimum spanning tree superimposed.]

Artistic Qualities of Painters

The second data set described in this section was analysed by Davenport and Studdert-Kennedy (1972). It consists of a set of subjective measurements of the artistic qualities ‘composition,’ ‘drawing,’ ‘colour’ and ‘expression’ for 54 painters. The measurements, on a scale from 0 to 20, were compiled in France in 1708 by Roger de Piles for painters ‘of established reputation.’ Davenport and Studdert-Kennedy (1972) give data for 56 painters, but for two painters one measurement is missing, so these painters are omitted from the analysis.

Table 5.1 gives the variances (eigenvalues) and coefficients for the first two PCs based on the correlation matrix for the 54 painters with complete data. The components, and their contributions to the total variation, are very similar to those found by Davenport and Studdert-Kennedy (1972) for the covariance matrix. This strong similarity between the PCs for correlation and covariance matrices is relatively unusual (see Section 3.3) and is due to the near-equality of the variances for the four variables. The first component is interpreted by the researchers as an index of de Piles’ overall assessment of the painters, although the negative coefficient for colour needs some additional explanation. The form of this first PC could be predicted from the correlation matrix. If the sign of the variable ‘colour’ is changed, then all correlations in the matrix are positive, so that we would expect the first PC to have positive coefficients for all variables after this redefinition of ‘colour’ (see Section 3.8). The second PC has its largest coefficient for colour, but the other coefficients are also non-negligible.

Table 5.1. First two PCs: artistic qualities of painters.

                                          Component 1   Component 2
Coefficients   Composition                       0.50         −0.49
               Drawing                           0.56          0.27
               Colour                           −0.35         −0.77
               Expression                        0.56         −0.31
Eigenvalue                                       2.27          1.04
Cumulative percentage of total variation         56.8          82.8

A plot of the 54 painters with respect to the first two components is given in Figure 5.2, and this two-dimensional display represents 82.8% of the total variation. The main feature of Figure 5.2 is that painters of the same school are mostly fairly close to each other. For example, the ten ‘Venetians’ {Bassano, Bellini, Veronese, Giorgione, Murillo, Palma Vecchio, Palma Giovane, Pordenone, Tintoretto, Titian} are indicated on the figure, and all lie in a relatively small area at the bottom left of the plot. Davenport and Studdert-Kennedy (1972) perform a cluster analysis on the data, and display the clusters on a plot of the first two PCs. The clusters dissect the data in a sensible-looking manner, and none of them has a convoluted shape on the PC plot. However, there is little evidence of a strong cluster structure in Figure 5.2. Possible exceptions are a group of three isolated painters near the bottom of the plot, and four painters at the extreme left.
The first group are all members of the ‘Seventeenth Century School,’ namely Rembrandt, Rubens, and Van Dyck, and the second group consists of three ‘Venetians,’ Bassano, Bellini, Palma Vecchio, together with the ‘Lombardian’ Caravaggio. This data set will be discussed again in Sections 5.3 and 10.2, and the numbered observations on Figure 5.2 will be referred to there. Further examples of the use of PCA in conjunction with cluster analysis are given in Section 9.2.

Throughout this section there has been the suggestion that plots of the first two PCs may reveal interesting structure in the data. This contradicts the implicit assumption that the n observations are identically distributed with a common mean and covariance matrix. Most ‘structures’ in the data indicate that different observations have different means, and that PCA