Jolliffe I. Principal Component Analysis (2ed., Springer, 2002)(518s)


cda.psych.uiuc.edu

13. Principal Component Analysis for Special Types of Data

13.2. Analysis of Size and Shape

components are orthogonal to the isometric vector, but the shape components themselves are correlated with the isometric component. Cadima and Jolliffe (1996) quote an example in which these correlations are as large as 0.92.

Ranatunga (1989) introduced a method for which the shape components are uncorrelated with the isometric component, but her technique sacrifices orthogonality of the vectors of coefficients. A similar problem, namely losing either uncorrelatedness or orthogonality when searching for simple alternatives to PCA, was observed in Chapter 11. In the present context, however, Cadima and Jolliffe (1996) derived a procedure that combines aspects of double-centering and Ranatunga's approach and gives shape components that are both uncorrelated with the isometric component and have vectors of coefficients orthogonal to $a_0$. Unfortunately, introducing one desirable property leads to the loss of another. As pointed out by Mardia et al. (1996), if $x_h = c x_i$ where $x_h$, $x_i$ are two observations and $c$ is a constant, then in Cadima and Jolliffe's (1996) method the scores of the two observations are different on the shape components. Most definitions of shape consider two observations related in this manner to have the same shape.

Decomposition into size and shape of the variation in measurements made on organisms is a complex problem. None of the terms 'size,' 'shape,' 'isometric' or 'allometry' is uniquely defined, which leaves plenty of scope for vigorous debate on the merits or otherwise of various procedures (see, for example, Bookstein (1989); Jungers et al. (1995)).

One of the other approaches to the analysis of size and shape is to define a scalar measure of size, and then calculate a shape vector as the original vector $x$ of $p$ measurements divided by the size. This is intuitively reasonable, but needs a definition of size.
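The size-then-shape idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that size is a weighted geometric mean of the measurements (only one of several possible definitions); the function names and the toy measurements are hypothetical and do not come from the text. The sketch checks the property that most definitions of shape require: two observations related by $x_h = c x_i$ receive the same shape vector, and differ only in size.

```python
import math

def geometric_mean_size(x, a=None):
    """Assumed size measure: prod_k x_k ** a_k, with weights a_k summing to 1.

    With a=None the weights default to 1/p, i.e. the ordinary geometric mean.
    """
    p = len(x)
    if a is None:
        a = [1.0 / p] * p
    if abs(sum(a) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    # Work on the log scale for numerical stability.
    return math.exp(sum(ak * math.log(xk) for ak, xk in zip(a, x)))

def shape_vector(x, a=None):
    """Shape vector: the original measurements divided by the scalar size."""
    g = geometric_mean_size(x, a)
    return [xk / g for xk in x]

# Toy measurements (made up): x_h is x_i scaled by a constant c.
x_i = [2.0, 4.0, 8.0]
c = 3.0
x_h = [c * v for v in x_i]

shape_i = shape_vector(x_i)
shape_h = shape_vector(x_h)
size_ratio = geometric_mean_size(x_h) / geometric_mean_size(x_i)

print("shape of x_i:", shape_i)
print("shape of x_h:", shape_h)
print("size ratio  :", size_ratio)
```

Because the assumed size measure is homogeneous of degree one in $x$ (the weights sum to 1), the constant $c$ cancels in the ratio and is absorbed entirely by the size.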
Darroch and Mosimann (1985) list a number of possibilities, but home in on $g_a(x) = \prod_{k=1}^{p} x_k^{a_k}$, where $a' = (a_1, a_2, \ldots, a_p)$ and $\sum_{k=1}^{p} a_k = 1$. The size is thus a generalization of the geometric mean. Darroch and Mosimann (1985) discuss a number of properties of the shape vector $x/g_a(x)$ and its logarithm, and advocate the use of PCA on the log-transformed shape vector, leading to shape components. The log shape vector generalizes the vector $v$ used by Aitchison (1983) in the analysis of compositional data (see Section 13.3), but the PCs are invariant with respect to the choice of $a$. As with Aitchison's (1983) analysis, the covariance matrix of the log shape data has the isometric vector $a_0$ as an eigenvector, with zero eigenvalue. Hence all the shape components are contrasts between log-transformed variables. Darroch and Mosimann (1985) give an example in which both the first and last shape components are of interest.

The analysis of shapes goes well beyond the size and shape of organisms (see, for example, Dryden and Mardia (1998) and Bookstein (1991)). A completely different approach to the analysis of shape is based on 'landmarks.' These are well-defined points on an object whose coordinates define the shape of the object, after the effects of location, scale and rotation have been removed. Landmarks can be used to examine the shapes of animals and plants, but they are also relevant to the analysis of shapes of many other types of object. Here, too, PCA can be useful and it may be implemented in a variety of forms. Kent (1994) distinguishes four versions depending on the choice of coordinates and on the choice of whether to use real or complex PCA (see also Section 13.8).

Kent (1994) describes two coordinate systems due to Kendall (1984) and to Bookstein (1991). The two systems arise because of the different possible ways of removing location and scale. If there are $p$ landmarks, then the end result in either system of coordinates is that each object is represented by a set of $(p-1)$ two-dimensional vectors. There is now a choice of whether to treat the data as measurements on $2(p-1)$ real variables, or as measurements on $(p-1)$ complex variables where the two coordinates at each landmark point give the real and imaginary parts. Kent (1994) discusses the properties of the four varieties of PCA thus produced, and comments that complex PCA is rather uninformative. He gives an example of real PCA for both coordinate systems.

Horgan (2000) describes an application of PCA to the comparison of shapes of carrots. After bringing the carrots into the closest possible alignment, distances are calculated between each pair of carrots based on the amount of non-overlap of their shapes. A principal coordinate analysis (Section 5.2) is done on these distances, and Horgan (2000) notes that this is equivalent to a principal component analysis on binary variables representing the presence or absence of each carrot at a grid of points in two-dimensional space. Horgan (2000) also notes the similarity between his technique and a PCA on the grey scale levels of aligned images, giving so-called eigenimages.
This latter procedure has been used to analyse faces (see, for example, Craw and Cameron, 1992) as well as carrots (Horgan, 2001).

13.3 Principal Component Analysis for Compositional Data

Compositional data consist of observations $x_1, x_2, \ldots, x_n$ for which each element of $x_i$ is a proportion, and the elements of $x_i$ are constrained to sum to unity. Such data occur, for example, when a number of chemical compounds or geological specimens or blood samples are analysed, and the proportion in each of a number of chemical elements is recorded. As noted in Section 13.1, Gower (1967) discusses some geometric implications that follow from the constraints on the elements of $x$, but the major reference for PCA on compositional data is Aitchison (1983). Because of the constraints on the elements of $x$, and also because compositional data apparently often exhibit non-linear rather than linear structure among their variables, Aitchison (1983) proposes that PCA be modified for such data.
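One consequence of the unit-sum constraint can be checked numerically. The sketch below is an illustration of a standard fact, not a reproduction of Gower's or Aitchison's arguments: if every observation sums to one, each row of the sample covariance matrix sums to zero, so the matrix is singular and the vector of ones is an eigenvector with zero eigenvalue. The raw values and helper names are made up for the example.

```python
def close_to_unit_sum(raw):
    """Rescale non-negative raw amounts so that they sum to one."""
    total = sum(raw)
    return [v / total for v in raw]

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

# Hypothetical raw amounts of three components in four specimens.
raw_amounts = [
    [12.0, 30.0, 58.0],
    [20.0, 25.0, 5.0],
    [7.0, 3.0, 40.0],
    [15.0, 45.0, 15.0],
]
compositions = [close_to_unit_sum(r) for r in raw_amounts]

S = covariance_matrix(compositions)

# S multiplied by the vector of ones is just the vector of row sums of S.
row_sums = [sum(row) for row in S]
print("row sums of S:", row_sums)  # all zero up to rounding error
```

A PCA of the raw proportions therefore always has at least one zero-variance component, which is part of the reason the constraint needs special treatment.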

