
Graphical Analysis of Variance (Graphical ANOVA) 

This set of slides introduces a graphical method for analyzing multiple samples which makes the main idea of ANOVA ... accessible. "... when we ask if a set of sample means gives evidence for differences among the population means, what matters is not how far apart the sample means are but how far apart they are relative to the variability of individual observations."*

*David Moore, The Basic Practice of Statistics. 2010, p. 642. 


The Typical One Factor Layout 

The data consist of m samples: sets of n independent measurements taken under various conditions or treatments:

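$$
\begin{array}{cccc}
\text{Treatment 1} & \text{Treatment 2} & \cdots & \text{Treatment } m \\
X_{11} & X_{21} & \cdots & X_{m1} \\
X_{12} & X_{22} & \cdots & X_{m2} \\
\vdots & \vdots & & \vdots \\
X_{1n} & X_{2n} & \cdots & X_{mn}
\end{array}
$$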


A Paper Airplane Experiment

Does the distance travelled by a paper airplane after being thrown depend on the weight of the paper used in its construction?


12 paper airplanes of a single design were constructed using

• 20 × 27 cm sheets of paper
• 3 different weights of paper (light, medium, heavy)

m = 3 treatment groups with n = 4 airplanes each

• Response: distance travelled (in m)
• Factor of interest: weight of paper


The Paper Airplane Experiment Data 

> airplane

Paper Airplane Data 

> head(airplane, n=6)
   paper distance
1  light      3.1
2  light      3.3
3  light      2.1
4  light      1.9
5 medium      4.0
6 medium      3.5


> tail(airplane, n=2)
   paper distance
11 heavy      4.7
12 heavy      5.3


Displaying the Data Differently 

> unstack(airplane, distance ~ paper)
  heavy light medium
1   5.1   3.1    4.0
2   3.1   3.3    3.5
3   4.7   2.1    4.5
4   5.3   1.9    6.1
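The slides do not show how `airplane` is created. To follow along, here is a minimal sketch that rebuilds the 12 observations from the values printed above. The column names `paper` and `distance` match the printed output; the construction itself is an assumption (the original data may well have been read from a file).

```r
# Rebuild the airplane data from the values printed on these slides
# (assumption: only the two columns paper and distance are needed).
airplane <- data.frame(
  paper = factor(rep(c("light", "medium", "heavy"), each = 4)),
  distance = c(3.1, 3.3, 2.1, 1.9,   # light
               4.0, 3.5, 4.5, 6.1,   # medium
               5.1, 3.1, 4.7, 5.3)   # heavy
)

head(airplane, n = 6)
tail(airplane, n = 2)
unstack(airplane, distance ~ paper)  # one column per paper weight
```

Factor levels are sorted alphabetically by default, which is why `unstack()` lists the columns as heavy, light, medium.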


Compare Averages for the 3 Groups 

First, split the data frame up by paper type: 

> airplane.split <- split(airplane$distance, airplane$paper)

Split Data Frame 

> airplane.split
$heavy
[1] 5.1 3.1 4.7 5.3

$light
[1] 3.1 3.3 2.1 1.9

$medium
[1] 4.0 3.5 4.5 6.1


Now, compute the averages: 

> means <- sapply(airplane.split, mean)
> means
 heavy  light medium
 4.550  2.600  4.525
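The same group averages can be obtained in one step with base R's `tapply()`, which applies a function to subsets of a vector defined by a grouping factor. This sketch rebuilds `airplane` inline from the printed values (an assumption, so the block runs on its own) and checks that both routes agree.

```r
airplane <- data.frame(
  paper = factor(rep(c("light", "medium", "heavy"), each = 4)),
  distance = c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
)

# Split-then-average, as on the slide
airplane.split <- split(airplane$distance, airplane$paper)
means <- sapply(airplane.split, mean)

# One-step alternative: average distance for each paper weight
means.tapply <- tapply(airplane$distance, airplane$paper, mean)

means
all.equal(as.numeric(means), as.numeric(means.tapply))
```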


Discussion 

Flight distances are different for different treatments, but they are also different within treatments.

What has caused the variation within each treatment?

Factors other than paper weight must be having an effect: unexplained variation, or error.


Some of the possibilities are: individual airplane construction, initial throwing height, and initial thrust and direction.

These unmeasured factors probably vary slightly from throw to throw and thus could account for some or all of the variation observed within each treatment.

Is there confounding? (What is confounding?)


The presence of unmeasured factors makes it difficult to tell whether differences in paper type have an effect on flight distance.

Are the differences in the treatment averages due only to the unmeasured factors?


Identifying Within Group Variation 

Subtract the within-group averages from each of the distance values to remove any possible effects due to different types of paper.



First step: Create a vector of distances. 

> unlist(airplane.split)
 heavy1  heavy2  heavy3  heavy4  light1
    5.1     3.1     4.7     5.3     3.1
 light2  light3  light4 medium1 medium2
    3.3     2.1     1.9     4.0     3.5
medium3 medium4
    4.5     6.1



Second step: Create a vector containing the corresponding averages.

Do this using the rep() function. We need the sample sizes first:

> n <- sapply(airplane.split, length)
> n # sample sizes
 heavy  light medium
     4      4      4



The vector of averages: 

> rep(means, n)
 heavy  heavy  heavy  heavy  light
 4.550  4.550  4.550  4.550  2.600
 light  light  light medium medium
 2.600  2.600  2.600  4.525  4.525
medium medium
 4.525  4.525



Subtract the averages from the original measurements.

> errors <- unlist(airplane.split) - rep(means, n)


Variation due to factors other than paper type: 

> errors
 heavy1  heavy2  heavy3  heavy4  light1
  0.550  -1.450   0.150   0.750   0.500
 light2  light3  light4 medium1 medium2
  0.700  -0.500  -0.700  -0.525  -1.025
medium3 medium4
 -0.025   1.575
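The three steps above (unlist, rep, subtract) can be collapsed with base R's `ave()`, which returns, for each observation, the average of its own group. A minimal sketch, again rebuilding the airplane values inline as an assumption; a useful sanity check is that the errors sum to zero within every group.

```r
distance <- c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
paper <- factor(rep(c("light", "medium", "heavy"), each = 4))

# ave() replaces each distance by its group mean (default FUN = mean)
group.means <- ave(distance, paper)
errors <- distance - group.means

errors
# Within each paper weight the errors sum to zero (up to rounding)
round(tapply(errors, paper, sum), 10)
```

Unlike the `split()` route, the errors here stay in the original row order (light, medium, heavy), but the values are the same.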


Does paper type have an effect on mean distance? 

Averages of the samples vary, but maybe only because of the unmeasured factors.

If this is true (and paper type does not affect mean distance), then

$$\operatorname{Var}(\bar{X}_i) = \frac{\sigma^2}{n},$$

where $\bar{X}_i$ is the average distance in the $i$th sample, $n$ is the sample size and $\sigma^2$ is the variance of the error.
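A quick simulation makes the formula concrete. The sketch assumes normally distributed errors with $\sigma = 2$ (an arbitrary choice) and samples of size $n = 4$, as in the airplane experiment; the variance of many simulated sample averages should come out close to $\sigma^2/n = 1$.

```r
set.seed(1)     # arbitrary seed, for reproducibility
sigma <- 2      # assumed error standard deviation
n <- 4          # sample size, as in the airplane experiment

# Draw 20000 samples of size n and record each sample average
xbar <- replicate(20000, mean(rnorm(n, mean = 0, sd = sigma)))

var(xbar)       # should be close to sigma^2 / n
sigma^2 / n
```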



Observe that, for the $i$th sample,

$$\operatorname{Var}(\sqrt{n}\,\bar{X}_i) = \sigma^2$$

if variability is due only to unmeasured factors.

So, when paper type has no effect, the variability of the (scaled) sample averages is comparable to the variability in the errors.
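The step from $\operatorname{Var}(\bar{X}_i) = \sigma^2/n$ to this result uses only the rule $\operatorname{Var}(aY) = a^2 \operatorname{Var}(Y)$ for a constant $a$:

$$
\operatorname{Var}(\sqrt{n}\,\bar{X}_i) = (\sqrt{n})^2 \operatorname{Var}(\bar{X}_i) = n \cdot \frac{\sigma^2}{n} = \sigma^2.
$$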



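Let $\bar{X}$ be the grand mean of the flight distances.

If paper type has no effect on distance, then the centered, scaled averages

$$\sqrt{n}\,(\bar{X}_i - \bar{X})$$

and the errors will both have mean 0 and approximately the same variance (exactly $\sigma^2(m-1)/m$ for the scaled averages and $\sigma^2(n-1)/n$ for the errors).

(Why do we need to subtract the grand mean?)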
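One way to see why the grand mean is subtracted: the errors are centered at 0 by construction, but the uncentered scaled averages $\sqrt{n}\,\bar{X}_i$ sit near $\sqrt{n}$ times the overall mean distance, far away from the errors on the histogram. A sketch with the airplane values rebuilt inline (an assumption, so the block is self-contained):

```r
distance <- c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
paper <- factor(rep(c("light", "medium", "heavy"), each = 4))
n <- 4  # airplanes per paper weight

means <- sapply(split(distance, paper), mean)
grand <- mean(distance)

uncentered <- sqrt(n) * means            # clustered around sqrt(n) * grand
centered   <- sqrt(n) * (means - grand)  # on the same scale as the errors

round(uncentered, 3)
round(centered, 3)
```

In this balanced design the centered values sum to zero, just as the errors do.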


Graphically testing whether paper type has an effect on mean distance

A histogram of the errors shows the variability in the errors.

Since the scaled averages should have about the same amount of variability as the errors when paper type has no effect, we can plot the locations of the scaled averages on the error histogram and look for any surprises.


A surprising result represents evidence that paper type has an effect.

In other words, look for outliers relative to the histogram.


The Graphical Test 

> hist(errors, xlim=c(-3,3))
> scaled.means <- sqrt(n) * (means - mean(airplane$distance))
> points(scaled.means, rep(0,3),
+ pch=16, col=2, cex=2.5)


[Figure: histogram of the errors (horizontal axis: errors, from −3 to 3; vertical axis: Frequency), with the three scaled means plotted as points at height 0]


The Graphical Test Result 

One of the scaled means is an outlier.

We have strong evidence that paper type affects flight distance.
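The visual judgement can be backed by a crude numerical check: which scaled means fall outside the range of the errors? Using "beyond the most extreme error" as the definition of an outlier is an assumption of this sketch, not a formal test. The airplane values are rebuilt inline.

```r
distance <- c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
paper <- factor(rep(c("light", "medium", "heavy"), each = 4))
n <- 4  # airplanes per paper weight

errors <- distance - ave(distance, paper)
means <- sapply(split(distance, paper), mean)
scaled.means <- sqrt(n) * (means - mean(distance))

# Scaled means lying beyond the most extreme error on either side
outside <- scaled.means[scaled.means < min(errors) | scaled.means > max(errors)]
range(errors)
outside
```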


The Graphical ANOVA Procedure 

1. Compute the averages $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_m$ and the grand average, $\bar{X}$.

2. Remove treatment effects from each sample by computing the errors
$$e_{ij} = X_{ij} - \bar{X}_i, \qquad j = 1, \ldots, n, \quad i = 1, \ldots, m.$$

3. Center and scale the $i$th treatment average by subtracting the grand average and multiplying by the square root of the sample size:
$$\sqrt{n}\,(\bar{X}_i - \bar{X}).$$

4. Compare the $m$ scaled averages with a histogram of the errors.
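The four steps can be sketched as a single R function. This is a hypothetical helper, `graphical_anova_sketch()`, not the slides' `graphicalANOVA()` (whose code is not shown); like the calls on the slides, it assumes the grouping factor is in the first column and the numeric response in the second.

```r
# Hypothetical helper implementing steps 1-4 of the procedure.
# data: data frame, column 1 = grouping factor, column 2 = response.
graphical_anova_sketch <- function(data, plot = TRUE) {
  group <- factor(data[[1]])
  y <- data[[2]]

  # Step 1: treatment averages and grand average
  means <- sapply(split(y, group), mean)
  grand <- mean(y)

  # Step 2: errors e_ij = X_ij - Xbar_i
  errors <- y - ave(y, group)

  # Step 3: center and scale each treatment average
  n <- as.vector(table(group))   # group sizes, in level order
  scaled <- sqrt(n) * (means - grand)

  # Step 4: scaled averages drawn on a histogram of the errors
  if (plot) {
    lims <- range(c(errors, scaled))
    hist(errors, xlim = lims, main = "Graphical ANOVA")
    points(scaled, rep(0, length(scaled)), pch = 16, col = 2, cex = 2.5)
    text(scaled, rep(0, length(scaled)), labels = names(scaled), pos = 3)
  }
  invisible(list(errors = errors, scaled = scaled))
}

# Usage on the airplane values (rebuilt from the slides)
airplane <- data.frame(
  paper = rep(c("light", "medium", "heavy"), each = 4),
  distance = c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
)
res <- graphical_anova_sketch(airplane, plot = FALSE)
round(res$scaled, 3)
```

The `plot = FALSE` switch returns the numbers without drawing, which is handy for checking the computation; the x-axis limits are widened to include every scaled average so that an outlier is never clipped off the plot.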


Graphical ANOVA Function 

> graphicalANOVA

Application of the function to the Airplane Data 

> graphicalANOVA(airplane) 

[Figure: histogram of the errors (horizontal axis: errors; vertical axis: Frequency), with the scaled averages plotted as points labelled "average heavy", "average light" and "average medium"]


The Graphical ANOVA Plot 

The light paper average is an outlier relative to the histogram.

There is strong evidence that the treatment means are not all the same.

If they were all the same, the scaled treatment averages would be located in regions of higher histogram density.
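The picture displays the two ingredients of the classical ANOVA F ratio. In the notation of the procedure, the between-treatment mean square is $\sum_i \big(\sqrt{n}(\bar{X}_i - \bar{X})\big)^2/(m-1)$ and the within-treatment mean square is $\sum_{i,j} e_{ij}^2/\big(m(n-1)\big)$, so scaled averages spread widely relative to the errors mean a large F. The sketch below computes F both ways for the airplane values (rebuilt inline), using base R's `aov()` only as a cross-check.

```r
airplane <- data.frame(
  paper = factor(rep(c("light", "medium", "heavy"), each = 4)),
  distance = c(3.1, 3.3, 2.1, 1.9, 4.0, 3.5, 4.5, 6.1, 5.1, 3.1, 4.7, 5.3)
)
m <- nlevels(airplane$paper)  # number of treatments
n <- 4                        # observations per treatment (balanced design)

means  <- sapply(split(airplane$distance, airplane$paper), mean)
scaled <- sqrt(n) * (means - mean(airplane$distance))
errors <- airplane$distance - ave(airplane$distance, airplane$paper)

# Mean squares built from the quantities shown in the graphical ANOVA
ms.between <- sum(scaled^2) / (m - 1)
ms.within  <- sum(errors^2) / (m * (n - 1))
F.manual   <- ms.between / ms.within

# F value reported by aov() for the paper factor
fit   <- aov(distance ~ paper, data = airplane)
F.aov <- summary(fit)[[1]][["F value"]][1]

c(manual = F.manual, aov = F.aov)
```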


Motor Vibration Example 

5 different brands of bearings are compared in terms of the amount of vibration they generate in an electric motor.*

*The motor vibration data can be found in the Devore5 library (Bates, 2004).


Motor Vibration Data 

> motor head(motor) 

    V1   V2   V3   V4   V5
1 13.1 16.3 13.7 15.7 13.5
2 15.0 15.7 13.9 13.7 13.4
3 14.0 17.2 12.4 14.4 13.2
4 14.4 14.9 13.8 16.0 12.7
5 14.0 14.4 14.9 13.9 13.4
6 11.6 17.2 13.3 14.7 12.3



> names(motor) <- paste("Brand", 1:5)
> motors <- stack(motor)
> head(motors)
  values     ind
1   13.1 Brand 1
2   15.0 Brand 1
3   14.0 Brand 1
4   14.4 Brand 1
5   14.0 Brand 1
6   11.6 Brand 1



> tail(motors, n=2)
   values     ind
29   13.4 Brand 5
30   12.3 Brand 5
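The wide-to-long reshaping above is what base R's `stack()` does: it places all columns end to end in a `values` column and records each value's source column in a factor `ind`. Since `motors` has 30 rows, the six rows of `motor` printed earlier appear to be the whole data set, so this sketch types them in directly; the object names `motor6` and `motor6.long` and the `Brand 1` to `Brand 5` column names are assumptions for illustration.

```r
# The six rows of motor vibration data printed on the slides
motor6 <- data.frame(
  V1 = c(13.1, 15.0, 14.0, 14.4, 14.0, 11.6),
  V2 = c(16.3, 15.7, 17.2, 14.9, 14.4, 17.2),
  V3 = c(13.7, 13.9, 12.4, 13.8, 14.9, 13.3),
  V4 = c(15.7, 13.7, 14.4, 16.0, 13.9, 14.7),
  V5 = c(13.5, 13.4, 13.2, 12.7, 13.4, 12.3)
)
names(motor6) <- paste("Brand", 1:5)

# Stack the five brand columns into one long data frame
motor6.long <- stack(motor6)
head(motor6.long, n = 3)
table(motor6.long$ind)  # six observations per brand
```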



> names(motors) head(motors, n=3) 

  vibration bearing brand
1      13.1       Brand 1
2      15.0       Brand 1
3      14.0       Brand 1



> graphicalANOVA(motors[,c(2,1)]) 



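[Figure: histogram of the errors for the motor vibration data (horizontal axis: errors; vertical axis: Frequency), with the scaled brand averages plotted as points, one of them labelled "average Brand 1"]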


Motor Vibration 

The second brand average appears in the extreme right tail.

There is clear evidence that the amount of vibration depends on the type of bearing.


An Example Using Simulated Data 

What does the graphical ANOVA plot look like if there really is no difference between the samples?

To check, we simulate 40 independent observations from a normal distribution with mean 0 and variance 1.

We pretend that the first 10 observations are from one sample, the second set of 10 are from another sample, and so on.
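A base-R sketch of the same experiment: generate 40 standard normal values, label them as four samples of 10, and compute the errors and centered, scaled averages. The seed and variable names are arbitrary choices for illustration, so the numbers will not match the slides' `pretend.df`.

```r
set.seed(2024)                    # arbitrary seed
y <- rnorm(40, mean = 0, sd = 1)  # no real group differences
g <- factor(rep(1:4, each = 10))  # pretend: four samples of size 10
n <- 10

errors <- y - ave(y, g)
means  <- sapply(split(y, g), mean)
scaled <- sqrt(n) * (means - mean(y))

# With no group effect, both sets are centered at 0 with roughly similar spread
round(scaled, 2)
round(c(sd.errors = sd(errors), sd.scaled = sd(scaled)), 2)
```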


Simulated Data 

> set.seed(12030); pretenddata <- rnorm(40)
> samplenumber <- rep(1:4, each=10)
> pretend.df <- data.frame(samplenumber, pretenddata)
> rm(pretenddata, samplenumber)
> head(pretend.df, n=3)
  samplenumber pretenddata
1            1   0.3752435
2            1  -0.6727016
3            1   0.8656027



> graphicalANOVA(pretend.df) 



[Figure: histogram of the errors for the simulated data (horizontal axis: errors; vertical axis: Frequency), with points labelled "average 1" to "average 4"]

