10.07.2015 Views

Using R for Introductory Statistics : John Verzani


means are $\mu_1$ and $\mu_2$ and the two samples have a common variance. We may perform a two-sided significance test of $\mu_1 = \mu_2$ with a t-test.

We illustrate with simulated data:

> mu1=0; mu2=1
> x=rnorm(15,mu1); y=rnorm(15,mu2)
> t.test(x,y, var.equal=TRUE)

        Two Sample t-test

data: x and y
t=-2.858, df=28, p-value=0.007961
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.0520 -0.3386
sample estimates:
mean of x mean of y
   0.0157     1.211

We see that the p-value is small, as expected.

We can approach this test differently, in a manner that generalizes to the case when there are more than two independent samples. Combine the data into a single data vector, Y, and a factor keeping track of which sample, 1 or 2, each value is from. This presumes some ordering on the data after it is stored in Y. For example, we can let the first $n_1$ values be from the first sample and the next $n_2$ values be from the second. This is what stack() does.

Using this order, let $1_1(i)$ be an indicator function that is 1 if the level of the factor for the ith data value is 1, and 0 otherwise. Similarly, define $1_2(i)$. Then we can rewrite our model as

$$Y_i = \mu_1 1_1(i) + \mu_2 1_2(i) + \varepsilon_i.$$

When the data for the first sample is considered, $1_2(i) = 0$, and this model is simply $Y_i = \mu_1 + \varepsilon_i$. When the second sample is considered, the other dummy variable is 0, and the model considered is $Y_i = \mu_2 + \varepsilon_i$.

We can rewrite the model to use just the second indicator variable. We use different names for the coefficients:

$$Y_i = \beta_1 + \beta_2 1_2(i) + \varepsilon_i.$$

Now when the data for the first sample is considered the model is $Y_i = \beta_1 + \varepsilon_i$, so $\beta_1$ is still $\mu_1$. However, when the second sample is considered, we have $Y_i = \beta_1 + \beta_2 + \varepsilon_i$, so $\mu_2 = \beta_1 + \beta_2$. That is, $\beta_2 = \mu_2 - \mu_1$.
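The two parametrizations can be checked numerically. The following is a minimal sketch, not taken from the text: the seed and the names i1, i2, means_fit, and ref_fit are illustrative assumptions. It builds the indicator variables by hand and fits both forms of the model with lm().

```r
set.seed(1)                          # assumption: seed added only for reproducibility
mu1 <- 0; mu2 <- 1
x <- rnorm(15, mu1); y <- rnorm(15, mu2)

# Single data vector: first n1 values from sample 1, next n2 from sample 2
Y <- c(x, y)

# Hand-built indicator (dummy) variables for the two levels
i1 <- rep(c(1, 0), times = c(length(x), length(y)))
i2 <- 1 - i1

# Both indicators, no intercept: the coefficients estimate mu1 and mu2
means_fit <- coef(lm(Y ~ 0 + i1 + i2))

# Intercept plus second indicator only: the coefficients estimate
# beta1 = mu1 and beta2 = mu2 - mu1
ref_fit <- coef(lm(Y ~ i2))

means_fit
ref_fit
```

In the first fit the estimates are the two sample means; in the second, the intercept is the mean of x and the coefficient of i2 is the difference mean(y) - mean(x), matching $\beta_2 = \mu_2 - \mu_1$.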
We say that level 1 is a reference level, as the mean of the second level is represented in reference to the first.

It turns out that statistical inference is a little more natural when we pick one of the means to serve as a reference. The resulting model looks just like a linear-regression model where $x_i$ is $1_2(i)$. We can fit it that way and interpret the coefficients accordingly. The model is specified the same way as with oneway.test(): y ~ f, where y holds the data and f is a factor indicating which group the data is for.

To model, first we stack, then we fit with lm().

> d=stack(list(x=x,y=y)) # need named list.
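A minimal sketch of the stack-then-fit step follows. It relies on the default column names that stack() produces, values and ind; the seed and the object names res and tt are illustrative assumptions, not from the text. It also compares the fit with the equal-variance t-test.

```r
set.seed(1)                          # assumption: seed added only for reproducibility
mu1 <- 0; mu2 <- 1
x <- rnorm(15, mu1); y <- rnorm(15, mu2)

# stack() returns a data frame with columns `values` (the data)
# and `ind` (a factor whose levels are the list names, here x and y)
d <- stack(list(x = x, y = y))

# Fit the linear model; the first level, "x", acts as the reference level
res <- lm(values ~ ind, data = d)

coef(res)                   # (Intercept) is mean(x); indy is mean(y) - mean(x)
summary(res)$coefficients   # estimates with standard errors, t values, p-values

# Equal-variance two-sample t-test for comparison
tt <- t.test(x, y, var.equal = TRUE)
tt$statistic                # same magnitude as the t value for indy, opposite sign
tt$p.value                  # same p-value as reported for indy
```

The sign flips because t.test(x, y) measures mean(x) - mean(y), while the coefficient for the second level measures mean(y) - mean(x).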
