10.07.2015 Views

Using R for Introductory Statistics : John Verzani



response ~ predictor

The ~ (tilde) is read “is modeled by” and is used to separate the response from the predictor(s). The response variable can have regular mathematical expressions applied to it, but for the predictor variables the regular notations +, -, *, /, and ^ have different meanings. A + means to add another term to the model and a - means to drop a term, more or less coinciding with the symbols’ common usage. But *, /, and ^ are used differently. If we want to use regular mathematical notation for the predictor, we must insulate the symbols’ usage with the I() function, as in I(x^2).

10.1.2 Examples of the linear model

At first, the simple linear regression model appears to be solely about a straight-line relationship between pairs of data. We’ll see that this isn’t so, by looking at how the model accommodates many of the ideas previously mentioned.

Simple linear regression. If $(x_i, y_i)$ are related by the linear model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

as above, then the model is represented in R by the formula y ~ x. The intercept term, $\beta_0$, is implicitly defined.

If for some reason the intercept term is not desired, it can be dropped from the model by including the term -1, as in y ~ x - 1.

The mean of an i.i.d. sample. In finding confidence intervals or performing a significance test for the mean of an i.i.d. sample, $Y_1, Y_2, \ldots, Y_n$, we often assumed normality of the population. In terms of a statistical model this could be viewed as

$$Y_i = \mu + \varepsilon_i,$$

where the $\varepsilon_i$ are Normal(0, $\sigma$).

The model for this in R is y ~ 1. As there is no predictor variable, the intercept term is explicitly present.

The paired t-test. In Chapter 8, we considered the paired t-test. This test applies when two samples are somehow related and the differences between the two samples are random. That is, $Y_i - X_i$ is the quantity of interest. This corresponds to the statistical model

$$y_i = x_i + \varepsilon_i.$$

If we assume $\varepsilon_i$ has mean 0, then we can model the mean difference between Y and X by $\mu$, and our model becomes

$$Y_i = \mu + X_i + \varepsilon_i.$$

Our significance test with $H_0: \mu_1 = \mu_2$ turns into a test of $\mu = 0$.

The model formula to fit this in R uses an offset, which we won’t discuss again, but for reference it would look like y ~ offset(x).

In Chapter 11 we will see that this model can be used for a two-sample t-test. Later in this chapter we will extend the model to describe relationships that are not straight lines and relationships involving multiple predictors.
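The model formulas in this section can be tried out with lm(). What follows is only a minimal sketch on simulated data: the seed, sample size, coefficients, and object names (fit_line, fit_origin, and so on) are illustrative assumptions, not taken from the text. It checks which coefficients each formula produces, and that y ~ offset(x) reproduces the paired t statistic.

```r
# Simulated data; every number here is an assumption made for illustration.
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)

# Simple linear regression: the intercept is implicit in y ~ x.
fit_line <- lm(y ~ x)
names(coef(fit_line))      # "(Intercept)" "x"

# Including the term -1 drops the intercept.
fit_origin <- lm(y ~ x - 1)
names(coef(fit_origin))    # "x"

# Inside a formula, ^ is not arithmetic: y ~ x^2 fits the same model as y ~ x.
fit_caret <- lm(y ~ x^2)
names(coef(fit_caret))     # "(Intercept)" "x"

# I() insulates ^ so that x^2 is the ordinary square of x.
fit_quad <- lm(y ~ x + I(x^2))
names(coef(fit_quad))      # "(Intercept)" "x" "I(x^2)"

# Mean of an i.i.d. sample: y ~ 1 estimates only the intercept, which is mean(y).
fit_mean <- lm(y ~ 1)
all.equal(unname(coef(fit_mean)), mean(y))

# Paired t-test: offset(x) fixes the coefficient of x at 1, so the
# intercept estimates mu, the mean of the differences y - x.
fit_paired <- lm(y ~ offset(x))
mu_hat <- unname(coef(fit_paired))
all.equal(mu_hat, mean(y - x))

# The t statistic for the intercept matches the paired t-test statistic.
t_lm <- summary(fit_paired)$coefficients[1, "t value"]
t_paired <- unname(t.test(y, x, paired = TRUE)$statistic)
all.equal(t_lm, t_paired)
```

Comparing names(coef(...)) across the fits is a quick way to see what a formula actually asks for before interpreting any estimates.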
