10.07.2015 Views

Multiple Linear Regression

Height by a new variable Tall which indicates whether or not the cherry tree is taller than a certain threshold (which for the sake of argument will be the sample median height of 76 ft). That is, Tall will be defined by

$$
\mathtt{Tall} =
\begin{cases}
\text{yes}, & \text{if } \mathtt{Height} > 76, \\
\text{no}, & \text{if } \mathtt{Height} \le 76.
\end{cases}
\tag{47}
$$

We can construct Tall very quickly in R with the cut function:

    > trees$Tall <- cut(...)
    > trees$Tall[1:5]
    [1] no  no  no  no  yes
    Levels: no yes

Note that Tall is automatically generated to be a factor with the labels in the correct order. See ?cut for more.

Once we have Tall, we include it in the regression model just like we would any other variable. It is handled internally in the following way. Define the "dummy variable" Tallyes that takes values

$$
\mathtt{Tallyes} =
\begin{cases}
1, & \text{if } \mathtt{Tall} = \text{yes}, \\
0, & \text{otherwise}.
\end{cases}
\tag{48}
$$

That is, Tallyes is an indicator variable which indicates when a respective tree is tall. The model may now be written as

$$
\mathtt{Volume} = \beta_0 + \beta_1 \mathtt{Girth} + \beta_2 \mathtt{Tallyes} + \epsilon. \tag{49}
$$

Let us take a look at what this definition does to the mean response. Trees with Tall = yes will have the mean response

$$
\mu(\mathtt{Girth}) = (\beta_0 + \beta_2) + \beta_1 \mathtt{Girth}, \tag{50}
$$

while trees with Tall = no will have the mean response

$$
\mu(\mathtt{Girth}) = \beta_0 + \beta_1 \mathtt{Girth}. \tag{51}
$$

In essence, we are fitting two regression lines: one for tall trees, and one for short trees. The regression lines have the same slope but they have different y-intercepts (which are exactly $|\beta_2|$ far apart).
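The construction of Tall and the fit of model (49) can be sketched in R as follows. This is only an illustration under stated assumptions: the arguments passed to cut (the breaks at -Inf, 76, Inf and the labels "no" and "yes") are one plausible choice that matches definition (47), not necessarily the call used in the text, and the names fit, b, new_trees, and pred are hypothetical helpers introduced here.

```r
# The built-in `trees` data set has columns Girth, Height, and Volume.
data(trees)

# Assumption: split Height at 76 ft. With the default right = TRUE the
# intervals are (-Inf, 76] -> "no" and (76, Inf] -> "yes", as in (47).
trees$Tall <- cut(trees$Height,
                  breaks = c(-Inf, 76, Inf),
                  labels = c("no", "yes"))
print(trees$Tall[1:5])

# Fit model (49). lm() creates the dummy variable Tallyes internally
# because Tall is a factor whose first level ("no") is the baseline.
fit <- lm(Volume ~ Girth + Tall, data = trees)
b <- coef(fit)  # named "(Intercept)", "Girth", "Tallyes"

# Mean response at Girth = 0 for a short tree and a tall tree:
# these are the two intercepts in (51) and (50).
new_trees <- data.frame(
  Girth = c(0, 0),
  Tall  = factor(c("no", "yes"), levels = levels(trees$Tall))
)
pred <- predict(fit, newdata = new_trees)

# The vertical gap between the two parallel lines equals beta_2.
gap <- unname(pred[2] - pred[1])
print(c(gap = gap, beta2 = b[["Tallyes"]]))
```

Because the two fitted lines share the Girth coefficient, the gap computed at Girth = 0 is the same at every other value of Girth.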
