Multiple Linear Regression


people.ysu.edu
10.07.2015

F-statistic: 350.5 on 2 and 28 DF, p-value: < 2.2e-16

Now nothing is significant in the model except Girth^2. We could delete the Intercept and Girth from the model, but the model would no longer be parsimonious. A novice may see the output and be confused about how to proceed, while the seasoned statistician recognizes immediately that Girth and Girth^2 are highly correlated. The only remedy to this ailment is to rescale Girth, which we should have done in the first place.

5 Interaction

In our model for tree volume there have been two independent variables: Girth and Height. We may suspect that the independent variables are related, that is, values of one variable may tend to influence values of the other. We include an additional term in our model to try to capture the dependence between the variables.

Perhaps the Girth and Height of the tree interact to influence its Volume; we would like to investigate whether the model (Girth = $x_1$ and Height = $x_2$)

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \tag{43}$$

would be significantly improved by the model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1:2} x_1 x_2 + \epsilon, \tag{44}$$

where the subscript $1{:}2$ denotes that $\beta_{1:2}$ is the coefficient of an interaction term between $x_1$ and $x_2$.

Interpretation: Under model (43), the mean response $\mu(x_1, x_2)$ viewed as a function of $x_2$,

$$\mu(x_2) = (\beta_0 + \beta_1 x_1) + \beta_2 x_2, \tag{45}$$

is a linear function of $x_2$ with slope $\beta_2$. As $x_1$ changes, the y-intercept of the mean response in $x_2$ changes, but the slope remains the same. So the mean response in $x_2$ is represented by a collection of parallel lines, all with common slope $\beta_2$.

When the interaction term $\beta_{1:2} x_1 x_2$ is included, the mean response in $x_2$ becomes

$$\mu(x_2) = (\beta_0 + \beta_1 x_1) + (\beta_2 + \beta_{1:2} x_1) x_2. \tag{46}$$

In this case not only the y-intercept changes when $x_1$ varies; the slope $\beta_2 + \beta_{1:2} x_1$ changes with $x_1$ as well. Thus, the interaction term allows the slope of the mean response in $x_2$ to increase or decrease as $x_1$ varies.
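To make the contrast between parallel and non-parallel lines concrete, the following R sketch evaluates the slope in x2 under the mean responses (45) and (46) at a few values of x1. The coefficient values are hypothetical, chosen only for illustration, and the helper names mu_add, mu_int, and slope_x2 are assumptions that do not appear in the text.

```r
# Hypothetical coefficients, chosen only for illustration (not estimates from any data)
beta0 <- 1; beta1 <- 2; beta2 <- 0.5; beta12 <- 0.1

# Mean response without interaction, as in (45), and with interaction, as in (46)
mu_add <- function(x1, x2) (beta0 + beta1 * x1) + beta2 * x2
mu_int <- function(x1, x2) (beta0 + beta1 * x1) + (beta2 + beta12 * x1) * x2

# Slope in x2: change in the mean response per unit increase in x2, at a fixed x1
slope_x2 <- function(mu, x1) mu(x1, 1) - mu(x1, 0)

x1_values <- c(0, 5, 10)
slopes_add <- sapply(x1_values, function(x1) slope_x2(mu_add, x1))  # always beta2
slopes_int <- sapply(x1_values, function(x1) slope_x2(mu_int, x1))  # beta2 + beta12 * x1

print(rbind(x1 = x1_values, additive = slopes_add, interaction = slopes_int))
```

With these made-up values the additive slopes stay at beta2 for every x1, while the interaction slopes grow linearly in x1, which is the behavior described by (46).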

How to do it with R

There are several ways to introduce an interaction term into the model.

1. Make a new variable prod equal to the product Girth * Height, and include prod in the model formula.
2. Specify the interaction term directly in the model formula with a colon, as in Girth:Height.

Fitting the model with the colon form gives:

    > treesint.lm <- lm(Volume ~ Girth + Height + Girth:Height, data = trees)
    > summary(treesint.lm)

    Call:
    lm(formula = Volume ~ Girth + Height + Girth:Height, data = trees)

    Residuals:
        Min      1Q  Median      3Q     Max
    -6.5821 -1.0673  0.3026  1.5641  4.6649

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  69.39632   23.83575   2.911  0.00713 **
    Girth        -5.85585    1.92134  -3.048  0.00511 **
    Height       -1.29708    0.30984  -4.186  0.00027 ***
    Girth:Height  0.13465    0.02438   5.524 7.48e-06 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.709 on 27 degrees of freedom
    Multiple R-squared: 0.9756, Adjusted R-squared: 0.9728
    F-statistic: 359.3 on 3 and 27 DF, p-value: < 2.2e-16

We can see from the output that the interaction term is highly significant. Further, the estimate $b_{1:2}$ is positive. This means that the slope of $\mu(x_2)$ increases as Girth gets bigger. Keep in mind that the same interpretation holds for $\mu(x_1)$; that is, the slope of $\mu(x_1)$ increases as Height gets bigger.
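As a rough follow-up, the sketch below compares the additive and interaction models with a partial F-test and evaluates the estimated Height slope at two Girth values. It assumes the built-in trees data set from R's datasets package; the names fit_add, fit_int, fit_star, cmp, and height_slope are illustrative and do not come from the text.

```r
# Sketch only: relies on the built-in `trees` data set (datasets package)
fit_add <- lm(Volume ~ Girth + Height, data = trees)                 # model (43)
fit_int <- lm(Volume ~ Girth + Height + Girth:Height, data = trees)  # model (44)

# Partial F-test: does adding the interaction term improve on the additive model?
cmp <- anova(fit_add, fit_int)
print(cmp)

# Estimated slope of the mean response in Height at a given Girth:
# b2 + b1:2 * Girth
b <- coef(fit_int)
height_slope <- function(girth) unname(b["Height"] + b["Girth:Height"] * girth)
print(height_slope(c(10, 15)))

# Volume ~ Girth * Height is formula shorthand for the same model
fit_star <- lm(Volume ~ Girth * Height, data = trees)
print(all.equal(coef(fit_int), coef(fit_star)))
```

Using the coefficients printed in the summary, the estimated Height slope is about -1.297 + 0.135 * Girth, so it is close to 0.05 at Girth = 10 and close to 0.72 at Girth = 15. Because only one term is added, the partial F-test is equivalent to the t-test on Girth:Height shown in the summary.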

