10.07.2015 Views

Multiple Linear Regression



for this hypothesis test there are two competing models under consideration:

the full model:
$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon, \tag{55} $$

the reduced model:
$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_j x_j + \epsilon. \tag{56} $$

Of course, the full model will always explain the data at least as well as the reduced model, but does the full model explain the data significantly better than the reduced model? This question is exactly what the partial F statistic is designed to answer.

We first calculate $SSE_f$, the unexplained variation in the full model, and $SSE_r$, the unexplained variation in the reduced model. We base our test on the difference $SSE_r - SSE_f$, which measures the reduction in unexplained variation attributable to the variables $x_{j+1}, \ldots, x_p$. In the full model there are $p + 1$ parameters and in the reduced model there are $j + 1$ parameters, which gives a difference of $p - j$ parameters (hence degrees of freedom). The partial F statistic is

$$ F = \frac{(SSE_r - SSE_f)/(p - j)}{SSE_f/(n - p - 1)}. \tag{57} $$

It can be shown that, when the regression assumptions hold, under $H_0$ the partial F statistic has an f(df1 = p − j, df2 = n − p − 1) distribution. We calculate the p-value of the observed partial F statistic and reject $H_0$ if the p-value is small.

How to do it with R

The key ingredient above is that the two competing models are nested, in the sense that the reduced model is entirely contained within the full model. The way to test whether the improvement is significant is to compute lm objects for both the full model and the reduced model, then compare them with the anova function.

For the trees data, let us fit a polynomial regression model, and for the sake of argument we will ignore our own good advice and fail to rescale the explanatory variables.

    > treesfull.lm <- lm(Volume ~ Girth + I(Girth^2) + Height + I(Height^2), data = trees)
    > summary(treesfull.lm)

    Call:
    lm(formula = Volume ~ Girth + I(Girth^2) + Height + I(Height^2),
        data = trees)

    Residuals:
        Min      1Q  Median      3Q     Max
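The comparison described above can be sketched in R as follows. This is a minimal illustration, not the original author's continuation: the choice of reduced model (dropping the I(Height^2) term) and the names treesreduced.lm, sse_f, sse_r, F_partial and p_value are assumptions made here for the example. The hand computation follows equation (57) with p = 4 and j = 3 and should agree with the row that anova reports for the second model.

```r
# Full model as given in the text: quadratic in both Girth and Height.
treesfull.lm <- lm(Volume ~ Girth + I(Girth^2) + Height + I(Height^2),
                   data = trees)

# Reduced model (an assumption for illustration): drop the Height^2 term,
# so the reduced model is nested inside the full model.
treesreduced.lm <- lm(Volume ~ Girth + I(Girth^2) + Height, data = trees)

# anova() on two nested lm fits reports the partial F statistic and p-value.
anova(treesreduced.lm, treesfull.lm)

# The same statistic by hand, following equation (57).
sse_f <- sum(resid(treesfull.lm)^2)     # unexplained variation, full model
sse_r <- sum(resid(treesreduced.lm)^2)  # unexplained variation, reduced model
n <- nrow(trees)                        # number of observations
p <- 4                                  # predictors in the full model
j <- 3                                  # predictors in the reduced model

F_partial <- ((sse_r - sse_f) / (p - j)) / (sse_f / (n - p - 1))

# Upper-tail probability under the f(df1 = p - j, df2 = n - p - 1) distribution.
p_value <- pf(F_partial, df1 = p - j, df2 = n - p - 1, lower.tail = FALSE)
```

A small p_value would suggest that the dropped term contributes significantly; a large one would suggest that the reduced model is adequate. Any other nested pair of models can be compared the same way by changing the two formulas and the values of p and j.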
