10.07.2015 Views

Using R for Introductory Statistics : John Verzani


> abline(107.0674, 0.1204)
> abline(107.0674 - 8.3971, 0.1204, lty=2)

The last line of the output of summary(res) shows that the F-test is rejected. This is a test of whether all the coefficients except the intercept are 0. A better test would be to see whether the additional smoke variable is significant once we control for the mother’s weight. This is done using anova() to compare the two models.

> res.1 = lm(wt ~ wt1, data=babies, subset=wt1 < 800)
> anova(res.1, res)
Analysis of Variance Table
Model 1: wt ~ wt1
Model 2: wt ~ wt1 + factor(smoke)
  Res.Df    RSS Df Sum of Sq    F Pr(>F)
1   1198 394572
2   1194 372847  4     21725 17.4  7e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The small p-value indicates that the additional term is warranted. ■

11.3.1 Problems

11.19 The nym.2002 (UsingR) data set contains data on the finishers of the 2002 New York City Marathon. Do an ANCOVA of time on the numeric variable age and the factor gender. How much difference is there between the genders?

11.20 For the mtcars data set, perform an ANCOVA of mpg on the weight, wt, and the transmission type, am. You should use factor(am) in your model to ensure that this variable is treated as a factor. Is the transmission type significant?

11.21 Perform an ANCOVA for the babies (UsingR) data set modeling birth weight (wt) by gestation (gestation), mother’s weight (wt1), mother’s height (ht), and mother’s smoking status (smoke).

11.22 From the kid.weights (UsingR) data set, the body mass index (BMI) can be computed by dividing the weight by the height squared in metric units. The following will add a BMI variable:

> kid.weights$BMI = (kid.weights$weight/2.54)/
+   (kid.weights$height*2.54/100)^2

Model the BMI by the age and gender variables. This is a parallel-lines model. Which variables are significant? Use the partial F-test to find the preferred model. Does this agree with the output of stepAIC()?

11.23 The cfb (UsingR) data set contains information on consumer expenses. In particular, INCOME contains income figures, EDUC is the number of years of education, and AGE is the age of the participant. Perform an ANCOVA modeling log(INCOME + 1) by AGE and EDUC. You need to force EDUC to be a factor. Are both variables significant?
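The anova() comparison used in the worked example before the problems (a partial F-test on two nested models) can be tried on data that ships with base R. What follows is a minimal sketch rather than a solution from the text: the choice of hp and factor(cyl) from mtcars, and the names res.small, res.full, cmp, and p.value, are assumptions made purely for illustration.

```r
# Sketch of a partial F-test with anova(); the variables used here are
# illustrative choices, not taken from the text.
res.small <- lm(mpg ~ hp, data = mtcars)                # covariate only
res.full  <- lm(mpg ~ hp + factor(cyl), data = mtcars)  # covariate plus factor

# The second row of the table tests whether the factor(cyl) terms are
# needed once hp is already in the model.
cmp <- anova(res.small, res.full)
print(cmp)

# Pull out the p-value of the comparison for later use.
p.value <- cmp[2, "Pr(>F)"]
```

A small p-value in the second row would suggest keeping the factor in the model. The same pattern should carry over to the ANCOVA problems above once the UsingR data sets are loaded.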
