10.07.2015

Using R for Introductory Statistics : John Verzani

We can now model the variable preemie by the levels of smoke and the variable BMI. This is similar to an ANCOVA, except that the response variable is binary.

    > res = glm(preemie ~ factor(smoke) + BMI, family=binomial,
    +           data=babies.prem)
    > summary(res)
    …
    Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -3.4246     0.7113   -4.81  1.5e-06 ***
    factor(smoke)1   0.1935     0.2355    0.82     0.41
    factor(smoke)2   0.3137     0.3888    0.81     0.42
    factor(smoke)3   0.1011     0.4047    0.25     0.80
    BMI              0.0401     0.0304    1.32     0.19
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    …

None of the variables is flagged as significant, which suggests that the model with no effects is, perhaps, preferred. (The sampling distribution under the null hypothesis is different from the previous example, so the column is labeled "z value" as opposed to "t value".) We check which model is preferred by the AIC using stepAIC() from the MASS package.

    > library(MASS)
    > stepAIC(res)
    Start:  AIC= 672.3
    …
    Step:  AIC= 666.8
     preemie ~ 1
    Call:  glm(formula = preemie ~ 1, family = binomial, data = babies.prem)
    Coefficients:
    (Intercept)
          -2.42
    …

The model of constant mean is chosen by this criterion, indicating that these risk factors do not show up in this data set.

■ Example 12.4: The spam data

Let's apply logistic regression to the data on spam in Table 12.1. Set $Y_i$ to be 1 if the e-mail is opened, and 0 otherwise. Likewise, let $x_{1i}$ be 1 if the e-mail has a name in the subject, and $x_{2i}$ be 1 if the e-mail has an offer in the subject. Then we want to model $Y_i$ by $x_{1i}$ and $x_{2i}$. To use logistic regression, we first turn the summarized data into 5,000 samples. We use rep() repeatedly to do so.

    > first.name = rep(1:0, c(2500, 2500))
    > offer = rep(c(1, 0, 1, 0), rep(1250, 4))
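As a quick sanity check on the two rep() calls above, the following sketch cross-tabulates the resulting vectors. When the second argument of rep() is a vector, each element of the first argument is repeated the corresponding number of times, so the four name/offer combinations should each appear 1,250 times. The name counts is introduced here for illustration only and is not part of the original example.

```r
# The two calls from the example, unchanged
first.name = rep(1:0, c(2500, 2500))       # 2,500 ones, then 2,500 zeros
offer = rep(c(1, 0, 1, 0), rep(1250, 4))   # blocks of 1,250: 1s, 0s, 1s, 0s

# Cross-tabulate to see how many samples fall in each name/offer combination
# (counts is a hypothetical helper name, not from the text)
counts = table(first.name, offer)
print(counts)

# Total number of expanded samples
print(length(first.name))
```

Each cell of the table corresponds to one of the four subject-line conditions, which is the layout needed before a 0/1 response vector of the same length is attached.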
