10.07.2015 Views

Using R for Introductory Statistics : John Verzani


res=glm(formula, family=…, data=…)

The formula is specified as though it were a linear model. The argument family= allows us to specify the distribution and the link. Details are in the help page ?family and in the section “Generalized linear models” in the manual An Introduction to R accompanying R. We will use only two: the one for logistic regression and one to compare the results with simple linear regression.

For logistic regression the argument is specified by family=binomial, as the default link function is what we want. For comparison to simple linear regression, the link function is just an identity, and the family is specified as family=gaussian. *

* Gaussian is a mathematical term named for Carl Gauss that describes the normal distribution.

As an illustration, let’s compare using glm() and lm() to analyze a linear model. We will use simulated data so we already “know” the answer.

■ Example 12.2: Comparing glm() and lm()

We first simulate data from the model that $Y_i$ has a Normal($x_{1i} + 2x_{2i}$, $\sigma$) distribution.

> x1 = rep(1:10,2)
> x2 = rchisq(20,df=2)
> y = rnorm(20,mean=x1 + 2*x2, sd=2)

We fit this using lm() as follows:

> res.lm=lm(y ~ x1+x2)
> summary(res.lm)
…
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.574      1.086   -0.53      0.6
x1             1.125      0.143    7.89  4.4e-07 ***
x2             1.971      0.254    7.75  5.6e-07 ***
…
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
…

The coefficients for both x1 and x2 are flagged as significantly different from 0 in the marginal t-tests.

The above can all be done using glm(). The only difference is that the modeling involves specifying the family= argument. We show all the output below.

> res.glm=glm(y ~ x1+x2, family=gaussian)
> summary(res.glm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.574      1.086   -0.53      0.6
x1             1.125      0.143    7.89  4.4e-07 ***
x2             1.971      0.254    7.75  5.6e-07 ***
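The agreement between the two fits can also be checked programmatically rather than by comparing printed tables. The following is a minimal sketch, not part of the original example: the call to set.seed() and the names same_coef, p, yb, and res.logit are assumptions introduced here, and the binary response is simulated purely to show the family=binomial call.

```r
# Reproducible version of the simulation.
# The seed is an assumption; the original example does not set one,
# so the numbers will differ from the printed output above.
set.seed(1)
x1 <- rep(1:10, 2)
x2 <- rchisq(20, df = 2)
y  <- rnorm(20, mean = x1 + 2 * x2, sd = 2)

# Fit the same model two ways
res.lm  <- lm(y ~ x1 + x2)
res.glm <- glm(y ~ x1 + x2, family = gaussian)  # identity link is the default

# Compare the estimated coefficients up to numerical tolerance
same_coef <- isTRUE(all.equal(coef(res.lm), coef(res.glm)))
same_coef

# Inspect which link function each family uses by default
family(res.glm)$link

# Hypothetical logistic regression: a 0/1 response simulated for illustration
p  <- plogis(-3 + 0.5 * x1)             # success probability on the logit scale
yb <- rbinom(20, size = 1, prob = p)
res.logit <- glm(yb ~ x1, family = binomial)
family(res.logit)$link
```

If the two fits agree, same_coef should be TRUE, and the link components show that gaussian defaults to the identity link while binomial defaults to the logit link.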
