10.07.2015 Views

Using R for Introductory Statistics : John Verzani



Table 9.10 Accidents by injury level and seatbelt usage

                      Injury level
Seat belt     none   minimal    minor    major
yes         12,813       647      359       42
no          65,963     4,000    2,642      303

9.13 The airquality data set contains measurements of air quality in New York City. We wish to see if ozone levels are independent of temperature. First we gather the data, using complete.cases() to remove missing data from our data set.

> aq = airquality[complete.cases(airquality),]
> attach(aq)
> te = cut(Temp, quantile(Temp))
> oz = cut(Ozone, quantile(Ozone))

Perform a chi-squared test of independence on the two variables te and oz. Does the data support an assumption of independence?

9.14 In an effort to increase student retention, many colleges have tried block programs. Assume that 100 students are broken into two groups of 50 at random. Fifty are in a block program; the others are not. The number of years each student attends the college is then measured. We wish to test whether the block program makes a difference in retention. The data is recorded in Table 9.11. Perform a chi-squared test of significance to investigate whether the distributions are homogeneous.

Table 9.11 Retention data by year and program

Program     1 year   2 year   3 year   4 year   5+ years
nonblock        18       15        5        8          4
block           10        5        7       18         10

9.15 The data set oral.lesion (UsingR) contains data on location of an oral lesion for three geographic locations. This data set appears in an article by Mehta and Patel about differences in p-values in tests for independence when the exact or asymptotic distributions are used. Compare the p-values found by chisq.test() when the asymptotic distribution of the sampling distribution is used to find the p-value and when a simulated value is used. Are the p-values similar? If not, which do you think is more accurate? Why?

9.3 Goodness-of-fit tests for continuous distributions

When finding confidence intervals for a sample we were concerned about whether or not the data was sampled from a normal distribution. To investigate, we made a quantile plot or histogram and eyeballed the result. In this section, we see how to compare a continuous distribution with a theoretical one using a significance test.
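As a rough illustration of what such a significance test can look like in R, the sketch below applies the Kolmogorov-Smirnov test, ks.test(), to a simulated sample. The choice of this particular test, the sample x, and the seed are assumptions made for illustration; they are not taken from the text.

```r
# Hypothetical sample: 100 simulated draws from a standard normal.
# This is made-up data for illustration, not data from the text.
set.seed(1)
x <- rnorm(100)

# Kolmogorov-Smirnov test of the null hypothesis that x was sampled
# from a normal distribution with mean 0 and standard deviation 1.
# The parameters are fixed in advance, not estimated from x.
res <- ks.test(x, "pnorm", mean = 0, sd = 1)

res$statistic   # largest gap between the empirical and theoretical c.d.f.
res$p.value     # a small p-value would cast doubt on the null hypothesis
```

Note that the theoretical distribution here is fully specified beforehand. If the mean and standard deviation were instead estimated from the same sample, the p-value reported by this call would no longer be trustworthy.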
