Using R for Introductory Statistics : John Verzani


A model formula of the type numeric ~ factor represents a statistical model in which, for each level i of the factor, there is a sample with mean described by µ_i.

As this model says something about the means of the different samples, multiple boxplots are useful for viewing the data. A boxplot allows us to compare the medians, which are close to the means when the data are not skewed. Consequently, side-by-side boxplots are the plot made when the plot() function encounters such a model formula. That is, if x is a numeric data vector and f a factor indicating which group the corresponding element of x belongs to, then the command plot(x ~ f) will create side-by-side boxplots of the values of x split up by the levels of f.

For example, Figure 4.6 could also have been made with the commands

> plot(gestation ~ factor(inc), data=babies, varwidth=TRUE,
+      subset = gestation != 999 & inc != 98,
+      xlab="income level", ylab="gestation (days)")

The function factor() explicitly makes inc a factor and not a numeric data vector. Otherwise, the arguments are identical to those to the boxplot() function that created Figure 4.6.

4.3.3 Creating contingency tables with xtabs()

We saw in Example 4.1 how to make a three-way contingency table using the table() function starting with raw data. What if we had only the count data? How could we enter it to make a three-way contingency table? We can enter the data as a data frame and then use xtabs() to create contingency tables. The xtabs() function offers a formula interface as an alternative to table().

The function as.data.frame() inverts what xtabs() and table() do.
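The round trip between a contingency table and a data frame of counts can be sketched as follows. This is only an illustration: it uses R's built-in UCBAdmissions table, which is not part of the text's examples and is chosen here simply because it ships with R. The variable names counts and rebuilt are hypothetical.

```r
# UCBAdmissions is a built-in three-way table (Admit x Gender x Dept).
# It is an assumed stand-in here, not data from the text.

# as.data.frame() flattens the table: one row per combination of
# levels, with the cell count in a column named Freq.
counts <- as.data.frame(UCBAdmissions)
print(head(counts))

# xtabs() goes the other way: the left side of the formula names the
# count column, the right side names the classifying factors.
rebuilt <- xtabs(Freq ~ Admit + Gender + Dept, data = counts)

# ftable() prints a three-way table in a flat, readable layout.
print(ftable(rebuilt))
```

Putting the count column on the left of the formula is what lets xtabs() work from summarized counts rather than raw observations.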
When called on a contingency table, as.data.frame() creates a data frame with all possible combinations of the variable levels and a count of frequencies.

■ Example 4.7: Seat-belt usage factors

The three-way table in Table 4.5 shows percentages of seat-belt usage for two years, broken down by type of law enforcement and type of car. Law enforcement is primary if a driver can be pulled over and ticketed for not wearing a seat belt, and secondary if a driver can be ticketed for this offense only if pulled over for another infraction. This data comes from a summary of the 2002 Moving Traffic Study as part of NOPUS (http://www.nhtsa.gov/), which identified these two factors as the primary factors in determining seat-belt usage.

We show how to enter the data into R using a data frame, and from there

Table 4.5 Seat-belt data by type of law and vehicle

Enforcement    primary         secondary
Year           2001    2002    2001    2002
Car type
passenger      71      82      71      71
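As a hedged sketch of entering summarized data like this, the code below builds a data frame from the passenger row of Table 4.5, the only row visible in this excerpt, and passes it to xtabs(). The column names (enforcement, year, car, percent) and the object names are assumptions made for illustration, not necessarily those used in the text.

```r
# Only the passenger row of Table 4.5 appears in this excerpt, so only
# that row is entered. Column names are hypothetical.
seatbelt <- data.frame(
  enforcement = rep(c("primary", "secondary"), each = 2),
  year        = rep(c(2001, 2002), times = 2),
  car         = "passenger",           # recycled to all four rows
  percent     = c(71, 82, 71, 71)      # values read across the table row
)

# The left side of the formula holds the tabulated values; the right
# side lists the classifying variables, which xtabs() treats as factors.
tab <- xtabs(percent ~ car + year + enforcement, data = seatbelt)

# ftable() lays the three-way table out flat for printing.
print(ftable(tab))
```

Each row of the data frame corresponds to one cell of the table, so further car types would be added as extra rows rather than extra columns.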
