Using R for Introductory Statistics : John Verzani


Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.45 on 176 degrees of freedom
Multiple R-Squared: 0.355, Adjusted R-squared: 0.329
F-statistic: 13.8 on 7 and 176 DF, p-value: 3.27e-14

The boxplots show many differences. Are they statistically significant? We assume for now that the data is actually a collection of independent samples (rather than monthly averages of varying sizes) and proceed using the TukeyHSD() function.

> TukeyHSD(res)
Error in TukeyHSD(res) : no applicable method for "TukeyHSD"

Oops, the TukeyHSD() function wants the linear model to be fit with aov(), not lm(). The commands are the same.

> res.aov = aov(time ~ airline, data=out)
> TukeyHSD(res.aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = time ~ airline, data = out)
$airline
          diff     lwr      upr
CO-AA  3.83478  0.7093  6.96025
DL-AA -2.05217 -5.1776  1.07330
…
US-TW -2.17826 -5.3037  0.94721
US-UA -3.79130 -6.9168 -0.66583
> plot(TukeyHSD(res.aov), las=2)

The output of TukeyHSD() is best viewed with the plot of the confidence intervals (Figure 11.5). This is created by calling plot() on the output. The argument las=2 turns the tick-mark labels perpendicular to the axes.

Recall the duality between confidence intervals and tests of hypothesis discussed in Chapter 8. For a given confidence level and sample, if the confidence interval excludes a hypothesized value of a population parameter, then the two-sided significance test of that value will reject the null hypothesis. Here the parameter is a difference in means, so an interval that excludes 0 indicates a significant difference. Applying this to the Newark airport example, we see several statistically significant differences at the α=.05 level, the first few being CO-AA and NW-AA (just visible on the graph shown).

11.2.3 Problems

11.10 The data set MLBattend (UsingR) contains attendance data for major league baseball between the years 1969 and 2000. Use lm() to perform a t-test on attendance for the two levels of league. Is the difference in mean attendance significant? Compare your results to those provided by t.test().
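The TukeyHSD() workflow from the section above, and the duality between its confidence intervals and adjusted p-values, can be tried on simulated data. The following is a minimal sketch, not the book's Newark airport analysis: the data frame d, its columns group and y, and the group means are all made up for illustration. Only base R functions (lm(), aov(), TukeyHSD(), plot()) are used.

```r
# Hypothetical simulated data: three groups of 30 observations each.
# Groups A and B share a mean; group C is shifted upward.
set.seed(1)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  y     = c(rnorm(30, mean = 10, sd = 3),
            rnorm(30, mean = 10, sd = 3),
            rnorm(30, mean = 14, sd = 3))
)

# TukeyHSD() has a method for aov fits but not for lm fits,
# even though both calls fit the same linear model.
res.lm  <- lm(y ~ group, data = d)
res.aov <- aov(y ~ group, data = d)
same_fit <- isTRUE(all.equal(coef(res.lm), coef(res.aov)))

# Pairwise comparisons at a 95% family-wise confidence level.
tuk <- TukeyHSD(res.aov, conf.level = 0.95)
m   <- tuk$group                 # matrix with columns diff, lwr, upr, p adj

# Duality check: an interval excludes 0 exactly when the
# adjusted p-value falls below 0.05.
excludes_zero <- m[, "lwr"] > 0 | m[, "upr"] < 0
significant   <- m[, "p adj"] < 0.05
print(cbind(m, excludes_zero, significant))

# Plot the intervals; las = 2 turns the tick labels perpendicular to the axes.
plot(tuk, las = 2)
```

In the printed table the last two columns should agree row by row, which is the confidence-interval and hypothesis-test duality applied to each pairwise difference.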
