10.07.2015 Views

Using R for Introductory Statistics : John Verzani


Goodness of fit

9.3 A package of M&M candies is filled from batches that contain a specified percentage of each of six colors. These percentages are given in the mandms (UsingR) data set. Assume a package of candies contains the following color distribution: 15 blue, 34 brown, 7 green, 19 orange, 29 red, and 24 yellow. Perform a chi-squared test with the null hypothesis that the candies are from a milk chocolate package. Repeat assuming the candies are from a Peanut package. Based on the p-values, which would you suspect is the true source of the candies?

9.4 The pi2000 (UsingR) data set contains the first 2,000 digits of π. Perform a chi-squared significance test to see if the digits appear with equal probability.

9.5 A simple trick for determining what language a document is written in is to compare the letter distributions (e.g., the number of z's) to the known proportions for a language. For these proportions, we use the familiar letter frequencies given in the frequencies variable of the scrabble (UsingR) data set. These are an okay approximation to those in the English language.

For simplicity (see ?scrabble for more details), we focus on the vowel distribution of a paragraph from R's webpage appearing below. The counts and Scrabble frequencies are given in Table 9.3.

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

Table 9.3 Vowel distribution and Scrabble frequency

                     a   e   i   o   u
count               28  39  23  22  11
Scrabble frequency   9  12   9   8   4

Perform a chi-squared goodness-of-fit test to see whether the distribution of vowels appears to be from English.

9.6 The names of common stars are typically Greek or Arab in derivation. The bright.stars (UsingR) data set contains 96 names of common stars. Perform a significance test on the letter distribution to see whether they could be mistaken for English words.

The letter distribution can be found with:

> all.names = paste(bright.stars$name, sep="", collapse="")
> x = unlist(strsplit(tolower(all.names), ""))
> letter.dist = sapply(letters, function(i) sum(x == i))

The English-letter frequency is found using the scrabble (UsingR) data set with:
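
Returning to Exercise 9.5: Table 9.3 already lists both the observed vowel counts and the Scrabble frequencies, so the test can be sketched without loading UsingR. What follows is a minimal sketch, not the book's solution. The variable names (vowel.counts, scrabble.freq, p.null, res) are made up for illustration, and it assumes the Scrabble frequencies should be rescaled to proportions over the five vowels before being passed to chisq.test().

```r
# Observed vowel counts in the paragraph (Table 9.3)
vowel.counts = c(a = 28, e = 39, i = 23, o = 22, u = 11)

# Scrabble frequencies for the same vowels (Table 9.3)
scrabble.freq = c(a = 9, e = 12, i = 9, o = 8, u = 4)

# Assumption: rescale the frequencies so they sum to 1 over the five vowels
p.null = scrabble.freq / sum(scrabble.freq)

# Chi-squared goodness-of-fit test of the counts against p.null
res = chisq.test(vowel.counts, p = p.null)
print(res)
```

A large p-value would indicate that the vowel counts are consistent with the Scrabble proportions. The same pattern, a vector of counts plus a vector p of null probabilities, should carry over to Exercises 9.3, 9.4, and 9.6 once the corresponding UsingR data sets are loaded.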
