10.07.2015

Using R for Introductory Statistics : John Verzani


Univariate data

> hist(waiting)                                # use defaults
> hist(waiting,breaks=10)                      # suggest 10 breaks
> hist(waiting,breaks=seq(43,108,length=10))   # use these breaks
> hist(waiting,breaks="scott")                 # use “Scott” algorithm

If these graphs are made, we will be surprised that the second histogram has more than ten bins, despite our suggestion. To get exactly the bins we want, we specify the breaks directly as a vector of cut points. The “Sturges” algorithm is the default; “Scott” is an alternative, as is “Freedman-Diaconis,” which may be abbreviated as FD.

The choice to draw a histogram of frequencies or proportions is made by the argument probability=. By default, this is FALSE and frequencies are drawn. Setting it to TRUE will create histograms where the total area is 1. For example, the commands

> hist(waiting)
> hist(waiting,prob=T)                         # shortened probability=TRUE

will create identical-looking graphs, but the y-axes will differ. We used prob=T to shorten the typing of probability=TRUE. Although T can usually be used as a substitute for TRUE, there is no guarantee it will work, as we can assign new values to a variable named T.

By default, R uses intervals of the type (a,b]. The argument include.lowest=TRUE makes the left-most interval of the type [a,b] (i.e., it includes a); for hist() this argument is already TRUE by default.

■ Example 2.8: Baseball’s on-base percentage
Statistical summaries are very much a part of baseball. A common statistic is the “on-base percentage” (OBP), which indicates how successful a player is as a batter. This “percentage” is usually given as a “proportion,” or a number between 0 and 1. The data set OBP (UsingR) contains the OBP for the year 2002, according to the Sean Lahman baseball database (http://www.baseball1.com/).

This command will produce the histogram in Figure 2.12.

> hist(OBP,breaks="Scott",prob=TRUE,col=gray(0.9))

The distribution has a single peak and is fairly symmetric, except for the one outlier on the right side of the distribution. The outlier is Barry Bonds, who had a tremendous season in 2002.

The arguments passed to hist() here are good choices, but they are not the defaults. They are the ones used by the truehist() function in the MASS package, which may be used as an alternative to hist().
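The claims about hist() made earlier in this section can be checked without drawing any plots. The following is a minimal sketch, not part of the original text. It assumes that waiting is the waiting column of R's built-in faithful data set, and the variable names h_suggest, h_exact, area, and t_after_assignment are chosen here for illustration only.

```r
# Assumption: `waiting` in the text refers to faithful$waiting
waiting <- faithful$waiting

# plot = FALSE returns the histogram object instead of drawing it
h_suggest <- hist(waiting, breaks = 10, plot = FALSE)
h_exact   <- hist(waiting, breaks = seq(43, 108, length.out = 10), plot = FALSE)

# A single number is only a suggestion, so the bin count need not be 10
length(h_suggest$counts)

# Ten explicit cut points always give exactly nine bins
length(h_exact$counts)

# The density component holds the heights drawn when probability = TRUE;
# height times width, summed over all bins, is the total area
area <- sum(h_exact$density * diff(h_exact$breaks))
all.equal(area, 1)

# T is an ordinary variable and can be reassigned, unlike the reserved word TRUE
T <- 0
t_after_assignment <- isTRUE(T)   # T no longer stands for TRUE
rm(T)                             # remove the copy so T means TRUE again
```

Inspecting the returned object this way makes it easy to see why writing out probability=TRUE in full is the safer habit in scripts.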
