10.07.2015

Using R for Introductory Statistics : John Verzani



Figure 2.13 Frequency polygon for waiting variable of the faithful data set

The plot() function is used to plot points. It will be discussed more thoroughly in the next chapter. The type="l" argument to plot() is used to draw line segments between the points instead of plotting the points. The rug() function is used to display the data points using hash marks along the x-axis. This example shows how we can use the cut() function and the table() function to turn continuous numeric data into discrete numeric data, or even categorical data. The output of cut() is simply the bin that the data point is in, where bins are specified with a vector of endpoints. For example, if this vector is c(1, 3, 5) then the bins are (1, 3], (3, 5]. The left-most endpoint is not included by default; if the extra argument include.lowest=TRUE is given, it will be included. (We could also use the output from hist() to do most of this.)

The frequency polygon is used to tie in the histogram with the notion of a probability density, which will be discussed when probabilities are discussed in Chapter 5. However, it is preferable to estimate the density directly, as the frequency polygon, like the histogram, is very dependent on the choice of bins.

Estimating the density  The density() function will find a density estimate from the data. To use it, we give it the data vector and, optionally, an argument as to what algorithm to use. The result can be viewed with either the plot() function or the lines() function. A new graphic showing the density plot is produced by the command plot(density(x)). The example uses lines() to add to the existing graphic.

> attach(faithful)
> hist(waiting, breaks="scott", prob=TRUE, main="", ylab="")
> lines(density(waiting))    # add to histogram
> detach(faithful)           # tidy up

In Figure 2.14, the density estimate clearly shows the two peaks in this data set. It is layered on top of the histogram plotted with total area 1 (from prob=TRUE).
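The binning behavior of cut() described earlier can be checked on a small vector. The following is a minimal sketch, not taken from the text: the data values in x are made up for illustration, and only the endpoint vector c(1, 3, 5) comes from the example above.

```r
# Hypothetical data values, chosen only to illustrate the endpoints
x <- c(1, 2, 3, 4, 5)
breaks <- c(1, 3, 5)

# Default: the bins are (1,3] and (3,5], so the value 1 falls in no bin (NA)
default_bins <- cut(x, breaks)
print(table(default_bins, useNA = "ifany"))

# With include.lowest=TRUE the first bin becomes [1,3] and the value 1 is counted
closed_bins <- cut(x, breaks, include.lowest = TRUE)
print(table(closed_bins))
```

Passing the result of cut() to table() gives the bin counts, which is the step that turns the continuous values into categorical data.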
