10.07.2015

Using R for Introductory Statistics : John Verzani



> plot(weight ~ I(height^2), data=kid.weights)
> res = lm(weight ~ I(height^2), data=kid.weights)
> abline(res)

3.4.4 Interacting with a scatterplot

When looking at a scatterplot we see the trend of the data as well as individual data points. If one of these data points stands out, how can it be identified? Which index in the data set corresponds to it? What are its x- and y-coordinates? If the data set is small, the answers can be identified by visual inspection of the data. For larger data sets, better methods are available.

The R function to identify points on a scatterplot by their corresponding index is identify(). A template for its usage is

identify(x, y, labels=…, n=…)

In order to work, identify() must know about the points we want to identify. These are specified as variables and not as a model formula. The value n= specifies the number of points to identify. By default, identify() identifies points with each mouse click until instructed to stop. (This varies from system to system. Typically it's a right-click in Windows, a middle-click in Linux, and the escape key in Mac OS X.) As points are identified, R will put the index of the point next to it. The argument labels= allows for the placement of other text. The identify() function returns the indices of the selected points.

For example, if our plot is made with plot(x,y), identify(x,y,n=1) will identify the closest point to our first mouse click on the scatterplot by its index, whereas identify(x,y,labels=names(x)) will let us identify as many points as we want, labeling them by the names of x.

The function locator() will locate the (x, y) coordinates of the points we select with our mouse. It is called with the number of points desired, as with locator(2). The return value is a list containing two data vectors, x and y, holding the x and y positions of the selected points.

■ Example 3.7: Florida 2000

The florida (UsingR) data set contains county-by-county vote counts for the 2000 United States presidential election in the state of Florida. This election was extremely close and was marred by several technical issues, such as poorly designed ballots and outdated voting equipment. As an academic exercise only, we might try to correct for one of these issues statistically in an attempt to divine the true intent of the voters.

As both Pat Buchanan and George Bush were conservative candidates (Bush was the Republican and Buchanan was an Independent), there should be some relationship between the number of votes for Buchanan and those for Bush. A scatterplot (Figure 3.13) is illuminating. There are two outliers. We identify the outliers as follows:

> plot(BUCHANAN ~ BUSH, data=florida)
> res = lm(BUCHANAN ~ BUSH, data=florida) # store it
> abline(res)
> with(florida, identify(BUSH, BUCHANAN, n=2, labels=County))
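Because identify() and locator() wait for mouse clicks, they only do something useful in an interactive session. The sketch below is one way to try the same workflow on a small made-up data set: the vectors x and y and the labels pt1 to pt8 are hypothetical and are not taken from the florida data. As a non-interactive stand-in for clicking, it picks the two points with the largest absolute residuals from the fitted line. This is a substitute technique for illustration, not the method used in the text.

```r
# Hypothetical data: roughly y = 2x, with two points (3rd and 6th) pushed
# well above the trend. These values are invented for illustration.
x <- 1:8
y <- c(2, 4, 20, 8, 10, 30, 14, 16)
names(x) <- paste0("pt", seq_along(x))   # labels, as with labels=names(x)

res <- lm(y ~ x)                         # store the fit, as in the text

# Non-interactive stand-in for two mouse clicks: the indices of the two
# largest absolute residuals (an assumption, not what identify() computes).
idx <- order(abs(resid(res)), decreasing = TRUE)[1:2]
print(names(x)[idx])

# The interactive calls only run when a user is present to click.
if (interactive()) {
  plot(x, y)
  abline(res)
  # identify() takes the coordinates as variables, not a model formula,
  # and returns the indices of the clicked points.
  clicked <- identify(x, y, n = 2, labels = names(x))
  # locator() returns a list with components x and y.
  pos <- locator(2)
  str(pos)
}
```

In an interactive session, clicking near the two high points should give the same indices in clicked as the residual-based idx, though identify() simply reports whichever points are nearest the clicks.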
