10.07.2015 Views

Using R for Introductory Statistics : John Verzani




> table(npdb$ID)

create a table of malpractice awards for each of the 6,369 doctors. What does the command table(table(ID)) do, and why is this interesting?

2.10 The data set MLBattend (UsingR) contains attendance information for major league baseball between 1969 and 2000. The following commands will extract just the wins for the New York Yankees, in chronological order.

> attach(MLBattend)
> wins[franchise == "NYA"]
[1] 80 93 82 79 80 89 83 97 100 100 89 103 59 79 91 …
> detach(MLBattend) # tidy up

Add the names 1969:2000 to your variable. Then make a barplot and dot chart showing this data in chronological order.

2.2 Numeric data

For univariate data, we want to understand the distribution of the data. What is the range of the data? What is the central tendency? How spread out are the values? We can answer these questions graphically or numerically. In this section, we’ll see how. The familiar mean and standard deviation will be introduced, as will the pth quantile, which extends the idea of a median to measure position in a data set.

2.2.1 Stem-and-leaf plots

If we run across a data set, the first thing we should do is organize the data so that a sense of the values becomes more clear. A useful way to do so for a relatively small data set is with a stem-and-leaf plot. This is a way to code a set of numeric values that minimizes writing and gives a fairly clear idea of what the data is, in terms of its range and distribution. For each data point only a single digit is recorded, making a compact display. These digits are the “leaves.” The stem is the part of the data value to the left of the leaf.

To illustrate, we have the following data for the number of points scored in a game by each member of a basketball team:

2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5

The stem in this case would naturally be the 10s digit. A number like 23 would be written as a 2 for the stem and a 3 for the leaf. The results are tabulated as shown below in the output of stem().

> x = scan()
1: 2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
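The stem and leaf split described above can also be worked out by hand, which may help when reading the built-in display. The following is only a sketch: it enters the same 20 scores with c() rather than the interactive scan(), and the names stems, leaves, leaves_by_stem, and by_hand are illustrative choices, not from the text.

```r
# The basketball scores from the text, entered non-interactively with c()
x <- c(2, 3, 16, 23, 14, 12, 4, 13, 2, 0, 0, 0, 6, 28, 31, 14, 4, 8, 2, 5)

# Stem = 10s digit (integer division), leaf = 1s digit (remainder)
stems  <- x %/% 10
leaves <- x %% 10

# Sort by value, then group the leaves under their stems
ord <- order(x)
leaves_by_stem <- split(leaves[ord], stems[ord])

# Collapse each group of leaves into one string per stem
# (hypothetical helper name; not part of base R)
by_hand <- sapply(leaves_by_stem, paste, collapse = "")
print(by_hand)

# The built-in display; its exact layout depends on stem()'s defaults
stem(x)
```

If the stems produced by stem() look too coarse or too fine for a given data set, its scale argument can be adjusted, for example stem(x, scale = 2).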
