10.07.2015 Views

Using R for Introductory Statistics : John Verzani



van/suv 70 73

The ftable() command can again be used to flatten the tables. We specify the column variables to make the output look like Table 4.5.

> ftable(tab, col.vars=c("enforcement","year"))
          enforcement primary      secondary
          year           2001 2002      2001 2002
car
passenger                  71   82        71   71
pickup                     70   71        50   55
van/suv                    79   83        70   73

4.3.4 Manipulating data frames: split() and stack()

When a formula interface isn't available for a function, the split() function can be used to split up a variable by the levels of some factor. If x stores the data and f is a factor indicating which sample the respective data values belong to, then the command split(x, f) will return a list with top-level components containing the values of x corresponding to the levels of f. For example, the command boxplot(split(x, f)) produces the same result as the command boxplot(x ~ f).

Applying a single function to each component of the list returned by split() may also be done with the tapply() function. The basic format is

tapply(x, f, function)

The function can be a named function, such as mean, or one we define ourselves, as will be discussed in Chapter 6.

Inverse to split() is stack(), which takes a collection of variables stored in a data frame or list and stacks them into two variables. One contains the values, and the other indicates which variable the data originally came from. The function unstack() reverses this process and is similar to split(), except that it returns a data frame (and not a list), if possible.

For example, the data set cancer (UsingR) contains survival times for different types of cancer. The data is stored in a list, not a data frame, as the samples do not have the same length.
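
As an aside, the split() and tapply() behavior described earlier in this section can be checked on a small sample. The sketch below is only illustrative: the vectors x and f are made-up values, not data from the book, and plot = FALSE is used so that boxplot() returns its computed statistics instead of drawing a figure.

```r
# Hypothetical data: six measurements from two samples, "a" and "b".
x <- c(10, 12, 14, 20, 22, 27)
f <- factor(c("a", "a", "a", "b", "b", "b"))

# split() returns a list with one component per level of f.
groups <- split(x, f)

# Apply a single function to each component of that list ...
group_means <- sapply(groups, mean)

# ... or split and apply in one step with tapply().
tapply_means <- tapply(x, f, mean)

# The list and formula interfaces to boxplot() summarize the same groups.
# plot = FALSE returns the computed statistics without drawing anything.
from_list <- boxplot(split(x, f), plot = FALSE)
from_formula <- boxplot(x ~ f, plot = FALSE)
same_stats <- identical(from_list$stats, from_formula$stats)
```

Here group_means and tapply_means hold the same two group means, and same_stats records whether the two boxplot() calls computed the same five-number summaries.
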
We can create a data object for which the model formula will work using stack().

> cancer
$stomach
 [1]  124   42   25   45  412   51 1112   46  103  876  146  340
[13]  396
…
$breast
 [1] 1235   24 1581 1166   40  727 3808  791 1804 3460  719
> stack(cancer)
  values     ind
1    124 stomach
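
The same round trip can be tried without the UsingR package. The following is a minimal sketch on a made-up list (the names samples, first, and second are hypothetical, not taken from the cancer data); it assumes only base R.

```r
# Hypothetical samples of unequal length, stored in a list
# (a data frame would require equal lengths).
samples <- list(first = c(3, 5, 8), second = c(2, 9))

# stack() returns a data frame with two columns: "values" and "ind".
long <- stack(samples)

# The stacked form can be used with a model formula.
# plot = FALSE returns the summary statistics without drawing a figure.
summary_stats <- boxplot(values ~ ind, data = long, plot = FALSE)

# It also works directly with tapply().
means <- tapply(long$values, long$ind, mean)

# unstack() reverses stack(); because the samples differ in length,
# the result here is a list rather than a data frame.
back <- unstack(long)
```

Because the two samples have different lengths, unstack() cannot build a data frame and falls back to a list, which mirrors how the cancer data is stored.
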
