10.07.2015 Views

Using R for Introductory Statistics : John Verzani



1.4 Reading in other sources of data

Typing in data sets can be a real chore. It also provides a perfect opportunity for errors to creep into a data set. If the data is already recorded in some format, it's better to be able to read it in. The way this is done depends on how the data is stored, as data sets may be found on web pages, as formatted text files, as spreadsheets, or built in to the R program.

1.4.1 Using R's built-in libraries and data sets

R is designed to have a small code kernel, with additional functionality provided by external packages. A modest example would be the data sets that accompany this book. More importantly, many libraries extend R's base functionality. Many of these come standard with an R installation; others can be downloaded and installed from the Comprehensive R Archive Network (CRAN), http://www.r-project.org/, as described below and in Appendix A.

Most packages are not loaded by default, as they take up computer memory that may be in short supply. Rather, they are selectively loaded using either the library() or require() functions. For instance, the package pkgname is loaded with library(pkgname). In the Windows and Mac OS X GUIs, packages can be loaded using a menu bar item.

In addition to new functions, many packages contain built-in data sets to provide examples of the features the package introduces. R comes with a collection of built-in data sets in the datasets package that can be referenced by name. For example, the lynx data set records the number of lynx trappings in Canada for some time period. Typing the data set name will reference the values:

> range(lynx) # range of values
[1] 39 6991

The packages not automatically loaded when R starts need to be loaded, using library(), before their data sets are visible. As of version 2.0.0 of R, data sets in a package may be loaded automatically when the package is.
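The loading behavior described above can be checked directly in an R session. The following is a minimal sketch, not the book's own code: it assumes a default R startup (so that the datasets package is attached), and the package name noSuchPackageXYZ is hypothetical, chosen only so that loading fails. It illustrates one practical difference between the two loading functions: require() returns FALSE with a warning when a package is unavailable, whereas library() signals an error.

```r
# Packages that R attaches at startup appear on the search path.
datasets_attached <- "package:datasets" %in% search()

# lynx lives in the datasets package, so it can be referenced by name.
lynx_range <- range(lynx)   # smallest and largest recorded counts

# Hypothetical package name, used only to make loading fail.
missing_pkg <- "noSuchPackageXYZ"

# require() returns FALSE (with a warning) when the package is unavailable.
ok <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error instead; it is caught here so the script continues.
status <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) "library() signalled an error"
)

print(datasets_attached)
print(lynx_range)
print(ok)
print(status)
```

Because require() reports failure through its return value, it is convenient inside an if() test, while library() is the usual choice at the top of a script where a missing package should stop execution.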
The data sets referenced in this text are loaded automatically with their package. However, a package need not support this. When this is the case, an extra step of loading the data set using the data() command is needed. For example, the survey data set in the MASS package could be loaded in this manner:

> library(MASS)
> data(survey) # redundant for versions >= 2.0.0

To load a data set without the overhead of loading its package, the above sequence of commands may be abbreviated by specifying the package name to data(), as in

> data(survey, package="MASS")

However, this will not load in the help files, if present, for the data set or the rest of the package. In R, most built-in data sets are well documented, in which case we can check
