10.07.2015

Using R for Introductory Statistics : John Verzani

2.1.5 Factors

Where does the ordering of the categories come from in the examples on creating tables? If we looked at the help page for table() we would find that the data should be interpretable as a "factor." R uses factors to store categorical data.

Factors are made with the function factor() or the function as.factor() and have a specific set of values called levels().

At first glance, factors appear to be similar to data vectors, but they are not. We see below that they are printed differently and do not have numeric values.

> 1:5 # a numeric vector (integer)
[1] 1 2 3 4 5
> factor(1:5) # now a factor. Note levels
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
> mean(factor(1:5)) # factors are not numeric
[1] NA
Warning message:
argument is not numeric or logical: returning NA in: mean.default(factor(1:5))
> letters[1:5] # a character vector
[1] "a" "b" "c" "d" "e"
> factor(letters[1:5]) # turned into a factor
[1] a b c d e
Levels: a b c d e

The initial order of the levels in a factor is determined by the sort() function. In the example with mean(), NA is returned with a warning, as factors are not treated as numeric even if their levels are given numeric-looking values. We used letters to return a character vector. This built-in variable contains the 26 letters a through z. The capital letters are in LETTERS. Chapter 4 has more on factors.

2.1.6 Problems

2.1 Find an example of a table in the media summarizing a univariate variable. Could you construct a potential data set that would have this table?

2.2 Try to find an example in the media of a misleading barplot. Why is it misleading? Do you think it was meant to be?

2.3 Find an example in the media of a pie chart. Does it do a good job of presenting the data?

2.4 Load and attach the data set central.park (UsingR).
The WX variable contains a list of numbers representing bad weather (e.g., 1 for fog, 3 for thunder, 8 for smoke or haze). NA is used when none of the types occurred. Make a table of the data, then make a table with the extra argument exclude=FALSE. Why is the second table better?
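
The points made about factors in 2.1.5 can be checked with a short sketch. By default the levels follow sort() order. table() reports its counts in level order. A factor stores integer codes that index into levels(), and this is why numeric-looking levels do not behave as numbers. The vector sizes is a made-up example. The levels= argument of factor() and the as.numeric(as.character()) idiom are standard R, but they are additions that do not come from the text above.

```r
# Hypothetical example data: a character vector of categories
sizes <- c("small", "large", "medium", "small", "large", "small")

# Default levels come from sorting the unique values (alphabetical here)
f <- factor(sizes)
levels(f)          # "large" "medium" "small"
table(f)           # counts listed in level order: large 2, medium 1, small 3

# The levels= argument sets a different order, and table() follows it
g <- factor(sizes, levels = c("small", "medium", "large"))
table(g)           # small 3, medium 1, large 2

# Internally each value is an integer code pointing into levels(g)
as.integer(g)      # 1 3 2 1 3 1

# Numeric-looking levels: as.numeric() returns the codes, not the values
h <- factor(c(10, 5, 20, 5))
levels(h)                      # "5" "10" "20"
as.numeric(h)                  # 2 1 3 1 (the codes)
as.numeric(as.character(h))    # 10 5 20 5 (the original values)
```

The alphabetical order of character levels can depend on the locale. That has no effect on lowercase words like these.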
