10.07.2015 Views

Using R for Introductory Statistics : John Verzani

Make the above conversions, then make a flattened contingency table of mpg, price, and Type. Do you see any patterns?

4.3 In the previous exercise, variables in the data set Cars93 (MASS) were investigated with tables. Now, make a scatterplot of the variables MPG.city and Price, marking the points according to their Type. Do you see any trend?

4.4 For the carsafety (UsingR) data set, make a scatterplot of the variable Driver.deaths versus Other.deaths. Use pch=as.numeric(type) to change the plot character based on the value of type. Label any outliers with their make or model using identify(). Do you notice any trends?

4.5 The cancer (UsingR) data set contains survival times for cancer patients organized by the type of cancer. Make side-by-side boxplots of the variables stomach, bronchus, colon, ovary, and breast. Which type has the longest tail? Which has the smallest spread? Are the centers all similar?

4.6 The data set UScereal (MASS) lists facts about specific cereals sold in a United States supermarket. For this data set investigate the following:

1. Is there a relationship between manufacturer (mfr) and vitamin type (vitamins) by shelf location (shelf)? Do you expect one? Why?

2. Look at the relationship between calories and sugars with a scatterplot. Identify the outliers. Are these also fat-laden cereals?

3. Now look at the relationship between calories and sugars with a scatterplot using different size points given by cex=2*sqrt(fat). (This is called a bubble plot. The area of each bubble is proportional to the value of fat.) Describe any additional trend that is seen by adding the bubbles.

Can you think of any other expected relationships between the variables?

4.2 R basics: data frames and lists

Multivariate data consists of multiple data vectors considered as a whole. We are free to work with our data as separate variables, but there are many advantages to combining them into a single data object. This makes it easier to save our work, is convenient for many functions, and is much more organized. In R these "objects" are usually data frames.

A data frame is used to store rectangular grids of data. Usually each row corresponds to measurements on the same subject, statistical unit, or experimental unit. Each column is a data vector containing data for one of the variables. The entries need not all be of the same type (e.g., numeric, character, or logical), but each column must contain a single type of entry, because each column is a data vector. A rectangular collection of values, all of the same type, may also be stored in a matrix. Although data frames are not necessarily matrices, as their values need not be numbers, they share the same methods of access.
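The distinction between a data frame and a matrix can be illustrated with a small sketch. The variable names and values here (name, weight, smoker) are invented for illustration and do not come from any data set in the text; only base R functions are used.

```r
# Hypothetical measurements on three subjects (illustrative values only)
d <- data.frame(
  name   = c("A", "B", "C"),
  weight = c(150, 135, 210),
  smoker = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep name as character, not factor
)

# Each column is a data vector with its own single type
col_types <- sapply(d, class)

# A matrix stores a rectangular grid of values that all share one type
m <- matrix(1:6, nrow = 3, ncol = 2)

# Both objects can be accessed with [row, column] indexing
d_entry <- d[2, 2]   # second row, second column of the data frame
m_entry <- m[2, 2]   # second row, second column of the matrix
```

Printing col_types should show a different class for each column of d, whereas every entry of m is an integer. The same bracket notation retrieves an entry from either object.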
