10.07.2015

Using R for Introductory Statistics : John Verzani

Multivariate data 121

We can apply functions to lists as well as matrices with the lapply() function or its user-friendly version, sapply(). Either will apply a function to each top-level component of a list or to each entry of a vector. The lapply() function will return a list, whereas sapply() will simplify the results into a vector or matrix when appropriate.

For example, since a data frame is also a list, the median of each variable above could have been found with

    > sapply(df,median)
       AA    CO    DL    HP    NW    TW    UA    US
    16.05 18.15 15.50 18.95 14.55 15.65 16.45 14.45

(Compare this to the output of lapply(df, median).)

4.2.5 Problems

4.7 Use the data set mtcars.
1. Sort the data set by weight, heaviest first.
2. Which car gets the best mileage (largest mpg)? Which gets the worst?
3. The cars in rows c(1:3, 8:14, 18:21, 26:28, 30:32) were imported into the United States. Compare the variable mpg for imported and domestic cars using a boxplot. Is there a difference?
4. Make a scatterplot of weight, wt, versus miles per gallon, mpg. Label the points according to the number of cylinders, cyl. Describe any trends.

4.8 The data set cfb (UsingR) contains consumer finance data for 1,000 consumers. Create a data frame consisting of just those consumers with positive INCOME and negative NETWORTH. What is its size?

4.9 The data set hall.fame (UsingR) contains numerous baseball statistics, including Hall of Fame status, for 1,034 players.
1. Make a histogram of the number of home runs hit (HR).
2. Extract a data frame containing at bats (AB), hits (hits), home runs (HR), and runs batted in (RBI) for all players who are in the Hall of Fame. (The latter can be found with Hall.Fame.Membership!="not a member".) Save the data into the data frame hf.
3. For the new data frame, hf, make four boxplots using the command:

       boxplot(lapply(hf,scale))

   (The scale() function allows all four variables to be compared easily.) Which of the four variables has the most skew?

Use matrix notation, list notation, or the subset() function to do the above.
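The sketch below illustrates the lapply()/sapply() comparison suggested in the text. It uses a small made-up data frame, `fares`, whose name and values are hypothetical. The df used in the text is built earlier in the book and is not reproduced here. Because a data frame is a list of its columns, both functions visit each column. The sketch also shows two further cases: sapply() producing a matrix when the function returns more than one value, and lapply() paired with scale(), as in problem 4.9.

```r
# Hypothetical data frame (made-up values); each column is one variable
fares <- data.frame(
  AA = c(15.2, 16.9, 16.05),
  CO = c(18.0, 18.3, 17.5),
  UA = c(16.45, 17.1, 15.9)
)

# lapply() always returns a list, one component per column
med_list <- lapply(fares, median)
str(med_list)

# sapply() simplifies the same results into a named numeric vector
med_vec <- sapply(fares, median)
print(med_vec)

# When each call returns several values, sapply() builds a matrix:
# here 2 rows (min, max) by 3 columns (one per variable)
rng <- sapply(fares, range)
print(rng)

# scale() returns a whole standardized column, so lapply() keeps the
# results as a list of standardized variables
z <- lapply(fares, scale)
```

The list `z` can be handed directly to boxplot(), as in boxplot(z), to draw one box per standardized variable. This is the same pattern as boxplot(lapply(hf,scale)).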
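The closing instruction names three ways of extracting rows from a data frame. The sketch below shows one reading of each, applied to a hypothetical data frame `consumers` with made-up columns `income` and `networth`. This is not the cfb data, which requires the UsingR package. The row condition mirrors the one in problem 4.8, and all three forms should select the same rows.

```r
# Hypothetical consumer data (made-up values, for illustration only)
consumers <- data.frame(
  income   = c(52000, 0, 31000, 78000, 12000),
  networth = c(-4000, 15000, -250, 120000, -900)
)

# 1. Matrix notation: [row, column] indexing for both the test and the extraction
m1 <- consumers[consumers[, "income"] > 0 & consumers[, "networth"] < 0, ]

# 2. List notation: pull columns out with $, then index the rows
keep <- consumers$income > 0 & consumers$networth < 0
m2 <- consumers[keep, ]

# 3. subset(): variable names are looked up inside the data frame itself
m3 <- subset(consumers, income > 0 & networth < 0)

# Size of the extracted data frame (rows, columns)
dim(m3)
```

The subset() form is the most compact. It also silently drops rows where the condition is NA, which can differ from plain logical indexing when data are missing.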