10.07.2015 Views

Using R for Introductory Statistics : John Verzani



Figure 4.1 Taxi-in-and-out times at Newark Liberty airport

From the boxplots we see that the second airline (Continental) appears to be the worst, as the minimum, maximum, and median amount of time are all relatively large. However, the fourth (America West) has the largest median.

The boxplot() function will make a boxplot for each data vector it is called with. This is straightforward to use but has many limitations. It is tedious and prone to errors, as much needs to be typed. As well, adding names to the boxplots must be done separately. More importantly, it is a chore to do other things with the data. For example, the variable inorout indicates whether the time is for taxi in or taxi out. Taxi-in times should all be about the same, as airplanes usually land and go to their assigned gate with little delay. Taxi-out times are more likely to vary, as the queue to take off varies in length depending on the time of day. In the next section we see how to manipulate data to view this difference.

4.1.3 Comparing relationships

Scatterplots are used to investigate relationships between two variables. They can also be used when there are more than two variables. We can make multiple scatterplots, or plot multiple relationships on the same scatterplot using different plot characters or colors to distinguish the variables.

■ Example 4.3: Birth variables The data set babies (UsingR) contains several variables collected from new mothers as part of a study on child health and development. The variables include gestation period, maternal age, whether and how much the mother smokes, and other factors, such as mother's level of education. In all, there are 23 variables.

R has a built-in function for creating scatterplots of all possible pairs of variables. This graphic is called a scatterplot matrix and is made with the pairs() function, as in pairs(babies).
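
To get a feel for how quickly a scatterplot matrix grows, here is a minimal sketch. It uses the small built-in data frame trees as a stand-in, not the babies data set, and the helper panel_count() is a hypothetical name introduced only for this illustration. It relies on the fact that pairs() draws one panel for each ordered pair of distinct variables, so p variables give p(p - 1) scatterplots.

```r
# Stand-in data: the built-in `trees` data frame (Girth, Height, Volume).
# This is NOT the babies data set from the text; it is used only because
# it ships with R and has just three numeric variables.

# Draw to a null PDF device so the sketch also runs non-interactively.
pdf(NULL)
pairs(trees)
invisible(dev.off())

# Hypothetical helper: number of off-diagonal panels in a scatterplot
# matrix of p variables (one for each ordered pair of distinct variables).
panel_count <- function(p) p * (p - 1)

panel_count(ncol(trees))  # 3 variables  -> 6 panels
panel_count(23)           # 23 variables -> 506 panels
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped, and pairs(trees) will draw directly to the screen.
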
For the babies data set this command will create over 500 graphs, as there are so many variables. We hold off on using pairs() until we see how to extract subsets of the variables in a data frame.

We can still explore the data with scatterplots, using different colors or plotting characters to mark the points based on information from other factors. In this way, we can see more than two variables at once. For example, the plot of gestation versus weight
