10.07.2015 Views

Using R for Introductory Statistics : John Verzani

Figure 2.17 Amount of equity in vehicles

2.3.3 Boxplots

A histogram with a density is a good plot for visually finding the center, the spread, the tails, and the shape of a distribution. However, it doesn’t work so well to compare distributions, as histograms are hard to read when overlapped and take up too much space when stacked on top of each other. We will use layered densityplots in the sequel instead. But this, too, works well only for a handful of data sets at once. A clever diagram for presenting just enough information to see the center, spread, skew, and length of tails in a data set is the boxplot or box-and-whisker plot. This graphic allows us to compare many distributions in one figure.

A boxplot graphically displays the five-number summary, which contains the minimum, the lower hinge, the median, the upper hinge, and the maximum. (The hinges give essentially the same information as the quartiles.) The choice of hinges over the quartiles was made by John Tukey, who invented the boxplot.

To show spread, a box is drawn with side length stretching between the two hinges. This length is basically the IQR. The center is illustrated by marking the median with a line through the box. The range is shown with whiskers. In the simplest case, these are drawn extending from the box to the minimum and maximum data values. Another convention is to make the length of the whiskers no longer than 1.5 times the length of the box. Data values that aren’t contained in this range are marked separately with points.

Symmetry of the distribution is reflected in symmetry of the boxplot in both the location of the median within the box and the lengths of the two whiskers.

■ Example 2.10: All-time gross movie sales Figure 2.18 shows a boxplot of the Gross variable in the data set alltime.movies (UsingR). This records the gross domestic (U.S.) ticket sales for the top 79 movies of all time. The minimum, lower hinge, median, upper hinge, and maximum are marked. In addition, the upper whisker extends from the upper hinge to the largest data point that is less than 1.5 times the H-spread plus the upper hinge. Points larger than this are marked separately, including the one corresponding to the maximum. This boxplot shows a data set that is skewed right. It has a long right tail and short left tail.
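
The whisker rule described above can be checked by hand. The following is a minimal sketch, not code from the text: it uses a small made-up, right-skewed sample rather than the alltime.movies data, and the variable names are chosen only for illustration. It relies on the base R functions fivenum(), boxplot.stats(), and boxplot().

```r
# Made-up, right-skewed sample (an assumption for illustration;
# this is not the alltime.movies data from the UsingR package).
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 9, 12, 30)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
lower_hinge <- five[2]
upper_hinge <- five[4]

# The H-spread is the distance between the hinges (the length of the box).
h_spread <- upper_hinge - lower_hinge

# Whiskers may reach at most 1.5 box lengths beyond each hinge.
lower_limit <- lower_hinge - 1.5 * h_spread
upper_limit <- upper_hinge + 1.5 * h_spread

# Each whisker ends at the most extreme data point inside its limit.
lower_whisker <- min(x[x >= lower_limit])
upper_whisker <- max(x[x <= upper_limit])

# Points beyond the limits are the ones marked separately.
outliers <- x[x < lower_limit | x > upper_limit]

# boxplot.stats() returns the same whisker ends and outlying points.
stats <- boxplot.stats(x)

# Draw the boxplot; horizontal = TRUE lays the box out left to right.
boxplot(x, horizontal = TRUE, main = "Made-up right-skewed sample")
```

For this sample the hinges are 3.5 and 8, so the upper whisker stops at 12 and the value 30 is drawn as a separate point. The long upper whisker and the isolated point on the right are what a right-skewed data set looks like in a boxplot.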
