10.07.2015 Views

Using R for Introductory Statistics : John Verzani



2.19 The built-in data set islands contains the size of the world's land masses that exceed 10,000 square miles. Make a stem-and-leaf plot, then compare the mean, median, and 25% trimmed mean. Are they similar?

2.20 The data set OBP (UsingR) contains the on-base percentages for the 2002 major league baseball season. The value labeled bondsba01 contains this value for Barry Bonds. What is his z-score?

2.21 For the rivers data set, use the scale() function to find the z-scores. Verify that the z-scores have sample mean 0 and sample standard deviation 1.

2.22 The median absolute deviation is defined as

$$\operatorname{mad}(x) = 1.4826 \cdot \operatorname{median}\left(|x_i - \operatorname{median}(x)|\right). \tag{2.5}$$

This is a resistant measure of spread and is implemented in the mad() function. Explain in words what it measures. Compare the values of the sample standard deviation, IQR, and median absolute deviation for the exec.pay (UsingR) data set.

2.23 The data set npdb (UsingR) contains malpractice-award information. The variable amount is the size of malpractice awards in dollars. Find the mean and median award amount. What percentile is the mean? Can you explain why this might be the case?

2.24 The data set cabinet (UsingR) contains information on the amount each member of President George W. Bush's cabinet saved due to the passing of a tax bill in 2003. This information is stored in the variable est.tax.savings. Compare the median and the mean. Explain the difference.

2.25 We may prefer the standard deviation over the variance as a measure of spread, because its units are the same as those of the mean. Some disciplines, such as ecology, prefer a unitless measurement of spread. The coefficient of variation is defined as the standard deviation divided by the mean.

One advantage is that the coefficient of variation matches our intuition of spread. For example, the numbers 1, 2, 3, 4 and 1001, 1002, 1003, 1004 have the same standard deviation but very different coefficients of variation. Somehow, we mentally think of the latter set of numbers as closer together.

For the rivers and pi2000 (UsingR) data sets, find the coefficient of variation.

2.26 A lag plot of a data vector plots successive values of the data against each other. By using a lag plot, we can tell whether future values depend on previous values: if not, the graph is scattered; if so, there is often a pattern.

Making a lag plot (with lag 1) is quickly done with the indexing notation of negative numbers. For example, these commands produce a lag plot‡ of x:

> n = length(x)
> plot(x[-n], x[-1])

(The plot() function plots pairs of points when called with two data vectors.)

‡ This is better implemented in the lag.plot() function from the ts package.

Look at the lag plots of the following data sets:

1. x=rnorm(100) (random data)
2. x=sin(1:100) (structured data, but see plot(x))
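To see why the negative-index commands in exercise 2.26 give a lag-1 plot, it can help to look at the pairs they produce on a tiny vector. The following is only an illustrative sketch: the toy values and the names earlier, later, and lagged are assumptions made for this example and do not come from the text.

```r
# Toy data (hypothetical values, chosen only to make the pairing visible)
x <- c(10, 20, 30, 40, 50)
n <- length(x)

earlier <- x[-n]  # every value except the last
later   <- x[-1]  # every value except the first

# Each row pairs a value x[i] with its successor x[i + 1]
lagged <- cbind(earlier, later)
print(lagged)

# Plotting the two columns against each other is the lag-1 plot
plot(earlier, later, xlab = "x[i]", ylab = "x[i + 1]")
```

Both vectors have length n - 1, so plot() receives matching pairs. The same pairing can also be written as head(x, -1) and tail(x, -1), which some readers find easier to read than negative indices.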
