10.07.2015 Views

Using R for Introductory Statistics : John Verzani


    > res.5 = c(); res.05 = c()
    > for(i in 1:500) {
    +   res.5[i] = first.success(0.5)
    +   res.05[i] = first.success(0.05)
    + }
    > summary(res.5)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       1.00    1.00    1.00    2.01    2.00   11.00
    > summary(res.05)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        1.0     6.0    13.0    20.1    28.0   120.0

From the output of summary() it appears that the sampling distributions have means of about 2 = 1/0.5 and 20 = 1/0.05, respectively. For any p in (0, 1] the mean of the geometric distribution is 1/p.

6.6 Bootstrap samples

The basic idea of a bootstrap sample is to sample with replacement from the data, thereby creating a new random sample of the same size as the original. For this random sample the value of the statistic is computed. Call this a replicate. This process is repeated to get the sampling distribution of the replicates. From this, inferences are made about the unknown parameters.

For example, we can estimate µ with the bootstrap. Let the replicate be the sample mean of the ith bootstrap sample. We estimate µ with the sample mean of these replicates. In doing so, we get an estimate for the population parameter and a sense of the variation in the estimate.

■ Example 6.3: Albatross bycatch

The bycatch (UsingR) data set † contains the number of albatross incidentally caught by squid fishers for 897 hauls of fishing nets, as measured by an observer program.

We wish to investigate the number of albatross caught. We can summarize this with the sample mean, but to get an idea of the underlying distribution of the sample mean, we generate 1,000 bootstrap samples and look at their means.

First, the data in bycatch (UsingR) is summarized for compactness. We expand it to include all 897 hauls.

    > data(bycatch)
    > hauls = with(bycatch, rep(no.albatross, no.hauls))
    > n = length(hauls)

Now n is 897, and hauls is a data vector containing the number of albatross caught on each of the 897 hauls. A histogram shows a skewed distribution. Usually, none are caught, but occasionally many are. As the data is skewed, we know the sample mean can be a poor predictor of the center. So we create 1,000 bootstrap samples as follows, using sample().
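A minimal sketch of this kind of resampling, not the book's own listing, might look like the following. It assumes a small made-up vector, hauls.demo, standing in for the real haul counts (the values are illustrative only and are not taken from bycatch), and the names hauls.demo, boot.sample, and xbarstar are hypothetical. The seed is fixed only so that a run can be repeated.

```r
# Hypothetical stand-in for the expanded haul counts. These values are
# made up for illustration and are NOT the bycatch data.
set.seed(1)   # assumption: fixed seed so the run is repeatable
hauls.demo = rep(c(0, 1, 2, 5), times = c(40, 6, 3, 1))
n = length(hauls.demo)

# Draw 1,000 bootstrap samples, each of size n and taken with
# replacement, and store the sample mean of each one as a replicate.
xbarstar = numeric(1000)
for (i in 1:1000) {
  boot.sample = sample(hauls.demo, n, replace = TRUE)
  xbarstar[i] = mean(boot.sample)
}

mean(xbarstar)   # bootstrap estimate of the population mean
sd(xbarstar)     # spread of the replicates around that estimate
```

The key argument is replace = TRUE: without it, sample() would only permute the data and every replicate would equal the original sample mean. With the real data, hauls would take the place of hauls.demo.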
