10.07.2015 Views

Using R for Introductory Statistics : John Verzani


Chapter 6
Simulation

One informal description of insanity is “repeating the same action while expecting a different result.” By this notion, the act of simulating a distribution could be considered somewhat insane, as it involves repeatedly sampling from a distribution and investigating the differences in the results. But simulating a distribution is far from insane. Simulating a distribution can give us great insight into the distribution’s shape, its tails, its mean and variance, etc. We’ll use simulation to justify the size of n needed in the central limit theorem for approximate normality of the sample mean. Simulation is useful with such specific questions, as well as with those of a more exploratory nature.

In this chapter, we will develop two new computer skills. First, for loops will be introduced. These are used to repeat something again and again, such as sampling from a distribution. Then we will see how to define simple functions in R. Defining functions not only makes for less typing; it also organizes your work and train of thought. This is indispensable when you approach larger problems.

6.1 The normal approximation for the binomial

We begin with a simulation to see how big n should be for the binomial distribution to be approximated by the normal distribution. Although we know explicitly the distribution of the binomial, we approach this problem by taking a random sample from this distribution to illustrate the approach of simulation.

To perform the simulation, we will take m samples from the binomial distribution for some n and p. We should take m to be some large number, so that we get a good idea of the underlying population the sample comes from. We will then compare our sample to the normal distribution with µ = np and σ² = np(1−p). If the sample appears to come from this distribution, we will say the approximation is valid.

Let p = 1/2. We can use the rbinom() function to generate the sample of size m. We try n = 5, 15, and 25. In Figure 6.1 we look at the samples with histograms that are overlaid with the corresponding normal distribution.

> m = 200; p = 1/2;
> n = 5
> res = rbinom(m,n,p)                               # store results
> hist(res, prob=TRUE, main="n = 5")                # don’t forget prob=TRUE
> curve(dnorm(x, n*p, sqrt(n*p*(1-p))), add=TRUE)   # add density
### repeat last 3 commands with n=15, n=25
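The repetition over n = 5, 15, and 25 can also be written as a for loop. The following is a minimal sketch, not the book's own listing: it skips the histograms and instead tabulates each sample's mean and variance next to the normal parameters µ = np and σ² = np(1−p). The names sizes and summary_table and the fixed seed are assumptions introduced only for this illustration.

```r
set.seed(1)              # assumption: fixed seed, only so that a rerun gives the same numbers
m = 200; p = 1/2         # values used in the text
sizes = c(5, 15, 25)     # hypothetical name for the values of n that are tried

# one row per n: sample statistics beside the parameters of the approximating normal
summary_table = data.frame(n = sizes,
                           sample_mean = NA_real_, mu = NA_real_,
                           sample_var = NA_real_, sigma2 = NA_real_)

for (i in seq_along(sizes)) {
  n = sizes[i]
  res = rbinom(m, n, p)                        # m draws from binomial(n, p)
  summary_table$sample_mean[i] = mean(res)
  summary_table$mu[i] = n * p                  # mean of the approximating normal
  summary_table$sample_var[i] = var(res)
  summary_table$sigma2[i] = n * p * (1 - p)    # variance of the approximating normal
}

print(summary_table)
```

Agreement between the sample and theoretical columns only shows that the first two moments match. Judging the shape still requires a plot, for which the hist() and curve() commands can be placed inside the same loop.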
