Using R for Introductory Statistics : John Verzani

5.1.3 Sampling from a population

Our probability model for a data point is that it is an observation of a random variable whose distribution describes the parent population. To perform statistical inference about a parent population, we desire a sample from the population, that is, a sequence of random variables $X_1, X_2, \ldots, X_n$. A sequence is identically distributed if each random variable has the same distribution. A sequence is independent if knowing the values of some of the random variables gives no additional information about the distribution of the others. A sequence that is both independent and identically distributed is called an i.i.d. sequence, or a random sample.

Toss a coin n times. If we let $X_i$ be 1 for heads on the ith coin toss and 0 otherwise, then clearly $X_1, X_2, \ldots, X_n$ is an i.i.d. sequence. For the spinner analogy of generating discrete random variables, the different numbers will be i.i.d. if the spinner is spun so hard each time that it forgets where it started and is equally likely to stop at any angle.

If we get our random numbers by randomly selecting from a finite population, then the values will be independent if the sampling is done with replacement. This might seem counterintuitive: because a member can be selected more than once, the values seem dependent. However, knowing a previous observation does not change the distribution of a future observation.

Random samples generated by sample()

The sample() function takes a sample of size n from a discrete distribution when we specify size=n. The sample is drawn with replacement if we specify replace=TRUE, which is important if we want to produce an i.i.d. sample. The default is to sample without replacement.

## toss a coin 10 times. Heads=1, tails=0
> sample(0:1, size=10, replace=TRUE)
[1] 0 0 1 1 1 1 1 0 1 0
> sample(1:6, size=10, replace=TRUE)   ## roll a die 10 times
[1] 1 4 2 2 2 1 4 6 4 4
## sum of two dice, rolled 10 times
> sample(1:6, size=10, replace=TRUE) + sample(1:6, size=10, replace=TRUE)
[1] 7 7 7 9 12 4 7 9 5 4

■ Example 5.4: Public-opinion polls as random samples

The goal of a public-opinion poll is to find the proportion of a target population that shares a given attitude. This is achieved by selecting a sample from the target population and finding the proportion of the sample who have the given attitude. A public-opinion poll can be thought of as a random sample from a target population if each person polled is randomly chosen from the entire population with replacement. Assume we know that the target population of 10,000 people has 6,200 who would answer "yes" to our survey question. Coding "yes" as 1 and "no" as 0, a sample of size 10 could be generated by

> sample(rep(0:1, c(3800, 6200)), size=10, replace=TRUE)
[1] 1 0 1 0 1 1 1 1 1 0
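As a sketch that extends the poll example, we can compute the sample proportion from each simulated poll and repeat the poll many times. The code assumes the same target population of 10,000 people with 6,200 "yes" answers coded as 1. The helper name `poll_proportion` and the choice of 1,000 repetitions are hypothetical, chosen only for illustration. Because each poll is drawn with replacement, the responses are i.i.d., so the average of the sample proportions should land near the population proportion 0.62.

```r
# Hypothetical target population: 3,800 "no" (0) and 6,200 "yes" (1)
population <- rep(0:1, c(3800, 6200))

# Hypothetical helper: the sample proportion of "yes" answers in one poll
# of size n, drawn with replacement so the responses are i.i.d.
poll_proportion <- function(n) {
  responses <- sample(population, size = n, replace = TRUE)
  mean(responses)   # the mean of 0/1 values is the proportion of 1s
}

set.seed(1)                                      # reproducible random stream
p_hat  <- poll_proportion(10)                    # one poll of 10 people
p_hats <- replicate(1000, poll_proportion(10))   # 1,000 independent polls
mean(p_hats)                                     # should be close to 0.62
```

A single poll of 10 can only report a proportion in steps of 0.1, and it can miss 0.62 badly. The spread of `p_hats` shows how much a poll of this size varies from sample to sample.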

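The claim that sampling with replacement from a finite population gives independent values can be checked by simulation, and the same check shows that sampling without replacement (the sample() default) does not. In this sketch, the tiny population of five members (two coded 1, three coded 0), the helper `second_given_first`, and the number of repetitions are hypothetical choices made for illustration. The code estimates the probability that the second draw is 1, given that the first draw was 1.

```r
# Hypothetical finite population: two members coded 1, three coded 0
pop <- c(1, 1, 0, 0, 0)

# Hypothetical helper: draw two values `reps` times and estimate
# P(second draw = 1 | first draw = 1)
second_given_first <- function(replace, reps = 20000) {
  draws <- replicate(reps, sample(pop, size = 2, replace = replace))
  # replicate() returns a 2 x reps matrix: row 1 = first draw, row 2 = second
  first_is_one <- draws[1, ] == 1
  mean(draws[2, first_is_one] == 1)
}

set.seed(2)
with_repl    <- second_given_first(replace = TRUE)   # near 2/5 = 0.40
without_repl <- second_given_first(replace = FALSE)  # near 1/4 = 0.25
```

With replacement, knowing that the first draw was 1 leaves the second draw's probability at 2/5, which equals its unconditional probability. Without replacement, the probability drops to 1/4, so the two draws are dependent.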