Using R for Introductory Statistics : John Verzani

Describing populations

5.1.1 Discrete random variables

Numeric data can be discrete or continuous. As such, our model for data comes in the same two flavors.

Let X be a discrete random variable. The range of X is the set of all k where P(X=k)>0. The distribution of X is a specification of these probabilities. Distributions are not arbitrary: for each k in the range, P(X=k)>0 and P(X=k)≤1. Furthermore, because X must take some value, we have ∑_k P(X=k)=1.

Here are a few examples for which the distribution can be calculated.

■ Example 5.1: Number of heads in two coin tosses
If a coin is tossed two times, we can keep track of the outcome as a pair. (H,T), for example, denotes “heads” then “tails.” The set {(H,H), (H,T), (T,H), (T,T)} contains all possible outcomes. If X is the number of heads, then X is either 0, 1, or 2. Intuitively, we know that for a fair coin all four outcomes have the same probability, 1/4. Two of them, (H,T) and (T,H), have exactly one head, so P(X=0)=1/4, P(X=1)=1/2, and P(X=2)=1/4.

■ Example 5.2: Picking balls from a bag
Imagine a bag with N balls, of which R are red and N−R are green. We pick a ball, note its color, replace the ball, and repeat. Let X be the number of red balls. As in the previous example, X is 0, 1, or 2. The probability that X=2 is intuitively (R/N)·(R/N), as R/N is the probability of picking a red ball on any one pick. By the same reasoning, the probability that X=0 is ((N−R)/N)². Because all probabilities add to 1, P(X=1)=2(R/N)((N−R)/N). This specifies the distribution of X. The binomial distribution describes the result of selecting n balls, not just two.

The intuition that leads us to multiply two probabilities together comes from the two events being independent. Two events are independent if knowing that one occurs doesn't change the probability of the other occurring. Two events are disjoint if they can't both occur for a given outcome.
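Example 5.2 can be checked numerically. The sketch below uses hypothetical values N = 10 and R = 3; any 0 ≤ R ≤ N would do. It builds the distribution from the independence argument and then compares the result with base R's dbinom(), which gives binomial probabilities, here with size = 2. Example 5.1 is the special case where the chance of "success" on each trial is 1/2.

```r
# Example 5.2 sketch: two draws with replacement from a bag of N balls, R red.
# N = 10 and R = 3 are hypothetical values chosen only for illustration.
N <- 10
R <- 3
red   <- R / N          # probability of red on any one pick
green <- (N - R) / N    # probability of green on any one pick

# Independence lets us multiply probabilities across picks; P(X = 1) is
# the sum over the two disjoint orders (red, green) and (green, red).
p <- c(green^2, 2 * red * green, red^2)
names(p) <- 0:2
p
sum(p)                  # the probabilities add to 1

# The same numbers from the binomial distribution with size = 2
dbinom(0:2, size = 2, prob = red)

# Example 5.1 (fair coin) corresponds to prob = 1/2
dbinom(0:2, size = 2, prob = 1/2)
```

Changing N and R changes only red and green. The agreement with dbinom() holds for any choice.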
Probabilities add for disjoint events: if A and B can't both occur, then P(A or B)=P(A)+P(B).

■ Example 5.3: Specifying a distribution
We can specify the distribution of a discrete random variable by first specifying the range of values and then assigning to each k a number p_k=P(X=k) such that ∑p_k=1 and p_k≥0. To picture a physical model that realizes this distribution, imagine making a pie chart with areas proportional to p_k, placing a spinner in the middle, and spinning. The ending position determines the value of the random variable.

Figure 5.1 shows a spike plot of a distribution and a spinner model to realize values of X. A spike plot shows the probabilities for each value in the range of X as spikes, emphasizing the discreteness of the distribution. The spike plot is made with the following commands:

> k = 0:4
> p=c(1,2,3,2,1)/9
> plot(k,p,type="h",xlab="k",ylab="probability",ylim=c(0,max(p)))
> points(k,p,pch=16,cex=2) # add the balls to top of spike

The argument type="h" plots the vertical lines of the spike plot.
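The spinner model can be imitated in software. In this minimal sketch, each spin is an independent draw of a value k with probability p_k. Base R's sample() performs such draws when called with replace = TRUE and a prob argument. The seed and the number of spins n are arbitrary choices that do not come from the text.

```r
# Simulated spinner for Example 5.3: each spin lands on k with P(X = k) = p_k.
# The seed and n = 10000 spins are arbitrary illustrative choices.
k <- 0:4
p <- c(1, 2, 3, 2, 1) / 9

set.seed(1)                      # make the simulated spins reproducible
n <- 10000
spins <- sample(k, size = n, replace = TRUE, prob = p)

# Relative frequency of each value; factor() keeps values that never
# occurred so the result lines up with k
observed <- as.vector(table(factor(spins, levels = k))) / n
round(rbind(k = k, theoretical = p, observed = observed), 3)
```

With many spins, the observed row should sit close to the theoretical probabilities. Reducing n shows more scatter around p_k.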