10.07.2015

Using R for Introductory Statistics : John Verzani




Figure 5.3 Shaded areas can be broken into pieces and manipulated. This illustrates P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a).

For example, the uniform distribution on [0, 1] has density f(x) = 1 on the interval [0, 1] and is 0 otherwise. Let X be a random variable with this density. Then P(X ≤ b) = b if 0 ≤ b ≤ 1, as the specified area is a rectangle with length b and height 1. As well, P(X > b) = 1 − b for the same reason. Clearly, we have P(X ≤ b) = 1 − P(X > b).

The p.d.f. and c.d.f.

For a discrete random variable it is common to define a function f(k) by f(k) = P(X = k). Similarly, for a continuous random variable X, it is common to denote the density of X by f(x). Both usages are called p.d.f.'s. For the discrete case, p.d.f. stands for probability distribution function, and for the continuous case, probability density function. The cumulative distribution function, c.d.f., is F(b) = P(X ≤ b). In the discrete case this is given by ∑_{k≤b} P(X = k), and in the continuous case it is the area to the left of b under the density f(x).

The mean and standard deviation of a continuous random variable

The concepts of the mean and standard deviation apply for continuous random variables, although their definitions require calculus. The intuitive notion for the mean of X is that it is the balancing point for the density of X. The notation µ or E(X) is used for the mean, and σ or SD(X) is used for the standard deviation.

If X has a uniform distribution on [0, 1], then the mean is 1/2. This is clearly the balancing point of the graph of the density, which is constant on the interval. The variance can be calculated to be 1/12, so σ = √(1/12) is about 0.289.

Quantiles of a continuous random variable

The quantiles of a data set roughly split the data by proportions. Let X be a continuous random variable with positive density. Referring to Figure 5.2, we see that for any given area between 0 and 1 there is a b for which the area to the left of b under f is the desired amount. That is, for each p in [0, 1] there is a b such that P(X ≤ b) = p. This defines the p-quantile or 100·p percentile of X. The quantile function is inverse to the c.d.f., as it returns the x value for a given area, whereas the c.d.f. returns the area for a given x value.
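
The uniform example used throughout this section can be checked numerically in R. The sketch below is only an illustration: it relies on the base R functions dunif(), punif(), qunif(), and integrate(), and the values chosen for a, b, and p are arbitrary assumptions, not values taken from the text.

```r
# Uniform distribution on [0, 1]; a, b, and p are arbitrary illustrative values.
a <- 0.1
b <- 0.3
p <- 0.75

# c.d.f.: P(X <= b) is the area to the left of b, which equals b here.
p_le <- punif(b, min = 0, max = 1)

# Complement: P(X > b) = 1 - P(X <= b).
p_gt <- punif(b, min = 0, max = 1, lower.tail = FALSE)

# Breaking an area into pieces: P(a < X <= b) = P(X <= b) - P(X <= a).
p_between <- punif(b) - punif(a)

# Mean and variance by numerically integrating against the density f(x) = 1.
mu     <- integrate(function(x) x * dunif(x), lower = 0, upper = 1)$value
sigma2 <- integrate(function(x) (x - mu)^2 * dunif(x), lower = 0, upper = 1)$value
sigma  <- sqrt(sigma2)

# The quantile function undoes the c.d.f.: punif(qunif(p)) returns p.
q          <- qunif(p)
round_trip <- punif(q)

print(c(p_le = p_le, p_gt = p_gt, p_between = p_between))
print(c(mean = mu, variance = sigma2, sd = sigma))
print(c(quantile = q, round_trip = round_trip))
```

Numerical integration stands in here for the calculus that the definitions of the mean and variance require; for other distributions the same pattern should carry over by swapping in the matching d, p, and q functions (for example dnorm(), pnorm(), and qnorm()) and suitable integration limits.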
