10.07.2015 Views

Using R for Introductory Statistics : John Verzani


When n is not large, T will also be of value when the population for the random sample is normally distributed. In this case, the sampling distribution of T is the t-distribution with n−1 degrees of freedom. The t-distribution is a symmetric, bell-shaped distribution that asymptotically approaches the standard normal distribution but, for small n, has fatter tails. The degrees of freedom, n−1, is a parameter for this distribution the way the mean and standard deviation are for the normal distribution. Figure 7.4 shows the results of a simulation of T for n=5. The figures show that T, with 4 degrees of freedom (n−1 for n=5), is long tailed compared to the normal distribution.

Confidence intervals for the mean

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean, $s$ the sample standard deviation, and

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}.$$

If n is small and the population is Normal($\mu$, $\sigma$), then a $(1-\alpha)\cdot 100\%$ confidence interval for $\mu$ is given by

$$\bar{X} \pm t^* \frac{s}{\sqrt{n}},$$

where $t^*$ is related to $\alpha$ through the t-distribution with n−1 degrees of freedom by

$$P(-t^* \le T_{n-1} \le t^*) = 1 - \alpha.$$

For unsummarized data, the function t.test() will compute the confidence intervals. A template for its usage is

    t.test(x, conf.level=0.95)

The data is stored in a data vector (named x above) and the confidence level is specified with conf.level=.

If n is large enough for the central limit theorem to apply to the sampling distribution of T, then a $(1-\alpha)\cdot 100\%$ confidence interval for $\mu$ is given by

$$\bar{X} \pm z^* \frac{s}{\sqrt{n}},$$

where $z^*$ is related to $\alpha$ by

$$P(-z^* \le Z \le z^*) = 1 - \alpha.$$

Finding t* with R

Computing the value of t* (also called $t_{\alpha/2,\,k}$) for a given α, and vice versa, is done in a manner similar to finding z*, except that a different density is used.
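The small-sample interval and the t.test() template described above can be compared directly. The following is a minimal sketch, not the book's own example: the data vector x is simulated here purely for illustration, and the names manual_ci and builtin_ci are hypothetical. It builds the interval from the sample mean, s/√n, and a t* obtained from the quantile function qt(), then reads the same interval from t.test().

```r
# Hypothetical data: ten simulated measurements (an assumption, not from the text)
set.seed(1)
x <- rnorm(10, mean = 50, sd = 5)

n     <- length(x)
alpha <- 0.05
xbar  <- mean(x)              # sample mean
se    <- sd(x) / sqrt(n)      # s / sqrt(n)

# t* leaves alpha/2 in each tail of the t-distribution with n - 1 df
tstar <- qt(1 - alpha / 2, df = n - 1)

# Interval built by hand: xbar +/- t* * s / sqrt(n)
manual_ci <- c(xbar - tstar * se, xbar + tstar * se)

# The same interval as reported by t.test()
builtin_ci <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

print(manual_ci)
print(builtin_ci)
```

The two printed intervals should agree up to floating-point rounding, since both use the same t* and the same standard error.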
As R is a consistent language, changing to a new density requires nothing more than using the proper family name (t for the t-distribution, norm for the normal) and specifying the parameter values. In particular, if n is the sample size, then the two are related as follows:

> tstar = qt(1 - alpha/2, df = n - 1)
> alpha = 2*pt(-tstar, df = n - 1)

By way of contrast, for z* the corresponding commands are
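Returning to the two t-distribution commands: they are inverses of each other, which can be checked numerically. This is a minimal sketch under assumed values (alpha = 0.05 and n = 10 are chosen for illustration and do not come from the text); alpha_back and central are hypothetical names.

```r
# Illustrative values only (assumptions, not from the text)
n     <- 10
alpha <- 0.05

# alpha -> t*: the 1 - alpha/2 quantile of the t-distribution with n - 1 df
tstar <- qt(1 - alpha / 2, df = n - 1)

# t* -> alpha: twice the lower-tail probability below -t*
alpha_back <- 2 * pt(-tstar, df = n - 1)

# Central probability P(-t* <= T <= t*), which should equal 1 - alpha
central <- pt(tstar, df = n - 1) - pt(-tstar, df = n - 1)

print(tstar)
print(alpha_back)
print(central)
```

Because the t-distribution is symmetric about 0, the two tails beyond ±t* each carry α/2, which is why the second command doubles a single tail probability.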
