10.07.2015 Views

Using R for Introductory Statistics : John Verzani



Goodness of fit

theoretical densities and cumulative distribution functions are added to the existing plot (here with curve() and add=TRUE). The following commands produced Figure 9.2:

  > y = rnorm(20)
  > plot(density(y), main="Densities")   # densities
  > curve(dnorm(x), add=TRUE, lty=2)
  > plot(ecdf(y), main="C.d.f.s")        # c.d.f.s
  > curve(pnorm(x), add=TRUE, lty=2)

If the data is from the population with c.d.f. $F$, then we would expect that $F_n$ is close to $F$ in some way. But what does "close" mean? In this context, we have two different functions of $x$. Define the distance between them as the largest difference they have:

$$D = \max_x |F_n(x) - F(x)|.$$

The surprising thing is that with only the assumption that $F$ is continuous, $D$ has a known sampling distribution called the Kolmogorov-Smirnov distribution. This is illustrated in Figure 9.3, where the sampling distribution of the statistic for $n=25$ is simulated for several families of random data. In each case, we see the same distribution. This fact allows us to construct a significance test using the test statistic $D$. In addition, a similar test can be done to compare two independent samples.

The Kolmogorov-Smirnov goodness-of-fit test

Assume $X_1, X_2, \ldots, X_n$ is an i.i.d. sample from a continuous distribution with c.d.f. $F(x)$. Let $F_n(x)$ be the empirical c.d.f. A significance test of

$$H_0: F(x) = F_0(x), \quad H_A: F(x) \neq F_0(x)$$

Figure 9.3 Density estimates for sampling distribution of the Kolmogorov-Smirnov statistic
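The distance D defined above can be computed by hand and checked against R's built-in ks.test(). The following is a minimal sketch, not the code used for Figure 9.3: the seed, the number of replications, and the helper names ks_stat() and sim_D() are assumptions made for illustration. It also simulates D for n=25 from two different families to show that the sampling distributions look alike.

```r
# Illustrative sketch only: ks_stat() and sim_D() are hypothetical helper names.
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 25

# D = max over x of |F_n(x) - F(x)|. Because F_n is a step function that
# jumps by 1/n at each data point, the largest gap occurs at a data point,
# either just after the jump (i/n) or just before it ((i-1)/n).
ks_stat <- function(x, cdf) {
  x  <- sort(x)
  m  <- length(x)
  F0 <- cdf(x)
  max(pmax((1:m) / m - F0, F0 - (0:(m - 1)) / m))
}

# Compare the hand computation with ks.test() on one normal sample.
y      <- rnorm(n)
D_hand <- ks_stat(y, pnorm)
D_ks   <- unname(ks.test(y, "pnorm")$statistic)
all.equal(D_hand, D_ks)

# Simulate the sampling distribution of D for two different families.
sim_D  <- function(reps, rdist, pdist) replicate(reps, ks_stat(rdist(n), pdist))
D_norm <- sim_D(2000, rnorm, pnorm)
D_exp  <- sim_D(2000, rexp,  pexp)

# The upper quantiles of the two simulated distributions should be similar.
q_norm <- quantile(D_norm, 0.95)
q_exp  <- quantile(D_exp,  0.95)
```

Overlaying plot(density(D_norm)) and lines(density(D_exp), lty=2) gives a picture in the spirit of Figure 9.3, though the families and replication count here are chosen arbitrarily.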
