Using R for Introductory Statistics : John Verzani

Goodness of fit

This gives the total discrepancy between the observed and the expected. We use the square, as ∑(Y_i − n p_i) = 0. This sum gets larger when a category is larger or smaller than expected: a larger-than-expected value contributes, and any correspondingly smaller-than-expected values do, too. As usual, we scale this by the right amount to yield a test statistic with a known distribution. In this case, each term is divided by the expected amount, producing Pearson's chi-squared statistic (written using the Greek letter chi):

    \chi^2 = \sum_{i=1}^{k} \frac{(Y_i - n p_i)^2}{n p_i}    (9.1)

[Figure 9.1: Simulation of the χ² statistic with n=20 and probabilities 3/12, 4/12, and 5/12. The chi-squared density with 2 degrees of freedom is added.]

If the multinomial model is correct, then the asymptotic distribution of the χ² statistic is known to be the chi-squared distribution with k−1 degrees of freedom. The number of degrees of freedom coincides with the number of free ways we can specify the values for p_i in the null hypothesis: we are free to choose k−1 of the values, but not all k, as the values must sum to 1. The chi-squared distribution is a good fit if the expected cell counts are all five or more. Figure 9.1 shows a histogram of simulated values of the χ² statistic, along with the theoretical density.

Using this statistic as a test statistic allows us to construct a significance test. Larger values are now considered more extreme, as they imply more discrepancy from the predicted amount.
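A simulation along the lines of Figure 9.1 can be sketched as follows. The caption fixes n = 20 and probabilities 3/12, 4/12, and 5/12; the number of replications (1,000 here) is our own choice, not stated in the text.

```r
## Simulate Pearson's chi-squared statistic for multinomial data with
## n = 20 and p = (3/12, 4/12, 5/12), then compare the histogram with
## the chi-squared density with k - 1 = 2 degrees of freedom.
p <- c(3, 4, 5) / 12
n <- 20
chi2 <- replicate(1000, {
  y <- rmultinom(1, size = n, prob = p)   # observed counts Y_i
  sum((y - n * p)^2 / (n * p))            # Pearson's statistic, Eq. (9.1)
})
hist(chi2, prob = TRUE, breaks = 30,
     main = expression(chi^2 ~ "statistic, n = 20"))
curve(dchisq(x, df = length(p) - 1), add = TRUE)
```

Since each term (Y_i − n p_i)²/(n p_i) has expected value 1 − p_i, the simulated values should average close to k − 1 = 2, matching the mean of the limiting chi-squared distribution.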

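The significance test itself can be sketched in R; the observed counts below are made up for illustration, with the p-value taken from the upper tail since larger values are more extreme.

```r
## Goodness-of-fit test by hand: compare made-up observed counts
## against null probabilities p = (3/12, 4/12, 5/12).
y <- c(3, 4, 13)                  # hypothetical observed counts, n = 20
p <- c(3, 4, 5) / 12              # null probabilities
n <- sum(y)
stat <- sum((y - n * p)^2 / (n * p))            # Eq. (9.1)
pval <- pchisq(stat, df = length(y) - 1, lower.tail = FALSE)
```

R's built-in `chisq.test(y, p = p)` carries out the same computation, reporting the statistic, the degrees of freedom, and the p-value in one step.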