10.07.2015 Views

Using R for Introductory Statistics : John Verzani


can still be used as a test statistic after we have estimated each $p_{ij}$ in order to compute the "expected" counts. Again we use the data and the assumptions to estimate the $p_{ij}$. Basically, the data is used to estimate the marginal probabilities, and the assumption of independence allows us to estimate the $p_{ij}$ from there.

Table 9.6 Seat-belt usage in California with marginal distributions

                      Child
Parent        buckled   unbuckled   marginal
buckled          56          8         64
unbuckled         2         16         18
marginal         58         24         82

The marginal probabilities are estimated by the marginal distributions of the data. For our example these are given in Table 9.6. The estimate for $p^r_1$, the probability that the parent is buckled, is $\hat{p}^r_1 = 64/82$, and for $p^r_2$ it is $\hat{p}^r_2 = 18/82$. Similarly, for the column probabilities $p^c_j$ we have $\hat{p}^c_1 = 58/82$ and $\hat{p}^c_2 = 24/82$. As usual, we've used a "hat" for estimated values. With these estimates, we can use the relationship $p_{ij} = p^r_i p^c_j$ to find the estimate $\hat{p}_{ij} = \hat{p}^r_i \hat{p}^c_j$. For our seat-belt data we have the estimates in Table 9.7. In order to show where the values come from, the values have not been simplified.

Table 9.7 Seat-belt usage in California with estimates for the corresponding $p_{ij}$

                          Child
Parent        buckled           unbuckled         marginal
buckled       (64/82)(58/82)    (64/82)(24/82)    64/82
unbuckled     (18/82)(58/82)    (18/82)(24/82)    18/82
marginal      58/82             24/82             1

With this table we can compute the expected amounts in the $ij$th cell with $e_{ij} = n\hat{p}_{ij}$. This is often written $R_i C_j / n$, where $R_i$ is the row sum and $C_j$ the column sum, as this simplifies computations by hand.

With the expected amounts now known, we form the $\chi^2$ statistic as:

$$\chi^2 = \sum_{i=1}^{n_r} \sum_{j=1}^{n_c} \frac{(y_{ij} - e_{ij})^2}{e_{ij}} \qquad (9.2)$$

where $y_{ij}$ is the observed count in the $ij$th cell.

Under the hypothesis of multinomial data and the independence of the variables, the sampling distribution of $\chi^2$ will be the chi-squared distribution with $(n_r - 1)(n_c - 1)$ degrees of freedom. Why this many? For the row variable we have $n_r - 1$ unspecified values that the marginal probabilities can take (not $n_r$, as they sum to 1) and similarly for the column variable. Thus there are $(n_r - 1)(n_c - 1)$ unspecified values.
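The hand computation described above can be sketched in R for the seat-belt data. This is a minimal illustration, not code from the text: the variable names (`seatbelt`, `row_sums`, `col_sums`, `expected`, `chi2`) are chosen here for the example, and only base R functions are used.

```r
# Observed counts from Table 9.6 (rows: parent, columns: child).
# All object names below are illustrative assumptions.
seatbelt <- matrix(c(56,  8,
                      2, 16),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(parent = c("buckled", "unbuckled"),
                                   child  = c("buckled", "unbuckled")))

n        <- sum(seatbelt)       # total count, 82
row_sums <- rowSums(seatbelt)   # R_i: 64 and 18
col_sums <- colSums(seatbelt)   # C_j: 58 and 24

# Expected counts under independence: e_ij = R_i * C_j / n
expected <- outer(row_sums, col_sums) / n

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2 <- sum((seatbelt - expected)^2 / expected)

# Degrees of freedom: (n_r - 1) * (n_c - 1)
df <- (nrow(seatbelt) - 1) * (ncol(seatbelt) - 1)

# Upper-tail probability from the chi-squared distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

# Cross-check with the built-in test. correct = FALSE switches off the
# continuity correction that chisq.test() applies to 2 x 2 tables by default,
# so that its statistic matches the formula above.
builtin <- chisq.test(seatbelt, correct = FALSE)
```

Comparing `chi2` with `builtin$statistic` and `expected` with `builtin$expected` is one way to confirm that the by-hand steps and the built-in function agree. With the default `correct = TRUE`, the reported statistic for a 2 x 2 table would be smaller than the value given by the formula.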
