10.07.2015 Views

Using R for Introductory Statistics : John Verzani


■ Example 10.4: Predicting classroom performance

College admissions offices are faced with the problem of predicting future performance based on a collection of measures, such as grade-point average and standardized test scores. These values may be correlated. There may also be other variables that describe why a student does well, such as type of high school attended or student’s work ethic.

Initial student placement is also a big issue. If a student does not place into the right class, he may become bored and leave the school. Successful placement is key to retention. For New York City high school graduates, SAT scores and Regents Exam scores are available at the time of placement. High school grade-point average may be unreliable or unavailable.

The data set stud.recs (UsingR) contains test scores and initial grades in a math class for several randomly selected students. What can we predict about the initial grade based on the standardized scores?

An initial model might be to fit a linear model for grade with all the other terms included. Other restricted models might be appropriate. For example, are the verbal SAT scores useful in predicting grade performance in a future math class?

10.3.2 Fitting the multiple regression model using lm()

As seen previously, the method of least squares is used to estimate the parameters in the multiple regression model. We don’t give formulas for computing the $\hat{\beta}_i$ but note that, since there are p+1 estimated parameters, the estimate for the variance changes to

$$\hat{\sigma}^2 = \frac{1}{n-(p+1)} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where $y_i$ is the observed response, $\hat{y}_i$ is its fitted value, and n is the number of observations.

To find these estimates in R, again the lm() function is used. The syntax for the model formula varies depending on the type of terms in the model. For these problems, we use + to add terms to a model, - to drop terms, and I() to insulate terms so that the usual math notations apply.

For example, if x, y, and z are variables, the statistical model $z_i = \beta_0 + \beta_1 x_i + \beta_2 y_i + \varepsilon_i$ has the R counterpart z ~ x + y.

Once the model is given, the lm() function follows the same format as before:

lm(formula, data=…, subset=…)

To illustrate with an artificial example, we simulate the relationship $z_i = \beta_0 + \beta_1 x_i + \beta_2 y_i + \varepsilon_i$ and then find the estimated coefficients:

> x = 1:10; y = rchisq(10,3); z = 1 + x + y + rnorm(10)
> lm(z ~ x + y)
Call:
lm(formula = z ~ x + y)
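The three formula operators and the variance estimate can be tried out on the same kind of simulated data. What follows is a minimal sketch rather than the book's own code: the seed, the object names (`fit_full`, `fit_noint`, `fit_sum`, `sigma2_hat`), and the choice of the two restricted models are assumptions made for illustration. It fits one model per operator, then computes the residual sum of squares divided by n-(p+1) by hand and compares it with the squared residual standard error that `summary()` reports.

```r
# Hypothetical simulated data in the style of the artificial example;
# set.seed() is added here only so that the run is reproducible.
set.seed(1)
n <- 10
x <- 1:n
y <- rchisq(n, df = 3)
z <- 1 + x + y + rnorm(n)

# "+" adds terms: an intercept, x, and y are all estimated.
fit_full <- lm(z ~ x + y)

# "-" drops terms: "- 1" removes the intercept.
fit_noint <- lm(z ~ x + y - 1)

# I() insulates arithmetic: the single predictor is the sum x + y.
fit_sum <- lm(z ~ I(x + y))

# Which coefficients each formula estimates.
names(coef(fit_full))
names(coef(fit_noint))
names(coef(fit_sum))

# Variance estimate with p + 1 estimated parameters (p = 2 predictors).
p <- 2
sigma2_hat <- sum(resid(fit_full)^2) / (n - (p + 1))

# summary()$sigma is the residual standard error, so its square
# should match the hand computation up to rounding.
all.equal(sigma2_hat, summary(fit_full)$sigma^2)
```

Without I(), the formula z ~ x + y treats + as "add a term" and estimates separate coefficients for x and y; wrapping the sum in I() forces ordinary arithmetic, so only one slope is estimated.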
