
Using R for Introductory Statistics : John Verzani


[1] 185997

The residual can then be computed by subtraction:

> 130200 - predict(res, data.frame(y1970=55100))
[1] -55797

The residual is also returned by residuals() after finding out which index corresponds to the data point:

> residuals(res)[which(y1970 == 55100 & y2000 == 130200)]
  6688
-55797

We needed both conditions, as there are two homes with an assessed value of $55,100 in 1970.

More on model formulas

Model formulas can be used with many R functions, for instance, the plot() function. The plot() function is an example of a generic function in R: different implementations are used depending on the first argument. When the first argument of plot() is a model formula containing numeric predictor and response variables, a scatterplot is created. Previously, we've seen that when the argument is the output of the density() function, a density plot is produced. Other usages will be introduced in the sequel. The scatterplot and regression line could then be made as follows:

> plot(y2000 ~ y1970)
> res = lm(y2000 ~ y1970)
> abline(res)

A small advantage to this usage is that the typing can be reused with the history mechanism: the formula typed for plot() can be recalled and edited into the call to lm(). This could also be achieved by saving the model formula to a variable.

More importantly, the model formula offers some additional flexibility. With a model formula, the argument data= can usually be used to attach a data frame temporarily. This convenience is similar to that offered more generally by the function with(). Both styles provide an environment where R can reference the variables within a data frame by their names, avoiding the trouble of attaching and detaching the data frame.
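The residual computation and the data= style can be tried without the home-price data. The sketch below is a stand-in under stated assumptions: it uses R's built-in cars data frame (columns speed and dist) in place of y1970 and y2000, and the names res_with, i, pred, by_hand, and from_model are hypothetical. It fits the same regression once with data= and once inside with(), then checks that a residual computed by subtraction matches the one returned by residuals().

```r
# Stand-in data: the built-in 'cars' data frame (speed, dist), not the
# home-price data from the text. Variable names here are illustrative.

# data= lets lm() find speed and dist inside cars; nothing is attached.
res <- lm(dist ~ speed, data = cars)

# with() evaluates the same call with the columns of cars in scope.
res_with <- with(cars, lm(dist ~ speed))

# Pick one observation by index (here, the row with the largest distance).
# Several rows can share a speed value, so a condition on speed alone
# may not single out one row.
i <- which.max(cars$dist)

# Residual by subtraction: observed value minus the model's prediction.
pred <- predict(res, data.frame(speed = cars$speed[i]))
by_hand <- cars$dist[i] - pred

# The same residual, taken from residuals() by index.
from_model <- residuals(res)[i]

print(coef(res))
print(c(by_hand = unname(by_hand), from_model = unname(from_model)))
```

One design difference: with data=, the data frame's name is recorded in the fitted model's call (res$call), while with() is more general because it works for any expression, including functions that have no data= argument.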
Equally useful is the argument subset=, which can be used to restrict the rows that are used in the data. This argument can be specified by a logical condition or a specification of indices. We will use both of these arguments in the upcoming examples.

3.4.3 Transformations of the data

As the old adage goes, "If all you have is a hammer, everything looks like a nail." The linear model is a hammer of sorts; we often try to make the problem at hand fit the model. As such, it sometimes makes sense to transform the data to make the linear model appropriate.
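As a minimal sketch of transforming the data inside a model formula, again assuming the built-in cars data rather than the book's own examples, the code below fits a straight line to log(dist) against log(speed). The log-log choice is only illustrative, not a claim about which transformation suits these data. It also exercises the subset= argument described before this section in both of its forms, a logical condition and a vector of row indices. The model names fit_raw, fit_log, fit_fast, and fit_first are hypothetical.

```r
# Stand-in data: the built-in 'cars' data frame (speed, dist).

# Straight-line fit on the original scale.
fit_raw <- lm(dist ~ speed, data = cars)

# Transform inside the formula: log() is applied to each column when the
# model frame is built, so cars itself is left unchanged.
fit_log <- lm(log(dist) ~ log(speed), data = cars)

# subset= as a logical condition: use only rows with speed of at least 10.
fit_fast <- lm(log(dist) ~ log(speed), data = cars, subset = speed >= 10)

# subset= as row indices: use only the first 25 rows.
fit_first <- lm(log(dist) ~ log(speed), data = cars, subset = 1:25)

print(coef(fit_raw))
print(coef(fit_log))
# Number of observations each fit actually used.
print(c(raw = nobs(fit_raw), log = nobs(fit_log),
        fast = nobs(fit_fast), first = nobs(fit_first)))
```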
