10.07.2015 Views

Using R for Introductory Statistics : John Verzani



... plot in lower right shows correlations in error terms.

Assessing the linear model for the mean

A scatterplot of the data with the regression line can show quickly whether the linear model seems appropriate for the data. If the general trend is not linear, either a transformation or a different model is called for. An example of a cyclical trend (which calls for a transformation of the data) is the upper-left plot in Figure 10.3 and is made with these commands:

> x = rep(1:10,4)
> y = rnorm(40, mean=5*sin(x), sd=1)
> plot(y ~ x); abline(lm(y ~ x))

When there is more than one predictor variable, a scatterplot will not be as useful.

A residual plot can also show whether the linear model is appropriate, and it can be made with more than one predictor. It can also detect small deviations from the model that may not show up in a scatterplot. The upper-right plot in Figure 10.3 shows a residual plot that reveals a sinusoidal trend that will not show up in a scatterplot. It was simulated with these commands:

> x = rep(1:10,4)
> y = rnorm(40, mean = x + .05*sin(x), sd=.01) # small trend
> res = lm(y ~ x)
> plot(fitted(res), resid(res))

The residual plot is one of the four diagnostic plots produced by plot() when it is called on the output of lm().

Assessing normality of the residuals

The residuals are used to assess whether the error terms in the model are normally distributed. Although a histogram can be used to investigate normality, we've seen that the quantile-normal plot is better at visualizing differences from normality. Deviations from a straight line indicate nonnormality. Quantile-normal plots are made with qqnorm(). One of the diagnostic plots produced by plot() is a quantile-normal plot of the standardized residuals.

Besides normality, the model also assumes that the error terms have a common variance. A residual plot can show whether this is the case. When it is, the residuals show scatter about a horizontal line. In many data sets, the variance increases for larger values of the predictor. The commands below create a simulation of this. The graph showing the effect is in the lower-left of Figure 10.3.

> x = rep(1:10,4)
> y = rnorm(40, mean = 1 + 1/2*x, sd = x/10)
> res = lm(y ~ x)
> plot(fitted(res), resid(res))
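The diagnostics discussed above can be gathered into one short script. What follows is only a sketch under stated assumptions, not the book's own code: it reuses the increasing-variance simulation, while the fixed seed, the 2-by-2 layout set with par(), the hand-drawn quantile-normal plot using rstandard() and qqline(), and the split of the residuals at x > 5 are illustrative additions that do not appear in the original text.

```r
# Simulated data as in the text: the error sd grows with the predictor.
set.seed(1)                     # assumption: fixed only so the run is repeatable
x = rep(1:10, 4)
y = rnorm(40, mean = 1 + 1/2*x, sd = x/10)
res = lm(y ~ x)

# The four default diagnostic plots for an lm fit, on one 2-by-2 page.
op = par(mfrow = c(2, 2))
plot(res)
par(op)                         # restore the previous layout

# Quantile-normal plot of the standardized residuals, drawn by hand.
qqnorm(rstandard(res))
qqline(rstandard(res))          # reference line through the quartiles

# Numeric companion to the residual plot: residual spread for
# small (x <= 5) versus large (x > 5) values of the predictor.
spread = tapply(resid(res), x > 5, sd)
print(spread)
```

If the common-variance assumption held, the two values in spread would be roughly equal; with this simulation the entry for the larger values of x is expected to be clearly bigger, which matches the fan shape seen in the residual plot.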
