<strong>4.2</strong> <strong>Least</strong>-<strong>Squares</strong> <strong>Regression</strong><br />
1. If we feel there is a linear relationship between the two variables (the points of the<br />
scatter diagram cluster roughly around a straight line and the correlation coefficient is<br />
close to 1 or −1), how do we find the "best" line, out of the infinitely many possible, that fits<br />
these data?<br />
2. The criterion we will use to pick the best line is the least-squares criterion.<br />
This criterion is based on finding the smallest sum of all of the squared errors<br />
obtained when the calculated linear equation is used to predict the y data values<br />
(response variable) for each value of x (the predictor variable).<br />
That is, if y is the data value (observed value) for a given value of x and ŷ<br />
(read "y hat") is the value predicted from the equation for this x, then we want<br />
the smallest possible value of Σ(y − ŷ)².<br />
Note that y − ŷ is the signed vertical distance between the data point and the<br />
point on the line for a given x value. It is the difference between the observed and<br />
the predicted values. This is called the residual. (Residual = observed y −<br />
predicted y.)<br />
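As a sketch of these definitions, the residuals and the sum of squared errors can be computed in Python; both the data and the candidate line below are hypothetical, not taken from the text:

```python
# Hypothetical data: x is the predictor, y is the response.
x = [1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8]

# An assumed candidate line y-hat = 1 + 2x (not necessarily the best fit).
b0, b1 = 1, 2

# Residual = observed y - predicted y-hat, for each data point.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The least-squares criterion scores a line by its sum of squared residuals.
sse = sum(res ** 2 for res in residuals)
print(residuals)
print(sse)
```

The best-fit line is the choice of b0 and b1 that makes this sum as small as possible.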
3. <strong>Least</strong>-<strong>Squares</strong> <strong>Regression</strong> Criterion: (Page 198) The straight line that best fits<br />
a set of data points is the one having the smallest possible sum of squared errors.<br />
Thus we want to minimize Σ(residual)².<br />
The straight line that best fits a set of data points according to the least-squares<br />
criterion is called the regression line.<br />
4. To find the equation ŷ = b₀ + b₁x for the best-fit line using the least-squares<br />
criterion, we will use the following formulas:<br />
b₁ = r·(s_y / s_x) is the slope of the least-squares regression line,<br />
and<br />
b₀ = ȳ − b₁x̄ is the y-intercept of the least-squares regression line.<br />
(Note that x̄ = Σx/n is the mean of the predictor variable, ȳ = Σy/n is the mean of the<br />
response variable, s_x is the standard deviation of the predictor variable, and s_y is<br />
the standard deviation of the response variable. Also note that<br />
r = Σ[((x_i − x̄)/s_x)·((y_i − ȳ)/s_y)] / (n − 1)<br />
is the correlation coefficient for the data.)<br />
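These formulas can be checked numerically in Python; the data set below is hypothetical, and `statistics.stdev` computes the sample standard deviation used in the formulas:

```python
import statistics as stats

# Hypothetical data set used only to illustrate the formulas.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
xbar, ybar = stats.mean(x), stats.mean(y)    # means of predictor and response
sx, sy = stats.stdev(x), stats.stdev(y)      # sample standard deviations

# r = sum of ((x_i - xbar)/sx)*((y_i - ybar)/sy), divided by (n - 1)
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)

b1 = r * sy / sx        # slope of the least-squares regression line
b0 = ybar - b1 * xbar   # y-intercept

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

For these numbers the computation gives a slope of 1.97 and an intercept of 0.09, and one can verify that the point (x̄, ȳ) satisfies the resulting equation.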
5. Note that the point (x̄, ȳ) is always a point on this least-squares linear regression<br />
line. (This is useful when drawing this line by hand.)<br />
6. Because s_x and s_y are always positive, the sign of the slope of the least-squares<br />
linear regression line is the same as the sign of the correlation<br />
coefficient r.<br />
7. The slope (b₁) can be interpreted as the rate of change of the response variable,<br />
y, with respect to the predictor variable, x. Thus, when x increases by one unit,<br />
the predicted y changes by the amount b₁.<br />
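A minimal sketch of this interpretation, using a hypothetical fitted line:

```python
# Hypothetical fitted line y-hat = 0.09 + 1.97x; b1 is the slope.
b0, b1 = 0.09, 1.97

def predict(x):
    """Predicted response y-hat = b0 + b1*x."""
    return b0 + b1 * x

# Increasing x by one unit changes the prediction by exactly the slope b1.
change = predict(5) - predict(4)
print(change)
```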
8. The y-intercept (b₀) can be interpreted as the predicted value of the response<br />
variable when the predictor variable is zero. This makes sense only if:<br />
a. The value of 0 for the predictor variable makes sense.<br />
b. There is an observed value of the predictor variable near 0.<br />
9. Never use the least-squares regression line to make predictions for values of<br />
the predictor variable that are much larger or much smaller than the<br />
observed values. (We don't even know whether the relationship far outside our data set<br />
would be linear, much less whether it would follow the same line.)<br />
10. Cautions:<br />
Only use this method when the scatter diagram looks roughly linear. (Also check<br />
the value of r, the correlation coefficient.)<br />
Outliers are data points that lie far from the regression line relative to the other<br />
data points. They can sometimes have a significant effect on the regression<br />
analysis.<br />
11. <strong>Regression</strong> Lines in a Calculator: As was noted in the last section, the<br />
regression line can be found using the calculator. Follow the steps: STAT, over<br />
to CALC, then select #4 (LinReg(ax+b)) and press ENTER. Then enter the x-list (say L1), a comma,<br />
and the y-list (say L2), and press ENTER. The slope and y-intercept will be given on<br />
the screen, as well as the r and r² values, provided that you have turned on the<br />
Diagnostics.<br />
If you want to store the equation under Y1 (in the Y= key), then you need to<br />
call the Y1 function. To do this, follow the steps: VARS, over to Y-VARS,<br />
ENTER (to select Function), then ENTER again (to select Y1). Thus, the screen<br />
should show: LinReg(ax+b) L1, L2, Y1, and then press ENTER. The resulting screen<br />
will look the same as it did without the Y1. But if you go to the Y= key, the<br />
linear equation will be stored in Y1. You can then graph the scatter plot and the<br />
regression line by entering ZOOM #9.