Lecture 3: Multiple Regression Model - McCombs School of Business
DMBA: Statistics<br />
Lecture 3: Multiple Regression Model<br />
Estimation, Inference and F-tests<br />
Carlos Carvalho<br />
The University of Texas McCombs School of Business<br />
mccombs.utexas.edu/faculty/carlos.carvalho/teaching<br />
Today’s Plan<br />
1. The Multiple Regression Model<br />
◮ parameter estimation<br />
◮ inference<br />
◮ F-tests<br />
The Multiple Regression Model<br />
Many problems involve more than one independent variable or<br />
factor which affects the dependent or response variable.<br />
◮ Multi-factor asset pricing models (beyond the CAPM)<br />
◮ Demand for a product given prices of competing brands,<br />
advertising, household attributes, etc.<br />
◮ More than size to predict house price!<br />
In SLR, the conditional mean of Y depends on X. The Multiple<br />
Linear Regression (MLR) model extends this idea to include more<br />
than one independent variable.<br />
The MLR Model<br />
Same as always, but with more covariates.<br />
Y | X_1, ..., X_d ~ N(β_0 + β_1 X_1 + ... + β_d X_d, σ²), independently.<br />
Recall the key assumptions of our linear regression model:<br />
(i) The conditional mean of Y is linear in the X_j variables.<br />
(ii) The additive errors (deviations from the line) are:<br />
◮ normally distributed<br />
◮ independent of each other<br />
◮ identically distributed (i.e., they have constant variance)<br />
The MLR Model<br />
Our interpretation of regression coefficients can be extended from<br />
the simple single-covariate regression case:<br />
β_j = ∂E[Y | X_1, ..., X_d] / ∂X_j<br />
Holding all other variables constant, β_j is the<br />
average change in Y per unit change in X_j.<br />
The MLR Model<br />
If d = 2, we can plot the regression surface in 3D.<br />
Consider sales of a product as predicted by the price of this product<br />
(P1) and the price of a competing product (P2).<br />
The Data and Least Squares<br />
The data in multiple regression is a set of points with values for<br />
the output Y and for each input variable.<br />
Data: Y_i and x_i = [X_1i, X_2i, ..., X_di], for i = 1, ..., n.<br />
Or, as a data array,<br />
        | Y_1  X_11  X_21  ...  X_d1 |<br />
Data =  | Y_2  X_12  X_22  ...  X_d2 |<br />
        | ...                        |<br />
        | Y_n  X_1n  X_2n  ...  X_dn |<br />
The Data and Least Squares<br />
Y = β_0 + β_1 X_1 + ... + β_d X_d + ε,   ε ~ N(0, σ²)<br />
How do we estimate the MLR model parameters?<br />
The principle of Least Squares is exactly the same as before:<br />
◮ Define the fitted values.<br />
◮ Find the best-fitting plane by minimizing the sum of squared<br />
residuals.<br />
The Data and Least Squares<br />
Fitted Values: Ŷ_i = b_0 + b_1 X_1i + b_2 X_2i + ... + b_d X_di.<br />
Residuals: e_i = Y_i − Ŷ_i.<br />
Standard Error: s = sqrt( Σ_{i=1}^{n} e_i² / (n − p) ), where p = d + 1.<br />
Least Squares: Find b_0, b_1, b_2, ..., b_d to minimize s².<br />
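The least-squares recipe above can be sketched directly in code. This is an illustrative example, not the lecture's Sales data: the toy numbers are made up, and the coefficients are found by solving the normal equations (X′X)b = X′Y with a small pure-Python solver.

```python
import math

def solve(A, y):
    """Solve the linear system A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X_rows, Y):
    """Least-squares fit of Y on the covariate rows (intercept added)."""
    Z = [[1.0] + list(x) for x in X_rows]        # design matrix
    p = len(Z[0])                                # p = d + 1
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    b = solve(XtX, XtY)
    fitted = [sum(bj * zj for bj, zj in zip(b, z)) for z in Z]
    resid = [y - f for y, f in zip(Y, fitted)]
    s = math.sqrt(sum(e * e for e in resid) / (len(Y) - p))  # n - p df
    return b, s

# Toy data generated from Y = 1 + 2*X1 - 3*X2 with no noise, so the
# estimates recover the true coefficients and s is (numerically) zero.
X = [(0, 1), (1, 0), (1, 1), (2, 1), (0, 2), (3, 2)]
Y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
b, s = ols(X, Y)
print([round(v, 6) for v in b])   # → [1.0, 2.0, -3.0]
print(round(s, 6))                # → 0.0
```

In practice one would use a regression routine (Excel's Data Analysis tool, as in these slides), but the arithmetic underneath is exactly this.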
The Data and Least Squares<br />
Let’s run a multiple regression with the Sales and Price data.<br />
Fitted Model: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i<br />
Applied Regression Analysis<br />
Carlos M. Carvalho<br />
Residual Standard Error<br />
First off, the calculation for s² = var(e) is exactly the same:<br />
s² = Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (n − p)<br />
◮ Ŷ_i = b_0 + Σ_j b_j X_ji and p = d + 1.<br />
◮ The residual standard error is σ̂ = s = sqrt(s²).<br />
Residuals in MLR<br />
As in the SLR model, the residuals in multiple regression are<br />
purged of any relationship to the independent variables.<br />
We decompose Y into the part predicted by X and the part due to<br />
idiosyncratic error:<br />
Y = Ŷ + e,   with corr(X_j, e) = 0 and corr(Ŷ, e) = 0.<br />
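These orthogonality properties are easy to verify numerically. A small check with made-up numbers (a single covariate keeps the fit to two closed-form sums; the same holds in MLR):

```python
def corr(u, v):
    """Sample correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Made-up, roughly linear data with some noise.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.2, 2.1, 2.8, 4.3, 4.9, 6.2]

# Closed-form least-squares fit of Y on X.
xbar, ybar = sum(X) / len(X), sum(Y) / len(Y)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
      / sum((x - xbar) ** 2 for x in X))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

# Both correlations are zero up to floating-point error.
print(round(corr(X, resid), 10))       # ≈ 0
print(round(corr(fitted, resid), 10))  # ≈ 0
```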
Residuals in MLR<br />
Consider the residuals from the Sales data:<br />
[Plots: residuals vs. fitted values, residuals vs. P1, and residuals vs. P2.]<br />
Fitted Values in MLR<br />
Another great plot for MLR problems is to look at<br />
Y (true values) against Ŷ (fitted values).<br />
Fitted vs True Response for Sales data<br />
[Plot: Y against Y.hat.]<br />
If things are working, these values should form a nice straight line.<br />
Inference for Coefficients<br />
As before in SLR, the LS linear coefficients are random (different<br />
for each sample) and correlated with each other.<br />
The LS estimators are unbiased: E[b_j] = β_j for j = 0, ..., d.<br />
In particular, the sampling distribution for b is multivariate<br />
normal, with mean β = [β_0 · · · β_d]′ and covariance matrix S_b:<br />
b ~ N(β, S_b)<br />
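The unbiasedness claim can be seen in a quick Monte Carlo sketch: simulate many datasets from a known MLR model with a fixed design, refit each one, and check that the estimates average out to the true betas. All numbers here are made up for illustration.

```python
import random

def solve(A, y):
    """Solve A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(Z, Y):
    """Least-squares coefficients via the normal equations."""
    p = len(Z[0])
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    return solve(XtX, XtY)

random.seed(1)
beta = [1.0, 2.0, -1.5]                                    # true coefficients
Z = [[1.0, x1, x2] for x1 in range(5) for x2 in range(4)]  # fixed design, n = 20

draws = []
for _ in range(2000):
    Y = [sum(bj * zj for bj, zj in zip(beta, z)) + random.gauss(0, 1) for z in Z]
    draws.append(ols(Z, Y))

# Averaging over simulations recovers the true betas (unbiasedness).
avg = [sum(d[j] for d in draws) / len(draws) for j in range(3)]
print([round(a, 2) for a in avg])   # ≈ [1.0, 2.0, -1.5]
```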
Inference for Individual Coefficients<br />
Intervals and t-statistics are exactly the same as in SLR.<br />
◮ A (1 − α)100% C.I. for β_j is b_j ± t_{α/2, n−p} s_{b_j}.<br />
◮ z_{b_j} = (b_j − β_j⁰)/s_{b_j} ~ t_{n−p} is the number of standard errors<br />
between the LS estimate and the null value.<br />
Intervals and testing via b_j and s_{b_j} are one-at-a-time procedures:<br />
◮ You are evaluating the j-th coefficient conditional on the other<br />
X's being in the model, but regardless of the values you've<br />
estimated for the other b's.<br />
In Excel...<br />
Let’s run a multiple regression with the Sales and Price data.<br />
Model: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i<br />
[Excel regression output for the Sales model.]<br />
F-tests<br />
◮ In many situations, we need a testing procedure that can<br />
address simultaneous hypotheses about more than one<br />
coefficient.<br />
◮ Why not the t-test?<br />
◮ We will look at two important types of simultaneous tests:<br />
(i) the Overall Test of Significance<br />
(ii) the Partial F-test<br />
The first test will help us determine whether or not our regression<br />
is worth anything... the second will allow us to compare different<br />
models.<br />
Supervisor Performance Data<br />
Suppose you are interested in the relationship between the overall<br />
performance of supervisors and specific activities involving<br />
interactions between supervisors and employees (from a psychology<br />
management study).<br />
The Data<br />
◮ Y = Overall rating of supervisor<br />
◮ X_1 = Handles employee complaints<br />
◮ X_2 = Does not allow special privileges<br />
◮ X_3 = Opportunity to learn new things<br />
◮ X_4 = Raises based on performance<br />
◮ X_5 = Too critical of poor performance<br />
◮ X_6 = Rate of advancing to better jobs<br />
Supervisor Performance Data<br />
[Excel view of the supervisor performance data.]<br />
Supervisor Performance Data<br />
F-tests<br />
Is there any relationship here? Are all the coefficients significant?<br />
What about the regression as a whole?<br />
Why not look at R²?<br />
◮ R² in MLR is still a measure of goodness of fit.<br />
◮ However, it ALWAYS grows as we increase the number of<br />
explanatory variables.<br />
◮ Even if there is no relationship between the X's and Y,<br />
R² > 0!!<br />
◮ To see this, let's look at some “Garbage” data.<br />
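The point is easy to reproduce by simulation. The sketch below (simulated noise, not the lecture's supervisor data) regresses a pure-noise Y on 1, 2, ..., 6 pure-noise "garbage" covariates: R² is positive from the start and can only grow as covariates are added.

```python
import random

def solve(A, y):
    """Solve A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(Z, Y):
    """Least-squares coefficients via the normal equations."""
    p = len(Z[0])
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    return solve(XtX, XtY)

random.seed(2)
n = 30
Y = [random.gauss(0, 1) for _ in range(n)]                      # pure-noise response
G = [[random.gauss(0, 1) for _ in range(6)] for _ in range(n)]  # 6 garbage X's

def r_squared(k):
    """R² from regressing Y on the first k garbage covariates."""
    Z = [[1.0] + row[:k] for row in G]
    b = ols(Z, Y)
    fitted = [sum(bj * zj for bj, zj in zip(b, z)) for z in Z]
    ybar = sum(Y) / n
    sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))
    sst = sum((y - ybar) ** 2 for y in Y)
    return 1 - sse / sst

vals = [r_squared(k) for k in range(1, 7)]
print([round(v, 3) for v in vals])   # non-decreasing, all > 0
```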
F-tests<br />
Garbage Data<br />
I made up 6 “garbage” variables that have nothing to do with Y...<br />
Garbage Data<br />
◮ R² is 26%!!<br />
◮ As usual, we need a statistical notion of how close is close...<br />
◮ It turns out that if we transform R² we can solve this.<br />
Define<br />
f = [R²/(p − 1)] / [(1 − R²)/(n − p)]<br />
This is the F-test of overall significance. Under the null hypothesis,<br />
f is distributed as<br />
f ~ F_{p−1, n−p}<br />
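The statistic is a one-line computation. Here it is applied to the garbage-data example above: R² = 0.26 and 6 covariates (so p = 7), with n = 30 assumed to match the supervisor data's sample size.

```python
def overall_f(r2, n, p):
    """Overall-significance F statistic: [R²/(p-1)] / [(1-R²)/(n-p)]."""
    return (r2 / (p - 1)) / ((1 - r2) / (n - p))

# Garbage-data example (n = 30 is an assumption, matching the
# supervisor data used elsewhere in this lecture).
f = overall_f(0.26, 30, 7)
print(round(f, 2))   # → 1.35
```

A value of 1.35 is well below the rough f > 4 cutoff discussed next, so despite R² = 26% the regression is not significant.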
The F-test<br />
Recall what we are testing:<br />
H_0: β_1 = β_2 = ... = β_d = 0<br />
H_1: at least one β_j ≠ 0.<br />
Under H_0, f has an F_{p−1, n−p} distribution, with p − 1 numerator and<br />
n − p denominator degrees of freedom.<br />
◮ The F distribution has decreasing variance as the df's increase.<br />
◮ Generally, f > 4 is very significant (reject the null).<br />
The p-value for this test is φ = Pr(F_{p−1, n−p} > f).<br />
The F-test<br />
What kind <strong>of</strong> distribution is this?<br />
[Plot: density of the F distribution with 4 and 50 d.f.]<br />
It is a right-skewed, positive-valued family of distributions, indexed<br />
by two parameters (the two df values).<br />
The F-test<br />
Let’s check this test for the “garbage” data...<br />
How about the original analysis (survey variables)...<br />
[Excel output: F-statistics for both regressions.]<br />
Partial F-tests<br />
◮ What about fitting a reduced model with only a couple of<br />
X's? In other words, do we need all of the X's to explain Y?<br />
◮ For example, in the Supervisor data we could argue that X_1<br />
and X_3 were the most important variables in predicting Y.<br />
◮ The full model (6 covariates) has R²_full = 0.733, while the<br />
model with only X_1 and X_3 has R²_rest = 0.708 (check that!).<br />
◮ Can we make a decision based only on the R² calculations?<br />
NO!!<br />
Partial F-test<br />
With the total F-test, we were asking<br />
“Is this regression worthwhile?”<br />
Now, we’re asking<br />
“Is it useful to add these extra covariates to the regression?”<br />
You always want to use the simplest model possible.<br />
◮ Only add covariates if they are truly informative.<br />
Partial F-test<br />
Consider the regression model<br />
Y = β_0 + β_1 X_1 + ... + β_{d_base} X_{d_base} + β_{d_base+1} X_{d_base+1} + ... + β_{d_full} X_{d_full} + ε<br />
such that d_base is the number of covariates in the base (small)<br />
model and d_full > d_base is the number in the full (larger) model.<br />
The Partial F-test is concerned with the hypotheses<br />
H_0: β_{d_base+1} = β_{d_base+2} = ... = β_{d_full} = 0<br />
H_1: at least one β_j ≠ 0 for j > d_base.<br />
Partial F-test<br />
It turns out that under the null H_0 (i.e., the base model is true),<br />
f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)] ~ F_{d_full−d_base, n−d_full−1}<br />
That is, under the null hypothesis, the ratio of the normalized increase<br />
in R² (R²_full − R²_base) to 1 − R²_full has an F distribution with<br />
d_full − d_base and n − d_full − 1 df.<br />
◮ Big f means that R²_full − R²_base is statistically significant.<br />
◮ Big f means that at least one of the added X's is useful.<br />
Supervisor Performance: Partial F-test<br />
Back to our supervisor data; we want to test<br />
H_0: β_2 = β_4 = β_5 = β_6 = 0<br />
H_1: at least one β_j ≠ 0 for j ∈ {2, 4, 5, 6}.<br />
The F-stat is f = [(0.733 − 0.708)/(6 − 2)] / [(1 − 0.733)/(30 − 6 − 1)] = 0.00625/0.0116 = 0.54.<br />
This leads to a p-value of 0.71... What do we conclude?<br />
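The arithmetic above can be checked with a small function that evaluates the partial-F formula; the numbers reproduce the supervisor example (R²_full = 0.733 with 6 covariates, R²_base = 0.708 with X_1 and X_3, n = 30).

```python
def partial_f(r2_full, r2_base, d_full, d_base, n):
    """Partial F statistic:
    [(R²_full - R²_base)/(d_full - d_base)] / [(1 - R²_full)/(n - d_full - 1)]."""
    num = (r2_full - r2_base) / (d_full - d_base)
    den = (1 - r2_full) / (n - d_full - 1)
    return num / den

# Supervisor example: full model (6 covariates) vs. base model (X1, X3).
f = partial_f(0.733, 0.708, 6, 2, 30)
print(round(f, 2))   # → 0.54
```

Turning f = 0.54 into the p-value of 0.71 requires the tail probability of the F_{4,23} distribution, which in practice comes from a table or a function such as scipy.stats.f.sf (not shown here).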
Glossary and Equations<br />
F-tests and the null hypothesis distributions:<br />
◮ Total: f = [R²/(p − 1)] / [(1 − R²)/(n − p)] ~ F_{p−1, n−p}<br />
◮ Partial: f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)] ~ F_{d_full−d_base, n−d_full−1}<br />