
DMBA: Statistics

Lecture 3: Multiple Regression Model
Estimation, Inference and F-tests

Carlos Carvalho
The University of Texas McCombs School of Business
mccombs.utexas.edu/faculty/carlos.carvalho/teaching


Today’s Plan

1. The Multiple Regression Model
◮ parameter estimation
◮ inference
◮ F-tests


The Multiple Regression Model

Many problems involve more than one independent variable or factor which affects the dependent or response variable.

◮ Multi-factor asset pricing models (beyond CAPM)
◮ Demand for a product given prices of competing brands, advertising, household attributes, etc.
◮ More than size to predict house price!

In SLR, the conditional mean of Y depends on X. The Multiple Linear Regression (MLR) model extends this idea to include more than one independent variable.


The MLR Model

Same as always, but with more covariates:

Y | X_1, ..., X_d ∼ N(β_0 + β_1 X_1 + ... + β_d X_d, σ²), independently across observations.

Recall the key assumptions of our linear regression model:

(i) The conditional mean of Y is linear in the X_j variables.
(ii) The additive errors (deviations from the line)
◮ are normally distributed
◮ independent from each other
◮ identically distributed (i.e., they have constant variance)
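To make the model concrete, here is a minimal simulation sketch in Python; the sample size, coefficients, and σ below are made-up illustration values, not numbers from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up illustration values: n observations, d covariates,
# coefficients [beta_0, beta_1, beta_2], and error sd sigma.
n, d = 100, 2
beta = np.array([2.0, -1.5, 0.8])
sigma = 0.5

X = rng.uniform(0.0, 1.0, size=(n, d))      # covariates X_1, X_2
mean = beta[0] + X @ beta[1:]               # conditional mean: beta_0 + beta_1 X_1 + beta_2 X_2
Y = mean + rng.normal(0.0, sigma, size=n)   # additive N(0, sigma^2) errors, independent across i
```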


The MLR Model

Our interpretation of regression coefficients can be extended from the simple single covariate regression case:

β_j = ∂E[Y | X_1, ..., X_d] / ∂X_j

Holding all other variables constant, β_j is the average change in Y per unit change in X_j.


The MLR Model

If d = 2, we can plot the regression surface in 3D.

Consider sales of a product as predicted by the price of this product (P1) and the price of a competing product (P2).


The Data and Least Squares

The data in multiple regression is a set of points with values for the output Y and for each input variable.

Data: Y_i and x_i = [X_1i, X_2i, ..., X_di], for i = 1, ..., n.

Or, as a data array,

       ⎡ Y_1  X_11  X_21  ...  X_d1 ⎤
Data = ⎢ Y_2  X_12  X_22  ...  X_d2 ⎥
       ⎢  ⋮     ⋮     ⋮          ⋮  ⎥
       ⎣ Y_n  X_1n  X_2n  ...  X_dn ⎦


The Data and Least Squares

Y = β_0 + β_1 X_1 + ... + β_d X_d + ε,   ε ∼ N(0, σ²)

How do we estimate the MLR model parameters?

The principle of Least Squares is exactly the same as before:

◮ Define the fitted values.
◮ Find the best fitting plane by minimizing the sum of squared residuals.


The Data and Least Squares

Fitted Values: Ŷ_i = b_0 + b_1 X_1i + b_2 X_2i + ... + b_d X_di.

Residuals: e_i = Y_i − Ŷ_i.

Standard Error: s = √( Σ_{i=1}^n e_i² / (n − p) ), where p = d + 1.

Least Squares: Find b_0, b_1, b_2, ..., b_d to minimize s².
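As a sketch of the computation (continuing the simulated X and Y above; any data array would do), numpy's least-squares routine finds the b that minimizes the sum of squared residuals:

```python
A = np.column_stack([np.ones(n), X])        # design matrix: column of 1s for b_0, then X_1, ..., X_d
b, *_ = np.linalg.lstsq(A, Y, rcond=None)   # b_0, b_1, ..., b_d minimizing the sum of squared residuals

Y_hat = A @ b     # fitted values
e = Y - Y_hat     # residuals e_i = Y_i - Y_hat_i
```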


The Data and Least Squares

Let's run a multiple regression with the Sales and price data.

Fitted Model: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i

[Excel regression output for the Sales data]


Residual Standard Error

First off, the calculation for s² = var(e) is exactly the same:

s² = Σ_{i=1}^n (Y_i − Ŷ_i)² / (n − p)

◮ Ŷ_i = b_0 + Σ_j b_j X_ji and p = d + 1.
◮ The residual standard error is σ̂ = s = √s².
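Continuing the least-squares sketch, s² and s come straight from the residuals:

```python
p = d + 1                     # number of estimated coefficients (intercept plus d slopes)
s2 = np.sum(e**2) / (n - p)   # s^2 = SSE / (n - p)
s = np.sqrt(s2)               # residual standard error; should land near the true sigma
```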


Residuals in MLR

As in the SLR model, the residuals in multiple regression are purged of any relationship to the independent variables.

We decompose Y into the part predicted by X and the part due to idiosyncratic error:

Y = Ŷ + e,   corr(X_j, e) = 0,   corr(Ŷ, e) = 0
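Both orthogonality properties are easy to verify numerically on the fitted sketch above:

```python
for j in range(d):
    print(f"corr(X_{j+1}, e) = {np.corrcoef(X[:, j], e)[0, 1]:.2e}")   # ~ 0 up to rounding
print(f"corr(Y_hat, e) = {np.corrcoef(Y_hat, e)[0, 1]:.2e}")           # ~ 0 up to rounding
```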


Residuals in MLR

Consider the residuals from the Sales data:

[Three scatterplots: residuals vs fitted values, residuals vs P1, residuals vs P2]


Fitted Values in MLR

Another great plot for MLR problems is to look at Y (true values) against Ŷ (fitted values).

[Scatterplot: Fitted vs True Response for the Sales data, Y against Y.hat]

If things are working, these values should form a nice straight line.


Inference for Coefficients

As before in SLR, the LS linear coefficients are random (different for each sample) and correlated with each other.

The LS estimators are unbiased: E[b_j] = β_j for j = 0, ..., d.

In particular, the sampling distribution for b is a multivariate normal, with mean β = [β_0 ··· β_d]′ and covariance matrix S_b:

b ∼ N(β, S_b)
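For reference, the usual estimate of S_b is s²(A′A)⁻¹ with A the design matrix; this is a standard result, though the slide does not spell it out. Continuing the sketch:

```python
S_b = s2 * np.linalg.inv(A.T @ A)   # estimated covariance matrix of b: s^2 (A'A)^{-1}
s_b = np.sqrt(np.diag(S_b))         # standard errors s_{b_j}, one per coefficient
```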


Inference for Individual Coefficients

Intervals and t-statistics are exactly the same as in SLR.

◮ A (1 − α)100% C.I. for β_j is b_j ± t_{α/2, n−p} s_{b_j}.
◮ z_{b_j} = (b_j − β_j^0)/s_{b_j} ∼ t_{n−p} is the number of standard errors between the LS estimate and the null value.

Intervals and testing via b_j and s_{b_j} are one-at-a-time procedures:

◮ You are evaluating the j-th coefficient conditional on the other X's being in the model, but regardless of the values you've estimated for the other b's.
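A sketch of these one-at-a-time intervals and t-statistics, continuing from above, using scipy for the t quantiles and taking the usual null value β_j^0 = 0:

```python
from scipy import stats

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)   # t_{alpha/2, n-p}

for j in range(p):
    lo, hi = b[j] - t_crit * s_b[j], b[j] + t_crit * s_b[j]   # 95% C.I. for beta_j
    z = b[j] / s_b[j]                                         # std. errors away from the null value 0
    pval = 2 * stats.t.sf(abs(z), df=n - p)                   # two-sided p-value
    print(f"b_{j} = {b[j]:6.3f}  CI = ({lo:.3f}, {hi:.3f})  t = {z:6.2f}  p = {pval:.3f}")
```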


In Excel...

[Excel regression output for the Sales data: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i]


F-tests

◮ In many situations, we need a testing procedure that can address simultaneous hypotheses about more than one coefficient.
◮ Why not the t-test?
◮ We will look at two important types of simultaneous tests:
(i) Overall Test of Significance
(ii) Partial F-test

The first test will help us determine whether or not our regression is worth anything... the second will allow us to compare different models.


Supervisor Performance Data

Suppose you are interested in the relationship between the overall performance of supervisors and specific activities involving interactions between supervisors and employees (from a psychology management study).

The Data

◮ Y = Overall rating of supervisor
◮ X_1 = Handles employee complaints
◮ X_2 = Does not allow special privileges
◮ X_3 = Opportunity to learn new things
◮ X_4 = Raises based on performance
◮ X_5 = Too critical of poor performance
◮ X_6 = Rate of advancing to better jobs


Supervisor Performance Data

[Table: the supervisor performance data]


Supervisor Performance Data

F-tests

[Excel regression output for the supervisor data]

Is there any relationship here? Are all the coefficients significant? What about the regression as a whole?


Why not look at R²?

◮ R² in MLR is still a measure of goodness of fit.
◮ However, it ALWAYS grows as we increase the number of explanatory variables.
◮ Even if there is no relationship between the X's and Y, R² > 0!!
◮ To see this, let's look at some "Garbage" Data...


F-tests

Garbage Data

To see this, let's generate some "garbage" data that has nothing to do with overall performance (Y)...

I made up 6 "garbage" variables (X_1, X_2, ..., X_6) that have nothing to do with Y, and regressed Y on them; a simulation sketch of the same idea follows below.
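A self-contained sketch of that experiment; here both Y and the six junk covariates are independent noise, so the true R² is zero (n = 30 to match the supervisor data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Six "garbage" covariates and a response, all independent noise.
n, d = 30, 6
Y = rng.normal(size=n)
X = rng.normal(size=(n, d))

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
e = Y - A @ b

R2 = 1 - np.sum(e**2) / np.sum((Y - Y.mean())**2)   # in-sample R^2
print(f"R^2 = {R2:.2f}")   # well above zero despite no real relationship
```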


Garbage Data

◮ R² is 26% !!
◮ As usual, we need a statistical notion of how close is close...
◮ It turns out that if we transform R² we can solve this.

Define

f = [R²/(p − 1)] / [(1 − R²)/(n − p)]

This is the F-test of overall significance. Under the null hypothesis, f is distributed

f ∼ F_{p−1, n−p}
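Continuing the garbage sketch, the overall F-statistic and its p-value fall straight out of R²:

```python
from scipy import stats

p = d + 1
f = (R2 / (p - 1)) / ((1 - R2) / (n - p))
pval = stats.f.sf(f, p - 1, n - p)           # Pr(F_{p-1, n-p} > f)
print(f"f = {f:.2f}, p-value = {pval:.2f}")  # typically nowhere near significant
```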


The F-test

Recall what we are testing:

H_0: β_1 = β_2 = ... = β_d = 0
H_1: at least one β_j ≠ 0.

Under H_0, f has an F_{p−1, n−p} distribution, with p − 1 numerator and n − p denominator degrees of freedom.

◮ The F distribution has decreasing variance as the df's increase.
◮ Generally, f > 4 is very significant (reject the null); see the quick check below.

The p-value for this test is ϕ = Pr(F_{p−1, n−p} > f).
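The f > 4 rule of thumb is easy to sanity-check with scipy for a few illustrative df choices:

```python
from scipy import stats

for dfn, dfd in [(2, 20), (4, 50), (6, 23)]:
    print(f"Pr(F_{{{dfn},{dfd}}} > 4) = {stats.f.sf(4, dfn, dfd):.4f}")   # all small
```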


The F-test

What kind of distribution is this?

[Plot: density p(f) of the F distribution with 4 and 50 d.f.]

It is a right skewed, positive valued family of distributions indexed by two parameters (the two df values).


The F-test

Two equivalent expressions for f (recall R² = SSR/SST and 1 − R² = SSE/SST):

f = [R²/(p − 1)] / [(1 − R²)/(n − p)] = [SSR/(p − 1)] / [SSE/(n − p)]

Let's check this test for the "garbage" data...

[Excel output: overall F-test for the garbage regression]

How about the original analysis (survey variables)...

[Excel output: overall F-test for the supervisor regression]


Partial F-tests

◮ What about fitting a reduced model with only a couple of X's? In other words, do we need all of the X's to explain Y?
◮ For example, in the Supervisor data we could argue that X_1 and X_3 were the most important variables in predicting Y.
◮ The full model (6 covariates) has R²_full = 0.733, while the model with only X_1 and X_3 has R²_base = 0.708 (check that!).
◮ Can we make a decision based only on the R² calculations? NO!!


Partial F-test

With the total F-test, we were asking "Is this regression worthwhile?"

Now, we're asking "Is it useful to add these extra covariates to the regression?"

You always want to use the simplest model possible.

◮ Only add covariates if they are truly informative.
29


Partial F-test

Consider the regression model

Y = β_0 + β_1 X_1 + ... + β_{d_base} X_{d_base} + β_{d_base+1} X_{d_base+1} + ... + β_{d_full} X_{d_full} + ε

where d_base is the number of covariates in the base (small) model and d_full > d_base is the number in the full (larger) model.

The Partial F-test is concerned with the hypotheses

H_0: β_{d_base+1} = β_{d_base+2} = ... = β_{d_full} = 0
H_1: at least one β_j ≠ 0 for j > d_base.


Partial F-test

It turns out that under the null H_0 (i.e., the base model is true),

f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)]  ∼  F_{d_full − d_base, n − d_full − 1}

That is, under the null hypothesis, the ratio of the normalized R²_full − R²_base (increase in R²) and 1 − R²_full has an F-distribution with d_full − d_base and n − d_full − 1 df. (Equivalently, p_full − p_base and n − p_full, since p = d + 1.)

◮ Big f means that R²_full − R²_base is statistically significant.
◮ Big f means that at least one of the added X's is useful.
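A small helper implementing this formula (a sketch; the function name and signature are mine):

```python
from scipy import stats

def partial_f_test(r2_full, r2_base, d_full, d_base, n):
    """Partial F-test: is the gain in R^2 from the extra covariates significant?"""
    dfn = d_full - d_base   # numerator df
    dfd = n - d_full - 1    # denominator df
    f = ((r2_full - r2_base) / dfn) / ((1 - r2_full) / dfd)
    return f, stats.f.sf(f, dfn, dfd)   # (f-stat, p-value Pr(F_{dfn, dfd} > f))
```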


Supervisor Performance: Partial F-test

Back to our supervisor data; we want to test

H_0: β_2 = β_4 = β_5 = β_6 = 0
H_1: at least one β_j ≠ 0 for j ∈ {2, 4, 5, 6}.

The F-stat is

f = [(0.733 − 0.708)/(6 − 2)] / [(1 − 0.733)/(30 − 6 − 1)] = 0.00625/0.0116 = 0.54

This leads to a p-value of 0.71... What do we conclude?
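Plugging the slide's numbers into the helper above reproduces the result:

```python
f, pval = partial_f_test(r2_full=0.733, r2_base=0.708, d_full=6, d_base=2, n=30)
print(f"f = {f:.2f}, p-value = {pval:.2f}")   # f = 0.54, p-value = 0.71, matching the slide
```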


Glossary and Equations

F-tests and the null hypothesis distributions:

◮ Total: f = [R²/(p − 1)] / [(1 − R²)/(n − p)] ∼ F_{p−1, n−p}
◮ Partial: f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)] ∼ F_{d_full − d_base, n − d_full − 1}
