Lecture 3: Multiple Regression Model - McCombs School of Business
DMBA: Statistics<br />
Lecture 3: Multiple Regression Model<br />
Estimation, Inference and F-tests<br />
Carlos Carvalho<br />
The University of Texas McCombs School of Business<br />
mccombs.utexas.edu/faculty/carlos.carvalho/teaching<br />
Today’s Plan<br />
1. The Multiple Regression Model<br />
◮ parameter estimation<br />
◮ inference<br />
◮ F-tests<br />
The Multiple Regression Model<br />
Many problems involve more than one independent variable or<br />
factor which affects the dependent or response variable.<br />
◮ Multi-factor asset pricing models (beyond the CAPM)<br />
◮ Demand for a product given prices of competing brands,<br />
advertising, household attributes, etc.<br />
◮ More than size to predict house price!<br />
In SLR, the conditional mean of Y depends on X. The Multiple<br />
Linear Regression (MLR) model extends this idea to include more<br />
than one independent variable.<br />
The MLR Model<br />
Same as always, but with more covariates.<br />
Y | X_1, ..., X_d ~ N(β_0 + β_1 X_1 + ... + β_d X_d, σ²), independently.<br />
Recall the key assumptions of our linear regression model:<br />
(i) The conditional mean of Y is linear in the X_j variables.<br />
(ii) The additive errors (deviations from the line) are:<br />
◮ normally distributed<br />
◮ independent of each other<br />
◮ identically distributed (i.e., they have constant variance)<br />
The MLR Model<br />
Our interpretation of regression coefficients can be extended from<br />
the simple single-covariate regression case:<br />
β_j = ∂E[Y | X_1, ..., X_d] / ∂X_j<br />
Holding all other variables constant, β_j is the<br />
average change in Y per unit change in X_j.<br />
The MLR Model<br />
If d = 2, we can plot the regression surface in 3D.<br />
Consider sales of a product as predicted by the price of this product<br />
(P1) and the price of a competing product (P2).<br />
The Data and Least Squares<br />
The data in multiple regression is a set of points with values for<br />
the output Y and for each input variable.<br />
Data: Y_i and x_i = [X_1i, X_2i, ..., X_di], for i = 1, ..., n.<br />
Or, as a data array,<br />
        | Y_1  X_11  X_21  ...  X_d1 |<br />
Data =  | Y_2  X_12  X_22  ...  X_d2 |<br />
        | ...                        |<br />
        | Y_n  X_1n  X_2n  ...  X_dn |<br />
The Data and Least Squares<br />
Y = β_0 + β_1 X_1 + ... + β_d X_d + ε,   ε ~ N(0, σ²)<br />
How do we estimate the MLR model parameters?<br />
The principle of Least Squares is exactly the same as before:<br />
◮ Define the fitted values.<br />
◮ Find the best-fitting plane by minimizing the sum of squared<br />
residuals.<br />
The Data and Least Squares<br />
Fitted Values: Ŷ_i = b_0 + b_1 X_1i + b_2 X_2i + ... + b_d X_di.<br />
Residuals: e_i = Y_i − Ŷ_i.<br />
Standard Error: s = sqrt( Σ_{i=1}^{n} e_i² / (n − p) ), where p = d + 1.<br />
Least Squares: Find b_0, b_1, b_2, ..., b_d to minimize s².<br />
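The least-squares recipe above can be sketched directly in code. This is an illustrative example, not the lecture's Sales data: the toy numbers are made up, and the coefficients are found by solving the normal equations (X′X)b = X′Y with a small pure-Python solver.

```python
import math

def solve(A, y):
    """Solve the linear system A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X_rows, Y):
    """Least-squares fit of Y on the covariate rows (intercept added)."""
    Z = [[1.0] + list(x) for x in X_rows]        # design matrix
    p = len(Z[0])                                # p = d + 1
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    b = solve(XtX, XtY)
    fitted = [sum(bj * zj for bj, zj in zip(b, z)) for z in Z]
    resid = [y - f for y, f in zip(Y, fitted)]
    s = math.sqrt(sum(e * e for e in resid) / (len(Y) - p))  # n - p df
    return b, s

# Toy data generated from Y = 1 + 2*X1 - 3*X2 with no noise, so the
# estimates recover the true coefficients and s is (numerically) zero.
X = [(0, 1), (1, 0), (1, 1), (2, 1), (0, 2), (3, 2)]
Y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
b, s = ols(X, Y)
print([round(v, 6) for v in b])   # → [1.0, 2.0, -3.0]
print(round(s, 6))                # → 0.0
```

In practice one would use a regression routine (Excel's Data Analysis tool, as in these slides), but the arithmetic underneath is exactly this.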
The Data and Least Squares<br />
Let’s run a multiple regression with the Sales and Price data.<br />
Fitted Model: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i<br />
Applied Regression Analysis<br />
Carlos M. Carvalho<br />
Residual Standard Error<br />
First off, the calculation for s² = var(e) is exactly the same:<br />
s² = Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (n − p)<br />
◮ Ŷ_i = b_0 + Σ_j b_j X_ji and p = d + 1.<br />
◮ The residual standard error is σ̂ = s = sqrt(s²).<br />
Residuals in MLR<br />
As in the SLR model, the residuals in multiple regression are<br />
purged of any relationship to the independent variables.<br />
We decompose Y into the part predicted by X and the part due to<br />
idiosyncratic error:<br />
Y = Ŷ + e,   with corr(X_j, e) = 0 and corr(Ŷ, e) = 0.<br />
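These orthogonality properties are easy to verify numerically. A small check with made-up numbers (a single covariate keeps the fit to two closed-form sums; the same holds in MLR):

```python
def corr(u, v):
    """Sample correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Made-up, roughly linear data with some noise.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.2, 2.1, 2.8, 4.3, 4.9, 6.2]

# Closed-form least-squares fit of Y on X.
xbar, ybar = sum(X) / len(X), sum(Y) / len(Y)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
      / sum((x - xbar) ** 2 for x in X))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

# Both correlations are zero up to floating-point error.
print(round(corr(X, resid), 10))       # ≈ 0
print(round(corr(fitted, resid), 10))  # ≈ 0
```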
Residuals in MLR<br />
Consider the residuals from the Sales data:<br />
[Plots: residuals vs. fitted values, residuals vs. P1, and residuals vs. P2.]<br />
Fitted Values in MLR<br />
Another great plot for MLR problems is to look at<br />
Y (true values) against Ŷ (fitted values).<br />
Fitted vs True Response for Sales data<br />
[Plot: Y against Y.hat.]<br />
If things are working, these values should form a nice straight line.<br />
Inference for Coefficients<br />
As before in SLR, the LS linear coefficients are random (different<br />
for each sample) and correlated with each other.<br />
The LS estimators are unbiased: E[b_j] = β_j for j = 0, ..., d.<br />
In particular, the sampling distribution for b is multivariate<br />
normal, with mean β = [β_0 · · · β_d]′ and covariance matrix S_b:<br />
b ~ N(β, S_b)<br />
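The unbiasedness claim can be seen in a quick Monte Carlo sketch: simulate many datasets from a known MLR model with a fixed design, refit each one, and check that the estimates average out to the true betas. All numbers here are made up for illustration.

```python
import random

def solve(A, y):
    """Solve A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(Z, Y):
    """Least-squares coefficients via the normal equations."""
    p = len(Z[0])
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    return solve(XtX, XtY)

random.seed(1)
beta = [1.0, 2.0, -1.5]                                    # true coefficients
Z = [[1.0, x1, x2] for x1 in range(5) for x2 in range(4)]  # fixed design, n = 20

draws = []
for _ in range(2000):
    Y = [sum(bj * zj for bj, zj in zip(beta, z)) + random.gauss(0, 1) for z in Z]
    draws.append(ols(Z, Y))

# Averaging over simulations recovers the true betas (unbiasedness).
avg = [sum(d[j] for d in draws) / len(draws) for j in range(3)]
print([round(a, 2) for a in avg])   # ≈ [1.0, 2.0, -1.5]
```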
Inference for Individual Coefficients<br />
Intervals and t-statistics are exactly the same as in SLR.<br />
◮ A (1 − α)100% C.I. for β_j is b_j ± t_{α/2, n−p} s_{b_j}.<br />
◮ z_{b_j} = (b_j − β_j⁰)/s_{b_j} ~ t_{n−p} is the number of standard errors<br />
between the LS estimate and the null value.<br />
Intervals and testing via b_j and s_{b_j} are one-at-a-time procedures:<br />
◮ You are evaluating the j-th coefficient conditional on the other<br />
X's being in the model, but regardless of the values you've<br />
estimated for the other b's.<br />
In Excel...<br />
Let’s run a multiple regression with the Sales and Price data.<br />
Model: Sales_i = b_0 + b_1 P1_i + b_2 P2_i + e_i<br />
[Excel regression output for the Sales model.]<br />
F-tests<br />
◮ In many situations, we need a testing procedure that can<br />
address simultaneous hypotheses about more than one<br />
coefficient.<br />
◮ Why not the t-test?<br />
◮ We will look at two important types of simultaneous tests:<br />
(i) the Overall Test of Significance<br />
(ii) the Partial F-test<br />
The first test will help us determine whether or not our regression<br />
is worth anything... the second will allow us to compare different<br />
models.<br />
Supervisor Performance Data<br />
Suppose you are interested in the relationship between the overall<br />
performance of supervisors and specific activities involving<br />
interactions between supervisors and employees (from a psychology<br />
management study).<br />
The Data<br />
◮ Y = Overall rating of supervisor<br />
◮ X_1 = Handles employee complaints<br />
◮ X_2 = Does not allow special privileges<br />
◮ X_3 = Opportunity to learn new things<br />
◮ X_4 = Raises based on performance<br />
◮ X_5 = Too critical of poor performance<br />
◮ X_6 = Rate of advancing to better jobs<br />
Supervisor Performance Data<br />
[Excel view of the supervisor performance data.]<br />
Supervisor Performance Data<br />
F-tests<br />
Is there any relationship here? Are all the coefficients significant?<br />
What about the regression as a whole?<br />
Why not look at R²?<br />
◮ R² in MLR is still a measure of goodness of fit.<br />
◮ However, it ALWAYS grows as we increase the number of<br />
explanatory variables.<br />
◮ Even if there is no relationship between the X's and Y,<br />
R² > 0!!<br />
◮ To see this, let's look at some “Garbage” data.<br />
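The point is easy to reproduce by simulation. The sketch below (simulated noise, not the lecture's supervisor data) regresses a pure-noise Y on 1, 2, ..., 6 pure-noise "garbage" covariates: R² is positive from the start and can only grow as covariates are added.

```python
import random

def solve(A, y):
    """Solve A b = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(Z, Y):
    """Least-squares coefficients via the normal equations."""
    p = len(Z[0])
    XtX = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    XtY = [sum(z[j] * y for z, y in zip(Z, Y)) for j in range(p)]
    return solve(XtX, XtY)

random.seed(2)
n = 30
Y = [random.gauss(0, 1) for _ in range(n)]                      # pure-noise response
G = [[random.gauss(0, 1) for _ in range(6)] for _ in range(n)]  # 6 garbage X's

def r_squared(k):
    """R² from regressing Y on the first k garbage covariates."""
    Z = [[1.0] + row[:k] for row in G]
    b = ols(Z, Y)
    fitted = [sum(bj * zj for bj, zj in zip(b, z)) for z in Z]
    ybar = sum(Y) / n
    sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))
    sst = sum((y - ybar) ** 2 for y in Y)
    return 1 - sse / sst

vals = [r_squared(k) for k in range(1, 7)]
print([round(v, 3) for v in vals])   # non-decreasing, all > 0
```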
F-tests<br />
Garbage Data<br />
I made up 6 “garbage” variables that have nothing to do with Y...<br />
Garbage Data<br />
◮ R² is 26%!!<br />
◮ As usual, we need a statistical notion of how close is close...<br />
◮ It turns out that if we transform R² we can solve this.<br />
Define<br />
f = [R²/(p − 1)] / [(1 − R²)/(n − p)]<br />
This is the F-test of overall significance. Under the null hypothesis,<br />
f is distributed as<br />
f ~ F_{p−1, n−p}<br />
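The statistic is a one-line computation. Here it is applied to the garbage-data example above: R² = 0.26 and 6 covariates (so p = 7), with n = 30 assumed to match the supervisor data's sample size.

```python
def overall_f(r2, n, p):
    """Overall-significance F statistic: [R²/(p-1)] / [(1-R²)/(n-p)]."""
    return (r2 / (p - 1)) / ((1 - r2) / (n - p))

# Garbage-data example (n = 30 is an assumption, matching the
# supervisor data used elsewhere in this lecture).
f = overall_f(0.26, 30, 7)
print(round(f, 2))   # → 1.35
```

A value of 1.35 is well below the rough f > 4 cutoff discussed next, so despite R² = 26% the regression is not significant.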
The F-test<br />
Recall what we are testing:<br />
H_0: β_1 = β_2 = ... = β_d = 0<br />
H_1: at least one β_j ≠ 0.<br />
Under H_0, f has an F_{p−1, n−p} distribution, with p − 1 numerator and<br />
n − p denominator degrees of freedom.<br />
◮ The F distribution has decreasing variance as the df's increase.<br />
◮ Generally, f > 4 is very significant (reject the null).<br />
The p-value for this test is φ = Pr(F_{p−1, n−p} > f).<br />
The F-test<br />
What kind <strong>of</strong> distribution is this?<br />
[Plot: density of the F distribution with 4 and 50 d.f.]<br />
It is a right-skewed, positive-valued family of distributions, indexed<br />
by two parameters (the two df values).<br />
The F-test<br />
Let’s check this test for the “garbage” data...<br />
How about the original analysis (survey variables)...<br />
[Excel output: F-statistics for both regressions.]<br />
Partial F-tests<br />
◮ What about fitting a reduced model with only a couple of<br />
X's? In other words, do we need all of the X's to explain Y?<br />
◮ For example, in the Supervisor data we could argue that X_1<br />
and X_3 were the most important variables in predicting Y.<br />
◮ The full model (6 covariates) has R²_full = 0.733, while the<br />
model with only X_1 and X_3 has R²_rest = 0.708 (check that!).<br />
◮ Can we make a decision based only on the R² calculations?<br />
NO!!<br />
Partial F-test<br />
With the total F-test, we were asking<br />
“Is this regression worthwhile?”<br />
Now, we’re asking<br />
“Is it useful to add these extra covariates to the regression?”<br />
You always want to use the simplest model possible.<br />
◮ Only add covariates if they are truly informative.<br />
Partial F-test<br />
Consider the regression model<br />
Y = β_0 + β_1 X_1 + ... + β_{d_base} X_{d_base} + β_{d_base+1} X_{d_base+1} + ... + β_{d_full} X_{d_full} + ε<br />
such that d_base is the number of covariates in the base (small)<br />
model and d_full > d_base is the number in the full (larger) model.<br />
The Partial F-test is concerned with the hypotheses<br />
H_0: β_{d_base+1} = β_{d_base+2} = ... = β_{d_full} = 0<br />
H_1: at least one β_j ≠ 0 for j > d_base.<br />
Partial F-test<br />
It turns out that under the null H_0 (i.e., the base model is true),<br />
f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)] ~ F_{d_full−d_base, n−d_full−1}<br />
That is, under the null hypothesis, the ratio of the normalized increase<br />
in R² (R²_full − R²_base) to 1 − R²_full has an F distribution with<br />
d_full − d_base and n − d_full − 1 df.<br />
◮ Big f means that R²_full − R²_base is statistically significant.<br />
◮ Big f means that at least one of the added X's is useful.<br />
Supervisor Performance: Partial F-test<br />
Back to our supervisor data; we want to test<br />
H_0: β_2 = β_4 = β_5 = β_6 = 0<br />
H_1: at least one β_j ≠ 0 for j ∈ {2, 4, 5, 6}.<br />
The F-stat is f = [(0.733 − 0.708)/(6 − 2)] / [(1 − 0.733)/(30 − 6 − 1)] = 0.00625/0.0116 = 0.54.<br />
This leads to a p-value of 0.71... What do we conclude?<br />
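The arithmetic above can be checked with a small function that evaluates the partial-F formula; the numbers reproduce the supervisor example (R²_full = 0.733 with 6 covariates, R²_base = 0.708 with X_1 and X_3, n = 30).

```python
def partial_f(r2_full, r2_base, d_full, d_base, n):
    """Partial F statistic:
    [(R²_full - R²_base)/(d_full - d_base)] / [(1 - R²_full)/(n - d_full - 1)]."""
    num = (r2_full - r2_base) / (d_full - d_base)
    den = (1 - r2_full) / (n - d_full - 1)
    return num / den

# Supervisor example: full model (6 covariates) vs. base model (X1, X3).
f = partial_f(0.733, 0.708, 6, 2, 30)
print(round(f, 2))   # → 0.54
```

Turning f = 0.54 into the p-value of 0.71 requires the tail probability of the F_{4,23} distribution, which in practice comes from a table or a function such as scipy.stats.f.sf (not shown here).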
Glossary and Equations<br />
F-tests and the null hypothesis distributions:<br />
◮ Total: f = [R²/(p − 1)] / [(1 − R²)/(n − p)] ~ F_{p−1, n−p}<br />
◮ Partial: f = [(R²_full − R²_base)/(d_full − d_base)] / [(1 − R²_full)/(n − d_full − 1)] ~ F_{d_full−d_base, n−d_full−1}<br />