Logit, Probit and Tobit:
Models for Categorical and Limited Dependent Variables

By Rajulton Fernando
Presented at
PLCS/RDC Statistics and Data Series at Western
March 23, 2011
Introduction
• In social science research, categorical data are often collected through surveys.
  – Categorical: Nominal and Ordinal variables
  – They take only a few values that do NOT have a metric.
• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
  – The labor force status of a person.
  – Voting behavior of a person (in favor of a new policy).
  – Whether a person got married or divorced.
  – Whether a person was involved in criminal behaviour, etc.
Introduction
• With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
  – For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi′ of explanatory variables, the probability that yi takes value 1 is modeled as

      P(yi = 1 | xi) = F(xi′β) = F(zi)

    where β is a (K+1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and satisfies in general F(−∞) = 0, F(+∞) = 1, and ∂F(z)/∂z > 0 [that is, it is a cumulative distribution function].
The Logit and Probit Models
• When the transformation function F is the logistic function, the response probabilities are given by

      P(yi = 1 | xi) = e^(xi′β) / (1 + e^(xi′β))

• And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

      P(yi = 1 | xi) = Φ(xi′β) = ∫_{−∞}^{xi′β} φ(s) ds = ∫_{−∞}^{xi′β} (1/√(2π)) e^(−s²/2) ds

• The Logit and Probit models are almost identical (see the Figure next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).
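The two response functions can be compared numerically. A minimal sketch (not from the slides; assumes NumPy and SciPy are available):

```python
# Sketch: logit vs. probit response probabilities for the same linear
# index z = x'beta. Illustrative only; values of z are made up.
import numpy as np
from scipy.stats import norm

def logit_prob(z):
    """Logistic cdf: P(y=1|x) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + np.exp(-z))

def probit_prob(z):
    """Standard normal cdf: P(y=1|x) = Phi(z)."""
    return norm.cdf(z)

for zi in np.linspace(-3.0, 3.0, 7):
    print(f"z={zi:+.1f}  logit={logit_prob(zi):.3f}  probit={probit_prob(zi):.3f}")
```

Both curves are S-shaped cdfs passing through (0, 0.5); the normal cdf has thinner tails, which is one reason the two models' coefficients are scaled differently.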
[Figure: comparison of the logistic and normal cdf response curves. Source: J.S. Long, 1997]
The Logit and Probit Models
• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.
The Logit and Probit Models
• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when it is near zero than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
  If b > 0, p increases as X increases
  If b < 0, p decreases as X increases
The Logit and Probit Models
– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
  Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or change by 28%.
– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the Probit of “success”) increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to probabilities using the normal table.
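The odds interpretation can be checked directly. A small sketch with the slide's b = 0.25; the intercept a and the values of X are made up for illustration:

```python
# Sketch: a one-unit change in X multiplies the odds by e^b, whatever
# the starting X or the intercept. b = 0.25 as in the example above;
# the intercept a = -1.0 is made up.
import math

b = 0.25

def odds(a, b, x):
    """Odds p/(1-p) implied by the logit model: exp(a + b*x)."""
    return math.exp(a + b * x)

odds_ratio = math.exp(b)                          # e^0.25, about 1.284
ratio = odds(-1.0, b, 3.0) / odds(-1.0, b, 2.0)   # same multiplicative effect
print(odds_ratio, ratio)
```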
Models for Polytomous Data
• B) Polytomous Case
  – Here we need to distinguish between purely nominal variables and really ordinal variables.
  – When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates:

      P(y = 1) = exp(β1x) / [1 + exp(β1x) + exp(β2x)]
      P(y = 2) = exp(β2x) / [1 + exp(β1x) + exp(β2x)]
      P(y = 3) = 1 / [1 + exp(β1x) + exp(β2x)]
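A numerical sketch of these three equations; β1, β2 and x are made-up scalars for illustration:

```python
# Sketch of the 3-category multinomial logit above, with category 3 as
# the reference. beta1 = 0.5, beta2 = -0.3 and x = 2.0 are made up.
import math

def mnl_probs(b1, b2, x):
    """Return (P(y=1), P(y=2), P(y=3)) for the reference-category model."""
    e1, e2 = math.exp(b1 * x), math.exp(b2 * x)
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

p1, p2, p3 = mnl_probs(0.5, -0.3, 2.0)
print(p1, p2, p3, p1 + p2 + p3)   # three probabilities summing to 1
```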
Polytomous Case
– When the variable is really ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then,
  • logit(p1) = ln{p1/(1 − p1)} = a1 + bX
  • logit(p1 + p2) = ln{(p1 + p2)/(1 − p1 − p2)} = a2 + bX
  • logit(p1 + p2 + p3) = ln{(p1 + p2 + p3)/(1 − p1 − p2 − p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are attached to being in category j and lower.
– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the Proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
Ordinal Logistic
– a1, a2, a3, … are the “intercepts” that satisfy the property a1 < a2 < a3 …, interpreted as “thresholds” of the latent variable.
– Interpretation of parameter estimates depends on the software used! Check the software manual.
  • If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
  • If the RHS = a − bX, a negative coefficient is more associated with lower ordered categories, and a positive coefficient is more associated with higher ordered categories.
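The proportional-odds structure above can be sketched with RHS = a + bX; the thresholds and slope below are made-up numbers:

```python
# Sketch: proportional-odds (cumulative logit) model for a 4-category Y.
# One common slope b; rising thresholds a1 < a2 < a3. Numbers are made up.
import math

def cum_logit_probs(a, b, x):
    """Category probabilities implied by logit{P(Y<=j)} = a_j + b*x."""
    cum = [1.0 / (1.0 + math.exp(-(aj + b * x))) for aj in a] + [1.0]
    # differencing the cumulative probabilities recovers each category
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

a = [-1.0, 0.5, 2.0]                 # thresholds, a1 < a2 < a3
probs = cum_logit_probs(a, b=0.4, x=1.0)
print(probs, sum(probs))             # four probabilities summing to 1
```

Because only the intercept changes across the cumulative logits, the slope b is shared, which is exactly the proportionality assumption the slide says must be tested.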
Model for Limited Dependent Variable
• C) Tobit Model
• This model is for a metric dependent variable that is “limited” in the sense that we observe it only if it is above or below some cut-off level. For example,
  – Wages may be limited from below by the minimum wage
  – The donation amount given to charity
  – “Top coding” income at, say, $300,000
  – Time use and leisure activity of individuals
  – Extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term “censoring” with the one used in dynamic modeling.]
The Tobit Model
• The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis – a term coined by Goldberger for Tobin’s Probit.
• Reasoning behind:
  – If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
  – If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
  – The degree of bias in both will increase as the number of observations that take on the value of zero increases. (see Figure next slide)
[Figure: OLS fits when censored observations are included vs. excluded. Source: J.S. Long]
The Tobit Model
• The Tobit model uses all of the information, including info on censoring, and provides consistent estimates.
• It is also a nonlinear model and similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model (censored from below at 0) takes the form:

      L = ∏_{yi > 0} (1/σ) φ((yi − xi′β)/σ) × ∏_{yi = 0} [1 − Φ(xi′β/σ)]

• This is an unusual function; it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
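The two-term structure can be written out in code. A sketch under the assumption of left-censoring at zero; the data below are made up:

```python
# Sketch: tobit log-likelihood with left-censoring at 0, mirroring the
# two-term description above: log pdf terms for uncensored observations,
# log cdf terms for censored ones. X and y below are made up.
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, X, y):
    """log L = sum log[(1/s)phi((y-x'b)/s)] + sum log[1 - Phi(x'b/s)]."""
    xb = X @ beta
    cens = y <= 0
    ll_unc = norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)
    ll_cen = norm.logcdf(-xb[cens] / sigma)   # P(y* <= 0) for censored obs
    return ll_unc.sum() + ll_cen.sum()

X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([0.0, 0.0, 1.2, 2.5])
print(tobit_loglik(np.array([-1.0, 0.6]), 1.0, X, y))
```

Maximizing this function over (β, σ), e.g. with `scipy.optimize.minimize` on its negative, is what Stata's `tobit` does internally.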
The Tobit Model
• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful since we are interested in the effect of X on the observable y (or change in the censored outcome).
  – It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), the probability of being uncensored.
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at
http://www.stata-press.com/data/imeus/womenwk.dta
Descriptive Statistics
                     N      Minimum   Maximum   Mean      Std. Deviation
age                  2000   20        59        36.21     8.287
education            2000   10        20        13.08     3.046
married              2000   0         1         .67       .470
children             2000   0         5         1.64      1.399
wagefull             2000   -1.68     45.81     21.3118   7.01204
wage                 1343   5.88      45.81     23.6922   6.30537
lw                   1343   1.77      3.82      3.1267    .28651
work                 2000   0         1         .67       .470
lwf                  2000   .00       3.82      2.0996    1.48752
Valid N (listwise)   1343
Binary Logistic Regression

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2055.829a           .212                   .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      6.491        8    .592

Variables in the Equation
                     B        S.E.   Wald      df   Sig.   Exp(B)
Step 1a   age        .058     .007   64.359    1    .000   1.060
          education  .098     .019   27.747    1    .000   1.103
          married    .742     .126   34.401    1    .000   2.100
          children   .764     .052   220.110   1    .000   2.148
          Constant   -4.159   .332   156.909   1    .000   .016
a. Variable(s) entered on step 1: age, education, married, children.
Binary Probit Regression (in SPSS, use the ordinal regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1645.024
Final            1166.702            478.322      4    .000
Link function: Probit.

Parameter Estimates
                           Estimate   Std. Error   Wald      df   Sig.   95% CI Lower   95% CI Upper
Threshold   [work = 0]     2.037      .209         94.664    1    .000   1.626          2.447
Location    age            .035       .004         67.301    1    .000   .026           .043
            education      .058       .011         28.061    1    .000   .037           .080
            children       .447       .029         243.907   1    .000   .391           .503
            [married=0]    -.431      .074         33.618    1    .000   -.577          -.285
            [married=1]    0a         .            .         0    .      .              .
Link function: Probit.
a. This parameter is set to zero because it is redundant.
Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.
First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------
. tobit lwf age married children education, ll(0)
Tobit regression                                   Number of obs =       2000
                                                   LR chi2(4)    =     461.85
                                                   Prob > chi2   =     0.0000
Log likelihood = -3349.9685                        Pseudo R2     =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
Obs. summary:        657  left-censored observations at lwf<=0
                    1343  uncensored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
y = Pr(lwf>0) (predict, pr(0,.))
  = .81920975
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952   36.208
married* |   .0706994      .01576    4.48   0.000   .039803  .101596    .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663   1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
y = E(lwf|lwf>0) (predict, e(0,.))
  = 2.3102021
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829   36.208
married* |   .2861047      .05982    4.78   0.000   .168855  .403354    .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852   1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
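The Pr(lwf>0) reported by mfx can be reproduced by hand from the tobit estimates. A sketch using the coefficients, sigma, and sample means copied from the Stata output above; the formulas Φ(x′β/σ) and φ(x′β/σ)·β_age/σ are the standard tobit probability and its probability-scale marginal effect:

```python
# Sketch: at the sample means, Pr(lwf > 0) = Phi(x'b / sigma) and the
# probability-scale marginal effect of age is phi(x'b/sigma) * b_age / sigma.
# All numbers are copied from the tobit and mfx output above.
from scipy.stats import norm

beta = {"age": .052157, "married": .4841801, "children": .4860021,
        "education": .1149492, "_cons": -2.807696}
xbar = {"age": 36.208, "married": .6705, "children": 1.6445,
        "education": 13.084, "_cons": 1.0}
sigma = 1.872811

z = sum(beta[k] * xbar[k] for k in beta) / sigma
print(norm.cdf(z))                          # matches Stata's .81920975
print(norm.pdf(z) * beta["age"] / sigma)    # matches mfx dy/dx for age, .0073278
```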