
Logit, Probit and Tobit:
Models for Categorical and Limited
Dependent Variables

By Rajulton Fernando

Presented at
PLCS/RDC Statistics and Data Series at Western
March 23, 2011


Introduction

• In social science research, categorical data are often collected through surveys.
– Categorical: Nominal and Ordinal variables.
– They take only a few values that do NOT have a metric.

• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
– The labor force status of a person.
– Voting behavior of a person (in favor of a new policy).
– Whether a person got married or divorced.
– Whether a person was involved in criminal behaviour, etc.


Introduction

• With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
– For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi′ of explanatory variables, the probability that yi takes the value 1 is modeled as

P(yi = 1 | xi) = F(xi′β) = F(zi)

where β is a (K+1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and satisfies, in general, F(−∞) = 0, F(+∞) = 1, and dF(z)/dz > 0 [that is, it is a cumulative distribution function].


The Logit and Probit Models

• When the transformation function F is the logistic function, the response probabilities are given by

P(yi = 1 | xi) = exp(xi′β) / [1 + exp(xi′β)]

• And when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

P(yi = 1 | xi) = Φ(xi′β) = ∫ from −∞ to xi′β of φ(s) ds,  where φ(s) = (1/√(2π)) exp(−s²/2) is the standard normal density.

• The Logit and Probit models are almost identical (see the Figure on the next slide), and the choice between them is largely arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).


[Figure: the logit and probit response curves, nearly identical S-shaped functions. Source: J.S. Long, 1997]


The Logit and Probit Models

• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.
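
The scaling difference is easy to check by fitting both models to the same data. A minimal Stata sketch, assuming the womenwk.dta illustration used at the end of this deck:

. use http://www.stata-press.com/data/imeus/womenwk.dta, clear
. logit work age education married children
. estimates store logitfit
. probit work age education married children
. estimates store probitfit
. estimates table logitfit probitfit, b(%9.3f)

(Each logit coefficient should come out roughly 1.6 to 1.8 times its probit counterpart.)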


The Logit and Probit Models

• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when it is near zero than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
– If b > 0, p increases as X increases.
– If b < 0, p decreases as X increases.


The Logit and Probit Models

– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
Example: If b = 0.25, then e^0.25 ≈ 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or by 28%.
– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the Probit of "success") increases by b units. [Or, we can also say that a unit increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to probabilities using the normal table.
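
The odds interpretation is easy to verify in Stata; a minimal sketch (the or option reports exp(b), the odds ratios, directly):

. display exp(0.25)
1.2840254
. logit work age education married children, or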


Models for Polytomous Data

• B) Polytomous Case
– Here we need to distinguish between purely nominal variables and really ordinal variables.
– When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates:

P(y = 1) = exp(β1x) / [1 + exp(β1x) + exp(β2x)]
P(y = 2) = exp(β2x) / [1 + exp(β1x) + exp(β2x)]
P(y = 3) = 1 / [1 + exp(β1x) + exp(β2x)]
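
In Stata this is the mlogit command. A minimal sketch with a hypothetical three-category outcome ycat and predictor x:

. mlogit ycat x, baseoutcome(3)

(This reports one set of coefficients contrasting y = 1 with y = 3 and another contrasting y = 2 with y = 3.)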


Polytomous Case

– When the variable is really ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then,
• logit(p1) = ln{p1 / (1 − p1)} = a1 + bX
• logit(p1 + p2) = ln{(p1 + p2) / (1 − p1 − p2)} = a2 + bX
• logit(p1 + p2 + p3) = ln{(p1 + p2 + p3) / (1 − p1 − p2 − p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are attached to being in category j and lower.
– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the Proportional odds model. Thus, in ordered logistic regression, we also need to test the assumption of proportionality (see the sketch below).
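
A minimal Stata sketch for a hypothetical four-category ordinal outcome yord: one slope per predictor is estimated, along with the cut points a1 < a2 < a3.

. ologit yord x

(The proportionality assumption can be tested with a user-written command such as brant, from Long and Freese's spost package.)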


Ordinal Logistic

– a1, a2, a3, ... are the "intercepts" that satisfy the property a1 < a2 < a3 ..., interpreted as "thresholds" of the latent variable.
– Interpretation of parameter estimates depends on the software used! Check the software manual.
• If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
• If the RHS = a − bX, a negative coefficient is more associated with lower ordered categories, and a positive coefficient is more associated with higher ordered categories.
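
For example, Stata's ologit uses the a − bX form for the cumulative probabilities Pr(y ≤ j). Predicted probabilities are a parameterization-free way to compare packages; a minimal sketch with the hypothetical yord from above:

. ologit yord x
. predict p1 p2 p3 p4

(p1 through p4 are the predicted category probabilities, which are identical whichever sign convention the software reports.)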


Model for Limited Dependent Variable

• C) Tobit Model
• This model is for a metric dependent variable that is "limited" in the sense that we observe it only if it is above or below some cut-off level. For example,
– wages may be limited from below by the minimum wage
– the donation amount given to charity
– "top coding" income at, say, $300,000
– time use and leisure activity of individuals
– extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term "censoring" with the one used in dynamic modeling.]
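
In its standard latent-variable form (with left-censoring at zero, as in Tobin's original setup):

y*i = xi′β + εi,  εi ~ N(0, σ²)
yi = y*i if y*i > 0,  and yi = 0 if y*i ≤ 0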


The Tobit Model

• The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis (a term coined by Goldberger for "Tobin's Probit").
• Reasoning behind:
– If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
– If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
– The degree of bias in both will increase as the number of observations that take on the value of zero increases. (See the Figure and the simulation sketch on the next slide.)


[Figure: OLS fits when censored observations are included as zeros vs. dropped, compared with the true regression line. Source: J.S. Long]
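
A minimal Stata simulation sketch of these two biases (purely illustrative; no dataset assumed):

. clear
. set obs 1000
. set seed 12345
. gen x = rnormal()
. gen ystar = 1 + 2*x + rnormal()   // latent outcome
. gen y = max(0, ystar)             // left-censored at zero
. regress y x                       // zeros included as data: biased OLS
. regress y x if y > 0              // truncated sample: biased the other way
. tobit y x, ll(0)                  // uses the censoring information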


The Tobit Model

• The Tobit model uses all of the information, including the information on censoring, and provides consistent estimates.
• It is also a nonlinear model, similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model takes the form:
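
For a sample censored from below at zero, the standard form of this likelihood (matching the two-term description below) is:

L(β, σ) = ∏ over yi > 0 of (1/σ) φ((yi − xi′β)/σ)  ×  ∏ over yi = 0 of Φ(−xi′β/σ)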

• This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).


The Tobit Model

• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful, since we are interested in the effect of X on the observable y (or the change in the censored outcome).
– It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), the probability that the observation is uncensored.
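
Spelled out for the standard case of left-censoring at zero with no upper limit (a sketch of the usual result):

∂E(y | x)/∂xj = βj × Pr(y* > 0) = βj × Φ(x′β/σ)

so each coefficient is scaled down by the probability of being uncensored.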


Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at
http://www.stata-press.com/data/imeus/womenwk.dta

Descriptive Statistics

Variable    N     Minimum  Maximum  Mean     Std. Deviation
age         2000  20       59       36.21     8.287
education   2000  10       20       13.08     3.046
married     2000   0        1         .67      .470
children    2000   0        5        1.64     1.399
wagefull    2000  -1.68    45.81    21.3118   7.01204
wage        1343   5.88    45.81    23.6922   6.30537
lw          1343   1.77     3.82     3.1267    .28651
work        2000   0        1         .67      .470
lwf         2000   .00      3.82     2.0996   1.48752
Valid N (listwise): 1343

Binary Logistic Regression

Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     2055.829a          .212                  .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step  Chi-square  df  Sig.
1     6.491       8   .592

Variables in the Equation
                        B      S.E.     Wald  df  Sig.  Exp(B)
Step 1a  age          .058    .007    64.359   1  .000   1.060
         education    .098    .019    27.747   1  .000   1.103
         married      .742    .126    34.401   1  .000   2.100
         children     .764    .052   220.110   1  .000   2.148
         Constant   -4.159    .332   156.909   1  .000    .016
a. Variable(s) entered on step 1: age, education, married, children.


Binary Probit Regression (in SPSS, use the ordinal regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model           -2 Log Likelihood  Chi-Square  df  Sig.
Intercept Only  1645.024
Final           1166.702           478.322     4   .000
Link function: Probit.

Parameter Estimates
                                                              95% Confidence Interval
                       Estimate  Std. Error  Wald     df  Sig.  Lower Bound  Upper Bound
Threshold  [work = 0]   2.037    .209         94.664   1  .000   1.626        2.447
Location   age           .035    .004         67.301   1  .000    .026         .043
           education     .058    .011         28.061   1  .000    .037         .080
           children      .447    .029        243.907   1  .000    .391         .503
           [married=0]  -.431    .074         33.618   1  .000   -.577        -.285
           [married=1]   0a      .            .        0  .      .            .
Link function: Probit.
a. This parameter is set to zero because it is redundant.
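
For comparison, a minimal sketch of the same two binary models in Stata (same dataset; note that SPSS's factor coding of married as [married=0] flips the sign relative to a 0/1 regressor):

. use http://www.stata-press.com/data/imeus/womenwk.dta, clear
. logit work age education married children
. probit work age education married children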

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.

First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)


Tobit regression                                Number of obs   =       2000
                                                LR chi2(4)      =     461.85
                                                Prob > chi2     =     0.0000
Log likelihood = -3349.9685                     Pseudo R2       =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
      y  = Pr(lwf>0) (predict, pr(0,.))
         =  .81920975

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952    36.208
married* |   .0706994      .01576    4.48   0.000   .039803  .101596     .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663    1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
      y  = E(lwf|lwf>0) (predict, e(0,.))
         =  2.3102021

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829    36.208
married* |   .2861047      .05982    4.78   0.000   .168855  .403354     .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852    1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
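
A usage note: mfx is Stata's older marginal-effects command; in current versions of Stata the same quantities are obtained with margins after the tobit fit:

. margins, dydx(*) predict(pr(0,.))   // effects on Pr(lwf > 0)
. margins, dydx(*) predict(e(0,.))    // effects on E(lwf | lwf > 0)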
