
Logit, Probit and Tobit:
Models for Categorical and Limited Dependent Variables

By Rajulton Fernando

Presented at
PLCS/RDC Statistics and Data Series at Western
March 23, 2011


Introduction

• In social science research, categorical data are often collected through surveys.
  – Categorical = Nominal and Ordinal variables
  – They take only a few values that do NOT have a metric.

• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
  – The labor force status of a person.
  – Voting behavior of a person (in favor of a new policy).
  – Whether a person got married or divorced.
  – Whether a person was involved in criminal behaviour, etc.


Introduction

• With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
  – For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi′ of explanatory variables, the probability that yi takes value 1 is modeled as

      P(yi = 1 | xi) = F(xi′β) = F(zi)

    where β is a (K+1)-dimensional column vector of parameters.

• The transformation function F is crucial. It maps the linear combination into [0,1] and satisfies in general F(−∞) = 0, F(+∞) = 1, and ∂F(z)/∂z > 0 [that is, it is a cumulative distribution function].


The Logit and Probit Models

• When the transformation function F is the logistic function, the response probabilities are given by

      P(yi = 1 | xi) = exp(xi′β) / [1 + exp(xi′β)]

• And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

      P(yi = 1 | xi) = Φ(xi′β) = ∫ from −∞ to xi′β of φ(s) ds = ∫ from −∞ to xi′β of (1/√(2π)) e^(−s²/2) ds

• The Logit and Probit models are almost identical (see the Figure on the next slide) and the choice of model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).


[Figure: the logistic and standard normal (probit) response curves, nearly indistinguishable. Source: J.S. Long, 1997]
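A minimal Stata sketch of fitting both response functions to the same binary outcome (hypothetical variables y, x1, x2; not the presentation's data):

. logit y x1 x2
. predict p_logit, pr
. probit y x1 x2
. predict p_probit, pr
. summarize p_logit p_probit

The two predicted-probability series typically come out almost indistinguishable, which is the sense in which the models are "almost identical."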


The Logit and Probit Models

• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.

• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.
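One way to see the scaling directly is to tabulate the two coefficient vectors side by side (a sketch, same hypothetical variables as above):

. quietly logit y x1 x2
. estimates store m_logit
. quietly probit y x1 x2
. estimates store m_probit
. estimates table m_logit m_probit

Each logit coefficient should come out roughly 1.6 to 1.8 times its probit counterpart.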


The Logit and Probit Models

• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when it is near zero than when it is near the middle. Thus, it is a non-linear response function.

• How to interpret the coefficients: In both models,
  If b > 0, p increases as X increases
  If b < 0, p decreases as X increases


The Logit and Probit Models

– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
  Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or change by 28%.

– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the Probit of "success") increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]

– If you like, you can convert the z-score to probabilities using the normal table.
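In Stata, the multiplicative odds effects e^b can be requested directly with the or option (a sketch, hypothetical variables again):

. logit y x1 x2, or

This reports exp(b) for each regressor (the quantity SPSS labels Exp(B)), so a value of 1.28 means a one-unit increase in that regressor multiplies the odds by 1.28.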


Models for Polytomous Data

• B) Polytomous Case
  – Here we need to distinguish between purely nominal variables and truly ordinal variables.
  – When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.

• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates:

      P(y = 1) = exp(β1x) / [1 + exp(β1x) + exp(β2x)]

      P(y = 2) = exp(β2x) / [1 + exp(β1x) + exp(β2x)]

      P(y = 3) = 1 / [1 + exp(β1x) + exp(β2x)]
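A hedged Stata sketch of this three-category multinomial logit, using the 3rd category as reference (hypothetical outcome y coded 1/2/3 and regressor x):

. mlogit y x, baseoutcome(3)

Stata reports one coefficient set for y = 1 and one for y = 2, each contrasted against category 3: the two sets of parameter estimates described above.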


Polytomous Case

– When the variable is truly ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.

– Example: Suppose Y has 4 categories; then,
  • logit(p1) = ln{p1/(1 − p1)} = a1 + bX
  • logit(p1 + p2) = ln{(p1 + p2)/(1 − p1 − p2)} = a2 + bX
  • logit(p1 + p2 + p3) = ln{(p1 + p2 + p3)/(1 − p1 − p2 − p3)} = a3 + bX

– Since these are cumulative logits, the probabilities are attached to being in category j and lower.

– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
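A minimal Stata sketch of the proportional odds model (hypothetical ordinal outcome y and regressor x):

. ologit y x

Stata itself does not report a proportionality test after ologit; the user-written brant command from Long and Freese's SPost package (assuming it is installed) is a common way to test the parallel-lines/proportional-odds assumption, and SPSS's ordinal procedure prints a "test of parallel lines."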


Ordinal Logistic

– a1, a2, a3, … are the "intercepts", satisfying the property a1 < a2 < a3 < …, and are interpreted as "thresholds" of the latent variable.

– Interpretation of parameter estimates depends on the software used! Check the software manual.
  • If the RHS = a + bX, a positive coefficient is associated more with lower-order categories and a negative coefficient is associated more with higher-order categories.
  • If the RHS = a − bX, a negative coefficient is more associated with lower-order categories, and a positive coefficient is more associated with higher-order categories.


Model for Limited Dependent Variable

• C) Tobit Model
• This model is for a metric dependent variable that is "limited" in the sense that we observe it only if it is above or below some cut-off level. For example,
  – wages may be limited from below by the minimum wage
  – the amount donated to charity
  – "top-coding" of income at, say, $300,000
  – time use and leisure activity of individuals
  – extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term "censoring" with the one used in dynamic modeling.]
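In Stata's tobit command, the censoring points are declared with the ll() and ul() options. A sketch with hypothetical variables, matching two of the examples above:

. tobit donation x1 x2, ll(0)
. tobit income x1 x2, ul(300000)

The first treats the outcome as left-censored at zero (e.g., donations to charity); the second as right-censored at the $300,000 top-code.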


The Tobit Model

• The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis – a term coined by Goldberger for Tobin's Probit.

• Reasoning behind it:
  – If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
  – If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
  – The degree of bias in both will increase as the number of observations that take on the value of zero increases. (See Figure on next slide.)


[Figure: OLS fits to censored and truncated data compared with the true latent regression line. Source: J.S. Long]
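The figure's bias pattern can be reproduced with a small simulation, a minimal Stata do-file sketch under assumed true values (intercept 1, slope 1; not from the presentation):

clear
set obs 1000
set seed 1234
generate x = rnormal()
generate ystar = 1 + x + rnormal()   // latent outcome y*
generate y = max(ystar, 0)           // left-censor at zero
regress y x                          // OLS keeping censored cases at y = 0
regress y x if y > 0                 // OLS on the truncated sample
tobit y x, ll(0)                     // tobit approximately recovers 1 and 1

Both OLS fits come out biased in the directions described above, while the tobit estimates are close to the true latent-line parameters.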


The Tobit Model

• The Tobit model uses all of the information, including the information on censoring, and provides consistent estimates.

• It is also a nonlinear model, similar to the probit model. It is estimated using maximum likelihood estimation techniques. For left-censoring at zero, the likelihood function for the tobit model takes the form

      L = ∏ over {i: yi > 0} (1/σ) φ[(yi − xi′β)/σ] × ∏ over {i: yi = 0} [1 − Φ(xi′β/σ)]

• This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).


The Tobit Model

• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.

• But such an interpretation may not be useful, since we are interested in the effect of X on the observable y (or the change in the censored outcome).
  – It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), the probability of falling in the uncensored region (a and b being the lower and upper censoring points).
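A hedged worked example using the estimates shown later in this presentation: at the means of the regressors, Pr(lwf > 0) = .8192 (from the mfx output below), so the effect of age on the unconditional expected outcome is roughly .052157 × .8192 ≈ .0427, noticeably smaller than the latent-variable coefficient .052157.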


Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at
http://www.stata-press.com/data/imeus/womenwk.dta

Descriptive Statistics

                    N     Minimum  Maximum  Mean     Std. Deviation
age                 2000  20       59       36.21    8.287
education           2000  10       20       13.08    3.046
married             2000  0        1        .67      .470
children            2000  0        5        1.64     1.399
wagefull            2000  -1.68    45.81    21.3118  7.01204
wage                1343  5.88     45.81    23.6922  6.30537
lw                  1343  1.77     3.82     3.1267   .28651
work                2000  0        1        .67      .470
lwf                 2000  .00      3.82     2.0996   1.48752
Valid N (listwise)  1343
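Reading the table: work has mean .67, so 1343 of the 2000 women work (.67 × 2000 ≈ 1343), and wage is observed only for them. The variable lwf appears to be log wage with zero filled in for non-workers, which is why it is treated below as left-censored at zero (2000 − 1343 = 657 censored observations, matching the tobit output later).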

Binary Logistic Regression

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2055.829(a)         .212                   .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      6.491        8    .592

Variables in the Equation
                      B       S.E.   Wald      df   Sig.   Exp(B)
Step 1(a)  age        .058    .007    64.359   1    .000   1.060
           education  .098    .019    27.747   1    .000   1.103
           married    .742    .126    34.401   1    .000   2.100
           children   .764    .052   220.110   1    .000   2.148
           Constant  -4.159   .332   156.909   1    .000    .016
a. Variable(s) entered on step 1: age, education, married, children.
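Tying this back to the odds interpretation: the Exp(B) column is e^b, so, for example, being married multiplies the odds of working by e^.742 ≈ 2.10, and each additional child multiplies them by e^.764 ≈ 2.15.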


Binary Probit Regression (in SPSS, use the ordinal regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1645.024
Final            1166.702            478.322      4    .000
Link function: Probit.

Parameter Estimates
                          Estimate   Std. Error   Wald      df   Sig.   95% CI Lower   95% CI Upper
Threshold  [work = 0]     2.037      .209          94.664   1    .000    1.626          2.447
Location   age            .035       .004          67.301   1    .000    .026           .043
           education      .058       .011          28.061   1    .000    .037           .080
           children       .447       .029         243.907   1    .000    .391           .503
           [married=0]    -.431      .074          33.618   1    .000   -.577          -.285
           [married=1]    0(a)       .             .        0    .       .              .
a. This parameter is set to zero because it is redundant.
Link function: Probit.
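A quick check of the scaling claim from earlier: dividing each logit estimate by its probit counterpart gives .058/.035 ≈ 1.66 for age, .098/.058 ≈ 1.69 for education, .742/.431 ≈ 1.72 for married (SPSS reports the married=0 contrast, −.431, which corresponds to +.431 for married=1), and .764/.447 ≈ 1.71 for children, all within the stated 1.6 to 1.8 range.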

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.

First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

Source | SS df MS Number of obs = 2000
-------------+------------------------------ F( 4, 1995) = 134.21
Model | 937.873188 4 234.468297 Prob > F = 0.0000
Residual | 3485.34135 1995 1.74703827 R-squared = 0.2120
-------------+------------------------------ Adj R-squared = 0.2105
Total | 4423.21454 1999 2.21271363 Root MSE = 1.3218
------------------------------------------------------------------------------
lwf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0363624 .003862 9.42 0.000 .0287885 .0439362
married | .3188214 .0690834 4.62 0.000 .1833381 .4543046
children | .3305009 .0213143 15.51 0.000 .2887004 .3723015
education | .0843345 .0102295 8.24 0.000 .0642729 .1043961
_cons | -1.077738 .1703218 -6.33 0.000 -1.411765 -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)


Tobit regression Number of obs = 2000
LR chi2(4) = 461.85
Prob > chi2 = 0.0000
Log likelihood = -3349.9685 Pseudo R2 = 0.0645
------------------------------------------------------------------------------
lwf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .052157 .0057457 9.08 0.000 .0408888 .0634252
married | .4841801 .1035188 4.68 0.000 .2811639 .6871964
children | .4860021 .0317054 15.33 0.000 .4238229 .5481812
education | .1149492 .0150913 7.62 0.000 .0853529 .1445454
_cons | -2.807696 .2632565 -10.67 0.000 -3.323982 -2.291409
-------------+----------------------------------------------------------------
/sigma | 1.872811 .040014 1.794337 1.951285
------------------------------------------------------------------------------
Obs. summary: 657 left-censored observations at lwf<=0
1343 uncensored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
y = Pr(lwf>0) (predict, pr(0,.))
= .81920975

------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
age | .0073278 .00083 8.84 0.000 .005703 .008952 36.208
married*| .0706994 .01576 4.48 0.000 .039803 .101596 .6705
children | .0682813 .00479 14.26 0.000 .058899 .077663 1.6445
educat~n | .0161499 .00216 7.48 0.000 .011918 .020382 13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
y = E(lwf|lwf>0) (predict, e(0,.))
= 2.3102021
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
age | .0314922 .00347 9.08 0.000 .024695 .03829 36.208
married*| .2861047 .05982 4.78 0.000 .168855 .403354 .6705
children | .2934463 .01908 15.38 0.000 .256041 .330852 1.6445
educat~n | .0694059 .00912 7.61 0.000 .051531 .087281 13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
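For reference, mfx comes from older Stata; in current versions the same quantities can be requested with margins (a sketch, assuming the tobit fit above is still active):

. margins, dydx(*) atmeans predict(pr(0,.))
. margins, dydx(*) atmeans predict(e(0,.))

The atmeans option reproduces mfx's evaluation at the means of the regressors.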
