Logit, Probit and Tobit:
Models for Categorical and Limited Dependent Variables

By Rajulton Fernando
Presented at
PLCS/RDC Statistics and Data Series at Western
March 23, 2011
Introduction
• In social science research, categorical data are often collected through surveys.
  – Categorical: Nominal and Ordinal variables
  – They take only a few values that do NOT have a metric.
• A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
  – The labor force status of a person.
  – Voting behavior of a person (in favor of a new policy).
  – Whether a person got married or divorced.
  – Whether a person was involved in criminal behaviour, etc.
Introduction
• With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
  – For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi′ of explanatory variables, the probability that yi takes value 1 is modeled as

      P(yi = 1 | xi) = F(xi′β) = F(zi)

    where β is a (K+1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and satisfies in general F(−∞) = 0, F(+∞) = 1, and ∂F(z)/∂z > 0 [that is, it is a cumulative distribution function].
The Logit and Probit Models
• When the transformation function F is the logistic function, the response probabilities are given by

      P(yi = 1 | xi) = e^(xi′β) / (1 + e^(xi′β))

• And, when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

      P(yi = 1 | xi) = Φ(xi′β) = ∫_{−∞}^{xi′β} φ(s) ds = ∫_{−∞}^{xi′β} (1/√(2π)) e^(−s²/2) ds

• The Logit and Probit models are almost identical (see the Figure next slide) and the choice of the model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).
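The two response functions can be compared numerically. A minimal sketch (not from the slides; assumes NumPy and SciPy are available):

```python
# Sketch: logit vs. probit response probabilities for the same linear
# index z = x'beta. Illustrative only; values of z are made up.
import numpy as np
from scipy.stats import norm

def logit_prob(z):
    """Logistic cdf: P(y=1|x) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + np.exp(-z))

def probit_prob(z):
    """Standard normal cdf: P(y=1|x) = Phi(z)."""
    return norm.cdf(z)

for zi in np.linspace(-3.0, 3.0, 7):
    print(f"z={zi:+.1f}  logit={logit_prob(zi):.3f}  probit={probit_prob(zi):.3f}")
```

Both curves are S-shaped cdfs passing through (0, 0.5); the normal cdf has thinner tails, which is one reason the two models' coefficients are scaled differently.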
[Figure: comparison of the logistic and normal cdf response curves. Source: J.S. Long, 1997]
The Logit and Probit Models
• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of β is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.
The Logit and Probit Models
• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when it is near zero than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
  If b > 0, p increases as X increases
  If b < 0, p decreases as X increases
The Logit and Probit Models
– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds.
  Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or change by 28%.
– In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the Probit of “success”) increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to probabilities using the normal table.
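The odds interpretation can be checked directly. A small sketch with the slide's b = 0.25; the intercept a and the values of X are made up for illustration:

```python
# Sketch: a one-unit change in X multiplies the odds by e^b, whatever
# the starting X or the intercept. b = 0.25 as in the example above;
# the intercept a = -1.0 is made up.
import math

b = 0.25

def odds(a, b, x):
    """Odds p/(1-p) implied by the logit model: exp(a + b*x)."""
    return math.exp(a + b * x)

odds_ratio = math.exp(b)                          # e^0.25, about 1.284
ratio = odds(-1.0, b, 3.0) / odds(-1.0, b, 2.0)   # same multiplicative effect
print(odds_ratio, ratio)
```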
Models for Polytomous Data
• B) Polytomous Case
  – Here we need to distinguish between purely nominal variables and really ordinal variables.
  – When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m−1 compared to the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates:

      P(y = 1) = exp(β1x) / [1 + exp(β1x) + exp(β2x)]
      P(y = 2) = exp(β2x) / [1 + exp(β1x) + exp(β2x)]
      P(y = 3) = 1 / [1 + exp(β1x) + exp(β2x)]
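A numerical sketch of these three equations; β1, β2 and x are made-up scalars for illustration:

```python
# Sketch of the 3-category multinomial logit above, with category 3 as
# the reference. beta1 = 0.5, beta2 = -0.3 and x = 2.0 are made up.
import math

def mnl_probs(b1, b2, x):
    """Return (P(y=1), P(y=2), P(y=3)) for the reference-category model."""
    e1, e2 = math.exp(b1 * x), math.exp(b2 * x)
    denom = 1.0 + e1 + e2
    return e1 / denom, e2 / denom, 1.0 / denom

p1, p2, p3 = mnl_probs(0.5, -0.3, 2.0)
print(p1, p2, p3, p1 + p2 + p3)   # three probabilities summing to 1
```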
Polytomous Case
– When the variable is really ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then,
  • logit(p1) = ln{p1/(1 − p1)} = a1 + bX
  • logit(p1 + p2) = ln{(p1 + p2)/(1 − p1 − p2)} = a2 + bX
  • logit(p1 + p2 + p3) = ln{(p1 + p2 + p3)/(1 − p1 − p2 − p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are attached to being in category j and lower.
– Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the Proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
Ordinal Logistic
– a1, a2, a3, … are the “intercepts” that satisfy the property a1 < a2 < a3 …, interpreted as “thresholds” of the latent variable.
– Interpretation of parameter estimates depends on the software used! Check the software manual.
  • If the RHS = a + bX, a positive coefficient is associated more with lower order categories and a negative coefficient is associated more with higher order categories.
  • If the RHS = a − bX, a negative coefficient is more associated with lower ordered categories, and a positive coefficient is more associated with higher ordered categories.
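The proportional-odds structure above can be sketched with RHS = a + bX; the thresholds and slope below are made-up numbers:

```python
# Sketch: proportional-odds (cumulative logit) model for a 4-category Y.
# One common slope b; rising thresholds a1 < a2 < a3. Numbers are made up.
import math

def cum_logit_probs(a, b, x):
    """Category probabilities implied by logit{P(Y<=j)} = a_j + b*x."""
    cum = [1.0 / (1.0 + math.exp(-(aj + b * x))) for aj in a] + [1.0]
    # differencing the cumulative probabilities recovers each category
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

a = [-1.0, 0.5, 2.0]                 # thresholds, a1 < a2 < a3
probs = cum_logit_probs(a, b=0.4, x=1.0)
print(probs, sum(probs))             # four probabilities summing to 1
```

Because only the intercept changes across the cumulative logits, the slope b is shared, which is exactly the proportionality assumption the slide says must be tested.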
Model for Limited Dependent Variable
• C) Tobit Model
• This model is for a metric dependent variable that is “limited” in the sense that we observe it only if it is above or below some cut-off level. For example,
  – Wages may be limited from below by the minimum wage
  – The donation amount given to charity
  – “Top coding” income at, say, $300,000
  – Time use and leisure activity of individuals
  – Extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term “censoring” with the one used in dynamic modeling.]
The Tobit Model
• The model is called Tobit because it was first proposed by Tobin (1958), and involves aspects of Probit analysis – a term coined by Goldberger for Tobin’s Probit.
• Reasoning behind:
  – If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
  – If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
  – The degree of bias in both will increase as the number of observations that take on the value of zero increases. (see Figure next slide)
[Figure: OLS fits when censored observations are included vs. excluded. Source: J.S. Long]
The Tobit Model
• The Tobit model uses all of the information, including info on censoring, and provides consistent estimates.
• It is also a nonlinear model and similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model (censored from below at 0) takes the form:

      L = ∏_{yi > 0} (1/σ) φ((yi − xi′β)/σ) × ∏_{yi = 0} [1 − Φ(xi′β/σ)]

• This is an unusual function; it consists of two terms, the first for non-censored observations (it is the pdf), and the second for censored observations (it is the cdf).
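The two-term structure can be written out in code. A sketch under the assumption of left-censoring at zero; the data below are made up:

```python
# Sketch: tobit log-likelihood with left-censoring at 0, mirroring the
# two-term description above: log pdf terms for uncensored observations,
# log cdf terms for censored ones. X and y below are made up.
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, X, y):
    """log L = sum log[(1/s)phi((y-x'b)/s)] + sum log[1 - Phi(x'b/s)]."""
    xb = X @ beta
    cens = y <= 0
    ll_unc = norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)
    ll_cen = norm.logcdf(-xb[cens] / sigma)   # P(y* <= 0) for censored obs
    return ll_unc.sum() + ll_cen.sum()

X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([0.0, 0.0, 1.2, 2.5])
print(tobit_loglik(np.array([-1.0, 0.6]), 1.0, X, y))
```

Maximizing this function over (β, σ), e.g. with `scipy.optimize.minimize` on its negative, is what Stata's `tobit` does internally.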
The Tobit Model
• The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful since we are interested in the effect of X on the observable y (or change in the censored outcome).
  – It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), the probability of being uncensored.
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at
http://www.stata-press.com/data/imeus/womenwk.dta
Descriptive Statistics
                     N      Minimum   Maximum   Mean      Std. Deviation
age                  2000   20        59        36.21     8.287
education            2000   10        20        13.08     3.046
married              2000   0         1         .67       .470
children             2000   0         5         1.64      1.399
wagefull             2000   -1.68     45.81     21.3118   7.01204
wage                 1343   5.88      45.81     23.6922   6.30537
lw                   1343   1.77      3.82      3.1267    .28651
work                 2000   0         1         .67       .470
lwf                  2000   .00       3.82      2.0996    1.48752
Valid N (listwise)   1343
Binary Logistic Regression

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2055.829a           .212                   .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      6.491        8    .592

Variables in the Equation
                     B        S.E.   Wald      df   Sig.   Exp(B)
Step 1a   age        .058     .007   64.359    1    .000   1.060
          education  .098     .019   27.747    1    .000   1.103
          married    .742     .126   34.401    1    .000   2.100
          children   .764     .052   220.110   1    .000   2.148
          Constant   -4.159   .332   156.909   1    .000   .016
a. Variable(s) entered on step 1: age, education, married, children.
Binary Probit Regression (in SPSS, use the ordinal regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   1645.024
Final            1166.702            478.322      4    .000
Link function: Probit.

Parameter Estimates
                           Estimate   Std. Error   Wald      df   Sig.   95% CI Lower   95% CI Upper
Threshold   [work = 0]     2.037      .209         94.664    1    .000   1.626          2.447
Location    age            .035       .004         67.301    1    .000   .026           .043
            education      .058       .011         28.061    1    .000   .037           .080
            children       .447       .029         243.907   1    .000   .391           .503
            [married=0]    -.431      .074         33.618    1    .000   -.577          -.285
            [married=1]    0a         .            .         0    .      .              .
Link function: Probit.
a. This parameter is set to zero because it is redundant.
Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.
First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------
. tobit lwf age married children education, ll(0)
Tobit regression                                   Number of obs =       2000
                                                   LR chi2(4)    =     461.85
                                                   Prob > chi2   =     0.0000
Log likelihood = -3349.9685                        Pseudo R2     =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
Obs. summary:        657  left-censored observations at lwf<=0
                    1343  uncensored observations

. mfx compute, predict(pr(0,.))

Marginal effects after tobit
y = Pr(lwf>0) (predict, pr(0,.))
  = .81920975
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952   36.208
married* |   .0706994      .01576    4.48   0.000   .039803  .101596    .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663   1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
y = E(lwf|lwf>0) (predict, e(0,.))
  = 2.3102021
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829   36.208
married* |   .2861047      .05982    4.78   0.000   .168855  .403354    .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852   1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281   13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
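The Pr(lwf>0) reported by mfx can be reproduced by hand from the tobit estimates. A sketch using the coefficients, sigma, and sample means copied from the Stata output above; the formulas Φ(x′β/σ) and φ(x′β/σ)·β_age/σ are the standard tobit probability and its probability-scale marginal effect:

```python
# Sketch: at the sample means, Pr(lwf > 0) = Phi(x'b / sigma) and the
# probability-scale marginal effect of age is phi(x'b/sigma) * b_age / sigma.
# All numbers are copied from the tobit and mfx output above.
from scipy.stats import norm

beta = {"age": .052157, "married": .4841801, "children": .4860021,
        "education": .1149492, "_cons": -2.807696}
xbar = {"age": 36.208, "married": .6705, "children": 1.6445,
        "education": 13.084, "_cons": 1.0}
sigma = 1.872811

z = sum(beta[k] * xbar[k] for k in beta) / sigma
print(norm.cdf(z))                          # matches Stata's .81920975
print(norm.pdf(z) * beta["age"] / sigma)    # matches mfx dy/dx for age, .0073278
```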