
Count Regression Introduction

Paul Johnson

December 6, 2011


Welcome

This is not ready yet.


Outline

1 Motivation
2 Poisson
3 Poisson: λ
    Graph the Poisson Distribution to Get a "Feeling"
4 Negative Binomial
    Gamma distributed heterogeneity
    Gamma distribution background
    NB estimation
    Overdispersion
5 Zero inflated models
6 Additional Readings




Motivation

"Count" means:

integer valued: 0, 1, 2, ...
never negative (0 or greater).

If the expected number of observed cases is large, and the count data are drawn from the Poisson distribution, then OLS might be OK. The Poisson is then not all that different from a Normal distribution, so the Normal case can sometimes be thought of as an approximation.

But if the expected count is small, the Poisson distribution is not even a little bit like a Normal distribution.


Alternatives might not be as good

There are alternatives:

tobit (OLS, but with a truncation at 0)
ordinal logit/probit


Generalized Linear Model Approach

Recall: we asserted that the dependent variable $y_i$ is the sum of a "predictable part" and a "random part": $y_i = b_0 + b_1 x_i + e_i$.

If $e_i$ is "Normal with a mean of 0 and standard deviation $\sigma_e$", then $y_i$ is "Normal with a mean of $b_0 + b_1 x_i$ and a standard deviation of $\sigma_e$."

So OLS with an assumed Normal error implies

$$y_i \sim N(X_i b, \sigma_e^2)$$

The symbol "$\sim$" means "is distributed as" or "is drawn from". $X_i b$ is shorthand matrix notation for the "linear predictor", $b_0 + b_1 x1_i + b_2 x2_i$.


What's the big point here?

We think of the predictor as determining a parameter in a distribution from which observations are drawn.


You can use any distribution you want

$y_i$ can be drawn from any distribution you want.

Let the properties of that distribution depend on input variables and parameters.

For a "count" model, all you absolutely need is an integer-valued distribution for which $y_i \geq 0$. Two possibilities:

Poisson
Negative Binomial




Poisson: λ

The Poisson is a "one parameter" distribution

The parameter is usually called λ, and that parameter determines both the expected value and the variance.

$$Pr(y_i | input_i) = \frac{exp(-\lambda)\, \lambda^{y_i}}{y_i!}$$
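As a quick sanity check of that formula (an addition here, not part of the original slides), one can evaluate the PMF by hand in R and compare it against the built-in dpois(); the value λ = 4 is an arbitrary choice for illustration.

lambda <- 4
y <- 0:10
# the PMF written out directly, exactly as in the formula above
manual <- exp(-lambda) * lambda^y / factorial(y)
all.equal(manual, dpois(y, lambda))   # TRUE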


Relabel λ as "input" for Interpretation

Instead of the Greek letter λ, let's call it what we mean: "input".

$$Pr(y_i | input_i) = \frac{exp(-input_i)\, input_i^{y_i}}{y_i!}$$

For any $y_i$ you put in here, this tells you how likely you are to count that many "things" if the input is "input".

When I write "input", I mean the combined impact of parameters and variables.

Input is not necessarily simply $X_i b$. In fact, we usually have to "translate" or "curve" the linear predictor so it fits "within boundaries."

So "input" is typically some function that depends on $X_i b$; for generality, $g(X_i b)$.


Graph the Poisson Distribution to Get a "Feeling"

Poisson Sample, small lambda

[Figure: density histograms of Poisson samples for lambda = 5, 10, 50, and 200.]


Poisson Sample, large lambda

[Figure: density histograms of Poisson samples for lambda = 5, 10, 50, and 200.]
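The exact script behind these figures is not included here, but a minimal sketch along the following lines reproduces the four panels (the sample size of 1000 is an arbitrary choice):

# draw Poisson samples at several lambda values and plot density histograms
par(mfrow = c(2, 2))
for (lambda in c(5, 10, 50, 200)) {
  y <- rpois(1000, lambda)
  hist(y, freq = FALSE, main = paste("lambda =", lambda), xlab = "y")
}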


Noteworthy

1 The expected value of the Poisson is λ = "input".
2 The variance of the Poisson is λ = "input".
3 The shape changes and gets "more Normal" as "input" gets bigger.

Implication: if your count data has high values, then the OLS Normal model may serve about as well as a Poisson model. However, there are 2 problems:

1 Nonlinearity.
2 Heteroskedasticity.


Nonlinear Transformation of $X_i b$ Required for Poisson

The $input_i$ must be positive! We are considering a "count variable," something that is never negative, and the expected value of a Poisson variable has to be positive. Since the expected value equals the value of $input_i$, $X_i b$ cannot serve as $input_i$, because it may be negative.

All kinds of transformations have been considered to make sure the input is positive. A common way is to exponentiate the linear predictor, because exp(anything) is positive:

$$input_i = exp(X_i b)$$

Now, that results in the stupid looking exp(exp) appearance of the Poisson regression model:

$$Pr(y|Xb) = \frac{exp(-exp(Xb))\, (exp(Xb))^y}{y!}$$

or it looks slightly less ugly (not much) if we write:

$$Pr(y|Xb) = \frac{exp(-e^{Xb})\, (e^{Xb})^y}{y!}$$


King called it the "Exponential Poisson" model; others call it the "log link"

If

$$input_i = exp(X_i b)$$

then

$$log(input_i) = X_i b$$

In the Generalized Linear Model literature, they think of the transformation as happening on the left hand side, so they call it the link function. The exponential on the right is therefore the "inverse link" function.


Estimation: straightforward ML

Adjust the b's to maximize the product of the probabilities of the observations:

$$L(b; y, X) = Pr(y_1|Xb) * Pr(y_2|Xb) * \cdots * Pr(y_N|Xb)$$

Usually, one would take logs and maximize the log likelihood, which is a sum.
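To make the idea concrete, here is a small sketch (not from the slides) that maximizes the Poisson log likelihood directly with optim() on simulated data; in practice glm(..., family = poisson) does the same job via Fisher scoring. The coefficients 0.5, 0.3, and -0.2 are made-up "true" values.

set.seed(42)
x1 <- rnorm(100); x2 <- rnorm(100)
X <- cbind(1, x1, x2)
b.true <- c(0.5, 0.3, -0.2)
y <- rpois(100, lambda = drop(exp(X %*% b.true)))   # log link: input = exp(Xb)

# negative log likelihood: the sum of log Poisson probabilities, negated
negLL <- function(b) -sum(dpois(y, lambda = drop(exp(X %*% b)), log = TRUE))
fit <- optim(c(0, 0, 0), negLL, method = "BFGS")
fit$par   # should land near c(0.5, 0.3, -0.2)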


Interpretation

Recall the expected value of $y_i$ given the input is just the input itself:

$$E(y_i|X_i) = exp(X_i \hat{b})$$

So if the k'th variable changes, the impact is

$$\frac{\partial E(y_i|X_i)}{\partial x_k} = \hat{b}_k * E(y_i|X_i) = \hat{b}_k * exp(X_i \hat{b})$$

Long discusses the calculation of the percent change in expected y, i.e.

$$\frac{E(y_i|X_i, x_k + \delta)}{E(y_i|X_i, x_k)} = exp(\hat{b}_k * \delta)$$
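For instance (an illustration added here, using the x1 coefficient from the Poisson GLM fit shown a few slides below), the factor-change interpretation works like this:

b.x1 <- 0.15659      # estimated coefficient on x1 in the Poisson fit
exp(b.x1 * 1)        # about 1.17: a one-unit rise in x1 multiplies
                     # the expected count by ~1.17, i.e. a 17% increase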


See poisson-1.R for this example

[Figure: "Ugly Poisson Data" — scatterplot of y (ranging 0 to 6) against x1 (ranging 30 to 70).]


Poisson GLM Fit

m1 <- glm(y ~ x1 + x2, data = dat, family = poisson(link = log))
summary(m1)

Call:
glm(formula = y ~ x1 + x2, family = poisson(link = log), data = dat)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.41361  -0.36576  -0.07740  -0.02218   1.90164

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.32000    1.49137   0.885    0.376
x1           0.15659    0.03117   5.024 5.05e-07 ***
x2          -0.13446    0.01869  -7.193 6.34e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 147.494 on 99 degrees of freedom
Residual deviance:  33.771 on 97 degrees of freedom
AIC: 79.4

Number of Fisher Scoring iterations: 7


Just for Curiosity, fit OLS

lm1 <- lm(y ~ x1 + x2, data = dat)
summary(lm1)

Call:
lm(formula = y ~ x1 + x2, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0386 -0.5489 -0.0799  0.2319  4.7082

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.965354   0.655034   1.474  0.14379
x1           0.029317   0.010552   2.778  0.00656 **
x2          -0.020878   0.004155  -5.025  2.3e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.949 on 97 degrees of freedom
Multiple R-squared: 0.2388, Adjusted R-squared: 0.2231
F-statistic: 15.21 on 2 and 97 DF, p-value: 1.793e-06


See poisson-1.R for this example

plot(x1, y, main = "Ugly Poisson Data")
library(rockchalk)
newdat <- expand.grid(x1 = plotSeq(dat$x1, length.out = 50), x2 = mean(dat$x2))
newdat$p1 <- predict(m1, newdata = newdat, type = "response")
lines(newdat$x1, newdat$p1, lwd = 3, col = "red")
newdat$lmp1 <- predict(lm1, newdata = newdat)
lines(newdat$x1, newdat$lmp1, lwd = 3, col = "green")
legend("topleft", legend = c("Exp. Poisson", "OLS"), lwd = c(3, 3), col = c("red", "green"))


See poisson-1.R for this example ...

[Figure: "Ugly Poisson Data" — the same scatterplot of y against x1, now with the Exp. Poisson (red) and OLS (green) fitted lines overlaid.]


From x2's point of view

plot(x2, y, main = "Ugly Poisson Data, Again")
newdat <- expand.grid(x1 = mean(dat$x1), x2 = plotSeq(dat$x2, length.out = 50))
newdat$p1 <- predict(m1, newdata = newdat, type = "response")
plot(y ~ x2, data = dat)
lines(newdat$x2, newdat$p1, lwd = 3, col = "red")
lm1 <- lm(y ~ x1 + x2, data = dat)
newdat$lmp1 <- predict(lm1, newdata = newdat)
lines(newdat$x2, newdat$lmp1, lwd = 3, col = "green")
legend("topright", legend = c("Exp. Poisson", "OLS"), lwd = c(3, 3), col = c("red", "green"))


From x2's point of view ...

[Figure: scatterplot of y against x2 (ranging 60 to 140) with the Exp. Poisson (red) and OLS (green) fitted lines overlaid.]




Negative Binomial

Poisson Weaknesses

1 Poisson is a "one parameter" model. The variance is not separately under our control. Maybe we could find a two parameter distribution with a better-suited variance parameter.

2 To repeat the same point: the Poisson may not fit the data because the variance predicted by the Poisson may be too small for the observed data.


Negative Binomial Derivation: Overdispersion

The Negative Binomial can be described in a number of ways. I think the "extra randomness" interpretation is the simplest.

Add to $input_i$ an additional random error that causes "heterogeneity" (sometimes the term "frailty" is used) in the outputs for cases that have the same observed values of $X_i$. Suppose the Poisson process has an expected value:

$$new\ input_i = input_i * \delta_i$$

Note that if $\delta_i = 1$, then this thing just degenerates back to the original Poisson model.


Log Link and Multiplicative Error

In the most common version of the Poisson model, we use the "log link":

$$input_i = exp(X_i b)$$

Supplement it with an additional error term $u_i$:

$$new\ input_i = exp(X_i b + u_i)$$


Multiplicative = Additive

Easy:

$$new\ input_i = exp(X_i b + u_i) = exp(X_i b) \times exp(u_i)$$

So one can either think of the new error as an additive bit of noise in the linear predictor ($+u_i$) or a multiplicative effect applied to the transformed linear predictor ($\times \delta_i = exp(u_i)$).

Obviously, we can convert "back and forth":

$$u_i = log(\delta_i)$$


Vital to Pick the $\delta_i$ Distribution Properly

It is necessary to assume that this new noise is "neutral", in the sense that it causes more uncertainty, but it does not change the average outcome.

That is true if

$$E[\delta_i] = 1 \implies E(exp(u_i)) = 1$$

or, equivalently,

$$E[u_i] = 0$$

"On average" the extra error term has "no effect".


Output is a Conditional Poisson Model

The maximum likelihood estimation has to be amended to incorporate a new likelihood component for each case.

Hence, our theory says that GIVEN $X_i$ and an additional perturbation $u_i$, the probability model is a Poisson process:

$$P(y_i|X_i, u_i) = \frac{exp(-new\ input_i) \times new\ input_i^{y_i}}{y_i!}$$

The input on the right side includes the additional frailty.


Gamma distributed heterogeneity

Gamma is the Most Common Frailty Distribution

The Gamma is the common probability distribution for $\delta_i = exp(u_i)$.

The full Gamma distribution has two parameters, but we are going to simplify so that we only need to worry about one, v = shape, which determines the variance. This simplification of the gamma can be done in several ways, which will be outlined later.

The key thing is this: if $\delta_i$ is drawn from a "properly selected" gamma distribution, then $E(\delta_i) = 1$ and $Var[\delta_i] = 1/(\text{some parameter we choose})$.


Gamma distribution background

Gamma Density Illustration

The Gamma describes the probability of a continuous variable on $[0, \infty)$. It can look like a "ski slope" or it can be single-peaked.

[Figure: Gamma Distribution]


Gamma PDF

Two parameters, shape and scale. In some books, the scale parameter is replaced by a parameter called rate, which equals 1/scale.

If $\delta_i$ is Gamma distributed, the probability density function is:

$$f(\delta_i) = \frac{1}{scale^{shape}\, \Gamma(shape)}\, \delta_i^{(shape-1)}\, e^{-(\delta_i / scale)}$$
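As a check (added here, with arbitrary parameter values), the written-out density matches R's dgamma(), which uses the same shape/scale parameterization:

shape <- 2; scale <- 1.5; d <- 0.8
# the density written out directly, as in the formula above
manual <- 1 / (scale^shape * gamma(shape)) * d^(shape - 1) * exp(-d / scale)
all.equal(manual, dgamma(d, shape = shape, scale = scale))   # TRUE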


What is that Gamma function?

The function Γ(shape) is the Gamma function (which is a complicated math thing I've never looked into very much). It is

$$\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\, dt, \quad s > 0$$

If you pick s as an integer, Γ(s) is very easy to calculate:

$$\Gamma(s) = (s - 1)!, \quad s = 1, 2, \ldots$$

So the value of Γ(1) = 1. And Γ(2) = 1. And Γ(20) is some impossibly huge number.

[Figure: the Gamma function, gamma(x), plotted for x from 0 to 5.]


Adjust the Gamma PDF to create the right kind of heterogeneity

The two parameter Gamma probability distribution has these interesting properties:

$$E(\delta_i) = shape * scale$$

$$Var(\delta_i) = shape * scale^2$$

Simplify: set scale = 1/shape. The expected value and variance are then

$$E[\delta_i] = shape/shape = 1$$

$$Var[\delta_i] = shape/shape^2 = 1/shape$$
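A quick simulation (added here; 100000 draws is an arbitrary choice) confirms that with scale = 1/shape the frailty draws have mean 1 and variance 1/shape:

shape <- 4
delta <- rgamma(100000, shape = shape, scale = 1 / shape)
mean(delta)   # close to 1
var(delta)    # close to 1/shape = 0.25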


Suppose shape is the Same for All Observations

The shape parameter is assumed to exist; we need to estimate it.

In this formulation, it is easy to see that if shape is very large, then the variance of $\delta_i$ is very small. The extra heterogeneity has only a minor effect, and, in fact, as shape tends to ∞, the distribution of $\delta_i$ collapses around 1.0.


Another Derivation that Ends Up at the Same Place

Fix the scale = 1. Draw a random variable $m_i$ from gamma(shape, 1). Then the probability density formula with scale = 1 simplifies to:

$$f(m_i) = \frac{1}{\Gamma(shape)}\, m_i^{(shape-1)}\, e^{-m_i}, \quad shape > 0$$

If shape = 1, then this is an exponential distribution (because Γ(1) = 1).

The expected value and variance are:

$$E[m_i] = shape$$

and

$$Var[m_i] = shape$$


Not a Better Formulation, Just Different

The advantage of this formulation is that we can easily see what we need to do to convert $m_i$ into our final result:

$$\delta_i = \frac{m_i}{shape}$$

Notice that after dividing each draw by shape, we have a variable with just the same properties as in the other formulation:

$$E(\delta_i) = E\left(\frac{m_i}{shape}\right) = \frac{1}{shape} E(m_i) = \frac{shape}{shape} = 1$$

and also

$$V(\delta_i) = V\left(\frac{m_i}{shape}\right) = \frac{1}{shape^2} V(m_i) = \frac{shape}{shape^2} = \frac{1}{shape}$$

If you go back and forth between books, you get a headache because no two books seem to write this down in exactly the same way. But I'm pretty sure I've written it down correctly.


Illustrate $m_i/shape$

[Figure: histograms of gamma/shape draws for shape = 0.5, 1, and 5 (scale = 1), with z ranging 0 to 5.]


Illustrate $log(m_i/shape)$

[Figure: histograms of log(gamma/shape) for shape = 0.5, 1, and 5 (scale = 1), with log(z) ranging -10 to 2.]


About that "shape" parameter

[Figure: frequency histograms of raw gamma draws for shape = 0.5, 1, and 5 (scale = 1).]
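A sketch (assumed, not the author's script) of how the $\delta_i = m_i/shape$ histograms above can be generated:

par(mfrow = c(3, 1))
for (shape in c(0.5, 1, 5)) {
  z <- rgamma(1000, shape = shape, scale = 1) / shape   # delta = m/shape
  hist(z, freq = FALSE, xlim = c(0, 5),
       main = paste("gamma/shape, shape =", shape, "scale = 1"), xlab = "z")
}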


NB estimation

Estimating

Fitting is an iterative, two-stage process:

The shape estimate is chosen.
Then the slope parameters are estimated.
Repeat until estimates converge to stable values.

The MASS package for R provides a procedure, glm.nb(), which uses maximum likelihood to estimate the b's and the shape parameter. (In Venables & Ripley, p. 207, the "shape" parameter is called θ.)
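A usage sketch (assuming the same hypothetical dat data frame used in the Poisson examples):

library(MASS)
nb1 <- glm.nb(y ~ x1 + x2, data = dat)
summary(nb1)   # reports "Theta", MASS's name for the shape parameter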


The Negative Binomial Distribution "Pops Out"

If you start with a Poisson model, and then add multiplicative Gamma-distributed noise,

$$Y\, |\, \delta_i \sim Poisson(input_i * \delta_i)$$

the result is known (in probability theory) to be a Negative Binomial distribution:

$$f_y(y|shape, input) = \frac{\Gamma(shape + y)}{\Gamma(shape)\, y!} \cdot \frac{input^y\, shape^{shape}}{(input + shape)^{shape+y}}$$

(Venables and Ripley, 4th ed., p. 206)

$$E(y_i) = input$$

$$Var(y_i) = input + input^2/shape$$

Note that as shape goes to ∞, the variance of $y_i$ is just $input_i$, meaning the original Poisson model is back! But for other values of the shape parameter, the variance of $y_i$ is greater than in the Poisson model.
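This mixture result can be verified by simulation (a sketch added here; input = 3 and shape = 2 are arbitrary values): Poisson draws with gamma frailty should behave like negative binomial draws with size = shape and mu = input.

set.seed(1)
input <- 3; shape <- 2
delta <- rgamma(100000, shape = shape, scale = 1 / shape)
y.mix <- rpois(100000, lambda = input * delta)      # Poisson with frailty
y.nb  <- rnbinom(100000, size = shape, mu = input)  # direct NB draws
c(mean(y.mix), mean(y.nb))   # both close to input = 3
c(var(y.mix), var(y.nb))     # both close to input + input^2/shape = 7.5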


Overdispersion

The results indicate one surprise: the expected value of y is the same in the Poisson and the NB model. However, the variance is different. In the NB model, the variance is

$$Var(y_i|X) = exp(X_i b)\left(1 + \frac{exp(X_i b)}{v_i}\right)$$

Estimates from a Poisson model are inefficient and have bad standard errors if the data is really produced by a heterogeneous process of the NB sort.

Note the Poisson model is really "nested" inside the NB model. If we do a significance test of $H_0: \alpha = 0$ and cannot reject it, then it means we ought to go back to the Poisson. Long p. 237 discusses other tests. See the R package pscl for a test that can be used.



Zero inflated models

The Poisson or NB models might not match the data because they don't have enough observed 0's.

The "fix" is to think of the probability process as a two step thing. First, the observed y is either 0 or a number $y_i$; whether it is observed or not is modeled by any dichotomous regression model, such as logit or probit. Second, if it is observed, the count is given by one of the models above.

All kinds of details flow forth if you get into writing out one of these models. Should the predictors in the dichotomous regression be the same ones that are used in the Poisson or NB regression? Should we insist the predictive part of the probit model is proportional to the count model?

Now, how can a probability process give back a 0? Either through the failure of the probit stage or a predicted 0 from the count stage, so in the Poisson case

$$P(y_i = 0|X_i) = \psi_i + (1 - \psi_i) * exp(-exp(Xb))$$

(Write out the Poisson for y = 0 to understand the last term.)

And the probability of any other value is given by the regular Poisson, multiplied by $(1 - \psi_i)$:

$$P(y_i|X_i) = (1 - \psi_i) * Poisson(Xb)$$
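A usage sketch of a zero-inflated fit (assuming the same hypothetical dat as before): pscl's zeroinfl() takes a two-part formula, count model | zero model, and the two parts may use different predictors.

library(pscl)
zip1 <- zeroinfl(y ~ x1 + x2 | x1 + x2, data = dat, dist = "poisson")
summary(zip1)   # count-model coefficients plus the zero-inflation (logit) part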




Additional Readings

For more reading on Count models, consult the following, probably in this order:

Scott Long, Regression Models for Categorical and Limited Dependent Variables, Chapter 8, "Count Outcomes".

Gary King. 1988. "Statistical Models for Political Science Event Counts: Bias in Conventional Procedures and Evidence for the Exponential Poisson Regression Model." American Journal of Political Science 32(3): 838-863.

Gary King. 1989. "Variance Specification in Event Count Models: From Restrictive Assumptions to a Generalized Estimator." American Journal of Political Science 33(3): 762-784.

Cameron and Trivedi. 1998. Regression Analysis of Count Data. Cambridge University Press.
