Notes on Poisson Regression and Some Extensions

[Figure 2: Comparison of Zero-Inflated Poisson and Poisson and Observed Data. Vertical axis: Pr(Y=y); horizontal axis: y.]

and

$$\mathrm{BIC} = -2\log L + \mathrm{df}\,\log n.$$

Coding this in Stata is straightforward.

    * Retrieve the log-likelihood, parameter count, and sample size from the last fit
    scalar ll = e(ll)
    scalar npar = e(k)
    scalar nobs = e(N)
    * Information criteria (smaller values indicate a better trade-off)
    scalar AIC = -2*ll + 2*npar
    scalar BIC = -2*ll + log(nobs)*npar
    * Display the results
    scalar list AIC
    scalar list BIC

Comparing the models, we get AIC (BIC) of 5102.2 (5139.4) for the Poisson regression and AIC (BIC) of 4963.1 (5032.1) for the zero-inflated model. Lower is better, so the additional work of fitting the zero-inflated model is warranted.

Nonindependent Events. It is reasonable to expect that there is a certain degree of dependence between the individual births that constitute a woman's completed fertility. We can build this dependency into the model in a couple of ways. First, we will rewrite the standard model as a random-effects model with woman-specific unobserved heterogeneity.

$$\log(\mu_i) = x_i\beta + u_i$$

This should look familiar. The individual-level random effect $u_i$ has been added to the model, as it was in the random-effects logit models outlined earlier in the course. Statisticians have long recognized that if a conditional distribution is combined with a particular mixing (prior) distribution for the random effect, the resulting marginal distribution is an entirely new distribution. This is the case here: if we specify a multiplicative factor $v_i = \exp(u_i)$ that raises or lowers the rate of childbearing for a woman in our sample, and assume that it follows a gamma distribution, the resulting marginal distribution is no longer Poisson, but negative binomial. For example, suppose we consider a multiplicative model,

$$\mu_i = \exp(x_i\beta)\,v_i.$$

Let's assume that v is distributed as gamma with shape α and rate β (a scalar, not to be confused with the coefficient vector β in the regression),

$$v \sim \mathrm{gamma}(\alpha, \beta),$$
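such that $E(v) = \alpha/\beta$ and $\mathrm{var}(v) = \alpha/\beta^2$. In Bayesian terms, the gamma distribution is the conjugate prior distribution for the Poisson (and other distributions in the exponential family). When the prior distribution is combined with the Poisson distribution for y, conditional on v, the resulting unconditional (marginal) distribution of y is negative binomial. This was discovered not long after the Poisson distribution, perhaps around 1909.

For convenience, we normalize the distribution of v so that it has a mean of 1.0 by setting the gamma rate equal to the shape α:

$$E(v) = 1.0,$$

and this implies

$$\mathrm{var}(v) = 1/\alpha,$$

so the resulting distribution for v is

$$g(v) = \frac{\alpha^{\alpha} v^{\alpha-1}}{\Gamma(\alpha)} \exp(-\alpha v), \qquad \alpha > 0.$$

The likelihood of y for the ith woman conditional on her random effect $v_i$ is $L(y_i \mid v_i)$, the Poisson probability of $y_i$ at the rate $\exp(x_i\beta)\,v_i$.

To obtain the marginal likelihood of y for the whole sample, we need to integrate over the distribution of v for each woman in our sample in order to "average" out the random effect.

$$L_m = \prod_i \int_v L(y_i \mid v_i)\, g(v_i)\, dv_i$$

The resulting distribution can be evaluated in closed form. From here on, $\mu_i = \exp(x_i\beta)$ denotes the mean with the random effect averaged out, and α follows Stata's convention: it is the variance of v, that is, the reciprocal of the shape parameter in $g(v)$ above.

$$L_m = \prod_i \frac{\Gamma(y_i + \frac{1}{\alpha})}{\Gamma(y_i + 1)\,\Gamma(\frac{1}{\alpha})} \left[\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right]^{y_i} \left[\frac{1}{1 + \alpha\mu_i}\right]^{1/\alpha}$$

Exercise 5. Derive the negative binomial distribution as a mixture of a Poisson and gamma distribution.

A negative binomial variable has mean $E(Y) = \mu$ and variance $\mathrm{var}(Y) = \mu + \alpha\mu^2$ (equivalently $\mu + \mu^2/\alpha$ if α is kept as the gamma shape parameter). The log-likelihood function for this model is

$$\log L = \sum_{i=1}^{n} \left\{ \sum_{j=0}^{y_i - 1} \log(1 + \alpha j) + y_i \log \mu_i - \left(y_i + \frac{1}{\alpha}\right) \log(1 + \alpha\mu_i) - \log \Gamma(y_i + 1) \right\},$$

where the inner sum is empty when $y_i = 0$ and equals $\log\Gamma(y_i + \frac{1}{\alpha}) - \log\Gamma(\frac{1}{\alpha}) + y_i \log\alpha$.

This model is somewhat harder to optimize than the Poisson model, but it is still straightforward. All major statistical packages (SAS, R, Stata) have routines for estimating this model.

Estimation. We maximize this likelihood with respect to the parameters β and α, the overdispersion parameter. Here α is the variance of the gamma-distributed random effect mentioned earlier (note that we have assumed this effect to have a mean of 1 and have estimated its variance using the current sample of women).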
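As a rough illustration of the estimation step, the sketch below (in Python rather than Stata, standard library only) evaluates the negative binomial log-likelihood, written with log-gamma functions as the logarithm of the closed-form marginal likelihood, and maximizes it over α for an intercept-only model. Everything here is an assumption made for illustration: the counts in `y` are made up, the function names are hypothetical, the mean is held at the sample mean (the ML estimate in the intercept-only case), and a golden-section search stands in for the Newton-type optimizer a statistical package would use.

```python
import math

def negbin_loglik(y, mu, alpha):
    """Negative binomial log-likelihood; alpha is the variance of the gamma effect."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (
            math.lgamma(yi + inv) - math.lgamma(inv) - math.lgamma(yi + 1)
            + yi * math.log(alpha * mi)
            - (yi + inv) * math.log1p(alpha * mi)
        )
    return total

def poisson_loglik(y, mu):
    """Poisson log-likelihood, the limiting case as alpha goes to zero."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def fit_alpha(y, mu, lo=1e-6, hi=10.0, tol=1e-8):
    """Golden-section search for the alpha in [lo, hi] that maximizes the log-likelihood."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if negbin_loglik(y, mu, c) > negbin_loglik(y, mu, d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

# Hypothetical counts of children ever born (illustrative only, not from the notes)
y = [0, 0, 0, 1, 1, 2, 2, 3, 4, 7, 9]
mu_hat = sum(y) / len(y)          # intercept-only model: fitted mean is the sample mean
mu = [mu_hat] * len(y)
alpha_hat = fit_alpha(y, mu)

print("alpha_hat:", alpha_hat)
print("negative binomial log-likelihood:", negbin_loglik(y, mu, alpha_hat))
print("Poisson log-likelihood:", poisson_loglik(y, mu))
```

Because these counts are overdispersed (their variance exceeds their mean), the search settles on a positive α and the negative binomial log-likelihood exceeds the Poisson one. With covariates, β and α would be maximized jointly.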
In fact, the negative binomial is identical to an individual-level random-effects Poisson regression. We get identical results using either nbreg or xtpois; with xtpois, however, we must supply a respondent ID number to identify the clusters (each of size 1) used in the random-effects specification.
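The equivalence between the negative binomial and a Poisson model with a gamma random effect can be checked numerically. The sketch below (Python, standard library only) integrates the Poisson probability over a mean-one gamma random effect with the midpoint rule and compares the result with the closed-form negative binomial probability. The values µ = 2.5 and α = 0.5 (with α taken as the variance of v), the integration grid, and all function names are illustrative assumptions, not quantities from the notes.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of count y at a positive rate."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))

def gamma_density(v, alpha):
    """Gamma density with mean 1 and variance alpha (shape = rate = 1/alpha)."""
    k = 1.0 / alpha
    return math.exp(k * math.log(k) + (k - 1) * math.log(v) - k * v - math.lgamma(k))

def negbin_pmf(y, mu, alpha):
    """Closed-form negative binomial probability with mean mu and dispersion alpha."""
    inv = 1.0 / alpha
    return math.exp(
        math.lgamma(y + inv) - math.lgamma(y + 1) - math.lgamma(inv)
        + y * math.log(alpha * mu / (1 + alpha * mu))
        - inv * math.log1p(alpha * mu)
    )

def mixture_pmf(y, mu, alpha, upper=30.0, steps=20000):
    """Integrate Poisson(y | mu * v) against the gamma density of v (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h  # midpoint of the k-th subinterval, always positive
        total += poisson_pmf(y, mu * v) * gamma_density(v, alpha)
    return total * h

mu, alpha = 2.5, 0.5  # assumed values for illustration
for y in range(9):
    print(y, round(mixture_pmf(y, mu, alpha), 6), round(negbin_pmf(y, mu, alpha), 6))
```

The two columns should agree up to the error of the numerical integration, which is the random-effects reading of the negative binomial: averaging the woman-specific Poisson likelihood over the gamma effect gives the negative binomial likelihood.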