12.07.2015 Views

Notes on Poisson Regression and Some Extensions

Notes on Poisson Regression and Some Extensions

Notes on Poisson Regression and Some Extensions

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<str<strong>on</strong>g>Notes</str<strong>on</strong>g> <strong>on</strong> Poiss<strong>on</strong> Regressi<strong>on</strong> <strong>and</strong> <strong>Some</strong> Extensi<strong>on</strong>sDan Powers Soc385K Fall 07Poiss<strong>on</strong> Regressi<strong>on</strong>. The Poiss<strong>on</strong> distributi<strong>on</strong> is the most comm<strong>on</strong> probability distributi<strong>on</strong> forcount data. The Poiss<strong>on</strong> probability model is appropriate for events that occur r<strong>and</strong>omly overtime <strong>and</strong>/or space. The probability functi<strong>on</strong> for Y is given byPr(Y = y|µ) = e−µ µ yy!y ≥ 0, µ > 0 (1)Usually, we will be interested in estimating µ from the data. For example, an ML estimate is thevalue of µ that is most c<strong>on</strong>sistent with the observed data. This estimate is particularly easy tofind for a model without covariates. It is simply the mean count. For example, the number ofoutbreaks of strikes per week in the U.K. from 1948-1949 follows a Poiss<strong>on</strong> distributi<strong>on</strong>.Table 1: Distributi<strong>on</strong> of Major Strikes in the U.K., 1948-59, by Number of Outbreaks per Week.No. Outbreaks/Wk. Observed Expected0 252 2551 229 2292 109 1033 28 314+ 8 7Totals 626 626χ 2 4 = 0.807, p = 0.94The mean number of strikes in the 626 week observati<strong>on</strong> period is ̂µ = 0.899 (taking 4+ = 4).The expected frequencies are generated as N × Pr(Y = y), where ̂µ is substituted in theexpressi<strong>on</strong> in Eq. 1.Poiss<strong>on</strong> Sampling Assumpti<strong>on</strong>s.• We must be able to divide time or space, usually denoted t, into a large number of subintervalssuch that the probability of occurrence of an event in each of these subintervals is very small.• The probability of the event in each of the subintervals is c<strong>on</strong>stant over the time period beingc<strong>on</strong>sidered.• The probability of two or more event occurrences in each/any subinterval must be smallenough to be ignored.• The occurrence or n<strong>on</strong>occurrence of an event in <strong>on</strong>e interval must not affect the event occurrenceor n<strong>on</strong>occurrence in any other subinterval.1


Moments.var(Y ) = µE(Y ) = µThe mean rateThe variance mean identityHow well does the mean/variance equivalence hold in the example data above? Obtain theobserved proporti<strong>on</strong>s of counts <strong>and</strong> apply the usual formula for the mean <strong>and</strong> variance of ar<strong>and</strong>om variable. We find, as before, E(Y ) = 0.899 <strong>and</strong> var(Y ) = 0.860, which are close enoughfor most government work.Models <strong>and</strong> Estimati<strong>on</strong>.We usually let µ depend <strong>on</strong> data through a loglinear model , so thatlog(µ i ) = x ′ iβThe log likelihood is written aslog L =n∑ {x′i β − exp(x ′ iβ) − log Γ(y i + 1) }i=1We can ignore the last term in this expressi<strong>on</strong> as it does not involve parameters. In other words,the log L evaluated without this term differs from the log L above by an additive c<strong>on</strong>stant that isindependent of the model parameters.Exercise 1: Suppose we are interested in finding the MLE of µ (i.e., without covariates). Showthat the log likelihood is simplyn∑log L = log µ y i − nµ<strong>and</strong> that the MLE is ̂µ = ∑ y i /n by solving the first-order c<strong>on</strong>diti<strong>on</strong>s for an MLE.Exercise 2: Use the sec<strong>on</strong>d-order c<strong>on</strong>diti<strong>on</strong>s for a maximum to show that the variance of̂µ = ̂µ 2 / ∑ y i .Exercise 3: Develop a least-squares estimator for µ by (1) devising a appropriate loss functi<strong>on</strong>(i.e., sum of squared deviati<strong>on</strong>s around the expected value), <strong>and</strong> (2) solving the normal equati<strong>on</strong>for µ.i=1Numerical Methods. (Please skip this if you are not interested in gory details underlying theblack box of most statistical programs.) The score vector <strong>and</strong> Hessian matrix are<strong>and</strong>u = X ′ (y − m)H = −X ′ MX,where m = exp(x ′ îβ) <strong>and</strong> M = diag(m)We can apply Newt<strong>on</strong>-Raphs<strong>on</strong> or IRLS to this problem. The NR iterati<strong>on</strong> steps arêβ (t) = ̂β (t−1) [− H (t−1)] −1U (t−1) .2


IRLS uses a weighted regressi<strong>on</strong> of ẑẑ i = ̂η i + (y i − ̂m i )/ ̂m i ,where ̂η i = x ′ îβ <strong>and</strong> the iterative weights are ̂m i = exp(̂η i ). All quantities (except y) are updatedat each iterati<strong>on</strong>.Interpretati<strong>on</strong>. The exp<strong>on</strong>entiated model parameters exp(β) are interpreted as multiplicativeeffects <strong>on</strong> the rate. For a given coefficient pertaining to a unit change in a variable x, valuesgreater than 1 raise the expected rate; values less than 1 lower the expected rate; values equal to1 have no impact <strong>on</strong> the expected rate.Examples. The following examples show results from models for number of children born for asample of women as a functi<strong>on</strong> of several covariates. A first step is to examine the data. Here weare using the GSS. Let’s determine the overall mean childbearing rate by fitting a null modelusing the glm procedure or poiss<strong>on</strong>. Note the technique used to get baseline rate, or theexp<strong>on</strong>entiated effect of the c<strong>on</strong>stant term.Null Model.. * NULL model. gen c<strong>on</strong>s = 1. glm childs c<strong>on</strong>s, noc<strong>on</strong>s f(p)Generalized linear models No. of obs = 1501Optimizati<strong>on</strong> : ML Residual df = 1500Scale parameter = 1Deviance = 2430.731764 (1/df) Deviance = 1.620488Pears<strong>on</strong> = 2146.818275 (1/df) Pears<strong>on</strong> = 1.431212Variance functi<strong>on</strong>: V(u) = u[Poiss<strong>on</strong>]Link functi<strong>on</strong> : g(u) = ln(u) [Log]AIC = 3.68491Log likelihood = -2764.524679 BIC = -8540.098------------------------------------------------------------------------------| OIMchilds | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------c<strong>on</strong>s | .6623651 .0185344 35.74 0.000 .6260383 .6986919------------------------------------------------------------------------------. glm, eform------------------------------------------------------------------------------| OIMchilds | IRR Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------c<strong>on</strong>s | 1.939374 .0359452 35.74 0.000 1.870187 2.01112------------------------------------------------------------------------------3


Pr(Y=y)0 .1 .2 .30 2 4 6 8yFigure 1: Comparis<strong>on</strong> of Poiss<strong>on</strong> Model µ = 2 <strong>and</strong> Observed Data. poiss<strong>on</strong> childs c<strong>on</strong>s, noc<strong>on</strong>sPoiss<strong>on</strong> regressi<strong>on</strong> Number of obs = 1501Wald chi2(1) = 1277.14Log likelihood = -2764.5247 Prob > chi2 = 0.0000------------------------------------------------------------------------------childs | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------c<strong>on</strong>s | .6623651 .0185344 35.74 0.000 .6260383 .6986919------------------------------------------------------------------------------. poiss<strong>on</strong>, irrchilds | IRR Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------c<strong>on</strong>s | 1.939374 .0359452 35.74 0.000 1.870187 2.01112------------------------------------------------------------------------------We find that the average number of kids is roughly 2. Suppose we take this rate as the theoreticalmean rate in the populati<strong>on</strong>. We can generate a Poiss<strong>on</strong> distributi<strong>on</strong> under this assumpti<strong>on</strong> <strong>and</strong>compare to the observed distributi<strong>on</strong> as shown in Figure 1. Below are the Stata comm<strong>and</strong>s to dothis.gen y = childsmatrix bpois = e(b)gen lampois = exp(bpois[1,1])scalar lampois = 2* generate the predicted distributi<strong>on</strong> for this value of lambda* <strong>and</strong> plot against empirical distributi<strong>on</strong>gen PrY_pois = exp(-lampois)*lampois^y/exp(lngamma(y+1))twoway (histogram y, discrete fracti<strong>on</strong> blcolor(black) bfcolor(n<strong>on</strong>e) ///legend(off) ytitle(Pr(Y=y)) ) (c<strong>on</strong>nected PrY_pois y, sort legend(off))A listing of the distributi<strong>on</strong> might also useful for comparis<strong>on</strong> to the theoretical distributi<strong>on</strong>µ = 2.4


. tab yy | Freq. Percent Cum.------------+-----------------------------------0 | 359 23.92 23.921 | 266 17.72 41.642 | 404 26.92 68.553 | 247 16.46 85.014 | 128 8.53 93.545 | 48 3.20 96.746 | 20 1.33 98.077 | 8 0.53 98.608 | 21 1.40 100.00------------+-----------------------------------Total | 1,501 100.00. tab y, sum(PrY_pois) mean| Summary of| PrY_poisy | Mean------------+------------0 | .143793971 | .278870252 | .270416833 | .174813094 | .084756995 | .032875096 | .010626187 | .002944028 | .00071369------------+------------Total | .19380861Incidentally, the estimated mean <strong>and</strong> variance of y are respectively 1.94 <strong>and</strong> 2.79, suggesting adeparture from the usual Poiss<strong>on</strong> sampling assumpti<strong>on</strong>s.Full Model. Next we add several covariates to the model. For example, coho10 is a linear effectof 10-year birth cohort to capture secular trends toward fewer children am<strong>on</strong>g later birth cohorts;married is a dummy variable to capture the tendency for women to postp<strong>on</strong>e childbearing untilafter marriage. 1 The models also c<strong>on</strong>trol for resp<strong>on</strong>dent’s educati<strong>on</strong>, income <strong>and</strong> race.. xi:glm childs coh10 married i.deg n<strong>on</strong>wht income, family(p)i.deg _Ideg_1-3 (naturally coded; _Ideg_1 omitted)Generalized linear models No. of obs = 1496Optimizati<strong>on</strong> : ML Residual df = 1489Scale parameter = 1Deviance = 1934.104383 (1/df) Deviance = 1.298928Pears<strong>on</strong> = 1661.586326 (1/df) Pears<strong>on</strong> = 1.1159081 Note that this measure reflects current marital status, not necessarily the marital status at the time of childbearing.5


Variance functi<strong>on</strong>: V(u) = u[Poiss<strong>on</strong>]Link functi<strong>on</strong> : g(u) = ln(u) [Log]AIC = 3.369277Log likelihood = -2513.219143 BIC = -8951.305------------------------------------------------------------------------------| OIMchilds | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------coh10 | -.1780332 .0107876 -16.50 0.000 -.1991765 -.15689married | .3857223 .0408991 9.43 0.000 .3055616 .4658831_Ideg_2 | -.161813 .051968 -3.11 0.002 -.2636684 -.0599575_Ideg_3 | -.4039166 .0623524 -6.48 0.000 -.5261251 -.2817081n<strong>on</strong>wht | .2701722 .0450656 6.00 0.000 .1818453 .3584991income | -.0279788 .0080084 -3.49 0.000 -.043675 -.0122826_c<strong>on</strong>s | 2.229226 .100984 22.08 0.000 2.031301 2.427151------------------------------------------------------------------------------. glm, eform------------------------------------------------------------------------------| OIMchilds | IRR Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------coh10 | .8369146 .0090283 -16.50 0.000 .8194053 .8547981married | 1.470676 .0601493 9.43 0.000 1.357387 1.593421_Ideg_2 | .8506003 .044204 -3.11 0.002 .7682282 .9418045_Ideg_3 | .6676998 .0416327 -6.48 0.000 .5908902 .7544939n<strong>on</strong>wht | 1.31019 .0590445 6.00 0.000 1.199429 1.43118income | .972409 .0077875 -3.49 0.000 .957265 .9877925------------------------------------------------------------------------------. * or another way. xi:poiss<strong>on</strong> childs coh10 married i.deg n<strong>on</strong>wht income, irri.deg _Ideg_1-3 (naturally coded; _Ideg_1 omitted)Poiss<strong>on</strong> regressi<strong>on</strong> Number of obs = 1496LR chi2(6) = 483.99Prob > chi2 = 0.0000Log likelihood = -2513.2191 Pseudo R2 = 0.0878------------------------------------------------------------------------------childs | IRR Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------coh10 | .8369146 .0090283 -16.50 0.000 .8194053 .8547981married | 1.470676 .0601493 9.43 0.000 1.357387 1.593421_Ideg_2 | .8506003 .044204 -3.11 0.002 .7682282 .9418045_Ideg_3 | .6676998 .0416327 -6.48 0.000 .5908902 .7544939n<strong>on</strong>wht | 1.31019 .0590445 6.00 0.000 1.199429 1.43118income | .972409 .0077875 -3.49 0.000 .957265 .9877925------------------------------------------------------------------------------6


. fitstatMeasures of Fit for poiss<strong>on</strong> of childsLog-Lik Intercept Only: -2755.216 Log-Lik Full Model: -2513.219D(1489): 5026.438 LR(6): 483.994Prob > LR: 0.000McFadden’s R2: 0.088 McFadden’s Adj R2: 0.085ML (Cox-Snell) R2: 0.276 Cragg-Uhler(Nagelkerke) R2: 0.284AIC: 3.369 AIC*n: 5040.438BIC: -5858.971 BIC’: -440.131BIC used by Stata: 5077.612 AIC used by Stata: 5040.438The c<strong>on</strong>stant implies that the c<strong>on</strong>diti<strong>on</strong>al mean rate of childbearing over the life of the sampledwomen is exp(2.23) = 9.293 children. Wow! This seems large, but remember that it is adjusted upor down depending <strong>on</strong> covariates. Here is a table showing predicted rates by race, marital status,<strong>and</strong> “boomer” status evaluated at the mean level of income <strong>and</strong> educati<strong>on</strong>. This is actually based<strong>on</strong> a different model that distinguishes between those women born before 1950 <strong>and</strong> those bornbetween 1950 <strong>and</strong> 1970 (boomer).. prtab married boomer n<strong>on</strong>whtpoiss<strong>on</strong>: Predicted rates for childs--------------------------------------------| n<strong>on</strong>wht <strong>and</strong> boomer| ------ 0 ----- ------ 1 -----married | 0 1 0 1----------+---------------------------------0 | 2.0372 1.1673 2.5443 1.45791 | 2.8994 1.6614 3.6211 2.0749--------------------------------------------boomer married _Ideg_2 _Ideg_3 n<strong>on</strong>wht incomex= .60093583 .46657754 .5447861 .31149733 .22393048 10.695856The income effect is −0.028, <strong>and</strong> income is measured in $2500 increments, so a $2500 increase inincome brings about a exp(−0.028) = 0.97 change in the number of children.Exercise 4: Show that this is a 3% reducti<strong>on</strong> in the rate per $2500 increase in income. Show thatmarriage increases the rate by 47%.Zero Inflated Poiss<strong>on</strong> Data. Our data could c<strong>on</strong>tain an excess of zeros. By this we meanthat there are some zeros that would occur naturally but that the data could c<strong>on</strong>tain more zerosthan would be expected under Poiss<strong>on</strong> sampling. The excess zeros will affect the mean <strong>and</strong>variance. In particular, the mean/variance equivalence will no l<strong>on</strong>ger hold. For example, themean number of children in the U.S. is about 2 children per family. We can compare this to ourobserved distributi<strong>on</strong> (Fig. 1). However, we find that the unc<strong>on</strong>diti<strong>on</strong>al Poiss<strong>on</strong> model does notpredict the zeros as well as it predicts the other values. In this case, we may want to test thezero-inflated Poiss<strong>on</strong> model against the st<strong>and</strong>ard model.7


The probability functi<strong>on</strong> for zero-inflated count variable Y z can be specified as,⎧⎨p + (1 − p)e −µ if y = 0Pr(Y z = y|p, µ) =⎩(1 − p) e−µ µ yy!if y > 0(2)Unlike the Poiss<strong>on</strong> distributi<strong>on</strong> which is determined by a single parameter, µ, the ZIP distributi<strong>on</strong>is determined by two parameters µ (i.e., the parameter for the st<strong>and</strong>ard Poiss<strong>on</strong> distributi<strong>on</strong>) <strong>and</strong>p (i.e., the probability of being an inflated zero). The mean <strong>and</strong> variance of a zero-inflated countvariable are, respectively,E(Y z ) = µ(1 − p)<strong>and</strong>var(Y z ) = µ(1 − p)(1 + µp),which shows the nature of the overdispersi<strong>on</strong> via the mean to variance ratio when count data arecharacterized by an excess of 0’s. The variance is (1 + µp) times the mean.Estimati<strong>on</strong>.We can model the inflated-zero probability as a functi<strong>on</strong> of covariates,<strong>and</strong> the Poiss<strong>on</strong> rates as,The likelihood functi<strong>on</strong> can then be written as,p i = exp(x′ i α)1 + exp(x ′ i α),µ i = exp(x ′ iβ).log L =n∑log p i + (1 − p i ) exp(µ i ) + log{(1 − p i ) exp(−µ i )} + yµ i − log Γ(y i + 1). (3)i=1Interpretati<strong>on</strong>. Interpretati<strong>on</strong> of the results from this model is straightforward. We caninterpret the parameters in the inflated porti<strong>on</strong> as effects <strong>on</strong> the log odds of being zero. 2 Theother parameters have the usual Poiss<strong>on</strong> regressi<strong>on</strong> interpretati<strong>on</strong>. Using results from the nullmodel <strong>and</strong> applying the formulas above for the mean <strong>and</strong> variance, we get ̂µ = e 0.829 (0.846),which is 1.94 for the mean, where ̂α = −1.7069, so p = 0.153. Using the formula above, thevariance is 2.621. The mean is exactly reproduced by the model, the variance is closer to theestimate of 2.79 from the raw data than was the estimate from the Poiss<strong>on</strong> regressi<strong>on</strong> model.Example.Zero-inflated poiss<strong>on</strong> regressi<strong>on</strong> Number of obs = 1496N<strong>on</strong>zero obs = 1140Zero obs = 356Inflati<strong>on</strong> model = logit LR chi2(6) = 180.92Log likelihood = -2468.552 Prob > chi2 = 0.00002 Other probability distributi<strong>on</strong>s for zero-inflati<strong>on</strong> are possible.8


------------------------------------------------------------------------------childs | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------childs |boomer | -.4280169 .0415171 -10.31 0.000 -.509389 -.3466449married | .0851949 .0454331 1.88 0.061 -.0038523 .1742421_Ideg_2 | -.2272821 .0532389 -4.27 0.000 -.3316285 -.1229357_Ideg_3 | -.4033855 .0640953 -6.29 0.000 -.5290099 -.2777611n<strong>on</strong>wht | .1115982 .0490869 2.27 0.023 .0153897 .2078068income | -.0271459 .0083784 -3.24 0.001 -.0435673 -.0107244_c<strong>on</strong>s | 1.473333 .0846589 17.40 0.000 1.307404 1.639261-------------+----------------------------------------------------------------inflate |boomer | 1.486534 .2721459 5.46 0.000 .9531378 2.01993_Ideg_2 | .0095591 .3414682 0.03 0.978 -.6597061 .6788244_Ideg_3 | .7008029 .355765 1.97 0.049 .0035162 1.39809n<strong>on</strong>wht | -1.289767 .3455102 -3.73 0.000 -1.966955 -.6125796_c<strong>on</strong>s | -2.027314 .3265346 -6.21 0.000 -2.66731 -1.387317------------------------------------------------------------------------------Here we find that having less than a HS educati<strong>on</strong> ( Ideg 2) has a negligible effect <strong>on</strong> beingchildless, but a large effect <strong>on</strong> the rate of childbearing. We can use BIC/AIC criteria to check thismodel against the Poiss<strong>on</strong> regressi<strong>on</strong> model (these are not nested). It is worthwhile to examinehow well it fits the observed distributi<strong>on</strong> compared to the Poiss<strong>on</strong> model. Fig. 2 examines the nullmodel of each specificati<strong>on</strong>. Below is the code to make the graph.*do Poiss<strong>on</strong> Assumpti<strong>on</strong>s hold?* zero inflati<strong>on</strong> NULL model accounts for excess 0 counts --> compare to Poiss<strong>on</strong> NULL model*gen c<strong>on</strong>s = 1zip childs, inflate(c<strong>on</strong>s)matrix bzip = e(b)gen lamzip = exp(bzip[1,1])gen p1 = exp(bzip[1,2])/(1 + exp(bzip[1,2]))gen p0 = 1 - p1gen PrY_zip = p1 + p0*( exp(-lamzip)*lamzip^y/exp(lngamma(y+1)) ) if y==0replace PrY_zip = p0*( exp(-lamzip)*lamzip^y/exp(lngamma(y+1)) ) if (y ~=0)twoway (histogram y, discrete fracti<strong>on</strong> blcolor(black) bfcolor(n<strong>on</strong>e) ///legend(off) ytitle(Pr(Y=y)) ) (c<strong>on</strong>nected PrY_pois y, sort legend(off)) ///(c<strong>on</strong>nected PrY_zip y, sort legend(off))There is always the questi<strong>on</strong> of which variables should appear in each secti<strong>on</strong> of the zip model.It is best to retain the complete model specificati<strong>on</strong> in the Poiss<strong>on</strong> part <strong>and</strong> attempt to justify(theoretically) which variables should should be predictive of zeros, <strong>and</strong> included in the zeroinflati<strong>on</strong> porti<strong>on</strong> of the model. Then we can trim the model accordingly based <strong>on</strong> significancetests. How well does this model compare to the st<strong>and</strong>ard Poiss<strong>on</strong> model. They are not nested, sowe should use other ad-hoc criteria such as AIC or BIC. We can use L<strong>on</strong>g <strong>and</strong> Freese fitstatroutine. However, it is not hard to do this by h<strong>and</strong>. First, we need a definiti<strong>on</strong>. There are severalversi<strong>on</strong> of these statistics. Here we adopt <strong>on</strong>e that is <strong>on</strong> the same scale as −2 log L. Both of thesefit statistics provide penalties for over fitting the model. In the case of BIC, there is a sample sizepenalty also.AIC = −2 log L + 2df9


Pr(Y=y)0 .1 .2 .30 2 4 6 8yFigure 2: Comparis<strong>on</strong> of Zero-Inflated Poiss<strong>on</strong> <strong>and</strong> Poiss<strong>on</strong> <strong>and</strong> Observed Data<strong>and</strong>Coding this in Stata is straightforward.scalar ll = e(ll)scalar npar = e(k)scalar nobs = e(N)scalar AIC = -2*ll + 2*nparscalar BIC = -2*ll + log(nobs)*nparscalar list AICscalar list BICBIC = −2 log L + df log nComparing the models we get AIC (BIC) of 5102.2 (5139.4) for the Poiss<strong>on</strong> regressi<strong>on</strong> <strong>and</strong> AIC(BIC) 4963.1 (5032.1) for the zero-inflated model. Lower is better, so the additi<strong>on</strong>al work offitting the zero-inflated model is warranted.N<strong>on</strong>independent Events. It is reas<strong>on</strong>able to expect that there is a certain degree ofdependence between the individual births that c<strong>on</strong>stitute a woman’s completed fertility. We canbuild this dependency into the model in a couple of ways. First, we will rewrite the st<strong>and</strong>ardmodel as as r<strong>and</strong>om effects model with woman-specific unobserved heterogeneity.log(µ i ) = x i β + u iThis should look familiar. The individual-level r<strong>and</strong>om effect u has been added to the model, as itwas in the r<strong>and</strong>om effects logit models outlined earlier in the course. Statisticians have l<strong>on</strong>grecognized that if a marginal distributi<strong>on</strong> is combined with a particular prior distributi<strong>on</strong> for ther<strong>and</strong>om effect, the resulting distributi<strong>on</strong> is an entirely new distributi<strong>on</strong>. This is the case here, if wespecify a multiplicative factor v that raises or lowers the rate of childbearing for a women in oursample, <strong>and</strong> assume that it follows a gamma distributi<strong>on</strong>, the resulting marginal distributi<strong>on</strong> is nol<strong>on</strong>ger Poiss<strong>on</strong>, but negative binomial. For example, suppose we c<strong>on</strong>sider a multiplicative model.Let’s assume that v is distributed as gamma.µ i = exp(x i β)v iv ∼ gamma(α, β),10


such that E(v) = α/β <strong>and</strong> var(v) = α/β 2 . In Bayesian terms, the gamma distributi<strong>on</strong> is thec<strong>on</strong>jugate prior distributi<strong>on</strong> for the Poiss<strong>on</strong> (<strong>and</strong> other distributi<strong>on</strong>s in the exp<strong>on</strong>ential family).When the prior distributi<strong>on</strong> is combined with the Poiss<strong>on</strong> distributi<strong>on</strong> for y, c<strong>on</strong>diti<strong>on</strong>al <strong>on</strong> v, theresulting unc<strong>on</strong>diti<strong>on</strong>al distributi<strong>on</strong> (or posterior distributi<strong>on</strong>) of y is negative binomial. This wasdiscovered not l<strong>on</strong>g after the Poiss<strong>on</strong> distributi<strong>on</strong>, perhaps round 1909.For c<strong>on</strong>venience, we normalize the distributi<strong>on</strong> of v so that it has a mean of 1.0 as follows:E(v) = 1.0<strong>and</strong> this implies,so the resulting distributi<strong>on</strong> for v isvar(v) = 1/α,g(v) = αα v α−1Γ(α)exp(−αv) α > 0The likelihood of y for the ith woman c<strong>on</strong>diti<strong>on</strong>al <strong>on</strong> her r<strong>and</strong>om effect v i is.L(y i |v i )To obtain the marginal likelihood of y for the whole sample, we need to integrate over thedistributi<strong>on</strong> of v for each woman in our sample in order to “average” out the r<strong>and</strong>om effect.L m = ∏ ∫L(y i |v i )g(v)dvivThe resulting distributi<strong>on</strong> can be evaluated in closed formL m = ∏ iΓ(y i + 1 α ) [αµ] i yi[1] 1αΓ(y i + 1)Γ( 1 α ) 1 + αµ i 1 + αµ iExercise 5. Derive the negative binomial distributi<strong>on</strong> as a mixture of a Poiss<strong>on</strong> <strong>and</strong> gammadistributi<strong>on</strong>.A negative binomial variable has mean E(Y ) = µ <strong>and</strong> variance var(Y ) = µ + µα −1 . The loglikelihood functi<strong>on</strong> for this model is,log L =n∑log(1 − α) − log(1 + αy i ) + y i log µ i − (y + 1 α ) log(1 − αµ i) − log Γ(y i + 1).i=1It is a bit more difficult to optimize this model, but it is straightforward. All major statisticalsoftware (SAS, R, Stata have routines for estimating this model.Estimati<strong>on</strong>. We maximize this likelihood with respect to the parameters β <strong>and</strong> α, theoverdispersi<strong>on</strong> parameter. This is the st<strong>and</strong>ard deviati<strong>on</strong> of the gamma-distributed r<strong>and</strong>om effectmenti<strong>on</strong>ed earlier (note that we have assumed this effect to have a mean of 1 <strong>and</strong> have estimatedits variance using the current sample of women). In fact, the negative binomial is identical to anindividual-level r<strong>and</strong>om-effects Poiss<strong>on</strong> regressi<strong>on</strong>. We get identical results using either nbreg orxtpois, however we must supply a resp<strong>on</strong>dent ID number in order to identify the clusters (eachof size 1) to be used in the r<strong>and</strong>om effects specificati<strong>on</strong>.11


. nbreg childsNegative binomial regressi<strong>on</strong> Number of obs = 1501LR chi2(0) = 0.00Prob > chi2 = .Log likelihood = -2710.1185 Pseudo R2 = 0.0000------------------------------------------------------------------------------childs | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------_c<strong>on</strong>s | .6623651 .0224627 29.49 0.000 .618339 .7063912-------------+----------------------------------------------------------------/lnalpha | -1.419918 .1278525 -1.670504 -1.169331-------------+----------------------------------------------------------------alpha | .2417339 .0309063 .1881522 .3105746------------------------------------------------------------------------------Likelihood-ratio test of alpha=0: chibar2(01) = 108.81 Prob>=chibar2 = 0.000. gen idnum = _n. xtpois childs, i(idnum)R<strong>and</strong>om effects u_i ~ Gamma Obs per group: min = 1avg = 1.0max = 1Wald chi2(0) = .Log likelihood = -2710.1185 Prob > chi2 = .------------------------------------------------------------------------------childs | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------_c<strong>on</strong>s | .6623651 .0224627 29.49 0.000 .618339 .7063912-------------+----------------------------------------------------------------/lnalpha | -1.419918 .1278525 -1.670504 -1.169331-------------+----------------------------------------------------------------alpha | .2417339 .0309063 .1881522 .3105746------------------------------------------------------------------------------Likelihood-ratio test of alpha=0: chibar2(01) = 108.81 Prob>=chibar2 = 0.000The test <strong>on</strong> the alpha parameter is a test of overdispersi<strong>on</strong>. This suggests an improved fit fromthis model. The variance of the r<strong>and</strong>om effect v is 0.24 2 (given by alpha). As this goes to 0, themodel approaches the usual Poiss<strong>on</strong> regressi<strong>on</strong> model. In fact, here the results do not changemuch from the original model. Note that the alpha in the Stata output is not the same α fromthe gamma distributi<strong>on</strong>. Recall that the α parameter from the gamma distributi<strong>on</strong> is thereciprocal of the variance of the r<strong>and</strong>om effect, i.e., the reciprocal of the quantity reported asalpha by Stata, or 4.14. recall that the observed mean <strong>and</strong> variance of y are, respectively, 1.94<strong>and</strong> 2.79. Under the negative binomial regressi<strong>on</strong> model, the estimated mean is the same <strong>and</strong> thevariance is ̂µ + ̂µ/α = 1.94 + 1.94(0.242) = 2.41, which is getting a good deal closer to 2.79.Likelihood ratio tests can also be carried out of the current model against the null model. Thelog L from the corresp<strong>on</strong>ding Poiss<strong>on</strong> regressi<strong>on</strong> model is −2764.52 compared to a log L of−2710.12 for the negative binomial model. The likelihood ratio χ 2 for is 108.81 with 1 df, whichis exactly what is reported in the output. Note that the st<strong>and</strong>ard errors are larger than in theoriginal Poiss<strong>on</strong> regressi<strong>on</strong> model due to this adjustment to the model, so more c<strong>on</strong>servativesignificance tests will follow.12


Proporti<strong>on</strong>0 .1 .2 .30 2 4 6 8Number of Childrenmean = 1.939; overdispersi<strong>on</strong> = .2417observed proporti<strong>on</strong>poiss<strong>on</strong> probneg binom probFigure 3: Negative Binomial <strong>and</strong> Poiss<strong>on</strong> Fits vs Observed DataWe can further compare the Poiss<strong>on</strong> <strong>and</strong> negative binomial fit from the null models againstthe observed distributi<strong>on</strong> of counts using nbvargr (download this ado file) as shown in Fig. 3. Wecan see that accounting for the overdispersi<strong>on</strong> also tends to adjust for the excess zeros. Moreover,for these data, the r<strong>and</strong>om-effects Poiss<strong>on</strong>/negative binomial regressi<strong>on</strong> model appears to fitcounts of 1 more accurately (compare to Fig. 2).Interpreting R<strong>and</strong>om Effects Models. We can think of v as woman-specific unobservedheterogeneity not captured by measured covariates. This r<strong>and</strong>om variable may also be viewed indemographic terms as fecundability, in the case of fertility. In mortality research the unobservedmortality propensity is referred to as frailty. Whatever we call it, this captures the dependence inthe individual (Bernoulli) events that c<strong>on</strong>tribute to the observed counts. An average value of v is1.0. Values below (above) 1.0 suggest lower (higher) fertility propensity. We have assumed amean of 1.0 for this variable <strong>and</strong> the model has estimated the variance in its distributi<strong>on</strong>.Because, in principle, each subject has a specific value of unobserved heterogeneity, the estimateof the β’s must be interpreted c<strong>on</strong>diti<strong>on</strong>al <strong>on</strong> the r<strong>and</strong>om effects in the populati<strong>on</strong>. When thevariance in unobserved heterogeneity is large the coefficients can differ c<strong>on</strong>siderably from those ofthe unc<strong>on</strong>diti<strong>on</strong>al model. In this case the variance in unobserved heterogeneity is small, so theestimates resemble those in the unc<strong>on</strong>diti<strong>on</strong>al model. However, it should be remembered thatestimates are c<strong>on</strong>diti<strong>on</strong>al <strong>on</strong> an unmeasured subject-level variable, so they should not beinterpreted in the same way as those from the st<strong>and</strong>ard Poiss<strong>on</strong> regressi<strong>on</strong> model, which areunc<strong>on</strong>diti<strong>on</strong>al or populati<strong>on</strong> average estimates.A remaining questi<strong>on</strong> is whether it would be possible to obtain an estimate of a particularwoman’s fertility propensity given her data. After estimating the model, we know somethingabout the distributi<strong>on</strong> of the r<strong>and</strong>om effects (at least according to the prior distributi<strong>on</strong> we haveassumed). The Bayesian paradigm is useful for addressing questi<strong>on</strong>s about particular values ofthe r<strong>and</strong>om effects. Specifically, given a gamma prior distributi<strong>on</strong> <strong>on</strong> v, g(v), <strong>and</strong> the likelihood ofthe data c<strong>on</strong>diti<strong>on</strong>al <strong>on</strong> v, f(y|v), the posterior distributi<strong>on</strong> of v isp(v|y) ∝ g(v)f(y|v)With this informati<strong>on</strong>, it is possible to obtain the c<strong>on</strong>diti<strong>on</strong>al expectati<strong>on</strong> needed to answer the13


questi<strong>on</strong> above.∫vE(v|y) = ∫vg(v)f(y|v)dvv g(v)f(y|v)dvThis is referred to the expected a-posteriori estimate of the r<strong>and</strong>om effect, or empirical Bayesestimate for short. In many cases this expressi<strong>on</strong> would have to be solved numerically, or othermethods might need to be used to evaluate it. It turns out to be c<strong>on</strong>venient that that a gammaprior was used. In this case, the posterior distributi<strong>on</strong> is also gamma, but with shape parameterα + y i <strong>and</strong> scale parameter α + µ. If we had observed j multiple events <strong>on</strong> the same individual orcounts am<strong>on</strong>g clusters of j dependent units of analysis, the shape <strong>and</strong> scale parameters would beα + ∑ j y ij <strong>and</strong> α + ∑ j µ ij. Thus, in the case of these types of data structures, we have amultilevel Poiss<strong>on</strong> model. 3Next we fit the full model. Note that the estimated variance in the r<strong>and</strong>om effect is 0.073,which implies almost no variati<strong>on</strong> in the r<strong>and</strong>om effect distributi<strong>on</strong>. A proporti<strong>on</strong>ate reducti<strong>on</strong> inerror statistic can be computed to compare the change in the proporti<strong>on</strong> of variance explained bythis model (indexed by 1) compared to the null model (indexed by 0).R 2 p = var(v) 0 − var(v) 1var(v) 0= 0.242 − 0.073.242 = 0.698This suggests that about 70% of the variance in number of children is accounted for by thewoman-specific variables included in the full model. To gauge the correlati<strong>on</strong> in the individualevents c<strong>on</strong>tributing to total fertility we can c<strong>on</strong>struct a measure a measure asICC =var(u)var(u) + var(y) = 0.0730.073 + 2 = 0.035which is not much. These kinds of r<strong>and</strong>om effects models are much more useful when we haverepeated measures <strong>on</strong> the same set of resp<strong>on</strong>dents (i.e., panel data) or clustered counts data fromindividuals nested in wider c<strong>on</strong>texts (fertility of siblings etc.).R<strong>and</strong>om-effects Poiss<strong>on</strong> regressi<strong>on</strong> Number of obs = 1496Log likelihood = -2537.2144 Prob > chi2 = 0.0000------------------------------------------------------------------------------childs | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------boomer | -.5613786 .041419 -13.55 0.000 -.6425583 -.4801989married | .3626 .0438245 8.27 0.000 .2767055 .4484945_Ideg_2 | -.2270964 .055744 -4.07 0.000 -.3363527 -.1178401_Ideg_3 | -.4895984 .065998 -7.42 0.000 -.6189521 -.3602448n<strong>on</strong>wht | .2258126 .0484695 4.66 0.000 .1308141 .3208111income | -.0287399 .008614 -3.34 0.001 -.045623 -.0118567_c<strong>on</strong>s | 1.291956 .0876294 14.74 0.000 1.120205 1.463706-------------+----------------------------------------------------------------/lnalpha | -2.615761 .3047629 -3.213085 -2.018436-------------+----------------------------------------------------------------alpha | .0731122 .0222819 .0402323 .1328631------------------------------------------------------------------------------Likelihood-ratio test of alpha=0: chibar2(01) = 13.78 Prob>=chibar2 = 0.0003 This differs from the usual multilevel models that assume normal or multivariate normal r<strong>and</strong>om effects.14


Density0 1 2 3 4 5.8 1 1.2 1.4vhat1Figure 4: Distributi<strong>on</strong> of Empirical Bayes Estimates of R<strong>and</strong>om EffectThe distributi<strong>on</strong> of r<strong>and</strong>om effects from this model can be shown a couple of ways. A graphicalrepresentati<strong>on</strong> is given in Fig. 4. The comm<strong>and</strong>s to fit the model <strong>and</strong> compute the subject-specificr<strong>and</strong>om effect estimates are given below.xi: xtpois childs boomer married i.deg n<strong>on</strong>wht income, i(idnum)xtpois, irrpredict xb, xbgen lambda1 = exp(xb)scalar alpha = 1/(exp([lnalpha]_c<strong>on</strong>s))gen vhat1 = (y + alpha)/(lambda1 + alpha)sum vhat1, detailhistogram vhat1-------------------------------------------------------------vhat1-------------------------------------------------------------Percentiles Smallest1% .822471 .75523535% .8726416 .780816610% .8964401 .7998554 Obs 149625% .9296511 .8005092 Sum of Wgt. 149650% .9884257 Mean 1Largest Std. Dev. .094407175% 1.053683 1.38304790% 1.123672 1.383047 Variance .008912795% 1.170076 1.420766 Skewness .803510299% 1.288011 1.431209 Kurtosis 4.09398Thus, based <strong>on</strong> this data, the propensity for childbearing (net of covariates) is about 1.25 timeshigher for women at the 90% percentile of the r<strong>and</strong>om effects distributi<strong>on</strong> relative to those at the10th percentile, which is not much of a difference.Extensi<strong>on</strong>s. It is also possible to combine both zero-inflati<strong>on</strong> with the negative binomialdistributi<strong>on</strong>al assumpti<strong>on</strong>. What you get, of course, is the zero-inflated negative binomial model(zinb). There are many extensi<strong>on</strong>s of these extensi<strong>on</strong>s that have not made their way into canned15


packages. They are very useful for particular applicati<strong>on</strong>s. When this model is fit to these data,the overdispersi<strong>on</strong> parameter is no l<strong>on</strong>ger statistically significant.Exercise 6: Fit a comm<strong>on</strong> model specificati<strong>on</strong> (i.e., same set of variables) for all these models <strong>and</strong>compare <strong>on</strong> the basis of likelihood ratio tests (when possible) <strong>and</strong> BIC criteria (for n<strong>on</strong>-nestedmodels). Which model emerges as best for this data.Models for Rates in Time. Let t 1 , t 2 , . . . , t n denote the waiting times until an event occursfor a sample of n individuals, with distributi<strong>on</strong> functi<strong>on</strong> F (t) = Pr(T < t) <strong>and</strong> probability densityfuncti<strong>on</strong> f(t), where we assume that T is a c<strong>on</strong>tinuous r<strong>and</strong>om variable (T > 0). The hazard rate,intensity functi<strong>on</strong>, or failure rate is denoted by µ(t). The hazard rate is the instantaneousprobability of an event in the interval [t, t + ∆t], given that the event has not already occurredbefore the beginning of the interval. More formally, the hazard rate is the limit of a c<strong>on</strong>diti<strong>on</strong>alprobability (or transiti<strong>on</strong> probability)1µ(t) = lim Pr[t ≤ T < t + ∆t | T ≥ t]. (4)∆t→0 ∆t∆t>0The probability of surviving the interval [t, t + ∆t] is given by the survival functi<strong>on</strong>.S(t) = Pr[T > t] = 1 − F (t) =∫ ∞tf(u)du. (5)If we assume that the r<strong>and</strong>om variable for waiting-time (T ) follows an exp<strong>on</strong>ential distributi<strong>on</strong>with density functi<strong>on</strong>, f(t) = µ exp(−µt), the expressi<strong>on</strong>s for the survival functi<strong>on</strong> in Eq. 5 isS(t) = exp(−µt). The expressi<strong>on</strong> for the hazard rate of Eq. 4 is defined is the ratio, µ(t) =f(t)/S(t) = µ. For the exp<strong>on</strong>ential distributi<strong>on</strong>, this implies a c<strong>on</strong>stant hazard over time (i.e., notdepending <strong>on</strong> t).C<strong>on</strong>sider a Poiss<strong>on</strong> variate d that denotes the number of events occurring in an observati<strong>on</strong>window (exposure period) of length t. If events can occur repeatedly over time, <strong>and</strong> if the timesbetween events are independent exp<strong>on</strong>ential variables with mean time to event occurrence is givenby E(T ) = 1/µ (i.e., a time-homogeneous Poiss<strong>on</strong> process), then the probability of d events in atime-interval of length t follows a Poiss<strong>on</strong> distributi<strong>on</strong>,Pr(d | µ, t) = (tµ)d exp(−tµ). (6)d!The mean number of events in time-interval t is, λ = tµ. For a sample of size n, we can model thec<strong>on</strong>diti<strong>on</strong>al mean count as a functi<strong>on</strong> of independent variables, so that for the ith individual, theexpected number of events in time-interval t i isµ i = t i λ i = t i exp(x ′ iβ).The likelihood is a product of the individual Poiss<strong>on</strong> probabilities in Eq. 6, <strong>and</strong> is proporti<strong>on</strong>al toL =n∏(t i λ i ) d iexp(−t i λ i ), (7)i=1which is the kernel of a Poiss<strong>on</strong> likelihood.It will often be the case that we do not observe the event times for some individuals in thesample. In this case, the event times are said to be right censored, <strong>and</strong> the Poiss<strong>on</strong> variate is16


coded 0, otherwise it is coded 1 (event). C<strong>on</strong>sider the likelihood for an exp<strong>on</strong>ential model, whichis appropriate for the r<strong>and</strong>om variable T , <strong>and</strong> deals with censoring.L ==n∏i=1λ d iiexp(−t i λ i )n∏exp(x ′ iβ) d iexp{−t i exp(x ′ iβ)}.i=1(8)It turns out that maximizing the Poiss<strong>on</strong> likelihood of Eq. 7 yields the same MLEs as maximizingthe the exp<strong>on</strong>ential likelihood in Eq. 8. 4 This can be seen by rewriting Eq. 8 in a somewhatdifferent form asL =n∏n∏(t i λ i ) d iexp(−t i λ i )/i=1i=1t d ii . (9)The term in the denominator of Eq. 9 does not depend <strong>on</strong> unknown parameters <strong>and</strong> can beignored in estimati<strong>on</strong>. 5 This implies that the exp<strong>on</strong>ential hazard rate model can be estimatedusing a Poiss<strong>on</strong> regressi<strong>on</strong> model for counts. Taking logs of the Poiss<strong>on</strong> mean (µ i = t i λ i ), weobtain the following loglinear modellog µ i = log t i + log λ i= log t i + x ′ iβ.(10)The term, log t i appearing in this model has a fixed coefficient of <strong>on</strong>e <strong>and</strong> is known as an “offset.”To fit these models using software for Poiss<strong>on</strong> regressi<strong>on</strong>, we declare the log exposure as an offsetterm in the model. The table below shows the cumulative proporti<strong>on</strong> surviving (i.e., notexperiencing) an event (premarital birth) by a particular age. Resp<strong>on</strong>dents can be removed fromrisk of this event through marriage. In this case, by age 20, about 20 percent of the sample hasexperienced a premarital birth; by age 35 about 45% have.. ltable agecens pbirBeg.Std.Interval Total Deaths Lost Survival Error [95% C<strong>on</strong>f. Int.]-------------------------------------------------------------------------------14 15 1345 8 1 0.9940 0.0021 0.9881 0.997015 16 1336 19 3 0.9799 0.0038 0.9708 0.986216 17 1314 41 10 0.9492 0.0060 0.9360 0.959717 18 1263 51 20 0.9106 0.0078 0.8939 0.924718 19 1192 46 68 0.8744 0.0092 0.8552 0.891219 20 1078 49 82 0.8331 0.0105 0.8114 0.852520 21 947 44 66 0.7930 0.0116 0.7692 0.814621 22 837 34 80 0.7591 0.0124 0.7337 0.782522 23 723 21 91 0.7356 0.0131 0.7090 0.760223 24 611 20 85 0.7097 0.0138 0.6816 0.735924 25 506 15 65 0.6873 0.0146 0.6577 0.714825 26 426 6 39 0.6771 0.0149 0.6469 0.705426 27 381 7 42 0.6639 0.0154 0.6327 0.693227 28 332 11 63 0.6396 0.0165 0.6062 0.671028 29 258 8 60 0.6172 0.0177 0.5814 0.65094 The d! term appearing in the denominator does not depend <strong>on</strong> unknown parameters, so it can be effectivelyignored for estimating β.5 For the statistical relati<strong>on</strong>ship to hold, the events must corresp<strong>on</strong>d to n independent, time-homogeneous Poiss<strong>on</strong>processes with rates, λ i. The number of events for the ith process in a time interval of length t follows a Poiss<strong>on</strong>distributi<strong>on</strong> with mean t i λ i . The observed number of events (d i ) is not 0,1,2,3,. . ., but 0 or 1. This is equivalent toobserving the ith process until either a first event occurs, or a fixed time, t i, has elapsed without the event occurring.17


29 30 190 4 49 0.6023 0.0188 0.5643 0.638130 31 137 2 35 0.5922 0.0198 0.5523 0.629831 32 100 3 36 0.5705 0.0227 0.5248 0.613632 33 61 2 31 0.5455 0.0278 0.4894 0.598033 34 28 0 16 0.5455 0.0278 0.4894 0.598034 35 12 0 11 0.5455 0.0278 0.4894 0.598035 36 1 0 1 0.5455 0.0278 0.4894 0.5980-------------------------------------------------------------------------------We can use the methods described above to analyze individual data like these. We can obtaingood results using log linear models for tables. C<strong>on</strong>sider the following table of race by age ofevent (agecat = 1 if < 20, 2 otherwise ) that was derived from these data.. list race agecat D T+------------------------------------+race agecat D T--------------------------------------1. Hispanic 1 31 5082.682. Hispanic 2 36 896.433. Black 1 143 8252.044. Black 2 103 1336.375. White 1 40 12801.896. White 2 38 2773.93+------------------------------------+The variables D i <strong>and</strong> R i denote the number of events in each category <strong>and</strong> the pers<strong>on</strong> years ofexposure to risk for the subjects in that category. The empirical rates are then. We assume D i tobe a Poiss<strong>on</strong> variable with mean µ i . Using the covariates agecat <strong>and</strong> race as x, we can write aloglinear model in log µ ilog(µ i )λ i R i = x ′ iβ + log R iwith log R i as an “offset” term whose coefficient is fixed at 1.. xi: glm D i.agecat i.race, f(p) offset(logT)Generalized linear models No. of obs = 6Optimizati<strong>on</strong> : ML: Newt<strong>on</strong>-Raphs<strong>on</strong> Residual df = 2Scale parameter = 1Deviance = 2.167700605 (1/df) Deviance = 1.08385Pears<strong>on</strong> = 2.176320642 (1/df) Pears<strong>on</strong> = 1.08816Variance functi<strong>on</strong>: V(u) = u[Poiss<strong>on</strong>]Link functi<strong>on</strong> : g(u) = ln(u) [Log]St<strong>and</strong>ard errors : OIMLog likelihood = -18.57892505 AIC = 7.526308BIC = -1.415818333------------------------------------------------------------------------------D | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]-------------+----------------------------------------------------------------_Iagecat_2 | 1.557508 .1017639 15.31 0.000 1.358055 1.756962_Irace_2 | .8540021 .1378224 6.20 0.000 .5838751 1.124129_Irace_3 | -.870819 .1666529 -5.23 0.000 -1.197453 -.5441853_c<strong>on</strong>s | -4.937157 .1306748 -37.78 0.000 -5.193275 -4.68103918


logT | (offset)------------------------------------------------------------------------------The baseline rate of premarital birth is exp(−4.937) = 0.007 or about 7 premarital births perthous<strong>and</strong> per year of age (after age 11). This event is exp(1.557) = 4.74 time higher for womenover the age of 20. The comparis<strong>on</strong> category for race is Hispanic, so we find rates that are about58% lower for whites <strong>and</strong> about 2.34 times higher for blacks at any age (relative to Hispanics).Because the rates are proporti<strong>on</strong>ally higher or lower depending <strong>on</strong> the values of the covariates inthe model, this model is referred to as a proporti<strong>on</strong>al hazards model.Exercise 7: Using the table above, fit a model that allows the risk to vary by race depending <strong>on</strong>age. That is, fit a model in which there are n<strong>on</strong>proporti<strong>on</strong>al effects of race <strong>and</strong> determine if thereis evidence of n<strong>on</strong>proporti<strong>on</strong>ality.Hazard Models with Cluster-level Frailty We can combine ideas from the previoussecti<strong>on</strong> to model for events. Suppose we c<strong>on</strong>sider a sampling plan that records age at firstpremarital birth for the subsample of white women in the NLSY. The NLSY includes resp<strong>on</strong>dents<strong>and</strong> up to 6 sibling resp<strong>on</strong>dents in the data. We can use this data structure to illustrate multilevelhazard rate models. Durati<strong>on</strong>, t ij denotes the age at first premarital birth for the ith individualin the jth family. The event-status indicator d ij = 1 denotes a premarital birth at time t ij for theith individual in the jth family. A proporti<strong>on</strong>al hazard model accounting for cluster-level frailty ish(t ij ) = h 0 (t ij ) exp{x ij (t ij )β}v j ,where we assume that v j follows a gamma distributi<strong>on</strong> with mean 1 <strong>and</strong> variance φ = 1/α asbefore. The model can be estimated using the EM-algorithm, which treats v j as a known relativerisk. At each EM iterati<strong>on</strong>, our estimate of v j is updated using current results from a loglinearmodel (Poiss<strong>on</strong> regressi<strong>on</strong>) <strong>and</strong> our current estimates of α as follows:̂v j = ̂α + d +ĵα + H +jwhere d +j is the total number of events in the jth cluster <strong>and</strong> H +j is the sum of the integratedhazards in the jth cluster. The imputed frailties from each iterati<strong>on</strong> are then treated as part ofthe “offset” term in a Poiss<strong>on</strong> regressi<strong>on</strong>. The model can also be estimated directly using ML <strong>and</strong>a Poiss<strong>on</strong> regressi<strong>on</strong> model for panel data (xtpois) in Stata.Stata Example Script.set memory 5000#delimit;infile v1 intr v3 d ts tf v7 cid c9 n<strong>on</strong>intm12 m13 mmed mwrk14 adjinc msinc nsibs southurban fundpro cathol weekly profam selfest test using 6W1.dat;gen offs=log(tf-ts);gen a1 = intr==1;gen a2 = intr==2;gen a3 = intr==3;gen a4 = intr==4;gen a5 = intr==5;19


gen a6 = intr==6;egen id = fill(1 2);xtpois d a1 a2 a3 a4 a5 a6 n<strong>on</strong>int m12 adjinc nsibssouth urban fundpro cathol profam selfest,offset(offs) i(cid) noc<strong>on</strong>s;Results.R<strong>and</strong>om-effects Poiss<strong>on</strong> Number of obs = 9359Group variable (i) : cid Number of groups = 1936R<strong>and</strong>om effects u_i ~ Gamma Obs per group: min = 1avg = 4.8max = 23Wald chi2(16) = 2679.28Log likelihood = -1144.5603 Prob > chi2 = 0.0000------------------------------------------------------------------------------d | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]---------+--------------------------------------------------------------------a1 | -4.854541 .8016085 -6.056 0.000 -6.425665 -3.283418a2 | -2.649949 .7782366 -3.405 0.001 -4.175264 -1.124633a3 | -2.143511 .7772958 -2.758 0.006 -3.666983 -.6200393a4 | -2.185339 .779154 -2.805 0.005 -3.712453 -.6582254a5 | -2.053847 .794188 -2.586 0.010 -3.610427 -.4972672a6 | -2.106344 .8363073 -2.519 0.012 -3.745476 -.4672121n<strong>on</strong>int | .9371541 .1693922 5.532 0.000 .6051515 1.269157m12 | .8770516 .1649014 5.319 0.000 .5538508 1.200252adjinc | -.0000559 .0000168 -3.319 0.001 -.0000889 -.0000229nsibs | .0572207 .0386449 1.481 0.139 -.0185219 .1329633south | -.7573547 .2066934 -3.664 0.000 -1.162466 -.3522431urban | .2083464 .1893081 1.101 0.271 -.1626906 .5793835fundpro | .7688226 .2057846 3.736 0.000 .3654923 1.172153cathol | -.2760485 .1953083 -1.413 0.158 -.6588457 .1067487profam | .5185107 .1837613 2.822 0.005 .158345 .8786763selfest | -1.034571 .1925264 -5.374 0.000 -1.411916 -.6572266offs | (offset)---------+--------------------------------------------------------------------/lnalpha | .2281934 .4508365 -.6554299 1.111817---------+--------------------------------------------------------------------alpha | 1.256328 .5663986 .5192188 3.039876------------------------------------------------------------------------------Likelihood ratio test of alpha=0: chi2(1) = 9.34 Prob > chi2 = 0.0022Imputed Frailties. Suppose you want to know more about the distributi<strong>on</strong> of frailty or wantedto know which observed family-level traits might have an impact <strong>on</strong> a family’s frailty? 6 Theexpressi<strong>on</strong> for ̂v j can be evaluated using Stata as follows:set memory 5000#delimit;6 By assumpti<strong>on</strong>, the x’s are uncorrelated with the v’s so statement is c<strong>on</strong>tradictory. That is, under our model’sassumpti<strong>on</strong>s, frailty should not impact other covariates. However, the variance in frailty could be affected by theinclusi<strong>on</strong> of covariates.20


infile v1 intr v3 d ts tf v7 cid c9 n<strong>on</strong>intm12 m13 mmed mwrk14 adjinc msinc nsibs southurban fundpro cathol weekly profam selfest test using 6W1.dat;gen offs=log(tf-ts);gen a1 = intr==1;gen a2 = intr==2;gen a3 = intr==3;gen a4 = intr==4;gen a5 = intr==5;gen a6 = intr==6;egen id = fill(1 2);/* Fit baseline hazard <strong>and</strong> retrieve imputed frailties */xtpois d a1 a2 a3 a4 a5 a6, offset(offs) i(cid) noc<strong>on</strong>s;predict xbeta, xb;gen t = tf - ts;gen Ihaz=t*exp(xbeta - offs);gen a=1/e(alpha);collapse (mean) p = a (rawsum) H=Ihaz (rawsum) D=d, by(cid);gen w = (D+p)/(H+p);summarize w, detail;Here, I let w denote each family’s frailty. We use the collapse comm<strong>and</strong> to get the desiredquantities. Note that p corresp<strong>on</strong>ds to 1/φ, which is called alpha in Stata. Since it is c<strong>on</strong>stantover clusters, we just take the mean. Other quantities H <strong>and</strong> D corresp<strong>on</strong>d to the sum of acluster’s integrated hazards <strong>and</strong> a cluster’s total number of events. This results in a new data setc<strong>on</strong>taining cluster-specific frailties. You could also include observed characteristics of the cluster,in order to see how these would affect frailty in a regressi<strong>on</strong> model. Since frailty must be > 0, agood model might be a loglinear regressi<strong>on</strong> model such as:log v j = x j β + ε.Here, we asked <strong>on</strong>ly for the descriptive summary of frailty from a proporti<strong>on</strong>al hazards modelthat fits <strong>on</strong>ly the baseline hazard. The quantiles are probably most relevant.Log likelihood = -1248.9537 Prob > chi2 = 0.0000------------------------------------------------------------------------------d | Coef. Std. Err. z P>|z| [95% C<strong>on</strong>f. Interval]---------+--------------------------------------------------------------------a1 | -6.45445 .2405448 -26.833 0.000 -6.925909 -5.982991a2 | -4.282342 .1458461 -29.362 0.000 -4.568195 -3.996489a3 | -3.890296 .1553783 -25.038 0.000 -4.194832 -3.58576a4 | -3.995447 .1894533 -21.089 0.000 -4.366768 -3.624125a5 | -3.816231 .2659806 -14.348 0.000 -4.337543 -3.294918a6 | -3.749099 .3883572 -9.654 0.000 -4.510265 -2.987933offs | (offset)---------+--------------------------------------------------------------------/lnalpha | 1.41132 .3308904 .7627865 2.059853---------+--------------------------------------------------------------------alpha | 4.101364 1.357102 2.144243 7.844815------------------------------------------------------------------------------Likelihood ratio test of alpha=0: chi2(1) = 27.61 Prob > chi2 = 0.0000w21


-------------------------------------------------------------Percentiles Smallest1% .2922498 .18123255% .3799957 .200074810% .4219338 .2040029 Obs 193625% .519807 .2328834 Sum of Wgt. 193650% .6829988 Mean 1Largest Std. Dev. 1.07957475% .8374014 6.5031590% 2.366105 7.647584 Variance 1.1654895% 4.012737 8.142287 Skewness 2.92292699% 4.941901 8.929889 Kurtosis 11.43697Note that empirically the estimated mean frailty is 1 <strong>and</strong> the variance is 1.165.22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!