29.07.2014 Views

Struggles with Survey Weighting and Regression Modeling paper

Struggles with Survey Weighting and Regression Modeling paper

Struggles with Survey Weighting and Regression Modeling paper

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

162 A. GELMAN<br />

We can combine (10) <strong>and</strong>(11) to express each ˆθ j as a<br />

linear combination of the cell means ȳ k ,<br />

J∑<br />

ˆθ k = c kj ȳ j .<br />

j=1<br />

After some algebra, we can write these coefficients as<br />

⎧<br />

σy<br />

2 ⎪⎨ A k A j /A, for j ≠ k,<br />

c kj =<br />

n k<br />

⎪⎩ σθ 2A k + σ y<br />

2 A 2 k<br />

, for j = k,<br />

n k<br />

where<br />

A k =<br />

J∑ 1<br />

σy 2/n k + σθ<br />

2 .<br />

k=1<br />

The payoff now comes in computing the poststratified<br />

estimate,<br />

J∑<br />

ˆθ PS = N k ˆθ/N<br />

k=1<br />

=<br />

J∑<br />

J∑<br />

k=1 j=1<br />

N k<br />

N c kj y¯<br />

j ,<br />

equating this to ∑ j<br />

j=1 W j y¯<br />

j <strong>and</strong> thus deriving the cell<br />

weights,<br />

W j =<br />

J∑<br />

k=1<br />

N k<br />

N c kj<br />

= A j<br />

[<br />

Nj<br />

N σ 2 θ + J ∑<br />

k=1<br />

N k<br />

N<br />

A k<br />

A<br />

σy<br />

2 ]<br />

.<br />

n k<br />

The implicit unit weights are then w pop<br />

j<br />

= (n/n j )W j ,<br />

or<br />

(12)<br />

w pop<br />

j<br />

[<br />

n Nj<br />

= A j<br />

n j N σ θ 2 + σ y<br />

2<br />

AN<br />

n<br />

=<br />

σ 2 y + n j σ 2 θ<br />

×<br />

[<br />

Nj<br />

N σ θ 2 + σ y<br />

2<br />

N<br />

J∑<br />

k=1<br />

]<br />

N k<br />

A k<br />

n k<br />

∑ Jk=1<br />

N k /(σ 2 y + n kσ 2 θ )<br />

∑ Jk=1<br />

n k /(σ 2 y + n kσ 2 θ ) ].<br />

The ratio of sums in (12) is a constant (given the fitted<br />

model) that does not depend on j. Let us approximate<br />

it by N/n (which is appropriate if the sample<br />

proportions n k /N k are independent of the group sizes<br />

N k ). Under this approximation, the unit weights can be<br />

written as<br />

approximate w pop<br />

j<br />

(13)<br />

n j /σy<br />

2 =<br />

n j /σy 2 + 1/σ θ<br />

2<br />

· Nj /N<br />

n j /n<br />

1/σθ<br />

2 +<br />

n j /σy 2 + 1/σ θ<br />

2 · 1,<br />

which is a weighted average of the full poststratification<br />

unit weight, N k/N<br />

n k /n<br />

, <strong>and</strong> the completely smoothed<br />

weight of 1. Hierarchical poststratification is thus approximately<br />

equivalent to a shrinkage of weights by<br />

the same factors as in the shrinkage of the parameter<br />

estimates (10).<br />

Thus, as <strong>with</strong> hierarchical regression models in general,<br />

the amount of shrinkage of the weights depends<br />

on the between- <strong>and</strong> <strong>with</strong>in-stratum variance in the outcome<br />

of interest, y.<br />

Other hierarchical models. Lazzeroni <strong>and</strong> Little<br />

(1998) <strong>and</strong> Elliott <strong>and</strong> Little (2000) discuss various<br />

hierarchical linear regression models, including combinations<br />

of the two models described above (i.e., a<br />

hierarchical regression <strong>with</strong> a cell-level variance component)<br />

<strong>and</strong> models <strong>with</strong> correlations between adjacent<br />

cell categories for ordered predictors.<br />

Another natural generalization is to use logistic regression<br />

for binary inputs. Unfortunately, when we<br />

move away from linear regression, we ab<strong>and</strong>on the<br />

translation invariance of the parameter estimates (i.e.,<br />

the property that adding a constant to all the data affects<br />

only the constant term <strong>and</strong> none of the other regression<br />

coefficients). As a result, for logistic regression,<br />

the poststratified estimate ˆθ PS is no longer a weighted<br />

average of the data, even after controlling for the variance<br />

parameters in the model. However, we suspect<br />

that the model could be linearized, yielding approximate<br />

weights.<br />

3.3 Properties of the Model-Based Poststratified<br />

Estimates<br />

St<strong>and</strong>ard errors. The variance of the poststratified<br />

estimate, ignoring sampling variation in X, can be expressed<br />

using various formulas,<br />

var( ˆθ PS ) = 1 n∑<br />

n 2 wi 2 σ y 2 = 1 J∑<br />

(w pop<br />

j<br />

) 2 n j σy<br />

2 n<br />

= 1<br />

nN<br />

i=1<br />

J∑<br />

j=1<br />

j=1<br />

w pop<br />

j<br />

N pop<br />

j<br />

σy 2 .

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!