SAE Manual Sections 1 to 4_1 (May 06).pdf - National Statistical ...


A Guide to Small Area Estimation - Version 1.1 05/05/2006

Calibration Estimator

To produce calibration estimators, the original survey weights (usually the inverse probabilities of inclusion in the sample) are replaced with new "calibrated" weights. These weights are, in some sense, as close as possible to the original weights, but are calibrated on one or more auxiliary variables available for the population (Chambers, 2005). Small area estimates of these auxiliary variables, calculated using the calibrated weights, will then agree with the known population totals.

A simple example of calibration is where population totals by age and gender are known for each small area. The survey weights are then adjusted so that estimates of population counts by age and gender agree with the known population counts.

There are two points to note about the calibration estimator:

First, it is straightforward to put into production. The resulting adjusted (calibrated) weights can be stored on the survey file and used to produce estimates at the desired level of aggregation.

Second, the auxiliary variables should be chosen with care and should be related to the variables we wish to produce estimates for. If the calibrated weights are used to produce estimates for variables that are not related to the auxiliary variable(s) used in determining the calibrated weights, the resulting estimates may be biased.

In general, calibrated estimates possess good design-based properties. Government statisticians have historically preferred the design-based approach to the model-based approach because the resulting estimates are not subject to the consequences of model misspecification.

4.1.2 Regression Methods

Where a higher level of accuracy is required for small area estimates, an alternative is to use regression or model-based approaches. However, these methods require a higher level of statistical expertise to implement and to interpret the results.
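The calibration estimator described under Calibration Estimator is closely linked to the regression methods introduced here. When the distance between original and calibrated weights is measured with the linear (chi-square) distance function, the calibrated estimator coincides with the generalized regression (GREG) estimator.

The sketch below is a minimal illustration under stated assumptions. It uses Python with NumPy, a hypothetical helper `linear_calibration`, and made-up data. It adjusts design weights so that the weighted totals of the auxiliary variables reproduce known population totals. With age-by-gender cell indicators as the auxiliaries, as in the example above, this reduces to post-stratification: every weight in a cell is scaled by the ratio of the known cell count to its estimated count.

```python
import numpy as np


def linear_calibration(d, X, totals):
    """Return calibrated weights w with sum_i w_i * x_i equal to totals.

    Hypothetical helper: linear (chi-square distance) calibration, where
    w_i = d_i * (1 + x_i' lam) and lam is chosen to hit the totals exactly.
    """
    d = np.asarray(d, dtype=float)            # original design weights
    X = np.asarray(X, dtype=float)            # one row of auxiliaries per unit
    totals = np.asarray(totals, dtype=float)  # known population totals
    # Weighted cross-product matrix T = sum_i d_i x_i x_i'
    T = X.T @ (d[:, None] * X)
    # Gap between the known totals and the design-weighted estimates
    gap = totals - X.T @ d
    lam = np.linalg.solve(T, gap)
    return d * (1.0 + X @ lam)


# Made-up small area sample: six respondents in four age-by-gender cells
# (0 = young male, 1 = young female, 2 = older male, 3 = older female).
cells = np.array([0, 0, 1, 1, 2, 3])
X = np.eye(4)[cells]                      # cell indicator variables
d = np.array([10.0, 10.0, 12.0, 12.0, 15.0, 20.0])
known_counts = np.array([25.0, 30.0, 15.0, 40.0])

w = linear_calibration(d, X, known_counts)
print(w)          # calibrated weights
print(X.T @ w)    # weighted cell counts, which now match known_counts
```

Only the calibrated weights `w` would need to be stored on the survey file. With many auxiliaries, linear calibration can produce negative or extreme weights, so bounded variants such as raking are common alternatives.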
A wide variety of regression techniques are available. For the purposes of this manual, they are divided into two main categories: synthetic regression models and random effects regression models.

Synthetic Regression Models

Synthetic regression models use the available auxiliary data to express a deterministic mathematical relationship between those auxiliary variables and the target (response) variable we are trying to predict in each small area. Synthetic models assume that all of the systematic variability in the response variable is explained by the variability in the values of the auxiliary variables.

The remaining variability, referred to as "random noise" or "stochastic variation", is represented by the difference between the value of the response variable predicted under the model and the value observed in the data. These differences are called random errors, residuals or disturbances.

In the small area setting, synthetic models assume that the same deterministic relationship between the variable of interest and the auxiliary variables holds across a range of small areas, for example all small areas within a state.

Australian Bureau of Statistics 26
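A synthetic regression model can be sketched as follows. This assumes an area-level linear model with a single auxiliary variable, fitted by ordinary least squares to direct estimates from the sampled small areas. The function names and data are hypothetical.

Once fitted, the same coefficients are applied to the auxiliary values of every small area, including areas with no sample. This mirrors the assumption that one deterministic relationship holds across all areas. The residuals are the "random noise" described above.

```python
import numpy as np


def fit_synthetic(x, y):
    """OLS fit of y on x plus an intercept; returns [intercept, slope, ...]."""
    X1 = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_synthetic(beta, x):
    """Synthetic estimate for each area: the deterministic part x'beta only."""
    X1 = np.column_stack([np.ones(len(x)), x])
    return X1 @ beta


# Hypothetical data: auxiliary value and direct estimate for sampled areas.
x_sampled = np.array([1.0, 2.0, 3.0, 4.0])
y_direct = np.array([5.2, 7.9, 11.1, 13.8])

beta = fit_synthetic(x_sampled, y_direct)
residuals = y_direct - predict_synthetic(beta, x_sampled)  # "random noise"

# Apply the same relationship to all areas, including unsampled ones.
x_all = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
synthetic_estimates = predict_synthetic(beta, x_all)
```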

Synthetic models work well when all relevant auxiliary variables that help predict the response variable are available, accurate and can be included in the model. In practice, however, this is the exception rather than the rule.

Random Effects Regression Models

When a synthetic model is fitted, the residuals should ideally look like "white noise". In practice, however, they often display significant between-area variation. This indicates that there is some other systematic variation in the response variable between small areas that the auxiliary variables do not account for. In other words, the synthetic model is missing certain auxiliary variables whose values, had they been available, would have helped to predict differences between small areas.

This problem can be addressed by incorporating a random effect into the model. This is done by treating the constant (intercept) term in the model as a fixed constant plus a random component, known as the random effect. Each small area is thus assigned its own intercept term, which is allowed to vary around an overall constant value from one small area to another.

This is usually sufficient to account for between-area variation. It is also possible to include a random effect in one of the regression (slope) coefficients rather than in the intercept term. Doing so further adds to the complexity and is not covered in this manual.

For models fitted to small area level data, the inclusion of random effects may give a distinct advantage over the synthetic model approach, potentially leading to estimates with higher precision and greater robustness. In the case of linear models, random effects models can be shown theoretically to give small area estimates that reflect the best trade-off between the accuracy of the direct estimate and the uncertainty associated with the synthetic model.
Consider a small area with a low sampling error (e.g. because of a large sample size) relative to the total error, which is the sum of the sampling error of the direct estimate and the synthetic model error. For such an area, a random effects model will give more weight to the direct estimate. Conversely, for a small area with a high sampling error, more weight will be given to the model-based estimate, as this will be more reliable.

Although more complex than synthetic models, random effects models can be estimated using a variety of statistical techniques. However, due to their technical nature, this manual will not go into further detail about how to apply random effects models. A more detailed treatment will be given in the forthcoming technical manual.
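The weighting described above can be illustrated with an area-level random intercept model of the Fay-Herriot type. This is one common choice and is not necessarily the model the forthcoming technical manual will use.

In this sketch, each direct estimate equals the regression line, plus an area random effect with variance `sigma2_u`, plus sampling error with known variance `psi`:

- The weight on the direct estimate is `gamma = sigma2_u / (sigma2_u + psi)`.
- The share left for the synthetic estimate, `psi / (sigma2_u + psi)`, is therefore the sampling error as a fraction of the total error, as the text describes.

With `sigma2_u` known, this composite is the best linear unbiased predictor under the assumed model. In practice `sigma2_u` must be estimated (for example by maximum likelihood, REML or a method of moments), which also affects the mean squared error. The function name `composite_estimates` and all data are hypothetical.

```python
import numpy as np


def composite_estimates(y_direct, x, psi, sigma2_u):
    """Area-level random intercept (Fay-Herriot type) estimates.

    sigma2_u is treated as known here; in practice it is estimated.
    Returns (composite estimates, weight on direct estimate, synthetic part).
    """
    y = np.asarray(y_direct, dtype=float)
    psi = np.asarray(psi, dtype=float)      # sampling variance of each direct estimate
    X1 = np.column_stack([np.ones(len(y)), x])
    total = sigma2_u + psi                  # model error + sampling error
    w = 1.0 / total
    # Weighted least squares estimate of the fixed regression coefficients
    beta = np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * y))
    synthetic = X1 @ beta
    # Weight on the direct estimate; 1 - gamma = psi / total goes to the model
    gamma = sigma2_u / total
    return gamma * y + (1.0 - gamma) * synthetic, gamma, synthetic


# Hypothetical data: four areas with very different sampling errors.
y_direct = np.array([10.0, 12.0, 15.0, 11.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
psi = np.array([1.0, 100.0, 4.0, 4.0])

est, gamma, synth = composite_estimates(y_direct, x, psi, sigma2_u=4.0)
print(np.round(gamma, 3))  # the second area (psi = 100) gets little weight on its direct estimate
```

The first area, with a small sampling variance, stays close to its direct estimate. The noisy second area is pulled towards the regression line.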

