SAE Manual Sections 1 to 4_1 (May 06).pdf - National Statistical ...

A Guide to Small Area Estimation - Version 1.1 05/05/2006

3. Some issues in Small Area Estimation

3.1 Sources of Additional Information

The aim of small area estimation is to produce a set of reliable estimates of the target variable(s) of interest for each small area. The challenge in small area estimation, therefore, is how best to use innovative approaches that take advantage of additional information to circumvent the small sample size problem and provide estimates of improved quality. Small area estimation methods are effective when they can draw upon intrinsic relationships within and between the survey data and other data sources, from which they borrow strength. These relationships, represented schematically in Figure 3.1, may be found:

- between the survey-based direct estimate and auxiliary information available from administrative data sources, censuses or other surveys, or
- in correlations between direct estimates observed across time, or
- in spatial relationships between neighbouring small areas, or
- in cross-sectional relationships between units with similar characteristics observed in different small areas within some broader region, or
- in any combination of the above.

Figure 3.1: Possible sources of additional information
[Diagram components: Small Area Model; Auxiliary Data (Demographic Information); Cross-sectional Relationships; Time Series Relationships; Multivariate Correlations; Spatial Effects.]

It turns out that, in most cases, auxiliary data is by far the most important source from which to borrow strength.

Australian Bureau of Statistics 14

Auxiliary data

One of the more important prerequisites for the successful production of small area estimates is the availability of accurate auxiliary data that is well correlated with the target variable. By auxiliary data we mean one or more variables, obtained from administrative data sources or a census, that are included in the model as explanatory variables. The auxiliary data should:

- comprehensively cover the entire population scope for which small area estimates are required. If an auxiliary data item is not available for the unselected (non-sampled) part of the population, then small area predictions cannot be made and the affected data item cannot be included in the model;
- include reliable geographic information, so that all units belonging to a small area can be accurately identified; and
- be contemporaneous with the target variable and the other auxiliary data used in the model.

Model-based small area estimates are produced by first fitting the model to the sample data to estimate the model parameters, which include the intercept and slope parameters. The estimated model is then applied to the population auxiliary data to produce the predicted small area estimates.

In the case of a purely area-level model, the target variable and the auxiliary variables are all at the small area level, so it is relatively straightforward to produce small area estimates as described above. However, in the case of unit- or person-level models, the second step is a little more complex, because the model fitted to the sampled units is generally applied to those population units not selected in the sample. Small area estimates are then compiled by taking the sum of the target variable values for the sampled units (obtained from the survey data) and adding to it the sum of the model predictions for the non-sampled units.
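The unit-level calculation described above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the ABS procedure. It assumes a single auxiliary variable x known for every population unit and a simple linear model y = a + b*x fitted by ordinary least squares. It also omits area-level random effects, which many small area models include, and assumes that sampled units can be identified on the population file. The names `Unit`, `fit_ols` and `small_area_totals` are hypothetical. Each small area total is the sum of the observed y values for its sampled units plus the sum of the model predictions for its non-sampled units.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record layout for one population unit (illustrative only).
@dataclass
class Unit:
    area: str            # small area identifier
    x: float             # auxiliary variable, assumed known for every population unit
    y: Optional[float]   # target variable; None for units not selected in the sample


def fit_ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Step 1: estimate intercept a and slope b of y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def small_area_totals(population: List[Unit]) -> dict:
    """Step 2: sum observed y for sampled units plus model predictions for non-sampled units."""
    sampled = [u for u in population if u.y is not None]
    # The model is fitted to the sample data only.
    a, b = fit_ols([u.x for u in sampled], [u.y for u in sampled])
    totals = defaultdict(float)
    for u in population:
        if u.y is not None:
            totals[u.area] += u.y          # sampled unit: use the survey response value
        else:
            totals[u.area] += a + b * u.x  # non-sampled unit: use the model prediction
    return dict(totals)
```

For example, suppose area A has sampled units (x, y) = (1, 3) and (3, 7), area B has a sampled unit (4, 9), and there are non-sampled units with x = 2 in A and x = 5 in B. The fitted line is then y = 1 + 2x, so the totals are approximately 15 for A (3 + 5 + 7) and 20 for B (9 + 11).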
This approach naturally applies if the survey data can be reliably matched to the auxiliary information using a hard matching identifier, such as a Medicare number or tax file number. This is common practice in a number of European Union countries where national identifiers exist. However, due to privacy considerations and related issues, this practice rarely occurs in Australia. Where it is not possible to distinguish between sampled and non-sampled units on the auxiliary data sources, there are two options available:

- apply the model fitted to the sample data to the entire population data file, or
- group population units within each small area (eg age by sex), fit a model to the sample data at the small area by sub-group level, and then apply this model to make predictions for the non-sampled population in each small area sub-group.

The first approach has the disadvantage that the prediction error of the small area estimates is increased slightly, because the target variable values for the sampled units are predicted from the model and so contribute to the total model error. It would be preferable to use the available survey response values for these units, which are not subject to model error. However, if the sampling fraction is very small, this should not be a major concern.

The second approach has the advantage that only counts of the non-sampled population in each small area sub-group are required to make predictions. The predicted totals for the non-sampled population (at the small area sub-group level) can
