SAE Manual Sections 1 to 4_1 (May 06).pdf - National Statistical ...

A Guide to Small Area Estimation - Version 1.1 05/05/2006

then be added to the corresponding sample totals to form small area estimates. A potential disadvantage of this approach is that the small area sub-group level model may be less efficient than a unit level model.

Auxiliary data may be available at the area level, at the person/unit level, or as a combination of both. However, in practice, for confidentiality or security reasons, data from government administrative sources are more likely to be available at some aggregated level. The choice between a unit/person level and an area level model will depend on the level at which data for the variable of interest and the explanatory variables are available, as well as on the efficiency of the small area estimates generated. For example, if data for the target variable and the auxiliary variables are only available at the area level, fitting an area level model will be the only option. However, if unit level data are available for all variables, either an area level or a unit level model can be fitted. It is also possible to fit a model in which the target variable is at the unit level while some auxiliary variables are at the unit level and others are at the area level. Further discussion on the choice of small area model is provided in Section 4.2 below.

In practice, the efficiency of predicted small area estimates may be improved by including some auxiliary variables as small area averages. Such covariates are referred to as contextual effects, and may be included as an additional covariate even if the variable already appears in the model as a unit level auxiliary variable. Contextual effects allow the model to account for differences in the characteristics of the area in which a person lives. For example, high income earners living in low income areas may have quite different characteristics from people on similarly high incomes living in high income areas, and it may be important to take account of this in the model.
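As a rough illustration of a contextual effect, the Python sketch below attaches the small area average of a unit level covariate to each person record, alongside the person's own value. The field names (`area`, `income`), the function name `add_contextual_effect` and the income figures are hypothetical. In practice the area average would often be computed from a census or administrative source covering the whole small area, rather than from the sample records alone as is done here for brevity.

```python
from collections import defaultdict


def add_contextual_effect(records, area_key, var):
    """Append the small area mean of `var` to each unit record (hypothetical helper).

    The new field, f"{var}_area_mean", is the contextual effect; the unit level
    value of `var` is kept so that both can enter the model as covariates.
    This sketch computes the mean from the records passed in.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[area_key]] += rec[var]
        counts[rec[area_key]] += 1
    means = {area: totals[area] / counts[area] for area in totals}
    # Return new dicts so the input records are left unchanged.
    return [{**rec, f"{var}_area_mean": means[rec[area_key]]} for rec in records]


# Hypothetical unit records: area codes and incomes are invented.
people = [
    {"area": "A", "income": 2000},
    {"area": "A", "income": 400},
    {"area": "A", "income": 600},
    {"area": "B", "income": 2000},
    {"area": "B", "income": 1800},
]

for row in add_contextual_effect(people, "area", "income"):
    print(row)
```

The two people earning 2000 receive contextual values of 1000.0 (area A) and 1900.0 (area B), so a model can distinguish a high earner in a low income area from one in a high income area.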
We now give an example of the data sources and auxiliary variables that were considered for the disability empirical study. The target variable was whether or not a person has a disability. The auxiliary data were drawn from the survey, a census and administrative data sources, and comprised:

- Survey of Disability, Ageing and Carers (SDAC) (ABS, 1998)
- Census of Population and Housing, 2001 (ABS)
- Socio-Economic Indexes For Areas (SEIFA) (ABS)
- Disability Support Pension (DSP) data from Centrelink

Given these sources of data, the following auxiliary variables were considered:

- proportion of people in the small area receiving the DSP
- age and sex, income, household structure (from SDAC)
- Socio-Economic Indexes For Areas (SEIFA) score for the small area
- indicator of remoteness

Some of these variables were only available at the area level, while those sourced from SDAC/Census, for example age, sex and income, were available at the person level. These SDAC variables were chosen subject to the requirement that they were similarly defined in, and available from, the census.

Another key issue relating to auxiliary data concerns the case where survey data cannot be matched to auxiliary data sources. In order to make predictions for each small area, auxiliary variables obtained from the survey must correspond closely with similar data items available for the rest of the population. If this is not the case, then model predictions may be significantly biased. For example, in the empirical study of small area estimates of disability, we used auxiliary variables such as age, sex, income and household structure, found on the SDAC survey file, to fit the model, and then used the corresponding variables on the population census file to make the small area predictions.

Australian Bureau of Statistics 16

When considering potential sources of auxiliary data, it is highly advisable to cast a wide net and assess the value of data that may not on first reflection appear highly relevant. For example, in the context of disability data, economic variables, in addition to health-related variables, may have good predictive power. Some caution needs to be exercised, however, as the correlation between the target variable and some of the more tenuous auxiliary variables may be due more to coincidence than to an intrinsic real-world relationship between the two. Such auxiliary variables are referred to as spurious auxiliary variables.

Demographic information is a particular form of auxiliary information, relating to population attributes such as age and sex. Many social variables have some relationship to such demographic data, which makes its use necessary. There is, however, another reason for using demographic information: the population size or demographic composition of small areas may vary considerably. In Australia, with its extreme variation in population densities, this is a very common issue.

Cross-sectional relationships

Cross-sectional correlations are intrinsic relationships between units (observed at the same time point) with similar characteristics, even if they are not in the same small area. For example, units with the same age, sex and occupational characteristics may have similar health outcomes regardless of whether they live in Sydney or Melbourne.
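The fit-on-survey, predict-on-census workflow described above can be illustrated with a simple synthetic estimator, which also relies on cross-sectional relationships: a disability rate is estimated for each age-sex cell by pooling the survey across all small areas, and those rates are then applied to each area's census profile. This is a minimal Python sketch, not the model used in the ABS empirical study; the function names, cell labels, weights and counts are all hypothetical. The check that every census cell has a survey counterpart is only a crude stand-in for confirming that the two sources define the auxiliary variables in the same way.

```python
from collections import defaultdict


def fit_cell_rates(survey):
    """Estimate a weighted disability rate for each auxiliary-variable cell.

    `survey` is a list of (cell, weight, has_disability) tuples pooled over
    all small areas; a cell is a profile such as (age_group, sex).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for cell, weight, has_disability in survey:
        num[cell] += weight * has_disability
        den[cell] += weight
    return {cell: num[cell] / den[cell] for cell in den}


def predict_areas(rates, census_counts):
    """Apply the pooled cell rates to each small area's census counts."""
    estimates = {}
    for area, counts in census_counts.items():
        # Every census cell must match a survey cell; otherwise the survey
        # and census variables are not defined consistently.
        missing = set(counts) - set(rates)
        if missing:
            raise ValueError(f"{area}: census cells {sorted(missing)} not found in survey")
        estimates[area] = sum(rates[cell] * n for cell, n in counts.items())
    return estimates


# Hypothetical survey records: (cell, survey weight, 1 if disability else 0).
survey = [
    (("15-64", "M"), 300.0, 0),
    (("15-64", "M"), 100.0, 1),
    (("65+", "M"), 100.0, 0),
    (("65+", "M"), 100.0, 1),
]
# Hypothetical census counts of people by cell in two small areas.
census = {
    "Area 1": {("15-64", "M"): 800, ("65+", "M"): 200},
    "Area 2": {("15-64", "M"): 400, ("65+", "M"): 600},
}

rates = fit_cell_rates(survey)
print(predict_areas(rates, census))  # {'Area 1': 300.0, 'Area 2': 400.0}
```

Both hypothetical areas have 1000 people, but the older profile of Area 2 gives it the larger estimate. A census coded with a different label, such as `"Male"` instead of `"M"`, raises a `ValueError` naming the unmatched cells.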
Small area methods borrow strength cross-sectionally by pooling sample data across a broader area (thus obtaining more statistical reliability) and then adjusting each small area estimate according to its age-sex-occupation profile. In practice, borrowing strength cross-sectionally may be restricted to a predefined broader region if it is believed that cross-sectional relationships are likely to differ between regions. For example, exposure to air pollutants is likely to be similar for Sydney and Melbourne but different from that of other cities. Hence Sydney and Melbourne may be combined into a broader region within which cross-sectional relationships can be drawn upon.

Time series relationships

Borrowing strength across time enables the practitioner to effectively pool sample data across time. The sample in each small area may be very sparse at a given time point; however, if a sufficiently long time series exists and auto-correlations across time are reasonably strong, data from a number of time points can be pooled together, giving a larger effective sample size in each small area. Time series auto-correlations are utilised to adjust for the degree of similarity or dissimilarity between units observed a specified number of time periods apart. This approach also has the benefit of reducing the impact of an observed value that is discordant with its neighbouring values in time. Borrowing strength across time adds a considerable degree of complexity to small area estimation and should only be contemplated where statistical expertise is available.
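One simple way to picture pooling across time is a weighted average of a small area's sample means over recent periods, with older periods down-weighted according to an assumed auto-correlation. The Python sketch below uses a hypothetical geometric weight `rho**k` for a unit observed k periods before the reference period, and Kish's approximation, (Σw)² / Σw², for the effective sample size. It is only an illustration under those assumptions: time series small area models used in practice estimate the correlation structure from the data rather than fixing it in advance, and the function name and figures here are hypothetical.

```python
def pooled_estimate(series, rho):
    """Pool one small area's sample means across time periods (illustrative only).

    `series` holds (n, mean) pairs for successive periods, most recent first,
    so the entry at index k was observed k periods before the reference
    period. Each unit at lag k gets the assumed weight rho**k (0 <= rho <= 1).
    Returns the pooled mean and Kish's effective sample size.
    """
    weights = [rho ** k for k in range(len(series))]
    total_w = sum(w * n for w, (n, _) in zip(weights, series))
    pooled = sum(w * n * mean for w, (n, mean) in zip(weights, series)) / total_w
    total_w2 = sum(w * w * n for w, (n, _) in zip(weights, series))
    n_eff = total_w ** 2 / total_w2
    return pooled, n_eff


# Hypothetical disability proportions from 10 sampled people per period.
series = [(10, 0.30), (10, 0.20), (10, 0.25)]
for rho in (0.0, 0.5, 1.0):
    pooled, n_eff = pooled_estimate(series, rho)
    print(f"rho={rho}: pooled mean {pooled:.3f}, effective n {n_eff:.1f}")
```

With rho = 0 only the current period is used (effective n of 10); with rho = 1 all three periods count equally (effective n of 30); rho = 0.5 sits in between, at about 23.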