SAE Manual Sections 1 to 4_1 (May 06).pdf - National Statistical ...

A Guide to Small Area Estimation - Version 1.1

Key Clients: National Statistical Centres and Client Services

May 2006


A Guide to Small Area Estimation - Version 1.1 05/05/2006

The ABS intends to periodically update this manual. Therefore, the ABS would welcome any comments and suggestions from users. Readers who would like more information or who would like to forward comments on this manual may contact any of the following ABS officers:

Location        Contact Name      Phone Number      Email address
Central Office  Daniel Elazar     +61 2 6252 6962   daniel.elazar@abs.gov.au
NSW             Edward Szoldra    +61 2 9268 4214   edward.szoldra@abs.gov.au
QLD             Brett Frazer      +61 7 3222 6028   brett.frazer@abs.gov.au
QLD             John Preston      +61 7 3222 6229   john.preston@abs.gov.au
SA              Justin Lokhorst   +61 8 8237 7476   justin.lokhorst@abs.gov.au
SA              Philip Bell       +61 8 8237 7304   philip.bell@abs.gov.au
TAS             Keith Farwell     +61 3 6222 5889   keith.farwell@abs.gov.au
VIC             Elsa Lapiz        +61 3 9615 7364   elsa.lapiz@abs.gov.au
WA              Carl Mackin       +61 8 9360 5250   carl.mackin@abs.gov.au

Australian Bureau of Statistics 2


Contents

1 Introduction
1.1 What are Small Area Estimates?
1.2 Background of the Small Area Practice Manual
1.3 Purpose
1.4 What are the primary uses for Small Area Estimates?
1.5 When should Small Area Estimates be Produced?

2 Assessing User Requirements
2.1 User Requirements

3 Some issues in Small Area Estimation
3.1 Sources of Additional Information
3.2 Basic Conditions for Success
3.3 Choice of Small Area
3.4 Variable of Interest
3.5 Quality of Auxiliary Data
3.6 Confidentiality

4 Choice of Small Area Techniques
4.1 Types of Small Area Estimation Techniques
4.1.1 Simple Small Area Methods
4.1.2 Regression Methods
4.2 The Modelling Framework
4.3 Trade-off between Quality, Cost, Time and Effort

5 Case Studies of Small Area Applications
5.1 Simple Small Area Models
5.1.1 Broad Area Ratio Estimator with No Auxiliary Data
5.1.2 Broad Area Ratio Estimator with Auxiliary Data
5.2 Regression Based Models
5.2.1 Overview
5.2.2 Framework for Regression Based Models
5.2.3 Regression Based Synthetic Estimates
5.2.4 Generating Small Area Estimates from Person Level Models
5.2.5 Discussion of Examples 1-4

6 Diagnostics For the Quality Small Area Estimates
6.1 Introduction
6.2 Diagnostics From Case Study
6.3 Assessment of Models Against Diagnostics

7 Communicating Quality to Users
7.1 Introduction
7.2 Sources of Error
7.3 Impact of Errors
7.4 Explanatory Notes

8 Summary
8.1 Points to Consider
8.2 What Areas of the ABS Provide Small Area Estimates

9 Frequently Asked Questions

APPENDICES
Appendix 1: List of Previous Small Area Work
Appendix 2: Technical Notes of Estimators
Appendix 3: SAS Datasets and Codes
Appendix 4: Diagnostics Graphs
Appendix 5: Explanatory Notes
Appendix 6: Quality Declaration

BIBLIOGRAPHY

LIST OF ACRONYMS


1. Introduction

1.1 What are Small Area Estimates?

Most ABS surveys are designed to provide statistically reliable, design based estimates only at the national and/or state/territory geographic levels. The sheer practical difficulties and cost of implementing and conducting sample surveys that would provide reliable estimates at levels finer than state/territory are generally prohibitive, both in terms of the increased sample size required and the added burden on providers of survey data (respondents). For purposes of this manual, small area estimation refers to methods of producing sufficiently reliable estimates for geographic areas that are too fine to estimate with precision using direct survey estimation methods. By direct estimation we mean classical design based survey estimation methods (Saei and Chambers, 2003) that utilise only the sample units contained in each small area. Small area estimation methods are used to overcome the problem of small sample sizes and to produce small area estimates that improve upon the quality of direct survey estimates obtained from the sample in each small area. The more sophisticated of these methods work by taking advantage of various relationships in the data, and involve, either implicitly or explicitly, a statistical model (see footnote 1) to describe these relationships. (See Sections 4.1 and 4.2 for further discussion.)
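To make the idea of direct estimation concrete, the following is a minimal sketch in Python rather than a description of any ABS system. It assumes a simple weighted proportion as the direct estimator, and the area names, survey weights and responses are hypothetical values chosen for illustration. Only the sampled persons inside an area contribute to that area's estimate, so an area with very few sampled persons yields an unstable figure.

```python
from collections import defaultdict

def direct_estimates(records):
    """Weighted direct estimate of a proportion for each small area.

    records: iterable of (area, weight, has_characteristic) tuples, where
    weight is the survey weight of a sampled person (hypothetical data).
    Only the sample units inside an area contribute to that area's estimate.
    """
    weighted_total = defaultdict(float)   # sum of weights per area
    weighted_count = defaultdict(float)   # sum of weights of persons with the characteristic
    sample_size = defaultdict(int)        # number of sampled persons per area
    for area, weight, has_characteristic in records:
        weighted_total[area] += weight
        weighted_count[area] += weight * (1 if has_characteristic else 0)
        sample_size[area] += 1
    return {
        area: (weighted_count[area] / weighted_total[area], sample_size[area])
        for area in weighted_total
    }

# Hypothetical sample: Area B contains a single sampled person.
sample = [
    ("Area A", 100.0, True),
    ("Area A", 120.0, False),
    ("Area A", 80.0, False),
    ("Area B", 150.0, True),
]
estimates = direct_estimates(sample)
for area, (proportion, n) in sorted(estimates.items()):
    print(f"{area}: direct estimate = {proportion:.3f} from n = {n}")
```

In this illustration the estimate for Area B rests on one person, which is the kind of situation small area methods are intended to improve upon.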

Although conceptually similar, small domain estimates refer to estimates disaggregated to fine classificatory levels, such as by socioeconomic status, income, labour force status or industry. It is important to note that we have not undertaken any empirical study of small domain estimation methods for this manual, although intuitively we would expect that most techniques covered in this manual would still apply. The empirical analysis in this manual is based on knowledge and experience derived from only one empirical study, a study of the incidence of disability in Australia. This study uses data from the Survey of Disability, Ageing and Carers (SDAC) (see ABS (2003) for more details).

Footnote 1: A statistical model is a mathematical representation of the relationship we assume to exist between the variable we are interested in predicting (known as the response or dependent variable) and other associated variables (known as the auxiliary, explanatory or independent variables). A model is then fitted to data that contains observed values for both the dependent variable and the auxiliary variables for each unit. The fitting process produces estimates of the model parameters such as intercepts and slopes. The unit here may be a person, a business or a small area itself, depending upon the level at which we wish to fit the model. The model also includes one or more error terms to describe the degree of stochastic or random variation with which predicted values for the response variable deviate from the observed values.

1.2 Background to the Small Area Practice Manual

The small area practice manual project was developed to give a simple and clear guide on how to undertake small area estimation. The ABS has previously carried out a number of small area estimation projects (see Appendix 1). In recent years user demand for these kinds of statistics has increased. Most of this increase in demand has become apparent during specific consultations between the ABS and key government users to gauge and assess users' medium and long term statistical data requirements. Consolidated examples of these can be found in the Information Development Plan (IDP) (ABS, 2005) (Catalogue 1362.0, to be released in early 2006) and State Statistical Priorities (ABS Corporate Information - State Statistical Forum, 16 February 2005).

This reflects the growing statistical sophistication of users. Also, local government bodies such as cities, councils and shires are taking on a greater role in the long term planning and socio-economic development of their regions. This increase in demand for small area data is occurring globally. In response, more advanced methods for producing reliable small area statistics are being developed and are gaining methodological acceptance. This was recognised at an international conference on small area statistics held in Riga, Latvia in 1999, where the Deputy Australian Statistician encouraged National Statistical Organisations to make greater use of model-based methods to produce small area statistics (Trewin 1999). The paper also noted that explaining quality is an especially important issue for a National Statistics Office when producing these types of estimates and products.

Various areas within the ABS have been involved in the provision of small area estimates to varying levels of sophistication in both the methods used and the quality of the estimates produced. Table A.1 of Appendix 1 contains a selection of the major pieces of small area work that have been conducted to date. In addition, there has been no definitive set of clear, ABS wide guidelines on how to assess the quality of small area estimates and what should be the agreed minimum level of quality required before releasing small area statistics to external clients. In other words, what needs to be developed is a cohesive, coordinated approach to the production of small area estimates.

There is a strong need to set up a framework for the practice of small area estimation at all levels of involvement in the small area statistical process. These include client services areas in regional offices or Central Office (CO), Methodology Division, National Statistical Centres and senior managers responsible for clearing and releasing small area output. Such a framework is important for ensuring that consistent practices are used across the ABS in producing small area estimates and that these practices accord with best practices used both in the ABS and in statistical agencies overseas.

A consistent approach to the production of small area estimates is important for the following reasons:

o a need for the ABS to more precisely understand users' small area needs, i.e. how they utilise small area estimates in their decision making. Getting this right at the outset will ensure effort is efficiently directed to producing small area estimates that are fit for purpose.

o to ensure that small area estimates are produced with sufficient quality and are appropriate to user requirements.

o to ensure that users fully understand the assumptions and conditions underpinning output data and the fitness for use.

o to ensure small area estimation methodologies are sound, robust and practicable for a large range of small area estimation problems.

Linked to this is the broader issue of the circumstances in which the ABS should or should not be producing small area estimates. These decisions need to be made by determining the risk that the provision of such data will detract from informed decision making.

This manual has been prepared on the basis of work done on small area estimates of disability. Although the wording of the manual inadvertently reflects the context of a household based population survey, the small area methods described can also be applied in the context of economic/business collections. As further empirical studies are conducted in other data contexts, we anticipate that the manual will be expanded and adapted to include examples relating to economic data.

1.3 Purpose

This volume is the first of two volumes of the manual; the second will contain a more technical treatment. This volume aims to provide a simple non-technical guide to the production, uses, quality and validation of small area estimates. The intended audience includes survey practitioners, consultants, methodologists and users of small area data.

The broad objectives of the Small Area Estimation Practice Manual are as follows:

o To build a stable bridge between the knowledge, the theory and the practice of small area estimation while taking account of ABS priorities and policies with regard to the production of small area statistics. This should result in a more consistent and quality assured approach to producing small area estimates within the ABS.

o To realise a quantum increase in the level of ABS knowledge and understanding of small area estimation techniques, how and under what conditions they can be applied, and how to measure or assess the quality of the small area estimates produced.

o To provide coherent, relevant, accurate and accessible information on small area estimation practices and techniques that is used regularly by its intended audiences and updated to reflect increases in knowledge and understanding.

o To ensure that practitioners within the ABS have a clear understanding of the quality and assumptions underpinning the small area estimates produced and that these are clearly communicated to users so that small area estimates are used appropriately and for the purposes intended.

Intended audience

o This guide aims to give advice to National Statistical Centres and regional offices on how to advise on, respond to and incorporate small area estimates into their work, so they can apply simple models themselves and know when to draw on methodological skills for more complex models.

o A second volume of the manual will cover the more technical aspects of small area estimation and will be primarily aimed at methodologists and technical analysts involved in producing modelled small area estimates. The technical manual will cover in more detail the methodological and statistical issues that arise in small area estimation.

o This manual contains material on the application of basic statistical models. The manual therefore assumes the reader has a basic familiarity with the theory and application of such models. Some parts of the manual contain references to somewhat more advanced methods. In such instances warning boxes strongly recommend that the reader obtain further methodological advice from Methodology Division (ABS) before applying such techniques.

What it is - A Guide to:

o what issues need to be thought through before undertaking a small area exercise,

o the methods and techniques available in small area estimation, and the relative advantages, disadvantages and assumptions involved in each,

o who to talk to, who has already put specific approaches into practice and where to find relevant documentation,

o the trips and traps of putting various techniques into practice,

o how to best measure the reliability of small area predictions,

o how to detect model misspecification and what diagnostics are available for assessing the overall quality of small area estimates.

What it is not

o An up-to-date encyclopedia of all the literature on small area techniques. The focus of this manual is much more on the practice of small area estimation in the production of government statistics. Compiling and maintaining an up-to-date summary of the technical literature would be highly resource intensive as the field is relatively new and rapidly evolving. It would also make the manual more difficult for the practitioner to use.

Finally, we emphasise that this manual has been written under the assumption that the primary goal of small area data users is to obtain descriptive statistics of the relative characteristics of small areas rather than to obtain the form of some dynamic structural process which generates those small area characteristics. The manual is therefore premised upon a descriptive framework for the ultimate decision making objectives, even though analytical methods are used to construct the models used to predict those small area characteristics. In other words, we assume users are primarily interested in the predictions from those models rather than in the form and structure of the models per se.

1.4 What are the primary uses for Small Area Estimates?

Federal, state and local government bodies involved in program funding / evaluation or regional planning are typically the primary users of ABS small area data. They require estimates of specified accuracy to assist them in making informed decisions on how to allocate resources or apply for additional resources. The need for government services to justify their decision making and be accountable to the community is seen as a very important factor.

Small area estimates are often used by program administrators to determine or benchmark their funding allocations. Without the small area information, the administrators have difficulty in assessing the actual need for goods and services in each area. This can result in undesirable scenarios such as "the squeaky wheel gets the grease", whereby interest groups or areas which are most vocal receive a greater share of the funding allocations. Small area estimates provide detailed information on each area, allowing for objective and informed decision making.

Local government demand for small area data has also increased as local governments become increasingly aware of, and interested in, the role statistics can play in informing them about what is happening in their own jurisdictions.

1.5 When should Small Area Estimates be Produced?

Small area estimates should only be produced when there is strong and justified user demand and no alternative data at the small area level that will serve the required purpose. In addition, there needs to be adequate survey and auxiliary data to ensure that the outputs produced will be of sufficient quality to fit their intended purpose.

Small area estimates should primarily be considered where key policy making decisions require discerning between the relative needs of different small areas and such information does not currently exist or requires updating (e.g. disability data). Developing small area estimates requires significant staff time to develop, check and obtain approval for release of the estimates. The complexity of most small area estimation exercises and the difficulty in validating the reliability of the output make it very difficult to fully automate the production process. To a large extent, each small area undertaking has to be tailored to the nature and specifics of the problem at hand. Therefore, care needs to be taken to ensure the need for the small area estimates warrants the effort required.

The first step is to discuss with users whether state or part-of-state estimates would be adequate. If there is not much variation between the small areas, then broader estimates would be adequate. It is also worth investigating any sources of administrative data that can be used as auxiliary data for a small area model. Finally, it is worthwhile checking that the chosen small area model is appropriate for the data and that the inherent assumptions in the model do at least approximately hold. For example, fitting a linear model to the data would require that the errors are independently and identically distributed with zero mean and constant variance. It is therefore prudent to check that such assumptions are reasonable and have been satisfied before relying on the fitted model.
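As an illustration of this kind of check, the sketch below fits a simple linear model by ordinary least squares and looks at the residuals. It is a rough screening device under stated assumptions, not a substitute for the diagnostics discussed later in the manual: the data are simulated, the function names are our own, and splitting the residuals at the median fitted value is only one crude way of looking for non-constant variance or a poorly specified mean.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_checks(x, y):
    """Crude residual checks for a fitted straight line.

    The overall mean residual is zero by construction when an intercept is
    fitted, so the residuals are split at the median fitted value instead:
    group means far from zero suggest a poorly specified mean, and a
    variance ratio far from 1 suggests non-constant variance.
    """
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    order = sorted(range(len(x)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[half:]]
    return {
        "mean_residual_low": statistics.fmean(low),
        "mean_residual_high": statistics.fmean(high),
        "variance_ratio": statistics.variance(high) / statistics.variance(low),
    }

# Simulated data (hypothetical): a linear relationship with constant-variance errors.
random.seed(1)
x = [float(i) for i in range(1, 41)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]
checks = residual_checks(x, y)
for name, value in checks.items():
    print(f"{name}: {value:.3f}")
```

Residual plots against fitted values and against each auxiliary variable would normally accompany summary figures like these.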

2. Assessing User Requirements

2.1 User Requirements

Understanding user requirements for small area estimates is paramount for providing a high quality small area product that meets the client's decision making requirements. The importance of gaining a thorough understanding of user requirements at this initial phase cannot be overemphasised. Shortcuts taken at this phase will often lead to an inferior quality product and/or valuable time and resources lost along the way. With the right questions, users will be able to give a clear indication as to what information is critical in their decision making. Users are also a valuable resource in helping to determine the best potential sources for borrowing strength.

Where complex techniques need to be applied, Methodology Division (MD) staff will need to be involved in performing the methodological work, and it is highly recommended that MD staff are directly involved in the discussion with clients at the earliest possible opportunity.

Table 2.1 below displays a checklist of the key questions to ask clients when commencing a small area exercise.

Table 2.1 Checklist of Questions to Ask Users

Question

A) What are the key policy making or program funding decisions that require small area data?

B) What are the organisation's strategic context, goals and desired outcomes, in which these decision making requirements are nested?

C) What small area data do users think would best meet their decision making requirements and what level of geography is required?

D) What are the consequences for users' decision making outcomes if the small area data is incorrect, say, by 5%, 10%, 20%, etc? Which small area estimates have the greatest priority in terms of accuracy requirements?

E) Are there any conceptual models, either social or economic, that are believed to describe the process which influences the variable(s) for which we are to calculate small area estimates?

F) What administrative data is available and relevant as auxiliary information to support the modelling of the small area estimates? How is this data collected, for what purpose is it used, and how accurate is it likely to be?

G) Will small area estimates be required to be disaggregated by other categories?

H) What previous studies have been used, if any, to undertake the policy/funding decision for which small area estimates are required?

A) What are the key policy making or program funding decisions that require small area data?

Knowing how the small area data will be used as input to users' decision making processes is essential in ensuring the small area output meets user requirements. User decision making requirements can vary considerably. Some may be quite sophisticated and quantitatively based. Others may be quite informal and qualitatively based. In the former case, the decision making process should be identified and well understood, as its inherent assumptions may help determine just how accurate small area data really needs to be. It is also important to ensure, where possible, that the small area data is consistent and compatible with the users' decision making process, and that the output of this process, not just the ABS small area output, meets user expectations. A quality assessment should include measures of the fitness for purpose of small area output.

However, many users do not have sophisticated, quantitatively based decision making processes, and may have difficulty in articulating the very nature of the problem they wish to solve.

Before undertaking the project it is worth investigating whether the small area estimates requested may suit the needs of a wider range of clients. Quite often similar data is required by different clients and can be useful for a wide range of users. Incorporating their needs into the project increases the value of the final product at minimal additional cost.

B) What are the organisation's strategic context, goals and desired outcomes, in which these decision making requirements are nested?

The analyst needs to ask users what the data problem is, why data needs to be obtained, what decision making processes are used, and what the users are trying to find out and why. This can be matched against what it is possible to estimate from the available data. Any limitations can then be identified early, and either additional information can be sought or the user can be made aware of them. When the final product is created, the user then has a good understanding of its limitations and the product is as close as possible to what they need.

C) What small area data do users think would best meet their decision making requirements and what level of geography is required?

A minimum level of information on the variable of interest is needed in each small area. Given the available data, the user needs to be aware that, for a given level of quality, the small area estimates are subject to a trade-off between the level of geography and the level of detail in the data that it is possible to model. That is, in the context of household based collections, a reasonably common characteristic of the variable of interest (say, greater than 10% of the population) may be estimated at a reasonably fine level of geography such as Statistical Local Area (SLA). However, a variable of interest representing less than 1% of the population can only be reliably estimated at a broader level of geography such as Statistical Sub-Division (SSD). For example, in the disability study, estimates for physical disability (which accounts for more than 10%) could be obtained at a finer level of geography and detail than estimates for psychological disability (which is around 1%). This choice also depends on the quality of data, which is discussed in the next section.
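A rough calculation helps show why rarer characteristics call for broader geography. The sketch below assumes simple random sampling within an area and ignores design effects and finite population corrections, so it understates the complexity of a real survey design. The sample sizes of 50 and 500 are hypothetical stand-ins for a finer and a broader area; the prevalences of 10% and 1% follow the examples in the text.

```python
import math

def relative_standard_error(prevalence, sample_size):
    """Relative standard error of an estimated proportion.

    Assumes simple random sampling: SE = sqrt(p * (1 - p) / n), so
    RSE = SE / p = sqrt((1 - p) / (n * p)).
    """
    return math.sqrt((1.0 - prevalence) / (sample_size * prevalence))

# Hypothetical sample sizes for a finer and a broader level of geography.
for label, n in [("finer area", 50), ("broader area", 500)]:
    for prevalence in (0.10, 0.01):
        rse = relative_standard_error(prevalence, n)
        print(f"{label} (n={n}), prevalence {prevalence:.0%}: RSE = {rse:.0%}")
```

Under these assumptions, a characteristic with 1% prevalence needs roughly eleven times the sample of a 10% characteristic to reach the same relative standard error, which is why it can usually be supported only at a broader level of geography.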

D) What are the consequences for users' decision making outcomes if the small area data is incorrect, say, by 5%, 10%, 20%, etc? Which small area estimates have the greatest priority in terms of accuracy requirements?

The answer to this question will drive the level of quality and hence resources required to produce small area estimates of acceptable quality to users. If large funds from a government program are to be allocated to regions based on the small area estimates, then a high level of quality assurance and validation is required. However, if all that is required is an approximate guide to indicate areas where there may be unmet need, say for program evaluation purposes, then broad quality checks may be adequate.

To assess how accurate small area estimates need to be before they start to adversely affect decision making outcomes, it is important to understand the entire decision making process and the way in which the small area estimates feeding into that process affect its outputs. This involves understanding the assumptions implicit in the process. The analyst needs to work out, in consultation with users, how accurate final decision making outcomes need to be. By working backwards, it may be possible to work out what level of accuracy in the small area estimates will give this level of accuracy in the decision making outcomes. Alternatively, a sensitivity analysis can be undertaken to determine how sensitive final decisions are to changes in the small area estimates.
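A sensitivity analysis of this kind can be sketched in a few lines. The example below is illustrative only: it assumes a hypothetical funding formula that shares a fixed budget in proportion to the small area estimates, and the area names, estimates and budget are invented. It shows how much one area's allocation moves when its estimate is wrong by 5%, 10% or 20%.

```python
def allocate(estimates, budget):
    """Share a budget across areas in proportion to their small area estimates.

    Proportional allocation is an assumed, hypothetical funding formula.
    """
    total = sum(estimates.values())
    return {area: budget * value / total for area, value in estimates.items()}


def sensitivity(estimates, budget, area, relative_errors):
    """Change in one area's allocation when its estimate is off by each relative error."""
    baseline = allocate(estimates, budget)[area]
    changes = {}
    for err in relative_errors:
        perturbed = dict(estimates)
        perturbed[area] = estimates[area] * (1 + err)
        changes[err] = allocate(perturbed, budget)[area] - baseline
    return changes


# Hypothetical small area estimates (e.g. persons in need) and program budget.
estimates = {"Area A": 1000.0, "Area B": 3000.0, "Area C": 6000.0}
changes = sensitivity(estimates, 1_000_000.0, "Area A", [0.05, 0.10, 0.20])

for err, change in changes.items():
    print(f"{err:.0%} error in estimate -> allocation changes by ${change:,.0f}")
```

If the resulting changes in allocation are small relative to what users regard as material, broad quality checks may be adequate. If they are large, more effort on accuracy and validation is warranted.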

Zaslavsky and Schirm (2002) discuss, in the context of funding allocations, how<br />

interactions between the provisions of the funding formula, data sources and estimation<br />

procedures used <strong>to</strong> derive formula inputs can have unanticipated consequences that are<br />

inconsistent with the policy goals of a program.<br />

E) Are there any conceptual models, either social or economic, that are believed to describe the process which influences the variable(s) for which we are to calculate small area estimates?

This is a good opportunity to get expert advice on which variables should have a relationship with the population of interest. This gives a theoretical basis for examining certain variables, whose relevance can then be confirmed by statistical analysis. A widely accepted theoretical model or framework, published in the literature and/or supported by empirical investigations, can greatly assist in deciding which variables, interaction terms and contextual effects should be included in the small area model, and in validating the predicted estimates. Should you decide to include variables that are not in the framework, or to exclude variables that are, you should be aware of the potential need to justify the decision.


F) What administrative data is available and relevant as auxiliary<br />

information <strong>to</strong> support the modeling of the small area estimates? How is<br />

this data collected, for what purpose is it used, and how accurate is it<br />

likely <strong>to</strong> be?<br />

It is important to cast the net wide in considering all potential sources of auxiliary data that may help improve the goodness of fit and specification of the small area model. The importance of understanding differences between auxiliary data and the survey data cannot be overstated. Administrative datasets may not cover the entire population of interest, or may be less reliable than survey data, because they are captured as a by-product of some other process (e.g. tax collection). A careful assessment should be made of the differences in:

o concepts,
o data item definitions,
o (standard) classifications used,
o scope,
o mode of data collection,
o reference periods, and
o editing procedures

across all the data sources in order to at least understand the limitations of the small area model.

G) Will small area estimates be required <strong>to</strong> be disaggregated by other<br />

categories?<br />

Users often request a whole range of small area data at different levels that may actually be superfluous to their needs. Here it is useful to find out the minimum level of data and geographic detail required to meet their needs. Prioritise any further breakdowns, at either the geographic or the sub-population level, so that modelling time is spent on the essential models first.

H) What previous studies have been used, if any, by the clients <strong>to</strong><br />

undertake the policy/funding decision for which small area estimates are<br />

required?<br />

This allows the project to compare its results with current or previous studies, which gives a good indication of whether they are consistent with other research. It also allows investigation of the problems that have come up in the past.


3. Some issues in Small Area Estimation<br />

3.1 Sources of Additional Information<br />

The aim of small area estimation is to output a set of reliable estimates for each small area for the target variable(s) of interest. The challenge therefore, in small area estimation, is how best to use innovative approaches that take advantage of additional information to circumvent the small sample size problem and provide estimates with improved quality. Small area estimation methods are effective when they can draw upon intrinsic relationships within and between the survey data and other data sources, from which they borrow strength. These relationships, which are schematically represented in Figure 3.1, may be found:

o between the survey based direct estimate and auxiliary information available from administrative data sources, censuses or other surveys, or
o in correlations between direct estimates observed across time, or
o in spatial relationships between neighbouring small areas, or
o in cross-sectional relationships between units with similar characteristics observed in different small areas within some broader region,
o or any combinations of the above.

Figure 3.1: Possible sources of additional information

[Figure not reproduced. Diagram labels: Small Area Model; Auxiliary Data (Demographic Information); Cross-sectional Relationships; Time Series Relationships; Multivariate Correlations; Spatial Effects.]

In most cases, auxiliary data is by far the most important source from which to borrow strength.

Auxiliary data<br />


One of the more important prerequisites for the successful production of small area estimates is the availability of accurate auxiliary data that is well correlated with the target variable. By auxiliary data we mean one or more variables obtained from either administrative data sources or a census that are included in the model as explanatory variables. The auxiliary data should:

o comprehensively cover the entire population scope for which small area estimates are required. If an auxiliary data item is not available for the unselected part of the population then small area predictions cannot be made and the affected data items cannot be included in the model,
o include reliable geographic information so that all units belonging to a small area can be accurately identified, and
o be contemporaneous with the target variable and other auxiliary data used in the model.

Model based small area estimates are produced by firstly fitting the model <strong>to</strong> the<br />

sample data <strong>to</strong> estimate model parameters, which include the intercept and slope<br />

parameters. The estimated model is then applied <strong>to</strong> the population auxiliary data <strong>to</strong><br />

produce the small area predicted estimates.<br />

In the case of a purely area level model, the target variable and auxiliary variables are<br />

all at the small area level, so it is relatively straightforward <strong>to</strong> produce small area<br />

estimates as described above. However in the case of unit or person level models, the<br />

second step referred <strong>to</strong> above is a little more complex as the model fitted <strong>to</strong> the<br />

sampled units is generally applied <strong>to</strong> those population units not selected in the<br />

sample. Small area estimates are compiled by taking the sum of the sample unit<br />

values for the target variable (obtained from the survey data) and adding <strong>to</strong> it the sum<br />

of the model predictions for the non-sampled units.<br />
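The two steps for a unit level model can be sketched as follows. This is a minimal illustration rather than a recommended implementation: it assumes a single auxiliary variable, an ordinary least squares line with no random area effects, and invented data values. The function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (one auxiliary variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Step 1: fit the model to the survey sample, pooled across all small areas.
sample_x = [1.0, 2.0, 3.0, 4.0]   # auxiliary variable (hypothetical values)
sample_y = [3.0, 5.0, 7.0, 9.0]   # target variable (hypothetical values)
intercept, slope = fit_line(sample_x, sample_y)

# Step 2: for one small area, add the observed sample values to the
# model predictions for the units that were not selected in the sample.
area_sample_y = [3.0, 5.0]             # target values observed in the survey
area_nonsample_x = [3.0, 4.0, 5.0]     # auxiliary values for non-sampled units
predicted = [intercept + slope * x for x in area_nonsample_x]
area_estimate = sum(area_sample_y) + sum(predicted)

print(intercept, slope, area_estimate)
```

Note that the second step needs the auxiliary values of every non-sampled unit in the small area, which is why the coverage of the auxiliary data matters so much.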

Adding the sample values to the model predictions for the non-sampled units in this way applies naturally if the survey data can be reliably matched to the auxiliary information using a hard matching identifier such as a Medicare number or tax file number. This is common practice in a number of European Union countries where national identifiers exist. However, due to privacy considerations and related issues, this practice rarely occurs in Australia. Where it is not possible to distinguish between sampled and non-sampled units on the auxiliary data sources, there are two options available:

- apply the model fitted to the sample data to the entire population data file, or

- group population units within each small area (e.g. age by sex), fit a model to the small area by sub-group level sample data and then apply this model to make predictions for the non-sampled population in each small area sub-group.

The first approach suffers from the disadvantage that the prediction error for the small area estimates will be increased slightly, because target variable values for the sampled units are predicted from the model and so contribute to total model error. It would be preferable to make use of the available survey response values, which are not subject to model error. If the sampling fraction is very small then this should not be a major concern.

The second approach has the advantage that only population counts of the non-sampled population in each small area sub-group are required to make predictions. The predicted totals for the non-sampled population (at the small area sub-group level) can then be added to the corresponding sample totals to form small area estimates. A potential disadvantage of this approach is that the small area sub-group level model may be less efficient than a unit level model.
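The second approach can be illustrated with a deliberately simple sketch. Here the "model" is assumed to be nothing more than a prevalence rate for each sub-group, pooled across all small areas; a real application would normally fit a regression model instead. All area names, sub-groups and counts are invented.

```python
from collections import defaultdict

# Survey sample by (small area, sub-group): (units sampled, units with the characteristic).
sample = {
    ("A", "young"): (10, 1), ("A", "old"): (10, 4),
    ("B", "young"): (30, 2), ("B", "old"): (10, 4),
}
# Population counts by (small area, sub-group), e.g. from a census.
population = {
    ("A", "young"): 110, ("A", "old"): 60,
    ("B", "young"): 230, ("B", "old"): 110,
}

# Step 1: fit the (assumed) model, a pooled prevalence rate per sub-group.
with_char = defaultdict(float)
sampled = defaultdict(float)
for (area, group), (n_sampled, n_with) in sample.items():
    sampled[group] += n_sampled
    with_char[group] += n_with
rates = {group: with_char[group] / sampled[group] for group in sampled}

# Step 2: sample total plus predicted total for the non-sampled population.
estimates = defaultdict(float)
for (area, group), (n_sampled, n_with) in sample.items():
    non_sampled = population[(area, group)] - n_sampled
    estimates[area] += n_with + rates[group] * non_sampled

print(dict(estimates))
```

Only the counts of non-sampled people in each small area sub-group are needed for the prediction step, so no unit level matching between the survey and the auxiliary data is required.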

Auxiliary data may be available at area-level or person/unit level or a combination of<br />

both. However, in practice due <strong>to</strong> confidentiality or security reasons, data from<br />

government administrative sources are more likely <strong>to</strong> be available at some aggregated<br />

level. The choice between a unit/person level or area level model will depend on the<br />

level at which data for the variable of interest and explana<strong>to</strong>ry variables are available as<br />

well as the efficiency of the small area estimates generated. For example, if data for<br />

the target variable and the auxiliary variables are only available at the area level, fitting<br />

an area level model will be the only option. However if unit level data is available for<br />

all variables, either an area level or unit level model is an option. It is also possible <strong>to</strong><br />

fit a model in which the target variable is at the unit level but some auxiliary variables<br />

are at the unit level while others are at area level. Further discussion on the choice of<br />

small area model is provided in Section 4.2 below.<br />

In practice, the efficiency of predicted small area estimates may be improved by<br />

including some auxiliary variables as small area averages. Such covariates are referred <strong>to</strong><br />

as contextual effects and may be included as an additional covariate even if the variable<br />

already appears in the model as a unit level auxiliary variable. Contextual effects allow<br />

differences in the area level characteristics in which a person lives <strong>to</strong> be accounted for in<br />

the model. For example, high income earners living in low income areas may have quite<br />

different characteristics <strong>to</strong> people on similarly high incomes living in high income areas,<br />

and it may be important <strong>to</strong> take account of this in the model.<br />
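To make the idea of a contextual effect concrete, the sketch below builds the covariate rows for a model that includes both a person's own income and the average income of the small area in which they live. The records and values are hypothetical, and for brevity the area average is computed from the listed records; in practice it would more likely come from census or administrative data.

```python
from collections import defaultdict

# Hypothetical person level records: small area and income (in $'000).
people = [
    {"area": "A", "income": 30.0},
    {"area": "A", "income": 90.0},
    {"area": "B", "income": 90.0},
    {"area": "B", "income": 110.0},
]

# Area level average income: the contextual effect.
totals = defaultdict(float)
counts = defaultdict(int)
for person in people:
    totals[person["area"]] += person["income"]
    counts[person["area"]] += 1
area_mean = {area: totals[area] / counts[area] for area in totals}

# Each covariate row: (intercept term, unit level income, area average income).
design_rows = [(1.0, p["income"], area_mean[p["area"]]) for p in people]

for row in design_rows:
    print(row)
```

The second and third people have the same income of 90 but different area averages (60 and 100), so a model that includes the contextual effect can predict different outcomes for them.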

We now give an example of the data sources and auxiliary variables that were considered for the disability empirical study. The target variable was whether or not a person has a disability. The auxiliary data was drawn from the survey, a census and administrative data sources, and comprised:

- Survey of Disability, Ageing and Carers (SDAC) (ABS, 1998)

- Census of Population and Housing, 2001 (ABS)

- Socio-Economic Indexes For Areas (SEIFA) (ABS)

- Disability Support Pension (DSP) data from Centrelink

Given these sources of data, the following auxiliary variables were considered:

- proportion of people in the small area receiving the DSP,

- age and sex, income, household structure (from SDAC),

- Socio-Economic Indexes For Areas (SEIFA) score for the small area,

- indicator of remoteness.

Some of these variables were only available at the area level, while those sourced from SDAC/Census (for example age, sex and income) were available at the person level. These SDAC variables were chosen subject to the requirement that they were similarly defined and available from the census.

Another key issue relating to auxiliary data concerns the case where survey data cannot be matched to auxiliary data sources. In order to make predictions for each small area, auxiliary variables obtained from the survey must correspond closely with similar data items available for the rest of the population. If this is not the case then model predictions may be significantly biased. For example, in the empirical study of small area estimates of disability, we used auxiliary variables such as age, sex, income and household structure, found on the SDAC survey file, to fit the model and then used the corresponding variables on the population census file to make the small area predictions.

When considering potential sources of auxiliary data it is highly advisable to cast a wide net and assess the value of data that may not, on first reflection, appear highly relevant. For example, in the context of disability data, economic variables may have good predictive power in addition to health related variables. Some caution needs to be exercised, however, as it is possible that the correlation between the target variable and some of the more tenuous auxiliary variables is due more to coincidence than to an intrinsic real world relationship between the two. Such auxiliary variables are referred to as spurious auxiliary variables.

Demographic information is a particular form of auxiliary information, relating <strong>to</strong><br />

population attributes such as age and sex. Many social variables will have some<br />

relationship <strong>to</strong> such demographic data thereby necessitating its use. However there is<br />

another reason for using demographic information and that is where the population size<br />

or demographic composition of small areas varies considerably. In Australia, with its<br />

extreme variation in population densities, this is a very common issue.<br />

Cross-sectional relationships<br />

Cross-sectional correlations are intrinsic relationships between units (observed at the same time point) with similar characteristics, even if they are not in the same small area. For example, units with the same age, sex and occupational characteristics may have similar health outcomes regardless of whether they live in Sydney or Melbourne. Small area methods borrow strength cross-sectionally by pooling sample data across a broader area (thus obtaining more statistical reliability) and then adjusting each small area estimate according to its age-sex-occupation profile. In practice, borrowing strength cross-sectionally may be restricted to a predefined broader region if it is believed that cross-sectional relationships are likely to differ between regions. For example, exposure to air pollutants is likely to be similar for Sydney and Melbourne but different to that of other cities. Hence Sydney and Melbourne may be combined into a broader region within which cross-sectional relationships can be drawn upon.

Time Series Relationships<br />

Borrowing strength across time enables the practitioner to effectively pool sample data across time. The sample in each small area may be very sparse at a given time point. However, if a sufficiently long time series exists and autocorrelations across time are reasonably strong, data from a number of time points can be pooled together, giving a larger effective sample size to use in each small area. Time series autocorrelations are used to adjust for the degree of similarity or dissimilarity between units observed a specified number of time periods apart. This approach also has the benefit of reducing the impact of an observed value that is discordant with its neighbouring values in time. Borrowing strength across time adds a considerable degree of complexity to small area estimation and should only be contemplated where statistical expertise is available.


Spatial Relationships<br />

Spatial relationships in the data can be harnessed in much the same way that time series<br />

relationships can be. Thus, if we hypothesize that different units bear some relationship<br />

<strong>to</strong> each other that depends upon the distance and direction between them, units can<br />

then be pooled <strong>to</strong>gether <strong>to</strong> give a greater effective sample size for each small area<br />

estimate. This approach also has the benefit of reducing the impact of the odd unit value<br />

that is discordant with its neighbouring values. Spatial methods are commonly used in<br />

the contexts of health, disease, agricultural or environmental data but may be quite<br />

applicable <strong>to</strong> other specific <strong>to</strong>pics.<br />

As in the case of time series relationships, borrowing strength through spatial relationships adds complexity to small area estimation and should only be contemplated where statistical expertise is available.

Multivariate Relationships<br />

In a univariate model the response or target variable is a single variable. In this manual<br />

the models referred <strong>to</strong> are univariate models. So using the example of disability type<br />

(physical, sensory, intellectual, psychological/psychiatric, head injury/acquired brain<br />

damage), a separate univariate model is fitted <strong>to</strong> each of the disability types. In a<br />

multivariate model, the target variable is a vec<strong>to</strong>r of these variables and the model is<br />

fitted <strong>to</strong> these variables simultaneously.<br />

A multivariate approach may be more efficient, in the sense of producing more accurate predictions, if there are strong correlations between the constituent variables. For example, physical impairment may have a strong correlation with sensory impairment. A multivariate approach that takes advantage of this additional information should be more robust and give more accurate estimates. However, multivariate models add complexity to small area estimation and should only be contemplated where statistical expertise is available.

3.2 Basic Conditions for Success<br />

The first step in undertaking a small area exercise is to determine the quality of the direct estimates and the auxiliary data at the small area level. The variable of interest is often drawn from a sample survey, which cannot provide estimates at a fine level because of the small sample size in each small area and the correspondingly high Relative Standard Errors (RSEs). Auxiliary data can be obtained from many sources including administrative datasets, survey variables and census counts. Table 3.1 outlines some issues that will help in determining whether the basic conditions for producing quality small area estimates are being met.
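The link between small sample sizes and high RSEs can be seen from a short calculation. The RSE is the standard error expressed as a percentage of the estimate. The sketch below assumes, purely for illustration, an estimated proportion from a simple random sample; ABS household surveys use more complex designs, so actual standard errors would differ. The function name and the figures used are hypothetical.

```python
import math


def rse_of_proportion(p, n):
    """Relative standard error (%) of an estimated proportion p from n sampled units.

    Assumes simple random sampling and ignores any design effect.
    """
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return 100.0 * standard_error / p


# A characteristic held by 10% of the population, at three sample sizes.
for n in (25, 400, 2500):
    print(n, round(rse_of_proportion(0.10, n), 1))
```

Under these assumptions a sample of 25 units in a small area gives an RSE ten times as large as a sample of 2,500, which is the small sample size problem that small area methods try to circumvent.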


Table 3.1: Recipe for Success

Small Area Size

Ingredient: Each small area should have a reasonable sample. Few small areas should have no sample.
Reason: The smaller the sample the harder it is to reliably discern the characteristics of individual small areas. More reliance is then placed on the assumption that the small area is similar to others. It also becomes more difficult to identify relationships either in the data or with auxiliary data. This will lead to lower quality small area estimates.

Variable of Interest

Ingredient: Reasonably common population characteristic.
Reason: Similar reason to small area size. In the context of household surveys, the rarer the characteristic the smaller the likely sample.

Ingredient: Consistent estimates across small areas.
Reason: Key assumption with simple synthetic models.

Model Specification

Ingredient: Model is well-specified, meaning that:
o all main determinants or explanators (auxiliary variables) for the target variable are included in the model and
o the model reflects the correct form of the relationship between the target variable and the auxiliary variables (e.g. linear, quadratic, logistic etc.) and that variance structures are accounted for correctly.
Reason: Mis-specification may result in incorrect predictions and incorrect measures of the statistical reliability of those predictions.

Auxiliary Data

Ingredient: Strong theoretical relationship between auxiliary variable and population of interest.
Reason: Allows easy identification of potential auxiliary variables and aids in explanation of method to users.

Ingredient: Statistically significant relationships between auxiliary data and small area estimates.
Reason: Allows a reasonable small area model to be estimated.

Ingredient: The auxiliary data has been accurately collected and maintained and uses similar scope and definitions to the survey data.
Reason: Eliminates a further source of error that would otherwise impact upon the quality of the final small area output.

Ingredient: No missing values.
Reason: Missing values can bias estimates or cause model failure. Where possible ensure these have been accounted for before modelling.

Ingredient: Compatibility of auxiliary data with census data in terms of consistency of definitions of variables, measurement, timing and other issues.
Reason: Reduces further sources of error caused by inconsistency of definitions, measurement and other changes over time.

Confidentiality

Ingredient: Maintain confidentiality standards.
Reason: ABS mission statement provides an assurance concerning the confidentiality of the data it collects.


3.3 Choice of Small Area<br />

Within the ABS, the choice of small areas generally aligns with pre-specified boundaries as defined by the Australian Standard Geographical Classification (ASGC). Australia is broken up into Census Collection Districts (CDs) within Statistical Local Areas (SLAs) within Statistical Subdivisions (SSDs) within Statistical Divisions (SDs) within states. If possible, it is generally advisable to use ASGC classifications as they provide a consistent and integrated framework with a readily available set of concordances. However, other agencies often have different boundaries for their administrative areas. These boundaries generally line up with council boundaries, which in turn line up with SLAs. Another common boundary is the postcode, which can be related to a CD, although only approximately. A new geographical unit, called the meshblock, will be introduced in the 2006 population census for output purposes. The meshblock is considerably smaller than the CD and, with the help of the Geocoded National Address File (G-NAF), will improve the accuracy with which locations are coded to other ASGC classifications. In Section 2 we saw how important it is to find out from users the broadest area that will meet their small area requirements, in order to improve the reliability of the modelled estimates. Figure 3.2 depicts the different choices that must be made to get reasonable estimates.

Figure 3.2: Choosing the Appropriate Small Area<br />


As discussed in Section 3.1 under "Auxiliary data", a choice exists as to the level at which small area models should be applied. In practice, users may require small area data at different levels of aggregation and it may be expedient to fit the model at the finest level, produce small area estimates at that level and let users aggregate those estimates up to the required levels of aggregation. However, it is important to realise that this runs the risk of incurring what is known as the atomistic fallacy (PAHO, 2003).

The atomistic fallacy occurs when trying to draw inferences about units defined at a higher level of aggregation from a model fitted at a lower level of aggregation. Relationships between units at a lower level of aggregation may not be the same as those between units at a higher level. Hence if another model was fitted at the higher level, the estimated model parameters and the predictions may be quite different to those for the model fitted at the lower level. A similar fallacy, called the ecological fallacy, may occur in the reverse situation of fitting a model at a broad regional level and assuming that the inferences drawn can be readily applied to small areas within those regions. Wherever possible it is advisable to make model inferences (that is, predicted estimates and their associated measures of accuracy) at the small area level required by users. Where small area estimates are to be aggregated, the extent of the aggregation should be kept to a minimum. If small area estimates are produced at the local government area (LGA) level, aggregation to user defined regions consisting of only a few LGAs may be acceptable but aggregation of many LGA estimates should be treated with caution.
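A small constructed example shows how the relationship fitted at one level of aggregation need not hold at another. The data below are invented so that the target variable falls as the auxiliary variable rises within each area, while the area averages rise together. The helper function `slope` is a hypothetical name for an ordinary least squares slope.

```python
def slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical unit level data for three areas: (auxiliary values, target values).
areas = {
    "A": ([0.0, 2.0, 4.0], [4.0, 2.0, 0.0]),
    "B": ([1.0, 3.0, 5.0], [5.0, 3.0, 1.0]),
    "C": ([2.0, 4.0, 6.0], [6.0, 4.0, 2.0]),
}

# Model fitted at the unit level, pooling all units.
unit_x = [x for xs, _ in areas.values() for x in xs]
unit_y = [y for _, ys in areas.values() for y in ys]
unit_slope = slope(unit_x, unit_y)

# Model fitted at the area level, using area averages.
area_x = [sum(xs) / len(xs) for xs, _ in areas.values()]
area_y = [sum(ys) / len(ys) for _, ys in areas.values()]
area_slope = slope(area_x, area_y)

print(unit_slope, area_slope)
```

In this example the unit level slope is negative and the area level slope is positive, so inferences drawn at one level would be misleading if carried over to the other.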

In choosing the most appropriate small area to use, consideration needs to be given to the sample size in each small area. This needs to be sufficient for the model to produce appropriately reliable estimates. The sample size required will depend on the strength of the cross-sectional relationships or of the other sources from which strength is borrowed. If these are quite strong then perhaps as few as ten or twenty units will be sufficient. In the absence of strong relationships in the data, a larger sample size of perhaps a couple of hundred units may be required in each small area. The sample sizes referred to here should be interpreted as a very rough guide as, apart from model strength, the required sample size will also depend upon the variation of units within each small area.

The number of small areas is also important especially if units are clustered within small<br />

areas. Generally having more small areas will help improve the goodness of fit of the<br />

small area model. Consideration should also be given <strong>to</strong> the geographical distribution<br />

of the sample through each small area. In ABS household surveys clustering is used in<br />

the sample design <strong>to</strong> help reduce costs, with the result that in remote areas all of the<br />

sampled dwellings in a small area may have been selected from one or more small <strong>to</strong>wns<br />

and none from throughout the vast rural expanse. This is likely <strong>to</strong> result in bias if the<br />

characteristics of people in those <strong>to</strong>wns are different <strong>to</strong> those in the remote rural areas.<br />

The allocation of the sample across small areas will often reflect the relative frequency with which the characteristic or variable of interest occurs in the population. For a common sub-population, such as the number of persons employed or the number of persons with a disability, local government area (LGA) may be a suitable choice of small area. For rare characteristics, such as Indigenous status or a particular type of illness, larger areas such as the Statistical Subdivision (SSD) or broader regions may be required to give reasonable estimates.

Of course the decision on which level of geography <strong>to</strong> choose for the small area will<br />

ultimately hinge upon user decision making requirements. It makes sense <strong>to</strong> choose<br />

small areas that are as close as possible to the areas used for program planning and<br />

implementation. However, such areas are often really no more than administrative<br />

regions, chosen for pragmatic or logistical reasons such as transport costs or workforce<br />

management efficiency. <strong>Statistical</strong> units within these administrative regions are not<br />

necessarily homogenous with respect <strong>to</strong> the variable we are trying <strong>to</strong> calculate small area<br />

estimates for. If this is the case it may be worth considering (subject <strong>to</strong> the minimum<br />

sample size requirement) small areas at a finer level with greater homogeneity <strong>to</strong> obtain<br />

a better fitting model. Small area estimates at this level can then be aggregated <strong>to</strong> the<br />

required administrative regional level.<br />

For example, in the disability empirical study, disability programs are funded and<br />

administered at the level of Disability and Health Services Regions (DHSR) which are<br />

aggregations of usually a few LGAs. LGA was considered sufficiently close for modelling<br />

purposes while also having the advantage of sufficient sample sizes and higher level of<br />

homogeneity with respect <strong>to</strong> disability characteristics.<br />
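The aggregation step described above can be sketched as follows. This is a minimal illustration only: the LGA names, the DHSR names and the estimates are hypothetical, and it assumes the small area estimates are counts (rates would need to be population weighted before combining).<br />

```python
def aggregate_to_regions(small_area_estimates, region_of):
    """Sum small area estimates of counts up to the region each area belongs to."""
    totals = {}
    for area, estimate in small_area_estimates.items():
        region = region_of[area]  # a KeyError here means an area has no region assigned
        totals[region] = totals.get(region, 0.0) + estimate
    return totals


# Hypothetical modelled counts of persons with a disability for three LGAs.
lga_estimates = {"LGA 1": 420.0, "LGA 2": 310.0, "LGA 3": 150.0}

# Hypothetical concordance from LGA to Disability and Health Services Region.
lga_to_dhsr = {"LGA 1": "DHSR North", "LGA 2": "DHSR North", "LGA 3": "DHSR South"}

dhsr_estimates = aggregate_to_regions(lga_estimates, lga_to_dhsr)
print(dhsr_estimates)
```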

Another example is that of producing small area estimates of water usage. One might<br />

consider using water catchment areas because that is the level required by users,<br />

however these are not always standardised across water and energy authorities. There is<br />

also the problem of geocoding ASGC classifications on which ABS data is based <strong>to</strong> the<br />

water catchment area. Water catchment areas can also be vast along major river systems,<br />

encompassing very different land uses, rainfall patterns and geological drainage features.<br />

3.4 Variable of Interest<br />

The variable of interest is typically measured from an ABS sample survey. This forms our<br />

dependent variable <strong>to</strong> build the small area model around. If the proportion of the<br />

population with a characteristic of interest is constant across broad geographic areas<br />

(e.g. assuming each small area within NSW has the same rate of heart attacks), then<br />

auxiliary data are not really needed and a simple technique such as the broad area ratio<br />

estima<strong>to</strong>r will give good results.<br />

In practice, however, this will be a strong assumption <strong>to</strong> make. If we believe that small<br />

area proportions vary with other fac<strong>to</strong>rs then auxiliary information will be required <strong>to</strong><br />

build a model. The auxiliary data can help explain the variation between small areas and<br />

assist in creating quality small area estimates.<br />

Another point for consideration is that in many applications there will be not just one<br />

but a number of variables of interest requiring small area estimates. Auxiliary data may<br />

not be available for each of these and the strength of the relationship between each<br />

variable of interest and the available auxiliary variables may vary markedly. Prioritising<br />

the variables of interest with users will assist in focusing effort <strong>to</strong> improve the quality of<br />

those estimates that matter most.<br />

3.5 Quality of Auxiliary Data<br />

Potential auxiliary data should be evaluated for their relationship <strong>to</strong> the variable(s) of<br />

interest, both theoretically and statistically as well as the accuracy and reliability with<br />

which they have been collected. The theoretical relationship should emanate from<br />

tested social or economic theories. A careful examination should be made <strong>to</strong> understand<br />

any major differences between the auxiliary data and the variables of interest.<br />

Consideration should be given to the purpose for which the data was initially collected,<br />

how it was processed and edited, what conceptual definitions were used and what the<br />

scope of the auxiliary data holdings is. This will allow appropriate auxiliary information to<br />

be chosen <strong>to</strong> improve the model, aid in explaining <strong>to</strong> users what fac<strong>to</strong>rs are driving the<br />

small area estimates and help pinpoint potential sources of error.<br />

In summary the following aspects should always be examined carefully when<br />

considering administrative data for use as auxiliary variables:<br />

o Population scope of the data<br />
o Definitions of variables / concepts used<br />
o Purpose for collecting data / what is it used for<br />
o Reference period<br />
o Questionnaire (or form) and collection methodology used to collect the data<br />
o Survey design used<br />
o Quality of the framework used to select units from<br />
o The extent of missing data. What if any, imputation treatments were used?<br />
o Classifications used<br />
o Editing or data validation process used<br />

In the disability study, auxiliary data was sourced from Centrelink on the number of<br />

people receiving the Disability Support Pension (DSP). In areas with a greater<br />

proportion of people receiving the DSP we would expect a higher incidence of disability.<br />

A person’s eligibility <strong>to</strong> receive the DSP is related <strong>to</strong> their ability <strong>to</strong> undertake<br />

employment related activities, whereas the ABS Survey of Disability, Ageing and Carers<br />

(SDAC) concept of disability relates <strong>to</strong> a person’s ability <strong>to</strong> undertake a wide range of<br />

household, social as well as employment activities.<br />

There are a number of simple approaches for evaluating the strength of the statistical<br />

relationship between the variable of interest and the auxiliary data. The strength and<br />

statistical significance of this relationship can be analysed through simple scatter plots,<br />

correlations or simple models. Where substantial differences between the data do<br />

appear, it may be possible in some circumstances <strong>to</strong> improve the statistical relationship<br />

by the application of suitable adjustments or imputation methods <strong>to</strong> the auxiliary data <strong>to</strong><br />

make it more comparable with the response variable. The aim of these adjustments may<br />

be <strong>to</strong> reduce the impact of scope or definitional differences or <strong>to</strong> treat outliers in the<br />

auxiliary data. Such adjustments may help <strong>to</strong> improve the statistical relationship between<br />

the auxiliary data and the response variable. However it is important that a statistician be<br />

consulted before applying such adjustments.<br />
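As a rough illustration of the correlation check mentioned above, the sketch below computes a Pearson correlation between an area-level auxiliary variable and the corresponding direct survey estimates. The figures are invented for illustration and the variable names (dsp_rate, survey_disability_rate) are hypothetical. A scatter plot of the same pairs should still be examined, as a single coefficient can hide outliers or non-linear patterns.<br />

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical area-level data: proportion of people receiving the DSP
# (auxiliary variable) and the direct survey estimate of the disability rate.
dsp_rate = [0.02, 0.03, 0.05, 0.04, 0.06]
survey_disability_rate = [0.15, 0.18, 0.24, 0.20, 0.27]

r = pearson_correlation(dsp_rate, survey_disability_rate)
print(f"correlation between auxiliary and survey data: {r:.3f}")
```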

3.6 Confidentiality<br />

Protecting the confidentiality of data provided <strong>to</strong> the ABS is of utmost importance and is<br />

enshrined in the Census and Statistics Act, 1905. The risk of breaches of confidentiality<br />

needs to be carefully assessed in the case of small area data releases, as such releases<br />

naturally produce a higher level of detail than is normally the case. Hence care must be<br />

taken <strong>to</strong> ensure that the potential for identifying individual persons or businesses is<br />

greatly reduced. The risk of identification is increased when:<br />

o The population of interest is quite rare<br />
o The geographic area is very small<br />
o A major part of the small area estimate can be attributed to units with unusual characteristics (such as in the case of doctors in remote areas or the telecommunications sector)<br />

The release of small area estimates should follow the standard ABS guidelines. While the<br />

fine level of geography increases the risk of identification, this risk may <strong>to</strong> some extent<br />

be mitigated by the inherent smoothing of the data and additional model error<br />

introduced by the modeling process itself. However this does not mean that all caution<br />

can be thrown <strong>to</strong> the wind. Most small area projects will be commissioned by external<br />

agencies and individuals in these or other agencies may be realistically expected <strong>to</strong> be in<br />

a position <strong>to</strong> obtain knowledge of the models used <strong>to</strong> produce the small area estimates.<br />

Such information could possibly be used <strong>to</strong> identify individuals. Another issue is that<br />

although most small area estimates will be modeled and hence incur model error and<br />

further smoothing, there is the risk that an individual is correctly identified from the data<br />

although using incorrect logic. There will still be a public perception that the Act has<br />

been breached. In conclusion, all possible steps <strong>to</strong> avoid disclosure should be taken in<br />

preparing small area data for release and the Data Access and Confidentiality<br />

Methodology Unit should be consulted prior <strong>to</strong> release.<br />

4. Choice of Small Area Techniques<br />

4.1 Types of Small Area Estimation Techniques<br />

In this section we discuss some of the more common techniques available for small area<br />

estimation. We consider these techniques under the general headings of "Simple Small<br />

area Methods" (Section 4.1.1) and "Regression Methods" (Section 4.1.2). Although the<br />

methods discussed under Section 4.1.1 can be formulated in terms of a regression<br />

model, and hence would conceptually belong under Section 4.1.2, we have treated them<br />

separately because they are:<br />

1. simple to implement and require less statistical expertise. They are also commonly used to produce small area estimates by many government agencies.<br />
2. often appropriate as an initial exercise to obtain "rough" small area estimates, before attempting more rigorous techniques.<br />

4.1.1 Simple Small Area Methods<br />

Here we discuss the simpler methods that involve weighted survey estimates derived for<br />

a given level of geography that can be applied without the explicit application of<br />

statistical models. These methods include the:<br />

o Direct Estimator<br />

Direct estimates are classical design-based estima<strong>to</strong>rs that are obtained by<br />

applying survey weights <strong>to</strong> the sample units in each small area (Saei and<br />

Chambers, 2003). Since most ABS surveys are designed <strong>to</strong> provide reliable<br />

estimates only at the national or state levels, sample sizes are often <strong>to</strong>o small at<br />

the small area level <strong>to</strong> produce reliable direct estimates. Small area estimation<br />

is therefore concerned with alternative techniques that can produce small area<br />

estimates with higher accuracy than that of direct estimates.<br />
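A minimal sketch of a direct estimate for one small area is given below. The weights and responses are hypothetical, and each weight is assumed to be the number of persons in the population that the sampled person represents.<br />

```python
def direct_estimate(weights, values):
    """Weighted survey total for one small area: the sum of weight * value."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical sample of four persons selected in one small area.
weights = [250.0, 300.0, 200.0, 250.0]
has_characteristic = [1, 0, 0, 1]  # 1 = has the characteristic, 0 = does not

estimated_count = direct_estimate(weights, has_characteristic)
estimated_population = direct_estimate(weights, [1] * len(weights))
estimated_rate = estimated_count / estimated_population
print(estimated_count, estimated_population, estimated_rate)
```

With only four sampled persons, changing a single response shifts the estimated count by several hundred, which is the instability that motivates the other techniques in this section.<br />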

o Broad Area Ratio Estimator (BARE)<br />

This estima<strong>to</strong>r is one of the simplest types of synthetic estima<strong>to</strong>rs. It is<br />

calculated by pro-rating a broad area direct estimate by the ratio of the small<br />

area <strong>to</strong> broad area populations. This estima<strong>to</strong>r applies the reliable broad area<br />

estimate proportionately across all small areas contained in the broad region.<br />

The success of the BARE estima<strong>to</strong>r hinges largely on the choice of the broad<br />

area. The broad area needs <strong>to</strong> be chosen large enough <strong>to</strong> afford a direct<br />

estimate that is sufficiently reliable but small enough that all small areas within<br />

the broad area are sufficiently homogenous in the characteristic of interest. It<br />

is important <strong>to</strong> note that if small areas are in fact not homogenous within the<br />

broad region then the BARE will be biased. In practice this is difficult <strong>to</strong> verify<br />

hence caution should be exercised when using the BARE. It should only be<br />

used when users are aware of, and fully prepared <strong>to</strong> accept this assumption.<br />
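The pro-rating described above can be sketched as follows. The broad area estimate and the small area populations are hypothetical figures, and the function name is illustrative only.<br />

```python
def broad_area_ratio_estimates(broad_estimate, small_area_populations):
    """Pro-rate a broad area direct estimate by each small area's population share."""
    total_population = sum(small_area_populations.values())
    return {
        area: broad_estimate * population / total_population
        for area, population in small_area_populations.items()
    }


# Hypothetical broad area with a reliable direct estimate of 1200 persons
# having the characteristic, and three small areas with known populations.
populations = {"Area A": 5000, "Area B": 3000, "Area C": 2000}
bare = broad_area_ratio_estimates(1200.0, populations)
print(bare)
```

Every small area receives the same implied rate (here 12 per cent of its population), which is exactly the homogeneity assumption that users need to be prepared to accept.<br />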

o Calibration Estimator<br />

To produce calibration estima<strong>to</strong>rs, the original survey weights (usually the<br />

inverse probabilities of inclusion in the sample) are replaced with new<br />

"calibrated" weights that are in some sense as close as possible <strong>to</strong> the original<br />

weights, but are calibrated on some auxiliary variable available for the<br />

population (Chambers, 2005). The small area estimate for this auxiliary<br />

variable, calculated using the calibrated weights, will agree with the known<br />

population <strong>to</strong>tals. A simple example of calibration is where population age by<br />

gender demographic <strong>to</strong>tals are known for each small area. The survey weights<br />

are then adjusted so that estimates of population count by age and gender,<br />

agree with the known population counts.<br />

There are a couple of points <strong>to</strong> note about the calibration estima<strong>to</strong>r. Firstly it is<br />

a straightforward method <strong>to</strong> put in<strong>to</strong> production because the resulting<br />

adjusted (calibrated) weights can be s<strong>to</strong>red on the survey file and used <strong>to</strong><br />

produce estimates at the desired level of aggregation. Secondly the auxiliary<br />

variables should be chosen with care and should relate <strong>to</strong> variables we wish <strong>to</strong><br />

produce estimates for. If the calibrated weights are used <strong>to</strong> produce estimates<br />

for variables that aren’t related <strong>to</strong> the auxiliary variable(s) used in determining<br />

the calibrated weights, the resulting estimates may be biased. In general<br />

calibrated estimates possess good design-based properties. Government<br />

statisticians have his<strong>to</strong>rically preferred the design-based <strong>to</strong> the model-based<br />

approach as the resulting estimates are not subject <strong>to</strong> the consequences of<br />

model mis-specification.<br />
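The age by gender example above can be sketched with post-stratification, which is the simplest form of calibration: within each group the original weights are scaled by one common factor so that they sum to the known population count. The group labels, weights and population counts below are hypothetical, and the calibration methods used in production may differ.<br />

```python
def calibrate_weights(sample, known_totals):
    """Scale weights within each group so they sum to the known group total.

    sample is a list of (group, original_weight) pairs; the returned list holds
    the calibrated weights in the same order.
    """
    weighted_count = {}
    for group, weight in sample:
        weighted_count[group] = weighted_count.get(group, 0.0) + weight
    return [
        weight * known_totals[group] / weighted_count[group]
        for group, weight in sample
    ]


# Hypothetical sample in one small area, grouped by gender and age.
sample = [
    ("male 15-64", 100.0),
    ("male 15-64", 100.0),
    ("female 15-64", 150.0),
    ("female 15-64", 50.0),
]
# Hypothetical known population counts for the same groups.
known_totals = {"male 15-64": 250.0, "female 15-64": 180.0}

calibrated = calibrate_weights(sample, known_totals)
print(calibrated)
```

The calibrated weights can be stored alongside the original ones on the survey file and reused for any estimate at the desired level of aggregation.<br />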

4.1.2 Regression Methods<br />

Where a higher level of accuracy is required for small area estimates, an alternative is <strong>to</strong><br />

use regression or model-based approaches. However, these methods require a higher<br />

level of statistical expertise <strong>to</strong> implement and interpret results. A wide variety of<br />

different regression techniques are available, but for the purposes of this manual, they<br />

are divided in<strong>to</strong> two main categories: synthetic and random effects regression models.<br />

o Synthetic Regression Models<br />

Synthetic regression models make use of available auxiliary data <strong>to</strong><br />

mathematically express a deterministic relationship between those auxiliary<br />

variables and the target (response) variable we are trying <strong>to</strong> predict in each<br />

small area. Synthetic models assume that all the systematic variability in the<br />

response variable is explained by the variability in the values of the auxiliary<br />

variables. The remaining variability, which is referred <strong>to</strong> as the "random noise"<br />

or "s<strong>to</strong>chastic variation", is represented by the difference between the<br />

predicted value for the response variable under the model and the value<br />

observed from the data. These differences are called random errors, residuals<br />

or disturbances.<br />

In the case of small area models, synthetic models assume that the same<br />

deterministic relationship between the variable of interest and the auxiliary<br />

variables, holds across a range of small areas, say for example within a state.<br />

Synthetic models work well when all relevant auxiliary variables that help<br />

predict the response variable are available, accurate and can be included in the<br />

model. However in practice this is more the exception than the rule.<br />
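As an illustration only, the sketch below fits a synthetic linear model with a single auxiliary variable by ordinary least squares and then applies the same fitted relationship to an area that was not used in the fit. The data and variable names are hypothetical. A real application would normally involve several auxiliary variables and a statistical package rather than hand-written formulas.<br />

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical area-level data for five sampled small areas.
dsp_rate = [0.02, 0.03, 0.05, 0.04, 0.06]        # auxiliary variable
direct_rate = [0.15, 0.18, 0.24, 0.20, 0.27]     # direct survey estimates

intercept, slope = fit_simple_regression(dsp_rate, direct_rate)

# The synthetic assumption: the same relationship holds in every small area,
# so the fitted line is applied unchanged to another area's auxiliary value.
new_area_dsp_rate = 0.045
synthetic_estimate = intercept + slope * new_area_dsp_rate
print(intercept, slope, synthetic_estimate)
```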

o Random Effects Regression Models<br />

When fitting a synthetic model, the residuals should look like "white noise",<br />

however in practice they often display significant between area variation which<br />

indicates that there is some other systematic variation in the response variable<br />

between different small areas that is not being accounted for by the auxiliary<br />

variables. This implies that the synthetic model is missing certain auxiliary<br />

variables, the values of which would, had they been available, better help<br />

predict differences between small areas.<br />

This problem can be addressed by incorporating a random effect in<strong>to</strong> the<br />

model. This is done by treating the constant or intercept term in the model as<br />

a fixed constant plus a random component known as the random effect. The<br />

interpretation of this is that each small area is assigned an intercept term in the<br />

model which is allowed <strong>to</strong> vary, around some overall constant value, from one<br />

small area <strong>to</strong> another. This is usually sufficient <strong>to</strong> take account of between area<br />

variation, however it is possible <strong>to</strong> include a random effect in a parameter<br />

coefficient rather than the intercept term. Doing this further adds <strong>to</strong> the level<br />

of complexity and is not covered in this manual.<br />

For models fitted <strong>to</strong> small area level data, the inclusion of random effects may<br />

give a distinct advantage over the synthetic model approach, possibly leading<br />

<strong>to</strong> estimates with higher precision and robustness. In the case of linear models,<br />

random effects models can theoretically be shown to give small area estimates<br />

that reflect the best trade-off between the accuracy of the direct estimate and<br />

the uncertainty associated with the synthetic model. So for a small area that<br />

happens to have a low sampling error (e.g. because of a large sample size)<br />

relative <strong>to</strong> the <strong>to</strong>tal error (sum of sampling error of the direct estimate and<br />

synthetic model error), a random effects model will give more weight <strong>to</strong> the<br />

direct estimate for that small area. On the other hand, for a small area with<br />

high sampling error, more weight will be given <strong>to</strong> the model based estimate as<br />

this will be more reliable.<br />
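The weighting described above can be sketched numerically. The sketch assumes the standard area-level linear random effects form (often called the Fay-Herriot model), which this manual does not name, and it treats both variances as known. In practice the model variance is estimated from the data. All figures are hypothetical.<br />

```python
def composite_estimate(direct, synthetic, sampling_variance, model_variance):
    """Combine a direct and a synthetic estimate for one small area.

    The weight on the direct estimate is the share of total error variance
    that is due to the model rather than to sampling.
    """
    weight_on_direct = model_variance / (model_variance + sampling_variance)
    estimate = weight_on_direct * direct + (1.0 - weight_on_direct) * synthetic
    return estimate, weight_on_direct


model_variance = 0.0004  # hypothetical between-area variance left unexplained

# Area with a large sample: low sampling variance, so the direct estimate dominates.
large_estimate, large_weight = composite_estimate(0.30, 0.20, 0.0001, model_variance)

# Area with a small sample: high sampling variance, so the model estimate dominates.
small_estimate, small_weight = composite_estimate(0.30, 0.20, 0.0036, model_variance)

print(round(large_weight, 2), round(large_estimate, 3))
print(round(small_weight, 2), round(small_estimate, 3))
```

With these figures the large-sample area puts a weight of about 0.8 on its direct estimate, while the small-sample area puts a weight of about 0.1 on it.<br />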

While being more complex than synthetic models, random effects models can<br />

be estimated using a variety of statistical techniques. However due <strong>to</strong> their<br />

technical nature, this manual will not go in<strong>to</strong> any further detail about how <strong>to</strong><br />

apply random effects models. A more detailed treatment will be given in the<br />

forthcoming technical manual.<br />

4.2 The Modelling Framework<br />

Figure 4.1 presents a schematic representation of the small area modeling framework<br />

followed in this manual. Figure 4.2 complements Figure 4.1 by providing a list of key<br />

questions the purpose of which is <strong>to</strong> aid the decision making process of small area<br />

modeling in a reasonably systematic approach. The objective of these questions is <strong>to</strong><br />

help the modeller/analyst better understand the modeling framework (Figure 4.1) and<br />

hence be able <strong>to</strong> choose the most appropriate technique for a given set of data. This,<br />

however, does not mean these are the only questions that need <strong>to</strong> be raised in this kind<br />

of exercise.<br />

The left-hand-side of Figure 4.1 shows the simplest small area methods, these being the<br />

Direct and Broad Area Ratio estima<strong>to</strong>rs, which are frequently used in the absence of<br />

good quality auxiliary data. The answer <strong>to</strong> question 1 of Figure 4.2 is important as good<br />

quality auxiliary data is a key requisite in order <strong>to</strong> proceed <strong>to</strong> the regression-based small<br />

area estima<strong>to</strong>rs. We take good auxiliary data <strong>to</strong> mean area-level and/or unit-level data<br />

that are potentially correlated (both theoretically and empirically) with the variable of<br />

interest. Section 3.5 discusses some of the ways the quality of auxiliary data can be<br />

determined. The quality of the auxiliary data, therefore, has a large bearing on the<br />

reliability of model predictions for the variable of interest. In other words, when good<br />

quality auxiliary data is available one can choose among a number of regression-based<br />

estima<strong>to</strong>rs that “borrow strength” from the relationship between the variable of interest<br />

and the auxiliary data; thereby improving the quality of small area estimates/predictions.<br />

Figure 4.1: Small Area Modelling Framework<br />

[Figure: tree diagram of small area methods, running from less complex to more complex. Simple small area models (with no auxiliary data): Direct Estimator, Broad Area Ratio Estimator. Regression based models (with auxiliary data): Synthetic Regression Models and Random Effects Models, with area level or unit level analysis, linear models for continuous data or generalised linear models for count data (Poisson model) and binary data (logistic model), and univariate or multivariate analysis.]<br />

The classes of regression based estima<strong>to</strong>rs are shown in the right-hand-side of Figure<br />

4.1. These estima<strong>to</strong>rs can be classified in<strong>to</strong> two major categories, namely, the synthetic<br />

regression models and the random effects models which are relatively more complex<br />

than their synthetic counterparts. For the moment let us focus on the synthetic models.<br />

Once this choice is made the next choice is between a linear or generalised linear<br />

model. The Linear model, which is the simplest of all, is suitable if the variable of interest<br />

is continuous (e.g. income, age, etc.). If the variable of interest is not continuous<br />

(binary or count data) one can select appropriately from a wide range of Generalised<br />

Linear Models. The most common examples are the Logistic and Poisson models which<br />

are used <strong>to</strong> model binary and count data, respectively.<br />

Clearly, as indicated in questions 2 <strong>to</strong> 3 of Figure 4.2, the choice of any of these or other<br />

models depends on the following important interrelated fac<strong>to</strong>rs:<br />

i. the level at which the small area estimates are required. Are small area estimates required at area-level or for some other sub-population such as age by sex group?<br />
ii. the nature of the auxiliary data available related to the variable of interest. Again, this may include whether the data is at the unit-level (person-level), area-level or both.<br />
iii. the nature of the variable of interest, i.e., whether it is continuous, binary or count data.<br />
iv. users' quality requirements for small area estimates<br />
v. access to statistical expertise<br />

Small area models can be fitted either at area-level or person-level. Area level models are<br />

fitted when the variable of interest and associated covariates in the auxiliary data are<br />

observed at the level of the specific geographic area, which is referred to in Figure 4.1 as<br />

area-level analysis. On the other hand, a unit/person-level analysis refers to a<br />

unit/person-level model that makes use of individual/unit level data in the analysis. When<br />

a model is fitted using unit/person-level data then the predictions based on this model<br />

must be aggregated <strong>to</strong> produce area-level estimates. It is also possible <strong>to</strong> fit a unit/person<br />

level model involving both individual and area-level covariates.<br />
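The aggregation of unit-level predictions to area-level estimates can be sketched as below for a logistic model. The coefficients are made up for illustration rather than fitted, the covariates (age and a DSP receipt indicator) are hypothetical, and each record is assumed to stand for one person in the population of the area.<br />

```python
import math


def predicted_probability(intercept, coefficients, covariates):
    """Logistic model prediction for one person."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def area_proportions(people, intercept, coefficients):
    """Average the person-level predicted probabilities within each area."""
    totals = {}
    counts = {}
    for area, covariates in people:
        p = predicted_probability(intercept, coefficients, covariates)
        totals[area] = totals.get(area, 0.0) + p
        counts[area] = counts.get(area, 0) + 1
    return {area: totals[area] / counts[area] for area in totals}


# Hypothetical person records: (area, [age in years, receives DSP (1/0)]).
people = [
    ("LGA 1", [30, 0]),
    ("LGA 1", [70, 1]),
    ("LGA 2", [45, 0]),
    ("LGA 2", [50, 0]),
]

# Made-up coefficients, standing in for a fitted unit-level logistic model.
proportions = area_proportions(people, intercept=-2.0, coefficients=[0.03, 1.5])
print(proportions)
```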

Choosing the right model for the right type of data is crucial in the modelling process.<br />

For example, if the auxiliary information consists of data observed at area or unit level<br />

and the variable of interest is of a continuous nature, then it will be appropriate <strong>to</strong> use a<br />

linear model <strong>to</strong> estimate the variable of interest. Alternatively, if we have unit level data<br />

where the variable of interest is binary (e.g., 1= person has a disability and 0 = person<br />

has no disability) which is usually the case in many small area models, then we would go<br />

for a model that captures the binary nature of the observations, such as the logistic<br />

regression model. Similarly, if our data provides, say, area level count data of people<br />

with a disability then a suitable choice would be the Poisson model which is appropriate<br />

for count data models. It is also possible <strong>to</strong> use two or more models (e.g., unit-level and<br />

area-level models) provided that the dataset is amenable to such analyses. For instance,<br />

as we will see in the examples of Section 5, the logistic and Poisson models are used to<br />

predict person-level and area-level disability proportions, respectively.<br />

Figure 4.2: Key Questions for Small Area Modelling<br />

Q1. Do you have good quality auxiliary data?<br />
If No: Simple Direct or Broad Area Ratio Estimators are the likely candidates.<br />
If Yes: Use Linear or Generalised linear models, depending on your data.<br />

Q2. Is the variable of interest continuous, binary or count data?<br />
If Continuous data: Linear model<br />
If Binary data: Logistic model<br />
If Count data: Poisson model<br />

Q3. Shall I use an area-level or unit-level model or both?<br />

Q4. At what level is my auxiliary data available and of good quality?<br />
Good area level or unit level continuous data<br />
Good unit level binary data<br />
Good area level count data<br />

Q5. Are there likely to be major differences between small areas that are not taken into account by the auxiliary data?<br />
If Yes: Use random effects model. Consult methodology staff for technical advice.<br />
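Questions 1, 2 and 5 of Figure 4.2 can be written down as a small decision rule. The sketch below is only a memory aid: the function name, argument names and returned labels are hypothetical, and it ignores questions 3 and 4 as well as user quality requirements and access to statistical expertise.<br />

```python
def suggest_model(has_good_auxiliary_data, variable_type=None,
                  unexplained_area_differences=False):
    """Return a rough model suggestion following Q1, Q2 and Q5 of Figure 4.2."""
    # Q1: without good auxiliary data only the simple estimators are available.
    if not has_good_auxiliary_data:
        return "direct or broad area ratio estimator"

    # Q2: the type of the variable of interest determines the model family.
    families = {"continuous": "linear", "binary": "logistic", "count": "Poisson"}
    if variable_type not in families:
        raise ValueError("variable_type must be 'continuous', 'binary' or 'count'")

    # Q5: unexplained differences between small areas point to random effects.
    kind = "random effects" if unexplained_area_differences else "synthetic"
    return f"{kind} {families[variable_type]} model"


print(suggest_model(False))
print(suggest_model(True, "binary"))
print(suggest_model(True, "count", unexplained_area_differences=True))
```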

The next key question (as indicated by question 5 of Figure 4.2) is when and why we<br />

use the random effects models as compared <strong>to</strong> the synthetic models. To start with, the<br />

preceding discussion on the choice of models (linear versus generalised linear) also<br />

applies to the random effects models. However, the random effects models are<br />

different in that they include an additional error component <strong>to</strong> account for differences<br />

between units that aren’t explained by the auxiliary variables. In other words, synthetic<br />

models assume that the variable of interest can be determined from the same functional<br />

relationship with the auxiliary variables, and that this relationship applies across all small<br />

areas.<br />

This assumption, however, could be restrictive for a number of reasons. For example, in<br />

the disability data some small areas are located in remote areas with limited support<br />

facilities and services while others are in big cities with better infrastructure and services<br />

to which people with disability may move to take advantage of the improved<br />

services. Some areas may have a larger population of indigenous people relative to<br />

others which again may affect disability rates in different areas. Yet, others are located in<br />

coastal areas that attract people of retirement age and the elderly. These fac<strong>to</strong>rs are not<br />

fully accounted for in the auxiliary data. Thus, unless these and other fac<strong>to</strong>rs are taken<br />

in<strong>to</strong> account in the model, they could limit the predictive abilities of synthetic models<br />

for some small areas or units. Such differences, therefore, call for a more general/flexible<br />

specification of the models <strong>to</strong> capture the area-specific (person-specific) fac<strong>to</strong>rs after<br />

taking account of the auxiliary variables - and hence the random effects models. Thus,<br />

the choice between random effects models versus synthetic models could be made on<br />

the basis of one or more of the following fac<strong>to</strong>rs:<br />

i. prior knowledge of small areas or units vis-a-vis the auxiliary data, gained from<br />

experience or through discussions with subject matter specialists,<br />

users/stakeholders, etc. (for example, we may not have a lot of faith in our auxiliary<br />

variables or in the synthetic model);<br />

ii. statistical outcomes based on the models, that is, a close assessment or evaluation of<br />

the small area estimates/predictions from competing synthetic and random effects<br />

models to see whether they meet expectations;<br />

iii. statistical/econometric tests (a battery of diagnostic and statistical<br />

tests) of the adequacy of the models;<br />

iv. whether one wants small areas with large samples to be less affected by the model,<br />

because the direct estimates for such areas can be expected to be quite reliable in<br />

their own right. The random effects model allows for a suitable trade-off between the<br />

reliability of the direct estimates and the reliability of the model estimates;<br />

v. whether one wants to apply the model to areas with no sample in them (out of sample<br />

areas). Random effects models allow for greater flexibility in applying the model to<br />

make predictions for areas other than those to which it was fitted.<br />
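The trade-off between direct and model estimates mentioned in the list above can be illustrated with a minimal sketch. It assumes a basic area-level random effects model in which the final estimate for an area is a weighted average of the direct estimate and the synthetic (regression) prediction, with weight gamma = model variance / (model variance + sampling variance). This section of the manual does not state that formula; it is the standard textbook form and is used here only for illustration. The function names and all numbers are hypothetical.

```python
def shrinkage_weight(model_variance, sampling_variance):
    """Weight given to the direct estimate for one area.

    model_variance: variance of the area random effect (assumed known here).
    sampling_variance: sampling variance of the area's direct estimate.
    """
    return model_variance / (model_variance + sampling_variance)


def composite_estimate(direct, synthetic, model_variance, sampling_variance):
    """Weighted average of the direct estimate and the synthetic prediction."""
    gamma = shrinkage_weight(model_variance, sampling_variance)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical values for illustration only.
sigma_u2 = 4.0  # assumed variance of the area random effects
areas = {
    # A large sample gives a small sampling variance (psi).
    "area with large sample": {"direct": 12.0, "synthetic": 10.0, "psi": 1.0},
    # A small sample gives a large sampling variance (psi).
    "area with small sample": {"direct": 20.0, "synthetic": 10.0, "psi": 16.0},
}

for name, a in areas.items():
    gamma = shrinkage_weight(sigma_u2, a["psi"])
    estimate = composite_estimate(a["direct"], a["synthetic"], sigma_u2, a["psi"])
    print(f"{name}: weight on direct = {gamma:.2f}, estimate = {estimate:.2f}")
```

Under these assumptions the area with the large sample keeps most of its direct estimate, while the area with the small sample is pulled strongly towards the synthetic prediction. In practice the model variance is itself estimated from the data, which this sketch does not attempt.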

Clearly, random effects models, once chosen, require a higher level of<br />

statistical skill and some familiarity with specialised software. It is also true that more<br />

complex models may not necessarily provide better results. This is particularly true if<br />

sufficiently strong relationships from which to borrow strength are simply<br />

not present in the data. One should be aware that results from simple models may be as<br />

good as those from complex ones. In other words, as will be discussed later in this<br />

section, the gains in efficiency of estimates from using more complex models need to be<br />

assessed.<br />

An important aspect of the modeling process which may also have a significant bearing on<br />

the complexity and quality of the analysis is whether the variable of interest calls for a<br />

univariate or multivariate analytical framework, that is, whether the variable of interest<br />

takes a univariate or multivariate form. For example, in the<br />

disability study, if we simply want to predict whether a person has an<br />

impairment or not (i.e., 1 = person has a disability and 0 = person has no disability)<br />

regardless of the type of impairment, then this is within a univariate framework. On the<br />

other hand, a breakdown of the variable of interest by type of impairment (e.g., physical,<br />

mental, sensory etc.) would involve a multivariate framework. The real issue here is that<br />

while a univariate analysis is simpler to undertake, a multivariate analysis provides an<br />

opportunity to exploit additional information on the correlations that exist between the<br />

various types of impairment and hence improve the reliability of estimates.<br />

4.3 Trade-off between Quality, Cost, Time and Effort<br />

In this manual, the term ‘quality’ is used to indicate the overall level of accuracy,<br />

acceptability and reliability of small area estimates, both from a statistical point of view<br />

and in terms of providing a more informed and reliable decision making capability for<br />

users. More specifically, we borrow from the characterisation of quality as having six<br />

dimensions, these being: relevance, accuracy, timeliness, accessibility, interpretability<br />

and coherence (Allen, 2001). The ABS has a strategic policy of ensuring the quality of all<br />

its output and clearly demonstrating that quality to users (ABS, 2002), and this is<br />

particularly relevant and important for the production and release of small area<br />

estimates.<br />

A key aspect that needs to be taken into consideration is whether the gains in terms of<br />

quality of outputs from using more complex methods outweigh the time, costs and<br />

effort required to generate, interpret and validate the results. Regardless of the degree of<br />

sophistication contemplated at the outset, the small area practitioner is well advised to<br />

commence with simpler techniques (say, synthetic models). Should resources and user<br />

requirements permit, more rigorous statistical techniques may be applied in stages<br />

resulting in a choice of competing models (say, fitting both synthetic and random effects<br />

models to the data). Choosing the best model in light of expert knowledge and<br />

informed judgment would lead to improved results and decision making outcomes.<br />

Figure 4.3 below provides some indication of the trade-off between quality, cost, time<br />

and effort in small area modeling. It should be clear that these relationships are not<br />

linear in nature and one cannot authoritatively represent such relationships in a simple<br />

two-dimensional diagram like this. The purpose is, however, to provide a rough idea of<br />

the kind of relationships that may exist between quality and cost/time/effort. In Figure<br />

4.3, quality is represented by the vertical axis and could range from, say, low to high<br />

levels of quality. The horizontal axis represents cost/time/effort combined. The three<br />

terms (cost/time/effort) are presented in this way as they are interrelated and one is<br />

implicitly defined by the other. In simple terms, it is assumed that increased effort<br />

implies a longer time frame and presupposes more resources and higher costs.<br />

Figure 4.3: Trade-off between Quality and Cost / Time / Effort<br />

[Figure not reproduced. Vertical axis: Quality. Horizontal axis: Cost/time/effort. An upward-sloping curve runs from simple models to complex models. Labels on the diagram: good auxiliary data, level of precision, finer disaggregation. Issues shown above the curve: statistical expertise, robustness of results, understanding results, interpretability, validity of assumptions, user requirements, availability of resources, timeliness/deadlines.]<br />

As you can see from Figure 4.3, good quality auxiliary data is a crucial prerequisite for<br />

obtaining quality small area estimates. Quality is of course a relative term and depends<br />

very much upon the clients’ decision making requirements.<br />

Assuming that we have good quality auxiliary data, we would expect more sophisticated<br />

methods to provide results of a higher level of quality, as indicated by the upward slope<br />

of the cost-quality curve. The same curve also indicates that somewhere in the<br />

continuum there exists an optimal point (a level of precision) beyond which any additional<br />

effort/cost/time has either marginal or declining effects on quality.<br />

More elaborate techniques may give only marginal improvements in accuracy but<br />

decrease timeliness, an important dimension of quality. Overall quality may also be<br />

eroded when increasingly small areas or finer disaggregations of the data are<br />

demanded. For example, in the disability analysis, disaggregating disability by type of<br />

impairment, level of severity and age group, in addition to the small area level, leads to<br />

poor quality estimates, especially for the rarest impairment types such as sensory.<br />
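The effect of finer disaggregation on precision can be illustrated with a rough calculation. The sketch below assumes simple random sampling with no design effect and uses the usual formula for the relative standard error (RSE) of an estimated proportion, sqrt(p(1 - p)/n) / p. The prevalences, sample sizes and number of age groups are hypothetical and are not taken from the disability study.

```python
import math


def proportion_rse(p, n):
    """Relative standard error of an estimated proportion p based on a simple
    random sample of size n (no design effect, no finite population correction)."""
    return math.sqrt(p * (1.0 - p) / n) / p


# Hypothetical illustration values.
area_sample = 400        # sampled persons in one small area
n_age_groups = 5         # splitting by age group divides the sample per cell
any_disability = 0.20    # assumed prevalence of any disability
rare_impairment = 0.02   # assumed prevalence of a rare impairment type

scenarios = [
    ("any disability, by small area", any_disability, area_sample),
    ("any disability, by area and age group", any_disability, area_sample / n_age_groups),
    ("rare impairment, by small area", rare_impairment, area_sample),
    ("rare impairment, by area and age group", rare_impairment, area_sample / n_age_groups),
]

for label, p, n in scenarios:
    print(f"{label}: n = {n:.0f}, RSE = {proportion_rse(p, n):.0%}")
```

Each additional breakdown either reduces the sample size per cell or lowers the proportion being estimated, and both raise the RSE. Real surveys have complex designs, so actual standard errors would differ from these simple random sampling figures.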

There are also other issues, as shown just above the cost-curve in Figure 4.3, that may<br />

have a significant bearing on the quality and cost of small area estimates. For<br />

example, the use of more complex models may require a higher level of subject matter<br />

knowledge and expertise to assist in understanding and interpreting model results.<br />

Such knowledge is also important for testing the validity of assumptions inherent in the<br />

model and in checking the robustness and sensitivity of model results.<br />

Finally, there are important points that have to be made in relation to the quality versus<br />

cost issue discussed above. Firstly, simplicity is an important aspect of quality in<br />

that it aids the interpretability of small area output. We do not intend to imply from<br />

Figure 4.3 that simpler methods always imply poor quality estimates. More complex<br />

methods should only be attempted where there are likely to be demonstrable gains in<br />

the accuracy of small area estimates. Secondly, the use of sophisticated methods may<br />

not necessarily lead to higher costs. For instance, once a strong analytic capability to<br />

undertake small area estimation has been established (in terms of statistical skill and<br />

other resources) any increase in cost, effort and time for undertaking complex small area<br />

methods may be marginal.<br />
