Current Population Survey Design and Methodology - Census Bureau


Hurwitz, and Bershad, 1961) and correlated response variance, one form of which is interviewer variance (a measure of the variability among responses obtained by different interviewers over repeated administrations). Similarly, when a particular design-estimator fails over repeated sampling to include a particular set of population units in the sampling frame or to ensure that all units provide the required data, bias can be viewed as having components such as coverage bias, unit nonresponse bias, or item nonresponse bias (Groves, 1989). For example, a survey administered solely by telephone could result in coverage bias for estimates relating to the total population if the nontelephone households were different from the telephone households with respect to the characteristic being measured (which almost always occurs).
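
The telephone example above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `coverage_bias` and all of the numbers (a 5 percent nontelephone share and unemployment rates of 4 and 10 percent) are hypothetical assumptions, not CPS figures. It shows that an estimate based only on covered units differs from the full-population value by the uncovered share times the difference between the two group means.

```python
def coverage_bias(mean_covered, mean_not_covered, share_not_covered):
    """Bias of a covered-only mean relative to the full-population mean.

    All inputs are hypothetical illustration values, not survey results.
    """
    # Full-population mean is the share-weighted mix of the two groups.
    population_mean = ((1 - share_not_covered) * mean_covered
                       + share_not_covered * mean_not_covered)
    # A telephone-only survey can, at best, estimate mean_covered.
    return mean_covered - population_mean


# Hypothetical: 5% of households lack a telephone; the characteristic
# (say, an unemployment rate) is 4% among telephone households and
# 10% among nontelephone households.
bias = coverage_bias(mean_covered=0.04, mean_not_covered=0.10,
                     share_not_covered=0.05)
print(f"coverage bias: {bias:+.4f}")
```

The bias vanishes only when the two groups have the same mean or when no units are left uncovered, which matches the condition stated in the text.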

One common theme of these types of models is the decomposition of total mean squared error into two sets of components, one resulting from the fact that estimates are based on a sample of units rather than the entire population (sampling error) and the other due to alternative specifications of procedures for conducting the sample survey (nonsampling error). (Since nonsampling error is defined negatively, it ends up being a catch-all term for all errors other than sampling error, and can include issues such as individual behavior.) Conceptually, nonsampling error in the context of statistical science has both variance and bias components. However, when total mean squared error is decomposed mathematically to include a sampling error term and one or more other nonsampling error terms, it is often difficult to categorize such terms as either variance or bias. The term nonsampling error is used rather loosely in the survey literature to denote mean squared error, variance, or bias in the precise mathematical sense and to imply error in the more general sense of process mistakes (see next section).
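
For reference, the standard textbook identity behind the phrase "total mean squared error" can be written as follows. The notation is an assumption introduced here, not taken from the source: \hat{\theta} denotes the survey estimator and \theta the true population value, with expectations taken over repeated implementations of the survey.

```latex
\[
\mathrm{MSE}(\hat{\theta})
  = E\!\left[(\hat{\theta} - \theta)^{2}\right]
  = \mathrm{Var}(\hat{\theta}) + \left[\mathrm{Bias}(\hat{\theta})\right]^{2},
\qquad
\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta .
\]
```

This variance-plus-squared-bias split is a different partition from the sampling versus nonsampling split described above. As the text notes, nonsampling error can contribute to both terms.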

Some nonsampling error components that are conceptually known to exist have yet to be expressed in practical mathematical models. Two examples are the bias associated with the use of a particular set of interviewers and the variance associated with the selection of one of the numerous possible sets of questions. In addition, the estimation of many nonsampling errors, and of sampling bias, is extremely expensive and difficult or even impossible in practice. The estimation of bias, for example, requires knowledge of the truth, which may sometimes be verifiable from records (e.g., number of hours paid for by employer) but often is not verifiable (e.g., number of hours actually worked). As a consequence, survey organizations typically concentrate on estimating the one component of total mean squared error for which practical methods have been developed: variance.

It is frequently possible to construct an unbiased estimator of variance. In the case of complex surveys like the CPS, estimators have been developed that typically rely on the proposition (usually well-grounded) that the variability among estimates based on various subsamples of the one actual sample is a good proxy for the variability among all the possible samples like the one at hand. In the case of the CPS, 160 subsamples or replicates are used in variance estimation for the 2000 design. (For more specifics, see Chapter 14.) It is important to note that the estimates of variance resulting from the use of this and similar methods are not merely estimates of sampling variance. The variance estimates include the effects of some nonsampling errors, such as response variance and intra-interviewer correlation. On the other hand, users should be aware that for some statistics these estimates of standard error might be statistically significant underestimates of total error, an important consideration when making inferences based on survey data.
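
The idea that variability among subsample estimates can stand in for variability among possible samples can be sketched with the simple random-groups method. This is not the CPS procedure, which is described in Chapter 14 and not reproduced here. The function `random_groups_variance` and the simulated data are hypothetical, and the sketch assumes a simple random sample whose size is divisible by the number of groups. Only the group count of 160 echoes the text.

```python
import random
import statistics


def random_groups_variance(values, k, seed=0):
    """Estimate a mean and its variance from k random subsamples.

    Generic random-groups sketch (an assumption for illustration,
    not the CPS replication method).
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Deal the shuffled sample into k groups of (near) equal size.
    groups = [shuffled[i::k] for i in range(k)]
    group_means = [statistics.fmean(g) for g in groups]
    overall = statistics.fmean(group_means)
    # Spread of the group estimates around their mean, scaled to
    # give the variance of the overall estimate.
    variance = sum((m - overall) ** 2 for m in group_means) / (k * (k - 1))
    return overall, variance


# Simulated sample of 1,600 values split into 160 groups of 10.
rng = random.Random(1)
sample = [rng.gauss(0, 1) for _ in range(1600)]
estimate, variance = random_groups_variance(sample, k=160)
print(f"estimate={estimate:.4f}, standard error={variance ** 0.5:.4f}")
```

For this simulated sample the true standard error of the mean is 1/40 = 0.025, so the printed value should be of that order. Any response variability present in the data would also be reflected in the spread among groups, consistent with the point that such estimates are not purely sampling variance.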

To draw conclusions from survey data, samplers rely on the theory of finite population sampling from a repeated sampling perspective: If the specified sample design-estimator methodology were implemented repeatedly and the sample size were sufficiently large, the probability distribution of the estimates would be very close to a normal distribution. Thus, one could safely expect 90 percent of the estimates to be within two standard errors of the mean of all possible sample estimates (standard error is the square root of the estimate of variance) (Gonzalez et al., 1975; Moore, 1997). However, one cannot claim that the probability is .90 that the true population value falls in a particular interval. In the case of a biased estimator due to nonresponse, undercoverage, or other types of nonsampling error, confidence intervals may not cover the population parameter at the desired 90-percent rate. In such cases, a standard error estimator may indirectly account for some elements of nonsampling error in addition to sampling error and lead to confidence intervals having greater than the nominal 90-percent coverage. On the other hand, if the bias is substantial, confidence intervals can have less than the desired coverage.
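
The coverage statements above can be checked numerically under the normal approximation. The sketch below is an illustration with hypothetical inputs; the function `interval_coverage` and its parameters are assumptions introduced here. It computes how often an interval of the form estimate plus or minus z reported standard errors would contain the true value when the estimator has a given bias and the reported standard error differs from the true one. Under normality, two standard errors correspond to roughly 95 percent, so the 90 percent figure in the text is on the safe side; a multiplier of about 1.645 gives 90 percent.

```python
from statistics import NormalDist


def interval_coverage(z, bias_in_se=0.0, se_ratio=1.0):
    """Probability that estimate +/- z * reported SE covers the truth.

    Assumes a normally distributed estimator. `bias_in_se` is the bias
    in units of the true standard error; `se_ratio` is the reported SE
    divided by the true SE. Both are hypothetical inputs.
    """
    phi = NormalDist().cdf
    half_width = z * se_ratio  # interval half-width in true-SE units
    return phi(half_width - bias_in_se) - phi(-half_width - bias_in_se)


print(f"z=1.645, unbiased:        {interval_coverage(1.645):.3f}")
print(f"z=2.000, unbiased:        {interval_coverage(2.0):.3f}")
print(f"z=1.645, bias of 1 SE:    {interval_coverage(1.645, bias_in_se=1.0):.3f}")
print(f"z=1.645, SE overstated:   {interval_coverage(1.645, se_ratio=1.2):.3f}")
```

The last two lines mirror the two cases in the text: a substantial bias pulls coverage below the nominal rate, while a standard error that also absorbs some nonsampling error (and so overstates pure sampling variability) pushes coverage above it.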

QUALITY MEASURES IN STATISTICAL PROCESS MONITORING

The process of conducting a survey includes numerous steps or components, such as defining concepts, translating concepts into questions, selecting a sample of units from what may be an imperfect list of population units, hiring and training interviewers to ask people in the sample unit the questions, coding responses into predefined categories, and creating estimates that take into account the fact that not everyone in the population of interest had a chance to be in the sample and not all of those in the sample elected to provide responses. At each step of this process, the possibility exists of making a mistake in process specification and of deviating during implementation from the predefined specifications.

13–2 Overview of Data Quality Concepts Current Population Survey TP66

U.S. Bureau of Labor Statistics and U.S. Census Bureau
