Current Population Survey Design and Methodology - Census Bureau
Hurwitz, and Bershad, 1961) and correlated response variance, one form of which is interviewer variance (a measure of the variability among responses obtained by different interviewers over repeated administrations). Similarly, when a particular design-estimator fails over repeated sampling to include a particular set of population units in the sampling frame or to ensure that all units provide the required data, bias can be viewed as having components such as coverage bias, unit nonresponse bias, or item nonresponse bias (Groves, 1989). For example, a survey administered solely by telephone could result in coverage bias for estimates relating to the total population if the nontelephone households were different from the telephone households with respect to the characteristic being measured (which almost always occurs).
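
The telephone example can be made concrete with a small sketch. It assumes the standard identity that the full-population mean is a share-weighted average of the covered and noncovered means, so the bias of the covered mean equals the noncovered share times the difference between the two group means. The function name and all figures below are hypothetical and are not CPS values.

```python
def coverage_bias(mean_covered: float, mean_not_covered: float,
                  share_not_covered: float) -> float:
    """Bias of the covered-group mean relative to the full-population mean.

    Follows from: full mean = (1 - W) * mean_covered + W * mean_not_covered,
    where W is the share of the population missing from the frame.
    """
    if not 0.0 <= share_not_covered <= 1.0:
        raise ValueError("share_not_covered must be between 0 and 1")
    return share_not_covered * (mean_covered - mean_not_covered)


# Hypothetical illustration: a rate of 5% among telephone households,
# 9% among nontelephone households, which make up 4% of the population.
bias = coverage_bias(mean_covered=0.05, mean_not_covered=0.09,
                     share_not_covered=0.04)
full_mean = 0.96 * 0.05 + 0.04 * 0.09
print(f"coverage bias: {bias:+.4f}, full-population mean: {full_mean:.4f}")
```

The bias vanishes only when the two groups have the same mean or when nothing is missing from the frame, which is why the text treats a difference between the groups as the condition for coverage bias.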
One common theme of these types of models is the decomposition of total mean squared error into two sets of components, one resulting from the fact that estimates are based on a sample of units rather than the entire population (sampling error) and the other due to alternative specifications of procedures for conducting the sample survey (nonsampling error). (Since nonsampling error is defined negatively, it ends up being a catch-all term for all errors other than sampling error, and can include issues such as individual behavior.) Conceptually, nonsampling error in the context of statistical science has both variance and bias components. However, when total mean squared error is decomposed mathematically to include a sampling error term and one or more other nonsampling error terms, it is often difficult to categorize such terms as either variance or bias. The term nonsampling error is used rather loosely in the survey literature to denote mean squared error, variance, or bias in the precise mathematical sense and to imply error in the more general sense of process mistakes (see next section).
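
As a reminder of the simplest form of this decomposition, the sketch below checks numerically that mean squared error equals variance plus squared bias. It treats a short list of hypothetical estimates as the complete set of equally likely outcomes of a design-estimator. It does not attempt the finer split into sampling and nonsampling terms discussed above, and the function name and numbers are illustrative assumptions.

```python
from statistics import fmean


def decompose_mse(estimates, true_value):
    """Return (mse, variance, bias) over a set of equally likely estimates."""
    mean_est = fmean(estimates)
    # Variance around the mean of the estimates (population form, divisor n).
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    # Bias: how far the average estimate sits from the true value.
    bias = mean_est - true_value
    # Mean squared error around the true value.
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return mse, variance, bias


# Hypothetical estimates from repeated implementations; the true value is 10.
mse, variance, bias = decompose_mse([9.0, 10.0, 11.0, 12.0], true_value=10.0)
print(mse, variance, bias)
```

In this toy case the variance is 1.25 and the bias is 0.5, so the squared bias contributes 0.25 to a mean squared error of 1.5.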
Some nonsampling error components that are conceptually known to exist have yet to be expressed in practical mathematical models. Two examples are the bias associated with the use of a particular set of interviewers and the variance associated with the selection of one of the numerous possible sets of questions. In addition, the estimation of many nonsampling errors, and of sampling bias, is extremely expensive and difficult or even impossible in practice. The estimation of bias, for example, requires knowledge of the truth, which may sometimes be verifiable from records (e.g., number of hours paid for by employer) but often is not verifiable (e.g., number of hours actually worked). As a consequence, survey organizations typically concentrate on estimating the one component of total mean squared error for which practical methods have been developed: variance.
It is frequently possible to construct an unbiased estimator of variance. In the case of complex surveys like the CPS, estimators have been developed that typically rely on the proposition, usually well grounded, that the variability among estimates based on various subsamples of the one actual sample is a good proxy for the variability among all the possible samples like the one at hand. In the case of the CPS, 160 subsamples or replicates are used in variance estimation for the 2000 design. (For more specifics, see Chapter 14.) It is important to note that the estimates of variance resulting from the use of this and similar methods are not merely estimates of sampling variance. The variance estimates include the effects of some nonsampling errors, such as response variance and intra-interviewer correlation. On the other hand, users should be aware that for some statistics these estimates of standard error might be statistically significant underestimates of total error, an important consideration when making inferences based on survey data.
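
The replicate idea can be sketched generically as follows. This is a minimal illustration of a replicate-based variance estimator, not the CPS formula: the scaling constant depends on the replication method (Chapter 14 gives the CPS specifics), so it is left as a parameter here. The function name, the `scale` parameter, and the numbers are assumptions made for the example.

```python
import math


def replicate_variance(full_estimate, replicate_estimates, scale=1.0):
    """Generic replicate variance: (scale / R) * sum((theta_r - theta_0) ** 2).

    `scale` is method-specific; 1.0 is a placeholder, not the CPS value.
    """
    r = len(replicate_estimates)
    if r == 0:
        raise ValueError("at least one replicate estimate is required")
    squared_diffs = [(rep - full_estimate) ** 2 for rep in replicate_estimates]
    return scale / r * sum(squared_diffs)


# Hypothetical full-sample estimate and four replicate estimates
# (the CPS 2000 design uses 160 replicates).
var_hat = replicate_variance(100.0, [98.0, 101.0, 103.0, 99.0])
se_hat = math.sqrt(var_hat)
print(f"variance estimate: {var_hat:.2f}, standard error: {se_hat:.3f}")
```

Because each replicate estimate is recomputed from the responses actually collected, variability that enters through those responses is carried into the result, which is consistent with the point above that such estimates are not purely estimates of sampling variance.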
To draw conclusions from survey data, samplers rely on the theory of finite population sampling from a repeated sampling perspective: If the specified sample design-estimator methodology were implemented repeatedly and the sample size were sufficiently large, the probability distribution of the estimates would be very close to a normal distribution. Thus, one could safely expect 90 percent of the estimates to be within two standard errors of the mean of all possible sample estimates, where the standard error is the square root of the estimate of variance (Gonzalez et al., 1975; Moore, 1997). The two-standard-error figure is conservative: under an exactly normal distribution, about 90 percent of estimates fall within 1.645 standard errors of the mean and about 95 percent within two. However, one cannot claim that the probability is .90 that the true population value falls in a particular interval. In the case of a biased estimator due to nonresponse, undercoverage, or other types of nonsampling error, confidence intervals may not cover the population parameter at the desired 90-percent rate. In such cases, a standard error estimator may indirectly account for some elements of nonsampling error in addition to sampling error and lead to confidence intervals having greater than the nominal 90-percent coverage. On the other hand, if the bias is substantial, confidence intervals can have less than the desired coverage.
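
The effect of bias on coverage can be illustrated under simplifying assumptions: the estimator is exactly normally distributed, its standard error is known, and the bias is expressed as a multiple of that standard error. Under those assumptions the interval, estimate plus or minus z standard errors, covers the true value with probability Φ(z − b) − Φ(−z − b), where b is the bias-to-standard-error ratio. The function name is hypothetical, and the sketch does not model the case where the standard error estimator itself absorbs nonsampling error.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def interval_coverage(bias_over_se: float, level: float = 0.90) -> float:
    """Actual coverage of a nominal `level` interval when the estimator is
    normal with a known standard error and a bias of `bias_over_se` SEs."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.645 for level 0.90
    return STD_NORMAL.cdf(z - bias_over_se) - STD_NORMAL.cdf(-z - bias_over_se)


for ratio in (0.0, 0.5, 1.0, 2.0):
    print(f"bias = {ratio:.1f} SE -> coverage {interval_coverage(ratio):.3f}")

# Share of a normal distribution lying within two standard errors of its mean.
within_two_se = STD_NORMAL.cdf(2.0) - STD_NORMAL.cdf(-2.0)
```

With no bias the coverage equals the nominal level, and it falls steadily as the bias grows relative to the standard error.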
QUALITY MEASURES IN STATISTICAL PROCESS MONITORING
The process of conducting a survey includes numerous steps or components, such as defining concepts, translating concepts into questions, selecting a sample of units from what may be an imperfect list of population units, hiring and training interviewers to ask people in the sample unit the questions, coding responses into predefined categories, and creating estimates that take into account the fact that not everyone in the population of interest had a chance to be in the sample and not all of those in the sample elected to provide responses. At each step of this process, there is a possibility of making a mistake in process specification and of deviating during implementation from the predefined specifications.
13–2 Overview of Data Quality Concepts Current Population Survey TP66
U.S. Bureau of Labor Statistics and U.S. Census Bureau