Current Population Survey Design and Methodology - Census Bureau
either does not know the answer to a question or refuses
to provide the answer. Item nonresponse in the CPS is
modest (see Chapter 16, Table 16−4).
One of three imputation methods is used to compensate
for item nonresponse in the CPS. Before the edits are
applied, the daily data files are merged and the combined
file is sorted by state and PSU within state. This sort
ensures that allocated values come from geographically
related records; that is, missing values for records in Maryland
will not receive values from records in California. This
is an important distinction because many labor force and
industry and occupation characteristics are geographically
clustered.
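
The merge-and-sort step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with hypothetical `state`, `psu`, and `id` keys) and the function name `merge_and_sort` are invented here and are not taken from the CPS processing system.

```python
from operator import itemgetter

def merge_and_sort(daily_files):
    """Combine the daily record lists, then sort by state and PSU within state."""
    combined = [rec for daily in daily_files for rec in daily]
    # sorted() is stable, so records with the same (state, psu) keep file order.
    return sorted(combined, key=itemgetter("state", "psu"))

# Hypothetical daily files; the field names are assumptions for illustration.
day1 = [{"state": "MD", "psu": 2, "id": "a"}, {"state": "CA", "psu": 1, "id": "b"}]
day2 = [{"state": "MD", "psu": 1, "id": "c"}, {"state": "CA", "psu": 1, "id": "d"}]

ordered = merge_and_sort([day1, day2])
print([r["id"] for r in ordered])  # ['b', 'd', 'c', 'a']
```

After the sort, neighboring records on the file share a state and, where possible, a PSU, so a value carried forward from a nearby record comes from a geographically related one.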
The edits effectively blank all entries in inappropriate
questions (e.g., questions reached by following an incorrect
path) and ensure that all appropriate questions have valid entries.
For the most part, illogical or out-of-range entries
have been eliminated by the use of electronic instruments;
however, the edits still address these possibilities,
which may arise from data transmission problems and
occasional instrument malfunctions. The main purpose of
the edits, however, is to assign values to questions where
the response was “Don’t know” or “Refused.” This is
accomplished by using one of the three imputation techniques
described below.
The edits are run in a deliberate and logical sequence.
Demographic variables are edited first because several of
those variables are used to allocate missing values in the
other modules. The labor force module is edited next
since labor force status and related items are used to
impute missing values for industry and occupation codes,
and so forth.
The three imputation methods used by the CPS edits are
described below:
1. Relational imputation infers the missing value from
other characteristics on the person’s record or within
the household. For instance, if race is missing, it is
assigned based on the race of another household
member, or failing that, taken from the previous
record on the file. Similarly, if relationship data is
missing, it is assigned by looking at the age and sex
of the person in conjunction with the known relationship
of other household members. Missing occupation
codes are sometimes assigned by analyzing the industry
codes and vice versa. This technique is used as
appropriate across all edits. If missing values cannot
be assigned using this technique, they are assigned
using one of the two following methods.
2. Longitudinal edits are used in most of the labor force
edits, as appropriate. If a question is blank and the
individual is in the second or later month’s interview,
the edit procedure looks at last month’s data to determine
whether there was an entry for that item. If so,
last month’s entry is assigned; otherwise, the item is
assigned a value using the appropriate hot deck, as
described next.
3. The third imputation method is commonly referred to
as “hot deck” allocation. This method assigns a missing
value from a record with similar characteristics;
the stored values from such records make up the hot deck.
Hot decks are defined by variables
such as age, race, and sex. Other characteristics
used in hot decks vary depending on the nature of the
unanswered question. For instance, most labor force
questions use age, race, sex, and occasionally another
correlated labor force item such as full- or part-time
status. This means the number of cells in labor force
hot decks is relatively small, perhaps fewer than
100. On the other hand, the weekly earnings hot deck
is defined by age, race, sex, usual hours, occupation,
and educational attainment. This hot deck has several
thousand cells.
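
The fallback order among the three methods can be sketched as follows. This is a minimal sketch, not the CPS edit code: the function names, the `month_in_sample` field, and the `methods` argument (which stands in for "as appropriate" by letting the caller say which techniques apply to an item) are all assumptions made for illustration. The hot deck value is simply passed in as the last resort.

```python
MISSING = None

def relational(record, item, household):
    """Borrow the item from another household member who reported it."""
    for member in household:
        if member is not record and member.get(item) is not MISSING:
            return member[item]
    return MISSING

def longitudinal(record, item, last_month):
    """Carry forward last month's entry for a second-or-later interview."""
    if record.get("month_in_sample", 1) >= 2 and last_month is not None:
        return last_month.get(item, MISSING)
    return MISSING

def impute(record, item, household, last_month, hot_deck_value,
           methods=("relational", "longitudinal")):
    """Try the applicable methods in order; fall back to the hot deck value."""
    if "relational" in methods:
        value = relational(record, item, household)
        if value is not MISSING:
            return value, "relational"
    if "longitudinal" in methods:
        value = longitudinal(record, item, last_month)
        if value is not MISSING:
            return value, "longitudinal"
    return hot_deck_value, "hot deck"

# Missing race is taken from another household member.
person = {"race": None, "month_in_sample": 1}
spouse = {"race": "White", "month_in_sample": 1}
race_result = impute(person, "race", [person, spouse], None, "Black")

# A labor force item in a later interview is carried forward from last month.
worker = {"full_time": None, "month_in_sample": 3}
carried = impute(worker, "full_time", [worker], {"full_time": "FT"}, "PT",
                 methods=("longitudinal",))

# In a first-month interview there is no prior data, so the hot deck is used.
first_month = impute({"full_time": None, "month_in_sample": 1}, "full_time",
                     [], None, "PT", methods=("longitudinal",))
```

The sketch omits the secondary relational rules mentioned above, such as taking race from the previous record on the file or inferring relationship from age and sex.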
All CPS items that require imputation for missing values
have an associated hot deck. The initial values for the hot
decks are the ending values from the preceding month. As
a record passes through the editing procedures, it will
either donate a value to each hot deck in its path or
receive a value from the hot deck. For instance, in a hypothetical
case, the hot deck for question X is defined by the
characteristics Black/non-Black, male/female, and age
16−24/25+. Further assume a record has the values
White, male, and age 64. When this record reaches question
X, the edits determine whether it has a valid entry. If
so, that record’s value for question X replaces the value in
the hot deck cell reserved for non-Black, male, and age 25+.
Conversely, if the record was missing a value for item X,
it would be assigned the value in the hot deck cell designated
for non-Black, male, and age 25+.
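
The donate-or-receive pass for the hypothetical question X can be sketched as below. The record fields, the function names, the seed value `"P"`, and the choice to place anyone under age 25 in the younger cell are assumptions made for illustration; the real CPS hot decks are defined item by item.

```python
def cell_key(record):
    """Map a record to its hot deck cell: race group, sex, age group."""
    race = "Black" if record["race"] == "Black" else "non-Black"
    age = "16-24" if record["age"] < 25 else "25+"  # assumed cut point
    return (race, record["sex"], age)

def run_hot_deck(records, item, initial_deck):
    """Pass records through in file order, donating or receiving values."""
    deck = dict(initial_deck)  # seeded with the prior month's ending values
    allocated = []             # ids of records that received a value
    for rec in records:
        key = cell_key(rec)
        if rec.get(item) is not None:
            deck[key] = rec[item]   # valid entry: donate to the cell
        else:
            rec[item] = deck[key]   # missing entry: receive from the cell
            allocated.append(rec["id"])
    return deck, allocated

# The cell for non-Black, male, 25+ starts with a prior-month value "P".
initial = {("non-Black", "male", "25+"): "P"}
records = [
    {"id": 1, "race": "White", "sex": "male", "age": 30, "x": None},
    {"id": 2, "race": "White", "sex": "male", "age": 64, "x": "A"},
    {"id": 3, "race": "Asian", "sex": "male", "age": 40, "x": None},
]
deck, allocated = run_hot_deck(records, "x", initial)
print([r["x"] for r in records])  # ['P', 'A', 'A']
```

Record 1 arrives before any donor this month and so receives the prior-month value; record 2 donates "A", which record 3 then receives. Because the file is sorted by state and PSU, the most recent donor in a cell tends to be geographically close to the recipient.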
As stated above, the various edits are logically sequenced,
in accordance with the needs of subsequent edits. The
edits and codes, in order of sequence, are:
1. Household edits and codes. This processing step
performs edits and creates recodes for items pertaining
to the household. It classifies households as interviews
or noninterviews and edits items appropriately.
Hot deck allocations defined by geography and other
related variables are used in this edit.
2. Demographic edits and codes. This processing
step ensures consistency among all demographic variables
for all individuals within a household. It ensures
all interviewed households have one and only one reference
person and that entries stating marital status,
spouse, and parents are all consistent. It also creates
families based upon these characteristics. It uses longitudinal
editing, hot deck allocation defined by
related demographic characteristics, and relational
imputation.
9–2 Data Preparation Current Population Survey TP66
U.S. Bureau of Labor Statistics and U.S. Census Bureau