Causal Inference - South Africa Government Online
REPUBLIC OF SOUTH AFRICA<br />
GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR<br />
Session I<br />
Causal Inference<br />
Sebastian Martinez<br />
June 2006<br />
Slides by Sebastian Galiani, Paul Gertler and Sebastian Martinez<br />
ORGANIZED BY THE WORLD BANK AFRICA IMPACT EVALUATION INITIATIVE<br />
IN COLLABORATION WITH HUMAN DEVELOPMENT NETWORK<br />
AND WORLD BANK INSTITUTE
Motivation<br />
• Objective in evaluation is to estimate<br />
the CAUSAL effect of intervention<br />
(treatment) t on outcome Y<br />
– What is the effect of a cash transfer on<br />
household consumption?<br />
• For causal inference we must<br />
understand the data generation<br />
process<br />
– For impact evaluation, this means<br />
understanding the behavioral process<br />
that generates the data<br />
• how benefits are assigned
Technical Group<br />
• Causal Inference<br />
• Experimental design/randomization<br />
• Quasi-experiments<br />
– Regression Discontinuity<br />
– Double differences (Diff in diff)<br />
– Matching<br />
– Instrumental Variables<br />
• Sampling and Data
Causal Analysis<br />
• The aim of standard statistical analysis, typified by<br />
likelihood and other estimation techniques, is to<br />
infer parameters of a distribution from samples<br />
drawn from that distribution.<br />
• With the help of such parameters, one can:<br />
1. Infer association among variables,<br />
2. Estimate the likelihood of past and future events,<br />
3. Update the likelihood of events in light of new<br />
evidence or new measurements.
Causal Analysis<br />
• These tasks are managed well by standard<br />
statistical analysis as long as experimental<br />
conditions remain the same.<br />
• Causal analysis goes one step further:<br />
– Its aim is to infer aspects of the data generation<br />
process.<br />
– With the help of such aspects, one can deduce<br />
not only the likelihood of events under static<br />
conditions, but also the dynamics of events<br />
under changing conditions.
Causal Analysis<br />
• This capability includes:<br />
1. Predicting the effects of interventions<br />
2. Predicting the effects of spontaneous changes<br />
3. Identifying causes of reported events<br />
• This distinction implies that causal and<br />
associational concepts do not mix.
Causal Analysis<br />
• The word "cause" is not in the vocabulary of standard<br />
probability theory.<br />
• All probability theory allows us to say is that two<br />
events are mutually correlated, or dependent –<br />
meaning that if we find one, we can expect to<br />
encounter the other.<br />
• Scientists seeking causal explanations for<br />
complex phenomena or rationales for policy<br />
decisions must therefore supplement the language<br />
of probability with a vocabulary for causality.
Causal Analysis<br />
• Two languages for causality have<br />
been proposed:<br />
1. Structural equation modeling (SEM)<br />
(Haavelmo 1943).<br />
2. The Neyman-Rubin potential outcome<br />
model (RCM) (Neyman, 1923; Rubin,<br />
1974).
The Rubin Causal Model<br />
• Define the population by U. Each unit in U<br />
is denoted by u.<br />
• For each u ∈ U, there is associated a value<br />
Y(u) of the variable of interest Y, which we<br />
call: the response variable.<br />
• Let A be a second variable defined on U.<br />
We call A an attribute of the units in U.
The Rubin Causal Model<br />
• The key notion is the potential for<br />
exposing or not exposing each unit to the<br />
action of a cause:<br />
• Each unit has to be potentially exposable<br />
to any one of the causes.<br />
• Thus, Rubin takes the position that causes<br />
are only those things that could be<br />
treatments in hypothetical experiments.<br />
• An attribute cannot be a cause in an<br />
experiment, because the notion of potential<br />
exposability does not apply to it.
The Rubin Causal Model<br />
• For simplicity, we assume that there are just<br />
two causes or levels of treatment.<br />
• Let D be a variable that indicates the cause<br />
to which each unit in U is exposed:<br />
$D = \begin{cases} t & \text{if unit } u \text{ is exposed to treatment} \\ c & \text{if unit } u \text{ is exposed to control} \end{cases}$<br />
In a controlled study, D is constructed by the<br />
experimenter. In an uncontrolled study, it is<br />
determined by factors beyond the<br />
experimenter’s control.
The Rubin Causal Model<br />
• The values of Y are potentially affected by<br />
the particular cause, t or c, to which the<br />
unit is exposed.<br />
• Thus, we need two response variables:<br />
$Y_t(u)$, $Y_c(u)$<br />
• $Y_t$ is the value of the response that would<br />
be observed if the unit were exposed to t<br />
and<br />
• $Y_c$ is the value that would be observed on<br />
the same unit if it were exposed to c.
The Rubin Causal Model<br />
• Let D also be expressed as a binary<br />
variable:<br />
D = 1 if the unit is exposed to t and D = 0 if it is exposed to c<br />
• Then, writing $Y_1 = Y_t$ and $Y_0 = Y_c$, the outcome of each individual can<br />
be written as:<br />
$Y(u) = D\,Y_1(u) + (1 - D)\,Y_0(u)$
The Rubin Causal Model<br />
• Definition: For every unit u, treatment {$D_u = 1$ instead of $D_u = 0$}<br />
causes the effect<br />
$\delta_u = Y_1(u) - Y_0(u)$<br />
• This definition of a causal effect assumes that the treatment<br />
status of one individual does not affect the potential outcomes<br />
of other individuals.<br />
• Fundamental Problem of Causal Inference: It is impossible<br />
to observe the values of $Y_1(u)$ and $Y_0(u)$ on the same unit and,<br />
therefore, it is impossible to observe the effect of t on u.<br />
• Another way to express this problem is to say that we cannot<br />
infer the effect of treatment because we do not have the<br />
counterfactual evidence, i.e., what would have happened in the<br />
absence of treatment.
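The definitions on this slide can be made concrete with a minimal Python sketch. The `Unit` class, the unit names and all numeric values are hypothetical, chosen only for illustration. In simulated data both potential outcomes are stored, so the unit-level effect can be computed; an analyst working with real data would only ever see the observed outcome.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    y1: float  # potential outcome under treatment, Y_1(u)
    y0: float  # potential outcome under control, Y_0(u)
    d: int     # 1 if exposed to treatment t, 0 if exposed to control c

    def observed(self) -> float:
        # Y(u) = D * Y_1(u) + (1 - D) * Y_0(u)
        return self.d * self.y1 + (1 - self.d) * self.y0

    def effect(self) -> float:
        # delta_u = Y_1(u) - Y_0(u); computable only because the data are simulated
        return self.y1 - self.y0


# Hypothetical values, for illustration only.
population = [Unit("a", 16, 10, 1), Unit("b", 9, 7, 0)]

for unit in population:
    # Real data reveal only (name, d, observed outcome), never both potential outcomes.
    print(unit.name, unit.d, unit.observed())
```

For unit "a" the control outcome is the missing counterfactual; for unit "b" it is the treated outcome.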
The Rubin Causal Model<br />
• Given that the causal effect for a single unit u<br />
cannot be observed, we aim to identify the<br />
average causal effect for the entire population or<br />
for sub-populations.<br />
• The average treatment effect ATE of t (relative to<br />
c) over U (or any sub-population) is given by:<br />
$\mathrm{ATE} = E[Y_1(u) - Y_0(u)] = E[Y_1(u)] - E[Y_0(u)] = \delta = \bar{Y}_1 - \bar{Y}_0 \quad (1)$
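As a small check of equation (1), the sketch below computes the ATE two ways on a hypothetical six-unit population: as the mean of the unit-level effects and as the difference of the mean potential outcomes. The numbers are invented for illustration, and both potential outcomes are known only because the data are simulated.

```python
from statistics import mean

# Hypothetical (Y_1, Y_0) pairs for six units; values are illustrative only.
potential_outcomes = [(16, 10), (18, 12), (14, 8), (9, 7), (8, 6), (7, 5)]

# E[Y_1 - Y_0]: average of the unit-level effects.
unit_effects = [y1 - y0 for y1, y0 in potential_outcomes]
ate_from_effects = mean(unit_effects)

# E[Y_1] - E[Y_0]: difference of the two average potential outcomes.
mean_y1 = mean(y1 for y1, _ in potential_outcomes)
mean_y0 = mean(y0 for _, y0 in potential_outcomes)
ate_from_means = mean_y1 - mean_y0

print(ate_from_effects, ate_from_means)
```

The two quantities agree because the expectation is linear.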
The Rubin Causal Model<br />
• The statistical solution replaces the impossible-to-observe<br />
causal effect of t on a specific unit with<br />
the possible-to-estimate average causal effect of t<br />
over a population of units.<br />
• Although $E(Y_1)$ and $E(Y_0)$ cannot both be<br />
calculated, they can be estimated.<br />
• Most econometric methods attempt to construct<br />
from observational data consistent estimates of<br />
$\bar{Y}_1$ and $\bar{Y}_0$.
The Rubin Causal Model<br />
• Consider the following simple estimator of<br />
ATE:<br />
$\hat{\delta} = [\hat{Y}_1 \mid D = 1] - [\hat{Y}_0 \mid D = 0] \quad (2)$<br />
• Note that equation (1) is defined for the<br />
whole population, whereas equation (2)<br />
represents an estimator to be evaluated on a<br />
sample drawn from that population.
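Estimator (2) uses only observed data: the mean outcome among treated units minus the mean outcome among control units. A minimal sketch, assuming the sample is given as a list of `(d, y)` pairs; the function name `naive_estimate` and the sample values are hypothetical.

```python
from statistics import mean


def naive_estimate(sample):
    """Estimator (2): mean observed outcome of the treated minus that of the controls.

    `sample` is a list of (d, y) pairs, with d in {0, 1} and y the observed outcome.
    """
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return mean(treated) - mean(control)


# Hypothetical observed sample, for illustration only.
sample = [(1, 16), (1, 18), (1, 14), (0, 7), (0, 6), (0, 5)]
estimate = naive_estimate(sample)
print(estimate)
```

Whether this number is close to the ATE depends on how units came to be treated, which is the subject of the following slides.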
• Let π equal the proportion of the population<br />
that would be assigned to the treatment<br />
group.<br />
• Decomposing ATE, we have:<br />
$\delta = \pi\,\delta_{\{D=1\}} + (1 - \pi)\,\delta_{\{D=0\}}$<br />
$\delta = \pi\,[(Y_1 - Y_0) \mid D = 1] + (1 - \pi)\,[(Y_1 - Y_0) \mid D = 0]$<br />
$\delta = \big[\pi\,[Y_1 \mid D = 1] + (1 - \pi)\,[Y_1 \mid D = 0]\big] - \big[\pi\,[Y_0 \mid D = 1] + (1 - \pi)\,[Y_0 \mid D = 0]\big] = \bar{Y}_1 - \bar{Y}_0$
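The decomposition can be checked numerically. The sketch below uses a hypothetical population of `(Y_1, Y_0, D)` triples, invented for illustration, and confirms that the population ATE equals the π-weighted average of the average effects in the two groups.

```python
from statistics import mean

# Hypothetical (Y_1, Y_0, D) triples; both potential outcomes are known only in simulation.
population = [(16, 10, 1), (18, 12, 1), (14, 8, 1), (9, 7, 0), (8, 6, 0), (7, 5, 0)]

pi = mean(d for _, _, d in population)  # share assigned to treatment
delta_treated = mean(y1 - y0 for y1, y0, d in population if d == 1)
delta_control = mean(y1 - y0 for y1, y0, d in population if d == 0)
delta = mean(y1 - y0 for y1, y0, _ in population)  # population ATE

# delta = pi * delta_{D=1} + (1 - pi) * delta_{D=0}
weighted = pi * delta_treated + (1 - pi) * delta_control
print(delta, weighted)
```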
• If we assume that<br />
$[Y_1 \mid D = 1] = [Y_1 \mid D = 0]$ and $[Y_0 \mid D = 1] = [Y_0 \mid D = 0]$<br />
then<br />
$\delta = \big[\pi\,[Y_1 \mid D = 1] + (1 - \pi)\,[Y_1 \mid D = 1]\big] - \big[\pi\,[Y_0 \mid D = 0] + (1 - \pi)\,[Y_0 \mid D = 0]\big]$<br />
$\delta = [Y_1 \mid D = 1] - [Y_0 \mid D = 0]$<br />
which is consistently estimated by its sample<br />
analog estimator:<br />
$\hat{\delta} = [\hat{Y}_1 \mid D = 1] - [\hat{Y}_0 \mid D = 0]$
• Thus, a sufficient condition for the standard<br />
estimator to consistently estimate the true ATE is<br />
that:<br />
$[Y_1 \mid D = 1] = [Y_1 \mid D = 0]$ and $[Y_0 \mid D = 1] = [Y_0 \mid D = 0]$<br />
In this situation, the average outcome under the<br />
treatment and the average outcome under the control<br />
do not differ between the treatment and control groups.<br />
In order to satisfy these conditions, it is sufficient that<br />
treatment assignment D be uncorrelated with the<br />
potential outcome distributions of $Y_1$ and $Y_0$.<br />
The principal way to achieve this uncorrelatedness is<br />
through random assignment of treatment.
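The role of random assignment can be illustrated with a small simulation. The data-generating process (normal potential outcomes with an average effect of 2), the seed and the sample size are assumptions made for this sketch only. Treatment is assigned by a coin flip that ignores the potential outcomes, and the difference in observed means lands close to the true ATE.

```python
import random
from statistics import fmean

random.seed(0)
N = 100_000

# Hypothetical data-generating process: heterogeneous effects with mean 2.
y0 = [random.gauss(10, 1) for _ in range(N)]
y1 = [base + random.gauss(2, 1) for base in y0]
true_ate = fmean(a - b for a, b in zip(y1, y0))

# Random assignment: D is drawn independently of (Y_1, Y_0).
d = [random.random() < 0.5 for _ in range(N)]

# Only Y_1 is observed for the treated and only Y_0 for the controls.
treated_mean = fmean(a for a, flag in zip(y1, d) if flag)
control_mean = fmean(b for b, flag in zip(y0, d) if not flag)
estimate = treated_mean - control_mean

print(round(true_ate, 3), round(estimate, 3))
```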
• In most circumstances, there is simply no<br />
information available on how those in the<br />
control group would have reacted if they had<br />
received the treatment instead.<br />
• This is the basis for an important insight into<br />
the potential biases of the standard<br />
estimator (2).<br />
• After a bit of algebra, it can be shown that:<br />
$\hat{\delta} = \delta + \underbrace{\big([Y_0 \mid D = 1] - [Y_0 \mid D = 0]\big)}_{\text{Baseline Difference}} + \underbrace{(1 - \pi)\big(\delta_{\{D=1\}} - \delta_{\{D=0\}}\big)}_{\text{Treatment Heterogeneity}}$
• This equation specifies the two sources of<br />
bias that need to be eliminated from<br />
estimates of causal effects from<br />
observational studies.<br />
1. Selection Bias: Baseline difference.<br />
2. Treatment Heterogeneity.<br />
• Most of the methods available only deal with<br />
selection bias, either by simply assuming that the<br />
treatment effect is constant in the population<br />
or by redefining the parameter of interest in<br />
the population.
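The two bias terms can be computed directly on simulated data, where both potential outcomes are known. In the hypothetical population below (values invented for illustration), treated units have both a higher baseline outcome and a larger treatment effect, so the standard estimator overstates the ATE through both channels.

```python
from statistics import mean

# Hypothetical (Y_1, Y_0, D) triples with selection on baseline and on gains.
population = [(16, 10, 1), (18, 12, 1), (14, 8, 1), (9, 7, 0), (8, 6, 0), (7, 5, 0)]
treated = [(y1, y0) for y1, y0, d in population if d == 1]
control = [(y1, y0) for y1, y0, d in population if d == 0]

pi = len(treated) / len(population)
delta = mean(y1 - y0 for y1, y0, _ in population)  # true ATE

# Standard estimator: observed treated mean minus observed control mean.
naive = mean(y1 for y1, _ in treated) - mean(y0 for _, y0 in control)

# Baseline difference: gap in Y_0 between the two groups.
baseline_difference = mean(y0 for _, y0 in treated) - mean(y0 for _, y0 in control)

# Treatment heterogeneity: (1 - pi) times the gap in average effects.
heterogeneity = (1 - pi) * (
    mean(y1 - y0 for y1, y0 in treated) - mean(y1 - y0 for y1, y0 in control)
)

print(naive, delta, baseline_difference, heterogeneity)
```

In this example the estimator returns 10 while the ATE is 4; the gap of 6 splits into a baseline difference of 4 and a heterogeneity term of 2.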
Treatment on the Treated<br />
• ATE is not always the parameter of<br />
interest.<br />
• In a variety of policy contexts, it is the<br />
average treatment effect for the treated<br />
that is of substantive interest:<br />
$\mathrm{TOT} = E[Y_1(u) - Y_0(u) \mid D = 1]$<br />
$= E[Y_1(u) \mid D = 1] - E[Y_0(u) \mid D = 1]$
Treatment on the Treated<br />
• The standard estimator (2) consistently<br />
estimates TOT if:<br />
$[Y_0 \mid D = 1] = [Y_0 \mid D = 0]$
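A small numeric illustration of this condition, using a hypothetical population invented for this sketch: the average of Y_0 is the same in both groups, but the treated gain more from treatment than the controls would. The standard estimator then matches TOT and not ATE.

```python
from statistics import mean

# Hypothetical (Y_1, Y_0, D) triples: Y_0 is balanced across groups,
# but the treated units have larger effects than the control units.
population = [(16, 10, 1), (18, 12, 1), (14, 8, 1), (12, 10, 0), (14, 12, 0), (10, 8, 0)]

# Standard estimator (2), built from observed outcomes only.
naive = (mean(y1 for y1, _, d in population if d == 1)
         - mean(y0 for _, y0, d in population if d == 0))

tot = mean(y1 - y0 for y1, y0, d in population if d == 1)  # effect on the treated
ate = mean(y1 - y0 for y1, y0, _ in population)            # effect on everyone

print(naive, tot, ate)
```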
Structural Equation Modeling<br />
• Structural equation modeling was<br />
originally developed by geneticists<br />
(Wright 1921) and economists<br />
(Haavelmo 1943).
Structural Equations<br />
• Definition: An equation<br />
y = β x + ε (3)<br />
is said to be structural if it is to be interpreted as<br />
follows:<br />
• In an ideal experiment where we control X to x and<br />
any other set Z of variables (not containing X or Y)<br />
to z, the value y of Y is given by β x + ε, where ε is<br />
not a function of the settings x and z.<br />
• This definition is in the spirit of Haavelmo (1943),<br />
who explicitly interpreted each structural equation as<br />
a statement about a hypothetical controlled<br />
experiment.
• Thus, to the often asked question, “Under what<br />
conditions can we give causal interpretation to<br />
structural coefficients?”, Haavelmo would have<br />
answered: Always!<br />
• According to the founding father of SEM, the<br />
conditions that make the equation y = β x + ε<br />
structural are precisely those that make the<br />
causal connection between X and Y have no<br />
other value but β, and that ensure that nothing<br />
about the statistical relationship between x and ε<br />
can ever change this interpretation of β.
• The average causal effect: The average<br />
causal effect on Y of treatment level x is<br />
the difference in the conditional<br />
expectations:<br />
E(Y|X = x) – E(Y|X = 0)<br />
• In the context of dichotomous interventions<br />
(x = 1), this causal effect is called the<br />
average treatment effect (ATE).
Representing Interventions<br />
• Consider the structural model M:<br />
$z = f_z(w)$<br />
$x = f_x(z, \nu)$<br />
$y = f_y(x, u)$<br />
• We represent an intervention in the model through<br />
a mathematical operator denoted $do(x)$.<br />
• $do(x)$ simulates physical interventions by deleting<br />
certain functions from the model, replacing them<br />
by a constant X = x, while keeping the rest of the<br />
model unchanged.
• To emulate an intervention $do(x_0)$ that holds X<br />
constant (at $X = x_0$) in model M, replace the<br />
equation for x with $x = x_0$, and obtain a new model,<br />
$M_{x_0}$:<br />
$z = f_z(w)$<br />
$x = x_0$<br />
$y = f_y(x, u)$<br />
• The joint distribution associated with the modified<br />
model, denoted $P(z, y \mid do(x_0))$, describes the<br />
post-intervention (“experimental”) distribution.<br />
• From this distribution, one is able to assess<br />
treatment efficacy by comparing aspects of this<br />
distribution at different levels of $x_0$.
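The replacement of an equation by a constant can be sketched in Python. Everything specific here is an assumption made for illustration: the functional forms of `f_z`, `f_x` and `f_y`, the normal distributions of the exogenous terms, and the dependence between ν and u that is built in to create confounding. The `do_x` argument plays the role of the intervention: when it is set, the equation for x is bypassed and the rest of the model is left unchanged.

```python
import random
from statistics import fmean

random.seed(1)


def f_z(w):
    return w


def f_x(z, nu):
    return 1 if z + nu > 0 else 0


def f_y(x, u):
    return 2.0 * x + u  # hypothetical: moving x from 0 to 1 shifts y by 2


def solve(w, nu, u, do_x=None):
    """Solve M for one unit; with do_x set, solve the modified model instead."""
    z = f_z(w)
    x = f_x(z, nu) if do_x is None else do_x  # the intervention replaces the equation for x
    return z, x, f_y(x, u)


# Exogenous draws; nu is built from u so that x and u are dependent (an assumption).
draws = []
for _ in range(50_000):
    u = random.gauss(0, 1)
    draws.append((random.gauss(0, 1), u + random.gauss(0, 1), u))

# Pre-intervention ("observational") comparison of units with x = 1 and x = 0.
observed = [solve(w, nu, u) for w, nu, u in draws]
obs_gap = (fmean(y for _, x, y in observed if x == 1)
           - fmean(y for _, x, y in observed if x == 0))

# Post-intervention comparison: the mean of y with x held at 1 versus held at 0.
do_gap = (fmean(solve(w, nu, u, do_x=1)[2] for w, nu, u in draws)
          - fmean(solve(w, nu, u, do_x=0)[2] for w, nu, u in draws))

print(round(obs_gap, 2), round(do_gap, 2))
```

Under these assumptions the observational gap exceeds 2 because units with high u are more likely to have x = 1, while the post-intervention gap recovers the value built into `f_y`.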
• Definition: The interpretation of a structural<br />
equation as a statement about the behavior of Y<br />
under a hypothetical intervention yields a simple<br />
definition for the structural parameters.<br />
The meaning of β in the equation y = β x + ε is<br />
simply<br />
$\beta = \frac{\partial}{\partial x} E[Y \mid do(x)]$
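This reading of β can be contrasted with a regression slope in a short simulation. The value `BETA = 1.5`, the seed, and the way x is made to depend on ε in the observational data are all assumptions chosen for this sketch. The slope of the mean of Y under intervention recovers β, while the least-squares slope in the observational data does not.

```python
import random
from statistics import fmean

random.seed(2)
BETA = 1.5  # hypothetical structural coefficient
N = 50_000

eps = [random.gauss(0, 1) for _ in range(N)]
# Observationally, x depends on eps (an assumption that creates confounding).
x_obs = [e + random.gauss(0, 1) for e in eps]
y_obs = [BETA * x + e for x, e in zip(x_obs, eps)]


def mean_y_do(x0):
    """Mean of Y when x is set to x0 for every unit, leaving eps untouched."""
    return fmean(BETA * x0 + e for e in eps)


# Finite-difference slope of the interventional mean with respect to x.
h = 1.0
structural_slope = (mean_y_do(1.0 + h) - mean_y_do(1.0)) / h

# Ordinary least-squares slope of y on x in the observational data.
mx, my = fmean(x_obs), fmean(y_obs)
ols_slope = (sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs))
             / sum((x - mx) ** 2 for x in x_obs))

print(round(structural_slope, 3), round(ols_slope, 3))
```

With this setup the regression slope converges to β plus cov(x, ε)/var(x), which is 1.5 + 0.5 here.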
Counterfactual Analysis in Structural<br />
Models<br />
• Consider again model $M_{x_0}$. Call the solution<br />
of Y the potential response of Y to $x_0$.<br />
• We denote it as $Y_{x_0}(u, \nu, w)$.<br />
• This entity can be given a counterfactual<br />
interpretation, for it stands for the way an<br />
individual with characteristics (u, ν, w) would<br />
respond, had the treatment been $x_0$, rather<br />
than the $x = f_x(z, \nu)$ actually received by the<br />
individual.
• In our example,<br />
$Y_{x_0}(u, \nu, w) = Y_{x_0}(u) = y = f_y(x_0, u)$<br />
• This interpretation of counterfactuals, cast as<br />
solutions to modified systems of equations, provides<br />
the conceptual and formal link between structural<br />
equation modeling and the Rubin potential-outcome<br />
framework.<br />
• It assures us that the end results of the two<br />
approaches will be the same.<br />
• Thus, the choice of model is strictly a matter of<br />
convenience or insight.
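The link between the two frameworks can be sketched for a binary treatment. The functional forms and the unit values below are hypothetical. The potential response is obtained by solving the modified model, and the factual outcome from the unmodified model coincides with the potential-outcome expression D·Y_1 + (1 − D)·Y_0.

```python
def f_x(z, nu):
    return 1 if z + nu > 0 else 0


def f_y(x, u):
    return 3.0 * x + u  # hypothetical structural equation for y


def potential_response(x0, u):
    """Solution for Y in the modified model where x is held at x0."""
    return f_y(x0, u)


# Hypothetical units, each described by (z, nu, u).
units = [(0.5, 0.2, 1.0), (-1.0, 0.3, 2.0)]

results = []
for z, nu, u in units:
    d = f_x(z, nu)            # treatment actually received in model M
    y_factual = f_y(d, u)     # outcome actually generated by model M
    y1 = potential_response(1, u)
    y0 = potential_response(0, u)
    # Observed outcome written in potential-outcome form: D*Y_1 + (1 - D)*Y_0.
    y_switching = d * y1 + (1 - d) * y0
    results.append((d, y_factual, y_switching, y1 - y0))

for row in results:
    print(row)
```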
References<br />
• Judea Pearl (2000): Causality: Models, Reasoning<br />
and Inference, CUP. Chapters 1, 5 and 7.<br />
• Trygve Haavelmo (1944): “The probability<br />
approach in econometrics”, Econometrica 12, pp.<br />
iii-vi+1-115.<br />
• Arthur Goldberger (1972): “Structural Equations<br />
Methods in the Social Sciences”, Econometrica<br />
40, pp. 979-1002.<br />
• Donald B. Rubin (1974): “Estimating causal effects<br />
of treatments in randomized and nonrandomized<br />
experiments”, Journal of Educational Psychology<br />
66, pp. 688-701.<br />
• Paul W. Holland (1986): “Statistics and Causal<br />
Inference”, Journal of the American Statistical<br />
Association 81, pp. 945-970, with discussion.