Causal Inference - South Africa Government Online


REPUBLIC OF SOUTH AFRICA

GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR

Session I

Causal Inference

Sebastian Martinez

June 2006

Slides by Sebastian Galiani, Paul Gertler and Sebastian Martinez

ORGANIZED BY THE WORLD BANK AFRICA IMPACT EVALUATION INITIATIVE
IN COLLABORATION WITH HUMAN DEVELOPMENT NETWORK
AND WORLD BANK INSTITUTE


Motivation

• The objective in evaluation is to estimate the CAUSAL effect of an intervention (treatment) t on an outcome Y
  – What is the effect of a cash transfer on household consumption?
• For causal inference we must understand the data generation process
  – For impact evaluation, this means understanding the behavioral process that generates the data
    • how benefits are assigned


Technical Group

• Causal Inference
• Experimental design/randomization
• Quasi-experiments
  – Regression Discontinuity
  – Double differences (Diff in diff)
  – Matching
  – Instrumental Variables
• Sampling and Data


Causal Analysis

• The aim of standard statistical analysis, typified by likelihood and other estimation techniques, is to infer parameters of a distribution from samples drawn from that distribution.
• With the help of such parameters, one can:
  1. Infer associations among variables,
  2. Estimate the likelihood of past and future events,
  3. Update the likelihood of events in light of new evidence or new measurements.


Causal Analysis

• These tasks are managed well by standard statistical analysis as long as experimental conditions remain the same.
• Causal analysis goes one step further:
  – Its aim is to infer aspects of the data generation process.
  – With the help of such aspects, one can deduce not only the likelihood of events under static conditions, but also the dynamics of events under changing conditions.


Causal Analysis

• This capability includes:
  1. Predicting the effects of interventions
  2. Predicting the effects of spontaneous changes
  3. Identifying causes of reported events
• This distinction implies that causal and associational concepts do not mix.


Causal Analysis

• The word "cause" is not in the vocabulary of standard probability theory.
• All probability theory allows us to say is that two events are mutually correlated, or dependent, meaning that if we find one, we can expect to encounter the other.
• Scientists seeking causal explanations for complex phenomena or rationales for policy decisions must therefore supplement the language of probability with a vocabulary for causality.


Causal Analysis

• Two languages for causality have been proposed:
  1. Structural equation modeling (SEM) (Haavelmo 1943).
  2. The Neyman-Rubin potential outcome model (RCM) (Neyman 1923; Rubin 1974).


The Rubin Causal Model

• Denote the population by U. Each unit in U is denoted by u.
• For each u ∈ U, there is an associated value Y(u) of the variable of interest Y, which we call the response variable.
• Let A be a second variable defined on U. We call A an attribute of the units in U.


The Rubin Causal Model

• The key notion is the potential for exposing or not exposing each unit to the action of a cause:
• Each unit has to be potentially exposable to any one of the causes.
• Thus, Rubin takes the position that causes are only those things that could be treatments in hypothetical experiments.
• An attribute cannot be a cause in an experiment, because the notion of potential exposability does not apply to it.


The Rubin Causal Model

• For simplicity, we assume that there are just two causes, or levels of treatment.
• Let D be a variable that indicates the cause to which each unit in U is exposed:

$$D = \begin{cases} t & \text{if unit } u \text{ is exposed to treatment} \\ c & \text{if unit } u \text{ is exposed to control} \end{cases}$$

In a controlled study, D is constructed by the experimenter. In an uncontrolled study, it is determined by factors beyond the experimenter's control.


The Rubin Causal Model

• The values of Y are potentially affected by the particular cause, t or c, to which the unit is exposed.
• Thus, we need two response variables: $Y_t(u)$ and $Y_c(u)$.
• $Y_t$ is the value of the response that would be observed if the unit were exposed to t, and
• $Y_c$ is the value that would be observed on the same unit if it were exposed to c.


The Rubin Causal Model

• Let D also be expressed as a binary variable: D = 1 if D = t, and D = 0 if D = c.
• Then the outcome of each individual can be written as:

$$Y(u) = D\,Y_1(u) + (1 - D)\,Y_0(u)$$
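The switching equation can be illustrated with a minimal Python sketch. It assumes a handful of hypothetical units whose two potential outcomes are both known, which can only happen in a simulation. The point is that the observed Y reveals exactly one of $Y_1(u)$ and $Y_0(u)$. The unit values and the helper name `observed_outcome` are illustrative assumptions, not part of the slides.

```python
def observed_outcome(d: int, y1: float, y0: float) -> float:
    """Switching equation Y(u) = D*Y1(u) + (1 - D)*Y0(u) for a binary D."""
    return d * y1 + (1 - d) * y0


# Hypothetical units as (D, Y1, Y0); real data would never contain both Y1 and Y0.
units = [(1, 12.0, 9.0), (0, 7.0, 7.5), (1, 15.0, 11.0), (0, 10.0, 6.0)]

for d, y1, y0 in units:
    y = observed_outcome(d, y1, y0)
    counterfactual = y0 if d == 1 else y1  # the potential outcome that stays unobserved
    print(f"D={d}  observed Y={y:.1f}  unobserved counterfactual={counterfactual:.1f}")
```

Each unit contributes only one of the two potential-outcome columns to the observed data. The unit-level difference between them therefore cannot be computed from observed data alone.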


The Rubin Causal Model

• Definition: For every unit u, treatment ($D_u = 1$ instead of $D_u = 0$) causes the effect

$$\delta_u = Y_1(u) - Y_0(u)$$

• This definition of a causal effect assumes that the treatment status of one individual does not affect the potential outcomes of other individuals.
• Fundamental Problem of Causal Inference: It is impossible to observe the values of $Y_1(u)$ and $Y_0(u)$ on the same unit and, therefore, it is impossible to observe the effect of t on u.
• Another way to express this problem is to say that we cannot infer the effect of treatment because we do not have the counterfactual evidence, i.e., what would have happened in the absence of treatment.


The Rubin Causal Model

• Given that the causal effect for a single unit u cannot be observed, we aim to identify the average causal effect for the entire population or for sub-populations.
• The average treatment effect (ATE) of t (relative to c) over U (or any sub-population) is given by:

$$\text{ATE} = E[Y_1(u) - Y_0(u)] = E[Y_1(u)] - E[Y_0(u)] = \delta = \bar{Y}_1 - \bar{Y}_0 \qquad (1)$$
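Equation (1) can be illustrated with a sketch that simulates a hypothetical population in which both potential outcomes are known. Again, this is possible only in simulation. The sketch checks that the mean of the unit-level effects equals the difference of the means. The distributions (baseline around 50, unit effects around 5) are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import fmean

random.seed(1)
N = 100_000

# Hypothetical population: Y0 ~ N(50, 10); unit-level effects vary around 5.
y0 = [random.gauss(50, 10) for _ in range(N)]
y1 = [a + random.gauss(5, 2) for a in y0]

ate_unit_mean = fmean(b - a for a, b in zip(y0, y1))  # E[Y1(u) - Y0(u)]
ate_diff_means = fmean(y1) - fmean(y0)                # E[Y1(u)] - E[Y0(u)]
print(f"{ate_unit_mean:.4f}  {ate_diff_means:.4f}")
```

Linearity of the expectation makes the two numbers agree up to floating-point rounding. In real data, neither can be computed directly, because each unit reveals only one of the two outcomes.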


The Rubin Causal Model

• The statistical solution replaces the impossible-to-observe causal effect of t on a specific unit with the possible-to-estimate average causal effect of t over a population of units.
• Although $E(Y_1)$ and $E(Y_0)$ cannot both be calculated, they can be estimated.
• Most econometric methods attempt to construct, from observational data, consistent estimates of $\bar{Y}_1$ and $\bar{Y}_0$.


The Rubin Causal Model

• Consider the following simple estimator of ATE:

$$\hat{\delta} = [\hat{Y}_1 \mid D = 1] - [\hat{Y}_0 \mid D = 0] \qquad (2)$$

• Note that equation (1) is defined for the whole population, whereas equation (2) represents an estimator to be evaluated on a sample drawn from that population.
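Estimator (2) is simply a difference in sample means of the observed outcome between the D = 1 and D = 0 groups. The sketch below computes it for a small hypothetical sample. The function name `naive_estimator` and the data are assumptions made for illustration.

```python
from statistics import fmean


def naive_estimator(y: list[float], d: list[int]) -> float:
    """Estimator (2): mean observed outcome of treated units minus mean of control units."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    controls = [yi for yi, di in zip(y, d) if di == 0]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    return fmean(treated) - fmean(controls)


# Hypothetical sample: observed outcomes y and treatment indicators d.
y = [12.0, 7.0, 15.0, 10.0, 9.0, 6.0]
d = [1, 0, 1, 0, 1, 0]
print(naive_estimator(y, d))  # mean(12, 15, 9) - mean(7, 10, 6) = 12 - 23/3
```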


• Let π equal the proportion of the population that would be assigned to the treatment group.
• Decomposing ATE, we have:

$$\delta = \pi\,\delta_{\{D=1\}} + (1 - \pi)\,\delta_{\{D=0\}}$$

$$\delta = \pi\,[(Y_1 - Y_0) \mid D = 1] + (1 - \pi)\,[(Y_1 - Y_0) \mid D = 0]$$

$$\delta = \big[\pi\,[Y_1 \mid D = 1] + (1 - \pi)\,[Y_1 \mid D = 0]\big] - \big[\pi\,[Y_0 \mid D = 1] + (1 - \pi)\,[Y_0 \mid D = 0]\big] = \bar{Y}_1 - \bar{Y}_0$$
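The decomposition can be checked numerically. The sketch below simulates a hypothetical population with both potential outcomes known and a treatment rule that favors units with high $Y_0$. It then verifies that the π-weighted group effects and the bracketed π-weighted group means both reproduce $\bar{Y}_1 - \bar{Y}_0$. The distributions and the assignment rule are assumptions; `mean_where` is a hypothetical helper.

```python
import random
from statistics import fmean

random.seed(2)
N = 10_000

# Hypothetical population with both potential outcomes known (only possible in simulation).
y0 = [random.gauss(20, 5) for _ in range(N)]
y1 = [a + random.gauss(3, 1) for a in y0]
# Hypothetical assignment rule: units with a high Y0 are more likely to be treated.
d = [1 if a + random.gauss(0, 5) > 20 else 0 for a in y0]


def mean_where(values, d, arm):
    """Mean of `values` over the units with D == arm."""
    return fmean(v for v, di in zip(values, d) if di == arm)


effects = [b - a for a, b in zip(y0, y1)]
pi = fmean(d)                          # share of units with D = 1
delta_t = mean_where(effects, d, 1)    # delta_{D=1}
delta_c = mean_where(effects, d, 0)    # delta_{D=0}
ate = fmean(y1) - fmean(y0)            # Y1_bar - Y0_bar

weighted_effects = pi * delta_t + (1 - pi) * delta_c
weighted_means = (pi * mean_where(y1, d, 1) + (1 - pi) * mean_where(y1, d, 0)) - (
    pi * mean_where(y0, d, 1) + (1 - pi) * mean_where(y0, d, 0)
)
print(f"{ate:.6f}  {weighted_effects:.6f}  {weighted_means:.6f}")
```

All three numbers agree up to rounding, because π is the group share and the weighted group means add back up to the population means.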


• If we assume that

$$[Y_1 \mid D = 1] = [Y_1 \mid D = 0] \quad \text{and} \quad [Y_0 \mid D = 1] = [Y_0 \mid D = 0],$$

then

$$\delta = \big[\pi\,[Y_1 \mid D = 1] + (1 - \pi)\,[Y_1 \mid D = 1]\big] - \big[\pi\,[Y_0 \mid D = 0] + (1 - \pi)\,[Y_0 \mid D = 0]\big] = [Y_1 \mid D = 1] - [Y_0 \mid D = 0],$$

which is consistently estimated by its sample analog estimator:

$$\hat{\delta} = [\hat{Y}_1 \mid D = 1] - [\hat{Y}_0 \mid D = 0]$$


• Thus, a sufficient condition for the standard estimator to consistently estimate the true ATE is that:

$$[Y_1 \mid D = 1] = [Y_1 \mid D = 0] \quad \text{and} \quad [Y_0 \mid D = 1] = [Y_0 \mid D = 0]$$

In this situation, the average outcome under the treatment and the average outcome under the control do not differ between the treatment and control groups.

In order to satisfy these conditions, it is sufficient that treatment assignment D be uncorrelated with the potential outcome distributions of $Y_1$ and $Y_0$.

The principal way to achieve this uncorrelatedness is through random assignment of treatment.
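The role of random assignment can be illustrated with a short simulation. The sketch assumes a hypothetical population with a constant treatment effect of 3 and compares estimator (2) under two regimes:

- self-selection, where units with high $Y_0$ opt in, so D is correlated with $Y_0$;
- a fair coin flip that is independent of $(Y_1, Y_0)$.

All numbers are illustrative assumptions.

```python
import random
from statistics import fmean


def naive_estimator(y, d):
    """Estimator (2): treated mean minus control mean of observed outcomes."""
    treated = fmean(yi for yi, di in zip(y, d) if di == 1)
    controls = fmean(yi for yi, di in zip(y, d) if di == 0)
    return treated - controls


random.seed(3)
N = 200_000
TRUE_ATE = 3.0                                  # hypothetical constant effect
y0 = [random.gauss(20, 5) for _ in range(N)]
y1 = [a + TRUE_ATE for a in y0]

assignments = {
    # Self-selection: units with high Y0 opt in, so D is correlated with Y0.
    "self-selected": [1 if a > 22 else 0 for a in y0],
    # Random assignment: a fair coin, independent of (Y1, Y0).
    "randomized": [1 if random.random() < 0.5 else 0 for _ in range(N)],
}

estimates = {}
for label, d in assignments.items():
    y = [di * b + (1 - di) * a for a, b, di in zip(y0, y1, d)]  # observed outcomes only
    estimates[label] = naive_estimator(y, d)
    print(f"{label:>13}: naive estimate = {estimates[label]:.3f} (true ATE = {TRUE_ATE})")
```

Under self-selection, the estimate absorbs the gap in $Y_0$ between the groups. Under the coin flip, it lands close to the true ATE.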


• In most circumstances, there is simply no information available on how those in the control group would have reacted if they had received the treatment instead.
• This is the basis for an important insight into the potential biases of the standard estimator (2).
• After a bit of algebra, it can be shown that:

$$\hat{\delta} = \delta + \underbrace{\big([Y_0 \mid D = 1] - [Y_0 \mid D = 0]\big)}_{\text{Baseline Difference}} + \underbrace{(1 - \pi)\big(\delta_{\{D=1\}} - \delta_{\{D=0\}}\big)}_{\text{Treatment Heterogeneity}}$$
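The decomposition is an algebraic identity. When π and all group means are taken from the same sample, it holds exactly in that sample, up to rounding. The sketch below checks this numerically. It assumes hypothetical heterogeneous gains and a selection rule that depends on both the baseline level $Y_0$ and the gain, so both bias terms are non-zero. `mean_where` is a hypothetical helper.

```python
import random
from statistics import fmean

random.seed(4)
N = 50_000
y0 = [random.gauss(20, 5) for _ in range(N)]
gain = [random.gauss(3, 2) for _ in range(N)]            # hypothetical heterogeneous effects
y1 = [a + g for a, g in zip(y0, gain)]
# Hypothetical selection on both the baseline level and the individual gain.
d = [1 if a + 2 * g + random.gauss(0, 5) > 26 else 0 for a, g in zip(y0, gain)]


def mean_where(values, d, arm):
    """Mean of `values` over the units with D == arm."""
    return fmean(v for v, di in zip(values, d) if di == arm)


pi = fmean(d)
ate = fmean(gain)                                        # delta
delta_t, delta_c = mean_where(gain, d, 1), mean_where(gain, d, 0)

naive = mean_where(y1, d, 1) - mean_where(y0, d, 0)      # estimator (2) on observed data
baseline_difference = mean_where(y0, d, 1) - mean_where(y0, d, 0)
heterogeneity = (1 - pi) * (delta_t - delta_c)

print(f"naive                          = {naive:.4f}")
print(f"ATE + baseline + heterogeneity = {ate + baseline_difference + heterogeneity:.4f}")
print(f"baseline = {baseline_difference:.4f}, heterogeneity = {heterogeneity:.4f}")
```

In real data, neither bias term can be computed. Both require the unobserved potential outcomes, which is exactly why they must be eliminated by design or assumption.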


• This equation specifies the two sources of bias that need to be eliminated from estimates of causal effects from observational studies:
  1. Selection bias: the baseline difference.
  2. Treatment heterogeneity.
• Most of the available methods deal only with selection bias, either by simply assuming that the treatment effect is constant in the population or by redefining the parameter of interest in the population.


Treatment on the Treated

• ATE is not always the parameter of interest.
• In a variety of policy contexts, it is the average treatment effect for the treated that is of substantive interest:

$$\text{TOT} = E[Y_1(u) - Y_0(u) \mid D = 1] = E[Y_1(u) \mid D = 1] - E[Y_0(u) \mid D = 1]$$


Treatment on the Treated

• The standard estimator (2) consistently estimates TOT if:

$$[Y_0 \mid D = 1] = [Y_0 \mid D = 0]$$
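A sketch can separate TOT from ATE. It assumes hypothetical "selection on gains": units join when their own effect is above average, independently of $Y_0$. As a result, the condition $[Y_0 \mid D = 1] = [Y_0 \mid D = 0]$ holds in expectation while effects are heterogeneous. Estimator (2) then tracks TOT rather than ATE. The distributions and selection rule are illustrative assumptions.

```python
import random
from statistics import fmean

random.seed(5)
N = 200_000
y0 = [random.gauss(20, 5) for _ in range(N)]
gain = [random.gauss(3, 2) for _ in range(N)]   # hypothetical heterogeneous effects
y1 = [a + g for a, g in zip(y0, gain)]
# Hypothetical selection on gains only: D does not depend on Y0.
d = [1 if g > 3 else 0 for g in gain]

treated = [i for i in range(N) if d[i] == 1]
controls = [i for i in range(N) if d[i] == 0]

ate = fmean(gain)                                           # E[Y1 - Y0]
tot = fmean(gain[i] for i in treated)                       # E[Y1 - Y0 | D = 1]
naive = fmean(y1[i] for i in treated) - fmean(y0[i] for i in controls)
print(f"ATE={ate:.3f}  TOT={tot:.3f}  naive={naive:.3f}")
```

The gap between the naive estimate and TOT equals the sample baseline difference $[Y_0 \mid D = 1] - [Y_0 \mid D = 0]$, which is close to zero here by construction.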


Structural Equation Modeling

• Structural equation modeling was originally developed by geneticists (Wright 1921) and economists (Haavelmo 1943).


Structural Equations

• Definition: An equation

$$y = \beta x + \varepsilon \qquad (3)$$

is said to be structural if it is to be interpreted as follows:
• In an ideal experiment where we control X to x and any other set Z of variables (not containing X or Y) to z, the value y of Y is given by βx + ε, where ε is not a function of the settings x and z.
• This definition is in the spirit of Haavelmo (1943), who explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment.
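A simulation can show the difference between a structural coefficient and a regression slope. It assumes a hypothetical β = 2 and generates data from the same equation (3) in two regimes:

- an observational regime, in which x is partly driven by ε (an assumed confounding mechanism);
- the ideal experiment, in which x is set independently of ε.

The structural β is the same in both regimes; only the least-squares slope of y on x changes. `ols_slope` is a hypothetical helper.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(6)
N = 100_000
BETA = 2.0                                    # hypothetical structural coefficient
eps = [random.gauss(0, 1) for _ in range(N)]

# Observational regime: x is partly driven by eps (assumed confounding).
x_obs = [e + random.gauss(0, 1) for e in eps]
y_obs = [BETA * x + e for x, e in zip(x_obs, eps)]

# Ideal experiment: x is set by the experimenter, independently of eps.
x_exp = [random.gauss(0, 1) for _ in range(N)]
y_exp = [BETA * x + e for x, e in zip(x_exp, eps)]

print(f"observational slope: {ols_slope(x_obs, y_obs):.3f}")  # near 2.5 here, not beta
print(f"experimental slope:  {ols_slope(x_exp, y_exp):.3f}")  # near beta = 2.0
```

In the observational regime, the slope converges to β + Cov(x, ε)/Var(x) = 2 + 1/2. Equation (3) still describes what would happen if x were set by intervention.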


• Thus, to the often asked question, “Under what conditions can we give causal interpretation to structural coefficients?”
• Haavelmo would have answered: Always!
• According to the founding father of SEM, the conditions that make the equation y = βx + ε structural are precisely those that make the causal connection between X and Y have no other value but β, and that ensure that nothing about the statistical relationship between x and ε can ever change this interpretation of β.


• The average causal effect: The average causal effect on Y of treatment level x is the difference in the conditional expectations

$$E(Y \mid X = x) - E(Y \mid X = 0)$$

with X set to x, and to 0, as in the ideal experiment of the structural definition.
• In the context of dichotomous interventions (x = 1), this causal effect is called the average treatment effect (ATE).


Representing Interventions

• Consider the structural model M:

$$\begin{aligned} z &= f_z(w) \\ x &= f_x(z, \nu) \\ y &= f_y(x, u) \end{aligned}$$

• We represent an intervention in the model through a mathematical operator denoted $do(x)$.
• $do(x)$ simulates physical interventions by deleting certain functions from the model, replacing them by a constant X = x, while keeping the rest of the model unchanged.


• To emulate an intervention $do(x_0)$ that holds X constant (at $X = x_0$) in model M, replace the equation for x with $x = x_0$, and obtain a new model, $M_{x_0}$:

$$\begin{aligned} z &= f_z(w) \\ x &= x_0 \\ y &= f_y(x, u) \end{aligned}$$

• The joint distribution associated with the modified model, denoted $P(z, y \mid do(x_0))$, describes the post-intervention (“experimental”) distribution.
• From this distribution, one is able to assess treatment efficacy by comparing aspects of this distribution at different levels of $x_0$.
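The $do(x_0)$ operation can be sketched by writing model M as ordinary functions and replacing only the x equation. Several pieces are hypothetical, because the slides leave them generic:

- the functions $f_z$, $f_x$, $f_y$;
- a binary x;
- standard normal disturbances;
- a correlation between u and ν.

Without that correlation, conditioning on X and intervening on X would give the same answer in this model.

```python
import random
from statistics import fmean


# Hypothetical structural functions for model M (the slides leave f_z, f_x, f_y generic).
def f_z(w):
    return w


def f_x(z, nu):
    return 1 if z + nu > 0 else 0


def f_y(x, u):
    return 2.0 * x + u


def sample(n, do_x=None, seed=0):
    """Draw (z, x, y) from M, or from the modified model M_x0 when do_x = x0 is given."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w, nu = rng.gauss(0, 1), rng.gauss(0, 1)
        u = nu + rng.gauss(0, 1)                   # assumption: disturbances u and nu are correlated
        z = f_z(w)
        x = f_x(z, nu) if do_x is None else do_x   # do(x0) replaces only the x equation
        y = f_y(x, u)
        rows.append((z, x, y))
    return rows


observational = sample(100_000)
associational = (fmean(y for _, x, y in observational if x == 1)
                 - fmean(y for _, x, y in observational if x == 0))
interventional = (fmean(y for _, _, y in sample(100_000, do_x=1, seed=1))
                  - fmean(y for _, _, y in sample(100_000, do_x=0, seed=2)))
print(f"E[Y | X=1] - E[Y | X=0]     = {associational:.3f}")
print(f"E[Y | do(1)] - E[Y | do(0)] = {interventional:.3f}")
```

The interventional contrast recovers the coefficient on x in the hypothetical $f_y$. The associational contrast also picks up the dependence between u and ν.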


• Definition: The interpretation of a structural equation as a statement about the behavior of Y under a hypothetical intervention yields a simple definition for the structural parameters. The meaning of β in the equation y = βx + ε is simply

$$\beta = \frac{\partial}{\partial x} E[Y \mid do(x)]$$
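This definition can be evaluated numerically. ε is not a function of the setting x, so $E[Y \mid do(x)] = \beta x + E[\varepsilon]$ and its derivative is β. The sketch estimates $E[Y \mid do(x)]$ by simulation and takes a central finite difference. Reusing the same seed at $x_0 \pm h$ keeps the ε draws fixed, so only the setting of x changes. The value β = 2 and the normal ε are assumptions; `mean_y_do` and `beta_from_do` are hypothetical helpers.

```python
import random
from statistics import fmean

BETA = 2.0  # hypothetical structural coefficient in y = beta * x + eps


def mean_y_do(x0, n=50_000, seed=0):
    """E[Y | do(x0)] by simulation: X is set to x0 and eps is drawn independently of it."""
    rng = random.Random(seed)
    return fmean(BETA * x0 + rng.gauss(0, 1) for _ in range(n))


def beta_from_do(x0, h=1e-3):
    """Central finite difference of E[Y | do(x)] at x0, reusing the same eps draws."""
    return (mean_y_do(x0 + h) - mean_y_do(x0 - h)) / (2 * h)


print(beta_from_do(0.0), beta_from_do(5.0))
```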


Counterfactual Analysis in Structural Models

• Consider again model $M_{x_0}$. Call the solution for Y the potential response of Y to $x_0$.
• We denote it as $Y_{x_0}(u, \nu, w)$.
• This entity can be given a counterfactual interpretation, for it stands for the way an individual with characteristics (u, ν, w) would respond had the treatment been $x_0$, rather than the $x = f_x(z, \nu)$ actually received by the individual.


• In our example,

$$Y_{x_0}(u, \nu, w) = Y_{x_0}(u) = y = f_y(x_0, u)$$

• This interpretation of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation modeling and the Rubin potential-outcome framework.
• It ensures that the end results of the two approaches will be the same.
• Thus, the choice of model is strictly a matter of convenience or insight.
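The link between the two frameworks can be sketched in a few lines. Solving the modified model at $x_0 = 1$ and $x_0 = 0$ for the same background u gives Rubin's $Y_1(u)$ and $Y_0(u)$. The factual outcome then satisfies the switching equation $Y = D Y_1 + (1 - D) Y_0$. The form of `f_y` below (with an effect that varies with u) and the random draws are hypothetical; the slides leave $f_y$ unspecified.

```python
import random


def f_y(x, u):
    """Hypothetical structural equation for y; the slides leave f_y unspecified."""
    return 1.0 + 2.0 * x + 0.5 * x * u + u


rng = random.Random(7)
for _ in range(3):
    u = rng.gauss(0, 1)                  # the unit's background characteristics
    d = rng.randint(0, 1)                # treatment actually received
    y1, y0 = f_y(1, u), f_y(0, u)        # potential responses Y_1(u), Y_0(u) from M_1 and M_0
    y = f_y(d, u)                        # factual outcome in the original model
    assert y == d * y1 + (1 - d) * y0    # Rubin's switching equation holds
    print(f"u={u:+.2f} D={d} Y={y:.2f} Y1={y1:.2f} Y0={y0:.2f} delta_u={y1 - y0:.2f}")
```

Under this hypothetical $f_y$, the unit-level effect is $\delta_u = 2 + 0.5u$. Structural heterogeneity in $f_y$ thus appears as heterogeneous potential-outcome effects.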


References

• Judea Pearl (2000): Causality: Models, Reasoning and Inference, CUP. Chapters 1, 5 and 7.
• Trygve Haavelmo (1944): “The probability approach in econometrics”, Econometrica 12, pp. iii-vi+1-115.
• Arthur Goldberger (1972): “Structural Equation Methods in the Social Sciences”, Econometrica 40, pp. 979-1002.
• Donald B. Rubin (1974): “Estimating causal effects of treatments in randomized and nonrandomized experiments”, Journal of Educational Psychology 66, pp. 688-701.
• Paul W. Holland (1986): “Statistics and Causal Inference”, Journal of the American Statistical Association 81, pp. 945-70, with discussion.
