Estimating Distributions of Counterfactuals with an Application ... - UCL


INTERNATIONAL ECONOMIC REVIEW

May 2003, Vol. 44, No. 2

2001 LAWRENCE R. KLEIN LECTURE

ESTIMATING DISTRIBUTIONS OF TREATMENT EFFECTS WITH AN APPLICATION TO THE RETURNS TO SCHOOLING AND MEASUREMENT OF THE EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE∗

BY PEDRO CARNEIRO, KARSTEN T. HANSEN, AND JAMES J. HECKMAN¹

Department of Economics, University of Chicago; Kellogg School of Management, Northwestern University; Department of Economics, University of Chicago and The American Bar Foundation

This article uses factor models to identify and estimate the distributions of counterfactuals. We extend LISREL frameworks to a dynamic treatment effect setting, extending matching to account for unobserved conditioning variables. Using these models, we can identify all pairwise and joint treatment effects. We apply these methods to a model of schooling and determine the intrinsic uncertainty facing agents at the time they make their decisions about enrollment in school. We go beyond the “Veil of Ignorance” in evaluating educational policies and determine who benefits and who loses from commonly proposed educational reforms.

∗ Manuscript received October 2000; revised January 2003.

¹ Previous versions of this paper were given at the Midwest Econometrics Group, Chicago, October 2000, Washington University St. Louis, May, 2001, the Nordic Econometrics Meetings, May, 2001 and workshops at Chicago August, 2002 and Stanford, January, 2003. A simple version of this paper is presented in Carneiro, Hansen, and Heckman (2001). A version of this paper was presented by Heckman as the Klein Lecture at the University of Pennsylvania, September, 28, 2001 and also at the IFAU conference in Stockholm Sweden, October 2001. We are grateful to all workshop participants. We especially thank Mark Duggan, Orazio Attanasio, and Michael Keane for comments on the first draft of this paper. We have benefited from discussions with Ricardo Barros, Richard Blundell, Francisco Buera, Flavio Cunha, Mark Duggan, Lars Hansen, Steven Levitt, Bin Li, Luigi Pistaferri, and Sergio Urzua on subsequent drafts. We single out Salvador Navarro and Edward Vytlacil for especially helpful comments. We are grateful to Flavio Cunha and Salvador Navarro for exceptional research assistance and hard work. This research is supported by NSF 97-09-873, SES-0099195, and NICHD-5RO1-HD34958. Heckman’s work was also supported by the American Bar Foundation and the Donner Foundation, Pedro Carneiro’s research was supported by Fundação Ciência and Tecnologia and Fundação Calouste Gulbenkian. Please address correspondence to: James J. Heckman, Department of Economics, University of Chicago, 1126 E. 59th Street, Chicago, IL 60637, USA. Tel: +773 702-0634. Fax: +773 702-8490. E-mail: jjh@uchicago.edu.




1. INTRODUCTION

The recent literature on evaluating social programs finds that persons (or firms or institutions) respond to the same policy differently (Heckman, 2001). The distribution of responses is usually summarized by some mean. A variety of means can be defined depending on the conditioning variables used. Different means answer different policy questions. There is no uniquely defined “effect” of a policy.

The research reported here moves beyond means as descriptions of policy outcomes and determines joint counterfactual distributions of outcomes for alternative interventions. From the knowledge of the joint distributions of counterfactual outcomes it is possible to determine the proportion of people who benefit or lose from making a particular policy choice (taking or not taking particular treatments), the origin and destination outcomes of those who change states because of policy interventions, and the amount of gain (or loss) from various policy choices by persons at different deciles of an initial prepolicy distribution. Our work builds on previous research by Heckman and Smith (1993, 1998) and Heckman et al. (1997) that uses experimental data to bound or point-identify joint counterfactual distributions. We extend the analysis of Aakvik et al. (1999, 2003), who use factor models to identify counterfactual distributions, to consider indicators for unobservables and implications from choice theory, and to exploit the benefits of panel data.

From the joint distribution of counterfactuals, it is possible to generate all mean, median, or other quantile gains, to identify all pairwise treatment effects in a multi-outcome setting, and to determine how much of the variability in returns across persons comes from variability in the distributions of the outcome selected and how much comes from variability in opportunity distributions. Using the joint distribution of counterfactuals, it is possible to develop a more nuanced understanding of the distributional impacts of public policies, and to move beyond comparisons of aggregate overall distributions induced by different policies to consider how people in different portions of an initial distribution are affected by public policy. We extend the analysis of DiNardo et al. (1996) to consider self-selection as a determinant of aggregate wage and earnings distributions.

Using our methods, we reanalyze the model of Willis and Rosen (1979), who apply the Roy model (1951) to the economics of education. We extend their model to account for uncertainty in the returns to education. We also distinguish between present value income-maximizing and utility-maximizing evaluations of schooling choices and we estimate the net nonpecuniary benefit of attending college. We use information on the choices of agents to determine how much of the ex post heterogeneity in the return to schooling is forecastable at the time agents make their schooling choices. This procedure extends the analysis of Flavin (1981) to a discrete choice setting. This allows us to identify the effect of uncertainty on schooling choices. Ex ante, there is a great deal of uncertainty regarding the returns to schooling (in utils or dollars). Ex post, 8% of college graduates regret going to college.

The plan of this article is as follows. Section 2 presents the essential idea underlying the identification strategy used in this article and how our approach is related to previous work. Section 3 presents a general policy evaluation framework for counterfactual distributions with multiple treatments followed over time. The strategy pursued in this article is based on using low-dimensional factors to generate distributions of potential outcomes. We show how our methods generalize the method of matching by allowing some or all of the variables that generate the conditional independence assumed in matching to be unobserved by the analyst. Section 4 introduces the factor models used in this article. Section 5 presents proofs of semiparametric identification. Section 6 applies the analysis to extend the Rosen–Willis model of college choice to account for uncertainty and to estimate the information about future earnings available to agents at the time schooling decisions are made. Section 7 reports estimates of the distributions of returns to schooling, the components unforecastable by the agent at the time schooling decisions are made, and the nonpecuniary net benefits from attending college. Section 8 applies our estimates to evaluate a reform of the U.S. educational system. It illustrates the power of our method to lift the commonly invoked veil of ignorance and move beyond aggregate distributions of outcomes to understand the consequences of public policies on persons in various parts of the overall distribution. Section 9 concludes. We first provide a brief introduction to the literature to put this article in context.

2. ESTIMATING DISTRIBUTIONS OF COUNTERFACTUAL OUTCOMES

In order to place the approach used in this article in the context of an emerging literature on heterogeneous treatment effects, it is helpful to motivate our work by a two-outcome, two-treatment cross section model. For simplicity, in this section it is assumed that the outcomes are continuous random variables. The analysis in the rest of this article is for multiple treatments and multiple outcomes followed over time, and the outcomes may be discrete, continuous, or mixed discrete-continuous.

The agent can experience one of two possible counterfactual states with associated outcomes $(Y_0, Y_1)$. The states are schooling levels in our empirical analysis. $X$ is a determinant of the counterfactual outcomes $(Y_0, Y_1)$; $S = 1$ if the agent is in state 1; $S = 0$ otherwise. The observed outcome is $Y = SY_1 + (1 - S)Y_0$. There may be an instrument (or set of instruments) $Z$ such that $(Y_0, Y_1) \perp\!\!\!\perp Z \mid X$ and $\Pr(S = 1 \mid Z, X)$ depends on $Z$ for all $X$ (so it is a nontrivial function of $Z$), i.e., $Z$ is in the choice probability but not the outcome equation. ($A \perp\!\!\!\perp B \mid C$ means $A$ is independent of $B$ given $C$.) We show below that such a $Z$ is not strictly required in our approach. The standard treatment effect model assumes policies ($Z$) that affect choices of treatment but not potential outcomes $(Y_0, Y_1)$. General equilibrium effects are ignored.²

² See Heckman et al. (1998a, 1998b, 1998c, 2000) for a treatment of general equilibrium policy evaluation.

The goal of our analysis is to recover $F(Y_0, Y_1 \mid X)$. As noted in Heckman (1992), Heckman and Smith (1993, 1998), and Heckman et al. (1997), from this joint distribution it is possible to estimate the proportion of people who benefit (in terms of gross gains) from participation in the program ($\Pr(Y_1 > Y_0 \mid X)$), gains to participants at selected levels of the no-treatment distribution ($F(Y_1 - Y_0 \mid Y_0 = y_0, X)$), or treatment distribution ($F(Y_1 - Y_0 \mid Y_1 = y_1, X)$), the option value of social programs, and a variety of other questions that can be answered using distributions of potential outcomes including conventional mean treatment effects and quantiles of the gains $(Y_1 - Y_0)$ for those who receive treatment.
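To see why $\Pr(Y_1 > Y_0 \mid X)$ requires the joint distribution and not only the two marginals, the following minimal sketch may help. It assumes, purely for illustration, that $(Y_0, Y_1)$ is bivariate normal given $X$; this distributional assumption, the function name `share_who_gain`, and all parameter values are hypothetical and are not taken from the article.

```python
import math

def share_who_gain(mu0, mu1, sd0, sd1, rho):
    """Pr(Y1 > Y0) when (Y0, Y1) is bivariate normal (illustrative assumption).

    The gain Y1 - Y0 is then normal with mean mu1 - mu0 and
    variance sd0**2 + sd1**2 - 2 * rho * sd0 * sd1.
    """
    var_gain = sd0**2 + sd1**2 - 2.0 * rho * sd0 * sd1
    if var_gain <= 0.0:
        # Degenerate case: every person has the same gain mu1 - mu0.
        return 1.0 if mu1 > mu0 else 0.0
    z = (mu1 - mu0) / math.sqrt(var_gain)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Identical marginals, different dependence between Y0 and Y1.
for rho in (-0.9, 0.0, 0.9):
    print(rho, round(share_who_gain(0.0, 0.5, 1.0, 1.0, rho), 3))
```

Holding the marginals fixed, the share of people who gain rises with the correlation `rho` in this example, which is one way to see that the marginals alone do not pin the share down.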

The problem of recovering joint distributions arises because we observe $Y_0$ if $S = 0$ and $Y_1$ if $S = 1$. Thus we know $F(Y_0 \mid S = 0, X)$, $F(Y_1 \mid S = 1, X)$ but not $F(Y_0 \mid X)$ or $F(Y_1 \mid X)$. In addition, we do not observe the pair $(Y_0, Y_1)$ for anyone. Thus, we cannot directly obtain $F(Y_1, Y_0 \mid S, X)$ from the data. Additional information is required to identify the joint distribution.

There are, then, two separate problems. The first is a selection problem. From $F(Y_1 \mid S = 1, X)$ and $F(Y_0 \mid S = 0, X)$, under what conditions can one recover $F(Y_1 \mid X)$ and $F(Y_0 \mid X)$, respectively? The second problem is how to construct the joint distribution $F(Y_0, Y_1 \mid X)$ from the two marginals.

Assuming that the selection problem can be surmounted, the classical probability results due to Fréchet (1951) and Hoeffding (1940) show how to bound $F(Y_1, Y_0 \mid S, X)$ using the marginal distributions. In practice these bounds are very wide, and the inferences based on the bounding distributions are often not useful.³
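The classical bounds referred to here take the standard form $\max\{F_0(y_0) + F_1(y_1) - 1, 0\} \le F(y_0, y_1) \le \min\{F_0(y_0), F_1(y_1)\}$, with conditioning variables suppressed. A minimal sketch of the computation follows; the function name `frechet_hoeffding_bounds` is a hypothetical label, and the inputs are marginal CDF values assumed to be already known.

```python
def frechet_hoeffding_bounds(f0, f1):
    """Bounds on a joint CDF value F(y0, y1) from marginal CDF values.

    f0 = F0(y0) and f1 = F1(y1) must both lie in [0, 1].
    Returns (lower, upper) with
        lower = max(f0 + f1 - 1, 0)
        upper = min(f0, f1)
    """
    if not (0.0 <= f0 <= 1.0 and 0.0 <= f1 <= 1.0):
        raise ValueError("marginal CDF values must lie in [0, 1]")
    lower = max(f0 + f1 - 1.0, 0.0)
    upper = min(f0, f1)
    return lower, upper

# Evaluated at the two medians, the joint CDF can lie anywhere in [0, 0.5].
print(frechet_hoeffding_bounds(0.5, 0.5))  # (0.0, 0.5)
```

At the medians the admissible interval covers independence (0.25) as well as both extreme forms of dependence, which illustrates how wide the bounds can be.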

The traditional (pre-1985) approach to program evaluation in economics assumed that $F(Y_0, Y_1 \mid X)$ is degenerate because conditional on $X$, $Y_1$ and $Y_0$ are deterministically related:

\[ Y_1 \equiv Y_0 + \Delta(X) \tag{1} \]

This is the “common effect” assumption that postulates that conditional on $X$, the treatment has the same effect on everyone. From the means of $F(Y_0 \mid S = 0, X)$ and $F(Y_1 \mid S = 1, X)$ corrected for selection, one can identify $E(\Delta(X)) = E(Y_1 \mid X) - E(Y_0 \mid X)$. (See Heckman and Robb, 1985, 1986 (reprinted 2000) for a variety of estimators for this case and for discussion of more general cases.)

Heckman and Smith (1993, 1998) and Heckman et al. (1997) relax this assumption by assuming perfect ranking across different counterfactual outcome distributions. Assuming absolutely continuous and strictly increasing marginal distributions, they postulate that quantiles are perfectly ranked so $Y_1 = F_{1,X}^{-1}(F_{0,X}(Y_0))$, where $F_{1,X} = F_1(y_1 \mid X)$ and $F_{0,X} = F_0(y_0 \mid X)$. This assumption generates a deterministic relationship that turns out to be the tight upper bound of the Fréchet bounds. An alternative assumption is that people are perfectly inversely ranked so the best in one distribution is the worst in the other: $Y_1 = F_{1,X}^{-1}(1 - F_{0,X}(Y_0))$. This is the tight Fréchet lower bound. More generally, one can associate quantiles across distributions more freely. Heckman et al. (1997) use Markov transition kernels that stochastically map quantiles of one distribution into quantiles of another. They define a pair of Markov kernels $M(y_1, y_0 \mid X)$ and $\tilde{M}(y_0, y_1 \mid X)$ such that

\[ F_1(y_1 \mid X) = \int M(y_1, y_0 \mid X)\, dF_0(y_0 \mid X) \]

\[ F_0(y_0 \mid X) = \int \tilde{M}(y_0, y_1 \mid X)\, dF_1(y_1 \mid X) \]

³ See Heckman and Smith (1993, 1998) and Heckman et al. (1997).

Allowing these operators to be degenerate produces a variety of deterministic transformations, including the two previously presented, as special cases of a general mapping. Different $(M, \tilde{M})$ pairs produce different joint distributions.⁴ These stochastic or deterministic transformations supply the missing information needed to construct the joint distributions.

A perfect ranking (or perfect inverse ranking) assumption is convenient. It generalizes the perfect-ranking, constant-shift assumptions implicit in the conventional literature. It allows us to apply conditional quantile methods to estimate the distributions of gains.⁵ However, it imposes a strong and arbitrary dependence across distributions. Our empirical analysis shows that this assumption is at odds with data on the returns to education.
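The two rank-based couplings can be illustrated on finite samples by pairing order statistics. The sketch below is an empirical analogue only, assuming two equal-sized samples from the marginal distributions; the function name `gains_under_rank_coupling` and the toy data are hypothetical.

```python
import numpy as np

def gains_under_rank_coupling(y0, y1, inverse=False):
    """Pair two equal-sized samples by rank and return the implied gains.

    inverse=False: perfect ranking, the k-th smallest y0 is paired
                   with the k-th smallest y1.
    inverse=True:  perfect inverse ranking, the k-th smallest y0 is
                   paired with the k-th largest y1.
    The gains y1 - y0 are returned in order of ascending y0.
    """
    y0_sorted = np.sort(np.asarray(y0, dtype=float))
    y1_sorted = np.sort(np.asarray(y1, dtype=float))
    if y0_sorted.shape != y1_sorted.shape:
        raise ValueError("samples must have the same size")
    if inverse:
        y1_sorted = y1_sorted[::-1]
    return y1_sorted - y0_sorted

y0 = [1.0, 2.0, 4.0]
y1 = [3.0, 3.5, 9.0]
same_rank = gains_under_rank_coupling(y0, y1)
inverse_rank = gains_under_rank_coupling(y0, y1, inverse=True)
print(same_rank, inverse_rank)
```

Both couplings reproduce the same marginals and the same mean gain, yet in this toy example everyone gains under perfect ranking while one person loses under perfect inverse ranking.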

An alternative approach to constructing joint distributions due to Heckman and Honoré (1990), Heckman (1990), and Heckman and Smith (1998) uses the economics of the model by assuming that

\[ S = \mathbf{1}(\mu_s(Z) \ge e_s) \tag{2} \]

where $\mu_s(Z)$ is a mean net utility, $Z \perp\!\!\!\perp e_s$, and “$\mathbf{1}$” is a logical indicator (= 1 if the argument is valid; = 0 otherwise). In addition they assume that

\[ Y_1 = \mu_1(X) + U_1, \quad E(U_1) = 0 \]

\[ Y_0 = \mu_0(X) + U_0, \quad E(U_0) = 0 \]

where $(U_1, U_0) \perp\!\!\!\perp (X, Z)$.⁶ In the special case where $S = \mathbf{1}(Y_1 \ge Y_0)$ (the Roy model), Heckman and Honoré (1990) present conditions on $\mu_1$, $\mu_0$, and $X$ such that $F(U_1, U_0)$ and $\mu_1(X)$, $\mu_0(X)$, and hence $F(Y_0, Y_1 \mid X)$, are identified from data on choices ($S$), characteristics ($X$), and observed outcomes $Y = SY_1 + (1 - S)Y_0$. Buera (2002) extends their approach to nonseparable models with weaker exclusion restrictions.
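A small simulation can make the selection problem in the Roy special case concrete. The sketch below assumes, for illustration only, independent standard normal errors $(U_0, U_1)$ and constant means; the function name `simulate_roy`, the seed, and the sample size are hypothetical choices, not the specification estimated in the article.

```python
import numpy as np

def simulate_roy(n, mu0=0.0, mu1=0.0, seed=0):
    """Simulate potential outcomes and choices under S = 1(Y1 >= Y0).

    Errors are independent standard normal draws (illustrative assumption).
    Returns (y0, y1, s, y), where y = s*y1 + (1-s)*y0 is what is observed.
    """
    rng = np.random.default_rng(seed)
    y0 = mu0 + rng.standard_normal(n)
    y1 = mu1 + rng.standard_normal(n)
    s = (y1 >= y0).astype(int)        # Roy decision rule
    y = s * y1 + (1 - s) * y0         # observed outcome
    return y0, y1, s, y

y0, y1, s, y = simulate_roy(200_000)
# Mean of Y1 among those who choose state 1 versus the population mean of Y1.
print(y1[s == 1].mean(), y1.mean())
```

Because agents select the state with the higher outcome, the mean of $Y_1$ among those with $S = 1$ exceeds the population mean of $Y_1$ in this simulation, so $F(Y_1 \mid S = 1, X)$ is not a substitute for $F(Y_1 \mid X)$.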

Heckman (1990) and Heckman and Smith (1998) consider more general decision rules of the form (2) under the assumption that $(Z, X) \perp\!\!\!\perp (U_0, U_1, e_s)$ and the further conditions (i) $\mu_s(Z)$ is a nontrivial function of $Z$ conditional on $X$ and (ii) full support assumptions on $\mu_1(X)$, $\mu_0(X)$, and $\mu_s(Z)$. They establish nonparametric identification of $F(U_0, e_s)$, $F(U_1, e_s)$ up to a scale for $e_s$ and $\mu_1(X)$, $\mu_0(X)$, and $\mu_s(Z)$ suitably scaled.⁷ Hence, under their assumptions, they can identify $F(Y_0, S \mid X, Z)$ and $F(Y_1, S \mid X, Z)$ but not the joint distributions $F(Y_0, Y_1 \mid X)$ or $F(Y_0, Y_1, S \mid X, Z)$ unless the $U_0$, $U_1$, $e_s$ dependence is restricted.

⁴ Conditions under which $(M, \tilde{M})$ determine the joint distribution are presented in Rozanov (1982).

⁵ See, e.g., Heckman et al. (1997), or Athey and Imbens (2002).

⁶ Mean or median zero assumptions on $(U_0, U_1)$ are also used.

⁷ See their articles for exact conditions. Heckman and Smith (1998) present the most general set of conditions.

Aakvik et al. (1999, 2003) build on Heckman (1990) and Heckman and Smith (1998) by postulating a factor structure connecting $(U_0, U_1, e_s)$. Our work builds on their analysis so we describe its essential idea. Suppose that the unobservables follow a factor structure,

\[ U_0 = \alpha_0 \theta + \varepsilon_0, \quad U_1 = \alpha_1 \theta + \varepsilon_1, \quad e_s = \alpha_s \theta + \varepsilon_s \]

where $\theta \perp\!\!\!\perp (\varepsilon_0, \varepsilon_1, \varepsilon_s)$ and the $\varepsilon$’s are mutually independent. In their setup, $\theta$ is a scalar. $\theta$ can be an unobservable trait like ability or motivation that affects all outcomes. Because the factor loadings, $\alpha_0$, $\alpha_1$, $\alpha_s$, may be different, the factors may affect outcomes and choices differently. Recall that one can identify $F(U_0, e_s)$ and $F(U_1, e_s)$ under the conditions specified in Heckman and Smith (1998) and generalized in Theorems 1–3 below. Thus, one can identify $\mathrm{cov}(U_0, e_s) = \alpha_0 \alpha_s \sigma_\theta^2$ and $\mathrm{cov}(U_1, e_s) = \alpha_1 \alpha_s \sigma_\theta^2$ assuming finite variances and assuming $E(\theta) = 0$, $E(\theta^2) = \sigma_\theta^2$. With some normalizations (e.g., $\sigma_\theta^2 = 1$, $\alpha_s = 1$), under conditions specified in Section 5, we can nonparametrically identify the distribution of $\theta$ and the distributions of $\varepsilon_0$, $\varepsilon_1$, $\varepsilon_s$ (the last up to scale). With $\alpha_1$, $\alpha_0$, $\alpha_s$ and the distributions of $\theta$, $\varepsilon_0$, $\varepsilon_1$, $\varepsilon_s$ in hand, we can construct the joint distribution $F(Y_0, Y_1 \mid X)$.⁸
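The covariance logic of the one-factor structure can be checked numerically. The sketch below computes the covariance matrix implied by $U = \alpha\theta + \varepsilon$ and then recovers $\mathrm{cov}(U_0, U_1) = \alpha_0 \alpha_1 \sigma_\theta^2$ from the two identified covariances under the normalization $\sigma_\theta^2 = 1$, $\alpha_s = 1$. The function names and numerical values are hypothetical, and the sketch covers second moments only, not the nonparametric identification of the full distributions.

```python
import numpy as np

def implied_covariance(alpha, var_theta, var_eps):
    """Covariance matrix of (U0, U1, e_s) under a scalar-factor structure.

    alpha:     loadings (alpha_0, alpha_1, alpha_s)
    var_theta: variance of the factor theta
    var_eps:   variances of the mutually independent uniquenesses
    """
    alpha = np.asarray(alpha, dtype=float)
    var_eps = np.asarray(var_eps, dtype=float)
    return var_theta * np.outer(alpha, alpha) + np.diag(var_eps)

def cov_u0_u1(cov_u0_es, cov_u1_es):
    """cov(U0, U1) implied by cov(U0, e_s) and cov(U1, e_s).

    Assumes the normalization var_theta = 1 and alpha_s = 1, so that
    cov(U0, e_s) = alpha_0 and cov(U1, e_s) = alpha_1.
    """
    alpha0 = cov_u0_es
    alpha1 = cov_u1_es
    return alpha0 * alpha1

# Hypothetical loadings, with alpha_s = 1 and var_theta = 1.
sigma = implied_covariance([0.8, 1.5, 1.0], 1.0, [0.5, 0.7, 1.0])
print(cov_u0_u1(sigma[0, 2], sigma[1, 2]), sigma[0, 1])
```

The covariance between $U_0$ and $U_1$, which is never observed directly because no one is seen in both states, is recovered here from the two covariances with the choice equation.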

This article builds on this basic idea and extends it to a more general setting. We consider a model with multiple factors, multiple treatments, and multiple time periods. Outcome measures may be discrete or continuous. We follow the psychometric literature by adjoining measurement and choice equations to outcome equations to pin down the distribution of $\theta$. With this framework we can estimate all pairwise treatment effects in a multiple outcome setting. We also consider the benefits for identification of having access to imperfect measurements on vector $\theta$, which are observed for all persons independent of their treatment status. This model integrates the LISREL framework of Jöreskog (1977) into a model of discrete choice and a model of multiple treatment effects. We develop this model in Section 4 after presenting a more general framework for counterfactuals and treatment effects in a multi-outcome, possibly dynamic setting.

3. POLICY COUNTERFACTUALS FOR THE MULTIPLE OUTCOME CASE

This section defines policy counterfactuals for the multiple treatment case. For specificity, think of states as schooling levels and different ages as periods in the life cycle. Associated with each state $s$ (schooling level) is a vector of outcomes at age $a$ for person $\omega \in \Omega$ (a set of indices) with elements

\[ Y_{s,a}(\omega), \quad s = 1, \ldots, \bar{S}, \quad a = 1, \ldots, \bar{A}, \tag{3} \]

where there are $\bar{S}$ states and $\bar{A}$ ages. Associated with each person $\omega$ is a vector $X(\omega)$ of explanatory variables.

⁸ Aakvik et al. (1999) present other sets of identifying assumptions.

The ceteris paribus effect (or individual treatment effect) of a move from state $s'$ at age $a''$ to state $s$ at age $a$ is

\[ \Delta((s, a), (s', a''), \omega) = Y_{s,a}(\omega) - Y_{s',a''}(\omega) \tag{4} \]

Since it is usually not possible to observe the same person in both $s$ and $s'$, analysts often focus on estimating various population level versions of these parameters for different conditioning sets.⁹ In this article, we estimate the distributions of potential outcomes and parameters derived from these distributions, including the Average Treatment Effect,

\[ \mathrm{ATE}((s, a), (s', a''), x) = E(Y_{s,a} - Y_{s',a''} \mid X = x) \]

and the Marginal Treatment Effect, the average gain from moving from $s'$ to $s$ for those on the margin of indifference between $s$ and $s'$. We are interested in determining the joint distributions of the counterfactuals $\Delta((s, a), (s', a''), x)$ for different conditioning sets.

Associated with each treatment or state (schooling choice) is a choice equation associated with a level of lifetime utility: $V_s(\omega)$, $s = 1, \ldots, \bar{S}$. Utilities are assumed to be absolutely continuous. Agents select treatment states (schooling levels) $\tilde{s}$ to maximize utility:

\[ \tilde{s} = \arg\max_{s} \{V_s(\omega)\}_{s=1}^{\bar{S}} \tag{5} \]

Associated with choices are explanatory variables $Z(\omega)$. A distinctive feature of the econometric approach to program evaluation is that it evaluates policies both in terms of objective outcomes (the $Y_{s,a}(\omega)$) and in terms of subjective outcomes (the utilities of the agents making the choices). Both subjective and objective evaluations are useful in evaluating policy. Choice theory is also used to guide and rationalize specific choices of estimators. It enables us to separate variability from intrinsic uncertainty, as we demonstrate below.

This framework is sufficiently general to encompass a variety of choice processes including sequential dynamic programming models¹⁰ and ordered choice models,¹¹ as well as more general unordered choice models. We let $D_s = 1$ if treatment $s$ is selected. Since there are $\bar{S}$ mutually exclusive states, $\sum_{s=1}^{\bar{S}} D_s = 1$.
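The choice rule in (5) and the indicators $D_s$ can be sketched directly. The function below assumes the utilities $V_s(\omega)$ are supplied as an array with one row per person and one column per state; the function name `choice_indicators` and the numbers are hypothetical, and states are indexed from 0 in the code rather than from 1 as in the text.

```python
import numpy as np

def choice_indicators(utilities):
    """Utility-maximizing state and indicators D_s for each person.

    utilities: array of shape (n_persons, n_states) holding V_s(omega).
    Returns (chosen, d), where chosen[i] is the index of person i's best
    state and d[i, s] = 1 if state s is selected, 0 otherwise.
    Ties are resolved in favor of the lowest index; with absolutely
    continuous utilities, ties occur with probability zero.
    """
    v = np.asarray(utilities, dtype=float)
    chosen = np.argmax(v, axis=1)
    d = np.zeros(v.shape, dtype=int)
    d[np.arange(v.shape[0]), chosen] = 1
    return chosen, d

v = [[0.1, 2.0, 1.0],
     [3.0, -1.0, 0.5]]
chosen, d = choice_indicators(v)
print(chosen, d.sum(axis=1))
```

Each row of `d` sums to one, which is the adding-up condition on the indicators stated above.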

In this notation, the marginal treatment effect for choices $s$ and $s'$ is

\[ \mathrm{MTE}(a, V_{s,s'}) = E(Y_{s,a} - Y_{s',a} \mid V_s = V_{s'} = V_{s,s'} \ge V_j,\ j \ne s, s') \tag{6} \]

It is the average gain of going from $s'$ to $s$ at age $a$ for persons indifferent between $s$ and $s'$ given that $s$ and $s'$ are the best two choices in the choice set, and that their level of utility is $V_{s,s'}$.

<sup>9</sup> Heckman and Smith (1998) and Heckman et al. (1999) discuss conditions under which it is possible to estimate (4).<br />

<sup>10</sup> See Eckstein and Wolpin (1999) and Keane and Wolpin (1997).<br />

<sup>11</sup> See Cameron and Heckman (1998) and Hansen et al. (2001).


368 CARNEIRO, HANSEN, AND HECKMAN<br />

Aggregating over choices $s' = 1, \ldots, \bar{S}$; $s' \neq s$, we may define the marginal treatment effect over all origin states as<br />

$$\mathrm{MTE}_s\left(a, \{V_{s,s'}\}_{s'=1,\, s' \neq s}^{\bar{S}}\right) = \frac{\displaystyle\sum_{s'=1,\, s' \neq s}^{\bar{S}} \mathrm{MTE}(a, V_{s,s'})\, f\left(V_s, V_{s'} \mid V_s = V_{s'} = V_{s,s'} \geq V_j,\; j \neq s, s'\right)}{\psi\left(a, \{V_{s,s'}\}_{s'=1,\, s' \neq s}^{\bar{S}}\right)} \tag{7}$$<br />

the weighted average of the pairwise marginal treatment effects from all source states to $s$ (at a given level of utility $V_{s,s'}$), with the weights being the density of persons at each relevant margin for specified values of utility, where<br />

$$\psi\left(a, \{V_{s,s'}\}_{s'=1,\, s' \neq s}^{\bar{S}}\right) = \sum_{s'=1,\, s' \neq s}^{\bar{S}} f\left(V_s, V_{s'} \mid V_s = V_{s'} = V_{s,s'} \geq V_j,\; j \neq s, s'\right)$$<br />

is a normalizing constant (the population density of people at all margins given $a$ and $\{V_{s,s'}\}_{s'=1,\, s' \neq s}^{\bar{S}}$), assumed to be positive.<br />
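The aggregation over origin states is a density-weighted average, which can be sketched in a few lines. This is a minimal illustration only: the function name `aggregate_mte`, the dictionary layout, and all numbers are assumptions made for the example, not quantities from this article.<br />

```python
def aggregate_mte(pairwise_mte, margin_density):
    """Weighted average of pairwise MTEs over origin states s' != s.

    pairwise_mte[s_prime]   -- MTE(a, V_{s,s'}) for origin state s'
    margin_density[s_prime] -- density of persons at the (s, s') margin
    Both containers are hypothetical inputs chosen for illustration.
    """
    if set(pairwise_mte) != set(margin_density):
        raise ValueError("inputs must cover the same origin states")
    psi = sum(margin_density.values())  # normalizing constant psi
    if psi <= 0:
        raise ValueError("psi is assumed to be positive")
    weighted = sum(pairwise_mte[k] * margin_density[k] for k in pairwise_mte)
    return weighted / psi


# Hypothetical values for destination state s = 3 with origin states 1, 2, 4.
example_mte = {1: 0.10, 2: 0.04, 4: 0.20}
example_density = {1: 0.5, 2: 0.3, 4: 0.2}
example_result = aggregate_mte(example_mte, example_density)
```

Margins with few people near indifference receive little weight, so a large pairwise effect at a thinly populated margin moves the aggregate only slightly.<br />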

We next present a framework for estimating the distributions of the treatment effects and the parameters derived from them, which allows us to estimate the parameters defined in this section as well as other parameters. To simplify the notation, we suppress the $\omega$ argument in the rest of the article.<br />

4. FACTOR STRUCTURE MODELS<br />

The strategy adopted in this article identifies the distribution of counterfactuals by postulating a low-dimensional set of factors $\theta$ so that, conditional on them and the covariates $X$ and $Z$, the $Y_{s,a}$ and $V_{s'}$ are jointly independent for all $s$, $s'$, and $a$. The distributions of the components of $\theta$ are nonparametrically identified under the conditions specified below. With these distributions in hand, it is possible to construct the distribution of counterfactuals. Under the conditions specified in Section 5, it is possible with low-dimensional factors to nonparametrically identify the counterfactual distributions and to estimate all of the treatment effects in the literature suitably extended to multidimensional versions.<br />

Throughout this article we analyze a separable-in-the-errors system. Thus, preferences can be described by<br />

$$V_s = \mu_s(Z) - e_s, \qquad s = 1, \ldots, \bar{S}. \tag{8}$$<br />

It is conventional to assume that $\mu_s(Z) = Z'\beta_s$ with $s = 1, \ldots, \bar{S}$. Linear approximations to value functions are advocated by Heckman (1981) and Eckstein and Wolpin (1989) and are developed systematically in Geweke et al. (2001). Our approach does not require linearity but critically relies on separability between the deterministic portion of the model and the errors $e_s$. Following Heckman (1981),


EFFECTS OF UNCERTAINTY ON COLLEGE CHOICE 369<br />

Cameron and Heckman (1987, 1998), and McFadden (1984), write<br />

$$e_s = \alpha_s'\theta + \varepsilon_s \tag{9}$$<br />

where $\theta$ is a $K \times 1$ vector of mutually independent factors ($\theta_l \perp\!\!\!\perp \theta_{l'}$, $l \neq l'$). Define $\varepsilon^s = (\varepsilon_1, \ldots, \varepsilon_S)$ and assume<br />

$$\theta \perp\!\!\!\perp \varepsilon^s, \qquad \varepsilon_s \perp\!\!\!\perp \varepsilon_{s'} \quad \forall\, s, s' = 1, \ldots, \bar{S} \text{ and } s \neq s' \tag{10}$$<br />

$E(\theta) = 0$; $E(\varepsilon_s) = 0$; $D_k = 1$ if $V_k$ is maximal in $\{V_s(Z)\}_{s=1}^{S}$.<sup>12</sup><br />

Potential outcomes at age $a$, $Y^*_{s,a}$, are stochastically dependent on each other and the choices only through their dependence on the observables $X$, $Z$ and the factors $\theta$:<br />

$$Y^*_{s,a} = \mu_{s,a}(X) + \alpha_{s,a}'\theta + \varepsilon_{s,a} \tag{11}$$<br />

where $E(\varepsilon_{s,a}) = 0$.<br />

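Potential outcomes are separable in observables and unobservables. A linear-in-parameters version is written as $\mu_{s,a}(X) = X'\beta_{s,a}$. Define $\varepsilon^Y = (\varepsilon_{1,1}, \ldots, \varepsilon_{1,A}, \ldots, \varepsilon_{s,1}, \ldots, \varepsilon_{s,A}, \ldots, \varepsilon_{S,A})$ and assume<br />

$$\theta \perp\!\!\!\perp \varepsilon^Y \tag{12}$$<br />

$$\varepsilon_{s,a} \perp\!\!\!\perp \varepsilon_{s',a''}; \quad \forall\, s \neq s'; \; \forall\, a, a'' \tag{13}$$<br />

and<br />

$$\varepsilon_{s,a} \perp\!\!\!\perp \varepsilon_{s'}; \quad \forall\, s', s = 1, \ldots, \bar{S}; \; a = 1, \ldots, \bar{A} \tag{14}$$<br />

$$(Z, X) \perp\!\!\!\perp (\theta, \varepsilon^Y, \varepsilon^s) \tag{15}$$<br />

The $Y^*_{s,a}$ may be vector valued.<br />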

When the outcome is continuous, the observed value corresponds to the latent variable ($Y_{s,a} = Y^*_{s,a}$). When the outcome is discrete (e.g., employment status), we interpret $Y^*_{s,a}$ in (11) as a latent variable. In that case, $Y_{s,a}$ is an indicator function $Y_{s,a} = \mathbf{1}(Y^*_{s,a} \geq 0)$. Tobit and other censored cases can be accommodated. Other mixed discrete-continuous cases can be handled in a conventional fashion.<sup>13</sup><br />
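To make the factor structure in (11) concrete, the following sketch simulates two latent outcomes that share a single factor and then forms a binary indicator from one of them. It is a hedged illustration under strong assumptions that the article does not impose: one factor, standard normal $\theta$ and $\varepsilon$, no covariates, and hypothetical loadings. All function and variable names are invented for the example.<br />

```python
import random


def simulate_latent_outcomes(n, mu, alpha, seed=0):
    """Draw n vectors Y* = mu + alpha * theta + eps with one N(0, 1) factor.

    Normality and the single factor are assumptions of this sketch only.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)  # common factor, unobserved by the analyst
        draws.append([m + a * theta + rng.gauss(0.0, 1.0)
                      for m, a in zip(mu, alpha)])
    return draws


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


latent = simulate_latent_outcomes(50_000, mu=[0.0, 0.0], alpha=[1.0, 0.5])
y1_star = [row[0] for row in latent]
y2_star = [row[1] for row in latent]

# The factor is the only source of dependence across outcomes, so the
# covariance should be close to alpha_1 * alpha_2 * var(theta) = 0.5.
cov_12 = sample_cov(y1_star, y2_star)

# Discrete case: only the indicator 1(Y* >= 0) is observed.
y1_binary = [1 if y >= 0 else 0 for y in y1_star]
share_positive = sum(y1_binary) / len(y1_binary)
```

Because the uniquenesses are drawn independently, any covariance between the two latent outcomes in this simulation comes from the shared factor alone.<br />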

One motivation for the factor representation is that agents may observe components of $\theta$ (or variables that span those components) and act on them (e.g., choose schooling levels), whereas the econometrician does not observe $\theta$. Below, we present methods for testing whether agents observe some or all components of<br />

<sup>12</sup> In the case of ties, use the choice with the lowest index.<br />

<sup>13</sup> See Hansen et al. (2003) for duration models with general forms of dependence functions generated by this type of model.



$\theta$. Conditional on $\theta$ and $X$, the potential outcomes are independent. If (12)–(15) accurately describe the data-generating process, we obtain the conditional independence assumptions used in matching (see, e.g., Cochran and Rubin, 1973; Rosenbaum and Rubin, 1983).<br />

In matching it is assumed that $Y_{s,a} \perp\!\!\!\perp D_s \mid X, Z, \theta$ for all $s$.<sup>14</sup> From this assumption, we can identify ATE from the right-hand side of the following expression for continuous observed outcomes, which can be constructed if $\theta$ is observable,<br />

$$E(Y_{s,a} - Y_{s',a} \mid X, Z, \theta) = E(Y_{s,a} \mid X, Z, \theta, D_s = 1) - E(Y_{s',a} \mid X, Z, \theta, D_{s'} = 1)$$<br />

In this case treatment on the treated, ATE, and MTE are the same parameter conditional on $\theta$, $X$, and $Z$ (Heckman, 2001; Aakvik et al., 2003). Our framework differs from matching by allowing the factors that generate the conditional independence that underlies matching to be unobserved by the analyst. In this sense, our approach is more robust than matching. The price for this robustness is the assumed independence between $\theta$ and $(X, Z)$.<br />
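The role of conditioning on $\theta$ can be seen in a small simulation. The sketch below is an illustration under invented assumptions (two states labeled 1 and 0, a binary factor, normal errors, no covariates); it treats $\theta$ as observed, which is exactly what matching needs and what the factor approach does not require. The data-generating process and all names are hypothetical.<br />

```python
import random


def simulate_matching(n=40_000, seed=1):
    """Two states (1 and 0); selection depends on a binary factor theta."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.choice([-1.0, 1.0])
        y1 = 1.0 + theta + rng.gauss(0.0, 1.0)   # potential outcome, state 1
        y0 = 0.5 * theta + rng.gauss(0.0, 1.0)   # potential outcome, state 0
        d = 1 if theta + rng.gauss(0.0, 1.0) > 0 else 0  # choice uses theta
        rows.append((theta, d, y1 if d == 1 else y0))    # one outcome observed
    return rows


def mean(values):
    return sum(values) / len(values)


rows = simulate_matching()

# Naive contrast ignores theta and is contaminated by selection.
naive = (mean([y for _, d, y in rows if d == 1])
         - mean([y for _, d, y in rows if d == 0]))

# Matching contrast: compare states within each theta cell, then average
# the cell contrasts over the distribution of theta.
matched = 0.0
for value in (-1.0, 1.0):
    cell = [(d, y) for t, d, y in rows if t == value]
    contrast = (mean([y for d, y in cell if d == 1])
                - mean([y for d, y in cell if d == 0]))
    matched += contrast * len(cell) / len(rows)
```

In this design the average effect of moving from state 0 to state 1 is 1 by construction. The contrast within $\theta$ cells should land near that value, while the naive contrast is pushed upward because persons with high $\theta$ select into state 1.<br />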

Factor structure models are notorious for being identified by arbitrary normalization and exclusion restrictions. To reduce this arbitrariness and render greater interpretability to estimates obtained from our model, we adjoin a measurement system to choice equations (8) and outcome system (11). Various measurements can be interpreted as indicators of specific factors (e.g., test scores may proxy ability). Having measurements on the factors also facilitates identifiability under weaker assumptions as we demonstrate in Section 5. However, measurements are not strictly required for identification in our model. Outcome, measurement, and choice equations are interchangeable sources of identification in a sense that we make precise in Section 5.<br />

Consider a system of $L$ measurements on the $K$ factors, initially assumed to be for continuous outcome measures:<br />

$$\begin{aligned} M_1 &= \mu_1(X) + \beta_{11}\theta_1 + \cdots + \beta_{1K}\theta_K + \varepsilon^M_1 \\ &\;\;\vdots \\ M_L &= \mu_L(X) + \beta_{L1}\theta_1 + \cdots + \beta_{LK}\theta_K + \varepsilon^M_L \end{aligned} \tag{16}$$<br />

where $\varepsilon^M = (\varepsilon^M_1, \ldots, \varepsilon^M_L)$, $E(\varepsilon^M) = 0$, and where we assume $\theta \perp\!\!\!\perp \varepsilon^M$, $\theta \perp\!\!\!\perp \varepsilon^s$, $\varepsilon^s \perp\!\!\!\perp (\varepsilon^M, \varepsilon^Y)$, $\varepsilon^M_i \perp\!\!\!\perp \varepsilon^M_j$ $\forall\, i \neq j$, and $i, j = 1, \ldots, L$. For interpretability, we assume $\theta_i \perp\!\!\!\perp \theta_j$, $\forall\, i \neq j$, $i, j = 1, \ldots, K$. We develop the case with discrete measurements on latent continuous variables in Section 5. One can think of the outcome measures as an $s$-dependent measurement system. The measures (16) are the same across all $s$.<br />

Measurement system (16) allows for fallible measures of outcomes. Thus, in our schooling choice analysis we are not committed to the infallibility of test<br />

<sup>14</sup> Strictly speaking, matching models do not distinguish between $X$ and $Z$. See Heckman and Navarro (2003).



scores as measurements of ability. Measurement system (16) allows us to proxy unobservables while accounting for measurement error, and hence enables us to improve on the proxy procedure of Olley and Pakes (1996), which assumes no measurement error.<br />

4.1. Choice Equations. Our analysis applies to both ordered discrete choice models and unordered choice models as analyzed by Cameron and Heckman (1998) and Hansen et al. (2001). In this article, we focus attention on a new ordered choice model. Other choice models can easily be accommodated in our framework and richer models are a source of additional identifying information.<sup>15</sup><br />

For an ordered discrete choice model, let utility index $I$ be written as<br />

$$I = \varphi(Z) + \varepsilon_W, \qquad \varepsilon_W = \gamma'\theta + \varepsilon_I, \qquad \sigma_W^2 = \gamma'\Sigma_\theta\gamma + \sigma_I^2 \tag{17}$$<br />

where $E(\varepsilon_I^2) = \sigma_I^2$, and $\Sigma_\theta$ is the covariance matrix of $\theta$. The linear-in-parameters version developed in this article is written as $\varphi(Z) = Z\eta$. Choices are generated by the index $I$ falling in various intervals:<br />

$$\begin{aligned} D_1 = 1 &\text{ if } -\infty < I \leq c_1 \\ D_s = 1 &\Leftrightarrow c_{s-1} < I \leq c_s, \qquad s = 2, \ldots, \bar{S} - 1 \\ D_S = 1 &\text{ if } c_{S-1} < I < \infty \end{aligned} \tag{18}$$<br />

where $c_0 = -\infty$. It is required that $c_s \geq c_{s-1}$ for all $s \geq 2$. This is a special case of random utility model (8) in which states are ordered and pairwise contrasts possess a special structure.<sup>16</sup><br />
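The interval rule in (18) maps a value of the index into a state. A minimal sketch follows, assuming the cutoffs are supplied as a nondecreasing list $c_1, \ldots, c_{S-1}$; the function name `ordered_choice` and the example cutoffs are hypothetical.<br />

```python
from bisect import bisect_left


def ordered_choice(index, cutoffs):
    """Return the state s with c_{s-1} < I <= c_s.

    cutoffs holds c_1, ..., c_{S-1}; c_0 = -inf and c_S = +inf are implicit.
    """
    if any(b < a for a, b in zip(cutoffs, cutoffs[1:])):
        raise ValueError("cutoffs must satisfy c_s >= c_{s-1}")
    # bisect_left counts the cutoffs strictly below the index, so an index
    # exactly equal to c_s is assigned to state s, matching I <= c_s.
    return bisect_left(cutoffs, index) + 1


# Hypothetical cutoffs for S = 4 states.
example_cutoffs = [-1.0, 0.5, 2.0]
example_states = [ordered_choice(i, example_cutoffs)
                  for i in (-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 2.5)]
```

If two adjacent cutoffs coincide, the interval between them is empty and the corresponding state is never selected, which is consistent with the weak inequality $c_s \geq c_{s-1}$.<br />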

We can parameterize the $c_s$ to be functions of state-specific regressors, e.g., $c_s = Q_s\rho_s$ where we restrict $c_s \geq c_{s-1}$. We could also follow a suggestion in Heckman et al. (1999) to incorporate one-sided shocks $\nu_s$ and work with stochastic thresholds $\tilde{c}_s$ in place of $c_s$: $\tilde{c}_s = c_s + \nu_s$, $s = 1, \ldots, S - 1$, where $\nu_s \geq \nu_{s-1}$ and $\nu_s \geq 0$.<sup>17</sup><br />

<sup>15</sup> See Heckman and Navarro (2001) for a comparison among alternative models of completed schooling. Hansen et al. (2003) develop a parallel analysis for a one-factor multinomial choice model.<br />

<sup>16</sup> Specifically, it is assumed for (8) that $\mu_s(Z)$ is concave in $s$ for each $Z$ (Cameron and Heckman, 1998), that<br />

$$e_s - e_{s-1} = \tau, \qquad s = 2, \ldots, S$$<br />

with $e_1$ as an initial condition, that<br />

$$\mu_s(Z) - \mu_{s-1}(Z) = \varphi(Z) + c_{s-1} \tag{$*$}$$<br />

with $\mu_1(Z)$ as an initial condition and that $c_s \geq c_{s-1}$ for all $s = 1, \ldots, \bar{S}$. Changes in utilities across states are independent of $s$, except for an intercept. Then in (17) $\varepsilon_W = \tau + e_1$. If we set all of the i.i.d. components of (9) to zero (the uniquenesses $\varepsilon_s$) we get the ordered probit model. As noted in the text, and developed in Heckman and Navarro (2001), we can generalize this model to allow $e_s - e_{s-1} = \tau + \chi_s$ where $\chi_s \geq 0$ is a one-sided random variable and still secure identification. The requirement ($*$) precludes a strict random utility model because preferences are state specific. (The strict random utility model requires that $\mu_s(Z)$ not depend on $s$ but $Z$ can vary across $s$. See, e.g., Matzkin, 1993.)<br />

<sup>17</sup> Write $\nu_s = \sum_{j=2}^{s} \rho_j$, where $\rho_j \perp \rho_{j'}$ ($j \neq j'$), $\rho_j \perp \varepsilon_W$, $\rho_j \perp (Z, Q)$, $\rho_j \geq 0$, $\varphi(Z) = Z'\eta$. This model is identified under the assumptions in Cameron and Heckman (1998) even without any



Conditioning on $Q_s = q_s$, $s = 1, \ldots, \bar{S}$, and assuming that $\mathrm{Support}(Z \mid Q_s = q_s, s = 1, \ldots, \bar{S}) = \mathrm{Support}(\varepsilon_W)$, we can apply the conditions presented in Cameron and Heckman (1998) to identify the distribution $F_{\varepsilon_W}$ together with $\eta, c_1, \ldots, c_{S-1}$ up to scale. We can nonparametrically identify $c_s(Q_s)$ over the support of $Q_s$ under conditions specified in Theorem 2 below. Unlike the case of the more general unordered discrete choice model (see Elrod and Keane, 1995; Ben Akiva et al., 2001), without further restrictions on the distribution of $\varepsilon_W$, we cannot identify the factors generating $\varepsilon_W$ using only choice data. Hansen et al. (2003) present an analysis parallel to the one given here for a multinomial probit model. In that model, the distributions of factors can be identified from choice data.<br />

4.2. Models for Factors. Factor models are notorious for being identified through arbitrary assumptions about how factors enter in different equations. This led to their disuse after their introduction into economics by Jöreskog and Goldberger (1975), Goldberger (1972), Chamberlain and Griliches (1975), and Chamberlain (1977a, 1977b).<br />

The essential identification problem in factor analysis is clearly stated by Anderson and Rubin (1956). If there are $L$ measurements on $K$ mutually independent factors arrayed in a vector $\theta$, we may write outcomes $G$ in terms of latent variables $\theta$ as<br />

$$G = \mu + \Lambda\theta + \varepsilon \tag{19}$$<br />

where $G$ is $L \times 1$, $\theta \perp\!\!\!\perp \varepsilon$, $\mu$ is an $L \times 1$ vector of means, which may depend on $X$, $\theta$ is $K \times 1$, $\varepsilon$ is $L \times 1$, and $\Lambda$ is $L \times K$. $\varepsilon_i \perp\!\!\!\perp \varepsilon_j$, $i, j = 1, \ldots, L$, $i \neq j$. At this<br />

exclusion restrictions, so $Q_s$ can just include an intercept. The proof is trivial. Normalize $\rho_1 = 0$. From the first choice we compute,<br />

$$\Pr(D_1 = 1 \mid Z) = \Pr(Z'\eta + \varepsilon_W \leq c_1)$$<br />

so we can identify $f(\varepsilon_W)$ and $\eta$ up to scale $\sigma_W$, assuming $\varepsilon_W$ and the $\rho_j$ have densities with respect to the Lebesgue measure and nonvanishing characteristic function in addition to other standard regularity conditions. We suppress the intercept in $Z$. One cannot distinguish the intercept from $c_1$. Proceeding to further choices we obtain<br />

$$\Pr(D_1 + D_2 = 1 \mid Z) = \Pr(Z'\eta + \varepsilon_W \leq c_2 + \nu_2) = \Pr(\varepsilon_W - \nu_2 \leq c_2 - Z'\eta)$$<br />

Therefore, we can identify $f(\varepsilon_W - \nu_2)$ and $c_2$ up to scale $\sigma_{\varepsilon_W - \nu_2}$. The scale is determined by the first normalization (relative to $\sigma_{\varepsilon_W}$). We can estimate<br />

$$\frac{\sigma_{\varepsilon_W - \nu_2}}{\sigma_{\varepsilon_W}} = \left(\frac{\sigma^2_{\varepsilon_W} + \sigma^2_{\nu_2}}{\sigma^2_{\varepsilon_W}}\right)^{1/2}$$<br />

by taking, for any coordinate of $\eta$, the ratio of the normalized $\eta$ from the first choice probability to the normalized $\eta$ from the second choice probability. Define $\psi(X)$ as the characteristic function of $X$. From the assumed independence of $\varepsilon_W$ and $\nu_2$, $\psi(\varepsilon_W - \nu_2) = \psi(\varepsilon_W)\psi(-\nu_2)$. Therefore, we can identify<br />

$$\frac{\psi(\varepsilon_W - \nu_2)}{\psi(\varepsilon_W)} = \psi(-\nu_2),$$<br />

and we can determine $f(-\nu_2)$ from the convolution theorem adopting a normalization for $\sigma_W$. Proceeding sequentially, we obtain $\Pr(D_1 + D_2 + D_3 + \cdots + D_k = 1 \mid Z) = \Pr(Z'\eta + \varepsilon_W \leq c_k + \nu_k)$, and can identify $c_k$ and $f(\nu_k)$ up to the normalization given in the first step. From $f(\nu_k)$, we can use deconvolution to identify $f(\rho_j)$, $j = 2, \ldots, S$. See Heckman and Navarro (2001) for further details and extensions to factor models. Nowhere in this analysis do we use the assumption that $Q_s$ contains regressors.



point, $\varepsilon$ is a general notation that will be linked to specific $\varepsilon$'s in Section 5. Even if $\theta_i \perp\!\!\!\perp \theta_j$, $i \neq j$, $i, j = 1, \ldots, K$, the model is underidentified. As we will see, the $G$ in this article is a more general system than the system based solely on measurements invariant across states $M$, so we distinguish (16) and (19). It will include $M$ as well as state-dependent outcomes ($Y^*_{s,a}$) and the indices generating choice equations.<br />

Using only the information in the covariance matrices, as is common in factor analysis,<br />

$$\operatorname{cov}(G) = \Lambda\Sigma_\theta\Lambda' + D_\varepsilon \tag{20}$$<br />

where $\Sigma_\theta$ is a diagonal matrix of the variances of the factors, and $D_\varepsilon$ is a diagonal matrix of the “uniqueness” variances. We observe $G$ but not $\theta$ or $\varepsilon$, and we seek to identify $\Lambda$, $\Sigma_\theta$, and $D_\varepsilon$. Without some restrictions, this is clearly an impossible task. Conventional factor-analytic models make assumptions to identify parameters. The restriction that the components of $\theta$ are independent is one restriction that we have already made, but it is not enough. The diagonals of $\operatorname{cov}(G)$ combine elements of $D_\varepsilon$ with parameters from the rest of the model. Once those other parameters are determined, the diagonals identify $D_\varepsilon$. Accordingly, we can only rely on the $L(L-1)/2$ nondiagonal elements to identify the $K$ variances (assuming $\theta_i \perp\!\!\!\perp \theta_j$, $\forall\, i \neq j$) and the $L \times K$ factor loadings. Since the scale of each $\theta_i$ is arbitrary, one factor loading devoted to each factor is normalized to unity to set the scale. Accordingly, we require that<br />

$$\underbrace{\frac{L(L-1)}{2}}_{\text{Number of off-diagonal covariance elements}} \;\geq\; \underbrace{(L \times K - K)}_{\text{Number of unrestricted } \Lambda} + \underbrace{K}_{\text{Variances of } \theta}$$<br />

so<br />

$$L \geq 2K + 1$$<br />

is a necessary condition for identification.<br />
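The counting argument can be checked mechanically. The sketch below compares the number of off-diagonal covariance elements with the number of free loadings and factor variances; the function names are hypothetical and the check covers only this necessary order condition, not sufficiency.<br />

```python
def order_condition(num_measurements, num_factors):
    """True when off-diagonal covariances are at least as numerous as unknowns."""
    off_diagonal = num_measurements * (num_measurements - 1) // 2
    free_loadings = num_measurements * num_factors - num_factors  # one per factor normalized
    factor_variances = num_factors
    return off_diagonal >= free_loadings + factor_variances


def min_measurements(num_factors):
    """Smallest L satisfying the order condition, L = 2K + 1."""
    return 2 * num_factors + 1
```

For example, with $K = 2$ factors the condition first holds at $L = 5$: there are 10 off-diagonal covariances against 8 free loadings and 2 factor variances.<br />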

The strategy pursued in this article is transparent and assumes that there are two or more elements of $G$ devoted exclusively to factor $\theta_1$, and at least three elements of $G$ that are generated by factor $\theta_1$, two or more other elements of $G$ devoted only to factors $\theta_1$ and $\theta_2$, with at least three elements of $G$ that depend on $\theta_1$ and $\theta_2$, and so forth. This strategy is motivated by our access to psychometric and longitudinal data. Test scores may only proxy ability ($\theta_1$). Other measurements may proxy only ($\theta_1, \theta_2$). Measurements on earnings from panel data may proxy ($\theta_1, \theta_2, \theta_3$), etc.<br />

Order $G$ under this assumption so that we get the following pattern for $\Lambda$ (we assume that the displayed $\lambda_{ij}$ are not zero):



$$\Lambda = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
\lambda_{21} & 0 & 0 & 0 & \cdots & 0 \\
\lambda_{31} & 1 & 0 & 0 & \cdots & 0 \\
\lambda_{41} & \lambda_{42} & 0 & 0 & \cdots & 0 \\
\lambda_{51} & \lambda_{52} & 1 & 0 & \cdots & 0 \\
\lambda_{61} & \lambda_{62} & \lambda_{63} & 0 & \cdots & 0 \\
\lambda_{71} & \lambda_{72} & \lambda_{73} & 1 & \cdots & 0 \\
\lambda_{81} & \lambda_{82} & \lambda_{83} & \lambda_{84} & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\lambda_{L,1} & \lambda_{L,2} & \lambda_{L,3} & \cdots & \cdots & \lambda_{L,K}
\end{pmatrix} \tag{21}$$<br />

Assuming nonzero covariances,<br />

$$\operatorname{cov}(g_j, g_l) = \lambda_{j1}\lambda_{l1}\sigma^2_{\theta_1}, \qquad l = 1, 2; \; j = 1, \ldots, L; \; j \neq l$$<br />

In particular,<br />

$$\operatorname{cov}(g_1, g_l) = \lambda_{l1}\sigma^2_{\theta_1}, \qquad \operatorname{cov}(g_2, g_l) = \lambda_{l1}\lambda_{21}\sigma^2_{\theta_1}$$<br />

Assuming $\lambda_{l1} \neq 0$ for some $l \geq 3$, we obtain<br />

$$\frac{\operatorname{cov}(g_2, g_l)}{\operatorname{cov}(g_1, g_l)} = \lambda_{21}$$<br />

Hence, from $\operatorname{cov}(g_1, g_2) = \lambda_{21}\sigma^2_{\theta_1}$, we obtain $\sigma^2_{\theta_1}$, and hence $\lambda_{l1}$, $l = 1, \ldots, L$. We can proceed to the next set of two measurements and identify<br />

$$\operatorname{cov}(g_l, g_j) = \lambda_{l1}\lambda_{j1}\sigma^2_{\theta_1} + \lambda_{l2}\lambda_{j2}\sigma^2_{\theta_2}, \qquad l = 3, 4; \; j \geq 3; \; j \neq l$$<br />

Since we know the first term on the right-hand side by the previous argument, we can proceed using $\operatorname{cov}(g_l, g_j) - \lambda_{l1}\lambda_{j1}\sigma^2_{\theta_1}$ and identify the $\lambda_{j2}$, $j = 1, \ldots, L$, using the previous line of reasoning (some of these elements are fixed to zero). Proceeding in this fashion, we can identify $\Lambda$ and $\Sigma_\theta$ subject to diagonal normalizations. This argument works for all but the system for the $K$th and final factor. Observe that for all of the preceding factors there are at least three measurements that depend on $\theta_j$, $j = 1, \ldots, K - 1$, although only two of the measurements need to depend solely on $\theta_1, \ldots, \theta_{K-1}$, respectively. To obtain the necessary three



measurements for the $K$th and final factor, we require that there be at least three outcomes with measurements that depend on $\theta_1, \ldots, \theta_K$.<br />
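The sequential argument can be traced numerically for a small case. The sketch below takes $L = 5$ and $K = 2$ with the zero and unit restrictions of pattern (21), builds the covariance matrix implied by hypothetical loadings and variances, and then recovers every parameter from off-diagonal covariances alone. It works with population covariances, so it illustrates the identification logic rather than an estimator; all names and numbers are assumptions of the example.<br />

```python
def implied_covariance(loadings, factor_vars, uniqueness_vars):
    """Covariance of G implied by independent factors and uniquenesses."""
    size = len(loadings)
    cov = [[sum(loadings[i][k] * loadings[j][k] * factor_vars[k]
                for k in range(len(factor_vars)))
            for j in range(size)] for i in range(size)]
    for i in range(size):
        cov[i][i] += uniqueness_vars[i]
    return cov


def recover_two_factor(cov):
    """Recover loadings and factor variances for L = 5, K = 2.

    Rows 1-2 load on factor 1 only (row 1 normalized to one); rows 3-5 load
    on both factors (row 3 normalized to one on factor 2). Indices are 1-based
    as in the text, and only off-diagonal covariances are used.
    """
    def c(i, j):
        return cov[i - 1][j - 1]

    lam21 = c(2, 3) / c(1, 3)          # ratio of covariances with g_3
    var1 = c(1, 2) / lam21             # cov(g_1, g_2) = lam21 * var1
    lam1 = {1: 1.0, 2: lam21}
    for j in (3, 4, 5):
        lam1[j] = c(1, j) / var1       # cov(g_1, g_j) = lam_j1 * var1

    def resid(i, j):                   # covariance net of the first factor
        return c(i, j) - lam1[i] * lam1[j] * var1

    lam42 = resid(4, 5) / resid(3, 5)
    var2 = resid(3, 4) / lam42
    lam52 = resid(3, 5) / var2
    return {"lam21": lam21, "lam31": lam1[3], "lam41": lam1[4],
            "lam51": lam1[5], "lam42": lam42, "lam52": lam52,
            "var1": var1, "var2": var2}


# Hypothetical parameter values used to generate the covariance matrix.
true_loadings = [[1.0, 0.0], [0.8, 0.0], [0.5, 1.0], [0.7, 1.2], [-0.4, 0.6]]
cov_g = implied_covariance(true_loadings, [2.0, 1.5], [1.0, 0.5, 0.8, 1.2, 0.9])
recovered = recover_two_factor(cov_g)
```

The second factor is handled exactly as the first once the contribution of $\theta_1$ has been subtracted, which is why three measurements depending on $\theta_2$ (here $g_3$, $g_4$, $g_5$) are needed for the final factor.<br />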

Knowing $\Lambda$ and $\Sigma_\theta$, we can identify $D_\varepsilon$. Use of dedicated measurement systems for specific factors and panel data helps to eliminate much of the arbitrariness that plagued factor analysis during its introduction in economics in the 1970s. Although many other restrictions on the model are possible, the one we adopt has the advantage of simplicity and interpretability in many contexts.<sup>18</sup><br />

Our analysis uses a version of (19), coupled with the exclusion restrictions exemplified in (21), to identify the joint distributions of counterfactuals. We extend conventional factor analysis in three ways. First, following Heckman (1981) and Muthen (1984), we allow the $G$ to include latent index functions like $I$ (associated with the choice equations) or like $Y^*_{s,a}$ as well as their manifestations (the random variables they generate). Thus, the $G$ may include discrete or censored random variables generated by latent random variables. We can identify components associated solely with the discrete case only up to unknown scale factors, the familiar indeterminacy in discrete choice analysis. Choice indices, measurements, and state contingent outcomes are all informative on $\theta$. The factor analysis in this article is conducted on the latent continuous variables that generate the manifest outcomes. Second, we extend factor analysis to a case with counterfactuals where certain variables are only observed if state $s$ is observed. This extension enables us to identify the full joint distribution of counterfactuals. Third, we prove nonparametric identification of the distributions of $\theta$ and $\varepsilon$, and do not rely on any normality assumptions.<br />

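<sup>18</sup> Other normalizations are possible. All require that there be at least three measurements on each factor, although we can get by with only one dedicated measurement. Consider the following example (due to Salvador Navarro): Let $L = 5$, $K = 2$.<br />

$$\begin{aligned} g_1 &= \theta_1 + \varepsilon_1, & g_2 &= \lambda_{21}\theta_1 + \theta_2 + \varepsilon_2 \\ g_3 &= \lambda_{31}\theta_1 + \lambda_{32}\theta_2 + \varepsilon_3, & g_4 &= \lambda_{41}\theta_1 + \lambda_{42}\theta_2 + \varepsilon_4 \\ g_5 &= \lambda_{51}\theta_1 + \lambda_{52}\theta_2 + \varepsilon_5 \end{aligned}$$<br />

Assuming nonvanishing covariances and factor loadings,<br />

$$\lambda_{32} = \frac{\operatorname{cov}(g_1, g_5)\operatorname{cov}(g_3, g_4) - \operatorname{cov}(g_3, g_5)\operatorname{cov}(g_1, g_4)}{\operatorname{cov}(g_2, g_4)\operatorname{cov}(g_1, g_5) - \operatorname{cov}(g_1, g_4)\operatorname{cov}(g_2, g_5)}$$<br />

if $\lambda_{22}\lambda_{42}\lambda_{51} - \lambda_{41}\lambda_{52} \neq 0$. Then<br />

$$\lambda_{41} = \frac{\operatorname{cov}(g_3, g_4) - \operatorname{cov}(g_2, g_4)\lambda_{32}}{\operatorname{cov}(g_1, g_3) - \operatorname{cov}(g_1, g_2)\lambda_{32}}$$<br />

if $\lambda_{31} \neq \lambda_{32}\lambda_{21}$.<br />

$$\lambda_{21} = \frac{\operatorname{cov}(g_1, g_2)\lambda_{41}}{\operatorname{cov}(g_1, g_4)}, \qquad \lambda_{31} = \frac{\operatorname{cov}(g_1, g_3)\lambda_{41}}{\operatorname{cov}(g_1, g_4)}, \qquad \lambda_{51} = \frac{\operatorname{cov}(g_1, g_5)\lambda_{41}}{\operatorname{cov}(g_1, g_4)}$$<br />

$$\sigma^2_{\theta_1} = \frac{\operatorname{cov}(g_1, g_4)}{\lambda_{41}}, \qquad \sigma^2_{\theta_2} = \frac{\operatorname{cov}(g_2, g_3) - \lambda_{21}\lambda_{31}\sigma^2_{\theta_1}}{\lambda_{32}}$$<br />

$$\lambda_{42} = \frac{\operatorname{cov}(g_2, g_4) - \lambda_{21}\lambda_{41}\sigma^2_{\theta_1}}{\sigma^2_{\theta_2}}, \qquad \lambda_{52} = \frac{\operatorname{cov}(g_2, g_5) - \lambda_{21}\lambda_{51}\sigma^2_{\theta_1}}{\sigma^2_{\theta_2}}$$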
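The closed-form expressions in the example of footnote 18 can be verified numerically. The sketch below generates the off-diagonal covariances implied by hypothetical loadings and factor variances, then applies the formulas in the order given. The parameter values are invented and chosen so that both nonvanishing conditions hold; the function names are likewise assumptions of the example.<br />

```python
def off_diagonal_cov(loadings, var1, var2):
    """Off-diagonal covariances of g_1..g_5 implied by two independent factors."""
    cov = {}
    for i in range(1, 6):
        for j in range(1, 6):
            if i != j:
                (a1, a2), (b1, b2) = loadings[i], loadings[j]
                cov[i, j] = a1 * b1 * var1 + a2 * b2 * var2
    return cov


def recover_single_dedicated(cov):
    """Apply the covariance formulas of the L = 5, K = 2 example in sequence."""
    lam32 = ((cov[1, 5] * cov[3, 4] - cov[3, 5] * cov[1, 4])
             / (cov[2, 4] * cov[1, 5] - cov[1, 4] * cov[2, 5]))
    lam41 = ((cov[3, 4] - cov[2, 4] * lam32)
             / (cov[1, 3] - cov[1, 2] * lam32))
    lam21 = cov[1, 2] * lam41 / cov[1, 4]
    lam31 = cov[1, 3] * lam41 / cov[1, 4]
    lam51 = cov[1, 5] * lam41 / cov[1, 4]
    var1 = cov[1, 4] / lam41
    var2 = (cov[2, 3] - lam21 * lam31 * var1) / lam32
    lam42 = (cov[2, 4] - lam21 * lam41 * var1) / var2
    lam52 = (cov[2, 5] - lam21 * lam51 * var1) / var2
    return {"lam21": lam21, "lam31": lam31, "lam32": lam32, "lam41": lam41,
            "lam42": lam42, "lam51": lam51, "lam52": lam52,
            "var1": var1, "var2": var2}


# Hypothetical loadings (factor 1, factor 2) for g_1, ..., g_5.
example_loadings = {1: (1.0, 0.0), 2: (0.8, 1.0), 3: (0.5, 0.9),
                    4: (0.7, 1.2), 5: (-0.4, 0.6)}
example_cov = off_diagonal_cov(example_loadings, var1=2.0, var2=1.5)
example_recovered = recover_single_dedicated(example_cov)
```

Only $g_1$ is dedicated to a single factor here, which is the point of the example: the remaining four measurements load on both factors, yet all loadings and both factor variances are recovered.<br />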



5. IDENTIFICATION OF SEMIPARAMETRIC FACTOR MODELS WITH DISCRETE<br />

CHOICES AND DISCRETE AND CONTINUOUS OUTCOMES<br />

In order to establish identification, we need to be clear about the raw data with which we are working. For each set of $s$-contingent potential outcomes, there is a system like (19): $\tilde{G}_s = (M, Y_s, D_s)$ where $Y_s$ is a vector of state contingent outcomes. Outcome variables in $\tilde{G}_s$ are of two types: (a) continuous variables and (b) discrete or censored random variables, including binary strings associated with durations (e.g., unemployment). When the random variables are discrete or censored, we work with the latent variables generating them. We array the continuous portions of $\tilde{G}_s$ and the index functions generating the discrete portions into $G_s$.<br />

Let $M^c$ denote the continuous measurements, and let $Y^c_s$ be the continuous counterfactual outcomes. Let $M^d$ be the discrete components of $M$, whereas $Y^d_s$ are the discrete components of $Y_s$. Table 1 defines the variables used in our analysis. Under separability the continuous variables can be written as<br />

$$M^c = \mu^c_m(X) + U^c_m$$<br />

$$Y^c_s = \mu^c_s(X) + U^c_s$$<br />

Associated with the discrete variables are latent continuous variables<br />

$$M^{*d} = \mu^d_m(X) + U^d_m$$<br />

$$Y^{*d}_s = \mu^d_s(X) + U^d_s$$<br />

where $U^d_m$, $U^d_s$ are assumed to be continuous.<sup>19</sup> The indicator variable is generated by latent variable $I$ as defined in (17).<br />

The data used for the factor analysis are $G_s = (M^c, M^{*d}, Y^c_s, Y^{*d}_s, I)$. For simplicity, in this article we assume that the “discrete” variables are in fact binary valued. Extensions to censored random variables and to binary strings are straightforward, and are developed in a later article. We observe $\tilde{G}_s$ when $D_s = 1$. For each $s$, we have a system of outcome variables. Although the outcomes are $s$-dependent, the measurements are observed independently of the value assumed by $D_s$.<br />

The distinction between measurements ($M$), whose values do not depend on the value assumed by $D_s$, and the state contingent outcomes $Y_s$, which depend on the state $s$ that is observed, is essential. There is no selection bias in observing $M$, but in general there is a selection bias in observing $Y = \sum_{s=1}^{S} D_s Y_s$.<br />
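The switching rule that produces the observed outcome can be written out directly. In this hedged sketch the four persons, their potential outcomes, and their choices are invented for illustration; the only point is that averaging $Y_2$ over those who select state 2 need not equal the population average of $Y_2$.<br />

```python
def observed_outcome(indicators, potential_outcomes):
    """Observed Y as the sum over s of D_s * Y_s, with exactly one D_s equal to one."""
    if any(d not in (0, 1) for d in indicators) or sum(indicators) != 1:
        raise ValueError("exactly one state indicator must equal one")
    return sum(d * y for d, y in zip(indicators, potential_outcomes))


# Hypothetical data: potential outcomes (Y_1, Y_2) and choices (D_1, D_2).
potential = [(10.0, 12.0), (11.0, 15.0), (9.0, 10.0), (12.0, 20.0)]
choices = [(1, 0), (0, 1), (1, 0), (0, 1)]

observed = [observed_outcome(d, y) for d, y in zip(choices, potential)]

# Mean of Y_2 among persons observed in state 2 versus in the full population.
selected_y2 = [y[1] for d, y in zip(choices, potential) if d[1] == 1]
mean_y2_selected = sum(selected_y2) / len(selected_y2)
mean_y2_population = sum(y[1] for y in potential) / len(potential)
```

A measurement $M$ would be recorded for all four persons regardless of their choice, so its sample mean is not subject to this kind of selection.<br />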

M, Y, <strong>an</strong>d D s , s = 1,...,S all contain information on θ. The information from<br />

M is easier to access, <strong>an</strong>d traditional factor <strong>an</strong>alysis is based on such measurements.<br />

Nonetheless, the identification <strong>of</strong> counterfactual states does not require M. IfM<br />

is available, however, the interpretation <strong>of</strong> θ is more tr<strong>an</strong>sparent.<br />
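The contrast between measurements and state-contingent outcomes can be illustrated with a small simulation. This is a sketch under assumed functional forms (a single normal factor, unit loadings, and a choice rule that depends on the factor); none of these values come from the article.

```python
import random

def simulate_selection(n=20000, seed=1):
    """Return (mean of M, population mean of Y_1, mean of Y_1 given D_1 = 1).

    All distributions and coefficients are hypothetical. M is recorded for
    everyone; Y_1 is recorded only for those who select state 1.
    """
    rng = random.Random(seed)
    m_all, y1_all, y1_selected = [], [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        m = theta + rng.gauss(0.0, 1.0)          # measurement, always observed
        y1 = 1.0 + theta + rng.gauss(0.0, 1.0)   # potential outcome in state 1
        d1 = theta + rng.gauss(0.0, 1.0) > 0.0   # choice depends on theta
        m_all.append(m)
        y1_all.append(y1)
        if d1:
            y1_selected.append(y1)
    mean = lambda v: sum(v) / len(v)
    return mean(m_all), mean(y1_all), mean(y1_selected)

mean_m, mean_y1_population, mean_y1_observed = simulate_selection()
print(mean_m, mean_y1_population, mean_y1_observed)
```

In this setup the sample mean of $M$ is close to its population value, while the mean of $Y_1$ among those with $D_1 = 1$ overstates the population mean of $Y_1$, because agents with high $\theta$ are more likely to select state 1.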

19 In particular, $U_m^d$, $U_s^d$ are assumed to have a distribution that is absolutely continuous with respect to the Lebesgue measure.<br />

TABLE 1<br />
COMPONENTS OF $\tilde{G}_s$<br />
| | Continuous | Discrete |<br />
| Variables defined to be the same for all $s$ | $M^c$ | $M^d$ |<br />
| Variables defined for $s$ | $Y_s^c$ | $Y_s^d$ |<br />
| Indicator of state | - | $D_s$ |<br />

Before turning to our factor analysis, we first establish conditions under which we can identify the joint distribution of $M^c, M^{*d}, Y_s^c, Y_s^{*d}, I$, which constitute the data for the factor analysis. To understand the basic ideas, we break this task into three parts: (a) identification of the joint distribution of $(M^c, M^{*d})$, (b) identification of the parameters in choice system (17) and (18), and (c) identification of the full joint distribution of $(M^c, M^{*d}, Y_s^c, Y_s^{*d}, I)$. This full distribution is subsequently factor analyzed.<br />

We assume that<br />

(A-1) $(U_m^c, U_m^d, U_s^c, U_s^d, \varepsilon_W)$ have distribution functions that are absolutely continuous with respect to Lebesgue measure with means zero$^{20}$ with support $\mathcal{U}_m^c \times \mathcal{U}_m^d \times \mathcal{U}_s^c \times \mathcal{U}_s^d \times \mathcal{E}_W$ with upper and lower limits being $\bar{U}_m^c, \bar{U}_m^d, \bar{U}_s^c, \bar{U}_s^d, \bar{\varepsilon}_W$ and $\underline{U}_m^c, \underline{U}_m^d, \underline{U}_s^c, \underline{U}_s^d, \underline{\varepsilon}_W$, respectively, which may be bounded or infinite. Thus the joint system is measurably separable (variation free).$^{21}$ We assume finite variances.$^{22}$ The cumulative distribution function of $\varepsilon_W$ is assumed to be strictly increasing over its full support $(\underline{\varepsilon}_W, \bar{\varepsilon}_W)$.<br />

(A-2) $(X, Z, Q) \perp\!\!\!\perp (U, \varepsilon_W)$ where $U = (U_m^c, U_m^d, U_s^c, U_s^d)$, and where $Q$ is a vector of state-specific regressors $Q = (Q_1,\dots,Q_S)$.<br />

We denote by “∼” normalized values where the normalizations in our context are usually standard deviations of latent index errors. We first consider identification of the joint distribution of $M$. Our results are contained in Theorem 1.<br />

THEOREM 1. From data on $F(M \mid X)$, one can identify the joint distribution of $(U_m^c, U_m^d)$ (the latter component only up to scale), the function $\mu_m^d(X)$ is identified (up to scale) and $\mu_m^c(X)$ is identified over the support of $X$ provided that the following assumptions, in addition to the relevant components of (A-1) and (A-2), are invoked:<br />

(A-3) Order the discrete measurement components to be first. Suppose that there are $N_{m,d}$ discrete components, followed by $N_{m,c}$ continuous components. Assume Support$(\mu^d_{1,m}(X),\dots,\mu^d_{N_{m,d},m}(X)) \supseteq$ Support$(U^d_{1,m},\dots,U^d_{N_{m,d},m})$.<br />

Conditions (A-1) and (A-3) imply that $(\mu^d_{1,m}(X),\dots,\mu^d_{N_{m,d},m}(X))$ is measurably separable (variation free) in all of its coordinates when “⊇” is replaced by “=”.<br />

20 Alternatively, we could normalize the medians to be zero.<br />

21 For a definition of measurable separability, see Florens et al. (1990, Subsection 5.2). The key idea is that we can vary each of the coordinates of the vector freely.<br />

22 This assumption can be relaxed. It only affects certain normalizations.



(A-4) For each $l = 1,\dots,N_{m,d}$, $\mu^d_{l,m}(X) = X\beta^d_{l,m}$.<br />

(A-5) $X$ lives in a subset of $\mathbb{R}^{N_X}$, where $N_X$ is the dimension of the $X$ regressors. There exists no proper linear subspace of $\mathbb{R}^{N_X}$ having probability 1 under $F_X$, the distribution function of $X$.<br />

PROOF. See Appendix A.<br />

Condition (A-4) is conventional (see Cosslett, 1983; Manski, 1988). Weaker conditions are available using the analysis of Matzkin (1992, 1993). Support condition (A-3) appears in Cameron and Heckman (1998) and Aakvik et al. (1999). The easiest way to satisfy it is to have exclusions: one continuous component in $\mu^d_{l,m}(X)$ that is not an argument in the others. But that is only a sufficient condition. Even without exclusion, this condition can be satisfied if there are enough continuous regressors in $X$ and the $\mu^d_{l,m}(X)$ have a full rank Jacobian with respect to the continuous $X$ variables. Intuitively, if the rank condition is satisfied, we can hold $\mu^d_{l,m}(X)$ at $\bar{\mu}^d_{l,m}$ and vary the other arguments. Formally, this rank condition requires that if we array the coefficients of the continuous variables, the subvectors $\tilde{\beta}^d_{l,m}$ of $\beta^d_{l,m}$, into an $N_{m,d}$ by $N_X$ matrix, where here $N_X$ is the number of continuous components of $X$, then $\mathrm{Rank}\{\tilde{\beta}^d_{l,m}\}_{l=1}^{N_{m,d}} \ge N_{m,d}$. This requires at least $N_{m,d}$ continuous variables. It also requires that the coefficients are linearly independent. If the number of continuous components is $N_X < N_{m,d}$, we can only identify $N_X$ components of the distribution of $U_m^d$. See Cameron and Heckman (1998), Aakvik et al. (1999), or Hansen et al. (2003) for more discussion of this case of identification without conventional exclusion restrictions.<br />
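The rank condition on the coefficients of the continuous regressors can be checked mechanically. The sketch below is illustrative only: the helper names and the example coefficient matrices are hypothetical, and the rank is computed by plain Gaussian elimination so that no external library is needed.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank

def rank_condition_holds(beta_tilde):
    """beta_tilde: one row per discrete measurement, one column per continuous regressor."""
    n_discrete = len(beta_tilde)
    return matrix_rank(beta_tilde) >= n_discrete

# Two discrete measurements, three continuous regressors, independent rows.
ok = rank_condition_holds([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
# Proportional rows: the indices cannot be varied separately.
dependent = rank_condition_holds([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
# Only one continuous regressor for two discrete measurements.
too_few = rank_condition_holds([[1.0], [2.0]])
print(ok, dependent, too_few)
```

The second and third cases correspond to the two failures discussed in the text: linearly dependent coefficients, and fewer continuous regressors than discrete components.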

We next turn to identification of the generalized ordered discrete choice model (17). This extends the proof in Cameron and Heckman (1998) by parameterizing the cut points. A more general version of this model appears in Heckman and Navarro (2001).<br />

THEOREM 2. Under the relevant subsets of the conditions (A-1) and (A-2) (specifically, assuming absolute continuity of the distribution of $\varepsilon_W$ with respect to Lebesgue measure and $\varepsilon_W \perp\!\!\!\perp (Z, Q)$) and the additional assumptions:<br />

(A-6) $c_s(Q_s) = Q_s\eta_s$, $s = 1,\dots,S$, $\varphi(Z) = Z'\beta$.<br />

(A-7) $(Q_1, Z)$ is full rank (there is no proper subspace of the support of $(Q_1, Z)$ with probability 1). $Z$ contains no intercept.<br />

(A-8) $Q_s$ for $s = 2,\dots,S$ is full rank (there is no proper subspace of $\mathbb{R}^{Q_s}$ with probability 1).<br />

(A-9) Support$(c_1(Q_1) - \varphi(Z)) \supseteq$ Support$(\varepsilon_W)$.<br />

Then the distribution function $F_{\varepsilon_W}$ is known up to a scale normalization on $\varepsilon_W$, and $c_s(Q_s)$, $s = 1,\dots,\bar{s}$, and $\varphi(Z)$ are identified up to a scale normalization.



PROOF. See Appendix A.<br />
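The scale normalization in Theorem 2 can be seen in a small numerical sketch of an ordered choice model. The assumptions here are ours: the error is taken to be normal, state $s$ is chosen when the error falls between consecutive normalized cut points (the integration limits used for $\tilde{\varepsilon}_W$ in the joint probability of this section), and all numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_choice_probs(cutpoints, phi, sigma_w=1.0):
    """Pr(D_s = 1) for s = 1..S given interior cut points c_1 <= ... <= c_{S-1}.

    State s is chosen when (c_{s-1} - phi)/sigma_w < eps/sigma_w <= (c_s - phi)/sigma_w,
    with c_0 = -inf and c_S = +inf. Normality of eps is an assumption of this sketch.
    """
    bounds = [-math.inf] + [(c - phi) / sigma_w for c in cutpoints] + [math.inf]
    cdf = [normal_cdf(b) for b in bounds]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Three states with hypothetical cut points and index value.
probs = ordered_choice_probs([0.0, 1.0], phi=0.3, sigma_w=1.0)

# Multiply cut points, index, and error scale by 5: the probabilities are unchanged.
probs_rescaled = ordered_choice_probs([0.0, 5.0], phi=1.5, sigma_w=5.0)
print(probs, probs_rescaled)
```

Since choice probabilities depend only on the ratios $(c_s(Q_s) - \varphi(Z))/\sigma_W$, the data can pin down the cut points and the index only after a scale normalization.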

Our choice system can be made more nonparametric using the type of restrictions introduced in Matzkin, although we eschew that generality here. Matzkin and Lewbel (2002) weaken (A-6), generalizing the analysis of Matzkin (1992) assuming that the $c_s$ are constants.<br />

We next turn to the identification of the joint system $(M^c, M^{*d}, Y_s^c, Y_s^{*d}, I)$. The data for each choice system (including the data on choice probabilities) generate the left-hand side of<br />

\[
\begin{aligned}
(22)\quad & \Pr\left(M^c \le m^c,\ M^{*d} \le 0,\ Y_s^c \le y_s^c,\ Y_s^{*d} \le 0 \mid D_s = 1, X, Z, Q_s, Q_{s-1}\right) \times \Pr(D_s = 1 \mid Z, Q_s, Q_{s-1}) \\
& = \int_{\underline{U}_m^c}^{m^c - \mu_m^c(X)} \int_{\underline{\tilde{U}}_m^d}^{-\tilde{\mu}_m^d(X)} \int_{\underline{U}_s^c}^{y_s^c - \mu_s^c(X)} \int_{\underline{\tilde{U}}_s^d}^{-\tilde{\mu}_s^d(X)} \int_{\frac{c_{s-1}(Q_{s-1}) - \varphi(Z)}{\sigma_W}}^{\frac{c_s(Q_s) - \varphi(Z)}{\sigma_W}} f\left(U_m^c, \tilde{U}_m^d, U_s^c, \tilde{U}_s^d, \tilde{\varepsilon}_W\right) \\
& \qquad \times\, dU_m^c\, d\tilde{U}_m^d\, dU_s^c\, d\tilde{U}_s^d\, d\tilde{\varepsilon}_W.
\end{aligned}
\]<br />

From Theorem 1 we know $\mu_m^c(X)\ (= X\beta_m^c)$ and $\tilde{\mu}_m^d(X)\ (= X\tilde{\beta}_m^d)$ and the joint distribution of $(U_m^c, \tilde{U}_m^d)$. From Theorem 2, we know<br />

\[ \frac{c_s(Q_s) - \varphi(Z)}{\sigma_W} = \frac{Q_s\eta_s - Z'\beta}{\sigma_W}, \qquad s = 1,\dots,S, \]<br />

and the coefficients $\eta_s$, $\beta$ and the distribution $F_{\tilde{\varepsilon}_W}$. Notice that $c_s(Q_s) \ge c_{s-1}(Q_{s-1})$ is a requirement of the ordered choice model. We maintain the following assumptions:<br />

(A-10) Support$\left(-\tilde{\mu}_m^d(X),\ -\tilde{\mu}_s^d(X),\ \left(\frac{c_s(Q_s) - \varphi(Z)}{\sigma_W} - \frac{c_{s-1}(Q_{s-1}) - \varphi(Z)}{\sigma_W}\right)\right) \supseteq$ Support$(U_m^d, U_s^d, \tilde{\varepsilon}_W) = (\mathcal{U}_m^d \times \mathcal{U}_s^d \times \tilde{\mathcal{E}}_W)$.<br />

(A-11) There is no proper linear subspace of $(X, Z, Q_s, Q_{s-1})$ with probability one so the model is full rank.<br />

As a consequence of (A-6) and (A-10) we can find limit values $\bar{Q}_s$ and $\underline{Q}_{s-1}$ of $Q_s$ and $Q_{s-1}$, respectively, so that<br />

\[ \lim_{\substack{Q_s \to \bar{Q}_s \\ Q_{s-1} \to \underline{Q}_{s-1}}} \Pr(D_s = 1 \mid Z, Q_s, Q_{s-1}) = 1. \]<br />

In these limit sets (which may depend on $Z$), under the stated conditions (A-1)–(A-11), we can identify the joint distribution of $(M^c, M^{*d}, Y_s^c, Y_s^{*d})$, $s = 1,\dots,S$, using an argument parallel to the one used to prove Theorem 1. These limit sets produce $S$ different joint distributions (corresponding to each value of $s$) but do not generate joint distributions across the $s$ (i.e., the joint distribution of $M^c, M^{*d}, Y_s^c, Y_s^{*d}$ across $s$ values). However, $M$ is common across these systems. Using the dependence of $M$ and $Y_s$, $s = 1,\dots,S$, on a common $\theta$ we can sometimes identify the joint distribution. See Carneiro et al. (2001) for an example. Thus, with a measurement system $M$ we do not strictly require information on the choice index $I$ to identify the model.<br />

Following an argument of Heckman (1990), Heckman and Honoré (1990), and Heckman and Smith (1998), we can identify $\mu_s^c(X)$ up to an additive constant without passing to the limit set where $\Pr(D_s = 1 \mid Z, Q_s, Q_{s-1}) = 1$. This is not possible for the identification of $\tilde{\mu}_s^d(X)$ because there is no counterpart to the variation in $y_s^c$ for the discrete component. This is the content of the following theorem that combines the key ideas of Theorems 1 and 2 to produce an identification theorem for the general case.<br />

THEOREM 3. Under assumptions (A-1), (A-2), (A-4), and (A-6)–(A-11), $\mu_m^c(X)$, $\mu_s^c(X)$, $\tilde{\mu}_m^d(X)$, $\tilde{\mu}_s^d(X)$, $\tilde{\varphi}(Z)$, $c_s(Q_s)$, $s = 1,\dots,S-1$, are identified as is the joint distribution $F(U_m^c, \tilde{U}_m^d, U_s^c, \tilde{U}_s^d, \tilde{\varepsilon}_W)$.<br />

PROOF. See Appendix A.<br />

As noted in the discussion following Theorem 1, without standard exclusion restrictions we may only be able to identify subcomponents of the joint distribution if $N_X < N_{m,d}$, where $N_X$ is the number of continuous regressors. Note that the $\mu^c_{s,l}$, $\tilde{\mu}^d_{s,l}$ may only be defined over their supports. Under an additional rank or variation-free condition on the regressors we recover these functions everywhere over the support of $X$.<br />

5.1. Factor Analysis. The thrust of Theorems 1–3 is that under the stated conditions we know the joint distributions of $(U_s, U_m, \tilde{\varepsilon}_W)$, $s = 1,\dots,S$, where $U_s = (U_s^d, U_s^c)$. We factor analyze them under assumptions like those invoked in matrix (21) with two or more of these elements dependent solely on $\theta_1$, an additional two or more elements dependent solely on $(\theta_1, \theta_2)$, and so forth, but at least three final elements dependent on $\theta_K$. There is a total of $A \times R$ outcomes in each state where $R$ is the number of outcome measures in each state at each age (e.g., wages, employment, occupation), there are $M$ non-state-contingent measurements, and $\tilde{\varepsilon}_W$ is a scalar. Thus, $L$ in (21) is $(A \times R) + M + 1$ in dimension for each system $s$, $s = 1,\dots,S$.<br />

We write the unobservables in factor structure form<br />

\[
\begin{aligned}
U_{s,a} &= \alpha_{s,a}'\theta + \varepsilon_{s,a} && \text{with } s = 1,\dots,S;\ a = 1,\dots,A \\
U_m &= \alpha_m'\theta + \varepsilon_m && \text{with } m = 1,\dots,N_m \\
\tilde{\varepsilon}_W &= \gamma'\theta + \varepsilon_I
\end{aligned}
\]<br />

The $\alpha_{s,a}$ may be different across $s$-states so that each $s$ system may depend on different elements of $\theta$. The $\alpha_m$ are not, nor is the $\gamma$. There may be multiple measurements of outcomes so in principle $\alpha_{s,a}$ may be a matrix and $\varepsilon_{s,a}$ a vector of mutually independent components. Our empirical analysis is for the vector case.



The choice of how to select the blocks of (21) may appear to be arbitrary, but in many applications there are natural orderings. Thus in the empirical work reported below we estimate a two-factor model. We have a vector of five test scores that proxy latent ability ($\theta_1$). The state-contingent outcome (earnings) equations and choice equations plausibly depend on both $\theta_1$ and $\theta_2$. In many applications there are natural allocations of factors to various measurements. However, to avoid arbitrariness, a carefully reasoned defense of any allocation is required. We now formalize identification in this system.<br />

THEOREM 4. Under the normalizations on the factor loadings of the type in (21) for one system $s$ under the conditions of Theorems 1–3, given the normalizations for the unobservables for the discrete components and given at least $2K + 1$ measurements $(Y, M, I)$, the unrestricted factor loadings and the variances of the factors ($\sigma^2_{\theta_i}$, $i = 1,\dots,K$) are identified for all systems.<br />

PROOF. The proof is implicit in the discussion surrounding equation (21).<br />

Observe that since the $\sigma^2_{\theta_i}$, $i = 1,\dots,K$, are identified in one system, normalizations of specific factor loadings to unity are only required in that system since we can apply the knowledge of these variances to the other systems.$^{23}$ Thus, for the other systems (values of the state other than $s$) we do not need to normalize any factor loading to unity.<br />
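For the one-factor case, the covariance argument behind this kind of result can be sketched as follows. This is our illustration under assumed values, not the article's estimator: three measurements load on a single factor, the first loading is normalized to one, and the remaining loadings and the factor variance are recovered from ratios of sample covariances.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def simulate_one_factor(n, lam2, lam3, sd_theta, seed=0):
    """T1 = theta + e1, T2 = lam2*theta + e2, T3 = lam3*theta + e3 (hypothetical values)."""
    rng = random.Random(seed)
    t1, t2, t3 = [], [], []
    for _ in range(n):
        theta = rng.gauss(0.0, sd_theta)
        t1.append(theta + rng.gauss(0.0, 1.0))
        t2.append(lam2 * theta + rng.gauss(0.0, 1.0))
        t3.append(lam3 * theta + rng.gauss(0.0, 1.0))
    return t1, t2, t3

t1, t2, t3 = simulate_one_factor(50000, lam2=0.8, lam3=-1.3, sd_theta=1.5)

# Cov(T1,T2) = lam2*var, Cov(T1,T3) = lam3*var, Cov(T2,T3) = lam2*lam3*var.
lam2_hat = cov(t2, t3) / cov(t1, t3)
lam3_hat = cov(t2, t3) / cov(t1, t2)
var_theta_hat = cov(t1, t2) * cov(t1, t3) / cov(t2, t3)
print(lam2_hat, lam3_hat, var_theta_hat)
```

With only two measurements on the factor there is a single covariance and two unknowns, which is why a third measurement is needed to separate the loading from the factor variance.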

We can also nonparametrically identify the densities of the uniquenesses and the factors. This follows from mutual independence of the $\theta_i$, $i = 1,\dots,K$, and an application of Kotlarski’s Theorem (1967). We first state Kotlarski’s Theorem and then apply it to our problem.<br />

Write $(\{U_m\}_{m=1}^{N_m}, \{U_{s,a}\}_{a=1}^{A}, \tilde{\varepsilon}_W)$ in vector form as $T^s$. Order the elements of $T^s$ so that the first block of $B_1$ elements depends only on $\theta_1$, the second block of $B_2 - B_1$ elements depends only on $(\theta_1, \theta_2)$, and so forth, following the convention established in Equation (21). Let $T_1^s$ and $T_2^s$ be the first two elements of $T^s$. (This is purely a notational convenience.) We require $B_1 \ge 2$, $B_2 - B_1 \ge 2$, and $B_K - B_{K-1} \ge 3$.<br />

23 In the discussion of equation (21) we could have normalized the variances $\sigma^2_{\theta_i}$, $i = 1,\dots,K$, to one rather than certain factor loadings, although this is less straightforward and requires the imposition of certain sign restrictions.<br />

THEOREM 5. If<br />

\[ T_1^s = \theta_1 + v_1 \quad \text{and} \quad T_2^s = \theta_1 + v_2 \]<br />

and $\theta_1 \perp\!\!\!\perp v_1 \perp\!\!\!\perp v_2$, the means of all three generating random variables are finite, $E(v_1) = E(v_2) = 0$, and the conditions of Fubini’s theorem are satisfied for each random variable, and the random variables possess nonvanishing (a.e.) characteristic functions, then the densities of $(\theta_1, v_1, v_2)$, $g(\theta_1)$, $g_1(v_1)$, $g_2(v_2)$, respectively, are identified.<br />

PROOF. See Kotlarski (1967). See also Rao (1992).<br />
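A numerical sketch may help make Kotlarski's result concrete. The code below is our illustration, not the article's procedure. It uses the standard identity that, when $E(v_1) = 0$, the log characteristic function of $\theta_1$ at $t$ equals the integral from 0 to $t$ of $E[\,i\,T_1 e^{iuT_2}]\,/\,E[e^{iuT_2}]$, replaces expectations with sample averages, and integrates with the trapezoid rule. The simulated distributions, sample size, and grid are assumptions chosen so that the true answer is known.

```python
import cmath
import math
import random

def kotlarski_cf(t1, t2, t, n_grid=40):
    """Estimate the characteristic function of theta at t from paired measurements
    T1 = theta + v1 and T2 = theta + v2 with E(v1) = 0."""
    def integrand(u):
        num = 0j
        den = 0j
        for a, b in zip(t1, t2):
            e = cmath.exp(1j * u * b)
            num += 1j * a * e   # sample analogue of E[i * T1 * exp(i*u*T2)]
            den += e            # sample analogue of E[exp(i*u*T2)]
        return num / den
    h = t / n_grid
    vals = [integrand(k * h) for k in range(n_grid + 1)]
    integral = h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
    return cmath.exp(integral)

# Hypothetical data: theta ~ N(0, 1), v1 and v2 ~ N(0, 0.5^2), all independent.
rng = random.Random(0)
theta = [rng.gauss(0.0, 1.0) for _ in range(10000)]
t1 = [th + rng.gauss(0.0, 0.5) for th in theta]
t2 = [th + rng.gauss(0.0, 0.5) for th in theta]

cf_hat = kotlarski_cf(t1, t2, t=1.0)
cf_true = math.exp(-0.5)   # characteristic function of N(0, 1) at t = 1
print(cf_hat, cf_true)
```

Only the pairs $(T_1, T_2)$ enter the estimate; $\theta$ itself is never used after the data are generated. Recovering a density from the estimated characteristic function would require an inversion step that is omitted here.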

Applied to our context, consider the first two equations of $T^s$ and suppose that the components depend only on $\theta_1$. We use our notation for the factor loadings to write<br />

\[
\begin{aligned}
T_1^s &= \lambda_{11}^s\theta_1 + \varepsilon_1^s && \text{where } \lambda_{11}^s = 1 \\
T_2^s &= \lambda_{21}^s\theta_1 + \varepsilon_2^s && \text{where } \lambda_{21}^s \ne 0
\end{aligned}
\]<br />

Here we use a notation associating the subscript of $\varepsilon_i^s$ with its position in the $T^s$ vector. Applying Theorem 4, we can identify $\lambda_{21}^s$ (subject to the normalization $\lambda_{11}^s = 1$).$^{24}$ Thus we can rewrite these equations as<br />

\[
\begin{aligned}
T_1^s &= \theta_1 + \varepsilon_1^s \\
\frac{T_2^s}{\lambda_{21}^s} &= \theta_1 + \varepsilon_2^{*,s}
\end{aligned}
\]<br />

where $\varepsilon_2^{*,s} = \varepsilon_2^s / \lambda_{21}^s$. Applying Kotlarski’s theorem, we can nonparametrically identify the densities $g(\theta_1)$, $g_1(\varepsilon_1^s)$, and $g_2(\varepsilon_2^{*,s})$. Since we know $\lambda_{21}^s$ we can identify $g(\varepsilon_2^s)$. Let $B_1$ denote the number of measurements (elements of $T^s$) that depend only on $\theta_1$. Proceeding through the first $B_1$ measurements, we can identify $g(\varepsilon_i^s)$, $i = 1,\dots,B_1$.<br />

Proceeding to equations $B_1 + 1$ and $B_1 + 2$ (corresponding to the first two measurements in the next set of equations that depend on $\theta_1$ and $\theta_2$), we may use the normalization adopted in Theorem 4 to write<br />

\[
\begin{aligned}
T^s_{B_1+1} &= \lambda^s_{B_1+1,1}\theta_1 + \theta_2 + \varepsilon^s_{B_1+1} \\
T^s_{B_1+2} &= \lambda^s_{B_1+2,1}\theta_1 + \lambda^s_{B_1+2,2}\theta_2 + \varepsilon^s_{B_1+2}
\end{aligned}
\]<br />

Rearranging, we may write these equations as<br />

\[
\begin{aligned}
T^s_{B_1+1} - \lambda^s_{B_1+1,1}\theta_1 &= \theta_2 + \varepsilon^s_{B_1+1} \\
\frac{T^s_{B_1+2} - \lambda^s_{B_1+2,1}\theta_1}{\lambda^s_{B_1+2,2}} &= \theta_2 + \varepsilon^{*,s}_{B_1+2}
\end{aligned}
\]<br />

where $\varepsilon^{*,s}_{B_1+2} = \varepsilon^s_{B_1+2} / \lambda^s_{B_1+2,2}$, and the $\varepsilon^s_{B_1+1}$ and $\varepsilon^{*,s}_{B_1+2}$ are mutually independent. Hence by Theorem 5, we can identify densities $g(\theta_2)$, $g(\varepsilon^s_{B_1+1})$, $g(\varepsilon^s_{B_1+2})$. Exploiting the structure (21), we can proceed sequentially to identify the densities of $\theta$, $g(\theta_i)$, $i = 1,\dots,K$, and the uniquenesses, $g(\varepsilon_i^s)$, for all the components of vector $T^s$. For the components of $\varepsilon_i^s$ corresponding to discrete measurements, we do not identify the scale. Armed with knowledge of the densities of the $\theta_i$ and the factor loadings for other values of $s$, we can apply standard deconvolution methods to nonparametrically identify the densities of the uniquenesses $\varepsilon_i$ for the other systems. Thus, we can nonparametrically identify all the error terms for the model. Notice that, in principle, we can estimate separate distributions of the $\theta_i$ for each $s$ system and thus can test the hypothesis of equality of these distributions across systems.<br />

24 To be able to identify $\lambda_{21}^s$ we need a third measurement on this factor, which we can get from equation $B_1 + 1$. Since there is no equation $B_K + 1$, we require that $B_K - B_{K-1} \ge 3$ in order to be able to identify the loadings on $\theta_K$.<br />

The essential idea in this article is to obtain identification of the joint counterfactual distributions through the dependence across $s$ of $Y_s = (Y_s^d, Y_s^c)$ on the common factors that also generate $M$ or $I$. In this sense, measurements and choices are both sources of identifying information, and can be traded off in terms of identification. We next apply our framework to a well-posed economic model.<br />

6. GENERALIZING THE WILLIS–ROSEN MODEL<br />

We revisit Willis and Rosen’s application of the Roy model (1979) to the economics of education, adding uncertainty and nonpecuniary net returns to schooling, and identifying counterfactual distributions of gross and net returns. In this article the outcomes are utility outcomes, present value outcomes, and rates of return.<br />

Suppose that agents cannot lend or borrow and possess log preferences (utility $= \ln C$, where $C$ is consumption). Suppose that agents are choosing between high school and college so $S = 2$. The utility of attending college is<br />

\[ V(1) = \sum_{a=0}^{A} \frac{\ln Y_a^1}{(1+\rho)^a} - \ln P \]<br />

where $\ln P$ is the “cost” of going to school. The costs include tuition costs and the psychic benefits from working in sector 1 (relative to sector 0). Thus, costs may be negative. $\rho$ is a subjective rate of time preference. The utility of completing only high school is<br />

\[ V(0) = \sum_{a=0}^{A} \frac{\ln Y_a^0}{(1+\rho)^a} \]<br />

where $Y_a^0$ and $Y_a^1$ are earnings from high school and college, respectively, at age $a$. The psychic costs or benefits in logs for high school are normalized to zero. We can only identify relative psychic “costs” or benefits.
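The two utility expressions translate directly into code. The sketch below is illustrative: the function names and the earnings paths are hypothetical, and earnings lists are indexed by age $a = 0,\dots,A$.

```python
import math

def discounted_log_utility(earnings, rho):
    """Sum over ages a = 0..A of ln(Y_a) / (1 + rho)^a."""
    return sum(math.log(y) / (1.0 + rho) ** a for a, y in enumerate(earnings))

def college_gain(y_college, y_high_school, log_p, rho):
    """V(1) - V(0), where V(1) subtracts the log cost ln P and V(0) has no cost term."""
    v1 = discounted_log_utility(y_college, rho) - log_p
    v0 = discounted_log_utility(y_high_school, rho)
    return v1 - v0

# Hypothetical two-period example: college earnings start lower and end higher.
# A negative ln P represents net psychic benefits that outweigh tuition.
gain = college_gain(y_college=[1.0, 4.0], y_high_school=[2.0, 2.0], log_p=-0.5, rho=1.0)
print(gain)
```

Because $\ln P$ enters only $V(1)$, shifting both psychic terms by the same amount would leave the difference unchanged, which is the sense in which only relative psychic costs are identified.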



Latent variables and costs are generated by a factor structure. The equations are<br />

\[
\begin{aligned}
\ln Y_a^j &= \mu^j(X) + (\alpha_a^j)'\theta + \varepsilon_a^j, \qquad j = 0, 1,\ a = 1,\dots,A, \\
\ln P &= \mu_P(Z) + (\alpha_P)'\theta + \varepsilon_P.
\end{aligned}
\]<br />

In addition, we have measurements on test scores $M = \mu_M(X) + \alpha_M'\theta + \varepsilon_M$, where $\theta \perp\!\!\!\perp [(\varepsilon^j_{i,a})_{i=1,\,j=0,\,a=0}^{I,\,1,\,A}, \varepsilon_P]$.<br />

The agent makes decisions about schooling under uncertainty about different components of the model. $\mathcal{I}_\theta$ is the information set. The expected value $V$ of going to college is<br />

\[ V = E\left(V(1) - V(0) \mid \mathcal{I}_\theta\right) = E_{\mathcal{I}_\theta}\left[\sum_{a=0}^{A} \frac{\mu_a^1(X) - \mu_a^0(X) + (\alpha_a^1 - \alpha_a^0)'\theta + \varepsilon_a^1 - \varepsilon_a^0}{(1+\rho)^a} - \left[\mu_P(Z) + \alpha_P'\theta + \varepsilon_P\right]\right] \]<br />

If future innovations in earnings $(\varepsilon_a^1, \varepsilon_a^0)$, $a = 0,\dots,A$, are not known at the time schooling decisions are made but innovations in costs are known, we may write the agent’s preference function as<br />

\[ V = \left(\sum_{a=0}^{A} \frac{\mu_a^1(X) - \mu_a^0(X)}{(1+\rho)^a} - \mu_P(Z)\right) + \left[\sum_{a=0}^{A} \frac{(\alpha_a^1 - \alpha_a^0)'}{(1+\rho)^a} - \alpha_P'\right] E_{\mathcal{I}_\theta}(\theta) - \varepsilon_P \]<br />

As we shall see, this assumption about agent knowledge of future innovations in earnings is testable. Assume that $\sigma_P = (\mathrm{var}(\varepsilon_P))^{1/2} < \infty$. Then<br />

\[ \frac{V}{\sigma_P} = \frac{1}{\sigma_P}\left(\sum_{a=0}^{A} \frac{\mu_a^1(X) - \mu_a^0(X)}{(1+\rho)^a} - \mu_P(Z)\right) + \left(\sum_{a=0}^{A} \frac{(\alpha_a^1 - \alpha_a^0)'}{(1+\rho)^a} - \alpha_P'\right)\frac{1}{\sigma_P} E_{\mathcal{I}_\theta}(\theta) - \frac{\varepsilon_P}{\sigma_P} \]<br />

$D_s = 1$ if $V/\sigma_P > 0$; $D_s = 0$ otherwise.<br />
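The normalized choice index can be computed term by term. The following sketch assumes, as in the text, that earnings innovations are unknown to the agent while the cost innovation is known; the function signature, the list layout for the loadings, and every number are hypothetical.

```python
def choice_index(mu1, mu0, alpha1, alpha0, mu_p, alpha_p, e_theta, eps_p, rho, sigma_p):
    """Return V / sigma_P.

    mu1, mu0       : mean log earnings by age a = 0..A in college / high school
    alpha1, alpha0 : factor loadings by age, each a list of length K
    mu_p, alpha_p  : mean and loadings of the log cost
    e_theta        : the agent's expectation of theta given the information set
    eps_p          : cost innovation, known to the agent
    """
    disc = [(1.0 + rho) ** (-a) for a in range(len(mu1))]
    mean_part = sum(d * (m1 - m0) for d, m1, m0 in zip(disc, mu1, mu0)) - mu_p
    loading = [
        sum(d * (a1[k] - a0[k]) for d, a1, a0 in zip(disc, alpha1, alpha0)) - alpha_p[k]
        for k in range(len(e_theta))
    ]
    factor_part = sum(l * t for l, t in zip(loading, e_theta))
    return (mean_part + factor_part - eps_p) / sigma_p

common = dict(mu1=[1.0, 2.0], mu0=[1.5, 1.5], alpha1=[[1.0], [1.0]], alpha0=[[0.5], [0.5]],
              mu_p=0.2, alpha_p=[0.25], eps_p=0.1, rho=0.0, sigma_p=2.0)

index_informed = choice_index(e_theta=[0.8], **common)    # agent expects a high factor value
index_uninformed = choice_index(e_theta=[0.0], **common)  # agent expects the prior mean
d_informed = 1 if index_informed > 0 else 0
d_uninformed = 1 if index_uninformed > 0 else 0
print(index_informed, d_informed, index_uninformed, d_uninformed)
```

In this example the schooling decision flips with the agent's expectation of $\theta$, which is the channel through which the dependence between choices and later earnings reveals what is in the information set.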

Specifying alternative information sets ($\mathcal{I}_\theta$) and examining the resulting fit of the model to data, we can determine the information sets that agents act on. Exact econometric specifications are presented in Section 7. We test whether agents act on components of $\theta$ that also appear in outcome equations realized after the choices are made. The estimated dependence between schooling choices and subsequent realizations of earnings enables us to identify the components in the agent’s information set at the time schooling decisions are being made. This extends the method of Flavin (1981) and Hansen et al. (1991) to a discrete choice setting. If agents do not act on these components, then those components are intrinsically uncertain at the time agents make their schooling decisions unless nongeneric cancellations occur.$^{25}$ Because we can identify the joint distributions of unobservables, we can answer questions Willis and Rosen could not, such as: (1) How highly correlated are latent skills (utilities) across sectoral choices? (2) How much intrinsic uncertainty do agents face? (3) How important is uncertainty in explaining schooling choices? (4) What fraction of the population regrets its ex ante schooling choice ex post? We can also separate out net psychic components of the returns to schooling (the $\ln P$) from monetary components.<br />

Observe that as a consequence of the log specification of preferences (including the additive separability of the $\theta$ and $\varepsilon$), mean preserving spreads in $\varepsilon_a^j$, $\theta$, and $\varepsilon_P$ produce no change in mean utility. The probability of selection $D_s = 1$ is also invariant to mean preserving spreads for $\varepsilon_a^j$ but not for $\theta$ and $\varepsilon_P$ since their variance enters the choice probability if these components are known to the agent.<br />

In addition, a mean preserving spread in $\ln Y$ is not the same as a mean preserving spread in $Y$. Mean preserving spreads in $Y$ have an effect on utility since $E(Y) = e^{\mu}E(e^{\varepsilon})$. Define the residual from the mean as $H$, $H = e^{\mu}e^{\varepsilon} - e^{\mu}E(e^{\varepsilon})$, so $\mathrm{var}(H) = e^{2\mu}\left(E(e^{2\varepsilon}) - [E(e^{\varepsilon})]^2\right)$. A mean preserving spread keeps the mean of $Y$ fixed at constant $k = E(Y) = e^{\mu}E(e^{\varepsilon})$.<br />

For a perturbation in the variance of $\varepsilon$ that changes $\varepsilon$ to $\Delta\varepsilon$, and defining $f(\varepsilon)$ as the density of $\varepsilon$, locally<br />

\[ 0 = d\mu + \frac{\int \varepsilon e^{\varepsilon} f(\varepsilon)\, d\varepsilon}{E(e^{\varepsilon})}\, d\Delta \quad \text{so} \quad d\mu = -\frac{\int \varepsilon e^{\varepsilon} f(\varepsilon)\, d\varepsilon}{E(e^{\varepsilon})}\, d\Delta. \]<br />

Moreover, because $E(\varepsilon) = 0$ and $\varepsilon e^{\varepsilon}$ is convex increasing in $\varepsilon$, the derivative is positive. In a log normal example, $E(e^{\varepsilon}) = e^{\sigma^2/2}$, $E(e^{2\varepsilon}) = e^{2\sigma^2}$, $\mathrm{var}(H) = e^{2\mu}(e^{2\sigma^2} - e^{\sigma^2})$, $k = e^{\mu}e^{\sigma^2/2}$, $\ln k = \mu + \frac{\sigma^2}{2}$, $(-d\mu) = \frac{d(\sigma^2)}{2}$, so an increase in the variance is equivalent to a decrease in the mean utility. We consider the effects of mean preserving spreads on both mean log utility and on the probability that $V$ is positive (college is selected). We now turn to the empirical analysis of this article.<br />
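The log normal example can be verified numerically. The sketch below uses hypothetical values for the fixed mean $k$ and the two variances; it checks that holding $E(Y) = k$ fixed forces $\mu$ to fall by half the increase in $\sigma^2$ while the variance of $Y$ rises.

```python
import math

def mu_given_mean(k, sigma2):
    """The mu that keeps E(Y) = exp(mu + sigma2 / 2) equal to k."""
    return math.log(k) - sigma2 / 2.0

def mean_level(mu, sigma2):
    """E(Y) = e^mu * E(e^eps) when eps ~ N(0, sigma2)."""
    return math.exp(mu + sigma2 / 2.0)

def var_level(mu, sigma2):
    """var(H) = e^(2 mu) * (e^(2 sigma2) - e^(sigma2))."""
    return math.exp(2.0 * mu) * (math.exp(2.0 * sigma2) - math.exp(sigma2))

k = 10.0                  # hypothetical fixed mean of Y
low, high = 0.2, 0.5      # hypothetical variances of eps before and after the spread
mu_low, mu_high = mu_given_mean(k, low), mu_given_mean(k, high)

d_mu = mu_high - mu_low   # change in mean log earnings, i.e. in mean log utility
print(d_mu, var_level(mu_low, low), var_level(mu_high, high))
```

Here `d_mu` equals $-(0.5 - 0.2)/2$, matching $(-d\mu) = d(\sigma^2)/2$: a spread in the level of $Y$ that preserves its mean lowers expected log utility.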

7. EMPIRICAL RESULTS<br />

We use the NLSY data for white males described in Appendix B and augmented with the PSID data to estimate the Willis–Rosen model. We focus on two schooling decisions: graduating from a four-year college or graduating from high school. We thus abstract from the full multiplicity of choices of schooling, as did Willis and Rosen. This is clearly a bold simplification but it allows us to focus on the main points of this article.<br />

25 In principle, the future $(\alpha_a^1, \alpha_a^0)$ can be uncertain at the date decisions are made. Assuming that<br />
these factor loadings are independent of $\theta$, we can replace these expressions by $E_{I_\theta}(\alpha_a^1)$, $E_{I_\theta}(\alpha_a^0)$<br />
without affecting the identifiability of $(\alpha_a^1, \alpha_a^0)$, provided the conditions of Theorem 4 are met, but<br />
it affects the identifiability and interpretation of $\alpha_P$. A more general version of this model would<br />
postulate two random variables $\theta$ and $\theta^*$. Agents act on $\theta^*$ whereas $\theta$ is the true value. It would be<br />
interesting to identify the joint distributions of $\theta$ and $\theta^*$ under, for example, a rational expectations<br />
assumption. We leave this for a later occasion.



As a measurement system (M) for cognitive ability we use five components<br />
of the ASVAB test battery (arithmetic reasoning, word knowledge, paragraph<br />
composition, math knowledge, and coding speed). We dedicate the first factor<br />
($\theta_1$) to the ability measurement system and exclude the other factors from that<br />
system (recall the normalizations in Equation (21)). We include family background<br />
variables as additional covariates in the ASVAB test equations (the $\mu_M(X)$).<br />
To simplify the empirical analysis, we divide the lifetimes of individuals into two<br />
periods. The first period covers ages 19–29, and the second covers ages 30–65. We<br />
compute annual earnings by multiplying the hourly wage by hours worked each<br />
year for each individual. 26 We impute missing wages and project earnings for the<br />
ages not observed in the NLSY data using the procedure described in Appendix<br />
B. The NLSY data do not contain information on the full life cycle of earnings.<br />
We project the missing NLSY earnings using estimates of lifetime earnings from<br />
the PSID data.<br />
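As a minimal sketch of the earnings construction just described, the following computes log annual earnings from an hourly wage and annual hours, using the convention of footnote 26 that zero earnings are set to $1 so that the log is zero. The function name and the sample inputs are hypothetical; the imputation and projection steps of Appendix B are not reproduced here.

```python
import numpy as np


def log_annual_earnings(hourly_wage, hours_worked):
    """Log of annual earnings (wage times hours), with zero earnings set to $1."""
    earnings = np.asarray(hourly_wage, dtype=float) * np.asarray(hours_worked, dtype=float)
    earnings = np.where(earnings == 0.0, 1.0, earnings)  # ln(1) = 0
    return np.log(earnings)


# Hypothetical person-years: one with positive earnings, one with no hours worked.
example = log_annual_earnings([10.0, 12.5], [2000.0, 0.0])
print(example)
```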

Tables 2a and 2b present the sample statistics. They show that whereas college graduates<br />
have higher earnings than high school graduates, all of the gain of attending<br />
college comes after age 30. College graduates also have much higher test scores<br />
and come from better family backgrounds than high school graduates. They are<br />
more likely to live in locations where a college is present and where college tuition<br />
is lower.<br />

In the notation of Section 5, $\bar{S} = 2$ (two choices), $\bar{R} = 1$ (there is one outcome<br />
per person, earnings), $\bar{M} = 5$ (there are five test scores that are generated solely by<br />
$\theta_1$), and $\bar{A} = 2$ (there are two periods in the life cycle). In addition, there is a utility<br />
index $I$. The test scores depend solely on $\theta_1$. The outcomes and index are allowed to<br />
depend on $(\theta_1, \theta_2)$. Since $K = 2$, assuming nonzero factor loadings, we satisfy the<br />
conditions for identification presented in Theorem 4. We have five measurements<br />
generated solely by $\theta_1$. There are three measurements generated by $\theta_1$ and $\theta_2$ for<br />
each schooling level. (Outcomes and choices are defined for each choice system.)<br />
Exclusion restrictions are given in Table 2c along with the specification of each<br />
of the equations. Tuition and family background identify the parameters of the<br />
earnings equations. Local labor market variables identify the parameters of utility<br />
equations. Assuming that test scores are continuous outcomes, no exclusions are<br />
needed for identification of the test score equations and their distribution.<br />

In this section, to facilitate the exposition we denote the college state (choice<br />
1) by $c$, whereas high school (choice 0) is denoted by $h$. We model log earnings<br />
(utility of earnings) at each age as<br />
$$\ln Y_{a,s} = \delta_{a,s} + X'\beta_{a,s} + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2 + \alpha'_{a,s}\theta + \varepsilon_{a,s} \tag{23}$$<br />
where $Y_{a,s}$ is earnings in period (age) $a$ if the schooling level is $s$, $X$ is a vector of<br />
covariates, $\theta$ is a vector of factors, and $\eta_{1,s}$ and $\eta_{2,s}$ are calculated by the procedure<br />
described in Appendix B. We compute the present value of log earnings (lifetime<br />

26 We set zero earnings to $1 in this article to simplify computations (ln(1) = 0).



TABLE 2a<br />

DESCRIPTIVE STATISTICS OF VARIABLES (NLSY 79—WHITE MALES)<br />

Column groups, in order (each reporting Obs, Mean, Std. Dev., Min, Max): Overall, High School, College<br />
Variables Obs Mean Std. Dev. Min Max Obs Mean Std. Dev. Min Max Obs Mean Std. Dev. Min Max<br />
Tuition at age 17 (hundreds of dollars) 1161 20.68 7.70 0.00 53.59 704 21.17 7.85 0.00 47.06 457 19.92 7.40 0.00 53.59<br />

Urb<strong>an</strong> at age 14 1161 0.74 0.44 0.00 1.00 704 0.69 0.46 0.00 1.00 457 0.82 0.38 0.00 1.00<br />

Broken at age 14 1161 0.13 0.34 0.00 1.00 704 0.15 0.36 0.00 1.00 457 0.11 0.31 0.00 1.00<br />

Number of siblings 1161 2.83 1.77 0.00 15.00 704 3.03 1.85 0.00 15.00 457 2.51 1.60 0.00 11.00<br />

South at age 14 1161 0.19 0.39 0.00 1.00 704 0.19 0.39 0.00 1.00 457 0.19 0.39 0.00 1.00<br />

Mother education 1161 12.39 2.20 3.00 20.00 704 11.69 1.86 3.00 20.00 457 13.47 2.26 6.00 20.00<br />

Father education 1161 12.71 3.16 0.00 20.00 704 11.61 2.70 0.00 20.00 457 14.40 3.08 4.00 20.00<br />

Age in 1980 1161 19.27 2.19 16.00 23.00 704 19.27 2.17 16.00 23.00 457 19.28 2.21 16.00 23.00<br />

Dist<strong>an</strong>ce to college at age 17 1161 7.68 15.58 0.00 100.20 704 8.87 16.01 0.00 100.20 457 5.84 14.75 0.00 96.59<br />

Dummy for birth in 1957 1161 0.10 0.30 0.00 1.00 704 0.10 0.31 0.00 1.00 457 0.09 0.29 0.00 1.00<br />

Dummy for birth in 1958 1161 0.10 0.30 0.00 1.00 704 0.09 0.28 0.00 1.00 457 0.12 0.33 0.00 1.00<br />

Dummy for birth in 1959 1161 0.11 0.31 0.00 1.00 704 0.11 0.31 0.00 1.00 457 0.10 0.30 0.00 1.00<br />

Dummy for birth in 1960 1161 0.14 0.35 0.00 1.00 704 0.15 0.36 0.00 1.00 457 0.13 0.34 0.00 1.00<br />

Dummy for birth in 1961 1161 0.14 0.34 0.00 1.00 704 0.14 0.34 0.00 1.00 457 0.13 0.34 0.00 1.00<br />

Dummy for birth in 1962 1161 0.16 0.37 0.00 1.00 704 0.16 0.37 0.00 1.00 457 0.16 0.36 0.00 1.00<br />

Dummy for birth in 1963 1161 0.13 0.34 0.00 1.00 704 0.12 0.33 0.00 1.00 457 0.14 0.35 0.00 1.00<br />

Education status (0 if HS, 1 if college) 1161 0.39 0.49 0.00 1.00 704 0.00 0.00 0.00 0.00 457 1.00 0.00 1.00 1.00<br />

In school at ASVAB test date 1161 0.67 0.47 0.00 1.00 704 0.49 0.50 0.00 1.00 457 0.94 0.23 0.00 1.00<br />

Arithmetic reasoning 1161 0.15 0.95 −2.39 1.42 704 −0.22 0.91 −2.39 1.42 457 0.73 0.70 −1.96 1.42<br />

Word knowledge (ASVAB 3) 1161 0.14 0.88 −3.71 1.16 704 −0.19 0.92 −3.71 1.16 457 0.64 0.50 −2.24 1.16<br />

Paragraph composition (ASVAB 4) 1161 0.14 0.89 −3.50 1.21 704 −0.17 0.96 −3.50 1.21 457 0.62 0.47 −1.62 1.21<br />

Coding speed (ASVAB 6) 1161 0.15 0.95 −3.03 2.79 704 −0.12 0.90 −2.89 2.09 457 0.57 0.87 −3.03 2.79<br />

Math knowledge (ASVAB 7) 1161 0.14 0.97 −2.14 1.58 704 −0.36 0.80 −2.14 1.58 457 0.91 0.68 −1.83 1.58<br />

Present value of earnings ∗ 1161 956.13 730.87 18.12 7861.67 704 694.56 321.93 18.12 1885.85 457 1359.07 964.47 77.02 7861.67<br />
∗ Earnings in thousands of dollars.



TABLE 2b<br />

DESCRIPTIVE STATISTICS—PRESENT VALUE OF LOG EARNINGS (DISCOUNT RATE = 3%, NLSY 79—WHITE MALES)<br />

Column groups, in order (each reporting Obs, Mean, Std. Dev., Min, Max): Overall, High School, College<br />
Variables Obs Mean Std. Dev. Min Max Obs Mean Std. Dev. Min Max Obs Mean Std. Dev. Min Max<br />
Present value of earnings (working life) ∗ 1161 956.13 730.87 18.12 7861.67 704 694.56 321.93 18.12 1885.85 457 1359.07 964.47 77.02 7861.67<br />
Present value of earnings in Period 1 ∗ 1161 157.25 72.02 3.91 509.21 704 162.46 74.66 3.91 457.00 457 149.22 67.04 13.55 509.21<br />
Present value of earnings in Period 2 ∗ 1161 798.88 694.55 14.21 7533.31 704 532.10 251.57 14.21 1582.89 457 1209.84 922.21 63.48 7533.31<br />
NOTE: ∗ Earnings in thousands of dollars.<br />
Working life = From age 19 to 65.<br />
Period 1 = From age 19 to 29, inclusive.<br />
Period 2 = From age 30 to 65.



TABLE 2c<br />

COVARIATES INCLUDED IN OUTCOME, COST AND TEST EQUATIONS<br />

Columns, in order: Earnings; Tuition and Foregone Earnings (Cost of Schooling); Psychic Cost (Cost of Schooling); Test Scores<br />

Intercept Yes Yes Yes Yes<br />

Urb<strong>an</strong> Yes Yes Yes Yes<br />

South Yes Yes Yes Yes<br />

Cohort dummies Yes Yes Yes –<br />

Me<strong>an</strong> local unemployment rate Yes Yes – –<br />

Average local wage Yes Yes – –<br />

Local tuition – Yes – –<br />

Number <strong>of</strong> siblings – – Yes Yes<br />

Mother’s education – – Yes Yes<br />

Father’s education – – Yes Yes<br />

Broken family – – Yes<br />

Enrolled in school at test date – – Yes<br />

Age in 1980 – – Yes<br />

utility) in the first period (ages 19–29 years) and in the second period (ages 30 to<br />
65 years). Let $V_{1,s}$ be the period 1 gross utility of achieving schooling level $s$, and<br />
$V_{2,s}$ be the period 2 gross utility of obtaining schooling level $s$. Using (23), we write<br />
the gross utilities as<br />
$$V_{1,s} = \bar{\delta}_{1,s} + X'\bar{\beta}_{1,s} + \bar{\alpha}'_{1,s}\theta + \bar{\varepsilon}_{1,s}$$<br />
$$V_{2,s} = \bar{\delta}_{2,s} + X'\bar{\beta}_{2,s} + \bar{\alpha}'_{2,s}\theta + \bar{\varepsilon}_{2,s}$$<br />

These are the outcome equations for the model that we estimate. To see this, notice<br />
that<br />
$$\begin{aligned}
V_{1,s} &= \sum_{a=19}^{A_1} \frac{\ln Y_{a,s}}{(1+\rho)^a} \\
&= \sum_{a=19}^{A_1} \frac{\delta_{a,s} + X'\beta_{a,s} + \alpha'_{a,s}\theta + \varepsilon_{a,s} + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2}{(1+\rho)^a} \\
&= \sum_{a=19}^{A_1} \frac{\delta_{a,s} + X'\beta_{a,s} + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2}{(1+\rho)^a} + \left[\sum_{a=19}^{A_1} \frac{\alpha'_{a,s}}{(1+\rho)^a}\right]\theta + \sum_{a=19}^{A_1} \frac{\varepsilon_{a,s}}{(1+\rho)^a} \\
&= \bar{\delta}_{1,s} + X'\bar{\beta}_{1,s} + \bar{\alpha}'_{1,s}\theta + \bar{\varepsilon}_{1,s}
\end{aligned}$$



where<br />
$A_1 = 29$<br />
$\rho = 0.03$ (the prespecified discount rate)<br />
$$\bar{\delta}_{1,s} = \sum_{a=19}^{A_1} \frac{\delta_{a,s} + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2}{(1+\rho)^a}$$<br />
$$\bar{\beta}_{1,s} = \sum_{a=19}^{A_1} \frac{\beta_{a,s}}{(1+\rho)^a}$$<br />
$$\bar{\alpha}'_{1,s} = \sum_{a=19}^{A_1} \frac{\alpha'_{a,s}}{(1+\rho)^a}$$<br />
$$\bar{\varepsilon}_{1,s} = \sum_{a=19}^{A_1} \frac{\varepsilon_{a,s}}{(1+\rho)^a}$$<br />
and the terms for the second period of data for ages (30–65) are defined analogously.<br />
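The aggregation from age-specific coefficients to the period 1 coefficients can be illustrated with a short numerical check. This is a sketch under stated assumptions: all parameter values are randomly generated placeholders, experience is taken to be age minus 19, and the function name `aggregate_period` is hypothetical. It verifies that discounting $\ln Y_{a,s}$ age by age gives the same number as $\bar{\delta}_{1,s} + X'\bar{\beta}_{1,s} + \bar{\alpha}'_{1,s}\theta + \bar{\varepsilon}_{1,s}$.

```python
import numpy as np

RHO = 0.03                 # discount rate stated in the text
AGES = np.arange(19, 30)   # period 1 covers ages 19 through A_1 = 29


def aggregate_period(delta, beta, alpha, eps, eta1, eta2, experience, ages=AGES, rho=RHO):
    """Discounted sums that define the barred coefficients for one period."""
    w = 1.0 / (1.0 + rho) ** ages                       # discount weight per age
    delta_bar = np.sum(w * (delta + eta1 * experience + eta2 * experience**2))
    beta_bar = w @ beta                                 # shape (n_covariates,)
    alpha_bar = w @ alpha                               # shape (n_factors,)
    eps_bar = w @ eps
    return delta_bar, beta_bar, alpha_bar, eps_bar


# Placeholder inputs (not estimates from the article).
rng = np.random.default_rng(0)
n_ages, n_x, n_factors = len(AGES), 3, 2
delta = rng.normal(size=n_ages)
beta = rng.normal(size=(n_ages, n_x))
alpha = rng.normal(size=(n_ages, n_factors))
eps = rng.normal(scale=0.1, size=n_ages)
experience = (AGES - 19).astype(float)                  # assumed definition of experience
eta1, eta2 = 0.05, -0.001
x = rng.normal(size=n_x)
theta = rng.normal(size=n_factors)

# Age-by-age log earnings as in Equation (23), then discounted and summed.
log_y = delta + beta @ x + eta1 * experience + eta2 * experience**2 + alpha @ theta + eps
direct = np.sum(log_y / (1.0 + RHO) ** AGES)

d_bar, b_bar, a_bar, e_bar = aggregate_period(delta, beta, alpha, eps, eta1, eta2, experience)
via_bar = d_bar + x @ b_bar + a_bar @ theta + e_bar

print(direct, via_bar)
```

The two printed values agree up to floating-point error because the discounted sum is linear in each term of Equation (23).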

The cost or psychic net return of going to college is written as<br />
$$\ln P = \delta_P + Z'\gamma + \alpha'_P\theta + \varepsilon_P$$<br />
These costs can be negative as they entail both psychic and tuition components.<br />
Assuming that the agents know $X$, $Z$, $\theta$, and $\varepsilon_P$, the criterion for the choice of<br />
schooling is<br />
$$\begin{aligned}
V &= E(V_{1,c} + V_{2,c} - V_{1,h} - V_{2,h} \mid X, \theta) - E(\ln P \mid Z, X, \theta, \varepsilon_P) \\
&= \bar{\delta}_{1,c} + X'\bar{\beta}_{1,c} + \bar{\alpha}'_{1,c}\theta + \bar{\delta}_{2,c} + X'\bar{\beta}_{2,c} + \bar{\alpha}'_{2,c}\theta - \bar{\delta}_{1,h} - X'\bar{\beta}_{1,h} - \bar{\alpha}'_{1,h}\theta - \bar{\delta}_{2,h} \\
&\quad - X'\bar{\beta}_{2,h} - \bar{\alpha}'_{2,h}\theta - \delta_P - Z'\gamma - \alpha'_P\theta - \varepsilon_P \\
&= (\bar{\delta}_{1,c} + \bar{\delta}_{2,c} - \bar{\delta}_{1,h} - \bar{\delta}_{2,h} - \delta_P) + X'(\bar{\beta}_{1,c} + \bar{\beta}_{2,c} - \bar{\beta}_{1,h} - \bar{\beta}_{2,h}) - Z'\gamma \\
&\quad + (\bar{\alpha}'_{1,c} + \bar{\alpha}'_{2,c} - \bar{\alpha}'_{1,h} - \bar{\alpha}'_{2,h} - \alpha'_P)\theta - \varepsilon_P
\end{aligned}$$<br />
Individuals go to college if $V > 0$. We test (and do not reject) the hypothesis that at<br />
the time they make their college decision agents know their cost function and both<br />
factors, $\theta$, but not the uniquenesses in the outcome equations. These expressions<br />
can be modified in an obvious way to accommodate other information sets.<br />
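A minimal sketch of the choice rule follows. The factor loadings are the point estimates reported in Tables 3a and 3b; the intercept, covariate coefficients, and inputs passed in the example are hypothetical placeholders, as are the function names. The net loadings on $\theta$ in the choice index are the college loadings minus the high school loadings minus the cost loadings.

```python
import numpy as np

# Factor loadings on (theta_1, theta_2) as reported in Tables 3a and 3b.
ALPHA = {
    "1h": np.array([0.1419, 1.0000]),    # first period, high school
    "2h": np.array([0.2277, 1.6432]),    # second period, high school
    "1c": np.array([0.1888, 0.9402]),    # first period, college
    "2c": np.array([0.3908, 1.7217]),    # second period, college
    "P": np.array([-2.1250, -1.0278]),   # cost function
}


def net_choice_loadings(alpha=ALPHA):
    """Loadings on theta in the choice index V."""
    return alpha["1c"] + alpha["2c"] - alpha["1h"] - alpha["2h"] - alpha["P"]


def choice_index(intercept, x, beta_net, z, gamma, theta, eps_p, alpha=ALPHA):
    """V = intercept + X'beta_net - Z'gamma + (net loadings)'theta - eps_P."""
    return intercept + x @ beta_net - z @ gamma + net_choice_loadings(alpha) @ theta - eps_p


def goes_to_college(v):
    """Agents select college when V > 0."""
    return v > 0


# Hypothetical agent with all covariate terms switched off.
zero = np.zeros(1)
v_example = choice_index(0.0, zero, zero, zero, zero, np.array([1.0, 0.0]), 0.0)
print(net_choice_loadings(), v_example, goes_to_college(v_example))
```

The net loadings computed this way are close to the "Choice" loadings listed in Table 3b (2.3349 and 1.0466), which is a useful consistency check on the sign conventions.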

The test score equations have a similar structure. Let $T_j$ be test score $j$,<br />
$$T_j = X'_T \omega_j + \alpha'_{\text{test}_j}\theta + \varepsilon_{\text{test}_j}$$<br />
where $X_T$ is the vector of covariates in the test score equation, and $\omega_j$ is the<br />
coefficient vector. The distributions of the $\theta$ and $\varepsilon$ are nonparametrically identified<br />
under the assumptions supporting Theorems 1–5. In this article, we assume that



each factor is generated by a mixture of normals distribution,<br />
$$\theta_k \sim \sum_{j=1}^{J_k} p_{k,j}\, \phi(f_k \mid \mu_{j,k}, \tau_{j,k}), \qquad k = 1, \ldots, K \tag{24}$$<br />
Mixtures of normals with a large enough number of components approximate any<br />
distribution of $\theta_k$ and the $\varepsilon$ arbitrarily well (Ferguson, 1983). We assume that the<br />
$\varepsilon$’s are normal although, in principle, they are nonparametrically identified from<br />
the analysis of Theorem 5.<br />
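To make the mixture-of-normals specification concrete, the sketch below evaluates and samples from a two-component mixture. The weights, means, and standard deviations are hypothetical, and the second parameter of each component is treated here as a standard deviation, which is an assumption: the text does not state whether $\tau_{j,k}$ denotes a variance, a standard deviation, or a precision.

```python
import numpy as np


def mixture_pdf(x, weights, means, sds):
    """Density of a finite mixture of normals evaluated at the points x."""
    x = np.asarray(x, dtype=float)[..., None]           # broadcast over components
    comp = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2.0 * np.pi))
    return comp @ weights


def mixture_sample(n, weights, means, sds, rng):
    """Draw n values: pick a component by its weight, then draw from that normal."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[idx], sds[idx])


# Hypothetical two-component mixture with mean zero.
weights = np.array([0.6, 0.4])
means = np.array([-0.5, 0.75])
sds = np.array([0.5, 0.4])

grid = np.linspace(-5.0, 5.0, 2001)
pdf = mixture_pdf(grid, weights, means, sds)
total_mass = pdf.sum() * (grid[1] - grid[0])            # Riemann sum, should be close to 1

rng = np.random.default_rng(0)
draws = mixture_sample(200_000, weights, means, sds, rng)
print(total_mass, draws.mean())
```

Even with two components the density is skewed rather than bell-shaped, which is the kind of departure from normality the specification is meant to allow.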

We estimate the model using Markov Chain Monte Carlo methods as described<br />
in Appendix C for 55,000 iterations, discarding the first 5000 iterations to allow<br />
the chain to converge to its stationary distribution. We retain every 10th of the<br />
remaining 50,000 iterations for a total of 5000 retained iterations. 27 The Markov Chain<br />
mixes well, with most autocorrelations dying out at around lag 25 to 50.<br />
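The burn-in and thinning arithmetic can be sketched as follows. The chain below is a simulated AR(1) series standing in for sampler output, not output of the sampler in Appendix C, and the helper names are hypothetical. Which of every ten post-burn-in iterations was kept is not stated in the text, so starting at the first one is an assumption.

```python
import numpy as np


def thin_chain(chain, burn_in=5_000, thin=10):
    """Drop the first burn_in draws, then keep every thin-th remaining draw."""
    return np.asarray(chain)[burn_in::thin]


def autocorrelation(chain, lag):
    """Sample autocorrelation of a one-dimensional chain at a positive lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Stand-in chain: AR(1) with coefficient 0.9 (purely illustrative).
rng = np.random.default_rng(2)
chain = np.empty(55_000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

kept = thin_chain(chain)
print(kept.size)                          # number of retained draws
print(autocorrelation(chain, 1), autocorrelation(kept, 1))
```

With 55,000 iterations, a burn-in of 5000, and a thinning interval of 10, 5000 draws are retained, and the retained draws are less serially correlated than the raw chain.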

We estimate models with one factor and with two factors. The estimated coefficients<br />
are presented as Tables A1 through A5 in the supplementary tables<br />
on the website (http://lily.src.uchicago.edu/CHH estimating.html). The two-factor<br />
model specifies that the first factor only appears in test scores and choice equations<br />
whereas the second factor appears in all equations. No additional factors are<br />
necessary to fit our data. Thus, we conclude that the innovations in the earnings<br />
process ($\varepsilon_a^j$) are not in the agent’s information set at the time schooling decisions<br />
are made. If they were, they would be an additional source of covariance (i.e.,<br />
they would generate additional factors) between the choice equation and future<br />
earnings. If we use only one factor that enters in all equations, the quality of the fit<br />
is much poorer (results available on request). From this testing procedure we infer<br />
that agents know both components of $\theta$ at the time they enroll in college. Figure 1<br />
shows the fit of the density of the present value of log earnings (or lifetime utility<br />
of earnings excluding psychic costs and benefits) for everyone in the population.<br />
It graphs the actual and predicted densities of gross utility. The fit is very good.<br />
Results for each schooling group are available in the supplement on the website<br />
and are equally good ($\chi^2$ goodness of fit tests are passed overall as well as for<br />
the distribution of utility for each schooling group; see Table A6). In order to<br />
achieve this good fit it is necessary to allow for nonnormal factors. Figure 2 shows<br />
the density of each of the estimated factors and compares them with a benchmark<br />
normal with the same mean and standard deviation. Neither factor is normal. 28<br />
There is evidence of selection on ability (factor 1), with the less able less likely to<br />
attend college. There is weaker evidence of selection on factor 2 (see Figures A1<br />
and A2 posted at the website).<br />
Tables 3a and 3b present the factor loadings in the outcome, choice, and measurement<br />
equations. 29 Both factors have a positive effect on gross utility for both schooling<br />
levels in each period and on schooling attainment (the $I$). Factor 1 explains most<br />
of the variance in the test score system (see Table 3b) whereas factor 2 explains<br />

27 The run time was about 122 minutes on a 1.2 GHz AMD Athlon PC.<br />

28 The distributions of the factors by schooling level are shown in Figures A1 and A2 on the website.<br />

29 The coefficient estimates for the model are posted on the website.



[Figure 1 plot: actual and predicted densities of gross utility. Horizontal axis: Utility; vertical axis: Density(Utility).]<br />
All densities are estimated using a 100 point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
$\text{Utility} = \sum_a \frac{1}{(1+0.03)^a} \log(Y_{a,s})$<br />
FIGURE 1<br />
DENSITY OF EX POST GROSS UTILITY<br />
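The figure notes describe the density estimates as Gaussian kernel estimates on a 100 point grid with bandwidth 0.12. A minimal sketch of such an estimator is given below. The utility values are simulated placeholders, the function name is hypothetical, and taking the grid to run from the sample minimum to the sample maximum is an assumption about what "over the domain" means.

```python
import numpy as np


def gaussian_kde_grid(samples, bandwidth=0.12, n_grid=100):
    """Gaussian kernel density estimate evaluated on an evenly spaced grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min(), samples.max(), n_grid)
    u = (grid[:, None] - samples[None, :]) / bandwidth   # standardized distances
    kernel_sum = np.exp(-0.5 * u**2).sum(axis=1)
    density = kernel_sum / (samples.size * bandwidth * np.sqrt(2.0 * np.pi))
    return grid, density


# Simulated utility values (placeholders, not the NLSY sample).
rng = np.random.default_rng(1)
utility = rng.normal(loc=8.0, scale=1.0, size=2000)

grid, density = gaussian_kde_grid(utility)
mass = density.sum() * (grid[1] - grid[0])               # approximate integral over the grid
print(grid.size, mass)
```

Applying the same function to actual and model-predicted utilities and overlaying the two curves would give a comparison of the kind shown in Figure 1.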

most of the variance in the utility outcome system (see Table 3a). The return to<br />
college in terms of gross utility (gross utility differences) is given by<br />
$$\begin{aligned}
V_{1,c} + V_{2,c} - V_{1,h} - V_{2,h} &= (\bar{\delta}_{1,c} + \bar{\delta}_{2,c} - \bar{\delta}_{1,h} - \bar{\delta}_{2,h}) + X'(\bar{\beta}_{1,c} + \bar{\beta}_{2,c} - \bar{\beta}_{1,h} - \bar{\beta}_{2,h}) \\
&\quad + (\bar{\alpha}'_{1,c} + \bar{\alpha}'_{2,c} - \bar{\alpha}'_{1,h} - \bar{\alpha}'_{2,h})\theta + (\bar{\varepsilon}_{1,c} + \bar{\varepsilon}_{2,c} - \bar{\varepsilon}_{1,h} - \bar{\varepsilon}_{2,h})
\end{aligned}$$<br />
Both factors raise returns (see the base of Table 3a). Although the second factor<br />
explains much more of the variance in utility than the first factor, the first factor<br />
explains more of the variance in returns than the second factor, although it<br />
only explains 30% of the variance in returns. We infer that agents know $\theta$ (the<br />
factors) based on the superior fit of a model that includes nonzero factor loadings<br />
on both factors in the choice equation but not the innovations in outcomes (the<br />
$\varepsilon$’s in the outcome equations) at the time they make their schooling decisions.<br />
Our results indicate that the unpredictability in gross utility gains (i.e., differences)<br />
of going to college is much larger than the unpredictability in utility levels.<br />
Both factors have a negative impact on “costs” (the factor loadings are negative in<br />
the “cost” or psychic return function). Therefore, both factors positively influence



[Figure 2 plot: estimated densities of $\theta_1$ and $\theta_2$, each shown with a "normal version" of the same factor. Horizontal axis: $\theta$; vertical axis: Density($\theta$). Plot annotations: variance = 0.3019 and variance = 0.5747.]<br />
Normal densities are defined to be normal with same mean and variance as the corresponding $\theta$.<br />
All densities are estimated using a 100 point grid over the domain and a Gaussian kernel with<br />
bandwidth of 0.12.<br />
FIGURE 2<br />
FACTOR AND NORMAL DENSITIES<br />

the likelihood of going to college since both contribute positively to returns and<br />
negatively to costs.<br />
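The variance shares discussed above follow the formula in the notes to Table 3a: the share for factor $k$ is $\alpha_{s,a,k}^2 \sigma_{\theta_k}^2$ divided by total variance. The sketch below applies it to the first period high school equation. The loadings and total variance are the values reported in Table 3a. The factor variances are an assumption: they are the two values annotated in Figure 2, assigned here as 0.5747 for $\theta_1$ and 0.3019 for $\theta_2$, and the uniqueness variance is backed out so that the total matches the table. Because the table reports posterior summaries, the shares computed this way only approximate the reported ones.

```python
import numpy as np


def variance_shares(loadings, factor_variances, uniqueness_variance):
    """Total variance and the share explained by each factor."""
    loadings = np.asarray(loadings, dtype=float)
    factor_variances = np.asarray(factor_variances, dtype=float)
    explained = loadings**2 * factor_variances
    total = explained.sum() + uniqueness_variance
    return total, explained / total


# Assumed assignment of the Figure 2 variance annotations to (theta_1, theta_2).
SIGMA2_THETA = np.array([0.5747, 0.3019])

loadings_hs1 = np.array([0.1419, 1.0000])   # Table 3a, first period, high school
total_reported = 0.3460                     # Table 3a, total variance for that equation

# Back out the uniqueness variance implied by the reported total (an illustration only).
uniqueness = total_reported - np.sum(loadings_hs1**2 * SIGMA2_THETA)

total, shares = variance_shares(loadings_hs1, SIGMA2_THETA, uniqueness)
print(total, shares)
```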

Figure 3 plots the estimated ex post factual and counterfactual gross college<br />
utility densities for college graduates and high school graduates, respectively<br />
(see Figure A3 on the website for the corresponding figure for high school utility).<br />
College graduates have the highest level of gross utility both as high school graduates<br />
and as college graduates. They also have the highest gross gains of going to<br />
college as demonstrated in Figure 4. 30,31 Figure 5 presents the marginal treatment<br />
effect as defined in Equation (6) using utils as the outcome. This is the mean ex post<br />
gross gain in utils of going to college as a function of $\varepsilon_W$, which is an index of variables<br />
that increase the likelihood of enrollment in college. It shows that individuals<br />

30 If we consider net gains by subtracting costs, the difference between college graduates and high<br />
school graduates will be even higher because costs are lower for college graduates.<br />
31 We can also compute gross utility gains as a percentage of the gross utility in high school as<br />
$$R = \frac{V_{1,c} + V_{2,c}}{V_{1,h} + V_{2,h}} - 1.$$<br />
See Figure A4 on the website.



TABLE 3a<br />

FACTOR LOADINGS<br />

Utility | Factor | Loading | Standard Error<br />
Potential first period utility in high school | $\theta_1$ | 0.1419 | 0.0324<br />
Potential first period utility in high school | $\theta_2$ | 1.0000 | 0.0000<br />
Total variance: 0.3460. Proportion of total variance explained by: $\theta_1$ = 0.0351, $\theta_2$ = 0.8717<br />
Potential second period utility in high school | $\theta_1$ | 0.2277 | 0.0519<br />
Potential second period utility in high school | $\theta_2$ | 1.6432 | 0.0262<br />
Total variance: 0.8951. Proportion of total variance explained by: $\theta_1$ = 0.0349, $\theta_2$ = 0.9096<br />
Potential first period utility in college | $\theta_1$ | 0.1888 | 0.0559<br />
Potential first period utility in college | $\theta_2$ | 0.9402 | 0.0676<br />
Total variance: 0.3455. Proportion of total variance explained by: $\theta_1$ = 0.0634, $\theta_2$ = 0.7718<br />
Potential second period utility in college | $\theta_1$ | 0.3908 | 0.0979<br />
Potential second period utility in college | $\theta_2$ | 1.7217 | 0.1203<br />
Total variance: 1.0860. Proportion of total variance explained by: $\theta_1$ = 0.0848, $\theta_2$ = 0.8241<br />
Total variance for schooling $s$ at period $a$ = $\alpha_{s,a,1}^2 \sigma_{\theta_1}^2 + \alpha_{s,a,2}^2 \sigma_{\theta_2}^2 + \sigma_{\varepsilon_{s,a}}^2$.<br />
Proportion of total variance explained by factor $k$, in schooling $s$ at period $a$ = $\alpha_{s,a,k}^2 \sigma_{\theta_k}^2$ / Total variance.<br />
Returns | Factor | Loading | Standard Error<br />
(Ut. College − Ut. High School) | $\theta_1$ | 0.2039 | 0.1523<br />
(Ut. College − Ut. High School) | $\theta_2$ | 0.0235 | 0.1892<br />
Total variance: 0.2835. Proportion of total variance explained by: $\theta_1$ = 0.1164, $\theta_2$ = 0.0359<br />
Total variance = $(\alpha_{c,2,1} + \alpha_{c,1,1} - \alpha_{HS,2,1} - \alpha_{HS,1,1})^2 \sigma_{\theta_1}^2 + (\alpha_{c,2,2} + \alpha_{c,1,2} - \alpha_{HS,2,2} - \alpha_{HS,1,2})^2 \sigma_{\theta_2}^2 + \sigma_{\varepsilon_{c,2}}^2 + \sigma_{\varepsilon_{c,1}}^2 + \sigma_{\varepsilon_{HS,2}}^2 + \sigma_{\varepsilon_{HS,1}}^2$.<br />
Proportion of total variance explained by factor $k$ = $(\alpha_{c,2,k} + \alpha_{c,1,k} - \alpha_{HS,2,k} - \alpha_{HS,1,k})^2 \sigma_{\theta_k}^2$ / Total variance.<br />

who are likely to enroll in college have higher returns to college than those who are<br />
unlikely to enroll in college, who have lower values of $\varepsilon_W$. Figure 5 also shows the<br />
distribution of $\varepsilon_W$ in the population. Most of the mass of this distribution is at values<br />
of $\varepsilon_W$ around 0. Many individuals have negative gross utility returns (excluding<br />
psychic benefits of going to college). Even among those deciding to go to college,<br />
39.53% would have higher utility (ignoring psychic components) had they not<br />
gone to college. There is a definite fall in utility gains as college enrollment is expanded<br />
to the less college prone. Table 4 confirms Figure 3 and shows that college<br />
graduates have higher ex post potential high school and college utility than high<br />
school graduates in high school and in college (these are gross utilities). Table 5<br />
shows that the gross ex post returns of going to college are higher for those who



TABLE 3b<br />

FACTOR LOADINGS<br />

AFQT | Factor | Loading | Standard Error | Total variance | Proportion of total variance explained by $\theta_1$<br />
Arithmetic reasoning | $\theta_1$ | 1.0000 | 0.0000 | 0.7764 | 0.7391<br />
Coding speed | $\theta_1$ | 0.9672 | 0.0275 | 0.7340 | 0.7308<br />
Math knowledge | $\theta_1$ | 0.6313 | 0.0350 | 0.8049 | 0.2843<br />
Word knowledge | $\theta_1$ | 0.7508 | 0.0317 | 0.6193 | 0.5219<br />
Paragraph composition | $\theta_1$ | 0.8080 | 0.0345 | 0.7061 | 0.5301<br />
Total variance for test $t$ = $\alpha_{t,1}^2 \sigma_{\theta_1}^2 + \alpha_{t,2}^2 \sigma_{\theta_2}^2 + \sigma_{\varepsilon_t}^2$.<br />
Proportion of total variance explained by factor $k$ = $\alpha_{t,k}^2 \sigma_{\theta_k}^2$ / Total variance.<br />
Choice | Factor | Loading | Standard Error<br />
Cost function ∗ | $\theta_1$ | −2.1250 | 0.5042<br />
Cost function ∗ | $\theta_2$ | −1.0278 | 0.3799<br />
Total variance: 0.8951. Proportion of total variance explained by: $\theta_1$ = 0.0349, $\theta_2$ = 0.9096<br />
Choice ∗∗ | $\theta_1$ | 2.3349 | 0.4904<br />
Choice ∗∗ | $\theta_2$ | 1.0466 | 0.4277<br />
Total variance: 6.1544. Proportion of total variance explained by: $\theta_1$ = 0.5297, $\theta_2$ = 0.0604<br />
∗ Cost = $\mu_{cost} + \alpha_{cost,1}\theta_1 + \alpha_{cost,2}\theta_2 + \varepsilon_{cost}$.<br />
∗∗ Choice = $\mu_{c,2} + \mu_{c,1} - \mu_{HS,2} - \mu_{HS,1} + (\alpha_{c,2,1} + \alpha_{c,1,1} - \alpha_{HS,2,1} - \alpha_{HS,1,1} - \alpha_{cost,1})\theta_1 + (\alpha_{c,2,2} + \alpha_{c,1,2} - \alpha_{HS,2,2} - \alpha_{HS,1,2} - \alpha_{cost,2})\theta_2 - \mu_{cost} - \varepsilon_{cost}$.<br />
Total variance of cost = $\alpha_{cost,1}^2 \sigma_{\theta_1}^2 + \alpha_{cost,2}^2 \sigma_{\theta_2}^2 + \sigma_{\varepsilon_{cost}}^2$.<br />
Proportion of total variance of cost explained by factor $k$ = $\alpha_{cost,k}^2 \sigma_{\theta_k}^2$ / Total variance of cost.<br />
Total variance of choice = $(\alpha_{c,2,1} + \alpha_{c,1,1} - \alpha_{HS,2,1} - \alpha_{HS,1,1} - \alpha_{cost,1})^2 \sigma_{\theta_1}^2 + (\alpha_{c,2,2} + \alpha_{c,1,2} - \alpha_{HS,2,2} - \alpha_{HS,1,2} - \alpha_{cost,2})^2 \sigma_{\theta_2}^2 + \sigma_{\varepsilon_{cost}}^2$.<br />
Proportion of total variance explained by factor $k$ = $(\alpha_{c,2,k} + \alpha_{c,1,k} - \alpha_{HS,2,k} - \alpha_{HS,1,k} - \alpha_{cost,k})^2 \sigma_{\theta_k}^2$ / Total variance of choice.<br />

choose to go to college. These results are expected given the pattern shown in<br />
Figure 4. The returns for attending college for the average high school graduate<br />
are negative. The ex post gross returns to college for the individual at the margin<br />
($V = 0$) are about 0.59% of total high school utility. Since these individuals are<br />
exactly at the margin, these gains correspond exactly to the cost they are facing.<br />
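The comparisons in this discussion (mean gross gains by schooling choice and the share of college goers who would have had higher gross utility in high school) are simple summaries of factual and counterfactual utilities. The sketch below shows the computation on a tiny made-up data set; the function name and all numbers are hypothetical and are not estimates from the article. In the article the counterfactual utilities come from the estimated factor model rather than from observed data.

```python
import numpy as np


def summarize_gains(v_college, v_high_school, chose_college):
    """Mean gross gain V_c - V_h by choice, and share of college goers with a negative gain."""
    gain = np.asarray(v_college, dtype=float) - np.asarray(v_high_school, dtype=float)
    chose = np.asarray(chose_college, dtype=bool)
    return {
        "mean_gain_college": gain[chose].mean(),
        "mean_gain_high_school": gain[~chose].mean(),
        "share_college_negative_gain": (gain[chose] < 0).mean(),
    }


# Made-up potential utilities for four people (illustration only).
v_c = [9.0, 8.0, 7.5, 8.5]
v_h = [8.0, 8.5, 7.8, 8.0]
chose = [True, True, False, False]

summary = summarize_gains(v_c, v_h, chose)
print(summary)
```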



[Figure 3 plot: densities of gross college utility by schooling choice. Legend: High School∗, College∗∗. Horizontal axis: Utility; vertical axis: Density(Utility).]<br />
∗ Counterfactual: $f(V_c \mid \text{Choice} = \text{High School})$<br />
∗∗ Predicted: $f(V_c \mid \text{Choice} = \text{College})$<br />
All densities are estimated using a 100 point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
$\text{Utility} = \sum_a \frac{1}{(1+0.03)^a} \log(Y_{a,s})$<br />
FIGURE 3<br />
DENSITY OF EX POST COLLEGE GROSS UTILITY<br />

Once we account for the nonmonetary costs and benefits of going to college (net<br />
returns reported in the bottom two rows of Table 5), the relative returns of going<br />
to college become more negative for high school graduates and more positive for<br />
college graduates. Since $\ln P$ can be allocated as either a cost or a return, there<br />
are two ways to compute returns depending on whether $\ln P$ is treated as a cost<br />
(row 2) or a return (row 3). We present two sets of net return estimates depending<br />
on how costs or gains ($\ln P$) are allocated. These are bounds since the actual<br />
allocation between cost and benefit is indeterminate.<br />
The patterns of Figures 3–5 are essentially reproduced for ex post present<br />
value of earnings in Figures 6–8. Table 6 shows that college graduates have earnings<br />
57.6% higher than they would have had (or $608,372 higher, on average) if<br />
they had not attended college. High school graduates have a gross gain of 43%<br />
(or $362,987) if they go to college. Notice that even though the utility gains of<br />
going to college are negative for high school graduates, the money returns are<br />
positive and large. Table 7 shows that even though 39.66% of the persons going<br />
to college would have had a higher utility in high school than in college



TABLE 4<br />
AVERAGE GROSS UTILITY IN DIFFERENT STATES (FACTUAL OR COUNTERFACTUAL) FOR PERSONS WHO GO TO HIGH SCHOOL OR WHO GO TO COLLEGE AND FOR PEOPLE AT THE MARGIN<br />
(Does Not Include Utility “Cost” or Psychic Returns to College)<br />
<table>
<tr><th rowspan="2">Factual or Counterfactual Schooling Level</th><th colspan="2">Actual Schooling Level</th><th rowspan="2">Utility for People at Margin<sup>3</sup></th></tr>
<tr><th>High School<sup>1</sup></th><th>College<sup>2</sup></th></tr>
<tr><td>High school<sup>+</sup></td><td>7.8580</td><td>8.6125</td><td>8.305</td></tr>
<tr><td>Std. error</td><td>0.0604</td><td>0.0737</td><td>0.1388</td></tr>
<tr><td>College<sup>++</sup></td><td>7.7262</td><td>8.6885</td><td>8.305</td></tr>
<tr><td>Std. error</td><td>0.0638</td><td>0.0763</td><td>0.1388</td></tr>
</table>
Column 1: row + is $E(V_h \mid \text{choice} = \text{high school})$ and row ++ is $E(V_c \mid \text{choice} = \text{high school})$.<br />
Column 2: row + is $E(V_h \mid \text{choice} = \text{college})$ and row ++ is $E(V_c \mid \text{choice} = \text{college})$.<br />
Column 3: row + is $E(V_h \mid V = 0)$ and row ++ is $E(V_c \mid V = 0)$.<br />

Horizontal axis: utility differences. Vertical axis: density of utility differences.<br />
* High School: $f(V_c - V_h \mid \text{Choice} = \text{High School})$. ** College: $f(V_c - V_h \mid \text{Choice} = \text{College})$.<br />
All densities are estimated using a 100-point grid over the domain and a Gaussian kernel with bandwidth of 0.12. Utility $= \sum_a \frac{1}{(1 + 0.03)^a} \log(Y_{a,s})$.<br />
FIGURE 4<br />
DENSITY OF EX POST GROSS UTILITY DIFFERENCES (COLLEGE–HIGH SCHOOL)<br />

(ignoring psychic gains), only 6.9% of this population had higher earnings in high school than in college. Once we account for psychic benefits, the proportion of college students regretting their decisions is roughly the same whether we measure regret in present value or utils. This shows the importance of accounting



TABLE 5<br />
FACTUAL AND COUNTERFACTUAL RETURNS FOR PERSONS WHO GO TO HIGH SCHOOL OR COLLEGE<br />
<table>
<tr><th></th><th>High School<sup>1</sup></th><th>College<sup>2</sup></th></tr>
<tr><td colspan="3">Gross return</td></tr>
<tr><td>College vs. high school (relative)<sup>+</sup></td><td>−0.0180</td><td>0.0126</td></tr>
<tr><td>Std. error</td><td>0.1590</td><td>0.0178</td></tr>
<tr><td colspan="3">Net returns</td></tr>
<tr><td>College vs. high school (relative)<sup>++</sup></td><td>−0.2398</td><td>0.3161</td></tr>
<tr><td>Std. error</td><td>0.2502</td><td>0.3178</td></tr>
<tr><td>College vs. high school (relative)<sup>+++</sup></td><td>−0.4227</td><td>0.1892</td></tr>
<tr><td>Std. error</td><td>0.5770</td><td>0.0144</td></tr>
</table>
Column 1, row +: $E((V_c/V_h) - 1 \mid \text{choice} = \text{high school})$.<br />
Column 2, row +: $E((V_c/V_h) - 1 \mid \text{choice} = \text{college})$.<br />
Column 1, row ++: $E((V_c/V_h - \ln P)/(V_h + \ln P) \mid \text{choice} = \text{high school})$.<br />
Column 2, row ++: $E((V_c/V_h - \ln P)/(V_h + \ln P) \mid \text{choice} = \text{college})$.<br />
Column 1, row +++: $E((V_c/V_h - \ln P)/V_h \mid \text{choice} = \text{high school})$.<br />
Column 2, row +++: $E((V_c/V_h - \ln P)/V_h \mid \text{choice} = \text{college})$.<br />
NOTE: We make the distinction between the second and third line in this table because in our framework we cannot separate nonmonetary costs from nonmonetary benefits of going to college, so we allocate ln P both ways.<br />

for psychic returns in analyzing schooling choices. Among high school graduates, 95.90% do not regret not going to college (measured in utils), but 85.26% regret the decision financially. The marginal treatment effect has the same general shape when present values of earnings are used instead of gross utility (see Figure 8).<br />

Table 8 shows the probability of being in decile i of the college potential discounted earnings distribution conditional on being in decile j of the high school potential earnings distribution. (These are gross earnings.) It shows that neither an independence assumption across counterfactual outcomes, which is the Veil of Ignorance assumption used in applied welfare theory (see, e.g., Sen, 1973) or in aggregate income inequality decompositions (DiNardo et al., 1996), nor a perfect ranking assumption, which is sometimes used to construct counterfactual joint distributions of outcomes (see, e.g., Heckman et al., 1997, or Athey and Imbens, 2002), is satisfied in the data. There is strong positive dependence between potential outcomes in each counterfactual state, but the dependence is not perfect: there are substantial nonzero elements off the diagonal. We get similar results for utility (discounted log earnings). See Table A13 at our website.<br />
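The construction behind a decile transition table such as Table 8 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the helper names and the synthetic outcomes (two series sharing one common factor) are hypothetical, and the interval convention $(d_k, d_{k+1}]$ follows the table title.

```python
import random


def interior_deciles(values):
    """Nine cutoffs d_1 <= ... <= d_9 (assumes at least 10 observations).

    d_k is the largest value among the bottom k tenths of the sample.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // 10 - 1] for k in range(1, 10)]


def decile_index(value, cutoffs):
    """Index 0..9 of the interval (d_k, d_{k+1}] that contains value."""
    return sum(value > c for c in cutoffs)


def decile_transition_matrix(high_school, college):
    """Row j, column i: share of people in high school decile j in college decile i."""
    hs_cut = interior_deciles(high_school)
    col_cut = interior_deciles(college)
    counts = [[0] * 10 for _ in range(10)]
    for h, c in zip(high_school, college):
        counts[decile_index(h, hs_cut)][decile_index(c, col_cut)] += 1
    return [[cell / max(sum(row), 1) for cell in row] for row in counts]


# Synthetic potential outcomes sharing one common factor (hypothetical numbers).
rng = random.Random(0)
factor = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
pv_high_school = [f + rng.gauss(0.0, 0.5) for f in factor]
pv_college = [1.2 * f + rng.gauss(0.0, 0.5) for f in factor]
matrix = decile_transition_matrix(pv_high_school, pv_college)
```

Under perfect ranking the matrix would be the identity, and under independence every cell would be close to 0.10; a matrix with mass concentrated near, but not only on, the diagonal corresponds to the intermediate case described in the text.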

We have already shown that there is a large dispersion in the distribution of utilities, utility returns, earnings, and earnings returns to college. However, this dispersion can be due to heterogeneity that is known at the time the agent makes schooling decisions, or it can be due to heterogeneity that is not predictable by the agent at that time. Figure 9 plots the densities of the unforecastable component of college gross utilities at the time college decisions are made for fixed X values,



Horizontal axis: $\varepsilon_W$. Left vertical axis: marginal treatment effect (MTE). Right vertical axis: density of $\varepsilon_W$.<br />
$\varepsilon_W = (\alpha'_{1,c} + \alpha'_{2,c} - \alpha'_{1,h} - \alpha'_{2,h} - \alpha'_{p})\theta - \varepsilon_{p}$<br />
All densities are estimated using a 100-point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
FIGURE 5<br />
DENSITY OF $\varepsilon_W$ AND MARGINAL TREATMENT EFFECT: $E(V_c - V_h \mid \varepsilon_W)$<br />

under three different information sets. (The X are fixed at their means.) The solid line corresponds to the case where the agent knows neither his factors ($\theta$) nor his innovations (the $\varepsilon$'s in the outcome equations). The other two lines correspond, respectively, to the cases where the agent knows $\theta_2$ only, or both $\theta_1$ and $\theta_2$.<sup>32</sup> Knowledge of $\theta_2$ dramatically decreases the uncertainty faced, but knowledge of factor 1 (associated with cognitive ability) has only a small effect on the amount of uncertainty faced by the agent. We obtain a similar figure in terms of gross utility in high school.<sup>33</sup> However, even though knowledge of $\theta_2$ dramatically reduces the amount of uncertainty faced in terms of the levels of gross utility in each counterfactual state, it has only a small effect on the uncertainty faced in terms of returns (see Figure 10). Table 9 reports the variances of gross and net utility and gross and net present value of earnings under different information sets of agents. Giving agents more information (knowledge of factors) reduces the variance in utilities or present values as perceived by agents. However, reducing uncertainty<br />

32 If the agent knows $\theta_1$, $\theta_2$, $\varepsilon_{\text{college}}$, and $\varepsilon_{\text{high school}}$, then he faces no uncertainty.<br />
33 These results are available on request from the authors, and are posted on the website.



Horizontal axis: earnings differences (1000's). Vertical axis: density of earnings differences (scale $\times 10^{-3}$).<br />
* High School: $f(PV_c - PV_h \mid \text{Choice} = \text{High School})$. ** College: $f(PV_c - PV_h \mid \text{Choice} = \text{College})$.<br />
$PV_h = \sum_a \frac{1}{(1 + 0.03)^a} Y_{h,a}$<br />
All densities are estimated using a 100-point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
FIGURE 6<br />
DENSITY OF EX POST GROSS LIFETIME EARNINGS DIFFERENCES (COLLEGE–HIGH SCHOOL)<br />

barely budges the forecast returns to schooling measured in dollars or utils, which is the message of Figure 10. Analogous results are obtained for present value of earnings. See Figures A15 and A16 posted at our website.<br />
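The contrast between levels and returns can be illustrated with a two-factor structure of the form $W_s = \alpha_{1,s}\theta_1 + \alpha_{2,s}\theta_2 + \varepsilon_s$. The sketch below is an assumption-laden illustration, not the estimated model: the loadings, factor variances, and the helper `forecast_variance` are hypothetical, and the factors and innovations are assumed mutually independent. When the loadings on $\theta_2$ are similar in the two schooling states, learning $\theta_2$ removes most of the variance in levels but little of the variance in the college–high school difference.

```python
def forecast_variance(loadings, factor_vars, noise_var, known=()):
    """Variance of W that remains unforecastable given the known factors.

    Assumes W = sum_k loadings[k] * theta_k + eps with independent terms.
    """
    unknown = [k for k in loadings if k not in known]
    return sum(loadings[k] ** 2 * factor_vars[k] for k in unknown) + noise_var


# Hypothetical parameters (not estimates from the paper).
factor_vars = {"theta1": 1.0, "theta2": 1.0}
college = {"theta1": 0.30, "theta2": 1.50}
high_school = {"theta1": 0.25, "theta2": 1.45}
noise_college, noise_high_school = 0.10, 0.10

# Loadings and noise variance of the difference W_college - W_high_school.
gap = {k: college[k] - high_school[k] for k in college}
noise_gap = noise_college + noise_high_school

for known in [(), ("theta2",), ("theta1", "theta2")]:
    print(
        known,
        round(forecast_variance(college, factor_vars, noise_college, known), 4),
        round(forecast_variance(gap, factor_vars, noise_gap, known), 4),
    )
```

In this toy parameterization the innovations dominate the variance of the difference, so adding factors to the information set changes it very little, which mirrors the qualitative pattern reported in Table 9.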

The fact that a two-factor model is adequate to fit the data implies that the agents cannot forecast future shocks of log earnings ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$) at the time they make their schooling decision. (If they could, the shocks would enter as additional factors in the estimated model.) Even though the factors ($\theta$) explain most of the variance in the levels of utilities, they explain less than half of the variance in returns. This may lead the reader to conclude that the reason so many college graduates would have higher gross utility in high school than in college (39%) is that they cannot accurately forecast their returns to college. However, this is not the case. As shown in Table 7, once we account for psychic benefits or costs of attending college (P) relative to attending high school, only 8% of college graduates regret going to college. This suggests that a substantial part of the gain to college is due to nonpecuniary components. Furthermore, Table 10 shows that if individuals had



Horizontal axis: returns. Vertical axis: density of returns.<br />
* High School: $f((PV_c/PV_h) - 1 \mid \text{Choice} = \text{High School})$. ** College: $f((PV_c/PV_h) - 1 \mid \text{Choice} = \text{College})$.<br />
$PV_h = \sum_a \frac{1}{(1 + 0.03)^a} Y_{h,a}$<br />
All densities are estimated using a 100-point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
FIGURE 7<br />
DENSITY OF EX POST RELATIVE GROSS EARNINGS DIFFERENCES (COLLEGE–HIGH SCHOOL)<br />

knowledge of ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$), keeping their average expected earnings the same, very few of them would change their schooling decision. Uncertainty in gains to schooling is substantial, but resolving this uncertainty has a very small effect on the choice of schooling because the variance of gains is so much smaller than the variance of psychic costs or benefits, and it is the latter that drives most of the heterogeneity in schooling decisions. In addition, there is uncertainty about the level of both college and high school earnings. See the variances reported for each in Table 9. The uncertainty in the return comes from both sources, although the literature emphasizes the uncertainty in college earnings. When conducting this experiment, we make sure that the average expected earnings are the same because a mean preserving reduction in the uncertainty faced by the agents in terms of utility is not the same as a mean preserving change in uncertainty in terms of levels of earnings (see Appendix D).<sup>34</sup> In particular, a change in the variance of ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$) would not change the expected utility in each schooling level<br />

34 See the numbers posted at the website.



Horizontal axis: $\varepsilon_W$. Left vertical axis: relative marginal treatment effect (MTE). Right vertical axis: density of $\varepsilon_W$.<br />
$\varepsilon_W = (\alpha'_{1,c} + \alpha'_{2,c} - \alpha'_{1,h} - \alpha'_{2,h} - \alpha'_{p})\theta - \varepsilon_{p}$<br />
All densities are estimated using a 100-point grid over the domain and a Gaussian kernel with bandwidth of 0.12.<br />
FIGURE 8<br />
DENSITY OF $\varepsilon_W$ AND RELATIVE MARGINAL EX POST TREATMENT EFFECT FOR PRESENT VALUE OF GROSS EARNINGS $E((PV_c/PV_h) - 1 \mid \varepsilon_W)$<br />

but would change expected earnings in each schooling level. The numbers reported in Table 10 take this into account. When agents know their ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$), they face less uncertainty. Knowing these components is equivalent to setting = 0 in the expression at the end of Section 6, a special case of mean preserving shrinkage where variances are set to zero. The expected utility at each schooling level increases.<sup>35</sup><br />

8. SOME EVIDENCE ON AN EDUCATIONAL REFORM<br />

Using the estimated model, we evaluate the effect of a full subsidy to college tuition. We move beyond the Veil of Ignorance, which is based on an anonymity<br />

35 We compute the compensation (which can be negative or positive) required by each individual to keep average earnings the same after the uncertainty is reduced. Then we provide the individual with this compensation together with knowledge of ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$) and finally we compute the percentage of individuals who would change their schooling decision if they had knowledge of ($\bar\varepsilon_{1,c}, \bar\varepsilon_{2,c}, \bar\varepsilon_{1,h}, \bar\varepsilon_{2,h}$) but had the same present value of earnings in each schooling level. We use the procedure described at the end of Section 6 applied to each period to adjust utility for the effects of mean preserving spreads in earnings (see Appendix D).



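TABLE 6<br />
RETURNS TO COLLEGE IN TERMS OF LIFETIME EARNINGS (EXCLUDES NONPECUNIARY RETURNS) FOR PEOPLE WHO GO TO HIGH SCHOOL, COLLEGE, OR ARE AT THE MARGIN<br />
<table>
<tr><th></th><th>High School<sup>1</sup></th><th>College<sup>2</sup></th><th>Earnings for People at Margin<sup>3</sup></th></tr>
<tr><td colspan="4">Gross returns</td></tr>
<tr><td>College vs. high school<sup>+</sup></td><td>0.4379</td><td>0.5764</td><td>0.5274</td></tr>
<tr><td>Std. error</td><td>0.0228</td><td>0.0365</td><td>0.0634</td></tr>
<tr><td colspan="4">Net returns</td></tr>
<tr><td>College vs. high school<sup>++</sup></td><td>0.4162</td><td>0.5607</td><td>0.5092</td></tr>
<tr><td>Std. error</td><td>0.0213</td><td>0.0366</td><td>0.0605</td></tr>
</table>
Column 1, row +: $E((PV_c/PV_h) - 1 \mid \text{choice} = \text{high school})$.<br />
Column 2, row +: $E((PV_c/PV_h) - 1 \mid \text{choice} = \text{college})$.<br />
Column 3, row +: $E((PV_c/PV_h) - 1 \mid V = 0)$.<br />
Column 1, row ++: $E((PV_c/(PV_h + PV_{\text{tuition}})) - 1 \mid \text{choice} = \text{high school})$.<br />
Column 2, row ++: $E((PV_c/(PV_h + PV_{\text{tuition}})) - 1 \mid \text{choice} = \text{college})$.<br />
Column 3, row ++: $E((PV_c/(PV_h + PV_{\text{tuition}})) - 1 \mid V = 0)$.<br />
$PV_j = \sum_a \left(\frac{1}{1 + 0.03}\right)^a Y_{a,j}$, that is, the interest rate is 3%.<br />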
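The present value and return definitions in the table notes translate directly into a short calculation. The sketch below is illustrative only: the function names and the flat earnings streams are hypothetical, and the period index $a$ is assumed to start at 0, which the notes do not specify.

```python
def present_value(earnings, rate=0.03):
    """Sum over a of (1 / (1 + rate))**a * earnings[a], with a starting at 0."""
    return sum(y / (1.0 + rate) ** a for a, y in enumerate(earnings))


def gross_return(pv_college, pv_high_school):
    """(PV_c / PV_h) - 1."""
    return pv_college / pv_high_school - 1.0


def net_return(pv_college, pv_high_school, pv_tuition):
    """(PV_c / (PV_h + PV_tuition)) - 1, following the table note."""
    return pv_college / (pv_high_school + pv_tuition) - 1.0


# Hypothetical flat earnings streams over 17 periods (illustration only).
pv_h = present_value([30_000.0] * 17)
pv_c = present_value([0.0] * 4 + [55_000.0] * 13)
pv_t = present_value([5_000.0] * 4)

print(round(gross_return(pv_c, pv_h), 4), round(net_return(pv_c, pv_h, pv_t), 4))
```

Because tuition enters the denominator, the net return is always below the gross return whenever the present value of tuition is positive, as in the two rows of Table 6.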

TABLE 7<br />
PERCENTAGE OF PEOPLE WITH NEGATIVE RETURNS TO COLLEGE (NET AND GROSS)<br />
<table>
<tr><th rowspan="2"></th><th colspan="2">Gross</th><th colspan="2">Net<sup>∗</sup></th></tr>
<tr><th>Utility</th><th>Earnings</th><th>Utility</th><th>Earnings</th></tr>
<tr><td>High school graduates</td><td>56.22%</td><td>13.62%</td><td>95.91%</td><td>14.74%</td></tr>
<tr><td>College graduates</td><td>39.66%</td><td>6.90%</td><td>8.32%</td><td>7.28%</td></tr>
</table>
∗ Net means net of total cost for utility and net of tuition costs for earnings.<br />

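TABLE 8<br />
$\Pr(d_i < V_c \le d_{i+1} \mid d_j < V_h \le d_{j+1})$ WHERE $d_i$ IS THE iTH DECILE OF THE PRESENT VALUE OF THE COLLEGE EARNINGS DISTRIBUTION AND $d_j$ IS THE jTH DECILE OF THE PRESENT VALUE OF THE HIGH SCHOOL EARNINGS DISTRIBUTION<sup>∗</sup><br />
<table>
<tr><th rowspan="2">High School Deciles</th><th colspan="10">College Deciles</th></tr>
<tr><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th></tr>
<tr><td>1</td><td>0.7436</td><td>0.1936</td><td>0.0459</td><td>0.0121</td><td>0.0035</td><td>0.0009</td><td>0.0003</td><td>0.0001</td><td>0.0000</td><td>0.0000</td></tr>
<tr><td>2</td><td>0.1846</td><td>0.3799</td><td>0.2372</td><td>0.1173</td><td>0.0503</td><td>0.0206</td><td>0.0072</td><td>0.0022</td><td>0.0006</td><td>0.0001</td></tr>
<tr><td>3</td><td>0.0482</td><td>0.2219</td><td>0.2640</td><td>0.2078</td><td>0.1337</td><td>0.0727</td><td>0.0344</td><td>0.0131</td><td>0.0036</td><td>0.0005</td></tr>
<tr><td>4</td><td>0.0154</td><td>0.1108</td><td>0.1944</td><td>0.2172</td><td>0.1902</td><td>0.1371</td><td>0.0806</td><td>0.0389</td><td>0.0134</td><td>0.0021</td></tr>
<tr><td>5</td><td>0.0055</td><td>0.0535</td><td>0.1240</td><td>0.1781</td><td>0.1986</td><td>0.1807</td><td>0.1372</td><td>0.0819</td><td>0.0341</td><td>0.0065</td></tr>
<tr><td>6</td><td>0.0019</td><td>0.0253</td><td>0.0732</td><td>0.1274</td><td>0.1706</td><td>0.1917</td><td>0.1826</td><td>0.1359</td><td>0.0740</td><td>0.0175</td></tr>
<tr><td>7</td><td>0.0006</td><td>0.0103</td><td>0.0382</td><td>0.0788</td><td>0.1271</td><td>0.1728</td><td>0.2011</td><td>0.1926</td><td>0.1357</td><td>0.0427</td></tr>
<tr><td>8</td><td>0.0001</td><td>0.0038</td><td>0.0171</td><td>0.0422</td><td>0.0802</td><td>0.1300</td><td>0.1816</td><td>0.2257</td><td>0.2173</td><td>0.1020</td></tr>
<tr><td>9</td><td>0.0000</td><td>0.0008</td><td>0.0053</td><td>0.0165</td><td>0.0379</td><td>0.0740</td><td>0.1288</td><td>0.2082</td><td>0.2919</td><td>0.2365</td></tr>
<tr><td>10</td><td>0.0000</td><td>0.0000</td><td>0.0006</td><td>0.0026</td><td>0.0079</td><td>0.0194</td><td>0.0465</td><td>0.1015</td><td>0.2294</td><td>0.5921</td></tr>
</table>
∗ Thus the number in row j column i is the probability that a person with potential high school earnings in the jth decile of the high school earnings distribution has potential college earnings in the ith decile of the college earnings distribution.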



FIGURE 9<br />

DENSITY OF EX ANTE GROSS UTILITY UNDER DIFFERENT INFORMATION SETS<br />

assumption and evaluates reforms considering only their overall impact on inequality, to consider those individuals who benefit from the reform. We consider only partial equilibrium treatment effects and do not consider the full cost of financing the reforms. Table 4 shows the average lifetime gross utility of participants before the policy change and Table 5 shows their prepolicy average return to college. These tables compare these levels and returns with what the marginal participant attracted into schooling by the policy would earn. The marginal person has lower utility in college and lower returns to college than the average person in college (also see Figure 5). Since the policy affects the schooling decisions of the individuals at the margin, it will lower the average quality of college graduates once it is implemented, because the new entrants are of lower average quality than the incumbents.<br />

Despite the substantial size of the policy changes we consider, the induced effects on participation are small. The full tuition subsidy only increases graduation from four-year college by 4%.<sup>36</sup> The policies operate unevenly over the deciles of the initial outcome distribution. Figure 11 shows the proportion of high school<br />

36 This comes from a simulation available on request from the authors.



FIGURE 10<br />

DENSITY OF EX ANTE GROSS UTILITY DIFFERENCES (COLLEGE–HIGH SCHOOL) UNDER DIFFERENT<br />

INFORMATION SETS<br />

TABLE 9<br />
AGENTS FORECAST VARIANCE OF COLLEGE-HIGH SCHOOL RETURNS UNDER DIFFERENT INFORMATION SETS FOR THE AGENTS GROSS UTILITY<br />
<table>
<tr><th>Information Set</th><th>$\mathrm{Var}(W_{col} - W_{hs})$</th><th>$\mathrm{Var}(W_{col})$</th><th>$\mathrm{Var}(W_{hs})$</th></tr>
<tr><td>$I = \emptyset$</td><td>0.2836</td><td>2.5155</td><td>2.2747</td></tr>
<tr><td>$I = \{f_2\}$</td><td>0.2726</td><td>0.3590</td><td>0.1666</td></tr>
<tr><td>$I = \{f_1, f_2\}$</td><td>0.2355</td><td>0.1540</td><td>0.0815</td></tr>
</table>
We purge the effect of urban, south, cohort dummies, average local unemployment rate and average wage on wages before computing these variances.<br />

persons in each decile of the high school present value of earnings distribution induced to graduate from four-year college by the tuition subsidy. The figure shows that providing a free college education mostly affects people at the top end of the high school earnings distribution.<sup>37</sup> The policy does not benefit the poor. A<br />

37 The same result holds when we consider distributions of utilities instead of distributions of lifetime earnings. See Figure A15 on the website.



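TABLE 10<br />
PEOPLE WHO CHOOSE DIFFERENTLY UNDER DIFFERENT INFORMATION SETS COMPENSATING FOR THE CHANGE IN RISK<br />
<table>
<tr><th rowspan="2">Original Choice</th><th colspan="2">Fraction that Change Choice</th></tr>
<tr><th>$\tilde I = \{\theta_1, \theta_2, \varepsilon_C, \varepsilon_{HS}\}$</th><th>$\tilde I = \{\theta_1\}$</th></tr>
<tr><td>High school</td><td>0.1181</td><td>0.1091</td></tr>
<tr><td>College</td><td>0.0159</td><td>0.0191</td></tr>
<tr><td>Total</td><td>0.0866</td><td>0.0813</td></tr>
</table>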

Horizontal axis: decile (1 to 10). Vertical axis: proportion (0.00 to 0.18).<br />
FIGURE 11<br />
PROPORTION OF PEOPLE INDUCED INTO COLLEGE BY FULL SUBSIDY TO COLLEGE TUITION WHEN INFORMATION SET = $\{\theta_1, \theta_2\}$ BY DECILE OF INITIAL HIGH SCHOOL EARNINGS DISTRIBUTION<br />

calculation based on the Veil of Ignorance using the Gini coefficient would show no effect of the policy up to two decimal places. Our analysis relaxes the Veil of Ignorance, and lets us study the impact of policies on persons at different positions of the income distribution. It goes beyond the counterfactual simulations used in the inequality literature (see, e.g., DiNardo et al., 1996) to account for self-selection by agents into sectors in response to policy changes.<br />
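A toy calculation shows why an anonymous summary such as the Gini coefficient can report no change even when individuals gain and lose. The sketch below is hypothetical throughout (the three incomes, the names `gini` and `count_gainers_and_losers`); it uses the standard rank-weighted formula for the Gini coefficient and is not the calculation performed in the paper.

```python
def gini(incomes):
    """Gini coefficient of a list of nonnegative incomes (0 = perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


def count_gainers_and_losers(before, after):
    """Count people whose own income rises or falls between two states."""
    gainers = sum(after[p] > before[p] for p in before)
    losers = sum(after[p] < before[p] for p in before)
    return gainers, losers


# Hypothetical incomes for three people before and after a policy.
before = {"A": 10.0, "B": 20.0, "C": 30.0}
after = {"A": 30.0, "B": 20.0, "C": 10.0}

same_gini = gini(list(before.values())) == gini(list(after.values()))
gainers, losers = count_gainers_and_losers(before, after)
```

Here the two income distributions are identical as unordered lists, so the Gini coefficient is unchanged, yet one person gains and one loses. Tracking individuals across states is what requires the joint distribution of potential outcomes.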

9. SUMMARY AND CONCLUSIONS<br />

This article uses low-dimensional factor models to generate counterfactual distributions of potential outcomes. It extends matching by allowing some of the variables that determine the conditional independence assumed in matching to be unobserved by the analyst. Semiparametric identification is established.<br />

We apply our methods to a problem in the economics of education. We extend the Willis–Rosen model to explicitly account for dependence in potential outcomes across potential schooling states, to account for psychic benefits in the return to schooling, and to measure the effect of uncertainty on schooling choices. We extend the framework of Flavin (1981) and Hansen et al. (1991), who estimate the impact of uncertainty on consumption choices, to a discrete choice setting to estimate agent information sets. Our framework extends the inequality decomposition analysis of DiNardo et al. (1996) to account for self-selection in the choice of sectors.<br />

Our analysis reveals substantial heterogeneity in the returns to schooling, much of which is unpredictable at the time schooling decisions are made. We also find a substantial nonpecuniary return to college. Although there is substantial uncertainty in forecasting returns at the time schooling decisions are made, eliminating it has modest effects on schooling choices. Uncertainty is inherent in both college and high school outcomes at the time schooling decisions are made. In addition, nonpecuniary factors play a dominant role in schooling choices. The assumption of perfect ranking of potential outcomes across alternative choices is soundly rejected, although potential outcomes are strongly positively correlated.<br />

We simulate a tuition reduction policy to determine who benefits or loses from it. We go beyond the Veil of Ignorance to determine which persons are affected by the policy. The policy favors those at the top of the income distribution. This simulation illustrates the power of our method to lift the Veil of Ignorance, and to count the losers and gainers from any policy initiative.<br />

APPENDIX A: PROOFS OF THEOREMS 38<br />

PROOF OF THEOREM 1. The case where $M$ consists of purely continuous components is trivial. We observe $M^c$ for each $X$ and can recover the marginal distribution for each component. Recall that $M$ is not state dependent.<br />
For the purely discrete case, we encounter the usual problem that there is no direct observable counterpart for $\mu^d_m(X)$. Under (A-1)–(A-5), we can use the analysis of Manski (1988) to identify the slope coefficients $\beta^d_{l,m}$ up to scale, and the marginal distribution of $U^d_{l,m}$. From the assumption that the mean (or median) of $U^d_{l,m}$ is zero, we can identify the intercept in $\beta^d_{l,m}$. We can repeat this for all discrete components. Therefore, coordinate by coordinate we can identify the marginal distributions of $U^c_m$, $\tilde U^d_m$, $\mu^c_m(X)$, and $\tilde\mu^d_m(X)$, the latter up to scale (“∼” means identified up to scale).<br />
To recover the joint distribution write<br />
$$\Pr\left(M^c \le m^c,\ M^d = (0,\ldots,0) \mid X\right) = F_{U^c_m, \tilde U^d_m}\left(m^c - \mu^c_m(X),\ -\tilde\mu^d_m(X)\right)$$<br />
by Assumption (A-2). To identify $F_{U^c_m, \tilde U^d_m}(t_1, t_2)$ for any given evaluation points in the support of $(U^c_m, \tilde U^d_m)$, we know the function $\tilde\mu^d_m(X)$ and using (A-3) we can find an $X$ where $-\tilde\mu^d_m(X) = t_2$. Let $\hat x$ denote this value, so $-\tilde\mu^d_m(\hat x) = t_2$. In this proof, $t_1$, $t_2$ may be vectors. Thus,<br />
$$\Pr\left(M^c \le m^c,\ M^d = (0,\ldots,0) \mid X = \hat x\right) = F_{U^c_m, \tilde U^d_m}\left(m^c - \mu^c_m(\hat x),\ t_2\right)$$<br />

38 We thank Edward Vytlacil for simplifying and clarifying the statements and proofs of all three theorems in this section.



Let $\hat m^c = t_1 + \mu^c_m(\hat x)$, so that $\hat m^c - \mu^c_m(\hat x) = t_1$, to obtain<br />
$$\Pr\left(M^c \le \hat m^c,\ M^d = (0,\ldots,0) \mid X = \hat x\right) = F_{U^c_m, \tilde U^d_m}(t_1, t_2)$$<br />
We know the left-hand side and thus identify $F_{U^c_m, \tilde U^d_m}$ at the evaluation point $(t_1, t_2)$. Since $(t_1, t_2)$ is any arbitrary evaluation point in the support of $(U^c_m, \tilde U^d_m)$, we can thus identify the full joint distribution. $\square$<br />

PROOF OF THEOREM 2.<br />
$$\Pr(D_1 = 1 \mid Z, Q_1) = \Pr\left(\frac{c_1(Q_1) - \varphi(Z)}{\sigma_W} > \frac{\varepsilon_W}{\sigma_W}\right)$$<br />
Under (A-1), (A-2), (A-6), (A-7), and (A-9), it follows that $\frac{c_1(Q_1) - \varphi(Z)}{\sigma_W}$ and $F_{\tilde\varepsilon_W}$ (where $\tilde\varepsilon_W = \frac{\varepsilon_W}{\sigma_W}$) are identified (see Manski, 1988; Matzkin, 1992, 1993). Under rank condition (A-7), identification of $\frac{c_1(Q_1) - \varphi(Z)}{\sigma_W}$ implies identification of $\frac{c_1(Q_1)}{\sigma_W}$ and $\frac{\varphi(Z)}{\sigma_W}$ separately. Write<br />
$$\Pr(D_2 = 1 \mid Z, Q_1, Q_2) = F_{\tilde\varepsilon_W}\left(\frac{c_2(Q_2) - \varphi(Z)}{\sigma_W}\right) - F_{\tilde\varepsilon_W}\left(\frac{c_1(Q_1) - \varphi(Z)}{\sigma_W}\right)$$<br />
From the absolute continuity of $\tilde\varepsilon_W$ and the assumption that the distribution function of $\tilde\varepsilon_W$ is strictly increasing, we can write<br />
$$\frac{c_2(Q_2)}{\sigma_W} = F^{-1}_{\tilde\varepsilon_W}\left[\Pr(D_2 = 1 \mid Z, Q_1, Q_2) + F_{\tilde\varepsilon_W}\left(\frac{c_1(Q_1) - \varphi(Z)}{\sigma_W}\right)\right] + \frac{\varphi(Z)}{\sigma_W}$$<br />
Thus, we can identify $\frac{c_2(Q_2)}{\sigma_W}$ over its support and, proceeding sequentially, we can identify $\frac{c_s(Q_s)}{\sigma_W}$, $s = 3,\ldots,S$. Under (A-8) we can identify $\eta_s$, $s = 2,\ldots,S$. $\square$<br />

Observe that we could use the final choice ($\Pr(s = S)$) rather than the initial choice to start off the proof of identification using an obvious change in the assumptions.<br />
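The inversion step in the proof of Theorem 2 can be checked numerically. The sketch below is only an illustration: it assumes, purely for concreteness, that $\tilde\varepsilon_W$ is standard normal (the theorem requires only an absolutely continuous, strictly increasing distribution function), and the cutoff values and function names are hypothetical. All quantities are expressed in units of $\sigma_W$.

```python
from statistics import NormalDist

# Assumed distribution of the scaled choice shock (illustration only).
F = NormalDist()


def prob_d2(c1, c2, phi):
    """Pr(D_2 = 1) = F(c2 - phi) - F(c1 - phi), with all terms scaled by sigma_W."""
    return F.cdf(c2 - phi) - F.cdf(c1 - phi)


def recover_c2(p_d2, c1, phi):
    """Invert the choice probability: c2 = F^{-1}[Pr(D_2 = 1) + F(c1 - phi)] + phi."""
    return F.inv_cdf(p_d2 + F.cdf(c1 - phi)) + phi


# Hypothetical scaled cutoffs and index value.
c1, c2, phi = -0.5, 0.8, 0.3
p = prob_d2(c1, c2, phi)
recovered = recover_c2(p, c1, phi)
```

Given the first cutoff and the index, the second cutoff is recovered from the observed choice probability alone, which is the sequential argument used for $s = 3,\ldots,S$.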

PROOF OF THEOREM 3. From (A-2), the unobservables are jointly independent of $(X, Z, Q)$. For fixed values of $(Z, Q_s, Q_{s-1})$, we may vary the points of evaluation for the continuous coordinates ($y^c_s$) and pick alternative values of $X = \hat x$ to trace out the vector $\mu^c(X)$ up to intercept terms. Thus, we can identify $\mu^c_{s,l}(X)$ up to a constant for all $l = 1,\ldots,N_{c,s}$ (Heckman and Honoré, 1990). Under (A-2), we recover the same functions for whatever values of $Z$, $Q_s$, $Q_{s-1}$ are prespecified as long as $c_s(Q_s) > c_{s-1}(Q_{s-1})$, so that there is an interval of $\varepsilon_W$ bounded above and below with positive probability. This identification result does not require any passage to a limit argument.



For values <strong>of</strong> (Z, Q s , Q s−1 ) such that<br />

lim Pr (D<br />

Qs → ¯Qs (Z ) s = 1 | Z, Q s , Q s−1 ) = 1<br />

Q s−1 →Q s−1<br />

(Z )<br />

where ¯Q s (Z) is <strong>an</strong> upper limit <strong>an</strong>d Q s−1 (Z) is a lower limit, <strong>an</strong>d we allow the<br />

limits to depend on Z, we essentially integrate out˜ε W <strong>an</strong>d obtain<br />

Pr ( M c ≤ m c , ˜µ d m ≤−Ud m , Uc s ≤ yc s − µc (X), Ũ d s<br />

≤−˜µ d s (X))<br />

We know that this probability c<strong>an</strong> be achieved by virtue <strong>of</strong> the support condition<br />

<strong>of</strong> Assumption (A-10).<br />

Then, proceeding as in the proof of Theorem 1, we can identify $\tilde{\mu}^d_s(X)$ coordinate by coordinate, and we obtain the constants in $\mu^c_{s,l}(X)$, $l = 1, \ldots, N_{c,s}$, as well as the constants in $\tilde{\mu}^d(X)$, from the assumption of mean or median zero of the unobservables. In this exercise, we use the full rank condition on $X$, which is part of Assumption (A-11).<br />

With these functions in hand, under the full conditions of Assumption (A-10), we can fix $y^c_s$, $y^c_m$, $\tilde{\mu}^d_s$, $\tilde{\mu}^d_m$, $\frac{c_s(Q_s) - \varphi(Z)}{\sigma_W}$, $\frac{c_{s-1}(Q_{s-1}) - \varphi(Z)}{\sigma_W}$ at different values to trace out the joint distribution $F(U^c_m, \tilde{U}^d_m, U^c_s, \tilde{U}^d_s, \tilde{\varepsilon}_W)$.<sup>39</sup><br />

<br />

APPENDIX B: DESCRIPTION OF THE DATA<br />

We use white males from NLSY79. In the original sample there are 2439 individuals. We consider the information on these individuals from ages 19 to 35. We discard 663 individuals because they have observations missing for at least one of the covariate variables we use in the analysis. Table 2a,b contains a description of the number of missing observations per variable. For example, we discard 50 individuals because we do not observe whether they were living in the South when they were 14 years old or not. Then we discard another 6 for not having information on whether they lived in an urban area at age 14, another 5 for not reporting the number of siblings, 221 for not indicating parental education, and so on, as described in Table 2. We then restrict the NLSY sample to white males with a high school or college degree. We define high school graduates as individuals having a high school degree or having completed 12 grades and never reporting college attendance. We define participation in college as having a college degree or having completed more than 16 years in school. We exclude the oversample of poor whites. Experience is Mincer experience (age-12 if high-school graduate, age-16 for college graduate). The variables that we include in the outcome and choice equations are number of siblings, parental years of schooling, AFQT, year of birth dummies, average tuition of the colleges in the county the individual lives in at 17 (we simulate the policy change by decreasing this variable by 1000 dollars for each individual), distance to the nearest college at 17, average local blue collar wage in state of residence at 17 (or in 1979, for individuals entering the sample at ages older than 17), and local unemployment rate in the county of residence in 1979. For the construction of the tuition variable see Cameron and Heckman (2001). Distance to college is constructed by matching college location data in HEGIS (Higher Education General Information Survey) with county of residence in NLSY. State average blue collar wages are constructed by using data from the BLS. For a description of the NLSY sample see BLS (2001).<br />

<sup>39</sup> Using a standard definition of identification, a model $(F_U, \beta)$ is identified iff for any alternative parameters $(F^*_U, \beta^*) \ne (F_U, \beta)$, there exists some $\varepsilon > 0$ such that<br />

$$\Pr\left(|F_U(\beta) - F^*_U(\beta^*)| > \varepsilon\right) > 0$$<br />

where the probability is computed with respect to the density of the data-generating process. Our use of limit set arguments may appear to contradict the standard definition of identification because of zero probability in the limit sets. However, this intuition is false. See the argument in Aakvik et al. (1999), Theorem 1, which justifies the appeal to limit arguments used in this article in terms of standard definitions of identification.<br />

In 1980, NLSY respondents were administered a battery of 10 achievement tests referred to as the Armed Services Vocational Aptitude Battery (ASVAB; see Cawley et al., 1997, for a complete description). The math and verbal components of the ASVAB can be aggregated into the Armed Forces Qualification Test (AFQT) scores.<sup>40</sup> Many studies have used the overall AFQT score as a control variable, arguing that this is a measure of scholastic ability. We argue that AFQT is an imperfect proxy for scholastic ability and use the factor structure to capture this. We also avoid a potential aggregation bias by using each of the components of the ASVAB as a separate measure.<br />

For our analysis, we use the random sample of the NLSY and restrict the sample to 1161 white males for whom we have information on schooling, several parental background variables, test scores, and behavior. Distance to nearest college at each date is constructed in the following way: Take the county of residence of each individual and all other counties within the same state. The distance between two counties is defined as the distance between the center of each county. If there exists a college (2-year or 4-year) in the county of residence where a person lives then the distance to the nearest college (2-year or 4-year) variable takes the value of zero. Otherwise, we compute distance (in miles) to the nearest county with a college. Then we construct the distance to the nearest college at 17 by using the county of residence at 17. However, for people who were older than 17 in 1979 we use the county of residence in 1979 for the construction of this variable.<br />
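The distance rule described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the distance between county centers is measured, so the great-circle (haversine) formula, the Earth radius constant, and the `counties` data layout are all hypothetical choices made here.<br />

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not taken from the article


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def distance_to_nearest_college(home, counties):
    """counties maps county id -> (state, (lat, lon) of county center, has_college)."""
    state, center, has_college = counties[home]
    if has_college:
        return 0.0  # a college in the county of residence gives distance zero
    # Otherwise search the other counties of the same state that have a college.
    candidates = [
        haversine_miles(center, c_center)
        for cid, (c_state, c_center, c_has) in counties.items()
        if cid != home and c_state == state and c_has
    ]
    return min(candidates) if candidates else None  # None: no college in the state
```

For the variable at age 17, `home` would be the county of residence at 17, or the 1979 county for respondents older than 17 in 1979.<br />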

Tuition at age 17 is average tuition in colleges in the county of residence at 17. If there is no college in the county then average tuition in the state is taken instead. For details on the construction of this variable see Cameron and Heckman (2001).<br />

Local labor market variables for the county of residence are computed using information in the 5% sample of the 1980 Census. For each county group in the census we compute the local unemployment rate and average wage for high school dropouts, high school graduates, individuals with some college, and four-year college graduates. We do not have this variable for years other than 1980 so, for each county, we assume that it is a good proxy for local labor market conditions in all the other years where NLSY respondents are assumed to be making the schooling decisions we consider in this article.<br />

<sup>40</sup> Implemented in 1950, the AFQT score is used by the army to screen draftees.<br />

We also use the variable log annual labor earnings. We extract this variable from the NLSY79 reported annual earnings from wages and salary. Earnings (in thousands of dollars) are discounted to 1993 using the Consumer Price Index reported by the Bureau of Labor Statistics. Missing values for this variable may occur here for two reasons: first, because respondents do not report earnings from wages/salary, and second, because the NLSY becomes biennial after 1994 and this prevents us from observing respondents when they reach certain ages. For example, because the NLSY79 was not conducted in 1995, we do not observe individuals born in 1964 when they are 31-year-olds. In this case we input missing values.<br />

To predict missing log earnings between ages 19 and 35 and extrapolate from age 36 to 65 years we pool NLSY and PSID data. From the latter, we use the sample of white males that are household heads and that are either high school or college graduates according to the definition given above. This produces a sample of 3043 individuals from the PSID. To get annual earnings, we multiply the reported CPI-adjusted (1993 = 100) hourly wage rate by the annual hours worked and divide the outcome by 1000. Then we take logs to have an NLSY-comparable variable. Similarly to NLSY, we generate the Mincerian experience according to the rule given above. We also generate dummy variables for cohorts. The first (omitted) cohort consists of individuals born between 1896 and 1905, the second consists of individuals born between 1906 and 1915, and so on up to the last cohort, which is made up of PSID respondents born between 1976 and 1985. We pool NLSY and PSID by merging the NLSY respondents in the PSID cohort born between 1956 and 1965.<br />

Let $Y_{ia}$ denote log earnings of agent $i$ at age $a$. For each schooling choice $s$, we model the earnings-experience profile as<br />

$$\begin{aligned} Y_{ia}(s) &= \alpha + \beta_0 X_{ia} + \beta_1 X_{ia}^2 + D\gamma + \varepsilon_{ia} \\ \varepsilon_{ia} &= \eta_i + v_{ia} \\ v_{ia} &= \rho v_{ia-1} + \kappa_{ia} \end{aligned} \tag{B.1}$$<br />

where $X_{ia}$ is Mincer experience, $D$ is a set of dummy variables that indicate cohort, $\eta_i$ is the individual effect, and $\kappa_{ia}$ is white noise. In Table A14 posted at http://lily.src.uchicago.edu/CHH estimating.html we report the OLS estimates for $\alpha, \beta_0, \beta_1, \gamma, \rho$ based on the pooled data set.<br />



Now, let $\hat{\varepsilon}_{ia}$ be the estimated residual of the earnings–experience profile. An estimator of the individual effect $\eta_i$ is<br />

$$\hat{\eta}_i = \frac{1}{\sum_{a=19}^{65} \phi_{ia}} \sum_{a=19}^{65} \phi_{ia}\, \hat{\varepsilon}_{ia},$$<br />

where $\phi_{ia} = 1$ if individual $i$ is observed at age $a$ and $\phi_{ia} = 0$ otherwise. Then, we can obtain an estimator of $v_{ia}$ by computing<br />

$$\hat{v}_{ia} = \hat{\varepsilon}_{ia} - \hat{\eta}_i$$<br />

Now, given $\hat{v}_{ia}$ we can run the autoregression in the last line of Equation (B.1) and then compute $\rho$. From this we obtain an estimator of $\kappa_{ia}$ according to<br />

$$\hat{\kappa}_{ia} = \hat{v}_{ia} - \hat{\rho}\, \hat{v}_{ia-1}$$<br />

We can then predict earnings for missing observations for ages 19 to 35 and perform the extrapolation from ages 36 to 65 by computing for each individual<br />

$$\begin{aligned} \hat{Y}_{ia}(s) &= \hat{\alpha} + \hat{\beta}_0 X_{ia} + \hat{\beta}_1 X_{ia}^2 + D\hat{\gamma} + \hat{\varepsilon}_{ia} \\ &= \hat{\alpha} + \hat{\beta}_0 X_{ia} + \hat{\beta}_1 X_{ia}^2 + D\hat{\gamma} + \hat{\eta}_i + \hat{\rho}\, \hat{v}_{ia-1} + \hat{\kappa}_{ia} \end{aligned}$$<br />

Note that to get $\hat{\varepsilon}_{ia}$ we do not set $\hat{\kappa}_{ia}$ equal to zero. Instead, we sample 10 draws from its distribution and average them for each individual, for each time period.<br />
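The residual decomposition used for these predictions can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the residuals from the first line of (B.1) are already available, and it assumes that $\rho$ is estimated by a pooled no-intercept regression of $\hat{v}_{ia}$ on its one-year lag, which the text does not spell out. The function name and data layout are hypothetical.<br />

```python
def decompose_residuals(resid):
    """resid maps individual id -> {age: estimated residual}, observed ages only."""
    # Individual effect: mean residual over the ages at which i is observed.
    eta = {i: sum(r.values()) / len(r) for i, r in resid.items()}
    # Transitory component: residual minus the individual effect.
    v = {i: {a: e - eta[i] for a, e in r.items()} for i, r in resid.items()}
    # Assumed estimator of rho: pooled no-intercept OLS of v at age a on v at age a-1.
    num = den = 0.0
    for vi in v.values():
        for a, va in vi.items():
            if a - 1 in vi:
                num += va * vi[a - 1]
                den += vi[a - 1] ** 2
    rho = num / den  # requires at least one nonzero lagged value
    # Innovation: v at age a minus rho times its lag, wherever the lag is observed.
    kappa = {i: {a: va - rho * vi[a - 1] for a, va in vi.items() if a - 1 in vi}
             for i, vi in v.items()}
    return eta, v, rho, kappa
```

The returned `kappa` values are the empirical innovations from which draws could be taken when predicting missing earnings.<br />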

The next step is to get the present value of log earnings at age 19 for each agent. To do so, we discount log earnings at each period using a discount rate of 3%. For identification purposes we then break each individual’s working life into two periods. The first one goes from age 19 to 29. The second period goes from age 30 all the way to 65. This produces a panel in which the first observation for each agent is the present value of log earnings from age 19 to 29 and the second is the present value of log earnings from age 30 to 65. This means that lifetime present value of log earnings is just the sum of these two components. Table 2b contains descriptive statistics for the present value of log earnings for the entire working life period and also for the two subperiods used in the analysis.<br />
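A small sketch of the two-period present value calculation follows. It assumes that earnings at age $a$ are discounted by $(1 + 0.03)^{a-19}$, so that age 19 is undiscounted; the text gives the rate but not the exponent convention, and the function name is hypothetical.<br />

```python
def present_values(log_earnings, rate=0.03, base_age=19, split_age=30, last_age=65):
    """log_earnings maps age -> (predicted) log earnings for ages base_age..last_age."""
    def pv(first, last):
        # Assumed convention: discount back to base_age, one period per year of age.
        return sum(log_earnings[a] / (1 + rate) ** (a - base_age)
                   for a in range(first, last + 1))

    early = pv(base_age, split_age - 1)  # ages 19 to 29
    late = pv(split_age, last_age)       # ages 30 to 65
    return early, late, early + late     # the lifetime value is the sum of the two
```

Calling `present_values(profile)` on a dictionary covering ages 19 to 65 returns the two panel observations and their sum.<br />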

APPENDIX C: MARKOV CHAIN MONTE CARLO SIMULATION METHODS<br />

Due to the complex nature of the likelihood function we rely on Markov Chain Monte Carlo techniques to estimate the model. These are computer-intensive algorithms based on designing an ergodic discrete-time continuous-state Markov chain with a transition kernel having an invariant measure equal to the posterior distribution of the parameter vector $\psi$; see Robert and Casella (1999) for details. In particular, we will be using the Gibbs sampling algorithm.<sup>41</sup><br />

We first describe how the Gibbs sampler can be used to estimate models in the general setup laid out in Section 4. Let $\psi_{s,a}$ be parameters specific to the distribution of outcomes with schooling level $s$ at age $a$, let $\psi_m$ be parameters specific to the distribution of measurements, let $\psi_c$ be parameters specific to the distribution of schooling choice, and let $\psi_\theta$ be parameters specific to the factor distributions. Let $n$ be the number of observations. Let the outcome matrix over all ages with schooling level $s$ be $Y_{s,i} = (Y^c_{s,i}, Y^{*d}_{s,i})$ and the vector of measurements be $M$.<br />

The complete data likelihood for completed schooling level $S = s$ is<br />

$$f(M, Y_s, I, \theta \mid \psi) = \prod_{i: D_{i,s} = 1} f(M_i, Y_{s,i}, I_i, \theta_i \mid \psi)$$<br />

where $\psi = [\psi_{s,a}, \psi_m, \psi_c, \psi_\theta]$, “$i$” denotes a subscript for individual $i$ and<br />

$$f(M_i, Y_{s,i}, I_i, \theta_i \mid \psi) = f(M_i \mid \psi_m, \theta_i) \times \left[\prod_{a=1}^{\bar{A}} f(Y_{s,a,i} \mid \psi_{s,a}, \theta_i)\right] f(I_i \mid \theta_i, \psi_c)\, f(\theta_i \mid \psi)$$<br />

The complete data posterior is<br />

$$f(M, Y, I, \theta, \psi \mid \text{data}) \propto \left[\prod_{s=1}^{\bar{S}} f(\theta, M, Y^*_s, I \mid \psi)\right] f(\psi)$$<br />

where $Y = (Y_1, \ldots, Y_{\bar{S}})$.<br />

In what follows the conditional posteriors that constitute the transition kernel of the Gibbs sampler will be derived.<br />

Choice Equations. Conditional on the factors we have<br />

$$f(\eta, \gamma, \rho \mid \psi_{-(\eta,\gamma,\rho)}, \theta) \propto \left\{ \prod_{i=1}^{n} f(I_i \mid Z_i'\eta + \gamma'\theta_i, 1) \left[ \sum_{j=1}^{\bar{s}} 1(c_{i,j-1} < I_i < c_{i,j})\, D_{i,j} \right] 1(c_{i1} < \cdots < c_{i\bar{s}}) \right\} f(\eta, \gamma)\, f(\rho) \tag{C.1}$$<br />

<sup>41</sup> For other uses of Markov Chain Monte Carlo techniques in models and applications related to ours, see Chib and Hamilton (2002), who implement MCMC methods for a panel version of a generalized Roy model, and Chib and Hamilton (2000), who consider various cross-sectional treatment models.<br />



This marginal can be factored into two conditionals. Conditional on $\rho$ we have<br />

$$f(\eta, \gamma \mid \rho, \psi_{-(\eta,\gamma,\rho)}, \theta) \propto \prod_{i=1}^{n} f(I_i \mid Z_i'\eta + \gamma'\theta_i, 1)\, f(\eta, \gamma)$$<br />

This is the posterior for a normal regression model with covariates $Z_i, \theta_i$ and precision fixed at 1. With $f(\eta, \gamma)$ multivariate normal this is a multivariate normal distribution.<br />

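The second conditional (for $\rho$) is<br />

$$f(\rho \mid \eta, \gamma, \psi_{-(\eta,\gamma,\rho)}, \theta) \propto \prod_{i=1}^{n} \left[ \sum_{j=1}^{\bar{s}} 1(c_{i,j-1} < I_i < c_{i,j})\, D_{i,j} \right] 1(c_{i1} < \cdots < c_{i\bar{s}})\, f(\rho)$$<br />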



This factors into $n$ independent truncated normals,<br />

$$f(I \mid \psi_{-(\eta,\gamma,\rho)}, \theta) = \prod_{i=1}^{n} TN_{(c_{i,j-1},\, c_{i,j})}(I_i \mid Z_i'\eta + \gamma'\theta_i, 1)$$<br />

So, we sample $I_i$, $i = 1, \ldots, n$, one at a time from truncated normals.<br />
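One way to draw a single truncated normal with unit precision is the inverse-CDF method sketched below. This is an illustrative choice, not necessarily the sampler the authors used, and the function name is hypothetical. The method loses accuracy when the interval lies far in a tail of the distribution.<br />

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, since the precision is fixed at 1


def sample_truncated_normal(mean, lower, upper, rng=random):
    """Draw from N(mean, 1) restricted to the interval (lower, upper)."""
    p_lo = _STD.cdf(lower - mean)
    p_hi = _STD.cdf(upper - mean)
    # Uniform draw between the two CDF values, kept strictly inside (0, 1)
    # because NormalDist.inv_cdf rejects 0 and 1.
    p = p_lo + rng.random() * (p_hi - p_lo)
    p = min(max(p, 1e-12), 1 - 1e-12)
    draw = mean + _STD.inv_cdf(p)
    return min(max(draw, lower), upper)  # guard against rounding at the bounds
```

For individual $i$ with observed schooling level $j$, the call would use `mean` equal to $Z_i'\eta + \gamma'\theta_i$ and bounds equal to the two adjacent cut points; `float("-inf")` and `float("inf")` are accepted as bounds.<br />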

Measurement Equations. The continuous measurement equations are of the form<br />

$$M_{i,j} = X_{m,i,j}'\beta^c_{m,j} + \alpha^{c\prime}_{m,j}\theta_i + \varepsilon^c_{m,i,j} \tag{C.3}$$<br />

Given $X_{m,i,j}, \theta_i$, this is a linear regression model. With multivariate normal priors on $(\beta^c_{m,j}, \alpha^c_{m,j})$ and a gamma prior on the precision of $\varepsilon^c_{m,i,j}$, this is in the form of the standard conjugate Bayesian linear regression model, with a conditional normal distribution for $\beta^c_{m,j}$ given the precision of $\varepsilon^c_{m,i,j}$ and a gamma distribution for the precision conditional on $\beta^c_{m,j}$.<br />
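The two conditional draws can be illustrated with a deliberately simplified scalar version: one regressor, one coefficient, no intercept. The model in the text is multivariate (it updates $\beta^c_{m,j}$ and $\alpha^c_{m,j}$ jointly), so this sketch only shows the alternation between the normal and gamma conditionals; the prior values and the function name are hypothetical.<br />

```python
import math
import random


def gibbs_scalar_regression(x, y, n_draws=300, m0=0.0, p0=1e-3, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i = beta * x_i + e_i with e_i ~ N(0, 1/tau).

    Assumed priors: beta ~ N(m0, 1/p0) and tau ~ Gamma(shape=a0, rate=b0).
    """
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    tau, draws = 1.0, []
    for _ in range(n_draws):
        # beta | tau is normal with precision p0 + tau * sum(x^2).
        prec = p0 + tau * sxx
        beta = rng.gauss((p0 * m0 + tau * sxy) / prec, 1 / math.sqrt(prec))
        # tau | beta is gamma; gammavariate takes (shape, scale), so scale = 1 / rate.
        ssr = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
        tau = rng.gammavariate(a0 + len(x) / 2, 1 / (b0 + ssr / 2))
        draws.append((beta, tau))
    return draws
```

Discarding an initial stretch of `draws` as burn-in and averaging the rest gives a posterior mean estimate for the coefficient.<br />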

Let the last $m - m_1$ elements of the measurement vector $M$ be binary indices generated as<br />

$$M^d_j = 1\left(M^{*d}_j \ge 0\right), \quad j = m_1 + 1, \ldots, m$$<br />

The parameters in the binary measurements are sampled as above with two exceptions. First, a separate step samples the latent measurements, $M^{*d}_j$, as<br />

$$M^{*d}_{i,j} \sim \begin{cases} TN_{(0,\infty)}\left(M^{*d}_{i,j} \mid X_{m,i,j}'\beta^d_{m,j} + \alpha^{d\prime}_{m,j}\theta_i, 1\right) & \text{if } M^d_{i,j} = 1 \\ TN_{(-\infty,0)}\left(M^{*d}_{i,j} \mid X_{m,i,j}'\beta^d_{m,j} + \alpha^{d\prime}_{m,j}\theta_i, 1\right) & \text{if } M^d_{i,j} = 0 \end{cases}$$<br />

Second, the precision is not sampled but fixed at one.<br />

Outcome Equations. Let $Y_{s,a}$ be the outcome vector at age $a$ with schooling level $s$. Suppose both employment and wage outcomes are modeled. Let $Y^c_{s,a}$ be the wage outcome and $Y^d_{s,a}$ the employment outcome. Also, let $Y^{*,d}_{s,a}$ be the latent employment index. By the factor structure assumption we have<br />

$$f\left(Y^c_{s,a}, Y^{*,d}_{s,a} \mid \theta\right) = f\left(Y^c_{s,a} \mid \theta\right) f\left(Y^{*,d}_{s,a} \mid \theta\right)$$<br />

for a person working.<br />

The model for wages is<br />

$$Y^c_{s,a,i} = X_{1,a,i}'\beta^c_{s,a} + \alpha^{c\prime}_{s,a}\theta_i + \varepsilon^c_{a,s,i}$$<br />

where $\varepsilon^c_{a,s,i} \sim N(0, \tau^c_{s,a})$. This is in the form of a standard linear regression model under normality and $(\beta^c_{s,a}, \alpha^c_{s,a}, \tau^c_{s,a})$ is sampled as above (using multivariate normal and gamma priors).<br />



We can allow for general state dependence by modeling the latent employment transition indices as<br />

$$Y^{d,*}_{s,a,i} = \begin{cases} X_{2,a,s,i}'\beta^d_{a,s,0} + \alpha^{d\prime}_{a,s,0}\theta_i + \varepsilon^d_{a,s,i,0}, & \text{if } Y^d_{s,a-1,i} = 0 \\ X_{2,a,s,i}'\beta^d_{a,s,1} + \alpha^{d\prime}_{a,s,1}\theta_i + \varepsilon^d_{a,s,i,1}, & \text{if } Y^d_{s,a-1,i} = 1 \end{cases}$$<br />

where $\varepsilon^d_{a,s,i,0}$ and $\varepsilon^d_{a,s,i,1}$ are both standard normal.<br />

The conditionals of $(\beta^d_{a,s,0}, \alpha^d_{a,s,0})$ and $(\beta^d_{a,s,1}, \alpha^d_{a,s,1})$ are<br />

$$f\left(\beta^d_{a,s,0}, \alpha^d_{a,s,0} \mid \psi_{-(\beta^d_{a,s,0},\, \alpha^d_{a,s,0})}\right) \propto f\left(\beta^d_{a,s,0}, \alpha^d_{a,s,0}\right) \prod_{i:\, Y^d_{s,a-1,i} = 0} f\left(Y^{d,*}_{s,a,i} \mid X_{2,a,s,i}'\beta^d_{a,s,0} + \alpha^{d\prime}_{a,s,0}\theta_i, 1\right)$$<br />

$$f\left(\beta^d_{a,s,1}, \alpha^d_{a,s,1} \mid \psi_{-(\beta^d_{a,s,1},\, \alpha^d_{a,s,1})}\right) \propto f\left(\beta^d_{a,s,1}, \alpha^d_{a,s,1}\right) \prod_{i:\, Y^d_{s,a-1,i} = 1} f\left(Y^{d,*}_{s,a,i} \mid X_{2,a,s,i}'\beta^d_{a,s,1} + \alpha^{d\prime}_{a,s,1}\theta_i, 1\right)$$<br />

Both of these are normal regression models with the precision fixed at 1. The latent employment indices are sampled as in the usual binary choice framework (see Albert and Chib, 1993).<br />

Factors. The conditional for $\theta$ factors into $n$ conditionals for $\theta_1, \ldots, \theta_n$. To see what the conditional for $\theta_i$ is, note that all contributions of $\theta_i$ originate from linear regression models,<br />

$$\begin{aligned} I_i - Z_i'\eta &= \gamma'\theta_i + \varepsilon_{I,i}, && \text{(choice model)} \\ M_{i,j} - X_{m,i,j}'\beta_{m,j} &= \alpha_{m,j}'\theta_i + \varepsilon_{m,i,j}, && \text{(measurements)} \\ Y^c_{s,a,i} - X_{1,a,i}'\beta^c_{s,a} &= \alpha^{c\prime}_{s,a}\theta_i + \varepsilon^c_{a,s,i}, && \text{(wages)} \\ Y^{d,*}_{s,a,i} - X_{2,a,s,i}'\beta^d_{a,s,l} &= \alpha^{d\prime}_{a,s,l}\theta_i + \varepsilon^d_{a,s,i,l}, && \text{(employment)} \end{aligned}$$<br />

This equation system is of the form<br />

$$\hat{Y}_i = A_i\theta_i + u_i$$<br />

where $u_i \sim N(0, \Omega_i)$, where $\Omega_i$ is a diagonal precision matrix. The conditional posterior for $\theta_i$ is then<br />

$$f(\theta_i \mid \psi) \propto \exp\left\{ -\frac{1}{2} (\hat{Y}_i - A_i\theta_i)'\, \Omega_i\, (\hat{Y}_i - A_i\theta_i) \right\} f(\theta_i)$$<br />

where<br />

$$f(\theta_i) = \prod_{k=1}^{K} \sum_{j=1}^{J_k} p_{k,j}\, N\left(\theta_{ik} \mid \mu_{k,j}, \tau_{k,j}\right)$$<br />



We sample $\theta_{ik} \mid \{\theta_{ij}\}_{j \ne k}$ one at a time from their respective conditionals, which can be shown to be a mixture of normals with updated (data-dependent) mixture weights and parameters.<br />

Conditional on the factor vector $\theta$, we have<br />

$$\theta_{ik} \sim \sum_{j=1}^{J_k} p_{k,j}\, N(\theta_{ik} \mid \mu_{k,j}, \tau_{k,j}), \quad i = 1, \ldots, n$$<br />

Conditional on $\theta$ we can treat the factors as known and update the mixture parameters $(p_k, \mu_k, \tau_k)$. We follow the “group indicator” approach in Diebolt and Robert (1994) and augment the parameter vector by a sequence of latent group indicators defined as $g_{ik} = j$ if $\theta_{ik}$ originates from mixture component $j$. Conditional on the mixture group indicators the mixture parameters are easily sampled, and conditional on the mixture parameters the group indicators are simple multinomials. To preserve the identification of intercepts we constrain the mixture to have mean zero using the method proposed in Richardson et al. (2000).<br />
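The multinomial step for one group indicator can be sketched as below. The sketch assumes that the second parameter of each normal component is a precision, in line with the precision parameterization used elsewhere in this appendix; the function names are hypothetical and component indices are 0-based.<br />

```python
import math
import random


def membership_probs(theta, weights, means, precisions):
    """Posterior probability that theta came from each normal mixture component."""
    dens = [
        p * math.sqrt(t / (2 * math.pi)) * math.exp(-0.5 * t * (theta - m) ** 2)
        for p, m, t in zip(weights, means, precisions)
    ]
    total = sum(dens)
    return [d / total for d in dens]


def sample_group_indicator(theta, weights, means, precisions, rng=random):
    """Draw one group indicator (0-based component index) from its multinomial."""
    probs = membership_probs(theta, weights, means, precisions)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Given the sampled indicators, each component's mean and precision would then be updated from the factor draws assigned to it, and the weights from the component counts.<br />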

The estimation of the structural models in Section 7 is done as above with a few modifications. The choice model is a probit so the cut point is $c = 0$, and no $\rho$ parameters are estimated. The cross-equation restrictions are imposed as follows: Let $\tilde{Y}_i = (V_{1,h,i}, V_{2,h,i}, V_{1,c,i}, V_{2,c,i}, V_i)$, i.e., the stacked outcomes under high school and college and the choice index. We can then write the model as<br />

$$\tilde{Y}_i = W_i\psi + \Gamma\theta_i + \varepsilon_i$$<br />

$$M_i = X_i\omega + \alpha_{test}\theta_i + \varepsilon_{test}$$<br />

where $\psi = \{\{\bar{\delta}_{1,s}, \bar{\delta}_{2,s}, \bar{\beta}_{1,s}, \bar{\beta}_{2,s}\}_s, \delta_P, \gamma\}$, and $W_i$ and the loading matrix $\Gamma = \Gamma(\{\bar{\alpha}_{1,s}, \bar{\alpha}_{2,s}\}_s, \alpha_P)$ are defined appropriately. This model is now in the form of the system described above and the required conditionals are derived as above.<br />

APPENDIX D: MEAN PRESERVING SPREAD<br />

For the model described in Section 7, assume that $\varepsilon_{a,s}$ are independent and identically normally distributed within each period,<br />

$$\varepsilon_{a,s} \sim N\left(0, \sigma^2_{s,1}\right) \quad \text{for ages 19–29}$$<br />

$$\varepsilon_{a,s} \sim N\left(0, \sigma^2_{s,2}\right) \quad \text{for ages 30–65.}$$<br />

Then,<br />

$$\varepsilon_{1,s} \sim N\left(0, \sum_{a=19}^{29} \frac{\sigma^2_{s,1}}{(1+\rho)^a}\right)$$<br />

$$\varepsilon_{2,s} \sim N\left(0, \sum_{a=30}^{65} \frac{\sigma^2_{s,2}}{(1+\rho)^a}\right)$$<br />



At each age<br />

$$\ln Y_{a,s} = \delta_{a,s} + X'\beta_{a,s} + \alpha_{a,s}'\theta + \varepsilon_{a,s} + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2 = \mu_{a,s} + \varepsilon_{a,s}$$<br />

where<br />

$$\mu_{a,s} = \delta_{a,s} + X'\beta_{a,s} + \alpha_{a,s}'\theta + \eta_{1,s} \times \text{experience}_a + \eta_{2,s} \times \text{experience}_a^2$$<br />

then<br />

$$E(Y_{a,s} \mid X, \theta) = \exp(\mu_{a,s})\, E[\exp(\varepsilon_{a,s})].$$<br />

We do a mean preserving spread at each age $a$ by giving the individual knowledge of $\varepsilon_{a,s}$:<br />

$$E(Y_{a,s} \mid X, \theta, \varepsilon_{a,s}) = \exp(\mu_{a,s} + \varepsilon_{a,s}) = \exp(\mu'_{a,s})$$<br />

Then,<br />

$$\exp(\mu'_{a,s}) = \exp(\mu_{a,s})\, E[\exp(\varepsilon_{a,s})]$$<br />

Since the $\varepsilon_{a,s}$ are i.i.d. we can drop the age subscript on the $\varepsilon$:<br />

$$\exp(\mu'_{a,s}) = \exp(\mu_{a,s})\, E(\exp(\varepsilon_s))$$<br />
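As a worked step, the normality assumption stated at the start of this appendix gives the expectation in the last display in closed form (the standard lognormal mean). The period index $t$ is introduced here only for illustration: $t = 1$ for ages 19–29 and $t = 2$ for ages 30–65.<br />

```latex
\[
E[\exp(\varepsilon_{a,s})] = \exp\!\left(\tfrac{1}{2}\sigma^2_{s,t}\right)
\quad\Longrightarrow\quad
\mu'_{a,s} = \mu_{a,s} + \tfrac{1}{2}\sigma^2_{s,t}.
\]
```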

The mean preserving spread is actually a combination of age-by-age mean preserving spreads. Finally, compute<br />

$$\mu_{1,s} = \sum_{a=19}^{29} \frac{\mu_{a,s}}{(1+\rho)^a}, \qquad \mu_{2,s} = \sum_{a=30}^{65} \frac{\mu_{a,s}}{(1+\rho)^a}$$<br />

$$\mu'_{1,s} = \sum_{a=19}^{29} \frac{\mu'_{a,s}}{(1+\rho)^a}, \qquad \mu'_{2,s} = \sum_{a=30}^{65} \frac{\mu'_{a,s}}{(1+\rho)^a}$$<br />

Define<br />

$$V = \mu_{1,C} + \mu_{2,C} - \mu_{1,a} - \mu_{2,a} - Z\gamma - \alpha_p'\theta - \varepsilon_p$$<br />

$$V' = \mu'_{1,C} + \mu'_{2,C} - \mu'_{1,a} - \mu'_{2,a} - Z\gamma - \alpha_p'\theta - \varepsilon_p + \bar{\varepsilon}_{1,C} + \bar{\varepsilon}_{2,C} - \bar{\varepsilon}_{1,a} - \bar{\varepsilon}_{2,a}$$<br />



The probability of going to college is given by $\Pr(V > 0)$ for the first case and by $\Pr(V' > 0)$ for the second case. The experiment for the case where we remove $\theta_1$ from the information set of the agent, keeping age-by-age mean earnings constant, is analogous to the one just described.<br />

REFERENCES<br />

AAKVIK, A., J. HECKMAN, AND E. VYTLACIL, “Training Effects on Employment when the Training Effects are Heterogeneous: An Application to Norwegian Vocational Rehabilitation Programs,” Manuscript, University of Chicago, 1999.<br />

——, ——, AND ——, “Treatment Effects For Discrete Outcomes when Responses To Treatment Vary Among Observationally Identical Persons: An Application to Norwegian Vocational Rehabilitation Programs,” NBER Working Paper No. T0262, Journal of Econometrics, 2003, forthcoming.<br />

ALBERT, J., AND S. CHIB, “Bayesian Analysis of Binary and Polychotomous Response Data,” Journal of the American Statistical Association 88 (1993), 669–79.<br />

ANDERSON, T. W., AND H. RUBIN, “Statistical Inference in Factor Analysis,” in J. Neyman, ed., Proceedings of Third Berkeley Symposium on Mathematical Statistics and Probability, 5 (Berkeley: University of California Press, 1956), 111–50.<br />

ATHEY, S., AND G. IMBENS, “Identification and Inference in Nonlinear Difference-In-Differences Models,” NBER Technical Working Paper T0280, 2002.<br />

BEN AKIVA, M., D. BOLDUC, AND J. WALKER, “Specification, Identification and Estimation of the Logit Kernel (or Continuous Mixed Logit Model),” Manuscript, Department of Civil Engineering, MIT, February 2001.<br />

BUERA, F. J., “Testable Implications and Identification of Occupational Choice Models,” unpublished manuscript, University of Chicago, 2002.<br />

Bureau of Labor Statistics, NLS Handbook 2001 (Washington, D.C.: U.S. Department of Labor, 2001).<br />

CAMERON, S., AND J. HECKMAN, “Son of CTM: The DCPA Approach Based on Discrete Factor Structure Models,” Unpublished manuscript, University of Chicago, 1987.<br />

——, AND ——, “Life Cycle Schooling and Dynamic Selection Bias,” Journal of Political Economy 106 (1998), 262–333.<br />

——, AND ——, “The Dynamics of Educational Attainment for Blacks, Whites and Hispanics,” Journal of Political Economy 109 (2001), 455–99.<br />

CARNEIRO, P., K. HANSEN, AND J. HECKMAN, “Removing the Veil of Ignorance in Assessing the Distributional Impacts of Social Policies,” Swedish Economic Policy Review 8 (2001), 273–301.<br />

CAWLEY, J., K. CONNEELY, J. HECKMAN, AND E. VYTLACIL, “Cognitive Ability, Wages, and Meritocracy,” in B. Devlin, S. E. Fienberg, D. Resnick, and K. Roeder, eds., Intelligence, Genes, and Success: Scientists Respond to the Bell Curve (New York: Springer-Verlag, 1997), 179–92.<br />

CHAMBERLAIN, G., “Education, Income, and Ability Revisited,” Journal of Econometrics (1977a), 241–57.<br />

——, “An Instrumental Variable Interpretation of Identification in Variance Components and MIMIC Models,” in Paul Taubman, ed., Kinometrics: Determinants of Socio-Economic Success Within and Between Families (Amsterdam: North-Holland, 1977b).<br />



——, AND Z. GRILICHES, “Unobservables <strong>with</strong> a Vari<strong>an</strong>ce-Components Structure: Ability,<br />

Schooling, <strong>an</strong>d the Economic Success <strong>of</strong> Brothers,” International Economic Review<br />

16 (1975), 422–49.<br />

CHIB,S.,AND B. HAMILTON, “Bayesi<strong>an</strong> Analysis <strong>of</strong> Cross Section <strong>an</strong>d Clustered Data Treatment<br />

Models” Journal <strong>of</strong> Econometrics 97 (2000), 25–50.<br />

——, AND ——, “Semiparametric Bayes Analysis <strong>of</strong> Longitudinal Data Treatment Models,”<br />

Journal <strong>of</strong> Econometrics, 110 (2002), 67–89.<br />

COCHRANE, W.G.,AND D. RUBIN, “Controlling Bias in Observational Studies: A Review,”<br />

S<strong>an</strong>khya A 35 (1973), 417–46.<br />

COSSLETT, S. R., “Distribution-Free Maximum Likelihood Estimator <strong>of</strong> the Binary Choice<br />

Model,” Econometrica, 51 (1983), 765–82.<br />

DIEBOLT, J., AND C. P. ROBERT, “Estimation of Finite Mixture Distributions Through Bayesian Sampling,” Journal of the Royal Statistical Society, Series B 56 (1994), 363–75.

DINARDO, J., N. M. FORTIN, AND T. LEMIEUX, “Labor Market Institutions and the Distribution of Wages, 1973–1992: A Semiparametric Approach,” Econometrica 64 (1996), 1001–44.

ECKSTEIN, Z., AND K. WOLPIN, “The Specification and Estimation of Dynamic Stochastic Discrete Choice Models: A Survey,” Journal of Human Resources 24 (1989), 562–98.

——, AND ——, “Dynamic Labour Force Participation of Married Women and Endogenous Work Experience,” Review of Economic Studies 56 (1999), 375–90.

ELROD, T., AND M. KEANE, “A Factor-Analytic Probit Model for Representing the Market Structure in Panel Data,” Journal of Marketing Research 32 (1995), 1–16.

FERGUSON, T. S., “Bayesian Density Estimation by Mixtures of Normal Distributions,” in M. Rizvi, J. Rustagi, and D. Siegmund, eds., Recent Advances in Statistics (New York: Academic Press, 1983), 287–302.

FLAVIN, M., “The Adjustment of Consumption to Changing Expectations about Future Income,” Journal of Political Economy 89 (1981), 974–1009.

FLORENS, J., M. MOUCHART, AND J. ROLIN, Elements of Bayesian Statistics (New York: M. Dekker, 1990).

FRÉCHET, M., “Sur les tableaux de corrélation dont les marges sont données,” Annals Université Lyon, Sect. A, Series 3, 14 (1951), 53–77.

GEWEKE, J., D. HOUSER, AND M. KEANE, “Simulation Based Inference for Dynamic Multinomial Choice Models,” in B. H. Baltagi, ed., Companion for Theoretical Econometrics (London: Basil Blackwell, 2001).

GOLDBERGER, A. S., “Structural Equation Methods in the Social Sciences,” Econometrica 40 (1972), 979–1001.

HANSEN, K., J. HECKMAN, AND K. MULLEN, “The Effect of Schooling and Ability on Achievement Test Scores,” Journal of Econometrics (2003), forthcoming.

——, ——, AND S. NAVARRO, “Nonparametric Identification of Time to Treatment Models and the Joint Distributions of Counterfactuals,” Unpublished Manuscript, University of Chicago, 2003.

HANSEN, L., W. ROBERDS, AND T. SARGENT, “Time Series Implications of Present Value Budget Balance and of Martingale Models of Consumption and Taxes,” in L. Hansen and T. Sargent, eds., Rational Expectations Econometrics (Boulder, CO: Westview Press, 1991).

HECKMAN, J., “Statistical Models for Discrete Panel Data,” in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data With Econometric Applications (Cambridge, MA: MIT Press, 1981).

——, “Varieties of Selection Bias,” American Economic Review 80 (1990), 313–18.

——, “Randomization and Social Policy Evaluation,” in C. F. Manski and Irwin Garfinkel, eds., Evaluating Welfare and Training Programs (Cambridge, MA: Harvard University Press, 1992).

——, “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture,” Journal of Political Economy 109 (2001), 673–748.



——, AND B. HONORÉ, “The Empirical Content of the Roy Model,” Econometrica 58 (1990), 1121–49.

——, AND S. NAVARRO, “Ordered Discrete Choice Models with Stochastic Shocks,” Manuscript, University of Chicago, 2001.

——, AND ——, “Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models,” Review of Economics and Statistics (2003), forthcoming.

——, AND R. ROBB, “Alternative Methods for Evaluating the Impact of Interventions,” in J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data (New York: Cambridge University Press, 1985).

——, AND ——, “Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes,” in H. Wainer, ed., Drawing Inferences from Self-Selected Samples (New York: Springer-Verlag, 1986; reprinted in 2000 by Lawrence Erlbaum Associates).

——, AND J. SMITH, “Assessing the Case for Randomized Evaluation of Social Programs,” in K. Jensen and P. K. Madsen, eds., Measuring Labour Market Measures: Evaluating the Effects of Active Labour Market Policy Initiatives (Copenhagen: Ministry of Labour, 1993).

——, AND ——, “Evaluating the Welfare State,” in S. Strom, ed., Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial, Econometric Society Monograph Series (Cambridge: Cambridge University Press, 1998).

——, R. LALONDE, AND J. SMITH, “The Economics and Econometrics of Active Labor Market Programs,” in O. Ashenfelter and D. Card, eds., Handbook of Labor Economics, Volume 3 (Amsterdam: Elsevier, 1999).

——, L. LOCHNER, AND C. TABER, “Explaining Rising Wage Inequality: Explorations With a Dynamic General Equilibrium Model of Earnings With Heterogeneous Agents,” Review of Economic Dynamics 1 (1998a), 1–58.

——, ——, AND ——, “General Equilibrium Treatment Effects: A Study of Tuition Policy,” American Economic Review 88 (1998b), 381–86.

——, ——, AND ——, “Tax Policy and Human Capital Formation,” American Economic Review 88 (1998c), 293–97.

——, ——, AND ——, “General Equilibrium Cost Benefit Analysis of Education and Tax Policies,” in G. Ranis and L. K. Raut, eds., Trade, Growth and Development: Essays in Honor of T. N. Srinivasan (Amsterdam: Elsevier Science, B.V., 2000), 291–393, Chapter 14.

——, T. SMITH, AND N. CLEMENTS, “Making the Most out of Program Evaluations and Social Experiments: Accounting for Heterogeneity in Program Impacts,” Review of Economic Studies 64 (1997), 487–535.

HOEFFDING, W., “Masstabinvariante Korrelationtheorie,” Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin 5 (1940), 197–233.

JÖRESKOG, K., “Structural Equations Models in the Social Sciences: Specification, Estimation and Testing,” in P. R. Krishnaiah, ed., Applications of Statistics (Amsterdam: North-Holland, 1977), 265–87.

JÖRESKOG, K. G., AND A. S. GOLDBERGER, “Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable,” Journal of the American Statistical Association 70 (1975), 631–39.

KEANE, M., AND K. WOLPIN, “The Career Decisions of Young Men,” Journal of Political Economy 105 (1997), 473–522.

KOTLARSKI, I., “On Characterizing the Gamma and Normal Distribution,” Pacific Journal of Mathematics 20 (1967), 69–76.

MANSKI, C. F., “Identification of Binary Response Models,” Journal of the American Statistical Association 83 (1988), 729–38.

MATZKIN, R., “Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and the Binary Choice Models,” Econometrica 60 (1992), 239–70.



——, “Nonparametric Identification and Estimation of Polychotomous Choice Models,” Journal of Econometrics 58 (1993), 137–68.

——, AND A. LEWBEL, “Notes on Single Index Restrictions,” Unpublished Manuscript, Northwestern University, 2002.

MCFADDEN, D., “Econometric Analysis of Qualitative Response Models,” in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, Volume II (Amsterdam: North-Holland, 1984).

MUTHEN, B., “A General Structural Equation Model With Dichotomous, Ordered Categorical and Continuous Latent Variable Indicators,” Psychometrika 49 (1984), 115–32.

OLLEY, S. G., AND A. PAKES, “The Dynamics of Productivity in the Telecommunications Equipment Industry,” Econometrica 64 (1996), 1263–97.

RAO PRAKASA, B. L. S., Identifiability in Stochastic Models: Characterization of Probability Distributions (Boston: Academic Press, 1992).

RICHARDSON, S., L. LEBLOND, I. JAUSSENT, AND P. J. GREEN, “Mixture Models in Measurement Error Problems, with Reference to Epidemiological Studies,” Working paper, 2000.

ROBERT, C. P., AND G. CASELLA, Monte Carlo Statistical Methods (New York: Springer, 1999).

ROSENBAUM, P., AND D. RUBIN, “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika 70 (1983), 41–55.

ROY, A., “Some Thoughts on the Distribution of Earnings,” Oxford Economic Papers 3 (1951), 135–46.

ROZANOV, Y. A., Markov Random Fields (Berlin: Springer-Verlag, 1982).

SEN, A. K., On Economic Inequality (Oxford: Clarendon Press, 1973).

WILLIS, R., AND S. ROSEN, “Education and Self-Selection,” Journal of Political Economy 87 (1979), S7–S36.
