Introduction to Panel Data Analysis<br />

Youngki Shin<br />

Department of Economics<br />

Email: yshin29@uwo.ca<br />

Statistics and Data Series at Western<br />

November 21, 2012<br />



Motivation<br />

More observations mean more information.<br />

More observations with a certain structure mean much more information: pooled cross sections and panel data.<br />

How can we extract additional information from pooled cross sections or panel data?<br />




Example<br />

Effect of an Incinerator on Housing Prices<br />

With cross-sectional data from 1981, we have (standard errors in parentheses)<br />

\[ \widehat{\mathit{rprice}} = \underset{(3{,}093.0)}{101{,}307.5} - \underset{(5{,}827.71)}{30{,}688.27}\,\mathit{nearinc}, \qquad n = 142. \]<br />

With another cross section from 1978, before the incinerator existed, we have<br />

\[ \widehat{\mathit{rprice}} = \underset{(2{,}653.79)}{82{,}517.23} - \underset{(4{,}744.59)}{18{,}824.37}\,\mathit{nearinc}, \qquad n = 179. \]<br />

Houses near the site were already cheaper in 1978. Therefore, the estimated effect of the incinerator is not −30,688.27 but<br />

\[ -30{,}688.27 - (-18{,}824.37) = -11{,}863.90. \]<br />


Outline<br />

Data Structure<br />

Policy Evaluation with Pooled Cross Sections<br />

Three Approaches in Panel Data Estimation<br />

First Difference (FD) Estimator<br />

Fixed Effect (FE) Estimator<br />

Random Effect (RE) Estimator<br />

Empirical Application: Smoking on Birth Outcomes<br />

Concluding Remarks<br />




Data Structure<br />

A set of pooled cross sections is obtained by sampling randomly from a large population at different time points.<br />

A (typical) panel data set follows the same individuals over time.<br />

For example, suppose I sample three individuals from this room at two time points:<br />

Time | Pooled | Panel<br />

t=1 | John, Jane, Evelyn | Eric, Andrew, Rachel<br />

t=2 | Kyle, Justin, Lisa | Eric, Andrew, Rachel<br />


Data Structure (cont.)<br />

A Snapshot of Data<br />

Table: Pooled Data<br />

year | rprice | nearinc | y81<br />

1978 | 60000 | 0 | 0<br />

1978 | 54000 | 1 | 0<br />

1978 | 38000 | 1 | 0<br />

... | ... | ... | ...<br />

1981 | 82000 | 1 | 1<br />

1981 | 52000 | 0 | 1<br />

1981 | 97000 | 0 | 1<br />

Table: Panel Data<br />

id | year | inf | unem<br />

12 | 1950 | 7.3 | 3.5<br />

12 | 1951 | 9.1 | 2.7<br />

16 | 1950 | 5.3 | 5.4<br />

16 | 1951 | 4.6 | 6.7<br />

... | ... | ... | ...<br />

43 | 1950 | 7.1 | 4.2<br />

43 | 1951 | 8.5 | 3.2<br />

47 | 1950 | 6.7 | 5.4<br />

47 | 1951 | 2.6 | 9.4<br />


Data Structure (cont.)<br />

There are also very useful panel structures other than the individual-time combination.<br />

1. Twins data: i is the twin-pair id, and t indexes the individual within the pair. This controls for unobserved genetic factors.<br />

2. School data: students sampled from many schools (or classrooms). Then, i is the school id, and t indexes the student in school i.<br />


Data Structure (cont.)<br />

Examples of pooled cross sections:<br />

Current Population Survey (CPS), USA<br />

Examples of panel data:<br />

Labor Market Activity Survey (LMAS), Canada<br />

Panel Study of Income Dynamics (PSID), USA<br />

National Longitudinal Survey of Youth (NLSY), USA<br />

A time series of provincial (or country) level data, e.g. inflation and unemployment rates of 50 countries in 1950–2010.<br />

It is usually easier to collect pooled cross sections than panel data.<br />




Policy Evaluation with Pooled Cross Sections<br />

Difference-in-Difference Estimator<br />

Terminology:<br />

Treatment Group: those who are affected by a policy (a treatment).<br />

Control Group: those who are not.<br />

The objective of policy evaluation is to measure the (mean) difference in outcomes between the treatment group and the control group. This measure is also called the average treatment effect.<br />

Suppose you are testing the effect of a new drug. How can you design the experiment? Randomization.<br />

Recall the incinerator and housing prices example. Is randomization possible?<br />


Policy Evaluation with Pooled Cross Sections<br />

Difference-in-Difference Estimator<br />

Consider an example of a drug test:<br />

\[ \mathit{blprs}_i = \beta_0 + \beta_1 \mathit{treat}_i + u_i \]<br />

If you randomized the control/treatment groups well, i.e. \(\mathrm{Cov}(\mathit{treat}_i, u_i) = 0\), then you can estimate the effect of the drug from a single cross section.<br />

In policy evaluation in the social sciences, the treatment indicator and \(u_i\) are easily correlated:<br />

\[ \log(\mathit{wage}_i) = \beta_0 + \beta_1 \mathit{jbtrn}_i + u_i \]<br />


Policy Evaluation with Pooled Cross Sections<br />

Difference-in-Difference Estimator<br />

Pooled cross sections help us to evaluate the policy effect correctly by measuring the difference twice (before and after the policy implementation).<br />

Recall the two regressions in the incinerator example:<br />

\[ \mathit{rprice} = \gamma_0 + \gamma_1 \mathit{nearinc} + u \quad \text{in years 1978 and 1981} \]<br />

\[ \hat\delta_1 = \hat\gamma_{1,81} - \hat\gamma_{1,78} = \left( \overline{\mathit{rprice}}_{81,nr} - \overline{\mathit{rprice}}_{81,fr} \right) - \left( \overline{\mathit{rprice}}_{78,nr} - \overline{\mathit{rprice}}_{78,fr} \right) \]<br />

Here the bars denote sample means, with nr for houses near the incinerator site and fr for houses farther away.<br />

If perfectly randomized, the second term is 0.<br />

This estimator is called the Difference-in-Difference estimator.<br />


Policy Evaluation with Pooled Cross Sections<br />

Difference-in-Difference Estimator<br />

The effect can be estimated by a single regression with dummy variables:<br />

\[ \mathit{rprice} = \beta_0 + \delta_0\, \mathit{y81} + \beta_1\, \mathit{nearinc} + \delta_1\, \mathit{y81} \cdot \mathit{nearinc} + u \]<br />

This result may not be obvious at first. Just follow the logic:<br />

Group | Before (y81 = 0) | After (y81 = 1) | After − Before<br />

Control (nearinc = 0) | β0 | β0 + δ0 | δ0<br />

Treatment (nearinc = 1) | β0 + β1 | β0 + δ0 + β1 + δ1 | δ0 + δ1<br />

Treatment − Control | β1 | β1 + δ1 | δ1<br />

Therefore, the estimate of δ1 in the above regression is the same as the Difference-in-Difference estimate.<br />
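As a rough illustration of the table logic, the following Python sketch takes the fitted values reported in the incinerator example, builds the four implied cell means, and maps them back to the four regression coefficients. The function name `did_coefficients` and the dictionary layout are assumptions made for this sketch. It relies on the fact that a regression on a full set of group and period dummies reproduces the cell means.

```python
# Fitted coefficients reported on the slides (1978 and 1981 cross sections).
intercept_78, slope_78 = 82517.23, -18824.37
intercept_81, slope_81 = 101307.5, -30688.27

# Implied mean prices in each of the four (group, year) cells.
mean_price = {
    ("far", 1978): intercept_78,
    ("near", 1978): intercept_78 + slope_78,
    ("far", 1981): intercept_81,
    ("near", 1981): intercept_81 + slope_81,
}


def did_coefficients(means):
    """Map four cell means to (beta0, delta0, beta1, delta1)."""
    beta0 = means[("far", 1978)]                    # control group, before
    delta0 = means[("far", 1981)] - beta0           # time trend of the control group
    beta1 = means[("near", 1978)] - beta0           # pre-existing location gap
    delta1 = (means[("near", 1981)] - means[("far", 1981)]) - beta1
    return beta0, delta0, beta1, delta1


beta0, delta0, beta1, delta1 = did_coefficients(mean_price)
print(f"delta1 (difference-in-difference) = {delta1:.2f}")
```

The value of `delta1` matches the hand calculation from the incinerator example, since both subtract the 1978 gap from the 1981 gap.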




Panel Data and the First Difference (FD) Estimator<br />

In panel data, we follow the same individual over time. This specific structure enables us to conduct a better analysis.<br />

Specifically, we can control for certain types of omitted variables called unobserved heterogeneity.<br />

Let us think about an example:<br />

\[ \log(\mathit{wage}_{it}) = \beta_0 + \delta_0\, \mathit{d2}_t + \beta_1\, \mathit{educ}_{it} + \underbrace{a_i + u_{it}}_{v_{it}} \]<br />

Notation: now we have two subscripts, i and t.<br />

Both \(a_i\) and \(u_{it}\) are unobservables, called a fixed effect and an idiosyncratic error, respectively.<br />


Panel Data and the First Difference (FD) Estimator<br />

For simplicity, consider a two-period model:<br />

\[ y_{it} = \beta_0 + \delta_0\, \mathit{d2}_t + \beta_1 x_{it} + a_i + u_{it}, \qquad t = 1, 2. \]<br />

The pooled OLS does not work well since \(a_i\) is usually correlated with \(x_{it}\), i.e. \(\mathrm{Cov}(v_{it}, x_{it}) \neq 0\).<br />

A simple solution is the First-Difference (FD) estimator.<br />

\[ y_{i2} = (\beta_0 + \delta_0) + \beta_1 x_{i2} + a_i + u_{i2} \qquad (t = 2) \]<br />

\[ y_{i1} = \beta_0 + \beta_1 x_{i1} + a_i + u_{i1} \qquad (t = 1) \]<br />

Taking a difference gives<br />

\[ y_{i2} - y_{i1} = \delta_0 + \beta_1 (x_{i2} - x_{i1}) + (u_{i2} - u_{i1}) \]<br />

or<br />

\[ \Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i. \]<br />
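The differencing argument can be checked on a toy data set. The sketch below is only an illustration under stated assumptions: the parameter values and the eight observations are hypothetical, the idiosyncratic error is set to zero so that the arithmetic is exact, and the fixed effect is constructed to rise with x. Pooled OLS with a period dummy is computed by demeaning within each period, which is equivalent to including the intercept and `d2` in the regression.

```python
def demean(values):
    """Subtract the sample mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx, dy = demean(x), demean(y)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    return slope, my - slope * mx


# Hypothetical parameters and data (not taken from the slides); u_it = 0.
beta0, delta0, beta1 = 1.0, 0.5, 2.0
a = [0.0, 1.0, 2.0, 3.0]       # fixed effects, positively related to x
x1 = [1.0, 2.0, 3.0, 4.0]      # regressor in period 1
x2 = [2.0, 4.0, 3.0, 6.0]      # regressor in period 2
y1 = [beta0 + beta1 * x + ai for x, ai in zip(x1, a)]
y2 = [beta0 + delta0 + beta1 * x + ai for x, ai in zip(x2, a)]

# Pooled OLS: period-wise demeaning partials out beta0 and the dummy d2.
pooled_slope, _ = ols(demean(x1) + demean(x2), demean(y1) + demean(y2))

# First differences remove a_i; the intercept of this regression is delta0.
dx = [after - before for after, before in zip(x2, x1)]
dy = [after - before for after, before in zip(y2, y1)]
fd_slope, fd_intercept = ols(dx, dy)

print("pooled OLS slope:", round(pooled_slope, 3))
print("FD slope:", round(fd_slope, 3), "FD intercept:", round(fd_intercept, 3))
```

In this construction the pooled slope is pushed above the assumed value of 2 because the fixed effect is left in the error term, while the FD regression recovers the assumed slope and time effect.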


Panel Data and the First Difference (FD) Estimator<br />

The (pooled) OLS works in the new regression,<br />

\[ \Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i, \]<br />

if<br />

1. \(\Delta u_i\) and \(\Delta x_i\) are uncorrelated;<br />

2. \(\Delta x_i\) has some variation.<br />

The second condition is violated if \(x_{it}\) does not change over time, e.g. gender or race. Then \(\Delta x_i = 0\).<br />

Even in the wage equation example,<br />

\[ \log(\mathit{wage}_{it}) = \beta_0 + \delta_0\, \mathit{d2}_t + \beta_1\, \mathit{educ}_{it} + a_i + u_{it}, \]<br />

most of the working population does not add years of education, so \(\Delta \mathit{educ}_i = 0\) for most individuals.<br />


Panel Data and the First Difference (FD) Estimator<br />

More than Two Time Periods<br />

When panel data contain more than two time periods, we can still apply the FD estimator to control for unobserved heterogeneity.<br />

The sufficient condition for the estimator to be valid is<br />

\[ \mathrm{Cov}(x_{it}, u_{is}) = 0 \quad \text{for all } t \text{ and } s. \]<br />

This condition is violated when<br />

1. Future regressors react to the past dependent variable (feedback);<br />

2. Regressors contain a lagged dependent variable;<br />

3. An important (i.e. related to \(x_{it}\)) time-varying regressor is omitted.<br />

Take differences between adjacent time periods. With three periods (t = 1, 2, 3), run the following regression:<br />

\[ \Delta y_{it} = \alpha_0 + \alpha_3\, \mathit{d3}_t + \beta_1 \Delta x_{it} + \Delta u_{it} \quad \text{for } t = 2, 3. \]<br />


Additional Remarks on FD Estimator<br />

Due to the expansion over the time dimension, serial correlation may arise.<br />

Also, we cannot exclude the heteroskedasticity problem.<br />

Since we use the OLS estimator, we can apply the White correction or the HAC estimation method as before.<br />


Fixed Effect Estimator<br />

Consider a simple error component model again:<br />

\[ y_{it} = \beta_1 x_{it} + a_i + u_{it}, \qquad t = 1, \ldots, T \text{ and } i = 1, \ldots, n. \]<br />

We assume that the idiosyncratic error \(u_{it}\) is ‘innocuous’ in the sense that<br />

\[ E(u_{it} \mid X_i) = 0 \quad \text{or} \quad E(u_{it} \mid x_{it}) = 0. \]<br />

However, the individual fixed effect \(a_i\) could be arbitrarily correlated with \(x_{it}\).<br />

We already know that the FD estimator cancels out the unobserved heterogeneity \(a_i\).<br />


Fixed Effect Estimator<br />

There is a different way to cancel out unobserved heterogeneity.<br />

First, fix the individual i and take an average over time:<br />

\[ \bar y_i = \beta_1 \bar x_i + a_i + \bar u_i, \]<br />

where<br />

\[ \bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \quad \bar x_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}, \quad \text{and} \quad \bar u_i = \frac{1}{T} \sum_{t=1}^{T} u_{it}. \]<br />

The point is<br />

\[ \bar a_i = \frac{1}{T} \sum_{t=1}^{T} a_i = \frac{1}{T}\, T a_i = a_i. \]<br />


Fixed Effect Estimator<br />

Now, take a difference between two equations:<br />

\[ y_{it} = \beta_1 x_{it} + a_i + u_{it}, \qquad t = 1, 2, \ldots, T, \]<br />

\[ \bar y_i = \beta_1 \bar x_i + a_i + \bar u_i. \]<br />

Then, what we have is<br />

\[ y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + (u_{it} - \bar u_i), \qquad t = 1, 2, \ldots, T, \]<br />

or<br />

\[ \ddot y_{it} = \beta_1 \ddot x_{it} + \ddot u_{it}, \qquad t = 1, 2, \ldots, T. \]<br />

We may apply the pooled OLS to the last equation.<br />
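The time-demeaning step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the three individuals, their fixed effects, and the slope of 2 are hypothetical, and the idiosyncratic error is set to zero so the result is exact. The helper names `demean` and `slope_through_origin` are also assumptions of this sketch.

```python
def demean(values):
    """Subtract the mean of the list from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical panel: n = 3 individuals, T = 3 periods, u_it = 0.
beta1 = 2.0
fixed_effects = [0.0, 5.0, 10.0]                     # a_i, rising with x
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 10.0]]
y = [[beta1 * xit + a_i for xit in x_i] for x_i, a_i in zip(x, fixed_effects)]

# Within transformation: demean each individual's series over time, then stack.
x_within = [v for x_i in x for v in demean(x_i)]
y_within = [v for y_i in y for v in demean(y_i)]
fe_slope = slope_through_origin(x_within, y_within)

# Naive pooled OLS with an intercept, ignoring a_i (demean over all observations).
x_all = [v for x_i in x for v in x_i]
y_all = [v for y_i in y for v in y_i]
pooled_slope = slope_through_origin(demean(x_all), demean(y_all))

print("FE (within) slope:", fe_slope)
print("pooled OLS slope:", round(pooled_slope, 3))
```

Because each individual's own mean is subtracted, the constant a_i drops out of the stacked regression, whereas the naive pooled fit attributes part of the fixed effect to x.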


Fixed Effect Estimator<br />

The FE estimator uses information from within-group (i) variation:<br />

\[ \ddot y_{i1} = y_{i1} - \bar y_i, \quad \ddot y_{i2} = y_{i2} - \bar y_i, \quad \ldots, \quad \ddot y_{iT} = y_{iT} - \bar y_i \]<br />

For this reason, the FE estimator is also called the within estimator.<br />

This can be readily extended to a multiple regression model:<br />

\[ \ddot y_{it} = \beta_1 \ddot x_{1it} + \beta_2 \ddot x_{2it} + \ldots + \beta_k \ddot x_{kit} + \ddot u_{it} \]<br />


Fixed Effect Estimator<br />

FD vs. FE<br />

If T = 2, the FD estimator and the FE estimator are identical:<br />

\[ \ddot y_{i2} \equiv y_{i2} - \bar y_i = y_{i2} - \left( \frac{y_{i1} + y_{i2}}{2} \right) = \frac{y_{i2} - y_{i1}}{2} \equiv \frac{1}{2} \Delta y_{i2}. \]<br />

Therefore,<br />

\[ \ddot y_{i2} = \beta_1 \ddot x_{i2} + \ddot u_{i2} \iff \frac{1}{2} \Delta y_{i2} = \beta_1 \frac{1}{2} \Delta x_{i2} + \frac{1}{2} \Delta u_{i2} \iff \Delta y_{i2} = \beta_1 \Delta x_{i2} + \Delta u_{i2} \]<br />

However, they are different in a finite sample if T > 2. Unless there is a unit root (or severe serial correlation) problem, it is usually better to use the FE estimator.<br />
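The T = 2 equivalence can also be verified numerically. In the sketch below the four individuals and their outcomes are hypothetical and deliberately do not lie on a line, so the two slopes agree because of the algebra above and not because the data are noise-free. Following the simple error component model of this section, neither regression includes an intercept or a time dummy.

```python
def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical two-period panel with four individuals.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 3.0, 6.0]
y1 = [1.0, 0.0, 2.0, 5.0]
y2 = [4.0, 4.0, 3.0, 10.0]

# FD: regress the change in y on the change in x.
dx = [after - before for after, before in zip(x2, x1)]
dy = [after - before for after, before in zip(y2, y1)]
fd_slope = slope_through_origin(dx, dy)

# FE: subtract each individual's two-period mean and stack both periods.
x_within, y_within = [], []
for xa, xb, ya, yb in zip(x1, x2, y1, y2):
    x_bar, y_bar = (xa + xb) / 2, (ya + yb) / 2
    x_within += [xa - x_bar, xb - x_bar]
    y_within += [ya - y_bar, yb - y_bar]
fe_slope = slope_through_origin(x_within, y_within)

print("FD slope:", fd_slope)
print("FE slope:", fe_slope)
```

Each demeaned value is plus or minus half of the corresponding first difference, so the factor of one half cancels in the ratio that defines the slope.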




Random Effect Estimator<br />

In the random effect model<br />

\[ y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}, \]<br />

we assume that<br />

\[ \mathrm{Cov}(x_{it}, a_i) = 0. \]<br />

Then, we come back to the ‘nice’ world where we don’t need to cancel out \(a_i\). Just use the pooled OLS?<br />

No. There is a serial correlation problem.<br />


Random Effect Estimator<br />

Serial Correlation in the RE model<br />

We have two components in the error term:<br />

\[ v_{it} = a_i + u_{it} \]<br />

Suppose that \(u_{it}\) is totally innocuous again:<br />

\[ \mathrm{Cov}(a_i, u_{it}) = \mathrm{Cov}(u_{it}, u_{is}) = 0 \quad \text{for } t \neq s. \]<br />

Now, we calculate \(\mathrm{Corr}(v_{it}, v_{is})\) and show that it is not zero:<br />

\[ \mathrm{Var}(v_{it}) = \mathrm{Var}(a_i + u_{it}) = \sigma_a^2 + \sigma_u^2 \]<br />

\[ \mathrm{Cov}(v_{it}, v_{is}) = E\big( (a_i + u_{it})(a_i + u_{is}) \big) = E(a_i^2 + a_i u_{is} + a_i u_{it} + u_{it} u_{is}) = E(a_i^2) = \sigma_a^2 \]<br />


Random Effect Estimator<br />

Serial Correlation in the RE model<br />

Therefore,<br />

\[ \mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} \neq 0 \]<br />

Any inference based on the usual pooled OLS standard errors would be incorrect.<br />

However, we know how to fix this problem. Do GLS!<br />

We want to transform the original model into<br />

\[ \tilde y_{it} = \beta_0 + \beta_1 \tilde x_{it} + \tilde v_{it} \]<br />

where \(\tilde v_{it}\) is no longer serially correlated.<br />


Random Effect Estimator<br />

With AR(1) serial correlation, we multiplied by ρ and took a difference. In this case, we multiply the time averages by<br />

\[ \lambda = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2} \]<br />

and take a difference:<br />

\[ y_{it} - \lambda \bar y_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar x_i) + v_{it} - \lambda \bar v_i \]<br />

We can show that \(\tilde v_{it}\) (\(= v_{it} - \lambda \bar v_i\)) is not serially correlated.<br />

In practice, λ is unknown and must be replaced by an estimate \(\hat\lambda\).<br />

This specific GLS estimator is called the Random Effect (RE) estimator.<br />


Random Effect Estimator<br />

The RE estimator lies between the pooled OLS and the FE estimator. Note that the equation<br />

\[ y_{it} - \lambda \bar y_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar x_i) + v_{it} - \lambda \bar v_i \]<br />

reduces to the pooled OLS when λ = 0 and to the FE estimator when λ = 1.<br />

λ is always between 0 and 1 in the RE model.<br />

As T → ∞, the FE and RE estimators are equivalent since λ → 1.<br />
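The behaviour of λ is easy to explore numerically. The Python sketch below evaluates the formula for λ given above and applies the quasi-demeaning step to one individual's series; the function names `re_lambda` and `quasi_demean` and all variance values are hypothetical choices for illustration, and the variance components are treated as known, whereas in practice they are estimated.

```python
import math


def re_lambda(sigma2_u, sigma2_a, T):
    """lambda = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_a))


def quasi_demean(values, lam):
    """Subtract lam times the time average from one individual's series."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]


# No individual effect variance: lambda = 0, so RE collapses to pooled OLS.
print(re_lambda(sigma2_u=1.0, sigma2_a=0.0, T=5))

# Equal variances and T = 3: lambda = 1 - sqrt(1/4).
print(re_lambda(sigma2_u=1.0, sigma2_a=1.0, T=3))

# lambda approaches 1 as T grows, so RE approaches FE.
for T in (2, 10, 100, 10000):
    print(T, round(re_lambda(1.0, 1.0, T), 4))

# lam = 0 leaves the data untouched; lam = 1 is the full within transformation.
series = [1.0, 2.0, 3.0]
print(quasi_demean(series, 0.0), quasi_demean(series, 1.0))
```

Intermediate values of λ remove only part of each individual's mean, which is why the RE estimator keeps some of the between-individual variation that FE discards.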


Random Effect Estimator<br />

RE vs. FE<br />

If you believe that there is an obvious endogenous fixed factor, \(a_i\), in your model, you should use the FE estimator.<br />

Otherwise, the RE estimator will tell you more: coefficients on time-invariant regressors, efficiency, etc.<br />

Keep in mind that the RE estimator is not even consistent if \(\mathrm{Cov}(x_{it}, a_i) \neq 0\).<br />

We can test whether \(\mathrm{Cov}(x_{it}, a_i) = 0\) or not.<br />


Random Effect Estimator<br />

Hausman Test<br />

The idea of the Hausman test is simple. The null and alternative hypotheses are<br />

\[ H_0: \mathrm{Cov}(x_{it}, a_i) = 0 \]<br />

\[ H_1: \mathrm{Cov}(x_{it}, a_i) \neq 0 \]<br />

Under \(H_0\), both RE and FE are consistent:<br />

\[ \hat\beta_{RE} \xrightarrow{p} \beta, \qquad \hat\beta_{FE} \xrightarrow{p} \beta. \]<br />

Thus, we can expect that \(\hat\beta_{RE} \approx \hat\beta_{FE}\).<br />

However, under \(H_1\), only \(\hat\beta_{FE}\) is consistent. Therefore, we reject \(H_0\) if the difference between \(\hat\beta_{RE}\) and \(\hat\beta_{FE}\) is large enough.<br />
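The slides do not spell out the test statistic. As a hedged sketch, the Python code below uses the standard form of the Hausman statistic for a single coefficient, the squared difference of the two estimates divided by the difference of their estimated variances, and compares it with the 5% critical value of a chi-square distribution with one degree of freedom. The function name `hausman_scalar` and all the numbers are hypothetical, not estimates from any data set discussed here.

```python
CHI2_1DF_CRITICAL_5PCT = 3.841  # 95th percentile of chi-square with 1 df


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient.

    Under H0 the RE estimator is efficient, so var_fe - var_re should be
    positive; the statistic is asymptotically chi-square with 1 df.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / diff_var


# Hypothetical estimates and variances, for illustration only.
h_stat = hausman_scalar(b_fe=-1.0, b_re=-1.2, var_fe=0.010, var_re=0.006)
reject_h0 = h_stat > CHI2_1DF_CRITICAL_5PCT
print("Hausman statistic:", round(h_stat, 3), "reject H0:", reject_h0)
```

With several regressors the same idea applies with the vector of coefficient differences and the difference of the covariance matrices, and the degrees of freedom equal the number of coefficients compared.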




Empirical Application: Smoking on Birth Outcomes<br />

“Infants born to women who smoke during pregnancy have a lower average birthweight... Low birthweight is associated with increased risk for neonatal, perinatal, and infant morbidity and mortality.”<br />

(Women and Smoking: A Report of the Surgeon General, 2001, as quoted in Abrevaya (2006))<br />


Empirical Application: Smoking on Birth Outcomes<br />

The direct medical costs: According to the estimates of Lewit et al. (1995), low-birthweight (LBW) infants (less than 10% of births) account for more than 1/3 of health care costs during the first year of life.<br />

The long-term costs:<br />

“Hack et al. (1995) find that LBW babies have developmental problems in cognition, attention and neuromotor functioning that persist until adolescence.” (Abrevaya (2006))<br />


Empirical Application: Smoking on Birth Outcomes<br />

How to Estimate<br />

The OLS estimates would be biased in the negative direction due to endogeneity.<br />

IV estimation?<br />

[Comparison between OLS and IV estimates, from Abrevaya (2006)]<br />


Empirical Application: Smoking on Birth Outcomes<br />

The fixed-effect (FE) estimation can be used if panel data are available.<br />

Abrevaya (2006) constructed a pseudo panel data set and showed that the FE estimate is smaller in magnitude than the OLS estimate.<br />

\[ y_{ib} = x_{ib}' \beta + \gamma s_{ib} + c_i + u_{ib} \]<br />

where i is the mother's id and b is the birth order of the baby for mother i.<br />

The estimation results for γ by OLS and FE are −243.27 (3.20) and −144.04 (4.75), respectively.<br />




Concluding Remarks<br />

Pooled cross sections are very similar to a single cross section, but observations across different time points help evaluate the correct policy effect.<br />

Extra information contained in panel data enables us to control for the individual fixed effect by FD and FE estimators.<br />

If the fixed effect is not correlated with regressors, we can apply the RE estimator, which is a GLS estimator.<br />

Panel data are not restricted to the individual-time structure.<br />


Stata Commands<br />

Load the data set filename.dta.<br />

First, we need to set an id variable and a time variable. Check the relevant variable names.<br />

xtset id time<br />

Now type xtsum.<br />

The command for the FE estimator is<br />

xtreg dep x1 x2 x3 ..., fe<br />

The command for the RE estimator is<br />

xtreg dep x1 x2 x3 ..., re<br />
