Introduction to Panel Data Analysis
Youngki Shin
Department of Economics
Email: yshin29@uwo.ca
Statistics and Data Series at Western
November 21, 2012
Motivation
More observations mean more information.
More observations with a certain structure mean much more information: pooled cross sections and panel data.
How can we extract additional information from pooled cross sections or panel data?
Example
Effect of an Incinerator on Housing Prices
With cross-sectional data from 1981, we have
$$\widehat{\mathit{rprice}} = \underset{(3{,}093.0)}{101{,}307.5} - \underset{(5{,}827.71)}{30{,}688.27}\,\mathit{nearinc}, \qquad n = 142.$$
With another cross section from 1978, when there was no incinerator, we have
$$\widehat{\mathit{rprice}} = \underset{(2{,}653.79)}{82{,}517.23} - \underset{(4{,}744.59)}{18{,}824.37}\,\mathit{nearinc}, \qquad n = 179.$$
Standard errors are in parentheses.
Therefore, the true effect of the incinerator is not −30,688.27 but
−30,688.27 − (−18,824.37) = −11,863.90.
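The subtraction in the incinerator example can be checked directly. This is a minimal Python sketch; the variable and function names are illustrative and do not come from the slides.

```python
# Coefficients on nearinc from the two cross-sectional regressions in the example.
gamma_1981 = -30688.27  # near/far price gap in the 1981 cross section
gamma_1978 = -18824.37  # near/far price gap in the 1978 cross section (no incinerator)


def difference_in_gaps(gap_after: float, gap_before: float) -> float:
    """Return the change in the near/far price gap between the two years."""
    return gap_after - gap_before


effect = difference_in_gaps(gamma_1981, gamma_1978)
print(f"{effect:,.2f}")  # prints -11,863.90
```

The point of the example is that the 1978 gap already existed before the incinerator, so only the change in the gap can be attributed to it.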
Outline
Data Structure
Policy Evaluation with Pooled Cross Sections
Three Approaches in Panel Data Estimation
  First Difference (FD) Estimator
  Fixed Effect (FE) Estimator
  Random Effect (RE) Estimator
Empirical Application: Smoking on Birth Outcomes
Concluding Remarks
Data Structure
A set of pooled cross sections is obtained by sampling randomly from a large population at different time points.
A (typical) panel data set follows the same individuals over time.
For example, suppose that I sample three individuals from this room at two time points:
Time   Pooled               Panel
t=1    John, Jane, Evelyn   Eric, Andrew, Rachel
t=2    Kyle, Justin, Lisa   Eric, Andrew, Rachel
Data Structure (cont.)
A Snapshot of Data
Table: Pooled Data
year   rprice   nearinc   y81
1978   60000    0         0
1978   54000    1         0
1978   38000    1         0
...
1981   82000    1         1
1981   52000    0         1
1981   97000    0         1
Table: Panel Data
id   year   inf   unem
12   1950   7.3   3.5
12   1951   9.1   2.7
16   1950   5.3   5.4
16   1951   4.6   6.7
...
43   1950   7.1   4.2
43   1951   8.5   3.2
47   1950   6.7   5.4
47   1951   2.6   9.4
Data Structure (cont.)
There are also very useful panel structures other than the individual-time combination.
1. Twins data: i is the twin-pair id, and t indexes the individual within the pair. This controls for unobserved genetic factors.
2. School data: students are sampled from many schools (or classrooms). Then i is the school id, and t indexes the student in school i.
Data Structure (cont.)
Examples of pooled cross sections:
  Current Population Survey (CPS), USA
Examples of panel data:
  Labor Market Activity Survey (LMAS), Canada
  Panel Study of Income Dynamics (PSID), USA
  National Longitudinal Survey of Youth (NLSY), USA
  A time series of provincial (or country) level data, e.g. the inflation and unemployment rates of 50 countries in 1950–2010.
It is usually easier to collect pooled cross sections than panel data.
Policy Evaluation with Pooled Cross Sections
Difference-in-Difference Estimator
Terminology:
  Treatment group: those who are affected by a policy (a treatment).
  Control group: those who are not.
The objective of policy evaluation is to measure the (mean) difference in outcomes between the treatment group and the control group. This measure is also called the average treatment effect.
Suppose that you are testing the effect of a new drug. How can you design the experiment? Randomization.
Recall the incinerator and housing prices example. Is randomization possible?
Policy Evaluation with Pooled Cross Sections
Difference-in-Difference Estimator
Consider an example of a drug test:
$$\mathit{blprs}_i = \beta_0 + \beta_1 \mathit{treat}_i + u_i$$
If you randomized the control/treatment groups well, i.e. $Cov(\mathit{treat}_i, u_i) = 0$, then you can estimate the effect of the drug by a single cross section.
In policy evaluation in social sciences, $\mathit{treat}_i$ and $u_i$ are easily correlated:
$$\log(\mathit{wage}_i) = \beta_0 + \beta_1 \mathit{jbtrn}_i + u_i$$
Policy Evaluation with Pooled Cross Sections
Difference-in-Difference Estimator
Pooled cross sections help us to evaluate the policy effect correctly by measuring the difference twice (before and after the policy implementation).
Recall the two regressions in the incinerator example:
$$\mathit{rprice} = \gamma_0 + \gamma_1 \mathit{nearinc} + u \quad \text{in years 1978 and 1981}$$
$$\hat\delta_1 = \hat\gamma_{1,81} - \hat\gamma_{1,78} = \left(\overline{\mathit{rprice}}_{81,nr} - \overline{\mathit{rprice}}_{81,fr}\right) - \left(\overline{\mathit{rprice}}_{78,nr} - \overline{\mathit{rprice}}_{78,fr}\right)$$
Here a bar denotes a sample mean, and nr and fr denote houses near to and far from the incinerator site.
If perfectly randomized, the second term is 0.
This estimator is called the Difference-in-Difference estimator.
Policy Evaluation with Pooled Cross Sections
Difference-in-Difference Estimator
The effect can be estimated by a single regression with dummy variables:
$$\mathit{rprice} = \beta_0 + \delta_0\, y81 + \beta_1 \mathit{nearinc} + \delta_1\, y81 \cdot \mathit{nearinc} + u$$
This result is not intuitive. Just follow the logic:
                          Before (y81 = 0)       After (y81 = 1)                             After − Before
Control (nearinc = 0)     $\beta_0$              $\beta_0 + \delta_0$                        $\delta_0$
Treatment (nearinc = 1)   $\beta_0 + \beta_1$    $\beta_0 + \delta_0 + \beta_1 + \delta_1$   $\delta_0 + \delta_1$
Treatment − Control       $\beta_1$              $\beta_1 + \delta_1$                        $\delta_1$
Therefore, $\delta_1$ in the above regression gives the same estimate as the Difference-in-Difference estimator.
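The equivalence between the interaction coefficient and the difference of group-mean differences can be checked numerically. The sketch below uses only the Python standard library and a small hypothetical data set (not the data behind the slides); the helper names `solve`, `ols`, and `cell_mean` are illustrative. Because the regression is saturated in the two dummies, the two numbers agree up to floating-point rounding.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical observations: (y81, nearinc, rprice in thousands).
data = [
    (0, 0, 80.0), (0, 0, 84.0), (0, 1, 62.0), (0, 1, 66.0),
    (1, 0, 100.0), (1, 0, 104.0), (1, 1, 70.0), (1, 1, 72.0),
]

# Regressors: intercept, y81, nearinc, y81 * nearinc.
X = [[1.0, y81, near, y81 * near] for y81, near, _ in data]
y = [price for _, _, price in data]
beta0, delta0, beta1, delta1 = ols(X, y)


def cell_mean(y81, near):
    """Average price in one (year, location) cell."""
    return mean(p for t, g, p in data if t == y81 and g == near)


# Difference-in-differences computed from the four group means.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(round(delta1, 6), round(did, 6))
```

In practice a regression package would also report a standard error for the interaction coefficient, which is the main advantage of the single-regression form.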
Panel Data and the First Difference (FD) Estimator
In panel data, we follow the same individual over time. This specific structure enables us to conduct a better analysis.
Specifically, we can control for certain types of omitted variables called unobserved heterogeneity.
Let us think about some examples:
$$\log(\mathit{wage}_{it}) = \beta_0 + \delta_0\, d2_t + \beta_1 \mathit{educ}_{it} + \underbrace{a_i + u_{it}}_{v_{it}}$$
Notation: now we have two subscripts, i and t.
Both $a_i$ and $u_{it}$ are unobservables, called a fixed effect and an idiosyncratic error, respectively.
Panel Data and the First Difference (FD) Estimator
For simplicity, consider a two-period model:
$$y_{it} = \beta_0 + \delta_0\, d2_t + \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2.$$
Pooled OLS does not work well since $a_i$ is usually correlated with $x_{it}$, i.e. $Cov(v_{it}, x_{it}) \neq 0$.
A simple solution is the First-Difference (FD) estimator.
$$y_{i2} = (\beta_0 + \delta_0) + \beta_1 x_{i2} + a_i + u_{i2}, \quad t = 2$$
$$y_{i1} = \beta_0 + \beta_1 x_{i1} + a_i + u_{i1}, \quad t = 1$$
Taking a difference gives
$$y_{i2} - y_{i1} = \delta_0 + \beta_1 (x_{i2} - x_{i1}) + (u_{i2} - u_{i1})$$
or
$$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i.$$
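To see why differencing helps, here is a small Python sketch with hypothetical numbers. It makes two simplifying assumptions that are not in the slides: the idiosyncratic error is set to zero so the arithmetic is exact, and the time effect $\delta_0$ is set to zero so that pooled OLS needs no period dummy. The fixed effect is built to be correlated with the regressor.

```python
def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


beta0, beta1 = 1.0, 2.0                 # hypothetical true parameters (delta0 = 0)
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical fixed effects a_i
x1 = [ai + 1.0 for ai in a]             # x_i1 is correlated with a_i by construction
dx = [1.0, 3.0, 2.0, 4.0, 1.0, 2.0]     # changes in x between the two periods
x2 = [xi + d for xi, d in zip(x1, dx)]

# Outcomes with u_it = 0 in both periods.
y1 = [beta0 + beta1 * x + ai for x, ai in zip(x1, a)]
y2 = [beta0 + beta1 * x + ai for x, ai in zip(x2, a)]

pooled = slope(x1 + x2, y1 + y2)        # a_i is left in the error term
dy = [b - c for b, c in zip(y2, y1)]
fd = slope(dx, dy)                      # a_i has been differenced away
print(round(pooled, 3), round(fd, 3))
```

The first-difference slope recovers the true coefficient of 2, while the pooled slope is pushed upward because the omitted fixed effect moves together with x.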
Panel Data and the First Difference (FD) Estimator
Pooled OLS works in the new regression,
$$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i,$$
if
1. $\Delta u_i$ and $\Delta x_i$ are uncorrelated;
2. $\Delta x_i$ has some variation.
The second condition is violated if $x_{it}$ does not change over time, e.g. gender, race, etc. Then $\Delta x_i = 0$.
Even in the wage equation example,
$$\log(\mathit{wage}_{it}) = \beta_0 + \delta_0\, d2_t + \beta_1 \mathit{educ}_{it} + a_i + u_{it},$$
most of the working population does not increase its years of education.
Panel Data and the First Difference (FD) Estimator
More than Two Time Periods
When panel data contain more than two time periods, we can still apply the FD estimator to control for unobserved heterogeneity.
A sufficient condition for the estimator to be valid is
$$Cov(x_{it}, u_{is}) = 0 \quad \text{for all } t \text{ and } s.$$
This condition is violated when
1. Future regressors react to the past dependent variable (feedback);
2. Regressors contain a lagged dependent variable;
3. An important (i.e. related to $x_{it}$) time-varying regressor is omitted.
Take differences between adjacent time periods. With three periods, t = 1, 2, and 3, run the following regression:
$$\Delta y_{it} = \alpha_0 + \alpha_3\, d3_t + \beta_1 \Delta x_{it} + \Delta u_{it} \quad \text{for } t = 2, 3.$$
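The data preparation for the three-period case can be sketched as follows. The function name, the dictionary layout, and the numbers are all hypothetical; the sketch only builds the differenced rows and the period dummy $d3_t$, and leaves the regression itself to whatever OLS routine is at hand.

```python
def first_differences(panel):
    """Return rows (id, t, dy, d3, dx) for t = 2, 3.

    `panel` maps a unit id to its observations [(y, x) at t=1, t=2, t=3].
    """
    rows = []
    for unit_id, obs in panel.items():
        for t in (2, 3):
            y_prev, x_prev = obs[t - 2]
            y_cur, x_cur = obs[t - 1]
            d3 = 1 if t == 3 else 0  # period dummy in the differenced equation
            rows.append((unit_id, t, y_cur - y_prev, d3, x_cur - x_prev))
    return rows


# Hypothetical three-period panel with two units.
panel = {
    12: [(7.0, 3.0), (9.0, 2.0), (8.0, 4.0)],
    16: [(5.0, 5.0), (4.0, 6.0), (6.0, 3.0)],
}
rows = first_differences(panel)
for row in rows:
    print(row)
```

Each unit contributes T − 1 = 2 differenced observations, so one period is lost relative to the original panel.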
Additional Remarks on FD Estimator
Because the data now extend over the time dimension, serial correlation may arise.
Also, we cannot rule out heteroskedasticity.
Since we use the OLS estimator, we can apply the White correction or the HAC estimation method as before.
Fixed Effect Estimator
Consider a simple error component model again:
$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, \ldots, T \text{ and } i = 1, \ldots, n.$$
We assume that the idiosyncratic error $u_{it}$ is ‘innocuous’ in the sense that
$$E(u_{it} \mid X_i) = 0 \quad \text{or} \quad E(u_{it} \mid x_{it}) = 0.$$
However, the individual fixed effect $a_i$ could be arbitrarily correlated with $x_{it}$.
We already know that the FD estimator cancels out the unobserved heterogeneity $a_i$.
Fixed Effect Estimator
There is a different way to cancel out unobserved heterogeneity.
First, fix the individual i and take an average over time:
$$\bar y_i = \beta_1 \bar x_i + a_i + \bar u_i,$$
where
$$\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \quad \bar x_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}, \quad \text{and} \quad \bar u_i = \frac{1}{T} \sum_{t=1}^{T} u_{it}.$$
The point is
$$\bar a_i = \frac{1}{T} \sum_{t=1}^{T} a_i = \frac{1}{T}\, T a_i = a_i.$$
Fixed Effect Estimator
Now, take a difference between two equations:
$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2, \ldots, T,$$
$$\bar y_i = \beta_1 \bar x_i + a_i + \bar u_i.$$
Then, what we have is
$$y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + (u_{it} - \bar u_i), \quad t = 1, 2, \ldots, T$$
or
$$\ddot y_{it} = \beta_1 \ddot x_{it} + \ddot u_{it}, \quad t = 1, 2, \ldots, T.$$
We may apply pooled OLS to the last equation.
Fixed Effect Estimator
The FE estimator uses information from within-group (i) variation:
$$\begin{aligned} \ddot y_{i1} &= y_{i1} - \bar y_i \\ \ddot y_{i2} &= y_{i2} - \bar y_i \\ &\;\;\vdots \\ \ddot y_{iT} &= y_{iT} - \bar y_i \end{aligned}$$
For this reason, the FE estimator is also called the within estimator.
This can be readily extended to a multiple regression model:
$$\ddot y_{it} = \beta_1 \ddot x_{1it} + \beta_2 \ddot x_{2it} + \ldots + \beta_k \ddot x_{kit} + \ddot u_{it}$$
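The within transformation is easy to write out by hand. The following Python sketch uses a hypothetical panel of three units observed for three periods, with the idiosyncratic error set to zero (an assumption made here so the result is exact); the function names are illustrative.

```python
from collections import defaultdict


def within_transform(ids, values):
    """Subtract each unit's time average from its observations."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]


def fe_slope(ids, x, y):
    """Pooled OLS (no intercept) of demeaned y on demeaned x."""
    xd, yd = within_transform(ids, x), within_transform(ids, y)
    return sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)


# Hypothetical panel: y_it = 2 * x_it + a_i, with a_i rising along with x.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
a = {1: 0.0, 2: 5.0, 3: 10.0}
y = [2.0 * xi + a[i] for i, xi in zip(ids, x)]

b_fe = fe_slope(ids, x, y)
print(round(b_fe, 6))
```

Only variation around each unit's own mean enters the estimate, which is why a regressor that never changes within a unit drops out after demeaning.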
Fixed Effect Estimator
FD vs. FE
If T = 2, the FD estimator and the FE estimator are identical:
$$\ddot y_{i2} \equiv y_{i2} - \bar y_i = y_{i2} - \left(\frac{y_{i1} + y_{i2}}{2}\right) = \frac{y_{i2} - y_{i1}}{2} \equiv \frac{1}{2} \Delta y_{i2}.$$
Therefore,
$$\ddot y_{i2} = \beta_1 \ddot x_{i2} + \ddot u_{i2} \iff \frac{1}{2} \Delta y_{i2} = \beta_1 \frac{1}{2} \Delta x_{i2} + \frac{1}{2} \Delta u_{i2} \iff \Delta y_{i2} = \beta_1 \Delta x_{i2} + \Delta u_{i2}$$
However, they are different in a finite sample if T > 2. Unless there is a unit root (or severe serial correlation) problem, the FE estimator is usually the better choice.
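The T = 2 equivalence can be confirmed numerically. This Python sketch uses hypothetical data with noise and follows the slide's model, which has no intercept and no period dummy, so both regressions are run through the origin; the names are illustrative.

```python
def slope_through_origin(x, y):
    """OLS slope from a regression of y on x without an intercept."""
    return sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)


# Hypothetical two-period panel with four units.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 5.0, 4.0, 8.0]
y1 = [3.1, 6.2, 8.9, 13.0]
y2 = [5.2, 12.1, 11.3, 20.7]

# First-difference estimator.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd = slope_through_origin(dx, dy)

# Within (FE) estimator: subtract each unit's two-period mean.
xdd, ydd = [], []
for p1, p2, q1, q2 in zip(x1, x2, y1, y2):
    xbar, ybar = (p1 + p2) / 2, (q1 + q2) / 2
    xdd += [p1 - xbar, p2 - xbar]
    ydd += [q1 - ybar, q2 - ybar]
fe = slope_through_origin(xdd, ydd)

print(abs(fd - fe) < 1e-12)  # prints True
```

With T = 2 each demeaned observation is plus or minus half the first difference, so the factor of one half cancels in the ratio.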
Random Effect Estimator
In the random effect model,
$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it},$$
we assume that
$$Cov(x_{it}, a_i) = 0.$$
Then, we come back to the ‘nice’ world where we don’t need to cancel out $a_i$. Just use pooled OLS?
No. There is a serial correlation problem.
Random Effect Estimator
Serial Correlation in the RE model
We have two components in the error term:
$$v_{it} = a_i + u_{it}$$
Suppose that $u_{it}$ is totally innocuous again:
$$Cov(a_i, u_{it}) = Cov(u_{it}, u_{is}) = 0 \quad \text{for } t \neq s.$$
Now, we calculate $Corr(v_{it}, v_{is})$ and show that it is not zero:
$$\begin{aligned} Var(v_{it}) &= Var(a_i + u_{it}) = \sigma_a^2 + \sigma_u^2 \\ Cov(v_{it}, v_{is}) &= E\big((a_i + u_{it})(a_i + u_{is})\big) \\ &= E(a_i^2 + a_i u_{is} + a_i u_{it} + u_{it} u_{is}) \\ &= E(a_i^2) = \sigma_a^2 \end{aligned}$$
Random Effect Estimator
Serial Correlation in the RE model
Therefore,
$$Corr(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} \neq 0$$
Any inference based on pooled OLS would be incorrect.
However, we know how to fix this problem. Do GLS!
We want to transform the original model into
$$\tilde y_{it} = \beta_0 + \beta_1 \tilde x_{it} + \tilde v_{it}$$
where $\tilde v_{it}$ is no longer serially correlated.
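A short numerical illustration of the correlation formula, as a Python sketch. The function name and the variance values are hypothetical.

```python
def composite_error_corr(sigma_a2: float, sigma_u2: float) -> float:
    """Corr(v_it, v_is) for t != s when v_it = a_i + u_it."""
    return sigma_a2 / (sigma_a2 + sigma_u2)


# The larger the variance of a_i relative to u_it, the stronger the serial correlation.
print(composite_error_corr(4.0, 1.0))  # prints 0.8
print(composite_error_corr(0.0, 1.0))  # prints 0.0
```

The correlation vanishes only when there is no individual effect at all, that is, when the variance of $a_i$ is zero.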
Random Effect Estimator
Under AR(1) serial correlation, we multiplied by ρ and took a difference. In this case, we multiply by
$$\lambda = 1 - \left[\frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2}\right]^{1/2}$$
and take a difference as
$$y_{it} - \lambda \bar y_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar x_i) + v_{it} - \lambda \bar v_i$$
We can show that $\tilde v_{it}$ $(= v_{it} - \lambda \bar v_i)$ is not serially correlated.
In practice, λ is unknown and is replaced by an estimate $\hat\lambda$.
This specific GLS estimator is called the Random Effect (RE) estimator.
Random Effect Estimator
The RE estimator is something between pooled OLS and the FE estimator. Note that the equation
$$y_{it} - \lambda \bar y_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar x_i) + v_{it} - \lambda \bar v_i$$
becomes pooled OLS when λ = 0 and the FE estimator when λ = 1.
In the RE model, λ is always between 0 and 1.
As T → ∞, the FE and RE estimators are equivalent since λ → 1.
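The behaviour of λ and of the quasi-demeaning step can be illustrated with a few lines of Python. This is a sketch with hypothetical variance components, treated as known here although in practice they must be estimated; the function names are illustrative.

```python
import math


def re_lambda(sigma_a2: float, sigma_u2: float, T: int) -> float:
    """lambda = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))


def quasi_demean(values, lam):
    """Apply v_it - lam * v_bar_i to one unit's time series."""
    v_bar = sum(values) / len(values)
    return [v - lam * v_bar for v in values]


print(re_lambda(0.0, 1.0, 3))             # prints 0.0: no individual effect, pooled OLS
print(re_lambda(1.0, 1.0, 3))             # prints 0.5
print(re_lambda(1.0, 1.0, 10_000) > 0.98) # prints True: close to the FE case

# One unit's series under the three cases.
series = [1.0, 2.0, 3.0]
print(quasi_demean(series, 0.0))  # unchanged data, as in pooled OLS
print(quasi_demean(series, 0.5))  # partial demeaning, as in RE
print(quasi_demean(series, 1.0))  # full demeaning, as in FE
```

Because only a fraction of the unit mean is removed, time-invariant regressors survive the RE transformation, unlike under full demeaning.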
Random Effect Estimator
RE vs. FE
If you believe that there is an obviously endogenous fixed factor, $a_i$, in your model, you should use the FE estimator.
Otherwise, the RE estimator will tell you more: coefficients on time-invariant regressors, efficiency, etc.
Keep in mind that the RE estimator is not even consistent if $Cov(x_{it}, a_i) \neq 0$.
We can test whether $Cov(x_{it}, a_i) = 0$ or not.
Random Effect Estimator
Hausman Test
The idea of the Hausman test is simple. The null and alternative hypotheses are
$$H_0: Cov(x_{it}, a_i) = 0$$
$$H_1: Cov(x_{it}, a_i) \neq 0$$
Under $H_0$, both RE and FE are consistent:
$$\hat\beta_{RE} \xrightarrow{p} \beta, \quad \hat\beta_{FE} \xrightarrow{p} \beta.$$
Thus, we can expect that $\hat\beta_{RE} \approx \hat\beta_{FE}$.
However, under $H_1$, only $\hat\beta_{FE}$ is consistent. Therefore, we reject $H_0$ if the difference between $\hat\beta_{RE}$ and $\hat\beta_{FE}$ is large enough.
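The slides do not write out the test statistic. As a hedged illustration, the Python sketch below uses the standard form for a single coefficient, $H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / (Var(\hat\beta_{FE}) - Var(\hat\beta_{RE}))$, which is compared with a chi-square distribution with one degree of freedom under $H_0$. The estimates and variances passed in are hypothetical, and the function name is illustrative.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper tail of a chi-square(1): P(Z^2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical FE and RE estimates with their estimated variances.
h, p = hausman_scalar(b_fe=1.50, b_re=1.20, var_fe=0.010, var_re=0.004)
print(round(h, 3), p < 0.05)
```

A small p-value means the two estimates differ by more than sampling noise would explain, which speaks against the RE assumption. With several regressors the statistic becomes a quadratic form in the vector of differences, which this scalar sketch does not cover.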
Empirical Application: Smoking on Birth Outcomes
“Infants born to women who smoke during pregnancy have a lower average birthweight... Low birthweight is associated with increased risk for neonatal, perinatal, and infant morbidity and mortality.”
(Women and Smoking: A Report of the Surgeon General, 2001, as quoted in Abrevaya (2006))
Empirical Application: Smoking on Birth Outcomes
The direct medical costs: According to the estimates of Lewit et al. (1995), low-birthweight (LBW) infants (less than 10% of births) account for more than 1/3 of health care costs during the first year of life.
The long-term costs:
“Hack et al. (1995) find that LBW babies have developmental problems in cognition, attention and neuromotor functioning that persist until adolescence.” (Abrevaya (2006))
Empirical Application: Smoking on Birth Outcomes
How to Estimate
The OLS estimates would be biased in the negative direction due to endogeneity.
IV estimation?
[Table: comparison between OLS and IV estimates, from Abrevaya (2006)]
Empirical Application: Smoking on Birth Outcomes
Fixed-effect (FE) estimation can be used if panel data are available.
Abrevaya (2006) constructed a pseudo panel data set and showed that the FE estimate is smaller in magnitude than the OLS estimate.
$$y_{ib} = x_{ib}' \beta + \gamma s_{ib} + c_i + u_{ib}$$
where i is the mother’s id and b is the birth order of the baby for mother i.
The estimates of γ by OLS and FE are −243.27 (3.20) and −144.04 (4.75), respectively.
Concluding Remarks
Pooled cross sections are very similar to a single cross section, but observations across different time points help evaluate the correct policy effect.
Extra information contained in panel data enables us to control for the individual fixed effect by FD and FE estimators.
If the fixed effect is not correlated with regressors, we can apply the RE estimator, which is a GLS estimator.
Panel data are not restricted to the individual-time structure.
Stata Commands
Load the data set, e.g. filename.dta:
    use filename.dta
First, we need to set an id variable and a time variable. Check the relevant variable names.
    xtset id time
Now type xtsum.
The command for the FE estimator is
    xtreg dep x1 x2 x3 ..., fe
The command for the RE estimator is
    xtreg dep x1 x2 x3 ..., re