VAR models
Fabio Canova
ICREA-UPF, AMeN and CEPR
March 2009
Outline
• Wold Theorem and VAR Specification.
• Coefficients and covariance matrix estimation.
• Computing impulse responses, variance and historical decompositions.
• Structural VARs.
• Interpretation problems.
References
Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ, ch. 10-11.
Canova, F. (1995), "VAR Models: Specification, Estimation, Inference and Forecasting", in H. Pesaran and M. Wickens, eds., Handbook of Applied Econometrics, ch. 2, Blackwell, Oxford, UK.
Canova, F. (1995), "The Economics of VAR Models", in K. Hoover, ed., Macroeconometrics: Tensions and Prospects, Kluwer Press, NY, NY.
Blanchard, O. and Quah, D. (1989), "The Dynamic Effect of Aggregate Demand and Supply Disturbances", American Economic Review, 79, 655-673.
Canova, F. and Pina, J. (2005), "Monetary Policy Misspecification in VAR Models", in C. Diebolt and C. Krystou, eds., New Trends in Macroeconomics, Springer Verlag.
Canova, F. and De Nicolo', G. (2002), "Money Matters for Business Cycle Fluctuations in the G7", Journal of Monetary Economics, 49, 1131-1159.
Cooley, T. and Dwyer, M. (1998), "Business Cycle Analysis without Much Theory: A Look at Structural VARs", Journal of Econometrics, 83, 57-88.
Faust, J. (1998), "On the Robustness of Identified VAR Conclusions about Money", Carnegie-Rochester Conference Series on Public Policy, 49, 207-244.
Faust, J. and Leeper, E. (1997), "Do Long Run Restrictions Really Identify Anything?", Journal of Business and Economic Statistics, 15, 345-353.
Hansen, L. and Sargent, T. (1991), "Two Difficulties in Interpreting Vector Autoregressions", in Hansen, L. and Sargent, T., eds., Rational Expectations Econometrics, Westview Press, Boulder and London.
Kilian, L. (1998), "Small Sample Confidence Intervals for Impulse Response Functions", Review of Economics and Statistics, 218-230.
Lippi, M. and Reichlin, L. (1993), "The Dynamic Effect of Aggregate Demand and Supply Disturbances: A Comment", American Economic Review, 83, 644-652.
Lippi, M. and Reichlin, L. (1994), "VAR Analysis, Non-Fundamental Representation, Blaschke Matrices", Journal of Econometrics, 63, 307-325.
Marcet, A. (1991), "Time Aggregation of Econometric Time Series", in Hansen, L. and Sargent, T., eds., Rational Expectations Econometrics, Westview Press, Boulder and London.
Sims, C. and Zha, T. (1999), "Error Bands for Impulse Responses", Econometrica, 67, 1113-1155.
Sims, C., Stock, J. and Watson, M. (1990), "Inference in Linear Time Series Models with Some Unit Roots", Econometrica, 58, 113-144.
Chari, V., Kehoe, P. and McGrattan, E. (2004), "A Critique of Structural VARs Using Business Cycle Theory", Federal Reserve Bank of Minneapolis, working paper 631.
Fernandez Villaverde, J., Rubio Ramirez, J., Sargent, T. and Watson, M. (2007), "The ABC (and D's) to Understand VARs", American Economic Review.
Uhlig, H. (2005), "What Are the Effects of Monetary Policy? Results from an Agnostic Identification Procedure", Journal of Monetary Economics.
Erceg, C., Guerrieri, L. and Gust, C. (2005), "Can Long Run Restrictions Identify Technology Shocks?", Journal of the European Economic Association.
Giordani, P. (2004), "An Alternative Explanation of the Price Puzzle", Journal of Monetary Economics, 51, 1271-1296.
Dedola, L. and Neri, S. (2007), "What Does a Technology Shock Do? A VAR Analysis with Model-Based Sign Restrictions", Journal of Monetary Economics.
1 Preliminary
• Lag operator: ℓ y_t = y_{t-1}; ℓ^i y_t = y_{t-i}, where y_t is an m × 1 vector.
• Matrix lag operator (normalization a_0 = I):
a(ℓ) y_t ≡ a_0 y_t + a_1 ℓ y_t + a_2 ℓ² y_t + ... + a_q ℓ^q y_t
         = y_t + a_1 y_{t-1} + a_2 y_{t-2} + ... + a_q y_{t-q}   (1)

Example 1 y_t = e_t + d_1 e_{t-1} + d_2 e_{t-2}. Using the lag operator, y_t = e_t + d_1 ℓ e_t + d_2 ℓ² e_t, or y_t = (1 + d_1 ℓ + d_2 ℓ²) e_t ≡ d(ℓ) e_t.

Example 2 y_t = a_1 y_{t-1} + e_t. Using the lag operator, y_t = a_1 ℓ y_t + e_t, or (1 − a_1 ℓ) y_t = e_t, or a(ℓ) y_t = e_t.
2 What are VARs?
- They are multivariate autoregressive linear time series models of the form
y_t = A_1 y_{t-1} + A_2 y_{t-2} + ... + A_q y_{t-q} + e_t,   e_t ∼ (0, Σ_e)   (2)
where y_t is an m × 1 vector and the A_j are m × m matrices, each j = 1, ..., q.
Advantages:
- Every variable is interdependent and endogenous.
- Any y_t has an autoregressive representation under some conditions.
- Simple to use and estimate.
Disadvantages:
- A VAR is a reduced form model; no economic interpretation is possible.
- Potentially difficult to relate VAR dynamics with DSGE dynamics.
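A minimal simulation sketch of (2): a bivariate VAR(2) with hypothetical coefficient matrices A_1, A_2 and covariance Σ_e (the numbers below are illustrative, not from the notes).

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, T = 2, 2, 200
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # hypothetical lag-1 coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])   # hypothetical lag-2 coefficients
Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])

y = np.zeros((T, m))
e = rng.multivariate_normal(np.zeros(m), Sigma_e, size=T)
for t in range(q, T):                      # equation (2), one step at a time
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + e[t]
```

Every variable depends on lags of every other variable, which is the "interdependent and endogenous" point above.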
3 Wold theorem and the news
Wold Theorem: Under linearity and stationarity, any vector of time series y†_t can be written as y†_t = a y_{-∞} + Σ_{j=0}^∞ D_j e_{t-j}, where y_{-∞} contains constants, e_{t-j} are the news at t − j, the D_j are m × m matrices, each j, and a is an m × k matrix of coefficients.
- Let y_t ≡ y†_t − a y_{-∞}. The Wold theorem tells us that, apart from initial conditions, time series are the accumulation over time of news.
- A news e_t = 1 at time t has effect D_0 on y_t, D_1 on y_{t+1}, D_2 on y_{t+2}, etc. Hence y_t is a moving average (MA) of the news, i.e. y_t = D(ℓ) e_t.
Two issues
a) If F_{t-1} is the information available at t − 1, the news are
e_t = y_t − E[y_t|F_{t-1}]   (3)
• The news are unpredictable given the past (E(e_t|F_{t-1}) = 0), but contemporaneously correlated (e_t ∼ (0, Σ_e)).
To give a name to the news in each equation, we need to find a matrix P̃ such that P̃ P̃′ = Σ_e. Then:
y_t = D(ℓ) P̃ P̃^{-1} e_t = D̃(ℓ) ẽ_t,   ẽ_t ∼ (0, P̃^{-1} Σ_e P̃^{-1}′ = I)   (4)
Examples of P̃: the Choleski (lower triangular) factor; P̃ = PΛ^{0.5}, where P is the eigenvector matrix and Λ the eigenvalue matrix; etc.
Example 3 If Σ_e = [1 4; 4 25], its Choleski factor is P̃ = [1 0; 4 3], so that P̃^{-1} e_t ∼ (0, I).

b) The news are not uniquely defined.
In fact, for any H such that HH′ = I,
y_t = D(ℓ) e_t = D(ℓ) H H′ e_t = D̃(ℓ) ẽ_t   (5)
and E(e_t e′_t) = E(ẽ_t ẽ′_t).
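Example 3 and the two factorizations of Σ_e mentioned above can be checked numerically (a small sketch; numpy's `cholesky` returns the lower triangular factor):

```python
import numpy as np

Sigma_e = np.array([[1.0, 4.0], [4.0, 25.0]])

# Choleski factor: lower triangular P with P P' = Sigma_e
P = np.linalg.cholesky(Sigma_e)            # [[1, 0], [4, 3]], as in Example 3

# eigenvalue-eigenvector alternative: P2 = V Lambda^{0.5}
w, V = np.linalg.eigh(Sigma_e)
P2 = V @ np.diag(np.sqrt(w))               # also satisfies P2 P2' = Sigma_e
```

Both factors reproduce Σ_e, illustrating point b): the orthogonalization is not unique.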
• Standard packages choose the "fundamental" news representation, i.e. the one for which D_0 is the largest among all the D_j coefficients.
• Some economic models imply non-fundamental representations (e.g. models where news are anticipated); see later on.
VARs
• If the D_j coefficients decay to zero fast enough, D(ℓ) is invertible and
y_t = D(ℓ) e_t
D(ℓ)^{-1} y_t = e_t
y_t = A(ℓ) y_{t-1} + e_t   (6)
where I − A(ℓ)ℓ = D(ℓ)^{-1}.
• A VAR(∞) can represent any vector of time series y_t under linearity, stationarity and invertibility.
• A VAR(q), q fixed, approximates y_t well if the D_j are close to zero for j large.
Summary
- We can represent any data with a linear VAR(∞) under the assumptions made.
- With a finite sample of data, we need to carefully check the lag length of the VAR (news can't be predictable).
- If we want a constant coefficient representation, we need stationarity of y_t.
4 Specification
Many ways of choosing the lag length:
A) Likelihood ratio (LR) test
LR = 2[ln L(α^un, Σ^un_e) − ln L(α^re, Σ^re_e)]   (7)
   = T (ln |Σ^re_e| − ln |Σ^un_e|) →_D χ²(ν)   (8)
where L is the likelihood function, "un" ("re") denotes the unrestricted (restricted) estimator, and ν = number of restrictions of the form R(α) = 0.
• The LR test is biased in small samples. If T is small, use
LR_c = (T − qm)(ln |Σ^re_e| − ln |Σ^un_e|)
where q = number of lags, m = number of variables.
• Sequential testing approach
1) Choose an upper bound q̄.
2) Test VAR(q̄ − 1) against VAR(q̄); if not rejected,
3) Test VAR(q̄ − 2) against VAR(q̄ − 1).
4) Continue until rejection.
The LR test is an in-sample criterion. What if we are interested in out-of-sample forecasting exercises?
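The sequential testing approach with the small-sample correction LR_c can be sketched as follows (the bivariate VAR(1) DGP is an illustrative assumption; 9.49 is the 5% critical value of χ²(m² = 4), the number of coefficients dropped per lag):

```python
import numpy as np

def var_sigma(y, q, qbar):
    """OLS residual covariance of a VAR(q), conditioning on qbar initial
    observations so every lag length is fit on the same effective sample."""
    T, m = y.shape
    Y = y[qbar:]
    X = np.hstack([y[qbar - j:T - j] for j in range(1, q + 1)])
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    E = Y - X @ B
    return E.T @ E / len(Y)

# illustrative DGP: a bivariate VAR(1), so the true lag length is 1
rng = np.random.default_rng(1)
y = np.zeros((300, 2))
A1 = np.array([[0.6, 0.1], [0.0, 0.5]])
for t in range(1, 300):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)

m, qbar = 2, 4
Teff = len(y) - qbar
chosen = qbar
for q in range(qbar, 1, -1):              # test VAR(q-1) against VAR(q)
    S_un = var_sigma(y, q, qbar)
    S_re = var_sigma(y, q - 1, qbar)
    LRc = (Teff - q * m) * (np.log(np.linalg.det(S_re)) - np.log(np.linalg.det(S_un)))
    if LRc > 9.49:                        # reject: stop at lag length q
        break
    chosen = q - 1
```

With a true VAR(1) the procedure typically walks down to a small q; a 5% test still rejects spuriously on occasion.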
Let Σ_y(1) = ((T + mq)/T) Σ_e.
B) AIC criterion: min_q AIC(q) = ln |Σ_y(1)|(q) + 2qm²/T.
• AIC is inconsistent. It overestimates the true order q with positive probability.
C) HQC criterion: min_q HQC(q) = ln |Σ_y(1)|(q) + 2qm² (ln ln T)/T.
• HQC is consistent (in probability).
D) SWC criterion: min_q SWC(q) = ln |Σ_y(1)|(q) + qm² (ln T)/T.
• SWC is strongly consistent (a.s.).
• Criteria B)-D) trade off the fit of the model (the size of Σ_e) against the number of parameters of the model, mq, for a given sample size T. Hence criteria B)-D) prefer smaller to larger scale models.

Criterion   T=40             T=80             T=120             T=200
            q=2  q=4  q=6    q=2  q=4  q=6    q=2   q=4   q=6   q=2  q=4  q=6
AIC         1.6  3.2  4.8    0.8  1.6  2.4    0.53  1.06  1.6   0.32 0.64 0.96
HQC         2.09 4.17 6.26   1.18 2.36 3.54   0.83  1.67  2.50  0.53 1.06 1.6
SWC         2.95 5.9  8.85   1.75 3.5  5.25   1.27  2.55  3.83  0.84 1.69 2.52
Table 1: Penalties of AIC, HQC, SWC, m=4

- Penalties increase with q and fall with T. The penalty of SWC is the harshest.
- Ivanov and Kilian (2006): the quality of B)-D) depends on the frequency of the data and on the DGP. Typically HQC is more appropriate.
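The penalty terms in Table 1 follow directly from the formulas for the three criteria; a quick check with m = 4, as in the table:

```python
import numpy as np

m = 4
def pen_aic(q, T): return 2 * q * m**2 / T
def pen_hqc(q, T): return 2 * q * m**2 * np.log(np.log(T)) / T
def pen_swc(q, T): return q * m**2 * np.log(T) / T

# a few entries of Table 1 (minor differences are rounding)
checks = [
    (pen_aic(2, 40), 1.60),
    (pen_hqc(2, 40), 2.09),
    (pen_swc(2, 40), 2.95),
    (pen_aic(6, 200), 0.96),
]
```

The penalties grow linearly in q and shrink in T, which is the trade-off described above.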
• Criteria A)-D) must be applied to the system, not to single equations.

Example 4 VAR for the Euro area, 1980:1-1999:4; use output, prices, interest rates and M3; set q̄ = 7.

Hypothesis      LR            LR_c           q   AIC         HQC         SWC
q=6 vs. q=7     2.9314e-5(*)  0.0447         7   -7.556      -6.335      -4.482
q=5 vs. q=6     3.6400e-4     0.1171         6   -7.413      -6.394      -4.851
q=4 vs. q=5     0.0509        0.5833         5   -7.494      -6.675      -5.437
q=3 vs. q=4     0.0182        0.4374         4   -7.522      -6.905      -5.972
q=2 vs. q=3     0.0919        0.6770         3   -7.635(*)   -7.219(*)   -6.591
q=1 vs. q=2     3.0242e-7     6.8182e-3(*)   2   -7.226      -7.012      -6.689(*)
Table 2: Tests for the lag length of a VAR

• Different criteria choose different lag lengths.
Checking stationarity
- All variables stationary / all unit roots → easy.
- Some cointegration: transform the VAR into a VECM.
• Impose the cointegration restrictions.
• Disregard the cointegration restrictions.
- Data are stationary, but we can't see it because of small samples.
- If Bayesian: the stationarity/nonstationarity issue does not matter for inference.
Checking for breaks
Wald test: y_t = (A_1(ℓ) I_1) y_{t-1} + (A_2(ℓ) I_2) y_{t-1} + e_t
I_1 = 0 for t ≤ t_1; I_1 = 1 for t > t_1; and I_2 = 1 − I_1.
Use S(t_1, T) = T (ln |Σ^re_e| − ln |Σ^un_e|) →_D χ²(ν); ν = dim(A_1(ℓ)) (Andrews and Ploberger (1994)).
If t_1 is unknown but belongs to [t_l, t_u], compute S(t_1, T) for all the t_1 in the interval. Check for breaks using max_{t_1} S(t_1, T).
5 Alternative representations of a VAR(q)
Consider
y_t = A(ℓ) y_{t-1} + e_t   (9)
with y_t, e_t m × 1 vectors; e_t ∼ (0, Σ_e).
Different representations are useful for different purposes.
- The companion form is useful for computing moments and ML estimators.
- The simultaneous equations form is useful for evaluating the likelihood and computing restricted estimates.
5.1 Companion form
• Transform an m-variable VAR(q) into an mq-variable VAR(1).

Example 5 Consider a VAR(3). Let Y_t = [y_t, y_{t-1}, y_{t-2}]′; E_t = [e_t, 0, 0]′; and

A = [ A_1  A_2  A_3 ;  I_m  0  0 ;  0  I_m  0 ],   Σ_E = [ Σ_e  0  0 ;  0  0  0 ;  0  0  0 ]

Then the VAR(3) can be rewritten as
Y_t = A Y_{t-1} + E_t,   E_t ∼ N(0, Σ_E)   (10)
where Y_t, E_t are 3m × 1 vectors and A is 3m × 3m.
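The companion-form construction of Example 5 can be sketched as follows (the coefficient matrices are hypothetical; the eigenvalue check at the end is the standard stationarity condition on the companion matrix):

```python
import numpy as np

def companion(A_list, Sigma_e):
    """Companion matrix A and error covariance Sigma_E for a VAR(q)."""
    q = len(A_list)
    m = A_list[0].shape[0]
    A = np.zeros((m * q, m * q))
    A[:m, :] = np.hstack(A_list)           # top block row: [A_1 ... A_q]
    A[m:, :-m] = np.eye(m * (q - 1))       # identity blocks below the diagonal
    S = np.zeros((m * q, m * q))
    S[:m, :m] = Sigma_e                    # only the first block is stochastic
    return A, S

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # hypothetical coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
A3 = np.array([[0.1, 0.0], [0.0, 0.1]])
A, S = companion([A1, A2, A3], np.eye(2))
```

Moments of the VAR(q) can then be computed from the VAR(1) formulas applied to (A, Σ_E).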
5.2 Simultaneous equations setup (SES)
There are two alternative representations:
1) Let x_t = [y_{t-1}, y_{t-2}, ...]; X = [x_1, ..., x_T]′ (a T × mq matrix); Y = [y_1, ..., y_T]′ (a T × m matrix); then, if A = [A′_1, ..., A′_q]′ is an mq × m matrix,
Y = XA + E   (11)
2) Let i indicate the subscript for the i-th column vector. The equation for variable i is y_i = x α_i + e_i. Stacking the columns of y_i, e_i into mT × 1 vectors we have
y = (I_m ⊗ x) α + e ≡ X α + e   (12)
6 Parameters and covariance matrix estimation
6.1 Unrestricted VAR(q)
Assume that y_{-q+1}, ..., y_0 are known and e_t ∼ N(0, Σ_e); then
y_t | (y_{t-1}, ..., y_0, y_{-1}, ..., y_{-q+1}) ∼ N(A(ℓ) y_{t-1}, Σ_e)   (13)
                                               ∼ N(A′_1 Y_{t-1}, Σ_e)   (14)
where A′_1 is the first row of A (m × mq). Let α = vec(A_1).
Since f(y_t | y_{t-1}, ..., y_{-q+1}) = Π_j f(y_j | y_{j-1}, ..., y_{-q+1}),

ln L(α, Σ_e) = Σ_j ln L(y_j | y_{j-1}, ..., y_{-q+1})
             = −(Tm/2) ln(2π) + (T/2) ln |Σ_e^{-1}| − (1/2) Σ_t (y_t − A′_1 Y_{t-1})′ Σ_e^{-1} (y_t − A′_1 Y_{t-1})   (15)

Setting ∂ ln L(α, Σ_e)/∂α = 0 we have

A′_{1,ML} = [Σ_{t=1}^T Y_{t-1} Y′_{t-1}]^{-1} [Σ_{t=1}^T Y_{t-1} y′_t] = A′_{1,OLS}

and the j-th column (a 1 × mq vector) is

A′_{1j,ML} = [Σ_{t=1}^T Y_{t-1} Y′_{t-1}]^{-1} [Σ_{t=1}^T Y_{t-1} y_{jt}] = A′_{1j,OLS}   (16)
Why is OLS equivalent to maximum likelihood?
- Because, if the initial conditions are known, maximizing the log-likelihood is equivalent to minimizing the sum of squared errors!
Why is single equation OLS the same as full information maximum likelihood?
- Because we have the same regressors in every equation!
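The "same regressors in every equation" point can be checked numerically: running OLS equation by equation or all equations at once gives identical coefficients (the VAR(1) DGP below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, m = 150, 2
A1 = np.array([[0.6, 0.2], [0.1, 0.3]])    # hypothetical DGP
y = np.zeros((T, m))
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(m)

X = y[:-1]                                  # same regressor matrix in every equation
Y = y[1:]

# system OLS: all m equations at once
B_system = np.linalg.lstsq(X, Y, rcond=None)[0]

# single-equation OLS, one column of Y at a time
B_single = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
```

Because X is identical across equations, the two estimators coincide exactly, not just asymptotically.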
Plugging A_{1,ML} into ln L(α, Σ_e), we obtain the concentrated likelihood

ln L(Σ_e) = −(Tm/2) ln(2π) + (T/2) ln |Σ_e^{-1}| − (1/2) Σ_{t=1}^T e′_{t,ML} Σ_e^{-1} e_{t,ML}   (17)

where e_{t,ML} = y_t − A_{1,ML} Y_{t-1}. Using ∂(b′Qb)/∂Q = bb′ and ∂ ln |Q|/∂Q = (Q′)^{-1}, we have

∂ ln L(Σ_e)/∂Σ_e^{-1} = (T/2) Σ⁰_e − (1/2) Σ_{t=1}^T e_{t,ML} e′_{t,ML} = 0, or

Σ⁰_ML = (1/T) Σ_{t=1}^T e_{t,ML} e′_{t,ML}   (18)

and σ_{i,i′} = (1/T) Σ_{t=1}^T e_{it,ML} e_{i′t,ML}.

Σ⁰_ML ≠ Σ⁰_OLS = (1/(T−1)) Σ_{t=1}^T e_{t,ML} e′_{t,ML}, but the two are equivalent for large T.
6.2 VAR(q) with restrictions
Assume the restrictions are of the form α = Rθ + r, where R is an mk × k_1 matrix of rank k_1; r is an mk × 1 vector; θ a k_1 × 1 vector.

Example 6 i) Lag restrictions: A_q = 0. Here k_1 = m²(q − 1), r = 0, and R = [I_{k_1}, 0]′.
ii) Block exogeneity of y_2t in a bivariate VAR(2). Here R = blockdiag[R_1, R_2], where R_i, i = 1, 2, is upper triangular.
iii) Cointegration restrictions.
Plugging the restrictions into (12) we have
y = (I_m ⊗ x) α + e = (I_m ⊗ x)(Rθ + r) + e
Let y† ≡ y − (I ⊗ x) r = (I ⊗ x) R θ + e. Since ∂ ln L/∂θ = R′ ∂ ln L/∂α:

θ_ML = [R′(Σ_e^{-1} ⊗ x′x) R]^{-1} R′(Σ_e^{-1} ⊗ x′) y†   (19)
α_ML = R θ_ML + r   (20)
Σ⁰_e = (1/T) Σ_t e_ML e′_ML   (21)
Summary
• For a VAR(q) without restrictions:
- ML and OLS estimators of A_1 coincide.
- OLS estimation of A_1, equation by equation, is consistent and efficient (if the assumptions are correct).
- OLS and ML estimators of Σ_e asymptotically coincide for large T.
• For a VAR(q) with restrictions:
- The ML estimator of A_1 is different from the OLS estimator.
- ML is consistent/efficient if the restrictions are true. It is inconsistent if the restrictions are false.
In general:
- OLS is consistent if the stationarity assumption is wrong (t-tests incorrect).
- OLS is inconsistent if the lag length is wrong (regressors correlated with the error term).
7 Summarizing the results
It is unusual to report estimates of VAR coefficients, standard errors and R².
- Most VAR coefficients are insignificant.
- R² always exceeds 0.99.
How do we summarize the results in an informative way?
7.1 Impulse responses (IR)
• What is the effect of a surprise cut in interest rates on inflation?
• Impulse responses trace out the MA representation of y_t.
Three ways to calculate impulse responses:
- Recursive approach.
- Non-recursive approach.
- Forecast revisions.
• Recursive method.
Assume we have estimates of the A_j. Then D_τ = [D^{i,i′}_τ] = Σ_{j=1}^{min[τ,q]} D_{τ-j} A_j, where τ refers to the horizon, D_0 = I, and A_j = 0 ∀ j > q.

Example 7 Suppose y_t = ȳ + A_1 y_{t-1} + A_2 y_{t-2} + e_t. Then applying the formula we have D_0 = I, D_1 = D_0 A_1, D_2 = D_1 A_1 + D_0 A_2, ..., D_k = D_{k-1} A_1 + D_{k-2} A_2 + ... + D_{k-q} A_q.

For orthogonal news: if P̃_e P̃′_e = Σ_e, then D̃_k = D_k P̃_e.
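The recursion of Example 7, plus the Choleski orthogonalization D̃_k = D_k P̃, can be sketched in a few lines (coefficient matrices hypothetical):

```python
import numpy as np

def irf(A_list, Sigma_e, horizon):
    """MA matrices D_k via D_k = D_{k-1} A_1 + ... + D_{k-q} A_q, D_0 = I,
    and their orthogonalized (Choleski) counterparts D_k @ P."""
    m, q = A_list[0].shape[0], len(A_list)
    D = [np.eye(m)]
    for k in range(1, horizon + 1):
        D.append(sum(D[k - j] @ A_list[j - 1] for j in range(1, min(k, q) + 1)))
    P = np.linalg.cholesky(Sigma_e)
    return D, [Dk @ P for Dk in D]

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])    # hypothetical VAR(2) coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])
D, Dtil = irf([A1, A2], Sigma_e, 10)
```

Column j of D̃_k is the response of the system at horizon k to orthogonal news j of size one.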
Sometimes it is useful to calculate multipliers of the news.
• Long run multiplier: D(1) = (I − A_1 − ... − A_q)^{-1}.
• Partial multipliers, up to horizon τ: (I − Σ_{j=1}^τ A_j)^{-1}.
7.2 Variance decomposition: τ-steps ahead forecast error
• How much of the variance of, say, output is due to supply shocks?
Uses:
y_{t+τ} − y_t(τ) = Σ_{j=0}^{τ-1} D̃_j ẽ_{t+τ-j},   D_0 = I   (22)
where y_t(τ) is the τ-steps ahead prediction of y_t.
- Computes the share of the variance of y_{i,t+τ} − y_{i,t}(τ) due to each ẽ_{i′,t+τ-j}, i, i′ = 1, 2, ..., m.
7.3 Historical decomposition
• What is the contribution of supply shocks to the productivity revival of the late 1990s?
Let ŷ_{i,t}(τ) = y_{i,t+τ} − y_{i,t}(τ) be the τ-steps ahead forecast error in the i-th variable of the VAR. Then:
ŷ_{i,t}(τ) = Σ_{i′=1}^m D̃_{i′}(ℓ) ẽ_{i′,t+τ}   (23)
- Computes the path of ŷ_{i,t}(τ) due to each ẽ_{i′}.
• The same ingredients are needed to compute impulse responses, the variance and the historical decompositions. Different packaging!!
Example 8 US data for (Y, π, R, M1) for 1973:1-1993:12. Orthogonalize using a Choleski decomposition. What is the effect of a money shock?

[Figure: responses of gnp, prices, interest and money to a shock in money, horizons 0-20.]
What is the contribution of the various shocks to var(Y) and var(π)?

                         Y                               π
Horizon   Shock1  Shock2  Shock3  Shock4   Shock1  Shock2  Shock3  Shock4
4         0.99    0.001   0.003   0.001    0.07    0.86    0.01    0.03
12        0.93    0.01    0.039   0.02     0.24    0.60    0.08    0.07
24        0.79    0.01    0.15    0.04     0.52    0.36    0.07    0.04
Table 3: Variance decomposition, percentages
Historical decomposition of GDP, conditional on 1989 information.

[Figure: historical decomposition of gnp, 1975-1978; four panels (shocks in gnp, prices, interest, money), each plotting the variable, the baseline, and the baseline plus shocks.]
8 Identification: Obtaining SVARs
8.1 Why structural VARs?
VARs are reduced form models. Therefore:
• Shocks are linear combinations of meaningful economic disturbances.
• It is difficult to relate responses computed from VARs with responses of theoretical models.
• They can't be used for policy analyses (Lucas critique).
What is a SVAR? It is a linear dynamic structural model of the form:
A_0 y_t = C_1 y_{t-1} + ... + C_q y_{t-q} + ε_t,   ε_t ∼ (0, Σ_ε)   (24)
Its reduced form is:
y_t = A_1 y_{t-1} + ... + A_q y_{t-q} + e_t,   e_t ∼ (0, Σ_e)   (25)
where A_j = A_0^{-1} C_j and e_t = A_0^{-1} ε_t.
We want to go from (25) to (24), since (25) is easy to estimate (just use OLS equation by equation). To do this, we need A_0. But to estimate it, we need restrictions, since A_j, Σ_e have fewer free parameters than A_0, C_j, Σ_ε.
Distinguish: stationary vs. nonstationary VARs.
8.2 Stationary VARs
VAR:  y_t = A(ℓ) y_{t-1} + e_t,   e_t ∼ (0, Σ_e)   (26)
SVAR: A_0 y_t = C(ℓ) y_{t-1} + ε_t,   ε_t ∼ (0, Σ_ε = diag{σ_i})   (27)
Log-linearized DSGE models are stationary SVARs! We know
y_2t = A_22 y_{2t-1} + A_21 y_3t   (28)
y_1t = A_11 y_{2t-1} + A_12 y_3t   (29)
where y_2t are the states, y_1t the controls, y_3t the shocks. So

A_0 = [ A_21  0 ;  0  A_12 ]^{-1},   C(ℓ) = [ A_21  0 ;  0  A_12 ]^{-1} [ A_22  0 ;  A_11  0 ]
(26) and (27) imply
A_0 e_t = ε_t   (30)
so that
A_0^{-1} Σ_ε A_0^{-1}′ = Σ_e   (31)
To recover the structural parameters from (31) we need at least as many equations as unknowns.
• Order condition: if there are m variables, we need m(m − 1)/2 restrictions. This is because there are m² free parameters on the left hand side of (31) and only m(m + 1)/2 parameters in Σ_e (m² = m(m + 1)/2 + m(m − 1)/2).
• Rank condition: the rank of A_0^{-1} Σ_ε A_0^{-1}′ must equal the rank of Σ_e.
• Just identified vs. overidentified.
Example 9 i) The Choleski decomposition of Σ_e has exactly m(m − 1)/2 zero restrictions. Implications:
- A_0^{-1} is lower triangular.
- Variable i does not affect variable i − 1 simultaneously, but it affects variable i + 1.
ii) y_t = [GDP_t, P_t, i_t, M_t]. Then we need 6 restrictions, e.g.

A_0 = [ 1     0     0     0    ;
        α_01  1     0     α_02 ;
        0     0     1     α_03 ;
        α_04  α_05  α_06  1    ]
How do you estimate a SVAR? Use a two-step approach:
- Get (unrestricted) estimates of A(ℓ) and Σ_e.
- Use the restrictions on A_0 to estimate Σ_ε and the free parameters of A_0.
- Use the relation A(ℓ) = A_0^{-1} C(ℓ), where C(ℓ) collects the structural lag matrices, to trace out the structural dynamics.
Unless the system is in Choleski format, we need ML to estimate A_0 in just identified systems (see appendix).
For over-identified systems, we always need ML to estimate A_0.
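The two-step approach under a Choleski scheme can be sketched as follows (the VAR(1) DGP is an illustrative assumption; with Σ_ε = I, A_0^{-1} is the lower triangular factor of Σ_e, as in Example 9):

```python
import numpy as np

# step 1: unrestricted OLS estimates of A(l) and Sigma_e
rng = np.random.default_rng(3)
T, m = 400, 2
A1_true = np.array([[0.6, 0.1], [0.0, 0.5]])   # hypothetical DGP
y = np.zeros((T, m))
for t in range(1, T):
    y[t] = A1_true @ y[t - 1] + rng.standard_normal(m)

X, Y = y[:-1], y[1:]
A1_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
E = Y - X @ A1_hat.T
Sigma_e = E.T @ E / len(Y)

# step 2: under a Choleski (recursive) scheme with Sigma_eps = I,
# A_0^{-1} is the lower triangular factor, so A_0^{-1} A_0^{-1}' = Sigma_e
A0_inv = np.linalg.cholesky(Sigma_e)

# step 3: structural impact responses are the columns of A_0^{-1};
# later horizons follow from the MA recursion applied to A1_hat
```

This is the case where no ML step is needed, since the Choleski factor solves (31) exactly.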
Example 10 (Blanchard and Perotti, 2002) VAR with T_t, g_t, y_t. Assume A_0 e_t = B ε_t, where

A_0 = [ 1     0     a_01 ;        B = [ 1    b_1  0 ;
        0     1     a_02 ;              b_2  1    0 ;
        a_03  a_04  1    ]              0    0    1 ]

Impose that there is no discretionary response of T_t and g_t to y_t within the quarter (information delay).
6 + 3 (variance) parameters; at most 6 parameters in Σ_e. Need additional restrictions. Get information about a_01, a_02 from external sources; impose either b_1 = 0 or b_2 = 0.
With a_01, a_02 fixed, the two-step approach has an IV interpretation: ε_1t, ε_2t are used as instruments in the third equation.
8.3 Nonstationary VARs
Let the VAR and SVAR be:
Δy_t = D(ℓ) e_t = D(1) e_t + D*(ℓ) Δe_t   (32)
Δy_t = D(ℓ) A_0 ε_t = D(1) A_0 ε_t + D*(ℓ) A_0 Δε_t   (33)
where D(ℓ) = (I − A(ℓ)ℓ)^{-1}, D*(ℓ) ≡ (D(ℓ) − D(1))/(1 − ℓ), and here A_0 maps the structural news into the reduced form ones, e_t = A_0 ε_t. Matching coefficients: D(ℓ) A_0 ε_t = D(ℓ) e_t.
Separating permanent and transitory components, and using for the latter only contemporaneous restrictions, we have
D(1) A_0 ε_t = D(1) e_t   (34)
A_0 Δε_t = Δe_t   (35)
If y_t is stationary, D(1) = 0 and (34) is vacuous.
Two types of restrictions to estimate A_0: short and long run.

Example 11 In a VAR(2), imposing (34) requires one restriction. Suppose that D(1)_12 = 0 (ε_2t has no long run effect on y_1t). If Σ_ε = I, the three elements of D(1) A_0 Σ_ε A′_0 D(1)′ can be obtained from the Choleski factor of D(1) Σ_e D(1)′.

• Blanchard-Quah: decomposition into permanent-transitory components (use (34)-(35)). If y_t = [Δy_1t, y_2t]′ (m × 1); y_1t is I(1); y_2t is I(0); and y_t = ȳ + D(ℓ) ε_t, where ε_t ∼ iid(0, Σ_ε):

[ Δy_1t ]   [ ȳ_1 ]   [ D_1(1) ]       [ (1 − ℓ) D†_1(ℓ) ]
[ Δy_2t ] = [  0  ] + [   0    ] ε_t + [ (1 − ℓ) D†_2(ℓ) ] ε_t   (36)

and D_1(1) = [1, 0].
- y_2t could be any variable which is stationary and is influenced by both shocks.
- Choleski systems

Example 12 Problems with standard identification:
p_t = a_11 e^s_t   (37)
y_t = a_21 e^s_t + a_22 e^d_t   (38)
Price is set in advance of knowing demand shocks. Choleski ordering with p first.
This is equivalent to estimating p on lagged p and lagged y (this gives e^s_t), and then estimating y on lagged y and on current and lagged p (this gives e^d_t).
y_t = a_11 e^s_t   (39)
p_t = a_21 e^s_t + a_22 e^d_t   (40)
Quantity is set in advance of knowing demand shocks. Choleski ordering with y first.
This is equivalent to estimating y on lagged y and lagged p (this gives e^s_t), and then estimating p on lagged p and on current and lagged y (this gives e^d_t).
In general, without a structural model in mind it is difficult to interpret Choleski systems. Cooley-LeRoy (1985): unless some strong restrictions are imposed, dynamic models do not have a Choleski structure.
- Long run restrictions: Faust-Leeper (1997).

[Figure: long run restrictions, impulse responses over horizons 5-20; left panel: restriction not satisfied; right panel: restriction satisfied.]
- Long run restrictions, Cooley-Dwyer (1998): take an RBC model driven by a unit root technology shock. Simulate data. Run a VAR with (y_t, n_t) and identify two shocks (permanent/transitory). It is possible to do this. Transitory shocks explain a large portion of the variance of y_t.
- Long run restrictions, Erceg et al. (2005): long run restrictions perform poorly in small samples. Chari et al. (2006): potentially important truncation bias due to a VAR(q), q finite.
- Short run restrictions (Canova-Pina (2005)).
The DGP is a 3-equation New Keynesian model.

[Figure: true responses vs. inertial responses.]
Summary
- It is problematic to relate SVARs identified with Choleski, short run or long run restrictions to theories.
- Solution: link SVARs more closely to theory; use restrictions which are more common in DSGE models.
8.4 Alternative identification schemes
Canova-De Nicolo' (2002), Faust (1998), Uhlig (1999): use sign (and shape) restrictions.

Example 13 i) Aggregate supply shocks: Y↑, Inf↓; aggregate demand shocks: Y↑, Inf↑ → demand and supply shocks impose different sign restrictions on cov(Y_t, INF_s). These restrictions are shared by a large class of models with different foundations. Use them for identification.
ii) Monetary shocks: the response of Y is hump-shaped and dies out in 3-4 quarters → shape restrictions on cov(Y_t, i_s). Use them for identification.
Exploit the non-uniqueness of the news.
- Given any set of orthogonal news, check if the responses of y_it to the shocks ε_jt have the right sign. If not:
- Construct another set of news and repeat the exercise.
- Stop when you find an ε_jt with the right characteristics, or
- Take all the representations satisfying the restrictions and compute the mean/median (and s.e.) of the statistics of interest across them.
Implementation of sign restrictions (Canova-De Nicolo' (2002)):
• Orthogonalize Σ_e = P̃ P̃′ (e.g. Choleski or eigenvalue-eigenvector decomposition).
• Check if any shock produces the required correlation pattern for (y_it, y_i′t). If not:
• For any H : HH′ = I, Σ_e = P̃ H H′ P̃′ = P̂ P̂′.
• Check if any shock under the new decomposition produces the required correlation pattern for (y_it, y_i′t). If not, choose another H, etc.
• The number of H's is infinite. Write H = H(ω), ω ∈ (0, 2π). The H(ω) are called rotation (Givens) matrices.

Example 14 Suppose m = 2. Then H(ω) = [ cos(ω)  −sin(ω) ;  sin(ω)  cos(ω) ] or H(ω) = [ cos(ω)  sin(ω) ;  sin(ω)  −cos(ω) ]. Varying ω, we trace out all possible structural MA representations that could have generated the data.
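A small check that any rotation H(ω) of Example 14 delivers another valid factorization of Σ_e (the Σ_e below is hypothetical):

```python
import numpy as np

def givens(omega):
    """2x2 rotation matrix H(omega) with H H' = I."""
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega),  np.cos(omega)]])

Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])   # hypothetical reduced-form covariance
P = np.linalg.cholesky(Sigma_e)
Phat = P @ givens(1.1)                          # a different orthogonalization
```

`Phat` differs from `P` but factors the same Σ_e, so the data alone cannot tell the two sets of "news" apart; only the sign restrictions can.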
Example 15 Comparing responses to US monetary shocks, 1964-2001.

[Figure: responses of prices, output and money, horizon in months, under sign restrictions (left) and Choleski restrictions (right).]
Example 16 Studying the effects of fiscal shocks in US states: 1950-2005.

             corr(G,Y)  corr(T,Y)  corr(G,DEF)  corr(T,DEF)  corr(G,T)
G shocks     > 0        > 0        > 0
BB shocks    < 0        = 0        = 1
Tax shocks   < 0        < 0        = 0
Table 4: Identification restrictions
8.5 Sign restrictions in large systems

• The use of rotation matrices is complicated in large scale systems since there are many rotations one needs to consider.

Algorithm 8.1
1. Start from some orthogonal representation yt = D(ℓ)εt.
2. Draw an m × m matrix G with N(0,1) entries. Find the decomposition G = QR.
3. Compute the candidate responses D(ℓ)Q. Check if the restrictions are satisfied.
4. Repeat 2.-3. until L draws are found.
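Algorithm 8.1 can be sketched as follows (numpy; the example system and the particular sign check in the usage lines are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_rotation(m, rng):
    # Step 2: draw G with N(0,1) entries and take its QR decomposition
    G = rng.standard_normal((m, m))
    Q, R = np.linalg.qr(G)
    # sign-normalize the columns so Q is a uniform draw on the orthogonal group
    return Q * np.sign(np.diag(R))

def accepted_impacts(D, satisfies, L, rng):
    """Steps 3-4: rotate the impact matrix D of an orthogonal
    representation until L draws satisfy the sign restrictions."""
    out = []
    while len(out) < L:
        DQ = D @ draw_rotation(D.shape[0], rng)
        if satisfies(DQ):
            out.append(DQ)
    return out

# illustrative use: 2-variable system, keep rotations whose first shock
# moves both variables up on impact
D = np.linalg.cholesky(np.array([[1.0, 0.3],
                                 [0.3, 1.0]]))
draws = accepted_impacts(D, lambda M: (M[:, 0] > 0).all(), L=10, rng=rng)
```

Every accepted draw factorizes the same covariance D D′, which is why all of them are observationally equivalent and only the sign restrictions discriminate among them.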
9 Interpretation problems with VARs

• Time Aggregation (Sargent-Hansen (1991), Marcet (1991)).
- Agents take decisions at a frequency which is different from the frequency of the data available to the econometrician. What are the consequences?
- The MA representation of the econometrician is a complex combination of the MA representations induced by agents' actions.
Example 17 A hump-shaped monthly response can be transformed into a smoothly declining quarterly response.

[Figure: size of responses at the monthly and quarterly frequency, horizon 0-14 months.]

How to detect aggregation problems? Run a VAR with data at different frequencies, if you can. Check if differences exist.
• Non-linearities

Example 18 (Markov switching model). Suppose P(st = 1|st−1 = 1) = p, P(st = 0|st−1 = 0) = q. This process has a linear VAR representation

st = (1 − q) + (p + q − 1)st−1 + et

and as long as either p or q or both are less than one an MAR exists. Good!!
But: the errors are non-normal (binomial). Conditional on st−1 = 1:

et = 1 − p   with probability p       (41)
   = −p      with probability 1 − p   (42)

Conditional on st−1 = 0:

et = −(1 − q)   with probability q       (43)
   = q          with probability 1 − q   (44)
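The AR(1) representation in Example 18 is easy to verify by simulation: regressing st on a constant and st−1 recovers an intercept near (1 − q) and a slope near (p + q − 1). A sketch with illustrative transition probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.9, 0.8          # illustrative P(1|1) and P(0|0)
T = 200_000

# simulate the two-state Markov chain
s = np.empty(T, dtype=float)
s[0] = 1.0
for t in range(1, T):
    prob_one = p if s[t - 1] == 1.0 else 1.0 - q   # P(s_t = 1 | s_{t-1})
    s[t] = float(rng.random() < prob_one)

# OLS of s_t on a constant and s_{t-1} should recover
# intercept ~ (1 - q) and slope ~ (p + q - 1)
X = np.column_stack([np.ones(T - 1), s[:-1]])
intercept, slope = np.linalg.lstsq(X, s[1:], rcond=None)[0]
```

The regression is linear even though the process is binary: the non-normality shows up only in the residuals, exactly as the slide notes.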
• How do you check for normality/nonlinearities?

- If et is normal:

T^0.5 [ S3 ; S4 − 3 ∗ Im ] ∼ N( 0, [ 6 ∗ Im  0 ;  0  24 ∗ Im ] )

where Sj is the j-th estimated moment of et.

- Regress êt on y²t−1, log yt−1, etc. Check significance.
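The moment-based check above can be sketched per residual series as follows (the simulated Gaussian residuals are illustrative; with real VAR residuals one would pass the estimated êt):

```python
import numpy as np

def skew_kurt_stats(e):
    """Standardized 3rd/4th-moment statistics for residuals e (T x m).
    Under normality each entry is asymptotically N(0, 1), using the
    asymptotic variances 6 (skewness) and 24 (excess kurtosis)."""
    e = (e - e.mean(axis=0)) / e.std(axis=0)   # standardize each series
    T = e.shape[0]
    S3 = (e ** 3).mean(axis=0)                 # estimated third moment
    S4 = (e ** 4).mean(axis=0)                 # estimated fourth moment
    z3 = np.sqrt(T) * S3 / np.sqrt(6.0)
    z4 = np.sqrt(T) * (S4 - 3.0) / np.sqrt(24.0)
    return z3, z4

rng = np.random.default_rng(2)
z3, z4 = skew_kurt_stats(rng.standard_normal((50_000, 2)))  # Gaussian input
```

Large |z3| or |z4| (relative to standard normal critical values) signals non-normal residuals and, possibly, neglected non-linearity.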
• Stationarity is violated

Example 19 Great Moderation.
- Changes in the variance of the process are continuous. Can't really use subsample analysis.
- There exists a version of the Wold theorem without covariance stationarity:

y†t = at y−∞ + Σ_{j=0}^{∞} Djt et−j

where var(et) = Σt.
• Use time-varying coefficient VARs with e.g. stochastic volatility.
• Small Scale VARs. People use them because:
a) Estimates are more precise.
b) It is easier to identify shocks. But they generate:
- Omitted variables, Braun-Mittnik (1993).
- Misaggregation of shocks, Cooley-Dwyer (1998), Canova-Pina (2006).
What is the consequence of omitting variables?

In a bivariate VAR(q)

[ A11(ℓ)  A12(ℓ) ;  A21(ℓ)  A22(ℓ) ] [ y1t ; y2t ] = [ e1t ; e2t ],

the univariate representation for y1t is

[A11(ℓ) − A12(ℓ)A22(ℓ)⁻¹A21(ℓ)] y1t = e1t − A12(ℓ)A22(ℓ)⁻¹ e2t ≡ υt     (45)
Example 20 Suppose m = 4 and we estimate a bivariate VAR; there are three possible models. The system with variables 1 and 3 has errors υt, where Ψ(ℓ) = [ A12(ℓ)  A14(ℓ) ;  A32(ℓ)  A34(ℓ) ] and Φ(ℓ) = [ A22(ℓ)  A24(ℓ) ;  A42(ℓ)  A44(ℓ) ]⁻¹. It is easy to verify that

[ υ1t ; υ2t ] ≡ [ e1t ; e3t ] − Ψ(ℓ)Φ(ℓ) [ e2t ; e4t ]
• A true m-variable VAR(1) is transformed into a VAR(∞) with disturbance υt if only m1 < m variables are used.
What is the problem of omitting shocks?

Aggregation theorem (Faust and Leeper (1997)): the structural MA for a partition with m1 < m variables has an MA matrix D‡(ℓ) that loads on all m structural shocks.
If there are m_a shocks of one type and m_b shocks of another type, with m_a + m_b = m and m1 = 2, then:

• eit, i = 1, 2 recovers a linear combination of shocks of type i′ = a, b only if D‡(ℓ) is block diagonal.
• eit, i = 1, 2 recovers a linear combination of current shocks of type i′ = a, b only if D‡(ℓ) = D‡, ∀ℓ, and D‡ is block diagonal.
Example 21 Suppose m = 4, m1 = 2, m2 = 2. Then

[ e1t ; e2t ] = [ D‡11(ℓ)  D‡12(ℓ)  D‡13(ℓ)  D‡14(ℓ) ;  D‡21(ℓ)  D‡22(ℓ)  D‡23(ℓ)  D‡24(ℓ) ] [ ε1t ; ε2t ; ε3t ; ε4t ]

- e1t recovers type 1 shocks if D‡13(ℓ) = D‡14(ℓ) = 0, and e2t recovers type 2 shocks if D‡21(ℓ) = D‡22(ℓ) = 0.
- e1t recovers current type 1 shocks if D‡ii′(ℓ) = D‡ii′, ∀ℓ, i, i′ = 1, 2.
• Non-Wold decompositions (Lippi-Reichlin (1994), Leeper (1991), Hansen-Sargent (1991)). Certain economic models do not have a fundamental MA representation.
e.g. diffusion models; models where agents anticipate tax changes.

Example 22 Hall consumption/saving problem.
Assume yt = et, a white noise. Assume β = R⁻¹ < 1 and quadratic preferences. The solution for consumption is ct = ct−1 + (1 − R⁻¹)et. No problem!!

If we only observe saving out of labor income, st = yt − ct, the solution is

st − st−1 = R⁻¹et − et−1     (49)

(49) is non-fundamental: the coefficient on et is smaller (in absolute value) than the coefficient on et−1.
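The observational-equivalence point of Example 22 can be checked directly: the non-fundamental MA(1) in (49) and the fundamental MA(1) one would estimate in its place have identical autocovariances, even though their impulse responses have different shapes. A sketch (the value of R⁻¹ is illustrative):

```python
import numpy as np

def ma1_autocov(theta0, theta1, sigma2=1.0):
    # autocovariances of x_t = theta0*u_t + theta1*u_{t-1}, u_t white noise
    return np.array([(theta0 ** 2 + theta1 ** 2) * sigma2,   # gamma(0)
                     theta0 * theta1 * sigma2])              # gamma(1)

R_inv = 0.95   # illustrative beta = R^{-1} < 1

# non-fundamental representation: ds_t = R^{-1} e_t - e_{t-1}   (eq. 49)
gamma_nf = ma1_autocov(R_inv, -1.0)
# fundamental representation:     ds_t = u_t - R^{-1} u_{t-1}
gamma_f = ma1_autocov(1.0, -R_inv)

same_acgf = np.allclose(gamma_nf, gamma_f)   # second moments coincide
```

Since a VAR only sees second moments, it necessarily recovers the fundamental representation and the wrong impulse responses.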
If one instead estimates

st − st−1 = ut − R⁻¹ut−1     (50)

the responses have different shapes!! Same autocovariance generating function.

• Relationship between DSGE models and VARs

The log-linearized solution of a DSGE model is of the form:

y2t = A22(θ)y2t−1 + A21(θ)y3t     (51)
y1t = A11(θ)y2t−1 + A12(θ)y3t     (52)

where y2t = states and the driving forces, y1t = controls, y3t = shocks.
- If both y2t and y1t are observable, the DSGE model is a restricted VAR(1).
- If y2t is omitted, what is the representation of y1t?

• Three alternative results for reduced systems with only y1t:

A true VAR(p) model is transformed into either a VAR(∞), a VARMA(p−1, p−1) or a VARMA(p, p), depending on the assumptions made.
Example 23 Suppose

yt = kt + et     (53)
kt = a1kt−1 + a0et     (54)

where a1 governs persistence and a0 the contemporaneous effect. If we observe both yt and kt: restricted VAR(1). No problem.

If only yt is observable,

[(1 − a1ℓ)/(1 + a0 − a1ℓ)] yt = et   or   yt = [a0/(1 + a0)] Σ_j [a1/(1 + a0)]^j yt−j + et     (55)

If a0 is small and a1 is high, [a1/(1 + a0)]^j will be large even for large j. A very long lag length is needed to whiten the residuals. If long run restrictions are used, potentially important truncation bias.
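The slow decay of the AR(∞) coefficients in (55) can be quantified directly; the parameter values below are illustrative (small a0, a1 near one):

```python
import numpy as np

a0, a1 = 0.05, 0.97                 # illustrative: small impact, high persistence
b = a1 / (1 + a0)                   # geometric decay rate of the AR coefficients
weights = (a0 / (1 + a0)) * b ** np.arange(1, 61)   # coefficients on lags 1..60

# even at lag 20 the coefficient is still over 20% of the lag-1 coefficient,
# so a short lag length badly truncates the representation
ratio = weights[19] / weights[0]
```

With conventional lag lengths of 4-8, most of this mass is discarded, which is the source of the truncation bias mentioned above.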
Summary

- A system with a reduced number of variables needs a very generous lag length to approximate the dynamics of the true model.
- If the sample size is short this could be a problem.
- Omitting a "state" is much more important than omitting a "control".
- Omission does not matter very much if the true model has dynamics which die out quickly.

Chari, Kehoe, McGrattan (2006), Christiano, Eichenbaum and Vigfusson (2006), Fernandez et al. (2007), Ravenna (2007).
10 Exercises

1) Take quarterly data for output growth and inflation for your favorite country. Identify supply and demand shocks by finding all the rotations which satisfy the following restrictions: supply ∆y ↑, Inf ↓; demand ∆y ↑, Inf ↑. What do the impulse responses produced by the rotations that jointly satisfy the restrictions look like? How do they compare with those obtained using the restriction that only supply shocks affect output in the long run, but both demand and supply shocks can affect inflation in the long run?
2) Consider the following New Keynesian model

xt = Etxt+1 − (1/ϕ)(it − Etπt+1) + v1t     (56)
πt = βEtπt+1 + κxt + v2t     (57)
it = φr it−1 + (1 − φr)(φπ πt + φx xt) + v3t     (58)

where xt is the output gap, πt the inflation rate and it the nominal interest rate.
i) Plot the impulse responses to the three shocks.
ii) Simulate 11000 data points from this model after you have set ϕ = 1.5, β = 0.99, κ = 0.8, φr = 0.6, φπ = 1.2, φx = 0.2, ρ1 = 0.9, ρ2 = 0.9, σ1 = σ2 = σ3 = 0.1, and discard the first 1000. With the remaining data estimate a three variable VAR. In particular, (i) choose the lag length optimally, (ii) check if the model you have selected has well specified residuals and (iii) check whether you detect breaks in the specification or not.
iii) With the estimated model apply a Choleski decomposition in the order (y, π, R) and check how the impulse responses compare with the true ones in i). Is there any noticeable difference? Why?
iv) Now try the ordering (R, y, π). Do you notice any difference with iii)? Why?
3) Obtain data for output and hours (employment) for your favorite country - each group should use a different country. Construct a measure of labor productivity and run a VAR on labor productivity and hours as in Gali (1999, AER) after you have appropriately selected the statistical nature of the model. Identify technology shocks as the only source of labor productivity movements in the long run. How much of the fluctuations in hours and labor productivity do they explain at the 4-year horizon? Repeat the exercise using the restriction that, in response to technology shocks, output and labor productivity must increase contemporaneously. Are technology shocks a major source of cyclical fluctuations?
Appendix: Inference in SVARs

- This applies to Choleski, non-recursive and long run identifications.
- This applies to classical or Bayesian inference (flat prior).

• Find the maximum likelihood estimators of Aj and A0 (this is enough to find the mode of the structural coefficients).
• Find the posterior distribution of Aj and A0 (this gives the posterior of the structural coefficients).

If the prior on Aj, A0 is non-informative and data are abundant, the shape of the likelihood is the same as the shape of the posterior. In the other cases Bayesian analysis is different from classical analysis.
Assume Σε = I. The likelihood of the SVAR is

L(Aj, A0|y) ∝ |A0⁻¹A0⁻¹′|^(−0.5T) exp{−0.5 Σt (yt − A(ℓ)yt−1)′ (A0⁻¹A0⁻¹′)⁻¹ (yt − A(ℓ)yt−1)}     (59)
           = |A0|^T exp{−0.5 Σt (yt − A(ℓ)yt−1)′ (A0⁻¹A0⁻¹′)⁻¹ (yt − A(ℓ)yt−1)}     (60)

If there are no restrictions on Aj, A(ℓ)ML = A(ℓ)OLS and var(A(ℓ)ML) = A0⁻¹A0⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹, with Yt−1 = [yt−1, ..., yt−p]. Nice, because easy to compute.
Substituting the estimator A(ℓ)ML into the likelihood we have:

L(A(ℓ) = A(ℓ)ML, A0|y) ∝ |A0|^T exp{−0.5 tr(SML A0′A0)}     (61)

where SML = (yt − A(ℓ)MLyt−1)′(yt − A(ℓ)MLyt−1)/(T − k), k is the number of regressors in each equation, and tr is the trace of the matrix.

Conclusion (two step approach):
a) Find A(ℓ)ML.
b) Maximize (61) to find A0.
c) Use the structural coefficients AjA0 to trace out the structural dynamics.

Typically difficult to maximize analytically; numerical routines are needed (both for likelihood and posterior computations).
Note that if, instead of conditioning on A(ℓ)ML, we integrate it out we have:

L(A0|y) ∝ |A0|^(T−k) exp{−0.5 tr(SML A0′A0)}

so if g(A0) ∝ |A0|^k, then g(A0|y) ∝ L(A0|y, A(ℓ) = A(ℓ)ML).

• Bayesian analysis with flat priors is equivalent to classical analysis conditional on A(ℓ)ML.
Summary

• Choleski identification, no restrictions on the VAR. Maximization of (61) implies that A0A0′ = (SML/T)⁻¹. Hence Â0 = chol((SML/T)⁻¹). Nice shortcut.
• Non-recursive identification, no restrictions on the VAR. Need to maximize (61) (no shortcuts possible).
• Long run restrictions. Note that the structural long run matrix is A(1)⁻¹A0⁻¹. If it is lower triangular, A0 can be found using A0A0′ = (SML/T)⁻¹ with A(1)⁻¹ML A0⁻¹ lower triangular. The solution is

Â0 = chol(A(1)⁻¹ML (SML/T) A(1)⁻¹′ML)⁻¹ A(1)⁻¹ML

• If the long run system is not recursive, the solution is more complicated.
• If the system is over-identified, can't use a two step approach. Need to jointly maximize the likelihood function with respect to Aj, A0.
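The Choleski shortcut can be verified numerically: the Choleski factor of (SML/T)⁻¹ is a lower-triangular A0 that satisfies the first-order condition A0A0′ = (SML/T)⁻¹ for free. A sketch (the covariance matrix is an arbitrary SPD example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# an arbitrary symmetric positive definite stand-in for S_ML / T
X = rng.standard_normal((500, 3))
Sigma = X.T @ X / 500

# shortcut: the Choleski factor of (S_ML/T)^{-1} is lower triangular
# and satisfies A0 A0' = (S_ML/T)^{-1} by construction
A0 = np.linalg.cholesky(np.linalg.inv(Sigma))
```

No numerical optimization of (61) is needed in this case, which is why the recursive scheme is so popular in practice.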
• With sign restrictions no maximization is needed. Find the region of the parameter space which satisfies the restrictions. This can be done numerically using a version of an acceptance sampling algorithm.
Monte Carlo standard errors for impulse responses

If the prior on Aj, A0 is non-informative, the posterior is proportional to the likelihood. The likelihood of the VAR is the product of a Normal for A(ℓ), conditional on A(ℓ)ML and Σ⁻¹, and a Wishart for Σ⁻¹. Then the algorithm works as follows (Choleski system):

Algorithm 10.1
1. Draw Σ⁻¹ from a Wishart, conditional on the data.
2. Set A0^l = chol((Σ⁻¹)^l).
3. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
4. Compute the structural coefficients A(ℓ)^l A0^l and the implied MA of the model.
5. Repeat steps 1.-4. L times. Order the draws and compute percentiles.
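Algorithm 10.1 can be sketched as follows (numpy only, constructing the Wishart draw directly; the VAR dimensions, the ML estimates and the (Y′Y)⁻¹ matrix are illustrative stand-ins, not quantities from the text):

```python
import numpy as np

rng = np.random.default_rng(4)

def wishart(df, scale, rng):
    # draw W ~ Wishart(df, scale) as Z'Z with the rows of Z ~ N(0, scale)
    Z = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=df)
    return Z.T @ Z

# illustrative posterior inputs from a fitted 2-variable VAR(1)
T, m = 200, 2
Sigma_hat = np.array([[1.0, 0.2],
                      [0.2, 0.8]])              # residual covariance estimate
A_ml = np.array([0.5, 0.1, -0.1, 0.6])          # vec of the ML lag matrix
XtX_inv = 0.01 * np.eye(m)                      # stand-in for (Y'Y)^{-1}

impacts = []
for _ in range(100):                            # L = 100 draws
    # 1. draw Sigma^{-1} from a Wishart, conditional on the data
    S_inv = wishart(T, np.linalg.inv(Sigma_hat) / T, rng)
    # 2. Choleski factor of the draw: A0 A0' = Sigma^{-1}
    A0 = np.linalg.cholesky(S_inv)
    # 3. draw the lag coefficients around their ML estimate
    V = np.kron(np.linalg.inv(S_inv), XtX_inv)
    A = rng.multivariate_normal(A_ml, V).reshape(m, m)
    # 4. structural responses at horizons 0 and 1: A0^{-1} and A A0^{-1}
    irf0 = np.linalg.inv(A0)
    impacts.append([irf0, A @ irf0])

# 5. order the draws and compute percentile bands
bands = np.percentile(np.array(impacts), [16, 50, 84], axis=0)
```

The percentile bands across the L draws are the Monte Carlo standard errors for the impulse responses; longer horizons would simply iterate the companion matrix before multiplying by A0⁻¹.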
If the restrictions are not in Choleski format, substitute step 2. with the maximization of the likelihood with respect to A0. If the system is overidentified, another approach is needed (see chapter 10). For long run restrictions use:

Algorithm 10.2
1. Draw Σ⁻¹ from a Wishart, conditional on the data.
2. Set A0^l = chol(A(1)⁻¹ML Σ^l A(1)⁻¹′ML)⁻¹ A(1)⁻¹ML.
3. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
4. Repeat steps 1.-3. L times. Order the draws and compute percentiles.
For a system where sign restrictions are imposed the approach is easy. One just needs draws for Σ and A0. The algorithm is:

Algorithm 10.3
1. Choose an H such that HH′ = I.
2. Draw Σ⁻¹ from a Wishart, conditional on the data.
3. Set A0^l = H (Σ^l)^0.5.
4. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹HH′(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
5. Compute the structural coefficients A(ℓ)^l A0^l and the implied MA of the model. If column i of the MA matrix satisfies the sign restriction, keep the draw; otherwise discard it.
6. Repeat steps 1.-5. until L draws are obtained. Order the draws and compute the median, mode, mean, percentiles, etc.

• One could also randomize on H. There are many H such that HH′ = I; one could have a prior on the Hs. Since H does not enter the likelihood, the posterior of H equals the prior of H.