
VAR models

Fabio Canova

ICREA-UPF, AMeN and CEPR

March 2009


Outline

• Wold Theorem and VAR specification

• Coefficients and covariance matrix estimation

• Computing impulse responses, variance and historical decompositions

• Structural VARs

• Interpretation problems


References

Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ, ch. 10-11.

Canova, F. (1995), "VAR Models: Specification, Estimation, Inference and Forecasting", in H. Pesaran and M. Wickens (eds.), Handbook of Applied Econometrics, ch. 2, Blackwell, Oxford, UK.

Canova, F. (1995), "The Economics of VAR Models", in K. Hoover (ed.), Macroeconometrics: Tensions and Prospects, Kluwer Press, NY.

Blanchard, O. and Quah, D. (1989), "The Dynamic Effect of Aggregate Demand and Supply Disturbances", American Economic Review, 79, 655-673.

Canova, F. and Pina, J. (2005), "Monetary Policy Misspecification in VAR Models", in C. Diebolt and C. Krystou (eds.), New Trends in Macroeconomics, Springer Verlag.

Canova, F. and De Nicolo, G. (2002), "Money Matters for Business Cycle Fluctuations in the G7", Journal of Monetary Economics, 49, 1131-1159.

Cooley, T. and Dwyer, M. (1998), "Business Cycle Analysis without much Theory: A Look at Structural VARs", Journal of Econometrics, 83, 57-88.

Faust, J. (1998), "On the Robustness of Identified VAR Conclusions about Money", Carnegie-Rochester Conference Series on Public Policy, 49, 207-244.

Faust, J. and Leeper, E. (1997), "Do Long Run Restrictions Really Identify Anything?", Journal of Business and Economic Statistics, 15, 345-353.

Hansen, L. and Sargent, T. (1991), "Two Difficulties in Interpreting Vector Autoregressions", in Hansen, L. and Sargent, T. (eds.), Rational Expectations Econometrics, Westview Press, Boulder and London.

Kilian, L. (1998), "Small Sample Confidence Intervals for Impulse Response Functions", Review of Economics and Statistics, 218-230.

Lippi, M. and Reichlin, L. (1993), "The Dynamic Effect of Aggregate Demand and Supply Disturbances: A Comment", American Economic Review, 83, 644-652.

Lippi, M. and Reichlin, L. (1994), "VAR Analysis, Non-Fundamental Representations, Blaschke Matrices", Journal of Econometrics, 63, 307-325.

Marcet, A. (1991), "Time Aggregation of Econometric Time Series", in Hansen, L. and Sargent, T. (eds.), Rational Expectations Econometrics, Westview Press, Boulder and London.

Sims, C. and Zha, T. (1999), "Error Bands for Impulse Responses", Econometrica, 67, 1113-1155.

Sims, C., Stock, J. and Watson, M. (1990), "Inference in Linear Time Series Models with Some Unit Roots", Econometrica, 58, 113-144.

Chari, V., Kehoe, P. and McGrattan, E. (2004), "A Critique of Structural VARs Using Business Cycle Theory", Federal Reserve Bank of Minneapolis, Working Paper 631.

Fernandez-Villaverde, J., Rubio-Ramirez, J., Sargent, T. and Watson, M. (2007), "The ABC and (D's) to Understand VARs", American Economic Review.

Uhlig, H. (2005), "What are the Effects of Monetary Policy? Results from an Agnostic Identification Procedure", Journal of Monetary Economics.

Erceg, C., Guerrieri, L. and Gust, C. (2005), "Can Long Run Restrictions Identify Technology Shocks?", Journal of the European Economic Association.

Giordani, P. (2004), "An Alternative Explanation of the Price Puzzle", Journal of Monetary Economics, 51, 1271-1296.

Dedola, L. and Neri, S. (2007), "What Does a Technology Shock Do? A VAR Analysis with Model-Based Sign Restrictions", Journal of Monetary Economics.


1 Preliminary

• Lag operator: $\ell y_t = y_{t-1}$; $\ell^i y_t = y_{t-i}$, where $y_t$ is an $m \times 1$ vector.

• Matrix lag operator ($a_0 = I$ normalization):

$$a(\ell)y_t \equiv a_0 y_t + a_1 \ell y_t + a_2 \ell^2 y_t + \dots + a_q \ell^q y_t = y_t + a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_q y_{t-q} \quad (1)$$

Example 1 $y_t = e_t + d_1 e_{t-1} + d_2 e_{t-2}$. Using the lag operator, $y_t = e_t + d_1 \ell e_t + d_2 \ell^2 e_t$, or $y_t = (1 + d_1\ell + d_2\ell^2)e_t \equiv d(\ell)e_t$.

Example 2 $y_t = a_1 y_{t-1} + e_t$. Using the lag operator, $y_t = a_1 \ell y_t + e_t$, or $(1 - a_1\ell)y_t = e_t$, or $a(\ell)y_t = e_t$.


2 What are VARs?

- They are multivariate autoregressive linear time series models of the form

$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_q y_{t-q} + e_t, \qquad e_t \sim (0, \Sigma_e) \quad (2)$$

where $y_t$ is an $m \times 1$ vector and the $A_j$ are $m \times m$ matrices, $j = 1, \dots, q$.

Advantages:

- Every variable is interdependent and endogenous.

- Any $y_t$ has an autoregressive representation under some conditions.

- Simple to use and estimate (a minimal simulation and estimation sketch follows below).

Disadvantages:

- A VAR is a reduced form model; no economic interpretation is possible.

- Potentially difficult to relate VAR dynamics with DSGE dynamics.
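As an illustration, a minimal sketch (assuming Python with numpy; the lag length and coefficient values are made up for the example) that simulates a bivariate VAR(2) of the form (2) and recovers the $A_j$ by equation-by-equation OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, T = 2, 2, 500                       # variables, lags, sample size
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # hypothetical coefficient matrices
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Sigma_e = np.array([[1.0, 0.3], [0.3, 0.5]])

# simulate y_t = A1 y_{t-1} + A2 y_{t-2} + e_t
y = np.zeros((T, m))
e = rng.multivariate_normal(np.zeros(m), Sigma_e, size=T)
for t in range(q, T):
    y[t] = A1 @ y[t-1] + A2 @ y[t-2] + e[t]

# OLS: regress y_t on [y_{t-1}, y_{t-2}] (same regressors in every equation)
X = np.hstack([y[q-1:T-1], y[q-2:T-2]])          # (T-q) x mq
Y = y[q:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T   # m x mq, rows = equations
resid = Y - X @ A_hat.T
Sigma_hat = resid.T @ resid / (T - q)
print(A_hat.round(2))    # first m columns roughly A1, next m roughly A2
print(Sigma_hat.round(2))
```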


3 Wold theorem and the news

Wold Theorem: Under linearity and stationarity, any vector of time series $y_t^\dagger$ can be written as $y_t^\dagger = a y_{-\infty} + \sum_{j=0}^{\infty} D_j e_{t-j}$, where $y_{-\infty}$ contains constants, the $e_{t-j}$ are the news at $t-j$, the $D_j$ are $m \times m$ matrices, and $a$ is an $m \times k$ matrix of coefficients.

- Let $y_t \equiv y_t^\dagger - a y_{-\infty}$. The Wold theorem tells us that, apart from initial conditions, time series are the accumulation over time of news.

- A news $e_t = 1$ at time $t$ has effect $D_0$ on $y_t$, $D_1$ on $y_{t+1}$, $D_2$ on $y_{t+2}$, etc. Hence $y_t$ is a moving average (MA) of the news, i.e. $y_t = D(\ell)e_t$.


Two issues

a) If $F_{t-1}$ is the information available at $t-1$, the news are

$$e_t = y_t - E[y_t|F_{t-1}] \quad (3)$$

• The news are unpredictable given the past ($E(e_t|F_{t-1}) = 0$), but contemporaneously correlated ($e_t \sim (0, \Sigma_e)$).

To give a name to the news in each equation, we need to find a matrix $\tilde P$ such that $\tilde P\tilde P' = \Sigma_e$. Then:

$$y_t = D(\ell)\tilde P\tilde P^{-1}e_t = \tilde D(\ell)\tilde e_t, \qquad \tilde e_t \sim (0, \tilde P^{-1}\Sigma_e\tilde P^{-1\prime} = I) \quad (4)$$

Examples of $\tilde P$: the Choleski (lower triangular) factor; $\tilde P = P\Lambda^{0.5}$, where $P$ is the eigenvector matrix and $\Lambda$ the eigenvalue matrix; etc.


Example 3 If $\Sigma_e = \begin{bmatrix} 1 & 4 \\ 4 & 25 \end{bmatrix}$, its Choleski factor is $\tilde P = \begin{bmatrix} 1 & 0 \\ 4 & 3 \end{bmatrix}$ and $\tilde P^{-1}e_t \sim (0, I)$.

b) The news are not uniquely defined. In fact, for any $H$ such that $HH' = I$,

$$y_t = D(\ell)e_t = D(\ell)HH'e_t = \tilde D(\ell)\tilde e_t \quad (5)$$

and $E(e_t e_t') = E(\tilde e_t\tilde e_t')$.
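A quick numerical check of Example 3 (a sketch assuming Python/numpy):

```python
import numpy as np

Sigma_e = np.array([[1.0, 4.0], [4.0, 25.0]])
P = np.linalg.cholesky(Sigma_e)        # lower triangular, P @ P.T = Sigma_e
print(P)                               # [[1, 0], [4, 3]]

# the orthogonalized news P^{-1} e_t have identity covariance
P_inv = np.linalg.inv(P)
print(P_inv @ Sigma_e @ P_inv.T)       # approximately the identity matrix
```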


• Standard packages choose the "fundamental" news representation: i.e. the one for which $D_0$ is the largest among all the $D_j$ coefficients.

• Some economic models imply non-fundamental representations (e.g. models where news are anticipated); see later on.


VARs

• If the $D_j$ coefficients decay to zero fast enough, $D(\ell)$ is invertible and

$$y_t = D(\ell)e_t \;\Rightarrow\; D(\ell)^{-1}y_t = e_t \;\Rightarrow\; y_t = A(\ell)y_{t-1} + e_t \quad (6)$$

where $I - A(\ell)\ell = D(\ell)^{-1}$.

• A VAR($\infty$) can represent any vector of time series $y_t$ under linearity, stationarity and invertibility.

• A VAR($q$), $q$ fixed, approximates $y_t$ well if the $D_j$ are close to zero for $j$ large.


Summary

- We can represent any data with a linear VAR($\infty$) under the assumptions made.

- With a finite sample of data we need to carefully check the lag length of the VAR (news can't be predictable).

- If we want a constant coefficient representation, we need stationarity of $y_t$.


4 Specification

Many ways of choosing the lag length:

A) Likelihood ratio (LR) test

$$LR = 2[\ln L(\alpha^{un}, \Sigma_e^{un}) - \ln L(\alpha^{re}, \Sigma_e^{re})] \quad (7)$$
$$\;\;\;\; = T(\ln|\Sigma_e^{re}| - \ln|\Sigma_e^{un}|) \;\xrightarrow{D}\; \chi^2(\nu) \quad (8)$$

where $L$ is the likelihood function, "un" ("re") denotes the unrestricted (restricted) estimator, and $\nu$ = number of restrictions of the form $R(\alpha) = 0$.

• The LR test is biased in small samples. If $T$ is small, use

$$LR^c = (T - qm)(\ln|\Sigma^{re}| - \ln|\Sigma^{un}|)$$

where $q$ = number of lags, $m$ = number of variables.


• Sequential testing approach

1) Choose an upper bound $\bar q$.
2) Test VAR($\bar q - 1$) against VAR($\bar q$); if not rejected,
3) Test VAR($\bar q - 2$) against VAR($\bar q - 1$).
4) Continue until rejection.

The LR test is an in-sample criterion. What if we are interested in out-of-sample forecasting exercises?


Let $\Sigma_y(1) = \frac{T + mq}{T}\Sigma_e$.

B) AIC criterion: $\min_q AIC(q) = \ln|\Sigma_y(1)|(q) + \frac{2qm^2}{T}$

• AIC is inconsistent. It overestimates the true order $q$ with positive probability.

C) HQC criterion: $\min_q HQC(q) = \ln|\Sigma_y(1)|(q) + \frac{2qm^2\ln\ln T}{T}$

• HQC is consistent (in probability).

D) SWC criterion: $\min_q SWC(q) = \ln|\Sigma_y(1)|(q) + \frac{qm^2\ln T}{T}$

• SWC is strongly consistent (a.s.).


• Criteria B)-D) trade off the fit of the model (the size of $\Sigma_e$) against the number of parameters of the model ($m \cdot q$) for a given sample size $T$. Hence criteria B)-D) prefer smaller to larger scale models.

Criterion |      T=40       |      T=80       |      T=120      |      T=200
          | q=2  q=4  q=6   | q=2  q=4  q=6   | q=2  q=4  q=6   | q=2  q=4  q=6
AIC       | 1.6  3.2  4.8   | 0.8  1.6  2.4   | 0.53 1.06 1.6   | 0.32 0.64 0.96
HQC       | 2.09 4.17 6.26  | 1.18 2.36 3.54  | 0.83 1.67 2.50  | 0.53 1.06 1.6
SWC       | 2.95 5.9  8.85  | 1.75 3.5  5.25  | 1.27 2.55 3.83  | 0.84 1.69 2.52

Table 1: Penalties of AIC, HQC, SWC, m=4

- Penalties increase with $q$ and fall with $T$. The penalty of SWC is the harshest.

- Ivanov and Kilian (2006): the quality of B)-D) depends on the frequency of the data and on the DGP. Typically HQC is more appropriate.
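A sketch of how the criteria B)-D) can be computed on a dataset, assuming Python/numpy (the data are simulated, the helper names are made up, and the criteria follow the slides' definitions with the OLS residual covariance standing in for $\Sigma_e$):

```python
import numpy as np

def var_resid_cov(y, q):
    """Fit a VAR(q) by OLS and return the residual covariance (a sketch)."""
    T, m = y.shape
    X = np.hstack([y[q - j:T - j] for j in range(1, q + 1)])   # lags 1..q
    Y = y[q:]
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    U = Y - X @ B
    return U.T @ U / (T - q)

def info_criteria(y, q):
    T, m = y.shape
    Sigma_y1 = (T + m * q) / T * var_resid_cov(y, q)   # Sigma_y(1) as in the slides
    ld = np.log(np.linalg.det(Sigma_y1))
    aic = ld + 2 * q * m**2 / T
    hqc = ld + 2 * q * m**2 * np.log(np.log(T)) / T
    swc = ld + q * m**2 * np.log(T) / T
    return aic, hqc, swc

# choose the lag length that minimizes each criterion (white noise used as stand-in data)
rng = np.random.default_rng(1)
y = rng.standard_normal((200, 2))
for q in range(1, 7):
    print(q, [round(c, 3) for c in info_criteria(y, q)])
```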


• Criteria A)-D) must be applied to the system, not to single equations.

Example 4 VAR for the Euro area, 1980:1-1999:4; uses output, prices, interest rates and M3; set $\bar q = 7$.

Hypothesis     | LR           | LR^c         | q | AIC       | HQC       | SWC
q=6 vs. q=7    | 2.9314e-5(*) | 0.0447       | 7 | -7.556    | -6.335    | -4.482
q=5 vs. q=6    | 3.6400e-4    | 0.1171       | 6 | -7.413    | -6.394    | -4.851
q=4 vs. q=5    | 0.0509       | 0.5833       | 5 | -7.494    | -6.675    | -5.437
q=3 vs. q=4    | 0.0182       | 0.4374       | 4 | -7.522    | -6.905    | -5.972
q=2 vs. q=3    | 0.0919       | 0.6770       | 3 | -7.635(*) | -7.219(*) | -6.591
q=1 vs. q=2    | 3.0242e-7    | 6.8182e-3(*) | 2 | -7.226    | -7.012    | -6.689(*)

Table 2: Tests for the lag length of a VAR

• Different criteria choose different lag lengths.


Checking Stationarity

- All variables stationary / all unit roots → easy.

- Some cointegration: transform the VAR into a VECM.
  • Impose the cointegration restrictions.
  • Disregard the cointegration restrictions.

- Data are stationary, but we can't see it because of small samples.

- If Bayesian: the stationarity/nonstationarity issue does not matter for inference.


Checking for Breaks

Wald test: $y_t = (A_1(\ell)I_1)y_{t-1} + (A_2(\ell)I_2)y_{t-1} + e_t$, where $I_1 = 0$ for $t \le t_1$, $I_1 = 1$ for $t > t_1$, and $I_2 = 1 - I_1$.

Use $S(t_1, T) = T(\ln|\Sigma_e^{re}| - \ln|\Sigma_e^{un}|) \xrightarrow{D} \chi^2(\nu)$, $\nu = \dim(A_1(\ell))$ (Andrews and Ploberger (1994)).

If $t_1$ is unknown but belongs to $[t^l, t^u]$, compute $S(t_1, T)$ for all $t_1$ in the interval. Check for breaks using $\max_{t_1} S(t_1, T)$.
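A sketch of this sup-type break check, assuming Python/numpy (the data, lag length and trimming of the search interval are made up; critical values would come from the relevant tables rather than the plain chi-squared):

```python
import numpy as np

def lr_break_stat(y, q, t1):
    """T*(ln|Sigma_re| - ln|Sigma_un|) for a break in all VAR coefficients at t1."""
    T, m = y.shape
    X = np.hstack([y[q - j:T - j] for j in range(1, q + 1)])
    Y = y[q:]
    # restricted model: constant coefficients over the whole sample
    U_re = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    # unrestricted model: separate coefficients before and after t1
    I1 = (np.arange(q, T) <= t1).astype(float)[:, None]
    Xu = np.hstack([X * I1, X * (1 - I1)])
    U_un = Y - Xu @ np.linalg.lstsq(Xu, Y, rcond=None)[0]
    Te = Y.shape[0]
    return Te * (np.log(np.linalg.det(U_re.T @ U_re / Te))
                 - np.log(np.linalg.det(U_un.T @ U_un / Te)))

rng = np.random.default_rng(2)
y = rng.standard_normal((200, 2))
q = 2
candidates = range(30, 170)                 # trimmed interval [t_l, t_u]
stats = [lr_break_stat(y, q, t1) for t1 in candidates]
print("sup statistic:", round(max(stats), 2),
      "at t1 =", list(candidates)[int(np.argmax(stats))])
```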


5 Alternative Representations of a VAR(q)

Consider

$$y_t = A(\ell)y_{t-1} + e_t \quad (9)$$

with $y_t$, $e_t$ $m \times 1$ vectors and $e_t \sim (0, \Sigma_e)$.

Different representations are useful for different purposes:

- The companion form is useful for computing moments and ML estimators.

- The simultaneous equations form is useful for evaluating the likelihood and computing restricted estimates.


5.1 Companion form

• Transform an $m$-variable VAR($q$) into an $mq$-variable VAR(1).

Example 5 Consider a VAR(3). Let $Y_t = [y_t, y_{t-1}, y_{t-2}]'$, $E_t = [e_t, 0, 0]'$, and

$$\mathbf{A} = \begin{bmatrix} A_1 & A_2 & A_3 \\ I_m & 0 & 0 \\ 0 & I_m & 0 \end{bmatrix}, \qquad \Sigma_E = \begin{bmatrix} \Sigma_e & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Then the VAR(3) can be rewritten as

$$Y_t = \mathbf{A}Y_{t-1} + E_t, \qquad E_t \sim N(0, \Sigma_E) \quad (10)$$

where $Y_t$, $E_t$ are $3m \times 1$ vectors and $\mathbf{A}$ is $3m \times 3m$.
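A sketch of the companion form construction in Example 5, assuming Python/numpy with hypothetical $A_1, A_2, A_3$:

```python
import numpy as np

m = 2
A_list = [np.array([[0.5, 0.1], [0.0, 0.4]]),   # A1 (hypothetical values)
          np.array([[0.2, 0.0], [0.1, 0.1]]),   # A2
          np.array([[0.1, 0.0], [0.0, 0.1]])]   # A3
q = len(A_list)

# companion matrix: first block row [A1 A2 A3], identities below the diagonal
A_comp = np.zeros((m * q, m * q))
A_comp[:m, :] = np.hstack(A_list)
A_comp[m:, :-m] = np.eye(m * (q - 1))

# stationarity check: all eigenvalues of the companion matrix inside the unit circle
print(np.abs(np.linalg.eigvals(A_comp)).max() < 1)
```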


5.2 Simultaneous equations setup (SES)

There are two alternative representations:

1) Let $x_t = [y_{t-1}, y_{t-2}, \dots]$; $X = [x_1, \dots, x_T]'$ (a $T \times mq$ matrix); $Y = [y_1, \dots, y_T]'$ (a $T \times m$ matrix). If $\mathbf{A} = [A_1', \dots, A_q']'$ is an $mq \times m$ matrix, then

$$Y = X\mathbf{A} + E \quad (11)$$

2) Let $i$ indicate the subscript for the $i$-th column vector. The equation for variable $i$ is $y_i = x\alpha_i + e_i$. Stacking the columns of $y_i, e_i$ into $mT \times 1$ vectors we have

$$y = (I_m \otimes x)\alpha + e \equiv \mathbf{X}\alpha + e \quad (12)$$


6 Parameters and covariance matrix estimation

6.1 Unrestricted VAR(q)

Assume that $y_{-q+1}, \dots, y_0$ are known and $e_t \sim N(0, \Sigma_e)$. Then

$$y_t|(y_{t-1}, \dots, y_0, y_{-1}, \dots, y_{-q+1}) \sim N(A(\ell)y_{t-1}, \Sigma_e) \sim N(\mathbf{A}_1'Y_{t-1}, \Sigma_e) \quad (13)-(14)$$

where $\mathbf{A}_1'$ is the first (block) row of $\mathbf{A}$ ($m \times mq$). Let $\alpha = vec(\mathbf{A}_1)$.


Since $f(y_t|y_{t-1}, \dots, y_{-q+1}) = \prod_j f(y_j|y_{j-1}, \dots, y_{-q+1})$,

$$\ln L(\alpha, \Sigma_e) = \sum_j \ln L(y_j|y_{j-1}, \dots, y_{-q+1}) = -\frac{Tm}{2}\ln(2\pi) + \frac{T}{2}\ln|\Sigma_e^{-1}| - \frac{1}{2}\sum_t (y_t - \mathbf{A}_1'Y_{t-1})'\Sigma_e^{-1}(y_t - \mathbf{A}_1'Y_{t-1}) \quad (15)$$

Setting $\partial\ln L(\alpha, \Sigma_e)/\partial\alpha = 0$ we have

$$\mathbf{A}_{1,ML}' = \Big[\sum_{t=1}^T Y_{t-1}Y_{t-1}'\Big]^{-1}\Big[\sum_{t=1}^T Y_{t-1}y_t'\Big] = \mathbf{A}_{1,OLS}' \quad (16)$$

and the $j$-th column (a $1 \times mq$ vector) is

$$\mathbf{A}_{1j,ML}' = \Big[\sum_{t=1}^T Y_{t-1}Y_{t-1}'\Big]^{-1}\Big[\sum_{t=1}^T Y_{t-1}y_{jt}\Big] = \mathbf{A}_{1j,OLS}'$$


Why is OLS equivalent to maximum likelihood?

- Because, if the initial conditions are known, maximizing the log-likelihood is equivalent to minimizing the sum of squared errors!

Why is single equation OLS the same as full information maximum likelihood?

- Because we have the same regressors in every equation!
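A small numerical illustration of the last point (a sketch in Python/numpy with made-up data): with identical regressors in every equation, equation-by-equation OLS gives exactly the same coefficients as the system estimator (16).

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, q = 200, 3, 2
y = rng.standard_normal((T, m))
X = np.hstack([y[q - j:T - j] for j in range(1, q + 1)])   # same regressors for all equations
Y = y[q:]

# system estimator: [sum Y_{t-1} Y_{t-1}']^{-1} [sum Y_{t-1} y_t']
A_sys = np.linalg.solve(X.T @ X, X.T @ Y)          # mq x m

# equation-by-equation OLS, one column of Y at a time
A_eq = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)])

print(np.allclose(A_sys, A_eq))                    # True
```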


Plugging $\mathbf{A}_{1,ML}$ into $\ln L(\alpha, \Sigma_e)$, we obtain the concentrated likelihood

$$\ln L(\Sigma_e) = -\frac{Tm}{2}\ln(2\pi) + \frac{T}{2}\ln|\Sigma_e^{-1}| - \frac{1}{2}\sum_{t=1}^T e_{t,ML}'\Sigma_e^{-1}e_{t,ML} \quad (17)$$

where $e_{t,ML} = y_t - \mathbf{A}_{1,ML}'Y_{t-1}$. Using $\partial(b'Qb)/\partial Q = bb'$ and $\partial\ln|Q|/\partial Q = (Q')^{-1}$, we have

$$\frac{\partial\ln L(\Sigma_e)}{\partial\Sigma_e^{-1}} = \frac{T}{2}\Sigma_e' - \frac{1}{2}\sum_{t=1}^T e_{t,ML}e_{t,ML}' = 0 \quad\text{or}\quad \Sigma_{ML} = \frac{1}{T}\sum_{t=1}^T e_{t,ML}e_{t,ML}' \quad (18)$$

and $\sigma_{i,i'} = \frac{1}{T}\sum_{t=1}^T e_{it,ML}\,e_{i't,ML}$.

$\Sigma_{ML} \neq \Sigma_{OLS} = \frac{1}{T-1}\sum_{t=1}^T e_{t,ML}e_{t,ML}'$, but the two are equivalent for large $T$.


6.2 VAR(q) with restrictions

Assume the restrictions are of the form $\alpha = R\theta + r$, where $R$ is an $mk \times k_1$ matrix of rank $k_1$, $r$ is an $mk \times 1$ vector, and $\theta$ is a $k_1 \times 1$ vector.

Example 6 i) Lag restrictions: $A_q = 0$. Here $k_1 = m^2(q-1)$, $r = 0$, and $R = [I_{m_1}, 0]$.

ii) Block exogeneity of $y_{2t}$ in a bivariate VAR(2). Here $R = \mathrm{blockdiag}[R_1, R_2]$, where $R_i$, $i = 1, 2$, is upper triangular.

iii) Cointegration restrictions.


Plugging the restrictions into (12) we have

$$y = (I_m \otimes x)\alpha + e = (I_m \otimes x)(R\theta + r) + e$$

Let $y^\dagger \equiv y - (I \otimes x)r = (I \otimes x)R\theta + e$. Since $\frac{\partial\ln L}{\partial\theta} = R'\frac{\partial\ln L}{\partial\alpha}$:

$$\theta_{ML} = [R'(\Sigma_e^{-1} \otimes x'x)R]^{-1}R'[\Sigma_e^{-1} \otimes x']y^\dagger \quad (19)$$
$$\alpha_{ML} = R\theta_{ML} + r \quad (20)$$
$$\Sigma_{e,ML} = \frac{1}{T}\sum_t e_{ML}e_{ML}' \quad (21)$$


Summary

• For a VAR(q) without restrictions:

- ML and OLS estimators of $\mathbf{A}_1$ coincide.

- OLS estimation of $\mathbf{A}_1$, equation by equation, is consistent and efficient (if the assumptions are correct).

- OLS and ML estimators of $\Sigma_e$ asymptotically coincide for large $T$.

• For a VAR(q) with restrictions:

- The ML estimator of $\mathbf{A}_1$ is different from the OLS estimator.

- ML is consistent/efficient if the restrictions are true. It is inconsistent if the restrictions are false.

In general:

- OLS is consistent if the stationarity assumption is wrong (but t-tests are incorrect).

- OLS is inconsistent if the lag length is wrong (regressors correlated with the error term).


7 Summarizing the results

It is unusual to report estimates of VAR coefficients, standard errors and $R^2$:

- Most VAR coefficients are insignificant.

- $R^2$ always exceeds 0.99.

How do we summarize results in an informative way?


7.1 Impulse responses (IR)

• What is the effect of a surprise cut in interest rates on inflation?

• Impulse responses trace out the MA representation of $y_t$.

Three ways to calculate impulse responses:

- Recursive approach.
- Non-recursive approach.
- Forecast revisions.


• Recursive method.

Assume we have estimates of the $A_j$. Then $D_\tau = [D_\tau^{i,i'}] = \sum_{j=1}^{\min[\tau, q]} D_{\tau-j}A_j$, where $\tau$ refers to the horizon, $D_0 = I$, and $A_j = 0$ for $j > q$.

Example 7 Suppose $y_t = \bar y + A_1 y_{t-1} + A_2 y_{t-2} + e_t$. Then applying the formula we have $D_0 = I$, $D_1 = D_0 A_1$, $D_2 = D_1 A_1 + D_0 A_2$, ..., $D_k = D_{k-1}A_1 + D_{k-2}A_2 + \dots + D_{k-q}A_q$.

For orthogonal news: if $\tilde P_e\tilde P_e' = \Sigma_e$, then $\tilde D_k = D_k\tilde P_e$.
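A sketch of the recursive computation of Example 7, assuming Python/numpy with hypothetical $A_1$, $A_2$ and $\Sigma_e$:

```python
import numpy as np

A = [np.array([[0.5, 0.1], [0.0, 0.4]]),      # A1 (hypothetical)
     np.array([[0.2, 0.0], [0.1, 0.1]])]      # A2
Sigma_e = np.array([[1.0, 0.3], [0.3, 0.5]])
m, q, hor = 2, len(A), 21

# D_0 = I, D_tau = sum_{j=1}^{min(tau,q)} D_{tau-j} A_j
D = [np.eye(m)]
for tau in range(1, hor):
    D.append(sum(D[tau - j] @ A[j - 1] for j in range(1, min(tau, q) + 1)))

# orthogonalized responses: D_tau times the Choleski factor of Sigma_e
P = np.linalg.cholesky(Sigma_e)
D_tilde = [d @ P for d in D]
print(D_tilde[0], D_tilde[1])   # impact and one-step responses to the orthogonal news
```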


Sometimes it is useful to calculate multipliers of the news.

• Long run multiplier: $D(1) = (A_0 + A_1 + \dots + A_q)^{-1}$.

• Partial multipliers, up to horizon $\tau$: $(\sum_{j=0}^{\tau} A_j)^{-1}$.


7.2 Variance decomposition: τ-steps ahead forecast error

• How much of the variance of, say, output is due to supply shocks?

Uses:

$$y_{t+\tau} - y_t(\tau) = \sum_{j=0}^{\tau-1}\tilde D_j\tilde e_{t+\tau-j}, \qquad D_0 = I \quad (22)$$

where $y_t(\tau)$ is the $\tau$-steps ahead prediction of $y_t$.

The decomposition computes the share of the variance of $y_{i,t+\tau} - y_{i,t}(\tau)$ due to each $\tilde e_{i',t+\tau-j}$, $i, i' = 1, 2, \dots, m$.
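The variance shares follow directly from the orthogonalized $\tilde D_j$ since the news have identity covariance. A self-contained sketch (Python/numpy; it recomputes the hypothetical $\tilde D_j$ from the impulse response sketch above):

```python
import numpy as np

def fevd(D_tilde, tau):
    """Share of the tau-step forecast error variance of each variable due to each news."""
    # contribution of news i' to variable i: sum_j D_tilde[j][i, i']^2 (unit-variance news)
    contrib = sum(d**2 for d in D_tilde[:tau])               # m x m, element (i, i')
    return contrib / contrib.sum(axis=1, keepdims=True)

A = [np.array([[0.5, 0.1], [0.0, 0.4]]), np.array([[0.2, 0.0], [0.1, 0.1]])]
P = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.5]]))
D = [np.eye(2)]
for tau in range(1, 24):
    D.append(sum(D[tau - j] @ A[j - 1] for j in range(1, min(tau, 2) + 1)))
D_tilde = [d @ P for d in D]

print(fevd(D_tilde, 4).round(3))   # rows: variables, columns: shares of each orthogonal news
```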


7.3 Historical decomposition

• What is the contribution of supply shocks to the productivity revival of the late 1990s?

Let $\hat y_{i,t}(\tau) = y_{i,t+\tau} - y_{i,t}(\tau)$ be the $\tau$-steps ahead forecast error in the $i$-th variable of the VAR. Then:

$$\hat y_{i,t}(\tau) = \sum_{i'=1}^{m}\tilde D_{i'}(\ell)\tilde e_{i',t+\tau} \quad (23)$$

- Computes the path of $\hat y_{i,t}(\tau)$ due to each $\tilde e_{i'}$.

• The same ingredients are needed to compute impulse responses, the variance and the historical decompositions. Different packaging!!


Example 8 US data for $(Y, \pi, R, M1)$, 1973:1-1993:12. Orthogonalize using a Choleski decomposition. What is the effect of a money shock?

[Figure: responses of gnp, prices, interest and money to a shock in money, horizons 0-20.]


What is the contribution of various shocks to var(Y) and var(π)?

                         Y                                 π
Horizon | Shock1 Shock2 Shock3 Shock4 | Shock1 Shock2 Shock3 Shock4
4       | 0.99   0.001  0.003  0.001  | 0.07   0.86   0.01   0.03
12      | 0.93   0.01   0.039  0.02   | 0.24   0.60   0.08   0.07
24      | 0.79   0.01   0.15   0.04   | 0.52   0.36   0.07   0.04

Table 3: Variance decomposition, percentages


Historical decomposition of GDP, conditional on 1989 information.

[Figure: historical decomposition of gnp, 1975-1978; four panels (shocks in gnp, prices, interest, money), each showing the variable, the baseline and the contribution of the shocks.]


8 Identification: Obtaining SVARs

8.1 Why Structural VARs

VARs are reduced form models. Therefore:

• Shocks are linear combinations of meaningful economic disturbances.

• It is difficult to relate responses computed from VARs with responses of theoretical models.

• They can't be used for policy analyses (Lucas critique).
• Can’t be used for policy analyses (Lucas critique).


What is an SVAR? It is a linear dynamic structural model of the form:

$$\mathcal{A}_0 y_t = \mathcal{A}_1 y_{t-1} + \dots + \mathcal{A}_q y_{t-q} + \epsilon_t, \qquad \epsilon_t \sim (0, \Sigma_\epsilon) \quad (24)$$

Its reduced form is:

$$y_t = A_1 y_{t-1} + \dots + A_q y_{t-q} + e_t, \qquad e_t \sim (0, \Sigma_e) \quad (25)$$

where $A_j = \mathcal{A}_0^{-1}\mathcal{A}_j$ and $e_t = \mathcal{A}_0^{-1}\epsilon_t$.

We want to go from (25) to (24), since (25) is easy to estimate (just use OLS equation by equation). To do this, we need $\mathcal{A}_0$. But to estimate it, we need restrictions, since $A_j$, $\Sigma_e$ have fewer free parameters than $\mathcal{A}_j$, $\Sigma_\epsilon$.

Distinguish: stationary vs. nonstationary VARs.


8.2 Stationary VARs

$$\text{VAR}: \quad y_t = A(\ell)y_{t-1} + e_t, \qquad e_t \sim (0, \Sigma_e) \quad (26)$$
$$\text{SVAR}: \quad \mathcal{A}_0 y_t = \mathcal{A}(\ell)y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim (0, \Sigma_\epsilon = \mathrm{diag}\{\sigma_i\}) \quad (27)$$

Log-linearized DSGE models are stationary SVARs! We know

$$y_{2t} = A_{22}y_{2t-1} + A_{21}y_{3t} \quad (28)$$
$$y_{1t} = A_{11}y_{2t-1} + A_{12}y_{3t} \quad (29)$$

where $y_{2t}$ are states, $y_{1t}$ are controls, $y_{3t}$ are shocks. So

$$\mathcal{A}_0 = \begin{bmatrix} A_{21} & 0 \\ 0 & A_{12} \end{bmatrix}^{-1}, \qquad \mathcal{A}(\ell) = \begin{bmatrix} A_{21} & 0 \\ 0 & A_{12} \end{bmatrix}^{-1}\begin{bmatrix} A_{22} & 0 \\ A_{11} & 0 \end{bmatrix}$$


(26) and (27) imply

$$\mathcal{A}_0 e_t = \epsilon_t \quad (30)$$

so that

$$\mathcal{A}_0^{-1}\Sigma_\epsilon\mathcal{A}_0^{-1\prime} = \Sigma_e \quad (31)$$

To recover the structural parameters from (31) we need at least as many equations as unknowns.

• Order condition: If there are $m$ variables, we need $m(m-1)/2$ restrictions. This is because there are $m^2$ free parameters on the left hand side of (31) and only $m(m+1)/2$ parameters in $\Sigma_e$ ($m^2 = m(m+1)/2 + m(m-1)/2$).

• Rank condition: the rank of $\mathcal{A}_0^{-1}\Sigma_\epsilon\mathcal{A}_0^{-1\prime}$ must equal the rank of $\Sigma_e$.

• Just identified vs. overidentified.


Example 9 i) The Choleski decomposition of $\Sigma_e$ imposes exactly $m(m-1)/2$ zero restrictions. Implications:

- $\mathcal{A}_0^{-1}$ is lower triangular.

- Variable $i$ does not affect variable $i-1$ contemporaneously, but it affects variable $i+1$.

ii) $y_t = [GDP_t, P_t, i_t, M_t]$. Then we need 6 restrictions, e.g.

$$\mathcal{A}_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ \alpha_{01} & 1 & 0 & \alpha_{02} \\ 0 & 0 & 1 & \alpha_{03} \\ \alpha_{04} & \alpha_{05} & \alpha_{06} & 1 \end{bmatrix}$$


How do you estimate an SVAR? Use a two-step approach:

- Get (unrestricted) estimates of $A(\ell)$ and $\Sigma_e$.

- Use the restrictions on $\mathcal{A}_0$ to estimate $\Sigma_\epsilon$ and the free parameters of $\mathcal{A}_0$.

- Use $A(\ell) = \mathcal{A}_0^{-1}\mathcal{A}(\ell)$ to trace out the structural dynamics.

Unless the system is in Choleski format, we need ML to estimate $\mathcal{A}_0$ in just identified systems (see appendix).

For over-identified systems, we always need ML to estimate $\mathcal{A}_0$.


Example 10 (Blanchard and Perotti, 2002) VAR with $T_t, g_t, y_t$. Assume $\mathcal{A}_0 e_t = B\epsilon_t$ where

$$\mathcal{A}_0 = \begin{bmatrix} 1 & 0 & a_{01} \\ 0 & 1 & a_{02} \\ a_{03} & a_{04} & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & b_1 & 0 \\ b_2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Impose that there is no discretionary response of $T_t$ and $g_t$ to $y_t$ within the quarter (information delay).

There are 6+3 (variance) parameters and at most 6 parameters in $\Sigma_e$. We need additional restrictions. Get information about $a_{01}, a_{02}$ from external sources; impose either $b_1 = 0$ or $b_2 = 0$.

With $a_{01}, a_{02}$ fixed, the two-step approach has an IV interpretation: $\epsilon_{1t}, \epsilon_{2t}$ are used as instruments in the third equation.


8.3 Nonstationary VARs

Let the VAR and SVAR be:

$$\Delta y_t = D(\ell)e_t = D(1)e_t + D^*(\ell)\Delta e_t \quad (32)$$
$$\Delta y_t = \mathcal{D}(\ell)A_0\epsilon_t = \mathcal{D}(1)A_0\epsilon_t + \mathcal{D}^*(\ell)A_0\Delta\epsilon_t \quad (33)$$

where $D(\ell) = (I - A(\ell)\ell)^{-1}$, $\mathcal{D}(\ell) = (I - \mathcal{A}(\ell)\ell)^{-1}$, $D^*(\ell) \equiv \frac{D(\ell) - D(1)}{1 - \ell}$, $\mathcal{D}^*(\ell) \equiv \frac{\mathcal{D}(\ell) - \mathcal{D}(1)}{1 - \ell}$. Matching coefficients: $\mathcal{D}(\ell)A_0\epsilon_t = D(\ell)e_t$.

Separating permanent and transitory components and using for the latter only contemporaneous restrictions we have

$$\mathcal{D}(1)A_0\epsilon_t = D(1)e_t \quad (34)$$
$$A_0\Delta\epsilon_t = \Delta e_t \quad (35)$$

If $y_t$ is stationary, $D(1) = \mathcal{D}(1) = 0$ and (34) is vacuous.


Two types of restrictions to estimate $A_0$: short and long run.

Example 11 In a VAR with two variables, imposing (34) requires one restriction. Suppose that $\mathcal{D}(1)_{12} = 0$ ($\epsilon_{2t}$ has no long run effect on $y_{1t}$). If $\Sigma_\epsilon = I$, the three distinct elements of $\mathcal{D}(1)A_0\Sigma_\epsilon A_0'\mathcal{D}(1)'$ can be obtained from the Choleski factor of $D(1)\Sigma_e D(1)'$.

• Blanchard-Quah: decomposition into permanent-transitory components (use (34)-(35)). If $y_t = [\Delta y_{1t}, y_{2t}]$ ($m \times 1$), $y_{1t}$ is I(1), $y_{2t}$ is I(0) and $y_t = \bar y + \mathcal{D}(\ell)\epsilon_t$, where $\epsilon_t \sim iid(0, \Sigma_\epsilon)$:

$$\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \end{pmatrix} = \begin{pmatrix} \bar y_1 \\ 0 \end{pmatrix} + \begin{pmatrix} \mathcal{D}_1(1) \\ 0 \end{pmatrix}\epsilon_t + \begin{pmatrix} (1-\ell)\mathcal{D}_1^\dagger(\ell) \\ (1-\ell)\mathcal{D}_2^\dagger(\ell) \end{pmatrix}\epsilon_t \quad (36)$$

and $\mathcal{D}_1(1) = [1, 0]$.

- $y_{2t}$ could be any variable which is stationary and is influenced by both shocks.
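A sketch of the long-run identification computation described above (Python/numpy; the reduced-form coefficients and covariance are hypothetical). The long-run impact of the structural shocks is taken from the Choleski factor of $D(1)\Sigma_e D(1)'$, so the second shock has no long-run effect on the first variable:

```python
import numpy as np

# hypothetical reduced-form estimates for a bivariate VAR(1)
A1 = np.array([[0.3, 0.1], [0.2, 0.5]])
Sigma_e = np.array([[1.0, 0.2], [0.2, 0.8]])

D1 = np.linalg.inv(np.eye(2) - A1)              # long-run multiplier D(1) of the reduced form
LR = np.linalg.cholesky(D1 @ Sigma_e @ D1.T)    # lower triangular long-run impact of the news
A0_inv = np.linalg.inv(D1) @ LR                 # impact matrix: e_t = A0_inv @ eps_t

# checks: A0_inv A0_inv' = Sigma_e and the (1,2) long-run response is zero
print(np.allclose(A0_inv @ A0_inv.T, Sigma_e))
print((D1 @ A0_inv).round(6))                   # upper-right element is (numerically) zero
```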


- Choleski systems

Example 12 Problems with standard identification:

$$p_t = a_{11}e_t^s \quad (37)$$
$$y_t = a_{21}e_t^s + a_{22}e_t^d \quad (38)$$

Price is set in advance of knowing demand shocks. Choleski ordering with $p$ first.

This is equivalent to estimating $p$ on lagged $p$ and lagged $y$ (this gives $e_t^s$) and then estimating $y$ on lagged $y$ and on current and lagged $p$ (this gives $e_t^d$).

$$y_t = a_{11}e_t^s \quad (39)$$
$$p_t = a_{21}e_t^s + a_{22}e_t^d \quad (40)$$

Quantity is set in advance of knowing demand shocks. Choleski ordering with $y$ first.

This is equivalent to estimating $y$ on lagged $y$ and lagged $p$ (this gives $e_t^s$) and then estimating $p$ on lagged $p$ and on current and lagged $y$ (this gives $e_t^d$).

In general, without a structural model in mind it is difficult to interpret Choleski systems. Cooley-LeRoy (1985): unless some strong restrictions are imposed, dynamic models do not have a Choleski structure.


- Long run restrictions: Faust-Leeper (1997).

[Figure: long run restrictions; impulse responses over horizons 5-20 when the restriction is not satisfied (left panel) and when it is satisfied (right panel).]


- Long run restrictions, Cooley-Dwyer (1998): take an RBC model driven by a unit root technology shock. Simulate data, run a VAR with $(y_t, n_t)$ and identify two shocks (permanent/transitory). It is possible to do this, yet transitory shocks explain a large portion of the variance of $y_t$.

- Long run restrictions, Erceg et al. (2005): long run restrictions perform poorly in small samples. Chari et al. (2006): potentially important truncation bias due to a VAR(q) with $q$ finite.


- Short run restrictions (Canova-Pina (2006))

The DGP is a 3-equation New Keynesian model.

[Figure: true responses vs. inertial responses.]


Summary

- It is problematic to relate SVARs identified with Choleski, short run or long run restrictions to theories.

- Solution: link SVARs more closely to theory; use restrictions which are more common in DSGE models.


8.4 Alternative identification schemes

Canova-De Nicolo' (2002), Faust (1998), Uhlig (1999): use sign (and shape) restrictions.

Example 13 i) Aggregate supply shocks: $Y \uparrow$, $Inf \downarrow$; aggregate demand shocks: $Y \uparrow$, $Inf \uparrow$ → demand and supply shocks impose different sign restrictions on $cov(Y_t, Inf_s)$. These restrictions are shared by a large class of models with different foundations. Use them for identification.

ii) Monetary shocks: the response of $Y$ is hump-shaped and dies out in 3-4 quarters → shape restrictions on $cov(Y_t, i_s)$. Use these for identification.


Exploit the non-uniqueness of the news.

- Given any set of orthogonal news, check if the responses of $y_{it}$ to the shocks $\epsilon_{jt}$ have the right sign. If not:

- Construct another set of news and repeat the exercise.

- Stop when you find an $\epsilon_{jt}$ with the right characteristics, or

- Take all the representations satisfying the restrictions and compute the mean/median (and s.e.) of the statistics of interest across all of those satisfying the restrictions.


Implementation of sign restrictions (Canova-De Nicolo' (2002)):

• Orthogonalize: $\Sigma_e = \tilde P\tilde P'$ (e.g. Choleski or eigenvalue-eigenvector decomposition).

• Check if any shock produces the required correlation pattern for $(y_{it}, y_{i't})$. If not:

• For any $H$ with $HH' = I$, $\Sigma_e = \tilde PHH'\tilde P' = \hat P\hat P'$.

• Check if any shock under the new decomposition produces the required correlation pattern for $(y_{it}, y_{i't})$. If not, choose another $H$, etc.


• The number of possible $H$ is infinite. Write $H = H(\omega)$, $\omega \in (0, 2\pi)$. The $H(\omega)$ are called rotation (Givens) matrices.

Example 14 Suppose $m = 2$. Then

$$H(\omega) = \begin{bmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{bmatrix} \quad\text{or}\quad H(\omega) = \begin{bmatrix} \cos(\omega) & \sin(\omega) \\ \sin(\omega) & -\cos(\omega) \end{bmatrix}$$

Varying $\omega$, we trace out all possible structural MA representations that could have generated the data.


Example 15 Comparing responses to US monetary shocks, 1964-2001.

[Figure: responses of prices, output and money under sign restrictions (left column) and Choleski restrictions (right column); horizon in months.]


Example 16 Studying the effects of fiscal shocks in US states, 1950-2005.

             corr(G,Y)  corr(T,Y)  corr(G,DEF)  corr(T,DEF)  corr(G,T)
G shocks       > 0         > 0        > 0
BB shocks      < 0         = 0        = 1
Tax shocks     < 0         < 0        = 0

Table 4: Identification restrictions


8.5 Sign restrictions in large systems

• The use of rotation matrices is complicated in large scale systems since there are many rotations one needs to consider.

Algorithm 8.1
1. Start from some orthogonal representation $y_t = \tilde D(\ell)\epsilon_t$.
2. Draw an $m \times m$ matrix $G$ from N(0,1). Find the QR decomposition $G = QR$.
3. Compute responses as $D^0(\ell) = \tilde D(\ell)Q$. Check if the restrictions are satisfied.
4. Repeat 2.-3. until $L$ draws are found.
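A sketch of Algorithm 8.1, assuming Python/numpy (the VAR coefficients, the horizon and the particular sign pattern checked are made up; a sign normalization on the diagonal of R is one common convention for the QR draw):

```python
import numpy as np

rng = np.random.default_rng(4)
m, q, hor, L = 2, 1, 8, 200
A = [np.array([[0.5, 0.1], [0.1, 0.4]])]                      # hypothetical VAR(1) coefficients
P = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 0.5]]))    # starting orthogonalization

def responses(impact):
    D = [np.eye(m)]
    for tau in range(1, hor):
        D.append(sum(D[tau - j] @ A[j - 1] for j in range(1, min(tau, q) + 1)))
    return np.array([d @ impact for d in D])                  # hor x m x m

kept = []
while len(kept) < L:
    G = rng.standard_normal((m, m))
    Q, R = np.linalg.qr(G)
    Q = Q @ np.diag(np.sign(np.diag(R)))                      # normalize the rotation draw
    irf = responses(P @ Q)
    # example restriction: shock 1 moves both variables up on impact and one period later
    if (irf[:2, :, 0] > 0).all():
        kept.append(irf)

irfs = np.array(kept)
print(irfs.mean(axis=0)[0].round(3))                          # mean impact responses across draws
```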


9 Interpretation problems with VARs

• Time aggregation (Hansen-Sargent (1991), Marcet (1991)).

- Agents take decisions at a frequency which is different from the frequency of the data available to the econometrician. What are the consequences?

- The MA representation of the econometrician is a complex combination of the MA representations induced by agents' actions.


Example 17 A hump-shaped monthly response can be transformed into a smoothly declining quarterly response.

[Figure: size of responses at monthly and quarterly frequency, horizons 0-14 months.]

How do you detect aggregation problems? Run a VAR with data at different frequencies, if you can. Check if differences exist.


• Non-linearities

Example 18 (Markov switching model). Suppose $P(s_t = 1|s_{t-1} = 1) = p$, $P(s_t = 0|s_{t-1} = 0) = q$. This process has a linear AR representation

$$s_t = (1 - q) + (p + q - 1)s_{t-1} + e_t$$

and as long as either $p$ or $q$ or both are less than one an MA representation exists. Good!!!

But the errors are non-normal (binomial). Conditional on $s_{t-1} = 1$:

$$e_t = 1 - p \ \text{with probability } p, \qquad e_t = -p \ \text{with probability } 1 - p \quad (41)-(42)$$

Conditional on $s_{t-1} = 0$:

$$e_t = -(1 - q) \ \text{with probability } q, \qquad e_t = q \ \text{with probability } 1 - q \quad (43)-(44)$$


• How do you check for normality/nonlinearities?

- If $e_t$ is normal:

$$T^{0.5}\begin{bmatrix} S_3 \\ S_4 - 3I_m \end{bmatrix} \sim N\!\left(0, \begin{bmatrix} 6I_m & 0 \\ 0 & 24I_m \end{bmatrix}\right)$$

where $S_j$ is the $j$-th estimated moment of $e_t$.

- Regress $\hat e_t$ on $y_{t-1}^2$, $\log y_{t-1}$, etc. Check significance.
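A sketch of the skewness/kurtosis check in Jarque-Bera form, which combines the two asymptotic results above (Python/numpy; `resid` stands for the estimated VAR residuals and is replaced here by simulated normal data):

```python
import numpy as np

rng = np.random.default_rng(5)
T, m = 500, 2
resid = rng.standard_normal((T, m))          # stand-in for the estimated VAR residuals

u = (resid - resid.mean(0)) / resid.std(0)   # standardize each residual series
S3 = (u**3).mean(0)                          # estimated third moments
S4 = (u**4).mean(0)                          # estimated fourth moments

# under normality: T * (S3^2/6 + (S4 - 3)^2/24) is asymptotically chi^2(2) per equation
jb = T * (S3**2 / 6 + (S4 - 3)**2 / 24)
print(jb.round(2))                           # compare with the chi^2(2) critical value, e.g. 5.99
```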


• Stationarity is violated

Example 19 Great Moderation.

- Changes in the variance of the process are continuous. Can't really use subsample analysis.

- There exists a version of the Wold theorem without covariance stationarity:

$$y_t^\dagger = a_t y_{-\infty} + \sum_{j=0}^{\infty} D_{jt}e_{t-j}$$

where $var(e_t) = \Sigma_t$.

• Use time-varying coefficient VARs with e.g. stochastic volatility.


• Small scale VARs. People use them because:

a) Estimates are more precise.
b) It is easier to identify shocks.

But they generate:

- Omitted variables (Braun-Mittnik (1993)).
- Misaggregation of shocks (Cooley-Dwyer (1998), Canova-Pina (2006)).


What is the consequence of omitting variables?

In a bivariate VAR(q),

$$\begin{bmatrix} A_{11}(\ell) & A_{12}(\ell) \\ A_{21}(\ell) & A_{22}(\ell) \end{bmatrix}\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix},$$

the univariate representation for $y_{1t}$ is

$$[A_{11}(\ell) - A_{12}(\ell)A_{22}(\ell)^{-1}A_{21}(\ell)]y_{1t} = e_{1t} - A_{12}(\ell)A_{22}(\ell)^{-1}e_{2t} \equiv \upsilon_t \quad (45)$$

Example 20 Suppose $m = 4$ and we estimate a bivariate VAR; there are three possible models. It is easy to verify that the system with variables 1 and 3 has errors

$$\begin{bmatrix} \upsilon_{1t} \\ \upsilon_{2t} \end{bmatrix} \equiv \begin{bmatrix} e_{1t} \\ e_{3t} \end{bmatrix} - \Psi(\ell)\Phi(\ell)\begin{bmatrix} e_{2t} \\ e_{4t} \end{bmatrix}$$

where $\Psi(\ell) = \begin{bmatrix} A_{12}(\ell) & A_{14}(\ell) \\ A_{32}(\ell) & A_{34}(\ell) \end{bmatrix}$ and $\Phi(\ell) = \begin{bmatrix} A_{22}(\ell) & A_{24}(\ell) \\ A_{42}(\ell) & A_{44}(\ell) \end{bmatrix}^{-1}$.


• A true $m$-variable VAR(1) is transformed into a VAR($\infty$) with disturbances $\upsilon_t$ if only $m_1 < m$ variables are used.


What is the problem of omitting shocks?

Aggregation theorem (Faust and Leeper (1997)): in the structural MA representation for a partition with $m_1 < m$ variables, the reduced form innovations are combinations of the $m$ structural shocks, $e_t = D^\ddagger(\ell)\epsilon_t$.


If there are $m_a$ shocks of one type and $m_b$ shocks of another type, $m_a + m_b = m$ and $m_1 = 2$, then:

• $e_{it}$, $i = 1, 2$, recovers a linear combination of shocks of type $i' = a, b$ only if $D^\ddagger(\ell)$ is block diagonal.

• $e_{it}$, $i = 1, 2$, recovers a linear combination of current shocks of type $i' = a, b$ only if $D^\ddagger(\ell) = D^\ddagger$, $\forall\ell$, and block diagonal.


Example 21 Suppose $m = 4$, $m_1 = 2$, $m_2 = 2$. Then

$$\begin{bmatrix} D_{11}^\ddagger(\ell) & D_{12}^\ddagger(\ell) & D_{13}^\ddagger(\ell) & D_{14}^\ddagger(\ell) \\ D_{21}^\ddagger(\ell) & D_{22}^\ddagger(\ell) & D_{23}^\ddagger(\ell) & D_{24}^\ddagger(\ell) \end{bmatrix}\begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \\ \epsilon_{4t} \end{bmatrix} = \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

- $e_{1t}$ recovers type 1 shocks if $D_{13}^\ddagger(\ell) = D_{14}^\ddagger(\ell) = 0$, and $e_{2t}$ recovers type 2 shocks if $D_{21}^\ddagger(\ell) = D_{22}^\ddagger(\ell) = 0$.

- $e_{1t}$ recovers current type 1 shocks if $D_{ii'}^\ddagger(\ell) = D_{ii'}^\ddagger$, $\forall\ell$, $i, i' = 1, 2$.


• Non-Wold decompositions (Lippi-Reichlin (1994), Leeper (1991), Hansen-Sargent (1991)). Certain economic models do not have a fundamental MA representation, e.g. diffusion models, models where agents anticipate tax changes.

Example 22 Hall consumption/saving problem. Assume $y_t = e_t$, a white noise. Assume $\beta = R^{-1} < 1$ and quadratic preferences. The solution for consumption is $c_t = c_{t-1} + (1 - R^{-1})e_t$. No problem!!

If we only observe saving out of labor income, $s_t = y_t - c_t$, the solution is

$$s_t - s_{t-1} = R^{-1}e_t - e_{t-1} \quad (49)$$

(49) is non-fundamental: the coefficient on $e_t$ is less than the coefficient on $e_{t-1}$.


Estimating a fundamental MA on the observed data delivers instead

$$s_t - s_{t-1} = u_t - R^{-1}u_{t-1} \quad (50)$$

Different shapes!! Same autocovariance generating function.

• Relationship between DSGE models and VARs

The log-linearized solution of a DSGE model is of the form:

$$y_{2t} = A_{22}(\theta)y_{2t-1} + A_{21}(\theta)y_{3t} \quad (51)$$
$$y_{1t} = A_{11}(\theta)y_{2t-1} + A_{12}(\theta)y_{3t} \quad (52)$$

where $y_{2t}$ are the states and the driving forces, $y_{1t}$ the controls, $y_{3t}$ the shocks.


- If both $y_{2t}$ and $y_{1t}$ are observable, the DSGE is a restricted VAR(1).

- If $y_{2t}$ is omitted, what is the representation of $y_{1t}$?

• Three alternative results for reduced systems with only $y_{1t}$:

A true VAR(p) model is transformed into either a VAR($\infty$), a VARMA(p-1,p-1) or a VARMA(p,p), depending on the assumptions made.


Example 23 Suppose

$$y_t = k_t + e_t \quad (53)$$
$$k_t = a_1 k_{t-1} + a_0\epsilon_t \quad (54)$$

with $a_1$ the persistence and $a_0$ the contemporaneous effect. If we observe both $y_t$ and $k_t$, this is a restricted VAR(1). No problem.

If only $y_t$ is observable,

$$\frac{1 - a_1\ell}{1 + a_0 - a_1\ell}\,y_t = e_t \quad\text{or}\quad y_t = \frac{a_0}{1 + a_0}\sum_j\Big(\frac{a_1}{1 + a_0}\Big)^j y_{t-j} + e_t \quad (55)$$

If $a_0$ is small and $a_1$ high, $(\frac{a_1}{1+a_0})^j$ will be large even for large $j$. We need a very long lag length to whiten the residuals. If long run restrictions are used, there is a potentially important truncation bias.


Summary

- A system with a reduced number of variables needs a very generous lag length to approximate the dynamics of the true model.

- If the sample size is short this could be a problem.

- Omitting a "state" is much more important than omitting a "control".

- The omission does not matter very much if the true model has dynamics which die out quickly.

See Chari, Kehoe and McGrattan (2006), Christiano, Eichenbaum and Vigfusson (2006), Fernandez-Villaverde et al. (2007), Ravenna (2007).


10 Exercises

1) Take quarterly data for output growth and inflation for your favorite country. Identify supply and demand shocks by finding all the rotations which satisfy the following restrictions: supply, $\Delta y \uparrow$, $Inf \downarrow$; demand, $\Delta y \uparrow$, $Inf \uparrow$. What do the impulse responses produced by the rotations that jointly satisfy the restrictions look like? How do they compare with those obtained using the restriction that only supply shocks affect output in the long run, but both demand and supply shocks can affect inflation in the long run?

2) Consider the following New Keynesian model

$$x_t = E_t x_{t+1} + \frac{1}{\varphi}(i_t - E_t\pi_{t+1}) + v_{1t} \quad (56)$$
$$\pi_t = \beta E_t\pi_{t+1} + \kappa x_t + v_{2t} \quad (57)$$
$$i_t = \phi_r i_{t-1} + (1 - \phi_r)(\phi_\pi\pi_t + \phi_x x_t) + v_{3t} \quad (58)$$

where $x_t$ is the output gap, $\pi_t$ the inflation rate and $i_t$ the nominal interest rate.


i) Plot the impulse responses to the three shocks.

ii) Simulate 11000 data points from this model after you have set $\varphi = 1.5$, $\beta = 0.99$, $\kappa = 0.8$, $\phi_r = 0.6$, $\phi_\pi = 1.2$, $\phi_x = 0.2$, $\rho_1 = 0.9$, $\rho_2 = 0.9$, $\sigma_1 = \sigma_2 = \sigma_3 = 0.1$, and discard the first 1000. With the remaining data estimate a three variable VAR. In particular, (i) estimate the lag length optimally, (ii) check if the model you have selected has well specified residuals, and (iii) check whether you detect breaks in the specification or not.

iii) With the estimated model apply a Choleski decomposition in the order $(y, \pi, R)$ and check how the impulse responses compare with the true ones in i). Is there any noticeable difference? Why?

iv) Now try the ordering $(R, y, \pi)$. Do you notice any difference with iii)? Why?

3) Obtain data for output and hours (employment) for your favorite country (each group should use a different country). Construct a measure of labor productivity and run a VAR on labor productivity and hours as in Gali (1999, AER), after you have appropriately selected the statistical nature of the model. Identify technology shocks as the only source of labor productivity in the long run. How much of the fluctuations in hours and labor productivity do they explain at the 4 year horizon? Repeat the exercise using the restriction that in response to technology shocks output and labor productivity must increase contemporaneously. Are technology shocks a major source of cyclical fluctuations?


Appendix: Inference in SVARs

- This applies to Choleski, non-recursive and long run identifications.

- This applies to classical or Bayesian inference (flat prior).

• Find the maximum likelihood estimators of $A_j$ and $\mathcal{A}_0$ (this is enough to find the mode of $\mathcal{A}_j$).

• Find the posterior distribution of $A_j$ and $\mathcal{A}_0$ (to get the posterior of $\mathcal{A}_j$).

If the prior on $A_j$, $\mathcal{A}_0$ is non-informative and data are abundant, the shape of the likelihood is the same as the shape of the posterior. In the other cases Bayesian analysis is different from classical analysis.


Assume $\Sigma_\epsilon = I$. The likelihood of the SVAR is

$$L(A_j, \mathcal{A}_0|y) \propto |\mathcal{A}_0^{-1}\mathcal{A}_0^{-1\prime}|^{-0.5T}\exp\{-0.5\sum_t(y_t - A(\ell)y_{t-1})'(\mathcal{A}_0^{-1}\mathcal{A}_0^{-1\prime})^{-1}(y_t - A(\ell)y_{t-1})\} \quad (59)$$
$$= |\mathcal{A}_0|^T\exp\{-0.5\sum_t(y_t - A(\ell)y_{t-1})'(\mathcal{A}_0^{-1}\mathcal{A}_0^{-1\prime})^{-1}(y_t - A(\ell)y_{t-1})\} \quad (60)$$

If there are no restrictions on $A_j$, $A(\ell)_{ML} = A(\ell)_{OLS}$ and $var(A(\ell)_{ML}) = \mathcal{A}_0^{-1}\mathcal{A}_0^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1}$, $Y_{t-1} = [y_{t-1}, \dots, y_{t-p}]$. Nice, because easy to compute.


Using the estimator $A(\ell)_{ML}$ in the likelihood we have:

$$L(A(\ell) = A(\ell)_{ML}, \mathcal{A}_0|y) \propto |\mathcal{A}_0|^T\exp\{-0.5\,tr(S_{ML}\mathcal{A}_0'\mathcal{A}_0)\} \quad (61)$$

where $S_{ML} = (y_t - A(\ell)_{ML}y_{t-1})'(y_t - A(\ell)_{ML}y_{t-1})/(T - k)$, $k$ is the number of regressors in each equation, and $tr$ is the trace of the matrix.

Conclusion (two step approach):

a) Find $A(\ell)_{ML}$.
b) Maximize (61) to find $\mathcal{A}_0$.
c) Use $\mathcal{A}_j = \mathcal{A}_0 A_j$ to trace out the structural dynamics.

Typically it is difficult to maximize analytically; we need numerical routines (both for likelihood and posterior computations).


Note that if, instead of conditioning on $A(\ell)_{ML}$, we integrate it out, we have:

$$L(\mathcal{A}_0|y) \propto |\mathcal{A}_0|^{T-k}\exp\{-0.5\,tr(S_{ML}\mathcal{A}_0'\mathcal{A}_0)\}$$

so if $g(\mathcal{A}_0) \propto |\mathcal{A}_0|^k$, then $g(\mathcal{A}_0|y) \propto L(\mathcal{A}_0|y, A(\ell) = A(\ell)_{ML})$.

• Bayesian analysis with flat priors is equivalent to classical analysis conditional on $A(\ell)_{ML}$.


Summary

• Choleski identification, no restrictions on the VAR. Maximization of (61) implies that $\mathcal{A}_0\mathcal{A}_0' = (S_{ML}/T)^{-1}$. Hence $\hat{\mathcal{A}}_0 = chol((S_{ML}/T)^{-1})$. Nice shortcut.

• Non-recursive identification, no restrictions on the VAR. Need to maximize (61) (no shortcuts possible).

• Long run restrictions. Note that $\mathcal{A}(1)^{-1} = A(1)^{-1}\mathcal{A}_0^{-1}$. If $\mathcal{A}(1)^{-1}$ is lower triangular, $\mathcal{A}_0$ can be found using $\mathcal{A}_0\mathcal{A}_0' = (S_{ML}/T)^{-1}$ with $A(1)_{ML}^{-1}\mathcal{A}_0^{-1}$ lower triangular. The solution is

$$\hat{\mathcal{A}}_0 = \left[chol\!\left(A(1)_{ML}^{-1}(S_{ML}/T)A(1)_{ML}^{-1\prime}\right)\right]^{-1}A(1)_{ML}^{-1}$$

• If the long run system is not recursive, the solution is more complicated.

• If the system is over-identified, we can't use a two step approach; we need to jointly maximize the likelihood with respect to $A_j$, $\mathcal{A}_0$.


• With sign restrictions no maximization is needed. Find the region of the parameter space which satisfies the restrictions. This can be done numerically using a version of an acceptance sampling algorithm.


Monte Carlo standard errors for impulse responses

If the prior on $A_j$, $\mathcal{A}_0$ is non-informative, the posterior is proportional to the likelihood. The likelihood of the VAR is the product of a normal for $A(\ell)$, conditional on $A(\ell)_{ML}$ and $\Sigma^{-1}$, and a Wishart for $\Sigma^{-1}$. Then the algorithm works as follows (Choleski system):

Algorithm 10.1
1. Draw $\Sigma^{-1}$ from a Wishart, conditional on the data.
2. Set $\mathcal{A}_0^l = chol((\Sigma^{-1})^l)$.
3. Draw $A(\ell)^l$ from a Normal with mean $A(\ell)_{ML}$ and variance $(\mathcal{A}_0^l)^{-1}(\mathcal{A}_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1}$.
4. Set $\mathcal{A}(\ell)^l = A(\ell)^l\mathcal{A}_0^l$. Compute $(\mathcal{A}(\ell)^l)^{-1}$ (the MA of the model).
5. Repeat steps 1.-4. $L$ times. Order the draws and compute percentiles.
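A sketch in the spirit of Algorithm 10.1 for a Choleski system (Python/numpy/scipy). The Wishart degrees of freedom and scale follow one common flat-prior convention rather than necessarily the slides' exact one, the VAR is assumed to have no constant, and for simplicity the coefficient draw uses $\Sigma^l \otimes (X'X)^{-1}$ as the variance:

```python
import numpy as np
from scipy.stats import wishart

def irf_choleski(A_stack, A0_inv, m, q, hor):
    """MA coefficients (recursion of Example 7) times the Choleski impact matrix."""
    A = [A_stack[:, j*m:(j+1)*m] for j in range(q)]
    D = [np.eye(m)]
    for tau in range(1, hor):
        D.append(sum(D[tau - j] @ A[j - 1] for j in range(1, min(tau, q) + 1)))
    return np.array([d @ A0_inv for d in D])

def mc_bands(Y, X, m, q, hor=20, L=500, seed=0):
    rng = np.random.default_rng(seed)
    Te, k = X.shape
    A_ml = np.linalg.lstsq(X, Y, rcond=None)[0]            # k x m
    U = Y - X @ A_ml
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = []
    for _ in range(L):
        Sig_inv = wishart.rvs(df=Te - k, scale=np.linalg.inv(U.T @ U), random_state=rng)
        Sig = np.linalg.inv(Sig_inv)
        A0_inv = np.linalg.cholesky(Sig)                   # Choleski impact matrix draw
        V = np.kron(Sig, XtX_inv)                          # variance of the stacked coefficients
        a = rng.multivariate_normal(A_ml.T.ravel(), V)     # draw the VAR coefficients
        A_draw = a.reshape(m, k)                           # m x k, rows = equations
        draws.append(irf_choleski(A_draw, A0_inv, m, q, hor))
    draws = np.array(draws)                                # L x hor x m x m
    return np.percentile(draws, [16, 50, 84], axis=0)      # pointwise bands

# usage with simulated stand-in data (q = 1, m = 2)
rng0 = np.random.default_rng(1)
y = rng0.standard_normal((300, 2))
X, Y = y[:-1], y[1:]
bands = mc_bands(Y, X, m=2, q=1, hor=10, L=200)
print(bands.shape)        # (3, 10, 2, 2)
```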


If the restrictions are not in Choleski format, substitute step 2. with the maximization of the likelihood of $\mathcal{A}_0$. If the system is overidentified, another approach is needed (see chapter 10). For long run restrictions use:

Algorithm 10.2
1. Draw $\Sigma^{-1}$ from a Wishart, conditional on the data.
2. Set $\mathcal{A}_0^l = \left[chol\!\left(A(1)_{ML}^{-1}\Sigma^l A(1)_{ML}^{-1\prime}\right)\right]^{-1}A(1)_{ML}^{-1}$.
3. Draw $A(\ell)^l$ from a Normal with mean $A(\ell)_{ML}$ and variance $(\mathcal{A}_0^l)^{-1}(\mathcal{A}_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1}$.
4. Repeat steps 1.-3. $L$ times. Order the draws and compute percentiles.


For a system where sign restrictions are imposed the approach is easy. We just need draws for $\Sigma$ and $\mathcal{A}_0$. The algorithm is:

Algorithm 10.3
1. Choose an $H$ such that $HH' = I$.
2. Draw $\Sigma^{-1}$ from a Wishart, conditional on the data.
3. Set $\mathcal{A}_0^l = H\,\mathrm{sqrt}(\Sigma^l)$.
4. Draw $A(\ell)^l$ from a Normal with mean $A(\ell)_{ML}$ and variance $(\mathcal{A}_0^l)^{-1}HH'(\mathcal{A}_0^l)^{-1\prime} \otimes (Y_{t-1}'Y_{t-1})^{-1}$.
5. Set $\mathcal{A}(\ell)^l = A(\ell)^l\mathcal{A}_0^l$. Compute $(\mathcal{A}(\ell)^l)^{-1}$ (the MA of the model). If column $i$ of $(\mathcal{A}(\ell)^l)^{-1}$ satisfies the sign restrictions, keep the draw; otherwise discard it.
6. Repeat steps 1.-5. until $L$ draws are obtained. Order the draws and compute the median, mode, mean, percentiles, etc.

• One could also randomize on $H$. There are many $H$ such that $HH' = I$, and one could have a prior over them. Since $H$ does not enter the likelihood, the posterior of $H$ equals the prior of $H$.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!