VAR models
Fabio Canova
ICREA-UPF, AMeN and CEPR
March 2009
Outline
• Wold Theorem and VAR Specification.
• Coefficients and covariance matrix estimation.
• Computing impulse responses, variance and historical decompositions.
• Structural VARs.
• Interpretation problems.
References
Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ, ch. 10-11.
Canova, F. (1995), "VAR Models: Specification, Estimation, Inference and Forecasting", in H. Pesaran and M. Wickens, eds., Handbook of Applied Econometrics, ch. 2, Blackwell, Oxford, UK.
Canova, F. (1995), "The Economics of VAR Models", in K. Hoover, ed., Macroeconometrics: Tensions and Prospects, Kluwer Press, NY, NY.
Blanchard, O. and Quah, D. (1989), "The Dynamic Effect of Aggregate Demand and Supply Disturbances", American Economic Review, 79, 655-673.
Canova, F. and Pina, J. (2005), "Monetary Policy Misspecification in VAR Models", in C. Diebolt and C. Krystou, eds., New Trends in Macroeconomics, Springer Verlag.
Canova, F. and De Nicolo', G. (2002), "Money Matters for Business Cycle Fluctuations in the G7", Journal of Monetary Economics, 49, 1131-1159.
Cooley, T. and Dwyer, M. (1998), "Business Cycle Analysis without Much Theory: A Look at Structural VARs", Journal of Econometrics, 83, 57-88.
Faust, J. (1998), "On the Robustness of Identified VAR Conclusions about Money", Carnegie-Rochester Conference Series on Public Policy, 49, 207-244.
Faust, J. and Leeper, E. (1997), "Do Long Run Restrictions Really Identify Anything?", Journal of Business and Economic Statistics, 15, 345-353.
Hansen, L. and Sargent, T. (1991), "Two Difficulties in Interpreting Vector Autoregressions", in Hansen, L. and Sargent, T., eds., Rational Expectations Econometrics, Westview Press, Boulder and London.
Kilian, L. (1998), "Small Sample Confidence Intervals for Impulse Response Functions", Review of Economics and Statistics, 218-230.
Lippi, M. and Reichlin, L. (1993), "The Dynamic Effect of Aggregate Demand and Supply Disturbances: A Comment", American Economic Review, 83, 644-652.
Lippi, M. and Reichlin, L. (1994), "VAR Analysis, Non-Fundamental Representation, Blaschke Matrices", Journal of Econometrics, 63, 307-325.
Marcet, A. (1991), "Time Aggregation of Econometric Time Series", in Hansen, L. and Sargent, T., eds., Rational Expectations Econometrics, Westview Press, Boulder and London.
Sims, C. and Zha, T. (1999), "Error Bands for Impulse Responses", Econometrica, 67, 1113-1155.
Sims, C., Stock, J. and Watson, M. (1990), "Inference in Linear Time Series Models with Some Unit Roots", Econometrica, 58, 113-144.
Chari, V., Kehoe, P. and McGrattan, E. (2004), "A Critique of Structural VARs Using Business Cycle Theory", Federal Reserve Bank of Minneapolis, working paper 631.
Fernandez Villaverde, J., Rubio Ramirez, J., Sargent, T. and Watson, M. (2007), "The ABC (and D's) to Understand VARs", American Economic Review.
Uhlig, H. (2005), "What Are the Effects of Monetary Policy? Results from an Agnostic Identification Procedure", Journal of Monetary Economics.
Erceg, C., Guerrieri, L. and Gust, C. (2005), "Can Long Run Restrictions Identify Technology Shocks?", Journal of the European Economic Association.
Giordani, P. (2004), "An Alternative Explanation of the Price Puzzle", Journal of Monetary Economics, 51, 1271-1296.
Dedola, L. and Neri, S. (2007), "What Does a Technology Shock Do? A VAR Analysis with Model-Based Sign Restrictions", Journal of Monetary Economics.
1 Preliminary
• Lag operator: ℓ y_t = y_{t-1}; ℓ^i y_t = y_{t-i}, where y_t is an m × 1 vector.
• Matrix lag operator (normalization a_0 = I):
a(ℓ) y_t ≡ a_0 y_t + a_1 ℓ y_t + a_2 ℓ² y_t + ... + a_q ℓ^q y_t
         = y_t + a_1 y_{t-1} + a_2 y_{t-2} + ... + a_q y_{t-q}   (1)

Example 1 y_t = e_t + d_1 e_{t-1} + d_2 e_{t-2}. Using the lag operator, y_t = e_t + d_1 ℓ e_t + d_2 ℓ² e_t, or y_t = (1 + d_1 ℓ + d_2 ℓ²) e_t ≡ d(ℓ) e_t.

Example 2 y_t = a_1 y_{t-1} + e_t. Using the lag operator, y_t = a_1 ℓ y_t + e_t, or (1 − a_1 ℓ) y_t = e_t, or a(ℓ) y_t = e_t.
2 What are VARs?
- They are multivariate autoregressive linear time series models of the form
y_t = A_1 y_{t-1} + A_2 y_{t-2} + ... + A_q y_{t-q} + e_t,   e_t ∼ (0, Σ_e)   (2)
where y_t is an m × 1 vector and the A_j are m × m matrices, each j = 1, ..., q.
Advantages:
- Every variable is interdependent and endogenous.
- Any y_t has an autoregressive representation under some conditions.
- Simple to use and estimate.
Disadvantages:
- A VAR is a reduced form model; no economic interpretation is possible.
- Potentially difficult to relate VAR dynamics with DSGE dynamics.
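A minimal simulation sketch of (2): a bivariate VAR(2) with hypothetical coefficient matrices A_1, A_2 and covariance Σ_e (the numbers below are illustrative, not from the notes).

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, T = 2, 2, 200
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # hypothetical lag-1 coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])   # hypothetical lag-2 coefficients
Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])

y = np.zeros((T, m))
e = rng.multivariate_normal(np.zeros(m), Sigma_e, size=T)
for t in range(q, T):                      # equation (2), one step at a time
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + e[t]
```

Every variable depends on lags of every other variable, which is the "interdependent and endogenous" point above.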
3 Wold theorem and the news
Wold Theorem: Under linearity and stationarity, any vector of time series y†_t can be written as y†_t = a y_{-∞} + Σ_{j=0}^∞ D_j e_{t-j}, where y_{-∞} contains constants, e_{t-j} are the news at t − j, the D_j are m × m matrices, each j, and a is an m × k matrix of coefficients.
- Let y_t ≡ y†_t − a y_{-∞}. The Wold theorem tells us that, apart from initial conditions, time series are the accumulation over time of news.
- A news e_t = 1 at time t has effect D_0 on y_t, D_1 on y_{t+1}, D_2 on y_{t+2}, etc. Hence y_t is a moving average (MA) of the news, i.e. y_t = D(ℓ) e_t.
Two issues
a) If F_{t-1} is the information available at t − 1, the news are
e_t = y_t − E[y_t|F_{t-1}]   (3)
• The news are unpredictable given the past (E(e_t|F_{t-1}) = 0), but contemporaneously correlated (e_t ∼ (0, Σ_e)).
To give a name to the news in each equation, we need to find a matrix P̃ such that P̃ P̃′ = Σ_e. Then:
y_t = D(ℓ) P̃ P̃^{-1} e_t = D̃(ℓ) ẽ_t,   ẽ_t ∼ (0, P̃^{-1} Σ_e P̃^{-1}′ = I)   (4)
Examples of P̃: the Choleski (lower triangular) factor; P̃ = PΛ^{0.5}, where P is the eigenvector matrix and Λ the eigenvalue matrix; etc.
Example 3 If Σ_e = [1 4; 4 25], its Choleski factor is P̃ = [1 0; 4 3], so that P̃^{-1} e_t ∼ (0, I).

b) The news are not uniquely defined.
In fact, for any H such that HH′ = I,
y_t = D(ℓ) e_t = D(ℓ) H H′ e_t = D̃(ℓ) ẽ_t   (5)
and E(e_t e′_t) = E(ẽ_t ẽ′_t).
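Example 3 and the two factorizations of Σ_e mentioned above can be checked numerically (a small sketch; numpy's `cholesky` returns the lower triangular factor):

```python
import numpy as np

Sigma_e = np.array([[1.0, 4.0], [4.0, 25.0]])

# Choleski factor: lower triangular P with P P' = Sigma_e
P = np.linalg.cholesky(Sigma_e)            # [[1, 0], [4, 3]], as in Example 3

# eigenvalue-eigenvector alternative: P2 = V Lambda^{0.5}
w, V = np.linalg.eigh(Sigma_e)
P2 = V @ np.diag(np.sqrt(w))               # also satisfies P2 P2' = Sigma_e
```

Both factors reproduce Σ_e, illustrating point b): the orthogonalization is not unique.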
• Standard packages choose the "fundamental" news representation, i.e. the one for which D_0 is the largest among all the D_j coefficients.
• Some economic models imply non-fundamental representations (e.g. models where news are anticipated); see later on.
VARs
• If the D_j coefficients decay to zero fast enough, D(ℓ) is invertible and
y_t = D(ℓ) e_t
D(ℓ)^{-1} y_t = e_t
y_t = A(ℓ) y_{t-1} + e_t   (6)
where I − A(ℓ)ℓ = D(ℓ)^{-1}.
• A VAR(∞) can represent any vector of time series y_t under linearity, stationarity and invertibility.
• A VAR(q), q fixed, approximates y_t well if the D_j are close to zero for j large.
Summary
- We can represent any data with a linear VAR(∞) under the assumptions made.
- With a finite sample of data, we need to carefully check the lag length of the VAR (news can't be predictable).
- If we want a constant coefficient representation, we need stationarity of y_t.
4 Specification
Many ways of choosing the lag length:
A) Likelihood ratio (LR) test
LR = 2[ln L(α^un, Σ^un_e) − ln L(α^re, Σ^re_e)]   (7)
   = T (ln |Σ^re_e| − ln |Σ^un_e|) →_D χ²(ν)   (8)
where L is the likelihood function, "un" ("re") denotes the unrestricted (restricted) estimator, and ν = number of restrictions of the form R(α) = 0.
• The LR test is biased in small samples. If T is small, use
LR_c = (T − qm)(ln |Σ^re_e| − ln |Σ^un_e|)
where q = number of lags, m = number of variables.
• Sequential testing approach
1) Choose an upper bound q̄.
2) Test VAR(q̄ − 1) against VAR(q̄); if not rejected,
3) Test VAR(q̄ − 2) against VAR(q̄ − 1).
4) Continue until rejection.
The LR test is an in-sample criterion. What if we are interested in out-of-sample forecasting exercises?
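The sequential testing approach with the small-sample correction LR_c can be sketched as follows (the bivariate VAR(1) DGP is an illustrative assumption; 9.49 is the 5% critical value of χ²(m² = 4), the number of coefficients dropped per lag):

```python
import numpy as np

def var_sigma(y, q, qbar):
    """OLS residual covariance of a VAR(q), conditioning on qbar initial
    observations so every lag length is fit on the same effective sample."""
    T, m = y.shape
    Y = y[qbar:]
    X = np.hstack([y[qbar - j:T - j] for j in range(1, q + 1)])
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    E = Y - X @ B
    return E.T @ E / len(Y)

# illustrative DGP: a bivariate VAR(1), so the true lag length is 1
rng = np.random.default_rng(1)
y = np.zeros((300, 2))
A1 = np.array([[0.6, 0.1], [0.0, 0.5]])
for t in range(1, 300):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)

m, qbar = 2, 4
Teff = len(y) - qbar
chosen = qbar
for q in range(qbar, 1, -1):              # test VAR(q-1) against VAR(q)
    S_un = var_sigma(y, q, qbar)
    S_re = var_sigma(y, q - 1, qbar)
    LRc = (Teff - q * m) * (np.log(np.linalg.det(S_re)) - np.log(np.linalg.det(S_un)))
    if LRc > 9.49:                        # reject: stop at lag length q
        break
    chosen = q - 1
```

With a true VAR(1) the procedure typically walks down to a small q; a 5% test still rejects spuriously on occasion.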
Let Σ_y(1) = ((T + mq)/T) Σ_e.
B) AIC criterion: min_q AIC(q) = ln |Σ_y(1)|(q) + 2qm²/T.
• AIC is inconsistent. It overestimates the true order q with positive probability.
C) HQC criterion: min_q HQC(q) = ln |Σ_y(1)|(q) + 2qm² (ln ln T)/T.
• HQC is consistent (in probability).
D) SWC criterion: min_q SWC(q) = ln |Σ_y(1)|(q) + qm² (ln T)/T.
• SWC is strongly consistent (a.s.).
• Criteria B)-D) trade off the fit of the model (the size of Σ_e) against the number of parameters of the model, mq, for a given sample size T. Hence criteria B)-D) prefer smaller to larger scale models.

Criterion   T=40             T=80             T=120             T=200
            q=2  q=4  q=6    q=2  q=4  q=6    q=2   q=4   q=6   q=2  q=4  q=6
AIC         1.6  3.2  4.8    0.8  1.6  2.4    0.53  1.06  1.6   0.32 0.64 0.96
HQC         2.09 4.17 6.26   1.18 2.36 3.54   0.83  1.67  2.50  0.53 1.06 1.6
SWC         2.95 5.9  8.85   1.75 3.5  5.25   1.27  2.55  3.83  0.84 1.69 2.52
Table 1: Penalties of AIC, HQC, SWC, m=4

- Penalties increase with q and fall with T. The penalty of SWC is the harshest.
- Ivanov and Kilian (2006): the quality of B)-D) depends on the frequency of the data and on the DGP. Typically HQC is more appropriate.
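The penalty terms in Table 1 follow directly from the formulas for the three criteria; a quick check with m = 4, as in the table:

```python
import numpy as np

m = 4
def pen_aic(q, T): return 2 * q * m**2 / T
def pen_hqc(q, T): return 2 * q * m**2 * np.log(np.log(T)) / T
def pen_swc(q, T): return q * m**2 * np.log(T) / T

# a few entries of Table 1 (minor differences are rounding)
checks = [
    (pen_aic(2, 40), 1.60),
    (pen_hqc(2, 40), 2.09),
    (pen_swc(2, 40), 2.95),
    (pen_aic(6, 200), 0.96),
]
```

The penalties grow linearly in q and shrink in T, which is the trade-off described above.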
• Criteria A)-D) must be applied to the system, not to single equations.

Example 4 VAR for the Euro area, 1980:1-1999:4; use output, prices, interest rates and M3; set q̄ = 7.

Hypothesis      LR            LR_c           q   AIC         HQC         SWC
q=6 vs. q=7     2.9314e-5(*)  0.0447         7   -7.556      -6.335      -4.482
q=5 vs. q=6     3.6400e-4     0.1171         6   -7.413      -6.394      -4.851
q=4 vs. q=5     0.0509        0.5833         5   -7.494      -6.675      -5.437
q=3 vs. q=4     0.0182        0.4374         4   -7.522      -6.905      -5.972
q=2 vs. q=3     0.0919        0.6770         3   -7.635(*)   -7.219(*)   -6.591
q=1 vs. q=2     3.0242e-7     6.8182e-3(*)   2   -7.226      -7.012      -6.689(*)
Table 2: Tests for the lag length of a VAR

• Different criteria choose different lag lengths.
Checking stationarity
- All variables stationary / all unit roots → easy.
- Some cointegration: transform the VAR into a VECM.
• Impose the cointegration restrictions.
• Disregard the cointegration restrictions.
- Data are stationary, but we can't see it because of small samples.
- If Bayesian: the stationarity/nonstationarity issue does not matter for inference.
Checking for breaks
Wald test: y_t = (A_1(ℓ) I_1) y_{t-1} + (A_2(ℓ) I_2) y_{t-1} + e_t
I_1 = 0 for t ≤ t_1; I_1 = 1 for t > t_1; and I_2 = 1 − I_1.
Use S(t_1, T) = T (ln |Σ^re_e| − ln |Σ^un_e|) →_D χ²(ν); ν = dim(A_1(ℓ)) (Andrews and Ploberger (1994)).
If t_1 is unknown but belongs to [t_l, t_u], compute S(t_1, T) for all the t_1 in the interval. Check for breaks using max_{t_1} S(t_1, T).
5 Alternative representations of a VAR(q)
Consider
y_t = A(ℓ) y_{t-1} + e_t   (9)
with y_t, e_t m × 1 vectors; e_t ∼ (0, Σ_e).
Different representations are useful for different purposes.
- The companion form is useful for computing moments and ML estimators.
- The simultaneous equations form is useful for evaluating the likelihood and computing restricted estimates.
5.1 Companion form
• Transform an m-variable VAR(q) into an mq-variable VAR(1).

Example 5 Consider a VAR(3). Let Y_t = [y_t, y_{t-1}, y_{t-2}]′; E_t = [e_t, 0, 0]′; and

A = [ A_1  A_2  A_3 ;  I_m  0  0 ;  0  I_m  0 ],   Σ_E = [ Σ_e  0  0 ;  0  0  0 ;  0  0  0 ]

Then the VAR(3) can be rewritten as
Y_t = A Y_{t-1} + E_t,   E_t ∼ N(0, Σ_E)   (10)
where Y_t, E_t are 3m × 1 vectors and A is 3m × 3m.
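The companion-form construction of Example 5 can be sketched as follows (the coefficient matrices are hypothetical; the eigenvalue check at the end is the standard stationarity condition on the companion matrix):

```python
import numpy as np

def companion(A_list, Sigma_e):
    """Companion matrix A and error covariance Sigma_E for a VAR(q)."""
    q = len(A_list)
    m = A_list[0].shape[0]
    A = np.zeros((m * q, m * q))
    A[:m, :] = np.hstack(A_list)           # top block row: [A_1 ... A_q]
    A[m:, :-m] = np.eye(m * (q - 1))       # identity blocks below the diagonal
    S = np.zeros((m * q, m * q))
    S[:m, :m] = Sigma_e                    # only the first block is stochastic
    return A, S

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])   # hypothetical coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
A3 = np.array([[0.1, 0.0], [0.0, 0.1]])
A, S = companion([A1, A2, A3], np.eye(2))
```

Moments of the VAR(q) can then be computed from the VAR(1) formulas applied to (A, Σ_E).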
5.2 Simultaneous equations setup (SES)
There are two alternative representations:
1) Let x_t = [y_{t-1}, y_{t-2}, ...]; X = [x_1, ..., x_T]′ (a T × mq matrix); Y = [y_1, ..., y_T]′ (a T × m matrix); then, if A = [A′_1, ..., A′_q]′ is an mq × m matrix,
Y = XA + E   (11)
2) Let i indicate the subscript for the i-th column vector. The equation for variable i is y_i = x α_i + e_i. Stacking the columns of y_i, e_i into mT × 1 vectors we have
y = (I_m ⊗ x) α + e ≡ X α + e   (12)
6 Parameters and covariance matrix estimation
6.1 Unrestricted VAR(q)
Assume that y_{-q+1}, ..., y_0 are known and e_t ∼ N(0, Σ_e); then
y_t | (y_{t-1}, ..., y_0, y_{-1}, ..., y_{-q+1}) ∼ N(A(ℓ) y_{t-1}, Σ_e)   (13)
                                               ∼ N(A′_1 Y_{t-1}, Σ_e)   (14)
where A′_1 is the first row of A (m × mq). Let α = vec(A_1).
Since f(y_t | y_{t-1}, ..., y_{-q+1}) = Π_j f(y_j | y_{j-1}, ..., y_{-q+1}),

ln L(α, Σ_e) = Σ_j ln L(y_j | y_{j-1}, ..., y_{-q+1})
             = −(Tm/2) ln(2π) + (T/2) ln |Σ_e^{-1}| − (1/2) Σ_t (y_t − A′_1 Y_{t-1})′ Σ_e^{-1} (y_t − A′_1 Y_{t-1})   (15)

Setting ∂ ln L(α, Σ_e)/∂α = 0 we have

A′_{1,ML} = [Σ_{t=1}^T Y_{t-1} Y′_{t-1}]^{-1} [Σ_{t=1}^T Y_{t-1} y′_t] = A′_{1,OLS}

and the j-th column (a 1 × mq vector) is

A′_{1j,ML} = [Σ_{t=1}^T Y_{t-1} Y′_{t-1}]^{-1} [Σ_{t=1}^T Y_{t-1} y_{jt}] = A′_{1j,OLS}   (16)
Why is OLS equivalent to maximum likelihood?
- Because, if the initial conditions are known, maximizing the log-likelihood is equivalent to minimizing the sum of squared errors!
Why is single equation OLS the same as full information maximum likelihood?
- Because we have the same regressors in every equation!
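The "same regressors in every equation" point can be checked numerically: running OLS equation by equation or all equations at once gives identical coefficients (the VAR(1) DGP below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, m = 150, 2
A1 = np.array([[0.6, 0.2], [0.1, 0.3]])    # hypothetical DGP
y = np.zeros((T, m))
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(m)

X = y[:-1]                                  # same regressor matrix in every equation
Y = y[1:]

# system OLS: all m equations at once
B_system = np.linalg.lstsq(X, Y, rcond=None)[0]

# single-equation OLS, one column of Y at a time
B_single = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
```

Because X is identical across equations, the two estimators coincide exactly, not just asymptotically.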
Plugging A_{1,ML} into ln L(α, Σ_e), we obtain the concentrated likelihood

ln L(Σ_e) = −(Tm/2) ln(2π) + (T/2) ln |Σ_e^{-1}| − (1/2) Σ_{t=1}^T e′_{t,ML} Σ_e^{-1} e_{t,ML}   (17)

where e_{t,ML} = y_t − A_{1,ML} Y_{t-1}. Using ∂(b′Qb)/∂Q = bb′ and ∂ ln |Q|/∂Q = (Q′)^{-1}, we have

∂ ln L(Σ_e)/∂Σ_e^{-1} = (T/2) Σ⁰_e − (1/2) Σ_{t=1}^T e_{t,ML} e′_{t,ML} = 0, or

Σ⁰_ML = (1/T) Σ_{t=1}^T e_{t,ML} e′_{t,ML}   (18)

and σ_{i,i′} = (1/T) Σ_{t=1}^T e_{it,ML} e_{i′t,ML}.

Σ⁰_ML ≠ Σ⁰_OLS = (1/(T−1)) Σ_{t=1}^T e_{t,ML} e′_{t,ML}, but the two are equivalent for large T.
6.2 VAR(q) with restrictions
Assume the restrictions are of the form α = Rθ + r, where R is an mk × k_1 matrix of rank k_1; r is an mk × 1 vector; θ a k_1 × 1 vector.

Example 6 i) Lag restrictions: A_q = 0. Here k_1 = m²(q − 1), r = 0, and R = [I_{k_1}, 0]′.
ii) Block exogeneity of y_2t in a bivariate VAR(2). Here R = blockdiag[R_1, R_2], where R_i, i = 1, 2, is upper triangular.
iii) Cointegration restrictions.
Plugging the restrictions into (12) we have
y = (I_m ⊗ x) α + e = (I_m ⊗ x)(Rθ + r) + e
Let y† ≡ y − (I ⊗ x) r = (I ⊗ x) R θ + e. Since ∂ ln L/∂θ = R′ ∂ ln L/∂α:

θ_ML = [R′(Σ_e^{-1} ⊗ x′x) R]^{-1} R′(Σ_e^{-1} ⊗ x′) y†   (19)
α_ML = R θ_ML + r   (20)
Σ⁰_e = (1/T) Σ_t e_ML e′_ML   (21)
Summary
• For a VAR(q) without restrictions:
- ML and OLS estimators of A_1 coincide.
- OLS estimation of A_1, equation by equation, is consistent and efficient (if the assumptions are correct).
- OLS and ML estimators of Σ_e asymptotically coincide for large T.
• For a VAR(q) with restrictions:
- The ML estimator of A_1 is different from the OLS estimator.
- ML is consistent/efficient if the restrictions are true. It is inconsistent if the restrictions are false.
In general:
- OLS is consistent if the stationarity assumption is wrong (t-tests incorrect).
- OLS is inconsistent if the lag length is wrong (regressors correlated with the error term).
7 Summarizing the results
It is unusual to report estimates of VAR coefficients, standard errors and R².
- Most VAR coefficients are insignificant.
- R² always exceeds 0.99.
How do we summarize the results in an informative way?
7.1 Impulse responses (IR)
• What is the effect of a surprise cut in interest rates on inflation?
• Impulse responses trace out the MA representation of y_t.
Three ways to calculate impulse responses:
- Recursive approach.
- Non-recursive approach.
- Forecast revisions.
• Recursive method.
Assume we have estimates of the A_j. Then D_τ = [D^{i,i′}_τ] = Σ_{j=1}^{min[τ,q]} D_{τ-j} A_j, where τ refers to the horizon, D_0 = I, and A_j = 0 ∀ j > q.

Example 7 Suppose y_t = ȳ + A_1 y_{t-1} + A_2 y_{t-2} + e_t. Then applying the formula we have D_0 = I, D_1 = D_0 A_1, D_2 = D_1 A_1 + D_0 A_2, ..., D_k = D_{k-1} A_1 + D_{k-2} A_2 + ... + D_{k-q} A_q.

For orthogonal news: if P̃_e P̃′_e = Σ_e, then D̃_k = D_k P̃_e.
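The recursion of Example 7, plus the Choleski orthogonalization D̃_k = D_k P̃, can be sketched in a few lines (coefficient matrices hypothetical):

```python
import numpy as np

def irf(A_list, Sigma_e, horizon):
    """MA matrices D_k via D_k = D_{k-1} A_1 + ... + D_{k-q} A_q, D_0 = I,
    and their orthogonalized (Choleski) counterparts D_k @ P."""
    m, q = A_list[0].shape[0], len(A_list)
    D = [np.eye(m)]
    for k in range(1, horizon + 1):
        D.append(sum(D[k - j] @ A_list[j - 1] for j in range(1, min(k, q) + 1)))
    P = np.linalg.cholesky(Sigma_e)
    return D, [Dk @ P for Dk in D]

A1 = np.array([[0.5, 0.1], [0.0, 0.4]])    # hypothetical VAR(2) coefficients
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])
D, Dtil = irf([A1, A2], Sigma_e, 10)
```

Column j of D̃_k is the response of the system at horizon k to orthogonal news j of size one.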
Sometimes it is useful to calculate multipliers of the news.
• Long run multiplier: D(1) = (I − A_1 − ... − A_q)^{-1}.
• Partial multipliers, up to horizon τ: (I − Σ_{j=1}^τ A_j)^{-1}.
7.2 Variance decomposition: τ-steps ahead forecast error
• How much of the variance of, say, output is due to supply shocks?
Uses:
y_{t+τ} − y_t(τ) = Σ_{j=0}^{τ-1} D̃_j ẽ_{t+τ-j},   D_0 = I   (22)
where y_t(τ) is the τ-steps ahead prediction of y_t.
- Computes the share of the variance of y_{i,t+τ} − y_{i,t}(τ) due to each ẽ_{i′,t+τ-j}, i, i′ = 1, 2, ..., m.
7.3 Historical decomposition
• What is the contribution of supply shocks to the productivity revival of the late 1990s?
Let ŷ_{i,t}(τ) = y_{i,t+τ} − y_{i,t}(τ) be the τ-steps ahead forecast error in the i-th variable of the VAR. Then:
ŷ_{i,t}(τ) = Σ_{i′=1}^m D̃_{i′}(ℓ) ẽ_{i′,t+τ}   (23)
- Computes the path of ŷ_{i,t}(τ) due to each ẽ_{i′}.
• The same ingredients are needed to compute impulse responses, the variance and the historical decompositions. Different packaging!!
Example 8 US data for (Y, π, R, M1) for 1973:1-1993:12. Orthogonalize using a Choleski decomposition. What is the effect of a money shock?

[Figure: responses of gnp, prices, interest and money to a shock in money, horizons 0-20.]
What is the contribution of the various shocks to var(Y) and var(π)?

                         Y                               π
Horizon   Shock1  Shock2  Shock3  Shock4   Shock1  Shock2  Shock3  Shock4
4         0.99    0.001   0.003   0.001    0.07    0.86    0.01    0.03
12        0.93    0.01    0.039   0.02     0.24    0.60    0.08    0.07
24        0.79    0.01    0.15    0.04     0.52    0.36    0.07    0.04
Table 3: Variance decomposition, percentages
Historical decomposition of GDP, conditional on 1989 information.

[Figure: historical decomposition of gnp, 1975-1978; four panels (shocks in gnp, prices, interest, money), each plotting the variable, the baseline, and the baseline plus shocks.]
8 Identification: Obtaining SVARs
8.1 Why structural VARs?
VARs are reduced form models. Therefore:
• Shocks are linear combinations of meaningful economic disturbances.
• It is difficult to relate responses computed from VARs with responses of theoretical models.
• They can't be used for policy analyses (Lucas critique).
What is a SVAR? It is a linear dynamic structural model of the form:
A_0 y_t = C_1 y_{t-1} + ... + C_q y_{t-q} + ε_t,   ε_t ∼ (0, Σ_ε)   (24)
Its reduced form is:
y_t = A_1 y_{t-1} + ... + A_q y_{t-q} + e_t,   e_t ∼ (0, Σ_e)   (25)
where A_j = A_0^{-1} C_j and e_t = A_0^{-1} ε_t.
We want to go from (25) to (24), since (25) is easy to estimate (just use OLS equation by equation). To do this, we need A_0. But to estimate it, we need restrictions, since A_j, Σ_e have fewer free parameters than A_0, C_j, Σ_ε.
Distinguish: stationary vs. nonstationary VARs.
8.2 Stationary VARs
VAR:  y_t = A(ℓ) y_{t-1} + e_t,   e_t ∼ (0, Σ_e)   (26)
SVAR: A_0 y_t = C(ℓ) y_{t-1} + ε_t,   ε_t ∼ (0, Σ_ε = diag{σ_i})   (27)
Log-linearized DSGE models are stationary SVARs! We know
y_2t = A_22 y_{2t-1} + A_21 y_3t   (28)
y_1t = A_11 y_{2t-1} + A_12 y_3t   (29)
where y_2t are the states, y_1t the controls, y_3t the shocks. So

A_0 = [ A_21  0 ;  0  A_12 ]^{-1},   C(ℓ) = [ A_21  0 ;  0  A_12 ]^{-1} [ A_22  0 ;  A_11  0 ]
(26) and (27) imply
A_0 e_t = ε_t   (30)
so that
A_0^{-1} Σ_ε A_0^{-1}′ = Σ_e   (31)
To recover the structural parameters from (31) we need at least as many equations as unknowns.
• Order condition: if there are m variables, we need m(m − 1)/2 restrictions. This is because there are m² free parameters on the left hand side of (31) and only m(m + 1)/2 parameters in Σ_e (m² = m(m + 1)/2 + m(m − 1)/2).
• Rank condition: the rank of A_0^{-1} Σ_ε A_0^{-1}′ must equal the rank of Σ_e.
• Just identified vs. overidentified.
Example 9 i) The Choleski decomposition of Σ_e has exactly m(m − 1)/2 zero restrictions. Implications:
- A_0^{-1} is lower triangular.
- Variable i does not affect variable i − 1 simultaneously, but it affects variable i + 1.
ii) y_t = [GDP_t, P_t, i_t, M_t]. Then we need 6 restrictions, e.g.

A_0 = [ 1     0     0     0    ;
        α_01  1     0     α_02 ;
        0     0     1     α_03 ;
        α_04  α_05  α_06  1    ]
How do you estimate a SVAR? Use a two-step approach:
- Get (unrestricted) estimates of A(ℓ) and Σ_e.
- Use the restrictions on A_0 to estimate Σ_ε and the free parameters of A_0.
- Use the relation A(ℓ) = A_0^{-1} C(ℓ), where C(ℓ) collects the structural lag matrices, to trace out the structural dynamics.
Unless the system is in Choleski format, we need ML to estimate A_0 in just identified systems (see appendix).
For over-identified systems, we always need ML to estimate A_0.
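The two-step approach under a Choleski scheme can be sketched as follows (the VAR(1) DGP is an illustrative assumption; with Σ_ε = I, A_0^{-1} is the lower triangular factor of Σ_e, as in Example 9):

```python
import numpy as np

# step 1: unrestricted OLS estimates of A(l) and Sigma_e
rng = np.random.default_rng(3)
T, m = 400, 2
A1_true = np.array([[0.6, 0.1], [0.0, 0.5]])   # hypothetical DGP
y = np.zeros((T, m))
for t in range(1, T):
    y[t] = A1_true @ y[t - 1] + rng.standard_normal(m)

X, Y = y[:-1], y[1:]
A1_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
E = Y - X @ A1_hat.T
Sigma_e = E.T @ E / len(Y)

# step 2: under a Choleski (recursive) scheme with Sigma_eps = I,
# A_0^{-1} is the lower triangular factor, so A_0^{-1} A_0^{-1}' = Sigma_e
A0_inv = np.linalg.cholesky(Sigma_e)

# step 3: structural impact responses are the columns of A_0^{-1};
# later horizons follow from the MA recursion applied to A1_hat
```

This is the case where no ML step is needed, since the Choleski factor solves (31) exactly.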
Example 10 (Blanchard and Perotti, 2002) VAR with T_t, g_t, y_t. Assume A_0 e_t = B ε_t, where

A_0 = [ 1     0     a_01 ;        B = [ 1    b_1  0 ;
        0     1     a_02 ;              b_2  1    0 ;
        a_03  a_04  1    ]              0    0    1 ]

Impose that there is no discretionary response of T_t and g_t to y_t within the quarter (information delay).
6 + 3 (variance) parameters; at most 6 parameters in Σ_e. Need additional restrictions. Get information about a_01, a_02 from external sources; impose either b_1 = 0 or b_2 = 0.
With a_01, a_02 fixed, the two-step approach has an IV interpretation: ε_1t, ε_2t are used as instruments in the third equation.
8.3 Nonstationary VARs
Let the VAR and SVAR be:
Δy_t = D(ℓ) e_t = D(1) e_t + D*(ℓ) Δe_t   (32)
Δy_t = D(ℓ) A_0 ε_t = D(1) A_0 ε_t + D*(ℓ) A_0 Δε_t   (33)
where D(ℓ) = (I − A(ℓ)ℓ)^{-1}, D*(ℓ) ≡ (D(ℓ) − D(1))/(1 − ℓ), and here A_0 maps the structural news into the reduced form ones, e_t = A_0 ε_t. Matching coefficients: D(ℓ) A_0 ε_t = D(ℓ) e_t.
Separating permanent and transitory components, and using for the latter only contemporaneous restrictions, we have
D(1) A_0 ε_t = D(1) e_t   (34)
A_0 Δε_t = Δe_t   (35)
If y_t is stationary, D(1) = 0 and (34) is vacuous.
Two types of restrictions to estimate A_0: short and long run.

Example 11 In a VAR(2), imposing (34) requires one restriction. Suppose that D(1)_12 = 0 (ε_2t has no long run effect on y_1t). If Σ_ε = I, the three elements of D(1) A_0 Σ_ε A′_0 D(1)′ can be obtained from the Choleski factor of D(1) Σ_e D(1)′.

• Blanchard-Quah: decomposition into permanent-transitory components (use (34)-(35)). If y_t = [Δy_1t, y_2t]′ (m × 1); y_1t is I(1); y_2t is I(0); and y_t = ȳ + D(ℓ) ε_t, where ε_t ∼ iid(0, Σ_ε):

[ Δy_1t ]   [ ȳ_1 ]   [ D_1(1) ]       [ (1 − ℓ) D†_1(ℓ) ]
[ Δy_2t ] = [  0  ] + [   0    ] ε_t + [ (1 − ℓ) D†_2(ℓ) ] ε_t   (36)

and D_1(1) = [1, 0].
- y_2t could be any variable which is stationary and is influenced by both shocks.
- Choleski systems

Example 12 Problems with standard identification:
p_t = a_11 e^s_t   (37)
y_t = a_21 e^s_t + a_22 e^d_t   (38)
Price is set in advance of knowing demand shocks. Choleski ordering with p first.
This is equivalent to estimating p on lagged p and lagged y (this gives e^s_t), and then estimating y on lagged y and on current and lagged p (this gives e^d_t).
y_t = a_11 e^s_t   (39)
p_t = a_21 e^s_t + a_22 e^d_t   (40)
Quantity is set in advance of knowing demand shocks. Choleski ordering with y first.
This is equivalent to estimating y on lagged y and lagged p (this gives e^s_t), and then estimating p on lagged p and on current and lagged y (this gives e^d_t).
In general, without a structural model in mind it is difficult to interpret Choleski systems. Cooley-LeRoy (1985): unless some strong restrictions are imposed, dynamic models do not have a Choleski structure.
- Long run restrictions: Faust-Leeper (1997).

[Figure: long run restrictions, impulse responses over horizons 5-20; left panel: restriction not satisfied; right panel: restriction satisfied.]
- Long run restrictions, Cooley-Dwyer (1998): take an RBC model driven by a unit root technology shock. Simulate data. Run a VAR with (y_t, n_t) and identify two shocks (permanent/transitory). It is possible to do this. Transitory shocks explain a large portion of the variance of y_t.
- Long run restrictions, Erceg et al. (2005): long run restrictions perform poorly in small samples. Chari et al. (2006): potentially important truncation bias due to a VAR(q), q finite.
- Short run restrictions (Canova-Pina (2005)).
The DGP is a 3-equation New Keynesian model.

[Figure: true responses vs. inertial responses.]
Summary
- It is problematic to relate SVARs identified with Choleski, short run or long run restrictions to theories.
- Solution: link SVARs more closely to theory; use restrictions which are more common in DSGE models.
8.4 Alternative identification schemes
Canova-De Nicolo' (2002), Faust (1998), Uhlig (1999): use sign (and shape) restrictions.

Example 13 i) Aggregate supply shocks: Y↑, Inf↓; aggregate demand shocks: Y↑, Inf↑ → demand and supply shocks impose different sign restrictions on cov(Y_t, INF_s). These restrictions are shared by a large class of models with different foundations. Use them for identification.
ii) Monetary shocks: the response of Y is hump-shaped and dies out in 3-4 quarters → shape restrictions on cov(Y_t, i_s). Use them for identification.
Exploit the non-uniqueness of the news.
- Given any set of orthogonal news, check if the responses of y_it to the shocks ε_jt have the right sign. If not:
- Construct another set of news and repeat the exercise.
- Stop when you find an ε_jt with the right characteristics, or
- Take all the representations satisfying the restrictions and compute the mean/median (and s.e.) of the statistics of interest across them.
Implementation of sign restrictions (Canova-De Nicolo' (2002)):
• Orthogonalize Σ_e = P̃ P̃′ (e.g. Choleski or eigenvalue-eigenvector decomposition).
• Check if any shock produces the required correlation pattern for (y_it, y_i′t). If not:
• For any H : HH′ = I, Σ_e = P̃ H H′ P̃′ = P̂ P̂′.
• Check if any shock under the new decomposition produces the required correlation pattern for (y_it, y_i′t). If not, choose another H, etc.
• The number of H's is infinite. Write H = H(ω), ω ∈ (0, 2π). The H(ω) are called rotation (Givens) matrices.

Example 14 Suppose m = 2. Then H(ω) = [ cos(ω)  −sin(ω) ;  sin(ω)  cos(ω) ] or H(ω) = [ cos(ω)  sin(ω) ;  sin(ω)  −cos(ω) ]. Varying ω, we trace out all possible structural MA representations that could have generated the data.
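A small check that any rotation H(ω) of Example 14 delivers another valid factorization of Σ_e (the Σ_e below is hypothetical):

```python
import numpy as np

def givens(omega):
    """2x2 rotation matrix H(omega) with H H' = I."""
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega),  np.cos(omega)]])

Sigma_e = np.array([[1.0, 0.3], [0.3, 1.0]])   # hypothetical reduced-form covariance
P = np.linalg.cholesky(Sigma_e)
Phat = P @ givens(1.1)                          # a different orthogonalization
```

`Phat` differs from `P` but factors the same Σ_e, so the data alone cannot tell the two sets of "news" apart; only the sign restrictions can.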
Example 15 Comparing responses to US monetary shocks, 1964-2001.

[Figure: responses of prices, output and money, horizon in months, under sign restrictions (left) and Choleski restrictions (right).]
Example 16 Studying the effects of fiscal shocks in US states: 1950-2005.

             corr(G,Y)  corr(T,Y)  corr(G,DEF)  corr(T,DEF)  corr(G,T)
G shocks     > 0        > 0        > 0
BB shocks    < 0        = 0        = 1
Tax shocks   < 0        < 0        = 0
Table 4: Identification restrictions
8.5 Sign restrictions in large systems

• The use of rotation matrices is complicated in large scale systems since there are many rotations one needs to consider.

Algorithm 8.1
1. Start from some orthogonal representation yt = D(ℓ)εt.
2. Draw an m × m matrix G with N(0,1) entries. Find the decomposition G = QR.
3. Compute the candidate responses D(ℓ)Q. Check if the restrictions are satisfied.
4. Repeat 2.-3. until L draws are found.
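Algorithm 8.1 can be sketched as follows (numpy; the example system and the particular sign check in the usage lines are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_rotation(m, rng):
    # Step 2: draw G with N(0,1) entries and take its QR decomposition
    G = rng.standard_normal((m, m))
    Q, R = np.linalg.qr(G)
    # sign-normalize the columns so Q is a uniform draw on the orthogonal group
    return Q * np.sign(np.diag(R))

def accepted_impacts(D, satisfies, L, rng):
    """Steps 3-4: rotate the impact matrix D of an orthogonal
    representation until L draws satisfy the sign restrictions."""
    out = []
    while len(out) < L:
        DQ = D @ draw_rotation(D.shape[0], rng)
        if satisfies(DQ):
            out.append(DQ)
    return out

# illustrative use: 2-variable system, keep rotations whose first shock
# moves both variables up on impact
D = np.linalg.cholesky(np.array([[1.0, 0.3],
                                 [0.3, 1.0]]))
draws = accepted_impacts(D, lambda M: (M[:, 0] > 0).all(), L=10, rng=rng)
```

Every accepted draw factorizes the same covariance D D′, which is why all of them are observationally equivalent and only the sign restrictions discriminate among them.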
9 Interpretation problems with VARs

• Time Aggregation (Sargent-Hansen (1991), Marcet (1991)).
- Agents take decisions at a frequency which is different from the frequency of the data available to the econometrician. What are the consequences?
- The MA representation of the econometrician is a complex combination of the MA representations induced by agents' actions.
Example 17 A hump-shaped monthly response can be transformed into a smoothly declining quarterly response.

[Figure: size of responses at the monthly and quarterly frequency, horizon 0-14 months.]

How to detect aggregation problems? Run a VAR with data at different frequencies, if you can. Check if differences exist.
• Non-linearities

Example 18 (Markov switching model). Suppose P(st = 1|st−1 = 1) = p, P(st = 0|st−1 = 0) = q. This process has a linear VAR representation

st = (1 − q) + (p + q − 1)st−1 + et

and as long as either p or q or both are less than one an MAR exists. Good!!
But: the errors are non-normal (binomial). Conditional on st−1 = 1:

et = 1 − p   with probability p       (41)
   = −p      with probability 1 − p   (42)

Conditional on st−1 = 0:

et = −(1 − q)   with probability q       (43)
   = q          with probability 1 − q   (44)
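The AR(1) representation in Example 18 is easy to verify by simulation: regressing st on a constant and st−1 recovers an intercept near (1 − q) and a slope near (p + q − 1). A sketch with illustrative transition probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.9, 0.8          # illustrative P(1|1) and P(0|0)
T = 200_000

# simulate the two-state Markov chain
s = np.empty(T, dtype=float)
s[0] = 1.0
for t in range(1, T):
    prob_one = p if s[t - 1] == 1.0 else 1.0 - q   # P(s_t = 1 | s_{t-1})
    s[t] = float(rng.random() < prob_one)

# OLS of s_t on a constant and s_{t-1} should recover
# intercept ~ (1 - q) and slope ~ (p + q - 1)
X = np.column_stack([np.ones(T - 1), s[:-1]])
intercept, slope = np.linalg.lstsq(X, s[1:], rcond=None)[0]
```

The regression is linear even though the process is binary: the non-normality shows up only in the residuals, exactly as the slide notes.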
• How do you check for normality/nonlinearities?

- If et is normal:

T^0.5 [ S3 ; S4 − 3 ∗ Im ] ∼ N( 0, [ 6 ∗ Im  0 ;  0  24 ∗ Im ] )

where Sj is the j-th estimated moment of et.

- Regress êt on y²t−1, log yt−1, etc. Check significance.
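The moment-based check above can be sketched per residual series as follows (the simulated Gaussian residuals are illustrative; with real VAR residuals one would pass the estimated êt):

```python
import numpy as np

def skew_kurt_stats(e):
    """Standardized 3rd/4th-moment statistics for residuals e (T x m).
    Under normality each entry is asymptotically N(0, 1), using the
    asymptotic variances 6 (skewness) and 24 (excess kurtosis)."""
    e = (e - e.mean(axis=0)) / e.std(axis=0)   # standardize each series
    T = e.shape[0]
    S3 = (e ** 3).mean(axis=0)                 # estimated third moment
    S4 = (e ** 4).mean(axis=0)                 # estimated fourth moment
    z3 = np.sqrt(T) * S3 / np.sqrt(6.0)
    z4 = np.sqrt(T) * (S4 - 3.0) / np.sqrt(24.0)
    return z3, z4

rng = np.random.default_rng(2)
z3, z4 = skew_kurt_stats(rng.standard_normal((50_000, 2)))  # Gaussian input
```

Large |z3| or |z4| (relative to standard normal critical values) signals non-normal residuals and, possibly, neglected non-linearity.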
• Stationarity is violated

Example 19 Great Moderation.
- Changes in the variance of the process are continuous. Can't really use subsample analysis.
- There exists a version of the Wold theorem without covariance stationarity:

y†t = at y−∞ + Σ_{j=0}^{∞} Djt et−j

where var(et) = Σt.
• Use time-varying coefficient VARs with e.g. stochastic volatility.
• Small Scale VARs. People use them because:
a) Estimates are more precise.
b) It is easier to identify shocks. But they generate:
- Omitted variables, Braun-Mittnik (1993).
- Misaggregation of shocks, Cooley-Dwyer (1998), Canova-Pina (2006).
What is the consequence of omitting variables?

In a bivariate VAR(q)

[ A11(ℓ)  A12(ℓ) ;  A21(ℓ)  A22(ℓ) ] [ y1t ; y2t ] = [ e1t ; e2t ],

the univariate representation for y1t is

[A11(ℓ) − A12(ℓ)A22(ℓ)⁻¹A21(ℓ)] y1t = e1t − A12(ℓ)A22(ℓ)⁻¹ e2t ≡ υt     (45)
Example 20 Suppose m = 4 and we estimate a bivariate VAR; there are three possible models. The system with variables 1 and 3 has errors υt, where Ψ(ℓ) = [ A12(ℓ)  A14(ℓ) ;  A32(ℓ)  A34(ℓ) ] and Φ(ℓ) = [ A22(ℓ)  A24(ℓ) ;  A42(ℓ)  A44(ℓ) ]⁻¹. It is easy to verify that

[ υ1t ; υ2t ] ≡ [ e1t ; e3t ] − Ψ(ℓ)Φ(ℓ) [ e2t ; e4t ]
• A true m-variable VAR(1) is transformed into a VAR(∞) with disturbance υt if only m1 < m variables are used.
What is the problem of omitting shocks?

Aggregation theorem (Faust and Leeper (1997)): the structural MA for a partition with m1 < m variables has an MA matrix D‡(ℓ) that loads on all m structural shocks.
If there are m_a shocks of one type and m_b shocks of another type, with m_a + m_b = m and m1 = 2, then:

• eit, i = 1, 2 recovers a linear combination of shocks of type i′ = a, b only if D‡(ℓ) is block diagonal.
• eit, i = 1, 2 recovers a linear combination of current shocks of type i′ = a, b only if D‡(ℓ) = D‡, ∀ℓ, and D‡ is block diagonal.
Example 21 Suppose m = 4, m1 = 2, m2 = 2. Then

[ e1t ; e2t ] = [ D‡11(ℓ)  D‡12(ℓ)  D‡13(ℓ)  D‡14(ℓ) ;  D‡21(ℓ)  D‡22(ℓ)  D‡23(ℓ)  D‡24(ℓ) ] [ ε1t ; ε2t ; ε3t ; ε4t ]

- e1t recovers type 1 shocks if D‡13(ℓ) = D‡14(ℓ) = 0, and e2t recovers type 2 shocks if D‡21(ℓ) = D‡22(ℓ) = 0.
- e1t recovers current type 1 shocks if D‡ii′(ℓ) = D‡ii′, ∀ℓ, i, i′ = 1, 2.
• Non-Wold decompositions (Lippi-Reichlin (1994), Leeper (1991), Hansen-Sargent (1991)). Certain economic models do not have a fundamental MA representation.
e.g. diffusion models; models where agents anticipate tax changes.

Example 22 Hall consumption/saving problem.
Assume yt = et, a white noise. Assume β = R⁻¹ < 1 and quadratic preferences. The solution for consumption is ct = ct−1 + (1 − R⁻¹)et. No problem!!

If we only observe saving out of labor income, st = yt − ct, the solution is

st − st−1 = R⁻¹et − et−1     (49)

(49) is non-fundamental: the coefficient on et is smaller (in absolute value) than the coefficient on et−1.
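The observational-equivalence point of Example 22 can be checked directly: the non-fundamental MA(1) in (49) and the fundamental MA(1) one would estimate in its place have identical autocovariances, even though their impulse responses have different shapes. A sketch (the value of R⁻¹ is illustrative):

```python
import numpy as np

def ma1_autocov(theta0, theta1, sigma2=1.0):
    # autocovariances of x_t = theta0*u_t + theta1*u_{t-1}, u_t white noise
    return np.array([(theta0 ** 2 + theta1 ** 2) * sigma2,   # gamma(0)
                     theta0 * theta1 * sigma2])              # gamma(1)

R_inv = 0.95   # illustrative beta = R^{-1} < 1

# non-fundamental representation: ds_t = R^{-1} e_t - e_{t-1}   (eq. 49)
gamma_nf = ma1_autocov(R_inv, -1.0)
# fundamental representation:     ds_t = u_t - R^{-1} u_{t-1}
gamma_f = ma1_autocov(1.0, -R_inv)

same_acgf = np.allclose(gamma_nf, gamma_f)   # second moments coincide
```

Since a VAR only sees second moments, it necessarily recovers the fundamental representation and the wrong impulse responses.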
If one instead estimates

st − st−1 = ut − R⁻¹ut−1     (50)

the responses have different shapes!! Same autocovariance generating function.

• Relationship between DSGE models and VARs

The log-linearized solution of a DSGE model is of the form:

y2t = A22(θ)y2t−1 + A21(θ)y3t     (51)
y1t = A11(θ)y2t−1 + A12(θ)y3t     (52)

where y2t = states and the driving forces, y1t = controls, y3t = shocks.
- If both y2t and y1t are observable, the DSGE model is a restricted VAR(1).
- If y2t is omitted, what is the representation of y1t?

• Three alternative results for reduced systems with only y1t:

A true VAR(p) model is transformed into either a VAR(∞), a VARMA(p−1, p−1) or a VARMA(p, p), depending on the assumptions made.
Example 23 Suppose

yt = kt + et     (53)
kt = a1kt−1 + a0et     (54)

where a1 governs persistence and a0 the contemporaneous effect. If we observe both yt and kt: restricted VAR(1). No problem.

If only yt is observable,

[(1 − a1ℓ)/(1 + a0 − a1ℓ)] yt = et   or   yt = [a0/(1 + a0)] Σ_j [a1/(1 + a0)]^j yt−j + et     (55)

If a0 is small and a1 is high, [a1/(1 + a0)]^j will be large even for large j. A very long lag length is needed to whiten the residuals. If long run restrictions are used, potentially important truncation bias.
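The slow decay of the AR(∞) coefficients in (55) can be quantified directly; the parameter values below are illustrative (small a0, a1 near one):

```python
import numpy as np

a0, a1 = 0.05, 0.97                 # illustrative: small impact, high persistence
b = a1 / (1 + a0)                   # geometric decay rate of the AR coefficients
weights = (a0 / (1 + a0)) * b ** np.arange(1, 61)   # coefficients on lags 1..60

# even at lag 20 the coefficient is still over 20% of the lag-1 coefficient,
# so a short lag length badly truncates the representation
ratio = weights[19] / weights[0]
```

With conventional lag lengths of 4-8, most of this mass is discarded, which is the source of the truncation bias mentioned above.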
Summary

- A system with a reduced number of variables needs a very generous lag length to approximate the dynamics of the true model.
- If the sample size is short this could be a problem.
- Omitting a "state" is much more important than omitting a "control".
- Omission does not matter very much if the true model has dynamics which die out quickly.

Chari, Kehoe, McGrattan (2006), Christiano, Eichenbaum and Vigfusson (2006), Fernandez et al. (2007), Ravenna (2007).
10 Exercises

1) Take quarterly data for output growth and inflation for your favorite country. Identify supply and demand shocks by finding all the rotations which satisfy the following restrictions: supply ∆y ↑, Inf ↓; demand ∆y ↑, Inf ↑. What do the impulse responses produced by the rotations that jointly satisfy the restrictions look like? How do they compare with those obtained using the restriction that only supply shocks affect output in the long run, but both demand and supply shocks can affect inflation in the long run?
2) Consider the following New Keynesian model

xt = Etxt+1 − (1/ϕ)(it − Etπt+1) + v1t     (56)
πt = βEtπt+1 + κxt + v2t     (57)
it = φr it−1 + (1 − φr)(φπ πt + φx xt) + v3t     (58)

where xt is the output gap, πt the inflation rate and it the nominal interest rate.
i) Plot the impulse responses to the three shocks.
ii) Simulate 11000 data points from this model after you have set ϕ = 1.5, β = 0.99, κ = 0.8, φr = 0.6, φπ = 1.2, φx = 0.2, ρ1 = 0.9, ρ2 = 0.9, σ1 = σ2 = σ3 = 0.1, and discard the first 1000. With the remaining data estimate a three variable VAR. In particular, (i) choose the lag length optimally, (ii) check if the model you have selected has well specified residuals and (iii) check whether you detect breaks in the specification or not.
iii) With the estimated model apply a Choleski decomposition in the order (y, π, R) and check how the impulse responses compare with the true ones in i). Is there any noticeable difference? Why?
iv) Now try the ordering (R, y, π). Do you notice any difference with iii)? Why?
3) Obtain data for output and hours (employment) for your favorite country - each group should use a different country. Construct a measure of labor productivity and run a VAR on labor productivity and hours as in Gali (1999, AER) after you have appropriately selected the statistical nature of the model. Identify technology shocks as the only source of labor productivity movements in the long run. How much of the fluctuations in hours and labor productivity do they explain at the 4-year horizon? Repeat the exercise using the restriction that, in response to technology shocks, output and labor productivity must increase contemporaneously. Are technology shocks a major source of cyclical fluctuations?
Appendix: Inference in SVARs

- This applies to Choleski, non-recursive and long run identifications.
- This applies to classical or Bayesian inference (flat prior).

• Find the maximum likelihood estimators of Aj and A0 (this is enough to find the mode of the structural coefficients).
• Find the posterior distribution of Aj and A0 (this gives the posterior of the structural coefficients).

If the prior on Aj, A0 is non-informative and data are abundant, the shape of the likelihood is the same as the shape of the posterior. In the other cases Bayesian analysis is different from classical analysis.
Assume Σε = I. The likelihood of the SVAR is

L(Aj, A0|y) ∝ |A0⁻¹A0⁻¹′|^(−0.5T) exp{−0.5 Σt (yt − A(ℓ)yt−1)′ (A0⁻¹A0⁻¹′)⁻¹ (yt − A(ℓ)yt−1)}     (59)
           = |A0|^T exp{−0.5 Σt (yt − A(ℓ)yt−1)′ (A0⁻¹A0⁻¹′)⁻¹ (yt − A(ℓ)yt−1)}     (60)

If there are no restrictions on Aj, A(ℓ)ML = A(ℓ)OLS and var(A(ℓ)ML) = A0⁻¹A0⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹, with Yt−1 = [yt−1, ..., yt−p]. Nice, because easy to compute.
Substituting the estimator A(ℓ)ML into the likelihood we have:

L(A(ℓ) = A(ℓ)ML, A0|y) ∝ |A0|^T exp{−0.5 tr(SML A0′A0)}     (61)

where SML = (yt − A(ℓ)MLyt−1)′(yt − A(ℓ)MLyt−1)/(T − k), k is the number of regressors in each equation, and tr is the trace of the matrix.

Conclusion (two step approach):
a) Find A(ℓ)ML.
b) Maximize (61) to find A0.
c) Use the structural coefficients AjA0 to trace out the structural dynamics.

Typically difficult to maximize analytically; numerical routines are needed (both for likelihood and posterior computations).
Note that if, instead of conditioning on A(ℓ)ML, we integrate it out we have:

L(A0|y) ∝ |A0|^(T−k) exp{−0.5 tr(SML A0′A0)}

so if g(A0) ∝ |A0|^k, then g(A0|y) ∝ L(A0|y, A(ℓ) = A(ℓ)ML).

• Bayesian analysis with flat priors is equivalent to classical analysis conditional on A(ℓ)ML.
Summary

• Choleski identification, no restrictions on the VAR. Maximization of (61) implies that A0A0′ = (SML/T)⁻¹. Hence Â0 = chol((SML/T)⁻¹). Nice shortcut.
• Non-recursive identification, no restrictions on the VAR. Need to maximize (61) (no shortcuts possible).
• Long run restrictions. Note that the structural long run matrix is A(1)⁻¹A0⁻¹. If it is lower triangular, A0 can be found using A0A0′ = (SML/T)⁻¹ with A(1)⁻¹ML A0⁻¹ lower triangular. The solution is

Â0 = chol(A(1)⁻¹ML (SML/T) A(1)⁻¹′ML)⁻¹ A(1)⁻¹ML

• If the long run system is not recursive, the solution is more complicated.
• If the system is over-identified, can't use a two step approach. Need to jointly maximize the likelihood function with respect to Aj, A0.
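The Choleski shortcut can be verified numerically: the Choleski factor of (SML/T)⁻¹ is a lower-triangular A0 that satisfies the first-order condition A0A0′ = (SML/T)⁻¹ for free. A sketch (the covariance matrix is an arbitrary SPD example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# an arbitrary symmetric positive definite stand-in for S_ML / T
X = rng.standard_normal((500, 3))
Sigma = X.T @ X / 500

# shortcut: the Choleski factor of (S_ML/T)^{-1} is lower triangular
# and satisfies A0 A0' = (S_ML/T)^{-1} by construction
A0 = np.linalg.cholesky(np.linalg.inv(Sigma))
```

No numerical optimization of (61) is needed in this case, which is why the recursive scheme is so popular in practice.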
• With sign restrictions no maximization is needed. Find the region of the parameter space which satisfies the restrictions. This can be done numerically using a version of an acceptance sampling algorithm.
Monte Carlo standard errors for impulse responses

If the prior on Aj, A0 is non-informative, the posterior is proportional to the likelihood. The likelihood of the VAR is the product of a Normal for A(ℓ), conditional on A(ℓ)ML and Σ⁻¹, and a Wishart for Σ⁻¹. Then the algorithm works as follows (Choleski system):

Algorithm 10.1
1. Draw Σ⁻¹ from a Wishart, conditional on the data.
2. Set A0^l = chol((Σ⁻¹)^l).
3. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
4. Compute the structural coefficients A(ℓ)^l A0^l and the implied MA of the model.
5. Repeat steps 1.-4. L times. Order the draws and compute percentiles.
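Algorithm 10.1 can be sketched as follows (numpy only, constructing the Wishart draw directly; the VAR dimensions, the ML estimates and the (Y′Y)⁻¹ matrix are illustrative stand-ins, not quantities from the text):

```python
import numpy as np

rng = np.random.default_rng(4)

def wishart(df, scale, rng):
    # draw W ~ Wishart(df, scale) as Z'Z with the rows of Z ~ N(0, scale)
    Z = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=df)
    return Z.T @ Z

# illustrative posterior inputs from a fitted 2-variable VAR(1)
T, m = 200, 2
Sigma_hat = np.array([[1.0, 0.2],
                      [0.2, 0.8]])              # residual covariance estimate
A_ml = np.array([0.5, 0.1, -0.1, 0.6])          # vec of the ML lag matrix
XtX_inv = 0.01 * np.eye(m)                      # stand-in for (Y'Y)^{-1}

impacts = []
for _ in range(100):                            # L = 100 draws
    # 1. draw Sigma^{-1} from a Wishart, conditional on the data
    S_inv = wishart(T, np.linalg.inv(Sigma_hat) / T, rng)
    # 2. Choleski factor of the draw: A0 A0' = Sigma^{-1}
    A0 = np.linalg.cholesky(S_inv)
    # 3. draw the lag coefficients around their ML estimate
    V = np.kron(np.linalg.inv(S_inv), XtX_inv)
    A = rng.multivariate_normal(A_ml, V).reshape(m, m)
    # 4. structural responses at horizons 0 and 1: A0^{-1} and A A0^{-1}
    irf0 = np.linalg.inv(A0)
    impacts.append([irf0, A @ irf0])

# 5. order the draws and compute percentile bands
bands = np.percentile(np.array(impacts), [16, 50, 84], axis=0)
```

The percentile bands across the L draws are the Monte Carlo standard errors for the impulse responses; longer horizons would simply iterate the companion matrix before multiplying by A0⁻¹.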
If the restrictions are not in Choleski format, substitute step 2. with the maximization of the likelihood with respect to A0. If the system is overidentified, another approach is needed (see chapter 10). For long run restrictions use:

Algorithm 10.2
1. Draw Σ⁻¹ from a Wishart, conditional on the data.
2. Set A0^l = chol(A(1)⁻¹ML Σ^l A(1)⁻¹′ML)⁻¹ A(1)⁻¹ML.
3. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
4. Repeat steps 1.-3. L times. Order the draws and compute percentiles.
For a system where sign restrictions are imposed the approach is easy. One just needs draws for Σ and A0. The algorithm is:

Algorithm 10.3
1. Choose an H such that HH′ = I.
2. Draw Σ⁻¹ from a Wishart, conditional on the data.
3. Set A0^l = H (Σ^l)^0.5.
4. Draw A(ℓ)^l from a Normal with mean A(ℓ)ML and variance (A0^l)⁻¹HH′(A0^l)⁻¹′ ⊗ (Y′t−1Yt−1)⁻¹.
5. Compute the structural coefficients A(ℓ)^l A0^l and the implied MA of the model. If column i of the MA matrix satisfies the sign restriction, keep the draw; otherwise discard it.
6. Repeat steps 1.-5. until L draws are obtained. Order the draws and compute the median, mode, mean, percentiles, etc.

• One could also randomize on H. There are many H such that HH′ = I; one could have a prior on the Hs. Since H does not enter the likelihood, the posterior of H equals the prior of H.