
Bayesian Inference in the Seemingly Unrelated Regressions Model

William E. Griffiths*

Economics Department
University of Melbourne
Vic. 3010
Australia

April 18, 2001

Chapter prepared for Computer-Aided Econometrics, edited by David Giles, to be published by Marcel Dekker.

* I am grateful to Michael Chua for research assistance, and to Denzil Fiebig and Chris O'Donnell for comments on an earlier draft.



I. INTRODUCTION

Zellner's idea of combining several equations into one model to improve estimation efficiency (Zellner 1962) ranks as one of the most successful and lasting innovations in the history of econometrics. The resulting seemingly unrelated regressions (SUR) model has generated a wealth of both theoretical and empirical contributions. Reviews of work on or involving the SUR model can be found in Srivastava and Dwivedi (1979), Judge et al (1985), Srivastava and Giles (1987) and Fiebig (2001). It was also Zellner (in Zellner 1971) who popularised Bayesian inference in econometrics generally and described the SUR model within the context of Bayesian inference. However, at that time, convenient methods for deriving or estimating marginal posterior density functions and moments for individual SUR coefficients were not generally available. Subsequently, analytical results were derived for some special cases (Drèze and Morales 1976, Richard and Tompa 1980, Richard and Steel 1988, Steel 1992) and importance sampling was suggested as a means for estimating marginal posterior density functions and their moments (Kloek and van Dijk 1978). More recently, the application of Markov chain Monte Carlo (MCMC) methodology to Bayesian inference has made available a new range of numerical methods that make Bayesian estimation of the SUR model more convenient and accessible. The literature on MCMC is extensive; for a general appreciation of its scope and purpose, see Tierney (1994), Albert and Chib (1996), Chen et al (2000), Chib and Greenberg (1996), Gilks et al (1996), Tanner (1996), and the chapter by Geweke et al (this volume). For application of MCMC to the SUR model, see, for example, Percy (1992, 1996), Chib and Greenberg (1995), Griffiths and Chotikapanich (1997) and Griffiths et al (2000).



The objective of this chapter is to provide a practical guide to computer-aided Bayesian inference for a variety of problems that arise in applications of the SUR model. We describe examples of problems, models and algorithms that have been placed within a general framework in the chapter by Geweke et al (this volume); our chapter can be viewed as complementary to that chapter. The model is described in Section II; the joint, conditional and marginal posterior density functions that result from a noninformative prior are derived. In Section III we describe how to use sample draws of parameters from their posterior densities to estimate posterior quantities of interest; two Gibbs sampling algorithms and a Metropolis-Hastings algorithm are given. Modifications necessary for nonlinear equations, equality restrictions and inequality restrictions are presented in Sections IV, V and VI, respectively. Three applications are described in Section VII. Section VIII contains methodology for forecasting. Some extensions are briefly mentioned in Section IX and a few concluding remarks are given in Section X.

II. MODEL SPECIFICATION AND POSTERIORS FROM A NONINFORMATIVE PRIOR

Consider M equations written as

$$y_i = X_i\beta_i + e_i, \qquad i = 1, 2, \ldots, M \tag{1}$$

where $y_i$ is a T-dimensional vector of observations on a dependent variable, $X_i$ is a $(T \times K_i)$ matrix of observations on $K_i$ nonstochastic explanatory variables, possibly including a constant term, $\beta_i$ is a $K_i$-dimensional vector of unknown coefficients that we wish to estimate, and $e_i$ is a T-dimensional unobserved random vector. The M equations can be combined into one big model written as


$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} X_1 & & & \\ & X_2 & & \\ & & \ddots & \\ & & & X_M \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_M \end{bmatrix} \tag{2}$$

that we then write compactly as

$$y = X\beta + e \tag{3}$$

where y is of dimension $(TM \times 1)$, X is of dimension $(TM \times K)$ with $K = \sum_{i=1}^{M} K_i$, β is $(K \times 1)$ and e is $(TM \times 1)$. We assume the distribution for e is given by

$$e \sim N(0,\ \Sigma \otimes I_T) \tag{4}$$

Thus, the errors in each equation are homoskedastic and not autocorrelated. There is, however, contemporaneous correlation between corresponding errors in different equations. The variance of the error of the i-th equation we denote by $\sigma_{ii}$, the i-th diagonal element of Σ. The covariance between two corresponding errors in different equations (say i and j), we write as $\sigma_{ij}$, an off-diagonal element of Σ.

Using $f(\cdot)$ as generic notation for a probability density function (pdf), the likelihood function for β and Σ can be written as

$$f(y\,|\,\beta,\Sigma) = (2\pi)^{-MT/2}\,|\Sigma|^{-T/2}\exp\{-\tfrac{1}{2}(y - X\beta)'(\Sigma^{-1}\otimes I_T)(y - X\beta)\} \tag{5}$$

This pdf can also be written as

$$f(y\,|\,\beta,\Sigma) = (2\pi)^{-MT/2}\,|\Sigma|^{-T/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\} \tag{6}$$

where A is an $(M \times M)$ matrix with $(i,j)$-th element given by

$$[A]_{ij} = (y_i - X_i\beta_i)'(y_j - X_j\beta_j) \tag{7}$$

Note that A can also be written as


$$A = (Y - X_*B)'(Y - X_*B) \tag{8}$$

where Y is the $(T \times M)$ matrix $Y = (y_1, y_2, \ldots, y_M)$, $X_*$ is the $(T \times K)$ matrix $X_* = (X_1, X_2, \ldots, X_M)$, and B is the $(K \times M)$ block-diagonal matrix

$$B = \begin{bmatrix} \beta_1 & & & \\ & \beta_2 & & \\ & & \ddots & \\ & & & \beta_M \end{bmatrix} \tag{9}$$

Result (9) on page 42 of Lütkepohl (1996) can be used to establish the equivalence of equations (5) and (6). Specifically,

$$\operatorname{tr}[(Y - X_*B)'(Y - X_*B)\Sigma^{-1}] = [\operatorname{vec}(Y - X_*B)]'[\Sigma^{-1}\otimes I_T]\operatorname{vec}(Y - X_*B) = (y - X\beta)'(\Sigma^{-1}\otimes I_T)(y - X\beta) \tag{10}$$

Two prior pdfs will be considered in this chapter; they are the conventional noninformative prior (see, for example, Zellner 1971, ch.8)

$$f(\beta,\Sigma) = f(\beta)f(\Sigma) \propto |\Sigma|^{-(M+1)/2} \tag{11}$$

and another prior that imposes inequality restrictions on β, but is otherwise noninformative. The inequality prior and its consequences will be considered later in the chapter. The noninformative prior in (11) is chosen to provide objectivity in reporting, not because we believe total ignorance is prevalent. Geweke et al (chapter in this volume) discuss how to modify results to accommodate the prior of a specific client.

A. Joint Posterior pdf for (β, Σ)

Applying Bayes' theorem to the prior pdf in (11) and the likelihood function in (5) and (6) yields the joint posterior pdf for β and Σ:

$$\begin{aligned} f(\beta,\Sigma\,|\,y) &\propto f(y\,|\,\beta,\Sigma)\,f(\beta,\Sigma) \\ &\propto |\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}(y-X\beta)'(\Sigma^{-1}\otimes I_T)(y-X\beta)\} \\ &= |\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\} \end{aligned} \tag{12}$$

In the remainder of this section we describe a number of marginal and conditional posterior pdf's that are derived from equation (12). These pdf's will prove useful in later sections when we discuss methods for estimating quantities of interest. We will assume that interest centers on individual coefficients, say the k-th coefficient in the i-th equation $\beta_{ik}$, and, more generally, on some functions of the coefficients, say g(β). Forecasting future values $y_*$ will also be considered. The relevant pdf's that express our uncertain post-sample knowledge about these quantities are the marginal pdf's $f(\beta_{ik}\,|\,y)$, $f(g(\beta)\,|\,y)$ and $f(y_*\,|\,y)$, respectively. Typically, we report results by graphing these pdf's, and tabulating their means, standard deviations and probabilities for regions of interest. Describing the tools for doing so is the major focus of this chapter.

B. Conditional Posterior pdf for (β | Σ)

The term in the exponent of equation (12) can be written as

$$(y-X\beta)'(\Sigma^{-1}\otimes I_T)(y-X\beta) = (y-X\hat\beta)'(\Sigma^{-1}\otimes I_T)(y-X\hat\beta) + (\beta-\hat\beta)'X'(\Sigma^{-1}\otimes I_T)X(\beta-\hat\beta) \tag{13}$$

where $\hat\beta = [X'(\Sigma^{-1}\otimes I_T)X]^{-1}X'(\Sigma^{-1}\otimes I_T)y$. It follows that the conditional posterior pdf for β given Σ is the multivariate normal pdf

$$f(\beta\,|\,\Sigma,y) \propto \exp\{-\tfrac{1}{2}(\beta-\hat\beta)'X'(\Sigma^{-1}\otimes I_T)X(\beta-\hat\beta)\} \tag{14}$$

with posterior mean equal to the generalised least squares estimator


$$E(\beta\,|\,y,\Sigma) = \hat\beta = [X'(\Sigma^{-1}\otimes I_T)X]^{-1}X'(\Sigma^{-1}\otimes I_T)y \tag{15}$$

and posterior covariance matrix equal to

$$V(\beta\,|\,y,\Sigma) = [X'(\Sigma^{-1}\otimes I_T)X]^{-1} \tag{16}$$

The last two expressions are familiar ones in sampling theory inference for the SUR model. They show that the traditional SUR estimator, written as

$$\hat\beta = [X'(\hat\Sigma^{-1}\otimes I_T)X]^{-1}X'(\hat\Sigma^{-1}\otimes I_T)y \tag{17}$$

where $\hat\Sigma$ is a 2-step estimator or a maximum likelihood estimator, can be viewed as the mean of the conditional posterior pdf for β given $\hat\Sigma$. The traditional covariance matrix estimator $[X'(\hat\Sigma^{-1}\otimes I_T)X]^{-1}$ can be viewed as the conditional covariance matrix from the same pdf. Since this pdf does not take into account uncertainty from not knowing Σ (the fact that $\hat\Sigma$ is an estimate is not recognised), it overstates the reliability of our information about β. This dilemma was noted by Fiebig and Kim (2000) in the context of an increasing number of equations.
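As a computational aside, (15) and (16) map directly into a few lines of linear algebra. The following Python sketch (the function name and interface are ours, not the chapter's) assumes X and y are the stacked quantities of equations (2) and (3):

```python
import numpy as np

def gls_posterior_moments(X, y, Sigma, T):
    """Conditional posterior mean (15) and covariance (16) of beta given
    Sigma, with X the stacked (TM x K) matrix and y the (TM,) vector."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))  # Sigma^{-1} kron I_T
    V = np.linalg.inv(X.T @ W @ X)                # eq (16)
    beta_hat = V @ (X.T @ W @ y)                  # eq (15)
    return beta_hat, V
```

Evaluating these at a two-step or maximum likelihood estimate of Σ reproduces the traditional SUR estimator (17) and its covariance matrix.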

C. Marginal Posterior pdf for β

The more appropriate representation of our uncertainty about β is the marginal posterior pdf $f(\beta\,|\,y)$. It can be shown that this pdf is given by

$$f(\beta\,|\,y) = \int f(\beta,\Sigma\,|\,y)\, d\Sigma \propto |A|^{-T/2} \tag{18}$$

The integral in (18) is performed by using properties of the inverted Wishart distribution (see, for example, Zellner 1971, p.395). For $f(\beta\,|\,y) \propto |A|^{-T/2}$ to be proper, we require $T \ge M + \operatorname{rank}(X_*)$ (Griffiths et al 2001). Also, this pdf is not of a standard recognisable form. Except for special cases, analytical expressions for its normalising constant and moments are not available. Estimating these moments, and marginal pdf's for individual coefficients $\beta_{ik}$, is considered in the next section; first, we describe some more pdf's that will prove to be useful.

D. Conditional Posterior pdf for (β1 | β2, ..., βM)

It is possible to show that the posterior pdf for the coefficient vector from one equation, conditional on those from the other equations, is a multivariate t-distribution. To derive this result, we will consider the posterior pdf for $\beta_1$, conditional on $(\beta_2, \beta_3, \ldots, \beta_M)$. We write a partition of $(Y - X_*B)$ into its first and remaining $(M-1)$ columns as

$$Y - X_*B = \big(y_1 - X_1\beta_1 \quad E_{(1)}\big)$$

The corresponding partition of A is

$$A = \begin{bmatrix} (y_1 - X_1\beta_1)'(y_1 - X_1\beta_1) & (y_1 - X_1\beta_1)' E_{(1)} \\ E_{(1)}'(y_1 - X_1\beta_1) & E_{(1)}' E_{(1)} \end{bmatrix}$$

Using a result on the determinant of a partitioned matrix, we have

$$|A| = |E_{(1)}'E_{(1)}|\,\big((y_1 - X_1\beta_1)'(y_1 - X_1\beta_1) - (y_1 - X_1\beta_1)'E_{(1)}(E_{(1)}'E_{(1)})^{-1}E_{(1)}'(y_1 - X_1\beta_1)\big)$$

Defining $Q_{(1)} = I_T - E_{(1)}(E_{(1)}'E_{(1)})^{-1}E_{(1)}'$ and $\tilde\beta_1 = (X_1'Q_{(1)}X_1)^{-1}X_1'Q_{(1)}y_1$, the second term in the above equation can be written as

$$(y_1 - X_1\beta_1)'Q_{(1)}(y_1 - X_1\beta_1) = (y_1 - X_1\tilde\beta_1)'Q_{(1)}(y_1 - X_1\tilde\beta_1) + (\beta_1 - \tilde\beta_1)'X_1'Q_{(1)}X_1(\beta_1 - \tilde\beta_1)$$

Collecting all these results, substituting into equation (18), and letting $|E_{(1)}'E_{(1)}|$ be absorbed into the proportionality constant, we can write

$$f(\beta_1\,|\,y, \beta_2, \beta_3, \ldots, \beta_M) \propto \left[v_1 + \frac{(\beta_1 - \tilde\beta_1)' X_1' Q_{(1)} X_1 (\beta_1 - \tilde\beta_1)}{\tilde s_1^2}\right]^{-(K_1 + v_1)/2} \tag{19}$$

where $v_1 = T - K_1$ and $\tilde s_1^2 = (y_1 - X_1\tilde\beta_1)' Q_{(1)} (y_1 - X_1\tilde\beta_1)/v_1$. Equation (19) is in the form of a multivariate t-distribution with degrees of freedom $v_1$, mean $\tilde\beta_1$, and covariance matrix $\big(v_1/(v_1 - 2)\big)\, \tilde s_1^2\, (X_1' Q_{(1)} X_1)^{-1}$. See, for example, Zellner (1971, p.383). The conditional posterior pdf's for the other $\beta_i$ are similarly defined.

E. Conditional Posterior pdf for (Σ | β)

Viewing the joint posterior pdf in equation (12) as a function of only Σ yields the conditional posterior pdf for Σ given β. It is the inverted Wishart pdf (see, for example, Zellner 1971, p.395)

$$f(\Sigma\,|\,\beta,y) \propto |\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\} \tag{20}$$

It has T degrees of freedom, and parameter matrix A.

F. Marginal Posterior pdf for Σ

The marginal pdf for Σ, obtained by using the result in (13), and then using properties of the multivariate normal pdf to integrate out β, is given by

$$\begin{aligned} f(\Sigma\,|\,y) &= \int f(\beta,\Sigma\,|\,y)\, d\beta \\ &\propto |X'(\Sigma^{-1}\otimes I_T)X|^{-1/2}\,|\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}(y-X\hat\beta)'(\Sigma^{-1}\otimes I_T)(y-X\hat\beta)\} \\ &= |X'(\Sigma^{-1}\otimes I_T)X|^{-1/2}\,|\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(\hat A\Sigma^{-1})\} \end{aligned} \tag{21}$$

where $\hat A$ is an $(M \times M)$ matrix with $(i,j)$-th element given by

$$[\hat A]_{ij} = (y_i - X_i\hat\beta_i)'(y_j - X_j\hat\beta_j)$$

The posterior pdf in (21) is not an analytically tractable one whose moments are known. However, as we will see, we can draw observations from it using the Gibbs sampler.

III. ESTIMATING POSTERIOR QUANTITIES

Given the intractability of the posterior pdf $f(\beta\,|\,y) \propto |A|^{-T/2}$, methods for estimating marginal posterior pdf's for individual coefficients $\beta_{ik}$, their moments, and probabilities of interest, are required. Suppose that we have draws $\beta^{(1)}, \beta^{(2)}, \ldots, \beta^{(N)}$ taken from $f(\beta\,|\,y)$ and, possibly, draws $\Sigma^{(1)}, \Sigma^{(2)}, \ldots, \Sigma^{(N)}$ taken from $f(\Sigma\,|\,y)$. We will describe a number of ways one can proceed to estimate the desired quantities; then, we discuss how the required posterior draws can be obtained.

A. Estimating Posterior pdf's

A simple way to estimate the marginal posterior pdf of $\beta_{ik}$, say, is to construct a histogram of draws of that parameter. Joining the mid points of the histogram classes provides a continuous representation of the pdf, but, typically, it will be a jagged one unless some kind of smoothing procedure is employed. Alternatively, one can obtain a smooth pdf, and a more efficient estimate, by averaging conditional posterior pdf's for the quantity of interest. In this case, for conditional posterior pdf's one can use the t-distributions defined by (19), or, if draws on both β and Σ are available, the normal distributions defined by (14).

Considering the t-distribution first, an estimate of $f(\beta_{ik}\,|\,y)$ is given by

$$\hat f(\beta_{ik}\,|\,y) = \frac{1}{N}\sum_{\ell=1}^{N} f\big(\beta_{ik}\,\big|\,y, \beta_1^{(\ell)}, \ldots, \beta_{i-1}^{(\ell)}, \beta_{i+1}^{(\ell)}, \ldots, \beta_M^{(\ell)}\big) = c\,\frac{1}{N}\sum_{\ell=1}^{N} \frac{1}{\tilde s_i^{(\ell)}\big(q_{(i)kk}^{(\ell)}\big)^{1/2}}\left[v_i + \frac{\big(\beta_{ik} - \tilde\beta_{ik}^{(\ell)}\big)^2}{\tilde s_i^{2(\ell)}\, q_{(i)kk}^{(\ell)}}\right]^{-(v_i+1)/2} \tag{22}$$

The univariate t-distribution that is being averaged in equation (22) is the conditional pdf for a single coefficient from $\beta_i$, obtained from the multivariate t-distribution in (19), after generalising from $\beta_1$ to $\beta_i$. The previously undefined terms in (22) are the constant

$$c = \frac{\Gamma[(v_i+1)/2]\, v_i^{v_i/2}}{\Gamma(v_i/2)\, \pi^{1/2}}$$

where $\Gamma(\cdot)$ is the gamma function, the conditional posterior mean $\tilde\beta_{ik}$, which is the k-th element in $\tilde\beta_i$, and $q_{(i)kk}$, which is the k-th diagonal element of $(X_i' Q_{(i)} X_i)^{-1}$. To plot the pdf in (22), we choose a grid of values for $\beta_{ik}$ (50-100 is usually adequate), and for each value of $\beta_{ik}$, we compute the average in (22). These averages are plotted against the $\beta_{ik}$.

Alternatively, the conditional normal distributions in (14) can be averaged over Σ. In this case an estimate of the marginal posterior pdf for $\beta_{ik}$ is given by

$$\hat f(\beta_{ik}\,|\,y) = \frac{1}{N}\sum_{\ell=1}^{N} f\big(\beta_{ik}\,\big|\,y, \Sigma^{(\ell)}\big) = \frac{1}{N}\sum_{\ell=1}^{N} \frac{1}{\sqrt{2\pi h_{(i)kk}^{(\ell)}}} \exp\left\{-\frac{1}{2h_{(i)kk}^{(\ell)}}\big(\beta_{ik} - \hat\beta_{ik}^{(\ell)}\big)^2\right\} \tag{23}$$

where $\hat\beta_{ik}^{(\ell)}$ is the k-th element in the i-th vector component of $\hat\beta$ evaluated at $\Sigma^{(\ell)}$ (see equation (15)), and $h_{(i)kk}^{(\ell)}$ is the k-th diagonal element in the i-th diagonal block of $[X'((\Sigma^{(\ell)})^{-1}\otimes I_T)X]^{-1}$ (see equation (16)). As in (22), the average in (23) is computed for, and plotted against, a grid of values for $\beta_{ik}$.
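A minimal sketch of the averaging in (23), reusing the hypothetical gls_posterior_moments helper from Section II; idx is the position of $\beta_{ik}$ within the stacked coefficient vector:

```python
import numpy as np

def marginal_pdf_estimate(grid, Sigma_draws, X, y, T, idx):
    """Average the normal conditionals of eq (23) over draws of Sigma to
    estimate f(beta_ik | y) on a grid of 50-100 values."""
    dens = np.zeros_like(grid, dtype=float)
    for Sigma in Sigma_draws:
        b_hat, V = gls_posterior_moments(X, y, Sigma, T)
        m, h = b_hat[idx], V[idx, idx]   # conditional mean and variance
        dens += np.exp(-0.5 * (grid - m) ** 2 / h) / np.sqrt(2 * np.pi * h)
    return dens / len(Sigma_draws)
```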

B. Estimating Posterior Means and Standard Deviations

Corresponding to the three ways given for estimating posterior pdf's, there are three ways of estimating their posterior means and variances. The first way is to use the sample mean and covariance matrix of the draws. That is,

$$\hat E(\beta\,|\,y) = \bar\beta = \frac{1}{N}\sum_{\ell=1}^{N}\beta^{(\ell)} \tag{24}$$

and

$$\hat V(\beta\,|\,y) = \frac{1}{N-1}\sum_{\ell=1}^{N}\big(\beta^{(\ell)}-\bar\beta\big)\big(\beta^{(\ell)}-\bar\beta\big)' \tag{25}$$

The second and third approaches use the results that (1) an unconditional mean is equal to the mean of the conditional means, and (2) the unconditional variance is equal to the mean of the conditional variances plus the variance of the conditional means. Applying these two results to the conditional posterior pdf in (19) yields

$$\hat E(\beta_i\,|\,y) = \frac{1}{N}\sum_{\ell=1}^{N}\tilde\beta_i^{(\ell)} = \frac{1}{N}\sum_{\ell=1}^{N}\big(X_i'Q_{(i)}^{(\ell)}X_i\big)^{-1}X_i'Q_{(i)}^{(\ell)}y_i = \bar{\tilde\beta}_i \tag{26}$$

and

$$\hat V(\beta_i\,|\,y) = \left(\frac{v_i}{v_i-2}\right)\frac{1}{N}\sum_{\ell=1}^{N}\tilde s_i^{2(\ell)}\big(X_i'Q_{(i)}^{(\ell)}X_i\big)^{-1} + \frac{1}{N-1}\sum_{\ell=1}^{N}\big(\tilde\beta_i^{(\ell)}-\bar{\tilde\beta}_i\big)\big(\tilde\beta_i^{(\ell)}-\bar{\tilde\beta}_i\big)' \tag{27}$$

Applying the two results to the normal conditional posterior pdf's in (14) yields


$$\hat E(\beta\,|\,y) = \frac{1}{N}\sum_{\ell=1}^{N}\hat\beta^{(\ell)} = \frac{1}{N}\sum_{\ell=1}^{N}\big[X'((\Sigma^{(\ell)})^{-1}\otimes I_T)X\big]^{-1}X'((\Sigma^{(\ell)})^{-1}\otimes I_T)y = \bar{\hat\beta} \tag{28}$$

and

$$\hat V(\beta\,|\,y) = \frac{1}{N}\sum_{\ell=1}^{N}\big[X'((\Sigma^{(\ell)})^{-1}\otimes I_T)X\big]^{-1} + \frac{1}{N-1}\sum_{\ell=1}^{N}\big(\hat\beta^{(\ell)}-\bar{\hat\beta}\big)\big(\hat\beta^{(\ell)}-\bar{\hat\beta}\big)' \tag{29}$$

Clearly, using the sample means and standard deviations from equations (24) and (25) is much easier than using the conditional quantities in equations (26) through (29). However, averaging conditional moments generally leads to more efficient estimates.
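For the first approach, a sketch of (24) and (25) applied to an (N, K) array of draws; the Rao-Blackwellised alternatives (26) through (29) are obtained by replacing the raw draws with conditional means and adding the average conditional covariance:

```python
import numpy as np

def posterior_mean_cov(beta_draws):
    """Sample mean (24) and covariance (25) from an (N, K) array of draws;
    posterior standard deviations are the square roots of the diagonal."""
    mean = beta_draws.mean(axis=0)
    cov = np.cov(beta_draws, rowvar=False)  # divides by N-1, matching (25)
    return mean, cov
```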

C. Estimating Probabilities

Often, we are interested in reporting the probability that $\beta_{ik}$ lies within a particular interval or finding an interval with a pre-specified probability content. In sampling theory inference, intervals with 95% probability content are popular. An estimate of the probability that $\beta_{ik}$ lies in a particular interval is given by the proportion of draws that lie within that interval. Alternatively, one can find conditional probabilities and average them, along the lines that the conditional means are averaged in equations (26) and (28). Using the conditional normal distribution as an example, we can estimate the probability that $\beta_{ik}$ lies in the interval $(a, b)$ as

$$\hat P(a < \beta_{ik} < b) = \frac{1}{N}\sum_{\ell=1}^{N} P\big(a < (\beta_{ik}\,|\,\Sigma^{(\ell)}) < b\big) \tag{30}$$

Order statistics can be used to obtain an interval with a prespecified probability content. For example, for a 95% probability interval for $\beta_{ik}$, we can take the 0.025 and 0.975 empirical quantiles of the draws of the $\beta_{ik}$.
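Both devices are one-line operations on the stored draws; a sketch:

```python
import numpy as np

def interval_summary(draws_ik, a, b):
    """Proportion of draws of beta_ik in (a, b), plus a 95% probability
    interval from the 0.025 and 0.975 empirical quantiles."""
    prob = np.mean((draws_ik > a) & (draws_ik < b))
    lo, hi = np.quantile(draws_ik, [0.025, 0.975])
    return prob, (lo, hi)
```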



D. Functions of β

How do we proceed if we are interested in some functions of β, say g(β)? Examples of such functions considered later in this chapter are monotonicity and curvature conditions in a cost system, and the relative magnitudes of equivalence scales in household expenditure functions. Examples outside the context of SUR models are the evaluation of consumer surplus (Griffiths 1999) and the stationary region in a time series model (Geweke 1988).

If it is possible to derive, analytically, conditional distributions of the form $f(g(\beta_1)\,|\,y, \beta_2, \ldots, \beta_M)$ or $f(g(\beta)\,|\,y, \Sigma)$, then one can work with these conditional distributions along the lines described above. However, the ability to proceed analytically is rare, given that g(β) is frequently nonlinear and of lower dimension than β. Instead, we can compute values $g(\beta^{(\ell)})$, $\ell = 1, 2, \ldots, N$, from the draws of β. These values can be placed in a histogram to estimate the pdf of g(β). Their sample mean and variance can be used to estimate the corresponding posterior mean and variance. Probabilities can be estimated using the proportion of values in a given region, and order statistics can be used to find an interval with a given probability content.

E. Gibbs Sampling with β and Σ

We now turn to the question of how to obtain draws of β and Σ from their respective marginal posterior pdf's. One possible way is to use an MCMC algorithm known as Gibbs sampling. In this procedure draws are made iteratively from the conditional posterior pdf's. Specifically, given a particular starting value for Σ, say $\Sigma^{(0)}$, the $\ell$-th draw from the Gibbs sampler, $(\beta^{(\ell)}, \Sigma^{(\ell)})$, is obtained using the following two steps:

1. Draw $\beta^{(\ell)}$ from $f(\beta\,|\,\Sigma^{(\ell-1)}, y)$.

2. Draw $\Sigma^{(\ell)}$ from $f(\Sigma\,|\,\beta^{(\ell)}, y)$.
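A minimal sketch of this two-block sampler under the noninformative prior (11), assuming the stacked data of Section II; scipy's invwishart parameterisation matches (20) with df = T and scale matrix A (the function name and defaults are ours):

```python
import numpy as np
from scipy.stats import invwishart

def sur_gibbs(y_list, X_list, n_draws=5000, burn_in=1000, seed=0):
    """Two-block Gibbs sampler for the SUR model: alternates step 1
    (beta | Sigma, eq (14)) and step 2 (Sigma | beta, eq (20))."""
    rng = np.random.default_rng(seed)
    M, T = len(y_list), len(y_list[0])
    K = sum(Xi.shape[1] for Xi in X_list)
    y = np.concatenate(y_list)               # stacked (TM,) vector, eq (3)
    X = np.zeros((M * T, K))                 # block-diagonal X, eq (2)
    col = 0
    for i, Xi in enumerate(X_list):
        X[i * T:(i + 1) * T, col:col + Xi.shape[1]] = Xi
        col += Xi.shape[1]
    Sigma = np.eye(M)                        # starting value Sigma^(0)
    betas, Sigmas = [], []
    for it in range(burn_in + n_draws):
        # step 1: beta ~ N(beta_hat, [X'(Sigma^-1 kron I_T)X]^-1)
        W = np.kron(np.linalg.inv(Sigma), np.eye(T))
        V = np.linalg.inv(X.T @ W @ X)
        beta = rng.multivariate_normal(V @ (X.T @ W @ y), V)
        # step 2: Sigma ~ inverted Wishart with df = T, parameter matrix A
        E = (y - X @ beta).reshape(M, T).T   # (T, M) residual matrix
        Sigma = invwishart.rvs(df=T, scale=E.T @ E, random_state=rng)
        if it >= burn_in:                    # discard the burn-in draws
            betas.append(beta)
            Sigmas.append(Sigma)
    return np.array(betas), np.array(Sigmas)
```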

Making these draws is straightforward, given that the two conditional posterior pdf's are normal and inverted Wishart, respectively. (See the appendix for details.) MCMC theory suggests that, after a sufficiently large number of draws, the Markov chain created by the draws will converge. After convergence, the subsequent draws can be viewed as draws from the marginal posterior pdf's $f(\beta\,|\,y)$ and $f(\Sigma\,|\,y)$. It is these draws that can be used to present results in the desired fashion. Draws taken prior to the point at which convergence is assumed to have taken place are sometimes called the "burn in"; they are discarded. A large number of diagnostics have been suggested for assessing whether convergence has taken place. See, for example, Cowles and Carlin (1996). Assessing whether convergence has taken place is similar to assessing whether a time series is stationary. Thus, visual inspection of a graph of the sequence of draws, testing whether the mean and variance are the same at the beginning of the chain as at the end of the chain, and testing whether two or more separately-run chains have the same mean and variance, are ways of checking for convergence.

Since we are using a sample of draws of β and Σ to estimate posterior means and standard deviations and other relevant population quantities, the accuracy of the estimates is a concern. Estimation accuracy is assessed using numerical standard errors. Methods for computing such standard errors are described in the chapter by Geweke et al. Because the draws produced by MCMC algorithms are correlated, time series methods are used to compute the standard errors; also, larger samples are required to achieve a given level of accuracy relative to a situation involving independent draws.
<strong>in</strong>dependent draws.


Although the above remarks on convergence and numerical standard errors were made in the context of the Gibbs sampler for β and Σ, they also apply to other MCMC algorithms, including the Gibbs sampler for β and the Metropolis-Hastings algorithm described below.

F. Gibbs Sampling with β

If the number of equations is large, making Σ of high dimension, then it may be preferable to use a Gibbs sampler based on the conditional posterior pdfs for the $\beta_i$ from each equation. Note, however, that this alternative is not feasible if cross-equation restrictions on the $\beta_i$, as discussed in Sections V and VII, are present.

To proceed with this Gibbs sampler, we begin with starting values for all coefficients except the first, say $(\beta_2^{(0)}, \beta_3^{(0)}, \ldots, \beta_M^{(0)})$, and then sample iteratively using the following steps for the $\ell$-th draw:

1. Draw $\beta_1^{(\ell)}$ from $f(\beta_1\,|\,\beta_2^{(\ell-1)}, \ldots, \beta_M^{(\ell-1)})$.

2. Draw $\beta_2^{(\ell)}$ from $f(\beta_2\,|\,\beta_1^{(\ell)}, \beta_3^{(\ell-1)}, \ldots, \beta_M^{(\ell-1)})$.

⋮

i. Draw $\beta_i^{(\ell)}$ from $f(\beta_i\,|\,\beta_1^{(\ell)}, \ldots, \beta_{i-1}^{(\ell)}, \beta_{i+1}^{(\ell-1)}, \ldots, \beta_M^{(\ell-1)})$.

⋮

M. Draw $\beta_M^{(\ell)}$ from $f(\beta_M\,|\,\beta_1^{(\ell)}, \ldots, \beta_{M-1}^{(\ell)})$.

The conditional posterior pdfs are multivariate t-distributions from which we can readily draw values (see the Appendix). Ordinary least squares estimates are adequate for starting values.
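A sketch of one such step, constructing $Q_{(i)}$ from the current residuals of the other equations and drawing from the multivariate t of (19) as a scale mixture of normals (this is one standard way to draw multivariate t values; the chapter's Appendix may differ in detail):

```python
import numpy as np

def draw_beta_i(i, y_list, X_list, betas, rng):
    """One Gibbs step: draw beta_i from the multivariate t conditional of
    eq (19), generalised from beta_1 to beta_i, given the other equations'
    current coefficient draws in the list `betas`."""
    M, T = len(y_list), len(y_list[0])
    # E_(i): residuals of the other M-1 equations at their current draws
    E = np.column_stack([y_list[j] - X_list[j] @ betas[j]
                         for j in range(M) if j != i])
    Q = np.eye(T) - E @ np.linalg.solve(E.T @ E, E.T)     # Q_(i)
    Xi, yi = X_list[i], y_list[i]
    XQX_inv = np.linalg.inv(Xi.T @ Q @ Xi)
    b_loc = XQX_inv @ (Xi.T @ Q @ yi)                     # location
    v = T - Xi.shape[1]                                   # d.o.f. v_i
    s2 = (yi - Xi @ b_loc) @ Q @ (yi - Xi @ b_loc) / v    # scale s_i^2
    # multivariate t draw: location + normal scaled by sqrt(v / chi2_v)
    z = rng.multivariate_normal(np.zeros(len(b_loc)), s2 * XQX_inv)
    return b_loc + z / np.sqrt(rng.chisquare(v) / v)
```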


G. A Metropolis-Hastings Algorithm

An alternative to Gibbs sampling is a Metropolis-Hastings algorithm that draws observations from the marginal posterior pdf $f(\beta\,|\,y)$. As we will see, this algorithm is particularly useful for an inequality-restricted prior, or if the equations are nonlinear. The algorithm we describe is a random-walk algorithm; it is just one of many possibilities. For others see, for example, Chen et al (2000).

The Metropolis-Hastings algorithm generates a candidate value β* that is accepted or rejected as a draw from the posterior pdf $f(\beta\,|\,y)$. When it is rejected, the previously accepted draw is repeated as a draw. Thus, rules are needed for generating the candidate value β* and for accepting it. Let V be the covariance matrix for the distribution used to generate a candidate value. The maximum likelihood covariance matrix is usually suitable; for the linear SUR model this matrix is $[X'(\hat\Sigma^{-1}\otimes I_T)X]^{-1}$. Choose a feasible starting value $\beta^{(0)}$. The following steps can be used to draw the $(\ell+1)$-th observation in a random walk Metropolis-Hastings chain.

1. Draw a candidate value β* from a $N(\beta^{(\ell)}, cV)$ distribution, where c is a scalar set such that β* is accepted approximately 40-50% of the time.

2. Compute the ratio of the posterior pdf evaluated at the candidate draw to the posterior pdf evaluated at the previously accepted draw:

$$r = \frac{f(\beta^*\,|\,y)}{f(\beta^{(\ell)}\,|\,y)}$$

Note that this ratio can be computed without knowledge of the normalising constant for $f(\beta\,|\,y)$. Also, if any of the elements of β* fall outside a feasible parameter region defined by an inequality-restricted prior (see Section VI), then $f(\beta^*\,|\,y) = 0$. When $r > 1$, β* is a more likely value than $\beta^{(\ell)}$ in the sense that it is closer to the mode of the distribution. When $r < 1$, β* is further into the tails of the distribution. If $r > 1$, β* is accepted; if $r < 1$, β* is accepted with probability r. Thus, more draws occur in regions of high probability and fewer draws occur in regions of low probability. Details of the acceptance-rejection procedure follow in step 3.

3. Draw a value u for a uniform random variable on the interval (0,1). If $u \le r$, set $\beta^{(\ell+1)} = \beta^*$. If $u > r$, set $\beta^{(\ell+1)} = \beta^{(\ell)}$. Return to step 1 with $\ell$ set to $\ell + 1$.

Let $q(\beta^*\,|\,\beta^{(\ell)})$ be the distribution used to generate the candidate value β* in step 1. In our case it is a normal distribution. In more general Metropolis-Hastings algorithms, where our choice of distribution is not necessarily utilized, r is defined as

$$r = \frac{f(\beta^*\,|\,y)\; q(\beta^{(\ell)}\,|\,\beta^*)}{f(\beta^{(\ell)}\,|\,y)\; q(\beta^*\,|\,\beta^{(\ell)})}$$

In our case $q(\beta^{(\ell)}\,|\,\beta^*) = q(\beta^*\,|\,\beta^{(\ell)})$, so the two definitions of r coincide. Various alternatives for $q(\cdot)$ have been suggested in the literature.
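A sketch of the random-walk chain, working on the log scale with $\log f(\beta\,|\,y) = \log f(\beta) - (T/2)\log|A| + \text{const}$, which covers both the noninformative case (18) and the general-prior case (36) of the next section. The user supplies resid(beta), returning the (T, M) residual matrix, and an optional feasibility indicator for Section VI; all names are ours:

```python
import numpy as np

def rw_metropolis(resid, beta0, V, c=1.0, n_draws=10000, seed=0,
                  feasible=lambda b: True, log_prior=lambda b: 0.0):
    """Random-walk Metropolis-Hastings chain targeting f(beta | y)."""
    rng = np.random.default_rng(seed)

    def log_post(b):
        E = resid(b)                             # (T, M) residual matrix
        _, logdet = np.linalg.slogdet(E.T @ E)   # log |A|
        return log_prior(b) - 0.5 * E.shape[0] * logdet

    beta = np.asarray(beta0, dtype=float)
    lp = log_post(beta)
    L = np.linalg.cholesky(c * V)                # proposal N(beta, cV)
    draws, accepted = [], 0
    for _ in range(n_draws):
        cand = beta + L @ rng.standard_normal(beta.size)
        if feasible(cand):                       # infeasible candidate: r = 0
            lp_cand = log_post(cand)
            if np.log(rng.uniform()) <= lp_cand - lp:   # accept w.p. min(1, r)
                beta, lp, accepted = cand, lp_cand, accepted + 1
        draws.append(beta.copy())
    # tune c so the returned acceptance rate is roughly 0.4-0.5
    return np.array(draws), accepted / n_draws
```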

IV. NONLINEAR SUR

Many economic models are intrinsically nonlinear, or a nonlinear model may result from substituting nonlinear restrictions on β into a linear model. The Gibbs sampling algorithms that we described are no longer applicable for a nonlinear SUR model. However, we can still proceed with the Metropolis-Hastings algorithm.

Suppose that the nonlinear SUR model is given by

$$y_i = h_i(X, \beta) + e_i, \qquad i = 1, 2, \ldots, M \tag{31}$$

where the $h_i$ are the nonlinear functions. The dimensions of $y_i$, $h_i$ and $e_i$ are $(T \times 1)$. In this context X represents a set of explanatory variables and β is the vector of all unknown coefficients. The omission of an i-subscript on X and β is deliberate; the same coefficients and the same explanatory variables can occur in different equations. The earlier assumptions about the $e_i$ are retained.

With a nonlinear model, $f(\beta) \propto$ constant may no longer be suitable as a noninformative prior; consideration needs to be given to the type of nonlinear function and to whether particular values for some parameters need to be excluded. Thus, we give results for a general prior on β, denoted by $f(\beta)$. We retain the noninformative prior $f(\Sigma) \propto |\Sigma|^{-(M+1)/2}$, and assume a priori independence of β and Σ. Thus, the prior pdf is given by

$$f(\beta, \Sigma) \propto |\Sigma|^{-(M+1)/2} f(\beta) \tag{32}$$

The likelihood function can be written as

$$\begin{aligned} f(y\,|\,\beta,\Sigma) &= (2\pi)^{-MT/2}\,|\Sigma|^{-T/2}\exp\{-\tfrac{1}{2}(y - h(X,\beta))'(\Sigma^{-1}\otimes I_T)(y - h(X,\beta))\} \\ &= (2\pi)^{-MT/2}\,|\Sigma|^{-T/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\} \end{aligned} \tag{33}$$

where $y = (y_1', y_2', \ldots, y_M')'$, $h = (h_1', h_2', \ldots, h_M')'$, and, now, A is an $(M \times M)$ matrix with $(i,j)$-th element given by

$$[A]_{ij} = [y_i - h_i(X,\beta)]'[y_j - h_j(X,\beta)] \tag{34}$$

The joint posterior pdf for (β, Σ) is

$$f(\beta,\Sigma\,|\,y) \propto f(\beta)\,|\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\} \tag{35}$$

and, integrating out Σ, the marginal posterior pdf for β is

$$f(\beta\,|\,y) \propto f(\beta)\,|A|^{-T/2} \tag{36}$$

Thus, the posterior for β in the nonlinear SUR model involves the same determinant of sums of squares and cross products of residuals as it does in the linear model. A more general prior has been added. (Of course, it also could have been included in the linear model.)

The Metropolis-Hastings algorithm described in Section III can be readily applied to the posterior pdf in equation (36). Because the earlier results on conditional posterior pdfs for β and the $\beta_i$ no longer hold, the draws need to be used directly to estimate posterior pdfs and their moments.
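For instance, the rw_metropolis sketch of Section III needs only a residual function. A hypothetical two-equation system sharing a coefficient across equations (purely illustrative, not from the chapter) could be coded as:

```python
import numpy as np

# Hypothetical system (illustration only):
#   y1 = b0 + b1 * x ** b2 + e1,   y2 = b0 + b3 * z + e2
def resid_nonlinear(beta, y1, y2, x, z):
    """(T, 2) residual matrix [y - h(X, beta)] for use with rw_metropolis."""
    b0, b1, b2, b3 = beta
    return np.column_stack([y1 - b0 - b1 * x ** b2,
                            y2 - b0 - b3 * z])
```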

V. IMPOSING LINEAR EQUALITY RESTRICTIONS

Economic applications of SUR models frequently involve linear restrictions on the coefficients. For example, the same coefficient may appear in more than one equation, the Slutsky symmetry conditions in demand models lead to cross-equation restrictions, or one might want to hypothesize that all equations have the same coefficient vector. Under the existence of cross-equation linear restrictions, the Gibbs sampler using β and Σ, and the Metropolis-Hastings algorithm, can still be used. However, the Gibbs sampler involving only β is no longer applicable. If the restrictions are all within-equation restrictions, all three algorithms are possible.


Suppose a set of J linear restrictions is written as

$$R\beta = (R_1 \;\; R_2)\begin{pmatrix}\eta\\ \gamma\end{pmatrix} = r \tag{37}$$

where $R_1$ is $(J \times J)$ and nonsingular, $R_2$ is $(J \times (K-J))$, and η and γ are J and $(K-J)$ dimensional sub-vectors of β, respectively. To make this partition, it may be necessary to reorder the elements in β. Correspondingly, we can reorder the columns of X and partition it so that the linear SUR model can be written as

$$y = X\beta + e = (X_1 \;\; X_2)\begin{pmatrix}\eta\\ \gamma\end{pmatrix} + e \tag{38}$$

This reordering may destroy the block-diagonal properties of X. From (37), we can solve for η as

$$\eta = R_1^{-1}(r - R_2\gamma) \tag{39}$$

Substituting (39) into (38) and rearranging yields

$$y - X_1R_1^{-1}r = (X_2 - X_1R_1^{-1}R_2)\gamma + e$$

or

$$z = Z\gamma + e \tag{40}$$

where $z = y - X_1R_1^{-1}r$ and $Z = X_2 - X_1R_1^{-1}R_2$ represent new sets of "observations". In general, Z and γ can no longer be partitioned unambiguously into M separate equations. However, the stochastic properties of e remain the same. Thus, all the results in Sections II and III that did not rely on a partitioning of X and β can still be applied to the model in (40). In particular, a Gibbs sampler can be used to draw γ and Σ, and the Metropolis-Hastings algorithm can be used to draw γ, from their respective posterior pdf's.
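The transformation (37) through (40) is mechanical once β has been reordered so that its first J elements are η; a sketch:

```python
import numpy as np

def impose_restrictions(y, X, R, r):
    """Build z = y - X1 R1^{-1} r and Z = X2 - X1 R1^{-1} R2 of eq (40).
    R is (J x K) with a nonsingular leading (J x J) block R1."""
    J = R.shape[0]
    R1, R2 = R[:, :J], R[:, J:]
    X1, X2 = X[:, :J], X[:, J:]
    z = y - X1 @ np.linalg.solve(R1, r)
    Z = X2 - X1 @ np.linalg.solve(R1, R2)
    return z, Z  # after drawing gamma, recover eta = R1^{-1}(r - R2 @ gamma)
```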

VI. IMPOSING INEQUALITY RESTRICTIONS

Possible inequality restrictions on the coefficients range from simple ones such as a sign restriction on a single coefficient to more complex ones such as enforcing the eigenvalues of a matrix of coefficients to be nonpositive. Letting the feasible region defined by the inequality constraints be denoted by S, and defining the indicator function

$$I_S(\beta) = \begin{cases} 1 & \text{for } \beta \in S \\ 0 & \text{for } \beta \notin S \end{cases} \tag{41}$$

the inequality restrictions can be accommodated by setting up the otherwise noninformative prior pdf

$$f(\beta, \Sigma) \propto |\Sigma|^{-(M+1)/2}\, I_S(\beta) \tag{42}$$

Using Bayes' Theorem to combine this prior with the likelihood function in equation (5), we obtain the joint posterior pdf

$$\begin{aligned} f(\beta,\Sigma\,|\,y) &\propto f(y\,|\,\beta,\Sigma)\,f(\beta,\Sigma) \\ &\propto |\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}(y-X\beta)'(\Sigma^{-1}\otimes I_T)(y-X\beta)\}\, I_S(\beta) \\ &= |\Sigma|^{-(T+M+1)/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(A\Sigma^{-1})\}\, I_S(\beta) \end{aligned} \tag{43}$$

From this result we can derive the following conditional and marginal posterior pdf's. The conditional posterior pdf for (β | Σ) is the truncated multivariate normal distribution

$$f(\beta\,|\,\Sigma,y) \propto \exp\{-\tfrac{1}{2}(\beta-\hat\beta)'X'(\Sigma^{-1}\otimes I_T)X(\beta-\hat\beta)\}\, I_S(\beta) \tag{44}$$

The conditional posterior pdf for (Σ | β) is the same inverted-Wishart distribution as was given in equation (20). The marginal posterior pdf for β is

$$f(\beta\,|\,y) \propto |A|^{-T/2}\, I_S(\beta) \tag{45}$$

The posterior pdf for $\beta_1$ conditional on the remaining $\beta_i$ is the truncated multivariate t-distribution

$$f(\beta_1\,|\,y, \beta_2, \beta_3, \ldots, \beta_M) \propto \left[v_1 + \frac{(\beta_1 - \tilde\beta_1)' X_1' Q_{(1)} X_1 (\beta_1 - \tilde\beta_1)}{\tilde s_1^2}\right]^{-(K_1 + v_1)/2} I_S(\beta) \tag{46}$$

Of interest is how best to use these pdfs to draw observations on β, and possibly Σ, from their respective posterior pdf's. The conditional posterior pdf's for (β | Σ) and (Σ | β) can be used within a Gibbs sampler providing the inequality restrictions are sufficiently mild for a simple acceptance-rejection algorithm to be practical when sampling from the truncated multivariate normal distribution. By a "simple acceptance-rejection algorithm", we mean that a draw is made from a nontruncated multivariate normal distribution and, if it lies outside the feasible region, it is discarded and replaced by another draw. This procedure will not be practical if the probability of obtaining a draw within the feasible region is small, which will almost always be the case if the number of inequality restrictions is moderate to large. Thus, we are using the term "mild" inequality restrictions to describe a situation where the maximum number of draws necessary before a feasible draw is obtained is not excessive.
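A sketch of this simple acceptance-rejection step for the truncated normal draw in (44); the max_tries guard makes the "mild restrictions" requirement explicit:

```python
import numpy as np

def truncated_normal_draw(mean, cov, feasible, rng, max_tries=10_000):
    """Draw from N(mean, cov) restricted to {beta : feasible(beta)} by
    discarding infeasible draws; practical only for mild restrictions."""
    for _ in range(max_tries):
        beta = rng.multivariate_normal(mean, cov)
        if feasible(beta):
            return beta
    raise RuntimeError("feasible region too small for simple A-R sampling")
```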

If the inequality restrictions are not mild, then a Metropolis-Hastings algorithm can be employed. In the steps we described in Section III, if a candidate value β* is infeasible, then $r = 0$, and the retained draw is automatically the last accepted feasible draw. That is, $\beta^{(\ell+1)} = \beta^{(\ell)}$.

If the inequality restrictions are not mild, but are linear, then using a Gibbs sampler on subcomponents of β might prove successful. For example, using the truncated multivariate t-distributions for each of the $\beta_i$, as specified in equation (46), could be useful. Also, within different contexts, sampling from truncated multivariate t and multivariate normal distributions has been broken down into sampling from univariate conditional distributions by Geweke (1991) and Hajivassiliou and McFadden (1990). Also, see the Appendix.

VII. THREE APPLICATIONS

A. Wheat Yield

In Griffiths et al (2001) the following model was used for predicting wheat yield in five Western Australian shires.

$$Y_t = \beta_1 + \beta_2 t + \beta_3 t^2 + \beta_4 t^3 + \beta_5 G_t + \beta_6 G_t^2 + \beta_7 D_t + \beta_8 D_t^2 + \beta_9 F_t + \beta_{10} F_t^2 + e_t \tag{47}$$

Yield ($Y_t$) depends on a cubic time trend to capture technological change and on quadratic functions of rainfall during the germination period ($G_t$), the development period ($D_t$), and the flowering period ($F_t$). The rainfalls are measured relative to their sample means. Inequality restrictions are imposed to ensure that the response of yield to rainfall, at average rainfall, is positive. That is, for germination rainfall, for example, $\partial Y/\partial G = \beta_5 + 2\beta_6 > 0$. Thus, the feasible region for this example is

$$S(\beta) = \{\beta \,|\, \beta_5 + 2\beta_6 > 0,\ \beta_7 + 2\beta_8 > 0,\ \beta_9 + 2\beta_{10} > 0\} \tag{48}$$
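The indicator $I_S(\beta)$ for (48) is a three-clause test that can serve as the feasibility check in the samplers sketched earlier (zero-based indexing, so beta[4] is $\beta_5$):

```python
def feasible_wheat(beta):
    """Indicator for the region S(beta) in eq (48)."""
    return (beta[4] + 2 * beta[5] > 0 and   # germination: b5 + 2*b6 > 0
            beta[6] + 2 * beta[7] > 0 and   # development: b7 + 2*b8 > 0
            beta[8] + 2 * beta[9] > 0)      # flowering:   b9 + 2*b10 > 0
```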


Although Griffiths et al (2001) used separate single equation estimation for the five shires and focussed on several forecasting issues, investigation within a five-equation SUR model has started. Given that the inequality restrictions within each equation are relatively mild, but in total they are not, a Gibbs sampler using the truncated t densities in equation (46) seems a profitable direction to follow. Also, some preliminary work involving the Metropolis-Hastings algorithm on the complete β vector has proved effective.

B. Cost and Share Equations

In a second application, a translog cost function (constant returns to scale) and cost-share equations for merino woolgrowers (310 observations over 23 years) were estimated by Griffiths et al (2000) using, as inputs, land, capital, livestock and other. In the equations that follow c is cost, q is output, the $w_i$ are input prices and the $S_i$ are input shares.

$$\log\left(\frac{c}{q}\right) = \beta_0 + \beta_T T + \sum_{i=1}^{4}\beta_i\log(w_i) + 0.5\sum_{i=1}^{4}\sum_{j=1}^{4}\beta_{ij}\log(w_i)\log(w_j) + e_1$$

$$S_i = \beta_i + \sum_{j=1}^{4}\beta_{ij}\log(w_j) + e_i \qquad i = 2,\,3,\,4$$

This SUR model has the following characteristics.

1. The equations are linear.

2. There are a number of linear equality restrictions that need to be imposed. Specifically, the β_ij's in the cost function are equal to the β_ij's in the share equations, and, furthermore, to satisfy homogeneity and symmetry, we require



$$\sum_{i=1}^{4}\beta_i = 1 \qquad\qquad \sum_{j=1}^{4}\beta_{ij} = 0 \qquad\qquad \beta_{ij} = \beta_{ji}$$

3. Inequality restrictions are required for the functions to satisfy concavity and monotonicity. These restrictions are

• Monotonicity: 0 < S_i < 1

• Concavity: B − S + ss′ is negative semidefinite, where

$$B = \begin{bmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \beta_{14} \\ \beta_{21} & \beta_{22} & \beta_{23} & \beta_{24} \\ \beta_{31} & \beta_{32} & \beta_{33} & \beta_{34} \\ \beta_{41} & \beta_{42} & \beta_{43} & \beta_{44} \end{bmatrix} \qquad S = \mathrm{diag}(S_1, S_2, S_3, S_4) \qquad s = \begin{bmatrix} S_1 \\ S_2 \\ S_3 \\ S_4 \end{bmatrix}$$

Note that B − S + ss′ is negative semidefinite if and only if its largest eigenvalue is nonpositive; a numerical version of this check is sketched after this list.

Since S_i depends on the input prices, a decision needs to be made concerning the input prices at which S_i is evaluated and the inequality restrictions imposed. The inequality restrictions were imposed at average input prices for each of the 23 years.

4. Given the severe inequality restrictions that were imposed, the Metropolis-Hastings algorithm was used.

5. The quantities of interest are nonlinear functions of the parameters. They are the elasticities of substitution

$$\sigma_{ij} = \frac{\beta_{ij}}{S_i S_j} + 1 \qquad i \neq j$$

and input demand elasticities

$$\eta_{ii} = \frac{\beta_{ii}}{S_i} + S_i - 1 \qquad\qquad \eta_{ij} = \frac{\beta_{ij}}{S_i} + S_j$$
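The eigenvalue characterization in item 3 translates directly into a numerical check. The following sketch assumes numpy; the function name, interface and tolerance are my own, not from Griffiths et al (2000).

```python
import numpy as np

def is_concave(B, s, tol=1e-10):
    """Check that B - S + ss' is negative semidefinite, where
    S = diag(s) and s holds the four fitted cost shares evaluated
    at the chosen input prices."""
    s = np.asarray(s, dtype=float)
    C = B - np.diag(s) + np.outer(s, s)
    # For a symmetric matrix, negative semidefiniteness is equivalent
    # to the largest eigenvalue being nonpositive; tol absorbs
    # floating-point noise around zero.
    return np.linalg.eigvalsh(C).max() <= tol
```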

C. Expenditure Functions

Our third example involves two expenditure functions estimated from a sample of 1,834 Bangkok households, with expenditures deflated by an "equivalence scale" measure of household size (Griffiths and Chotikapanich 1997). For the t-th observation, the functions are

$$w_{1t} = \frac{\alpha_1 m_{1t}}{x_t} + \frac{\beta_1 m_{1t}\,(x_t - \alpha_1 m_{1t} - \alpha_2 m_{2t} - \alpha_3 m_{3t})}{x_t\,(\beta_1 m_{1t} + \beta_2 m_{2t} + \beta_3 m_{3t})} + e_{1t}$$

$$w_{2t} = \frac{\alpha_2 m_{2t}}{x_t} + \frac{\beta_2 m_{2t}\,(x_t - \alpha_1 m_{1t} - \alpha_2 m_{2t} - \alpha_3 m_{3t})}{x_t\,(\beta_1 m_{1t} + \beta_2 m_{2t} + \beta_3 m_{3t})} + e_{2t}$$

$$m_{1t} = 1 + \delta_{21} n_{2t} + \delta_{31} n_{3t}$$
$$m_{2t} = 1 + \delta_{22} n_{2t} + \delta_{32} n_{3t}$$
$$m_{3t} = 1 + \delta_{23} n_{2t} + \delta_{33} n_{3t}$$

where

w_jt = expenditure proportion for commodity j,
x_t = total expenditure,
m_jt = equivalence scale for commodity j,
n_2t = number of extra adults (each household has at least one adult),
n_3t = number of children.

The unknown parameters are (α₁, α₂, α₃, β₁, β₂, β₃, δ₂₁, δ₃₁, δ₂₂, δ₃₂, δ₂₃, δ₃₃).

This SUR model has the following characteristics:

1. The equations are nonlinear in the parameters.

2. A number of inequality restrictions were imposed, namely,

$$0 < \beta_1, \beta_2 < 1$$ Additional expenditure from a one-unit increase in supernumerary income must lie between zero and one.

$$0 \le \delta_{3j} \le \delta_{2j} \le 1$$ Expenditure requirements for extra adults are less than those for the first adult but greater than those for children.

$$\alpha_i < \min_t\left\{\frac{e_{it}}{1 + \delta_{2i} n_{2t} + \delta_{3i} n_{3t}}\right\} \qquad i = 1, 2$$ The smallest level of consumption in the sample must be greater than subsistence expenditure, a constraint from the utility function.

3. Given the nonlinear equations and the inequality constraints, the Metropolis-Hastings algorithm was used.

4. Two nonlinear functions of the parameters are of interest. They are the general scale or "household size"

$$m_0 = \frac{x\sum_k \beta_k m_k}{x - \sum_k \alpha_k m_k + \left(\sum_k \alpha_k\right)\left(\sum_k \beta_k m_k\right)}$$

and the elasticities. Expressions for the latter can be found in Griffiths and Chotikapanich (1997).
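As a small illustration, and under the assumption that the formula for m₀ reconstructed above is the intended one, the general scale can be computed as follows (function and variable names are hypothetical):

```python
import numpy as np

def general_scale(x, alpha, beta, m):
    """General equivalence scale m0 for a household with commodity
    scales m = (m_1, m_2, m_3), at total expenditure x.

    alpha, beta and m are length-3 arrays of the alpha_k, beta_k
    and m_k; a reference household with m = (1, 1, 1) returns 1
    when the beta_k sum to one."""
    num = x * np.dot(beta, m)
    den = x - np.dot(alpha, m) + np.sum(alpha) * np.dot(beta, m)
    return num / den
```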

VIII. FORECASTING

Suppose that we are interested in forecasting dependent variable values in the next period. The shire-level wheat yield application in the previous section is an example of where such a forecast would be of interest. In that case the objective is to forecast yield for each of the five shires. Since the yields are correlated via the stochastic assumptions of the SUR model, a joint forecast is appropriate. We can write next period's observation as

$$y_* = X_*\beta + e_* \qquad (49)$$

where y* is an M-dimensional vector and X* is an (M × K) block-diagonal matrix whose i-th block is a (1 × K_i) row vector containing next period's explanatory variables for the i-th equation,

$$X_* = \begin{bmatrix} x_{1*} & & & \\ & x_{2*} & & \\ & & \ddots & \\ & & & x_{M*} \end{bmatrix} \qquad (50)$$

and e* ~ N(0, Σ) is next period's (M × 1) random error vector. The conventional Bayesian forecasting tool is the predictive pdf f(y*|y). Graphing marginal predictive pdf's from this density function, and computing its means, standard deviations and probabilities of interest, are the standard ways of reporting results.


The procedure for deriving the predictive pdf is to begin with the joint pdf f(y*, β, Σ|y) and then to integrate out Σ and β, either analytically or via a numerical sampling algorithm. Now,

$$f(y_* \mid \beta, \Sigma) = (2\pi)^{-M/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(y_* - X_*\beta)'\Sigma^{-1}(y_* - X_*\beta)\right\} \propto |\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}\operatorname{tr}(A_*\Sigma^{-1})\right\} \qquad (51)$$

where $A_* = (y_* - X_*\beta)(y_* - X_*\beta)'$. Thus, using the posterior pdf in equation (12) (no inequality restrictions), we have

$$f(y_*, \beta, \Sigma \mid y) = f(y_* \mid \beta, \Sigma)\,f(\beta, \Sigma \mid y) \propto |\Sigma|^{-(T+M+2)/2}\exp\left\{-\tfrac{1}{2}\operatorname{tr}\left[(A + A_*)\Sigma^{-1}\right]\right\}$$

Using properties of the inverted Wishart distribution to integrate out Σ yields

$$f(y_*, \beta \mid y) = \int f(y_*, \beta, \Sigma \mid y)\,d\Sigma \propto |A + A_*|^{-(T+1)/2} \qquad (52)$$

Because analytical integration of β out of equation (52) is not possible, we consider the conditional predictive pdf f(y*|β, y). It turns out that this pdf is a multivariate Student t. Thus, f(y*|y) and its moments can be estimated by averaging quantities from f(y*|β, y) over draws of β obtained using one of the MCMC algorithms described earlier.

To establish that f(y*|β, y) is a multivariate t-distribution, we first note that (see, for example, Dhrymes 1978, p. 458)

$$|A + A_*| = |A|\left(1 + (y_* - X_*\beta)'\,A^{-1}(y_* - X_*\beta)\right) \qquad (53)$$

Thus,

$$f(y_* \mid \beta, y) \propto \left[1 + (y_* - X_*\beta)'\,A^{-1}(y_* - X_*\beta)\right]^{-(T+1)/2} \propto \left[v_* + (y_* - X_*\beta)'\left(\frac{A}{v_*}\right)^{-1}(y_* - X_*\beta)\right]^{-(M+v_*)/2} \qquad (54)$$

where v* = T − M + 1. Equation (54) is a multivariate t-distribution with mean

$$E(y_* \mid \beta, y) = X_*\beta \qquad (55)$$

covariance matrix

$$V(y_* \mid \beta, y) = \frac{A}{v_* - 2} \qquad (56)$$

and degrees of freedom v*. Given draws β^{(ℓ)}, ℓ = 1, 2, …, N from an MCMC algorithm, one can average the quantities in equations (54) to (56) over these draws to estimate the required marginal predictive pdf's and their moments. Marginal univariate t distributions from (54) are averaged, and the formulas are analogous to those in equations (22), (26) and (27) except, of course, that our random variable of interest is now an element of y*, say y*_i, not β_ik.
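As a sketch of this averaging, the following Python fragment estimates the predictive mean and the marginal predictive pdf of one element of y*. Here `compute_A` is a hypothetical user-supplied helper returning the (M × M) residual cross-product matrix A(β) for a given draw; it is not part of the paper, and the marginal of element i of the multivariate t in (54) is the univariate t with location (X*β)_i, scale √(A_ii/v*) and v* degrees of freedom.

```python
import numpy as np
from scipy import stats

def predictive_summary(beta_draws, X_star, compute_A, T, grid, i=0):
    """Average the conditional t-pdf (54) and mean (55) over MCMC
    draws of beta to estimate the marginal predictive pdf of y*_i
    and the predictive mean of y*."""
    M = X_star.shape[0]
    v_star = T - M + 1
    mean = np.zeros(M)
    pdf = np.zeros_like(grid, dtype=float)
    for b in beta_draws:
        A = compute_A(b)                      # residual cross-product matrix
        loc = X_star @ b                      # conditional mean, equation (55)
        scale = np.sqrt(A[i, i] / v_star)     # marginal scale from (54)
        pdf += stats.t.pdf(grid, df=v_star, loc=loc[i], scale=scale)
        mean += loc
    n = len(beta_draws)
    return mean / n, pdf / n
```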

Percy (1992) describes an alternative Gibbs sampling approach where y*, β and Σ are recursively generated from their respective conditional pdf's. With our approach, it is not necessary to generate draws on y*. Also, because we have derived the predictive pdf conditional on β, the introduction of inequality restrictions on β does not change the analysis. The range of values of β over which averaging takes place is restricted, but that is accommodated by the way in which β is drawn, and the result in (54) still holds.

An interesting extension, and one that is of concern to Griffiths et al. (2001), is capturing the extra uncertainty created by not knowing the value of one or more regressors in X*. We have this problem if a wheat yield forecast is made before all rainfalls have been observed. The effect can be captured by modelling rainfall and averaging the predictive pdf for yield conditional on rainfall over rainfall draws made from rainfall's own predictive pdf.

IX. SOME EXTENSIONS

Consider estimating β in the SUR model when there are missing observations on one or more of the dependent variables. This problem was considered in the context of expenditure functions by Supat (1996). For the moment, assume the observations are truly missing and that they are missing at random; they are not zeros created by negative values of an unobserved latent variable, as is the case with the Tobit model. Writing y^O to denote observed components and y^U to denote unobserved components, estimation can proceed within a Gibbs sampling framework using the conditional posterior pdfs f(β|Σ, y^O, y^U), f(Σ|β, y^O, y^U) and f(y^U|β, Σ, y^O).

The conditional posterior pdfs for β and Σ are the normal and inverted Wishart pdf's given in equations (14) and (20). To investigate how to draw observations from f(y^U|β, Σ, y^O), we write the (M × 1) t-th observation y_(t) as

$$y_{(t)} = X_{(t)}\beta + e_{(t)} \qquad (57)$$

The subscript t has been placed in parentheses to distinguish the (M × 1) t-th observation on all equations, y_(t), from the (T × 1) observations on the i-th equation, y_i. The structure of X_(t) is similar to that of X* defined in equation (50). We wish to consider equation (57) for all values of t where y_(t) has one or more unobserved components. Reordering the elements if necessary, we can partition equation (57) as


$$\begin{pmatrix} y^U_{(t)} \\ y^O_{(t)} \end{pmatrix} = \begin{pmatrix} X^U_{(t)} \\ X^O_{(t)} \end{pmatrix}\beta + \begin{pmatrix} e^U_{(t)} \\ e^O_{(t)} \end{pmatrix} \qquad (58)$$

where we write

$$E\left[e_{(t)}e_{(t)}'\right] = \begin{bmatrix} \Sigma^{UU} & \Sigma^{UO} \\ \Sigma^{OU} & \Sigma^{OO} \end{bmatrix} \qquad (59)$$

The conditional posterior pdf f(y^U_(t)|β, Σ, y^O_(t)) is a multivariate normal distribution with mean

$$E(y^U_{(t)} \mid \beta, \Sigma, y^O_{(t)}) = X^U_{(t)}\beta + \Sigma^{UO}\left(\Sigma^{OO}\right)^{-1}\left(y^O_{(t)} - X^O_{(t)}\beta\right) \qquad (60)$$

and covariance matrix

$$V(y^U_{(t)} \mid \beta, \Sigma, y^O_{(t)}) = \Sigma^{UU} - \Sigma^{UO}\left(\Sigma^{OO}\right)^{-1}\Sigma^{OU} \qquad (61)$$

Furthermore, the (y^U_(t)|β, Σ, y^O_(t)), t = 1, 2, …, T, are independent. Thus, for generating y^U_(t) within the Gibbs sampler, we use the conditional normal distributions defined by equations (60) and (61) for all observations where an unobserved component is present.
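A minimal sketch of this Gibbs step, assuming numpy; the index arrays marking which equations are unobserved at observation t, and all names, are my own illustration rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng()

def draw_y_unobserved(beta, Sigma, y_o, X_u, X_o, idx_u, idx_o):
    """One Gibbs draw of the unobserved components y^U_(t) from the
    conditional normal with mean (60) and covariance (61).

    idx_u, idx_o index the unobserved and observed equations at
    observation t; Sigma is the full (M, M) error covariance."""
    S_uu = Sigma[np.ix_(idx_u, idx_u)]
    S_uo = Sigma[np.ix_(idx_u, idx_o)]
    S_oo = Sigma[np.ix_(idx_o, idx_o)]
    # Conditional mean, equation (60)
    mu = X_u @ beta + S_uo @ np.linalg.solve(S_oo, y_o - X_o @ beta)
    # Conditional covariance, equation (61)
    V = S_uu - S_uo @ np.linalg.solve(S_oo, S_uo.T)
    L = np.linalg.cholesky(V)
    return mu + L @ rng.standard_normal(len(idx_u))
```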

Suppose, now, that the unobserved components represent negative values of a Tobit-type latent variable. In this case we have the additional posterior information that the elements of y^U_(t) are negative. The conditional posterior pdf for (y^U_(t)|β, Σ, y^O_(t)) becomes a truncated (multivariate) normal distribution, with a truncation that forces y^U_(t) to be negative. Its location vector and scale matrix (no longer the mean and covariance matrix) are given in equations (60) and (61). A convenient algorithm for drawing from this truncated normal distribution is described in the Appendix.

For extensions into Probit models, see Geweke et al. (1997) and references therein. The literature on simultaneous equation models with Tobit and Probit variables can be accessed through Li (1998). Sets of SUR expenditure functions with a common parameter, and with unobserved expenditures that result from infrequency of purchase, are considered by Griffiths and Valenzuela (1998). Smith and Kohn (2000) study Bayesian estimation of nonparametric SURs.

X. CONCLUDING REMARKS

With the recent explosion of literature on MCMC techniques, Bayesian inference in the SUR model has become a practical reality. However, it is the author's view that, prior to the writing of this chapter, the relevant results had not been collected and summarised in a form convenient for applied researchers to implement. It is my hope that this chapter will facilitate and motivate many more applications of Bayesian inference in the SUR model.

XI. APPENDIX – DRAWING RANDOM VARIABLES AND VECTORS

A. Multivariate Normal Distribution

To draw a vector y from a N(µ, Σ) distribution:

1. Compute the Cholesky decomposition H such that HH′ = Σ.
2. Generate z from N(0, I).
3. Calculate y = µ + Hz.
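A direct numpy transcription of these three steps (the function and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng()

def draw_mvn(mu, Sigma):
    """Draw y ~ N(mu, Sigma) via the Cholesky factor H with HH' = Sigma."""
    H = np.linalg.cholesky(Sigma)         # step 1
    z = rng.standard_normal(len(mu))      # step 2: z ~ N(0, I)
    return mu + H @ z                     # step 3
```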


B. Multivariate t Distribution

Consider the multivariate k-dimensional t-distribution with pdf

$$f(x \mid \mu, V) \propto \left[v + (x - \mu)'\,V^{-1}(x - \mu)\right]^{-(k+v)/2}$$

It has v degrees of freedom, mean µ and covariance matrix (v/(v − 2))V. (Assume v > 2.) To draw a vector x from this pdf:

1. Compute the Cholesky decomposition H such that HH′ = V.
2. Generate the (k × 1) vector z₁ from N(0, I_k).
3. Generate the (v × 1) vector z₂ from N(0, I_v).
4. Calculate $x = \mu + Hz_1\sqrt{v/(z_2'z_2)}$.
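A numpy sketch of steps 1 to 4, assuming an integer number of degrees of freedom v:

```python
import numpy as np

rng = np.random.default_rng()

def draw_mvt(mu, V, v):
    """Draw x from the k-dimensional t-distribution with location mu,
    scale matrix V and v degrees of freedom."""
    H = np.linalg.cholesky(V)                    # step 1: HH' = V
    z1 = rng.standard_normal(len(mu))            # step 2
    z2 = rng.standard_normal(v)                  # step 3
    return mu + H @ z1 * np.sqrt(v / (z2 @ z2))  # step 4
```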

C. Inverted Wishart Distribution

Let Σ have an m-dimensional inverted Wishart distribution with parameter matrix S and degrees of freedom v. It has pdf

$$f(\Sigma \mid S) \propto |\Sigma|^{-(v+m+1)/2}\exp\left\{-\tfrac{1}{2}\operatorname{tr}(S\Sigma^{-1})\right\}$$

To draw observations on Σ:

1. Compute the Cholesky decomposition H such that HH′ = S⁻¹.
2. Draw independent (m × 1) normal random vectors z₁, z₂, …, z_v from N(0, I_m).
3. Calculate $\Sigma = \left(\sum_{i=1}^{v} Hz_i z_i'H'\right)^{-1}$.
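A numpy sketch of steps 1 to 3, assuming v ≥ m so that the sum is nonsingular:

```python
import numpy as np

rng = np.random.default_rng()

def draw_inverted_wishart(S, v):
    """Draw Sigma from the m-dimensional inverted Wishart with
    parameter matrix S and v degrees of freedom."""
    m = S.shape[0]
    H = np.linalg.cholesky(np.linalg.inv(S))  # step 1: HH' = S^{-1}
    Z = rng.standard_normal((v, m))           # step 2: v draws from N(0, I_m)
    W = H @ Z.T @ Z @ H.T                     # sum_i H z_i z_i' H'
    return np.linalg.inv(W)                   # step 3
```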


D. Univariate Truncated Normal Distribution

Suppose that x is a truncated normal random variable with location µ, scale σ and truncation a < x < b. To draw x:

1. Draw a uniform (0, 1) random variable U.
2. Calculate

$$x = \mu + \sigma\,\Phi^{-1}\left[\Phi\left(\frac{a-\mu}{\sigma}\right) + U\left(\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)\right)\right] \qquad (A.1)$$

where Φ is the standard normal cumulative distribution function.
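A scipy-based sketch of (A.1) (the names are mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()

def draw_truncnorm(mu, sigma, a, b):
    """Draw x from N(mu, sigma^2) truncated to (a, b) via the
    inverse-cdf formula (A.1)."""
    U = rng.uniform()                                 # step 1
    Fa = norm.cdf((a - mu) / sigma)
    Fb = norm.cdf((b - mu) / sigma)
    return mu + sigma * norm.ppf(Fa + U * (Fb - Fa))  # step 2
```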

E. Multivariate Truncated Normal Distribution

Suppose that x is an m-dimensional multivariate truncated normal random vector such that a₁ < x₁ < b₁, a₂ < x₂ < b₂, …, a_m < x_m < b_m. To draw x:

1. Use (A.1) to draw x₁.
2. Find the location and scale parameters for the truncated conditional normal distribution (x₂|x₁), conditional on the x₁ drawn in step 1.
3. Apply (A.1) to the distribution (x₂|x₁).
4. Find the location and scale parameters for the distribution (x₃|x₂, x₁), conditional on the draws made in steps 1 and 3.
5. Apply (A.1) to the distribution (x₃|x₂, x₁).
6. And so on.
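A sketch of this sequential scheme, reusing draw_truncnorm from the Section D sketch; the location and scale at each step come from the usual multivariate normal conditioning formulas, which is my reading of steps 2 and 4.

```python
import numpy as np

def draw_mv_truncnorm(mu, Sigma, a, b):
    """Sequentially draw x_1, then x_2 | x_1, and so on, applying
    (A.1) to each univariate conditional normal (steps 1-6 above)."""
    m = len(mu)
    x = np.empty(m)
    for i in range(m):
        if i == 0:
            loc, var = mu[0], Sigma[0, 0]
        else:
            S11 = Sigma[:i, :i]               # covariance of drawn components
            s12 = Sigma[:i, i]
            w = np.linalg.solve(S11, s12)
            loc = mu[i] + w @ (x[:i] - mu[:i])  # conditional location
            var = Sigma[i, i] - w @ s12         # conditional variance
        x[i] = draw_truncnorm(loc, np.sqrt(var), a[i], b[i])
    return x
```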


REFERENCES

Albert, J.H., Chib, S., 1996. Computation in Bayesian econometrics: an introduction to Markov chain Monte Carlo. In: Hill, R.C. (Ed.), Advances in Econometrics Volume 11A: Computational Methods and Applications. JAI Press, Greenwich, pp. 3-24.

Chen, M.-H., Shao, Q.-M., Ibrahim, J.G., 2000. Monte Carlo Methods in Bayesian Computation. Springer, New York.

Chib, S., Greenberg, E., 1995. Hierarchical analysis of SUR models with extensions to correlated serial errors and time-varying parameter models. Journal of Econometrics 68, 339-360.

Chib, S., Greenberg, E., 1996. Markov chain Monte Carlo simulation methods in econometrics. Econometric Theory 12, 409-431.

Cowles, M.K., Carlin, B.P., 1996. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91, 883-904.

Dhrymes, P.J., 1978. Introductory Econometrics. Springer, New York.

Drèze, J.H., Morales, J.A., 1976. Bayesian full information analysis of simultaneous equations. Journal of the American Statistical Association 71, 919-923.

Fiebig, D., 2001. Seemingly unrelated regression. In: Baltagi, B. (Ed.), A Companion to Theoretical Econometrics, Basil Blackwell, London (Chapter 5).

Fiebig, D.G., Kim, J.H., 2000. Estimation and inference in SUR models when the number of equations is large. Econometric Reviews 19, 105-130.

Geweke, J., 1988. The secular and cyclical behavior of real GDP in nineteen OECD countries. Journal of Business and Economic Statistics 6, 479-486.

Geweke, J., 1991. Efficient simulation from the multivariate normal and student t-distributions subject to linear constraints. In: Keramidas, E.M. (Ed.), Computing Science and Statistics: Proceedings of the Twenty-third Symposium on the Interface, Interface Foundation of North America, Fairfax, pp. 571-578.

Geweke, J., Keane, M.P., Runkle, D.E., 1997. Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics 80, 127-166.

Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, London.

Griffiths, W.E., 1999. Estimating consumer surplus: comment on using simulation methods for Bayesian econometric models: inference development and communication. Econometric Reviews 18, 75-88.

Griffiths, W.E., Chotikapanich, D., 1997. Bayesian methodology for imposing inequality constraints on a linear expenditure function with demographic factors. Australian Economic Papers 36, 321-341.

Griffiths, W.E., Newton, L.S., O'Donnell, C.J., 2001. Predictive densities for shire level wheat yield in Western Australia. Paper contributed to the Australian Agricultural and Resource Economics Society Conference in Adelaide.

Griffiths, W.E., O'Donnell, C.J., Tan-Cruz, A., 2000. Imposing regularity conditions on a system of cost and factor share equations. Australian Journal of Agricultural and Resource Economics 44, 107-127.

Griffiths, W.E., Skeels, C.J., Chotikapanich, D., 2001. Sample size requirements for estimation in SUR models. In: Ullah, A., Wan, A., Chaturvedi, A. (Eds.), Handbook of Applied Econometrics and Statistical Inference, Marcel Dekker, New York, forthcoming.

Griffiths, W.E., Valenzuela, M.R., 1998. Missing data from infrequency of purchase: Bayesian estimation of a linear expenditure system. In: Fomby, T.B., Hill, R.C. (Eds.), Advances in Econometrics, Volume 13, Messy Data, Missing Observations, Outliers and Mixed-Frequency Data, JAI Press, Greenwich, pp. 75-102.

Hajivassiliou, V.A., McFadden, D.L., 1990. The method of simulated scores for the estimation of LDV models with an application to external debt crises. Yale Cowles Foundation Discussion Paper No. 967.

Judge, G.G., Griffiths, W.E., Hill, R.C., Lütkepohl, H., Lee, T.-C., 1985. The Theory and Practice of Econometrics, second ed. John Wiley and Sons, New York.

Kloek, T., van Dijk, H.K., 1978. Bayesian estimates of equation system parameters: an application of integration by Monte Carlo. Econometrica 46, 1-19.

Li, K., 1998. Bayesian inference in a simultaneous equation model with limited dependent variables. Journal of Econometrics 85, 387-400.

Lütkepohl, H., 1996. Handbook of Matrices. John Wiley and Sons, Chichester.

Percy, D.F., 1992. Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society B 54, 243-252.

Percy, D.F., 1996. Zellner's influence on multivariate linear models. In: Berry, D.A., Chaloner, K.M., Geweke, J.K. (Eds.), Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner, John Wiley and Sons, New York (Chapter 17).

Richard, J.-F., Steel, M.F.J., 1988. Bayesian analysis of systems of seemingly unrelated regression equations under a recursive extended natural conjugate prior density. Journal of Econometrics 38, 7-37.

Richard, J.-F., Tompa, H., 1980. On the evaluation of poly-t density functions. Journal of Econometrics 12, 335-351.

Smith, M., Kohn, R., 2000. Nonparametric seemingly unrelated regression. Journal of Econometrics 98, 257-282.

Srivastava, V.K., Dwivedi, T.D., 1979. Estimation of seemingly unrelated regression equations: a brief survey. Journal of Econometrics 10, 15-32.

Srivastava, V.K., Giles, D.E.A., 1987. Seemingly Unrelated Regression Equations Models: Estimation and Inference. Marcel Dekker, New York.

Steel, M.F.J., 1992. Posterior analysis of restricted seemingly unrelated regression equation models: a recursive analytical approach. Econometric Reviews 11, 129-142.

Supat, K., 1996. Seemingly Unrelated Regression with Missing Observations on Dependent Variables. B.Ec. Honours dissertation, University of New England, Armidale.

Tanner, M.A., 1996. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, third ed. Springer, New York.

Tierney, L., 1994. Markov chains for exploring posterior distributions (with discussion and rejoinder). Annals of Statistics 22, 1701-1762.

Zellner, A., 1962. An efficient method of estimating seemingly unrelated regressions and tests of aggregation bias. Journal of the American Statistical Association 57, 500-509.

Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. John Wiley and Sons, New York.
