
238   11 Neural Models of Bayesian Belief Propagation   Rajesh P. N. Rao

in figure 11.2A. The input that is observed at time t (= 1, 2, . . .) is represented by the random variable I(t), which can be either discrete-valued or a real-valued vector such as an image or a speech signal. The input is assumed to be generated by a hidden cause or "state" θ(t), which can assume one of N discrete values 1, . . . , N. The state θ(t) evolves over time in a Markovian manner, depending only on the previous state, according to the transition probabilities given by $P(\theta(t) = i \mid \theta(t-1) = j) = P(\theta^t_i \mid \theta^{t-1}_j)$ for $i, j = 1, \ldots, N$. The observation I(t) is generated according to the probability $P(I(t) \mid \theta(t))$.
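To make this generative model concrete, the following sketch samples a state sequence and observations from a small hypothetical 3-state model; the matrices `T` and `O`, the uniform prior, and the helper `sample_sequence` are all illustrative assumptions, not values from the chapter:

```python
import numpy as np

# A minimal sketch of the Markov generative model described above.
# The 3-state transition matrix T, observation matrix O, and uniform
# prior are hypothetical numbers chosen for illustration.
rng = np.random.default_rng(0)

N = 3
T = np.array([[0.8, 0.1, 0.2],    # T[i, j] = P(theta(t) = i | theta(t-1) = j),
              [0.1, 0.8, 0.2],    # so each column of T sums to 1
              [0.1, 0.1, 0.6]])
O = np.array([[0.9, 0.2, 0.1],    # O[k, i] = P(I(t) = k | theta(t) = i)
              [0.1, 0.8, 0.3],    # (discrete observations for simplicity;
              [0.0, 0.0, 0.6]])   #  I(t) could also be a real-valued vector)
prior = np.full(N, 1.0 / N)       # distribution over the initial state

def sample_sequence(length):
    """Sample (states, observations) from the Markov generative model."""
    states, observations = [], []
    s = rng.choice(N, p=prior)                # draw the initial state
    for _ in range(length):
        s = rng.choice(N, p=T[:, s])          # theta(t) depends only on theta(t-1)
        observations.append(rng.choice(O.shape[0], p=O[:, s]))  # I(t) ~ P(I(t)|theta(t))
        states.append(s)
    return states, observations
```

The column convention for `T` matches the text, where $P(\theta^t_i \mid \theta^{t-1}_j)$ is indexed as (new state, old state).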

The belief propagation algorithm can be used to compute the posterior probability of the state given current and past inputs (we consider here only the "forward" propagation case, corresponding to on-line state estimation). As in the previous example, the node $\theta^t$ performs a marginalization over neighboring variables, in this case $\theta^{t-1}$ and I(t). The first marginalization results in a probability vector whose ith component is $\sum_j P(\theta^t_i \mid \theta^{t-1}_j)\, m^{t-1,t}_j$, where $m^{t-1,t}_j$ is the jth component of the message from node $\theta^{t-1}$ to $\theta^t$. The second marginalization is from node I(t) and is given by $\sum_{I(t)} P(I(t) \mid \theta^t_i)\, P(I(t))$. If a particular input $I'$ is observed, this sum becomes $\sum_{I(t)} P(I(t) \mid \theta^t_i)\, \delta(I(t), I') = P(I' \mid \theta^t_i)$, where δ is the delta function, which evaluates to 1 if its two arguments are equal and 0 otherwise. The two "messages" resulting from the marginalization along the arcs from $\theta^{t-1}$ and I(t) can be multiplied at node $\theta^t$ to yield the following message to $\theta^{t+1}$:

$$m^{t,t+1}_i = P(I' \mid \theta^t_i) \sum_j P(\theta^t_i \mid \theta^{t-1}_j)\, m^{t-1,t}_j \qquad (11.4)$$

If $m^{0,1}_i = P(\theta_i)$ (the prior distribution over states), then it is easy to show using Bayes rule that $m^{t,t+1}_i = P(\theta^t_i, I(t), \ldots, I(1))$.
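This joint-probability property of the recursion can be checked numerically. The sketch below implements the unnormalized forward message of equation (11.4) and verifies, by brute-force enumeration over state sequences, that it equals $P(\theta^t_i, I(t), \ldots, I(1))$; the 2-state model (`T`, `O`, `prior`) and the helper names are hypothetical:

```python
import itertools
import numpy as np

# Hypothetical 2-state model for checking equation (11.4); the numbers
# in T, O, and prior are illustrative, not from the chapter.
T = np.array([[0.7, 0.4],         # T[i, j] = P(theta^t = i | theta^{t-1} = j)
              [0.3, 0.6]])
O = np.array([[0.9, 0.2],         # O[k, i] = P(I = k | theta = i)
              [0.1, 0.8]])
prior = np.array([0.6, 0.4])      # m^{0,1}_i = P(theta_i)

def forward_joint(observations):
    """Unnormalized message of equation (11.4):
    m^{t,t+1}_i = P(I'|theta^t_i) * sum_j P(theta^t_i|theta^{t-1}_j) m^{t-1,t}_j."""
    m = prior.copy()
    for obs in observations:
        m = O[obs, :] * (T @ m)   # likelihood times transition marginalization
    return m

def brute_joint(observations, final_state):
    """P(theta^t = final_state, I(t), ..., I(1)), computed by summing the
    joint probability over all state sequences (theta^0, ..., theta^t)."""
    steps = len(observations)
    total = 0.0
    for seq in itertools.product(range(2), repeat=steps + 1):
        if seq[-1] != final_state:
            continue
        p = prior[seq[0]]
        for t in range(1, steps + 1):
            p *= T[seq[t], seq[t - 1]] * O[observations[t - 1], seq[t]]
        total += p
    return total
```

For any observation sequence, `forward_joint` and `brute_joint` agree component by component, which is the Bayes-rule identity stated above.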

Rather than computing the joint probability, one is typically interested in calculating the posterior probability of the state, given current and past inputs, i.e., $P(\theta^t_i \mid I(t), \ldots, I(1))$. This can be done by incorporating a normalization step at each time step. Define (for t = 1, 2, . . .):

$$m^t_i = P(I' \mid \theta^t_i) \sum_j P(\theta^t_i \mid \theta^{t-1}_j)\, m^{t-1,t}_j \qquad (11.5)$$

$$m^{t,t+1}_i = m^t_i / n^t, \qquad (11.6)$$

where $n^t = \sum_j m^t_j$. If $m^{0,1}_i = P(\theta_i)$ (the prior distribution over states), then it is easy to see that:

$$m^{t,t+1}_i = P(\theta^t_i \mid I(t), \ldots, I(1)) \qquad (11.7)$$

This method has the additional advantage that the normalization at each time step promotes stability, an important consideration for recurrent neuronal networks, and allows the likelihood function $P(I' \mid \theta^t_i)$ to be defined in proportional terms without the need for explicitly calculating its normalization factor (see section 11.4 for an example).
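The normalized recursion of equations (11.5)-(11.7), and the claim that only proportional likelihoods are needed, can be sketched as follows; the 2-state model (`T`, `O`, `prior`) and the `likelihood_scale` parameter are hypothetical illustrations, not from the chapter:

```python
import numpy as np

# Hypothetical 2-state model; the numbers are illustrative.
T = np.array([[0.7, 0.4],         # T[i, j] = P(theta^t = i | theta^{t-1} = j)
              [0.3, 0.6]])
O = np.array([[0.9, 0.2],         # O[k, i] = P(I = k | theta = i)
              [0.1, 0.8]])
prior = np.array([0.6, 0.4])      # m^{0,1}_i = P(theta_i)

def forward_posterior(observations, likelihood_scale=1.0):
    """Normalized forward recursion: after processing observation t,
    m holds P(theta^t_i | I(t), ..., I(1)) as in equation (11.7).
    likelihood_scale multiplies P(I'|theta) to demonstrate that the
    likelihood only needs to be defined up to a constant factor."""
    m = prior.copy()
    for obs in observations:
        m = (likelihood_scale * O[obs, :]) * (T @ m)   # equation (11.5)
        m = m / m.sum()                                # equation (11.6), n^t = sum_j m^t_j
    return m
```

Because the per-step normalization $n^t$ divides out any constant factor in the likelihood, `forward_posterior(obs)` and `forward_posterior(obs, likelihood_scale=7.3)` return identical posteriors, which is exactly the advantage noted in the paragraph above.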
