Neural Models of Bayesian Belief Propagation Rajesh ... - Washington
11 Neural Models of Bayesian Belief Propagation (Rajesh P. N. Rao)
in figure 11.2A. The input observed at time t (= 1, 2, ...) is represented by the random variable I(t), which can be either discrete-valued or a real-valued vector such as an image or a speech signal. The input is assumed to be generated by a hidden cause or "state" θ(t), which can assume one of N discrete values 1, ..., N. The state θ(t) evolves over time in a Markovian manner, depending only on the previous state according to the transition probabilities P(θ(t) = i | θ(t−1) = j) = P(θ_i^t | θ_j^{t−1}) for i, j = 1, ..., N. The observation I(t) is generated according to the probability P(I(t) | θ(t)).
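This generative model is a standard hidden Markov model, and sampling from it can be sketched directly from the two distributions just defined. The transition matrix T, observation matrix O, and prior below are hypothetical values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model with N = 3 hidden states and 3 discrete observations.
# T[i, j] = P(theta(t) = i | theta(t-1) = j); each column sums to 1.
T = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.8, 0.2],
              [0.1, 0.1, 0.6]])
# O[k, i] = P(I(t) = k | theta(t) = i); each column sums to 1.
O = np.array([[0.9, 0.1, 0.3],
              [0.1, 0.9, 0.3],
              [0.0, 0.0, 0.4]])
prior = np.array([1/3, 1/3, 1/3])

def sample(T, O, prior, steps, rng):
    """Draw a state/observation sequence from the generative model."""
    states, obs = [], []
    s = rng.choice(len(prior), p=prior)          # initial state ~ prior
    for _ in range(steps):
        states.append(s)
        obs.append(rng.choice(O.shape[0], p=O[:, s]))  # I(t) ~ P(I | theta)
        s = rng.choice(T.shape[0], p=T[:, s])          # Markovian transition
    return states, obs

states, obs = sample(T, O, prior, 10, rng)
```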
The belief propagation algorithm can be used to compute the posterior probability of the state given current and past inputs (we consider here only the "forward" propagation case, corresponding to on-line state estimation). As in the previous example, the node θ^t performs a marginalization over neighboring variables, in this case θ^{t−1} and I(t). The first marginalization results in a probability vector whose ith component is Σ_j P(θ_i^t | θ_j^{t−1}) m_j^{t−1,t}, where m_j^{t−1,t} is the jth component of the message from node θ^{t−1} to θ^t. The second marginalization is from node I(t) and is given by Σ_{I(t)} P(I(t) | θ_i^t) P(I(t)). If a particular input I′ is observed, this sum becomes Σ_{I(t)} P(I(t) | θ_i^t) δ(I(t), I′) = P(I′ | θ_i^t), where δ is the delta function, which evaluates to 1 if its two arguments are equal and 0 otherwise. The two "messages" resulting from the marginalization along the arcs from θ^{t−1} and I(t) can be multiplied at node θ^t to yield the following message to θ^{t+1}:
m_i^{t,t+1} = P(I′ | θ_i^t) Σ_j P(θ_i^t | θ_j^{t−1}) m_j^{t−1,t}    (11.4)

If m_i^{0,1} = P(θ_i) (the prior distribution over states), then it is easy to show using Bayes' rule that m_i^{t,t+1} = P(θ_i^t, I(t), ..., I(1)).
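A single application of equation (11.4) is one unnormalized step of the forward algorithm. A minimal sketch in NumPy, using a hypothetical two-state model (the matrix T, prior, and likelihood values are illustrative assumptions, not from the text):

```python
import numpy as np

# One unnormalized message update, equation (11.4):
#   m_i^{t,t+1} = P(I' | theta_i^t) * sum_j P(theta_i^t | theta_j^{t-1}) * m_j^{t-1,t}
def forward_message(m_prev, likelihood, T):
    # T[i, j] = P(theta_i^t | theta_j^{t-1}); likelihood[i] = P(I' | theta_i^t)
    return likelihood * (T @ m_prev)

# Hypothetical two-state example; m^{0,1} is the prior over states.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])
likelihood = np.array([0.9, 0.2])      # P(I' | theta_i^1) for the observed I'
m = forward_message(prior, likelihood, T)   # -> array([0.45, 0.1])
```

Iterating this update carries the joint probability P(θ_i^t, I(t), ..., I(1)) forward in time, which is why the components of m shrink as more inputs are observed.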
Rather than computing the joint probability, one is typically interested in calculating the posterior probability of the state, given current and past inputs, i.e., P(θ_i^t | I(t), ..., I(1)). This can be done by incorporating a normalization step at each time step. Define (for t = 1, 2, ...):

m_i^t = P(I′ | θ_i^t) Σ_j P(θ_i^t | θ_j^{t−1}) m_j^{t−1,t}    (11.5)

m_i^{t,t+1} = m_i^t / n^t,    (11.6)

where n^t = Σ_j m_j^t. If m_i^{0,1} = P(θ_i) (the prior distribution over states), then it is easy to see that:

m_i^{t,t+1} = P(θ_i^t | I(t), ..., I(1))    (11.7)
This method has the additional advantage that the normalization at each time step promotes stability, an important consideration for recurrent neuronal networks, and allows the likelihood function P(I′ | θ_i^t) to be defined in proportional terms, without the need to explicitly calculate its normalization factor (see section 11.4 for an example).
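Equations (11.5)–(11.7) together amount to a normalized forward pass. A minimal self-contained sketch, assuming discrete observations and hypothetical two-state matrices (all numerical values below are illustrative assumptions):

```python
import numpy as np

# Normalized forward recursion, equations (11.5)-(11.7).
# T[i, j] = P(theta_i^t | theta_j^{t-1}); O[k, i] = P(I = k | theta_i^t).
def forward_posterior(obs, prior, T, O):
    m = prior.copy()                 # m^{0,1} = P(theta_i): the prior
    for I in obs:
        m = O[I, :] * (T @ m)        # equation (11.5)
        m = m / m.sum()              # equation (11.6): divide by n^t = sum_j m_j^t
    return m                         # equation (11.7): P(theta_i^t | I(t), ..., I(1))

# Hypothetical two-state, two-symbol model.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.2],
              [0.1, 0.8]])
prior = np.array([0.5, 0.5])
post = forward_posterior([0, 0, 1], prior, T, O)
```

Because the division by n^t leaves each message a proper probability vector, the rows of O may be supplied only up to a constant factor, exactly as noted above for the likelihood function.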