
An alternative method for prediction is called maximum posterior marginal (MPM) estimation:

$$\hat{y}_i = \arg\max_{y_i \in Y_i} p(y_i \mid x, \theta, I), \quad \forall i \in V. \tag{2.31}$$

It corresponds to choosing the label for each unknown that maximizes the probability with all other labelings marginalized. This can often be a more effective way of considering the probabilities of all the labelings, rather than simply the maximum (joint) labeling, as in MAP. Marginalization, however, suffers from the same computational complexity problems. Sum-product loopy belief propagation, described next, can be used to approximate these marginals.
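As a concrete illustration of the MPM decision rule (2.31), the following minimal sketch takes per-node marginals as given (e.g., approximated by the loopy belief propagation described below) and simply picks the maximizing label for each unknown independently; the array layout and names are assumptions, not the thesis's implementation.

```python
import numpy as np

def mpm_predict(marginals):
    """MPM estimation (2.31): for each unknown i, choose the label y_i
    that maximizes its (approximate) posterior marginal.

    marginals: hypothetical array of shape (num_nodes, num_labels),
    where marginals[i, k] approximates p(y_i = k | x, theta, I).
    """
    return np.argmax(marginals, axis=1)

# Hypothetical example: three unknowns over a two-label space.
marginals = np.array([[0.7, 0.3],
                      [0.4, 0.6],
                      [0.55, 0.45]])
print(mpm_predict(marginals))  # -> [0 1 0]
```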

In practice, the MAP estimate tends to be conservative, trying to give the most correct labels, while MPM tends to give higher detection rates. These methods are used variously in our experiments.

2.3.2 Belief Propagation

Because they all involve combinatorial sums or search spaces, we require approximations to compute the log partition function for the likelihood objective (2.15), the marginal probabilities for its gradient (2.18), or to make predictions with MAP or MPM. Fortunately, there is one family of algorithms that handles all of these: the loopy sum-product, or belief propagation, algorithm. In short, it exploits the way in which the joint probability factorizes into a product of local functions.

Recall the bipartite “factor graph” between unknowns $y$, represented by nodes in the graph, and the compatibility functions (or factors, when they are exponentiated) over the unknowns, as shown in the example of Figure 2.1 on page 22. The algorithm operates by iteratively passing messages between the nodes representing the unknowns and the factors. When these messages converge to a stable fixed point, the results are equivalent to minimizing the so-called Bethe free energy, a variational method for approximate inference from statistical physics [136]. When the factor graph is a tree, the sum-product algorithm [60] efficiently performs exact inference. In most cases, our graphs are not trees, and thus the results are only approximate.

Let $N(i) \equiv \{C \in \mathcal{C} \mid i \in C\}$ be the family of indices for the factors that neighbor the $i$th unknown. This corresponds to the set of factors having $y_i$ as an argument. The set of edges in a factor graph is thus

$$E(\mathcal{C}) \equiv \{(i, C) \mid i \in V \wedge C \in N(i)\}. \tag{2.32}$$
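To make this bookkeeping concrete, here is a minimal sketch (with hypothetical names) of one way to represent factor scopes so that the neighbor sets $N(i)$ and the edge set of (2.32) fall out directly:

```python
# Hypothetical factor graph: each factor is stored as the tuple of node
# indices in its scope, e.g., two pairwise factors and one unary factor.
factors = [(0, 1), (1, 2), (2,)]

def neighbors(i, factors):
    """N(i): indices of the factors having y_i as an argument."""
    return [c for c, scope in enumerate(factors) if i in scope]

# E(C): all (node, factor) edges of the bipartite factor graph (2.32).
edges = [(i, c) for c, scope in enumerate(factors) for i in scope]
print(neighbors(1, factors))  # -> [0, 1]
print(edges)                  # -> [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
```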

For all edges in the factor graph $(i, C) \in E(\mathcal{C})$, the node-to-factor messages have the general form

$$m_{i \to C}(y_i) \propto \prod_{C' \in N(i) \setminus C} m_{C' \to i}(y_i) \tag{2.33}$$

so that the message from a node to a factor is the product of all the messages to that node from all other neighboring factors. The resulting functional message is normalized (i.e., sums to 1 over $y_i$) for numerical stability. Note that for parsimony,
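As a sketch of the update (2.33) with the normalization just described, the following assumes tabular messages stored as arrays keyed by hypothetical (factor, node) pairs; it is an illustration under those assumptions, not the thesis's code:

```python
import numpy as np

def node_to_factor_message(i, C, neighbors, factor_to_node, num_labels):
    """Node-to-factor update (2.33): multiply the incoming messages from
    every factor neighboring node i except C, then normalize over y_i.

    neighbors[i] lists N(i); factor_to_node[(Cp, i)] is the message
    m_{C'->i}, a length-num_labels array. All names are hypothetical.
    """
    msg = np.ones(num_labels)
    for Cp in neighbors[i]:
        if Cp != C:
            msg *= factor_to_node[(Cp, i)]
    return msg / msg.sum()  # normalize so the message sums to 1 over y_i
```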
