unified detection and recognition for reading text in scene images

2.2.1 Model Likelihood

Substituting the parametric form (2.4) for the model likelihood in (2.14), we have

\[
\mathcal{L}(\theta; D) \equiv \sum_k \Biggl( \sum_{C \in \mathcal{C}^{(k)}} U_C\bigl(y_C^{(k)}, x^{(k)}; \theta_C\bigr) - \log Z\bigl(x^{(k)}\bigr) \Biggr) \tag{2.15}
\]

The set of functions $\{U_C\}_{C \in \mathcal{C}^{(k)}}$ depends on the particular unknowns $y^{(k)}$, and thus $\mathcal{C}$ is indexed by the particular example $k$.

For certain forms of compatibility functions, it can be shown that the objective function $\mathcal{L}(\theta; D)$ is convex, which means that global optima can be found by gradient ascent or other convex optimization techniques [13]. In particular, if the compatibility functions are linear in the parameters, then the log likelihood (2.14) is convex. Throughout this thesis, we use linear compatibility functions, which have the general form

\[
U_C(y_C, x_C; \theta_C) = \theta_C(y_C) \cdot F_C(x), \tag{2.16}
\]

where $F_C : \Omega \to \mathbb{R}^{d(C)}$ is a vector of features of the observation, the dimensionality of which depends on the particular set $C$. The parameter vector $\theta_C \in \mathbb{R}^{|\mathcal{Y}_C| \times d(C)}$ is conveniently thought of as a function $\theta_C : \mathcal{Y}_C \to \mathbb{R}^{d(C)}$ that takes an assignment $y_C$ and returns an associated set of weights for the features $F_C$.
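As a concrete illustration of the linear form (2.16), the following minimal Python sketch treats $\theta_C$ as a lookup table from assignments $y_C$ to weight vectors; the alphabet, feature dimension, and random values are illustrative assumptions, not details from the thesis.

```python
import numpy as np

def compatibility(theta_C, y_C, features):
    """Linear compatibility U_C(y_C, x_C; theta_C) = theta_C(y_C) . F_C(x)."""
    return float(np.dot(theta_C[y_C], features))

# Illustrative unary clique: 3 labels, d(C) = 4 features (all values assumed).
rng = np.random.default_rng(0)
theta_C = {y: rng.standard_normal(4) for y in "abc"}  # theta_C : Y_C -> R^d(C)
F_C = rng.standard_normal(4)                          # F_C(x) in R^d(C)

u = compatibility(theta_C, "a", F_C)
```

Viewing $\theta_C$ as a dictionary keyed by assignments mirrors the "function $\theta_C : \mathcal{Y}_C \to \mathbb{R}^{d(C)}$" reading above.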

Taking the gradient of the objective (2.15) with respect to the parameters yields

\[
\nabla_\theta \mathcal{L}(\theta; D) = \sum_k \sum_{C \in \mathcal{C}^{(k)}} \Bigl( \nabla_{\theta_C} U_C\bigl(y_C^{(k)}, x^{(k)}; \theta_C\bigr) - \mathrm{E}_C\bigl[ \nabla_{\theta_C} U_C\bigl(y_C, x^{(k)}; \theta_C\bigr) \bigm| x^{(k)}; \theta_C \bigr] \Bigr) \tag{2.17}
\]
\[
= \sum_k \sum_{C \in \mathcal{C}^{(k)}} \Bigl( F_C(x) - \mathrm{E}_C\bigl[ F_C(x) \bigm| x^{(k)}; \theta_C \bigr] \Bigr), \tag{2.18}
\]

where $\mathrm{E}_C$ indicates an expectation with respect to the marginal probability distribution $p(y_C \mid x, \theta, I)$. Equation (2.17) is the gradient for general compatibility functions, while (2.18) is for linear compatibilities (2.16).
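To make (2.18) concrete, here is a hedged sketch of the simplest possible case: one training example and a single clique over one variable, so the marginal is the full model distribution and the expectation is a sum over three labels. The label set, feature dimension, and random parameters are illustrative assumptions. The gradient component for each label is the indicator of the observed label times $F_C(x)$, minus the expected term $p(y \mid x)\,F_C(x)$; since the marginal sums to one, the components sum to the zero vector.

```python
import numpy as np

labels = ["a", "b", "c"]                              # illustrative label set
rng = np.random.default_rng(1)
theta = {y: rng.standard_normal(4) for y in labels}   # theta_C(y) in R^4
F = rng.standard_normal(4)                            # feature vector F_C(x)
y_obs = "b"                                           # observed label y^(k)

# Model marginal p(y | x) proportional to exp(theta(y) . F); the normalizer
# Z(x) is computed here by direct summation over the three labels.
scores = np.array([theta[y] @ F for y in labels])
p = np.exp(scores - scores.max())
p /= p.sum()

# Per-label gradient of the log-likelihood, specializing (2.18): indicator of
# the observed label times F, minus the expectation term p(y | x) * F.
grad = {y: (float(y == y_obs) - p[i]) * F for i, y in enumerate(labels)}

# Because the marginal sums to one, the components sum to the zero vector.
total = sum(grad.values())
```

The "observed minus expected features" structure is what makes gradient ascent on (2.15) push probability mass toward the labeled assignments.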

To calculate the log likelihood and its gradient, and thus find the optimal model $\theta$, we will need to be able to calculate $\log Z(x)$, the so-called log partition function, and the marginal probabilities of each $y_C$. In general, these both involve combinatorial sums, so approximations must be made. Most of these are described in Section 2.3.2, but we describe two here that are more closely related to the log-likelihood and the objective function.
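The combinatorial nature of $\log Z(x)$ can be seen in a toy chain model. The sketch below (with illustrative sizes and random scores, not values from the thesis) enumerates all $m^n$ assignments, and also computes the same value with the standard forward recursion, which is exact for chains and linear in $n$; this is an illustration only, not the approximation machinery of Section 2.3.2.

```python
import itertools
import math
import numpy as np

labels = "ab"                     # illustrative alphabet, m = 2
n = 3                             # chain length; enumeration costs m**n terms
rng = np.random.default_rng(2)
unary = rng.standard_normal((n, len(labels)))       # per-position label scores
pairwise = rng.standard_normal((len(labels),) * 2)  # bigram compatibilities

def score(y):
    """Total compatibility of a full assignment y (a length-n string)."""
    s = sum(unary[i, labels.index(c)] for i, c in enumerate(y))
    s += sum(pairwise[labels.index(a), labels.index(b)]
             for a, b in zip(y, y[1:]))
    return s

# Brute force: sum exp(score) over all 2**3 = 8 assignments.
log_Z = math.log(sum(math.exp(score(y))
                     for y in map("".join, itertools.product(labels, repeat=n))))

# Forward recursion: the same log Z in O(n * m^2) time, exact for chains.
alpha = np.exp(unary[0])
for i in range(1, n):
    alpha = np.exp(unary[i]) * (alpha @ np.exp(pairwise))
log_Z_dp = math.log(alpha.sum())
```

For graphs with cycles no such exact linear-time recursion exists in general, which is why the approximations of Section 2.3.2 are needed.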

2.2.1.1 Parameter Decoupling

One simple approximation that may be made is to decouple the parameters in $\theta$ during training. For instance, in the example described at the end of §2.1.2 and shown in Figure 2.1, there are two types of compatibility functions, one for recognizing characters based on their appearance and another for weighting bigrams. If the parameter vector is decomposed into the parameters for the recognition and bigram
