14.01.2014 Views

Segmentation of heterogeneous document images : an ... - Tel


• Problem 2: Given the observations and the model $\psi = (f, \lambda)$, how do we choose a corresponding label configuration $y^* = y^*_1 y^*_2 \ldots y^*_S$ which is optimal in some sense (i.e., best "explains" the observations)? (Label decoding)

$$y^* = \arg\max_{y} \, p(y \mid x, \psi)$$

• Problem 3: Given the observations, the label configuration $y = y_1 y_2 \ldots y_S$ and a model $\psi = (f, \lambda)$, how do we adjust the model parameters $\lambda$ to maximize $p(y \mid x, \psi)$? (Training)

We can also view the first problem as one of scoring how well a given model matches a given observation sequence. This viewpoint is useful for recognition, where we try to choose, among several competing models, the one that best matches the observations. It therefore plays no part in our system, where we want to segment regions by labeling the observations.

tel-00912566, version 1 - 2 Dec 2013

Problem 2 is the one in which we attempt to find the "correct" label configuration. Since we use two labels for each site, text and non-text, the problem becomes one of assigning one of these labels to each site of the image. This is what we want to do every time we process a document for region detection. In the case of linear-chain conditional random fields, this operation is easily achieved with the Viterbi algorithm [97], which finds the optimal labels by dynamic programming. In two-dimensional random fields, however, approximate techniques should be used.
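To make the linear-chain case concrete, the following is a minimal sketch of Viterbi decoding over the two labels. It is not the implementation used in this work: the scalar observations (imagined here as an ink-density value per site) and the functions `node_score` and `edge_score` are hypothetical stand-ins for the weighted feature functions of a trained model.

```python
from typing import Callable, List, Sequence

LABELS = ("text", "non-text")


def node_score(label: str, x: float) -> float:
    # Hypothetical node score: x is assumed to be an ink density in [0, 1];
    # dense sites favor "text", sparse sites favor "non-text".
    return x if label == "text" else 1.0 - x


def edge_score(prev_label: str, label: str) -> float:
    # Hypothetical edge score: reward neighboring sites that share a label.
    return 0.3 if prev_label == label else 0.0


def viterbi(
    observations: Sequence[float],
    node: Callable[[str, float], float] = node_score,
    edge: Callable[[str, str], float] = edge_score,
    labels: Sequence[str] = LABELS,
) -> List[str]:
    """Return the highest-scoring label sequence for a linear chain."""
    if not observations:
        return []
    # best[s][y]: best score of any labeling of sites 0..s that ends in y.
    best = [{y: node(y, observations[0]) for y in labels}]
    # back[s - 1][y]: label at site s - 1 on the best path ending in y at s.
    back = []
    for x in observations[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + edge(p, y))
            scores[y] = best[-1][prev] + edge(prev, y) + node(y, x)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


print(viterbi([0.9, 0.8, 0.45, 0.9]))
```

With these assumed scores, the third site taken alone would slightly prefer "non-text", but the edge score pulls it to "text", so the call labels all four sites as "text". The cost grows linearly with the number of sites, which is what makes exact decoding practical on a chain.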

Problem 3 is the one in which we attempt to optimize the model parameters to best describe how the given observations and label configuration come about. This is called training, and various methods have been proposed for it. Among these, limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS or LM-BFGS) [20] and Collins' voted perceptron [26] are two popular choices.
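As a sketch of what these two methods compute, assume the usual log-linear form of a conditional random field, which is not restated in this section. The symbols $F_k$ (the sum of feature function $f_k$ over all sites), $Z(x)$ (the normalizing constant) and $\hat{y}$ (the label configuration decoded under the current parameters) are introduced here for illustration only.

```latex
% Assumed log-linear model; F_k sums the k-th feature function over all sites.
p(y \mid x, \psi) = \frac{1}{Z(x)} \exp\Big( \sum_k \lambda_k F_k(y, x) \Big)

% Gradient of the conditional log-likelihood, as used by a gradient-based
% optimizer such as L-BFGS: observed feature total minus expected feature total.
\frac{\partial}{\partial \lambda_k} \log p(y \mid x, \psi)
  = F_k(y, x) - \mathbb{E}_{y' \sim p(\cdot \mid x, \psi)} \big[ F_k(y', x) \big]

% Perceptron-style update: the expectation is replaced by the single decoded
% configuration \hat{y} = \arg\max_{y'} p(y' \mid x, \psi).
\lambda_k \leftarrow \lambda_k + F_k(y, x) - F_k(\hat{y}, x)
```

Under this assumption, the gradient requires expectations over all label configurations, whereas the perceptron update only requires decoding, i.e. a solution to Problem 2.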

4.2 Feature functions

Feature functions are perhaps the most important components of the model. The general form of a feature function is $f(y_{i, i \in N(s)}, y_s, x_{i, i \in N(s)}, x_s)$, which looks at a pair of adjacent sites to indicate how likely the given label configuration is to be correct given the observations at both sites. Because this feature function depends on two sites, it is called an edge feature function. If instead the feature function depends only on the label and observation at one site, it is called a node feature function. The value of the function is a real number that may depend on labels and observations, including any non-linear combination of them.

The term "feature function" differs from the features and feature extraction familiar from image processing: each feature function must be tied to label configurations. For example, we can define a simple edge feature function $f_1$ which produces binary values: it is 1 if the label of the current site is "text" and the label of the site on its left is "non-text", and 0 otherwise.
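The edge feature function $f_1$ described above can be written down directly. The sketch below also adds a node feature function `f2` for contrast; `f2` and the scalar observations (imagined as an ink-density value per site) are hypothetical and serve only to show how a feature function ties observations to labels.

```python
def f1(y_left: str, y_current: str, x_left: float, x_current: float) -> float:
    """Edge feature: 1 if the current site is "text" and the site on its
    left is "non-text", 0 otherwise. The observations are accepted to match
    the general form but are not used by this particular feature."""
    return 1.0 if (y_current == "text" and y_left == "non-text") else 0.0


def f2(y_current: str, x_current: float) -> float:
    """Hypothetical node feature: returns the observation (assumed to be an
    ink density in [0, 1]) when the site is labeled "text", 0 otherwise."""
    return x_current if y_current == "text" else 0.0


# A transition from "non-text" to "text" activates f1.
print(f1("non-text", "text", 0.1, 0.9))
# The same observation yields a different f2 value under each label.
print(f2("text", 0.9), f2("non-text", 0.9))
```

Note that `f2` returns a raw measurement only when paired with a specific label. This is the sense in which a feature function differs from an image-processing feature: the same observation contributes differently to the score of each candidate label configuration.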
