Segmentation of heterogeneous document images : an ... - Tel
• Problem 2: Given the observations and the model ψ = (f, λ), how do we choose a corresponding label configuration y∗ = y∗1 y∗2 ... y∗S that is optimal in some sense (i.e., that best "explains" the observations)? (Label decoding)

y∗ = arg max_y p(y|x, ψ)
• Problem 3: Given the observations, the label configuration y = y1 y2 ... yS and a model ψ = (f, λ), how do we adjust the model parameters λ to maximize p(y|x, ψ)? (Training)
We can also view the first problem as one of scoring how well a given model matches a given observation sequence. This viewpoint is useful for recognition, where we try to choose, among several competing models, the one that best matches the observations. It therefore plays no part in our system, where we want to segment regions by labeling the observations.
tel-00912566, version 1 - 2 Dec 2013<br />
Problem 2 is the one in which we attempt to find the "correct" label configuration. Since we use two labels, text and non-text, for each site, the problem reduces to assigning one of these labels to every site on the image. This is what we want to do every time we process a document for region detection. In the case of linear-chain conditional random fields, this operation is easily achieved with the Viterbi algorithm [97], which assigns the optimal labels using dynamic programming. In two-dimensional random fields, however, approximate inference techniques must be used.
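As a concrete illustration, the paragraph above can be sketched as a minimal Viterbi decoder for a linear-chain model. The array layout and function name are ours, not the thesis's; we assume the node and edge potentials have already been computed as score arrays:

```python
import numpy as np

def viterbi_decode(node_scores, edge_scores):
    """Find the highest-scoring label sequence for a linear-chain model.

    node_scores: (S, K) array, score of assigning label k to site s.
    edge_scores: (K, K) array, score of label i at s-1 followed by label j at s.
    Returns the arg-max label sequence as a list of label indices.
    """
    S, K = node_scores.shape
    best = np.zeros((S, K))               # best score of any path ending in (s, k)
    back = np.zeros((S, K), dtype=int)    # back-pointers for path recovery
    best[0] = node_scores[0]
    for s in range(1, S):
        # scores[i, j]: best path through label i at s-1, then label j at s
        scores = best[s - 1][:, None] + edge_scores + node_scores[s][None, :]
        back[s] = scores.argmax(axis=0)
        best[s] = scores.max(axis=0)
    # trace the optimal path backwards from the best final label
    y = [int(best[-1].argmax())]
    for s in range(S - 1, 0, -1):
        y.append(int(back[s, y[-1]]))
    return y[::-1]
```

With zero edge scores the decoder simply picks the best label per site; a non-trivial edge matrix lets a strong transition score override weaker node evidence, which is the point of joint decoding.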
Problem 3 is the one in which we attempt to optimize the model parameters to best describe how the given observations and label configuration come about. This is called training, and various methods have been proposed for it. Among these, Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS or LM-BFGS) [20] and Collins's voted perceptron [26] are two popular choices.
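To make the training idea concrete, here is a minimal sketch of the basic structured-perceptron update that Collins's voted variant builds on (the voted/averaged version additionally combines the intermediate weight vectors, which is omitted here). The `feature_fn` and `decode` arguments are assumed helpers, not part of the thesis:

```python
def perceptron_train(examples, feature_fn, decode, epochs=5):
    """Structured-perceptron training sketch: decode each example with the
    current weights; on a mistake, move the weights toward the gold feature
    vector and away from the predicted one.

    examples:   list of (x, y) pairs, y being the gold label sequence.
    feature_fn: (x, y) -> dict of feature counts.
    decode:     (x, weights) -> predicted label sequence under the weights.
    """
    weights = {}
    for _ in range(epochs):
        for x, y in examples:
            y_hat = decode(x, weights)
            if y_hat != y:
                for k, v in feature_fn(x, y).items():
                    weights[k] = weights.get(k, 0.0) + v   # reward gold features
                for k, v in feature_fn(x, y_hat).items():
                    weights[k] = weights.get(k, 0.0) - v   # penalize predicted ones
    return weights
```

Unlike L-BFGS, which maximizes the conditional likelihood p(y|x, ψ) using its gradient, the perceptron only needs a decoder, which is one reason it is popular for structured models.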
4.2 Feature functions<br />
Feature functions are perhaps the most important components of the model. The general form of a feature function is f(y_{i∈N(s)}, y_s, x_{i∈N(s)}, x_s), which looks at a pair of adjacent sites and indicates how likely the given label configuration is to be correct given the observations at both sites. Because such a feature function depends on two sites, it is called an edge feature function. If a feature function depends only on the label and observation at a single site, it is called a node feature function. The value of a feature function is a real number that may depend on the labels and observations, including any non-linear combination of them.
The term "feature function" differs from the features and feature extraction we are familiar with in image processing: each feature function must be tied to label configurations. For example, we can define a simple edge feature function f1 that produces binary values: it is 1 if the label of the current site is "text" and the label of the site on its left is "non-text", and 0 otherwise.
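The edge feature f1 just described could be sketched as follows (the label strings and the argument order are illustrative choices on our part; this particular feature happens to ignore the observations):

```python
def f1(y_left, y_s, x_left, x_s):
    """Binary edge feature: fires (value 1) when the current site is
    labelled "text" and the site to its left is labelled "non-text";
    0 otherwise. The observations x_left and x_s are unused here, but
    other feature functions would inspect them."""
    return 1 if (y_s == "text" and y_left == "non-text") else 0
```

During training, the weight λ1 learned for f1 then expresses how strongly a non-text-to-text transition should be rewarded or penalized.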