Segmentation of heterogeneous document images : an ... - Tel
determine the average height of text lines from text connected components, but
much harder to determine the spaces between lines. Furthermore, line spacing
varies from one document to another. To ensure that our kernel captures only
one text line, the value of σ should be tied to λ. The size of the kernel should
in turn depend on the σ parameter. It should normally be large enough
that the kernel coefficients in the border rows and columns contribute very little
to the sum of coefficients. As a rule of thumb, the size (in pixels) of the kernel
window should be at least six times the value of σ, so that the power
of the coefficients at the borders of the kernel window falls to 1% or less
of the power of the center coefficients. The second issue is that we want the λ
parameter to depend on the size of the text line. Our parameters are:
tel-00912566, version 1 - 2 Dec 2013<br />
\[
\begin{aligned}
\lambda &= 2 \times \text{Avg. text height or width} \\
\sigma &= \frac{\lambda}{3.5} \\
\gamma &= 0.7 \\
\psi &= \begin{cases} \pi & \text{to capture text lines} \\ 0 & \text{to capture white space} \end{cases}
\end{aligned}
\]
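As an illustration of how these parameters fit together, the following minimal sketch builds one kernel from an average text size. The kernel formula (a Gaussian envelope multiplied by a cosine carrier), the orientation parameter `theta`, and the function name `gabor_kernel` are assumptions made for this example and are not taken from the text above; only λ, σ, γ, ψ and the six-sigma window rule come from it.

```python
import numpy as np

def gabor_kernel(avg_text_size, psi, theta=np.pi / 2, gamma=0.7):
    """Real-valued Gabor kernel (assumed standard form, hypothetical helper).

    avg_text_size: average text height or width in pixels.
    psi: pi to respond to text lines, 0 to respond to white space.
    theta: direction of the carrier wave (an assumption of this sketch).
    """
    lam = 2.0 * avg_text_size            # lambda = 2 x average text size
    sigma = lam / 3.5                    # sigma is tied to lambda
    half = int(np.ceil(3.0 * sigma))     # window side is about 6 * sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / lam + psi)
    return envelope * carrier

# Example: kernel for text lines with an average text height of 10 pixels.
kernel = gabor_kernel(10, psi=np.pi)
center = kernel[kernel.shape[0] // 2, kernel.shape[1] // 2]
border = kernel[-1, kernel.shape[1] // 2]
print(kernel.shape, border ** 2 / center ** 2)
```

With a window of about six times σ, the border coefficient along the carrier direction lies three standard deviations from the center, so its power stays well below the 1% bound mentioned above.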
Unfortunately, there is no efficient implementation of the Gabor filter in which
the λ parameter varies locally. We could manually crop patches from a document
image and apply to each patch a Gabor kernel that matches the local text height,
but this would take a huge amount of time. Therefore, Gabor filtering by kernel
multiplication in the frequency domain, with fixed parameters for each kernel per document,
is still the only available option. As a consequence, two different Gabor
kernels are used to capture text lines. For these kernels, the λ parameters are
set to 2 × Average Text Height and 4 × Average Text Height. Moreover, to capture
white space gaps, three Gabor kernels are used. Their λ parameters are set
to 1.5 × Average Text Width, 3.5 × Average Text Width and 5.5 × Average Text Width.
Figure 4.6 shows the results of applying these Gabor filters to the document
in figure 4.5.
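The filter bank described above could be sketched as follows. This is only an illustration under stated assumptions: the kernel formula, the orientations (carrier across the lines for text, along the lines for white space gaps), and the helper names `gabor_kernel`, `filter_fft` and `gabor_bank` are hypothetical. The multiplication in the frequency domain amounts to a circular convolution, so image borders wrap around in this sketch, and each kernel must be no larger than the image.

```python
import numpy as np

def gabor_kernel(lam, psi, theta, gamma=0.7):
    """Real Gabor kernel with sigma = lambda / 3.5 (assumed standard form)."""
    sigma = lam / 3.5
    half = int(np.ceil(3.0 * sigma))     # window side is about 6 * sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / lam + psi)

def filter_fft(image, kernel):
    """Circular convolution via pointwise multiplication of 2-D FFTs."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    # Move the kernel center to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

def gabor_bank(image, avg_text_height, avg_text_width):
    """Five responses: two for text lines, three for white space gaps."""
    responses = []
    for factor in (2.0, 4.0):            # text lines, psi = pi
        k = gabor_kernel(factor * avg_text_height, np.pi, theta=np.pi / 2)
        responses.append(filter_fft(image, k))
    for factor in (1.5, 3.5, 5.5):       # white space gaps, psi = 0
        k = gabor_kernel(factor * avg_text_width, 0.0, theta=0.0)
        responses.append(filter_fft(image, k))
    return responses

# Example on a synthetic image; a real document image would be used instead.
image = np.zeros((64, 64))
image[32, 32] = 1.0
maps = gabor_bank(image, avg_text_height=4, avg_text_width=4)
print(len(maps), maps[0].shape)
```

Because each kernel uses one fixed λ for the whole image, every response costs a single forward and inverse transform, which is the practical reason given above for preferring fixed parameters per document.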
Figure 4.7 displays two additional examples of filtered images, highlighting
the effect of using various λ parameters to capture text lines of different font
sizes.
4.4 Feature functions<br />
Each feature function has two parts. The first part depends on the label of the
site or a combination of the labels from its neighbors. The second part depends
on the observations. Theoretically, these observations can be generated from
anywhere on the image; however, we restrict them to be computed from and
around the site in question.
We have described many features that we extract from document images. In order
to generate observations from these features, we compute the mean and variance of
each feature map at the same site. These statistics serve as observations at each