Segmentation of heterogeneous document images : an ... - Tel
proximation of a Mumford-Shah functional. They indicate that boundary-based
level-set methods such as [52, 53] depend on the number of boundary evolution
steps and are also sensitive to touching text lines. To handle these difficulties,
their method minimizes a Mumford-Shah functional using a piecewise
constant approximation [92]. The initial estimates of the text lines are obtained
as in [85] and are then refined by visiting each pixel of the image in a given
order. For each initial text line, a segmentation curve divides the text line into
two regions, inner and outer. In each iteration, these curves evolve as their
parameters are recomputed from the intensities of the two regions. The final
results may contain line fragments due to large gaps between words, so
morphological operators are applied during post-processing to merge some of
these fragments.
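The pixel-visiting refinement can be illustrated with a minimal sketch of a two-region piecewise constant Mumford-Shah fit. This is our own simplified illustration, not the implementation of the reviewed method. It assumes a grayscale image given as nested lists and an initial boolean mask marking the inner region of one text line, and it omits the curve-length term of the functional, so only the data term (squared deviation of each pixel from its region mean) is minimized. Each pixel, visited in raster order, is moved to the other region when that lowers the energy, and both region means are updated incrementally. The function name `piecewise_constant_refine` is hypothetical.

```python
def piecewise_constant_refine(img, mask, max_sweeps=10):
    """Refine a two-region (inner/outer) partition of a grayscale image.

    Hypothetical sketch: minimizes only the data term of a piecewise constant
    Mumford-Shah energy, sum over both regions of (I - c_region)^2, by visiting
    pixels in raster order. The curve-length regularization term is omitted.
    img:  list of rows of intensities
    mask: list of rows of booleans, True = inner region (text line)
    """
    h, w = len(img), len(img[0])
    mask = [row[:] for row in mask]  # work on a copy, leave the input intact
    count = {True: 0, False: 0}
    total = {True: 0.0, False: 0.0}
    for y in range(h):
        for x in range(w):
            count[mask[y][x]] += 1
            total[mask[y][x]] += img[y][x]
    if count[True] == 0 or count[False] == 0:
        return mask  # one region is empty: nothing to refine
    mean = {k: total[k] / count[k] for k in (True, False)}

    for _ in range(max_sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                v, a = img[y][x], mask[y][x]
                b = not a
                if count[a] == 1:
                    continue  # never empty a region
                # Exact energy change when moving v from region a to region b,
                # accounting for the shifted means of both regions.
                delta = ((count[b] / (count[b] + 1)) * (v - mean[b]) ** 2
                         - (count[a] / (count[a] - 1)) * (v - mean[a]) ** 2)
                if delta < 0:
                    mean[a] = (count[a] * mean[a] - v) / (count[a] - 1)
                    mean[b] = (count[b] * mean[b] + v) / (count[b] + 1)
                    count[a] -= 1
                    count[b] += 1
                    mask[y][x] = b
                    changed = True
        if not changed:
            break  # converged: no single-pixel move lowers the energy
    return mask
```

For example, on a 3×3 image with a dark middle row and a rough initial mask containing one bright and one dark pixel, a single sweep moves the whole dark row into the inner region and the bright pixels into the outer region; the second sweep changes nothing and the loop stops.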
tel-00912566, version 1 - 2 Dec 2013
The last method reviewed here, published in [88], detects handwritten Arabic
text lines. Instead of summing the values of adjacent pixels, as the projection
profiles in [87] do, Shi et al. apply steerable directional filters, each shaped as
an ellipse with a large focal distance. The height of the filter is set to the
average text height and its width to five times its height. When a filter is
oriented along the direction of the text lines, its response at that location is
greater than the response of a filter oriented in any other direction. Filtering
produces a response map that is then thresholded adaptively to bring out the
locations of the text lines.
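The filtering step can be sketched as follows. This is an illustration under our own assumptions, not the filters of [88]: the ink image is binary (1 = ink), each filter is a flat, normalized ellipse whose minor axis equals the average text height and whose major axis is five times longer, and orientation is handled by brute force over a few sampled angles rather than by steering a set of basis filters. The adaptive threshold (response above the mean of a local window) and all function names are hypothetical choices.

```python
import math


def elliptical_kernel(height, angle_deg):
    """Offsets (dy, dx) inside an ellipse of minor axis `height` and major
    axis 5 * height, rotated by `angle_deg` (0 = horizontal text lines)."""
    a = 5 * height / 2.0  # semi-major axis, along the text direction
    b = height / 2.0      # semi-minor axis, across the text direction
    t = math.radians(angle_deg)
    r = int(math.ceil(a))
    offsets = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            u = dx * math.cos(t) + dy * math.sin(t)   # along the major axis
            v = -dx * math.sin(t) + dy * math.cos(t)  # along the minor axis
            if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                offsets.append((dy, dx))
    return offsets


def directional_response(ink, height, angles=(-20, -10, 0, 10, 20)):
    """Per pixel, the maximum over sampled orientations of the mean ink under
    the oriented ellipse; pixels outside the image count as background."""
    h, w = len(ink), len(ink[0])
    kernels = [elliptical_kernel(height, ang) for ang in angles]
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0.0
            for offsets in kernels:
                s = sum(ink[y + dy][x + dx] for dy, dx in offsets
                        if 0 <= y + dy < h and 0 <= x + dx < w)
                best = max(best, s / len(offsets))
            resp[y][x] = best
    return resp


def adaptive_threshold(resp, radius):
    """Mark a pixel when its response exceeds the mean response of its
    (2 * radius + 1)^2 window, clipped at the image border."""
    h, w = len(resp), len(resp[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [resp[yy][xx]
                   for yy in range(max(0, y - radius), min(h, y + radius + 1))
                   for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = resp[y][x] > sum(win) / len(win)
    return out
```

On a synthetic page with one horizontal ink band three pixels high, the horizontal filter fits entirely inside the band, so the response at the band's center reaches 1.0, while a pixel three rows above the band gets 0.0 for every sampled angle and is rejected by the threshold.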
Distance based methods
We review only one method in this category, recently proposed in [103] for
segmenting handwritten Chinese text regions into text lines. The heart of this
method is a minimum spanning tree algorithm.
In the first stage, the method extracts all the connected components of the
document. The underlying, reasonable assumption is that components belonging
to a single text line are closer to one another than to components belonging
to different text lines. Therefore, a minimum spanning tree is built to connect
neighboring components of the same line, and each line corresponds to a
sub-tree. However, because of the variable layout of text lines and occasional
large gaps between words, the results are not perfect. Hence, the method uses a
second-stage clustering procedure that dynamically cuts edges of the tree, splitting
it into groups that correspond to the correct text lines. A detection accuracy
of 98.02% is reported on 803 unconstrained documents.
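A minimal sketch of the minimum spanning tree idea follows, under simplifying assumptions of our own: each connected component is reduced to its centroid, edge weights are plain Euclidean distances, and the dynamic second-stage clustering of [103] is replaced by a single fixed cut threshold, so this does not reproduce the reported method. Kruskal's algorithm with a union-find structure builds the tree; dropping tree edges longer than the threshold leaves one sub-tree per line. The function names and the `cut` parameter are hypothetical.

```python
import math
from itertools import combinations


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete graph of component centroids."""
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    parent = list(range(len(points)))
    tree = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # the edge joins two different sub-trees
            parent[ri] = rj
            tree.append((d, i, j))
    return tree


def group_into_lines(points, cut):
    """Keep MST edges no longer than `cut`; each remaining sub-tree is a line."""
    parent = list(range(len(points)))
    for d, i, j in minimum_spanning_tree(points):
        if d <= cut:
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())
```

With two rows of centroids spaced 10 pixels apart horizontally and 40 pixels apart vertically, a cut of 25 keeps the within-row edges and removes the single between-row edge, giving one group per row. A large gap between two words of the same line would also exceed a fixed cut and split that line, which is the failure the dynamic second stage of [103] is meant to address.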
2.3.3 Conclusion<br />
We have reviewed many text line detection methods in this section, and we must
now decide which of them can be exploited to detect text lines in our corpus.
An evaluation and comparison between methods is valid only if the results are
available for the same dataset and with the same evaluation metric. We
identified three different performance evaluation metrics: Pixel Correspondence,
Match Counting and Overall Pixel-Level HitRate. We have already
described Match Counting in the first chapter. Unfortunately, in our case, it