Segmentation of heterogeneous document images : an ... - Tel
Figure 2.6: Part of a camera-captured document from [17] that shows skewed text
lines.
tel-00912566, version 1 - 2 Dec 2013
Figure 2.7: The left image is a cropped part of a document from the ICDAR2009 database
and the right image is an advertisement page from our own corpus. Both show vertical
text lines that are hard to detect.
historical document; however, they are seldom seen in more recent documents.
• Skewed text lines appear in camera-captured books. Thick books are
often hard and time-consuming to scan manually with flatbed scanners.
One solution is to use an automated system that turns the pages and takes
a photo of each page. The downside is that text lines appear skewed
near the page borders. Figure 2.6 shows part of a camera-captured document
from [17] containing such skewed text lines.
• Vertical text lines often occur in magazine-style pages or advertisements.
They come in two flavours: either the characters are upright but
stacked on top of each other, or the whole line with all its characters is rotated
by ±90°. Most methods are designed to detect slanted text lines with a
limited degree of freedom; over-segmentation errors therefore occur when
the algorithm is not designed to deal with vertical text lines. Figure 2.7
shows examples of these vertical lines, which can be found occasionally in
documents.
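
To make the distinction between slightly slanted and vertical text lines concrete, the following is a minimal sketch, not the method used in this work. It assumes that the characters of a candidate text line are already available as centroid coordinates, and it estimates the line direction from the principal axis of those centroids. The function names `line_orientation_deg` and `classify_line` and the threshold `max_skew_deg` are hypothetical and chosen only for illustration.

```python
import math


def line_orientation_deg(centroids):
    """Return the principal-axis angle of the centroids in degrees.

    The result lies in [-90, 90]: values near 0 mean a horizontal line,
    values near +90 or -90 mean a vertical line. In image coordinates
    (y pointing down) the sign of the angle is mirrored.
    """
    n = len(centroids)
    mx = sum(x for x, _ in centroids) / n
    my = sum(y for _, y in centroids) / n
    # Second-order central moments of the centroid cloud.
    sxx = sum((x - mx) ** 2 for x, _ in centroids)
    syy = sum((y - my) ** 2 for _, y in centroids)
    sxy = sum((x - mx) * (y - my) for x, y in centroids)
    # Orientation of the axis along which the centroids spread the most.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.degrees(theta)


def classify_line(centroids, max_skew_deg=15.0):
    """Label a candidate line as horizontal, vertical, or slanted.

    max_skew_deg is an assumed tolerance, not a value from the text.
    """
    angle = abs(line_orientation_deg(centroids))
    if angle <= max_skew_deg:
        return "horizontal"
    if angle >= 90.0 - max_skew_deg:
        return "vertical"
    return "slanted"


if __name__ == "__main__":
    slightly_skewed = [(0, 10), (10, 10.5), (20, 10), (30, 9.5)]
    stacked = [(5, 0), (5, 12), (5, 24), (5, 36)]
    print(classify_line(slightly_skewed))
    print(classify_line(stacked))
```

A method that only accepts angles within a small tolerance around 0° would reject the second example or split it into single-character lines, which is the over-segmentation behaviour described above. Note that centroids alone cannot separate the two flavours of vertical lines (stacked upright characters versus a line rotated by ±90°); that would need information about the characters themselves.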