Segmentation of heterogeneous document images : an ... - Tel


Figure 2.6: Part of a camera-captured document from [17] that shows skewed text lines.

tel-00912566, version 1 - 2 Dec 2013

Figure 2.7: The left image is a cropped part of a document from the ICDAR2009 database, and the right image is an advertisement page from our own corpus. Both show vertical text lines that are hard to detect.

historical document; however, they are seldom seen in more recent documents.

• Skewed text lines appear in camera-captured books. Thick books are often difficult and time-consuming to scan manually on flatbed scanners. One solution is to use an automated system that turns the pages and photographs each one. The downside is that text lines appear skewed near the page borders. Figure 2.6 shows part of a camera-captured document from [17] containing such skewed text lines.

• Vertical text lines often appear in magazine-style pages or advertisements. They come in two flavours: either the characters are upright but stacked on top of each other, or the whole line, with all its characters, is rotated by ±90°. Most methods are designed to detect slanted text lines with a limited degree of freedom; as a result, over-segmentation errors occur when the algorithm is not designed to deal with vertical text lines. Figure 2.7 shows an example of these vertical lines, which can be found occasionally in documents.
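As an illustration of why vertical text lines need explicit handling, the following minimal sketch guesses the dominant direction of text lines in a binarized region by comparing projection profiles. It is not the method of this thesis: the projection-profile heuristic is swapped in for illustration, and the function name `dominant_line_direction` and the synthetic test page are hypothetical. It assumes a clean binary image in which nonzero pixels are ink, and it does not distinguish between the two flavours of vertical lines described above.

```python
import numpy as np


def dominant_line_direction(binary):
    """Guess whether text lines run horizontally or vertically.

    `binary` is a 2-D array in which nonzero pixels are ink (assumption).
    Horizontal text lines produce a row profile with strong peaks (lines)
    and valleys (gaps between lines); vertical text lines do the same in
    the column profile.
    """
    ink = (np.asarray(binary) > 0).astype(float)
    row_profile = ink.sum(axis=1)  # amount of ink in each row
    col_profile = ink.sum(axis=0)  # amount of ink in each column

    # Squared coefficient of variation: variance normalised by the squared
    # mean, so that the two profiles are comparable.
    row_score = row_profile.var() / max(row_profile.mean(), 1e-9) ** 2
    col_score = col_profile.var() / max(col_profile.mean(), 1e-9) ** 2
    return "horizontal" if row_score >= col_score else "vertical"


# Synthetic page (hypothetical data): three solid horizontal "text lines".
page = np.zeros((60, 60), dtype=np.uint8)
for top in (5, 20, 35):
    page[top:top + 5, 5:55] = 1

print(dominant_line_direction(page))    # the page as drawn
print(dominant_line_direction(page.T))  # the same page rotated onto its side
```

A segmentation pipeline could run such a check per region before applying a line detector tuned for near-horizontal lines, although real pages with noise, graphics, or mixed orientations would need a more robust test.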
