Segmentation of heterogeneous document images : an ... - Tel

Figure 2.4: Part of a document in our corpus that contains large gaps between words.

tel-00912566, version 1 - 2 Dec 2013

Figure 2.5: Part of a document in our corpus that shows closely situated text lines.

5 or more text lines.

• Large gaps between words may lead to over-segmentation of text lines, often breaking a single text line into several fragments. Figure 2.4 shows an example of large gaps between words.

• Closely situated text lines can become a challenge when the distance between two components of a single text line is larger than the distance between two components of different text lines. The text lines in Figure 2.5 are so close that distance-based methods, such as the minimum spanning tree, fail to produce correct results.

• Highly curved text lines and calligraphy can be found in some freestyle
