Segmentation of heterogeneous document images : an ... - Tel
Figure 2.4: Part of a document in our corpus that contains large gaps between words.
tel-00912566, version 1 - 2 Dec 2013
Figure 2.5: Part of a document in our corpus that shows closely situated text lines.
5 or more text lines.
• Large gaps between words may lead to over-segmentation of text lines,
often splitting a single text line into fragments. Figure 2.4 shows an
example of large gaps between words.
• Closely situated text lines become a challenge when the distance
between two components from a single text line is larger than the distance
between two components from different text lines. The text lines in Figure 2.5
are so close that distance-based methods, such as the minimum spanning tree,
fail to produce correct results.
• Highly curved text lines and calligraphy can be found in some freestyle