14.01.2014 Views

Segmentation of heterogeneous document images : an ... - Tel

Segmentation of heterogeneous document images : an ... - Tel

Segmentation of heterogeneous document images : an ... - Tel

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

List <strong>of</strong> Figures<br />

tel-00912566, version 1 - 2 Dec 2013<br />

1.1 A sample <strong>of</strong> a correctly segmented <strong>document</strong> image by ABBYY<br />

FineReader 2011. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3<br />

1.2 Left; A <strong>document</strong> page that is segmented incorrectly by ABBYY<br />

FineReader 2011 <strong>an</strong>d Right; The same <strong>document</strong> segmented by<br />

our method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6<br />

1.3 Left; A <strong>document</strong> page that is segmented incorrectly by AB-<br />

BYY Fine Reader 2011 <strong>an</strong>d Right; Its corresponding result for<br />

character recognition. . . . . . . . . . . . . . . . . . . . . . . . . 7<br />

1.4 The result <strong>of</strong> paragraph detection <strong>of</strong> the image in figure 1.3 by<br />

our method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7<br />

1.5 Some sample <strong>document</strong>s from our own corpus. . . . . . . . . . . 9<br />

1.6 Two sample <strong>document</strong>s from the dataset for ICDAR2009 competition<br />

[6]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10<br />

1.7 A screen shot <strong>of</strong> our developed s<strong>of</strong>tware for generating XML<br />

ground truth <strong>document</strong>s for each <strong>document</strong> image . . . . . . . . 10<br />

1.8 A screen shot that shows part <strong>of</strong> the contents <strong>of</strong> <strong>an</strong> XML file<br />

created by our s<strong>of</strong>tware . . . . . . . . . . . . . . . . . . . . . . . 11<br />

1.9 Visual org<strong>an</strong>ization <strong>of</strong> this dissertation . . . . . . . . . . . . . . . 12<br />

2.1 Text components touch graphical elements . . . . . . . . . . . . . 15<br />

2.2 Broken components <strong>of</strong> graphical drawings . . . . . . . . . . . . . 15<br />

2.3 Variability <strong>of</strong> font sizes in newspaper style <strong>document</strong>s . . . . . . 22<br />

2.4 Large gaps between words . . . . . . . . . . . . . . . . . . . . . . 23<br />

2.5 Closely situalted text lines . . . . . . . . . . . . . . . . . . . . . . 23<br />

2.6 Skewed text lines in camera-captured <strong>document</strong>s. . . . . . . . . 24<br />

2.7 Vertical Text lines . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />

2.8 Snakes grow to engulf text lines . . . . . . . . . . . . . . . . . . . 26<br />

2.9 Text line segmentation using global projection pr<strong>of</strong>iles . . . . . . 28<br />

2.10 Adaptive local connectivity map for detecting text lines . . . . . 28<br />

2.11 Block extraction steps in [105] . . . . . . . . . . . . . . . . . . . . 29<br />

2.12 Line segmentation using piecewise projection pr<strong>of</strong>iles . . . . . . . 30<br />

2.13 Steps for locating text line separators in part <strong>of</strong> <strong>document</strong> image.<br />

[75] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />

3.1 Adv<strong>an</strong>tages <strong>of</strong> Sauvola’s binarization method . . . . . . . . . . . 37<br />

3.2 Adv<strong>an</strong>tages <strong>of</strong> Otsu’s binarization method . . . . . . . . . . . . . 38<br />

3.3 Histograms <strong>of</strong> features based on solidity . . . . . . . . . . . . . . 40<br />

3.4 Histograms <strong>of</strong> features based on elongation . . . . . . . . . . . . 42<br />

3.5 Histograms <strong>of</strong> features based on height . . . . . . . . . . . . . . . 43<br />

iv

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!