Segmentation of heterogeneous document images : an ... - Tel
Segmentation of heterogeneous document images : an ... - Tel
Segmentation of heterogeneous document images : an ... - Tel
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
List <strong>of</strong> Figures<br />
tel-00912566, version 1 - 2 Dec 2013<br />
1.1 A sample <strong>of</strong> a correctly segmented <strong>document</strong> image by ABBYY<br />
FineReader 2011. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3<br />
1.2 Left; A <strong>document</strong> page that is segmented incorrectly by ABBYY<br />
FineReader 2011 <strong>an</strong>d Right; The same <strong>document</strong> segmented by<br />
our method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6<br />
1.3 Left; A <strong>document</strong> page that is segmented incorrectly by AB-<br />
BYY Fine Reader 2011 <strong>an</strong>d Right; Its corresponding result for<br />
character recognition. . . . . . . . . . . . . . . . . . . . . . . . . 7<br />
1.4 The result <strong>of</strong> paragraph detection <strong>of</strong> the image in figure 1.3 by<br />
our method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7<br />
1.5 Some sample <strong>document</strong>s from our own corpus. . . . . . . . . . . 9<br />
1.6 Two sample <strong>document</strong>s from the dataset for ICDAR2009 competition<br />
[6]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10<br />
1.7 A screen shot <strong>of</strong> our developed s<strong>of</strong>tware for generating XML<br />
ground truth <strong>document</strong>s for each <strong>document</strong> image . . . . . . . . 10<br />
1.8 A screen shot that shows part <strong>of</strong> the contents <strong>of</strong> <strong>an</strong> XML file<br />
created by our s<strong>of</strong>tware . . . . . . . . . . . . . . . . . . . . . . . 11<br />
1.9 Visual org<strong>an</strong>ization <strong>of</strong> this dissertation . . . . . . . . . . . . . . . 12<br />
2.1 Text components touch graphical elements . . . . . . . . . . . . . 15<br />
2.2 Broken components <strong>of</strong> graphical drawings . . . . . . . . . . . . . 15<br />
2.3 Variability <strong>of</strong> font sizes in newspaper style <strong>document</strong>s . . . . . . 22<br />
2.4 Large gaps between words . . . . . . . . . . . . . . . . . . . . . . 23<br />
2.5 Closely situalted text lines . . . . . . . . . . . . . . . . . . . . . . 23<br />
2.6 Skewed text lines in camera-captured <strong>document</strong>s. . . . . . . . . 24<br />
2.7 Vertical Text lines . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />
2.8 Snakes grow to engulf text lines . . . . . . . . . . . . . . . . . . . 26<br />
2.9 Text line segmentation using global projection pr<strong>of</strong>iles . . . . . . 28<br />
2.10 Adaptive local connectivity map for detecting text lines . . . . . 28<br />
2.11 Block extraction steps in [105] . . . . . . . . . . . . . . . . . . . . 29<br />
2.12 Line segmentation using piecewise projection pr<strong>of</strong>iles . . . . . . . 30<br />
2.13 Steps for locating text line separators in part <strong>of</strong> <strong>document</strong> image.<br />
[75] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />
3.1 Adv<strong>an</strong>tages <strong>of</strong> Sauvola’s binarization method . . . . . . . . . . . 37<br />
3.2 Adv<strong>an</strong>tages <strong>of</strong> Otsu’s binarization method . . . . . . . . . . . . . 38<br />
3.3 Histograms <strong>of</strong> features based on solidity . . . . . . . . . . . . . . 40<br />
3.4 Histograms <strong>of</strong> features based on elongation . . . . . . . . . . . . 42<br />
3.5 Histograms <strong>of</strong> features based on height . . . . . . . . . . . . . . . 43<br />
iv