
Segmentation of heterogeneous document images : an ... - Tel



Contents

tel-00912566, version 1 - 2 Dec 2013

1 Introduction
1.1 Introduction
1.2 Document page segmentation
1.3 Overview of the approach
1.4 Contribution of this dissertation
1.5 Our datasets
1.6 Organization of this dissertation

2 Related work
2.1 Text/graphics separation
2.1.1 Connected component based methods
2.1.2 Region-based methods
2.1.3 Conclusion
2.2 Text region detection
2.2.1 Distance-based methods
2.2.2 Whitespace analysis
2.2.3 Texture-based methods
2.2.4 Conclusion
2.3 Text line detection
2.3.1 Printed text line detection
2.3.2 Handwritten text line detection
2.3.3 Conclusion

3 Text/graphics separation
3.1 Preprocessing
3.2 Features
3.3 Feature analysis
3.4 Classifier selection
3.5 LogitBoost for classification
3.6 Post processing
3.7 Results

4 Region detection
4.1 Conditional random fields (CRFs)
4.1.1 The three basic problems for CRFs
4.2 Feature functions
4.3 Observations
4.3.1 Height and width maps
