11.04.2014 Views

unified detection and recognition for reading text in scene images


Figure 1.1. Figure from the patent for the first practical OCR system [105].

font consistency, linguistic properties, and a lexicon (Chapter 4). This unified reading model outperforms approaches using similar information in the more traditional pipelined architecture.

Detection and recognition are not processes that happen in isolation; instead, they require feedback and communication for optimal performance. Toward this end, we introduce a template-based local feature model that can be used for detection or recognition (Chapter 5). We achieve better results with fewer features by selecting features and training the model on the joint task of detection and recognition, rather than for each sub-task independently. Finally, we unify character and word segmentation with recognition, integrating these processes in a single model that can evaluate and compare interpretations with maximal awareness (Chapter 6).
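As a rough illustration of why feedback between detection and recognition can matter, the toy sketch below contrasts a pipelined decision with a joint one. It is not the model developed in this thesis: the `Candidate` class, the score values, the threshold, and the additive combination of scores are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical image region with made-up log-scores."""
    region: str
    label: str
    detection_score: float    # assumed score that the region contains text
    recognition_score: float  # assumed score that the region reads as `label`


def pipelined_read(candidates: Sequence[Candidate],
                   detection_threshold: float = 0.0) -> Optional[Candidate]:
    """Commit to detection first, then recognize only the surviving regions."""
    detected = [c for c in candidates if c.detection_score >= detection_threshold]
    if not detected:
        return None
    return max(detected, key=lambda c: c.recognition_score)


def joint_read(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Score detection and recognition together before committing to anything."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.detection_score + c.recognition_score)


# Invented example: a window grid that looks text-like but reads poorly,
# and a faded sign that narrowly fails detection but reads very well.
candidates = [
    Candidate("window-grid", "III", detection_score=0.5, recognition_score=-3.0),
    Candidate("faded-sign", "EXIT", detection_score=-0.2, recognition_score=2.0),
]

pipelined_choice = pipelined_read(candidates)
joint_choice = joint_read(candidates)
print("pipelined:", pipelined_choice.label if pipelined_choice else None)
print("joint:", joint_choice.label if joint_choice else None)
```

In this invented case the pipelined reader discards the faded sign at the detection stage and can never recover it, while the joint reader lets strong recognition evidence compensate for weak detection evidence.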

Next we describe the problem of scene text reading and contrast it with the problem of document processing. In the remainder of this chapter we review some of the broad literature relevant to the tasks of document processing, text detection, and character recognition. Chapter 2 will then review the common, underlying framework for the probability models used in the rest of the thesis.

1.1 Scene Text<br />

There is a long history of research and development of automatic readers. The first practical system was described in a patent filed in 1951 by David H. Shepard [105]. This device (see Figure 1.1) converted typewritten documents into punch cards in a magazine subscription department. Of course, hardware devices and software techniques have improved a great deal in the fifty intervening years.

Several differences exist between reading text in documents and in scene images. A few examples are shown in Figure 1.2. The first major difference is in the problem of locating the text to be recognized. In a standard one- or two-column document, almost no detection must be done. Lines of text are easy to identify, and simple heuristics are usually sufficient to prepare the input for recognition. In more complex documents such as newspapers and magazines, there is an added difficulty of distin-

