11.04.2014 Views

unified detection and recognition for reading text in scene images

unified detection and recognition for reading text in scene images

unified detection and recognition for reading text in scene images

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Figure 1.5. Un<strong>in</strong><strong>for</strong>med segmentation can make <strong>recognition</strong> difficult. Top: Input<br />

image. Left: Probability of <strong>for</strong>eground under a Gaussian mixture model (k = 3, HSV<br />

color space). Right: B<strong>in</strong>arization by threshold (p (<strong>for</strong>eground|k = 3, image) ≥ 1 2 ).<br />

One other closely related work is that of Tu et al. [118]. Here, discrim<strong>in</strong>ative<br />

models are used as bottom-up proposal distributions <strong>in</strong> a large generative model.<br />

Discrim<strong>in</strong>ative <strong>text</strong> detectors are used, <strong>and</strong> then a generative model must be able to<br />

expla<strong>in</strong> the pixel data as a contour describ<strong>in</strong>g a known character. They claim the<br />

advantage of the generative model is that it ensures consistent regions. However,<br />

spatially coupled discrim<strong>in</strong>ative models can overcome this [61, 124]. While their<br />

bottom-up detectors seem to do a reasonable job of detect<strong>in</strong>g <strong>text</strong> to subsequently be<br />

expla<strong>in</strong>ed, the generative models <strong>for</strong> characters are <strong>in</strong>dependent of one another. In<br />

other words, there is no coupl<strong>in</strong>g of <strong>in</strong><strong>for</strong>mation based on where a character should<br />

appear (i.e., next to other characters), nor how it should appear (similar to nearby<br />

characters). The models <strong>in</strong> this thesis will demonstrate the utility of such <strong>in</strong><strong>for</strong>mation<br />

<strong>for</strong> <strong>in</strong>terpret<strong>in</strong>g <strong>scene</strong> <strong>text</strong>.<br />

We can broadly categorize the systems <strong>for</strong> robust character <strong>recognition</strong> along the<br />

follow<strong>in</strong>g dimensions:<br />

• B<strong>in</strong>arization [87, 80, 113] versus grayscale feature extraction [23, 62, 52, 118]<br />

• Separate <strong>detection</strong>/segmentation [80, 113, 23] versus <strong>in</strong>tegrated <strong>detection</strong>/segmentation<br />

<strong>and</strong> <strong>recognition</strong> [87, 62, 52, 118]<br />

• Language <strong>in</strong>dependent [87, 23, 62, 118] versus language post-process<strong>in</strong>g [80, 113]<br />

versus <strong>in</strong>tegrated language models [52].<br />

Although some OCR systems have a fully <strong>in</strong>tegrated language model [5, 14, 52], the<br />

systems thus far designed <strong>for</strong> <strong>read<strong>in</strong>g</strong> <strong>scene</strong> <strong>text</strong> either use no language <strong>in</strong><strong>for</strong>mation<br />

[87, 22, 62] or, at best, post-process<strong>in</strong>g [80, 113]. Moreover, as the most difficult<br />

<strong>scene</strong> <strong>text</strong> suffers from the difficulties listed <strong>in</strong> Table 1.1 on page 6, b<strong>in</strong>arization <strong>and</strong><br />

segmentation will become <strong>in</strong>creas<strong>in</strong>gly difficult without regard <strong>for</strong> <strong>in</strong>terpretation (see<br />

Figure 1.5). The effectiveness of <strong>in</strong>tegrat<strong>in</strong>g language with simultaneous segmentation<br />

<strong>and</strong> <strong>recognition</strong> on grayscale—not b<strong>in</strong>ary—<strong>images</strong> was demonstrated <strong>in</strong> the camerabased<br />

OCR system of Jacobs et al. [52]. Research on noisy <strong>and</strong> degraded documents<br />

has recognized this fact <strong>for</strong> some time. Indeed, this may expla<strong>in</strong> some of the success<br />

of <strong>text</strong> <strong>detection</strong> <strong>and</strong> <strong>recognition</strong> systems that use commercial OCR systems (e.g.,<br />

[132, 131, 21]).<br />

12

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!