Unified Detection and Recognition for Reading Text in Scene Images

Figure 1.1. Figure from the patent for the first practical OCR system [105].

font consistency, linguistic properties, and a lexicon (Chapter 4). This unified reading model outperforms approaches using similar information in the more traditional pipelined architecture.

Detection and recognition are not processes that happen in isolation; instead, they require feedback and communication for optimal performance. Toward this end, we introduce a template-based local feature model that can be used for detection or recognition (Chapter 5). We achieve better results with fewer features by selecting features and training the model on the joint task of detection and recognition, rather than for each sub-task independently. Finally, we unify character and word segmentation with recognition, integrating these processes in a single model that can evaluate and compare interpretations with maximal awareness (Chapter 6).

Next we describe the problem of scene text reading and contrast it with the problem of document processing. In the remainder of this chapter we review some of the broad literature relevant to the tasks of document processing, text detection, and character recognition. Chapter 2 will then review the common, underlying framework for the probability models used in the rest of the thesis.

1.1 Scene Text

There is a long history of research and development of automatic readers. The first practical system was described in a patent filed in 1951 by David H. Shepard [105]. This device (see Figure 1.1) converted typewritten documents into punch cards in a magazine subscription department. Of course, hardware devices and software techniques have improved a great deal in the fifty intervening years.

Several differences exist between reading text in documents and in scene images. A few examples are shown in Figure 1.2. The first primary difference is in the problem of locating the text to be recognized. In a standard one- or two-column document, almost no detection must be done. Lines of text are easy to identify, and simple heuristics are usually sufficient to prepare the input for recognition. In more complex documents such as newspapers and magazines, there is an added difficulty of distin-

Figure 1.2. Images for document page reading (top) and scene text reading (bottom).