
An alternative method for prediction is called maximum posterior marginal (MPM) estimation:

$$\hat{y}_i = \arg\max_{y_i \in Y_i} p(y_i \mid x, \theta, I), \quad \forall i \in V. \tag{2.31}$$

It corresponds to choosing the label for each unknown that maximizes the probability with all other labelings marginalized. This can often be a more effective way of considering the probabilities of all the labelings, rather than simply the maximum (joint) labeling, as in MAP. Marginalization, however, suffers from the same computational complexity problems. Sum-product loopy belief propagation, described next, can be used to approximate these marginals.
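As a concrete illustration of the MPM decision rule (2.31), the following minimal sketch takes per-node marginals as given (e.g., approximated by the loopy belief propagation described below) and simply picks the maximizing label for each unknown independently; the array layout and names are assumptions, not the thesis's implementation.

```python
import numpy as np

def mpm_predict(marginals):
    """MPM estimation (2.31): for each unknown i, choose the label y_i
    that maximizes its (approximate) posterior marginal.

    marginals: hypothetical array of shape (num_nodes, num_labels),
    where marginals[i, k] approximates p(y_i = k | x, theta, I).
    """
    return np.argmax(marginals, axis=1)

# Hypothetical example: three unknowns over a two-label space.
marginals = np.array([[0.7, 0.3],
                      [0.4, 0.6],
                      [0.55, 0.45]])
print(mpm_predict(marginals))  # -> [0 1 0]
```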

In practice, the MAP estimate tends to be conservative, trying to give the most correct labels, while MPM tends to give higher detection rates. These methods are used variously in our experiments.

2.3.2 Belief Propagation

Because they all involve combinatorial sums or search spaces, we require approximations to compute the log partition function for the likelihood objective (2.15), the marginal probabilities for its gradient (2.18), or to make predictions with MAP or MPM. Fortunately, there is one family of algorithms that handles all of these: the loopy sum-product, or belief propagation, algorithm. In short, it exploits the way in which the joint probability factorizes into a product of local functions.

Recall the bipartite “factor graph” between unknowns $y$, represented by nodes in the graph, and the compatibility functions (or factors, when they are exponentiated) over the unknowns, as shown in the example of Figure 2.1 on page 22. The algorithm operates by iteratively passing messages between the nodes representing the unknowns and the factors. When these messages converge to a stable fixed point, the results are equivalent to minimizing the so-called Bethe free energy, a variational method for approximate inference from statistical physics [136]. When the factor graph is a tree, the sum-product algorithm [60] efficiently performs exact inference. In most cases, our graphs are not trees, and thus the results are only approximate.

Let $N(i) \equiv \{C \in \mathcal{C} \mid i \in C\}$ be the family of indices for the factors that neighbor the $i$th unknown. This corresponds to the set of factors having $y_i$ as an argument. The set of edges in a factor graph is thus

$$E(\mathcal{C}) \equiv \{(i, C) \mid i \in V \wedge C \in N(i)\}. \tag{2.32}$$
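To make this bookkeeping concrete, here is a minimal sketch (with hypothetical names) of one way to represent factor scopes so that the neighbor sets $N(i)$ and the edge set of (2.32) fall out directly:

```python
# Hypothetical factor graph: each factor is stored as the tuple of node
# indices in its scope, e.g., two pairwise factors and one unary factor.
factors = [(0, 1), (1, 2), (2,)]

def neighbors(i, factors):
    """N(i): indices of the factors having y_i as an argument."""
    return [c for c, scope in enumerate(factors) if i in scope]

# E(C): all (node, factor) edges of the bipartite factor graph (2.32).
edges = [(i, c) for c, scope in enumerate(factors) for i in scope]
print(neighbors(1, factors))  # -> [0, 1]
print(edges)                  # -> [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
```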

For all edges in the factor graph $(i, C) \in E(\mathcal{C})$, the node-to-factor messages have the general form

$$m_{i \to C}(y_i) \propto \prod_{C' \in N(i) \setminus C} m_{C' \to i}(y_i) \tag{2.33}$$

so that the message from a node to a factor is the product of all the messages to that node from all other neighboring factors. The resulting functional message is normalized (i.e., sums to 1 over $y_i$) for numerical stability. Note that for parsimony,
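As a sketch of the update (2.33) with the normalization just described, the following assumes tabular messages stored as arrays keyed by hypothetical (factor, node) pairs; it is an illustration under those assumptions, not the thesis's code:

```python
import numpy as np

def node_to_factor_message(i, C, neighbors, factor_to_node, num_labels):
    """Node-to-factor update (2.33): multiply the incoming messages from
    every factor neighboring node i except C, then normalize over y_i.

    neighbors[i] lists N(i); factor_to_node[(Cp, i)] is the message
    m_{C'->i}, a length-num_labels array. All names are hypothetical.
    """
    msg = np.ones(num_labels)
    for Cp in neighbors[i]:
        if Cp != C:
            msg *= factor_to_node[(Cp, i)]
    return msg / msg.sum()  # normalize so the message sums to 1 over y_i
```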
