Segmentation of heterogeneous document images : an ... - Tel
Chapter 7
Conclusion and future work

tel-00912566, version 1 - 2 Dec 2013

In this thesis, we have provided a new framework for document page segmentation that brings some improvements over current state-of-the-art methods.

Most methods apply page segmentation to all components of the page. In contrast, we have presented a method that identifies non-textual components and removes them from the document image before page segmentation is applied. Moreover, the identified non-textual components, such as rulers, table structures and advertisement frames, are then reused to separate text regions.

For text region detection, we have introduced a framework based on two-dimensional conditional random fields (CRFs). It gathers various observations from whitespace analysis, graphical components and run-lengths, and separates text regions according to these observations. As a result, we were able to improve the separation of text columns that are situated very close to each other. This framework also prevents the contents of a table cell from being merged with the contents of adjacent cells. Furthermore, the method does not allow text regions inside a frame to be merged with the text regions around it.

After carefully examining text line detection methods that can cope with the variety and variation of text lines in handwritten and printed documents, we have adopted a variant of Papavassiliou's text line detection method [75] to segment text regions into text lines.

Finally, we have proposed a novel trainable method based on a binary partition tree. It determines paragraph structure from the appearance of the text lines in a text region.

7.1 Future direction

As with any new work, there are many problems and challenges that could be overcome by improving the methods presented in this thesis. We propose several ideas that would clearly benefit this work. We are currently working on some of these ideas to improve our results, and we have set aside others because of time constraints.

For text and graphics separation, it would be beneficial to use three labels ("text", "graphics" and "text containers") instead of just two ("text" and "non-text"). Text containers are tables and frames that contain text. Currently, we identify them by thresholding the number of children and the solidity of non-textual components. Learning this distinction from a training dataset would make it more robust to errors, but it requires modifying the ground truth data that we use.

For text region detection, there is more room for improvement. Currently, we compute several observations using Gabor features with different window sizes and pass these observations to our feature functions. However, the method would be more robust and multi-scale if the window size of the Gabor filter changed locally according to the local height of the text connected components. Doing this directly requires computing a complete set of Gabor filters and convolving each of them with the original image, which produces many filtering results. Then, for each site, we would pick the value from the filtering result that corresponds to the average local height of the text components. Clearly, this approach is infeasible because of its computational burden. It would be desirable to have a formulation and implementation of non-stationary Gabor filters in which the kernel size changes locally according to the local height of text components.

Feature functions in our CRF model depend on several observation maps computed from the image. However, the number of features is not sufficient to separate sites with great confidence. In addition, the dependencies between these observations are not exploited in our framework.
As a result, the contribution of edge potentials in our CRF model is very limited because of the small number of feature functions. One could benefit from more effective features, such as Ferns [74, 73], to define the feature functions of the CRF model.

For inference in conditional random fields, we use ICM [9] because of its simplicity and its short training time, which stays under 4 to 6 hours. However, ICM is known to fail to capture long-range interactions between the labels of sites. We also evaluated loopy belief propagation [64] in this work. It did not converge to a good solution, even after a considerable amount of time. Other inference methods may improve the results of the CRF model.

Finally, the text line detection method uses the global median height of text connected components in the transition and emission probabilities of the HMM. One could reformulate these probabilities to use the local average height of text components instead of this global statistic.
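The text-container rule mentioned for text and graphics separation thresholds the number of children and the solidity of a non-textual component. A minimal sketch of such a rule follows; it is not the thesis implementation. We assume solidity is the foreground pixel count divided by the convex-hull area, so hollow frames and table grids score low. The thresholds `MIN_CHILDREN` and `MAX_SOLIDITY` and all helper names are hypothetical. The number of children is assumed to be computed elsewhere and passed in.

```python
import numpy as np

MIN_CHILDREN = 2    # hypothetical: a container must enclose at least this many components
MAX_SOLIDITY = 0.5  # hypothetical: containers are hollow, so their solidity is low


def convex_hull_area(points):
    """Area of the convex hull of 2-D points (Andrew's monotone chain + shoelace)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def solidity(mask):
    """Foreground pixels / convex-hull area, using the four corners of every pixel."""
    rows, cols = np.nonzero(mask)
    corners = []
    for r, c in zip(rows.tolist(), cols.tolist()):
        corners += [(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)]
    hull = convex_hull_area(corners)
    return float(mask.sum()) / hull if hull > 0 else 0.0


def label_non_text(mask, num_children):
    """Hypothetical hand-tuned rule separating text containers from other graphics."""
    if num_children >= MIN_CHILDREN and solidity(mask) <= MAX_SOLIDITY:
        return "text container"
    return "graphics"
```

For example, a one-pixel-wide 10 x 10 frame has 36 foreground pixels and a hull area of 100, so its solidity is 0.36. A learned classifier, as proposed above, could take the same two features as input and replace the fixed thresholds.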
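The brute-force scale selection for Gabor features, which the text rules out as too expensive, can be sketched to make its cost explicit. It needs one full convolution per scale in the bank, followed by a per-pixel lookup. The sketch rests on several assumptions:

- it uses only the real (even) part of a Gabor kernel;
- the local text height is mapped directly to the filter wavelength;
- the bandwidth factor 0.56 (roughly one octave) and all function names are hypothetical illustrative choices;
- the per-pixel `local_height` map is given.

```python
import numpy as np


def gabor_kernel(wavelength, theta=0.0, gamma=0.5):
    """Real (even) Gabor kernel; sigma tied to wavelength (hypothetical ~1-octave bandwidth)."""
    sigma = 0.56 * wavelength
    half = int(np.ceil(3 * sigma))  # cover about +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)


def convolve_same(image, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the image size (odd kernels)."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]


def scale_adaptive_response(image, local_height, scales):
    """Brute force: filter with every scale, then keep per pixel the scale nearest local_height."""
    scales = np.asarray(scales, dtype=float)
    bank = np.stack([convolve_same(image, gabor_kernel(s)) for s in scales])  # K convolutions
    idx = np.abs(local_height[..., None] - scales).argmin(axis=-1)            # (H, W) scale index
    return np.take_along_axis(bank, idx[None], axis=0)[0]
```

With K scales, the sketch performs K full-image convolutions and keeps K response maps in memory, even though each pixel uses only one of them. A non-stationary Gabor formulation would avoid exactly this overhead.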
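Ferns [74, 73] are cited as a candidate for richer CRF features. The sketch below computes a fern index: a fixed set of random binary intensity comparisons inside a patch, read as the bits of an integer. How such indices would enter the CRF feature functions is our assumption; here each fern outcome becomes a one-hot indicator vector. The patch size, the number of tests and the function names are hypothetical.

```python
import numpy as np


def make_fern(num_tests, patch_size, rng):
    """Draw num_tests random pixel pairs (r1, c1, r2, c2) inside a square patch."""
    return rng.integers(0, patch_size, size=(num_tests, 4))


def fern_index(patch, fern):
    """Evaluate each binary test I(r1, c1) < I(r2, c2) and pack the bits into an int."""
    bits = patch[fern[:, 0], fern[:, 1]] < patch[fern[:, 2], fern[:, 3]]
    weights = 1 << np.arange(len(fern))  # test s contributes 2**s when true
    return int(np.dot(bits.astype(np.int64), weights))


def fern_one_hot(patch, fern):
    """Hypothetical CRF observation: one indicator per possible fern outcome."""
    vec = np.zeros(2 ** len(fern))
    vec[fern_index(patch, fern)] = 1.0
    return vec
```

A fern with S tests has 2^S possible outcomes. Several ferns with small S therefore give many cheap binary features, whereas one large fern would give a single, very sparse feature.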
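The sketch below runs ICM [9] on a 4-connected grid of sites to show why it is fast but short-sighted. It is a minimal illustration under several assumptions. The unary costs stand in for the negated, weighted sums of the CRF feature functions, and the pairwise term is a simple Potts penalty `beta`. The helper `icm` is hypothetical. Each site greedily takes the label with the lowest local cost, given its neighbours' current labels, so the energy never increases.

```python
import numpy as np


def icm(unary, beta, max_iters=10):
    """Iterated conditional modes on a grid.

    unary: (H, W, L) array of label costs per site.
    beta:  Potts penalty added for each 4-neighbour with a different label.
    """
    h, w, num_labels = unary.shape
    labels = unary.argmin(axis=-1)  # start from the best unary label
    for _ in range(max_iters):
        changed = 0
        for i in range(h):
            for j in range(w):
                cost = unary[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        # penalise every label that disagrees with this neighbour
                        cost += beta * (np.arange(num_labels) != labels[ni, nj])
                best = int(cost.argmin())
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed += 1
        if changed == 0:  # local minimum reached
            break
    return labels
```

Every update considers a single site with fixed neighbours. As a result, ICM never reaches a configuration that would require flipping a large connected group of sites at once. This is the long-range limitation noted above.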
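The proposed switch from a global to a local height statistic can be illustrated with a short sketch. We assume each text connected component is summarised by its centre and height, and that "local" means all components whose centres lie within a hypothetical `radius` of the component in question. The function names are ours. How the resulting values would enter the HMM's transition and emission probabilities is left open here, as it is in the text.

```python
import numpy as np


def global_median_height(heights):
    """The single page-level statistic used by the current method."""
    return float(np.median(heights))


def local_average_heights(centers, heights, radius):
    """Per component: mean height of all components whose centres lie within radius."""
    centers = np.asarray(centers, dtype=float)
    heights = np.asarray(heights, dtype=float)
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)  # pairwise distances
    near = d <= radius  # includes the component itself, so every row has at least one entry
    return (near * heights[None, :]).sum(axis=1) / near.sum(axis=1)
```

Consider a page that mixes body text with a large headline. The more numerous body-text components dominate the global median, whereas the local averages keep the two scales apart.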