Segmentation of heterogeneous document images : an ... - Tel
the two children. The root node represents all the text lines of the text
region.
• A set of features is extracted for each node of the binary tree. The cost to
preserve a node as a paragraph is a Gaussian weighted mixture of these
features. The cost to remove a node is equal to the sum of the costs of preserving
both child nodes.
• At each node of the tree, from the root to the leaves, if the cost of preserving
a node is less than the cost of removing it, that node is marked
to be preserved and its descendants are
ignored. A dynamic programming framework is utilized to estimate these
costs.
tel-00912566, version 1 - 2 Dec 2013
• Training is performed using a scheme similar to Collins's voted
perceptron. It tunes the weights to ensure that the cost of preserving a paragraph
(node) of the ground truth is less than that of its children. The
cost of removing a leaf node is ∞, which ensures that no leaf node can be
removed regardless of the cost of preserving the leaf node.
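The top-down selection described in the list above can be sketched as follows. This is only an illustrative sketch, not the implementation used in this work: the `Node` class, the `preserve_cost` field (assumed to be already computed from the node's features) and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; preserve_cost is assumed to be precomputed."""
    lines: List[int]                 # indices of the text lines under this node
    preserve_cost: float             # cost of keeping this node as one paragraph
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def remove_cost(node: Node) -> float:
    """Cost of removing a node: sum of the preserve costs of its two children."""
    if node.left is None or node.right is None:
        return math.inf              # a leaf can never be removed
    return node.left.preserve_cost + node.right.preserve_cost


def select_paragraphs(node: Node) -> List[List[int]]:
    """Walk from the root down; stop at the first node that is cheaper to keep."""
    is_leaf = node.left is None or node.right is None
    if is_leaf or node.preserve_cost < remove_cost(node):
        return [node.lines]          # preserved: descendants are ignored
    return select_paragraphs(node.left) + select_paragraphs(node.right)


# Toy tree with made-up costs, for illustration only.
root = Node(
    [0, 1, 2, 3], 10.0,
    left=Node([0, 1], 2.0, Node([0], 3.0), Node([1], 3.0)),
    right=Node([2, 3], 5.0, Node([2], 1.0), Node([3], 1.0)),
)
paragraphs = select_paragraphs(root)
print(paragraphs)  # [[0, 1], [2], [3]]
```

In this toy tree the root is split because 10 is not less than 2 + 5, the left child is kept because 2 is less than 3 + 3, and the right child is split into its two leaves because 5 is not less than 1 + 1.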
6.1 Minimum spanning tree (MST)
All text lines in one text region are fully connected with links between them.
To compute the MST, first we need to compute a weight for each link. Consider
that all the text lines in figure 6.1 are part of one text region. The weight for
the link between the first text line (p) and the second line (q) is defined as:
$$W_{mst}(p, q) = \bigl(1 + d(p, q)\bigr)\bigl(1 + \sin(\angle_{min}(p, q))\bigr)$$
where $d(p, q)$ is the Euclidean distance between the convex hull of the text
line p (red marks) and that of the text line q (blue marks), and $\angle_{min}(p, q)$ is the
minimum positive angle between the axis of p and the axis of q. The second factor
ensures that if a vertical line is accidentally included in the text region, it will
be the last text line to join the spanning tree.
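As a rough illustration of the link weight and of the resulting spanning tree, consider the sketch below. It rests on several assumptions that are not part of the method description: each text line is a hypothetical dictionary with a list of `points` and an axis `angle` in radians, the hull-to-hull distance is approximated by the smallest distance between the given points, and Prim's algorithm is used as one possible way to build the MST.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hull_distance(a: List[Point], b: List[Point]) -> float:
    """Assumed stand-in for d(p, q): smallest distance between two point sets."""
    return min(math.dist(u, v) for u in a for v in b)


def min_angle(theta_p: float, theta_q: float) -> float:
    """Smallest positive angle between two undirected axes, in [0, pi/2]."""
    diff = abs(theta_p - theta_q) % math.pi
    return min(diff, math.pi - diff)


def link_weight(p: Dict, q: Dict) -> float:
    """W_mst(p, q) = (1 + d(p, q)) * (1 + sin(angle_min(p, q)))."""
    d = hull_distance(p["points"], q["points"])
    return (1.0 + d) * (1.0 + math.sin(min_angle(p["angle"], q["angle"])))


def minimum_spanning_tree(lines: List[Dict]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the fully connected graph of text lines.

    Returns the edges (i, j, weight) in the order in which lines join the tree.
    """
    in_tree = {0}
    edges = []
    while len(in_tree) < len(lines):
        # Cheapest link from a line already in the tree to one outside it.
        w, i, j = min(
            (link_weight(lines[i], lines[j]), i, j)
            for i in in_tree
            for j in range(len(lines))
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((i, j, w))
    return edges


# Three horizontal lines and one vertical line close to them (made-up data).
lines = [
    {"points": [(0, 0), (100, 0)], "angle": 0.0},
    {"points": [(0, 10), (100, 10)], "angle": 0.0},
    {"points": [(0, 20), (100, 20)], "angle": 0.0},
    {"points": [(105, 0), (105, 20)], "angle": math.pi / 2},
]
edges = minimum_spanning_tree(lines)
for i, j, w in edges:
    print(i, j, round(w, 2))
```

In this made-up layout the vertical line is the closest neighbour of the first line by distance alone (5 against 10), yet the angle factor doubles its link weight to 12, so it joins the tree after the three horizontal lines, whose links weigh 11.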
6.2 Binary partition tree (BPT)<br />
The goal of converting a minimum spanning tree to a binary partition tree is
illustrated in figure 6.2. The root node contains all the text lines. Then, at each
node, based on a criterion that we describe below, the algorithm chooses one of
the remaining links inside the MST. The link is removed and, because the MST is a tree,
it breaks into two sets of text lines that are not connected by any
other link. Each set of lines is assigned to a child node until we reach the