14.01.2014 Views

Segmentation of heterogeneous document images : an ... - Tel


the two children. The root node represents all the text lines of the text region.

• A set of features is extracted for each node of the binary tree. The cost of preserving a node as a paragraph is a Gaussian weighted mixture of these features. The cost of removing a node is equal to the sum of the costs of preserving both of its child nodes.

• At each node of the tree, from the root to the leaves, if the cost of preserving the node is less than the cost of removing it, the node is marked to be preserved and its descendants are ignored. A dynamic programming framework is used to estimate these costs.

tel-00912566, version 1 - 2 Dec 2013

• Training is performed using a scheme similar to Collins' voted perceptron. It tunes the weights to ensure that the cost of preserving a ground-truth paragraph (node) is less than that of its children. The cost of removing a leaf node is ∞, which ensures that no leaf node can be removed regardless of the cost of preserving that leaf node.
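The preserve/remove rule in the bullets above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Node` class, the `preserve_cost` field and the function names are hypothetical, the preserve cost is assumed to be a finite number computed beforehand (the Gaussian weighted mixture of features is not reproduced here), and the tree is assumed to be a full binary tree.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `lines` holds the ids of the text lines it covers."""
    lines: List[int]
    preserve_cost: float              # assumed finite and precomputed from the node's features
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def remove_cost(node: Node) -> float:
    """Infinite for a leaf, otherwise the sum of the children's preserve costs."""
    if node.is_leaf():
        return math.inf
    return node.left.preserve_cost + node.right.preserve_cost


def select_paragraphs(root: Node) -> List[List[int]]:
    """Walk from the root toward the leaves and keep, on each path, the first
    node whose preserve cost is lower than its remove cost."""
    paragraphs = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.preserve_cost < remove_cost(node):
            paragraphs.append(node.lines)          # descendants are ignored
        else:
            stack.extend([node.right, node.left])  # visit the left child first
    return paragraphs
```

Because the remove cost of a leaf is infinite, every path from the root ends in a preserved node at the latest when it reaches a leaf, so each text line is assigned to exactly one paragraph.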

6.1 Minimum spanning tree (MST)

All text lines in one text region are fully connected by links. To compute the MST, we first need to compute a weight for each link. Assume that all the text lines in figure 6.1 are part of one text region. The weight of the link between the first text line (p) and the second text line (q) is defined as:

$$W_{\mathrm{mst}}(p, q) = \bigl(1 + d(p, q)\bigr)\,\bigl(1 + \sin(\angle_{\min}(p, q))\bigr)$$

where $d(p, q)$ is the Euclidean distance between the convex hull of text line p (red marks) and that of text line q (blue marks), and $\angle_{\min}(p, q)$ is the minimum positive angle between the axis of p and the axis of q. The second factor ensures that if a vertical line is accidentally included in the text region, it will be the last text line to join the spanning tree.
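The link weight and the resulting spanning tree can be sketched as below. Several parts are assumptions made for illustration only: each text line is given as a list of points plus an axis angle in radians, the hull-to-hull distance $d(p, q)$ is replaced by the smallest distance between the two point sets (a simple stand-in, not the exact convex hull distance), and Kruskal's algorithm is used because the text does not say which MST algorithm was chosen. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]
TextLine = Tuple[List[Point], float]   # (points of the line, axis angle in radians)


def min_axis_angle(theta_p: float, theta_q: float) -> float:
    """Smallest angle between two undirected axes, in [0, pi/2]."""
    diff = abs(theta_p - theta_q) % math.pi
    return min(diff, math.pi - diff)


def point_set_distance(a: List[Point], b: List[Point]) -> float:
    """Stand-in for d(p, q): smallest Euclidean distance between two point sets."""
    return min(math.dist(u, v) for u in a for v in b)


def link_weight(d: float, angle: float) -> float:
    """W_mst = (1 + d) * (1 + sin(angle))."""
    return (1.0 + d) * (1.0 + math.sin(angle))


def minimum_spanning_tree(lines: List[TextLine]) -> List[Tuple[int, int, float]]:
    """Kruskal's algorithm on the fully connected graph of text lines.
    Returns the tree links as (i, j, weight) in the order they were added."""
    edges = sorted(
        (link_weight(point_set_distance(lines[i][0], lines[j][0]),
                     min_axis_angle(lines[i][1], lines[j][1])), i, j)
        for i, j in combinations(range(len(lines)), 2)
    )
    parent = list(range(len(lines)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:               # the link joins two separate components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree
```

With three stacked horizontal lines and one nearby vertical line, the factor $1 + \sin(\angle_{\min})$ doubles the weight of every link to the vertical line, so in this sketch that line is attached only after the horizontal lines are linked to each other.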

6.2 Binary partition tree (BPT)

The goal of converting a minimum spanning tree to a binary partition tree is illustrated in figure 6.2. The root node contains all the text lines. Then, at each node, the algorithm chooses one of the remaining links inside the MST, based on a criterion that we describe below. The link is removed and, because the MST is a tree, removing it splits the tree into two sets of text lines that are not connected by any other link. Each set of lines is assigned to a child node until we reach the
