Segmentation of heterogeneous document images : an ... - Tel
Figure 4.3: Height and width maps. (a) Original image, (b) height map, (c) width map.
tel-00912566, version 1 - 2 Dec 2013
4.3.1 Height and width maps
These two maps represent the local average height and width of connected components
in a document image. We do not use them directly in feature functions;
instead, we use them as part of the computation of a distance-based feature.
Each pixel of a height or width map is the weighted average of the heights or widths
of all text-labeled connected components in a 300 × 300 pixel block centered on it.
Each map can be computed as the pixel-by-pixel quotient of two image convolutions.
Consider the height map. We know the pixel locations of all text-labeled components
of the page, so we first prepare two sparse images. In the first image, A, every pixel
that belongs to a text component holds the height of that component, and all other
pixels are zero. In the second image, B, every pixel that belongs to a text component
has the value one, and all other pixels are zero. We also have a weighting function K
of size 300 × 300 pixels, defined as:
\[ K(x, y) = \cos\left(\frac{\pi x}{300}\right) \cos\left(\frac{\pi y}{300}\right) \]
where x and y are the coordinates of pixels, with (0, 0) being the center of
the window. Then
\[ \mathrm{HeightMap}(i, j) = \frac{(K * A)_{i,j}}{(K * B)_{i,j}} \]
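The kernel and the pixel-wise quotient above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function names (`cosine_kernel`, `weighted_sum`, `local_average_map`) are hypothetical, pixels outside the page are assumed to contribute zero, and a pixel whose window contains no text is assumed to receive the value 0. Window offsets run from -n/2 to n/2; for n = 300 the weights at offsets ±150 vanish, so including them does not change the result. The loops are written for readability and would be far too slow for a full page, where an optimized convolution routine would normally be used.

```python
import math


def cosine_kernel(n=300):
    """Weights K(x, y) = cos(pi*x/n) * cos(pi*y/n) for offsets -n/2..n/2."""
    half = n // 2
    # K is a product of a function of x and a function of y (separable),
    # so a single 1-D profile is enough to build the 2-D table.
    profile = [math.cos(math.pi * x / n) for x in range(-half, half + 1)]
    return [[wy * wx for wx in profile] for wy in profile]


def weighted_sum(image, kernel, i, j):
    """Value of (K * image) at row i, column j.

    K is symmetric, so convolution and correlation coincide here.
    Assumption: pixels outside the page contribute zero.
    """
    half = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    total = 0.0
    for dy in range(-half, half + 1):
        r = i + dy
        if not 0 <= r < rows:
            continue
        for dx in range(-half, half + 1):
            c = j + dx
            if 0 <= c < cols:
                total += kernel[dy + half][dx + half] * image[r][c]
    return total


def local_average_map(a, b, n=300):
    """Pixel-wise quotient (K * A) / (K * B)."""
    kernel = cosine_kernel(n)
    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(a[0])):
            den = weighted_sum(b, kernel, i, j)
            num = weighted_sum(a, kernel, i, j)
            # Assumption: no text inside the window -> map value 0.
            row.append(num / den if den > 1e-12 else 0.0)
        result.append(row)
    return result
```

With a small window, a pixel that lies midway between a text pixel of height 2 and a text pixel of height 6 receives the value 4, because both neighbors carry the same weight.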
The width map can be computed in the same way, except that image A holds the widths
of the connected components instead of their heights. Figure 4.3 displays one
document image with its height and width maps. We also normalize
our height and width maps by dividing every pixel’s value by the height and
width of the document, respectively.
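The two sparse inputs and the final normalization can be sketched as follows. The representation of a component as a list of (row, column) pixel coordinates, the use of bounding-box extents for its height and width, and the names `sparse_images` and `normalize` are assumptions made for this illustration only.

```python
def sparse_images(components, rows, cols):
    """Build the height image, the width image and the indicator image B.

    `components` is a hypothetical representation: one list of
    (row, col) pixel coordinates per text-labeled connected component.
    """
    a_height = [[0] * cols for _ in range(rows)]
    a_width = [[0] * cols for _ in range(rows)]
    b = [[0] * cols for _ in range(rows)]
    for pixels in components:
        rs = [r for r, _ in pixels]
        cs = [c for _, c in pixels]
        # Assumption: height and width are the bounding-box extents.
        height = max(rs) - min(rs) + 1
        width = max(cs) - min(cs) + 1
        for r, c in pixels:
            a_height[r][c] = height  # image A for the height map
            a_width[r][c] = width    # image A for the width map
            b[r][c] = 1              # image B, shared by both maps
    return a_height, a_width, b


def normalize(value_map, extent):
    """Divide every pixel by the page height (height map) or page width (width map)."""
    return [[v / extent for v in row] for row in value_map]
```

The same image B serves both maps; only image A changes between the height and the width computation.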