
Segmentation of heterogeneous document images : an ... - Tel


3. Repeat for $m = 1, 2, \ldots, M$:

• (a) Compute the working response and weights

$$z_i = \frac{y^*_i - p(x_i)}{p(x_i)\,\bigl(1 - p(x_i)\bigr)}, \qquad w_i = p(x_i)\,\bigl(1 - p(x_i)\bigr).$$

• (b) Fit the classifier $f_m(x) \in \{-1, +1\}$ by a weighted least-squares regression of $z_i$ to $x_i$ using weights $w_i$.

• (c) Update $F(x) \leftarrow F(x) + \tfrac{1}{2} f_m(x)$ and $p(x) \leftarrow e^{F(x)} / \bigl(e^{F(x)} + e^{-F(x)}\bigr)$.

4. Output the classifier $\operatorname{sign}[F(x)] = \operatorname{sign}\bigl[\sum_{m=1}^{M} f_m(x)\bigr]$.
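To make steps 3 and 4 concrete, here is a minimal Python sketch of two-class LogitBoost. It assumes labels $y_i \in \{-1, +1\}$ mapped to $y^*_i = (y_i + 1)/2 \in \{0, 1\}$, the usual initialization $F(x) = 0$ and $p(x_i) = 1/2$ (the earlier steps are not part of this section), and a single scalar feature per component. The weak learner is a real-valued weighted least-squares regression stump, an illustrative choice rather than the one used in this thesis, and the helper names (`fit_weighted_stump`, `logitboost`, `predict`) are hypothetical. The probability update uses the identity $e^{F}/(e^{F}+e^{-F}) = (1 + \tanh F)/2$, which avoids overflow.

```python
import math


def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs; 0.0 for an empty side."""
    total_w = sum(w for _, w in pairs)
    return sum(z * w for z, w in pairs) / total_w if total_w > 0 else 0.0


def fit_weighted_stump(xs, zs, ws):
    """Weighted least-squares regression stump on one scalar feature.

    Returns (t, left_value, right_value): predict left_value if x <= t,
    else right_value. Each side's value is its weighted mean of z, which
    minimises the weighted squared error on that side.
    """
    best_err, best = math.inf, None
    for t in sorted(set(xs)):  # every distinct feature value is a candidate split
        left = [(z, w) for x, z, w in zip(xs, zs, ws) if x <= t]
        right = [(z, w) for x, z, w in zip(xs, zs, ws) if x > t]
        lv, rv = weighted_mean(left), weighted_mean(right)
        err = (sum(w * (z - lv) ** 2 for z, w in left)
               + sum(w * (z - rv) ** 2 for z, w in right))
        if err < best_err:
            best_err, best = err, (t, lv, rv)
    return best


def stump_value(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def logitboost(xs, ys, n_rounds=10, z_max=4.0):
    """Two-class LogitBoost; ys must be in {-1, +1}. Returns the fitted stumps."""
    y_star = [(y + 1) / 2 for y in ys]  # y* in {0, 1}
    F = [0.0] * len(xs)                 # assumed initial F(x) = 0
    p = [0.5] * len(xs)                 # assumed initial p(x) = 1/2
    stumps = []
    for _ in range(n_rounds):
        # Step 3, weights and working response (weights floored to avoid division by 0).
        ws = [max(pi * (1 - pi), 1e-12) for pi in p]
        zs = [(yi - pi) / wi for yi, pi, wi in zip(y_star, p, ws)]
        # Numerical safeguard, not part of the steps above: clip z to [-z_max, z_max].
        zs = [max(-z_max, min(z_max, z)) for z in zs]
        # Step 3, weighted least-squares fit of z to x with weights w.
        stump = fit_weighted_stump(xs, zs, ws)
        stumps.append(stump)
        # Step 3, update: F <- F + f_m / 2, then p = e^F / (e^F + e^-F) = (1 + tanh F) / 2.
        F = [Fi + 0.5 * stump_value(stump, x) for Fi, x in zip(F, xs)]
        p = [0.5 * (1 + math.tanh(Fi)) for Fi in F]
    return stumps


def predict(stumps, x):
    """Step 4: sign of F(x) = 0.5 * sum_m f_m(x); a tie maps to +1."""
    F = sum(0.5 * stump_value(s, x) for s in stumps)
    return 1 if F >= 0 else -1


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    ys = [-1, -1, -1, 1, 1, 1]
    model = logitboost(xs, ys, n_rounds=5)
    print(model[0][0], [predict(model, x) for x in xs])
```

On this toy data the first stump splits at 0.3 and the boosted ensemble labels all six training points correctly.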

3.6 Post-processing

tel-00912566, version 1 - 2 Dec 2013<br />

Based on our experiments, whenever a component classified as graphics contains many other components, it falls into one of two scenarios: either the component is a large graphic that contains many broken parts, or it is a frame or table that holds text characters. Because tables and frames occupy a large area while their black pixels cover only a small fraction of it, they can easily be identified by their solidity feature. We choose a small solidity value as the threshold that separates tables and frames from graphics. If the component has a solidity larger than the threshold, it is identified as graphics, and all its children are also assigned the graphics label, regardless of the labels assigned to them by the LogitBoost classifier. Otherwise, the children retain their originally assigned labels.
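The post-processing rule can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Component` class, the `post_process` helper, the threshold value 0.1 and the `min_children` cut-off standing in for "many other components" are all hypothetical, and the solidity feature (commonly the ratio of a component's foreground pixels to the area of its convex hull) is assumed to be precomputed.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    label: str        # "text" or "graphics", as assigned by the classifier
    solidity: float   # assumed precomputed solidity feature
    children: list = field(default_factory=list)  # components contained in this one


def post_process(component, solidity_threshold=0.1, min_children=5):
    """Relabel the children of a graphics component according to its solidity.

    solidity_threshold and min_children are hypothetical values: the text only
    specifies "a small solidity value" and "many other components".
    """
    if component.label != "graphics" or len(component.children) < min_children:
        return component  # rule does not apply
    if component.solidity > solidity_threshold:
        # Dense enough to be a real graphic with broken parts: children become graphics.
        for child in component.children:
            child.label = "graphics"
    # Otherwise it is treated as a table or frame, and the children keep their labels.
    return component


if __name__ == "__main__":
    frame = Component("graphics", 0.04, [Component("text", 0.5) for _ in range(6)])
    photo = Component("graphics", 0.60, [Component("text", 0.5) for _ in range(6)])
    print([c.label for c in post_process(frame).children])  # children stay text
    print([c.label for c in post_process(photo).children])  # children become graphics
```

Mutating the children in place mirrors the description above, where the children's classifier labels are overridden rather than combined with the parent's.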

3.7 Results

The results of our text/graphics separation come in two parts. In the first part, we evaluate the performance of our classifier on two datasets: the dataset of the ICDAR2009 page segmentation competition and the dataset of the ICDAR2011 historical document layout analysis competition. Table 3.3 summarizes these results, including the cross-dataset evaluation results. In cross-dataset evaluation, the system is trained on one dataset and tested on the other, which has an entirely different type of graphic structure. The reported values are component-based: they measure the precision and recall of assigning each component its class label.

Table 3.3: EVALUATION OF LOGITBOOST ON TWO DATASETS FOR TEXT/GRAPHICS SEPARATION BEFORE POST-PROCESSING

| Trained on | Tested on | Text recall | Graphics recall | Text precision | Graphics precision | Text accuracy |
|---|---|---|---|---|---|---|
| ICDAR2009 | ICDAR2009 | 99.88% | 62.66% | 98.91% | 94.02% | 98.82% |
| ICDAR2009 | ICDAR2011 | 81.31% | 57.2% | 96.23% | 18.53% | 79.65% |
| ICDAR2011 | ICDAR2011 | 99.15% | 57.69% | 96.92% | 83.6% | 96.29% |
| ICDAR2011 | ICDAR2009 | 98.40% | 45.928% | 98.4% | 41.98% | 96.65% |
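The component-based measures in Table 3.3 can be computed from per-component labels as sketched below. This assumes each component has exactly one ground-truth and one predicted label from {"text", "graphics"}; `component_metrics` is a hypothetical helper, and reading the table's accuracy column as the overall fraction of correctly labeled components is an assumption, since this section does not define it.

```python
from collections import Counter


def component_metrics(true_labels, predicted_labels, classes=("text", "graphics")):
    """Per-class precision/recall and overall accuracy over components."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("one predicted label is needed per component")
    pairs = Counter(zip(true_labels, predicted_labels))  # (true, predicted) -> count
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted_c = sum(n for (_, p), n in pairs.items() if p == c)
        actual_c = sum(n for (t, _), n in pairs.items() if t == c)
        metrics[f"{c} precision"] = tp / predicted_c if predicted_c else 0.0
        metrics[f"{c} recall"] = tp / actual_c if actual_c else 0.0
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    metrics["accuracy"] = correct / len(true_labels) if true_labels else 0.0
    return metrics


if __name__ == "__main__":
    truth = ["text"] * 8 + ["graphics"] * 2
    pred = ["text"] * 7 + ["graphics"] + ["graphics", "text"]
    print(component_metrics(truth, pred))
```

In this toy example one text component is mislabeled as graphics and one graphics component as text, so graphics precision and recall both drop to 0.5 while text precision and recall stay at 0.875. This illustrates how a minority class can score poorly even when overall accuracy (0.8) looks high.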
