Segmentation of heterogeneous document images : an ... - Tel


tel.archives.ouvertes.fr

tel-00912566, version 1 - 2 Dec 2013

computes the area of D, G and D∩G. The MatchScore(d, g) is then set to:

\[ \mathrm{MatchScore}(d, g) = \frac{\mathrm{area}(D \cap G)}{\max(\mathrm{area}(D), \mathrm{area}(G))} \]

4. The computed scores are stored in a match-score matrix.

5. The performance evaluator computes a two-dimensional match-count table from the match-score matrix. The entry MatchCount(d, g) is set to 1 if MatchScore(d, g) is greater than or equal to the acceptance threshold. Then two projection profiles, the D-profile and the G-profile, are computed from the match-count table. The entry D(d) is the sum of the matches in the dth row of the match-count table. Likewise, G(g) is the sum of the matches in the gth column. In the original paper [77] the general-purpose acceptance threshold is set to 0.85, and [70], which follows the original paper, uses the same value. In [75], however, the threshold is set to 0.95 for text lines.

6. The evaluator computes the following counts:

• one2one is the number of one-to-one matches. A one-to-one match occurs when D(i), MatchCount(i, j) and G(j) are all equal to one. If the segmentation algorithm produces a perfect result, all elements in the D-profile and the G-profile will be one.

• g_one2many is the number of ground-truth lines that match more than one detected line.

• d_one2many is the number of detected lines that match more than one ground-truth line.

• g_many2one is the number of ground-truth lines that, together with other ground-truth lines, match a single detected line.

• d_many2one is the number of detected lines that, together with other detected lines, match a single ground-truth line.

7. Then, the evaluator defines three metrics: Detection Rate, Recognition Accuracy and F-Measure, the last of which can be considered a harmonic mean of detection rate and recognition accuracy [60]. They are defined as follows:

\[ \mathrm{DetectionRate\,(DR)} = w_1 \frac{\mathrm{one2one}}{N} + w_2 \frac{\mathrm{g\_one2many}}{N} + w_3 \frac{\mathrm{g\_many2one}}{N} \]

\[ \mathrm{RecognitionAccuracy\,(RA)} = w_4 \frac{\mathrm{one2one}}{M} + w_5 \frac{\mathrm{d\_one2many}}{M} + w_6 \frac{\mathrm{d\_many2one}}{M} \]

\[ \mathrm{F\text{-}Measure} = \frac{2 \cdot \mathrm{RA} \cdot \mathrm{DR}}{\mathrm{RA} + \mathrm{DR}} \]

where N is the number of lines in the ground-truth and M is the number of recognized lines. w_1, w_2, w_3, w_4, w_5, w_6 are predetermined weights that are set to 1, 0.25, 0.25, 1, 0.25, 0.25 respectively in [75] and to 1, 0.75, 0.75, 1, 0.75, 0.75 in [5]. The latter is more generous towards errors in the results, because one-to-many and many-to-one matches earn more partial credit.

A.3 Scenario driven region correspondence

Scenario driven region correspondence [25] is another performance evaluation method, implemented in the Prima Layout Evaluation Tool. It is used as part of the performance evaluation methods in [4] and [6]. From a segmentation point of view, the essence of this method is very similar to match counting. The method still computes merge, split, miss/partial miss and false detection errors. However, it also takes into account the reading order of the document and assigns different weights to the errors of each entity, depending on the situation in which an error occurs. For example, a merge between two text regions that belong to different text columns is more significant than a merge between two regions that follow each other in a single text column. The weights are set according to the scenario under which an evaluation is carried out. All performance evaluations for paragraph detection are done using the Prima Evaluation Tool under the pure segmentation scenario.
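The match-counting evaluation described in steps 3 to 7 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the evaluator used in the cited works: regions are represented as axis-aligned boxes (x0, y0, x1, y1), the function names `area`, `match_score` and `evaluate` are hypothetical, and the many2one counts follow one possible reading of the definitions in step 6.

```python
from typing import Dict, List, Tuple

# Assumption: a region is an axis-aligned box (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]

def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as empty."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def match_score(d: Box, g: Box) -> float:
    """area(D ∩ G) / max(area(D), area(G))."""
    inter = (max(d[0], g[0]), max(d[1], g[1]), min(d[2], g[2]), min(d[3], g[3]))
    denom = max(area(d), area(g))
    return area(inter) / denom if denom > 0 else 0.0

def evaluate(detected: List[Box], ground_truth: List[Box],
             threshold: float = 0.95,
             weights=(1, 0.25, 0.25, 1, 0.25, 0.25)) -> Dict[str, float]:
    w1, w2, w3, w4, w5, w6 = weights
    m, n = len(detected), len(ground_truth)  # M detected lines, N ground-truth lines
    # Match-count table: rows are detected lines, columns are ground-truth lines.
    count = [[1 if match_score(d, g) >= threshold else 0 for g in ground_truth]
             for d in detected]
    d_profile = [sum(row) for row in count]                             # row sums
    g_profile = [sum(count[i][j] for i in range(m)) for j in range(n)]  # column sums

    one2one = sum(1 for i in range(m) for j in range(n)
                  if count[i][j] == 1 and d_profile[i] == 1 and g_profile[j] == 1)
    g_one2many = sum(1 for s in g_profile if s > 1)
    d_one2many = sum(1 for s in d_profile if s > 1)
    # Assumed reading: a ground-truth line counts as many2one when it matches
    # a detected line that also matches other ground-truth lines (a merge).
    g_many2one = sum(1 for j in range(n)
                     if any(count[i][j] == 1 and d_profile[i] > 1 for i in range(m)))
    # Symmetric reading for detected lines (a split of one ground-truth line).
    d_many2one = sum(1 for i in range(m)
                     if any(count[i][j] == 1 and g_profile[j] > 1 for j in range(n)))

    dr = (w1 * one2one + w2 * g_one2many + w3 * g_many2one) / n if n else 0.0
    ra = (w4 * one2one + w5 * d_one2many + w6 * d_many2one) / m if m else 0.0
    f = 2 * ra * dr / (ra + dr) if (ra + dr) > 0 else 0.0
    return {"one2one": one2one, "g_one2many": g_one2many, "d_one2many": d_one2many,
            "g_many2one": g_many2one, "d_many2one": d_many2one,
            "DR": dr, "RA": ra, "F": f}

# Usage: three ground-truth lines, two detections (one exact, one 10% too short).
gt = [(0, 0, 100, 10), (0, 20, 100, 30), (0, 40, 100, 50)]
det = [(0, 0, 100, 10), (0, 20, 90, 30)]
print(evaluate(det, gt, threshold=0.85))
print(evaluate(det, gt, threshold=0.95))
```

In the usage example, the shortened detection has a match score of 0.9, so it counts as a match at the 0.85 threshold but not at the stricter 0.95 threshold used for text lines.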

