3.9 Performance: Measuring correctness

3.9.1 Training texts

Working on "known" benchmark texts of 10,000-20,000 words, by constantly testing rule performance on manually introduced markers, the Portuguese morphological tagger (analyser and disambiguator together) can be geared to resolve nearly all ambiguity while retaining a 99.9% correctness rate. For unknown texts, results are obviously lower. Yet performance on training texts is not irrelevant, since it shows that the CG approach does not suffer from system-immanent interference problems to the same degree as, say, a probabilistic tagger based on a pure trigram HMM, where (to my knowledge) even retraining and measuring on the same corpus seldom yields more than 97% correctness, even for parts of speech.

Aiming at maximal precision, I have also worked on a larger, untagged text (170,000 words from the Borba-Ramsey corpus) on both the morphological and syntactic levels. Though it wasn't possible single-handedly to produce manually tagged benchmark corpora of that size, or to fully inspect the outcome of an automatic tagging run, it still made sense to automatically extract and quantify surviving ambiguities after tagging runs, since precision (defined as the percentage of surviving readings that are correct) can be approximated by minimising ambiguity, at least as long as intermittent benchmark runs ensure that new rules discard few correct readings, and the ambiguity percentage thus remains high in comparison with the other factor in the precision calculus, error frequency. With a PoS error rate of 1%, for instance, and 10% two-fold ambiguity, precision would compute as 99/110 = 90%, and cutting ambiguity in half (while retaining the same error rate) would entail a nearly equivalent improvement in precision (99/105 ≈ 94.3%). Surviving ambiguity, then, easily measured without manual control on any text corpus, can be used as an approximate guide to how precision is progressing during the grammar writing process. In contrast, recall (defined as the percentage of correct readings that survive disambiguation) has to be calculated manually on smaller sample texts, in the absence of a large tagged and proof-read Portuguese corpus for measuring.
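To make the precision arithmetic above easy to re-run, here is a minimal sketch of the calculus, assuming that every ambiguous word retains exactly two readings; the function and parameter names are illustrative, not part of the parser:

```python
def precision(error_rate: float, ambiguity_rate: float) -> float:
    """Precision as approximated in the text: the share of correct
    surviving readings among all surviving readings, assuming that
    each ambiguous word retains exactly two readings.

    error_rate     -- fraction of words whose correct reading was discarded
    ambiguity_rate -- fraction of words left with two-fold ambiguity
    """
    surviving = 1.0 + ambiguity_rate   # average surviving readings per word
    correct = 1.0 - error_rate         # correct readings surviving, per word
    return correct / surviving

# Reproduces the figures from the text:
print(f"{precision(0.01, 0.10):.1%}")  # 90.0%  (= 99/110)
print(f"{precision(0.01, 0.05):.1%}")  # 94.3%  (≈ 99/105)
```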

When forcing the parser into full disambiguation, where all words (with the exception of the rare cases of true ambiguity) end up with one reading only, recall and precision will obviously assume identical values, and one can regard the recall/precision figure as a direct measure of the parser's performance, which is why I will henceforth use the more general term correctness to mean recall/precision at 100% disambiguation.
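The identity can be spelled out in one step. Assuming N words each end up with exactly one surviving reading, C of which are correct (a sketch of the reasoning, not a formula from the thesis), each word likewise carries exactly one correct reading, so:

```latex
% N words, one surviving reading per word, C of them correct;
% each word also has exactly one correct reading, i.e. N in total.
\mathrm{precision} = \frac{C}{N_{\mathrm{surviving}}} = \frac{C}{N}
\qquad
\mathrm{recall} = \frac{C}{N_{\mathrm{correct}}} = \frac{C}{N}
```

Hence precision and recall coincide, yielding the single figure referred to as correctness.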

3.9.2 Test texts
