
no lexicon. By combining supervised and unsupervised learning, accuracies of up to 96.8% have subsequently been described (Brill, 1996). Results for languages other than English seem to confirm the 97% mark as a kind of upper ceiling for the performance of probabilistic PoS taggers. Thus, the Morphy system described in (Lezius et al., 1996) achieved an accuracy of 95.9% for a tag set of 51 tags, using a lexicon of 21,500 words (about 100,000 word forms). Lezius cites 5 other German taggers or morphology systems with accuracy rates between 92.8% and 96.7% 91 (ibid., p. 370).
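For orientation, the probabilistic taggers behind these figures are typically HMM-style models which, for a word sequence w1 ... wn, choose the tag sequence maximising the product of lexical and contextual probabilities. The trigram formulation below is a schematic illustration only, not the exact model of any of the systems cited:

$$\hat{t}_1 \ldots \hat{t}_n \;=\; \operatorname*{arg\,max}_{t_1 \ldots t_n} \;\prod_{i=1}^{n} P(w_i \mid t_i)\; P(t_i \mid t_{i-2}, t_{i-1})$$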

For probabilistic (syntactic) parsing, performance is considerably lower, and such systems have so far not been able to replace manual annotation as a means of syntactic analysis. For standard PCFGs, which augment standard CFGs with probabilistic applicability constraints, accuracies of about 35% are said to be typical. Better results are achieved by conditioning production probabilities not only on the terminal in question, but also on the rule that generated it, as well as on one or more subsequent words. On the short sentences of the MIT Voyager corpus, an accuracy of 87.5% is reported (Marcus, 1993).
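Schematically (the notation is illustrative and not drawn from the cited work), a standard PCFG assigns a parse tree T the product of its rule probabilities, each conditioned only on the rule's left-hand side, whereas the improved models enrich the conditioning context with the parent rule and following words:

$$P(T) \;=\; \prod_{(A \rightarrow \alpha)\,\in\, T} P(\alpha \mid A)
\qquad\text{vs.}\qquad
P(\alpha \mid A,\ r_{\text{parent}},\ w_{i+1}, \ldots)$$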

Some parsers make use of lexical information: for the SPATTER parser (Magermann, 1995), 84% accuracy is claimed for recovering labelled constituents in WSJ text. In (Collins, 1996), head-dependent relations between pairs of words are modelled in a probabilistic fashion, yielding 85% precision and recall on the same material.
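In rough outline - again, as an illustrative sketch rather than the exact formulation of (Collins, 1996) - such a model scores a candidate set of dependencies D for a sentence S as a product of individual head-modifier probabilities:

$$P(D \mid S) \;\approx\; \prod_{j=1}^{m} P(d_j \mid S), \qquad d_j = \langle w_{\text{head}},\, w_{\text{modifier}},\, \text{relation} \rangle$$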

For longer sentences, systems do not fare as well: (Carroll and Briscoe, 1995) describes experiments with a probabilistic LR parser trained and tested on the Susanne corpus (average sentence length: 20 tokens), which had first been relabelled with CLAWS-II tags using the Acquilex HMM-tagger (Elsworthy, 1994). Here, for bracketings matching the treebank, a recall of 73.56% and a precision of 39.82% are reported for the three highest-ranked analyses of each sentence. 43.8% of the sentences had the correct analysis ranked among the top 10. Parse failures amounted to 25.9% and time-outs to 0.2%. Nearly a third of all test sentences received more than one hundred different analyses, and 5.8% were assigned more than 100,000 parses.

91 For larger tag sets with hundreds of tags (presumably including inflexional information), considerably lower accuracy rates - around 80% - are cited for those members of the group of German taggers that have this option. Of course, as Elworthy (1995) points out, what is important for performance may not be so much the size of the tag set used as the type of information encoded. From the point of view of disambiguation one might argue that larger tag sets leave more ambiguities to resolve, but they also provide more and better context for resolving them (for example, in the shape of inflexional agreement information). The relatively constant performance of different versions of CLAWS (Leech et al., 1994), with tag set size varying by nearly a factor of three, seems to corroborate this assumption.

3.5.2 Generative Grammar: All or nothing - the competence problem
