21.04.2013 Views

Eckhard Bick - VISL


Sources:

English:

EngCG: Karlsson et al. (1995)

FDG: http://www.conexor.fi/fdg.html#1 (14.3.1999), by conexor

Portuguese: Bick (1996)

Swedish: http://huovinen.lingsoft.fi/doc/swecg/intro/stags.html (23.12.1998), by lingsoft

Norwegian: http://www.hf.uio.no/tekstlab/tagger.html (23.12.1998), by Janni Bonde Johannesen

Estonian: http://www.cl.ut.ee/ee/yllitised/first/kailimyyrisep.html (23.12.1998), by Kaili Müürisep, University of Tartu

In their comparison between ENGCG and their newly developed FDG, Voutilainen and Tapanainen (http://www.conexor.fi) report morphosyntactic success rates (the percentage of words whose correct morphosyntactic label is present in the output) of 94.2-96.8% for ENGCG and 96.4-97% for FDG, with an ambiguity rate of 11.3-13.7% for the former and 3.2-3.3% for the latter. The Portuguese parser compares favourably with this, achieving about the same success rate as FDG (96.4-97.5%) even with an ambiguity rate close to zero.

For Estonian, Müürisep (1996) reports a syntactic error rate of 0.32%, but with an ambiguity rate of 32% (1.47 tags per word), making a direct comparison difficult.
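
The measures used in these comparisons can be made concrete with a minimal sketch. It assumes that each word comes with one gold label and the set of labels the parser leaves in its output, that the success rate counts words whose gold label survives in that set (as defined above), and that the ambiguity rate counts words left with more than one label. The cited systems may define ambiguity differently; the function name `evaluate` and the sample tags are purely illustrative and not taken from any of the parsers discussed.

```python
from typing import Dict, List, Set, Tuple

# One token = (gold label, set of labels remaining in the parser output).
Token = Tuple[str, Set[str]]


def evaluate(tokens: List[Token]) -> Dict[str, float]:
    """Compute success, error and ambiguity rates (in %) and tags per word."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    # A word counts as correct if its gold label is among the surviving labels.
    correct = sum(1 for gold, out in tokens if gold in out)
    # A word counts as ambiguous if more than one label survives (assumption).
    ambiguous = sum(1 for _, out in tokens if len(out) > 1)
    total_tags = sum(len(out) for _, out in tokens)
    return {
        "success_rate": 100.0 * correct / n,
        "error_rate": 100.0 * (n - correct) / n,
        "ambiguity_rate": 100.0 * ambiguous / n,
        "tags_per_word": total_tags / n,
    }


# Hypothetical four-word sample: one word stays ambiguous, one is wrong.
sample: List[Token] = [
    ("@SUBJ", {"@SUBJ"}),
    ("@FMV", {"@FMV"}),
    ("@ACC", {"@ACC", "@SUBJ"}),
    ("@ADVL", {"@N<"}),
]
scores = evaluate(sample)
print(scores)
```

Under these assumptions, leaving more labels in the output makes it easier for the correct one to survive, so a low error rate is only meaningful when read together with the ambiguity rate, which is why the Estonian figures are hard to compare directly with the others.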

On the morphological level, performance is evidently better than on the syntactic level for all CG systems. The “classic” ENGCG can be regarded as a baseline, with an error rate of only 0.3% at 3-7% remaining ambiguity. For SWECG 1.0, a Swedish Constraint Grammar (for which no performance data on the syntactic level could be obtained at the time of writing), morphological performance is about the same as for English, with an error rate of 0.3% at an ambiguity rate of 5% (www.sics.se/humle/projects/svensk/projectPlan.html, by Mikael Eriksson, Björn Gambäck and Scott McGlashan, accessed on 23.12.98). The Portuguese parser, by comparison, has an error rate between 0.3% and 1.2%, with 0% ambiguity.

With regard to the dependency performance of PALAVRAS, only a comparison with FDG makes sense. To compile table (4), a 5000-word text chunk was automatically analysed by a tree-generating version of PALAVRAS (the one used internally in the VISL grammar teaching programs), producing vertical tree output as in the example below (3).

