21.04.2013 Views

Eckhard Bick - VISL


8 Conclusion: The advantages of incrementality

8.1 Evaluating the Portuguese parser: Specific conclusions

In chapter 3.9, I have shown that the Portuguese CG parser achieves correctness rates on free text of over 99% for morphology/PoS and 96-97% for syntax. This compares favourably (ch. 3.5) both to PSG-systems, which are not robust and do not usually run on free text, and to probabilistic systems, which hover around the 97% correctness mark for PoS-tagging and only rarely succeed in analysing even medium-sized sentences correctly in their syntactic entirety. I have suggested (following Chanod & Tapanainen, 1994) that the advantage of the lexicon- and rule-based CG approach over a probabilistic approach resides in the possibility of formulating rules for exceptions, individual lexemes or rare patterns without disturbing the functionality of the majority cases, and, as opposed to HMM-taggers in particular, in the frequent use of long-range and unbounded context restrictions (cp. rule type statistics, ch. 3.7.3). On the other hand, I have striven to document that a CG grammar’s advantage over another major family of rule-based systems, PSG-grammars, is not limited to the robustness inherent in an approach that expresses syntactic function by tags and disambiguates rather than generates, but can also be made visible on the PSG-grammars’ home turf, syntactic tree structures. Thus, in the Portuguese system, I have incorporated dependency markers on the clause level (as opposed to using them only on the group level, as in the “traditional” ENGCG system) and introduced subclause function tags for finite and non-finite subclauses. Also, as practical proof of the system’s dependency information content, a compiler and a set of transformation rules were crafted to transform CG-output into PSG-style syntactic trees.
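The contrast between disambiguating and generating, and the role of unbounded context restrictions, can be illustrated with a minimal sketch. It is not the parser's actual rule formalism or rule set: the tag names (`VFIN`, `N`, `DET`, `PRON`, `CLB`), the `Cohort` class and both function names are hypothetical, chosen only for this toy example. The rule discards a finite-verb reading if an unambiguous finite verb is found anywhere to the left, with the scan blocked by a clause boundary, and it never removes the last remaining reading of a word.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A word form with the set of readings (tags) still considered possible."""
    word: str
    readings: set


def has_left_context(sentence, position, wanted, barrier):
    """Scan leftwards with no distance limit for a word whose only reading is
    `wanted`; give up as soon as a word carrying the `barrier` tag is met."""
    for i in range(position - 1, -1, -1):
        readings = sentence[i].readings
        if readings == {wanted}:
            return True
        if barrier in readings:
            return False
    return False


def apply_remove_rule(sentence, target, wanted, barrier):
    """Discard `target` from every ambiguous cohort whose left context holds.
    A cohort with a single reading is left untouched, so every word keeps at
    least one analysis. Returns the number of readings removed."""
    removed = 0
    for position, cohort in enumerate(sentence):
        if target in cohort.readings and len(cohort.readings) > 1:
            if has_left_context(sentence, position, wanted, barrier):
                cohort.readings.discard(target)
                removed += 1
    return removed


if __name__ == "__main__":
    # Toy input: "ele canta o canto", with hand-assigned ambiguity.
    sentence = [
        Cohort("ele", {"PRON"}),
        Cohort("canta", {"VFIN"}),
        Cohort("o", {"DET", "PRON"}),
        Cohort("canto", {"N", "VFIN"}),
    ]
    apply_remove_rule(sentence, target="VFIN", wanted="VFIN", barrier="CLB")
    for cohort in sentence:
        print(cohort.word, sorted(cohort.readings))
```

Because the rule only removes readings and never builds structure, a sentence that no rule fully resolves still comes out with an analysis for every word, merely a more ambiguous one. That is the sense in which this kind of system degrades gracefully on free text.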

Within the growing family of CG-based taggers/parsers, the Portuguese system is the only fully developed parser for a Romance language, so a certain typological interest is justified in the degree to which the Portuguese system differs from or resembles other Constraint Grammars. Areas of interest are (a) the notational system as such, (b) ambiguity and rule set typology, and (c) performance.

At present, Constraint Grammar projects have been launched for a variety of languages, of which at least five (English, Portuguese, Swedish, Norwegian, Estonian) [248] have published morphological and/or syntactic tag sets. A comparison shows (below) that at least the Indo-European grammars share large parts of their

[248] In my Spanish Constraint Grammar, the morphological and syntactic tag sets are almost identical to the ones used for Portuguese.

