21.04.2013 Views

Eckhard Bick - VISL


3.6 The rule formalism

In principle, the paradigm of Constraint Grammar is independent not only of the particular notational conventions commonly associated with it (such as flat dependency syntax), but also of the rule formalism used to implement and compile Constraint Grammar rules. Up to now, however, only very few CG-compilers have been written, and the conventions established by Fred Karlsson’s original LISP-implementation have largely been maintained in later implementations. Today, to my knowledge, only Pasi Tapanainen’s two rule compilers, cg1 and cg2, are available to the research community, one licensed by Lingsoft (www.lingsoft.fi), the other by Connexor (www.conexor.fi).

For testing purposes I programmed (in 1996) a C version of a cg1-compatible compiler myself. It handled the morphological disambiguation module in my parser, but ran at only about 50% of the speed achieved by Tapanainen’s cg1. Still, I gained valuable insight into the way CG-rules work and interact on a technical level. Thus, I was able to measure “reiteracy” on individual rule set levels: though rules are, in theory, supposed to come into play gradually as their contexts grow safer by the work of other rules, in practice almost all test runs “dried up” after only 2 rounds (on the same heuristics level). In the face of 18% four-fold-or-higher morphological ambiguity (ch. 3.2.1), this may mean that CG-rules help each other somewhat more by focusing on different tags and contexts than by disambiguating each other’s context. In other words, CG-rules can be thought of as complementary to a higher degree than they are interdependent.
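The notion of rounds "drying up" can be illustrated with a minimal Python sketch. It is not the cg1 or cg2 algorithm, only a toy fixed-point loop under stated assumptions: a cohort is modelled as a set of tag strings, a rule is a hypothetical function that names the readings to discard at one position, and the last remaining reading of a cohort is never removed. The names `disambiguate`, `no_noun_after_noun` and `no_verb_after_det`, and the tags `DET`, `N` and `V`, are invented for the illustration.

```python
from typing import Callable, List, Set

Cohort = Set[str]                                # readings still alive for one word
Rule = Callable[[List[Cohort], int], Set[str]]   # readings to discard at position i


def disambiguate(sentence: List[Cohort], rules: List[Rule]) -> int:
    """Apply all rules repeatedly until a full round changes nothing.

    Returns the number of rounds in which at least one reading was removed.
    The sentence is modified in place.
    """
    productive_rounds = 0
    while True:
        changed = False
        for rule in rules:
            for i, cohort in enumerate(sentence):
                doomed = rule(sentence, i) & cohort
                # Assumption: a cohort always keeps at least one reading.
                if doomed and cohort - doomed:
                    cohort -= doomed
                    changed = True
        if not changed:
            return productive_rounds
        productive_rounds += 1


def no_noun_after_noun(sentence: List[Cohort], i: int) -> Set[str]:
    # Toy rule: discard N if the previous word is unambiguously N.
    return {"N"} if i > 0 and sentence[i - 1] == {"N"} else set()


def no_verb_after_det(sentence: List[Cohort], i: int) -> Set[str]:
    # Toy rule: discard V if the previous word is unambiguously DET.
    return {"V"} if i > 0 and sentence[i - 1] == {"DET"} else set()


sentence = [{"DET"}, {"N", "V"}, {"N", "V"}]
rounds = disambiguate(sentence, [no_noun_after_noun, no_verb_after_det])
```

In this toy run the first rule cannot fire until the second rule has made its context safe, so a second round is needed. This is the interdependent case; rules that target different tags and contexts would all fire in the first round.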

This chapter is meant as a short but comprehensive introduction to Pasi Tapanainen's cg2 rule-compiler (Tapanainen, 1996), which is the one PALAVRAS is currently using (1999).

The cg2-compiler runs under UNIX, with the following command line:<br />

dis --grammar rule-file < text.tagged > text.dis

(This reads a rule file into the compiler and applies it to a tagged text; the disambiguated version of that text is then written to an output file.)

Or (if mapping rules are included, typically at the syntactic level):<br />

mdis --grammar rule-file < text.tagged > text.map&dis

Input from the morphological analyser must be verticalised text, i.e. one word form per line, followed by all possible readings for this word form, with one reading per line. A word form and its readings are typically arranged as a so-called cohort in the following way, conventionally with base forms in quotes, secondary tags in angle brackets (<>), and morphological tags (the ones destined for disambiguation) in capital letters.

word form

“base form-1” .. .. WORD CLASS-1 INFLEXION<br />
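As an illustration of the cohort layout, the following Python sketch reads verticalised text into a simple data structure. It rests on several assumptions that go beyond the description above: word-form lines are taken to start in the first column, optionally wrapped as `"<...>"`; reading lines are taken to be indented and to start with the quoted base form; secondary tags are taken to be the tokens in angle brackets. The sample cohort, its tags and all names (`Reading`, `Cohort`, `parse_cohorts`) are invented for the example and are not taken from PALAVRAS output.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    base: str              # base form (lemma), given in quotes in the input
    secondary: List[str]   # secondary tags, written in angle brackets
    morph: List[str]       # word class and inflexion tags, in capitals


@dataclass
class Cohort:
    word: str
    readings: List[Reading] = field(default_factory=list)


# Assumption: a reading line is indented and begins with the quoted base form.
READING_RE = re.compile(r'^\s+"([^"]*)"\s*(.*)$')


def is_secondary(tag: str) -> bool:
    return tag.startswith("<") and tag.endswith(">")


def parse_cohorts(text: str) -> List[Cohort]:
    cohorts: List[Cohort] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = READING_RE.match(line)
        if match and cohorts:
            tags = match.group(2).split()
            cohorts[-1].readings.append(Reading(
                base=match.group(1),
                secondary=[t for t in tags if is_secondary(t)],
                morph=[t for t in tags if not is_secondary(t)],
            ))
        else:
            # Any unindented line opens a new cohort; strip the wrapping.
            cohorts.append(Cohort(word=line.strip().strip('"').strip("<>")))
    return cohorts


# Invented sample: one word form with two competing readings.
SAMPLE = '''"<casa>"
    "casa" <hypothetical> N F S
    "casar" V PR 3S IND
'''

cohorts = parse_cohorts(SAMPLE)
```

Keeping secondary and morphological tags in separate lists mirrors the distinction drawn above, where only the morphological tags are destined for disambiguation.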
