Eckhard Bick - VISL

wide-coverage tagging and parsing tasks, as in the case of the GPSG-based Alvey Natural Language Tools - ANLT - (Phillips & Thompson, 1987) or the ongoing TOSCA project (Oostdijk, 1991) using extended affix grammar. Since an existing CFG can be enhanced by probabilistic indexing of its production rules (cp. 3.5.2), hybrid systems may be one way to solve the recalcitrant problem of huge parse forests for long sentences, conceptually inherent in the constituent analysis approach. In (Wauschkuhn, 1996) a chart parser is used to implement 615 PSG rules for German, where every rule is assigned a "safety factor" measuring "usage plausibility". The default for terminal productions is 1. The safety factor, though seemingly assigned by hand, works much the same way as rule probabilities in PCFGs, allowing a ranking to be computed for every tree in the parse forest: here, the safety factor of the left side (non-terminal) of a production is the product of the safety factors of all right-hand-side symbols. Wauschkuhn's parsing system assigns complete analyses to 56.5% of the sentences in a 1.6 million word news text corpus, and partial analyses to 85.7%. Due to the lack of a benchmark corpus, no correctness rate is given⁹⁸. In contrast to many other systems, a sentence is analysed in two steps: more than half the rules treat macrostructure (clause-trees), and the rest then parses each subclause's microstructure individually. Thus, even partial analyses still construct clause-trees, with less than a fifth of partial analyses exhibiting microstructure failures in more than one subclause. This additional robustness is reminiscent of Constraint Grammar, where all rules are in principle perceived as independent of each other, and most of the structure of a sentence will survive a locally wrong function tag or a wrong dependency marking.
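The ranking scheme described above can be sketched as follows. This is a minimal illustration, not Wauschkuhn's actual implementation: the rule names and factor values are invented, and the rule's own factor is combined multiplicatively with the scores of its right-hand-side constituents, exactly as rule probabilities are multiplied along a derivation in a PCFG.

```python
from math import prod

# Hand-assigned safety factor per production rule (values invented for
# this sketch; terminal productions default to 1, as in the original scheme).
RULE_FACTOR = {
    ("S", ("NP", "VP")): 0.9,
    ("VP", ("V", "NP")): 0.8,
    ("NP", ("Det", "N")): 0.95,
}

def tree_score(node):
    """node is (symbol, children) for a phrase, or a plain string terminal."""
    if isinstance(node, str):
        return 1.0                       # terminal production: factor 1
    symbol, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    factor = RULE_FACTOR.get((symbol, rhs), 1.0)
    # score of the left-hand side = the rule's own factor times the
    # scores of all right-hand-side constituents (cf. PCFG probabilities)
    return factor * prod(tree_score(c) for c in children)

tree = ("S", [("NP", ["Det", "N"]),
              ("VP", ["V", ("NP", ["Det", "N"])])])
```

Ranking a parse forest then simply means computing `tree_score` for each candidate tree and sorting by the result.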

3.5.3 Constraint Grammar: the holographic picture (addressing ambiguity directly)

Most words in natural language texts are - seen in isolation - ambiguous with regard to word class, inflexion, syntactic function, semantic content etc. It is, above all, sentence context (besides content coherence and the reader's "knowledge about the world") that determines how a word is to be understood. Constraint Grammar (CG), introduced by Fred Karlsson (1990) and shaped by the Helsinki School (cp. Karlsson et al., 1995), is a grammatical approach that aims at performing such disambiguation by establishing rules for which of a word form's possible readings is to be chosen, and which readings are to be discarded in a given sentence context. In the parser itself these rules are
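The flavour of such context-based rules can be conveyed with a toy sketch - not the Helsinki CG formalism itself, and with an invented rule and tag set - in which the verb reading of an ambiguous word is discarded when the word immediately to the left has a determiner reading:

```python
# Toy Constraint Grammar-style REMOVE rule (invented for illustration):
# discard the verb (V) reading of an ambiguous token when the word
# immediately to the left has a determiner (DET) reading.

def remove_v_after_det(cohorts):
    """cohorts: list of (word, readings) pairs, readings a set of tags."""
    result = []
    for i, (word, readings) in enumerate(cohorts):
        if ("V" in readings and len(readings) > 1   # never delete the last reading
                and i > 0 and "DET" in cohorts[i - 1][1]):
            readings = readings - {"V"}             # REMOVE (V) IF (-1 DET)
        result.append((word, readings))
    return result

sentence = [("the", {"DET"}),
            ("round", {"ADJ", "N", "V"}),
            ("table", {"N", "V"})]
# narrows "round" to {"ADJ", "N"}; "the" and "table" are left untouched
```

A real CG rule set applies hundreds of such constraints iteratively, each reducing ambiguity only where its context condition holds.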

98 Wauschkuhn did experiment with ambiguity (ibid., p. 366), reducing parse forest size by running input text through a PoS tagger first, but blames the available taggers' high error rate (3-5%) for a corresponding drop in parse quality. The interesting question is how the experiment would have worked with input from a Constraint Grammar tagger, since such taggers usually claim much lower error rates than probabilistic systems, cp. (Karlsson et al., 1995) and (Bick, 1996).

