
While undeniably involving more context than HMMs, probabilistic CFGs suffer from the same lexicalisation problem and, to a much higher degree, from the scarcity of hand-tagged training material (while the higher complexity involved would demand more training data, there is actually less material available 90). One of the core problems of PCFGs is deeply rooted in the assumption of "context-free-ness" itself: the probability of a given production is wrongly supposed to be the same everywhere. Yet linguistic context, like the function and dependency of the non-terminal in question, will obviously have a strong influence on this probability. NPs, for instance, are more likely to be definite (i.e. expand into 'det-def N' or pronouns) in subject position than in direct object position. While function and dependency are easily available context conditions in Constraint Grammar, they would have to be expressed in a more implicit way in PCFGs. An NP's subject function, for example, might in English be expressed by stating that the NP in question is the first NP in an 'S -> NP VP' production that happens to describe the NP's mother node, and the conditional probability concerned would then read: p(NP -> det-def N | NP in S -> NP VP).
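To make the conditioning concrete, such parent-conditioned rule probabilities can be estimated from a treebank simply by counting, for each non-terminal, how often each expansion occurs under a given mother production. The following sketch illustrates the idea under invented assumptions (a toy nested-tuple tree encoding and a one-sentence treebank); it is not drawn from any of the systems discussed here.

```python
# Sketch: estimating parent-conditioned PCFG rule probabilities from counts.
# The tree representation (nested tuples of the form (label, children...)) and
# all names below are illustrative assumptions, not part of any cited system.
from collections import defaultdict

def productions(tree, parent_rule=None):
    """Yield (parent_rule, rule) pairs for every local tree."""
    label, *children = tree
    if not children or not isinstance(children[0], tuple):
        return  # pre-terminal or leaf: no further productions
    rule = (label, tuple(child[0] for child in children))
    yield parent_rule, rule
    for child in children:
        yield from productions(child, parent_rule=rule)

def conditional_rule_probs(treebank):
    """p(rule | non-terminal, parent production) as relative frequencies."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        for parent_rule, rule in productions(tree):
            counts[(rule[0], parent_rule)][rule] += 1
    probs = {}
    for key, rules in counts.items():
        total = sum(rules.values())
        probs[key] = {r: c / total for r, c in rules.items()}
    return probs

# Toy treebank: definite NP in subject position, bare-noun NP as object.
treebank = [
    ("S", ("NP", ("det-def", "the"), ("N", "dog")),
          ("VP", ("V", "chased"), ("NP", ("N", "cats")))),
]
probs = conditional_rule_probs(treebank)
subject_context = ("NP", ("S", ("NP", "VP")))   # NP inside S -> NP VP
print(probs[subject_context])                   # {('NP', ('det-def', 'N')): 1.0}
```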

Current Constraint Grammars, on the other hand, have only crude tools at their disposal for exploiting statistical tendencies in collocational patterns, like lexically marking certain readings as rare, or ordering rules in successively applied sets of less and less safe, or more and more heuristic, character. Such rule hierarchies mimic, in a way, the rule probabilities of PCFGs, yet without the latter's mathematical precision.
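As a rough illustration of such a rule hierarchy (not actual Constraint Grammar rule syntax; the rules and the cohort format below are invented for the example), disambiguation rules can be applied in tiers, the safest set first and the heuristic set only to whatever ambiguity remains:

```python
# Sketch of tiered rule application: rule sets of decreasing safety,
# applied in order, as in Constraint Grammar rule hierarchies.

def remove_if(condition, tag):
    """Build a rule that discards `tag` readings when `condition` holds,
    but never removes the last remaining reading."""
    def rule(cohorts, i):
        readings = cohorts[i]["readings"]
        if condition(cohorts, i) and len(readings) > 1:
            readings.discard(tag)
    return rule

safe_rules = [
    # Safe: discard a verb reading right after an unambiguous determiner.
    remove_if(lambda c, i: i > 0 and c[i - 1]["readings"] == {"DET"}, "V"),
]
heuristic_rules = [
    # Heuristic: if still ambiguous, fall back to the (more frequent) noun reading.
    remove_if(lambda c, i: True, "V"),
]

def disambiguate(cohorts, rule_tiers):
    for tier in rule_tiers:              # safest tier first
        for rule in tier:
            for i in range(len(cohorts)):
                rule(cohorts, i)
    return cohorts

sentence = [
    {"word": "the",  "readings": {"DET"}},
    {"word": "hand", "readings": {"N", "V"}},
]
disambiguate(sentence, [safe_rules, heuristic_rules])
print([(c["word"], sorted(c["readings"])) for c in sentence])
# [('the', ['DET']), ('hand', ['N'])]
```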

State-of-the-art probabilistic PoS taggers can now compete with traditional rule-based systems and achieve correctness rates of 96-97%. Probabilistic taggers also provide a good baseline against which to measure any other tagger: even a zero-order HMM, i.e. one where each word is simply assigned its most likely PoS, has a correctness rate of 91-92% for English (Eeg-Olofsson, 1991).
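This baseline amounts to assigning every known word its single most frequent tag from a tagged corpus. A minimal sketch, assuming a toy corpus format and a fallback to the overall most frequent tag for unknown words:

```python
# Sketch of the zero-order baseline: tag every word with its single most
# frequent PoS from a tagged training corpus. The corpus format and the
# unknown-word fallback are assumptions made for the example.
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word.lower()][tag] += 1
            tag_counts[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]   # fallback for unknown words
    return lexicon, default_tag

def tag_baseline(words, lexicon, default_tag):
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

train = [
    [("the", "DET"), ("can", "N")],
    [("they", "PRON"), ("can", "V"), ("run", "V")],
    [("the", "DET"), ("can", "N"), ("leaks", "V")],
]
lexicon, default = train_baseline(train)
print(tag_baseline(["the", "can", "walks"], lexicon, default))
# [('the', 'DET'), ('can', 'N'), ('walks', 'V')]
```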

Early systems computed both lexical probabilities and Markov Model PoS transition probabilities from tagged corpora, as - for English - in (Church, 1988) and in the LOB-tagging system, CLAWS (Garside, 1987), where a success rate of 96-97% is reported for a mixed tag set of PoS, inflexion and - for a few words - base form.
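In such first-order models, tagging a sentence means finding the tag sequence that maximizes the product of lexical probabilities p(word | tag) and transition probabilities p(tag | previous tag), typically with the Viterbi algorithm. The following sketch decodes a toy example; the probability tables are invented and proper smoothing is omitted:

```python
import math

# Sketch of first-order HMM tagging: pick the tag sequence that maximizes the
# product of lexical probabilities p(word|tag) and transition probabilities
# p(tag|previous tag). The toy tables below are invented; a real tagger would
# estimate (and smooth) them from a tagged corpus.
transitions = {                      # p(tag | previous tag); "<s>" marks sentence start
    ("<s>", "DET"): 0.6, ("<s>", "N"): 0.3, ("<s>", "V"): 0.1,
    ("DET", "N"): 0.9, ("DET", "V"): 0.1,
    ("N", "V"): 0.6, ("N", "N"): 0.4,
    ("V", "DET"): 0.5, ("V", "N"): 0.5,
}
emissions = {                        # p(word | tag)
    ("DET", "the"): 0.7,
    ("N", "can"): 0.02, ("V", "can"): 0.03,
    ("N", "rust"): 0.002, ("V", "rust"): 0.01,
}
TINY = 1e-12                         # crude stand-in for proper smoothing

def viterbi(words, tags=("DET", "N", "V")):
    # best[i][t] = (log-probability of best path ending in tag t, backpointer)
    best = [{t: (math.log(transitions.get(("<s>", t), TINY)
                          * emissions.get((t, words[0]), TINY)), None)
             for t in tags}]
    for i, w in enumerate(words[1:], start=1):
        best.append({})
        for t in tags:
            e = emissions.get((t, w), TINY)
            score, prev = max(
                (best[i - 1][p][0] + math.log(transitions.get((p, t), TINY) * e), p)
                for p in tags)
            best[i][t] = (score, prev)
    tag = max(best[-1], key=lambda t: best[-1][t][0])   # best final tag
    path = [tag]
    for i in range(len(words) - 1, 0, -1):              # follow backpointers
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "can", "rust"]))   # -> ['DET', 'N', 'V']
```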

By using techniques like the Baum-Welch algorithm, lexica with different tag sets can be used as a starting point, with only ordinary text to train on. In (Cutting et al., 1992), for example, 96% correctness is claimed for recovering PoS tags from the tagged Brown Corpus (Francis and Kucera, 1992), using only a lexicon and untagged training text from the same corpus. With yet another probabilistic approach, Ratnaparkhi's maximum-entropy tagger (Ratnaparkhi, 1996) claims 97% accuracy on WSJ text when trained on the Penn Treebank (Marcus et al., 1993). In (Brill, 1992), automatically learned trigram transformation rules are used in combination with a simple zero-order stochastic tagger, with error rates around 5% when using a tagged training corpus but

90 For English, the 100,000-word syntactically annotated Susanne corpus does provide such training data, but it must still be considered a corpus of rather modest size when compared to the market of purely PoS-tagged corpora.

