
In a brute force approach, for an average word ambiguity of 2, two to the power of n combinations would have to be computed. Such exponential complexity growth is, of course, quite prohibitive. But since what is wanted is only the most likely reading (and not the probabilities of all possible readings), the program can be set to use only the highest-probability chain encountered so far (i.e. from word 1 up to word i) when moving on to the next word in the string (i.e. making the transition i -> i+1). This so-called Viterbi algorithm yields linear (and therefore manageable) complexity growth, where the number of operations is proportional to 2n for n words that are on average two-way ambiguous. Due to the problem of limited training data, zero-probability transitions have to be replaced by small default values or by lower-n-gram values (i.e. replacing trigrams by bigrams). Other necessary ad-hoc solutions include heuristics for proper nouns and lexicon failures (e.g. the use of suffix/PoS probabilities).
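
A minimal sketch of such Viterbi decoding for a bigram HMM tagger might look as follows (in Python). The tag set, lexicon and transition/emission probabilities are hypothetical toy values, not taken from any real training corpus, and the small default value "floor" merely stands in for the smoothing of unseen probabilities mentioned above, not for a full back-off scheme.

    def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
        """Return the most likely tag sequence for `words`.

        Unseen probabilities are replaced by a small default value (`floor`).
        """
        # best[i][t] = (probability, backpointer) of the best chain
        # ending in tag t at word i
        best = [{t: (start_p.get(t, floor) * emit_p.get(t, {}).get(words[0], floor), None)
                 for t in tags}]
        for i in range(1, len(words)):
            column = {}
            for t in tags:
                # keep only the highest-probability chain reaching tag t at word i
                prob, prev = max(
                    (best[i - 1][s][0]
                     * trans_p.get(s, {}).get(t, floor)
                     * emit_p.get(t, {}).get(words[i], floor), s)
                    for s in tags)
                column[t] = (prob, prev)
            best.append(column)
        # trace back from the best final tag
        last = max(tags, key=lambda t: best[-1][t][0])
        path = [last]
        for column in reversed(best[1:]):
            path.append(column[path[-1]][1])
        return list(reversed(path))

    # Toy example: disambiguating "the can rusts" with made-up probabilities.
    tags = ["DET", "N", "V"]
    start_p = {"DET": 0.8, "N": 0.1, "V": 0.1}
    trans_p = {"DET": {"N": 0.9, "V": 0.1}, "N": {"V": 0.6, "N": 0.3}, "V": {"N": 0.5}}
    emit_p = {"DET": {"the": 0.9}, "N": {"can": 0.4, "rusts": 0.1},
              "V": {"can": 0.5, "rusts": 0.3}}
    print(viterbi("the can rusts".split(), tags, start_p, trans_p, emit_p))

Because only one best chain per tag is carried forward at each position, the work per word is bounded by the (squared) size of the tag set rather than by the number of possible tag sequences, which is what gives the linear growth in n.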

Interestingly, while the existence of PoS lexica is a conditio sine qua non for most languages, the lack of a tagged training corpus for the estimation of transition probabilities can be partly compensated for by estimating the parameters of the HMM by means of an iterative re-estimation process on a previously untagged corpus (the so-called "forward-backward" or Baum-Welch algorithm). On the other hand, if a sizeable tagged corpus is available for the language concerned, even the lack of a lexicon is no real hurdle, since a lexicon file can be automatically compiled from the tagged corpus, and will have a fair coverage at least for texts from the same domain. Thus, the 1 million word Brown corpus contains some 70,000 word forms. The importance of good lexicon coverage has been tested by Eeg-Olofsson (1991, IV p. 43) for a system combining lexicon entries with a heuristic based on 610 suffix strings: using a 50,000 word corpus of spoken English, the system had an error rate of 2.4% with full lexicon coverage, but 6% when using a lexicon compiled from one half of the corpus and then tested on the other. Even with a large suffix module, a sizeable lexicon appears to be necessary in order to cover those words that are exceptions to the suffix patterns.
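
As an illustration of how such a lexicon can be compiled from a tagged corpus, a minimal sketch follows (in Python). The corpus format (one word/TAG pair per token, whitespace-separated) and the file name are assumptions made for the example, not the Brown corpus' actual encoding.

    from collections import defaultdict

    def compile_lexicon(tagged_file):
        """Collect, for each word form, the tags it occurs with and their counts."""
        lexicon = defaultdict(lambda: defaultdict(int))
        with open(tagged_file, encoding="utf-8") as f:
            for line in f:
                for token in line.split():
                    word, _, tag = token.rpartition("/")
                    if word:                      # skip tokens without a tag separator
                        lexicon[word.lower()][tag] += 1
        return lexicon

    # lexicon["can"] might then look like {"NN": 45, "MD": 310, "VB": 3},
    # giving both the possible tags and rough lexical probabilities.

Such a corpus-derived lexicon covers only the word forms actually attested in the training material, which is why its usefulness degrades on texts from other domains, as the Eeg-Olofsson figures above suggest.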

The big advantage of probabilistic taggers is that they are fast, and can be trained in a short time, without the need to write a real grammar of rules. Biasing a probabilistic tagger by adding hand-written rules or exceptions may actually have an adverse effect on its performance, since intervening on behalf of a few irregular words, for example, would interfere with the much more important statistical modelling of the regular "majority" cases (Chanod and Tapanainen, 1994). Rumour has it that such phenomena, as well as the development speed and cross-language portability of probabilistic tools, have made some commercial NLP enterprises believe that system performance can actually be improved by firing a linguist (and hiring a mathematician instead). This view, of course, opportunistically ignores the fact that without linguists, there would be no lexica and no tagged corpora to train a probabilistic parser on in the first place.

