
3.5.1 Probabilistics: The 'fire a linguist' approach

Most probabilistic NLP systems address part-of-speech tagging by automatic training and base themselves on Hidden Markov Models (HMM), a mathematical model in which a surface sequence of symbols is stochastically generated by an underlying ("hidden") Markov process with a state- and/or transition-dependent symbol generator. A Markov Model consists of a finite number of states and describes processes (or sequences) as transitions (probability-labelled arcs) between these states:

(1) [Figure: a Markov Model with a start state, three numbered states and a stop-state, connected by probability-labelled transition arcs.]

The MM in (1) has three states and a stop-state (Ø). When in state 1, for example, the MM has a 50% probability of staying there, a 20% probability of moving to state 2, and a 30% probability of moving to state 3. The probability of a given sequence can be computed as the product of the individual transition probabilities. Thus, the sequence 1132 is assigned the probability 0.5 x 0.3 x 0.1 x 0.4 = 0.006. Since transition probabilities depend only on which state the process is in at a given point in time, such an MM is called a first-order Markov Model.
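To make the arithmetic concrete, the following Python sketch encodes the transition table of (1) as a dictionary and recovers the 0.006 of the example. Only the probabilities spelled out in the text (0.5, 0.2 and 0.3 out of state 1) are taken as given; reading the remaining two factors as P(3→2) = 0.1 and P(2→Ø) = 0.4, and all other arc values, are assumptions made purely for illustration, since the original figure is not reproduced here.

```python
# Sketch of the Markov Model in (1); values marked "assumed" or "illustrative"
# are not taken from the original figure.

STOP = "Ø"  # the stop-state

transitions = {
    1: {1: 0.5, 2: 0.2, 3: 0.3},       # given in the text
    2: {3: 0.6, STOP: 0.4},            # P(2 -> Ø) assumed from the 1132 example
    3: {1: 0.3, 2: 0.1, STOP: 0.6},    # P(3 -> 2) assumed from the 1132 example
}

def sequence_probability(states):
    """Probability of a complete state sequence (ending in the stop-state)
    as the product of its individual transition probabilities."""
    path = list(states) + [STOP]
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        prob *= transitions[current].get(nxt, 0.0)
    return prob

print(sequence_probability([1, 1, 3, 2]))   # 0.5 * 0.3 * 0.1 * 0.4 = 0.006
```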

If the model's states represented the words of a language, sequences could be used to model utterances in that language, and transition probabilities could be computed as bigram frequencies in a text corpus. However, the lack of "contextual memory" in a first-order MM makes it impossible to describe long-distance correlations like subject-predicate agreement or valency. In theory, higher-order Markov Models can somewhat soften this problem: in an n-th order MM, the network's history of the last n-1 states is taken into account.
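As a rough illustration of the bigram estimation mentioned above, the sketch below computes transition probabilities as relative bigram frequencies over a toy corpus; the corpus, the sentence-boundary markers and the function name are illustrative assumptions, not part of the original text. An n-th order model would condition on the previous n-1 tokens rather than on a single one, i.e. use n-gram instead of bigram counts.

```python
# Hedged sketch: estimating first-order (bigram) transition probabilities
# by relative frequency from a toy corpus.
from collections import Counter, defaultdict

def bigram_probabilities(sentences):
    """Estimate P(next word | current word) from bigram counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    return {
        word: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for word, following in counts.items()
    }

corpus = ["the dog barks", "the dog sleeps", "a cat sleeps"]
probs = bigram_probabilities(corpus)
print(probs["dog"])   # {'barks': 0.5, 'sleeps': 0.5}
```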

