21.04.2013 Views

Eckhard Bick - VISL


number of rules Rn, but the number of context conditions in the grammar, Cn. While obviously slowing down the parser, adding more absolute contexts does not change the linear complexity characteristic as such; a good algorithm that avoids checking contexts twice for different rules may even make R grow slower than Rn. Unbounded context conditions, however, force the parser to look, if necessary, at all words and their readings in its half of the sentence. Processing time will therefore grow quadratically ((n*a)^2) with sentence length for that proportion G% of contexts that is unbounded.

(3b) time ~ (n * a * C) * (n * a * G)

where

C = context number constant, depending on, but less than proportional to, the number of contexts Cn in the grammar

G = globality constant, depending on, but less than proportional to, the proportion of unbounded contexts, G%, in the grammar
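As a rough illustration of equation (3b), the following minimal sketch evaluates the relative cost for growing sentence lengths. The function name and the values chosen for a, C and G are hypothetical and serve only to show the scaling behaviour, not to model any real grammar.

```python
def relative_time_3b(n: int, a: float, C: float, G: float) -> float:
    """Relative processing time according to equation (3b).

    n = sentence length in words, a = average ambiguity (readings per word),
    C = context number constant, G = globality constant.
    """
    return (n * a * C) * (n * a * G)


# Hypothetical constants, chosen only for illustration.
A_EXAMPLE, C_EXAMPLE, G_EXAMPLE = 2.0, 1.5, 0.1

for n in (10, 20, 40):
    print(n, relative_time_3b(n, A_EXAMPLE, C_EXAMPLE, G_EXAMPLE))
```

With all constants fixed, doubling n multiplies the result by four, which is the (n*a)^2 behaviour described above.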

Finally, processing time also grows with the proportion RM of REMOVE rules, since REMOVE rules have to look at all readings, while SELECT rules, when hitting the right reading (on average after trying half of them), discard all others automatically. Therefore[120], the variable a has to be replaced by a*(RM+1)/2 in the first parenthesis of equation (3b). Likewise, a in the second parenthesis is influenced by the proportion SC of safe context conditions (NOT and C) in unbounded contexts.

(3c) time ~ (n * a * (RM+1)/2 * C) * (n * a * (SC+1)/2 * G)

where:<br />

RM = proportion of REMOVE rules<br />

SC = proportion of safe unbounded context conditions<br />
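The substitution leading to equation (3c) can be sketched in a few lines. This is an illustrative reading of the formula only; the helper names are hypothetical, and the weighting a*p + a*(1-p)/2 is the one given for RM in the footnote to the text, applied to SC in the same form because (3c) uses the same factor there.

```python
def weighted_readings(a: float, p: float) -> float:
    """Average number of readings inspected per word.

    A proportion p of cases looks at all a readings, the remaining 1-p
    at a/2 on average. Algebraically this equals a * (p + 1) / 2.
    """
    return a * p + a * (1 - p) / 2


def relative_time_3c(n: int, a: float, C: float, G: float,
                     RM: float, SC: float) -> float:
    """Relative processing time according to equation (3c)."""
    first = n * weighted_readings(a, RM) * C    # rule side, weighted by RM
    second = n * weighted_readings(a, SC) * G   # unbounded contexts, weighted by SC
    return first * second
```

With RM = SC = 1 the expression reduces to equation (3b); with RM = SC = 0 each parenthesis is halved, so the estimate drops to a quarter.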

Quadratic complexity growth is tolerable, and compares favourably with the exponential complexity growth[121] seen when a parser has to look at all analysis paths for a sentence parse (a^n * C).
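To make the contrast concrete, the sketch below tabulates (n*a)^2 against a^n for a few sentence lengths. The ambiguity value a = 2 is an assumption made for illustration, and the constant factors (C, G) are left out since they do not affect the shape of the growth.

```python
def quadratic_cost(n: int, a: int) -> int:
    """Growth term (n*a)^2, as for unbounded contexts."""
    return (n * a) ** 2


def exponential_cost(n: int, a: int) -> int:
    """Growth term a^n, as when every analysis path is enumerated."""
    return a ** n


AMBIGUITY = 2  # hypothetical average number of readings per word

for n in (5, 10, 20, 40):
    print(n, quadratic_cost(n, AMBIGUITY), exponential_cost(n, AMBIGUITY))
```

For very short sentences the exponential term can still be the smaller one, but it overtakes the quadratic term quickly as n grows.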

Having discussed a in the chapter on ambiguity, and G as well as RM earlier in this chapter, I will now try to shed some light on rule complexity (C) and context certainty (SC).

[120] With SE for the SELECT rule proportion, the formula would be a*RM + a*SE/2; with SE = 1-RM we get a*RM + a*(1-RM)/2, which can be transformed into a*(RM+1)/2.

[121] In a probabilistic HMM PoS tagger this problem can be solved by not "remembering" all paths, but only the highest-probability path when progressing from left to right through the sentence. Complexity will then grow in a linear way (~ n * a * N), with N being the constant reflecting the size of the n-gram window. A probabilistic syntactic parser, evaluating whole sentence paths, will, of course, have to deal with the above-mentioned exponentiality problem.

