21.04.2013 Views

Eckhard Bick - VISL


[Figure: distribution of ambiguity over word form tokens. Y-axis: 0 to 100 (%). X-axis: number of readings n (0, 1, 2, 3, 4, 5, 6, 7, 8, 9+), plus the categories heurprop and ort-alt. Series: N(w)-% = % word form tokens with n readings; 100-cum% = % word form tokens more than n-ways ambiguous.]

Highly ambiguous words with more than 5 readings are very rare, cumulating to roughly 1%. Very high ambiguity is usually a symptom of derivational complexity, where every word class or inflexion reading can again be ambiguous with regard to the derivational path assumed (prefix and suffix or two suffixes? noun or adjective root?).
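As a minimal sketch of how the two quantities N(w)-% and 100-cum% relate, the following Python function derives both from a list holding the number of readings assigned to each token. The function name `ambiguity_distribution`, the `cap` parameter (which folds all counts of 9 or more into a single "9+" bin) and the sample data are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter


def ambiguity_distribution(readings_per_token, cap=9):
    """Return rows (n, share, more_than_n) for n = 0..cap.

    share       -- % of tokens with exactly n readings (N(w)-%)
    more_than_n -- % of tokens with more than n readings (100-cum%)
    Counts of `cap` or more readings are pooled in the last bin.
    """
    total = len(readings_per_token)
    if total == 0:
        raise ValueError("need at least one token")

    # Pool everything at or above the cap into one bin ("9+").
    counts = Counter(min(n, cap) for n in readings_per_token)

    rows = []
    cumulative = 0.0
    for n in range(cap + 1):
        share = 100.0 * counts.get(n, 0) / total
        cumulative += share
        rows.append((n, share, 100.0 - cumulative))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data: readings per token for ten tokens.
    sample = [0, 1, 1, 1, 1, 1, 2, 2, 2, 3]
    for n, share, more in ambiguity_distribution(sample):
        print(f"{n:>2}  N(w)-% = {share:5.1f}  100-cum% = {more:5.1f}")
```

Read this way, the statement that words with more than 5 readings cumulate to roughly 1% corresponds to the 100-cum% value at n = 5.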

Of the 0.3-0.4% of words lacking analysis, most are misspellings, quotations or loan words from other languages (mainly English, but also French, German and Latin), and "names" without capitalisation, e.g. pharmaceutical drug names (cf. chapter 2.2.6).

The RNP corpus contains literature, secondary literature and a considerable portion of bibliographical information. Considering that the latter accounts for some text passages in English, French and Spanish as well as foreign language book titles, bibliographical abbreviations etc., a recall failure of 0.4% must be regarded as quite low, and only one fourth of this (0.1% or 134 tokens) consists of unanalysable
