21.04.2013 Views

Eckhard Bick - VISL


words: 40.8 10.5 17.9 2.9 0.5 3.1 8.2 6.9 71.0

Table (2) shows the relative risk that a word form with a word class WC1 reading is WC1-WC2 ambiguous. The percentage given is the ratio between the frequency of this ambiguity class and the frequency of words with at least one WC1 reading: WC1&WC2/WC1. For example, 15.7% of all words with N readings are ambiguous with at least one VFIN reading. The isolated word class frequencies for the undisambiguated text are given in the last row (shaded, e.g. for N, 40.8%).
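
The ratio WC1&WC2/WC1 can be illustrated with a minimal sketch. The token list below is hypothetical toy data, not taken from the corpus, and the function name `relative_ambiguity` is an assumption made for illustration only. Each token is represented by the set of word class readings it carries before disambiguation.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy data: one set of word class readings per word form.
tokens = [
    {"N"},
    {"N", "VFIN"},
    {"N", "ADJ"},
    {"ADJ", "PCP"},
    {"N", "ADJ", "VFIN"},
    {"VFIN"},
]

def relative_ambiguity(tokens):
    """Map (WC1, WC2) to the percentage of WC1 word forms that also have a WC2 reading."""
    class_freq = Counter()  # word forms with at least one WC1 reading
    pair_freq = Counter()   # word forms with both a WC1 and a WC2 reading
    for readings in tokens:
        class_freq.update(readings)
        # ordered pairs, so that (N, VFIN) and (VFIN, N) are counted separately
        pair_freq.update(permutations(sorted(readings), 2))
    return {
        (wc1, wc2): 100.0 * n / class_freq[wc1]
        for (wc1, wc2), n in pair_freq.items()
    }

risks = relative_ambiguity(tokens)
print(risks[("N", "VFIN")])            # share of N word forms that are also VFIN
print(round(risks[("VFIN", "N")], 1))  # share of VFIN word forms that are also N
```

Note that the measure is asymmetric: the same N-VFIN ambiguity class yields different percentages depending on which word class serves as WC1, because the denominator changes.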

My ambiguity index is not a percentage, but the sum of all instances of different ambiguity pairs for a word class WC1 (given in the last column in table 1, i.e. for VFIN, 28.274, the sum of VFIN-N, VFIN-ADJ, VFIN-VFIN, VFIN-INF and so on), divided by the number of all candidate word forms for that class (for VFIN, 30.619). The resulting figure looks like a percentage and is, in fact, the sum of all percentages in one row. Since many word forms host several WC ambiguity pairs, however, this "sum" is somewhat higher than the "real" percentage of ambiguous instances for that word class.
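
Why the index exceeds the "real" percentage can be seen in a small sketch. The toy tokens and both function names are hypothetical. As a simplification, readings are modelled as sets, so same-class pairs such as VFIN-VFIN are not represented here. The last lines redo the VFIN arithmetic on the assumption that the dots in 28.274 and 30.619 are thousands separators.

```python
# Hypothetical toy data: one set of word class readings per word form.
tokens = [
    {"N"},
    {"N", "VFIN"},
    {"N", "ADJ"},
    {"ADJ", "PCP"},
    {"N", "ADJ", "VFIN"},
    {"VFIN"},
]

def ambiguity_index(tokens, wc1):
    """Sum of all WC1-WC2 pair instances, divided by the number of WC1 candidates."""
    candidates = [t for t in tokens if wc1 in t]
    # a candidate with k readings hosts k - 1 pairs involving WC1
    pair_instances = sum(len(t) - 1 for t in candidates)
    return 100.0 * pair_instances / len(candidates)

def ambiguous_share(tokens, wc1):
    """The 'real' percentage: WC1 candidates with more than one reading."""
    candidates = [t for t in tokens if wc1 in t]
    ambiguous = sum(1 for t in candidates if len(t) > 1)
    return 100.0 * ambiguous / len(candidates)

print(ambiguity_index(tokens, "N"))   # the three-way ambiguous form counts twice
print(ambiguous_share(tokens, "N"))   # each ambiguous form counts once

# VFIN figures from the text, read as counts (assumption: dot = thousands separator).
vfin_index = 100.0 * 28274 / 30619
print(round(vfin_index, 1))
```

In the toy data, the word form with three readings contributes two pair instances to the index but only one ambiguous instance to the share, which is exactly the inflation described above.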

The overall ambiguity index for open word class ambiguity (45.1) is calculated as the sum of all WC ambiguity instances (equalling half the sum of the last column in table 1, minus the shaded boxes), divided by the number of open word class candidates.
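
The halving step can be sketched as follows, under one reading of the description: every WC1-WC2 instance is recorded twice in a pair table (once in the WC1 row, once in the WC2 row), and the shaded boxes are taken to be the same-class cells on the diagonal. The pair table, the candidate count and the function name are hypothetical.

```python
# Hypothetical pair table: (WC1, WC2) -> number of ambiguity instances.
# Off-diagonal cells are symmetric; the diagonal cell plays the role of a shaded box.
pair_table = {
    ("N", "VFIN"): 2, ("VFIN", "N"): 2,
    ("N", "ADJ"): 2, ("ADJ", "N"): 2,
    ("ADJ", "VFIN"): 1, ("VFIN", "ADJ"): 1,
    ("ADJ", "PCP"): 1, ("PCP", "ADJ"): 1,
    ("VFIN", "VFIN"): 1,
}

def overall_ambiguity_index(pair_table, n_open_class_candidates):
    """Half the off-diagonal total, divided by the number of open class candidates."""
    off_diagonal = sum(n for (wc1, wc2), n in pair_table.items() if wc1 != wc2)
    distinct_instances = off_diagonal / 2  # each pair was counted in two rows
    return 100.0 * distinct_instances / n_open_class_candidates

# 8 is an invented candidate count for the toy table.
print(overall_ambiguity_index(pair_table, 8))
```

Without the halving, each ambiguity instance would enter the overall figure twice, once for each of the two word classes involved.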

Maybe the most striking result is that nouns appear to be frequent but harmless, while adjectives and participles are rarer, but very likely to belong to another nominal⁷⁷ class, too. The reason for the latter is a semantico-etymological one: many participles tend to be treated lexicographically as adjectives, and many adjectives function as nouns, too. Since lexicography is often bilingually motivated, and word classes are often defined functionally, adjectives like dinamarquês ('Danish') are also listed as nouns ('Dane'), though there is no morphological reason for this. Even the lexeme category test fails, since these nouns often, atypically, possess gender inflexion like their adjective counterparts. In the case of ADJ-PCP ambiguity, the parser is set to routinely discard the ADJ reading and only "remember" it for later translational purposes by adding a tag. This is done only after the tagging stage, however, so the full ambiguity is preserved in table (2).

The most dangerous cases, however, are VFIN readings. Because of finite verbs' crucial role in syntactic mapping, the nearly 50% chance of VFIN-nominal ambiguity (N, ADJ, PCP, PROP combined) is disconcerting, which is why I will provide a short assessment of this particular disambiguation task ante tempus. Several cases with morphologically different endings can be distinguished:

(3)

⁷⁷ ‘Nominal’ is here used as an umbrella term for the open word classes defined by number and gender (N, ADJ, PCP and, where relevant, PROP).

