21.04.2013 Views

Eckhard Bick - VISL


quite distinct, and few irregular exceptions exist. So most cases have to be lexical homonyms. A relatively common case is that of words ending in '-r' or '-l', which can cover 2 different lexemes, one masculine, one feminine (5a). Another possibility is lexicalised metaphorical use (5b).

(5a) final

"final" ADJ M/F S 'last'

"final" N F S 'finale'

"final" N M S 'end'

(5b) cara

"cara" N F S 'face'

"cara" N M S 'guy'

"caro" ADJ F S 'expensive'
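The reading lines in (5a) and (5b) can be pictured as a small lookup structure. The following is only a minimal sketch: the `Reading` record, the `LEXICON` dictionary and the helper functions are hypothetical names introduced for illustration and are not part of the tagger described here. It assumes that each reading line consists of base form, word class, gender, number and gloss, as in the examples above.

```python
from collections import namedtuple

# Hypothetical record for one reading line: base form, word class,
# gender, number and English gloss (field names are assumptions).
Reading = namedtuple("Reading", "lemma pos gender number gloss")

# Reading lines copied from examples (5a) and (5b), keyed by word form.
LEXICON = {
    "final": [
        Reading("final", "ADJ", "M/F", "S", "last"),
        Reading("final", "N", "F", "S", "finale"),
        Reading("final", "N", "M", "S", "end"),
    ],
    "cara": [
        Reading("cara", "N", "F", "S", "face"),
        Reading("cara", "N", "M", "S", "guy"),
        Reading("caro", "ADJ", "F", "S", "expensive"),
    ],
}


def pos_classes(word_form):
    """Return the set of word classes found among a word form's readings."""
    return {r.pos for r in LEXICON.get(word_form, [])}


def is_pos_ambiguous(word_form):
    """True if the readings of the word form span more than one word class."""
    return len(pos_classes(word_form)) > 1


def class_internal_homonyms(word_form):
    """Group readings sharing base form and word class; keep groups of 2+."""
    groups = {}
    for r in LEXICON.get(word_form, []):
        groups.setdefault((r.lemma, r.pos), []).append(r)
    return {key: rs for key, rs in groups.items() if len(rs) > 1}
```

For both word forms, `class_internal_homonyms` returns a single noun group whose two members differ only in gender and gloss, which is the kind of lexical homonymy illustrated by the examples.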

A certain amount of ambiguity is even purely syntactic or semantic, like much of the ADV-internal ambiguity, where I have chosen to treat the relative () and interrogative () subclasses of words like como, onde and quando as distinct word classes in order to achieve early disambiguation 78 (i.e., in this case, to make syntactic class information available at the PoS tagging level). Another example is the topological - name ambiguity in Salvador, which can be both a place name (not allowing an article) and a personal name (allowing the definite article).

Only such word class internal semantic ambiguity has a chance to survive the tagger's disambiguation rule set, as the figures for the same VEJA text (6) show after complete analysis (i.e. including disambiguation) 79.

(6) Table: PoS-ambiguity resolved

                    N   ADJ   VFIN   INF   GER   PCP   ADV   PROP   all

pairs

precision 80 (%)

78 This is, of course, an exception, since secondary tags do not usually justify separate reading lines and are not meant to be disambiguated at the morphological stage. However, the above distinction in complementizer adverbials is of great importance for the disambiguation of other (morphological) ambiguities, like the above-mentioned FUT SUBJ vs. INF readings, as well as for syntactic mapping (FS versus ICL).

79 Since the PoS error rate for automatic disambiguation is under 1% (cf. chapter 3.9.2) and fairly balanced between word classes, there is nothing wrong with using the tagger's output after disambiguation as a base line for measuring "disambiguation gain" in comparison with the ambiguity found before disambiguation.

80 Here defined as the ratio of word forms to word form readings, not, as in Karlsson et al. (1995), as correct readings divided by all readings. The reason for my usage of the term is that at nearly 100% disambiguation the alternative definition of 'precision' doesn't make much sense, since it will be close to the recall figure; I therefore combine both as correctness (treated in 3.9). Recall without disambiguation (where it does make sense as an independent figure) is treated in 2.2.6.
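The two notions of precision contrasted in footnote 80 can be made concrete with a short sketch. The function names and the counts used below are hypothetical and serve only as illustration; they are not figures from the VEJA text. The `recall` function assumes the usual reading of the term, i.e. word forms that keep their correct reading divided by all word forms, with one correct reading per word form.

```python
def precision_as_used_here(n_word_forms, n_readings):
    """Word forms divided by surviving readings, in percent (footnote 80)."""
    return 100.0 * n_word_forms / n_readings


def precision_alternative(n_correct_readings, n_readings):
    """Correct readings divided by all surviving readings, in percent."""
    return 100.0 * n_correct_readings / n_readings


def recall(n_correct_readings, n_word_forms):
    """Assumed definition: correct readings retained per word form, in percent."""
    return 100.0 * n_correct_readings / n_word_forms


# Hypothetical counts: 1000 word forms, 995 of which keep their correct reading.
partly_disambiguated = 1020   # 20 surplus readings survive
fully_disambiguated = 1000    # exactly one reading per word form

print(round(precision_as_used_here(1000, partly_disambiguated), 1))  # 98.0
print(round(precision_alternative(995, partly_disambiguated), 1))    # 97.5
print(precision_alternative(995, fully_disambiguated))               # 99.5
print(recall(995, 1000))                                             # 99.5
```

With one reading left per word form, the alternative precision and the recall coincide, which is why the footnote folds both into a single correctness figure and reserves 'precision' for the amount of ambiguity remaining.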
