sense tagging: don't look for the meaning but for the use


can relatively easily be extracted from corpora using grammatical and statistical filters, and manual checking. In the barrage example, this information is quite productive. It does not impose further dividing, but strongly confirms the classes established so far. For instance, frequent verbs with barrage 2.1.1 as object are construire (=build), édifier (=edify), démolir (=demolish), etc., while verbs associated with barrage 2.1.2 form a totally disjoint set: dresser (=put up), franchir (=cross), démanteler (=dismantle), etc.

Figure 2 shows the most frequent collocations associated with the various classes of uses for barrage, roughly grouped by syntactic category. Glosses are provided between square brackets only for the sake of readability. It is important to note that the meanings that they refer to were not used in the splitting process, which was done only on distributional grounds. However, interestingly enough, the classes of uses obtained this way are also coherent from a cognitive point of view.

5. CONCLUSION

In this paper, I have shown that interannotator agreement is very low in a straightforward sense-tagging task, using a traditional dictionary. For some words, agreement was no better than chance. A careful analysis reveals that the main difficulties come from the lack of distributional information in traditional dictionaries.
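
To make "no better than chance" concrete, the sketch below computes a chance-corrected agreement score for two annotators. It assumes Cohen's kappa as the statistic, which is a common choice but is not confirmed by this excerpt. The function name `cohen_kappa` and the two tag sequences are hypothetical and invented for illustration; only the sense labels 2.1.1 and 2.1.2 come from the barrage example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Assumption: the excerpt does not say which agreement statistic was used;
    kappa is shown here only as one standard option.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must tag the same non-empty item list")
    n = len(labels_a)
    # Proportion of items on which the two annotators chose the same sense.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator assigned senses independently,
    # following only their own overall label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy data: four occurrences of "barrage", tagged by two annotators.
annotator_1 = ["2.1.1", "2.1.1", "2.1.2", "2.1.2"]
annotator_2 = ["2.1.1", "2.1.2", "2.1.1", "2.1.2"]
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on half of the items, which is exactly what their label frequencies would produce by chance, so the score is zero even though raw agreement looks like 50%.
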
Building on several centuries of lexicographic tradition, dictionaries mainly attempt to describe and define meaning, and only marginally give information about word uses and distributional data. Only very recently have lexicographers started making systematic use of corpora, and dictionaries still do not systematically contain the surface clues (syntactic, collocational, etc.) that are required to match a given sense with a given corpus occurrence. I tried to show that distributional information can provide the very foundations of dictionary organisation, and that entries can be divided up into coherent usage classes (which one can think of as senses) on the sole basis of that information, with no resort to meaning analysis and the more or less introspective or psychological considerations that such analysis usually requires. I am convinced that large-scale lexicons organised this way, and containing detailed distributional information, are necessary in order for fundamental progress to be made in sense tagging and other sense-related language processing.

BARRAGE
1. [act of blocking] barrage de X par Y (+Nomin.
