13.07.2015 Views

sense tagging: don't look for the meaning but for the use

syntactic property. Worse yet, the Petit Larousse definitions II and III, which at first glance could correspond to this division, are in fact at odds with it, as the examples and sub-senses reveal.

4.3. From meaning to use

It is always easy to point out weaknesses and errors in entries, in any dictionary. However, my criticism is of a different nature. I am not trying to spot occasional flaws, but questioning the very style and organisation of entries. For almost all of the 60 words used in Experiment Two, the definitions (which are after all the only information that annotators have at their disposal in order to match individual senses with corpus contexts) do not contain enough clues to perform the task safely. Worse yet, the division of entries itself rarely takes into account (and is often contradictory with) distributional facts. Annotators all commented on the vagueness of definitions and lack of clear-cut distinctions among senses, which they had never fully realised until they were confronted with the systematic tagging task.
This vagueness is particularly apparent in abstract, very polysemous words, such as degré, économie (=economy, economics, saving, etc.), communication (=communication, report, telephone call, etc.), formation (=education, training, forming, formation, etc.), which constitute a large part of most texts.

The reason for this is probably to be found in a lexicographic tradition that has its roots in the Aristotelian approach to meaning and definition. For several centuries, dictionaries have primarily tried to give an account of meaning, not of usage (apart from occasional indications of register or domain). As a result, they rarely provide the surface distributional clues that would enable sense discrimination. Only recently have some dictionaries (e.g. Cobuild, LDOCE, OALD) started incorporating detailed syntactic, collocational and paradigmatic information, using corpus evidence instead of lexicographers' introspection. This trend is however very new compared to the four-century dictionary building tradition, and distributional information in modern dictionaries is still very far from being systematic and precise enough for computer use. More computer-oriented resources such as WordNet unfortunately also almost totally lack this type of information.

A major departure from traditional lexicography has to be made if we want to accomplish significant progress in sense tagging and other sense-related activities.
We have to radically shift from the description of meaning to that of uses. The dictionaries cited above go one step in that direction, but distributional information is still very much conceived as an add-on on top of traditional foundations. I will take the radical stance that distributional information can provide the very foundations of dictionary organisation, and that entries can be divided up into coherent usage classes (which one can think of as senses) on the sole basis of that information, with no resort to meaning analysis and the more or less introspective or psychological considerations that such analysis usually requires.

Although never implemented fully and systematically in lexicographic work and computer applications, this point of view is not entirely new. It can be traced back at least to Meillet [15]:

“Le sens d'un mot ne se laisse définir que par une moyenne entre [ses] emplois linguistiques.” (The sense of a word is defined only by the average of its linguistic uses.)

Wittgenstein [18] popularised a similar position in the well-known aphorism 4:

“Don't look for the meaning, but for the use”,

and Harris made it part of his linguistic programme, by defining “meaning as a function of distribution” [10:155-158].

4 in the Philosophische Untersuchungen – he had previously defended the opposite view in the Tractatus.

4.4. Distributional information

In this section, I will show that entries can be divided up using various types of distributional information with no resort to meaning analysis. At the same time, this information is of primary importance for human annotators and tagging systems. I will use the word barrage (=dam, blocking, roadblock, barrier, etc.) as an example, since while being polysemous, it is not too complex for the space constraints of this paper.

4.4.1. Syntactic information

Syntax provides an extremely powerful tool for splitting entries. For example, some uses of barrage are an active nominalisation of the verb barrer, others are not. By active, I mean that the nominalisation is a strict synonym of the verb, by which it can be replaced by changing the
