WordNets


Piek Vossen<br />

VU University Amsterdam


What kind of resource is the<br />

Princeton WordNet?<br />

● http://wordnet.princeton.edu/<br />

● Developed by George Miller and his team at<br />

Princeton University, as the implementation of<br />

a mental model of the lexicon<br />

● Most widely used database in language technology<br />

● Enormous impact in language technology<br />

development<br />

● Large<br />

● Free and downloadable<br />

● English


Wordnet Starting point<br />

● Lexical database organized around concepts instead of lexical<br />

forms:<br />

– Separates lexical forms from concepts<br />

– Defines concepts through a relational model of meaning and not an<br />

encyclopedic view<br />

● Concept is defined by the notion of a synset: a set of synonyms<br />

● Synsets distinguish different word meanings:<br />

– {board, plank}{board, committee}{board, get on}<br />

● The ‘synset’ as a weak notion of synonymy:<br />

● “two expressions are synonymous in a/some linguistic context C<br />

if the substitution of one for the other in C does not alter the truth<br />

value.” (Miller et al. 1993)
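
The separation of lexical forms from concepts can be sketched as a small data structure. This is a minimal illustration using only the three `board` synsets from the slide; the names `SYNSETS`, `build_index` and `are_synonyms` are hypothetical and not part of any WordNet API.

```python
from collections import defaultdict

# Toy data from the slide: each synset is a set of word forms sharing one concept.
SYNSETS = [
    frozenset({"board", "plank"}),
    frozenset({"board", "committee"}),
    frozenset({"board", "get on"}),
]


def build_index(synsets):
    """Map each word form to the list of synsets (concepts) it belongs to."""
    index = defaultdict(list)
    for synset in synsets:
        for form in synset:
            index[form].append(synset)
    return index


def are_synonyms(index, a, b):
    """Weak synonymy: two forms count as synonyms if they share at least one synset."""
    return any(b in synset for synset in index.get(a, []))


index = build_index(SYNSETS)
print(len(index["board"]))                        # number of senses of "board"
print(are_synonyms(index, "board", "plank"))      # share a synset
print(are_synonyms(index, "plank", "committee"))  # only linked via "board"
```

A form with several synsets is ambiguous, while two forms inside one synset are interchangeable in at least some context, which is the weak notion of synonymy quoted above.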




Wordnet:<br />

a network of semantically related words<br />

● Hyponymy (hypernym above, hyponym below):<br />

– {conveyance; transport}<br />

– {vehicle}<br />

– {motor vehicle; automotive vehicle}<br />

– {car; auto; automobile; machine; motorcar}<br />

– {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}<br />

● Meronymy (holonym to meronym), synsets in the part-whole branch:<br />

– {car mirror}, {car door}, {bumper}, {car window}<br />

– {armrest}, {doorlock}, {hinge; flexible joint}
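
Following hypernym links upward in such a network can be sketched as below. This is a toy illustration, assuming each synset is labelled by one of its members and that links are stored child to parent; `HYPERNYM`, `hypernym_chain` and `hyponyms` are hypothetical names.

```python
# Hypernym links (child -> parent) from the car example; one label per synset.
HYPERNYM = {
    "cruiser": "car",
    "cab": "car",
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
}


def hypernym_chain(synset):
    """Return the synset followed by all of its hypernyms, most specific first."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def hyponyms(synset):
    """Return the direct hyponyms of a synset, sorted alphabetically."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)


print(hypernym_chain("cab"))
print(hyponyms("car"))
```

Meronymy links could be stored in a second dictionary of the same shape, so that each relation type forms its own layer over the same set of synsets.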


Wordnet 3.0 statistics<br />

POS | Unique Strings | Synsets | Total Word-Sense Pairs<br />

Noun | 117,798 | 82,115 | 146,312<br />

Verb | 11,529 | 13,767 | 25,047<br />

Adjective | 21,479 | 18,156 | 30,002<br />

Adverb | 4,481 | 3,621 | 5,580<br />

Totals | 155,287 | 117,659 | 206,941<br />

Synonymy: two words share a concept:<br />

board, committee<br />

Homonymy: one word belongs to two unrelated concepts:<br />

board, plank<br />

Polysemy: one word has more than one related concept:<br />

university (building & institute)


Semantics of Nouns in WordNet<br />

25 unique beginners for nouns<br />

{act, action, activity} {natural object} {food}<br />

{animal, fauna} {natural phenomenon} {time}<br />

{artifact} {plant, flora} {substance}<br />

{attribute, property} {possession} {group, collection}<br />

{body, corpus} {process} {location, place}<br />

{cognition, knowledge} {quantity, amount} {motive}<br />

{communication} {relation}<br />

{event, happening} {shape}<br />

{feeling, emotion} {state, condition}


Lexicalization patterns<br />

● Levels: top-layer, 25 unique beginners, Basic Level Concepts (Rosch)<br />

● Example branches of the hierarchy:<br />

– entity → object → artifact → building → church → abbey<br />

– entity → organism → animal → bird → canary → common canary<br />

– animal: dog, crocodile<br />

– organism → plant: tree; flower → rose


Lexicalization patterns<br />

Basic Level Concepts<br />

● balance of two principles:<br />

– predict most features<br />

– apply to most subclasses<br />

● where most concepts are created<br />

● amalgamate most parts<br />

● most abstract level at which a picture can be drawn


Lexicalization patterns<br />

● Levels: top-layer, 25 unique beginners, basic level concepts<br />

● variable: inessential, souvenir, garbage, threat, curiosity, waste, etc.


Meronymy<br />

leg, beak, tail


Meronymy & pictures


Anchored Relational Model<br />

● Meronymy: part → body part → tail → dog tail<br />

● Hyponymy around dog:<br />

– above: animal, carnivore<br />

– below: watchdog, sheepdog, herding dog, lapdog, working dog; Newfoundland, dalmatian<br />

● some dogs: plague, pet, hunter, threat, beast of burden<br />

● Groups: group → group of animals → pack → pack of dogs


Anchored Relational Model<br />

● Hyponymy: Superordinate, Basic Level, Subordinate<br />

● Meronymy: Whole, Component<br />

– part, limb, organ, beak, foot, ear, nose, head, etc.<br />

● Multiform: material, comestibles, medicine, cutlery, means of payment, means, assets, property, stimulant, earthenware, etc.<br />

LANGUAGE CUTS UP REALITY IN FUNNY WAYS!<br />

He takes too many medicines./He takes too much medicine.


EuroWordNet Multilingual database<br />

In each wordnet, the word for vehicle is linked to the words for car and train:<br />

● English Words: vehicle; car, train<br />

● Spanish Words: vehículo; auto, tren<br />

● Italian Words: veicolo; auto, treno<br />

● Czech Words: dopravní prostředník; auto, vlak<br />

● Dutch Words: voertuig; auto, trein<br />

● French Words: véhicule; voiture, train<br />

● German Words: Fahrzeug; Auto, Zug<br />

● Estonian Words: liiklusvahend; auto, killavoor


EuroWordNet Multilingual database<br />

● Inter-Lingual Index: Vehicle, Car, Train, …<br />

● The synsets of each wordnet (English, Spanish, Italian, Czech, Dutch, French, German, Estonian) are linked to these index records


EuroWordNet Multilingual database<br />

● Domains: Transport; Road, Air, Water<br />

● Top-ontology: Object, Device, TransportDevice<br />

● Inter-Lingual Index: Vehicle, Car, Train, …<br />

● Language wordnets: English, Spanish, Italian, Czech, Dutch, French, German, Estonian




The Multilingual Design<br />

● Inter-Lingual-Index: unstructured fund of concepts to<br />

provide an efficient mapping across the languages;<br />

● Index-records are mainly based on English WordNet<br />

synsets and consist of synonyms, glosses and source<br />

references;<br />

● Various types of complex equivalence relations are<br />

distinguished;<br />

● Equivalence relations from synsets to index records: not on a<br />

word-to-word basis;<br />

● Indirect matching of synsets linked to the same index items;
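
Indirect matching through the index can be sketched as follows. This is a simplified illustration, assuming only plain equivalence links and using the vehicle, car and train words from the EuroWordNet slides; `EQUIVALENCE`, `build_ili_index` and `translations` are hypothetical names, not the EuroWordNet database format.

```python
from collections import defaultdict

# (language, synset label) -> index record. No language is linked to another directly.
EQUIVALENCE = {
    ("en", "vehicle"): "Vehicle", ("en", "car"): "Car", ("en", "train"): "Train",
    ("nl", "voertuig"): "Vehicle", ("nl", "auto"): "Car", ("nl", "trein"): "Train",
    ("es", "vehículo"): "Vehicle", ("es", "auto"): "Car", ("es", "tren"): "Train",
}


def build_ili_index(equivalence):
    """Invert the mapping: index record -> set of (language, synset label)."""
    by_record = defaultdict(set)
    for synset, record in equivalence.items():
        by_record[record].add(synset)
    return by_record


def translations(equivalence, language, label, target_language):
    """Find target-language synsets linked to the same index record."""
    record = equivalence.get((language, label))
    if record is None:
        return []
    by_record = build_ili_index(equivalence)
    return sorted(lab for lang, lab in by_record[record] if lang == target_language)


print(translations(EQUIVALENCE, "nl", "trein", "es"))
print(translations(EQUIVALENCE, "es", "vehículo", "en"))
```

With this design, adding a language requires links to the index only, rather than a separate mapping to every other language.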


Complex mappings across languages<br />

● EN-Net: toe, finger, head<br />

● NL-Net: hoofd, kop<br />

● IT-Net: dito<br />

● ES-Net: dedo<br />

● Index records: { toe : part of foot }, { finger : part of hand }, { dedo , dito : finger or toe }, { head : part of body }, { hoofd : human head }, { kop : animal head }<br />

● Link types: normal equivalence, eq_has_hyponym, eq_has_hyperonym


From EuroWordNet to Global WordNet<br />

● Currently, wordnets exist for more than 70 languages,<br />

including: Arabic, Bantu, Basque, Chinese, Bulgarian,<br />

Estonian, Hebrew, Icelandic, Japanese, Kannada,<br />

Korean, Latvian, Nepali, Persian, Romanian, Sanskrit,<br />

Tamil, Thai, Turkish, Zulu...<br />

● Many languages are genetically and typologically<br />

unrelated<br />

● http://www.globalwordnet.org


BabelNet: Combining WordNet & Wikipedia<br />

Language Lemmas Synsets Word senses<br />

English 5,938,324 3,032,406 6,550,579<br />

Catalan 3,523,400 2,227,682 3,812,886<br />

French 3,760,579 2,297,853 4,127,065<br />

German 3,606,838 2,282,501 3,945,699<br />

Italian 3,503,403 2,280,769 3,808,690<br />

Spanish 3,629,457 2,265,189 3,976,233<br />

Total 23,962,001 3,032,406 26,221,152<br />

http://lcl.uniroma1/


CORNETTO<br />

Combinatorial and Relational Network as<br />

Toolkit for Dutch Language Technology<br />



Data Organization<br />

● Lexical Unit (LU): corresponds to a word-meaning pair<br />

– form, morphology, syntax, semantics, pragmatics, usage examples<br />

● Synset: synonyms; models meaning relations<br />

– Internal relations<br />

● Princeton Wordnet, and the Spanish, Czech, French, German, Korean and Arabic wordnets<br />

● Wordnet Domains<br />

● SUMO: collection of terms and axioms


band#1 (band)<br />

● Sumo: +HumanGroup; Domain: music<br />

● Related words: groep (group), gezelschap (group of people), muziekgezelschap (music group), popgroep (pop group), jazzband (jazz band), artiest (artist), muzikant (musician), muziek (music), musiceren (to make music)<br />

● Combinatorics: in een band spelen (to play in a band), een band oprichten (to start a band), de band speelt (the band plays)<br />

band#2 (tire)<br />

● Sumo: +Artifact; Domain: transport<br />

● Related words: voorwerp (object), ring (ring), fietsband (bike tire), autoband (car tire), zwemband (tire for swimming), binnenband (inner tire), buitenband (outer tire)<br />

● Combinatorics: de band oppompen (to pump air into a tire), een band plakken (to fix a hole in a tire), een lekke band (flat tire), de band springt (the tire explodes)<br />

band#3 (audio tape)<br />

● Sumo: +SignalCarrier; Domain: media<br />

● Related words: middel (device), informatiedrager (data carrier), geluidsdrager (audio carrier), cassettebandje (audio cassette), lezen (read), schrijven (write)<br />

● Combinatorics: de band starten (to start a tape), op de band opnemen (to record on tape), de band afspelen (to play from a tape)<br />

band#5 (bond)<br />

● Sumo: +SocialRelation; Domain: psychology<br />

● Related words: toestand (state), relatie (relation), verhouding (relation), familieband (family bond), bloedband (blood bond), moederband (mother bond)<br />

● Combinatorics: een goede/sterke band (a good/strong bond), de banden verbreken (to break all bonds), een band hebben met iemand (to have a bond with s.o.)


Dutch wordnet<br />

Cornetto statistics<br />

Measure | ALL | NOUN | VERB | ADJECT.<br />

Concepts | 70,370 | 52,845 | 9,017 | 7,689<br />

Word meanings | 119,108 | 85,449 | 17,314 | 15,712<br />

Lemmas (form+pos) | 92,686 | 70,315 | 9,051 | 12,288<br />

Synonyms | 103,762 | 75,475 | 14,138 | 12,914<br />

Synonyms per concept | 1.47 | 1.43 | 1.57 | 1.68<br />

Average number of meanings per word | 1.12 | 1.07 | 1.56 | 1.05
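
The two ratio rows can be checked against the counts in the table. The sketch below is an observation about the numbers, not a documented Cornetto definition: it assumes the first ratio is Synonyms divided by Concepts and the second is Synonyms divided by Lemmas, which is what the printed values appear to match. The names `CORNETTO` and `ratios` are hypothetical.

```python
# Counts copied from the Cornetto statistics table.
CORNETTO = {
    "ALL": {"concepts": 70370, "lemmas": 92686, "synonyms": 103762},
    "NOUN": {"concepts": 52845, "lemmas": 70315, "synonyms": 75475},
    "VERB": {"concepts": 9017, "lemmas": 9051, "synonyms": 14138},
    "ADJECT.": {"concepts": 7689, "lemmas": 12288, "synonyms": 12914},
}


def ratios(row):
    """Return (synonyms per concept, synonyms per lemma), rounded to two decimals."""
    per_concept = round(row["synonyms"] / row["concepts"], 2)
    per_lemma = round(row["synonyms"] / row["lemmas"], 2)  # assumed definition
    return per_concept, per_lemma


for pos, row in CORNETTO.items():
    print(pos, ratios(row))
```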


Why do we need wordnets?<br />

● So that machines can understand any natural<br />

language and access all knowledge and<br />

information expressed in natural language;<br />

● So that humans can communicate with machines<br />

using natural language and vice versa;<br />

● To study philosophical and psychological<br />

questions such as what is a word, what is a<br />

concept;


Enabling technologies<br />

● Semantic similarity: what expressions or<br />

sentences are semantically similar?<br />

● Semantic relatedness and textual entailment:<br />

smoke entails fire, fire entails damage<br />

● Word-sense disambiguation


Shortest path<br />

● Hirst–St-Onge (1998)<br />

- d = direction changes<br />

- C, k constants<br />

- consider all relations → relatedness<br />

● Leacock and Chodorow (1998)<br />

- D = average depth<br />

- only consider ISA → similarity
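
The formulas themselves did not survive on the slide. The sketch below assumes the commonly cited forms: Hirst–St-Onge as C minus the path length minus k times d, and Leacock and Chodorow as the negative logarithm of the path length divided by 2D. The toy taxonomy, the function names, and the default values C=8 and k=1 are illustrative assumptions; D is passed in as a parameter rather than computed.

```python
import math

# Toy ISA taxonomy (child -> parent), reusing nouns from the earlier slides.
ISA = {
    "canary": "bird",
    "bird": "animal",
    "dog": "animal",
    "animal": "organism",
    "tree": "plant",
    "plant": "organism",
    "organism": "entity",
}


def ancestors(node):
    """Return the node followed by all of its ISA ancestors."""
    path = [node]
    while path[-1] in ISA:
        path.append(ISA[path[-1]])
    return path


def isa_path_length(a, b):
    """Number of nodes on the shortest ISA path between a and b (1 when a == b)."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:  # first hit is the lowest common ancestor
            return steps_a + up_b.index(node) + 1
    raise ValueError("no common ancestor")


def leacock_chodorow(a, b, depth):
    """Similarity over ISA links only: -log(path length / (2 * D))."""
    return -math.log(isa_path_length(a, b) / (2.0 * depth))


def hirst_st_onge(path_length, direction_changes, C=8, k=1):
    """Relatedness over all relation types: C - path length - k * d."""
    return C - path_length - k * direction_changes


print(leacock_chodorow("canary", "dog", depth=5))
print(leacock_chodorow("canary", "tree", depth=5))
print(hirst_st_onge(path_length=3, direction_changes=1))
```

Shorter paths give higher scores in both measures; the Hirst–St-Onge score additionally penalizes paths that switch direction between upward, downward and horizontal links.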


Images<br />

● Computer Science and Artificial Intelligence Lab<br />

(Massachusetts Institute of Technology, Torralba et al. 2008):<br />

– database with 80 million tiny images from Google<br />

Search<br />

– Groups of 140 images that share visual properties, organized by Wordnet similarity over all non-abstract nouns<br />

– http://groups.csail.mit.edu/vision/TinyImages/


(a) Query image.<br />

(b) First 16 of 80<br />

neighbors found using<br />

k-nearest-neighbors<br />

image similarity<br />

measure.<br />

(c) Ground truth<br />

Wordnet branch<br />

describing the content<br />

of the query image at<br />

multiple semantic<br />

levels.<br />

(d) Sub-tree formed by<br />

accumulating branches<br />

from all 80 neighbors.
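
Step (d) in the caption, accumulating the Wordnet branches of all neighbors, can be sketched as a vote count per node. This is a toy illustration of the idea, not the implementation of Torralba et al.; the taxonomy and the names `HYPERNYM`, `branch` and `accumulate_votes` are hypothetical.

```python
from collections import Counter

# Toy hypernym links (child -> parent) standing in for a Wordnet noun hierarchy.
HYPERNYM = {
    "canary": "bird",
    "bird": "animal",
    "dog": "animal",
    "animal": "organism",
    "tree": "plant",
    "plant": "organism",
    "organism": "entity",
}


def branch(label):
    """Return the label and all of its hypernyms up to the root."""
    path = [label]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def accumulate_votes(neighbor_labels):
    """Each neighbor votes once for every node on its own branch."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(branch(label))
    return votes


# Labels of the nearest neighbors of a query image (made-up example).
votes = accumulate_votes(["dog", "dog", "canary", "tree"])
print(votes.most_common())
```

Nodes high in the hierarchy collect votes from many neighbors, so the query image can be described with more confidence at a general level than at a specific one.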


Sounds<br />

● http://www.freesound.org/browse/<br />

● Sound descriptors:<br />

– fast female footsteps on wood<br />

– violin pizzicato with natural open strings<br />

● Identifiable and non-identifiable source:<br />

– bark, whinny, miaow, talk → classification by source<br />

– click, clink, thud, screech, rattle → classification is more difficult (plop<br />

=> noise => sound, plunk => sound)<br />

● What features?<br />

– Cognitive-perception: repetitiveness, continuousness, duration,<br />

harmony, pitch<br />

– Physical signal: spectral, temporal, frequency, etc.<br />

● Semantically different events may perceptually sound similar: paper bag<br />

vs eating toast


Language models and brain activity<br />

Mitchell et al. 2008, Predicting Human Brain Activity Associated with the<br />

Meanings of Nouns, Science, vol 320


Experimental results
