WordNets
Piek Vossen
VU University Amsterdam
What kind of resource is the Princeton WordNet?
● http://wordnet.princeton.edu/
● Developed by George Miller and his team at Princeton University as the implementation of a mental model of the lexicon
● The most widely used database in language technology
● Enormous impact on language technology development
● Large
● Free and downloadable
● English
Wordnet starting point
● Lexical database organized around concepts instead of lexical forms:
– Separates lexical forms from concepts
– Defines concepts through a relational model of meaning rather than an encyclopedic view
● A concept is defined by a synset: a set of synonyms
● Synsets distinguish different word meanings:
– {board, plank} {board, committee} {board, get on}
● The synset rests on a weak notion of synonymy:
– “two expressions are synonymous in a/some linguistic context C if the substitution of one for the other in C does not alter the truth value.” (Miller et al. 1993)
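The following is a minimal Python sketch of the synset idea, built from the three board synsets above. It indexes every word form to the synsets it belongs to, and it treats two forms as synonyms when they share a synset. The synset ids (`board.n.plank` and so on) and the helper names `senses` and `are_synonyms` are hypothetical. Real WordNet uses its own identifiers and file formats.

```python
from collections import defaultdict

# Toy synsets from the slide; the ids are hypothetical labels, not WordNet ids.
SYNSETS = {
    "board.n.plank": frozenset({"board", "plank"}),
    "board.n.committee": frozenset({"board", "committee"}),
    "board.v.get_on": frozenset({"board", "get on"}),
}

# Index each word form to the synsets (concepts) it belongs to.
FORM_INDEX = defaultdict(set)
for synset_id, members in SYNSETS.items():
    for form in members:
        FORM_INDEX[form].add(synset_id)


def senses(form):
    """Return the sorted ids of all synsets that contain this word form."""
    return sorted(FORM_INDEX.get(form, ()))


def are_synonyms(a, b):
    """Two forms are synonyms if they share at least one synset."""
    return bool(FORM_INDEX.get(a, set()) & FORM_INDEX.get(b, set()))


print(senses("board"))
print(are_synonyms("board", "plank"), are_synonyms("plank", "committee"))
```

Because the form "board" occurs in three synsets, it has three senses. "plank" and "committee" share no synset, so they are not synonyms.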
Wordnet: a network of semantically related words
Hyponymy (hypernym ↔ hyponym), from general to specific:
{conveyance; transport}
{vehicle}
{motor vehicle; automotive vehicle}
{car; auto; automobile; machine; motorcar}
{cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}
Meronymy (holonym ↔ meronym), part-whole links between the car synsets and their parts:
{car mirror} {car door} {bumper} {car window} {armrest} {doorlock} {hinge; flexible joint}
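The network can be sketched as two labelled edge maps, one for hyponymy and one for meronymy. This is a hypothetical toy fragment. Each synset is abbreviated to a single form, and the part links are assumed for illustration rather than copied from WordNet. The function names are also made up for the example.

```python
# Hypothetical toy fragment: each synset is abbreviated to one form.
HYPERNYM = {  # hyponym -> hypernym
    "vehicle": "conveyance",
    "motor vehicle": "vehicle",
    "car": "motor vehicle",
    "cruiser": "car",
    "cab": "car",
}
HAS_PART = {  # holonym -> meronyms (illustrative assumption)
    "car": ["car door", "car mirror", "bumper"],
    "car door": ["armrest", "doorlock", "hinge"],
}


def hypernym_chain(synset):
    """Follow hypernym links upwards until a node without a hypernym."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def hyponyms(synset):
    """Invert the hypernym links to get all direct hyponyms of a synset."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)


def all_parts(whole):
    """Collect meronyms transitively (parts of parts), depth first."""
    parts = []
    for part in HAS_PART.get(whole, []):
        parts.append(part)
        parts.extend(all_parts(part))
    return parts


print(hypernym_chain("cruiser"))
print(hyponyms("car"))
print(all_parts("car"))
```

Treating part-of as transitive, so that doorlock counts as a part of car, is a simplification. Transitivity does not hold for every kind of meronymy.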
Wordnet 3.0 statistics
POS          Unique strings   Synsets   Total word-sense pairs
Noun         117,798          82,115    146,312
Verb         11,529           13,767    25,047
Adjective    21,479           18,156    30,002
Adverb       4,481            3,621     5,580
Totals       155,287          117,659   206,941
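The table's totals and two derived averages can be checked with a few lines of Python. The numbers are copied from the table. Reading word-sense pairs per string as "senses per form" and word-sense pairs per synset as "forms per concept" is an interpretation of the columns.

```python
# Figures copied from the Wordnet 3.0 statistics table:
# POS -> (unique strings, synsets, word-sense pairs)
STATS = {
    "Noun": (117_798, 82_115, 146_312),
    "Verb": (11_529, 13_767, 25_047),
    "Adjective": (21_479, 18_156, 30_002),
    "Adverb": (4_481, 3_621, 5_580),
}

# Column totals; these should reproduce the "Totals" row.
totals = tuple(sum(row[i] for row in STATS.values()) for i in range(3))
strings, synsets, pairs = totals

# A word-sense pair is one (string, synset) combination, so:
#   pairs / strings = average number of senses per form
#   pairs / synsets = average number of forms per concept
print(totals)
print(f"senses per string: {pairs / strings:.2f}")
print(f"forms per synset:  {pairs / synsets:.2f}")
```

This gives roughly 1.33 senses per string and 1.76 forms per synset over all parts of speech.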
Synonymy: two words share a concept: board, committee
Homonymy: one word belongs to two unrelated concepts: board (plank) and board (committee)
Polysemy: one word has more than one related concept: university (building & institution)
Semantics of Nouns in WordNet
25 unique beginners for nouns:
{act, action, activity} {animal, fauna} {artifact} {attribute, property} {body, corpus}
{cognition, knowledge} {communication} {event, happening} {feeling, emotion} {food}
{group, collection} {location, place} {motive} {natural object} {natural phenomenon}
{person, human being} {plant, flora} {possession} {process} {quantity, amount}
{relation} {shape} {state, condition} {substance} {time}
Lexicalization patterns
[Figure: a hierarchy running from the top layer (entity; object, organism; the 25 unique beginners, such as artifact, animal and plant) through the Basic Level Concepts (Rosch) down to more specific terms: building, church, abbey; bird, canary, common canary; dog; crocodile; tree; flower, rose.]
Basic Level Concepts:
● balance two principles:
– they predict most features
– they apply to most subclasses
● are where most concepts are created
● amalgamate most parts
● are the most abstract level at which a picture can be drawn
Lexicalization patterns
[Figure: the same hierarchy, extended with concepts such as inessential, curiosity, souvenir, garbage, waste and threat (...etc...), labelled "variable beginners" next to the top layer of 25 unique beginners and the basic level concepts.]
Meronymy & pictures
[Figure: picture with parts labelled leg, beak and tail]
Anchored Relational Model
[Figure: dog anchored in several relations. Hyponymy: animal, carnivore, pet, hunter, threat, plague and beast of burden above dog; watchdog, sheepdog, herding dog, lapdog, working dog, Newfoundland and dalmatian below it. Meronymy: part, body part, tail, dog tail. Groups: group, group of animals, pack, pack of dogs; also: some dogs.]
Anchored Relational Model
[Figure: two dimensions. Hyponymy: Superordinate, Basic Level, Subordinate. Meronymy: Whole, Component (part, limb, organ, beak, foot, ear, nose, head, etc.). Multiform: material, comestibles, medicine, cutlery, means of payment, means, assets, property, stimulant, earthenware, etc.]
LANGUAGE CUTS UP REALITY IN FUNNY WAYS!
He takes too many medicines. / He takes too much medicine.
EuroWordNet Multilingual database
[Figure: language-specific wordnets, each with a vehicle concept and the more specific car and train: English (vehicle; car, train), Spanish (vehículo; auto, tren), Italian (veicolo; auto, treno), Czech (dopravní prostředník; auto, vlak), Dutch (voertuig; auto, trein), French (véhicule; voiture, train), German (Fahrzeug; Auto, Zug) and Estonian (liiklusvahend; auto, killavoor). The wordnets are connected through the Inter-Lingual Index (records Vehicle, Car, Train, …), which is linked to the Domains (Transport: Road, Air, Water) and a Top-ontology (Object, Device, TransportDevice).]
The Multilingual Design
● Inter-Lingual Index: an unstructured fund of concepts that provides an efficient mapping across the languages
● Index records are mainly based on English WordNet synsets and consist of synonyms, glosses and source references
● Various types of complex equivalence relations are distinguished
● Equivalence relations go from synsets to index records, not on a word-to-word basis
● Indirect matching of synsets linked to the same index items
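Indirect matching through the index can be sketched as a lookup. Each language-specific synset points to an ILI record, and two synsets in different languages are equivalent when they point to the same record. The record ids (`ILI-car` and so on), the one-word synset names and the functions `equivalents` and `top_concept` are hypothetical. The word pairs come from the vehicle example, and attaching TransportDevice to all three records is an assumption.

```python
# Hypothetical ILI links for the vehicle example: (language, synset) -> ILI record.
ILI_LINKS = {
    ("en", "vehicle"): "ILI-vehicle", ("en", "car"): "ILI-car", ("en", "train"): "ILI-train",
    ("nl", "voertuig"): "ILI-vehicle", ("nl", "auto"): "ILI-car", ("nl", "trein"): "ILI-train",
    ("es", "vehículo"): "ILI-vehicle", ("es", "auto"): "ILI-car", ("es", "tren"): "ILI-train",
    ("fr", "véhicule"): "ILI-vehicle", ("fr", "voiture"): "ILI-car", ("fr", "train"): "ILI-train",
    ("de", "Fahrzeug"): "ILI-vehicle", ("de", "Auto"): "ILI-car", ("de", "Zug"): "ILI-train",
}

# Language-independent information hangs off the ILI records, so every linked
# synset inherits it. (Assigning TransportDevice here is an assumption.)
TOP_CONCEPT = {
    "ILI-vehicle": "TransportDevice",
    "ILI-car": "TransportDevice",
    "ILI-train": "TransportDevice",
}


def equivalents(lang, synset, target_lang):
    """Target-language synsets linked to the same ILI record (indirect matching)."""
    record = ILI_LINKS.get((lang, synset))
    if record is None:
        return []
    return sorted(s for (l, s), r in ILI_LINKS.items() if l == target_lang and r == record)


def top_concept(lang, synset):
    """Top-ontology concept inherited through the synset's ILI record, if any."""
    return TOP_CONCEPT.get(ILI_LINKS.get((lang, synset)))


print(equivalents("nl", "auto", "fr"))
print(equivalents("de", "Zug", "en"))
print(top_concept("es", "tren"))
```

One consequence of routing everything through a shared index is that adding a new wordnet needs only one mapping, to the index. It does not need a separate mapping to every other wordnet.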
Complex mappings across languages
[Figure: synsets from four wordnets linked through the index. EN-Net: toe, finger, head; NL-Net: hoofd, kop; IT-Net: dito; ES-Net: dedo. Concepts shown: { toe : part of foot }, { finger : part of hand }, { dedo , dito : finger or toe }, { head : part of body }, { hoofd : human head }, { kop : animal head }. Link types: normal equivalence, eq_has_hyponym, eq_has_hyperonym.]
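The complex equivalence links can be sketched as typed edges to index records. Which synset gets which link type is an assumption read off the glosses. dedo/dito ("finger or toe") is broader than the English finger and toe records, so it gets eq_has_hyponym. hoofd (human head) and kop (animal head) are narrower than head, so they get eq_has_hyperonym. The name `eq_synonym` for normal equivalence, the record ids and the function `cross_links` are labels chosen for this example.

```python
# Hypothetical encoding: (language, synset) -> list of (relation, ILI record).
EQ_LINKS = {
    ("en", "toe"): [("eq_synonym", "toe")],        # normal equivalence
    ("en", "finger"): [("eq_synonym", "finger")],
    ("en", "head"): [("eq_synonym", "head")],
    # dedo/dito mean "finger or toe": broader than either ILI record.
    ("es", "dedo"): [("eq_has_hyponym", "finger"), ("eq_has_hyponym", "toe")],
    ("it", "dito"): [("eq_has_hyponym", "finger"), ("eq_has_hyponym", "toe")],
    # hoofd (human head) and kop (animal head) are narrower than "head".
    ("nl", "hoofd"): [("eq_has_hyperonym", "head")],
    ("nl", "kop"): [("eq_has_hyperonym", "head")],
}


def cross_links(lang, synset, target_lang):
    """Return (target synset, source relation, target relation) for every
    target-language synset that shares an ILI record with the source."""
    results = []
    for src_rel, record in EQ_LINKS.get((lang, synset), []):
        for (tgt_lang, tgt_synset), links in EQ_LINKS.items():
            if tgt_lang != target_lang:
                continue
            for tgt_rel, tgt_record in links:
                if tgt_record == record:
                    results.append((tgt_synset, src_rel, tgt_rel))
    return results


print(cross_links("en", "finger", "es"))
print(cross_links("es", "dedo", "en"))
print(cross_links("nl", "hoofd", "en"))
```

If either side of a match uses something other than eq_synonym, the translation is only approximate. For example, Spanish dedo covers both English finger and toe.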
From EuroWordNet to Global WordNet
● Currently, wordnets exist for more than 70 languages, including: Arabic, Bantu, Basque, Chinese, Bulgarian, Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean, Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, Zulu...
● Many of these languages are genetically and typologically unrelated
● http://www.globalwordnet.org
BabelNet: Combining WordNet & Wikipedia
Language   Lemmas       Synsets     Word senses
English    5,938,324    3,032,406   6,550,579
Catalan    3,523,400    2,227,682   3,812,886
French     3,760,579    2,297,853   4,127,065
German     3,606,838    2,282,501   3,945,699
Italian    3,503,403    2,280,769   3,808,690
Spanish    3,629,457    2,265,189   3,976,233
Total      23,962,001   3,032,406   26,221,152
http://lcl.uniroma1/
CORNETTO
Combinatorial and Relational Network as Toolkit for Dutch Language Technology
Data Organization
[Figure: Cornetto data organization.
– Lexical Unit (LU): corresponds to a word-meaning pair; describes form, morphology, syntax, semantics, pragmatics and usage examples
– Synset: synonyms; models meaning relations (internal relations)
– Connected resources: Princeton Wordnet; the Spanish, Czech, French, German, Korean and Arabic wordnets; Wordnet Domains; SUMO (a collection of terms and axioms)]
[Figure: senses of the Dutch word band in Cornetto, each with a SUMO mapping, a domain, related words and combinatorics]
● band#1 (band): Sumo: +HumanGroup; Domain: music
– Related: groep (group), gezelschap (group of people), muziekgezelschap (music group), popgroep (pop group), jazzband (jazz band), artiest (artist), muzikant (musician), muziek (music), musiceren (to make music)
– Combinatorics: in een band spelen (to play in a band), een band oprichten (to start a band), de band speelt (the band plays)
● band#2 (tire): Sumo: +Artifact; Domain: transport
– Related: voorwerp (object), ring (ring), fietsband (bike tire), autoband (car tire), binnenband (inner tire), buitenband (outer tire), zwemband (swim ring)
– Combinatorics: de band oppompen (to pump air into a tire), een band plakken (to fix a hole in a tire), een lekke band (flat tire), de band springt (the tire bursts)
● band#3 (audio tape): Sumo: +SignalCarrier; Domain: media
– Related: middel (device), informatiedrager (data carrier), geluidsdrager (audio carrier), cassettebandje (audio cassette), lezen (read), schrijven (write)
– Combinatorics: de band starten (to start a tape), op de band opnemen (to record on tape), de band afspelen (to play from a tape)
● band#5 (bond): Sumo: +SocialRelation; Domain: psychology
– Related: toestand (state), relatie (relation), verhouding (relation), familieband (family bond), bloedband (blood bond), moederband (mother bond)
– Combinatorics: een goede/sterke band (a good/strong bond), de banden verbreken (to break all bonds), een band hebben met iemand (to have a bond with s.o.)
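The split between lexical units and synsets can be sketched with two small data classes. In this sketch, the LU carries the word-meaning pair and its combinatorics, and the synset carries the SUMO mapping and the domain. The field names, the single-LU synsets and the function `meanings` are hypothetical simplifications. Only the sense numbers, glosses, SUMO labels, domains and example combinations are taken from the band example.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """Hypothetical minimal LU: one form paired with one meaning."""
    form: str
    sense: int
    gloss: str
    combinatorics: list = field(default_factory=list)


@dataclass
class Synset:
    """Hypothetical minimal synset: LUs plus SUMO and domain labels."""
    lexical_units: list
    sumo: str
    domain: str


SYNSETS = [
    Synset([LexicalUnit("band", 1, "band", ["in een band spelen", "de band speelt"])],
           "+HumanGroup", "music"),
    Synset([LexicalUnit("band", 2, "tire", ["de band oppompen", "een lekke band"])],
           "+Artifact", "transport"),
    Synset([LexicalUnit("band", 3, "audio tape", ["op de band opnemen", "de band afspelen"])],
           "+SignalCarrier", "media"),
    Synset([LexicalUnit("band", 5, "bond", ["een band hebben met iemand"])],
           "+SocialRelation", "psychology"),
]


def meanings(form):
    """List (sense label, gloss, SUMO, domain) for every LU with this form."""
    return [
        (f"{lu.form}#{lu.sense}", lu.gloss, s.sumo, s.domain)
        for s in SYNSETS
        for lu in s.lexical_units
        if lu.form == form
    ]


for row in meanings("band"):
    print(row)
```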
Dutch wordnet
Cornetto statistics
                            ALL       NOUN     VERB     ADJECTIVE
Concepts                    70,370    52,845   9,017    7,689
Word meanings               119,108   85,449   17,314   15,712
Lemmas (form+POS)           92,686    70,315   9,051    12,288
Synonyms                    103,762   75,475   14,138   12,914
Synonyms per concept        1.47      1.43     1.57     1.68
Average meanings per word   1.12      1.07     1.56     1.05
Why do we need wordnets?
● So that machines can understand any natural language and access all knowledge and information expressed in natural language
● So that humans can communicate with machines using natural language, and vice versa
● To study philosophical and psychological questions, such as what a word is and what a concept is
Enabling technologies
● Semantic similarity: which expressions or sentences are semantically similar?
● Semantic relatedness and textual entailment: smoke entails fire, fire entails damage
● Word-sense disambiguation
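As one illustration of word-sense disambiguation with wordnet data, the sketch below scores each Cornetto sense of Dutch band against a context. Each sense has a "signature" built from its related words and combinatorics, and the score is the overlap between that signature and the context. This is a simplified Lesk-style overlap heuristic chosen for the example, not a method from the slides. The signatures are a hand-picked subset of the Cornetto example, and tokenization is plain whitespace splitting.

```python
# Sense signatures: a hand-picked subset of the related words and
# combinatorics listed for Dutch "band" in the Cornetto example.
SIGNATURES = {
    "band#1 (music group)": {"groep", "popgroep", "jazzband", "muzikant", "muziek",
                             "musiceren", "spelen", "speelt", "oprichten"},
    "band#2 (tire)": {"fietsband", "autoband", "binnenband", "buitenband",
                      "oppompen", "plakken", "lekke", "springt"},
    "band#3 (audio tape)": {"cassettebandje", "geluidsdrager", "opnemen",
                            "afspelen", "starten", "lezen", "schrijven"},
    "band#5 (bond)": {"familieband", "bloedband", "moederband", "relatie",
                      "verhouding", "verbreken", "sterke", "goede"},
}


def disambiguate(sentence):
    """Pick the sense whose signature overlaps most with the context words.

    Ties, including zero overlap everywhere, go to the first sense listed.
    """
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, signature in SIGNATURES.items():
        overlap = len(context & signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(disambiguate("de band speelt muziek"))
print(disambiguate("ik moet de band oppompen"))
print(disambiguate("we gaan de band afspelen"))
```

Overlap counting is only as good as the signatures. Richer wordnet links, such as hypernyms and glosses, can be added to each signature in the same way.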
Shortest-path measures
● Hirst–St-Onge (1998): consider all relations → relatedness
– d = number of direction changes
– C, k = constants
● Leacock and Chodorow (1998): only consider ISA → similarity
– D = maximum depth of the taxonomy
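Both measures can be sketched on a toy graph. The usual formulations are:
- rel_HSO = C − len − k·d, with len the path length in links
- sim_LCH = −log(len / 2D), with len counted in nodes

Here D is the maximum depth of the taxonomy, following Leacock and Chodorow's definition. C = 8 and k = 1 are the values commonly used for HSO. Several choices are assumptions for the example:
- the graph itself
- the direction labels (moving towards a hypernym or a whole counts as "up")
- the 5-link path limit

HSO's rules on which direction patterns are allowed are left out.

```python
import math
from collections import deque

# Toy graph, assumed for illustration (not WordNet data): (child, parent) pairs.
ISA = [("cab", "car"), ("cruiser", "car"), ("car", "motor vehicle"),
       ("motor vehicle", "vehicle"), ("vehicle", "conveyance")]
PART_OF = [("car door", "car"), ("doorlock", "car door")]


def adjacency(pairs):
    """Map each node to (neighbour, direction) pairs; child -> parent is 'up'."""
    adj = {}
    for child, parent in pairs:
        adj.setdefault(child, []).append((parent, "up"))
        adj.setdefault(parent, []).append((child, "down"))
    return adj


def lch(a, b):
    """Leacock-Chodorow similarity over ISA links only; None if no ISA path."""
    adj = adjacency(ISA)
    dist, queue = {a: 0}, deque([a])  # breadth-first search: shortest path in edges
    while queue:
        node = queue.popleft()
        for nxt, _ in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    if b not in dist:
        return None
    parent = dict(ISA)  # single inheritance in this toy taxonomy

    def depth(node):
        # Depth counted in nodes, so a root has depth 1.
        return 1 + depth(parent[node]) if node in parent else 1

    max_depth = max(depth(node) for node in adj)
    path_nodes = dist[b] + 1
    return -math.log(path_nodes / (2 * max_depth))


def hso(a, b, C=8, k=1, max_len=5):
    """Simplified Hirst-St-Onge relatedness over ISA and part links."""
    if a == b:
        return 2 * C  # a shared synset counts as a strong relation in HSO
    adj = adjacency(ISA + PART_OF)
    best = 0  # no path within max_len links means relatedness 0

    def walk(node, visited, length, changes, last_dir):
        nonlocal best
        if node == b:
            best = max(best, C - length - k * changes)
            return
        if length == max_len:
            return
        for nxt, direction in adj.get(node, []):
            if nxt not in visited:
                turn = 1 if last_dir is not None and direction != last_dir else 0
                walk(nxt, visited | {nxt}, length + 1, changes + turn, direction)

    walk(a, {a}, 0, 0, None)
    return best


print(lch("cab", "cruiser"), lch("cab", "doorlock"))
print(hso("cab", "cruiser"), hso("cab", "doorlock"), hso("doorlock", "conveyance"))
```

The contrast in the slide shows up directly. lch("cab", "doorlock") finds no ISA path, while hso can still relate the two through the part links.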
Images
● Computer Science and Artificial Intelligence Lab (Massachusetts Institute of Technology, Torralba et al. 2008):
– database with 80 million tiny images from Google Search
– groups of 140 images that share visual properties and are related by Wordnet similarity, covering all non-abstract nouns
– http://groups.csail.mit.edu/vision/TinyImages/
[Figure: (a) Query image. (b) First 16 of 80 neighbors found using a k-nearest-neighbors image similarity measure. (c) Ground-truth Wordnet branch describing the content of the query image at multiple semantic levels. (d) Sub-tree formed by accumulating branches from all 80 neighbors.]
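Panel (d) suggests a simple way to label a query image. Every neighbor votes for all nodes on its Wordnet branch, and the most-voted node at the desired level wins. The sketch below does this vote counting. The hypernym map and labels are hypothetical toy data, and this is an illustration of the idea rather than Torralba et al.'s code.

```python
from collections import Counter

# Hypothetical toy hypernym map (child -> parent), not real WordNet data.
HYPERNYM = {
    "dalmatian": "dog", "newfoundland": "dog", "dog": "carnivore",
    "cat": "carnivore", "carnivore": "animal", "canary": "bird",
    "bird": "animal", "animal": "organism", "car": "vehicle",
    "vehicle": "artifact",
}


def branch(label):
    """Path from a label up to the root of its tree."""
    path = [label]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def vote(neighbour_labels):
    """Each neighbour votes once for every node on its branch."""
    votes = Counter()
    for label in neighbour_labels:
        votes.update(branch(label))
    return votes


def best_at_level(votes, level):
    """Most-voted node at a given depth (root = 0); ties broken alphabetically."""
    candidates = [n for n in votes if len(branch(n)) - 1 == level]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (-votes[n], n))


votes = vote(["dalmatian", "dog", "newfoundland", "cat", "car", "canary"])
for level in range(5):
    print(level, best_at_level(votes, level))
```

In this toy run the general levels are decided by a clear majority: organism and animal each get 5 of the 6 votes. The most specific level is a 1-1 tie, broken alphabetically.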
Sounds
● http://www.freesound.org/browse/
● Sound descriptors:
– fast female footsteps on wood
– violin pizzicato with natural open strings
● Identifiable and non-identifiable sources:
– bark, whinny, miaow, talk → classification by source
– click, clink, thud, screech, rattle → classification is more difficult (plop => noise => sound, plunk => sound)
● What features?
– Cognitive-perceptual: repetitiveness, continuousness, duration, harmony, pitch
– Physical signal: spectral, temporal, frequency, etc.
● Semantically different events may sound perceptually similar: a paper bag vs. eating toast
Language models and brain activity
Mitchell et al. 2008, Predicting Human Brain Activity Associated with the Meanings of Nouns, Science, vol. 320
Experimental results