Automatic Semantic Role Annotation for Spanish

Eckhard Bick, Institute of Language and Communication, University of Southern Denmark, eckhard.bick@mail.dk
M. Pilar Valverde Ibáñez, Departamento de Lengua Española, Universidade de Santiago de Compostela, pilar.valverde@usc.es


Talk overview
● motivation and methods: Background
● semantic classification: 1. semantic prototypes and atomic features vs. 2. semantic roles (form vs. function)
● inducing (2) from (1) using syntactic context: a semantic role annotation grammar
● advantages and problems: Dependency trees and verb frames
● an applicative example from the realm of lexicography: The DeepDict collocation dictionary


Higher-level Annotation
● Semantic vs. syntactic annotation
– semantic sentence structure, defined as a dependency tree of semantic roles, provides a more stable alternative to syntactic surface tags
● "Comprehension" of sentences
– semantic role tags can help identify linguistically encoded information for applications like dialogue systems, IR, IE and MT
● Less consensus on categories
– the higher the level of annotation, the lower the consensus on categories. A semantic role set therefore has to be defined carefully, providing well-defined category tests and allowing the highest possible degree of filtering compatibility


Integrating structure and lexicon: 2 different layers of semantic information
● (a) "lexical perspective": contextual selection of
– a (noun) sense [WordNet style] or
– a semantic prototype [SIMPLE style]
● (b) "structural perspective": the semantics of argument structure (verb - nominal), i.e. thematic/semantic roles
– Fillmore 1968
– Jackendoff 1972
– Foley & van Valin 1984
– Dowty 1987


The project
● Goals:
– develop an automatic semantic role parser for Spanish
– build a Spanish internet corpus and role-annotate it
● Project framework
– 4-month project (September - December 2008) at the ISK, SDU
– people: Pilar Valverde (the role grammar itself, corpus work) & Eckhard Bick (HISPAL parser, internet corpus)
● Method
– Constraint Grammar, exploiting the new dependency features of the CG3 formalism (implemented by Tino Didriksen)
– parallel improvement of the existing parser (lexicon, syntax), the new module (semantic role grammar) and the CG3 formalism (new features)


Lexico-semantic tags in Constraint Grammar
● secondary: semantic tags employed to aid disambiguation and syntactic annotation (traditional CG)
● primary: semantic tags as the object of disambiguation
● existing applications using lexical semantic tags
– Named Entity classification
– semantic prototype tagging for treebanks
– semantic tag-based applications
● machine translation (GramTrans)
● QA, library IE, sentiment surveys, referent identification (anaphora)


Semantic argument slots
● the semantics of a noun can be seen as a "compromise" between its lexical potential or "form" (e.g. prototypes) and the projection of a syntactic-semantic argument slot by the governing verb ("function")
● e.g. a (country, town) prototype can fill
– (a) location, origin or destination slots (adverbial argument of movement verbs)
– (b) agent or patient slots (subject of cognitive or agentive verbs)
● rather than hypothesize different senses or lexical types for these cases, a role annotation level can be introduced as a bridge between syntax and true semantics
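To make the "form vs. function" compromise concrete, a noun's candidate roles can be modelled as the intersection of its prototype's role potential and the slot projection of the governing verb. The Python sketch below uses invented prototype and verb-class names purely for illustration; it is not the HISPAL lexicon or grammar.

```python
# Illustrative only: invented prototype/verb-class names and role sets.
PROTOTYPE_ROLES = {
    # a (country, town)-type noun can act as a place or as a participant
    "country_town": {"LOC", "ORI", "DES", "AG", "PAT"},
}

SLOT_ROLES = {
    # a movement verb projects locational roles onto its adverbial argument
    ("movement", "ADVL"): {"LOC", "ORI", "DES"},
    # an agentive verb projects an agent role onto its subject
    ("agentive", "SUBJ"): {"AG"},
}

def candidate_roles(prototype, verb_class, slot):
    """Intersect lexical potential ('form') with slot projection ('function')."""
    return PROTOTYPE_ROLES.get(prototype, set()) & SLOT_ROLES.get((verb_class, slot), set())

# 'Spain' as adverbial of a movement verb: only locational roles survive
print(candidate_roles("country_town", "movement", "ADVL"))  # {'LOC', 'ORI', 'DES'}
# 'Spain' as subject of an agentive verb: the agent reading survives
print(candidate_roles("country_town", "agentive", "SUBJ"))  # {'AG'}
```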


Semantic granularity
● we use a Portuguese tag set (Bick 2007) with 52 semantic roles (15 core argument roles and 37 minor and "adverbial" roles), covering the major categories of the tectogrammatical layer of the PDT (Hajičová et al. 2000)
● for comparison:
– 20 roles: AnCora tag set (Taulé, Martí & Recasens 2008)
– 24 roles: Sensem corpus (Alonso et al. 2007)
– 143 roles: ADESSE database (García-Miguel & Albertuz 2005)
● ARG structure (à la PropBank, Palmer et al. 2005) is used only partially/internally, but can be added without information loss by combining roles and syntactic function tags


"Nominal"The <strong>semantic</strong> role inventorydefinitionexample<strong>roles</strong>§AG agent X eats Y§PAT patient Y eats X, X broke, X was broken§REC receiver give Y to X§BEN benefactive help X§EXP experiencer X fears Y, surprise X§TH theme send X, X is ill, X is situated there§RES result Y built X§ROLE role Y works as a guide§COM co­argument, comitative Y dances with X§ATR static attribute Y is ill, a ring of gold§ATR­RES resulting attribute make somebody nervøs§POS possessor Y belongs to X, Peter's car§CONT content a bottle of wine§PART part Y consists of X, X forms a whole§ID identity the town of Bergen, the Swedish company Volvo§VOC vocative keep calm, Peter!


"Adverbial"<strong>roles</strong> definition example§LOC location live in X, here, at home§ORI origin, source flee from X, meat from Argentina§DES destination send Y to X, a flight to X§PATH path down the road, through the hole§EXT extension, amount march 7 miles, weigh 70 kg§LOC­TMP temporal location last year, tomorrow evening, when we meet§ORI­TMP temporal origin since January§DES­TMP temporal destination until Thursday§EXT­TMP temporal extension for 3 weeks, over a period of 4 years§FREQ frequency sometimes, 14 times§CAU cause because of X, since he couldn't come himself§COMP comparation better than ever§CONC concession in spite of X, though we haven't hear anything§COND condition in the case of X, unless we are told differently§EFF effect, consequence with the result of, there were som many that ...§FIN purpose, intention work for the ratification of the Treaty§INS instrument through X, cut bread with, come by car§MNR manner this way, as you see fit, how ...§COM­ADV accompanier (ArgM) apart from Anne, with s.th. in her hand


"Syntacticdefinitionexample<strong>roles</strong>"§META meta adverbial according to X, maybe, apparently§FOC focalizer only, also, even§ADV dummy adverbial if no other adverbial categories apply§EV event, act, process start X, ... X ends§PRED (top) predicatior main verb in main clause§DENOM denomination lists, headlines§INC verb­incorporated take place (not fully implemented)


Exploiting lexical semantic information through syntactic links
● corpus information on verb complementation:
– 1.5M word ADESSE clause database of verbs & arguments
– --> 96 CG set definitions
– e.g. V-SP-SUBJ = "contar" "decir" "hablar" ...
– MAP (§SP) TARGET $ARG0 (p V-SP-SUBJ)
● ~160 semantic prototypes from the HISPAL lexicon
– e.g. N-LOC = ..., combined with destination prepositions PRP-DES = "hasta" (till), "en dirección a" (towards) ...
– MAP (§DES) TARGET @P< (0 N-LOC LINK p PRP-DES)
● needs dependency trees as input, created with the HISPAL Constraint Grammar parser (Bick 2006)
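A toy interpreter for the second MAP rule above might look as follows; Token and the tag sets are simplified stand-ins for illustration, not the CG3 engine.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    form: str
    tags: set                      # function tags, set memberships, prototypes
    parent: Optional["Token"] = None
    roles: list = field(default_factory=list)

def map_des(tokens):
    """Toy version of: MAP (§DES) TARGET @P< (0 N-LOC LINK p PRP-DES)
    i.e. a prepositional complement that is a location-type noun under a
    destination preposition receives the §DES role."""
    for t in tokens:
        if "@P<" in t.tags and "N-LOC" in t.tags:        # TARGET @P< (0 N-LOC ...)
            if t.parent and "PRP-DES" in t.parent.tags:  # ... LINK p PRP-DES)
                t.roles.append("§DES")

# "viajar hasta Madrid": 'Madrid' is @P< and N-LOC, governed by PRP-DES 'hasta'
hasta = Token("hasta", {"PRP", "PRP-DES"})
madrid = Token("Madrid", {"@P<", "N-LOC", "PROP"}, parent=hasta)
map_des([hasta, madrid])
print(madrid.roles)  # ['§DES']
```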


Dependency trees
[Tree diagram] El Ministerio de Salud Pública (§AG) organizará un programa y una fiesta (§EV) para sus trabajadores (§BEN) en su propio edificio (§LOC)
"The Ministry of Health will organize a program and a party for its employees in their own building"


Source format
El (the) [el] DET @>N #1->2
Ministerio=de=Salud=Pública [M.] PROP M S @SUBJ> #2->3 $ARG0 §AG
organizará (will organize) [organizar] V FUT 3S IND @FS-STA #3->0 §PRED
un (a) [un] DET M S @>N #4->5
programa (program) [programa] N M S @<ACC #5->3 $ARG1 §EV
y (and) [y] KC @CO #6->5
una (a) [un] DET F S @>N #7->8
fiesta (party) [fiesta] N F S @<ACC #8->5 $ARG1 §EV
para (for) [para] PRP @<ADVL #9->3
sus (their) [su] DET M P @>N #10->11
trabajadores (workers) [trabajador] N M P @P< #11->9 §BEN
en (in) [en] PRP @<ADVL #12->3
su (their) [su] DET M S @>N #13->15
propio (own) [propio] DET M S @>N #14->15
edificio (building) [edificio] N M S @P< #15->12 §LOC
(authentic newspaper text)
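A small reader for this format can recover the dependency and role layers. The regular expression below is reverse-engineered from the example and is an assumption, not the official HISPAL output specification.

```python
import re

# word (gloss) [lemma] MORPH... @FUNC #id->head [$ARG..] [§ROLE..]
TOKEN = re.compile(
    r"^(?P<form>\S+)(?:\s+\((?P<gloss>[^)]*)\))?\s+\[(?P<lemma>[^\]]*)\]\s+(?P<rest>.+)$"
)

def parse_token(line):
    m = TOKEN.match(line)
    if not m:
        raise ValueError("unparsable line: %r" % line)
    tok = {"form": m.group("form"), "lemma": m.group("lemma"),
           "morph": [], "func": [], "id": None, "head": None,
           "args": [], "roles": []}
    for f in m.group("rest").split():
        if f.startswith("@"):
            tok["func"].append(f)                    # syntactic function
        elif f.startswith("#"):
            i, h = f[1:].split("->")                 # dependency edge #id->head
            tok["id"], tok["head"] = int(i), int(h)
        elif f.startswith("$"):
            tok["args"].append(f)                    # diathesis-neutral ARG slot
        elif f.startswith("§"):
            tok["roles"].append(f)                   # semantic role
        else:
            tok["morph"].append(f)                   # POS and inflection tags
    return tok

t = parse_token("trabajadores (workers) [trabajador] N M P @P< #11->9 §BEN")
print(t["lemma"], t["func"], (t["id"], t["head"]), t["roles"])
# trabajador ['@P<'] (11, 9) ['§BEN']
```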


Grammar profile
– 568 hand-written CG role mapping rules
– uses the new CG3 formalism
– runs on syntactically analyzed text, no restrictions
– processes input from the Spanish HISPAL parser
– uses a variety of syntactic and semantic markers:
● dependency structures
● syntactic function
● semantic verb classes
● semantic prototype tags for nouns
[Pipeline diagram] Text --> HISPAL parser --> Role CG (drawing on a semantic noun lexicon and a semantic verb lexicon, both CG3) --> Annotated Corpus


Specific problems: diathesis alternation
● ARG roles are used to systematize diathesis alternation
● ARG0 (argument closest to the predicator):
– @SUBJ of most verbs
– passive agent in passive clauses
● ARG1 (second closest):
– @ACC of transitive verbs
– @SUBJ of unaccusative verbs
● the role mapping rules then use ARG0 instead of subject and ARG1 instead of object, generalizing over active/passive constructions (see the sketch below):
– MAP (§PAT) TARGET $ARG1 (p PRP-A LINK p V-PAT-ACC);
– MAP (§AG) TARGET $ARG0 (0 N-HUM);
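A sketch of that generalisation: surface functions are normalised to ARG slots, so one role rule serves active, passive and unaccusative clauses. The @PASS-AG tag and the lexicon feature set are hypothetical placeholders, not HISPAL's actual tag names.

```python
def arg_slot(func, verb_features):
    """Map a syntactic function tag to a diathesis-neutral ARG slot
    (sketch; tag names other than @SUBJ/@ACC are hypothetical)."""
    if func == "@SUBJ":
        # unaccusative subjects pattern with objects, hence ARG1
        return "$ARG1" if "unaccusative" in verb_features else "$ARG0"
    if func == "@ACC":
        return "$ARG1"
    if func == "@PASS-AG":           # agent phrase of a passive clause
        return "$ARG0"
    return None

print(arg_slot("@SUBJ", set()))               # $ARG0: active subject
print(arg_slot("@SUBJ", {"unaccusative"}))    # $ARG1: unaccusative subject
print(arg_slot("@PASS-AG", set()))            # $ARG0: passive agent
```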


Specific problems: the particle "se"
● "se" (literally a reflexive pronoun) is very hard to classify functionally, and syntactic classification problems propagate to semantic role assignment
● "se"-meanings range gradually from a "non-role" to "full roles" --> errors, vagueness, ambiguity:
– pronominal ("se" as an integrated part of the verb)
– true reflexive (e.g. agent affecting himself)
– unaccusative/medial (marking the subject as patient without it being agent at the same time)
– passive (usually repeating the nominal patient subject)
– impersonal subject agent
● it is necessary to support these distinctions at the syntactic level --> changes to the HISPAL parser


Evaluation
● soft evaluation by manual revision of role labels
● 5000 running words

Overall performance, all levels: Recall 89.0 % | Precision 75.4 % | F-Score 81.6 %
Ignoring errors caused by syntactic analysis failure: Recall 91.4 % | Precision 88.6 % | F-Score 90.0 %

● no direct comparison with SemEval, because of tag set (size) differences
● cp. machine-learning results for role-tagging, e.g. 86% F-Score (Màrquez et al. 2007, Morante et al. 2007)
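The F-Scores are the harmonic mean of precision and recall, F = 2PR / (P + R); a quick check reproduces both rows:

```python
def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(100 * f_score(0.754, 0.890), 1))  # 81.6 (all levels)
print(round(100 * f_score(0.886, 0.914), 1))  # 90.0 (ignoring syntactic-analysis errors)
```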


Error analysis
● category vagueness: §REC - §DES - §BEN
– similar problem in manual role-annotation (Vaamonde 2008)
● erroneous inclusion of the particle "se" (1 rule, 20 errors)
● error patterns traceable to rules
● uneven error distribution across rules
– high payoff expected from tackling the most problematic categories


Corpus results
● compilation and annotation of a Spanish internet corpus (11.2 million words)
● used to infer tendencies about the relationship between semantic roles and other grammatical categories:

Semantic role | Syntactic function | Part of speech | Prototype
§TH | ACC (61%) | N (57%) | sem-c (10%)
§AG | SUBJ> (91%) | N (45%) | Hprof (7%)
§ATR | SC (75%) | N, ADJ, PCP | act (7%)
§BEN | ACC (55%) | INDP (35%) | HH (13%)
§LOC-TMP | ADVL (64%) | ADV (34%) | per (31%)
§EV | ACC (54%) | N (85%) | act (33%)
§LOC | ADVL (57%) | PRP-N (55%) | L (10%)
§REC | DAT (73%) | PERS (41%) | H (9%)
§TP | FS-ACC (34%) | VFIN (33%) | sem-c (14%)
§PAT | SUBJ> (73%) | N (55%) | sem-c (7%)


● smallest syntactic "spread": §AG, §COG, §SP (subject and agent of passive)
● easy: §SP and §COG, inferable from the verb alone
● difficult: §TH, covers a wide range of verb types and semantic features
● @SUBJ and @ACC match >= 20 roles, but unevenly
● human roles tend to appear left, others right

Role | Frequency | Subject/object ratio | Left/right ratio
§TH | 14.6 % | 25.4 % | 31.0 %
§AG | 6.6 % | 97.2 % | 78.4 %
§ATR | 6.0 % | - | 21.7 %
§BEN | 5.0 % | 3.2 % | 59.2 %
§LOC-TMP | 4.0 % | 23.7 % | 42.6 %
§EV | 3.7 % | 43.4 % | 30.0 %
§LOC | 3.0 % | 0.0 % | 23.0 %
§REC | 1.6 % | 87.8 % | 44.7 %
§TP | 1.5 % | 4.0 % | 7.5 %
§PAT | 0.4 % | 80.0 % | 68.5 %


Problems and future work
● Problems:
– interdependence between syntactic and semantic annotation
– scarcity of necessary linguistic and corpus information
– a certain gradual nature of role definitions
● Plans:
– annotate what is possible, one argument at a time; use function generalisation and noun types where verb frames are not available
– bootstrap a frame lexicon from automatically role-annotated text (see the sketch below)
– first version: Spanish web corpus at http://corp.hum.sdu.dk
[Diagram] good role annotation grammar --> annotated data --> human post-revision --> corpora --> frequency-based frame extraction --> Spanish PropBank / Spanish FrameNet
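The frequency-based frame extraction step could be prototyped as below: collect each verb's observed role patterns from role-annotated clauses and keep the recurrent ones. Thresholds and data are invented for illustration, not taken from the project.

```python
from collections import Counter, defaultdict

def extract_frames(clauses, min_count=2, min_share=0.2):
    """clauses: (verb_lemma, frozenset of roles) pairs from annotated text.
    Returns, per verb, the role patterns that recur often enough."""
    by_verb = defaultdict(Counter)
    for verb, roles in clauses:
        by_verb[verb][roles] += 1
    frames = {}
    for verb, patterns in by_verb.items():
        total = sum(patterns.values())
        frames[verb] = [set(p) for p, n in patterns.items()
                        if n >= min_count and n / total >= min_share]
    return frames

clauses = ([("enviar", frozenset({"§AG", "§TH", "§REC"}))] * 5 +
           [("enviar", frozenset({"§AG", "§TH"}))] * 3 +
           [("enviar", frozenset({"§AG", "§LOC"}))])         # annotation noise
print(extract_frames(clauses)["enviar"])
# the two recurrent patterns survive; the one-off noisy pattern is filtered out
```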


Spanish parser: beta.visl.sdu.dk
Corpus: corp.hum.sdu.dk
pilar.valverde@usc.es
eckhard.bick@mail.dk


References

Alonso, L.; Capilla, J.; Castellón, I.; Fernández, A. and Vázquez, G. (2007): "The Sensem Project: Syntactico-Semantic Annotation of Sentences in Spanish", Recent Advances in Natural Language Processing IV. Selected papers from RANLP 2005. Current Issues in Linguistic Theory, John Benjamins Publishing Co., pp. 89-98.
Bick, E. (2006): "A Constraint Grammar-Based Parser for Spanish", Proceedings of TIL 2006 - 4th Workshop on Information and Human Language Technology.
Bick, E. (2007): "Automatic Semantic Role Annotation for Portuguese", Proceedings of TIL 2007 - 5th Workshop on Information and Human Language Technology / Anais do XXVII Congresso da SBC, pp. 1713-1716.
Fillmore, C. (1968): "The case for case", in E. Bach and R. Harms (eds.), Universals in linguistic theory, Holt, Rinehart and Winston, New York.
García-Miguel, J. and Albertuz, F. (2005): "Verbs, semantic classes and semantic roles in the ADESSE project", in K. Erk, A. Melinger and S. Schulte im Walde (eds.), Proceedings of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes.
Gildea, D. and Palmer, M. (2002): "The necessity of parsing for Predicate Argument Recognition", ACL 2002.
Hajičová, E.; Panevová, J. and Sgall, P. (2000): A Manual for Tectogrammatic Tagging of the Prague Dependency Treebank, UFAL/CKL Technical Report TR-2000-09, Charles University, Czech Republic.
Jackendoff, R. (1972): Semantic Interpretation in Generative Grammar, The MIT Press, Cambridge.
Karlsson, F. et al. (1995): Constraint Grammar - A Language-Independent System for Parsing Unrestricted Text. Natural Language Processing, No 4. Berlin & New York: Mouton de Gruyter.
Màrquez, L.; Villarejo, L. and Martí, M. (2007): "SemEval-2007 Task 09: Multilevel semantic annotation of Catalan and Spanish", Proceedings of SemEval 2007, pp. 42-47.
Morante, R. and van den Bosch, A. (2007): "Memory-based semantic role labelling", Proceedings of RANLP-2007, pp. 388-394.
Taulé, M.; Martí, M. and Recasens, M. (2008): "AnCora: Multilevel Annotated Corpora for Catalan and Spanish", Proceedings of LREC 2008.
Vaamonde, G. (2008): "Algunos problemas concretos en la anotación de papeles semánticos. Breve estudio comparativo a partir de los datos de AnCora, SenSem y ADESSE", Procesamiento del lenguaje natural, nº 41, pp. 233-240.


A usage example: using "deeply" annotated corpora for "live" lexicography: DeepDict
1) annotate a corpus with dependency, syntactic function and semantic classes
2) extract dependency pairs of words and their complements, e.g. V + @ACC, @>N + N
3) identify statistically significant relations (not n-grams but "dep-grams"!)
4) normalisation (lemma) and classification
5) build a graphical interface
* semantic prototypes will allow generalisations and cross-language comparisons
* uses: advanced learner's dictionaries, production dictionaries, language and linguistics teaching ...
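Steps 2-4 could be prototyped as follows: count lemma-level dependency pairs and rank them with pointwise mutual information as one possible significance measure. The statistic DeepDict actually uses is not specified on this slide, and the data are invented.

```python
import math
from collections import Counter

def depgram_pmi(pairs):
    """pairs: (relation, head_lemma, dependent_lemma) triples,
    e.g. ('ACC', 'beber', 'cerveza') for a verb-object dep-gram.
    Returns PMI per pair; note PMI overrates rare pairs, so a real
    system would also apply frequency thresholds."""
    pair_n = Counter(pairs)
    head_n = Counter((rel, h) for rel, h, _ in pairs)
    dep_n = Counter((rel, d) for rel, _, d in pairs)
    total = len(pairs)
    return {
        (rel, h, d): math.log2((n / total) /
                               ((head_n[(rel, h)] / total) * (dep_n[(rel, d)] / total)))
        for (rel, h, d), n in pair_n.items()
    }

pairs = ([("ACC", "beber", "cerveza")] * 3 +        # 'drink beer', frequent pair
         [("ACC", "beber", "agua"),                 # 'drink water'
          ("ACC", "tener", "idea"),                 # 'have an idea'
          ("ACC", "tener", "cerveza")])             # 'have beer', incidental
for k, v in sorted(depgram_pmi(pairs).items(), key=lambda kv: -kv[1]):
    print(k, round(v, 2))
```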


[Architecture diagram] Raw text (Wikipedia, newspaper, Internet, Europarl) --> corpus (encoding, cleaning, sentence separation, id-marking) --> parsers (DanGram, EngGram, ..., with comp. lexica, CG grammars, dep grammar) --> annotated corpora --> norm frequencies / dep-pair extractor --> Statistical Database --> CGI --> friendly user interface (example concordances)


DeepDict user interface


DeepDict: implicit semantics from lexical relations


DeepDict prepositions: more syntax than semantics


DeepDict: explicit semantics (semantic prototypes)


DeepDict: sources and sizes

Parser | Lexicon | Grammar | Corpora
DanGram | 100.000 lexemes, 40.000 names | 8.400 rules | ca. 67 mill. words (mixed) [+83 mill. news]
PALAVRAS | 70.000 lexemes, 15.000 names | 7.500 rules | ca. 210 mill. words (news) [+170 mill. wiki a.o.]
HISPAL | 73.000 lexemes | 4.900 rules | ca. 50 mill. words (Wiki, Europarl) [+36 mill. Internet]
EngGram | 81.000 val/sem | 4.500 rules | ca. 210 mill. words (mixed) [+106 mill. email & chat]
SweGram | 65.000 val/sem | 8.400 rules | ca. 60 mill. words (news, Europarl) [+ Wiki]
NorGram | OBT / via DanGram | OBT / via DanGram | ca. 30 mill. words (Wikipedia) [+ internet]
FrAG | 57.000 lexemes | 1.400 rules | - [+67 mill. Wiki, Europarl]
GerGram | 25.000 val/sem | LS + 1.300 rules | ca. 44 mill. words (Wiki, Europarl) [+ internet]
EspGram | 30.000 lexemes | 2.600 rules | ca. 18 mill. words (mixed) [+ internet, wiki]
ItaGram | 30.600 lexemes | 1.600 rules | - [+46 mill. Wiki, Europarl]


Inferring semantic roles from verb classes, syntactic function (@) and dependency (p, c and s)
● implicit inference of semantics: syntactic function (e.g. @SUBJ) and valency potential (e.g. ditransitive) are not semantic by themselves, but help restrict the range of possible argument roles (e.g. §BEN for @DAT)
● subjects of ergatives:
MAP (§PAT) TARGET @SUBJ (p ... LINK NOT c @ACC) ;
● the give sb-DAT s.th.-ACC frame:
MAP (§TH) TARGET @ACC (s @DAT) ;


Inferring semantic roles from semantic prototype sets using syntactic function (@) and dependency (p, c and s)
● explicit use of lexical semantics: semantic prototypes, e.g. for human professional (Hprof), ideology-follower, nationality ..., restrict the role range by themselves, but are ultimately still dependent on verb argument frames
● (a) "Genitivus objectivus/subjectivus":
MAP (§PAT) TARGET @P< (p PRP-AF + @N< LINK p N-VERBAL) ;
# the destruction of the town
MAP (§AG) TARGET GEN @>N (p N-ACT) ;
# The government's release of new data
MAP (§PAT) TARGET GEN @>N (p N-HAPPEN) ;
# The collapse of the economy


● Agent: "he was chased by three police cars"
MAP (§AG) TARGET @P< (p ("af" @ADVL) LINK p PAS) (0 N-HUM OR N-VEHICLE) ;
● Possessor: "the painter's brush"
MAP (§POS) TARGET @P< (0 N-HUM + GEN LINK 0 @>N) (p N-OBJECT) ;
● Instrumental: "destroy the piano with a hammer"
MAP (§INS) TARGET @P< (0 N-TOOL) (p PRP-MED + @ADVL) ;
● Content: "a bottle of wine"
MAP (§CONT) TARGET @P< (0 N-MASS OR (N P)) (p ("of") LINK p ...) ;
● Attribute: "a statue of gold"
MAP (§ATR) TARGET @P< + N-MAT (p ("of") + @N<) ;
