Unni Cathrine Eiken February 2005

Mitkov, Ruslan. (2001): Outstanding issues in anaphora resolution. In: Alexander Gelbukh (ed): Computational Linguistics and Intelligent Text Processing, pp. 110-125.
Mitkov, Ruslan. (2003): Anaphora Resolution. Chapter 14 in Mitkov (ed): The Oxford Handbook of Computational Linguistics. Oxford University Press, pp. 266-283.
Nasukawa, Tetsuya. (1994): Robust method of pronoun resolution using full-text information. Proceedings of the 15th International Conference on Computational Linguistics (COLING’94, Kyoto), pp. 1157-1163. Available at: http://acl.eldoc.ub.rug.nl/mirror/C/C94/index.html
NorGram website (2004): http://www.hf.uib.no/i/LiLi/SLF/Dyvik/norgram/ Consulted 23/11-2004.
OBT (2005): Oslo-Bergen-taggeren. Available at: http://decentius.aksis.uib.no/cl/cgp/obt.html
Pantel, Patrick and Dekang Lin. (2002): Discovering word senses from text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Edmonton), pp. 613-619.
Pereira, Fernando, N. Tishby, L. Lee. (1993): Distributional clustering of English words. Proceedings of the 31st Annual Meeting of the ACL, pp. 183-190. Available at: http://acl.eldoc.ub.rug.nl/mirror/P/P93/index.html
Robbins, R.H. (1997): A Short History of Linguistics. Longman.
Saeed, John I. (1997): Semantics. Blackwell.
Velldal, Erik. (2003): Modelling Word Senses With Fuzzy Clustering. Cand. Philol. Thesis in Language, Logic and Information. University of Oslo.
Wolff, Karl Erich. (1994): A first course in formal concept analysis. In: Faulbaum, F. (ed): SoftStat’93 Advances in Statistical Software 4, pp. 429-438.

Appendix A: Ekstraktor.pl – algorithm

The algorithm behind Ekstraktor is divided into two separate parts:

- retrieving information from the Prolog file;
- processing the information that was found and stored.

First, a Prolog output file is opened and read line by line. Each line is matched against a set of patterns and stored in the array that corresponds to the pattern it matches. After this extraction step, the information stored in the arrays is processed to create predicate-argument structures.

In the following, I give a brief outline of the processing steps by describing each of the central functions in Ekstraktor. The term epmor (Eng: EP mother) refers to the first EP in the ARG0ep array. In most cases this is the EP “in question”.

finnHoved() (Eng: find main): Finds the semantic forms of the main (first) predicate-argument structure in the sentence. This function calls the following subfunctions:

finnEP1() (Eng: find EP 1): Since the entities parsed are full sentences, the main structure is restricted to having a verb as its head. This function searches the array catsuff for a pattern with the first member of ARG0ep as its EP. If such a pattern is found, the EP is discarded, and the first members of the arrays ARG0ep and ARG0verdi are removed.

finnPred() (Eng: find predicate): Finds the semantic value of the sentence’s predicate (ARG0). The function goes through the array semform, searching for a pattern with the first member of ARG0ep as its EP. If such a pattern is found, the semantic form is retrieved and stored in the array predikat.

If the argument is a proper noun, the retrieved semantic form is “empty”. To avoid this, the function checks whether the retrieved form matches named. If it does, the array navn is searched for a pattern with the first member of ARG0ep as its EP. If such an entry is found, predikat is emptied and the new semantic form is stored there.

Some predicates have an extra attribute, which is stored in the array prt. Each line in this array is searched for a pattern with the first member of ARG0ep as its EP. If such an entry is found, the semantic form is retrieved and stored in the array ekstra.

lagVerbStruktur() (Eng: make verb structure): Creates the correct verbal structure for the predicate. This is needed when the predicate has an additional attribute, as in the predicate “lete etter” (Eng: look for). The
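The Python sketch below illustrates the two phases described above:

- extraction: lines are routed into arrays by pattern;
- processing: values are looked up by the first ARG0 EP.

It is not the Perl code of Ekstraktor.pl. The following are all hypothetical, chosen only to make the logic concrete:

- the simplified one-fact-per-line input format `name(EP, 'value').`;
- the regular expression;
- the helper names `extract`, `lookup`, `find_ep1`, `find_pred` and `make_verb_structure`;
- the hyphenated output for predicates such as “lete etter”.

ARG0ep and ARG0verdi are modelled as one list of (EP, value) pairs, so removing “the first members” of both is a single operation.

```python
import re
from collections import defaultdict

# Hypothetical, simplified input format: name(EP, 'value').
# The real NorGram Prolog output and Ekstraktor's regexes are not reproduced here.
LINE = re.compile(r"^(\w+)\((\w+),\s*'([^']*)'\)\.?\s*$")
KNOWN = {"arg0", "catsuff", "semform", "named", "prt"}


def extract(lines):
    """Phase 1: store each line in the array named after the pattern it matches."""
    arrays = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group(1) in KNOWN:
            kind, ep, value = m.groups()
            arrays[kind].append((ep, value))
    return arrays


def lookup(array, ep):
    """Return the value of the first entry whose EP equals ep, or None."""
    for entry_ep, value in array:
        if entry_ep == ep:
            return value
    return None


def find_ep1(arrays):
    """Loosely modelled on finnEP1(): if the first ARG0 EP matches catsuff, drop it.
    Returns the resulting epmor (first remaining ARG0 EP), or None."""
    arg0 = arrays["arg0"]  # (ARG0ep, ARG0verdi) pairs
    if arg0 and lookup(arrays["catsuff"], arg0[0][0]) is not None:
        arg0.pop(0)  # removes the first member of both ARG0ep and ARG0verdi
    return arg0[0][0] if arg0 else None


def find_pred(arrays, epmor):
    """Loosely modelled on finnPred(): returns (predikat, ekstra) for epmor."""
    form = lookup(arrays["semform"], epmor)
    if form == "named":
        # Proper nouns carry the 'empty' form "named"; use the name from navn instead.
        name = lookup(arrays["named"], epmor)
        if name is not None:
            form = name
    extra = lookup(arrays["prt"], epmor)  # optional extra attribute (prt -> ekstra)
    return form, extra


def make_verb_structure(pred, extra):
    """Stand-in for lagVerbStruktur(). Assumption: the extra attribute is joined
    to the predicate with a hyphen, e.g. "lete" + "etter" -> "lete-etter"."""
    return f"{pred}-{extra}" if pred and extra else pred


SAMPLE = [
    "arg0(ep1, 'e0').",
    "catsuff(ep1, 'x').",      # hypothetical: ep1 is not a verbal head
    "arg0(ep2, 'e1').",
    "semform(ep2, 'lete').",
    "prt(ep2, 'etter').",
    "semform(ep3, 'named').",
    "named(ep3, 'Anne').",
]

if __name__ == "__main__":
    arrays = extract(SAMPLE)
    epmor = find_ep1(arrays)  # "ep2", because ep1 matched catsuff
    print(make_verb_structure(*find_pred(arrays, epmor)))  # lete-etter
    print(make_verb_structure(*find_pred(arrays, "ep3")))  # Anne
```

Keeping each array as a list of (EP, value) pairs mirrors the Perl approach of scanning an array for the first entry with a given EP. A dictionary keyed on EP would make lookups faster, but it would lose the ordering that finnEP1() relies on.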

