Unni Cathrine Eiken February 2005

The Prolog code extract in (3-13) shows the MRS representation of the sentence in (3-12) by listing all the EPs in the sentence as well as the relationships that hold between the individual EPs. In simplified terms, the value of the attribute 'semform' holds the semantic form of the predicate, and the values of 'ARG1' and 'ARG2' point to the EPs where the semantic forms for argument 1 and argument 2 can be found. To extract all EPAS from such a Prolog file, one must go through the EPs in turn and find the semantic form of each main EP and of its argument 1 and argument 2. In the extraction process, this matching and tracing of values is performed by the script Ekstraktor.

The algorithm behind Ekstraktor is divided into two more or less separate parts: retrieval of information from the Prolog file, and processing of the information that was found and stored. Perl was chosen as the programming language mainly because of its pattern matching facilities. Its regular expression syntax is powerful and flexible, and lets the programmer construct expressions for a wide range of pattern matching tasks. For the information retrieval part of Ekstraktor, it was desirable to go through an input file, check each line for various patterns, and store the parts of the input file that are relevant, depending on which pattern matched. (3-14) shows one of the pattern checks in Ekstraktor: if the line read from the file contains the string 'relation'),semform( the entire line is stored in the array semform.

(3-14)
    if ($linjeFraFil =~ m/'relation'\),semform\(/){
        push(@semform, $linjeFraFil);
    }

By going through the input file line by line and checking for several patterns, all information relevant to extracting the EPAS is stored in a system of arrays. To keep track of which EP the various values belong to, two arrays are used for each argument type: one for the EP number and one for the argument value.
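The line-by-line check in (3-14) can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the Ekstraktor code itself; the function name `collect_semform_lines` and the sample input lines are invented for illustration, and the sample lines do not reproduce the actual format of the Prolog file in (3-13). Only the matched string, 'relation'),semform(, is taken from the text.

```python
import re

# Same literal string as the Perl check in (3-14): 'relation'),semform(
SEMFORM_PATTERN = re.compile(r"'relation'\),semform\(")


def collect_semform_lines(lines):
    """Return every input line that contains the string 'relation'),semform(."""
    semform = []
    for line in lines:
        if SEMFORM_PATTERN.search(line):
            # Store the entire line, as (3-14) does with push(@semform, ...).
            semform.append(line)
    return semform


# Hypothetical input lines, invented for illustration only; they are an
# assumption and do not show the real layout of the Prolog file.
sample_lines = [
    "example(23,'relation'),semform('lete-etter')",
    "example(23,'ARG1'),value(24)",
    "example(28,'relation'),semform('politi')",
]

semform_lines = collect_semform_lines(sample_lines)
print(len(semform_lines))
```

In this sketch only the first and third sample lines are stored, since the second does not contain the searched string. The other pattern checks mentioned in the text would follow the same shape, each filling its own array.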
The ARG0 arrays correspond to the predicates in the structures, and for each of them the semantic form can be found directly in the semform array. The ARG1 and ARG2 arrays hold a value that must be traced before the semantic form can be extracted. A simplified example of the argument arrays for the sentence in (3-12) is shown in (3-15):

(3-15)
    ARG0:   EP   VALUE
            23   21
            25   22
            28   24
            31   22
            32   24

    ARG1:   EP   VALUE
            19   21
            23   24

    ARG2:   EP   VALUE
            19   22
            23   22

To find the EPAS for this sentence, the first EP in the ARG0 array is incorporated in a regular expression, which is then matched against the members of the semform array. If there is an entry which matches the pattern, that is, an entry whose EP number is identical to the first EP in the ARG0 array, the semantic form is retrieved. To find the associated arguments 1 and 2, the ARG1 and ARG2 arrays are searched for an EP identical to that of the predicate. If such an EP is found, the corresponding value is retrieved; for ARG1 in our example, that is the value 24. To find the semantic form behind this value, we must find the EP whose ARG0 value is identical to it, that is, the ARG0 array must be consulted again. When that EP is found, the semform array can be pattern matched and the semantic form retrieved.

To recap the example: following this procedure, the sentence in (3-16) is represented by the EPAS in (3-17), extracted from the Prolog file of the parse.

(3-16) Politiet leter etter morderen
       The police are looking for the murderer

(3-17) lete-etter,politi,morder
       look-for,police,murderer

For a detailed walkthrough of Ekstraktor, please consult Appendix A. The program code is available in Appendix B.
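The tracing procedure described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Ekstraktor implementation documented in the appendices. The argument arrays are those of (3-15). The `SEMFORM` dictionary is a hypothetical stand-in for the semform array: the text does not say which EP carries which semantic form, so the assignment of EPs 23, 25 and 28 is invented, and a dictionary lookup replaces the regular expression match. Skipping EPs that have no ARG1 or ARG2 entry is a further simplification.

```python
# Argument arrays from (3-15): for each argument type, one list of EP numbers
# and one parallel list of values.
ARG0_EP, ARG0_VALUE = [23, 25, 28, 31, 32], [21, 22, 24, 22, 24]
ARG1_EP, ARG1_VALUE = [19, 23], [21, 24]
ARG2_EP, ARG2_VALUE = [19, 23], [22, 22]

# Hypothetical stand-in for the semform array (EP number -> semantic form).
# Which EP carries which form is an assumption made for this sketch.
SEMFORM = {23: "lete-etter", 25: "morder", 28: "politi"}


def lookup_value(ep, eps, values):
    """Return the value stored for `ep` in a pair of parallel lists, or None."""
    for candidate, value in zip(eps, values):
        if candidate == ep:
            return value
    return None


def trace_semform(value):
    """Find an EP whose ARG0 value equals `value` and return its semantic form."""
    if value is None:
        return ""
    for ep, arg0 in zip(ARG0_EP, ARG0_VALUE):
        if arg0 == value and ep in SEMFORM:
            return SEMFORM[ep]
    return ""


def extract_epas():
    """Build 'predicate,arg1,arg2' strings for every EP that has arguments."""
    epas = []
    for ep in ARG0_EP:
        if ep not in SEMFORM:
            continue  # no semantic form stored for this EP
        arg1 = lookup_value(ep, ARG1_EP, ARG1_VALUE)
        arg2 = lookup_value(ep, ARG2_EP, ARG2_VALUE)
        if arg1 is None and arg2 is None:
            continue  # simplification: skip EPs without any argument entry
        epas.append(",".join([SEMFORM[ep], trace_semform(arg1), trace_semform(arg2)]))
    return epas


print(extract_epas())
```

With the assumed `SEMFORM` assignment, EP 23 has the ARG1 value 24 and the ARG2 value 22; these are traced back through the ARG0 lists to EPs 28 and 25, which yields the structure shown in (3-17).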