The pre-editing of the text collection will naturally have affected the resulting EPAS list. Not all structures from the original texts are extracted, and as a consequence the EPAS list does not include all relevant context patterns for the domain. Still, for the purposes of a pilot study such as this thesis, the central structures, which display the most typical context patterns for the domain, contain enough information to give an indication of the usefulness of the method. For subsequent analyses, the extraction process can easily be performed on the unedited original texts.

3.5 Finding the words

Extracting meaning structures such as the EPAS from the texts in the text collection is a substantial undertaking. It is also quite a tedious task, and since tedious tasks tend to benefit from being automated, I wrote the Perl script Ekstraktor, which interprets the MRS structures of a sentence and thereby assembles the EPAS for each parsed sentence. This section outlines the automated extraction process.

XLE provides the user with a choice of several output formats, including a graphical user interface that displays a tree graph of the parse as well as its F-structure and MRS structure. The output can also be written to a file of Prolog predicates. To extract the EPAS, Ekstraktor reads this Prolog output, saves the relevant information in a system of arrays, and subsequently performs several tests and operations on the stored information in order to produce a list of all EPAS found in the parsed sentence.

The MRS structures as represented in the Prolog output provide all the information needed to extract the EPAS. First, the main EP with its arguments must be identified. Since the linguistic structures analyzed in this thesis are limited to full sentences, the main EP must display the category 'v' for verb.
Once the main EP is identified, the semantic values for it and for its arguments must be found. Subsequently, all the remaining predicate-argument structures must be found; for these, there is no restriction on category. Consider the sentence shown in (3-12) together with an extract of the Prolog output of its parse, shown in (3-13):
(3-12)  Politiet leter etter morderen
        'The police are looking for the murderer'

(3-13)  cf(1,eq(attr(var(19),'ARG0'),var(20))),
        cf(1,eq(attr(var(19),'ARG1'),var(21))),
        cf(1,eq(attr(var(19),'ARG2'),var(22))),
        cf(1,eq(attr(var(19),'LBL'),var(10))),
        cf(1,eq(attr(var(19),'LNK'),14)),
        cf(1,eq(attr(var(19),'_CAT'),'p')),
        cf(1,eq(attr(var(19),'_CATSUFF'),'sel')),
        cf(1,eq(attr(var(19),'relation'),semform('etter',15,[],[]))),
        cf(1,eq(attr(var(20),'type'),'event')),
        cf(1,eq(attr(var(21),'PERF'),'-')),
        cf(1,eq(attr(var(21),'TENSE'),'pres')),
        cf(1,eq(attr(var(21),'type'),'event')),
        cf(1,eq(attr(var(22),'NUM'),'sg')),
        cf(1,eq(attr(var(22),'PERS'),'3')),
        cf(1,eq(attr(var(22),'type'),'ref-ind')),
        cf(1,eq(attr(var(23),'ARG0'),var(21))),
        cf(1,eq(attr(var(23),'ARG1'),var(24))),
        cf(1,eq(attr(var(23),'ARG2'),var(22))),
        cf(1,eq(attr(var(23),'LBL'),var(10))),
        cf(1,eq(attr(var(23),'LNK'),10)),
        cf(1,eq(attr(var(23),'_CAT'),'v')),
        cf(1,eq(attr(var(23),'_PRT'),'etter')),
        cf(1,eq(attr(var(23),'relation'),semform('lete',11,[],[]))),
        cf(1,eq(attr(var(24),'NUM'),'sg')),
        cf(1,eq(attr(var(24),'PERS'),'3')),
        cf(1,eq(attr(var(24),'type'),'ref-ind')),
        cf(1,eq(attr(var(25),'ARG0'),var(22))),
        cf(1,eq(attr(var(25),'BODY'),var(26))),
        cf(1,eq(attr(var(25),'LBL'),var(27))),
        cf(1,eq(attr(var(25),'LNK'),18)),
        cf(1,eq(attr(var(25),'RSTR'),var(14))),
        cf(1,eq(attr(var(25),'relation'),semform('def',31,[],[]))),
        cf(1,eq(attr(var(26),'type'),'handle')),
        cf(1,eq(attr(var(27),'type'),'handle')),
        cf(1,eq(attr(var(28),'ARG0'),var(24))),
        cf(1,eq(attr(var(28),'BODY'),var(29))),
        cf(1,eq(attr(var(28),'LBL'),var(30))),
        cf(1,eq(attr(var(28),'LNK'),0)),
        cf(1,eq(attr(var(28),'RSTR'),var(17))),
        cf(1,eq(attr(var(28),'relation'),semform('def',9,[],[]))),
        cf(1,eq(attr(var(29),'type'),'handle')),
        cf(1,eq(attr(var(30),'type'),'handle')),
        cf(1,eq(attr(var(31),'ARG0'),var(22))),
        cf(1,eq(attr(var(31),'LBL'),var(13))),
        cf(1,eq(attr(var(31),'LNK'),18)),
        cf(1,eq(attr(var(31),'_CAT'),'n')),
        cf(1,eq(attr(var(31),'relation'),semform('morder',19,[],[]))),
        cf(1,eq(attr(var(32),'ARG0'),var(24))),
        cf(1,eq(attr(var(32),'LBL'),var(16))),
        cf(1,eq(attr(var(32),'LNK'),0)),
        cf(1,eq(attr(var(32),'_CAT'),'n')),
        cf(1,eq(attr(var(32),'relation'),semform('politi1',1,[],[]))),
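To make the extraction procedure concrete, the following sketch walks through a trimmed subset of the facts in (3-13). Ekstraktor itself is a Perl script and its internals are not reproduced here; this is a Python illustration under stated assumptions (the function names, the fact subset, and the comma-separated output format are illustrative, not the author's code). It groups the attributes of each EP variable together, picks the EP with category 'v' as the main EP, and resolves its ARG1/ARG2 variables to the EPs whose ARG0 carries the same variable:

```python
import re
from collections import defaultdict

# A trimmed subset of the Prolog facts in (3-13): the verbal EP 'lete'
# (with its particle 'etter') and the two nominal EPs it selects.
PROLOG_FACTS = """
cf(1,eq(attr(var(23),'ARG1'),var(24))),
cf(1,eq(attr(var(23),'ARG2'),var(22))),
cf(1,eq(attr(var(23),'_CAT'),'v')),
cf(1,eq(attr(var(23),'_PRT'),'etter')),
cf(1,eq(attr(var(23),'relation'),semform('lete',11,[],[]))),
cf(1,eq(attr(var(31),'ARG0'),var(22))),
cf(1,eq(attr(var(31),'_CAT'),'n')),
cf(1,eq(attr(var(31),'relation'),semform('morder',19,[],[]))),
cf(1,eq(attr(var(32),'ARG0'),var(24))),
cf(1,eq(attr(var(32),'_CAT'),'n')),
cf(1,eq(attr(var(32),'relation'),semform('politi1',1,[],[]))),
"""

# Every fact has the shape cf(1,eq(attr(var(N),'ATTR'),VALUE)),
FACT = re.compile(r"cf\(1,eq\(attr\(var\((\d+)\),'([^']+)'\),(.+?)\)\),?$")

def parse_facts(text):
    """Group the attribute/value pairs of each EP variable together."""
    eps = defaultdict(dict)
    for line in text.strip().splitlines():
        m = FACT.match(line.strip())
        if not m:
            continue
        var, attr, value = m.groups()
        # For 'relation' attributes, keep only the predicate name
        # inside semform(...); otherwise strip the Prolog quotes.
        sem = re.match(r"semform\('([^']+)'", value)
        eps[var][attr] = sem.group(1) if sem else value.strip("'")
    return eps

def extract_epas(eps):
    """Build the EPAS: the main EP must have category 'v'; its ARG1 and
    ARG2 are resolved to the EPs whose ARG0 is the same variable."""
    main = next(ep for ep in eps.values() if ep.get("_CAT") == "v")
    verb = main["relation"]
    if "_PRT" in main:            # verb particle, e.g. lete + etter
        verb += "-" + main["_PRT"]
    def resolve(arg_var):
        for ep in eps.values():
            if ep is not main and ep.get("ARG0") == arg_var:
                return ep["relation"]
        return "?"                # unfilled argument slot
    return (verb, resolve(main.get("ARG1")), resolve(main.get("ARG2")))

print(",".join(extract_epas(parse_facts(PROLOG_FACTS))))
# prints: lete-etter,politi1,morder
```

The hyphenated verb-particle form and the comma-separated layout mirror the style of the EPAS list in the appendices (e.g. jobbe-utfra, or ankomme,etterforsker,?,?); note that 'politi1' retains the sense index carried by the semform predicate.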