Unni Cathrine Eiken February 2005
3.6 Evaluation of the data set

The data set created by the extraction process consisted of 195 elementary predicate-argument structures in its raw form. This original EPAS list could not be used directly in the next parts of the project, since not all of the extracted structures were suitable for further analysis. Some of the EPAS were not given an optimal analysis (for my purposes) by the grammar, some were irrelevant for the later analysis, and some were not extracted correctly from the MRS by the Perl script. The data set was therefore post-edited to obtain a set of EPAS that did not include erroneously extracted or undesired structures. With a collection of structures as small as the one in this project, the inclusion of only a few incorrect structures would be likely to skew the subsequent analysis and possibly produce false results. In the following, I will briefly outline some of the reasons why the EPAS list included incorrect structures and describe how the list was revised.

3.6.1 Errors from the grammar

Some of the undesired structures in the original EPAS list were directly caused by characteristics of the NorGram grammar. In the original EPAS list, there were for instance several structures of the type exemplified by (3-18):

(3-18)
a. verbal predicate, nominal argument
b. preposition, verbal predicate, nominal argument

These structures should preferably have been combined into one EPAS. The example in (3-19) below shows a concrete instance from the EPAS list and is analogous to several other instances:

(3-19)
a. bo, Anne
   live, Anne
b. i, bo, studentkollektiv
   in, live, student housing

The structure is extracted from the following sentence from the text material:
(3-20)
Anne Slåtten bodde i et studentkollektiv utenfor Førde sentrum.
Anne Slåtten lived in student housing outside central Førde.

As example (3-20) shows, these structures originate from sentences featuring a verb with an adverbial complement. The adverbial is realised as a prepositional phrase where the preposition is selected by the verb. Sentences such as Anne bodde i et studentkollektiv (Anne lived in student housing) would be expected to result in one EPAS, with the entity studentkollektiv realised as the structure's argument 2. Instead, the MRS structure of this and other similar sentences did not provide the necessary link between the verb as predicate and studentkollektiv as the second argument.

When this problem was discussed with the developers of the grammar, its source was easily identified. In the grammar, the verb bo (live) existed only as an intransitive verb, so an adverbial complement could not be analysed in the way required to produce the desired EPAS. To allow this and similar sentences with the verb bo to produce one EPAS with the correct relationship between the predicate and its arguments, the entry for bo was altered. A solution which allows for an arbitrary preposition was favoured over creating a new template that specifies the possible following prepositions. Analysing the sentence above with the revised grammar produces structures of the following type:

(3-21)
bo, Anne, studentkollektiv
live, Anne, student housing

The same phenomenon was observed for a few other verbs with prepositional phrases as complements, such as gjemme i (hide in) and observere i (observe in). In these instances, the structures were manually edited.
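
The manual combination described above was done by hand in this project; purely as an illustration, the following Python sketch shows how a pair of the type in (3-18) could be folded into one structure of the type in (3-21). The tuple representation, the function name merge_adjacent and the small preposition set are assumptions made for this example and are not part of Ekstraktor.pl. The sketch also assumes that the prepositional structure directly follows the verbal structure it belongs to, as in (3-19).

```python
# Hypothetical representation (an assumption for this sketch):
#   verbal structure:        (verb, arg1)
#   prepositional structure: (preposition, verb, noun)
PREPOSITIONS = {"i"}  # illustrative only; a real list would be longer


def merge_adjacent(epas_list):
    """Combine (verb, arg1) + (prep, verb, noun) into (verb, arg1, noun).

    Only a prepositional structure that directly follows a two-place
    verbal structure with the same verb is merged; everything else is
    passed through unchanged.
    """
    merged = []
    i = 0
    while i < len(epas_list):
        current = epas_list[i]
        following = epas_list[i + 1] if i + 1 < len(epas_list) else None
        if (
            len(current) == 2
            and following is not None
            and len(following) == 3
            and following[0] in PREPOSITIONS
            and following[1] == current[0]  # same verbal predicate
        ):
            verb, arg1 = current
            merged.append((verb, arg1, following[2]))
            i += 2  # skip the absorbed prepositional structure
        else:
            merged.append(current)
            i += 1
    return merged


# The pair from (3-19), followed by an unrelated structure.
raw = [("bo", "Anne"), ("i", "bo", "studentkollektiv"), ("ankomme", "etterforsker")]
print(merge_adjacent(raw))
```

Relying on adjacency keeps the sketch short, but it would merge the wrong structures if the extraction order differed; a more robust version would need some sentence identifier on each structure, which the raw list shown here does not carry.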

3.6.2 Irrelevant structures

Some of the structures that were extracted correctly from the text collection were simply removed in the final post-editing of the EPAS list. These structures were not directly relevant for the later analysis process and would not contribute any valuable information for the