Unni Cathrine Eiken February 2005
process. To be as useful as possible, the meaning structures should be normalised and generalisable. The examples above show how normalisation through the use of EPAS realises the concept of canonical form to some degree, which seems particularly useful for the purpose of the present work. If grammatical relations such as subject and object were used as reference points instead, semantically equivalent sentences, such as (3-2a) and (3-2b), would be given different meaning structures due to the difference in verbal voice. Structuring the meanings conveyed by the sentences in (3-2) within a grammatical relations paradigm would make it necessary to mark the verbal voice as well as the grammatical relations. In addition, active and passive structures would have to be treated differently in the subsequent analysis. Basing the extraction merely on syntactic properties of the sentences in the corpus would make the extracted material very difficult to classify, mainly because similar meanings would be represented differently.
The advantages of a normalised and generalisable dataset are further clarified by the following example. Upon a simple grammatical analysis, the sentences shown in (3-2) can be categorised based on the syntactic roles predicate, subject and object. The result of such a classification is shown in examples (3-5) and (3-6):
(3-5)
   predicate   subject    object
a. drepe       morder     kvinne
   kill        murderer   woman
b. drepe       kvinne     morder
   kill        woman      murderer
c. drepe       kvinne     ?
   kill        woman
The structures in (3-5) above can be extracted upon part-of-speech tagging of the sentences in (3-2). The active and passive predicates receive the same structure, and as no semantic information is available, the arguments are structured according to their status as subject or object. Attempting to classify these subjects and objects based on their co-occurrence with the predicate produces groupings of words which are not directly generalisable. Murderer
and woman occur together both in subject and object position, which does not reflect the preferred
selectional constraints within the domain.
(3-6)
   predicate   subject    object
a. drepe       morder     kvinne
   kill        murderer   woman
b. drepes      kvinne     morder
   is-killed   woman      murderer
c. drepes      kvinne     ?
   is-killed   woman
Example (3-6) provides a more elegant structuring. However, because an extraction method based on
syntactic relations is unable to generalise over verbal voice, two separate predicates are
extracted, one for the passive and one for the active voice. Even though, logically, the same
action is performed on the entity the woman in both the active and the passive sentences in (3-6),
a method as outlined above would not allow for a straightforward interpretation of this. The
generalisation between active and passive versions of the same sentence is lost in such an
approach. This would result in a higher number of predicates, and therefore in less generalisable
data. It is likely that results as outlined above would also be of less use as an aid for guessing
referents in an anaphora resolution system, precisely because of the lower level of generalisability.
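
The contrast between the two structurings can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the extraction method used in this project: the names SyntacticTriple, normalise, TRIPLES and LEMMAS are hypothetical, the passive flag is assumed to be supplied by a prior analysis, and the lemma table is written by hand. The sketch takes the three rows of (3-6) and maps them to voice-independent (predicate, arg1, arg2) structures, writing "?" for an unidentified argument.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class SyntacticTriple(NamedTuple):
    """One row of (3-6): a predicate with its syntactic subject and object."""
    predicate: str          # verb form as extracted, e.g. "drepe" or "drepes"
    subject: Optional[str]
    obj: Optional[str]      # for passives: the agent, as listed in (3-6)
    passive: bool           # assumed to be known from a prior analysis


def normalise(t: SyntacticTriple, lemmas: Dict[str, str]) -> Tuple[str, str, str]:
    """Map a syntactic triple to (predicate, arg1, arg2), independent of voice."""
    lemma = lemmas.get(t.predicate, t.predicate)
    if t.passive:
        # The passive subject is the logical second argument.
        arg1, arg2 = t.obj, t.subject
    else:
        arg1, arg2 = t.subject, t.obj
    return (lemma, arg1 or "?", arg2 or "?")


# The three rows of (3-6); hand-written illustration data.
TRIPLES = [
    SyntacticTriple("drepe", "morder", "kvinne", passive=False),
    SyntacticTriple("drepes", "kvinne", "morder", passive=True),
    SyntacticTriple("drepes", "kvinne", None, passive=True),
]
LEMMAS = {"drepes": "drepe"}  # hypothetical lemma table

syntactic_predicates = {t.predicate for t in TRIPLES}
normalised = [normalise(t, LEMMAS) for t in TRIPLES]
normalised_predicates = {predicate for predicate, _, _ in normalised}
```

Under these assumptions, the syntactic view yields two predicates (drepe and drepes), whereas the normalised view yields a single predicate in which kvinne always fills the second argument slot and morder, where present, the first.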
3.2.1 What is represented in the EPAS?
Jurafsky and Martin (2000, p. 510) state that all languages have predicate-argument structures at
the core of their semantic structure. They further describe how the grammar organises the
predicate-argument structure and how selectional constraints restrict the ways in which other words
and phrases can combine with a given word. In this project, a simplified version of
predicate-argument structures is used as meaning representation. The EPAS, or meaning
representations, are limited to at most two nominal arguments. Either one of the arguments in an
EPAS may be empty/unidentified. This means that the EPAS extracted from my texts will belong to one
of the following three patterns: