Unni Cathrine Eiken February 2005
2 Theoretical background

In order to understand the motivation for developing the extraction and classification method described in the present work, a brief explanation of the theoretical foundation on which the method is based is needed. This chapter describes that theoretical background. Section 2.1 outlines the concept of anaphora resolution and the need for context information in anaphora resolution systems. Section 2.2 explains the notion of using context as a means of identifying semantically similar words.

2.1 Anaphora resolution

Most natural language texts contain an abundance of pronouns and other expressions that are referentially linked to other items in the text. To understand the meaning conveyed by a text, one needs a method for finding out which entities these expressions are linked to. It is difficult to determine what a pronoun refers to without taking context and real-world knowledge into account, because natural language requires a certain amount of context to be intelligible. We distinguish between linguistic context, which denotes the concrete linguistic setting in which a given word occurs, and a more general notion of context that refers to the non-linguistic setting. In the following, the theoretical basics of anaphora are presented before some approaches to anaphora resolution are briefly outlined.

Anaphor and referring expression are both terms used for words that point back either to other words or to entities in the world. Anaphora² can be defined as the linguistic phenomenon of using an anaphor to point back to a previously mentioned item in a text (Mitkov 2003, p. 266).

² In the present work, the term anaphora is used in line with current literature on anaphora resolution. Anaphora is the linguistic phenomenon of an anaphor pointing to another item in the text and should not be understood as the plural form of anaphor, which is anaphors.

In the Concise Oxford Dictionary of Linguistics (Matthews 1997), a referring expression is defined as a linguistic element that refers to a specific entity in the real world, termed a referent. A referring expression can be any natural language expression used to refer to a real-world entity, including nouns and pronouns. Thus, the linguistic expressions James and he in a given text may both refer to a person called “James” existing in the real world.
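The distinction between a referring expression and its referent can be made concrete by storing each expression together with an identifier for the entity it refers to. The following is a minimal sketch under stated assumptions. The class `ReferringExpression`, the helpers `locate` and `expressions_for`, the identifier format `person:…` and the example sentence are all hypothetical illustrations and are not part of the method developed in this work. The referent links are assigned by hand here. For the pronouns, working out those links is exactly the task that an anaphora resolution system has to perform.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferringExpression:
    """Hypothetical record linking a stretch of text to a referent."""
    text: str          # surface form, e.g. "James" or "he"
    offset: int        # character offset of the expression in the text
    referent_id: str   # identifier for the real-world entity referred to


def locate(text: str, word: str) -> int:
    """Return the character offset of the first whole-word match of `word`."""
    match = re.search(rf"\b{re.escape(word)}\b", text)
    if match is None:
        raise ValueError(f"{word!r} not found in text")
    return match.start()


def expressions_for(referent_id: str, expressions: list[ReferringExpression]) -> list[str]:
    """Return, in text order, every expression that refers to `referent_id`."""
    hits = [e for e in expressions if e.referent_id == referent_id]
    return [e.text for e in sorted(hits, key=lambda e: e.offset)]


# Hypothetical example sentence with hand-assigned referents: a noun and a
# pronoun can both refer to the same real-world entity.
text = "James met Anne, and he greeted her."
expressions = [
    ReferringExpression("James", locate(text, "James"), "person:James"),
    ReferringExpression("Anne", locate(text, "Anne"), "person:Anne"),
    ReferringExpression("he", locate(text, "he"), "person:James"),
    ReferringExpression("her", locate(text, "her"), "person:Anne"),
]

print(expressions_for("person:James", expressions))  # ['James', 'he']
print(expressions_for("person:Anne", expressions))   # ['Anne', 'her']
```

Grouping expressions by referent in this way means that one entity can be reached through several surface forms. That property is what the notions of antecedent and coreference build on.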
The term anaphor describes a linguistic element, often a pronoun or a nominal, that is linked to another linguistic element previously presented in the text (Mitkov 2003). An anaphoric reference is usually supported by a preceding nominal, which is called an antecedent. If a pronoun occurs before the expression that introduces its referent, the term cataphora applies (Jurafsky and Martin 2000, p. 675). Anaphora provides us with an indirect reference to a real-world entity. When a referring expression such as James has been introduced in a text, it allows for subsequent reference by anaphors such as he or the boy. The original referring expression is thus the antecedent of the subsequent anaphor, for example the pronoun he. If the anaphor and the antecedent it is linked to both have the same referent in the real world, they are termed coreferential (Mitkov 2003, p. 267).

(2-1) Politimannen sier at han har flere observasjoner
      The policeman says that he has several observations

In example (2-1) above, the pronoun han (he) is an anaphor that points back to its antecedent, the referring expression politimannen (the policeman). Han and politimannen both refer to the same real-world referent, namely the policeman, and are therefore coreferential.

The co-occurrence of an anaphor and its antecedent is subject to various complex structural conditions. These include constraints on how far apart the antecedent and the anaphor can be without impairing the understanding of the text. An elaborate discussion of these conditions is, however, beyond the scope of the present work.

Mitkov (2003, p. 268) distinguishes between the following types of anaphora:
• pronominal anaphora: The anaphor is a pronoun.
• lexical noun phrase anaphora: The anaphor is a definite description or proper name that gives additional information and has a meaning independent of the antecedent.
• verb anaphora: The anaphor is a verb and refers to an action.
• adverb anaphora: The anaphor is an adverb.
• zero anaphora: The anaphor is implicitly present in the text, but physically omitted.
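The definitions in this section can be combined in a small data model. This is a hypothetical illustration and is not the approach taken in the present work. In the sketch below, each mention has three fields:

- a token position;
- an assumed Universal Dependencies-style part-of-speech tag (`upos`);
- a hand-assigned referent identifier.

All class, function and field names are illustrative. A link is labelled as anaphora or cataphora by comparing positions, and coreference is tested by comparing referent identifiers. The anaphor is then assigned one of Mitkov's types through a deliberately crude tag lookup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnaphoraType(Enum):
    """The types of anaphora distinguished by Mitkov (2003, p. 268)."""
    PRONOMINAL = "pronominal anaphora"
    LEXICAL_NP = "lexical noun phrase anaphora"
    VERB = "verb anaphora"
    ADVERB = "adverb anaphora"
    ZERO = "zero anaphora"


@dataclass(frozen=True)
class Mention:
    text: Optional[str]  # None when the anaphor is omitted (zero anaphora)
    position: int        # token index within the sentence
    upos: str            # coarse part-of-speech tag (assumed UD-style)
    referent_id: str     # hand-assigned identifier of the real-world referent


# Assumed, crude mapping from the anaphor's tag to a type. A lexical noun
# phrase anaphor is really a whole definite description or proper name, so a
# single tag only approximates the definitions above.
TYPE_BY_UPOS = {
    "PRON": AnaphoraType.PRONOMINAL,
    "NOUN": AnaphoraType.LEXICAL_NP,
    "PROPN": AnaphoraType.LEXICAL_NP,
    "VERB": AnaphoraType.VERB,
    "ADV": AnaphoraType.ADVERB,
}


def anaphora_type(anaphor: Mention) -> Optional[AnaphoraType]:
    """Classify an anaphor; an omitted anaphor counts as zero anaphora."""
    if anaphor.text is None:
        return AnaphoraType.ZERO
    return TYPE_BY_UPOS.get(anaphor.upos)


def describe_link(anaphor: Mention, linked: Mention) -> dict:
    """Describe the link between an anaphor and the expression it is linked to."""
    return {
        # Pointing back to an earlier expression is anaphora; pointing
        # forward to a later one is cataphora.
        "direction": "anaphora" if linked.position < anaphor.position else "cataphora",
        # Coreferential: both expressions have the same real-world referent.
        "coreferential": anaphor.referent_id == linked.referent_id,
        "type": anaphora_type(anaphor),
    }


# Example (2-1): Politimannen(0) sier(1) at(2) han(3) har(4) flere(5) observasjoner(6)
politimannen = Mention("Politimannen", 0, "NOUN", "the-policeman")
han = Mention("han", 3, "PRON", "the-policeman")
link = describe_link(han, politimannen)
print(link["direction"], link["coreferential"], link["type"].value)
# prints: anaphora True pronominal anaphora
```

Applied to example (2-1), the link from han to Politimannen is classified as pronominal, backward-pointing and coreferential. The tag lookup is only a stand-in. Recognising anaphors in real text, and choosing between candidate antecedents, requires considerably more than a part-of-speech tag.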