Unni Cathrine Eiken February 2005



2 Theoretical background

To understand the motivation for developing the extraction and classification method described in the present work, one needs a brief explanation of the theoretical foundation on which the method is based. This chapter describes that background. Section 2.1 outlines the concept of anaphora resolution and the need for context information in anaphora resolution systems. Section 2.2 explains the notion of using context as a means to identify semantically similar words.

2.1 Anaphora resolution

Most natural language texts contain an abundance of pronouns and other expressions which are referentially linked to other items in the text. In order to understand the meaning conveyed by a text, one needs a method to find out which entities these expressions are linked to. It is difficult to determine what a pronoun refers to without taking context and real-world knowledge into account. Natural language requires a certain amount of context to be intelligible. We distinguish between linguistic context, which denotes the concrete linguistic setting that a given word occurs in, and a more general notion of context that refers to the non-linguistic setting. In the following, the theoretical basics of anaphora are presented before some approaches to anaphora resolution are briefly outlined.

Anaphor and referring expression are both terms used for words that point back either to other words or to entities in the world. Anaphora[2] can be defined as the linguistic phenomenon of using an anaphor to point back to a previously mentioned item in a text (Mitkov 2003, p. 266).

In the Oxford Concise Dictionary of Linguistics (Matthews 1997), a referring expression is defined as a linguistic element that refers to a specific entity in the real world, termed a referent. A referring expression can be any natural language expression that is used to refer to a real-world entity, including nouns and pronouns. As such, the linguistic expressions James and he in a given text may both refer to a person called “James” existing in the real world.

[2] The term anaphora is used in the present work in line with current literature on anaphora resolution. Anaphora is the linguistic phenomenon of an anaphor pointing to another item in the text and should not be understood as the plural form of anaphor, which is anaphors.

The term anaphor describes a linguistic element, often a pronoun or a nominal, which is linked to another linguistic element previously presented in the text (Mitkov 2003). An anaphoric reference is usually supported by a preceding nominal, which is called an antecedent. If a referring pronoun occurs before the expression it is linked to, the term cataphora applies (Jurafsky and Martin 2000, p. 675).

Anaphora provides us with an indirect reference to a real-world entity. When a referring expression, such as James, has been introduced in a text, it allows for subsequent reference by anaphors, such as he or the boy. The original referring expression is therefore the antecedent of the subsequent anaphor, for example the pronoun he. If the anaphor and the antecedent it is linked to both have the same referent in the real world, they are termed coreferential (Mitkov 2003, p. 267).

(2-1) Politimannen sier at han har flere observasjoner
      The policeman says that he has several observations

In example (2-1) above, the pronoun han (he) is an anaphor which points back to its antecedent, the referring expression politimannen (the policeman). Han and politimannen both refer to the same real-world referent, the object “the policeman”, and are therefore coreferential.

There are various and complex structural conditions on the co-occurrence of an anaphor and its antecedent. These include constraints on how far apart the antecedent and the anaphor can be without disturbing the understanding of the text. An elaborate discussion of these conditions is, however, not within the scope of the present work.

Mitkov (2003, p. 268) distinguishes between the following types of anaphora:

• pronominal anaphora: The anaphor is a pronoun.
• lexical noun phrase anaphora: The anaphor is a definite description or proper name that gives additional information and has a meaning independent of the antecedent.
• verb anaphora: The anaphor is a verb and refers to an action.
• adverb anaphora: The anaphor is an adverb.
• zero anaphora: The anaphor is implicitly present in the text, but physically omitted.
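
The terminology introduced in this section can be made concrete with a small data model. The following Python sketch is purely illustrative and is not part of the method described in this work: the names Mention, AnaphoricLink and AnaphoraType, the token positions, and the referent identifier "policeman" are all assumptions chosen for the illustration. It encodes example (2-1), checks coreference by comparing referents, and treats a link as cataphoric when the pronoun precedes the expression it is linked to.

```python
from dataclasses import dataclass
from enum import Enum


class AnaphoraType(Enum):
    """The five types of anaphora listed from Mitkov (2003, p. 268)."""
    PRONOMINAL = "pronominal anaphora"
    LEXICAL_NP = "lexical noun phrase anaphora"
    VERB = "verb anaphora"
    ADVERB = "adverb anaphora"
    ZERO = "zero anaphora"


@dataclass(frozen=True)
class Mention:
    """A linguistic expression in a text (hypothetical representation)."""
    text: str       # surface form, e.g. "han"
    position: int   # token index in the sentence (assumed 0-based)
    referent: str   # made-up identifier for the real-world entity


@dataclass(frozen=True)
class AnaphoricLink:
    """Links an anaphor to the expression it points to."""
    anaphor: Mention
    antecedent: Mention
    kind: AnaphoraType

    def is_coreferential(self) -> bool:
        # Coreferential: both expressions have the same real-world referent.
        return self.anaphor.referent == self.antecedent.referent

    def is_cataphoric(self) -> bool:
        # Cataphora: the pronoun occurs before the expression it is linked to.
        return self.anaphor.position < self.antecedent.position


# Example (2-1): "Politimannen sier at han har flere observasjoner"
politimannen = Mention("Politimannen", position=0, referent="policeman")
han = Mention("han", position=3, referent="policeman")
link = AnaphoricLink(anaphor=han, antecedent=politimannen,
                     kind=AnaphoraType.PRONOMINAL)

print(link.is_coreferential())
print(link.is_cataphoric())
```

In this sketch the referent is given by hand. In an actual anaphora resolution system the referent is exactly what is unknown, and finding the antecedent of each anaphor is the task to be solved.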

