
Unni Cathrine Eiken February 2005



exploited in the process of resolving the pronoun it. The likelihood of each antecedent candidate is determined statistically, and the candidate with the highest value is selected by the system. The approach uses a syntax-based heuristic rule for the selection of the antecedent. Nasukawa states that approaches using real-world knowledge are not yet large-scale enough to be of use in broad-coverage systems, and instead attempts to extract information corresponding to case frames in world knowledge from the texts to be processed (Nasukawa 1994, p. 1157). In this way, collocation patterns are used as a form of world knowledge for the domain of the texts.

As outlined in the introduction, this thesis describes a method that can aid in the resolution of anaphoric expressions whose antecedents can only be correctly identified with real-world knowledge. The method automatically extracts and classifies nominal arguments, resulting in associated classes of similar words. This is a knowledge-poor method in the sense that it does not require a comprehensive knowledge base to be built, but rather uses data and co-occurrence patterns from a corpus to find the most likely antecedent from a list of possible candidates found in a text.
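The idea of choosing among antecedent candidates by corpus co-occurrence can be illustrated with a minimal sketch. It is not the implementation described in this thesis: it skips the classification of nominal arguments into classes of similar words and relies on raw counts only. The triples in `OBSERVED`, the function names `build_counts` and `rank_candidates`, and the English example sentence are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical toy "corpus" of (verb, argument role, noun) triples.
# These observations are invented for illustration, not taken from real data.
OBSERVED = [
    ("drink", "object", "coffee"),
    ("drink", "object", "tea"),
    ("drink", "object", "coffee"),
    ("read", "object", "newspaper"),
    ("read", "object", "book"),
    ("drink", "subject", "man"),
]


def build_counts(triples):
    """Count how often each noun was observed in each (verb, role) slot."""
    return Counter(triples)


def rank_candidates(counts, verb, role, candidates):
    """Order candidates by how often each was seen as `role` of `verb`.

    Unseen combinations count as 0. sorted() is stable, so candidates with
    equal counts keep their original relative order.
    """
    return sorted(
        candidates,
        key=lambda noun: counts[(verb, role, noun)],
        reverse=True,
    )


counts = build_counts(OBSERVED)

# Invented example: "He picked up the newspaper and the coffee. He drank it."
# The pronoun "it" is the object of "drink"; both nouns are candidates.
ranking = rank_candidates(counts, "drink", "object", ["newspaper", "coffee"])
print(ranking)
```

In this toy setting the candidate observed most often in the same argument slot as the pronoun is ranked first. Grouping nouns into classes of similar words, as the method in this thesis does, would let such a ranking generalise to candidates that never occurred with the verb in the corpus.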

2.1.3 Anaphora resolution and text summarisation<br />

As already mentioned, several NLP applications need a reliable means of resolving anaphoric expressions and identifying coreferences. In the field of text summarisation, which belongs to the domain of the KunDoc project, anaphora resolution is vital for finding coreferential chains, identifying discourse structure and ultimately producing a coherent summary. Systems for automatic text summarisation need to make a number of choices regarding the resolution of anaphoric expressions. Mani (2001, p. 70) identifies “dangling anaphors” as a coherence problem in automatic summaries: without a means of resolving anaphoric expressions, the summary may contain anaphors but not the antecedents they refer to. This disturbs the coherence of the summary, because not all the information that the reader needs is present in the summarised text. The (constructed) example below illustrates this: consider the full text example in (2-15a) in connection with the summarised version in (2-15b). Neither instance of the pronoun han (he) in the summarised version in (2-15b) has an identified referent in the text. For a reader presented only with the summary, it is highly unclear what these pronouns refer to.

