
Unni Cathrine Eiken February 2005


This thesis explores the value of using co-occurrence patterns to create concept groups that can help determine what a pronoun refers to. To find the entity that the pronoun han (he) refers to in example (1-1), the following two alternative patterns can be considered:

(1-3)
a. lensmann etterlyser vitne (sergeant calls-for witness)
b. gjerningsmann etterlyser vitne (perpetrator calls-for witness)

When deciding which of these patterns is the more likely one, data collected from a corpus can be consulted (Dagan and Itai 1990; Dagan et al. 1995; Nasukawa 1994; inter alia). If one of the patterns is found literally in the corpus, it receives a strong preference. If neither pattern occurs in the corpus, similar patterns can be considered instead. Because the patterns in example (1-4) below do occur in the corpus, they can help identify the correct referent for the anaphor in example (1-1):

(1-4)
a. politi etterlyser vitne (police call-for witness)
b. etterforsker etterlyser vitne (investigator calls-for witness)
c. lensmann avhører vitne (sergeant interviews witness)
d. politi avhører person (police interview person)
e. gjerningsmann dreper offer (perpetrator kills victim)
f. gjerningsmann angriper kvinne (perpetrator attacks woman)
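The first step of the procedure described above, looking each candidate pattern up literally in the corpus, can be sketched as follows. This is a minimal sketch under stated assumptions: the corpus is assumed to be already reduced to (subject, verb, object) triples, the patterns in (1-4) serve as a toy corpus, and the function name `literal_preference` is hypothetical rather than taken from the cited works.

```python
from collections import Counter

# Assumption: the corpus has been reduced to (subject, verb, object) triples.
# Toy corpus: the patterns in example (1-4).
CORPUS = [
    ("politi", "etterlyser", "vitne"),
    ("etterforsker", "etterlyser", "vitne"),
    ("lensmann", "avhører", "vitne"),
    ("politi", "avhører", "person"),
    ("gjerningsmann", "dreper", "offer"),
    ("gjerningsmann", "angriper", "kvinne"),
]


def literal_preference(candidates, verb, obj, corpus):
    """Hypothetical helper: return the candidate whose literal pattern
    (candidate, verb, obj) is most frequent in the corpus, or None when
    no candidate pattern occurs at all."""
    counts = Counter(corpus)  # frequency of each exact triple
    scores = {c: counts[(c, verb, obj)] for c in candidates}
    best = max(candidates, key=lambda c: scores[c])
    if scores[best] == 0:
        return None  # no literal evidence: back off to similar patterns
    return best
```

With this toy corpus, neither pattern in (1-3) occurs literally, so `literal_preference(["lensmann", "gjerningsmann"], "etterlyser", "vitne", CORPUS)` returns `None`, which is the signal to fall back on similar patterns.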

The patterns in (1-4) show that the word lensmann (sergeant) occurs in contexts similar to those of politi (police), which in turn occurs in contexts similar to those of etterforsker (investigator); in these examples, gjerningsmann (perpetrator) occurs instead with dreper (kills) and angriper (attacks). Using association techniques, lensmann can be grouped with the other arguments that occur in similar linguistic environments, and can subsequently be preferred as the referent in (1-1).
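One way to make this association step concrete is to represent each subject noun by the verbs and objects it occurs with, and to compare nouns by the overlap of these features. The sketch below assumes Jaccard overlap as the similarity measure; this is an illustrative stand-in, not the association technique used in this thesis, and the helper names (`context_features`, `jaccard`, `similarity_score`) are hypothetical. A candidate is scored by summing its similarity to every noun attested as subject of the target context (etterlyser, vitne).

```python
from collections import defaultdict

# Toy corpus: the (subject, verb, object) patterns in example (1-4).
CORPUS = [
    ("politi", "etterlyser", "vitne"),
    ("etterforsker", "etterlyser", "vitne"),
    ("lensmann", "avhører", "vitne"),
    ("politi", "avhører", "person"),
    ("gjerningsmann", "dreper", "offer"),
    ("gjerningsmann", "angriper", "kvinne"),
]


def context_features(corpus):
    """Map each subject noun to the set of verbs ("v:") and objects ("o:")
    it occurs with."""
    features = defaultdict(set)
    for subj, verb, obj in corpus:
        features[subj].add("v:" + verb)
        features[subj].add("o:" + obj)
    return features


def jaccard(a, b):
    """Assumed similarity measure: shared features / all features
    (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_score(candidate, verb, obj, corpus):
    """Sum the candidate's similarity to every noun attested as the
    subject of (verb, obj) in the corpus."""
    features = context_features(corpus)
    attested = {s for s, v, o in corpus if v == verb and o == obj}
    return sum(jaccard(features[candidate], features[n]) for n in attested)
```

On the (1-4) data, lensmann scores 0.5 + 1/3, about 0.83 (its overlap with politi and etterforsker), while gjerningsmann scores 0.0, so lensmann is preferred for the context etterlyser vitne.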

In recent years, approaches to anaphora resolution have focused on knowledge-poor strategies combined with corpora; at the same time, the idea of constructing a large and comprehensive base of real-world knowledge has to some extent been abandoned (see Mitkov 2003 for a brief overview). The approach in the present work expands the co-occurrence
