Unni Cathrine Eiken February 2005
with such an approach is stated by the developers: “the strategy is simple, but requires a fairly large amount of knowledge to be useful for a broad range of cases” (Carbonell and Brown 1988, p. 97). Generally speaking, the knowledge bases that knowledge-based systems for anaphora resolution rely on are difficult to represent and process, and require a considerable amount of human input (Mitkov 2001, p. 110). The information is structured using different frameworks; often each anaphora resolution system structures its knowledge base in a system-specific manner. Rather than outlining the various specific methods belonging to the traditional approaches, this section briefly mentions some of the formats used for knowledge representation.

Several frameworks have been developed to meet the need for a formalism to represent real-world or domain knowledge. Most of these have been part of specific anaphora resolution systems and have not constituted independent frameworks for the representation of real-world knowledge. Minsky’s Frames (Minsky 1975, in Botley and McEnery 2000) is a framework for representing knowledge about stereotyped objects and events. The frames are dynamic in the sense that the information they hold about a particular object or event can change when new information is encountered. Input to the system is interpreted in accordance with the information present in the frames; the frames generate expectations about the input (Botley and McEnery 2000, p. 12). If a “shooting frame” is evoked when processing the sentence in (2-9a), it creates the expectation that if somebody misses, it is likely to be the same person who was doing the shooting. Following such an expectation, it is easy to identify the correct antecedent for the anaphor.
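The idea that a frame stores role slots and generates expectations about incoming input can be illustrated with a minimal sketch. This is not an implementation from the literature; the dictionary layout, the `resolve_pronoun` helper and all names are illustrative assumptions, chosen only to show how a stored expectation (“whoever misses is probably the shooter”) can point a pronoun back to its antecedent.

```python
# Minimal sketch of a Minsky-style frame (illustrative, not from the thesis).
# A frame for a stereotyped "shooting" event holds role slots plus an
# expectation linking the subject of "miss" back to the shooter role.
shooting_frame = {
    "event": "shooting",
    "roles": {"shooter": None, "target": None},
    # Expectation: the entity that misses is likely the one that shot.
    "expectations": {"miss": "shooter"},
}

def resolve_pronoun(frame, verb):
    """Return the antecedent predicted for a pronoun that is the subject
    of `verb`, using the role expectation stored in the frame (or None
    if the frame holds no expectation for this verb)."""
    expected_role = frame["expectations"].get(verb)
    if expected_role is None:
        return None
    return frame["roles"].get(expected_role)

# Processing "John shot at the target" fills the role slots ...
shooting_frame["roles"]["shooter"] = "John"
shooting_frame["roles"]["target"] = "the target"
# ... so for a following "He missed", the frame predicts the antecedent:
print(resolve_pronoun(shooting_frame, "miss"))  # prints: John
```

The point of the sketch is only that the expectation is data stored in the frame, so the resolution step itself is a trivial lookup; the difficult part, as the surrounding discussion notes, is hand-building such knowledge for a broad range of events.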
Schank’s Scripts (Schank 1972, in Botley and McEnery 2000) have some similarity to Minsky’s Frames, but are primarily used to represent knowledge about events which do not undergo change (Botley and McEnery 2000, p. 12). Information about role assignment and the sequence of events in given contexts is represented in the script.

2.1.2.3 Alternative approaches to anaphora resolution

Hand-coded knowledge bases that aim at representing real-world or domain knowledge are expensive and labor-intensive to build and maintain. As a consequence, the focus has shifted in the last 15 years toward systems that rely less heavily on world knowledge (see Mitkov 2003
for an overview). Many of these systems incorporate semantic and real-world knowledge, but use methods that allow this information to be collected with a high degree of automation (Baldwin 1997; Dagan and Itai 1990; Dagan et al. 1995; Nasukawa 1994; inter alia). Mitkov (2003) terms these systems knowledge-poor and attributes their growth in number in recent years to the fact that corpora and similar electronic linguistic resources have become better, larger and more available. Some of these systems do not really attempt to build a world- or domain-knowledge base (Baldwin 1997; Nasukawa 1994), but rather look at features such as co-occurrence patterns in the text itself, while others integrate corpora and use them as a form of abstract knowledge base (Dagan and Itai 1990; Dagan et al. 1995).

Among the different “alternative” approaches, Dagan and Itai’s (1990) statistical approach, Dagan et al.’s (1995) estimation of unseen patterns and Nasukawa’s (1994) knowledge-free method are of particular interest for this project. Dagan and Itai’s (1990) method uses co-occurrence patterns observed in a corpus as a type of selectional restriction. Co-occurrence patterns observed in a large corpus are thought to reflect the semantic constraints that apply to natural language. Candidates for antecedents of the anaphor it are identified in the text and substituted for the anaphor to be resolved. This produces co-occurrence patterns that are checked against the corpus. The candidate appearing in the most frequent co-occurrence pattern is then chosen as the antecedent. This method relies on a large corpus, as only patterns which have actually been observed in the corpus are considered. Infrequent patterns will not be picked, since they will generally not appear at the top of the pattern list. Dagan et al. (1995) offer a solution to this problem by presenting a similar method which also estimates the probability of co-occurrence patterns that have not been observed in the training data. They stress the importance of distinguishing between probable and improbable unobserved co-occurrence patterns and emphasise that the “distinctions ought to be made using the data that do occur in the corpus” (Dagan et al. 1995, p. 164). Analogies are made between specific unseen co-occurrence patterns and observed co-occurrences which contain similar words, with word similarity determined by a similarity metric. Patterns that contain words similar to the target word and that have been observed in the training data are used to calculate how likely the target word is to occur in the same pattern. Nasukawa (1994) reports a resolution rate of 93.8% for an even more knowledge-poor method for pronoun resolution. Instead of drawing information from a corpus, word frequency and co-occurrence patterns in the text itself are used to filter out the most likely candidate for the antecedent. In Nasukawa’s approach, inter-sentential data is