Unni Cathrine Eiken February 2005


c. PERP: gjerningsmann, drapsmann (perpetrator, killer)
d. PERSON: person, bilfører, syklist, vedkommende (person, car driver, biker, generic-nom)
e. OBSERV: teori, observasjon (theory, observation)
f. PLACE: studentkollektiv, Førde (student housing, Førde)

The classes of words shown in (4-11) form groups of concepts which occur in the same contextual environments within the thematic domain that the EPAS are extracted from. The groupings seem to reflect real semantic clusters in the sense that one can easily find a label to describe each group. For the text collection in the present work, these six concept groups represent six distinct semantic groupings whose members share many features with respect to pattern distribution in the data set. With a larger data set to run the concept association on, more concept groups, and more members within each group, would have been a likely outcome. The results of the concept association on the small data set in this project do, however, suggest that the method is feasible, and show that frequent patterns in smaller text collections also help capture interesting concept groupings.

4.3 Step III: Using concept groups in TiMBL

The concept groups which emerged from the association performed in section 4.2 above represent clusters of words that occur in similar constellations in the data material. The emergence of concept groups whose members intuitively seem to have some semantic resemblance to each other supports the view that the context a word fits into does indeed say something about what the word means, as per the distributional hypothesis.
In the introduction to this chapter, it was stated that the aim of classifying the EPAS list is twofold. On the one hand, it is of interest to see to what degree the environments that an argument occurs in over a collection of texts provide sufficient cues to ensure a correct guess of which argument can be expected in a specific context. On the other hand, it is equally interesting to see whether classification can narrow down the set of possible arguments for a specific context pattern. Through the association technique, six groups of words emerged, the members of each group sharing the feature that they all tend to occur in the same environments.

Previously, it has been stated that some anaphors need access to information about the world in order to be resolved. This information can to some extent be represented by the concept groups associated from the data set. By identifying groups of words which typically occur in the same textual environment, an intuition about which words to expect in which contexts is captured. In the event of "difficult" anaphors which depend on world knowledge, an anaphora resolution system can retrieve potential antecedents from the text, check which concept group an expected antecedent is likely to belong to, and consequently choose the antecedent candidate belonging to the expected concept group. As a first step in examining the usefulness of concept groups in combination with anaphora, experiments aiming at enhancing the performance of the classifier in section 4.1 were performed. These experiments are described in the following section.

4.3.1 Testing

Tests were performed in TiMBL, using the relevant concept group as the category for a feature pattern. Analogous to the testing in section 4.1.2 above, two separate test sets were prepared, one for the classification of each argument.
In the cases where the relevant argument was a member of one of the concept groups, the head label of the concept group was used as the category label in the input data. If the relevant argument did not belong to any concept group, the argument itself was used as the category label, as in the tests in section 4.1.2. Example (4-12) below shows an excerpt of the input file used for training the classifier for argument 1 classification.

(4-12)
    drepe,gjerningsmann,kvinne,PERP
    drept,sykepleiestudent,?,WOMAN
    død,sykepleiestudent,?,WOMAN
    ekstra,patrulje,?,patrulje
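The labelling rule described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the input files, only one way the rule could be applied. The names CONCEPT_GROUPS, category_label and training_line are hypothetical, the comma-separated line format is read off example (4-12), and the group table is partial: it contains only the members listed in (4-11 c-f), plus sykepleiestudent under WOMAN as implied by example (4-12).

```python
# Partial concept groups (assumption: only members visible in (4-11 c-f)
# and in example (4-12); the full groups contain further members).
CONCEPT_GROUPS = {
    "PERP": ["gjerningsmann", "drapsmann"],
    "PERSON": ["person", "bilfører", "syklist", "vedkommende"],
    "OBSERV": ["teori", "observasjon"],
    "PLACE": ["studentkollektiv", "Førde"],
    "WOMAN": ["sykepleiestudent"],
}

# Reverse lookup: word -> head label of its concept group.
WORD_TO_GROUP = {
    word: label for label, words in CONCEPT_GROUPS.items() for word in words
}


def category_label(argument):
    """Return the concept group label, or the argument itself if ungrouped."""
    return WORD_TO_GROUP.get(argument, argument)


def training_line(predicate, arg1, arg2, target=1):
    """Build one input line: the EPAS features followed by the category.

    target selects which argument (1 or 2) supplies the category label.
    """
    argument = arg1 if target == 1 else arg2
    return ",".join([predicate, arg1, arg2, category_label(argument)])


# The four EPAS from example (4-12), labelled for argument 1 classification.
EPAS = [
    ("drepe", "gjerningsmann", "kvinne"),
    ("drept", "sykepleiestudent", "?"),
    ("død", "sykepleiestudent", "?"),
    ("ekstra", "patrulje", "?"),
]
LINES = [training_line(*epas) for epas in EPAS]
for line in LINES:
    print(line)
```

Since patrulje belongs to none of the groups in the table, it falls through the lookup and serves as its own category label, matching the last line of (4-12).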