Unni Cathrine Eiken February 2005
The classifier was trained on EPAS_arg1 with all pronouns removed and tested on all EPAS containing a pronoun in the position of argument 1. For the test set, each EPAS was completed with the antecedent of its pronoun. For the purposes of classification and testing, the antecedent was appended at the end of each EPAS, where it functions as the category label for the structure. In total, there were 26 EPAS with a pronoun in the position of argument 1. Example (4-3) shows an instance from the test file with a pronoun as argument 1:

(4-3)
få,pron,rapport,politi

When classifying with argument 1 as category label and testing on EPAS with pronouns as argument 1, TiMBL assigned the correct category in 57.69% (15/26) of the test cases. In one of the cases counted as wrong, the assigned category was not actually incorrect: the antecedent had a form that did not exist in the training material (antecedent: kvinne/vitne (woman/witness), assigned category: vitne (witness)). Furthermore, in six of the incorrectly classified cases, the category chosen by the classifier was semantically close to the correct antecedent. Example (4-4) shows these seven cases, in which the assigned category can be viewed as belonging to the same semantic group as the correct antecedent, and thus as at least a partially successful classification. Regarding all of these instances as successful category assignments would raise the share of correct categorisations to 84.61% (22/26).

(4-4)
Correct antecedent                      Assigned category
kvinne/vitne (woman/witness)            vitne (witness)
Fonn (Fonn)                             politi (police)
Kripos-spesialist (Kripos specialist)   politi (police)
politimester (police chief)             Fonn (Fonn)
politi (police)                         etterforsker (investigator)
politi (police)                         Fonn (Fonn)
Slåtten (Slåtten)                       kvinne (woman)
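The arithmetic behind the two figures above can be checked with a short sketch. It assumes only what the text states: a test instance is a comma-separated line whose last field is the category label, 15 of 26 cases were classified correctly, and seven further cases were near misses. The function names `split_instance` and `accuracy_percent` are hypothetical helpers introduced for illustration and are not part of TiMBL.

```python
def split_instance(line):
    """Split a comma-separated instance into (features, label).

    Assumption taken from the text: the antecedent is appended as the
    last field and serves as the category label.
    """
    *features, label = line.strip().split(",")
    return features, label


def accuracy_percent(correct, total):
    """Share of correct classifications, as a percentage."""
    return 100.0 * correct / total


# Counts reported in the text.
TOTAL_CASES = 26
STRICT_CORRECT = 15   # exact matches
NEAR_MISSES = 7       # the seven cases listed in (4-4)

strict = accuracy_percent(STRICT_CORRECT, TOTAL_CASES)
lenient = accuracy_percent(STRICT_CORRECT + NEAR_MISSES, TOTAL_CASES)

print(split_instance("få,pron,rapport,politi"))
print(f"strict:  {strict:.2f}%")   # 57.69%
print(f"lenient: {lenient:.2f}%")  # 84.62%
```

Rounding to two decimals gives 84.62% for the lenient count, so the last digit quoted in the text appears to be truncated rather than rounded.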
Test 2
Training set: EPAS_arg1 with no pronouns, argument 1 ignored.
Test method: leave-one-out
Result: 42.40% (81/191) correct classifications

When training and testing on the EPAS_arg1 list with pronouns removed, the classifier achieved a rather low accuracy of 42.40%. TiMBL's leave-one-out option makes it possible to train and test on the same material: each pattern in the training file is used in turn as a test case, while the remaining patterns are used as training material.

One reason for the relatively low percentage of correctly classified instances is most likely the small size of the data set. With only 191 patterns to learn from, the classifier neither sees enough diversity in the examples nor finds enough occurrences of the individual patterns to pick the correct category. Since politi (police) is by far the most frequent feature in the EPAS list, many instances are wrongly assigned the category "police" by virtue of the majority vote in the nearest neighbour classification. An attempt to avoid this effect is described in test 3.

Examining the instances where the classifier assigned the wrong category to an EPAS showed that in 27 of the incorrectly classified cases, the assigned category was semantically similar to the correct category. This suggests that the list in itself does contain some relevant information about the distribution of argument 1 in the data set. Example (4-5) shows the correct categories and the categories assigned by the classifier.

(4-5)
Correct category                        Assigned category
Anne                                    kvinne (woman)
Slåtten
drapsmann (killer)                      gjerningsmann (perpetrator)
etterforsker (investigator)             politi (police)
Fonn                                    lensmann (deputy)
                                        politi (police)
gjerningsmann (perpetrator)             person (person)
Kripos-spesialist (Kripos specialist)   politi (police)
kvinne (woman)                          23-åring (23-year-old)
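The leave-one-out procedure and the pull towards the most frequent category described for Test 2 can be illustrated with a minimal sketch. It is not TiMBL and does not reproduce its settings: it assumes a plain, unweighted overlap distance and a majority vote among all equally near neighbours, and the six instances are invented toy data, not taken from the EPAS list.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions in which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(train, features):
    """Majority vote among the training instances at the smallest distance."""
    best = min(overlap_distance(f, features) for f, _ in train)
    votes = Counter(label for f, label in train
                    if overlap_distance(f, features) == best)
    return votes.most_common(1)[0][0]


def leave_one_out(data):
    """Use each instance once as test case, the rest as training material."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if classify(rest, features) == label:
            hits += 1
    return hits, len(data)


# Hypothetical toy instances: (predicate, argument 2) -> argument 1.
toy = [
    (("avhøre", "vitne"), "politi"),
    (("avhøre", "nabo"), "politi"),
    (("finne", "spor"), "politi"),
    (("sikre", "spor"), "politi"),
    (("se", "mann"), "vitne"),
    (("høre", "skrik"), "kvinne"),
]

hits, total = leave_one_out(toy)
print(f"{hits}/{total} correct")  # 4/6 correct
```

In this toy set the two instances without any matching feature are both assigned politi, because every remaining instance is equally distant and politi simply has the most votes. This is the same frequency effect that the text gives as one reason for the low accuracy.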