Unni Cathrine Eiken February 2005
which is the desired output category. The comma-separated values format was used for the EPAS classification. In order to classify the constituents of each EPAS based on the contextual patterns in the structure, each part of the EPAS was classified with reference to the other constituents in it. Somewhat analogous to the way that a screw can be described as being small, long and containing no holes, the argument åsted (crime scene) can be described through its co-occurrence with the predicate ankomme (arrive) and the argument etterforsker (investigator) (example (4-1)). This makes it possible to train a classifier on the EPAS list, using the argument whose environment is to be learned as the category label and each constituent in the EPAS as a feature. To prevent the category from being explicitly present in the training material, and to ensure that the classifier was trained only on the environment of the desired category, the relevant feature was ignored using TiMBL's ignore option. In order to classify the structures once for each argument type, two different data sets were prepared. Example (4-1) shows the format of the three-feature data set that was used. The parentheses indicate that the feature in question was ignored when training and classifying.

(4-1)
   features                                  category
a. predicate, (argument 1), argument 2       argument 1
b. predicate, argument 1, (argument 2)       argument 2

Example (4-2) shows excerpts of the two input files: (4-2a) shows the structures with argument 1 as the category, while (4-2b) shows the same structures with argument 2 as the category. The classifier is given two constituents of an EPAS to learn from, and the target constituent is given as the category of the EPAS.

(4-2)
a. ankomme,etterforsker,?,etterforsker
   ankomme,etterforsker,?,etterforsker
   ankomme,etterforsker,åsted,etterforsker
   antyde,politi,?,politi
   avhøre,?,person,?
   avhøre,?,vedkommende,?
   avhøre,politi,vitne,politi
b. ankomme,etterforsker,?,?
   ankomme,etterforsker,?,?
   ankomme,etterforsker,åsted,åsted
   antyde,politi,?,?
   avhøre,?,person,person
   avhøre,?,vedkommende,vedkommende
   avhøre,politi,vitne,vitne

The output file that TiMBL creates once it has been trained on the input data and tested on the test data consists of the input given in the test set, with the category predicted by TiMBL added at the end of each line. Further, the output supplied by TiMBL after a successful training and testing round gives information about the actions in the various stages of analysis. TiMBL's actions can be divided into three separate phases: in phase 1 the training data is analysed, in phase 2 the items in the training data are stored for efficient use during testing, and in phase 3 the trained classifier is applied to the test set. For the purposes of the EPAS analysis, the default algorithm was used in the test phase. This algorithm computes the similarity between a test item and a training item in terms of weighted overlap; the total difference between two patterns is the sum of the relevance weights of those features which are not equal (Daelemans et al. 2003, p. 13).

The classification of the EPAS and the subsequent testing were carried out in two distinct steps: classification and testing of argument 1 and of argument 2 were done separately. The results of the classification and testing are described in the following sections.

4.1.2.1 Classifying argument 1

Several experiments were run through TiMBL with the aim of classifying occurrences of argument 1 according to the environment they occur in. The classifier was trained on all EPAS not containing pronouns and then tested. For the purpose of classifying occurrences of argument 1, an EPAS list with the relevant argument 1 as category label was used. In the following descriptions of the performed tests, this list will be referred to as EPAS_arg1.

Test 1
Training set: EPAS_arg1 with no pronouns, argument 1 ignored.
Test set: EPAS with pronouns in argument 1 position.
Result: 57.69% (15/26) correct classifications
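
The data preparation in example (4-2) and the weighted overlap measure quoted from Daelemans et al. (2003) can be illustrated with a small Python sketch. This is not TiMBL and not the code used in the thesis: the function names (to_instances, to_csv_lines, overlap_distance, classify) are hypothetical, the feature weights are assumed to be uniform rather than computed from the training data, and only the nearest training items (distance ties included) vote on the category. The test item with the pronoun han (he) is an invented illustration, not taken from the test set.

```python
from collections import Counter

# EPAS triples (predicate, argument 1, argument 2) copied from example (4-2);
# "?" marks an empty argument slot.
EPAS = [
    ("ankomme", "etterforsker", "?"),
    ("ankomme", "etterforsker", "?"),
    ("ankomme", "etterforsker", "åsted"),
    ("antyde", "politi", "?"),
    ("avhøre", "?", "person"),
    ("avhøre", "?", "vedkommende"),
    ("avhøre", "politi", "vitne"),
]


def to_instances(epas, target):
    """Pair each triple with its category: target=1 for argument 1, 2 for argument 2."""
    return [(triple, triple[target]) for triple in epas]


def to_csv_lines(instances):
    """Render instances in the comma-separated layout of example (4-2)."""
    return [",".join(features + (category,)) for features, category in instances]


def overlap_distance(x, y, weights, ignored):
    """Sum the weights of the non-ignored features on which x and y differ."""
    return sum(
        w
        for i, (a, b, w) in enumerate(zip(x, y, weights))
        if i not in ignored and a != b
    )


def classify(test_features, training, weights, ignored):
    """Return the majority category among the training items nearest to the test item."""
    scored = [
        (overlap_distance(test_features, features, weights, ignored), category)
        for features, category in training
    ]
    best = min(distance for distance, _ in scored)
    votes = Counter(category for distance, category in scored if distance == best)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    arg1_data = to_instances(EPAS, target=1)
    print("\n".join(to_csv_lines(arg1_data)))

    # Assumption: uniform weights; feature index 1 (argument 1) is ignored,
    # mirroring the parenthesised feature in (4-1a).
    weights = (1.0, 1.0, 1.0)
    prediction = classify(("ankomme", "han", "åsted"), arg1_data, weights, ignored={1})
    print(prediction)

    # Accuracy reported for Test 1: 15 correct out of 26 test items.
    print(f"{15 / 26:.2%}")
```

Because the category feature is skipped in overlap_distance, the pronoun in the argument 1 slot plays no role in the comparison; the prediction rests only on the predicate and the other argument, which is the intention behind ignoring the feature.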