Unni Cathrine Eiken February 2005



which is the desired output category. The comma-separated values format was used for the EPAS classification. In order to classify the constituents of each EPAS based on the contextual patterns in the structure, each part of the EPAS was classified with reference to the other constituents in it. Somewhat analogous to the way that a screw can be described as being small, long and containing no holes, the argument åsted (crime scene) can be described through its co-occurrence with the predicate ankomme (arrive) and the argument etterforsker (investigator) (example (4-1)).

This makes it possible to train a classifier on the EPAS list, using the argument whose environment is to be learned as category label and the constituents of the EPAS as features. To keep the category from being explicitly present in the training material, and to ensure that the classifier was trained only on the environment of the desired category, the relevant feature was ignored using TiMBL’s ignore option. In order to classify the structures once for each argument type, two different data sets were prepared. Example (4-1) shows the format of the three-feature dataset that was used. The parentheses indicate that the feature in question was ignored when training and classifying.

(4-1)
   features                                  category
a. predicate, (argument 1), argument 2       argument 1
b. predicate, argument 1, (argument 2)       argument 2

Example (4-2) shows excerpts of the two input files: (4-2a) shows the structures with argument 1 as category, while (4-2b) shows the same structures with argument 2 as category. The classifier is given two constituents of an EPAS to learn from, and the target constituent is given as the category of the EPAS.

(4-2)
a. ankomme,etterforsker,?,etterforsker
   ankomme,etterforsker,?,etterforsker
   ankomme,etterforsker,åsted,etterforsker
   antyde,politi,?,politi
   avhøre,?,person,?
   avhøre,?,vedkommende,?
   avhøre,politi,vitne,politi

b. ankomme,etterforsker,?,?
   ankomme,etterforsker,?,?
   ankomme,etterforsker,åsted,åsted
   antyde,politi,?,?
   avhøre,?,person,person
   avhøre,?,vedkommende,vedkommende
   avhøre,politi,vitne,vitne

The output file that TiMBL creates after training on the input data and running on the test data consists of the input given in the test set, with the category predicted by TiMBL added at the end of each line. Further, the output supplied by TiMBL upon a successful training and testing round gives information about the actions in the various stages of analysis. TiMBL’s actions can be divided into three separate phases: in phase 1 the training data is analysed, in phase 2 the items in the training data are stored for efficient use during testing, and in phase 3 the trained classifier is applied to the test set.

For the purposes of the EPAS analysis, the default algorithm was used in the test phase. This algorithm computes the similarity between a test item and a training item in terms of weighted overlap: the total difference between two patterns is the sum of the relevance weights of those features which are not equal (Daelemans et al. 2003, p. 13).

The classification of the EPAS and the subsequent testing were carried out in two distinct steps: classification and testing of argument 1 and of argument 2 were done separately. The results of the classification and testing are described in the following sections.

4.1.2.1 Classifying argument 1

Several experiments were run through TiMBL with the aim of classifying occurrences of argument 1 according to the environment they occur in. The classifier was trained on all EPAS not containing pronouns and then tested. For the purpose of classifying occurrences of argument 1, an EPAS list with the relevant argument 1 as category label was used. In the following descriptions of the performed tests, this list will be referred to as EPAS_arg1.

Test 1
Training set: EPAS_arg1 with no pronouns, argument 1 ignored.
Test set: EPAS with pronouns in argument 1 position.
Result: 57.69% (15/26) correct classifications
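The weighted overlap comparison and the ignored feature described above can be illustrated with a minimal Python sketch. It is not TiMBL itself and does not reproduce the reported result. The training rows are taken from example (4-2a); the function names, the uniform weights, the tie-breaking rule and the test item with the pronoun han (he) are assumptions made for illustration only. TiMBL derives its relevance weights from the training data, which this sketch does not attempt.

```python
from collections import Counter

# Rows follow example (4-2a): predicate, argument 1, argument 2, category.
TRAIN = [
    ("ankomme", "etterforsker", "åsted", "etterforsker"),
    ("antyde", "politi", "?", "politi"),
    ("avhøre", "politi", "vitne", "politi"),
]

# Assumption: uniform relevance weights, one per feature.
UNIFORM_WEIGHTS = (1.0, 1.0, 1.0)


def overlap_distance(a, b, weights, ignored=()):
    """Sum the weights of the non-ignored features on which a and b differ."""
    return sum(
        w
        for i, (x, y, w) in enumerate(zip(a, b, weights))
        if i not in ignored and x != y
    )


def classify(features, train, weights, ignored=()):
    """Return the most frequent category among the nearest training items.

    Ties between categories are resolved in favour of the category seen
    first in the training list (an arbitrary choice for this sketch).
    """
    scored = [
        (overlap_distance(features, row[:-1], weights, ignored), row[-1])
        for row in train
    ]
    best = min(distance for distance, _ in scored)
    votes = Counter(label for distance, label in scored if distance == best)
    return votes.most_common(1)[0][0]


# Hypothetical test item with a pronoun in argument 1 position.
# Feature index 1 (argument 1) is ignored, as in Test 1.
prediction = classify(("avhøre", "han", "vitne"), TRAIN, UNIFORM_WEIGHTS, ignored={1})
print(prediction)
```

With argument 1 ignored, only the predicate and argument 2 contribute to the distance, so the pronoun is classified purely from its environment.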

