Unni Cathrine Eiken February 2005



The classifier was trained on EPAS_arg1 with no pronouns and tested on all EPAS containing a pronoun in the position of argument 1. For the test set, each EPAS was completed with the antecedent of its pronoun. For the purposes of classification and testing, the antecedent was appended at the end of each EPAS, where it functions as the category label for the structure. In total, there were 26 EPAS with a pronoun in the position of argument 1. (4-3) below shows an example from the test file with a pronoun as argument 1:

(4-3)
få,pron,rapport,politi

When classifying with argument 1 as category label and testing on EPAS with pronouns as argument 1, TiMBL assigned the correct category in 57.69% (15/26) of the test cases. In one of the cases where the classifier had assigned the “wrong” category, the assignment was not actually incorrect: the antecedent had a form that did not exist in the training material (antecedent: kvinne/vitne (woman/witness), assigned category: vitne (witness)). Furthermore, in six of the incorrectly classified cases, the category chosen by the classifier was semantically close to the correct antecedent. Example (4-4) below lists these seven cases, in which the assigned category can be viewed as belonging to the same semantic group as the correct antecedent, and thus as at least a partially successful classification. Counting all seven as successful category assignments would raise the classifier’s share of correct categorisations to 84.61% (22/26).

(4-4)
Correct antecedent                       Assigned category
kvinne/vitne (woman/witness)             vitne (witness)
Fonn (Fonn)                              politi (police)
Kripos-spesialist (Kripos specialist)    politi (police)
politimester (police chief)              Fonn (Fonn)
politi (police)                          etterforsker (investigator)
politi (police)                          Fonn (Fonn)
Slåtten (Slåtten)                        kvinne (woman)
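The instance format in (4-3) and the two accuracy figures can be checked with a few lines of Python. This is only a minimal sketch: the helper names `parse_instance` and `accuracy_percent` are illustrative and not part of TiMBL, and the counts are the ones reported in the text.

```python
def parse_instance(line):
    """Split a comma-separated EPAS line into (features, label).

    The last field is treated as the category label, as in example (4-3).
    """
    *features, label = line.strip().split(",")
    return features, label


def accuracy_percent(correct, total):
    """Share of correct classifications, as a percentage."""
    return 100.0 * correct / total


# Example instance from (4-3): the appended antecedent is the label.
features, label = parse_instance("få,pron,rapport,politi")

TOTAL = 26       # EPAS with a pronoun as argument 1
EXACT = 15       # exact matches reported for TiMBL
SAME_GROUP = 7   # cases listed in (4-4)

strict = accuracy_percent(EXACT, TOTAL)
lenient = accuracy_percent(EXACT + SAME_GROUP, TOTAL)

print(features, label)
print(f"strict:  {strict:.3f}%")
print(f"lenient: {lenient:.3f}%")
```

The lenient value comes out at about 84.615%, so the figure quoted in the text appears to be truncated rather than rounded.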

Test 2
Training set: EPAS_arg1 with no pronouns, argument 1 ignored.
Test method: leave-one-out
Result: 42.40% (81/191) correct classifications

When trained and tested on the EPAS_arg1 list with pronouns removed, the classifier reached a rather poor accuracy of 42.40%. TiMBL’s leave-one-out option makes it possible to train and test on the same material: each pattern in the training file is used in turn as a test case, while the remaining patterns serve as training material. One reason for the relatively low percentage of correctly classified instances is most likely the small size of the data set. With only 191 patterns to learn from, the classifier sees too little diversity in the examples, and too few occurrences of the individual patterns, to pick the correct category reliably. Since politi (police) is by far the most frequent feature in the EPAS list, many instances are wrongly assigned the category “police” by the majority vote of the nearest-neighbour classification. An attempt to avoid this effect is described in test 3.

Examining the instances where the classifier assigned the wrong category to an EPAS showed that in 27 of the incorrectly classified cases, the assigned category was semantically similar to the correct category. This suggests that the list itself does contain some relevant information about the distribution of argument 1 in the data set. Example (4-5) below shows the correct categories and the categories assigned by the classifier.

(4-5)
Correct category Assigned category
Anne kvinne (woman) Slåtten drapsmann (killer) gjerningsmann (perpetrator) etterforsker (investigator) politi (police) Fonn lensmann (deputy) politi (police) gjerningsmann (perpetrator) person (person) Kripos-spesialist (Kripos specialist) politi (police) kvinne (woman) 23-åring (23-year-old)
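The leave-one-out procedure and the majority-class effect described for Test 2 can be illustrated with a small sketch. It assumes a plain overlap distance and an unweighted majority vote among the nearest neighbours; TiMBL's actual metric and feature weighting may differ from this. The six instances are invented toy data, not taken from the EPAS list, and all function names are hypothetical.

```python
from collections import Counter

# Invented toy instances (NOT thesis data): (predicate, argument 2) -> label.
TOY = [
    (("avhøre", "vitne"), "politi"),
    (("avhøre", "mann"), "politi"),
    (("avhøre", "kvinne"), "politi"),
    (("finne", "vitne"), "politi"),
    (("se", "mann"), "vitne"),
    (("drepe", "kvinne"), "gjerningsmann"),
]


def overlap_distance(a, b):
    """Number of feature positions in which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(features, training):
    """Majority vote among all training instances at the smallest distance."""
    distances = [(overlap_distance(features, f), label) for f, label in training]
    nearest = min(d for d, _ in distances)
    votes = Counter(label for d, label in distances if d == nearest)
    return votes.most_common(1)[0][0]


def leave_one_out(instances):
    """Classify each instance using all the other instances as training data."""
    results = []
    for i, (features, gold) in enumerate(instances):
        rest = instances[:i] + instances[i + 1:]
        results.append((gold, classify(features, rest)))
    return results


results = leave_one_out(TOY)
correct = sum(gold == predicted for gold, predicted in results)
print(f"{correct}/{len(results)} correct")
for gold, predicted in results:
    if gold != predicted:
        print(f"gold={gold} assigned={predicted}")
```

On this toy data the two minority-class instances are both assigned politi, because their nearest neighbours all carry the majority label. This is the same kind of pull towards the most frequent category that the text describes for the real data set.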

