
Unni Cathrine Eiken February 2005


4 Classification

In order to use the structures in the EPAS list as an aid in anaphora resolution, they have to be processed. The pre-processing in section 3.6.4 has shown that there do exist interesting distributions in the data set, and indicates that certain groups of arguments display distributions particular to the domain. As a step toward exploring whether these distributions can be used to represent selectional restrictions, and thus function as real-world knowledge for the domain, the words in the EPAS list must be classified. This procedure uses the context patterns that a word occurs in to classify the word, for example allowing an argument to be classified according to the predicates it co-occurs with. A classification of this type gives information about which word to expect in a given context pattern, and the results can therefore be used in the process of choosing the most likely antecedent for an anaphor. In this respect, the most likely antecedent must be interpreted as the most likely antecedent given a particular contextual pattern.
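A minimal sketch of this kind of context-based classification, assuming each EPAS can be reduced to a hypothetical (predicate, argument) pair and that plain co-occurrence counts stand in for the classifier. The function names `build_model` and `most_likely_antecedent` and the toy data are illustrative only, not the method used in this thesis.

```python
from collections import Counter, defaultdict

def build_model(training_pairs):
    """Count how often each argument occurs with each predicate (hypothetical EPAS pairs)."""
    model = defaultdict(Counter)
    for predicate, argument in training_pairs:
        model[predicate][argument] += 1
    return model

def most_likely_antecedent(model, predicate, candidates):
    """Return the candidate seen most often with `predicate` in training.

    Ties (including unseen predicates) fall back to the first candidate in
    the given order, since max() returns the first maximal element.
    """
    counts = model.get(predicate, Counter())
    return max(candidates, key=lambda c: counts[c])

# Invented toy data, for illustration only.
pairs = [("drink", "water"), ("drink", "coffee"), ("read", "book")]
model = build_model(pairs)
print(most_likely_antecedent(model, "drink", ["book", "water"]))  # water
```

Here "most likely" is relative to the predicate pattern only, mirroring the point that the antecedent is the most likely one given a particular context.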

In the following, the EPAS list will first be classified to see whether the context patterns represented by the EPAS contain enough information to suggest the correct antecedent for anaphoric expressions from the text collection. Then an association of concepts will be performed, creating bundles of those arguments which occur in similar contexts, i.e. with similar predicates. These concepts will then be applied in combination with the classification method to see whether they improve the process of suggesting the correct antecedent for the anaphors.
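The association of concepts can be illustrated with a small sketch. It assumes, hypothetically, that each argument is represented by the counts of predicates it occurs with, and it swaps in cosine similarity with a greedy, threshold-based grouping as the bundling step; this is one simple choice for illustration, not necessarily the association method used in the thesis. The helper `concept_score` shows how a concept could let a candidate borrow evidence from other members of its bundle.

```python
import math
from collections import Counter, defaultdict

def predicate_profiles(training_pairs):
    """Map each argument to a Counter of the predicates it co-occurs with."""
    profiles = defaultdict(Counter)
    for predicate, argument in training_pairs:
        profiles[argument][predicate] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bundle_concepts(profiles, threshold=0.5):
    """Greedily put each argument into the first concept whose seed is similar enough."""
    concepts = []  # list of (seed argument, set of member arguments)
    for argument in sorted(profiles):
        for seed, members in concepts:
            if cosine(profiles[argument], profiles[seed]) >= threshold:
                members.add(argument)
                break
        else:  # no sufficiently similar concept found: start a new one
            concepts.append((argument, {argument}))
    return [members for _, members in concepts]

def concept_score(training_pairs, concepts, predicate, candidate):
    """Count training pairs where `predicate` occurs with any member of the candidate's concept."""
    members = next((c for c in concepts if candidate in c), {candidate})
    return sum(1 for p, a in training_pairs if p == predicate and a in members)

# Invented toy data: "coffee" never occurs with "spill", but "water" does.
pairs = [("drink", "water"), ("pour", "water"), ("spill", "water"),
         ("drink", "coffee"), ("pour", "coffee"),
         ("read", "book"), ("write", "book")]
concepts = bundle_concepts(predicate_profiles(pairs))
print(concept_score(pairs, concepts, "spill", "coffee"))  # 1
```

The greedy grouping depends on the order in which arguments are visited and on the threshold; a different similarity measure or clustering algorithm could serve the same purpose.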

For the purposes of classification and testing, the EPAS list was divided into training and test sets. The test set consists of all structures containing pronouns, while the training set consists of the remaining EPAS. For the test set, the correct antecedent for each pronoun was identified manually and added to the test file. When the test instances are run, the classifier assigns an antecedent based on the patterns it has seen in the training set. The manually identified antecedent in each test case thus serves as a gold standard against which the success rate of the classification is measured. The test set therefore provides a good way of evaluating the output of the classification, and shows whether the correct antecedent can be assigned based on training on occurrences of EPAS/context patterns.
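The split and the evaluation can be sketched as follows, again under simplifying assumptions: each hypothetical `Epas` record holds a single predicate and argument, the `PRONOUNS` inventory is invented for illustration, and the "classifier" simply predicts the argument most frequently seen with the predicate in training. The thesis's actual data format and classifier may differ.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical pronoun inventory; not taken from the thesis.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

@dataclass
class Epas:
    """Hypothetical, simplified EPAS: one predicate and one argument."""
    predicate: str
    argument: str
    gold_antecedent: Optional[str] = None  # added manually for test items only

def split(epas_list):
    """Test set = structures containing a pronoun; training set = the rest."""
    test = [e for e in epas_list if e.argument.lower() in PRONOUNS]
    train = [e for e in epas_list if e.argument.lower() not in PRONOUNS]
    return train, test

def accuracy(train, test):
    """Predict the argument seen most often with each predicate; score against the gold antecedent."""
    counts = defaultdict(Counter)
    for e in train:
        counts[e.predicate][e.argument] += 1
    correct = 0
    for e in test:
        seen = counts.get(e.predicate)
        guess = seen.most_common(1)[0][0] if seen else None  # None for unseen predicates
        if guess is not None and guess == e.gold_antecedent:
            correct += 1
    return correct / len(test) if test else 0.0
```

For example, with invented data where "drink" was seen twice with "water" in training, a test item `Epas("drink", "it", "water")` is counted as correct, while a pronoun whose predicate never occurred in training cannot be resolved and counts as an error.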
