
Unni Cathrine Eiken February 2005


5 Final remarks

5.1 Is a parser vital for the extraction process?

An initial assumption during the development of the method in this thesis was that it was highly important to base the extraction method on a syntactic parse of the text collection. As the reader will recall, the reasons for this assumption were elaborated in chapter 3 and will therefore not be discussed further here. However, as a means of evaluating the extraction method, the texts in the text collection were processed using the Oslo-Bergen Tagger (OBT 2005). This is a part-of-speech (POS) tagger which, among other options, offers the user a syntactic disambiguation of the input text. The texts were POS tagged using the web version of the tagger, and structures corresponding to subject-verb-object relations were manually extracted from the output. This yielded a list of 169 structures, 26 of them with pronouns. The structures were extracted using a quite rudimentary method; for example, no distinction was made between active and passive versions of the same predicate. This resulted in a list featuring exactly the problematic issues discussed in chapter 3: the arguments were represented (and subsequently structured) not according to thematic roles, but merely according to their syntactic roles in the sentence. As a result, the list did not reflect characteristic arguments of the different predicates to the same degree as the EPAS list did. The list of the POS-based structures is available in Appendix F.
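The effect of ignoring voice can be illustrated with a minimal sketch. The clauses, the predicate arrestere (arrest), the argument mann (man) and all function names below are hypothetical illustrations and are not taken from the text collection or from the tagger output; the treatment of the passive agent as the "other argument" is a simplifying assumption. The sketch only shows how a purely syntactic extraction places the formal subject of a passive clause in first-argument position.

```python
from collections import defaultdict

# Hypothetical toy clauses, NOT taken from the thesis data.
# Each clause: (formal_subject, verb_lemma, other_argument, is_passive)
CLAUSES = [
    ("politi", "arrestere", "mann", False),  # "the police arrested a man"
    ("mann", "arrestere", "politi", True),   # "a man was arrested by the police"
]


def syntactic_structure(clause):
    """Rudimentary extraction: the formal subject is always the first argument."""
    subj, verb, other, _is_passive = clause
    return (verb, subj, other)


def voice_aware_structure(clause):
    """Assumed alternative: swap the arguments of passive clauses so that
    the first argument approximates the agent rather than the formal subject."""
    subj, verb, other, is_passive = clause
    if is_passive:
        return (verb, other, subj)
    return (verb, subj, other)


def first_arguments(structures):
    """Collect, per predicate, the set of words seen as first argument."""
    by_predicate = defaultdict(set)
    for verb, arg1, _arg2 in structures:
        by_predicate[verb].add(arg1)
    return dict(by_predicate)


syntactic = first_arguments(syntactic_structure(c) for c in CLAUSES)
voice_aware = first_arguments(voice_aware_structure(c) for c in CLAUSES)
```

In this toy case the syntactic extraction lists both politi and mann as first arguments of arrestere, whereas the voice-aware variant lists only politi. This is the kind of conflation of syntactic and thematic roles described above.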

Consider the FCA diagram in Figure 7 below, which shows part of the diagram created for the POS-based structures; the section of the diagram with the argument politi (police) as starting point is highlighted. When comparing this figure to the corresponding figure for the EPAS list (Figure 5 in section 3.6.4), it is quite clear that the POS-based list is significantly less generalisable. There are no clear groupings of arguments which display specific behaviour through their combination with a certain subset of predicates. Because formal subjects of both active and passive sentences are realised as first arguments in this extraction, it is hardly possible to group arguments into sets of semantically related words based on their distribution. As can be seen from the diagram, politi co-occurs with both sykepleiestudent (student nurse) and bilfører (driver), as well as other, more relevant, arguments.
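How such groupings arise in an FCA diagram can be sketched with a small formal context. The context below is a hypothetical toy example: the three arguments are those mentioned above, but the predicates (arrestere, avhøre, kjøre) and their assignment to arguments are invented for illustration and do not reproduce Figure 7. The brute-force enumeration is an assumption made for brevity and is not necessarily how the diagrams in this thesis were produced.

```python
from itertools import combinations

# Hypothetical toy context, NOT the data behind Figure 7:
# argument -> predicates with which it occurs as first argument.
CONTEXT = {
    "politi": {"arrestere", "avhøre"},
    "bilfører": {"arrestere", "kjøre"},   # e.g. formal subject of a passive clause
    "sykepleiestudent": {"avhøre"},       # e.g. formal subject of a passive clause
}


def extent(attrs, context):
    """All arguments that occur with every predicate in attrs."""
    return frozenset(obj for obj, preds in context.items() if attrs <= preds)


def intent(objs, context):
    """All predicates shared by every argument in objs."""
    if not objs:
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[o]) for o in objs)))


def formal_concepts(context):
    """Enumerate formal concepts by closing every subset of predicates.
    Exponential in the number of predicates; suitable for toy examples only."""
    predicates = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(predicates) + 1):
        for subset in combinations(predicates, size):
            ext = extent(set(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


concepts = formal_concepts(CONTEXT)
```

In this toy context, politi is grouped with bilfører under arrestere and with sykepleiestudent under avhøre, although the words are not semantically related. Groupings of this kind are what one would expect when formal subjects of passive clauses are treated as first arguments.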
