
Unni Cathrine Eiken February 2005



The pre-editing of the text collection will naturally have affected the resulting EPAS list. Not all structures from the original texts are extracted, and as a consequence the EPAS list does not include all relevant context patterns for the domain. Still, for the purposes of a pilot study such as this thesis, the central structures, which display the most typical context patterns for the domain, include enough information to indicate how useful the method is. For subsequent analyses, the extraction process can easily be performed on unedited original texts.

3.5 Finding the words

The process of extracting meaning structures such as the EPAS from the texts in the text collection is a substantial undertaking. It is also quite a tedious task, and since tedious tasks tend to benefit from being automated, I wrote the Perl script Ekstraktor, which interprets the MRS-structures of a sentence and thereby puts together the EPAS for each parsed sentence. This chapter describes the outline of the automated extraction process.

XLE provides the user with a choice of several output formats, including a graphical user interface that displays a tree graph of the parse as well as its F-structure and MRS-structure. Optionally, the output can also be viewed as a file of Prolog predicates. To extract the EPAS, Ekstraktor reads the Prolog output, saves the relevant information in a system of arrays, and subsequently performs several tests and actions on the stored information in order to present a list of all EPAS found in the parsed sentence.
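The read, store and test flow described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the Ekstraktor script itself. The toy fact format `ep(pred, category, variable, [arguments])` is a hypothetical stand-in and does not reproduce the actual XLE Prolog output. The function names `read_eps` and `extract_epas` are likewise assumptions, as are the choices to take the first verbal EP as the main one and to skip EPs without arguments.

```python
import re

# Hypothetical, simplified fact format:
#   ep(pred, category, own_variable, [argument_variables]).
# This is NOT the actual XLE Prolog output, only a stand-in for illustration.
TOY_OUTPUT = """\
ep(buy, v, e1, [x1, x2]).
ep(customer, n, x1, []).
ep(ticket, n, x2, []).
ep(cheap, a, e2, [x2]).
"""

EP_PATTERN = re.compile(r"ep\((\w+),\s*(\w+),\s*(\w+),\s*\[([^\]]*)\]\)\.")


def read_eps(text):
    """Store each elementary predication (EP) as a dict, in input order."""
    eps = []
    for pred, cat, var, args in EP_PATTERN.findall(text):
        arg_vars = [a.strip() for a in args.split(",") if a.strip()]
        eps.append({"pred": pred, "cat": cat, "var": var, "args": arg_vars})
    return eps


def extract_epas(eps):
    """Return (main, others): the main verbal structure and the remaining ones."""
    # Map each variable to the predicate that introduces it, so that
    # argument variables can be replaced by their semantic values.
    value_of = {ep["var"]: ep["pred"] for ep in eps}

    def render(ep):
        values = [value_of.get(v, v) for v in ep["args"]]
        return f"{ep['pred']}({', '.join(values)})"

    # Assumption: the first EP of category 'v' is taken as the main EP.
    main_ep = next((ep for ep in eps if ep["cat"] == "v"), None)
    if main_ep is None:
        return None, []
    # Assumption: EPs without arguments are not listed as separate structures.
    others = [render(ep) for ep in eps if ep is not main_ep and ep["args"]]
    return render(main_ep), others


main, others = extract_epas(read_eps(TOY_OUTPUT))
print(main)
print(others)
```

In this sketch a list of dictionaries plays the role of the system of arrays, and the category test on `cat` is the only restriction applied when selecting the main structure.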

The MRS structures, as represented in the Prolog output, provide all the information needed to extract the EPAS. Initially, the main EP and its arguments must be found. Since the linguistic structures analyzed for the purpose of this paper are limited to full sentences, the main EP must display the category ‘v’ for verb. Once the main EP is identified, the semantic values for it and for its arguments must be found. Subsequently, all the remaining predicate-argument structures must be found. For them, there is no restriction as to which category they have. Consider the sentence shown in (3-12) together with an extract of the Prolog output of the parse shown in (3-13):
