
Unni Cathrine Eiken February 2005



The Prolog code extract in (3-13) shows the MRS representation of the sentence in (3-12). It lists all the EPs in the sentence as well as the relationships that hold between the individual EPs. In simplified terms, the value of the attribute ‘semform’ holds the semantic form of the predicate, and the values of ‘ARG1’ and ‘ARG2’ point to the EPs where the semantic forms for argument 1 and argument 2 can be found. To extract all EPAS from such a Prolog file, one must go through the EPs in turn and find the semantic form of each main EP together with the semantic forms of its argument 1 and argument 2. In the extraction process, this matching and tracing of values is performed by the script Ekstraktor.
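The tracing step can be illustrated with a minimal Perl sketch. It assumes the EPs have already been read into a hash keyed by EP number. Each entry holds a semantic form and, optionally, the EP numbers that ARG1 and ARG2 point to. The hash layout, the sample values, the helper names `arg_form` and `extract_epas`, and the `_` placeholder for a missing argument are all hypothetical. Ekstraktor itself keeps its data in arrays, so this is not its actual code.

```perl
use strict;
use warnings;

# Hypothetical, simplified EP table (illustrative values, not from (3-13)):
# each EP number maps to its semantic form and, optionally, to the EP
# numbers that its ARG1 and ARG2 point to.
my %ep = (
    1 => { semform => 'see', ARG1 => 2, ARG2 => 3 },
    2 => { semform => 'cat' },
    3 => { semform => 'dog' },
    4 => { semform => 'sleep', ARG1 => 2 },
);

# Follow an argument pointer to the semantic form of the EP it names;
# '_' (an assumed placeholder) marks an absent or dangling argument.
sub arg_form {
    my ($ep_ref, $target) = @_;
    return '_' unless defined $target && exists $ep_ref->{$target};
    return $ep_ref->{$target}{semform};
}

# Build one EPAS string for every EP that has at least one argument.
sub extract_epas {
    my ($ep_ref) = @_;
    my @epas;
    for my $n (sort { $a <=> $b } keys %$ep_ref) {
        my $e = $ep_ref->{$n};
        next unless exists $e->{ARG1} || exists $e->{ARG2};
        push @epas, sprintf('%s(%s, %s)', $e->{semform},
            arg_form($ep_ref, $e->{ARG1}), arg_form($ep_ref, $e->{ARG2}));
    }
    return @epas;
}

print "$_\n" for extract_epas(\%ep);
```

Run as is, the sketch prints `see(cat, dog)` and `sleep(cat, _)`, one per line. The EPs for `cat` and `dog` produce no EPAS of their own because they have no arguments.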

The algorithm behind Ekstraktor is divided into two more or less separate parts: retrieval of information from the Prolog file, and processing of the information that has been found and stored. Perl was chosen as the programming language mainly because of its excellent pattern-matching facilities. Perl offers a powerful and flexible regular-expression syntax, which lets the programmer construct regular expressions for virtually any kind of pattern matching. For the information retrieval part of Ekstraktor, the task was to go through an input file, check each line against various patterns, and store the parts of the file that matched. (3-14) shows one of the pattern checks in Ekstraktor: if the line read from the file contains the string

'relation'),semform(

the entire line is stored in the array semform.

(3-14)

if ($linjeFraFil =~ m/'relation'\),semform\(/) {
    push(@semform, $linjeFraFil);
}
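The check in (3-14) runs inside a loop over the input file. The following Perl sketch shows such a loop with more than one pattern. The sample lines imitate XLE-style Prolog output, but their exact format, the `'ARG1'` pattern and the `@arg1` array are assumptions made for illustration. The input is read from an in-memory filehandle so that the sketch runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical input lines; the real Prolog file format may differ.
my $prolog = <<'END';
cf(1,eq(attr(var(10),'relation'),semform('bjeffe',1,[],[]))),
cf(1,eq(attr(var(10),'ARG1'),var(11))),
cf(1,eq(attr(var(11),'relation'),semform('hund',2,[],[]))),
END

my (@semform, @arg1);

# Read line by line from an in-memory filehandle (a reference to a scalar).
open(my $fh, '<', \$prolog) or die "Cannot open input: $!";
while (my $linjeFraFil = <$fh>) {
    chomp $linjeFraFil;
    # The check from (3-14): keep every line that carries a semantic form.
    if ($linjeFraFil =~ m/'relation'\),semform\(/) {
        push(@semform, $linjeFraFil);
    }
    # A second, hypothetical check: keep lines that assign an ARG1 value.
    elsif ($linjeFraFil =~ m/'ARG1'\)/) {
        push(@arg1, $linjeFraFil);
    }
}
close($fh);

print scalar(@semform), " semform lines, ", scalar(@arg1), " ARG1 lines\n";
```

With the sample input, the sketch prints `2 semform lines, 1 ARG1 lines`. Each further pattern would add another branch and another array in the same way.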

By going through the input file line by line and checking for several patterns, Ekstraktor stores all information relevant to extracting the EPAS in a system of arrays. To keep track of which EP the various values belong to, two arrays are used for each argument type: one for the EP number and one for the argument value. The ARG0 arrays correspond to the predicates in the structures, and for each predicate the semantic form can be found directly in the semform array. The

