Unni Cathrine Eiken February 2005
The Prolog code extract in (3-13) shows the MRS representation of the sentence in (3-12) by
listing all the EPs in the sentence as well as the relationships that hold between the individual
EPs. In simplified terms, the value of the attribute ‘semform’ holds the semantic form of the
predicate, and the values of ‘ARG1’ and ‘ARG2’ point to the EPs where the semantic forms for
argument 1 and argument 2 can be found. To extract all EPAS from such a Prolog file,
one must go through the EPs in turn and, for each main EP, find its semantic form together
with the semantic forms of its argument 1 and argument 2. In the extraction process, this
matching and tracing of values is performed by the script Ekstraktor.
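The tracing step described above can be illustrated with a minimal Perl sketch. It is not the code of Ekstraktor: the hash `%ep`, the subroutines `resolve_argument` and `extract_epas`, the example predicates and the output format are all assumptions made for illustration, and the EPs are written by hand rather than read from a Prolog file.

```perl
use strict;
use warnings;

# Hypothetical, hand-built stand-in for the EPs found in a Prolog file:
# each EP number maps to its semantic form and to the EP numbers that
# its ARG1 and ARG2 point to (undef when the argument is absent).
my %ep = (
    1 => { semform => 'bite', arg1 => 2,     arg2 => 3     },
    2 => { semform => 'dog',  arg1 => undef, arg2 => undef },
    3 => { semform => 'man',  arg1 => undef, arg2 => undef },
);

# Return the semantic form of the EP an argument points to, or '-'
# when the argument is missing or points to an unknown EP.
sub resolve_argument {
    my ($eps, $target) = @_;
    return '-' unless defined $target && exists $eps->{$target};
    return $eps->{$target}{semform};
}

# Build one "predicate(arg1, arg2)" string for every EP that has
# at least one argument (the format is an assumption of this sketch).
sub extract_epas {
    my ($eps) = @_;
    my @epas;
    for my $id (sort { $a <=> $b } keys %$eps) {
        my $e = $eps->{$id};
        next unless defined $e->{arg1} || defined $e->{arg2};
        push @epas, sprintf('%s(%s, %s)',
            $e->{semform},
            resolve_argument($eps, $e->{arg1}),
            resolve_argument($eps, $e->{arg2}));
    }
    return @epas;
}

my @epas = extract_epas(\%ep);
print "$_\n" for @epas;
```

With the three hypothetical EPs above, the sketch prints `bite(dog, man)`: the main EP contributes the predicate, and its two argument pointers are followed to the EPs that hold the argument forms.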
The algorithm behind Ekstraktor is divided into two more or less separate parts: information
retrieval from the Prolog file, and processing of the information that was found and stored. Perl
was chosen as the programming language mainly because of its excellent pattern matching
facilities. Perl offers a powerful and flexible regular expression syntax, which lets the
programmer construct regular expressions for a wide range of pattern matching tasks. For the
information retrieval part of Ekstraktor, it was desirable to go through an input file, check each
line against various patterns and store the parts of the input file that matched them.
(3-14) shows one of the pattern checks in Ekstraktor: if the line read from the file contains the
string
'relation'),semform(
the entire line is stored in the array semform.
(3-14)
if ($linjeFraFil =~ m/'relation'\),semform\(/) {
    push(@semform, $linjeFraFil);
}
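To show the check in (3-14) in context, the following self-contained sketch applies it to a few lines in a loop. The three input lines are hypothetical and only approximate what a Prolog file might contain. The second check (for 'ARG1') and the array `@arg1` are assumptions added to illustrate how further patterns could be tested in the same pass. They are not taken from Ekstraktor.

```perl
use strict;
use warnings;

# Hypothetical input lines; a real run would read them from the Prolog file.
my @lines = (
    "cf(1,eq(attr(var(10),'relation'),semform('bjeffe',4,[],[]))),",
    "cf(1,eq(attr(var(10),'ARG1'),var(12))),",
    "cf(1,eq(attr(var(12),'relation'),semform('hund',7,[],[]))),",
);

my (@semform, @arg1);

for my $linjeFraFil (@lines) {
    # Same check as in (3-14): keep lines that carry a semantic form.
    if ($linjeFraFil =~ m/'relation'\),semform\(/) {
        push(@semform, $linjeFraFil);
    }
    # Assumed second check, showing how another pattern could be added.
    elsif ($linjeFraFil =~ m/'ARG1'\)/) {
        push(@arg1, $linjeFraFil);
    }
}

printf "%d semform lines, %d ARG1 lines\n", scalar @semform, scalar @arg1;
```

On the three sample lines, two are stored in `@semform` and one in `@arg1`. Parentheses in the pattern are escaped with backslashes because they would otherwise be read as grouping operators in the regular expression.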
By going through the input file line by line and checking for several patterns, all information
relevant to extracting the EPAS is stored in a system of arrays. To keep track of which
EP the various values belong to, two arrays are used for each argument type: one for the
EP number and one for the argument value. The ARG0 arrays correspond to the predicates in the
structures, and for each of them the semantic form can be found directly in the semform array. The