Unni Cathrine Eiken February 2005
3.4 Altering the source
As already mentioned, parsing randomly selected Norwegian texts is not an entirely
straightforward task. Although NorGram provides a fairly broad grammar, not all linguistic
constructions are parsed and, more importantly, not all words are covered by the lexicon. Ideally,
I would have collected a limited-domain treebank consisting of parsed sentences from the
original texts as I found them on the internet. In practice, this was not feasible. It became
evident early on that the texts to be analyzed would have to be simplified for practical reasons.
For the purpose of classification, I needed to extract the EPAS present in the texts. The other
information contained in each sentence was not essential to the project.
Although I am aware that it would be more scientific, and in every respect better, to extract the EPAS
from original texts that I had not altered, this was not possible within the
framework of this thesis. Given that I would have to simplify the texts in any case, I decided to
cut most information that was irrelevant to the extraction of the (most central) EPAS. This
process was performed on all sentences in the text collection. Mainly adverbial phrases were
excluded, on the grounds that they would not be included in the extracted EPAS in any case.
Example (3-11) below illustrates a typical case:
(3-11)
a. Original sentence:<br />
Etter at hun ble funnet opplyste et vitne at hun hadde hørt<br />
høye rop om hjelp fra stedet tidlig søndag morgen.<br />
After she was found a witness informed that she had heard loud screams for<br />
help from the area early Sunday morning.<br />
b. Simplified form:<br />
Et vitne opplyste at hun hadde hørt høye rop.<br />
A witness informed that she had heard loud screams.<br />
c. Extracted structures:<br />
høre,vitne,rop<br />
høy,rop,?<br />
opplyse,vitne,?<br />
hear,witness,scream
loud,scream,?
inform,witness,?
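
The extracted structures in (3-11c) can be read as comma-separated triples. The following is a minimal sketch of how such lines might be loaded for later processing. It assumes that the first field is the predicate, that the next two fields are its arguments, and that "?" marks an argument slot left unfilled. The names Epas and parse_epas are hypothetical and are not taken from the thesis or from NorGram.

```python
from typing import NamedTuple, Optional


class Epas(NamedTuple):
    """One extracted structure: a predicate and up to two arguments.

    The field names are illustrative assumptions, not the thesis's terminology.
    """
    predicate: str
    arg1: Optional[str]
    arg2: Optional[str]


def parse_epas(line: str) -> Epas:
    """Parse one comma-separated triple; '?' is read as an unfilled slot (assumption)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected three comma-separated fields: {line!r}")
    predicate, arg1, arg2 = (None if field == "?" else field for field in fields)
    if predicate is None:
        raise ValueError(f"missing predicate: {line!r}")
    return Epas(predicate, arg1, arg2)


# The three Norwegian structures from example (3-11c).
extracted = ["høre,vitne,rop", "høy,rop,?", "opplyse,vitne,?"]
structures = [parse_epas(line) for line in extracted]

for structure in structures:
    print(structure)
```

Stripping whitespace around each field means the sketch accepts the triples with or without a space after the commas.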