Unni Cathrine Eiken February 2005
3.3 Parsing with NorGram
To extract the EPAS from the text in a semi-automatic fashion, some form of linguistic
analysis of the texts is needed. One problem with working on a small language like Norwegian
is that the linguistic tools needed in the process are not yet fully developed. Velldal
(2003) describes a project in which a set of Norwegian nouns is grouped into semantic classes
based on their distribution over a large body of text. A word’s distribution in different contexts
is represented as a feature vector in a semantic space model. In his project, Velldal addresses the
lack of a parser by stating that no syntactic parser exists for Norwegian. Instead, he uses a
shallow processing tool on a tagged corpus. The processing tool “translates” the tagged
structures into predicate-argument structures, overcoming the need for a parser by analysing
only those parts of the text that are relevant for the extraction of the needed structures.
As explained in section 3.2, an extraction method that is based on surface structures and
does not take semantic relations into account might produce results that
are unsuitable both for subsequent use in anaphora resolution and for generalisation of concepts.
In view of this, the present work has aimed at developing an extraction method that uses parsed
text to collect the meaning structures from the text.
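
The idea of representing a word's distribution as a feature vector can be illustrated with a small sketch. It is not Velldal's implementation: the toy noun-context pairs, the use of raw co-occurrence counts as features, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy observations: (noun, context) pairs, where the context
# is assumed to be a verb the noun co-occurs with. Not data from the thesis.
observations = [
    ("hund", "bjeffe"), ("hund", "spise"), ("hund", "løpe"),
    ("katt", "spise"), ("katt", "løpe"), ("katt", "mjaue"),
    ("bil", "kjøre"), ("bil", "parkere"),
]


def build_vectors(pairs):
    """Map each noun to a sparse feature vector of context counts."""
    vectors = {}
    for noun, context in pairs:
        vectors.setdefault(noun, Counter())[context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = build_vectors(observations)
print(cosine(vectors["hund"], vectors["katt"]))  # two of three contexts shared
print(cosine(vectors["hund"], vectors["bil"]))   # no shared contexts
```

Under these assumptions, nouns that occur in similar contexts receive similar vectors, which is what makes grouping them into semantic classes possible.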

Although it is true that no parser currently covers the Norwegian language fully,
a few alternative parsers are available. Even if these grammars are not
robust enough to return parses for randomly chosen texts, they can be used for the
experiments outlined in this project. The extraction method described in this thesis uses
one of the existing parsing tools for Norwegian bokmål, NorGram (NorGram 2004).

Since no easy-to-use automated tools are available for the extraction process,
obtaining the EPAS from the text involved a substantial amount of manual work, even when
a parser was used to automate the extraction. Parsing the texts was nevertheless valuable:
once the texts were parsed and a syntactic analysis was available, the EPAS could be
extracted more readily. Because of the modular nature of the extraction method, the extraction
process is not parser-dependent. Should a new and more robust grammar become available, the
extraction method can be modified to accommodate it. The next section of this chapter briefly
describes how the NorGram/XLE parser was used in the project, while section 3.3.2 describes in
greater detail how the EPAS were extracted from the parser’s output.
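
What parser-independence of this kind could look like in practice is sketched below. The sketch does not reproduce the extraction method of this thesis. The triple format, the function names `toy_parser` and `extract_structures`, and the naive subject-verb-object stand-in parser are all hypothetical. The only point is that the extraction step talks to the parser through a narrow interface, so a different grammar can be swapped in by writing a new adapter.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical output format: (predicate, role, argument) triples.
Triple = Tuple[str, str, str]
# Any function from a sentence to a list of triples can act as a parser adapter.
ParserAdapter = Callable[[str], List[Triple]]


def toy_parser(sentence: str) -> List[Triple]:
    """Stand-in for a real parser: naively reads 'SUBJ VERB OBJ' word order."""
    words = sentence.rstrip(".").split()
    if len(words) != 3:
        return []  # mimics a grammar that returns no parse for this input
    subj, verb, obj = words
    return [(verb, "ARG1", subj), (verb, "ARG2", obj)]


def extract_structures(sentences: Iterable[str],
                       parse: ParserAdapter) -> List[Triple]:
    """Collect triples from all sentences, whichever parser adapter is given."""
    collected: List[Triple] = []
    for sentence in sentences:
        collected.extend(parse(sentence))
    return collected


triples = extract_structures(["Politiet avhørte mannen."], toy_parser)
print(triples)
```

Replacing `toy_parser` with an adapter around another parser's output would leave `extract_structures` unchanged, which is the sense of parser-independence assumed in this sketch.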