Unni Cathrine Eiken February 2005



3.3 Parsing with NorGram

To extract the EPAS from the text in a semi-automatic fashion, some form of linguistic analysis of the texts is needed. One problem with working on a small language like Norwegian is that the linguistic tools needed in the process are often not yet fully developed. Velldal (2003) describes a project in which a set of Norwegian nouns is grouped into semantic classes based on their distribution over a large body of text. A word’s distribution in different contexts is represented as a feature vector in a semantic space model. Velldal notes that no syntactic parser exists for Norwegian and works around this by applying a shallow processing tool to a tagged corpus. The processing tool “translates” the tagged structures into predicate-argument structures, avoiding the need for a parser by analysing only those parts of the text that are relevant for extracting the needed structures. As explained in section 3.2, an extraction method that is based on surface structures and does not take semantic relations into account might produce results that are unsuitable both for subsequent use in anaphora resolution and for generalisation of concepts. In view of this, the present work has aimed at developing an extraction method that uses parsed text to collect the meaning structures from the text.
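
To make the notion of a feature vector in a semantic space model concrete, the following is a minimal sketch and not a reconstruction of Velldal's system. It assumes that contexts are simple word windows and that similarity is measured by cosine; both choices, the toy sentences, and the names `context_vector` and `cosine` are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` tokens of `target`.

    Assumption: a context is a plain word window, not a syntactic relation.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy tokenised corpus (invented for illustration only).
sentences = [
    ["politiet", "avhørte", "mannen", "i", "går"],
    ["politiet", "avhørte", "kvinnen", "i", "går"],
    ["mannen", "kjøpte", "bilen", "i", "byen"],
]

mannen = context_vector("mannen", sentences)
kvinnen = context_vector("kvinnen", sentences)
bilen = context_vector("bilen", sentences)

sim_person = cosine(mannen, kvinnen)
sim_object = cosine(mannen, bilen)
```

In this toy corpus the two nouns that appear as objects of the same verb end up with more similar vectors than the noun pair that only shares a sentence, which is the intuition behind grouping nouns by their distribution.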

Although it is true that no parser currently covers the Norwegian language in full, a few parsers are available. Even if their grammars are not robust enough to return parses for randomly chosen texts, they can be used for the experiments outlined in this project. The extraction method described in this thesis uses one of the existing parsing tools for Norwegian bokmål, NorGram (NorGram 2004).

Since no easy-to-use automated tools are available for the extraction process, obtaining the EPAS from the text involved a substantial amount of manual work, even when using a parser to automate the extraction. Parsing the texts was nevertheless of clear value: once the texts were parsed and a syntactic analysis was available to work on, the EPAS could be extracted more readily. Because the extraction method is modular, the extraction process is not parser-dependent. Should a new and more robust grammar become available, the extraction method can be modified to accommodate it. The next section of this chapter briefly describes how the NorGram/XLE parser was used in the project, while section 3.3.2 describes in greater detail how the EPAS were extracted from the parser’s output.
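
The claim that the extraction process is not parser-dependent can be illustrated with a small sketch. It is a hypothetical design, not the implementation used in this project: it does not call NorGram or XLE, and the names `Analysis`, `ParserAdapter`, `ToyAdapter` and `extract_structures`, as well as the (predicate, subject, object) output shape, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass
class Analysis:
    """Parser-neutral view of one clause (hypothetical format)."""
    predicate: str
    arguments: Dict[str, str]  # grammatical function -> lemma


class ParserAdapter(Protocol):
    """Anything that can turn a sentence into a list of analyses."""

    def analyse(self, sentence: str) -> List[Analysis]:
        ...


class ToyAdapter:
    """Stand-in for a real parser: returns hand-written analyses."""

    def __init__(self, canned: Dict[str, List[Analysis]]) -> None:
        self._canned = canned

    def analyse(self, sentence: str) -> List[Analysis]:
        return self._canned.get(sentence, [])


Structure = Tuple[str, Optional[str], Optional[str]]


def extract_structures(sentences: List[str], parser: ParserAdapter) -> List[Structure]:
    """Collect (predicate, subject, object) triples from any adapter."""
    result: List[Structure] = []
    for sentence in sentences:
        for analysis in parser.analyse(sentence):
            result.append((
                analysis.predicate,
                analysis.arguments.get("subject"),
                analysis.arguments.get("object"),
            ))
    return result


# Invented example analysis; a real adapter would wrap parser output instead.
toy = ToyAdapter({
    "Politiet avhørte mannen.": [
        Analysis("avhøre", {"subject": "politi", "object": "mann"}),
    ],
})

triples = extract_structures(["Politiet avhørte mannen."], toy)
```

Under this design, replacing the grammar would mean writing a new adapter that produces `Analysis` objects, while `extract_structures` stays unchanged.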
