21.04.2013 Views

Eckhard Bick - VISL

The core of PALMORF is written in C and runs on UNIX or MacOS platforms, tagging roughly 1000 words a second (preprocessing included). It consists of about 4000 lines of source code (plus most of the ANSI library), some 2000 lines of grammatical inflexion and derivation rules, and an electronic lexicon of 75,000 entries. Because of the way the lexicon is organised at run time, the program requires some 8 MB of free RAM. For additional pre- and postprocessing, PALMORF is aided by a number of smaller filter programs written in Perl.

2.2.2 Program architecture

2.2.2.1 Program modules

The basic "flow chart" structure of the PALMORF program is explained below. There is a choice between direct analysis of a single word and file-based [8] analysis of running text. The latter adds preprocessing and heuristics modules that handle polylexicals, abbreviations, orthographic variation and sentence boundaries, as well as some simple context-dependent heuristics. Both program paths use the same inflexion and derivation modules, which are applied recursively until an analysis is found and, thereafter, until all analyses of the same or lower derivational depth have been found. A more detailed discussion of the program architecture of PALMORF can be found in the appendix section.
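One possible reading of the recursive search described above is an iterative deepening over derivational depth: try the word as a bare root first, then with one affix, then two, and stop at the first depth that yields any analysis, returning every analysis found at that depth. The following is a minimal sketch under that assumption, not PALMORF's actual C code. The names `ROOTS`, `PREFIXES`, `SUFFIXES`, `derive` and `analyse` and the toy entries are all hypothetical, and inflexion handling is left out for brevity.

```python
from typing import List, Tuple

# Hypothetical toy data; not PALMORF's lexicon or rule set.
ROOTS = {"feliz": "ADJ", "casa": "N", "real": "ADJ"}
PREFIXES = ["in", "ir"]
SUFFIXES = ["mente", "ista"]

Analysis = Tuple[str, ...]  # morphemes from left to right


def derive(form: str, depth: int) -> List[Analysis]:
    """Return all analyses of `form` that use exactly `depth` affixes."""
    if depth == 0:
        # Whole-word lookup in the root lexicon.
        return [(form,)] if form in ROOTS else []
    found: List[Analysis] = []
    for prefix in PREFIXES:
        if form.startswith(prefix) and len(form) > len(prefix):
            for rest in derive(form[len(prefix):], depth - 1):
                found.append((prefix + "-",) + rest)
    for suffix in SUFFIXES:
        if form.endswith(suffix) and len(form) > len(suffix):
            for rest in derive(form[:-len(suffix)], depth - 1):
                found.append(rest + ("-" + suffix,))
    return found


def analyse(form: str, max_depth: int = 3) -> List[Analysis]:
    """Deepen the search one affix at a time; stop at the first depth with a hit."""
    for depth in range(max_depth + 1):
        found = derive(form, depth)
        if found:
            # The same segmentation can be reached in more than one affix order,
            # so drop duplicates while keeping the original order.
            return list(dict.fromkeys(found))
    return []
```

Calling `analyse("infelizmente")` finds nothing at depths 0 and 1 and returns the single depth-2 segmentation. Because the search stops at the shallowest successful depth, deeper segmentations of a word that already has a shallower analysis are never explored.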

[8] Of course, this version can not only handle files, but, via Unix program chaining, also individual chunks of text entered via the keyboard or an HTML form.

[Figure: flow chart of the PALMORF program modules. Box labels recovered from the chart: direct analysis; text file analysis; PREPROCESSOR (polylexicals+, capitalisation, numbers, punctuation, abbreviations+, hyphenation+); INPUT; RUNNING TEXT ANALYSIS; word form analysis; findword; whole word search; inflexion morpheme analysis; prefix; root lexicon search; suffix analysis; lexicon organisation; search trees.]
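The preprocessing stage of the running-text path (abbreviations, punctuation, polylexicals) could be sketched roughly as follows. This is an illustration only. The sample entries, the `=` joining convention, and the names `POLYLEXICALS`, `ABBREVIATIONS` and `preprocess` are assumptions made for this example and are not taken from PALMORF.

```python
import re
from typing import List

# Hypothetical sample entries; the text does not list PALMORF's real ones.
POLYLEXICALS = {("em", "vez", "de"): "em=vez=de"}
ABBREVIATIONS = {"Sr.", "Dr."}


def preprocess(text: str) -> List[str]:
    """Split running text into tokens, keeping abbreviations whole
    and fusing known multi-word units into single tokens."""
    tokens: List[str] = []
    for raw in text.split():
        if raw in ABBREVIATIONS:
            # Keep the final dot: it belongs to the abbreviation.
            tokens.append(raw)
            continue
        # Words (optionally hyphenated) and single punctuation marks.
        tokens.extend(re.findall(r"\w+(?:-\w+)*|[^\w\s]", raw))

    fused: List[str] = []
    i = 0
    while i < len(tokens):
        for parts, joined in POLYLEXICALS.items():
            window = tuple(t.lower() for t in tokens[i:i + len(parts)])
            if window == parts:
                fused.append(joined)
                i += len(parts)
                break
        else:
            # No polylexical starts here; pass the token through unchanged.
            fused.append(tokens[i])
            i += 1
    return fused
```

Each token returned by `preprocess` would then be handed to the word form analysis, which is shared by both program paths.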
