
Eckhard Bick - VISL


strings ending in or containing '.', '-' or '/' are looked up in the lexicon. If they are not listed there as abbreviations, they are split into ordinary word strings and $-prefixed punctuation marks. In ambiguous cases (e.g. sentence-final abbreviations), a few simple context-dependent rules are applied.
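The lexicon check and split described above can be sketched in Python as follows. This is a minimal illustration, not the preprocessor itself: the abbreviation set `ABBREVIATIONS` and the function name `split_string` are hypothetical, word-internal hyphens are assumed to be left to the hyphenation step, and the context rules for ambiguous sentence-final abbreviations are not modelled.

```python
import re

# Hypothetical toy abbreviation lexicon; the real module consults the full lexicon.
ABBREVIATIONS = {"sr.", "dr.", "etc.", "km/h"}

# Either a run of word characters (word-internal hyphens kept for the
# hyphenation step) or a single non-space, non-word character such as
# '.', '/', ',' or a trailing '-'.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def split_string(s: str) -> list[str]:
    """Split one whitespace-delimited string into words and $-prefixed punctuation."""
    if any(ch in s for ch in ".-/") and s.lower() in ABBREVIATIONS:
        return [s]  # listed abbreviation: keep the string whole
    return [p if p[0].isalnum() else "$" + p for p in TOKEN.findall(s)]
```

For example, "casa." becomes the word "casa" plus the punctuation token "$.", whereas "etc." survives intact because the lexicon lists it.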

hyphenation

hyphenated strings are split up, but the hyphen is retained as a suffixed marker at the end of the word that originally preceded it. Thus, after each part has been analysed individually for inflexion in the main program, hyphenated polylexicals can be "reassembled" and checked against the lexicon.
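A minimal sketch of the hyphen marker and the later reassembly, assuming the inflexional analysis has already reduced each part to a base form (e.g. "guarda-chuvas" split into "guarda-" and "chuvas", with "chuvas" analysed as "chuva"). `POLYLEXICALS`, `split_hyphenated` and `reassemble` are hypothetical names, not taken from the original program.

```python
from typing import Optional

# Hypothetical lexicon entries for hyphenated polylexicals.
POLYLEXICALS = {"guarda-chuva", "couve-flor"}


def split_hyphenated(token: str) -> list[str]:
    """Split at hyphens, keeping each hyphen as a suffix on the part before it."""
    parts = token.split("-")
    return [p + "-" for p in parts[:-1]] + [parts[-1]]


def reassemble(base_forms: list[str]) -> Optional[str]:
    """Rejoin hyphen-marked base forms; return the compound only if the lexicon lists it."""
    candidate = "".join(base_forms)  # the suffixed '-' restores the original hyphen
    return candidate if candidate in POLYLEXICALS else None
```

Keeping the hyphen on the preceding part means reassembly is plain concatenation, with no separate record of where the hyphens were.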

enclitics

as part of the hyphenation analysis, pronominal enclitics are identified, isolated and morphologically standardised. If they are followed by inflexional elements (mesoclisis), these elements are "glued" back onto the preceding verb (e.g. "dar-lhes-ei" -> "darei- lhes").
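The enclitic step might look roughly like the following sketch. The clitic inventory `CLITICS`, the standardisation map `STANDARD` and `split_enclitics` are illustrative assumptions. The sketch does not check that the first part really is a verb, and it does not restore verb forms altered before an enclitic (fazê-lo stays fazê-).

```python
# Hypothetical, incomplete inventory of pronominal clitic forms.
CLITICS = {"me", "te", "se", "nos", "vos", "lhe", "lhes",
           "o", "a", "os", "as", "lo", "la", "los", "las"}
# Standardise post-consonantal variants (as in fazê-lo) to the base forms.
STANDARD = {"lo": "o", "la": "a", "los": "os", "las": "as"}


def split_enclitics(token: str) -> list[str]:
    """Split verb-clitic strings; glue a trailing inflexion (mesoclisis) back onto the verb."""
    verb, *rest = token.split("-")
    ending = ""
    if rest and rest[-1] not in CLITICS:
        ending = rest.pop()  # e.g. the future ending "ei" in "dar-lhes-ei"
    if not rest or not all(p in CLITICS for p in rest):
        return [token]  # no clitics (e.g. "guarda-chuva"): leave it to hyphenation
    # The verb keeps the suffixed hyphen marker, as in "darei- lhes".
    return [verb + ending + "-"] + [STANDARD.get(p, p) for p in rest]
```

With these assumptions, "dar-lhes-ei" yields "darei-" and "lhes", and "dar-se-ia" yields "daria-" and "se".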

text file analysis

next word -> inflexion analysis (includes suffix module) -> prefix

sends all non-$-strings to the main analysis module (punctuation marks, numbers etc. have already been marked with $ by the preprocessor)
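The dispatch logic amounts to a simple filter on the $ marker. In this sketch `analyse_word` is a hypothetical callable standing in for the inflexion analysis (with its suffix module) and the prefix analysis; $-marked tokens are simply passed through unanalysed.

```python
from typing import Callable, Optional


def analyse_tokens(
    tokens: list[str], analyse_word: Callable[[str], list[str]]
) -> list[tuple[str, Optional[list[str]]]]:
    """Send every non-$ token to the main analysis; pass $-marked tokens through untouched."""
    results: list[tuple[str, Optional[list[str]]]] = []
    for token in tokens:
        if token.startswith("$"):
            results.append((token, None))  # punctuation, numbers etc.: already handled
        else:
            results.append((token, analyse_word(token)))
    return results
```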

orthographic variation*

changes oi/ou digraphs and converts European Portuguese spellings into Brazilian ones
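One plausible way to implement such variant handling is to rewrite an unanalysed word with each rule and keep only the results the lexicon knows. The rules in `VARIANT_RULES` are examples chosen for illustration (oi/ou alternation and a few European consonant clusters), not the module's actual rule list, and `orthographic_variants` is a hypothetical name.

```python
# Hypothetical rewrite rules (old, new); the module's actual rule set is not listed.
VARIANT_RULES = [
    ("oi", "ou"), ("ou", "oi"),  # oi/ou alternation, e.g. oiro/ouro, cousa/coisa
    ("cç", "ç"), ("ct", "t"),    # European spellings whose consonant Brazilian drops
    ("pt", "t"),                 # e.g. óptimo -> ótimo
]


def orthographic_variants(word: str, lexicon: set[str]) -> list[str]:
    """Return lexicon words reachable from an unanalysed word by one rule (all occurrences)."""
    candidates = []
    for old, new in VARIANT_RULES:
        if old in word:
            variant = word.replace(old, new)
            if variant in lexicon and variant not in candidates:
                candidates.append(variant)
    return candidates
```

Because the module only sees words that failed ordinary analysis, a regular word such as "pacto" never reaches the ct -> t rule and so cannot be turned into "pato".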

accentuation errors*

removes, changes or adds accents in unanalysed words
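A sketch of single-accent repair, assuming that candidates are generated by altering one vowel at a time and accepted only if found in the lexicon; the vowel groups and `accent_candidates` are hypothetical. Swapping a plain vowel for an accented one adds an accent, the reverse removes one, and swapping between accented forms changes it.

```python
# Each vowel mapped to all accent variants of the same base letter (lower case only).
ACCENT_GROUPS = ["aáàâã", "eéê", "ií", "oóôõ", "uúü"]
VARIANTS = {ch: group for group in ACCENT_GROUPS for ch in group}


def accent_candidates(word: str, lexicon: set[str]) -> list[str]:
    """Try removing, changing or adding an accent on one vowel at a time."""
    found = []
    for i, ch in enumerate(word):
        for alt in VARIANTS.get(ch, ""):
            if alt == ch:
                continue
            candidate = word[:i] + alt + word[i + 1:]
            if candidate in lexicon and candidate not in found:
                found.append(candidate)
    return found
```

A word such as "avo" can yield several candidates ("avó" and "avô"), so the ambiguity is passed on rather than resolved here.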

spelling errors*

corrects a few common errors in unanalysed words, mostly ASCII problems (e.g. c -> ç, ao -> ão)
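The ASCII repairs can be sketched as a small breadth-first search over rewrites, so that words needing both fixes (e.g. "informacao") are still found. Only the two rules named above are included; `ascii_repairs` and the default limit of two edits are assumptions made for this illustration.

```python
# Only the two rules named in the text; a real rule list would be longer.
ASCII_RULES = [("c", "ç"), ("ao", "ão")]


def ascii_repairs(word: str, lexicon: set[str], max_edits: int = 2) -> list[str]:
    """Apply up to max_edits single-occurrence rewrites; return results found in the lexicon."""
    seen = {word}
    frontier = {word}
    found = []
    for _ in range(max_edits):
        next_frontier = set()
        for form in frontier:
            for old, new in ASCII_RULES:
                start = form.find(old)
                while start != -1:  # rewrite each occurrence separately
                    candidate = form[:start] + new + form[start + len(old):]
                    if candidate not in seen:
                        seen.add(candidate)
                        next_frontier.add(candidate)
                        if candidate in lexicon:
                            found.append(candidate)
                    start = form.find(old, start + 1)
        frontier = next_frontier
    return sorted(found)
```

Rewriting one occurrence at a time matters for words like "cabeca", where only the second c should become ç.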

propria heuristics

assigns the PROP tag to unanalysed or heavily derived capitalised words (restricted after full stops and by certain context-sensitive rules that search for name chains and pre-name contexts)
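The following sketch shows how such restrictions could interact. The pre-name list `PRE_NAME`, the use of "$." as the full-stop token, and the rule order in `propria_tags` are assumptions for illustration; the heavy-derivation criterion is not modelled.

```python
from typing import Optional

# Hypothetical pre-name context words (lower-cased).
PRE_NAME = {"sr.", "sra.", "dr.", "dra.", "dona", "rua"}


def is_capitalised(token: str) -> bool:
    return token[:1].isupper()


def propria_tags(tokens: list[str], known: set[str]) -> list[Optional[str]]:
    """Return "PROP" or None per token; known holds lower-cased lexicon words."""
    tags: list[Optional[str]] = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if not is_capitalised(tok):
            tags.append(None)
        elif prev is not None and prev.lower() in PRE_NAME:
            tags.append("PROP")  # pre-name context: tag even lexicon words (dona Rosa)
        elif tok.lower() in known:
            tags.append(None)  # analysed word: leave it to the lexicon
        elif prev is None or prev == "$.":
            # After a full stop capitalisation proves little: require a name chain.
            chain = nxt is not None and is_capitalised(nxt) and nxt.lower() not in known
            tags.append("PROP" if chain else None)
        else:
            tags.append("PROP")
    return tags
```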

non-propria heuristics

assigns word class, inflexional and derivational tags to any unanalysable word by partially analysing the largest possible right-hand chunk of it, recognising inflexion morphemes, suffixes and word-class-specific endings and attaching them to hypothesised 'xxx' roots.
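A minimal version of the longest-right-hand-chunk search, assuming a flat table of endings. The entries in `ENDINGS` and the tag strings are invented for illustration, and the leftover stem returned by `guess_analysis` plays the role of the hypothesised 'xxx' root.

```python
from typing import Optional

# Hypothetical ending table: word-final string -> (word class, tags).
ENDINGS = {
    "mente": ("ADV", "-mente"),
    "ções": ("N", "F P -ção"),
    "ção": ("N", "F S -ção"),
    "ando": ("V", "GER"),
    "ar": ("V", "INF"),
    "s": ("N", "P"),
}


def guess_analysis(word: str, min_root: int = 2) -> Optional[tuple[str, str, str, str]]:
    """Match the longest known ending, leaving at least min_root letters as the root."""
    for start in range(min_root, len(word)):  # longest right-hand chunk first
        ending = word[start:]
        if ending in ENDINGS:
            word_class, tags = ENDINGS[ending]
            return word[:start], ending, word_class, tags
    return None
```

For a nonce word like "flurbizações" the search skips the shorter match "s" because "ções" is found first.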

local disambiguation

all but the least complex derivational readings are discarded
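If complexity is measured as the number of derivational morphemes in a reading (an assumption; the text does not define it), the filter reduces to keeping the readings with the minimum count. `Reading` and `keep_least_complex` are hypothetical names.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    root: str
    derivation: tuple[str, ...]  # derivational morphemes, e.g. ("-izar", "-ção")
    tags: str


def keep_least_complex(readings: list[Reading]) -> list[Reading]:
    """Discard every reading with more derivational morphemes than the simplest one."""
    if not readings:
        return []
    least = min(len(r.derivation) for r in readings)
    return [r for r in readings if len(r.derivation) == least]
```

Ties are kept, so two equally simple readings both survive for later disambiguation.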

output

writes the remaining analyses (root, derivation, word class and inflexion) to the _pars file
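The layout of the _pars file is not described here, so the sketch below borrows the Constraint Grammar cohort layout (a quoted word form followed by tab-indented readings) purely as an assumption. `format_cohort`, `write_pars` and the tag strings in the example are hypothetical.

```python
# Assumed layout, borrowed from the Constraint Grammar cohort format:
#   "<wordform>"
#   <TAB>"root" derivation WORDCLASS INFLEXION
def format_cohort(word: str, readings: list[tuple[str, str, str, str]]) -> str:
    lines = [f'"<{word}>"']
    for root, derivation, word_class, inflexion in readings:
        fields = [f'"{root}"', derivation, word_class, inflexion]
        lines.append("\t" + " ".join(field for field in fields if field))  # skip empty fields
    return "\n".join(lines) + "\n"


def write_pars(cohorts: list[tuple[str, list[tuple[str, str, str, str]]]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as out:
        for word, readings in cohorts:
            out.write(format_cohort(word, readings))
```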

cohort statistics

ambiguity distribution analysis<br />
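An ambiguity distribution can be computed by counting how many cohorts carry one, two, three or more readings. This sketch assumes cohorts are (word, readings) pairs; `ambiguity_distribution` and `mean_ambiguity` are illustrative names.

```python
from collections import Counter


def ambiguity_distribution(cohorts: list[tuple[str, list[str]]]) -> Counter:
    """Map each reading count to the number of cohorts that have that many readings."""
    return Counter(len(readings) for _, readings in cohorts)


def mean_ambiguity(cohorts: list[tuple[str, list[str]]]) -> float:
    """Average number of readings per cohort (0.0 for an empty text)."""
    if not cohorts:
        return 0.0
    return sum(len(readings) for _, readings in cohorts) / len(cohorts)
```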

