21.04.2013 Views

Eckhard Bick - VISL


3.4 Word internal (local) disambiguation

Sometimes an ambiguous word form is assigned readings of differing complexity, that is, some analyses are made up of more derivational elements than others. However, "Karlsson's law" of minimal derivational complexity [86] (Karlsson, 1992, 1995) claims that in such cases the cohort can be made less ambiguous by rejecting all but the least complex readings, which in almost all cases prove to be the contextually correct ones. Though the law was not specifically formulated for Portuguese, it seems to hold for that language, too [87].

When the morphological analyser program searches for analyses of a given word, it first looks for whole roots and inflexional endings (step 1), then for suffixation with or without inflexion (step 2). Implementing Karlsson's law, the program only progresses to step 2 if no readings are found at step 1. Suffixation itself is analysed iteratively with increasing "suffixation depth" for each round (step 2a: one suffix, step 2b: two suffixes, etc.), the maximum depth currently being 4. Again, the process only goes on to the next round (depth) if no analyses are found. Thus the analysis cohort only contains the "shortest" readings, saving time and disambiguation effort.
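The staged search described above can be sketched roughly as follows. This is a minimal illustration and not the analyser's actual code: the toy lexicon (`ROOTS`, `SUFFIXES`, `ENDINGS`) and the function names are hypothetical, and only the maximum depth of 4 is taken from the text. Depth 0 stands for step 1, depth 1 for step 2a, and so on.

```python
# Hypothetical toy lexicon, for illustration only (not the real lexicon).
ROOTS = {"nation", "nat"}
SUFFIXES = ["ion", "al"]
ENDINGS = ["", "s"]      # "" means "no inflexional ending"
MAX_DEPTH = 4            # maximum suffixation depth mentioned in the text


def suffix_splits(stem, depth):
    """Yield (root, suffixes) splits of stem that use exactly `depth` suffixes."""
    if depth == 0:
        if stem in ROOTS:
            yield stem, ()
        return
    for suf in SUFFIXES:
        if stem.endswith(suf) and len(stem) > len(suf):
            for root, sufs in suffix_splits(stem[:-len(suf)], depth - 1):
                yield root, sufs + (suf,)


def analyse(word):
    """Return the cohort of (root, suffixes, ending) readings at the first
    depth that yields any reading; deeper rounds are never attempted."""
    for depth in range(MAX_DEPTH + 1):
        cohort = []
        for ending in ENDINGS:
            if not word.endswith(ending):
                continue
            stem = word[:len(word) - len(ending)]
            for root, sufs in suffix_splits(stem, depth):
                cohort.append((root, sufs, ending))
        if cohort:
            return cohort
    return []
```

With this toy lexicon, `analyse("nation")` stops at depth 0 with the whole-root reading, so the longer reading "nat" + "ion" is never generated.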

Prefixation (step 3), though, is more problematic. Undertaking step 3 only if no analyses are found in steps 1 and 2 would mean possibly neglecting a 2-element analysis with prefix and root only, just because the program has already found, say, a 4-element reading involving 3 suffixes. So prefixation is done whenever suffixation has been done. For each prefix on the list, steps 1 and 2 are also undertaken for the remaining part of the word. As before, depth is increased step by step if no analysis is found for that individual prefix, thus automatically discarding unnecessarily complex analyses. But when searching for possible prefixes, the program has to look at all prefixes, because it cannot know in advance which particular prefix will yield the analysis with the fewest elements, nor whether this analysis will be shorter than the shortest "suffixation only" analysis.
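A rough sketch of the prefixation step, under simplifying assumptions: inflexional endings are left out, and the lexicon entries and function names are hypothetical. The intentionally spurious split "intern" + "ation" + "al" only serves to show how a longer suffix-only reading and a shorter prefix reading can end up in the same cohort.

```python
# Hypothetical toy lexicon, for illustration only.
ROOTS = {"intern", "national"}
PREFIXES = ["inter", "in"]
SUFFIXES = ["ation", "al"]
MAX_DEPTH = 4


def suffix_splits(stem, depth):
    """Yield (root, suffixes) splits of stem that use exactly `depth` suffixes."""
    if depth == 0:
        if stem in ROOTS:
            yield stem, ()
        return
    for suf in SUFFIXES:
        if stem.endswith(suf) and len(stem) > len(suf):
            for root, sufs in suffix_splits(stem[:-len(suf)], depth - 1):
                yield root, sufs + (suf,)


def shortest_suffix_readings(stem, first_depth):
    """Increase the depth round by round and stop at the first round with readings."""
    for depth in range(first_depth, MAX_DEPTH + 1):
        found = list(suffix_splits(stem, depth))
        if found:
            return found
    return []


def analyse(word):
    """Return (prefixes, root, suffixes) readings for word."""
    if word in ROOTS:                      # step 1: whole root, nothing else tried
        return [((), word, ())]
    # Step 2: suffixation only, shortest depth first.
    cohort = [((), root, sufs)
              for root, sufs in shortest_suffix_readings(word, 1)]
    # Step 3: every prefix is tried, each with its own shortest-first search.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = word[len(prefix):]
            cohort += [((prefix,), root, sufs)
                       for root, sufs in shortest_suffix_readings(rest, 0)]
    return cohort
```

For `"international"` this toy version returns both a 3-element suffix-only reading and a 2-element prefix-plus-root reading, which is why the cohort still needs a final comparison across readings.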

Therefore, once analysis is complete, word internal disambiguation is undertaken summarily on the resulting cohort, discarding all readings that have more than the minimum number of derivation elements for that cohort.
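The final filtering step might look like the sketch below. The reading format `(prefixes, root, suffixes)` and the function names are assumptions made for illustration. The root is counted as one element, in line with the text's "2-element analysis with prefix and root only".

```python
def derivation_elements(reading):
    """Count derivational elements: prefixes + the root + suffixes."""
    prefixes, _root, suffixes = reading
    return len(prefixes) + 1 + len(suffixes)


def disambiguate_locally(cohort):
    """Keep only the readings with the minimum element count for this cohort."""
    if not cohort:
        return []
    minimum = min(derivation_elements(r) for r in cohort)
    return [r for r in cohort if derivation_elements(r) == minimum]


# Hypothetical cohort: one 3-element reading and one 2-element reading.
cohort = [
    ((), "intern", ("ation", "al")),
    (("inter",), "national", ()),
]
survivors = disambiguate_locally(cohort)
```

Readings that tie for the minimum all survive, so this step reduces ambiguity without necessarily resolving it completely.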

When applied to the RNP literature corpus, local disambiguation, apart from obviously reducing overall ambiguity, has a peculiar "smoothing effect" on the ambiguity distribution curve by considerably lowering the percentage of 4-way ambiguous word forms (that previously had been higher than the one for 3-way

[86] Karlsson (1995) uses the term "local disambiguation" for this selection process, referring to the fact that the rule concerned is applied to word forms in isolation and does not make use of any context conditions whatsoever.

[87] The law was inspired by languages with productive compound formation, like Swedish and German, but can be extended to languages with few root compounds, as long as these languages have productive affixation, like English (Karlsson et al., 1995) and, here, Portuguese. Though Karlsson's law is of a heuristic nature, it is all but impossible to find counter-examples.
