21.04.2013 Views

Eckhard Bick - VISL


Table: Name heuristics - decision table

| Competing analysis \ Preceding context | sentence-initial | after only capitalised words: "headline" | after lower case word | after name or pre-name noun |
|---|---|---|---|---|
| underived, pre-name class 'Senhor' | lexical | lexical | lexical | lexical |
| underived, not pre-name class 'Concordo' (older version: lexical/PROP) | lexical | lexical | lexical | lexical/PROP |
| long root, derivational 'Palestr-inha' | lexical | lexical/PROP | lexical/PROP | lexical/PROP |
| short root, derivational 'Cas-ina' | lexical/PROP | lexical/PROP | lexical/PROP | lexical/PROP |
| none | PROP | PROP | PROP | PROP |
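The decision table above can be read as a plain lookup from (competing analysis, preceding context) to the set of readings handed on to disambiguation. The following is a minimal sketch under that reading, not the system's actual implementation; the names `CONTEXTS`, `DECISION_TABLE` and `name_readings` are hypothetical, and the string keys are simply shortened row and column labels from the table.

```python
# Column order of the decision table (preceding context).
CONTEXTS = (
    "sentence-initial",
    "headline",                     # after only capitalised words
    "after lower case word",
    "after name or pre-name noun",
)

LEX = ("lexical",)           # only the lexicon-based reading survives
BOTH = ("lexical", "PROP")   # both readings are passed on
PROP = ("PROP",)             # only the name heuristic applies

# Rows of the decision table (competing analysis), one cell per context.
DECISION_TABLE = {
    "underived, pre-name class":     (LEX, LEX, LEX, LEX),
    "underived, not pre-name class": (LEX, LEX, LEX, BOTH),
    "long root, derivational":       (LEX, BOTH, BOTH, BOTH),
    "short root, derivational":      (BOTH, BOTH, BOTH, BOTH),
    "none":                          (PROP, PROP, PROP, PROP),
}


def name_readings(competing_analysis, preceding_context):
    """Return the readings for a capitalised word, as listed in the table."""
    row = DECISION_TABLE[competing_analysis]
    return row[CONTEXTS.index(preceding_context)]


print(name_readings("long root, derivational", "sentence-initial"))
print(name_readings("underived, not pre-name class", "after name or pre-name noun"))
```

Keeping the table as data rather than nested conditionals makes it easy to compare variants, for instance the older version in which the 'Concordo' row also received lexical/PROP.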

Originally I worked with a very "soft" definition of a pre-name context (all words that are not capitalised plus lexical pre-name expressions, even if they are capitalised), and most capitalised words would get both the lexical and the name-heuristic analysis. This kind of cautiousness is typical of the parsing system, and exploits its "progressive level" characteristics: ambiguity not resolved on one level will be treated with better tools on the next. In this case, context-sensitive Constraint Grammar rules would do the job.
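The "progressive level" idea can be illustrated with a toy two-stage pipeline: a first level that deliberately leaves capitalised words ambiguous, and a second level that resolves the ambiguity from context. This is only a sketch under strong assumptions. The functions `tag` and `disambiguate`, the one-word `PRE_NAME_NOUNS` set and the single rule shown are hypothetical and are not taken from the system's Constraint Grammar.

```python
# Illustrative only: a one-entry stand-in for a lexicon of pre-name nouns.
PRE_NAME_NOUNS = {"senhor"}


def tag(tokens):
    """Level 1: the 'soft' strategy. Capitalised words keep both readings."""
    cohorts = []
    for token in tokens:
        if token[:1].isupper():
            cohorts.append((token, {"lexical", "PROP"}))
        else:
            cohorts.append((token, {"lexical"}))
    return cohorts


def disambiguate(cohorts):
    """Level 2: a hypothetical contextual rule.

    Select PROP for a still-ambiguous word that directly follows a
    pre-name noun; leave every other cohort untouched.
    """
    resolved = []
    for i, (token, readings) in enumerate(cohorts):
        previous = cohorts[i - 1][0].lower() if i > 0 else None
        if len(readings) > 1 and previous in PRE_NAME_NOUNS:
            readings = {"PROP"}
        resolved.append((token, readings))
    return resolved


for token, readings in disambiguate(tag(["o", "Senhor", "Concordo"])):
    print(token, sorted(readings))
```

The rule only fires on cohorts that still carry more than one reading, so a later level can narrow an ambiguity but never leaves a word without any analysis.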

There is, however, a reason for excluding ordinary lower case words from the pre-name context, at least where the competing analysis is non-derivational (i.e. inherently probable): compound names retain more of their internal structure in the analysis if compound-initial (capitalised) adjectives or pre-name nouns (titles etc.) are tagged as ADJ or N, respectively (3b), than in an all-name chain analysis (3a):

(3a) Escola PROP @NPHR Santa PROP @N< Cecília PROP @N<

(3b) Escola N @NPHR Santa ADJ @>N Cecília PROP @N<
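To make the difference between the two analyses concrete, the annotated strings in (3a) and (3b) can be split into (word, word class, function tag) triples and compared. The sketch below assumes the flat "word class function" layout shown in the examples; `parse_analysis` and `word_classes` are hypothetical helper names.

```python
ALL_NAME_CHAIN = "Escola PROP @NPHR Santa PROP @N< Cecília PROP @N<"   # (3a)
STRUCTURED = "Escola N @NPHR Santa ADJ @>N Cecília PROP @N<"           # (3b)


def parse_analysis(line):
    """Split a flat analysis string into (word, word_class, function) triples."""
    fields = line.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected word / word class / function triples")
    return [tuple(fields[i:i + 3]) for i in range(0, len(fields), 3)]


def word_classes(line):
    """Return the word class of each token, in order."""
    return [word_class for _, word_class, _ in parse_analysis(line)]


print(word_classes(ALL_NAME_CHAIN))   # every token is PROP
print(word_classes(STRUCTURED))       # noun, adjective and name stay distinct
```

In (3a) the three tokens are indistinguishable by word class, whereas (3b) keeps the noun, the adjective and the name apart, and the function tag on Santa changes from @N< to @>N.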

The price for the more fine-grained analysis in (3b) is the risk that the tagger hands no PROP analysis at all to the CG disambiguation module in the case of isolated upper case words that have a clear (non-derived) alternative analysis, as in Bárbara and Xavier, both of which are simple adjectives in the lexicon (meaning 'barbaric' and 'annoying', respectively).
