21.04.2013 Views

Eckhard Bick - VISL


compiled into a computer program that takes as input morphologically processed, but still fully ambiguous text, as provided by lexicon and inflexion-rule-based analysers like the one used in my own system, or the TWOL analysers (in the Helsinki systems, cf. Koskenniemi, 1983). The multiple ambiguity represented by alternative tag lines will ideally be reduced to only one line[99] (the correct reading) by the CG-rule system.

(1) Constraint grammar input (morphological analyser output)

"<nunca>"
    "nunca" ADV
"<como>"
    "como" <rel> ADV
    "como" <interr> ADV
    "como" KS
    "como" <vt> V PR 1S VFIN
"<peixe>"
    "peixe" N M S
"<$.>"

[ADV=adverb, KS=subordinating conjunction, V=verb, N=noun, PR=present tense, S=singular, M=masculine, 1=first person, VFIN=finite verb, <rel>=relative, <interr>=interrogative, <vt>=monotransitive]

The four readings[100] of the word form 'como' are, in CG terminology, called a cohort.
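As a rough illustration of the cohort notion, the following Python sketch reads a CG-style input stream like the one in (1) into one cohort per word form. It is only a sketch under stated assumptions: the names `Reading`, `Cohort` and `parse_cg` are hypothetical, word forms are assumed to be written as `"<form>"` following the usual CG stream convention, and the angle-bracket tags in the sample are reconstructed from the tag legend rather than taken from an actual analyser run.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One tag line: a base form plus its tags (hypothetical layout)."""
    lemma: str
    tags: list[str]


@dataclass
class Cohort:
    """A word form together with all of its alternative readings."""
    wordform: str
    readings: list[Reading] = field(default_factory=list)


def parse_cg(text: str) -> list[Cohort]:
    """Group tag lines under the word-form line that precedes them."""
    cohorts: list[Cohort] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('"<') and line.endswith('>"'):
            # A word-form line such as "<como>" opens a new cohort.
            cohorts.append(Cohort(line[2:-2]))
        elif line.startswith('"') and cohorts:
            # A reading line such as: "como" KS
            lemma, _, rest = line[1:].partition('"')
            cohorts[-1].readings.append(Reading(lemma, rest.split()))
    return cohorts


# Sample input; the angle-bracket tags are an assumption (see lead-in).
SAMPLE = '''
"<nunca>"
    "nunca" ADV
"<como>"
    "como" <rel> ADV
    "como" <interr> ADV
    "como" KS
    "como" <vt> V PR 1S VFIN
"<peixe>"
    "peixe" N M S
"<$.>"
'''

cohorts = parse_cg(SAMPLE)
for cohort in cohorts:
    print(cohort.wordform, len(cohort.readings))
```

In this representation, the ambiguity of a word form is simply the length of its `readings` list, so the cohort for 'como' holds four readings while 'nunca' and 'peixe' hold one each.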

A typical CG-rule[101] for resolving this ambiguity might be the following:

(2) SELECT (VFIN) IF (NOT *-1 VFIN) (NOT *1 VFIN)

[select for a given word form the VFIN reading (finite verb) if there is no (NOT) other word that can be VFIN, neither to the left (*-1) nor to the right (*1).][102]
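To make the effect of rule (2) concrete, here is a minimal Python sketch of how such a SELECT rule could be applied to the cohorts of example (1). It is not the cg2 compiler's implementation: the function `select_vfin` and the tuple-based data layout are hypothetical, the unbounded context positions `*-1` and `*1` are modelled simply as "any cohort to the left" and "any cohort to the right", and the angle-bracket tags in the data are reconstructed from the tag legend.

```python
# Each cohort is (word form, list of readings); each reading is a tuple of
# the base form followed by its tags. Hypothetical layout for illustration.
Sentence = list[tuple[str, list[tuple[str, ...]]]]

sentence: Sentence = [
    ("nunca", [("nunca", "ADV")]),
    ("como", [
        ("como", "<rel>", "ADV"),
        ("como", "<interr>", "ADV"),
        ("como", "KS"),
        ("como", "<vt>", "V", "PR", "1S", "VFIN"),
    ]),
    ("peixe", [("peixe", "N", "M", "S")]),
    ("$.", []),
]


def can_be_vfin(readings: list[tuple[str, ...]]) -> bool:
    """True if at least one reading of the cohort carries the VFIN tag."""
    return any("VFIN" in reading for reading in readings)


def select_vfin(sent: Sentence) -> Sentence:
    """Sketch of: SELECT (VFIN) IF (NOT *-1 VFIN) (NOT *1 VFIN)."""
    result: Sentence = []
    for i, (form, readings) in enumerate(sent):
        # NOT *-1 VFIN: no cohort anywhere to the left can be VFIN.
        left_free = not any(can_be_vfin(rs) for _, rs in sent[:i])
        # NOT *1 VFIN: no cohort anywhere to the right can be VFIN.
        right_free = not any(can_be_vfin(rs) for _, rs in sent[i + 1:])
        if can_be_vfin(readings) and left_free and right_free:
            # SELECT keeps the matching readings and discards the rest.
            readings = [r for r in readings if "VFIN" in r]
        result.append((form, readings))
    return result


disambiguated = select_vfin(sentence)
for form, readings in disambiguated:
    print(form, readings)
```

Contexts are tested against the original, still ambiguous sentence, which is sufficient for a single rule; a real rule system applies many rules repeatedly, so that each removal can enable further rules.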

By first adding ("mapping") all[103] possible syntactic functions onto a word form, conditioned by its word class, inflexion etc., and then disambiguating this syntactic

[99] Of course, in the case of true ambiguity (which is surprisingly rare in the world of corpus linguistics), two (or more) correct tag lines are possible and should then be preserved.

[100] The difference between <rel> ADV and <interr> ADV is not really motivated by morphological word class, but expresses a semantic-functional distinction (the English translation is 'like' in the first case, and 'as' in the second). For polysemy resolution it is of great importance to determine which of a word's potential valency patterns has been instantiated in a given clause context, and which semantic class fills a given valency slot. Here valency tags (and selection restrictions) gain importance not only as secondary tags (used exclusively for the disambiguation of morphological/syntactic tags), but also as primary tags in their own right, which can and must be disambiguated, as for the word form 'revista', where simple word class ambiguity (V-N) is turned into fourfold lexeme ambiguity:

rever V 'see again' instantiated valency: transitive
rever V 'leak through' instantiated valency: intransitive
revista N 'news magazine' instantiated valency: title, semantic class: reading matter
revista N 'inspection' instantiated semantic class: +CONTROL, +PERFECTIVE

[101] The notation convention used here is the one used by Pasi Tapanainen's cg2-compiler, which among other things replaces the older operators '@w=0' and '@w=!' with the ordinary English words 'REMOVE' and 'SELECT'.

[102] The rule has been simplified, presuming that every sentence contains at least one finite verb, which isn't always the case, for instance in headlines or exclamations. The rule can be made safer by conditioning it on the existence of a full stop (*1 PUNKTUM) or by exploiting the possible valency relation between the transitive verb comer and the 'safe' peixe (0 <vt>) (1C NP).
