21.04.2013

Eckhard Bick - VISL


Portuguese non-name words. Apart from that, the ambiguity distribution is almost the same as in the mixed Borba-Ramsey Corpus (where the portion of unanalysed Portuguese words is higher, 0.2-0.3%, due at least in part to scientific and dialectal text contributions).

In 1.6% of all cases, a PROP tag was applied heuristically to capitalised words that could not be given another analysis without orthographical change (mid-sentence), or even after orthographical alteration (sentence-initially).
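
The fallback just described can be pictured roughly as follows. This is only a minimal sketch, not the parser's actual code: the function name `heuristic_readings`, the `analyse` callback and the toy lexicon are hypothetical, and "orthographical alteration" is assumed here to mean simple lower-casing.

```python
def heuristic_readings(word, sentence_initial, analyse):
    """Return word class readings, falling back to PROP for capitalised unknowns.

    `analyse` is a hypothetical lexicon lookup returning a list of tags
    (empty if the form cannot be analysed as written).
    """
    readings = analyse(word)
    if readings:
        return readings
    if not word[:1].isupper():
        return []  # unanalysed non-name word: no heuristic PROP tag
    if sentence_initial:
        # Assumption: sentence-initial capitals may be purely positional,
        # so the lower-cased form is tried before giving up.
        readings = analyse(word.lower())
        if readings:
            return readings
    return ["PROP"]


# Toy lexicon, for illustration only.
TOY_LEXICON = {"casa": ["N", "VFIN"], "de": ["PRP"]}


def toy_analyse(form):
    return TOY_LEXICON.get(form, [])
```

With this toy lexicon, a sentence-initial "Casa" receives the ordinary readings of "casa", while the same capitalised form in mid-sentence, or an unknown capitalised form in any position, ends up as PROP.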

3.2.2 Word class specific morphological ambiguity

In order to know where the CG rules could be made most effective, or, in other words, for which cases it was worth the trouble to write a lot of rules, I was interested in getting a more detailed picture of Portuguese morphological ambiguity. For the closed word classes (PRP, KS, KC, IN, DET, SPEC, PERS, NUM), ambiguity classes can be taken directly from the lexicon, and it would in principle be possible to write rules for every single word. For the open word classes (N, ADJ, PROP, V, ADV⁷⁶), however, a statistical approach seemed appropriate to assess the magnitude of the problem.

Table (1) shows the numbers for a 170,666-word VEJA newspaper corpus containing 121,170 words (71%) that are assigned at least one open word class reading. The basis for measuring ambiguity was a version of the parser that uses three verbal portmanteau tags not used in 3.5.1, as well as some word-internal disambiguation (cf. 3.4). The resulting reduction in overall ambiguity from 2.0 to 1.7 has to be borne in mind when comparing the word class specific figures below with the findings in 3.5.1.
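
To make the kind of counting behind a table like (1) concrete, here is a minimal sketch. It assumes that every token is represented as a list of word class tags, one per reading. How the original figures were actually tallied is not stated in the text, so the counting rule used here is an assumption: one count per token and class pair, with a same-class cell whenever a class occurs in more than one reading. The names `OPEN_CLASSES` and `pair_counts` are illustrative only.

```python
from collections import Counter
from itertools import combinations

OPEN_CLASSES = {"N", "ADJ", "PROP", "VFIN", "INF", "GER", "PCP", "ADV"}


def pair_counts(tokens):
    """Count, per token, every pair of open word classes among its readings."""
    counts = Counter()
    for readings in tokens:
        per_class = Counter(tag for tag in readings if tag in OPEN_CLASSES)
        for tag, n in per_class.items():
            if n > 1:
                counts[(tag, tag)] += 1  # class-internal ambiguity
        for a, b in combinations(sorted(per_class), 2):
            counts[(a, b)] += 1  # cross-class ambiguity, unordered pair
    return counts


# Share of corpus words with at least one open word class reading,
# using the two corpus figures quoted in the text.
open_share = 100 * 121170 / 170666
```

The last line reproduces the 71% quoted for the VEJA corpus; the pair counter itself is only run on invented toy data and says nothing about the real distribution.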

I have split up the V class into finite verbs (VFIN) and three non-finite subclasses, INF, GER and PCP, both because they behave completely differently in syntactic terms, and because the non-finite classes with their well-defined endings ('ar/er/ir' for INF, 'ando/indo' for GER and 'ado/ido' for PCP) can be expected to show their own, narrow ambiguity pattern. That the latter is quite distinct from that of finite verbs can be seen from the low numbers for VFIN-INF, VFIN-GER and VFIN-PCP ("verb-internal") ambiguity, respectively. The somewhat higher figure for VFIN-INF is due to the fact that the Portuguese infinitive can be inflected, yielding ambiguity with future subjunctive readings.
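
The ending-based intuition can be illustrated with a small surface heuristic. This is a sketch under stated assumptions, not the lexicon-based analysis the parser performs: the function `guess_nonfinite` is hypothetical, the gerund ending 'endo' (second conjugation) is added to the endings quoted above, and the personal infinitive suffixes are used only to show where INF and finite (future subjunctive) readings can coincide.

```python
NONFINITE_ENDINGS = {
    "INF": ("ar", "er", "ir"),
    "GER": ("ando", "endo", "indo"),  # 'endo' added; not listed in the text
    "PCP": ("ado", "ido"),
}

# Person/number suffixes of the inflected infinitive, which regular verbs
# share with the future subjunctive (e.g. "cantarmos").
PERSONAL_SUFFIXES = ("es", "mos", "des", "em")


def guess_nonfinite(form):
    """Guess word classes from the ending alone (purely illustrative)."""
    form = form.lower()
    guesses = set()
    for tag, endings in NONFINITE_ENDINGS.items():
        if form.endswith(endings):
            guesses.add(tag)
    for suffix in PERSONAL_SUFFIXES:
        stem = form[: -len(suffix)]
        if form.endswith(suffix) and stem.endswith(NONFINITE_ENDINGS["INF"]):
            # Inflected infinitive: ambiguous with a finite reading.
            guesses.update({"INF", "VFIN"})
    return guesses
```

A form such as "cantarmos" comes out as both INF and VFIN, which is the kind of overlap the text names as the source of the higher VFIN-INF figure, while "cantando" and "cantado" each receive a single non-finite guess.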

(1) Table: PoS-ambiguity class frequencies

          N    ADJ   VFIN    INF    GER    PCP    ADV   PROP   all ambiguous PoS pairs
N      2188   9273  10959    766      6   2197   2057   1940   29386
ADJ            241   2369    113      9   2334   1168    916   16423
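
The two rows shown are consistent with reading the table as the upper triangle of a symmetric matrix, where the last column sums all pairs a class takes part in, including the cell stored in an earlier row. The following sketch checks that reading against the printed totals; the variable and function names are illustrative, and only the figures visible in the two rows above are used.

```python
COLUMNS = ["N", "ADJ", "VFIN", "INF", "GER", "PCP", "ADV", "PROP"]

# Upper-triangle cells as printed: each row starts at its own column.
UPPER = {
    "N": [2188, 9273, 10959, 766, 6, 2197, 2057, 1940],
    "ADJ": [241, 2369, 113, 9, 2334, 1168, 916],
}

REPORTED_TOTALS = {"N": 29386, "ADJ": 16423}


def row_total(row):
    """Sum a row's own cells plus the mirrored cells from earlier rows."""
    i = COLUMNS.index(row)
    mirrored = [UPPER[COLUMNS[j]][i - j] for j in range(i)]
    return sum(mirrored) + sum(UPPER[row])


for row, reported in REPORTED_TOTALS.items():
    print(row, row_total(row), reported)
```

For the ADJ row, the mirrored cell is the N-ADJ figure 9273 from the first row; with it included, both computed sums match the printed totals.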

⁷⁶ This class contains both a kind of "closed subclass" and the open class in '-mente', but is here treated as one.

