21.04.2013 Views

Eckhard Bick - VISL


3.2 Morphological ambiguity in Portuguese

3.2.1 Overall morphological ambiguity

To assess the ambiguity problem in Portuguese quantitatively before writing disambiguation rules, I ran the morphological tagger on two larger chunks of corpus that were accessible to me at the time:

(a) a 630,000-word ECI excerpt from the Borba-Ramsey corpus of written Brazilian Portuguese

(b) a 132,000-word corpus derived from the on-line database of Brazilian literature in São Paulo (Rede Nacional de Pesquisa)

Table (1) shows the number and percentage of word form tokens with 0, 1, 2 ... 20 readings. The 1-readings row contains the figures for unambiguous cases; the 0-readings row covers recall failures.

(1) Table: morphological ambiguity in Portuguese

Number of   Number of word form tokens   %                    cumulative %
readings    mixed      literature        mixed   literature   mixed   literature
0           2108       479               0.3     0.4          0.3     0.4
1           290131     62527             46.1    47.4         46.4    47.7
2           149148     30860             23.7    23.4         70.1    71.1
3           74142      15075             11.8    11.4         81.9    82.5
4           81732      17126             13.0    13.0         94.9    95.5
5           23837      4209              3.8     3.2          98.7    98.7
6           6582       1437              1.0     1.1          99.7    99.8
7           1043       159               0.2     0.1          99.9    99.9
8           520        79                0.1     0.1          100.0   100.0
9           9          1                 -       -            -       -
10          37         15                -       -            -       -
11          16         2                 -       -            -       -
12          23         6                 -       -            -       -
13          4          0                 -       -            -       -
14          5          0                 -       -            -       -
15          6          3                 -       -            -       -
16          5          1                 -       -            -       -
17          1          1                 -       -            -       -
18          1          0                 -       -            -       -
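
The percentage and cumulative percentage columns of table (1) follow directly from the token counts. The Python sketch below is only an illustration, not the procedure actually used for the table: it recomputes both columns for the mixed corpus from the counts listed above. The function name `ambiguity_profile` and the variable names are hypothetical, and the total is assumed to be the sum of the rows for 0 to 18 readings, since no further rows are visible in this excerpt.

```python
# Token counts per number of readings (index = number of readings),
# copied from the "mixed" column of table (1).
# Assumption: any rows beyond 18 readings are ignored, so the total
# used here may differ marginally from the one behind the table.
mixed_counts = [2108, 290131, 149148, 74142, 81732, 23837, 6582, 1043, 520,
                9, 37, 16, 23, 4, 5, 6, 5, 1, 1]


def ambiguity_profile(counts):
    """Return (percents, cumulative_percents) for a list of token counts,
    where counts[n] is the number of tokens with n readings."""
    total = sum(counts)
    percents = [100.0 * c / total for c in counts]
    cumulative = []
    running = 0
    for c in counts:
        running += c  # tokens with at most this many readings
        cumulative.append(100.0 * running / total)
    return percents, cumulative


percents, cumulative = ambiguity_profile(mixed_counts)

# Print the rows for 0 to 8 readings, rounded to one decimal as in the table.
for n in range(9):
    print(f"{n:2d} {mixed_counts[n]:7d} {percents[n]:5.1f} {cumulative[n]:6.1f}")
```

Cumulative values are accumulated from the raw counts rather than from the rounded percentages, which avoids compounding rounding error from row to row.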
