
Eckhard Bick - VISL

(1) Language distribution and error type in unanalysable words

DOMAIN                                           NUMBER OF TOKENS   PERCENTAGE    ERROR TYPE
English                                                 77          12.8 (9.3)
French                                                  78          12.9 (3.7)
Italian                                                 10           1.7 (1.5)
Spanish                                                 28           4.6 (0.6)
German                                                  15           2.5 (0.2)
Latin                                                   24           4.0 (2.7)
orthographic variation (European/accentuation)         125          20.7          Correctables
other port. orthographic                                74          12.3          Misspellings
non-capitalised names and abbreviations                 37           6.1          Encyclopaedic lexicon failures
    names and name roots                                18           3.0
    abbreviations                                       19           3.1
root not found in lexicon                              119          19.7          Core lexicon failures
    found in Aurelio [69]                               91          15.1
    not found in Aurelio                                28           4.6
derivation/flexion problem                              15           2.5          Affix lexicon failures
    suffix                                               8           1.3
    prefix                                               3           0.5
    inflexion ending                                     2           0.3
    alternation information                              2           0.3
other                                                    2           0.3
SUM                                                    604         100.0

The table shows a roughly equal distribution of unanalysable words among three main groups: (a) foreign loan words, (b) spelling problems (the rows marked Correctables and Misspellings, shaded in the original table), and (c) lexicon failures (including abbreviations and name-derived words). Of course, the size of the spelling problem group will vary greatly with corpus quality and provenance. Also, the one-register corpus above is not typical with regard to loan word distribution. Ordinarily, as the numbers from the Borba-Ramsey corpus show, English has a larger and French a smaller share of the loan word pool. And while nearly non-existent in the literature corpus, scientific domain words can be quite prominent. Cf. the following percentages from the Borba-Ramsey corpus:

(2)

domain                   number   percentage
medical terms               129   5.0% (of all unanalysable words)
botanical terms              45   1.7% (of all unanalysable words)
pharmaceutical names        102   3.9% (of all unanalysable words)
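
Returning to table (1), the "roughly equal" three-way split can be checked by summing the token counts per group. The following is only a minimal sketch: the counts are copied from the table, the grouping follows the (a)/(b)/(c) division given in the text, and keeping the two "other" tokens as a separate group is an assumption made here, since the text does not assign them to any of the three.

```python
# Token counts copied from table (1).
# The grouping follows the (a)/(b)/(c) division described in the text.
# Assumption: the 2 tokens labelled "other" are kept as their own group,
# because the text does not say which main group they belong to.
counts = {
    "loan words": {
        "English": 77, "French": 78, "Italian": 10,
        "Spanish": 28, "German": 15, "Latin": 24,
    },
    "spelling problems": {
        "orthographic variation": 125,
        "other orthographic": 74,
    },
    "lexicon failures": {
        "non-capitalised names and abbreviations": 37,
        "root not found in lexicon": 119,
        "derivation/flexion problem": 15,
    },
    "other": {"other": 2},
}

# Grand total over all rows; this should match the SUM row of the table.
total = sum(sum(rows.values()) for rows in counts.values())

# Share of each group in percent, rounded to one decimal place.
shares = {
    group: round(100 * sum(rows.values()) / total, 1)
    for group, rows in counts.items()
}

print("total tokens:", total)
for group, pct in shares.items():
    print(f"{group}: {pct}%")
```

Under these assumptions the three main groups come out at a little under 40%, about a third, and a little under 30% of the 604 tokens, which is the sense in which the distribution is "roughly equal".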

69 Aurélio Buarque de Holanda Ferreira, “Novo Dicionário Aurélio”, second edition, Rio de Janeiro 1986

