25.12.2013 Views

Was sollen wir tun? Was dürfen wir glauben? - bei DuEPublico ...


BLUHM 11

Also, new dictionaries partly rely on older dictionaries to supply information about phenomena of language and their interpretation.⁴ To put it crudely, dictionary writers crib what other dictionary writers have written. This is hardly avoidable for reasons of economy, and it is also a reasonable thing to do: it is a scientific virtue to preserve knowledge that has already been gained. But there is no way to know, when consulting a dictionary, to what extent the authors of the dictionary have checked the material that they have inherited from their predecessors, that is, to what extent they are preserving not only past knowledge but past mistakes.

Finally, it is important to note that dictionaries rely on intuitions at various points. Older dictionaries, such as the Oxford English Dictionary, relied on quotations that were collected by informants and thus relied on the judgment and the passive linguistic competence of those informants. The collected quotations were then processed by the dictionary’s writers and editor, who have left their mark on the entries, as well.

4. The Benefits of Using Corpora in Linguistic Analysis

If ordinary language is important with respect to some philosophical endeavour, and if we want to avoid the potential errors I have pointed out, we need some basis on which our intuitions (as well as dictionaries’ information) can be tested, corrected, and extended. More particularly, we need independent, and thus unbiased, evidence that expressions in which we are interested are used in certain ways. Also, we need an independent basis for testing our hypotheses about the use of these expressions.

Linguistic text corpora can serve these functions and more. Before I go into that, let me briefly indicate what a corpus is.

4.1 Linguistic Text Corpora

Regrettably, a wholly convincing definition of ‘corpus’ is difficult to obtain. A very wide characterisation is as follows:

We define a corpus simply as “a collection of texts.” If that seems too broad, the one qualification we allow relates to the domains and contexts in which the word is used rather than its denotation: A corpus is a collection of texts when considered as an object of language or literary study. (Kilgarriff and Grefenstette 2003: 334)

Other rather broad characterisations point to collecting principles to distinguish corpora from mere collections of text. But it is doubtful whether these are clear criteria:

If a corpus is defined as a principled or structured [...] collection of texts, it has to be distinguished from a more arbitrary collection of material or “text database”. [...] The borderline between a well-defined corpus and a random collection of texts is unlikely to be a clear-cut one, however. (Hundt 2008: 170)

Let us just say that a corpus is a collection of texts (written or spoken) that serves as a primary database for supplying evidence with respect to some linguistic question. That might not be a fully satisfactory definition, but it will suffice for the present purpose.

The more sophisticated corpora are also annotated; they contain information, for example, on parts of speech.
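To make the working definition concrete, here is a minimal sketch of how an annotated corpus might be represented and queried. Everything in it is an assumption made for illustration: the two toy texts, the tag names (PRON, VERB, NOUN and so on), and the helper functions `concordance` and `tag_counts` are hypothetical and do not reflect the format or interface of any actual corpus.

```python
from collections import Counter

# Hypothetical toy corpus: two "texts", each a list of (token, part-of-speech) pairs.
# The tag names are illustrative only and not those of any real corpus.
CORPUS = {
    "text_a": [("We", "PRON"), ("know", "VERB"), ("the", "DET"),
               ("answer", "NOUN"), (".", "PUNCT")],
    "text_b": [("They", "PRON"), ("answer", "VERB"), ("the", "DET"),
               ("question", "NOUN"), (".", "PUNCT")],
}


def concordance(corpus, word, window=2):
    """Return one keyword-in-context tuple per occurrence of `word`."""
    hits = []
    for text_id, tokens in corpus.items():
        for i, (token, tag) in enumerate(tokens):
            if token.lower() == word.lower():
                # Up to `window` tokens of context on either side of the hit.
                left = " ".join(t for t, _ in tokens[max(0, i - window):i])
                right = " ".join(t for t, _ in tokens[i + 1:i + 1 + window])
                hits.append((text_id, left, token, tag, right))
    return hits


def tag_counts(corpus, word):
    """Count how often `word` occurs under each part-of-speech tag."""
    return Counter(
        tag
        for tokens in corpus.values()
        for token, tag in tokens
        if token.lower() == word.lower()
    )


for hit in concordance(CORPUS, "answer"):
    print(hit)
print(tag_counts(CORPUS, "answer"))
```

In this toy case the annotation separates the nominal from the verbal use of ‘answer’, which is the kind of independent evidence about usage that plain, unannotated text would leave to the reader’s intuition.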

There are many corpora that are freely accessible for scientific purposes. By way of example, let me name two suitable ones.⁵ For British English, there is the British National Corpus

⁴ Cf. Bergenholtz and Schaeder 1985: 292.

⁵ Comprehensive lists can be found in, e.g., Lee 2010 and Xiao 2008.
