25.12.2013 Views

Was sollen wir tun? Was dürfen wir glauben? - bei DuEPublico ...


12 DON’T ASK, LOOK!

(BNC); for American English, the Corpus of Contemporary American English (COCA). 6 The BNC is a relatively large, closed corpus of texts of written and spoken language. It contains approximately 100 million words in texts dating from 1960 to 1994. 7 COCA is not closed; every year approximately 20 million words are added. At present, the corpus contains about 450 million words from more than 175,000 texts dated from 1990 to the present (2012). 8 Both BNC and COCA are freely accessible for scientific purposes. The essentially identical search interfaces for both are provided by Brigham Young University in Provo, Utah. The available search algorithms are quite powerful, allowing queries for exact strings as well as lemmata (i.e., words disregarding inflexions, such as ‘hope’, ‘hopes’, ‘hoped’, and ‘hoping’). The corpora are annotated, allowing queries for strings of certain grammatical categories (e.g., ‘hope’ as a noun vs. a verb). It is possible to search for the co-occurrence of expressions within a distance of 10 words (unfortunately, this function ignores sentence boundaries).
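
To make the co-occurrence search concrete, here is a minimal Python sketch of a window-based query. It is not the BYU interface or its algorithm: the function name `cooccurrences`, the hand-written list of word forms standing in for a lemma, the simple regular-expression tokenizer, and the sample text are all assumptions made for illustration. Like the function described above, the sketch counts tokens only and ignores sentence boundaries.

```python
import re

# Hypothetical stand-in for a lemma: the inflected forms listed by hand.
HOPE_FORMS = {"hope", "hopes", "hoped", "hoping"}


def cooccurrences(text, node_forms, collocate_forms, window=10):
    """Return (node_index, collocate_index) token pairs in which a
    collocate form lies within `window` tokens of a node form.
    Sentence boundaries are ignored, as in the search described above."""
    # Naive tokenizer (an assumption): lower-case runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token not in node_forms:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] in collocate_forms:
                hits.append((i, j))
    return hits


# Made-up sample text; the hit spans two sentences.
sample = "She hoped for rain. He had no fear of storms."
print(cooccurrences(sample, HOPE_FORMS, {"fear"}))  # [(1, 7)]
```

The example hit crosses a sentence boundary, which is exactly the kind of result that has to be filtered by hand when the search function does not respect such boundaries.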

4.2 Four Benefits of Using Corpora

So, what are the benefits of using corpora?

First of all, corpora provide data on the basis of which hypotheses can be formulated; they provide data to confirm or falsify hypotheses and conclusions from the analytical process; and they provide data that can be used to exemplify or illustrate specific usages of interest.

And, secondly, all of these data are, by and large, unfiltered.

Not all corpora fulfil all of these functions equally well, of course. All corpora provide some linguistic context for the queried expressions, but wider contexts (more than one or two sentences) are not always provided. There are several corpora that, for copyright reasons, do not give free access to the texts that constitute their basic data. Yet a relatively thorough consideration of context may be required to formulate substantial and interesting hypotheses, especially when an analytical task is first approached. And sometimes the meaning of a word can be understood only when the wider context of its use is known.

Another important property of corpora is size. Hypotheses are best tested with a large corpus that, due to its size, contains rare linguistic phenomena. It is important to keep in mind that hypotheses claiming the non-existence of some phenomenon cannot be proved, and hypotheses claiming the existence of some phenomenon cannot be disproved by a corpus analysis. However, if a corpus is very large and comprises a balanced mixture of texts, we can base at least tentative claims about the existence or non-existence of phenomena on it.

Regardless of the width of the context provided, there are two further benefits of using corpora.

Thirdly, the contexts in which the queried expressions are found give insights into the variety of real-life situations in which the phenomenon referred to by the concept occurs.

And finally, the contexts often provide excellent raw material for thought experiments with regard to the concept and the phenomenon in question.

5. A Few Remarks on Other Options

There are two alternatives to the use of corpora that I would like to address briefly. The first is the use of Internet search engine queries.

6 The corpora are accessible at http://corpus.byu.edu/bnc/ and http://corpus.byu.edu/coca/.

7 The online interface dates the texts in BNC to “1970s–1993” (cf. http://corpus.byu.edu/bnc/). My deviation from this information is based on Leech, Rayson, and Wilson 2001: 1.

8 For the sake of comparison, the largest accessible German corpus, DEREKO, contains 10 million texts with 2.3 billion (10^9) words.
