
NOTE

Negative evidence and the raw frequency fallacy*

ANATOL STEFANOWITSCH

Introduction

There is little that is more completely accepted in the conventional wisdom of modern linguistics than the assumption that corpora do not contain negative evidence and that, therefore, intuition-based acceptability judgments are an indispensable part of linguistic methodology. This assumption goes back at least to Chomsky’s discussion of grammaticality in Syntactic Structures (Chomsky 1957: 15 ff.), whose claims can perhaps be excused on the basis that he was writing before the advent of modern corpus linguistics. More worrying, however, is that many modern corpus linguists still share this assumption.

For example, in what is otherwise one of the most thorough and most thoughtful textbooks on corpus linguistics currently available, McEnery and Wilson (2001: 11) ask:

Without recourse to introspective judgments, how can ungrammatical utterances be distinguished from ones that simply haven’t occurred yet? If our finite corpus does not contain the sentence:

*He shines Tony books.

how do we conclude that it is ungrammatical?

And without any discussion of potential alternatives they promptly give the following answer (McEnery and Wilson 2001: 12):

It is only by asking a native or expert speaker of a language for their opinion of the grammaticality of a sentence that we can hope to differentiate unseen but grammatical constructions from those which are simply grammatical but unseen.

They conclude their discussion by stating that “we [corpus linguists] must not eschew introspection entirely. If we do, detecting ungrammatical structures and ambiguous structures becomes difficult and, indeed, may be impossible.” (McEnery and Wilson 2001: 12).

Corpus Linguistics and Linguistic Theory 2-1 (2006), 61-77

DOI 10.1515/CLLT.2006.003

1613-7027/06/0002-0061

Walter de Gruyter



In this note, I would like to take issue with (a large part of) their argument. I will argue that the idea that corpora do not contain negative evidence is simply a special case of what I have termed the observed-frequency (or raw-frequency) fallacy, i. e., the belief that “[o]bserved frequencies of occurrence represent relevant facts for scientific analysis” (Stefanowitsch 2005: 296). When approached with the right methodological tools, corpora do provide negative evidence, i. e., evidence that allows us, in principle, to distinguish between constructions that did not occur but could have (these could be referred to as ‘accidentally absent’) and constructions that did not occur and could not have (these can be referred to as ‘significantly absent’ structures). Thus, while I do agree that linguists cannot (and should not) ‘eschew introspection entirely’, I will argue that they can (and largely should) eschew introspective judgments of acceptability.

Collostructional analysis and the significance of absence

In this section, I will address the general issue of how significant absences of a particular configuration of linguistic elements can be distinguished from accidental ones, using as an example the ‘ability’ or ‘inability’ of English verbs to occur with ditransitive complementation. The choice of this example is motivated primarily by practical considerations: as will presently become clear, the method I will use requires the researcher to extract exhaustively from a corpus all occurrences of the grammatical phenomenon in question. Ditransitive complementation happens to be one of the features that is relatively uncontroversially tagged in the largest grammatically annotated balanced corpus currently available, the British component of the International Corpus of English (ICE-GB, cf. Nelson et al. 2002). However, it is a welcome coincidence that this is precisely the complementation pattern that McEnery and Wilson chose to demonstrate the need for grammaticality judgments. 1

The relevant method is one of several that Gries and I have developed in a series of publications specifically for the purpose of investigating the relationship between grammatical constructions and the words occurring in them, and that we refer to collectively as collostructional analysis (cf. e. g., Stefanowitsch and Gries 2003, 2005, to appear a; Gries and Stefanowitsch 2004a, b, to appear). 2 The most basic of these methods, simple collexeme analysis, allows the researcher to identify words that occur significantly more or less frequently than expected in a given slot of a construction. This is done on the basis of a standard 2-by-2 contingency table containing four observed frequencies: (a) the frequency of a given word in a particular slot of a given construction, (b) the frequency of the same word in the corresponding slots of all other constructions, (c) the frequency of all other words in the relevant slot of the construction under investigation, and (d) the frequency of all other words in the corresponding slot of all other constructions. From these frequencies, we can derive the expected frequency of occurrence of the word in the construction, which allows us to determine whether and in what direction the observed frequency deviates from the expected frequency and whether this deviation is statistically significant. As an example, consider Table 1, which shows the relevant contingency table for the verb give and the ditransitive complementation pattern in the British Component of the International Corpus of English (ICE-GB) (expected frequencies are shown in parentheses).

Table 1. Give with ditransitive complementation in the ICE-GB

            Ditransitive    ¬Ditransitive    Total
give        560             531              1,091
            (14.57)         (1,076.43)
¬give       1,264           134,196          135,460
            (1,809.43)      (133,650.57)
Total       1,824           134,727          136,551
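The expected frequencies in Table 1 follow from the marginal totals alone: each expected cell value is the row total multiplied by the column total, divided by the grand total. The following is a minimal sketch of that arithmetic, not part of the original analysis; the dictionary layout, the label `other_verbs`, and the function name `expected_frequencies` are illustrative assumptions.

```python
# Observed frequencies from Table 1 (give vs. all other verbs in the ICE-GB).
# The cell labels are illustrative; "other" stands for all non-ditransitive patterns.
observed = {
    ("give", "ditransitive"): 560,
    ("give", "other"): 531,
    ("other_verbs", "ditransitive"): 1264,
    ("other_verbs", "other"): 134196,
}


def expected_frequencies(obs):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = {}
    col_totals = {}
    for (row, col), count in obs.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(obs.values())
    return {
        (row, col): row_totals[row] * col_totals[col] / grand_total
        for (row, col) in obs
    }


expected = expected_frequencies(observed)
for cell, value in expected.items():
    print(cell, round(value, 2))
```

Comparing the observed 560 with the expected value for the cell (give, ditransitive) gives the direction of the deviation; whether that deviation is significant is a separate question that the exact test addresses.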

As Table 1 shows, give occurs vastly more frequently than expected with ditransitive complementation; the Fisher-Yates exact test shows that this difference is highly significant (p < 4.94e-324, the smallest number a typical current home-issue computer can handle). In collostructional analysis, we usually take the p-value directly as a measure of association strength (cf. Pedersen 1996 and Stefanowitsch and Gries 2003: 238 f. for justification). In other words, the extremely small p-value is taken to be an indication of an extremely strong association between give and the ditransitive complementation pattern.

Repeating this procedure for all verbs occurring with ditransitive complementation in the ICE-GB allows us to rank all verbs first, by whether they occur more or less frequently than expected, and second, by association strength. Words that occur more frequently than expected are referred to as attracted collexemes (the strength of their positive association can be referred to as attraction strength), words that occur less frequently are referred to as repelled collexemes (with a corresponding repulsion strength). For example, all verbs occurring significantly more frequently than expected are shown in Table 2. The significance level of 0.05 was corrected for multiple testing using a simple Bonferroni correction (Bonferroni 1936) whereby the significance level is divided by the number of tests. Since the ICE-GB contains 4,856 verb types, this gives us 0.05/4,856 = 1.03E-05. 3
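The correction just described is a single division, and it can be illustrated as follows. This is only a sketch: the function name `bonferroni_threshold` is an illustrative assumption, and the handful of p-values are copied from Tables 2 and 3 on the assumption that their exponents are negative.

```python
ALPHA = 0.05
N_TESTS = 4856  # number of verb types in the ICE-GB, as given in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide the nominal significance level by the number of tests."""
    return alpha / n_tests


# A few p-values as printed in Tables 2 and 3 (exponents read as negative).
reported_p = {
    "send": 4.13e-76,
    "give back": 9.42e-06,
    "make": 3.39e-08,
    "do": 2.56e-07,
    "find": 7.96e-04,
    "keep": 3.95e-02,
}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
significant = sorted(verb for verb, p in reported_p.items() if p < threshold)

print(f"corrected threshold: {threshold:.2e}")
print("significant after correction:", significant)
```

Of the Table 3 verbs in this sample, only make and do fall below the corrected threshold, which agrees with the remark that only the first two repelled collexemes are significant at corrected levels.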


Table 2. Significantly attracted collexemes of the ditransitive in the ICE-GB
(F_O = observed frequency, F_E = expected frequency, FYE = Fisher-Yates exact test)

Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
give         1,091       560         14.57       0.00E+000
tell         792         493         10.58       0.00E+000
send         295         78          3.94        4.13E-076
ask          504         92          6.73        9.65E-074
show         628         84          8.39        5.15E-056
offer        196         54          2.62        3.73E-054
convince     32          23          0.43        1.70E-036
cost         65          23          0.87        9.04E-027
inform       55          20          0.73        9.57E-024
teach        92          23          1.23        7.94E-023
assure       19          13          0.25        1.04E-020
remind       41          16          0.55        7.25E-020
lend         31          12          0.41        3.48E-015
promise      43          12          0.57        3.26E-013
owe          25          9           0.33        2.24E-011
grant        26          9           0.35        3.38E-011
warn         38          10          0.51        5.94E-011
award        16          7           0.21        7.72E-010
persuade     33          8           0.44        1.03E-008
allow        326         20          4.35        2.59E-008
guarantee    27          7           0.36        5.27E-008
deny         51          8           0.68        3.82E-007
earn         56          8           0.75        8.03E-007
hand         16          5           0.21        1.63E-006
pay          395         18          5.28        8.66E-006
give back    4           3           0.05        9.42E-006

The list of verbs in this table could now serve as a basis for a variety of observations, for example about the meaning of the ditransitive complementation pattern. I will not pursue this issue here (but cf. Stefanowitsch and Gries 2003, Section 3.2.2). 4 Instead, let me point out two facts about the way that the label ‘ditransitive’ is applied in the ICE-GB. First, structures with nominal and with clausal direct objects are included under this label (i. e., uses like She told me that she wants to be free of lawyers and doctors [ICE GB s2a062 133] or I told him to drive the forklift truck [ICE GB s2a067 050] as well as the more obvious I’ve told you the truth [ICE GB w2 f.006 213]). Second, some verbs are tagged as ditransitive whose second object might be better analyzed as an oblique argument (e. g., cost, as in It cost them three quid [ICE-GB s1a007 054]). 5 In other words, the label is applied rather generously.

Next, consider Table 3, which shows the significantly repelled collexemes, sorted by repulsion strength (only the first two are significant at corrected levels).

Table 3. Significantly repelled collexemes of the ditransitive in the ICE-GB

Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
make         1,865       3           24.91       3.39E-008
do           2,937       12          39.23       2.56E-007
find         854         2           11.41       7.96E-004
call         616         1           8.23        2.32E-003
keep         374         1           5.00        3.95E-002

One could now ask why verbs that occur in a given construction might do so less frequently than expected. There are several reasons, some of them more interesting than others. First, verbs may appear on this list because they are incorrectly tagged (in this case, call, which is tagged as ‘ditransitive’ in the utterance And the person who’s being called [ICE-GB s1a030 003]). Obviously, such incorrect tags are hard to eliminate completely once a corpus reaches a certain size. Second, some verbs appear on this list because their ditransitive uses are very restricted, in some cases to a single fixed expression (in this case, keep s. o. company). Finally, most verbs appear on this list because they occur very frequently with other complementation patterns (this is most obvious for the high-frequency verbs make and do, but it is also true of find). What one can take away from a discussion of such cases is, first, that fixed expressions must be taken into account in any linguistic analysis, and second, that complementation patterns exhibit a certain amount of productivity, occurring at least occasionally with verbs whose dominant patterns are others (both facts are unsurprising from the perspective of construction grammar, in which collostructional analysis first developed).

However, the data in Table 3 do not speak directly to the issue of negative evidence yet: a further step is necessary. In our previous work, we have referred to as ‘repelled’ only those words which do occur in a given construction but do so less frequently than expected; however, as we noted in passing in our first paper (cf. Stefanowitsch and Gries 2003: 238), it is possible, and perhaps logical, to include in this category words that would have been expected to occur in the construction based on their overall frequency in the corpus, but did not, in fact, occur in the construction at all. This is the step that finally takes us to the issue of negative evidence: The range of frequencies of occurrence that can be evaluated for statistical significance includes the limiting case of zero; in other words, the non-occurrence of a particular configuration of linguistic categories (for example, of a particular verb in a particular construction) can be compared to its expected frequency of occurrence. This will allow us, in many cases, to determine whether an unseen construction is likely to be a possible construction of a language or not.

Consider Table 4, which shows the contingency tables for three verbs that do not occur with ditransitive complementation in the ICE-GB: say, explain, and whisper.

Table 4. Three verbs that do not occur with ditransitive complementation in the ICE-GB

a. say

            Ditransitive    ¬Ditransitive    Total
say         0               3,333            3,333
            (44.52)
¬say        1,824           131,394          133,218
Total       1,824           134,727          136,551

b. explain

            Ditransitive    ¬Ditransitive    Total
explain     0               172              172
            (2.30)
¬explain    1,824           134,555          136,379
Total       1,824           134,727          136,551

c. whisper

            Ditransitive    ¬Ditransitive    Total
whisper     0               5                5
            (0.07)
¬whisper    1,824           134,722          136,546
Total       1,824           134,727          136,551

On a priori grounds, we might expect all three verbs to allow ditransitive complementation, since they are all reasonably close in meaning to one of the most strongly attracted collexemes of this pattern, tell (and other verbs of communication occurring among the significantly attracted collexemes; e. g. ask, inform, teach, assure). On the other hand, they are textbook cases in the linguistic literature of verbs not allowing ditransitive complementation (cf. e. g., Pinker 1989).

Table 4a provides conclusive evidence that the linguistic literature is right in the case of say, whose repulsion strength meets the corrected level of significance (p = 1.96E-20; < 1.03E-05). We can confidently claim that the combination [say + ditransitive] is significantly absent. In the case of explain, the repulsion strength does not meet even the uncorrected level (p = 0.099; > 0.05), although it is not too far off. The verb is simply not frequent enough in the ICE-GB to let us determine whether its non-occurrence is accidental or significant, although its marginal statistical significance may lead us to suspect the latter. No such suspicion would be warranted in the case of whisper, whose non-occurrence is well within the range of accidental variation (p = 0.935; > 0.05).
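When the observed frequency is zero, the lower-tail exact test has a particularly simple form: the p-value is the hypergeometric probability that none of the verb's tokens falls among the ditransitives. The sketch below computes this from the marginals of Table 4. It assumes that the reported values are one-tailed probabilities of this kind; the function name `p_zero_occurrences` and the constant names are illustrative, not taken from the original analysis.

```python
import math


def p_zero_occurrences(verb_freq: int, construction_freq: int, corpus_size: int) -> float:
    """Hypergeometric P(X = 0): no token of the verb occurs in the construction.

    The product of (N - K - i) / (N - i) for i = 0 .. n-1 is accumulated in
    log space so that very small probabilities do not underflow prematurely.
    """
    log_p = sum(
        math.log((corpus_size - construction_freq - i) / (corpus_size - i))
        for i in range(verb_freq)
    )
    return math.exp(log_p)


ICE_GB_PATTERNS = 136_551      # grand total in Table 4
ICE_GB_DITRANSITIVES = 1_824   # column total in Table 4

verb_freqs = {"say": 3_333, "explain": 172, "whisper": 5}
p_values = {
    verb: p_zero_occurrences(freq, ICE_GB_DITRANSITIVES, ICE_GB_PATTERNS)
    for verb, freq in verb_freqs.items()
}

for verb, p in p_values.items():
    print(f"{verb}: p = {p:.3g}")
```

The three results differ only because the verbs differ in overall frequency; the proportion of ditransitives in the corpus is the same in each case.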

Before discussing these issues any further, let us take a look at the results we get when we apply simple collexeme analysis exploratively to all verbs that occur in the ICE-GB but not in the ditransitive. There are 4,856 verb types in the ICE-GB (according to my definition, which lists phrasal verbs as separate types, see Footnote 4). Of these, 4,782 do not occur in the ditransitive. In turn, this non-occurrence is significant only for 53 verbs (of which only 11 meet the corrected level of significance). Table 5 shows the significantly repelled collexemes.

Table 5. Significantly repelled collexemes of the ditransitive in the ICE-GB

Collexeme    F(Corpus)   F_O(Ditr)   F_E(Ditr)   FYE p-value
be           25,416      0           340.00      4.29E-165
be|have      6,261       0           83.63       3.66E-038
have         4,303       0           57.48       2.90E-026
think        3,335       0           44.55       1.90E-020
say          3,333       0           44.52       1.96E-020
know         2,120       0           28.32       3.32E-013
see          1,971       0           26.33       2.54E-012
go           1,900       0           25.38       6.69E-012
want         1,256       0           16.78       4.27E-008
use          1,222       0           16.32       6.77E-008
come         1,140       0           15.23       2.06E-007
look         1,099       0           14.68       3.59E-007

Significant at uncorrected significance levels:

try          749         0           10.00       4.11E-005
mean         669         0           8.94        1.21E-004
work         646         0           8.63        1.65E-004
like         600         0           8.01        3.08E-004
feel         593         0           7.92        3.38E-004
become       577         0           7.71        4.20E-004
happen       523         0           6.99        8.70E-004
put          513         0           6.85        9.96E-004
talk         490         0           6.55        1.36E-003
hear         483         0           6.45        1.49E-003
need         420         0           5.61        3.49E-003
believe      397         0           5.30        4.76E-003
provide      380         0           5.08        5.99E-003
live         378         0           5.05        6.16E-003
remember     373         0           4.98        6.59E-003
produce      328         0           4.38        1.21E-002
speak        323         0           4.31        1.29E-002
hope         316         0           4.22        1.42E-002
run          309         0           4.13        1.56E-002
change       306         0           4.09        1.63E-002
meet         303         0           4.05        1.69E-002
help         301         0           4.02        1.74E-002
start        294         0           3.93        1.91E-002
move         291         0           3.89        1.99E-002
seem         285         0           3.81        2.16E-002
agree        279         0           3.73        2.34E-002
lead         271         0           3.62        2.60E-002
expect       265         0           3.54        2.82E-002
consider     264         0           3.53        2.86E-002
suggest      259         0           3.46        3.06E-002
describe     259         0           3.46        3.06E-002
decide       259         0           3.46        3.06E-002
understand   250         0           3.34        3.46E-002
hold         249         0           3.33        3.50E-002
require      244         0           3.26        3.75E-002
involve      242         0           3.23        3.85E-002
suppose      241         0           3.22        3.90E-002
include      236         0           3.15        4.17E-002
occur        233         0           3.11        4.35E-002
develop      233         0           3.11        4.35E-002
go on        231         0           3.09        4.46E-002
follow       227         0           3.03        4.71E-002

Two things about this table require discussion. First, it demonstrates that even a one-million-word corpus is too small to allow us to identify significant absences for more than a handful of cases (at least for a relatively rare pattern such as ditransitive complementation). I will discuss this problem in the remainder of this section and in the next section. Second, the results only tell us that a particular structure is significantly absent; they do not, as pointed out in the introduction, tell us why it is significantly absent. I will return to this problem in the final section.
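The corpus-size problem can be made concrete by asking how frequent a verb must be before its complete absence from the ditransitive becomes significant at all. The following sketch searches for the smallest such frequency in the ICE-GB, again assuming that the p-value for a zero count is the hypergeometric probability of no occurrences; the function name `min_frequency_for_significant_absence` is an illustrative assumption.

```python
import math


def min_frequency_for_significant_absence(alpha, construction_freq, corpus_size):
    """Smallest verb frequency n for which P(X = 0) drops below alpha.

    Each additional token multiplies P(X = 0) by (N - K - (n - 1)) / (N - (n - 1)),
    so the log probability is accumulated token by token until it crosses log(alpha).
    """
    log_alpha = math.log(alpha)
    log_p = 0.0
    for n in range(1, corpus_size - construction_freq + 1):
        log_p += math.log(
            (corpus_size - construction_freq - (n - 1)) / (corpus_size - (n - 1))
        )
        if log_p < log_alpha:
            return n
    return None  # absence can never reach significance at this level


ICE_GB_PATTERNS = 136_551
ICE_GB_DITRANSITIVES = 1_824
N_VERB_TYPES = 4_856

n_uncorrected = min_frequency_for_significant_absence(
    0.05, ICE_GB_DITRANSITIVES, ICE_GB_PATTERNS
)
n_corrected = min_frequency_for_significant_absence(
    0.05 / N_VERB_TYPES, ICE_GB_DITRANSITIVES, ICE_GB_PATTERNS
)
print("minimum frequency, uncorrected:", n_uncorrected)
print("minimum frequency, Bonferroni-corrected:", n_corrected)
```

Under these assumptions, the uncorrected threshold lies a little below the frequency of follow (227), the least frequent verb in Table 5, and the corrected threshold lies between the frequencies of try (749) and look (1,099), which is where the table draws the line between the two significance levels.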

The problem of insufficient corpus size can ultimately only be solved by the creation of larger grammatically annotated corpora. However, in many individual cases it is possible to arrive at a fairly safe conclusion using currently available non-annotated corpora. Take the case of explain. In the 100-million-word British National Corpus (the largest balanced corpus of British English currently available), the verb explain occurs 18,334 times, but not once with ditransitive complementation. In Stefanowitsch and Gries (2003: 219), we estimated that the BNC contains 10,206,300 complementation patterns overall. If we assume that the proportion of ditransitives in the BNC is the same as in the ICE-GB, then the BNC should contain 136,332 ditransitives. Given these figures, we can now calculate the expected frequency of occurrence of explain in the ditransitive: 245. The difference between this and the observed frequency of zero is highly significant (at uncorrected levels of significance; p = 6.73E-108; < 0.001). Thus, the combination [explain + ditransitive] can be categorized as a significantly absent structure based on negative corpus evidence. This strategy works even with a low-frequency verb like whisper, which occurs only 2,976 times in the BNC, but, again, does not occur in the ditransitive. Under the assumptions just outlined, the expected frequency of [whisper + ditransitive] is 40. Again, the difference is highly significant (at uncorrected levels; p = 4.139019E-18; < 0.001). 6 Repeating individual tests on a larger corpus will, of course, not invariably lead to the conclusion that a given structure is significantly absent. In many cases, the frequency of a verb will remain too low to yield significant results. For example, the verb oxidise, like whisper, occurs five times in the ICE-GB, never in the ditransitive. In the BNC, it occurs 99 times, also never in the ditransitive. The expected frequency in this case is 1 (1.32, to be precise), and the difference between this and the observed frequency of zero is still far too small to reach statistical significance (p = 0.26; > 0.05). In other cases, extending a search to a larger corpus will fail to replicate the zero occurrence in the smaller corpus. For example, the BNC contains one clear example of donate with ditransitive complementation (1a), and a second potential one (1b):

(1) a. Saudi king donates Laura transplant money. The king of Saudi Arabia has donated a hundred and fifty thousand pounds to Laura Davies ... (K1N)

    b. ... if the villagers hadn’t so kindly donated her furnishings, she’d probably still be existing in empty rooms ... (H95). 7

Faced with such examples, there is no longer any reason to believe that [donate + ditransitive] is a significantly absent structure.
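The BNC figures above can be retraced step by step: the ICE-GB proportion of ditransitives is projected onto the estimated number of complementation patterns in the BNC, and each verb's expected frequency and zero-occurrence probability follow from that projection. The sketch below does this under the same assumptions as the text, plus the assumption that the reported p-values are hypergeometric probabilities of zero occurrences; the function name `expected_and_p_zero` and the constant names are illustrative.

```python
import math

ICE_GB_PATTERNS = 136_551
ICE_GB_DITRANSITIVES = 1_824
BNC_PATTERNS = 10_206_300  # estimate from Stefanowitsch and Gries (2003: 219)

# Project the ICE-GB proportion of ditransitives onto the BNC.
bnc_ditransitives = round(BNC_PATTERNS * ICE_GB_DITRANSITIVES / ICE_GB_PATTERNS)


def expected_and_p_zero(verb_freq: int):
    """Expected ditransitive frequency and hypergeometric P(X = 0) for one verb."""
    expected = verb_freq * bnc_ditransitives / BNC_PATTERNS
    log_p = sum(
        math.log((BNC_PATTERNS - bnc_ditransitives - i) / (BNC_PATTERNS - i))
        for i in range(verb_freq)
    )
    return expected, math.exp(log_p)


bnc_verb_freqs = {"explain": 18_334, "whisper": 2_976, "oxidise": 99}
results = {verb: expected_and_p_zero(freq) for verb, freq in bnc_verb_freqs.items()}

print("estimated ditransitives in the BNC:", bnc_ditransitives)
for verb, (expected, p) in results.items():
    print(f"{verb}: expected = {expected:.2f}, p = {p:.3g}")
```

The projection step is the weakest link in this chain: if the true proportion of ditransitives in the BNC differs from that in the ICE-GB, every expected frequency and p-value shifts with it.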

Thus, the methodological problem of insufficiently large corpora is not, in principle, an argument against replacing intuitive grammaticality judgments by negative corpus evidence (in practice it may be, a point which I will return to below). Instead, the preceding discussion shows that negative corpus evidence can be adduced for a pet case of syntactic theorizing (note that among the significantly absent collexemes of the ditransitive complementation pattern in Table 5, there are a large number of famously non-ditransitive verbs, e. g., suggest, provide, say, describe, etc.).

One may now argue that even if such negative corpus evidence can be obtained, it does not add any insights that could not also be arrived at by introspective acceptability judgments: at best (e. g., for say, explain, whisper), it will confirm what we know from intuition anyway; in the worst case, it will never yield enough data to decide the issue (e. g., oxidise) or contradict generally agreed-upon acceptability judgments (e. g., donate). There are two reasons why this argument is wrong: first, unlike acceptability judgments, negative corpus evidence meets the standards of scientific research. Second, it is only such corpus evidence that will allow us to make principled statements about what is and is not possible: if we hypothesize, based on acceptability judgments, that whisper + ditransitive is impossible, a single counterexample can prove this wrong. While such counterexamples may not occur even in very large corpora, they are still easy to come by in the age of the Internet. A web search quickly turns up counterexamples produced by native speakers of (British) English, both with clausal and, more crucially, with nominal objects:

(2) a. ... when I first beheld you the instinct of Nature whispered me that we were in some degree related ... (Jane Austen, Love and Friendship, Letter 11)

    b. She had not been allowed to ... to bury the two people she had loved most in the world ... to whisper them a last goodbye. (Meg Hutchinson, Peppercorn Woman)

Of course, <strong>the</strong>se examples can in turn be questioned; <strong>the</strong>y may reflect<br />

older stages of <strong>the</strong> language (Jane Austen wrote Love <strong>and</strong> Friendship<br />

around 1790), <strong>the</strong>y may reflect regional dialects (Meg Hutchinson lives<br />

in Staffordshire), etc. However, <strong>the</strong> fact remains that we have objective<br />

au<strong>the</strong>ntic examples pitched against subjective intuition. In contrast, <strong>the</strong><br />

negative corpus <strong>evidence</strong> obtained from <strong>the</strong> BNC gives us an objective<br />

basis for arguing that, even though utterances like (2) can occur, <strong>the</strong>y<br />

are very marginal. This allows us to uphold <strong>the</strong> useful generalization<br />

that communication verbs can be used with ditransitive syntax if <strong>the</strong>y<br />

refer to <strong>the</strong> type of message communicated, but not if <strong>the</strong>y refer to <strong>the</strong><br />

manner in which it is communicated (Pinker 1989), even though we have<br />

to reformulate it as a strong statistical tendency instead of a categorical<br />

constraint. The same holds for donate. Even if we generously admit both<br />

(1a <strong>and</strong> 1b) as counterexamples, we can still point out that donate would



have been expected to occur 14 times (based on <strong>the</strong> assumptions above),<br />

<strong>and</strong> thus still constitutes a strongly repelled collexeme (p 0.0001;<br />



whe<strong>the</strong>r zero deviates significantly from this expected <strong>frequency</strong> (<strong>the</strong> sufficient<br />

condition for upholding <strong>the</strong> hypo<strong>the</strong>sis). This information will<br />

likely be more difficult to obtain or estimate than information about<br />

complementation patterns, but to do so is by no means impossible.<br />
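The test described here, whether an observed frequency of zero deviates significantly from an expected frequency, can be sketched in a few lines.<br />
The sketch below is only an illustration under a simplifying assumption: each occurrence of the word is treated as an independent trial, so a plain binomial model is used.<br />
Collostructional analysis is normally run as an exact test on a contingency table, so the values will differ somewhat; the function names and the example figures are hypothetical.<br />

```python
import math


def p_zero_occurrences(word_freq: int, expected: float) -> float:
    """Probability of seeing the word zero times in the construction.

    Assumption (not from the note): each of the word's `word_freq` corpus
    occurrences independently falls into the construction with probability
    expected / word_freq, i.e. a simple binomial model.
    """
    if word_freq <= 0 or not 0 <= expected <= word_freq:
        raise ValueError("need word_freq > 0 and 0 <= expected <= word_freq")
    rate = expected / word_freq
    if rate == 1.0:
        return 0.0  # every occurrence is expected in the construction
    # (1 - rate) ** word_freq, computed in log space for numerical stability
    return math.exp(word_freq * math.log1p(-rate))


def absence_is_significant(word_freq: int, expected: float,
                           alpha: float = 0.05) -> bool:
    """True if zero observed occurrences is significantly below expectation."""
    return p_zero_occurrences(word_freq, expected) < alpha


if __name__ == "__main__":
    # Hypothetical figures: a verb with 1,000 tokens and 5 expected hits.
    print(p_zero_occurrences(1000, 5))
    print(absence_is_significant(1000, 5))
```

Under this model, absence becomes significant only once the expected frequency is high enough; a rare word with an expected frequency near one can never be shown to be significantly absent, however large the corpus search.<br />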

Any hypo<strong>the</strong>sis about possible <strong>and</strong> impossible structures in language<br />

is ultimately a hypo<strong>the</strong>sis about <strong>the</strong> incompatibility of two (or more)<br />

linguistic categories. As long as <strong>the</strong>se categories can be operationalized<br />

in such a way that <strong>the</strong>y can be exhaustively annotated (or identified<br />

spontaneously) in a corpus of naturally occurring language, <strong>and</strong> as long<br />

as <strong>the</strong> corpus is large enough, this corpus can provide both positive <strong>and</strong><br />

negative <strong>evidence</strong>. The first condition should always be met: if a category<br />

cannot be operationalized for objective identification, it has no place in<br />

a linguistic <strong>the</strong>ory. The second condition is not currently met. There are<br />

several syntactically annotated corpora (for example, <strong>the</strong> Penn Treebank,<br />

Sampson’s SUSANNE <strong>and</strong> CHRISTINE corpora, <strong>and</strong> <strong>the</strong> ICE-GB used in this<br />

note), but <strong>the</strong>y are ei<strong>the</strong>r too small for many research questions, or <strong>the</strong>ir<br />

annotation scheme is too coarse or too unreliable, or both. However,<br />

this cannot seriously be used as a defense of <strong>the</strong> introspective method.<br />

Instead, it must be used as an argument for <strong>the</strong> funding <strong>and</strong> <strong>the</strong> human<br />

resources necessary for <strong>the</strong> construction of large grammatically annotated<br />

corpora. A discipline can only get so far by thought experiments (if<br />

that is what acceptability judgments are). It begins to make substantial<br />

headway only when it faces up to <strong>the</strong> problem of data scarcity <strong>and</strong> solves<br />

it. Astronomers have built radio telescopes, physicists have built particle<br />

colliders, <strong>and</strong> geneticists have sequenced <strong>the</strong> human genome; linguists<br />

should be able to construct large, balanced, syntactically annotated corpora<br />

of at least <strong>the</strong> world’s major languages. But even until this goal is<br />

reached (or, more likely, in case it is never reached), corpora can yield<br />

both positive <strong>and</strong> negative <strong>evidence</strong> for <strong>the</strong> construction of linguistic<br />

<strong>the</strong>ories.<br />

Final remarks: <strong>the</strong> occurring <strong>and</strong> <strong>the</strong> non-occurring<br />

The main point of this note was to show that corpora contain negative<br />

<strong>evidence</strong> <strong>and</strong> that this negative corpus <strong>evidence</strong> can, <strong>and</strong> should, replace<br />

introspective acceptability judgments. It seems appropriate, however, to<br />

discuss <strong>the</strong> most important <strong>the</strong>oretical implications of such a step.<br />

First, from <strong>the</strong> perspective advocated here, <strong>the</strong> non-occurrence of a<br />

particular linguistic structure is merely <strong>the</strong> limiting case; it is not qualitatively<br />

different from very rare occurrences. This may seem to be a problem<br />

for an approach that argues for an absolute distinction between<br />

possible <strong>and</strong> impossible configurations of linguistic categories (for example,<br />

between grammatical <strong>and</strong> ungrammatical structures). This problem



may be more apparent than real, however. The continuum between significantly<br />

rare <strong>and</strong> significantly absent structures is not fundamentally<br />

different from <strong>the</strong> continuum between various degrees of unacceptability<br />

that is regularly found for acceptability ratings. In both cases, <strong>the</strong> data<br />

must be viewed in light of one’s <strong>the</strong>ory of language in order to make<br />

sense of this continuum. Also, it may well be possible to identify a degree<br />

of improbability that is close enough to impossibility to be indistinguishable<br />

from it.<br />

Second, while <strong>the</strong> statistically significant absence (or rareness) of a<br />

particular configuration of grammatical categories can be taken as <strong>evidence</strong><br />

that this configuration is impossible (i. e., very improbable), it does<br />

not, in itself, provide any clues as to why this should be <strong>the</strong> case. Again,<br />

<strong>the</strong> same is true of introspective judgments. Chomsky pointed this out<br />

early on: “The notion ‘acceptable’ is not to be confused with ‘grammatical’.<br />

Acceptability belongs to <strong>the</strong> study of performance whereas grammaticalness<br />

belongs to <strong>the</strong> study of competence” (Chomsky 1965: 11). A<br />

linguistic structure may give rise to introspective judgments of unacceptability<br />

for a number of reasons, of which ungrammaticality (or, more<br />

generally, failure to conform to general linguistic rules) is just one. What<br />

that reason is must be determined independently of <strong>the</strong> acceptability<br />

judgment. The same is true of significantly absent (or rare) structures:<br />

determining significant absence/rareness is just <strong>the</strong> first step of a linguistic<br />

analysis. The second step is to determine <strong>the</strong> reasons for <strong>the</strong> significant<br />

absence/rareness. This step can be much closer to traditional linguistic<br />

argumentation. First, it may involve <strong>the</strong> search for au<strong>the</strong>ntic counterexamples<br />

(as in <strong>the</strong> case of whisper above) in order to test <strong>the</strong> extent of<br />

this absence. This may uncover variation in <strong>the</strong> data (panchronic, regional,<br />

social, etc.) or particular contexts in which seemingly impossible<br />

structures become possible. Second, it may involve constructing examples<br />

in order to determine whe<strong>the</strong>r <strong>the</strong> significant absence is semantically<br />

determined. If <strong>the</strong> constructed examples are not interpretable, <strong>the</strong> absence<br />

may simply be due to semantic incompatibility. For example, no<br />

interpretation can be assigned to He knew her <strong>the</strong> answer or She saw<br />

him <strong>the</strong> light. If <strong>the</strong> constructed examples are interpretable, <strong>the</strong>ir absence<br />

cannot be due to semantic incompatibility but may instead have purely<br />

formal reasons. For example, He said her <strong>the</strong> answer or She put him <strong>the</strong><br />

book are straightforwardly interpretable (of course, <strong>the</strong>re may be more<br />

fine-grained semantic restrictions as <strong>the</strong> huge literature on ditransitives<br />

shows). In o<strong>the</strong>r words, while I argue against <strong>the</strong> use of acceptability<br />

judgments as a linguistic method, I do not argue against <strong>the</strong> use of interpretation.<br />

There is good reason for this distinction, which I am not <strong>the</strong><br />

first to point out: interpreting utterances is a natural human activity,<br />

judging <strong>the</strong>ir acceptability is not.



Third, while it is plausible to speak of different degrees of attraction<br />

or repulsion in <strong>the</strong> case of combinations that do occur, it is less clear<br />

whe<strong>the</strong>r it makes sense to speak of different degrees of absence, as <strong>the</strong><br />

ranking of significantly absent collexemes in Table 5 suggests. Methodologically,<br />

this ranking merely reflects <strong>the</strong> certainty with which we can<br />

say that a structure is impossible. One may (but need not) argue, though,<br />

that this certainty reflects <strong>the</strong> certainty of a native speaker, in which case<br />

<strong>the</strong> ‘degrees of absence’ do become relevant to <strong>the</strong>oretical considerations.<br />

Whe<strong>the</strong>r <strong>the</strong> predictions of such a view are borne out by empirical data<br />

remains to be seen.<br />

More generally, it seems to me that accepting <strong>the</strong> methodology I have<br />

argued for here may lead to a slight but pervasive reorientation of linguistic<br />

<strong>the</strong>ory. If we accept significant presence <strong>and</strong> significant absence<br />

(as well as significant <strong>frequency</strong> <strong>and</strong> rareness) as <strong>the</strong> primary facts that<br />

a linguistic <strong>the</strong>ory must explain, <strong>the</strong>n this <strong>the</strong>ory will have to be broader<br />

than most current <strong>the</strong>ories. Ra<strong>the</strong>r than focusing exclusively on grammaticality,<br />

such a <strong>the</strong>ory would have to uncover <strong>the</strong> whole range of<br />

causes for <strong>the</strong> presence <strong>and</strong> absence of linguistic structures <strong>and</strong> investigate<br />

all of <strong>the</strong>m with <strong>the</strong> same degree of rigor <strong>and</strong> explicitness. The aim<br />

of linguistic analysis would no longer be “to separate <strong>the</strong> grammatical<br />

sequences which are <strong>the</strong> sentences of [a language] L from <strong>the</strong> ungrammatical<br />

sequences which are not sentences of L <strong>and</strong> to study <strong>the</strong> structure<br />

of <strong>the</strong> grammatical sentences” (Chomsky 1957: 13). Instead, <strong>the</strong><br />

aim would be to provide for individual languages <strong>and</strong>, ultimately, for<br />

language in general a comprehensive <strong>the</strong>ory of <strong>the</strong> occurring <strong>and</strong> <strong>the</strong><br />

non-occurring. 8<br />

Received January 2006<br />

Revisions received March 2006<br />

Final acceptance March 2006<br />

University of Bremen<br />

Notes<br />

* I would like to thank Stefan Gries, Arne Zeschel <strong>and</strong> <strong>the</strong> participants of <strong>the</strong> 7.<br />

Norddeutsches Linguistisches Kolloquium for <strong>the</strong>ir comments on <strong>the</strong> ideas presented<br />

in this paper. Any conceptual errors are mine alone.<br />

1. Actually, <strong>the</strong>re are several potential reasons for <strong>the</strong> oddness of McEnery <strong>and</strong> Wilson’s<br />

example (for example, <strong>the</strong> use of <strong>the</strong> simple present <strong>and</strong> <strong>the</strong> potential violation<br />

of <strong>the</strong> selection restrictions of <strong>the</strong> verb shine by <strong>the</strong> direct object NP books).<br />

Their discussion suggests, however, that <strong>the</strong>y are concerned with complementation.<br />

2. An overview of this method <strong>and</strong> its place in <strong>the</strong> corpus-based study of grammatical<br />

patterns will be provided in Stefanowitsch <strong>and</strong> Gries (to appear b); meanwhile,<br />

an introduction can be found on my website at . This website also provides a number of Perl scripts for doing collostructional<br />

analysis (PerlClx 1.0); cf. also Gries’ R script CollAnalysis 3, available<br />

from his website at . Incidentally, both scripts can provide <strong>the</strong> corpus <strong>frequency</strong><br />

that a word not occurring in a particular construction must have in order<br />

for its absence to be significant given <strong>the</strong> <strong>frequency</strong> of <strong>the</strong> construction <strong>and</strong> <strong>the</strong><br />

size of <strong>the</strong> corpus. CollAnalysis provides this information as part of a collostructional<br />

analysis, PerlClx contains a script (zclx.pl) exclusively dedicated to this purpose.<br />

3. The Bonferroni correction is meant to place stricter requirements on statistical<br />

significance in situations where multiple tests are performed on <strong>the</strong> same data set:<br />

obviously, <strong>the</strong> more tests you perform, <strong>the</strong> more chances <strong>the</strong>re are for a seemingly<br />

significant result to come about by accident. However, some have argued (for<br />

example, Perneger 1998) that this correction does more harm than good because<br />

it removes many results that are significant. I will not place too much emphasis<br />

here on this correction. In a sense, whe<strong>the</strong>r one has to apply it in <strong>the</strong> context of<br />

collostructional analysis or not depends on one’s view of what one is doing: is<br />

one testing individual word-construction pairs (in which case each test can st<strong>and</strong><br />

on its own <strong>and</strong> one could be less concerned with correcting) or is one testing a<br />

construction <strong>and</strong> all words occurring in it (in which case one could be more concerned<br />

with correcting).<br />

4. These results differ from those that Gries <strong>and</strong> I have presented previously, mainly<br />

because we have focused exclusively on ditransitives with two nominal objects,<br />

whereas I have included here all uses tagged as ‘ditransitive’ in <strong>the</strong> ICE-GB, including<br />

those with a clausal direct object. Fur<strong>the</strong>rmore, here <strong>and</strong> throughout <strong>the</strong><br />

following discussion I have used regular expressions to search <strong>the</strong> corpus files<br />

directly, ra<strong>the</strong>r than using ICECUP, <strong>the</strong> software tool that accompanies <strong>the</strong><br />

ICE-GB. I have also discarded all verbs marked as ‘ignored’ in <strong>the</strong> corpus annotation<br />

<strong>and</strong> I have discarded all unclear words. I have manually lemmatized <strong>the</strong> verbs<br />

<strong>and</strong> st<strong>and</strong>ardized spelling variants. Finally, I treat phrasal verbs as lemmas in<br />

<strong>the</strong>ir own right (cf. give back in Table 2); <strong>the</strong>y were identified by searching for<br />

verbs that were followed (<strong>and</strong> in some cases preceded) by a particle annotated<br />

as such.<br />

5. Note that cost does not behave like a typical transitive verb (whe<strong>the</strong>r in its monotransitive<br />

or its ditransitive use). For example, it cannot be passivized: *Three quid<br />

were cost (<strong>the</strong>m), *They were cost three quid. Thus, <strong>the</strong> apparent direct object may<br />

be better analyzed as an oblique (for example, a subject complement or an adjunct,<br />

cf. e. g., Quirk et al. 1985, § 16.27).<br />

6. Since we are conducting individual tests here based on hypo<strong>the</strong>ses about specific<br />

verbs, we could argue that <strong>the</strong> levels of significance do not have to be adjusted<br />

for multiple testing. However, since <strong>the</strong>re are thous<strong>and</strong>s of tests of <strong>the</strong> same kind<br />

(verb + ditransitive) that we could have performed, it might be a good idea to<br />

correct for multiple testing anyway. According to Leech et al.’s (2001) <strong>frequency</strong><br />

list, <strong>the</strong>re are 38,019 verb types in <strong>the</strong> BNC; if anything, this is an overestimation<br />

(since inaccurately tokenized forms like ["see] [&mdash;see] [see?] etc. are all<br />

counted as <strong>the</strong>ir own lemmas), so if we correct on this basis, we are on <strong>the</strong> safe<br />

side. The corrected level of significance is 1.32E-06; both explain <strong>and</strong> whisper<br />

clear this level by several orders of magnitude.<br />

7. Due to <strong>the</strong> ambiguity of <strong>the</strong> pronominal form her, it is not clear whe<strong>the</strong>r this<br />

example is monotransitive (They donated [NP her furnishings]) or ditransitive<br />

(They donated [NP her] [NP furnishings]). However, a web search turns up additional<br />

clear (if rare) examples of ditransitive uses of donate, for example, In May<br />

2004, Cycle Heaven, a local retailer, <strong>and</strong> City of York Council threw down <strong>the</strong>



gauntlet to local schools, saying ‘achieve your target increases in walking <strong>and</strong> cycling<br />

by summer 2005, <strong>and</strong> we will donate you a free, high quality children’s bike!’ (http://<br />

www.york.gov.uk/cgi-bin/wn_document.pl?type 5927).<br />

8. I do not want to conclude this note without applying <strong>the</strong> by now familiar reasoning<br />

to McEnery <strong>and</strong> Wilson’s question of how <strong>the</strong> ungrammaticality of *He shines<br />

Tony books could be determined without intuition judgments. Shine (in all its<br />

senses) occurs 2,258 times in <strong>the</strong> BNC. On <strong>the</strong> assumptions made above, <strong>the</strong> expected<br />

<strong>frequency</strong> of shine with ditransitive complementation would be 30; <strong>the</strong><br />

observed <strong>frequency</strong> is zero. This difference is highly significant (p = 6.47E-14)<br />

even if we correct for multiple testing. Thus, without resorting to introspection,<br />

we have proved that [shine + ditransitive] is significantly absent. Whe<strong>the</strong>r this is<br />

strictly due to ungrammaticality is doubtful: first, McEnery <strong>and</strong> Wilson’s sentence<br />

is interpretable (albeit weird); second, it is possible to find au<strong>the</strong>ntic ditransitive<br />

uses by certified native speakers of English for both senses of shine: (i) Shine me<br />

a light from your eyes dear (Christine McVie, Show me a Smile [performed by<br />

Fleetwood Mac]); (ii) He smiles telling him to shine him a metallic Purple armor<br />

(Jimi Hendrix, Bold as Love). Thus, we could hypothesize that ditransitive uses of<br />

shine are semantically so restricted that <strong>the</strong>y occur only in very specific circumstances<br />

(e. g., <strong>the</strong> ‘light’ reading can only occur ditransitively when <strong>the</strong> direct object<br />

is light) or that <strong>the</strong>y only occur in certain dialects (e. g., Lancashire [McVie]<br />

<strong>and</strong> Seattle [Hendrix]) or registers (e. g., rock lyrics).<br />
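The arithmetic behind notes 6 and 8 can be retraced with a short script.<br />
This is only a rough check, not the test used in this note: the uncorrected significance level of 0.05 is inferred from the corrected level reported in note 6, and the probability of zero ditransitive uses of shine is approximated with a simple binomial model rather than an exact test, so it should agree with the reported p-value in order of magnitude only.<br />

```python
import math

# Figures taken from notes 6 and 8.
VERB_TYPES_BNC = 38_019      # verb types in the BNC (Leech et al. 2001)
SHINE_FREQ = 2_258           # occurrences of shine in the BNC
SHINE_EXPECTED = 30          # expected ditransitive uses of shine

# Assumption: conventional uncorrected significance level.
ALPHA = 0.05

# Bonferroni correction: divide alpha by the number of possible tests.
corrected_alpha = ALPHA / VERB_TYPES_BNC

# Binomial approximation (an assumption of this sketch): probability that
# none of the 2,258 tokens of shine is ditransitive if each token has a
# chance of 30 / 2,258 of being ditransitive.
rate = SHINE_EXPECTED / SHINE_FREQ
p_zero = math.exp(SHINE_FREQ * math.log1p(-rate))

if __name__ == "__main__":
    print(f"corrected alpha: {corrected_alpha:.2e}")
    print(f"approximate p for zero occurrences: {p_zero:.2e}")
    print("significant after correction:", p_zero < corrected_alpha)
```

Even under this cruder model, the absence of shine from the ditransitive clears the corrected threshold by several orders of magnitude.<br />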

References<br />

Bonferroni, Carlo E.<br />

1936 Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del<br />

R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8,<br />

3–62.<br />

Chomsky, Noam<br />

1957 Syntactic Structures. The Hague: Mouton.<br />

1965 Aspects of <strong>the</strong> Theory of Syntax. Cambridge, MA: MIT Press.<br />

Gries, Stefan Th. <strong>and</strong> Anatol Stefanowitsch<br />

2004a Extending collostructional analysis: a corpus-based perspective on ‘alternations’.<br />

International Journal of Corpus Linguistics 9(1), 97–129.<br />

2004b Co-varying collexemes in <strong>the</strong> into-causative. In: Achard, Michel <strong>and</strong> Suzanne<br />

Kemmer (eds.), Language, Culture, <strong>and</strong> Mind. Stanford: CSLI,<br />

225–236.<br />

To appear Cluster analysis <strong>and</strong> <strong>the</strong> identification of collexeme classes. In John Newman<br />

<strong>and</strong> Sally Rice (eds.), Empirical <strong>and</strong> Experimental Methods in Cognitive/Functional<br />

Research. Stanford: CSLI.<br />

Leech, Geoffrey, Paul Rayson, <strong>and</strong> Andrew Wilson<br />

2001 Word Frequencies in Written <strong>and</strong> Spoken English: Based on <strong>the</strong> British<br />

National Corpus. London: Longman.<br />

McEnery, Tony, <strong>and</strong> Andrew Wilson<br />

2001 Corpus Linguistics. An Introduction. Second edition. Edinburgh: Edinburgh<br />

University Press.<br />

Nelson, Gerald, Sean Wallis <strong>and</strong> Bas Aarts (eds.)<br />

2002 Exploring Natural Language: Working with <strong>the</strong> British Component of <strong>the</strong><br />

International Corpus of English. Amsterdam <strong>and</strong> Philadelphia: John Benjamins.



Pedersen, Ted<br />

1996 Fishing for exactness. Proceedings of <strong>the</strong> South Central SAS User’s Group<br />

Conference, Austin, TX, 188–200.<br />

Perneger, Thomas V.<br />

1998 What’s wrong with Bonferroni adjustments. British Medical Journal 316,<br />

1236–1238.<br />

Pinker, Steven<br />

1989 Learnability <strong>and</strong> Cognition. The Acquisition of Argument Structure. Cambridge,<br />

MA: MIT Press.<br />

Quirk, R<strong>and</strong>olph, Sidney Greenbaum, Geoffrey Leech, <strong>and</strong> Jan Svartvik<br />

1985 A Comprehensive Grammar of <strong>the</strong> English Language. London: Longman.<br />

Stefanowitsch, Anatol<br />

2005 New York, Dayton (Ohio), <strong>and</strong> <strong>the</strong> Raw Frequency Fallacy. Corpus Linguistics<br />

<strong>and</strong> Linguistic Theory 1(2), 295–301.<br />

Stefanowitsch, Anatol <strong>and</strong> Stefan Th. Gries<br />

2003 Collostructions: investigating <strong>the</strong> interaction of words <strong>and</strong> constructions.<br />

International Journal of Corpus Linguistics 8(2), 209–243.<br />

2005 Covarying Collexemes. Corpus Linguistics <strong>and</strong> Linguistic Theory 1(1),<br />

1–43.<br />

To appear a Channel <strong>and</strong> constructional meaning: A collostructional case study. In:<br />

Kristiansen, Gitte <strong>and</strong> René Dirven (eds.), Cognitive Sociolinguistics:<br />

Language Variation, Cultural Models, Social Systems. Berlin <strong>and</strong> New<br />

York: Mouton de Gruyter.<br />

To appear b Corpora <strong>and</strong> Grammar. In: Anke Lüdeling, Merja Kytö, <strong>and</strong> Tony<br />

McEnery (eds.), Corpus Linguistics (H<strong>and</strong>books of Linguistics <strong>and</strong> Communication<br />

Science/HSK). Berlin: Mouton de Gruyter.
