
Selected Papers from the Fourteenth International ... - STIBA Malang


Ans van Kemenade, Tanja Milicev & R. Harald Baayen

from YCOE. To help manage the data, the query results from CorpusSearch were imported into a simple Microsoft Access database. This was accomplished with a specially written script in the computer language Perl, which transformed the CorpusSearch output into several tables in ‘comma-separated value’ (CSV) format. These tables were then imported into Access (using its interactive ‘Import’ command).
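The conversion step can be illustrated with a minimal sketch. It is written in Python rather than Perl, and it is not the authors' script: the record fields (`clause_id`, `sentence_id`, `clause_text`) and the helper `write_csv_table` are hypothetical, and the parsing of the actual CorpusSearch output is left out entirely. The sketch only shows how already-parsed hits might be serialized as a CSV table that a database import command can read.

```python
import csv
import io

# Hypothetical, already-parsed query hits. The field names are illustrative
# assumptions, not the CorpusSearch output format or the authors' table layout.
hits = [
    {"clause_id": 1, "sentence_id": 10, "clause_text": "example clause one"},
    {"clause_id": 2, "sentence_id": 11, "clause_text": "example clause, with a comma"},
]


def write_csv_table(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


clauses_csv = write_csv_table(hits, ["clause_id", "sentence_id", "clause_text"])
print(clauses_csv)
```

Using the `csv` module rather than joining strings by hand means that fields containing commas or quotation marks are quoted automatically, which matters for free text such as clauses and sentences.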

The Access database includes one table for the subclauses found by the queries in YCOE, a second table for the complete sentence containing the clause in context, and a third table for the manually entered subject properties. A fourth table, added later, lists the source documents and the chronological period they belong to. These tables are related to each other with appropriate keys and relationships, and are edited from a form that arranges the information conveniently.²
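A four-table layout of this kind might be sketched as follows. The sketch uses SQLite (through Python's `sqlite3` module) as a stand-in for Microsoft Access, and every table name, column name, and key is an assumption made for illustration; the text does not specify the actual schema. The inserted rows are placeholders, not data from the study.

```python
import sqlite3


def build_database():
    """Create an in-memory database with four related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the key relationships
    conn.executescript(
        """
        CREATE TABLE documents (
            doc_id INTEGER PRIMARY KEY,
            title  TEXT,
            period TEXT
        );
        CREATE TABLE sentences (
            sentence_id   INTEGER PRIMARY KEY,
            doc_id        INTEGER NOT NULL REFERENCES documents(doc_id),
            sentence_text TEXT
        );
        CREATE TABLE clauses (
            clause_id   INTEGER PRIMARY KEY,
            sentence_id INTEGER NOT NULL REFERENCES sentences(sentence_id),
            clause_text TEXT
        );
        CREATE TABLE subject_properties (
            clause_id   INTEGER PRIMARY KEY REFERENCES clauses(clause_id),
            np_type     INTEGER,
            np_position INTEGER
        );
        """
    )
    return conn


conn = build_database()

# Placeholder rows only, to show how the keys link the tables.
conn.execute("INSERT INTO documents VALUES (1, 'placeholder document', 'placeholder period')")
conn.execute("INSERT INTO sentences VALUES (10, 1, 'placeholder sentence')")
conn.execute("INSERT INTO clauses VALUES (100, 10, 'placeholder clause')")
conn.execute("INSERT INTO subject_properties VALUES (100, 1, 2)")

# Follow the keys from a coded subject back to the period of its source document.
row = conn.execute(
    """
    SELECT d.period, p.np_type, p.np_position
    FROM subject_properties AS p
    JOIN clauses   AS c ON c.clause_id   = p.clause_id
    JOIN sentences AS s ON s.sentence_id = c.sentence_id
    JOIN documents AS d ON d.doc_id      = s.doc_id
    """
).fetchone()
print(row)
```

Keeping the manually entered properties in their own table, keyed by clause, means the automatically generated tables can in principle be re-imported without touching the hand-coded values.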

3.1 Parameters and values

The discourse-relevant properties of each subject were entered as numerical values. Here we discuss only those properties that are primarily relevant for the quantitative analysis in this article.

The first relevant parameter is NP type; the numerical values are as in (26):

(26) Numerical values for NP type<br />

1 personal pronoun<br />

2 weak demonstrative (the se paradigm)

3 strong demonstrative (this, that, these, those)

4 definite NP<br />

5 indefinite NP<br />

6 reflexive pronoun<br />

7 Man<br />

8 proper name<br />
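The coding in (26) can be written down as an enumeration, so that the numerical values stored in the database can be decoded into readable labels. This is only a sketch: the numbers come from (26), but the Python identifiers (`NPType`, `decode_np_type`) are assumptions introduced here.

```python
from enum import IntEnum


class NPType(IntEnum):
    """Numerical values for NP type, following (26). Member names are illustrative."""
    PERSONAL_PRONOUN = 1
    WEAK_DEMONSTRATIVE = 2    # the se paradigm
    STRONG_DEMONSTRATIVE = 3  # this, that, these, those
    DEFINITE_NP = 4
    INDEFINITE_NP = 5
    REFLEXIVE_PRONOUN = 6
    MAN = 7
    PROPER_NAME = 8


def decode_np_type(code):
    """Return the NPType for a stored code; raises ValueError for unknown codes."""
    return NPType(code)


print(decode_np_type(4).name)
```

Because an unknown code raises `ValueError`, a decoding pass over the table doubles as a check for data-entry errors in the manually coded column.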

The second relevant parameter is NP position; the values are given in (27):

(27) Numerical values for NP position<br />

1 left periphery (e.g., wh-words in questions)<br />

2 high (preceding þa/þonne)<br />

3 mid (immediately following þa/þonne)<br />

4 low (preceding the non-finite verb)

5 low (following the non-finite verb)
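The position coding in (27) lends itself to the same treatment, and numerical codes of this kind are straightforward to tally. The sketch below assumes Python identifiers of its own (`NPPosition`, `tally_positions`), and the list `example_codes` is made up purely to show the mechanics; it is not data from the study.

```python
from collections import Counter
from enum import IntEnum


class NPPosition(IntEnum):
    """Numerical values for NP position, following (27). Member names are illustrative."""
    LEFT_PERIPHERY = 1           # e.g., wh-words in questions
    HIGH = 2                     # preceding þa/þonne
    MID = 3                      # immediately following þa/þonne
    LOW_BEFORE_NONFINITE = 4     # preceding the non-finite verb
    LOW_AFTER_NONFINITE = 5      # following the non-finite verb


def tally_positions(codes):
    """Count how often each position code occurs in a sequence of stored codes."""
    return Counter(NPPosition(code) for code in codes)


# Made-up codes for illustration only, not results from the study.
example_codes = [2, 3, 3, 5]
counts = tally_positions(example_codes)
for position, n in sorted(counts.items()):
    print(position.name, n)
```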

2. The database and the import script were built by Alexis Dimitriadis. They made it possible to analyze a large number of sentences with a considerably increased level of efficiency and accuracy.
