Rome Wasn't Digitized in a Day - Council on Library and Information ...

one in Greek, the Perseus Ancient Greek Dependency Treebank (AGDT). 147 This section describes these treebanks and their uses within classical scholarship.

The Latin Dependency Treebank is a 53,143-word collection of syntactically parsed Latin sentences, and it currently stands at version 1.5 with excerpts from eight authors. Because Latin is a heavily inflected language with a great degree of variability in its word order, the annotation style of the Latin Dependency Treebank was based on that of the Prague Dependency Treebank (PDT), which was then tailored for Latin using the grammar of Pinkster (Bamman and Crane 2006). According to Bamman and Crane (2006) there are a variety of potential uses for a Latin treebank, including “the potential to be used as a knowledge source in a number of traditional lines of inquiry, including rhetoric, lexicography, philology and historical linguistics.” In their initial research they explored using the Latin Dependency Treebank to detail the use of Latin rhetorical devices and to quantify the change over time in Latin from a subject-object-verb word order to a subject-verb-object order. Later research using the Latin Dependency Treebank drew on the resources within the PDL to provide advanced reading support and more sophisticated levels of lemmatized and morphosyntactic searching (Bamman and Crane 2007).

The IT Treebank is an ongoing project that will include all of the works of Thomas Aquinas as well as 61 authors related to him and will ultimately include 179 texts and 11 million tokens. According to its website, the IT Treebank is “presently composed of 82,141 tokens, for a total of 3,714 syntactically parsed sentences excerpted from Scriptum super Sententiis Magistri Petri Lombardi, Summa contra Gentiles and Summa Theologiae.” Their most recent work has explored the development of a valency lexicon, and the authors argue that although many classical languages projects exist, few have annotated texts above the morphological level (McGillivray and Passarotti 2009). Nonetheless, the authors insist “nowadays it is possible and indeed necessary to match lexicons with data from (annotated) corpora, and vice versa. This requires the scholars to exploit the vast amount of textual data from classical languages already available in digital format … and particularly those annotated at the highest levels.”

Rather than develop their own annotation standards, these two Latin treebank projects worked together to develop a common standard set of guidelines that they have published online. 148 This provides an important example of the need for different projects with similar goals to not only collaborate but also to make the results of that collaboration available to others. Another important collaborative feature of these treebanks is that, particularly in the case of the Latin Dependency Treebank, a large number of graduate and undergraduate students have contributed to this knowledge base.

Other collaborative treebank work has been conducted by the Perseus Project, which has created the AGDT. The AGDT, currently in version 1.1, is a “192,204-word collection of syntactically parsed Greek sentences” from Hesiod, Homer, and Aeschylus. The development of the AGDT has focused on a new model of treebanking, that of the creation of scholarly treebanks (Bamman, Mambrini, and Crane 2009). While traditional linguistic-annotation projects have focused on creating the single best annotation (often enforcing interannotator agreement), such a model is a poor fit when the object of annotation itself is an object of intense scholarly debate:

147 For more on the Perseus Latin Dependency Treebank and the AGDT (as well as to download them), see http://nlp.perseus.tufts.edu/syntax/treebank/, and for the Index Thomisticus Treebank, see http://itreebank.marginalia.it/

148 For the most recent version of the guidelines, see http://hdl.handle.net/10427/42683; for more on the collaboration, see Bamman, Passarotti, and Crane (2008).
