26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


and these authors noted that even more interesting results could be gained by using such techniques with the large corpus of Latin that is growing online, from multiple editions of classical Latin authors to neo-Latin texts. In a larger corpus, a dynamic lexicon could be used to explore how classical Latin authors such as Caesar and Ovid used words differently, or the use of a word could be compared between classical and neo-Latin texts. Another advantage of a dynamic lexicon is that rather than presenting several highly illustrative examples of word usage (as is done with the Cambridge Greek English Lexicon), it can present as many examples as are found in the corpus. Finally, the dynamic lexicon's ability to search across Latin and Greek texts using English translations of Greek and Latin words is a “close approximation to real cross-language information retrieval.” Perhaps most important, Bamman and Crane argue that their work to create a dynamic lexicon illustrates how even small structured-knowledge sources can be used to mine interesting patterns from larger collections:

The application of structured knowledge to much larger but unstructured collections addresses a gap left by the massive digitization efforts of groups such as Google and the Open Content Alliance (OCA). While these large projects are creating truly million-book collections, the services they provide are general (e.g., key term extraction, named entity analysis, related works) and reflect the wide array of texts and languages they contain. By applying the language-specific knowledge of experts (as encoded in our treebank), we are able to create more specific services to complement these general ones already in place. In creating a dynamic lexicon built from the intersection of a 3.5 million word corpus and a 30,457 word treebank, we are highlighting the immense role that even very small structured knowledge sources can play (Bamman and Crane 2008).
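
The cross-language lookup described above, in which English translation equivalents act as a pivot between Greek and Latin, can be illustrated with a minimal sketch. This is not Bamman and Crane's implementation: the `Occurrence` record, the function names, and the toy data (including the placeholder citations) are all assumptions made for illustration, and the sketch presumes that each token has already been lemmatized and aligned to an English word.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One corpus token, assumed to be already lemmatized and aligned."""
    language: str  # "latin" or "greek"
    lemma: str     # dictionary headword
    english: str   # English translation equivalent aligned to this token
    citation: str  # placeholder locator, not a real reference


def build_lexicon(occurrences):
    """Index every occurrence by lemma and every lemma by its English gloss."""
    by_lemma = defaultdict(list)
    by_english = defaultdict(set)
    for occ in occurrences:
        key = (occ.language, occ.lemma)
        by_lemma[key].append(occ)          # keep all examples, not a sample
        by_english[occ.english].add(key)   # English word -> lemmas in any language
    return by_lemma, by_english


def search_by_english(word, by_lemma, by_english):
    """Return every Greek or Latin occurrence whose aligned gloss is `word`."""
    hits = []
    for key in sorted(by_english.get(word, ())):
        hits.extend(by_lemma[key])
    return hits


# Hypothetical toy data; the citations are invented placeholders.
TOY_CORPUS = [
    Occurrence("latin", "bellum", "war", "toy text A, sentence 1"),
    Occurrence("latin", "bellum", "war", "toy text B, sentence 4"),
    Occurrence("greek", "πόλεμος", "war", "toy text C, sentence 2"),
    Occurrence("latin", "amor", "love", "toy text B, sentence 7"),
]

by_lemma, by_english = build_lexicon(TOY_CORPUS)
for occ in search_by_english("war", by_lemma, by_english):
    print(occ.language, occ.lemma, occ.citation)
```

Because every occurrence is retained under its lemma, the same index also reflects the earlier point that a dynamic lexicon can present as many examples as the corpus contains rather than a few selected ones.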

The authors also observed that since many of the technologies used to build the lexicon, such as word-sense disambiguation and syntactic parsing, are modular, any separate improvements made to these algorithms could be incorporated back into the lexicon. Similarly, as tagging and parsing accuracy improve with the size of a corpus and as the training corpus of Latin grows, so will the treebank. In addition, this work illustrates how small-domain tools might be repurposed to work with larger collections.

Bamman and Crane (2009) have investigated these issues further in their overview of computational linguistics and lexicography. They noted that while the TLG and Perseus provide “dirty results,” or the ability to find all the instances of a lemma in their collections, the TLL gives a smaller subset of impeccably precise results. Bamman and Crane argued that in the future, a combination of these two approaches will be necessary, and lexicography will need to utilize both machine-learning techniques that learn from large textual collections and the knowledge and labor invested in handcrafted lexicons to help such techniques learn. The authors also noted that new lexicons built for a classical cyberinfrastructure would need to support new levels of research:

Manual lexicography has produced fantastic results for Classical languages, but as we design a cyberinfrastructure for Classics in the future, our aim must be to build a scaffolding that is essentially enabling: it must not only make historical languages more accessible on a functional level, but intellectually as well; it must give students the resources they need to understand a text while also providing scholars the tools to interact with it in whatever ways they see fit (Bamman and Crane 2009).
