26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


entirely pre-searched for each lemma-form, and the results archived in static HTML (Hypertext
Mark-up Language) pages. This constitutes a digital archive of lexicographic ‘slips’, providing
the dictionary writers with immediate access to the searches, and also enabling the citations and
their contexts to be archived in a generic format that is not tied to any particular operating
system or database program (Fraser 2008).
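The pre-searching idea can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function name `build_slip_pages`, the dictionary-based corpus, the lemma identifiers, and the page layout are all assumptions made for illustration. The sketch searches every passage once for each lemma's inflected forms and writes the hits to one static HTML page per lemma, so the "slips" can later be read without any database.

```python
import html
import re
from pathlib import Path


def build_slip_pages(corpus, lemma_forms, out_dir):
    """Write one static HTML page per lemma and return the hit count per lemma.

    corpus      -- hypothetical mapping of citation reference -> passage text
    lemma_forms -- hypothetical mapping of lemma id -> list of inflected forms
    out_dir     -- directory that receives one <lemma id>.html file per lemma
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    hit_counts = {}
    for lemma, forms in lemma_forms.items():
        # Match any listed form as a whole word (Unicode-aware in Python 3).
        alternatives = "|".join(re.escape(form) for form in forms)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b")
        hits = [(ref, text) for ref, text in corpus.items() if pattern.search(text)]
        rows = []
        for ref, text in hits:
            rows.append(
                "<li><cite>" + html.escape(ref) + "</cite>: " + html.escape(text) + "</li>"
            )
        page = (
            '<html><head><meta charset="utf-8"><title>'
            + html.escape(lemma)
            + "</title></head><body><ul>\n"
            + "\n".join(rows)
            + "\n</ul></body></html>"
        )
        (out / (lemma + ".html")).write_text(page, encoding="utf-8")
        hit_counts[lemma] = len(hits)
    return hit_counts
```

Calling `build_slip_pages(corpus, {"logos": ["λόγος", "λόγου"]}, "slips")` would leave a file `slips/logos.html` that any browser can open, which is the sense in which such an archive is independent of a particular operating system or database program.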

This digital archive of Greek lemma searches has helped speed the process of writing entries.
Interestingly, as this lexicon has been designed for students, Fraser noted that it gives fewer Greek
quotations and more space to semantic description. Citations have also been restricted to a canon of 70
authors, with no examples taken from fragmentary authors or inscriptions. Fraser also reported that
while dictionary entries are stored in XML, the project created a new DTD for their system based on a
“provisional entry structure.”

The PDL already contains digital versions of lexicons for some individual authors 159 as well as several
major classical lexicons, such as the Lewis & Short Latin Dictionary 160 and the Liddell-Scott-Jones
Greek-English Lexicon (LSJ). 161 The lexicons that are a part of Perseus, however, differ from the
projects described above. Instead of designing lexicons for both print and electronic distribution, the
lexicons, and indeed all reference works that are part of Perseus, have been created from the start to
serve as both hyperlinked tools in an integrated Greek and Latin reading environment and as
knowledge sources that can be mined to support a variety of automated processes. 162

In addition to turning “traditional” printed lexicons into dynamic reference works, current research at
Perseus is exploring how to create a new kind of “dynamic lexicon” that is generated not from just one
printed text but from all the texts in a digital library (Bamman and Crane 2008). They first used the
large, aligned, parallel corpus of English and Latin in Perseus to induce a word-sense inventory and
determined how often certain definitions of a word were actually manifested, while using the context
surrounding words to determine which definitions were used in a given instance. The treebank was
then used to train an automatic syntactic parser for the Latin corpus, in particular to extract information
about each word’s subcategorization frames and selectional preferences. Clustering was then used to
establish semantic similarity between words, as determined by their appearance in similar contexts
(Bamman and Crane 2009). This automatically extracted lexical information can be used in a variety of
ways:

A digital library architecture interacts with this knowledge in three ways: first, it lets us further
contextualize our source texts for the users of our existing digital library; second, it allows us to
present customized reports for word usage according to the metadata associated with the texts
from which they’re drawn, enabling us to create a dynamic lexicon that not only notes how a
word is used in Latin in general, but also in any specific author, genre, or era (or combination
of those). And third, it lets us continue to mine more texts for the knowledge they contain as
they’re added to the library collection, essentially making it an open-ended service (Bamman
and Crane 2008).
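The “customized reports” described in the quotation can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the Perseus implementation: the function `sense_report`, the record fields (`lemma`, `sense`, `author`, `genre`, `era`), and any rows fed to it are hypothetical. Given word occurrences that have already been tagged with a sense, it computes the relative frequency of each sense for whatever combination of metadata fields the reader asks for.

```python
from collections import Counter, defaultdict


def sense_report(occurrences, lemma, by=("author",)):
    """Return relative sense frequencies for `lemma`, grouped by metadata fields.

    occurrences -- iterable of dicts with hypothetical keys "lemma", "sense",
                   and one key per metadata field named in `by`
    by          -- tuple of metadata field names, e.g. ("genre",) or ("genre", "era")
    """
    counts = defaultdict(Counter)
    for occ in occurrences:
        if occ["lemma"] != lemma:
            continue
        # One group per distinct combination of the requested metadata values.
        group = tuple(occ[field] for field in by)
        counts[group][occ["sense"]] += 1

    report = {}
    for group, senses in counts.items():
        total = sum(senses.values())
        report[group] = {sense: n / total for sense, n in senses.items()}
    return report
```

With sense-tagged occurrences of a word such as libero, a call like `sense_report(rows, "libero", by=("genre",))` would give one sense distribution per genre, and `by=("genre", "era")` would give one per combination. Because the report is recomputed from the occurrence records, adding newly mined texts changes the output without any change to the lexicon code, which is the open-ended quality the quotation points to.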

As one example, Bamman and Crane (2008) traced how the use of the word libero changed over time
and across genre (e.g., classical authors vs. Church Fathers). The Perseus corpus is somewhat small,

159 For example, Pindar: http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3atext%3a1999.04.0072

160 http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3atext%3a1999.04.0059

161 http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3atext%3a1999.04.0057. For more on the development of the LSJ, see Crane (1998) and Rydberg-Cox (2002).

162 For more on the need to design “dynamic reference works,” see Crane (2005) and Crane and Jones (2006).
