Rome Wasn't Digitized in a Day - Council on Library and Information ...

should segment the text into blocks, which may be smaller than words, while recognizing

(Edwards et al. 2004).

In their model, Edwards and colleagues chose not to model word-to-word transition probabilities, since word order in Latin is highly flexible. The method achieved reasonable accuracy: 75 percent of the letters were correctly transcribed, and its search performance was reported to be relatively strong.
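The choice to omit word-to-word transitions can be illustrated with a small sketch. This is not Edwards and colleagues' model. It contrasts two scorers. A unigram scorer gives each word an independent log-probability. A bigram scorer conditions each word on the word before it. The toy lexicon, the probabilities, and the function names are all hypothetical. Under the unigram scorer, any reordering of the same words receives the same score, and that property suits a language with flexible word order.

```python
import math

# Hypothetical unigram probabilities for a toy Latin lexicon (illustrative values only).
UNIGRAM = {"arma": 0.2, "virumque": 0.1, "cano": 0.15, "troiae": 0.05}
FLOOR = 1e-6  # probability assigned to anything missing from the toy tables


def unigram_log_score(words):
    """Score words independently: there is no word-to-word transition term."""
    # math.fsum returns a correctly rounded sum, so the result does not depend on word order.
    return math.fsum(math.log(UNIGRAM.get(w, FLOOR)) for w in words)


def bigram_log_score(words, bigram):
    """For contrast: condition each word on its predecessor, so order matters."""
    if not words:
        return 0.0
    terms = [math.log(UNIGRAM.get(words[0], FLOOR))]
    for prev, cur in zip(words, words[1:]):
        terms.append(math.log(bigram.get((prev, cur), FLOOR)))
    return math.fsum(terms)


if __name__ == "__main__":
    # Hypothetical transition probabilities learned from one fixed word order.
    bigram = {("arma", "virumque"): 0.5, ("virumque", "cano"): 0.5}
    original = ["arma", "virumque", "cano"]
    reordered = ["cano", "arma", "virumque"]
    print(unigram_log_score(original) == unigram_log_score(reordered))  # True
    print(bigram_log_score(original, bigram) > bigram_log_score(reordered, bigram))  # True
```

A real recognizer would combine a language-model score like this with glyph-level image likelihoods. Only the language-model half is sketched here.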

Some research on document analysis of Latin manuscripts has focused on assisting palaeographers. The discipline of palaeography is explored further in its own subsection, but in general, palaeography studies the writing styles of ancient documents.[53] Moalla et al. (2006) conducted an automatic analysis of the writing styles of ancient Latin manuscripts from the eighth to the sixteenth centuries. They focused on extracting "sufficiently discriminative features" to differentiate among a sufficiently large number of Latin writing styles. Several problems complicated their image analysis, including the complexity of letter shapes, hybrid writing styles, poor manuscript quality, overlapping lines and words, and poor-quality manuscript images. Their discriminant analysis of 15 Latin classes achieved a classification accuracy of only 59 percent in the first iteration. Eliminating four classes that were not statistically well represented raised the rate to 81 percent for the remaining 11 classes.
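The two-stage result above can be illustrated with a toy sketch. It is not Moalla and colleagues' pipeline. A nearest-centroid classifier stands in for their discriminant analysis. The feature vectors, the class names, and the `min_count` threshold are hypothetical. The sketch shows only the mechanics: classes with too few samples give unreliable class statistics, so they are dropped before fitting and evaluation, leaving fewer classes to distinguish. It does not imply that dropping classes always improves accuracy.

```python
from collections import Counter


def drop_rare_classes(features, labels, min_count):
    """Keep only samples whose class has at least `min_count` examples."""
    counts = Counter(labels)
    kept = [(x, y) for x, y in zip(features, labels) if counts[y] >= min_count]
    return [x for x, _ in kept], [y for _, y in kept]


def fit_centroids(features, labels):
    """Mean feature vector per class (a simplified stand-in for discriminant analysis)."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def accuracy(centroids, features, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical 2-D style features; the "hybrid" class has a single sample.
    X = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0], [0.55, 0.5]]
    y = ["caroline", "caroline", "gothic", "gothic", "hybrid"]
    X_kept, y_kept = drop_rare_classes(X, y, min_count=2)
    model = fit_centroids(X_kept, y_kept)
    print(sorted(model))  # ['caroline', 'gothic']
    # Scored on the training samples only for brevity; a real study needs held-out data.
    print(accuracy(model, X_kept, y_kept))  # 1.0
```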

Another key area of technology research is the development of techniques for digitizing and searching incunabula, or early printed books, many of which were printed in Latin. One major project in this area is CAMENA: Latin Texts of Early Modern Europe,[54] hosted by the University of Mannheim. Its digital library includes five collections:

- Neo-Latin poetry by German authors, available as both images and machine-readable texts.
- Latin historical and political writing from early modern Germany.
- A reference collection of dictionaries and handbooks from 1500–1750 that provides a supporting reading environment.
- A corpus of Latin letters written by German scholars between 1530 and 1770.
- Early printed editions of works by Italian Renaissance humanists born before 1500.

The project also includes the Termini and Lemmata databases, which are now part of the eAQUA Project. The wealth of Neo-Latin material online is well documented by the "Philological Museum: An Analytic Bibliography of On-Line Neo Latin Texts."[55] This extensive website was created by Dana F. Sutton of the University of California, Irvine. Since 1999 it has served as an "analytic bibliography of Latin texts written during the Renaissance and later that are freely available to the general public on the Web," and it includes more than 33,960 records.

Digitizing incunabula, or books printed before 1500, poses a number of challenges, as outlined by Schibel and Rydberg-Cox (2006) and Rydberg-Cox (2009). As Rydberg-Cox explains:

The primary challenges arise from the use of nonstandard typographical glyphs based on medieval handwriting to abbreviate words. Further difficulties are posed by the practice of inconsistently marking word breaks at the end of lines and of reducing or even eliminating spacing between some words (Rydberg-Cox 2009).

In addition, such digitized texts are often presented to a modern audience only after extensive editing and annotation. That level of editing does not scale to million-book libraries.
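The sketch below shows how an automated pipeline might address the three problems Rydberg-Cox describes. First, it expands abbreviation glyphs. Second, it rejoins words split across lines, whether or not the break was marked. Third, it re-splits words that were printed without spaces between them. The abbreviation table, the lexicon, and all function names are hypothetical, and the table is deliberately tiny. Real expansions of medieval abbreviation signs depend on context, so simple one-to-one substitution is only a first approximation.

```python
# Hypothetical, deliberately tiny table of abbreviation glyphs. Real expansions are
# context-dependent; a fixed one-to-one substitution is only a first approximation.
ABBREVIATIONS = {
    "\ua751": "per",  # LATIN SMALL LETTER P WITH STROKE THROUGH DESCENDER
    "\ua753": "pro",  # LATIN SMALL LETTER P WITH FLOURISH
    "\u204a": "et",   # TIRONIAN SIGN ET
}


def expand_abbreviations(text):
    """Replace each known abbreviation glyph with its assumed full form."""
    return "".join(ABBREVIATIONS.get(ch, ch) for ch in text)


def join_lines(lines, lexicon):
    """Rejoin words split at line ends, whether or not a break mark ('-' or '=') was printed."""
    tokens = []
    for line in lines:
        words = line.split()
        if not words:
            continue
        if tokens:
            prev = tokens[-1]
            if prev.endswith(("-", "=")):
                # Marked break: drop the mark and merge with the next line's first word.
                tokens[-1] = prev[:-1] + words.pop(0)
            elif prev + words[0] in lexicon and (prev not in lexicon or words[0] not in lexicon):
                # Unmarked break: merge only if the joined form is a known word and the
                # two halves are not both valid words in their own right.
                tokens[-1] = prev + words.pop(0)
        tokens.extend(words)
    return tokens


def segment(token, lexicon):
    """Split a run-together token into known words, preferring the fewest pieces."""
    n = len(token)
    best = [None] * (n + 1)  # best[i]: shortest known-word split of token[:i], or None
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in lexicon:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [token]  # leave unknown tokens intact


def normalize(lines, lexicon):
    """Expand abbreviations, rejoin broken words, then re-split run-together words."""
    expanded = [expand_abbreviations(line) for line in lines]
    words = []
    for token in join_lines(expanded, lexicon):
        words.extend(segment(token, lexicon))
    return words


if __name__ == "__main__":
    lexicon = {"senatus", "et", "populus", "romanus", "propter", "hoc", "pro"}
    # Tironian et with no surrounding spaces, plus a hyphenated line break.
    print(normalize(["senatus\u204apopu-", "lus romanus"], lexicon))
    # ['senatus', 'et', 'populus', 'romanus']
    # Unmarked break after an abbreviation glyph.
    print(normalize(["\ua753", "pter hoc"], lexicon))  # ['propter', 'hoc']
```

The lexicon checks are the weak point of this approach. Any word missing from the lexicon is left unsplit and unjoined. For that reason, a sketch like this reduces manual editing but does not replace it.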

53 An excellent resource for exploring ancient writing systems is Mnamon: Ancient Writing Systems in the Mediterranean (http://lila.sns.it/mnamon/index.phppage=Home&lang=en). It not only provides extensive descriptions of various writing systems but also includes selected electronic resources.

54 http://www.uni-mannheim.de/mateo/camenahtdocs/camena.html

55 http://www.philological.bham.ac.uk/bibliography/
