
Rome Wasn't Digitized in a Day - Council on Library and Information ...



While the advanced document-recognition technology used with the Archimedes Palimpsest has been discussed previously, the metadata and linking strategies developed to connect manuscript metadata, images, and transcriptions merit further discussion. Two recent articles by Doug Emery and Michael B. Toth (Emery and Toth 2009, Toth and Emery 2008) have described this process in detail. The creation of the Archimedes Palimpsest Digital product, which released one terabyte of integrated image and transcription data, required the spatial linking of registered images for each leaf "to diplomatic transcriptions that scholars initially created in various nonstandard formats, with associated standardized metadata" (Emery and Toth 2009). The transcription encoding built on previous work conducted by the HMT project, and Emery and Toth noted that standardized metadata were critical for three purposes: "(1) access to and integration of images for digital processing and enhancement, (2) management of transcriptions from those images, and (3) linkage of the images with the transcriptions."

The authors also described how the great disciplinary variety of scholars working on the palimpsest, from students of Ancient Greek to those exploring the history of science, made it necessary to capture data from all of these contributors in a standard digital format. This necessity led to a "Transcription Integration Plan" that incorporated Unicode, Dublin Core, and the TEI. They explained that they chose Dublin Core as their major integration standard for digital images and transcriptions because it would allow for "hosting and integration of this data set and other cultural works across service providers, libraries and cultural institutions" (Toth and Emery 2008). Although they used the "Identification," "Data Type," and "Data Content" elements from the Dublin Core element set, they also needed to extend this standard with elements such as "Spatial Data Reference," drawn from the Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata.

Emery and Toth (2009) argued that one of the guiding principles behind both their choice of common standards and their emphasis on integrating data and metadata was the need to create a digital archive for the present and the distant future. The data set they created therefore also follows the principles of the Open Archival Information System (OAIS).[396] In their data set, every image bears all relevant metadata in its header, and each image file or folio directory serves as a self-contained preservation unit that includes all the images of a given folio side, XMP metadata files, checksum data, and the spatially mapped TEI-XML transcriptions. In addition, the project developed its own Archimedes Palimpsest Metadata Standard that "provides a metadata structure specifically geared to relating all images of a folio side in a single multi- or hyper-spectral data 'cube'" (Emery and Toth 2009). Because each image has its own embedded metadata, the images can either stand alone or be related to other members of the same cube. Finally, more than 140 of the 180 folio sides include a transcription, and the lines in these transcriptions are mapped to rectangular regions in the folio images using the TEI element. This mapping serves two purposes: it allows the digital transcriptions to provide "machine-readable content," and it allows easy movement between the transcription and the image.
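The two-way link between transcription lines and rectangular image regions can be illustrated with a small data structure. The Python below is a minimal sketch under stated assumptions, not the project's actual file format or software: the class names, line identifiers, placeholder text, and the upper-left/lower-right pixel coordinates are all hypothetical. It shows both directions of movement such a mapping enables: from a transcription line to its region on the folio image, and from a point on the image back to the line or lines that cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned rectangle in image pixels (upper-left, lower-right)."""
    ulx: int
    uly: int
    lrx: int
    lry: int

    def contains(self, x: int, y: int) -> bool:
        # Inclusive bounds: a point on the edge counts as inside.
        return self.ulx <= x <= self.lrx and self.uly <= y <= self.lry


@dataclass(frozen=True)
class TranscriptionLine:
    line_id: str    # hypothetical identifier for one transcribed line
    text: str       # placeholder for the transcribed text
    region: Region  # where the line appears on the folio image


class FolioMapping:
    """Hypothetical bidirectional lookup for the lines of one folio side."""

    def __init__(self, lines: list[TranscriptionLine]) -> None:
        self._by_id = {line.line_id: line for line in lines}

    def region_for_line(self, line_id: str) -> Region:
        # Transcription -> image: locate a line on the folio image.
        return self._by_id[line_id].region

    def lines_at_point(self, x: int, y: int) -> list[TranscriptionLine]:
        # Image -> transcription: find every line whose region covers the point.
        return [line for line in self._by_id.values() if line.region.contains(x, y)]


if __name__ == "__main__":
    folio = FolioMapping([
        TranscriptionLine("l01", "first line (placeholder)", Region(100, 50, 900, 90)),
        TranscriptionLine("l02", "second line (placeholder)", Region(100, 95, 900, 135)),
    ])
    print(folio.region_for_line("l02"))  # Region(ulx=100, uly=95, lrx=900, lry=135)
    print([line.line_id for line in folio.lines_at_point(400, 70)])  # ['l01']
```

A linear scan is enough for the few dozen lines on a single folio side; a much larger collection would more likely use a spatial index for the image-to-text direction.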

In addition to the challenges presented by individual manuscripts, other digital projects have explored the challenges of managing multiple manuscripts of the same text. The Roman de la Rose Digital Library (RRDL),[397] a joint project of the Sheridan Libraries of Johns Hopkins University and the Bibliothèque Nationale de France (BnF), ultimately seeks to provide access to digital surrogates of all of the manuscripts (more than 300) containing the Roman de la Rose poem. The creation of this digital

[396] For more on this ISO standard, see http://public.ccsds.org/publications/archive/650x0b1.pdf

[397] http://romandelarose.org/
