26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...



Although the application named DUGA that has been created by the eSAD project is based on decision support system (DSS) technology such as that used by doctors and engineers, the project team ultimately decided that it was creating an interpretation support system (ISS), since “experts transcribing ancient documents do not make decisions based on evidence but instead create interpretations of the texts based on their perception” (Olsen et al. 2009). At the same time, one of the key research goals was to explore issues of technology transfer, that is, to see whether the ideas involved in creating a DSS could be transferred to the work of classical scholars (Roued-Cunliffe 2010).

One key idea behind the ISS is that an interpretation is made up of a network of “percepts” that range from low-level (determining that a character was created by an incised stroke) to high-level (determining that several characters make up a word). While this network of percepts is implicit in the working process of papyrologists, the eSAD project plans to make it explicit in a “human-readable format through a web-based browser application” (Olsen et al. 2009). In the application, the most elementary percepts will be image regions that contain “graphemes”; these images will then be divided into cells “where each cell is expected to contain what is perceived as a character or a space” (Olsen et al. 2009). The division of the image is considered to be a tessellation, and documents can be tessellated in different ways. The basic idea is that individual interpretations can be represented as “networks of substantiated percepts” that will then be made explicit through an ontology. “The ontology aims to make the rationale behind the network of percepts visible,” Olsen et al. explained, “and thus expose both: (a) some of the cognitive processes involved in damaged texts interpretation; and (b) a set of arguments supporting the tentative interpretation” (Olsen et al. 2009). The final ISS will use this ontology (which will be formatted in EpiDoc) as a framework to assist scholars in creating transcriptions of texts.
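The percept network and the tessellation described above can be pictured as a small data structure. The following is only an illustrative sketch, not the eSAD ontology or its EpiDoc encoding: the class names `Cell` and `Percept`, their fields, the helper functions, and the sample rationales are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Cell:
    """One cell of a tessellation: a region expected to hold a character or a space."""
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels; hypothetical layout


@dataclass
class Percept:
    """A node in the interpretation network (all field names are illustrative)."""
    label: str                   # what is perceived, e.g. a stroke, a character, a word
    level: int                   # 0 = lowest level; larger numbers = higher levels
    rationale: str = ""          # the argument that substantiates this percept
    cell: Optional[Cell] = None  # image region, for the most elementary percepts
    supports: List["Percept"] = field(default_factory=list)  # lower-level percepts it rests on


def tessellate(width, height, cuts):
    """Split a text-line image into cells at the x-positions listed in `cuts`."""
    edges = [0, *sorted(cuts), width]
    return [Cell((x0, 0, x1 - x0, height)) for x0, x1 in zip(edges, edges[1:])]


def explain(percept, depth=0):
    """Return the rationale behind a percept and everything it rests on, indented by depth."""
    lines = ["  " * depth + f"{percept.label}: {percept.rationale}"]
    for lower in percept.supports:
        lines.extend(explain(lower, depth + 1))
    return lines


# The same image can be tessellated in more than one way.
three_cells = tessellate(30, 10, [10, 20])
two_cells = tessellate(30, 10, [15])

# A tiny network with invented rationales: stroke -> character -> word.
stroke = Percept("incised stroke", 0, "groove visible in the image", cell=three_cells[0])
char = Percept("character", 1, "stroke shape fits a known letter form", supports=[stroke])
word = Percept("word", 2, "adjacent characters form a plausible word", supports=[char])

print("\n".join(explain(word)))
```

Walking the network from a high-level percept down to its supporting percepts is one simple way to make the rationale behind an interpretation visible, which is the role the quoted passage assigns to the ontology.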

Another step in the modeling process for the ISS was creating image-capture and image-processing algorithms that could embody the perceptual processes of papyrologists. As papyrologists often do not have access to the original objects, they frequently work with digital photographs, and Tarte (2011) acknowledged that digitizing wooden stylus tablets as “text bearing objects” was not an easy feat. Upon observing papyrologists, the eSAD researchers concluded that both manipulations of the images and prior knowledge played important roles in the perception of characters and words. The tablets were visualized using polynomial texture maps, and several algorithms were used to detect the text within the images. The algorithms created for minimizing background interference (flattening the grain of the wood) have also been utilized in the VRE-SDM. One of the most complicated (and still ongoing) tasks was developing algorithms to extract the “strokelets” that form characters (including broken ones), as this is the feature “on which the human visual system locks” (Tarte 2011). The final major algorithm developed was a “stroke-completion algorithm” that was created to help facilitate both automatic and scholarly identification of characters. The ISS to be developed will eventually propose potential character readings (utilizing a knowledge base of “digitally identified list of possible readings”) but will never force a user to choose one (Tarte 2011).
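The principle that the system proposes readings but leaves the choice to the scholar can be sketched as follows. This is a toy illustration under stated assumptions, not the eSAD algorithm: the stroke names, the contents of the knowledge base, and the use of Jaccard overlap as a ranking score are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


def propose_readings(
    strokes: Set[str],
    knowledge_base: Dict[str, Set[str]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank characters by Jaccard overlap between observed and expected strokes."""
    scored = []
    for char, expected in knowledge_base.items():
        union = strokes | expected
        score = len(strokes & expected) / len(union) if union else 0.0
        scored.append((char, score))
    # Highest score first; ties broken alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


@dataclass
class CellReading:
    """Proposals for one cell; `chosen` stays None until the scholar decides."""
    candidates: List[Tuple[str, float]]
    chosen: Optional[str] = None

    def choose(self, char: str) -> None:
        # The scholar may pick any reading, including one that was not proposed.
        self.chosen = char


# Hypothetical knowledge base mapping characters to the strokes that form them.
kb = {
    "i": {"vertical"},
    "l": {"vertical", "foot"},
    "t": {"vertical", "crossbar"},
}

reading = CellReading(propose_readings({"vertical", "crossbar"}, kb))
print(reading.candidates)  # ranked suggestions only
print(reading.chosen)      # nothing is selected automatically
```

Keeping the ranked candidates separate from the `chosen` field means the suggestions never overwrite the scholar's judgment; a reading is recorded only when `choose` is called explicitly.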

Many of the insights for both the algorithm development process and the format of the ISS built on earlier work by Melissa Terras that modeled how papyrologists read documents (Terras 2005). This model identified various levels of reading conducted by papyrologists (identifying features [strokelets], characters, series of characters, morpheme, grammatical level, meaning of word, meaning of groups of words, meaning of document), but the use of knowledge-elicitation techniques, such as “think aloud” protocols, by Terras revealed that “interpretation as a meaning-building process” did not invariably begin at the feature level and then successively build to higher levels of reading. Instead, as Tarte explained, the creation of interpretations jumped between levels of reading, and interpretations at any given level might influence those at another. Roued-Cunliffe articulated this point further:
