26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


different apparatuses. In Boschetti’s system, names had to “match items of a table that contains the canonical form of the name, abbreviations, orthographical variants and possible declinations”

The next major step was to develop a set of heuristics to be used in automatically parsing the different apparatuses. Each item in the apparatus was separated by a new line and all items were then tokenized into one of the following categories: verse number, Greek word, Greek punctuation mark, metrical sign, Latin word, Latin punctuation mark, scholar name, manuscript abridgment, and bibliographic reference. All scholars’ names, manuscript abridgments, and bibliographic references were compared with information from the tables created in the previous step. The rest of the tokens were then aggregated to identify verse references, readings, and sources. The final step was the use of an alignment algorithm to parse text substitutions “in order to map the readings on the exact position of the verse in the reference edition.” Boschetti revealed that about 90 percent of readings found in apparatuses were substitutions, or chunks of text that should replace one or more lines in a reference edition. His algorithm utilized the concept of “edit distance” 141 to align readings from the apparatus with the portion of text in the reference edition where the edit distance was lowest. Boschetti also chose to use a “brute force” combinatorial algorithm that “reconstructs all the combinations of adjacent words in the reference text (capitalised and without spaces) and it compares them with the reading and its permutations.” One limitation of his work, Boschetti reported, was that the current system is applied only to “items constituted by Greek sequences, immediately followed by source,” and excludes those cases where items included Latin language explanations of textual operations to perform.
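The alignment step described above can be illustrated with a minimal sketch. It is not Boschetti's implementation: the function names (`edit_distance`, `normalise`, `align_reading`) and the toy English verse are assumptions made for illustration, and the comparison against permutations of the reading is omitted. The sketch builds every run of adjacent words in a reference verse, capitalises it and strips spaces, and keeps the run whose Levenshtein edit distance to the reading is lowest.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalise(words):
    """Join words without spaces and capitalise them (hypothetical helper)."""
    return "".join(words).upper()


def align_reading(reference_words, reading):
    """Brute-force search over all runs of adjacent reference words.

    Returns (start, end, distance), where reference_words[start:end] is the
    run closest to the reading. Ties keep the earliest run found.
    """
    target = normalise(reading.split())
    best = None
    for start in range(len(reference_words)):
        for end in range(start + 1, len(reference_words) + 1):
            candidate = normalise(reference_words[start:end])
            distance = edit_distance(candidate, target)
            if best is None or distance < best[2]:
                best = (start, end, distance)
    return best


# Toy data, not taken from any edition of Aeschylus.
verse = "the quick brown fox jumps".split()
start, end, distance = align_reading(verse, "brwn fx")
print(verse[start:end], distance)
```

The search is quadratic in the number of words per verse, which is tolerable for single verses; a real system working on Greek text would also need to decide how to treat accents and breathings before comparing strings.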

To test his system, Boschetti calculated its performance against 56 verses of Wecklein’s edition of Aeschylus’ Persae and evaluated it by hand. For processed items (excluding items with Latin predicates), 88 percent of conjectures were mapped onto the reference text correctly, and 77 percent of conjectures were mapped correctly in the total collection. This work illustrates that although an automated system does require a fair amount of preliminary manual analysis, the heuristics and algorithms that were created provide encouraging results that deserve further exploration.

Recent work by Schmidt and Colomb (2009) has taken a different approach to the challenge of textual variation, one that also addresses related issues with overlapping hierarchies in markup. According to Schmidt and Colomb, there are two basic forms of textual variation: that found in multiple copies of a work, such as in the case of multiple manuscripts; and that arising from physical alterations introduced by an author or copyist in a single manuscript. Early printed books and handwritten medieval manuscripts often have high levels of variation, and the techniques of textual criticism grew up around the desire to create a single, definitive text. Despite the fact that the digital environment provided new possibilities for representing multiple versions of a text, significant disagreement among textual editors continued, as Schmidt and Colomb related:

With the arrival of the digital medium the old arguments gradually gave way to the realisation that multiple versions could now coexist within the same text. … This raised the prospect of a single model of variation that might at last unite the various strands of text-critical theory. However, so far no generally accepted technique of how to achieve this has been developed. This failure perhaps underlies the commonly held belief among humanists that any computational model of a text is necessarily temporary, subjective and imperfect (Schmidt and Colomb 2009).

141 Edit distance has been defined as a “string distance,” or the number of operations required to transform one string into another (with typical allowable operations, including the insertion, deletion, or substitution of a single character) (http://en.wikipedia.org/wiki/Edit_distance).
