26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...




Additionally, Schmidt and Colomb postulated that the lack of an “accurate model of textual variation”
and the ability to implement such a model in a digital world have continued to frustrate many
humanists.

A related problem identified by Schmidt and Colomb is that of overlapping hierarchies, or when
different markup structures (e.g., generic structural markup, linguistic markup, literary markup)
overlap in a text. Markup is said to overlap in that “the tags in one perspective are not always well
formed with respect to tags in another” (e.g., as in well-formed XML). Schmidt and Colomb proposed
that the term “overlapping hierarchies” is essentially incorrect: “Firstly, not all overlap is between
competing hierarchies, and secondly what is meant by the term ‘hierarchy’ is actually ‘trees’, that is a
specific kind of hierarchy in which each node, except for the root, has only one parent.” They put
forward that although there have been over 50 papers dealing with this topic, one fundamental and
common weakness in the proposed approaches was that they offered solutions to problematic markup
by using markup itself. The authors further asserted that all cases of overlapping hierarchies are
also cases of textual variation, even if the reverse is not always true. “The overlapping hierarchies
problem, then, boils down to variation in the metadata,” Schmidt and Colomb declared, adding that “it
is entirely subsumed by the textual variation problem because textual variation is variation in the entire
text, not only in the markup” (Schmidt and Colomb 2009). They thus concluded that textual variation
was the problem that needed solving.
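
The overlap problem can be made concrete with a small sketch. The sample text and the tag names (`line` for verse lines, `s` for sentences) are invented for illustration and do not come from Schmidt and Colomb. Each perspective parses as a tree on its own, but when a sentence crosses a line boundary the two sets of tags can't be nested inside one another, so a standard XML parser rejects the combined document.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: <line> marks verse lines, <s> marks sentences.
# The sentence starts inside line 1 and ends inside line 2, so the two
# structures overlap and the tags are not properly nested.
overlapping = (
    "<text><line>Arms and the man. <s>I sing</line>"
    "<line>of Troy.</s></line></text>"
)

# Each perspective on its own forms a tree and parses without trouble.
lines_only = "<text><line>Arms and the man. I sing</line><line>of Troy.</line></text>"
sentences_only = "<text>Arms and the man. <s>I sing of Troy.</s></text>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML tree."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for name, doc in [
        ("lines", lines_only),
        ("sentences", sentences_only),
        ("both", overlapping),
    ]:
        print(name, is_well_formed(doc))
```

Only the combined document fails the check, which is the sense in which tags in one perspective are not well formed with respect to tags in another.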

Maintaining that neither version control systems nor multiple sequence alignment (inspired by
bioinformatics) can adequately address the problem of text variants, Schmidt and Colomb proposed
modeling text variation as either a “minimally redundant directed graph” or as an “ordered list of
pairs” where each pair contains a “set of versions and a fragment of text or data.” The greatest
challenge with variant graphs, they explained, is how to process them efficiently. The minimum
set of functions that users would need comprised reading a single version of a text, searching a
multiversion text, comparing two versions of a text, determining what was a variant of what else,
creating and editing a variant graph, and separating content and variation. The solution proposed by
Schmidt (2010) is the multiversion document format (MVD):

The Multi-Version Document or MVD model represents all the versions of a work, whether
they arise from corrections to a text or from the copying of one original text into several variant
versions, or some combination of the two, as four atomic operations: insertion, deletion,
substitution, and transposition. … An MVD can be represented as a directed graph, with one
start node and one end-node. … Alternatively it can be serialized as a list of paired values, each
consisting of a fragment of text and a set of versions to which that fragment belongs. As the
number of versions increases, the number of fragments increases, their size decreases, and the
size of their version-sets increases. This provides a good scalability as it trades off complexity
for size, something that modern computers are very good at handling. By following a path from
the start-node to the end-node any version can be recovered. When reading the list form of the
graph, fragments not belonging to the desired version are merely skipped over (Schmidt 2010).
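
The list form described in this passage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Schmidt's actual MVD file format or software: the version identifiers (`A`, `B`, `C`), the sample wording, and the function names `read_version` and `compare` are all hypothetical. Each pair holds a set of versions and a text fragment, and a version is recovered by skipping every fragment whose version set does not include it.

```python
from typing import List, Set, Tuple

# One pair: (set of version ids the fragment belongs to, fragment text).
Pair = Tuple[Set[str], str]

# Hypothetical three-version work; ids and wording are invented.
#   A: "The quick brown fox"
#   B: "The quick red fox"
#   C: "The red fox"
mvd: List[Pair] = [
    ({"A", "B", "C"}, "The "),
    ({"A", "B"}, "quick "),
    ({"A"}, "brown "),
    ({"B", "C"}, "red "),
    ({"A", "B", "C"}, "fox"),
]


def read_version(pairs: List[Pair], version: str) -> str:
    """Recover one version by skipping fragments that do not belong to it."""
    return "".join(text for versions, text in pairs if version in versions)


def compare(pairs: List[Pair], left: str, right: str) -> Tuple[List[str], List[str]]:
    """Return the fragments found only in `left` and only in `right`."""
    only_left = [t for v, t in pairs if left in v and right not in v]
    only_right = [t for v, t in pairs if right in v and left not in v]
    return only_left, only_right


if __name__ == "__main__":
    for vid in ("A", "B", "C"):
        print(vid, "->", read_version(mvd, vid))
    print(compare(mvd, "A", "B"))
```

Shared text such as "The " is stored once with a larger version set, which reflects the trade-off the quotation describes: more versions mean more and smaller fragments, each with a bigger version set. The sketch handles insertion, deletion, and substitution only; transposition would need additional bookkeeping that is not attempted here.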

Schmidt listed a number of benefits of the MVD format for humanists, including the following: (1) it
supports the automatic computation of insertions, deletions, variants, and transpositions between a set
of versions; (2) MVDs are agnostic about the content format of individual versions, so they can be used with
any generalized markup or plain text; (3) an MVD is “not a collection of files” and instead stores “only
the differences between all the versions of a work as one digital entity and interrelates them” (Schmidt
2010); (4) since the MVD stores the overlapping structures of a set of versions, the markup of
