26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


Platonists). John Lee (2007) has conducted other work in textual reuse and explored sentence alignment in the Synoptic Gospels of the Greek New Testament. He pointed out that exploring ancient text reuse is a difficult but important task since ancient authors rarely acknowledged their sources and often quoted from memory or combined multiple sources. “Identifying the sources of ancient texts is useful in many ways,” Lee stressed: “It helps establish their relative dates. It traces the evolution of ideas. The material quoted, left out or altered in a composition provides much insight into the agenda of its author” (Lee 2007).

Authorship attribution, the use of manual or automatic techniques to determine the authorship of anonymous texts, has been previously explored in classical studies (Rudman 1998) and remains a topic of interest. Forstall and Scheirer (2009) presented new methods for authorship attribution based on sound rather than text and applied them to Greek and Latin poets and prose authors:

We present the functional n-gram as a feature well-suited to the analysis of poetry and other sound-sensitive material, working toward a stylistics based on sound rather than text. Using Support Vector Machines (SVM) for text classification, we extend the expression of our results from a single marginal distance or a binary yes/no decision to a more flexible receiver-operator characteristic curve. We apply the same feature methodology to Principle Component Analysis (PCA) in order to validate PCA and to explore its expressive potential (Forstall and Scheirer 2009).
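As a rough illustration of what a sound-oriented feature might look like, the sketch below computes relative frequencies of character n-grams. It is only a stand-in: the quoted passage does not define the functional n-gram, so the function name `char_ngram_profile`, the letters-only normalization, and the default `n=2` are assumptions made for illustration, not Forstall and Scheirer's method.

```python
from collections import Counter


def char_ngram_profile(text, n=2, top_k=None):
    """Return relative frequencies of character n-grams in `text`.

    Assumption for illustration: the text is lowercased and everything
    except letters is dropped, so n-grams may span word boundaries.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common(None) returns every n-gram, ordered by descending count.
    return {gram: count / total for gram, count in counts.most_common(top_k)}
```

A profile of this kind could serve as the feature vector handed to a classifier, with one profile per text sample.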

The authors found that sound-based features tested with SVMs performed as well as, if not better than, function words in every experiment, and thus concluded that “sound can be captured and used effectively as a feature for attributing authorship to a variety of literary texts.” Forstall and Scheirer also reported some interesting initial results in exploring the Homeric poems, including testing the argument that this poetry was composed without the aid of writing, an issue explored at length by the Homeric Multitext Project. “When the works of Thucydides, a literate prose historian, were projected using the principal components derived from Homer, Thucydides' work not only clustered together but had a much smaller radius than either of the Homeric poems,” Forstall and Scheirer contended, adding that “this result agrees with philological arguments for the Homer's works having been produced by a wholly different, oral mode of composition.” The work of Forstall and Scheirer is just one of many examples among digital classics projects of how computer science methodologies can shed new light on old questions.
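The comparison of cluster radii described above can be sketched in a few lines. The sketch assumes the principal components and the mean vector have already been derived from one corpus and are passed in directly. The function names `project` and `cluster_radius`, and the choice of mean distance from the centroid as the "radius", are assumptions for illustration; the source does not say how Forstall and Scheirer measured it.

```python
import math


def project(vectors, mean, components):
    """Center each feature vector on `mean`, then take its dot product
    with each component to get its coordinates in the component space."""
    projected = []
    for vec in vectors:
        centered = [x - m for x, m in zip(vec, mean)]
        projected.append(
            [sum(c * x for c, x in zip(comp, centered)) for comp in components]
        )
    return projected


def cluster_radius(points):
    """Mean Euclidean distance of the points from their centroid
    (one possible definition of radius, assumed here)."""
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return sum(math.dist(p, centroid) for p in points) / len(points)
```

Projecting the samples of two authors with the same `mean` and `components` and comparing the two `cluster_radius` values would give a simple numeric version of the "smaller radius" observation.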

The PDL has conducted some of its own experiments in automatic quotation identification. Ernst-Gerlach and Crane (2008) introduced an algorithm for the automatic analysis of citations but found that they needed to first manually analyze the structure of quotations in three different reference works of Latin texts to determine text quotation alternation patterns. Their experience confirmed Lee’s earlier point that text reuse is rarely word for word, though in this case it was the quotation practices of nineteenth-century reference works, rather than those of ancient authors, that proved problematic:

Quotations are, in practice, often not exact. In some cases, our quotations are based on different editions of a text than those to which we have electronic access and we find occasional variations that reflect different versions of the text. We also found, however, that some quotations – especially in reference works such as lexica and grammars – deliberately modify the quoted text – the goal in such cases is not to replicate the original text but to illustrate a point about lexicography, grammar, or some other topic (Ernst-Gerlach and Crane 2008).
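Because quotations are often inexact, exact string search will miss many of them. The sketch below shows one simple way to tolerate small differences: slide a window of words over the source text and score each window against the quotation with Python's standard `difflib.SequenceMatcher`. This is not the algorithm of Ernst-Gerlach and Crane; the function name `best_fuzzy_match` and the word-level windowing are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(quotation, source_text):
    """Return (start_word_index, similarity) for the window of the source
    that best matches the quotation, comparing whole words."""
    q_words = quotation.lower().split()
    s_words = source_text.lower().split()
    window = len(q_words)
    best_start, best_score = -1, 0.0
    # Slide a window the length of the quotation across the source.
    for start in range(max(1, len(s_words) - window + 1)):
        candidate = s_words[start:start + window]
        score = SequenceMatcher(None, q_words, candidate).ratio()
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Illustrative input: the altered form "canit" is invented for this example.
source = "arma virumque cano troiae qui primus ab oris"
print(best_fuzzy_match("virumque canit troiae qui", source))
```

In practice, a similarity threshold would decide whether the best window counts as a quotation, and that threshold would need tuning against manually checked examples.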
