Rome Wasn't Digitized in a Day - Council on Library and Information ...


… manuscript pages and the transcriptions of the text are available for download online. [50] Scholars are working with digital images rather than the manuscript itself, and scholars from diverse disciplines, including palaeography, the history of mathematics and science, and Byzantine liturgy, have done extensive work with this palimpsest.

Much of the image-processing work with the palimpsest has focused on developing algorithms to extract the text of Archimedes in particular from page images. Salerno et al. (2007) used principal component analysis (PCA) and independent component analysis (ICA) techniques to extract “clean maps of the primary Archimedes text, the overwritten text, and the mold pattern present in the pages” from 14 hyperspectral images of the Archimedes. Their goals were to provide better access to the text and to develop techniques that could be used in other palimpsest digitization projects. The authors also report that:

A further aspect of the problem is to partly automate the reading and transcription tasks. This cannot be intended as a substitution of the human experts in a task where they perform better than any presently conceivable numerical strategy, but as an acceleration of the human work (Salerno et al. 2007).

The importance of not replacing expert scholars with systems, but rather of developing tools that assist them in their traditional tasks, is a theme seen throughout the literature.

Other significant work in the area of providing access to fragile manuscripts has been conducted by the EDUCE (Enhanced Digital Unwrapping for Conservation and Education) Project. [51] Investigators on this National Science Foundation–funded project have been working to develop systems that support the “virtual unwrapping and visualization of ancient texts.” According to their website:

The overall purpose is to capture in digital form fragile 3D texts, such as ancient papyrus and scrolls of other materials using a custom built, portable, multi-power CT scanning device and then to virtually “unroll” the scroll using image algorithms, rendering a digital facsimile that exposes and makes legible inscriptions and other markings on the artifact, all in a non-invasive process.

Some of the EDUCE Project’s image-processing techniques have been used by the Homer Multitext Project [52] as described by Baumann and Seales (2009), who presented an application of image-registration techniques, or the “process of mapping a sensed image into the coordinate system of a reference image,” to the Venetus A manuscript of the Iliad used in this project. The Homer Multitext Project included 3-D scanning as part of its digitization strategy, but because the 3-D scanning system acquired un-textured 3-D models, a “procedure to register the 2D photography to the 3D scans was performed periodically.” During one photography session it was discovered that technical issues had produced a number of images of poor quality. While these images were reshot, time constraints prevented performing the 3-D geometry capture for these pages again. The result was a number of folios that had two sets of data (a “dirty” image that had registered 3-D geometry and a “clean” image with no associated geometry) to which the project wished to apply digital flattening algorithms. The main computational problem was thus to determine a means of obtaining a “high-quality deformation of the ‘clean image’ such that the text was in the same position as the ‘dirty image’” that would then allow them to “apply digital flattening using the acquired corresponding 3D geometry.”

[50] http://archimedespalimpsest.net/
[51] http://www.stoa.org/educe/
[52] http://chs.harvard.edu/wa/pageRtn=ArticleWrapper&bdc=12&mn=1169
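The registration step quoted above, mapping a sensed image into the coordinate system of a reference image, can be illustrated with a deliberately simplified sketch. It is not the algorithm of Baumann and Seales: their problem called for a high-quality deformation, whereas this example assumes a pure integer translation found by exhaustive search. The function names, the `max_shift` parameter, and the toy 8x8 grids are hypothetical and chosen only for illustration.

```python
def mean_squared_difference(reference, sensed, dy, dx):
    """Mean squared pixel difference over the overlap when every pixel of
    `sensed` is moved by (dy, dx) into the reference coordinate system."""
    rows, cols = len(reference), len(reference[0])
    total, count = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            sy, sx = y - dy, x - dx  # sensed pixel that lands on (y, x)
            if 0 <= sy < rows and 0 <= sx < cols:
                diff = reference[y][x] - sensed[sy][sx]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def register_translation(reference, sensed, max_shift=3):
    """Try every integer shift within +/- max_shift and return the (dy, dx)
    that best aligns `sensed` with `reference`."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = mean_squared_difference(reference, sensed, dy, dx)
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


# Toy data (an assumption for the demo): every reference pixel has a unique value.
SIZE = 8
reference = [[y * SIZE + x for x in range(SIZE)] for y in range(SIZE)]

# The "sensed" image shows the same content moved up 1 row and left 2 columns,
# with zeros where no content is available.
sensed = [
    [reference[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
     for x in range(SIZE)]
    for y in range(SIZE)
]

shift = register_translation(reference, sensed, max_shift=3)
print(shift)  # (1, 2): move sensed pixels down 1 and right 2 to match
```

A real manuscript pipeline would need sub-pixel accuracy and a non-rigid warp, so this sketch shows only the shape of the problem: a similarity measure plus a search over candidate transformations.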
The image-registration algorithm developed by Baumann and Seales was successful, and the authors rightly concluded that:

High-resolution, multispectral digital imaging of important documents is emerging as a standard practice for enabling scholarly analysis of difficult or damaged texts. As imaging techniques improve, documents are revisited and re-imaged, and registration of these images into the same frame of reference for direct comparison can be a powerful tool (Baumann and Seales 2009).

The work of the EDUCE Project illustrates how the state of the art is being used to provide new levels of access to valuable and damaged manuscripts.

Latin

In light of the extensive digitization of cultural heritage materials such as manuscripts and the large number of Latin texts that are becoming available through massive digitization projects, techniques for improving access to these materials are an area of growing research that is examined in this subsection. A variety of approaches have been explored for improving access to Latin manuscripts. Leydier et al. (2007) explored the use of “word-spotting” to improve information retrieval of textual data in primarily Latin medieval manuscript images. They describe the technique as follows:

In practice, word-spotting consists in retrieving all the occurrences of an image of a word. This template word is selected by the user by outlining one occurrence on the document. It results in the system proposing a sorted list of hits that the user can prune manually. … Word-spotting is based on a similarity or a distance between two images, the reference image defined by the user and the target images representing the rest of the page or all the pages of a multi-page document. Contrary to text query on a document processed by OCR, a word-image query can be sensitive to the style of the writing or the typography used. This technique is used when word recognition cannot be done, for example on very deteriorated printed documents or on manuscripts (Leydier et al. 2007).

The authors report that the main drawback to this approach is that a user has to select a keyword in a manuscript image (typically based on an ASCII transcript) as a basis for further image retrieval, limiting their approach to retrieval of other images by word only.

Another approach, presented by Edwards et al. (2004), trained a generalized Hidden Markov Model (gHMM) on the transcription of a Latin manuscript to get both a transition model and one example each for 22 letters to create an emission model. Their transition model for unigrams, bigrams, and trigrams was fitted using the Latin Library’s electronic version of Caesar’s Gallic Wars, and their emission model was trained on 22 glyphs taken from a twelfth-century manuscript of Terence’s Comoediae. In contrast to Leydier et al., the authors argued that word-spotting was not entirely appropriate for a highly inflected language such as Latin:

Manmatha et al. … introduce the technique of “word spotting,” which segments text into word images, rectifies the word images, and then uses an aligned training set to learn correspondences between rectified word images and strings. The method is not suitable for a heavily inflected language, because words take so many forms. In an inflected language, the natural unit to match to is a subset of a word, rather than a whole word, implying that one
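As a rough illustration of the word-spotting retrieval that Leydier et al. describe above (a user-selected template image, a distance between images, and a sorted list of hits), the following sketch ranks candidate word images by a simple pixel-wise distance. It is not their method: the normalized Hamming distance, the tiny binary bitmaps, the candidate labels, and the function names are all assumptions made for the example.

```python
def pixel_distance(a, b):
    """Fraction of pixels that differ between two equal-sized binary bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same shape")
    total = sum(len(row) for row in a)
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


def spot_word(template, candidates):
    """Return (label, distance) pairs sorted from most to least similar."""
    scored = [(label, pixel_distance(template, bitmap))
              for label, bitmap in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Template "word image" outlined by the user (toy 3x4 bitmap).
template = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]

# Candidate regions from the rest of the page; the labels are hypothetical
# identifiers used only to make the ranking readable.
candidates = {
    "other_word": [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]],
    "hit_noisy":  [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 1]],
    "hit_exact":  [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]],
}

ranked = spot_word(template, candidates)
for label, distance in ranked:
    print(f"{label}: {distance:.3f}")
```

Because the comparison is between whole word images, a differently inflected form of the same Latin word would score as a different word, which is the limitation that Edwards et al. raise.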