26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...



algorithms; instead, it uses “linear symmetry with a threshold of correlation for each character, and an ordered sequence of characters to be searched for” (Tse and Bigun 2007). Tse and Bigun offered a fairly detailed explanation for why they chose to avoid an approach using segmentation algorithms, which are often employed in Latin text recognition:

The system proposed in this paper uses a segmentation-free approach because the Serto script has characters that are cursive with difficult to determine start and end points for characters. This is one difference between Serto and Arabic or the cursive form of Latin languages. Since segmentation in these languages is difficult and not easy as in scripts like printed Latin, removing the need for segmentation becomes an alternative way of dealing with the problem of segmentation, at least to obtain a quick baseline recognition scheme (Tse and Bigun 2007).

The Serto script OCR system that Tse and Bigun ultimately developed produced character-recognition rates of approximately 90 percent. Earlier work by Clocksin (2003) had also described methods for the automatic recognition of Syriac handwriting, albeit for texts written in the Estrangelo script, and used a collection of historical manuscript images. This system reported recognition rates that ranged from 61 percent to 100 percent, depending on both the techniques used and the manuscript source.

One research project that seeks to build an initial cyberinfrastructure or online hub for Syriac is the Syriac Research Group, a joint project of the University of Alabama and Princeton University. 69 The group’s major goal is to produce a new generation of tools and information resources that will help alleviate the access and discovery problem that currently hinders “scholarly research on Syriac language, cultures and history.” An international team of scholars is working to create an “online reference source” that will meet the needs both of advanced Syriac scholars and of the interested public, and their website offers a useful mockup of the potential portal as well as a list of potential user scenarios (e.g., a Syriac researcher working on manuscripts, a nonspecialist researcher).

According to the project website, the “Syriac Reference Portal” 70 will serve as an “information hub” that includes an ontology or classification system that can be used for creating Syriac reference works, an online encyclopedia, a gazetteer that includes both geographic information and maps that are relevant to Syriac studies, an extensive bibliography, and a multilingual authority file that will support “standardizing references to Syriac authors, texts, and place names.” Other related research work includes revision of the Unicode standard for Syriac, a TEI adaptation for the description of Syriac manuscripts, and plans to add a prosopographical tool. While the Syriac Research Group hopes to make all the resources listed above available in the first generation of the portal, the ultimate goal is to open these resources to the scholarly community for “collaborative augmentation and annotation.” The project team hopes that, as the portal grows, it will be able to add to the hub iteratively by linking to digitized Syriac content that is available online, such as the Syriac corpus of literature being prepared by Brigham Young University (BYU), 71 the eBeth Arké Syriac Studies Collection, 72 and the Manumed project in the European Union. 73

69 http://www.syriac.ua.edu/

70 The “Syriac Reference Portal” has been based on the Indiana Philosophy Ontology Project (InPhO) (https://inpho.cogs.indiana.edu/)

71 http://cpart.byu.edu/page=112&sidebar

72 http://www.hmml.org/vivarium/BethArke.htm

73 The Manumed Project (http://www.manumed.org/en/) is building an extensive virtual library of digitized cultural heritage documents from the Euro-Mediterranean region with a particular focus on manuscripts in languages including “Arabic, Greek, Latin, Syriac, Hebrew, Aramaic, Coptic, Berber, Armenian.”
