Rome Wasn't Digitized in a Day - Council on Library and Information ...
algorithms; instead, it uses “linear symmetry with a threshold of correlation for each character, and an
ordered sequence of characters to be searched for” (Tse and Bigun 2007). Tse and Bigun offered a
fairly detailed explanation for why they chose to avoid an approach using segmentation algorithms,
which are often employed in Latin text recognition:
The system proposed in this paper uses a segmentation-free approach because the Serto script
has characters that are cursive with difficult to determine start and end points for characters.
This is one difference between Serto and Arabic or the cursive form of Latin languages. Since
segmentation in these languages is difficult and not easy as in scripts like printed Latin,
removing the need for segmentation becomes an alternative way of dealing with the problem of
segmentation, at least to obtain a quick baseline recognition scheme (Tse and Bigun 2007).
The Serto script OCR system that Tse and Bigun ultimately developed produced character-recognition
rates of approximately 90 percent. Earlier work by Clocksin (2003) had also described methods for the
automatic recognition of Syriac handwriting, albeit for texts written in the Estrangelo script, and used a
collection of historical manuscript images. This system reported recognition rates that ranged from
61 percent to 100 percent, depending on both the techniques used and the manuscript
source.
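The segmentation-free idea described above can be illustrated with a small sketch. It is not Tse and Bigun's method: it substitutes plain Pearson correlation on toy binary glyph grids for their linear-symmetry features, and every name in it (`correlation`, `window`, `recognize`, the glyphs `A` and `B`, the 0.9 thresholds) is a hypothetical choice made for illustration. What it does show is the general pattern of sliding each character template along an unsegmented line, accepting a position only when the correlation reaches that character's threshold, and searching for characters in a fixed order.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equal-length pixel lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def window(image, start, width):
    """Flatten columns [start, start + width) of a row-major image."""
    return [pixel for row in image for pixel in row[start:start + width]]


def recognize(image, templates, search_order):
    """Locate characters in a line image without segmenting it first.

    templates maps a label to (glyph image, correlation threshold);
    search_order lists the labels in the order they are searched for.
    """
    n_cols = len(image[0])
    claimed = [False] * n_cols  # columns already assigned to a character
    hits = []
    for label in search_order:
        glyph, threshold = templates[label]
        width = len(glyph[0])
        flat_glyph = [pixel for row in glyph for pixel in row]
        for start in range(n_cols - width + 1):
            if any(claimed[start:start + width]):
                continue  # overlaps a character found earlier
            if correlation(flat_glyph, window(image, start, width)) >= threshold:
                hits.append((start, label))
                for col in range(start, start + width):
                    claimed[col] = True
    # Report labels in reading order (left to right in this toy example).
    return [label for _, label in sorted(hits)]


# Toy data (assumed for illustration): two glyphs and a line "B A B"
# written with no gaps between characters, mimicking a cursive script.
TEMPLATES = {
    "A": ([[1, 1, 1], [1, 0, 1], [1, 1, 1]], 0.9),
    "B": ([[1, 0], [1, 1], [1, 0]], 0.9),
}
LINE = [
    [1, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0],
]

print(recognize(LINE, TEMPLATES, ["A", "B"]))  # ['B', 'A', 'B']
```

Searching for the wider glyph first is a design choice of this sketch: once its columns are claimed, narrower templates cannot match inside it by accident, which is one reason an ordered search sequence can matter when character boundaries are not known in advance.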
One research project that seeks to build an initial cyberinfrastructure or online hub for Syriac is the
Syriac Research Group, a joint project of the University of Alabama and Princeton University. 69 The
group’s major goal is to produce a new generation of tools and information resources that will help
alleviate the access and discovery problem that currently hinders “scholarly research on Syriac
language, cultures and history.” An international team of scholars is working to create an “online
reference source” that will meet the needs both of advanced Syriac scholars and of the interested
public, and their website offers a useful mockup of the potential portal as well as a list of potential user
scenarios (e.g., a Syriac researcher working on manuscripts, a nonspecialist researcher).
According to the project website, the “Syriac Reference Portal” 70 will serve as an “information hub”
that includes an ontology or classification system that can be used for creating Syriac reference works,
an online encyclopedia, a gazetteer that includes both geographic information and maps that are
relevant to Syriac studies, an extensive bibliography, and a multilingual authority file that will support
“standardizing references to Syriac authors, texts, and place names.” Other related research work
includes revision of the Unicode standard for Syriac, a TEI adaptation for the description of Syriac
manuscripts, and plans to add a prosopographical tool. While the Syriac Research Group hopes to
make all the resources listed above available in the first generation of the portal, the ultimate goal is to
open these resources to the scholarly community for “collaborative augmentation and annotation.” The
project hopes that, as the portal grows, the team will be able to add to the hub iteratively by
linking to digitized Syriac content that is available online, such as the Syriac corpus of literature being
prepared by Brigham Young University (BYU), 71 the eBeth Arké Syriac Studies Collection, 72 and the
Manumed project in the European Union. 73
69 http://www.syriac.ua.edu/
70 The “Syriac Reference Portal” has been based on the Indiana Philosophy Ontology Project (InPhO) (https://inpho.cogs.indiana.edu/)
71 http://cpart.byu.edu/page=112&sidebar
72 http://www.hmml.org/vivarium/BethArke.htm
73 The Manumed Project (http://www.manumed.org/en/) is building an extensive virtual library of digitized cultural heritage documents from the Euro-Mediterranean region with a particular focus on manuscripts in languages including “Arabic, Greek, Latin, Syriac, Hebrew, Aramaic, Coptic, Berber, Armenian.”