Rome Wasn't Digitized in a Day - Council on Library and Information ...
services, or “building blocks of specialized functionality,” including functionalities such as
tokenization, lemmatization, and collation that are “wrapped into individual services to be re-used by
other services or plugged into an application environment”; (3) the TextGrid middleware; and (4)
stable archives (Aschenbrenner et al. 2009). They have also developed a semantic service registry for
TextGrid. Zielinski et al. have offered a concise summary of their approach:
The TextGrid infrastructure is a multilayered system created with the motivation to hide the
complex grid infrastructure … from the scholars and to make it possible to integrate external
services with TextGrid tools. Basically in this service oriented architecture (SOA), there are
three layers: the user interface, a services layer with tools for textual analysis and text
processing, and the TextGrid middleware, which itself includes multiple layers (Zielinski et al.
2009).
Nonetheless, the TextGrid project faced a variety of data interoperability challenges in terms of using
the TEI as its basic form of markup, since partner projects used the TEI at varying levels of
sophistication. While they did not want to sacrifice the depth of semantic encoding the TEI offered,
they needed to define a minimum “abstraction level” necessary to promote larger interoperability of
computational processes in TextGrid (Blanke et al. 2008). As a solution, TextGrid developed a “core”
encoding approach:
… which follows a simple principle: one can always go from a higher semantic degree to a
lower semantic degree; and in possession of a suitable transformation script, this mapping can
be done automatically. TextGrid encourages all its participating projects to describe their data
in an XML-based markup that is suitable for their specific research questions. At the same time
projects can register a mapping from their specific, semantically deep data to the respective
TextGrid-wide core encoding that is a reasonably expressive TEI-subset (Blanke et al. 2008).
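
The principle in this passage, mapping semantically deep markup down to a shared core, can be illustrated with a minimal Python sketch. It is not TextGrid's transformation script: the element names, the `CORE_MAPPING` table, and the `seg` fallback are all hypothetical, chosen only to show that such a down-mapping is mechanical and lossy in one direction.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from project-specific elements to a smaller "core" set.
# None of these names are taken from TextGrid; they only illustrate the idea.
CORE_MAPPING = {
    "personName": "name",
    "placeName": "name",
    "stageDirection": "note",
}
CORE_ELEMENTS = {"text", "p", "name", "note", "seg"}


def to_core(element):
    """Return a copy of `element` that uses only core element names."""
    tag = CORE_MAPPING.get(element.tag, element.tag)
    if tag not in CORE_ELEMENTS:
        tag = "seg"  # assumed generic fallback for anything unmapped
    core = ET.Element(tag)  # project-specific attributes are dropped
    core.text = element.text
    core.tail = element.tail
    for child in element:
        core.append(to_core(child))
    return core


rich = ET.fromstring(
    "<text><p><personName ref='#cic'>Cicero</personName> wrote from "
    "<placeName>Rome</placeName>.</p></text>"
)
print(ET.tostring(to_core(rich), encoding="unicode"))
# prints: <text><p><name>Cicero</name> wrote from <name>Rome</name>.</p></text>
```

The distinction between a person and a place is lost in the core form, which is why the mapping only runs from the higher semantic degree to the lower one.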
TextGrid’s solution thus attempts to respect the sophisticated encoding of local practices while
maintaining a basic level of interoperability. This illustrates the difficulties of supporting cross-corpora
searching even within one project. All content that is created with the help of TextGridLab or comes
from external resources is saved unchanged to the TextGrid repository. Metadata are extracted and
normalized before being stored in central metadata storage, and a full-text index is extracted from the
raw data repository and updated with all changes (Ludwig and Küster 2008).
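
The three storage steps just described (raw content kept unchanged, metadata normalized into a central store, and a full-text index refreshed on every change) can be sketched as a toy in-memory model. The class name `ToyRepository`, the `title` field, and the normalization rule are assumptions made for illustration and say nothing about how TextGrid itself is implemented.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class ToyRepository:
    """Hypothetical in-memory stand-in for the three stores described above."""

    def __init__(self):
        self.raw = {}                   # object id -> unchanged source text
        self.metadata = {}              # object id -> normalized metadata
        self.index = defaultdict(set)   # token -> ids of objects containing it

    def ingest(self, object_id, xml_text):
        # Drop stale index entries so a re-ingest reflects the change.
        for ids in self.index.values():
            ids.discard(object_id)
        # 1. The content is saved exactly as received.
        self.raw[object_id] = xml_text
        # 2. Metadata are extracted and normalized (assumed rule:
        #    collapse whitespace and lowercase the title).
        root = ET.fromstring(xml_text)
        title = root.findtext("title", default="")
        self.metadata[object_id] = {"title": " ".join(title.split()).lower()}
        # 3. The full-text index is rebuilt from the raw data.
        body = " ".join(root.itertext()).lower()
        for token in re.findall(r"\w+", body):
            self.index[token].add(object_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


repo = ToyRepository()
repo.ingest("doc1", "<doc><title>  De  Officiis </title><body>Quamquam te Marce fili</body></doc>")
print(repo.metadata["doc1"])   # {'title': 'de officiis'}
print(repo.search("Marce"))    # ['doc1']
```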
The TextGridLab tool (an Eclipse-based GUI) is intended to help users create TEI resources that can
live within the data grid. Although TEI documents form a large part of the resources in TextGrid, it can
handle heterogeneous data formats (plain text, TEI/XML, images). TextGrid also provides a number of
basic services (tokenizers, lemmatizers, sorting tools, streaming editors, collation tools) that can be
used against its objects while also letting users create their own services within a Web services
framework:
Web Service frameworks are available for many programming languages, so if a person or
institution wishes to make his/her text processing tool available to the TextGrid community and
the workflow engine, the first step is to implement a Web Service wrapper for the tool and
deploy it on a public server (or one of TextGrid’s). The next steps are to apply for registration
in the TextGrid service registry and to provide a client plug-in for the Eclipse GUI so that the
tool is accessible for humans (GUI) and machines (service registry) alike (Zielinski et al. 2009).
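
The first step named in this quotation, wrapping an existing text processing tool as a Web service, might look roughly like the following Python sketch, which uses only the standard library. The naive `tokenize` function stands in for the tool, and the JSON response shape and the `make_server` helper are assumptions for illustration. Registration in the TextGrid service registry and the Eclipse client plug-in are not shown, and nothing here reflects TextGrid's actual service interfaces.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def tokenize(text):
    """The 'tool' being wrapped: a deliberately naive word tokenizer."""
    return re.findall(r"\w+", text)


class TokenizerHandler(BaseHTTPRequestHandler):
    """Accepts plain text via POST and answers with a JSON list of tokens."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        payload = json.dumps({"tokens": tokenize(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a deployed service would log requests


def make_server(host="127.0.0.1", port=0):
    """Bind the wrapper to a host and port (port 0 lets the OS pick one)."""
    return HTTPServer((host, port), TokenizerHandler)
```

Calling `make_server(port=8080).serve_forever()` would start the service locally, after which any client able to send an HTTP POST could use the tokenizer without knowing how it is implemented.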