26.12.2014

Rome Wasn't Digitized in a Day - Council on Library and Information ...

services, or “building blocks of specialized functionality,” including functionalities such as tokenization, lemmatization, and collation that are “wrapped into individual services to be re-used by other services or plugged into an application environment”; (3) the TextGrid middleware; and (4) stable archives (Aschenbrenner et al. 2009). They have also developed a semantic service registry for TextGrid. Zielinski et al. have offered a concise summary of their approach:

The TextGrid infrastructure is a multilayered system created with the motivation to hide the complex grid infrastructure … from the scholars and to make it possible to integrate external services with TextGrid tools. Basically in this service oriented architecture (SOA), there are three layers: the user interface, a services layer with tools for textual analysis and text processing, and the TextGrid middleware, which itself includes multiple layers (Zielinski et al. 2009).
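
The idea of services as reusable “building blocks” that a thin interface layer can chain together can be sketched in a few lines. This is a minimal, in-process illustration under stated assumptions: TextGrid's real services are Web services running over grid middleware, which is not modelled here, and the names `Service`, `REGISTRY`, `register`, `run_pipeline`, and the toy `LEMMAS` table are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A "service" here is a plain function; TextGrid's real services are
# Web services, which this in-process sketch does not model.
Service = Callable[[Any], Any]

# Hypothetical in-memory stand-in for a service registry.
REGISTRY: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that publishes a function under a service name."""
    def wrap(fn: Service) -> Service:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("tokenize")
def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer that trims surrounding punctuation.
    tokens = (tok.strip(".,;:!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


# Toy lemma table; a real lemmatizer would consult a lexicon or model.
LEMMAS = {"was": "be", "digitized": "digitize", "days": "day"}


@register("lemmatize")
def lemmatize(tokens: List[str]) -> List[str]:
    return [LEMMAS.get(tok, tok) for tok in tokens]


def run_pipeline(steps: List[str], data: Any) -> Any:
    """Services layer: look up each step by name and chain the results."""
    for step in steps:
        data = REGISTRY[step](data)
    return data


if __name__ == "__main__":
    # User-interface layer: requests a workflow by service names only.
    print(run_pipeline(["tokenize", "lemmatize"], "Rome wasn't digitized in a day."))
```

Run as a script, this prints `['rome', "wasn't", 'digitize', 'in', 'a', 'day']`. Because each step is looked up by name, a new service can be added to a workflow without changing the calling layer, which is the kind of re-use the quoted description emphasizes.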

Nonetheless, the TextGrid project faced a variety of data interoperability challenges in using the TEI as its basic form of markup, since partner projects used the TEI at varying levels of sophistication. While its developers did not want to sacrifice the depth of semantic encoding the TEI offered, they needed to define a minimum “abstraction level” necessary to promote broader interoperability of computational processes in TextGrid (Blanke et al. 2008). As a solution, TextGrid developed a “core” encoding approach:

… which follows a simple principle: one can always go from a higher semantic degree to a lower semantic degree; and in possession of a suitable transformation script, this mapping can be done automatically. TextGrid encourages all its participating projects to describe their data in an XML-based markup that is suitable for their specific research questions. At the same time projects can register a mapping from their specific, semantically deep data to the respective TextGrid-wide core encoding that is a reasonably expressive TEI-subset (Blanke et al. 2008).
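
The “higher to lower semantic degree” principle can be illustrated with a small transformation script. This is a sketch under loud assumptions: it uses un-namespaced XML, and the core vocabulary in `CORE_MAP` and `CORE_ATTRS` (and the function `to_core`) is made up for illustration; it is not TextGrid's actual core encoding or its mapping format. Rich project tags such as `persName` and `placeName` collapse onto TEI's generic `name` element with a `type`, extra attributes are dropped, and any tag the hypothetical core does not know (here `material`) becomes a generic `seg`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a project's semantically rich tags to a
# smaller "core" vocabulary: source tag -> (core tag, fixed attributes).
CORE_MAP = {
    "persName": ("name", {"type": "person"}),
    "placeName": ("name", {"type": "place"}),
    "text": ("text", {}),
    "body": ("body", {}),
    "p": ("p", {}),
}

# Attributes the hypothetical core subset keeps; all others are dropped.
CORE_ATTRS = {"type"}


def to_core(elem: ET.Element) -> ET.Element:
    """Recursively copy elem into the core vocabulary (a lossy mapping)."""
    # Unknown tags fall back to a generic <seg>, keeping text but not meaning.
    tag, fixed = CORE_MAP.get(elem.tag, ("seg", {}))
    attrs = {k: v for k, v in elem.attrib.items() if k in CORE_ATTRS}
    attrs.update(fixed)
    out = ET.Element(tag, attrs)
    out.text = elem.text   # character data before the first child
    out.tail = elem.tail   # character data after this element's end tag
    for child in elem:
        out.append(to_core(child))
    return out


if __name__ == "__main__":
    rich = ('<text><body><p><persName ref="#cic" role="author">Cicero</persName>'
            ' wrote from <placeName ref="#rome">Rome</placeName>'
            ' on <material>papyrus</material>.</p></body></text>')
    print(ET.tostring(to_core(ET.fromstring(rich)), encoding="unicode"))
```

Here the `role` and `ref` attributes simply disappear, so the output cannot be mapped back to the richer encoding automatically; in this sketch, at least, the transformation only works in the downward direction the quotation describes.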

TextGrid’s solution thus attempts to respect sophisticated local encoding practices while maintaining a basic level of interoperability. This illustrates the difficulty of supporting cross-corpus searching even within a single project. All content that is created with TextGridLab or that comes from external resources is saved unchanged to the TextGrid repository. Metadata are extracted and normalized before being stored in a central metadata store, and a full-text index is extracted from the raw data repository and updated with every change (Ludwig and Küster 2008).
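
The storage pattern described above (raw content kept unchanged, metadata normalized into a separate store, a full-text index refreshed on every change) can be modelled with a toy in-memory class. This is a sketch under stated assumptions: `Repository`, `normalize`, and `words` are hypothetical names, the normalization rule (Unicode NFC, collapsed whitespace, lower-cased keys) is invented for illustration, and tags are stripped with a naive regex rather than an XML parser. It does not reflect TextGrid's actual grid storage.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Set


def normalize(value: str) -> str:
    """Hypothetical metadata normalization: NFC form, collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFC", value).split())


def words(content: bytes) -> Set[str]:
    """Naive full-text tokens: strip tags with a regex, lower-case words."""
    text = re.sub(r"<[^>]+>", " ", content.decode("utf-8"))
    return set(re.findall(r"\w+", text.lower()))


class Repository:
    def __init__(self) -> None:
        self.raw: Dict[str, bytes] = {}                      # content kept unchanged
        self.metadata: Dict[str, Dict[str, str]] = {}        # normalized metadata
        self.index: Dict[str, Set[str]] = defaultdict(set)   # word -> object ids

    def save(self, obj_id: str, content: bytes, meta: Dict[str, str]) -> None:
        # On update, remove the previous version's words from the index.
        if obj_id in self.raw:
            for word in words(self.raw[obj_id]):
                self.index[word].discard(obj_id)
        self.raw[obj_id] = content  # stored byte-for-byte
        self.metadata[obj_id] = {k.lower(): normalize(v) for k, v in meta.items()}
        for word in words(content):
            self.index[word].add(obj_id)

    def search(self, word: str) -> Set[str]:
        return set(self.index.get(word.lower(), set()))
```

Because the raw bytes are never modified in this design, both the metadata and the index could be rebuilt later from the original data if the extraction rules change.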

The TextGridLab tool (an Eclipse-based GUI) is intended to help users create TEI resources that can live within the data grid. Although TEI documents form a large part of the resources in TextGrid, the system can handle heterogeneous data formats (plain text, TEI/XML, images). TextGrid also provides a number of basic services (tokenizers, lemmatizers, sorting tools, streaming editors, collation tools) that can be applied to its objects, while also letting users create their own services within a Web services framework:

Web Service frameworks are available for many programming languages—so if a person or institution wishes to make his/her text processing tool available to the TextGrid community and the workflow engine, the first step is to implement a Web Service wrapper for the tool and deploy it on a public server (or one of TextGrid’s). The next steps are to apply for registration in the TextGrid service registry and to provide a client plug-in for the Eclipse GUI so that the tool is accessible for humans (GUI) and machines (service registry) alike (Zielinski et al. 2009).
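
The first step in that quotation, wrapping an existing tool as a Web service, might look like the following minimal sketch. It assumes a plain WSGI application built on Python's standard `wsgiref` module, a naive whitespace tokenizer standing in for the wrapped tool, and a JSON response format. The names `app` and `call_locally` and the `REGISTRY_ENTRY` fields are hypothetical and do not reflect TextGrid's actual service protocols or registry schema.

```python
import io
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def tokenize(text: str) -> list:
    """The existing tool being wrapped; here a naive whitespace tokenizer."""
    return text.split()


def app(environ, start_response):
    """WSGI wrapper: POST plain text, receive the tokens as JSON."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST UTF-8 text to this endpoint\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    text = environ["wsgi.input"].read(length).decode("utf-8")
    body = json.dumps({"tokens": tokenize(text)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(text: str) -> dict:
    """Exercise the wrapper in-process, without opening a socket."""
    environ: dict = {}
    setup_testing_defaults(environ)
    data = text.encode("utf-8")
    environ.update({"REQUEST_METHOD": "POST",
                    "CONTENT_LENGTH": str(len(data)),
                    "wsgi.input": io.BytesIO(data)})
    statuses = []
    body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return {"status": statuses[0], "json": json.loads(body)}


# Hypothetical description a provider might submit when applying for
# registration; the field names are illustrative, not TextGrid's schema.
REGISTRY_ENTRY = {
    "name": "naive-tokenizer",
    "endpoint": "http://localhost:8000/",
    "consumes": "text/plain",
    "produces": "application/json",
}

if __name__ == "__main__":
    # Deploy step, in miniature: serve the wrapper on port 8000.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

Calling `call_locally("Rome wasn't built")` returns a `200 OK` status with the tokens `["Rome", "wasn't", "built"]`. The later steps in the quotation, applying for registry entry and writing an Eclipse client plug-in, are outside what this sketch attempts.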