
Rome Wasn't Digitized in a Day - Council on Library and Information ...


uncertainty regarding dates in the legal texts in Volterra. To integrate HGV and Volterra, they created an annotation database for each project, consisting of “randomly-generated values associated with each record in the original databases,” so that they could “demonstrate cross-database joins and third-party annotations” (Jackson et al. 2009).
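
The annotation approach can be sketched with SQLite, which serves only as a stand-in here: LaQuAT itself worked through OGSA-DAI, and the table names, columns, and dummy rows below are hypothetical. A separate annotation database, attached alongside the original one, stores a randomly generated value for each record, linked only by record id. A single query can then join across the two databases without altering the original data.

```python
import random
import sqlite3


def build_demo(seed: int = 0) -> sqlite3.Connection:
    """Toy 'original' database plus a separate, attached annotation database."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")                   # main schema: stands in for an original database
    con.execute("ATTACH DATABASE ':memory:' AS annot")  # second database for third-party annotations
    con.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
    con.executemany(
        "INSERT INTO records (id, title) VALUES (?, ?)",
        [(1, "record one"), (2, "record two"), (3, "record three")],  # dummy rows
    )
    # One randomly generated value per original record, linked only by id,
    # so the original table itself is never modified.
    con.execute("CREATE TABLE annot.notes (record_id INTEGER, rand_value REAL)")
    ids = [row[0] for row in con.execute("SELECT id FROM records")]
    con.executemany(
        "INSERT INTO annot.notes (record_id, rand_value) VALUES (?, ?)",
        [(record_id, rng.random()) for record_id in ids],
    )
    return con


def annotated_records(con: sqlite3.Connection) -> list:
    """Cross-database join: each original record with its annotation value."""
    return con.execute(
        "SELECT r.id, r.title, n.rand_value "
        "FROM main.records AS r JOIN annot.notes AS n ON n.record_id = r.id "
        "ORDER BY r.id"
    ).fetchall()


for row in annotated_records(build_demo()):
    print(row)
```

Keeping annotations in their own database means a third party can add or discard them without write access to the original records.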

The project used OGSA-DAI³⁷² for data integration because it was considered a de facto standard by many other e-science projects for integrating heterogeneous databases, it was open source, and it was compatible with many relational databases, XML, and other file-based resources. OGSA-DAI also supported exposing data resources onto grids (Bodard et al. 2009). Most important, in terms of data integration:

… OGSA-DAI can abstract the underlying databases using SQL views and provide an integrated interface onto them using distributed querying. This fulfils the essential requirement of the project to leave the underlying data resources untouched as far as possible (Jackson et al. 2009).
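
The view-based abstraction in the quotation can be illustrated on a small scale. The sketch below assumes SQLite rather than OGSA-DAI's distributed querying, and its source names, column names, and rows are invented. Two sources hold the same kind of information under different column names, and a view maps both onto one schema, so queries run against the view while the sources stay untouched.

```python
import sqlite3


def build_virtual_database() -> sqlite3.Connection:
    """Expose two differently structured sources through one SQL view."""
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS source_a")
    con.execute("ATTACH DATABASE ':memory:' AS source_b")
    # Hypothetical tables: same kind of information, different column names.
    con.execute("CREATE TABLE source_a.documents (doc_id INTEGER, place TEXT)")
    con.execute("CREATE TABLE source_b.texts (text_no INTEGER, locality TEXT)")
    con.executemany("INSERT INTO source_a.documents VALUES (?, ?)",
                    [(1, "place X"), (2, "place Y")])  # dummy rows
    con.executemany("INSERT INTO source_b.texts VALUES (?, ?)",
                    [(10, "place X")])                 # dummy row
    # SQLite only allows views in the TEMP schema to reference tables in
    # other attached databases; the source tables themselves are not altered.
    con.execute(
        "CREATE TEMP VIEW all_records AS "
        "SELECT 'a' AS source, doc_id AS record_id, place FROM source_a.documents "
        "UNION ALL "
        "SELECT 'b' AS source, text_no AS record_id, locality AS place FROM source_b.texts"
    )
    return con


con = build_virtual_database()
print(con.execute(
    "SELECT source, record_id FROM all_records WHERE place = ? ORDER BY source",
    ("place X",),
).fetchall())
```

A UNION ALL view handles the renamed-column case directly; a source table that lacks a column altogether could supply NULL for it in its SELECT.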

One goal of LaQuAT was to support federated searching of a “virtual database” so that the underlying databases would not have to undergo major changes to be included in such a resource. “The ability to link up such diverse data resources, in a way that respects the original data resources and the communities responsible for them,” Bodard et al. (2009) asserted, “is a pressing need among humanities researchers.”

A number of issues complicated data integration, however, including data consistency and some specific features of OGSA-DAI. To begin with, some of the original data in the HGV database had been “contaminated by control characters,” a factor with serious implications for OGSA-DAI, because the system provided access to databases via web services, and web services are based on the exchange of XML documents. Since control characters within an XML document make it invalid and impossible to parse, the team had to extend the system’s “relational data to XML conversion classes to filter out such control characters and replace these with spaces.” The Volterra database presented its own challenges, particularly in terms of database design: not all tables had the same columns, and some columns holding the same information had different names. A second major challenge was the lack of suitable database drivers, so the data from both Volterra and HGV were ported into MySQL so that OGSA-DAI could access them. Other issues included adapting the way OGSA-DAI exposed metadata and altering the way the system used SQL views because of the large size of the HGV database. In the end, the project could use only a subset of the HGV database to keep query times reasonable. Despite these and other challenges, the project was able to develop a demonstrator that provided integrated access to both HGV and Volterra.³⁷³
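
The control-character fix can be sketched in Python. LaQuAT changed OGSA-DAI's own relational-to-XML conversion classes; the functions here (`clean_for_xml`, `row_to_xml`) are hypothetical illustrations of the same idea, not that code. The pattern covers the C0 control characters that XML 1.0 forbids and, as the project describes, replaces each with a space before a row is serialized.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 does not allow (tab, LF and CR are allowed).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def clean_for_xml(value: str) -> str:
    """Replace XML-illegal control characters with spaces."""
    return INVALID_XML_CHARS.sub(" ", value)


def row_to_xml(row: dict) -> str:
    """Serialize one relational row as <row><column>value</column>...</row>."""
    element = ET.Element("row")
    for column, value in row.items():
        ET.SubElement(element, column).text = clean_for_xml(str(value))
    return ET.tostring(element, encoding="unicode")


xml_text = row_to_xml({"id": 7, "title": "damaged\x01text"})
print(xml_text)  # the control character has been replaced by a space
print(ET.fromstring(xml_text).find("title").text)
```

Replacing rather than deleting the character keeps the surrounding words apart, which matches the project's stated choice of substituting spaces.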

The LaQuAT project had originally assumed that one of the most useful outcomes of integrating the two databases would be finding where data overlapped (for example, in personal and place names), but the team found instead that clear-cut overlaps were fairly easy to identify. A far more interesting question, they proposed, was to try to automatically recognize “the co-existence of homonymous persons or

372 While the technical details of this software are beyond the scope of this paper, Jackson et al. explain that “OGSA-DAI executes workflows which can be viewed as scripts which specify what data is to be accessed and what is to be done to it. Workflows consist of activities, which are well-defined functional units which perform some data-related operation e.g. query a database, transform data to XML, deliver data via FTP. A client submits a workflow to an OGSA-DAI server via an OGSA-DAI web service. The server parses, compiles and executes the workflow.”

373 For more on the infrastructure proof-of-concept design, see Jackson et al. (2009). This demonstrator can be viewed at http://domain001.vidar.ngs.manchester.ac.uk:8080/laquat/laquatDemo.jsp
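
The workflow model quoted in note 372, an ordered series of activities that query, transform, and deliver data, can be pictured with a toy pipeline. This is a minimal sketch under stated assumptions, not OGSA-DAI's API: every function name is hypothetical, SQLite stands in for the data resource, appending to a list stands in for delivery via FTP, and there is no parsing, compiling, or web-service layer.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Any, Callable, List, Optional

# An "activity" here is any function that takes the previous activity's output.
Activity = Callable[[Any], Any]


def query_activity(con: sqlite3.Connection, sql: str) -> Activity:
    """Activity that runs a query and returns rows as dicts (ignores its input)."""
    def run(_: Any) -> List[dict]:
        cursor = con.execute(sql)
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    return run


def to_xml_activity(rows: List[dict]) -> str:
    """Activity that transforms rows into an XML document."""
    root = ET.Element("results")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def deliver_activity(sink: list) -> Activity:
    """Activity that 'delivers' the document; appending to a list stands in for FTP."""
    def run(document: str) -> str:
        sink.append(document)
        return document
    return run


def execute_workflow(activities: List[Activity], initial: Optional[Any] = None) -> Any:
    """Run activities in order, feeding each one's output to the next."""
    data = initial
    for activity in activities:
        data = activity(data)
    return data


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE texts (id INTEGER, title TEXT)")
con.execute("INSERT INTO texts VALUES (1, 'dummy')")
delivered: list = []
workflow = [query_activity(con, "SELECT id, title FROM texts"),
            to_xml_activity,
            deliver_activity(delivered)]
print(execute_workflow(workflow))
```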
