Rome Wasn't Digitized in a Day - Council on Library and Information ...

Another research project that has explored integrating diverse data sets in archaeology is the STAR (Semantic Technologies for Archaeology Resources) Project 222 based at the University of Glamorgan. This AHRC-funded project worked in collaboration with English Heritage (EH) 223 to develop semantic technologies that could be used in the domain of digital archaeology. The project team sought to demonstrate the utility of cross-searching archaeological data that were expressed as RDF and that conformed to a single ontological scheme (Binding et al. 2008). For this ontological scheme, Binding et al. (2008) explained that they had created a “modular RDF extension” of CRM-EH, 224 an archaeological extension of the CIDOC-CRM that had been produced by EH but had previously existed only on paper. “The initial work on the CRM-EH was prompted by a need to model the archaeological processes and concepts in use by the (EH) archaeological teams,” Binding et al. (2008) explained, “to inform future systems design and to aid in the potential integration of archaeological information in interoperable web based research initiatives.” It was hoped that the design of a common ontology would support greater “cross domain searching” and more “semantic depth” for archaeological queries.
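To make the idea of cross-searching concrete, here is a minimal Python sketch, assuming two toy datasets that have already been expressed as subject-predicate-object triples sharing one vocabulary. The `ex:` terms (`ex:Context`, `ex:hasFind`, `ex:findType`) are hypothetical placeholders rather than actual CRM-EH class or property names, and the small pattern matcher only imitates what a triplestore's query engine does; it is not the STAR software.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Two hypothetical source datasets. Because both use the same
# (hypothetical) ontology terms, one query can span them.
site_a: List[Triple] = [
    ("a:ctx1", "rdf:type", "ex:Context"),
    ("a:ctx1", "ex:hasFind", "a:find1"),
    ("a:find1", "ex:findType", "pottery"),
]
site_b: List[Triple] = [
    ("b:ctx7", "rdf:type", "ex:Context"),
    ("b:ctx7", "ex:hasFind", "b:find3"),
    ("b:find3", "ex:findType", "pottery"),
    ("b:find4", "ex:findType", "coin"),
]

def match(pattern: List[Triple], triples: Iterable[Triple]) -> List[Binding]:
    """Return every binding of '?'-prefixed variables that satisfies
    all triple patterns, in the style of a basic graph pattern."""
    data = list(triples)
    bindings: List[Binding] = [{}]
    for pat in pattern:
        new_bindings: List[Binding] = []
        for b in bindings:
            for triple in data:
                candidate = dict(b)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        # A variable must agree with any earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    new_bindings.append(candidate)
        bindings = new_bindings
    return bindings

# "Which contexts, in either dataset, contain pottery?"
query = [
    ("?ctx", "rdf:type", "ex:Context"),
    ("?ctx", "ex:hasFind", "?find"),
    ("?find", "ex:findType", "pottery"),
]
results = match(query, site_a + site_b)
print(sorted(r["?ctx"] for r in results))  # ['a:ctx1', 'b:ctx7']
```

The query never mentions either source; it succeeds across both only because the two datasets conform to the same terms, which is the benefit a shared ontology is meant to provide.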

The STAR project mapped five archaeological databases, each with its own unique schema, to the CRM-EH ontology. For one of these databases, the initial mapping between database columns and RDF entities was undertaken manually with the assistance of domain experts; this mapping was then used to extrapolate the mappings for the other databases. After the data from these five databases were mapped to the CRM-EH, a complicated data-extraction process was undertaken that involved the creation of unique identifiers, the modeling of events through the creation of intermediate “virtual” entities, and the modeling of “data instance values.” A mapping and extraction utility was built to allow users to query the archaeological data and then save both their query, as an XML file for later reuse, and its results, as tabular data in an RDF format. To help users work with the extracted archaeological data that were stored as RDF, the STAR project initially built a prototype “search/browse application” where the extracted data were stored in a MySQL 225 RDF triplestore 226 and users could query them using a CRM-based web service. This RDF data store not only holds the CRM-EH ontology and the “amalgamated data” from the separate archaeological databases but also includes a number of important domain-specific thesauri and glossaries that are represented in SKOS (Binding 2010). 227
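The mapping and extraction steps described above can be illustrated with a minimal Python sketch. It assumes two toy tables with different column names, a hand-built column mapping for the first table that is reused for the second through a simple column-equivalence table (one possible way to extrapolate a mapping), a hypothetical URI scheme (`http://example.org/<db>/...`) for minting unique identifiers, and hypothetical `ex:` property names in place of real CRM-EH terms. The intermediate deposition-event node mirrors the idea of “virtual” entities that connect a context to its finds; this is not STAR's actual extraction code.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace for minted URIs

# Hand-built mapping for the first database: source column -> role in
# the shared model (roles and property names are illustrative only).
mapping_db1 = {"ctx_no": "context", "find_no": "find", "find_type": "find_type"}

# Derive a mapping for a second database by matching its columns to the
# first database's columns (a hypothetical extrapolation step).
equivalent_columns = {"context_id": "ctx_no", "object_id": "find_no", "object": "find_type"}
mapping_db2 = {col: mapping_db1[src] for col, src in equivalent_columns.items()}

def row_to_triples(db: str, row: Dict[str, str], mapping: Dict[str, str]) -> List[Triple]:
    """Convert one source record into triples in the shared model."""
    fields = {role: row[col] for col, role in mapping.items()}
    # Mint unique identifiers by qualifying local keys with the database name.
    ctx = f"{BASE}{db}/context/{fields['context']}"
    find = f"{BASE}{db}/find/{fields['find']}"
    # Intermediate "virtual" event linking context and find, since the
    # source table has no explicit row for the event itself.
    event = f"{BASE}{db}/deposition/{fields['context']}-{fields['find']}"
    return [
        (ctx, "rdf:type", "ex:Context"),
        (event, "rdf:type", "ex:DepositionEvent"),
        (event, "ex:depositedIn", ctx),
        (event, "ex:deposited", find),
        (find, "ex:findType", fields["find_type"]),  # literal attribute value
    ]

db1_rows = [{"ctx_no": "101", "find_no": "7", "find_type": "pottery"}]
db2_rows = [{"context_id": "101", "object_id": "7", "object": "coin"}]

store: List[Triple] = []
for row in db1_rows:
    store += row_to_triples("db1", row, mapping_db1)
for row in db2_rows:
    store += row_to_triples("db2", row, mapping_db2)

# The same local key "101" stays distinct once qualified by its source.
contexts = sorted(s for s, p, o in store if o == "ex:Context")
print(contexts)
# ['http://example.org/db1/context/101', 'http://example.org/db2/context/101']
```

Qualifying identifiers by source is one simple way to keep records from separate databases from colliding once they are amalgamated in a single store.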

Moving beyond this initial prototype, the STAR project website offers access to a full research demonstrator 228 that provides a SPARQL 229-based semantic search not only of the extracted databases but also of an archaeological grey literature collection made available to the project by the ADS. 230 The project team used NLP information-extraction techniques 231 to identify key concepts in the grey literature that were

222 http://hypermedia.research.glam.ac.uk/kos/star/

223 http://www.english-heritage.org.uk/

224 http://hypermedia.research.glam.ac.uk/resources/crm/

225 MySQL is a “relational database management system that runs as a server providing multi-user access to a number of databases” (http://en.wikipedia.org/wiki/MySQL). The source code for MySQL has been made available under the GNU General Public License, and MySQL is a popular choice of “database for use in web applications.”

226 A triplestore is a “purpose-built database for the storage and retrieval of Resource Description Framework (RDF) metadata” (http://en.wikipedia.org/wiki/Triplestore#); it is designed for the storage and retrieval of statements known as triples, “in the form of subject-predicate-object,” as are used in the creation of RDF statements.

227 SKOS stands for “Simple Knowledge Organization System” and provides an RDF model for encoding reference tools such as thesauri, taxonomies, and classification systems. SKOS is under active development as part of the W3C’s Semantic Web activity (http://www.w3.org/2004/02/skos/).

228 http://hypermedia.research.glam.ac.uk/resources/star-demonstrator/

229 http://www.w3.org/TR/rdf-sparql-query/

230 The grey literature collection that the STAR project used was an “extract of the OASIS corpus” that is maintained by the ADS. The OASIS (Online AccesS to the Index of archaeological investigationS) (http://oasis.ac.uk/) corpus seeks to “provide an online index to the mass of archaeological grey literature that has been produced as a result of the advent of large-scale developer funded fieldwork.”

231 The STAR project made use of GATE (http://gate.ac.uk/)
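Because the thesauri in the STAR data store are encoded in SKOS, a search for a broad term can be widened to its narrower terms by following `skos:narrower` links; SKOS itself defines `skos:narrowerTransitive` for this kind of closure. The minimal Python sketch below assumes a hypothetical three-level thesaurus fragment held as triples. `skos:narrower` and `skos:prefLabel` are real SKOS properties, but the concept identifiers and labels are invented for illustration, and this is not claimed to be how the STAR demonstrator expands queries.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical thesaurus fragment; only the skos: property names are real.
thesaurus: List[Triple] = [
    ("t:vessel", "skos:prefLabel", "vessel"),
    ("t:vessel", "skos:narrower", "t:jar"),
    ("t:vessel", "skos:narrower", "t:bowl"),
    ("t:jar", "skos:prefLabel", "jar"),
    ("t:jar", "skos:narrower", "t:amphora"),
    ("t:bowl", "skos:prefLabel", "bowl"),
    ("t:amphora", "skos:prefLabel", "amphora"),
]

def expand(concept: str, triples: List[Triple]) -> Set[str]:
    """Return the concept plus every concept reachable via skos:narrower
    (a breadth-first walk, i.e. the transitive closure of the links)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if s == current and p == "skos:narrower" and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

def labels(concepts: Set[str], triples: List[Triple]) -> List[str]:
    """Look up the preferred label of each concept, sorted."""
    return sorted(o for s, p, o in triples if s in concepts and p == "skos:prefLabel")

print(labels(expand("t:vessel", thesaurus), thesaurus))
# ['amphora', 'bowl', 'jar', 'vessel']
```

The expanded label list could then be used as alternative search terms, so that a query for “vessel” also retrieves records indexed only as “amphora.”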
