26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...


resources and technology overview; information gathering and dissemination; intellectual property rights and business models; and construction and exploitation agreements. CLARIN is still in its preparatory stage and envisions two later phases, a construction phase and an exploitation phase. This preparatory phase has a number of objectives, including organizing the funding and governance in 22 countries and thoroughly exploring the technical dimension, for, as Váradi et al. admit, “a language resources and technology infrastructure is a novel concept.” CLARIN is fully investigating the user dimension and is undertaking an analysis of how language technology is currently used in the humanities to make sure that all developed technical specifications meet the actual needs of humanities users. This scoping study and research includes undertaking a number of typical humanities research projects to help validate developed prototypes.738 In addition, they plan to conduct outreach to less technologically advanced sections of the humanities and social sciences to promote the use of language resources and technology (Váradi et al. 2008). CLARIN is also seeking to bring together the humanities and language technology communities, and it plans to collaborate with DARIAH in this area and others.739

One example of a humanities case study was reported by Villegas and Parra (2009), who explored the scenario of a social historian wishing to conduct a search of multiple newspaper archives. They found that providing access to primary source data that were “highly distributed and stored in different applications with different formats” was very difficult and that humanities researchers required the “integration and interoperability of distributed and heterogeneous research data.” Villegas and Parra included a detailed analysis of the complicated steps required to create a final environment where the user could actually analyze the data. They also provided some insights for further CLARIN research and ongoing case studies; namely, that (1) humanists need to be made better aware of existing linguistics resources and tools; (2) users need integrated access to data with more automated processes to simplify laborious data-gathering and -integration tasks; (3) the use of standards and protocols would help make data integration easier; (4) NLP tools require textual data in order to perform automated analysis but many data providers do not provide access to their data in a textual format; and (5) the use of web services with standardized interfaces and strongly typed XML messages could help guarantee interoperability of resources and tools. The authors admit that this last desideratum will require a great deal of consensus among service providers.
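
To make the fifth point more concrete, the following is a minimal sketch of what a "strongly typed" XML message might look like from the consumer's side. It is not drawn from Villegas and Parra or from any CLARIN specification: the `<article>` message layout, the `Article` record, and the function names are all hypothetical, chosen only to illustrate how an agreed message shape lets results from several newspaper archives be merged without per-archive conversion code.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Article:
    """Typed record that every archive response is converted into (hypothetical)."""
    archive: str
    title: str
    published: date
    text: str


def parse_article(xml_message: str) -> Article:
    """Parse one hypothetical <article> message, rejecting incomplete ones."""
    root = ET.fromstring(xml_message)
    if root.tag != "article":
        raise ValueError(f"unexpected root element: {root.tag}")
    fields = {}
    for name in ("archive", "title", "published", "text"):
        value = root.findtext(name)
        if value is None or not value.strip():
            raise ValueError(f"missing required element: {name}")
        fields[name] = value.strip()
    return Article(
        archive=fields["archive"],
        title=fields["title"],
        published=date.fromisoformat(fields["published"]),  # must be an ISO 8601 date
        text=fields["text"],
    )


def merge_results(messages):
    """Combine messages from several archives into one date-ordered list."""
    return sorted((parse_article(m) for m in messages), key=lambda a: a.published)
```

Because every provider in this sketch is assumed to emit the same element names and an ISO-formatted date, `merge_results` can treat all archives alike; a message that omits a required element fails loudly instead of silently producing a partial record. Reaching agreement on such a shared shape is precisely the consensus problem the authors point to.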

In addition to studying how humanities users might use language tools and resources, CLARIN plans to include language resources for all European languages in participating countries, and it has defined BLARK, or a Basic Language Resources Toolkit, that will be required for each well-documented language. BLARKs must consist of two types of lexicons, one “form based” and one “lexical semantic,” or essentially a treebank and an automatically annotated larger corpus. As part of this work, CLARIN has recently made a number of services available on its website under its “Virtual Language Observatory.”740 Included among these services are massive language-resource and language-tool inventories that can be searched or browsed by faceted metadata.

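Browsing an inventory "by faceted metadata" can be illustrated with a small sketch. The records, facet names, and functions below are hypothetical and do not reflect the actual Virtual Language Observatory data model or interface; they only show the general idea of narrowing a list by metadata values and counting what remains under each value.

```python
from collections import Counter

# Hypothetical inventory records; field names and values are illustrative only.
RESOURCES = [
    {"name": "Corpus A", "language": "Dutch", "type": "corpus"},
    {"name": "Lexicon B", "language": "Dutch", "type": "lexicon"},
    {"name": "Treebank C", "language": "Hungarian", "type": "treebank"},
    {"name": "Corpus D", "language": "Hungarian", "type": "corpus"},
]


def filter_by_facets(records, **selected):
    """Keep records whose metadata matches every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)
```

For instance, `filter_by_facets(RESOURCES, language="Dutch")` narrows the list to the two Dutch entries, and `facet_counts` on that result shows how many resources of each type remain, which is the kind of figure a faceted browser typically displays next to each option.
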
738 As various publications and deliverables are completed for the various work packages, all reports can be downloaded from the CLARIN website (http://www.clarin.eu/deliverables).

739 In July of 2011, both CLARIN and DARIAH signed a “letter of intent” with EGI (European Grid Infrastructure (http://www.egi.eu)), “which has the express intention of ensuring that technology developed by the two ESFRI projects and EGI are compatible and provides the best service to their users.” Both CLARIN and DARIAH are funded by ESFRI (European Strategy Forum on Research Infrastructures), and this agreement with the EGI has the end goal of ensuring that all three projects “develop common tools and technologies while exploring further opportunities for collaboration.” (http://www.dariah.eu/index.phpoption=com_content&view=article&id=151:egi-signed-letter-of-intent-with-dariah-andclarin&catid=3:dariah&Itemid=197)

740 http://www.clarin.eu/vlo/ and for more details on this resource, see (Uytvanck et al. 2010).
