Rome Wasn't Digitized in a Day - Council on Library and Information ...
resources and technology overview; information gathering and dissemination; intellectual property
rights and business models; and construction and exploitation agreements. CLARIN is still in its
preparatory stage and envisions two later phases, a construction phase and an exploitation phase. This
preparatory phase has a number of objectives, including organizing the funding and governance in 22
countries and thoroughly exploring the technical dimension, for, as Váradi et al. admit, “a language
resources and technology infrastructure is a novel concept.” CLARIN is fully investigating the user
dimension and is undertaking an analysis of how language technology is currently used in the
humanities to make sure that all developed technical specifications meet the actual needs of humanities
users. This scoping study and research includes undertaking a number of typical humanities research
projects to help validate developed prototypes. 738 In addition, they plan to conduct outreach to less
technologically advanced sections of the humanities and social sciences to promote the use of language
resources and technology (Váradi et al. 2008). CLARIN is also seeking to bring together the
humanities and language technology communities, and it plans to collaborate with DARIAH in this
area and others. 739
One example of a humanities case study was reported by Villegas and Parra (2009), who explored the
scenario of a social historian wishing to conduct a search of multiple newspaper archives. They found
that providing access to primary source data that were “highly distributed and stored in different
applications with different formats” was very difficult and that humanities researchers required the
“integration and interoperability of distributed and heterogeneous research data.” Villegas and Parra
included a detailed analysis of the complicated steps required to create a final environment where the
user could actually analyze the data. They also provided some insights for further CLARIN research
and ongoing case studies; namely, that (1) humanists need to be made better aware of existing
linguistics resources and tools; (2) users need integrated access to data with more automated processes
to simplify laborious data-gathering and -integration tasks; (3) the use of standards and protocols
would help make data integration easier; (4) NLP tools require textual data in order to perform
automated analysis but many data providers do not provide access to their data in a textual format; and
(5) the use of web services with standardized interfaces and strongly typed XML messages could help
guarantee interoperability of resources and tools. The authors admit that this last desideratum will
require a great deal of consensus among service providers.
In addition to studying how humanities users might use language tools and resources, CLARIN plans
to include language resources for all European languages in participating countries, and it has defined
BLARK, or a Basic Language Resources Toolkit, that will be required for each well-documented
language. BLARKs must consist of two types of lexicons, one “form based” and one “lexical
semantic,” or essentially a treebank and an automatically annotated larger corpus. As part of this work,
CLARIN has recently made a number of services available on its website under its “Virtual Language
Observatory.” 740 Included among these services are massive language-resource and language-tool
inventories that can be searched or browsed by faceted metadata.
738 As various publications and deliverables are completed for the various work packages, all reports can be downloaded from the CLARIN website
(http://www.clarin.eu/deliverables).
739 In July of 2011, both CLARIN and DARIAH signed a “letter of intent” with EGI (European Grid Infrastructure (http://www.egi.eu)), “which has the
express intention of ensuring that technology developed by the two ESFRI projects and EGI are compatible and provides the best service to their users.”
Both CLARIN and DARIAH are funded by ESFRI (European Strategy Forum on Research Infrastructures), and this agreement with the EGI has the end
goal of ensuring that all three projects “develop common tools and technologies while exploring further opportunities for collaboration.”
(http://www.dariah.eu/index.phpoption=com_content&view=article&id=151:egi-signed-letter-of-intent-with-dariah-andclarin&catid=3:dariah&Itemid=197)
740 http://www.clarin.eu/vlo/ and for more details on this resource, see (Uytvanck et al. 2010).