Rome Wasn't Digitized in a Day - Council on Library and Information ...
Another major scholarly online collection of Sanskrit is the Digital Corpus of Sanskrit (DCS), 61 which
provides access to a searchable collection of lemmatized Sanskrit texts and to a partial version of the
database of the SanskritTagger software. SanskritTagger is a “part-of-speech (POS) and lexical tagger
for post-Vedic Sanskrit” and it is able to analyze unprocessed digital Sanskrit text both lexically and
morphologically. 62 The DCS was automatically created from the most recent version of the
SanskritTagger database with a corpus chosen by the software creator Oliver Hellwig (the website
notes that this corpus makes no attempt to be exhaustive). The DCS was designed to support
research in Sanskrit philology, and it is possible to search for lexical units and their collocations in a
corpus of 2,700,000 words.
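To make the idea of searching for a lexical unit and its collocations concrete, the following is a minimal Python sketch. It is not the DCS implementation or its API: the function name `collocations`, the `window` parameter, and the three-sentence toy corpus are all assumptions made for illustration. It only shows why lemmatization matters, since counting is done on lemmas rather than on inflected surface forms.

```python
from collections import Counter

def collocations(sentences, target, window=2):
    """Count lemmas occurring within `window` tokens of `target`.

    `sentences` is a list of sentences, each a list of lemmas.
    This is an illustrative sketch, not how the DCS computes results.
    """
    counts = Counter()
    for sent in sentences:
        for i, lemma in enumerate(sent):
            if lemma != target:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[sent[j]] += 1
    return counts

# Hypothetical, hand-made lemma sequences (not taken from the DCS).
toy_corpus = [
    ["rama", "vana", "gam"],
    ["rama", "sita", "vana", "gam"],
    ["sita", "vac"],
]

vana_neighbours = collocations(toy_corpus, "vana", window=1)
print(vana_neighbours.most_common())
```

A real corpus query would also need frequency normalization or an association measure; raw co-occurrence counts are used here only to keep the sketch short.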
A variety of research has been conducted into the development of tools for Sanskrit, and this
subsection reviews only some of it. The need for digitized Sanskrit lexicons 63 as part of a larger
computational linguistics platform is an area of research for the Sanskrit Library, an issue that receives
substantial attention in Huet (2004). Huet’s article provides an overview of work to develop both a
Sanskrit lexical database and various automatic-tagging tools to support a philologist:
The first level of interpretation of a Sanskrit text is its word-to-word segmentation, and our
tagger will be able to assist a philology specialist to achieve complete morphological mark-up
systematically. This will allow the development of concordance analysis tools recognizing
morphological variants, a task which up to now has to be performed manually (Huet 2004).
Huet also asserted that the classical Sanskrit corpus is extensive and presents computational linguistics
with many analytical challenges.
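The concordance that Huet describes, one that recognizes morphological variants, can be pictured as an index keyed by lemma rather than by surface form. The Python sketch below assumes a tagger has already produced (surface form, lemma) pairs; the function name `build_concordance` and the sample tokens are hypothetical and do not come from Huet's tools.

```python
from collections import defaultdict

def build_concordance(tagged_tokens):
    """Group token positions by lemma.

    `tagged_tokens` is a list of (surface_form, lemma) pairs, assumed to
    come from a morphological tagger. Returns {lemma: [(position, surface)]}.
    """
    index = defaultdict(list)
    for position, (surface, lemma) in enumerate(tagged_tokens):
        index[lemma].append((position, surface))
    return dict(index)

# Hand-tagged illustrative input: two inflected forms of "deva".
tagged = [
    ("devasya", "deva"),    # genitive singular
    ("vacanam", "vacana"),
    ("devena", "deva"),     # instrumental singular
]

concordance = build_concordance(tagged)
print(concordance["deva"])
```

Looking up `deva` returns both inflected forms with their positions, which is the step that otherwise has to be done by hand.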
In addition to the challenges Sanskrit presents for developing computational tools, the features of the
language itself make the creation of critical editions difficult. As Csernel and Patte (2009) explain, a
“critical edition” must take “into account all the different known versions of the same text in order to
show the differences between any two distinct versions.” 64 The creation of critical editions is
challenging in any language, particularly if there are many manuscript witnesses, but Sanskrit presents
some unique problems. In this paper, Csernel and Patte present an approach based on paragraphs and
sentences extracted from a collection of manuscripts known as the “Banaras” gloss. This gloss was
written in the seventh century AD and is the most famous commentary on the “notorious” Panini
grammar, which was known as the first “generative” grammar and was written around the fifth century
BC. One major characteristic of Sanskrit described by Csernel and Patte is that it is “not linked to a
specific script,” and while the Brahmi script was used for a long time, Devanagari is now the most
common. The authors reported that they used the transliteration scheme of Sanskrit for TeX that was
developed by Frans Velthuis 65 wherein each Sanskrit letter is written using between one and three
Latin characters.
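Because each Sanskrit letter occupies between one and three Latin characters in this transliteration, a program has to regroup the characters into letters before comparing texts. The Python sketch below illustrates one way to do that with greedy longest-match splitting. The greedy strategy, the function name `split_velthuis`, and the partial letter inventory are assumptions made for illustration; this is not the procedure described by Csernel and Patte, and it does not handle cases where a two-character code should really be read as two separate letters.

```python
# Partial inventory of Velthuis letter codes (an assumed subset, for
# illustration only; the full scheme has further codes and options).
VELTHUIS_LETTERS = {
    "a", "aa", "i", "ii", "u", "uu", ".r", ".rr", "e", "ai", "o", "au",
    ".m", ".h",
    "k", "kh", "g", "gh", '"n',
    "c", "ch", "j", "jh", "~n",
    ".t", ".th", ".d", ".dh", ".n",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v",
    '"s', ".s", "s", "h",
}

def split_velthuis(text, max_len=3):
    """Split a transliterated word into letter codes, longest match first."""
    letters = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in VELTHUIS_LETTERS:
                letters.append(chunk)
                i += len(chunk)
                break
        else:
            # No known code starts at this position.
            raise ValueError(f"unrecognized code at position {i}: {text[i:]!r}")
    return letters

print(split_velthuis("sa.msk.rta"))
print(split_velthuis("ruu.dha"))
```

The second call exercises a three-character code (`.dh`), while the first shows one- and two-character codes mixed in a single word.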
An interesting insight provided by these authors was how one problematic feature of Sanskrit texts,
namely, text written without spaces, was also found in other ancient texts:
61 http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/
62 For more details on this tagger, see Hellwig (2007); for one of its research uses in philology see Hellwig (2010).
63 The NEH has recently funded a first step in this direction. A project entitled “Sanskrit Lexical Sources: Digital Synthesis and Revision” will support an
“international partnership between the Sanskrit Library (Maharishi University of Management) and the Cologne Digital Sanskrit Lexicon (CDSL) project
(Institute of Indology and Tamil Studies, Cologne University) to establish a digital Sanskrit lexical reference work.”
http://www.neh.gov/news/archive/201007200.html
64 Further discussion of this issue can be found in the section on Digital Editions.
65 http://www.ctan.org/tex-archive/language/devanagari/velthuis/