Rome Wasn't Digitized in a Day - Council on Library and Information ...

Another major scholarly online collection of Sanskrit is the Digital Corpus of Sanskrit (DCS), 61 which provides access to a searchable collection of lemmatized Sanskrit texts and to a partial version of the database of the SanskritTagger software. SanskritTagger is a “part-of-speech (POS) and lexical tagger for post-Vedic Sanskrit” that can analyze unprocessed digital Sanskrit text both lexically and morphologically. 62 The DCS was generated automatically from the most recent version of the SanskritTagger database, using a corpus chosen by the software's creator, Oliver Hellwig (the website notes that this corpus makes no attempt to be exhaustive). The DCS was designed to support research in Sanskrit philology, and users can search its corpus of 2,700,000 words for lexical units and their collocations.
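The DCS's internal data model is not described here, so the following is only a rough illustration of what searching a lemmatized corpus for collocations involves. This minimal Python sketch counts the lemmas that occur within a fixed window of a target lemma. The input format (each sentence as a list of lemmas), the function name `collocates`, the window size, and the sample lemmas are all hypothetical assumptions, not the DCS's actual interface.

```python
from collections import Counter


def collocates(sentences, target, window=2):
    """Count lemmas appearing within `window` positions of `target`.

    Hypothetical input format: each sentence is a list of lemmas
    (dictionary headwords), so inflected forms such as "vanam" and
    "vane" have already been reduced to the single lemma "vana".
    """
    counts = Counter()
    for lemmas in sentences:
        for i, lemma in enumerate(lemmas):
            if lemma != target:
                continue
            # Look at neighbours on both sides, clipped to the sentence bounds.
            start = max(0, i - window)
            end = min(len(lemmas), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[lemmas[j]] += 1
    return counts


if __name__ == "__main__":
    # Illustrative toy corpus of lemmas in ASCII (Velthuis-style) transliteration.
    corpus = [
        ["raama", "vana", "gam"],
        ["siitaa", "vana", "vas"],
        ["raama", "nagara", "gam"],
    ]
    print(collocates(corpus, "vana", window=1).most_common())
```

Lemmatization is what makes such a count meaningful for Sanskrit: without it, every inflected form of a word would be tallied as a separate item.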

Considerable research has gone into developing tools for Sanskrit, and this subsection reviews only some of it. The need for digitized Sanskrit lexicons 63 as part of a larger computational linguistics platform is an area of research for the Sanskrit Library, and the issue receives substantial attention in Huet (2004). Huet’s article provides an overview of work to develop both a Sanskrit lexical database and various automatic-tagging tools to support philologists:

The first level of interpretation of a Sanskrit text is its word-to-word segmentation, and our tagger will be able to assist a philology specialist to achieve complete morphological mark-up systematically. This will allow the development of concordance analysis tools recognizing morphological variants, a task which up to now has to be performed manually (Huet 2004).

Huet also asserted that the classical Sanskrit corpus is extensive and presents computational linguistics with many analytical challenges.
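Word-to-word segmentation, the first level of interpretation Huet describes, is hard partly because a string written without spaces can often be split into known words in more than one way. The minimal Python sketch below enumerates every split of a string against a lexicon using memoized recursion. It is deliberately simplified and is not Huet's method: real Sanskrit segmentation must also undo sandhi (the sound changes at word boundaries that alter the surface form), which plain dictionary lookup cannot do. The function name `segmentations` and the English toy lexicon are illustrative assumptions.

```python
from functools import lru_cache


def segmentations(text, lexicon):
    """Return every way to split `text` into words from `lexicon`.

    Simplification: words are matched as literal substrings, so sandhi
    (sound changes at word boundaries) is not handled.
    """

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All segmentations of text[i:], as a tuple of word tuples.
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in lexicon:
                results.extend((word,) + rest for rest in from_pos(j))
        return tuple(results)

    return [list(split) for split in from_pos(0)]


if __name__ == "__main__":
    # Toy English lexicon chosen only to show ambiguity in unspaced text.
    lexicon = {"god", "is", "now", "here", "nowhere", "no", "where"}
    for split in segmentations("godisnowhere", lexicon):
        print(" ".join(split))
```

The string yields three readings here, which is why a tagger that ranks or filters candidate segmentations, rather than merely listing them, is useful to a philologist.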

In addition to the challenges Sanskrit presents for developing computational tools, features of the language itself make the creation of critical editions difficult. As Csernel and Patte (2009) explain, a “critical edition” must take “into account all the different known versions of the same text in order to show the differences between any two distinct versions.” 64 Creating a critical edition is challenging in any language, particularly when there are many manuscript witnesses, but Sanskrit presents some unique problems. In their paper, Csernel and Patte present an approach based on paragraphs and sentences extracted from a collection of manuscripts known as the “Banaras” gloss. This gloss was written in the seventh century AD and is the most famous commentary on the “notorious” Panini grammar, which is known as the first “generative” grammar and was written around the fifth century BC.

One major characteristic of Sanskrit described by Csernel and Patte is that it is “not linked to a specific script”: although the Brahmi script was used for a long time, Devanagari is now the most common. The authors reported that they used the transliteration scheme for Sanskrit in TeX developed by Frans Velthuis, 65 in which each Sanskrit letter is written using one to three Latin characters.
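To make the “one to three Latin characters per letter” idea concrete, the following minimal Python sketch splits a Velthuis-encoded string into letter units by greedy longest match and maps each unit to IAST. The table covers only a subset of the scheme (long vowels, vocalic r and l, retroflex and palatal consonants, anusvara, visarga, and common aspirates and diphthongs); greedy matching is a simplification, and the names `VELTHUIS_TO_IAST`, `velthuis_tokens`, and `velthuis_to_iast` are hypothetical. The package documentation cited in footnote 65 describes the full scheme.

```python
# Partial Velthuis -> IAST table (a subset of the scheme, for illustration).
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".r": "ṛ", ".rr": "ṝ", ".l": "ḷ", ".ll": "ḹ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".th": "ṭh", ".d": "ḍ", ".dh": "ḍh", ".n": "ṇ",
    '"s': "ś", ".s": "ṣ", ".m": "ṃ", ".h": "ḥ",
}
# Multi-character units whose IAST spelling is the same as in Velthuis.
UNCHANGED_UNITS = {"kh", "gh", "ch", "jh", "th", "dh", "ph", "bh", "ai", "au"}
UNITS = set(VELTHUIS_TO_IAST) | UNCHANGED_UNITS


def velthuis_tokens(text):
    """Split Velthuis text into letter units, trying 3, then 2, then 1 chars."""
    tokens, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):
            unit = text[i:i + size]
            # A single character is always accepted as a fallback unit.
            if size == 1 or unit in UNITS:
                tokens.append(unit)
                i += len(unit)
                break
    return tokens


def velthuis_to_iast(text):
    """Convert Velthuis text to IAST using the partial table above."""
    return "".join(VELTHUIS_TO_IAST.get(t, t) for t in velthuis_tokens(text))


if __name__ == "__main__":
    for word in ["sa.msk.rtam", "paa.nini", "k.r.s.na"]:
        print(word, velthuis_tokens(word), velthuis_to_iast(word))
```

Because every unit is plain ASCII, such an encoding can be typed on any keyboard and stored without special fonts, which is part of what makes it convenient for collating manuscript witnesses by computer.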

An interesting insight provided by these authors is that one problematic feature of Sanskrit texts, namely text written without spaces, is also found in other ancient texts:

61 http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/

62 For more details on this tagger, see Hellwig (2007); for one of its uses in philological research, see Hellwig (2010).

63 The NEH has recently funded a first step in this direction. A project entitled “Sanskrit Lexical Sources: Digital Synthesis and Revision” will support an “international partnership between the Sanskrit Library (Maharishi University of Management) and the Cologne Digital Sanskrit Lexicon (CDSL) project (Institute of Indology and Tamil Studies, Cologne University) to establish a digital Sanskrit lexical reference work.”
http://www.neh.gov/news/archive/201007200.html

64 Further discussion of this issue can be found in the section on Digital Editions.

65 http://www.ctan.org/tex-archive/language/devanagari/velthuis/