
Russian Committee of the UNESCO Programme ...


texts totalling some two million word tokens. The texts span all the major styles of modern Buryat, including fiction, academic writing, and news.

In keeping with the standards of the Text Encoding Initiative (TEI)78, all the corpora are to be provided with corpus-header markup.
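To illustrate what corpus-header markup involves, here is a minimal Python sketch that builds a TEI header with the standard-library `xml.etree.ElementTree` module. It assumes only the smallest header the TEI Guidelines require (a `fileDesc` containing `titleStmt`, `publicationStmt` and `sourceDesc`). The helper names `_sub` and `make_tei_header` and all sample values are hypothetical. Real corpus headers typically record far more metadata, and this does not reproduce the project's own headers.

```python
from typing import Optional
import xml.etree.ElementTree as ET

# TEI P5 namespace; registering it as the default namespace avoids "ns0:" prefixes in the output.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def _sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a TEI-namespaced child element, optionally with text content."""
    node = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}")
    if text is not None:
        node.text = text
    return node


def make_tei_header(title: str, publisher: str, source: str) -> ET.Element:
    """Build a minimal <teiHeader>: fileDesc with titleStmt, publicationStmt and sourceDesc."""
    header = ET.Element(f"{{{TEI_NS}}}teiHeader")
    file_desc = _sub(header, "fileDesc")
    _sub(_sub(file_desc, "titleStmt"), "title", title)
    _sub(_sub(file_desc, "publicationStmt"), "publisher", publisher)
    # sourceDesc describes the original (e.g. printed) text the electronic version derives from.
    _sub(_sub(file_desc, "sourceDesc"), "p", source)
    return header


if __name__ == "__main__":
    # Placeholder values only; nothing here is taken from the Buryat corpus.
    hdr = make_tei_header("Sample Buryat text", "Example publisher", "Example printed edition")
    print(ET.tostring(hdr, encoding="unicode"))
```

In a full TEI document this header would sit inside a `<TEI>` root element, followed by the `<text>` element holding the corpus text itself.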

A newly launched pilot version of the site, http://corpora.imbtarchive.ru/index.php, features selected materials, such as concordances79 to Buryat fiction. These have been built with original software developed by the project team. The contexts reveal the meanings of the selected word forms, making it easier to classify them subsequently by grammatical category for analysis. The corpus is to be regularly expanded and updated and, where necessary, corrected or modified.
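The concordances described here are commonly displayed in keyword-in-context (KWIC) form. The project's own software is not described in this text, so what follows is only a minimal Python sketch under assumed inputs: a dictionary that maps hypothetical source identifiers to plain texts, simple `\w+` tokenization, and case-insensitive matching of one exact word form. Each hit keeps its source identifier and token position, in the spirit of the "links to the source" mentioned in footnote 79. The names `kwic`, `ConcordanceLine` and `format_line` are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# For str patterns \w is Unicode-aware, so Cyrillic letters such as ө, ү, һ count as word characters.
TOKEN_RE = re.compile(r"\w+")


@dataclass
class ConcordanceLine:
    left: str      # context before the keyword
    keyword: str   # the matched word form, as it appears in the text
    right: str     # context after the keyword
    source: str    # identifier of the source text (the "link to the source")
    position: int  # token index of the keyword within that text


def kwic(texts: Dict[str, str], word_form: str, window: int = 5) -> List[ConcordanceLine]:
    """Collect every occurrence of word_form with up to `window` tokens of context on each side."""
    target = word_form.casefold()
    hits: List[ConcordanceLine] = []
    for source_id, text in texts.items():
        tokens = TOKEN_RE.findall(text)
        for i, token in enumerate(tokens):
            if token.casefold() == target:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append(ConcordanceLine(left, token, right, source_id, i))
    return hits


def format_line(line: ConcordanceLine, width: int = 30) -> str:
    """Render one hit in the usual aligned KWIC layout, keyword centred in brackets."""
    left = line.left[-width:]
    right = line.right[:width]
    return f"{left:>{width}} [{line.keyword}] {right:<{width}} ({line.source}:{line.position})"


if __name__ == "__main__":
    # Placeholder English texts keyed by hypothetical document IDs.
    corpus = {"doc1": "The river was wide and the river was cold.", "doc2": "A river runs here."}
    for hit in kwic(corpus, "river", window=3):
        print(format_line(hit))
```

For an agglutinative language like Buryat, matching one exact word form is only a starting point: grouping hits by lemma or by grammatical category requires the kind of morphological description the project is developing.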

The project’s preliminary results have been reported in collections of scholarly papers, in symposium proceedings, and online.

Representing a language on the Web and making the related resources accessible to the research community should be a key priority for today’s linguists. One of the main aims of the Buryat corpus project is to integrate the language into the global information environment. This aim is set out in the Corpus Linguistics programme of fundamental research of the Presidium of the Russian Academy of Sciences (Direction 3: Creation and development of corpus resources for the languages of Russia; http://www.corpling-ran.ru/n3.html). Implemented as part of this programme and with its support, the project aims to model a morphological description of Buryat, which would pave the way for the development of a morphological parser.
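The parser's design is not described here. The following Python sketch only illustrates one common approach for agglutinative languages: a stem lexicon plus a table of permitted suffix sequences (morphotactics), walked like a small finite-state machine. Every stem, suffix and gloss in it is an invented placeholder, not real Buryat material. The names `STEMS`, `SUFFIXES`, `FINAL`, `analyse` and `gloss` are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy morphotactics with INVENTED placeholder morphemes, not real Buryat stems or suffixes.
# STEMS maps a stem to its starting state; SUFFIXES maps a state to (form, gloss, next_state).
STEMS: Dict[str, str] = {"kata": "NOUN", "pelu": "VERB"}
SUFFIXES: Dict[str, List[Tuple[str, str, str]]] = {
    "NOUN": [("ru", "PL", "NOUN_PL"), ("si", "GEN", "END")],
    "NOUN_PL": [("si", "GEN", "END")],
    "VERB": [("na", "PRS", "END"), ("ba", "PST", "END")],
}
# States in which a word may end (in this toy model a bare verb stem is not a complete word).
FINAL = {"NOUN", "NOUN_PL", "END"}

Analysis = List[Tuple[str, str]]


def analyse(word: str) -> List[Analysis]:
    """Return every segmentation of `word` into stem + suffix chain as (morph, gloss) pairs."""
    results: List[Analysis] = []

    def walk(rest: str, state: str, morphs: Analysis) -> None:
        if not rest:
            if state in FINAL:
                results.append(morphs)
            return
        # Try every suffix allowed after the current state; recursion explores all paths.
        for form, tag, next_state in SUFFIXES.get(state, []):
            if rest.startswith(form):
                walk(rest[len(form):], next_state, morphs + [(form, tag)])

    for stem, category in STEMS.items():
        if word.startswith(stem):
            walk(word[len(stem):], category, [(stem, category)])
    return results


def gloss(analysis: Analysis) -> str:
    """Interlinear-style rendering, e.g. 'kata-ru-si = NOUN-PL-GEN'."""
    return "-".join(m for m, _ in analysis) + " = " + "-".join(g for _, g in analysis)


if __name__ == "__main__":
    for form in ["katarusi", "kata", "peluba", "pelu"]:
        print(form, [gloss(a) for a in analyse(form)])
```

Returning all analyses rather than the first one matters because word forms can be ambiguous. A realistic parser for Buryat would also have to handle vowel harmony and stem alternations, which this toy model leaves out.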

The Institute of Mongol, Buddhist and Tibetan Studies is also engaged in experimental research on Buryat phonetics. Its current projects aim to build a database from the data collected since the Linguistics Department’s Experimental Phonetics Research Laboratory was set up. These materials could later be incorporated into the Buryat corpus as a spoken-language subcorpus.

Speech databases, a major type of linguistic resource, are of considerable research interest in their own right. They are essential for research tasks involving the analysis and description of spoken language. Building large, wide-ranging and informative (multitier) speech databases, together with an easy-to-use and reliable set of tools for developing and using them, is an increasingly important task, relevant both to computer applications and to fundamental phonetic research.
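A multitier speech database aligns several layers of annotation (for example words and phones) to the same time axis of a recording, much like the interval tiers of Praat TextGrid files. The Python sketch below shows one possible data structure for that model and a typical cross-tier query. The class names, tier names, labels and file path are hypothetical, and the sketch does not describe the Institute's actual database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interval:
    start: float  # seconds from the beginning of the recording
    end: float
    label: str


@dataclass
class Recording:
    audio_path: str  # hypothetical path to the audio file
    tiers: Dict[str, List[Interval]] = field(default_factory=dict)

    def add(self, tier: str, start: float, end: float, label: str) -> None:
        """Add a labelled interval to a tier, creating the tier if needed."""
        if end <= start:
            raise ValueError("an interval must have positive duration")
        self.tiers.setdefault(tier, []).append(Interval(start, end, label))

    def overlapping(self, tier: str, start: float, end: float) -> List[Interval]:
        """Intervals on `tier` that overlap the half-open span [start, end), in time order."""
        hits = [iv for iv in self.tiers.get(tier, []) if iv.start < end and iv.end > start]
        return sorted(hits, key=lambda iv: iv.start)

    def aligned_labels(self, source_tier: str, label: str, target_tier: str) -> List[List[str]]:
        """For each interval labelled `label` on source_tier, the labels it spans on target_tier."""
        return [
            [t.label for t in self.overlapping(target_tier, iv.start, iv.end)]
            for iv in self.tiers.get(source_tier, [])
            if iv.label == label
        ]


if __name__ == "__main__":
    rec = Recording("recordings/speaker01.wav")  # hypothetical file
    rec.add("word", 0.00, 0.48, "w1")            # placeholder labels
    rec.add("phone", 0.00, 0.15, "p1")
    rec.add("phone", 0.15, 0.48, "p2")
    print(rec.aligned_labels("word", "w1", "phone"))
```

Keeping each tier as an independent list of time-stamped intervals lets new annotation layers (intonation, pauses) be added without changing existing ones. A large database would index intervals by time rather than scanning lists as this sketch does.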

78 The TEI’s aim is to develop standardized methods for marking up textual resources. [Editor’s note.]

79 A concordance is a list of examples of the use of a particular word in context, drawn from a text corpus and complete with links to the sources. [Ed.]

