28.01.2015 Views

Download - Russian Committee of the UNESCO Programme ...

lexicography, and its various manifestations in speech. The key goal is to build pragmatically oriented linguistic models that serve as a basis for automated systems processing data in the Tatar language. Special attention is paid to the key issues of Tatar terminology in cyberspace.

So far, a full-featured computer model of Tatar morphology has been created. Several variants are available, chosen according to the structural specifics of the Tatar language and the applied problem to be solved. The generative morphology model, based on inflection rules, runs somewhat slower but provides a complete analysis of word forms, takes the agglutinative nature of the language into account, and makes it possible to recognize word forms of potentially unlimited length. The paradigmatic model provides rapid detection of word forms and checks their correctness with up to 95 percent accuracy. The model is used in the search engine of the RUSSIA University Information System (by the Centre of Information Technologies of Lomonosov Moscow State University), as well as in MS Windows and MS Office applications. The recognition speed reaches 100 words in 0.014 seconds (roughly 7,000 words per second).
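To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is not the authors' implementation: the stems, affixes, tags, and function names (`analyze`, `build_table`) are hypothetical, Latin-transliterated, and heavily simplified, and the real models cover far more of Tatar grammar. The generative-style analyzer applies affix rules at query time, while the paradigmatic-style table enumerates word forms in advance and answers by lookup.

```python
from itertools import product

# Toy data, assumed for illustration only; not a real Tatar grammar.
STEMS = ["kitap", "bala"]  # 'book', 'child'
# Affix slots in the order they attach to the stem; each slot is optional.
SLOTS = [("PL", "lar"), ("POSS.1SG", "ym"), ("LOC", "da")]


def _match(rest, slot):
    """Return the tag list for `rest`, trying slots from index `slot` on, or None."""
    if not rest:
        return []
    for i in range(slot, len(SLOTS)):
        tag, form = SLOTS[i]
        if rest.startswith(form):
            tail = _match(rest[len(form):], i + 1)
            if tail is not None:
                return [tag] + tail
    return None


def analyze(word):
    """Generative style: apply the affix rules to the word at query time."""
    results = []
    for stem in STEMS:
        if word.startswith(stem):
            tags = _match(word[len(stem):], 0)
            if tags is not None:
                results.append((stem, tags))
    return results


def build_table():
    """Paradigmatic style: enumerate every form once, then answer by lookup."""
    table = {}
    for stem in STEMS:
        for mask in product([False, True], repeat=len(SLOTS)):
            form, tags = stem, []
            for used, (tag, affix) in zip(mask, SLOTS):
                if used:
                    form += affix
                    tags.append(tag)
            table[form] = (stem, tags)
    return table


TABLE = build_table()
print(analyze("kitaplarda"))      # [('kitap', ['PL', 'LOC'])]
print(TABLE.get("balalarymda"))   # ('bala', ['PL', 'POSS.1SG', 'LOC'])
print(analyze("kitapdalar"))      # [] because the affixes are out of order
```

The trade-off mirrors the one described above: the lookup table answers in a single dictionary access but only knows the forms it enumerated, whereas the rule-based analyzer does more work per word but is not tied to a fixed list of forms.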

In addition, in a joint project with Bilkent University (Turkey), a two-level morphology model has been developed that runs under the well-known PC-KIMMO shell programme. It is used as part of the Tatar-Turkish machine translator.
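Two-level morphology relates an abstract lexical form to the surface form that is actually written. The Python sketch below illustrates only that lexical/surface distinction for a single vowel-harmony alternation. It does not reproduce PC-KIMMO's rule or lexicon file formats, nor the project's actual rules. The archiphoneme symbol `A`, the function `realize`, and the vowel sets are assumptions made for the example, and other alternations (for instance consonant changes in the suffix) are ignored.

```python
# Simplified vowel classes for Tatar Cyrillic (an assumption of this sketch).
BACK_VOWELS = set("аоуы")
FRONT_VOWELS = set("әөүеэи")


def realize(stem, lexical_suffix):
    """Map a lexical suffix containing the archiphoneme 'A' to its surface form.

    'A' surfaces as 'а' after a stem whose last vowel is back,
    and as 'ә' after a stem whose last vowel is front.
    """
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return stem + lexical_suffix.replace("A", "а")
        if ch in FRONT_VOWELS:
            return stem + lexical_suffix.replace("A", "ә")
    raise ValueError("stem has no vowel: " + repr(stem))


print(realize("китап", "лAр"))   # китаплар
print(realize("шәһәр", "лAр"))   # шәһәрләр
```

A real two-level system states such correspondences as declarative constraints that work in both directions (analysis and generation); the function above only covers generation.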

A structural-functional model of Tatar affixal morphemes has also been created, allowing various pragmatically oriented morphological models to be constructed. It served as the basis for the integrated “Tatar Morpheme” software and data set. In effect, it is a computer workstation for developing various linguistic processors and for educational and research activities in the field of Tatar linguistics. “Tatar Morpheme” can also be used as a research tool for other languages.

The Tatar-Russian machine translator of Tatar proper names is especially important for the automated systems of Civil Registry Offices and Passport and Visa Services. The programme is also used to generate names automatically, based on the component model of Tatar names.

The ABBYY FineReader OCR software has been successfully localized for Tatar. Thanks to the built-in Tatar morphology component, Tatar texts are recognized with the same speed and accuracy as Russian and English ones.

We are currently working on the creation and support of a digital Tatar corpus, i.e. an Internet-based national corpus with the following components:

• Digital raw (unformatted) texts (newspapers, magazines, books, documents, etc.);
