
Military Communications and Information Technology: A Trusted ...



There are different approaches to machine translation (MT), ranging from simple word-to-word translation through rule-based approaches to statistical machine translation (SMT) and hybrid approaches. Word-to-word translation operates on the basis of dictionaries. Rule-based approaches apply a set of handwritten linguistic rules that describe how to translate one specific source language into one specific target language. SMT, in contrast, is based on machine learning. Most modern MT research is conducted in the field of statistical machine translation.
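The dictionary-based word-to-word approach can be illustrated with a minimal sketch. The toy German-English dictionary and the function name `word_to_word` are hypothetical; the point is only that each token is looked up independently, so context, ambiguity and word order are ignored.

```python
# Hypothetical toy dictionary (German -> English), for illustration only.
TOY_DICTIONARY = {
    "der": "the",
    "konvoi": "convoy",
    "fährt": "drives",
    "nach": "to",
    "kabul": "Kabul",
}


def word_to_word(sentence: str, dictionary: dict[str, str]) -> str:
    """Translate each whitespace-separated token independently.

    Tokens missing from the dictionary are passed through unchanged;
    there is no reordering and no use of surrounding context.
    """
    return " ".join(dictionary.get(token.lower(), token) for token in sentence.split())


print(word_to_word("Der Konvoi fährt nach Kabul", TOY_DICTIONARY))  # the convoy drives to Kabul
print(word_to_word("Morgen fährt der Konvoi", TOY_DICTIONARY))      # Morgen drives the convoy
```

The second call shows the typical weaknesses: the unknown word "Morgen" is copied verbatim, and the German verb-second word order is carried over unchanged into the English output.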

The objective of SMT is to find the most probable translation of a given sentence. SMT uses machine learning techniques, which means that the system learns how to translate source-language texts into target-language texts from training data by applying learning algorithms. The focus is not on generating a perfect word-to-word translation but on transferring the meaning of the source-language text into the target language.
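In the standard formulation from the SMT literature (not spelled out in this text), "finding the most probable translation" can be written as follows, where $s$ is the source sentence and $t$ ranges over candidate target sentences; the symbols are our own notation. Applying Bayes' rule and dropping $P(s)$, which does not depend on $t$, splits the objective into a translation-model term $P(s \mid t)$ and a target-language-model term $P(t)$:

```latex
\hat{t} \;=\; \arg\max_{t} P(t \mid s)
        \;=\; \arg\max_{t} \frac{P(s \mid t)\,P(t)}{P(s)}
        \;=\; \arg\max_{t} P(s \mid t)\,P(t)
```

Many phrase-based systems generalise this product to a weighted log-linear combination of several feature scores, so the exact objective of a particular system may differ.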

Figure 3. Excerpt of the Dari-German Corpus

To build an SMT system, the system has to be trained by applying machine learning techniques. The training data are parallel corpora, i.e., large collections of texts in the source and the target language that are translations of each other (see Figure 3 for an example). During the training stage, the system statistically analyses the training corpus in order to learn a so-called translation model; different learning algorithms may be applied for this. The model assigns probabilities to co-occurrences of textual segments (e.g., phrases) in the source and the target language. The resulting model thus contains source-language phrases together with their possible translations in the target language and a probability for each specific phrase pair. Figure 4 shows an excerpt of a translation model that has been generated by our ISAF-MT SMT system. The purpose of the translation model is to produce a translation that is reasonable in terms of content.

In addition to the translation model, a language model is trained on a monolingual text corpus in the target language. The language model contains n-gram probabilities for word sequences of the target language. Based on the language model, the system is designed to prefer target-language expressions that sound natural. After training, the SMT system is able to translate new input text: it generates the translation with the highest probability according to the trained models.
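To make the interplay of the two models concrete, here is a toy sketch. Everything in it is hypothetical: the German-English phrase pairs, the target corpus, the function names and the smoothing parameter `alpha` are illustrative assumptions, not data or code from ISAF-MT. Phrase probabilities are estimated by relative frequency, the language model is an add-alpha smoothed bigram model, and "decoding" simply enumerates every combination of phrase translations for an already segmented source sentence (no reordering) and returns the one with the highest combined log-probability. Real decoders search much larger spaces, typically with beam search.

```python
import itertools
import math
from collections import Counter, defaultdict

# Hypothetical phrase pairs, as if extracted from a word-aligned parallel
# corpus (German source -> English target). Not real ISAF-MT data.
PHRASE_PAIRS = [
    ("das haus", "the house"), ("das haus", "the house"), ("das haus", "the home"),
    ("ist klein", "is little"), ("ist klein", "is little"), ("ist klein", "is small"),
]

# Hypothetical monolingual target-language corpus for the language model.
TARGET_CORPUS = [
    "the house is small",
    "the garden is small",
    "the house is old",
    "the child is little",
]


def train_phrase_table(pairs):
    """Translation model: p(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return dict(table)  # plain dict: unknown source phrases raise KeyError


def train_bigram_lm(sentences, alpha=0.1):
    """Language model: add-alpha smoothed bigram model; returns a log-prob function."""
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words)
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    v = len(vocab)

    def log_prob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(a, b)] + alpha) / (history_counts[a] + alpha * v))
            for a, b in zip(words[:-1], words[1:])
        )

    return log_prob


def translate(source_phrases, table, lm_log_prob):
    """Exhaustive monotone search over pre-segmented source phrases.

    Score = sum of log phrase probabilities + language-model log-probability
    of the whole target sentence (both weighted equally).
    """
    options = [list(table[phrase].items()) for phrase in source_phrases]
    best, best_score = None, -math.inf
    for choice in itertools.product(*options):
        target = " ".join(tgt for tgt, _ in choice)
        score = sum(math.log(p) for _, p in choice) + lm_log_prob(target)
        if score > best_score:
            best, best_score = target, score
    return best


table = train_phrase_table(PHRASE_PAIRS)
lm = train_bigram_lm(TARGET_CORPUS)
print(translate(["das haus", "ist klein"], table, lm))  # the house is small
```

With these toy counts the translation model alone prefers "is little" (probability 2/3 versus 1/3), but the language model rates "the house is small" high enough that it wins overall. The sketch scores with the direct probability p(target | source) and equal weights; the classic noisy-channel formulation uses p(source | target) instead, and phrase-based systems commonly combine both directions as weighted features.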
