22.01.2015 Views

Military Communications and Information Technology: A Trusted ...


Our project demonstrated the usefulness of SMT for military purposes, in terms of both fast system production and the applicability of the translation results. We showed that, once the SMT infrastructure (computer cluster, procedures for building corpora, etc.) is in place, a system for rough translation can be produced rapidly [4]. We also argue that the system outputs translations that can support military operations: the translations generated by our SMT system are good enough to capture the meaning of the input text, i.e., they are rough translations. This already holds for a system trained on only a few thousand lines, for example on the 4000 lines of high-quality translations in our training corpus. System performance improved as the amount of corpus data grew. In addition, integrating dictionaries significantly improved translation quality, especially when the system was trained on a small amount of data.
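One simple way to integrate a dictionary into an SMT training setup is to append its entries to the parallel corpus as additional short "sentence" pairs. The sketch below illustrates that idea only; the text does not state how the dictionaries were integrated in this project, so this technique, the function name `add_dictionary_entries`, and the placeholder tokens are all assumptions made for illustration.

```python
def add_dictionary_entries(corpus, dictionary):
    """Return a new list of (source, target) pairs: the corpus plus
    every non-empty dictionary entry that is not already present.

    Hypothetical helper: it treats each dictionary entry as an extra
    parallel "sentence" pair, which is an assumed integration method.
    """
    seen = set(corpus)
    augmented = list(corpus)
    for source_term, target_term in dictionary:
        pair = (source_term.strip(), target_term.strip())
        # Skip incomplete entries and pairs the corpus already contains.
        if pair[0] and pair[1] and pair not in seen:
            augmented.append(pair)
            seen.add(pair)
    return augmented


# Placeholder tokens stand in for real Dari and German text.
corpus = [
    ("src_w1 src_w2 src_w3", "tgt_w1 tgt_w2 tgt_w3"),
    ("src_w4 src_w5", "tgt_w4 tgt_w5"),
]
dictionary = [
    ("src_w1", "tgt_w1"),
    ("src_w6 ", " tgt_w6"),               # stray whitespace is stripped
    ("src_w7", ""),                       # incomplete entry, skipped
    ("src_w4 src_w5", "tgt_w4 tgt_w5"),   # already in the corpus, skipped
]

augmented = add_dictionary_entries(corpus, dictionary)
print(len(corpus), "corpus pairs ->", len(augmented), "training pairs")
```

With a small corpus, each added entry is a relatively large share of the training data, which fits the observation that dictionaries helped most when the system was trained on little data.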

In our project we showed that SMT systems that can support military intelligence can be produced rapidly. Training data for building a translation system can be obtained from the internet. Where data resources are insufficient, as is often the case for rarely spoken languages, the training corpus can be produced by human translators. In addition, available linguistic resources can be integrated, and, on the basis of linguistic expertise, system components can be modified to suit the specific language pair. Such a system could already support military intelligence. Because translation quality increases with corpus size, further corpus development would allow the system to be applied to tasks of higher complexity or difficulty. For a more detailed description of the fast production of SMT systems in the context of military missions, see [4].

The ISAF-MT SMT system outputs translations of varying quality, ranging from very bad to very good. Figure 5 shows an example translation of a newspaper article generated by a current version of our ISAF-MT system. Bad translations can result from words or phrases that were left untranslated from Dari to German, or that were translated into the wrong German words or phrases. Translations that are useful for military purposes may be sentences that are neither natural nor correct in the target language but that still convey the semantic content of the source sentences. There are also translations that capture the semantic content of the Dari sentences and are at the same time correct and natural German sentences. Overall, we argue that a translated document consisting of translations of varying quality (as in Figure 5) can convey the meaning of the source document. Thus, even a translation system trained on only a small number of sentences, like ours, can be useful in the context of military intelligence.

Although machine translation is an active research area, to our knowledge there are few systems for translating between Dari and German. We compared our SMT system to Google Translate [20]. We ran Google's Persian-to-German translation on a random test set extracted from our Dari-German corpus.
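Extracting a random test set from a parallel corpus can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the project: the function name `split_random_test_set`, the fixed seed, the test-set size, and the placeholder sentence pairs are all hypothetical.

```python
import random


def split_random_test_set(pairs, test_size, seed=0):
    """Split aligned (source, target) pairs into training and test sets.

    The test set is a uniform random sample of whole sentence pairs, so
    source and target stay aligned, and the sampled pairs are removed
    from the training set. All names and defaults are assumptions.
    """
    if not 0 < test_size < len(pairs):
        raise ValueError("test_size must be between 1 and len(pairs) - 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    test_indices = set(rng.sample(range(len(pairs)), test_size))
    train = [p for i, p in enumerate(pairs) if i not in test_indices]
    test = [p for i, p in enumerate(pairs) if i in test_indices]
    return train, test


# Placeholder pairs stand in for aligned Dari and German sentences.
pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(20)]
train, test = split_random_test_set(pairs, test_size=5)
print(len(train), "training pairs,", len(test), "test pairs")
```

Holding the sampled pairs out of the training data keeps the comparison from being inflated by sentences the system has already seen, and running both systems on the same held-out source sentences makes their outputs directly comparable.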
