22.01.2015 Views

Military Communications and Information Technology: A Trusted ...


Chapter 3: Information Technology for Interoperability and Decision...


Figure 4. Excerpt of a Translation Model generated by the ISAF-MT SMT System
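In phrase-based SMT, a translation model of the kind excerpted in Figure 4 can be pictured as a table that maps source phrases to candidate target phrases, each with a learned probability. The following is a minimal sketch of such a lookup, assuming a Moses-style `source ||| target ||| probability` line format with a single score. The English-German phrase pairs and probabilities are invented for illustration and are not taken from Figure 4 or from ISAF-MT, and `load_phrase_table` and `best_translation` are hypothetical helpers.

```python
# Minimal sketch of a phrase-table lookup in phrase-based SMT.
# Assumptions: Moses-style "source ||| target ||| probability" lines with a
# single score; the entries below are invented and not taken from ISAF-MT.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PHRASE_TABLE = """\
checkpoint ||| Kontrollpunkt ||| 0.7
checkpoint ||| Kontrollstelle ||| 0.2
checkpoint ||| Sicherungspunkt ||| 0.1
convoy ||| Konvoi ||| 0.8
convoy ||| Geleitzug ||| 0.2
"""

# Maps each source phrase to its list of (target phrase, probability) candidates.
Table = Dict[str, List[Tuple[str, float]]]


def load_phrase_table(text: str) -> Table:
    """Parse one phrase pair per line into {source: [(target, probability), ...]}."""
    table: Table = defaultdict(list)
    for line in text.strip().splitlines():
        source, target, prob = (field.strip() for field in line.split("|||"))
        table[source].append((target, float(prob)))
    return dict(table)


def best_translation(table: Table, source: str) -> Optional[str]:
    """Return the most probable target phrase, or None if the source phrase is unseen."""
    candidates = table.get(source)
    if not candidates:
        return None  # the model has never seen this phrase and cannot translate it
    return max(candidates, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    table = load_phrase_table(PHRASE_TABLE)
    print(best_translation(table, "checkpoint"))  # Kontrollpunkt
    print(best_translation(table, "helicopter"))  # None (not in the table)
```

A real SMT decoder does not simply pick the single most probable phrase: it combines several model scores, including a language model, to choose the best overall sentence translation. The `None` result for an unseen phrase does, however, show why the contents of the training data matter so much.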

Different factors can influence the performance of an SMT system, e.g., the size and quality of the training corpus (as described below), the integration of linguistic knowledge, or the choice of machine learning implementation. Training data are especially important for SMT performance. The more fully the training data meet the following requirements, the better the translation quality tends to be:

• Size of the training data: In general, the larger a training corpus is, the more expressions (in terms of vocabulary and word combinations) it will cover. Corpus size is therefore significant for training, and system performance generally improves as more training data are provided.

• Translation quality of the training data: The translation quality of the training corpus directly affects the correctness of the translation model. The more correct translations the training corpus contains, the better the system can learn to produce correct translations.

• Domain adaptation of the training data: Language is highly ambiguous, i.e., a word, phrase, or sentence can have different meanings and may therefore also have different translations. The meaning of a text segment may depend on its contextual domain. A system trained for a specific domain will choose translations appropriate for that domain. The more domain-specific a training corpus is, the better the translation performance for texts from this domain will be.
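The size and domain effects described above can be made concrete by measuring vocabulary coverage, i.e., the fraction of words in a text to be translated that also occur somewhere in the training data. The sketch below is only a rough proxy under stated assumptions: whitespace tokenization, lowercasing, word-level coverage only (not translation quality), and toy sentences invented for illustration. `coverage` is a hypothetical helper, not part of any SMT toolkit.

```python
# Minimal sketch: vocabulary coverage as a rough proxy for how well a training
# corpus "covers" the texts a system will later translate.
# Assumptions: whitespace tokenization, lowercasing, invented toy sentences.
from typing import List


def coverage(train_sentences: List[str], test_sentences: List[str]) -> float:
    """Return the fraction of test tokens whose word form occurs in the training data."""
    vocab = {word for sentence in train_sentences for word in sentence.lower().split()}
    tokens = [word for sentence in test_sentences for word in sentence.lower().split()]
    if not tokens:
        return 0.0
    return sum(word in vocab for word in tokens) / len(tokens)


# Invented toy corpora: one general-domain, one domain-specific.
GENERAL = [
    "the weather is fine today",
    "he left the house early",
    "the market opens at nine",
]
DOMAIN = [
    "the convoy left the checkpoint",
    "the patrol reached the checkpoint",
    "the patrol returned at nine",
]
TEST = ["the patrol left the checkpoint"]

if __name__ == "__main__":
    # Grow each corpus one sentence at a time and compare coverage of TEST.
    for n in range(1, len(GENERAL) + 1):
        print(n, coverage(GENERAL[:n], TEST), coverage(DOMAIN[:n], TEST))
    # Prints:
    # 1 0.4 0.8
    # 2 0.6 1.0
    # 3 0.6 1.0
```

With these toy data, coverage rises as general-domain sentences are added, but a domain-specific corpus of the same size covers the test sentence better. Coverage says nothing about whether the stored translations are correct, which is what the requirement on translation quality addresses.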

Linguistic expertise can be applied to adapt SMT technology to the specific language pair of interest, thus improving translation quality. In addition, different SMT approaches and machine learning algorithms can be applied to improve a system in terms of translation quality as well as efficiency. As SMT is an active research field, better and faster approaches are constantly being developed.

The advantages that SMT holds over other machine translation approaches are largely the same as those that statistical NLP technology in general holds over rule-based approaches, i.e.:
