Okapi Use Case - Simple Machine Translation

From MultilingualWeb-LT EC Project Wiki

1 Description

XML and HTML5 documents are translated using a machine translation system, such as Microsoft Translator.

The translatable content is extracted from the documents according to their ITS properties and sent to the translation server. The translated content is then merged back into the original XML or HTML5 format.
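
The extract, translate, and merge cycle described above can be sketched in a few lines. This is a minimal illustration, not the Okapi implementation: it uses Python's standard `xml.etree.ElementTree`, honors only the local `its:translate` attribute (with its inheritance to child elements), and replaces the MT service with a hypothetical `fake_translate` function that upper-cases the text. It also translates each text fragment separately, whereas a real filter would build segments with in-line codes.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = f"{{{ITS_NS}}}translate"

# Keep the conventional "its" prefix when the document is serialized again.
ET.register_namespace("its", ITS_NS)


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for the MT call; a real setup would query a service."""
    return text.upper()


def translate_tree(elem, translate=True, mt=fake_translate):
    """Apply `mt` in place to every text fragment whose Translate value is 'yes'."""
    # A local its:translate attribute overrides the value inherited from the parent.
    flag = elem.get(TRANSLATE_ATTR)
    if flag is not None:
        translate = (flag == "yes")

    if translate and elem.text and elem.text.strip():
        elem.text = mt(elem.text)

    for child in elem:
        translate_tree(child, translate, mt)
        # The tail text after a child belongs to the current element's content.
        if translate and child.tail and child.tail.strip():
            child.tail = mt(child.tail)


def translate_document(xml_source: str, mt=fake_translate) -> str:
    """Parse, translate the non-protected content, and merge back into XML."""
    root = ET.fromstring(xml_source)
    translate_tree(root, True, mt)  # ITS default for elements is translate="yes"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<doc xmlns:its="http://www.w3.org/2005/11/its">'
        '<p>Call <code its:translate="no">printf</code> to print.</p>'
        "</doc>"
    )
    print(translate_document(source))
```

Because the text is modified in place, the merge step here is trivial. A tool that sends extracted content to a remote server has to keep a mapping between each extracted unit and its position in the original document.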

2 Data categories

The following data categories are directly used:

  • Translate - The non-translatable content is protected.
  • Locale Filter - Only the parts in the scope of the locale filter are extracted; all other parts are treated as 'do not translate' content.
  • Element Within Text - The information is used to decide which elements are extracted as in-line codes and which as sub-flows.
  • Preserve Space - The information is mapped to the preserveSpace field in the extracted text unit.
  • Domain - The domain values are placed into a property that can be used to select an MT engine.
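
How the Preserve Space and Domain information might travel with an extracted text unit can be sketched as follows. Everything here is illustrative and assumed: the `TextUnit` class, the `domain` property name, the engine identifiers, and the helper functions are hypothetical and do not reflect the Okapi API. The Preserve Space value is assumed to arrive as an `xml:space`-style string ("preserve" or "default").

```python
from dataclasses import dataclass, field


@dataclass
class TextUnit:
    """Hypothetical extracted unit: text plus the metadata derived from ITS."""
    text: str
    preserve_space: bool = False
    properties: dict = field(default_factory=dict)


# Assumed mapping from domain values to MT engine identifiers (made-up names).
ENGINES_BY_DOMAIN = {
    "automotive": "engine-automotive",
    "medical": "engine-medical",
}
DEFAULT_ENGINE = "engine-general"


def make_text_unit(text, xml_space="default", domains=()):
    """Map Preserve Space to a boolean field and Domain to a property."""
    unit = TextUnit(text=text, preserve_space=(xml_space == "preserve"))
    if domains:
        unit.properties["domain"] = ", ".join(domains)
    return unit


def select_engine(unit):
    """Pick the first engine matching one of the unit's domain values."""
    for domain in unit.properties.get("domain", "").split(","):
        engine = ENGINES_BY_DOMAIN.get(domain.strip().lower())
        if engine:
            return engine
    return DEFAULT_ENGINE


def prepare_for_mt(unit):
    """Collapse white space before sending, unless it must be preserved."""
    if unit.preserve_space:
        return unit.text
    return " ".join(unit.text.split())


if __name__ == "__main__":
    unit = make_text_unit("  Replace the   brake pads. ", domains=["Automotive"])
    print(select_engine(unit), "|", prepare_for_mt(unit))
```

Keeping the domain in a generic property rather than a dedicated field leaves the choice of engine to whichever step talks to the MT service.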

3 Benefits

  • The ITS markup provides the key information that drives the extraction in both XML and HTML5.
  • Information such as white-space preservation can also be passed on with the extracted content to ensure better output.