Online MT System Internationalization Project Information Metadata
Contents
- 1 Summary
- 2 Use Case Description
- 3 Use Case Implementation
- 4 Use Case Demonstration
- 5 Interoperability Behaviour
- 5.1 Step 1: Source HTML5
- 5.2 Step 2: The RTTS processes the content of the source file and sends it to the selected MT System
- 5.3 Step 3A: The MT System returns the results of the translation to the RTTS
- 5.4 Step 3B: The MT System sends the information of the localization note to the postedition tool
- 5.5 Step 4: The RTTS receives the translated file from the MT System and presents the result to the user
- 5.6 Step 5: The postedition tool improves the translations and feeds the MT System
1 Summary
This implementation demonstrates how an Online MT System can automatically translate HTML5 documents from an ITS-conformant Web CMS.
In this use case, ITS metadata is used to solve the following problems:
- Informing the RTTS precisely which sentences or sentence fragments should or should not be translated, and what the source language is.
  - Benefit: Lets the user automatically block machine translation of parts of the Web page that do not need to be translated, or that must not be machine translated because of their difficulty or provenance, e.g. a technical essay or constitutional laws.
  - Benefit: Automatically prevents machine translation of parts of the Web page that are in various languages and must remain that way, e.g. a language selector.
  - Benefit: Automatically tells the RTTS the source language of the text and whether it applies to the whole text or only to part of it.
  - Uses the translate data category and the language information data category.
- Informing the RTTS, at paragraph, sentence or word level, of the appropriate training corpora or glossary (depending on the MT System) that the MT Systems should use for the translation.
  - Benefit: Improves the accuracy and quality of the machine translation.
  - Uses the domain data category.
- Providing the editor with the information needed to review the text, to help with disambiguation and to improve the quality and accuracy of the revision.
  - Benefit: Helps to improve the accuracy and quality of the reviewed machine translation after the postedition process.
  - Uses the localization note data category.
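As an illustration of the domain item in the list above, the following minimal Python sketch shows how a consumer could turn a domain mapping string such as 'Laws & Rights' LAW into the identifier of a corpus or glossary. The function names parse_domain_mapping and mt_domain are hypothetical, and lowercasing the source domain and falling back to the unmapped value are assumptions of this sketch, not behaviour taken from the implementation.

```python
import re

# A value is either quoted (it may then contain spaces) or a bare word.
_TOKEN = r"""'[^']*'|"[^"]*"|[^\s,'"]+"""
# A mapping pair is "<source domain> <consumer value>"; pairs are comma separated.
_PAIR = re.compile(rf"({_TOKEN})\s+({_TOKEN})")


def parse_domain_mapping(mapping: str) -> dict:
    """Parse a domain mapping string into {source domain: consumer value}."""
    table = {}
    for source, target in _PAIR.findall(mapping):
        # Assumption: source domains are compared case-insensitively.
        table[source.strip("'\"").lower()] = target.strip("'\"")
    return table


def mt_domain(domain: str, mapping: str) -> str:
    """Return the consumer value for a domain, or the domain itself if unmapped."""
    return parse_domain_mapping(mapping).get(domain.strip().lower(), domain.strip())
```

With the mapping used later in this page, mt_domain("Laws & Rights", "'Laws & Rights' LAW") yields the value the MT System would use to pick its resources.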
2 Use Case Description
This use case demonstration illustrates how ITS allows an HTML5 Content Author to communicate instructions on language, domain and translation, and to convey information about the translation to a content editor, via a Real Time Translation System connected to different MT Service Providers.
This scenario may involve the following product classes: Content Authoring Tool, Postedition Tool, Content Management System (CMS), MT Systems and Web Browsers.
The business processes involved are: TBD
3 Use Case Implementation
The implementation of this use case involves the following components:
- Linguaserve’s RTTS (Real Time Translation System) ATLAS PW1.
- DCU’s MT System MaTrEx (Statistical)
- LucySoftware’s MT System (Rule-based)
- Postedition tool.
4 Use Case Demonstration
- Status: Specification under development, implementation under development
- Demonstration: TBD.
5 Interoperability Behaviour
Design assumptions:
- By clicking on the language selector, the user sends a request to the RTTS to translate the input file.
- By default, when no translate metadata is present, the MT Systems translate the content of the elements and do not translate the attribute values.
- Some of the metadata of the input will be deleted from the output after the process.
- The input file example is based on the HTML5 files of the Test Suite.
5.1 Step 1: Source HTML5
This HTML source file:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset=utf-8>
    <title>ITS 2.0 – Laws & Rights</title>
    <link href="Rules.xml" rel="its-rules"/>
  </head>
  <body>
    <section>
      <span id="languageSelector">
        <ul>
          <li><a href="/en/index.html">English</a></li>
          <li><a href="/es/index.html">Español</a></li>
        </ul>
      </span>
    </section>
    <section its-domain="Laws & Rights" its-domain-mapping="'Laws & Rights' LAW">
      <h1 lang="de" its-translate="no">des Gesetz</h1>
      <span> This text is targeted for the common people </span>
      <p> Law is the set of enforced rules under which a society is governed. Law is one of the most basic social institutions-and one of the most necessary.<br/> No society could exist if all people did just as they pleased without regard for the rights of others. </p>
      <p> The Law should not be against human rights that are commonly understood as "inalienable fundamental rights" to which a person is inherently entitled simply because she or he is a human being. </p>
    </section>
  </body>
</html>
where the Rules.xml is:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
  <its:translateRule selector="//h:section/h:span|//h:section/h:span/h:ul/h:li/h:a" translate="no"/>
  <its:locNoteRule locNoteType="alert" selector="//h:section/h:p" locNotePointer="../h:span"/>
</its:rules>
The user clicks on the Español link, and a request for translation into Spanish is sent to the RTTS.
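As a rough illustration of how a consumer might read such a rules file, the sketch below uses Python's standard xml.etree.ElementTree to extract the translate rules and split a union selector into its alternatives. The function name read_translate_rules and the inline RULES_XML string are assumptions made for this example; splitting on "|" is a simplification that would not be safe for selectors containing that character inside a string literal.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Inline copy of the translate rule, used only to keep the sketch self-contained.
RULES_XML = """\
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule
      selector="//h:section/h:span|//h:section/h:span/h:ul/h:li/h:a"
      translate="no"/>
</its:rules>"""


def read_translate_rules(xml_text):
    """Return (selector, translate) pairs, one per alternative of a union selector."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rule in root.findall(f"{{{ITS_NS}}}translateRule"):
        for selector in rule.get("selector").split("|"):
            pairs.append((selector.strip(), rule.get("translate")))
    return pairs
```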
5.2 Step 2: The RTTS processes the content of the source file and sends it to the selected MT System
The RTTS receives the translation request, downloads the original file, and parses and processes its source code. It then downloads and reads the XML rules file, captures the rules and applies them to the HTML document, overriding where necessary. After this process the input file looks like this:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset=utf-8>
    <title>ITS 2.0 – Laws & Rights</title>
    <link href="Rules.xml" rel="its-rules"/>
  </head>
  <body>
    <section>
      <span id="languageSelector" its-translate="no">
        <ul>
          <li><a href="/en/index.html" its-translate="no">English</a></li>
          <li><a href="/es/index.html" its-translate="no">Español</a></li>
        </ul>
      </span>
    </section>
    <section its-domain="Laws & Rights" its-domain-mapping="'Laws & Rights' LAW">
      <h1 lang="de" its-translate="no">des Gesetz</h1>
      <span its-translate="no"> This text is targeted for the common people </span>
      <p its-loc-note="This text is targeted for the common people" its-loc-note-type="alert"> Law is the set of enforced rules under which a society is governed. Law is one of the most basic social institutions-and one of the most necessary.<br/> No society could exist if all people did just as they pleased without regard for the rights of others. </p>
      <p its-loc-note="This text is targeted for the common people" its-loc-note-type="alert"> The Law should not be against human rights that are commonly understood as "inalienable fundamental rights" to which a person is inherently entitled simply because she or he is a human being. </p>
    </section>
  </body>
</html>
This file is sent to the MT System.
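The rule application described in this step can be sketched as follows. This is a minimal illustration, not the RTTS code: it assumes the page is available as well-formed XHTML so that xml.etree.ElementTree can parse it, it hard-codes the two rules instead of reading them from Rules.xml, and it assumes that each paragraph takes its note from the span in the same section, which is what the processed file above shows. The names select and apply_rules are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Reduced, well-formed stand-in for the source page (assumption of this sketch).
DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <body>
    <section>
      <span> This text is targeted for the common people </span>
      <p>Law is the set of enforced rules.</p>
    </section>
  </body>
</html>"""


def select(root, selector):
    """Evaluate a union of absolute paths with ElementTree's limited XPath."""
    found = []
    for path in selector.split("|"):
        found.extend(root.findall("." + path.strip(), NS))
    return found


def apply_rules(root):
    # Translate rule: mark the selected nodes, keeping any local value already set.
    for node in select(root, "//h:section/h:span|//h:section/h:span/h:ul/h:li/h:a"):
        if node.get("its-translate") is None:
            node.set("its-translate", "no")
    # Localization note rule: copy the span text onto the sibling paragraphs.
    for section in root.findall(".//h:section", NS):
        span = section.find("h:span", NS)
        if span is None:
            continue
        note = " ".join("".join(span.itertext()).split())
        for p in section.findall("h:p", NS):
            p.set("its-loc-note", note)
            p.set("its-loc-note-type", "alert")
    return root
```

Checking for an existing its-translate value before setting one is a design choice of the sketch, so that markup written directly on an element is not overwritten by a global rule.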
5.3 Step 3A: The MT System returns the results of the translation to the RTTS
The MT System parses the file, translates it and produces the following output:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset=utf-8>
    <title>ITS 2.0 – Leyes y Derechos</title>
    <link href="Rules.xml" rel="its-rules"/>
  </head>
  <body>
    <section>
      <span id="languageSelector" its-translate="no">
        <ul>
          <li><a href="/en/index.html" its-translate="no">English</a></li>
          <li><a href="/es/index.html" its-translate="no">Español</a></li>
        </ul>
      </span>
    </section>
    <section its-domain="Laws & Rights" its-domain-mapping="'Laws & Rights' LAW">
      <h1 lang="de" its-translate="no">des Gesetz</h1>
      <span its-translate="no"> This text is targeted for the common people </span>
      <p its-loc-note="This text is targeted for the common people" its-loc-note-type="alert"> El derecho es el conjunto de reglas impuestas bajo las cuales se rige una sociedad. La ley es una de las instituciones sociales más básicos, y una de las más necesarias.<br/> Ninguna sociedad podría existir si todas las personas que hicieron precisamente lo que quisieran sin tener en cuenta los derechos de los demás. </p>
      <p its-loc-note="This text is targeted for the common people" its-loc-note-type="alert"> La ley no debe estar en contra de los derechos humanos que se entiende comúnmente como "derechos inalienables fundamentales" a la que una persona está intrínsecamente titulado simplemente porque él o ella es un ser humano. </p>
    </section>
  </body>
</html>
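How an MT System might decide which text to translate can be sketched with Python's standard html.parser. The sketch below is only an illustration under stated assumptions: the class SegmentCollector and the function translatable_segments are hypothetical names, the translate flag is assumed to be inherited by child elements, and attribute values are never collected, in line with the default described in the design assumptions.

```python
from html.parser import HTMLParser

# Elements without an end tag; they never open a new translate scope.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SegmentCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = [True]  # translate flag per open element; default is "yes"
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        flag = dict(attrs).get("its-translate")
        # A local flag wins; otherwise the parent's flag is inherited.
        self.stack.append(self.stack[-1] if flag is None else flag == "yes")

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack[-1]:
            self.segments.append(text)


def translatable_segments(html):
    """Return the text runs that would be handed to the translation engine."""
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    return collector.segments
```

The sketch assumes that non-void elements are properly closed; real pages with omitted end tags would need a more tolerant parser.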
5.4 Step 3B: The MT System sends the information of the localization note to the postedition tool
The MT System attaches the localization note to the translatable content and sends it to the postedition tool, so that the editor can use it as an aid when reviewing the content.
TBD.
5.5 Step 4: The RTTS receives the translated file from the MT System and presents the result to the user
Once the RTTS receives the output from the MT System, it modifies some tags and removes others that are no longer needed. The final result is:
<!DOCTYPE html>
<html lang="es">
  <head>
    <meta charset=utf-8>
    <title>ITS 2.0 – Leyes y Derechos</title>
  </head>
  <body>
    <section>
      <span id="languageSelector">
        <ul>
          <li><a href="/en/index.html">English</a></li>
          <li><a href="/es/index.html">Español</a></li>
        </ul>
      </span>
    </section>
    <section>
      <h1 lang="de">des Gesetz</h1>
      <p> El derecho es el conjunto de reglas impuestas bajo las cuales se rige una sociedad. La ley es una de las instituciones sociales más básicos, y una de las más necesarias.<br/> Ninguna sociedad podría existir si todas las personas que hicieron precisamente lo que quisieran sin tener en cuenta los derechos de los demás. </p>
      <p> La ley no debe estar en contra de los derechos humanos que se entiende comúnmente como "derechos inalienables fundamentales" a la que una persona está intrínsecamente titulado simplemente porque él o ella es un ser humano. </p>
    </section>
  </body>
</html>
In this example the lang attribute of the html element is updated to the new language, and all the ITS attributes, the link element and the second span are removed because they would only produce noise on the user side.
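Part of this clean-up can be sketched in Python as shown below. The sketch assumes a well-formed XHTML input without namespaces so that xml.etree.ElementTree can parse it, and the name clean_for_delivery is hypothetical. It covers the language update, the removal of the its-rules link and the removal of the its- attributes; it deliberately leaves out the removal of the note span, because the criterion used to identify that span is not stated here.

```python
import xml.etree.ElementTree as ET

# Reduced, well-formed stand-in for the MT output (assumption of this sketch).
DOC = (
    '<html lang="en"><head><title>ITS 2.0</title>'
    '<link href="Rules.xml" rel="its-rules"/></head>'
    '<body><section its-domain="Laws">'
    '<h1 lang="de" its-translate="no">des Gesetz</h1>'
    '<p its-loc-note="n" its-loc-note-type="alert">La ley</p>'
    '</section></body></html>'
)


def clean_for_delivery(root, target_lang):
    """Prepare the translated tree for the end user."""
    # The page is now in the target language.
    root.set("lang", target_lang)
    # Drop the reference to the external rules file.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "link" and child.get("rel") == "its-rules":
                parent.remove(child)
    # Drop every ITS attribute; other attributes such as lang="de" are kept.
    for node in root.iter():
        for name in [n for n in node.attrib if n.startswith("its-")]:
            del node.attrib[name]
    return root
```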
5.6 Step 5: The postedition tool improves the translations and feeds the MT System
Once the translation has been improved, the next request for the same file returns the following result:
<!DOCTYPE html>
<html lang="es">
  <head>
    <meta charset=utf-8>
    <title>ITS 2.0 – Leyes y Derechos</title>
  </head>
  <body>
    <section>
      <span id="languageSelector">
        <ul>
          <li><a href="/en/index.html">English</a></li>
          <li><a href="/es/index.html">Español</a></li>
        </ul>
      </span>
    </section>
    <section>
      <h1 lang="de">des Gesetz</h1>
      <p> El derecho es el conjunto de reglas impuestas bajo las cuales se rige una sociedad. La ley es una de las instituciones sociales más básicas, y una de las más necesarias.<br/> Ninguna sociedad podría existir si todas las personas que hicieran lo que les diera la gana sin tener en cuenta los derechos de los demás. </p>
      <p> La ley no debería estar en contra de los derechos humanos que se entienden comúnmente como "derechos inalienables fundamentales" a los que una persona tiene derecho inherentemente por el simple hecho de ser un ser humano. </p>
    </section>
  </body>
</html>
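How the postedited text reaches the MT System is not specified on this page. One possible reading, shown purely as an assumption, is a segment-level store that is consulted before the engine is called, so that a corrected segment is returned on the next request. The class PostEditMemory and its methods are hypothetical and are not part of any of the components listed above.

```python
class PostEditMemory:
    """Hypothetical store of postedited segments, keyed by source text and language."""

    def __init__(self):
        self._store = {}

    def record(self, source, target_lang, corrected):
        # Called by the postedition tool once the editor approves a segment.
        self._store[(source.strip(), target_lang)] = corrected

    def translate(self, source, target_lang, mt):
        # Prefer the human-corrected version; otherwise fall back to the engine.
        key = (source.strip(), target_lang)
        if key in self._store:
            return self._store[key]
        return mt(source, target_lang)
```

Here mt stands for any callable that takes a source segment and a target language and returns the raw machine translation.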