XLIFF MT Roundtrip Value Chain workflow

From MultilingualWeb-LT EC Project Wiki

1 Summary

This implementation demonstrates how a machine translation workflow can be established between multiple service providers using XLIFF, including the automatic identification and inclusion of terminology for statistical machine translation (SMT).

In this use case, ITS meta-data is used to solve the following problems:

  • The workflow management of localisation jobs across multiple service providers
    • Benefit: Lowers workflow management costs by automating the roundtrip mapping of ITS attributes from the source document to XLIFF, for passing between service providers, and back to the target document format. This reduces file handling costs and supports consistent and accurate handling of meta-data between source and target documents and the systems involved in a localisation process.
    • Uses the disambiguation and terminology data categories
  • Identifying term candidates in a way that allows suitable translations to be found by an SMT service.
    • Benefit: Improves the quality of statistical machine translation at low cost by automating the identification of terminology and its translations.
    • Uses the disambiguation and terminology data categories
  • Correlating provenance records from different service providers to provide an end-to-end view of the execution of a workflow across a value chain
    • Benefit: Allows the automated collection of provenance records in a form that can be easily interlinked and queried across service providers.
    • Uses the its:tool annotation and the stand-off provenance data category

2 Use Case Description

This use case demonstrates how ITS data categories can be communicated between different service providers alongside localisation job content using XLIFF.

This scenario involves the following product classes: Content Management Systems (CMS), Translation Management Systems, Machine Translation services and Text Analytics services.

The business processes involved are:

3 Use Case Implementation

This scenario extends the one presented for the Simple Segment Machine Translation use case, which uses CMS-LION and the segment-level, ITS-aware Matrex SMT Web Service. The direct interaction between these components in that use case is replaced here by a more general-purpose XLIFF workflow system from the University of Limerick called SOLAS.

4 Use Case Demonstration

  • Status: Specification under development, implementation under development
  • Demonstration: TBD.

5 Interoperability Behaviour

The following figure outlines the workflow behaviour of this demonstrator using the Business Process Modelling Notation (BPMN).

[Figure: BPMN diagram of the demonstrator workflow; image not available]

5.1 Step 1: Configure Workflow

5.2 Step 2: Submit Source HTML5

This HTML source file has been simplified and refactored from part of the HTML source of this Wikipedia page.

 <!DOCTYPE html>
 <html lang="en">
  <head>
   <meta charset="utf-8"/>
   <meta name="description" content="latin words and phrases"/>
   <link href="CMS-SMT-rountrip-sourceRules.xml" rel="its-rules">
   <title translate="no">CMS-SMT roundtrip test</title>
  </head>
  <body>
   <p>
   “<strong class="lang-la" translate="no">Felix, qui potuit rerum cognoscere causas</strong>” is verse 490 of the 
   "Georgics" (29 BC), by the Latin poet Virgil. 
   It is literally translated as: <span class="classical-quote">“Fortunate who was able of things to know the causes”</span>. 
   </p> 
  </body>
 </html> 

where CMS-SMT-rountrip-sourceRules.xml is:

 <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
   <its:translateRule selector="//*/@title" translate="yes"/>
   <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='description']/@content" 
     domainMapping="'latin words and phrases' wikipedia-literature"/>
   <its:domainRule selector="//*[@class='classical-quote']" domainPointer="/html/head/meta[@name='description']/@content" 
     domainMapping="'latin words and phrases' literature-quotations"/>
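   <!-- Intended to read the language code from a 'lang-xx' class value. ITS 2.0 defines
        langPointer as a relative selector to a node, so this string expression may need adapting. -->
   <its:langRule selector="//*[starts-with(@class,'lang-')]" langPointer="substring-after(@class,'lang-')"/>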
 </its:rules>
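
As a rough illustration of how a consumer might apply the domainMapping values above, the following Python sketch splits a mapping string into pairs and maps the value read through domainPointer onto the tool-specific domain. The function names parse_domain_mapping and resolve_domains are hypothetical, and the matching is deliberately simplified (exact string comparison, no commas inside quoted values); it is not the behaviour of any particular ITS processor.

```python
import shlex


def parse_domain_mapping(mapping: str) -> dict[str, str]:
    """Parse a domainMapping string into {source value: target domain}.

    Simplifying assumption: pairs are separated by commas and no quoted
    value contains a comma itself.
    """
    table = {}
    for pair in mapping.split(","):
        # shlex honours the quotes around values that contain spaces.
        parts = shlex.split(pair)
        if len(parts) == 2:
            source, target = parts
            table[source] = target
    return table


def resolve_domains(pointer_value: str, mapping: str) -> list[str]:
    """Map each comma-separated value found via domainPointer.

    Values without a mapping entry are passed through unchanged.
    """
    table = parse_domain_mapping(mapping)
    domains = []
    for raw in pointer_value.split(","):
        value = raw.strip()
        if value:
            domains.append(table.get(value, value))
    return domains


# Values taken from the meta description and the two domainRule elements.
body_domain = resolve_domains(
    "latin words and phrases",
    "'latin words and phrases' wikipedia-literature",
)
quote_domain = resolve_domains(
    "latin words and phrases",
    "'latin words and phrases' literature-quotations",
)
```

With the rules above, body_domain would hold the domain for the page body and quote_domain the more specific domain for elements of class classical-quote.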


5.3 Step 2: Run Through Disambiguation Service

 <!DOCTYPE html>
 <html lang="en">
  <head>
   <meta charset="utf-8"/>
   <meta name="description" content="latin words and phrases"/>
   <link href="CMS-SMT-rountrip-sourceRules.xml" rel="its-rules">
   <title translate="no">CMS-SMT roundtrip test</title>
  </head>
  <body>
   <p>
   “<strong class="lang-la" translate="no">Felix, qui potuit rerum cognoscere causas</strong>” is verse 490 of the 
   "<span its-disambig-class-ref="http://dbpedia.org/class/yago/Greco-RomanAgriculturalWritings"
          its-disambig-ident-ref="http://dbpedia.org/page/Georgics"
          its-disambig-granularity="its:entity">Georgics</span>" (29 BC), by the Latin poet 
   <span its-disambig-class-ref="http://dbpedia.org/ontology/Agent"
         its-disambig-ident-ref="http://dbpedia.org/page/Virgil"
         its-disambig-granularity="its:entity">Virgil</span>. 
   It is literally translated as: <span class="classical-quote">“Fortunate who was able of things to know the causes”</span>. 
   </p> 
  </body>
 </html> 
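
A downstream component needs to read these annotations back out of the markup. The sketch below shows one way this could be done with Python's standard html.parser module. The class name DisambigCollector is hypothetical, and the code assumes that annotated elements are not nested inside one another and do not contain a child element with the same tag name.

```python
from html.parser import HTMLParser


class DisambigCollector(HTMLParser):
    """Collect the text and its-disambig-* attributes of annotated elements."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # one dict per finished annotation
        self._open = None      # (tag, record) while inside an annotated element

    def handle_starttag(self, tag, attrs):
        found = {k: v for k, v in attrs if k.startswith("its-disambig-")}
        if found and self._open is None:
            self._open = (tag, {"text": "", **found})

    def handle_data(self, data):
        # Accumulate text only while an annotated element is open.
        if self._open is not None:
            self._open[1]["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.annotations.append(self._open[1])
            self._open = None


sample = (
    'by the Latin poet <span its-disambig-class-ref="http://dbpedia.org/ontology/Agent" '
    'its-disambig-ident-ref="http://dbpedia.org/page/Virgil" '
    'its-disambig-granularity="its:entity">Virgil</span>.'
)
collector = DisambigCollector()
collector.feed(sample)
collector.close()
entities = collector.annotations
```

Each entry in entities pairs the annotated text with its class and identity references, which is the information an SMT service would need for terminology lookup.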

5.4 Step 3: Segment Source in CMS-LION

The source paragraph is segmented within CMS-LION into two segments. The two segments are then passed separately to the SMT service in each of the following two steps.
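
The segmentation rules used by CMS-LION are not described here. Purely as an illustration of how the example paragraph falls into two segments, the following Python sketch uses a naive rule: break after sentence-final punctuation that is followed by whitespace and a capital letter. The function name split_segments and the rule itself are assumptions, not the CMS-LION implementation.

```python
import re


def split_segments(paragraph: str) -> list[str]:
    """Split paragraph content into sentence-like segments, keeping inline markup."""
    # Collapse line breaks and runs of spaces so the rule sees single spaces.
    text = re.sub(r"\s+", " ", paragraph).strip()
    # Break at a space preceded by ., ! or ? and followed by a capital letter.
    return re.split(r"(?<=[.!?]) (?=[A-Z])", text)


paragraph = """
“<strong class="lang-la" translate="no">Felix, qui potuit rerum cognoscere causas</strong>” is verse 490 of the
"Georgics" (29 BC), by the Latin poet Virgil.
It is literally translated as: <span class="classical-quote">“Fortunate who was able of things to know the causes”</span>.
"""
segments = split_segments(paragraph)
```

A rule this simple would mis-split on abbreviations; it only serves to make the two-segment outcome concrete.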

5.5 Step 4: Call Enrycher Disambiguation Service

  • Enrycher invocation and return parameters

Inputs:

  • text: “<span translate="no" lang="la">Felix, qui potuit rerum cognoscere causas</span>” is verse 490 of the "Georgics" (29 BC), by the Latin poet Virgil.
  • language: en
  • translate: yes

Outputs:

  • disambiguatedText: “<span translate="no" lang="la">Felix, qui potuit rerum cognoscere causas</span>” is verse 490 of the "<span its-disambig-class-ref="http://dbpedia.org/class/yago/Greco-RomanAgriculturalWritings" its-disambig-ident-ref="http://dbpedia.org/page/Georgics" its-disambig-granularity="its:entity">Georgics</span>" (29 BC), by the Latin poet <span its-disambig-class-ref="http://dbpedia.org/ontology/Agent" its-disambig-ident-ref="http://dbpedia.org/page/Virgil" its-disambig-granularity="its:entity">Virgil</span>.


  • Resulting XLIFF
  • Resulting PROV records
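
The resulting XLIFF is not recorded on this page. As a hedged sketch of the general shape such a file could take, the following Python code builds a minimal XLIFF 1.2 skeleton with one trans-unit per segment. The function name build_xliff and the file name source.html are hypothetical, inline markup is carried as plain text for brevity, and no ITS attributes are mapped.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Serialise XLIFF elements with a default namespace instead of a prefix.
ET.register_namespace("", XLIFF_NS)


def build_xliff(segments, source_lang="en", original="source.html"):
    """Return a minimal XLIFF 1.2 document holding one trans-unit per segment."""
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(
        root,
        f"{{{XLIFF_NS}}}file",
        {"original": original, "source-language": source_lang, "datatype": "html"},
    )
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for number, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(number)})
        # The target element is left out; it would be filled in by the SMT step.
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")


xliff_text = build_xliff(
    [
        "Felix, qui potuit rerum cognoscere causas is verse 490 of the Georgics.",
        "It is literally translated as: Fortunate who was able of things to know the causes.",
    ]
)
```

A real roundtrip would also have to carry the inline elements and ITS annotations through XLIFF inline markup, which this skeleton omits.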

5.6 Step 5: Filter for Translation

  • XLIFF resulting after no-translate filter
  • Resulting PROV Record
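
How the no-translate filter is implemented is not specified on this page. One common approach, sketched below in Python, is to swap elements marked translate="no" for opaque placeholders before calling the MT service and to restore them afterwards. The names protect and restore and the [[PHn]] placeholder format are hypothetical, and the regular expression assumes that protected elements are not nested.

```python
import re

# An element whose start tag carries translate="no", up to its matching end tag.
PROTECTED = re.compile(
    r'<(?P<tag>\w+)\b[^>]*\btranslate="no"[^>]*>.*?</(?P=tag)>',
    re.DOTALL,
)


def protect(segment: str) -> tuple[str, dict[str, str]]:
    """Replace translate="no" elements with placeholders; return text and lookup."""
    store = {}

    def stash(match):
        key = f"[[PH{len(store)}]]"
        store[key] = match.group(0)
        return key

    return PROTECTED.sub(stash, segment), store


def restore(translated: str, store: dict[str, str]) -> str:
    """Put the protected elements back in place of their placeholders."""
    for key, original in store.items():
        translated = translated.replace(key, original)
    return translated


segment = (
    '“<span translate="no" lang="la">Felix, qui potuit rerum cognoscere causas</span>” '
    'is verse 490 of the "Georgics" (29 BC), by the Latin poet Virgil.'
)
filtered, placeholders = protect(segment)
roundtrip = restore(filtered, placeholders)
```

This only works if the MT service returns the placeholder tokens unchanged, which would need to be checked for the service in use.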

5.7 Step 6: Call SMT Service

  • Matrex service invocation and response parameters
  • Resulting XLIFF
  • Resulting PROV Record

5.8 Step 7: Reassemble Target document

  • Target document
  • Resulting PROV Record