Work Package 3: CMS – Localization Chain Integration – Details
Version 0.2 • 20 December 2012
The following content is presented for informational purposes and has not been edited or condensed.
Cocomore Report
Task 3.1
In this Task Cocomore is developing a set of Drupal modules that will support the use of
- Ability to read and add local as well as global metadata (translate, localization note, domain, disambiguation).
- Global and local metadata are retained within the content and either exported from or imported into the system.
- Metadata added by external sources (disambiguation, translation-agent, revision-agent) can also be imported and processed.
- First publication of module to Drupal community expected January/February 2013. Interoperability through data exchange with Linguaserve and Enrycher’s systems.
- Contribution to specification of the disambiguation data category
- Contribution to specification of the tool reference mechanism
- Partial implementation of the disambiguation data category within Enrycher. So far, we cover local annotation for HTML5, but not yet global annotations or annotation of XML documents.
- Partial implementation of the tool reference mechanism within Enrycher (HTML5 only).
- Current functionality is available as a web service (see http://aidemo.ijs.si/mlw/).
- Linguaserve has provided web service documentation and support to Cocomore for Drupal modules, as well as support in testing.
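As a rough illustration of what reading local metadata involves, the following Python sketch walks an HTML5 fragment and records, for each text run, the local Translate and Localization Note values in effect. It is not the Drupal module code, which this report does not show. The class name `LocalItsReader` and the sample fragment are hypothetical, the attribute names are those of the ITS 2.0 HTML5 serialization, and global rules are not handled.

```python
from html.parser import HTMLParser

# Local attributes from the ITS 2.0 HTML5 serialization. Which of them the
# Drupal modules actually process is an assumption of this sketch.
LOCAL_ITS_ATTRS = ("translate", "its-loc-note", "its-loc-note-type")

# Void elements never get an end tag, so they must not touch the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LocalItsReader(HTMLParser):
    """Collects text runs together with the local ITS metadata in effect."""

    def __init__(self):
        super().__init__()
        self.stack = [{"translate": "yes"}]  # default: content is translatable
        self.segments = []                   # list of (text, metadata) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # Child elements inherit the metadata of their parent, then override it.
        metadata = dict(self.stack[-1])
        for name, value in attrs:
            if name in LOCAL_ITS_ATTRS:
                metadata[name] = value
        self.stack.append(metadata)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, dict(self.stack[-1])))


# Hypothetical sample fragment.
reader = LocalItsReader()
reader.feed(
    '<p its-loc-note="Product name" its-loc-note-type="alert">'
    'Install <span translate="no">Drupal</span> first.</p>'
)
for text, metadata in reader.segments:
    print(text, metadata)
```

The stack mirrors element nesting, so a `translate="no"` on the inner span applies only to its own text. The sketch assumes well-formed input and makes no attempt to recover from mismatched tags.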
Okapi Components for XLIFF is currently in progress. This deliverable is composed of several parts:
- Part 1: Creation of two filters (one for XML and one for HTML5) that can process ITS-annotated documents and merge them back after extraction. This part is done for most data categories, with ongoing work for the others.
- Part 2: Implementation of the mapping to the ITS data categories in the XLIFF reader/writer of Okapi. Most data categories have been mapped, and the relevant Okapi components are being updated step by step to support that mapping.
- Part 3: Implementation of a component that uses Enrycher’s Web service to gather Disambiguation data for any input document supported by the Okapi filters (not just XML/HTML5). This part is in progress. The component currently works with the Enrycher service and annotates the submitted documents. More work needs to be done on utilizing the annotations with other components.
A presentation demonstrating the state of those components from the September 2012 Prague meeting is available. The latest version of the components is maintained in the snapshot distribution of Okapi. End-users are also starting to use them in the release version of Okapi (as of this writing the most recent release was on November 24, 2012).
Report on LT-Web Processing in the CMS
- Webservice definition for Drupal modules.
- Input on best practices in content granularity and analysis of possibilities to provide context to the translators.
- Testing on intercommunication between Cocomore and Linguaserve.
- Localization case study for ITS 2.0 web localization from German into Chinese and French in progress. To be completed in December 2012.
Task 3.2
Deliverable 3.2.2
The contents are generated in Drupal, a Content Management System (CMS). Before they are sent, the contents are annotated with ITS 2.0 metadata in two ways: automatic annotation and manual annotation. XHTML + ITS 2.0 will be used as interchange format.
Once created, they are sent to the Linguaserve Global Business Connector Contents (GBCC) translation server, processed in the Linguaserve internal localization workflow Platform for Localization, Interoperability and Normalization of Translation (PLINT). Afterwards, with the annotated content translated and the metadata treated, they are downloaded by the client and imported into the CMS.
The ITS 2.0 selected data categories for integration are:
- Translate
- Localization Note
- Domain
- Language Information
- Allowed Characters
- Storage Size
- Provenance
- Readiness (ITS 2.0 extension)
Integration on behalf of the LSP (Language Services Provider) is being done in the three areas listed below, and a complete version is expected by December 2012. The B2B Integration Showcase is expected to be completed with real content, client and use case by March 2013.
- Pre-production/post-production engine for processing content files annotated with ITS 2.0.
- Linguaserve localization workflow to provide support to project management and production processes.
- Computer Assisted Translation (CAT) tool usage for translation, revision and postediting with ITS 2.0 annotated content.
Current Status of work pending completion of the real client showcase implementation system:
- All data categories implemented are pending final unit tests in coordination with Cocomore. Expected date for final tests is December 2012.
- Domain: workflow integration is in development and is also expected to be completed in December 2012.
- Provenance: workflow integration is pending a reply from Cocomore. Expected completion date is December 2012.
Linguaserve Report
Task 3.1
Linguaserve has provided web service documentation and support to Cocomore for Drupal modules, as well as support in testing.
Detailed progress
Task 3.1: Support for web services and interoperability between CMS and TMS
- Webservice definition for Drupal modules.
- Input on best practices in content granularity and analysis of possibilities to provide context to the translators.
- Testing on intercommunication between Cocomore and Linguaserve.
- Localization case study for ITS 2.0 web localization from German into Chinese and French in progress. To be completed in December 2012.
Task 3.2 for Deliverable 3.2.2
The work status is as follows:
- Intercommunication between Cocomore and Linguaserve is ready as shown in Lyon TPAC.
- Adaptation of the CAT tool filter to ITS 2.0 usage is done.
- Use of data categories in the demo engine and in the CAT tool was shown at the Lyon TPAC with the previous XML-Drupal format. This format is now being changed to XHTML-Drupal.
- Use of metadata:

ITS 2.0 Data Category | Behaviour | Linguaserve TMS module modified |
---|---|---|
Translate | Block parts of untranslatable content | Engine and CAT tool |
Localization Note | Provide information to translators/revisers | Engine and CAT tool |
Localization Note | Alert the project managers and add tooltip visualization in the workflow | Localization workflow |
Domain | Provide context to the translators | Engine and CAT tool |
Domain | Automatic selection of CAT terminology and translation memories | Localization workflow |
Language Information | Inform the translators/revisers | CAT tool |
Language Information | Update the information after the translation job has been completed | Engine |
Language Information | Quality check to ensure the source language content complies with the Webservice parameter | Localization workflow |
Allowed Characters | Check if the restrictions are met | Engine |
Storage Size | Inform the translator/reviser/posteditor | CAT tool |
Storage Size | Check if the restrictions are met | Engine |
Storage Size | Quality check using the original content | Localization workflow |
Provenance | Create or update the data category information with the translator/reviser/posteditor who carried out the work | Engine |
Readiness (ITS 2.0 extension) | Update the data category information with the availability dates and the following tasks in the localization chain | Engine |
Readiness (ITS 2.0 extension) | Delivery date control and priority control | Localization workflow |

- A demo for ITS 2.0 processing is available (with the previous XML-Drupal format) at https://www-pre.linguaserve.net/las_demos/control/MLWLTWP3DemoEngine (User: demos, password: demosLingu@serve). There were several interchange format changes (XML → HTML5 → XHTML) to cover various needs of the manual task, CMS capabilities, and best practices related to the standard. These changes affected the development of the ITS 2.0 engine.
- B2B Integration Showcase is expected to be ready for the Multilingual Web Workshop in Rome (12th March). Key milestones for completion include:
- December 2012 - Web services connector and engine manipulation unit and integration tests.
- December 2012 - Text annotation. Linguaserve enriches texts (around 75 thousand words) with metadata (support provided by Cocomore).
- December 2012 - Cocomore sends all annotated contents to Linguaserve.
- January 2013 - Translating environment: Linguaserve uses the annotated texts for a human machine-assisted translating scenario.
- January 2013 - Linguaserve prepares enriched metadata content for Cocomore to import translated annotated texts (150,000 words) into Drupal.
- February 2013 – Import of translated and annotated texts back into Drupal begins.
- February 2013 – Import of translated and annotated texts back into Drupal ends.
- March 2013 - Quality assurance: Review and feedback process.
- March 12th 2013 - Deliverable D.3.2.2. Linguaserve and Cocomore deliver the first version of a website with annotated and translated text.
- April-June 2013 - Review and web site maintenance with ITS 2.0. Linguaserve and Cocomore review the whole showcase. Web content and annotation update maintenance is undertaken with the full CMS-TMS workflow.
- June 2013 – Showcase report. Linguaserve delivers to Vistatec the report on the showcase D3.2.2 to be integrated into Deliverable 3.2.3, including showcase layout, design, dissemination, presentation and demo material (support provided by Cocomore).
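The "check if the restrictions are met" behaviour listed for Allowed Characters and Storage Size in the metadata overview above can be sketched in a few lines of Python. This is an illustration only, not the Linguaserve engine. The function names are hypothetical, and the sketch assumes the Allowed Characters value is a simple character-class expression (such as `[a-zA-Z]`) that Python's `re` module interprets the same way.

```python
import re


def allowed_characters_ok(text: str, allowed: str) -> bool:
    """True if every character of `text` matches the Allowed Characters expression.

    `allowed` is expected to match a single character, e.g. "[a-zA-Z]".
    """
    return re.fullmatch(f"(?:{allowed})*", text) is not None


def storage_size_ok(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """True if `text` fits into `max_bytes` bytes when stored in `encoding`."""
    return len(text.encode(encoding)) <= max_bytes


# Hypothetical translated strings checked against their restrictions.
print(allowed_characters_ok("Madrid", "[a-zA-Z]"))        # letters only
print(allowed_characters_ok("Madrid 2013", "[a-zA-Z]"))   # contains a space and digits
print(storage_size_ok("Köln", 4))                         # "ö" takes two bytes in UTF-8
print(storage_size_ok("Köln", 4, "iso-8859-1"))           # one byte per character here
```

Storage Size is measured in bytes rather than characters, so the same translated string can pass or fail depending on the storage encoding. That is why the check takes the encoding as a parameter.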
VistaTEC - Work Package 3 - Localization Quality Assurance Showcase
VistaTEC is developing applications and integrations to its existing systems that enable linguistic reviewers to collect relevant metadata from LT-Web marked content as it moves through the language review workflow, and that store, process, integrate, display and modify this metadata to support Localisation Managers running Language Quality Review Programmes.
Quality Metadata
VistaTEC’s main area of interest within ITS 2.0 relates to the two localization quality data categories: locQualityIssue and locQualitySummary. locQualityIssue is a somewhat complex data category as it encodes several types of data which can be applied in multiple instances to the same document elements. The desire for interoperability within the data values is also high.
Work on these data categories produced a comprehensive list of data attributes, recommended values and methods for local, global and stand-off markup.
Presentations
Presented the browser based prototype vision for the Reviewer’s Workbench at FEISGILLT in Seattle in October 2012. The presentation outlined the current challenges for reviewers: multiple files, double entry of data, and the awkward nature of data capture. The demonstration of the prototype showed how the process could be refined with in-browser identification, classification and capture of errors, using client-side code to alter the HTML DOM and embed information pertaining to errors as ITS 2.0 metadata.
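To make the idea of embedding error information as ITS 2.0 metadata concrete, here is a minimal Python sketch that wraps an erroneous span in local Localization Quality Issue markup. The prototype described above does this in the browser with client-side code; this sketch works on plain strings instead and is not VistaTEC's implementation. The function name `mark_issue` and the sample sentence are hypothetical, while the attribute names are those of the ITS 2.0 HTML5 serialization.

```python
from html import escape


def mark_issue(text, start, end, issue_type, comment, severity):
    """Wrap text[start:end] in a span carrying local quality-issue attributes."""
    if not 0 <= severity <= 100:
        raise ValueError("severity must be between 0 and 100")
    attrs = (
        f'its-loc-quality-issue-type="{escape(issue_type)}" '
        f'its-loc-quality-issue-comment="{escape(comment)}" '
        f'its-loc-quality-issue-severity="{severity}"'
    )
    return (
        escape(text[:start])
        + f"<span {attrs}>{escape(text[start:end])}</span>"
        + escape(text[end:])
    )


# Hypothetical reviewer finding: characters 15..18 are a misspelling.
marked = mark_issue(
    "The cat sat on teh mat.", 15, 18, "misspelling", "should be: the", 50
)
print(marked)
```

Local markup like this suits a single issue on a span. When several issues apply to the same element, stand-off markup is the more natural fit, and it is not covered by this sketch.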
Implementation
Table 1: Milestones and progress
Milestone | Status |
---|---|
ITS 2.0 Test Suite | In Progress and on schedule. |
Reviewer’s Workbench (desktop application for capturing reviewer feedback as ITS metadata and rendering existing relevant metadata as visual cues for reviewers) | Browser based prototype coded and demonstrated at Prague. Java application not started. |
REST API to receive requests from Reviewer’s Workbench and Web Language Quality Dashboard | Not started. Planned start January 2013. |
Creation of triple store for storage of Provenance and conversion of data to RDF. | Prototype coded and demonstrated in Prague. Planned start January 2013. |
Conversion routines to convert ITS metadata to triplestore RDF data. | Not started. Planned start January 2013. |
Business Intelligence queries against triple store | Not started. Planned start January 2013. |
Decision Support reports | Not started. Planned start date February 2013. |
Extension of Business Intelligence Dashboard (charts and graphical Key Performance Indicators) | Not started. Planned start date March 2013. |
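The conversion routines in Table 1 are listed as not started, so the following Python sketch only indicates the general shape such a conversion could take: a flat metadata record is serialized as N-Triples, a line-based RDF syntax that a triple store can load. The namespace `http://example.org/its-review#`, the record keys and the function names are all hypothetical and do not correspond to a published vocabulary or to VistaTEC's design.

```python
# Hypothetical namespace for predicates; not a published vocabulary.
EX = "http://example.org/its-review#"


def literal(value):
    """Serialize a value as a plain N-Triples string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def record_to_ntriples(subject_uri, record):
    """Emit one triple per key/value pair of a flat metadata record."""
    lines = []
    for key, value in sorted(record.items()):
        lines.append(f"<{subject_uri}> <{EX}{key}> {literal(value)} .")
    return "\n".join(lines)


# Hypothetical quality-issue record captured by a reviewer.
triples = record_to_ntriples(
    "http://example.org/issue/1",
    {"type": "misspelling", "severity": 50},
)
print(triples)
```

Every value is written as a plain string literal to keep the sketch short. A real conversion would more likely use typed literals and resource URIs so that the planned Business Intelligence queries can filter and aggregate on them.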