Metadata and workflow comparison

From MultilingualWeb-LT EC Project Wiki

1 Process Definitions

To help structure these process definitions, each one is captured as follows:

  1. It is defined as a class, i.e. a type of process that can have any number of real-world instances. This offers opportunities to refine definitions in a structured way in future by producing sub-classes.
  2. It is grouped under a set of process classes, which helps communicate clearly where it sits in the overall internationalization and localization process.
  3. It should indicate the outputs of the process, or the change in state of those outputs compared to the inputs.
  4. It should indicate the inputs to the process and the preconditions on those inputs.
  5. It should be correlated against data categories, indicating whether the process creates, reads, updates or deletes specific data categories in source or target content.
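The five points above can be read as a small data model. The following is a minimal sketch of that reading, not a definition from this page: the names `ProcessClass`, `Access` and `is_a` are hypothetical, and the data category entry attached to `machine-translate` is an illustrative assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class Access(Enum):
    """How a process touches a data category (point 5 above)."""
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


@dataclass
class ProcessClass:
    """A type of process; real-world runs would be instances of it (point 1)."""
    name: str                                            # e.g. "translate"
    group: str                                           # point 2: enclosing group
    inputs: List[str] = field(default_factory=list)      # point 4
    outputs: List[str] = field(default_factory=list)     # point 3
    parent: Optional["ProcessClass"] = None              # set for sub-classes
    data_categories: Dict[str, Set[Access]] = field(default_factory=dict)

    def is_a(self, other: "ProcessClass") -> bool:
        """Return True if this class is `other` or one of its sub-classes."""
        node: Optional["ProcessClass"] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


translate = ProcessClass(
    name="translate",
    group="Localization",
    inputs=["source content"],
    outputs=["binding between source content and a translation"],
)
machine_translate = ProcessClass(
    name="machine-translate",
    group="Localization",
    parent=translate,
    # Illustrative assumption, not taken from the table on this page.
    data_categories={"mtConfidence": {Access.CREATE}},
)

print(machine_translate.is_a(translate))
```

Keeping a single `parent` link is a simplification; a process with two parents (such as post-edit) would need a list of parents instead.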

1.1 Generation of Source Locale Content

This is a group of process classes that are involved in the generation of content that is intended to be localised for different target locales.

1.1.1 create-source

Author source locale content. This involves the creation of unstructured content, but also its annotation to provide structure such as headings and bullet points. The creation of source content may be conducted in compliance with internationalisation guidelines, whereby authors are encouraged to follow certain spelling, grammatical, format, style and annotation guidelines that aim to facilitate the subsequent localization of the content.

1.1.2 revise-source

Provide a revised version of existing source content (needs contentResultSource - yes).
Input: source content.
Output: a revised version of the source content, associated with the original.

1.1.3 annotate-source

This is the annotation of content to associate portions of it with specific meta-data relevant to the processes it will later undergo. This could include marking content as to be translated, as to be transliterated, as representing a named entity, or as representing a term. Such annotation may be performed by the original author, or by a dedicated quality assurance or localisation specialist, and may be assisted by text analytic services, e.g. a named entity recognizer.

1.1.4 generate-source-terminology

Management of source content terminology in a separate terminology database. Terminology represents certain important and repeated concepts within a set of content, the consistent use of which is seen to improve the comprehension of the content. For localisation, the consistent translation of terminology is an important quality concern.

1.1.5 source-quality-assurance

Quality assurance review of source content. This is the process of ensuring the consistency and comprehensibility of source content. It includes functions such as spell checking, grammar checking, checking for correct terminology usage, correct use of meta-data and correct annotation of concepts, the use of controlled language to ease the translation task, and the inclusion of content-specific instructions or comments to inform the translation process.

1.1.6 voiceover-source

Provision of a spoken audio component to source content.

1.1.7 subtitle-source

Provision of a textual transcription of the audio component of source content.

1.1.8 internationalise-content

Check and revise content to meet internationalisation guidelines.
This seems a generic process encompassing many of the others presented in this section, so this may be redundant as a separate process in this list.

1.2 Preparation for Localization

This is a group of process classes that are involved in preparing content specifically to be localized.

1.2.1 translate-multilingual-terms

Associate approved translations with source content terminology.

1.2.2 normalize-source

This involves the division of source content into segments, usually at a sentential level, which are amenable to translation. This includes the removal of mark-up not relevant to the translation process and the inclusion of instructions on whether specific content should not be translated in specific locales.
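As a rough illustration of the segmentation step described above, the sketch below strips mark-up and splits the remaining text at sentence boundaries. It assumes HTML-like tags and a naive punctuation rule; the function name `normalize_source` and both regular expressions are hypothetical, and a production segmenter would typically use locale-aware rules instead.

```python
import re
from typing import List

# Block-level tags are treated as breaks; the tag list is an assumption.
BLOCK_TAG = re.compile(r"</?(?:p|div|li|h[1-6]|br)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def normalize_source(markup: str) -> List[str]:
    """Remove mark-up and return sentence-level segments."""
    text = BLOCK_TAG.sub(" ", markup)   # keep a gap where blocks meet
    text = ANY_TAG.sub("", text)        # drop inline tags without adding space
    text = re.sub(r"\s+", " ", text).strip()
    return [segment for segment in SENTENCE_BREAK.split(text) if segment]


print(normalize_source("<p>Hello <b>world</b>.</p><p>How are you?</p>"))
```

The naive rule splits after abbreviations such as "e.g." and does not handle the locale-specific do-not-translate instructions mentioned above; both would need dedicated handling.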

1.2.3 assemble-localization-job

Assemble the source content and associated language resources (e.g. translation memories, term-bases, MT engines, translation guidelines) as input into a localisation job.

1.2.4 generate-localization-quote

Provide an estimate of the effort required to localise a job, for quoting or pricing purposes only and not to perform the job.
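One simple way to picture such an estimate is a word count multiplied by a per-word rate. This is only a sketch under that assumption: the page does not say how effort is measured, and the function names and the rate used below are hypothetical.

```python
from decimal import Decimal
from typing import Iterable


def count_words(segments: Iterable[str]) -> int:
    """Count whitespace-separated words across all segments."""
    return sum(len(segment.split()) for segment in segments)


def estimate_quote(segments: Iterable[str], rate_per_word: Decimal) -> Decimal:
    """Return an estimated price; the job itself is not performed."""
    return count_words(segments) * rate_per_word


# The rate below is a made-up figure used only for illustration.
print(estimate_quote(["Hello world.", "How are you?"], Decimal("0.10")))
```

Whitespace-based counting does not suit languages written without spaces, so a real estimate would need a locale-aware unit of effort.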

1.2.5 transcribe-source

Transcribe the source content (needs contentResultSource - yes).

1.2.6 transliterate-source

Transliterate the source content (needs contentResultSource - yes).

1.3 Localization

This is a group of process classes that are involved in the localization of content from a source locale to one or more target locales.

1.3.1 translate

Bind a translation in the target language to the source content. This involves the translation of source locale content into a language appropriate to a target locale. It may be performed in a way that is sensitive to the domain of the source content, to terminology and entity annotation in the source and to translation instructions provided with the source content. It may be performed by human or automated software agents.
Input: source content.
Output: a binding between source content and a translation into the target language.

1.3.2 machine-translate

Subclass of translate, using an automated process.
Machine translation agent identification

1.3.3 statistical-machine-translate

Subclass of translate, using an automated process.
translation confidence score

1.3.4 rule-based-machine-translate

Subclass of translate, using an automated process.

1.3.5 translation-memory-lookup

Subclass of translate, using an automated process.
fuzzy match score
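To illustrate what a fuzzy match score might look like, the sketch below scores a new source segment against stored segments with Python's `difflib.SequenceMatcher` and returns the best match above a threshold. The similarity measure, the threshold of 0.75 and the function name `tm_lookup` are assumptions; the page does not specify how the score is computed, and translation tools may use quite different metrics.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def tm_lookup(
    source: str,
    memory: Dict[str, str],
    threshold: float = 0.75,
) -> Optional[Tuple[str, str, float]]:
    """Return (stored source, stored target, score) for the best match, or None."""
    best: Optional[Tuple[str, str, float]] = None
    for tm_source, tm_target in memory.items():
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (tm_source, tm_target, score)
    return best


memory = {"Save the file": "Enregistrer le fichier"}
print(tm_lookup("Save the file", memory))
print(tm_lookup("Save the files", memory))
```

The returned score is the value that would be recorded alongside the proposed translation for later processes such as post-editing.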

1.3.6 human-translate

Subclass of translate, using human judgement only.

1.3.7 human-transcreate

subclass of
Human translation performed with priority given to maintaining intent, style, tone and context, over the literal accuracy of translation. It is typically performed on marketing content.

1.3.8 post-edit

Subclass of translate and human-translate.
Approve a previous machine translation or provide a preferred alternative translation.
Input: a binding between some source content and one or more translations into the target language.
Output: a preferred revision of the target content, or the selection of one of the input translations as the preferred translation.
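The relationship between the input and output of post-edit can be sketched as follows. This is a hypothetical model, not a format defined on this page: `Binding`, `post_edit` and their fields are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Binding:
    """Source content bound to one or more candidate translations."""
    source: str
    candidates: List[str]
    preferred: Optional[str] = None


def post_edit(
    binding: Binding,
    choice: Optional[int] = None,
    revision: Optional[str] = None,
) -> Binding:
    """Approve candidate number `choice`, or record `revision` as preferred."""
    if revision is not None:
        return replace(binding, preferred=revision)
    if choice is not None:
        return replace(binding, preferred=binding.candidates[choice])
    raise ValueError("either a choice or a revision is required")


draft = Binding("Save the file", ["Sauver le fichier", "Enregistrer le fichier"])
print(post_edit(draft, choice=1).preferred)
print(post_edit(draft, revision="Enregistrez le fichier").preferred)
```

Returning a new `Binding` rather than mutating the input keeps the original candidates available for later review.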

1.3.9 non-language-content-localization

This is the adaptation of non-text content to the target locale, which may involve the adaptation or replacement of images, colour schemes, graphical design and content layouts, text fonts, and number and data formats, including currency, numbers, dates etc.

1.3.10 review-target-quality

Human review, for quality assurance, of the target text only, without the source text, for instance by an expert (see UNE 15038 “review”).

1.3.11 review-translation-quality

Human revision, for quality assurance, that examines the translation by comparing source and target (see UNE 15038 “revision”).

1.3.12 analyse-localization-workflow

This is the analysis of the performance of the localization process to identify processes that are not achieving agreed performance targets and to provide input to decision making on process improvement.

1.4 Publication of Target Locale Content

This is a group of process classes that are involved in the consumption of target locale content and any associated rating or annotation by content consuming users.

1.4.1 integrate-target-content

This is the integration of the target content into the form of the content to be consumed. It includes the assembly of content in the correct order and the integration of meta-data previously removed from the source content.

1.4.2 test-target-content

This is testing of applications that use the target content to ensure correct operation and presentation of the content.

1.4.3 proof-target-content

Human checking of proofs before publishing, for quality assurance (see UNE 15038 “proofreading”).

1.4.4 publish-target-content

Publish target locale content for consumption by its intended consumers.

1.4.5 localisation-meta-data-removal

This is the process of removing meta-data that was used in the localization process, but is not required for the publishing and consumption of the target content.
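As an illustration of this removal step, the sketch below deletes localization attributes from well-formed XML before publication. It assumes the meta-data is carried in attributes whose names start with `its-`, which is a convention chosen for this example; the function name `strip_localization_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_localization_metadata(xml_text: str, prefix: str = "its-") -> str:
    """Return the document with every attribute starting with `prefix` removed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Collect names first so the attribute dict is not changed mid-iteration.
        for name in [n for n in element.attrib if n.startswith(prefix)]:
            del element.attrib[name]
    return ET.tostring(root, encoding="unicode")


page = '<p its-loc-note="Keep short" class="intro">Hello <b its-term="yes">world</b></p>'
print(strip_localization_metadata(page))
```

Attributes that the published content still needs, such as `class` here, are left untouched because they do not match the prefix.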

1.4.6 gather-target-content-consumer-feedback

This is the process of eliciting and collecting feedback on the quality and usefulness of the target content from its consumers.

1.5 Curation of Content Translations

This is a group of process classes that are involved in the analysis and productive reuse of process provenance data and of the bindings between source and localized target content that result from the execution of process class instances.

1.5.1 maintain-translation-memory

Maintenance of translation memories, including the replacement of matches with revised translations and corrections to source content.

1.5.2 maintain-termbase

This is the collection and indexing of all identified terms from a set of content being translated, together with their definitions, morphologies and associated translations.
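A term-base entry of the kind described above could be modelled as in the following sketch. The class names, fields and the sample entry are hypothetical and serve only to show terms being indexed together with their definitions, morphologies and translations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermEntry:
    term: str
    definition: str
    morphologies: List[str] = field(default_factory=list)      # inflected forms
    translations: Dict[str, str] = field(default_factory=dict)  # locale -> term


class Termbase:
    """Index entries by the term and by each of its recorded forms."""

    def __init__(self) -> None:
        self._index: Dict[str, TermEntry] = {}

    def add(self, entry: TermEntry) -> None:
        for form in [entry.term, *entry.morphologies]:
            self._index[form.lower()] = entry

    def lookup(self, word: str) -> Optional[TermEntry]:
        return self._index.get(word.lower())


termbase = Termbase()
termbase.add(TermEntry(
    term="segment",
    definition="A unit of source text prepared for translation.",  # sample text
    morphologies=["segments"],
    translations={"fr": "segment"},
))
print(termbase.lookup("Segments").translations["fr"])
```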

1.5.3 maintain-parallel-text

Maintenance of parallel text for training statistical machine translation (SMT) systems.

2 Mapping process definitions to data categories

2.1 Explanation

This is currently a placeholder from earlier work, to be updated against the revised work on this page. The following table is a static snapshot of this page, where the latest version is maintained.
  • Consumes = the metadata is used by the designated process to produce its results, i.e., it is input into that stage of the workflow.
  • Generates = the metadata is output from the process. This includes processes that take the metadata item as input but modify its value as output (i.e., a process can both consume and generate a specific metadata item).
  • Transforms = the process converts the metadata from one format to another (without modifying the values). A common example would be a process that filters an input format and then passes metadata items found in that format along in a new format without altering their values.
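The three roles defined above can be combined, since a process may both consume and generate the same metadata item. The sketch below encodes them as combinable flags. The names `Role`, `MAPPING` and `role_of` are hypothetical, and the three mapping entries are illustrative assumptions rather than values read from the table.

```python
from enum import Flag, auto
from typing import Dict, Tuple


class Role(Flag):
    CONSUMES = auto()    # metadata is input to the process
    GENERATES = auto()   # metadata is output, possibly with a modified value
    TRANSFORMS = auto()  # format changes, values stay the same


# (process, data category) -> roles. Illustrative entries only.
MAPPING: Dict[Tuple[str, str], Role] = {
    ("machine-translate", "mtConfidence"): Role.GENERATES,
    ("post-edit", "mtConfidence"): Role.CONSUMES,
    ("post-edit", "approvalStatus"): Role.CONSUMES | Role.GENERATES,
}


def role_of(process: str, category: str) -> Role:
    """Return the recorded roles, or an empty flag if none are recorded."""
    return MAPPING.get((process, category), Role(0))


roles = role_of("post-edit", "approvalStatus")
print(Role.CONSUMES in roles, Role.GENERATES in roles, Role.TRANSFORMS in roles)
```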

2.2 Table

Key: C = Consumes, G = Generates
Column groups: Authoring Phase (CMS/Controlled Language); Translation Phase; Metrics/Analysis; Publication (CMS)
Columns: Authoring; Source QA; Enrichment; Source Terminology Management; Connection to Provider; Translation; Multilingual Terminology Management; Review; Translation QA; Post-translation Process; Annotate w/process & qual data; Annotate provenance of lang. res.; CMS reintegration; CMS revision management; Publication
Sub-columns: General; XLIFF Connector; Pretranslation; Human; Machine; L10n; Postediting; Update Linguistic Resources; Costing and Billing
autoLanguageProcessingRule x x xx xx x x x x x x xx x x
directionality xxx xxx xx x xx xx xxx xxx xx x xxx x xx xx x
dropRule x xx xx x x x x x
idValue x x x x x x
localElementsWithinText x xx xxx xxx xx xx xx x x
localeSpecificContent x xxx xx xx xx x x x x x
preserveSpace x x x x x x x x x x x x x
ruby xx x x x x xx xx xxx x x x x x
targetPointer xx xx x x x
translate x xxx xx xxx xxx x xxx xxx x x xx xxx x
approvalStatus x xxx xx x x x xx xx x x xx xxx xx x xx xxx x
cacheStatus x x x xx xx x x x x
legalStatus x xx xx x x x xx xx xxx xx xx xxx x
processState x xx xx xx xx xx xx x xx x x xx xx xx xx x
processTrigger x xx xx xx xx xx xx x xx x xx xx xx xx x
proofreadingState x xx xx
revisionState x x x x
Project Information
domain x xx xx xx x x x xxx xxx x x xx xxx x
formatType x xx xx x x x xx xx x x xx x xx
genre x xx xx x x x xx xx x xx x xx x
purpose x x x x x x x x x x
register x xx x x x x x xx
translator qualification x x xxx xxx x x x x
author x x x x x xx
contentLicensingTerms x x x x x xx x
revisionAgent x x xx x
sourceLanguage x x x x x x x xx
translationAgent x x x x x x xx x xx xx x x x
qualityError x x x x xx xx x x xx xx x
qualityProfile x x x x xx xx x x xx xx x
context x x xx xx xx x x x xx x xx x x
confidentiality x x x x x x x
externalPlaceholder x x xxx xxx x x x x x xx
languageResource xxx xxx xxx xx xx xx x xxx xxx x x xx xxx x
mtConfidence x x x x x
mtDisambiguation x x x x x x x x
namedEntity x x xx xx x x x x x x xx x x
specialRequirements x xx x x x x x x x xx x xx
term x xx xx xx xxx xxx x xxx xxx x x xx xx x
textAnalysisAnnotation x xx x x x x xx x