XLIFF 2.0 Mapping

Introduction

This document provides a recommendation on how the ITS 2.0 data categories are represented in XLIFF 2.
For the mapping between ITS 2.0 and XLIFF 1.2 see the page "XLIFF 1.2 Mapping".

Notes:

  • Please use the IG mailing list (http://lists.w3.org/Archives/Public/public-i18n-its-ig/) to discuss this topic.
  • The 'structural' entries relate to the case where the element with the ITS information is a non-inline (structural) element, for example a <p> in HTML.
  • The 'inline' entries relate to the case where the element with the ITS information is an inline element, for example a <span> in HTML.
  • The prefix itsxlf refers to the namespace http://www.w3.org/ns/its-xliff/

Data Categories

Translate (==========TO REVIEW)

Indicates whether content is translatable or not.
See http://www.w3.org/TR/its20/#trans-datacat for details.

Structural Elements

Use the translate attribute:

Original:

<p translate='yes|no'>Text</p>

Extraction:

<unit id='1' translate='yes|no'>
 <segment>
  <source>Text</source>
 </segment>
</unit>

If the element is not translatable you can also simply not extract it.

Inline Elements

Use <mrk> with translate='yes|no'. A fall-back option is to extract the non-translatable content as an inline code.

Original:

<p>Text <code translate='no'>Code</code></p>

Extraction:

<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' translate='no'>Code</mrk></pc></source>
 </segment>
</unit>

or

<unit id='1'>
 <segment>
  <source>Text <ph id='1'/></source>
 </segment>
</unit>

Localization Note (==========TODO)

Provides a way to communicate notes to localizers about a particular item of content.
See http://www.w3.org/TR/its20/#locNote-datacat for more details.

Structural Elements

TODO

Inline Elements

TODO

Terminology (==========TO REVIEW)

Marks terms and optionally associates them with information, such as definitions.
See http://www.w3.org/TR/its20/#terminology for more details.

Structural Elements

It is recommended to map terminology information that appears on a structural element in the original document by using an inline <mrk> element.

Original:

<p its-term='yes'>Term</p>

Extracted:

<unit id='1'>
 <segment>
  <source><mrk id="m1" type='term'>Term</mrk></source>
 </segment>
</unit>

Inline Elements

In XLIFF 2, terms are denoted using <mrk type='term'>:

Original:

<p>Text with a <span its-term='yes'>term</span>.</p>

Extracted:

<unit id='1'>
 <segment>
  <source>Text with a <pc id='1'><mrk id="m1" type='term'>term</mrk></pc>.</source>
 </segment>
</unit>

  • Use type="its:term-no" for denoting instances where you have its:term="no".
  • its:termInfoRef is mapped to the XLIFF ref attribute.
  • its:termConfidence is mapped to itsxlf:termConfidence.
  • When itsxlf:termConfidence is used, the annotated text must be contained within an element with a relevant its:annotatorsRef.
  • The XLIFF value attribute can be used to store information denoted by the global rule attribute its:termInfoPointer.

Note: If needed, the value of the ITS termInfoRef attribute must be adjusted to point to a resource accessible from the XLIFF document. The location and format of this resource is decided by the tool creating the XLIFF document.

Original:

<p>Text with a <span its-term='yes' its-term-info-ref='http://en.wikipedia.org/wiki/Terminology'
 its-term-confidence='0.9'>term</span>.</p>

Extracted:

<unit id='1' its:annotatorsRef='terminology|http://www.cngl.ie/termchecker'>
 <segment>
  <source>Text with a <pc id='1'><mrk id='m1' type='term'
  ref='http://en.wikipedia.org/wiki/Terminology'
  itsxlf:termConfidence='0.9'>term</mrk></pc>.</source>
 </segment>
</unit>

Directionality (==========TODO)

Provides information about the text directionality of the content.
See http://www.w3.org/TR/its20/#directionality for more details.

Structural Elements

TODO

Inline Elements

TODO

Language Information (==========TO REVIEW)

Expresses the language of a given piece of content.
See http://www.w3.org/TR/its20/#language-information for more details.

Structural Elements

Because XLIFF documents are normally monolingual in the source, whole paragraphs of the source document that are not in the main source language should generally not be extracted.

If there is a need to extract such content, the XLIFF output should use an inline <mrk> element to enclose the content that is in a language other than the main source language of the document.

Inline Elements

Use the attribute xml:lang in <mrk>.

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>My Document</title>
 </head>
 <body>
  <p>Span of text <span lang="fr">en français</span>.</p>
 </body>
</html>

Extraction:

...
<unit id='2'>
 <segment>
  <source>Span of text <pc id='1'><mrk id="m1" xml:lang="fr" type='its:any'
   >en français</mrk></pc>.</source>
 </segment>
</unit>
...

Element Within Text (==========TO REVIEW)

Indicates if an element should be treated as part of a text flow, or as a separate "paragraph".
See http://www.w3.org/TR/its20/#elements-within-text for more details.

This data category is not used directly in XLIFF, but it drives what XLIFF element is used to represent the original element in the extracted document:

  • withinText='no': Use <unit>
  • withinText='yes': Use an inline element such as <pc>, <sc>/<ec> or <ph>.
  • withinText='nested': Use a separate <unit>.
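
The decision above can be sketched as a simple lookup. This is only an illustration: the function name and the labels it returns are hypothetical and are not defined by ITS or XLIFF.

```python
# Hypothetical labels, chosen for this sketch only.
WITHIN_TEXT_TO_XLIFF = {
    "no": "unit",             # structural element: extracted as its own <unit>
    "yes": "inline",          # part of the text flow: <pc>, <sc>/<ec> or <ph>
    "nested": "nested-unit",  # extracted into a separate <unit>
}


def xliff_representation(within_text: str) -> str:
    """Return the kind of XLIFF construct to use for a withinText value."""
    try:
        return WITHIN_TEXT_TO_XLIFF[within_text]
    except KeyError:
        raise ValueError(f"invalid withinText value: {within_text!r}") from None
```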

Domain (==========TO REVIEW)

Identifies the topic or subject of a given content.
See http://www.w3.org/TR/its20/#domain for more details.

Structural Elements

Use the attribute itsxlf:domains:

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: Domain</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
    xmlns:h="http://www.w3.org/1999/xhtml">
    <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class"
     domainMapping="dom1 domain1" />
   </its:rules>
  </script>
 </head>
 <body>
  <p class="dom1">Text in the domain domain1</p>
 </body>
</html>

Extraction:

...
<unit id='2' itsxlf:domains='domain1'>
 <segment>
  <source>Text in the domain domain1</source>
 </segment>
</unit>
...

Inline Elements

Use the attribute itsxlf:domains in <mrk>:

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: Domain</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
    xmlns:h="http://www.w3.org/1999/xhtml">
    <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class"
     domainMapping="dom1 domain1" />
   </its:rules>
  </script>
 </head>
 <body>
  <p>Span of text <span class="dom1">in the domain domain1</span></p>
 </body>
</html>

Extraction:

...
<unit id='2'>
 <segment>
  <source>Span of text <pc id='1'><mrk id='m1' type='its:any' itsxlf:domains='domain1'
  >in the domain domain1</mrk></pc></source>
 </segment>
</unit>
...

Text Analysis (==========TO REVIEW)

Annotates content with lexical or conceptual information for the purpose of contextual disambiguation.
See http://www.w3.org/TR/its20/#textanalysis for more details.

Structural Elements

It is not recommended that Text Analysis be used at a structural level.

If a structural element of the original document has a Text Analysis annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element.

Original:

<p its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place"
   its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</p>

Extraction:

<unit id="1">
 <segment>
  <source><mrk id="m1" type="its:any"
               its:taClassRef="http://nerd.eurecom.fr/ontology#Place"
               its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source>
 </segment>
</unit>

Inline Elements

Use the ITS attributes in the <mrk> element.

If its:taConfidence is used, the annotated text must be contained within an element with a relevant its:annotatorsRef.

Original:

<div its-annotators-ref="text-analysis|http://enrycher.ijs.si">
...
 <p><span its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place"
         its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</span></p>
...
</div>

Extraction:

<unit id="1" its:annotatorsRef="text-analysis|http://enrycher.ijs.si">
 <segment>
  <source><mrk id="m1" type="its:any"
               its:taClassRef="http://nerd.eurecom.fr/ontology#Place"
               its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source>
 </segment>
</unit>

Locale Filter (==========TO REVIEW)

Specifies that content is applicable only to certain locales.
See http://www.w3.org/TR/its20/#LocaleFilter for more details.

Structural Elements

When the Target Locale in XLIFF is Undefined

Use ITS attributes:

Original:

<p its-locale-filter-list='fr'>Text A</p>
<p its-locale-filter-list='ja'>Text B</p>

Extraction:

<xliff srcLang='en' ...>
... 
<unit id='1' its:localeFilterList='fr'>
 <segment>
  <source>Text A</source>
 </segment>
</unit>
<unit id='2' its:localeFilterList='ja'>
 <segment>
  <source>Text B</source>
 </segment>
</unit>

When the Target Locale in XLIFF is Defined

Use the translate attribute (yes if the target locale applies, no if it does not). It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target):

Original:

<p its-locale-filter-list='fr'>Text A</p>
<p its-locale-filter-list='ja'>Text B</p>

Extraction:

<xliff srcLang='en' trgLang='fr' ...>
... 
<unit id='1' translate='yes' its:localeFilterList='fr'>
 <segment>
  <source>Text A</source>
 </segment>
</unit>
<unit id='2' translate='no' its:localeFilterList='ja'>
 <segment>
  <source>Text B</source>
 </segment>
</unit>

If the entry does not apply to the defined target locale you can also simply not extract it.
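The choice of the translate value from the locale filter can be sketched as follows. This is a minimal illustration, not a complete implementation: the function names are hypothetical, and the matching is a simplified case-insensitive prefix match rather than the full BCP 47 extended filtering that ITS 2.0 refers to (a range such as "*-CH" is not handled).

```python
def locale_applies(target_locale, filter_list="*", filter_type="include"):
    """Return True if content with this locale filter applies to target_locale.

    Simplified matching (an assumption of this sketch): a range matches when
    it is "*", equals the target tag, or is a prefix of it at a subtag boundary.
    """
    if filter_type not in ("include", "exclude"):
        raise ValueError(f"invalid localeFilterType: {filter_type!r}")
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    tag = target_locale.lower()
    matched = any(r == "*" or tag == r or tag.startswith(r + "-") for r in ranges)
    return matched if filter_type == "include" else not matched


def translate_value(target_locale, filter_list="*", filter_type="include"):
    """Map the locale filter to the value of the XLIFF translate attribute."""
    return "yes" if locale_applies(target_locale, filter_list, filter_type) else "no"
```

With trgLang='fr', translate_value("fr", "fr") gives the value used for "Text A" above and translate_value("fr", "ja") gives the value used for "Text B".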

Inline Elements

When the Target Locale in XLIFF is Undefined

Use the <mrk> element with the original ITS attributes:

Original:

<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>

Extraction:

<xliff srcLang='en' ...>
... 
<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' type='its:any'
   its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source>
 </segment>
</unit>

When the Target Locale in XLIFF is Defined

Use the <mrk> element with translate='yes' if the target does apply or translate='no' if it does not. It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target).

Original:

<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>

Extraction:

<xliff srcLang='en' trgLang='fr' ...>
... 
<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' type='its:any' translate='no'
   its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source>
 </segment>
</unit>

If the content does not apply to the defined target locale you can also simply replace it with an inline code.

Provenance (==========TO REVIEW)

Communicates the identity of agents that have been involved in the translation of the content or the revision of the translated content.
See http://www.w3.org/TR/its20/#provenance for more details.

Structural Elements

The Provenance data category can be used on <file>, <group> and <unit>.

If a standoff element is needed (because the annotated element has more than one set of provenance attributes), the <its:provenanceRecords> element must be located in the same element as the one where the reference is declared.

<unit id='1' its:provenanceRecordsRef="#its=prov1">
 <its:provenanceRecords xml:id="prov1">
  <its:provenanceRecord person="John Doe"/>
  <its:provenanceRecord revPerson="John Smith"/>
 </its:provenanceRecords>
...

Inline Elements

To annotate the source or the target content, use the <mrk> element with the ITS attributes. If a standoff <its:provenanceRecords> element is used, it must be located in the same <unit> as the one where the inline reference is declared.

<unit id='1'>
 <its:provenanceRecords xml:id="prov1">
  <its:provenanceRecord person="John Doe"/>
  <its:provenanceRecord revPerson="John Smith"/>
 </its:provenanceRecords>
 <segment>
  <source>Some text</source>
  <target><mrk id='m1' type='its:any' its:provenanceRecordsRef="#its=prov1">Some text</mrk></target>
 </segment>
</unit>

External Resource (==========TODO)

Indicates that a node represents or references potentially translatable data in a resource outside the document.
See http://www.w3.org/TR/its20/#externalresource for more details.

Structural Elements

Use the attribute itsxlf:externalResourceRef in <trans-unit>:

Original:

<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its"
 xmlns:html="http://www.w3.org/1999/xhtml">
 <its:externalResourceRefRule selector="//html:video/@src"
  externalResourceRefPointer="."/>
 <its:externalResourceRefRule selector="//html:video/@poster"
  externalResourceRefPointer="."/>
</its:rules> ..
<video
 height=360
 poster=video-image.png
 src=http://www.example.com/video/v2.mp
 width=640>

Extraction:

...
<trans-unit id='2' itsxlf:externalResourceRef="http://www.example.com/video/v2.mp">
...

Inline Elements

Use the attribute itsxlf:externalResourceRef in the inline element that holds the reference (e.g. <x/> or <ph>):

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: External Resource</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" xmlns:h="http://www.w3.org/1999/xhtml">
    <its:externalResourceRefRule selector="//h:img" externalResourceRefPointer="@src"/>
   </its:rules>
  </script>
 </head>
 <body>
  <p>Image: <img src="example.png" alt="Text for the image"></p>
 </body>
</html>

Extraction:

...
<trans-unit id='3'>
 <source>Image: <x id='1' itsxlf:externalResourceRef="example.png"/></source>
</trans-unit>
...

Target Pointer (==========TO REVIEW)

Provides a way to associate the node of a given source content (i.e. the content to be translated) and the node of its corresponding target content.
See http://www.w3.org/TR/its20/#target-pointer for more details.

This data category is not mapped to XLIFF but used by extracting and merging tools to get the source content from the original document and put back the translated content at its proper location.

Note that ITS processors working on XLIFF documents should use the following rule to locate the source and target content:

<its:targetPointerRule selector="//xlf:source" targetPointer="../xlf:target"/>

Id Value (==========TO REVIEW)

Indicates a value that can be used as unique identifier for a given part of the content.
See http://www.w3.org/TR/its20/#idvalue for more details.

Structural Elements

Use the name attribute in <unit>:

Original:

<p id='p1'>Text of the paragraph.</p>

Extraction:

<unit id='1' name='p1'>
 <segment>
  <source>Text of the paragraph.</source>
 </segment>
</unit>

Inline Elements

The Id Value data category is not mapped to inline codes.

Preserve Space (==========TO REVIEW)

Indicates how whitespace should be handled in a given content.
See http://www.w3.org/TR/its20/#preservespace for more details.

Structural Elements

Whitespace handling at the structural level is indicated with xml:space in XLIFF 2:

Original:

<listing xml:space='preserve'>Line 1
Line 2</listing>

Extraction:

<unit id='1' xml:space='preserve'>
 <segment>
  <source>Line 1
Line 2</source>
 </segment>
</unit>

Inline Elements

Use the attribute xml:space in <mrk>.

Original:

<para>Normal text and
 <span xml:space="preserve">preserved spaces: [   ]</span>.
</para>

Extraction:

<unit id='1'>
 <segment>
  <source>Normal text and <pc id='1'><mrk
   xml:space="preserve" type='its:any'>preserved spaces: [   ]</mrk></pc>.</source>
 </segment>
</unit>

Note that, currently, few localization applications will honor preserving whitespace for only a given span of text.

Localization Quality Issue (==========TO REVIEW)

Expresses information related to localization quality assessment tasks.
See http://www.w3.org/TR/its20/#lqissue for more details.

Structural Elements

Localization Quality Issue annotations may be used to annotate the source or the target content within a <unit> element. This is done with the <mrk> element. See below for details.

Inline Elements

The ITS attributes for Localization Quality Issue may be used inline with a <mrk> within a <source> or <target> element in a <unit> element. For example, for a single instance of the Localization Quality Issue data category:

<unit id="1">
 <segment>
  <source>This is the content</source>
  <target><mrk id="m1" type="its:any" its:locQualityIssueType="misspelling"
               its:locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
               its:locQualityIssueSeverity="50">c'es</mrk> le contenu</target>
 </segment>
</unit>

When needed, a stand-off notation can be used; it is located at the unit's extension point (before the first <segment> element). Note that the reference must use XLIFF's fragment identifier syntax. The fragment identifier prefix for the ITS module/extension is its.

<unit id="1">
 <its:locQualityIssues xml:id="lqi1">
  <its:locQualityIssue 
       locQualityIssueType="misspelling"
       locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
       locQualityIssueSeverity="50" />
  <its:locQualityIssue 
       locQualityIssueType="grammar"
       locQualityIssueComment="Sentence is not capitalized"
       locQualityIssueSeverity="20" />
 </its:locQualityIssues>
 <segment>
  <source>This is the content</source>
  <target><mrk id="m1" type="its:any"
               its:locQualityIssuesRef="#its=lqi1">c'es le contenu</mrk></target>
 </segment>
</unit>
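As an illustration, a reference of the form #its=lqi1 could be resolved within its unit as sketched below. This is a hedged sketch, not a reference implementation: the function name resolve_issues_ref and the sample unit are assumptions made for this example, only the short #its=ID form is handled, and the lookup is limited to the direct children of the given unit.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_issues_ref(unit, ref):
    """Return the attributes of each issue in the stand-off element named by ref.

    Hypothetical helper: supports only references of the form '#its=ID'.
    """
    prefix = "#its="
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported reference: {ref!r}")
    wanted = ref[len(prefix):]
    # Stand-off elements are expected as direct children of the <unit>.
    for holder in unit.findall(f"{{{ITS_NS}}}locQualityIssues"):
        if holder.get(XML_ID) == wanted:
            return [dict(issue.attrib)
                    for issue in holder.findall(f"{{{ITS_NS}}}locQualityIssue")]
    return []


# Sample data for this sketch only (abridged from the example above).
SAMPLE_UNIT = """<unit xmlns="urn:oasis:names:tc:xliff:document:2.0"
      xmlns:its="http://www.w3.org/2005/11/its" id="1">
 <its:locQualityIssues xml:id="lqi1">
  <its:locQualityIssue locQualityIssueType="misspelling" locQualityIssueSeverity="50"/>
  <its:locQualityIssue locQualityIssueType="grammar" locQualityIssueSeverity="20"/>
 </its:locQualityIssues>
 <segment><source>This is the content</source></segment>
</unit>"""

issues = resolve_issues_ref(ET.fromstring(SAMPLE_UNIT), "#its=lqi1")
```

Here issues holds one dictionary per <its:locQualityIssue> element, in document order.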

Localization Quality Rating (==========TODO)

Expresses an overall measurement of the localization quality of a document or an item in a document.
See http://www.w3.org/TR/its20/#lqrating for more details.

Structural Elements

Use the ITS attributes to annotate a <file>, <group>, <trans-unit> or <alt-trans> element.

<trans-unit id="1" its:locQualityRatingScore="100"
 its:locQualityRatingScoreThreshold="95"
 its:locQualityRatingProfileRef="http://example.org/qaModel/v13">
 <source>text</source>
 <target>texte</target>
</trans-unit>

Inline Elements

Use the ITS attributes of Localization Quality Rating to annotate a segment (<mrk mtype="seg">) or a given span of the content (<mrk mtype="x-its">).

<trans-unit id="1">
 <source>Some text</source>
 <seg-source><mrk mtype="seg" mid="1">Some text</mrk></seg-source>
 <target><mrk mtype="seg" mid="1"
              its:locQualityRatingScore="0.56"
              its:locQualityRatingScoreThreshold="95"
 >Du texte</mrk></target>
</trans-unit>
<trans-unit id="2">
 <source>Some text and a term</source>
 <target>Du texte et un <mrk mtype="x-its" its:locQualityRatingVote="100"
 its:locQualityRatingVoteThreshold="95"
 its:locQualityRatingProfileRef="http://example.org/qaModel/v13">terme</mrk></target>
</trans-unit>

MT Confidence (==========TO REVIEW)

Communicates the self-reported confidence score from a machine translation engine of the accuracy of a translation it has provided.
See http://www.w3.org/TR/its20/#mtconfidence for more details.

Structural Elements

It is not recommended that MT Confidence be used at a structural level.

If a structural element of the original document has an MT Confidence annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. See the Inline Elements section below for details.

The MT Confidence score must be within the scope of a corresponding its:annotatorsRef attribute.

In the match element

The MT Confidence data category can also be used on the <match> element of the Translation Candidates module. In that case, use the matchQuality attribute to store the value. You must multiply the value by 100, because the scale of matchQuality is [0.0 to 100.0] while the scale of MT Confidence is [0.0 to 1.0].

<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" 
       xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0"
       xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"
       version="2.0" srcLang="en" trgLang="fr">
<file id="f1" its:annotatorsRef="mt-confidence|MTServices-XYZ">
 <unit id="1">
  <mtc:matches>
   <!-- Score provided by MTServices-XYZ -->
   <mtc:match ref="#m1" matchQuality="89.82">
    <source>Text</source>
    <target>Texte</target>
   </mtc:match>
   <!-- Score provided by MTProvider-ABC -->
   <mtc:match ref="#m1" matchQuality="67.8"
              its:annotatorsRef="mt-confidence|MTProvider-ABC">
    <source>Text</source>
    <target>Texte</target>
   </mtc:match>
   <!-- Score provided by MTProvider-JKL -->
   <mtc:match ref="#m1" matchQuality="65"
             its:annotatorsRef="mt-confidence|MTProvider-JKL">
    <source>Text</source>
    <target>texte</target>
   </mtc:match>
   <!-- Score provided by MTServices-XYZ -->
   <mtc:match ref="#m1" matchQuality="89.82">
    <source>Some text</source>
    <target>Du texte</target>
   </mtc:match>
  </mtc:matches>
  <segment>
   <source><mrk id='m1' type='mtc:match'>Text</mrk></source>
  </segment>
 </unit>
</file>
</xliff>

Note that matchQuality cannot be mapped to ITS MT Confidence directly as no its:mtConfidencePointer is defined in ITS 2.0.
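The scale adjustment between its:mtConfidence and matchQuality can be sketched as below. The function name is hypothetical, and the use of decimal arithmetic is a choice made for this illustration (it avoids binary floating-point artifacts such as 89.82000000000001); it is not something required by ITS or XLIFF.

```python
from decimal import Decimal


def mt_confidence_to_match_quality(mt_confidence: str) -> str:
    """Convert an its:mtConfidence value (0.0 to 1.0) to a matchQuality value (0.0 to 100.0)."""
    score = Decimal(mt_confidence)
    if not (Decimal(0) <= score <= Decimal(1)):
        raise ValueError(f"mtConfidence out of range: {mt_confidence!r}")
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return format((score * 100).normalize(), "f")
```

For instance, a confidence of 0.8982 corresponds to the matchQuality="89.82" used in the example above.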

Inline Elements

Use the its:mtConfidence attribute on the <mrk> element.

<target>
 <mrk id="m1" type="its:any"
    its:mtConfidence="0.8982"
    its:annotatorsRef="mt-confidence|MTServices-XYZ"
 >Some translated text</mrk>
</target>

Allowed Characters (==========TODO)

Specifies the characters that are permitted in a given piece of content.
See http://www.w3.org/TR/its20/#allowedchars for more details.

Note: This data category is not mapped to the charclass attribute of XLIFF 1.2 because the placement and semantics of that attribute do not match exactly the ones of its:allowedCharacters.

Structural Elements

Use the ITS attribute on the <source> and <target> elements:

<trans-unit id="1">
 <source its:allowedCharacters="[a-zA-Z]">Text</source>
</trans-unit>

Inline Elements

Use the ITS attribute on the <mrk> element:

<trans-unit id="1">
 <source>user name: 
  <g id="1"><mrk mtype="x-its" its:allowedCharacters="[a-zA-Z]">johnDoe</mrk></g>.</source>
</trans-unit>

Storage Size (==========TODO)

Specifies the maximum storage size of a given content.
See http://www.w3.org/TR/its20/#storagesize for more details.

Structural Elements

Use the ITS attributes for Storage Size on the <source> and <target> elements:

<trans-unit id="1">
 <source its:storageSize="12"
         its:storageEncoding="UTF-16"
         its:lineBreakType="crlf">Text</source>
</trans-unit>

Inline Elements

Use the ITS attributes for Storage Size on the <mrk> element:

<trans-unit id="1">
 <source><mrk its:storageSize="8"
              its:storageEncoding="UTF-16" mtype="x-its">CONTINUE</mrk></source>
</trans-unit>

References