Warning:
This wiki has been archived and is now read-only.

XLIFF 2.0 Mapping

1 Introduction
2 Implementing and testing the mapping
3 General considerations for ITS 2.0 and XLIFF
4 Data Categories Existing in XLIFF [TRANSFERRED TO XLIFF 2.1 Draft]
- 4.1 Translate [TRANSFERRED TO XLIFF 2.1 Draft]
  - 4.1.1 Structural Elements
  - 4.1.2 For inline elements
- 4.2 Preserve Space [TRANSFERRED TO XLIFF 2.1 Draft]
  - 4.2.1 Structural Elements
  - 4.2.2 Inline Elements
5 Data Categories Partially Covered in XLIFF
6 Data Categories Represented Using ITS Itself
7 Data Categories Not Representing Metadata [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
8 Data Categories Not Mapped Yet
9 References

Introduction

Development of this mapping is being transferred to the OASIS XLIFF TC, where it is becoming a normative part of the planned XLIFF 2.1 release. Parts that have been transitioned to the XLIFF TC are marked as such and should not be developed further here. This document provides a recommendation on how the ITS 2.0 data categories are represented in XLIFF 2.
For the mapping between ITS 2.0 and XLIFF 1.2 see the page "XLIFF 1.2 Mapping".

Notes:

Please use the IG mailing list (http://lists.w3.org/Archives/Public/public-i18n-its-ig/) for discussing this topic.
The 'structural' entries relate to the cases where the element with the ITS information is a non-inline (structural) element, for example a <p> in HTML.
The 'inline' entries relate to the cases where the element with the ITS information is an inline element, for example a <span> in HTML.
The prefix itsxlf refers to the namespace http://www.w3.org/ns/its-xliff/

ITS data categories can be classified into several categories in XLIFF:

Data Categories Existing in XLIFF
Data Categories Partially Covered in XLIFF
Data Categories Represented Using ITS Itself
Data Categories Not Representing Metadata
Because this document is still being defined, there are also: Data Categories Not Mapped Yet

Implementing and testing the mapping

General implementation and testing considerations

This section is a stub. Feel free to complete it by providing e.g. these ideas:

What input files are needed: XLIFF, general XML, HTML5?
- XLIFF 2 documents, and I suppose (to see if an extractor supports ITS too): HTML5 or XML documents

What output is needed: XLIFF only?
- For the extraction case: the XLIFF output
- But I suppose some kind of comparable text format would be ideal. I'm not sure the same XPath-based format we used for ITS would be best here, as XLIFF processors may be using very different ways to process the document. Maybe something using the ID of the object rather than the path would be better?

How would the conformance of the output to mappings be tested?
- Ideally by comparing the gold output to the tool's output

What would be a good location for the test files: a GitHub repository or an XLIFF / ITS group-specific location? Advantage: many people can contribute
- GitHub would be fine. I'm guessing there may be some call for hosting this in the OASIS SVN too.

Would we need to require a preprocessing of XLIFF files so that general ITS processors understand them? See the related thread.

Types of processors

( Draft section, taken from http://lists.w3.org/Archives/Public/public-i18n-its-ig/2014Oct/0016.html )

Tools that process the mapping:

An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document.

Having the ITS Module in its own namespace has the advantage that if ITS itself changes and those changes affect the mapping, one can create an updated version of the ITS module.

Rules file for the mapping

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
 xmlns:xlf="urn:oasis:names:tc:xliff:document:2.0"
 xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
 <its:translateRule selector="/xlf:xliff" translate="no"/>
 <its:translateRule selector="//xlf:source" translate="yes"/>
 <its:targetPointerRule selector="//xlf:source" targetPointer="../xlf:target"/>
 <its:translateRule selector="//xlf:target" translate="yes"/>
 <its:withinTextRule withinText="yes" selector="//xlf:ph|//xlf:pc|//xlf:sc|//xlf:ec|//xlf:mrk|//xlf:sm|//xlf:em"/>
 <its:allowedCharactersRule selector="//xlf:mrk" allowedCharactersPointer="@itsm:allowedCharacters"/>
 <its:domainRule selector="//xlf:mrk[@type='any' and @itsm:domains]" domainPointer="@itsm:domains"/>
</its:rules>

General considerations for ITS 2.0 and XLIFF

Tools that make use of ITS 2.0 and XLIFF

We assume three types of tools that make use of ITS 2.0 information in relation to XLIFF.

An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document.

The remainder of this section shows capabilities that these tools need to implement. Each data category section in the ITS 2.0 module describes specifics of its usage in XLIFF.

Re-writing the ITS namespace

For using several ITS 2.0 data categories locally in XLIFF, the ITS 2.0 attributes need to be written with a dedicated namespace. The namespace URI is

urn:oasis:names:tc:xliff:itsm:2.1

It is a best practice to use the namespace prefix itsm.

The semantics of the attributes are analogous to those of their counterparts in the W3C ITS namespace (http://www.w3.org/2005/11/its), where such counterparts exist. The main semantic difference between its and itsm attributes is that itsm attributes can apply to non-well-formed spans that are delimited by the empty boundary markers <sm/>/<em/>.

The namespace urn:oasis:names:tc:xliff:itsm:2.1 is used, among other reasons, because of XLIFF 2.x validation constraints.

Note YS: it's also because the ITS namespace at times needs to be complemented: when a data category uses the XLIFF markup and is missing some features (we would not be able to use the ITS namespace for this); and when ITS local rules are missing things, like a domain attribute.
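
A minimal sketch of the namespace re-writing described above, in Python with the standard xml.etree.ElementTree module. The function name rewrite_its_attributes and the choice of which attributes to move are assumptions made for illustration; this page does not define which local attributes a tool has to re-write.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ITSM_NS = "urn:oasis:names:tc:xliff:itsm:2.1"


def rewrite_its_attributes(element, local_names):
    """Move the listed ITS attributes of `element` into the itsm namespace.

    `local_names` is chosen by the caller (hypothetical parameter).
    """
    for local in local_names:
        old_key = f"{{{ITS_NS}}}{local}"
        if old_key in element.attrib:
            # Same local name and value, different namespace.
            element.set(f"{{{ITSM_NS}}}{local}", element.attrib.pop(old_key))
    return element


# Example: an annotation written with the W3C ITS namespace.
mrk = ET.fromstring(
    '<mrk xmlns:its="http://www.w3.org/2005/11/its" id="m1" '
    'its:allowedCharacters="[a-z]"/>'
)
rewrite_its_attributes(mrk, ["allowedCharacters"])
```

After the call, the value is only reachable under the itsm namespace; attributes that are not listed are left untouched.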

Handling of ITS Tools Annotation

ITS 2.0 provides a tools annotation mechanism. It identifies the processor that generates ITS information. This information is mandatory for the MT Confidence data category and optional for other data categories. It is mandatory for Terminology and Text Analysis if these provide confidence information.

tbd: what is special about handling this in XLIFF?

Note YS: nothing really, only that it has to handle the sm/em case too

Handling of overlap

In XLIFF, ITS information may, among other places, be applied to mrk elements. If the ITS information is applied to pairs of sm and em elements, it may overlap with other elements. In that case the normal ITS mechanism of data category inheritance for element nodes cannot be applied, because it would apply to the empty content of sm or em, not to the content between sm and the corresponding em.

An ITS processor, before processing an XLIFF file, needs to do the following steps.

1) Change all pc elements to sc and ec elements. This is needed to handle proper inheritance of ITS 2.0 information. Example:

<sm id='1' type='term' ref="http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois"/>French <pc id='2'>Canadian
hockey</pc><em startRef='1'/>

Note YS: Not sure about the spans for 'Quebecois': it seems it should annotate only 'French Canadian' not 'French Canadian hockey', no? And it would still be a good example.

Will be changed to

<sm id='1' type='term' ref="http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois"/>French <sc id='2'/>Canadian
hockey<ec startRef='2'/><em startRef='1'/>

2) Change non-overlapping sm and em to mrk elements.

a) Set the current content to the whole content to be processed.
b) Is there an sm tag in the current content? If so, output the text before the sm tag and do c); otherwise just output all text in the current content.
c) Does the sm tag have an em tag with a corresponding id? If so, create a mrk node, set the content between sm and em as the new current content, and do b). Otherwise discard the sm tag and go to b).
d) Output the rest of the text.

Example:

<sm id='1' type='term' ref="http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois"/>French <sc id='2'/>Canadian
hockey<ec startRef='2'/><em startRef='1'/>

Will be changed to

<mrk id='1' type='term' ref="http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois">French <sc id='2'/>Canadian
hockey<ec startRef='2'/></mrk>

Note that this step cannot be done for overlapping markup, due to element hierarchy constraints in XML. This is identical to the case 3 in NIF2ITS conversion.

3) If there are two mrk elements that contain the same ITS information, create a global rule that identifies them. Example:

<mrk id="m1" type="dbp:entity" ref="http://www.wikidata.org/wiki/Q1187234">Port Metro of <mrk id="m2" type="oc:entity/City"
value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver">Vancouver</mrk></mrk><mrk id="m2bis" type="oc:entity/City"
value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver"> City</mrk>

Will have the global rule (using the XLIFF namespace as default namespace in XPath): tbd
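
Steps 1) and 2) above can be sketched in Python with xml.etree.ElementTree as follows. This is only an illustration under stated assumptions: the function names expand_pc and wrap_markers are hypothetical, only the id of a pc is carried over to the sc/ec pair, only sm/em markers that are siblings in the same parent are considered, and unmatched or overlapping markers are left in place rather than discarded.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
PC, SC, EC, SM, EM, MRK = (
    f"{{{XLF}}}{name}" for name in ("pc", "sc", "ec", "sm", "em", "mrk")
)


def expand_pc(elem):
    """Step 1: replace every <pc> below `elem` by an <sc>/<ec> pair."""
    flat = []
    for child in list(elem):
        expand_pc(child)                  # handle nested <pc> first
        if child.tag == PC:
            sc = ET.Element(SC, {"id": child.get("id")})
            sc.tail = child.text          # text right after the opening tag
            ec = ET.Element(EC, {"startRef": child.get("id")})
            ec.tail = child.tail          # text right after the closing tag
            flat += [sc, *child, ec]
        else:
            flat.append(child)
    elem[:] = flat


def _self_contained(inner):
    """True if every sm/em pair touching `inner` lies completely inside it."""
    opened = {e.get("id") for e in inner if e.tag == SM}
    closed = {e.get("startRef") for e in inner if e.tag == EM}
    return opened == closed


def wrap_markers(parent):
    """Step 2: turn non-overlapping sibling <sm>/<em> pairs into <mrk>."""
    children = list(parent)
    for i, start in enumerate(children):
        if start.tag != SM:
            continue
        j = next((k for k in range(i + 1, len(children))
                  if children[k].tag == EM
                  and children[k].get("startRef") == start.get("id")), None)
        if j is None or not _self_contained(children[i + 1:j]):
            continue                      # unmatched or overlapping: keep as is
        mrk = ET.Element(MRK, dict(start.attrib))
        mrk.text = start.tail
        mrk.extend(children[i + 1:j])
        mrk.tail = children[j].tail
        parent[i:j + 1] = [mrk]
        wrap_markers(mrk)                 # pairs nested inside the new <mrk>
        wrap_markers(parent)              # rescan, sibling positions changed
        return


source = ET.fromstring(
    f'<source xmlns="{XLF}"><sm id="1" type="term"/>French '
    '<pc id="2">Canadian hockey</pc><em startRef="1"/></source>'
)
expand_pc(source)
wrap_markers(source)
```

For the example used in this section, the source ends up with a single mrk child that contains the sc/ec pair, which is the shape shown after step 2).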

Data Categories Existing in XLIFF [TRANSFERRED TO XLIFF 2.1 Draft]

Translate [TRANSFERRED TO XLIFF 2.1 Draft]

Indicates whether content is translatable or not.
See http://www.w3.org/TR/its20/#trans-datacat for details.

Structural Elements

Use the translate attribute:

Original:

<p translate='yes|no'>Text</p>

Extraction:

<unit id='1' translate='yes|no'>
 <segment>
  <source>Text</source>
 </segment>
</unit>

If the element is not translatable you can also simply not extract it.

For inline elements

Use <mrk> with translate='yes|no'. A fall-back option is to extract the non-translatable content as an inline code.

Original:

<p>Text <code translate='no'>Code</code></p>

Extraction:

<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' translate='no'>Code</mrk></pc></source>
 </segment>
</unit>

Fall-back extraction (the non-translatable content becomes an inline code):

<unit id='1'>
 <segment>
  <source>Text <ph id='1'/></source>
 </segment>
</unit>

Preserve Space [TRANSFERRED TO XLIFF 2.1 Draft]

Indicates how whitespace should be handled in a given content.
See http://www.w3.org/TR/its20/#preservespace for more details.

Structural Elements

Whitespace handling at the structural level is indicated with xml:space in XLIFF 2:

Original:

<listing xml:space='preserve'>Line 1
Line 2</listing>

Extraction:

<unit id='1' xml:space='preserve'>
 <segment>
  <source>Line 1
Line 2</source>
 </segment>
</unit>

Inline Elements

Use the attribute xml:space in <mrk>.

Original:

<para>Normal text and
 <span xml:space="preserve">preserved spaces: [   ]</span>.
</para>

Extraction:

<unit id='1'>
 <segment>
  <source>Normal text and <pc id='1'><mrk id='m1'
   xml:space="preserve" type='its:any'>preserved spaces: [   ]</mrk></pc>.</source>
 </segment>
</unit>

Note that, currently, few localization applications will honor preserving whitespace for only a given span of text.

Data Categories Partially Covered in XLIFF

Localization Note [TRANSFERRED TO XLIFF 2.1 Draft]

Provides a way to communicate notes to localizers about a particular item of content.
See http://www.w3.org/TR/its20/#locNote-datacat for more details.

Structural Elements

TODO

Inline Elements

TODO

Terminology [TRANSFERRED TO XLIFF 2.1 Draft]

Marks terms and optionally associates them with information, such as definitions.
See http://www.w3.org/TR/its20/#terminology for more details.

Structural Elements

It is recommended to map terminology information that appears on a structural element in the original document by using an inline <mrk> element.

Original:

<p its-term='yes'>Term</p>

Extracted:

<unit id='1'>
 <segment>
  <source><mrk id="m1" type='term'>Term</mrk></source>
 </segment>
</unit>

Inline Elements

In XLIFF 2 terms are denoted using <mrk type='term'>:

Original:

<p>Text with a <span its-term='yes'>term</span>.</p>

Extracted:

<unit id='1'>
 <segment>
  <source>Text with a <pc id='1'><mrk id="m1" type='term'>term</mrk></pc>.</source>
 </segment>
</unit>

Use type="its:term-no" to denote instances of its:term="no".
its:termInfoRef is mapped to the XLIFF ref attribute.
its:termConfidence is mapped to itsxlf:termConfidence.
When itsxlf:termConfidence is used, the annotated text MUST be contained within an element with a relevant its:annotatorsRef.
The XLIFF value attribute can be used to store information denoted by the global rule attribute its:termInfoPointer.

WARNING: TBD: the XLIFF 2 specification allows ref and value to be both set at the same time. ITS 2.0 does not allow an info and an info-ref to be set at the same time. So we have to decide something for this case.

Note: If needed, the value of the ITS termInfoRef attribute is to be adjusted to point to a resource accessible from the XLIFF document. The location and format of this resource is decided by the tool creating the XLIFF document.

Original:

<p>Text with a <span its-term='yes' its-term-info-ref='http://en.wikipedia.org/wiki/Terminology'
 its-term-confidence='0.9'>term</span>.</p>

Extracted:

<unit id='1' its:annotatorsRef='terminology|http://www.cngl.ie/termchecker'>
 <segment>
  <source>Text with a <pc id='1'><mrk id='m1' type='term'
  ref='http://en.wikipedia.org/wiki/Terminology'
  itsxlf:termConfidence='0.9'>term</mrk></pc>.</source>
 </segment>
</unit>
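
The attribute mapping listed in this section can be summarised in a small Python helper. This is a hedged sketch: the function name term_attributes and the dictionary keys used for the input are hypothetical, and the value/termInfoPointer case is left out because it is still marked TBD above.

```python
def term_attributes(its):
    """Map local ITS Terminology values to attributes for an XLIFF 2 <mrk>.

    `its` is a plain dict such as {"term": "yes", "termInfoRef": "..."}
    (hypothetical input format).
    """
    if its.get("term") not in ("yes", "no"):
        raise ValueError("the ITS term value must be 'yes' or 'no'")
    attrs = {"type": "term" if its["term"] == "yes" else "its:term-no"}
    if "termInfoRef" in its:
        # its:termInfoRef is carried by the XLIFF ref attribute.
        attrs["ref"] = its["termInfoRef"]
    if "termConfidence" in its:
        # Needs a relevant its:annotatorsRef on an enclosing element.
        attrs["itsxlf:termConfidence"] = its["termConfidence"]
    return attrs


example = term_attributes({
    "term": "yes",
    "termInfoRef": "http://en.wikipedia.org/wiki/Terminology",
    "termConfidence": "0.9",
})
```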

Language Information [TRANSFERRED TO XLIFF 2.1 Draft]

Expresses the language for a given content.
See http://www.w3.org/TR/its20/#language-information for more details.

An XLIFF document is bilingual and defines the source and target language of its payload using the srcLang and trgLang attributes in the <xliff> element. By default, those languages apply to the content of <source> and <target>.

Structural Elements

Because XLIFF documents are normally source-monolingual, whole paragraphs in the source document that are not in the main source language are generally not to be extracted.

If there is a need to extract such content, the XLIFF output has to use an inline <mrk> element to enclose the content in a different language than the normal source language of the document.

Inline Elements

Use the attribute xml:lang in <mrk>.

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>My Document</title>
 </head>
 <body>
  <p>Span of text <span lang="fr">en français</span>.</p>
 </body>
</html>

Extraction:

...
<unit id='2'>
 <segment>
  <source>Span of text <pc id='1'><mrk id="m1" xml:lang="fr" type='its:any'
   >en français</mrk></pc>.</source>
 </segment>
</unit>
...

MT Confidence [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Communicates the self-reported confidence score from a machine translation engine of the accuracy of a translation it has provided.
See http://www.w3.org/TR/its20/#mtconfidence for more details.

Structural Elements

It is not recommended that MT Confidence be used at a structural level.

If a structural element of the original document has an MT Confidence annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. See the Inline Elements section below for details.

The MT Confidence score must be within the scope of a corresponding its:annotatorsRef attribute.

In the match element

The MT Confidence data category can also be used on the <match> element of the Translation Candidates module. In that case, use the matchQuality attribute to store the value. The value must be multiplied by 100, because the scale of matchQuality is [0.0 to 100.0] while the scale of MT Confidence is [0.0 to 1.0].

<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" 
       xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0"
       xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"
       version="2.0" srcLang="en" trgLang="fr">
<file id="f1" its:annotatorsRef="mt-confidence|MTServices-XYZ">
 <unit id="1">
  <mtc:matches>
   <!-- Score provided by MTServices-XYZ -->
   <mtc:match ref="#m1" matchQuality="89.82">
    <source>Text</source>
    <target >Texte</target>
   </mtc:match>
   <!-- Score provided by MTProvider-ABC -->
   <mtc:match ref="#m1" matchQuality="67.8"
              its:annotatorsRef="mt-confidence|MTProvider-ABC">
    <source>Text</source>
    <target >Texte</target>
   </mtc:match>
   <!-- Score provided by MTProvider-JKL -->
   <mtc:match ref="#m1" matchQuality="65"
             its:annotatorsRef="mt-confidence|MTProvider-JKL">
    <source>Text</source>
    <target >texte</target>
   </mtc:match>
   <!-- Score provided by MTServices-XYZ -->
   <mtc:match ref="#m1" matchQuality="89.82">
    <source>Some text</source>
    <target>Du texte</target>
   </mtc:match>
  </mtc:matches>
  <segment>
   <source><mrk id='m1' type='mtc:match'>Text</mrk></source>
  </segment>
 </unit>
</file>
</xliff>

Note that matchQuality cannot be mapped to ITS MT Confidence directly as no its:mtConfidencePointer is defined in ITS 2.0.
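
The scale adjustment between MT Confidence and matchQuality is a plain multiplication by 100. The sketch below shows it in Python; the function name is hypothetical, and decimal.Decimal is used only so that a value such as 0.8982 becomes exactly 89.82 instead of picking up binary floating-point noise.

```python
from decimal import Decimal


def match_quality_from_mt_confidence(mt_confidence: str) -> str:
    """Convert an ITS mtConfidence value (0 to 1) to a matchQuality value (0 to 100)."""
    score = Decimal(mt_confidence)
    if not Decimal(0) <= score <= Decimal(1):
        raise ValueError(f"mtConfidence out of range: {mt_confidence}")
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    return format((score * 100).normalize(), "f")


quality = match_quality_from_mt_confidence("0.8982")
```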

Inline Elements

Use the its:mtConfidence attribute on the <mrk> element.

<target>
 <mrk id="m1" type="its:any"
    its:mtConfidence="0.8982"
    its:annotatorsRef="mt-confidence|MTServices-XYZ"
 >Some translated text</mrk>
</target>

Data Categories Represented Using ITS Itself

Domain [TRANSFERRED TO XLIFF 2.1 Draft]

Identifies the topic or subject of a given content.
See http://www.w3.org/TR/its20/#domain for more details.

Structural Elements

Use the attribute itsxlf:domains:

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: Domain</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
    xmlns:h="http://www.w3.org/1999/xhtml">
    <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class"
     domainMapping="dom1 domain1" />
   </its:rules>
  </script>
 </head>
 <body>
  <p class="dom1">Text in the domain domain1</p>
 </body>
</html>

Extraction:

...
<unit id='2' itsxlf:domains='domain1'>
 <segment>
  <source>Text in the domain domain1</source>
 </segment>
</unit>
...
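
An extractor has to resolve the domainPointer value through the domainMapping of the rule before it can write itsxlf:domains. The following Python sketch illustrates that step under simplifying assumptions: the function names are hypothetical, values are split on commas without handling commas inside quoted values, and the result is joined with ", " (this page does not state the separator to use in itsxlf:domains).

```python
import shlex


def parse_domain_mapping(domain_mapping):
    """Parse a domainMapping value such as "dom1 domain1, 'criminal law' law"."""
    pairs = {}
    for entry in domain_mapping.split(","):
        tokens = shlex.split(entry)       # honours quoted values with spaces
        if len(tokens) == 2:
            pairs[tokens[0]] = tokens[1]
    return pairs


def domains_attribute(pointer_value, domain_mapping=""):
    """Build the value for itsxlf:domains from the node selected by domainPointer."""
    mapping = parse_domain_mapping(domain_mapping)
    values = [v.strip().strip("'\"") for v in pointer_value.split(",")]
    # Values without a mapping entry are kept unchanged.
    return ", ".join(mapping.get(v, v) for v in values if v)


domains = domains_attribute("dom1", "dom1 domain1")
```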

Inline Elements

Use the attribute itsxlf:domains in <mrk>:

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: Domain</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
    xmlns:h="http://www.w3.org/1999/xhtml">
    <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class"
     domainMapping="dom1 domain1" />
   </its:rules>
  </script>
 </head>
 <body>
  <p>Span of text <span class="dom1">in the domain domain1</span></p>
 </body>
</html>

Extraction:

...
<unit id='2'>
 <segment>
  <source>Span of text <pc id='1'><mrk id='m1' type='its:any' itsxlf:domains='domain1'
  >in the domain domain1</mrk></pc></source>
 </segment>
</unit>
...

Text Analysis [TRANSFERRED TO XLIFF 2.1 DRAFT]

Annotates content with lexical or conceptual information for the purpose of contextual disambiguation.
See http://www.w3.org/TR/its20/#textanalysis for more details.

Structural Elements

Text Analysis is not to be used at a structural level.

If a structural element of the original document has a Text Analysis annotation, it is RECOMMENDED to represent that annotation using a <mrk> element that encloses the whole content of the <source> element.

Original:

<p its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place"
   its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</p>

Extraction:

<unit id="1">
 <segment>
  <source><mrk id="m1" type="its:any"
               its:taClassRef="http://nerd.eurecom.fr/ontology#Place"
               its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source>
 </segment>
</unit>

Inline Elements

Use the ITS attributes in the <mrk> element.

If its:taConfidence is used, the annotated text must be contained within an element with a relevant its:annotatorsRef.

Original:

<div its-annotators-ref="text-analysis|http://enrycher.ijs.si">
...
 <p><span its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place"
         its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</span></p>
...
</div>

Extraction:

<unit id="1" its:annotatorsRef="text-analysis|http://enrycher.ijs.si">
 <segment>
  <source><mrk id="m1" type="its:any"
               its:taClassRef="http://nerd.eurecom.fr/ontology#Place"
               its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source>
 </segment>
</unit>

Locale Filter (==========TO REVIEW)

Specifies that a content is only applicable to certain locales.
See http://www.w3.org/TR/its20/#LocaleFilter for more details.

Structural Elements

When the Target Locale in XLIFF is Undefined

Use ITS attributes:

Original:

<p its-locale-filter-list='fr'>Text A</p>
<p its-locale-filter-list='ja'>Text B</p>

Extraction:

<xliff srcLang='en' ...>
... 
<unit id='1' its:localeFilterList='fr'>
 <segment>
  <source>Text A</source>
 </segment>
</unit>
<unit id='2' its:localeFilterList='ja'>
 <segment>
  <source>Text B</source>
 </segment>
</unit>

When the Target Locale in XLIFF is Defined

Use the translate attribute (yes if the target locale applies, no if it does not). It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target):

Original:

<p its-locale-filter-list='fr'>Text A</p>
<p its-locale-filter-list='ja'>Text B</p>

Extraction:

<xliff srcLang='en' trgLang='fr' ...>
... 
<unit id='1' translate='yes' its:localeFilterList='fr'>
 <segment>
  <source>Text A</source>
 </segment>
</unit>
<unit id='2' translate='no' its:localeFilterList='ja'>
 <segment>
  <source>Text B</source>
 </segment>
</unit>

If the entry does not apply to the defined target locale you can also simply not extract it.

Inline Elements

When the Target Locale in XLIFF is Undefined

Use the <mrk> element with the original ITS attributes:

Original:

<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>

Extraction:

<xliff srcLang='en' ...>
... 
<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' type='its:any' 
   its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source>
 </segment>
</unit>

When the Target Locale in XLIFF is Defined

Use the <mrk> element with translate='yes' if the target does apply or translate='no' if it does not. It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target).

Original:

<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>

Extraction:

<xliff srcLang='en' trgLang='fr' ...>
... 
<unit id='1'>
 <segment>
  <source>Text <pc id='1'><mrk id='m1' type='its:any' translate='no'
   its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source>
 </segment>
</unit>

If the content does not apply to the defined target locale you can also simply replace it by an inline code.
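
When the target locale is defined, the translate value follows from the locale filter list and type. A minimal Python sketch of that decision is shown here; the function names are hypothetical, and the matching is a simplified prefix comparison, not the full extended filtering of BCP 47 language ranges that ITS 2.0 refers to.

```python
def applies_to_locale(target, filter_list, filter_type="include"):
    """Tell whether content with the given locale filter applies to `target`."""
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    tag = target.lower()
    # Simplified matching: "*" matches everything, "fr" matches "fr" and "fr-CA".
    matched = any(r == "*" or tag == r or tag.startswith(r + "-") for r in ranges)
    return matched if filter_type == "include" else not matched


def translate_value(target, filter_list, filter_type="include"):
    """Return the XLIFF translate value for a defined target locale."""
    return "yes" if applies_to_locale(target, filter_list, filter_type) else "no"


unit_1 = translate_value("fr", "fr")             # 'Text A' in the example above
unit_2 = translate_value("fr", "ja")             # 'Text B' in the example above
inline = translate_value("fr", "fr", "exclude")  # the inline example above
```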

Provenance (==========TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS)

Communicates the identity of agents that have been involved in the translation of the content or the revision of the translated content.
See http://www.w3.org/TR/its20/#provenance for more details.

Structural Elements

The Provenance data category can be used on <file>, <group> and <unit>.

If a standoff element is needed (because the annotated element has more than one set of the provenance attributes), the <its:provenanceRecords> element must be located in the same element as the one where the reference is declared.

<unit id='1' its:provenanceRecordsRef="#its=prov1">
 <its:provenanceRecords xml:id="prov1">
  <its:provenanceRecord person="John Doe"/>
  <its:provenanceRecord revPerson="John Smith"/>
 </its:provenanceRecords>
...

Inline Elements

For annotating the source or the target content, use the <mrk> element with the ITS attributes. If a standoff <its:provenanceRecords> element is being used, it must be located in the same <unit> as the one where the inline reference is declared.

<unit id='1'>
 <its:provenanceRecords xml:id="prov1">
    <its:provenanceRecord person="John Doe"/>
    <its:provenanceRecord revPerson="John Smith"/>
  </its:provenanceRecords>
 <segment>
  <source>Some text</source>
  <target><mrk id='m1' type='its:any' its:provenanceRecordsRef="#its=prov1">Some text</mrk></target>
 </segment>
</unit>

Localization Quality Issue [TRANSFERRED TO XLIFF 2.1 DRAFT]

Expresses information related to localization quality assessment tasks.
See http://www.w3.org/TR/its20/#lqissue for more details.

Structural Elements

Localization Quality Issue annotation may be used to annotate the source or the target content within a <unit> element. It is done by using the <mrk> element. See below for details.

Inline Elements

The ITS attributes for Localization Quality Issue may be used inline with a <mrk> within a <source> or <target> element in a <unit> element. For example, for a single instance of the Localization Quality Issue data category:

<unit id="1">
 <segment>
  <source>This is the content</source>
  <target><mrk id="m1" type="its:any" its:locQualityIssueType="misspelling"
               its:locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
               its:locQualityIssueSeverity="50">c'es</mrk> le contenu</target>
 </segment>
</unit>

When needed, a stand-off notation can be used; it is located at the unit's extension point (before the first <segment> element). Note that the reference must use XLIFF's fragment identifier syntax. The fragment identifier prefix for the ITS module/extension is its.

<unit id="1">
 <its:locQualityIssues xml:id="lqi1">
  <its:locQualityIssue 
       locQualityIssueType="misspelling"
       locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
       locQualityIssueSeverity="50" />
  <its:locQualityIssue 
       locQualityIssueType="grammar"
       locQualityIssueComment="Sentence is not capitalized"
       locQualityIssueSeverity="20" />
 </its:locQualityIssues>
 <segment>
  <source>This is the content</source>
  <target><mrk id="m1" type="its:any"
               its:locQualityIssuesRef="#its=lqi1">c'es le contenu</mrk></target>
 </segment>
</unit>
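
Resolving such a stand-off reference can be sketched in Python as follows. The function name resolve_issues is hypothetical, and only the short in-unit form of the reference (#its=...) is handled; longer fragment identifiers that also name a file or unit are outside this sketch.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_issues(unit, ref):
    """Return the <its:locQualityIssue> elements that `ref` points to in `unit`."""
    prefix = "#its="
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported reference: {ref!r}")
    wanted = ref[len(prefix):]
    for issues in unit.iter(f"{{{ITS_NS}}}locQualityIssues"):
        if issues.get(XML_ID) == wanted:
            return list(issues)
    return []                             # nothing with that xml:id in this unit


unit = ET.fromstring("""
<unit xmlns="urn:oasis:names:tc:xliff:document:2.0"
      xmlns:its="http://www.w3.org/2005/11/its" id="1">
 <its:locQualityIssues xml:id="lqi1">
  <its:locQualityIssue locQualityIssueType="misspelling" locQualityIssueSeverity="50"/>
  <its:locQualityIssue locQualityIssueType="grammar" locQualityIssueSeverity="20"/>
 </its:locQualityIssues>
</unit>
""")
issue_types = [i.get("locQualityIssueType") for i in resolve_issues(unit, "#its=lqi1")]
```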

Allowed Characters [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Specifies the characters that are permitted in a given piece of content.
See http://www.w3.org/TR/its20/#allowedchars for more details.

Structural Elements

dF: I do not see why the allowed characters could not be set for whole units, groups or files. If a project is coming from a legacy system the restrictions are likely to be structural..

If a structural element of the original document has an Allowed Characters annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. For example:

Original:

<p its-allowed-characters="[a-zA-Z]">Text</p>

Extraction:

<unit id="1">
 <segment>
  <source><mrk id="m1" type="its:any" its:allowedCharacters="[a-zA-Z]">Text</mrk></source>
 </segment>
</unit>

Inline Elements

Use the ITS attribute on the <mrk> element:

<unit id="1">
 <segment>
  <source>user name: 
   <pc id="1"><mrk id="m1" type="its:any" its:allowedCharacters="[a-zA-Z]">johnDoe</mrk></pc>.</source>
 </segment>
</unit>
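
A tool that consumes the annotation can check the annotated text character by character. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the allowed-characters expression (a character class in the ITS 2.0 restricted syntax) is also accepted as-is by Python's re module.

```python
import re


def disallowed_characters(text, allowed):
    """Return the characters of `text` that the `allowed` expression does not permit."""
    pattern = re.compile(allowed)
    # Each character has to match the expression on its own.
    return [ch for ch in text if not pattern.fullmatch(ch)]


ok = disallowed_characters("johnDoe", "[a-zA-Z]")
bad = disallowed_characters("john_doe1", "[a-zA-Z]")
```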

Data Categories Not Representing Metadata [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Elements Within Text [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Indicates if an element should be treated as part of a text flow, or as a separate "paragraph".
See http://www.w3.org/TR/its20/#elements-within-text for more details.

This data category is not used directly in XLIFF, but it drives what XLIFF element is used to represent the original element in the extracted document:

withinText='no': Use <unit>
withinText='yes': Use an inline element such as <pc>, <sc>/<ec> or <ph>.
withinText='nested': Use a separate <unit>.

Target Pointer (==========TO REVIEW)

Provides a way to associate the node of a given source content (i.e. the content to be translated) and the node of its corresponding target content.
See http://www.w3.org/TR/its20/#target-pointer for more details.

This data category is not mapped to XLIFF but used by extracting and merging tools to get the source content from the original document and put back the translated content at its proper location.

Note that ITS processors working on XLIFF documents should use the following rule to locate the source and target content:

<its:targetPointerRule selector="//xlf:source" targetPointer="../xlf:target"/>
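
The effect of this rule (every source is paired with the target that is its sibling) can be reproduced in Python without an ITS engine. This is a sketch with a hypothetical function name; it uses a parent map because xml.etree.ElementTree elements do not keep a reference to their parent.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"


def source_target_pairs(root):
    """Pair each <source> with the sibling <target>, or None if there is none."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for source in root.iter(f"{{{XLF}}}source"):
        parent = parent_of.get(source)
        target = parent.find(f"{{{XLF}}}target") if parent is not None else None
        pairs.append((source, target))
    return pairs


unit = ET.fromstring(
    f'<unit xmlns="{XLF}" id="1">'
    '<segment><source>Text</source><target>Texte</target></segment>'
    '<segment><source>More</source></segment>'
    '</unit>'
)
texts = [
    ("".join(s.itertext()), None if t is None else "".join(t.itertext()))
    for s, t in source_target_pairs(unit)
]
```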

Id Value (==========TO REVIEW)

Indicates a value that can be used as unique identifier for a given part of the content.
See http://www.w3.org/TR/its20/#idvalue for more details.

Note that the identifiers in XLIFF are not unique per document, so using the Id Value data category to specify IDs in an XLIFF document is largely useless, except when used in very specific contexts that cannot be expressed in the ITS rules. See the Fragment Identifier section for details on IDs in XLIFF 2.

Structural Elements

Use the name attribute in <unit>:

Original:

<p id='p1'>Text of the paragraph.</p>

Extraction:

<unit id='1' name='p1'>
 <segment>
  <source>Text of the paragraph.</source>
 </segment>
</unit>

Inline Elements

The Id Value data category is not mapped to inline codes.

Data Categories Not Mapped Yet

Directionality (==========TODO)

Provides information about the text directionality of the content.
See http://www.w3.org/TR/its20/#directionality for more details.

Structural Elements

TODO

Inline Elements

TODO

External Resource [TRANSFER TO XLIFF 2.1 IN PROGRESS]

[Note: Felix has the action item to provide an explanation for the XLIFF 2.1 draft of why this category actually isn't being mapped. We may provide some informative guidance on how to extract directionality metadata using the XLIFF directionality mechanism.]

Indicates that a node represents or references potentially translatable data in a resource outside the document.
See http://www.w3.org/TR/its20/#externalresource for more details.

Structural Elements

Use the attribute itsxlf:externalResourceRef in <trans-unit>:

Original:

<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its"
 xmlns:html="http://www.w3.org/1999/xhtml">
 <its:externalResourceRefRule selector="//html:video/@src"
  externalResourceRefPointer="."/>
 <its:externalResourceRefRule selector="//html:video/@poster"
  externalResourceRefPointer="."/>
</its:rules> ..
<video
 height=360
 poster=video-image.png
 src=http://www.example.com/video/v2.mp
 width=640>

Extraction:

...
<trans-unit id='2' itsxlf:externalResourceRef="http://www.example.com/video/v2.mp">
...

Inline Elements

Use the attribute itsxlf:externalResourceRef in the inline element that holds the reference (e.g. <x/> or <ph>):

Original:

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Data Category: External Resource</title>
  <script type="application/its+xml">
   <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" xmlns:h="http://www.w3.org/1999/xhtml">
    <its:externalResourceRefRule selector="//h:img" externalResourceRefPointer="@src"/>
   </its:rules>
  </script>
 </head>
 <body>
  <p>Image: <img src="example.png" alt="Text for the image"></p>
 </body>
</html>

Extraction:

...
<trans-unit id='3'>
 <source>Image: <x id='1' itsxlf:externalResourceRef="example.png"/></source>
</trans-unit>
...

Localization Quality Rating [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Expresses an overall measurement of the localization quality of a document or an item in a document.
See http://www.w3.org/TR/its20/#lqrating for more details.

Structural Elements

Use the ITS attributes to annotate a <file>, <group>, <trans-unit> or <alt-trans> element.

<trans-unit id="1" its:locQualityRatingScore="100"
 its:locQualityRatingScoreThreshold="95"
 its:locQualityRatingProfileRef="http://example.org/qaModel/v13">
 <source>text</source>
 <target>texte</target>
</trans-unit>

Inline Elements

Use the ITS attributes of Localization Quality Rating to annotate a segment (<mrk mtype="seg">) or a given span of the content (<mrk mtype="x-its">).

<trans-unit id="1">
 <source>Some text</source>
 <seg-source><mrk mtype="seg" mid="1">Some text</mrk></seg-source>
 <target><mrk mtype="seg" mid="1"
              its:locQualityRatingScore="0.56"
              its:locQualityRatingScoreThreshold="95"
 >Du texte</mrk></target>
</trans-unit>

<trans-unit id="1">
 <source>Some text and a term</source>
 <target>Du texte et un <mrk mtype="x-its" its:locQualityRatingVote="100"
 its:locQualityRatingVoteThreshold="95"
 its:locQualityRatingProfileRef="http://example.org/qaModel/v13">terme</mrk></target>
</trans-unit>
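
To show how these attributes might be consumed, the following is a minimal sketch that compares a score or vote with its threshold on any annotated element. It assumes that a rating passes when the value is greater than or equal to the threshold, which is how ITS 2.0 describes the threshold (the lowest passing value). The helper names its and rating_passes and the SAMPLE string are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical fragment based on the structural example above.
SAMPLE = """<trans-unit xmlns:its="http://www.w3.org/2005/11/its" id="1"
 its:locQualityRatingScore="100"
 its:locQualityRatingScoreThreshold="95"
 its:locQualityRatingProfileRef="http://example.org/qaModel/v13">
 <source>text</source>
 <target>texte</target>
</trans-unit>"""


def its(name):
    """Build the namespace-qualified attribute name used by ElementTree."""
    return f"{{{ITS_NS}}}{name}"


def rating_passes(element):
    """Return True or False when a value and its threshold are both present, else None."""
    for kind in ("Score", "Vote"):
        value = element.get(its(f"locQualityRating{kind}"))
        threshold = element.get(its(f"locQualityRating{kind}Threshold"))
        if value is not None and threshold is not None:
            # Assumption: the threshold is the lowest passing value.
            return float(value) >= float(threshold)
    return None


unit = ET.fromstring(SAMPLE)
print(rating_passes(unit))
```

The same function can be applied to a <mrk> element, since the attribute names are identical for structural and inline annotations. An element carrying a value without a threshold yields None, because the sketch has no basis for deciding whether that value passes.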

Storage Size (==========TODO)

Specifies the maximum storage size of a given content.
See http://www.w3.org/TR/its20/#storagesize for more details.

Structural Elements

Use the ITS attributes for Storage Size on the <source> and <target> elements:

<trans-unit id="1">
 <source its:storageSize="12"
         its:storageEncoding="UTF-16"
         its:lineBreakType="crlf">Text</source>
</trans-unit>

Inline Elements

Use the ITS attributes for Storage Size on the <mrk> element:

<trans-unit id="1">
 <source><mrk its:storageSize="8"
              its:storageEncoding="UTF-16" mtype="x-its">CONTINUE</mrk></source>
</trans-unit>
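
A tool consuming this annotation could verify that the content fits the declared size. The following is a minimal sketch under several assumptions: the size is counted in bytes of the encoded text, a missing its:storageEncoding defaults to UTF-8 as in ITS 2.0, a byte-order mark is not counted (so UTF-16 is encoded as little-endian without a BOM), and its:lineBreakType is ignored because the sample contains no line breaks. The function names and the SAMPLE string are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical fragment based on the structural example above.
SAMPLE = """<trans-unit xmlns:its="http://www.w3.org/2005/11/its" id="1">
 <source its:storageSize="12"
         its:storageEncoding="UTF-16"
         its:lineBreakType="crlf">Text</source>
</trans-unit>"""


def encoded_length(text, encoding):
    """Return the number of bytes the text occupies in the given encoding."""
    # Assumption: no byte-order mark is counted. Python's plain "utf-16"
    # codec would prepend one, so the little-endian variant is used instead.
    codec = "utf-16-le" if encoding.lower() == "utf-16" else encoding
    return len(text.encode(codec))


def fits_storage_size(element):
    """Return True or False when its:storageSize is present, else None."""
    size = element.get(f"{{{ITS_NS}}}storageSize")
    if size is None:
        return None
    encoding = element.get(f"{{{ITS_NS}}}storageEncoding", "UTF-8")
    text = "".join(element.itertext())
    return encoded_length(text, encoding) <= int(size)


unit = ET.fromstring(SAMPLE)
print(fits_storage_size(unit.find("source")))
```

Because the check uses all text inside the annotated element, it works for <source>, <target> and <mrk> alike. A fuller implementation would also have to decide how inline codes and line breaks contribute to the stored size.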

References

Internationalization Tag Set (ITS) Version 2.0: http://www.w3.org/TR/its20/
XLIFF Version 2.0: http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html
