XLIFF 2.0 Mapping
Contents
- 1 Introduction
- 2 Implementing and testing the mapping
- 3 General considerations for ITS 2.0 and XLIFF
- 4 Data Categories Existing in XLIFF [TRANSFERRED TO XLIFF 2.1 Draft]
- 5 Data Categories Partially Covered in XLIFF
- 6 Data Categories Represented Using ITS Itself
- 6.1 Domain [TRANSFERRED TO XLIFF 2.1 Draft]
- 6.2 Text Analysis [TRANSFERRED TO XLIFF 2.1 DRAFT]
- 6.3 Locale Filter (==========TO REVIEW)
- 6.4 Provenance (==========TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS)
- 6.5 Localization Quality Issue [TRANSFERRED TO XLIFF 2.1 DRAFT]
- 6.6 Allowed Characters [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
- 7 Data Categories Not Representing Metadata [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
- 8 Data Categories Not Mapped Yet
- 9 References
Introduction
The mapping development is being transferred to the OASIS XLIFF TC where it is becoming a normative part of the planned XLIFF 2.1 release.
Parts that have been transitioned to the XLIFF TC are marked and should not be developed further here.
This document provides a recommendation on how the ITS 2.0 data categories are represented in XLIFF 2.
For the mapping between ITS 2.0 and XLIFF 1.2 see the page "XLIFF 1.2 Mapping".
Notes:
- Please, use the IG mailing list (http://lists.w3.org/Archives/Public/public-i18n-its-ig/) for discussing this topic.
- The 'structural' entries relate to the cases where the element with the ITS information is a non-inline (structural) element. For example a <p> in HTML.
- The 'inline' entries relate to the case where the element with the ITS information is an inline element. For example a <span> in HTML.
- The prefix itsxlf refers to the namespace http://www.w3.org/ns/its-xliff/
ITS data categories can be classified into several categories in XLIFF:
- Data Categories Existing in XLIFF
- Data Categories Partially Covered in XLIFF
- Data Categories Represented Using ITS Itself
- Data Categories Not Representing Metadata
- And because this document is still under definition there are: Data Categories Not Mapped Yet
Implementing and testing the mapping
General implementation and testing considerations
This section is a stub. Feel free to complete it by providing e.g. these ideas:
- What input files are needed: XLIFF, general XML, HTML5?
- XLIFF 2 documents, and I suppose (to see if an extractor supports ITS too): HTML5 or XML documents
- What output is needed: XLIFF only?
- For the extraction case: the XLIFF output
- But I suppose some kind of comparable text format would be ideal. I'm not sure the same XPath-based format we used for ITS would be best here as XLIFF processors may be using very different way to process the document. Maybe something using the ID of the object rather than the path would be better?
- How would the conformance of the output to mappings be tested?
- Ideally by comparing the gold output to the tool's output
- What would be a good location for the test files: a GitHub repository or an XLIFF / ITS group specific location? Advantage: many people can contribute
- Github would be fine. I'm guessing there may be some call for hosting this in SVN's OASIS too.
- Would we need to require a preprocessing of XLIFF files so that general ITS processors understand them? See the related thread.
Types of processors
( Draft section, taken from http://lists.w3.org/Archives/Public/public-i18n-its-ig/2014Oct/0016.html )
Tools that process the mapping:
- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
- An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
- An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document.
Having the ITS Module in its own namespace has the advantage that if ITS itself changes and those changes affect the mapping, one can create an updated version of the ITS module.
Rules file for the mapping
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
    xmlns:xlf="urn:oasis:names:tc:xliff:document:2.0"
    xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1">
  <its:translateRule selector="/xlf:xliff" translate="no"/>
  <its:translateRule selector="//xlf:source" translate="yes"/>
  <its:targetPointerRule selector="//xlf:source" targetPointer="../xlf:target"/>
  <its:translateRule selector="//xlf:target" translate="yes"/>
  <its:withinTextRule withinText="yes" selector="//xlf:ph|//xlf:pc|//xlf:sc|//xlf:ec|//xlf:mrk|//xlf:sm|//xlf:em"/>
  <its:allowedCharactersRule selector="//xlf:mrk" allowedCharactersPointer="@itsm:allowedCharacters"/>
  <its:domainRule selector="//xlf:mrk[@type='any' and @itsm:domains]" domainPointer="@itsm:domains"/>
</its:rules>
General considerations for ITS 2.0 and XLIFF
Tools that make use of ITS 2.0 and XLIFF
We assume three types of tools that make use of ITS 2.0 information in relation to XLIFF.
- An XLIFF Extractor aware of both ITS and the ITS module for any data coming from the original source document.
- An XLIFF Modifier aware of the ITS Module for data generated during the life time of the XLIFF document.
- An XLIFF Merger aware of both the ITS Module and the ITS syntax if any of that data is merged back into the translated document.
The remainder of this section shows the capabilities that these tools need to implement. Each data category section in the ITS 2.0 module describes the specifics of its usage in XLIFF.
Re-writing the ITS namespace
For using several ITS 2.0 data categories locally in XLIFF, the ITS 2.0 attributes need to be written with a dedicated namespace. The namespace URI is
urn:oasis:names:tc:xliff:itsm:2.1
It is a best practice to use the namespace prefix itsm.
The semantics of the attributes are analogous to those of their counterparts in the W3C ITS namespace, where such counterparts exist. The main semantic difference between its and itsm attributes is that itsm attributes can apply to spans that are not well-formed and are delimited by the empty boundary markers <sm/>/<em/>.
The namespace urn:oasis:names:tc:xliff:itsm:2.1 is used, among other reasons, because of XLIFF 2.x validation constraints.
Note YS: it's also because the ITS namespace needs at times to be completed: when a data category uses the XLIFF markup and is missing some features (we would not be able to use the ITS namespace for this); and when ITS local markup is missing things, like a domain attribute
Handling of ITS Tools Annotation
ITS 2.0 provides a tools annotation mechanism. It identifies the processor that generates ITS information. This information is mandatory for the MT Confidence data category and optional for other data categories. It is mandatory for Terminology and Text Analysis if these provide confidence information.
tbd: what is special about handling this in XLIFF?
Note YS: nothing really, only that it has to handle the sm/em case too
Handling of overlap
In XLIFF, ITS information may be applied, among others, to mrk elements. If the ITS information is applied to pairs of sm and em elements, it may overlap with other elements. In that case the normal ITS mechanism of data category inheritance for element nodes cannot be applied, because it would apply to the empty content of sm or em, not to the content between sm and the corresponding em.
An ITS processor, before processing an XLIFF file, needs to do the following steps.
1) Change all pc elements to sc and ec elements. This is needed to handle proper inheritance of ITS 2.0 information. Example:
<sm id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'/>French <pc id='2'>Canadian hockey</pc><em startRef='1'/>
Note YS: Not sure about the spans for 'Quebecois': it seems it should annotate only 'French Canadian' not 'French Canadian hockey', no? And it would still be a good example.
Will be changed to
<sm id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'/>French <sc id='2'/>Canadian hockey<ec startRef='2'/><em startRef='1'/>
2) Change non-overlapping sm and em to mrk elements.
- a) Set the current content to the whole content to be processed.
- b) Is there an sm tag in the current content? If so, output the text before the sm tag and go to c). Otherwise just output all text in the current content.
- c) Does the sm tag have an em tag with a corresponding id? If so, create a mrk node, set the content between sm and em as the new current content, and go to b). Otherwise discard the sm tag and go to b).
- d) Output the rest of the text.
Example:
<sm id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'/>French <sc id='2'/>Canadian hockey<ec startRef='2'/><em startRef='1'/>
Will be changed to
<mrk id='1' type='term' ref='http://en.wikipedia.org/wiki/Qu%C3%A9b%C3%A9cois'>French <sc id='2'/>Canadian hockey<ec startRef='2'/></mrk>
Note that this step cannot be done for overlapping markup, due to element hierarchy constraints in XML. This is identical to the case 3 in NIF2ITS conversion.
3) If there are two mrk elements that contain the same ITS information, create a global rule that identifies them. Example:
<mrk id="m1" type="dbp:entity" ref="http://www.wikidata.org/wiki/Q1187234">Port Metro of <mrk id="m2" type="oc:entity/City" value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver">Vancouver</mrk></mrk><mrk id="m2bis" type="oc:entity/City" value="City of Vancouver" ref="http://en.wikipedia.org/wiki/Vancouver"> City</mrk>
Will have the global rule (using the XLIFF namespace as default namespace in XPath): tbd
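Step 1 of the preprocessing above (rewriting pc as an sc/ec pair) could be sketched in Python as follows. This is only an illustration under stated assumptions, not a normative algorithm: the element names are un-namespaced as in the snippets on this page (a real XLIFF 2 document uses the namespace urn:oasis:names:tc:xliff:document:2.0), and the function name expand_pc is hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_pc(parent):
    """Replace every <pc> below `parent` with an <sc/> ... <ec/> pair, in place."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag == "pc":
            pc_id = child.get("id")
            sc = ET.Element("sc", {"id": pc_id})
            ec = ET.Element("ec", {"startRef": pc_id})
            sc.tail = child.text   # text that opened the <pc> content
            ec.tail = child.tail   # text that followed </pc>
            # The former children of <pc> become siblings between sc and ec.
            new_nodes = [sc] + list(child) + [ec]
            parent.remove(child)
            for offset, node in enumerate(new_nodes):
                parent.insert(i + offset, node)
            # Move past <sc/>; the hoisted children are visited next,
            # so nested <pc> elements are expanded as well.
            i += 1
        else:
            expand_pc(child)
            i += 1


source = ET.fromstring(
    "<source><sm id='1' type='term'/>French "
    "<pc id='2'>Canadian hockey</pc><em startRef='1'/></source>"
)
expand_pc(source)
print(ET.tostring(source, encoding="unicode"))
```

The sketch keeps the id of the original pc on the new sc and points the startRef of ec at it, so the pairing stays recoverable.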
Data Categories Existing in XLIFF [TRANSFERRED TO XLIFF 2.1 Draft]
Translate [TRANSFERRED TO XLIFF 2.1 Draft]
Indicates whether a content is translatable or not.
See http://www.w3.org/TR/its20/#trans-datacat for details.
Structural Elements
Use the translate attribute:
Original:
<p translate='yes|no'>Text</p>
Extraction:
<unit id='1' translate='yes|no'> <segment> <source>Text</source> </segment> </unit>
If the element is not translatable you can also simply not extract it.
For inline elements
Use <mrk> with translate='yes|no'. A fall-back option is to extract the non-translatable content as an inline code.
Original:
<p>Text <code translate='no'>Code</code></p>
Extraction:
<unit id='1'> <segment> <source>Text <pc id='1'><mrk id='m1' translate='no'>Code</mrk></pc></source> </segment> </unit>
or
<unit id='1'> <segment> <source>Text <ph id='1'/></source> </segment> </unit>
Preserve Space [TRANSFERRED TO XLIFF 2.1 Draft]
Indicates how whitespace should be handled in a given content.
See http://www.w3.org/TR/its20/#preservespace for more details.
Structural Elements
Whitespace handling at the structural level is indicated with xml:space in XLIFF 2:
Original:
<listing xml:space='preserve'>Line 1 Line 2</listing>
Extraction:
<unit id='1' xml:space='preserve'> <segment> <source>Line 1 Line 2</source> </segment> </unit>
Inline Elements
Use the attribute xml:space in <mrk>.
Original:
<para>Normal text and <span xml:space="preserve">preserved spaces: [ ]</span>. </para>
Extraction:
<unit id='1'> <segment> <source>Normal text and <pc id='1'><mrk id='m1' type='its:any' xml:space="preserve">preserved spaces: [ ]</mrk></pc>.</source> </segment> </unit>
Note that, currently, few localization applications will honor preserving whitespace for only a given span of text.
Data Categories Partially Covered in XLIFF
Localization Note [TRANSFERRED TO XLIFF 2.1 Draft]
Provides a way to communicate notes to localizers about a particular item of content.
See http://www.w3.org/TR/its20/#locNote-datacat for more details.
Structural Elements
TODO
Inline Elements
TODO
Terminology [TRANSFERRED TO XLIFF 2.1 Draft]
Marks terms and optionally associates them with information, such as definitions.
See http://www.w3.org/TR/its20/#terminology for more details.
Structural Elements
It is recommended to map terminology information that appears on a structural element in the original document by using an inline <mrk> element.
Original:
<p its-term='yes'>Term</p>
Extracted:
<unit id='1'> <segment> <source><mrk id="m1" type='term'>Term</mrk></source> </segment> </unit>
Inline Elements
In XLIFF 2, terms are denoted using <mrk type='term'>:
Original:
<p>Text with a <span its-term='yes'>term</span>.</p>
Extracted:
<unit id='1'> <segment> <source>Text with a <pc id='1'><mrk id="m1" type='term'>term</mrk></pc>.</source> </segment> </unit>
- Use type="its:term-no" for denoting instances where you have its:term="no".
- its:termInfoRef is mapped to the XLIFF ref attribute.
- its:termConfidence is mapped to itsxlf:termConfidence.
- When itsxlf:termConfidence is used, the annotated text MUST be contained within an element with a relevant its:annotatorsRef.
- The attribute value can be used to store information denoted by the global rule attribute its:termInfoPointer.
WARNING: TBD: the XLIFF 2 specification allows ref and value to both be set at the same time. ITS 2.0 does not allow an info and an info-ref to be set at the same time. So we have to decide something for this case.
Note: If needed, the value of the ITS termInfoRef attribute is to be adjusted to point to a resource accessible from the XLIFF document. The location and format of this resource are decided by the tool creating the XLIFF document.
Original:
<p>Text with a <span its-term='yes' its-term-info-ref='http://en.wikipedia.org/wiki/Terminology' its-term-confidence='0.9'>term</span>.</p>
Extracted:
<unit id='1' its:annotatorsRef='terminology|http://www.cngl.ie/termchecker'> <segment> <source>Text with a <pc id='1'><mrk id='m1' type='term' ref='http://en.wikipedia.org/wiki/Terminology' itsxlf:termConfidence='0.9'>term</mrk></pc>.</source> </segment> </unit>
Language Information [TRANSFERRED TO XLIFF 2.1 Draft]
Expresses the language for a given content.
See http://www.w3.org/TR/its20/#language-information for more details.
XLIFF is a bilingual document format and defines the source and target language of its payload using the srcLang and trgLang attributes in the <xliff> element. By default, those languages apply to the content of <source> and <target>.
Structural Elements
Because XLIFF documents are normally source-monolingual, whole paragraphs in the source document that are not in the main source language are generally not to be extracted.
If there is a need to extract such content, the XLIFF output has to use an inline <mrk> element to enclose the content that is in a language other than the normal source language of the document.
Inline Elements
Use the attribute xml:lang in <mrk>.
Original:
<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <title>My Document</title> </head> <body> <p>Span of text <span lang="fr">en français</span>.</p> </body> </html>
Extraction:
... <unit id='2'> <segment> <source>Span of text <pc id='1'><mrk id="m1" xml:lang="fr" type='its:any'>en français</mrk></pc>.</source> </segment> </unit> ...
MT Confidence [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
Communicates the self-reported confidence score from a machine translation engine of the accuracy of a translation it has provided.
See http://www.w3.org/TR/its20/#mtconfidence for more details.
Structural Elements
It is not recommended that MT Confidence be used at a structural level.
If a structural element of the original document has an MT Confidence annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. See the Inline Elements section below for details.
The MT Confidence score must be within the scope of a corresponding its:annotatorsRef attribute.
In the match element
The MT Confidence data category can also be used on the <match> element of the Translation Candidates module.
In that case, use the matchQuality attribute to store the value. You must adjust the value by multiplying it by 100, as the scale of matchQuality is [0.0 to 100.0] and the scale for MT Confidence is [0.0 to 1.0].
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" version="2.0" srcLang="en" trgLang="fr"> <file id="f1" its:annotatorsRef="mt-confidence|MTServices-XYZ"> <unit id="1"> <mtc:matches> <!-- Score provided by MTServices-XYZ --> <mtc:match ref="#m1" matchQuality="89.82"> <source>Text</source> <target >Texte</target> </mtc:match> <!-- Score provided by MTProvider-ABC --> <mtc:match ref="#m1" matchQuality="67.8" its:annotatorsRef="mt-confidence|MTProvider-ABC"> <source>Text</source> <target >Texte</target> </mtc:match> <!-- Score provided by MTProvider-JKL --> <mtc:match ref="#m1" matchQuality="65" its:annotatorsRef="mt-confidence|MTProvider-JKL"> <source>Text</source> <target >texte</target> </mtc:match> <!-- Score provided by MTServices-XYZ --> <mtc:match ref="#m1" matchQuality="89.82"> <source>Some text</source> <target>Du texte</target> </mtc:match> </mtc:matches> <segment> <source><mrk id='m1' type='mtc:match'>Text</mrk></source> </segment> </unit> </file> </xliff>
Note that matchQuality cannot be mapped to ITS MT Confidence directly as no its:mtConfidencePointer is defined in ITS 2.0.
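The scale adjustment between the two value ranges can be illustrated with a small Python sketch. The function names are hypothetical, and rounding to four decimals is an assumption made here only to hide floating-point noise; neither ITS 2.0 nor XLIFF prescribes a precision.

```python
def mt_confidence_to_match_quality(confidence: float) -> float:
    """Convert an ITS MT Confidence score [0.0, 1.0] to matchQuality [0.0, 100.0]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("MT Confidence must be between 0.0 and 1.0")
    return round(confidence * 100, 4)


def match_quality_to_mt_confidence(match_quality: float) -> float:
    """Convert a matchQuality value [0.0, 100.0] back to the MT Confidence scale."""
    if not 0.0 <= match_quality <= 100.0:
        raise ValueError("matchQuality must be between 0.0 and 100.0")
    return round(match_quality / 100, 6)


print(mt_confidence_to_match_quality(0.8982))  # 89.82
print(match_quality_to_mt_confidence(67.8))    # 0.678
```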
Inline Elements
Use the its:mtConfidence attribute on the <mrk> element.
<target> <mrk id="m1" type="its:any" its:mtConfidence="0.8982" its:annotatorsRef="mt-confidence|MTServices-XYZ" >Some translated text</mrk> </target>
Data Categories Represented Using ITS Itself
Domain [TRANSFERRED TO XLIFF 2.1 Draft]
Identifies the topic or subject of a given content.
See http://www.w3.org/TR/its20/#domain for more details.
Structural Elements
Use the attribute itsxlf:domains:
Original:
<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <title>Data Category: Domain</title> <script type="application/its+xml"> <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" xmlns:h="http://www.w3.org/1999/xhtml"> <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class" domainMapping="dom1 domain1" /> </its:rules> </script> </head> <body> <p class="dom1">Text in the domain domain1</p> </body> </html>
Extraction:
... <unit id='2' itsxlf:domains='domain1'> <segment> <source>Text in the domain domain1</source> </segment> </unit> ...
Inline Elements
Use the attribute itsxlf:domains in <mrk>:
Original:
<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <title>Data Category: Domain</title> <script type="application/its+xml"> <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" xmlns:h="http://www.w3.org/1999/xhtml"> <its:domainRule selector="//h:*[@class='dom1']" domainPointer="./@class" domainMapping="dom1 domain1" /> </its:rules> </script> </head> <body> <p>Span of text <span class="dom1">in the domain domain1</span></p> </body> </html>
Extraction:
... <unit id='2'> <segment> <source>Span of text <pc id='1'><mrk id='m1' type='its:any' itsxlf:domains='domain1'>in the domain domain1</mrk></pc></source> </segment> </unit> ...
Text Analysis [TRANSFERRED TO XLIFF 2.1 DRAFT]
Annotates content with lexical or conceptual information for the purpose of contextual disambiguation.
See http://www.w3.org/TR/its20/#textanalysis for more details.
Structural Elements
Text Analysis is not to be used at a structural level.
If a structural element of the original document has a Text Analysis annotation, it is RECOMMENDED to represent that annotation using a <mrk> element that encloses the whole content of the <source> element.
Original:
<p its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place" its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</p>
Extraction:
<unit id="1"> <segment> <source><mrk id="m1" type="its:any" its:taClassRef="http://nerd.eurecom.fr/ontology#Place" its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source> </segment> </unit>
Inline Elements
Use the ITS attributes in the <mrk> element.
If its:taConfidence is used, the annotated text must be contained within an element with a relevant its:annotatorsRef.
Original:
<div its-annotators-ref="text-analysis|http://enrycher.ijs.si"> ... <p><span its-ta-class-ref="http://nerd.eurecom.fr/ontology#Place" its-ta-ident-ref="http://dbpedia.org/resource/Arizona">Arizona</span></p> ... </div>
Extraction:
<unit id="1" its:annotatorsRef="text-analysis|http://enrycher.ijs.si"> <segment> <source><mrk id="m1" type="its:any" its:taClassRef="http://nerd.eurecom.fr/ontology#Place" its:taIdentRef="http://dbpedia.org/resource/Arizona">Arizona</mrk></source> </segment> </unit>
Locale Filter (==========TO REVIEW)
Specifies that a content is only applicable to certain locales.
See http://www.w3.org/TR/its20/#LocaleFilter for more details.
Structural Elements
When the Target Locale in XLIFF is Undefined
Use ITS attributes:
Original:
<p its-locale-filter-list='fr'>Text A</p> <p its-locale-filter-list='ja'>Text B</p>
Extraction:
<xliff srcLang='en' ...> ... <unit id='1' its:localeFilterList='fr'> <segment> <source>Text A</source> </segment> </unit> <unit id='2' its:localeFilterList='ja'> <segment> <source>Text B</source> </segment> </unit>
When the Target Locale in XLIFF is Defined
Use the translate attribute (yes if the target locale applies, no if it does not).
It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target):
Original:
<p its-locale-filter-list='fr'>Text A</p> <p its-locale-filter-list='ja'>Text B</p>
Extraction:
<xliff srcLang='en' trgLang='fr' ...> ... <unit id='1' translate='yes' its:localeFilterList='fr'> <segment> <source>Text A</source> </segment> </unit> <unit id='2' translate='no' its:localeFilterList='ja'> <segment> <source>Text B</source> </segment> </unit>
If the entry does not apply to the defined target locale you can also simply not extract it.
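How an extractor might derive the translate value from a locale filter can be sketched in Python as below. This is a simplified illustration: ITS 2.0 defines matching in terms of BCP 47 extended filtering, whereas this sketch only does case-insensitive exact and prefix matching plus the "*" wildcard. The function names locale_applies and translate_value are hypothetical.

```python
def locale_applies(filter_list: str, target_locale: str,
                   filter_type: str = "include") -> bool:
    """Return True if content with this locale filter applies to target_locale.

    Simplified matching (an assumption of this sketch): a range matches when
    it is "*", equals the target tag, or is a prefix of it followed by "-".
    """
    ranges = [r.strip().lower() for r in filter_list.split(",") if r.strip()]
    target = target_locale.lower()
    matched = any(
        r == "*" or target == r or target.startswith(r + "-")
        for r in ranges
    )
    # "include" keeps matching locales, "exclude" keeps all the others.
    return matched if filter_type == "include" else not matched


def translate_value(filter_list: str, target_locale: str,
                    filter_type: str = "include") -> str:
    """Map the locale filter decision to the XLIFF translate attribute value."""
    return "yes" if locale_applies(filter_list, target_locale, filter_type) else "no"


print(translate_value("fr", "fr"))             # yes
print(translate_value("ja", "fr"))             # no
print(translate_value("fr", "fr", "exclude"))  # no
```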
Inline Elements
When the Target Locale in XLIFF is Undefined
Use the <mrk> element with the original ITS attributes:
Original:
<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>
Extraction:
<xliff srcLang='en' ...> ... <unit id='1'> <segment> <source>Text <pc id='1'><mrk id='m1' type='its:any' its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source> </segment> </unit>
When the Target Locale in XLIFF is Defined
Use the <mrk> element with translate='yes' if the target locale applies or translate='no' if it does not. It is also recommended to keep the original ITS attributes, so the file could potentially be re-purposed (even if it has a current target).
Original:
<p>Text <span its-locale-filter-list='fr' its-locale-filter-type='exclude'>text</span></p>
Extraction:
<xliff srcLang='en' trgLang='fr'...> ... <unit id='1'> <segment> <source>Text <pc id='1'><mrk id='m1' type='its:any' translate='no' its:localeFilterList='fr' its:localeFilterType='exclude'>text</mrk></pc></source> </segment> </unit>
If the content does not apply to the defined target locale you can also simply replace it by an inline code.
Provenance (==========TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS)
Communicates the identity of agents that have been involved in the translation of the content or the revision of the translated content.
See http://www.w3.org/TR/its20/#provenance for more details.
Structural Elements
The Provenance data category can be used on <file>, <group> and <unit>.
If a standoff element is needed (because the annotated element has more than one set of the provenance attributes), the <its:provenanceRecords> element must be located in the same element in which the reference is declared.
<unit id='1' its:provenanceRecordsRef="#its=prov1"> <its:provenanceRecords xml:id="prov1"> <its:provenanceRecord person="John Doe"/> <its:provenanceRecord revPerson="John Smith"/> </its:provenanceRecords> ...
Inline Elements
For annotating the source or the target content, use the <mrk> element with the ITS attributes.
If a standoff <its:provenanceRecords> element is being used, it must be located in the same <unit> in which the inline reference is declared.
<unit id='1'> <its:provenanceRecords xml:id="prov1"> <its:provenanceRecord person="John Doe"/> <its:provenanceRecord revPerson="John Smith"/> </its:provenanceRecords> <segment> <source>Some text</source> <target><mrk id='m1' type='its:any' its:provenanceRecordsRef="#its=prov1">Some text</mrk></target> </segment> </unit>
Localization Quality Issue [TRANSFERRED TO XLIFF 2.1 DRAFT]
Expresses information related to localization quality assessment tasks.
See http://www.w3.org/TR/its20/#lqissue for more details.
Structural Elements
Localization Quality Issue annotation may be used to annotate the source or the target content within a <unit> element.
This is done by using the <mrk> element. See below for details.
Inline Elements
The ITS attributes for Localization Quality Issue may be used inline with a <mrk> within a <source> or <target> element in a <unit> element. For example, for a single instance of the Localization Quality Issue data category:
<unit id="1"> <segment> <source>This is the content</source> <target><mrk id="m1" type="its:any" its:locQualityIssueType="misspelling" its:locQualityIssueComment="'c'es' is unknown. Could be 'c'est'" its:locQualityIssueSeverity="50">c'es</mrk> le contenu</target> </segment> </unit>
When needed, a stand-off notation can be used; it is located at the unit's extension point (before the first <segment> element).
Note that the reference must use the XLIFF fragment identifier syntax. The fragment identifier prefix for the ITS module/extension is its.
<unit id="1"> <its:locQualityIssues xml:id="lqi1"> <its:locQualityIssue locQualityIssueType="misspelling" locQualityIssueComment="'c'es' is unknown. Could be 'c'est'" locQualityIssueSeverity="50" /> <its:locQualityIssue locQualityIssueType="grammar" locQualityIssueComment="Sentence is not capitalized" locQualityIssueSeverity="20" /> </its:locQualityIssues> <segment> <source>This is the content</source> <target><mrk id="m1" type="its:any" its:locQualityIssuesRef="#its=lqi1">c'es le contenu</mrk></target> </segment> </unit>
Allowed Characters [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
Specifies the characters that are permitted in a given piece of content.
See http://www.w3.org/TR/its20/#allowedchars for more details.
Structural Elements
dF: I do not see why the allowed characters could not be set for whole units, groups or files. If a project is coming from a legacy system the restrictions are likely to be structural..
If a structural element of the original document has an Allowed Characters annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. For example:
Original:
<p its-allowed-characters="[a-zA-Z]">Text</p>
Extraction:
<unit id="1"> <segment> <source><mrk id="m1" type="its:any" its:allowedCharacters="[a-zA-Z]">Text</mrk></source> </segment> </unit>
Inline Elements
Use the ITS attribute on the <mrk> element:
<unit id="1"> <segment> <source>user name: <pc id="1"><mrk id="m1" type="its:any" its:allowedCharacters="[a-zA-Z]">johnDoe</mrk></pc>.</source> </segment> </unit>
Data Categories Not Representing Metadata [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
Elements Within Text [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
Indicates if an element should be treated as part of a text flow, or as a separate "paragraph".
See http://www.w3.org/TR/its20/#elements-within-text for more details.
This data category is not used directly in XLIFF, but it drives what XLIFF element is used to represent the original element in the extracted document:
- withinText='no': Use <unit>.
- withinText='yes': Use an inline element such as <pc>, <sc>/<ec> or <ph>.
- withinText='nested': Use a separate <unit>.
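The way the withinText value drives an extractor's choice of XLIFF construct can be expressed as a simple lookup. The sketch below is illustrative only; the function name and the returned labels ("unit", "inline", "separate-unit") are hypothetical and not defined by ITS or XLIFF.

```python
def xliff_representation(within_text: str) -> str:
    """Return a label for the XLIFF construct an extractor would use."""
    mapping = {
        "no": "unit",               # structural element: gets its own <unit>
        "yes": "inline",            # <pc>, <sc>/<ec> or <ph> inside the text
        "nested": "separate-unit",  # content extracted to a separate <unit>
    }
    try:
        return mapping[within_text]
    except KeyError:
        raise ValueError(f"unknown withinText value: {within_text!r}") from None


for value in ("no", "yes", "nested"):
    print(value, "->", xliff_representation(value))
```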
Target Pointer (==========TO REVIEW)
Provides a way to associate the node of a given source content (i.e. the content to be translated) and the node of its corresponding target content.
See http://www.w3.org/TR/its20/#target-pointer for more details.
This data category is not mapped to XLIFF but used by extracting and merging tools to get the source content from the original document and put back the translated content at its proper location.
Note that ITS processors working on XLIFF documents should use the following rule to locate the source and target content:
<its:targetPointerRule selector="//xlf:source" targetPointer="../xlf:target"/>
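For illustration, the pairing expressed by this rule (each source with the target that shares its parent) could be reproduced in Python with the standard library as sketched below. The sample document and the function name source_target_pairs are assumptions made for this example; a full ITS processor would evaluate the rule's XPath expressions rather than hard-code them.

```python
import xml.etree.ElementTree as ET

NS = {"xlf": "urn:oasis:names:tc:xliff:document:2.0"}

SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="1">
      <segment><source>Text</source><target>Texte</target></segment>
      <segment><source>More</source></segment>
    </unit>
  </file>
</xliff>"""


def source_target_pairs(xliff_text: str):
    """Pair each <source> with the <target> child of the same parent, if any."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for parent in root.iter():
        # Equivalent of selector="//xlf:source" ...
        for source in parent.findall("xlf:source", NS):
            # ... and targetPointer="../xlf:target".
            target = parent.find("xlf:target", NS)
            pairs.append((
                "".join(source.itertext()),
                "".join(target.itertext()) if target is not None else None,
            ))
    return pairs


print(source_target_pairs(SAMPLE))
```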
Id Value (==========TO REVIEW)
Indicates a value that can be used as unique identifier for a given part of the content.
See http://www.w3.org/TR/its20/#idvalue for more details.
Note that the identifiers in XLIFF are not unique per document, so using the Id Value data category to specify IDs in an XLIFF document is largely useless, except when used in very specific contexts that cannot be expressed in the ITS rules. See the Fragment Identifier section for details on IDs in XLIFF 2.
Structural Elements
Use the name attribute in <unit>:
Original:
<p id='p1'>Text of the paragraph.</p>
Extraction:
<unit id='1' name='p1'> <segment> <source>Text of the paragraph.</source> </segment> </unit>
Inline Elements
The Id Value data category is not mapped to inline codes.
Data Categories Not Mapped Yet
Directionality (==========TODO)
Provides information about the text directionality of the content.
See http://www.w3.org/TR/its20/#directionality for more details.
Structural Elements
TODO
Inline Elements
TODO
External Resource [TRANSFER TO XLIFF 2.1 IN PROGRESS]
[Note: Felix has the action item to provide an explanation for the XLIFF 2.1 draft why this category actually isn't being mapped.
We may provide some informative guidance how to extract directionality metadata using XLIFF directionality mechanism..]
Indicates that a node represents or references potentially translatable data in a resource outside the document.
See http://www.w3.org/TR/its20/#externalresource for more details.
Structural Elements
Use the attribute itsxlf:externalResourceRef in <trans-unit>:
Original:
<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its" xmlns:html="http://www.w3.org/1999/xhtml"> <its:externalResourceRefRule selector="//html:video/@src" externalResourceRefPointer="."/> <its:externalResourceRefRule selector="//html:video/@poster" externalResourceRefPointer="."/> </its:rules> .. <video height=360 poster=video-image.png src=http://www.example.com/video/v2.mp width=640>
Extraction:
... <trans-unit id='2' itsxlf:externalResourceRef="http://www.example.com/video/v2.mp"> ...
Inline Elements
Use the attribute itsxlf:externalResourceRef in the inline element that holds the reference (e.g. <x/> or <ph>):
Original:
<!doctype html> <html lang="en"> <head> <meta charset="utf-8"> <title>Data Category: External Resource</title> <script type="application/its+xml"> <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0" xmlns:h="http://www.w3.org/1999/xhtml"> <its:externalResourceRefRule selector="//h:img" externalResourceRefPointer="@src"/> </its:rules> </script> </head> <body> <p>Image: <img src="example.png" alt="Text for the image"></p> </body> </html>
Extraction:
... <trans-unit id='3'> <source>Image: <x id='1' itsxlf:externalResourceRef="example.png"/></source> </trans-unit> ...
Localization Quality Rating [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]
Expresses an overall measurement of the localization quality of a document or an item in a document.
See http://www.w3.org/TR/its20/#lqrating for more details.
Structural Elements
Use the ITS attributes to annotate a <file>, <group>, <trans-unit> or <alt-trans> element.
<trans-unit id="1" its:locQualityRatingScore="100" its:locQualityRatingScoreThreshold="95" its:locQualityRatingProfileRef="http://example.org/qaModel/v13"> <source>text</source> <target>texte</target> </trans-unit>
Inline Elements
Use the ITS attributes of Localization Quality Rating to annotate a segment (<mrk mtype="seg">) or a given span of the content (<mrk mtype="x-its">).
<trans-unit id="1"> <source>Some text</source> <seg-source><mrk mtype="seg" mid="1">Some text</mrk></seg-source> <target><mrk mtype="seg" mid="1" its:locQualityRatingScore="0.56" its:locQualityRatingScoreThreshold="95" >Du texte</mrk></target> </trans-unit>
<trans-unit id="1"> <source>Some text and a term</source> <target>Du texte et un <mrk mtype="x-its" its:locQualityRatingVote="100" its:locQualityRatingVoteThreshold="95" its:locQualityRatingProfileRef="http://example.org/qaModel/v13">terme</mrk></target> </trans-unit>
Storage Size (==========TODO)
Specifies the maximum storage size of a given content.
See http://www.w3.org/TR/its20/#storagesize for more details.
Structural Elements
Use the ITS attributes for Storage Size on the <source> and <target> elements:
<trans-unit id="1"> <source its:storageSize="12" its:storageEncoding="UTF-16" its:lineBreakType="crlf">Text</source> </trans-unit>
Inline Elements
Use the ITS attributes for Storage Size on the <mrk> element:
<trans-unit id="1"> <source><mrk its:storageSize="8" its:storageEncoding="UTF-16" mtype="x-its">CONTINUE</mrk></source> </trans-unit>
References
- Internationalization Tag Set (ITS) Version 2.0: http://www.w3.org/TR/its20/
- XLIFF Version 2.0: http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html