XLIFF Mapping

From MultilingualWeb-LT EC Project Wiki
Revision as of 21:52, 5 March 2013 by Dlewis6

This page provides a tentative mapping of the ITS 2.0 data categories to XLIFF 1.2 and XLIFF 2.0.

Notes:

  • When posting emails about this topic, please refer to Issue 55
  • The namespace prefix 'TBD' has been replaced with 'itsx'.
  • The 'structural' entries relate to the case where the element with the ITS information is a non-inline (structural) element, for example a <p> in HTML.
  • The 'inline' entries relate to the case where the element with the ITS information is an inline element. For example a <span> in HTML.
  • In general, this table first attempts to map ITS data categories and their attributes to corresponding attributes or elements in XLIFF 1.2 and 2.0. A native ITS attribute or element is introduced into XLIFF only if no native XLIFF equivalent exists, and then only where XLIFF permits extensibility. In other words, the mapping aims to ensure that the resulting document is a conformant XLIFF document.
  • itsx: is the namespace prefix for a schema that should be designed and hosted by the W3C MLW-LT Working Group to facilitate ITS mappings such as this XLIFF mapping. The rationale is that XLIFF processors handling the ITS mappings usually won't be generic ITS processors able to parse native ITS constructs properly.
  • Color code (used to highlight table entries):
      - stuck in the XLIFF TC
      - dependent on an unstable ITS data category
      - needs W3C I18N WG review

The mapping currently takes the following approach:

Data categories, with the "driver" of each entry in parentheses, and their XLIFF 1.2 and XLIFF 2.0 mappings:
Translate (Yves)
  structural:
    XLIFF 1.2: no extraction, or
      <trans-unit id='id' translate='yes|no'>
    XLIFF 2.0: no extraction, or
      <unit id='id'>
       <segment translate='yes|no'>
  inline:
    XLIFF 1.2: inline code, or
      <mrk mtype="protected">...</mrk>
      <mrk mtype="x-its-Translate-Yes">...</mrk>
      A value for 'not-protected' needs to be defined. dF: proposed verbose value: x-its-Translate-Yes.
    XLIFF 2.0: inline code, or
      <mrk id="id" translate="yes|no">
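The XLIFF 1.2 column above can be illustrated with a small sketch using Python's standard xml.etree.ElementTree. The function names are hypothetical. "x-its-Translate-Yes" is only the value proposed above and is not defined by XLIFF 1.2 itself. The "no extraction" alternative is left to the caller.

```python
import xml.etree.ElementTree as ET

XLIFF12 = "urn:oasis:names:tc:xliff:document:1.2"

# mtype values taken from the table; "x-its-Translate-Yes" is only the
# proposed (not yet agreed) value for translatable inline content.
MTYPE_FOR_TRANSLATE = {"no": "protected", "yes": "x-its-Translate-Yes"}


def inline_translate_to_mrk(text: str, its_translate: str) -> ET.Element:
    """Wrap inline content carrying its:translate in an XLIFF 1.2 <mrk>."""
    if its_translate not in MTYPE_FOR_TRANSLATE:
        raise ValueError(f"its:translate must be 'yes' or 'no', got {its_translate!r}")
    mrk = ET.Element(f"{{{XLIFF12}}}mrk", {"mtype": MTYPE_FOR_TRANSLATE[its_translate]})
    mrk.text = text
    return mrk


def structural_translate_to_trans_unit(unit_id: str, its_translate: str) -> ET.Element:
    """Carry its:translate on a structural element over to trans-unit/@translate."""
    if its_translate not in ("yes", "no"):
        raise ValueError(f"its:translate must be 'yes' or 'no', got {its_translate!r}")
    return ET.Element(f"{{{XLIFF12}}}trans-unit", {"id": unit_id, "translate": its_translate})


# Hypothetical usage: a brand name marked its:translate="no" inside a sentence.
mrk = inline_translate_to_mrk("ACME", "no")
print(ET.tostring(mrk, encoding="unicode"))
```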
Localization Note (Yves)
  structural:
    XLIFF 1.2:
      <note>
      its:locNoteType alert: priority="1"; description: priority > 1
    XLIFF 2.0:
      <note>
      Note is now extensible in XLIFF 2.0, so its:locNoteType can be used.
  inline:
    XLIFF 1.2:
      <mrk mtype='x-its' comment='[note]' itsx:locNoteType='alert|description'>
      Is this a best practice?
    XLIFF 2.0:
      <mrk id='1' type='comment' value='[note]'>
      Should extensibility be introduced here?
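The priority mapping for the XLIFF 1.2 structural case can be sketched as follows. This sketch assumes priority "2" for descriptions, since the table only requires priority > 1. It also relies on XLIFF 1.2's default note priority of 1, so a <note> without a priority attribute reads back as an alert. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF12 = "urn:oasis:names:tc:xliff:document:1.2"

# Assumption: "2" for descriptions; the table only asks for priority > 1.
DESCRIPTION_PRIORITY = "2"


def loc_note_to_note(text: str, loc_note_type: str = "description") -> ET.Element:
    """Map its:locNote plus its:locNoteType to an XLIFF 1.2 <note>."""
    if loc_note_type == "alert":
        priority = "1"
    elif loc_note_type == "description":  # the ITS default locNoteType
        priority = DESCRIPTION_PRIORITY
    else:
        raise ValueError(f"unknown its:locNoteType {loc_note_type!r}")
    note = ET.Element(f"{{{XLIFF12}}}note", {"priority": priority})
    note.text = text
    return note


def note_to_loc_note_type(note: ET.Element) -> str:
    """Reverse mapping; an absent priority counts as XLIFF's default of 1."""
    return "alert" if int(note.get("priority", "1")) == 1 else "description"
```

Because of that default, a round trip only preserves "description" if the writer always sets an explicit priority.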
Terminology (???)
  XLIFF 1.2: recommendation: use Terminology only inline.
    For source with its:term="yes":
      <mrk mtype="term" its:termInfoRef="#ge4" its:termConfidence="0.5">Bohemia</mrk>
    If its:termConfidence is used, then the annotated text must be contained within an element with a relevant its:annotatorsRef, e.g.:
      its:annotatorsRef="terminology|http://www.cngl.ie/termchecker"
    For source with its:term="no":
      <mrk mtype="x-its-term-no">...</mrk>
  XLIFF 2.0, inline:
    <mrk type='term' value='info'> or <mrk type='term' ref='infoRef'>
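In ITS 2.0, the its:annotatorsRef value used above is a space-separated list of "data-category|IRI" pairs, with at most one annotator per data category. Here is a minimal parser sketch; the function name is hypothetical.

```python
def parse_annotators_ref(value: str) -> dict[str, str]:
    """Split an its:annotatorsRef value into {data category identifier: IRI}."""
    annotators: dict[str, str] = {}
    for item in value.split():
        # Split on the first "|" only, so the IRI part is kept intact.
        category, sep, iri = item.partition("|")
        if not sep or not category or not iri:
            raise ValueError(f"malformed annotatorsRef item {item!r}")
        if category in annotators:
            raise ValueError(f"more than one annotator for {category!r}")
        annotators[category] = iri
    return annotators
```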
Directionality (???)
  structural:
    XLIFF 1.2: its:dir on trans-unit
    XLIFF 2.0: XLIFF 2.0 directionality mechanism
  inline:
    XLIFF 1.2: Unicode (bidi control) characters
    XLIFF 2.0: XLIFF 2.0 directionality mechanism
Ruby (???)
  XLIFF 1.2: TBD
  XLIFF 2.0: TBD
Language information (???)
  structural: fall back on mrk if needed
  inline:
    XLIFF 1.2:
      <mrk mtype='x-itsLang???' xml:lang='lang'>
    XLIFF 2.0:
      <mrk type='its:Lang' value='en'>
Element Within Text (Yves)
  XLIFF 1.2:
    yes: inline codes
    no: trans-unit
    nested: <sub>
  XLIFF 2.0:
    yes: inline codes
    no: unit
    nested: sub-flows mechanism
Domain (???)
  XLIFF 1.2: itsx:domain="Travel" (itsx namespace pending)
  XLIFF 2.0: itsx:domain
Text Analysis (Dave and David)
  XLIFF 1.2: recommendation: use Text Analysis only inline:
    <mrk mtype="phrase" its:taConfidence="0.7"
        its:taClassRef="http://nerd.eurecom.fr/ontology#Place"
        its:taIdentRef="http://dbpedia.org/resource/Arizona">
        Arizona</mrk>
    If its:taConfidence is used, then the annotated text must be contained within an element with a relevant its:annotatorsRef, e.g.:
      its:annotatorsRef="text-analysis|http://enrycher.ijs.si"
  XLIFF 2.0, inline: as in 1.2, <mrk mtype='phrase'> using the native ITS attributes and, if used, a comment for the resolved prose text
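The rule that a confidence score needs a relevant its:annotatorsRef on the element itself or on an ancestor can be checked mechanically. This ElementTree sketch uses the parent-map idiom and covers only its:taConfidence and its:termConfidence. It treats the nearest element that names the data category as sufficient and does not model every detail of ITS annotatorsRef inheritance. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"

# Confidence attribute -> data category identifier expected in annotatorsRef.
CONFIDENCE_ATTRS = {
    f"{{{ITS}}}taConfidence": "text-analysis",
    f"{{{ITS}}}termConfidence": "terminology",
}


def annotated_categories(elem: ET.Element) -> set[str]:
    """Data categories named in this element's own its:annotatorsRef."""
    value = elem.get(f"{{{ITS}}}annotatorsRef", "")
    return {item.partition("|")[0] for item in value.split()}


def missing_annotators(root: ET.Element) -> list[tuple[str, str]]:
    """List (tag, category) for confidence scores with no annotator in scope."""
    parent = {child: p for p in root.iter() for child in p}
    problems = []
    for elem in root.iter():
        for attr, category in CONFIDENCE_ATTRS.items():
            if elem.get(attr) is None:
                continue
            node = elem
            # Walk up until some element declares an annotator for this category.
            while node is not None and category not in annotated_categories(node):
                node = parent.get(node)
            if node is None:
                problems.append((elem.tag, category))
    return problems
```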
Locale Filter (Yves)
  structural: same as for Translate (XLIFF 1.2 and 2.0)
  inline: same as for Translate (XLIFF 1.2 and 2.0)
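Locale Filter reuses the Translate mappings, so an extractor first has to decide whether an element applies to the target locale at all. This sketch assumes its:localeFilterList holds comma-separated extended language ranges that are matched with RFC 4647 extended filtering, with the ITS defaults "*" and include. The function names are hypothetical. A False result would then be mapped as for Translate, that is, no extraction or translate="no".

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 section 3.3.2 extended filtering, compared case-insensitively."""
    rng = language_range.lower().split("-")
    sub = tag.lower().split("-")
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    i = j = 1
    while i < len(rng):
        if rng[i] == "*":
            i += 1  # a non-initial wildcard is simply skipped (step 3A)
        elif j >= len(sub):
            return False
        elif rng[i] == sub[j]:
            i += 1
            j += 1
        elif len(sub[j]) == 1:
            return False  # reached a singleton such as "x": no match
        else:
            j += 1  # skip a tag subtag that the range does not mention
    return True


def locale_filter_applies(target_locale: str, filter_list: str = "*",
                          filter_type: str = "include") -> bool:
    """Does content with this its:localeFilterList/Type apply to target_locale?"""
    ranges = [r.strip() for r in filter_list.split(",") if r.strip()]
    matched = any(extended_filter_match(r, target_locale) for r in ranges)
    return matched if filter_type == "include" else not matched
```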
Provenance (Dave)
  structural:
    XLIFF 1.2:
      <target its:provenanceRecordsRef="#ph3">
      ...
      <its:provenanceRecords xml:id="ph3">
         <its:provenanceRecord 
           person="John Doe"
           orgRef="http://www.legaltrans-ex.com/"
           revPerson="Tommy Atkins"
           revOrgRef="http://www.vistatec.com/"
           provRef="http://www.examplelsp.com/excontent987/legal/prov/e6354"/>
         <its:provenanceRecord 
           revPerson="John Smith"
           revOrgRef="http://john-smith.qa.example.com"/>
       </its:provenanceRecords>
    XLIFF 2.0: same markup as for XLIFF 1.2
  inline:
    XLIFF 1.2:
       <mrk mtype="x-its" its:provenanceRecordsRef="#ph3">
      or
       <mrk mtype="seg" its:provenanceRecordsRef="#ph3">
    XLIFF 2.0:
       <mrk id='1' type="its:provenanceRecordsRef" ref="#ph3">
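For a tool that consumes the structural or inline provenance mappings above, resolving a local its:provenanceRecordsRef is a simple id lookup. This ElementTree sketch assumes the its:provenanceRecords element is somewhere in the same XLIFF file (the page does not fix its position) and handles only "#id" references. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_provenance(root: ET.Element, holder: ET.Element) -> list[dict[str, str]]:
    """Return the records referenced by holder's its:provenanceRecordsRef."""
    ref = holder.get(f"{{{ITS}}}provenanceRecordsRef")
    if ref is None:
        return []
    if not ref.startswith("#"):
        raise ValueError(f"only local '#id' references are handled here: {ref!r}")
    for records in root.iter(f"{{{ITS}}}provenanceRecords"):
        if records.get(XML_ID) == ref[1:]:
            # Record attributes (person, orgRef, revPerson, ...) are unprefixed.
            return [dict(rec.attrib) for rec in records.findall(f"{{{ITS}}}provenanceRecord")]
    raise KeyError(f"no its:provenanceRecords with xml:id {ref[1:]!r}")
```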
External Resource (???)
  structural:
    XLIFF 1.2: itsx:itsExternalResource on trans-unit
    XLIFF 2.0: itsx:itsExternalResource on unit
  inline:
    XLIFF 1.2:
      <mrk mtype="x-its" itsx:itsExternalResource="[uri]">
    XLIFF 2.0:
      <mrk id='1' type="itsx:externalResource" ref="[uri]">
Target Pointer (Yves)
  XLIFF 1.2 and 2.0: N/A in the XLIFF document; used when extracting and merging.
Id Value (Yves)
  structural:
    XLIFF 1.2: <trans-unit resname="[value]">
    XLIFF 2.0: <unit name="[value]">
  inline: N/A (XLIFF 1.2 and 2.0)
Preserve Space (???)
  structural: xml:space (XLIFF 1.2 and 2.0)
  inline: not yet mapped (XLIFF 1.2 and 2.0)
Localization Quality Issue (Yves)
  structural:
    XLIFF 1.2: on trans-unit (recommended when the issue relates to the translation):
      <trans-unit its:locQualityIssuesRef="#lqi1">
      ...
      <its:locQualityIssues xml:id="lqi1">
       <its:locQualityIssue locQualityIssueType locQualityIssueComment
        locQualityIssueSeverity locQualityIssueProfileRef />
      </its:locQualityIssues>
      The reference can also be placed on the source or target element when the issue concerns that element alone rather than the translation between them.
    XLIFF 2.0: on unit:
      <unit its:locQualityIssuesRef="#lqi1">
      ...
      <its:locQualityIssues xml:id="lqi1">
       <its:locQualityIssue locQualityIssueType locQualityIssueComment
        locQualityIssueSeverity locQualityIssueProfileRef />
      </its:locQualityIssues>
  inline:
    XLIFF 1.2:
      <mrk mtype="x-its" its:locQualityIssuesRef="#lqi1">
      ...
      <its:locQualityIssues xml:id="lqi1">
       <its:locQualityIssue locQualityIssueType locQualityIssueComment
        locQualityIssueSeverity locQualityIssueProfileRef locQualityIssueEnabled/>
      </its:locQualityIssues>
    XLIFF 2.0:
      <mrk type="its:lqi" ref="#lqi1">
      ...
      <its:locQualityIssues xml:id="lqi1">
       <its:locQualityIssue locQualityIssueType locQualityIssueComment
        locQualityIssueSeverity locQualityIssueProfileRef locQualityIssueEnabled/>
      </its:locQualityIssues>
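A consumer of these stand-off issues might, for example, list the enabled issues at or above a severity threshold. This is a sketch with a hypothetical function name. It takes locQualityIssueEnabled to default to "yes", as in ITS 2.0. It treats a missing severity as 0, which is an assumption of this example only.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def active_issues(root: ET.Element, issues_ref: str,
                  min_severity: float = 0.0) -> list[tuple[str, float]]:
    """Enabled issues of the referenced its:locQualityIssues, filtered by severity."""
    wanted = issues_ref.removeprefix("#")
    for group in root.iter(f"{{{ITS}}}locQualityIssues"):
        if group.get(XML_ID) != wanted:
            continue
        found = []
        for issue in group.findall(f"{{{ITS}}}locQualityIssue"):
            # locQualityIssueEnabled defaults to "yes" in ITS 2.0.
            if issue.get("locQualityIssueEnabled", "yes") != "yes":
                continue
            # Assumption of this sketch: a missing severity counts as 0.
            severity = float(issue.get("locQualityIssueSeverity", "0"))
            if severity >= min_severity:
                found.append((issue.get("locQualityIssueType"), severity))
        return found
    raise KeyError(f"no its:locQualityIssues with xml:id {wanted!r}")
```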
Localization Quality Rating (dF)
  structural: not yet mapped (XLIFF 1.2 and 2.0)
  inline: not yet mapped (XLIFF 1.2 and 2.0)
MT Confidence (dF)
  structural:
    XLIFF 1.2: In alt-trans, it is recommended that the existing xlf:match-quality attribute be used to represent the value of its:mtConfidence. In this case, the alt-trans origin attribute should be set to "MT", e.g.
      <alt-trans mid="0" match-quality="0.546" origin="MT">

      Note: Only when the XLIFF files are used with tools that do not consume the XLIFF alt-trans match-quality and origin attributes should its:mtConfidence be considered, and then only on the target and bin-target sub-elements of alt-trans, e.g.

      <alt-trans>
       <target its:mtConfidence="0.8982">some translated text</target>
      </alt-trans>

      In addition, if the content of an alt-trans target element is copied verbatim to the target element of a trans-unit (i.e. the MT output is not post-edited), then its:mtConfidence can be used on the trans-unit target element if the whole unit was translated at once, or otherwise on each segment mrk element individually, e.g.

      <trans-unit>
        <target>
         <mrk mtype="seg" its:mtConfidence="0.8982">some translated text</mrk>
        </target>
      </trans-unit>
    XLIFF 2.0: not yet mapped
  inline:
    XLIFF 1.2: TO BE REVISITED: does this make sense given the above, apart from perhaps for differential sub-segment confidence scores? This is only relevant with mtype="seg".
    XLIFF 2.0: not yet mapped
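The XLIFF 1.2 structural guidance can be sketched as two helpers. The first reads the confidence from an alt-trans, preferring match-quality with origin="MT" and falling back to its:mtConfidence on the alt-trans target. The second copies that confidence to the trans-unit target only when the MT output was used verbatim. This sketch assumes match-quality holds a plain decimal such as "0.546", although XLIFF 1.2 does not constrain its format. It handles only the whole-unit case, not per-segment mrk elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XLIFF12 = "urn:oasis:names:tc:xliff:document:1.2"
ITS = "http://www.w3.org/2005/11/its"
MT_CONF = f"{{{ITS}}}mtConfidence"


def mt_confidence(alt_trans: ET.Element) -> Optional[float]:
    """Confidence of an XLIFF 1.2 alt-trans, per the recommended mapping."""
    quality = alt_trans.get("match-quality")
    if alt_trans.get("origin") == "MT" and quality is not None:
        return float(quality)  # assumes a plain decimal such as "0.546"
    target = alt_trans.find(f"{{{XLIFF12}}}target")
    if target is not None and target.get(MT_CONF) is not None:
        return float(target.get(MT_CONF))  # fallback: its:mtConfidence on target
    return None


def carry_over_confidence(alt_trans: ET.Element, final_target: ET.Element) -> bool:
    """Copy the MT confidence to the trans-unit target only if unedited."""
    mt_target = alt_trans.find(f"{{{XLIFF12}}}target")
    confidence = mt_confidence(alt_trans)
    if mt_target is None or confidence is None:
        return False
    if "".join(mt_target.itertext()) != "".join(final_target.itertext()):
        return False  # post-edited: the score no longer describes this text
    final_target.set(MT_CONF, str(confidence))
    return True
```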
Allowed Characters (Yves)
  structural:
    XLIFF 1.2: <trans-unit its:allowedCharacters>
    XLIFF 2.0: <unit its:allowedCharacters>
  inline:
    XLIFF 1.2:
      <mrk mtype="x-its" its:allowedCharacters="[pattern]">
    XLIFF 2.0:
      <mrk id="id" type="its:allowedCharacters"
       value="[pattern]">
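The its:allowedCharacters value is a regular-expression character class, so a target can be checked one character at a time. This sketch uses Python's re module as an approximation. It assumes the pattern uses only syntax shared by XML Schema regular expressions and Python, so XSD-only constructs such as character class subtraction are not supported. The function name is hypothetical.

```python
import re


def disallowed_characters(text: str, allowed_characters: str) -> list[str]:
    """Characters of text that do not match the its:allowedCharacters class."""
    # Python re syntax stands in for XML Schema regex here (an approximation).
    char_class = re.compile(allowed_characters)
    return [ch for ch in text if not char_class.fullmatch(ch)]
```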
Storage Size (Yves)
  structural:
    XLIFF 1.2:
      <trans-unit its:storageSize its:storageEncoding its:lineBreakType>
      (maxbytes is not enough, and pointer and local markup can't both be used)
    XLIFF 2.0:
      <unit its:storageSize its:storageEncoding its:lineBreakType>
      (see also the possible module from FE)
  inline:
    XLIFF 1.2:
      <mrk mtype="x-its" its:storageSize its:storageEncoding its:lineBreakType>
    XLIFF 2.0: ??? (Could use a delimited string in the @value attribute of mrk. Otherwise, this opens the question of whether extended attributes are allowed in <mrk>.)
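To check content against the structural mappings above, a tool needs the byte length of the content after two steps. First, line breaks are normalized to its:lineBreakType. Second, the text is encoded with its:storageEncoding. The ITS defaults are lf and UTF-8. This is a sketch with hypothetical function names. Python's "UTF-16" codec prepends a byte-order mark, so "UTF-16LE" or "UTF-16BE" may be the better codec name. Characters that the encoding cannot represent raise UnicodeEncodeError.

```python
# ITS lineBreakType values and the characters they stand for.
LINE_BREAKS = {"lf": "\n", "cr": "\r", "crlf": "\r\n"}


def storage_bytes(text: str, storage_encoding: str = "UTF-8",
                  line_break_type: str = "lf") -> int:
    """Byte length of text once line breaks are normalized and it is encoded."""
    # Collapse any existing CRLF/CR to LF first, then emit the requested break.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    normalized = normalized.replace("\n", LINE_BREAKS[line_break_type])
    return len(normalized.encode(storage_encoding))


def fits_storage_size(text: str, storage_size: int, storage_encoding: str = "UTF-8",
                      line_break_type: str = "lf") -> bool:
    """True if the encoded text needs no more than storage_size bytes."""
    return storage_bytes(text, storage_encoding, line_break_type) <= storage_size
```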

1 Notes

1.1 Provenance mapping

1.1.1 Best Practice

In XLIFF, the ITS provenance annotation should only be added as local stand-off markup, i.e. using an its:provenanceRecords element within the XLIFF file. This makes it easy to add further its:provenanceRecord elements as additional translation, translation revision or other activities recorded in external provenance records are carried out on the XLIFF file.

If the its:provenanceRecords element referenced by an its:provenanceRecordsRef contains any of the translation or translation-revision related attributes, namely its:person, its:personRef, its:org, its:orgRef, its:tool, its:toolRef, its:revPerson, its:revPersonRef, its:revOrg, its:revOrgRef, its:revTool or its:revToolRef, then the its:provenanceRecordsRef should only be used as local or global annotation selecting xlf:target or xlf:bin-target elements, or an xlf:mrk inline element within either of those. This is because, in this case, the provenance markup applies only to translated text.

If the its:provenanceRecords element referenced by an its:provenanceRecordsRef contains only the provRef attribute, then the its:provenanceRecordsRef may be used as local or global annotation selecting any XLIFF element. This is because the its:provRef attribute may point to external provenance records describing an activity that produced the textual content of any element in an XLIFF file.
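The two rules above can be expressed as a small check. This sketch takes the attributes of each its:provenanceRecord, using the unprefixed attribute names of the example records in the table. It also takes the local names of the referencing element and its ancestors, and uses XLIFF 1.2 element names. The function and parameter names are hypothetical.

```python
# Translation-related attributes of its:provenanceRecord (unprefixed, as in the
# example records in the table above).
TRANSLATION_ATTRS = {
    "person", "personRef", "org", "orgRef", "tool", "toolRef",
    "revPerson", "revPersonRef", "revOrg", "revOrgRef", "revTool", "revToolRef",
}
TRANSLATED_CONTAINERS = {"target", "bin-target"}


def may_reference(records: list[dict[str, str]], element: str,
                  ancestors: list[str]) -> bool:
    """Best-practice check for where an its:provenanceRecordsRef may appear.

    records: attributes of each its:provenanceRecord in the referenced set.
    element: local name of the XLIFF element carrying the reference.
    ancestors: local names of that element's ancestors.
    """
    translation_related = any(not TRANSLATION_ATTRS.isdisjoint(rec) for rec in records)
    if not translation_related:
        return True  # only provRef: any XLIFF element may reference the records
    if element in TRANSLATED_CONTAINERS:
        return True
    # Otherwise only an inline mrk inside target or bin-target qualifies.
    return element == "mrk" and any(a in TRANSLATED_CONTAINERS for a in ancestors)
```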

If additional activities on an XLIFF file produce values in an its:provenanceRecord that diverge from those of other elements referencing the same its:provenanceRecords, then that its:provenanceRecords element must be copied to a new element with a distinct id, and the reference attribute of the affected element(s) must be changed to point to the new id.
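The forking step above can be sketched with ElementTree. This sketch assumes the copy may be placed directly after the original its:provenanceRecords, and that the caller supplies an unused xml:id. It also assumes the records element is not the document root. The function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
REF = f"{{{ITS}}}provenanceRecordsRef"


def fork_provenance(root: ET.Element, element: ET.Element, new_id: str) -> ET.Element:
    """Give element its own copy of the its:provenanceRecords it references.

    Other elements keep pointing at the original. Returns the copy so that new
    its:provenanceRecord children can be appended to it.
    """
    old_id = element.get(REF, "").removeprefix("#")
    if any(e.get(XML_ID) == new_id for e in root.iter()):
        raise ValueError(f"xml:id {new_id!r} is already in use")
    for records in root.iter(f"{{{ITS}}}provenanceRecords"):
        if records.get(XML_ID) == old_id:
            break
    else:
        raise KeyError(f"no its:provenanceRecords with xml:id {old_id!r}")
    parent = {child: p for p in root.iter() for child in p}
    container = parent[records]  # assumes the records are not the root element
    fork = copy.deepcopy(records)
    fork.set(XML_ID, new_id)
    container.insert(list(container).index(records) + 1, fork)
    element.set(REF, "#" + new_id)
    return fork
```

Only the element whose provenance has diverged is re-pointed. Every other element still shares the original record set.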

1.1.2 Design Note

Note that XLIFF 1.2 supports some constructs that could map to ITS provenance annotation, as outlined below. This mapping is not recommended.

<trans-unit phase-name="ph1">
<target phase-name="ph2">
...
<phase-group>
 <phase phase-name="ph1"
   process-name="translate"
   company-name="[value of its:orgRef or its:org]"
   tool-id="tl1"
   contact-name="[value of its:person]"
   contact-email="[value of its:personRef IF it has scheme 'mailto:']"
   />
 <phase phase-name="ph2"
 ...
   />
</phase-group>
 ...
<tool tool-id="tl1"
  tool-name="[value of its:toolRef or its:tool]"/>

One limitation, however, is that an its:personRef value with a scheme other than 'mailto:' cannot be mapped into contact-email. The proposed mapping therefore uses a reference to the ITS stand-off record, its:provenanceRecords. This also gives the same mapping for XLIFF 1.2 and XLIFF 2.0, where phase-group is not available.
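The limitation shows up clearly in a small sketch of the contact-email step of this (not recommended) mapping. Only a mailto: IRI yields an address, and any other its:personRef has no contact-email equivalent. The function name is hypothetical. It uses urllib.parse.urlsplit from the Python standard library.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def contact_email_from_person_ref(person_ref: str) -> Optional[str]:
    """XLIFF 1.2 contact-email for an its:personRef, or None if not mailto:."""
    parts = urlsplit(person_ref)
    if parts.scheme != "mailto":  # urlsplit lower-cases the scheme
        return None
    return unquote(parts.path)  # drops any ?subject=... query part
```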