ITS WG Collaborative editing page

Author: Yves Savourel

Unique Identifier

Summary

[R004] It should be possible to attach a unique identifier to any localizable item. This identifier should be unique within a document set, but should be identical across all translations of the same item.

Challenges

To re-use translated text most effectively where content is re-used (either across updated versions or across deliverables), a unique and persistent identifier must be associated with each element.

This identifier allows translation tools to track an item correctly from one version or location to the next. Once a tool has established that it is dealing with the same item, it can examine the content for changes; if nothing has changed, the previous translation can very likely be re-used.
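The tracking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name analyze_changes, the dictionary layout (identifier mapped to text), and the sample messages are all hypothetical and do not come from any particular translation tool.

```python
def analyze_changes(old_source, new_source, old_translation):
    """Compare two versions of a source, item by item, using their identifiers.

    All three arguments are dicts mapping an item identifier to its text.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    reusable = {}      # unchanged items: the previous translation can be re-used
    needs_work = []    # items whose source changed, or that were never translated
    new_items = []     # identifiers that did not exist in the old version
    for item_id, text in new_source.items():
        if item_id not in old_source:
            new_items.append(item_id)
        elif old_source[item_id] == text and item_id in old_translation:
            reusable[item_id] = old_translation[item_id]
        else:
            needs_work.append(item_id)
    return reusable, needs_work, new_items


old_source = {"msg1": "Open file", "msg2": "Close file"}
new_source = {"msg1": "Open file", "msg2": "Close the file", "msg3": "Print"}
old_translation = {"msg1": "Ouvrir le fichier", "msg2": "Fermer le fichier"}

reusable, needs_work, new_items = analyze_changes(old_source, new_source, old_translation)
```

In this sample, only msg1 keeps its previous translation; msg2 has changed and msg3 is new, so both go to the translator.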

Change analysis is an extremely powerful productivity tool for translation compared with typical source matching (a.k.a. translation memory) techniques, which simply look for similar source text in the database and usually cannot tell whether the context of its use is the same.

This change analysis technique has long been possible with user-interface messages, and the introduction of structured XML (and SGML) documents makes it possible to use it in documents as well.

Notes

The xml:id attribute [XML ID][1] may be a means to carry the unique identifier.

Note that, while an xml:id value is unique within a document and not necessarily within a set of documents, combining the xml:id value with the URI of the document would ensure a globally unique identifier.
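As a rough illustration of this combination, the following Python sketch collects the xml:id values of a document and joins each one to the document URI as a fragment identifier. The function name global_ids, the sample document, and the example.com URI are assumptions made for this sketch only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix through its namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def global_ids(document_uri, xml_text):
    """Return one 'document URI # xml:id' string per element carrying an xml:id."""
    root = ET.fromstring(xml_text)
    return [
        f"{document_uri}#{element.get(XML_ID)}"
        for element in root.iter()          # root and descendants, document order
        if element.get(XML_ID) is not None  # skip elements without an xml:id
    ]


sample = '<doc><p xml:id="p1">Hello</p><p xml:id="p2">World</p><p>No ID</p></doc>'
ids = global_ids("http://example.com/docs/guide.xml", sample)
```

The result is only as unique as the document URI is stable: if the document moves, the combined identifier changes with it.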

Quick Guidelines

There are multiple methods for creating unique identifiers, for example:

Using the CRC of the document, with the current UTC time in milliseconds added as a modifier. See an example of this solution in the xml:tm specification [2].
Using the mechanism described in the Java API documentation for UUIDs. See [3] for details.
Using URIs, where the in-document ID is a fragment identifier.
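The three methods listed above could look roughly like this in Python. The sketch makes several assumptions: the "CRC-milliseconds" string layout is invented here and is not necessarily the format used by xml:tm [2]; Python's uuid module stands in for the Java UUID API [3]; and the function names are hypothetical.

```python
import time
import uuid
import zlib


def crc_time_id(document_bytes, now_ms=None):
    """CRC-32 of the document, with the UTC time in milliseconds as a modifier.

    The 'crc-milliseconds' layout is an assumption made for this sketch.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # milliseconds since the Unix epoch
    crc = zlib.crc32(document_bytes) & 0xFFFFFFFF
    return f"{crc:08x}-{now_ms}"


def random_uuid_id():
    """Random (version 4) UUID in its canonical 36-character text form."""
    return str(uuid.uuid4())


def uri_id(document_uri, local_id):
    """Document URI with the in-document ID as the fragment identifier."""
    return f"{document_uri}#{local_id}"
```

One caveat if such values are stored in xml:id: an xml:id value must be an NCName, which cannot start with a digit, so CRC-based or UUID-based identifiers may need a letter or underscore prefix.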

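There are various ways to check that IDs are unique within a document. For example, the following XQuery user-defined function returns an error message for every element whose id attribute value is also carried by another element: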

xquery version "1.0";
declare namespace eg = "http://example.com#";

(: Returns one error message for each element whose @id value
   also appears on another element of the input. :)
declare function eg:checkUniqueness($inputSequence as item()*) as item()*
{
  for $node in $inputSequence//element()
  for $comparedNodes in ($inputSequence//element() except $node)
  where $comparedNodes/@id = $node/@id
  return "Error: ID is already defined."
};

The same function can be defined in XSLT:

<xsl:function name="eg:checkUniqueness" as="item()*">
  <xsl:param name="inputSequence" as="item()*"/>
  <xsl:for-each select="$inputSequence//element()">
    <xsl:variable name="node" select="."/>
    <xsl:variable name="comparedNodes" select="$inputSequence//element() except $node"/>
    <xsl:for-each select="$comparedNodes[@id = $node/@id]">
      <xsl:text>Error: ID is already defined.</xsl:text>
    </xsl:for-each>
  </xsl:for-each>
</xsl:function>
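Both functions above compare every element with every other element, so a duplicated value is reported once per element that carries it. As a possible alternative, the following Python sketch counts the values in a single pass and lists each duplicated value once. The function name duplicate_ids and the sample document are assumptions made for this illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(xml_text, attribute="id"):
    """Return the sorted list of attribute values carried by more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        element.get(attribute)
        for element in root.iter()             # root and all descendants
        if element.get(attribute) is not None  # ignore elements without the attribute
    )
    return sorted(value for value, count in counts.items() if count > 1)


sample = '<doc><p id="a"/><p id="b"/><note id="a"/></doc>'
duplicates = duplicate_ids(sample)
```

An empty list means that every id value in the document is unique.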