Applying ITS 1.0 functionality in various serializations of HTML 5

What is ITS 1.0?

The Internationalization Tag Set (ITS) 1.0 is a W3C Recommendation that supports the internationalization and localization of XML. ITS 1.0 defines so-called "data categories" for internationalization and localization, and there are various use cases for them. These are described in the Recommendation and in the accompanying document Best Practices for XML Internationalization. Here we illustrate one use case, based on the ITS 1.0 data category "Translate". "Translate" is a means to specify whether or not a piece of content needs to be translated during the localization process. A typical scenario is:

  1. The author or a localization engineer specifies what parts need or do not need translation.
  2. The difference between these parts is made obvious to the translator, e.g. via extraction of translatable text or highlighting.
  3. The translator uses this information for the translation.

ITS 1.0 is an important means of supporting this process. An example input document is given below.

<messages xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
 <msg num="123">Click Resume Button on Status Display or  
  <panelmsg its:translate="no">CONTINUE</panelmsg> Button on printer panel</msg>
</messages>

By default, ITS 1.0 "Translate" treats element content as translatable and attribute values as not translatable. In the example, the content of the <panelmsg> element must not be translated. This is expressed via the so-called "local" ITS 1.0 attribute its:translate with the value no.
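
The local behaviour can be sketched in a few lines of Python. This is a minimal illustration, not an ITS 1.0 implementation: it assumes that a local its:translate value applies to the element and its descendants until overridden, it ignores attributes, and the function name translate_flags is made up for this example.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = f"{{{ITS_NS}}}translate"

DOC = """<messages xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
 <msg num="123">Click Resume Button on Status Display or
  <panelmsg its:translate="no">CONTINUE</panelmsg> Button on printer panel</msg>
</messages>"""


def translate_flags(root):
    """Map each element to True (translatable) or False (not translatable)."""
    flags = {}

    def walk(elem, inherited):
        # A local its:translate overrides the inherited value.
        local = elem.get(TRANSLATE_ATTR)
        current = inherited if local is None else (local == "yes")
        flags[elem] = current
        for child in elem:
            walk(child, current)

    # Element content is translatable by default.
    walk(root, True)
    return flags


root = ET.fromstring(DOC)
flags = translate_flags(root)
for elem, translatable in flags.items():
    print(elem.tag, "translatable" if translatable else "not translatable")
```

Only <panelmsg> is reported as not translatable; <messages> and <msg> keep the default.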

In addition to such local usage of ITS 1.0, there are global ITS "rules". These rules express the same functionality as local ITS 1.0 markup, but they are independent of any position in the target document and can be applied to several documents or parts of documents. This is achieved via XPath. An example is given below.

<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
 <its:translateRule selector="//panelmsg" translate="no"/>
</its:rules>

The <its:rules> element contains an <its:translateRule> element with two attributes. The value of the selector attribute is an XPath expression which selects all <panelmsg> elements. The value of the translate attribute has the same function as the local its:translate attribute.
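
As a rough sketch of how such a rule could be evaluated, the following Python code reads the rules document and collects the selected elements from a target document that contains no ITS markup at all. It makes several assumptions: Python's ElementTree supports only a subset of XPath, so the leading "//" is rewritten to ".//", precedence between multiple rules is not handled, and the helper name apply_translate_rules is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

RULES = """<its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
 <its:translateRule selector="//panelmsg" translate="no"/>
</its:rules>"""

# The target document carries no ITS markup.
DOC = """<messages>
 <msg num="123">Click Resume Button on Status Display or
  <panelmsg>CONTINUE</panelmsg> Button on printer panel</msg>
</messages>"""


def apply_translate_rules(rules_root, doc_root):
    """Return the elements that the rules mark as not translatable."""
    non_translatable = []
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get("selector")
        # ElementTree only accepts relative paths, so "//x" becomes ".//x".
        if selector.startswith("//"):
            selector = "." + selector
        if rule.get("translate") == "no":
            non_translatable.extend(doc_root.findall(selector))
    return non_translatable


rules_root = ET.fromstring(RULES)
doc_root = ET.fromstring(DOC)
hits = apply_translate_rules(rules_root, doc_root)
print([(e.tag, e.text) for e in hits])
```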

With the global approach of ITS 1.0, it becomes possible to apply ITS 1.0 data categories to documents without changing them, since the ITS 1.0 information can be stored independently of the target documents. This will be demonstrated in the following section, using the example of HTML 5 documents in various serializations.

ITS 1.0 in HTML (5) - Problems

Thousands of HTML documents (that is, HTML 4.01, XHTML, HTML 5, ...) are subject to localization. This also means that some parts of them need to be translated, while others do not.

An example input document is given below.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An HTML Document</title>
</head>
<body>
<h1>Example</h1>
<p>This is an example HTML document.</p>
<pre>Some source code</pre>
</body>
</html>

For this document, the author or localization engineer might want to specify that the content of a specific <pre> element should not be translated. However, HTML provides no markup for this purpose. One solution is to extend the XHTML schema with ITS markup. However, this is only applicable to XHTML or the XML serialization of HTML 5, and the usage of local ITS 1.0 markup in the XML serialization might create problems for browser display.

Solution: ITS 1.0 in HTML 5 - serialized as XML or HTML

Our solution for using ITS 1.0 within XHTML and the HTML serialization of HTML 5 can be summarized as follows: to minimize the impact on HTML in both serializations, we do not use ITS 1.0 markup, but markup without a namespace. By means of global ITS rules, this markup is associated with ITS 1.0 functionality. A detailed description is given below.

Adding markup with ITS 1.0 functionality into HTML 5 documents

In the files listed below, non-translatable content is specified in two parts. First, the <pre> element carries an attribute "translate" with the value "no". Second, a separate document, xhtml-sample-rules.xml, holds the ITS 1.0 rules. This rules file contains the following <its:translateRule> element:

 <its:translateRule
 selector="//h:pre[@translate='no'] | //pre[@translate='no']"
 translate="no" xmlns:h="http://www.w3.org/1999/xhtml"/>

This rule means that all <pre> elements with an attribute translate="no" should not be translated. To put it differently, non-ITS 1.0 markup (the translate attribute) is associated with the functionality of the ITS 1.0 "Translate" data category.
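
The selector has two branches because the <pre> element is in the XHTML namespace in the XML serialization, but may have no namespace once an HTML serialization has been parsed. The Python sketch below illustrates this. It assumes ElementTree's limited XPath support, which has no "|" operator, so the union is evaluated one branch at a time by naively splitting the string. The helper name select_union and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SELECTOR = "//h:pre[@translate='no'] | //pre[@translate='no']"

XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>An HTML Document</title></head>
<body>
<h1>Example</h1>
<p>This is an example HTML document.</p>
<pre translate="no">Some source code</pre>
</body>
</html>"""


def select_union(root, selector, namespaces):
    """Evaluate a '|' union of simple '//' paths, one branch at a time."""
    matches = []
    for branch in selector.split("|"):
        # ElementTree needs a relative path, so "//x" becomes ".//x".
        path = "." + branch.strip()
        for elem in root.findall(path, namespaces):
            if elem not in matches:
                matches.append(elem)
    return matches


root = ET.fromstring(XHTML_DOC)
hits = select_union(root, SELECTOR, {"h": XHTML_NS})
print([(e.tag, e.text) for e in hits])
```

For the namespaced document only the first branch matches. A document whose elements carry no namespace would be matched by the second branch instead.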

The files are provided in an HTML serialization, served as text/html, and in an XML serialization, served as application/xhtml+xml.

No ITS functionality    "translate" attribute at pre element
--------------------    ------------------------------------
HTML serialization      HTML serialization
XML serialization       XML serialization

The files with the "translate" attribute have been tested under Windows with the following browsers: DoCoMo P213i (browser unknown), Firefox 2, IE 6, Opera 9, Safari 3. All browsers display the files properly, which demonstrates the limited impact of this approach.

ITS processing

The input file in XML serialization with markup for ITS 1.0 "Translate" functionality is processed with an Ant build file. Note that the Ant file assumes the presence of Saxon 8; the location attribute of the <pathelement> element has to be changed accordingly. The processing steps are:

  1. An ITS 1.0 implementation, e.g. an XSLT stylesheet, processes the XML serialization. The output is an intermediate XSLT stylesheet.
  2. The intermediate stylesheet is run against the input document. The output is a document with ITS 1.0 "Translate" information for each element and attribute node.
  3. The output of step 2 is processed with a stylesheet which highlights non-translatable content. The result is the final document with CSS styling of non-translatable content.

Step 3 involves CSS selectors. The selection of the non-translatable <pre> element is achieved with the following, automatically generated selector:

html > head ~ body > h1 ~ p ~ pre
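
The stylesheet that generates this selector is not reproduced here. The following Python sketch shows one way such a selector could be derived, assuming the selector is built by listing, at each level, the element together with its preceding siblings (joined with "~") and descending with ">". The function names local_name and css_selector are hypothetical, and the resulting selector is not guaranteed to be unique: a further <pre> later in the same <body> would also match.

```python
import xml.etree.ElementTree as ET

DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>An HTML Document</title></head>
<body>
<h1>Example</h1>
<p>This is an example HTML document.</p>
<pre translate="no">Some source code</pre>
</body>
</html>"""


def local_name(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return elem.tag.rsplit("}", 1)[-1]


def css_selector(root, target):
    """Build a selector from the root to target using '>' and '~'."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            # The root element has no siblings.
            steps.append(local_name(node))
        else:
            siblings = list(parent)
            upto = siblings[: siblings.index(node) + 1]
            steps.append(" ~ ".join(local_name(s) for s in upto))
        node = parent
    return " > ".join(reversed(steps))


root = ET.fromstring(DOC)
pre = root.find(".//{http://www.w3.org/1999/xhtml}pre")
print(css_selector(root, pre))  # html > head ~ body > h1 ~ p ~ pre
```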

The input file in HTML serialization first needs to be converted into an XHTML serialization. This can be achieved, e.g., with HTML Tidy. After this step, the processing described above is applied. The resulting CSS stylesheet and selectors can nevertheless be applied to the original HTML serialization. In this way, the translator is able to work with the original document.

Summary

It has been shown that the global approach of ITS 1.0 data categories can be used to apply ITS 1.0 data category functionality without ITS 1.0 markup. The purpose of this exercise is to minimize the impact on non-ITS processing, e.g. display and editing of HTML 5 documents. Finally, the generation of CSS selectors allows the ITS 1.0 functionality to be applied to the original document in XML or HTML serialization.


$Id: Overview.html,v 1.13 2008/03/17 01:34:40 fsasaki Exp $