Position Statement

Shigemichi Yazawa, GlobalSight

As XML gains popularity across various industries, the need to localize XML documents arises. While XML defines the syntax of a document, it does not define the meaning of each part of the document structure: whether a part is visible text, some kind of identifier, and so on. In this paper I call this meaning contextual information. Contextual information differs from document to document; even documents with the same structure might have different contextual information.

One piece of contextual information that localization tools should know about is which elements or attributes are translatable, so that the tools can present only translatable text. Some elements are better presented as embedded parts of a sentence (e.g. <bold>). Other elements can contain content in another format, such as HTML or RTF; for these, the tools should switch parsers to extract the contents correctly.
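To make the parser-switching point concrete, here is a minimal Python sketch using only the standard library. It assumes a hypothetical element named htmlContent whose text is an escaped HTML fragment: the XML parser unescapes the element text, and the tool then hands that text to a separate HTML parser to pull out the translatable sentence. The element name and the function and class names are illustrative, not part of any existing tool or tag set.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class HtmlTextCollector(HTMLParser):
    """Collects the character data of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.runs = []

    def handle_data(self, data):
        self.runs.append(data)


def extract_embedded_html(xml_source, html_tag):
    """Return one text string per element (hypothetical html_tag) holding HTML."""
    root = ET.fromstring(xml_source)
    texts = []
    for elem in root.iter(html_tag):
        # The XML parser has already unescaped &lt; etc., so elem.text is HTML.
        parser = HtmlTextCollector()
        parser.feed(elem.text or "")
        parser.close()  # flush any buffered character data
        texts.append("".join(parser.runs).strip())
    return texts


doc = """<doc>
  <htmlContent>&lt;p&gt;Click &lt;b&gt;Save&lt;/b&gt; to continue.&lt;/p&gt;</htmlContent>
</doc>"""

print(extract_embedded_html(doc, "htmlContent"))
# → ['Click Save to continue.']
```

A real tool would keep inline HTML markup such as <b> as embedded codes rather than dropping it; this sketch only shows the hand-off from the XML parser to a second parser.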

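The contextual information that localization tools need to know therefore includes which elements and attributes are translatable, which elements are inline, and which elements contain content in another format.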

We need a mechanism for indicating this information in order to localize XML documents properly and effectively. This is the problem the ITS group (http://groups.yahoo.com/group/lisa-its) is trying to address. So far, the discussion in the group has identified two possible mechanisms.

One is to mark up the contextual information directly in the XML documents. Localizable elements may have a localize="yes" attribute, and inline elements may have an inline="yes" attribute. This approach involves authoring guidelines, such as not putting localizable text in attributes, because it is difficult to indicate which attributes are localizable. Marking up the original documents directly also makes it possible to include notes for localizers.
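As an illustration of the direct-markup approach, the Python sketch below (standard library only) extracts one segment per element marked localize="yes" and keeps children marked inline="yes" embedded in the segment instead of splitting the sentence around them. The attribute names follow the examples above; the document vocabulary (manual, title, partNo, para, bold) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def segment_text(elem):
    """Flatten elem into one segment, keeping inline="yes" children as embedded tags.

    Text inside children not marked inline is skipped here; such children
    become their own segments if they are marked localize="yes".
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.get("inline") == "yes":
            parts.append(f"<{child.tag}>{segment_text(child)}</{child.tag}>")
        parts.append(child.tail or "")
    return "".join(parts)


def extract_localizable(xml_source):
    """Return one segment per element marked localize="yes", in document order."""
    root = ET.fromstring(xml_source)
    return [segment_text(e) for e in root.iter() if e.get("localize") == "yes"]


doc = """<manual>
  <title localize="yes">Getting started</title>
  <partNo>XJ-200</partNo>
  <para localize="yes">Click <bold inline="yes">Save</bold> to keep your changes.</para>
</manual>"""

print(extract_localizable(doc))
# → ['Getting started', 'Click <bold>Save</bold> to keep your changes.']
```

The part number is never presented to the translator, and the bold word stays inside its sentence, which is exactly the information the two attributes carry.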

The other way is to define a rule file that specifies the contextual information of the XML documents. The rule file uses a query language such as XPath to address parts of the XML documents and to specify how each part should be treated in the localization environment. The rule file itself would naturally be an XML document. This approach does not require modifying the original XML documents.
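The rule file approach can be sketched the same way. In the Python sketch below, a hypothetical rule file (its element and attribute names are invented for illustration and are neither GlobalSight's DTD nor the ITS group's tag set) selects translatable elements, a translatable attribute, and inline elements, while the original document carries no localization markup at all. The sketch relies on the limited XPath subset supported by xml.etree.ElementTree; a real implementation would more likely use a full XPath engine, such as the xpath() method in lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each rule carries a selector in ElementTree's
# XPath subset and, optionally, the name of a translatable attribute.
RULES = """<rules>
  <translate selector=".//title"/>
  <translate selector=".//para"/>
  <translate selector=".//img" attribute="alt"/>
  <inline selector=".//bold"/>
</rules>"""

# The original document carries no localization markup.
DOC = """<manual>
  <title>Getting started</title>
  <partNo>XJ-200</partNo>
  <para>Click <bold>Save</bold> to keep your changes.</para>
  <img src="save.png" alt="Save button"/>
</manual>"""


def apply_rules(rules_source, doc_source):
    """Return the translatable segments selected by the rule file."""
    rules = ET.fromstring(rules_source)
    root = ET.fromstring(doc_source)

    # Track inline elements by identity so segments can embed them.
    inline_ids = {
        id(elem)
        for rule in rules.findall("inline")
        for elem in root.findall(rule.get("selector"))
    }

    def segment(elem):
        # Flatten elem into one segment, embedding inline children as tags;
        # text of non-inline children is left to their own rules.
        parts = [elem.text or ""]
        for child in elem:
            if id(child) in inline_ids:
                parts.append(f"<{child.tag}>{segment(child)}</{child.tag}>")
            parts.append(child.tail or "")
        return "".join(parts)

    segments = []
    for rule in rules.findall("translate"):
        attr = rule.get("attribute")
        for elem in root.findall(rule.get("selector")):
            if attr is None:
                segments.append(segment(elem))
            elif elem.get(attr) is not None:
                segments.append(elem.get(attr))
    return segments


print(apply_rules(RULES, DOC))
# → ['Getting started', 'Click <bold>Save</bold> to keep your changes.', 'Save button']
```

Because the rules live outside the document, the rule file can also mark the alt attribute as translatable, which the direct-markup approach finds difficult to express.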

My interest is mainly in the rule file approach. GlobalSight has in fact defined its own simple rule file DTD, which we use in our products. The discussion in the ITS group has so far produced a somewhat more detailed and robust tag set for the rule file. I hope this issue will be discussed at the I18N Workshop and that it will spawn an interest group to consider it further and eventually publish a standard for rule files.