This document is also available in these non-normative formats: XHTML Diff markup to publication from 18 May 2006.
Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices described here allows both the developers of XML applications and the authors of XML content to create material in different languages.
This document is still in an early draft stage. Feedback is especially appreciated on the general concept of ITS, the guidelines listed, and when applicable, the mechanisms defined for the selection of ITS specific information in XML documents.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is an updated Working Draft of "Best Practices for XML Internationalization (XML i18n BP)". The document was developed by the Internationalization Tag Set (ITS) Working Group, part of the W3C Internationalization Activity. A complete list of changes to this document is available. The Working Group intends to publish this document as a Working Group Note.
Feedback about the content of this document is encouraged. Send your comments to www-i18n-comments@w3.org. Use "[Comment on xml-i18n-bp WD]" in the subject line of your email, followed by a brief subject. The archives for this list are publicly available.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is a complement to [ITS]. Not all internationalization-related issues can be solved with the special markup described in [ITS]; a number of problems can be avoided by correctly designing the XML format, and by applying a few guidelines when designing and authoring documents. This document and [ITS] implement requirements formulated in [ITS REQ].
This document is divided into two main sections:
The first one is intended for the designers and developers of XML applications.
The second is for the XML content authors. This includes users modifying the original content, such as translators.
Designers and developers of XML applications should read Section 2: When Designing an XML Application. It provides a list of some of the important design choices they should make in order to ensure the internationalization of their format. The techniques are usually illustrated with examples for XML Schema, RELAX NG and XML DTD.
Users and authors of XML content should read Section 3: When Authoring XML Content where they can find a number of guidelines on how to create content with internationalization in mind. Many of these best practices do not require the XML format used to have been developed especially for internationalization.
Section 5: ITS Applied to Existing Formats provides a set of concrete examples on how to apply ITS to existing XML based formats. This illustrates many of the guidelines in this document.
Each guideline is illustrated by one or more techniques (identified with a sequential number throughout the document).
Designers and developers of XML applications should take into account the following best practices:
Best Practice 1: Provide a way to specify natural language content
Best Practice 1: Provide a way to specify text directionality
Best Practice 1: Indicate the translatability of elements and attributes
Best Practice 1: Provide a way to override translatability information
Best Practice 1: Provide a way to override segmentation-related information
Best Practice 1: Provide a way to specify comments for translators
Best Practice 1: Provide a way to specify unique identifiers
Best Practice 1: Provide a way to override terminology information
Best Practice 1: Use multilingual documents with caution
Provide xml:lang in your DTD or schema to allow authors to specify the natural language of the content. See the Language Identification section of the XML Specification for more information on xml:lang.
How to do this
Make sure the xml:lang attribute is available for the root element of your document, and for any element where a change of language may occur.
For details on how to add an attribute such as xml:lang to a DTD, an XSD schema, or a RELAX-NG schema, see: Section 4.2: Adding an Attribute to an Existing DTD or Schema.
Note: The scope of the xml:lang attribute applies to both the attributes and the content of the element where it appears, therefore one cannot specify different languages for an attribute and the element content. ITS does not provide remedy for this. Instead, it is recommended to not use attributes for translatable text.
Note: If what has to be specified is not the language of the content, but a natural language value as data or meta-data about something external to the document, an attribute different from xml:lang should be used.
In this example the XHTML hreflang attribute does not indicate that the content of the element a is in German.
<a xml:lang="en" href="german.html" hreflang="de">Click here for German</a>
For existing DTD and schema:
If you are working with an existing DTD or schema where there is a way to specify content language that is not implemented using the xml:lang attribute (but still uses the same values as xml:lang), you should provide an ITS rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang.
In this document the langcode element is used to specify the language of an entry.
<myRes> <messages> <msg id="1"> <langcode>en</langcode> <text>Cannot find file.</text> </msg> <msg id="2"> <langcode>fr</langcode> <text>Fichier non trouvé.</text> </msg> </messages> </myRes>
Use the following rule to specify that the langcode element holds the same values as the xml:lang attribute.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:langRule selector="//text" langPointer="../langcode"/> </its:rules>
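As a rough illustration of what the langRule above expresses, the following Python sketch pairs each text element with the value of its sibling langcode element. It is only an emulation under stated assumptions: it hard-codes the "../langcode" relationship instead of evaluating the selector and langPointer XPath expressions, and the names DOC and language_of_texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the myRes example (hypothetical copy).
DOC = """<myRes>
  <messages>
    <msg id="1"><langcode>en</langcode><text>Cannot find file.</text></msg>
    <msg id="2"><langcode>fr</langcode><text>Fichier non trouvé.</text></msg>
  </messages>
</myRes>"""

def language_of_texts(root):
    """Return (language, text) pairs, reading the language of each
    text element from the langcode element that shares its parent."""
    pairs = []
    for msg in root.iter("msg"):
        # Emulates langPointer="../langcode" relative to the text element.
        lang = msg.findtext("langcode")
        for text in msg.findall("text"):
            pairs.append((lang, text.text))
    return pairs

for lang, text in language_of_texts(ET.fromstring(DOC)):
    print(lang, text)
```

A real ITS processor would evaluate the XPath expressions of the rule rather than rely on fixed element names.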
Why doing this
It is not recommended to use your own attribute or element to specify the language of the content.
The xml:lang attribute is supported by various XML technologies such as XPath and XSL (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your capability to take advantage of some XML applications.
How to do this
Make sure the its:dir attribute is available for the root element of your document and for all elements with content that may be rendered.
For details on how to add an attribute such as its:dir to a DTD, an XSD schema, or a RELAX-NG schema, see: Section 4.2: Adding an Attribute to an Existing DTD or Schema.
For existing DTD and schema:
If you are working with an existing DTD or schema where there is a way to specify text directionality that is not implemented using the its:dir attribute, you should provide an ITS rules document where you use the its:dirRule element to associate the different directionality indicators with their equivalent in ITS.
In this document the textdir attribute is used to specify directionality of a text run.
<text xml:lang="en"> <body> <par>In Hebrew, the title <quote xml:lang="he" textdir="r2l">פעילות הבינאום, W3C</quote> means <quote>Internationalization Activity, W3C</quote>.</par> </body> </text>
[Ed. note: TODO: Need to get the correct display (or not? ...)]
Use the following rules to specify the relationships between the textdir attribute of the format and the ITS "Directionality" data category.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:dirRule selector="//*[@textdir='l2r']" dir="ltr"/> <its:dirRule selector="//*[@textdir='r2l']" dir="rtl"/> <its:dirRule selector="//*[@textdir='lro']" dir="lro"/> <its:dirRule selector="//*[@textdir='rlo']" dir="rlo"/> </its:rules>
Why doing this
[Ed. note: TODO]
How to do this
Make sure that all translatable text is stored as element content.
For example, do not allow this:
The alt attribute contains translatable text.
<image src="elephants.png" alt="Elephants bathing in the Zambezi River."/>
Instead, design for this:
There is no more translatable attribute.
<image src="elephants.png">Elephants bathing in the Zambezi River.</image>
For existing DTD and schema:
The default assumption in ITS is that attributes are not translatable. If you are working with a DTD or a schema where there are translatable attributes, you should provide an ITS rules document where you use the its:translateRule element to specify what attributes are translatable. See: Best Practice 1: Indicate the translatability of elements and attributes for more information on how to do this.
Why doing this
There are a number of issues related to storing translatable text in attribute values. Some of them are:
The language identification mechanism (i.e. xml:lang) applies to the content of the element where it is declared, including its attribute values. If the text of an attribute is in a different language than the text of the element content, one cannot set the language for both correctly.
In some languages, bidirectional markers may be needed to provide a correct display. Tags cannot be used within an attribute value. One can use Unicode control characters instead, but this is not recommended (see: [Unicode in XML]).
It is difficult to apply to the text of the attribute value meta-information such as no-translate flags, designer's notes, etc.
The difficulty to attach unique identifiers to translatable attribute text makes it more complicated to use ID-based leveraging tools.
Translatable attributes can create problems when they are prepared for localization because they can occur within the content of a translatable element, breaking it into different parts, and possibly altering the sentence structure.
All these potential problems are less likely to occur when the text is the content of an element rather than the value of an attribute.
Note: In many cases, moving translatable text from attribute value to element content can result in having a sentence embedded within another one. For instance, in the example above, the description of the image will be embedded inside the text of the paragraph where the image is. In such cases, do not forget to declare the relevant element (here image) as 'nested', as described here: Best Practice 1: Provide a way to override segmentation-related information
How to do this
You should provide an ITS rules document where you use its:translateRule elements to indicate which elements have non-translatable content.
If you are working with a DTD or a schema where there are translatable attributes (something that is not recommended), you should also use its:translateRule to specify these translatable attributes.
Note: Because a rule has precedence over the ones before it, you want to start with the most general rules first and progressively override them as needed. Some rules may be more complex in order to take into account all the aspects of inheritance.
Note: Try to keep the number of nodes to be overridden to a minimum for better performance. For example, if most of a document should not be translated, it is better to set the root element to be non-translatable than to set all elements. The inheritance mechanism will have the same effect for a much lower computing cost.
Note: If needed, make provisions for the case where the content of an element is flagged with xml:lang="zxx", where zxx indicates content that is not in a language, and therefore is most likely not translatable.
In the following document, the content of the head element should not be translated, and the value of the alt attribute should be translated. In addition, the content of the del element should not be translated.
<myDoc xml:lang="en"> <head> <author>Page Harrison</author> <rev>v13 July-27-2005</rev> </head> <par>To start click <ins>the <ui>Start</ui> button</ins> <del>this icon: <ref file="start.png" alt="Start icon"/> </del> and fill the form.</par> </myDoc>
The first rule indicates that the content of head in myDoc is not translatable. By inheritance, the child elements of head are also assumed not translatable.
The second rule indicates that all the alt attributes are translatable.
The third rule indicates that the content of del is not translatable.
The fourth rule indicates that the non-translatability of del applies also to any attribute that may have been set as translatable by a prior rule (i.e. the second rule).
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:translateRule selector="/myDoc/head" translate="no"/> <its:translateRule selector="//*/@alt" translate="yes"/> <its:translateRule selector="//del" translate="no"/> <its:translateRule selector="//del/descendant-or-self::*/@*" translate="no"/> </its:rules>
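The interplay of rule precedence and inheritance can be sketched in Python. This is a simplified illustration, not an ITS processor: it handles elements only (the attribute rules are left out), it replaces the XPath selectors with ElementTree paths relative to the root, and the names DOC, RULES and element_translatability are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the myDoc example (hypothetical copy).
DOC = """<myDoc xml:lang="en">
  <head><author>Page Harrison</author><rev>v13 July-27-2005</rev></head>
  <par>To start click <ins>the <ui>Start</ui> button</ins>
    <del>this icon: <ref file="start.png" alt="Start icon"/></del>
    and fill the form.</par>
</myDoc>"""

# Element-only counterparts of the first and third rules, written as
# ElementTree paths (an assumption: ElementTree is not a full XPath engine).
RULES = [("./head", False), (".//del", False)]

def element_translatability(root, rules):
    """Map each element name to its translatability flag."""
    explicit = {}
    for path, flag in rules:
        for el in root.findall(path):
            explicit[el] = flag  # a later rule overrides an earlier one

    result = {}

    def walk(el, inherited):
        # An element without an explicit rule inherits from its parent.
        flag = explicit.get(el, inherited)
        result[el.tag] = flag
        for child in el:
            walk(child, flag)

    walk(root, True)  # ITS default: element content is translatable
    return result

for tag, flag in element_translatability(ET.fromstring(DOC), RULES).items():
    print(tag, "yes" if flag else "no")
```

Only head and del are set explicitly; author, rev and ref get their value through inheritance, which is why few rules are needed.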
Why doing this
By default, ITS assumes that the content of all elements is translatable and that all attributes have non-translatable values. If your XML document type does not correspond to these default assumptions, it is important to indicate what the exceptions are.
Provide its:translate and its:rules in your DTD or schema to allow authors to override translatability information.
How to do this
Make sure the its:translate attribute is available for the root element of your documents, and for any element that has text content.
For details on how to add an attribute such as its:translate to a DTD, an XSD schema, or a RELAX-NG schema, see: Section 4.2: Adding an Attribute to an Existing DTD or Schema.
Make also sure the its:rules element is available somewhere in your documents, for example in the header part if there is one. The its:rules element provides access to the its:translateRule element which can be used to change the translatability property of attributes.
In the following document, the content of the par elements is normally translatable, but in this instance, the last one should remain in English. Declaring its:translate as an optional attribute of the par element allows the author to set the given paragraph as not translatable.
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <par>To apply these terms to your library, attach the following notice. It is safest to attach it to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.</par> <par>The notice should read (preferably in English):</par> <par its:translate="no">This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This software is distributed as open source under LGPL.</par> </myDoc>
[Ed. note: TODO: Maybe this is more an example for the author. Maybe we should have an example showing ITS inclusion in a DTD or schema.]
For existing DTD and schema:
If you are working with a DTD or a schema where there is a way to override translatability information that is not its:translate, you should provide an ITS rules document where you use the its:translateRule element to associate this mechanism to the ITS Translate data category.
For example, [DITA 1.0] offers its own translate attribute, and [Glade] provides its own translatable attribute. Both have the same function as its:translate.
The following rules indicate how to associate the DITA translate attribute with the ITS Translate data category. The order in which the rules are listed is important: you must first define the general rules, then two rules for the case of the elements with translate="no" (one for the elements, one for the attributes of children elements), and lastly, the rule for translate="yes".
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <!-- Rules for the translate attributes --> <its:translateRule selector="//*[@translate='no']" translate="no"/> <its:translateRule selector="//*[@translate='no']/descendant-or-self::*/@*" translate="no"/> <its:translateRule selector="//*[@translate='yes']" translate="yes"/> </its:rules>
You can find a more complete example of how DITA markup is associated with ITS in Section 5.4.2: Relating ITS to Existing Markup in DITA.
Why doing this
In some cases, the author of a document may need to change the translatability property on parts of the content, overriding defaults or more general rules.
How to do this
By default, ITS processors assume that the text content of each element is separated from the other elements. You should provide an ITS rules document where you use the its:withinTextRule element to indicate which elements should be treated as part of their parent, or as a nested entry.
The following DITA document has two elements that should be treated as "within text": term and b, and one that should be treated as a nested independent run of text: fn.
<concept id="myConcept" xml:lang="en-us"> <title>Types of horse</title> <conbody> <ol> <li>Palouse horse:<p> <term>Palouse horses</term> <fn>A palouse horse is the same as an <b>Appaloosa</b>.</fn> have spotted coats. The <term>Nez-Perce</term> Indians have been key in breeding this type of horse.</p> </li> </ol> </conbody> </concept>
The its:withinTextRule element is used to specify the behavior of term and b (within text), as well as fn (nested). Any case not listed is assumed to have the value its:withinText="no".
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:withinTextRule selector="//term | //b" withinText="yes"/> <its:withinTextRule selector="//fn" withinText="nested"/> </its:rules>
Applied to the example document, these rules result in four distinct runs of text:
title: "Types of horse"
li: "Palouse horse:"
p: "{term}Palouse horses{/term}{fn/} have spotted coats. The {term}Nez-Perce{/term} Indians have been key in breeding this type of horse."
fn: "A palouse horse is the same as an {b}Appaloosa{/b}."
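The segmentation described above can be sketched in Python. This is a minimal illustration under stated assumptions: the withinText values are supplied as a plain dictionary keyed by element name instead of being read from its:withinTextRule selectors, whitespace is normalized, and the names DOC, WITHIN_TEXT and text_runs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the DITA example (hypothetical copy).
DOC = """<concept id="myConcept" xml:lang="en-us">
  <title>Types of horse</title>
  <conbody><ol><li>Palouse horse:<p><term>Palouse horses</term><fn>A palouse
  horse is the same as an <b>Appaloosa</b>.</fn> have spotted coats. The
  <term>Nez-Perce</term> Indians have been key in breeding this type of
  horse.</p></li></ol></conbody>
</concept>"""

# Stand-in for the withinTextRule elements; any other element is "no".
WITHIN_TEXT = {"term": "yes", "b": "yes", "fn": "nested"}

def text_runs(root, within_text=WITHIN_TEXT):
    """Return the non-empty runs of text as (element name, text) pairs."""
    runs = []

    def flatten(el, separate):
        parts = [el.text or ""]
        for child in el:
            mode = within_text.get(child.tag, "no")
            if mode == "yes":
                # Inline element: keep it inside the run of its parent.
                inner = flatten(child, separate)
                parts.append("{%s}%s{/%s}" % (child.tag, inner, child.tag))
            else:
                if mode == "nested":
                    # Leave a placeholder where the nested run was.
                    parts.append("{%s/}" % child.tag)
                separate.append(child)
            parts.append(child.tail or "")
        return " ".join("".join(parts).split())

    def emit(el):
        slot = len(runs)
        runs.append(None)  # reserve the position to keep document order
        separate = []      # children that start a run of their own
        runs[slot] = (el.tag, flatten(el, separate))
        for child in separate:
            emit(child)

    emit(root)
    return [(tag, text) for tag, text in runs if text]

for tag, text in text_runs(ET.fromstring(DOC)):
    print("%s: %s" % (tag, text))
```

The curly-brace placeholders simply mirror the notation used in the list above; a real tool would choose its own representation for inline codes.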
Why doing this
Many applications that process content for linguistic-related tasks need to be able to perform a basic segmentation of the text content. They need to be able to do this without knowing about the semantics of the elements.
While in many cases it is possible to automatically detect mixed content, there are some occurrences where the structure of an element makes it impossible for tools to know for sure how to treat text. For example, the li element in XHTML can contain text as well as p elements.
[Ed. note: TODO: More details]
Provide its:locNote, its:locNoteType, and its:locNoteRef in your DTD or schema to allow authors to provide translation-related notes and instructions.
How to do this
Make sure the attributes its:locNote, its:locNoteType, as well as its:locNoteRef are available in your DTD or schema.
Make also sure that the its:rules element is available somewhere in your documents, for example in the header part if there is one. The its:rules element provides access to the its:locNoteRule element which can be used to specify translation-related notes and instructions at a more general level.
Why doing this
[Ed. note: TODO]
The xml:id attribute is a possible candidate for such a role.
If your DTD or schema contains elements that define terms or information associated to terms, you should use the its:termRule element to indicate their correspondence with the ITS "Terminology" data category.
[Ed. note: TODO]
Provide its:term, its:termInfoRef, and its:rules in your DTD or schema to allow authors to override terminology-related information.
How to do this
Make sure the its:term and the its:termInfoRef attributes are available for any element that has text content. [Ed. note: Not sure about this: Shouldn't it apply only to elements that are defined as term?]
Make also sure the its:rules element is available somewhere in your documents, for example in the header part if there is one. The its:rules element provides access to the its:termRule element which can be used to change the terminology-related information of attributes.
[Ed. note: TODO]
Why doing this
In some cases, the author of a document may need to change the information indicating what is a term or how to point to term information, overriding more general rules that have been defined for the DTD or schema.
If possible, avoid having element names that reflect the ID of the element.
[Ed. note: TODO]
<strings> <INPUTPATH>Input path:</INPUTPATH> <HELP>Help</HELP> <OK>OK</OK> <CANCEL>Cancel</CANCEL> </strings>
Instead, [Ed. note: TODO]
[Ed. note: TODO]
<strings> <str xml:id="INPUTPATH">Input path:</str> <str xml:id="HELP">Help</str> <str xml:id="OK">OK</str> <str xml:id="CANCEL">Cancel</str> </strings>
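For existing resources that use the first form, a conversion along the following lines may help. This Python sketch assumes that every child of the root is a string entry whose element name is a valid xml:id value; the function name use_generic_elements and the choice of str as the generic element name (taken from the example above) are illustrative only.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:id with the namespace URI of the xml prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

OLD = """<strings>
  <INPUTPATH>Input path:</INPUTPATH>
  <HELP>Help</HELP>
  <OK>OK</OK>
  <CANCEL>Cancel</CANCEL>
</strings>"""

def use_generic_elements(root, new_tag="str"):
    """Rename each child to new_tag and keep its old name as xml:id."""
    for child in root:
        child.set(XML_ID, child.tag)
        child.tag = new_tag
    return root

root = use_generic_elements(ET.fromstring(OLD))
print(ET.tostring(root, encoding="unicode"))
```

With a single generic element name, a schema, an ITS rule or a tool only has to know about one element, and adding a string no longer changes the vocabulary.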
Provide these rules in a single standalone ITS document. ITS-aware tools will be able to associate it with the documents it pertains to using their own mechanism, or the authors of the documents will be able to use the ITS linking mechanism to point to it.
Your ITS rules document should include the following information, when applicable:
What part of your markup has translatability rules different from the defaults (See: Best Practice 1: Indicate the translatability of elements and attributes).
The list of elements that should be treated as "nested" or "within text" from a segmentation viewpoint (See: Best Practice 1: Provide a way to override segmentation-related information).
What part of your markup denotes terms and information related to them (See: Best Practice 1: Indicate terminology-related elements).
What part of your markup holds notes for the localizers or the translators (See: Best Practice 1: Provide a way to specify comments for translators).
The correspondence between any proprietary mechanism you have to specify the language of content and xml:lang (See: Best Practice 1: Provide a way to specify natural language content).
The correspondence between any proprietary mechanism you have to override translatability information and the ITS equivalent (See: Best Practice 1: Provide a way to override translatability information).
The correspondence between any proprietary mechanism you have to indicate text directionality and its:dir (See: Best Practice 1: Provide a way to specify text directionality).
The correspondence between any proprietary mechanism you have to mark up Ruby text and its:ruby (See: Best Practice 1: Provide a way to specify ruby text).
Some examples of ITS rules documents for existing XML formats are shown in Section 5: ITS Applied to Existing Formats.
Authors of XML content should consider the following best practices:
Best Practice 1: Override translatability information if needed
Best Practice 1: Assign unique identifiers to text items when possible
Best Practice 1: Ensure any inserted text is context-independent
A number of these practices can be followed only when the XML application has been internationalized properly using the design guidelines in Section 2: When Designing an XML Application.
How to do this
Your DTD or schema should provide the xml:lang attribute for this purpose. See: Best Practice 1: Provide a way to specify natural language content for more information.
Use this recommended attribute on the root element and, if needed, on each element for which the language of the content is different. Elements without a declaration inherit the language information from their parents.
In this example, the main content of the document is in English, while a short citation is identified as being in French Canadian.
<document xml:lang="en"> <para>The motto of Québec is the short phrase: <q xml:lang="fr-CA">Je me souviens</q>. It is chiseled on the front of the Parliament Building.</para> </document>
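The inheritance of language information can be checked with a few lines of Python. This is a small sketch, assuming a document like the one above with the citation tagged as Canadian French; the names DOC and effective_languages are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:lang with the namespace URI of the xml prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<document xml:lang="en">
  <para>The motto of Québec is the short phrase:
    <q xml:lang="fr-CA">Je me souviens</q>. It is chiseled on the
    front of the Parliament Building.</para>
</document>"""

def effective_languages(root):
    """Return (element name, language) pairs in document order."""
    result = []

    def walk(el, inherited):
        # An element without xml:lang takes the language of its parent.
        lang = el.get(XML_LANG, inherited)
        result.append((el.tag, lang))
        for child in el:
            walk(child, lang)

    walk(root, None)
    return result

for tag, lang in effective_languages(ET.fromstring(DOC)):
    print(tag, lang)
```

The para element carries no xml:lang attribute of its own and is reported with the language declared on the root element.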
Why doing this
Knowing the language of the content is very important in many situations. Some of them are:
selection of a proper font (e.g. for traditional or simplified Chinese)
processing of the text for wrapping and hyphenation
providing spell-checking or grammar verification of the text
selecting proper formatting properties for data such as date, time, numbers, etc.
selecting proper automated text such as quotation marks or other punctuation signs
using the text with voice browsers
How to do this
[Ed. note: TODO]
Overriding translatability information relates to marking up paragraphs or sections of text that should remain untranslated, but are enclosed in XML elements that are normally translatable.
Note: Authors should NOT use its:translate to tag single words or terms that (they think) should remain the same as the source language when translated into a given target language (e.g. loan-words). This type of decision is made during translation using terminology lookup tools, and does not involve any specific tagging. Authors may decide what is translatable, but not how to translate it.
Do NOT do the following:
In this document its:translate is used to mark up a proper name and two loan words in an attempt to indicate what should not be translated. You should NOT do this.
<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <body> <p>Everything started when <span its:translate="no">Zebulon</span> discovered that he had a <span its:translate="no">doppelgänger</span> who was a serious baseball <span its:translate="no">aficionado</span>.</p> </body> </book>
Why doing this
[Ed. note: TODO]
How to do this
Use inserted text only when the text is self-contained and does not affect its surrounding context. Error messages and quotations are examples of inserted text that usually would not cause problems.
Avoid using inserted text that has any effect or dependence on the context where it is inserted.
Why doing this
If not used properly, inserted text can cause important (and sometimes un-resolvable) problems during localization.
Inserted text refers to any text that is marked by a placeholder in the XML document and automatically inserted within text content when the document is processed. The nature of such text can be for example:
boilerplate text reused in different contexts,
various parts of a compound document put together,
or variable values computed at some point during the processing the document goes through.
Such text can be implemented in different ways in XML. Some of them are:
Using entity references.
Using [XInclude 1.0] mechanisms.
Using [XLink 1.0] mechanisms.
Using a custom mechanism specific to a given format (e.g. the conref attribute in [DITA 1.0]).
There are several important issues related to inserted text. Consider the following:
In this example, the author, working with [DITA 1.0], decided to reference the standard terms she uses and has at her disposal in a termbase by using the conref mechanism. In this occurrence, the term t123 has the value "hydraulic lift".
<p>Using an <term conref="termbase#t123"/> raise the vehicle from the ground.</p>
At first glance this seems to work fine in English. However, such a construction has several problems:
You do not want to separate the article from the noun. If "hydraulic lift" is modified in the future and replaced by some other term, it may require an article 'a' instead of 'an'.
The article/noun separation causes also trouble for the translator: Without any easy way to see the actual term when translating the paragraph, she may not be able to decide the gender of the article.
If it is used at the beginning of a sentence, the term would need to be capitalized.
The term is singular in the termbase, while it may need to be plural somewhere in the document.
In inflected languages the form required in the text may be different from the form stored in the termbase. For example, in Polish the term would be stored in its nominative form ("dźwignia hydrauliczna"), while it should be in its instrumental form once inserted in this context: "Używając [dźwignię hydrauliczną] podnieś pojazd z ziemi."