This document defines data categories and their
implementation as a set of elements and attributes called the
Internationalization Tag Set (ITS). The document provides examples of how ITS can be used
with existing vocabularies. Feedback is especially appreciated
on the general design of ITS, and on the design of the
individual data categories.
This is a Last Call Working Draft of
"Internationalization Tag Set (ITS) Version 1.0".
See the latest
revision log for changes since the last publication of this document.
This document was developed by the
The W3C Membership and other interested parties are invited to review the document and send comments through
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the
This is an updated version of this document.
This document defines a standard for high-quality,
cost-efficient internationalization and localization of schemas and
XML instances (both existing ones and new ones). On the one
hand, the standard is defined conceptually through the notion
of data categories. On the other hand, the standard defines
implementations of these data categories as a set of elements
and attributes called the Internationalization Tag Set
(ITS). The document provides examples of how ITS can be used
with existing popular markup schemes such as
DocBook. Furthermore, the document provides implementations
for three schema languages: XML DTD, XML Schema and RELAX NG. Feedback related to this document is
especially appreciated on the general concept of ITS and the
mechanisms defined for the selection of ITS-specific
information in documents, and on the design of the individual
data categories.
Requirements for this document are formulated in
This document covers the following requirements:
The following requirements will be addressed in
The Working Group decided not to cover the following requirements at this time to be able to focus on the most important ones.
The Working Group will cover some of the requirements which currently
are not covered here in a separate document on best practices for internationalization and localization
of schemas and XML documents.
The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective internationalization and localization of content. The following paragraphs sketch these different types of users, and their usage of ITS.
This type of user will find proposals for attribute and element names to be included in their new schema (aka “host vocabulary”). Using the attribute and element names proposed in the ITS specification may be helpful because it leads to easier recognition of the concepts represented by both schema users and processors. It is perfectly possible, however, for a schema developer to develop his own set of attribute and element names. The specification sets out, first and foremost, to ensure the required markup is available, and that the behaviour of that markup meets established needs.
This type of user will be working with schemas such as DocBook, DITA, or perhaps a proprietary schema.
The ITS Working Group has sought input from experts developing widely used formats such as the ones mentioned, and the ITS specification provides examples of how those formats (aka “host vocabulary”) could be used with ITS.
The Working Group covers the question
“How to use ITS with existing popular markup schemes?”
in more detail in a separate document
Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.
In some cases, an existing schema may
already contain markup equivalent to that
recommended in ITS. In this case it is not necessary
to add duplicate markup since ITS provides
mechanisms for associating ITS
markup with markup in the host vocabulary which
serves a similar purpose (see
This type of user encompasses companies which provide tools for authoring, translation or other kinds of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed or translated. It is hoped that the ITS specification will make the job of vendors easier by standardising the format and processing expectations of certain relevant markup items, and allowing them to more effectively identify how content should be handled.
This type of user comprises authors, translators and other content producers. The markup proposed in this specification may be used by them to mark up specific bits of content. Aside: The burden of inserting markup should be removed from content producers by relating the ITS information to relevant bits of content in a global manner (see global, rule-based approach). This global work, however, may fall to information architects, rather than the content producers themselves.
In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective internationalization and localization of content is provided in this specification in two ways:
The ITS specification proposes several mechanisms for
supporting worldwide use and effective internationalization and localization of
content. We will sketch them below by looking at them from the
perspectives of certain user types. For the purpose of
illustration, we will show how ITS can
indicate that certain parts of content should or should
not be translated.
A content author uses an attribute on a particular element to say that the text in the element should not be translated.
A content author or information architect uses markup at the top of the document to identify a particular type of element or context in which the content should not be translated.
A processor may inject markup at the top of the document which links to ITS information outside of the document.
A schema developer integrates ITS markup declarations in his schema to allow users to indicate that specific parts of the content should not be translated.
The first two approaches above can be likened to the
use of CSS in XHTML. Using a
Content or software that is authored in one language (so-called source language) is often made available in additional languages or adapted with regard to other cultural aspects. This is done through a process called localization, where the original material is translated and adapted to the target audience.
In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian or Urdu need special markup to demarcate directionality in mixed direction text.
From the viewpoints of feasibility, cost, and efficiency,
it is important that the original material should be
suitable for localization. This is achieved by appropriate
design and development, and the corresponding process is
referred to as internationalization. For a detailed
explanation of the terms "localization" and
"internationalization", see
The increasing usage of XML as a medium for
documentation-related content (e.g. DocBook, and DITA as
formats for writing structured documentation, well suited to
computer hardware and software manuals) and software-related
content (e.g. the eXtensible User Interface Language
The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism which identifies which parts of an XML document need to be translated. Tools often cannot automatically do this identification.
The first file name in the first
In the example below, there are no clear mechanisms
allowing one to know which
This standard does not exhaustively cover all mechanisms
and data formats which might be needed for configuring
localization workflows or tools to process a specific
format. These mechanisms and data formats are sometimes called
“XML localization properties”, a generic term for the mechanisms and data formats that allow localization tools to be configured in order to process a specific XML format. Examples of "XML localization properties" are the "Trados DTD Settings" file and the SDLX "Analysis" file.
Abstraction via
Powerful Selection relies on the information which is
given in the XML Information Set. ITS applications may implement inclusion mechanisms
such as XInclude or DITA's conref.
Content authors need, for example, a simple way to work
with the translatability data
category in order to express whether the content of an
element or attribute should be translated or
not. Localization coordinators, on the other hand, need an
efficient way of managing translations of large document
sets based on the same schema. This could be realized by a
specification of defaults for translatability and exceptions
from the defaults (e.g. all
This specification responds to these requirements by
introducing mechanisms for specifying ITS information in XML
documents, see
This specification has been developed using the ODD (
XSLT transformations are provided by the TEI to extract documentation in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTDs. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents.
Information (e.g. "translate this") captured by ITS markup
(e.g. its:translate='yes') always pertains to
one or more XML nodes (mainly element and attribute nodes). In
a sense, ITS markup “selects” the XML node(s). Selection may
be explicit or implicit. ITS distinguishes two approaches to
selection: local, and with global rules.
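The two approaches to selection can be illustrated with a minimal sketch. It assumes the element and attribute names of the published ITS 1.0 Recommendation (`its:rules`, `its:translateRule`, `selector`, `its:translate`) and its namespace URI, none of which are shown in this text and which may differ in this draft; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the published ITS 1.0 Recommendation.
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document using both approaches.
DOC = f"""
<doc xmlns:its="{ITS_NS}">
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Press <code>Ctrl</code> to copy.</p>
  <p its:translate="no">ACME-1234</p>
</doc>
"""

root = ET.fromstring(DOC)

# Local selection: the ITS attribute sits on the element it applies to.
local = [el.tag for el in root.iter()
         if el.get(f"{{{ITS_NS}}}translate") == "no"]

# Global selection: a rule element elsewhere in the document names the
# nodes it applies to through its selector attribute.
rules = [(r.get("selector"), r.get("translate"))
         for r in root.iter(f"{{{ITS_NS}}}translateRule")]

print(local)
print(rules)
```

The sketch only locates the markup. It does not evaluate the selector or compute inherited values.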
The mechanisms defined for ITS selection resemble those
defined in
ITS markup can be used with XML documents (e.g. a DocBook
article), or schemas (e.g. an XML Schema document for a proprietary document
format). Since each usage defines some specific requirements,
ITS markup may take different shapes.
The following two examples sketch the distinction between the local and global approaches.
The example above shows how a content author may use the
ITS
For this to work, the schema developer will need to add the
The example above shows a different approach to identifying
non-translatable content, similar to that used with a
For this to work, the schema developer needs to add the
The global, rule-based approach has the following benefits:
The commonality in both examples above is the markup
translate='no'. This piece of ITS markup can
be interpreted as follows:
To summarize: The examples with global and local usage of
ITS markup show that ITS markup, in some cases, appears in
elements defined by ITS itself (the
The ITS
term element in DITA)
The power of the ITS selection mechanisms comes at a price: rules related to overriding/precedence, and inheritance, have to be established.
In this example, the ITS data category attribute
translatability data category of the
Depending on the data category and its usage, there are
additional attributes for adding information to the selected
nodes, or for pointing to existing information in the
document. For example, the data category for localization
information can be used to add information to selected nodes,
or to point at existing information in the document. For the
former purpose, a
The functionality of adding information to the selected
nodes is available for each data category except language information. Pointing to
existing information is not possible for data categories which
express
The functionalities of adding information and pointing to
existing information are
The keywords "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to
be interpreted as described in
The namespace URI that MUST be used by implementations of this specification is:
The namespace prefix used in this specification for this URI is "its". It is recommended that implementations of this specification use this prefix.
In addition, the following namespaces are used in this document:
http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here used with the prefix "xs"
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here used with the prefix "rng"
http://www.w3.org/1999/xlink for the XLink namespace, here used with the prefix "xlink"
This specification provides schemas in the format of XML DTD, XML Schema and RELAX NG. However, these schemas are non-normative: conformance for ITS markup declarations defines only mandatory positions of ITS declarations in schemas. This makes it possible to use ITS with any schema language which allows for using these positions.
For each data category, ITS distinguishes between the following:
The data category translatability conveys information as to whether a piece of content should be translated or not.
The simplest formalization of this prose description on
a schema language independent level is a
A different implementation would be a translateRule element which allows for specifying global rules about translatability.
Selection relies on the information which is
given in the XML Information Set
The selection of the ITS data categories applies to text nodes. In
some cases these nodes form pointers to other resources; a well-known
example is the
The usage of the term
This specification defines two types of conformance: conformance of 1) ITS markup declarations, and conformance of 2) processing expectations for ITS markup. These conformance types complement each other. An implementation of this specification MAY use them separately or together.
Full implementations of this conformance type will implement all markup declarations for ITS. Statements related to this conformance type MUST list all markup declarations they implement.
Since the ITS markup declarations are schema language
independent, each schema language can use its own,
possibly multiple mechanisms to implement the conformance
clauses for ITS markup declarations. For example, an XML
DTD can use parameter entities to encapsulate the ITS local attributes, or
declare them directly for each element. The appropriate
steps to integrate ITS into a schema depend on the design
of this schema (e.g. whether it already has a
customization layer which uses parameter entities). The
ITS schemas in the format of XML DTD, XML Schema and RELAX
NG in
Since the goal of the ITS Working Group is
to deliver
Application-specific processing (that is, processing which goes beyond the computation of ITS information for a node), such as automated filtering of translatable content based on the translatability data category, is not covered by the conformance clauses below.
Statements related to this conformance type MUST list all data categories they implement, and for each data category which type of selection they support.
The following list summarizes elements relating to global rules and their attributes:
The following list summarizes elements and attributes to be used locally:
The version of the ITS schema defined in this specification is
its:version. If there is no
Each XML document can have a different
version. That is: if external rules are linked via an XLink
ITS data categories can appear in two places:
The two locations are described in detail below.
Global, rule-based selection is implemented using the
This attribute and all other possible attributes at rule elements are in the empty namespace and used without a prefix.
If there is more than one
Depending on the data
category and its usage, there are additional attributes
for adding information to the selected nodes, or for
pointing to existing information in the document. For
example, the data category localization
information can be used for adding information to
selected nodes, or for pointing to existing information in
the document. For the former purpose, an
The functionality of adding information to the selected
nodes is available for each data category except language
information. Pointing to existing information is not
possible for data categories which express
The functionalities of adding information and pointing
to existing information are
Another difference between adding and
pointing is the usage of XPath:
The value of the selector attribute is an XPath expression which starts with "/". That is, it must
be an
AbsoluteLocationPath as described in XPath 1.0. The resulting nodes MUST be either element or
attribute nodes.
As for data category specific attributes like
If namespaces
The term element from the TEI is in a
namespace http://www.tei-c.org/ns/1.0.
The
The usage of the
Global rules can appear in the XML document they will
be applied to, or in a separate XML document. The
precedence of their processing depends on these
variations. See also
Markup for global, rule-based selection is defined as follows.
Local selection in XML documents is realized with local ITS attributes, the
It depends on the data category what is being
selected. The necessary data category specific defaults
are described in
its:translate="no" at the
head element means that the textual content
of this element, including child elements, should not be
translated. its:translate="yes" at the
body element means that the textual content
of this element, including child elements, should be
translated. Attribute values of the selected elements
or their children are not affected by local
its:dir="ltr" at the
Markup for local selection is defined as follows.
One way to associate a document with a set of external
ITS rules is to use the optional XLink
The rules contained in the referenced document MUST be processed as if they were
at the top of the
The example demonstrates how metadata can be added to ITS rules.
The result of processing the two documents above is the same as processing the following document.
Application processing global ITS markup MUST recognize the XLink
External rules may also have links to other external rules. The linking mechanism is recursive: the deepest rules are overridden by the top-most rules, if any.
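The recursive linking can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the `its:rules` element name and the ITS namespace URI are taken from the published ITS 1.0 Recommendation and may differ in this draft, the file names are hypothetical, and an in-memory dictionary stands in for fetching the linked documents. Linked rules are placed before the rules of the linking document, so that the top-most rules come last and can override the deeper ones.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical documents; a real processor would resolve the links itself.
FILES = {
    "doc.xml": (
        f'<doc xmlns:its="{ITS_NS}" xmlns:xlink="{XLINK_NS}">'
        '<its:rules version="1.0" xlink:href="project.xml">'
        '<its:translateRule selector="//code" translate="no"/>'
        '</its:rules><p/></doc>'
    ),
    "project.xml": (
        f'<its:rules xmlns:its="{ITS_NS}" xmlns:xlink="{XLINK_NS}" '
        'version="1.0" xlink:href="company.xml">'
        '<its:translateRule selector="//term" translate="no"/>'
        '</its:rules>'
    ),
    "company.xml": (
        f'<its:rules xmlns:its="{ITS_NS}" version="1.0">'
        '<its:translateRule selector="//id" translate="no"/>'
        '</its:rules>'
    ),
}

def collect_rules(name, seen=None):
    """Return (source, rule name, selector) tuples, deepest rules first."""
    seen = set() if seen is None else seen
    if name in seen:                 # guard against circular links
        return []
    seen.add(name)
    root = ET.fromstring(FILES[name])
    collected = []
    for rules in root.iter(f"{{{ITS_NS}}}rules"):
        href = rules.get(f"{{{XLINK_NS}}}href")
        if href:
            # Linked rules are treated as if they came first.
            collected += collect_rules(href, seen)
        for rule in rules:
            collected.append((name, rule.tag.split("}")[1], rule.get("selector")))
    return collected

for entry in collect_rules("doc.xml"):
    print(entry)
```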
The following precedence order is defined for selections of ITS information in various positions (the first item in the list has the highest precedence):
In case of conflicts between global selections via multiple rule elements, the last selector has higher precedence.
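The "last selector wins" behaviour can be illustrated with a small sketch. It assumes the `its:translateRule` element and namespace URI of the published ITS 1.0 Recommendation, and it deliberately handles only selectors of the form `//name` instead of evaluating full XPath; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0

# Hypothetical document with two conflicting rules for <term>.
DOC = f"""
<doc xmlns:its="{ITS_NS}">
  <its:rules version="1.0">
    <its:translateRule selector="//term" translate="no"/>
    <its:translateRule selector="//term" translate="yes"/>
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>A <term>widget</term> runs <code>init()</code>.</p>
</doc>
"""

def global_translate(root):
    """Map each selected element to 'yes' or 'no', applying rules in order."""
    result = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get("selector", "")
        # Simplification: only selectors of the form //name are handled.
        if not selector.startswith("//"):
            continue
        for el in root.iter(selector[2:]):
            # A later rule overwrites the value set by an earlier one.
            result[el] = rule.get("translate")
    return result

root = ET.fromstring(DOC)
values = {el.tag: value for el, value in global_translate(root).items()}
print(values)
```

Because the rules are applied in document order and each assignment overwrites the previous one, the second rule for `term` determines the final value.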
The precedence order fulfills the same purpose as the
built-in template rules of
Due to the rules described above, the local
translatability information from the
Some markup schemes provide markup which can be used to
express ITS data categories. ITS data categories can be associated with such existing markup, using
the global selection mechanism described in
Associating existing markup with ITS data categories can only be done if the processing expectations are the same, or if the processing expectations of the host markup cover at least those of ITS.
The following table summarizes the relations between data categories, location of their selection mechanisms, and default selections in XML documents.
The data categories differ with respect to defaults in
the XML document for reasons of compatibility with existing
standards and practices. For example, the taken from
The data category translatability expresses information
about whether the content of an element or attribute
should be translated or not. The values of this data
category are
Translatability can be expressed with global rules, or locally on an individual element.
As for global rules, translatability is expressed with
a
Locally, translatability is expressed with a
In the
The data category localization information is used to communicate information to localizers about a particular item of content.
This data category has several purposes:
enabled in isolation without knowing the gender, number and case of the thing it refers to.)
Two types of informative notes are needed:
Localization information can be expressed with global rules, or locally on an individual element.
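As a hedged illustration of the local form, the sketch below collects notes attached directly to elements. The attribute names `its:locNote` and `its:locNoteType`, the value "alert", and the fallback to "description" when no type is given are assumptions taken from the published ITS 1.0 Recommendation; the message file is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0

# Hypothetical message file with notes for localizers.
DOC = f"""
<msgs xmlns:its="{ITS_NS}">
  <msg its:locNote="%1$s is a file name" its:locNoteType="alert">Cannot open %1$s.</msg>
  <msg its:locNote="Shown in the status bar">Ready</msg>
</msgs>
"""

def local_notes(root):
    """Return (content, note type, note text) for each annotated element."""
    found = []
    for el in root.iter():
        note = el.get(f"{{{ITS_NS}}}locNote")
        if note is not None:
            # Assumption: a missing type is treated as "description".
            kind = el.get(f"{{{ITS_NS}}}locNoteType", "description")
            found.append((el.text, kind, note))
    return found

notes = local_notes(ET.fromstring(DOC))
for item in notes:
    print(item)
```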
Using global rules, addition of localization information to selected nodes is achieved with a
Pointing to existing localization information is
provided by a
The
The
The
The
Locally in a document, localization information is
expressed with the attributes description. The selection is the textual
content of element,
The terminology data category is used to mark terms. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.
The terminology data category can be expressed with global rules, or locally on an individual element.
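A minimal sketch of the local form follows. It assumes an `its:term` attribute with the value "yes" alongside the `its:termInfoRef` attribute, as in the published ITS 1.0 Recommendation; the namespace URI, the `span` host element and the glossary reference are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0
TERM = f"{{{ITS_NS}}}term"
TERM_INFO_REF = f"{{{ITS_NS}}}termInfoRef"

# Hypothetical paragraph with two marked terms, one with a reference.
DOC = f"""
<p xmlns:its="{ITS_NS}">A <span its:term="yes"
  its:termInfoRef="#glossary-motherboard">motherboard</span> holds the
  <span its:term="yes">CPU</span>.</p>
"""

root = ET.fromstring(DOC)

# Each marked term together with its optional reference to further information.
terms = [(el.text, el.get(TERM_INFO_REF))
         for el in root.iter() if el.get(TERM) == "yes"]
print(terms)
```

A term without `its:termInfoRef` is still reported. Its reference is simply `None`.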
As for global rules, identifying
terminology information at selected nodes is realized with
a termInfoRef attribute can be used to refer to
external information about the term. The datatype of
xs:anyURI. Locally in a document, the
terminology data category is expressed with a
termInfoRef
attribute. The selection is the textual content of the
element,
This data category expresses the directionality of a
piece of text. Its values are
An implementation of the directionality data category MUST follow the XHTML family user agent conformance criteria defined in that specification.
The
Directionality can be expressed with global rules or locally on an individual element.
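The local form can be sketched as a lookup of the nearest `its:dir` value on an element or its ancestors. The values "ltr" and "rtl", the inheritance by child elements and the default of "ltr" are assumptions taken from the published ITS 1.0 Recommendation, as is the namespace URI; the sample text is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0
DIR = f"{{{ITS_NS}}}dir"

# Hypothetical document; the text stands in for real bidirectional content.
DOC = f"""
<doc xmlns:its="{ITS_NS}">
  <p>English text</p>
  <p its:dir="rtl">Right-to-left text with an <em>emphasised</em> word
    and a <span its:dir="ltr">W3C</span> name</p>
</doc>
"""

def effective_dir(el, parents):
    """Return the nearest its:dir value on el or one of its ancestors."""
    while el is not None:
        value = el.get(DIR)
        if value:
            return value
        el = parents.get(el)
    return "ltr"   # assumption: default when nothing is specified

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child-to-parent map first.
parents = {child: parent for parent in root.iter() for child in parent}
result = [(el.tag, effective_dir(el, parents)) for el in root.iter()]
print(result)
```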
As for global rules, directionality is expressed in
rules using a
Locally in a document, directionality is expressed with
a
The data category ruby is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.
Ruby can be expressed locally in a document or with global rules.
Locally in a document, Ruby is realized with a
The structure of the content model for the
An implementation of the ruby data category MUST follow the conformance criteria for ruby defined in that specification.
The structure of ruby defined in section 5.4 of
The functionality of pointing to existing ruby markup
is realized with various pointer attributes
for ruby. There is a pointer attribute for the
In legacy situations where one cannot change the element markup but wants to apply ruby text to an attribute or existing element content, the following approach can be used.
A
The element
The following
The data category elements within text expresses information about how elements should affect the flow of the content. In this context the flow of the content represents how the nodes of the elements should be treated as a single unit for linguistic purposes. Sometimes, a flow can be nested within another one. The values associated with this data category are:
Elements not listed are considered to have the value
This data category can be expressed only in a set of rules. It cannot be expressed as local markup on an individual element.
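One way to picture the effect of such rules is to split a document into flows of text. The sketch below is an illustration under assumptions: the rule element `its:withinTextRule`, its `withinText` attribute and the value "yes" follow the published ITS 1.0 Recommendation, elements without a rule are treated as starting a flow of their own, nested flows are not handled, and only selectors of the form `//name` are supported.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"   # assumption: published ITS 1.0

# Hypothetical document in which <b> is declared to be part of the text flow.
DOC = f"""
<doc xmlns:its="{ITS_NS}">
  <its:rules version="1.0">
    <its:withinTextRule selector="//b" withinText="yes"/>
  </its:rules>
  <p>Click <b>Save</b> now.</p>
  <note>Unsaved work is lost.</note>
</doc>
"""

def inline_names(root):
    """Names of elements declared as being within text."""
    names = set()
    for rule in root.iter(f"{{{ITS_NS}}}withinTextRule"):
        selector = rule.get("selector", "")
        # Simplification: only selectors of the form //name are handled.
        if rule.get("withinText") == "yes" and selector.startswith("//"):
            names.add(selector[2:])
    return names

def flows(root, inline):
    """Return one string per flow of text, skipping empty flows."""
    out = []

    def text_of(el):
        parts = [el.text or ""]
        for child in el:
            if child.tag in inline:
                parts.append(text_of(child))   # stays in the current flow
            else:
                start_flow(child)              # becomes a flow of its own
            parts.append(child.tail or "")
        return "".join(parts)

    def start_flow(el):
        text = text_of(el).strip()
        if text:
            out.append(text)

    start_flow(root)
    return out

root = ET.fromstring(DOC)
units = flows(root, inline_names(root))
print(units)
```

Without the rule for `b`, the word "Save" would be split off as a flow of its own, which is the situation this data category is meant to avoid.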
Element within text is expressed with a
The following schemas are provided:
Several constraints of ITS markup cannot be validated with ITS schemas. The following
The following log records major changes that have been made to this document since the publication in November 2005.
The following log records major changes that have been made to this document since the publication in February 2006.
The following log records major changes that have been made to this document since the publication in April 2006.
This document has been developed with contributions by the ITS Working Group. At the date of publication, the members of the Working Group were: Damien Donlon (Sun Microsystems), Martin Dürst (Invited Expert), Richard Ishida (W3C), Masaki Itagaki (Invited Expert), Christian Lieske (SAP AG), Naoyuki Nomura (Ricoh), Sebastian Rahtz (Invited Expert), François Richard (HP), Goutam Saha (CDAC), Felix Sasaki (W3C), Yves Savourel (ENLASO), Dianne Stoick (Boeing), Najib Tounsi (Ecole Mohammadia d'Ingénieurs Rabat (EMI)) and Andrzej Zydroń (Invited Expert).
A special thanks goes to Sebastian Rahtz who introduced us to the ODD language, which was used to create this document, and who provided the stylesheets to generate schemas and the XHTML version out of an ODD document. The generation of XHTML from ODD takes an intermediate step through the xmlspec-i18n.dtd.
$Id: itstagset.odd,v 1.124 2006/04/20 14:24:12 fsasaki Exp $