This document defines data categories and their implementation as a set of elements and
attributes called the Internationalization Tag Set (ITS).
This is an updated Public Working Draft of "Internationalization Tag Set (ITS)".
This document was developed by the
The Working Group is managing comments on this document using W3C's
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the
This is an updated version of this document.
This document defines a standard for high-quality, cost-efficient
internationalization and localization of schemas and XML instances
(both existing and new). On the one hand, the standard
is defined conceptually through the notion of data categories. On
the other hand, the standard defines implementations of these data
categories as a set of elements and attributes called the Internationalization
Tag Set (ITS). The document provides examples of how ITS can be used
with existing popular markup schemes such as DocBook. Furthermore,
the document provides implementations for three schema languages:
XML DTD, XML Schema and RELAX NG.
Requirements for this document are formulated in
The Working Group will cover some of the requirements in a separate
document on techniques for internationalization and localization
of schemas and XML instance documents.
The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective localization of content. The following paragraphs sketch these different types of users, and their usage of ITS.
This type of user will find proposals for attribute and element names to be included in their new schema (aka “host vocabulary”). Using the attribute and element names proposed in the ITS specification may be helpful because it leads to easier recognition of the concepts represented by both schema users and processors. It is perfectly possible, however, for a schema developer to develop his own set of attribute and element names. The specification sets out, first and foremost, to ensure the required markup is available, and that the behaviour of that markup meets established needs.
This type of user will be working with schemas such as DocBook, DITA, or perhaps a proprietary schema.
The ITS Working Group has sought input from experts developing widely used formats such as the ones mentioned, and the ITS specification provides examples of how those formats (aka “host vocabulary”) could be used with ITS.
The Working Group intends to cover the question “How to use ITS with existing popular markup schemes?” in more detail in a separate document/note on “Modularizations for ITS”.
Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.
In some cases, an
existing schema may already contain markup equivalent to that recommended
in ITS. In this case it is not necessary to add duplicate markup
since ITS provides mechanisms for relating ITS markup with markup
in the host vocabulary which serves a similar purpose (see
This type of user encompasses companies which provide tools for authoring, translation or other kinds of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed or translated. It is hoped that the ITS specification will make the job of vendors easier by standardising the format and processing expectations of certain relevant markup items, and allowing them to more effectively identify how content should be handled.
This type of user comprises authors, translators and other kinds of content producers. The markup proposed in this specification may be used by them to mark up specific bits of content. Aside: The burden of inserting markup should be removed from content producers by relating the ITS information to relevant bits of content in a global manner (see global, rule-based approach). This global work, however, may fall to information architects, rather than the content producers themselves.
In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective localization of content is provided in this specification in two ways:
The ITS specification proposes several mechanisms for supporting worldwide use and effective localization of content. We will sketch them below by looking at them from the perspectives of certain user types. For the purpose of illustration, we will answer the question of how ITS can indicate that certain parts of content should or should not be translated.
A content author uses an attribute on a particular element to say that the text in the element should not be translated
And he said: you need a
new T-Model
A content author or information architect uses markup at the top of the document to identify a particular type of element or context in which the content should not be translated.
...
A processor may inject markup at the top of the document which links to ITS information outside of the document.
...
A schema developer integrates ITS markup declarations in his schema to allow users to indicate that specific parts of the content should not be translated
(see
The first two approaches above can be likened to the use of CSS
in XHTML. Using a
Content or software that is authored in one language (so-called source language) is often made available in additional languages or adapted with regard to other cultural aspects. This is done through a process called localization, where the original material is translated and adapted to the target audience.
In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian or Urdu need special markup to demarcate directionality in mixed direction text.
From the viewpoints of feasibility, cost, and efficiency, it is
important that the original material should be suitable for localization.
This is achieved by appropriate design and development, and the
corresponding process is referred to as internationalization. For
a detailed explanation of the terms "localization" and "internationalization",
see
The increasing usage of XML as a medium for documentation-related
content (e.g. DocBook, and DITA as formats for writing structured
documentation, well suited to computer hardware and software manuals)
and software-related content (e.g. the eXtensible User Interface
Language
The following examples sketch one of the issues that currently
hinder efficient XML-related localization: the lack of a standard,
declarative mechanism which identifies which parts of an XML document need to be translated (the text in bold face shows the parts that need
to be translated). Tools often cannot automatically do this identification.
The first file name in the first
In the example below, there are no clear mechanisms allowing one
to know which
This standard does not exhaustively cover all mechanisms and data
formats which might be needed for configuring localization workflows
or tools to process a specific format. These mechanisms and data
formats, sometimes called
“XML localization properties” is a generic term for the mechanisms and data formats that allow localization tools to be configured in order to process a specific XML format. Examples of "XML localization properties" are the "Trados DTD Settings" file and the SDLX "Analysis" file.
Abstraction via subsections in
Powerful
Content authors need, for example, a simple way to work with the translatability data category in order to
express whether the content of an element or attribute should be
translated or not. Localization coordinators, on the other hand,
need an efficient way of managing translations of large document
sets based on the same schema. This could be realized by a specification
of defaults for translatability and exceptions from the defaults
(e.g. all
This specification responds to these requirements by introducing
mechanisms for specifying ITS information in XML documents or schemas, see
This specification has been developed using the ODD (
XSLT transforms are provided by the TEI to extract documentation
in HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents
and DTDs. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents.
Information (e.g. "translate this") captured by ITS markup (e.g.
its:translate='yes') always pertains to one
or more XML nodes (mainly element and attribute nodes). In a sense, ITS markup “selects” the XML node(s). Selection may be explicit or implicit. ITS distinguishes two approaches to selection: local, and with global rules.
The mechanisms defined for ITS selection resemble those defined
in The local
approach can be compared to the the approach with global rules is similar to the
ITS markup can be used with XML documents (e.g.
a DocBook article), or schemas (e.g. an XSD for a proprietary document
format). Since each usage defines some specific requirements, ITS
markup may take different shapes.
The following two examples sketch the distinction between the local
and global approaches, and the difference between
ITS in XML instances and schemas.
The example above shows how a content author may use
the ITS
For this to work, the schema developer will need to add the
An
The example above shows a different approach to identifying
non-translatable content, similar to that used with a
For this to work, the schema developer needs to add the translate information, the contents of the
The global, rule-based approach has the following benefits:
The commonality in both examples above is the markup its:translate='no'. This piece of ITS markup can be interpreted as follows:
To summarize: The examples with global and local usage of ITS markup
show that ITS data category attributes, in some cases, appear in elements defined by ITS itself (the translateRule element
(embedded within a
The ITS
term element
in DITA)
The power of ITS selector attributes comes at a price:
rules related to overriding/precedence, and inheritance,
have to be established.
...
In this example, the ITS data category attribute translate data category of
the
Depending on the data category and its usage, there are additional
attributes for adding information to the selected nodes, or for pointing
to existing information in the document. For example, the data category
localization information can be used for adding information to selected
nodes, or for pointing to existing information in the document. For
the former purpose, a
The functionality of adding information to the selected
nodes is available for each data category except language information. Pointing to existing
information is not possible for data categories which express
The functionalities of adding information and pointing
to existing information are
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in
The namespace URI that must be used by implementations of this specification is:
The namespace prefix used in this specification for this URI is "its". It is recommended that implementations of this specification use this prefix.
In addition, the following namespaces are used in this document:
http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here
used with the prefix "xs"
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here
used with the prefix "rng"
an XML-related modelling or validation language such as XML DTD, XML Schema or RELAX NG.
This specification provides schemas in the formats XML DTD, XML Schema and RELAX NG. However, these schemas are non-normative: conformance for ITS markup declarations defines only the mandatory positions of ITS declarations in schemas. This makes it possible to use ITS with any schema language which allows for using these positions.
For each data category, ITS distinguishes between the following:
The data category translatability conveys information as to whether a piece of content should be translated or not.
The simplest
formalization of this prose description on a schema language independent level is a
An alternative formalization on a schema language independent level is a
or schema an ITS
data category and its values should be applied to. Selection can be applied globally or locally. As for global selection, ITS information can be added to the selected nodes, or it can point to existing information which is related to selected nodes.
The usage of the term
This specification defines two types of conformance: (1) conformance of ITS markup declarations, and (2) conformance of processing expectations for ITS markup. These conformance types are complementary. An implementation of this specification may use them together.
do not concern the
various subsections in in a schema language independent manner, relying on the ODD language. Their occurrence in other sections of this document is typographically marked via bold face and color.
conformance type:n existing or new schema. All conformance clauses for this conformance type concern the position of ITS markup declarations in that schema, and their status as mandatory or optional.
Since the definitions in
ITS markup declarations are a set of elements and attributes that have been designed using state-of-the-art knowledge about internationalization and localization needs. Since the goal of the ITS Working Group is to deliver clauses defined in this section do not allow an existing or new schema to use only parts of the ITS markup declarations. However, this concerns only the ITS markup declarations in a schema. As for the interpretation of ITS markup and the respective data categories, the product and conformance clauses defined in
The processing expectations for ITS markup define how ITS markup found in XML instance documents has to be interpreted by an application. The markup may be generated or validated relying on an existing or new schema which conforms to the conformance clauses in
schema, global and local). In addition to the selection-related processing expectations, an additional set of expectations is described for the ruby data category and the directionality data category, by normatively referencing external specifications.
conformance type:which needs to process, for internationalization or localization, the element or attribute nodes captured by a data category.
Applications which conform to the clauses above can be, for example, ITS markup aware editors, or translation tools which make use of ITS markup to filter translatable text as an input to the localization process. Their only common property is that they are able to process ITS markup in the way described above. Further processing is not subject to this specification.
The processing expectations for ITS markup encompass knowledge about internationalization and localization needs. A key part of these needs is information about the relation between internationalization and localization data categories and nodes in an XML document. This information differs in a default case, for a specific node in a document, or globally. An example is the default that attribute content should not be translated, and local or global exceptions to this default. The product of processing expectations for ITS markup responds to such needs, by providing the respective mechanism.
Conformance clause 2-4 allows global ITS markup to be reused to select information from multiple documents, without the need to integrate the ITS markup itself into the documents.
Selections of data categories can appear in two places:
The two selection mechanisms are described in detail below.
In Schemas, selection of ITS information is realized with schema annotation. The selection for a data category depends on the position of the schema annotation. Since schema annotation mechanisms are schema language specific, the following definitions are made:
As for XML DTD, this specification defines no selection mechanism within the DTD.
To be able to select elements or attributes defined within an XML DTD, the
mechanisms described in
Several data categories on the same element or attribute declaration should be expressed at the same
Global, rule-based selection is implemented using the rules element. It
contains one or more
Depending on the data category and its usage, there are additional attributes for adding information to the selected nodes, or for pointing to existing information in the document. For example, the data category localization information can be used for adding information to selected nodes, or for pointing to existing information in the document. For the former purpose, an
element has one or more data category
attributes, and for each data category attribute a selector attribute which expresses the selected
information
The functionality of adding information to the selected nodes is available for each data category except language information. Pointing to existing information is not possible for data categories which express
The functionalities of adding information and pointing to existing information are
The naming convention for the selector attributes
is data category + Selector, e.g.
translateSelector. In ITS rules selections, t
/". That is, it must be an AbsoluteLocationPath as described in As for data category specific attributes like
If namespaces
The term element from the TEI is in a namespace
http://www.tei-c.org/ns/1.0. The
The usage of the inspired by
Global rules can appear in a schema (e.g. as content of the xs:appinfo
element), in the
The difference between
Markup for global, rule-based selection is defined as follows.
Having its-global as the entry point of the schema serves as a wrapper schema for an external rules file.
Local selection of ITS information in XML documents is realized with local ITS attributes, the ruby element, or the span element. span serves just as a wrapper for the local ITS attributes and ruby.
It depends on the
data category what is being selected. The necessary data category specific defaults
are described in
its:translate="no" at the head element means that the textual content of this element, including child elements, should not be
translated. its:translate="yes" at the body element means that the textual content of this element, including child elements, should be translated. Attribute values of the selected elements or their children are not affected by local
its:dir="ltr" at the
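The inheritance behaviour of local its:translate attributes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace URI urn:example:its is a placeholder (not the ITS namespace), the sample document is invented, and the default of "yes" for element content is assumed here for the purpose of the sketch.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real ITS namespace URI.
ITS_NS = "urn:example:its"

# Hypothetical sample document, loosely modelled on the head/body example.
DOC = f"""<text xmlns:its="{ITS_NS}">
  <head its:translate="no"><title>Draft <b>v2</b></title></head>
  <body its:translate="yes"><p>Hello <code its:translate="no">x</code></p></body>
</text>"""


def local_translate(root, default="yes"):
    """Map each element path to its effective translate value.

    A local its:translate attribute applies to the element and is
    inherited by its child elements until overridden. Attribute values
    are deliberately not covered by this sketch.
    """
    result = {}

    def walk(elem, inherited, path):
        value = elem.get(f"{{{ITS_NS}}}translate", inherited)
        here = f"{path}/{elem.tag}"
        result[here] = value
        for child in elem:
            walk(child, value, here)

    walk(root, default, "")
    return result


flags = local_translate(ET.fromstring(DOC))
for path, value in flags.items():
    print(path, value)
```

The paths are simple tag paths without positional indexes, which is sufficient for this small sample but would not distinguish sibling elements of the same name.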
Markup for local selection is defined as follows.
The
One way to associate a document with a set of external ITS rules is to use the optional XLink
The rules contained in the referenced document must be processed as if they were at the top of the
A
The result of processing the two documents above is the same as processing the following document.
A
Applications processing global ITS markup must recognize the XLink
External rules may also have links to other external rules. The linking mechanism is recursive, the deepest rules being overridden by the top-most rules, if any.
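The recursive linking of external rules can be pictured as flattening a chain of rule sets into one ordered list, where rules from the most deeply linked document come first and can therefore be overridden by the rules of the documents that link to them. The following Python sketch assumes a hypothetical in-memory representation (a dictionary with "links" and "rules" entries per document); it does not parse XLink attributes, and the cycle guard is an addition of this sketch, not something stated in the text.

```python
def flatten_rules(name, documents, seen=None):
    """Return the rules for document `name`, deepest linked rules first.

    `documents` is a hypothetical mapping: name -> {"links": [...], "rules": [...]}.
    Later entries in the returned list are meant to override earlier ones.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return []  # assumption of this sketch: ignore circular links
    seen.add(name)

    doc = documents[name]
    rules = []
    for linked in doc.get("links", []):
        # Linked rules are placed before the rules of the linking document.
        rules.extend(flatten_rules(linked, documents, seen))
    rules.extend(doc.get("rules", []))
    return rules


documents = {
    "top": {"links": ["mid"], "rules": ["top-rule"]},
    "mid": {"links": ["deep"], "rules": ["mid-rule"]},
    "deep": {"rules": ["deep-rule"]},
}
ordered = flatten_rules("top", documents)
print(ordered)
```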
The following precedence order is defined for selections of ITS information in various positions (the first item in the list has the highest precedence):
In case of conflicts between global selections via multiple rule elements, the last selector has higher precedence.
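The "last selector wins" behaviour for conflicting global rules can be sketched as follows. The sketch makes several assumptions: the namespace URI is a placeholder, the attribute names its:translate and its:selector on translateRule are chosen for illustration, and selection uses the limited XPath subset of Python's xml.etree.ElementTree, so only selectors starting with // are handled (they are rewritten to a relative form).

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real ITS namespace URI.
ITS_NS = "urn:example:its"

# Hypothetical document with two rules that both select the second <p>.
DOC = f"""<doc xmlns:its="{ITS_NS}">
  <its:rules>
    <its:translateRule its:translate="no" its:selector="//p"/>
    <its:translateRule its:translate="yes" its:selector="//p[@class='ui']"/>
  </its:rules>
  <p>fixed</p>
  <p class="ui">label</p>
</doc>"""


def apply_translate_rules(root):
    """Apply translateRule elements in document order; later rules win."""
    decisions = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        value = rule.get(f"{{{ITS_NS}}}translate")
        selector = rule.get(f"{{{ITS_NS}}}selector")
        # ElementTree needs a relative path, so "//p" becomes ".//p".
        for node in root.findall("." + selector):
            decisions[node] = value  # overwrites any earlier decision
    return decisions


root = ET.fromstring(DOC)
by_text = {node.text: value for node, value in apply_translate_rules(root).items()}
print(by_text)
```

A full implementation would need a complete XPath 1.0 engine; the dictionary overwrite is the only part that models the precedence statement.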
The precedence order fulfills the same purpose as the built-in template rules of
Due to the rules described above, the local translatability information from the
...
...
Some markup schemes provide markup which can be used to express ITS data categories. ITS
data categories can be mapped to such existing markup, using the global selection mechanism
described in
For the implementation of ITS, apply the rules in the order:
Et voilà !
The following table summarizes the relations between data categories, location of their
selection mechanisms, and default selections in XML documents.
The data categories differ with respect to defaults in the XML document for
compatibility reasons with existing standards and practices. For example, the
The data category translatability expresses information about whether the content of an
element or attribute should be translated or not. The values of this
data category are
Translatability can be expressed in a schema, with global rules, or locally on an individual
element.
In a schema, translatability is expressed with a
As for global rules, translatability is expressed with a
Locally, translatability is expressed with a
In the
And he said: you need a new
motherboard
The data category localization information is used to communicate information to localizers about a particular item of content.
This data category has several purposes:
enabledin isolation without knowing the gender, number and case of the thing it refers to.)
Two types of informative notes are needed:
Localization information can be expressed in a schema, with global rules, or locally on an individual element.
In a schema, localization information is expressed with a
As for global rules, adding localization information to selected nodes is realized with a
The functionality of pointing to existing localization information is realized via a
Locally in a document, localization information is expressed with the attributes
If the locInfoType attribute is not present, the type of localization information is assumed to be description. The selection is the textual content
of element,
And he said: you need a new
motherboard
To be able to identify globally existing localization information, the
To be able to differentiate the functionality of the
An example of the usage of the
At the locInfoRule element, there must be either a locInfo element [not attribute] or a locInfoRef attribute. If neither is present, there must be either a locInfoPointer attribute or a locInfoRefPointer attribute. There is an optional locInfoType attribute.
About locInfoLocal: There must be either a locInfo attribute or a locInfoRef attribute. There is an optional locInfoType attribute.
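The constraints on locInfoRule stated above can be checked mechanically. The Python sketch below is one possible reading, not a normative one: it assumes that exactly one of the four sources (a locInfo child element, or a locInfoRef, locInfoPointer or locInfoRefPointer attribute) may be present, that the attributes are unqualified, that the default type "description" also applies to rules, and it uses a placeholder namespace URI.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real ITS namespace URI.
ITS_NS = "urn:example:its"


def describe_loc_info_rule(rule):
    """Return (source, type) for a locInfoRule element, or raise ValueError."""
    present = []
    if rule.find(f"{{{ITS_NS}}}locInfo") is not None:
        present.append("locInfo")  # child element, not an attribute
    for name in ("locInfoRef", "locInfoPointer", "locInfoRefPointer"):
        if rule.get(name) is not None:
            present.append(name)
    if len(present) != 1:
        raise ValueError(f"expected exactly one information source, found {present}")
    # Assumed default when locInfoType is absent.
    return present[0], rule.get("locInfoType", "description")


good = ET.fromstring(
    f'<its:locInfoRule xmlns:its="{ITS_NS}" locInfoRef="notes.xml#n1"/>'
)
bad = ET.fromstring(f'<its:locInfoRule xmlns:its="{ITS_NS}"/>')

print(describe_loc_info_rule(good))
try:
    describe_loc_info_rule(bad)
except ValueError as err:
    print("invalid:", err)
```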
The terminology data category is used to mark terms. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.
The terminology data category can be expressed in a schema, with global rules, or locally on an individual element.
In a schema, the terminology data category is expressed with a
As for global rules, identifying terminology information at selected nodes is realized with a xs:anyURI. To point to existing term references, a
Locally in a document, the terminology data category is expressed with a
And he said: you need a
new motherboard
In an instance document, an attribute
term="yes" is used to indicate a term. In the global rule, "being" a term is
expressed via the name of the element termRule, hence the attribute
term="yes" is not necessary any more. The attributes termRef and
termRefPointer are alternatives. It is an error if they occur at the same
termRule element.
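The rule that termRef and termRefPointer are alternatives can be expressed as a small check. This Python sketch assumes unqualified attribute names on a termRule element and uses a placeholder namespace URI; the return format is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real ITS namespace URI.
ITS_NS = "urn:example:its"


def term_reference(rule):
    """Return how a termRule refers to term information, if at all.

    Raises ValueError when termRef and termRefPointer occur together,
    which the text above defines as an error.
    """
    ref = rule.get("termRef")
    pointer = rule.get("termRefPointer")
    if ref is not None and pointer is not None:
        raise ValueError("termRef and termRefPointer must not occur together")
    if ref is not None:
        return ("ref", ref)
    if pointer is not None:
        return ("pointer", pointer)
    return None  # the rule marks terms without pointing to further information


ok = ET.fromstring(f'<its:termRule xmlns:its="{ITS_NS}" termRefPointer="@target"/>')
clash = ET.fromstring(
    f'<its:termRule xmlns:its="{ITS_NS}" termRef="t.xml#a" termRefPointer="@target"/>'
)

print(term_reference(ok))
try:
    term_reference(clash)
except ValueError as err:
    print("invalid:", err)
```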
About term: the attribute term is mandatory, the attribute termRef is optional.
This data category expresses the directionality of a piece of text. Its values are
The
Directionality can be expressed with global rules or locally on an individual element.
As for global rules, directionality is expressed in rules using a
Locally in a document, directionality is expressed with a
And he said: ... a Hebrew quotation ...
TODO: comment?
The data category ruby is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.
Ruby can be expressed locally in an instance document or with global rules.
Locally in a document, ruby is realized with a
This is about the
The structure of the content model for the
The structure of ruby defined in section 5.4 of
The functionality of pointing to existing ruby markup is realized with various pointer attributes for ruby. There is a pointer attribute for the
In legacy situations, where one cannot change the element markup but wants to apply ruby text to an attribute or to existing element content, the following approach can be used.
A
...
The element rubyRule is used (1) to map existing ruby markup to ITS ruby, which itself is defined in terms of the W3C ruby specification, or (2) to add ruby text to attribute values. Example for (1): <its:rubyRule its:selector="//span[@class='ruby']" its:rbPointer="span[@class='rubyBase']" its:rtPointer="span[@class='rubyText']"/>. Example for (2): <its:rubyRule its:selector="/body/img[1]/@alt" its:rbPointer="." its:rt="World Wide Web Consortium"/>. It is an error if both an its:rt attribute and an its:rtPointer attribute occur at the same <its:rubyRule> element.
rubyLocal is defined in terms of http://www.w3.org/TR/ruby/#definition. The (rbc, rtc, rtc?) alternative of the content model for the ruby element corresponds to complex ruby markup. The minimal content model for the ruby element is (rb, (rt | (rp, rt, rp))).
The element
The following
The data category elements within text expresses information about whether an element is part of its parent text unit. The values of this data category are
This data category can be expressed only in a set of rules. It cannot be expressed as local markup on an individual element.
Element within text is expressed with a
Two topics are covered in this section:
XHTML 1.0
In XHTML 1.0, the XHTML namespace may be used with other XML namespaces as per
An example of such a
Some text to translate.
Some text not to translate.
The way to use ITS with XHTML and keep the XHTML document conformant is to use external ITS global rules. Even local information within the document that would be handled by ITS attributes can be set indirectly.
Some text to translate.
Some text not to translate.
A number of XHTML constructs implement the same semantics as some of the ITS data categories. In addition, some of the attributes in XHTML are translatable, which is not the default for XML documents according to the ITS default settings. These attributes need to be identified as translatable.
An external ITS
Additional notes on these rules:
its:selector="//h:del/descendant-or-self::*/@*" to overwrite any possible translatable attribute within a TODO
TODO
TODO
The TEI (
The TEI is maintained as a single ODD document, and customizations of it are also written as ODD documents. These are processed using XSLT stylesheets to make a tailored user-level schema in XML DTD, XML Schema or RELAX NG.
The ITS additions involve two changes to TEI:
Both of these can be easily achieved using standard techniques in
ODD.
The body of a TEI/ITS customization consists of a
In addition, we load the ITS schema (in its RELAX NG XML format, the
language used by the TEI for expressing content models), and overload
the definition of the TEI content class model.headerPart
to include the ITS
The content class determines which elements are allowed as children of
att.global to reference the ITS local attributes (available from the ITS schema we loaded earlier):
When processing, this customization produces a schema which permits markup like this:
Hello world
Goodbye
This must not be translated
In this example, a set of rule elements is provided in the header, and the body of the text performs a specific override.
ITS has been integrated into xmlspec-i18n.dtd. This is a version of the XML DTD version 2.9 of XML
Spec which already supplies various internationalization and localization
related features. For example, there is an attribute
For the integration of ITS, the following modifications to the xmlspec-i18n.dtd have been made:
<!ENTITY % its SYSTEM "its.dtd"> and the
entity call %its; have been added to xmlspec-i18n.dtd.
%common.att; has been modified. The ITS
entities %att.translate.attributes;,
%att.locInfo.attributes;,
%att.locInfoType.attributes;,
%att.locInfoRef.attributes;,
%att.term.attributes;,
%att.termRef.attributes; and
%att.dir.attributes; have been added to %common.att;. In
this way, the local attributes can be used
at any element defined in the XML Spec DTD.
%header.mdl; contains the content model of the
%p.pcd.mix;. In this way it is possible to use
As mentioned before, xmlspec-i18n.dtd has its own existing markup declarations for
various internationalization and localization related purposes. In the original XML
Spec 2.9 DTD, there is a term element which fulfills the same purpose as the
ITS
To relate such existing XML Spec and xmlspec-i18n.dtd related markup to ITS markup
(see
Since neither XML Spec nor xmlspec-i18n.dtd defines a namespace, the mappings use XPath expressions with unqualified element and attribute names.
The
The
A data type data.selector is defined for selector attributes. Its value is an XPath expression
The attribute group att.datacats is used to express the ITS data categories. It makes use of the data type data.itsBoolean.
The elements
The attribute group att.selector is used at the
The
The
Conformance to ITS falls into two categories: conformance to the ITS data categories (cf.
An implementation of the ITS data categories is conformant if it supplies a schema which adopts the ITS data categories, with the following constraints:
The
Conformance to Selection Mechanisms encompasses conformance to the ITS data categories and data category specific default selection mechanisms, with the following changes:
A mandatory part of this conformance criterion is the usage of XPath. An application
which processes ITS selection rules must be able to
process XPath in version 1.0 or higher. It is not required to support a specific host
language of XPath, like for example