This document defines data categories and their implementation as a set of elements and attributes called
the
This document is an updated Public Working Draft published by the
Major changes in this version of the document include the addition of several data categories (
Feedback about the content of this document is encouraged. See also
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the
This is the first version of this document.
ITS 2.0 is a technology to add metadata to Web content, for the benefit of localization, language
technologies, and internationalization. The ITS 2.0 specification both identifies concepts (such as
Translate
) that are important for internationalization and localization, and defines
implementations of these concepts (termed “ITS data categories”) as a set of elements and attributes
called the
This document aims to realize many of the ideas formulated in the ITS 2.0 Requirements document, in
Not all requirements listed there are addressed in this document. Those which are not addressed here
are either covered in
ITS 2.0 has the following relations to ITS 1.0:
It adopts and maintains the following principles from ITS 1.0:
ITS 2.0 also adds the following principles and features not found in ITS 1.0:
As of the time of this writing, the new data categories included in ITS 2.0 are:
Content or software that is authored in one language (the
In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian, or Urdu need special markup to specify directionality in mixed direction text.
From the viewpoints of feasibility, cost, and efficiency, it is important that the original
material should be suitable for localization. This is achieved by appropriate design and
development, and the corresponding process is referred to as internationalization. For a
detailed explanation of the terms “localization” and “internationalization”, see
The increasing usage of XML as a medium for documentation-related content (e.g. DocBook and DITA
as formats for writing structured documentation, well suited to computer hardware and software
manuals) and software-related content (e.g. the eXtensible User Interface Language
The following examples sketch one of the issues that currently hinder efficient XML-related localization: the lack of a standard, declarative mechanism that identifies which parts of an XML document need to be translated. Tools often cannot automatically perform this identification.
In this document it is difficult to distinguish between those
Even when metadata are available to identify non-translatable text, the conditions may be
quite complex and not directly indicated with a simple flag. Here, for instance, only
the text in the nodes matching the expression
//component[@type!='image']/data[@type='text'] is translatable.
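To illustrate why such a condition cannot be reduced to a simple flag, the following Python sketch applies the same condition by hand. It is only an illustration: the sample document, its text content and the function name are hypothetical, and only the element and attribute names are taken from the expression above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element and attribute names are taken
# from the expression //component[@type!='image']/data[@type='text'].
SAMPLE = """
<resources>
  <component id="logo" type="image">
    <data type="text">Company logo</data>
  </component>
  <component id="greeting" type="caption">
    <data type="text">Welcome</data>
    <data type="coordinates">12,20</data>
  </component>
</resources>
"""


def translatable_data(root):
    """Return the data elements that satisfy the condition, in the order visited."""
    selected = []
    for component in root.iter("component"):
        component_type = component.get("type")
        # In XPath 1.0, @type!='image' is false when the attribute is absent.
        if component_type is None or component_type == "image":
            continue
        for data in component.findall("data"):
            if data.get("type") == "text":
                selected.append(data)
    return selected


root = ET.fromstring(SAMPLE)
translatable_texts = [data.text for data in translatable_data(root)]
```

In this sample only the text "Welcome" is selected: the first component is an image, and the coordinates are not of type text.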
The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective internationalization and localization of content. The following paragraphs sketch these different types of users, and their usage of ITS. In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective localization of content is provided in this specification in two ways:
This type of user will find proposals for attribute and element names to be included in their new schema (also called "host vocabulary"). Using the attribute and element names proposed in the ITS specification may be helpful because it leads to easier recognition of the concepts represented by both schema users and processors. It is perfectly possible, however, for a schema developer to develop his own set of attribute and element names. The specification sets out, first and foremost, to ensure that the required markup is available, and that the behavior of that markup meets established needs.
This type of user will be working with schemas such as DocBook, DITA, or perhaps a proprietary schema. The ITS Working Group has sought input from experts developing widely used formats such as the ones mentioned.
The question "How to use ITS with existing popular markup schemes?" is covered in
more detail (including examples) in a separate document:
Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.
In some cases, an existing schema may already contain markup equivalent to that
recommended in ITS. In this case it is not necessary to add duplicate markup since ITS
provides mechanisms for associating ITS markup with markup in the host vocabulary which
serves a similar purpose (see
This type of user includes companies which provide tools for authoring, translation or other flavors of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed or translated. It is hoped that the ITS specification will make the job of vendors easier by standardizing the format and processing expectations of certain relevant markup items, and allowing them to more effectively identify how content should be handled.
This type of user comprises authors, translators and other types of content author. The markup proposed in this specification may be used by them to mark up specific bits of content. Aside: The burden of inserting markup can be removed from content producers by relating the ITS information to relevant bits of content in a global manner (see global, rule-based approach). This global work, however, may fall to information architects, rather than the content producers themselves.
The ITS specification proposes several mechanisms for supporting worldwide use and effective internationalization and localization of content. We will sketch them below by looking at them from the perspectives of certain user types. For the purpose of illustration, we will demonstrate how ITS can indicate that certain parts of content should or should not be translated.
A content author uses an attribute on a particular element to say that the text in the element should not be translated.
The its:translate="no" attributes indicate that the
A content author or information architect uses markup at the top of the document to identify a particular type of element or context in which the content should not be translated.
The
A processor may insert markup at the top of the document which links to ITS information outside of the document.
A
The
A schema developer integrates ITS markup declarations in his schema to allow users to indicate that specific parts of the content should not be translated.
The declarations for the commonAtts. This allows to use the
The first two approaches above can be likened to the use of CSS in
ITS 2.0 adds support for usage in HTML5. In HTML5, ITS local selection is realized via dedicated, data category specific attributes.
For the so-called “global approach” in
HTML5, this specification defines a link type for referring to files with global rules. These
rules are then processed as described in
The link element points to the rules file
EX-translateRule-html5-1.xml The rel attribute identifies the
ITS specific link relation its-rules.
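As a rough sketch of how a processor could discover linked rules files, the following Python code collects the targets of link elements that carry the its-rules link relation. The class name, the function name and the sample page are hypothetical; only the file name and the link relation come from the description above.

```python
from html.parser import HTMLParser


class ItsRulesLinkFinder(HTMLParser):
    """Collect href values of link elements whose rel contains its-rules."""

    def __init__(self):
        super().__init__()
        self.rule_files = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of link relations.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "its-rules" in rel_tokens and href:
            self.rule_files.append(href)


def find_rule_files(html_text):
    finder = ItsRulesLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.rule_files


# Hypothetical page; only the file name and rel value come from the text.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Translate flag test</title>
    <link href="EX-translateRule-html5-1.xml" rel="its-rules">
    <link href="style.css" rel="stylesheet">
  </head>
  <body><p>Some text.</p></body>
</html>"""

rule_files = find_rule_files(PAGE)
```

The sketch only locates the rules files. Fetching them and applying the rules they contain is a separate step.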
The rules file linked in
ITS 2.0 does not define how to use ITS in HTML versions prior to version 5. Users are encouraged
to migrate their content to HTML5 or XHTML. While it is possible to use its-*
attributes introduced for HTML5 in older versions of HTML (such as 3.2 or 4.01) and pages
using these attributes will work without any problems, its-* attributes will be
marked as invalid in validators.
The definition of what a localization process or localization parameters must address is outside the scope of this standard and it does not address all of the mechanisms or data formats (sometimes called Localization Properties) that may be needed to configure localization workflows or process specific formats. However, it does define standard data categories that may be used in defining localization workflows or processing specific formats.
“
Abstraction via
Powerful
Content authors, for example, need a simple way to work with the Translate data category in order to express whether the content of an element or
attribute should be translated or not. Localization managers, on the other hand, need an
efficient way to manage translations of large document sets based on the same schema. These
needs could be realized by a specification of defaults for the Translate data category along with exceptions to those defaults (e.g. all
To meet these requirements this specification introduces mechanisms that add ITS information to
XML documents, see
The ITS selection mechanisms allow you to provide information about content locally (specified at the XML or HTML element to which it pertains) or globally (specified in another part of the document). Global selection mechanisms can be in the same document, or in a separate file.
This specification has been developed using the ODD (
XSLT transformations are provided by the TEI to convert documentation into HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTDs. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents.
Information (e.g. "translate this") captured by ITS markup (e.g.
its:translate='yes') always pertains to one or more XML or HTML nodes (primarily
element and attribute nodes). In a sense, ITS markup “selects” the relevant node(s). Selection
may be explicit or implicit. ITS distinguishes two approaches to selection: (1) local, and (2)
using global rules.
The mechanisms defined for ITS selection resemble those defined in
ITS markup can be used with XML documents (e.g. a DocBook article), or schemas (e.g. an XML Schema document for a proprietary document format). Since each usage defines some specific requirements, ITS markup may take different shapes.
The following two examples sketch the distinction between the local and global approaches, using
the
The document in
For this example to work, the schema developer will need to add the
The document in
Caveat Related to XSLT-based Processing of ITS Selector Attributes
The values of ITS
myElement/descendant-or-self::*/@*
Unfortunately, values like this cause trouble when they are used in XSLT-based processing
of ITS where the values of the ITS
Basically the following restrictions hold for patterns:
Using only XSLT patterns in ITS
*[self::myElement]/@* | myElement//*/@*
For this approach to work, the schema developer needs to add the
For specification of the Translate data category
information, the contents of the
The global, rule-based approach has the following benefits:
The commonality in both examples above is the markup translate='no'. This piece
of ITS markup can be interpreted as follows:
The ITS
term element in DITA)
The power of the ITS selection mechanisms comes at a price: rules related to overriding/precedence, and inheritance, have to be established.
The document in its:translate="yes". Note that the global rule is
processed first, regardless of its position inside the document. In the main body of the
document, the default applies, and here it is its:translate="no" that is used to
set “faux pas” as non-translatable.
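The interplay of defaults, global rules, local markup and inheritance can be sketched in Python as follows. This is a simplified model, not a conforming processor: the Node class, the element names and the rule format are hypothetical, global rules match on element names only, and the sketch assumes the order "local markup, then global rules, then inherited value, then default".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    local_translate: Optional[str] = None  # value of a local its:translate, if any
    children: List["Node"] = field(default_factory=list)


def compute_translate(node, global_rules, inherited="yes"):
    """Return (element name, translate value) pairs in document order."""
    value = inherited  # the default ("yes") or the value inherited from the parent
    for selector_name, rule_value in global_rules:
        if selector_name == node.name:
            value = rule_value  # a later matching rule overwrites an earlier one
    if node.local_translate is not None:
        value = node.local_translate  # local markup overrides global rules
    results = [(node.name, value)]
    for child in node.children:
        results.extend(compute_translate(child, global_rules, inherited=value))
    return results


# Hypothetical document: the head is globally marked as non-translatable,
# the title overrides this locally, and a span in the body is marked "no".
document = Node("text", children=[
    Node("head", children=[
        Node("author"),
        Node("title", local_translate="yes"),
    ]),
    Node("body", children=[
        Node("p", children=[Node("span", local_translate="no")]),
    ]),
])

values = compute_translate(document, global_rules=[("head", "no")])
```

In this model the author element inherits "no" from the head, the title is translatable because of its local markup, and only the span in the body is excluded from translation.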
For some data categories, special attributes add or point to information about the selected
nodes. For example, the Localization Note data category can
add information to selected nodes (using a
The functionality of adding information to the selected nodes is available for each data category
except Language Information. Pointing to existing
information is not possible for data categories that express
The functionalities of adding information and pointing to existing information are
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be
interpreted as described in
The namespace URI that MUST be used by implementations of this specification is:
The namespace prefix used in this specification for this URI is “its”. It is recommended that implementations of this specification use this prefix.
In addition, the following namespaces are used in this document:
http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here used with the prefix “xs”
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here used with the prefix “rng”
http://www.w3.org/1999/xlink for the XLink namespace, here used with the prefix “xlink”
This specification provides schemas in the format of XML DTD, XML Schema, and RELAX NG. However, these schemas are non-normative; conformance for ITS markup declarations defines only mandatory positions of ITS declarations in schemas. This makes it possible to use ITS with any schema language that allows for using these positions.
For each data category, ITS distinguishes between the following:
The Translate data category conveys information as to whether a piece of content should be translated or not.
The simplest formalization of this prose description on a schema language independent level
is a
Selection relies on the information that is given in
the XML Information Set
The selection of the ITS data categories applies to textual
values contained within element or attribute nodes. In some cases these nodes form pointers
to other resources; a well-known example is the
The attributes
The ITS schemas in
The usage of the term
This specification defines two types of conformance: 1) conformance of ITS markup declarations, and 2) conformance of processing expectations for ITS markup. These conformance types complement each other. An implementation of this specification MAY use them separately or together.
Full implementations of this conformance type will implement all markup declarations for ITS. Statements related to this conformance type MUST list all markup declarations they implement.
Since the ITS markup declarations are schema language independent, each schema language can
use its own, possibly multiple, mechanisms to implement the conformance clauses for ITS
markup declarations. For example, an XML DTD can use parameter entities to encapsulate the
ITS local attributes, or declare them directly for
each element. The appropriate steps to integrate ITS into a schema depend on the design of
this schema (e.g. whether it already has a customization layer that uses parameter
entities). The ITS schemas in the format of XML DTD, XML Schema and RELAX NG in
Application-specific processing (that is processing that goes beyond the computation of ITS information for a node) such as automated filtering of translatable content based on the Translate data category is not covered by the conformance clauses below.
The ITS Working group provides a test suite to help implementers to write applications that support the ITS specifications. The test suite provides pairs of input and output files.
Statements related to this conformance type MUST list all data categories they implement, and for each data category which type of selection they support.
The version of the ITS schema defined in this specification is its:version) MUST be provided at
the root element of the document. If there is both a
Each XML document can have a different version. That is: if external rules are linked via an
XLink
ITS data categories can appear in two places:
The two locations are described in detail below.
Global, rule-based selection is implemented using the
If there is more than one
Depending on the data category and its usage, there are
additional attributes for adding information to the selected nodes, or for pointing to
existing information in the document. For example, the Localization Note data category can be used for adding notes to selected nodes,
or for pointing to existing notes in the document. For the former purpose, a
Each data category allows users to add information to the selected nodes except for language information. Pointing to existing
information is not possible for data categories that express
The functionalities of adding information and pointing to existing information are
Global rules can appear in the XML document they will be applied to, or in a separate XML
document. The precedence of their processing depends on these variations. See also
Markup for global, rule-based selection is defined as follows.
Global rules work in HTML5 as follows.
link element, with the
link relation its-rules.
Using XPath in global rules linked from HTML5 documents does not create an additional burden to implementers. Parsing HTML5 content produces a DOM tree that can be directly queried using XPath, functionality supported by all major browsers.
Local selection in XML documents is realized with local ITS attributes, the
The content model of
The data category determines what is being selected. The necessary data category specific
defaults are described in
By default the content of all elements in a document is translatable. The attribute
its:translate="no" in the its:translate="yes" in the its:translate="no" in
The default directionality of a document is left-to-right. The its:dir="rtl"
in the
Markup for local selection is defined as follows. The attribute group att.local.no-ns.attributes contains ITS
attributes in no namespace and is used with the ITS elements
The its- attributes. The
definition of the two attributes in HTML5 is compatible with the definition for the two data categories Translate and Directionality;
that is, it provides the same values and interpretation.
Rule elements have attributes which contain absolute and
relative selectors. Interpretation of these selectors depends on the actual query language.
The query language is set by
XPath 1.0 is identified by xpath value in
The absolute selector MUST be an XPath expression which
starts with "/". That is, it must be an
AbsoluteLocationPath or union of
AbsoluteLocationPaths as described in XPath 1.0.
This ensures that the selection is not relative to a specific location. The resulting
nodes MUST be either element or attribute nodes.
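A processor might reject selectors that are clearly not absolute before handing them to an XPath engine. The following Python sketch is a heuristic only, not an XPath parser: it splits the expression at top-level "|" operators and checks that every operand starts with "/". The function names are hypothetical.

```python
def split_union(expression):
    """Split an expression at '|' operators outside predicates, parentheses and literals."""
    parts, current = [], []
    depth = 0      # nesting level of [...] and (...)
    quote = None   # the quote character of the literal being scanned, if any
    for char in expression:
        if quote is not None:
            current.append(char)
            if char == quote:
                quote = None
        elif char in "'\"":
            quote = char
            current.append(char)
        elif char in "[(":
            depth += 1
            current.append(char)
        elif char in "])":
            depth -= 1
            current.append(char)
        elif char == "|" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def looks_absolute(expression):
    """Return True if every operand of the top-level union starts with '/'."""
    return all(part.startswith("/") for part in split_union(expression))
```

For example, `looks_absolute("/doc/title | //p")` is true, whereas `looks_absolute("/doc/title | p")` is false because the second operand is relative. The check says nothing about whether the expression is otherwise valid XPath 1.0 or whether it selects element or attribute nodes.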
The context for evaluation of the XPath expression is as follows:
Context node is set to Root Node.
Both context position and context size are 1.
All variables defined by
All functions defined in the XPath Core Function Library are available. It is an error for an expression to include a call to any other function.
The set of namespace declarations is the set in scope on the element which has the
attribute in which the expression occurs. This includes the implicit declaration
of the prefix xml required by the XML
Namespaces Recommendation; the default namespace (as declared by
xmlns) is not part of this set.
The term element from the TEI is in a namespace
http://www.tei-c.org/ns/1.0.
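The practical consequence of excluding the default namespace can be sketched in Python. The sample document and the prefix tei are hypothetical, and the prefix map is passed explicitly here as a stand-in for the namespace declarations in scope on the rule element. Note that ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment that declares the TEI namespace as default namespace.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body><p>A <term>motherboard</term> holds the CPU.</p></body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Stand-in for the namespace declarations in scope on the rule element.
# The prefix "tei" is an arbitrary choice for this sketch.
prefixes = {"tei": TEI_NS}

# A prefixed name matches the term element in the TEI namespace ...
with_prefix = root.findall(".//tei:term", prefixes)
# ... while an unprefixed name only matches elements in no namespace.
without_prefix = root.findall(".//term", prefixes)
```

The prefixed path finds the term element; the unprefixed path finds nothing, even though the document itself uses no prefix.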
The
The relative selector MUST use a RelativeLocationPath as described in XPath 1.0.
The XPath expression is evaluated relative to the nodes selected by the selector
attribute. The following attributes point to existing information:
The context for evaluation of the XPath expression is the same as for the absolute selector, with the following changes:
Nodes selected by the expression in the
The context node comes from the current node list.
The context position comes from the position of the current node in the current node list; the first position is 1.
The context size comes from the size of the current node list.
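The evaluation of a relative selector against each selected node can be sketched in Python as follows. The sample document, the element names and the function name are hypothetical. ElementTree evaluates paths relative to an element, so the absolute selector is written as ".//msg" in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical message catalogue; the first message carries a note.
SAMPLE = """<messages>
  <msg id="m1"><notes>Keep it short.</notes><data>Save</data></msg>
  <msg id="m2"><data>Cancel</data></msg>
</messages>"""


def point_to_information(root, selector, pointer):
    """Evaluate pointer once per selected node, with that node as context node."""
    pairs = []
    for node in root.findall(selector):  # the current node list
        target = node.find(pointer)      # relative path, evaluated from node
        pairs.append((node.get("id"), None if target is None else target.text))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = point_to_information(root, ".//msg", "notes")
```

Each selected node gets its own result: the first message points to its note, and the second has no information to point to.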
CSS Selectors are identified by css value in
An absolute selector MUST be interpreted as a selector as defined in Selectors Level 3. Both simple selectors and groups of selectors can be used.
A relative selector MUST be interpreted as a selector as
defined in Selectors Level 3. The selector is not
evaluated against the complete document tree but only against subtrees rooted at nodes
selected by the selector in the
ITS processors MAY support additional query languages. For each additional query language, the processor MUST define:
Future versions of this specification MAY define additional
query languages. The following query language identifiers are reserved: xpath,
css, xpath2, xpath3, xquery,
xquery3, xslt2, xslt3.
A
Implementation MUST support the
The
The $LCID variable.
In this case, only the msg element with the attribute lcid set
to
In XSLT-based applications, it may make sense to map ITS parameters directly to XSLT parameters. To avoid naming conflicts one can use a prefix with the attribute name's value to distinguish between the ITS parameters and the XSLT parameters.
One way to associate a document with a set of external ITS rules is to use the optional XLink
The rules contained in the referenced document MUST be
processed as if they were at the top of the
The example demonstrates how metadata can be added to ITS rules.
The result of processing the two documents above is the same as processing the following document.
Applications processing global ITS markup MUST recognize the
XLink
External rules may also have links to other external rules. The linking mechanism is recursive, the deepest rules being overridden by the top-most rules, if any.
The following precedence order is defined for selections of ITS information in various positions (the first item in the list has the highest precedence):
Global selections in documents (using a
Inside each
If identical selections are defined in different rules elements within one document, the selection defined by the last takes precedence.
ITS does not define precedence related to rules defined or linked based on non-ITS mechanisms (such as processing instructions for linking rules).
In case of conflicts between global selections via multiple rules elements, the last rule has higher precedence.
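The combined effect of recursive linking and "last rule wins" can be sketched in Python. This is a simplified model under stated assumptions: the file names, the dictionary layout and the function names are hypothetical, each file links to at most one other file, and two selections count as identical only if their selector strings are equal.

```python
def flatten_rules(name, rule_files, _active=()):
    """Return rules in processing order: linked rules first, own rules last."""
    if name in _active:
        raise ValueError(f"circular link to {name!r}")
    entry = rule_files[name]
    ordered = []
    if entry.get("link") is not None:
        # Linked rules are processed as if they were at the top.
        ordered.extend(flatten_rules(entry["link"], rule_files, _active + (name,)))
    ordered.extend(entry["rules"])
    return ordered


def effective_values(ordered_rules):
    """Apply rules in order so that the last rule for a selector wins."""
    values = {}
    for selector, value in ordered_rules:
        values[selector] = value
    return values


# Hypothetical chain: the document links to project rules,
# which in turn link to company-wide rules.
RULE_FILES = {
    "document.xml": {"link": "project-rules.xml", "rules": [("//code", "no")]},
    "project-rules.xml": {"link": "company-rules.xml", "rules": [("//title", "yes")]},
    "company-rules.xml": {"link": None, "rules": [("//title", "no"), ("//code", "yes")]},
}

ordered = flatten_rules("document.xml", RULE_FILES)
values = effective_values(ordered)
```

In this model the company-wide rules, which sit deepest in the chain, are overridden by the project rules for "//title" and by the document's own rules for "//code".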
The precedence order fulfills the same purpose as the built-in template rules of
The two elements
The first rule specifies that
The second rule indicates that when
Some markup schemes provide markup which can be used to express ITS data categories. ITS data
categories can be associated with such existing markup, using the global selection mechanism
described in
Associating existing markup with ITS data categories can be done only if the processing
expectations of the host markup are the same as, or greater than, those of ITS. For example, the
In this example, there is an existing translate attribute in DITA, and it is
associated with the ITS semantics using the its:rules section. Similarly, the DITA
dt and term elements are associated with the ITS Terminology data category.
Global rules can be associated with a given XML document using different means:
By using an
This section will be written in an updated version of this document.
The following table summarizes for each data category which selection, default value, and inheritance and overriding behavior applies.
In this example, the content of all the its:translate="no"
attribute in the
The localization note for the two first its:locNote attribute.
The data categories differ with respect to defaults. This is due to existing standards and practices. It is common practice for example that information about translation refers only to textual content of an element. Thus, the default selection for the Translate data category is the textual content.
The Translate data category expresses information about
whether the content of an element or attribute should be translated or not. The values of
this data category are
The Translate data category can be expressed with global
rules, or locally on an individual element. The information applies to the textual content
of the element,
GLOBAL: The
The
LOCAL: The following local markup is available for the Translate data category:
It is not possible to override the Translate data
category settings of attributes using local markup. This limitation is consistent with
the advised practice of not using translatable attributes. If attributes need to be
translatable (e.g., an HTML alt attribute), then this must be declared
globally.
The local its:translate="no" specifies that the content of
The local translate="no" attribute specifies that the content of
span must not be translated.
Note:
The Localization Note data category is used to communicate notes to localizers about a particular item of content.
This data category can be used for several purposes, including, but not limited to:
enabled in isolation without knowing the gender, number and case of the thing it refers to.)
Two types of informative notes are needed:
Editing tools may offer an easy way to create this type of information. Translation tools can be made to recognize the difference between these two types of localization notes, and present the information to translators in different ways.
The Localization Note data category can be expressed
with global rules, or locally on an individual element. The information applies to the
textual content of the element,
GLOBAL: The
Exactly one of the following:
The
The
The
The
LOCAL: The following local markup is available for the Localization Note data category:
One of the following:
It is generally recommended to avoid using attributes to store text. However, in this specific case, the need to provide the notes without interfering with the structure of the host document outweighs the drawbacks of using an attribute.
The Terminology data category is used to mark terms and optionally associate them with information, such as definitions. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.
Existing terminology standards such as
The Terminology data category can be expressed with global rules, or locally on an individual element. There is no inheritance. The default is that neither elements nor attributes are terms.
GLOBAL: The
Exactly one of the following:
LOCAL: The following local markup is available for the Terminology data category:
The Directionality data category allows the user to
specify the base writing direction of blocks, embeddings and overrides for the Unicode
bidirectional algorithm. It has four values:
ITS defines only the values of the Directionality data category and their inheritance. The behavior of text labeled in this way may vary, according to the implementation. Implementers are encouraged, however, to model the behavior on that described in the CSS 2.1 specification or its successor. In such a case, the effect of the data category's values would correspond to the following CSS rules:
Data category value:
CSS rule:
*[dir="ltr"] { unicode-bidi: embed; direction: ltr}
Data category value:
CSS rule:
*[dir="rtl"] { unicode-bidi: embed; direction: rtl}
Data category value:
CSS rule:
*[dir="lro"] { unicode-bidi: bidi-override; direction: ltr}
Data category value:
CSS rule:
*[dir="rlo"] { unicode-bidi: bidi-override; direction: rtl}
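An implementation that models its behavior on CSS could keep this correspondence in a lookup table. The following Python sketch is one possible shape; the table and function names are hypothetical, and the values are those that appear in the CSS selectors above.

```python
# Declarations taken from the CSS rules listed above.
CSS_FOR_DIRECTIONALITY = {
    "ltr": "unicode-bidi: embed; direction: ltr",
    "rtl": "unicode-bidi: embed; direction: rtl",
    "lro": "unicode-bidi: bidi-override; direction: ltr",
    "rlo": "unicode-bidi: bidi-override; direction: rtl",
}


def css_rule(value):
    """Return a CSS rule for a directionality value; unknown values raise KeyError."""
    declarations = CSS_FOR_DIRECTIONALITY[value]
    return f'*[dir="{value}"] {{ {declarations} }}'


override_rule = css_rule("rlo")
```

Rejecting unknown values with an error, rather than silently falling back to a default, is a design choice of this sketch.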
More information about how to use this data category is provided by
The Directionality data category can be expressed with
global rules, or locally on an individual element. The information applies to the textual
content of the element,
GLOBAL: The
In this document the right-to-left directionality is marked using a
The direction="rtlText" have right-to-left content.
LOCAL: The following local markup is available for the Directionality data category:
On the first its:dir="rtl" attribute indicates a
right-to-left content.
Note:
The Ruby data category is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.
The Ruby data category can be expressed with global rules, or locally. There is no inheritance.
GLOBAL: The
Where legacy formats do not contain ruby markup, it is still possible to associate ruby
text with a specified range of document content using the
LOCAL: In a document, the Ruby data category is realized
with a
All these elements share the attributes of the
The structure of the content model for the
The structure of ruby defined in section 5.4 of
The element xml:lang. The
The following
The Language Information data category only
provides for rules to be expressed at a global level. Locally users are able to use
Applying the Language Information data category
to
The Language Information data category can be
expressed only with global rules. The information applies to the textual content of the
element,
GLOBAL: The
The Elements Within Text data category reveals if and how an element affects the way text content behaves from a linguistic viewpoint. This information is for example relevant to provide basic text segmentation hints for tools such as translation memory systems. The values associated with this data category are:
<strong>Appaloosa horses</strong> have spotted coats.
Palouse horses<fn>A Palouse horse is the same as an Appaloosa.</fn>
have spotted coats.
<li>Palouse horses: <p>They have spotted coats.</p>
<p>They have been bred by the Nez Perce.</p> </li>
The Elements Within Text data category can be expressed with global rules, or locally on an individual element. There is no inheritance. The default is that elements are not within text.
GLOBAL: The
LOCAL: The following local markup is available for the Elements Within Text data category:
The Domain data category is used to identify the domain of content.
This data category addresses various challenges:
meta element. The Domain data category
addresses this by providing a mechanism to point to this information.
The Domain data category can be expressed only with global rules.
The information applies to the textual content of the element,
GLOBAL: The
Although the DC.subject in Web pages or other types
of content.
Values used in the http://example.com/domains/automotive. The
The body
element is in the domain expressed by the HTML meta element with the
name attribute, value DC.Subject. The
meta element.
The body
element is in the domain expressed by associated values. The automotive is available in the source content, and auto is
used within the consumer tool, e.g. a machine translation system.
In source content, if available, it is recommended to use the Dublin Core subject as the
metadata term for domain information. In HTML, this can be achieved via a
meta element with the name="DC.subject" attribute.
In the area of machine translation (e.g. machine translation systems or systems
harvesting content for machine translation training), there is no agreed upon set of
value sets for domain. Nevertheless it is recommended to use a small set of values both
in source content and within consumer tools, to foster interoperability. If larger value
sets are needed (e.g. detailed terms in the law or medical domain), mappings to the
smaller value set needed for interoperability should be provided. An example would be a
domainMapping="'criminal law' law, 'property law' law, 'contract law'
law".
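A consumer tool could read such a mapping roughly as sketched below in Python. The function names are hypothetical, and the parsing is deliberately simple: pairs are separated by commas, each pair consists of a source value and a target value, and quoted values are assumed not to contain commas.

```python
import re

# A token is a single-quoted string, a double-quoted string, or a bare word.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|([^\s,'"]+)""")


def parse_domain_mapping(value):
    """Parse a comma-separated list of 'source target' pairs into a dict."""
    mapping = {}
    for pair in value.split(","):
        tokens = ["".join(groups) for groups in TOKEN.findall(pair)]
        if len(tokens) != 2:
            raise ValueError(f"expected a source and a target value in {pair!r}")
        source, target = tokens
        mapping[source] = target
    return mapping


def map_domains(source_values, mapping):
    """Map source values to consumer values, keeping unmapped values and dropping duplicates."""
    mapped = []
    for value in source_values:
        target = mapping.get(value, value)
        if target not in mapped:
            mapped.append(target)
    return mapped


MAPPING = parse_domain_mapping(
    "'criminal law' law, 'property law' law, 'contract law' law"
)
consumer_domains = map_domains(["criminal law", "contract law", "automotive"], MAPPING)
```

With this mapping, the three detailed legal domains collapse into the single consumer value law, while a value without a mapping, such as automotive, is passed through unchanged.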
It is possible to have more than one domain associated with a piece of content. For example, if the consumer tool is a statistical machine translation engine, it could include corpora from all domains available in the source content in training the machine translation engine.
The consumer machine translation engine might choose to ignore the domain and take a one-size-fits-all approach, or may be selective in which domains to use, based on the range of content marked with domain. For example, if the content has hundreds of sentences marked with domain 'automotive' and 'medical', but only a couple of sentences marked with additional domains 'criminal law' and 'property law', the consumer tool may opt to include its domains 'auto' and 'medicine', but not 'law', since the improvement in the output does not justify the extra training resources.
The Disambiguation data category is used to communicate the mentions of specific concepts that may require special handling in the localization of the document.
This data category can be used for several purposes, including, but not limited to:
We introduce the following concepts:
"City" in "I am going to the City" may be disambiguated as one of the WordNet synsets that can be represented by
"city", an RDF ontology concept of a City that could represent a subclass of a PopulatedPlace, or the center area of a particular city, e.g. London City.
Two types of Disambiguation data categories are needed to identify:
Text analysis engines, such as named entity recognizers and named entity, concept, and word sense disambiguators, can offer an easy way to create this information. Content management tools can present and visualize this information or use it to index their content. Machine translation systems may use it for training and translation when dealing with proper names and edge cases.
The Disambiguation data category can be expressed with global rules, or locally on an individual element. The information applies to the textual content of the element. There is no inheritance. The entity type follows inheritance rules.
GLOBAL: The
LOCAL: The following local markup is available for the Disambiguation data category:
While the
The distinction between disambiguating word sense and entities is mainly in the different semantics: whereas word sense disambiguation targets literal words and their senses on the lexical level, entity disambiguation targets real-world concepts that are behind the selected phrases on the conceptual level.
When serializing the ITS markup in HTML5, the preferred way is to serialize in RDFa Lite or Microdata due to the existing search and crawling infrastructure that is able to consume this kind of data.
See
Companion document, having the mapping data for
The Locale Filter data category specifies that a node is only applicable to certain locales.
This data category can be used for several purposes, including, but not limited to:
The Locale Filter data category associates with each
selected node a list of extended language ranges conforming to
To express that all locales should be included, one can use the wildcard
The Locale Filter data category can be expressed with
global rules, or locally on an individual element. The information applies to the textual
content of the element,
Implementations MUST NOT combine lists of language ranges from multiple rules or local attributes.
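To illustrate how a single list of language ranges might be evaluated, the Python sketch below implements the extended filtering algorithm of RFC 4647 (section 3.3.2), in which "*" acts as a wildcard. The function names are hypothetical, and the comma-separated list syntax is an assumption. In line with the requirement above, the sketch evaluates exactly one list and never merges several.

```python
def extended_match(language_range, tag):
    """Extended filtering of a language tag against one extended language range."""
    range_parts = language_range.strip().lower().split("-")
    tag_parts = tag.strip().lower().split("-")

    # The first subtags must match, unless the range starts with the wildcard.
    if range_parts[0] != "*" and range_parts[0] != tag_parts[0]:
        return False

    r, t = 1, 1
    while r < len(range_parts):
        if range_parts[r] == "*":
            r += 1                      # a wildcard matches any run of subtags
        elif t >= len(tag_parts):
            return False                # the tag ran out of subtags
        elif range_parts[r] == tag_parts[t]:
            r += 1
            t += 1
        elif len(tag_parts[t]) == 1:
            return False                # singletons must not be skipped
        else:
            t += 1                      # skip a non-matching tag subtag
    return True


def node_applies(locale_filter_list, locale):
    """True if the locale matches any range in one comma-separated list (assumed syntax)."""
    ranges = [p.strip() for p in locale_filter_list.split(",") if p.strip()]
    return any(extended_match(rng, locale) for rng in ranges)
```

For example, the range "de-*-DE" matches both "de-DE" and "de-Latn-DE", but not "de-x-DE", because the private-use singleton "x" cannot be skipped.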
GLOBAL: The
The
The
LOCAL: The following local markup is available for the Locale Filter data category:
The Provenance data category will be defined in an updated version of this document. For details of the proposed data category, see the ITS 2.0 Requirements document.
The TextAnalysisAnnotation data category will be defined in an updated version of this document. For details of the proposed data category, see the ITS 2.0 Requirements document.
The External Resource data category indicates that a node represents or references potentially translatable data in a resource outside the document. Examples of such resources are external images and audio or video files.
The External Resource data category can be expressed only with global rules. There is no inheritance. There is no default.
GLOBAL: The
The imagedata,
audiodata and videodata elements contain references to
external resources. These references are expressed via a fileref attribute.
The
video elements
The two src and the
poster attributes at HTML5 video elements. These
attributes identify different external resources, and at the same time contain the
references to these resources. For this reason, the
src and poster respectively. The underlying HTML5 document
is given in
Some formats, such as those designed for localization or for multilingual resources, hold the same content in different languages inside a single document. The Target Pointer data category is used to associate the node of a given source content (i.e. the content to be translated) and the node of its corresponding target content (i.e. the source content translated into a given target language).
This specification makes no provision regarding the presence of the target nodes or their content: A target node may or may not exist and it may or may not have content.
This data category can be used for several purposes, including but not limited to:
Extract the source content to translate and put back the translation at its proper location.
Compare source and target content for quality verification.
Re-use existing translations when localizing the new version of an existing document.
Access aligned bilingual content to build translation memories, or to train machine translation engines.
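The uses listed above all rely on pairing each source node with its target node. The Python sketch below shows one way such pairs could be collected. The document format (entry, source and target elements) and the function name aligned_pairs are hypothetical and serve only as an illustration. A missing target node is reported as None and an empty one as an empty string, since a target node may or may not exist and may or may not have content.

```python
import xml.etree.ElementTree as ET

# Hypothetical bilingual format, used only for this illustration.
SAMPLE = """<file>
  <entry id="1"><source>Hello</source><target>Bonjour</target></entry>
  <entry id="2"><source>Goodbye</source><target/></entry>
  <entry id="3"><source>Thanks</source></entry>
</file>"""


def aligned_pairs(root, entry_path="entry", source_name="source", target_name="target"):
    """Return (source text, target text) pairs for every entry with a source node."""
    pairs = []
    for entry in root.iterfind(entry_path):
        source = entry.find(source_name)
        if source is None:
            continue
        target = entry.find(target_name)
        # None: no target node at all; "": target node present but empty.
        target_text = None if target is None else (target.text or "")
        pairs.append((source.text or "", target_text))
    return pairs
```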
In general, it is recommended to avoid developing formats where the same content is
stored in different languages in the same document, unless for very specific use cases.
See the best practices Working
with multilingual documents
from
The Target Pointer data category can be expressed only with global rules. The information applies to the textual content of the element. There is no inheritance. There is no default.
GLOBAL: The
The source node and the target node may be of different types, but the target node must be able to contain the same content as the source node (e.g. an attribute node cannot be the target node of a source node that is an element with children).
The Id Value data category indicates a value that can be used as unique identifier for a given part of the content.
The recommended way to specify a unique identifier is to use Defining markup for unique identifiers
from
Providing a unique identifier that is maintained in the original document can be used for several purposes, for example:
Allow automated alignment between different versions of the source document, or between source and translated documents.
Improve the confidence in leveraged translation for exact matches.
Provide back-tracking information between displayed text and source material when testing or debugging.
The Id Value data category only provides for rules
to be expressed at a global level. Locally, users are able to use
Applying the Id Value data category to
The Id Value data category can be expressed only with global rules. There is no inheritance. There is no default.
GLOBAL: The
xml:id is present for
the selected node, the value of the xml:id attribute MUST take precedence over the The <text> element is the value of the attribute name of
its parent element.
The <text> and
<desc> are translatable, but they have only one corresponding
identifier, the name attribute in their parent element.
To make sure the identifier is unique for both the content of <text> and
the content of <desc>, the XPath expression concat(../@name,
'_t') gives the identifier "settingsMissing_t" for the content of
<text> and the expression concat(../@name, '_d') gives
the identifier "settingsMissing_d" for the content of <desc>.
When an <res> element, and “retryTip” for
the second <res> element.
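The identifier construction described above can be approximated in Python as follows. The sketch does not evaluate XPath. It reproduces the effect of concat(../@name, '_t') and concat(../@name, '_d') by hand and lets an xml:id attribute on the selected node take precedence. The sample document, its entry element, and the function name id_values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document; only the name attribute on the parent is taken from the text.
SAMPLE = """<doc>
  <entry name="settingsMissing">
    <text>Can't find settings file.</text>
    <desc>Shown when the settings file is absent.</desc>
  </entry>
  <entry name="retry">
    <text xml:id="retryTip">Try again.</text>
  </entry>
</doc>"""


def id_values(root, suffixes, name_attr="name"):
    """Return (identifier, content) pairs for the child elements listed in suffixes."""
    result = []
    for parent in root.iter():
        for child in parent:
            if child.tag not in suffixes:
                continue
            explicit = child.get(XML_ID)
            if explicit is not None:
                identifier = explicit      # xml:id takes precedence
            else:
                # Same effect as concat(../@name, suffix) in the rules above.
                identifier = parent.get(name_attr, "") + suffixes[child.tag]
            result.append((identifier, child.text))
    return result
```

Calling id_values(ET.fromstring(SAMPLE), {"text": "_t", "desc": "_d"}) yields the identifiers "settingsMissing_t", "settingsMissing_d" and "retryTip".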
The Preserve Space data category indicates how whitespace should be handled in content. The possible values for the Preserve Space data category are "default" and "preserve" and carry the same meaning as the corresponding values of the xml:space attribute. The default value is "default".
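A consumer tool might act on the two values roughly as in the Python sketch below. What "default" processing means is up to the application. Collapsing runs of whitespace into a single space is only an assumption made here, and the function name apply_preserve_space is hypothetical.

```python
import re


def apply_preserve_space(text, mode="default"):
    """Return text with whitespace handled according to the Preserve Space value."""
    if mode == "preserve":
        return text                                  # keep whitespace literally
    if mode == "default":
        # Assumed application default: collapse whitespace runs and trim the ends.
        return re.sub(r"\s+", " ", text).strip()
    raise ValueError("unknown Preserve Space value: %r" % (mode,))
```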
The Preserve Space data category can be expressed with
global rules, or locally using the
The Preserve Space data category is not applicable to
HTML5 documents because
GLOBAL: The
The preserveSpaceRule element specifies that whitespace in all verse elements must be treated literally.
LOCAL: The
The standard
The Localization Quality Issue data category is used to express information related to localization quality assessment tasks. Such tasks can be conducted on the translation of some source text into a target language or on the source text itself where its quality may impact on the localization process.
This data category can be used in a number of ways, including the following example scenarios:
An automatic quality checking tool flags a number of potential quality issues in an XML or HTML file and marks them up using ITS 2.0 markup. Other tools in the workflow then examine this markup and decide whether the file needs to be reviewed manually or passed on for further processing without a manual review stage.
A quality assessment process identifies a number of issues and adds the ITS markup to a rendered HTML preview of an XML file along with CSS styling that highlights these issues. The resulting HTML file is then sent back to the translator to assist his or her revision efforts.
A human reviewer working with a web-based tool adds quality markup, including comments and suggestions, to a localized text as part of the review process. A subsequent process examines this markup to ensure that changes were made.
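The first scenario, in which a downstream tool decides whether a file needs a manual review, could be sketched as follows. The QualityIssue fields, the numeric severity scale, and the severity_limit cutoff are all hypothetical. They stand in for whatever information a tool extracts from the markup and are not the data category's definition.

```python
from dataclasses import dataclass


@dataclass
class QualityIssue:
    issue_type: str          # hypothetical label for the kind of issue
    comment: str = ""        # free-text note from the checker or reviewer
    severity: float = 0.0    # assumed scale: higher means more serious


def needs_manual_review(issues, severity_limit=50.0):
    """Route a file to manual review if any flagged issue reaches the limit."""
    return any(issue.severity >= severity_limit for issue in issues)
```

A workflow step would collect the issues found in a file, call needs_manual_review, and pass the file on unreviewed only when the result is False.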
The data category defines four pieces of information:
The Localization Quality Issue data category can be expressed with global rules,
or locally on individual elements. The information applies to the textual content of the element,
GLOBAL: The
A required
At least one of the following:
Exactly one of the following:
A
A
Exactly one of the following:
A
A
Exactly one of the following:
A
A
None or exactly one of the following:
A
A
None or exactly one of the following:
A
A
The attributes
The
The
The
This document is used in
LOCAL: Using the inline markup to represent the data category locally is limited to a single
occurrence for a given content (e.g. one cannot have different
The following local markup is available for the Localization Quality Issue data category:
Either (inline markup):
At least one of the following attributes:
A
A
An optional
An optional
Or (standoff markup):
A
An element <span loc-quality-issues> in HTML) which contains:
One or more elements <span its-loc-quality-issue> in HTML),
each of which contains:
At least one of the following attributes:
A
A
An optional
An optional
Important: When the attributes <span loc-quality-issue>in HTML) where they are declared.
The attributes
In this example several spans of content are associated with a quality issue.
The following example shows a document using local standoff markup to encode several issues.
The mrk element delimits the content to mark up and holds a
The following example shows a document using local standoff markup to encode several issues.
But because, in this case, the mrk element does not allow attributes from another
namespace we cannot use ref attribute of any mrk elements that has its attribute
type set to "x-itslq".
The following example shows a document using local standoff markup to encode several issues.
The span element delimits the content to mark up and holds a span element where the issues are listed within a set of
other special span elements.
The Localization Quality Précis data category is used to express an overall measurement of the localization quality of a document.
This data category makes it possible to specify a quality score for a given document and to indicate what constitutes a passing score. It also makes it possible to point to a profile that describes the quality assessment model used for the scoring.
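The relationship between the score, the passing score and the profile can be pictured with the small Python sketch below. The class name QualityPrecis, its field names, and the convention that a higher score is better are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityPrecis:
    score: float                          # overall quality score of the document
    threshold: Optional[float] = None     # lowest score that counts as passing
    profile_ref: Optional[str] = None     # pointer to the assessment model used

    def passed(self):
        """Return True/False against the threshold, or None if none is given."""
        if self.threshold is None:
            return None
        return self.score >= self.threshold
```

Without a threshold the score alone cannot be interpreted as pass or fail, which is why passed() returns None in that case.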
The Localization Quality Précis data category can be expressed with global rules, or locally
on individual elements. The information applies to the textual content of the element,
GLOBAL: The
A required
Exactly one of the following:
A
A
None or exactly one of the following:
A
A
None or exactly one of the following:
A
A
The attributes
The following example shows how to use the
The following example shows how to use the
The following example shows how to use the
This document is used in
LOCAL: The following local markup is available for the Localization Quality Précis data category:
A
An optional
An optional
The
The
The MT Confidence data category will be defined in an updated version of this document.
The
The values listed in the following table are allowed for other,
which is reserved strictly for values that cannot be mapped to these values.
The following list summarizes elements relating to global rules and their attributes:
Pointer to external rules files.
Type of pointer to external rules files.
Legal values are:
Version of the ITS schema.
The text direction for the selection.
Legal values are:
Absolute selector identifying the nodes to be selected.
Relative selector pointing to a node that contains language information.
Absolute selector identifying the nodes to be selected.
The Translate data category information to be attached to the current node.
Localization note.
The type of localization note.
URI referring to the location of the localization note.
Pointer to a resource containing information about the term.
Indicates a term locally.
The text direction for the context.
Relative selector pointing to a node that holds the localization note.
The type of localization note.
Legal values are:
URI referring to the location of the localization note.
Relative selector pointing to a node that holds the URI referring to the location of the localization note.
Absolute selector identifying the nodes to be selected.
Indicates whether the selection is a term or not.
Legal values are:
URI referring to the resource providing information about the term.
Relative selector pointing to a node containing a URI referring to the resource providing information about the term.
Relative selector expression pointing to a node containing information about the term.
Absolute selector identifying the nodes to be selected.
The Translate data category information to be applied to selected nodes.
Legal values are:
Absolute selector identifying the nodes to be selected.
States whether current context is regarded as "within text".
Legal values are:
Absolute selector identifying the nodes to be selected.
Relative selector
pointing to a node that corresponds to a ruby element
Relative selector
pointing to a node that corresponds to a rt element
Relative selector
pointing to a node that corresponds to a rp element
Absolute selector identifying the nodes to be selected.
The following list summarizes elements that are available for local use:
The following list summarizes attributes that are available for local use, with the local elements mentioned above, or with other elements in a host schema:
The Translate data category information to be attached to the current node.
Localization note.
The type of localization note.
URI referring to the location of the localization note.
Pointer to a resource containing information about the term.
Indicates a term locally.
The text direction for the context.
The following schemas define ITS elements and attributes and could be used as building blocks when you want to integrate ITS markup into your own XML vocabulary. You can see examples of such integration in Best Practices for XML Internationalization. The schemas are not intended to be used alone for validation of documents with ITS markup.
The following schemas are provided:
Several constraints of ITS markup cannot be validated with ITS schemas. The following
The following
[Source file: its.nvdl]
The NVDL schema depends on the following two schemas:
RELAX NG schema for ITS elements
RELAX NG schema for ITS attributes
The following log records major changes that have been made to this document since the ITS 2.0 Working Draft 31 July 2012.
The following log records major changes that have been made to this document since the ITS 2.0 Working Draft 26 June 2012.
The following log records major changes that have been made to this document between the ITS 1.0 Recommendation and this document.
This document has been developed with contributions by the MultilingualWeb-LT Working Group: Mihael Arcan (DERI Galway at the National University of Ireland, Galway, Ireland), Pablo Badía (Linguaserve), Aaron Beaton (Opera Software), Luis Bellido (Universidad Politécnica de Madrid), Aljoscha Burchardt (German Research Center for Artificial Intelligence (DFKI) GmbH), Nicoletta Calzolari (CNR--Consiglio Nazionale delle Ricerche), Giuseppe Deriard (Linguaserve), Pedro Luis Díez Orzas (Linguaserve), David Filip (University of Limerick), Leroy Finn (Trinity College Dublin), Karl Fritsche (Cocomore AG), Daniel Grasmick (Lucy Software and Services GmbH), Declan Groves (Centre for Next Generation Localisation), Moritz Hellwig (Cocomore AG), Tao Hong (Baidu, Inc.), Dominic Jones (Trinity College Dublin), Milan Karásek (Moravia Worldwide), Jirka Kosek (University of Economics, Prague), Michael Kruppa (Cocomore AG), Maxime Lefrançois (Institut National de Recherche en Informatique et en Automatique (INRIA)), David Lewis (Trinity College Dublin), Fredrik Liden (ENLASO Corporation), Arle Lommel (German Research Center for Artificial Intelligence (DFKI) GmbH), Shaun McCance ((public) Invited expert), Jan Nelson (Microsoft Corporation), Des Oates (Adobe Systems Inc.), Carina Pellar (Cocomore AG), Georg Rehm (German Research Center for Artificial Intelligence (DFKI) GmbH), Phil Ritchie (VistaTEC), Thomas Rüdesheim (Lucy Software and Services GmbH), Nieves Sande (German Research Center for Artificial Intelligence (DFKI) GmbH), Felix Sasaki (W3C Staff), Yves Savourel (ENLASO Corporation), Jörg Schütz (W3C Invited Experts), Ankit Srivastava (Centre for Next Generation Localisation), Tadej Štajner (Jozef Stefan Institute), Olaf-Michael Stefanov ((public) Invited expert), Najib Tounsi (Ecole Mohammadia d'Ingenieurs Rabat (EMI)), Ronny Unger (Cocomore AG), Piek Vossen (Vrije Universiteit).