Internationalization Tag Set (ITS)

1 Introduction

This section is informative.

This document defines data categories and their implementation as a schema that can be used with new and existing schemas to support the internationalization and localization of schemas and documents. The implementation is provided for three schema languages: XML DTDs [XML 1.0], XML Schema [XML Schema] and RELAX NG [RELAX NG]. In addition, implementations as fixed modularizations of various existing vocabularies (e.g. XHTML [XHTML 1.0], DocBook [DocBook], Open Document [OpenDocument]) are provided.

Requirements for the internationalization and localization of markup are formulated in [ITS REQ]. This working draft responds to only a part of these requirements. Some of the following items are mentioned in [ITS REQ], but are not covered in this working draft:

These requirements have not been addressed at this point in time since the ITS Working Group expects that it will take a substantial amount of time to address them, but that the framework suggested in this document will accommodate them.

Other requirements will also be addressed in the future in a document on techniques for internationalization and localization of XML schemas and XML instances.

1.1 Background: Motivation for ITS

Content or software that is authored in one language (i.e. source language) is often made available in additional languages. This is done through a process called localization, where the original material is translated and adapted to the target audience.

From the viewpoints of feasibility, cost, and efficiency, it is important that the original material be suitable for localization. This is achieved through proper design and development, and the corresponding process is referred to as internationalization.

The increasing usage of XML as a medium for documentation-related content (e.g. DocBook, a format for writing structured documentation that is well suited to computer hardware and software manuals) and software-related content (e.g. the eXtensible User Interface Language [XUL]) poses growing challenges and offers growing opportunities in the domain of XML internationalization and localization.

Example 1: Document with localizable information

In this example the text in square brackets [...] shows the parts that need to be localized. Without localization-specific information it is difficult for tools to detect that PhaseCode should not be translated, or that the title attribute sometimes needs to be translated and sometimes does not.

<Manual>
 <Info>
  <PhaseCode>Review Level</PhaseCode>
  <FormNo>8U81-GS-52C</FormNo>
  <Name>[Owner's Manual]</Name>
  ...
 </Info>
 <Section id="0" title="#Introduction#">
  <Ltitle id="005" title="#ZOOM#">
   <Mtitle id="00501" title="[Getting started]" option="no" cols="1">
    <MultiCol cols="1">
     <Text>[Some text to localize]</Text>
     ...
    </MultiCol>
   </Mtitle>
  </Ltitle>...
 </Section>
</Manual>

Example 2: Document with localizable information

In the example below, predicting what needs to be translated depends not only on the name of the element but also on an attribute value of its parent.

<dialogue xml:lang="en-gb">
 <rsrc id="123">
  <component id="456" type="image">
   <data type="text">images/cancel.gif</data>
   <data type="coordinates">12,20,50,14</data>
  </component>
  <component id="789" type="caption">
   <data type="text">[Cancel]</data>
   <data type="coordinates">12,34,50,14</data>
  </component>
 </rsrc>
</dialogue>

Example 3: Document with localizable information

In the example below, there is no clear mechanism that indicates which string elements need to be translated.

<resources>
 <section id="Homepage">
  <arguments>
   <string>page</string>
   <string>childlist</string>
  </arguments>
  <variables>
   <string>POLICY</string>
   <string>[Corporate Policy]</string>
  </variables>
  <keyvalue_pairs>
   <string>Page</string>
   <string>[ABC Corporation - Policy Repository]</string>
   <string>Footer_Last</string>
   <string>[Pages]</string>
   <string>bgColor</string>
   <string>NavajoWhite</string>
   <string>title</string>
   <string>[List of Available Policies]</string>
  </keyvalue_pairs>
 </section>
</resources>

1.2 Out of Scope

The data categories and their implementation as a schema do not address document-external mechanisms or data formats for describing localization-relevant information beyond what is appropriate for inclusion in the format itself. Such mechanisms and data formats, sometimes called XML Localization Properties, are out of the scope of this document. However, this document specifies a methodology for applying localization properties and information about internationalization and localization to various places in schemas and instance documents. See Section 3: Scope of ITS information.

1.3 Usage Scenarios

Information which supports internationalization and localization with respect to XML schemas and XML instances may be used in many ways. Example usages (see section 2 in [ITS REQ]) are:

Content authoring
Terminology creation and translation
Software development

The diversity of these usages leads to a great variety of requirements, and of possible ways in which an XML language can support information related to internationalization and localization. The concepts described in this document are meant to provide general answers to these sometimes conflicting requirements.

Example 4: Usage scenarios and possible implementations of ITS data categories: Example translatability

A content author needs a simple way to express whether the content of an element or attribute should be translated or not, e.g. an attribute translate. On the other hand, for translations of large document sets based on the same schema, a specification of defaults for translatability and exceptions from the defaults is of importance, e.g. all p elements should be translated, but not p elements inside of an index element.

This specification responds to this variety by introducing the concept of scope.

1.4 Important Design Decisions

Five design decisions are crucial for the development of ITS: data categories, scoping, extensibility, limited impact and technological viability.

About data categories: ITS defines data categories as a description of information for internationalization and localization of XML schemas and documents. This description is independent of its implementation e.g. via an element or attribute. See Section 2.4: Data category for a definition of the term data category, Section 4: Description of Data Categories for the definition of the various ITS data categories, and Section 6: Markup Declarations for the data category implementations.

About scoping: Content authors need a simple way to express whether the content of an element or attribute should be translated or not, e.g. a translate attribute. On the other hand, for translations of large document sets based on the same schema, a specification of defaults for translatability and exceptions from the defaults is of importance (e.g. all p elements should be translated, but not p elements inside of an index element). This specification responds to these conflicting requirements by introducing a methodology for optionally specifying scoping information, cf. Section 3: Scope of ITS information. The methodology also provides a means for attaching information related to attributes (a task for which no standard means exists yet). The ITS mechanisms for expressing scope need to consider the following:

viable for both XML schemata and XML instances
viable in situ (at the XML node to which it pertains) or dislocated (not at the XML node to which it pertains)

About extensibility: It may be useful or necessary to extend the set of information available for internationalization or localization purposes beyond what is provided by ITS. This specification does not define a general extension mechanism, since ordinary XML mechanisms (e.g. XML Namespaces [XML Names]) may be used.

About limited impact: ITS follows the example of section 4 of [XLink 1.1] by mostly providing global attributes for the implementation of ITS data categories. Avoiding elements for ITS purposes as much as possible ensures limited impact on existing markup schemes, see section 3.14 in [ITS REQ]. Additional child elements are needed only for some requirements, see for example Section 4.5: Ruby.

About technological viability: In order to foster quick adoption, ITS was developed with two important criteria in mind:

No dependence on technologies which are yet to be developed
Fit with existing work in the W3C architecture (e.g. use of XPath [XPath 1.0] for scoping)

1.5 Development of this Specification

This specification has been developed using the ODD (One Document Does it all) language of the Text Encoding Initiative ([TEI]). This is a literate programming language for writing XML schemas, with three characteristics:

The element and attribute set is specified using an XML vocabulary which includes support for macros (like DTD entities, or schema patterns), a hierarchical class system for attributes and elements, and creation of modules.
The content models for elements and attributes are written using embedded Relax NG XML notation.
Documentation for elements, attributes, value lists, etc. is written inline, along with examples and other supporting material.

XSLT transformations are provided by the TEI to extract documentation in HTML, XSL FO or LaTeX form, and to generate Relax NG documents and DTDs. From the Relax NG documents, James Clark's Trang can be used to create XML Schema documents.

2 Notation and Terminology

This section is normative.

2.1 Notation and Terminology

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

2.2 Namespaces used in this Specification

The namespace URI that must be used by implementations of this specification is:

http://www.w3.org/2005/11/its

The namespace prefix used in this specification for this URI is "its".

In addition, the following namespaces are used in this document:

http://www.w3.org/2001/XMLSchema for the XML Schema namespace, here used with the prefix "xs"
http://relaxng.org/ns/structure/1.0 for the RELAX NG namespace, here used with the prefix "rng"

2.3 Schema Language

[Definition: The term schema language refers in this specification to XML DTDs, XML Schema or RELAX NG.]

2.4 Data category

[Definition: ITS defines data category as an abstract concept for a particular type of information for internationalization and localization of XML schemas and documents.] The concept of a data category is independent of its implementation in an XML environment (e.g. via an element or attribute).

For each data category, ITS distinguishes between the following:

the prose description, cf. Section 4: Description of Data Categories
schema language independent formalization, cf. Section 6: Markup Declarations
schema language specific implementation, cf. Appendix A: Schemas for ITS

Example 5: Data categories and their implementation

The data category translatability mainly conveys information about whether a piece of content should be translated or not. The simplest formalization of this prose description on a schema language independent level is a translate attribute with two possible values: yes and no. An implementation on a schema language specific level would be the declaration of the translate attribute in e.g. an XML DTD, an XML Schema document or a RELAX NG document.

An alternative formalization on a schema language independent level is a schemaRule element which conveys via a translate attribute information about translatability. An implementation on a schema language specific level is the declaration of the schemaRule element.

2.5 Scope

[Definition: Scope is a means to describe to which elements and / or attributes an ITS data category and its values should be applied.] Scope is discussed in detail in Section 3: Scope of ITS information.

3 Scope of ITS information

This section is normative.

3.1 Relation between Data Categories and Scope

Scope information is always attached to a single data category. The relation between scope and the various data categories is described in Section 4: Description of Data Categories. The scope information, and the data categories, can be realized in various positions, which are defined below.

Example 6: Example for implementation of scope information about translatability, expressed via a translate attribute

<text its:translate="yes" its:translateScope=".//p">...
<!-- all p elements should be translated, except the following one -->
 <p its:translate="no" its:translateScope="."/>
</text>

3.2 Position of Scope (Where to Express Information about Scope)

Information about scope can appear in three places:

in a schema: ITS information is expressed as schema annotation, and the scope is the element or attribute declaration which is being annotated
dislocated: scope is expressed as an AbsoluteLocationPath as described in [XPath 1.0]
in an instance document: scope is expressed as a RelativeLocationPath or AbbreviatedStep as described in [XPath 1.0]

The various mechanisms to define scope are defined in detail below.

3.2.1 Scope in a Schema

In Schemas, scoping is expressed via schema annotation [Definition: schema annotation is a schema language specific means to provide information about element, attribute, type etc. declarations. This information is not used by the schema processor, but by external, validation-independent applications.]. The scope of the data category depends on the position of the schema annotation. Since schema annotation mechanisms are schema language specific, the following definitions are made:

[Definition: scope for elements in XML Schema is expressed via an xs:appinfo element which is a direct child of the xs:element element and which contains a schemaRule element.]
[Definition: scope for attributes in XML Schema is expressed via an xs:appinfo element which is a direct child of the xs:attribute element and which contains a schemaRule element.]
[Definition: scope for elements in RELAX NG is expressed via a schemaRule element which is a direct child of the rng:element element.]
[Definition: scope for attributes in RELAX NG is expressed via a schemaRule element which is a direct child of the rng:attribute element.]

As for XML DTDs, this specification defines no specific mechanism to express scope within the DTD.

Note: To be able to express scope information for XML DTDs, the mechanisms described in Section 3.2.2: Dislocated Scope can be used.

Example 7: Scope for translatability in an XML Schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule translate="yes"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

Example 8: Scope for translatability in Relax NG

<element name="p">
 <its:schemaRule translate="yes"/> ...
</element>

To group several schemaRule elements, a schemaRules element should be used.

Example 9: Grouping schemaRule elements with a schemaRules element

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRules>
    <its:schemaRule translate="yes"/>
    <its:schemaRule locInfo="This has to be handled carefully"
     locInfoType="alert"/>
   </its:schemaRules>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

Several data categories about the same element or attribute declaration should be expressed on the same schemaRule element.

Example 10: Several data categories at the same element

<its:schemaRule translate="yes" locInfo="This has to be handled carefully"
 locInfoType="alert"/>
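A tool reading such annotations might collect, for each element declaration, the attributes of its schemaRule elements, whether they sit directly in xs:appinfo or are grouped in a schemaRules element. The following Python sketch uses only the standard library's ElementTree. It assumes the standard XML Schema namespace http://www.w3.org/2001/XMLSchema, merges several schemaRule elements of one declaration into a single dictionary, and ignores attribute declarations and RELAX NG schemas; the function name schema_rules is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree path expressions (assumed namespace URIs).
NS = {
    "xs": "http://www.w3.org/2001/XMLSchema",
    "its": "http://www.w3.org/2005/11/its",
}

# schemaRule directly in xs:appinfo, or grouped in a schemaRules element.
RULE_PATHS = (
    "xs:annotation/xs:appinfo/its:schemaRule",
    "xs:annotation/xs:appinfo/its:schemaRules/its:schemaRule",
)


def schema_rules(schema_root):
    """Map each xs:element name to the merged attributes of its schemaRule elements."""
    result = {}
    for decl in schema_root.iter("{%s}element" % NS["xs"]):
        merged = {}
        for path in RULE_PATHS:
            for rule in decl.findall(path, NS):
                merged.update(rule.attrib)
        if merged:
            result[decl.get("name")] = merged
    return result


EXAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:its="http://www.w3.org/2005/11/its">
 <xs:element name="p">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRules>
     <its:schemaRule translate="yes"/>
     <its:schemaRule locInfo="This has to be handled carefully" locInfoType="alert"/>
    </its:schemaRules>
   </xs:appinfo>
  </xs:annotation>
 </xs:element>
</xs:schema>"""

print(schema_rules(ET.fromstring(EXAMPLE)))
```

Merging into one dictionary mirrors the recommendation above that several data categories for one declaration belong together; a real tool would also need to decide what to do when two schemaRule elements set the same attribute.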

3.2.2 Dislocated Scope

Dislocated scope information is expressed via a documentRules element. It contains one or more documentRule elements. Each documentRule element has one or more attributes which express data categories, and for each data category attribute an attribute which expresses scope.

The naming convention for the attribute for scope is datacategory + Scope, e.g. translateScope. For dislocated scope, the value of the attribute must be an XPath expression. It must start with "/", that is, it must be an AbsoluteLocationPath as described in [XPath 1.0]. Only in this way is it ensured that the scope information can be applied in a dislocated way.
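Both rules, a Scope companion for every data category attribute and an absolute location path as its value, can be checked mechanically. A minimal Python sketch, assuming the attribute names translate, locInfo, term and dir from Section 4 (Ruby and the auxiliary attributes locInfoType, termRef and bdo are left out) and checking only that the value starts with "/" rather than parsing it as XPath; check_document_rule is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed set of data category attributes that need a "<name>Scope" companion.
# Auxiliary attributes (locInfoType, termRef, bdo) and Ruby are left out.
DATA_CATEGORY_ATTRS = ("translate", "locInfo", "term", "dir")


def check_document_rule(rule):
    """Return a list of problems found on one documentRule element.

    Only a syntactic check: the scope value must start with "/";
    it is not parsed as XPath.
    """
    problems = []
    for name in DATA_CATEGORY_ATTRS:
        if name not in rule.attrib:
            continue
        scope = rule.get(name + "Scope")
        if scope is None:
            problems.append(f"{name}: missing {name}Scope")
        elif not scope.startswith("/"):
            problems.append(f"{name}Scope: not an absolute location path: {scope!r}")
    return problems


rule = ET.fromstring('<documentRule translate="no" translateScope="//p[@editor=\'john\']"/>')
print(check_document_rule(rule))  # an empty list: the rule is well formed
```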

Dislocated scope information can appear in a schema (e.g. as content of the xs:appinfo element), in an instance file or in a separate XML document. The precedence of the processing of the scope information depends on these variations. See also Section 3.3: Processing of Scope Information.

Note: The difference between schemaRule and documentRule is that schemaRule has no attributes for scope, e.g. no translateScope attribute. The reason is that schemaRule always refers to the element or attribute declaration of which it is part. In contrast, documentRule can be used anywhere in a schema for dislocated scope information. It is possible to use schemaRule and documentRule together in a schema.

Example 11: Example for using schemaRule and documentRule together in a schema.

<xs:schema>
 <xs:annotation>
  <xs:appinfo>
   <its:documentRules>
    <its:documentRule translate="no" translateScope="//p[@editor='john']"/>
    <!-- This rule holds for p elements which are edited by John. -->
   </its:documentRules>
  </xs:appinfo>
 </xs:annotation>
 <xs:element name="p">
  <xs:annotation>
   <xs:appinfo>
    <its:schemaRule translate="yes"/>
    <!-- This rule holds for all p elements -->
   </xs:appinfo>
  </xs:annotation> ...
 </xs:element> ...
</xs:schema>

3.2.3 Scope in an Instance Document

In instance documents scope is expressed via a combination of an attribute which expresses the data category and a scope attribute for the data category. The naming convention for the attribute for scope is datacategory + Scope, e.g. translateScope. This is identical to Section 3.2.2: Dislocated Scope.

Example 12: scope for the content of an element and all attributes attached to the element.

<meta its:translate="yes" its:translateScope=". | @*"/>

Scope in an instance document must either be expressed via an AbbreviatedStep "." or be a RelativeLocationPath as described in [XPath 1.0]. If scope is the AbbreviatedStep ".", its evaluation must always be interpreted as the textual value of the current node, that is, the textual value of the element to which the scope attribute is attached. For example, <p its:dir="ltr" its:dirScope="."> ... </p> is used to select the textual content of a p element.

If child elements should be part of the scope, an XPath step expression like descendant-or-self::* should be used.

Example 13: Scope for the content of an element, including all descendant elements.

<p its:translate="no" its:translateScope="descendant-or-self::*"> ... </p>

To avoid mismatches between the multiple scope attributes, only the following axes should be used in the XPath expression: child, descendant, attribute, descendant-or-self.

Example 14: Scope with various axes

<text its:translate="yes"
 its:translateScope="child::body/descendant::p">
 <body its:translate="no" its:translateScope="descendant::p/attribute::id"> ... 
  <p its:translate="no" its:translateScope="descendant-or-self::*"> ... </p>
 </body>
</text>

Note: The following XML Schema datatype can be used to verify that only these axes are used at the beginning of the XPath expression:

Example 15: XML Schema datatype used to restrict XPath expressions

<xs:simpleType name="scopeInline">
 <xs:restriction base="xs:string">
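  <!-- Several pattern facets in one restriction act as alternatives;
       XML Schema patterns are anchored and whitespace in them is literal -->
  <xs:pattern value="child::.+"/>
  <xs:pattern value="descendant::.+"/>
  <xs:pattern value="descendant-or-self::.+"/>
  <xs:pattern value="\.//.+"/>
  <xs:pattern value="attribute::.+"/>
  <xs:pattern value="@.+"/>
  <xs:pattern value="name\(\)=.+"/>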
 </xs:restriction>
</xs:simpleType>
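The alternatives of this datatype can be tried out without a schema validator by expressing them as a Python regular expression. This is a sketch, written for the intended alternatives (attribute:: followed directly by the rest of the path); it relies on the constructs used here (literal text, ".", "+", escaped parentheses, alternation) behaving the same in Python's re module as in XML Schema, and uses re.fullmatch because XML Schema patterns are implicitly anchored. The name is_inline_scope is hypothetical.

```python
import re

# The scopeInline alternatives as one Python pattern.
# XML Schema patterns are implicitly anchored, hence re.fullmatch below.
SCOPE_INLINE = re.compile(
    r"child::.+"
    r"|descendant::.+"
    r"|descendant-or-self::.+"
    r"|\.//.+"
    r"|attribute::.+"
    r"|@.+"
    r"|name\(\)=.+"
)


def is_inline_scope(expr: str) -> bool:
    """True if expr starts in one of the ways allowed by the scopeInline datatype."""
    return SCOPE_INLINE.fullmatch(expr) is not None


for expr in ("child::body/descendant::p", "descendant::p/attribute::id", "//p"):
    print(expr, is_inline_scope(expr))
```

Like the datatype itself, this check rejects the bare AbbreviatedStep "." and unions such as ". | @*", even though Section 3.2.3 allows "." as scope in an instance document.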

3.3 Processing of Scope Information

3.3.1 Precedence between Scope Information

The following precedence order is defined for scope information:

Scope information in instance documents (in situ, realized with scope attributes or the default scope rules in Section 4: Description of Data Categories) has precedence over scope information in instance documents (dislocated, using documentRule)
Scope information in instance documents (dislocated, using documentRule) has precedence over scope information in an external file (using documentRule)
Scope information in an external file (using documentRule) has precedence over scope information in a schema
In a schema, dislocated scope information expressed via documentRule has precedence over data categories expressed via schemaRule (See also the note in Section 3.2.2: Dislocated Scope).

Note: These precedence rules fulfill the same purpose as the built-in template rules of [XSLT 1.0].

Example 16: Conflicts between scope information which are resolved via the precedence order

Due to the rules described above, the translatability information expressed via the translate attribute on the p element takes precedence over the translatability information on the documentRule element.

<text>
 <head>
  <its:documentRule translate="yes" translateScope="//p"/>
 </head>
 <body> ...
  <p its:translate="no"> ... </p>
 </body>
</text>
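The precedence order can be modelled as a ranking of the places where a value was found. A hypothetical Python sketch, assuming each candidate value has already been located and tagged with its source; it does not evaluate XPath, and it keeps the first of several candidates from the same source, an arbitrary choice that the list above does not settle.

```python
from enum import IntEnum


class Source(IntEnum):
    """Where a piece of scope information was found; higher value wins (Section 3.3.1)."""
    SCHEMA_RULE = 0             # schemaRule in a schema
    SCHEMA_DOCUMENT_RULE = 1    # documentRule in a schema
    EXTERNAL_DOCUMENT_RULE = 2  # documentRule in a separate file
    INSTANCE_DOCUMENT_RULE = 3  # documentRule in the instance (dislocated)
    INSTANCE_IN_SITU = 4        # scope attributes or default scope in the instance


def resolve(candidates):
    """Pick the value from the highest-precedence source.

    candidates: iterable of (Source, value) pairs that all apply to the same node.
    Returns None if there are no candidates. Among equal sources the first
    candidate is kept.
    """
    best = None
    for source, value in candidates:
        if best is None or source > best[0]:
            best = (source, value)
    return None if best is None else best[1]


# Example 16: the documentRule says "yes", the p element itself says "no".
print(resolve([(Source.INSTANCE_DOCUMENT_RULE, "yes"), (Source.INSTANCE_IN_SITU, "no")]))
```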

3.3.2 Default Scope

The default scope differs from one data category to another, see the table in Section 4: Description of Data Categories. For many data categories, it is the textual content of an element and all its child elements. This differs from, for example, xml:lang, whose scope is intended to include all attributes as well.

For translatability, the default scope for elements may be reset by attaching a scope attribute with the XPath expression descendant-or-self::* to the element in question. For attributes, the expression is descendant-or-self::*/attribute::*.

Example 17: Reset the default scope for elements for translatability

<p its:translate="yes" its:translateScope="descendant-or-self::*"> ... </p>
<p its:translate="no" its:translateScope="descendant-or-self::*/attribute::*"> ... </p>

3.3.3 Conflict between In Situ Scope Information

It is possible that the resolution of scope information leads to contradictions, for example if the default scope of translatability should be reset for elements and for attributes on the same element. Such conflicts also occur if different information about the same data category should be expressed for different attributes of the same element.

Example 18: Reset the default scope for elements for translatability.

<p its:translate="yes" its:translateScope="descendant-or-self::*"> ... </p>

Example 19: Reset the default scope for attributes for translatability. This is not possible at the same element:

<p its:translate="no" its:translateScope="descendant-or-self::*/attribute::*"> ... </p>

Example 20: Make an exception for the title attribute. This is not possible at the same element:

<p its:translate="yes" its:translateScope="attribute::title"> ... </p>

Such conflicts should be resolved via scope information which is attached to a different element node and evaluates to the node in question. To avoid mismatches with other descriptions of scope information, this should be the only case where axes other than child, descendant, attribute, descendant-or-self are used for in situ scope.

Example 21: Resolving the conflict between in situ information via scope information at different elements

<body its:translate="no" its:translateScope="child::p[1]/attribute::*">
 <p its:translate="yes" its:translateScope="descendant-or-self::*"
  title="This should be translated"/>
 <p its:translate="yes" its:translateScope="preceding-sibling::p/attribute::title"/>
</body>

3.4 Mapping In Situ Scope to Dislocated Scope

The in situ description of scope and the dislocated description are just positional variants. All apply to instance documents. It must be possible to convert the in situ descriptions to a dislocated description. This conversion can only be executed in a generic way if the in situ scope attributes contain only relative path expressions.

Example 22: Conversion between in situ descriptions and dislocated descriptions of scope

<body its:translate="no" its:translateScope="child::p[1]/attribute::*">
 <p its:translate="yes" its:translateScope="descendant-or-self::*"/>
 <p its:translate="yes" its:translateScope="preceding-sibling::p/attribute::title"/>
</body>
<its:documentRules>
 <its:documentRule translate="no" translateScope="/body/child::p[1]/attribute::*"/>
 <its:documentRule translate="yes"
  translateScope="/body/child::p[1]/descendant-or-self::*"/>
 <its:documentRule translate="yes"
  translateScope="/body/child::p[2]/preceding-sibling::p/attribute::title"/>
</its:documentRules>
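Because the conversion only prefixes the relative path with the location of the element that carries it, it can be sketched in a few lines. The Python below is a hypothetical illustration: it builds positional paths of the form /body/child::p[1], assumes elements without namespaces, and assumes each relative scope is a single location path (a union such as ". | @*" would need each branch prefixed separately).

```python
import xml.etree.ElementTree as ET


def absolute_path(root, target):
    """Build a positional path such as /body/child::p[2] from root to target.

    Assumes un-namespaced element names.
    """
    def walk(elem, path):
        if elem is target:
            return path
        seen = {}  # how many children of each name were seen so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            found = walk(child, f"{path}/child::{child.tag}[{seen[child.tag]}]")
            if found is not None:
                return found
        return None

    path = walk(root, "/" + root.tag)
    if path is None:
        raise ValueError("target is not inside root")
    return path


def to_dislocated(root, elem, relative_scope):
    """Turn an in situ scope on elem into an absolute scope for a documentRule."""
    base = absolute_path(root, elem)
    return base if relative_scope == "." else f"{base}/{relative_scope}"


body = ET.fromstring("<body><p/><p/></body>")
first_p, second_p = list(body)
print(to_dislocated(body, second_p, "preceding-sibling::p/attribute::title"))
```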

3.5 Scope and XPath

When XPath 1.0 or 2.0 is used as part of XSLT, transforming the document might lead to the loss of ITS scope information. This specification leaves it to applications of ITS to decide what should happen in such cases, since it does not mandate XSLT, XQuery or any other language which encompasses XPath.

4 Description of Data Categories

This section is normative.

The following table summarizes the relations between data categories and scope.

Data category	Applicable in schema	Applicable dislocated	Default scope in instance document
Translatability	+	+	Textual content of element, including child elements, but excluding attributes
Localization information	+	+	Textual content of element, including child elements, but excluding attributes
Terminology	+	+	Textual content of element, without attributes and child elements
Directionality	-	+	Textual content of element, including attributes and child elements
Ruby	-	+	Textual content of element, without attributes and child elements

Note: The data categories differ with respect to defaults in the instance document for compatibility reasons with existing standards and practices. For example, the dir attribute in [HTML 4.01] refers to the content of the element and all attributes and child elements. Hence, the data category of directionality has the same scope. On the other hand, it is common practice that information about translatability refers only to the textual content of an element. Hence, the data category of translatability has this kind of scope.
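The default scopes in the table can be read as two switches per data category: whether text inside child elements is included, and whether attribute values are. The following Python sketch, a hypothetical helper built on ElementTree, collects the text that a value on one element would cover by default. It assumes that "attributes" means attribute values, and that where child elements are included their attributes are included under the same rule.

```python
import xml.etree.ElementTree as ET

# (include child elements, include attributes), following the table above.
DEFAULT_SCOPE = {
    "translatability": (True, False),
    "localization information": (True, False),
    "terminology": (False, False),
    "directionality": (True, True),
    "ruby": (False, False),
}


def default_scope_text(elem, category):
    """Return the text pieces covered by a value set on elem, in document order."""
    include_children, include_attributes = DEFAULT_SCOPE[category]
    parts = []
    if include_attributes:
        parts.extend(elem.attrib.values())
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if include_children:
            parts.extend(default_scope_text(child, category))
        # Text after a child element still belongs to elem itself.
        if child.tail:
            parts.append(child.tail)
    return parts


p = ET.fromstring('<p title="Tip">Use the <b>Cancel</b> button.</p>')
print(default_scope_text(p, "translatability"))
```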

4.1 Translatability

4.1.1 Definition

[Definition: The data category translatability expresses information about whether the content of an element or attribute should be translated or not.] The values of this data category are yes (translatable) or no (not translatable).

Note: This definition of translatability is identical to the definition in [Dita 1.0]. The implementation of this data category differs from [Dita 1.0], since ITS allows for expressing scope information.

4.1.2 Implementation

Translatability can be expressed in a schema, dislocated or in an instance document.

In a schema, translatability is expressed via a schemaRule element with a translate attribute. The attribute has the values yes or no.

Example 23: Translatability expressed in a schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule translate="yes"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

Dislocated, translatability is expressed via a documentRule element with a translate attribute. The attribute has the values yes or no. In addition, a translateScope attribute is required.

Example 24: Translatability expressed dislocated

<its:documentRules>
 <its:documentRule translate="yes" translateScope="//p"/>
<!-- All p elements should be translated-->
</its:documentRules>

In an instance document, translatability is expressed via a translate attribute with the values yes or no. If no translateScope attribute is present, the scope is the textual content of the element, including child elements, but excluding attributes. If a translateScope attribute is present, the scope is defined by the value of this attribute which is an XPath expression.

Example 25: Translatability expressed in an instance document

In the body element, all elements and attributes except id attributes should be translated. The content of the specified quote element, however, must not be translated.

<book>
 <head>...</head>
 <body its:translate="yes" its:translateScope=".//* | .//@*[not(name()='id')]"> ...
  <p>And he said: you need a new <quote its:translate="no">motherboard</quote>
  </p> ... 
 </body>
</book>
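When no translateScope attribute is present, an in situ translate value applies to the element and, through the default scope, to its descendants until another translate value overrides it. A minimal Python sketch of that inheritance, assuming "yes" as the starting value for content with no translatability information at all (this section does not define one) and ignoring translateScope, dislocated rules and attributes; the function name translatability is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{%s}translate" % ITS_NS  # its:translate in ElementTree's notation


def translatability(elem, inherited="yes"):
    """Return (tag, value) pairs in document order.

    inherited is an assumed starting value; translateScope is not handled.
    """
    value = elem.get(TRANSLATE, inherited)
    result = [(elem.tag, value)]
    for child in elem:
        result.extend(translatability(child, value))
    return result


doc = ET.fromstring(
    '<body xmlns:its="http://www.w3.org/2005/11/its">'
    "<p>And he said: you need a new "
    '<quote its:translate="no">motherboard</quote></p>'
    "</body>"
)
print(translatability(doc))
```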

4.2 Localization Information

4.2.1 Definition

[Definition: The data category localization information is used to communicate information to localizers about a particular item of content.]

This data category has several purposes:

Tell the translator how to translate parts of the content
Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used on the user interface
Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.)
Indicate why a piece of text is emphasized (important, sarcastic, etc.)

Two types of informative notes are needed:

An alert contains information that the translator must read before translating a piece of text. Example: an instruction to the translator to leave parts of the text in the source language.
A description provides useful background information that the translator will refer to only if they wish. Example: a clarification of ambiguity in the source text.

4.2.2 Implementation

Localization information can be expressed in a schema, dislocated or in an instance document.

In a schema, localization information is expressed via a schemaRule element with a locInfo attribute. The type of the localization information is expressed via a locInfoType attribute with the values alert or description.

Example 26: Localization information expressed in a schema

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule locInfo="This has to be handled carefully" locInfoType="alert"/>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

Dislocated, localization information is expressed via a documentRule element with the attributes locInfo and locInfoType. In addition, a locInfoScope attribute is required.

Example 27: Localization information expressed dislocated

<its:documentRules>
 <its:documentRule locInfo="This p element has to be handled carefully"
  locInfoType="alert" locInfoScope="/body/p[1]"/>
</its:documentRules>

In an instance document, localization information is expressed via the attributes locInfo and locInfoType. If no locInfoScope attribute is present, the scope is the textual content of the element, including child elements, but excluding attributes. If a locInfoScope attribute is present, the scope is defined by the value of this attribute which is an XPath expression.

Example 28: Localization information expressed in an instance document

<book>
 <head>...</head>
 <body its:locInfo="Just translate all p elements." its:locInfoType="alert"
  its:locInfoScope=".//p"> ...
  <p its:locInfo="This p element has to be handled carefully"
   its:locInfoType="alert">And he said: you need a new
   <quote>motherboard</quote>
  </p> ...
 </body>
</book>

4.3 Terminology

4.3.1 Definition

The terminology data category is used to mark terms. This helps to increase consistency across different parts of the documentation. It is also helpful for translation.

4.3.2 Implementation

The terminology data category can be expressed in a schema, dislocated or in an instance document.

In a schema, the terminology data category is expressed via a schemaRule element with a term attribute, which has the value yes.

Example 29: The terminology data category expressed in a schema

<xs:element name="span">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRule term="yes"/>
<!-- All span elements are used to mark up terms-->
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

Dislocated, the terminology data category is expressed via a documentRule element with the term attribute, which has the value yes. A termScope attribute is required. In addition, an optional termRef attribute can be used to refer to external information about the term. The datatype of termRef is xs:anyURI.

Example 30: The terminology data category expressed dislocated

<its:documentRules>
 <its:documentRule term="yes" termScope="/body/p[1]/span"
  termRef="http://example.com/termdatabase/#x142539"/>
</its:documentRules>
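A processor applying the rule in Example 30 has to evaluate termScope against the document and attach termRef to each selected node. The Python sketch below shows this under stated assumptions: the rules are parsed as a separate fragment with the its: namespace declared, the instance document is a hypothetical one whose root element is body (so that /body/p[1]/span selects something), and only simple absolute location paths are supported, since xml.etree.ElementTree implements just a subset of XPath 1.0. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # namespace URI from Example 36

# The rules of Example 30, with the its: namespace declared.
RULES = """<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">
 <its:documentRule term="yes" termScope="/body/p[1]/span"
  termRef="http://example.com/termdatabase/#x142539"/>
</its:documentRules>"""

# Hypothetical instance document whose root is body, so the scope matches.
DOC = """<body>
 <p>Replace the <span>motherboard</span> first.</p>
 <p>Then restart the <span>server</span>.</p>
</body>"""


def select(root, xpath):
    """Restricted absolute XPath '/root/child[n]/...' (assumption of this sketch)."""
    first, _, rest = xpath.lstrip("/").partition("/")
    if first != root.tag:
        return []
    return root.findall("./" + rest) if rest else [root]


def dislocated_terms(rules, doc):
    """Return (term text, termRef) for every node selected by a term rule."""
    found = []
    for rule in rules.iter("{%s}documentRule" % ITS_NS):
        if rule.get("term") != "yes":
            continue
        # termScope is required on a dislocated terminology rule.
        for el in select(doc, rule.get("termScope")):
            found.append((el.text, rule.get("termRef")))
    return found


print(dislocated_terms(ET.fromstring(RULES), ET.fromstring(DOC)))
```

Only the span in the first p is selected; the span in the second p is outside the scope of the rule.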

In an instance document, the terminology data category is expressed via a term attribute with the value yes and an optional termRef attribute. If no termScope attribute is present, the scope is the textual content of the element, excluding child elements and attributes. If a termScope attribute is present, the scope is defined by its value, which is an XPath expression.

Example 31: The terminology data category expressed in an instance document

<book>
 <head>...</head>
 <body> ... 
  <p>And he said: you need a new <quote its:term="yes">motherboard</quote></p> ...
 </body>
</book>
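Terms marked in situ, as in Example 31, can be collected with a short Python sketch. Following the scope described above, which excludes child elements, it reads only the text nodes directly inside each marked element. The namespace declaration (URI taken from Example 36) and the function names own_text and marked_terms are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # namespace URI from Example 36

# Example 31, with the its: namespace declared.
DOC = """<book xmlns:its="http://www.w3.org/2005/11/its">
 <head/>
 <body>
  <p>And he said: you need a new <quote its:term="yes">motherboard</quote></p>
 </body>
</book>"""


def own_text(el):
    """Text nodes directly inside el, skipping the content of child elements."""
    parts = [el.text or ""]
    parts += [child.tail or "" for child in el]  # text following each child
    return "".join(parts)


def marked_terms(root):
    """Return the own text of every element carrying its:term="yes"."""
    return [own_text(el) for el in root.iter() if el.get(ITS + "term") == "yes"]


print(marked_terms(ET.fromstring(DOC)))  # ['motherboard']
```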

4.4 Directionality

4.4.1 Definition

This data category expresses the directionality of a piece of text. Its values are ltr (left-to-right) and rtl (right-to-left). This definition is consistent with the dir attribute in [HTML 4.01], except that [HTML 4.01] does not allow for scoping.

In addition, a bdo attribute with the value yes can be supplied. It serves the same purpose as the bdo element in [HTML 4.01], which overrides the bidirectional algorithm for its content.

4.4.2 Implementation

The dir attribute is used to implement the directionality data category. Its value is either ltr or rtl. An optional bdo attribute with the value yes can also be provided.

Directionality can be expressed dislocated or in an instance document.

Dislocated, directionality is expressed via a documentRule element with a dir attribute, whose value is ltr or rtl, and an optional bdo attribute with the value yes. In addition, a dirScope attribute is required.

Example 32: Directionality expressed dislocated

<its:documentRules>
 <its:documentRule dir="rtl" dirScope="/body/p[1]/quote[@xml:lang='he']"/>
 <!-- Some Hebrew quotation -->
</its:documentRules>

In an instance document, directionality is expressed via a dir attribute, whose value is ltr or rtl, and an optional bdo attribute with the value yes. If no dirScope attribute is present, the scope is the textual content of the element, including all child elements and attributes. If a dirScope attribute is present, the scope is defined by its value, which is an XPath expression.

Example 33: Directionality expressed in an instance document

<book>
 <head>...</head>
 <body> ...
  <p>And he said: <quote its:dir="rtl"> ... a Hebrew quotation ... </quote></p> ...
 </body>
</book>
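Because the default scope of dir includes all child elements, the effective direction of any element can be computed by inheritance from the nearest ancestor-or-self element that carries its:dir. The Python sketch below does this for a variant of Example 33 in which a hypothetical em element has been added inside the quotation. Treating unmarked content as ltr, ignoring bdo, the namespace declaration (URI from Example 36), and the function name effective_dir are assumptions.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # namespace URI from Example 36

# Example 33, with the its: namespace declared and a hypothetical em
# element added inside the quotation to show inheritance.
DOC = """<book xmlns:its="http://www.w3.org/2005/11/its">
 <head/>
 <body>
  <p>And he said: <quote its:dir="rtl">a Hebrew <em>quotation</em></quote></p>
 </body>
</book>"""


def effective_dir(root, default="ltr"):
    """Map every element to its direction, inherited down the tree."""
    result = {}

    def walk(el, inherited):
        own = el.get(ITS + "dir", inherited)  # a local its:dir overrides
        result[el] = own
        for child in el:
            walk(child, own)

    walk(root, default)
    return result


root = ET.fromstring(DOC)
dirs = effective_dir(root)
print(dirs[root.find("./body/p")], dirs[root.find(".//em")])  # ltr rtl
```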

4.5 Ruby

4.5.1 Definition

The data category ruby is used for a run of text that is associated with another run of text, referred to as the base text. Ruby text is used to provide a short annotation of the associated base text. It is most often used to provide a reading (pronunciation) guide.

4.5.2 Implementation

Ruby can be expressed in an instance document with or without scope information.

Ruby in an instance document without scope information is realized with a ruby element that contains a rubyBase element followed by a rubyText element.

Example 34: Ruby in an instance document without scope

<text>
 <head> ... </head>
 <body>
  <p>This is about the <its:ruby>
    <its:rubyBase>W3C</its:rubyBase>
    <its:rubyText>World Wide Web Consortium</its:rubyText>
   </its:ruby>.</p>
 </body>
</text>

Ruby in an instance document with scope information is expressed via two attributes:

A rubyText attribute contains the ruby text, corresponding to the rubyText element in the form without scope information.
A rubyScope attribute contains the scope information: its XPath expression selects the ruby base text, corresponding to the rubyBase element in the form without scope information.

Example 35: Ruby in an instance document with scope

<text>
 <head> ... </head>
 <body>
  <img src="w3c_home.png" alt="W3C"
   its:rubyScope="@alt" its:rubyText="World Wide Web Consortium"/> ...
 </body>
</text>
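Both forms carry the same information, which the following Python sketch extracts as (base text, ruby text) pairs from Examples 34 and 35 combined into one document. It only handles a rubyScope that names a single attribute of the same element, such as "@alt"; that restriction, the namespace declaration (URI from Example 36), and the function name ruby_pairs are assumptions.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # namespace URI from Example 36

# Examples 34 and 35 combined, with the its: namespace declared.
DOC = """<text xmlns:its="http://www.w3.org/2005/11/its">
 <head/>
 <body>
  <p>This is about the <its:ruby><its:rubyBase>W3C</its:rubyBase><its:rubyText>World Wide Web Consortium</its:rubyText></its:ruby>.</p>
  <img src="w3c_home.png" alt="W3C"
   its:rubyScope="@alt" its:rubyText="World Wide Web Consortium"/>
 </body>
</text>"""


def ruby_pairs(root):
    """Collect (base text, ruby text) pairs from both ruby forms."""
    pairs = []
    for el in root.iter():
        if el.tag == ITS + "ruby":
            # Form without scope: rubyBase followed by rubyText.
            pairs.append((el.findtext(ITS + "rubyBase"), el.findtext(ITS + "rubyText")))
        scope = el.get(ITS + "rubyScope")
        if scope and scope.startswith("@"):
            # Form with scope: here the base text is one attribute of el.
            pairs.append((el.get(scope[1:]), el.get(ITS + "rubyText")))
    return pairs


print(ruby_pairs(ET.fromstring(DOC)))
```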

Note: The content model of the ruby element without scope information has the same structure as ruby in section 5.4 of [OpenDocument] and as simple ruby markup as defined in section 1.2.1 of [Ruby-TR].

5 Modularizations of ITS with existing Markup Schemes

[Ed. note: This section will be written in a subsequent working draft.]

Two topics are to be covered in this section:

How should ITS be integrated into specific markup schemes? For XHTML, for example, it is helpful for the interoperability of ITS implementations to specify that the documentRules or documentRule elements are always part of the content model of the head element.
How should ITS data categories relate to existing markup declarations in a schema that serve identical or overlapping purposes? For example, [Dita 1.0] already has an attribute to indicate the translatability of text, but without a mechanism for scope.

5.1 ITS and XHTML 1.0

5.2 ITS and DocBook

TODO

5.3 ITS and Open Document Format 1.0

TODO

5.4 ITS and DITA 1.0

TODO

6 Markup Declarations

This section is normative.

A data type data.scope is defined for scope. Its value is an XPath expression [XPath 1.0]. A data type data.itsBoolean is defined for boolean values, e.g. to express translatability.

data.scope

[1] data.scope ::= text

data.itsBoolean

[2] data.itsBoolean ::= "yes" | "no"

The attribute group att.datacats is used to express the ITS data categories. It makes use of the data type data.itsBoolean.

att.datacats

[3] att.datacats.attributes ::= att.datacats.attribute.translate, att.datacats.attribute.locInfo, att.datacats.attribute.locInfoType, att.datacats.attribute.term, att.datacats.attribute.termRef, att.datacats.attribute.dir, att.datacats.attribute.bdo, att.datacats.attribute.rubyText
[4] att.datacats.attribute.translate ::= attribute translate { data.itsBoolean }?
[5] att.datacats.attribute.locInfo ::= attribute locInfo { text }?
[6] att.datacats.attribute.locInfoType ::= attribute locInfoType { "description" | "alert" }?
[7] att.datacats.attribute.term ::= attribute term { "yes" }?
[8] att.datacats.attribute.termRef ::= attribute termRef { xsd:anyURI }?
[9] att.datacats.attribute.dir ::= attribute dir { "ltr" | "rtl" }?
[10] att.datacats.attribute.bdo ::= attribute bdo { "yes" }?
[11] att.datacats.attribute.rubyText ::= attribute rubyText { text }?
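The value constraints in productions [4] to [11] lend themselves to a mechanical check. The Python sketch below validates the its: data category attributes on a single element; the table ALLOWED, the message format, skipping the scope attributes of productions [13] to [17], and the ITS namespace URI (taken from Example 36) are assumptions of the sketch, and termRef is not checked against xsd:anyURI.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # namespace URI from Example 36

# Allowed values per productions [4]-[11]; None means any text is accepted.
ALLOWED = {
    "translate": {"yes", "no"},  # data.itsBoolean
    "locInfo": None,
    "locInfoType": {"description", "alert"},
    "term": {"yes"},
    "termRef": None,  # xsd:anyURI, not checked in this sketch
    "dir": {"ltr", "rtl"},
    "bdo": {"yes"},
    "rubyText": None,
}

# Scope attributes from productions [13]-[17]; they belong to att.scope.
SCOPE_ATTRS = {"translateScope", "locInfoScope", "termScope", "dirScope", "rubyScope"}


def check_datacats(el):
    """Return a list of problems with the its:* attributes of one element."""
    problems = []
    for name, value in el.attrib.items():
        if not name.startswith(ITS):
            continue  # not an ITS attribute
        local = name[len(ITS):]
        if local in SCOPE_ATTRS:
            continue
        if local not in ALLOWED:
            problems.append("unknown attribute its:%s" % local)
        elif ALLOWED[local] is not None and value not in ALLOWED[local]:
            problems.append("its:%s has invalid value %r" % (local, value))
    return problems


el = ET.fromstring('<p xmlns:its="http://www.w3.org/2005/11/its" '
                   'its:dir="rtl" its:term="no" its:locInfoType="alert"/>')
print(check_datacats(el))  # ["its:term has invalid value 'no'"]
```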

The attribute group att.scope is used to express scope for ITS data categories. It makes use of the data type data.scope. An overview of the relation between scope and ITS data categories is given in Section 4: Description of Data Categories.

att.scope

[12] att.scope.attributes ::= att.scope.attribute.translateScope, att.scope.attribute.locInfoScope, att.scope.attribute.termScope, att.scope.attribute.dirScope, att.scope.attribute.rubyScope
[13] att.scope.attribute.translateScope ::= attribute translateScope { data.scope }?
[14] att.scope.attribute.locInfoScope ::= attribute locInfoScope { data.scope }?
[15] att.scope.attribute.termScope ::= attribute termScope { data.scope }?
[16] att.scope.attribute.dirScope ::= attribute dirScope { data.scope }?
[17] att.scope.attribute.rubyScope ::= attribute rubyScope { data.scope }?

ruby

[18] ruby ::= element ruby { ruby.content }
[19] ruby.content ::= rubyBase, rubyText

rubyBase

[20] rubyBase ::= element rubyBase { rubyBase.content }
[21] rubyBase.content ::= text

rubyText

[22] rubyText ::= element rubyText { rubyText.content }
[23] rubyText.content ::= text

The schemaRules element contains one or more schemaRule elements with ITS information and is used as a schema annotation. The schemaRule element is empty and carries attributes from the ITS data categories.

schemaRules

[24] schemaRules ::= element schemaRules { schemaRules.content }
[25] schemaRules.content ::= schemaRule+

schemaRule

[26] schemaRule ::= element schemaRule { schemaRule.content, schemaRule.attributes }
[27] schemaRule.content ::= empty
[28] schemaRule.attributes ::= att.datacats.attributes, empty

The documentRules element contains one or more documentRule elements with ITS information and is used in instance documents to express dislocated ITS information. The documentRule element is empty and carries attributes from the ITS data categories together with the scope attributes.

documentRules

[29] documentRules ::= element documentRules { documentRules.content }
[30] documentRules.content ::= documentRule+

documentRule

[31] documentRule ::= element documentRule { documentRule.content, documentRule.attributes }
[32] documentRule.content ::= empty
[33] documentRule.attributes ::= att.scope.attributes, att.datacats.attributes, empty
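Productions [29] to [33] require that documentRules contain one or more documentRule elements and that documentRule be empty. A minimal Python check of these two content models might look as follows; it assumes the ITS namespace URI from Example 36, ignores whitespace-only text, and does not validate attributes. The function name check_document_rules is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # namespace URI from Example 36


def check_document_rules(el):
    """Check el against productions [29]-[33] (content models only)."""
    if el.tag != ITS + "documentRules":
        return ["not a documentRules element"]
    problems = []
    children = list(el)
    if not children:
        # [30] documentRules.content ::= documentRule+
        problems.append("documentRules needs at least one documentRule")
    for child in children:
        if child.tag != ITS + "documentRule":
            problems.append("unexpected child %s" % child.tag)
        elif len(child) or (child.text or "").strip():
            # [32] documentRule.content ::= empty
            problems.append("documentRule must be empty")
    # Only whitespace may appear between the documentRule elements.
    if (el.text or "").strip() or any((c.tail or "").strip() for c in children):
        problems.append("documentRules must not contain text")
    return problems


rules = ET.fromstring(
    '<its:documentRules xmlns:its="http://www.w3.org/2005/11/its">'
    '<its:documentRule term="yes" termScope="/body/p[1]/span"/>'
    '</its:documentRules>')
print(check_document_rules(rules))  # []
```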

7 Conformance

This section is normative.

Conformance to ITS falls into two categories: conformance to the ITS data categories (cf. Section 4: Description of Data Categories) and conformance to Scope (cf. Section 3: Scope of ITS information).

7.1 Conformance to the ITS data categories

An implementation of the ITS data categories is conformant if it supplies a schema that adopts the ITS data categories as follows:

The schema must allow the attribute group att.datacats on every element declared in the schema.
The schema should allow the attribute group att.scope on every element declared in the schema.
The schema should allow the documentRules and documentRule elements in the content model of at least one element declared in the schema.

The schemaRules and schemaRule elements are to be used as schema annotations. It is the responsibility of the schema processor to allow for such annotations.

Example 36: A schema which is conformant to the ITS data categories

<xs:schema xmlns:myns="http://example.com/mySchema"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:its="http://www.w3.org/2005/11/its"
 targetNamespace="http://example.com/mySchema" elementFormDefault="qualified"
 attributeFormDefault="unqualified">
 <xs:import namespace="http://www.w3.org/2005/11/its" schemaLocation="its.xsd"/>
 <xs:element name="document">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="myns:head"/>
    <xs:element ref="myns:body"/>
   </xs:sequence>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:attributeGroup name="commonAtts">
  <xs:attributeGroup ref="its:att.datacats.attributes"/>
  <xs:attributeGroup ref="its:att.scope.attributes"/>
 </xs:attributeGroup>
 <xs:element name="head">
  <xs:complexType>
   <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="its:documentRules"/>
    <xs:element ref="its:documentRule"/>
   </xs:choice>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:element name="body">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="myns:para" maxOccurs="unbounded"/>
   </xs:sequence>
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
 <xs:element name="para">
  <xs:complexType mixed="true">
   <xs:attributeGroup ref="myns:commonAtts"/>
  </xs:complexType>
 </xs:element>
</xs:schema>

7.2 Conformance to scope

Conformance to scope encompasses conformance to the ITS data categories, with the following changes:

The schema must allow the attribute group att.scope on every element declared in the schema.
The schema must allow the documentRules and documentRule elements in the content model of at least one element declared in the schema.
An application that processes ITS elements and attributes must take Section 3.3: Processing of Scope Information into account.

A mandatory part of this conformance criterion is the usage of XPath. An application that processes ITS information must be able to process XPath version 1.0 or higher. It is not required to support a specific host language of XPath, such as [XSLT 1.0].

Internationalization Tag Set (ITS)

W3C Working Draft 22 November 2005
