Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document provides a set of guidelines for developing XML documents and schemas that are properly internationalized. Following the best practices described here allows both developers of XML applications and authors of XML content to create material in different languages.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was developed by the Internationalization Tag Set (ITS) Working Group, part of the W3C Internationalization Activity. A complete list of changes to this document is available. Major changes in this version of the document encompass modifications of the Best Practices listed in that revision log.
This is an updated Working Draft of "Best Practices for XML Internationalization". The Internationalization Tag Set (ITS) Working Group intends to publish this document as a Working Group Note before the end of December 2007.
Feedback about this document is encouraged. Send your comments to www-i18n-comments@w3.org. Use "[Comment on xml-i18n-bp WD]" in the subject line of your email, followed by a brief subject. The archives for this list are publicly available.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is a complement to the W3C Recommendation Internationalization Tag Set (ITS) Version 1.0 [ITS]. However, not all internationalization-related issues can be resolved by the special markup described in ITS. The best practices in this document therefore go beyond application of ITS markup to address a number of problems that can be avoided by correctly designing the XML format, and by applying a few additional guidelines when developing content.
This document and Internationalization Tag Set (ITS) Version 1.0 [ITS] implement requirements formulated in Internationalization and Localization Markup Requirements [ITS REQ].
This document is divided into two main sections:
The first one is intended for the designers and developers of XML applications (also referred to here as 'schemas' or 'formats').
The second is intended for the XML content authors. This includes users modifying the original content, such as translators.
Section 2: When Designing an XML Application provides a list of some of the important design choices you should make in order to ensure the internationalization of your format.
Section 4: Generic Techniques provides additional generic techniques such as writing ITS rules or adding an attribute to a schema. Such techniques apply to many of the best practices.
Section 5: ITS Applied to Existing Formats provides a set of concrete examples on how to apply ITS to existing XML based formats. This section illustrates many of the guidelines in this document.
Section 3: When Authoring XML Content provides a number of guidelines on how to create content with internationalization in mind. Many of these best practices are relevant regardless of whether or not your XML format was developed especially for internationalization.
Section 4.1: Writing ITS Rules provides practical guidelines on how to write ITS rules. Such techniques may be useful when applying some of the more advanced authoring best practices.
Designers and developers of XML applications should take into account the following best practices:
Best Practice | Implementing a new feature | Handling legacy markup |
---|---|---|
Best Practice 1: Providing xml:lang to specify natural language content | Make sure the xml:lang attribute is defined for the root element of your document, and for any element where a change of language may occur. | Provide an ITS rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang. |
Best Practice 2: Providing a way to specify text directionality | Make sure the its:dir attribute is defined for the root element of your document and for all elements with content that may be rendered. | Provide an ITS rules document where you use the its:dirRule element to associate the different directionality indicators with their equivalents in ITS. |
Best Practice 3: Avoiding translatable attributes | Make sure all translatable text is stored as element content, not as attribute values. | Provide an ITS rules document where you use the its:translateRule element to specify which attributes are translatable. |
Best Practice 4: Indicating which elements and attributes should be translated | Provide an ITS rules document where you use its:translateRule elements to indicate which elements have non-translatable content and which attributes have translatable values. | |
Best Practice 5: Providing a way to override translation information | Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content. | If authors can use a proprietary mechanism for this, make sure it is covered in the ITS rules document provided when applying Best Practice 4: Indicating which elements and attributes should be translated. |
Best Practice 6: Providing text segmentation-related information | Provide an ITS rules document where you use its:withinTextRule elements to indicate which elements should be treated as part of their parents or as a nested and independent run of text. | |
Best Practice 7: Providing a way to specify ruby text | Make sure the its:ruby element is defined in all elements where there is text content. | Provide an ITS rules document where you use the its:rubyRule element to associate your ruby markup with its equivalent in ITS. |
Best Practice 8: Providing a way to specify notes for localizers | Make sure the attributes its:locNote, its:locNoteType and its:locNoteRef are defined in your schema. | Provide an ITS rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS. |
Best Practice 9: Providing a way to specify unique identifiers | Make sure the elements with translatable content are associated with a unique identifier. | |
Best Practice 10: Identifying terminology-related elements | Provide an ITS rules document where you use its:termRule elements to indicate which elements are terms and information related to them (e.g. definitions). | |
Best Practice 11: Providing a way to specify or override terminology-related information | | If authors can use a proprietary mechanism for this, make sure it is covered in the ITS rules document provided for Best Practice 10: Identifying terminology-related elements. |
Best Practice 12: Using multilingual documents with caution | For documents that need to go through some localization tasks, always store a single language per document. | |
Best Practice 13: Naming elements and attributes with caution | Make sure the names of the elements and attributes of your schema reflect their functions, rather than one possible way of rendering their content. | N/A |
Best Practice 14: Providing a span-like element for your schema | Make sure to define a span-like element in your schema that allows authors to associate a delimited run of text with language-oriented properties such as directionality or language identification. | N/A |
Best Practice 15: Documenting the ITS-related features of your schema | Make sure to document the internationalization and localization aspects of your schema by providing the set of relevant ITS rules in a single standalone ITS rules document. | |
Where it says "How to implement this as a new feature", this section describes how to create new schemas or add new features to existing schemas. When doing this you may need to take into account the following:
Think twice before creating your own schema. Seriously consider using existing formats such as DITA, DocBook, Open Document Format, Office Open XML, XML User Interface Language, Universal Business Language, etc. Those formats have many useful insights already built in.
Check carefully whether an existing format comes with a built-in capability for modification. DocBook and DITA, for example, come with their own set of features for adapting their format to special needs.
The modification mechanisms available will depend on the schema language (DTD, XML Schema, RELAX NG, etc.). For example, namespace-based modularization of schemas is difficult to achieve with DTDs.
NVDL is an example of a meta-schema language that was designed especially to allow integration of several existing vocabularies into a single XML vocabulary without the need to know the details of the source schemas. This means that with NVDL you can usually create a schema for compound documents more easily than with other schema technologies.
Each schema language provides different ways of extending or modifying existing schemas. Some examples are the include, import or redefine mechanisms in XML Schema.
Some processors do not implement support for all schema language constructs, due to erroneous implementations or differences in conformance profiles (e.g. see the conformance requirements to XML Schema part 1). Therefore a schema which works in one environment may not work in a different one.
What is possible also depends on the features of the schema which the modification is targeting. For example:
An XML Schema redefine is only possible if the modified schema has been created with named types.
If you are working with XML Schema, you can only apply the technique of 'chameleon' or 'proxy' schemas (see http://www.xfront.com/ZeroOneOrManyNamespaces.html) if the 'chameleon' schemas have no namespace. For example, the XML Schema document for ITS has a target namespace and therefore cannot be a 'chameleon' schema.
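The dependency of redefine on named types can be illustrated with a minimal sketch. The file names base.xsd and extended.xsd, and the msgType, msg, text and id names, are hypothetical and are used only for illustration. The base schema declares a named complex type:

```xml
<!-- base.xsd (hypothetical): msgType is a named type, so it can be redefined -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="msgType">
    <xs:sequence>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="msg" type="msgType"/>
</xs:schema>
```

A second schema can then redefine that type, here by extending it with an attribute:

```xml
<!-- extended.xsd (hypothetical): redefines msgType from base.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="base.xsd">
    <xs:complexType name="msgType">
      <xs:complexContent>
        <!-- Inside xs:redefine, the base is the original definition of the same type -->
        <xs:extension base="msgType">
          <xs:attribute name="id" type="xs:ID"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```

If the base schema had declared the type anonymously inside the msg element, there would be no type name for the redefine to target.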
Note: The considerations above are only a portion of what you need to take into account. You need to know a lot more when diving into schema modularization. The following provides some good additional reading: [Ed. note: TODO: point to references].
The XML namespace provides the xml:lang attribute and the ITS Language Information data category provides the its:langRule element to address this requirement.
How to implement this as a new feature
Make sure the xml:lang attribute is defined for the root element of your document, and for any element where a change of language may occur.
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
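As an illustration of the content this guideline makes possible, the following sketch shows a document whose default language is set on the root element and overridden for an inline phrase. The element names doc, para and q are hypothetical.

```xml
<!-- Hypothetical format: xml:lang on the root sets the default language -->
<doc xml:lang="en">
  <!-- xml:lang on the inline q element marks a change of language -->
  <para>The motto of the city is <q xml:lang="fr">Fluctuat nec mergitur</q>.</para>
</doc>
```

This only works if the schema declares xml:lang on both the root element and the inline element.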
Some XML documents may be designed to store data without natural language content. In these cases, there is no need for the xml:lang attribute.
The scope of the xml:lang attribute applies to both the attributes and the content of the element where it appears; therefore one cannot specify different languages for an attribute and the element content. ITS does not provide a remedy for this. Instead, it is recommended that you avoid translatable attributes.
Make sure that the definition of the xml:lang attribute allows for empty values. That is:
In a DTD you must not use NMTOKEN as the data type; instead use CDATA.
In XML Schema the built-in data type language does not allow empty values. However, the declaration for xml:lang in the XML Schema document for the XML namespace at http://www.w3.org/2001/xml.xsd does allow for empty values and therefore can be used.
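The XML Schema case above can be sketched as follows. This is a minimal illustration, not a complete schema: the para element is hypothetical, and the import simply reuses the declaration published at the URI mentioned above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Import the schema for the XML namespace, which declares xml:lang -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <!-- Hypothetical element that may carry natural language content -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <!-- Reference the imported declaration instead of defining a new attribute of type xs:language -->
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Referencing the imported declaration, rather than declaring a local attribute of type language, is what keeps the empty value available to authors.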
Note: If you need to specify language as data or meta-data about something external to the document, do it with an attribute different from xml:lang. For more information see the article xml:lang in XML document schemas.
In XHTML the language of a file linked with the a element is indicated with an hreflang attribute because it does not apply to the content of the a element.
<a xml:lang="en" href="german.html" hreflang="de">Click here for German</a>
It is not recommended to use your own attribute or element to specify the language of the content. The xml:lang attribute is supported by various XML technologies such as XPath and XSLT (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your ability to take advantage of some XML applications.
How to handle legacy markup
If you are working with an existing schema where there is a way to specify content language that uses something other than the xml:lang attribute (but still uses the same values as xml:lang), you should use the its:langRule element to specify what attribute or element is used instead of xml:lang. This can be done in the ITS rules elements in the head of a document, if your format supports that, or in a separate document.
In this document the langcode element is used to specify the language of the text element. The langcode element has no inheritance behavior equivalent to that of xml:lang.
Note: This example is a multilingual document, which has its own set of issues (see Best Practice 12: Using multilingual documents with caution).
<myRes> <messages> <msg id="1"> <langcode>en</langcode> <text>Cannot find file.</text> </msg> <msg id="2"> <langcode>fr</langcode> <text>Fichier non trouvé.</text> </msg> </messages> </myRes>
The corresponding ITS rules document contains an its:langRule element that specifies that the langcode element holds the same values as the xml:lang attribute and applies to the text element.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:langRule selector="//text[../langcode]" langPointer="../langcode"/> </its:rules>
Why do this
Information about the language of content can be very important for correctly rendering or styling text in some scripts, applying spell-checkers during content authoring, appropriate selection of voice for text-to-speech systems, script-based processing, and numerous other reasons. You must provide a standard way to specify the language for the document as a whole, but also for parts of the document where the language changes.
In scripts such as Arabic and Hebrew, characters may run from both left to right and right to left when displayed. Directional markup allows you to manage the flow of characters. For an example of how directional markup is used see Creating (X)HTML Pages in Arabic & Hebrew.
The ITS Directionality data category provides the its:dir attribute and the its:dirRule element to address this requirement.
How to implement this as a new feature
Make sure the its:dir attribute is defined for the root element of your document and for all elements whose content rendering is affected by directionality. [Ed. note: Maybe this should say "all elements which can have any text content"]
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
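Once the its:dir attribute is declared in the schema, authors can use it directly. The following is a minimal sketch with hypothetical element names (text, body, par, quote). It assumes that, when a document contains no its:rules element, the ITS version is declared with an its:version attribute on the root element.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"
      xml:lang="en" its:dir="ltr">
  <body>
    <!-- its:dir="rtl" creates a right-to-left embedding for the Hebrew title -->
    <par>In Hebrew, the title <quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote> means <quote>Internationalization Activity, W3C</quote>.</par>
  </body>
</text>
```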
How to handle legacy markup
If you are working with an existing schema where there is a way to specify text directionality that is not implemented using the its:dir attribute, you should document the semantics in a separate document. You can use the its:dirRule element to associate the different directionality indicators with their equivalents in ITS.
In this document the textdir attribute is used to specify directionality of a text run.
<text xml:lang="en"> <body> <par>In Hebrew, the title <quote xml:lang="he" textdir="r2l">פעילות הבינאום, W3C</quote> means <quote>Internationalization Activity, W3C</quote>.</par> </body> </text>
Note: This example shows the directionality of the source text correctly. This is to ensure that you understand the concepts being described. For such display, you need a sophisticated editor that resolves directionality of the source text correctly. Many editors are not yet this sophisticated. See the related discussion about Problems with bidirectional source text in [Bidi in X/HTML].
The corresponding ITS rules document contains a set of its:dirRule elements that specify the relationships between the textdir attribute and the ITS Directionality data category.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:dirRule selector="//*[@textdir='l2r']" dir="ltr"/> <its:dirRule selector="//*[@textdir='r2l']" dir="rtl"/> <its:dirRule selector="//*[@textdir='lro']" dir="lro"/> <its:dirRule selector="//*[@textdir='rlo']" dir="rlo"/> </its:rules>
Why do this
Generally the Unicode bidirectional algorithm will produce the correct ordering of mixed directionality text in scripts such as Arabic and Hebrew. Sometimes, however, additional help is needed. For instance, in the sentence of Example 4 the 'W3C' and the comma should appear to the left side of the quotation. This cannot be achieved using the bidirectional algorithm alone.
The following is incorrect, since no directional markup has been used:
The title says "פעילות הבינאום, W3C" in Hebrew.
The text should look like this (assuming your browser supports bidirectional display):
The title says "פעילות הבינאום, W3C" in Hebrew.
The desired effect can be achieved using Unicode control characters, but this is not recommended (see Unicode in XML and other Markup Languages [Unicode in XML]). Markup is needed to establish the default directionality of a document, and to change that where appropriate by creating nested embedding levels.
Markup is also occasionally needed to disable the effects of the bidirectional algorithm for a specified range of text.
How to implement this as a new feature
Make sure you store all translatable text as element content, not as attribute values.
It is bad design to use the alt attribute to store the alternate descriptive text for the image element, as in this example.
<image src="elephants.png" alt="Elephants bathing in the Zambezi River."/>
Instead, define the content of image itself to hold the text. This way there is no translatable text in an attribute.
<image src="elephants.png">Elephants bathing in the Zambezi River.</image>
Note: In many cases, moving translatable text from attribute value to element content will result in one sentence being embedded within another one. For instance, in Example 5 the description of the image will be embedded inside the text of the paragraph that contains it. In such cases, do not forget to declare the relevant element (here image) as 'nested', as described in Best Practice 6: Providing text segmentation-related information.
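For the image element used in the example above, such a declaration could look like the following sketch. It uses the same its:withinTextRule element that Best Practice 6 describes; the selector assumes that image is not used for any other purpose in the format.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- The description inside image is a separate run of text nested in its parent -->
  <its:withinTextRule selector="//image" withinText="nested"/>
</its:rules>
```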
How to handle legacy markup
If you are working with an existing schema where there are attributes with translatable values, you should document this in a separate document containing ITS rules: use the its:translateRule element to specify what attributes are translatable. See Best Practice 4: Indicating which elements and attributes should be translated for more information about how to do this.
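As a minimal sketch, a rule for a legacy format that keeps the alt attribute on an image element (the names from the example above) might be written like this:

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Attribute values are non-translatable by default; flag alt on image as translatable -->
  <its:translateRule selector="//image/@alt" translate="yes"/>
</its:rules>
```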
Why do this
There are a number of issues related to storing translatable text in attribute values. Some of them are:
The language identification mechanism (i.e. xml:lang) applies to both the content and to the attribute values of the element where it is declared. If the text of an attribute is in a different language than the text of the element content, one cannot set the language for both correctly.
It may be necessary to apply some language-related properties, such as directionality and language identification, to only part of the text in an attribute value. This requires the use of a span-like element, but elements cannot be used within an attribute value.
It is difficult to apply meta-information, such as no-translate flags, author's notes, etc., to the text of an attribute value.
The difficulty of attaching unique identifiers to translatable attribute text makes it more complicated to use ID-based leveraging tools.
It can be problematic to prepare translatable attributes for localization because they can occur within the content of a translatable element, breaking it into different parts, and possibly altering the sentence structure.
All these potential problems are less likely to occur when the text is the content of an element rather than the value of an attribute.
The ITS Translate data category provides the its:translateRule element to address this requirement.
How to do this
Use its:translateRule elements to indicate which elements have non-translatable content. This can be done using ITS rules elements in the head of a document, if your format supports that, or in a separate document.
Note: Where appropriate, allow for the content of an element to be flagged with xml:lang="zxx", where zxx indicates content that is not in a language, and therefore is most likely not translatable.
If you are working with a schema where there are translatable attributes (something that is not recommended), you should also use its:translateRule to specify these translatable attributes.
In the following document, the content of the head element should not be translated, and the value of the alt attribute should be translated. In addition, the content of the del element should not be translated.
<myDoc xml:lang='en'> <head> <id xml:lang="zxx">H4-A3-F8-A1</id> <author>Robert Griphook</author> <rev>v13 2007-10-27</rev> </head> <par>To start click <ins>the <ui>Start</ui> button</ins><del>green icon</del> and fill the form labeled by the following icon: <ref file="vat.png" alt="Value Added Tax Form"/></par> </myDoc>
The following rules specify exceptions from the default ITS behavior for documents like the one above.
Rule 1: The content of head in myDoc is not translatable. By inheritance, the child elements of head are also assumed not translatable.
Rule 2: All the alt attributes are translatable.
Rule 3: The content of del is not translatable.
Rule 4: The non-translatability of del applies also to any attribute that may have been set as translatable by a prior rule (i.e. the second rule).
Rule 5: Any element or attribute with its language set to zxx is not translatable.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:translateRule selector="/myDoc/head" translate="no"/> <its:translateRule selector="//*/@alt" translate="yes"/> <its:translateRule selector="//del" translate="no" /> <its:translateRule selector="//@*[ancestor::del]" translate="no"/> <its:translateRule selector="//*[lang('zxx')] | //@*[lang('zxx')]" translate="no"/> </its:rules>
Why do this
By default, ITS assumes that the content of all elements is translatable and that all attributes have non-translatable values. If your XML document type does not correspond to this default assumption, it is important to indicate what the exceptions are. Doing so can significantly improve translation throughput.
The ITS Translate data category provides the its:translate attribute and the its:translateRule element to address this requirement.
How to implement this as a new feature
Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content.
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:translateRule element. Content authors can then use these elements to globally change the default translate rules for specific elements and attributes.
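The two mechanisms can be combined in one document, as in the following sketch. The myDoc, head, par and code element names are hypothetical; the sketch assumes a schema that allows its:rules inside head and its:translate on elements with text content.

```xml
<myDoc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en">
  <head>
    <its:rules version="1.0">
      <!-- Global rule: the content of every code element is not translatable -->
      <its:translateRule selector="//code" translate="no"/>
    </its:rules>
  </head>
  <!-- Local override: this particular code element is a placeholder that should be translated -->
  <par>Run <code>convert</code> followed by <code its:translate="yes">the name of your file</code>.</par>
  <!-- Local override in the other direction: this paragraph is not translatable -->
  <par its:translate="no">H4-A3-F8-A1</par>
</myDoc>
```

The local its:translate attribute takes precedence over the global rule for the element on which it is set.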
How to handle legacy markup
If you are working with a schema where there is a way to override translate information that is not its:translate, the authors of the documents should use it. In addition, you should provide an ITS rules document where you use the its:translateRule element to associate this mechanism with the ITS Translate data category.
For example, DITA offers a translate attribute, and Glade provides a translatable attribute. Both have the same semantics as its:translate, i.e. the translation information applies to element content, including child elements, but excluding attribute values.
The following rules indicate how to associate the DITA translate attribute with the ITS Translate data category. The order in which the rules are listed is important:
Rule 1: Indicates that the content of any element with a translate attribute set to no is not translatable.
Rule 2: Indicates that any attribute value of any element with a translate attribute set to no is not translatable. This is needed because some attributes are translatable in DITA and we need to make sure they are not translated when translate="no" is used on the elements that carry them.
Rule 3: Indicates that the content of any element with a translate attribute set to yes is translatable. This takes care of the cases where translate="yes" is used to override a prior translate="no".
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:translateRule selector="//*[@translate='no']" translate="no"/> <its:translateRule selector="//*[@translate='no']/descendant-or-self::*/@*" translate="no"/> <its:translateRule selector="//*[@translate='yes']" translate="yes"/> </its:rules>
You can find a more complete example of how DITA markup is associated with ITS in Section 5.4.2: Associating existing DITA markup with ITS.
Why do this
In some cases, the author of a document may need to change the translatability property on parts of the content, overriding the ITS default behavior or the more general rules for the schema that you specified when applying Best Practice 4: Indicating which elements and attributes should be translated.
Segmentation refers to how text is broken down, from a linguistic viewpoint, into units that can be handled by processes such as translation.
The ITS Element Within Text data category provides the its:withinTextRule element to address this requirement.
How to do this
This is relevant for new feature development and for dealing with legacy markup.
Provide an ITS rules document where you use its:withinTextRule elements to indicate which elements should be treated as either part of their parents, or as a nested and independent run of text. By default, element boundaries are assumed to correspond to segmentation boundaries.
In the following DITA document:
The elements term and b should be treated as parts of their parents.
The element fn should be treated as an independent run of text.
<concept id="myConcept" xml:lang="en-us"> <title>Types of horse</title> <conbody> <ol> <li>Palouse horse:<p><term>Palouse horses</term><fn>A palouse horse is the same as an <b>Appaloosa</b>.</fn> have spotted coats. The <term>Nez-Perce</term> Indians have been key in breeding this type of horse.</p></li> </ol> </conbody> </concept>
The its:withinTextRule element is used to specify the behavior of three elements; all other elements are assumed to have the value its:withinText="no":
Rule 1: The elements term and b are defined as part of the text flow.
Rule 2: The element fn is defined as a separate text nested inside its parent element.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:withinTextRule selector="//term | //b" withinText="yes"/> <its:withinTextRule selector="//fn" withinText="nested"/> </its:rules>
Applying these rules to the DITA document above results in four distinct runs of text:
title: "Types of horse"
li: "Palouse horse:"
p: "{term}Palouse horses{/term}{fn/} have spotted coats. The {term}Nez-Perce{/term} Indians have been key in breeding this type of horse."
fn: "A palouse horse is the same as an {b}Appaloosa{/b}."
Why do this
Many applications that process content for linguistic-related tasks need to be able to perform a basic segmentation of the text content. They need to be able to do this without knowing the semantics of the elements.
While in many cases it is possible to detect mixed content automatically, there are some situations where the structure of an element makes it impossible for tools to know for sure where appropriate segmentation boundaries fall. For example, the li element in XHTML can contain text as well as p elements. [Ed. note: I don't think this example, as expressed here, clarifies much.] In addition, the boundaries of some inline elements, such as emphasis, do not typically correspond to segmentation boundaries; and some inline elements embedded in a parent element, such as footnotes or quotations, may define segments that should be handled separately from the text in which they are embedded.
Intelligent segmentation is particularly important in translation to successfully match source text against translation-memory databases.
Ruby text is used to provide a short annotation of an associated base text. It is most often used to provide a reading (pronunciation) guide.
The ITS Ruby data category provides the elements its:ruby and its:rubyRule to address this requirement.
How to implement this as a new feature
Make sure the its:ruby element is defined in all elements where there is text content.
It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:rubyRule element which can be used to associate ruby information with elements and attributes globally.
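With its:ruby allowed inside text content, an author could annotate a base text as in the following sketch. The text and para element names are hypothetical, and the sketch assumes that the ITS version is declared with an its:version attribute on the root element when no its:rules element is present.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="ja">
  <!-- its:rb holds the base text, its:rt the reading, its:rp the fallback parentheses -->
  <para>この本は<its:ruby><its:rb>慶応義塾大学</its:rb><its:rp>(</its:rp><its:rt>けいおうぎじゅくだいがく</its:rt><its:rp>)</its:rp></its:ruby>の歴史を説明するものです。</para>
</text>
```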
Note: [Ed. note: TODO: Ask Felix to write the paragraph about conformance!]
How to handle legacy markup
If you are working with an existing schema where there is a way to specify ruby text that has the same semantics as the ITS Ruby data category (for example the Ruby Annotation [Ruby]), you should provide an ITS rules document where you use the its:rubyRule element to associate your ruby markup with its equivalent in ITS.
In this document, the rubyBlock element has the same functionality as its:ruby, rBase as its:rb, rParen as its:rp, and rText as its:rt.
<text> <para>この本は <rubyBlock> <rBase>慶応義塾大学</rBase> <rParen>(</rParen> <rText>けいおうぎじゅくだいがく</rText> <rParen>)</rParen> </rubyBlock>の歴史を説明するものです。</para> </text>
This its:rubyRule element indicates that the rBase element has the same functionality as its:rb and that the elements its:ruby, its:rp and its:rt have equivalent elements as well.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:rubyRule selector="//rBase" rubyPointer=".." rpPointer="../rParen" rtPointer="../rText" /> </its:rules>
Why do this
Ruby is a type of annotation that, while not typically used for Western languages, is used for East Asian scripts to provide phonetic transcriptions of characters that the reader is not expected to be familiar with. For example it is widely used in education materials and children’s texts. It is also occasionally used to convey information about meaning.
Because ruby annotation may be needed when localizing into Japanese or Chinese, it is useful to make provision for it, even if your original documents are to be written in a language that does not use such markup.
The ITS Localization Note data category provides the attributes its:locNote, its:locNoteType and its:locNoteRef, as well as the its:locNoteRule element to address this requirement.
How to implement this as a new feature
Make sure the attributes its:locNote, its:locNoteType and its:locNoteRef are defined in your schema.
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:locNoteRule element which can be used to specify localization-related notes globally.
The its:locNoteRule element also allows you to point to existing notes in an XML document via the locNotePointer attribute, or to an existing reference to notes via the locNoteRefPointer attribute.
How to handle legacy markup
If you are working with an existing schema where there is a way to provide notes to the localizers that is not implemented using ITS, you should provide an ITS rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS.
In this document the comment element is a note for its sibling text element.
<messages> <msg id="ERR_NOFILE"> <text>The file '{0}' could not be found.</text> <comment>The variable {0} is the name of a file.</comment> </msg> </messages>
The its:locNoteRule element specifies that the text elements have an associated localization description in their sibling comment elements.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:locNoteRule selector="//msg/text" locNoteType="description" locNotePointer="../comment"/> </its:rules>
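As a rough illustration of what a tool might do with such a rule, the following Python sketch pairs each text element with the note that locNotePointer="../comment" points to. It uses only the standard library. The selector is hard-coded rather than parsed from the rules document, and the function name collect_loc_notes is an assumption made for this example, not part of ITS.

```python
import xml.etree.ElementTree as ET

DOC = """<messages>
  <msg id="ERR_NOFILE">
    <text>The file '{0}' could not be found.</text>
    <comment>The variable {0} is the name of a file.</comment>
  </msg>
</messages>"""


def collect_loc_notes(xml_text):
    """Return (text, note) pairs for every msg/text element.

    Hard-coded stand-in for selector="//msg/text" combined with
    locNotePointer="../comment". A real ITS processor would read
    both values from the its:locNoteRule element.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for msg in root.iter("msg"):
        # Seen from a text child, "../comment" is the comment child of msg.
        comment = msg.find("comment")
        note = comment.text if comment is not None else None
        for text in msg.findall("text"):
            pairs.append((text.text, note))
    return pairs


for text, note in collect_loc_notes(DOC):
    print(f"{text!r} -> note: {note!r}")
```

A translation tool could show the note next to the string while the translator works on it.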
Why do this
To assist the translator to achieve a correct translation, authors may need to provide information about the text that they have written. For example, the author may want to do the following:
Tell the translator how to translate part of the content (e.g. "Leave text in uppercase").
Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used on the UI.
Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word 'enabled' in isolation without knowing the gender, number and case of the thing it refers to.)
Explain why text is not to be translated, point to text reuse, or describe the use of conditional text.
Indicate why a piece of text is emphasized (important, sarcastic, etc.)
How to do this
Make sure the elements with translatable content are associated with a unique identifier.
It is strongly recommended that such an identifier be an attribute of type ID, following the rules described in xml:id Version 1.0 [xml:id]. This allows XML applications to take advantage of the built-in processes associated with the datatype, for example validation.
It is also recommended to name this attribute xml:id to increase interoperability.
Note: Using identifiers that are globally unique (i.e. unique across any documents) and persistent (i.e. ones which do not change over time) often provides additional benefits.
Why do this
In order to reuse translated text most effectively where content is reused (for example across updates), it is necessary to have a unique and persistent identifier associated with the element.
This identifier allows the translation tools to correctly track an item from one version or location to the next. After ensuring that this is the same item, the content can be examined for changes, and if no change has taken place the potential for reuse of the previous translation is very high.
Change analysis constitutes an extremely powerful productivity tool for translation when compared to the typical source matching techniques (a.k.a. translation memory). These techniques simply look for similar source text in a multilingual database without, most of the time, being able to tell whether the context of its use is the same.
Identifiers can also be helpful to track displayed text back to its underlying source. For example, when reviewing a translated user interface, the identifiers can be used as temporary prefixes to the text so that any correction can be applied efficiently to the proper strings.
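The change analysis described above can be sketched in a few lines of Python. The two sample documents, the str element and the function names are assumptions made for this illustration. The only point is that xml:id lets a tool match items across versions before comparing their content.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

OLD = """<strings>
  <str xml:id="HELP">Help</str>
  <str xml:id="OK">OK</str>
  <str xml:id="CANCEL">Cancel</str>
</strings>"""

NEW = """<strings>
  <str xml:id="HELP">Help</str>
  <str xml:id="OK">Accept</str>
  <str xml:id="RETRY">Retry</str>
</strings>"""


def index_by_id(xml_text):
    """Map each xml:id value to the text content of its element."""
    root = ET.fromstring(xml_text)
    return {
        el.get(XML_ID): "".join(el.itertext())
        for el in root.iter()
        if el.get(XML_ID) is not None
    }


def classify_changes(old_xml, new_xml):
    """Label every identifier as new, changed, unchanged or removed."""
    old, new = index_by_id(old_xml), index_by_id(new_xml)
    report = {}
    for key, text in new.items():
        if key not in old:
            report[key] = "new"
        elif old[key] == text:
            report[key] = "unchanged"  # previous translation can be reused
        else:
            report[key] = "changed"    # needs review or retranslation
    for key in old:
        if key not in new:
            report[key] = "removed"
    return report


print(classify_changes(OLD, NEW))
```

Items reported as unchanged are the ones for which the previous translation is the most likely to be reusable as-is.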
The ITS Terminology data category provides the its:termRule element to address this requirement.
How to do this
Provide an ITS rules document where you use its:termRule elements to indicate which elements are terms and information related to them (e.g. definitions).
Note: The information identified through the its:termInfoRef attribute can be of any type (e.g. human-readable or machine-specific). It is up to the application processing the data to make the distinction.
In this document, the elements term and dt, as well as any element with a syn attribute, denote terms. In addition, they can all have associated information.
<myDoc> <body> <p>A <term def="d001" syn="#alterego">doppelgänger</term> is basically <def xml:id="d001">the counterpart of a person</def>. It is almost the same as an <emph syn="#alterego">alter ego</emph>, but with a more sinister connotation. Sometimes the word <emph syn="#alterego">fetch</emph> is also used.</p> </body> <definitions> <entry xml:id="alterego"> <dt>alter ego</dt> <dd>A second self. Figurative sense: trusted friend.</dd> <origin>Latin, literally: "second I"</origin> </entry> </definitions> </myDoc>
The set of ITS rules below indicates:
Rule 1: The term element is a term and its associated information can be accessed in the node that has the identifier corresponding to the value in its def attribute.
Rule 2: Any element with a syn attribute is considered a term and the syn attribute contains a URI location where some associated information can be found.
Rule 3: The dt element is a term and its associated information is in its sibling element dd.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:termRule selector="//term" term="yes" termInfoPointer="id(@def)"/> <its:termRule selector="//*[@syn]" term="yes" termInfoRefPointer="@syn"/> <its:termRule selector="//dt[../dd]" term="yes" termInfoPointer="../dd"/> </its:rules>
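To make the effect of Rule 1 and Rule 3 more concrete, here is a minimal Python sketch that resolves the associated information for a shortened version of the document above. The two rules are hard-coded instead of being read from the its:termRule elements, Rule 2 is left out because it only yields a URI, and the function name term_info is an assumption for this example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<myDoc>
  <body>
    <p>A <term def="d001">doppelgänger</term> is basically
    <def xml:id="d001">the counterpart of a person</def>.</p>
  </body>
  <definitions>
    <entry xml:id="alterego">
      <dt>alter ego</dt>
      <dd>A second self. Figurative sense: trusted friend.</dd>
    </entry>
  </definitions>
</myDoc>"""


def term_info(xml_text):
    """Map each term to its associated information (Rules 1 and 3 only)."""
    root = ET.fromstring(xml_text)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    info = {}
    # Rule 1: selector="//term", termInfoPointer="id(@def)"
    for term in root.iter("term"):
        target = by_id.get(term.get("def"))
        if target is not None:
            info[term.text] = "".join(target.itertext())
    # Rule 3: selector="//dt[../dd]", termInfoPointer="../dd"
    for parent in root.iter():
        dd = parent.find("dd")
        if dd is not None:
            for dt in parent.findall("dt"):
                info[dt.text] = "".join(dd.itertext())
    return info


for term, text in term_info(DOC).items():
    print(f"{term}: {text}")
```

A list built this way could serve as the starting point for a glossary.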
Why do this
The capability of specifying terms within the source content is important for terminology management and beneficial to translation and localization quality. For example, term identification facilitates the creation of glossaries and allows the validation of terminology usage in the source and translated documents.
Term identification is also useful for change management and to ensure source language quality.
Terms may require various associated information, such as part of speech, gender, number, term types, definitions, notes on usage, etc. To avoid repeating associated information throughout a document, it should be possible for identified terms to link to externalized attribute data, such as glossary documents and terminology databases.
The ITS Terminology data category provides the attributes its:term and its:termInfoRef, as well as the its:termRule element to address this requirement.
How to do this
Make sure the its:term and the its:termInfoRef attributes are defined for any element that has text content.
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:termRule element which can be used to override terminology-related information globally.
Why do this
In some cases, the author of a document may need to change the information indicating what is a term or how to point to term information, overriding the general rules for the schema that you have specified when applying Best Practice 10: Identifying terminology-related elements.
The multilingual documents discussed here are those where copies of the same content are stored in multiple languages in a single document.
How to do this
For documents that need to go through some localization tasks, always store a single language per document.
In this example, it is bad design to use a single document that contains multiple translations of the same content:
<messages> <msg xml:id='fileNotFound'> <text xml:lang="en">File not found.</text> <text xml:lang="fr">Fichier non trouvé.</text> </msg> </messages>
Instead, use one document for each language: here, one in English and the other in French. Other languages would go in similar separate documents.
<messages xml:lang="en"> <msg xml:id='fileNotFound'> <text>File not found.</text> </msg> </messages>
<messages xml:lang="fr"> <msg xml:id='fileNotFound'> <text>Fichier non trouvé.</text> </msg> </messages>
Note: It is admissible to store multilingual copies of the content in a single document before the document is sent to localization, or after all localization tasks are done. For example, a final resource file could be constructed by collating the different language entries.
Note: It is admissible to provide the localizer with multilingual documents in XML formats that are specifically designed for localization, and are industry standards, like the XML Localisation Interchange File Format [XLIFF 1.2].
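The collation step mentioned in the first note could look like the following Python sketch, which merges the two monolingual documents shown above into one multilingual resource once localization is done. The function name collate and the layout of the merged file are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

ENGLISH = """<messages xml:lang="en">
  <msg xml:id="fileNotFound"><text>File not found.</text></msg>
</messages>"""

FRENCH = """<messages xml:lang="fr">
  <msg xml:id="fileNotFound"><text>Fichier non trouvé.</text></msg>
</messages>"""


def collate(documents):
    """Merge monolingual documents into one multilingual <messages> tree."""
    merged = ET.Element("messages")
    by_id = {}
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        lang = root.get(XML_LANG)
        for msg in root.iter("msg"):
            msg_id = msg.get(XML_ID)
            # Entries with the same xml:id end up in the same <msg>.
            if msg_id not in by_id:
                by_id[msg_id] = ET.SubElement(merged, "msg", {XML_ID: msg_id})
            for text in msg.findall("text"):
                entry = ET.SubElement(by_id[msg_id], "text", {XML_LANG: lang})
                entry.text = text.text
    return merged


merged = collate([ENGLISH, FRENCH])
print(ET.tostring(merged, encoding="unicode"))
```

Because the merge is driven by xml:id, each language can be translated and delivered separately without affecting the others.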
Why do this
There are two main reasons to avoid sending multilingual documents for localization:
During localization, if the source material is provided in the same document where the different translations should be placed, it will be difficult to do concurrent translations in all languages. Because each translation is very likely done by a different translator, the document will have to be broken down into separate parts and reconstructed later on. This will add processing time, increase cost and provide more opportunities for introducing errors.
Also, depending on its life cycle, such a multilingual document may contain existing translations, some up to date and some outdated (because the source material may have changed). In order to identify which parts need to be localized and which parts should be left alone, the document must then also contain custom information about localization state, which may or may not be supported by localization tools.
How to do this
Make sure the names of the elements and attributes of your schema reflect their functions, rather than one possible way of rendering their content.
In this example, it is bad design to use the element b for several purposes.
<doc> <p>To run the application, click the <b>Start</b> button.</p> <p><b>Make sure to enter your username</b>, and then press <b>OK</b>.</p> </doc>
Instead, define different elements based on their functions rather than a pre-supposed rendering.
<doc> <p>To run the application, click the <ui>Start</ui> button.</p> <p><emph>Make sure to enter your username</emph>, and then press <ui>OK</ui>.</p> </doc>
Also, if possible, avoid having element names which do not follow a fixed naming scheme (for example element names that serve also as identifiers).
In this example, it is bad design to have the names of the elements serve as text identifiers.
<strings> <INPUTPATH>Input path:</INPUTPATH> <HELP>Help</HELP> <OK>OK</OK> <CANCEL>Cancel</CANCEL> </strings>
Instead, use element names that follow a fixed naming scheme, and use xml:id to store the identifiers.
<strings> <str xml:id="INPUTPATH">Input path:</str> <str xml:id="HELP">Help</str> <str xml:id="OK">OK</str> <str xml:id="CANCEL">Cancel</str> </strings>
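If you have legacy files where element names serve as identifiers, a small conversion script can move them to the fixed naming scheme. The Python sketch below is one possible way to do it. The function name to_fixed_scheme is an assumption, and the script only handles the flat structure shown in the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LEGACY = """<strings>
  <INPUTPATH>Input path:</INPUTPATH>
  <HELP>Help</HELP>
  <OK>OK</OK>
  <CANCEL>Cancel</CANCEL>
</strings>"""


def to_fixed_scheme(xml_text):
    """Turn <NAME>text</NAME> children into <str xml:id="NAME">text</str>."""
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element(old_root.tag)
    for child in old_root:
        # The former element name becomes the identifier.
        entry = ET.SubElement(new_root, "str", {XML_ID: child.tag})
        entry.text = child.text
    return new_root


print(ET.tostring(to_fixed_scheme(LEGACY), encoding="unicode"))
```

After the conversion, a single selector such as //str is enough to address all the translatable strings.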
Why do this
The name of an element should indicate what its function is, not how its content will be presented, because presentation may vary depending on different factors such as language, media, or accessibility.
Using documents where elements or attributes do not follow a predictable naming pattern may cause problems when using XSLT-driven processes. It may also be an issue for translation tools. This is especially true if not all parts of the document are to be translated. In such cases the rules to distinguish the translatable nodes from the non-translatable ones are more difficult to specify.
A span-like element is an element that can be used to mark up arbitrary content and associate with it various properties such as directionality or language information. Examples of such elements are the span element in XHTML and the phrase element in DocBook.
How to do this
Make sure to define a span-like element in your content that will allow the authors to associate a delimited section with language-oriented properties such as directionality, or language information.
If your schema does not provide such an element, you should allow the its:span element to be used in any element that can contain text.
Why do this
Some properties of a text are applied using attributes, and therefore require the use of a neutral element whose sole function is to delimit the run of text to which the attributes apply. Directionality, terminology, localization notes, translatability, or language identification are examples of such properties.
How to do this
Make sure to document the internationalization and localization aspects of your schema by providing the set of relevant ITS rules in a single standalone ITS rule document.
Your ITS rules document should include the following information, when applicable:
The correspondence between any proprietary mechanism you have to specify the language of content and xml:lang (See Best Practice 1: Providing xml:lang to specify natural language content).
The correspondence between any proprietary mechanism you have to indicate text directionality and its:dir (See Best Practice 2: Providing a way to specify text directionality).
What part of your markup has translatability rules different from the defaults (See Best Practice 4: Indicating which elements and attributes should be translated).
The correspondence between any proprietary mechanism you have to override translatability information and the ITS equivalent (See Best Practice 5: Providing a way to override translation information).
The list of elements that should be treated as "nested" or "within text" from a segmentation viewpoint (See Best Practice 6: Providing text segmentation-related information).
The correspondence between any proprietary mechanism you have to mark up ruby text and its:ruby (See Best Practice 7: Providing a way to specify ruby text).
What part of your markup holds notes for the localizers (See Best Practice 8: Providing a way to specify notes for localizers).
What part of your markup denotes terms and term-related information (See Best Practice 10: Identifying terminology-related elements).
Some examples of ITS rules documents for existing XML formats are shown in Section 5: ITS Applied to Existing Formats.
Why do this
Although some XML vocabularies are easy to understand or process, it is often helpful or necessary to provide explicit information about a given vocabulary. If such vocabulary is to be used in a multilingual context, it is of high importance to provide specific information such as which elements contain translatable content. This is needed because general information on purpose, general structure, and node types very often are not sufficient. In a way, this need for explicit information is related to the general good practice of documenting source code.
In XML it should come naturally to use a well-defined structured format to capture such information. With regard to information related to internationalization and translation, ITS rules documents are a good choice for the following reasons:
They cover many important aspects related to internationalization and translation.
They capture information precisely (for example selectors identify to which nodes a data category pertains).
They can be processed by ITS-aware applications.
They can easily be combined with additional structured information (e.g. related to version control, as shown in the example below).
This document shows how a set of ITS rules can be easily included along with some customized information.
<myFormatInfo xmlns:its="http://www.w3.org/2005/11/its"> <desc>ITS rules used by the Open University</desc> <hostVoc>http://www.example.com/ns/myFormat</hostVoc> <rulesId>98ECED99DF63D511B1250008C784EFB1</rulesId> <rulesVersion>v 1.81 2006/03/28 07:43:21</rulesVersion> <its:rules version="1.0"> <its:translateRule selector="//header" translate="no"/> <its:translateRule selector="//term" translate="no"/> <its:termRule selector="//term" term="yes"/> <its:withinTextRule withinText="yes" selector="//term|//b"/> </its:rules> </myFormatInfo>
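Because the rules sit in a well-defined namespace, a tool can pick them out of the surrounding custom markup without knowing anything about the wrapper format. The Python sketch below lists the rules found in a shortened copy of the document above. The function name list_rules is an assumption for this example.

```python
import xml.etree.ElementTree as ET

ITS_RULES = "{http://www.w3.org/2005/11/its}rules"

INFO = """<myFormatInfo xmlns:its="http://www.w3.org/2005/11/its">
  <desc>ITS rules used by the Open University</desc>
  <rulesVersion>v 1.81 2006/03/28 07:43:21</rulesVersion>
  <its:rules version="1.0">
    <its:translateRule selector="//header" translate="no"/>
    <its:translateRule selector="//term" translate="no"/>
    <its:termRule selector="//term" term="yes"/>
    <its:withinTextRule withinText="yes" selector="//term|//b"/>
  </its:rules>
</myFormatInfo>"""


def list_rules(xml_text):
    """Return (rule name, selector) pairs from the first its:rules child."""
    root = ET.fromstring(xml_text)
    rules = root.find(ITS_RULES)
    found = []
    if rules is None:
        return found
    for rule in rules:
        # Tags look like "{namespace}localName"; keep the local name only.
        local_name = rule.tag.split("}", 1)[1]
        found.append((local_name, rule.get("selector")))
    return found


for name, selector in list_rules(INFO):
    print(f"{name}: {selector}")
```

The custom elements such as desc and rulesVersion are simply ignored by this kind of processing.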
Authors of XML content should consider the following best practices:
Best Practice | Summary |
---|---|
Best Practice 16: Specifying the language of the content | Use xml:lang (or its equivalent in your schema) on the root element of the document, and, if needed, on each element for which the language content is different. |
Best Practice 17: Specifying text directionality if needed | By default the text directionality in an XML document is assumed to be left-to-right. Use its:dir (or its equivalent in your schema) on each element for which the text directionality is different from its parent. |
Best Practice 18: Overriding translatability information if needed | Use its:translate (or its equivalent in your schema) on each element for which the translatability property is different from its parent. |
Best Practice 19: Assigning unique identifiers to elements with translatable content | Use xml:id (or its equivalent in your schema) on each element that can be uniquely identified. If possible, use globally unique and persistent values as identifiers. |
Best Practice 20: Avoiding CDATA sections when possible | Avoid using CDATA notation in translatable XML content. |
Best Practice 21: Providing notes for localizers | Use its:locNote , its:locNoteType and its:locNoteRef (or their equivalent in your schema) to provide comments and notes to the localizer. |
Best Practice 22: Ensuring that any inserted text is context-independent | Make sure any piece of inserted text is grammatically independent of its surrounding context. |
Best Practice 23: Identifying terms | Use its:term and its:termInfoRef (or their equivalent in your schema) to mark terms and supply term-related information. |
Best Practice 24: Avoiding including markup in escape form | Avoid storing XML or HTML markup as text content. |
A number of these practices can be followed only when the XML application has been internationalized properly using the design guidelines in Section 2: When Designing an XML Application.
How to do this
Use xml:lang (or its equivalent in your schema) on the root element of the document, and on each element where the language of the content changes. The elements without declaration inherit the language information from their parents. The attribute values are in the same language as the element where they are declared.
Your schema should provide the xml:lang attribute (or an equivalent mechanism). See Best Practice 1: Providing xml:lang to specify natural language content for more information.
Make sure the values of xml:lang conform to Tags for Identifying Languages [BCP 47].
In this example, the main content of the document is in English, while a short citation in the q element is identified as being in French using xml:lang set to fr.
<document xml:lang="en"> <para>The motto of Québec is the short phrase: <q xml:lang="fr">Je me souviens</q>. It is chiseled on the front of the Parliament Building.</para> </document>
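The inheritance of language information can be illustrated with a short Python sketch that computes the effective language of each element in the example above. The function name effective_languages is an assumption for this illustration.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<document xml:lang="en">
  <para>The motto of Québec is the short phrase:
  <q xml:lang="fr">Je me souviens</q>. It is chiseled on the front of
  the Parliament Building.</para>
</document>"""


def effective_languages(xml_text):
    """Return (element name, language) pairs in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(element, inherited):
        # A local xml:lang wins; otherwise the parent's value applies.
        lang = element.get(XML_LANG, inherited)
        result.append((element.tag, lang))
        for child in element:
            walk(child, lang)

    walk(root, None)
    return result


for name, lang in effective_languages(DOC):
    print(f"{name}: {lang}")
```

The para element carries no declaration of its own and therefore takes the value en from the root element, while q overrides it with fr.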
If the schema you are using does not have provision for xml:lang, use the equivalent attribute.
In this example, the schema for this document type has a non-standard way to specify language: a code attribute. The author should use that mechanism, not xml:lang. This is possible because the developer of the stringList document type is providing, along with the schema, an ITS rules document (shown below) in which code is declared as an equivalent of xml:lang for the lang element.
Note: This example is a multilingual document, which has its own set of issues as described in Best Practice 12: Using multilingual documents with caution.
<stringList> <msg id="connected"> <lang code="cs">Jste připojeni k Internetu.</lang> <lang code="de">Sie sind an das Netz angeschlossen.</lang> <lang code="fr">Vous êtes connecté à la Toile.</lang> <lang code="it">Sei connesso al Web.</lang> <lang code="ja">インターネットに接続しました。</lang> <lang code="ko">웹에 연결되었습니다.</lang> <lang code="ru">Вы подключены к Интернету.</lang> </msg> </stringList>
This ITS rules document is provided by the developer of the stringList document type in compliance with Best Practice 1: Providing xml:lang to specify natural language content for existing schemas. Here the its:langRule element defines the code attribute of the lang element as an equivalent to xml:lang.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:langRule selector="//lang[@code]" langPointer="@code" /> </its:rules>
Note: In some cases, a change in language has implications for translation. For example, content in a different language may have to remain untranslated, or require specific handling. Such information could be provided to the localizer using its:translate or its:locNote (or their equivalents in your schema). For more details, see Best Practice 18: Overriding translatability information if needed and Best Practice 21: Providing notes for localizers.
Why do this
Knowing the language of the content is very important in many situations, including the following:
Selection of a proper font (e.g. for traditional or simplified Chinese).
Processing of the text for wrapping and hyphenation.
Providing spell-checking or grammar verification of the text.
Selecting proper automated text such as quotation marks or other punctuation signs.
Using the text with voice browsers.
How to do this
By default the text directionality in an XML document is assumed to be left-to-right. Use its:dir (or its equivalent in your schema) on each element where the directionality changes.
Your schema should provide its:dir (or an equivalent mechanism). See Best Practice 2: Providing a way to specify text directionality.
In this example, the attribute its:dir is used to specify the directionality of a right-to-left text run in a document that is by default left-to-right.
<text xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:version="1.0"> <body> <par>In Hebrew, the title <quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote> means <quote>Internationalization Activity, W3C</quote>.</par> </body> </text>
See also Example 3 for more information about source text display.
Why do this
Directional markup is needed for bidirectional scripts.
Language and directionality are distinct dimensions:
There is not necessarily a one-to-one match between a language and what directionality to use. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code az is relevant for either.
The values of inline directionality markup are not necessarily aligned with the values of markup about the language. For example, a part of a document might be declared as having right-to-left directionality, but there might be only a general language declaration for a left-to-right script language available, like fr.
Markup used to indicate directionality has values that indicate that the normal directionality should be overridden; it is not possible to indicate that using language related values.
CSS should not be used to define the semantics of elements.
In XML documents, using markup is more appropriate than using Unicode Bidi Embedding Controls.
How to do this
Use its:translate (or its equivalent in your schema) on each element for which the translatability property is different from the defaults set for your schema.
Your schema should provide its:translate (or an equivalent mechanism). See Best Practice 5: Providing a way to override translation information.
In the following document, the content of the par elements is normally translatable, but in this instance, the last par should remain in English. Using its:translate the author can set the given paragraph as not translatable.
Note that the author does not need to specify that the head element is not translatable because this is a setting defined for all documents of type myDoc by the ITS rules document provided by the developer along with the myDoc schema.
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <head> <lastRev>2007-10-23 041254Z</lastRev> <docID>1A454AE4-7EB8-4ed2-A58E-1EC7F75BB0D5</docID> </head> <par>To apply these terms to your library, attach the following notice. It is safest to attach it to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.</par> <par>The notice should read (preferably in English):</par> <par its:translate="no">This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This software is distributed as open source under LGPL.</par> </myDoc>
This ITS rules document is the one created by the developer of the myDoc document type (in implementing Best Practice 4: Indicating which elements and attributes should be translated). By default all elements are translatable, and no attribute value is. This ITS rules document overrides the default as follows:
Rule 1: The head element, and its children, are not translatable.
Rule 2: The alt attribute of any img element is translatable.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:translateRule selector="/myDoc/head" translate="no"/> <its:translateRule selector="//img/@alt" translate="yes"/> </its:rules>
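The interplay between defaults, global rules and local markup can be sketched in Python as follows, for element content only. This is a simplified reading of the ITS precedence order (local its:translate first, then global rules, then inheritance, then the default), with Rule 1 hard-coded. The function name translatable_text and the shortened sample document are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

DOC = """<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head><docID>1A454AE4</docID></head>
  <par>The notice should read (preferably in English):</par>
  <par its:translate="no">This library is free software.</par>
</myDoc>"""


def translatable_text(xml_text):
    """Collect the leading text of every translatable element."""
    root = ET.fromstring(xml_text)
    # Stand-in for <its:translateRule selector="/myDoc/head" translate="no"/>.
    not_translatable = root.findall("head") if root.tag == "myDoc" else []
    collected = []

    def walk(element, inherited):
        local = element.get(ITS_TRANSLATE)
        if local is not None:            # 1. local markup
            flag = local == "yes"
        elif element in not_translatable:  # 2. global rule
            flag = False
        else:                            # 3. inheritance (default: yes)
            flag = inherited
        if flag and element.text and element.text.strip():
            collected.append(element.text.strip())
        for child in element:
            walk(child, flag)

    walk(root, True)
    return collected


print(translatable_text(DOC))
```

Only the first par ends up in the list: docID inherits the setting of head, and the last par is excluded by its local its:translate attribute.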
To override translatability information for attributes, you have to use an its:translateRule element in the given document.
This document is of the same type as the one in Example 19 and uses the same ITS rules, therefore the alt attribute is normally translatable. Because in this specific document the images refer to a user interface that will not be translated (while the document will be), the author needs to override the rule that made any alt attribute translatable. This is done at the top of the document, using an its:translateRule element.
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <head> <lastRev>2007-11-12 234503Z</lastRev> <docID>D1EA7453-DC53-488a-B950-137BE0EF5253</docID> <its:rules> <!-- The UI is not translated. Do not translate the alt text that refer to any UI buttons --> <its:translateRule selector="//img/@alt" translate="no"/> </its:rules> </head> <par>Once you have selected your options, click the <img src="runBtn.png" alt="Run"/> button to start the process.</par> </myDoc>
Note: Authors should NOT use its:translate to tag single words or terms that (they think) should remain the same as the source language when translated into a given target language (e.g. loan-words). This type of decision is normally made during translation.
Authors may decide what is translatable, but not how to translate it.
In this document its:translate is used to mark up a proper name and two loan words in an attempt to indicate that they should not be translated. You should NOT do this.
<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <body> <p>Everything started when <span its:translate="no">Zebulon</span> discovered that he had a <span its:translate="no">doppelgänger</span> who was a serious baseball <span its:translate="no">aficionado</span>.</p> </body> </book>
One thing that may help the translator in this example is to mark up loan-words or any special words as terms, following Best Practice 23: Identifying terms.
Why do this
While any exception to the default translation rules for a given schema should be specified in a set of ITS rules provided with the schema (See Best Practice 4: Indicating which elements and attributes should be translated), there are cases where these general rules need to be overridden for specific elements, in specific documents. It is up to the author of the content to provide such overriding markup.
How to do this
Use unique identifiers as provided by your schema on each element where it can be useful for localization. If possible use globally unique and persistent values as identifiers.
Your schema should provide xml:id (or an equivalent mechanism). See Best Practice 9: Providing a way to specify unique identifiers.
Why do this
Providing unique identifiers can be very useful for change analysis, text tracking, and various other tasks often utilized during the authoring and the localization of documents.
Additional reasons are also listed in Best Practice 9: Providing a way to specify unique identifiers.
How to do this
Do not use CDATA sections in translatable content.
For example, in this document, part of the content is in a CDATA section. This prevents any additional tagging within the section.
<myData> <item course="12" page="2"> <title>Accessing the R&amp;D facilities</title> <body><![CDATA[The R&D facilities are located in the South wing of Building 12-W, in the East quarter of the section Q. IMPORTANT ==> These facilities are accessible only to personnel with Class Omega-45Q1 clearance.]]></body> </item> </myData>
Instead, use normal XML content. This allows you to tag the content as needed. For instance, here you can add some terminology markup.
<myData xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <item course="12" page="2"> <title>Accessing the R&amp;D facilities</title> <body>The R&amp;D facilities are located in the South wing of Building 12-W, in the East quarter of the section Q. IMPORTANT ==&gt; These facilities are accessible only to personnel with <span its:term="yes">Class Omega-45-Q1</span> clearance.</body> </item> </myData>
If the CDATA section encloses a large, self-contained block of data, such as a script or an XML example, you may be able to replace the section by some inclusion mechanism such as XInclude or XLink.
In SVG you can place a script directly into an SVG document and then you usually use CDATA sections to avoid escaping characters inside script source code.
<?xml version="1.0" encoding="utf-8"?> <svg width="6cm" height="5cm" viewBox="0 0 600 500" xmlns="http://www.w3.org/2000/svg" version="1.1"> <!-- Script is inlined and enclosed in CDATA section --> <script type="text/ecmascript"> <![CDATA[ function circle_click(evt) { var circle = evt.target; var currentRadius = circle.getAttribute("r"); if (currentRadius < 100) circle.setAttribute("r", currentRadius*2); else circle.setAttribute("r", currentRadius*0.5); } ]]> </script> <rect x="1" y="1" width="598" height="498" fill="none" stroke="blue"/> <circle onclick="circle_click(evt)" cx="300" cy="225" r="10" fill="red"/> <text x="300" y="480" font-family="Verdana" font-size="35" text-anchor="middle"> Click on circle to change its size </text> </svg>
Instead, you could use XLink to store the script in a separate file and reference it from the SVG document.
<?xml version="1.0" encoding="utf-8"?> <svg width="6cm" height="5cm" viewBox="0 0 600 500" xmlns="http://www.w3.org/2000/svg" version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink"> <!-- Script is included from external file --> <script type="text/ecmascript" xlink:href="animate.js"/> <rect x="1" y="1" width="598" height="498" fill="none" stroke="blue"/> <circle onclick="circle_click(evt)" cx="300" cy="225" r="10" fill="red"/> <text x="300" y="480" font-family="Verdana" font-size="35" text-anchor="middle"> Click on circle to change its size </text> </svg>
It is quite common to use CDATA sections to put examples of source code into XML documents. The following example shows how to do this using DocBook.
<?xml version="1.0" encoding="utf-8"?> <example xmlns="http://docbook.org/ns/docbook"> <title>Skeleton of XHTML page</title> <programlisting><![CDATA[<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <title>… page title goes here …</title> </head> <body> … page content goes here … </body> </html>]]></programlisting> </example>
Instead, you could use XInclude to store the example code in a separate file and include it at processing time. Note that you have to use parse="text" to treat the included file as plain text rather than markup.
<?xml version="1.0" encoding="utf-8"?> <example xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"> <title>Skeleton of XHTML page</title> <programlisting><xi:include href="EX-xhtml-skeleton.xhtml" parse="text" encoding="utf-8"/></programlisting> </example>
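The effect of parse="text" can be sketched with Python's standard library. This is only an illustration: the `FILES` dictionary and the `loader` function are hypothetical stand-ins for the external file, and the skeleton content is abbreviated.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOCBOOK = "{http://docbook.org/ns/docbook}"

HOST = """<example xmlns="http://docbook.org/ns/docbook"
         xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Skeleton of XHTML page</title>
  <programlisting><xi:include href="EX-xhtml-skeleton.xhtml" parse="text" encoding="utf-8"/></programlisting>
</example>"""

# Hypothetical in-memory replacement for the external file (assumption).
SKELETON = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'
    "<head><title>page title</title></head>\n"
    "<body>page content</body>\n"
    "</html>"
)
FILES = {"EX-xhtml-skeleton.xhtml": SKELETON}


def loader(href, parse, encoding=None):
    """Return the stored text for href instead of reading from disk."""
    return FILES[href]


root = ET.fromstring(HOST)
ElementInclude.include(root, loader=loader)

listing = root.find(DOCBOOK + "programlisting")
# The included file arrives as character data, not as child elements.
serialized = ET.tostring(listing, encoding="unicode")
```

Because the file is included as text, `listing` has no child elements, and serializing it escapes the angle brackets of the skeleton.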
If you must use CDATA sections:
Make sure to document the type of content, for example with an attribute set to the appropriate MIME-type. This may help tools to use a more appropriate parser to process the given content.
Aim at having the content well-formed. This will allow parsers to process it more easily.
Note: CDATA is often used to store text markup with HTML or XML tags, which is not recommended. See Best Practice 24: Avoiding including markup in escape form for more details.
Note: Using CDATA has no effect on whether white space is preserved by XML processors. To preserve white space, use the xml:space attribute with the value preserve.
Why do this
The use of CDATA sections prevents the insertion of markup for internationalization or localization purposes. For example, tags to denote change of directionality, or language, or to add localization notes, cannot be used within the content of CDATA.
Numeric character references and entity references are not supported in CDATA sections, which could lead to a possible loss of data if the document is converted from one encoding to another, or when translating.
Mixing content in CDATA sections and content not in CDATA sections in the same document causes more work when doing some tasks with non-XML-aware tools. For example, when searching for the text "R&D" the user has to search both for R&D (for the CDATA sections) and R&amp;D (for the normal content).
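The search problem described in the last point can be reproduced with a short sketch. The sample document is invented for illustration. A plain string search stands in for a non-XML-aware tool, and Python's ElementTree stands in for an XML-aware one.

```python
import xml.etree.ElementTree as ET

# Invented sample: one paragraph uses CDATA, the other uses normal content.
doc = "<doc><p><![CDATA[R&D budget]]></p><p>R&amp;D team</p></doc>"

# A non-XML-aware search sees the raw characters of the file.
raw_hits = doc.count("R&D")          # finds only the CDATA form
escaped_hits = doc.count("R&amp;D")  # finds only the escaped form

# An XML-aware tool sees the same character data in both paragraphs.
root = ET.fromstring(doc)
parsed_hits = sum("R&D" in (p.text or "") for p in root.iter("p"))
```

Each raw search finds one occurrence, while the parsed view finds both. This is why mixing the two forms costs extra work with plain-text tools.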
How to do this
Use its:locNote, its:locNoteType and its:locNoteRef (or their equivalents in your schema) to provide notes to the localizer.
This is especially important for content with inserted text where the translator will need context to translate more accurately.
Your schema should provide its:locNote, its:locNoteType, and its:locNoteRef (or equivalent mechanisms). See Best Practice 8: Providing a way to specify notes for localizers.
In this document two ITS local attributes are used to annotate an XSLT template:
its:locNoteRef is used to point to an explanation of the acronym RFID.
its:locNote is used to indicate what kind of value the element <xsl:value-of select="PNum"/> corresponds to.
Note: When working with XSLT, you need to decide whether the ITS markup should appear in the output or not, and you may have to use different markup accordingly. In this example, the ITS attributes do not appear in the output.
<?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <xsl:template match="/data"> <xsl:variable name="Lang" select="Lang"/> <xsl:variable name="EMail" select="EMail"/> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{$Lang}" lang="{$Lang}"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <title>Login</title> </head> <body> <p>Login Into Queztal-Systems</p> <form method="POST"> <table border="0" id="table2"> <tr><td>First, place your pass card in front of the reader to scan your <xsl:text its:locNoteRef="http://en.wikipedia.org/wiki/RFID">RFID</xsl:text>. When the light turns green, enter your password in the box below, and click Submit.</td></tr> <tr><td><input type="password" name="pword" size="25"/></td></tr> </table> <p><input type="submit" value="Submit" name="go"/></p> </form> <p>If you have difficulties login in, please call <xsl:value-of select="PNum" its:locNote="Toll-free phone number"/>, or send an email to <a href="mailto:{$EMail}"><xsl:value-of select="EMail"/></a>.</p> </body> </html> </xsl:template> </xsl:stylesheet>
Why do this
There are many reasons to provide information to localizers. You may want to:
Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used in the user interface.
Clarify ambiguity and show relationships between items sufficiently to allow correct translation. For example, in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.
Explain why text is not translated, point to text reuse, or describe the use of conditional text.
Indicate why a piece of text is emphasized (important, sarcastic, etc.)
Using XML comments for doing this may not be enough as they may get stripped out or ignored during the localization process.
Inserted text refers to any text that is marked by a placeholder in the source XML document and automatically inserted within text content when the document is processed.
[Ed. note: TODO: Yves to check with DITA folks about possible link to their BP(?)]
How to do this
Use inserted text only when the text is self-contained and does not affect its surrounding context. For example, titles and quotations are inserted text that, usually, would not cause problems.
Avoid using inserted text that has any effect or dependence on the context where it is inserted.
If you must insert text, use its:locNote or its:termInfoRef (or their equivalents in your schema) to provide the localizers with some context. See Best Practice 21: Providing notes for localizers and Best Practice 23: Identifying terms.
[Ed. note: TODO: Richard to re-work the examples.]
In this example, the element var is used in the first message to insert the name of a printer. In the second message, it is used to insert a filename. The its:locNote attribute provides a description of what each variable represents. This may help in deciding how to translate each message.
<strings xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:version="1.0"> <msg id="pmAdded"><var arg="0" its:locNote="Printer name"/> has been added to the list.</msg> <msg id="fmAdded"><var arg="0" its:locNote="Filename"/> has been added to the list.</msg> </strings>
This is a French translation of the document shown above. The context provided made it possible to disambiguate the variable and to produce a more accurate translation.
<strings xmlns:its="http://www.w3.org/2005/11/its" xml:lang="fr" its:version="1.0"> <msg id="pmAdded">L'imprimante '<var arg="0" its:locNote="Printer name"/>' a été ajoutée à la liste.</msg> <msg id="fmAdded"><var arg="0" its:locNote="Filename"/> a été ajouté à la liste.</msg> </strings>
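As a rough sketch of how a localization tool might present these notes to a translator, the following Python fragment lists the its:locNote values found on the var placeholders of each message. The function name `notes_by_message` is hypothetical, and only local its:locNote attributes are read; global rules are not handled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

SOURCE = """<strings xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:version="1.0">
 <msg id="pmAdded"><var arg="0" its:locNote="Printer name"/> has been added to the list.</msg>
 <msg id="fmAdded"><var arg="0" its:locNote="Filename"/> has been added to the list.</msg>
</strings>"""


def notes_by_message(xml_text):
    """Map each message id to the localization notes of its placeholders."""
    root = ET.fromstring(xml_text)
    notes = {}
    for msg in root.iter("msg"):
        # Attributes in a namespace are addressed as {namespace}localname.
        notes[msg.get("id")] = [
            var.get("{%s}locNote" % ITS_NS) for var in msg.iter("var")
        ]
    return notes


notes = notes_by_message(SOURCE)
```

With the notes attached to each message, the translator can see that the two otherwise identical English strings refer to different kinds of things.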
Why do this
Types of inserted text are for example:
Boilerplate text reused in different contexts.
Various parts of a compound sentence.
Variables replaced by their values during document processing.
The implementation of such text can be done in different ways in XML. Some of them are:
Using entity references.
Using XSLT processing.
Using XInclude mechanisms.
Using XLink mechanisms.
Using a custom mechanism specific to a given format (e.g. the conref attribute in [DITA 1.0]).
If not used properly, inserted text can cause serious (and sometimes unresolvable) problems during localization. Consider the following:
In this example, the author, working with [DITA 1.0], decided to reference a term in a termbase by using the conref mechanism. In this case, the term t123 in termbase.xml has the value "hydraulic lift".
<p>Using an <term conref="termbase.xml#t123"/>, raise the vehicle from the ground.</p>
At first glance this seems to work fine in English. However, such a construction has several problems:
You should not separate the article from the noun. If "hydraulic lift" is modified in the future and replaced by some other term, it may require an article "a" instead of "an".
The article/noun separation also causes trouble for the translators: Without any easy way to see the actual term when translating the paragraph, they may not be able to decide the gender of the article.
If it is used at the beginning of a sentence, the term would need to be capitalized.
The term is singular in the termbase, while it may need to be plural somewhere in the document.
In inflected languages the form required in the text may be different from the form stored in the termbase. For example, in Polish the term would be stored in its nominative form ("dźwignia hydrauliczna"), while it should be in its instrumental form once inserted in this context: "Używając dźwignię hydrauliczną podnieś pojazd z ziemi."
What constitutes a term depends on many factors specific to each organization and project. Terms may include for example names of features, programs, services, and so forth. They also may include words or expressions that are specific to the domain to which the content pertains, such as technical terms, or legal terms, and they may include terms that simply occur often and should be translated consistently.
How to do this
Use its:term and its:termInfoRef (or their equivalents in your schema) to mark terms and supply term-related information.
Your schema should provide its:term and its:termInfoRef (or equivalent mechanisms). See Best Practice 11: Providing a way to specify or override terminology-related information.
You should also override default terminology rules as needed.
In this document, terms are normally denoted with a term element. Following Best Practice 10: Identifying terminology-related elements, the developer of the schema has provided an ITS rules document that defines this property for term.
However, in this specific document, the author wants to indicate the following:
The content of any ui element should be seen as a term.
The text Vector Files in the title is a term.
In the first case, the author uses an its:termRule element in the header of the document to indicate that any ui element in this document is a term. This is more efficient than adding an attribute for each instance of ui in the body of the document.
In the second case, because the schema does not allow the element term to be used in title (an oversight of the developer), the author uses a simple span element with its:term and its:termInfoRef to associate Vector Files with its corresponding term information.
<myManual xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"> <head> <its:rules> <its:termRule selector="//ui" term="yes"/> </its:rules> <title>Generating <span its:term="yes" its:termInfoRef="#vFile">Vector Files</span></title> </head> <body> <par>Select the command <ui>Build Output Files</ui> from the <ui>Tasks</ui> menu to generate the final <term ref="vFile">vector files</term>.</par> </body> <extra> <terms> <termDef xml:id="vFile">A <emph>vector file</emph> is a binary document that contains the complete set of vectors needed to draw the background layer of a map.</termDef> </terms> </extra> </myManual>
This ITS rules document is the one created by the developer of the myManual document type (in implementing Best Practice 10: Identifying terminology-related elements). The following rule is provided:
Rule 1: Any term element is a term and its associated information is located in the element that is identified with the value stored in the ref attribute of term.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <its:termRule selector="//term" term="yes" termInfoRef="id(@ref)"/> </its:rules>
Why do this
If you do not indicate which words are terms of interest in the content, the translators will not know that these terms need to be translated consistently. Often, multiple translators are working on different files in a given project, and the way they choose to translate specific words can be inconsistent with the way that other translators have translated them. If important terms are marked in the content, the localization team can extract these terms before the content is translated, and pre-translate them in the form of a shared electronic dictionary. This ensures consistency of translation of important terms.
While markup denoting terms for a given schema level should be specified in a set of ITS rules provided with the schema (See Best Practice 10: Identifying terminology-related elements), there are cases where these general rules need to be overridden or complemented for specific elements, in specific documents. It is up to the author of the content to provide such overriding markup.
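The term extraction step mentioned above could look roughly like the following sketch, applied to a shortened version of the myManual example. It is not an ITS processor: the function `extract_terms` is hypothetical, it only understands selectors of the simple form `//name`, and it ignores its:termInfoRef and its:term="no".

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"

MANUAL = """<myManual xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
 <head>
  <its:rules><its:termRule selector="//ui" term="yes"/></its:rules>
  <title>Generating <span its:term="yes" its:termInfoRef="#vFile">Vector Files</span></title>
 </head>
 <body>
  <par>Select the command <ui>Build Output Files</ui> from the <ui>Tasks</ui>
  menu to generate the final <term ref="vFile">vector files</term>.</par>
 </body>
</myManual>"""


def extract_terms(doc_xml, external_selectors=()):
    """Collect the text of elements marked as terms, in document order."""
    root = ET.fromstring(doc_xml)
    selectors = list(external_selectors)
    selectors += [
        rule.get("selector")
        for rule in root.iter(ITS + "termRule")
        if rule.get("term") == "yes"
    ]
    # Assumption: only selectors such as "//ui" are supported in this sketch.
    names = {s[2:] for s in selectors if s.startswith("//") and "/" not in s[2:]}
    terms = []
    for elem in root.iter():
        if elem.tag in names or elem.get(ITS + "term") == "yes":
            terms.append("".join(elem.itertext()).strip())
    return terms


# "//term" stands for the selector of the external rules document.
terms = extract_terms(MANUAL, external_selectors=["//term"])
```

The resulting list combines the terms found through the schema-level rule, the document-level rule for ui, and the local its:term attribute on span.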
How to do this
If possible, use the XML namespace mechanism to store different vocabularies inside a single XML document.
In this document, the elements top and body both contain HTML markup coded as text. There is no easy way to make the distinction between the HTML markup and the HTML text content.
<pages> <row> <key>ENConvClasses</key> <top><span class="h1">Elibur Library</span> - Conversation Groups</top> <body><![CDATA[<p>These small discussion groups meet <b>weekly</b> and are for people learning English. Each group is led by a volunteer who is a native speaker of American English. Groups converse about books, articles, and other materials.</p> <p>Space is limited. Ask for availability to <a href="mailto:enconv@elibur-lib.com"> enconv@elibur-lib.com</a>.</p>]]></body> </row> </pages>
Instead, use the XML namespace mechanism. Here the content of top and body is now a mix of text and XHTML elements. This avoids any confusion between text and HTML tags.
<pages xmlns:h="http://www.w3.org/1999/xhtml"> <row> <key>ENConvClasses</key> <top><h:span class="h1">Elibur Library</h:span> - Conversation Groups</top> <body><h:p>These small discussion groups meet <h:b>weekly</h:b> and are for people learning English. Each group is led by a volunteer who is a native speaker of American English. Groups converse about books, articles, and other materials.</h:p> <h:p>Space is limited. Ask for availability to <h:a href="mailto:enconv@elibur-lib.com">enconv@elibur-lib.com</h:a>.</h:p></body> </row> </pages>
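The difference between the two approaches becomes visible as soon as the content is parsed. The sketch below uses shortened versions of the two body elements above and Python's ElementTree. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Markup stored as text inside a CDATA section.
ESCAPED = "<body><![CDATA[<p>These groups meet <b>weekly</b>.</p>]]></body>"

# The same content expressed with namespaced XHTML elements.
NAMESPACED = (
    '<body xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:p>These groups meet <h:b>weekly</h:b>.</h:p></body>"
)

escaped = ET.fromstring(ESCAPED)
namespaced = ET.fromstring(NAMESPACED)

# With CDATA, the parser sees a single string in which tags and text are mixed.
escaped_children = len(escaped)
escaped_text = escaped.text

# With namespaces, the parser separates the text runs from the markup.
text_runs = list(namespaced.itertext())
```

In the first case a tool would need a second parsing pass over `escaped_text` to find the translatable text. In the second case the text runs are directly available.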
Another alternative to using markup as text is to store it externally and include it into the document using a mechanism such as XInclude or XLink.
If you must include markup as text content:
Make sure to document the type of content, for example with an attribute set to the appropriate MIME-type. This may help tools to use a more appropriate parser to process the given content.
Aim at having the content well-formed. This will allow parsers to process it more easily.
Why do this
Storing marked up content has several drawbacks:
Any handling of such content is difficult because text cannot be separated from markup without extra processing.
Often, such content is put in CDATA sections, which has its own set of issues. See Best Practice 20: Avoiding CDATA sections when possible.
This section provides a set of generic techniques that are applicable to various guidelines, for example, how to add ITS attributes to different types of schemas, or how to optimize XPath expressions for the ITS selector attribute.
Whether they are external or embedded, there are a few things you should take into consideration when writing ITS rules.
Try to keep the number of nodes to be overridden to a minimum for better performance. For example, if most of a document should not be translated, it is better to set the root element to be non-translatable than to set all elements. The inheritance mechanism will have the same effect for a much lower computing cost.
Because a rule has precedence over the ones before, you want to start with the most general rules first and progressively override them as needed. Some rules may be more complex to take into account all the aspects of inheritance.
The order in which the rules are declared matters greatly. ITS defines an order of precedence to process the rules.
Within an its:rules element, rules go from the most general to the most specific. When two rules select the same nodes of a document, the last rule wins.
Be mindful of the inheritance properties of each data category; a table summarizes the type and scope of inheritance for each data category.
Remember also that inheritance does not override selection. For example:
In this document, the first rule sets all nodes as non-translatable, then the second rule sets all p elements as translatable, overriding the first rule for the selected nodes. But the b element is not part of the selection of the second rule and therefore keeps the original setting of non-translatable: only the text "Some text with " and the terminal "." will be translated.
<doc xmlns:its="http://www.w3.org/2005/11/its"> <head> <its:rules version="1.0"> <its:translateRule selector="//*" translate="no"/> <its:translateRule selector="//p" translate="yes"/> </its:rules> </head> <text> <data>Some data with <b>bolded parts</b>.</data> <p>Some text with <b>bolded words</b>.</p> </text> </doc>
If you change the selector of the first rule to selector="/doc", the non-translatable property is inherited by each child node of the doc element, and when the second rule is applied, the translate property is also applied to the child nodes of the p element, overriding the previous rule for the b element inside p. Therefore the translatable text is "Some text with bolded words."
You could also get the same effect by changing the selector of the second rule instead of the first rule, and explicitly selecting the nodes inside the p elements with the expression selector="//p/descendant-or-self::*".
In general it is usually better to let the inheritance propagate the rules, rather than explicitly select child elements. Such a method is also faster since fewer nodes are selected.
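The interplay between selection and inheritance discussed above can be simulated with a small sketch. It is a simplification, not an ITS implementation: the helper names are hypothetical, only the three selector shapes used in this discussion (`//*`, `//name` and `/name`) are recognized, and only the translate data category for elements is modelled.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<doc><text><data>Some data with <b>bolded parts</b>.</data>"
    "<p>Some text with <b>bolded words</b>.</p></text></doc>"
)


def matches(selector, elem, root):
    """Very small selector matcher (assumption: three shapes only)."""
    if selector == "//*":
        return True
    if selector.startswith("//"):
        return elem.tag == selector[2:]
    if selector.startswith("/"):
        return elem is root and elem.tag == selector[1:]
    raise ValueError("unsupported selector: " + selector)


def translatable_text(doc_xml, rules):
    """Return the translatable text runs for an ordered list of rules."""
    root = ET.fromstring(doc_xml)
    runs = []

    def walk(elem, inherited):
        value = inherited              # start from the inherited value
        for selector, translate in rules:
            if matches(selector, elem, root):
                value = translate      # selection wins; the last rule wins
        if value == "yes" and elem.text:
            runs.append(elem.text)
        for child in elem:
            walk(child, value)
            # Text after a child belongs to the current element.
            if value == "yes" and child.tail:
                runs.append(child.tail)

    walk(root, "yes")                  # the default for elements is "yes"
    return runs


select_all = translatable_text(DOC, [("//*", "no"), ("//p", "yes")])
select_root = translatable_text(DOC, [("/doc", "no"), ("//p", "yes")])
```

With the `//*` selector, the b element is selected explicitly and stays non-translatable. With `/doc`, it is not selected by any rule and inherits the value of its parent p.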
When writing rules for documents that use XML namespaces you must make sure to declare the namespaces, and to use the relevant prefixes in the different XPath expressions.
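The same requirement applies to any XPath-based tool. As an illustration only, the following sketch uses the limited XPath support of Python's ElementTree on a shortened XHTML document. The prefix `h` is an arbitrary choice, as it is in an ITS rules document.

```python
import xml.etree.ElementTree as ET

XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h1>Test of ITS on <span class="term">XHTML</span></h1>'
    "</body></html>"
)
root = ET.fromstring(XHTML)

# Without a prefix, the path looks for elements in no namespace.
unprefixed = root.findall(".//span[@class='term']")

# With a declared prefix, the path matches the XHTML elements.
namespaces = {"h": "http://www.w3.org/1999/xhtml"}
prefixed = root.findall(".//h:span[@class='term']", namespaces)
```

The unprefixed expression selects nothing because the span element is in the XHTML namespace. The prefixed expression selects it.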
ITS uses XPath expressions in several contexts to identify nodes. The most prominent contexts are selectors, and pointer attributes such as:
<its:translateRule selector="//term" translate="no"/>
or
<its:locNoteRule locNoteType="description" selector="//msg/data" locNotePointer="../notes"/>
When writing ITS-related XPath expressions like the ones above, the following should be considered:
ITS XPath expressions pertain to XPath 1.0 or its successor
The values of ITS selector attributes are XPath absolute location paths
The values of ITS pointer attributes are XPath relative location paths. The ITS pointer attributes are: locNotePointer, locNoteRefPointer, its:termInfoPointer, its:termInfoRefPointer, its:rubyPointer, its:rtPointer, its:rpPointer, its:rbcPointer, its:rtcPointer, its:rbspanPointer, and its:langPointer.
In environments where XSLT is used to process ITS-related XPath expressions, it is important to know about the subset of XPath which is termed "XSLT patterns" (See the note in the section Global Approach of the ITS Specification). Using only XSLT patterns in ITS selector attributes helps to avoid issues which may arise with respect to the match attribute in XSLT template elements.
In addition to these general aspects, best practices related to writing XPath expressions should be taken into account (See for example the XPath tutorial http://www.zvon.org/xxl/XPathTutorial/General/examples.html).
This example shows how to add an attribute (here xml:lang) to an existing document type, namely to the para element.
Note that this example only shows a few ways of adding attributes. There are many others, depending on the schema language and the modularization techniques used in the existing schema.
xml:lang in XML Schema
Import the xml.xsd file in your schema and use references to xml:lang in your element declarations.
To include the xml:lang attribute in your XML Schema document, import the W3C xml.xsd schema in your own schema using the xs:import element.
Importing the xml:lang declaration in XML Schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- Import for xml:lang and xml:space --> <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/> ...
Once the xml.xsd schema is imported, you can use the reference to xml:lang in any of your element declarations.
Using xml:lang in XML Schema.
... <xs:element name="para"> <xs:complexType> <xs:sequence maxOccurs="unbounded"> ... </xs:sequence> <xs:attribute ref="xml:lang" use="optional"/> </xs:complexType> </xs:element> ...
xml:lang in RELAX NG
Declare xml:lang directly in your schema.
There is no existing declaration or standardized location for a schema fragment that defines the xml:lang attribute. You have to declare xml:lang directly in your schema as a choice between the XML Schema language datatype and an empty value.
Declaration of xml:lang in RELAX NG
<element name="para" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> <attribute name="xml:lang"> <choice> <data type="language"/> <value></value> </choice> </attribute> ... </element>
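To illustrate which values such a declaration accepts, here is a rough Python approximation. It assumes the lexical pattern given for the language datatype in XML Schema 1.0 Datatypes. The function name `is_valid_xml_lang` is hypothetical, and the white space normalization that a real validator applies before checking the value is not reproduced.

```python
import re

# Lexical pattern of the XML Schema 1.0 "language" datatype (assumption:
# the pattern [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* from XML Schema Part 2).
LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_valid_xml_lang(value):
    """Accept an empty value or a value matching the language pattern."""
    return value == "" or LANGUAGE.fullmatch(value) is not None


checks = {v: is_valid_xml_lang(v) for v in ["", "en", "fr-CA", "en_US", "x y"]}
```

The empty value is accepted explicitly because it is the second branch of the choice in the RELAX NG declaration.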
This section presents several examples of how ITS can be used to enhance the internationalization readiness of some well-known XML document types. These examples are only illustrative and may have to be adapted to fit the need of each specific user.
Two topics are covered for each format:
How should ITS be integrated in specific markup schemas? For example, as for XHTML, it is helpful for the interoperability of ITS implementations to specify that the ITS rules element will always be part of the content model of the head element.
How should ITS data categories be associated with existing markup declarations in a schema, which fulfill identical or overlapping purposes? For example, [DITA 1.0] already has an attribute to indicate translatability of text, but without a mechanism for selection of information in documents and schemas.
The following XML vocabularies are discussed:
[XHTML 1.0] is a reformulation of the three HTML 4 document types as applications of XML 1.0. HTML is an SGML (Standard Generalized Markup Language) application, widely regarded as the standard publishing language of the World Wide Web.
In XHTML 1.0, the XHTML namespace may be used with other XML namespaces as per [XML Names], but such documents are not strictly conforming XHTML 1.0 documents in the sense of XHTML 1.0.
An example of such a non-conformant XHTML 1.0 document is as follows.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:its="http://www.w3.org/2005/11/its" lang="en" xml:lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="keywords" content="ITS example, XHTML translation" /> <its:rules version="1.0" xmlns:h="http://www.w3.org/1999/xhtml"> <its:translateRule selector="//h:meta[@name='keywords']/@content" translate="yes" /> <its:termRule selector="//h:span[@class='term']" term="yes" /> </its:rules> <title>ITS Working Group</title> </head> <body> <h1>Test of ITS on <span class="term">XHTML</span></h1> <p>Some text to translate.</p> <p its:translate="no">Some text not to translate.</p> </body> </html>
There are three ways to use ITS with XHTML and keep the XHTML document conformant:
To use [XHTMLMod1.1]. See Section 5.1.2: Using XHTML Modularization 1.1 for the Definition of ITS for details.
To use external ITS global rules (as shown below). Even local information within the document that would be handled by ITS attributes can be set indirectly.
To use NVDL. See Section 5.1.3: Using NVDL to integrate ITS into XHTML for details.
These rules illustrate some of the ITS data categories you can associate to specific XHTML markup. The first its:translateRule indicates that the attribute content of the meta element should be translated if the attribute name is set to "keywords". The second its:translateRule indicates that no p with a class="notrans" should be translated. And the its:termRule indicates that any span element with class="term" is a term.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0" xmlns:h="http://www.w3.org/1999/xhtml"> <its:translateRule selector="//h:meta[@name='keywords']/@content" translate="yes" /> <its:translateRule selector="//h:p[@class='notrans']" translate="no" /> <its:termRule selector="//h:span[@class='term']" term="yes" /> </its:rules>
The corresponding document:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="keywords" content="ITS example, XHTML translation" /> <title>ITS Working Group</title> </head> <body> <h1>Test of ITS on <span class="term">XHTML</span></h1> <p>Some text to translate.</p> <p class="notrans">Some text not to translate.</p> </body> </html>
This section describes how to use [XHTMLMod1.1] for the definition of ITS. It first defines an ITS abstract module which is then implemented in the format of XML Schema. The module is meant to be integrated in existing or new schemas which rely on [XHTMLMod1.1].
The following is the abstract definition of the elements for global ITS markup, which is consistent with the XHTML Modularization framework [XHTMLMod1.1]. Further definitions of XHTML abstract modules can be found in [XHTMLMod1.1].
Note that this definition does not contain the ruby element and the dir attribute, since these are already available in XHTML. Such existing markup should be associated with ITS data categories using the its:rules element. See Section 5.1.4: Associating existing XHTML markup with ITS.
Elements | Attributes | Minimal Content Model |
---|---|---|
rules | version (CDATA), xlink:href (URI), xlink:type ("simple") | ( translateRule | locNoteRule | termRule | dirRule | rubyRule | langRule | withinTextRule )* |
translateRule | Selector, translate ("yes"|"no") | EMPTY |
locNoteRule | Selector, locNotePointer (CDATA), locNoteType ("alert"| "description"*), locNoteRef (URI), locNoteRefPointer (CDATA) | locNote? |
locNote | translate ("yes"|"no"), locNote (CDATA), locNoteType ( "alert" | "description"* ), locNoteRef (URI), termInfoRef ( URI ), term ( "yes" | "no" ), dir ( "ltr" | "rtl" | "lro" | "rlo" ) | (PCDATA | ruby)* |
termRule | Selector, term ( "yes" | "no" ), termInfoRef ( URI ), termInfoRefPointer ( CDATA), termInfoPointer ( CDATA ) | EMPTY |
dirRule | Selector, dir ("ltr" | "rtl" | "lro" | "rlo") | EMPTY |
rubyRule | Selector, rubyPointer (CDATA), rtPointer (CDATA), rpPointer (CDATA), rbcPointer (CDATA), rtcPointer (CDATA), rbspanPointer (CDATA) | rubyText |
rubyText | translate ("yes"|"no"), locNote (CDATA), locNoteType ("alert"|"description"*), locNoteRef (URI), term ("yes" | "no"), termInfoRef (CDATA), dir ("ltr" | "rtl" | "lro" | "rlo" ), rbspan (CDATA) | PCDATA |
langRule | Selector, langPointer (CDATA) | EMPTY |
withinTextRule | Selector, withinText ("yes"|"no"|"nested") | EMPTY |
The following are the abstract definitions of two attribute groups: the selector attribute used within global rules, and ITS attributes to be used locally. Again these definitions make use of [XHTMLMod1.1].
Collection | Attributes in Collection |
---|---|
Selector | selector (CDATA) |
ITSLocal | translate ("yes"|"no"), locNote (CDATA), locNoteType ("alert"|"description"*), locNoteRef (URI), termInfoRef (URI), term ("yes" | "no") |
The following schema contains the implementation of the abstract markup module in XML Schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/2005/11/its" xmlns:its="http://www.w3.org/2005/11/its" xmlns:h="http://www.w3.org/1999/xhtml" elementFormDefault="qualified" xmlns:xlink="http://www.w3.org/1999/xlink"> <xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/> <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="xhtml-schemas/xhtml-ruby-1.xsd"/> <xs:simpleType name="translate.type"> <xs:restriction base="xs:string"> <xs:enumeration value="yes"/> <xs:enumeration value="no"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="term.type"> <xs:restriction base="xs:string"> <xs:enumeration value="yes"/> <xs:enumeration value="no"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="locNoteType.type"> <xs:restriction base="xs:string"> <xs:enumeration value="alert"/> <xs:enumeration value="description"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="dir.type"> <xs:restriction base="xs:string"> <xs:enumeration value="ltr"/> <xs:enumeration value="rtl"/> <xs:enumeration value="lro"/> <xs:enumeration value="rlo"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="withinText.type"> <xs:restriction base="xs:string"> <xs:enumeration value="yes"/> <xs:enumeration value="no"/> <xs:enumeration value="nested"/> </xs:restriction> </xs:simpleType> <xs:attributeGroup name="its.Selector.attlist"> <xs:attribute name="selector" type="xs:string" use="required"/> </xs:attributeGroup> <xs:attributeGroup name="its.ITSLocal.attlist"> <xs:attribute name="translate" form="qualified" use="optional" type="its:translate.type"/> <xs:attribute name="locNote" type="xs:string" form="qualified" use="optional"/> <xs:attribute name="locNoteType" form="qualified" use="optional" type="its:locNoteType.type"/> <xs:attribute name="locNoteRef" type="xs:anyURI" form="qualified" use="optional"/> <xs:attribute name="termInfoRef" type="xs:string" form="qualified" use="optional"/> <xs:attribute
name="term" type="its:term.type" form="qualified" use="optional"/> </xs:attributeGroup> <xs:element name="rules" type="its:rules.type"/> <xs:complexType name="rules.type" mixed="false"> <xs:choice minOccurs="0" maxOccurs="unbounded"> <xs:element ref="its:translateRule"/> <xs:element ref="its:locNoteRule"/> <xs:element ref="its:termRule"/> <xs:element ref="its:dirRule"/> <xs:element ref="its:rubyRule"/> <xs:element ref="its:langRule"/> <xs:element ref="its:withinTextRule"/> </xs:choice> <xs:attributeGroup ref="its:rules.attlist"/> </xs:complexType> <xs:attributeGroup name="rules.attlist"> <xs:attribute name="version" use="required" type="xs:string"/> <xs:attribute ref="xlink:href" use="optional"/> <xs:attribute ref="xlink:type" use="optional"/> </xs:attributeGroup> <xs:element name="translateRule" type="its:translateRule.type"/> <xs:complexType name="translateRule.type"> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="translate" use="required" type="its:translate.type"/> </xs:complexType> <xs:element name="locNoteRule" type="its:locNoteRule.type"/> <xs:complexType name="locNoteRule.type"> <xs:sequence minOccurs="0" maxOccurs="1"> <xs:element ref="its:locNote"/> </xs:sequence> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="locNotePointer" type="xs:string" use="optional"/> <xs:attribute name="locNoteType" use="required" type="its:locNoteType.type"/> <xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/> <xs:attribute name="locNoteRefPointer" type="xs:string" use="optional"/> </xs:complexType> <xs:element name="locNote" type="its:locNote.type"/> <xs:complexType name="locNote.type" mixed="true"> <xs:attribute name="translate" use="optional" type="its:translate.type"/> <xs:attribute name="locNote" type="xs:string" use="optional"/> <xs:attribute name="locNoteType" use="optional" type="its:locNoteType.type"/> <xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/> <xs:attribute name="termInfoRef" 
type="xs:anyURI" use="optional"/> <xs:attribute name="term" use="optional" type="its:term.type"/> <xs:attribute name="dir" use="optional" type="its:dir.type"/> </xs:complexType> <xs:element name="termRule" type="its:termRule.type"/> <xs:complexType name="termRule.type"> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="term" type="its:term.type" use="required"/> <xs:attribute name="termInfoRef" type="xs:anyURI" use="optional"/> <xs:attribute name="termInfoRefPointer" type="xs:string" use="optional"/> <xs:attribute name="termInfoPointer" type="xs:string" use="optional"/> </xs:complexType> <xs:element name="dirRule" type="its:dirRule.type"/> <xs:complexType name="dirRule.type"> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="dir" type="its:dir.type" use="required"/> </xs:complexType> <xs:element name="rubyRule" type="its:rubyRule.type"/> <xs:complexType name="rubyRule.type"> <xs:sequence> <xs:element ref="its:rubyText"/> </xs:sequence> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="rubyPointer" type="xs:string" use="optional"/> <xs:attribute name="rtPointer" type="xs:string" use="optional"/> <xs:attribute name="rpPointer" type="xs:string" use="optional"/> <xs:attribute name="rbcPointer" type="xs:string" use="optional"/> <xs:attribute name="rtcPointer" type="xs:string" use="optional"/> <xs:attribute name="rbspanPointer" type="xs:string" use="optional"/> </xs:complexType> <xs:element name="rubyText" type="its:rubyText.type"/> <xs:complexType name="rubyText.type" mixed="true"> <xs:attribute name="translate" type="its:translate.type" use="optional"/> <xs:attribute name="locNote" type="xs:string" use="optional"/> <xs:attribute name="locNoteType" type="its:locNoteType.type" use="optional"/> <xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/> <xs:attribute name="term" type="its:term.type" use="optional"/> <xs:attribute name="termInfoRef" type="xs:string" use="optional"/> <xs:attribute name="dir" type="its:dir.type" use="optional"/>
<xs:attribute name="rbspan" type="xs:string" use="optional"/> </xs:complexType> <xs:element name="langRule" type="its:langRule.type"/> <xs:complexType name="langRule.type"> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="langPointer" type="xs:string" use="required"/> </xs:complexType> <xs:element name="withinTextRule" type="its:withinTextRule.type"/> <xs:complexType name="withinTextRule.type"> <xs:attributeGroup ref="its:its.Selector.attlist"/> <xs:attribute name="withinText" type="its:withinText.type"/> </xs:complexType> </xs:schema>
The following is a driver file which can be used to invoke the schema above.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xhtml="http://www.w3.org/1999/xhtml" targetNamespace="http://www.w3.org/1999/xhtml" xmlns:its="http://www.w3.org/2005/11/its" xmlns="http://www.w3.org/1999/xhtml" blockDefault="#all"> <xs:annotation> <xs:documentation> This is the XML Schema Driver for new Document Type XHTML Basic 1.0 + ITS $Id: Overview.html,v 1.8 2018/10/09 13:16:41 denis Exp $ </xs:documentation> <xs:documentation source="http://www.w3.org/TR/xml-i18n-bp/#integration-its-xhtmlmod"/> </xs:annotation> <xs:import namespace="http://www.w3.org/2005/11/its" schemaLocation="its-module.xsd"/> <xs:redefine schemaLocation="xhtml-schemas/xhtml-basic10.xsd"> <xs:group name="HeadOpts.mix"> <xs:choice> <xs:group ref="HeadOpts.mix"/> <xs:element ref="its:rules"/> </xs:choice> </xs:group> <xs:attributeGroup name="Common.attrib"> <xs:attributeGroup ref="Common.attrib"/> <xs:attributeGroup ref="its:its.ITSLocal.attlist"/> </xs:attributeGroup> </xs:redefine> </xs:schema>
The file below is an instance which can be validated against this schema.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:its="http://www.w3.org/2005/11/its" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml-plus-its.xsd"> <head> <title> </title> <its:rules version="1.0"> <its:locNoteRule locNoteType="alert" selector="..." locNoteRef="..."> </its:locNoteRule> <its:locNoteRule locNoteType="alert" selector="..."> <its:locNote> </its:locNote> </its:locNoteRule> <its:termRule selector="..." term="yes"/> </its:rules> </head> <body> <h3> </h3> <table> <tr> <td> </td> </tr> </table> <ul> <li its:locNote="..." its:translate="no"> </li> </ul> </body> </html>
This schema conforms to Conformance Type 1.
The schema adds the following ITS element into XHTML schema:
The schema adds the following local ITS attributes into XHTML schema:
As the previous section shows, integrating ITS into an existing vocabulary using only the modularization and customization features of a particular schema language can be quite laborious. In such situations you can use the NVDL schema language instead.
In NVDL you can create a kind of "meta-schema" which defines how to combine existing schemas. An NVDL schema can be used in the same way as schemas written in other languages such as DTD, RELAX NG or XML Schema: you can use it to validate your document instances, and an XML editor can use it to guide you while you edit documents. The NVDL.org site provides additional information about the language, including a list of applications that support NVDL.
Adding ITS to XHTML consists of allowing the its:rules element inside the head element and allowing the ITS local attributes on every existing XHTML element.
<?xml version="1.0" encoding="UTF-8"?> <rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="xhtml"> <!-- Validation starts here --> <mode name="xhtml"> <!-- XHTML elements are validated against XHTML schema --> <namespace ns="http://www.w3.org/1999/xhtml"> <validate schema="../xhtml-schemas/xhtml11.xsd"> <!-- Inside head element its:rules element is allowed --> <context path="head" useMode="its-rules"/> </validate> </namespace> <!-- ITS attributes are validated against separate schema --> <namespace ns="http://www.w3.org/2005/11/its" match="attributes"> <validate schema="its-attributes-for-xhtml.rng"/> </namespace> </mode> <!-- Handling of ITS markup in head is different because its:rules should be allowed --> <mode name="its-rules"> <namespace ns="http://www.w3.org/2005/11/its"> <validate schema="its-rules.rng"/> </namespace> <namespace ns="http://www.w3.org/2005/11/its" match="attributes"> <validate schema="its-attributes-for-xhtml.rng"/> </namespace> </mode> </rules>
The NVDL script references three schemas: one for XHTML and two supplementary schemas for ITS. The first supplementary schema defines the local attributes which are needed for XHTML.
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <!-- Include schema with all ITS building blocks --> <include href="its.rng"/> <!-- Pull out only definitions of ITS attributes which are useful for XHTML --> <start> <group> <ref name="its-att.translate.attributes"/> <ref name="its-att.locNote.attributes"/> <ref name="its-att.term.attributes"/> <optional> <ref name="its-att.version.attributes"/> </optional> </group> </start> </grammar>
The second supplementary schema defines the its:rules element.
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <!-- Include schema with all ITS building blocks --> <include href="its.rng"/> <!-- Pull out only definition of its:rules element --> <start> <ref name="its-rules"/> </start> </grammar>
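As an illustration, a document of the following shape is what the NVDL script above is meant to accept: its:rules appears inside head, and a local ITS attribute appears on an ordinary XHTML element. This is only a minimal sketch and has not been run through an NVDL validator; the title, the text content and the selector are placeholders chosen for the example.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:its="http://www.w3.org/2005/11/its"
      xmlns:h="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample page</title>
    <!-- Inside head the "its-rules" mode applies, so this element
         is checked against its-rules.rng -->
    <its:rules version="1.0">
      <its:translateRule selector="//h:code" translate="no"/>
    </its:rules>
  </head>
  <body>
    <!-- The ITS attribute is checked against its-attributes-for-xhtml.rng;
         the p element itself is checked against the XHTML schema -->
    <p its:translate="no">Non-translatable paragraph</p>
    <p>Use the <code>unlink()</code> function to delete a file.</p>
  </body>
</html>
```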
This schema conforms to Conformance Type 1.
The schema adds the following ITS element into XHTML schema:
The schema adds the following local ITS attributes into XHTML schema:
A number of XHTML constructs implement the same semantics as some of the ITS data categories. In addition, some XHTML attributes are translatable, whereas under the ITS default settings for translatability attributes in XML documents are not translatable. These attributes need to be identified as translatable.
An external ITS rules element can summarize these relations. Because XHTML use is widespread and covers a large amount of legacy material, the rules defined here may not be optimal for everyone.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0" xmlns:h="http://www.w3.org/1999/xhtml"> <!-- special content. (See note 1) --> <its:translateRule selector="//h:script" translate="no"/> <its:translateRule selector="//h:style" translate="no"/> <!-- Normal translatable attributes --> <its:translateRule selector="//h:*/@abbr" translate="yes"/> <its:translateRule selector="//h:*/@accesskey" translate="yes"/> <its:translateRule selector="//h:*/@alt" translate="yes"/> <its:translateRule selector="//h:*/@prompt" translate="yes"/> <its:translateRule selector="//h:*/@standby" translate="yes"/> <its:translateRule selector="//h:*/@summary" translate="yes"/> <its:translateRule selector="//h:*/@title" translate="yes"/> <!-- The input element (Important: See note 2) --> <its:translateRule selector="//h:input/@value" translate="yes"/> <its:translateRule selector="//h:input[@type='hidden']/@value" translate="no"/> <!-- Non-translatable element (See note 3) --> <its:translateRule selector="//h:del" translate="no"/> <its:translateRule selector="//h:del/descendant-or-self::*/@*" translate="no"/> <!-- Often-used translatable meta content. 
--> <its:translateRule selector="//h:meta[@name='keywords']/@content" translate="yes"/> <its:translateRule selector="//h:meta[@name='description']/@content" translate="yes"/> <!-- Possible term (Important: See note 4) --> <its:termRule selector="//h:dt" term="yes"/> <!-- Bidirectional information --> <its:dirRule selector="//h:*[@dir='ltr']" dir="ltr"/> <its:dirRule selector="//h:*[@dir='rtl']" dir="rtl"/> <its:dirRule selector="//h:bdo[@dir='ltr']" dir="lro"/> <its:dirRule selector="//h:bdo[@dir='rtl']" dir="rlo"/> <!-- Elements within text --> <its:withinTextRule withinText="yes" selector="//h:abbr | //h:acronym | //h:br | //h:cite | //h:code | //h:dfn | //h:kbd | //h:q | //h:samp | //h:span | //h:strong | //h:var | //h:b | //h:em | //h:big | //h:hr | //h:i | //h:small | //h:sub | //h:sup | //h:tt | //h:del | //h:ins | //h:bdo | //h:img | //h:a | //h:font | //h:center | //h:s | //h:strike | //h:u | //h:isindex" /> </its:rules>
Additional notes on these rules:
Note 1: The script and style elements may have translatable text, but their content needs to be parsed with a script filter and a CSS filter, respectively. Depending on the capabilities of your translation tools, you may want to leave these elements translatable.
Note 2: The value attribute of the input element may or may not be translatable depending on the way the element is used. Whether to select value as translatable needs to be decided depending on your own use.
Note 3: The del element indicates removed text and therefore, most often, would not be translatable. Because this element may contain elements with translatable attributes, such as img with an alt attribute, and because the scope of translatability does not include attributes, you need to: a) define this rule after the definition of the translatable attributes, and b) use the rule with selector="//h:del/descendant-or-self::*/@*" to override any possible translatable attribute within a del element or any of its descendants.
Note 4: The dt element is defined by HTML as a "definition term" and can therefore be seen as a candidate to be associated with the ITS Terminology data category. However, for historical reasons, this element has been used for many other purposes. Whether to select dt as a term needs to be decided depending on your own use.
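Notes 2 and 4 leave the decision to the content owner. One possible tailoring is sketched below; it is an illustration, not part of the rules above. It assumes that only button captions carry user-visible text in value, and that glossaries are marked with a class value of "glossary", which is a hypothetical convention invented for this example.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- Note 2: attributes are not translatable by default, so only the
       button-like input types are switched to translate="yes" -->
  <its:translateRule
      selector="//h:input[@type='submit' or @type='reset' or @type='button']/@value"
      translate="yes"/>
  <!-- Note 4: treat dt as a term only inside lists flagged as glossaries
       (class="glossary" is an assumed, site-specific convention) -->
  <its:termRule selector="//h:dl[@class='glossary']/h:dt" term="yes"/>
</its:rules>
```

Narrowing the selectors in this way avoids marking hidden field values or layout-only definition lists, at the cost of depending on conventions that each site has to define for itself.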
The Text Encoding Initiative [TEI] is intended for literary and linguistic material, and is most often used for digital editions of existing printed material. It is also suitable, however, for general purpose writing. The P5 release of the TEI consists of 23 modules which can be combined together as needed.
The TEI is maintained as a single ODD document, and customizations of it are also written as ODD documents. These are processed using XSLT style sheets to make a tailored user-level schema in XML DTD, XML Schema or RELAX NG.
The ITS additions involve two changes to TEI:
Allowing rules to appear in the TEI metadata section (the teiHeader).
Adding the ITS local attributes to the TEI global attribute set.
Both of these can be easily achieved using standard techniques in ODD.
The body of a TEI+ITS customization consists of a schemaSpec which lists the modules to be included (this example includes six common ones):
<schemaSpec ident="tei-its" start="TEI"> <moduleRef key="header"/> <moduleRef key="core"/> <moduleRef key="tei"/> <moduleRef key="textstructure"/> <moduleRef key="namesdates"/> <moduleRef key="msdescription"/> <!-- Etc. --> </schemaSpec>
In addition, we load the ITS schema (in its RELAX NG XML format, the language used by the TEI for expressing content models), and overload the definition of the TEI content class model.headerPart to include the ITS rules:
<moduleRef url="its.rng"> <content xmlns:rng="http://relaxng.org/ns/structure/1.0"> <rng:define name="model.headerPart" combine="choice"> <rng:ref name="rules"/> </rng:define> </content> </moduleRef>
The content class determines which elements are allowed as children of teiHeader. Lastly, we change the definition of the global attribute class att.global to reference the ITS local attributes (available from the ITS schema we loaded earlier):
<classSpec ident="att.global" type="atts" mode="change"> <attList> <attRef name="span.attributes"/> </attList> </classSpec>
When processed, this customization produces a schema which permits markup like this:
<TEI xmlns:its="http://www.w3.org/2005/11/its" xmlns="http://www.tei-c.org/ns/1.0"> <teiHeader> <fileDesc> <!-- details of the file --> </fileDesc> <rules xmlns="http://www.w3.org/2005/11/its" version="1.0" xmlns:t="http://www.tei-c.org/ns/1.0"> <translateRule translate="no" selector="//t:body/t:p/@*"/> <translateRule translate="yes" selector="//t:body/t:p"/> </rules> </teiHeader> <text> <body> <p rend="normal">Hello <hi>world</hi> </p> <p rend="special">Goodbye</p> <p its:translate="no">This must not be translated</p> </body> </text> </TEI>
In this example, a set of rule elements in the header provides global rules, and the body of the text performs a specific local override.
This schema conforms to Conformance Type 1.
The schema adds the following ITS element into TEI schema:
The schema adds the following local ITS attributes into TEI schema:
[XML Spec] is intended for W3C working drafts, notes, recommendations, and all other document types that fall under the category of technical reports. XML Spec is available in the formats of XML DTD, XML Schema and RELAX NG.
ITS has been integrated into xmlspec-i18n.dtd. This is a version of the XML DTD version 2.9 of XML Spec which already supplies various internationalization and localization related features. There is an attribute translate in xmlspec-i18n.dtd, which can be used for the same purposes as the ITS translate attribute. To keep them separate from the original XML Spec declarations, all additions are stored in two separate files, i18n-extensions.mod and i18n-elements.mod. Xmlspec-i18n.dtd is used within the W3C Internationalization Activity for the creation of technical reports.
For the integration of ITS, the following modifications to the xmlspec-i18n.dtd have been made:
A new entity <!ENTITY % its SYSTEM "its.dtd"> and the entity call %its; have been added to xmlspec-i18n.dtd.
The existing XML Spec entity %common.att; has been modified with two additional declarations: its:term and its:termInfoRef. The description of implementation information for terminology provides more information about these attributes. The ITS attributes its:translate, its:locNote, its:locNoteType and its:dir have not been added to the XML Spec DTD, since the DTD already provides markup with the same functionality. Users of XML Spec are encouraged to associate this markup with ITS; see Section 5.3.2: Associating existing XML Spec markup with ITS.
The XML Spec entity %header.mdl; contains the content model of the header element. The ITS element rules has been added as the last element to this content model. In this way, rules can be used inside an XML Spec document. The header element of the XML Spec DTD has been chosen as the place for rules, to avoid the impact of ITS markup on XML Spec markup.
The ITS element ruby has been added to the XML Spec entity %p.pcd.mix;. In this way it is possible to use ruby as an inline element.
This schema conforms to Conformance Type 1.
The schema adds the following ITS element into XML Spec schema:
The schema adds the following local ITS attributes into XML Spec schema:
As mentioned before, xmlspec-i18n.dtd has its own existing markup declarations for various internationalization and localization related purposes. In the original XML Spec 2.9 DTD, there is a term element which fulfills the same purpose as the ITS term attribute.
To associate such existing XML Spec and xmlspec-i18n.dtd related markup with ITS markup, the following rules element has been created.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!--The following rules are for xmlspec-i18n.dtd-->
  <its:termRule selector="//qterm" term="yes"/>
  <its:dirRule dir="ltr" selector="//*[@dir='ltr']"/>
  <its:dirRule dir="rtl" selector="//*[@dir='rtl']"/>
  <its:dirRule dir="lro" selector="//*[@dir='lro']"/>
  <its:dirRule dir="rlo" selector="//*[@dir='rlo']"/>
  <!-- locNotePointer is evaluated relative to the selected element -->
  <its:locNoteRule locNoteType="alert" locNotePointer="@locn-alert" selector="//*[@locn-alert]"/>
  <its:locNoteRule locNoteType="description" locNotePointer="@locn-note" selector="//*[@locn-note]"/>
  <its:translateRule translate="yes" selector="//*[@translate='yes']"/>
  <its:translateRule translate="no" selector="//*[@translate='no']"/>
  <!--This rule is for the original XML Spec DTD-->
  <its:termRule selector="//term" term="yes"/>
</its:rules>
Since neither XML Spec nor xmlspec-i18n.dtd defines a namespace, the mappings use XPath expressions with unqualified element and attribute names.
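To make the mapping concrete, the fragment below sketches XML Spec content that the rules above would pick up. It is illustrative only: the text is invented, and it assumes that xmlspec-i18n.dtd declares the locn-note and translate attributes on the p element, which should be checked against the DTD.

```xml
<!-- Fragment of an XML Spec document; no namespace is declared -->
<div1>
  <head>Sample section</head>
  <!-- Selected by //*[@locn-note]; the note text is taken from the attribute -->
  <p locn-note="Keep the product name in English.">Install the Example product.</p>
  <!-- Selected by //*[@translate='no'] -->
  <p translate="no">setup --quiet</p>
  <!-- The term element is selected by //term -->
  <p>A <term>processor</term> reads the document.</p>
</div1>
```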
The Darwin Information Typing Architecture [DITA 1.0] is an XML-based architecture for authoring, producing, and delivering readable information as discrete, typed topics.
DITA offers some of the ITS features by default (see Section 5.4.2: Associating existing DITA markup with ITS for more information on that aspect). But in some cases you may still want to allow the use of ITS markup directly in your DITA documents, for example the its:locNote attribute or the its:rules element. DITA provides a way to create a domain specialization based on the foreign element and attribute extension points.
For example, the DITA Concept DTD can be extended as follows:
First, by creating two files for the ITS domain specialization. The first one, itsDomain.ent, contains the entity definitions that will be used in the extended DTD.
<!ENTITY % its-d-foreign "its" > <!ENTITY its-d-att "(topic its-d)" >
The second file, itsDomain.mod, contains the definition of the element where the ITS markup will be placed.
<!-- declaration for the specialized wrapper and alternate element --> <!ENTITY % its "its"> <!-- definition for the specialized wrapper and alternate element --> <!ELEMENT its ((%its-rules;) | (%its-ruby;)) > <!ATTLIST its %global-atts; class CDATA "+ topic/foreign its-d/its ">
Then you can adapt the concept.dtd file to take into account the new domain.
Include the ITS domain entities at the end of the Domain Entity Declarations section:
<!ENTITY % its-d-dec SYSTEM "itsDomain.ent" > %its-d-dec;
Define the extension element at the end of the Domain Extension section:
<!ENTITY % foreign "foreign | %its-d-foreign;" >
Modify the list of included domains in the included-domains entity:
<!ENTITY included-domains "&ui-d-att; &hi-d-att; &pr-d-att; &sw-d-att; &ut-d-att; &indexing-d-att; &its-d-att;" >
Include the ITS domain module at the end of the Domain Element Integration section:
<!ENTITY % its-d-def SYSTEM "itsDomain.mod" > %its-d-def;
This schema conforms to Conformance Type 1.
The schema adds the following ITS element into DITA schema:
The schema adds the following local ITS attributes into DITA schema:
[Ed. note: TODO: Add final list after DITA integration is done.]
Several ITS data categories are already implemented in DITA. For example, DITA offers a translate attribute that provides the same functionality as its:translate.
As with other formats, these existing features can be associated with ITS data categories, so that ITS-enabled tools can process DITA source documents seamlessly.
Note: When you have the choice of using a DITA construct or an ITS construct to express the same thing, make sure to use the DITA construct to ensure DITA processors work properly. Use ITS local markup only if DITA does not provide an equivalent.
<?xml version="1.0"?>
<!-- Possible default ITS rules for DITA -->
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Translatable attribute (some are deprecated) -->
  <its:translateRule selector="//image/@alt" translate="yes"/>
  <its:translateRule selector="//lq/@reftitle" translate="yes"/>
  <its:translateRule selector="//note/@othertype" translate="yes"/>
  <its:translateRule selector="//object/@standby" translate="yes"/>
  <its:translateRule selector="//othermeta/@content" translate="yes"/>
  <its:translateRule selector="//state/@value" translate="yes"/>
  <its:translateRule selector="//map/@title" translate="yes"/>
  <its:translateRule selector="//topicref/@navref" translate="yes"/>
  <its:translateRule selector="//topicgroup/@navtitle" translate="yes"/>
  <its:translateRule selector="//topichead/@navtitle" translate="yes"/>
  <its:translateRule selector="//data/@label" translate="yes"/>
  <!-- Non-translatable elements -->
  <its:translateRule selector="//draft-comment//*" translate="no"/>
  <its:translateRule selector="//draft-comment/descendant-or-self::*/@*" translate="no"/>
  <its:translateRule selector="//required-cleanup//*" translate="no"/>
  <its:translateRule selector="//required-cleanup/descendant-or-self::*/@*" translate="no"/>
  <its:translateRule selector="//coords" translate="no"/>
  <its:translateRule selector="//shape" translate="no"/>
  <!-- Translatability flags -->
  <its:translateRule selector="//*[@translate='no']" translate="no"/>
  <its:translateRule selector="//*[@translate='no']/descendant-or-self::*/@*" translate="no"/>
  <its:translateRule selector="//*[@translate='yes']" translate="yes"/>
  <!-- Directionality flags (dir is an attribute, hence @dir) -->
  <its:dirRule selector="//*[@dir='ltr']" dir="ltr"/>
  <its:dirRule selector="//*[@dir='rtl']" dir="rtl"/>
  <its:dirRule selector="//*[@dir='lro']" dir="lro"/>
  <its:dirRule selector="//*[@dir='rlo']" dir="rlo"/>
  <!-- Elements within text (inline) -->
  <its:withinTextRule withinText="yes"
    selector="//boolean | //cite | //itemgroup | //keyword | //ph | //q | //state | //term | //tm | //xref | //b | //i | //sub | //sup | //tt | //u | //apiname | //codeph | //delim | //fragref | //kwd | //oper | //option | //parmname | //repsep | //sep | //synnoteref | //synph | //var | //cmdname | //filepath | //msgnum | //msgph | //systemoutput | //userinput | //varname | //menucascade | //shortcut | //uicontrol | //wintitle | //coords | //shape" />
  <!-- The keyword elements within keywords are sub-flow, not in-line -->
  <its:withinTextRule withinText="nested" selector="//keywords/keyword" />
  <!-- Elements within text (subflow) -->
  <its:withinTextRule withinText="nested"
    selector="//draft-comment | //required-cleanup | //alt | //fn | //indexterm" />
  <!-- Terminology -->
  <its:termRule selector="//term | //dt | //termindex" term="yes" />
</its:rules>
The declarations above cover different versions of DITA.
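As an illustration of preferring native DITA markup over ITS local markup, the following concept uses DITA's own translate attribute. The topic content and the id value are invented for the example; an ITS-enabled tool that applies the rules above would read the flag through the //*[@translate='no'] selector.

```xml
<concept id="install-overview" xml:lang="en">
  <title>Installation overview</title>
  <conbody>
    <!-- Native DITA flag; no its:translate attribute is needed here -->
    <p>Run <cmdname translate="no">setup</cmdname> to start the installation.</p>
  </conbody>
</concept>
```

Because the flag is ordinary DITA, existing DITA processors continue to work unchanged, and only ITS-aware tools need the rules file.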
[Glade] is a user interface builder system for GTK+ and Gnome. It uses XML files to store the UI components. The library has been ported to different platforms and offers bindings in different programming languages.
<?xml version="1.0" standalone="no"?><!--*- mode: xml -*--> <glade-interface> <widget class="GtkWindow" id="main_window"> <property name="visible">True</property> <property name="title" translatable="yes">Glade Text Editor</property> <property name="type">GTK_WINDOW_TOPLEVEL</property> <property name="window_position">GTK_WIN_POS_NONE</property> <property name="modal">False</property> <property name="default_width">600</property> <property name="default_height">450</property> <property name="resizable">True</property> <property name="destroy_with_parent">False</property> <property name="decorated">True</property> <property name="skip_taskbar_hint">False</property> <property name="skip_pager_hint">False</property> <property name="type_hint">GDK_WINDOW_TYPE_HINT_NORMAL</property> <property name="gravity">GDK_GRAVITY_NORTH_WEST</property> <property name="focus_on_map">True</property> <property name="urgency_hint">False</property> <signal name="delete_event" handler="on_main_window_delete_event"/> <child> <widget class="GtkVBox" id="vbox1"> <property name="visible">True</property> <property name="homogeneous">False</property> <property name="spacing">0</property> <child> <widget class="GtkHandleBox" id="handlebox2"> <property name="visible">True</property> <property name="shadow_type">GTK_SHADOW_OUT</property> <property name="handle_position">GTK_POS_LEFT</property> <property name="snap_edge">GTK_POS_TOP</property> <child> <widget class="GtkMenuBar" id="menubar1"> <property name="visible">True</property> <property name="pack_direction">GTK_PACK_DIRECTION_LTR</property> <property name="child_pack_direction">GTK_PACK_DIRECTION_LTR</property> <child> <widget class="GtkMenuItem" id="File"> <property name="visible">True</property> <property name="label" translatable="yes">_File</property> <property name="use_underline">True</property> <child> <widget class="GtkMenu" id="File_menu"> <child> <widget class="GtkImageMenuItem" id="Open"> <property name="visible">True</property> 
<property name="label">gtk-open</property> <property name="use_stock">True</property> <signal name="activate" handler="on_Open_activate"/> </widget> </child> <child> <widget class="GtkImageMenuItem" id="Exit"> <property name="visible">True</property> <property name="label">gtk-quit</property> <property name="use_stock">True</property> <signal name="activate" handler="on_Exit_activate"/> </widget> </child> </widget> </child> </widget> </child> </widget> </child> </widget> <packing> <property name="padding">0</property> <property name="expand">False</property> <property name="fill">True</property> </packing> </child> </widget> </child> </widget> </glade-interface>
The content of Glade files consists mostly of non-translatable data: the properties of UI widgets. Text content is limited to titles, labels and various other types of UI strings. While Glade does offer support for some of the ITS features, in some cases you may still want to allow the use of ITS markup directly in your Glade resources.
[Ed. note: TODO]
Glade offers a translatable attribute that provides the same functionality as its:translate. The comments attribute can also be associated with localization notes.
As with other formats, existing features of Glade can be associated with ITS data categories using global rules, so that ITS-enabled tools can process Glade source documents seamlessly.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"> <!-- ITS rules for Glade 2.0, based on http://glade.gnome.org/glade-2.0.dtd --> <its:translateRule selector="/glade-interface" translate="no"/> <its:translateRule selector="//*[@translatable='yes']" translate="yes"/> <its:translateRule selector="//atkaction/@description" translate="yes"/> <its:locNoteRule selector="//*[@translatable='yes']" locNoteType="description" locNotePointer="@comments"/> </its:rules>
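The fragment below sketches how these rules would apply to a widget taken from the sample file shown earlier. The comments attribute and its text have been added for illustration and do not appear in that sample.

```xml
<widget class="GtkMenuItem" id="File">
  <property name="visible">True</property>
  <!-- translatable="yes" is mapped to translate="yes"; the value of
       comments is exposed as a localization note of type "description" -->
  <property name="label" translatable="yes"
            comments="Menu title. The underscore marks the keyboard mnemonic.">_File</property>
  <!-- No translatable attribute: this property inherits translate="no"
       from the rule on /glade-interface -->
  <property name="use_underline">True</property>
</widget>
```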
DocBook is a general purpose XML schema particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). DocBook is maintained by the DocBook Technical Committee of OASIS.
The DocBook V5.0 schema is maintained as a very modular and easy to customize schema written in RELAX NG [RELAX NG 1.0]. General techniques for schema customization are described in [DocBook V5.0 HOWTO].
The ITS additions involve the following changes to DocBook schema:
Adding the ITS local attributes to every existing DocBook element.
Not all ITS local attributes are added to the schema, as DocBook already provides its own means for specifying the directionality of text. Such existing markup should be associated with ITS data categories using the its:rules element. See Section 5.6.2: Associating existing DocBook markup with ITS.
Allowing the its:rules element inside the DocBook info element, which is a general metadata container.
Allowing its:ruby as an inline element almost everywhere where plain text could be.
# This schema integrates ITS markup (http://www.w3.org/TR/its/)
# into DocBook schema (http://docbook.org)
#
# This schema conforms to Conformance Type 1 defined in
# http://www.w3.org/TR/its/#conformance-product-schema
#
# Schema adds the following ITS elements into DocBook schema:
# * rules
# * ruby
#
# Schema adds the following local ITS attributes into DocBook schema:
# * translate
# * locNote
# * locNoteType
# * locNoteRef
# * term
# * termInfoRef
#
# $Id: Overview.html,v 1.8 2018/10/09 13:16:41 denis Exp $
#
# Namespace declarations for DocBook, ITS and HTML
# (HTML is used internally in DocBook schema)
namespace db = "http://docbook.org/ns/docbook"
namespace its = "http://www.w3.org/2005/11/its"
namespace html = "http://www.w3.org/1999/xhtml"
# Include base DocBook schema
include "docbook.rnc" {
  # Exclude ITS markup from "wildcard" element
  db._any =
    element * - (db:* | html:* | its:*) {
      (attribute * { text } | text | db._any)*
    }
}
# Include base ITS schema
include "its.rnc"
# Define pattern for local ITS attributes
db.its.attributes =
  its-att.translate.attributes?
  & its-att.locNote.attributes?
  & its-att.term.attributes?
  & its-att.version.attributes?
# Add local ITS attributes to all DocBook elements
db.common.base.attributes &= db.its.attributes
# Allow its:rules inside info element
db.info.extension |= its-rules
# Allow Ruby markup almost everywhere
db.ubiq.inlines |= its-ruby
For your convenience, a “flattened” version of the schema, stored in a single file and converted to other schema languages, is also available.
dbits.rnc (RELAX NG compact syntax schema in one file)[Ed. note: Flattened version are broken at this time]
dbits.rng (RELAX NG schema in one file)[Ed. note: Flattened version are broken at this time]
dbits.dtd (DTD in one file)[Ed. note: Flattened version are broken at this time]
dbits.xsd (W3C XML Schema)[Ed. note: TODO]
There is no need to add the its:span element, as DocBook provides a similar element called phrase which can be used for attaching ITS local attributes to an arbitrary piece of text.
The following example shows a sample DocBook article conforming to the DocBook+ITS schema. The its:translateRule element is used to indicate that function names (marked up with the function element) should not be translated. The first paragraph is also marked as non-translatable using local ITS markup.
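For instance, phrase can carry a local ITS attribute in the way its:span would in other vocabularies. The snippet below is a minimal sketch with invented text, assuming the DocBook+ITS schema described above.

```xml
<para xmlns="http://docbook.org/ns/docbook"
      xmlns:its="http://www.w3.org/2005/11/its">Save the file as
  <phrase its:translate="no">README</phrase> in the project directory.</para>
```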
<?xml version="1.0" encoding="UTF-8"?> <article xmlns="http://docbook.org/ns/docbook" xmlns:its="http://www.w3.org/2005/11/its" xmlns:db="http://docbook.org/ns/docbook" version="5.0" xml:lang="en"> <info> <title>Sample article</title> <its:rules version="1.0"> <its:translateRule translate="no" selector="//db:function"/> </its:rules> </info> <para its:translate="no">Non-translatable content</para> <section> <title>Sample section</title> <para>You can delete file using <function>unlink()</function> function.</para> </section> </article>
This schema conforms to Conformance Type 1.
The schema adds the following ITS elements into DocBook schema: rules and ruby.
The schema adds the following local ITS attributes into DocBook schema: translate, locNote, locNoteType, locNoteRef, term and termInfoRef.
A number of DocBook constructs implement the same semantics as some of the ITS data categories. In addition, some DocBook attributes are translatable, whereas under the ITS default settings for translatability attributes in XML documents are not translatable. These attributes need to be identified as translatable.
Note: When you have the choice of using a DocBook construct or an ITS construct to express the same thing, make sure to use the DocBook construct to ensure DocBook processing tools work properly. Use ITS local markup only if DocBook does not provide an equivalent.
An external ITS its:rules element can summarize these relations. Because DocBook use is widespread and diverse, the rules defined here are just an example which may need further tailoring for specific use.
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:db="http://docbook.org/ns/docbook"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           version="1.0">

  <!-- Translatable attributes -->
  <its:translateRule selector="//db:table/@summary" translate="yes"/>
  <its:translateRule selector="//db:*/@xlink:title" translate="yes"/>
  <its:translateRule selector="//db:*/@xreflabel" translate="yes"/>
  <its:translateRule selector="//db:*/@label" translate="yes"/>

  <!-- Non-translatable elements/attributes -->
  <its:translateRule translate="no" selector="//db:*[@revisionflag = 'deleted']"/>
  <its:translateRule translate="no" selector="//db:*[@revisionflag = 'deleted']//@*"/>
  <its:translateRule translate="no"
    selector="//db:acronym | //db:author | //db:classname | //db:command
            | //db:constant | //db:date | //db:editor | //db:email
            | //db:envar | //db:errorcode | //db:exceptionname | //db:filename
            | //db:function | //db:initializer | //db:interfacename | //db:markup
            | //db:methodname | //db:modifier | //db:ooclass | //db:ooexception
            | //db:oointerface | //db:option | //db:parameter | //db:person
            | //db:personname | //db:productnumber | //db:property | //db:returnvalue
            | //db:symbol | //db:tag | //db:type | //db:uri | //db:varname"/>

  <!-- Possible terms -->
  <its:termRule selector="//db:glossterm" term="yes"/>
  <its:termRule selector="//db:firstterm" term="yes"/>

  <!-- Bidirectional information -->
  <its:dirRule selector="//db:*[@dir='ltr']" dir="ltr"/>
  <its:dirRule selector="//db:*[@dir='rtl']" dir="rtl"/>
  <its:dirRule selector="//db:*[@dir='lro']" dir="lro"/>
  <its:dirRule selector="//db:*[@dir='rlo']" dir="rlo"/>

  <!-- Elements within text -->
  <its:withinTextRule withinText="yes"
    selector="//db:abbrev | //db:accel | //db:acronym | //db:application
            | //db:author | //db:citation | //db:citebiblioid | //db:citerefentry
            | //db:citetitle | //db:classname | //db:code | //db:command
            | //db:computeroutput | //db:constant | //db:database | //db:date
            | //db:editor | //db:email | //db:emphasis | //db:envar
            | //db:errorcode | //db:errorname | //db:errortext | //db:errortype
            | //db:exceptionname | //db:filename | //db:foreignphrase | //db:function
            | //db:guibutton | //db:guiicon | //db:guilabel | //db:guimenu
            | //db:guimenuitem | //db:guisubmenu | //db:hardware | //db:initializer
            | //db:interfacename | //db:jobtitle | //db:keycap | //db:keycode
            | //db:keycombo | //db:keysym | //db:link | //db:literal
            | //db:markup | //db:menuchoice | //db:methodname | //db:modifier
            | //db:mousebutton | //db:olink | //db:ooclass | //db:ooexception
            | //db:oointerface | //db:option | //db:optional | //db:org
            | //db:orgname | //db:package | //db:parameter | //db:person
            | //db:personname | //db:phrase | //db:productname | //db:productnumber
            | //db:prompt | //db:property | //db:quote | //db:replaceable
            | //db:returnvalue | //db:shortcut | //db:subscript | //db:superscript
            | //db:symbol | //db:systemitem | //db:tag | //db:token
            | //db:trademark | //db:type | //db:uri | //db:userinput
            | //db:varname | //db:wordasword"/>
  <its:withinTextRule withinText="nested"
    selector="//db:alt | //db:footnote | //db:remark | //db:indexterm
            | //db:primary | //db:secondary | //db:tertiary"/>

</its:rules>
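To make the effect of these rules more concrete, the following is a minimal sketch of a DocBook 5 fragment to which they could be applied. The sample is an illustration only and is not part of the rules file: its content is invented, and the comments describe what an ITS-aware processor would be expected to conclude from the selectors above, assuming no other rules or local ITS markup override them.

```xml
<!-- Hypothetical sample document; the default namespace is the one the rules bind to the "db" prefix. -->
<section xmlns="http://docbook.org/ns/docbook" version="5.0"
         xreflabel="Listing files">
  <!-- @xreflabel matches //db:*/@xreflabel, so its value is translatable. -->
  <title>Listing files</title>
  <para>Run <command>ls</command> in the
    <!-- command: translate="no" and withinText="yes" (inline, not translated). -->
    <firstterm>working directory</firstterm><footnote>
      <!-- firstterm: term="yes". footnote: withinText="nested", so its content
           is expected to be handled as a separate run of text. -->
      <para>The directory the shell is currently in.</para>
    </footnote>.
    <phrase revisionflag="deleted">This sentence was removed.</phrase>
    <!-- phrase with revisionflag="deleted": translate="no", including its attributes. -->
  </para>
</section>
```

Because the selectors are written against the namespace URI rather than a particular prefix, the same rules apply whether the document uses a default namespace, as in this sketch, or an explicit prefix.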
The following log records the major changes made to this document since its publication in June 2007.
Updated the section Best Practice 1: Providing xml:lang to specify natural language content.
Updated the section Best Practice 2: Providing a way to specify text directionality.
Updated the section Best Practice 7: Providing a way to specify ruby text.
Updated the section Best Practice 8: Providing a way to specify notes for localizers.
Updated the section Best Practice 9: Providing a way to specify unique identifiers.
Updated the section Best Practice 15: Documenting the ITS-related features of your schema.
Updated the section Best Practice 20: Avoiding CDATA sections when possible.
Updated the section Best Practice 16: Specifying the language of the content.
Updated the section Best Practice 18: Overriding translatability information if needed.
Created the content for the section Best Practice 11: Providing a way to specify or override terminology-related information.
Created the content for the section Best Practice 12: Using multilingual documents with caution.
Created the content for the section Best Practice 13: Naming elements and attributes with caution.
Created the content for the section Best Practice 17: Specifying text directionality if needed.
Created the content for the section Best Practice 19: Assigning unique identifiers to elements with translatable content.
Created the content for the section Best Practice 21: Providing notes for localizers.
Added the section Best Practice 23: Identifying terms.
Added the section Best Practice 24: Avoiding including markup in escape form.
Added summary table for developers.
Added summary table for authors.
This document has been developed with important contributions from: Martin Dürst (W3C Invited Expert), Richard Ishida (W3C/ERCIM), Jirka Kosek (W3C Invited Expert), Christian Lieske (SAP AG), Sebastian Rahtz (W3C Invited Expert), Felix Sasaki (W3C/Keio), Yves Savourel (ENLASO Corporation), Diane Stoick (The Boeing Company), Najib Tounsi (Ecole Mohammadia d'Ingenieurs Rabat (EMI)), Andrzej Zydron.
At the date of publication, the members of the Working Group were: Bartosz Bogacki (W3C Invited Experts), Damien Donlon (Sun Microsystems, Inc.), Martin Dürst (W3C Invited Experts), Poonam Gupta (Centre for Development of Advanced Computing (CDAC)), Richard Ishida (W3C/ERCIM), Jirka Kosek (W3C Invited Experts), Christian Lieske (SAP AG), Sebastian Rahtz (W3C Invited Experts), Francois Richard (HP), Goutam Saha (Centre for Development of Advanced Computing (CDAC)), Felix Sasaki (W3C/Keio), Yves Savourel (ENLASO Corporation), Diane Stoick (The Boeing Company), Najib Tounsi (Ecole Mohammadia d'Ingenieurs Rabat (EMI)), Andrzej Zydron (W3C Invited Experts).