ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content.
Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group.
The Unicode Consortium has announced Version 6.3 of the Unicode Standard and with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout and provides a mechanism for isolating runs of text.
Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many others. The display and positioning of parentheses will better match the normal behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and an improved ability for inserting text and assembling user interface elements.
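As an illustration of the isolation mechanism described above, here is a minimal sketch of assembling a message from text of unknown direction. It assumes the isolate controls U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). The helper names `isolate` and `build_message` are hypothetical and not part of any Unicode or W3C API.

```python
# Isolate format characters introduced in Unicode 6.3.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate(text: str) -> str:
    """Wrap text in FSI...PDI so that its direction is taken from its own
    first strong character and does not affect the surrounding text."""
    return f"{FSI}{text}{PDI}"


def build_message(template: str, **values: str) -> str:
    """Fill a template, isolating every inserted value (hypothetical helper)."""
    return template.format(**{key: isolate(val) for key, val in values.items()})


# A Hebrew user name inserted into an otherwise left-to-right message.
message = build_message("{user} - 3 new items", user="\u05d3\u05e0\u05d4")
print(repr(message))
```

In markup, the same effect is normally achieved with features such as the HTML `bdi` element or the CSS `unicode-bidi: isolate` value. The format characters are mainly useful in plain text, where markup is not available.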
The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.
In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs — that they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text which will preserve the specific required shapes of these CJK ideographs, even under Unicode normalization.
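The normalization issue can be seen with Python's standard `unicodedata` module. This is only a sketch: it uses U+FA10 as an example compatibility ideograph and assumes that U+585A followed by U+FE00 (VARIATION SELECTOR-1) is its standardized variation sequence. The authoritative list is in the Unicode data file StandardizedVariants.txt.

```python
import unicodedata

compat = "\uFA10"               # CJK COMPATIBILITY IDEOGRAPH-FA10
unified = "\u585A"              # unified ideograph it canonically maps to
sequence = unified + "\uFE00"   # assumed variation sequence for U+FA10


def survives_normalization(text: str, form: str = "NFC") -> bool:
    """Return True if normalizing text in the given form leaves it unchanged."""
    return unicodedata.normalize(form, text) == text


# The compatibility ideograph is replaced by the unified ideograph,
# so the distinction the author intended is lost.
nfc_of_compat = unicodedata.normalize("NFC", compat)

print(survives_normalization(compat))    # compatibility character
print(survives_normalization(sequence))  # variation sequence
```

The variation selector has no decomposition, so the sequence passes through normalization untouched. Whether a font then displays the specific shape depends on that font's support for variation sequences.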
Version 6.3 includes other improvements as well:
- Improved Unihan data to better align with ISO/IEC 10646
- Better support for Hebrew word break behavior and for ideographic space in line breaking
The MultilingualWeb-LT Working Group has published a Proposed Recommendation of Internationalization Tag Set (ITS) Version 2.0. The technology described in this document provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content. ITS 2.0 focuses on HTML and XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF). Comments are welcome through 22 October.
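To give a flavour of what automated processing of ITS metadata can look like, the sketch below reads the local `its:translate` attribute from a small XML document and collects only the text a translation tool should handle. The sample document and the function `translatable_text` are invented for illustration. The sketch covers only local markup on elements, not global rules or attribute values, so it is not a conformant ITS 2.0 processor.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"

# Invented sample document using local ITS markup.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Click <code its:translate="no">Save</code> to continue.</p>
  <p its:translate="no">ACME-1234</p>
</doc>"""


def translatable_text(elem, inherited=True):
    """Yield the text chunks of elem that are marked as translatable.

    The flag is inherited by child elements unless they override it.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, translate)
        # Tail text belongs to the parent, so it follows the parent's flag.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


chunks = list(translatable_text(ET.fromstring(SAMPLE)))
print(chunks)
```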
The Internationalization Working Group has published a Last Call Working Draft of Use Cases & Exploratory Approaches for Ruby Markup.
Comments are welcome through 24 September. As this document has already been through a review cycle, we are not anticipating major changes to arise over the coming two weeks, and hope to move it to publication as a WG Note in two to three weeks' time. See the status section for information about where to send any feedback.
This document aims to support discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases involving ruby, and how well the following approaches support those use cases: the HTML5 model described in the Candidate Recommendation as of 17 December 2012, the XHTML Ruby Annotation model, and the Ruby Extension Specification proposed in February 2013.
The MultilingualWeb-LT Working Group has published a Last Call Working Draft of Internationalization Tag Set (ITS) Version 2.0. ITS 2.0 makes it easier to integrate automated processing of human language into core Web technologies. ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF). Comments are welcome through 10 September.
The recently announced Internationalization Tag Set 2.0 showcase event in Dublin now allows for remote participation. Please register by 17 June, 6 p.m. UTC. We will provide dial-in details to registered participants. The number of remote participants is limited and places are allocated on a first-come, first-served basis, so get your seat soon!
On 18 June the MultilingualWeb-LT Working Group holds a showcase event in Dublin about the upcoming Internationalization Tag Set (ITS) 2.0 specification. Group participants demonstrate implementations for authoring ITS 2.0 data categories, for using them in localization workflows, and for improving machine translation or other language technology processes with ITS 2.0. Participation is free, but registration is required.
The draft implements all changes since the previous publication of 11 April 2013. There are no remaining open issues. The Working Group is now planning to finalize ITS 2.0, so this is your last chance to provide feedback! The Last Call period ends 11 June.
ITS 2.0 provides metadata to foster the adoption of the multilingual Web.
Unicode Bidirectional Algorithm basics is a repackaging of the initial part of “What you need to know about the bidi algorithm and inline markup” as a standalone article. It provides a gentle introduction to the behaviour of the Unicode Bidirectional Algorithm, and helps you understand why bidirectional text in Arabic, Hebrew, Thaana, Urdu, etc. behaves the way it does.
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
This tutorial has been completely rewritten to bring it in line with the current tutorial format. Rather than containing duplicate content, it now introduces the novice to key concepts and points to useful further reading in an organized fashion.
Text direction and structural markup in HTML
This article has been created from material formerly in the tutorial “Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts” and augmented with information about new HTML5 markup constructs that are beginning to see adoption. It should be regarded as a new article, focusing on applying bidi markup to document- and block-level content, including forms.
What you need to know about the bidi algorithm and inline markup
This is an update of an existing article, but it has been almost completely rewritten. The most significant changes are the new parts describing how to apply the new HTML5 constructs which are beginning to see adoption. Additional changes will be needed as HTML5 bidi markup is finalised over the coming months. The article also proposes a simpler way to approach markup of bidi text, particularly useful for those with less experience, that relies less on a deep understanding of the issues involved.
Visual vs. logical ordering of text
This is a new article created from material that has been removed from the previously mentioned articles. The material was moved into a separate article to avoid duplication, and because visual ordering is much less important these days. Only a few changes have been made to the content itself.