ITS 2.0 provides a foundation for integrating automated processing of human language into core Web technologies. ITS 2.0 bears many commonalities with its predecessor, ITS 1.0, but provides additional concepts that are designed to foster the automated creation and processing of multilingual Web content.
Work on application scenarios for ITS 2.0 and gathering of usage and implementation experience will now take place in the ITS Interest Group.
The Unicode Consortium has announced Version 6.3 of the Unicode Standard and, with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout, and it provides a mechanism for isolating runs of text.
Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many other languages. The display and positioning of parentheses will better match the behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as CSS). Overall, these improvements also bring greater interoperability and an improved ability to insert text and assemble user interface elements.
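To make the idea of isolating runs concrete, here is a minimal sketch of message assembly using the isolate controls that Unicode 6.3 added: U+2066 LEFT-TO-RIGHT ISOLATE, U+2067 RIGHT-TO-LEFT ISOLATE, U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE (the fifth new format character, U+061C ARABIC LETTER MARK, is not used here). It assumes messages are built from `str.format` templates; the helpers `isolate` and `build_message` are hypothetical names for illustration, not part of any standard API.

```python
import unicodedata

# Isolate controls introduced in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(run: str) -> str:
    """Wrap an inserted run in FSI ... PDI (hypothetical helper).

    FSI gives the run the direction of its first strong character, and
    the isolate keeps the run from reordering the text around it.
    """
    return f"{FSI}{run}{PDI}"


def build_message(template: str, **values: str) -> str:
    """Hypothetical helper: fill a str.format template, isolating every value."""
    return template.format(**{name: isolate(v) for name, v in values.items()})


if __name__ == "__main__":
    # A Hebrew name inserted into an English message.
    msg = build_message("{user} (3 reviews)", user="\u05d3\u05e0\u05d4")
    # Bidi classes of the isolated run: the controls bracket the RTL letters.
    print([unicodedata.bidirectional(c) for c in msg[:5]])
    # ['FSI', 'R', 'R', 'R', 'PDI']
```

FSI is chosen over LRI or RLI because the direction of user-supplied text is usually not known in advance; in HTML the same effect is available through the `bdi` element or CSS `unicode-bidi: isolate`.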
The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.
In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs: they could change their appearance whenever a process normalized the text. Using the new standardized variation sequences allows authors to write text that preserves the specific required shapes of these CJK ideographs, even under Unicode normalization.
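The normalization problem, and how a variation sequence avoids it, can be seen with Python's standard `unicodedata` module. This sketch uses U+F900 CJK COMPATIBILITY IDEOGRAPH-F900, which has a canonical singleton decomposition to U+8C48, and assumes that U+8C48 U+FE00 is the standardized variation sequence registered for it (our reading of StandardizedVariants.txt; check that file before relying on a specific pairing). The helper `code_points` is only for display.

```python
import unicodedata

COMPAT = "\uF900"     # CJK COMPATIBILITY IDEOGRAPH-F900
SVS = "\u8C48\uFE00"  # U+8C48 + VARIATION SELECTOR-1, assumed SVS for U+F900


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for label, text in (("compatibility ideograph", COMPAT), ("variation sequence", SVS)):
        for form in ("NFC", "NFD", "NFKC", "NFKD"):
            normalized = unicodedata.normalize(form, text)
            print(f"{label:24} {form:5} {code_points(normalized)}")
```

Every normalization form turns U+F900 into the plain unified ideograph U+8C48, so the distinction the author intended is lost. The variation sequence passes through all four forms unchanged, because neither U+8C48 nor the variation selector has a decomposition mapping.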
Version 6.3 includes other improvements as well:
- Improved Unihan data to better align with ISO/IEC 10646
- Better support for Hebrew word break behavior and for ideographic space in line breaking
This draft implements all changes since the previous publication of 11 April 2013. There are no remaining open issues. The Working Group is planning to finalize ITS 2.0 now: this is your last chance to provide feedback! The Last Call period ends 11 June.
ITS 2.0 provides metadata to foster the adoption of the multilingual Web.
Unicode Bidirectional Algorithm basics is a repackaging of the initial part of “What you need to know about the bidi algorithm and inline markup” as a standalone article. It provides a gentle introduction to the behaviour of the Unicode Bidirectional Algorithm, and helps you understand why bidirectional text in Arabic, Hebrew, Thaana, Urdu, etc. behaves the way it does.
eBooks & i18n: Richer Internationalization for eBooks on 4 June 2013 in Tokyo, Japan, will investigate international functionality that needs to be added to the Open Web Platform. The Open Web Platform includes core W3C technologies such as HTML, CSS, SVG, XML, XSLT, XSL-FO, PNG, RDF, and many more, that are used extensively in eBooks and eBook production.
The goal is to make the various eBook reading platforms suitable for electronic books that use the printing and typesetting traditions of different cultures. If you are interested in participating, please submit a position paper by 30 April 2013. See the Call for Participation for details.
An Indic Layout Task Force has just been announced, as part of the W3C Internationalization Activity. Similar to the very successful Japanese Layout Task Force, the Indic group will provide input to the W3C Open Web Platform related to Indic languages and layout.
This task force will gather and integrate feedback from the participating members about the needs and technical feasibility of Indic requirements, and will report the results of its activities as a group back to the Internationalization Core Working Group, as well as to other relevant groups and to the W3C membership and community.
The chair of the Task Force is Swaran Lata, the contact person at the Indian Office of W3C is Somnath Chandra, and the Staff Contact is Richard Ishida. See the home page for more information.
To participate in, or follow, the work of the Task Force, please subscribe to the Task Force mailing list. By subscribing, you also become a member of the Internationalization Interest Group.
The MultilingualWeb-LT Working Group published a First Public Working Draft of Metadata for the Multilingual Web – Usage Scenarios and Implementations. This document introduces a variety of usage scenarios and applications for the Internationalization Tag Set (ITS) 2.0, ranging from simple machine translation or human translation quality checks to training for machine translation systems or automatic text analysis. Many of the underlying implementations will be showcased at the upcoming W3C MultilingualWeb Workshop, 12–13 March in Rome.
A new FAQ page devoted to the topic of private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the fine points about their usage and about differences from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.
Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.
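As a rough illustration of the difference between the two kinds of code points, the sketch below classifies them directly from their ranges. The 66 noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes; private-use characters occupy U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD. The function names are hypothetical, and the cross-check against General_Category (Co for private use, Cn for noncharacters) uses Python's standard `unicodedata` module.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (hypothetical helper).

    These are U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF in every plane n.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Masking off bit 0 maps both xxFFFE and xxFFFF to 0xFFFE.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """True for the three private-use ranges (General_Category Co)."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


if __name__ == "__main__":
    print(sum(1 for cp in range(0x110000) if is_noncharacter(cp)))  # 66
    for cp in (0xE000, 0xFDD0, 0xFFFF):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)),
              is_noncharacter(cp), is_private_use(cp))
```

Neither test rejects anything: as the corrigendum clarifies, noncharacters may appear in interchanged text, so code like this is for recognizing them, not for stripping them.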
The program has been published for the upcoming W3C MultilingualWeb Workshop: Making the Multilingual Web Work in Rome, 12–13 March 2013.
Mark Davis and Vladimir Weinstein of Google will deliver the keynote presentation, “Innovations in Internationalization at Google”. This will be followed by one and a half days of talks on various aspects of what it takes to make multilingualism work on the Web, plus an afternoon of discussion-oriented breakout sessions that focus on best practices for various aspects of the multilingual Web. Speakers will come from organizations like Adobe Systems, SAP, Yandex, the Spanish Tax Agency, the U.N. Food and Agriculture Organization, Microsoft, Lionbridge, SDL, the European Commission, and leading universities and research institutions from around the world.
The program will also feature a showcase of implementations of the forthcoming ITS 2.0 specification that will allow attendees to get a sneak peek at how this specification will impact and support multilingual requirements on the Web.
See the Call for Participation for details about how to register for the workshop. Participation in the workshop is free.
Important: The deadline for registration is 8 March, but available attendance slots are filling up fast and are expected to run out before the deadline. So please be sure to register soon to ensure that you can attend.
The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops have been successful because they have attracted a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, to create a holistic view of the interoperability needs of the multilingual Web.
We look forward to seeing you in Rome!
The deadline for speaker submissions for the 6th MultilingualWeb Workshop (March 12–13, 2013 in Rome, Italy) is this Friday (January 18 at 23:59 UTC).
With a keynote by Mark Davis and Vladimir Weinstein (Google) and special breakout sessions on linked open data and other critical topics, this Workshop is set to continue the tradition of excellence set by the previous five Workshops, and will provide an outstanding forum for thought leaders to share their ideas and gain critical feedback.
While the organizers have already received many excellent submissions, there is still time to make a proposal, and we encourage interested parties to do so by the deadline. With over 100 attendee registrations already submitted for the Workshop, we are certain to have a large and diverse audience and stimulating discussion about all of the presentations.
For more information, please visit the Rome Workshop Call for Participation.