Category: Highlight
Posts
Unicode Collation Algorithm Version 5.2 Released
Version 5.2 of the Unicode Collation Algorithm has been released. This version resynchronizes the Unicode Collation Algorithm with all of the updates for the Unicode Standard, Version 5.2.
The rest of this post is taken from the Unicode Consortium's release notification and details changes and issues for implementations.
- The text of UTS #10 has been updated. Among other changes, the revised text for UTS #10 makes it clear that the BASE for implicit generation of weights for Han characters does not include unassigned code points.
- There are small changes in Gujarati, Telugu, Malayalam (including weighting for chillus), Tamil, and Sinhala. While these changes move in the direction of expected behavior, good results will only come from tailoring for particular languages, such as with CLDR.
- There have been significant changes to the ordering of many combining marks. Many combining marks that are not in customary use in modern languages now have the same secondary weight, and will only be distinguished on a fourth level, by code point ordering. This can be seen by looking at the Unicode Collation Charts (http://unicode.org/charts/collation/). In 5.2, many characters now have a white background, indicating that they sort exactly the same as the previous character, unless a 4th (codepoint) level is used.
- Implementations of UCA should take note that the increased number of characters may cause overflows if the implementing code makes certain assumptions or optimizations. This can result either from the new character additions (which increase the number of distinct weights in the table) or because of changes in the way the weights, particularly for secondary weight values, are assigned in the table. The latter change may result in unexpected numbers of characters having the same weight.
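The last two points can be illustrated with a toy model. The sketch below is not the UCA reference implementation: the weight values in `TOY_WEIGHTS`, the function names, and the 16/8/8-bit packing layout are all assumptions made up for illustration, not data from the actual collation table. It shows how two strings whose combining marks share a secondary weight compare equal on the first three levels and are separated only by a fourth (code point) level, and how an implementation that packs weights into fixed-width fields might guard against overflow.

```python
# Toy weights (primary, secondary, tertiary); illustrative values, NOT real table data.
TOY_WEIGHTS = {
    "a": (0x15A3, 0x0020, 0x0002),
    "b": (0x15BD, 0x0020, 0x0002),
    # Two combining marks assumed (for this sketch) to share one secondary weight.
    "\u0351": (0x0000, 0x0021, 0x0002),
    "\u0357": (0x0000, 0x0021, 0x0002),
}


def sort_key(text, codepoint_level=False):
    """Build a level-by-level key; zero weights are ignored at their level."""
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = tuple(
        tuple(weights[level] for weights in elements if weights[level] != 0)
        for level in range(3)
    )
    if codepoint_level:
        # Fourth level: fall back to code point order to break remaining ties.
        key += (tuple(ord(ch) for ch in text),)
    return key


def pack_weights(primary, secondary, tertiary):
    """Pack one collation element into 32 bits (assumed 16/8/8 layout)."""
    if primary > 0xFFFF or secondary > 0xFF or tertiary > 0xFF:
        raise OverflowError("weight does not fit the assumed field width")
    return (primary << 16) | (secondary << 8) | tertiary
```

With these toy values, `sort_key("a\u0351")` and `sort_key("a\u0357")` are equal, while the same calls with `codepoint_level=True` order the first string before the second. A check like the one in `pack_weights` is one way to catch a table update that pushes a weight past a field width the code had silently assumed.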
Article for wide review: Choosing a language tag
Comments are being sought on this article prior to final release. Please send any comments to www-international@w3.org (subscribe). We expect to publish a final version in one to two weeks. [search keys: qa-choosing-language-tags]
Updated article: Language tags in HTML and XML
This tutorial was updated to incorporate changes made to BCP 47 by the recent publication of RFC 5646. Changes to BCP 47 include the introduction of extended language subtags, and the addition of ISO 639-3 language subtags, bringing the total number of subtags in the registry to almost 8,000.
Translators should consider retranslating the whole tutorial. [search keys: article-language-tags]
Unicode 5.2.0 Released
On 1st October, Unicode 5.2 was released! The data files, code charts, and Unicode Standard Annexes for this version are final and are posted on the Unicode site.
For Unicode 5.2, the core specification is no longer just a delta document applied to the book; instead, the entire core specification, with all textual changes integrated, will be available on the Unicode site. As of this announcement, the first five chapters are available; the other chapters will follow soon.
For full details about what is new or changed in this release, see the version documentation for Unicode 5.2.
New Working Group Note: Authoring HTML: Handling Right-to-left Scripts
The Internationalization Core Working Group has published Authoring HTML: Handling Right-to-left Scripts as a Working Group Note.
This document describes techniques for the use of HTML markup and CSS style sheets when creating content in languages that use right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc. It builds on (but also goes beyond) markup needed to supplement the Unicode bidirectional algorithm, and also touches on how to prepare content that will later be localized into right-to-left scripts.
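As a rough illustration of the kind of decision such markup encodes, the sketch below guesses a base direction from the first strongly directional character in a string and wraps the text in an element with a matching `dir` attribute. It is not taken from the Note: the function names are hypothetical, the first-strong rule is only a heuristic, and it ignores embeddings, isolates, and everything else the full Unicode bidirectional algorithm handles.

```python
import html
import unicodedata


def first_strong_direction(text):
    """Return "ltr" or "rtl" from the first strong character, or None if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return None


def wrap_with_dir(text, tag="span"):
    """Wrap text in an element carrying an explicit dir attribute when one can be guessed."""
    direction = first_strong_direction(text)
    escaped = html.escape(text)
    if direction is None:
        return f"<{tag}>{escaped}</{tag}>"
    return f'<{tag} dir="{direction}">{escaped}</{tag}>'
```

In authored content the direction is normally known and should be declared directly; a guess like this is mainly useful for text whose origin is not known in advance.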
Editor: Richard Ishida. [search keys: tr-i18n-html-tech-bidi]
New language tag specification, RFC 5646, published
The IETF has published RFC 5646, an update of Tags for Identifying Languages. This specification obsoletes former RFCs 4646, 3066 and 1766.
RFC 5646 makes it possible to use over 7,000 three-letter ISO 639-3 language codes, in addition to the two-letter codes that have been in use for some time. It also introduces 220 'extended language' subtags, mainly for backwards compatibility.
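To show how these subtag types sit together in a tag, here is a minimal sketch that splits a hyphen-separated tag by the shape of each subtag. The function name and the returned dictionary layout are assumptions for illustration. It only handles two- or three-letter primary language subtags, does not classify variants, extensions, or private-use subtags, and does not check anything against the registry, so a well-shaped result does not mean the tag is valid.

```python
import re


def split_language_tag(tag):
    """Split a tag into language, extlang, script and region by subtag shape only."""
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", subtags[0]):
        raise ValueError("expected a 2- or 3-letter primary language subtag")
    result = {
        "language": subtags[0].lower(),
        "extlang": [],
        "script": None,
        "region": None,
        "rest": [],
    }
    i = 1
    # Extended language subtags: three letters, directly after the language.
    while (i < len(subtags) and len(result["extlang"]) < 3
           and re.fullmatch(r"[A-Za-z]{3}", subtags[i])):
        result["extlang"].append(subtags[i].lower())
        i += 1
    # Script subtag: four letters, conventionally title case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        result["script"] = subtags[i].title()
        i += 1
    # Region subtag: two letters (conventionally upper case) or three digits.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        result["region"] = subtags[i].upper()
        i += 1
    # Everything else is left unclassified in this sketch.
    result["rest"] = subtags[i:]
    return result
```

For example, `split_language_tag("zh-cmn-Hans-CN")` yields language `zh`, extended language `cmn`, script `Hans`, and region `CN`.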
It remains best to refer to this specification as BCP 47. This is a stable name and web address that points to the latest relevant RFCs.
The Internationalization Working Group at the W3C is working on an article to help users choose language tags, given the various types of subtag that are now available, and the sheer number of subtags.
You can look up language and other subtags in the IANA Language Subtag Registry.
(Richard Ishida has provided an unofficial tool for searching the registry that also provides advice for choosing subtags, and allows you to partially validate a hyphen-separated language tag.)
Updated tests: Web fonts
The tests of font linking and EOT fonts were updated, along with the associated results pages. The number of tests was reduced to a single test per script, but test cases were created for HTML4, XHTML 1.1 and XHTML served as both text/html and XML. In addition, links to font licence information were added to the test notes. The Urdu font was also updated.
The tests are linked from here:
The results can be found here:
tcworld article about Japanese Requirements Note
tcworld magazine has published an article by Tony Graham about the recently published W3C Note, Requirements for Japanese Text Layout.
Updated tests: HTML and CSS and text direction
Continuing the work of repackaging the tests in the Internationalization test suite, around 87 more tests, this time relating to right-to-left and bidirectional text, have been updated. Each of the 87 tests is implemented for HTML 4.0, XHTML 1.0 served as text/html, XHTML 1.0 served as XML, and XHTML 1.1 served as XML (i.e. around 350 test cases in total).
There are also tables covering the results of the tests, and summaries of the findings. Most of these are new. The tests were run on recent versions of major browsers.
The tests and results are linked from here:
(Note that the vertical text tests are not included in this announcement, since they are still in the early stages of development.)
Updated tests: HTML and CSS character encodings and language declarations
As part of the ongoing work of repackaging the tests in the Internationalization test suite, around 70 tests relating to character encodings and language declarations have been updated. Each of the 70 tests is implemented for HTML 4.0, XHTML 1.0 served as text/html, XHTML 1.0 served as XML, and XHTML 1.1 served as XML (i.e. around 280 test cases in total).
There are also tables covering the results of each test, and summaries of the findings. The tests were run on recent versions of major browsers.
The tests and results are linked from here:
Questions or comments? ishida@w3.org
Copyright © 1997-2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.