Monthly Archives: October 2009

Posts

October 23, 2009

Unicode Collation Algorithm Version 5.2 Released

Version 5.2 of the Unicode Collation Algorithm has been released. This version resynchronizes the Unicode Collation Algorithm with all of the updates for the Unicode Standard, Version 5.2.

The rest of this post is taken from the Unicode Consortium’s release notification and details changes and issues for implementations.

The text of UTS #10 has been updated. Among other changes, the revised text for UTS #10 makes it clear that the BASE for implicit generation of weights for Han characters does not include unassigned code points.
There are small changes in Gujarati, Telugu, Malayalam (including weighting for chillus), Tamil, and Sinhala. While these changes move in the direction of expected behavior, good results will only come from tailoring for particular languages, such as with CLDR.
There have been significant changes to the ordering of many combining marks. Many combining marks that are not in customary use in modern languages now have the same secondary weight, and will only be distinguished on a fourth level, by code point ordering. This can be seen by looking at the Unicode Collation Charts (http://unicode.org/charts/collation/). In 5.2, many characters now have a white background, indicating that they sort exactly the same as the previous character unless a fourth (code point) level is used.
Implementations of UCA should take note that the increased number of characters may cause overflows if the implementing code makes certain assumptions or optimizations. This can result either from the new character additions (which increase the number of distinct weights in the table) or because of changes in the way the weights, particularly for secondary weight values, are assigned in the table. The latter change may result in unexpected numbers of characters having the same weight.
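As an illustration of the implicit weights mentioned above for Han characters, here is a minimal Python sketch of how a pair of 16-bit weights can be derived from a code point and a BASE value. The function name and constant names are hypothetical, and the BASE values and formula are our reading of UTS #10, so check them against the specification before relying on them. Choosing the right BASE (including treating unassigned code points as "other") is left to the caller.

```python
# Sketch only: BASE values and formula as we read them in UTS #10.
# The names below are illustrative, not part of any library.
BASE_CJK = 0xFB40      # Han characters in the core unified/compatibility blocks
BASE_CJK_EXT = 0xFB80  # Han characters in the extension blocks
BASE_OTHER = 0xFBC0    # any other code point, including unassigned ones


def implicit_primary_weights(code_point, base):
    """Return the two 16-bit primary weights (AAAA, BBBB) for a code point."""
    aaaa = base + (code_point >> 15)        # high bits are added to BASE
    bbbb = (code_point & 0x7FFF) | 0x8000   # low 15 bits, top bit forced on
    return aaaa, bbbb


if __name__ == "__main__":
    for cp, base in [(0x4E00, BASE_CJK), (0x20000, BASE_CJK_EXT)]:
        aaaa, bbbb = implicit_primary_weights(cp, base)
        print(f"U+{cp:04X} -> [{aaaa:04X}] [{bbbb:04X}]")
```

Because both results are meant to fit in 16 bits, a sketch like this is also a convenient place to assert that assumption in code that stores weights in fixed-width fields.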

Tags: unicode

Categories: Highlight, Miscellaneous, w3cWebDesign, w3cWebUserAgents

October 9, 2009

Article for wide review: Choosing a language tag

Read the article

Comments are being sought on this article prior to final release. Please send any comments to www-international@w3.org. We expect to publish a final version in one to two weeks.

Tags: qa-choosing-language-tags

Categories: For review, Highlight, w3cSemanticWeb, w3cWebArchitecture, w3cWebDesign, w3cWebOfDevices, w3cWebServices, w3cWebUserAgents, w3cXMLCore

Updated article: Language tags in HTML and XML

Read the article

This tutorial was updated to incorporate changes made to BCP 47 by the recent publication of RFC 5646. Changes to BCP 47 include the introduction of extended language subtags, and the addition of ISO 639-3 language subtags, bringing the total number of subtags in the registry to almost 8,000.
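To illustrate the extended language subtags mentioned above, here is a minimal Python sketch that splits a tag such as zh-cmn-Hans-CN into its language, extended language, script and region subtags. The function name split_language_tag is hypothetical, and the patterns are an assumption based on the subtag lengths given in RFC 5646. The sketch does not consult the subtag registry, and it leaves variants, extensions and private-use subtags unparsed in "rest"; grandfathered tags are not handled.

```python
import re


def split_language_tag(tag):
    """Split a simple BCP 47 tag into its leading subtags (sketch only)."""
    subtags = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", subtags[0]):
        raise ValueError(f"unsupported primary language subtag: {subtags[0]!r}")

    result = {
        "language": subtags[0].lower(),
        "extlang": [],
        "script": None,
        "region": None,
        "rest": [],
    }
    i = 1
    # Up to three 3-letter extended language subtags may follow the language.
    while (i < len(subtags) and len(result["extlang"]) < 3
           and re.fullmatch(r"[A-Za-z]{3}", subtags[i])):
        result["extlang"].append(subtags[i].lower())
        i += 1
    # Script: four letters, conventionally written in title case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        result["script"] = subtags[i].title()
        i += 1
    # Region: two letters or three digits, conventionally upper case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        result["region"] = subtags[i].upper()
        i += 1
    result["rest"] = subtags[i:]
    return result


if __name__ == "__main__":
    print(split_language_tag("zh-cmn-Hans-CN"))
    print(split_language_tag("en-US"))
```

A real implementation would validate each subtag against the registry rather than relying on length alone.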

Translators should consider retranslating the whole tutorial.

Tags: article-language-tags

Categories: Articles, Highlight, Update, w3cSemanticWeb, w3cWebArchitecture, w3cWebDesign, w3cWebOfDevices, w3cWebServices, w3cWebUserAgents, w3cXMLCore

October 7, 2009

Unicode 5.2.0 Released

On 1st October, Unicode 5.2 was released! The data files, code charts, and Unicode Standard Annexes for this version are final and are posted on the Unicode site.

For Unicode 5.2, the core specification is no longer just a delta document applied to the book; instead, the entire core specification, with all textual changes integrated, will be available on the Unicode site. As of this announcement, the first five chapters are available; the other chapters will follow soon.

For full details about what is new or changed in this release, see the version documentation for Unicode 5.2.

Tags: unicode

Categories: Highlight, Miscellaneous, w3cSemanticWeb, w3cWebArchitecture, w3cWebDesign, w3cWebOfDevices, w3cWebServices, w3cWebUserAgents, w3cXMLCore

October 1, 2009

New Working Group Note: Requirements for String Identity Matching and String Indexing

On 15th September, the Internationalization Core Working Group published Requirements for String Identity Matching and String Indexing as a Working Group Note.

This document is being published as a Working Group Note in order to capture and preserve historical information. It contains requirements elaborated in 1998 for aspects of the character model for W3C specifications. It was developed and extensively reviewed by the Internationalization Working Group, but never progressed beyond Working Draft status. For this publication, the wording of the 1998 version remains unchanged (except for correction of a small number of typographic errors), but the links to references have been updated.

The document describes requirements for some important aspects of the character model for W3C specifications. The two aspects discussed are string identity matching and string indexing.
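To show why these two aspects need explicit requirements, here is a minimal Python sketch. It is not taken from the Note: the function names are hypothetical, and comparing after NFC normalization is just one possible approach to identity matching, assumed here for illustration. The second function shows that the "length" of a string, and therefore any index into it, depends on whether you count code points, UTF-16 code units or UTF-8 bytes.

```python
import unicodedata


def identical_after_normalization(a, b, form="NFC"):
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def index_units(s):
    """Count the same string in three different indexing units."""
    return {
        "code_points": len(s),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "utf8_bytes": len(s.encode("utf-8")),
    }


if __name__ == "__main__":
    precomposed = "\u00e9"   # e with acute as a single code point
    decomposed = "e\u0301"   # e followed by a combining acute accent
    print(precomposed == decomposed)
    print(identical_after_normalization(precomposed, decomposed))
    print(index_units("a\U00010000"))  # one BMP and one supplementary character
```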

Editor: Martin Dürst.

Tags: tr-charreq

Categories: Articles, New resource
