Internationalization (i18n)

Making the World Wide Web worldwide!

Unicode Collation Algorithm Version 5.2 Released

October 23, 2009

Version 5.2 of the Unicode Collation Algorithm has been released. This version resynchronizes the Unicode Collation Algorithm with all of the updates for the Unicode Standard, Version 5.2.

The rest of this post is taken from the Unicode Consortium’s release notification and details changes and issues for implementations.

The text of UTS #10 has been updated. Among other changes, the revised text for UTS #10 makes it clear that the BASE for implicit generation of weights for Han characters does not include unassigned code points.
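
As a rough illustration of the implicit weight generation mentioned above, the following Python sketch derives the two primary weights for a code point that has no explicit entry in the collation table. The BASE constants and the bit arithmetic follow the derivation in UTS #10 as best understood here and should be checked against the specification. The function name and the two boolean flags are hypothetical; the caller is assumed to supply the property lookups.

```python
def implicit_primary_weights(cp, is_unified_ideograph, in_core_cjk_block):
    """Return the (AAAA, BBBB) primary weights for a code point without an
    explicit table entry.

    The flags are assumptions of this sketch: a real implementation would
    look them up in the Unicode Character Database for the version in use.
    """
    if is_unified_ideograph and in_core_cjk_block:
        base = 0xFB40  # Han characters in the core CJK blocks
    elif is_unified_ideograph:
        base = 0xFB80  # Han characters in other blocks (extensions)
    else:
        base = 0xFBC0  # any other code point, including unassigned ones
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


# An assigned Han character in the core block uses the Han BASE.
print([hex(w) for w in implicit_primary_weights(0x4E00, True, True)])
# A code point the caller treats as unassigned falls through to the last BASE,
# even if it lies inside a CJK block range.
print([hex(w) for w in implicit_primary_weights(0x9FFF, False, False)])
```

The point of the clarified text is visible in the last branch: a code point is given a Han BASE only when it is actually an assigned ideograph, not merely because it falls within a block's range.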
There are small changes in Gujarati, Telugu, Malayalam (including weighting for chillus), Tamil, and Sinhala. While these changes move in the direction of expected behavior, good results will only come from tailoring for particular languages, such as with CLDR.
There have been significant changes to the ordering of many combining marks. Many combining marks that are not in customary use in modern languages now have the same secondary weight, and will only be distinguished on a fourth level, by code point ordering. This can be seen by looking at the Unicode Collation Charts (http://unicode.org/charts/collation/). In 5.2, many characters now have a white background, indicating that they sort exactly the same as the previous character unless a fourth (code point) level is used.
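
The effect of shared secondary weights can be sketched with a toy sort key in Python. The weight values and the table below are invented for illustration and are not taken from the actual collation table; the sketch also skips normalization, contractions, and variable weighting, all of which a real implementation would need.

```python
# Hypothetical collation elements: (primary, secondary, tertiary).
# These numbers are illustrative only and do NOT come from the real table.
TOY_WEIGHTS = {
    "a": (0x15A3, 0x0020, 0x0002),
    "\u1DC4": (0x0000, 0x00FF, 0x0002),  # a rarely used combining mark
    "\u1DC5": (0x0000, 0x00FF, 0x0002),  # another one, same secondary weight
}


def sort_key(text, levels=3):
    """Build a simplified level-by-level sort key for `text`.

    Zero weights are skipped at each level. With levels=4, the code points
    themselves are appended as a final tie-breaking level.
    """
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = []
    for level in range(3):
        key.append(tuple(ce[level] for ce in elements if ce[level] != 0))
    if levels == 4:
        key.append(tuple(ord(ch) for ch in text))
    return tuple(key)


first, second = "a\u1DC4", "a\u1DC5"
print(sort_key(first) == sort_key(second))        # identical on three levels
print(sort_key(first, 4) < sort_key(second, 4))   # ordered only by code point
```

With three levels the two strings are indistinguishable, which corresponds to the white-background entries in the charts; only the extra code point level gives them a stable relative order.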
Implementations of UCA should take note that the increased number of characters may cause overflows if the implementing code makes certain assumptions or optimizations. This can result either from the new character additions (which increase the number of distinct weights in the table) or because of changes in the way the weights, particularly for secondary weight values, are assigned in the table. The latter change may result in unexpected numbers of characters having the same weight.
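
One way such an overflow can arise is when an implementation packs each collation element into fixed-width bit fields. The Python sketch below assumes a hypothetical 16/8/8-bit layout (not a layout prescribed by UTS #10) and shows a range check that fails loudly instead of silently corrupting neighboring fields when a weight no longer fits.

```python
def pack_collation_element(primary, secondary, tertiary,
                           primary_bits=16, secondary_bits=8, tertiary_bits=8):
    """Pack three weights into one integer, checking each field's range.

    The field widths are assumptions of this sketch; an implementation
    should choose widths based on the table it actually loads.
    """
    fields = (
        ("primary", primary, primary_bits),
        ("secondary", secondary, secondary_bits),
        ("tertiary", tertiary, tertiary_bits),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise OverflowError(
                f"{name} weight {value:#x} does not fit in {bits} bits"
            )
    return (
        (primary << (secondary_bits + tertiary_bits))
        | (secondary << tertiary_bits)
        | tertiary
    )


print(hex(pack_collation_element(0x15A3, 0x20, 0x02)))
try:
    pack_collation_element(0x15A3, 0x100, 0x02)  # secondary needs 9 bits
except OverflowError as err:
    print("rejected:", err)
```

Running a check of this kind over every entry when a new version of the table is loaded would surface a field that has become too narrow before it affects sort results.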

Tags: unicode

Categories: Highlight, Miscellaneous, w3cWebDesign, w3cWebUserAgents

Copyright © 2023 World Wide Web Consortium.
W3C® liability, trademark and permissive license rules apply.
Questions or comments? ishida@w3.org