W3C   Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Category: Miscellaneous

Posts

23 October 2009

Unicode Collation Algorithm Version 5.2 Released

Version 5.2 of the Unicode Collation Algorithm has been released. This version resynchronizes the Unicode Collation Algorithm with all of the updates for the Unicode Standard, Version 5.2.

The rest of this post is taken from the Unicode Consortium's release notification and details changes and issues for implementations.

  • The text of UTS #10 has been updated. Among other changes, the revised text for UTS #10 makes it clear that the BASE for implicit generation of weights for Han characters does not include unassigned code points.
  • There are small changes in Gujarati, Telugu, Malayalam (including weighting for chillus), Tamil, and Sinhala. While these changes move in the direction of expected behavior, good results will only come from tailoring for particular languages, such as with CLDR.
  • There have been significant changes to the ordering of many combining marks. Many combining marks that are not in customary use in modern languages now have the same secondary weight, and will only be distinguished on a fourth level, by code point ordering. This can be seen by looking at the Unicode Collation Charts (http://unicode.org/charts/collation/). In 5.2, many characters now have a white background, indicating that they sort exactly the same as the previous character unless a fourth (code point) level is used.
  • Implementations of UCA should take note that the increased number of characters may cause overflows if the implementing code makes certain assumptions or optimizations. This can result either from the new character additions (which increase the number of distinct weights in the table) or because of changes in the way the weights, particularly for secondary weight values, are assigned in the table. The latter change may result in unexpected numbers of characters having the same weight.
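
The effect of shared secondary weights and a code point tie-breaker can be illustrated with a minimal sketch. The weight table below is a toy invented for illustration (it is not DUCET data, and the choice of U+0351 and U+0357 as the marks that share a weight is an assumption); the function names are hypothetical too. It builds a simplified multi-level sort key and shows that two marks with the same secondary weight compare equal unless a fourth, code point level is appended. The last helper shows the kind of range check an implementation might run before packing weights into a fixed number of bits.

```python
# Toy weights for illustration only, NOT real UCA/DUCET values.
# Each entry is (primary, secondary, tertiary).
TOY_WEIGHTS = {
    "a": (10, 1, 1),
    "b": (11, 1, 1),
    "\u0301": (0, 2, 1),  # combining acute accent: its own secondary weight
    "\u0351": (0, 3, 1),  # assumed: two rarely used marks that
    "\u0357": (0, 3, 1),  # share a single secondary weight
}


def sort_key(text, use_code_point_level=False):
    """Build a simplified sort key: one tuple of non-zero weights per level."""
    levels = []
    for level in range(3):
        weights = (TOY_WEIGHTS[ch][level] for ch in text)
        levels.append(tuple(w for w in weights if w != 0))
    if use_code_point_level:
        # Fourth level: fall back to raw code point order as a tie-breaker.
        levels.append(tuple(ord(ch) for ch in text))
    return tuple(levels)


def fits_in_bits(level, bits):
    """Check that every weight on a level fits in the given number of bits."""
    return all(w[level] < (1 << bits) for w in TOY_WEIGHTS.values())


# The two marks are indistinguishable on the first three levels...
same = sort_key("a\u0351") == sort_key("a\u0357")
# ...and only ordered once the code point level is added.
ordered = sort_key("a\u0351", True) < sort_key("a\u0357", True)
```

A real implementation would take its weights from the published collation data and handle contractions, expansions, and variable weighting, none of which this sketch attempts.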
7 October 2009

Unicode 5.2.0 Released

On 1st October, Unicode 5.2 was released! The data files, code charts, and Unicode Standard Annexes for this version are final and are posted on the Unicode site.

For Unicode 5.2, the core specification is no longer just a delta document applied to the book; instead, the entire core specification, with all textual changes integrated, will be available on the Unicode site. As of this announcement, the first five chapters are available; the other chapters will follow soon.

For full details about what is new or changed in this release, see the version documentation for Unicode 5.2.

7 September 2009

New language tag specification, RFC 5646, published

The IETF has published RFC 5646, an update of Tags for Identifying Languages. This specification obsoletes former RFCs 4646, 3066 and 1766.

RFC 5646 makes it possible to use over 7,000 three-letter ISO 639-3 language codes, in addition to the two-letter codes that have been in use for some time. It also introduces 220 'extended language' subtags, mainly for backwards compatibility.

It continues to be best to refer to this specification as BCP47. This is a non-changing name and web address that points to the latest relevant RFCs.

The Internationalization Working Group at the W3C is working on an article to help users choose language tags, given the various types of subtag that are now available, and the sheer number of subtags.

You can look up language and other subtags in the IANA Language Subtag Registry.

(Richard Ishida has provided an unofficial tool for searching the registry that also provides advice for choosing subtags, and allows you to partially validate a hyphen-separated language tag.)
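
As a rough illustration of what partially validating a hyphen-separated language tag can mean, here is a minimal sketch that checks only the shape of the most common subtag types (language, extended language, script, region, variants). It is an assumption-laden simplification, not the registry tool mentioned above: it ignores extensions, private-use subtags, and grandfathered tags, allows at most one extended language subtag, and does not look anything up in the IANA registry. The function name `split_language_tag` is hypothetical.

```python
import re

# Simplified shape of a language tag: language[-extlang][-script][-region][-variant...]
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<extlang>[a-z]{3}))?"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def split_language_tag(tag):
    """Return a dict of the subtags found, or None if the shape is not recognized."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    # Drop subtag types that were not present in the tag.
    parts = {name: value for name, value in match.groupdict().items() if value}
    if "variants" in parts:
        parts["variants"] = parts["variants"].strip("-").split("-")
    return parts


print(split_language_tag("zh-Hant-TW"))
print(split_language_tag("zh-yue"))
print(split_language_tag("en_US"))  # underscore is not a valid separator
```

A tag that passes this check is only well-formed in outline; whether each subtag is actually registered still has to be checked against the registry.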
29 July 2009

tcworld article about Japanese Requirements Note

tcworld magazine has published an article by Tony Graham about the recently published W3C Note, Requirements for Japanese Text Layout.

Read the article

1 May 2009

ITS support in the Okapi framework

The Okapi Framework Team has announced the first milestone of its Java-based products. The framework provides cross-platform and open-source components and applications for localization tasks.

One of the components in this release is an XML filter based on an implementation of the W3C Internationalization Tag Set (ITS) Recommendation.

The filter allows access to the translatable content of an XML document, based on any external or internal global rules, as well as local rules. The ITS processor provided supports the following data categories: Translate, Localization Note, Element Within Text, Terminology, Directionality, and Language Information.
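
To give a feel for the Translate data category, the following is a minimal sketch of how local `its:translate` markup could be used to pick out translatable text. It is not the Okapi filter (which is Java-based) and covers much less ground: it handles only local `its:translate` attributes with inheritance, ignores global rules entirely, and treats element content as translatable by default. The sample document and the function name `translatable_text` are invented for illustration.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>User guide</title>
  <code its:translate="no">print(x) <note its:translate="yes">prints x</note></code>
  <p>Press <kbd its:translate="no">Ctrl</kbd> to continue.</p>
</doc>"""


def translatable_text(element, inherited=True):
    """Yield text runs that are translatable under local its:translate markup."""
    flag = element.get(TRANSLATE)
    # A local attribute overrides the inherited value; otherwise inherit.
    translate = inherited if flag is None else (flag == "yes")
    if translate and element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_text(child, translate)
        # Tail text follows the child but belongs to the current element.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


segments = list(translatable_text(ET.fromstring(SAMPLE)))
print(segments)
```

Note that this sketch splits a paragraph around inline elements; a real filter would use the Element Within Text data category to keep such runs together as one segment.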

Rainbow, an Okapi application, uses the filter to extract and merge translatable content to and from XLIFF. Many other utilities provided in the framework take advantage of the ITS-based filter as well, for example to perform pseudo-translation.

You can download the Okapi components and get their source code.

6 March 2009

Changes in the W3C Internationalization Team

At the end of February, Felix Sasaki left the W3C to take up a post at the University of Applied Sciences at Potsdam in Germany.

We wish Felix success for the future, and thank him for his dedication and hard work in supporting the internationalization effort for the past four years.

Categories: Miscellaneous, Highlight
26 March 2008

Internationalization Tag Set Interest Group Launched

The Internationalization Tag Set (ITS) Interest Group, chaired by Yves Savourel (ENLASO Corporation), was launched today. The ITS IG is a forum for fostering a community of users of the Internationalization Tag Set (ITS); it aims to promote adoption of ITS and to gather information for its further development. ITS defines data categories that may be used with schemas to support the internationalization and localization of XML-based documents. Participation in the new ITS IG is open to W3C Members and the public.
Categories: Miscellaneous, Highlight
21 February 2007

Internationalization Activity renewed with changes to Working Groups

The Internationalization (I18n) Activity has been renewed, and a new Internationalization Architecture Working Group, chaired by François Yergeau (Invited Expert), has been launched. The group is chartered to work on the Character Model documents on Resource Identifiers and Normalization, and on Language Tags and Locale Identifiers. The Internationalization Core Working Group, chaired by Addison Phillips (Yahoo!), has been rechartered to propose and coordinate technology to enable universal and worldwide access to the Web. The charters of the Internationalization Tag Set (ITS) Working Group, chaired by Yves Savourel (Enlaso), and the Internationalization Interest Group, chaired by Martin Dürst (Invited Expert), have been extended. The Internationalization Guidelines, Education & Outreach (GEO) Working Group has closed and its work has moved to the Core group. Calls for participation have been issued for the Core Working Group and the Architecture Working Group.
Categories: Miscellaneous, Highlight
15 February 2007

International developments in SSML

Read the article [PDF 82kb]

Paolo Baggia of Loquendo has written a short article to describe Speech Synthesis Markup Language (SSML) developments since 2004 related to internationalization. A series of workshops was held, leading to work on requirements for a new version 1.1 of SSML. A first Working Draft of SSML 1.1 was published in January of this year.

Categories: Miscellaneous
5 February 2007

Workshop on Internationalizing the Speech Synthesis Markup Language (SSML)

The third workshop aimed at internationalizing the Speech Synthesis Markup Language (SSML) was held in Hyderabad, India, on 13-14 January 2007. The summary and the minutes are now available. This workshop was more narrowly focused than the previous workshops, specifically targeting languages of the Indian subcontinent. The workshop was a success and the Voice Browser Working Group has started to review the requirements raised during the workshop with a view to improving international support in the SSML specification.
Categories: Miscellaneous, Highlight

Questions or comments? ishida@w3.org