

Unicode Conference speaker submission deadline 24 March

The Internationalization and Unicode® Conference (IUC) is the annual conference of the Unicode Consortium where experts and industry leaders gather to map the future of internationalization, ignite new ideas and present the latest in technologies and best practices for creation, management, and testing of global, Web, and multilingual software solutions.

The deadline for speaker submissions is Friday, 24 March, so don’t forget to send in an abstract if you want to speak at the conference.


Unicode Conference speaker submission deadline 4 April

For twenty-five years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. The 40th conference will be held this year on November 1-3, 2016 in Santa Clara, California.

The deadline for speaker submissions is Monday, 4 April, so don’t forget to send in an abstract if you want to speak at the conference.

The Program Committee will notify authors by Friday, May 13, 2016. Final presentation materials will be required from selected presenters by Friday, July 22, 2016.

Tutorial Presenters receive complimentary conference registration and two nights' lodging, while Session Presenters receive a fifty percent conference discount and two nights' lodging.

Proposed Update UTR #50, Unicode Vertical Text Layout

A Proposed Update of UTR #50 is now available for public review and comment. The UTR is being reissued with a set of data updated to the character repertoire of Unicode Version 8.0. In this revision, four characters are added to the arrows tailoring set. For details on the proposed changes in the data, please refer to the Modifications section in the UTR.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the PRI #309 page.


Announcing The Unicode® Standard, Version 8.0

Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts. For full details on Version 8.0, see Unicode 8.0.

The first version of Unicode Technical Report #51, Unicode Emoji, is being released at the same time. That document describes the new emoji characters. It provides design guidelines and data for improving emoji interoperability across platforms, gives background information about emoji symbols, and describes how they are selected for inclusion in the Unicode Standard. The data is used to support emoji characters in implementations, specifying which symbols are commonly displayed as emoji, how the new skin-tone modifiers work, and how composite emoji can be formed with joiners. The Unicode website now supplies charts of emoji characters, showing vendor variations and providing other useful information.
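As a rough illustration of the two mechanisms mentioned above, the following Python sketch builds a skin-tone modifier sequence and a joiner-based composite. The helper names `with_skin_tone` and `join_emoji` are hypothetical, chosen only for this example; whether a given sequence renders as a single glyph depends on the platform and font.

```python
# Code points used for illustration.
THUMBS_UP = "\U0001F44D"         # THUMBS UP SIGN
MEDIUM_SKIN_TONE = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4
ZWJ = "\u200D"                   # ZERO WIDTH JOINER


def with_skin_tone(base: str, modifier: str) -> str:
    """Return a modifier sequence: the base emoji directly followed by the modifier."""
    return base + modifier


def join_emoji(*parts: str) -> str:
    """Return a composite sequence: the given emoji separated by ZERO WIDTH JOINER."""
    return ZWJ.join(parts)


toned = with_skin_tone(THUMBS_UP, MEDIUM_SKIN_TONE)
# MAN, WOMAN, GIRL joined into one candidate "family" sequence.
family = join_emoji("\U0001F468", "\U0001F469", "\U0001F467")

# Both results are still ordinary strings made of several code points.
print([f"U+{ord(c):04X}" for c in toned])
print([f"U+{ord(c):04X}" for c in family])
```

Because both sequences are plain strings of several code points, code that counts, truncates, or reverses text by code point can split them apart.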

Some of the changes in Version 8.0 and associated Unicode technical standards may require modifications in implementations. For more information, see Unicode 8.0 Migration and the migration sections of UTS #10, UTS #39, and UTS #46.


Unicode 8.0 Beta Review

The Unicode® Consortium announced the start of the beta review for Unicode 8.0.0, which is scheduled for release in June 2015. All beta feedback must be submitted by April 27, 2015.

Unicode 8.0.0 comprises several changes which require careful migration in implementations, including the conversion of Cherokee to a bicameral script, a different encoding model for New Tai Lue, and additional character repertoire. Implementers need to change code and check assumptions regarding case mappings, New Tai Lue syllables, Han character ranges, and confusables. Character additions in Unicode 8.0.0 include emoji symbol modifiers for implementing skin tone diversity, other emoji symbols, a large collection of CJK unified ideographs, a new currency sign for the Georgian lari, and six new scripts. For more information on emoji in Unicode 8.0.0, see the associated draft Unicode Emoji report.
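One way to check the case-mapping assumption for Cherokee is sketched below in Python. It assumes the interpreter's bundled Unicode Character Database is at version 8.0 or later (readable from `unicodedata.unidata_version`). The helper names `supports_unicode_8` and `case_report` are hypothetical.

```python
import unicodedata

CHEROKEE_LETTER_A = "\u13A0"        # encoded before 8.0, now the uppercase form
CHEROKEE_SMALL_LETTER_A = "\uAB70"  # lowercase form added in 8.0


def supports_unicode_8() -> bool:
    """True if this interpreter's character database is Unicode 8.0 or later."""
    major = int(unicodedata.unidata_version.split(".")[0])
    return major >= 8


def case_report(ch: str) -> dict:
    """Collect the case mappings that code written before 8.0 may have assumed to be no-ops."""
    return {
        "lower": ch.lower(),
        "upper": ch.upper(),
        "casefold": ch.casefold(),
        "is_lower": ch.islower(),
    }


if supports_unicode_8():
    for ch in (CHEROKEE_LETTER_A, CHEROKEE_SMALL_LETTER_A):
        report = case_report(ch)
        print(f"U+{ord(ch):04X}", {k: (f"U+{ord(v):04X}" if isinstance(v, str) else v)
                                   for k, v in report.items()})
```

Code that treated Cherokee text as caseless, for example by skipping case folding before comparing identifiers, is the kind of assumption the beta notes ask implementers to re-check.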

Please review the documentation, adjust code, test the data files, and report errors and other issues to the Unicode Consortium by April 27, 2015. Feedback instructions are on the beta page.

See more information about testing the 8.0.0 beta. See the current draft summary of Unicode 8.0.0.


Unicode 7.0 Paperback Available

The Unicode 7.0 core specification is now available in paperback book form.

Responding to requests, the editorial committee has created a pair of modestly priced print-on-demand volumes that contain the complete text of the core specification of Version 7.0 of the Unicode Standard.

The form factor of this edition has been changed from US letter to 6×9 inch US trade paperback size, making the two volumes more compact than previous versions. The two volumes may be purchased separately or together. The cost for the pair is US$16.27, plus postage and applicable taxes. Please visit to order.

Note that these volumes do not include the Version 7.0 code charts, nor do they include the Version 7.0 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website.


Announcing The Unicode Standard, Version 7.0

Version 7.0 of the Unicode Standard is now available, adding 2,834 new characters. This latest version adds the new currency symbols for the Russian ruble and Azerbaijani manat, approximately 250 emoji (pictographic symbols), many other symbols, and 23 new lesser-used and historic scripts, as well as character additions to many existing scripts. These additions extend support for written languages of North America, China, India, other Asian countries, and Africa. See the link above for full details.

Most of the new emoji characters derive from characters in long-standing and widespread use in Wingdings and Webdings fonts.

Major enhancements were made to the Indic script properties. New property values were added to enable a more algorithmic approach to rendering Indic scripts. These include properties for joining behavior, new classes for numbers, and a further division of the syllabic categories of viramas and rephas. With these enhancements, the default rendering for newly added Indic scripts can be significantly improved.

Unicode character properties were extended to the new characters. Existing characters received enhancements to their Script and Alphabetic properties and to their casing and line-breaking behavior. There were also nearly 3,000 new Cantonese pronunciation entries, as well as new or clarified stability policies for promoting interoperable implementations.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 7.0. These will be released at the same time:

UTS #10, Unicode Collation Algorithm — the standard for sorting Unicode text
UTS #46, Unicode IDNA Compatibility Processing — for processing of non-ASCII URLs (IDNs)
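To give a sense of what processing a non-ASCII domain name involves, here is a small Python sketch. Note the assumption: Python's built-in `idna` codec implements the older IDNA 2003 rules, not UTS #46 itself, so this only illustrates the general idea of mapping a Unicode label to its ASCII form. The domain `bücher.example` and the helper names are made up for the example.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode-based) form.

    Uses the standard library's IDNA 2003 codec; a UTS #46 implementation
    may map or reject some inputs differently.
    """
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Convert an ASCII domain name with xn-- labels back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


ascii_form = to_ascii_domain("bücher.example")
print(ascii_form)
print(to_unicode_domain(ascii_form))
```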


New version of Unicode Ideographic Variation Database released

The Unicode Consortium is pleased to announce the release of version 2014-05-16 of the Unicode Ideographic Variation Database (IVD). This release registers the new Moji_Joho collection, along with the first 10,710 sequences in that collection, 9,685 of which are shared by the registered Hanyo-Denshi collection. Details can be found at


CLDR Version 25 Released

Unicode CLDR 25 has been released, providing an update to the key building blocks for software supporting the world’s languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Unicode CLDR 25 focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:

  • New rules for plural ranges (1-2 liters) for 72 locales, plurals for 2 locales, and ordinals for 18 locales.
  • Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
  • Two new locales: West Frisian (fy) and Uyghur (ug).
  • Two new metazones: Mexico_Pacific and Mexico_Northwest.
  • Updated zh pinyin and zhuyin collations and translators for Unicode 6.3 kMandarin data.
  • Updated keyboard layout data for OS X, Windows, and others.

This version contains data for 238 languages and 259 territories—740 locales in all.

Details are provided in, along with a detailed Migration section.


The Unicode Standard, Version 6.3 published

The Unicode Consortium has announced Version 6.3 of the Unicode Standard and with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout and provides a mechanism for isolating runs of text.

Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many others. The display and positioning of parentheses will better match the normal behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and an improved ability for inserting text and assembling user interface elements.
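The isolation mechanism can be sketched in a few lines of Python. The code points are the isolate format characters introduced in Version 6.3. The helper names `isolate` and `build_message` are hypothetical, and the template is invented for illustration.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in an isolate so its direction does not affect surrounding text.

    direction is "ltr", "rtl", or "auto" (use the first strong character).
    """
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return opener + text + PDI


def build_message(template: str, **values: str) -> str:
    """Fill a str.format template, isolating every inserted value."""
    return template.format(**{key: isolate(value) for key, value in values.items()})


# A Hebrew user name inserted into an English sentence next to a number.
message = build_message("{name} uploaded {count} files",
                        name="\u05D3\u05E0\u05D9", count="3")
print(repr(message))
```

Using FSI by default means the caller does not need to know the direction of the inserted text in advance, which suits messages assembled from different sources.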

The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.

In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs — that they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text which will preserve the specific required shapes of these CJK ideographs, even under Unicode normalization.
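The normalization issue can be seen with Python's `unicodedata` module. In this sketch, U+F900 is a CJK compatibility ideograph that decomposes to U+8C48. Pairing U+8C48 with VARIATION SELECTOR-1 as its standardized variation sequence is an assumption made for illustration; the authoritative list is the StandardizedVariants.txt data file. The helper name `survives_normalization` is hypothetical.

```python
import unicodedata

COMPAT_IDEOGRAPH = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900
UNIFIED_IDEOGRAPH = "\u8C48"  # the unified ideograph it decomposes to
VS1 = "\uFE00"                # VARIATION SELECTOR-1

# Assumed variation sequence for the compatibility ideograph above.
VARIATION_SEQUENCE = UNIFIED_IDEOGRAPH + VS1


def survives_normalization(text: str, form: str = "NFC") -> bool:
    """True if normalizing text to the given form leaves it unchanged."""
    return unicodedata.normalize(form, text) == text


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form,
          "compat:", survives_normalization(COMPAT_IDEOGRAPH, form),
          "sequence:", survives_normalization(VARIATION_SEQUENCE, form))
```

The compatibility ideograph is replaced by the unified ideograph under every normalization form, while the base character plus variation selector passes through unchanged, so the information about the intended shape is kept.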

Version 6.3 includes other improvements as well:

  • Improved Unihan data to better align with ISO/IEC 10646
  • Better support for Hebrew word break behavior and for ideographic space in line breaking

