Version 7.0 of the Unicode Standard is now available, adding 2,834 new characters. This latest version adds the new currency symbols for the Russian ruble and Azerbaijani manat, approximately 250 emoji (pictographic symbols), many other symbols, and 23 new lesser-used and historic scripts, as well as character additions to many existing scripts. These additions extend support for written languages of North America, China, India, other Asian countries, and Africa.
Most of the new emoji characters derive from characters in long-standing and widespread use in Wingdings and Webdings fonts.
Major enhancements were made to the Indic script properties. New property values were added to enable a more algorithmic approach to rendering Indic scripts. These include properties for joining behavior, new classes for numbers, and a further division of the syllabic categories of viramas and rephas. With these enhancements, the default rendering for newly added Indic scripts can be significantly improved.
Unicode character properties were extended to the new characters. The old characters have enhancements to Script and Alphabetic properties, and casing and line-breaking behavior. There were also nearly 3,000 new Cantonese pronunciation entries, as well as new or clarified stability policies for promoting interoperable implementations.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 7.0. These will be released at the same time:
The Unicode Consortium is pleased to announce the release of version 2014-05-16 of the Unicode Ideographic Variation Database (IVD). This release registers the new Moji_Joho collection, along with the first 10,710 sequences in that collection, 9,685 of which are shared by the registered Hanyo-Denshi collection. Details can be found at http://www.unicode.org/ivd/.
Unicode CLDR 25 has been released, providing an update to the key building blocks for software supporting the world’s languages. This data is used by a wide spectrum of companies for software internationalization and localization: adapting software to the conventions of different languages for common software tasks.
Unicode CLDR 25 focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:
- New rules for plural ranges (1-2 liters) for 72 locales, plurals for 2 locales, and ordinals for 18 locales.
- Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
- Two new locales: West Frisian (fy) and Uyghur (ug).
- Two new metazones: Mexico_Pacific and Mexico_Northwest.
- Updated zh pinyin & zhuyin collations and translators for Unicode 6.3 kMandarin data.
- Updated keyboard layout data for OS X, Windows and others.
This version contains data for 238 languages and 259 territories—740 locales in all.
Details are provided in http://cldr.unicode.org/index/downloads/cldr-25, along with a detailed Migration section.
The Unicode Consortium has announced Version 6.3 of the Unicode Standard and with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout and provides a mechanism for isolating runs of text.
Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many others. The display and positioning of parentheses will better match the normal behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and an improved ability for inserting text and assembling user interface elements.
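As a rough illustration of isolating runs of text, the sketch below wraps an inserted string in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069), two of the new bidi format characters, before placing it in a message. The helper names `isolate` and `build_message` and the sample strings are hypothetical, and the actual display depends on a rendering engine that implements the Version 6.3 algorithm.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: ends the isolated run


def isolate(text: str) -> str:
    """Wrap text so its direction is resolved independently of its surroundings."""
    return f"{FSI}{text}{PDI}"


def build_message(user_name: str, item_count: int) -> str:
    # The user name may come from another source and may be Arabic or Hebrew.
    # Isolating it is intended to keep the digits and punctuation that follow
    # from being reordered around it.
    return f"{isolate(user_name)}: {item_count} items"


# Hypothetical Hebrew user name inserted into an English message.
message = build_message("\u05D3\u05E0\u05D9", 3)
```

The isolate characters are invisible; they only affect how a conforming renderer orders the surrounding text.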
The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.
In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs — that they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text which will preserve the specific required shapes of these CJK ideographs, even under Unicode normalization.
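The normalization issue can be seen with a small sketch using Python's standard `unicodedata` module. It assumes, for illustration, that U+8C48 followed by VARIATION SELECTOR-1 (U+FE00) is the standardized variation sequence corresponding to the compatibility ideograph U+F900; the authoritative pairings are listed in the StandardizedVariants.txt data file.

```python
import unicodedata

compat = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"  # the unified ideograph it canonically decomposes to
vs1 = "\uFE00"      # VARIATION SELECTOR-1

# Normalization replaces the compatibility ideograph with the unified one,
# so any distinction in shape carried by the code point is lost.
normalized = unicodedata.normalize("NFC", compat)

# A base character followed by a variation selector is left unchanged by
# normalization, so the sequence survives intact.
sequence = unified + vs1  # assumed pairing, see the note above
normalized_sequence = unicodedata.normalize("NFC", sequence)

print(normalized == unified)             # compatibility ideograph was replaced
print(normalized_sequence == sequence)   # variation sequence was preserved
```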
Version 6.3 includes other improvements as well:
- Improved Unihan data to better align with ISO/IEC 10646
- Better support for Hebrew word break behavior and for ideographic space in line breaking
The Unicode Technical Committee (UTC) document register is now freely available for public access. This change has been made to increase public involvement in the ongoing deliberations of the UTC in its work developing and maintaining the Unicode Standard and other related standards and reports. Open access to the document register makes it easier to search both current and historical documents for topics of interest, using widely available search engines. The UTC document register contains online documents dating back to 1997 and online registers for paper document distributions dating back to 1991.
A new FAQ page devoted to the topic of private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the fine points about their usage and about differences from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.
Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.
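For readers who want a concrete picture of the distinction, here is a minimal sketch that classifies a code point as a noncharacter or a private-use character. The function names are hypothetical; the ranges are the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) and the three private-use ranges defined by the Unicode Standard.

```python
MAX_CODE_POINT = 0x10FFFF


def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 code points designated as noncharacters."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+xxFFFE and U+xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """Return True for code points in the private-use areas."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return (
        0xE000 <= cp <= 0xF8FF          # Private Use Area in the BMP
        or 0xF0000 <= cp <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD   # Supplementary Private Use Area-B
    )


noncharacter_count = sum(is_noncharacter(cp) for cp in range(MAX_CODE_POINT + 1))
```

Note that U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF sit at the ends of the private-use planes but are noncharacters, not private-use characters.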
Responding to requests, the editorial committee has created a modestly-priced print-on-demand volume that contains the complete text of the core specification of Version 6.2 of the Unicode Standard. This 692-page volume may be purchased from Lulu.com for $17.24, plus shipping (prices are available in some other local currencies).
Note that this volume does not include the Version 6.2 code charts, nor does it include the Version 6.2 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website.
The Unicode Consortium has announced the release of Version 6.1 of the Unicode Standard, continuing Unicode’s long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added.
This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate.
Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:
26FA FE0F TENT emoji style
26FD FE0E FUEL PUMP text style
26FD FE0F FUEL PUMP emoji style
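A short sketch of how such sequences are built in code: the base character is followed directly by VARIATION SELECTOR-15 (U+FE0E) for text style or VARIATION SELECTOR-16 (U+FE0F) for emoji style. The helper names below are hypothetical, and whether the requested style is honored depends on the fonts and renderer in use.

```python
TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

FUEL_PUMP = "\u26FD"


def with_style(base: str, selector: str) -> str:
    """Append a variation selector to a single base character."""
    if len(base) != 1:
        raise ValueError("expected exactly one base character")
    return base + selector


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


fuel_pump_text = with_style(FUEL_PUMP, TEXT_STYLE)
fuel_pump_emoji = with_style(FUEL_PUMP, EMOJI_STYLE)

print(code_points(fuel_pump_text))
print(code_points(fuel_pump_emoji))
```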
Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
- UTS #10, Unicode Collation Algorithm
- UTS #46, Unicode IDNA Compatibility Processing
The Unicode® Consortium will close its call for participation in the 35th Internationalization & Unicode® Conference (IUC 35) on Friday, March 25. If you want to talk at the conference, you should submit your proposal soon.
The Program Committee will notify authors by Wednesday, April 20. Final presentation materials will be required from selected presenters by Wednesday, August 3.
The conference will take place in Santa Clara, Calif., USA; October 17-19, 2011, sponsored by Adobe. The conference is produced by OMG®.
This is the premier conference on technologies and practices for the creation and management of global and multilingual software solutions. This annual event is praised for its excellent technical content, industry-tested recommendations and updates on the latest standards.
The Unicode 6.0 core specification includes information on scripts newly encoded in Unicode 6.0, as well as many updates and clarifications to other sections of the text. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.0.
In Version 6.0, the standard grew by 2,088 characters. Over 1,000 of these characters are symbols used for text exchange on mobile phones. The Unicode Standard now also includes the recently created official symbol for the Indian rupee. After computers and mobile phones update to Version 6.0, the rupee sign will be available for use like the $ or € now.
In addition, this version adds many CJK Unified Ideographs in common use in China, Taiwan, and Japan, as well as characters for African language support, including extensions to the Tifinagh, Ethiopic, and Bamum scripts. Three scripts are supported for the first time: Mandaic, Batak, and Brahmi.
In October of 2010, the other portions of Unicode 6.0 were released: the Unicode Standard Annexes, code charts, and the Unicode Character Database. This allowed vendors to update their implementations of Unicode 6.0 as quickly as possible.
For more information on all of The Unicode Standard, Version 6.0, see http://www.unicode.org/versions/Unicode6.0.0/