The Unicode Consortium has released CLDR 23, which contains data for 215 languages and 227 territories, for a total of 654 locales. This release focused primarily on improvements to the LDML structure and tools, and on the consistency of the data. It includes substantially improved support for non-Gregorian calendars (such as the Japanese Imperial calendar, which is used extensively in Japan). The data and structure have also been modified to make it easy to switch between 12-hour and 24-hour time formats, and between 2-digit and 4-digit years. The new Unicode character for the Turkish lira (U+20BA TURKISH LIRA SIGN) is now used, and data is provided for currencies whose cash transactions round to 5 cents (or to a similar multiple of another subunit). For most languages written in non-Latin scripts, characters in the language’s own script now collate before those in other scripts (including Latin A-Z). Language-specific letter-casing mappings (Lower, Upper, Title) have been added for Azerbaijani, Greek, Lithuanian, and Turkish. Keyboard data has also been updated for Android. Finally, as of this release, the LDML specification is split into multiple parts, each focusing on a particular area.
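Turkish and Azerbaijani show most clearly why language-specific casing data matters. They distinguish dotted i/İ from dotless ı/I, so the default, language-neutral Unicode case mappings give the wrong result. The Python sketch below is not CLDR's own transform. It applies the Turkish/Azerbaijani i-rules on top of Python's locale-independent str.lower() and str.upper(). The function names tr_lower and tr_upper are hypothetical, and the context-dependent Greek and Lithuanian rules are not covered.

```python
# Hypothetical helpers illustrating Turkish/Azerbaijani case mapping.
# Python's str.lower()/str.upper() apply the default, language-neutral
# Unicode mappings, so the dotted/dotless i letters are fixed up first.
# Not handled: context-dependent rules such as dropping U+0307 after 'I'.


def tr_lower(text: str) -> str:
    """Lowercase using the Turkish/Azerbaijani i-rules."""
    # Default mapping: 'I' -> 'i' and 'İ' -> 'i' + U+0307 COMBINING DOT ABOVE.
    # Turkish instead maps 'İ' -> 'i' and 'I' -> 'ı' (dotless small i).
    text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


def tr_upper(text: str) -> str:
    """Uppercase using the Turkish/Azerbaijani i-rules."""
    # Default mapping: 'i' -> 'I'; Turkish maps 'i' -> 'İ' (dotted capital I).
    # 'ı' -> 'I' already matches the default mapping.
    return text.replace("i", "\u0130").upper()


if __name__ == "__main__":
    word = "DİYARBAKIR"
    print(word.lower())           # default: 'İ' gains a combining dot, 'I' -> 'i'
    print(tr_lower(word))         # diyarbakır
    print(tr_upper("istanbul"))   # İSTANBUL
```

Because only the dotted/dotless i pair differs from the defaults, the sketch rewrites those letters and then defers to the built-in mappings. A complete implementation, such as ICU's locale-aware case mapping, also applies the context-sensitive rules.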
A new FAQ page devoted to private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the finer points of their usage and of how they differ from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.
Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.
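The basic distinction the FAQ draws can be stated precisely. There are exactly 66 noncharacters: U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes. Private-use characters occupy three separate ranges and, unlike noncharacters, are intended to be given meanings by private agreement. The Python sketch below encodes both sets. The function names are hypothetical, and the code assumes its argument is a valid code point (0 to 0x10FFFF).

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (assumes 0 <= cp <= 0x10FFFF)."""
    # U+FDD0..U+FDEF (32 code points), plus U+nFFFE and U+nFFFF in each of
    # the 17 planes (34 code points): the low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """True for code points in the three Private Use Areas."""
    return (0xE000 <= cp <= 0xF8FF             # BMP Private Use Area
            or 0xF0000 <= cp <= 0xFFFFD        # Supplementary PUA-A (plane 15)
            or 0x100000 <= cp <= 0x10FFFD)     # Supplementary PUA-B (plane 16)


if __name__ == "__main__":
    print(sum(is_noncharacter(cp) for cp in range(0x110000)))  # 66
    for cp in (0xE000, 0xFDD0, 0xFFFF, 0x10FFFD):
        # General category: 'Co' for private use, 'Cn' for noncharacters.
        print(f"U+{cp:04X}", is_noncharacter(cp), is_private_use(cp),
              unicodedata.category(chr(cp)))
```

The two sets do not overlap: the Private Use Areas in planes 15 and 16 stop just short of the noncharacters at the end of those planes. The ranges are checked explicitly because unicodedata.category() reports Cn for both noncharacters and unassigned code points, so the category alone cannot tell them apart.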
The Unicode Consortium announced today that the CLDR Survey Tool is open for beta testing. CLDR provides key building blocks for software to support the world’s languages, with the largest and most extensive standard repository of locale data available. The survey tool is an online tool used by organizations and individuals to contribute data to this repository, and to vote on alternative contributions.
The Survey Tool has undergone a substantial revision, with dramatic improvements in performance and usability. The Unicode Consortium invites people to try out the tool now, so that any remaining problems can be identified before data submission starts (currently scheduled for April 4).
The Unicode CLDR 21.0.1 maintenance release is also now available.
The next major release is CLDR 22, scheduled for late August. The CLDR 22 release cycle includes a general data submission phase, which will begin soon.