The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinions of their respective authors and does not reflect the position of the Internationalization Activity.
August 11, 2015
August 10, 2015
August 09, 2015
This update allows you to link to information about Han characters and Hangul syllables, and fixes some bugs related to the display of Han character blocks.
Information about Han characters displayed in the lower right area will have a link View data in Unihan database. As expected, this opens a new window at the page of the Unihan database corresponding to this character.
Han and Hangul characters also have a link View in PDF code charts (pageXX). On Firefox and Chrome, this will open the PDF file for that block at the page that lists this character. (For Safari and Edge you will need to scroll to the page indicated.) The PDF is useful if there is no picture or font glyph for that character, but it also allows you to see the variant forms of the character.
For some Han blocks, the number of characters per page in the PDF file varies slightly. In this case you will see the text approx; you may have to look at a page adjacent to the one you are taken to for these characters.
Note that some of the PDF files are quite large. If the file size exceeds 3 MB, a warning is included.
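The links described above boil down to building URLs. The following is a minimal Python sketch, not UniView's actual code. It assumes the unicode.org Unihan lookup pattern `GetUnihanData.pl?codepoint=` and the per-block chart file names `U<block start>.pdf`. The `first_chart_page` and `chars_per_page` arguments are hypothetical parameters, because the real page tables are not described here. The `#page=N` fragment is a PDF open parameter that Firefox and Chrome honour, which is presumably why Safari and Edge users have to scroll.

```python
# Hypothetical sketch of UniView-style links; not UniView's actual code.

UNIHAN_URL = "https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint={:04X}"
CHART_URL = "https://www.unicode.org/charts/PDF/U{:04X}.pdf"
SIZE_WARNING_BYTES = 3 * 1024 * 1024  # assumed meaning of "3 MB"


def unihan_link(cp: int) -> str:
    """Link to the Unihan database page for a Han character."""
    return UNIHAN_URL.format(cp)


def pdf_chart_link(cp: int, block_start: int, first_chart_page: int,
                   chars_per_page: int) -> str:
    """Link to the code chart PDF page estimated to list code point `cp`.

    Assumes a fixed number of characters per page. Where that number
    varies (some Han blocks), the page is only approximate.
    """
    offset = cp - block_start
    if offset < 0:
        raise ValueError("code point precedes the block start")
    page = first_chart_page + offset // chars_per_page
    # "#page=N" is a PDF open parameter; viewers that ignore it open at the start.
    return CHART_URL.format(block_start) + f"#page={page}"


def needs_size_warning(pdf_size_bytes: int) -> bool:
    """True when the chart PDF is large enough to warrant a warning."""
    return pdf_size_bytes > SIZE_WARNING_BYTES


if __name__ == "__main__":
    print(unihan_link(0x4E2D))
    print(pdf_chart_link(0xAC01, block_start=0xAC00, first_chart_page=2,
                         chars_per_page=256))
```

Keeping the characters-per-page value as a parameter is what makes an "approx" flag natural: when a block's real layout deviates from the assumed constant, the computed page can be off by one.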
July 30, 2015
A report summarizing the MultilingualWeb workshop in Riga is now available from the MultilingualWeb site. It contains a summary of each session with links to presentation slides and minutes taken during the workshop in Riga. The workshop was a huge success. With the parallel Connecting Europe Facility (CEF) event, it had more than 200 registered participants. See a summary of highlights, and a dedicated report about outreach activities of the supporting EU-funded LIDER project. The workshop was locally organized by Tilde and sponsored by the LIDER project and by Verisign. Learn more about the Internationalization Activity.
July 23, 2015
The Internationalization Working Group has published a First Public Working Draft of Requirements for Chinese Text Layout (中文排版需求), on behalf of the Chinese Layout Task Force, part of the Internationalization Interest Group.
The document describes requirements for Chinese script layout and text support on the Web and in digital publications. These requirements inform developers of Web technologies such as CSS, HTML, and SVG, as well as browser and tool implementers, about how to support the needs of users in Chinese-speaking communities.
This is still a very early draft and the group is looking for comments and contributions to support the ongoing development of the document.
Changes in this publication of Requirements for Hangul Text Layout and Typography (한국어 텍스트 레이아웃 및 타이포그래피를 위한 요구사항) are editorial in nature, but significant. The separate English and Korean versions of the document were merged into one page. (You can use buttons at the top right of the page to view the document in one language or the other, if you prefer.)
Merging the languages significantly helps the development and maintenance of the document and makes it easier to guide users to the language version they prefer; it also offers additional opportunities for bilingual readers.
In addition, the links to issues in the document were changed to point to the GitHub issues list rather than the former Tracker list.
There were no substantive changes to the English (authoritative) version, but the Korean version was brought into line with earlier changes to the English text.
July 22, 2015
Indic Layout Requirements describes the basic requirements for Indic script layout and text support on the Web and in digital publications. These requirements provide information for Web technologies such as CSS, HTML, and SVG about how to support users of Indic scripts. The current document focuses on Devanagari, but there are plans to widen the scope to encompass additional Indian scripts as time goes on.
Changes in the new version relate to initial letter styling in Devanagari text. Editorial changes were also made to bring the document in line with recent changes to the Internationalization Activity publishing process.
Character Model for the World Wide Web: String Matching and Searching builds upon Character Model for the World Wide Web 1.0: Fundamentals to provide authors of specifications, software developers, and content developers a common reference on string identity matching on the World Wide Web and thereby increase interoperability.
This new version introduces numerous editorial changes, replaces some temporary terminology with better terms, and integrates the case folding text from the string matching algorithm into the case folding section. The document template was also adapted to match the new Internationalization publication process. See details of changes.
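The difference between a narrow and a Unicode-aware string match can be shown in a few lines of Python. This sketch follows the Unicode Standard's definition of canonical caseless matching (D145, `NFD(casefold(NFD(x)))`) using Python's `str.casefold` (full case folding) and `unicodedata.normalize`. It is an illustration, not text from the Character Model document, whose exact recommendations should be checked in the specification itself.

```python
import unicodedata


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    """Compare strings treating only A-Z and a-z as case variants."""
    def fold(s: str) -> str:
        # Map ASCII uppercase to lowercase; leave every other character alone.
        return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)
    return fold(a) == fold(b)


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Unicode canonical caseless match: compare NFD(casefold(NFD(x)))."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


if __name__ == "__main__":
    composed, decomposed = "\u00e9", "e\u0301"  # é as one vs. two code points
    print(composed == decomposed)                           # False
    print(canonical_caseless_equal(composed, decomposed))   # True
    print(ascii_case_insensitive_equal("Straße", "STRASSE"))  # False
    print(canonical_caseless_equal("Straße", "STRASSE"))      # True
```

The two functions sit at opposite ends of the spectrum. ASCII folding is cheap and predictable for identifiers, while canonical caseless matching also equates precomposed and decomposed forms and full case-folded expansions such as "ß" and "ss".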
Additional Requirements for Bidi in HTML & CSS was used to work through and communicate recommendations made to the HTML and CSS Working Groups for some of the most recurrent pain points, prior to HTML5 and CSS3, for people working with bidirectional text in scripts such as Arabic, Hebrew, Thaana, etc.
It is being published now as a Working Group Note for the historical record, in order to capture some of the thinking behind the evolution of the specifications and to help people who work on bidi issues in the future to understand the history of the decisions taken. Notes have been added to give a brief summary of what was actually implemented in the HTML or CSS specifications.
July 17, 2015
July 14, 2015
July 07, 2015
June 18, 2015
Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in the Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts. For full details on Version 8.0, see Unicode 8.0.
The first version of Unicode Technical Report #51, Unicode Emoji, is being released at the same time. That document describes the new emoji characters. It provides design guidelines and data for improving emoji interoperability across platforms, gives background information about emoji symbols, and describes how they are selected for inclusion in the Unicode Standard. The data is used to support emoji characters in implementations, specifying which symbols are commonly displayed as emoji, how the new skin-tone modifiers work, and how composite emoji can be formed with joiners. The Unicode website now supplies charts of emoji characters, showing vendor variations and providing other useful information.
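Modifier and joiner sequences are ordinary strings of code points, which a short Python sketch can make concrete. It relies only on the code points themselves: U+1F3FB through U+1F3FF are the five emoji modifiers (Fitzpatrick skin types), and U+200D ZERO WIDTH JOINER links emoji into a composite. Whether a given sequence displays as a single image depends on the platform's fonts; UTR #51 and its data files say which sequences are expected to be supported.

```python
# Sketch: emoji modifier sequences and ZWJ sequences at the code-point level.
# Rendering as a single glyph depends on platform fonts, not on this code.

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# The five emoji modifiers introduced in Unicode 8.0 (Fitzpatrick skin types).
SKIN_TONE_MODIFIERS = {
    "type-1-2": "\U0001F3FB",
    "type-3": "\U0001F3FC",
    "type-4": "\U0001F3FD",
    "type-5": "\U0001F3FE",
    "type-6": "\U0001F3FF",
}


def with_modifier(base: str, skin_type: str) -> str:
    """Emoji modifier sequence: a modifier base followed by one modifier."""
    return base + SKIN_TONE_MODIFIERS[skin_type]


def zwj_sequence(*emoji: str) -> str:
    """Join emoji with ZWJ; supported sequences may display as one image."""
    return ZWJ.join(emoji)


def code_points(s: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    thumbs_up = "\U0001F44D"
    print(code_points(with_modifier(thumbs_up, "type-4")))
    # ['U+1F44D', 'U+1F3FD']
    family = zwj_sequence("\U0001F468", "\U0001F469", "\U0001F467")
    print(code_points(family))
    # ['U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467']
```

On a system without support for a sequence, the same code points typically fall back to showing the component emoji side by side, so the text remains readable either way.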
Some of the changes in Version 8.0 and associated Unicode technical standards may require modifications in implementations. For more information, see Unicode 8.0 Migration and the migration sections of UTS #10, UTS #39, and UTS #46.
June 17, 2015
Unicode 8.0.0 is released today. This new version of UniView adds the new characters encoded in Unicode 8.0.0 (including 6 new scripts). The scripts listed in the block selection menu were also reordered to match changes to the Unicode charts page.
The URL for UniView is now http://r12a.github.io/uniview/. Please change your bookmarks.
The GitHub site now holds images, in two sizes, for all 28,000+ Unicode code points other than Han ideographs and Hangul syllables.
I also fixed the Show Age filter, and brought it up to date.
June 15, 2015
June 10, 2015
You Say .Sucks, I Say .Global: The flood of new domain names isn’t pretty but will create a truly global Internet
The Content Translation Tool makes it easier to create new Wikipedia articles from other languages. It is now available as a beta-feature in 148 Wikipedias. The tool now features an updated selector with image thumbnails for search results. Screenshot by Runa Bhattacharjee, freely licensed under CC0 1.0
Since our last blog post, much has happened in the world of Content Translation, a tool that makes it easier to translate Wikipedia articles into different languages. The Wikimedia Language Engineering team deployed it as a beta-feature in January 2015 in the Catalan, Spanish and Portuguese Wikipedias; today, nearly 150 Wikipedias have access to the tool, and more than 5,000 articles have been created by more than 1,500 editors.
While there are large Wikipedias, like English or German, where thousands of volunteers have written articles about millions of topics, there are over 100 smaller Wikipedias where a handful of volunteers are struggling to add more content. Translating from an existing article in another language is a common method adopted in such Wikipedias to create more content. Content Translation attempts to solve this rather daunting problem by simplifying the process, allowing editors to quickly create the first draft of the article and focus on improving the content. It includes an editing interface and translation tools that make it easy to adapt wiki-specific syntax, links, references, and categories. Machine translation support via Apertium is also available for a limited set of languages, which can considerably speed up the process; if it is currently missing for yours, we invite you to take our ongoing survey to test and provide feedback.
Even without machine translation, the tool has been used to translate from any of the available languages (i.e. from all Wikipedias) with features that allow automatic adaptation of links, images, references and categories. For instance, nearly 500 new articles have been created in the French Wikipedia using Content Translation and without machine translation.
New languages and feature improvements
At present, Content Translation is available on 148 Wikipedias, including several large ones like French, Dutch and Polish, as a beta-feature for logged-in users. Because it is a beta-feature, users must first enable it from their preferences.
The tool presents a simple workflow: select the source language and article to translate, select the target language, translate the contents of the article, and publish it as a new page in the corresponding Wikipedia. For category adaptation, the corresponding category needs to exist in the target Wikipedia. Translators can also save their translations and work on them later.
In April and May, we focused on making it easier for users to discover Content Translation and start translating with it. We introduced a campaign that prompted users who were creating a new article to try Content Translation instead. Users could enable the feature directly from the campaign message screen and begin translating the page from another language. A call-out message was also added to the Contributions menu, providing quick access to different kinds of contributions (including translations). As a result of these measures, we now see a sharp increase in the number of new articles created every week by an increasing number of new users (see images). We expect to get better insight into the usage numbers in the coming month.
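The category rule in the workflow above can be pictured as a lookup that keeps only categories with an existing equivalent. This Python sketch is a hypothetical illustration, not Content Translation's code: `interlanguage_map` stands in for however the tool finds the target-language equivalent of a category, and `existing_target_categories` stands in for the categories that already exist on the target Wikipedia.

```python
from typing import Dict, List, Set, Tuple


def adapt_categories(
    source_categories: List[str],
    interlanguage_map: Dict[str, str],
    existing_target_categories: Set[str],
) -> Tuple[List[str], List[str]]:
    """Map source categories to target ones; return (adapted, dropped).

    A category is adapted only if it has a known equivalent AND that
    equivalent already exists on the target wiki; otherwise it is dropped.
    """
    adapted: List[str] = []
    dropped: List[str] = []
    for category in source_categories:
        equivalent = interlanguage_map.get(category)
        if equivalent is not None and equivalent in existing_target_categories:
            adapted.append(equivalent)
        else:
            dropped.append(category)
    return adapted, dropped


if __name__ == "__main__":
    # Made-up category names, for illustration only.
    adapted, dropped = adapt_categories(
        ["Category:Rivers", "Category:Local trivia"],
        {"Category:Rivers": "Catégorie:Cours d'eau"},
        {"Catégorie:Cours d'eau"},
    )
    print(adapted)  # ["Catégorie:Cours d'eau"]
    print(dropped)  # ['Category:Local trivia']
```

Dropping rather than creating missing categories keeps the published article from pointing at categories that do not exist on the target wiki.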
Feature improvement highlights:
- To prepare for deployment on wikis with Right-to-Left content, several bugs have been fixed.
- Users will also see an improvement in the selector dialog, where results from articles searched are now displayed with a thumbnail and small description (see image).
- The ULS input method has been integrated in the Content Translation editing interface.
- New articles created using Content Translation are now automatically linked through Wikidata.
Deployment update and what’s coming next
In the coming month, we aim to continue adding Content Translation as a beta-feature to more Wikipedias so that more users can test the tool. This not only exposes special cases that we need to be aware of (like local gadgets or Wikipedia-specific scripts) but also provides us with feature suggestions. Planned improvements include:
- Improved link handling with provision for complex use-cases.
- Redesigned statistics page with additional data.
- Preliminary features for an integrated notification system using Echo, to better connect with our users.
The Language team will be hosting two Content Translation workshops at Wikimania this year. You can sign up for them on the Wikimania website; they are open to all participants. You can read more about Content Translation on the project page and in the new User Guide (translations are very welcome!).
Read more about Content Translation developments and other updates from the Language Engineering team in our monthly report. We would also like to invite everyone to our online office hour session on June 10 at 14:30 UTC.
Runa Bhattacharjee, Language Engineering, Editing, Wikimedia Foundation
June 08, 2015
June 02, 2015
May 30, 2015
The Translatewiki.net project enables communities to localize open source software. It was recently used for a “Translation Rally” that engaged volunteers around the world to translate over 44,000 messages in nine days. Photo by Christian Mehlführer, CC BY-SA 3.0.
How can we engage volunteers to contribute in important yet monotonous tasks? Over the past year, Wikimedia Sverige (Sweden) has been experimenting with ways to strengthen its community on translatewiki.net — a little-known project that nevertheless benefits hundreds of millions of people each month.
Translatewiki is a platform for translating the texts that appear in open source software, including the MediaWiki software used on Wikipedia. These translations make it possible for you to get all the buttons and system messages on Wikipedia in your preferred language; it is preparatory work to make it as easy as possible for writers and readers to use Wikipedia and other open source software.
Translating technical messages is therefore a very important task, but it is often rather isolated and independent work. Wikimedia Sverige aimed to change that and make it fun to work together on an effort that differs from the regular activity. It therefore invited Translatewiki’s volunteer translators to a “Translation Rally” for nine days in mid-May, with a sum of 500 euros to be divided among all participants who made more than 500 translations of some of the most important messages. This concept was originally developed by Wikimedia Nederland (Netherlands).
We initially aimed to complete the messages in MediaWiki’s core software, the central messages used on the Wikimedia projects. Once these were finished, participants could continue with 11 other selected projects. There are almost 65,000 messages to translate into each language, of which MediaWiki accounts for approximately 24,500. Some are only one word long (e.g. “Save”), while others may be several sentences long. Because the translations are made by volunteers, some languages are almost completely translated, while others are almost entirely untranslated. Many even lack translations of the core messages. Participants were given the opportunity either to keep the money for themselves or to donate it to Translatewiki’s continued operation. The majority of the translations were made into non-European languages, but European languages also benefited; for example, hundreds of messages were translated into Swedish.
Prior to the Translation Rally, an email was sent to all registered users asking them to join; this was an important step, as it brought in older users whose activity had dropped off over the years. During the rally’s nine days, the website’s activity was around four times higher than normal. 201 users contributed at least one new translation, and a massive 44,844 messages were added.
Sites using MediaWiki software are now easier to use in the 116 languages that were improved; it is clear that much higher activity was achieved thanks to the Rally. However, most of the volunteers did not reach the minimum of 500 translations and couldn’t claim a share of the 500 euros; 23 of them had valid claims and will split the prize. The winner with the most qualifying translations has yet to be named.
A remaining question is whether this type of activity has a positive or negative effect on the community in the long term. The benefit for community engagement is that people are invited to take part in something new and exciting; you can create a noticeable buzz. However, there are potential risks when adding money or prizes into the mix. Will that reduce interest in participating in the long run, when there are no more prizes? Can conflicts increase because of this? Will participants be sloppier with their translations (this seems to have happened this time)? What can we then do to mitigate these risks? These kinds of predictions are notoriously hard to make without proper research, as different methods might have different problems and gains. We would greatly welcome more studies in this area.
The only thing we can say with some certainty right now is that for a limited cost, there has been a massive short-term positive effect, especially for languages spoken in poorer countries. From the graphs we developed, you can see that activity has regressed to its previous norm now that the rally has ended.
The Translation Rally was organized and sponsored by Wikimedia Sverige, with generous support from Internetfonden, and the rally itself was run by Siebrand Mazeland at the Wikimedia Foundation.
John Andersson, Wikimedia Sverige
May 28, 2015
May 27, 2015
May 26, 2015
May 19, 2015
May 15, 2015
May 12, 2015
May 07, 2015
May 03, 2015
April 27, 2015
April 24, 2015
Language Tags and Locale Identifiers for the World Wide Web describes the best practices for identifying or selecting the language of content, as well as the locale preferences used to process or display data values and other information on the Web. It describes how document formats, specifications, and implementations should handle language tags, as well as extensions to language tags that describe the cultural or linguistic preferences referred to in internationalization as a “locale”.
Changes in this update include the following:
- All references to RFC3066bis were updated to BCP 47, RFC 5646, or RFC 4647, as appropriate.
- References to HTML were changed to point to HTML5.
- The text defining internationalization, locale, and other important terms, formerly contained in Web Services Internationalization Usage Scenarios, was imported and rewritten.
- The other sections of this document were modified and reorganized.
- The Web services materials were moved to an appendix.
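Selecting content by language tag is often done with the Lookup scheme from RFC 4647, one of the two RFCs that make up BCP 47. The Python sketch below is an illustrative reading of RFC 4647 section 3.4, not code from the W3C document. A lone `*` range is skipped, and wildcard subtags inside a range are not handled.

```python
from typing import Iterable, Optional


def lookup(priority_list: Iterable[str], available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Sketch of the RFC 4647 section 3.4 "Lookup" scheme.

    Each language range in the user's priority list is tried in turn.
    If it has no case-insensitive match, subtags are removed from the end
    until one matches. A lone "*" range is skipped; wildcard subtags
    inside a range are not handled by this sketch.
    """
    tags = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in tags:
                return tags[candidate]
            subtags.pop()
            # Never leave a singleton such as "x" or "u" dangling at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


if __name__ == "__main__":
    print(lookup(["zh-Hant-TW", "en"], ["en-US", "zh-Hant"]))  # zh-Hant
    print(lookup(["de-CH-x-phonebk"], ["de", "de-CH"]))         # de-CH
    print(lookup(["ja"], ["en", "fr"], default="en"))           # en
```

Lookup always returns at most one tag, which suits choosing a single locale for formatting. Filtering, the other RFC 4647 scheme, returns every matching tag instead.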