The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinions of their respective authors and does not reflect the position of the Internationalization Activity.
April 24, 2015
Language Tags and Locale Identifiers for the World Wide Web describes the best practices for identifying or selecting the language of content as well as the locale preferences used to process or display data values and other information on the Web. It describes how document formats, specifications, and implementations should handle language tags, as well as extensions to language tags that describe the cultural or linguistic preferences referred to in internationalization as a “locale”.
Changes in this update include the following: all references to RFC3066bis were updated to BCP 47, RFC 5646, or RFC 4647, as appropriate; references to HTML were changed to point to HTML5; the text formerly contained in Web Services Internationalization Usage Scenarios, defining internationalization, locale, and other important terms, was imported and rewritten; the other sections of this document were modified and reorganized; and the Web services materials were moved to an appendix.
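As an illustration of the tag structure the document describes, here is a minimal Python sketch that splits a well-formed BCP 47 tag into its primary language subtag, script, region, and a `-u-` (Unicode locale) extension. It is a toy, not an RFC 5646/RFC 4647 implementation: it ignores variants, grandfathered tags, and singletons other than `u`.

```python
# Minimal illustrative parser for well-formed BCP 47 tags of the shape
# language[-script][-region][-u-<unicode-locale-extension>],
# e.g. "zh-Hant-TW-u-ca-chinese". A sketch only, not a full implementation.
def parse_language_tag(tag):
    parts = tag.split("-")
    result = {"language": parts[0].lower()}
    i = 1
    # An optional 4-letter script subtag, e.g. "Hant".
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        result["script"] = parts[i].title()
        i += 1
    # An optional region subtag: 2 letters ("TW") or 3 digits ("419").
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        result["region"] = parts[i].upper()
        i += 1
    # A "-u-" singleton introduces Unicode locale extensions (BCP 47
    # Extension U), carrying the "locale preferences" mentioned above.
    if i < len(parts) and parts[i].lower() == "u":
        result["u_extension"] = "-".join(parts[i + 1:]).lower()
    return result

print(parse_language_tag("zh-Hant-TW-u-ca-chinese"))
```

Here `zh-Hant-TW-u-ca-chinese` yields the language `zh`, script `Hant`, region `TW`, and the locale extension `ca-chinese` (traditional Chinese calendar).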
April 08, 2015
The Content Translation tool makes it easier to create new Wikipedia articles from other languages. You can now start translations from your Contributions link, where you can find articles missing in your language. Screenshot by Runa Bhattacharjee, freely licensed under CC0 1.0
Since it was first introduced three months ago, the Content Translation tool has been used to write more than 850 new articles on 22 Wikipedias. This tool was developed by Wikimedia Foundation’s Language Engineering team to help multilingual users quickly create new Wikipedia articles by translating them from other languages. It includes an editing interface and translation tools that make it easy to adapt wiki-specific syntax, links, references, and categories. For a few languages, machine translation support via Apertium is also available.
Content Translation (aka CX) was first announced on January 20, 2015, as a beta feature on 8 Wikipedias: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian (Bokmål), Portuguese, and Spanish. Since then, Content Translation has been added gradually to more Wikipedias – mostly at the request of their communities. As a result, the tool is now available as a beta feature on 22 Wikipedias. Logged-in users can enable the tool as a preference on those sites, where they can translate articles from any of the available source languages (including English) into these 22 languages.
Here is what we have learned by observing how Content Translation was used by over 260 editors in the last three months.
To date, nearly 1,000 users have manually enabled the Content Translation tool — and more than 260 have used it to translate a new article. Most translators are from the Catalan and Spanish Wikipedias, where the tool was first released as a beta feature.
Articles created with the Content Translation tool cover a wide range of topics, such as fashion designers, Fields Medal laureates, lunar seas and Asturian beaches. Translations can be in two states: published or in-progress. Published articles appear on Wikipedia like any other new article and are improved collaboratively; these articles also include a tag indicating that they were created using Content Translation. In-progress translations are unpublished and appear on the personal dashboard of the translator working on them. Translations are saved automatically and users can continue working on them anytime. In cases where multiple users attempt to translate or publish the same article in the same language, they receive a warning. To avoid any accidental overwrites, the other translators can publish their translations under their user pages and make separate improvements to the main article. More than 875 new articles have been created since Content Translation was made available, 500 of them on the Catalan Wikipedia alone.
When we first planned to release Content Translation, we decided to monitor how well the tool was being adopted, and whether it was indeed a useful complement to the workflow editors use to create a new article. The development team also agreed to respond quickly to all queries and bug reports. Complex bugs and other feature fixes were planned into the development cycles. But finding the right solution for the publishing target proved to be a major challenge, from user experience to analytics. Originally, we did not support publishing into the main namespace of any Wikipedia: users had to publish their translations under their user pages first and then move them to the main namespace. However, this caused delays, confusion and sometimes conflicts when the articles were eventually moved for publication. In some cases, we also noticed that articles had not been counted correctly after publication. To avoid these issues, that original configuration was changed for all supported sites. A new translation is now published like any other new article, and if an article already exists or is created while the translation is in progress, the user is shown a warning.
Considering the largely favorable response from our first users, we have now started to release the tool to more Wikipedias. New requests are promptly handled and scheduled, after language-specific checks to make sure that proposed changes will work for all sites. However, usage patterns have varied across the 22 Wikipedias. While some of the causes are outside of our control (like the total number of active editors), we plan to make several enhancements to make Content Translation easily discoverable by more users, at different points of the editing and reading workflows. For instance, when users are about to create a new article from scratch, a message gives them the option to start with a translation instead. Users can also see suggestions in the interlanguage link section for languages that they can translate an article into. And last but not least, the Contributions section now provides a link to start a new translation and find articles missing in your language (see image at the top of this post).
In coming months, we will continue to introduce new features and make Content Translation more reliable for our users. See the complete list of Wikipedias where Content Translation is currently available as a beta feature. We hope you will try it out as well, to create more content.
Runa Bhattacharjee, Language Engineering, Wikimedia Foundation
April 06, 2015
The Content Translation tool has made it a lot easier for Catalan Wikimedians to convert articles to and from different languages. Photo by Flamenc, freely licensed under CC BY-SA 3.0
Catalan Wikimedians are a very enthusiastic wiki community. In relation to the whole movement, we are mid-sized, but one of the most active in terms of editors per million speakers.
Catalan, our mother language, was banned for more than 40 years. Thankfully, editors like to use wikis for digital language activism. With Wikipedia (Viquipèdia, in Catalan) we founded a digital space where we can freely spread our language without real-life restrictions (governments, markets).
Almost 99% of Catalan speakers are bilingual and also speak Spanish. This means that content translation from Spanish Wikipedia happens frequently on our project. Some translate by hand, others use commercial platforms like Google Translate or freely licensed translation engines like Apertium. Some users even create their own translation bots, like the AmicalBot or EVA, which our community loves and uses often.
A few months ago, we heard news of the upcoming Wikimedia’s ContentTranslation tool, and we’re really happy to find that the very first language tests were planned between Spanish and Catalan. Our community responded to this news with great enthusiasm and we have been testing the tool for months now. The development team has kindly listened to our comments and demands, while implementing many of our shared recommendations.
At a personal level, I found the tool really helpful. It is easy to use and understand, and it greatly facilitates our work. I can now translate a 20-line article in less than 5 minutes, saving lots of time. Before, the worst part of translating articles was spending extra time translating reference templates and some of the wikicode. We understand the tool is not perfect yet, but nothing is perfect in a wiki environment: it is continuously being improved.
One of our community’s biggest challenges is updating different language wikis. We have good content about Catalan culture in the Catalan language, but we are not that good at exporting this content to other wikis. I personally hope that this tool can help us with both tasks.
I recommend that you try the ContentTranslation tool with an open mind and spend some time with it. Translate a few articles and if you find any bugs, please report them. When we say Wikipedia is a global project, we mean that it is multilingual, and this tool really helps us reach our shared vision of a world where every single human being can freely share in the sum of all knowledge.
Alex Hinojo, Amical Wikimedia community member
March 25, 2015
See the program. The keynote speaker will be Paige Williams, Director of Global Readiness, Trustworthy Computing, Microsoft. She will be followed by a strong line-up in sessions entitled Developers and Creators, Localizers, Machines, and Users, including speakers from Microsoft, the European Parliament, the UN FAO, Intel, Verisign, and many more. The workshop is made possible by the generous support of the LIDER project.
Participation in the event is free. Please register via the Riga Summit for the Multilingual Digital Single Market site.
The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops are successful because they attract a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, etc., to create a holistic view of the interoperability needs of the multilingual Web.
We look forward to seeing you in Riga!
March 11, 2015
The Unicode® Consortium announced the start of the beta review for Unicode 8.0.0, which is scheduled for release in June 2015. All beta feedback must be submitted by April 27, 2015.
Unicode 8.0.0 comprises several changes which require careful migration in implementations, including the conversion of Cherokee to a bicameral script, a different encoding model for New Tai Lue, and additional character repertoire. Implementers need to change code and check assumptions regarding case mappings, New Tai Lue syllables, Han character ranges, and confusables. Character additions in Unicode 8.0.0 include emoji symbol modifiers for implementing skin tone diversity, other emoji symbols, a large collection of CJK unified ideographs, a new currency sign for the Georgian lari, and six new scripts. For more information on emoji in Unicode 8.0.0, see the associated draft Unicode Emoji report.
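One of the migration points above, the conversion of Cherokee to a bicameral script, can be observed directly from a language runtime's character data. A small Python sketch, assuming a Python build that ships Unicode 8.0 or later data (Python 3.5+):

```python
# Cherokee was caseless before Unicode 8.0; in 8.0 the original letters
# (U+13A0..) become uppercase and new lowercase forms are added at
# U+AB70... Case-mapping code must be re-checked accordingly.
upper_a = "\u13A0"          # CHEROKEE LETTER A
lower_a = upper_a.lower()   # maps to U+AB70 with Unicode 8.0+ data

print(f"U+{ord(lower_a):04X}", upper_a.isupper())
```

On older Unicode data, `lower()` is a no-op for Cherokee and `isupper()` is False, which is exactly the kind of changed assumption the beta announcement asks implementers to test for.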
Please review the documentation, adjust code, test the data files, and report errors and other issues to the Unicode Consortium by April 27, 2015. Feedback instructions are on the beta page.
February 26, 2015
We would like to remind you that the deadline for speaker proposals for the 8th MultilingualWeb Workshop (April 29, 2015, Riga, Latvia) is on Sunday, March 8, at 23:59 UTC.
Featuring a keynote by Paige Williams (Director of Global Readiness, Trustworthy Computing at Microsoft) and sessions for various audiences (Web developers, content creators, localisers, users, and multilingual language processing), this workshop will focus on the advances and challenges faced in making the Web truly multilingual. It provides an outstanding and influential forum for thought leaders to share their ideas and gain critical feedback.
While the organizers have already received many excellent submissions, there is still time to make a proposal, and we encourage interested parties to do so by the deadline. With roughly 150 attendees anticipated for the Workshop from a wide variety of profiles, we are certain to have a large and diverse audience that can provide constructive and useful feedback, with stimulating discussion about all of the presentations.
The workshop is made possible by the generous support of the LIDER project and will be part of the Riga Summit 2015 on the Multilingual Digital Single Market. We are organizing the workshop as part of the Riga Summit to strengthen the broader European community. Depending on the number of submissions to the MultilingualWeb workshop, we may suggest moving some presentations to other days of the summit. For these reasons, we highly recommend attending the whole Riga Summit! See the line-up of speakers already confirmed for the various events during the summit.
For more information and to register a presentation proposal, please visit the Riga Workshop Call for Participation. For registration as a regular participant of the MultilingualWeb workshop or other events at the Riga Summit, please register at the Riga Summit 2015 site.
February 05, 2015
We are pleased to announce that Paige Williams, Director of Global Readiness, Trustworthy Computing at Microsoft, will deliver the keynote at the 8th Multilingual Web Workshop, “Data, content and services for the Multilingual Web,” in Riga, Latvia (29 April 2015).
Paige spent 10 years managing internationalization of Microsoft.com before joining the Trustworthy Computing organization in 2005. In TwC, Paige oversees compliance with company policy for geographic, country-region and cultural requirements, establishing a new center of excellence for market and world readiness, globalization/localizability, and language programs, tools, resources and external community forums to reach markets across the world with the right local experience.
The Multilingual Web Workshop series brings together participants interested in the best practices, new technologies, and standards needed to help content creators, localizers, language tools developers, and others address the new opportunities and challenges of the multilingual Web. It will also provide opportunities for networking across communities and building connections.
Registration for the Workshop is free, and early registration is recommended since space at the Workshop is limited.
The workshop will be part of the Riga Summit 2015 on the Multilingual Digital Single Market. We are organizing the workshop as part of the Riga Summit to strengthen the broader European community. Depending on the number of submissions to the MultilingualWeb workshop, we may suggest moving some presentations to other days of the summit. For these reasons, we highly recommend attending the whole Riga Summit!
There is still opportunity for individuals to submit proposals to speak at the workshop. Ideal proposals will highlight emerging challenges or novel solutions for reaching out to a global, multilingual audience. The deadline for speaker proposals is March 8, but early submission is strongly encouraged. See the Call for Participation for more details.
This workshop is made possible by the generous support of the LIDER project.
February 03, 2015
The Cascading Style Sheets (CSS) Working Group has published a Candidate Recommendation of CSS Counter Styles Level 3. It adds new built-in counter styles to those defined in CSS 2.1, but, more importantly, it also allows authors to define custom styles for list markers, numbered headings and other types of generated content.
At the same time, the Internationalization Working Group has updated their Working Draft of Predefined Counter Styles, which provides custom rules for over a hundred counter styles in use around the world. It serves both as a ready-to-use set of styles to copy into your own style sheets, and also as a set of worked examples.
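To make concrete what a custom counter style computes, here is a Python sketch of the `alphabetic` system algorithm from CSS Counter Styles Level 3, the system behind styles like lower-alpha. The symbol list and function name here are illustrative, not part of either specification.

```python
# Sketch of the 'alphabetic' counter system from CSS Counter Styles
# Level 3: this is roughly what a browser computes for a rule like
#   @counter-style demo { system: alphabetic; symbols: a b c ... z; }
def alphabetic_counter(value, symbols):
    if value < 1:
        return None  # out of range; the spec falls back to another style
    digits = []
    while value != 0:
        value -= 1                              # alphabetic is 1-based
        digits.append(symbols[value % len(symbols)])
        value //= len(symbols)
    return "".join(reversed(digits))            # most significant first

symbols = [chr(c) for c in range(ord("a"), ord("z") + 1)]
print(alphabetic_counter(1, symbols))   # a
print(alphabetic_counter(27, symbols))  # aa
```

Swapping in a different `symbols` list is essentially what the Predefined Counter Styles draft does for each of its hundred-plus styles.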
January 29, 2015
A key issue for handling of bopomofo (zhùyīn fúhào) is the placement of tone marks. When bopomofo text runs vertically (either on its own, or as a phonetic annotation), some smarts are needed to display tone marks in the right place. This may also be required (though with different rules) for bopomofo when used horizontally for phonetic annotations (i.e. above a base character), but not in all such cases. However, when bopomofo is written horizontally in any other situation (i.e. when not written above a base character), the tone mark typically follows the last bopomofo letter in the syllable, with no special handling.
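The simple horizontal case can be sketched in a few lines of Python: if we assume each syllable ends with an explicit tone mark (a simplification; the first-tone mark is often omitted, and the neutral-tone dot behaves differently in annotation layouts), syllable boundaries fall right after the tone marks.

```python
# Spacing tone marks used with bopomofo: first, second, third, fourth,
# and neutral tone. (Simplified model for plain horizontal text only.)
TONE_MARKS = {"\u02C9", "\u02CA", "\u02C7", "\u02CB", "\u02D9"}

def split_syllables(text):
    """Split a horizontal bopomofo run into syllables at tone marks."""
    syllables, current = [], ""
    for ch in text:
        current += ch
        if ch in TONE_MARKS:
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)   # trailing syllable with no tone mark
    return syllables

# "ㄓㄨˋㄧㄣ" (zhùyīn), with an explicit tone mark on the first syllable:
print(split_syllables("\u3113\u3128\u02CB\u3127\u3123"))
```

The vertical and ruby-annotation cases discussed on the Bopomofo on the Web page need genuinely different positioning logic and are not captured by this toy segmentation.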
From time to time questions are raised on W3C mailing lists about how to implement phonetic annotations in bopomofo. Participants in these discussions need a good understanding of the various complexities of bopomofo rendering.
To help with that, I just uploaded a new Web page Bopomofo on the Web. The aim is to provide background information, and carry useful ideas from one discussion to the next. I also add some personal thoughts on implementation alternatives, given current data.
I intend to update the page from time to time, as new information becomes available.
January 20, 2015
Wikimedia Foundation’s Language Engineering team is happy to announce the first version of Content Translation on Wikipedia for 8 languages: Catalan, Danish, Esperanto, Indonesian, Malay, Norwegian (Bokmål), Portuguese and Spanish. Content Translation, available as a beta feature, provides a quick way to create new articles by translating from an existing article into another language. It is also well suited for new editors looking to familiarize themselves with the editing workflow. Our aim is to build a tool that leverages the power of our multicultural global community to further Wikimedia’s mission of creating a world where every single human being can share in the sum of all knowledge.
During early 2014, when the design ideas for Content Translation were being conceptualized, we came across an interesting study by Scott A. Hale of the University of Oxford on the influences and editing patterns of multilingual editors on Wikipedia. Combined with feedback from editors we interacted with, the data presented in the study guided our initial choices, both in terms of features and languages. We were fortunate to have met the researcher in person at Wikimania 2014, so we could learn more about his findings and references.
The tool was designed for multilingual editors as our main target users. Several important patterns emerged from a month-long user study, including:
- Multilingual editors are relatively more active on smaller Wikipedias. Editors from smaller Wikipedias would often also edit on a relatively large Wikipedia like English or German;
- Multilingual editors often edited the same articles in their primary and non-primary languages.
These and other factors listed in the study impact the transfer of content between different language versions of Wikipedia; they increase content parity between versions — and decrease ‘self-focus’ bias in individual editions.
When selecting languages for the tool’s introduction, we were guided by several factors, including signs of relatively high multilingualism amongst the primary editors. The availability of high quality machine-translated content was an additional consideration, to fully explore the usability of the core editing workflow designed for the tool. Based on these considerations, Catalan Wikipedia, a very actively edited project of medium size was a logical choice. Subsequent language selections were made by studying possible overlap trends between language users — and the probability of editors benefiting from those overlaps when creating new articles. Availability of machine translation to speed up the process and community requests were important considerations.
How it works
The article Abel Martín in the Spanish Wikipedia doesn’t have a version in Portuguese, so a red link to Portuguese is shown. Content Translation red interlanguage link screenshot by Amire80, licensed under CC BY-SA 4.0
Content Translation combines a rich text translation interface with tools targeted for editing — and machine translation support for most language pairs. It integrates different tools to automate repetitive steps during translation: it provides an initial automatic translation while keeping the original text format, links, references, and categories. To do so, the tool relies on the inter-language connections from Wikidata, html-to-wikitext conversion from Parsoid, and machine translation support from Apertium. This saves time for editors and allows them to focus on creating quality content.
Although basic text formatting is supported, the purpose of the tool is to create an initial version of the content that each community can keep improving with their usual editing tools. Content Translation is not intended to keep the information in sync across multiple language versions, but to provide a quick way to reuse the effort already made by the community when creating an article from scratch in a different language.
The tool can be accessed in different ways. There is a persistent access point at your contributions page, but access to the tool is also provided in situations where you may want to translate the content you are just reading. For instance, a red link in the interlanguage link area (see image).
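The red-link idea can be pictured with a small Python sketch: given an article's interlanguage links (which the tool actually reads from Wikidata sitelinks), languages with no linked version are candidates for a "translate this" red link. The hard-coded dictionary below is a hypothetical stand-in for that Wikidata lookup.

```python
# Hypothetical sitelinks for one article, keyed by language code,
# standing in for the interlanguage connections stored in Wikidata.
sitelinks = {"es": "Abel Martín", "ca": "Abel Martín"}
supported_languages = ["es", "ca", "pt"]

# Languages with no existing version get a red interlanguage link.
missing = [lang for lang in supported_languages if lang not in sitelinks]
print(missing)  # only Portuguese is missing here
```

In the screenshot above, Portuguese is exactly such a missing language, so the Spanish article shows a red link offering to start a translation.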
Next steps for the tool’s future development include adding support for more – eventually all – languages, managing lists of articles to translate, and adding features for more streamlined translation.
In coming weeks, we will closely monitor feedback from users and interact with them to guide our future development. Please read the release announcement for more details about the features and instructions on using the tool. Thank you!
January 18, 2015
Version 16 of the Bengali character picker is now available.
Other than a small rearrangement of the selection table, and the significant standard features that version 16 brings, this version adds the following:
- three new buttons for automatic transcription between Latin and Bengali. You can use these buttons to transcribe to and from Latin transcriptions using the ISO 15919 or Radice approaches.
- hinting to help identify similar characters.
- the ability to select the base character for the display of combining characters in the selection table.
For more information about the picker, see the notes at the bottom of the picker page.
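For a feel of what such transcription buttons do, here is a toy Python sketch in the ISO 15919 spirit. The four-entry mapping table and function name are purely illustrative and bear no relation to the picker's real implementation, which handles the full script, conjuncts, and vowel signs.

```python
# A tiny, illustrative subset of ISO 15919 romanization for Bengali
# consonants (each shown with its inherent vowel "a").
ISO_15919 = {
    "\u0995": "ka",   # ক
    "\u0996": "kha",  # খ
    "\u0997": "ga",   # গ
    "\u09AE": "ma",   # ম
}

def transcribe(bengali):
    """Map each known Bengali character to its Latin transcription."""
    return "".join(ISO_15919.get(ch, ch) for ch in bengali)

print(transcribe("\u0995\u0996"))  # kakha
```

A real transliterator must also handle virama-suppressed vowels, dependent vowel signs, and ambiguity in the reverse (Latin-to-Bengali) direction, which is what makes the picker's buttons genuinely useful.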
About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.