W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!



If you own a blog with a focus on internationalization, and want to be added or removed from this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

All times are UTC.


Planet Web I18n

The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represent only the opinion of their respective authors and does not reflect the position of the Internationalization Activity.

December 18, 2014

Global By Design

Have you registered your Cuba domain?

Cuba has long had its own country code: .CU. But most companies didn’t view this domain as a priority. Until now. But be aware that this domain isn’t cheap. I’ve seen prices ranging from $800 to $1,100, so only larger companies will see this as an impulse buy. But if you can get it, I think […]

The post Have you registered your Cuba domain? appeared first on Global by Design.

by John Yunker at 18 December 2014 05:27 PM

ishida>>blog » i18n

Thai character picker v15

I have uploaded a new version of the Thai character picker.

The new version uses characters instead of images for the selection table, making it faster to load and more flexible, and dispenses with the transcription view. If you prefer, you can still access the previous version.

Other changes include:

  • Significant rearrangement of the default selection table. The new arrangement makes it easy to choose the right characters if you have a Latin transcription to hand, which allowed the removal of the previous transcription view while also speeding up that type of picking.
  • Addition of Latin prompts to help locate letters (standard with v15).
  • Automatic transcription from Thai into ISO 11940-1, ISO 11940-2 and IPA. Note that for the last two there are some corner cases where the results are not quite correct, due to the ambiguity of the script, and note also that you need to show syllable boundaries with spaces before transcribing. (There’s a way to remove those spaces quickly afterwards.) See below for more information.
  • Hints! When switched on, mousing over a character highlights other similar characters, or characters incorporating the shape you moused over. Particularly useful for people who don’t know the script well and may miss small differences, but also handy for finding a character when you first spot something similar.
  • It also comes with the standard new v15 features, such as shape-based picking without losing context, range-selectable codepoint information, a rehabilitated escapes button, the ability to change the font of the table and the line-height of the output, and the ability to turn off autofocus on mobile devices to stop the keyboard jumping up all the time.

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

More about the transcriptions: There are three buttons that allow you to convert from Thai text to Latin transcriptions. If you highlight part of the text, only that part will be transcribed.

The toISO-1 button produces an ISO 11940-1 transliteration that latinises the Thai characters without changing their order. The result doesn’t normally tell you how to pronounce the Thai text, but it can be converted back to Thai, since each Thai character is represented by a unique sequence in Latin. This transcription should produce fully conformant output. There is no need to identify syllable boundaries first.
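
The round-trip property described above can be sketched in a few lines of Python. The mapping below is a made-up miniature table for illustration only, not the actual ISO 11940-1 data; it just shows why giving each Thai character a unique Latin sequence makes the conversion reversible.

```python
# Toy illustration of an order-preserving, reversible transliteration.
# The mapping is a made-up miniature table, NOT real ISO 11940-1 data.
TO_LATIN = {
    "\u0e01": "k",    # Thai KO KAI
    "\u0e02": "kh1",  # Thai KHO KHAI (invented unique token)
    "\u0e04": "kh2",  # Thai KHO KHWAI (invented unique token)
    "\u0e32": "a",    # Thai SARA AA
}
TO_THAI = {latin: thai for thai, latin in TO_LATIN.items()}

def romanize(text: str) -> str:
    # Each Thai character becomes one unique Latin sequence, in order.
    return "".join(TO_LATIN[ch] for ch in text)

def deromanize(latin: str) -> str:
    # Greedy longest-match decoding; unambiguous because every Thai
    # character has a distinct Latin sequence.
    tokens = sorted(TO_THAI, key=len, reverse=True)
    out, i = [], 0
    while i < len(latin):
        for tok in tokens:
            if latin.startswith(tok, i):
                out.append(TO_THAI[tok])
                i += len(tok)
                break
        else:
            raise ValueError(f"unmappable input at position {i}")
    return "".join(out)

word = "\u0e01\u0e32"  # two Thai characters
assert deromanize(romanize(word)) == word
```

The real transliteration additionally has to deal with reordering-free handling of tone marks and digraph-like sequences, but the bijective-mapping principle is the same.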

The toISO-2 and toIPA buttons produce an output that is intended to approximately reflect actual pronunciation. It will work fine most of the time, but there are occasional ambiguities and idiosyncrasies in Thai which will cause the converter to render certain less common syllables incorrectly. It also doesn’t automatically add accent marks to the phonetic version (though that may be added later). So the output of these buttons should be treated as something that gets you 90% of the way. NOTE: Before using these two buttons you need to add spaces or hyphens between each syllable of the Thai text. Syllable boundaries are important for correct interpretation of the text, and they are not detected automatically.

The condense button removes the spaces from the highlighted range (or the whole output area, if nothing is highlighted).

Note: For the toISO-2 transcription I use a macron over long vowels. This is non-standard.

by r12a at 18 December 2014 02:35 PM

W3C I18n Activity highlights

W3C MultilingualWeb Workshop Announced: 29 April 2015, Riga, Latvia

W3C announced today the 8th MultilingualWeb workshop in a series of events exploring the mechanisms and processes needed to ensure that the World Wide Web lives up to its potential around the world and across barriers of language and culture.

This workshop will be held 29 April 2015 in Riga, Latvia, and is made possible by the generous support of the LIDER project. The workshop is part of the Riga Summit 2015 on the Multilingual Digital Single Market (27-29 April).

Anyone may attend all sessions at no charge and the W3C welcomes participation by both speakers and non-speaking attendees. Early registration is encouraged due to limited space.

Building on the success of seven highly regarded previous workshops, this workshop will emphasize new technology developments that lead to new opportunities for the Multilingual Web. The workshop brings together participants interested in the best practices and standards needed to help content creators, localizers, language tools developers, and others meet the challenges of the multilingual Web. It provides further opportunities for networking across communities. We are particularly interested in speakers who can demonstrate novel solutions for reaching out to a global, multilingual audience.

See the Call for Participation and register online.

by Richard Ishida at 18 December 2014 10:34 AM

December 16, 2014

W3C I18n Activity highlights

First Public Working Draft of Indic Layout Requirements published

The W3C Internationalization Working Group has published a First Public Working Draft of Indic Layout Requirements on behalf of the Indic Layout Task Force, part of the W3C Internationalization Interest Group.

This document describes the basic requirements for Indic script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and SVG about how to support users of Indic scripts. The current document focuses on Devanagari, but there are plans to widen the scope to encompass additional Indian scripts as time goes on.

Publication as a First Public Working Draft signals the beginning of the process, rather than an end point. We are now looking for comments on the document. Please send any comments you have to public-i18n-indic@w3.org. The archive is public, but you need to subscribe to post to it.

by Richard Ishida at 16 December 2014 05:10 PM

December 10, 2014

Wikimedia Foundation


Exciting new features are now available in the third version of the Content Translation tool. Development of the new version was recently completed and the newly added features can be used in Wikimedia’s beta environment. To use it, you first need to enable the Content Translation beta-feature in the wiki, then go to the Special Page to select the article to translate. This change in behavior was done in preparation for the activation of Content Translation as a beta-feature on a few selected Wikipedias in early 2015.

The Content Translation user dashboard


Two important features have been included in this phase of development work: a user dashboard, and saving & continuing of unfinished translations.

Users can currently use these two features to monitor only their own work. The dashboard (see image) will display all the published and unpublished articles created by the user. Unpublished articles are translations that the user has not published to the user namespace of the wiki. These articles can be opened from the dashboard and users can continue to translate them. The dashboard is presently in a very early stage of development, and enhancements will be made to enrich the features.

Additionally, the selector for source and target languages and articles has been redesigned. Published articles with an excessive amount of unedited machine-translated content are now included in a category so that they can be easily identified.

Languages currently available with Apertium’s machine translation support are Catalan, Portuguese and Spanish. Users of other languages can also explore the tool after they have enabled the beta-feature. Please remember that this wiki is hosted on Wikimedia’s beta servers and you will need to create a separate account.

Upcoming plans and participation

Development work is currently going on for the fourth version of this tool. During this phase, we will focus our attention on making the translation interface stable and prepare the tool for deployment as a beta-feature in several Wikipedias.

Since the first release in July 2014, we have been guided by the helpful feedback we have continuously received from early users. We look forward to wider participation and more feedback as the tool progresses with new features and is enabled for new languages. Please let us know your views about Content Translation on the Project talk page, or by signing up for user testing sessions. You can also participate in the language quality evaluation survey to help us identify new languages that can be served through the tool.

Runa Bhattacharjee, Wikimedia Foundation, Language Engineering team

by Guillaume Paumier at 10 December 2014 11:24 PM

December 06, 2014

ishida>>blog » i18n

Tibetan character picker v15

I have uploaded a new version of the Tibetan character picker.

The new version dispenses with the images for the selection table. If you don’t have a suitable font to display the new version of the picker, you can still access the previous version, which uses images.

Other changes include:

  • Significant rearrangement of the default table, with many less common symbols moved into a location that you need to click on to reveal. This declutters the selection table.
  • Addition of Latin prompts to help locate letters (standard with v15).
  • Hints. (When switched on, mousing over a character highlights other similar characters, or characters incorporating the shape you moused over. Particularly useful for people who don’t know the script well and may miss small differences, but also handy for finding a character when you first spot something similar.)
  • A new Wylie button that converts Tibetan text into an extended Wylie Latin transcription. There are still some uncommon characters that don’t work, but it should cover most normal needs. I used diacritics over lowercase letters rather than uppercase letters, except for the fixed form characters. I also didn’t provide conversions for many of the symbols – they will appear without change in the transcription. See the notes on the page for more information.
  • The Codepoints button, which produces a list of characters in the output box, now has a new feature. If you have highlighted some text in the output box, you will only see a list of the highlighted characters. If there are no highlights, the contents of the whole output box are listed.
  • Don’t forget, if you are using the picker on an iPad or mobile device, to set Autofocus to Off before tapping on characters. This stops the device keypad popping up every time you select a character. (This is also standard for v15.)
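
The kind of listing the Codepoints button produces can be sketched in Python. This is a re-implementation for illustration, not the picker’s actual code; it simply emits one "U+XXXX NAME" entry per character, for the whole string or for a highlighted substring.

```python
import unicodedata

def codepoint_list(text: str) -> list[str]:
    # One "U+XXXX NAME" entry per character, in text order.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
            for ch in text]

# Tibetan letter KA followed by the vowel sign i:
print(codepoint_list("\u0f40\u0f72"))
# ['U+0F40 TIBETAN LETTER KA', 'U+0F72 TIBETAN VOWEL SIGN I']
```

Restricting the listing to a highlighted range is then just a matter of slicing the string before calling the function.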

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

by r12a at 06 December 2014 10:39 PM

December 02, 2014

W3C I18n Activity highlights

Final report for Chinese Layout Requirements Workshop available

The final report for the Workshop on Chinese Language Text Layout Requirements, which was held on September 11, 2014, at Beihang University, is now available. See also the Chinese version of this report.

The report contains links to slides.

The workshop gave a strong message of support for W3C Beihang and CESI to cooperate and lead the work on the Chinese Layout Requirement Document. In addition to Simplified and Traditional Chinese, there was also strong interest from representatives of the Mongolian, Tibetan and Uighur script communities to participate in the work. The closing session of the workshop proposed a number of steps to continue the efforts.

The W3C staff is driving the process of setting up this task force and reaching out to a wide range of interested stakeholders. This consultation will seek to clarify the mission for the task force, the target topics and industry priorities, and opportunities for liaisons with other related standards development organizations.

by Richard Ishida at 02 December 2014 05:01 PM

November 26, 2014

Global By Design

WordPress now at 70 languages, and counting

This blog has been hosted on WordPress since 2002. Since then, WordPress has grown into one of the dominant publishing platforms on the Internet. And one of the most multilingual as well, with strong support for 53 locales and limited support for an additional 20 or so locales. Languages supported include Russian, Arabic, Hebrew, Icelandic, […]

The post WordPress now at 70 languages, and counting appeared first on Global by Design.

by John Yunker at 26 November 2014 03:32 PM

November 17, 2014

ishida>>blog » i18n

Picker changes

If you use my Unicode character pickers, you may have noticed some changes recently. I’ve moved several pickers on to version 14. Most of the noticeable changes are in the location and styling of elements on the UI – the features remain pretty much unchanged.

Pages have acquired a header at the top (which is typically hidden), that provides links to related pages, and integrates the style into that of the rest of the site. What you don’t see is a large effort to tidy the code base and style sheets.

So far, I have changed the following: Arabic block, Armenian, Balinese, Bengali, Khmer, IPA, Lao, Mongolian, Myanmar, and Tibetan.

I will convert more as and when I get time.

However, in parallel, I have already made a start on version 15, which is a significant rewrite. Gone are the graphics, to be replaced by characters and webfonts. This makes a huge improvement to the loading time of the page. I’m also hoping to introduce more automated transcription methods, and simpler shape matching approaches.

Some of the pickers I already upgraded to version 14 have mechanisms for transcription and shape-based identification that took a huge effort to create, and will take a substantial effort to upgrade to version 15. So they may stay as they are for a while. However, pickers that are easier to convert, as well as new pickers, will move to the new format.

Actually, I already made a start with Gurmukhi v15, which yanks that picker out of the stone-age and into the future. There’s also a new picker for the Uighur language that uses v15 technology. I’ll write separate blogs about those.


[By the way, if you are viewing the pickers on a mobile device such as an iPad, don't forget to turn Autofocus off (click on 'more controls' to find the switch). This will stop the onscreen keyboard popping up, annoyingly, each time you try to tap on a character.]

by r12a at 17 November 2014 10:51 PM

November 14, 2014

Wikimedia Foundation


Many readers of this blog know about the Content Translation initiative. This project, developed by the Language Engineering team of the Wikimedia Foundation, brings together machine translation and rich text editing to provide a quick method to create Wikipedia articles by translating them from another language.

Content Translation uses Apertium as its machine translation back-end. Apertium is a freely licensed open source project and was our first choice for this stage of development. The first version of Content Translation focused on the Spanish-Catalan language pair, and one of the reasons for this choice was the maturity of Apertium’s machine translation for those languages.

However, with growing needs to support more language pairs in the newer versions of Content Translation, it became essential that the machine translation continue to be reliable, and that the back-end be stable and up-to-date. To ensure this stability, we needed to use the latest updates released by the Apertium upstream project maintainers, and we needed to use Apertium as a separate service. Prior to this set-up, the Apertium service was being provided from within the Content Translation server (cxserver).

The Content Translation tool is currently hosted on Wikimedia’s beta servers. To set up the independent Apertium service, it was important to use the latest released stable packages from Apertium, but they were not available for the current versions of Ubuntu and Debian. This became a significant blocker, because use of third party package repositories is not recommended for Wikimedia’s server environments.

After discussion with Wikimedia’s Operations team and Apertium project maintainers, it was decided that the Apertium packages would be built for the Wikimedia repository. In addition to the Apertium base packages, individual packages for supporting the language pairs and other service packages were built, tested and included in the Wikimedia repository. Alexandros Kosiaris (from the Wikimedia Operations team), reviewed and merged these packages and the patches for their inclusion in the repository. The Apertium service was then puppetized for easy configuration and management on the Wikimedia beta cluster.

Meanwhile, to make Apertium more accessible for Ubuntu and Debian users, Kartik Mistry (from the Wikimedia Language Engineering team) also started working closely with the Apertium project maintainers, to make sure that the Debian packages were up-to-date in the main repository. Going forward, once the updated packages are included in Ubuntu’s next Long Term Support (LTS) version, we plan to remove these packages from the internal Wikimedia repository.

The Content Translation tool has since been updated and now supports Catalan, Portuguese and Spanish machine translation, using the updated Apertium service through cxserver. We hope our users will benefit from the faster and more reliable translation experience.

We would like to thank Tino Didriksen, Francis Tyers and Kevin Brubeck Unhammer from the Apertium project, and Alexandros Kosiaris and Antoine Musso from the Wikimedia Operations and Release Engineering teams respectively, for their continued support and guidance.

Runa Bhattacharjee, and Kartik Mistry, Wikimedia Language Engineering team

by Guillaume Paumier at 14 November 2014 06:41 PM

November 13, 2014

Global By Design

The Four Seasons improves its global gateway

I was pleased to see the Four Seasons embrace the globe icon for its global gateway. It is well positioned in the upper right-hand corner. The Four Seasons website ranked 145th out of the 150 websites scored in the 2014 Web Globalization Report Card. I predict its ranking will improve in the 2015 edition!  

The post The Four Seasons improves its global gateway appeared first on Global by Design.

by John Yunker at 13 November 2014 10:07 PM

November 10, 2014

Global By Design

Amazon pluralizes Singles Day

Leave it to Amazon to turn Singles Day plural. And why not? If we can extend Black Friday to Cyber Monday, why not extend Singles Day an extra day? Here’s a screen grab of the Amazon China home page (note that the sale begins on 11/10): Nike is sticking with one day, for now. Here’s a Singles […]

The post Amazon pluralizes Singles Day appeared first on Global by Design.

by John Yunker at 10 November 2014 04:42 PM

November 06, 2014

Global By Design

The biggest ecommerce day in November? It’s not Black Friday.

In China, November 11th is known as Singles Day and it has quickly become the world’s biggest day for ecommerce. Tmall, the massive ecommerce website owned by Alibaba, is already promoting this day: Tmall hosts a great number of Western brands that are also eager to capitalize on this day, like Clinique: Xiaomi, China’s leading […]

The post The biggest ecommerce day in November? It’s not Black Friday. appeared first on Global by Design.

by John Yunker at 06 November 2014 03:19 AM

November 04, 2014

Wikimedia Foundation


CLDR, the Common Locale Data Repository project from the Unicode Consortium, provides translated locale-specific information like language names, country names, currency, date/time etc. that can be used in various applications. This library, used across several platforms, is particularly useful in maintaining parity of locale information in internationalized applications. In MediaWiki, the CLDR extension provides localized data and functions that can be used by developers.

The CLDR project constantly updates and maintains this database and publishes it twice a year. The information is periodically reviewed through a submission and vetting process. Individual participants and organisations can contribute during this process to improve and add to the CLDR data. The most recent version of CLDR was released in September 2014.

An important part of the CLDR data are the rules that determine how plurals are handled within the grammar of a language. In CLDR versions 25 and 26, plural rules for several languages were altered. These changes have now been incorporated in MediaWiki, which had still been using rules from CLDR version 24.

The affected languages are: Russian (ru), Abkhaz (ab), Avaric (av), Bashkir (ba), Buryat (bxr), Chechen (ce), Crimean Tatar (crh-cyrl), Chuvash (cv), Ingush (inh), Komi-Permyak (koi), Karachay-Balkar (krc), Komi (kv), Lak (lbe), Lezghian (lez), Eastern Mari (mhr), Western Mari (mrj), Yakut (sah), Tatar (tt), Tatar-Cyrillic (tt-cyrl), Tuvinian (tyv), Udmurt (udm), Kalmyk (xal), Prussian (prg), Tagalog (tl), Manx (gv), Mirandese (mwl), Portuguese (pt), Brazilian Portuguese (pt-br), Uyghur (ug), Lower Sorbian (dsb), Upper Sorbian (hsb), Asturian (ast) and Western Frisian (fy).
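
As an illustration of what such plural rules look like, here is the Russian integer rule set (one of the affected languages) transcribed by hand into Python. Real applications would use a CLDR library rather than hand-coded rules, and fractional numbers (which CLDR places in "other") are not handled here.

```python
def russian_plural_category(i: int) -> str:
    # CLDR plural category for a non-negative integer in Russian,
    # transcribed by hand from the published CLDR rules.
    # Fractions (which fall into 'other') are not handled here.
    mod10, mod100 = i % 10, i % 100
    if mod10 == 1 and mod100 != 11:
        return "one"      # e.g. 1, 21, 101 — "1 файл"
    if mod10 in (2, 3, 4) and mod100 not in (12, 13, 14):
        return "few"      # e.g. 2, 23, 104 — "2 файла"
    return "many"         # e.g. 0, 5, 11, 100 — "5 файлов"

assert [russian_plural_category(n) for n in (1, 2, 5, 11, 21)] == \
    ["one", "few", "many", "many", "one"]
```

A translation file then supplies one message variant per category, and the i18n layer picks the variant from the number at display time; when CLDR changes a language’s rules, the variants translators must provide change too, which is why the review described above is needed.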

This change will have very little impact on our users. Translators, however, will have to review the user interface messages that have already been changed to include the updated plural forms. An announcement with the details of the change has also been made. The announcement also includes instructions for updating the translations for the languages mentioned above.

The CLDR MediaWiki extension, which provides convenient abstractions for getting country names, language names, etc., has also been upgraded to use CLDR 26. The Universal Language Selector and CLDRPluralRuleParser libraries have been upgraded to use the latest data as well.

The Wikimedia Foundation is a participating organisation in the CLDR project. Learn more about how you can be part of this effort.

Further reading about CLDR and its use in Wikimedia internationalization projects:

  1. http://laxstrom.name/blag/2014/01/05/mediawiki-i18n-explained-plural/
  2. http://thottingal.in/blog/2014/05/24/parsing-cldr-plural-rules-in-javascript/

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

by Guillaume Paumier at 04 November 2014 05:17 PM

A few months back, the Language Engineering team of the Wikimedia Foundation announced the availability of the first version of the Content Translation tool, with machine translation support from Spanish to Catalan. The response from the Catalan Wikipedia editors was overwhelming and nearly 200 articles have already been created using the tool.

We have now enabled support for translating across Spanish, Portuguese and Catalan using Apertium as the machine translation back-end system. This extends our Spanish-to-Catalan initial launch.

The Content Translation tool is particularly useful for multilingual editors who can create new articles from corresponding articles in another language. The tool features a minimal rich-text editor with translation tools like dictionaries and machine translation support.

The Content Translation tool can be used to translate articles more easily (here from Spanish to Portuguese). It provides features such as link cards, category adaptation (in development), and a warning to the editor when the text is coming exclusively from machine translation.

Development for the second version was completed on September 30, 2014. Due to technical difficulties in the deployment environment, availability of the updated version of the tool was delayed. As a result, the current deployment also includes some of the planned features from the next release, which is scheduled to be complete on November 18, 2014.

Highlights from this version

Some of the features included in this version originated from feedback received from the community, either during usability testing sessions, or as comments and suggestions from our initial users. Editors from the Catalan Wikipedia provided constant feedback after the first release of the tool and also during the recent roundtable.


  1. Automatic adaptation of categories.
  2. Text formatting with a simple toolbar in the Chrome browser. In Firefox, this support is limited to keyboard shortcuts (Ctrl-B for bold, Ctrl-I for italics).
  3. Bi-directional machine translation support for Spanish and Portuguese.
  4. Machine translation support from Catalan to Spanish.
  5. Paragraph alignment improvements to better match original and translated sentences.
  6. More accurate detection of machine translation suggestions used without further corrections, with warnings shown to the user.
  7. Redesigned top bar and progress bar.
  8. Numerous bug fixes.

How to Use

To use the tool, users can visit http://en.wikipedia.beta.wmflabs.org/wiki/Special:ContentTranslation and make the following selections:

  • source language – the language of the article to translate from. Can be Catalan, Spanish or Portuguese.
  • target language – the language of the article you will be translating into. Can be Catalan, Spanish or Portuguese.
  • article name – the title of the article to translate.

Users can also continue using the tool from the earlier available instance at http://es.wikipedia.beta.wmflabs.org/wiki/Especial:ContentTranslation

After translation, users can publish the translation in their own namespace on the same wiki and can choose to copy the page contents to the real Wikipedia for the target language. Please visit this link for more instructions on how to create and publish a new article.

Feedback and Participation

In the next few weeks, we will be reaching out to the editors from the Catalan, Spanish and Portuguese Wikipedia communities to gather feedback and also work closely to resolve any issues.

Please let us know about your feedback through the project talk page. You can also volunteer for our testing sessions.

Runa Bhattacharjee, Wikimedia Foundation, Language Engineering team

by Guillaume Paumier at 04 November 2014 04:59 AM

October 29, 2014

Global By Design

Is your global gateway stuck in the basement?

When you welcome visitors into your home, you probably don’t usher them directly to the basement. Yet when it comes to websites, this is exactly how many companies treat visitors from around the world. That is, they expect visitors to scroll down to the footer (basement) of their websites in order to find the global gateway. Now I want […]

The post Is your global gateway stuck in the basement? appeared first on Global by Design.

by John Yunker at 29 October 2014 11:43 PM

October 27, 2014

Global By Design

Bulgaria (at long last) gets its own internationalized domain name (IDN)

Five years ago, Bulgaria applied for an IDN but was denied by ICANN on the basis of “string similarity” with the country code of Brazil. Here is the Bulgarian IDN side by side with Brazil’s ccTLD:  бг  br. String similarity is a complex and controversial issue. But Bulgaria refused to take no for an answer and, five […]

The post Bulgaria (at long last) gets its own internationalized domain name (IDN) appeared first on Global by Design.

by John Yunker at 27 October 2014 08:09 PM

October 22, 2014

Global By Design

You say Sea of Japan. I say East Sea.

Who said the life of a map maker isn’t interesting? Every other day it seems there is another disputed territory, which usually means another disputed name. I’ve already mentioned the Falkland Islands/Islas Malvinas issue. On the other side of the planet there is a dispute brewing over the Sea of Japan. South Korea maintains that the […]

The post You say Sea of Japan. I say East Sea. appeared first on Global by Design.

by John Yunker at 22 October 2014 02:31 AM

October 18, 2014

ishida>>blog » i18n

Notes on Tibetan script

See the Tibetan Script Notes

Last March I pulled together some notes about the Tibetan script overall, and detailed notes about Unicode characters used in Tibetan.

I am writing these pages as I explore the Tibetan script as used for the Tibetan language. They may be updated from time to time and should not be considered authoritative. Basically I am mostly simplifying, combining, streamlining and arranging the text from the sources listed at the bottom of the page.

The first half of the script notes page describes how Unicode characters are used to write Tibetan. The second half looks at text layout in Tibetan (eg. line-breaking, justification, emphasis, punctuation, etc.)

The character notes page lists all the characters in the Unicode Tibetan block, and provides specific usage notes for many of them per their use for writing the Tibetan language.

See the Tibetan Character Notes

Tibetan is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. Text runs from left to right.
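
In Unicode terms, the vowel signs are combining marks that follow the base consonant in the character stream. A minimal Python illustration:

```python
import unicodedata

# A consonant letter carries the inherent vowel; a combining vowel
# sign placed after it in the character stream overrides that vowel.
ka = "\u0f40"       # TIBETAN LETTER KA, pronounced /ka/
vowel_i = "\u0f72"  # TIBETAN VOWEL SIGN I
ki = ka + vowel_i   # rendered as one syllable, pronounced /ki/

assert unicodedata.name(ka) == "TIBETAN LETTER KA"
assert unicodedata.combining(vowel_i) != 0  # attaches to the base letter
```

The rendering engine stacks the mark onto the letter, so the two codepoints display as a single syllable.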

There are various different Tibetan scripts, of two basic types: དབུ་ཙན་ dbu-can, pronounced /uchen/ (with a head), and དབུ་མེད་ dbu-med, pronounced /ume/ (headless). This page concentrates on the former. Pronunciations are based on the central, Lhasa dialect.

The pronunciation of Tibetan words is typically much simpler than the orthography, which involves patterns of consonants. These reduce ambiguity and can affect pronunciation and tone. In the notes I try to explain how that works, in an approachable way (though it’s still a little complicated, at first).

Traditional Tibetan text was written on pechas (དཔེ་ཆ་ dpe-cha), loose-leaf sheets. Some of the characters used and formatting approaches are different in books and pechas.

For similar notes on other scripts, see my docs list.

by r12a at 18 October 2014 05:48 AM

October 08, 2014

Global By Design

Apple continues to neglect its global gateway

Every time Apple updates its web design (which it did recently) I get hopeful that the global gateway will receive a similar upgrade. But this has not yet happened. Apple’s global gateway remains firmly entrenched in the use of flags. And that’s unfortunate. Flags are not the best icons for global navigation. They are fraught with […]

The post Apple continues to neglect its global gateway appeared first on Global by Design.

by John Yunker at 08 October 2014 07:56 PM

September 29, 2014

Wikimedia Foundation


On September 5, 2014, the Language Engineering team hosted an online round-table with editors of the Catalan Wikipedia to discuss the Content Translation tool. Besides the translation editor and tools, the first release of Content Translation supported machine translation from Spanish to Catalan. This helped the editors work efficiently and explore the tool more deeply.

The initial feedback from editors was greatly encouraging. They liked the tool and were pleased by its ease of use. After a month of extensive use, during which more than 160 articles had been created and contributed to the Catalan Wikipedia, the team wanted to find out how the tool was being used in the editors’ day-to-day editing workflows, as well as what gaps it leaves unaddressed. The conversation resulted in valuable feedback from the editors, some of which is presented below.

Screenshot of the Content Translation tool that shows the user a warning about a large amount of machine translated content in the translated article.

(Content-Translation-Warning.png, includes text from en:Tree, by Wikipedia contributors, under CC-BY-SA 3.0, and es:Árbol, by Wikipedia contributors, under CC-BY-SA 3.0)

Faster Editing: The editors unequivocally agreed that the tool provided an overall improvement in their workflow. They were able to create new articles faster, and the high-quality machine-translated drafts often needed very few corrections. Editor Xavi Bosch felt that he could create articles in approximately 30% of the time he needed before the tool was available. With the time gained, the editors could focus on fine-tuning the article, for instance by adding more references.

Machine Translation: Content Translation uses Apertium as its machine translation engine. The editors expressed their satisfaction with the overall quality of translation provided by the tool. However, they suggested adding more checks to identify articles that were published largely unchanged. Presently, the user is warned when the tool detects that little has been changed from the initial machine translation. Pau Giner suggested exploring community best practices from the Catalan Wikipedia to create additional baselines for articles published using Content Translation.

Category Adaptation: After creating an article, the current setup on beta labs requires users to publish the article manually on the Catalan Wikipedia. This allows the editors to review the articles and prepare them for publication. The editors highlighted that categories are a major addition during these reviews, and that a feature to adapt categories would be a major benefit. Category adaptation is a feature planned for development. The editors suggested:

  • inserting the translated equivalents of the categories in the original article, and
  • a feature to add new categories (similar to HotCat)

Article continuity through red links: At present, articles from the source language that do not exist in the target language are not marked in the translated text. In wiki pages, such links are shown as red links. Editors suggested that a similar indicator should be displayed in the published article. This will be especially helpful when creating closely linked articles, like the ones recently created on the Catalan Wikipedia for the Fields Medal awardees.

Complementing the current tools: The Catalan Wikipedia editors also use several tools for typo correction and other aids. It was suggested to explore integrating these tools to complement the services currently provided through Content Translation. Editor B25es highlighted some long-standing minor errors in the Apertium translation service that were being carried into Content Translation as well. The editors recommended extending Content Translation to learn from these known issues and provide corrections.

Issues while publishing articles: On several occasions the editors had not been able to save a translated article. While some of this was due to the technical instability of the beta labs environment where the tool is currently hosted, the editors found some patterns of content for which this error recurred. Articles with more visual content or complex templates (like football results) have often been problematic. In a few cases where the article was not saved, it was noticed that the sequence in which the paragraphs had been translated was similar, for instance in articles about Cédric Villani or Stanislav Smirnov. The development team has begun investigating these issues.

To know more, watch the recording of the conversation and read about the features of the upcoming release. If you haven’t tried the tool yet, please do so using these instructions. We would love to hear your feedback.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

by carlosmonterrey at 29 September 2014 06:45 PM

September 23, 2014

ishida>>blog » i18n

Improving international text layout and typography on the Web and in eBooks

Screen shot 2014-09-26 at 16.36.47

The W3C needs to make sure that the typographic needs of scripts and languages around the world are built in to technologies such as HTML, CSS, SVG, etc. so that Web pages and eBooks can look and behave as expected for people around the world.

To that end we have experts in various parts of the world documenting typographic requirements and gaps between what is needed and what is currently supported in browsers and ebook readers.

The flagship document is Requirements for Japanese Text Layout. The information in this document has been widely used, and the process used for creating it was extremely effective. It was developed in Japan, by a task force using mailing lists and holding meetings in Japanese, then converted to English for review. It was published in both languages.

We now have groups working on Indic Layout Requirements and Requirements for Hangul Text Layout and Typography, and this month I was in Beijing to discuss ongoing work on Chinese layout requirements (URL coming soon), and we heard from experts in Mongolian, Tibetan, and Uyghur who are keen to also participate in the Chinese task force and produce similar documents for their part of the world.

The Internationalization (i18n) Working Group at the W3C has also been working on other aspects of the multilingual user experience. For example, improvements to bidirectional text support (Arabic, Hebrew, Thaana, etc.) for HTML and CSS, and supporting the work on counter styles in CSS.

To support local relevance of Web pages and eBook formats we need local experts to participate in gathering information in these task forces, to review the task force outputs, and to lobby for, or contribute code towards, the implementation of features in browsers and ereaders. If you are one of these people, or know some, please get in touch!

We particularly need more information about how to handle typographic features of the Arabic script.

In the hope that it will help, I have put together some information on current areas of activity at the W3C, with pointers to useful existing requirements, specifications and tests. It is not exhaustive, and I expect it to be added to and improved over time.

Look through the list and check whether your needs are being adequately covered. If not, write to www-international@w3.org (you need to subscribe first) and make the case. If the spec does cover your needs, but the browsers don’t support them, raise bugs against the browsers.

by r12a at 23 September 2014 03:55 PM

September 18, 2014

Global By Design

One probable beneficiary of Scotland independence: .SCOT

So today is the big day for the people of Scotland as well as the UK. One question that occurs to country code geeks such as myself is what country code domain would Scotland use if/when it became separate from .UK? It turns out that one domain is already available right now: .scot. However, this isn’t technically a […]

The post One probable beneficiary of Scotland independence: .SCOT appeared first on Global by Design.

by John Yunker at 18 September 2014 03:15 PM

September 17, 2014

W3C I18n Activity highlights

Encoding is a Candidate Recommendation

The Encoding specification has been published as a Candidate Recommendation. This is a snapshot of the WHATWG document, as of 4 September 2014, published after discussion with the WHATWG editors. No changes have been made in the body of this document other than to align with W3C house styles. The primary reason that W3C is publishing this document is so that HTML5 and other specifications may normatively refer to a stable W3C Recommendation.

Going forward, the Internationalization Working Group expects to receive more comments in the form of implementation feedback and test cases. The Working Group believes it will have satisfied its implementation criteria no earlier than 16 March 2015. If you would like to contribute test cases or information about implementations, please send mail to www-international@w3.org.

The utf-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the utf-8 encoding.

The other (legacy) encodings have been defined to some extent in the past. However, user agents have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification addresses those gaps so that new user agents do not have to reverse engineer encoding implementations and existing user agents can converge.

by Richard Ishida at 17 September 2014 04:42 PM

September 11, 2014

Internet Globalization News

The Challenges of Globalization and Democracy

Great analysis of some of the challenges and tensions between democracy and globalization by Dani Rodrik, Rafiq Hariri Professor of International Political Economy at the John F. Kennedy School of Government, Harvard University. While globalization is not always opposed to democracy or democratic decision-making processes, there is a tendency among some of the world's financial elites to try to find non-democratic ways to make their plans prevail. Globalization should not be just about free trade and lower salaries. via www.worldfinancialreview.com Even though it is possible to advance both democracy and globalization, it requires the creation of a global political community that is vastly more ambitious than anything we have seen to date or are likely to experience soon. It would call for global rulemaking by democracy, supported by accountability mechanisms that go far beyond what we have at present. Democratic global governance of this sort is a chimera. There...

by blogalize.me at 11 September 2014 04:35 PM

Does globalization mean we will become one culture?

Interesting approach to the impact of globalization on cultures around the world. As expected, the article does not really answer the question of whether we will become one culture. I would add that even though we might all buy the same brands and use services provided by the same transnational companies, "culture" is something much deeper that responds to other factors, and the homogenization of brands and services will not bring a "one culture" world. Mark Pagel via www.bbc.com Stroll into your local Starbucks and you will find yourself part of a cultural experiment on a scale never seen before on this planet. In less than half a century, the coffee chain has grown from a single outlet in Seattle to nearly 20,000 shops in around 60 countries. Each year, its near identical stores serve cups of near identical coffee in near identical cups to hundreds of thousands of people. For the...

by blogalize.me at 11 September 2014 04:34 PM

The $34 Billion Multilingual Business Conversation

Good overall description of what's happening today in the localization industry. New, lean and innovative technology companies like Cloudwords are disrupting an industry that was stagnant and dominated for a long time by slow-moving translation vendors. via www.cnbc.com ...The innovations are enabling corporations to enter markets and disrupt many sectors that were previously unreachable. Language is the mother tongue of global business opportunity. Coupa Software, a San Mateo, Calif., maker of cloud spend management solutions, wanted to test the waters in Latin America by sponsoring a trade show in Mexico City. But finding interpreters and building in-house technology to translate the intricate code of its websites, marketing materials and social media—for a project that may not result in new business—required too much time and other resources. Instead, Coupa contracted with Cloudwords, a San Francisco-based firm whose project management software streamlines the translation process. What would have taken Coupa about three...

by blogalize.me at 11 September 2014 04:31 PM

Globalization for Whom?

There is no doubt that globalization "can work" for poor people (or, better, for poor countries). Global integration can be a powerful force for reducing poverty and empowering people. The question of whether it "does work" is much less certain. According to Ian Goldin of the Oxford Martin School, the relationship between globalization and poverty reduction is far from automatic — and far from simple. via www.theglobalist.com Globalization today is at a critical crossroads. It has provided immense benefits, but the systemic risks and rising inequality it causes require urgent action. The failure to arrest these developments is likely to lead to growing protectionism, nationalist policies and xenophobia, which will slow the global recovery and be particularly harmful for poor people. The scope and scale of the required reforms are vast and complex. Urgent action is needed for globalization to realize the positive potential that increased connectedness and interdependency can...

by blogalize.me at 11 September 2014 04:31 PM

Getting Cross-Cultural Teamwork Right

As someone who has worked in cross-cultural, remotely based teams for many years, I think the insights provided by Neeley are spot on. In order to achieve mutual understanding, learning and teaching, team members must have a minimum level of sensitivity and self-awareness so they do not fall into the trap of thinking "my way of doing things is the 'normal' (read: correct) way". Being open-minded and aware that people are just different, and not necessarily wrong, is the key to getting cross-cultural teamwork right. In short, companies thinking about or in the process of expanding internationally should be extra careful when selecting people to work in their international teams. This is a case where the personality of teammates is as important as the processes used to manage the international side of the business. Tsedal Neeley via blogs.hbr.org People struggle with global teamwork, even...

by blogalize.me at 11 September 2014 04:06 PM

September 10, 2014

Global By Design

What’s the ROI of web globalization?

I’ve been meaning to write about this for a while. A few months ago, Apple CEO Tim Cook reportedly said this at an investor meeting: “When we work on making our devices accessible by the blind,” he said, “I don’t consider the bloody ROI.” I love this quote. And I love any CEO who knows when the […]

The post What’s the ROI of web globalization? appeared first on Global by Design.

by John Yunker at 10 September 2014 08:52 PM

Contact: Richard Ishida (ishida@w3.org).