W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

If you own a blog with a focus on internationalization, and want to be added or removed from this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

All times are UTC.

Powered by: Planet

Planet Web I18n

The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinion of their respective authors and does not reflect the position of the Internationalization Activity.

September 18, 2014

Global By Design

One probable beneficiary of Scotland independence: .SCOT

So today is the big day for the people of Scotland as well as the UK. One question that occurs to country code geeks such as myself is what country code domain would Scotland use if/when it became separate from .UK? It turns out that one domain is already available right now: .scot. However, this isn’t technically a […]

by John Yunker at 18 September 2014 03:15 PM

September 17, 2014

W3C I18n Activity highlights

Encoding is a Candidate Recommendation

The Encoding specification has been published as a Candidate Recommendation. This is a snapshot of the WHATWG document, as of 4 September 2014, published after discussion with the WHATWG editors. No changes have been made in the body of this document other than to align with W3C house styles. The primary reason that W3C is publishing this document is so that HTML5 and other specifications may normatively refer to a stable W3C Recommendation.

Going forward, the Internationalization Working Group expects to receive more comments in the form of implementation feedback and test cases. The Working Group believes it will have satisfied its implementation criteria no earlier than 16 March 2015. If you would like to contribute test cases or information about implementations, please send mail to www-international@w3.org.

The utf-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the utf-8 encoding.

The other (legacy) encodings have been defined to some extent in the past. However, user agents have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification addresses those gaps so that new user agents do not have to reverse engineer encoding implementations and existing user agents can converge.
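As a rough illustration of why labels and legacy decoders matter, the Python sketch below decodes the same byte under several encoding names. The byte value and the use of Python's built-in codecs as stand-ins for a user agent's decoders are assumptions made for illustration; they are not taken from the specification. Note that Python's `latin-1` codec is a strict ISO 8859-1 decoder, whereas the Encoding specification treats the label `iso-8859-1` as a label for windows-1252, so the two disagree on this byte.

```python
# Sketch only: Python's codecs stand in for a user agent's decoders.

def decode_variants(data: bytes) -> dict:
    """Decode the same bytes with several encodings, recording failures as None."""
    results = {}
    for name in ("utf-8", "cp1252", "latin-1"):
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # the bytes are not valid in this encoding
    return results

# A single byte that legacy Western European pages often use for the euro sign.
variants = decode_variants(b"\x80")
for name, text in variants.items():
    print(name, repr(text))

# The same character in utf-8 takes three bytes.
print("\u20ac".encode("utf-8"))
```

Here windows-1252 (`cp1252`) yields the euro sign, strict ISO 8859-1 yields a C1 control character, and utf-8 rejects the byte outright, which is the kind of divergence the specification aims to remove.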

by Richard Ishida at 17 September 2014 04:42 PM

September 11, 2014

Internet Globalization News

The Challenges of Globalization and Democracy

Great analysis of some of the challenges and tensions between democracy and globalization by Dani Rodrik, Rafiq Hariri Professor of International Political Economy at the John F. Kennedy School of Government, Harvard University. While globalization is not necessarily opposed to democracy or democratic decision-making processes, there is a tendency among some of the world's financial elites to try to find non-democratic ways to make their plans prevail. Globalization should not be just about free trade and lower salaries. via www.worldfinancialreview.com Even though it is possible to advance both democracy and globalization, it requires the creation of a global political community that is vastly more ambitious than anything we have seen to date or are likely to experience soon. It would call for global rulemaking by democracy, supported by accountability mechanisms that go far beyond what we have at present. Democratic global governance of this sort is a chimera. There...

by blogalize.me at 11 September 2014 04:35 PM

Does globalization mean we will become one culture?

Interesting approach to the impact of globalization on cultures around the world. As expected, the article does not really answer the question of whether we will become one culture. I would add that even though we might all buy the same brands and use services provided by the same transnational companies, "culture" is something much deeper that responds to other factors, and the homogenization of brands and services will not bring about a "one culture world". Mark Pagel via www.bbc.com Stroll into your local Starbucks and you will find yourself part of a cultural experiment on a scale never seen before on this planet. In less than half a century, the coffee chain has grown from a single outlet in Seattle to nearly 20,000 shops in around 60 countries. Each year, its near identical stores serve cups of near identical coffee in near identical cups to hundreds of thousands of people. For the...

by blogalize.me at 11 September 2014 04:34 PM

The $34 Billion Multilingual Business Conversation

Good overall description of what's happening today in the localization industry. New, lean and innovative technology companies like Cloudwords are disrupting an industry that was stagnant and dominated for a long time by slow-moving translation vendors. via www.cnbc.com ...The innovations are enabling corporations to enter markets and disrupt many sectors that were previously unreachable. Language is the mother tongue of global business opportunity. Coupa Software, a San Mateo, Calif., maker of cloud spend management solutions, wanted to test the waters in Latin America by sponsoring a trade show in Mexico City. But finding interpreters and building in-house technology to translate the intricate code of its websites, marketing materials and social media—for a project that may not result in new business—required too much time and other resources. Instead, Coupa contracted with Cloudwords, a San Francisco-based firm whose project management software streamlines the translation process. What would have taken Coupa about three...

by blogalize.me at 11 September 2014 04:31 PM

Globalization for Whom?

There is no doubt that globalization "can work" for poor people (or, better, for poor countries). Global integration can be a powerful force for reducing poverty and empowering people. The question of whether it "does work" is much less certain. According to Ian Goldin of the Oxford Martin School, the relationship between globalization and poverty reduction is far from automatic — and far from simple. via www.theglobalist.com Globalization today is at a critical crossroads. It has provided immense benefits, but the systemic risks and rising inequality it causes require urgent action. The failure to arrest these developments is likely to lead to growing protectionism, nationalist policies and xenophobia, which will slow the global recovery and be particularly harmful for poor people. The scope and scale of the required reforms are vast and complex. Urgent action is needed for globalization to realize the positive potential that increased connectedness and interdependency can...

by blogalize.me at 11 September 2014 04:31 PM

Getting Cross-Cultural Teamwork Right

As someone who has worked in cross-cultural, remotely based teams for many many years, I think the insights provided by Neeley are spot on. In order to have mutual understanding, learning and teaching, team members must have a minimum level of sensitivity and of self-awareness so they do not fall into the mistake of thinking "my way of doing things is the 'normal' (read: correct) way". Being open minded and aware of the fact that people are just different, and not necessarily wrong, are the key to getting cross-cultural teamwork right. In short, companies thinking about or in the process of expanding internationally should be extra careful when selecting people to work in their international teams. This is a case where the personality of teammates is as important as the processes used to manage the international side of the business. Tsedal Neeley via blogs.hbr.org People struggle with global teamwork, even...

by blogalize.me at 11 September 2014 04:06 PM

September 10, 2014

Global By Design

What’s the ROI of web globalization?

I’ve been meaning to write about this for a while. A few months ago, Apple CEO Tim Cook reportedly said this at an investor meeting: “When we work on making our devices accessible by the blind,” he said, “I don’t consider the bloody ROI.” I love this quote. And I love any CEO who knows when the […]

by John Yunker at 10 September 2014 08:52 PM

September 03, 2014

Global By Design

What’s wrong with this global gateway?

A few things. First, using flags to indicate language is almost always a mistake. Second, why are the language names all in English? Only the “English language” text needs to be in English. The purpose of the gateway is to communicate with speakers of other languages, not just English speakers. Finally, do we need “Language” at all? […]

by John Yunker at 03 September 2014 10:33 PM

August 27, 2014

Global By Design

Managing language expectations when you can’t translate everything

I don’t know of any large company that translates all of its content into all of its target languages. I won’t go into the many reasons for why this is — money being the major reason — but I will say that if this is an issue you struggle with you’re not alone. The key to […]

by John Yunker at 27 August 2014 09:30 PM

August 26, 2014

Global By Design

Buick: A Chinese success story

I still look at the Buick brand as something for the post-60 demographic (though I must confess that demographic doesn’t feel quite so old anymore). It’s an image Buick has been working to change for years. But the beauty of globalization is that Buick doesn’t carry this sort of generational baggage in other countries. Like China. The Chinese apparently love […]

by John Yunker at 26 August 2014 04:31 PM

Wikimedia Foundation

Amir Aharoni of the Wikimedia Language Engineering team introduces the Content Translation tool to the student delegation from Kazakhstan at Wikimania 2014, in London.

On July 17, 2014, the Wikimedia Language Engineering team announced the deployment of the ContentTranslation extension in Wikimedia Labs. This first deployment was targeted primarily for translation from Spanish to Catalan. Since then, users have expressed generally positive feedback about the tool. Most of the initial discussion took place in the Village pump (Taverna) of the Catalan Wikipedia. Later, we had the opportunity to showcase the tool to a wider audience at Wikimania in London.

Initial response

In the first 2 weeks, 29 articles were created using the Content Translation tool and published in the Catalan Wikipedia. Article topics were diverse, ranging from places in Malta, to companies in Italy, a river, a monastery, a political manifesto, and a prisoner of war. As the Content Translation tool is also being used for testing by the developers and other volunteers, the full list of articles that make it to a Wikipedia is regularly updated. The Language Engineering team also started addressing some of the bugs that were encountered, such as issues with paragraph alignment and stability of the machine translation controller.

The number of articles published using Content Translation has now passed 100, and its usage has not been limited to the Catalan Wikipedia. Users have been creating articles in other languages like Gujarati and Malayalam, although machine translation has not yet been extended beyond Spanish–Catalan. All the pages that were published as articles had further edits for wikification, grammar correction, and in some cases meaningful enhancement. A deeper look at the edits revealed that the additional changes were first made by the same user who made the initial translation, and later by other editors or bots.

Wikimania in London

The Content Translation tool was showcased widely at Wikimania 2014, the annual conference of the Wikimedia communities. In the main conference, Santhosh Thottingal and Amir Aharoni gave a presentation on machine-aided translation delivery through Content Translation. During the pre-conference hackathon, Pau Giner conducted a testing session with student volunteers from Kazakhstan, who were enthusiastic about using the tool in their local Wiki Club. Requests for full support of other language pairs were raised by many users and by groups like the Wikipedia Medical Translation project. Discussions were held with the Wikidata team to identify areas of collaboration on data reuse for consistent referencing across translated versions. These include categories and links, among others.

The Language Engineering team members worked closely with Wikimedians to better understand requirements for languages like Arabic, Persian, Portuguese, Tajik, Swedish, German and others, that can be instrumental in extending support for these languages.

Further development

The development of ContentTranslation continues. Prior to Wikimania, the Language Engineering team met to evaluate the response and effectiveness of the first release of the tool, and prepared the goals for the next release. The second release is slated for the last week of September 2014. Among the features planned are support for more languages (machine translation, dictionaries), a smarter entry point to the translation UI, and basic editor formatting. It is expected that translation support from Catalan to Spanish will be activated by the end of August 2014. Read the detailed release plan and goals to know more.

Over the next couple of months, the Language Engineering team intends to work closely with our communities to better understand how the Content Translation tool has helped the editors so far and how it can serve the global community better with the translation aids and resources currently integrated with the tool. We welcome feedback at the project talk page. Get in touch with the Language Engineering team for more information and feedback.

Amir Aharoni and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

by Guillaume Paumier at 26 August 2014 12:34 PM

August 23, 2014

Global By Design

GoDaddy’s expanding global gateway

You can tell a lot about a company simply by keeping an eye on its global gateway. Here is the GoDaddy global gateway menu in March, 2014: And here it is today: From 21 locales to 42 locales in a few months.

by John Yunker at 23 August 2014 03:22 PM

August 22, 2014

ishida>>blog » i18n

Burmese divergence on the Web

It’s disappointing to see that non-standard uses of UTF-8 are being used by the BBC on their BBC Burmese Facebook page.

Take, for example, the following text.

On the actual BBC site it looks like this (click on the Burmese text to see a list of the characters used):

အိန္ဒိယ မိန်းမငယ် ၂ဦး အမှု ဆေးစစ်ချက် ကွဲလွဲနေ

As far as I can tell, this is conformant use of Unicode codepoints.

Look at the same title on the BBC’s Facebook page, however, and you see:

အိႏၵိယ မိန္းမငယ္ ၂ဦး အမႈ ေဆးစစ္ခ်က္ ကြဲလြဲေန

Depending upon where you are reading this (as long as you have some Burmese font and rendering support), one of the two lines of Burmese text above will contain lots of garbage. For me, it’s the second (non-standard).

This non-standard approach uses visual encoding for combining characters that appear before or on both sides of the base, uses Shan or Rumai Palaung codepoints for subjoining consonants, uses the wrong codepoints for medial consonants, and uses the virama instead of the asat at the end of a word.
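One way to see this kind of difference for yourself is to list the code points in a string. The Python sketch below does that with the standard `unicodedata` module, and adds a deliberately crude check for the last symptom mentioned above: a virama (U+1039) that is not followed by a consonant. The helper names and the consonant range U+1000 to U+1021 are assumptions made for this illustration; the check is a heuristic, not a detector for the non-standard encoding.

```python
import unicodedata

VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA, used for stacking consonants
ASAT = "\u103a"    # MYANMAR SIGN ASAT, the visible vowel killer

def list_codepoints(text):
    """Return (U+XXXX, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

def dangling_viramas(text):
    """Return indexes of viramas not followed by a Myanmar consonant.

    Heuristic: assumes consonants lie in U+1000..U+1021.
    """
    hits = []
    for i, ch in enumerate(text):
        if ch != VIRAMA:
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if not ("\u1000" <= nxt <= "\u1021"):
            hits.append(i)
    return hits

# Two hypothetical spellings of the same word-final syllable.
standard = "\u1004\u101a" + ASAT
non_standard = "\u1004\u101a" + VIRAMA

for pair in list_codepoints(non_standard):
    print(pair)
print(dangling_viramas(standard), dangling_viramas(non_standard))
```

Running the same two functions over the two headlines above should make the differences in code point usage visible even when both render acceptably with the right font.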

I assume that this is because of prevalent use of the non-standard approach on mobile devices (and that the BBC is just following that trend), caused by hacks that arose when people were impatient to get on the Web but script support was lagging in applications.

However, continuing this divergence does nobody any long-term good.

[Find fonts and other resources for the Myanmar script]

by r12a at 22 August 2014 09:23 AM

August 21, 2014

W3C I18n Activity highlights

Predefined Counter Styles Draft Published

The W3C i18n Working Group has published a new Working Draft of Predefined Counter Styles. This document describes numbering systems used by various cultures around the world and can be used as a reference for those wishing to create user-defined counter styles for CSS. The latest draft synchronizes the document with changes to the related document CSS Counter Styles Level 3, for which a second Last Call is about to be announced. If you have comments on the draft, please send them to www-international@w3.org.
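To give a feel for what such a numbering system involves, here is a minimal Python sketch of a positional ("numeric") counter: the counter value is written in a base equal to the number of digit symbols supplied. The function name and the choice of Arabic-Indic digits are assumptions for this illustration; the sketch is not taken from either document and ignores negative values, padding, and the other system types.

```python
def numeric_counter(value, symbols):
    """Write a non-negative integer using the given digit symbols (positional system)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    base = len(symbols)
    if base < 2:
        raise ValueError("a positional system needs at least two symbols")
    if value == 0:
        return symbols[0]
    digits = []
    while value:
        value, remainder = divmod(value, base)
        digits.append(symbols[remainder])
    return "".join(reversed(digits))

# Arabic-Indic digits, U+0660 (zero) through U+0669 (nine).
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))

print(numeric_counter(2014, ARABIC_INDIC))
print(numeric_counter(5, "01"))
```

Additive and alphabetic systems need different algorithms, which is part of why a reference list of predefined styles is useful.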

by Richard Ishida at 21 August 2014 04:49 PM

August 13, 2014

W3C I18n Activity highlights

Industry speakers lined up to discuss use cases and requirements for linked data and content analytics

The agenda of the 4th LIDER roadmapping workshop and LD4LT event has been published. A great variety of industry stakeholders will talk about linked data and content analytics. Industry areas represented include content analytics technology, multilingual conversational applications, localisation and more.

The workshop will take place on September 2nd in Leipzig, Germany and it will be collocated with the SEMANTiCS conference. The workshop will be organised as part of MLODE 2014 and will be preceded by a hackathon on the 1st of September.

The event is supported by the LIDER EU project, the MultilingualWeb community, the NLP2RDF project as well as the DBpedia project.

by Richard Ishida at 13 August 2014 03:23 PM

August 08, 2014

W3C I18n Activity highlights

XLIFF 2.0 becomes OASIS standard

The XML Localization Interchange File Format (XLIFF) version 2.0 has been approved as an OASIS Standard.

XLIFF is the open standard bi-text format: Bi-text keeps source language and target language data in sync during localization.
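As a small illustration of the bi-text idea, the Python sketch below reads source and target pairs from a tiny XLIFF 2.0 document using the standard library. The sample content, the `read_pairs` helper, and the language pair are invented for this example; the element names reflect the XLIFF 2.0 core as I understand it, and inline markup inside segments is ignored.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

# Hypothetical sample document, invented for illustration.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="u1">
      <segment>
        <source>Hello world</source>
        <target>Bonjour le monde</target>
      </segment>
    </unit>
  </file>
</xliff>"""

def read_pairs(xliff_text):
    """Return (source, target) text pairs, one per segment; a missing side is None."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    pairs = []
    for segment in root.iterfind(".//x:segment", ns):
        source = segment.find("x:source", ns)
        target = segment.find("x:target", ns)
        pairs.append((
            source.text if source is not None else None,
            target.text if target is not None else None,
        ))
    return pairs

print(read_pairs(SAMPLE))
```

Because each segment carries both sides, a tool can update the target without losing track of the source it was translated from.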

The publication of XLIFF 2.0 is of high importance for W3C since several of the main ITS 2.0 data categories can be used within XLIFF 2.0 to provide content related information during the localization process. Full ITS 2.0 support is planned for the upcoming XLIFF 2.1 version.

by Richard Ishida at 08 August 2014 10:39 AM

August 06, 2014

Global By Design

Gmail leads the global (as in non-Latin) email race

It’s official. Gmail supports (to a degree) non-Latin email addresses. That is, you can receive an email from someone with a non-Latin email address, as well as send an email to such an address. You cannot (yet) set up a Google email account with a non-Latin address, though this is coming. As well as support across […]

by John Yunker at 06 August 2014 06:50 PM

W3C I18n Activity highlights

Report available for W3C MultilingualWeb workshop in Madrid

A report summarizing the MultilingualWeb workshop in Madrid is now available from the MultilingualWeb site. It contains a summary of each session with links to presentation slides and minutes taken during the workshop in Madrid. The workshop was a huge success, with approximately 110 participants, and with the associated LIDER roadmapping workshop. The Workshop was hosted by Universidad Politécnica de Madrid, sponsored by the EU-funded LIDER project, by Verisign and by Lionbridge.
A new workshop in the MultilingualWeb series is planned for 2015.

by Richard Ishida at 06 August 2014 02:53 PM

July 31, 2014

ishida>>blog » i18n

UniView 7.0.0: Final spec version; links to detailed notes

>> Use UniView

This version updates the app to reflect the changes made during the beta phase of the specification, so that it now matches the finalised Unicode 7.0.0.

The initial in-app help information displayed for new users was significantly updated, and the help tab now links directly to the help page.

A more significant improvement was the addition of links to character descriptions (on the right) where such details exist. This finally reintegrates the information that was previously pulled in from a database. Links are only provided where additional data actually exists. To see an example, go here and click on See character notes at the bottom right.

Rather than pull the data into the page, the link opens a new window containing the appropriate information. This has advantages for comparing data, but it was also the best solution I could find without using PHP (which is no longer available on the server I use). It also makes it easier to edit the character notes, so the amount of such detail should grow faster. In fact, some additional pages of notes were added along with this upgrade.

A pop-up window containing resource information used to appear when you used the query to show a block. This no longer happens.

Changes in version 7beta

I forgot to announce this version on my blog, so for good measure, here are the (pretty big) changes it introduced.

This version adds the 2,834 new characters encoded in the Unicode 7.0.0 beta, including characters for 23 new scripts. It also simplified the user interface, and eliminated most of the bugs introduced in the quick port to JavaScript that was the previous version.

Some features that were available in version 6.1.0a are still not available, but they are minor.

Significant changes to the UI include the removal of the ‘popout’ box, and the merging of the search input box with that of the other features listed under Find.

In addition, the buttons that used to appear when you select a Unicode block have changed. Now the block name appears near the top right of the page with an "I" icon. Clicking on the icon takes you to a page listing resources for that block, rather than listing the resources in the lower right part of UniView’s interface.

UniView no longer uses a database to display additional notes about characters. Instead, the information is being added to HTML files.

by r12a at 31 July 2014 07:24 PM

Global By Design

Adobe and Google release open source CJK font family

This is the result of a massive investment of resources and expertise — and I’m excited they’ve made it open source. From Adobe: Source Han Sans, available in seven weights, is a typeface family which provides full support for Japanese, Korean, Traditional Chinese, and Simplified Chinese, all in one font. It also includes Latin, Greek, and Cyrillic […]

by John Yunker at 31 July 2014 06:30 PM

July 22, 2014

Global By Design

Aussies love .AU

Keeping in mind that this is a survey funded by Australia’s registry, the data points pretty clearly toward a preference for .au over .com. From the announcement: The report found .au remains Australia’s home on the Internet with more than double the level of trust over any other namespace. George Pongas, General Manager of Naming Services at […]

by John Yunker at 22 July 2014 10:33 PM

July 18, 2014

Global By Design

Q&A with Kathleen Bostick of SDL

I was happy to chat (virtually) with Kathleen recently about web globalization. Here’s the interview.

by John Yunker at 18 July 2014 10:49 PM

July 17, 2014

Wikimedia Foundation


The projects in the Wikimedia universe can be accessed and used in a large number of languages from around the world. The Wikimedia websites, their MediaWiki software (both core and extensions) and their growing content benefit from standards-driven internationalization and localization engineering that makes the sites easy to use in every language across diverse platforms, both desktop and mobile.

However, a wide disparity exists in the numbers of articles across language wikis. The article count across Wikipedias in different languages is an often cited example. As the Wikimedia Foundation focuses on the larger mission of enabling editor engagement around the globe, the Wikimedia Language Engineering team has been working on a content translation tool that can greatly facilitate the process of article creation by new editors.

About the Tool

The Content Translation editor displaying a translation of the article for Aeroplane from Spanish to Catalan.

Particularly aimed at users fluent in two or more languages, the Content Translation tool has been in development since the beginning of 2014. It will provide a combination of editing and translation tools that can be used by multilingual users to bootstrap articles in a new language by translating an existing article from another language. The Content Translation tool has been designed to address basic templates, references and links found in Wikipedia articles.

Development of this tool has involved significant research and evaluation by the engineering team to handle elements like sentence segmentation, machine translation, rich-text editing, user interface design and scalable backend architecture. The first milestone for the tool’s rollout this month includes a comprehensive editor, limited capabilities in areas of machine translation, link and reference adaptation and dictionary support.

Why Spanish and Catalan as the first language pair?

Presently deployed at http://es.wikipedia.beta.wmflabs.org/wiki/Especial:ContentTranslation, the tool is open for wider testing and user feedback. Users will have to create an account on this wiki and log in to use the tool. For the current release, machine translation can only be used to translate articles between Spanish and Catalan. This language pair was chosen for the linguistic similarity of the two languages as well as the availability of well-supported language aids like dictionaries and machine translation. Driven by a passionate community of contributors, the Catalan Wikipedia is an ideal medium-sized project for testing and feedback. We also hope to enhance the aided translation capabilities of the tool by generating parallel corpora of text from within the tool.

To view Content Translation in action, please follow the link to this instance and make the following selections:

  • article name – the article you would like to translate
  • source language – the language in which the article you wish to translate exists (restricted to Spanish at this moment)
  • target language – the language in which you would like to translate the article (restricted to Catalan at this moment)

This will lead you to the editing interface where you can provide a title for the page, translate the different sections of the article and then publish the page in your user namespace in the same wiki. This newly created page will have to be copied over to the Wikipedia in the target language that you had earlier selected.

Users in languages other than Spanish and Catalan can also view the functionality of the tool by making a few tweaks.

We care about your feedback

Please provide us your feedback on this page on the Catalan Wikipedia or at this topic on the project’s talk page. We will attempt to respond as soon as possible based on criticality of issues surfaced.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

by wikimediablog at 17 July 2014 12:29 AM

July 16, 2014

W3C I18n Activity highlights

Character Model for the World Wide Web: String Matching and Searching Draft Published

This document builds upon the Character Model for the World Wide Web 1.0: Fundamentals to provide authors of specifications, software developers, and content developers a common reference on string matching on the World Wide Web and thereby increase interoperability. String matching is the process by which a specification or implementation defines whether two string values are the same or different from one another.
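To see why "the same or different" needs defining at all, consider the Python sketch below, which compares strings that look identical but differ in code points. The `same_text` helper, and its use of NFC normalization and case folding, are assumptions chosen for this illustration; they are not the draft's recommendation, which should be consulted for what specifications are actually expected to do.

```python
import unicodedata

def same_text(a, b, ignore_case=False):
    """Compare two strings after NFC normalization, optionally case-folded.

    Illustrative policy only: other matching rules are possible.
    """
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"   # e with acute as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent

print(precomposed == decomposed)            # raw code point comparison
print(same_text(precomposed, decomposed))   # comparison after normalization
print(same_text("Stra\u00dfe", "STRASSE", ignore_case=True))
```

The raw comparison reports a difference that most readers would never see on screen, while the normalized comparison treats the two spellings as equal.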

The main target audience of this specification is W3C specification developers. This specification and parts of it can be referenced from other W3C specifications and it defines conformance criteria for W3C specifications, as well as other specifications.

This version of this document represents a significant change from its previous edition. Much of the content is changed and the recommendations are significantly altered. This fact is reflected in a change to the name of the document from “Character Model: Normalization” to “Character Model for the World Wide Web: String Matching and Searching”.

by Richard Ishida at 16 July 2014 05:07 PM

July 14, 2014

W3C I18n Activity highlights

Linked Data meets Content Analytics: 4th LIDER & LD4LT event, 2nd September, Leipzig

The 4th LIDER roadmapping workshop and LD4LT event will take place on September 2nd in Leipzig, Germany. It will be collocated with the SEMANTiCS conference.

The goal of the workshop is to gather input from experts and stakeholders in the area of content analytics, to identify areas and tasks in content analytics where linked data & semantic technologies can contribute. The workshop will be organised as part of MLODE 2014 and will be preceded by a hackathon on the 1st of September.

The event is supported by the LIDER EU project, the MultilingualWeb community, the NLP2RDF project as well as the DBpedia Project.

by Richard Ishida at 14 July 2014 10:43 AM

June 30, 2014

Wikimedia Foundation


Translatewiki.net’s logo.

Most Swedes have a basic understanding of English, but many of them are far from being fluent. Hence, it is important that different computer programs are localized so that they can also work in Swedish and other languages. This helps people avoid mistakes and lets users work faster and more efficiently. But how is this done?

First and foremost, the different messages in the software need to be translated separately. To get the translation just right and to make sure that the language is consistent requires a lot of thought. In open source software, this work is often done by volunteers who double check each other’s work. This allows for the program to be translated into hundreds of different languages, including minority languages that commercial operators usually do not focus on. As an example, the MediaWiki software that is used in all Wikimedia projects (such as Wikipedia), is translated in this way. As MediaWiki is developed at a rapid pace, with a large number of new messages each month, it is important for us that we have a large and active community of translators. This way we make sure that everything works in all languages as fast as possible. But what could the Wikimedia movement do to help build this translator community?

We are happy to announce that Wikimedia Sverige is about to start a new project with support from Internetfonden (.Se) (the Internet Fund). The Internet Fund supports projects that improve the Internet’s infrastructure. The idea of translating open software to help build the translator community is in line with their goals. We gave the project a zingy name: “Expanding the translatewiki.net – ‘Improved Swedish localization of open source, for easier online participation’.” This is the first time that Wikimedia Sverige has had a project that focuses on this important element of the user experience. Here we will learn many new things that we will try to share with the wider community while aiming to improve the basic infrastructure on translatewiki.net. The translation platform translatewiki.net currently has 27 programs ready to be translated into 213 languages by more than 6,400 volunteers from around the world.

We will carry out the project in cooperation with Umeå University and Meta Solutions Ltd, with support from the developers of translatewiki.net (who are employed by the Wikimedia Foundation). We will be working on several exciting things and together we will:

  • Build a larger and more active community of Swedish-speaking translators on translatewiki.net;
  • Design a system for Open Badges and explore how it can be integrated with MediaWiki software. (Do let us know if you are working on something similar so that we can help each other!);
  • Complete translations into Swedish for at least five of the remaining programs that are on translatewiki.net;
  • Improve usability by inventorying and clarifying the documentation, something that will be done in cooperation with and will benefit the entire community on translatewiki.net;
  • Conduct research on parts of the project, led by Umeå University, so that we get a deeper understanding of the processes (exactly what the research will focus on is yet to be determined); and
  • Add Meta Solutions’ program EntryScape for translation on translatewiki.net, and document the steps and how it went. This case study will hopefully identify bottlenecks and make it easier for others to add their programs. MetaSolutions will also develop the necessary code to make it possible for similar programs to be added to translatewiki.net.

We will also organize several translation sprints where we can jointly translate as many messages as possible (you can also participate remotely). Last year we organized a translation sprint and discovered real value in sitting together. It made the work more enjoyable and it made it easier to arrive at the appropriate translations for the trickier messages. If you would like to be involved in the Swedish translations, please get in contact with us!

Kind regards,

John Andersson
Project Manager

Wikimedia Sverige



by wikimediablog at 30 June 2014 07:07 PM

June 26, 2014

Global By Design

Gmail to be first major platform to support non-Latin email addresses

At the ICANN 50 conference Jordyn Buchanan of Google confirmed that Gmail would support EAI (email address internationalization) by the end of this month. This is significant news. But what does it mean exactly? I don’t have the details yet, but at a minimum I assume it means a Gmail user could create an email address using a […]

by John Yunker at 26 June 2014 05:38 PM

June 24, 2014

Global By Design

Google reenters the domain name business

It is being reported that Google is venturing into new territory by getting into the domain registration business. This isn’t completely accurate. Google dabbled in domain registration years ago. And a few months ago Google began accepting registrations for its Japanese TLD. But perhaps Google is serious this time about domains. I suspect it is, […]

by John Yunker at 24 June 2014 02:27 AM

June 20, 2014

Wikimedia Foundation


Screenshot mock-up of Akruti Sarala – Unicode Odia converter

It’s been over a decade since the Unicode standard was made available for the Odia script. Odia is a language spoken by roughly 33 million people in Eastern India, and is one of the many official languages of India. Since then, it has been challenging to get more content into Unicode, because many people who are used to other, non-Unicode standards are not willing to make the move. This created the need for a simple converter that could convert text already typed in various non-Unicode fonts to Unicode. Such a tool could enrich Wikipedia and other Wikimedia projects by converting previously typed content and making it more widely available on the internet. The Odia language recently got such a converter, making it possible to convert text in two of the most popular fonts among media professionals (AkrutiOriSarala99 and AkrutiOriSarala) into Unicode.

All of the non-Latin scripts came under one umbrella after the rollout of Unicode. Since then, many Unicode compliant fonts have been designed and the open source community has put forth effort to produce good quality fonts. Though contribution to Unicode compliant portals like Wikipedia increased, the publication and printing industries in India were still stuck with the pre-existing ASCII and ISCII standards (ISCII being an Indian encoding standard based on ASCII). Modified ASCII fonts that were used as typesets for newspapers, books, magazines and other printed documents still exist in these industries. This created a massive amount of content that is not searchable or reproducible because it is not Unicode compliant.

The difference with a Unicode font is that it has separate glyphs for the Indic script characters alongside the Latin glyphs, whereas a modified ASCII font replaces the Latin glyphs with Indic characters. So, when someone does not have a particular ASCII standard font installed, the typed text looks absurd (see Mojibake). Text typed using one Unicode font, however, can be read using another Unicode font on a different operating system. Most of the ASCII fonts that are used for typing Indic languages are proprietary, and many individuals and organizations even use pirated software and fonts. Having massive amounts of content available in multiple standards and little content in Unicode created a large gap for many languages, including Odia. Until all of this content gets converted to Unicode to make it searchable, sharable and reusable, the knowledge base created will remain inaccessible. Some of the Indic languages fortunately have more and more contributors creating Unicode content. There is a need to work on technological development to convert non-Unicode content to Unicode and open it up for people to use.
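To make the idea concrete, here is a minimal sketch of how a converter of this kind can work. It is not the code of the Akruti Sarala converter, and the mapping table is invented for illustration: the Latin keys below are hypothetical and are not the real AkrutiOriSarala layout. The sketch assumes two things that such converters commonly have to handle: a lookup table from legacy font codes to Unicode characters (with longer codes, such as conjuncts, matched first), and vowel signs that are typed before the consonant in a legacy font but stored after it in Unicode.

```python
# Hypothetical legacy-font table. These Latin keys are made up for this
# example and are NOT the real AkrutiOriSarala mapping.
LEGACY_TO_UNICODE = {
    "K": "\u0b15",               # ODIA LETTER KA
    "M": "\u0b2e",               # ODIA LETTER MA
    "A": "\u0b3e",               # ODIA VOWEL SIGN AA
    "e": "\u0b47",               # ODIA VOWEL SIGN E (drawn left of the consonant)
    "Kx": "\u0b15\u0b4d\u0b37",  # conjunct: KA + VIRAMA + SSA
}

# Vowel signs that a legacy font types before the consonant,
# but that Unicode stores after it.
PRE_BASE_SIGNS = {"\u0b47"}


def convert(text):
    """Convert legacy-font text to Unicode using the table above."""
    # Try longer keys first so that "Kx" wins over "K".
    keys = sorted(LEGACY_TO_UNICODE, key=len, reverse=True)
    out = []
    pending = None  # pre-base vowel sign waiting for the next unit
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                piece = LEGACY_TO_UNICODE[key]
                i += len(key)
                break
        else:
            # Unmapped character: pass it through unchanged.
            piece = text[i]
            i += 1
        if piece in PRE_BASE_SIGNS:
            if pending is not None:
                out.append(pending)
            pending = piece
            continue
        out.append(piece)
        if pending is not None:
            # Move the vowel sign behind the unit it belongs to.
            out.append(pending)
            pending = None
    if pending is not None:
        out.append(pending)
    return "".join(out)


if __name__ == "__main__":
    print(convert("eK"))
```

A real converter needs a much larger table and more careful reordering rules (this sketch attaches a waiting vowel sign to whatever unit comes next, even a space), which is why reports of broken conjuncts from testers are so valuable.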

Akruti Sarala – Unicode Odia converter user manual

There are a few different kinds of fonts used by media and publication houses; the most popular one is Akruti. Two other popular standards are LeapOffice and Shreelipi. Akruti software comes bundled with a variety of typefaces and an encoding engine that works well in Adobe Acrobat Creator, the most popular DTP software package. Industry professionals are comfortable using it for its reputation and seamless printing. The problem of migrating content from other standards to Unicode arose when the Odia Wikimedia community started reaching out to these industry professionals. Authors, government employees and other professionals turned out to be more comfortable using one of the standards mentioned above. All of these people type using either a generic popular standard, Modular, or a universal standard, Inscript. Fortunately, the former is now incorporated into MediaWiki’s Universal Language Selector (ULS) and the latter is in the process of being added to ULS. Once this is done, many people could start contributing to Wikipedia easily.

Content that has been typed in various modified ASCII fonts includes encyclopedias that could help grow content on Wikisource and Wikiquote. All of it needs to be converted to Unicode. The non-profit group Srujanika first initiated a project to build a converter for two different Akruti fonts: AkrutiOriSarala99 and OR-TT Sarala. The former is outdated and the latter is less popular. The Rebati 1 converter, which was built by the Srujanika team, was not being maintained and was more of an orphan project. Fellow Wikimedian Manoj Sahukar and I used parts of the “Rebati 1 converter” code and worked on building another converter. The new “Akruti Sarala – Unicode Odia converter” can convert the more popular AkrutiOriSarala font and its predecessor AkrutiOriSarala99, which is still used by some. Odia Wikimedian Mrutyunjaya Kar and journalist Subhransu Panda have helped by reporting broken conjuncts, which helps in fixing all problems before publishing. Odia authors and journalists have already started using the font and many of them have regular posts in Odia. We are waiting for more authors to contribute to Wikipedia by converting their work and wikifying it.

Recently, a beta version of another Unicode font converter, for Shreelipi fonts, was released. It is based on initial code by Odia Wikipedian Shitikantha Dash and works with at least 85% accuracy.

Even after getting the classical status, Odia language is not being used actively on the internet like some other Indian languages. The main reason behind this is our writing system has not been web-friendly. Most of those in Odisha having typing skills, use modular keyboard and Akruti fonts. Akruti is not web-compatible as we know. There are thousands of articles, literary works, news stories typed in Akruti fonts lying unused (on the internet). Thanks to Subhashish Panigrahi and his associates, they have developed this new font converter that can convert your Akruti text into Unicode. I have checked it. It’s error-free. Now it’s easy for us to write articles online (for Wikipedia and other sites).

Yes, we are late entrants as far as use of vernacular languages on the internet is concerned. But this converter will help us to go godspeed. Let’s make Odia our language of communication and expression.

Subhransu Panda, Journalist, author and publisher

Subhashish Panigrahi, Odia Wikipedian and Programme Officer, Centre for Internet and Society

by wikimediablog at 20 June 2014 07:06 PM

Contact: Richard Ishida (ishida@w3.org).