W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Contributors

If you own a blog with a focus on internationalization, and want to be added or removed from this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

All times are UTC.

Powered by: Planet

Planet Web I18n

The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinions of the respective authors and does not reflect the position of the Internationalization Activity.

April 14, 2014

Global By Design

Xiaomi’s global expansion plans include Malaysia, Indonesia, Thailand

Xiaomi

In January, I wrote about Xiaomi’s success in crowdsourced translation of its operating system — and how this bodes well for global expansion.

Xiaomi, for those who’ve never heard of it, is a fast-growing mobile phone company in China — and a company with global aspirations.

Xiaomi VP (and ex-Google Android boss) Hugo Barra recently shed light on the markets that Xiaomi will be targeting initially. Here they are:

  • Malaysia
  • Indonesia and the Philippines
  • Thailand
  • India
  • Brazil, Mexico, and potentially other Latin American countries

Malaysia is next in line as the company expands across Asia. But also note Latin America in that list — which in my view will also include the US.

Naturally, I gave the Xiaomi website a quick once-over.

Shown below is the header —  note the globe icon used for the global gateway.

Xiaomi global gateway

Only Chinese and English are  supported currently, though I expect this to change in the year ahead…

by John Yunker at 14 April 2014 05:10 PM

April 11, 2014

W3C I18n Activity highlights

Program published for W3C MultilingualWeb Workshop in Madrid, 7-8 May

See the program. The keynote speaker will be Alolita Sharma, Director of Language Engineering from the Wikimedia Foundation. She is followed by a strong line up in sessions entitled Developers, Creators, Localizers, Machines, and Users, including speakers from Microsoft, Wikimedia Foundation, the UN FAO, W3C, Yandex, SDL, Lionbridge, Asia Pacific TLD, Verisign, DFKI, and many more. On the afternoon of the second day we will hold Open Space breakout discussions. Abstracts and details about an additional poster session will be provided shortly.

The program will also feature an LD4LT event on May 8-9, focusing on text analytics and the usefulness of Wikipedia and DBpedia for multilingual text and content analytics, and on language resources and aspects of converting selected types of language resources into RDF.

Participation in both events is free. See the Call for Participation for details about how to register for the MultilingualWeb workshop. The LD4LT event requires a separate registration and you have the opportunity to submit position statements about language resources and RDF.

If you haven’t registered yet, note that space is limited, so please be sure to register soon to ensure that you get a place.

The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops are successful because they attract a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, etc., to create a holistic view of the interoperability needs of the multilingual Web.

We look forward to seeing you in Madrid!

by Richard Ishida at 11 April 2014 09:42 AM

April 10, 2014

Wikimedia Foundation

MediaWiki localization file format changed from PHP to JSON

Translations of MediaWiki’s user interface are now stored in a new file format—JSON. This change won’t have a direct effect on readers and editors of Wikimedia projects, but it makes MediaWiki more robust and open to change and reuse.

MediaWiki is one of the most internationalized open source projects. MediaWiki localization includes translating over 3,000 messages (interface strings) for MediaWiki core and an additional 20,000 messages for MediaWiki extensions and related mobile applications.

User interface messages originally in English and their translations have historically been stored in PHP files along with MediaWiki code. New messages and documentation were added in English, and these messages were translated on translatewiki.net into over 300 languages. These translations were then pulled by MediaWiki websites using LocalisationUpdate, an extension that MediaWiki sites use to receive translation updates.

So why change the file format?

The motivation to change the file format was driven by the need to provide more security, reduce localization file sizes and support interoperability.

Security: PHP files are executable code, so the risk of malicious code being injected is significant. In contrast, JSON files contain only data, which minimizes this risk.

Reducing file size: Some of the larger extensions have had multi-megabyte data files. Editing those files was becoming a management nightmare for developers, so the data was split into one file per language instead of storing all languages in a single large file.

Interoperability: The new format increases interoperability by allowing features like VisualEditor and Universal Language Selector to be decoupled from MediaWiki, because the JSON format can be used without MediaWiki. This was demonstrated earlier for the jquery.i18n library. This library, developed by Wikimedia’s Language Engineering team in 2012, provides internationalization features very similar to what MediaWiki offers, but it is written entirely in JavaScript and stores messages and message translations in JSON format. With LocalisationUpdate’s modernization, MediaWiki localization files are now compatible with those used by jquery.i18n.
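
As a rough illustration, a per-language message file in the new format is a flat JSON object of message keys and translated strings, plus a small metadata block (the keys below are invented for this example; the real conventions are documented on mediawiki.org):

    {
        "@metadata": {
            "authors": [ "Example Translator" ]
        },
        "myextension-desc": "Adds an example feature to the wiki",
        "myextension-greeting": "Hello, $1!"
    }

Parameters such as $1 are substituted when the message is displayed, and each language gets its own file (en.json, fi.json, and so on).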

An RFC on this topic was compiled and accepted by the developer community. In late 2013, developers from the Language Engineering and VisualEditor teams at Wikimedia collaborated to figure out how MediaWiki could best process messages from JSON files. They wrote a script for converting PHP to JSON, made sure that MediaWiki’s localization cache worked with JSON, and updated the LocalisationUpdate extension for JSON support.

Siebrand Mazeland converted all the extensions to the new format. This project was completed in early April 2014, when MediaWiki core switched over to processing JSON, creating the largest MediaWiki patch ever in terms of lines of code. The localization formats are documented on mediawiki.org, and MediaWiki’s general localization guidelines have been updated as well.

As a side effect, code analyzers like Ohloh no longer report skewed numbers for lines of PHP code, making metrics like comment ratio comparable with other projects.

Work is in progress on migrating other localized strings, such as namespace names and MediaWiki magic words. These will be addressed in a future RFC.

This migration project exemplifies collaboration at its best among the many MediaWiki engineers who contributed to it. I would like to specially mention Adam Wight, Antoine Musso, David Chan, Ed Sanders, Federico Leva, James Forrester, Jon Robson, Kartik Mistry, Niklas Laxström, Raimond Spekking, Roan Kattouw, Rob Moen, Sam Reed, Santhosh Thottingal, Siebrand Mazeland and Timo Tijhof.

Amir Aharoni, Interim PO and Software Engineer, Wikimedia Language Engineering Team

by Amir E. Aharoni at 10 April 2014 11:55 AM

April 07, 2014

Global By Design

Wikipedia and the Internet language chasm

When talking about language diversity across the Internet, I like to include a visual that illustrates the language leaders of the Internet:

Language leaders of the Internet 2014

This chart is based on data from the 2014 Web Globalization Report Card. English (US) is not counted.

In it, you have Wikipedia at the top, supporting more than 280 languages. Wikipedia represents (for now) the high-water mark for linguistic diversity on a website. It’s a fascinating benchmark because people are not paid to create content; what you see reflects user initiative (as well as factors such as Internet and computer penetration).

I was interested to see this quote in Motherboard:

There are 533 proposals for Wikipedia languages in incubator stage, more than twice the number of actual Wikipedias, but Kornai estimates no more than a third of them will ever get the required minimum of at least five active users and get enough pages to make it onto Wikipedia proper.

So it’s feasible we could see the number of languages on Wikipedia double in the years ahead — though the article stresses that languages are in fact dying as a result of the Internet (a topic for a future blog post).

To the left of Wikipedia we have Google Search with support for more than 140 languages. However, this number reflects only the Google Search interface; most Google services (such as YouTube and Gmail) support fewer than 60 languages.

Next you have global companies such as Toyota and DHL and Panasonic, which support roughly 41-42 languages on their websites.

For most companies, 40 languages is a goal they cannot even imagine reaching. The average number of languages supported by the websites in the Report Card is 28 — which reflects only the leading global companies and brands.

Average number of languages supported by leading global websites

Most companies are happy if they support five or more languages on their websites.

So what does this data mean? To me, it means that there is a profound gap between the possible number of languages a website can support (Wikipedia) and the practical number of languages that most websites currently support. By practical, I’m referring to the limited budgets that companies commit to professional translation.

Now, to the far right of the chart is Google Translate — with support for roughly 80 languages. Here is where things get interesting, because machine translation (warts and all) supports a vastly greater number of languages than the Fortune 500 (or Fortune 50, for that matter).

google_translate_2014

That’s not to say that companies shouldn’t continue to invest in professional translation — indeed they should.

But machine translation has a  disruptive role to play in helping to overcome the language chasm. 

by John Yunker at 07 April 2014 03:13 PM

April 04, 2014

W3C I18n Activity highlights

Registration for Workshop on Linked Data, Language Technologies and Multilingual Content Analytics now open!

Register now for the recently announced workshop on Linked Data, Language Technologies and Multilingual Content Analytics (8-9 May, Madrid). A preliminary agenda has been created and the registration form is available.

If you are interested in contributing a position statement, please indicate this in the dedicated field in the registration form. The workshop organizers will come back to you with questions to answer in the position statement. We will then select which statements are appropriate for presentations on 9 May, and inform you by 28 April.

We are looking forward to seeing you in Madrid, both for this event and the MultilingualWeb workshop!

by Richard Ishida at 04 April 2014 08:42 AM

April 03, 2014

Wikimedia Foundation

Modernising MediaWiki’s Localisation Update

Interface messages on MediaWiki and its many extensions are translated into more than 350 languages on translatewiki.net. Thousands of translations are created or updated each day. Usually, users of a wiki would have to wait until a new version of MediaWiki or of an extension is released to see these updated translations. However, webmasters can use the LocalisationUpdate extension to fetch and apply these translations daily without having to update the source code.

LocalisationUpdate provides a command line script to fetch updated translations. It can be run manually, but usually it is configured to run automatically using cron jobs. The sequence of events that the script follows is:

  1. Gather a list of all localisation files that are in use on the wiki.
  2. Fetch the latest localisation files from either:
    • an online source code repository, using https, or
    • clones of the repositories in the local file system.
  3. Check whether English strings have changed to skip incompatible updates.
  4. Compare all translations in all languages to find updated and new translations.
  5. Store the translations in separate localisation files.

MediaWiki’s localisation cache will automatically find the new translations via a hook subscribed by the LocalisationUpdate extension.

Until very recently the localisation files were stored in PHP format; they have now been converted to JSON. This change required updates to LocalisationUpdate so that it can handle JSON files. Extending the code piecemeal over the years had made the code base tough to maintain, so the code has been rewritten with extensibility in mind, to support future development as well as to retain adequate support for older MediaWiki versions that use this extension.

The rewrite did not add any new features except support for JSON format. The code for the existing functionality was refactored using modern development patterns such as separation of concerns and dependency injection. Unit tests were added as well.

The configuration format for the update scripts changed, but most webmasters won’t need to change anything, and will be able to use the default settings. Changes will be needed only on sites that for some reason don’t use the default repositories.

New features are being planned for future versions that would optimise LocalisationUpdate to run faster and without any manual configuration. Currently, the client downloads the latest translations for all extensions in all languages and then compares which translations can be updated. By moving some of the complex processing to a separate web service, the client can save bandwidth by downloading only updated messages for specific updated languages used by the reader.

There are still more things to improve in LocalisationUpdate. If you are a developer or a webmaster of a MediaWiki site, please join us in shaping the future of this tool.

Niklas Laxström and Runa Bhattacharjee, Language Engineering, Wikimedia Foundation

by Runa Bhattacharjee at 03 April 2014 09:41 AM

April 01, 2014

W3C I18n Activity highlights

Updated article: Declaring character encodings in CSS

This update brings the article in line with recent developments in CSS, and reorganizes the material so that readers can find information more quickly. This led to the article being almost completely rewritten.

The article addresses the question: How do I declare the character encoding of a CSS style sheet?
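
For readers who just want the short answer, the declaration the article describes is a single @charset rule at the very start of the style sheet (a minimal sketch; the article itself explains when the rule is needed and how HTTP headers interact with it):

    @charset "utf-8";
    /* The rule above must be the very first thing in the file, with nothing before it. */
    body { font-family: sans-serif; }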

German, Greek, Spanish, Hebrew, Hungarian, Brazilian Portuguese, Russian, Swedish, Ukrainian and Vietnamese translators are asked to update their translation of this article within the next month, otherwise the translations will be removed per the translation policy, since the changes are substantive.

by Richard Ishida at 01 April 2014 04:46 PM

Updated article: Choosing & applying a character encoding

This update brings the article in line with recent developments in HTML5, and reorganizes the material so that readers can find information more quickly. This led to the article being almost completely rewritten.

The article addresses the question: Which character encoding should I use for my content, and how do I apply it to my content?
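
As a quick example, the in-document declaration discussed for HTML5 is a meta element near the top of the head, within the first 1024 bytes of the page (a minimal sketch; the article also covers the HTTP header and legacy approaches):

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <title>An example page</title>
      </head>
      <body>
        <p>…</p>
      </body>
    </html>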

German, Spanish, Brazilian Portuguese, Russian, Swedish and Ukrainian translators are asked to update their translation of this article within the next month, otherwise the translations will be removed per the translation policy, since the changes are substantive.

by Richard Ishida at 01 April 2014 04:44 PM

Global By Design

5 Questions to Ask Any Potential Translation Vendor

My latest post for client Pitney Bowes on selecting a translation agency.

An excerpt:

Can I speak with your references?
Always, always check references. Ideally, you should speak with three to five references and at least one reference from within your industry. A reference can play an important role in not only helping you make a vendor selection, but also helping you avoid mistakes along the way. Often, references are more than happy to share lessons learned and best practices. If you’re new to translation, these references can be invaluable.

Your translation vendor is your partner in speaking to the world — as important as any employee. So take the extra time to check references!

Link

by John Yunker at 01 April 2014 12:10 AM

March 26, 2014

W3C I18n Activity highlights

Extend your stay for the upcoming MultilingualWeb workshop! Join the LIDER Workshop 8-9 May, to discuss Wikipedia, Multilingual Analytics and Linked Data for Language Resources

Aligned with the MultilingualWeb workshop (7-8 May, Madrid), the LIDER project is organizing a roadmapping workshop on 8-9 May. The 8 May afternoon session will feature a keynote by Seth Grimes and will also focus on the topic of Wikipedia for multilingual Web content. Via several panels, including contributions from key Wikipedia engineers, we will discuss cross-lingual analytics and intelligent multilingual content handling in Wikipedia. On 9 May, a half-day session will focus on aspects of migrating language resources to linked data.

Mark your calendar now! A dedicated registration form including ways to contribute to the workshop agenda will be made available soon.

by Richard Ishida at 26 March 2014 03:17 PM

Global By Design

GoDaddy’s new global gateway sets the stage for global growth

It’s nice to see GoDaddy improving its global gateway.

Note the use of the globe icon below to indicate the global gateway menu:

GoDaddy gateway

Click on the globe or locale name and you’ll see the following menu:

GoDaddy global gateway

It’s text-only, easy to read. Simple.

GoDaddy has a long way to go when it comes to web globalization, but this global gateway is a good foundation for growth — which I suspect is on the horizon.

by John Yunker at 26 March 2014 02:19 AM

March 20, 2014

W3C I18n Activity highlights

CLDR Version 25 Released

Unicode CLDR 25 has been released, providing an update to the key building blocks for software supporting the world’s languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks such as formatting dates, times, and numbers, and sorting text.

Unicode CLDR 25 focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:

  • New rules for plural ranges (1-2 liters) for 72 locales, plurals for 2 locales, and ordinals for 18 locales.
  • Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
  • Two new locales: West Frisian (fy) and Uyghur (ug).
  • Two new metazones: Mexico_Pacific and Mexico_Northwest.
  • Updated zh pinyin & zhuyin collations and transliterators for Unicode 6.3 kMandarin data.
  • Updated keyboard layout data for OSX, Windows and others.

This version contains data for 238 languages and 259 territories—740 locales in all.

Details are provided in http://cldr.unicode.org/index/downloads/cldr-25, along with a detailed Migration section.

by Richard Ishida at 20 March 2014 10:16 AM

March 10, 2014

Global By Design

The worst global websites of the 2014 Web Globalization Report Card

report card

You can be a wildly successful global company and still have a poorly localized website.

A number of factors determine global success and the website is only one of these factors — unless of course you’re an Internet-based company (you’ll note below that none of these companies are “web only”).

I also want to stress that the websites listed below are the lowest-scoring websites in the 2014 Web Globalization Report Card — and not necessarily the worst global websites, period. The Report Card analyzes a carefully curated group of websites, across more than a dozen industry sectors, with the primary intention of noting emerging and established best practices. Some industry sectors simply do a better job at web globalization than other sectors, such as retail, consumer products, and financial services.

Okay, now that I’ve gotten the caveats out of the way, here are the websites that finished at the bottom of the 2014 Report Card:

  • Axa
  • Hilton
  • Toys R Us
  • Sony
  • Citibank
  • Best Buy
  • Visa
  • Hyundai
  • GameStop
  • Costco
  • Dolby
  • Loréal
  • Enterprise
  • Dollar Rent A Car
  • Jack Daniels
  • Kleenex
  • Gap
  • Heineken
  • Four Seasons
  • MTV
  • Ramada
  • Thrifty
  • Walmart
  • Budweiser

Budweiser finished in last place, preceded by Walmart and so on.

A handful of companies have become regulars on this list over the past few years, companies like Walmart, Heineken, Loréal, Sony and Four Seasons.

But there is no “one” reason why these websites finished at the bottom of the list. To paraphrase Tolstoy: All successful global websites are alike; each unsuccessful global website is unsuccessful in its own way. 

It would be easy to blame limited language support and, yes, many of these websites fall well short of the average of 28 languages set by the leading global websites, particularly the websites of Costco and Walmart. But I should note that 10 of the websites on this list support 20 or more languages and Sony and Hyundai support 40 or more languages.

Lack of global consistency is an issue with many websites. That is, each country web team appears to have gone off on its own and created a website from scratch instead of working across the company to share common design templates and resources.

Global gateways are often wildly erratic — or missing altogether. A number of these websites offer no visual clues on their .com home pages to help users find localized websites.

One important note: Taking retail global is incredibly difficult. Companies like Costco, Best Buy, Gamestop, and Walmart face steep odds when trying to expand into new markets — for them, the website may be the easiest aspect of going global. Walmart, for example, has struggled greatly in countries like Japan and Germany — and not because of web localization. The type of industry you are in weighs heavily in the challenges you face as you go global.

Having said all this, there is some good news. A few of these companies could vastly improve their websites with relatively minor fixes; that is, they have the localized content but they’re just not presenting it in a scalable or user-friendly manner.  I also know a few of these companies are in the process of rolling out new and improved global websites right now and they won’t be on this list much longer.

by John Yunker at 10 March 2014 11:32 PM

March 07, 2014

Wikimedia Foundation

Webfonts: Making Wikimedia projects readable for everyone

Wikimedia wikis are available in nearly 300 languages, and some of them have pages with mixed-script content. An example is the page on the writing systems of India on the English Wikipedia. We expect users to be able to view this page in full and not see meaningless squares, also known as tofu. These tofu squares stand in for letters that the web browser on the reader’s computer cannot render. This may happen for several reasons:

  • The device does not have the font for the particular script;
  • The operating system or the web browser do not support the technology to render the character;
  • The operating system or the browser supports the script only partially. For instance, because characters for several scripts have been added gradually in recent Unicode versions, existing older fonts may not support the new characters.

Fonts for most languages written in the Latin script are widely available on a variety of devices. However, languages written in other scripts often face obstacles when fonts on operating systems are unavailable, outdated, bug-ridden or aesthetically sub-optimal for reading content.

Using Webfonts with MediaWiki

To alleviate these shortcomings, the WebFonts extension was first developed and deployed to some wikis in December 2011. The underlying technology provides the ability to download fonts automatically to the user if they are not present on the reader’s device, similar to how images in web pages are downloaded.
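
Under the hood this relies on the standard CSS @font-face mechanism; a minimal, hypothetical sketch (the font name and URL are invented, and on Wikimedia wikis the equivalent rules are applied by the jquery.webfonts library described below rather than written by hand):

    @font-face {
      font-family: 'ExampleSans';
      src: url('/fonts/ExampleSans.woff') format('woff');
    }
    /* Text tagged as Malayalam falls back to the downloadable font
       when no suitable local font is available. */
    [lang='ml'] {
      font-family: 'ExampleSans', sans-serif;
    }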

The old WebFonts extension was converted to the jquery.webfonts library, which was included in the Universal Language Selector, the extension that replaced the old WebFonts extension. Webfonts are applied using the jquery.webfonts library, and on Wikimedia wikis it is configured to use the fonts in the MediaWiki repository. The two important questions we need answered before this can be done are:

  1. Will the user need webfonts?
  2. If yes, which one(s)?

Webfonts are provided when:

  • Users have chosen to use webfonts in their user preference.
  • The font is explicitly selected in CSS.
  • Users viewing content in a particular language do not have the fonts on their local devices, or the devices do not display the characters correctly, and the language has an associated default font that can be used instead. Before the webfonts are downloaded, a test currently known as “tofu detection” is done to ascertain that the local fonts are indeed not usable. The default fonts are chosen by the user community.

Webfonts are not applied:

  • when users choose not to use webfonts, even if there exists a valid reason to use webfonts (see above);
  • in the text edit area of the page, where the user’s preference or browser settings are honored.

See image (below) for a graphical description of the process.

‘Tofu’ Detection

The font to be applied is chosen either by the name of the font-family or, if the designated font family is not available, according to the language. In the latter case the default font for the language is at the top of the heap. However, negotiating more complex selection options, like font inheritance and fallback, adds to the challenge. For projects like Wikimedia, selecting appropriate fonts for inclusion is also a concern. The many challenges include the absence of well-maintained fonts, the limited number of freely licensed fonts, and the rejection of fonts by users for being sub-optimal.

Challenges to Webfonts

Merely serving the webfont is not the only challenge that this technology faces. The complexities are compounded for languages of South and South-East Asia, as well as Ethiopia, and for other scripts with nascent internationalization support. Font rendering and support for these scripts vary across operating system platforms. The inconsistency can stem from the technology used, such as the rendering engines, which can produce widely different results across browsers and operating systems. Santhosh Thottingal, senior engineer for Wikimedia’s Language Engineering team who has been participating in recent developments to make webfonts more efficient, outlines this in greater detail.

Checkbox in the Universal Language Selector preferences to download webfonts

A major impact is on bandwidth consumption and on page load time, due to the additional overhead of delivering webfonts to millions of users. A recent fallout of this challenge was a change introduced in the Universal Language Selector (ULS) to prevent pages from loading slowly, particularly where bandwidth is at a premium. A checkbox now lets users decide whether they would like webfonts to be downloaded.

Implementing Webfonts

Several clever solutions are currently in use to avoid the known challenges. The webfonts are prepared with the aim of keeping their footprint small. For instance, Google’s sfntly tool, which uses MicroType Express compression, is used for creating the fonts in EOT format (WOFF being the other widely used webfont format). However, the inherent demands of a script with a larger character set cannot always be overridden efficiently. Caches are used to reduce unnecessary webfont downloads.

FOUT, or Flash Of Unstyled Text, is an unavoidable consequence of the browser displaying text in a different style, or no text at all, while waiting for the webfonts to load. Different web browsers handle this differently, and optimizations are in the making. A possible solution in the near future may be the introduction of the in-development WOFF2 webfont format, which is expected to further reduce font size and improve performance and font load events.

Special fonts like the Autonym font are used in places where known text, like a list of language names, is required to be displayed in multiple scripts. The font carries only the characters that are necessary to display the predefined content.

Additional optimizations at this point are directed towards improving the performance of the JavaScript libraries that are used.

Conclusion

Several technical solutions are being explored within Wikimedia Language Engineering and in collaboration with organizations with similar interests. Wikipedia’s sister project Wikisource attempts to digitize and preserve copyright-expired literature, some of which is written in ancient scripts. In these as well as other cases like accessibility support, webfonts technology allows fonts for special needs to be made available for wider use. The clear goal is to have readable text available for all users irrespective of the language, script, device, platform, bandwidth, content and special needs.

For more information on implementing webfonts in MediaWiki, we encourage you to read and contribute to the technical document on mediawiki.org.

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering, Wikimedia Foundation

by Runa Bhattacharjee at 07 March 2014 12:53 PM

March 06, 2014

Global By Design

Google launches its first Japanese IDN

I’ve long talked about the importance of non-Latin domain names, or IDNs (Internationalized Domain Names).

Google has gone live with one of its many IDNs: みんな.

I want to emphasize here that this is a top-level IDN — that is, the equivalent of a .com or .org.

This TLD, according to Google, stands for “everyone.”

So you could in effect register “someword.everyone,” which sounds a bit odd to me but I’m not Japanese.

And, frankly, the Japanese have not been blessed with much in the way of IDN options up to this point.

There is no Japanese-language country code, for instance. And few Japanese-based companies have been aggressive in promoting IDNs.

The new Google IDN website leads with a headline that translates to "Let’s start with .everyone".

japanese IDN Google

Check out the video to get a good idea of how Google is positioning this domain against .com and .jp:

Despite the fancy website and video, I don’t believe Google is fully invested in the success of this domain.

If it were invested, the domain wouldn’t cost roughly $18 to register (by my rough calculations).

But that doesn’t mean Google can’t become invested in it at a later point.

The good news is that Google is moving ahead on commercializing IDNs.

I expect other tech companies to follow. 

by John Yunker at 06 March 2014 11:06 PM

March 05, 2014

Global By Design

Localizing your website for Canada: It’s more challenging than you might think

My latest post for client Pitney Bowes on localizing for Canada.

An excerpt:

As they begin their global expansion, American companies often select Canada first under the assumption that the market will be easier to succeed in than more distant and culturally unfamiliar markets such as Germany or Japan.

However, physical and cultural proximity should not be confused with ease of market entry or ease of website localization. Every country is a new market with unique cultures, laws, and ways of doing business. This article highlights a few key tips to consider as you head to the “great white north.”

Link

by John Yunker at 05 March 2014 04:31 PM

W3C I18n Activity highlights

Speaker deadline for Madrid MultilingualWeb Workshop is Friday, March 14

We would like to remind you that the deadline for speaker proposals for the 7th MultilingualWeb Workshop (May 7–8, 2014, Madrid, Spain) is on Friday, March 14, at 23:59 UTC.

Featuring a keynote by Alolita Sharma (Director of Engineering, Wikipedia) and breakout sessions on linked open data and other critical topics, this Workshop will focus on the advances and challenges faced in making the Web truly multilingual. It provides an outstanding and influential forum for thought leaders to share their ideas and gain critical feedback.

While the organizers have already received many excellent submissions, there is still time to make a proposal, and we encourage interested parties to do so by the deadline. With roughly 200 attendees anticipated for the Workshop from a wide variety of profiles, we are certain to have a large and diverse audience that can provide constructive and useful feedback, with stimulating discussion about all of the presentations.

For more information and to register, please visit the Madrid Workshop Call for Participation.

by Richard Ishida at 05 March 2014 11:31 AM

February 24, 2014

W3C I18n Activity highlights

Alolita Sharma (Wikipedia) to deliver keynote at 7th Multilingual Web Workshop (May 7–8, 2014, Madrid)

We are pleased to announce that Alolita Sharma, Director of Engineering for Internationalization and Localization at Wikipedia, will deliver the keynote at the 7th Multilingual Web Workshop, “New Horizons for the Multilingual Web,” in Madrid, Spain (7–8 May 2014).

With over 30 million articles in 286 languages as of January 1, 2014, Wikipedia has now become one of the largest providers of multilingual content in the world. Because of its user-generated and constantly changing content, many traditional processes for managing multilingual content on the web either do not work or do not scale well for Wikipedia. Alolita Sharma’s keynote will highlight Wikipedia’s diversity in multilingual user-generated content and the language technologies that Wikipedia has had to develop to support its unprecedented growth of content. She will also discuss the many challenges Wikipedia faces in providing language support for the mobile web.

The Multilingual Web Workshop series brings together participants interested in the best practices, new technologies, and standards needed to help content creators, localizers, language tools developers, and others address the new opportunities and challenges of the multilingual Web. It will provide for networking across communities and building connections.

Registration for the Workshop is free, and early registration is recommended since space at the Workshop is limited.

There is still opportunity for individuals to submit proposals to speak at the workshop. Ideal proposals will highlight emerging challenges or novel solutions for reaching out to a global, multilingual audience. The deadline for speaker proposals is March 14, but early submission is strongly encouraged. See the Call for Participation for more details.

This workshop is made possible by the generous support of the LIDER project, which will organize a roadmapping workshop on linked data and content analytics as one of the tracks at Multilingual Web Workshop.

by Richard Ishida at 24 February 2014 05:18 PM

February 20, 2014

Global By Design

WhatsApp: Another “translation worthy” success story

I wrote recently that if you can make your product “translation worthy” the world will follow.

Reading about Facebook’s purchase of WhatsApp I went back and did some language crunching.

WhatsApp Arabic

In December 2012, WhatsApp supported 15 languages. I also noted then that I really liked the company’s global gateway.

WhatsApp Language Growth

Today, WhatsApp supports 35 languages — thanks in large part (or entirely) to a crowd of volunteer translators.

WhatsApp crowdsourcing

This number of languages is now well above the average number of languages tracked in the Web Globalization Report Card.

Clearly, WhatsApp proved itself to be translation worthy.

And now, with Facebook involved, I would not be surprised to see WhatsApp double its language count over the next year or two. Perhaps WhatsApp will actually start paying for translation now…

by John Yunker at 20 February 2014 05:40 PM

February 19, 2014

Global By Design

Five reasons you should take your ecommerce global – and five reasons you shouldn’t

My latest post for client Pitney Bowes on going global (or not).

An excerpt — Two reasons NOT to go global:

1. You don’t have realistic expectations (and budgets).
The most common mistake companies make when going global is expecting too much success too early. Doing so not only sets unrealistic expectations, but it also creates a short-term mentality, along with short-term budget commitments. Companies that succeed in new markets typically start small, set achievable and realistic goals, and set longer-term (3-5 year) budgetary commitments.

2. Your staff isn’t ready to go global.
While I believe that the best way to learn something is to do something, you also need to be as prepared as you can be before getting started. Too often, companies don’t have people who are even aware of the complexities of going global. Regularly reading this blog, for example, is something every employee should undertake to start getting a feel for the opportunities and challenges of going global. You want your colleagues to be inherently curious about the world, about cultures, and, ultimately, about customers who may speak any number of languages.

Link

by John Yunker at 19 February 2014 04:05 PM

February 13, 2014

Global By Design

A look back at the language growth of eBay, Coke, Apple, AmEx and Amazon

Sometimes it’s difficult to see a revolution when you’re standing right in the middle of it.

Which is how I still feel sometimes when it comes to web globalization.

Web globalization feels at times like a slow-moving revolution. Every year, companies add, on average, a language or two to their websites. And while one or two languages may not seem all that “revolutionary” at the time, over a number of years the growth is significant.

Particularly when you take a ten-year perspective.

Shown below are five of the many websites that I’ve tracked since 2004. Note that English for the US is not figured in the counts:

Language growth 2004 to 2014

In 2004, eBay supported just 9 languages; today it supports 25.

American Express went from 24 languages to 40.

Coca-Cola went from 26 languages to 43.

Apple has more than doubled its language count in that time as well, though I believe Apple should be doing much more in this regard; Apple still lags the websites of Samsung, Microsoft, and Google.

What’s important to note is that most companies more than doubled the number of languages they support over this time span. Not just the companies listed here but a good number of the companies in the Report Card.

As for Amazon, it too doubled its support for languages, but  remains well behind the pack in linguistic reach. I’ve long argued that Amazon took its foot off the web globalization pedal prematurely. And now that Apple is selling digital media in more than 50 countries, with Google close behind, I wonder if this is the year we see Amazon start to invest in global expansion again.

The language growth underscores a point I often make regarding web globalization — you need to think about “scale” as early as possible.

That is, will your global template scale? Will your workflows, management structure, vendors, and software scale?

You may be planning to add only one additional language this year, but as this chart demonstrates, you may be adding 20 languages over the long run.

As I’ve said before, the Internet connects computers but it is language that connects people. This is the revolution going on all around us, though often in slow motion. 

by John Yunker at 13 February 2014 06:52 PM

February 10, 2014

Global By Design

The top 25 global websites from the 2014 Web Globalization Report Card

More than ten years ago I set out to create a report that benchmarked global websites.

I looked at languages supported. I studied the localized websites. I interviewed the executives who managed these sites and learned what was working and what wasn’t working.

And the end result of that work became The Web Globalization Report Card.

There was nothing else around like it. Most companies at the time supported fewer than 5 languages, so many executives didn’t even see the need for such a report.

But times have changed. And here I am announcing the leading websites from the 10th edition of the Report Card:

web globalization top 25 websites

Google is no stranger to the top spot. Given the company’s focus on supporting so many languages across so many products, the company didn’t really face much competition this year.

Granted, I still think Google needs to improve its global navigation. I know the company has been working on “harmonizing” its navigation across products, but the “global gateway” remains elusive. And that’s still a work in progress.

But even with this downside, Google remains the leader.

Hotels.com and Facebook more or less held their own over the past year. But there were more interesting developments further down the list.

For example, Starbucks continues to improve its global website, adding languages and modifying its global template. And it remains a leader in local-language social engagement. Its global gateway still needs work though.

NIVEA did much better this year due in large part to its investment in image localization. Check out NIVEA’s many local websites and you’ll see what I mean.

It’s very interesting to see four travel services companies in this list: Hotels.com, Booking.com, TripAdvisor, and Kayak. These companies continue to prove that the travel services sector is among the most competitive when it comes to web and mobile globalization.

It’s also worth highlighting companies like Cisco, Philips, IKEA, and Microsoft — all of which have become regulars in the top 25 list, and for good reason.

Did you know the average number of languages supported by these 25 websites is 50? Even if we were to remove Wikipedia, which is a true language outlier (in a good way), the average would still be above 45 languages.

These companies also generally do a very good job with global gateways, support for country codes — as well as backend technologies like geolocation and language negotiation. In other words, they invest in making local content easy to find for users around the world.

They all do an excellent job of supporting consistent global design templates. This is one of the most important web globalization best practices — one that has clearly stood the test of time.

These companies invest more heavily than most companies in localization — which isn’t just about translation. There is support for local-language social platforms, localized ecommerce, customer support, and culture-specific content and promotions.

Congrats to the top 25 companies and the people within them that have long championed web and mobile globalization!

Learn more about the Report Card.

by John Yunker at 10 February 2014 04:16 PM

W3C I18n Activity highlights

MultilingualWeb-LT Working Group closed, ITS community continues in ITS IG

The MultilingualWeb-LT Working Group has been closed, since it successfully completed the work in its charter.

We thank the co-chairs, the editors, implementers and the Working Group for achieving the goal to publish Internationalization Tag Set (ITS) 2.0 as a W3C Recommendation, and for doing so ahead of the original schedule.

Work on enlarging the community around ITS, gathering feedback and requirements for future work will now continue in the ITS Interest Group.

by Richard Ishida at 10 February 2014 01:29 PM

February 07, 2014

W3C I18n Activity highlights

Updated article: Inline markup and bidirectional text in HTML

Inline markup and bidirectional text in HTML is a major update of the article formerly titled What You Need to Know About the Bidi Algorithm and Inline Markup, and reflects the recent changes in bidi markup in the HTML5 specification.

Technically speaking, the main change is that the dir attribute now isolates text by default with respect to the bidi algorithm. Isolation as a default is the recommendation of the Unicode Standard as of version 6.3.

For the less technical-minded, the main advantage of this change is a much simpler transition for both content authors and browser developers who want to reap the benefits of isolation. At the same time, these approaches have good results for existing legacy content.
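
A short, hypothetical illustration of the kind of case isolation helps with (the phrases are placeholders):

    <!-- Because dir now isolates the span, the "- 5 reviews" that follows
         stays in place instead of being pulled into the right-to-left run. -->
    <p>Listing: <span dir="rtl">اللغة العربية</span> - 5 reviews</p>

    <!-- dir="auto" picks the direction from the content itself, which is useful
         for user-supplied strings, and isolates it in the same way. -->
    <p>User: <span dir="auto">שלום</span></p>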

by Richard Ishida at 07 February 2014 06:40 PM

February 04, 2014

Global By Design

Google Translate turns 80, as in languages

google_translate

From Afrikaans to Zulu, the evolution of Google Translate is one of Google’s greatest success stories that few people fully appreciate. Perhaps because Google is reluctant to release usage data (which I imagine is significant).

So Google now supports 80 languages, having added support for Somali, Zulu, three Nigerian languages (Igbo, Hausa, Yoruba), Mongolian, Nepali, and Punjabi.

I’m not a fan of Google+. I won’t be caught dead wearing Google Glass. But I’ll be the first to sing the praises of Google Translate and Google’s ongoing investment in languages.

Google long ago set the bar for what a “global” website or web app should support in terms of languages. It raised that bar to 40 languages a few years ago and is now raising it again to 60. If Google Translate is any indicator, that bar will be raised again over the next decade.

To give you an idea of just how far Google Translate has come in the past eight years, here is a screen grab I took back in 2006:

google_translate_2006

It’s amusing to see Arabic, Japanese, Korean, and Chinese labeled as BETA languages.

And impressive to see that Google Translate has grown from roughly 10 languages to 80 languages in eight years.

PS: Google Translate is one of the reasons Google does so well in the annual Web Globalization Report Card. I’m nearly complete with the 2014 edition and, yes, Google is looking good again this year.

by John Yunker at 04 February 2014 02:20 PM

February 03, 2014

W3C I18n Activity highlights

Call for position statements: Linked Data for Language Technology #LiderEU

The LD4LT (Linked Data for Language Technology) Workshop will be held on 21 March, in Athens, Greece, aligned with the European Data Forum 2014. See the agenda.

The workshop is a free community event – there is no admission fee for participants, but registration is required.

You are encouraged to provide a title for a position statement in your registration form. This is a simple, short statement that summarizes your ideas / technologies / use cases related to Linked Data and Language Technology.

The meeting is supported by the LIDER project, the MultilingualWeb community, the NLP2RDF project, the Working Group for Open Data in Linguistics as well as the DBpedia Project.

As input to the discussion and the work of the LD4LT group, you may also want to fill in the first LIDER survey.

by Richard Ishida at 03 February 2014 12:50 PM

January 31, 2014

W3C I18n Activity highlights

First public working draft of Encoding published

The Internationalization Working Group has published a First Public Working Draft of Encoding.

While encodings have been defined to some extent, implementations have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification attempts to fill those gaps so that new implementations do not have to reverse engineer encoding implementations of the market leaders and existing implementations can converge.

This is a snapshot of the Encoding Living Standard, as of the date shown on the title page. No changes have been made in the body of the W3C draft other than to align with W3C house styles. The primary reason that W3C is publishing this document is so that HTML5 and other specifications may normatively refer to a stable W3C Recommendation.

by Richard Ishida at 31 January 2014 05:04 PM

January 30, 2014

W3C I18n Activity highlights

Register now for the 7th W3C MultilingualWeb Workshop, Madrid, 7-8 May

Register early to ensure you get a place. Anyone may attend all sessions at no charge and the W3C welcomes participation by both speakers and non-speaking attendees.

Since 2010 the W3C’s Multilingual Web Workshop series has become the preeminent venue for discussion of the standards and technologies that define and enable multilingualism on the Web. The 7th Workshop, “New Horizons for the Multilingual Web,” will be held 7–8 May 2014 in Madrid, Spain.

The workshop brings together participants interested in the best practices, new technologies, and standards needed to help content creators, localizers, language tools developers, and others address the new opportunities and challenges of the multilingual Web. It will provide for networking across communities and building connections.

We are particularly interested in speakers who are facing emerging challenges or who can demonstrate novel solutions for reaching out to a global, multilingual audience. The deadline for speaker proposals is March 14, but early submission is strongly encouraged.

This workshop is made possible by the generous support of the LIDER project, which will organize a roadmapping workshop on linked data and content analytics as one of the tracks at Multilingual Web Workshop.

See the Call for Participation and register online.

by Richard Ishida at 30 January 2014 01:55 PM

January 28, 2014

Global By Design

Going beyond stock photos to succeed locally

My latest post for client Pitney Bowes includes tips for creating “world ready” visuals.

An excerpt:

Don’t Send the Wrong (Hand) Signal

Gestures are culturally specific and, while some gestures have gone global, there are variations on these gestures and hand signals, in addition to locally unique signals, that you’ll need to know. The peace sign may be globally ubiquitous, but if you rotate your hand around, it suddenly becomes an offensive gesture in countries such as the UK and Australia. President George H.W. Bush was widely ridiculed in Australia when he visited in 1992 and gave the peace sign in reverse.

Hand gesture for peace sign

Image courtesy of Danilo Rizzuti / FreeDigitalPhotos.net

The OK sign may be perfectly “OK” in the US, but it can be quite offensive in countries such as Turkey and Brazil. And it can be taken to mean “zero” in France.

More

by John Yunker at 28 January 2014 07:56 PM

January 24, 2014

W3C I18n Activity highlights

Building bridges: Tutorial, Linked Data for Language Technologies at LREC 2014 Conference #LiderEU

Under the umbrella of the LIDER project and the MultilingualWeb community, the tutorial on Linked Data for Language Technologies aims at building bridges between two communities. Experts in language resources and applications will learn how to work with the technical building blocks of linked data (RDF, SPARQL, …); how to build linked data lexicon representations using the LEMON model; and how to integrate natural language processing workflows using the RDF NIF format. The tutorial is part of the LREC 2014 conference. The presenters are key participants in the LIDER project and in W3C community groups like OntoLex, Best Practices for Multilingual Linked Open Data, and Linked Data for Language Technology.

by Richard Ishida at 24 January 2014 02:27 PM


Contact: Richard Ishida (ishida@w3.org).