W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

If you own a blog with a focus on internationalization, and want to be added or removed from this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

All times are UTC.

Planet Web I18n

Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinions of their respective authors and does not reflect the position of the Internationalization Activity.

July 01, 2009

ishida>>blog » i18n

Converter tool updated and moved

>> Try it

Picture of the page in action.

A new version of this very popular tool is now available, in a new location. Although it is currently labeled ‘beta’, I recommend that you use that instead, and change any links and bookmarks to the new location. There are a number of new features.

There is also a vastly improved code base. If you are one of the many people who have contacted me to ask how I coded the conversions, please take a look at the new JavaScript code. It is much cleaner and more compact.

New features include:

* New mixed input field and position of some fields changed.
* New field for conversion of 0x… notation hex escapes.
* Enabled invisible and ambiguous characters to be made visible in the XML output.
* Added support for all HTML entities in HTML/XML input.
* All code rewritten to use characters as the internal representation, rather than code points. Also, code is much smaller and cleaner, partly through use of regular expression matching.
* Various filters available for conversion, such as allowing ASCII or Latin1 characters to remain unconverted in NCR output.
* New icon to quickly select all contents of a field.
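
To make two of the listed conversions concrete, here is a minimal sketch in Python (not the tool's own JavaScript). The function names `to_ncr` and `from_0x` and the `keep_ascii` flag are assumptions made for illustration; they only mimic the idea of an ASCII filter for NCR output and of a field for 0x… hex escapes.

```python
import re


def to_ncr(text, keep_ascii=True):
    """Convert characters to hexadecimal numeric character references.

    keep_ascii is a hypothetical filter flag: when True, ASCII
    characters are left unconverted in the output.
    """
    out = []
    for ch in text:
        if keep_ascii and ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


def from_0x(text):
    """Replace 0x... hex escapes with the characters they stand for."""
    def repl(match):
        code = int(match.group(1), 16)
        # Leave the escape untouched if it is outside the Unicode range.
        return chr(code) if code <= 0x10FFFF else match.group(0)

    return re.sub(r"0x([0-9A-Fa-f]{1,6})", repl, text)


print(to_ncr("café 日本"))       # ASCII is kept, the rest becomes NCRs
print(from_0x("0x65E5 0x672C"))  # the space between the escapes is kept
```

Python strings are sequences of characters rather than UTF-16 code units, so supplementary characters need no special handling in this sketch.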

There is also a new demonstration feature.

If there are no issues raised/remaining in a couple of months, I’ll remove the beta tag.

by r12a at 01 July 2009 09:06 AM

June 17, 2009

W3C I18n Activity highlights

Article for wide review: Using Unicode controls for bidi text

Comments are being sought on this article prior to final release. Please send any comments to www-international@w3.org (subscribe). We expect to publish a final version in one to two weeks. [search keys: qa-bidi-unicode-controls]

by Richard Ishida at 17 June 2009 05:25 PM

Talk slides: New Work on Japanese Layout Requirements

Richard Ishida gave a presentation entitled New Work on Japanese Layout Requirements on 11 June, 2009 at the Fachhochschule Potsdam, Germany. The slides are annotated and in PDF. They build on a previous talk by Richard Ishida, Steve Zilles and Tatsuo Kobayashi at the Unicode Conference, and describe some of the key characteristics of Japanese Layout described in the newly published W3C Note, Requirements for Japanese Text Layout. [search keys: talk-2009 talk-ishida]

by Richard Ishida at 17 June 2009 01:56 PM

Talk slides: Practical Tips for Designing International Web Pages

Richard Ishida gave a presentation entitled Practical Tips for Designing International Web Pages on 9 June, 2009, at Localization World, Berlin, Germany.

The slides are annotated and in PDF. The presentation looked at a selection of practical issues for people who develop web pages for a multilingual audience. Topics included the dangers of composing sentences in content using scripting, strategies for designing layout so that text expansion during translation will not destroy your efforts, strategies for navigating localized content, and the separation of content and presentation. It explored some of the potential difficulties that can be encountered in these areas and recommended some best practices to help you avoid them. [search keys: talk-2009 talk-ishida]

Program and slides

by Richard Ishida at 17 June 2009 01:49 PM

June 16, 2009

Friedel en ander frappanthede » i18n

Pseudolocalisation with podebug (1)

The Translate Toolkit has had a program to help with pseudolocalisation since 2004: podebug. This is the first in a series of articles about podebug and what it can be used for.

Pseudolocalisation is a way to quickly generate or manipulate translation files to use for testing. This way the translatability (or internationalisability) of a program can be tested without having to translate it first or to review it on the level of the source files. It can also help translators in that translations can be annotated and can therefore be found more easily.

One of the things our team frequently uses it for in the development of Virtaal is to check if all strings are marked for translation.

podebug --rewrite=xxx virtaal.pot fr.po

With this command podebug creates a PO file based on the POT file and "translates" it so that it looks as follows:

#: ../share/virtaal/virtaal.glade.h:8
msgid "General"
msgstr "xxxGeneralxxx"

In the program, it looks like this:
Virtaal with 'xxx' pseudo localisation

Now one can see that all the entries are "translated", except for "About". This can indicate that a string is perhaps not marked for translation. In other cases (as is the case here) it indicates that the string is translated elsewhere. This string is part of GTK+, and the translation will be retrieved by GTK+ from another file.

So this is a quick and easy way to check if all strings are marked for translation.
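
The idea behind the check can be sketched in a few lines of Python. This is not podebug's implementation and does not use the Translate Toolkit API; `rewrite_xxx`, `pseudo_translate` and `unmarked_strings` are hypothetical helpers that only imitate the effect of the `xxx` rewrite shown above.

```python
def rewrite_xxx(msgid):
    """Wrap a source string the way the xxx rewrite output looks."""
    return f"xxx{msgid}xxx"


def pseudo_translate(msgids):
    """Map every source string to its pseudo-translation."""
    return {msgid: rewrite_xxx(msgid) for msgid in msgids}


def unmarked_strings(visible_strings):
    """Return UI strings that do not carry the xxx wrapper.

    These are candidates for strings that are not marked for
    translation, or that are translated elsewhere (as with "About").
    """
    return [
        s for s in visible_strings
        if not (s.startswith("xxx") and s.endswith("xxx"))
    ]


catalogue = pseudo_translate(["General"])
print(catalogue["General"])
print(unmarked_strings(["xxxGeneralxxx", "About"]))
```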

by Friedel at 16 June 2009 09:58 AM

June 15, 2009

Global By Design

Bing Beats Google in Insta-translation

Bing recently added a nifty new translation feature — one that is so simple and in many ways so obvious that I can’t help wondering why Google never got around to doing it. But that’s a topic for a later post.

For now, I’d like you to try entering the following text strings into both Bing and Google (to save you time I created pre-loaded hyperlinks):

Below are screen shots of the first text string in both Bing and Google. I’ll let the pictures speak for themselves:

bing-iloveyou

google_i_love_you

Google, despite its massively powerful translation engine, doesn’t simply answer your translation question. Instead, it provides links.

I realize that this is a relatively minor feature and that it currently only supports a small number of very common text strings, but it’s still a very handy feature for a translation geek such as myself.

Now, I’m not saying Bing is perfect. When it comes to technical searches — or when I just need to look up a Wikipedia article quickly — Google still does better, sometimes far better.

But I’m glad to see Bing integrating translation in an intuitive way. It’s a feature that I’ll be using again.

PS: Here is the blog announcement of this feature from Microsoft Translate team.

by John Yunker at 15 June 2009 04:13 AM

June 06, 2009

W3C I18n Activity highlights

New Working Group Note: Requirements for Japanese Text Layout (日本語組版処理の要件)

This document describes requirements for Japanese layout realized with technologies like CSS, SVG and XSL-FO. For non-Japanese speakers it provides access for the first time to a wealth of detailed and authoritative information about Japanese typesetting. The document is mainly based on a standard for Japanese layout, JIS X 4051 and its authors include key contributors to that standard. However, it also addresses areas which are not covered by JIS X 4051.

The document was created by the Japanese Layout Task Force (with participation from four W3C Working Groups: CSS, Internationalization Core, SVG and XSL).

A Japanese version is also available.

by Richard Ishida at 06 June 2009 11:07 AM

June 02, 2009

Global By Design

What does Libya have in common with Twitter? Ask Bit.ly

libya_cctld

Bit.ly, the URL shortener now used by Twitter, is not the first company to craft its name out of a country code top-level domain (ccTLD).

But Bit.ly does appear to be the first company to do so with the Libyan ccTLD.

As some have speculated, Bit.ly could put itself into a precarious position should it begin hosting URLs for the adult industry, or any other industry that violates Libyan laws. It’s always important to keep in mind that a company can’t “own” a domain the way it owns real estate.

But this is all just speculation. The registrar Libyan Spider clearly is hoping to capitalize on all the “ly” permutations of a word or brand name. And the fact of the matter is that more and more countries are viewing their country codes as profit centers.

Which leads me to a brief inventory of the sites that I am aware of that use ccTLDs as part of their names:

I’m rather surprised at the range of countries represented here. Montenegro, by the way, has already sold more than 250,000 domains. Not bad for a country that’s only a few years old.

Any companies that I missed?

UPDATE: Thanks to the commenter below I’ve added Tri.im — and I also came across Pi.pe. Any more I should include?

UPDATE 2: Just added Su.pr — yet another URL shortener.

by John Yunker at 02 June 2009 04:15 AM

May 27, 2009

Global By Design

The Twitter Domain Rush: Don’t Get “Twit-jacked”

My previous post on Twitter got me thinking about what other companies had registered language-specific domains for their Twitter accounts.

Turns out, most companies haven’t even registered Twitter accounts for their primary brands.

Like who?

Apple, for one.

Here we have someone who apparently likes apples but isn’t Apple:

twitter_apple

It appears that Microsoft reserved its account early on, though nothing is there. Microsoft does have about a dozen Twitter accounts that do include content.

twitter_msft

Coke — someone who drinks Coke, but not the company.

twitter_coke

While Pepsi does have a Twitter account.

twitter_pepsi

The Wall Street Journal has an article out about this domain name rush.

So many questions come to mind:

  • Will Twitter enforce trademarks for valid holders? Usually, the WIPO does this with domain names, but this isn’t actually a domain name in the traditional sense.
  • What percentage of the millions of new Twitter accounts being registered every day are simply squatters hoping to make a quick buck? That is, how much of Twitter’s growth is actual growth?
  • And what about third-party domain marketplaces — will we see them emerge? Or will Twitter start its own marketplace?

In the meantime, if you’re thinking about reserving a Twitter domain, do it now before getting Twit-jacked…

by John Yunker at 27 May 2009 03:13 AM

May 24, 2009

Global By Design

Update on the World’s Number One Starbucks Fan

In 2005, I interviewed a man named Winter, who was on a quest to visit every Starbucks location on this planet.

Four years later, the quest continues.

Unfortunately, as documented by the Wall Street Journal, Starbucks is now closing stores faster than Winter can visit them.

In 2005, Winter had visited 4,500 Starbucks stores. Today, his count stands at more than 9,000. And he is now racing to visit those stores scheduled to close, sometimes missing them by a matter of hours.

Winter is single (no surprise there) and lives at home with his parents, who wish he’d just give up this Sisyphean quest.

But I get a kick out of his quest. In this period in our history when so much seems ephemeral, so many trends little more than 15-minute Warholian blips, it’s nice to see somebody out there, crazy as he may be, sticking with it.

“Pointless though it might be,” says Winter, who plans to go to the U.K. next week, “a goal is a goal.”

by John Yunker at 24 May 2009 05:25 PM

May 20, 2009

Friedel en ander frappanthede » i18n

Localisation on paper

If I have to explain the term "localisation" to people, I always try to mention aspects falling outside of translation. Things like date formats and currencies are easy examples to use. Spell checkers are commonly known, but many people see them as part of language technology, even though they are of course part of adapting a computer system to its users. There are, however, more aspects, with a few nice examples in the world of Free Software.

If you install a Linux distribution in Chinese, the system will install the input methods that make it possible to type Chinese. This involves software as well as the "dictionaries" that are required for the input methods. Specific fonts will also be installed that might not be installed otherwise, especially because of the size (good Chinese fonts can take up quite a bit of hard drive space). Firefox gives the opportunity to specify special search engines for a language. GCompris has sound files that can be created separately for each language (who's going to help me with the Afrikaans ones?).

Another aspect that one is usually entirely unaware of, is paper sizes. With this I'm not referring to A4 versus A3, but the whole A system (part of ISO 216) versus the American system of paper sizes ("Letter", "Legal", etc.). I guess most South Africans have not yet even seen a sheet of "Letter" size. I don't even know if you'll be able to buy one anywhere. Anyway - we use A4.

These formats have different sizes, and computer programs must know this to be able to do page layout correctly. Mostly this is of no consequence to anybody. Dwayne created locale files for Linux and OpenOffice.org years ago for all the South African languages. These specify that we use A4 in South Africa. On Windows it works differently - it is configured per printer. Of course you can always go and configure these things again, just like with any other setting of the printer.

With this in mind it was a surprise when we realised that Firefox doesn't obey the locale information on Linux, and always uses "Letter" as the page size, even for the Afrikaans edition. In Firefox at about:config you can set the value of "print.postscript.paper_size" to "A4". I wanted to make this change for the Afrikaans version by setting this value in the file firefox-l10n.js with:

pref("print.postscript.paper_size", "A4");

However, I was advised not to do this on some technical and administrative grounds. Of course it wouldn't be the correct solution (the software should get the setting from the locale data), but I really wanted to correct it for the Afrikaans users of Firefox. Hopefully this can get some attention at some stage. Possibly relevant bug reports at Mozilla:

Thank you to Fabrice Facorat for linking to the extra bug reports.

It is quite sad to see this if you take into account that the American system is basically only used in North America.
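
As a rough illustration of what "getting the setting from the locale data" could mean, here is a minimal Python sketch. It is not how Firefox or the Linux locale files work: the `LETTER_TERRITORIES` set is a deliberately short, illustrative assumption (real locale data lists more territories), and `paper_for_locale` is a hypothetical helper. Only the sheet dimensions are standard values.

```python
# Sheet sizes in millimetres (width, height).
PAPER_MM = {
    "A4": (210.0, 297.0),      # ISO 216
    "Letter": (215.9, 279.4),  # 8.5 x 11 inches
}

# Illustrative assumption only, not an exhaustive list.
LETTER_TERRITORIES = {"US", "CA"}


def paper_for_locale(locale_name):
    """Pick a default paper size from a locale name such as 'af_ZA.UTF-8'."""
    base = locale_name.split(".")[0].split("@")[0]
    parts = base.replace("-", "_").split("_")
    territory = parts[1].upper() if len(parts) > 1 else ""
    return "Letter" if territory in LETTER_TERRITORIES else "A4"


for name in ("af_ZA.UTF-8", "en_US", "fr"):
    size = paper_for_locale(name)
    print(name, size, PAPER_MM[size])
```

Defaulting to A4 when no territory is known reflects the point made above that the American sizes are the exception rather than the rule.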

by Friedel at 20 May 2009 04:33 PM

Global By Design

Twitter and Web Globalization

icann_es

ICANN recently launched its own Twitter feed. And since ICANN is a global organization, it launched more than one language feed — one in English and one in Spanish.

http://twitter.com/icann_en

http://twitter.com/icann_es

This is not the most scalable solution. And I’m not trying to pick on Twitter; the issue affects any multinational company or organization.

For instance, let’s say ICANN launches a Portuguese feed for Brazil. The address would have to read twitter.com/icann_pt_br. Similar challenges arise with French (Canada vs. France). And even the English and Spanish feeds are inherently going to exclude various flavors of the languages.

In addition, if I were wanting to be a pain, I could register icann_ru to beat ICANN to that address. And this highlights a larger emerging issue (and opportunity) as Twitter becomes more corporate and less personal — how to ensure that brand holders have access to their names. I always thought this would be a nice revenue source for Twitter, similar to the way that registries profit from domain registrations.

Ideally, Twitter would allow you to set up one address and then forward language-specific feeds to the subscriber based on their preference — sort of like how language negotiation works now with Web browsers. For instance, if I type in Google.com, the language I get aligns with the language preference of my browser.
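
The kind of negotiation described here can be sketched as follows. This is a simplified illustration in Python, not anything Twitter or ICANN offers: `parse_accept_language`, `pick_feed` and the `FEEDS` mapping are hypothetical, and the parser ignores details such as the `*` wildcard.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal quality keep header order.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [tag for tag, quality in ranked if quality > 0]


def pick_feed(header, feeds, default="en"):
    """Choose the feed that best matches the reader's preferences."""
    for tag in parse_accept_language(header):
        # Try the full tag first, then fall back to the primary language.
        for candidate in (tag, tag.split("-")[0]):
            if candidate in feeds:
                return feeds[candidate]
    return feeds[default]


# Hypothetical mapping from language to the two feeds mentioned above.
FEEDS = {"en": "icann_en", "es": "icann_es"}

print(pick_feed("es-MX,es;q=0.9,en;q=0.5", FEEDS))
print(pick_feed("pt-BR,pt;q=0.8", FEEDS))
```

With a single public address, a reader preferring Mexican Spanish would get the Spanish feed, and a reader whose languages are not covered would fall back to the default.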

But therein lies the challenge of Twitter — it doesn’t just send feeds to a browser. It sends the feeds to browsers and mobile devices and even Twitter apps, like Tweetie, which I use on occasion.

ICANN is now migrating its subscribers from icann_en to icann. No word yet on what will happen with icann_es.

What do you think Twitter should do to solve this issue?

by John Yunker at 20 May 2009 03:36 AM

May 17, 2009

Global By Design

Why Pay for Translation if You Can Get it for Free?

It was nice to wake up this morning and see this article in the New York Times about the emergence of machine translation and volunteer translation (aka crowdsourcing). These are two very important developments that every company needs to be aware of, and possibly champion.

That said, I do wonder how this article is going to be received by the translators of the world who actually expect to be paid for their services.

For example, the for-profit, invite-only conference company TED saved about $500,000 using volunteer translators. Clearly TED could have coughed up the money.

I can see this article spurring on CEOs across the land to think that they too can get free translations.

One thing I mentioned awhile back is that you need to be translation-worthy to get away with pro-bono services, particularly if you’re a for-profit company.

Facebook, Google and, now, TED appear to be translation-worthy. But I wouldn’t expect to see, say, General Motors succeeding in this area (though they could certainly use the help).

But the larger issue here is the extent to which volunteer translation for companies that can afford to pay for translation undermines the translation industry. I don’t believe machine translation undermines human translation because companies generally use it to translate text they would never have hired people to do (or they use it as a first pass before bringing on the human translators).

But volunteer translation is different.

Are volunteer translators taking money away from their colleagues? After all, TED and Google and Facebook certainly can afford to pay. Or are volunteer translators raising awareness for the value of their work, thereby benefiting the translation industry as a whole?

Personally, I think we’re entering a dangerous area where companies that don’t know better are going to think they don’t have to pay for translation. This all reminds me of Seinfeld’s George Costanza’s aversion to parking garages: Why should I pay, when if I apply myself, maybe I could get it for free?

by John Yunker at 17 May 2009 04:34 PM

May 14, 2009

Global By Design

Want to buy the number 8?

chinese_domain_8

Someone is promoting the sale of a Chinese domain name, shown here. Technically, this domain is represented over the Internet as http://www.xn--45q.ws, which is the ASCII equivalent of the Chinese character — the DNS is still ASCII-only.
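
The mapping between the Unicode form and the ASCII form can be reproduced with Python's built-in `idna` codec. This sketch assumes the character in the advertised domain is 八 (U+516B), the Chinese numeral eight, which the text above does not spell out; `to_ace` and `from_ace` are illustrative helper names. Note that the built-in codec follows the older IDNA 2003 rules, which is sufficient for this example.

```python
def to_ace(hostname):
    """Convert a Unicode hostname to its ASCII-compatible (xn--) form."""
    return hostname.encode("idna").decode("ascii")


def from_ace(hostname):
    """Convert an ASCII-compatible hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


print(to_ace("www.八.ws"))
print(from_ace("www.xn--45q.ws"))
```

Each label between the dots is converted separately, so the plain ASCII labels `www` and `ws` pass through unchanged.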

In China, the number 8 is one of the best numbers to have on your license plate, phone number, etc., because of the way it’s pronounced. But this particular domain is attached to the .WS ccTLD, which is Western Samoa. As ccTLDs go, .WS is not exactly up there with .COM or .CN. So maybe that’s why the owner is promoting it so heavily; I came across this sale via a press release.

by John Yunker at 14 May 2009 02:57 PM

May 13, 2009

Global By Design

The Rise and Fall of Web Globalization

According to my search on “web globalization” in Google Timeline:

web_globalization_timeline

I’m not sure I agree with this graph, but those were some heady days back in 2000.

From my humble perch, I’d say web globalization is alive and well. Perhaps searches are going down because more and more people already know what it is — at least that’s how I choose to see it.

And while I’m wasting an evening on Google, here’s one of its newest features, the Wonder Wheel:

web_globalization_wonder_wheel

It’s nifty, though I’m not sure I would use it more than once. And what the heck is Walmart doing there?

Walmart failed in Germany and Korea and is still bleeding cash in Japan — not exactly what I would call a web globalization success story. Walmart finished in the bottom 10 of The Web Globalization Report Card.

In other Google news, I added Friend Connect to this site — up on the upper right corner. Apparently Google now offers real-time translation of comments, so I’m hoping to give it a spin.

Let me know what you think…

UPDATE: I just removed it. It was really slow in loading. Instead I inserted my Twitter feed. I just noticed that the Chinese characters that were supported just fine in Twitter didn’t make it across into my feed as Unicode. This is interesting because I have WordPress set up for Unicode. I’ll have to do some digging.

Maybe I should have titled this post The Rise and Fall of WordPress Plugins.

by John Yunker at 13 May 2009 03:28 AM

May 11, 2009

Global By Design

How many Fortune 500 companies blog?

Curious to know how many big companies have embraced blogging?

81 of the Fortune 500

This is less than I would have guessed. As a comparison, roughly twice as many companies on the Inc. 500 list have blogs — and I would say this is because smaller companies have fewer lawyers to advise against hosting blogs. Or, it could simply be that smaller companies stand to gain more from blogs than large established brands.

Also interesting is that more of these Fortune 500 companies Twitter than blog.

This data is from a report by Dr. Nora Ganim Barnes, a professor and senior fellow at the Society for New Communications Research.

The report is free and you can download it here.

Key findings include:

  • 81 of the Fortune 500 or 16% currently have public-facing blogs. This compares with 39 percent of the Inc. 500; 41 percent of the higher education sector and 57 percent of the nation’s Top 200 charities.
  • 28 percent of the Fortune 500’s blogs link to Twitter accounts. (Other Fortune 500 companies have Twitter accounts, but they are not linked to their blogs)
  • Five of the top ten companies have public blogs: Wal-Mart, Chevron, General Motors, Ford, and Bank of America.
  • 90 percent of the Fortune 500’s blogs have the comments feature enabled.
  • The computer software/hardware technology industry has the most blogs, followed by the food and drug industry, financial services, Internet services, semi-conductors, retail and automotive respectively.
  • Ten percent of the Fortune 500’s blogs link to podcasts; 21 percent incorporate video

by John Yunker at 11 May 2009 03:32 AM

May 08, 2009

Global By Design

Have you dined at the Translate Server Error lately?

File this post under Lost in (Machine) Translation.

translate_server_error

This photo arrived courtesy of Gareth Morgan at Neovia Financial.

Apparently the proprietor of this restaurant in China decided to create an English-language sign using machine translation (MT) software and, apparently, the MT engine wasn’t working all that well.

So instead of “restaurant” we have “translate server error.”

It’s certainly one of the more memorable restaurant names I’ve come across. I’ll be sure to look out for it when I visit!

And I’d love to know which MT engine delivered this message.

by John Yunker at 08 May 2009 03:33 AM

May 05, 2009

Hacklog: Blogamundo

Could the “Yae” language of Ve…

Could the “Yae” language of Venezuela in http://tinyurl.com/clp6pm be “Pumé”, which has the language code “yae”? http://tinyurl.com/demwl9

by Patrick Hall at 05 May 2009 05:13 PM

May 01, 2009

W3C I18n Activity highlights

ITS support in the Okapi framework

The Okapi Framework Team has announced the first milestone of its Java-based products. The framework provides cross-platform and open-source components and applications for localization tasks.

One of the components in this release is an XML filter based on an implementation of the W3C Internationalization Tag Set (ITS) Recommendation.

The filter allows access to the translatable content of an XML document, based on any external or internal global rules, as well as local rules. The ITS processor provided supports the following data categories: Translate, Localization Note, Element Within Text, Terminology, Directionality, and Language Information.
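
To give a feel for the Translate data category, here is a minimal Python sketch that honours only local `its:translate` attributes and their inheritance. It is not the Okapi filter and ignores global rules entirely; the `translatable_text` helper and the sample document are assumptions made for illustration. The namespace URI is the one defined by the ITS Recommendation.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"


def translatable_text(elem, inherited=True):
    """Yield text runs whose effective its:translate value is 'yes'.

    Only local attributes are considered; a value set on an element
    is inherited by its descendants unless they override it.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, translate)
        # Text after a child element belongs to the parent element.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


# Hypothetical sample document.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>Release notes</title>
  <code its:translate="no">podebug --rewrite=xxx</code>
  <p>Run the <cmd its:translate="no">install</cmd> command.</p>
</doc>"""

segments = list(translatable_text(ET.fromstring(SAMPLE)))
print(segments)
```

A real filter would also keep inline elements together with their surrounding sentence (the Element Within Text category) instead of splitting the paragraph into separate runs as this sketch does.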

Rainbow, an Okapi application, uses the filter to extract and merge translatable content to and from XLIFF. Many other utilities provided in the framework take advantage of the ITS-based filter as well, for example to perform pseudo-translation.

You can download the Okapi components and get their source code.

by Richard Ishida at 01 May 2009 01:48 PM

April 30, 2009

Hacklog: Blogamundo

#swineflu Свиной грипп

#swineflu Свиной грипп 12 page educational comic in русский (Russian): http://tinyurl.com/c6ge2l

by Patrick Hall at 30 April 2009 09:59 AM

#swineflu Грип свиня

#swineflu Грип свинячий 12 page educational comic in українська (Ukrainian): http://tinyurl.com/cuqr8w

by Patrick Hall at 30 April 2009 09:22 AM

#swineflu 12 page educational …

#swineflu 12 page educational comic in فارسی (Farsi): http://tinyurl.com/dm47fa آنفلوآنزای خوکی

by Patrick Hall at 30 April 2009 09:22 AM

#swineflu 12 page educational …

#swineflu 12 page educational comic in العربية (Arabic): http://tinyurl.com/c7cjvm إنفلونزا الخنازير

by Patrick Hall at 30 April 2009 09:21 AM

#swineflu Cúm lợn 12 page e…

#swineflu Cúm lợn 12 page educational comic in Tiếng Việt (Vietnamese): http://tinyurl.com/c7bfbf

by Patrick Hall at 30 April 2009 09:20 AM

April 29, 2009

Global By Design

Per capita, Netherlands is the world’s ccTLD leader

The Netherlands, a country with just 16 million people, accounts for more than 3 million ccTLD domain registrations.

That’s an impressive ratio of people to domains (one ccTLD domain per 5.3 people), and it is the highest ratio of any country with more than five million residents.
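
The ratio follows directly from the two rounded figures quoted above, as this small Python check shows. The helper name `people_per_domain` is illustrative, and because the domain count is "more than 3 million", the result is an approximation.

```python
def people_per_domain(population, domains):
    """Number of residents per registered domain."""
    return population / domains


ratio = people_per_domain(16_000_000, 3_000_000)
print(round(ratio, 1))
```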

Germany comes in a close second, with a ratio of roughly one ccTLD per 6.5 people.

Granted, many of the owners of these .nl domains are not Dutch. Rather, they are multinational companies like FedEx and Apple.

But even if you take this into account, the Dutch registry SIDN claims that the Netherlands still has the highest density of domains, roughly 28 .NL domains per 1,000 people, which is still an impressive ratio.

Why is this I wonder?

by John Yunker at 29 April 2009 03:01 AM


Contact: Richard Ishida (ishida@w3.org).