About internationalization

Access to a Web for All has been a fundamental concern and goal of the World Wide Web Consortium since the beginning, and is a natural requirement for Web-based applications, given that they can be accessed by people around the world. Unfortunately, it is easy to overlook the needs of people in cultures different to your own, or who use different languages or writing systems. If you do, you will build applications and content that, in fact, present barriers to the use of your technology or content by many people around the world.

What is 'i18n'?

'i18n' is an industry standard abbreviation for 'internationalization' (because there are 18 letters between the 'i' and the 'n').
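
The counting rule behind the abbreviation can be sketched in a few lines of Python. The function name `numeronym` is ours, chosen for illustration only.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        # Nothing sits between the first and last letter, so leave the word alone.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```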

What is Internationalization?

Translation and localization are NOT what we mean by 'internationalization'. Surprised? Let me explain.

If you internationalize, you design or develop your content, application, specification, and so on, in a way that ensures it will work well for, or can be easily adapted for, users from any culture, region, or language. This is where you address the first set of barriers: not the fact that your user can't read or relate to your product, but the barriers that make it difficult to adapt your product so that they can.

It's essentially a Quality approach: one that sees you taking action early in the development cycle so that you avoid costly and sometimes prohibitive obstacles when it comes to rolling out your product to new marketplaces.

A universal code base

Fundamental to internationalization is ensuring that your product supports text in any writing system of the world. You should ensure that your product is built on the universal character set, Unicode. This means not only the HTML page that you serve to your user, but all the backend databases, content management systems, scripts, and so forth. There are plenty of examples of beautiful user interfaces that handle deftly any language that you need, but that return gobbledegook after the data has been processed behind the scenes.
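
As a rough illustration of how text gets mangled behind the scenes, the Python sketch below writes a string as UTF-8 and then reads it back, first with the same encoding and then with a legacy single-byte encoding. The helper `round_trip` and the sample string are hypothetical; they stand in for any backend step that makes the wrong assumption about encoding.

```python
def round_trip(text: str, write_encoding: str, read_encoding: str) -> str:
    """Simulate storing text with one encoding and reading it back with another."""
    stored_bytes = text.encode(write_encoding)
    return stored_bytes.decode(read_encoding, errors="replace")


sample = "Ελληνικά 日本語"

# Same Unicode encoding on both sides: the text survives.
print(round_trip(sample, "utf-8", "utf-8") == sample)    # True

# A component that assumes Latin-1 turns the same bytes into gobbledegook.
print(round_trip(sample, "utf-8", "latin-1") == sample)  # False
```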

You'll also want to ensure that it's possible to easily swap in translations of any natural language text that will be read by humans (including error messages, JSON strings, etc.), and that the text carries metadata about its language and direction. The language metadata is important to get the fonts right, and to allow for support of the different typographic styles used around the world (for things such as line-breaking, text justification, emphasis or other text decorations, text selection and units, etc.).
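
One possible shape for swappable text that carries its own language and direction is sketched below in Python. The `LocalizedString` class, the `MESSAGES` table, the message key, and the helper names are all assumptions made for this example, not part of any W3C specification.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalizedString:
    value: str  # the human-readable text
    lang: str   # language tag, e.g. "en" or "ar"
    dir: str    # base direction: "ltr" or "rtl"


# Hypothetical message catalogue, keyed by message id and then by language.
MESSAGES = {
    "error.not_found": {
        "en": LocalizedString("Page not found", "en", "ltr"),
        "ar": LocalizedString("الصفحة غير موجودة", "ar", "rtl"),
    }
}


def lookup(key: str, lang: str, fallback: str = "en") -> LocalizedString:
    """Return the translation for lang, or the fallback language if missing."""
    translations = MESSAGES[key]
    return translations.get(lang, translations[fallback])


def to_html(message: LocalizedString) -> str:
    """Emit markup that keeps the language and direction metadata attached."""
    text = html.escape(message.value)
    return f'<span lang="{message.lang}" dir="{message.dir}">{text}</span>'


print(to_html(lookup("error.not_found", "en")))
```

Because the fallback string keeps its own `lang` value, a page in another language that falls back to English still labels that text correctly.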

It's advisable to clearly separate semantics (markup) from styling (CSS), and to avoid hard-coding content that assumes a particular order of text, or a particular set of punctuation marks, etc.

Text direction

Did you know that the most widely used writing system in the world after the Latin script is Arabic? The script is used for many languages, often with variations in the way that vowels are represented, or with slightly different repertoires. But what all these languages have in common is that they are read predominantly from right to left. This also has implications for layout: things such as table columns, spreadsheets, graphs, cascading menus, and even web page layout, are normally mirror-images of content produced in English. So instead of using values like 'left' and 'right' in your style sheet, you should use logical values such as 'start' and 'end': that way, when the direction of a page changes, the mirroring happens automatically and without the need for the translator to mess with your code.
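
To make the idea of logical values concrete, here is a small Python sketch of the mapping a layout engine performs when it resolves 'start' and 'end' against the page direction. The function `resolve_side` is hypothetical and only models the horizontal case.

```python
def resolve_side(logical: str, direction: str) -> str:
    """Map a logical side ('start' or 'end') to a physical side for a direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if logical not in ("start", "end"):
        raise ValueError(f"unknown logical side: {logical!r}")
    starts_on_left = direction == "ltr"
    if logical == "start":
        return "left" if starts_on_left else "right"
    return "right" if starts_on_left else "left"


print(resolve_side("start", "ltr"))  # left
print(resolve_side("start", "rtl"))  # right
```

The style sheet only ever says 'start' or 'end'; the physical side falls out of the direction, so nothing in the style sheet has to change when the page is localized.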

Actually, it's even more complicated than that. Arabic mixes right-to-left and left-to-right text on the same line, and it is important to be able to control the direction of the surrounding context for that to work properly. It's also important to handle data strings in a way that preserves information about their base direction, so that when they are used on the user interface they don't look mangled.

[Figure: Bidirectional text in Arabic.]
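
When a data string arrives without any direction metadata, one common fallback is to guess its base direction from the first strongly directional character. The Python sketch below is a simplified version of that heuristic; the function name is ours, and it ignores refinements such as directional isolate characters. Explicit metadata stored with the string remains more reliable than any guess.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess base direction from the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":          # strongly left-to-right letter
            return "ltr"
    # Only digits, punctuation, or spaces: no strong character to go on.
    return default


print(first_strong_direction("مرحبا W3C"))  # rtl
print(first_strong_direction("W3C مرحبا"))  # ltr
```

The second example shows the weakness of the heuristic: a right-to-left sentence that happens to begin with a Latin-script name is guessed as left-to-right.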

And it's not just Arabic. Right-to-left writing is used for Hebrew, for south Asian languages such as Dhivehi (Maldivian) and Rohingya, and for fast-growing African scripts such as Adlam and N'Ko.

Names, addresses, and such

If you are dealing with HTML forms or creating databases for information such as people's names and addresses, you will need to consider how to handle the many different approaches to formatting data that exist around the world.

In some countries people only use a single name, or write their family name before their given name. They may have single letter names, or very long names. Street addresses in Japanese go from the general (country or prefecture) to the specific (house location) from top to bottom, and there are plenty of variations on that theme. (In fact, Japanese homes typically don't have house numbers at all.)

You'll need to consider how you'll cope with acquiring and storing this kind of data (and many others, with region-specific approaches). The more you can make your system flexible up-front, the easier a time you'll have when you want to support people in a new locale.
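
One flexible way to model names, sketched in Python under our own assumptions, is to store the full name exactly as the person writes it and make everything else optional. The `PersonName` fields and the helper functions below are hypothetical; real requirements will vary by application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    full_name: str                 # exactly as the person writes it
    sort_as: Optional[str] = None  # optional form to use when sorting
    call_me: Optional[str] = None  # optional short form for greetings


def is_acceptable_name(name: str) -> bool:
    """Accept anything non-empty: no minimum length, no required space."""
    return len(name.strip()) > 0


def display_name(person: PersonName) -> str:
    """Prefer the short form if the person gave one."""
    return person.call_me or person.full_name


print(is_acceptable_name("O"))                 # True: single-letter name
print(is_acceptable_name("Zaw"))               # True: single name, no family name
print(display_name(PersonName("山田 太郎")))    # family name first, stored as written
```

The design choice here is to avoid separate "first name" and "last name" fields, since they force an order and a structure that many names do not have.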

Oh, and by the way, these people don't speak or write in English, and they tend to sort their data in very different ways, so you'll also need to figure out whether that's going to cause a problem for your backstore or back office, and put plans in place to address it as you localize.

Time zones, currencies, dates, etc.

You will usually want to store data internally in one standard form, but display it in ways that look natural to local users. As well as the names and addresses already mentioned, does the person working with your app or content expect to see periods or commas for decimal points? How about the order of day, month, and year, or even which day begins the week in a calendar?
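
The split between one internal form and many display forms might look like the following Python sketch. The `LOCALE_CONVENTIONS` table is a tiny hand-written assumption covering three locales; a production system would normally take these conventions from a maintained locale data source rather than hard-coding them.

```python
from datetime import date
from decimal import Decimal

# Hypothetical, hand-written conventions for illustration only.
LOCALE_CONVENTIONS = {
    "en-US": {"group": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de-DE": {"group": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
    "ja-JP": {"group": ",", "decimal": ".", "date": "{y}/{m:02d}/{d:02d}"},
}


def format_decimal(value: Decimal, locale_tag: str) -> str:
    """Render a stored number with the locale's separators (two decimals)."""
    conventions = LOCALE_CONVENTIONS[locale_tag]
    us_style = f"{value:,.2f}"  # e.g. 1,234,567.89
    swap = str.maketrans({",": conventions["group"], ".": conventions["decimal"]})
    return us_style.translate(swap)


def format_date(value: date, locale_tag: str) -> str:
    """Render a stored date in the locale's day, month, year order."""
    pattern = LOCALE_CONVENTIONS[locale_tag]["date"]
    return pattern.format(y=value.year, m=value.month, d=value.day)


stored_number = Decimal("1234567.891")
stored_date = date(2022, 3, 4)

print(format_decimal(stored_number, "en-US"))  # 1,234,567.89
print(format_decimal(stored_number, "de-DE"))  # 1.234.567,89
print(format_date(stored_date, "en-US"))       # 03/04/2022
print(format_date(stored_date, "de-DE"))       # 04.03.2022
```

Note that 03/04/2022 and 04.03.2022 are the same stored date, which is exactly why the internal form must not depend on how it is displayed.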

You may also need to support alternative calendars, time zones, and daylight saving time, in both native and transliterated forms. Did you know that there are numerous countries around the world that have local calendars, and use them on a regular basis? Birth dates are typically recorded in the Imperial calendar in Japan, and newspapers in Thailand usually carry the date in the Buddhist calendar (the Western year 2022 is 2565 in Thailand). Any app you create needs to be able to adjust information for the appropriate time zone.
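
The Thai example above comes down to a fixed offset of 543 years, and the time zone adjustment to converting a stored UTC timestamp at display time. The Python sketch below combines the two. The function names and the output format are our own, and the fixed UTC+7 offset is a simplification for illustration: real applications should use named IANA time zones (for example through Python's `zoneinfo` module), because offsets and daylight saving rules change.

```python
from datetime import datetime, timedelta, timezone

BUDDHIST_ERA_OFFSET = 543  # 2022 in the Western calendar is 2565 in Thailand

# Simplification: fixed offset instead of a named time zone.
BANGKOK = timezone(timedelta(hours=7), "ICT")


def thai_buddhist_year(gregorian_year: int) -> int:
    """Convert a Western (Gregorian) year to the Thai Buddhist Era year."""
    return gregorian_year + BUDDHIST_ERA_OFFSET


def display_in_thailand(stored: datetime) -> str:
    """Show a stored timestamp in Thai local time with the Buddhist Era year."""
    if stored.tzinfo is None:
        raise ValueError("stored timestamps must carry a time zone")
    local = stored.astimezone(BANGKOK)
    year = thai_buddhist_year(local.year)
    return f"{local.day}/{local.month}/{year} {local:%H:%M}"


stored = datetime(2022, 6, 1, 15, 30, tzinfo=timezone.utc)
print(thai_buddhist_year(2022))     # 2565
print(display_in_thailand(stored))  # 1/6/2565 22:30
```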

If working with monetization, you'll need to consider how to handle users who work with a range of currencies. In addition to deciding how to format and represent monetary data when displayed to the user, you also should consider how to put in place mechanisms to manage diverse currency systems. How will you develop pricing models for different countries, which may have large variations in standard of living? How will you convert subscriptions and payments from one currency to another?
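
One detail that is easy to miss when converting between currencies is that they do not all have two decimal places. The Python sketch below rounds a converted amount to the target currency's minor unit. The exchange rates are invented for the example, the function name is ours, and the small table of minor-unit digits covers only four currencies.

```python
from decimal import Decimal, ROUND_HALF_UP

# Number of decimal places in each currency's minor unit (small sample only).
MINOR_UNIT_DIGITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def convert(amount: Decimal, to_currency: str, rate: Decimal) -> Decimal:
    """Convert an amount at the given rate, rounded to the target's minor unit."""
    digits = MINOR_UNIT_DIGITS[to_currency]
    quantum = Decimal(1).scaleb(-digits)  # 0.01 for USD, 1 for JPY, 0.001 for KWD
    return (amount * rate).quantize(quantum, rounding=ROUND_HALF_UP)


# Hypothetical rates, for illustration only.
print(convert(Decimal("9.99"), "JPY", Decimal("150")))     # 1499
print(convert(Decimal("1499"), "USD", Decimal("0.0067")))  # 10.04
```

Using `Decimal` rather than binary floating point keeps the arithmetic exact, and the rounding rule is stated explicitly because it is often a business or regulatory decision rather than a technical one.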

Cultural norms & expectations

You'll also want to do some homework in advance about the cultural preferences and habits of the marketplaces where you want your application to be used, and choose flexible content design technologies and processes so that you can later support others.

For example, symbolism can be culture-specific. The check mark means correct or OK in many countries, but in some countries, such as Japan, it can be used to mean that something is incorrect. Japanese localizers may need to convert check marks to circles (their symbol for 'correct') as part of the localization process.

If you want your product to appeal to users, you'll need to be using content management systems that give you the ability to flex colors, layout, and information structures, as well as introducing local color. But you'll also need to ensure that you are not hard-coding graphics or images that offend or alienate users in another region.

And then there are quite fundamental questions for monetization applications. Is the community one that is familiar with credit card transactions? Does the population you want to reach have access to sufficient bandwidth (or even to the internet at all) when they need to use your application? Do the banking or other systems that your application interacts with support the language of the user? And remember that a large majority of users these days interface with the Web via mobile devices.

And have you taken into account local regulatory and legal considerations in the various territories your application will reach?

And then localize

The things we have discussed so far all need some attention and preparation while you are planning and building your application. Otherwise, you could be, instead, building barriers for yourself when it comes to the exciting phase where you translate and adapt your product for various local languages and markets.

The localization phase is where you actually adapt for different users. You change the language via translation; you change the graphics and colors, where appropriate; you flick that text direction switch; you make available alternative data collection forms and processes; you write locally-relevant content, and so on.

Internationalization means foreseeing and planning for that phase from the earliest possible moment, so that not only are you ready when the time comes, but you can avoid digging yourself into holes that may be costly to get out of later on down the line.

What the W3C Internationalization Activity does

The W3C Internationalization (I18n) Activity works with W3C working groups and liaises with other organizations to make it possible to use Web technologies around the world, regardless of language, writing system, or culture.

The work covers three main areas:

Information about the work we are doing and the resources we make available can all be accessed via the Internationalization Activity home page. If you are new to internationalization, you may find our Getting Started page a useful place to begin. See also the list of groups below.


Active groups

  1. Internationalization Working Group
    Overseeing the language enablement work, reviewing specifications, providing internationalization guidance to Working Groups, & creating educational materials for content authors.
    Home page
    GitHub: w3c/bp-i18n-specdev, w3c/charmod-norm, w3c/i18n-activity, w3c/i18n-checker, w3c/i18n-discuss, w3c/i18n-drafts, w3c/i18n-glossary, w3c/i18n-issues, w3c/i18n-request, w3c/i18n-tests, w3c/its2req, w3c/localizable-manifests, w3c/ltli, w3c/mlw-metadata-us-impl, w3c/, w3c/predefined-counter-styles, w3c/string-meta, w3c/string-search, w3c/timezone, w3c/type-samples, w3c/typography, w3c/unicode-xml
    Notification list: www-international
    Other lists: public-i18n-core, public-i18n-translation
    Group-only list: member-i18n-core*

  2. Internationalization Interest Group
    Group membership is based on mailing list participation. Most of the traffic is composed of notifications about changes to GitHub issues, which is where the technical discussions take place. The IG is also the parent group for all the task forces listed below.
    Home page
    GitHub: w3c/character_phrase_tests, w3c/klreq, w3c/line_paragraph_tests, w3c/text_direction_tests
    Notification lists: www-international, public-i18n-its-ig

  3. African Layout Task Force
    Identify and address barriers to use of the Web in any African language or script.
    Home page
    Discussion threads
    GitHub: w3c/afrlreq
    Notification list (public-i18n-africa): archive, subscribe
    Group-only list: public-afrlreq-admin

  4. Americas Layout Task Force
    Identify and address barriers to use of the Web for languages of the Americas.
    Home page
    Discussion threads
    GitHub: w3c/amlreq
    Notification list (public-i18n-americas): archive, subscribe
    Group-only list: public-alreq-amlreq

  5. Arabic Layout Task Force
    Identify and address barriers to use of the Web for Arabic & Persian languages.
    Home page
    Discussion threads
    GitHub: w3c/alreq
    Notification list (public-i18n-arabic): archive, subscribe
    Group-only list: public-alreq-admin
    Related list (public-i18n-bidi): public-i18n-bidi

  6. Chinese Layout Task Force
    Identify and address barriers to use of the Web for Simplified & Traditional Chinese.
    Home page
    Discussion threads
    GitHub: w3c/clreq
    Notification list (public-i18n-chinese): archive, subscribe
    Group-only list: public-clreq-admin

  7. Ethiopic Layout Task Force
    Identify and address barriers to use of the Web for Ethiopic-script languages.
    Home page
    Discussion threads
    GitHub: w3c/elreq
    Notification list (public-i18n-ethiopic): archive, subscribe
    Group-only list: public-elreq-admin

  8. European Layout Task Force
    Identify and address barriers to use of the Web for European languages.
    Home page
    Discussion threads
    GitHub: w3c/eurlreq
    Notification list (public-i18n-europe): archive, subscribe
    Group-only list: public-eurlreq-admin

  9. Hebrew Layout Task Force
    Identify and address barriers to use of the Web for Hebrew.
    Home page
    Discussion threads
    GitHub: w3c/hlreq
    Notification list (public-i18n-hebrew): archive, subscribe
    Group-only list: public-hlreq-admin
    Related list (public-i18n-bidi): public-i18n-bidi

  10. India International Program
    Identify and address barriers to use of the Web for languages of India.
    Home page
    Discussion threads
    GitHub: w3c/iip
    Notification list (public-i18n-indic): archive, subscribe
    Group-only list: public-ilreq-admin

  11. Japanese Layout Task Force
    Identify and address barriers to use of the Web for Japanese.
    Home page
    Discussion threads
    GitHub: w3c/jlreq, w3c/simple-ruby, w3c/ruby-t2s-req
    Notification list (public-i18n-japanese): archive, subscribe
    Group-only list: public-jlreq-admin

  12. Mongolian Layout Task Force
    Identify and address barriers to use of the Web for the Traditional Mongolian script.
    Home page
    Discussion threads
    GitHub: w3c/mlreq
    Notification list (public-i18n-mongolian): archive, subscribe
    Group-only list: public-mlreq-admin

  13. Southeast Asian Layout Task Force
    Identify and address barriers to use of the Web for SE Asian languages & scripts.
    Home page
    Discussion threads
    GitHub: w3c/sealreq
    Notification list (public-i18n-mongolian): archive, subscribe
    Group-only list: public-sealreq-admin

  14. Tibetan Layout Task Force
    Identify and address barriers to use of the Web for Tibetan.
    Home page
    Discussion threads
    GitHub: w3c/tlreq
    Notification list (public-i18n-tibetan): archive, subscribe
    Group-only list: public-tlreq-admin

Former groups

  1. ITS (Internationalization Tag Set) Interest Group Home page • List: public-i18n-its-ig. The mailing list is still open, but now operates under the Internationalization Interest Group.

  2. Japanese Layout Multi-Group Task Force Home page • Lists: public-i18n-cjk, member-japanese-layout-en*, member-japanese-layout-ja*

  3. MLW-LT (MultilingualWeb Language Technology) Working Group defined the Internationalization Tag Set (ITS) 2.0. This delivers metadata for web content (mainly HTML5) and "deep Web" content (for example a CMS or XML file from which HTML pages are generated). The metadata facilitates interaction with multilingual technologies and localization processes. The group also produced reference implementations. The group was closed on 17 January 2014, having successfully published the Internationalization Tag Set (ITS) 2.0 specification as a Recommendation. The Working Group has started discussing ITS 2.0 best practices topics within the Internationalization Tag Set Interest Group. This is an open forum aiming to generate discussion around possible future work in this area. To participate, contribute to the ITS IG wiki and the ITS IG mailing list. [Home page] [Charter]

  4. Internationalization GEO Working Group worked to make the internationalization aspects of W3C technology better understood and more widely and consistently used through guidelines, education and outreach. This WG was closed when the work was merged into that of the Internationalization Working Group in 2007. [Home page] [Charter]