Internationalization Activity Proposal

Martin Dürst, W3C, Internationalization Activity Lead
last revised $Date: 2004/11/24 02:10:27 $ by $Author: rishida $

1. Summary

Internationalization is a critical part of W3C's long-term goal of Universal Access, and it is vitally important to global commerce, communication, and understanding. Following Section 5.6 of the Process Document, this is a proposal to modify and extend the Internationalization Activity.

To allow W3C's work on Internationalization to continue, and to facilitate work in this area, this Activity Proposal suggests splitting the current Internationalization Working Group (I18N WG) into three new WGs: the Internationalization Core Working Group (I18N Core WG, charter), the Internationalization Guidelines, Education & Outreach Working Group (GEO WG, charter) and the Internationalization Tag Set Working Group (ITS WG, charter). These three WGs as well as the Internationalization Interest Group (I18N IG, charter) are chartered until 31 October 2006, and the duration of the Internationalization Activity is extended accordingly.

Following the general tendency to make discussions publicly visible, it is proposed that all the WGs as well as the IG in principle conduct their discussions in public.

2. Context

2.1 Concepts

Internationalization in the context of information technology means enabling the use of a technology with any language, script, and culture. Localization means the actual configuration or adaptation of technology or content to a particular language and culture. A script is a set of characters that are used together and have a similar appearance. [Examples: the Latin script is used to write English, Latin, French, Hawai'ian, and many other languages. The Cyrillic script is used to write Russian, Ukrainian, Bulgarian, and other languages. The Arabic script is used to write Arabic, Urdu, Persian, and other languages. The Hebrew script is used to write Hebrew, Yiddish, and some other languages.]
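The notion of a script can be illustrated with a short Python sketch. It relies on an assumption that is not part of this proposal: that the first word of a character's Unicode name indicates its script. This holds for the sample characters chosen here, but it is only a rough approximation and not the Unicode Script property itself. The function name rough_script is hypothetical.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Return the first word of a character's Unicode name as a rough script label.

    Assumption: the leading word of the name identifies the script. This is a
    heuristic that works for the samples below, not the real Script property.
    """
    return unicodedata.name(ch).split()[0]


# One sample letter from each script mentioned in the examples above.
samples = {
    "a": "Latin letter a",
    "\u044f": "Cyrillic letter ya",
    "\u0639": "Arabic letter ain",
    "\u05e9": "Hebrew letter shin",
}

for ch, description in samples.items():
    print(f"{description}: {rough_script(ch)}")
```

A production system would more likely consult the Unicode Script property through a dedicated library; the heuristic is used here only to keep the sketch within the standard library.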

2.2 Business Relevance

Successful businesses have understood for a long time that to reach customers, they must speak their languages and adapt to their cultures. The most cost-effective way to accomplish this is to implement, from the start, flexible technologies that most easily support the exchange and processing of information for the languages and cultures of both current and potential customers.

2.3 History

Starting with its name, the World Wide Web was from the very beginning intended as a world-wide technology. The W3C Internationalization Activity was created in October 1995. In February 1998, the I18N WG was created, and it has since been rechartered regularly. In 2002 the WG was organized into three task forces: the Core Task Force, the Web Services Internationalization Task Force, and the GEO (Guidelines, Education, and Outreach) Task Force.

Initially there were considerable gaps between intent, specifications, and implementations. Over the past few years, these gaps have been closed to a large extent for specifications, and somewhat less for implementations. The working mode for internationalization has changed over the years. Originally, special 'internationalized' versions of a specification had to be written. Currently, specifications produced in W3C are reviewed to make sure they are appropriately internationalized. The efforts of the Internationalization Activity, the Internationalization Working Group, and the Internationalization Interest Group have contributed strongly to ensuring that established W3C technology can be used readily around the world. It is important that this effort continues.

2.4 Current Context

The World Wide Web, and the use of W3C technology, continue to grow rapidly, in particular in areas of the world where they are currently less used. For example, it is expected that Chinese will replace English as the most widely used language on the WWW.

At the same time, the W3C is working on new technologies, such as Web Services and the Semantic Web. These technologies are expected to show deployment patterns similar to the 'classic' WWW. However, the lags between different parts of the world will most probably be shorter. This makes it imperative to identify the relevant internationalization issues and needs for these technologies more quickly, and to be much more proactive than in the past.

3. Scope

The scope of the Internationalization Activity encompasses all the work related to Internationalization being carried out at the W3C. This includes completion of existing work items, continuation of ongoing work such as reviews and outreach, and the timely start of new work, such as specification work related to the Internationalization of Web Services and work on an Internationalization Tag Set.

The proposal made in this document for three Working Groups and an Interest Group is based on the experience with the three Task Forces in the previous Internationalization Working Group, which have been working largely independently with different sets of expertise. The I18N Core WG will concentrate on issues related to implementation of internationalization features in all aspects. The ITS WG will concentrate on markup issues for Internationalization and Localization. The GEO WG will concentrate on guidelines and outreach work.

The work items to be completed are the various parts of the Character Model (I18N Core WG). The ongoing work items are reviews and liaison activities (mostly I18N Core WG) and guidelines, education, and outreach (GEO WG). The new work items are work related to the Internationalization of Web Services (I18N Core WG) and work on an Internationalization Tag Set (ITS WG). These new work items are discussed below in more detail.

3.1 Web Services Internationalization

The former Web Services Internationalization Task Force has carefully developed use cases and requirements for Web Services Internationalization, and for Language and Locale Identifiers for the World Wide Web. Web services can in many cases be designed without concern for internationalization issues, in a locale-independent manner. However, where such services are ultimately carried out for the benefit of human users, server settings and client preferences sooner or later have to be taken into account in a systematic way, and their handling has to be integrated into the overall architecture.

Language and locale identifiers are important for Web services, and are also useful in other cases where user preferences have to be exchanged on the Web. There is already wide experience on the Web with language identifiers. There is also some ad-hoc use of language identifiers to identify locales. Although this may work approximately in certain cases, it is not sufficient in contexts such as Web services.
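The following Python sketch illustrates why a language identifier alone may not be sufficient to identify a locale. All names in it (LanguageTag, LocalePreferences, parse_simple_tag) and the choice of extra fields (currency, time zone) are hypothetical and serve only as illustration; this proposal does not define what a locale identifier should contain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageTag:
    """A simplified language identifier: a language plus an optional region."""
    language: str
    region: Optional[str] = None


def parse_simple_tag(tag: str) -> LanguageTag:
    """Parse a simple 'language' or 'language-REGION' tag such as 'fr' or 'fr-CA'.

    Assumption: only these two simple shapes are handled; longer tags are
    reduced to their first subtag (and second, if it has two letters).
    """
    parts = tag.split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 and len(parts[1]) == 2 else None
    return LanguageTag(language, region)


@dataclass
class LocalePreferences:
    """Hypothetical preferences a service might need beyond the language."""
    language: LanguageTag
    currency: str
    time_zone: str


# Two users with the same language identifier but different expectations.
user_a = LocalePreferences(parse_simple_tag("en-GB"), "GBP", "Europe/London")
user_b = LocalePreferences(parse_simple_tag("en-GB"), "EUR", "Europe/Madrid")

same_language = user_a.language == user_b.language  # True
same_preferences = user_a == user_b                 # False
print(same_language, same_preferences)
```

In this sketch, a service that received only the language identifier could not distinguish the two users, although it would need to format prices and times differently for each of them.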

It is crucial that this work be started and completed quickly to make sure that the necessary technology for the internationalization of Web services and other Web technology involving or implying processing of data is available in a timely fashion to the industry at large, and that, with respect to Internationalization, Web services can fulfill their interoperability promise.

3.2 Internationalization Tag Set

XML is used extremely widely, but XML DTDs/Schemas are most often designed without concern for Internationalization and Localization issues. A typical example of an Internationalization feature is the addition of attributes for language identification and for bidirectional rendering. A typical example of a Localization feature is the addition of an attribute indicating whether a certain text has to be translated or not, e.g. blocking translation for literal command examples. This lack of concern leads to document and data formats that work for their original purpose in the original language, but that may be difficult or impossible to use with other languages, or to translate and localize from one language to another.
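As a minimal sketch of how such attributes might be used by a localization tool, the following Python code reads a small XML document and collects the text that should be sent for translation. The xml:lang attribute is the standard XML language attribute. The element names, the attribute name translate, its values, and the rule that it is inherited by child elements are assumptions made for this illustration and are not defined by this proposal.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document format; 'translate' is an illustrative attribute name.
DOC = """<manual xml:lang="en">
  <para>Type the following command to list files:</para>
  <command translate="no">ls -l</command>
  <para xml:lang="fr">Appuyez sur Entr\u00e9e.</para>
</manual>"""


def translatable_units(root):
    """Return (language, text) pairs for elements not marked translate="no".

    Both xml:lang and the hypothetical translate flag are treated as
    inherited from the parent element. Tail text is ignored for brevity.
    """
    units = []

    def walk(elem, lang, translate):
        lang = elem.get(XML_LANG, lang)
        translate = elem.get("translate", translate)
        text = (elem.text or "").strip()
        if text and translate != "no":
            units.append((lang, text))
        for child in elem:
            walk(child, lang, translate)

    walk(root, None, "yes")
    return units


units = translatable_units(ET.fromstring(DOC))
for lang, text in units:
    print(lang, text)
```

The literal command is skipped, while the two paragraphs are reported together with their language, which is the kind of information a translation process needs from the source document.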

Over several years, the W3C has collected considerable experience in the Internationalization of XML-based formats such as XHTML, SVG, SMIL, MathML, and others. Rather than continuing to internationalize our own formats one by one, the time has come to collect and compile this experience and, as far as possible, make it available both inside and outside the W3C in an easily usable form. At the same time, the Localization Industry, which has gained experience with XML for things such as translation memory and translation process flow, is now looking at ways to leverage information in source XML documents for localization.

The combination of Internationalization and Localization related concerns makes sense because these issues, and the involved communities, are closely related, and the resulting single product will be easier to use than two separate products. The W3C is the logical place for this work because of its knowledge and experience in Internationalization, the unparalleled visibility of the work product among all users of XML, and the work on XML Accessibility Guidelines in WAI.

4. Proposals

4.1 Structure of the Activity

The Internationalization Activity includes the following Groups, with the following main deliverables:

  1. Internationalization Core Working Group (I18N Core WG):
    1. Finish current work (Character Model)
    2. Continue reviews of W3C technology with respect to Internationalization
    3. Web Services Internationalization: Produce use cases and a requirements document in the area of Web Services internationalization, including the identification/exchange of language/culture specific context (locales).

    See proposed charter for details, including scope, deliverables, and milestones.

  2. Internationalization Guidelines, Education & Outreach Working Group (GEO WG):
    1. Produce articles and advice, linked from the Activity home page, that help to make the internationalization aspects of W3C technology better understood and more widely and consistently used. These articles can be accessed through a topic index, or through other lists, e.g. frequently asked questions, tutorials, etc.
    2. Produce WG Notes providing internationalization-related advice for users of Web technologies. To make the information available in a task-based fashion, at the point of need, the information in these separate documents will be drawn together by higher-level web pages, targeted at specific user types and activities, that group all relevant information in summary form and link to detail, with an organization that aids their use.
    3. Create tests that explore support by user agents for internationalization-related features, but also serve an educational role - the audience will be content authors as well as QA testers. In addition, other W3C WGs have found these tests of interest for their own test suites, and user agent development teams have found these tests valuable in terms of identifying, fixing or enhancing internationalization support features or bugs.
    4. Work with the Core and ITS WGs to assist in the publication of educational and outreach materials produced by those groups.

    See proposed charter for details, including scope, deliverables, and milestones.

  3. Internationalization Tag Set Working Group (ITS WG):
    1. Develop a set of requirements for development of the internationalization tag set described below. This document will address requirements arising from any work on localization formats developed by localization groups (e.g. translatability flags) as well as requirements for supporting content in the many scripts and languages around the world (e.g. support of bidirectional text).
    2. Building on the requirements work, develop a set of XML elements and attributes that can be used as a namespace or included in a DTD or schema to ensure a standard and effective approach to enabling internationalization and localization.
    3. Develop techniques for the developers of DTDs and W3C Schemas that help them develop formats that support international content and localization. This information will recommend techniques that can be addressed by the tag set proposed in the previous deliverable, or by other approaches, but will add advice that goes beyond the use of the tag set (e.g. exclusion of natural language text from attributes).

    The ITS WG will incorporate members of the OSCAR (LISA) and XLIFF (OASIS) standards groups that have developed key localization standards in the past. This is to ensure good access to the needs and interests of the localization industry.

    See proposed charter for details, including scope, deliverables, and milestones.

  4. Internationalization Interest Group:
    The purpose of this IG is to help the Working Groups within the Internationalization Activity with advice and opinions from a larger group of people with knowledge in different languages and cultures as well as different parts of the Web architecture. The I18N IG also provides a forum to discuss general issues related to the internationalization of the Web, outside of the specific charters of the WGs. To guarantee a smooth flow of information, WG members are expected to also become members of the IG. The IG does not have any deliverables. See proposed charter.

4.2 Proposed Timeline

It is proposed that the three WGs as well as the IG be chartered until 31 October 2006, and that the duration of this Activity be extended accordingly. The charters for each group provide details.

4.3 Resources

4.3.1 W3C Resource Commitment

This Activity will consume 1.6 full-time equivalents of Team resources. This includes 0.2 full-time equivalents required for leading the Internationalization Activity and 1.4 full-time equivalents for the Team Contacts of the various groups. Because of the way the three task forces have worked in the previous I18N WG, the move from one WG to three WGs does not significantly affect W3C's resource commitment.

4.3.2 Member Resource Commitment

Each Member Organization choosing to participate in one or more of the Working Groups under this Activity Proposal is expected to identify one or more individuals as participants. Participation in the Working Group implies a commitment of up to 20% or one day a week for Working Group related tasks.

5. Patent Policy and Patent Disclosures

The I18N Core WG and the ITS WG operate under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. There is one exception: the I18N Core WG work on Character Model for the World Wide Web 1.0: Fundamentals (Proposed Recommendation) and Character Model for the World Wide Web 1.0: Resource Identifiers (Candidate Recommendation) is carried out under the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure.

The GEO WG provides an opportunity to share perspectives on Internationalization. This Working Group is not chartered to produce Recommendations with associated licensing obligations as described by the W3C Patent Policy. W3C reminds Working Group participants of their obligation to comply with patent disclosure obligations as set out in Section 6 of the W3C Patent Policy. While this Working Group does not produce Recommendation-track documents, when Working Group participants review Recommendation-track specifications from other Working Groups, the patent disclosure obligations do apply.

Interest Group participants disclose patent and other IPR claims by sending email to <patent-issues@w3.org>; please see the Process Document, Section 2.2, for more information about disclosures.

6. Stakeholders and Benefits

The beneficiaries of this work include all those who work or plan to work with W3C technology in general, both on a local and on a world-wide level. This includes large corporations delivering implementations or content in many languages around the world as well as smaller organizations. The work of the I18N Core WG should in particular benefit world-wide deployment of Web services technology. The work of the GEO WG should make sure that the previous and ongoing work in this Activity is understood better and used more. The work of the ITS WG should benefit users of XML world-wide, and should simplify localization and translation of XML documents.