This talk was given by Richard Ishida at the 14th International World Wide Web Conference (WWW2005), in Chiba, Japan, May 2005.
This talk is intended for anyone who would like to understand the latest developments in the Internationalization Activity at the W3C.
The W3C Internationalization Activity has recently been rechartered, and what was once a single Working Group has been replaced by three. Important new work has been started, for example in the areas of internationalization- and localization-related tag sets, and mechanisms for associating locale information with Web Services. This talk provides an update on the current activities of the W3C in internationalisation, and summarises current progress on new work items.
Please send any comments to firstname.lastname@example.org.
I18n is an industry-wide abbreviation for Internationalization.
Last year there was a single Internationalization Working Group within the Internationalization Activity. At the beginning of 2005, the Activity was rechartered with three Working Groups.
I18n Core WG: This group reviews specifications in development at the W3C for internationalization issues. It also develops its own specifications and W3C Notes related to internationalization topics. (Chair: Addison Phillips, Quest Software) (charter)
I18n Guidelines, Education & Outreach (GEO) WG: This group continues the work of the former GEO Task Force, and aims to make the internationalization aspects of W3C technology better understood and more widely and consistently used by producing articles, tutorials, tests and other resource information, and by making that information widely available. (Chair: Richard Ishida, W3C) (charter)
Internationalization Tag Set (ITS) WG: This is a new group, developing a set of elements and attributes that can be used with new DTDs/Schemas to support the internationalization and localization of documents; it will also provide best practice guidelines for developers of DTDs/Schemas that show how to enable internationalization of their documents. (Chair: Yves Savourel, Enlaso) (charter)
Another change is that the Internationalization Interest Group is now open for public participation, and uses the email@example.com mailing list (archive).
There have also been some recent personnel changes in the Team.
Felix Sasaki joined the W3C Team as of 1 April to work within the Internationalization Activity. Felix is based at Keio University, in Japan.
Felix studied Japanese and Linguistics in Berlin, Nagoya (Japan) and Tokyo. From 1999 he worked in the Department of Computational Linguistics and Text-technology at the University of Bielefeld (Germany), where he finished his PhD in 2004. The PhD deals with the integration of heterogeneous linguistic resources using XML-based (e.g. linguistic corpora) and RDF-based (e.g. lexica, conceptual models) representations.
Felix replaces Martin Dürst, who has left the W3C to take up a post at Aoyama Gakuin University in Japan. We wish Martin success for the future, and thank him for his dedication and hard work in leading the internationalization effort for many years. Richard Ishida now becomes Internationalization Activity Lead.
The ITS Working Group is aiming to produce a set of tags that people can use to ensure both internationalization and localizability in their schemas - be they based on DTDs, XML Schema or other technologies. Here we will give some examples of the kind of requirements we are currently developing.
The top line on the slide shows some text with an embedded Hebrew quotation (saying "Internationalization Activity, W3C"). In Hebrew, 'W3C' should appear to the left of the Hebrew text. However, the Unicode bidirectional algorithm will place 'W3C' to the right of the Hebrew. Some additional markup is needed to create an embedded directional level so that the bidirectional algorithm can produce the desired effect. In XHTML this would be the attribute dir with its value set to rtl. Provision of such markup falls within the scope of the ITS work, and addresses international use of the schema in question.
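As a rough illustration of why the extra markup is needed, the following Python sketch uses the standard unicodedata module to list the strong directional classes in a mixed quotation, and then wraps the quotation in an element carrying dir="rtl". The Hebrew sample word, the span element and the function names are assumptions made for this sketch; they are not the text or markup shown on the slide.

```python
import unicodedata
from xml.sax.saxutils import escape


def strong_directions(text):
    """Return one 'L' or 'R' per strongly directional character, in order.

    Digits, spaces and punctuation are weak or neutral in the Unicode
    bidirectional algorithm, so they are skipped here.
    """
    kinds = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            kinds.append("L")
        elif bidi_class in ("R", "AL"):
            kinds.append("R")
    return "".join(kinds)


def wrap_rtl(quote):
    """Wrap a quotation in a span that sets a right-to-left embedding."""
    return '<span dir="rtl">{}</span>'.format(escape(quote))


# Hypothetical sample: a Hebrew word followed by the Latin-script 'W3C'.
HEBREW_SAMPLE = "\u05e9\u05dc\u05d5\u05dd"
quote = HEBREW_SAMPLE + ", W3C"

# The Latin letters are strongly left-to-right, so inside a left-to-right
# paragraph they join the surrounding text instead of the Hebrew run.
print(strong_directions(quote))

# The dir attribute tells the algorithm the whole quotation is right-to-left.
marked_up = wrap_rtl(quote)
```

The first function only reports character classes; the actual reordering is done by the user agent, which is why the fix has to be expressed as markup in the document rather than as a change to the text.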
The next example is of some documentation text that will be translated. The text refers to a hard panel on a device (shown in a picture). The text on the hard panel is not translated.
The documentation is likely to be translated in translation tool environments that do not associate the picture with the text being translated, and do not clearly indicate that certain words are commands on the hard panel and therefore should not be translated. In this case, the translator may unwittingly translate text that should remain in English.
A solution to this problem could involve some method of tagging the relevant hard panel commands in the text with information about whether or not they should be translated. There are a number of ways of implementing this. The solution shown on the slide involving a translate attribute may not be the best answer, but it indicates the type of requirement we have.
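To make the requirement more concrete, here is a minimal Python sketch of how a translation tool might use such a flag. It assumes a hypothetical document format with a panelkey element and a translate attribute whose value "no" protects the content; the element name, the attribute value and the sample sentence are all invented for illustration, and the Working Group has not settled on this design.

```python
import xml.etree.ElementTree as ET

# Hypothetical documentation fragment: the hard panel labels are flagged.
SAMPLE = (
    '<para>Press the <panelkey translate="no">START</panelkey> button, '
    'then wait for the <panelkey translate="no">READY</panelkey> light.</para>'
)


def translatable_segments(element):
    """Yield the text segments that should be offered for translation."""
    if element.get("translate") == "no":
        # Protected content (and anything nested in it) is skipped.
        return
    if element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_segments(child)
        # Text after a child belongs to this element, so it is handled here
        # even when the child itself is protected.
        if child.tail and child.tail.strip():
            yield child.tail.strip()


def protected_terms(element):
    """Return the text of every element flagged as not translatable."""
    return [el.text for el in element.iter() if el.get("translate") == "no"]


root = ET.fromstring(SAMPLE)
print(list(translatable_segments(root)))
print(protected_terms(root))
```

A real tool would more likely present the protected terms to the translator as locked placeholders within the sentence rather than dropping them, since the translator still needs to see where they fall.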
The ITS Working Group will also develop guidelines (we call them 'techniques') for schema developers. Not all internationalization or translation issues can be addressed by a tag set, and these should capture a lot of other useful pointers.
This slide shows an example of a caption for a picture where the caption itself is expressed as an attribute value. The caption on the slide contains both English and Japanese text, but because it is in an attribute, there is no way to label the language of the content properly. Attribute text causes problems for other things, too, such as applying bidirectional tags, style tags, abbreviation tags, etc.
A much better approach would be to make the caption an element. This would enable the application of whatever meta information is needed.
The general rule in this case, then, is to avoid putting user readable text in attributes, and to design your schema so that that is not necessary.
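The difference can be sketched in Python with the standard xml.etree.ElementTree module. The image, caption and span element names, the file name and the caption wording are hypothetical; only the contrast between attribute text and element content is taken from the slide.

```python
import xml.etree.ElementTree as ET

# Attribute name that the XML parser reports for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Caption as an attribute value: one opaque string, no room for markup.
AS_ATTRIBUTE = '<image src="panel.png" caption="The word 日本語 means Japanese"/>'

# Caption as an element: each run of text can carry its own language label.
AS_ELEMENT = (
    '<image src="panel.png">'
    '<caption xml:lang="en">The word '
    '<span xml:lang="ja">日本語</span>'
    ' means Japanese</caption>'
    '</image>'
)


def language_runs(element, inherited=None):
    """Return (language, text) pairs for element content, inheriting xml:lang."""
    lang = element.get(XML_LANG, inherited)
    runs = []
    if element.text:
        runs.append((lang, element.text))
    for child in element:
        runs.extend(language_runs(child, lang))
        if child.tail:
            # Text after a child reverts to the language of this element.
            runs.append((lang, child.tail))
    return runs


# Attribute text is not element content, so no language-labelled runs exist.
print(len(language_runs(ET.fromstring(AS_ATTRIBUTE))))

# Element content yields three runs, labelled en, ja and en.
for lang, text in language_runs(ET.fromstring(AS_ELEMENT)):
    print(lang, len(text))
```

The same structure would also give a place to attach directional, style or abbreviation markup to part of the caption, which an attribute value cannot hold.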
A number of early ideas on this topic were set out by Richard Ishida and Yves Savourel in 2000-2001. See, for example, Requirements for Localizable DTD Design and a shorter article in Multilingual Computing magazine entitled Localizable DTD Design. The Working Group was formed partly in response to a growing desire in the localization industry to provide solutions to the requirements described in this early work.
The work looks at markup and best practices that will support international use of the format defined by your schema, such as inclusion of bidirectional text markup, ruby markup, etc. The localization industry is also well represented on the Working Group. This should help ensure that the information needed for efficient translation and localization of material is available in the content. It should also help standardise that information, making it easier to feed into translation tools and to integrate with standard representations of data during the localization process, such as the XLIFF standard.
As of May 2005 we are still developing requirements for well-internationalized and localizable schemas. Later in the year we will move to questions of how to implement solutions. We currently expect this to provide a number of alternative recommendations relating to constructs that schema designers should incorporate in their DTDs, XML Schema or RELAX NG formats. There will also be a Note that captures additional guidance that cannot be expressed by simply providing such building blocks.
To see the ongoing work of the ITS Working Group, visit their home page.
We welcome the questions and participation of people currently developing schemas.
Over the past charter period the GEO Task Force of the I18n Working Group developed a large number of articles. These articles all went through several rigorous reviews before release. Many of the articles are based on frequently asked questions.
More articles are in development as we speak.
In addition to the long list of articles the Internationalization Activity provides other resources, such as:
Most of these have been developed by the GEO Working Group. This is an ongoing activity. You can see what is coming up at any given time by looking at the GEO Work Items Pipeline.
The GEO group has so far developed a lot of material aimed at content authors using XHTML/HTML and CSS. It will continue its work in developing this material, but we are hoping to diversify into areas such as XSLT, XQuery, XSL-FO and SVG. This work will depend on having people in the group that have an interest in working in these areas. We will also try to focus more on issues relating to navigating around multilingual sites.
We are also hoping to build on the material aimed at content authors to begin developing guidelines for editing tool developers.
The GEO group has been very successful in developing material so far. Part of its mission now is to:
make the available resources more visible and accessible, and
put more effort into identifying the topics on which users of Web technology would like advice.
To help with outreach, a number of improvements have been made to the Internationalization Activity site recently, and these will continue into the future.
From the top level pages you can now search within the /International site.
The left side of the Internationalization home page now contains a collapsing list that helps you find your way around the site more effectively. We are in the process of changing links on lower level pages to reflect the basic anchor categories in this list.
We provide an index to the available resources, organized by topic. The second level headings in the index are accessible in the collapsing menu on the left of the home page, but also for the time being in a separate, more quickly accessible list on the right.
One particularly important aspect of the work relates to the development of authoring techniques for content authors working with XHTML/HTML and CSS. This is still a work in progress, but efforts have been made to provide task-based information in a number of ways.
Using the techniques index, a content author could, for example, look up information about specifying language attribute values. They can choose to link to an outline summary of recommendations for this task, or to a list of external resources, such as the list of IANA-registered tags. The outline view incorporates summary information about support in major browsers for the features described. From the outline view they can go into greater detail on any particular topic. For example, if they want to know more about the use of zh-Hans and zh-Hant tags, they can click on the summary text and are taken to the appropriate section of a techniques document. There they find information about how to use this feature, a discussion of pros and cons, more detailed information about user agent support issues, and a list of relevant additional resources.
We have also made available lists of announcements grouped into a number of useful 'buckets', depending on the expected audience. Some of these allow you to track changes to articles and the like, which is particularly useful for translators. Another tells you when a new resource has been put out for pre-publication review, and so on.
There are RSS feeds associated with each of these filters, so you don't need to continually visit the site to discover when things change.
Future plans include the development of some 'Getting Started' materials, aimed at bridging the gap between absolute beginners and the more technical resources we have on the site. This material should also be useful for managers and the like.
We are also in the process of adding a short feedback form to the resource pages. We are very interested in hearing from people who have constructive suggestions for how to improve the site and its resources.
We are also working slowly in the background on redesigning the look and feel of the site. This work involves discussions with the WAI (Web Accessibility Initiative) folks at the W3C, who have done a lengthy user-centred analysis and design for their own pages (results to be implemented soon).
At this point you can get a very brief insight into some of the work of the GEO group by reading one of the available tutorials. As we try to explain how to use Web technologies, we tend to find areas that have not been well explained in the past, and whose implementation may even still be unclear.
Language declarations, for all their apparent simplicity, lead to this type of work. After a good deal of thought and discussion, we have produced material that emphasizes, for example, that there are two very different aspects to declaring language.
To read more, please follow this link to the tutorial titled Declaring Language in XHTML and HTML.
The Core Working Group has been working on the Character Model for the World Wide Web for some years now. A few months ago we published the first of three parts to this document, entitled Fundamentals. Another part, Resource Identifiers, is in Candidate Recommendation phase. The final part, Normalization, is still a Working Draft.
The International Resource Identifiers (IRI) specification, an IETF document developed by Martin Dürst and Michel Suignard with support from the Internationalization Working Group, also became an IETF Proposed Standard (RFC 3987) earlier this year. As a Proposed Standard it is now a stable resource that can be cited in normative references.
In addition to producing specifications, the Core Working Group has reviewed other specifications being developed by the W3C. These include
The group is encouraging people to participate in the review process. If there is a technology that you are interested in learning about, and you could read the specification while looking out for internationalization issues, the group would like to hear from you. You can see the list of specifications the Core group is planning to review using the radar chart.
To get a flavour for some of the things the Working Group has discussed as a result of reviews you can look at the list of initial comments on the XHTML 2.0 specification (pre-last-call).
We had already worked with the HTML group on such things as removing human readable text from attributes, and allowing the title element to contain markup - these improvements allow for better support of language markup and bidirectional tagging.
Some of the more substantive discussions during the recent review related to the use of quotes, the usefulness of the title attribute, the representation of abbreviations, the use of entities, etc. You can follow the subsequent discussions in the core public archive.
The Core Working Group also tries to assist other Working Groups, where resources allow, to develop international aspects of their technologies. For example, members of the i18n Working Group have contributed to the development of international features in CSS3 Modules.
CSS3 will provide a large number of features to support orthographic and typographic needs of people using non-Latin scripts. To get an idea of the type of thing this involves, see the tutorial entitled CSS3 and International Text.