The Next Topics for WWW Internationalization

This is a position paper for the Workshop on Internationalization at the 5th WWW conference in Paris on May 6th, 1996.

About the Author

Martin J. Dürst is one of the coauthors of the Internet draft for the internationalization of HTML. His main research fields and interests are worldwide text processing and multimedia, in particular object-oriented architectures for multilingual/multiscript software and computer-supported font design. He is currently a senior research associate at the MultiMedia Laboratory, Department of Computer Science, University of Zurich, Switzerland. He obtained his PhD in Computer Science from the University of Tokyo, Japan, and his Masters in Computer Science, Business Administration, and Japanese Studies from the University of Zurich, Switzerland.

Main Issues in Full WWW Internationalization

With respect to the full internationalization of the WWW, i.e. the ability to use the WWW with any language and script currently used worldwide, the following is what I am currently thinking about and/or working on:

Current Issues and Non-Issues in Web Internationalization

Most basic issues are solved or close to a solution. The I18N internet draft for HTML, together with related documents, provides these solutions.

In some cases, multiple solutions are proposed to address the needs of a wide range of user communities, software architectures, and organizational structures, as well as for backwards compatibility and smooth deployment. An example of this is the various ways in which the encoding of a document on the wire (the MIME "charset" parameter) can be specified, both in the HTTP protocol and with hooks in HTML.
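To make the "multiple ways" concrete, the following is a minimal sketch in Python of how a client might look for a declared "charset", first in the HTTP Content-Type header and then in a META HTTP-EQUIV hook inside the HTML. The function and class names are hypothetical, and the rule that the HTTP header takes precedence over the META element is an assumption of this sketch, not something stated in this paper.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_header(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remember the charset given by the first META HTTP-EQUIV="Content-Type"."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_header(attributes.get("content"))


def declared_charset(http_content_type, html_text):
    """Assumed precedence: HTTP header first, then the META hook in the HTML."""
    from_http = charset_from_header(http_content_type)
    if from_http is not None:
        return from_http
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = ('<HTML><HEAD><META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=Shift_JIS"></HEAD></HTML>')
print(declared_charset("text/html; charset=ISO-8859-1", page))
print(declared_charset("text/html", page))
```

The first call finds the charset in the HTTP header; the second falls back to the META element because the header carries no "charset" parameter.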

In other cases, a single attribute, such as LANG, provides information for a wide variety of possible services to the user, from sophisticated search services to high-quality typographic rendering.

The above-mentioned draft basically addresses the requirements for markup common to, and necessary for, all kinds of multilingual documents or documents written in single scripts. To the best of its authors' knowledge, the proposed solutions are feasible. Of course, they are still very open to suggestions both in terms of content and presentation.

The draft does not address issues that can and should be solved purely on the server side (e.g. MIME "charset" identification and conversion) or on the browser side (e.g. sophisticated rendering and font issues). Also, issues for which special markup may seem desirable for a certain group of multilingual documents, but which can be implemented with the current facilities in HTML, are not discussed. The most discussed case here is the alignment of different translations of the same document.

Internationalization of URLs

This is at the moment the most important unsolved issue with regard to the internationalization of the Internet. The issue is highly controversial, with many interest groups involved, most of which have little interest in internationalization, and with many possible solutions, all of which have advantages and disadvantages. Some of my thoughts on this issue can be found in a separate document.

Multilingual Typography

This section summarises my thoughts and experiences on the subject of multilingual typography. It should be helpful for people designing browsers or multilingual documents.

In conclusion, multilingual typography barely exists as a subject and is therefore not very well understood. Good solutions for true multilingual typography can only be developed in connection with the production of true multilingual texts, and it will take quite some time (as for any other aspect of typography) for new conventions to be widely accepted.

Annotation of Ideographic Texts: Rubi

In my view, this is probably the single item that should be added to the i18n draft. While bidirectional formatting control is important for languages such as Hebrew and Arabic, and superscripts are needed for the adequate display of ordinal numbers in French and other languages, rubi are required in many cases to display ideographic text, especially Japanese, in a fashion accessible to a wide range of readers. My proposal for adding rubi syntax to HTML has the advantage that a good result is achieved even on browsers that don't support this feature. Any and all comments are welcome.
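The idea of a good result on browsers without rubi support can be illustrated with a small Python sketch. It does not reproduce the syntax of the proposal itself, which is not given in this paper; as a stand-in it assumes the ruby, rt, and rp element names from later HTML, and both function names are hypothetical. A browser that ignores unknown tags is simulated by stripping all tags, which leaves the reading in parentheses after the base text.

```python
import re


def rubi(base, reading):
    """Wrap base text and its reading in stand-in rubi markup with a fallback."""
    # The rp elements carry parentheses that only matter when rubi is unsupported.
    return f"<ruby>{base}<rp>(</rp><rt>{reading}</rt><rp>)</rp></ruby>"


def as_seen_without_rubi_support(markup):
    """Simulate a browser that ignores unknown tags and shows only the text."""
    return re.sub(r"<[^>]+>", "", markup)


markup = rubi("漢字", "かんじ")
print(markup)
print(as_seen_without_rubi_support(markup))  # prints 漢字(かんじ)
```

A browser that supports the markup can hide the parentheses and place the reading above the base text, while an older browser still shows a readable inline annotation.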


Last updated May 3rd, 1996, mduerst@ifi.unizh.ch