Position paper for I18N Workshop
The authors are part of a small group, funded by the European Community, working on the development and standardization of APIs to support i18n, on libraries that implement these APIs, and on their deployment in a number of trial applications, including web servers and browsers, E-mail systems, and X.500 systems. This is the MAITS project.
One of the authors is also the UK Panel Chairman covering the i18n activity within ISO SC22 WG20, which has a New Work Item proposal, currently ending its ballot period, for the development of ISO Standards for APIs to provide i18n functionality.
It is our belief that the most effective way of ensuring rapid deployment of i18n facilities is the establishment of such APIs and libraries for use in a wide variety of applications. The deliberations at the workshop on the requirements for the WWW application will be a major input into the requirements analysis for these APIs, and will be important both for the MAITS work and for the SC22 WG20 work. We would very much like to attend in person in order to understand fully the discussions that will take place.
The work in MAITS has only just begun, and we have no firm positions to state at this time, apart from the belief that agreement on application-independent APIs to support i18n will be crucial to the rapid deployment of this technology. MAITS has identified several levels of API and library support that will be needed:
- simple character set conversion facilities
- transliteration and transcription for specific language pairs
- translation of application-specific phrases (such as menus and error messages)
- access to machine translation.
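As an illustration only, the first and third of these levels might be sketched as follows. The names convert_charset, translate_phrase and PHRASES are hypothetical and are not drawn from MAITS or from any SC22 WG20 draft; Python's built-in codecs stand in for a real conversion library, and the phrase catalogue is invented sample data.

```python
# Hypothetical sketch of two of the support levels listed above.
# None of these names come from MAITS or from any ISO draft.


def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character encoding to another."""
    # Decode to Unicode text first, then encode in the target encoding.
    return data.decode(source).encode(target)


# Assumed application phrase catalogue: phrase identifier -> language -> text.
PHRASES = {
    "file.open": {"en": "Open", "fr": "Ouvrir", "de": "Öffnen"},
    "error.not_found": {"en": "Not found", "fr": "Introuvable"},
}


def translate_phrase(key: str, language: str, fallback: str = "en") -> str:
    """Look up an application-specific phrase, falling back to a default language."""
    variants = PHRASES[key]
    return variants.get(language, variants[fallback])
```

For example, convert_charset(data, "iso-8859-1", "utf-8") re-encodes Latin-1 bytes as UTF-8, and translate_phrase("error.not_found", "de") falls back to the English text because the sample catalogue has no German entry. Transliteration and machine translation would need considerably richer interfaces and are not sketched here.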
The role of Java classes for the APIs, with network down-loading of associated libraries from a variety of (commercial) vendors, could be an important key to the development of a world-wide infrastructure supporting i18n.
In discussing the Web in particular, the following points were noted:
- Both font information and the MAITS i18n support packages require naming conventions to be established, together with flexible down-loading mechanisms, to enable many vendors to supply (on a free or a charged basis) packages to support the defined APIs for specific platforms, and to permit flexible extension of the support as new character encodings and machine translation facilities emerge. In this context it is important to differentiate between machine-readable code that is the authoritative definition of specific mappings, and libraries provided by many vendors to support a particular mapping through the standard API. Some of the latter will be free, some expensive. Some slow, some fast. Some will handle a wide range of mappings, some rather less. The selection of mappings and associated libraries to go from a given source to a given target is an architectural question worth some consideration.
- In considering questions of language, it is important to recognise that for any specific Web page, a server is likely to have it available (statically or dynamically) in several language forms, but that these are NOT all equal. There is a "quality" issue, because:
  - One will be the master human-produced document.
  - Others will be human translations, and may or may not be the latest version of the master.
  - Others will be machine translations, and again may or may not be up-to-date.
  Given this scenario, there are issues of how such documents should be named and stored on a Web server, and what selection/preference mechanisms need to be provided for a human browser.
- In considering (dynamic) machine translation (MT), there arises the question of "architecture". It is likely that all of the following scenarios will arise, and will need support in browsers and in protocols:
  - Translation by an MT co-located with the Web server
  - Translation by an MT as a browser "helper application"
  - HTTP access by either the server or the browser to a free-standing, third-party, perhaps commercial MT engine.
  The precise selection mechanisms that a user needs in order to determine the actions to be taken in such an environment need further discussion.
- It is easy to focus discussion primarily on issues of language selection for reading material, but the character set and font issues are at least as important and difficult for INPUT of material into forms. Again, the appropriate architecture to support use of dynamic MT for forms input is an important question. (An aside, perhaps outside the remit of this workshop, is the question of voice input into forms, and the translation issues for voice as well as text.)
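To make the "quality" point about language forms concrete, the following is a minimal sketch of one possible selection policy, not a proposal. The Variant record, the ORIGIN_RANK ordering and the choose_variant function are all hypothetical. The policy shown (honour the reader's language order first, then prefer an up-to-date form, then prefer the master over a human translation over a machine translation) is only one assumption among several that could be defended.

```python
from dataclasses import dataclass

# Assumed ranking: a lower number means a more trustworthy origin.
ORIGIN_RANK = {"master": 0, "human": 1, "machine": 2}


@dataclass(frozen=True)
class Variant:
    language: str     # language tag, e.g. "en"
    origin: str       # "master", "human" or "machine"
    up_to_date: bool  # True if it reflects the latest version of the master


def choose_variant(variants, preferred_languages):
    """Return the best variant for the reader's ordered language preferences.

    Within one language, an up-to-date variant beats a stale one, and then
    origin decides. Returns None if no preferred language is available.
    """
    for language in preferred_languages:
        candidates = [v for v in variants if v.language == language]
        if candidates:
            return min(
                candidates,
                key=lambda v: (not v.up_to_date, ORIGIN_RANK[v.origin]),
            )
    return None
```

Under this policy a reader who prefers French and then English would be given an up-to-date machine translation into French in preference to a stale human one. Whether that is the right trade-off, or whether the reader should instead be offered the master in a second-choice language, is exactly the kind of preference mechanism that needs discussion.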
It is important to note that at this time these are simply summaries of discussions, and do not represent a considered view of the MAITS participants, still less of any ISO work on i18n.
John Larmouth
Grahame Cooper