WSTF: DRAFT: Locale Tag Requirements

Introduction

This document outlines the requirements for a general-purpose locale exchange model for Web services. This document is the work of the W3C Internationalization Working Group Web Services Task Force (WSTF).

What is a Locale?

A "locale" is a collection of settings related to regional or cultural preferences. As a collection, it is an abstraction of specific "cultural preferences" or "regional settings" for use by computer systems; it is not meant as a perfect model of an individual's preferences. Locales provide software developers with the means of writing software that expresses itself in the most appropriate way for a given set of users.

Some of the things that locales commonly influence are:

String display of numeric values
String display of date and time values
Parsing strings to obtain embedded numeric or date/time objects/values
Default currency (such as pounds, Euros, dollars), and currency formats
Percent formats
Default measuring system (SI/metric or customary)
Default system resources (such as fonts, character encodings, and the like)
Collation (sorting)
User interface or content language (sometimes referred to as "natural language")
And many other things…

The locale or locale object is generally not the actual function that does the parsing, collating, formatting, etc. Instead, it is an agreed-upon shorthand that the software developer passes to locale-aware functionality in the operating environment.

For example, in Java if you want to get a string with today's date in it, you might write code that looks something like this:

Locale myLocale = new Locale("fr","FR");

DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, myLocale);

String myDateString = df.format(new java.util.Date());

Other platforms and programming environments are similar. For example, C and C++ (XPG4) programs call the "setlocale" function to activate various formatters embedded in functions like strftime().
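The same pattern applies to other items in the list above, such as the string display and parsing of numeric values. The following is a minimal sketch, not part of the original draft: the class and method names are hypothetical, and only the standard java.text.NumberFormat and java.util.Locale classes are assumed.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Illustrative sketch: class and method names are hypothetical.
final class LocaleNumberDemo {

    // Format a locale-neutral value for display using the given locale's conventions.
    static String display(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Parse a locale-formatted string back into a locale-neutral number.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in locale " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(display(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(display(1234567.891, Locale.GERMANY)); // 1.234.567,891
        System.out.println(parse("1.234,5", Locale.GERMANY));     // 1234.5
    }
}
```

Note that the value itself stays locale neutral; only its string form depends on the locale passed to the formatter.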

The most important thing to notice about locales in traditional internationalized programming is that they are part of the operating environment. A default user locale is maintained on a per-process or per-thread basis and is available to the programmer as part of this operating "context". In a single user environment, the programmer is not forced to obtain or instantiate the actual locale or locale object unless they wish to do some specific multi-lingual operation that requires control over the locale. That is, with proper internationalized design, the developer only chooses to pass a locale identifier or locale object to a function or method call when specific control over internationalized behavior is required.

As a result, the user's locale rarely appears in an API and it is extremely rare for the locale to be a reasonable member of a Web services data structure: data structures should be "locale neutral" and make extensive use of locale neutral field types (such as those in XML Schema).

It is true that portions of a locale identifier are often used in data structures. For example, you might need a "country code" to identify a postal address or a "language code" to identify a keyword list.

Since the operating environment supplies the current user's preference and calls to internationalization-aware functions use this context to do their work, the Java example above would much more frequently be written like this:

DateFormat df = DateFormat.getInstance();

String myDateString = df.format(new java.util.Date());

In this example, the "DateFormat" object gets the default locale from the user's runtime environment and does what the user expects: it provides a locally formatted date in a specific format.
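To make the role of the runtime default concrete, the sketch below changes the default locale and calls the same formatter twice. It is an illustration only: the class and method names are hypothetical, and calling Locale.setDefault() here merely stands in for two users whose environments are configured differently.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch: class and method names are hypothetical.
final class DefaultLocaleDemo {

    // No locale argument: the formatter picks up the runtime's default locale.
    static String displayWithDefault(double value) {
        return NumberFormat.getNumberInstance().format(value);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(Locale.GERMANY); // stand-in for a German user's environment
            System.out.println(displayWithDefault(1234.5)); // 1.234,5
            Locale.setDefault(Locale.US);      // stand-in for a US user's environment
            System.out.println(displayWithDefault(1234.5)); // 1,234.5
        } finally {
            Locale.setDefault(saved);          // restore the real default
        }
    }
}
```

The calling code is identical in both cases; only the operating context differs. This is the property that a distributed service loses when the caller's context does not travel with the request.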

The foregoing is important to understand for a discussion of internationalizing Web services because a distributed architecture, such as Web services, requires greater care in order to preserve end-user expectations and to provide a meaningful and coherent architecture.

What is a Web Service?

Web services generally have these features:

Loosely-coupled. Web services exist independently of each other and can run on entirely different implementation platforms and run-time environments. Differing implementations are free to change without impacting others as long as the interface remains the same.
Encapsulated. The internals of a Web service – such as the underlying technology, what back-end systems the service interfaces with, and the logic underlying the provided service – are completely invisible to the users of the service. The only thing that is exposed is the public interface (which consists of generic data).
Components. A Web Service can range in scope from an independent application to modular subcomponents of a larger application. In other words, a Web Service can encapsulate and include other Web services.
Standard protocols and data formats. The interfaces exposed by Web services conform to standards such as XML, HTTP, and SOAP that are open, widely published, and freely available for implementation.

By design, the execution of the service is hidden behind an abstraction layer, and it doesn't matter what programming language was used to create the service or what operating environment is being used. The idea is that we can use XML files to pass platform independent data to services without having special software or inside knowledge of the system design, programming language, or other implementation details.

Understanding the Internationalization Problem

For such a system to work in a global environment, it should be possible to create a service that is internationalized and localizable.

Internationalization is the process of creating software that supports users with different cultural or linguistic requirements.

Localization is the process of "translating" the software user interface and presentation for use in a specific language or target market.

Ideally, the development of a Web service should follow a similar paradigm to creating other "regular" software. Internationalized software generally obtains the user's locale (often it is transparently obtained from the execution environment, as demonstrated above) and uses it to control locale-affected processing.

For example, a method or function that performs a database query and returns a "result set" might want to return the database rows in natural language order. This order is dependent on language and cultural preferences. An internationalized version of this method would obtain the user's locale object and use it to set the database's internal collation (or perform the collation itself).
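As a rough sketch of the second option (performing the collation itself), the code below sorts the same rows once by raw code unit and once with java.text.Collator for a given locale. The class name, method names, and sample rows are hypothetical; a real service would more likely delegate ordering to the database.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustrative sketch: class and method names are hypothetical.
final class LocaleSortDemo {

    // Locale-neutral ordering: compares UTF-16 code units, so "B" sorts before "a".
    static List<String> sortByCodeUnit(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return sorted;
    }

    // Natural-language ordering according to the collation rules of the given locale.
    static List<String> sortForLocale(List<String> rows, Locale locale) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted, Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sortByCodeUnit(rows));           // [Banana, apple, cherry]
        System.out.println(sortForLocale(rows, Locale.US)); // [apple, Banana, cherry]
    }
}
```

The locale-sensitive method needs a Locale from somewhere. In a local program it would come from the runtime default; in a Web service there is currently no standard place to obtain it.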

If a developer has created a locale-sensitive method and wishes to convert it to a Web service, the method will not generally have a "locale" in its parameter list or it will be an optional parameter. How should a Web service "container" treat that method? How will the method obtain the locale of the user?

As noted previously, virtually all of the data in a Web services interaction is language and locale neutral. The problem is that many services require a locale preference to satisfy the needs of the end user.

There are a variety of different examples of this. Please refer to the Usage Scenarios document for the complete range of issues. Some of these issues are:

There is no standard way for the client to communicate the desired locale and content language for the service to use.
There is no standard way for the client to communicate the desired locale and language for the service provider (container) to use, especially in the case of a fault.
Services that need to be locale-sensitive must adopt a proprietary solution, expose their internal locale model, or use the host operating environment's locale. That is, there is no way for the service to obtain the client's locale preference except to specify one or more non-standard arguments in the service contract (WSDL).
There is no way for the service to communicate what language and formatting options are available.
There is no way for the service to communicate (or the client to infer) what locale and language settings were actually applied to the processing.

Existing (human-based) web interactions have confronted these issues with only modest success, by using and abusing the HTTP Accept-Language header coupled with (human-interactive) preferences selection and personalization systems. Since a human can "self-correct" over the course of a session, this is often acceptable. Web services, though, are machine-to-machine interactions, and cannot "self-correct" in the same way.

As a result, developers of internationalized Web services (especially those that support multi-lingual/multi-locale operation, in which the servers and services can process in a variety of locales) have to give external users, whose platforms and programming languages may be entirely different from their own, a way to specify the locale and language of the response.

Web services provide a specific area in which locale models and tags should play an important role, but where a solution is not currently in place.

Requirements

1. Discovery. Consumers of public Web services or application integrators who need to locate a service will access the service registry and, if a suitable service for the business or application is found, then retrieve the corresponding WSDL definition from the registry. This step (the consumer looking up a registry and locating a specific service) is referred to as "discovery."

Using Web services begins with discovery. Since UDDI registries are searched interactively by people, the language preference and locale of the end user may affect how they wish to search and which registry they want to access. A UDDI registry today probably does not recognize language differences in the description of services. A keyword search is often strictly heuristic and makes no language distinction. The response may include human-readable information in a particular language, but the user cannot request a specific language. Submitting two different descriptions of the same service leads to having two entries in the registry, which must be separately maintained.

2. Invocation. Based on the WSDL definition, the user can locate the Web service on the network, format the SOAP document necessary to execute the service, and call the service.

The consumer, having selected a service, can then obtain the WSDL document that describes how to invoke the service. This document describes the invocation semantics, the required data fields necessary to operate the service, where the provider is located and how the data is encoded. This information is used to generate a SOAP document to invoke the service. Since the service provider environment doesn't provide support for a locale or language preference, each Web Service must be designed to include the user's locale or language preference as part of the data necessary to invoke the service. This generally exposes the specific platform the service is running on to the end user of the service in a non-portable way.

The provider attempts to decode the message, execute the service requested (either locally or on another machine), and return the result. The decoding takes the form of reading the SOAP envelope. Sometimes the envelope may contain a header that causes the message to be "chained" or forwarded to another host system for processing.

Sometimes decoding or execution may fail or require additional processing, in which case the provider needs to know what language and locale the user needs. For example, if the SOAP document contains an error, the response generated by the provider generally contains a SOAP Fault message that includes a human-readable string describing the reason for the failure.

In addition, in some cases it may make sense for the provider to route the request based on a specific locale or language request from the user. Different systems may be maintained regionally with different data sets or different logic may need to be applied to processing based on the consumer's locale.

3. Execution. The "service provider" (you can think of this as the host machine) receives the request, processes it, and calls the underlying business logic—you can think of this as the actual service. The results are packaged in a SOAP response document, which is returned to the caller.

If the SOAP envelope describes a service that the provider recognizes, the provider then can execute the service. The data in the SOAP document is passed to the function or method and any return values are encoded into a SOAP response.

Unless, as noted above, the service's designer has made specific allowance for it, there is no way to tell either the service or the service provider what language or locale the response should use. While most of the data in the response will be locale neutral (that is, it will consist of abstract values like booleans, integers and floats that can be formatted by the receiving system), some aspects of the processing may require a locale.

There is another aspect to providing the locale in the envelope instead of as part of the data. Recall that in "regular" (non-client/server) programming, the locale is part of the environment. The developer doesn't need to think about the locale in writing code, since the environment sets and manages all of the defaults. Although server writers historically have not had this luxury, it is by far a better design for the server to maintain the locale and language as part of the session context for a particular service or interaction. Developers can then focus on creating the right business logic for the service and rely on the system maintaining locale and language context.

4. Response. Once a Web Service has run, the response is generated from the service's outputs. The service and host have no standard way of indicating what locale was used to perform the processing (the xml:lang attribute can be used to tag the language of the content).

As a result, the web services developer is solely responsible for supplying the fields, logic, and semantics that will be used to achieve these kinds of capabilities. Each service will vary in its approach and may not bother to supply a suitable mechanism. Without guidance from the client, the service has to make assumptions that may be unsuitable. For example, the locale of the server may be used to format the response.
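The xml:lang attribute mentioned above can at least record the language of human-readable content in a response. The sketch below shows one way to set it with the standard DOM API. The class name, method name, and the "message" element are hypothetical and are not drawn from any service description.

```java
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch: class, method, and element names are hypothetical.
final class XmlLangDemo {

    // Build a text element whose content language is tagged with xml:lang.
    static Element taggedText(String elementName, String text, Locale language) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element element = doc.createElementNS(null, elementName);
            // xml:lang lives in the reserved XML namespace.
            element.setAttributeNS(XMLConstants.XML_NS_URI, "xml:lang", language.toLanguageTag());
            element.setTextContent(text);
            doc.appendChild(element);
            return element;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element message = taggedText("message", "Bonjour", Locale.FRANCE);
        System.out.println(message.getAttributeNS(XMLConstants.XML_NS_URI, "lang")); // fr-FR
    }
}
```

This tags only the language of the text. It says nothing about which locale governed number, date, or collation behavior during processing, which is the gap described above.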

Finally, there is an important functional and semantic difference between a field supplied in the actual service invocation (that is, as part of the data) and one supplied in the envelope (that is, as part of the protocol).

When supplied as part of the data, developers must always take care to create, populate, read, and process the fields. Internationalization of an existing service therefore takes the form of deploying a new service (since the inputs have changed).

By contrast, if locale and language preferences are part of the "context" (in the envelope, for example), the developer gains several advantages. First, both the provider and the service can read the locale and language preference. (The service must be provided with a specific API to obtain the locale and language from the provider, or it can be silently managed by the provider.) Services that require external environmental changes to activate their locale-sensitivity can have the provider perform this processing for them. Multiple services in the same "chain" can inherit the same locale and language context.
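A minimal sketch of such a provider-managed context follows, assuming the provider has already extracted a locale from the envelope by some means. All names here (LocaleContext, runWith, PriceService) are hypothetical and do not correspond to any existing Web services platform API.

```java
import java.text.NumberFormat;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical provider-side context holding the caller's locale for the current thread.
final class LocaleContext {
    private static final ThreadLocal<Locale> CURRENT = ThreadLocal.withInitial(() -> Locale.ROOT);

    private LocaleContext() {}

    // API a service calls to obtain the locale supplied by the provider.
    static Locale current() {
        return CURRENT.get();
    }

    // Provider side: run the service with the locale taken from the request envelope,
    // then restore whatever context was active before.
    static <T> T runWith(Locale locale, Supplier<T> service) {
        Locale previous = CURRENT.get();
        CURRENT.set(locale);
        try {
            return service.get();
        } finally {
            CURRENT.set(previous);
        }
    }
}

// Hypothetical service: its business method has no locale parameter.
final class PriceService {
    static String describeTotal(double total) {
        return NumberFormat.getNumberInstance(LocaleContext.current()).format(total);
    }

    public static void main(String[] args) {
        String text = LocaleContext.runWith(Locale.GERMANY, () -> describeTotal(1234.5));
        System.out.println(text); // 1.234,5
    }
}
```

Because the service signature does not mention a locale, the same service description can be kept when locale support is added later, and any service invoked within runWith() on the same thread inherits the same context.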

Most important, though, the client-side environment can be optimized to provide the locale and language preferences of the end user automatically, without developers having to write code to obtain the values and populate the inputs of the Web Service. In addition, Web Service authors can add international or multi-language support to services after initial deployment without changing service descriptors (WSDL and XSD) that may already be in wide use.

Vendors are free to define any of the above and even to require them for use of a specific web services platform. However, a proprietary solution is highly unsatisfying because it makes interoperability difficult. It would be better if the various web services providers adopted a single standard way of communicating these capabilities and selecting among them. Proprietary multi-locale capabilities within the web service itself would then be more useful, since these capabilities would be exposed to customers and spur the adoption of web service technologies.

In addition, the same standards would provide a framework to promote the creation of multi-lingual web content, including richer personalization based on the availability of more subtle locale, language, and preferences capabilities in the web in general, as well as potentially spurring the development of multi-lingual capabilities in software that previously didn't provide them.

This document, then, contains the requirements for addressing the first "missing piece"—locale tags—and allowing for the creation of specific standards to solve the other problems with locale and language negotiation listed above.

Tags should be as portable as possible. Client, service provider, and service should be able to glean as much information as possible from the tag—and that information should have as similar a semantic meaning as possible.
The tag should be linkable to a locale repository.
The locale description tags should be open to adoption by existing vendors as quickly and easily as possible. Vendors should be able to map their existing locales and use them for interaction without having to place them in the public domain, share sensitive locale data, or modify their existing locale semantics in any way.
The tagging system should clearly define the fallback pattern to be followed when interpreting the tags. This will allow systems to provide as much of their own proprietary structure as possible when generating tags, while providing as much redundancy as possible when interpreting tags from other systems.
The locale tags must be suitable for both expressing the client preference (analogous to the Accept-Language or Accept-Charset headers in HTTP) and the actual concrete value used for processing (similar to Content-Language or Content-Type).
The tags and their elements or sub-elements should be stable. Existing POSIX locales, for example, rely on ISO 639 and ISO 3166, which are known to change over time and in which some codes have been reassigned a new, different meaning.
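The fallback and preference requirements in the list above can be illustrated by analogy with language tags. The sketch below uses the lookup matching that Java (version 8 and later) provides for Accept-Language style preference lists: each preference is progressively truncated until it matches something the service supports. This is an analogy only and not a proposed design for locale tags. The class name, method name, and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch: class and method names are hypothetical.
final class LocaleFallbackDemo {

    // Returns the best supported locale for a weighted preference list such as
    // "fr-CA,fr;q=0.8,en;q=0.5", or defaultLocale when nothing matches.
    static Locale negotiate(String preferences, List<Locale> supported, Locale defaultLocale) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(preferences);
        Locale match = Locale.lookup(ranges, supported);
        return match != null ? match : defaultLocale;
    }

    public static void main(String[] args) {
        List<Locale> supported = Arrays.asList(Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN);
        // "fr-CA" is not supported, so the lookup falls back to "fr".
        System.out.println(negotiate("fr-CA,fr;q=0.8,en;q=0.5", supported, Locale.ENGLISH).toLanguageTag());
        // Nothing matches "ja", so the service default is used.
        System.out.println(negotiate("ja", supported, Locale.ENGLISH).toLanguageTag());
    }
}
```

The input string plays the role of the client preference, and the returned value is the concrete locale actually used for processing, which the service would ideally report back to the client.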

Endnotes.

Addison Phillips (aphillips@webmethods.com)
$Id: DRAFT-locale-reqs.html,v 1.6 2003/01/28 22:26:37 aphillip Exp $