Position Paper for W3C Workshop on Constraints and Capabilities for Web Services

The Importance of Constraints and Capabilities for the Internationalization of Web Services

Submitted by Martin J. Dürst, W3C/Keio University

Introduction

The importance of constraints and capabilities for the internationalization of Web services can be shown with the following argument: Internationalization means making it possible for software to deal with the wide range of diversity resulting from the use of different languages, scripts, and cultural conventions around the world. It is often desirable for a service to adapt to the linguistic and cultural preferences of a client. However, due to resource constraints, it is often not possible for a service to offer adaptation to all such preferences. Having mechanisms for clients and servers to express, and potentially even negotiate, the constraints and capabilities under which they operate is therefore crucial.

The Web Services Internationalization Task Force of the Internationalization Working Group has documented Web Services Internationalization Usage Scenarios in a Working Group Note. This document contains a lot of relevant background (Web services specialists are in particular pointed to Section 3, Introduction to Internationalization: Definitions for a Discussion of Web Services, for an introduction). Section 4.1, Locale Patterns in Web Services, splits the behavior of Web services with respect to internationalization into four patterns: Locale Neutral, Data Driven, Client Influenced, and Service Determined. Constraints and capabilities are crucial for the latter two patterns. Please note that a specific service may use more than one pattern in different aspects of its behavior. A recent talk provides some advice on preferences when using these patterns to design Web services.

From HTTP to Web Services

It can help to look at the internationalization mechanisms in HTTP and contrast them with the needs of Web services. HTTP basically has two mechanisms: the Accept-Charset:/Content-Type: pair of headers, which expresses preferences for, and the actual value of, the charset (character encoding) of a document, and the Accept-Language:/Content-Language: pair of headers, which expresses preferences for, and the actual use of, natural language(s) in a document.
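As a reminder of how the language mechanism works in practice, the following minimal sketch parses an Accept-Language: value and picks one of the languages a server supports. The function names, the fallback from a tag such as "en-gb" to its primary subtag "en", and the handling of "*" are illustrative assumptions; HTTP itself leaves the matching strategy to the server.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q))
    # The sort is stable, so header order is kept among equal weights.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def choose_language(header, supported, default):
    """Return the supported language that best matches the header."""
    by_lower = {s.lower(): s for s in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # assumed fallback: "en-gb" -> "en"
        if primary in by_lower:
            return by_lower[primary]
    return default


print(choose_language("da, en-gb;q=0.8, en;q=0.7", ["en", "ja"], "en"))
```

Here the server supports only English and Japanese, so the Danish preference cannot be honored and the British English preference is served by the generic English variant. This is exactly the kind of limited capability that the rest of this paper is concerned with.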

Of the above functionalities, the one describing the character encoding is the easiest one. Web services are defined in terms of XML documents or the XML infoset, which deals with characters rather than octets. Bindings usually take care of encoding into octets, which is easy because many of the underlying protocols already have mechanisms for dealing with this. Also, the tendency towards the uniform use of UTF-8 helps. In addition, XML has very well-defined rules for the case that an XML document is found as a stream of octets without external character encoding information.
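The rules mentioned last can be illustrated with a simplified sketch of the autodetection described in the non-normative Appendix F of the XML 1.0 specification. The function name is hypothetical, and the sketch deliberately omits cases such as EBCDIC and UTF-32 without a byte order mark; it is meant to show the idea, not to replace an XML parser.

```python
import re

# Matches an encoding declaration in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first octets."""
    # 1. Byte order marks (UTF-32 first, since its BOM starts like UTF-16's).
    if data.startswith((b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00")):
        return "utf-32"
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. No BOM: "<?" interleaved with zero octets indicates UTF-16.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. ASCII-compatible encodings: read the encoding declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Without BOM or declaration, XML defaults to UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```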

With regard to language, the situation is more complicated. Language is not the only, and often not the main, aspect of a Web service that depends on internationalization. HTTP was originally designed to retrieve human-readable documents, where natural language is core. For Web services, other internationalization aspects, usually subsumed under the term 'locale', are often more important. The relationship between language and locale is not trivial, although it can be shown that in many respects, these two concepts are very close (see the example in Usage Scenario I-022).

When going from HTTP to Web services, the aspect of negotiation inherent in the Accept- headers is also lost. Because Web services are not strictly request-response, other models to perform or simulate negotiation become relevant. In this respect, capabilities and constraints are important. Also, even for a request-response use of Web services, there are no established conventions for negotiation, and there are also no established conventions for mapping from SOAP headers to headers in binding protocols, such as HTTP. A solution for constraints and capabilities should not be purely static, but should extend naturally to more dynamic scenarios such as negotiation.

Main Internationalization Issues that Need to be Addressed with Constraints and Capabilities

Hard-core Web services experts often claim that Web services are used for machine-to-machine communication, and so issues of human language and culture are not relevant. Such a statement is only partially true. First, while Web services can be designed to be largely independent of human language and culture, at least on an infrastructure (as opposed to an application) level, many Web services are not designed from scratch, but are exposing existing functionality that has been created, and is executing, implicitly assuming e.g. a particular locale model.

Second, there are some operations involving human cultural preferences that cannot easily be moved from the server to the client. The clearest example of this is comparison/sorting: Answering queries involving text comparison or sorting in a way that corresponds to user preferences means either that the whole dataset has to be transferred to the client (impossible both in terms of resources and for privacy reasons), or that these operations have to be carried out on the server according to client preferences.
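The sorting case can be made concrete with a small sketch. It assumes a service that offers only two collations, German dictionary order (umlauted vowels sort with their base vowel) and Swedish order (å, ä, ö are separate letters at the end of the alphabet), and returns the first few entries of a dataset in the order the client asks for. The names and the two key functions are hypothetical and only approximate these orders; a real service would rely on an implementation of the Unicode Collation Algorithm with locale tailorings.

```python
import unicodedata


def german_key(s):
    """Rough German dictionary order: ignore diacritics on a first pass."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, s)  # the original string breaks ties


SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(s):
    """Rough Swedish order: å, ä, ö follow z as letters of their own."""
    text = unicodedata.normalize("NFC", s.lower())
    return [_SWEDISH_RANK.get(c, len(_SWEDISH_RANK)) for c in text]


# The collations this (hypothetical) service is capable of offering.
COLLATORS = {"de": german_key, "sv": swedish_key}


def first_entries(names, client_locale, limit):
    """Sort on the server in the client's order and return one page."""
    language = client_locale.split("-")[0].lower()
    if language not in COLLATORS:
        # The service cannot honor this preference; the client should
        # have been able to find that out in advance.
        raise ValueError("unsupported collation: " + client_locale)
    return sorted(names, key=COLLATORS[language])[:limit]


names = ["Zorn", "Äpple", "Adam", "Öberg", "Olsen"]
print(first_entries(names, "de-DE", 3))
print(first_entries(names, "sv-SE", 3))
```

The same query yields different first pages depending on the client's preference, although only three of the five entries ever leave the server. The point at which the sketch raises an error is where a description of constraints and capabilities would let the client learn beforehand which collations are on offer.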

When looking at the various functions traditionally grouped as a 'locale' in system platforms, it turns out that when moving from a local execution model to a worldwide network, each of these functions behaves very differently with respect to how it is best treated on the network. A detailed discussion is found in World Wide Localization (23rd Internationalization and Unicode Conference, March 2003, Prague, Czech Republic).

Exposing Operational Issues as Constraints and Capabilities

On the surface, the use of XML has at least solved character encoding issues. However, in practice, this is not always the case. When Web services are used to make existing legacy systems available on the Web (as opposed to implementing cleanly designed Web services from scratch from their description), it is often the case that such systems can only deal with a limited repertoire of characters. Although XML Schema provides regular expressions to express such constraints (except for mixed content), for reasons of orthogonality and future-compatibility, it may not be appropriate to expose this in the service signature. However, it may be possible to model this as additional restrictions on the service signature provided by a constraint. This would be a way to operationalize certain classes of constraints.
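As a rough illustration of such a constraint, the following sketch assumes that the repertoire of a legacy back end can be approximated by the name of a legacy character encoding, and checks a message field against it before the request is sent. The function name and the idea of identifying the repertoire by an encoding name are assumptions made for this example only; no particular constraint language is implied.

```python
def outside_repertoire(text, legacy_encoding):
    """Return the characters of text that the legacy system cannot hold."""
    rejected = []
    for ch in text:
        try:
            ch.encode(legacy_encoding)
        except UnicodeEncodeError:
            rejected.append(ch)
    return rejected


# A field that an ISO-8859-1 based back end can store only in part.
print(outside_repertoire("Dürst, 慶應", "iso-8859-1"))
```

A client that knows the constraint can run such a check locally and warn the user or transliterate the data, instead of discovering the limitation through a fault or, worse, through silently corrupted data.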

Interaction of Internationalization with other Specific Problem Domains

A direct interaction between the above-mentioned internationalization-related constraints and capabilities and other problem domains such as security, privacy, and reliable messaging is not expected. Therefore, it is important that technology to deal with constraints and capabilities allows easy and orthogonal combination of aspects from different problem domains.

However, there are some connections between internationalization in general and other specific problem domains. These have less to do with the text encoding and formatting aspects that are traditionally handled by locales, and more to do with issues such as jurisdiction or network availability. Privacy policies are easily affected by jurisdiction, because different jurisdictions take different approaches to dealing with privacy issues. Security policies may likewise be affected by jurisdiction, because different jurisdictions may have different rules or customs regarding, for example, what constitutes a legally binding data exchange or what may be acceptable as evidence in court. Finally, the technologies used for assuring reliable messaging, as well as the bindings used for actual message transport, may depend on the general availability of the network, which may differ vastly between parts of the world.