Content Negotiation: why it is useful, and how to make it work

We recently received a puzzled message from a visitor to the W3C Web site, asking how we were serving images without a file suffix in their URI. Looking around, our visitor found that http://www.w3.org/StyleSheets/TR/logo-REC was not one file, but two: logo-REC.gif and logo-REC.png. How do we do that?

The short answer is: Content Negotiation.

In this article, we will discuss content negotiation in depth and examine practical solutions. However, to begin, we need to first understand what a URI is, and what it is not.

A URI is a reference

The first thing we need to understand is that a URI is not a file name.

It is convenient to see a URI as the location of a file, and in most cases the analogy works. However, as we will soon see, this analogy is too poor to describe everything a URI actually is. Let's just remember that there is a good reason why web "pedants" insist on calling it a URI, a Uniform Resource Identifier, not a URL: it is not a file name or location, it is an identifier of (or a reference to) a resource. By using a proper protocol, it is possible to retrieve the actual resource; this is called dereferencing a URI.

But why all this abstraction, since in most cases the resource will happen to be stored in a file anyway, and the URI will be mapped directly to the file name?

Let us consider two things very similar to a URI: a bar code for a product, and an ISBN for a book. The former is a reference to a product, and the latter a reference to a publication. In the case of the bar code, it is important to note that the product is not a specific box of cookies on a shelf: the product referred to is the type of cookies of a certain brand, and all the boxes share the same bar code. Similarly, an ISBN does not refer to a flesh-and-bone (or, rather, paper-and-spine) book, but to the text it contains. In fact, it is not rare for several printings of a book to share the same ISBN: in the context of the ISBN, they are equivalent.

The same idea can be applied to URIs. A URI refers to a resource, but the resource is not one file on one web server. Take for example the resource "the weather in Oaxaca". A resource is just that: a piece of information on the Web. An HTML document with text describing the weather in Oaxaca and an image of a map with weather indicators can both be appropriate representations of this resource.

In fact, the maintainer of the Web resource could very well decide that a number of representations of this piece of information are equivalent, and think "what if I let the visitors of my Web site decide which representation they prefer?" On the Web, these equivalent representations of a resource are called variants, and the mechanism used to determine which of the existing representations is most appropriate for a given request is called Content Negotiation.

a URI is a pointer to a resource, which can have many equivalent variants

Content Negotiation: figuring out the best deal for everyone

Content Negotiation is a complex-sounding term for what is a rather simple mechanism.

Imagine yourself on the phone, suggesting a date. You ask: "We should meet soon! How about Wednesday, or Friday?" Your friend answers: "Excellent! I have free time on Friday!" Sounds simple? Now replace the date with a resource, and think of yourself as the client requesting a resource and your friend as the server, accepting the request based on the preferences of the client and on its own availability: this is Content Negotiation as it is implemented on the Web.

In summary, the basic idea of Content Negotiation is to serve the best variant for a resource, and to serve it based on:

  • What variants are available, and what variants the server may prefer to serve
  • What the client can accept, and with which preferences: in HTTP, the client may send Accept headers in its request (Accept, Accept-Language and Accept-Encoding) to communicate its capabilities and preferences in format, language and encoding, respectively.
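
As an illustration of how these headers carry preferences, here is a minimal PHP sketch that parses an Accept-style header value into a list ordered by quality factor (the optional q parameter, which defaults to 1). The function name parse_accept_header and the sample header value are assumptions made for this example; this is not code used on the W3C site, and it ignores details such as wildcards.

```php
<?php
declare(strict_types=1);

/**
 * Parse an Accept-style header value (Accept, Accept-Language, ...)
 * into a map of item => quality, most preferred first.
 * parse_accept_header() is a hypothetical helper, not a PHP built-in.
 */
function parse_accept_header(string $header): array
{
    $prefs = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', $part);
        $item = strtolower(trim($pieces[0]));
        if ($item === '') {
            continue; // skip empty entries, e.g. from an empty header
        }
        $q = 1.0; // quality defaults to 1 when no q parameter is sent
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $prefs[$item] = $q;
    }
    arsort($prefs); // highest quality first, keys preserved
    return $prefs;
}

// Sample value: Japanese preferred (q=1), then French, then English.
$prefs = parse_accept_header('fr;q=0.8, en;q=0.5, ja');
print_r(array_keys($prefs)); // ja, then fr, then en
```

The same function can be applied to an Accept header such as image/png, image/gif;q=0.5, which expresses a preference for the PNG variant of an image over the GIF one.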

Language Negotiation: why every multilingual site owner should know about it

The mechanism that allows us to serve an image in two different file formats, which our visitor was puzzled about, is in fact one type of Content Negotiation, called Format Negotiation. Another important and interesting use of Content Negotiation is its application to representations of a resource in several languages, and how to serve them to readers based on their preferences: Language Negotiation.

With Language Negotiation, there is no need to give a link to oaxaca.html.en for readers of English and oaxaca.html.de for readers of German. Just link to oaxaca and set up your server (e.g. Apache) properly: the negotiation between the server and the client's preferences will make each reader receive the resource in the proper language.

Why is Language Negotiation seldom used?

How come then that language negotiation is not being widely used at all if it can be so useful in dispatching, automatically, the proper language variant of a document to its audience? Partly perhaps because it is not well known, and people building multilingual web sites think of their site as a multiplication of language-specific mini-sites, instead of thinking of it as one site, with one set of URIs, only with different versions and languages available.

That is not, however, the sole reason for the lack of usage of language negotiation. Another reason is that for a long time, with the most popular negotiation-enabled Web server (the ubiquitous Apache), a failed negotiation (for instance, a reader of French being offered only English and German variants of a document) resulted in a nasty "406 Not Acceptable" HTTP error. While technically conforming to HTTP, this failed to follow the recommendation that a server should try to serve some resource rather than an error message whenever possible. Fortunately, more recent versions of the server now allow the setting of a fallback, or default, variant in case the negotiation fails.
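
The difference between failing with an error and falling back to a default can be sketched in a few lines of PHP. This is only an illustration: pick_variant is a hypothetical helper written for this example, not part of Apache or PHP, and it matches language tags exactly rather than following the full HTTP rules.

```php
<?php
declare(strict_types=1);

/**
 * Pick a language variant for the client, or fall back to a default.
 * $wanted:    language tags the client accepts, most preferred first.
 * $available: language tags the server has variants for.
 * Returns null when nothing matches and no default is configured,
 * which is the situation that would produce a 406 response.
 */
function pick_variant(array $wanted, array $available, ?string $default = null): ?string
{
    foreach ($wanted as $lang) {
        if (in_array($lang, $available, true)) {
            return $lang;
        }
    }
    return $default;
}

$available = ['en', 'de'];

// A reader of French, with no fallback configured: no variant to serve.
$strict = pick_variant(['fr'], $available);

// The same reader, with English configured as the default variant.
$lenient = pick_variant(['fr'], $available, 'en');

var_dump($strict, $lenient);
```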

Another serious issue: giving the users, not the browser, what they want

There is another issue with language negotiation as it is implemented in HTTP: it assumes that the client is properly configured, that is, that the client (the Web browser) will send Accept-Language information that actually reflects the languages its user can read, and which of these are preferred. Unfortunately, this is often not true: although many modern browsers do allow their users to set preferred languages, not all of them do, and even when they do, the user may not know how to change the setting. In some cases, for instance on shared computers or "internet kiosks", the user is not even allowed to change the settings of the Web browser.

In this context, a zealous usage of language negotiation can even have effects against usability of a site. Imagine a bilingual site (in our example, English and Japanese) where negotiation between the server and browser results in the choice of the English variant. The reader actually prefers Japanese, and finds a link to the Japanese version, easily visible at the top of the English variant, and follows it. However, as the user keeps browsing… the negotiation between the browser and the server keeps returning the English version. Quite probably, the user will just get irritated, browse away, never to return: language negotiation, albeit there to help the user, can prove to be a usability liability.

Toward a better language negotiation

How can we work around this?

One possibility is to provide "generic", language-negotiated access to resources only at known important entry points to the site and, from there on, to use only language-specific links. That solution does keep users from leaving in irritation at the limitations of language negotiation. But if Bob wants to send a link to a specific resource on the site to his friend Norio in Japan, wouldn't it be nice to be able to just send the URI of the page he is browsing (in English) and have his Japanese friend automatically get the Japanese version? Wouldn't it be nice to be able to use the power of language negotiation on the whole site, without any usability issue?

After all, the concept of negotiation is to try to automatically provide the best possible variant, based on the variants available on the server and the preferences of the user. Relying on the preferences of the browser and the Accept headers it sends is only a convenient implementation in HTTP, not the only way to implement a negotiation system.

What if negotiation could take into account both the settings in the user's browser and records of past interactions with the site? Although HTTP is stateless, there is an easy way to do this: cookies. A negotiation algorithm that trusts a cookie showing that the user has chosen a language different from the one negotiated from the Accept-Language header, and that defaults to Accept-based negotiation in the absence of such a cookie, may be the best of both worlds: negotiated resources, and the guarantee of a consistent user experience regardless of potentially misconfigured browsers.

A PHP implementation of the "better language negotiation"

Below is a sample implementation of the idea described above, using the PHP language.

How this PHP-based language negotiation works

  1. page (the URI naming is just an example) is the "generic" resource. It calls choose_lang.php, which checks for the existence of a language choice cookie first and, in its absence, negotiates a language. When the negotiation algorithm ends, a variable called $chosenlang is set, and based on the value of this variable, either page.en or page.ja is called with an include mechanism.
    You can download the PHP source for page. Its code is actually very simple, as shown below:
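    <?php
    // choose_lang.php sets $chosenlang ("en" or "ja" in this example)
    include('/path/to/choose_lang.php');
    // Serve the variant matching the chosen language
    if ($chosenlang == "en") {
        include 'page.en';
    }
    else {
        include 'page.ja';
    }
    ?>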
    
  2. choose_lang.php implements a very basic HTTP language negotiation based on Accept-Language: headers. It does not take into account "quality factors", which could be used to weigh in several possible choices. As you can see in the commented source, its main task is to find a value for the variable $chosenlang, first by checking the presence of a cookie (denoting a language choice in previous interaction with the site), then by trying a content-negotiation algorithm similar to that of HTTP, and finally, if necessary, falling back to a default language choice.
  3. Finally, the language specific files for our page, page.en and page.ja should have some code at the top, executed only if the variable $chosenlang is not set. As we saw above, this would mean that the resource was not called through the generic resource, but rather requested directly, so it's a fair assumption that the user followed a language-specific link to switch the language of display: therefore, we want to store a cookie recording the new choice of language.
    Here is how the code at the top of page.en should look:
    <?php if(! isset($chosenlang)) {setcookie("lang", "en", time()+60*60*24*30, "/"); $chosenlang="en";} ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    …
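
Since the source of choose_lang.php is not reproduced here, the following is a minimal sketch of what such a file could look like, following the three steps described above (cookie, then Accept-Language, then default). It is an illustration under stated assumptions, not the original script: the function name choose_lang, the list of available languages and the default are made up for the example; only the cookie name lang comes from the setcookie call shown above. Like the script described, it ignores quality factors.

```php
<?php
// Hypothetical sketch of a choose_lang.php; not the article's original file.

/**
 * Decide which language variant to serve.
 * Order: cookie from an earlier explicit choice, then the Accept-Language
 * header (in the order sent, ignoring quality factors), then the default.
 */
function choose_lang(array $cookies, string $acceptLanguage, array $available, string $default): string
{
    // 1. A previous explicit choice wins, if it names a variant we have.
    $cookieLang = $cookies['lang'] ?? '';
    if (in_array($cookieLang, $available, true)) {
        return $cookieLang;
    }

    // 2. Walk the Accept-Language entries in the order the browser sent them.
    foreach (explode(',', $acceptLanguage) as $entry) {
        // Drop any ";q=..." parameter and any region subtag ("en-GB" -> "en").
        $tag = strtolower(trim(explode(';', $entry)[0]));
        $primary = explode('-', $tag)[0];
        if (in_array($primary, $available, true)) {
            return $primary;
        }
    }

    // 3. Nothing matched: fall back to the site default.
    return $default;
}

// Assumed setup for the example: English and Japanese variants, English default.
$chosenlang = choose_lang(
    $_COOKIE,
    $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ['en', 'ja'],
    'en'
);
```

Keeping the decision in a function that takes the cookies and the header as arguments makes it easy to try out with sample values, while the last statement still sets $chosenlang for the including page.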
    

Post Scriptum

Many thanks to Karl Dubost, Felix Sasaki and Steph Troeth for some excellent input, suggestions and corrections to this article.

Please use the comments form below if you wish to provide feedback, or suggest other implementations to make Content Negotiation on the Web more useful and more widely used. Thank you.
