text/html and application/xhtml+xml
This document attempts to gather the known techniques for serving XHTML documents that follow the backwards compatibility guidelines as both text/html and application/xhtml+xml using content negotiation, so that browsers that do not understand the newer MIME type get a version with a MIME type they do understand.
This is a work in progress. Please send your comments and suggestions to the publicly archived mailing list <www-qa@w3.org>, or, if you don't want your email to be public, to the editor of this document, Dominique Hazaël-Massieux, at <dom@w3.org>. We are especially interested in techniques for other popular web servers.
Content negotiation is a mechanism defined in the HTTP specification that makes it possible to serve different "versions" of a document (or, more generally, of a resource) at the same URL, so that each user agent can get the version that best fits its capabilities.
A classic use of this mechanism is to serve an image as both GIF and PNG, so that browsers that don't understand PNG still get the GIF version.
To summarize how this works, it's enough to say that user agents are supposed to send an HTTP header (Accept) listing the MIME types they understand, with an indication of how well they understand each of them. The server then replies with the version of the resource that best fits the user agent's needs.
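The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular server: the function names (`parse_accept`, `quality_for`, `choose_variant`) are made up for this example, and only the `q` parameter of the Accept header is taken into account.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat an unparsable q as "not acceptable"
        prefs.append((media_range, q))
    return prefs


def quality_for(media_type, prefs):
    """Return the q value of the most specific range matching media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_variant(header, available):
    """Pick the available type with the highest q; ties go to the earlier entry."""
    prefs = parse_accept(header)
    best_type, best_q = None, 0.0
    for media_type in available:
        q = quality_for(media_type, prefs)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type


available = ["text/html", "application/xhtml+xml"]
print(choose_variant("application/xhtml+xml,text/html;q=0.9", available))
# prints "application/xhtml+xml"
```

A user agent that sends only `text/html,*/*;q=0.1` would get `text/html` from the same function, and one that accepts neither type gets `None`, which a server would typically turn into a 406 response or a default variant.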
XHTML 1.0 (like all versions of XHTML) is supposed to be served as application/xhtml+xml. But some browsers, among them Internet Explorer, do not recognize this MIME type. While XHTML 1.0 may be served as text/html when it follows the backwards compatibility guidelines, it would be nice to serve it as application/xhtml+xml to those browsers that understand this MIME type.
One of the best ways to do that is to use content negotiation.
There are several ways to do this in Apache.
The traditional way to use content negotiation in Apache is to have two files with the same name but different extensions, with the MultiViews option set for the directory where these files reside.
For instance, if you have http://example.com/foo/bar.xhtml and http://example.com/foo/bar.html (assuming both extensions, xhtml and html, are mapped to the relevant MIME types), the URL http://example.com/foo/bar will offer the two resources through content negotiation.
Note the following tweaks that are usually necessary:
- ... FollowSymLinks for this to work)
- ... Accept headers, thus not allowing the server to know which version to serve; it is probably wiser to serve the HTML version by default. To do that, you need to lower the quality-of-source (qs) parameter of the XHTML version, typically by adding it to the MIME type directive: AddType application/xhtml+xml;qs=0.8 .xhtml
(Note that this is the way the W3C Home page is currently served).
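To see why lowering the quality-of-source value makes the HTML version the default, it helps to know that Apache multiplies the quality factor from the Accept header by the variant's qs value and serves the variant with the highest product. The sketch below is a deliberately simplified model of that single step (Apache's real algorithm also considers language, charset, encoding and more). The names `client_q`, `pick_variant` and `VARIANTS` are hypothetical, and the Accept header is assumed to be already parsed into a dictionary.

```python
def client_q(media_type, accepted):
    """Look up the client's q for media_type, trying the most specific range first."""
    main_type = media_type.split("/")[0]
    for candidate in (media_type, main_type + "/*", "*/*"):
        if candidate in accepted:
            return accepted[candidate]
    return 0.0  # the client did not list anything that matches


def pick_variant(accepted, variants):
    """Return the variant maximizing (client q) * (server qs), or None."""
    best, best_score = None, 0.0
    for media_type, qs in variants.items():
        score = client_q(media_type, accepted) * qs
        if score > best_score:
            best, best_score = media_type, score
    return best


# Server-side source qualities: XHTML lowered to 0.8, as in the AddType line above.
VARIANTS = {"text/html": 1.0, "application/xhtml+xml": 0.8}

# A client that only says "anything is fine" gets the HTML variant...
print(pick_variant({"*/*": 1.0}, VARIANTS))
# ...while one that clearly prefers XHTML still gets XHTML.
print(pick_variant({"application/xhtml+xml": 1.0, "text/html": 0.5}, VARIANTS))
```

One consequence of this model is worth noticing: a client sending `application/xhtml+xml` with q=1.0 and `text/html` with q=0.9 would also receive the HTML variant, since 0.9 × 1.0 is greater than 1.0 × 0.8. The qs value therefore needs to be chosen with the Accept headers of the targeted browsers in mind.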
While it's theoretically possible to serve a single file under both MIME types, I don't know of any way to do it without breaking some important aspects of HTTP (such as proxying, or the HTTP PUT method); that is, the method I know of, which uses RewriteRules, doesn't set headers such as ETag as it should.
For a page served through PHP scripts, it is possible to have the page served as either text/html or application/xhtml+xml, depending on the user agent that requested the page.
To do so in an HTTP-friendly way, the contentNegotiation class can be used to parse the Accept header reliably.
The following code serves a page as application/xhtml+xml in preference to text/html, unless the user agent prefers text/html or identifies itself as MSIE.
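require_once("http://www.w3.org/2005/04/conneg.phi"); // copy the class to your server
$conneg = new contentNegotiation();
// Fall back to an empty string when the client sends no User-Agent header
$uastring = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
// The response depends on both request headers, so announce that to caches
header("Vary: Accept, User-Agent");
// strpos() returns false when "MSIE" is absent; compare strictly, since a match at offset 0 is also falsy
if ($conneg->compareQ("application/xhtml+xml,text/html") == "application/xhtml+xml" && strpos($uastring, "MSIE") === false) {
    header("Content-Type: application/xhtml+xml");
} else {
    header("Content-Type: text/html;charset=utf-8");
}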
Note how this code sets the Vary header to make it explicit that content negotiation happened; if your server sets the ETag header automatically, it would similarly need to be adjusted so that each variant gets its own value.
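One possible way to keep the ETag consistent with the negotiated variant is to derive it from both the response body and the chosen media type, so that the two variants never share a validator. The Python sketch below only illustrates that idea; `response_headers` is a hypothetical helper, and the choice of a truncated SHA-256 digest is an assumption made for the example, not something HTTP requires.

```python
import hashlib


def response_headers(content_type, body):
    """Build headers whose ETag differs whenever the body or the media type differs."""
    # Mix the media type into the hash so both variants of one page get distinct ETags.
    digest = hashlib.sha256(content_type.encode("utf-8") + b"\0" + body).hexdigest()[:16]
    return {
        "Content-Type": content_type,
        "Vary": "Accept, User-Agent",  # same request headers as in the PHP example
        "ETag": '"' + digest + '"',    # an entity tag is a quoted string
    }


page = b"<html xmlns='http://www.w3.org/1999/xhtml'>...</html>"
as_html = response_headers("text/html;charset=utf-8", page)
as_xhtml = response_headers("application/xhtml+xml", page)
print(as_html["ETag"] != as_xhtml["ETag"])  # prints "True"
```

With distinct entity tags, a cache that revalidates with If-None-Match cannot mistake the text/html response for the application/xhtml+xml one, even though both carry the same markup.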
By default, Jigsaw handles files with the same basename and different extensions as content-negotiated resources (the same way Apache does when the MultiViews option is set). You can also set a preferred variant for each resource by modifying its quality setting (which can be adjusted in increments of 0.01).