Re: NEW ISSUE: content sniffing

On Tue, Mar 31, 2009 at 7:44 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
> It is impossible to determine a media-type (how a recipient should
> process a given representation) by sniffing the content (the data format).

Regardless, many popular user agents override the server-provided MIME
type after examining the content of HTTP responses.

> When a media type
> is not present (or is detectably incorrect), only the implementation
> doing the processing can determine an appropriate guess because that
> guess is almost always determined by the context in which the
> reference was made (not by the content).

Content sniffing algorithms in browsers largely ignore the context in
which the HTTP response is being used.  Specifically, the algorithm for
computing the effective MIME type is a function of the HTTP response
alone (both in practice and in draft-abarth-mime-sniff).
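
To make that concrete, here is a rough sketch in Python of the shape of
the interface (the names are my own choosing, not anything taken from
draft-abarth-mime-sniff): the effective type is computed from the
response's declared type and a bounded prefix of its body, with no
"context" parameter anywhere.

    # Hypothetical sketch of the interface only: the effective MIME
    # type is a function of the HTTP response alone.
    def effective_mime_type(declared_type: str, body: bytes) -> str:
        prefix = body[:512]   # implementations inspect a bounded prefix
        # ... pattern matching over prefix and declared_type goes here ...
        return declared_type  # fall back to the server-declared type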

> Since the context is
> deliberately not sent on the wire, there is absolutely no way that
> accurate sniffing can be defined by HTTP.

Thankfully, we do not require "accurate" sniffing.  We simply require
an algorithm for determining a MIME type that is compatible with
existing Web content.

> We aren't talking about
> a protocol decision regarding communication; we are talking about
> an operating default that is specific to the purpose of a given
> client and will likely be different for each one.

I disagree that we're interested in something specific to a given
client.  We're interested in an algorithm for determining the MIME
type of an HTTP response that works with existing Web content.

For example, suppose you're implementing an image editing program;
let's call it Imageshop.  You'd like users of Imageshop to be able to
open images specified by URL.  A user asks to edit
http://example.com/fancy-image.  Imageshop issues an HTTP request for
that URL and receives the following response:

Content-Type: image/jpeg

GIF89a...

Because popular user agents have historically interpreted such HTTP
responses as image/gif, it is quite likely that the server intends
this response to be treated as image/gif (and not as image/jpeg).
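
A sniffing algorithm captures that expectation with a simple signature
check.  The sketch below (again my own illustration, not the exact
rules from draft-abarth-mime-sniff) shows how a client like Imageshop
might resolve this particular case:

    # Hypothetical sketch: when the body begins with a well-known image
    # magic number, that signature wins over the Content-Type header.
    IMAGE_SIGNATURES = [
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"\xff\xd8\xff", "image/jpeg"),
    ]

    def sniff_image_type(declared_type: str, body: bytes) -> str:
        for signature, sniffed_type in IMAGE_SIGNATURES:
            if body.startswith(signature):
                return sniffed_type
        return declared_type

    # The response above: declared image/jpeg, body beginning "GIF89a".
    print(sniff_image_type("image/jpeg", b"GIF89a..."))  # -> image/gif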

If the Imageshop developers follow the existing HTTP spec, they will
receive complaints from their users and be forced to reverse engineer
a content sniffing algorithm that is compatible with existing Web
content.  If, instead, we specify a sniffing algorithm, Imageshop will
interoperate with existing Web content as its users expect.

> In any case, there is no algorithm for sniffing that is anywhere
> near the same level of standardization as HTTP.

You're right that a number of popular user agents use different
sniffing algorithms.  I'm hoping to converge these implementations on
a single sniffing algorithm.  Having the HTTP spec recommend a
specific algorithm will aid this process.

> The one that HTML5 is working on would barely qualify as Experimental.

I'm not sure what qualifies as "experimental," but the algorithm in
draft-abarth-mime-sniff is quite similar to the algorithms that ship
in Firefox and Chrome.  We have a great deal of data from the Google
search index and from opt-in user metrics with which to evaluate its
compatibility with existing Web content.

> If the folks
> promoting such software can successfully deploy it across all HTTP
> clients, then it should be referenced.

Surely this is too high a bar.  Why bother writing standards if we
require all implementations to interoperate perfectly before putting
pen to paper?

> Until then, it remains an
> unproven and, IMO, mistaken idea which is far more likely to
> be overcome by events than become a standard way to handle HTTP.

I don't think that content sniffing will magically disappear if we
just ignore it long enough.  Instead, we should shed some light on
this dark corner of reality.

Adam
