RE: "sniffing" from Larry Masinter on 2010-03-14 (www-tag@w3.org from March 2010)

From: Larry Masinter <masinter@adobe.com>
Date: Sat, 13 Mar 2010 22:09:17 -0800
To: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4DAD61D4@nambxv01a.corp.adobe.com>
(Had this queued up, but noted I hadn't sent it):


There are many computer "languages".  Because of this,
there are often strings which are "valid" or might even
seem to be "appropriate" for more than one language.

Software should not "guess" that something that is
labeled as being X is more appropriately treated as Y,
unless the software is an expert at interpreting the
"X" label, and using "X" doesn't work, while using Y
does, and the likelihood that this is a configuration
error (rather than sender intention) is high.

Example 1:  if something comes labeled as 
application/vnd.company.specialpng, the receiver should
treat it as vnd.company.specialpng, and never "guess"
that it should instead be treated as a PNG, just because
it "looks like a PNG".

Example 2: 
If something comes in labeled as text/plain, however,
and the application interpreting the data knows
enough about text/plain to know that the data
would be unintelligible as text, while it more closely
matches, say, image/jpeg, and would correctly display
as image/jpeg, and also that mis-configuration of
the site from which the image was retrieved is
likely (through statistical analysis, say), then
performing "sniffing" might be an option.
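
The rule and the two examples above can be sketched as a
single predicate. This is only an illustrative sketch in
Python: the Evidence fields, the may_sniff name, and the
0.9 threshold are all assumptions made up for the example,
not part of any specification, and how a receiver would
actually estimate each input is left open.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical inputs a receiver would need before overriding a label."""
    understands_label: bool      # receiver is an "expert" at interpreting label X
    works_as_labeled: bool       # treating the data as X gives a usable result
    works_as_candidate: bool     # treating the data as Y gives a usable result
    misconfig_likelihood: float  # estimated probability that the label is a
                                 # configuration error, not sender intention


def may_sniff(e: Evidence, threshold: float = 0.9) -> bool:
    """Return True only if every condition of the rule holds.

    The threshold is an arbitrary placeholder for "the likelihood
    of a configuration error is high".
    """
    return (
        e.understands_label
        and not e.works_as_labeled
        and e.works_as_candidate
        and e.misconfig_likelihood >= threshold
    )


# Example 1: application/vnd.company.specialpng that "looks like a PNG".
# The receiver does not understand the vendor label, so it must not guess.
example_1 = Evidence(
    understands_label=False,
    works_as_labeled=False,
    works_as_candidate=True,
    misconfig_likelihood=0.0,
)

# Example 2: text/plain whose bytes are unintelligible as text, display
# correctly as image/jpeg, and come from a site judged likely misconfigured.
example_2 = Evidence(
    understands_label=True,
    works_as_labeled=False,
    works_as_candidate=True,
    misconfig_likelihood=0.95,
)

print(may_sniff(example_1))  # False
print(may_sniff(example_2))  # True
```

Even when the predicate is true, sniffing is only "an
option"; the sketch says when it may be considered, not
that it must be done.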

I think this general rule should apply to MIME 
types, HTML versions, charset labels and language
tags (four kinds of 'sniffing' currently covered
by the HTML document.)

Content can be removed from its original context and
repurposed for some other context, and it then needs to
be disambiguated. That is why content SHOULD be
"self-describing", and why specifications should explicitly
allow content that intends to match the specification
to be labeled with the specification name and version.
Any re-interpretation of that self-description
should be done cautiously and confirmed, and, if possible,
a way of "correcting" mis-labeled documents should be a
required or encouraged element of any tooling.

(quoting myself)


> I've postponed ACTION-386 (which was to do a more thorough
> in-depth review of the "sniffing" document), but I wonder if
> it might be possible to have a discussion about a very small
> piece of it.
> 
> The mime sniff document, many W3C recommendations, and
> many discussions, including the recent traffic in
> public-html@w3.org around re-registration of the
> text/html MIME type all seem to take the form of
> 
> "Can I serve an X document as Y"
> 
> "How can I 'sniff' that an X document served as Y
>  really is an X."
> 
> These discussions seem to assume that the notion of
> "an X document" (an HTML 5 document, an XHTML2 document)
> is meaningful and well-formed and decidable without
> any additional contextual information. 
> 
> But in the case of "polyglot" documents, we have something
> that is simultaneously "an X document" and "a Y document",
> or is either one or the other.
>
> I'd like to see if we could get some agreement on
> a way to rephrase those statements and questions.
>
Received on Sunday, 14 March 2010 06:09:54 UTC