About sniffing from John Kemp on 2010-03-22 (www-tag@w3.org from March 2010)

From: John Kemp <john@jkemp.net>
Date: Mon, 22 Mar 2010 11:17:14 -0400
To: "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <2E67B11D-CA03-4053-941E-1EB8127A92EA@jkemp.net>

Hello,

Below are some suggested goals for our f2f discussion (I'll update the agenda to list these and link to this email), along with notes from my recent re-reading of the Authoritative Metadata finding - http://www.w3.org/2001/tag/doc/mime-respect. I realize I can't make this "required reading" at this late stage, but I suggest that people at least read my notes below, and preferably read the Authoritative Metadata finding itself, to get good background on this issue.

I believe this email to be related to ACTION-399.

Regards,

- johnk

Sniffing discussion goals
-------------------------------

* Discuss what (if anything) can be done by the TAG to improve the situation of content-type mis-labeling errors and reporting.
* Discuss the requirements for a content-sniffing algorithm given the constraints discussed in Authoritative Metadata, and in relation to the content-sniffing draft proposed in http://tools.ietf.org/html/draft-abarth-mime-sniff-04
* Establish any updates to Authoritative Metadata and Self-describing Web findings based on these discussions.
* Discuss other instances of sniffing, as noted by Larry in email to TAG:

"I think this general rule should apply to MIME
types, HTML versions, charset labels and language
tags (four kinds of 'sniffing' currently covered
by the HTML document.)"

Reading Authoritative Metadata (AM)
-----------------------------------------------

Arguments *against* the summary of key points from AM finding:

i) Why should metadata in an "encapsulating container" be authoritative? What happens when the container is separated from the contained entity? What about publishing chains where mis-labeling occurs?
ii) Inconsistency between representation data and metadata is an error which MUST NOT be silently ignored. To make the situation better, we need to provide guidance that supports detecting and correcting such errors - browser plugins that report inconsistencies to the origin server owner? Content-management system plugins that sniff uploaded content and report errors?
iii) Why must an agent not override content-type without user consent? Source view vs content view - when source is plain text and content is an interpretation of plain text it must be possible to display both...

"For Web architecture, a design choice has been made that metadata received in an encapsulating container MUST be considered authoritative" - why!? Section 3 attempts to describe why....

Why (summarized):

i) Make media types descriptive of intended interpretation, not just an indication of format.

This requires that media types are properly descriptive and registered accurately. This also doesn't deal with the mis-labeling problem (i.e. the media type is there but doesn't accurately describe the proper interpretation).

In order to make this true, servers should sniff and detect mislabeled content received from clients too.
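
As an illustration only, a server-side check of this kind might look like the following Python sketch. It compares the declared media type against a guess made from the leading bytes and reports any mismatch rather than silently overriding the label. The signature table and the names sniff_media_type and check_label are assumptions made for this example; they are not taken from AM or from the mime-sniff draft, and a real sniffer would cover many more types.

```python
# Hypothetical, deliberately tiny signature table (an assumption for this sketch).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(body: bytes):
    """Guess a media type from the leading bytes, or return None if unknown."""
    for signature, media_type in MAGIC_SIGNATURES:
        if body.startswith(signature):
            return media_type
    return None


def check_label(declared: str, body: bytes):
    """Return a description of a detected mismatch, or None if none is found.

    The declared type is never overridden here; the mismatch is only reported.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    declared_type = declared.split(";", 1)[0].strip().lower()
    sniffed = sniff_media_type(body)
    if declared_type and sniffed is not None and sniffed != declared_type:
        return f"declared {declared_type!r} but content looks like {sniffed!r}"
    return None
```

A content-management system could call check_label on each upload and surface the returned message to the publisher, leaving the decision about the correct label with the resource owner.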

ii) If container metadata is not used, and sniffing is required, only one representation of the content is possible - thus container metadata MUST be possible.

Agree with this

iii) Using the container metadata model allows easier dispatch to "handlers/plugins" without recourse to inspecting the message body

Agree with this
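
Points ii) and iii) can be sketched together in Python. This is a minimal illustration, not a description of any real user agent: the handler table, the handler names and the "[rendered as HTML]" marker are all hypothetical. The dispatcher looks only at the Content-Type value and never at the body, so the same bytes can be given two different interpretations depending on the label.

```python
def show_as_source(body: bytes) -> str:
    """Hypothetical text/plain handler: show the bytes as plain text."""
    return body.decode("utf-8")


def show_as_html(body: bytes) -> str:
    """Hypothetical text/html handler: stand-in for real HTML rendering."""
    return "[rendered as HTML] " + body.decode("utf-8")


# Dispatch table keyed by media type (an assumption for this sketch).
HANDLERS = {
    "text/plain": show_as_source,
    "text/html": show_as_html,
}


def dispatch(content_type: str, body: bytes) -> str:
    """Choose a handler from the container metadata alone."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise LookupError(f"no handler registered for {media_type!r}")
    return handler(body)
```

With this table, dispatch("text/plain", b"<p>hi</p>") gives the source view and dispatch("text/html", b"<p>hi</p>") gives the content view of identical bytes, which a purely sniffing-based agent could not distinguish.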

What to do when no metadata is supplied:

* If Content-type is EMPTY, UA MAY sniff

* If Content-type is application/octet-stream, UA should ask the user (this is not said in AM, but appears to be common convention - AM says: "Server managers (webmasters) SHOULD NOT specify an arbitrary Internet media type (e.g., "text/plain" or "application/octet-stream") when the media type is unknown.")
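
The two cases above, plus the default AM case of treating supplied metadata as authoritative, can be written as a small decision function. This Python sketch is only an illustration: the Action names and the decide function are assumptions for the example, and since the empty case is a MAY, an agent could equally choose not to sniff.

```python
from enum import Enum


class Action(Enum):
    SNIFF = "sniff"                     # permitted (MAY), not required
    ASK_USER = "ask user"               # common convention, not stated in AM
    USE_DECLARED = "use declared type"  # container metadata is authoritative


def decide(content_type):
    """Pick an action from the Content-Type header value (None if absent)."""
    if content_type is None or not content_type.strip():
        return Action.SNIFF
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/octet-stream":
        return Action.ASK_USER
    return Action.USE_DECLARED
```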

Servers and clients should be more circumspect about labeling content - and say "I don't know" (empty Content-type) more often.

From AM: "Instead of specifying a default for metadata, it is better for representations to be sent without that metadata. That allows the recipient to guess the metadata instead of being forced to either accept incorrect metadata or be tempted to violate Web architecture by ignoring it."

and...

"It is better to send no media type if the resource owner has failed to define one for a given representation."

Conclusion: Authoritative Metadata finding accurately describes the issues and does its best to give good guidance.

Received on Monday, 22 March 2010 15:17:44 UTC