This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 7744 - Is sniffing required?
Summary: Is sniffing required?
Status: CLOSED WORKSFORME
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: NE, TrackerIssue
Depends on:
Blocks:
 
Reported: 2009-09-28 16:00 UTC by Julian Reschke
Modified: 2010-10-04 14:47 UTC

Description Julian Reschke 2009-09-28 16:00:41 UTC
"The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of the Content-Type Processing Model specification. [MIMESNIFF]"

This *seems* to undo a change we discussed a long time ago, and which resulted in content type sniffing being optional.
Comment 1 Ian 'Hixie' Hickson 2009-09-28 16:13:19 UTC
[MIMESNIFF]'s algorithm allows implementors to use the actual type without sniffing.
Comment 2 Julian Reschke 2009-09-28 16:21:58 UTC
Are you referring to the end of <http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-1>?

I agree that this makes it optional, but I think both HTML5 and MIMESNIFF could be clearer about that. For instance, the abstract says:

   Many web servers supply incorrect Content-Type headers with their
   HTTP responses.  In order to be compatible with these servers, user
   agents must consider the content of HTTP responses as well as the
   Content-Type header when determining the effective media type of the
   response.  This document describes an algorithm for determining the
   effective media type of HTTP responses that balances security and
   compatibility considerations.

Note the "must consider the content". That doesn't sound optional at all.
Comment 3 Maciej Stachowiak 2009-09-29 07:34:24 UTC
(In reply to comment #2)
> Are you referring to the end of
> <http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-1>?
> 
> I agree that this makes it optional, but I think both HTML5 and MIMESNIFF could
> be clearer about that. For instance, the abstract says:
> 
>    Many web servers supply incorrect Content-Type headers with their
>    HTTP responses.  In order to be compatible with these servers, user
>    agents must consider the content of HTTP responses as well as the
>    Content-Type header when determining the effective media type of the
>    response.  This document describes an algorithm for determining the
>    effective media type of HTTP responses that balances security and
>    compatibility considerations.
> 
> Note the "must consider the content". That doesn't sound optional at all.
> 

I suspect this was an error in draft-abarth-mime-sniff: it looks to me like it inadvertently used the word "must" in a non-normative context. The draft capitalizes RFC2119 keywords when it means them. This should be reported as a comment on MIMESNIFF.
Comment 4 Adam Barth 2009-09-29 07:34:57 UTC
> Note the "must consider the content". That doesn't sound optional at all.

I've removed the "must" from this sentence.
Comment 6 Anne 2009-09-29 09:15:42 UTC
Not a bug in HTML5.
Comment 7 Julian Reschke 2009-09-29 09:19:03 UTC
If people read the spec as requiring content sniffing, then yes, it is a problem with the spec. One way to address it would be to make this clear in the reference to MIMESNIFF.
Comment 8 Larry Masinter 2010-02-08 16:45:17 UTC
My understanding is that draft-abarth-mime-sniff has not been accepted by IETF to be on standards track yet, and so mandating its behavior normatively in the HTML document is inappropriate.

See also http://www.w3.org/2001/tag/group/track/issues/24 W3C TAG issue 24, which is tracking this issue.

Comment 9 Julian Reschke 2010-02-08 16:52:04 UTC
Well, even if it *was* on the standards track and ready, it *still* would be good if HTML5 clearly said that sniffing is optional.

After all, we just heard from the editor (see http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers do not follow hyperlinks, so why treat this different from other cases?
Comment 10 Adam Barth 2010-02-08 16:57:52 UTC
Is this the TAG finding to which you refer:

http://www.w3.org/2001/tag/doc/mime-respect-20060412

I haven't read it in detail yet, but it looks to be consistent with what HTML5 requires.
Comment 11 Larry Masinter 2010-02-08 17:35:31 UTC
Sorry if my comment was misunderstood. 
I was just trying to point out that there was an extensive TAG discussion of this issue, under TAG issue 24: http://www.w3.org/2001/tag/group/track/issues/24, and that the topic was still open in the TAG and that anyone writing an HTML WG proposal to resolve the issue might well want to consult with the TAG and the proposals and discussions there, or even collaborate with the TAG on the issue.

Some but not all of the www-tag@w3.org emails discussing this issue are linked from the issue or the actions associated with it.

The TAG issue is still open. If there isn't a W3C HTML working group issue on the topic yet, when it is, please link to the TAG issue.

Thanks,

Larry
Comment 12 Julian Reschke 2010-02-08 17:42:10 UTC
Well, this issue is only related.

It is about whether HTML5 should *itself* clearly indicate that sniffing is optional, instead of delegating this question to the MIME-SNIFFING Internet-Draft.
Comment 13 Larry Masinter 2010-02-08 18:17:23 UTC
The definition of the HyperText Markup Language should defer all protocol issues to separate specifications, so no, HTML shouldn't contain a reference to sniffing. 

If there needs to be a browser implementation guide, even the browser implementation guide should be modularized so that "Resolution of hypertext references (aka IRIs)" is a separate implementation guide, listing which schemes should be supported with reference to the scheme implementation guide.

I think the "change proposal" I'd like to see would be to move all references to sniffing to a separate spec; maybe I'll integrate this with the URL change proposal I need to update.

I think if "sniffing" is how HTML browsers are expected to implement the HTTP scheme, it belongs in the HTTP scheme definition. 

At this point, I'd rather see barth-mime-sniff fixed so that it is actually acceptable to the HTTP implementing community best represented in HTTP-BIS. The current mime-sniff document still needs work, in my opinion, which is why I signed up to review it and propose different wording in http://www.w3.org/2001/tag/group/track/actions/386. Might take more than a couple of days, though.
Comment 14 Julian Reschke 2010-02-08 18:21:00 UTC
I'm more than ok with removing mentions of sniffing from HTML5, but I'm not convinced that HTTP is the right place to move it to.
Comment 15 Ian 'Hixie' Hickson 2010-02-08 23:05:49 UTC
> After all, we just heard from the editor (see
> http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers
> do not follow hyperlinks, so why treat this different from other cases?

We shouldn't. I'd be more than happy to move this text back into the spec, as it was when I wrote it and before members of the working group asked for it to be put into a separate spec.
Comment 16 Julian Reschke 2010-02-09 08:54:42 UTC
(In reply to comment #15)
> > After all, we just heard from the editor (see
> > http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers
> > do not follow hyperlinks, so why treat this different from other cases?
> 
> We shouldn't. I'd be more than happy to move this text back into the spec, as
> it was when I wrote it and before members of the working group asked for it to
> be put into a separate spec.

How about leaving it where it is, and just adding the clarification?


Comment 17 Ian 'Hixie' Hickson 2010-02-09 09:37:47 UTC
If you have a specific request, please file a bug (or reopen this bug, if the request is on topic for this bug) for consideration. (I'm not sure what clarification you're referring to.)
Comment 18 Julian Reschke 2010-02-09 09:42:22 UTC
(In reply to comment #17)
> If you have a specific request, please file a bug (or reopen this bug, if the
> request is on topic for this bug) for consideration. (I'm not sure what
> clarification you're referring to.)

The request is to clarify that whenever MIMESNIFF is referred to, UAs may choose not to sniff, and instead accept the given Content-Type information as authoritative.
Comment 19 Maciej Stachowiak 2010-02-09 09:49:26 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > > After all, we just heard from the editor (see
> > > http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers
> > > do not follow hyperlinks, so why treat this different from other cases?
> > 
> > We shouldn't. I'd be more than happy to move this text back into the spec, as
> > it was when I wrote it and before members of the working group asked for it to
> > be put into a separate spec.
> 
> How about leaving it where it is, and just adding the clarification?
> 

draft-abarth-mime-sniff makes sniffing optional, but it would not be accurate to say following MIMESNIFF is optional. draft-abarth-mime-sniff-04 says:

   WARNING!  Whenever possible, user agents SHOULD NOT employ a content
   sniffing algorithm.  However, if a user agent does employ a content
   sniffing algorithm, the user agent SHOULD use the algorithm in this
   document because using a different content sniffing algorithm than
   servers expect causes security problems.  For example, if a server
   believes that the client will treat a contributed file as an image
   (and thus treat it as benign), but a user agent believes the content
   to be HTML (and thus privileged to execute any scripts contained
   therein), an attacker might be able to steal the user's
   authentication credentials and mount other cross-site scripting
   attacks.
 
In other words, it recommends that UAs should not sniff, but if they do, they should use this specific algorithm, not any others. HTML5 does not want that set of recommendations (either don't sniff, or if you do, use this algorithm) to be optional, though specifically choosing the sniffing side of that fork is optional.

I think the only way to convey this accurately would be to duplicate the whole paragraph I just quoted, and even that might not be enough context without duplicating the whole MIMESNIFF introduction. I don't think that would be an improvement.

(PS even though implementors don't always follow references, in this case there is no way to implement the required behavior at all without reading the referenced document.)
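
The fork described above (either do not sniff at all, or sniff with the one shared algorithm) can be illustrated with a minimal Python sketch. All names here are hypothetical, and `toy_sniffer` is a deliberately naive stand-in, not the algorithm from draft-abarth-mime-sniff. It only shows how a user agent that sniffs with its own heuristic can disagree with what the server expects, which is the situation the quoted warning describes.

```python
from typing import Callable, Optional

# A sniffer takes the declared type and the body and returns a media type.
Sniffer = Callable[[str, bytes], str]


def toy_sniffer(declared: str, body: bytes) -> str:
    """Hypothetical ad hoc heuristic, NOT the draft's algorithm."""
    if body.lstrip().lower().startswith(b"<html"):
        return "text/html"
    return declared


def effective_media_type(
    declared: str, body: bytes, sniffer: Optional[Sniffer] = None
) -> str:
    """Return the type a user agent would act on.

    With no sniffer, the declared Content-Type is treated as authoritative.
    With a sniffer, the result depends entirely on which algorithm is used.
    """
    if sniffer is None:
        return declared
    return sniffer(declared, body)


# A contributed file that the server labels as an image.
upload = b"<html><script>alert(1)</script></html>"

print(effective_media_type("image/png", upload))               # image/png
print(effective_media_type("image/png", upload, toy_sniffer))  # text/html
```

In this sketch the non-sniffing path and the ad hoc sniffing path reach different conclusions about the same response, so a server that assumed the upload was benign would be wrong about the second user agent.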
Comment 20 Larry Masinter 2010-02-09 10:00:07 UTC
"... it recommends that UAs should not sniff, but if they do, they
should use this specific algorithm, not any others. HTML5 does not want that
set of recommendations (either don't sniff, or if you do, use this algorithm)
to be optional, though specifically choosing the sniffing side of that fork is
optional...."

It is nonsensical to say that "HTML5 does not want". There is no entity "HTML5" that "wants" something. If sniffing is optional (that is, either sniffing or not sniffing is conforming), there is no reason why it should be non-compliant to, say, sniff for HTML when confronted with text/plain but not sniff for PDF when given text/plain. The "all or nothing" advice on sniffing is inappropriate.

This is a comment on draft-abarth-mime-sniff-04 but is also a comment on the HTML specification from which it was derived.  

Trying to provide an exact algorithm for sniffing makes this difficult to fix; the right fix is to eliminate the algorithm and make normative constraints on the results instead.
Comment 21 Maciej Stachowiak 2010-02-09 10:56:22 UTC
(In reply to comment #20)
> "... it recommends that UAs should not sniff, but if they do, they
> should use this specific algorithm, not any others. HTML5 does not want that
> set of recommendations (either don't sniff, or if you do, use this algorithm)
> to be optional, though specifically choosing the sniffing side of that fork is
> optional...."
> 
> It is nonsensical to say that "HTML5 does not want". There is no entity "HTML5"
> that "wants" something. If sniffing is optional (that either sniffing or not
> sniffing are conforming), there is no reason why it should be non-compliant to,
> say, sniff for HTML when confronted with text/plain but not sniff for PDF when
> given text/plain.   The "all or nothing" advice on sniffing is inappropriate.
> 
> This is a comment on draft-abarth-mime-sniff-04 but is also a comment on the
> HTML specification from which it was derived.  
> 
> Trying to provide an exact algorithm for sniffing makes this difficult to fix;
> the right fix is to eliminate the algorithm and make normative constraints on
> the results instead.
> 

If you read the full introduction to draft-abarth-mime-sniff, you will see that it gives good security justification for using its particular sniffing rules and not some other set. I do not know if all those same security considerations would apply to using a subset of the rules. I do know that if UAs add their own rules, or use arbitrary other ones, then that is definitely a potential source of security problems. Perhaps that is a comment to raise on the mimesniff draft. In any case, I don't think the question of what specific constraints should be placed on sniffing algorithms is best addressed by adding more optionality at the HTML5 level.
Comment 22 Julian Reschke 2010-02-09 14:37:51 UTC
(In reply to comment #19)
> ...
> (PS even though implementors don't always follow references, in this case there
> is no way to implement the required behavior at all without reading the
> referenced document.)
> ...

Implementers are only one of multiple audiences.

I'm not so concerned about implementers, I'm concerned about people reading just HTML5 and concluding that the spec requires sniffing (after all, it has a normative reference to "MIMESNIFF", right?)

One simple way to improve the situation would be to rename the reference.

Another one would be to make the actual references more useful. Right now (2010-02-09) HTML5 has:

"The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of the Content-Type Processing Model specification. [MIMESNIFF]

The algorithm for extracting an encoding from a Content-Type, given a string s, is given in the Content-Type Processing Model specification. It either returns an encoding or nothing. [MIMESNIFF]

The sniffed type of a resource must be found in a manner consistent with the requirements given in the Content-Type Processing Model specification for finding that sniffed type. [MIMESNIFF]

The rules for sniffing images specifically and the rules for distinguishing if a resource is text or binary are also defined in the Content-Type Processing Model specification. Both sets of rules return a MIME type as their result. [MIMESNIFF]

Warning: It is imperative that the rules in the Content-Type Processing Model specification be followed exactly. When a user agent uses different heuristics for content type detection than the server expects, security problems can occur. For more details, see the Content-Type Processing Model specification. [MIMESNIFF]"

That's right: *every single* paragraph ends with a reference to MIMESNIFF. It would be better to reference (and hyperlink) the relevant *sections* in MIMESNIFF that actually contain the referenced material, instead of letting the reader find out.

And yes, this means that the references may break if MIMESNIFF gets updated. That is a feature, not a bug. If MIMESNIFF is a normative reference, then HTML5 had better be checked every time it gets updated.
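
For readers who want a concrete picture of the second quoted paragraph ("extracting an encoding from a Content-Type, given a string s ... returns an encoding or nothing"), here is a rough Python sketch. It uses the standard library's header parsing from the `email` package, which is an assumption made for illustration and not the algorithm defined in MIMESNIFF; the function name `extract_encoding` is hypothetical.

```python
from email.message import Message
from typing import Optional


def extract_encoding(s: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type string, or None.

    Illustrative only: this relies on Python's generic MIME header
    parser, not on the extraction rules in the MIMESNIFF draft.
    """
    msg = Message()
    msg["Content-Type"] = s
    # get_content_charset() lowercases the value and returns None
    # when no charset parameter is present.
    return msg.get_content_charset()


print(extract_encoding("text/html; charset=UTF-8"))  # utf-8
print(extract_encoding("text/plain"))                # None
```

The shape matches the quoted description (a string in, an encoding or nothing out), but edge cases may well differ from what the referenced specification requires.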


Comment 23 Anne 2010-02-09 15:02:16 UTC
I disagree strongly with your last paragraph. Updating references is a pain. And most often the changes do not affect your specification at all. The cost-benefit ratio is not good.
Comment 24 Julian Reschke 2010-02-09 15:16:14 UTC
(In reply to comment #23)
> I disagree strongly with your last paragraph. Updating references is a pain.
> And most often the changes do not affect your specification at all. The
> cost-benefit ratio is not good.

I don't believe it's avoidable when you have specific, normative references (which is the case here).
Comment 25 Larry Masinter 2010-02-11 09:45:00 UTC
"The Content-Type metadata of the result of fetching a representation from a resource depends on the URI scheme and corresponding protocol. However, there are some circumstances where additional heuristics (overriding the protocol defaults) are needed for compatibility with current web sites.  Guidelines for determining an appropriate content-type to presume are being developed elsewhere [MIMESNIFF]; in particular, user agents MUST NOT use additional heuristics or override authoritative metadata in ways that are not explicitly allowed."

This makes it clear that sniffing is MAY and not MUST, that the guidelines are when NOT to sniff (rather than when one MUST sniff), and allows the MIMESNIFF document to evolve independently of the HTML specification.

I think this would resolve the bug in a way that would let HTML go to Last Call.
If MIMESNIFF actually gets onto IETF standards track, then you can update the reference then.

Comment 26 Ian 'Hixie' Hickson 2010-02-17 21:43:59 UTC
> The request is to clarify that whenever MIMESNIFF is referred to, UAs may
> choose not to sniff, and instead accept the given Content-Type information as
> authoritative.

If there are any cases where this is not already the case, please highlight them. As far as I can tell, the spec already unambiguously allows this in all relevant cases.

I have to say, though, that personally I think this is a huge mistake. We're putting spec purity ahead of reliable interoperability here. If it was up to me, we'd go in the other direction entirely and make the algorithm unambiguously required in all cases, with no optional bits.

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: no spec change
Rationale: The spec seems to match the request. Please reopen if there are specific parts of the spec that you would like changed.

Regarding the discussion after the change request, please file separate bugs for each change.
Comment 27 Julian Reschke 2010-02-17 21:54:23 UTC
Will escalate.
Comment 28 Julian Reschke 2010-02-18 15:02:56 UTC
Now http://www.w3.org/html/wg/tracker/issues/104
Comment 29 Maciej Stachowiak 2010-03-14 14:51:53 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Comment 30 Maciej Stachowiak 2010-04-20 06:47:20 UTC
Moving to CLOSED, since the tracker issue is now closed.