This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
"The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of the Content-Type Processing Model specification. [MIMESNIFF]" This *seems* to undo a change we discussed a long time ago, and which resulted in content type sniffing being optional.
[MIMESNIFF]'s algorithm allows implementors to use the actual type without sniffing.
Are you referring to the end of <http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-1>? I agree that this makes it optional, but I think both HTML5 and MIMESNIFF could be clearer about that. For instance, the abstract says: Many web servers supply incorrect Content-Type headers with their HTTP responses. In order to be compatible with these servers, user agents must consider the content of HTTP responses as well as the Content-Type header when determining the effective media type of the response. This document describes an algorithm for determining the effective media type of HTTP responses that balances security and compatibility considerations. Note the "must consider the content". That doesn't sound optional at all.
(In reply to comment #2) > Are you referring to the end of > <http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-1>? > > I agree that this makes it optional, but I think both HTML5 and MIMESNIFF could > be clearer about that. For instance, the abstract says: > > Many web servers supply incorrect Content-Type headers with their > HTTP responses. In order to be compatible with these servers, user > agents must consider the content of HTTP responses as well as the > Content-Type header when determining the effective media type of the > response. This document describes an algorithm for determining the > effective media type of HTTP responses that balances security and > compatibility considerations. > > Note the "must consider the content". That doesn't sound optional at all. > I suspect this was an error in draft-abarth-mime-sniff - it looks to me like it inadvertently used the word "must" in a non-normative context. The draft capitalizes RFC2119 keywords when it means them. This should be reported as a comment on MIMESNIFF.
> Note the "must consider the content". That doesn't sound optional at all. I've removed the "must" from this sentence.
http://www.ietf.org/id/draft-abarth-mime-sniff-02.txt
Not a bug in HTML5.
If people read the spec as requiring content sniffing, then yes, it is a problem with the spec. One way to address it would be to make this clear in the reference to MIMESNIFF.
My understanding is that draft-abarth-mime-sniff has not been accepted by IETF to be on standards track yet, and so mandating its behavior normatively in the HTML document is inappropriate. See also http://www.w3.org/2001/tag/group/track/issues/24 W3C TAG issue 24, which is tracking this issue.
Well, even if it *was* on the standards track and ready, it *still* would be good if HTML5 clearly said that sniffing is optional. After all, we just heard from the editor (see http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers do not follow hyperlinks, so why treat this different from other cases?
Is this the TAG finding to which you refer? http://www.w3.org/2001/tag/doc/mime-respect-20060412 I haven't read it in detail yet, but it looks to be consistent with what HTML5 requires.
Sorry if my comment was misunderstood. I was just trying to point out that there was an extensive TAG discussion of this issue, under TAG issue 24: http://www.w3.org/2001/tag/group/track/issues/24, that the topic is still open in the TAG, and that anyone writing an HTML WG proposal to resolve the issue might well want to consult with the TAG and the proposals and discussions there, or even collaborate with the TAG on the issue. Some but not all of the www-tag@w3.org emails discussing this issue are linked from the issue or the actions associated with it. The TAG issue is still open. If there isn't a W3C HTML working group issue on the topic yet, then when there is one, please link it to the TAG issue. Thanks, Larry
Well, this issue is only related. It is about whether HTML5 should *itself* clearly indicate that sniffing is optional, instead of delegating this question to the MIME-SNIFFING Internet-Draft.
The definition of the HyperText Markup Language should defer all protocol issues to separate specifications, so no, HTML shouldn't contain a reference to sniffing. If there needs to be a browser implementation guide, even the browser implementation guide should be modularized so that "Resolution of hypertext references (aka IRIs)" is a separate implementation guide, listing which schemes should be supported with reference to the scheme implementation guide. I think the "change proposal" I'd like to see would be to move all references to sniffing to a separate spec; maybe I'll integrate this with the URL change proposal I need to update. I think if "sniffing" is how HTML browsers are expected to implement the HTTP scheme, it belongs in the HTTP scheme definition. At this point, I'd rather see barth-mime-sniff fixed so that it is actually acceptable to the HTTP implementing community best represented in HTTP-BIS. The current mime-sniff document still needs work, in my opinion, which is why I signed up to review it and propose different wording in http://www.w3.org/2001/tag/group/track/actions/386. Might take more than a couple of days, though.
I'm more than ok with removing mentions of sniffing from HTML5, but I'm not convinced that HTTP is the right place to move it to.
> After all, we just heard from the editor (see > http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers > do not follow hyperlinks, so why treat this different from other cases? We shouldn't. I'd be more than happy to move this text back into the spec, as it was when I wrote it and before members of the working group asked for it to be put into a separate spec.
(In reply to comment #15) > > After all, we just heard from the editor (see > > http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers > > do not follow hyperlinks, so why treat this different from other cases? > > We shouldn't. I'd be more than happy to move this text back into the spec, as > it was when I wrote it and before members of the working group asked for it to > be put into a separate spec. How about leaving it where it is, and just adding the clarification?
If you have a specific request, please file a bug (or reopen this bug, if the request is on topic for this bug) for consideration. (I'm not sure what clarification you're referring to.)
(In reply to comment #17) > If you have a specific request, please file a bug (or reopen this bug, if the > request is on topic for this bug) for consideration. (I'm not sure what > clarification you're referring to.) The request is to clarify that whenever MIMESNIFF is referred to, UAs may choose not to sniff, and instead accept the given Content-Type information as authoritative.
(In reply to comment #16) > (In reply to comment #15) > > > After all, we just heard from the editor (see > > > http://lists.w3.org/Archives/Public/public-html/2010Feb/0164.html) that readers > > > do not follow hyperlinks, so why treat this different from other cases? > > > > We shouldn't. I'd be more than happy to move this text back into the spec, as > > it was when I wrote it and before members of the working group asked for it to > > be put into a separate spec. > > How about leaving it where it is, and just adding the clarification? > draft-abarth-mime-sniff makes sniffing optional, but it would not be accurate to say following MIMESNIFF is optional. draft-abarth-mime-sniff-04 says: WARNING! Whenever possible, user agents SHOULD NOT employ a content sniffing algorithm. However, if a user agent does employ a content sniffing algorithm, the user agent SHOULD use the algorithm in this document because using a different content sniffing algorithm than servers expect causes security problems. For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a user agent believes the content to be HTML (and thus privileged to execute any scripts contained therein), an attacker might be able to steal the user's authentication credentials and mount other cross-site scripting attacks. In other words, it recommends that UAs should not sniff, but if they do, they should use this specific algorithm, not any others. HTML5 does not want that set of recommendations (either don't sniff, or if you do, use this algorithm) to be optional, though specifically choosing the sniffing side of that fork is optional. I think the only way to convey this accurately would be to duplicate the whole paragraph I just quoted, and even that might not be enough context without duplicating the whole MIMESNIFF introduction. I don't think that would be an improvement. 
(PS even though implementors don't always follow references, in this case there is no way to implement the required behavior at all without reading the referenced document.)
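As a rough illustration of the fork described above (either do not sniff at all, or sniff with the one shared algorithm), a minimal Python sketch might look like the following. The names effective_type and toy_sniffer are hypothetical, the fallback to application/octet-stream when no type is declared is an assumption, and the single PNG signature check is only a stand-in: it is not the rule set from draft-abarth-mime-sniff.

```python
from typing import Callable, Optional

# Hypothetical type for "the one agreed sniffing algorithm".
Sniffer = Callable[[Optional[str], bytes], str]

# Assumed fallback when the server declares no type at all.
DEFAULT_TYPE = "application/octet-stream"


def effective_type(declared: Optional[str], body: bytes,
                   sniffer: Optional[Sniffer] = None) -> str:
    """Return the media type a user agent would act on.

    With no sniffer, the declared Content-Type is treated as authoritative.
    With a sniffer, the decision is delegated entirely to that one function,
    rather than mixing in ad hoc heuristics.
    """
    if sniffer is None:
        return declared or DEFAULT_TYPE
    return sniffer(declared, body)


def toy_sniffer(declared: Optional[str], body: bytes) -> str:
    """Stand-in sniffer: NOT the MIMESNIFF rules, just one illustrative check."""
    # Only look at the content when the server declared nothing.
    if declared is None and body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return declared or DEFAULT_TYPE
```

Calling effective_type("text/plain", body) with no sniffer takes the non-sniffing side of the fork; passing a sniffer takes the other side, and the point made in the thread is that every user agent on that side should be passing the same function.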
"... it recommends that UAs should not sniff, but if they do, they should use this specific algorithm, not any others. HTML5 does not want that set of recommendations (either don't sniff, or if you do, use this algorithm) to be optional, though specifically choosing the sniffing side of that fork is optional...." It is nonsensical to say that "HTML5 does not want". There is no entity "HTML5" that "wants" something. If sniffing is optional (that either sniffing or not sniffing areconforming), there is no reason why it should be non-complaint to, say, sniff for HTML when confronted with text/plain but not sniff for PDF when given text/plain. The "all or nothing" advice on sniffing is inappropriate. This is a comment on draft-abarth-mime-sniff-04 but is also a comment on the HTML specification from which it was derived. Trying to provide an exact algorithm for sniffing makes this difficult to fix; the right fix is to eliminate the algorithm and make normative constraints on the results instead.
(In reply to comment #20) > "... it recommends that UAs should not sniff, but if they do, they > should use this specific algorithm, not any others. HTML5 does not want that > set of recommendations (either don't sniff, or if you do, use this algorithm) > to be optional, though specifically choosing the sniffing side of that fork is > optional...." > > It is nonsensical to say that "HTML5 does not want". There is no entity "HTML5" > that "wants" something. If sniffing is optional (that is, either sniffing or not > sniffing is conforming), there is no reason why it should be non-compliant to, > say, sniff for HTML when confronted with text/plain but not sniff for PDF when > given text/plain. The "all or nothing" advice on sniffing is inappropriate. > > This is a comment on draft-abarth-mime-sniff-04 but is also a comment on the > HTML specification from which it was derived. > > Trying to provide an exact algorithm for sniffing makes this difficult to fix; > the right fix is to eliminate the algorithm and make normative constraints on > the results instead. > If you read the full introduction to draft-abarth-mime-sniff, you will see that it gives good security justification for using its particular sniffing rules and not some other set. I do not know if all those same security considerations would apply to using a subset of the rules. I do know that if UAs add their own rules, or use arbitrary other ones, then that is definitely a potential source of security problems. Perhaps that is a comment to raise on the mimesniff draft. In any case, I don't think the question of what specific constraints should be placed on sniffing algorithms is best addressed by adding more optionality at the HTML5 level.
(In reply to comment #19) > ... > (PS even though implementors don't always follow references, in this case there > is no way to implement the required behavior at all without reading the > referenced document.) > ... Implementers are only one of multiple audiences. I'm not so concerned about implementers, I'm concerned about people reading just HTML5 and concluding that the spec requires sniffing (after all, it has a normative reference to "MIMESNIFF", right?) One simple way to improve the situation would be to rename the reference. Another one would be to make the actual references more useful. Right now (2010-02-09) HTML5 has: "The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of the Content-Type Processing Model specification. [MIMESNIFF] The algorithm for extracting an encoding from a Content-Type, given a string s, is given in the Content-Type Processing Model specification. It either returns an encoding or nothing. [MIMESNIFF] The sniffed type of a resource must be found in a manner consistent with the requirements given in the Content-Type Processing Model specification for finding that sniffed type. [MIMESNIFF] The rules for sniffing images specifically and the rules for distingushing if a resource is text or binary are also defined in the Content-Type Processing Model specification. Both sets of rules return a MIME type as their result. [MIMESNIFF] Warning: It is imperative that the rules in the Content-Type Processing Model specification be followed exactly. When a user agent uses different heuristics for content type detection than the server expects, security problems can occur. For more details, see the Content-Type Processing Model specification. [MIMESNIFF]" That's right: *every single* paragraph ends with a reference to MIMESNIFF. 
It would be better to reference (and hyperlink) the relevant *sections* in MIMESNIFF that actually contain the referenced material, instead of leaving the reader to find them. And yes, this means that the references may break if MIMESNIFF gets updated. That is a feature, not a bug. If MIMESNIFF is a normative reference, then HTML5 had better be checked every time it gets updated.
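One of the quoted paragraphs describes an algorithm that, given a string s, "either returns an encoding or nothing". As a loose illustration of that shape only, the sketch below uses the parameter parsing in Python's standard email package; it is not the extraction algorithm from the Content-Type Processing Model specification, which defines its own steps, and the function name extract_charset is hypothetical.

```python
from email.message import Message
from typing import Optional


def extract_charset(s: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    Uses the standard library's header parameter parsing, which is an
    assumption made for illustration and not the MIMESNIFF algorithm.
    """
    msg = Message()
    msg["Content-Type"] = s
    charset = msg.get_param("charset")
    # RFC 2231 encoded parameters come back as (charset, language, value).
    if isinstance(charset, tuple):
        charset = charset[2]
    if not charset:
        return None
    return charset.strip().lower()
```

For example, extract_charset("text/html; charset=UTF-8") yields "utf-8", while extract_charset("text/html") yields None, matching the "encoding or nothing" contract described in the quoted text.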
I disagree strongly with your last paragraph. Updating references is a pain. And most often the changes do not affect your specification at all. The cost-benefit ratio is not good.
(In reply to comment #23) > I disagree strongly with your last paragraph. Updating references is a pain. > And most often the changes do not affect your specification at all. The > cost-benefit ratio is not good. I don't believe it's avoidable when you have specific, normative references (which is the case here).
"The Content-Type metadata of the result of fetching a representation from a resource depends on the URI scheme and corresponding protocol. However, there are some circumstances where additional heuristics (overriding the protocol defaults) are needed for compatibility with current web sites. Guidelines for determining an appropriate content-type to presume are being developed elsewhere [MIMESNIFF]; in particular, user agents MUST NOT use additional heuristics or override authoritative metadata in ways that are not explicitly allowed." This makes it clear that sniffing is MAY and not MUST, that the guidelines are when NOT to sniff (rather than when one MUST sniff), and allows the MIMESNIFF document to evolve independently of the HTML specification. I think this would resolve the bugin a way that would let HTML go to Last Call. If MIMESNIFF actually gets onto IETF standards track, then you can update the reference then.
> The request is to clarify that whenever MIMESNIFF is referred to, UAs may > choose not to sniff, and instead accept the given Content-Type information as > authoritative. If there are any cases where this is not already the case, please highlight them. As far as I can tell, the spec already unambiguously allows this in all relevant cases. I have to say, though, that personally I think this is a huge mistake. We're putting spec purity ahead of reliable interoperability here. If it was up to me, we'd go in the other direction entirely and make the algorithm unambiguously required in all cases, with no optional bits. EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html Status: Accepted Change Description: no spec change Rationale: The spec seems to match the request. Please reopen if there are specific parts of the spec that you would like changed. Regarding the discussion after the change request, please file separate bugs for each change.
Will escalate.
Now http://www.w3.org/html/wg/tracker/issues/104
This bug predates the HTML Working Group Decision Policy. If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Moving to CLOSED, since the tracker issue is now closed.