This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Problem Statement:
For many downloads (especially large files), the publisher often releases an MD5 checksum so that recipients can confirm the download arrived as intended. The problem is that this checksum usually has to be checked manually by the recipient. The recipient is often unwilling or unable to do this verification and carries on assuming the download is good, without actually checking.
Relates to Spec section:
http://dev.w3.org/html5/spec/links.html#links-created-by-a-and-area-elements
http://dev.w3.org/html5/spec/text-level-semantics.html#the-a-element
Possible Solution:
A possible solution would be to include the checksum as a machine-readable attribute that specifies the checksum and algorithm, so that the user agent can verify the file after the download finishes and give the user instant feedback. An example implementation could be:
<p><a href="/files/livecd.iso" checksum="e10c75da3d1aa147ddd4a5c58bfc3646" chkfunc="md5">Click Here to Download</a></p>
This would suggest (not mandate) to the user agent that it verify the download using the MD5 checksum included in the opening 'a' tag. Specifying the algorithm allows new (possibly better) checksum functions to be supported in the future.
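For illustration, a minimal sketch of the check a user agent might run once the download finishes, assuming it has already read the checksum and chkfunc values from the link. The function name verify_download and the 1 MiB chunk size are hypothetical choices; hashing in chunks keeps memory use flat even for large files such as ISO images.

```python
import hashlib


def verify_download(path: str, expected_hex: str, algorithm: str = "md5",
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` hashes to `expected_hex`.

    `algorithm` plays the role of the proposed chkfunc attribute and
    `expected_hex` the role of the checksum attribute (hex-encoded digest).
    """
    hasher = hashlib.new(algorithm)  # raises ValueError for unknown names
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-gigabyte files never sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    # Hex digests are case-insensitive; normalize before comparing.
    return hasher.hexdigest() == expected_hex.strip().lower()
```

A user agent would call something like verify_download(saved_path, "e10c75da3d1aa147ddd4a5c58bfc3646", "md5") and, in keeping with the "suggest, not mandate" wording above, show a warning on a mismatch rather than deleting the file.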
The Content-MD5 HTTP header should do this, no?
I had not heard of this before (sorry for not researching better). I found the spec:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15
And the support in Apache:
http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest
Do you know of any website that implements this? Or if the major user agents (Firefox/IE/Chrome) support it?
This places a burden on the server because it has to compute the checksum upon every request. It would be much better to compute it once and serve it many times. I don't know of any websites that implement this for large file downloads. Implementing the proposed attribute would make it easier for developers and publishers to ensure files are downloaded correctly. I will research its implementation further.
> Or if the major user agents (Firefox/IE/Chrome) support it?
Firefox does not, but there's a patch in progress to add support, most likely within the next few months. I don't know about the others.
> This places a burden on the server because it has to compute the checksum upon
> every request.
Or just store it on the server and put it in the header, right?
Content-MD5 doesn't help you if the incorrect file was uploaded, or if it was corrupted on disk. Also, it's harder to configure.
> Content-MD5 doesn't help you if the incorrect file was uploaded, or if it was
> corrupted on disk. Also, it's harder to configure.
Unfortunately, nothing will help if you (I presume the publisher) upload the wrong file. This proposed solution will only make all downloads show up as failed. Content-MD5 is, however, much more difficult (from the designer/publisher perspective) to configure and use.
> > This places a burden on the server because it has to compute the checksum upon
> > every request.
>
> Or just store it on the server and put it in the header, right?
Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15
According to the spec: "Only origin servers or clients MAY generate the Content-MD5 header field"
Ref: http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest
Apache's httpd documentation states that the message digest is calculated upon each request, placing a burden on the web server. In fact, they seem to recommend *against* using this feature by saying: "Note that this can cause performance problems on your server since the message digest is computed on every request (the values are not cached)."
I want to stress the downside of generating the hash on each request. For the files where this would be useful (very large files), computing the checksum takes a great deal of processing power compared to simply serving the file. This is normally not expected of web servers, especially on shared hosting. Yes, I understand that with some scripting magic, one can use PHP or CGI to send stored hash values in the header before sending the actual file. But this increases the complexity and difficulty of implementation, which is simply not ideal.
> "Only origin servers or clients MAY generate the Content-MD5 header field"
That's not a problem for the use cases this bug is about. That just says proxies can't generate this header.
Apache's behavior is pretty suboptimal. I'd argue that it's a bug, in fact.
(In reply to comment #5)
> Unfortunately, nothing will help if you (I presume the publisher) upload the
> wrong file. This proposed solution will only make all downloads show up as
> failed.
Which is correct, since they're all corrupted. Presumably users would complain in this case and the publisher could fix it, rather than all users just silently getting corrupted files.
> Apache's httpd documentation states that the message digest is calculated upon
> each request, placing a burden on the web server. In fact, they seem to
> recommend *against* using this feature by saying:
> "Note that this can cause performance problems on your server since the message
> digest is computed on every request (the values are not cached)."
This is just because nobody actually has any use for the feature, so nobody's bothered optimizing it. It's not a necessary problem. In particular, if downloads are served by a web application instead of by the web server directly, it would be pretty trivial to have the web app compute the hash once at upload time and serve the header. Commonly you want some kind of user application to serve the file anyway, to handle permissions or store download statistics; the performance impact can be negligible if you use X-Sendfile or similar.
But a solution in HTML seems more useful anyway, for the reasons I gave.
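A minimal sketch of the compute-once approach, assuming a web application that hashes each file the first time it is seen and reuses the stored value afterwards. The in-memory dictionary stands in for whatever storage the application actually uses, and the X-Sendfile header assumes a server module such as mod_xsendfile; both are illustrative assumptions. Note that Content-MD5 carries the base64 encoding of the raw 16-byte digest (RFC 1864), not the 32-character hex string publishers usually post next to a download.

```python
import base64
import hashlib

# Hypothetical stand-in for the application's own storage (e.g. a database
# column filled in when the file is uploaded).
_digest_cache: dict[str, str] = {}


def content_md5(path: str) -> str:
    """Return the Content-MD5 value for a file, hashing it only once.

    The header value is the base64 encoding of the raw 16-byte MD5 digest
    (RFC 1864), not the hex string.
    """
    if path not in _digest_cache:
        hasher = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                hasher.update(chunk)
        _digest_cache[path] = base64.b64encode(hasher.digest()).decode("ascii")
    return _digest_cache[path]


def download_headers(path: str) -> list[tuple[str, str]]:
    """Headers a download handler might emit before handing the file off."""
    return [
        ("Content-Type", "application/octet-stream"),
        ("Content-MD5", content_md5(path)),
        # Assumes a server module such as mod_xsendfile streams the file,
        # so the application never reads the body on later requests.
        ("X-Sendfile", path),
    ]
```

Because the digest is cached, only the first request (or the upload handler) pays the hashing cost; the cached entry would have to be cleared if the file is ever replaced.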
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html
Status: Partially Accepted
Change Description: none yet
Rationale: This has been proposed a number of times over the last few years. It probably deserves a closer look again, especially from the context of an optional reliability indicator that gives the user a warning rather than a mandatory integrity check that blocks access to the file altogether. However, it's probably best if we wait for the new features we've already added to be implemented more widely before we start adding more features, so I'm going to mark this LATER for now.
mass-moved component to LC1
Hello, I wanted to reopen this because, as the editor noted, "it probably deserves a closer look again, especially from the context of an optional reliability indicator that gives the user a warning rather than a mandatory integrity check that blocks access to the file altogether." This would be a very useful tool for letting user agents detect whether a file has been corrupted or tampered with in transit or on the server. Since the basic HTML5 features are now widely adopted, this would be a good time to add this feature to the specification. Any further discussion?
I'm working on a Firefox add-on that would support this; however, I would like to see this implemented natively in browsers. My suggestion would be to simplify the markup like this:
<a href="..." checksum="md5:e10c75da3d1aa147ddd4a5c58bfc3646">
Both the hash method and the value could be stored in one attribute. Alternatively, one might agree on a data- attribute, for example:
<a href="..." data-checksum="md5:e10c75da3d1aa147ddd4a5c58bfc3646">
This is what my Firefox add-on uses right now:
https://github.com/grundid/checksum-verifier
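To make the combined "algorithm:digest" format concrete, here is a minimal parsing sketch. It is not taken from the checksum-verifier add-on; the function names and the SUPPORTED allow-list are hypothetical, since no spec defines which algorithm names would be valid. It rejects values with no separator, unknown algorithms, and digests of the wrong length before comparing anything.

```python
import hashlib

# Hypothetical allow-list; a real spec would have to define the valid names.
SUPPORTED = {"md5", "sha1", "sha256", "sha384", "sha512"}


def parse_checksum(attr: str) -> tuple[str, str]:
    """Split a value such as "md5:e10c..." into (algorithm, hex digest)."""
    algorithm, sep, digest = attr.partition(":")
    if not sep or not digest.strip():
        raise ValueError(f"expected '<algorithm>:<hex digest>', got {attr!r}")
    algorithm = algorithm.strip().lower()
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    digest = digest.strip().lower()
    # A hex digest has two characters per byte of the algorithm's output.
    expected_len = 2 * hashlib.new(algorithm).digest_size
    if len(digest) != expected_len or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"malformed {algorithm} digest: {digest!r}")
    return algorithm, digest


def checksum_matches(attr: str, data: bytes) -> bool:
    """Check downloaded bytes against a checksum or data-checksum value."""
    algorithm, expected = parse_checksum(attr)
    return hashlib.new(algorithm, data).hexdigest() == expected
```

Keeping the algorithm name inside the value means a single attribute can move from md5 to a stronger function later without any change to the markup structure.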
Mass move to "HTML WG"
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the Editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the Tracker Issue; or you may create a Tracker Issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html
Status: Additional Information Needed
Change Description: No Spec Change
Rationale: This is a great idea for an extension spec! If you are interested in building a proposal for this concept, the working group would love to provide feedback. A how-to guide is now available for putting extension specifications together (including the definition of an "extension spec"): http://www.w3.org/html/wg/wiki/ExtensionHowTo Thanks!