This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 11895 - Make Downloads more reliable by specifying checksums
Summary: Make Downloads more reliable by specifying checksums
Status: RESOLVED NEEDSINFO
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-27 19:29 UTC by Andrew Roth
Modified: 2013-03-05 00:02 UTC
CC List: 9 users

Description Andrew Roth 2011-01-27 19:29:00 UTC
Problem Statement:
For many downloads (especially large files), the publisher releases MD5 checksums so that recipients can confirm the download arrived as intended.  The problem is that this checksum usually has to be checked manually by the recipient.  The recipient is often unwilling or unable to do this verification and carries on, assuming the download is good without actually checking.

Relates to Spec section:
http://dev.w3.org/html5/spec/links.html#links-created-by-a-and-area-elements
http://dev.w3.org/html5/spec/text-level-semantics.html#the-a-element

Possible Solution:
A possible solution would be to put the checksum in a machine-readable attribute or tag that specifies both the checksum and the algorithm, so that the user agent can verify the download once it finishes and give the user instant feedback.

An example implementation could be:
<p><a href="/files/livecd.iso" checksum="e10c75da3d1aa147ddd4a5c58bfc3646" chkfunc="md5">Click Here to Download</a></p>

This would suggest (not mandate) to the user agent that it verify the download using the MD5 checksum included in the opening 'a' tag.  The reason for specifying the algorithm would be to allow for future enhancements of new (possibly better) checksum functions to be included.
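
As a rough illustration of the check a user agent might run once the download finishes, here is a minimal Python sketch. The function name `verify_download` is hypothetical, and its `checksum` and `chkfunc` parameters simply mirror the attributes in the example link above; none of this is specified behavior.

```python
import hashlib
import hmac


def verify_download(data: bytes, checksum: str, chkfunc: str) -> bool:
    """Return True if `data` hashes to `checksum` under the `chkfunc` algorithm.

    Hypothetical helper: `checksum` and `chkfunc` mirror the proposed attributes.
    """
    algorithm = chkfunc.strip().lower()
    if algorithm not in hashlib.algorithms_available:
        # An unknown algorithm cannot be verified; the caller decides what to do.
        raise ValueError(f"unsupported checksum function: {chkfunc}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Hex digests are case-insensitive, so normalize before comparing.
    return hmac.compare_digest(actual, checksum.strip().lower())
```

Because the algorithm name is passed in rather than hard-coded, a stronger function could be substituted later without changing the markup's shape.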
Comment 1 Boris Zbarsky 2011-01-27 19:50:55 UTC
The Content-MD5 HTTP header should do this, no?
Comment 2 Andrew Roth 2011-01-27 20:45:13 UTC
I had not heard of this before (sorry for not researching better).

I found the spec:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15
And the support in Apache:
http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest

Do you know of any website that implements this?  Or if the major user agents (Firefox/IE/Chrome) support it?

This places a burden on the server because it has to compute the checksum upon every request.  It would be much better to compute it once, and serve it many times.

I don't know of any websites that implement this for large file downloads.  Implementing this change would make it easier for developers and publishers to ensure files are downloaded correctly.

I will try to check and research more on its implementation.
Comment 3 Boris Zbarsky 2011-01-27 21:14:51 UTC
> Or if the major user agents (Firefox/IE/Chrome) support it?

Firefox does not, but there's a patch in progress to add support in the next few months, most likely.

I don't know about the others.


> This places a burden on the server because it has to compute the checksum upon
> every request.

Or just store it on the server and put it in the header, right?
Comment 4 Aryeh Gregor 2011-01-28 20:58:54 UTC
Content-MD5 doesn't help you if the incorrect file was uploaded, or if it was corrupted on disk.  Also, it's harder to configure.
Comment 5 Andrew Roth 2011-01-28 22:12:44 UTC
> Content-MD5 doesn't help you if the incorrect file was uploaded, or if it was
> corrupted on disk.  Also, it's harder to configure.

Unfortunately, nothing will help if you (I presume the publisher) upload the wrong file.  This proposed solution will only make all downloads show up as failed.

Content-MD5 is, however, much more difficult (from the designer/publisher perspective) to configure and use.

> > This places a burden on the server because it has to compute the checksum upon
> > every request.
> 
> Or just store it on the server and put it in the header, right?

Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15

The spec states:
"Only origin servers or clients MAY generate the Content-MD5 header field"

Ref: http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest

Apache's httpd documentation states that the message digest is calculated upon each request, placing a burden on the web server.  In fact, they seem to recommend *against* using this feature by saying:
"Note that this can cause performance problems on your server since the message digest is computed on every request (the values are not cached)."

I want to stress the negative associated with generating the hash upon each request.  For files where this would be useful (very large files) the checksum would take a great deal of compute power to calculate (compared to not calculating, and simply serving the file).  This is normally not expected of web servers, especially shared hosting.

Yes, I understand that by using some scripting magic, one can use PHP or CGI to send stored hash values in the header prior to sending the actual file.  This increases the complexity and difficulty of implementation, which is simply not ideal.
Comment 6 Boris Zbarsky 2011-01-28 22:54:00 UTC
> "Only origin servers or clients MAY generate the Content-MD5 header field"

That's not a problem for the use cases this bug is about.  That just says proxies can't generate this header.

Apache's behavior is pretty suboptimal.  I'd argue that it's a bug, in fact.
Comment 7 Aryeh Gregor 2011-01-30 00:13:35 UTC
(In reply to comment #5)
> Unfortunately, nothing will help if you (I presume the publisher) upload the
> wrong file.  This proposed solution will only make all downloads show up as
> failed.

Which is correct, since they're all corrupted.  Presumably users would complain in this case and the publisher could fix it, rather than all users just silently getting corrupted files.

> Apache's httpd documentation states that the message digest is calculated upon
> each request, placing a burden on the web server.  In fact, they seem to
> recommend *against* using this feature by saying:
> "Note that this can cause performance problems on your server since the message
> digest is computed on every request (the values are not cached)."

This is just because nobody actually has any use for the feature, so nobody's bothered optimizing it.  It's not a necessary problem.  In particular, if downloads are served by a web application instead of by the web server directly, it would be pretty trivial to have the web app compute the hash once at upload time and serve the header.  Commonly you want some kind of user application to serve the file anyway to handle permissions or store download statistics; the performance impact can be negligible if you use X-Sendfile or similar.
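
A minimal Python sketch of the "hash once at upload time" approach, assuming an in-memory dictionary stands in for whatever storage the web app really uses. The names `record_upload` and `download_headers` are hypothetical and do not belong to any real framework. The header value is the base64 encoding of the raw MD5 digest, as RFC 2616 section 14.15 describes.

```python
import base64
import hashlib
from typing import Dict

# Stand-in for the application's per-file metadata store (assumption).
_stored_md5: Dict[str, str] = {}


def record_upload(path: str, data: bytes) -> None:
    """Hash the file once, at upload time, and remember the header value."""
    # Content-MD5 carries the base64 encoding of the raw 16-byte MD5 digest.
    digest = hashlib.md5(data).digest()
    _stored_md5[path] = base64.b64encode(digest).decode("ascii")


def download_headers(path: str) -> Dict[str, str]:
    """Build response headers for a download without rehashing the file."""
    return {"Content-MD5": _stored_md5[path]}
```

Each download request then costs a dictionary lookup rather than a pass over the whole file.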

But a solution in HTML seems more useful anyway, for the reasons I gave.
Comment 8 Ian 'Hixie' Hickson 2011-02-16 09:57:21 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: none yet
Rationale: This has been proposed a number of times over the last few years. It probably deserves a closer look again, especially from the context of an optional reliability indicator that gives the user a warning rather than a mandatory integrity check that blocks access to the file altogether. However, it's probably best if we wait for the new features we've already added to be implemented more widely before we start adding more features, so I'm going to mark this LATER for now.
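
To make the distinction between a warning and a blocking check concrete, here is a hedged Python sketch. The `Integrity` enum and the `assess_download` function are hypothetical names chosen for illustration. The point is only that a mismatch yields a status the user agent can surface, while the downloaded data stays available either way.

```python
import hashlib
from enum import Enum


class Integrity(Enum):
    VERIFIED = "verified"
    MISMATCH = "mismatch"    # warn the user, but keep the file
    UNCHECKED = "unchecked"  # no checksum given, or algorithm unknown


def assess_download(data, checksum=None, algorithm=None):
    """Classify a finished download; this never withholds the data itself."""
    if not checksum or not algorithm:
        return Integrity.UNCHECKED
    name = algorithm.strip().lower()
    if name not in hashlib.algorithms_available:
        # An unsupported algorithm is treated as "no information", not as failure.
        return Integrity.UNCHECKED
    actual = hashlib.new(name, data).hexdigest()
    if actual == checksum.strip().lower():
        return Integrity.VERIFIED
    return Integrity.MISMATCH
```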
Comment 9 Michael[tm] Smith 2011-08-04 05:05:04 UTC
mass-moved component to LC1
Comment 10 Andrew Roth 2012-01-24 12:10:21 UTC
Hello,

I wanted to reopen this because, as the editor noted, "it
probably deserves a closer look again, especially from the context of an
optional reliability indicator that gives the user a warning rather than a
mandatory integrity check that blocks access to the file altogether."

This would be a very useful tool, allowing user agents to detect whether a file was corrupted or tampered with in transit or on the server.  Now that the basic HTML5 features are more widely adopted, this would be a good time to add the feature to the specification.

Any further discussion?
Comment 11 inreacma 2012-04-02 09:33:01 UTC
I'm working on a Firefox add-on that would support this; however, I would like to see this implemented natively in browsers.

My suggestion would be to simplify the tag like this: 

<a href="..." checksum="md5:e10c75da3d1aa147ddd4a5c58bfc3646">

Both the hash method and the value could be stored in one attribute.
Alternatively, one might agree on a data-* attribute, for example:

<a href="..." data-checksum="md5:e10c75da3d1aa147ddd4a5c58bfc3646">

This is what my Firefox add-on uses right now.
https://github.com/grundid/checksum-verifier
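
For illustration only, a small Python sketch of how the combined "algorithm:digest" value could be read out of a page. It is not taken from the add-on linked above; `parse_checksum` and `ChecksumLinkParser` are hypothetical names, and accepting both `checksum` and `data-checksum` is an assumption made to cover the two spellings shown.

```python
from html.parser import HTMLParser


def parse_checksum(value):
    """Split "md5:e10c..." into (algorithm, hex_digest), both lower-cased."""
    algorithm, sep, digest = value.partition(":")
    algorithm = algorithm.strip().lower()
    digest = digest.strip().lower()
    if not sep or not algorithm or not digest:
        raise ValueError(f"expected 'algorithm:digest', got {value!r}")
    return algorithm, digest


class ChecksumLinkParser(HTMLParser):
    """Collect (href, algorithm, digest) for every <a> carrying a checksum."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Accept either spelling of the attribute (assumption).
        value = attributes.get("checksum") or attributes.get("data-checksum")
        if value and attributes.get("href"):
            self.links.append((attributes["href"], *parse_checksum(value)))
```

Feeding a page to `ChecksumLinkParser().feed(html)` leaves the matching links in its `links` list, ready to be checked after each download completes.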
Comment 12 Robin Berjon 2013-01-21 15:59:08 UTC
Mass move to "HTML WG"
Comment 13 Robin Berjon 2013-01-21 16:01:54 UTC
Mass move to "HTML WG"
Comment 14 Travis Leithead [MSFT] 2013-03-05 00:02:00 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Change Description: No Spec Change
Rationale:

This is a great idea for an extension spec! If you are interested in building a proposal for this concept, the working group would love to provide feedback. A how-to guide is now available for putting extension specifications together (including the definition of an "extension spec"):

http://www.w3.org/html/wg/wiki/ExtensionHowTo

Thanks!