<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>11895</bug_id>
          
          <creation_ts>2011-01-27 19:29:00 +0000</creation_ts>
          <short_desc>Make Downloads more reliable by specifying checksums</short_desc>
          <delta_ts>2013-03-05 00:02:00 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>NEEDSINFO</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Andrew Roth">andrew.in.snow+w3c</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>bzbarsky</cc>
    
    <cc>ian</cc>
    
    <cc>inreacma</cc>
    
    <cc>mike</cc>
    
    <cc>mounir</cc>
    
    <cc>plh</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>travil</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>44809</commentid>
    <comment_count>0</comment_count>
    <who name="Andrew Roth">andrew.in.snow+w3c</who>
    <bug_when>2011-01-27 19:29:00 +0000</bug_when>
    <thetext>Problem Statement:
For many downloads (especially large files), the publisher releases MD5 checksums so that recipients can confirm the download arrived intact.  The problem is that verifying the checksum usually requires a manual step by the recipient.  The recipient is often unwilling or unable to do this and simply assumes the download is good.

Relates to Spec section:
http://dev.w3.org/html5/spec/links.html#links-created-by-a-and-area-elements
http://dev.w3.org/html5/spec/text-level-semantics.html#the-a-element

Possible Solution:
A possible solution would be a machine-readable attribute or tag that specifies the checksum and its algorithm, so that the user agent can verify the download once it finishes and give the user instant feedback.
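
A rough sketch of the check a user agent might run once the download finishes, written in Python purely for illustration. The function name verify_download and its parameters are made up for this example and are not part of any spec or browser API:

```python
import hashlib


def verify_download(data, checksum, chkfunc):
    """Return True if the downloaded bytes match the published checksum."""
    # hashlib.new raises ValueError for an algorithm name it does not know.
    digest = hashlib.new(chkfunc, data).hexdigest()
    # Hex digests are case-insensitive, so normalise before comparing.
    return digest == checksum.strip().lower()
```

On a mismatch the user agent could warn the user rather than discard the file.  MD5 only guards against accidental corruption; a stronger algorithm name such as "sha256" can be passed the same way.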

An example implementation could be:
&lt;p&gt;&lt;a href=&quot;/files/livecd.iso&quot; checksum=&quot;e10c75da3d1aa147ddd4a5c58bfc3646&quot; chkfunc=&quot;md5&quot;&gt;Click Here to Download&lt;/a&gt;&lt;/p&gt;

This would suggest (not mandate) to the user agent that it verify the download using the MD5 checksum included in the opening &apos;a&apos; tag.  Specifying the algorithm leaves room for new (possibly better) checksum functions to be supported later.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44811</commentid>
    <comment_count>1</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2011-01-27 19:50:55 +0000</bug_when>
    <thetext>The Content-MD5 HTTP header should do this, no?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44814</commentid>
    <comment_count>2</comment_count>
    <who name="Andrew Roth">andrew.in.snow+w3c</who>
    <bug_when>2011-01-27 20:45:13 +0000</bug_when>
    <thetext>I had not heard of this before (sorry for not researching better).

I found the spec:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15
And the support in Apache:
http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest

Do you know of any website that implements this?  Or if the major user agents (Firefox/IE/Chrome) support it?

This places a burden on the server because it has to compute the checksum upon every request.  It would be much better to compute it once, and serve it many times.

I don&apos;t know of any websites that implement this for large file downloads.  Adding the proposed attribute would make it easier for developers and publishers to ensure files are downloaded correctly.

I will try to check and research more on its implementation.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44815</commentid>
    <comment_count>3</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2011-01-27 21:14:51 +0000</bug_when>
    <thetext>&gt; Or if the major user agents (Firefox/IE/Chrome) support it?

Firefox does not, but there&apos;s a patch in progress to add support in the next few months, most likely.

I don&apos;t know about the others.


&gt; This places a burden on the server because it has to compute the checksum upon
&gt; every request.

Or just store it on the server and put it in the header, right?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44859</commentid>
    <comment_count>4</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-01-28 20:58:54 +0000</bug_when>
    <thetext>Content-MD5 doesn&apos;t help you if the incorrect file was uploaded, or if it was corrupted on disk.  Also, it&apos;s harder to configure.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44869</commentid>
    <comment_count>5</comment_count>
    <who name="Andrew Roth">andrew.in.snow+w3c</who>
    <bug_when>2011-01-28 22:12:44 +0000</bug_when>
    <thetext>&gt; Content-MD5 doesn&apos;t help you if the incorrect file was uploaded, or if it was
&gt; corrupted on disk.  Also, it&apos;s harder to configure.

Unfortunately, nothing will help if you (I presume the publisher) upload the wrong file.  This proposed solution will only make all downloads show up as failed.

Content-MD5 is, however, much more difficult (from the designer/publisher perspective) to configure and use.

&gt; &gt; This places a burden on the server because it has to compute the checksum upon
&gt; &gt; every request.
&gt; 
&gt; Or just store it on the server and put it in the header, right?

Ref: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15

The spec states:
&quot;Only origin servers or clients MAY generate the Content-MD5 header field&quot;

Ref: http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest

Apache&apos;s httpd documentation states that the message digest is calculated upon each request, placing a burden on the web server.  In fact, they seem to recommend *against* using this feature by saying:
&quot;Note that this can cause performance problems on your server since the message digest is computed on every request (the values are not cached).&quot;

I want to stress the cost of generating the hash on each request.  For the files where this would be most useful (very large files), computing the checksum takes a great deal of compute power compared to simply serving the file.  This is normally not expected of web servers, especially on shared hosting.

Yes, I understand that by using some scripting magic, one can use PHP or CGI to send stored hash values in the header prior to sending the actual file.  This increases the complexity and difficulty of implementation, which is simply not ideal.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44870</commentid>
    <comment_count>6</comment_count>
    <who name="Boris Zbarsky">bzbarsky</who>
    <bug_when>2011-01-28 22:54:00 +0000</bug_when>
    <thetext>&gt; &quot;Only origin servers or clients MAY generate the Content-MD5 header field&quot;

That&apos;s not a problem for the use cases this bug is about.  That just says proxies can&apos;t generate this header.

Apache&apos;s behavior is pretty suboptimal.  I&apos;d argue that it&apos;s a bug, in fact.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>44884</commentid>
    <comment_count>7</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-01-30 00:13:35 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; Unfortunately, nothing will help if you (I presume the publisher) upload the
&gt; wrong file.  This proposed solution will only make all downloads show up as
&gt; failed.

Which is correct, since they&apos;re all corrupted.  Presumably users would complain in this case and the publisher could fix it, rather than all users just silently getting corrupted files.

&gt; Apache&apos;s httpd documentation states that the message digest is calculated upon
&gt; each request, placing a burden on the web server.  In fact, they seem to
&gt; recommend *against* using this feature by saying:
&gt; &quot;Note that this can cause performance problems on your server since the message
&gt; digest is computed on every request (the values are not cached).&quot;

This is just because nobody actually has any use for the feature, so nobody&apos;s bothered optimizing it.  It&apos;s not an inherent problem.  In particular, if downloads are served by a web application instead of by the web server directly, it would be pretty trivial to have the web app compute the hash once at upload time and serve the header.  Commonly you want some kind of user application to serve the file anyway to handle permissions or store download statistics; the performance impact can be negligible if you use X-Sendfile or similar.
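
As a minimal sketch of the compute-once idea, in Python for illustration only.  The in-memory dictionary and the function names store_upload and download_headers are hypothetical; a real application would keep the value alongside the file in its own storage:

```python
import base64
import hashlib

# Hypothetical in-memory store: file name mapped to its Content-MD5 value.
_digest_cache = {}


def store_upload(name, data):
    """Hash the file once, at upload time, and remember the header value."""
    # Content-MD5 carries the base64 encoding of the raw 128-bit MD5 digest.
    raw = hashlib.md5(data).digest()
    _digest_cache[name] = base64.b64encode(raw).decode("ascii")


def download_headers(name):
    """Build response headers from the cached value, with no rehashing."""
    return {"Content-MD5": _digest_cache[name]}
```

Each download request then costs a dictionary lookup instead of a pass over the whole file.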

But a solution in HTML seems more useful anyway, for the reasons I gave.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>45696</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-02-16 09:57:21 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: none yet
Rationale: This has been proposed a number of times over the last few years. It probably deserves a closer look again, especially from the context of an optional reliability indicator that gives the user a warning rather than a mandatory integrity check that blocks access to the file altogether. However, it&apos;s probably best if we wait for the new features we&apos;ve already added to be implemented more widely before we start adding more features, so I&apos;m going to mark this LATER for now.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>52881</commentid>
    <comment_count>9</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:05:04 +0000</bug_when>
    <thetext>mass-moved component to LC1</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>63048</commentid>
    <comment_count>10</comment_count>
    <who name="Andrew Roth">andrew.in.snow+w3c</who>
    <bug_when>2012-01-24 12:10:21 +0000</bug_when>
    <thetext>Hello,

I wanted to reopen this because, as the editor noted, &quot;it
probably deserves a closer look again, especially from the context of an
optional reliability indicator that gives the user a warning rather than a
mandatory integrity check that blocks access to the file altogether.&quot;

This would be a very useful tool to allow user agents to detect if a file is corrupted or tampered with in transit or on the server.  Since the basic features of HTML5 are now more widely adopted, now would be a good time to add this feature to the specification.

Any further discussion?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66329</commentid>
    <comment_count>11</comment_count>
    <who name="">inreacma</who>
    <bug_when>2012-04-02 09:33:01 +0000</bug_when>
    <thetext>I&apos;m working on a Firefox add-on that would support this, however I would like to see this as a native implementation in browsers.

My suggestion would be to simplify the tag like this: 

&lt;a href=&quot;...&quot; checksum=&quot;md5:e10c75da3d1aa147ddd4a5c58bfc3646&quot;&gt;

Both the hash method and the value could be stored in one attribute.
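
A small sketch of how such a combined value could be split and checked, in Python for illustration only.  The function names parse_checksum and matches are made up for this example and are not taken from the add-on:

```python
import hashlib


def parse_checksum(value):
    """Split an 'algorithm:hexdigest' string into its two parts."""
    # partition splits at the first colon only.
    algorithm, sep, digest = value.partition(":")
    if not sep or not algorithm or not digest:
        raise ValueError("expected 'algorithm:hexdigest'")
    return algorithm.lower(), digest.lower()


def matches(data, value):
    """Return True if data hashes to the digest named in value."""
    algorithm, digest = parse_checksum(value)
    return hashlib.new(algorithm, data).hexdigest() == digest
```
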
Alternatively, one might agree on a data attribute, for example:

&lt;a href=&quot;...&quot; data-checksum=&quot;md5:e10c75da3d1aa147ddd4a5c58bfc3646&quot;&gt;

This is what my Firefox add-on uses right now.
https://github.com/grundid/checksum-verifier</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>81707</commentid>
    <comment_count>12</comment_count>
    <who name="Robin Berjon">robin</who>
    <bug_when>2013-01-21 15:59:08 +0000</bug_when>
    <thetext>Mass move to &quot;HTML WG&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>81825</commentid>
    <comment_count>13</comment_count>
    <who name="Robin Berjon">robin</who>
    <bug_when>2013-01-21 16:01:54 +0000</bug_when>
    <thetext>Mass move to &quot;HTML WG&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>83971</commentid>
    <comment_count>14</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2013-03-05 00:02:00 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Change Description: No Spec Change
Rationale:

This is a great idea for an extension spec! If you are interested in building a proposal for this concept, the working group would love to provide feedback. A how-to guide is now available for putting extension specifications together (including the definition of an &quot;extension spec&quot;):

http://www.w3.org/html/wg/wiki/ExtensionHowTo

Thanks!</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>