This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 11402 - One problem of today's JavaScript libraries is that the client has to download the same library over and over again, while visiting multiple sites. One could use services like Google Libraries API for a central location, but that introduces new points of
Summary: One problem of today's JavaScript libraries is that the client has to downloa...
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-24 22:15 UTC by contributor
Modified: 2013-02-06 20:37 UTC
10 users

See Also:



Description contributor 2010-11-24 22:15:29 UTC
Specification: http://dev.w3.org/html5/spec/Overview.html
Section: http://www.whatwg.org/specs/web-apps/current-work/complete.html#top

Comment:
One problem of today's JavaScript libraries is that the client has to download
the same library over and over again while visiting multiple sites. One could
use services like the Google Libraries API as a central location, but that
introduces new points of failure. For example, Google might be blocked in
certain countries, or might be hacked.

To solve this, I propose a new attribute for the script tag. I would call it
"hash", but that may change. Its value should contain a hash algorithm name,
followed by the hash sum of the referenced JavaScript source. 

If the browser recognizes the attribute, supports the requested hash algorithm,
and the sum matches the JavaScript source, it can cache the file in a special
way: if another site references the same hash algorithm and hash sum, it may
use the cached library, even if the src attribute doesn't match. If any of
these conditions is not met, the hash attribute should be ignored.

I think the HTML5 spec shouldn't mandate particular hash algorithms, as they
may break or better algorithms may be discovered, but it should recommend
support for sha1 and sha256.

jQuery for example could be referenced like this:

<script src="jquery-1.4.4.min.js"
hash="sha1:b75990598ee8d3895448ed9d08726af63109f842"></script>

or:

<script src="jquery-1.4.4.min.js"
hash="sha256:517364f2d45162fb5037437b5b6cb953d00d9b2b3b79ba87d9fe57ea6ee6070c"
></script>
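
For illustration only, the digest value in the examples above could be produced
with any standard hashing tool; a minimal sketch assuming Node.js (the tooling
and file name are not part of the proposal):

// Compute a value for the proposed hash="" attribute (Node.js assumed).
const crypto = require('crypto');
const fs = require('fs');

const source = fs.readFileSync('jquery-1.4.4.min.js');
const digest = crypto.createHash('sha256').update(source).digest('hex');
console.log('sha256:' + digest);  // value to place in the hash attribute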

Posted from: 80.171.214.158
Comment 1 Tab Atkins Jr. 2010-11-24 22:32:28 UTC
This has been suggested before, and it suffers from a trivial and fundamental cache-poisoning problem:

1) Figure out a set of common hashes for common libraries.
2) At your own site, set up a script that would do something evil if you could trick another site into running it for you, like stealing bank info.
3) On a page of your own, link to the script multiple times, each time specifying a different hash from the list in (1).
4) Trick people into visiting your page.

Some fraction of people will visit your page with a fresh "script-hash" cache, thus associating your evil script with that hash.  Later, when they visit a page that legitimately uses the same hash to include a JavaScript library, your script runs instead, and in a trusted context.  Chaos ensues.

This can be trimmed down in important ways, like seeing what hash particular banking websites use, then setting up a standard phishing scheme that just sends the victim to your evil site, poisons their cache, and then immediately redirects them to the actual bank site.  The link in your phishing email thus appears to lead to the bank site like it's supposed to, and so even savvy users can be tricked.
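
A minimal sketch of step (3), purely for illustration: it assumes a naive
implementation that records the claimed hash without verifying it against the
downloaded bytes (the file name and hash list here are hypothetical):

// Attacker page: reference one evil script many times, each time claiming a
// different well-known library hash, so the browser caches evil.js under each
// claimed hash. This only works if the browser does NOT verify the hash.
var knownLibraryHashes = [
  'sha1:b75990598ee8d3895448ed9d08726af63109f842',  // hash claimed for a popular library
  // ... more hashes harvested from popular sites ...
];
knownLibraryHashes.forEach(function (h) {
  var s = document.createElement('script');
  s.src = 'evil.js';           // attacker-controlled file
  s.setAttribute('hash', h);   // claim someone else's hash
  document.head.appendChild(s);
});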

This can potentially be mitigated by specifying particular hash algorithms that can be used, so the browser can verify that the script actually hashes to the provided value before committing it to the cache, but that still leaves us at the mercy of hashing algorithms being strong.  Had this been specified before we knew that MD5 was broken, for example, the attack described above would now be completely doable even *with* hash verification.
Comment 2 Kornel Lesinski 2010-11-24 23:18:00 UTC
(In reply to comment #1)
> This can potentially be mitigated by specifying particular hash algorithms that
> can be used, so the browser can verify that the script actually hashes to the
> provided value before committing it to the cache, but that still leaves us at
> the mercy of hashing algorithms being strong.

Obviously browsers would have to verify the hash and only share verified files.

> but that still leaves us at the mercy of hashing algorithms being strong.

Security of HTTPS is at the mercy of hashing algorithms too.

> Had this been specified before we knew that MD5 was broken, for example, the attack described above would now be completely doable even *with* hash verification.

I disagree. Even today, the attack you've described is not known to be possible.

Replacing a library that has already been signed by another website requires a preimage attack, and not merely a collision. It's possible to generate MD5 collisions, but there is no known preimage attack yet.

If you only have a collision, you first need to convince the attacked site to use a specially crafted JavaScript file containing an unescaped, unmodified binary blob that you know a collision for.
Comment 3 Aryeh Gregor 2010-11-24 23:42:43 UTC
(In reply to comment #1)
> 3) On a page of your own, link to the script multiple times, each time
> specifying a different hash from the list in (1).

Which would do nothing.  In a sane implementation, the browser would just keep a second index for its cache, indexed by the hash of the file.  This would allow it to save some disk space (since it only needs one copy of each file).  Then, if a site requested a resource using hash="", it would first check its index for the file and return it if present.  The hash attribute would be ignored when *storing* the file.
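
A minimal sketch of that approach, assuming a second cache index keyed by the
computed digest of the stored bytes and verification before storing (all names
here are hypothetical):

// The cache is keyed by the digest computed from the bytes themselves, never by
// the hash="" value a page claims, so a poisoned entry cannot be created.
var cacheByHash = new Map();  // "sha256:<hex digest>" -> file bytes

function store(bytes) {
  cacheByHash.set('sha256:' + sha256Hex(bytes), bytes);  // sha256Hex is assumed
}

function lookup(claimedHash) {
  // Safe to serve across sites: the key was derived from the stored bytes.
  return cacheByHash.get(claimedHash);  // undefined on a miss -> fetch from src
}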

> This can potentially be mitigated by specifying particular hash algorithms that
> can be used, so the browser can verify that the script actually hashes to the
> provided value before committing it to the cache, but that still leaves us at
> the mercy of hashing algorithms being strong.  Had this been specified before
> we knew that MD5 was broken, for example, the attack described above would now
> be completely doable even *with* hash verification.

<tangent>
How so?  MD5 has collision attacks against it, but you'd need second-preimage attacks here, which are much harder.  Even MD4, which reportedly <http://valerieaurora.org/monkey.html> has collision attacks that can be carried out by hand, has no practical preimage attack.  (The best attack is 2^102 according to Wikipedia.)  The best preimage attack against MD5 is even less practical (2^123.4).

On top of that, you'd need a preimage attack that allowed you to substitute effectively arbitrary content.  Real-world preimage attacks might generate preimages that are short strings of gibberish, which would be useless for this attack.

In the unlikely event SHA256 (for example) does get a practical second-preimage attack against it anytime soon that's usable for this purpose, there will be plenty of advance warning.  Papers will have been published months or years before pointing out theoretical weaknesses and bringing attacks closer and closer to reach.  There will be ample time to retire it.

(For instance, MD5 had theoretical vulnerabilities first published in 1996, but the first practical break was around 2005 -- and that was only a collision attack.  SHA256 has no theoretical vulnerabilities published yet at all, so we probably have ten years or more before we need to worry about a break here.)

And of course, in the ludicrously implausible scenario that someone publishes a practical preimage attack on SHA256 when there hadn't been significant theoretical problems beforehand, even if they grossly violate ethical standards and publish it with zero advance warning, and even if they include sample code so that there's no delay before attackers get it -- even in this incredibly extreme case, it's just a zero-day vulnerability that happens to hit all browsers at once.  Release a trivial fix and it disappears overnight.  All you have to do to stop it is just clear this special cache and ignore hashes of the bad type.  It's not even a black-box detectable behavior change, it just causes extra page loads.

(It would be very cool if we could skip the whole problem by using a provably secure hash function: <http://en.wikipedia.org/wiki/Provably_secure_cryptographic_hash_function>.  You can construct hash functions whose collision resistance reduces to the security of Diffie-Hellman, for instance, so if they get broken we have bigger problems.  Sadly, they're all horribly inefficient, typically requiring lots of modular exponentiation or such for even small messages.)
</tangent>


The real problem with this is that it will bitrot.  If you update the file but don't update the hash in every single HTML file referring to it, then a bug will occur only for users who happen to have the old file in cache, which will be impossible to reproduce for other people.  Even if the user clears cache, another site might be repopulating it, so the bug will recur for them but not the site admin.  It's not obvious that the saved page loads are realistically worth the danger of this pitfall.  (C.f. resource packages.)
Comment 4 Johannes Barre 2010-11-30 14:50:07 UTC
Hi!

I reported this proposal in the first place, sorry for not replying so far.

> The real problem with this is that it will bitrot.  If you update the file but
> don't update the hash in every single HTML file referring to it, then a bug
> will occur only for users who happen to have the old file in cache, which will
> be impossible to reproduce for other people.  Even if the user clears cache,
> another site might be repopulating it, so the bug will recur for them but not
> the site admin.  It's not obvious that the saved page loads are realistically
> worth the danger of this pitfall.  (C.f. resource packages.)

That's true. How about this:

1) The browser supports the hash attribute
1 1) The browser supports the requested hash algorithm:
1 1 1) The library is already cached (hash matches) -> Use the cached file
1 1 2) The library is not cached -> Download the file and check the hash
1 1 2 1) The downloaded file matches the hash -> Use the downloaded file
1 1 2 2) The downloaded file doesn't match the hash -> Discard the downloaded file, raise a JS exception
1 2) The browser doesn't support the requested hash algorithm -> Download and use the specified file
2) The browser doesn't support the hash attribute -> Download and use the specified file

1 1 2 2) Would make the error easier to track.
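
A minimal sketch of this decision tree in JavaScript-like pseudocode (the
cache and the helper functions are assumptions, named here only for
illustration):

// Sketch of the proposed loading algorithm; all helpers are hypothetical.
var cacheByHash = new Map();  // "algo:digest" -> file bytes

async function loadScript(src, hashAttr) {
  if (!hashAttr) return fetchAndRun(src);                    // case 2)
  var algo = hashAttr.split(':')[0];
  if (!supportsAlgorithm(algo)) return fetchAndRun(src);     // case 1 2)

  var cached = cacheByHash.get(hashAttr);
  if (cached) return run(cached);                            // case 1 1 1)

  var bytes = await fetchBytes(src);                         // case 1 1 2)
  if (algo + ':' + digestHex(algo, bytes) === hashAttr) {    // case 1 1 2 1)
    cacheByHash.set(hashAttr, bytes);  // stored only after verification
    return run(bytes);
  }
  throw new Error('hash mismatch for ' + src);               // case 1 1 2 2)
}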

Regards, Johannes
Comment 5 Tab Atkins Jr. 2010-11-30 16:57:43 UTC
(In reply to comment #4)
> 1) The browser supports the hash attribute
> 1 1) The browser supports the requested hash algorithm:
> 1 1 1) The library is already cached (hash matches) -> Use the cached file
> 1 1 2) The library is not cached -> Download the file and check the hash
> 1 1 2 1) The downloaded file matches the hash -> Use the downloaded file
> 1 1 2 2) The downloaded file doesn't match the hash -> Discard the downloaded
> file, raise a JS exception
> 1 2) The browser doesn't support the requested hash algorithm -> Download and
> use the specified file
> 2) The browser doesn't support the hash attribute -> Download and use the
> specified file
> 
> 1 1 2 2) Would make the error easier to track.

(1 1 1) is the weak point here, and you haven't fixed it.  If they have a cached version and they see an un-updated hash that matches the cached version, they'll continue to get the cached version, not the newly updated one.

And, like Aryeh said, if the user clears their cache but then visits another page using the same older version of the library, they'll re-cache it and hit the same problem *again*.
Comment 6 Johannes Barre 2010-11-30 17:45:12 UTC
You are right, but the error is now easier to track, because the library will not be used if you haven't cached it already and the hash doesn't match the file. Emptying one's cache is a very common debugging procedure for web developers, so it's now more likely that someone will hit the problem.

src and hash now have almost the same effect. If you forget to update your src attribute after uploading a new version of a library to the server under a different URL, you will hit almost the same problem.
Comment 7 Aryeh Gregor 2010-11-30 17:52:18 UTC
(In reply to comment #5)
> (1 1 1) is the weak point here, and you haven't fixed it.  If they have a
> cached version and they see an un-updated hash that matches the cached version,
> they'll continue to get the cached version, not the newly updated one.

Yeah, but anyone who visits with a clean cache is going to get a JavaScript error raised and the script won't load.  This will probably break the page and thus get fixed quickly.  (If it doesn't noticeably break the page, probably loading an old version instead won't either.)

But this just makes bitrot detectable, it doesn't make it painless.  The feature basically says "If you use this and aren't totally sure that you're going to update every single one of your hashes correctly, stuff will break badly, and the only advantage is that users who don't have a library cached from your site but for some reason do have it cached from another site will see slightly better load times."

I don't think it's plausibly worth it.  Cache churn is too great.  The only thing that would make it worthwhile is if browsers shipped with a standard array of various library versions that didn't get cleared from cache -- that would mean you have a 100% hit rate for those libraries on those browsers.

If that's the primary use, then you can make it simpler.  Just have browsers agree on a set of standard library names to prepackage, then use something like

<script src="scripts/jquery-1.4.2.js" library="jquery-1.4.2"></script>

and browsers would agree on what the exact file for "jquery-1.4.2" is, bit for bit.  So basically, instead of a general-purpose sharing scheme, just develop the notion of a standard library for JS.  The set of supported libraries could be updated out-of-band, like malware lists and so on, so users don't have to upgrade their browsers to get the latest libraries.  (Agreeing on which libraries you want to make standard is the tricky part, of course . . .)
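
A minimal sketch of how a UA might resolve such an element, assuming the
prepackaged set is available as a lookup table (all names are hypothetical):

// Resolve @library against a bundled set, falling back to @src so browsers
// that don't recognize the library name keep working. Names are hypothetical.
var bundledLibraries = new Map();  // "jquery-1.4.2" -> exact, byte-for-byte file

function resolveScript(el) {
  var name = el.getAttribute('library');
  if (name && bundledLibraries.has(name)) {
    return bundledLibraries.get(name);  // serve the local, agreed-upon copy
  }
  return fetchBytes(el.src);            // hypothetical network fetch of @src
}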
Comment 8 Kornel Lesinski 2010-11-30 22:05:48 UTC
(In reply to comment #7)

> If that's the primary use, then you can make it simpler.  Just have browsers
> agree on a set of standard library names to prepackage, then use something like
> 
> <script src="scripts/jquery-1.4.2.js" library="jquery-1.4.2"></script>
> 
> and browsers would agree on what the exact file for "jquery-1.4.2" is, bit for
> bit.  So basically, instead of a general-purpose sharing scheme, just develop
> the notion of a standard library for JS.

Libraries simply bundled with browsers would be annoying just like outdated browsers themselves -- webmasters would again face a choice between forgoing this feature and sticking to a buggy version that was shipped with the browser years ago.
Comment 9 Tab Atkins Jr. 2010-11-30 22:11:06 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> > If that's the primary use, then you can make it simpler.  Just have browsers
> > agree on a set of standard library names to prepackage, then use something like
> > 
> > <script src="scripts/jquery-1.4.2.js" library="jquery-1.4.2"></script>
> > 
> > and browsers would agree on what the exact file for "jquery-1.4.2" is, bit for
> > bit.  So basically, instead of a general-purpose sharing scheme, just develop
> > the notion of a standard library for JS.
> 
> Libraries simply bundled with browsers would be annoying just like outdated
> browsers themselves -- webmasters would again face a choice between forgoing
> this feature and sticking to a buggy version that was shipped with the browser
> years ago.

To be fair, no they wouldn't, not the way this was suggested.  The author can just provide both @src and @library, so browsers that don't understand @library or that don't recognize the library specified would just use the @src like normal.

The problem is that once you add a library to the browser you can *never remove it*.  It's now just part of the exposed API.  I dunno how serious this would be in practice.  It would add up fairly quickly, though, adding megabytes to the browser's download size.
Comment 10 Aryeh Gregor 2010-12-02 18:07:38 UTC
(In reply to comment #9)
> The problem is that once you add a library to the browser you can *never remove
> it*.  It's now just part of the exposed API.

Not as long as people provide working src="".  Of course, if all browsers agree on library versions to bundle, people will leave out src="" or point it someplace that doesn't work.

On the other hand, if all browsers agree byte-for-byte on what library versions to bundle, that includes the OS default browser, so browsers can just use the OS versions.  OSes can afford to package a handful more libraries every year indefinitely, I imagine.
Comment 11 Shelley Powers 2010-12-02 19:29:06 UTC
Visiting any number of sites, once we get past the multitude of DNS lookups because of the two dozen or so "social networking" widgets embedded throughout the page, you'll then spend enormous amounts of time downloading big images in ads, and not to mention videos, as well as waiting for Google Analytics and whatever else to load--once it's all been dredged up from an over-burdened database (both at the site, and at remote locations, in the case of remotely managed ads, comments, and so on).

You'll drum your fingers on your iPad, waiting for all this cruft to load. The one thing you probably won't have to wait long for is that tiny JavaScript library. 

I as a web developer can tell you one thing: when I create a web application using a JavaScript library, I want you to load what I tell you to load. I don't want to risk someone hacking through the security -- which they will, which we know they will -- and doing who knows what to my readers or my site, just because, of all the garbage that gets loaded into a web page, we don't want to load something like jQuery (26 KB) more than once.
Comment 12 Ian 'Hixie' Hickson 2011-01-11 00:25:04 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Before we can standardise this kind of thing, we need implementation experience, to show that it could work. I recommend approaching some browser vendors and suggesting this.
Comment 13 Michael[tm] Smith 2011-08-04 05:04:58 UTC
mass-moved component to LC1
Comment 14 Edward O'Connor 2013-02-06 20:37:50 UTC
*** Bug 20789 has been marked as a duplicate of this bug. ***