Bug 17974 - appcache: Add an API to make appcache support caching specific URLs dynamically (and/or a JS "server" or "interceptor" for uncached resources?)
Status: RESOLVED DUPLICATE of bug 20084
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
Reported: 2012-07-18 07:26 UTC by contributor
Modified: 2012-11-26 02:45 UTC

Description contributor 2012-07-18 07:26:44 UTC
This was cloned from bug 14364 as part of operation convergence.
Originally filed: 2011-10-03 14:00:00 +0000
Original reporter: Louis-R <louisremi@mozilla.com>

================================================================================
 #0   Louis-R                                         2011-10-03 14:00:00 +0000 
--------------------------------------------------------------------------------
A simple way to make the appcache dynamic would be to allow data: URIs as manifests, so that scripts can request that new resources be cached without server round-trips.

This is of course not an ideal solution to make the appcache dynamic, but it is one that is easy to implement and to get out the door quickly.
================================================================================
 #1   Ian 'Hixie' Hickson                             2011-10-03 22:35:53 +0000 
--------------------------------------------------------------------------------
We're not going to add sub-optimal solutions just so we can get something out one year earlier, when the Web is going to last decades. :-)

What we need here is a clear understanding of the use cases and requirements. What are the cases where you're wishing you could add URLs to the appcache dynamically?
================================================================================
 #2   Philipp Hagemeister                             2011-10-07 12:35:38 +0000 
--------------------------------------------------------------------------------
Wouldn't that allow anyone to hijack a website forever?

1. Attacker temporarily gains control over the content of http://example.com/ , and writes

<html manifest="data:text/cache-manifest;base64,Q0FDSEUgTUFOSUZFU1QK">
example.com defaced!
</html>

2. User visits http://example.com/, puts the page in appcache.

3. Rightful owner of example.com regains control (or domain ownership changes if the domain was hijacked, ...).

4. User visits http://example.com/, still sees defacement.

How can the rightful owner of example.com ever serve the user anything?
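
For reference, the base64 payload in the manifest attribute above can be decoded to see what such a manifest would contain. This is only a sketch: `decodeBase64`, `manifestUri` and `manifestText` are illustrative names, and the helper falls back to Node's Buffer when atob() is not available.

```javascript
// Decode base64 with atob() where it exists, otherwise with Node's Buffer.
function decodeBase64(text) {
  return typeof atob === "function"
    ? atob(text)
    : Buffer.from(text, "base64").toString("binary");
}

// The data: URI from the example above; the payload follows the comma.
const manifestUri = "data:text/cache-manifest;base64,Q0FDSEUgTUFOSUZFU1QK";
const payload = manifestUri.slice(manifestUri.indexOf(",") + 1);
const manifestText = decodeBase64(payload);

console.log(JSON.stringify(manifestText)); // "CACHE MANIFEST\n"
```

The decoded manifest is just the required first line and lists no resources, and because it lives in a data: URI there is no manifest on the server that the owner could later change.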


On the other hand, locking the content (and scripts) of a website forever could also provide benefits to a carefully-engineered project. JavaScript on the page could somehow download the new version, cryptographically verify it (beyond SSL, which may be compromised by .gov actors, like google.com in Iran recently), and only then update to the new version.
================================================================================
 #3   Ian 'Hixie' Hickson                             2011-10-21 22:43:39 +0000 
--------------------------------------------------------------------------------
Yeah we're definitely not using data: for this.


Status: Did Not Understand Request
Change Description: no spec change
Rationale: What are the use cases for making appcache dynamic? (I'm not saying there aren't any, I just need to know what they are to design the solution for them.)
================================================================================
 #4   Louis-R                                         2011-10-24 19:52:37 +0000 
--------------------------------------------------------------------------------
Granted, using data: isn't the best option.

I've written an extensive blog post about the use cases for a dynamic appcache: http://www.louisremi.com/2011/10/07/offline-web-applications-were-not-there-yet/

tl;dr: if you build an RSS reader with a checkbox to make articles available offline, it's easy to store/delete the text content of the article at will using localStorage or IndexedDB, but it's impossible to store/delete associated images (and sounds/videos). You could dynamically generate a cache manifest for all "offline enabled" articles, but the client would have to re-download all resources every time the manifest is updated, as you know. (And you can't store images as data: URIs, since they come from different origins.)

Mozilla implemented a simple "OfflineResourceList" API which solves that problem by enhancing applicationCache with "add()" and "remove()" methods.
This is the kind of solution I am looking for, although "add" is a confusing name, since it should be able to update a particular resource too.

There is a risk that this API could cause confusion amongst web developers. Should they use a cache manifest or abandon it completely in favor of the JS API? I believe the cache manifest should be advocated for the application structure+presentation+logic (HTML, CSS, JS), while the dynamic API should be used for the application *content* (media, XML, JSON).
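
A minimal sketch of how the RSS reader case might use such an API. The `cache` argument is a hypothetical stand-in that exposes only the add() and remove() methods described above; it is not the real applicationCache object, and `setArticleOffline`, `createFakeCache` and the article shape are invented for illustration.

```javascript
// Hypothetical helper: `cache` is anything with add(url) and remove(url).
function setArticleOffline(cache, article, offline) {
  for (const url of article.imageUrls) {
    if (offline) {
      cache.add(url);    // the request above also wants this to refresh a cached URL
    } else {
      cache.remove(url);
    }
  }
}

// In-memory fake, used only so the sketch runs outside a browser.
function createFakeCache() {
  const urls = new Set();
  return {
    add: (url) => { urls.add(url); },
    remove: (url) => { urls.delete(url); },
    urls,
  };
}

const fakeCache = createFakeCache();
const article = {
  id: 42,
  imageUrls: [
    "http://images.example.org/a.png",
    "http://images.example.org/b.png",
  ],
};

setArticleOffline(fakeCache, article, true);   // checkbox ticked
console.log(fakeCache.urls.size); // 2
setArticleOffline(fakeCache, article, false);  // checkbox cleared
console.log(fakeCache.urls.size); // 0
```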
================================================================================
 #5   Ian 'Hixie' Hickson                             2011-10-25 02:26:46 +0000 
--------------------------------------------------------------------------------
Thanks, will investigate.
================================================================================
 #6   Ian 'Hixie' Hickson                             2011-10-27 00:15:14 +0000 
--------------------------------------------------------------------------------
So the problem is that you write an application that, while online, downloads a bunch of data from the server, and this data includes references to cross-origin images, and you want to make sure that those get cached immediately too, because otherwise, when the user later goes offline and tries to use that data, the browser won't be able to show the images?

You can work around that today using the FALLBACK section, no? (List the foreign image sites as fallback namespaces that fall back to a "broken image" icon, say, and then when you fetch all the data from your server, quickly also create <img> elements for all those foreign images. They'll then be cached.)
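
A rough sketch of that workaround, assuming a manifest whose FALLBACK section lists the foreign image host (for example a line such as "http://images.example.org/ /broken-image.png", where both names are made up). `warmImages` and its `createImage` parameter are hypothetical; in a page, `createImage` would default to creating a real image element. Whether the browser then keeps those images available offline is what the comment suggests, not something this code checks.

```javascript
// Create one image element per URL so the browser starts fetching each one.
// `createImage` is injectable so the sketch also runs outside a browser.
function warmImages(imageUrls, createImage = () => new Image()) {
  return imageUrls.map((url) => {
    const img = createImage();
    img.src = url; // in a browser, setting src starts the fetch
    return img;    // returned so the caller can attach load/error handlers
  });
}

// Outside a browser, a plain object can stand in for an <img> element.
const warmed = warmImages(
  ["http://images.example.org/a.png", "http://images.example.org/b.png"],
  () => ({ src: "" })
);
console.log(warmed.map((img) => img.src));
```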

Still, I could see how that wouldn't be satisfactory. So for this use case, we'd need an API to add a URL to the cache manually, an API to remove a URL from the cache manually, and an API to list all the files that have been added manually? That seems easy enough to support.
================================================================================
 #7   Ian 'Hixie' Hickson                             2011-11-03 16:03:26 +0000 
--------------------------------------------------------------------------------
Status: Partially Accepted
Change Description: none yet
Rationale: The use case described in comment 6 seems reasonable. I have marked this LATER so that we can look at this once browsers have caught up with what we've specified so far.
================================================================================
 #8   Simon Pieters                                   2011-11-04 06:16:57 +0000 
--------------------------------------------------------------------------------
I believe this has already happened.
================================================================================
 #9   Ian 'Hixie' Hickson                             2011-11-04 17:08:04 +0000 
--------------------------------------------------------------------------------
I didn't mean just with appcache.

Do I take it from your comment that there is implementation interest in adding this now?
================================================================================
 #10  Anne                                            2011-11-15 12:18:52 +0000 
--------------------------------------------------------------------------------
It seems both developers and implementors want this, yes.
================================================================================
 #11  michaeln@google.com                             2011-11-15 22:48:10 +0000 
--------------------------------------------------------------------------------
I think this request makes sense and would be a great convenience, but it is not the most pressing issue to resolve.

Tweaking the model for loading pages from, associating pages with, and updating caches, such that it works for a wider variety of use cases, is more of a priority (imo). I'd like to see that get in better shape prior to mixing in support for ad-hoc resources.
================================================================================
 #12  Ian 'Hixie' Hickson                             2012-05-03 18:12:24 +0000 
--------------------------------------------------------------------------------
An idea I was kicking around would be to instead have just a way to declare a JS file as being a local interceptor, and then have that JS file be automatically launched in a worker thread, and then every network request gets proxied through that worker in some well-defined manner. The worker could then either say "do whatever you would normally do for that URL", or "redirect to this URL and try again", or "here's the data for that URL".

That would allow authors to implement the above add/remove functionality themselves just by pushing the data into a blob store (Filesystem API, IndexedDB), which would be just a few lines of code, while also allowing much more flexible approaches.
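
To make the three possible answers concrete, here is a small sketch of the decision such a worker might make. Everything in it is hypothetical: `decide`, the shape of the returned objects, and the `store` map (standing in for a blob store such as IndexedDB) are invented for illustration, and no browser exposes this API.

```javascript
// Decide how to answer one intercepted request.
// `store` maps URLs to locally saved bodies; `redirects` maps URLs to URLs.
function decide(url, store, redirects) {
  if (store.has(url)) {
    return { kind: "data", body: store.get(url) };       // "here's the data for that URL"
  }
  if (redirects.has(url)) {
    return { kind: "redirect", to: redirects.get(url) }; // "redirect to this URL and try again"
  }
  return { kind: "default" };                            // "do whatever you would normally do"
}

const store = new Map();
const redirects = new Map([["/old.png", "/new.png"]]);

store.set("/feed/1.png", "image bytes");                    // the "add" operation
console.log(decide("/feed/1.png", store, redirects).kind); // "data"
store.delete("/feed/1.png");                                // the "remove" operation
console.log(decide("/feed/1.png", store, redirects).kind); // "default"
console.log(decide("/old.png", store, redirects).kind);    // "redirect"
```

With this shape, the add/remove functionality reduces to writing to and deleting from the store.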

Any opinions?
================================================================================
 #13  Philipp Hagemeister                             2012-05-03 21:05:53 +0000 
--------------------------------------------------------------------------------
The JavaScript redirector sounds fantastic, but it sounds complicated to implement in the current state.

Wouldn't it be way simpler to just load a defined fallback HTML document? For example, given the following appcache:

CACHE MANIFEST
ALIAS:
/x.html /serve-file.html
/files/* /serve-file.html
# serve-file.html is automatically included in the appcache

The request to /files/test.html would just render serve-file.html, but under the original (window.)location (just like FALLBACK does). In fact, ALIAS would be exactly like a FALLBACK entry that always fails to load. Additionally, the * placeholder would allow marking multiple URLs at once as belonging to the manifest.
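
A sketch of how the proposed ALIAS lookup might behave. ALIAS is only a proposal in this comment, so `resolveAlias` is hypothetical, and it assumes that * is allowed only as a trailing wildcard and that the first matching entry wins.

```javascript
// Return the document that should be rendered for `path`, or null if no
// ALIAS entry applies. Each entry is a [pattern, target] pair.
function resolveAlias(aliases, path) {
  for (const [pattern, target] of aliases) {
    if (pattern.endsWith("*")) {
      // Trailing wildcard: match anything that starts with the prefix.
      if (path.startsWith(pattern.slice(0, -1))) return target;
    } else if (pattern === path) {
      return target;
    }
  }
  return null;
}

const aliases = [
  ["/x.html", "/serve-file.html"],
  ["/files/*", "/serve-file.html"],
];

console.log(resolveAlias(aliases, "/files/test.html")); // "/serve-file.html"
console.log(resolveAlias(aliases, "/other.html"));      // null
```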

On review, this seems very easy to implement, both for user agent and web application authors.

As a downside, it doesn't allow embedding of non-HTML resources like images. It does allow downloads via window.location.replace(dataUri). To me, that doesn't look like a big deal, since any dynamically generated page should be using data URIs for dynamically generated images/scripts/styles in the first place.
================================================================================
 #14  Ian 'Hixie' Hickson                             2012-05-04 18:10:01 +0000 
--------------------------------------------------------------------------------
The idea would be to render pages, images, etc. from data in IndexedDB, not just to hardcode aliases. (This is in the context of wanting to add and remove URLs from the appcache, which would be easily implementable using a worker as described above.)
================================================================================
 #15  michaeln@google.com                             2012-05-04 22:54:11 +0000 
--------------------------------------------------------------------------------
> Wouldn't it be way simpler to just load a defined fallback HTML document? For
> example, given the following appcache:
> 
> CACHE MANIFEST
> ALIAS:
> /x.html /serve-file.html
> /files/* /serve-file.html
> # serve-file.html is automatically included in the appcache

Chromium's appcache actually has a feature that's very close to what's described here, with a slightly different syntax. The URL in the first column is considered a namespace prefix, just like entries in the FALLBACK section.

CHROMIUM-INTERCEPT:
/Bugs/Public/show_bug.cgi?id= return /Bugs/Public/bug_shower_page.html

http://code.google.com/p/chromium/issues/detail?id=101565
http://codereview.chromium.org/8396013/
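
A sketch of the namespace-prefix matching described above, using the entry from the example. `parseInterceptLine` and `findIntercept` are hypothetical names, and choosing the longest matching namespace is an assumption borrowed from how FALLBACK namespaces are matched; Chromium's actual implementation may differ.

```javascript
// Parse one "namespace return target" line from a CHROMIUM-INTERCEPT section.
function parseInterceptLine(line) {
  const [namespace, verb, target] = line.trim().split(/\s+/);
  if (verb !== "return" || !namespace || !target) return null;
  return { namespace, target };
}

// Find the target for a URL by prefix match (assumed: longest prefix wins).
function findIntercept(entries, pathAndQuery) {
  let best = null;
  for (const entry of entries) {
    const matches = pathAndQuery.startsWith(entry.namespace);
    if (matches && (!best || entry.namespace.length > best.namespace.length)) {
      best = entry;
    }
  }
  return best ? best.target : null;
}

const entries = [
  parseInterceptLine(
    "/Bugs/Public/show_bug.cgi?id= return /Bugs/Public/bug_shower_page.html"
  ),
];

console.log(findIntercept(entries, "/Bugs/Public/show_bug.cgi?id=17974"));
// "/Bugs/Public/bug_shower_page.html"
```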

I don't think this addresses what this particular W3C issue is about.
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-10-26 23:32:20 UTC
So the idea here is that an appcache manifest contains a same-origin reference to a JS file known as its interceptor. When a Document's application cache is complete and has a declared interceptor, the networking model changes to a third model that acts as follows:
 - open a connection to a worker, or create one if none yet exists, that is an
   ApplicationCacheInterceptWorkerGlobalScope for the given application cache,
   using the JS file for the interceptor as mentioned in the manifest.
 - each time there is a network request to the same origin as the manifest,
   send a MessageEvent event to this worker using the event name "request",
   whose payload is an object of the following form:
      {
         method: 'GET', // or POST or whatever, '' for non-HTTP(S) origins
         url: 'http://www.example.com/file.png', // the url being fetched
         headers: {
           'header': ['value', 'value'] // each HTTP request header
         },
         body: '', // the request body (e.g. for POST requests)
         port: a_MessagePort_object,
      }
 - The passed port expects data in the following manner:
    - the first message to be sent has to be one of these:
       - a Blob or File, which is treated as the resource payload.
       - null, which is treated like {} as described below.
       - an object with an attribute named "action", whose value is
         interpreted as follows:
           - "passthrough": fall back to the normal appcache net model.
           - "cache": serve the file from the cache, or act as if it is a
             network error if the file isn't there.
           - "network": do it via the network, ignoring the cache.
           - anything else: act as if the "action" attribute is
             absent, as described next.
       - an object without the "action" attribute, which is then
         treated as meaning the resource had a network error.
       - anything else, which is stringified and then treated as the
         response including headers, but possibly incomplete.
    - the second and subsequent messages, which are only acted upon if
      the first was not an object, are either of these:
       - null, which is treated like {} as described below.
       - an object, in which case the resource is assumed to be
         finished, as if the network connection had closed.
       - anything else, which is stringified and treated as more
         response data.
    - if there's a Content-Length header, and data is transmitted past
      the specified size (as interpreted per HTTP rules), then the
      extraneous data is discarded.
 - swapCache() disconnects from the worker if there is one (so that
   the new cache's worker can kick in if necessary).
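
The message rules in the list above can be sketched as two small classification functions. This is only an illustration of the proposal: `interpretFirstMessage`, `interpretLaterMessage` and the `kind` labels are invented names, and nothing here corresponds to a shipped browser API.

```javascript
// Classify the first message sent on the port.
function interpretFirstMessage(msg) {
  // File is a subclass of Blob, so one check covers both.
  const isBlob = typeof Blob !== "undefined" && msg instanceof Blob;
  if (isBlob) return { kind: "payload", blob: msg };
  if (msg === null) return { kind: "network-error" }; // null is treated like {}
  if (typeof msg === "object") {
    switch (msg.action) {
      case "passthrough":
      case "cache":
      case "network":
        return { kind: msg.action };
      default:
        return { kind: "network-error" }; // missing or unrecognised action
    }
  }
  // Anything else is stringified: response headers plus a possibly partial body.
  return { kind: "raw-response", data: String(msg) };
}

// Classify the second and later messages, which only matter when the
// first message was a "raw-response".
function interpretLaterMessage(msg) {
  if (msg === null || typeof msg === "object") return { kind: "end" };
  return { kind: "more-data", data: String(msg) };
}

console.log(interpretFirstMessage({ action: "cache" }).kind); // "cache"
console.log(interpretFirstMessage({}).kind);                  // "network-error"
console.log(interpretLaterMessage("more bytes").kind);        // "more-data"
```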

One possible problem with this approach is that it doesn't kick in until a Document exists, which is after the first attempt at fetching a file: the main "master" file thus always comes from the cache (or maybe network, in the case of prefer-online stuff). Is that a problem?

One much more serious problem with this approach is that it totally fails to handle the use case in #4 above. Specifically, it doesn't work for cross-origin images (because you can't let the interceptor see the cookies or data from cross-origin resources), and it doesn't work for an RSS reader that puts the articles in an <iframe> (because <iframe>s aren't fed from the appcache of their parent browsing context).

Maybe we should file a separate bug for the interceptor idea and more clearly lay out the use cases for that idea.

For the RSS reader idea not based on <iframe>s, how about just adding a simple API to applicationCache for adding and removing extra files?

   applicationCache.addExtra(url);
   applicationCache.removeExtra(url);
   applicationCache.getExtras(function (files) { ... }); // files is an Array of URLs
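
An in-memory stand-in can make the intended semantics of these three calls concrete. This is a sketch only: `createExtrasStandIn` is a hypothetical helper, the real methods do not exist on applicationCache, and the callback is invoked synchronously here, which a real implementation might not do.

```javascript
// Stand-in for the proposed addExtra/removeExtra/getExtras methods.
function createExtrasStandIn() {
  const extras = new Set();
  return {
    addExtra(url) { extras.add(url); },
    removeExtra(url) { extras.delete(url); },
    // Passes an Array of the URLs added so far to the callback.
    getExtras(callback) { callback(Array.from(extras)); },
  };
}

const standIn = createExtrasStandIn();
standIn.addExtra("http://feeds.example.org/img/1.png");
standIn.addExtra("http://feeds.example.org/img/2.png");
standIn.removeExtra("http://feeds.example.org/img/1.png");

standIn.getExtras(function (files) {
  console.log(files); // [ 'http://feeds.example.org/img/2.png' ]
});
```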
Comment 2 sridhar 2012-11-18 18:26:28 UTC
I think post #4 was talking about the add/remove/get methods. Was there a specific reason the interceptor model came into the picture?

The use cases, btw, from my side for having the add/remove/get methods are the following (not sure if it's in the right format :)):

- offline mp3 player: the user needs an option to see which files are available offline.
- online mp3 player: the user needs to be able to add and remove files from local storage, because storage is limited and the user cannot be expected to cache all mp3 files.
Comment 3 Ian 'Hixie' Hickson 2012-11-26 02:45:01 UTC
Yeah, I went off into the weeds here. I've filed two bugs to replace this:

   bug 20084 for the add/remove API
   bug 20083 for the interceptor

*** This bug has been marked as a duplicate of bug 20084 ***