14702 – appcache: always up-to-date applications

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC All

Importance:	P2 critical
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:	comment 27
Keywords:

Depends on:
Blocks:

Reported:	2011-11-05 17:35 UTC by Anne
Modified:	2012-06-14 22:49 UTC
CC List:	18 users

See Also:

Description Anne 2011-11-05 17:35:38 UTC

We should have a new section in the manifest that allows for files that should always be requested when online, but not when offline. The master entry should be allowed there. That way you can be sure dynamically generated pages are always up to date, while also still working offline.
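
A minimal sketch of the lookup rule being requested, in Python. The `Source` enum, the `choose_source` function and the `always_fresh` set are hypothetical names used only for illustration; the manifest format does not currently define such a section.

```python
from enum import Enum


class Source(Enum):
    NETWORK = "network"
    APPCACHE = "appcache"


def choose_source(url: str, online: bool, always_fresh: set[str]) -> Source:
    """Pick where a navigation to `url` would be served from.

    `always_fresh` stands in for the URLs listed in the proposed
    (hypothetical) manifest section. Every other cached URL keeps
    today's cache-first behaviour.
    """
    if online and url in always_fresh:
        # Online: always go to the network for these entries.
        return Source.NETWORK
    # Offline, or an ordinary cached entry: serve the cached copy.
    return Source.APPCACHE
```

The same URL therefore resolves differently depending on connectivity, which is the behaviour the current sections cannot express for a master entry.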

Comment 1 Anne 2011-11-05 17:39:06 UTC

See also: http://www.w3.org/2011/web-apps-ws/papers/Facebook.html and https://gist.github.com/1341809

Comment 2 Anne 2011-11-05 18:43:08 UTC

Apparently Microsoft requested this before in 13168. To be clear, you cannot rely on the HTTP cache because:

1. It is more likely to go away.
2. Resources used by other pages will not be cached (e.g. help pages).
3. Using a manifest gives you control over the complete application, including its expiry time. You also do not have to perform conditional requests to see if any of the resources have been updated in the meantime.

Comment 3 Anne 2011-11-05 18:43:29 UTC

Sorry, meant to write bug 13168 to make it link.

Comment 4 Yehuda Katz 2011-11-05 21:34:25 UTC

Additionally, Microsoft's semantics are not quite right. When the user agent is offline, you *do* want to use the master entry. This makes it a full-on offline app, with better updating semantics when the user agent is online for many use-cases.

In light of that, [NETWORK] is not exactly the right API.

Comment 5 Adrian Bateman [MSFT] 2011-11-05 22:58:41 UTC

We were trying to add the functionality while still being interoperable with current implementations. You could add the master page to the fallback section. We also have scenarios where you wouldn't want that (for example, where you are trying to optimise connected applications and not support disconnected).

We're certainly open to suggestions for better alternatives.

Comment 6 Ian 'Hixie' Hickson 2011-11-11 00:29:45 UTC

So this is basically requesting an entry similar to FALLBACK, except that the network is still hit even if the resource is already cached, and the cached copy is only used if the resource can't be fetched?

I don't understand why this is better than the HTTP cache. If the idea is to hit the network when you can, but fall back to the cache when the resource is either not expired or the network is down, then that seems like _exactly_ what local HTTP caching is for. Appcache is only useful because it lets you get to the data _before_ you check to see if it's up to date.

Comment 7 Adrian Bateman [MSFT] 2011-11-11 16:43:08 UTC

This is better than the HTTP cache for all the reasons why AppCache needs to exist at all: to provide a predictable life time for the caching of content.

AppCache contains this functionality already with FALLBACK. The problem is that the master entry cannot currently participate in the FALLBACK contract. This means a news site that I want to enable while disconnected is always shown out of date even when connected.

Comment 8 michaeln 2011-11-11 20:09:51 UTC

Solving this problem would go a long way to "fixing" the appcache and make broader adoption possible. Imho, this is the biggest piece of low hanging fruit to take care of and the most often requested change.

> So this is basically requesting...

The way i understand this request is to allow pages retrieved over the network to utilize the resources in an application cache for subresource loads, without itself being added to that application cache. The point is to speed up subresource loads.

The automagic pinning of 'master' is often not desired. And once added to the cache, there is no way to remove one from the cache. This automagic behavior may be causing more problems than it's solving imo.

There's more than one way to express the desire to "use but not add" in the API (pick your bike shed color).

a. A new html element attribute

   <html useManifest='xxx'>

b. A new OPTIONS section in the manifest file to turn off automagic adding of master entries. (this is essentially what ms has done with different string constants).

   OPTIONS:
   cache-master-entries = false
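
   A rough sketch of how option (b) might be read by a parser, in Python. The OPTIONS section and the cache-master-entries key are only the syntax proposed in this comment, not part of the manifest format; the parser is deliberately simplified and the function names are made up for illustration.

```python
def parse_manifest(text: str) -> dict[str, list[str]]:
    """Split a cache manifest into its sections (simplified sketch)."""
    lines = [ln.strip() for ln in text.splitlines()]
    if not lines or lines[0] != "CACHE MANIFEST":
        raise ValueError("not a cache manifest")
    # OPTIONS is the hypothetical section proposed above.
    sections: dict[str, list[str]] = {
        "CACHE": [], "NETWORK": [], "FALLBACK": [], "OPTIONS": [],
    }
    current = "CACHE"  # entries before any header are explicit cache entries
    for ln in lines[1:]:
        if not ln or ln.startswith("#"):
            continue  # skip blank lines and comments
        if ln.endswith(":") and ln[:-1] in sections:
            current = ln[:-1]
            continue
        sections[current].append(ln)
    return sections


def caches_master_entries(sections: dict[str, list[str]]) -> bool:
    """Return False only if OPTIONS explicitly turns master caching off."""
    for entry in sections["OPTIONS"]:
        key, _, value = entry.partition("=")
        if key.strip() == "cache-master-entries":
            return value.strip().lower() != "false"
    return True  # today's behaviour: master entries are always added
```

   Defaulting to True when the option is absent keeps existing manifests behaving as they do now.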


Once you get past the syntax, there are some questions about how exactly the subresources can be utilized during page load; we don't want to load stale content. The second link below starts to address those questions.

There were some posts to the whatwg list on exactly this topic from last February.
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-February/030410.html
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-February/030590.html

Comment 9 Yehuda Katz 2011-11-11 20:43:00 UTC

Also, while this requests the resource when it can be fetched (the user agent is online), it requires that it be available when the user agent is offline as well.

Fallback semantics are basically right here, but a bit weird. I think something like this might technically work, but it doesn't describe the author's intention very well.

FALLBACK:
index.html index.html

Comment 10 Yehuda Katz 2011-11-11 20:47:38 UTC

(In reply to comment #9)
> Also, while this requests the resource when it can be fetched (the user agent
> is online), it requires that it  be available when the user agent is offline as
> well.
> 
> Fallback semantics are basically right here, but a bit weird. I think something
> like this might technically work, but it doesn't describe the author's
> intention very well.

When I say this might technically work, I mean if the master entry was allowed to also appear in the FALLBACK section, not the behavior as it is spec'ed today.

> 
> FALLBACK:
> index.html index.html

Comment 11 Ian 'Hixie' Hickson 2011-11-11 22:47:35 UTC

(In reply to comment #7)
> This is better than the HTTP cache for all the reasons why AppCache needs to
> exist at all: to provide a predictable life time for the caching of content.

Appcache does not provide a predictable lifetime for cached content. Nothing stops a UA from arbitrarily evicting an appcache.


> AppCache contains this functionality already with FALLBACK. The problem is that
> the master entry cannot currently participate in the FALLBACK contract. This
> means a news site that I want to enable while disconnected is always shown out
> of date even when connected.

This is incorrect. URLs that match FALLBACK entries, once cached, are served from the cache, not from the network as is being requested here. The only reason you can't use FALLBACK with a master entry is that the master entry is already cached.


(In reply to comment #8)
> 
> The way i understand this request is to allow pages retrieved over the network
> to utilize the resources in an application cache for subresource loads, without
> itself being added to that application cache. The point is to speed up
> subresource loads.

This is *exactly* what an HTTP cache does. You don't need appcache for this specific feature. (This is not what was described earlier in this bug, however.)

Comment 12 Yehuda Katz 2011-11-11 23:40:26 UTC

Let me try to articulate the use-case clearly.

A web author has a web site, say a blog, with significant static content, but which is regularly updated. He develops this web site using HTML, and updates it routinely as desired.

Many of the readers of his site would like the ability to read this content reliably offline, but he would also like to make sure that any new content is available immediately. He does not know JavaScript, so he cannot notify the user that new content has become available using the applicationCache API. Alternately, he does not want to interrupt his users from reading content when the application cache is updated in the background.

At present, in order to make his content available offline in a reliable way, he must use the application cache. However, once he has done so, his users will always receive the last version of his content, even if they are online. To avoid this problem, he can use JavaScript to notify the user that the site has new content, but he may not know enough JavaScript to achieve this, and even if he does, it produces a degraded user experience.

Comment 13 michaeln 2011-11-12 00:54:13 UTC

(In reply to comment #11)
> (In reply to comment #7)
> > This is better than the HTTP cache for all the reasons why AppCache needs to
> > exist at all: to provide a predictable life time for the caching of content.
> 
> Appcache does not provide a predictable lifetime for cached content. Nothing
> stops a UA from arbitrarily evicting an appcache.

In practice, an appcached resource will generally outlive an http cached resource. We're looking for ways to take advantage of that.

> > The way i understand this request is to allow pages retrieved over the network
> > to utilize the resources in an application cache for subresource loads, without
> > itself being added to that application cache. The point is to speed up
> > subresource loads.
> 
> This is *exactly* what an HTTP cache does. You don't need appcache for this
> specific feature. (This is not what was described earlier in this bug,
> however.)

The essence of this request is to define some semantics whereby 'online' sites can effectively make use of appcached resources. Consider a site that has populated an appcache with many megabytes of resources. To rely on the HTTP cache would likely mean redownloading all of them when first visiting the site after having been away for a while (costly); after that, things work great. (And if they wanted an appcache to be updated as an artifact of visiting the 'online' page, they'd have to do that separately with an iframe (hassle).) We're looking for a way for things to work great from the get-go without having to repopulate the http cache and without having to hack around with iframes.

The combination of *dont-cache-masters* + *fallbacks* is one way to provide semantics along those lines. If a request for an online-only-master succeeds, it should be able to dip into the pile of appcached resources w/o incurring roundtrips (in many cases). If a request for an online-only-master fails, a fallback that's explicitly listed in the manifest can be used.
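
A minimal sketch of those semantics, in Python. All names here (`load_master`, `load_subresource`, the `fetch` callable that returns None on failure, and the dictionaries standing in for the appcache) are assumptions made for illustration, not an existing API.

```python
from typing import Callable, Optional

Fetch = Callable[[str], Optional[bytes]]  # returns None when the request fails


def load_master(url: str, fetch: Fetch, fallbacks: dict[str, bytes]) -> tuple[str, bytes]:
    """Load an online-only master page: network first, explicit fallback second."""
    body = fetch(url)
    if body is not None:
        return ("network", body)
    # The master itself was never cached, so only an explicitly listed
    # fallback can be used. Pick the longest matching namespace prefix.
    matches = [prefix for prefix in fallbacks if url.startswith(prefix)]
    if not matches:
        raise LookupError(f"no fallback listed for {url}")
    return ("fallback", fallbacks[max(matches, key=len)])


def load_subresource(url: str, appcache: dict[str, bytes], fetch: Fetch) -> bytes:
    """Subresources dip into the appcache first, avoiding a round trip."""
    if url in appcache:
        return appcache[url]
    body = fetch(url)
    if body is None:
        raise LookupError(f"{url} is neither cached nor reachable")
    return body
```

The point of the split is that only the master page pays for a network request; everything it references can still come from the appcache.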

I'm less worried about the guy that doesn't know how to use JavaScript than the professional developer that can't do much with the feature set as it stands because it's too narrowly focused on unrealistic use cases.

Comment 14 Yehuda Katz 2011-11-12 01:01:50 UTC

As to HTTP caches, there are several issues that are addressed by the application cache:

* HTTP caches are not atomic, which means that individual assets or pages can be evicted separately from each other. This is very important behavior for offline web sites.
* Browsers are not required to serve pages offline that are in HTTP caches, and their current behavior is unreliable and spotty from a user perspective. Efforts to specify this behavior more rigorously would probably be more work (and less in scope) than the proposed improvement to app cache.

Application cache provides a way to tell the user agent that a series of assets should be inserted in the cache together, evicted rarely and atomically, and served when offline. The use-case for this feature takes advantage of all of those improvements over normal HTTP caching semantics.

Comment 15 Jonas Sicking (Not reading bugmail) 2011-11-12 01:55:00 UTC

I'll also jump in and try to articulate the use case.

Consider facebook. Currently facebook doesn't support appcache (at least, for the sake of argument, let's say it does not).

When the user clicks the facebook bookmark, or types "facebook.com" in the URL bar, the site is loaded from the network, or if any occasional resources are cached in the HTTP cache, loaded from HTTP cache.

Same if the user receives a facebook URL through email and clicks said url. The url is opened in the browser and loaded from network/http cache.

Now say that facebook wants to "enable offline". If they simply add a manifest to all pages that they want to offline, this means that when the user goes to facebook the first time in the morning, he/she will see the appcache-cached version, not the one from the network.

This is not acceptable for this use case. While it's an improvement in performance, it's a degradation in user experience since the user will see an outdated version of the website. The downside in this case is a bigger problem than the gained performance.

For facebook to continue to "work", they have to create a whole new set of URLs which are only used for the offline version of the website.

However this means that the user has to use a separate bookmark or type a separate url in the URL bar to go to the offline version. And any links sent to the user will not work since they are pointing to the online version of facebook.

None of these things is good, and they defeat one of the main benefits of webapps: that you can type a single url to open them.

To fix this, I propose that we change the semantics such that if the UA is online, the UA is instructed to only use the appcache *after having checked that the cached version is up-to-date*. In a naive implementation, this generally means that on the first load the user will not get the version from the app-cache, but rather load the app from the network.

However UAs can have more smarts than that if they want. For example if the UA knows that the user uses facebook a lot, it can check that the appcache is up-to-date even before the user goes to the site. That way, once the user goes there, the UA has already ensured that the latest version is the cached one and can load directly from the appcache.

The UA is free to use whatever heuristics to determine which sites it should aggressively pre-cache. For example, if the user has bookmarked the site, has gone there over 10 times during the past week, has created an app-tab for the site, or has through UA-specific UI indicated that they would like the site to be "preloaded", or if the site is in the list of the top 10 sites the user visits, then the UA can choose to keep the appcache for the site up-to-date.

We could even add a section to the manifest which allows the site to indicate how often it would like the UA to check for new versions of the app. So if the site indicates that an update should be checked for every 5 hours, then the UA can then use the above heuristics to determine if it should aggressively check that often, or if it should simply check when the user goes to the site if it hasn't checked in the last 5 hours.

In other words, the semantics of this how-often-to-update value would be that if the UA hasn't checked for an update more recently than the indicated time, then the UA should not use the appcache-cached version if the user is online.
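
A small sketch of that rule, in Python. The function name and parameters are hypothetical; `max_age` stands in for the how-often-to-update value a site would declare in its manifest.

```python
from datetime import datetime, timedelta


def may_use_appcache(online: bool, last_checked: datetime,
                     max_age: timedelta, now: datetime) -> bool:
    """Decide whether the cached version may be shown without checking first."""
    if not online:
        # Offline there is nothing to check against, so the cache is used.
        return True
    # Online: the cache is only trusted if an update check happened
    # within the interval the site asked for.
    return now - last_checked <= max_age
```

A UA that pre-checks popular sites in the background would simply keep `last_checked` recent, so this function returns True by the time the user navigates.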

Comment 16 Ian 'Hixie' Hickson 2011-12-02 03:29:53 UTC

(In reply to comment #15)
> 
> Now say that facebook wants to "enable offline". If they simply add a manifest
> to all pages that they want to offline, this means that when the user goes to
> facebook the first time in the morning, he/she will see the appcache-cached
> version, not the one from the network.

No, what they do is they separate the content of the pages from the data in the pages, and they fetch the data on page load.


> This is not acceptable for this use case. While it's an improvement in
> performance, it's a degradation in user experience since the user will see an
> outdated version of the website.

No, the data will be fetched as soon as the page loads and thus will appear just as fast as it would if there was no appcache at all, except the page itself will load faster (only the data has to be downloaded, not all the supporting app code).


> For facebook to continue to "work", they have to create a whole new set of URLs
> which are only used for the offline version of the website.

No, they just use the same pages.


> To fix this, I propose that we change the semantics such that if the UA is
> online, the UA is instructed to only use the appcache *after having checked
> that the cached version is up-to-date*.

Then you lose the entire performance benefit of appcache, which is that you can run the app immediately without any network latency.



As far as I can tell, the use case for the feature being discussed here is that authors want to be able to author pages where:
 - for users who go to a page for the first time, the page loads as now, without any appcache stuff.
 - for users who go to the page again, the UA somehow checks with the server as it is loading the page to get a list of which of the resources that are already cached for the page can be kept as is and which need to be fetched afresh.
 - for users who go to the page again while the page returns a result other than 200, 404, or 410, the UA just uses the old page, the same way appcache works today.

Is that right?

We can certainly support this use case, but it's not at all obvious that any of the proposals that have been made so far actually achieve this.

Comment 17 Yehuda Katz 2011-12-02 03:42:39 UTC

(In reply to comment #16)
> (In reply to comment #15)
> > 
> > Now say that facebook wants to "enable offline". If they simply add a manifest
> > to all pages that they want to offline, this means that when the user goes to
> > facebook the first time in the morning, he/she will see the appcache-cached
> > version, not the one from the network.
> 
> No, what they do is they separate the content of the pages from the data in the
> pages, and they fetch the data on page load.
> 
> 
> > This is not acceptable for this use case. While it's an improvement in
> > performance, it's a degradation in user experience since the user will see an
> > outdated version of the website.
> 
> No, the data will be fetched as soon as the page loads and thus will appear
> just as fast as it would if there was no appcache at all, except the page
> itself will load faster (only the data has to be downloaded, not all the
> supporting app code).
> 
> 
> > For facebook to continue to "work", they have to create a whole new set of URLs
> > which are only used for the offline version of the website.
> 
> No, they just use the same pages.
> 
> 
> > To fix this, I propose that we change the semantics such that if the UA is
> > online, the UA is instructed to only use the appcache *after having checked
> > that the cached version is up-to-date*.
> 
> Then you lose the entire performance benefit of appcache, which is that you can
> run the app immediately without any network latency.
> 
> 
> 
> As far as I can tell, the use case for the feature being discussed here is that
> authors want to be able to author pages where:
>  - for users who go to a page for the first time, the page loads as now,
> without any appcache stuff.
>  - for users who go to the page again, the UA somehow checks with the server as
> it is loading the page to get a list of which of the resources that are already
> cached for the page can be kept as is and which need to be fetched afresh.

No. The manifest (or alternately, the index page) would be checked for freshness, and if it was fresh, the app cache's assets would be used. Only a single blocking request would be required, and only if the UA was online.

>  - for users who go to the page again while the page returns a result other
> than 200, 404, or 410, the UA just uses the old page, the same way appcache
> works today.

It's important that this use-case still wants atomicity of the resources, with just a single blocking request to determine whether the entire manifest was stale or fresh, and with the same update semantics as a stale app cache.

> 
> Is that right?

Additionally, if the UA is offline, the UA uses the cached content.

The desire is to have all of the semantics of app cache, especially atomicity, but a blocking way to determine freshness (in an online UA), instead of always assuming freshness and determining whether an update is required in the background.

> 
> We can certainly support this use case, but it's not at all obvious that any of
> the proposals that have been made so far actually achieve this.

Comment 18 Ian 'Hixie' Hickson 2011-12-02 20:49:52 UTC

wycats: I'm asking about the use case, you're telling me about your preferred solution. Please keep them separate so that we can evaluate the proposed solutions against the use cases.

Comment 19 Tobie Langel 2011-12-02 20:53:59 UTC

Currently collecting a series of use-cases internally at Facebook.
Hoping to get back to you by December 13.

--tobie

Comment 20 Ian 'Hixie' Hickson 2011-12-05 20:33:18 UTC

Here are some theoretical timing diagrams showing how page loads would progress under various scenarios. In particular, I distinguish between the main page having mixed data and structure, with a number of external resources critical to the page load (the typical design these days), and the main page having just the structure, with the data being one of the external resources. For simplicity, I assume the other external resources are just styles, scripts, and images. I also assume that the images and styles have Expires headers; that the data is always changing and so cannot be cached; that the scripts have ETag or If-Modified-Since caching but no Expires headers, and thus have to be checked every time in the normal (non-appcache) case; and that the structure, when provided separately from the data, can be cached, but the user agent always hits the network for the top-level load if it's not appcached.

Mixed data and structure, uncached:
 [RTT][Structure+Data]
         [RTT][Styles]
         [RTT][Images]
         [RTT][Scripts]

Mixed data and structure, cached:
 [RTT][Structure+Data]
         | (Styles)
         | (Images)
         [RTT] (Scripts)

Mixed data and structure, proposed appcache mechanism with manifest unchanged:
 [RTT][Structure+Data]
 [RTT] (Manifest) 
     | (Styles)
     | (Images)
     | (Scripts)
  
Mixed data and structure, proposed appcache mechanism with manifest changed but most resources unchanged:
 [RTT][Structure+Data]
 [RTT][Manifest]
       | (Styles)
       | (Images)
       [RTT] (Scripts)
    
Split data and structure, uncached:
 [RTT][Structure]
         [RTT][Styles]
         [RTT][Images]
         [RTT][Scripts]
         [RTT][Data]
  
Split data and structure, cached:
 [RTT] (Structure)
     | (Styles)
     | (Images)
     [RTT] (Scripts)
     [RTT][Data]

Split data and structure, appcached:
 | (Structure)
 | (Styles)
 | (Images)
 | (Scripts)
 [RTT][Data]

It's really not clear to me, given this, why anyone would want to use the mixed structure and data model these days. It's harder to maintain, is no faster than the split model, and if you use appcache, is forcibly slower.
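
The diagrams can be reduced to a count of network round trips on the critical path. The Python sketch below is one reading of them, under the assumption that `[RTT]` marks a step that waits on the network and `|` marks a step served locally; the tuple encoding and names are made up for illustration, and transfer time is ignored entirely.

```python
# Each step is (start, needs_round_trip): `start` is how many round trips
# have already elapsed when the step begins, read off the indentation of
# the diagrams above.
Step = tuple[int, bool]

SCENARIOS: dict[str, list[Step]] = {
    "mixed, uncached": [(0, True), (1, True), (1, True), (1, True)],
    "mixed, HTTP-cached": [(0, True), (1, False), (1, False), (1, True)],
    "mixed, proposed, manifest unchanged":
        [(0, True), (0, True), (1, False), (1, False), (1, False)],
    "mixed, proposed, manifest changed":
        [(0, True), (0, True), (1, False), (1, False), (1, True)],
    "split, uncached": [(0, True), (1, True), (1, True), (1, True), (1, True)],
    "split, HTTP-cached": [(0, True), (1, False), (1, False), (1, True), (1, True)],
    "split, appcached": [(0, False), (0, False), (0, False), (0, False), (0, True)],
}


def round_trips(steps: list[Step]) -> int:
    """Round trips until the last step of a page load has its response."""
    return max(start + int(needs_rtt) for start, needs_rtt in steps)


if __name__ == "__main__":
    for name, steps in SCENARIOS.items():
        print(f"{name}: {round_trips(steps)} RTT")
```

Counted this way, only the proposed mechanism with an unchanged manifest and the split appcached model finish in one round trip; the count does not capture the difference between waiting for Structure+Data and waiting for Data alone.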


Given how many people are asking for this, it seems reasonable that we should consider how to add the feature, but as far as I can tell it is never the right solution from an authoring perspective.

Comment 21 Tab Atkins Jr. 2011-12-05 22:10:28 UTC

(In reply to comment #20)
> It's really not clear to me, given this, why anyone would want to use the mixed
> structure and data model these days. It's harder to maintain, is no faster than
> the split model, and if you use appcache, is forcibly slower.
> 
> Given how many people are asking for this, it seems reasonable that we should
> consider how to add the feature, but as far as I can tell it is never the right
> solution from an authoring perspective.

The "split model" requires authors to fundamentally change how they author pages, so that most of the useful parts of the page are generated from JS based on an external data file.

Some authors do indeed embrace this model, but the vast majority still do not - they generate the pages whole on the server-side and then send them to the client. Changing to the split model requires pretty substantial changes to the authoring workflow and often substantial amounts of code rewriting. It also means that an accidental JS error takes down the whole site, rather than just killing some functionality, which is often not enough to render the site nonfunctional as a whole.


Authors shouldn't have to substantially change their coding practices to benefit from caching.

To be very specific about the use-case, it is to take an existing page authored using current standard practices and make it offline-capable with a minimum of changes to the site's structure, without interfering with normal online interactions.  The external resources for the site should be all cached together or not cached at all, so that pages don't half-load and end up looking or acting broken (this is the failure-mode of relying on the HTTP cache).

Comment 22 michaeln 2011-12-05 22:34:15 UTC

(In reply to comment #16)
> (In reply to comment #15)
> > 
> > Now say that facebook wants to "enable offline". If they simply add a manifest
> > to all pages that they want to offline, this means that when the user goes to
> > facebook the first time in the morning, he/she will see the appcache-cached
> > version, not the one from the network.
> 
> No, what they do is they separate the content of the pages from the data in the
> pages, and they fetch the data on page load.
> 
> 
> > This is not acceptable for this use case. While it's an improvement in
> > performance, it's a degradation in user experience since the user will see an
> > outdated version of the website.
> 
> No, the data will be fetched as soon as the page loads and thus will appear
> just as fast as it would if there was no appcache at all, except the page
> itself will load faster (only the data has to be downloaded, not all the
> supporting app code).
> 
> 
> > For facebook to continue to "work", they have to create a whole new set of URLs
> > which are only used for the offline version of the website.
> 
> No, they just use the same pages.
> 
> 
> > To fix this, I propose that we change the semantics such that if the UA is
> > online, the UA is instructed to only use the appcache *after having checked
> > that the cached version is up-to-date*.
> 
> Then you lose the entire performance benefit of appcache, which is that you can
> run the app immediately without any network latency.

This is where we have to be careful to not break the existing behavior. I'm fairly certain there are some developers that would not like to see that no latency characteristic go away.

> As far as I can tell, the use case for the feature being discussed here is that
> authors want to be able to author pages where:
>  - for users who go to a page for the first time, the page loads as now,
> without any appcache stuff.
>  - for users who go to the page again, the UA somehow checks with the server as
> it is loading the page to get a list of which of the resources that are already
> cached for the page can be kept as is and which need to be fetched afresh.
>  - for users who go to the page again while the page returns a result other
> than 200, 404, or 410, the UA just uses the old page, the same way appcache
> works today.
> 
> Is that right?

Yes to your first two points. Less clear on your third point. Some developers would rather rely on the existing fallback mechanism to handle that case. The automatic insertion of 'master-entries' is problematic in other ways too; for example, they can't be removed once added, so developers would like to not add them to start with.
 
> We can certainly support this use case, but it's not at all obvious that any of
> the proposals that have been made so far actually achieve this.

I believe the proposal to 'dont-cache-masters' achieves the first two of your bullets, but not the third.

Comment 23 michaeln 2011-12-13 22:47:35 UTC

> To fix this, I propose that we change the semantics such that if the UA is
> online, the UA is instructed to only use the appcache *after having checked
> that the cached version is up-to-date*.

Would an option to enable an 'updateBeforeUsing' behavior satisfy this?

OPTIONS:
updateBeforeUsing=true

When set to true, prior to returning a resource from an appcache, the system should validate/update the associated manifest.
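
A rough sketch of what such an option could mean, in Python. `AppCache`, `get_resource`, `fetch_manifest` and `rebuild` are hypothetical names; the byte-for-byte manifest comparison mirrors how appcache updates are detected today, and the whole cache is swapped at once to keep it atomic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AppCache:
    manifest: bytes              # manifest bytes this cache was built from
    resources: dict[str, bytes]  # URL -> cached body


def get_resource(
    url: str,
    cache: AppCache,
    online: bool,
    update_before_using: bool,
    fetch_manifest: Callable[[], Optional[bytes]],  # None on network failure
    rebuild: Callable[[bytes], AppCache],           # downloads a whole new cache
) -> tuple[AppCache, bytes]:
    """Return the (possibly replaced) cache and the body for `url`."""
    if online and update_before_using:
        latest = fetch_manifest()  # the single blocking request
        if latest is not None and latest != cache.manifest:
            # Manifest changed: swap in a complete new cache, never a mix.
            cache = rebuild(latest)
    return cache, cache.resources[url]
```

With the option off, or while offline, the function behaves like the existing cache-first lookup.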

Comment 24 Ian 'Hixie' Hickson 2011-12-13 23:41:14 UTC

That's essentially black-box indistinguishable from the "proposed appcache mechanism" I modeled in comment 20, assuming an incremental approach (i.e. start using the resources as soon as possible, with the caching going on in parallel).

The performance characteristics of such a model seem rather poor (barely distinguishable from straight HTTP caching in the common case). I don't really see the benefit.

At the end of the day, assuming you can properly set up your HTTP caching (which is easy — set long expiry times and change the resource URLs when they are updated, pretty standard practice), I really don't see how adding a manifest makes much of a difference, other than making the file available offline.
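
For illustration, the "long expiry times plus changing URLs" practice could look something like the Python sketch below. The function name, the 12-character digest length and the one-year max-age are arbitrary assumptions, not a recommendation from this thread.

```python
import hashlib
from pathlib import PurePosixPath

# Served with every fingerprinted resource: cache for a year, since the
# URL changes whenever the content does.
LONG_LIVED_HEADERS = {"Cache-Control": "public, max-age=31536000"}


def fingerprinted_url(path: str, content: bytes) -> str:
    """Embed a content hash in the file name, e.g. /static/app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Pages then reference the fingerprinted URL, so a changed script gets a new URL and the old cached copy is simply never requested again.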

Anyway, as pointless as I think this feature is, since people want to implement it and people want to use it, my opinion doesn't really matter.

Here's what I propose:
- add a new section to manifests which works like the CACHE section, but also flags the resources with a new flag.
- when you fetch a file during navigation and it's in an appcache but has this new flag and isn't marked foreign, fetch it from the network and immediately kick off a cache update.
- if the file comes back 200, then act as if you were not associated with an appcache (same as the first time you fetch a file that later is found to have a manifest), except use the existing appcache as an HTTP cache. Once the appcache update is complete, treat the appcache as if it was guaranteed to hold the latest copies of everything it has, so you never have to hit the network for cached resources. If the file is found to not have a manifest="", then mark the entry as foreign.
- if the file comes back with any other code, load it from the appcache as normal.
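
The navigation steps above can be sketched as a small decision function in Python. The `Outcome` names and the `fetch_status` callable are hypothetical; marking an entry foreign when the fetched file has no manifest="" attribute is left out of the sketch.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    NETWORK_ONLY = "fetch normally, ignoring the appcache"
    APPCACHE = "load the document from the appcache"
    NETWORK_WITH_APPCACHE_AS_HTTP_CACHE = (
        "use the network response; treat the appcache as an HTTP cache"
    )


def decide(flagged: bool, foreign: bool,
           fetch_status: Callable[[], Optional[int]]) -> Outcome:
    """Decide how a navigation to an appcached file is handled.

    `fetch_status` performs the network fetch and returns the HTTP status,
    or None if the network could not be reached. It is only called for
    flagged entries, which is also when the cache update would be started.
    """
    if foreign:
        return Outcome.NETWORK_ONLY
    if not flagged:
        return Outcome.APPCACHE  # unflagged entries behave as they do today
    if fetch_status() == 200:
        return Outcome.NETWORK_WITH_APPCACHE_AS_HTTP_CACHE
    return Outcome.APPCACHE  # any other result: fall back to the cached copy
```

Passing the fetch as a callable makes it visible that unflagged entries never touch the network before loading.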

Is that satisfactory?

Comment 25 Jonas Sicking (Not reading bugmail) 2011-12-13 23:48:25 UTC

> Here's what I propose:
>  - add a new section to manifests which works like the CACHE section, but
> also flags the resources with a new flag.

I don't understand this. Could you elaborate.

> Is that satisfactory?

I'm reluctant to say that the appcache is "fixed" even if we made the above changes. What's really needed is more feedback from web authors who are going to use this. But it definitely seems like something that would be good to put in a draft and gather feedback on.

Comment 26 michaeln 2011-12-14 00:04:07 UTC

> > Is that satisfactory?

Let's iterate a bit more prior to putting something in the draft. Once in draft-ese, the first task for readers is to reverse engineer the code-written-in-english to understand what it actually says. Let's clarify it in plain language first, then translate to spec-ese.

> - if the file comes back 200, then act as if you were not associated with an
> appcache (same as the first time you fetch a file that later is found to have a
> manifest), except use the existing appcache as an HTTP cache. Once the appcache
> update is complete, treat the appcache as if it was guaranteed to hold the
> latest copies of everything it has, so you never have to hit the network for
> cached resources. If the file is found to not have a manifest="", then mark the
> entry as foreign.

That behavior sounds somewhat similar to the 'useManifest' or 'dontCacheMasters' behavior that i have in mind, but has the downside of only applying to specific urls that are listed in the new manifest section. Listings like that are tedious and error prone to produce.

Comment 27 Ian 'Hixie' Hickson 2011-12-14 00:10:25 UTC

Obviously anything we do here is subject to further development.

(In reply to comment #25)
> > Here's what I propose:
> >  - add a new section to manifests which works like the CACHE section, but
> > also flags the resources with a new flag.
> 
> I don't understand this. Could you elaborate.

Cache manifests have sections. CACHE:, FALLBACK:, NETWORK:. The above proposes a new section that works basically like CACHE:, but with resources flagged for the special processing during navigation.


(In reply to comment #26)
> 
> That behavior sounds somewhat similar to the 'useManifest' or
> 'dontCacheMasters' behavior that I have in mind, but has the downside of
> only applying to specific URLs that are listed in the new manifest section.
> Listings like that are tedious and error-prone to produce.

Hmm, I guess it's true that this feature would often be used with a whole bunch of resources at once, since the whole point here is that the data is mixed in with the structure rather than having one URL for the app structure and the data fetched separately.


Ok. New proposal:
 - add a flag at the top of the manifest that flags it for this new processing.
 - when you fetch a file during navigation and it's in an appcache that has been flagged, and the resource is not marked foreign, fetch it from the network and immediately kick off a cache update.
 - if the file comes back 200, then act as if you were not associated with an appcache (same as the first time you fetch a file that later is found to have a manifest), except use the existing appcache as an HTTP cache. Once the appcache update is complete, treat the appcache as if it was guaranteed to hold the latest copies of everything it has, so you never have to hit the network for cached resources. If the file is found to not have a manifest="", then mark the entry as foreign, just like for master resources now. Normal master resource caching works as now, which is how we get these pages into the cache for offline support.
 - if the file comes back with any other code, load it from the appcache as normal.
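
A minimal sketch of the navigation decision described in the proposal above, written in Python purely for illustration. The function name, its parameters, and the returned labels are hypothetical and do not come from any spec text; the sketch only models which source the document is loaded from, and leaves out the cache update that the proposal kicks off alongside the network fetch.

```python
from typing import Optional


def choose_source(cache_flagged: bool, entry_foreign: bool,
                  status: Optional[int]) -> str:
    """Pick where a navigated resource is loaded from (illustrative only).

    cache_flagged: the manifest carries the proposed top-of-manifest flag.
    entry_foreign: the resource is marked foreign in that appcache.
    status: HTTP status of the network fetch, or None if the fetch failed.
    """
    if not cache_flagged or entry_foreign:
        # The new processing does not apply; existing appcache rules are used.
        return "existing-rules"
    if status == 200:
        # Fresh copy from the network; the appcache acts as an HTTP cache.
        return "network"
    # Any other outcome: load from the appcache as normal.
    return "appcache"
```

For example, `choose_source(True, False, 200)` selects the network copy, while a failed fetch (`status=None`) or a 500 response selects the cached copy.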

Comment 28 michaeln 2011-12-19 23:04:11 UTC

I've gotten very clear feedback that the automagic caching of master-entries is often not desirable. I'd like to see that feedback addressed as well.

Comment 29 Israel Hilerio [MSFT] 2012-01-13 19:43:09 UTC

We (Microsoft) were asked by our partners and our web properties to expand the use of AppCache beyond offline scenarios. These web properties were not interested in providing offline capabilities to their customers but rather in delivering content quickly to them. As Adrian explained below, we introduced a new tag into the manifest file that allows web properties that want to opt in to this new functionality to override the default behavior of their AppCache. This new tag prevents the master entry from being cached in the AppCache but allows all the other resources to be cached. As Anne summarizes in the thread below, there are issues with using HTTP caching to enable this scenario.

One common scenario where this could be problematic relates to caching of outdated library versions. In other words, what happens if a non-cached master entry needs a new version of a library that was cached, and the names of the new and old libraries are the same? This problem exists today, independent of AppCache, for libraries served with long-lived HTTP caching headers. Fortunately, major sites have dealt with this by versioning their library URIs to force loading of new libraries when their main page is updated. We believe that this best practice can be recommended for sites that want to use AppCache as a perf optimization tool and not an offline mechanism. We’ve discussed this with Hotmail, Bing, and Facebook and they agree that this is useful functionality for the AppCache feature to expose.
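
As an illustration of the library-URI versioning practice mentioned above, here is a small Python sketch that derives a URI from the file's content, so a changed library always gets a new name. This is one possible scheme under stated assumptions, not the method used by any of the sites named; the function name, the hash length, and the `/js/lib.js` path are hypothetical.

```python
import hashlib


def versioned_uri(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension (illustrative)."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    stem, dot, ext = path.rpartition(".")
    if not dot:
        # No extension found: append the hash at the end instead.
        return f"{path}.{digest}"
    return f"{stem}.{digest}.{ext}"


old = versioned_uri("/js/lib.js", b"function f() { return 1; }")
new = versioned_uri("/js/lib.js", b"function f() { return 2; }")
```

Because `old` and `new` differ, a master entry that references the new URI cannot pick up the stale cached copy stored under the old one.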

Comment 30 Tobie Langel 2012-01-20 08:49:42 UTC

Was finally able to gather and list the use cases which aren't covered by AppCache here: http://www.w3.org/community/fixing-appcache/2012/01/18/appcache_use_cases/

The relevant use-case for this bug report is #2 "Make a blog work offline":

A blog engine (e.g. WordPress) can have a very basic offline mode that caches its index page and the _n_ entries listed on it for offline use. This wouldn’t modify its behavior when online, so that visiting the index page would always display the last entries and not need a page refresh to do so.

Comment 31 Ian 'Hixie' Hickson 2012-01-28 22:41:46 UTC

(In reply to comment #28)
> I've gotten very clear feedback that the automagic caching of master-entries is
> often not desirable. I'd like to see that feedback addressed as well.

Please file separate bugs for separate issues.


(In reply to comment #29)
> We (Microsoft) were asked by our partners and our web properties to expand the
> use of AppCache beyond offline scenarios.  These web properties were not
> interested in providing offline capabilities to their customers but rather
> deliver content quickly to them.

Sites that are not interested in working offline have nothing to do with appcache. Appcache can't make them any faster than HTTP caching, regardless of how we change it (certainly within the context of this bug), as described in comment 20.

(In reply to comment #30)
> 
> The relevant use-case for this bug report is #2 "Make a blog work offline": [...]

Does the proposal in comment 27 address this satisfactorily?

Comment 32 Tobie Langel 2012-01-29 11:23:36 UTC

(In reply to comment #31)
> (In reply to comment #29)
> > We (Microsoft) were asked by our partners and our web properties to expand the
> > use of AppCache beyond offline scenarios.  These web properties were not
> > interested in providing offline capabilities to their customers but rather
> > deliver content quickly to them.
> 
> Sites that are not interested in working offline have nothing to do with
> appcache. Appcache can't make them any faster than HTTP caching, regardless of
> how we change it (certainly within the context of this bug), as described in
> comment 20.

Allow me to disagree. AppCache has atomic guarantees that the regular HTTP cache doesn't have, hence providing significant perf boosts in scenarios where only part of the resources would be cached in the HTTP cache while all of them would be available in AppCache.

Furthermore, AppCache doesn't require developers to be able (or to know how) to set HTTP headers. So allowing non-offline apps to use AppCache would also provide a significant perf boost in such situations.

> (In reply to comment #30)
> > 
> > The relevant use-case for this bug report is #2 "Make a blog work offline": [...]
> 
> Does the proposal in comment 27 address this satisfactorily?

Yes.

Comment 33 Jake Archibald 2012-03-03 15:12:05 UTC

We had this use case on http://m.lanyrd.com/. We wanted people to be able to access their tracked/attended events offline, but browsing the site would provide the latest (not app cached) data. We wanted the site to be spiderable by search engines, and didn't want to rely on JS.

Working without JS meant we could extend our support to older devices (and Opera Mini) by preventing those devices from parsing most of the JS, based on a few simple bits of feature detection. The performance benefit from this on the old BlackBerry browser was measurable in seconds.

We worked around the problem by having every page include a hidden iframe pointing to our fallback page, which had a manifest which included all our css/js/assets. This means any page visit informs the browser of the manifest, without the visited page being a master entry.

When a connection to a page fails, the app cache takes over. The fallback page gets its data from localStorage, and its mustache templates (which are shared with the server) from a JSON object included in the page.

One downside of this approach is having to wait for a connection to fail before showing the cached version, which is fine in "no signal" cases but can cause a delay in "some but not enough signal" cases. We can live with this.

Also, the conditions in which the connection is seen to fail are pretty broad, e.g. the site throws a 500, DNS failure, 404, off-domain redirect. This is good, but once we're in the cached version we have no idea which of those happened, making our error messages a bit vague. E.g., see http://m.lanyrd.com/404 (when you have a populated app cache)

Comment 34 Andrew Betts 2012-05-09 10:23:15 UTC

At the FT we effectively have an advanced variant of Tobie's 'Make a Blog work offline' use case.

Pages of a news app can be expected to contain perishable content, which should be refreshed on each visit if a network connection is available.  However, in enabling appCache, we are forced to have a page of highly dynamic content in our cache, which is not acceptable and eliminates the use of appcache in the conventional way as an option.

The way we solve this currently is using a similar method to that outlined by Jake in comment 33.  Our news pages *do not* include a manifest attribute on the <html> tag, but instead have a hidden IFRAME that loads a dedicated appcache loader resource, eg:

<iframe src='/appcacheloader.html'></iframe>

The appcacheloader.html file includes the manifest attribute on its <html> element, but is otherwise an empty page, and isn't explicitly listed in the manifest, so gets cached as a master entry, which we don't care about.  This allows us to ensure that we don't cache perishable content but still provide fallbacks for it.

Comment 35 Ian 'Hixie' Hickson 2012-06-14 22:49:36 UTC

I've added the idea in comment 27 to the WHATWG spec. Since it's a new feature, I haven't added it to the W3C copy. Here's the relevant diff:

   http://html5.org/tools/web-apps-tracker?from=7135&to=7136

For the purposes of the HTML5 disposition of comments, therefore, this counts as a rejection (boilerplate below). However, I'm happy to add this to the HTMLWG spec if that's what the HTMLWG chairs want; let me know.

For those of you who don't care about the politics, the spec for the new syntax can be found here:

   http://www.whatwg.org/specs/web-apps/current-work/multipage/offline.html#writing-cache-manifests

Basically, if you want this new behaviour you just add this to the bottom of your manifest:

   SETTINGS:
   prefer-online
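
To illustrate how the setting above sits alongside the other manifest sections, here is a simplified Python sketch that scans manifest text for `prefer-online` inside a `SETTINGS:` section. It is not the spec's parsing algorithm; the function name and the sample URLs are hypothetical, and details such as the leading `CACHE MANIFEST` line and comments are handled only roughly.

```python
SECTION_HEADERS = ("CACHE:", "FALLBACK:", "NETWORK:", "SETTINGS:")


def has_prefer_online(manifest_text: str) -> bool:
    """Return True if 'prefer-online' appears in a SETTINGS: section."""
    section = "CACHE"  # entries before any header are treated as CACHE entries
    for raw in manifest_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "CACHE MANIFEST":
            continue  # skip blanks, comments, and the magic first line
        if line in SECTION_HEADERS:
            section = line[:-1]  # drop the trailing colon
            continue
        if section == "SETTINGS" and line == "prefer-online":
            return True
    return False


SAMPLE = """CACHE MANIFEST
# hypothetical example manifest
CACHE:
/css/site.css
/js/app.js

SETTINGS:
prefer-online
"""

flag_present = has_prefer_online(SAMPLE)
```

A manifest that lists `prefer-online` under a different section, or omits the `SETTINGS:` section entirely, is reported as not having the flag by this sketch.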



EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no normative spec change
Rationale: New feature, best to be added in HTML.next.