AppCache Wows, Vows & Woes

Position Paper for the W3C Workshop on The Future of Offline Web Applications

Author: Tobie Langel, Software Engineer, Facebook.

According to the HTML5 specification, AppCache is designed to “enable users to continue interacting with Web applications and documents even when their network connection is unavailable.” This is clearly a problem worth solving: it enables a number of use-cases which until now either weren’t guaranteed to work (such as navigating between previously loaded web pages while offline) or were simply impossible to build (e.g., offline-capable mail clients). Not only does AppCache help bridge the gap between native and web apps, it also has tremendous potential to improve load times and general availability of websites and web apps alike by allowing fine-grained pre-loading and persistent caching of assets. This is especially important given the dramatic increase in mobile traffic.

Despite these exciting promises, AppCache suffers from a number of flaws which we believe negatively impact adoption.

One of the largest problems is that AppCache is modeled after the updating process of regular desktop applications. Instead of enabling web developers to build experiences which seamlessly transition from connected to offline and back, AppCache forces them to write offline applications which can access the web when connected. In effect, web pages which reference a cache manifest get served from AppCache even when online. This makes retrofitting existing web apps nearly impossible and confines development of modern web apps to client-side MVC or similar architectures (where HTML is only used to build the application’s chrome and data is fetched asynchronously). More importantly, it breaks the mental model of the web that both developers and users have established over the years (navigating to a page yields fresh content).

This brings us to AppCache’s second issue: implicit caching of visited web pages (called master entries by the spec). Although allowing this option is crucial for some use-cases, for example to allow offline browsing of visited public content, it creates security and privacy concerns in others. This issue has been recognized by the spec’s editor, and the Editor’s Draft now contains a tentative fix for it which relies on AppCache honoring the Cache-Control header’s “no-store” directive.
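
As a rough illustration of the tentative fix mentioned above, the check a user agent might perform before implicitly caching a master entry could look like the following TypeScript sketch. The function name forbidsStorage is hypothetical, and the parsing is deliberately simplified (it does not handle quoted directive values that contain commas); it is not taken from the spec.

```typescript
// Hypothetical helper: returns true when a Cache-Control header value
// contains the "no-store" directive, in which case the page would not
// be added to the AppCache as a master entry.
// Simplification: directives are split on commas, so quoted values that
// themselves contain commas are not handled.
function forbidsStorage(cacheControl: string | null): boolean {
  if (cacheControl === null) {
    return false; // no header, nothing forbids storage
  }
  const directives = cacheControl.split(",");
  for (let i = 0; i < directives.length; i++) {
    // Drop any "=value" part and compare the directive name case-insensitively.
    const name = directives[i].split("=")[0].trim().toLowerCase();
    if (name === "no-store") {
      return true;
    }
  }
  return false;
}
```

Under this sketch, a response carrying “Cache-Control: no-store” would never become a master entry, while one carrying “public, max-age=3600” still could.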

We believe both these issues would be better addressed by making implicit caching of master entries opt-in and by completely bypassing the AppCache for master entries when online. Opting in could be achieved using a syntax along these lines:

CACHE MANIFEST
CACHE VISITED PAGES:
*

which would even have the added benefit of allowing more fine-grained filtering, e.g.:

CACHE MANIFEST
CACHE VISITED PAGES:
/posts/
/public/
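
To make the proposal more concrete, here is a minimal TypeScript sketch of how a user agent might interpret such a section. Everything in it is an assumption made for illustration: the function names, the treatment of each entry as a path prefix (with “*” matching everything), and the three possible outcomes are not defined by the spec or by the syntax above.

```typescript
// Collects the entries listed under the proposed "CACHE VISITED PAGES:" section.
function parseVisitedPagePatterns(manifest: string): string[] {
  const patterns: string[] = [];
  let inSection = false;
  const lines = manifest.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    // Skip blank lines, comments and the manifest signature line.
    if (line === "" || line.charAt(0) === "#" || line === "CACHE MANIFEST") {
      continue;
    }
    // Any line ending in ":" is treated as a section header here.
    if (line.charAt(line.length - 1) === ":") {
      inSection = line === "CACHE VISITED PAGES:";
      continue;
    }
    if (inSection) {
      patterns.push(line);
    }
  }
  return patterns;
}

// Assumption: "*" opts in every page, any other entry is a path prefix.
function isOptedIn(path: string, patterns: string[]): boolean {
  for (let i = 0; i < patterns.length; i++) {
    if (patterns[i] === "*" || path.indexOf(patterns[i]) === 0) {
      return true;
    }
  }
  return false;
}

type Source = "network" | "appcache" | "unavailable";

// Decides where a visited page (master entry) would be loaded from.
function resolveMasterEntry(
  path: string,
  online: boolean,
  patterns: string[],
  cachedPaths: string[]
): Source {
  if (online) {
    return "network"; // proposed behavior: bypass AppCache when online
  }
  if (isOptedIn(path, patterns) && cachedPaths.indexOf(path) !== -1) {
    return "appcache"; // offline, opted in and previously visited
  }
  return "unavailable";
}
```

With the second manifest above, a previously visited /posts/42 would come from the network when online and from AppCache when offline, while a page under a path that is not listed would never be served from AppCache.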

The last problem with AppCache is how it makes updating non-trivial applications… non-trivial. There are a number of reasons for this. Firstly, the current caching process reverses the usual relationship between a document and its assets (a change in the manifest triggers a refresh of the main entry, not the opposite). Fortunately, the modification we suggest above already mitigates that issue. Secondly, the current syntax uses the manifest’s URL as an identifier. This breaks fingerprinting, still the most common technique for cache busting. The current solution relies on the “manifest” attribute of the HTML element. Using the LINK element (with a rel attribute of “manifest”) instead would allow identifying the manifest by its “title” attribute rather than by its URL, just as with stylesheets. This would enable fingerprinting, e.g.:

<link rel="manifest" title="main" href="/main_1309991142320.manifest" />
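
The following TypeScript sketch illustrates how identifying a manifest by title rather than by URL would let the fingerprint in the URL change freely. The ManifestRegistry class and its three outcomes are hypothetical names chosen for this illustration, and the second fingerprint used in the usage note is invented for the example.

```typescript
// The two attributes of the proposed LINK element that matter here.
interface ManifestLink {
  title: string; // stable identifier, e.g. "main"
  href: string;  // fingerprinted URL, changes on every release
}

type Decision = "install" | "update" | "unchanged";

// Hypothetical bookkeeping a user agent could keep per origin:
// the last manifest URL seen for each title.
class ManifestRegistry {
  private known: { [title: string]: string } = {};

  observe(link: ManifestLink): Decision {
    const seen = Object.prototype.hasOwnProperty.call(this.known, link.title);
    const previousHref = seen ? this.known[link.title] : null;
    this.known[link.title] = link.href;
    if (previousHref === null) {
      return "install"; // first time this title is encountered
    }
    // Same title with a different URL means a new version of the same cache.
    return previousHref === link.href ? "unchanged" : "update";
  }
}
```

Observing { title: "main", href: "/main_1309991142320.manifest" } for the first time yields "install"; observing it again yields "unchanged"; and a later page referencing the same title with a different fingerprinted URL, say /main_1309991199999.manifest, yields "update" instead of being mistaken for a different application cache.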

Thirdly, the same-origin restriction prevents storing the manifest on a CDN, even though doing so would greatly simplify certain scenarios where AppCache is used nearly exclusively as an assets cache.
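
A minimal TypeScript sketch of the restriction in question follows. The function name and the example hostnames in the usage note are assumptions for illustration; the comparison relies on the standard URL API’s origin property.

```typescript
// Returns true when the manifest URL, resolved against the document URL,
// has the same origin (scheme, host and port) as the document.
function isSameOrigin(documentUrl: string, manifestUrl: string): boolean {
  const doc = new URL(documentUrl);
  const manifest = new URL(manifestUrl, documentUrl); // resolves relative URLs
  return doc.origin === manifest.origin;
}
```

For a document at https://www.example.com/app/, a manifest at /main.manifest passes this check, whereas one hosted at https://cdn.example.com/main.manifest does not, which is exactly the CDN scenario ruled out today.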

Increasingly, web technology is used to power full-blown applications distributed across multiple platforms. While a lot of these platforms now support AppCache, most also sport their own proprietary manifest. Developers are forced to maintain a specific manifest for each distribution platform they target (on top of AppCache’s). Ultimately, providing a simpler path for this through a single, extensible syntax would probably be ideal.