Meeting notes 14 August 2012

From Fixing Application Cache Community Group

FT Labs hosted an informal meeting to help identify shortcomings in AppCache and improve the state of building offline apps. Because the participants represented companies similar to those present in this CG, and the focus is identical, we will treat the output of the meeting as a product of this CG.

Notes were taken on a public etherpad and this summary is intended as a more palatable version.

Attendees

The following people were present:

Attendees represented both application developers (FT Labs, Lanyrd, Google and Facebook), and browser vendors (Opera, Google and Mozilla).

Agenda

The discussion was held in two stages:

  1. Developers present described case studies of real-world app cache use
    • Mobile Lanyrd (Jake)
    • The Economist on PlayBook (Andrew)
    • Mobile Facebook (Jackson)
    • Mobile GMail (Alex)
    • The FT Web App (Rowan)
  2. The group distilled use cases from the case studies

Throughout the meeting, participants presented and discussed ideas that might address these use cases and relieve the current woes of using App Cache.

Case studies

Each case study was presented by a developer who either was part of the team that built the site concerned or had detailed knowledge of it.

Mobile Lanyrd

The Lanyrd mobile site is completely separate from the desktop site, at least partly because the use of appcache demanded a different architecture. There is too much data to store it all offline, so users explicitly choose which content to cache. The solution was required to work without JavaScript, as well as to work offline (though it was acceptable that it did not work offline for JavaScript-incapable clients).

Problems that Lanyrd experienced using Appcache have been documented publicly in the A List Apart article "Application Cache is a Douchebag". Those highlighted by Jake in the meeting were:

  • Unable to avoid implicit caching of master entries - leads to lack of control, possible unintentional exhaustion of app cache quota, undesired caching of dynamic data. This is worsened by the fact that it's not possible to query the contents of the cache or evict items from it on demand.
    • Workaround: Reference the manifest from only one page, and include that page in an IFRAME on every other page.
  • Potential vulnerability - if an attacker could convince the browser to load any page that references a manifest, and repeat this many times with different query string parameters on each request, every response would be cached, resulting in undesired exhaustion of app cache quota and risible performance of an app cache update.
    • Workaround: Serve a 404 or other error response if any query params are sent with any request for a page that references a manifest.
  • Unreliably connected is worse than not connected - waiting for a network timeout before a fallback can be loaded degrades app performance, sometimes to the point of making it unusable.
    • Workaround: Explicitly cache the homepage of the app, hoping that most users enter the app on that URL, and use that as the fallback for all other URLs (side effects include unknown cache state of current page, and undesirable caching of dynamic, user-specific data in app cache. Further workaround: add user ID to manifest comment, ensuring that app cache is invalidated by a change of authenticated user)
  • No granularity of error states - if some request fails and fallback is served, it's impossible to tell whether the error resulted from the device knowing it is offline, a network timeout, or a server error (and in the latter case, which type of server error, since 404 has very different implications to 500).
    • Workaround: Explain to end user how crap app cache is and why we can't tell them why what they wanted can't be shown
  • Far future caching of the manifest is a death trap for developers - Any momentary mistake resulting in a far-future TTL on a manifest response will result in that device never again updating its cache. There is no way to remotely recover these users - they must clear the cached data manually.
    • Workaround: Always serve manifest with no-cache, no-store
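
Two of the server-side workarounds in the list above (rejecting query strings on pages that reference a manifest, and never letting the manifest be cached) can be sketched as plain functions. This is a minimal sketch, not Lanyrd's actual code: the function names `statusForManifestPage` and `manifestResponseHeaders` are assumptions for illustration, and wiring them into a real server is left out.

```javascript
// Hypothetical helpers; the names are illustrative and not from any framework.

// Status to send for a page that references the manifest. Any request
// carrying a query string is refused, so an attacker cannot create an
// unlimited number of distinct master entries.
function statusForManifestPage(requestUrl) {
  return requestUrl.includes('?') ? 404 : 200;
}

// Headers for the manifest response itself, so that no HTTP cache keeps
// a copy of the manifest beyond the current request.
function manifestResponseHeaders() {
  return {
    'Content-Type': 'text/cache-manifest',
    'Cache-Control': 'no-cache, no-store',
  };
}
```

A request for `/mobile/` would be served normally, while `/mobile/?x=1` would receive the 404.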

App cache manifest for Lanyrd at time of writing:

CACHE MANIFEST
# v207_9497
http://static.lanyrd.net/mobile-web-bundle.6d059685.gz.js
/templates.v207.js
http://static.lanyrd.net/css/mobile-web/all.260e2b1c.gz.css
http://static.lanyrd.net/css/mobile-web/img/sprite1@2x.ef0d2a65.png
http://static.lanyrd.net/css/mobile-web/img/social-sprites@2x.094b231b.png
http://static.lanyrd.net/css/mobile-web/img/person.d4982da9.png

NETWORK:
*

FALLBACK:
/mobile/ios2/ /static/json/mobile-web/api-fallback.json
/ /
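
The "add user ID to manifest comment" workaround mentioned earlier relies on the browser treating any change to the manifest's bytes as a new version. A minimal sketch of generating such a manifest follows. `buildManifest` and its parameter names are hypothetical, and the values passed to it in any real deployment would differ from the manifest shown above.

```javascript
// Hypothetical generator; the function and parameter names are assumptions.
function buildManifest({ version, userId, cache, fallback }) {
  const lines = ['CACHE MANIFEST', `# ${version}`];
  // A different user ID changes the manifest bytes, so the browser sees a
  // new manifest and refreshes the cache when another user signs in.
  if (userId !== undefined) lines.push(`# user: ${userId}`);
  lines.push(...cache, '', 'NETWORK:', '*', '', 'FALLBACK:');
  for (const [prefix, target] of fallback) lines.push(`${prefix} ${target}`);
  return lines.join('\n') + '\n';
}
```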

The Economist on PlayBook

The Economist HTML5 app is currently only available from BlackBerry App World as a PlayBook app running under BlackBerry WebWorks (for the distribution channel, presence on home screen, removal of browser chrome), but since it does not use any of the WebWorks APIs, it does not rely on the WebWorks environment and can be viewed in a browser. Currently it requires JavaScript but only for layout assistance - if support for a fallback layout using only CSS were added, the app would work without JavaScript, and that remains a longer term aim. The app has the capability to render content server side and deliver combined content+structure responses partly in support of that goal, also partly for performance, and finally to enable crawler support.

Building the app took advantage of lessons learned from the first generation of the FT web app. Some of the challenges and experiences in building the Economist app are discussed in the FT Labs blog posts "Navigator.onLine: Here be dragons", "Tutorial: How to make an offline HTML5 web app, FT style" and "Prefer-online? Not so much". Andrew highlighted the following problems at the meeting:

  • No straightforward way to deal with 'drive-bys' (users who are visiting briefly and don't mean to 'install') - we only want to start using offline data if the user is logged in and opts in, but we have to reload the page to do it.
    • Workaround: When user opts in, add an IFRAME to the page that loads a page that references the manifest. Record the opt-in in a cookie or localStorage. On every subsequent page-load, add the IFRAME automatically. Remove IFRAME once cache check/update is complete.
  • Unable to avoid implicit caching of master entries (same problem as Lanyrd, same impact)
    • Workaround: Same as Lanyrd - reference manifest from only one page, load that page in an IFRAME on every other page.
  • Atomic cache prevents storage of per-edition assets - can't store content of an edition in the app cache because the editions the user chooses to cache offline can change at any time, and app cache would force redownload of all cached editions in order to cache a new one (especially if doing so required eviction of an existing cached edition).
    • Workaround: Store all edition content (including images) in WebSQL or IndexedDB, not in App cache and evict based on LRU strategy implemented in JS (side effect: WebSQL/IndexedDB are UTF-16 string stores, and image data must be converted to base-64 for storage. Images are downloaded pre-encoded and repacked into UTF-16 with two ASCII chars per UTF-16 char, to optimise for inflexible JS string encoding)
  • Atomic cache prevents immediate and differential updates to application code - want to be able to download only new code files / patches for existing code, and to apply the update immediately rather than on the next reload. With app cache, it takes one reload to discover that an update is available and another to apply it, and it will redownload the entire manifest. Also Economist app session times can be extraordinarily long.
    • Workaround: Store all application code in localStorage. Run a periodic healthcheck process to verify that local mod time is still the latest available. If new code is returned, save it in LS and if backwards compatibility is broken, force a reload with an informative upgrade message to the user.
  • Difficult to guard against errors in app cache resulting in reload loop
    • Workaround: use window.name to store the time of last app start. Give up if starting up fails and previous app start was less than 10 seconds ago (retrying obviously didn't work!)
  • Impossible to tell how much space there is left in app cache, and in practice it's not enough to store any content - this may be partly solved in future by the Quota management API but the space limitations remain extremely variable between implementations.
    • Workaround: store all content in WebSQL or IndexedDB
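
The repacking of base-64 image data described in the list above (two ASCII characters per UTF-16 character) could look roughly like the following. This is a sketch under assumptions, not the Economist app's code: the names `packAscii` and `unpackAscii` are invented here, and odd-length input is padded with a zero byte, which is safe only because base-64 text never contains that character.

```javascript
// Pack two ASCII characters into each UTF-16 code unit, halving the
// number of characters the string store has to hold.
function packAscii(ascii) {
  let packed = '';
  for (let i = 0; i < ascii.length; i += 2) {
    const high = ascii.charCodeAt(i);
    // Pad an odd-length input with 0, which never occurs in base-64.
    const low = i + 1 < ascii.length ? ascii.charCodeAt(i + 1) : 0;
    // Base-64 characters are all below 0x80, so the combined value stays
    // below 0xD800 and never forms a surrogate code unit.
    packed += String.fromCharCode((high << 8) | low);
  }
  return packed;
}

// Reverse the packing, dropping the zero padding if present.
function unpackAscii(packed) {
  let ascii = '';
  for (let i = 0; i < packed.length; i++) {
    const unit = packed.charCodeAt(i);
    ascii += String.fromCharCode(unit >> 8);
    const low = unit & 0xff;
    if (low !== 0) ascii += String.fromCharCode(low);
  }
  return ascii;
}
```

A 16-character base-64 string packs into 8 characters and unpacks to the original text.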

App cache manifest for the Economist app at time of writing:

CACHE MANIFEST
# Version: 12520120420
/
/favicon.ico
/lib/fonts/EcoHeadBd.ttf
/lib/fonts/EcoNewBd.ttf
/lib/fonts/EcoNewBdIt.ttf
/lib/fonts/EcoNewIt.ttf
/lib/fonts/EcoNewRg.ttf
/lib/fonts/ofsnbd__.ttf
/lib/fonts/ofsnbdi_.ttf
/lib/fonts/ofsnbk__.ttf
/lib/fonts/ofsnbki_.ttf
/lib/fonts/ofsnmd__.ttf

FALLBACK:
/issues /appcache/fallbacks/issues
/api /appcache/fallbacks/api

NETWORK:
*
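
The reload-loop guard described in the problem list above uses `window.name`, which survives reloads within the same tab. A minimal sketch, assuming hypothetical function names and passing the window object and current time in as parameters so that the logic can be exercised outside a browser:

```javascript
// Ten seconds, as in the workaround described in these notes.
const RETRY_WINDOW_MS = 10 * 1000;

// Call once at app start. `win` is any object with a string `name`
// property (the real `window` in a browser). Stores the current start
// time and returns the previous one, or 0 if none was recorded.
function recordAppStart(win, now) {
  const previous = Number(win.name) || 0;
  win.name = String(now);
  return previous;
}

// Call when startup fails. Retrying is only worthwhile if the previous
// start was not within the last ten seconds.
function shouldRetryAfterFailure(previousStart, now) {
  return now - previousStart >= RETRY_WINDOW_MS;
}
```

In a browser this might be used as `const previous = recordAppStart(window, Date.now());`, with a failed startup calling `location.reload()` only when `shouldRetryAfterFailure(previous, Date.now())` returns true.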

Mobile Facebook

Facebook is not currently using app cache on the mobile site. It was removed in Dec 2011. FB has never supported offline use and even when app cache was included in the site, it was used for performance of static resources, not offline support.

The problems highlighted by Jackson comprised:

  • it was causing clients to burn through lots of bandwidth unnecessarily. FB releases a new code base that would cause an app cache update at least daily, if not several times a day, and groups resources based on individual user behaviour, resulting in huge fragmentation and complete cache invalidation on each push. Re-downloading all the resources was an unnecessary burden on clients and servers for no significant benefit.
  • concern that relying on HTTP for SOME things, PART of the time makes the whole system extremely difficult to reason about (see: no store headers and master entries vs. other items)
  • if an update rolls out gradually across a large number of servers, servers may for a time have different copies of the same file. Because master pages cannot be fingerprinted, this leads to the situation where the browser will download new master page from server 1, get the manifest and then redownload the master page (to populate app cache) from server 2, where it is different.
    • Discussion of this point led us to the conclusion that no-one really understands either the app cache selection algorithm or the app cache download process.

FB are now changing their approach in ways that it is believed will allow app cache to be used more effectively:

  • will go to a more app-like style, which will help
  • progressive enhancement to the Nth degree is untenable at some size
  • spreading a codebase too wide hurts both high and low-end experiences

Mobile GMail

GMail has very little public data - almost everything is user specific. The site is always served over SSL, and there are many different versions for different platforms and client capabilities. Variation is mostly done server side. Gears was designed to make GMail work offline, and consequently Google has been keen to get the main features of Gears into HTML5. Offline use is only supported on the most modern browsers, where app cache is used to store the 'shell' of the application, though not the application code. Like the Economist app, the main app code is in localStorage, because it is updated differentially. The code is initially delivered as inline code in an HTML document and is commented out, so it only need be parsed when it is actually required. Performance targets are time to inbox, time to first message, so being able to parse and/or update only specific code is important. So the app cache contains just a fallback for bootstrapping the app while offline.
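
The idea of shipping code commented out so that it is parsed only when needed can be illustrated as follows. This is a sketch of the general technique rather than GMail's implementation: `createLazyModule` is a hypothetical name, and the commented source is passed in as a string here, whereas in a page it would be read from the inline script's text.

```javascript
// Hypothetical loader: the module source arrives wrapped in a block
// comment and is compiled only the first time it is requested.
function createLazyModule(commentedSource) {
  let moduleExports;
  let loaded = false;
  return function load() {
    if (!loaded) {
      // Strip the leading "/*" and trailing "*/" to recover the source.
      const source = commentedSource.trim().slice(2, -2);
      // The parse cost is paid here, not when the page first loads.
      moduleExports = new Function(source)();
      loaded = true;
    }
    return moduleExports;
  };
}

// Usage: nothing is parsed until loadInbox() is first called.
const loadInbox = createLazyModule('/* return { render: () => "inbox" }; */');
```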

Every view of GMail is a search, the most common and default one being a search that produces what the UI refers to as the inbox (the inbox exists in practice only in the sense that some messages are tagged with an inbox tag so that they can be found by that search).

Current sample app cache manifest for GMail:

CACHE MANIFEST
# AppName: superpudu
# User: test@example.com
# Version: 6bc6d50801e27c2b
NETWORK:
/
*
//www.google.com

The FT Web App

The FT app was developed prior to the Economist one and before some of the workarounds there were developed, so it differs in a few respects:

  • There is only a single URL, to avoid caching of multiple masters
  • Fragment navigation is used rather than the history API
  • All page rendering is client side

This is in the process of moving to a more traditionally web-oriented model, to restore expected web behaviours such as conventional article URLs with server side rendering, using the workarounds outlined in the Lanyrd and Economist case studies.

Current app cache manifest for the FT Web App:

CACHE MANIFEST
# r1295
/
/lib/img/startupscreen/splash-logo.png
/lib/img/startupscreen/splash-logo@2x.png
/lib/img/mini-headshots-sprite.png
/lib/fonts/MillerDisplay-Semibold.ttf
/lib/fonts/Clarion.ttf
/lib/fonts/Clarion-Bold.ttf
/lib/fonts/BentonSansRegular.ttf
/lib/fonts/BentonSansBold.ttf
/lib/fonts/BentonSansBlack.ttf
/lib/fonts/BentonSansLight.ttf
/lib/fonts/MillerDisplay-Roman.ttf
/lib/fonts/MillerDisplay-Bold.ttf
/lib/fonts/Clarion-Italic.ttf
/lib/fonts/BentonSansRegularItalic.ttf
/lib/fonts/MillerDisplay-Semibold.woff
/lib/fonts/Clarion.woff
/lib/fonts/Clarion-Bold.woff
/lib/fonts/BentonSansRegular.woff
/lib/fonts/BentonSansBold.woff
/lib/fonts/BentonSansBlack.woff
/lib/fonts/BentonSansLight.woff
/lib/fonts/MillerDisplay-Roman.woff
/lib/fonts/MillerDisplay-Bold.woff
/lib/fonts/Clarion-Italic.woff
/lib/fonts/BentonSansRegularItalic.woff
/favicon.ico

NETWORK:
*

Use cases

The following use cases were distilled from the case studies.

  • On a large knowledge base site, the user can choose content (developer-defined chunks, potentially including content that is alternative, related and/or additional to the content selected) that he wants to consume later, maybe when offline (requiring the ability to add items to and remove items from cache on demand)
  • In a complex application whose offline resources fill the entire available space in the app cache, the user is made aware of a small update that is available and wants to download and apply it quickly (it should be possible to apply differential updates to a cached resource without either having to download the whole of that resource, or any other unaffected resource)
  • In a large news site, not all content can be stored offline, but the user sees an indication to make it clear to them which items of content are currently available offline, where items here are semantically atomic units of content that may comprise numerous HTTP resources (requiring the ability to query the cache to obtain the cache status of a specified resource)
  • User visits web game, assets are all required to be downloaded before user is able to play, game is playable without a connection
  • User visits web game, levels (comprising numerous media files such as images, sounds, and video) are downloaded one by one, user is able to play once first complete level has downloaded, additional levels are downloaded in the background. Game playable without a connection (only pre-downloaded levels, if a level fails to download or does not download completely, the others remain available and playable)
  • User visits a page while offline or on a lousy connection, and wants to be able to see stale content straight away with fresh content as soon as possible
  • The user is able to be aware that a particular piece of content is stale and some data may be out of date (where content here should be read as equivalent to an HTTP resource, though it may be displayed as part of a page that features the content of many such resources, some of which may be fresh and others stale)
  • The user is able to see when an app is making an attempt to update cached data (a train times app will show out of date train times and needs to reassure the user that effort is being made to update them).
  • The user is able to copy the URL in the address bar at any arbitrary point in time and send it to a friend who will be able to recreate a representative state by visiting that URL, regardless of the connection state of either user.
  • User visits an online newspaper and a user-defined selection of several large media assets (audio and video) are downloaded in the background, and the user is able to play these while offline.
  • User visits http://foo.bar/articles/whatever but they don't have it cached offline / page does not exist / foo.bar is down, user wants to know whether the error is their fault, server fault or connection fault. The app should be able to tell you why it's not available.
  • User visits page while online on a mobile device while riding a train. Train goes into tunnel, and the page triggers an async request. That request is served from the cache if a cached response is available (currently a page loaded while online will not use app cache for any subresource requests)
  • User visits a page that may use a variety of resources depending on the device capabilities and wants to store in cache only those files/formats that are appropriate for their device (eg TTF vs WOFF fonts, retina vs non-retina images)
  • The user is informed of cache population progress so that they feel informed about a) the fact that such a background process is happening, b) how much has been done, c) how long it is likely to take to complete, and d) the fact that the process is not complete yet.
  • User clicks "read later" on an article that contains a static Google Maps image (an IMG tag whose SRC is a resource of foreign origin) and this continues to work when offline. App cache currently supports this but it is underused because of lack of granularity of control over cached resources, atomicity, personalisation of manifest etc
  • User clicks "read later" on an article that contains an embedded interactive Google Map (using Google's default recommended embed code or some modification of it by the developer of the site) and this continues to work while offline. Alternative example: fonts from fontdeck.
  • A user hasn't played a game for a long time. When he returns to the game, he wants his game data to be preserved, as far as possible, giving priority to the core application code and resources, then (less critical) his saved game data, then (least critical) the level information and resources.
  • Any page visited by the user on a given origin can, on demand (not just on page load), initiate the offline use of the app for any part of the same origin (without forcing the storage of the current document)
  • A user wants to explicitly provide consent for an app to use a local resource, such that the browser will not revoke that permission or erase that data without further explicit consent or request from the user (ensuring that if the user installs an app they regard as important, its data is safe from proactive eviction by the browser without user consent)
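
The use cases above about showing stale content straight away, replacing it with fresh content as soon as possible, and letting the user know which is which, can be sketched as a single helper. Everything here is hypothetical: `staleThenFresh`, the `Map` used as a cache, and the callback-style `fetchFresh` are assumptions chosen to keep the example self-contained, not part of app cache or of any API discussed at the meeting.

```javascript
// Hypothetical helper. `cache` is a Map of URL to content, `fetchFresh`
// takes (url, onSuccess, onFailure), and `render` receives the content
// plus a flag saying whether it is stale.
function staleThenFresh(url, cache, fetchFresh, render) {
  // Show whatever is cached immediately, marked as stale.
  if (cache.has(url)) {
    render(cache.get(url), { stale: true });
  }
  fetchFresh(
    url,
    (fresh) => {
      // Fresh content arrived: store it and replace what is on screen.
      cache.set(url, fresh);
      render(fresh, { stale: false });
    },
    () => {
      // Network failed: the stale copy, if any, stays on screen.
    }
  );
}
```

In a browser, `fetchFresh` could wrap an `XMLHttpRequest`, and `render` could show a "last updated" notice whenever the `stale` flag is true.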

Proposals discussed

It was acknowledged that app cache is unlikely to go away, and advocating an entirely different solution is not necessarily helpful, but nevertheless a couple of alternatives did arise.

JavaScript cache API

The developer should have programmatic control of the cache for their origin, with the ability to:

  • Push an item into cache (including with support for items from foreign origins, and content generated locally)
  • Evict an item from cache
  • Inspect the current cache status of an item
  • Change the current cache status of an item (eg changing the remaining TTL or validation requirements)

There should be only one cache for the origin, with complete control available to any document loaded from that origin.
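
To make the four capabilities concrete, here is a small in-memory model of what such an API might feel like to use. The notes list capabilities but define no interface, so the class name `OriginCache`, its method names, and the TTL-based notion of freshness are all assumptions made for illustration.

```javascript
// Hypothetical in-memory model of the proposed API. Time is passed in
// explicitly (`now`, in milliseconds) to keep the behaviour predictable.
class OriginCache {
  constructor() {
    this.entries = new Map();
  }

  // Push an item into the cache, whatever its origin or source.
  put(url, body, ttlMs, now) {
    this.entries.set(url, { body, expires: now + ttlMs });
  }

  // Evict an item; returns true if it was present.
  evict(url) {
    return this.entries.delete(url);
  }

  // Inspect the cache status of an item: 'missing', 'fresh' or 'stale'.
  status(url, now) {
    const entry = this.entries.get(url);
    if (!entry) return 'missing';
    return now < entry.expires ? 'fresh' : 'stale';
  }

  // Change the remaining TTL of an existing item.
  setTtl(url, ttlMs, now) {
    const entry = this.entries.get(url);
    if (!entry) return false;
    entry.expires = now + ttlMs;
    return true;
  }
}
```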

Web worker local proxy

It should be possible for a developer to provide a web worker or similar JS module that receives a callback when a request is made by the browser and which has the opportunity to act as a proxy. The worker might cancel the request to the network and simply return a response locally, or could allow the request but modify or replace the response. The worker would have access to all local storage resources and connection status information.
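
The decision such a proxy would make for each request might be modelled like this. The sketch is purely illustrative: `createProxy`, the shape of the returned decision objects, and the injected `store` and `isOnline` parameters are assumptions, since the notes describe the idea but not an interface.

```javascript
// Hypothetical request handler. `store` is a Map of URL to response body
// and `isOnline` is a function reporting connection status.
function createProxy(store, isOnline) {
  return function onRequest(request) {
    const cached = store.get(request.url);
    if (cached !== undefined && !isOnline()) {
      // Cancel the network request and answer from local storage.
      return { action: 'respond', body: cached };
    }
    // Allow the request, but see the response on its way back so it can
    // be stored (or, in a fuller version, modified or replaced).
    // An uncached request made while offline also takes this path and
    // will simply fail at the network.
    return {
      action: 'fetch',
      onResponse(body) {
        store.set(request.url, body);
        return body;
      },
    };
  };
}
```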

It was noted that this solution falls short where an app may require resources from a foreign origin, the responses from which the proxy should not be able to inspect.

App cache modifications

Some participants in the meeting have previously advocated changes to the app cache, though these weren't discussed at the meeting:

Notable conclusions

As a result of discussing a number of specific elements of the app cache specification, notably the application cache selection algorithm and the application cache download process (the group failed to reach agreement on a common interpretation of either of these), the group agreed it has no mental model for app cache, and is reasonably convinced that no-one else does either.

There is significant disagreement about what defines a 'web app', whether the distinction is binary or whether there is a continuous scale of 'app-ness', and whether the distinction is actually important.

  • Tobie advocates that an app is defined by a single user-facing URL endpoint. A solution with multiple pages is a website, not an app.
  • Andrew advocates that the distinction is qualitative: an app is an experience tailored to a device, and a web app simply means achieving the necessary user experience quality using web technologies rather than a proprietary platform-specific technology. If a user can't tell the difference between a native app and a website, that website is as much an app as the native code.
  • Alex raised the idea of a site becoming an app when it is 'blessed' with higher than normal privileges, eg access to offline storage, filesystem, geolocation etc. However, there is no one privilege that tips the balance from website to app, and Alex is in favour of prompting the user individually for each privilege request (though several can be combined into one request, eg as is currently the case when packaging an app using the widgets spec).