London Meeting

From Fixing Application Cache Community Group

14 August 2012, FT Labs.

Minutes are reproduced below verbatim for archiving purposes. Original minutes available here:

* No implementation is correct 
* AppCache spec is too complex
* To change the standards we need real stories of what didn't work, and we need to offer proposals
* Results of today
** Use cases 
  -- "what are the various sets of things we're trying to build?"
** Things to discuss next week in Mountain View

Case studies
FT app
 * Single-page app; (currently) uses hash-bang URLs.
 * As a result, explicitly caches root URL and depends on user only visiting that one URL
  * Otherwise much as described in Lanyrd and Economist case studies below.
  * separate domain
  * not a retrofit, would have required a re-write anyway
  * want to provide a subset of the overall content, scored by users a priori
  * can't put a manifest on every page - lack of control, no way to uncache parts explicitly
   - diveintohtml5 pattern doesn't work; no way to expire things
   - bunfight over cache eviction policies
  * description of the problems: http:///articles/application-cache-is-a-douchebag/
 * problem with stale data
 * problem with user-facing private data
 * wanted the app to work without JS
 * cached JSON data in localStorage, rendered to HTML with Mustache (the reason being to store data for multiple pages)
   - unreliably connected is *worse* than disconnected
 * because users are logged in and their data is saved to the cache, if they log out and someone else logs in you get the wrong data. So they include the login as a comment in the manifest to work around that.
 * We use XHR just so we can show the "Loading" thing
 * there is no error feedback on fallback pages, and there are quite a few rather different error cases
 * interesting attack: load the main page with lots of different query strings, causing it to be implicitly cached many, many times
 * far-future caching of the manifest will lock out updates until that far-future date arrives (30 FT users still locked into an AppCache from July) (bug 830588)
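The login-in-the-manifest workaround relies on the browser treating any byte change in the manifest as a new version and re-fetching its entries, so a per-user comment line gives each login its own manifest. A minimal server-side sketch, assuming a hypothetical `build_manifest` helper and placeholder resource paths; hashing the login (rather than writing it out raw) is our own assumption, not something FT described:

```python
import hashlib


def build_manifest(user_id: str, version: str, resources: list[str]) -> str:
    """Return an AppCache manifest whose bytes differ per user and per release.

    Hypothetical helper: the comment lines mean nothing to the browser, but
    any byte change in the manifest makes it treat the cache as outdated.
    """
    # Assumption: hash the login so the raw identifier is not exposed in a
    # file that intermediaries might cache.
    user_tag = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:12]
    lines = [
        "CACHE MANIFEST",
        f"# version: {version}",
        f"# user: {user_tag}",
        "",
        "CACHE:",
        *resources,
        "",
        "NETWORK:",
        "*",
    ]
    return "\n".join(lines) + "\n"
```

Two logins on the same device get different manifests, so switching users forces an update, while the same user and version always produce identical bytes and trigger nothing.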
Economist app
  * the result of lessons learned from the FT app
  * no manifest when users aren't signed in
  * offline is an explicit user choice (users are prompted)
  * manifest is populated via an iframe, the same as Lanyrd
  * upon user permission, download the whole current edition into WebSQL/IndexedDB
  * we don't know how much space there is in appcache, and realistically we find that it usually is not enough for our needs
* storing images as encoded strings in the db.
* batching requests for pages. Pages are atomic JSON documents that contain the text content plus the encoded images.
 * The Economist has a native affordance: app store distribution
 * FT runs only through the web UI, but prompts users to add a web shortcut
   - a shortcut is the preferred method for Lanyrd too
 * neither FT nor Lanyrd is mobile-only for this stuff, but both are mobile-first
* on desktop, one of the important use cases is to pre-fetch content.
 * individual issues tend not to change
   - eviction is handled through LocalStorage/etc., not through AppCache, and is LRU based on issues
 * All the JS is in LocalStorage
 * on startup, a heartbeat is sent to the server with the last modification times of the resources in local storage
    - the server responds with information about what to update (and the priority of each update) based on compatibility with the currently bootstrapped host code
* very long session times (sometimes weeks)
 * everyone is using to hold invalidation/last-boot information in order to avoid reload loops and to force refreshes when the appcache has old templates vs. incompatible data
  * two fallbacks, /api and /issues; both are needed to make the JS APIs work gracefully when things aren't working
* want to support content + structure in the same response for performance reasons, crawlers, etc.
* deliver content within the structure first (data included in HTML markup), then bootstrap the JS ("appify").
 * origin segmentation is a huge issue. Everyone is checksumming in order to invalidate based on the user of the app (different logins in the same browser/session)
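The startup heartbeat described above can be modelled as a diff between the modification times the client reports and the server's own view. A minimal sketch, assuming hypothetical `Resource` and `plan_updates` names, integer version numbers for the bootstrapped host code, and a lower-number-first priority scheme; none of these details come from The Economist's implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    mtime: int             # last-modified time on the server (epoch seconds)
    min_host_version: int  # oldest bootstrapped host code this resource works with
    priority: int          # hypothetical scheme: lower number = fetch sooner


def plan_updates(client_mtimes: dict[str, int],
                 host_version: int,
                 server: dict[str, Resource]) -> list[tuple[str, int]]:
    """Answer a heartbeat with (resource name, priority) pairs to re-fetch.

    A resource is listed when the client's copy is missing or older than the
    server's. Resources needing newer host code than the client is running
    are skipped. Results are ordered most urgent first.
    """
    plan = []
    for name, res in server.items():
        if res.min_host_version > host_version:
            continue  # incompatible with the currently bootstrapped host code
        if client_mtimes.get(name, -1) < res.mtime:
            plan.append((name, res.priority))
    plan.sort(key=lambda item: (item[1], item[0]))
    return plan
```

Resources that require newer host code are simply left out here; a real client would have to refresh its host code first, which is where the invalidation/last-boot bookkeeping mentioned above comes in.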
Facebook mobile site
 * current infrastructure has one giant branch system, with no differentiation based on client capability in terms of overall architecture
 * 2011 AppCache testing: Facebook is mainly content
 * caching static resources via AppCache
 * push a lot, many invalidations, pre-fetching punishes demand-loading of content
 * history API via a controller that owns the app state
   - JS for page is serialized in sequence and replayed when view is returned to
 * always signed in (no way to cache the whole thing), never offline
 * appcache for perf *only*
 * would strongly prefer a "page state snapshotting" system
   - don't have a strong idea for how the controller would interact with this
   - stopped in Dec
   - burned through too many network resources to be useful
   - probabilistic resource packaging means that all content is invalidated on each push
 * changing their approach, looking to be more in line with AppCache's current strengths
   - will go to a more app-like style, which will help
   - progressive enhancement to the Nth degree is untenable at some size
   - spreading a codebase too wide hurts both high- and low-end experiences
 * concern that relying on HTTP for *SOME* things, *PART* of the time makes the whole system extremely difficult to reason about (see: no-store headers and master entries vs. other items)
 * big apps hit the "manifests hosted on different servers from data, and load balancers f you" problem, and invalidating everything is the only way to go because you can't do temp responses (because the spec doesn't want to be hostage to captive portals)
   - manifest on Server 1, downloads master entry (which can't be fingerprinted)
   - resources are on a CDN and are all fingerprinted
   ... much quibbling over what spec actually says about 2-phase update...
   ... some discussion of overlapping resource requests in disjoint manifests from the same domain ...
... correction... 
WE HAVE NO MENTAL MODEL FOR APPCACHE, and as a group are relatively convinced that nobody else does either.
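Fingerprinting, as used for the CDN resources in the Facebook discussion, means putting a hash of a file's contents into its URL: a changed file gets a new URL, and an unchanged file keeps its old URL and stays valid in any cache. A minimal sketch, assuming SHA-256 truncated to 8 hex characters and a hypothetical CDN origin `https://cdn.example.com`:

```python
import hashlib
import posixpath

CDN_ORIGIN = "https://cdn.example.com"  # hypothetical CDN origin


def fingerprint(path: str, content: bytes) -> str:
    """Return a CDN URL with a content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = posixpath.splitext(path)  # "/js/app.js" -> ("/js/app", ".js")
    return f"{CDN_ORIGIN}{stem}.{digest}{ext}"


def cdn_urls(files: dict[str, bytes]) -> dict[str, str]:
    """Map each source path to its fingerprinted CDN URL."""
    return {path: fingerprint(path, data) for path, data in sorted(files.items())}
```

The master entry (the page URL the user actually visits) cannot be renamed this way, which is why it remained the awkward case in the two-phase update discussion.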
Mobile GMail
 * All data in gmail is private, no public data
 * All served over SSL
 * Multiple versions of the app for different platform / client capabilities
 * Offline only supported on most modern browsers
 * Heavy client side data model
 * Every view is a search
 * Gears was designed to make GMail work offline
 * App Cache stores the 'shell' of the application
 * Main app code mainly sits in localStorage (though it is initially delivered to browser in an HTML comment)
 * Time to view inbox, time to view message - key performance targets
 * App cache contains just a fallback / master for bootstrap - effectively just a single resource
Use Cases
* The ability for a user to choose content (developer-defined chunks, for example alternative and/or additional content) they want to consume later, maybe when offline (e.g. Wikipedia read-later).
* The user wants to update an application and only incur the cost of downloading the delta of that app.
Dev Use Case
* The user sees an indication making it clear when content is available offline, and which specific items of content. Items don't refer to HTTP assets but rather App
* User can refer to an index of pages they have cached offline, built by the site
* User visits single page, sees content instantly from cache, application fetches resources to update that single page atomically, user is informed of updates if any
* User visits leaf page (search results, article), wants to update/create offline experience for that site without the current page necessarily being part of it
* User visits web game, assets are all downloaded before user is able to play, game is playable without a connection
* User visits web game, levels are downloaded one by one, user is able to play once the first level has downloaded, additional levels are downloaded in the background. Game playable without a connection (only pre-downloaded levels)
* User is in a low-network area but data for the page has been cached offline, wants to see cached data straight away rather than waiting for the connection to succeed/fail
* User visited a page they have cached, site decides to show them cached data straight away for performance, then update with fresh data when/if it arrives
* User visits an online newspaper and a user-specific selection of several large media assets (audio and video) is downloaded in the background, and the user is able to play these while offline.
* User visits but they don't have it cached offline / page does not exist / is down, user wants to know whether the error is their fault, server fault or connection fault
* User visits page while online, train goes into tunnel, triggers an async request, request comes from offline cache if present
* User views TV schedule which also contains now+next. Page caches with the exception of now+next, which is too time-sensitive
* User saves 99 articles for “offline reading” on a large site such as Wikipedia. User adds article on “gravity” to their offline list. Want to cache that new article without re-requesting the other 99
* User visits article on “Gravity” while offline, want to show cached version straight away but request updates to the article without requesting the other 99
* User selects “update all” from a menu, want to update cache for all 100 articles
* Page uses webfonts, want to cache the web font format used without having to cache other suggested formats. E.g., cache WOFF or TTF, not both (similar cases for media-query-determined imagery)
* A tool like Opera Dragonfly (currently an appCached HTML5 app) needs to be pre-seeded so on first run there's already something there. Currently, if a user happens to be offline and for the first time hits "Inspect Element", nothing is there as the appCache is empty.
* The user is informed of cache population progress
* The user is shown an app-specific loading screen between pages, indicating that the resource isn't instantly showable (perhaps offering alternatives that are)
* The user is aware that a particular page is 30 days old and some data may not be relevant
* User clicks "read later" on an article, that article is cached along with a Google Maps image included on the page (which originates from another origin)
* User visits page, page makes itself work offline, some resources are from a CDN for performance (therefore a different origin) which the page can use offline
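The 99-articles use cases amount to treating each article as its own cache entry rather than part of one monolithic cache. A minimal in-memory sketch, written in Python for illustration only; `ReadingList` and the injected `fetch` callable are hypothetical stand-ins for whatever storage and network APIs a real page would use:

```python
from typing import Callable, Optional

Fetch = Callable[[str], str]  # hypothetical: URL -> article body


class ReadingList:
    """Offline reading list where every article is cached and refreshed on its own."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._articles: dict[str, str] = {}

    def add(self, url: str) -> None:
        # Fetch only the newly added article; the other entries are untouched.
        if url not in self._articles:
            self._articles[url] = self._fetch(url)

    def read(self, url: str, online: bool) -> Optional[str]:
        # Return the cached copy immediately, then refresh just this entry
        # when online (a real page would do the refresh asynchronously).
        cached = self._articles.get(url)
        if online and cached is not None:
            self._articles[url] = self._fetch(url)
        return cached

    def update_all(self) -> None:
        # "Update all": re-fetch every saved article.
        for url in list(self._articles):
            self._articles[url] = self._fetch(url)
```

Adding “gravity” costs one request, reading it costs at most one more, and only “update all” touches all 100 entries.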
Trying to pin down when/how caches are removed on low-memory/usage - don't think we agreed on these:
* The user has cached 50 levels of a game, the system runs out of space, the browser flushes completed levels but leaves uncompleted levels and the core of the game (developer has control over priority of atomic packages, and accepts some may be discarded)
* A site guarantees the user a particular set of data to be available offline; the browser or system must not remove this without the user's knowledge
* A user has 5 games and 1 note-taking app stored offline. The system is critically short of resources; the user chooses to uncache the games but retains the note-taking app, based on importance and space used
* A user is asked to give an app no-flush privileges, meaning it will not be flushed by the browser automatically; user selects "yes please", because they want to rely on the app's offline functions
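The game-levels case implies a developer-assigned priority per atomic package, and the no-flush case implies a user-granted pin. The group did not agree on these policies, so what follows is only one possible reading, sketched in Python with hypothetical names (`Package`, `evict`, `QuotaExceeded`); lower priority numbers are evicted first:

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when evicting every unpinned package would still not free enough space."""


@dataclass
class Package:
    name: str
    size: int             # bytes
    priority: int         # developer-assigned; lower numbers are evicted first
    pinned: bool = False  # e.g. the user granted "no-flush" privileges


def evict(packages: list[Package], bytes_needed: int) -> list[str]:
    """Pick packages to discard until at least bytes_needed bytes are freed."""
    # Lowest priority first; among equals, larger packages go first.
    candidates = sorted(
        (p for p in packages if not p.pinned),
        key=lambda p: (p.priority, -p.size),
    )
    freed, evicted = 0, []
    for pkg in candidates:
        if freed >= bytes_needed:
            break
        evicted.append(pkg.name)
        freed += pkg.size
    if freed < bytes_needed:
        # Nothing more may be flushed automatically; the user must decide.
        raise QuotaExceeded(f"only {freed} of {bytes_needed} bytes are evictable")
    return evicted
```

Raising instead of flushing pinned data matches the "must not remove this without user knowledge" case: at that point the choice goes back to the user, as in the 5-games example.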
Potential proposals
* Super cache API: programmatic control over the cache (push items into it, inspect the current cache status of any resource, evict items from it, etc.)
* Web worker callback / local server: route all requests through a local process that can intercept requests and serve them from local data stores rather than the network (and push responses from the network into local stores before making the response available to the browser)
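The local-server proposal can be pictured as a single request handler that consults a local store before the network, with the super-cache controls (push, inspect, evict) exposed alongside it. A minimal Python model for illustration; `LocalServer` and the injected `network` callable are hypothetical and do not correspond to any browser API:

```python
from typing import Callable, Optional

# Hypothetical network function: returns the body, or None when offline/failed.
Network = Callable[[str], Optional[bytes]]


class LocalServer:
    """Models the proposal: every request is routed through a local handler."""

    def __init__(self, network: Network) -> None:
        self._network = network
        self._store: dict[str, bytes] = {}

    def handle(self, url: str) -> Optional[bytes]:
        cached = self._store.get(url)
        if cached is not None:
            return cached  # serve from the local data store
        body = self._network(url)
        if body is not None:
            # Push the network response into the local store
            # before making it available to the page.
            self._store[url] = body
        return body

    # "Super cache" style controls: push, inspect, evict.
    def put(self, url: str, body: bytes) -> None:
        self._store[url] = body

    def is_cached(self, url: str) -> bool:
        return url in self._store

    def evict(self, url: str) -> None:
        self._store.pop(url, None)
```

This handler is cache-first; a network-first or show-cached-then-refresh policy would be an equally valid handler, and letting the developer choose is the point of making the handler programmable.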
Discussion Topics
* need a way to know if the current page is served from appcache or not
Arbitrary Notes
• Worth looking at Quota Management API
• Old, currently dead proposal: Programmable HTTP Caching and Serving
* we want a fairly low-level API, which gives power to, but also puts responsibility on, developers to be able to very granularly control what/how/when stuff gets cached. No special APIs or "flags" to say "this is important -> put it in a new type of SUPERcache"; leave it up to the developer to handle exactly how they balance their resources by giving them enough access to info about: what is and isn't cached, how old the cached versions are, whether updates are available for each individual resource, how much space is left on the device for caching, and whether the device allows me to ask the user for more space.

* more granular control is needed because there are different kinds of "stuff" that developers want to store: nice-to-have (images, video?) vs. critical (the actual JS logic of an app). It's not just static content, but could be a whole framework, data, etc.

* low-resource callback system: Event Pages:
People present:
Alex Russell -- Google
Andrew Betts -- FT Labs
Jackson Gabbard -- Facebook
Jake Archibald -- Lanyrd
Patrick H Lauke  -- Opera
Robin Berjon  -- Independent standards consultant
Tobie Langel  -- Facebook
Christian Heilmann -- Mozilla
Rowan Beentje - FT Labs (from 12pm)