Web & Networks IG: Lessons from Network Information API WICG

05 February 2020


cpn, Dan_Druta, Dario_Sabella, dom, Doug_Eng, Eric_Siow, Jonas_Svennebring, Jordi_Gimenez, Louay_Bassbouss, Piers_O_Hanlon, sudeep, Tarun_Bansal
DanD, Song, Sudeep

Meeting minutes

Slides: Network Quality Estimation In Chrome

Sudeep: today's session is an important one for the IG
… in the past, we've covered a lot about MEC, CDN, network prediction
… today we have folks from Google's Chrome team who implemented some APIs around networking
… we're glad to have our guest speaker Tarun Bansal from the Chrome Team to give us insights about the APIs implemented in the networking space; how it is used, how useful it is, what lessons to draw from it

Tarun: I work on the Google Team and will talk about network quality estimation in Chrome
… the talk is divided in 2 parts: use cases, and then technical details about how it works
… my focus in the Chrome team is on networking and web page loading
… I focus on the tail end of performance, very slow connections e.g. 3G
… about 20% of page loads happen on 3G-like connections - which feels very slow, e.g. 20s before first content
… videos would also take a lot of buffering time in these circumstances
… the 3G share varies from market to market; e.g. 5% in the US, but up to 40% in e.g. developing countries
… We have a service that provides continuous estimates of network quality, covering RTT and bandwidth
… we estimate network quality across all the paths, not specific to a single web server
… this focuses on the common hop from browser to network carrier
… [this work got help from lots of folk, esp. Ben, Ilya, Yoav]
… Before looking at the use cases, we need to understand how browsers load Web pages and why page loads are slow on slow connections
… First, it is very challenging to optimize the performance of Web pages - it takes a lot of resources
… Web pages typically load plenty of resources before showing any content (e.g. css, js, images, ...)
… Not all of these resources are equally important - some have no UX impact (e.g. tracking, below-the-fold content)
… loading everything in parallel works fine on fast connections, but on slow connections, it slows everything down
… an optimal web page load should keep the network pipe full, and a lower-priority resource should not slow down a higher-priority one
… e.g. loading a below-the-fold image should not slow down what's needed to show the top of the page
… or a JS-for-ad shouldn't slow the core content of the page
… this means a browser needs to understand the network capacity to optimize loading of resources
… this is what led to the creation of this network quality estimation service
… Other uses include so called "browser interventions" which are meant to improve the overall quality of the Web by deviating from standard behavior in specific circumstances
… in our case, e.g. when a network is very slow
… another use case is to feed back to the network stack - e.g. using network timeouts
… in the future, this could also be used to set an initial timeout in a smarter way (e.g. higher timeout in poor connection contexts)
… lots of use cases for the browser vendor - what use would Web dev make of it?
… We've exposed a subset of these values to developers: an RTT estimate, a bandwidth estimate, and a rough categorization of network quality (into 4 values)
… This was released in 2016
… and is being used on around 20% of web pages across all Chrome platforms
… examples of usage:
… the Shaka player (an open-source video player) uses the network quality API to adjust its buffer; Facebook does this as well
… some developers use it to inform the user that the slow connection will impact the time needed to complete an action
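[Editor's note: a consumer of the API, in the spirit of the Shaka-player use above, might look like the following sketch. The buffer-goal numbers and the `pickBufferSeconds` helper are illustrative assumptions, not Shaka's actual logic.]

```javascript
// Map the API's four Effective Connection Type buckets to a player
// buffer goal in seconds. The numbers are illustrative, not Shaka's.
function pickBufferSeconds(effectiveType) {
  switch (effectiveType) {
    case 'slow-2g': return 60; // buffer aggressively on very slow links
    case '2g':      return 30;
    case '3g':      return 15;
    default:        return 10; // '4g' or unknown
  }
}

// In a browser, the values come from navigator.connection
// (currently Chromium-only):
const conn = typeof navigator !== 'undefined' && navigator.connection;
if (conn) {
  const goal = pickBufferSeconds(conn.effectiveType);
  conn.addEventListener('change', () => {
    // re-read conn.effectiveType / conn.rtt / conn.downlink and adapt
  });
}
```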
… Now looking at the details of the implementation
… The first thing we look at is the kind of connection (e.g. wifi)
… but that's not enough: there can be slow connections even on Wifi or 4G
… a challenge in implementing this API is making it work on all the different platforms, which expose very different sets of APIs
… We also need to make it work on devices as they are, with often very limited access to the network layer
… Typically, network quality is estimated by sending echo traffic to a server (e.g. speedtest)
… but this isn't going to work for Chrome: privacy (don't want to send data to a server without user intent)
… also don't want to maintain a server for this
… we also want to make the measurement available to other Chromium-based browsers
… so we're using passive estimation
… for RTT, we use 3 sources of information based on the platform
… the first is the HTTP layer which Chrome controls completely
… the 2nd is the transport layer (TCP) for which some platforms provide information
… the 3rd is the SPDY/HTTP2 and QUIC/HTTP3 layers
… for HTTP, you measure the RTT as the time difference between request and response - this is available on all platforms, completely within the Chrome codebase
… there are limitations: the server processing time is included in the measurement
… for H2 and QUIC connections, requests are multiplexed on the same TCP or UDP connection, which means an HTTP request can be queued behind other requests
… which may inflate the measured RTT
… it is mostly useful as an upper bound
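[Editor's note: the HTTP-layer measurement described above can be sketched as follows. The first helper is the bare arithmetic; the second shows that a page can derive an equivalent per-resource sample from Resource Timing entries.]

```javascript
// HTTP-layer RTT sample: time from sending the request to receiving the
// first byte of the response. As noted above, this is an upper bound:
// it includes server processing time and any multiplexing queue delay.
function httpRttSample(requestSentMs, firstResponseByteMs) {
  return firstResponseByteMs - requestSentMs;
}

// From a page, the Resource Timing API exposes equivalent timestamps
// (requestStart / responseStart) for each fetched resource:
function httpRttFromResourceTiming(entry) {
  return entry.responseStart - entry.requestStart;
}
```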
… for the TCP layer, we look at all the TCP sockets the browser has opened, and ask the kernel what RTT it has computed for these sockets
… then we take a median
… this is less noisy, but it still has its own limitations
… it doesn't take into account packet loss; it doesn't deal with UDP sockets (e.g. if using QUIC)
… and it's only available on some platforms - we can't do this on Windows or MacOS
… this provides a lower bound RTT estimate
… The 3rd source is the QUIC/HTTP2 Ping
… Servers are expected to respond immediately to HTTP2 PING
… this is available in Chrome, and it removes some of the limitations we discussed earlier
… but not all servers support QUIC/H2, esp in some countries
… not all servers that support QUIC/H2 support PING despite the spec requirement
… and it can still be queued behind other packets
… So we have these 3 sources of RTT; for each source, we take all the samples and aggregate them with a weighted median
… we give more weight to recent samples; where TCP uses a weighted average, we use a weighted median to eliminate outliers
… once we have these 3 values, we combine them using heuristics to a single value
… these heuristics will vary from platform to platform
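[Editor's note: the recency-weighted median described above might be sketched like this. The exponential half-life decay is an assumption for illustration; Chrome's actual weighting constants are not given in the talk.]

```javascript
// Weighted median of RTT samples, weighting recent samples more heavily.
// samples: array of { value, ageSeconds }. The half-life is an assumed
// constant, not Chrome's real one.
function weightedMedianRtt(samples, halfLifeSeconds = 60) {
  const weighted = samples
    .map(s => ({ value: s.value, w: Math.pow(0.5, s.ageSeconds / halfLifeSeconds) }))
    .sort((a, b) => a.value - b.value);
  const total = weighted.reduce((sum, s) => sum + s.w, 0);
  let acc = 0;
  for (const s of weighted) {
    acc += s.w;
    if (acc >= total / 2) return s.value; // first value crossing half the weight
  }
  return NaN; // empty input
}
```

Unlike a weighted average, an old outlier (e.g. one 5000 ms sample) cannot drag the estimate up: it only shifts where the half-weight point falls.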
… Is that RTT enough?
… We have found that to estimate the real capacity, we need to estimate the bandwidth
… there has been a lot of research on this, but none of them worked well for our use case
… we do not want to check a server; we want a passive estimate
… What are the challenges in estimating bandwidth? The first one is that we don't have cooperation from the server-side
… e.g. we don't know what TCP flavor the server is using, we don't know their packet loss rates
… so we use a simple approach: we measure how many bytes we get in a given time window with well defined properties (e.g. >128KB large, 5+ active requests)
… the goal being to ensure the network is not under-utilized
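[Editor's note: a minimal sketch of the windowed throughput sample described above. The gating thresholds mirror the ones mentioned in the talk (>128 KB transferred, 5+ active requests); the exact gating logic is an assumption.]

```javascript
// Passive bandwidth sample: only count a window if it plausibly kept the
// network pipe full, otherwise discard it as under-utilized.
function bandwidthSampleKbps(bytesReceived, windowMs, activeRequests) {
  const MIN_BYTES = 128 * 1024;      // threshold mentioned in the talk
  const MIN_ACTIVE_REQUESTS = 5;     // ditto
  if (bytesReceived < MIN_BYTES || activeRequests < MIN_ACTIVE_REQUESTS) {
    return null; // network may be under-utilized; not a valid sample
  }
  return (bytesReceived * 8) / windowMs; // bits per millisecond == kbps
}
```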
… with all these estimates, how do they quickly adapt to changing network conditions?
… e.g. entering a parking garage will slow down a 4G connection
… we use the strength of the wireless signals
… we also store information on well-known networks
… To summarize, there are lots of use cases for knowing network quality - not just for browsers, also for Web developers
… but there are lots of technical challenges in doing this from the app layer without access to the kernel layer

Piers: (BBC) I heard Yoav mention in the IETF that the netinfo RTT exposure might go away for privacy reasons
… that was back at the last IETF meeting last year

Tarun: it's not clear if we should expose a continuous distribution of RTT - a more granular exposure could work

Piers: so this is an ongoing discussion - can you say more about the privacy concerns?

Tarun: 2 concerns: one is fingerprinting
… we round and add noise to the values to reduce fingerprint
… another concern is that a lot of Web developers may not know how to consume continuous values
… simplifying it makes it easier to consume
… we provide this as the Effective Connection Type - which can be easier to use to e.g. pick which image to load

Piers: we have ongoing work on TransportInfo in IETF that is trying to help with this

Tarun: if the server can identify the network quality and send it back to the browser, the browser could use it more broadly

<piers> https://github.com/bbc/draft-ohanlon-transport-info-header/blob/master/draft-ohanlon-transport-info-header.md

Piers: one of the use cases is adaptive video streaming; it could also be useful for small object transfers (which are hard to estimate in JS)

Tarun: is it mostly for short bursts of traffic?

Piers: it's also for media as well

Tarun: so would the server keep data on typical quality from a given IP address?

Piers: it would be sent with a response header (e.g. along with the media)

DanD: (AT&T) for IETF QUIC, are you considering using the spin bit that is being specified?

Tarun: we're not using it, and I don't think there are plans to use it at the moment
… QUIC itself maintains an RTT estimate which we're using

Dom: has there been work around network quality prediction? We had a presentation from an Intel team on the topic back in September

Tarun: not at the moment - we're relying on what the OS provides

Jonas: what we're doing for network prediction is to use info coming from the network itself (e.g. load shifting across cells)
… we use this to do forward-looking prediction

Tarun: the challenge is that this isn't available at the application layer
… e.g. they wouldn't be exposed to the Android APIs
… an app wouldn't know the tower location - you can know which carrier it is, but not more than that
… there is also a lot of variation across Android flavors
… the common set is mostly signal strength and carrier identifier

Sudeep: would it be interesting for the browser to talk to interfaces to the carrier network (e.g. via MEC)?
… The carrier/operating networks may have more info about the channel conditions

Tarun: definitely yes
… Android has an API which exposes this information
… but it never took off, and most device manufacturers don't support it
… there is a way to expose this in Android
… I'm not sure what the practical concerns were, but it never took off
… it would be super-useful if it was available

Sudeep: you spoke about RTT, bandwidth that got defined in W3C
… but implementations can vary from one browser to another - is there any standardization about how these would be measured, or would this be UA dependent?

Tarun: it's specified as a "best-effort estimate" from the browser, so it's mostly up to the browser
… right now it's only available in Chromium-based browsers
… even Chromium-based implementations will vary from platform to platform

Dom: can you say more about the fact that it is not available in other browsers?

Tarun: I think it's a question of priority - we have a lot of users in developing markets which helped drive some of the priority for us

Song: (China Mobile) I'm interested in the accuracy of the network quality monitoring
… you mention aggregating data from 3 sources: HTTP, TCP and QUIC
… are the weights for these 3 sources fixed, or do they vary based on the scenario?

Tarun: it's very hard to measure accuracy
… in lab studies (with controlled network conditions), the algorithm does quite well
… we also do A/B studies, but it's hard given we don't really know the ground truth
… so we measure the behavior of the consumer of the API, e.g. on the overall page load performance
… we've seen 10-15% improvements when tuning the algorithm the right way

Song: when you measure the data from these 3 sources, are they exposed to the Web Dev? or only the aggregated value?
… are there any chance to make the raw source data available to Web browsers?

Tarun: we only provide aggregated values

Piers: how often do you update the value?

Tarun: internally, every time we send or receive a packet
… we throttle it on the Web API - when the values have changed by more than 10%
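[Editor's note: the 10% throttle described above amounts to a one-line comparison; a sketch, with the threshold parameterized for illustration.]

```javascript
// Only surface a new estimate to the Web API when it differs from the
// last reported value by more than the threshold (10% per the talk).
function shouldNotify(lastReported, current, threshold = 0.1) {
  if (lastReported === null) return true; // first sample is always reported
  return Math.abs(current - lastReported) / lastReported > threshold;
}
```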

Piers: that's a pretty large margin for adaptation

Tarun: most of the developers don't care about very precise estimates
… it's pretty hard to write pages that take into account that kind of continuous change

Piers: for media, more details are useful

Tarun: even then, you usually only have 2 or 3 resolutions to adapt to

Piers: but the timing of the adaptation might be sensitive

Piers: Any plans to provide more network info?

Tarun: no other plans as of now
… we're open to it if there are other useful bits to expose

Sudeep: that's one of the topics the group is aiming to build on
… are there other APIs in this space that you think would be useful to Web developers?

Tarun: I think most developers care about few different values
… it's not clear they would use very detailed info
… another challenge we see is around caching (e.g. different network resources for different network quality)
… you might end up loading new resources because the network quality changed - which is counterproductive precisely when the quality is low
… In general, server-side estimates are likely more accurate

Sudeep: Thank you Tarun for a very good presentation!
… Going forward, we want to look at how these APIs can and need to be improved based on Web developers needs
… we'll follow up with a discussion
… Next week we have a presentation by Michael McCool on Edge computing - how to offload computing from a browser to the edge using Web Workers et al
… call info will be sent to the list

Minutes manually created (not a transcript), formatted by scribe.perl version 104 (Sat Dec 7 01:59:30 2019 UTC).