W3C

– DRAFT –
WebDriver BiDi

02 September 2020

Attendees

Present
brrian, cb, drousso, foolip, jgraham, john_chen, mathiasbynens
Regrets
-
Chair
-
Scribe
foolip

Meeting minutes

<jgraham> Zakim: this is webdriver-bidi

<mathiasbynens> The meeting has not started

<jgraham> Yeah, so I think the problem is that with AutomatedTester away no one can start the zoom call

<jgraham> So unless something changes, I think we either need to defer e.g. until next week or pick a different videoconf

<mathiasbynens> I could create an impromptu Google Meet URL?

<gsnedders> that works?

<jgraham> https://mozilla.zoom.us/my/jgraham ?

<mathiasbynens> let's go with ^

<jgraham> For anyone still waiting to join the meeting, we changed conference rooms since AutomatedTester is away

jgraham: agenda is at https://www.w3.org/wiki/WebDriver/2020-09-BiDi#Agenda

jgraham: first agenda item is follow-up from last meeting. logging module and more?

<jgraham> foolip: does discussion need to continue?

foolip: https://github.com/w3c/webdriver-bidi/issues/45 was filed

jgraham: we can have the discussion on GitHub, but it's a bit early to specify an actual feature yet

foolip: let's do https://github.com/w3c/webdriver-bidi/issues/43 first

specifying navigating a browsing context

<jgraham> foolip: Once we have basic protocol, how do we navigate? Inspiration from CDP and WebDriver. These are very different. WebDriver has page load strategies that can be set to define the behavior, and it blocks.

<jgraham> foolip: Navigate and then return.

<jgraham> foolip: CDP model is request navigate and then events happen and client has to wait

<jgraham> foolip: Opinions? Variations are possible e.g. progress updates, which aren't quite like CDP

jgraham: I think the point of this is to do something more like CDP in these cases, that the protocol should be lower level and not have blocking commands

jgraham: I'd be against adding a long-running blocking command for navigation, or anything

jgraham: not sure what the progress update model was, but my assumption is you send a navigate command and then start getting events for various points in the lifecycle

jgraham: I think CDP has more events for network stuff, maybe it all falls out of performance timing

cb: +1 to that

cb: the reason we're looking at CDP is that it allows the browser to be introspected

cb: which makes it really powerful. people will use a framework that abstracts complex logic like listening to certain events

cb: someone with no understanding of the protocol would be able to use it through a framework

<jgraham> foolip: Means that we have to define lots of events to make navigation work, but there's some chance to start with just e.g. DOMContentLoaded

brrian: beyond sending a response once load is done/canceled, I don't think anything else needs to go in

brrian: that said, it can take a very long time for a navigation to be committed

brrian: don't see a big difference between the time it will take

brrian: what we've done in the web inspector protocol is async commands, where responses can come out of order
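
A minimal sketch of what out-of-order responses imply for a client: each response has to be matched to its command by ID rather than by arrival order. The message shape (`id`, `method`, `params`, `result`) and the command names are assumptions for illustration, not taken from the web inspector protocol or the draft spec.

```python
import json


class PendingCommands:
    """Tracks commands awaiting a response, keyed by command id."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}

    def send(self, method, params):
        """Return the JSON text for a new command and remember its id."""
        self._next_id += 1
        self._pending[self._next_id] = method
        return json.dumps({"id": self._next_id, "method": method, "params": params})

    def receive(self, raw):
        """Match a response to its command; arrival order does not matter."""
        message = json.loads(raw)
        method = self._pending.pop(message["id"])
        return method, message.get("result")


pending = PendingCommands()
pending.send("navigate", {"url": "https://example.com/"})  # hypothetical command, id 1
pending.send("getTitle", {})                               # hypothetical command, id 2

# The response to the second command arrives before the first one.
first = pending.receive('{"id": 2, "result": {"title": "Old page"}}')
second = pending.receive('{"id": 1, "result": {}}')
```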

<jgraham> foolip: is navigation committing a thing we want to block on?

brrian: I see no reason to return early if you can't do anything with it

<jgraham> foolip: How does this work in web inspector?

brrian: there's no timeout for the navigation

brrian: not sure how to specify it, it can take a long time to navigate

<drousso> +1 to that ^

drousso: in web inspector there's no navigate command, that's triggered by JS

drousso: but while you're navigating, what is there for you to do in the page?

jgraham: one thing that might be different from the underlying inspector protocol is that we're multiplexing a bunch of browsing contexts for a single connection

jgraham: so maybe you should be able to navigate one while you do other work in another

jgraham: way I thought this would work is you navigate, and the response tells you only "yes that was a well-formed command, I can try to navigate based on that"

jgraham: and later, you get some event, not with the command ID, but with the browsing context ID, say

jgraham: that does have implications for client authors

jgraham: there could later be an error, if the navigation couldn't actually occur

jgraham: but in other ways it seems similar to the async command model

jgraham: the difference I think is whether you can send no response at first and later send a response with the command ID

jgraham: or if there should always be a response saying "that packet was valid" and later can't send another response for that

drousso: that's what I was describing as an async command

drousso: but rather than sending an empty response, we can tie them together
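
As a rough illustration of the first option jgraham describes (an immediate response that only acknowledges the command, followed by events keyed by browsing context ID), a client might keep state like this. The event names, the `context` parameter, and the error shape are hypothetical; nothing here is settled in the spec.

```python
import json


class NavigationTracker:
    """Client-side bookkeeping for the acknowledge-then-events model."""

    def __init__(self):
        self.accepted = set()   # command ids the remote end acknowledged
        self.rejected = {}      # command id -> error text
        self.lifecycle = {}     # browsing context id -> list of event names

    def handle(self, raw):
        message = json.loads(raw)
        if "id" in message:
            # A response only says whether the command was well-formed.
            if "error" in message:
                self.rejected[message["id"]] = message["error"]
            else:
                self.accepted.add(message["id"])
        else:
            # Events carry the browsing context id, not the command id.
            context = message["params"]["context"]
            self.lifecycle.setdefault(context, []).append(message["method"])


tracker = NavigationTracker()
tracker.handle('{"id": 7, "result": {}}')
tracker.handle('{"method": "domContentLoaded", "params": {"context": "ctx-1"}}')
tracker.handle('{"method": "navigationFailed", "params": {"context": "ctx-2"}}')
```

In this model a later failure shows up as an event on the browsing context, so the client has to correlate it with the navigation it requested; in the async-command model drousso describes, the same failure would instead arrive as the delayed response carrying the command ID.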

<jgraham> foolip: Still some choices to make here. Can take it to the issue

gsnedders: want to point out a bunch of potential error cases, where the client fails to communicate with the browser, which might have crashed or is running on another machine

<brrian> Also, due to multiprocess and embedder navigation policy, WebKit can't synchronously tell if a navigation is valid or to be cancelled.

gsnedders: so there are cases where even committing the navigation can fail

jgraham: we're 50% through our time

cddl refactor

issue is https://github.com/w3c/webdriver-bidi/pull/50

jgraham: don't think there's loads to discuss

jgraham: point of the PR is to fill out more of the basic infrastructure in the spec

jgraham: one of the things it does is change how we use CDDL. initially had a single CDDL fragment per command

jgraham: in theory we could extract the schema for the spec, given the right tooling which doesn't exist

jgraham: it defines remote end and local end schemas for validating messages

jgraham: it's slightly awkward, using CDDL is a little bit of a compromise, but I think it's an improvement

cb: to confirm, we'll have all the definitions in the spec and need to build tooling that extracts it into a single CDDL file?

jgraham: yes, that's the idea
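
The tooling does not exist yet; the following is only a sketch of what such an extraction step could look like, using the Python standard library. It assumes the spec HTML marks each fragment as `<pre class="cddl">`, which is a guess at the markup and not something confirmed in the meeting.

```python
from html.parser import HTMLParser


class CddlExtractor(HTMLParser):
    """Collects the text of every <pre> element whose class list has "cddl"."""

    def __init__(self):
        super().__init__()
        self._in_cddl = False
        self._current = []
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "cddl" in classes:
            self._in_cddl = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_cddl:
            self.fragments.append("".join(self._current).strip())
            self._in_cddl = False

    def handle_data(self, data):
        if self._in_cddl:
            self._current.append(data)


def extract_cddl(spec_html):
    """Concatenate all CDDL fragments into the text of a single schema file."""
    parser = CddlExtractor()
    parser.feed(spec_html)
    parser.close()
    return "\n\n".join(parser.fragments) + "\n"


sample = (
    '<p>Intro</p><pre class="cddl">Command = {\n  id: uint,\n}</pre>'
    '<pre class="example">ignored</pre>'
    '<pre class="cddl">Event = {\n  method: text,\n}</pre>'
)
schema = extract_cddl(sample)
```

Producing separate remote end and local end schemas would need a way to tell the two kinds of fragment apart, for example an extra class name, which is again an assumption.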

basic support for subscribing to events

issue is https://github.com/w3c/webdriver-bidi/pull/51

jgraham: this is based on the previous PR

jgraham: adds the basic mechanism for subscribing to a specific event stream

jgraham: there are compromises which might be worth discussing

jgraham: at the moment it's in a session module, which I'm not attached to

jgraham: you can pass in module names and a list of browsing context IDs

jgraham: I didn't deal with realm-specific cases like service workers

jgraham: at the moment, you can enable events globally, like "tell me when a new browsing context is created"

jgraham: you can also apply it to a specific browsing context and its descendants, unless you unsubscribe on a given descendant

jgraham: if you subscribe to logging, you get it for frames as well unless you opt out. that was an arbitrary choice. I think Brandon had advocated for that, but people may have opinions
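
One way to read the subscription rules described above is "the nearest explicit setting on a browsing context or one of its ancestors wins, otherwise the global setting applies". The sketch below encodes that reading; the class, its method names, and the choice that an unsubscribed frame also silences its own descendants are assumptions, not text from the PR.

```python
class Subscriptions:
    """Resolves whether events from a module are enabled for a context."""

    def __init__(self, parents):
        # parents maps browsing context id -> parent id (None for top-level).
        self._parents = parents
        self._global = set()   # modules enabled for every context
        self._explicit = {}    # (module, context id) -> True or False

    def subscribe(self, module, contexts=None):
        if contexts is None:
            self._global.add(module)
        else:
            for context in contexts:
                self._explicit[(module, context)] = True

    def unsubscribe(self, module, contexts=None):
        if contexts is None:
            self._global.discard(module)
        else:
            for context in contexts:
                self._explicit[(module, context)] = False

    def is_enabled(self, module, context):
        # Walk up the ancestor chain; the nearest explicit setting wins.
        current = context
        while current is not None:
            setting = self._explicit.get((module, current))
            if setting is not None:
                return setting
            current = self._parents.get(current)
        return module in self._global


tree = {"top": None, "frame": "top", "nested": "frame", "other": None}
subs = Subscriptions(tree)
subs.subscribe("log", ["top"])
subs.unsubscribe("log", ["frame"])
```

With this setup, `top` receives log events, `frame` and `nested` do not, and the unrelated top-level context `other` is unaffected.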

foolip: is there a parameter in the command to avoid the descendants?

jgraham: not at the moment, would be more complicated, but possible

jgraham: I don't have a strong sense of what the implementation tradeoffs are. My feeling is that this is probably sensible from the pov of automation clients, like if you want to see all the logging for a window.

jgraham: that would all be part of the test

<mathiasbynens> seems sensible from an end-user perspective, but might be tricky with site isolation/OOPIFs

<mathiasbynens> (not pushing back per se, just saying it might be more challenging to implement)

<mathiasbynens> supporting descendants, that is ^

<gsnedders> yeah, OOPIFs seems like it makes it difficult, but only getting an event when a new descendant is created is painful because then you can miss early events from it

<drousso> ^ +1

<jgraham> foolip: Are non-browsing context targets part of the design?

<jgraham> jgraham: Not currently, but know it's needed

mathiasbynens: out-of-process is hard, but that's an implementation concern

mathiasbynens: you would like things to behave the same for in-process and out-of-process

jgraham: yes, that worries me

<jgraham> foolip: Would hope that it's possible to have the same model for in/out of process, but it could be hard to actually make it work. Need implementation experience

jgraham: another point is that for most test use cases you probably don't run into these problems

foolip: can you clarify?

jgraham: think it's common for tests to only care about a specific origin, that testing cross-origin is less common

<jgraham> foolip: Data point: we have a puppeteer bug with OOP iframes, so it is a thing that users run into

jgraham: if we do something different here, can we do something better than the client could do?

jgraham: is it better to punt it onto the client?

<jgraham> foolip: what we learnt from the bug is CDP pauses new targets until you have a chance to attach and register for events

<jgraham> foolip: Design is pretty weird, but could be a direction to explore

gsnedders: on processes. given that we can be subscribed to multiple top-level browsing contexts, we can already be subscribed to multiple processes

<jgraham> foolip: easy to have a bug where you miss events from OOP iframes but in-process iframes work

Resolving minor issues for basic transport

first issue, https://github.com/w3c/webdriver-bidi/issues/47

<jgraham> foolip: number of connections. No one wants to support multiple connections, but there are details about how to do that. If websocket closes, when is the state reset?

jgraham: since Simon isn't here I will channel him for a moment

jgraham: this only applies to "furthest remotes", not intermediary nodes and such, they'll support multiple connections at a time

jgraham: they could multiplex

jgraham: but we still need to specify that for browsers there can only be one connection

jgraham: scenario is if the WebDriver session is still open but the WebSocket connection was closed?

jgraham: then spec should say you're able to connect?

<jgraham> foolip: easy to write a bug where the client thinks the connection is dropped but the server doesn't yet

Shengfa: What jgraham was saying about keeping the session state, if you first connect and subscribe. If you reconnect, are you still subscribed or do you have a clean state?

Shengfa: If you want to keep the state, you have to record the events and send them back to the client.

Shengfa: On the other hand, if the WebDriver session closes, do we automatically close the WebSocket connection?

jgraham: We've talked about event buffering before and it's very difficult. My fairly strong preference is we don't spend a lot of time worrying about it now, difficult to get it right.

jgraham: I'm tempted to say that we don't buffer any events if you reconnect but you do retain the state

jgraham: I think there's been conversations about allowing starting a session without the HTTP handshake, which I think will be possible, but there's conceptually still a connection

jgraham: maybe in that case if the websocket connection drops the session ends, but we can think about that separately

jgraham: But assuming an explicit WebDriver session, closing that should close the WebSocket connection
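
A small sketch of the behavior jgraham leans towards: one connection at a time, subscriptions retained across a reconnect, events emitted while disconnected simply dropped, and ending the session closing the connection. The `Session` class and its methods are illustrative only, and a plain list stands in for the WebSocket.

```python
class Session:
    """Toy model of session state that outlives a WebSocket connection."""

    def __init__(self):
        self.subscriptions = set()
        self.connection = None   # a list stands in for the open socket
        self.ended = False

    def connect(self):
        if self.ended:
            raise RuntimeError("session has ended")
        if self.connection is not None:
            raise RuntimeError("only one connection is allowed")
        self.connection = []
        return self.connection

    def disconnect(self):
        # Subscriptions are deliberately left untouched.
        self.connection = None

    def emit(self, event):
        # No buffering: events are dropped while nobody is connected.
        if self.connection is not None and event in self.subscriptions:
            self.connection.append(event)

    def end(self):
        # Closing the WebDriver session also closes the connection.
        self.ended = True
        self.connection = None


session = Session()
first_socket = session.connect()
session.subscriptions.add("log.entryAdded")   # hypothetical event name
session.emit("log.entryAdded")
session.disconnect()
session.emit("log.entryAdded")                # dropped, not buffered
second_socket = session.connect()
session.emit("log.entryAdded")                # still subscribed after reconnect
```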

foolip: If you start a session and then do nothing for a long time, won't the connection drop?

jgraham: No, there is no connection, you've made an HTTP request and then there's no connection.

https://github.com/w3c/webdriver-bidi/issues/48

jgraham: is it minimum maximum size?

<jgraham> foolip: Minimum size for the maximum message size

<jgraham> foolip: Matters because clients could not interoperate on this

<jgraham> foolip: Could be useful to look at existing limits. CDP/Chromium WebSocket might have 256 MB limit

drousso: along those lines, in web inspector we have commands that will send back the response content, which is content that came over the web, and you'll never encounter a 256 MB JS file

<mathiasbynens> challenge accepted

<jgraham> foolip: concern is that limit shouldn't be too small

<drousso> mathiasbynens: plzno 🤣

drousso: these limits are likely limits that won't matter in practice

<jgraham> gsnedders: On mobile, limits might be more aggressive

gsnedders: on a mobile device you might want a much more aggressive limit

gsnedders: total amount of memory the device has sets a limit

drousso: generally speaking, the limit only comes into play if we've designed the commands and events badly

drousso: this will only come into play if we batch a whole lot of things together

jgraham: think I agree with not ratholing on this too much

jgraham: also think that things like print to PDF and screenshots can generate large messages

jgraham: think I've seen multi-megabyte messages at least

jgraham: a small number of commands that we have to base64-encode at the moment

jgraham: maybe have some way to split that data up

<jgraham> I think some CDP things have a way to send back a handle you can use to get the data?

<jgraham> There's at least design options there
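
To make the "handle" idea concrete, here is a hedged sketch in which the remote end keeps a large payload under a handle and the client reads it back in base64-encoded chunks, so no single message exceeds a size limit. The `BlobStore` class, the reply fields, and the chunk size are invented for illustration and do not describe CDP or any proposed BiDi command.

```python
import base64


class BlobStore:
    """Remote-end side: holds large payloads and serves them in chunks."""

    def __init__(self, chunk_size):
        self._chunk_size = chunk_size
        self._blobs = {}
        self._next = 0

    def put(self, data):
        """Store the payload and return a handle instead of the data."""
        self._next += 1
        handle = f"blob-{self._next}"
        self._blobs[handle] = data
        return handle

    def read(self, handle, offset):
        """Return one base64-encoded chunk and whether it was the last one."""
        data = self._blobs[handle]
        chunk = data[offset:offset + self._chunk_size]
        return {
            "data": base64.b64encode(chunk).decode("ascii"),
            "eof": offset + len(chunk) >= len(data),
        }


def fetch_all(store, handle):
    """Client side: read chunks until the remote end reports the end."""
    parts = []
    offset = 0
    while True:
        reply = store.read(handle, offset)
        chunk = base64.b64decode(reply["data"])
        parts.append(chunk)
        offset += len(chunk)
        if reply["eof"]:
            return b"".join(parts)


store = BlobStore(chunk_size=4)
handle = store.put(b"%PDF-1.7 fake document body")
document = fetch_all(store, handle)
```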

Zakim: generate notes

<jgraham> RRSAgent: make minutes v2

<jgraham> RRSAgent: make logs public

jgraham: thanks :)

<cb> thanks for scribing @foolip @jgraham

Minutes manually created (not a transcript), formatted by scribe.perl version 123 (Tue Sep 1 21:19:13 2020 UTC).

Diagnostics

Maybe present: gsnedders, Shengfa, Zakim