Meeting minutes
<jgraham> Zakim: this is webdriver-bidi
<mathiasbynens> The meeting has not started
<jgraham> Yeah, so I think the problem is that with AutomatedTester away no one can start the zoom call
<jgraham> So unless something changes, I think we either need to defer e.g. until next week or pick a different videoconf
<mathiasbynens> I could create an impromptu Google Meet URL?
<gsnedders> that works?
<jgraham> https://mozilla.zoom.us/my/jgraham ?
<mathiasbynens> let's go with ^
<jgraham> For anyone still waiting to join the meeting, we changed conference rooms since AutomatedTester is away
jgraham: agenda is at https://www.w3.org/wiki/WebDriver/2020-09-BiDi#Agenda
jgraham: first agenda item is follow-up from last meeting. logging module and more?
<jgraham> foolip: does discussion need to continue?
foolip: https://github.com/w3c/webdriver-bidi/issues/45 was filed
jgraham: we can have the discussion on GitHub, but it's a bit early to specify an actual feature yet
foolip: let's do https://github.com/w3c/webdriver-bidi/issues/43 first
specifying navigating a browsing context
<jgraham> foolip: Once we have basic protocol, how do we navigate? Inspiration from CDP and WebDriver. These are very different. WebDriver has page load strategies that can be set to define the behavior, and it blocks.
<jgraham> foolip: Navigate and then return.
<jgraham> foolip: CDP model is request navigate and then events happen and client has to wait
<jgraham> foolip: Opinions? Variations are possible e.g. progress updates, which aren't quite like CDP
jgraham: I think the point of this is to do something more like CDP in these cases, that the protocol should be lower level and not have blocking commands
jgraham: I'd be against adding a long-running blocking command for navigation, or anything
jgraham: not sure what the progress update model was, but my assumption is you send a navigate command and then start getting events for various points in the lifecycle
jgraham: I think CDP has more events for network stuff, maybe it all falls out of performance timing
cb: +1 to that
cb: the reason we're looking at CDP is that it allows the browser to be introspected
cb: which makes it really powerful. people will use a framework that abstracts complex logic like listening to certain events
cb: someone with no understanding of the protocol would be able to use it through a framework
<jgraham> foolip: Means that we have to define lots of events to make navigation work, but there's some chance to start with just e.g. DOMContentLoaded
brrian: beyond sending a response once load is done/canceled, I don't think anything else needs to go in
brrian: that said, it can take a very long time for a navigation to be committed
brrian: don't see a big difference between the time it will take
brrian: what we've done in the web inspector protocol is async commands, where responses can come out of order
<jgraham> foolip: is navigation committing a thing we want to block on?
brrian: I see no reason to return early if you can't do anything with it
<jgraham> foolip: How does this work in web inspector
brrian: there's no timeout for the navigation
brrian: not sure how to specify it, it can take a long time to navigate
<drousso> +1 to that ^
drousso: in web inspector there's no navigate command, that's triggered by JS
drousso: but while you're navigating, what is there for you to do in the page?
jgraham: one thing that might be different from the underlying inspector protocol is that we're multiplexing a bunch of browsing contexts for a single connection
jgraham: so maybe you should be able to navigate one while you do other work in another
jgraham: way I thought this would work is you navigate, and the response tells you only "yes that was a well-formed command, I can try to navigate based on that"
jgraham: and later, you get some event, not with the command ID, but with the browsing context ID, say
jgraham: that does have implications for client authors
jgraham: there could later be an error, if the navigation couldn't actually occur
jgraham: but in other ways it seems similar to the async command model
jgraham: the difference I think is whether you can send no response at first and later send a response with the command ID
jgraham: or if there should always be a response saying "that packet was valid" and later can't send another response for that
drousso: that's what I was describing as an async command
drousso: but rather than sending an empty response, we can tie them together
<jgraham> foolip: Still some choices to make here. Can take it to the issue
gsnedders: want to point out a bunch of potential error cases, where the client fails to communicate with the browser, which might have crashed or is running on another machine
<brrian> Also, due to multiprocess and embedder navigation policy, WebKit can't synchronously tell if a navigation is valid or to be cancelled.
gsnedders: so there are cases where even committing the navigation can fail
jgraham: we're 50% through our time
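A minimal Python sketch of the two response models discussed for navigation. The message shapes, the field names (id, method, params, context) and the command and event names (navigate, navigationCompleted) are hypothetical illustrations, not anything the group agreed on.

```python
import json


def ack_then_event(command):
    """Model A: acknowledge immediately, report the outcome later as an event.

    The acknowledgement carries the command id. The later event is keyed
    only by the browsing context id, so the client has to correlate them.
    """
    ack = {"id": command["id"], "result": {}}
    event = {
        "method": "navigationCompleted",  # hypothetical event name
        "params": {
            "context": command["params"]["context"],
            "url": command["params"]["url"],
        },
    }
    return [ack, event]


def async_command(command):
    """Model B: send nothing at first, then one response with the command id."""
    response = {"id": command["id"], "result": {"url": command["params"]["url"]}}
    return [response]


navigate = {
    "id": 7,
    "method": "navigate",  # hypothetical command name
    "params": {"context": "ctx-1", "url": "https://example.com/"},
}

for message in ack_then_event(navigate) + async_command(navigate):
    print(json.dumps(message))
```

In model A a later failure would have to arrive as an error event, since the command id has already been answered. In model B the single late response can carry either the result or the error.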
cddl refactor
issue is https://github.com/w3c/webdriver-bidi/pull/50
jgraham: don't think there's loads to discuss
jgraham: point of the PR is to fill out more of the basic infrastructure in the spec
jgraham: one of the things it does is change how we use CDDL. initially had a single CDDL fragment per command
jgraham: in theory we could extract the schema from the spec, given the right tooling which doesn't exist
jgraham: it defines remote end and local end schemas for validating messages
jgraham: it's slightly awkward, using CDDL is a little bit of a compromise, but I think it's an improvement
cb: to confirm, we'll have all the definitions in the spec and need to build tooling that extracts it into a single CDDL file?
jgraham: yes, that's the idea
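A rough Python sketch of the kind of extraction tooling cb asks about. It assumes the spec is HTML and that each CDDL fragment sits in a pre element whose class list includes "cddl". Both are assumptions made for illustration, and the sample definitions are invented.

```python
from html.parser import HTMLParser


class CddlExtractor(HTMLParser):
    """Collect the text of every <pre> element whose class list includes "cddl"."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._buffer = None  # text pieces collected while inside a matching <pre>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "cddl" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._buffer is not None:
            self.fragments.append("".join(self._buffer).strip())
            self._buffer = None


def extract_cddl(spec_html):
    """Return all CDDL fragments in document order, separated by blank lines."""
    parser = CddlExtractor()
    parser.feed(spec_html)
    parser.close()
    return "\n\n".join(parser.fragments)


# Invented sample input; the real spec markup may differ.
SPEC = """
<p>Some prose.</p>
<pre class="cddl">Command = { id: uint, method: text }</pre>
<pre class="example">not CDDL</pre>
<pre class="cddl">Event = { method: text, params: any }</pre>
"""

print(extract_cddl(SPEC))
```

Separate remote end and local end schemas could be produced the same way by filtering on an additional class name.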
basic support for subscribing to events
issue is https://github.com/w3c/webdriver-bidi/pull/51
jgraham: this is based on the previous PR
jgraham: adds the basic mechanism for subscribing to a specific event stream
jgraham: there are compromises which might be worth discussing
jgraham: at the moment it's in a session module, which I'm not attached to
jgraham: you can pass in module names and a list of browsing context IDs
jgraham: I didn't deal with realm-specific cases like service workers
jgraham: at the moment, you can enable events globally, like "tell me when a new browsing context is created"
jgraham: you can also apply it to a specific browsing context and its descendants, unless you unsubscribe on a given descendant
jgraham: if you subscribe to logging, you get it for frames as well unless you opt out. that was an arbitrary choice. I think Brandon had advocated for that, but people may have opinions
foolip: is there a parameter in the command to avoid the descendants?
jgraham: not at the moment, would be more complicated, but possible
jgraham: I don't have a strong sense of what the implementation tradeoffs are. My feeling is that this is probably sensible from the pov of automation clients, like if you want to see all the logging for a window.
jgraham: that would all be part of the test
<mathiasbynens> seems sensible from an end-user perspective, but might be tricky with site isolation/OOPIFs
<mathiasbynens> (not pushing back per se, just saying it might be more challenging to implement)
<mathiasbynens> supporting descendants, that is ^
<gsnedders> yeah, OOPIFs seems like it makes it difficult, but only getting an event when a new descendant is created is painful because then you can miss early events from it
<drousso> ^ +1
<jgraham> foolip: Are non-browsing context targets part of the design?
<jgraham> jgraham: Not currently, but know it's needed
mathiasbynens: out-of-process is hard, but that's an implementation concern
mathiasbynens: you would like things to behave the same for in-process and out-of-process
jgraham: yes, that worries me
<jgraham> foolip: Would hope that it's possible to have the same model for in/out of process, but it could be hard to actually make it work. Need implementation experience
jgraham: another point is that for most test use cases you probably don't run into these problems
foolip: can you clarify?
jgraham: think it's common for tests to only care about a specific origin, that testing cross-origin is less common
<jgraham> foolip: Data point: we have a puppeteer bug with OOP iframes, so it is a thing that users run into
jgraham: if we do something different here, can we do something better than the client could do?
jgraham: is it better to punt it onto the client?
<jgraham> foolip: what we learnt from the bug is CDP pauses new targets until you have a chance to attach and register for events
<jgraham> foolip: Design is pretty weird, but could be a direction to explore
gsnedders: on processes. given that we can be subscribed to multiple top-level browsing contexts, we can already be subscribed to multiple processes
<jgraham> foolip: easy to have a bug where you miss events from OOP iframes but in-process iframes work
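A minimal Python sketch of the subscription rules jgraham describes for this PR: events can be enabled globally or for a browsing context and its descendants, unless a descendant opts out. The class, the "nearest explicit setting wins" rule and the event name log.entryAdded are assumptions for illustration, not the text of the PR.

```python
class Subscriptions:
    def __init__(self, parents):
        # Maps each browsing context id to its parent id (None for top-level).
        self.parents = parents
        self.global_events = set()
        self.enabled = set()   # (event, context) pairs subscribed explicitly
        self.disabled = set()  # (event, context) pairs unsubscribed explicitly

    def subscribe(self, event, context=None):
        if context is None:
            self.global_events.add(event)
        else:
            self.disabled.discard((event, context))
            self.enabled.add((event, context))

    def unsubscribe(self, event, context):
        self.enabled.discard((event, context))
        self.disabled.add((event, context))

    def is_enabled(self, event, context):
        # Walk from the context up through its ancestors. The nearest
        # explicit setting wins; otherwise fall back to global subscriptions.
        current = context
        while current is not None:
            if (event, current) in self.disabled:
                return False
            if (event, current) in self.enabled:
                return True
            current = self.parents.get(current)
        return event in self.global_events


parents = {"top": None, "frame": "top", "nested": "frame", "other": None}
subs = Subscriptions(parents)
subs.subscribe("log.entryAdded", "top")      # hypothetical event name
subs.unsubscribe("log.entryAdded", "nested")

for ctx in parents:
    print(ctx, subs.is_enabled("log.entryAdded", ctx))
```

Because the check runs when an event is emitted, a newly created descendant is covered from its first event. That is the property gsnedders asks for, and the sketch does not address how it would hold across processes.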
Resolving minor issues for basic transport
first issue, https://github.com/w3c/webdriver-bidi/issues/47
<jgraham> foolip: number of connections. No one wants to support multiple connections, but there are details about how to do that. If websocket closes, when is the state reset?
jgraham: since Simon isn't here I will channel him for a moment
jgraham: this only applies to "furthest remotes", not intermediary nodes and such, they'll support multiple connections at a time
jgraham: they could multiplex
jgraham: but we still need to specify that for browsers there can only be one connection
jgraham: scenario is if the WebDriver session is still open but the WebSocket connection was closed?
jgraham: then spec should say you're able to connect?
<jgraham> foolip: easy to write a bug where the client thinks the connection is dropped but the server doesn't yet
Shengfa: What jgraham was saying about keeping the session state, if you first connect and subscribe. If you reconnect, are you still subscribed or do you have a clean state?
Shengfa: If you want to keep the state, you have to record the events and send them back to the client.
Shengfa: On the other hand, if the WebDriver session closes, do we automatically close the WebSocket connection?
jgraham: We've talked about event buffering before and it's very difficult. My fairly strong preference is we don't spend a lot of time worrying about it now, difficult to get it right.
jgraham: I'm tempted to say that we don't buffer any events if you reconnect but you do retain the state
jgraham: I think there's been conversations about allowing starting a session without the HTTP handshake, which I think will be possible, but there's conceptually still a connection
jgraham: maybe in that case if the websocket connection drops the session ends, but we can think about that separately
jgraham: But assuming an explicit WebDriver session, closing that should close the WebSocket connection
foolip: If you start a session and then do nothing for a long time, won't the connection drop?
jgraham: No, there is no connection, you've made an HTTP request and then there's no connection.
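A small Python sketch of the behaviour jgraham leans towards above: a single connection per session, subscription state retained across reconnects, and no buffering of events while disconnected. The Session class is hypothetical, and a plain list stands in for the WebSocket.

```python
class Session:
    def __init__(self):
        self.subscriptions = set()
        self.connection = None  # a list standing in for an open WebSocket

    def connect(self):
        if self.connection is not None:
            raise RuntimeError("only one connection at a time")
        self.connection = []
        return self.connection

    def disconnect(self):
        # Subscriptions are deliberately left untouched.
        self.connection = None

    def emit(self, event):
        # Events are dropped, not buffered, while no connection is open.
        if self.connection is not None and event in self.subscriptions:
            self.connection.append(event)


session = Session()
first = session.connect()
session.subscriptions.add("log")
session.emit("log")       # delivered on the first connection
session.disconnect()
session.emit("log")       # dropped: nothing is connected
second = session.connect()
session.emit("log")       # delivered: the subscription survived the reconnect

print(len(first), len(second))
```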
https://github.com/w3c/webdriver-bidi/issues/48
jgraham: is it minimum maximum size?
<jgraham> foolip: Minimum size for the maximum message size
<jgraham> foolip: Matters because clients could not interoperate on this
<jgraham> foolip: Could be useful to look at existing limits. CDP/Chromium WebSocket might have 256 MB limit
drousso: along those lines, in web inspector we have commands that will send back the response content, which is content that came over the web, and you'll never encounter a 256 MB JS file
<mathiasbynens> challenge accepted
<jgraham> foolip: concern is that limit shouldn't be too small
<drousso> mathiasbynens: plzno 🤣
drousso: these limits are likely limits that won't matter in practice
<jgraham> gsnedders: On mobile, limits might be more aggressive
gsnedders: on a mobile device you might want a much more aggressive limit
gsnedders: total amount of memory the device has sets a limit
drousso: generally speaking, the limit only comes into play if we've designed the commands and events badly
drousso: this will only come into play if we batch a whole lot of things together
jgraham: think I agree with not ratholing on this too much
jgraham: also think that things like print to PDF and screenshots can generate large messages
jgraham: think I've seen multi-megabyte messages at least
jgraham: a small number of commands that we have to base64-encode at the moment
jgraham: maybe have some way to split that data up
<jgraham> I think some CDP things have a way to send back a handle you can use to get the data?
<jgraham> There are at least design options there
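One possible shape for "some way to split that data up", sketched in Python: base64-encode a large payload once and cut the encoded text into pieces that each fit under a message size limit. The function names and the 500-character limit are invented for illustration, and this is only one of the design options mentioned.

```python
import base64


def split_payload(data, max_chunk_chars):
    """Base64-encode data and split the text into chunks of bounded length."""
    encoded = base64.b64encode(data).decode("ascii")
    return [
        encoded[i:i + max_chunk_chars]
        for i in range(0, len(encoded), max_chunk_chars)
    ]


def join_payload(chunks):
    """Reassemble the original bytes from chunks received in order."""
    return base64.b64decode("".join(chunks))


screenshot = bytes(range(256)) * 4  # 1024 bytes standing in for image data
chunks = split_payload(screenshot, 500)

print(len(chunks), [len(c) for c in chunks])
```

The handle-based alternative jgraham mentions would instead return a small identifier and let the client fetch the data with follow-up commands.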
Zakim: generate notes
<jgraham> RRSAgent: make minutes v2
<jgraham> RRSAgent: make logs public
jgraham: thanks :)
<cb> thanks for scribing @foolip @jgraham