W3C

– DRAFT –
WebDriver BiDi

08 January 2025

Attendees

Present
benchatt5, Francois, jdescottes, jgraham, jimevans, jugglinmike, lauromoura, sasha, simons, whimboo
Regrets
-
Chair
tidoust
Scribe
tidoust

Meeting minutes

Request for a "navigationCommitted" event

<jgraham> RRSAgent: make minutes

henrik: Put on the agenda because that is a requirement from Playwright.
… Opposite of navigation started event that we currently have.
… This would indicate that a navigation takes place.

sadym: It does make sense for us. The only question is where to put it in the specifications. It seems Alex has an idea to put it in the HTML spec.

<whimboo> so it is actually the opposite of `navigation aborted` event

sadym: We do need it as well, we're going to implement.

jgraham: Question was when is the event emitted. It feels it needs someone who understands the invariant of when that event is emitted.
… Reflect on how easy or hard it is to implement.

sadym: It does make sense for us to emit that event when the document changes, that is, when we see that the navigation is "committed". After that event, the navigation can no longer fail. It can only be aborted.

jgraham: Is somebody from Google able to make a PR for this, then?

sadym: Yes.
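
The invariant sadym describes (once committed, a navigation can no longer fail, only be aborted) can be sketched as a small state machine. This is only an illustration: the state names, the `advance` helper, and the assumption that a committed navigation may also complete normally are hypothetical and not taken from any specification text.

```python
from enum import Enum


class NavState(Enum):
    STARTED = "started"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMPLETED = "completed"  # assumption: normal completion after commit


# Allowed transitions. Per the discussion, FAILED is reachable only
# before the navigation is committed; ABORTED is reachable before or after.
ALLOWED = {
    NavState.STARTED: {NavState.COMMITTED, NavState.FAILED, NavState.ABORTED},
    NavState.COMMITTED: {NavState.ABORTED, NavState.COMPLETED},
    NavState.FAILED: set(),
    NavState.ABORTED: set(),
    NavState.COMPLETED: set(),
}


def advance(current: NavState, new: NavState) -> NavState:
    """Return the new state, or raise ValueError if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.value} -> {new.value}")
    return new
```

With this model, `advance(NavState.COMMITTED, NavState.FAILED)` raises `ValueError`, while `advance(NavState.COMMITTED, NavState.ABORTED)` is accepted.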

<jgraham> github: w3c/webdriver-bidi#788

Support user context configuration

<jgraham> github: w3c/webdriver-bidi#775

whimboo: New feature. What is requested here is to support a configuration context which gets automatically applied to any context that gets opened to any command.
… This is for commands but I think the same is also requested for events.
… This is used by Playwright. Not sure about Puppeteer. Alex made a first step for events at least to be able to subscribe and unsubscribe automatically.
… We'd want that from our side as well.
… I have put together a full list of the options that Playwright currently supports. I'm not saying that we should support all of them right away. Step by step, we can fill in new options.

jgraham: You subscribe to events but instead of passing the list of contexts, you would pass a specific list of user contexts. Same as what we have for contexts (traversables).
… The way that works in CDP is I guess different but I think we'd want to avoid that.
… It's not the most efficient protocol but that's the way it naturally integrates with what we have.
… We may want to consider another approach later on. There shouldn't be something specific to user context here.
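
A subscription scoped to user contexts, as jgraham describes, might look like the following command payload. This is a sketch under assumptions: `session.subscribe` with `events` and `contexts` is the existing command, while the `userContexts` parameter name and the `subscribe_command` helper are hypothetical placeholders for whatever the eventual PR defines.

```python
import json
from itertools import count

_command_ids = count(1)


def subscribe_command(events, contexts=None, user_contexts=None):
    """Build a session.subscribe command as a plain dict.

    `userContexts` is an assumed parameter name mirroring `contexts`.
    """
    params = {"events": list(events)}
    if contexts:
        params["contexts"] = list(contexts)
    if user_contexts:
        params["userContexts"] = list(user_contexts)  # hypothetical field
    return {
        "id": next(_command_ids),
        "method": "session.subscribe",
        "params": params,
    }


# Example: subscribe for every tab opened in one user context.
command = subscribe_command(["browsingContext.load"], user_contexts=["default"])
message = json.dumps(command)  # what would be sent over the WebSocket
```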

sadym: I'm looking at the list of events used. I see some that we will not be able to set for user contexts because they are capabilities.

whimboo: I think we have a different issue for these, so we can ignore them for now.

whimboo: To follow up on James's comment, what I remember is that they want to create a user context, configure it automatically, and have the first tab that gets opened get the config.
… With the viewport, it would mean the page already needs to be rendered.

jgraham: You can create a user context without creating any tab in that user context. The very first tab that the browser creates on startup is a bit of a specific case, but you can then do what you need afterwards.
… From that point of view, I don't think there's a problem.

whimboo: About round trips: if we have 50 settings, sending 50 commands takes longer than sending them all at once.

jgraham: The round trips are not a major issue here.
… It's not like they have to be serialized, you can send them as fast as possible.
… It feels like a case where we could special case. Or we could not do that.
… For CDP, I think this stuff is already multi-commands.
… They block the user context, set the settings, then unblock. We would do slightly better. Not a perfect approach, but probably not that bad.
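
The point that the commands do not have to be serialized can be illustrated with a client that issues all of them before awaiting any reply. This is a minimal sketch: `send_command` is a stand-in that only simulates a round trip, and the `example.setOption…` method names are hypothetical, not real BiDi commands.

```python
import asyncio


async def send_command(method, params):
    """Stand-in for a real client call; simulates one network round trip."""
    await asyncio.sleep(0.01)
    return {"method": method, "result": {}}


async def configure(settings):
    """Send every setting command at once, then wait for all replies.

    Nothing here waits for one reply before sending the next command,
    so total latency is close to one round trip rather than len(settings).
    """
    return await asyncio.gather(
        *(send_command(method, params) for method, params in settings)
    )


# 50 hypothetical per-user-context settings.
settings = [
    (f"example.setOption{i}", {"userContext": "default"}) for i in range(50)
]
```

Running `asyncio.run(configure(settings))` returns the 50 replies in the order the commands were issued.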

<sadym> correct

jgraham: Once per user context rather than once per browsing context.

jgraham: First step is probably to land the events changes. Preloadscript is another one that we may want to do. And then look at other parameters, like viewport emulation and so on.

Support getting response body for network responses

<jgraham> github: w3c/webdriver-bidi#747

jgraham: Use case is that people want to be able to read the body of HTTP responses initially, and later to both read and write it.
… The difficulty is that these bodies may be rather large, so we can't send them fully over the wire.
… We talked before about getting something like streaming, with some method to pull more data until it's complete.
… The streaming approach seems necessary in at least some cases.
… The request interception is already in place, but this thing is still missing. My understanding is that this has become the remaining "blocker".
… Other thoughts on what the design should be here? Plans to work on this?
… We'll eventually end up doing it, but if anybody else is looking into it, that would be better.
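
The "pull more data until it's complete" idea mentioned above could take roughly the following shape on the client side. This is only a sketch of the general pattern: the `pull(offset)` callable, the `data`/`done` fields, and the fake source are all hypothetical and do not correspond to any proposed BiDi command.

```python
from typing import Callable, Dict


def read_body(pull: Callable[[int], Dict]) -> bytes:
    """Repeatedly pull chunks, starting at the current offset, until done.

    `pull(offset)` is assumed to return {"data": bytes, "done": bool}.
    """
    buffer = bytearray()
    while True:
        chunk = pull(len(buffer))
        buffer.extend(chunk["data"])
        if chunk["done"]:
            return bytes(buffer)


def make_fake_source(body: bytes, chunk_size: int) -> Callable[[int], Dict]:
    """Build an in-memory stand-in for a remote end serving a body in chunks."""
    def pull(offset: int) -> Dict:
        data = body[offset:offset + chunk_size]
        return {"data": data, "done": offset + len(data) >= len(body)}
    return pull
```

For example, `read_body(make_fake_source(b"hello world", 4))` reassembles the body from three chunks.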

jdescottes: Question for Maksym. Any API in Puppeteer that exposes response bodies as a stream?

sadym: I had the impression that responseBuffer would return chunks, but it probably returns the whole thing at once.

sadym: I'm not sure if Alex is going to work on it soon or not, so cannot give you an estimate.

<jgraham> Maybe https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-getResponseBodyForInterception allows getting the body as a string for intercepted requests

<sadym> https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-getResponseBody

jgraham: It does look like CDP has both full body and chunk streaming capabilities, through getResponseBody and getResponseBodyForInterception

sadym: The only way it works is no chunk, but streams and base64 encoded streams.

jgraham: That suggests some possibility that we could start by returning the whole body as a string, which would certainly help us ship something faster.
… It feels like a good first step, but also scary if the body is a 2GB piece of data.

sadym: Currently, our implementation only provides a full body, even though the API offers an apparent stream. It may give the user a false sense of safety: they feel they are getting a chunk of the data, whereas we sent the entire thing under the hood.
… It may be better to stick to strings initially.
… and thus the whole body.

jgraham: It seems there's some consensus to start with an API that gives you the full body.
… A possible complication compared with fetch is that chunks may need to be split across messages.

jimevans: Are we talking about a base64 encoded version of the body string? I worry about things such as people intercepting an image or any binary data.

jgraham: We need to support both. If it's a thing with a text content-type, we should return a string. If not, I'm not sure about the details but we need to support both use cases (base64 encoding and binary), and there's a pattern on how to do that.
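
The string-versus-base64 choice jgraham describes can be sketched as below. This is an illustration under assumptions: the `{"type": ..., "value": ...}` shape is modeled on the string/base64 pattern used for byte values elsewhere in the protocol, the `encode_body` helper is hypothetical, and the charset is assumed to be UTF-8 regardless of what the header says.

```python
import base64


def encode_body(content_type: str, body: bytes) -> dict:
    """Return a text body as a string and anything else as base64."""
    # Strip parameters such as "; charset=utf-8" from the header value.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime.startswith("text/"):
        try:
            # Assumption: text bodies are UTF-8.
            return {"type": "string", "value": body.decode("utf-8")}
        except UnicodeDecodeError:
            pass  # not valid UTF-8 after all; fall back to base64
    return {"type": "base64", "value": base64.b64encode(body).decode("ascii")}
```

An intercepted image therefore survives the round trip: decoding the `value` of `encode_body("image/png", data)` with `base64.b64decode` yields the original bytes.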

Minutes manually created (not a transcript), formatted by scribe.perl version 242 (Fri Dec 20 18:32:17 2024 UTC).

Diagnostics

Succeeded: s/viewports/contexts (traversables)/

Maybe present: henrik, sadym

All speakers: henrik, jdescottes, jgraham, jimevans, sadym, whimboo

Active on IRC: benchatt5, jdescottes, jgraham, jimevans, jugglinmike, lauromoura, sadym, sasha, simons, tidoust, whimboo