WebDriver – 14 January 2026

Meeting minutes

Proxy configuration

<jgraham> github: w3c/webdriver#1920

<jgraham> sadym: We want to enable scenarios that are supported by all the browsers: different proxies for different traffic, different proxies for different protocols e.g. http via SSL, some traffic to one proxy some to another. Have proposed some solutions there. Are the scenarios mentioned supported by all browsers? Do we want to enable it? What's the best way to do it?

jgraham: My concern is that it assumes that some schemes exist such as socks or socks5 that don't currently exist.
… But the principle seems fine to me.

<jgraham> sadym: The updated protocol allows passing either a string or a detailed JSON object, like for unhandledPromptBehavior

<jgraham> sadym: That would avoid defining a socks schema, so that should address the concern and be more extensible

<jgraham> jgraham: This addresses the concerns I had about coining new protocols or schemes. Historically the spec has been underdefined here, so if possible we should clean it up and have explicit outs for implementations that don't support specific configurations.

<jgraham> whimboo: As long as we don't have compatibility issues I'm fine with these changes. If some browsers don't support all configurations that's fine. The cross-protocol configuration sounds good.

<jgraham> gsnedders: We need to make sure there are outs for implementations that don't support configuring certain schemes
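
As a rough illustration of the string-or-object shape sadym describes, the sketch below normalizes a proxy setting that may arrive as either a plain string or a detailed object, and gives implementations an explicit "out" for schemes they cannot configure. The field names (`scheme`, `host`, `port`), the `SUPPORTED_SCHEMES` set and the error class are hypothetical and are not taken from the PR.

```python
from urllib.parse import urlsplit

# Hypothetical: the proxy schemes this implementation is able to configure.
SUPPORTED_SCHEMES = {"http", "https"}


class UnsupportedProxyConfiguration(Exception):
    """Hypothetical error used as the explicit 'out' for unsupported schemes."""


def normalize_proxy(value):
    """Accept either 'host:port' (string form) or a dict (detailed form)."""
    if isinstance(value, str):
        # String form: a plain host:port pair, assumed to use the http scheme.
        parts = urlsplit("//" + value)
        config = {"scheme": "http", "host": parts.hostname, "port": parts.port}
    elif isinstance(value, dict):
        # Detailed form: copy the known fields, defaulting the scheme.
        config = {
            "scheme": value.get("scheme", "http"),
            "host": value["host"],
            "port": value.get("port"),
        }
    else:
        raise TypeError("proxy must be a string or an object")
    if config["scheme"] not in SUPPORTED_SCHEMES:
        raise UnsupportedProxyConfiguration(config["scheme"])
    return config
```

With this shape, no new URL scheme has to be coined for socks: an implementation that adds support would only extend its own supported set.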

ACTION: item to review the PR

Autofill trigger

<jgraham> github: w3c/webdriver-bidi#706

<jgraham> Yoav: On the PR blaze left some comments that we could discuss here. Some comments are discussing going back to a two-phase model where we have a call that registers the data and one that tests using the data. I think there were security objections to that earlier. I'd like to get agreement on which one we should go forward with. If there's consensus on moving to a two-phase API I can move the API to that.

<jgraham> blaze: Most of my concerns at this point are a lack of testcases, so I can't check what to do. One vs two phase doesn't matter to me.

<jgraham> Yoav: If I created a body of tests using this, would that be useful to you?

<jgraham> blaze: Yes

<jgraham> Yoav: I can do that, we have some tests from the previous iteration, we have an implementation which we can use. How would you like those tests? A PR against wpt?

<jgraham> blaze: Yes, wpt is best

<jgraham> blaze: I can leave more comments on the test.

ACTION: Yoav to create some tests for this API

<sadym> Here are examples of using that command: https://github.com/GoogleChromeLabs/chromium-bidi/blob/1f4a9e9389597f99e0bde231b0506b11a03cc740/tests/autofill/test_autofill_trigger.py

<Yoav> WICG/autofill-event

<Yoav> https://wicg.github.io/autofill-event/

<jgraham> Yoav: One more thing that would be potentially useful is that we have a spec for a web-exposed API for autofill; that would be tested by this WebDriver API

<jgraham> sadym: The concern was about the expected effect of the autofill. We have an implementation and you can check that autofill is triggered and that the fields are filled in. That might address blaze's concern.

Media features emulation

<jgraham> github: w3c/webdriver-bidi#750

sasha: This could be similar to the mobile emulation topic
… this for doing dark/light mode etc. There has been some feedback that a number of people would like this and it would be good to discuss what the API would look like
… sadym I am not sure if you want to discuss your proposal

sadym: the proposal was to extend the parameters of what could be emulated and related features

sasha: one of the proposals was to extend how to do the viewport
… and a command for media queries
… and another command for media features and each command would have different parameters

sadym: the main concern with different commands is that if you mix emulations
… it could be quite long to set up the emulation on first run
… there could be issues with race conditions if we do multiple commands
… I think that it would be better to have one command that sets all the emulation features all at once

sasha: I think that for rendering that would be a good thing to make sure we don't re-render
… it might also be good to discuss how we do multiple commands at once
… I would also like to discuss the mobile modes. I think that having a number of commands is better as things could override how other specs are done. E.g. Print features don't need to work with mobile features and so on

jgraham: it is true that with the ability to batch commands we could benefit from commands that run atomically
… there are times where commands don't always merge in nicely and those would benefit from the batching

sadym: it would be great to do batching... should we invest time now on that
… if we decide on extending set viewport... do we want to specify what features we want to emulate or do we want to keep it open ended by passing in a map of features and you may get a not supported response?

<sadym> Proposed and alternative solutions: w3c/webdriver-bidi#750 (comment)

jdescottes: on the topic of command optimizer vs batcher... I am not sure that we have something that is going to be a strong argument
… on the question of which layer should group things, I am not sure that bidi is the correct place to describe what mobile emulation should mean
… I would prefer to keep separate commands

gsnedders: the thing I wanted to bring up, which is similar to proxy: there are some things that some implementations won't support, and we need to handle the case where an implementation doesn't support a given media feature

<Zakim> jgraham, you wanted to say "no" to it being open ended

jgraham: to sadym asking whether we should list everything or have it open ended... we need to spec this properly, and that means describing it all
… and the question with 1 command vs multiple is how do we deal with the case that an implementation can do 8/10... how do we describe the 2 things that can't be done back to the client
… and I agree with jdescottes that there is value in batching
… however, the problem with doing that for everything is that it can be very hard to optimise the reading in of the payload
… since it's a specific mechanism it might be easier to optimise this

sadym: so technically we can return not supported params in this command or other commands
… so it would have to return all the unsupported features
… and then it would be up to the client on how to proceed
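
A minimal sketch of the single-command idea sadym outlines here, assuming a remote end that applies every feature it supports in one call and returns the names it could not handle so the client can decide how to proceed. The function name, the response fields and the supported set are hypothetical; nothing like this exists in the protocol today.

```python
# Hypothetical: media features this remote end is able to emulate.
SUPPORTED_FEATURES = {"prefers-color-scheme", "prefers-reduced-motion"}


def set_emulated_media_features(features):
    """Apply all supported features in one call and report the rest."""
    applied = {}
    unsupported = []
    for name, value in features.items():
        if name in SUPPORTED_FEATURES:
            applied[name] = value
        else:
            unsupported.append(name)
    # Returning every unsupported name at once avoids one failing
    # round trip per feature.
    return {"applied": applied, "unsupported": sorted(unsupported)}
```

A client asking for three features would then learn in a single response which of them were skipped, instead of discovering them one error at a time.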

jgraham: there are semantics that we can define there... it feels a little like reinventing capabilities but across the protocol
… and it adds complexity
… you could, and I am not proposing this, do it like capabilities, where we would have mandatory/not mandatory in what people want

sadym: take the geolocation situation. If the remote end doesn't handle bearing... the client tries to emulate it, the remote end doesn't support it and errors, and the client could try again or deal with it accordingly. That will create a lot of traffic
… or we could collect them together. It's tradeoffs between protocol, speed, etc

jgraham: that's fair. We can discuss this async around discussing how this would return a standardised way of saying this key would be error

sadym: if I understand what you're saying is that it is similar to what I was saying with each command would have a standardised response

jgraham: if we have a bunch of commands that it has an extension to the error format about which parts were unsupported when returning to the user. Currently the protocol doesn't support that

sadym: there is consensus that it is wanted but we need to work on the commands/errors and how that is sent back and forth

sadym: to jimevans, what are your thoughts as a client author

jimevans: my general philosophy is that if we can make the protocol less chatty we should take advantage of that
… one of the drawbacks is that it tends to do things with a lot of round trips and if we can make that avoidable I would prefer that
… if we can do a single command it is a better design, so that could be command batching, which I am a proponent of, or a single command
… I think that command batching will be useful for multiple clients

orkon: I think the round trip argument doesn't matter for bidi since we can do things in an event-driven way

<jgraham> w3c/webdriver-bidi#447 is the command batching proposal

orkon: in puppeteer and cdp they have separate commands so I prefer that

jdescottes: one concern with separate commands is that we had a case with too many commands being sent at once that could lead to race conditions
… if people want to improve performance that could cause problems that batching would solve
… so if performance is wanted we should look at how batching can help

jgraham: I started thinking about the generic case rather than the specific case
… <not sure if worth minuting this :) >

sadym: from what I have understood, the question is whether we would have a single command per feature area or one command that takes all the features?

jgraham: I think the consensus is not there yet but we can discuss the minutiae in github

Supporting await in Execute Script

github: w3c/webdriver#1436

jugglinmike: this is an old issue that I wanted to bring up
… and I appreciate this isn't high priority work for this group

<gsnedders> The PR being w3c/webdriver#1431 I presume?

jugglinmike: and I have a non-trivial patch, and I would like the group to actually solve this rather than firing it into github and hoping it is eventually reviewed
… and I think whimboo has been responsive and he has been reviewing that
… and I think that when whimboo returns he could help me close the loop on this
… and I hope to get a PR in by the end of the week on this

jgraham: I think if you have a PR please send it in
… it's not a surprise that classic has holes and it's because it predates what was on the platform
… and it's not clear from wpt what should be done here
… but in principle we should definitely fix the spec there
… but implementations may not go fix it as it might not be a priority

gsnedders: there is a PR that I linked to in irc
… it is from 2019. it was trying to move the terminology of when to execute things and not sure what is different in this spec
… and I remember back then there was a lot of issue with how promises should work in this context

jugglinmike:

<jgraham> +1 to making the smaller change :)

jugglinmike: I think I want to probably close the original patch and get things better improved and I am more pragmatic about how it runs

gsnedders: bidi seems to handle this much better

jugglinmike: I think that's to jgraham point that the platform has improved a lot here and we're taking advantage of it

gsnedders: let's make sure bidi does the correct thing and then make classic do the right thing that is similar to bidi now

jgraham: one tangentially related point, we have strongly avoided references between classic and bidi. If we have a CR of bidi then it could make things look different

Timeout for browsingContext.locateNodes

github: w3c/webdriver-bidi#1055

jimevans: it has become clear to me as a client author that one thing users want is to call a command and not have it return until the thing they want is done
… for things like locating an element/node or waiting for an element to be interactable before returning the command
… to that end, for locating a node on a page, we could create a way in the protocol to optionally wait for something to be true
… I can see why the protocol might not want to do it, but then they have to put it into the client, which adds complexity to how things go back and forth
… and I have said earlier that fewer trips across the wire is better
… and if we move it to the protocol that would make things better for users
… I am not concerned with how we should structure the data sent across the wire
… my concern, as a client author, is that I want to make 1 call across the protocol rather than many
… I understand that it complicates the remote end of the protocol
… I understand that it complicates the remote end of the protocol

jgraham: I agree there is clearly a pattern in libraries that want to wait for a page to reach a certain point
… I believe that puppeteer/playwright do this via polling, which isn't problematic
… which makes it a low priority for now and we have avoided timeouts in the current spec
… I do take your point that if the only tool the client has is polling
… <discusses items in the github issues>
… as we have things that can be done via javascript commands
… if we aren't having timeouts elsewhere in the code, is locatenodes a special case?

sadym: from the user perspective, 95% of failures are down to not waiting properly
… and there are times where locate nodes is the way to do it in a SPA. As for client or browser I lean towards client
… all the solutions have tradeoffs but I think it should live in the client side
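
The client-side approach sadym leans towards can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `locate` is a hypothetical callable standing in for one `browsingContext.locateNodes` round trip, and the timeout and interval defaults are arbitrary.

```python
import time


def locate_with_timeout(locate, timeout=5.0, interval=0.1,
                        clock=time.monotonic, sleep=time.sleep):
    """Call locate() repeatedly until it returns nodes or the timeout expires."""
    deadline = clock() + timeout
    while True:
        nodes = locate()  # hypothetical: one protocol round trip per attempt
        if nodes:
            return nodes
        if clock() >= deadline:
            raise TimeoutError("no matching nodes before the timeout")
        sleep(interval)
```

Each retry costs a full round trip, which is the chattiness jimevans wants to avoid when the remote end is far from the client.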

jimevans: we have touched on some of my concerns. We do allow searching for nodes in ways that can't be done via JS. As a client library if you wanted to search shadow roots for a given child instead of the light dom, it becomes hard to deal with in JS if the shadow roots are closed
… however we allow piercing the shadow roots
… it is something that many users are confused on how to handle
… and they see it as a feature of non-selenium clients about piercing closed shadow roots. Playwright's element location is done via JS injection and can only do open shadow roots
… and I do believe locating nodes is unique since not all our strategies can be done with JS
… and in the interest of trying to reduce the traffic

orkon: We do polling via different mechanism. We also have mutation observers but they don't support shadow roots
… my opinion is that we should have it but I don't think that it's a magical way to solve this. We need to have a way to handle this properly

jgraham: if I am paging things in better, we don't have a way to handle shadow roots that are closed
… I think if we make locatenodes more powerful that we should have a timeout
… I can't remember the discussions that got us to the tradeoffs that we currently have
… I think we will either do something basic which wasn't as performant as puppeteer
… so one of the questions is do we have someone that is happy to describe the work around doing things like that
… I think the biggest value areas are locators that don't map to JS APIs

jimevans: I do remember how we got to the current implementation. The original proposal was more complicated and we simplified around not batching
… if we decided to apply this to things that can't easily be replicated, with cases that can't be handled by JS
… my worry is that I can't spec it but I know others in this group can
… I am hoping that others see the merit in what I am proposing and help spec it
… if the group wants to ignore it that's fine too

jdescottes: potentially a bad idea, but speccing polling strategies could be hard, whereas if we had generic polling it should be simple to do

sadym: jimevans do you have an estimate of the time gained by moving from polling to the server, what percentage of the time is that

jimevans: I don't have metrics but want to point out that localhost isn't the only use case here. Remote end could be somewhere in the cloud
… that could mean significant relative performance gains in the cloud case

sadym: my gut is saying the delay should be no more than the latency of the connection

jimevans: that's what I said: in absolute performance I am not sure of the gains, but in relative performance there would be gains by my gut feel

jgraham: regarding jdescottes about a generic retry command. There is an appeal to having this
… it's harder to handle the generic case. I think we need to look through the spec to see how we can benefit things here
… we can see a complicated client, like puppeteer could have their own retries while a simple client could rely on these retries

orkon: I wanted to mention that if a person has a latency issue
… if people are awaiting 100ms. If we don't do the complicated polling, then the simple polling can be done on the client
… it might solve chattiness of the protocol but not improve the performance

<sadym> * my gut says if the client polls and there is a need in retry, it means there is a delay of the node to appear. And if so, it should not be that different from if the client was very close to the server.

jdescottes: I will file an issue around creating a generic polling. Is this for performance or easier to use?

jimevans: this is around using a different client to selenium and it made my client code harder to create as a generic new client that finds it hard to create

jgraham: there is some appetite to look at this in the future and do a little more review for the future. Let's keep the issue open for now

Multiple click events (e.g. double/triple click)

<jgraham> github: w3c/webdriver#1772

Sasha: The point which wasn't addressed yet is that some clients will still want to generate double clicks or triple clicks with different action chains, maybe because they need to run some script in between.

jgraham: The way it works at the moment is that it's left at the implementation to define what double-click or triple-click means.
… Gecko more or less uses a timer and if the events happen within that timer, they get counted as double clicks and so on.
… Relying on internal timers is a bit difficult because it makes things non-deterministic.
… If you split across action chains, latency comes into play as well.
… It has also seemed logical to me that things needed to be within the same action chain to be counted together.
… However, some use cases involve starting a click, running an event handler, and then continuing with the rest of the click.
… One option is to have a specific clickCount property.
… We could have some sort of sequence identifier to relate actions together.
… to increment clickCount when actions are part of the same sequence.
… But then the question is whether the timer still runs in that case. E.g., if the same sequence is used 30s apart?

orkon: I think if we're doing double click based on the action chain, that's fine. After an action chain is dispatched, you released all the actions, says the proposal.
… If the event chain causes things to be released, would scenarios that involve checking on mouseup following mousedown still work?

jgraham: I don't think it makes sense to release all the events at the end of the action chain.
… I think they should just not be able to merge as part of a single click if they're not part of the same action chain.
… Something may come from how Playwright handles it.

<Zakim> gsnedders, you wanted to ask about devices without mouse input

gsnedders: Wondering about what it means for devices without mouse input types
… Maybe the answer is simple.

sasha: I just wanted to address Alex comment about resetting the chain. What Hendrik meant was for the double-click only. Related to our implementation. By the end of the action, we would not expect a double-click coming. That would not be for all events.
… For Playwright, for them they want to generate a double click with two clicks using two different actions. They use clickCount in CDP for that. They don't necessarily do anything in between these two chains.
… Sequences are more flexible. They open the question on how to validate that they still make sense.

orkon: Making it possible to double click with two clicks is probably not a good idea, we used to have it in Puppeteer. Splitting things across chains is also not super consistent.
… You can do one mousedown in one chain with script evaluation. And then mouseup and click in another action chain.
… It's not timer based in that case.
… There's some inconsistency although we can perhaps live with it.

jgraham: For devices without mouse input, unsupported operation seems a good answer. Double tap is still doable there.
… Sequence ids are more flexible, but there is still the question of the timer.
… Timer seems not ideal but it wouldn't be ideal to drop it as well. If there's no timer, should there be something to cancel the sequence?
… Should a set of pointer actions cancel the sequence?

sasha: I wanted to comment on what Alex was saying on consistency
… The whole issue started with trying to understand how it works for different browsers and specify something.
… Nothing specifies it in the current spec.
… The main point is to specify something that is consistent across browsers that would allow clients to simulate double or triple clicks consistently.
… If implementations are different but the specification still allows that, that seems fine.

shs: It seems like the only way to do that reliably is to have all actions in the same action chain. Otherwise, not only will you hit timer issues, but also you may end up with timing issues with the underlying system.
… These cannot be spec-ced out cleanly.
… It may be that we want to be able to add as part of a single action chain. We mentioned scroll into view.
… Related to operations that we may want to do as part of an action chain.

jgraham: The simplest approach is to say that by default multiple click events that happen as part of the same actions chain increase the click count unless something causes delay as part of the actions.
… That's the least that I know how to specify. It covers the simple cases, and then we can work increasingly from there.
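
The "simplest approach" jgraham describes could look roughly like the sketch below: within a single action chain, each `pointerDown` increases the click count, and an explicit delay resets it. Treating any `pause` with a non-zero duration as the delay is an assumption made for illustration, as is the helper name; the spec currently defines none of this.

```python
def click_counts(actions):
    """Return the click count assigned to each pointerDown in one chain."""
    counts = []
    current = 0
    for action in actions:
        if action["type"] == "pointerDown":
            # Consecutive presses in the same chain count as a multi-click.
            current += 1
            counts.append(current)
        elif action["type"] == "pause" and action.get("duration", 0) > 0:
            # Assumption: an explicit delay ends the multi-click run.
            current = 0
    return counts
```

Under this rule a chain of down, up, down, up yields counts 1 and 2 (a double click), while inserting a 500 ms pause between the two presses yields 1 and 1.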
… My concern is incompatible changes.
… For the time being, people may rely on timers to trigger double or triple clicks, and that would break.
… But then the alternative would be to say that the timer is implementation-defined, which does not seem a good thing either.

sasha: If we would find some kind of different solutions to simulate double clicks consistently, implementations could converge to them over time and we would have a transition path.

jgraham: It would be possible to opt-in in a couple of ways.
… You could add a property on specific events. But I guess that we would want things to work like this in general.
… Opting in for deterministic behavior seems doable in any case.

<jgraham> You could also opt in on a performActions call

orkon: Just to confirm that we still want mousedown and mouseup as part of different action chains to trigger a single click.
… I imagine scenarios where you have mousedown in an actions chain, then mouseup, mousedown, mouseup in the next one.
… Should that trigger a double click or not?

jgraham: I guess click isn't on a timer at the moment. At least, I don't think Gecko has a timer.
… If there is no timer, it seems easier to say that it can persist across action chains.
… But I actually do not know.

sasha: To summarize, that's how I imagined it, that we would keep the current behavior but would add a more deterministic way to also do double and triple clicks.

jgraham: I'm not immediately seeing what part of the DOM API to attach timestamps to.
… Is long press reflected in the DOM API somehow? Do you have to implement it by hand?

orkon: I think it's a timer you can measure by yourself.

jgraham: That seems ok. We don't have to care if we don't have to measure time by ourselves.
… If you try and do mousedown, mouseup, mousedown, mouseup split in different action chains, it probably doesn't trigger a double click.

gsnedders: There were some tests in the Interop Pointer Events focus area along those lines.

jgraham: We should look at that.
… Sometimes, I've been wondering whether they were doing silly things or whether we were misusing the TestDriver API.

Screencasting

github: w3c/webdriver-bidi#636

sasha: We wanted to start discussing screencasting following a few requests received on that.
… Similar API in CDP. It would make sense to add that to BiDi as well.
… Obvious way to work with it would be with Streams, but since we don't have Streams in BiDi, our suggestion was to start with file on the disk.
… CDP works with streams so perhaps not interested in an intermediary solution though.

AutomatedTester: Are we straight away saying that we wouldn't support certain mobile devices where it might be harder to access the filesystem?

jgraham: I think the answer is yes to some extent. An implementation could do something more complicated. Something like Gecko driver to control a mobile device, the host could imagine storing the file itself.
… The implementation complexity is probably the same as having the streaming API and having the client do the streaming to the local disk.
… It seems important for users to save to local disk. That's the API that Playwright is offering. That's a user need.
… The streaming API will then allow us to improve on this over time.

AutomatedTester: A fair counter argument.

orkon: Saving to disk is fine in Playwright cases. Other cases where you want to look into realtime, like dev tools.
… It would be nice to support both, with a streaming API.
… But it's not easy to have streaming. Streaming a video is different for example.
… It would be preferable to start with a streaming API in some way unless we think it's not feasible.

AutomatedTester: The argument for streaming real time is one that will be used quite heavily by cloud providers.
… The way we do it right now is by capturing the screen in real time.
… There's a bit of latency there, but that's fine.
… We're talking about millions of tests run this way.
… If we can go to the streaming API directly, that would be better.
… I appreciate it means different things depending on context.

gsnedders: There's both the use case that David was just talking about. Also iOS is very limited in terms of file system access. I wouldn't be opposed to having a capability that allows you to work with the file system directly, but obviously not all servers can support that.
… When it comes to the streaming side, it gets to video streaming protocols with all sorts of complication, and that kind of scares me.

sadym: Streaming and saving to file are not that different. Would it be easier for browsers to implement saving to file rather than screencast? What is the motivation for saving to file?

jgraham: The motivation is that we currently don't have a streaming API. The DOM has media capture primitives and writing to the disk is relatively easy.
… And that makes adding the feature easier in one go. That already covers a good range of cases that we know about.
… Not the end of it, but useful for a number of users.
… If we don't want to do that, we should prioritize getting the streaming bits running.
… I'm hoping that it can build on top of a simple byte stream polling interface. From a client perspective, if you want to have a video in the end, having to do that from bits seems hard to do.

<gsnedders> Stream is high overhead, esp. with base64 encoding, and potentially could easily eat up a lot of the available bandwidth.
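
To put a number on the base64 overhead gsnedders mentions: base64 turns every 3 bytes of binary data into 4 text characters, so a frame grows by about a third before any JSON framing is added. The 3000-byte payload below is an arbitrary example size.

```python
import base64

raw = bytes(3000)                # 3000 bytes of binary frame data
encoded = base64.b64encode(raw)  # text-safe form for a JSON payload

# 4000 bytes: every 3 input bytes become 4 output characters.
encoded_size = len(encoded)
overhead = encoded_size / len(raw)
```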

jgraham: Working with files is much simpler for everyone, it appears.

orkon: Should we explore streaming over HTTP? Or do you think it adds more complexity?

jgraham: I think it adds more complexity, but I haven't thought about it enough to have a more informed opinion.
… To fit things in JSON objects, we have a bunch of things to do with that add complexity. Moving away from JSON might be needed.
… I'm very keen to not having support based on screencasting because that still seems years away. High browser complexity for browsers and clients.

orkon: We have a stream of screenshots with a low FPS in CDP.

AutomatedTester: Where is the consensus here?

sadym: No access to the file system is not a blocker, so we can proceed with that.

jgraham: If we do this, being able to say UnsupportedOperation for browsers who cannot support this seems a legitimate thing to do.

Stream IO (e.g. for network bodies)

github: w3c/webdriver-bidi#959

jgraham: Natural progression from the previous topic!
… Use cases include screencast, network response bodies.
… From this one, we may want to read and write to streams to do real-time replacement of the body for the latter use case.
… Polling access to the read streams. Base64 encoding because it's text.
… Writing streams is kind of the opposite.
… My hope here is that we can just generically connect this to the Stream API in the web platform.
… Fetch has these objects internally but does not expose them, but that should still work.
… You would end up with an IO handle, and read or write from that stream.
… Do we have concerns that it tries to be too ambitious?
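
A minimal sketch of the polling read jgraham describes, assuming a hypothetical `read_chunk(handle)` call that performs one protocol round trip and returns a base64-encoded chunk plus an end-of-stream flag. The response fields `data` and `done` are illustrative names, not part of the proposal.

```python
import base64


def read_all(read_chunk, handle):
    """Poll a read stream until it reports end-of-stream; return the bytes."""
    data = bytearray()
    while True:
        chunk = read_chunk(handle)  # hypothetical: one round trip per chunk
        # Chunks travel as base64 text because the transport is JSON.
        data += base64.b64decode(chunk["data"])
        if chunk["done"]:
            return bytes(data)
```

A client that wants real-time processing would handle each decoded chunk as it arrives instead of accumulating the whole body.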

sadym: What is the delta between the proposed branch and a proper DOM Streams?

jgraham: The branch is basically that it would map to DOM Streams. Just work in progress.

sadym: Do we have other ways to transfer a bunch of data through a WebSocket?

orkon: I think the answer is yes for the whole message, but not for parts of the message.
… There's a binaryType but I think it just gives you a way to have Blob interface or a Buffer interface, not sure.

sadym: In general, I believe it makes sense to provide that API. It would be great to have.

orkon: Just want to mention that in CDP we only have readable streams. Not writable streams. They're nice, probably lower priority.
… Another use case is PDF generation that can be large and exceed WebSocket limits.

jgraham: About WebSocket, binaryType does not help because of the use of JSON, which needs to be UTF8 encoded.
… Or you need to spin up a specific WebSocket as part of your response.

AutomatedTester: It would be quite useful for cloud providers to do it that way.

orkon: I was wondering about the cloud provider use case. In containers, there's usually a way to look into a container. I wonder if that could be a way to do quality streaming directly from the browser.

AutomatedTester: It can be, it becomes cumbersome depending on the OS.
… Desktop and then mobile where things can be almost impossible.
… 99.9% of our users are doing simple things. We wouldn't be automating someone streaming a video game for example.
… If people want to check that a video plays, they really just need a few frames to check that playback runs properly.

AutomatedTester: I think the consensus is that we want this. Writable streams are lower priority.

AutomatedTester: Tomorrow, we'll be kicking things off again at 16:00 UTC. Note we have a few topics scheduled for after 18:00 UTC.
