WebDriver WG @ TPAC Day 1 – 26 October 2020

Meeting minutes



State of the Union


AutomatedTester: The WebDriver specification is mostly in maintenance mode, and the wpt tests have been improving. Thanks to all vendors for their work in this area

<cb> David, should we record this session?

jgraham: Since TPAC last year, the WebDriver BiDi specification has been created. It is creating APIs that are not readily available in WebDriver. The new APIs are the ones people want, informed by other work built on proprietary APIs
… We have seen clients, like Selenium, adding new APIs that we think people will need, and seen clients that use the proprietary APIs directly, like Puppeteer/Playwright/Cypress

foolip: Not much to add... the justification from Google is that cross-browser support and ergonomics are not where we need them to be for anyone
… coming out of TPAC it would be good to extend our near-term roadmap into a longer-term one

brwalder: In addition to how to start sessions, we have started using CDDL to allow us to generate clients for the new spec

simonstewart: The Selenium project has started work on an "idealised" API for how we use domains, and it would be good to discuss the modules later

https://docs.google.com/document/d/1zVvIduq6BfmnVvwGdxrT4l9oglUoOl9hscjzJfYO0Sw/edit

<cb> jgraham: too much work is missing on my side to be properly prepared; I will follow up soonish, but it doesn't need to be part of our TPAC meetings

BiDi priorities - what do clients require to move features over

jgraham: This is a question about the transition process from CDP to BiDi. If we want people to move over from CDP, as an example, to WebDriver BiDi, which APIs should we start with?


AutomatedTester: We need to make sure that we don't break selenium, puppeteer, cypress

jgraham: which features of CDP are you using and are planning to use?

simonstewart: The first thing we do, during session creation, is to see where the CDP endpoint is and possibly rewrite that endpoint
… we have been looking at the use cases people are using
… we have currently added network interception, javascript exceptions, console logging, basic authentication, and DOM mutations
… we created some domains to get to an idealised form and have built our own APIs on top
… we took a very use-case-driven approach

jgraham: That's great. When thinking of the spec, there are 2 ways to do this. One is the basics: creating a session and script execution. The other is the value-added items on top of WebDriver 1 and 2

simonstewart: this goes back to TPAC 2019, we had use cases discussed

foolip: we could approach this by aiming to be feature complete, and that would be fun, but I suggest doing what you suggested

Async commands

jgraham: This is going back to a discussion at a previous F2F call
… basically the way the spec is currently worded, you will always get a response to a command that is sent.
… Events can always come at any point
… and if I remember at a previous meeting, I think Apple suggested that these responses could come out of order
… in the spec we can easily fix this but I want to check if people are ok with this

brwalder: We discussed this in the chromium implementation recently and this makes sense
… especially for people who are using CDP
… and this would ease the transition for people moving from CDP to bidi
… this allows for scenarios where people could get a status update on a command and then get a full response when it's done
… and there are network interception scenarios
… and by not enforcing the order we make the protocol more powerful

drousso: Web Inspector will always return responses in the order the commands are sent unless a command is explicitly marked async

<brrian> Example async commands in Web Inspector: network callbacks, IndexedDB commands, anything that could take a long time.

jgraham: my question is then... should we make them async explicitly?

simonstewart: the programming model for the original selenium is not great for the async work
… and I don't think we need to explicitly mark it as async

jgraham: I think the models are isomorphic
… if you want to represent the response that could take a long time you could get an empty placeholder and then an event could come later to replace that placeholder, or via an ID

simonstewart: I don't see value in adding this to webdriver http, so we need to add it as purely event driven

brrian: I would prefer to have it all async but if we can't then we need to definitely show it clearly

foolip: In terms of preferences I would go for everything to be async
… I think where order is guaranteed we need to make sure that we highlight it

simonstewart: It's worth calling out that commands are executed in the order they are sent but responses might not come back in that order

jgraham: The natural way of writing this will guarantee that
… the commands will be executed in order

Resolution: Agreement that we should have commands async and any further discussion can happen in the PRs for this work
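
A minimal sketch of what this resolution implies for a client, assuming every command carries a client-chosen `id` that its response echoes and that events carry no `id`. The field names (`id`, `method`, `params`, `result`), the `CommandTracker` class, and the `example.*` method names are illustrative assumptions, not taken from the spec text.

```python
import json


class CommandTracker:
    """Matches responses to commands by id, so arrival order does not matter."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}  # id -> method name of commands awaiting a response
        self.results = {}   # id -> result payload of completed commands
        self.events = []    # messages without an id are treated as events

    def send(self, method, params=None):
        """Build the wire message for a command and remember it as pending."""
        command_id = self._next_id
        self._next_id += 1
        self._pending[command_id] = method
        return json.dumps({"id": command_id, "method": method, "params": params or {}})

    def receive(self, raw):
        """Route an incoming message: responses carry an id, events do not."""
        message = json.loads(raw)
        if "id" not in message:
            self.events.append(message)
            return
        command_id = message["id"]
        if command_id not in self._pending:
            raise KeyError(f"response for unknown command id {command_id}")
        del self._pending[command_id]
        self.results[command_id] = message.get("result")

    def pending(self):
        """Ids of commands that have not been answered yet, in ascending order."""
        return sorted(self._pending)


tracker = CommandTracker()
slow_id = json.loads(tracker.send("example.slowCommand"))["id"]
fast_id = json.loads(tracker.send("example.fastCommand"))["id"]

# The second command answers first, and an event arrives in between.
tracker.receive(json.dumps({"id": fast_id, "result": {"done": "fast"}}))
tracker.receive(json.dumps({"method": "example.someEvent", "params": {}}))
pending_midway = tracker.pending()
tracker.receive(json.dumps({"id": slow_id, "result": {"done": "slow"}}))
```

Commands are still issued in order here; only the responses are free to arrive in any order, which matches the distinction simonstewart called out.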


Targets, contexts, realms

jgraham: One of the things we need in an automation protocol is to know where commands are addressed
… e.g. for JavaScript it could be a document or a service worker
… and we wanted to address commands to specific areas
… e.g. for resizing a window we need to make sure we are in the correct area
… so the question is what is the shape of the API here?
… we are taking our cues from CDP here and we want to be close to CDP to maximise moving the ecosystem over to the BiDi work
… There is an anomaly here from CDP, likely due to its history
… so using a browser context here would be good but we need to discuss it
… I have put a PR up at https://github.com/w3c/webdriver-bidi/pull/62


brwalder: To go into a little detail about how things are modelled: I agree that some of these things are historical. The model was originally shaped by JavaScript debugging details
… I agree that we need a much higher concept here
… and a browser context makes a lot of sense

<brrian> Enumerating all the context types seems like a fool's errand, new ones get added all the time. Web Inspector now supports JSContext, AudioWorklets, Workers, Web Workers, Service Workers, Extensions/content scripts, and normal pages. I'm sure there will be something next year.

<brrian> IMO, for the purpose of restricting which commands work for which contexts, it would make more sense to focus on capabilities of a context (has JS, has DOM, etc) and allow introspection of the context type.

<drousso> there's also new types of contexts being created, like worklets

<drousso> and more annoyingly, different worklets have different behaviors about what/where the execution context lives

brwalder: it makes sense that we have them as addressable

foolip: James, could you point us to the problems that there might be here?

<gsnedders> that just implies we need to be able to extend the set, rather than it being impossible to enumerate them? we see similar with IDL which enumerates contexts

jgraham: My main concern is the migration path for clients trying to support both versions of the protocol. E.g. Puppeteer supporting CDP and bidi
… and I don't want to hit complex pitfalls around multiprocesses

jgraham: I see the concerns but I don't see what their concrete implications are

<simonstewart> It almost sounds like realms have capabilities

brwalder: Following what brrian said, we make sure that a command can target anything that has the required capability. E.g. send a JavaScript command to any realm that supports JavaScript

brwalder: we need to make commands forward compatible so new commands just fit in

jgraham: I might be misusing the terms
… and the context is a browsing context and not a JS context
… I was thinking of them as discrete items based on their capabilities

brrian: Other concern (maybe already addressed), are we trying to specify what contexts are top-level and which are not? And the relationship (i.e., a ServiceWorker can't be subcontext of an iframe)

jgraham: yes, so the service worker wouldn't have a browsing context

simonstewart: is it possible to reframe the idea of the context in terms of capabilities, so that you send a command to something that matches the capabilities?

jgraham: for clarity, capabilities here does not mean WebDriver capabilities

foolip: I assume solutions would be isomorphic

<simonstewart> Getting the list of targets that match a given "feature" seems like a new command

foolip: if we are targeting a command to a realm that matches "features"

simonstewart: so then a command takes a union, not 2 targets

foolip: that makes sense
… what is the complexity we are trying to reduce here?

simonstewart: [reads what brrian mentioned earlier in this topic]. So we have the problem of forward compatibility
… and realms seem like a specialised feature here
… and brwalder mentioned earlier that we would do feature matching (pattern matching)

<brwalder> was paraphrasing brrian

jgraham: I think this is describing all the same thing
… a realm has the ability to execute JS
… and a browsing context has everything, including the realm
… so you could execute JS in a browsing context and it automatically finds the realm
… so the features model is what we want to follow
… and there are likely to be "specialisations"
… so it will do the thing or it will error loudly. E.g. Get a DOM node or error out saying you can't
… do we think that we have a model here that we can actually execute on?
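
A minimal sketch of the "features" model described above, assuming each realm advertises a set of features, each command declares the features it needs, and a mismatch errors loudly. The names `Realm`, `REQUIRED_FEATURES`, `dispatch`, and `targets_matching`, the command names, and the feature strings `js` and `dom` are all hypothetical.

```python
from dataclasses import dataclass


class UnsupportedFeatureError(Exception):
    """Raised when a command is addressed to a realm that lacks a needed feature."""


@dataclass(frozen=True)
class Realm:
    realm_id: str
    kind: str                         # introspectable type, e.g. "window"
    features: frozenset = frozenset()  # what this realm can do, e.g. {"js", "dom"}


# Hypothetical table: command name -> features the target must have.
REQUIRED_FEATURES = {
    "evaluate": frozenset({"js"}),
    "getNode": frozenset({"js", "dom"}),
}


def dispatch(command, realm):
    """Run the command if the realm supports it, otherwise error loudly."""
    missing = REQUIRED_FEATURES[command] - realm.features
    if missing:
        raise UnsupportedFeatureError(
            f"{command} needs {sorted(missing)} but {realm.kind} realm "
            f"{realm.realm_id} does not provide it"
        )
    return f"{command} accepted by {realm.realm_id}"


def targets_matching(realms, needed):
    """List the realms that provide every feature in `needed`."""
    needed = frozenset(needed)
    return [realm for realm in realms if needed <= realm.features]


page = Realm("realm-1", "window", frozenset({"js", "dom"}))
worker = Realm("realm-2", "service-worker", frozenset({"js"}))

script_result = dispatch("evaluate", worker)  # a worker can run script
try:
    dispatch("getNode", worker)               # but it has no DOM
    dom_error = None
except UnsupportedFeatureError as error:
    dom_error = str(error)

dom_targets = targets_matching([page, worker], {"dom"})
```

A new realm type only needs to declare its features to fit in, which is the forward-compatibility property brwalder and brrian asked for; `targets_matching` corresponds to the "list of targets that match a given feature" command simonstewart mentioned.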

simonstewart: When we have something to play with we can have more of a discussion

foolip: what is the decision that we are facing here?

jgraham: The current design looks fine. There might be changes in the future when we get more concrete use cases

foolip: I am struggling to think through the hierarchy of things, then how to go down it, and then the targeting

jgraham: there is no doubt that we need to inform the user what the realm is.
… and this is more of a spec organisation issue
… so if another spec adds a new realm we can handle it: it fits into this set of features

Resolution: Treat browsing context and realms as 2 concepts. Ensure the model is extensible to future items. This is not a great summary

<brrian> lol

Script Execution

https://github.com/w3c/webdriver-bidi/issues/18


<brwalder> https://github.com/w3c/webdriver-bidi/issues/16 looks like it's related to the PR

jgraham: We have just been discussing realms
… one of the items that we need to do to prove out the spec is add support for executing script
… There are a few questions:
… How do we serialise data?

https://github.com/w3c/webdriver-bidi/pull/57
… Should we follow the WebDriver HTTP approach of using `arguments`?

simonstewart: how adrift are we from the WebDriver HTTP serialisation?

jgraham: it is different from how DevTools approaches it
… it will return a handle to a value so you pass that on and then carry on
… and something like CDP has the N+1 problem around `querySelectorAll`: it needs an iterator that does N+1 loops
… and there are ways CDP could be improved here
… and this fronts it

<foolip> https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-evaluate is what I've been looking at. `returnByValue` is what turns JSON serialization on/off. I'm not sure what `generatePreview` does, maybe that has something

simonstewart: If you return a list via Execute Async it would return a handle to the list and then a handle to each item in the list that you need to collect

jgraham: that is correct.

brwalder: The `generatePreview` just gives a "shape" of what can be returned

simonstewart: doing N+1 can be a lot of work if you have a service provider like BrowserStack/Saucelabs
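
A rough illustration of why N+1 matters for remote providers, assuming a handle-based reply costs one round trip for the list plus one per item, while a by-value reply costs one in total. `FakeRemote` and its methods are hypothetical stand-ins for a browser on the far side of a network link, not any real protocol API.

```python
class FakeRemote:
    """Counts round trips to a pretend remote browser."""

    def __init__(self, page_values):
        self._page_values = list(page_values)
        self.round_trips = 0

    def query_as_handle(self):
        """One round trip: returns a handle to the list, not its contents."""
        self.round_trips += 1
        return {"handle": "list-1", "length": len(self._page_values)}

    def get_item(self, handle, index):
        """One round trip per item fetched through the list handle."""
        self.round_trips += 1
        return self._page_values[index]

    def query_by_value(self):
        """One round trip: the whole list is serialised into the reply."""
        self.round_trips += 1
        return list(self._page_values)


def collect_via_handles(remote):
    """Fetch the list handle first, then every item individually."""
    listing = remote.query_as_handle()
    return [remote.get_item(listing["handle"], i) for i in range(listing["length"])]


links = ["a", "b", "c", "d", "e"]

handle_remote = FakeRemote(links)
via_handles = collect_via_handles(handle_remote)
handle_trips = handle_remote.round_trips  # 1 for the list handle plus 1 per item

value_remote = FakeRemote(links)
by_value = value_remote.query_by_value()
value_trips = value_remote.round_trips    # a single request and reply
```

Both paths return the same data; the difference is only the number of network exchanges, which grows with the list length in the handle-based path.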

simonstewart: is there a reason why we aren't using the WebDriver HTTP serialisation?

jgraham: yes, there are times we want to return a handle to JSON objects as well
… we want to try to have the best of CDP and WebDriver HTTP here

foolip: could someone explain the webdriver http approach?

simonstewart: the tl;dr is if it is a basic type return that, if it's a webelement return the UUID that represents it. It expands objects to JSONObjects.
… there is special casing for windows

<foolip> Is it https://w3c.github.io/webdriver/#dfn-json-clone?

jgraham: it's `JSON.stringify` but with special handling for windows and elements?

foolip: that feels different from James's proposal

jgraham: WebDriver HTTP would fail in cases where values can't be serialised, like cyclic structures etc
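
A simplified sketch of the WebDriver HTTP style of serialisation summarised above: basic types pass through, elements become a reference, containers are expanded, and cycles fail. It assumes Python stand-ins for page values; `FakeElement` and `clone` are hypothetical, window handling is omitted, and only the element key string comes from the WebDriver specification's web element identifier.

```python
import uuid

# Web element identifier key as defined by the WebDriver specification.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


class FakeElement:
    """Stand-in for a DOM element, which really lives in the browser."""

    def __init__(self, tag):
        self.tag = tag


def clone(value, known_elements, _ancestors=None):
    """Serialise `value`, replacing elements with references and rejecting cycles."""
    ancestors = _ancestors if _ancestors is not None else set()
    if value is None or isinstance(value, (bool, int, float, str)):
        return value  # basic types are returned as they are
    if isinstance(value, FakeElement):
        # The same element always maps to the same reference.
        reference = known_elements.setdefault(id(value), str(uuid.uuid4()))
        return {ELEMENT_KEY: reference}
    if id(value) in ancestors:
        raise ValueError("cyclic structure cannot be serialised")
    ancestors.add(id(value))
    try:
        if isinstance(value, (list, tuple)):
            return [clone(item, known_elements, ancestors) for item in value]
        if isinstance(value, dict):
            return {str(key): clone(item, known_elements, ancestors)
                    for key, item in value.items()}
    finally:
        ancestors.discard(id(value))
    raise TypeError(f"cannot serialise {type(value).__name__}")


elements = {}
button = FakeElement("button")
serialised = clone({"count": 1, "items": [button, button, "text"]}, elements)

cyclic = []
cyclic.append(cyclic)
try:
    clone(cyclic, {})
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

Everything comes back in one reply, which keeps round trips low, but nothing that fails to serialise can be returned at all. That trade-off is what motivates also offering handles.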

simonstewart: we need to avoid as many round trips as possible with cloud providers

jgraham: there are cases we need to think about, like WASM in the future
… how do protocols return multiple values async?

drousso: in Web Inspector you have 1 opportunity to return a value
… it gets the completion value as in ECMAScript
… if you need multiple returns you need to do something special

<foolip> https://chromedevtools.github.io/devtools-protocol/tot/Runtime/#method-evaluate indeed has an `awaitPromise` parameter

drousso: but you could await promises

foolip: that's the same model that CDP does

drousso: there are ways we can try to solve it but it's never been a use case that people wanted

<foolip> the sort of model that exists in https://streams.spec.whatwg.org/ might be worth looking at

drousso: you could have your code return a generator
