Browser Tools- and Testing WG, Day 2, TPAC 2019, Fukuoka -- 20 Sep 2019

<ato> Chair: AutomatedTester

bi-directional communication

<ato> https://docs.google.com/spreadsheets/d/1mZGuKR4SR6HGhqf4wTJQg6eJQBkqnP4G3ibkPTIP_1Q/edit#gid=2126300658

<mmerrell> everyone should mark themself present

<simonstewart> I hast marked mineself as being present

<mmerrell> AutomatedTester: continuation of the bi-di talk. centered around proper examples

<mmerrell> ... need to start from implementation and go backward

<mmerrell> AutomatedTester: start with "loading", how we'd do navigation

<mmerrell> CalebRouleau: first thing would be to target a navigation (across tabs)

<mmerrell> JohnChen: we start with how nav is initiated. the DevTools method is that every tab is a separate target, and choosing which tab needs to navigate is significant

<mmerrell> ... page.navigate() has 2 params, the tab (the target) and the URL

<simonstewart> https://chromedevtools.github.io/devtools-protocol/tot/Page#method-navigate

<mmerrell> ... 3 other events: page.frameStartedLoading(), happens when the HTML is received. before that, nav is tentative (not committed yet), but once loading actually begins, a page.load event is fired

<mmerrell> ... page.frameStoppedLoading event indicates the loading is done
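
The navigation flow JohnChen describes can be sketched as CDP-style messages. This is only an illustration: the event stream and the frame ID "F1" are made up, how the target is addressed is left out, and the helper names are hypothetical.

```python
import json


def cdp_command(msg_id, method, params):
    """Build a CDP-style command message (id, method, params)."""
    return {"id": msg_id, "method": method, "params": params}


def is_load_finished(events, frame_id):
    """True once Page.frameStoppedLoading has been seen for frame_id."""
    return any(
        e.get("method") == "Page.frameStoppedLoading"
        and e.get("params", {}).get("frameId") == frame_id
        for e in events
    )


navigate = cdp_command(1, "Page.navigate", {"url": "https://example.com/"})

# Made-up event stream, in the order described in the discussion.
events = [
    {"method": "Page.frameStartedLoading", "params": {"frameId": "F1"}},
    {"method": "Page.loadEventFired", "params": {"timestamp": 1.0}},
    {"method": "Page.frameStoppedLoading", "params": {"frameId": "F1"}},
]

print(json.dumps(navigate))
print(is_load_finished(events[:1], "F1"), is_load_finished(events, "F1"))
```

A client that treats navigation as complete only after the final event is making the same kind of decision that ChromeDriver is said to make here.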

<jgraham> scribenick: mmerrell

UNKNOWN_SPEAKER: chrome monitors for these events, and makes decisions based on that. This is how loading happens with CDP

brrian: why are we talking about this?

AutomatedTester: we're trying to use an example of a command, and working backward

brrian: is this part of the use case we talked about yesterday?

JohnChen: yes

brrian: this seems like a duplicate

jgraham: if the use cases don't include this the use cases are wrong

jgraham: the point isn't to discuss every command, the point is to cover things that are fundamental to the protocol, of which navigation is one
... there should be some way in the bi-di protocol to initiate navigation, the requests, responses, etc. We need to discuss this, because it's the base part of the framework for the whole conversation

brrian: this isn't part of the use cases

CalebRouleau: we should be discussing navigation because it's more contentious, while we're here in the room, rather than discussing something like logging, where we'll probably agree on everything

lukebjerring: it's easier to start with a use case that involves every part of the protocol

jgraham: nav is a necessary component of a rewrite of the protocol

simonstewart: it's not necessary, and it's already in there

jgraham: but we've already demonstrated that the discussion has been incomplete

simonstewart: if the purpose here is a re-do of the bi-di protocol, then it makes sense to do load. if it's to discover the shape of how to do it, that discussion is worth having

JohnChen: one use case was the request modification (intercept), which has been covered in the CDP
... we could insert an event in front of the loading call, which would allow this kind of interception in a way that is non-blocking, which fosters an async loading of the page in parallel with the intercept

drousso: the idea of loading a page, and intercepting a request, are two separate things entirely. It's not necessary to talk about these things at the same time... the only thing that overlaps is that they're going across the network
... a lot of what's being proposed comes down to data, and we need to discuss the shape of what it is and how it goes, not about the combinations of these concerns in a specific implementation

JohnChen: whatever script is running on the page disappears once the load event changes

drousso: you should be able to send a script to the driver, which executes just prior to a load event

CalebRouleau: but JavaScript can't do everything we want, so we need to talk about it as a separate issue

jgraham: this comes down to script execution, though, not bi-di fundamentals. We need to understand the nature of the bi-di communication in order to agree on how to proceed in this conversation

CalebRouleau: this is getting too meta, so I suggest getting back to concrete examples
... we should talk about logging, which will give us the context that fosters agreement on a framework or bi-di

jgraham: talking about logging risks too deep a dive into a simple use case, where we'll get distracted from discussing the nature of bi-di communication

AutomatedTester: with logging, we *won't* discuss the particulars of logging, we'll stick to the fundamentals of the packets and how they look

bwalderman: that will include transports, correct?

jgraham: yes, but there's a risk that we're leaving too much out

[general agreement on moving forward]

simonstewart: we should also include the handshake

<JohnJansen> I will paste this again to make sure everyone has read Google's description: https://docs.google.com/document/d/1eJx437A9vKyngOQ49lYYD3GspDUwZ6KpKDgcE2eR00g/edit#

jgraham: has the position changed since TPAC 2018? the conclusion was around a capability that included "bi-di". Has that changed?

JohnChen: our prototype allows for that capability, which "upgrades" the connection to one that goes via a websocket, but which doesn't exactly hand back a websocket connection. It keeps the websocket connection between the client and the browser, not exposing it further

simonstewart: the top-level return payload should include the "upgrade URL", but which can be re-written

RESOLUTION: Bi-di is always enabled. An optional capability, defaulting to true, indicating that bi-di is desired. When a new session is established, the return value of the new session contains the new top-level property of the bi-directional URL
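
A minimal sketch of what this resolution could look like in a New Session response. The capability name "bidi" and the property name "bidiUrl" are placeholders invented for illustration; the resolution does not fix either name.

```python
def wants_bidi(capabilities):
    """The capability is optional and defaults to true ("bidi" is a made-up name)."""
    return capabilities.get("bidi", True)


def new_session_response(session_id, capabilities, bidi_url):
    """Build a New Session response body carrying the bi-directional URL."""
    value = {"sessionId": session_id, "capabilities": capabilities}
    if wants_bidi(capabilities):
        # Top-level property of the return value; "bidiUrl" is a made-up name.
        value["bidiUrl"] = bidi_url
    return {"value": value}


default = new_session_response("abc", {}, "ws://localhost:4444/session/abc")
opted_out = new_session_response("abc", {"bidi": False}, "ws://localhost:4444/session/abc")
```

Because the URL is ordinary data in the response, an intermediary node could rewrite it before passing the response on, as simonstewart suggests.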

<AutomatedTester> scribenick: automatedtester

CalebRouleau: are we going to be doing 1 URL or multiple

simonstewart: 1

<brrian> https://www.jsonrpc.org/specification

<mmerrell> scribenick: mmerrell

brrian: I propose that we send commands in events. We can work through examples, but it needs to go via JSON, which can include binary data
... there are a lot of tools available for generating code that can use this, incl C, JS, C#, etc. We should discuss

jgraham: is this close to existing implementations?

bwalderman: we should start with JSON RPC. I second
... CDP is basically already JSON RPC, only missing a couple things

brrian: one difference is that JSON RPC is very particular about one request, one response. You have to batch things together. We should write tests for this, but ultimately adhere to the JSON RPC protocol
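
For reference, the message shapes defined by JSON-RPC 2.0 can be sketched as below. The method names are hypothetical; only the envelope fields ("jsonrpc", "method", "params", "id", "result") come from the JSON-RPC specification.

```python
import json


def request(method, params, req_id):
    """A request carries an id, so exactly one response is expected."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}


def notification(method, params):
    """A notification has no id and must not be answered."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def success(result, req_id):
    """A successful response echoes the id of the request it answers."""
    return {"jsonrpc": "2.0", "result": result, "id": req_id}


def expects_response(message):
    return "method" in message and "id" in message


# Hypothetical method names, for illustration only.
nav = request("navigate", {"url": "https://example.com/"}, 1)
log = notification("logEntryAdded", {"text": "hello"})
batch = [nav, request("getTitle", {}, 2)]  # a batch is a JSON array of requests

print(json.dumps(batch))
```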

bwalderman: it's proposed that the bi-directional WebDriver protocol uses JSON RPC

jgraham: we would need to study the standard more before agreeing to this

JohnChen: there are some pieces to this that would be a challenge for us to conform to, particularly around notifications and identification of these responses

ato: I don't think we can use JSON RPC. We are fundamentally constrained by existing clients. We need to be able to proxy existing RPC clients, which would require a fundamental rewrite of all clients
... we shouldn't change the fundamental transport protocol
... there's a corpus of clients which already use a protocol. This would be an unnecessary change

jgraham: what's the advantage of using JSON RPC, as opposed to the CDP version of the JSON that's being carried over?

brrian: it's a spec. I don't want to be required to conform to weird CDP bugs

ato: I would like to use a more well-defined message formatting, but I'm not sure the JSON RPC is the right answer. We should nail down the specifics before we do this. JSON RPC is a good guidepost, but we shouldn't assume it will solve all our problems, and may require additional spec prose for how to define these things, and it may get very complicated very quickly

lukebjerring: can we have a translation layer? this would allow us to transition clients over time

CalebRouleau: what problems are there in the CDP right now that would prevent us from using it as a guideline?

simonstewart: we shouldn't start from an existing implementation and work backward, we should start with what we need and define the protocol as required

jgraham: the mistake we've made is "making small changes to things that work, and spending years fostering adoption", when we should instead focus on solving user problems, as evidenced by the heavy usage of CDP
... we're making a mistake by talking about the transport layer, in that we're missing the actual use case. Let's defer making a resolution at the moment, because we haven't moved through the conversation enough, guided by real knowledge of the existing issues

Hexcles: the JSON RPC spec is still a single-direction protocol. We haven't started to talk about how to make them truly bi-di, but there's nothing in the JSON RPC spec that directly encourages bi-directional communication

cb: adoption of tools is a whole other problem. Cypress, Puppeteer, etc., all use the CDP itself, so making the spec more friendly to the CDP would be a benefit to the spec, and we'd be missing a lot of use cases by ignoring it

<AutomatedTester> Zakim close the queue

brrian: there's a lot of concern about the difference between what we want and what JSON RPC offers. I've personally found it to be quite easy to follow, and made no practical difference to the amount of change. The benefit was that we can say we conform, and that our packets will be predictable
... having written 3 implementations in 3 languages, I can say it's trivial

jgraham: JSON RPC is roughly the shape of how the transport protocol should go, but there are particulars to its adoption

ato: we have concerns about version pinning as well as the server-side events as they come across. I have ideas for a transition plan, but we need to address the concerns before we can resolve

jgraham: we agree that we can't adopt the JSON RPC spec without having studied it further

<inserted> scribenick: MikeSmith

AutomatedTester: start from .. transports roughly agreed

simonstewart: How to send references to browsing contexts, frames
... something like ExecuteScript

CalebRouleau: WD currently has notion of "you are attached to this specific handle"

simonstewart: I think we will allow to communicate with multiple handles
... [example of communication with ServiceWorker]

CalebRouleau: send to any browsing context, and get events back from any?

simonstewart: yes

<CalebRouleau> ack

ato: "control" messages, example, if you want to get browser-internal logs, that might not require a target ID
... some commands make sense in a global scope, some make sense in scope of a single browsing context

jgraham: JS realms, global object is a JS realm
... important to distinguish between browsing contexts and targets
... it's important to be able to target more than just window globals
... [inject script case]
... should be possible to specify either a browsing context, or a target, or

drousso: similar to how Web Inspector already works
... we include a target ID in every message

brrian: SafariDriver has an internal protocol in which the browsing context is passed around with every command

ato: what we have now requires a lot of context-switching
... that makes sense in WebDriver's view of the world
... [which maps to how a user sees and does things]
... but I have a sense that the bidi protocol is a bit lower than that
... so some of the things we have held true so far might no longer hold true in the bidi protocol
... ability to associate a message going over bidi with a specific browsing context, without needing to switch into that context

RESOLUTION: It should be possible for command request messages to target a particular target/browsing context.
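
A rough sketch of per-message targeting, assuming a "context" field on each command and a small set of commands that make sense in a global scope. The field name, the method names, and the routing rule are all assumptions, not anything the group agreed on.

```python
# Hypothetical commands that do not need a target (e.g. browser-internal logs).
GLOBAL_COMMANDS = {"browser.getLogs"}


def route(message, known_contexts):
    """Return where a command should be dispatched: "browser" or a context ID."""
    target = message.get("context")
    if target is None:
        if message["method"] not in GLOBAL_COMMANDS:
            raise ValueError(f"{message['method']} requires a context")
        return "browser"
    if target not in known_contexts:
        raise KeyError(f"no such context: {target}")
    return target


contexts = {"ctx-1", "ctx-2"}
global_cmd = {"id": 1, "method": "browser.getLogs"}
scoped_cmd = {"id": 2, "method": "script.execute", "context": "ctx-2"}

print(route(global_cmd, contexts), route(scoped_cmd, contexts))
```

Nothing here depends on a "current" context, so two commands aimed at different contexts can be sent back to back without any switching step.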

brrian: for random clients, let's make it as foolproof as possible

simonstewart: we want to reformulate WebDriver on top of the bidi communication thing
... have world where we can do everything in the same protocol

ato: we should continue to bear in mind that we enable that kind of programming model that is being used by, for example, Puppeteer
... message indexing is really important
... we would agree that every request of this bidi protocol should have exactly one response
... that case is not implicit in JSON-RPC
... additional complication of CDP, the fact that it has a target that is not a browsing context but is instead an execution context
... you get an event back telling you that a new execution context has been created
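
The "exactly one response per request" rule and the role of message indexing can be sketched with a small client-side bookkeeping class. The message shape (an "id" on requests and responses, none on events) is an assumption borrowed from CDP-style protocols.

```python
class PendingRequests:
    """Tracks outgoing requests so each one is answered exactly once."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}
        self.events = []

    def send(self, method, params=None):
        self._next_id += 1
        message = {"id": self._next_id, "method": method, "params": params or {}}
        self._pending[message["id"]] = message
        return message

    def receive(self, message):
        """Return the request a response answers, or None for an event."""
        if "id" not in message:
            self.events.append(message)  # unsolicited event from the remote end
            return None
        request = self._pending.pop(message["id"], None)
        if request is None:
            raise ValueError("response to an unknown or already-answered request")
        return request

    def outstanding(self):
        return sorted(self._pending)


client = PendingRequests()
first = client.send("navigate", {"url": "https://example.com/"})
second = client.send("getTitle")
answered = client.receive({"id": 2, "result": {"title": "Example"}})
client.receive({"method": "executionContextCreated", "params": {}})
```

Responses may arrive out of order, which is why the index rather than arrival order decides which request a response belongs to.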

AutomatedTester: what is left to do?

jgraham: so there's a bunch of stuff we've been doing by reference to CDP
... wrapping messages to root them to a target
... we too want to do it that way?

simonstewart: new thing in CDP, where you have a session ID and you prepare a message and send it to a single WebSocket connection

drousso: we are similar to that

jgraham: do we want to replicate that design?

simonstewart: no

jgraham: artifact of the way that devtools needed to operate
... the reason they added that wrapper way is because they did not want to change the existing protocol they had, which assumed a single browsing context

simonstewart: looks definitely like a historical artifact
... double-encoding JSON

CalebRouleau: gross
... we don't want that
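
To make the double-encoding point concrete, here is a sketch contrasting the wrapper style (a command serialized to a string and carried inside Target.sendMessageToTarget) with a flat message that carries the session ID next to the command fields. The session ID "S1" and the message IDs are made up.

```python
import json

inner = {"id": 1, "method": "Page.navigate", "params": {"url": "https://example.com/"}}

# Wrapper style: the inner command is JSON-encoded into a string parameter,
# so it is encoded twice by the time it reaches the wire.
wrapped = {
    "id": 100,
    "method": "Target.sendMessageToTarget",
    "params": {"sessionId": "S1", "message": json.dumps(inner)},
}

# Flat style: the routing key sits beside the command fields, encoded once.
flat = {"sessionId": "S1", **inner}

wire_wrapped = json.dumps(wrapped)
wire_flat = json.dumps(flat)

# The receiver of the wrapped form has to decode twice to see the command.
decoded = json.loads(json.loads(wire_wrapped)["params"]["message"])
print(decoded == inner, len(wire_wrapped), len(wire_flat))
```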

brrian: we have a similar implementation detail
... I don't think we should expose any of that

CalebRouleau: target will be consistent across a browsing context
... not just doing what CDP does

jgraham: do we adopt the syntactic pattern that already exists? or do we do something more sane?

brrian: the existing devtools mechanism comes from things that are specific to debugging
... and we are not making this feature for debugging needs

bwalderman: process-switching is an implementation detail
... I don't see any reason to abandon the current model of the browser as we are using for WebDriver now

ato: conflation of targets and execution contexts

<ato> https://firefox-source-docs.mozilla.org/remote/Architecture.html

ato: Firefox is working on an implementation of CDP
... see the doc at the URL above
... a target can be a tab (as opposed to a browsing context)
... you can route individual messages using the session ID

simonstewart: so we connect, and the thing we want to do is, register a listener (say)
... we will have a command name, a list of arguments
... how do we say which execution context it will be run in?

JohnChen: WebDriver process is normally implicit
... but in bidi it is different

simonstewart: context ID

ato: historically CDP did not support site isolation
... so there are artifacts in it that were based on assuming that
... important thing is for the context ID to be a serializable value that can be passed around

simonstewart: we have a window handle, but not for a frame

jgraham: we do

ato: we have this in the spec but not implemented
... CDP is both an HTTP API and a socket API
... can auto-attach
... (which changes the implicit target, btw, and not sure we want that part of it)
... a service worker is not a browsing context
... we are inventing a super-abstraction above browsing contexts

bwalderman: getting an event back to the client when a mutation occurs
... maybe need a way to pass in a function to get message back to client

ato: how to identify JS object, we should talk about
... in CDP you can return anything; for example, a JS object

simonstewart: element ID in the WebDriver spec is because we were limited by the serialization mechanism
... element IDs are the JS object reference
... window handle is the target ID for browser context

bwalderman: message passing?
... supply a postMessage ID

ato: is that in CDP?

<mmerrell> we're also introducing a new id for the frame

CalebRouleau: is there a reason CDP does not have this?

drousso: in devtools we have direct access to the engine

CalebRouleau: how does Puppeteer do this?

<mmerrell> Simon, can you summarize the last point about the frame ID so that we have your telling on record?

ato: you can pass in functions, inner functions, or Promises

bwalderman: CDP pollutes the global namespace [to do the similar thing]

<simonstewart> Summary: Add a new "get context" function to existing webdriver. If the "current context" is a top level browsing context, this will return the current window handle. For a frame, it's "something" that we create. In both cases, the "context id" can be passed to bidi to use as a target id

<mmerrell> thanks

ato: current CDP primitive is conceptually very similar to what we are already doing in WebDriver
... primitive for script execution without a lifetime

simonstewart: there is no synchronous thing for this in CDP

jgraham: there is no blocking

ato: another model is the Promise style [in addition to return-by-value]

simonstewart: get the communication part sorted out first

jgraham: existing clients provide a way to create a custom event stream in JS?

ato: there is a way in Puppeteer

[discussion of handling of bootstrap scripts]

jgraham: this is how extensions work, basically
... so conceptually it already exists

ato: connections in the API, for a script injection, for a single browsing context, there could be multiple execution contexts
... some execution contexts may be privileged
... each service worker can have multiple JS realms

jgraham: theoretically, yes, but in practice, no

simonstewart: if you send an element from the remote end to local end, how do you know which context it has come from?

ato: I don't think it does in CDP, but there is a way to query for it

simonstewart: ... which is very inefficient

bwalderman: included in the event, is how we should do it

CalebRouleau: to implement this in ChromeDriver will be a pain and inefficient

[JohnChen explains why]

JohnChen: have not found an efficient way to map element ID

CalebRouleau: in JS land

JohnChen: low-level, the IDs that devtools know about are not exposable to JS

jgraham: when you do runscript in CDP, you get back a reference to an object
... but what WebDriver wants is not that
... so you would need to query again to get what we would need

JohnChen: so we could do what we need but it will require additional roundtrips

drousso: similarly for us in Safari

<simonstewart> https://github.com/GoogleChrome/puppeteer/blob/master/lib/ExecutionContext.js#L142

<jgraham> https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Xray_vision

<mmerrell> scribenick: mmerrell

<AutomatedTester> https://docs.google.com/document/d/1gUm7Be-akW2-4mjr15cnZlzwoAfOlfL7b3tWCDrb1Jg/edit#heading=h.f9zxnd3oxxm9

ato's comments on bidi

ato: good progress was made this morning, but we need to make sure follow-up actions are taken in order to prevent a repeat conversation next year
... [ato takes an action to make some detailed proposals around these decisions]

long-running new session

simonstewart: context: new session is synchronous--request new session, and the wait can be forever

<ato> ACTION: ato to draft proposal for the bi-di protocol interop terminology we discussed this morning

simonstewart: this can take unreasonably long, and given that it's a blocking call, this can be "bad"

cb: queuing and throttling from grid/vendors is another use case

simonstewart: networks are [not very good]. We need to have an async new session

<ato> Unrelated to the current topic, here is an example of some CDP protocol chatter: https://taskcluster-artifacts.net/FQEINPSIQ-CWhvboeeJTbg/0/public/logs/live_backing.log

simonstewart: request a new session, a token is returned, which you can use to track on your own to see the progress

mmerrell: similar to the async nature of the AWS API

simonstewart: I have a draft implementation for this

<simonstewart> https://gist.github.com/shs96c/108f5313eae54b94658ee018e37926d2

jgraham: use cases all seem to be for intermediary nodes, right? drivers are usually on local machine, so is this really a concern for non-local nodes, or can it just be implemented on intermediary nodes?

simonstewart: I want it to be consistent, so we should only have to write the code once.
... might not need to be the highest priority, but this would be a benefit

ato: is this how VM requisition works in the cloud?

simonstewart: that seems to be the case

ato: we should model this kind of API on known-good usages of such a library

simonstewart: a good usecase for this on a local machine would be for queueing multiple requests

<ato> titusfortner: https://w3c.github.io/webdriver/#dfn-readiness-state

AutomatedTester: what are the session creation events?

<ato> titusfortner: Also https://w3c.github.io/webdriver/#dfn-active-session

simonstewart: the first stage is that the request is queued, then being created, then created
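
The stages simonstewart lists can be sketched as a small state enumeration plus a polling helper. The state names follow the discussion; the polling function, the token value, and the fake remote end are illustrative assumptions and are not taken from the draft implementation linked above.

```python
from enum import Enum


class SessionRequestState(Enum):
    QUEUED = "queued"
    CREATING = "being created"
    CREATED = "created"


def poll_until_created(poll, token, max_polls=10):
    """Call poll(token) until CREATED; return the distinct states observed."""
    observed = []
    for _ in range(max_polls):
        state = poll(token)
        if not observed or observed[-1] is not state:
            observed.append(state)
        if state is SessionRequestState.CREATED:
            return observed
    raise TimeoutError(f"session request {token!r} not created after {max_polls} polls")


def fake_remote(states):
    """Stand-in for a remote end that reports one state per poll."""
    remaining = iter(states)

    def poll(token):
        return next(remaining)

    return poll


S = SessionRequestState
history = poll_until_created(
    fake_remote([S.QUEUED, S.QUEUED, S.CREATING, S.CREATED]), "req-1"
)
print([state.value for state in history])
```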

<ato> “A remote end that is not an intermediary node has at most one active session at a given time.”

<ato> Also:

<ato> “A remote end has an associated maximum active sessions (an integer) that defines the number of active sessions that are supported. This may be “unlimited” for intermediary nodes, but must be exactly one for a remote end that is an endpoint node.”

simonstewart: in the creation process, there are events we'd like to know about that might be interesting for the client to discover

<titusfortner> oh, yes, so a single driver, but most drivers can have multiple processes

<ato> You may have multiple processes on systems that support that.

<titusfortner> right; except safaridriver, which is why I thought this might be more interesting for Safari, but doing it in series is probably better than parallel. anyway confusion resolved, thank you!

mmerrell: would it be good to expound on the AWS example?

simonstewart: roughly, though it will require some more investigation. But we're heading in the right direction, with some open questions about how to return lists of events, etc
... this would be greatly helped with a bidi implementation

JohnChen: you'd see a connection token that you could use to track the status of various events

CalebRouleau: which in Chrome's case will likely be nothing

JohnChen: there will probably be a few interesting cases, in the case of queuing in particular

simonstewart: yeah, when you want to query capacity

JohnChen: that's actually one place where Chrome is not spec-compliant--it won't queue sessions, it will just create them on-demand

cb: there is already an end-point to query the state of a session. why couldn't we just use that, rather than creating a whole new mechanism for async session management?

simonstewart: there will be other cases where you'd need this kind of information, e.g. different versions of grid, or SL, etc

cb: as an implementer you'd want to use the async version, correct?

simonstewart: yes

cb: would you then get rid of the synchronized method?

simonstewart: we'd likely reformulate the sync to use the async, and just appear to be synched

jgraham: should this be a parameter to the existing endpoint? why create a new endpoint?

brrian: what's the fallback if you have the extra param and it doesn't come back correctly?

jgraham: it would make the return type polymorphic

ato: question regarding security: with WebDriver, you can't query open sessions. Will this break that?

simonstewart: you can't get a list of sessions--you get a request key, and you query on that request key
... you don't get access to others that you don't already know about

AutomatedTester: would this be implicitly handled in client bindings?

[general assent]

AutomatedTester: the end user won't know or care?

[general yes]

titusfortner: chromedriver.new() currently blocks on that call. How do we handle that?

simonstewart: the user will never know

titusfortner: the token we get back--should that be the same UUID?

simonstewart: no, there should be no requirement around that... that should be able to be determined by the implementing bindings

jgraham: another argument for adding a param rather than a new endpoint
... this would make it even easier to foster backward compatibility

ato: but you'd still need an endpoint to query the status?

[general yes]

ato: you have to account for whether or not you're hitting an older version of the server, without the new endpoint

simonstewart: there needs to be a mechanism to query for the async support. extra parameter is one way, new endpoint is another

jgraham: this needs to be considered a session capability, not a browser capability

ato: sending as an "alwaysMatch" capability would risk rejection on an older driver
... there would have to be merging of the capabilities dict on all the drivers, resulting in combinatorial explosion of logic for processing them all

jgraham: you could keep retrying with the capabilities

ato: this is why it should be a HEAD request, to query the server's capability

simonstewart: based on the agreement from yesterday, we decided we should keep passing the capabilities through

titusfortner: from a user standpoint, I'd rather see a new endpoint, and if it fails, hit the older sync version

simonstewart: this is why ato requested a HEAD to gauge capability

titusfortner: so every session creation request is 3 separate requests?

simonstewart: yes, similar (but better) to an older version of this

titusfortner: what are the bindings going to do to optimize this?

simonstewart: 3 requests: one to get the token, one to query readiness, and one to engage?
... the middle request would give you the state of the request
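
How client bindings might hide this from the user can be sketched as below: try the asynchronous flow (token, readiness, engage) and fall back to the blocking call on an older server. The method names on the remote object, the exception class, and both fake remotes are hypothetical, and the capability probe via HEAD that ato suggests is left out.

```python
class UnknownCommand(Exception):
    """Stands in for an "unknown command" error from an older server."""


def create_session(remote, capabilities, max_polls=100):
    try:
        token = remote.request_session(capabilities)   # request 1: get a token
    except UnknownCommand:
        return remote.new_session(capabilities)         # older server: blocking call
    for _ in range(max_polls):
        if remote.session_state(token) == "created":    # request 2: query readiness
            return remote.claim_session(token)          # request 3: engage
    raise TimeoutError("session was not created in time")


class AsyncRemote:
    """Fake remote end that supports the asynchronous flow."""

    def __init__(self):
        self._states = iter(["queued", "creating", "created"])

    def request_session(self, capabilities):
        return "token-1"

    def session_state(self, token):
        return next(self._states)

    def claim_session(self, token):
        return {"sessionId": "s-async"}


class LegacyRemote:
    """Fake remote end that only knows the blocking New Session call."""

    def request_session(self, capabilities):
        raise UnknownCommand()

    def new_session(self, capabilities):
        return {"sessionId": "s-sync"}


print(create_session(AsyncRemote(), {}), create_session(LegacyRemote(), {}))
```

A real binding would wait between polls; the loop here is kept bare so the three requests stay visible.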

diemol: how does the grid keep track of all the tokens?

simonstewart: through the hub

ato: how do you get the capabilities of the session?

simonstewart: you hit the endpoint that returns the capabilities?

ato: but we don't get that yet

simonstewart: we should add that
... we used to have that, but we removed it

<scribe> ACTION: Simon to finish PR around the async session request

Pointer spec modification

AutomatedTester: suggestion from Microsoft for scrollwheel, from Samsung

JohnJansen: pointer spec modification request is to allow MS to create new request in this repo to add new tests for this repo
... writing new tests is basically impossible
... like to merge this into the new tests... e.g. how fat is the pen, what tilt is the pen
... like to create new bindings for this, to TestActions in the WD spec. WinAppDriver can't run in WPT, which is very challenging for engineers
... MS would like to write tests for this once it's merged

AutomatedTester: does this handle proximity cases for a pen?

JohnJansen: "tangential pressure" is what that's called. One of the new events

jgraham: what's the issue regarding running the tests?

JohnJansen: they all fail, but they currently can't actually run
... it's frustrating to write the tests now when we know they're going to fail. Need to update the Python bindings in order to make them run

jgraham: why would spec changes block you from making progress on this? it's ok to make changes as a result of creating a proposal for an implementation

JohnJansen: we want to merge this now so we don't have to continue to wait

ato: we can't make this process completely atomic

CalebRouleau: this process shouldn't block you from making progress on something that will eventually be a proposal

JohnJansen: the priorities won't be written until after the changes are merged

jgraham: working group policy is that we don't make changes until the tests are written or passing

JohnJansen: the team we're relying on to write these tests is blocked awaiting official movement on the spec

CalebRouleau: WPT doesn't use selenium--it shouldn't be required that the WD spec changes in order to write WPT

AutomatedTester: this might dovetail into the discussion around the scrollwheel, but for the moment we need to decide whether to merge this or not

ato: the important thing is that it can't land in something we publish until the tests are complete

jgraham: should we provide an exception to the spec policy to provide for this?

ato: we should find out from JohnJansen's colleague what the particular blocker is to making progress. They should be able to move forward without breaking any rules or feeling any risk of having to redo or lose work

AutomatedTester: we should meet next week

<JohnJansen> ACTION: JohnJansen schedule meeting with AutomatedTester to chat with Timotius re: Test for Pointer Modification

Scroll Wheel Request

AutomatedTester: we agreed last year that we need a scrolling action of some sort, but we haven't made progress
... we have Lan here to help with use cases

Lan: devtool protocol has an Action for mouse wheel, but this shouldn't be part of the Pointer actions. It should be its own kind of action

ato: why is mouse wheel not associated with the mouse device?

Lan: it is associated with mouse, but it should be decoupled from the mouse--scrolling actions can happen without input from the mouse

ato: something like mouse wheel introduces questions: what does the wheel actually do? most things are definite (XY coordinates, etc), but mouse wheel motions are vague, and based on preferences set in the system or browser

jgraham: but we have this kind of ambiguity in plenty of other input devices, and these things break across systems on occasion

ato: compared to a mouse click or button press, the wheel is less deterministic

<simonstewart> https://w3c.github.io/uievents/#event-type-wheel

jgraham: you still end up generating DOM events that are predictable with a mouse wheel, so we should be able to model this interoperable behavior like anything else
... it's the same as measuring the mouse motion itself--we measure by deltas. This is just a slightly different mechanism for measuring a slightly different input device

Lan: we propose extending the action API, by adding the delta of the mouse wheel, or one "tick" as defined by the mouse hardware

simonstewart: right, as ato says, we don't measure movement, we measure deltas by CSS pixels
... looking at the events generated by the wheel in the spec above, it's still measured by CSS pixels
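
One possible shape for such an action, expressed in the existing actions payload structure, is sketched here. The "wheel" source type, the "scroll" action type, and the field names are assumptions made for illustration and may differ from what the PR discussed in this session defines; the deltas are taken to be CSS pixels, as discussed.

```python
def wheel_scroll(x, y, delta_x, delta_y):
    """One scroll step at viewport position (x, y); deltas are in CSS pixels."""
    for name, value in (("x", x), ("y", y), ("deltaX", delta_x), ("deltaY", delta_y)):
        if not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
    return {"type": "scroll", "x": x, "y": y, "deltaX": delta_x, "deltaY": delta_y}


def wheel_actions(source_id, steps):
    """Wrap scroll steps in a single input source of the assumed type "wheel"."""
    return {"actions": [{"type": "wheel", "id": source_id, "actions": list(steps)}]}


payload = wheel_actions("wheel1", [wheel_scroll(100, 200, 0, 120)])
print(payload)
```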

<ato> https://developer.mozilla.org/en-US/docs/Web/API/Element/wheel_event

<ato> Funky example.

simonstewart: there should still be a new type, and we would allow these deltas to be recorded, which would map well down to the expected event

<jgraham> https://github.com/w3c/webdriver/pull/1410/files

jgraham: the above PR for "pointer wheel" might be good enough as-is

simonstewart: this needs a test

jgraham: with tests, if Lan agrees, this should be good enough

Lan: would a WPT be good enough?

[room says yes]

JohnJansen: for implementation of this, Windows cares whether you're swiping or turning the wheel here. How do we mimic the hardware here?

ato: we don't

AutomatedTester: if you're using a finger, wouldn't that be a pointer gesture?

JohnJansen: yes, we already account for these swiping gestures. The wheel is different

CalebRouleau: this is just creating the web events, it has no hardware interaction

jgraham: does scrolling with your finger create web events?

AutomatedTester: no, it's a touch-and-drag

<scribe> ACTION: ask for tests on the above PR

<ato> ScribeNick: ato

brrian: This doesn’t have a scroll wheel.

jgraham: It seems totally reasonable for the driver to complain if the scroll wheel is not supported on the platform.

<JohnJansen> (brrian was pointing to his phone re: no mouse)

https://developer.mozilla.org/en-US/docs/Web/API/Element/wheel_event

ato: "wheel" event is being emulated on macOS.

brrian: There’s no way to scroll on these devices using WebDriver.

jgraham: Is the proposal that we also have a generic scroll API?

simonstewart: Last year we did the scrollToElement in the middle of an action chain,
... scrollIntoView in the middle of actions.

<AutomatedTester> scribenick: AutomatedTester

brrian: we are going to need a mechanism that allows us to scroll

jgraham: yes, we agreed on this last year that would allow us to move to an element and we can move that

ato: yes, but no one has actually specified that

jgraham: there is the argument: why do a scroll wheel rather than just a normal scroll?

ato: we need both so that we can have the wheel DOM events

RESOLUTION: create scroll as discussed last year as well as item for something that gives off wheel events

<ato> ScribeNick: ato

Automating PWAs or other "site as app" web content

JohnJansen: We have large teams around building PWAs, like Twitter.
... Testing these things isn’t just about testing the DOM as WebDriver exposes it.
... But also things like service worker.
... It spans in a weird way the area between the web platform and other things around you.
... Is testing service worker in WebDriver in scope? Or should we look into other solutions?

simonstewart: We have alluded to it in our earlier bi-di discussion, that automating different execution contexts and JS realms are in scope.

AutomatedTester: What would be different from testing Outlook as a PWA, as opposed to Outlook in a browser? What are the expectations that would be different?

JohnJansen: There’s no address bar, the frame of the browser is different, and the features that PWAs access could be in service workers which we can’t yet access from WebDriver.
... Media queries would be interesting.
... They could be, for example, completely full screen.

bwald_: They are likely to be productivity apps, they might be using the native file system.
... There’s no way for WebDriver to mock these things so the page thinks it’s interacting with a real file system.

AutomatedTester: Take files as an example: are those APIs only going to be available to PWAs and not websites?

JohnJansen: There’s talk about extending the Permissions API to cover PWAs.

ato: Permissions API already has a WebDriver extension, and it has an implementation in Chrome.
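
[Editorial sketch, not part of the discussion: the WebDriver extension mentioned here is a command-response endpoint. The snippet below only builds the request a client might send. The endpoint path and the `descriptor` / `state` parameter names follow the editor's recollection of the Permissions spec's automation section and should be treated as assumptions to check against that spec. The helper name `set_permission_request` is hypothetical.]

```python
import json

# Assumed set of permission states; verify against the Permissions spec.
VALID_STATES = ("granted", "denied", "prompt")


def set_permission_request(session_id, name, state):
    """Return (method, path, body) for a Set Permission-style command."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown permission state: {state}")
    path = f"/session/{session_id}/permissions"
    body = json.dumps({"descriptor": {"name": name}, "state": state})
    return "POST", path, body
```

[For example, `set_permission_request("abc", "geolocation", "granted")` produces a POST to `/session/abc/permissions`. Nothing is sent over the network here.]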

bwald_: Let’s say you use this API to grant geolocation to a page.
... Why grant it permission if it can’t interact with it?
... Are there external tools that drive these tools?

JohnChen: A website wants to access my web cam, a popup will ask me if I want to grant permissions.

jgraham: In bi-di you would get an event that this popup appeared.

ato: In WebDriver you have this strange API similar to the unhandled prompt behaviour.

CalebRouleau: But we don’t have a way to mock out the devices at all.

bwald_: File API lets you access the native file system and this might pop up a native widget for selecting a file.
... This is not automatable with WebDriver.

jgraham: We can do interaction with <input type=file> but it’s just very hard to model this in a nice way over a command-response based API.
... Let’s hope we get bi-di before we really need to implement this.

AutomatedTester: Conclusion is that this falls within the scope of the WG.
... Making sure that people actually use us for wider review (horizontal review) whenever these changes are coming up, especially for Fugu, is important.

JohnJansen: I didn’t know that the Permissions WG had extended WebDriver.
... Is there a reference from WebDriver to their spec?

ato: No, the relationship is the other way around.

<jgraham> https://github.com/w3c/webdriver/pull/1410#issuecomment-533424647

<scribe> ScribeNick: ato

<boaz> scribenick: boaz

Guidelines for Spec Authors

reillyg: I work on chrome on apis like web bluetooth and usb. These apis are difficult to test because they don't follow the normal web testing workflow. They rely on a hardware device and need a mock of that. Or if you are web developer, you might have real hardware and want to use web driver to drive your app with the peripheral attached.
... I struggle with adding web driver commands because I have to land patches on 7 places.

AutomatedTester: are your spec prose changes to webdriver, or to an automation section of an existing spec?

reillyg: an existing automation section, now that I am aware of that.

AutomatedTester: now that you have extension points, is there anything else that we can make simpler?

reillyg: plumbing a new command through web driver layers of the web driver spec, cdp, chrom/saf/ff-driver, etc.
... anything we could do to reduce the number of places would be great, and any guidelines that you could offer would be helpful.

AutomatedTester: I think some of this is browser specific.
... for a spec author, we want to focus on how easy it is to write spec prose. and potentially any plumbing that may need to change to the client. what you need to do in your own browser is somewhat orthogonal.

reillyg: when I write my spec I can take my idl and copy/paste that into a file in my browser codebase and then there are tools I have to plumb the interfaces into my engine.
... I'd like to see a similar tool for my chromedriver code.

brrian: is there anything else reducible about this?

reillyg: I'm complaining about how many projects I have to touch.

simonstewart: you should only need to define the endpoints, and put it into one implementation

jgraham: it is legitimate that we don't have a lot of idl and you need to do a lot of extra work around that. I don't think having that would help you write tests.

brrian: I could make an ingester
... but it hasn't been a pain point

jgraham: I think it is a bit of a pain point.

brrian: it's not a pain point because there are ~12 endpoints, and not 100s of js apis

ato: the concrete feedback to this working group is that it is daunting to add new web driver commands, and we should take that seriously
... historically it has been difficult to map things you want to test onto the req/resp flow. we are now working on a bidirectional protocol. I think this will make it a lot easier to add extensions for specs that currently aren't tested.
... permissions for example, we had to contort ourselves and make a weird api to fit into the current model.
... however, this is all complicated in different ways depending on the various tech stacks. chrome may be harder than safari.
... I think we should take the step to document our expectations of how people are going to use web driver
... this is connected to what I said before this topic started. we should gather these things.

brrian: I agree

CalebRouleau: we have that

ato: we could use that

CalebRouleau: good

reillyg: we only have a few dozen web driver commands because it is confusing for people

AutomatedTester: yah
... linking to prior art would be helpful. if we looked at the docs for wdspec, this would help things.

ato: and this would have solved microsoft's problem of not understanding the process.

simonstewart: is it obvious who to reach out to for help?

reillyg: well, we have a problem inside chrome, because I don't know who owns chromedriver.

CalebRouleau: he is sitting here (points to right)

simonstewart: myself and david are the webdriver editors
... we can help you
... what made this so hard?

reillyg: I will take as an action item to reach out to people in my org.

simonstewart: but what could we have done?

ato: I think we are bad at following up on issues being filed.

jgraham: I think there are also some issues with extensions.
... also, wrt to web idl, I think there is a possible opportunity for using a schema language when we write down bidi

<simonstewart> https://swagger.io/specification/

<jgraham> https://json-schema.org/

RESOLUTION: research having a more formalized schema for defining the transport layer
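
[Editorial sketch, not part of the discussion: to make the resolution concrete, a schema-described command message could be checked mechanically, in the spirit of the JSON Schema link above. The message shape (`id`, `method`, `params`), the `page.navigate` method name, and the tiny `validate` helper are all hypothetical. A real effort would more likely use an existing JSON Schema or OpenAPI validator than hand-rolled checks.]

```python
# Hypothetical command schema written as a JSON Schema-like dictionary.
COMMAND_SCHEMA = {
    "type": "object",
    "required": ["id", "method", "params"],
    "properties": {
        "id": {"type": "integer"},
        "method": {"type": "string"},
        "params": {"type": "object"},
    },
}

# Only the handful of types this sketch needs.
_TYPES = {"object": dict, "string": str, "integer": int}


def validate(message, schema=COMMAND_SCHEMA):
    """Return a list of error strings; an empty list means the message is valid."""
    if not isinstance(message, dict):
        return ["message must be an object"]
    errors = []
    for key in schema["required"]:
        if key not in message:
            errors.append(f"missing required property: {key}")
    for key, rule in schema["properties"].items():
        if key not in message:
            continue
        expected = _TYPES[rule["type"]]
        value = message[key]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{key} must be of type {rule['type']}")
    return errors
```

[With one schema file per command, client bindings and driver-side argument checks could in principle be generated instead of written by hand in each project.]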

reillyg: there is also this issue where web driver is overloaded, both web driver and selenium, and I don't know if I need to add my work to both, and it would be nice to have tools for this

ato: in the interest of making progress, what we should do

<scribe> ACTION: cb to draft some high level documentation containing who is owning the driver, how to add wpt tests, putting the web driver api we have now and putting it in swagger yaml and publish it so it can be consumable

ato: I think the most important thing for us to do in terms of our expectations is to say what web driver cannot do

CalebRouleau: are you saying you would have discouraged permissions?

ato: if bi-di were a serious conversation when permissions was happening, I would have discouraged it.

<jgraham> close the queue
