Browser Tools- and Testing WG, Day 1, TPAC 2019, Fukuoka -- 19 Sep 2019

Boaz: there's another agenda topic: articulating a workflow or set of guidelines for how to use the interfaces to create test materials
... specifically for browser vendors
... propose discussing that for Friday afternoon

AutomatedTester: lunch 12:30-1:30, break 3:30 - 4, at which point we do the Aria Driver demo

jgraham: "bi-di" is short for "bi-directional WebDriver"
... 16:30 should start Shadow DOM discussion
... first thing Friday will be long-running session

simonstewart: Custom Selectors will be a 30 minute discussion

<AutomatedTester> https://docs.google.com/document/d/1gUm7Be-akW2-4mjr15cnZlzwoAfOlfL7b3tWCDrb1Jg/edit#

Boaz: there is interest in creating bi-directional communication functionality added to the spec

AutomatedTester: 3 elements: use cases, transports, APIs
... this is not standardizing on CDP, this is facilitating test automation
... frame use cases around that principle, we won't get bogged down in historical implementations
... focus on what this group needs for a bi-di tool
... starting with use cases, we have input from Sauce Labs and others

<JohnJansen> https://mit.webex.com/mit/j.php?MTID=mf0c6a95eedfa61b9e4d6dbdc08e3798a

jgraham: [explains how to use the queue for speakers]

simonstewart: use cases: people use Cypress and Puppeteer in certain ways that should inform this feature
... the ability to wait for an event in the DOM
... being notified of those events allows stable tests
... logging what's going on in the browser, including console and JS errors
... people really like to fail tests on ANY JS error, which they do by loading a page and executing a script, but which can cause race conditions
... also, stubbing out back-ends
... people are trying to record traffic, then simulate the back-end operations. Supporting that would help
... CDP gives you a full-page screenshot option
... [4 use cases total]

JohnChen: people like the features of Puppeteer, and want those features in WebDriver
... e.g. intercepting HTTP requests, and modifying them dynamically

CalebRouleau: is that the same as Simon's point about stubbing back-end requests

jgraham: it's the same thing

<simonstewart> It's a better formulation of what I said

JohnChen: users need to be able to get notified without having to poll the page

cb: at Sauce Labs, we have the ability to bypass chromedriver, which allows us to support custom commands (throttling CPU, simulating performance issues, etc)
... SauceLabs performance product also requires these internal accesses, and incorporating that ability into the protocol would help greatly
... meaningful error messages, access to internal devtool protocols, all would be helpful for customers to be able to write better tests

jgraham: is it important to SauceLabs that you can get the same profiling data out of all browsers, or can these things be more browser-specific?
... FF has different data from Chrome, which greatly complicates the possibility to do this

cb: we just need to get the same kind of information, understanding that it will be different
... we need to get as much information about the AUT as possible, no matter how we can get it (page load timings, logs, etc)

jgraham: for these introspection APIs, these use cases *are* satisfied even though there isn't uniformity among the browsers

ato: one more use case: from WPT, a question came up
... in order to be able to write good tests for the browsers, functionality could be exposed in this manner that would help browser developers as well as end users

jgraham: do you mean event-based user interactions?

ato: there's an interest from spec authors who want to expose bi-di APIs, which is currently made difficult by WebDriver

jgraham: there are other constraints here: some of these "mocking" features are very difficult to implement in terms of a command-response protocol

Boaz: I've invited Reilly [from the Chrome team] to discuss this particular thing Friday afternoon
... this will allow us to state these use cases to one of the most important stakeholders

jgraham: to ato: is this for developer ergonomics for the browser developers, or are they being prevented from developing these things?

ato: one example is the Gutenberg project, which is a test suite--this should be used as a target for the kinds of features being asked for

<jgraham> https://github.com/WordPress/gutenberg/tree/master/packages/e2e-tests/specs

ato: the test suite isn't doing anything particularly challenging, but it would be a different programming model. There's a desire from modern web developers to have an API that allows async comms
... the one thing the test suite does, where WD would be helpful, is in the cases of complicated keyboard interactions
... these cases would be better expressed in WebDriver
... it exposes and surfaces browser-specific interactions

cb: for SauceLabs customers, bi-di mechanism would allow for much better state management of tests

simonstewart: that's risky, because so many things can go wrong

cb: agreed, but those things that can go wrong are commonly network-based

ato: another use case: people writing clients for automation could benefit from other features, e.g. dynamic changes to iFrame or documents
... these events are important when writing clients
... another use is performance logs: people need to know about the performance logs for internal timings
... these things are inherently browser-specific, and they already exist

AutomatedTester: moving on, to simonstewart

simonstewart: summary of use cases: log (performance, console, javascript errors), network interception, and stubbing out req/res, mutation/observation in waiting for events or new contexts

jgraham: we should not assume this is the complete set of use cases, and we should get more data from other tooling

Boaz: we should also consider web-USB and others
... as well as mutation/observation of non-browser features, e.g. Geolocation

ato: these things are not dependent on a bi-di comm interface

CalebRouleau: there is probably overlap of these use cases

simonstewart: bi-di comms expose 2 common patterns: 1 request, I want 1 response, or "I'm registering a listener, so give me responses as they come up"
... registering for a listener is "network interception"
... e.g. "I want to click a button" or "I want to type something"
... people usually set up a web socket as a long-running connection (usually on localhost only)
... this normally sets up the web socket
... when not on localhost, we introduce risks around corporate networks
... need a mechanism for reconnecting and being notified of what was missed (in the case of complicated, corporate networks)
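
A minimal sketch of the two patterns simonstewart describes (one request with one response, and a registered listener that receives messages as they come up). The JSON message shape, the `BidiDispatcher` name, and the method and event names are assumptions made for illustration; none of them were specified in the discussion, and no real transport is opened here.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List


class BidiDispatcher:
    """Routes incoming messages to a pending command or to event listeners."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, Callable[[Any], None]] = {}
        self._listeners: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def send_command(self, method: str, params: dict,
                     on_result: Callable[[Any], None]) -> str:
        """Pattern 1: one request, one response, matched by a numeric id."""
        self._next_id += 1
        self._pending[self._next_id] = on_result
        # The returned string is what would be written to the connection.
        return json.dumps({"id": self._next_id, "method": method, "params": params})

    def subscribe(self, event: str, listener: Callable[[Any], None]) -> None:
        """Pattern 2: register a listener that may fire any number of times."""
        self._listeners[event].append(listener)

    def receive(self, raw: str) -> None:
        """Handle one message read from the connection."""
        message = json.loads(raw)
        if "id" in message:
            # A response: deliver it to the command that is waiting for it.
            callback = self._pending.pop(message["id"], None)
            if callback is not None:
                callback(message.get("result"))
        else:
            # An event: fan it out to every listener registered for it.
            for listener in self._listeners[message["method"]]:
                listener(message.get("params"))
```

A response is consumed once and its pending entry is dropped, while a listener stays registered. Reconnection and replay of missed events are deliberately left out of this sketch.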

jgraham: feels like a lot of scope creep in the context of what exists
... buffering up these things won't work well. Listening to network requests adds risk
... timeouts are a risk
... need to hear that this is super-important from the most prolific users before we make the browser responsible for caching all these requests

cb: the VM will shut down if the request times out

jgraham: is network flakiness such a big problem that we need to alter the protocol? do we need to abstract over that with an event buffer to keep the integrity of the comms in place to that extent?

cb: we can always build an abstraction around it that doesn't need to be part of the protocol

jgraham: not sure about the web socket issue, but would like to know whether this body considers that to be a risky or unstable thing to rely on

simonstewart: HTTP2 server push is an alternative

<simonstewart> Server push

MikeSmith: WebSockets are essentially deprecated at this point

ato: we are jumping ahead of the discussion - many assumptions going on about how the protocols work
... we've agreed that having a bi-di mechanism is good, but we are constrained by the existing protocols that are out there
... designing a new one from scratch seems silly
... I'd like to see a good x-browser automation protocol, designed from scratch, which covers all use cases, which fixes the problems in the existing protocols, but that's not possible at this point
... SauceLabs says they want to expose the existing internals (CDP/Puppeteer, etc), but not all these protocols would support the kind of embedding necessary for the "resumption of session" that simonstewart wanted

simonstewart: one of the problems we had developing the WD standard is the constant improvement/change we had during the process

jgraham: I believe SauceLabs didn't say they wanted those features--they're already doing them

ato: this isn't about CDP specifically, even though that's what it sounds like when I listen to this body
... CDP does support session resuming, web socket connections, etc... by going in to that conversation, we're jumping ahead... we're adding constraints
... I would like to know the opinions of the browser vendors to know what the outcome should be
... need more clarity as to what exactly would be *in* that web socket, if the web socket is exposed

CalebRouleau: we want to think about what's currently there with CDP
... we don't want to implement something from scratch in the Chrome Team. we have a lot of stuff that already depends on CDP and we don't want to support both, and need the new feature set to be at least somewhat compatible with what we've done already
... we don't want WD to be overly complicated

simonstewart: we should take the caching of things off the table for now

<JohnJansen> +1 to simonstewart taking caching off the table

drousso: Apple's position is that we want to see how this will work before we sign off. Whether it's WebSocket or HTTP2, it's a rewrite for us
... so if we're going to do it, we need to know more before we can commit

jgraham: are you saying you're interested, but with no plans to implement?

brrian: we don't currently allow for either Web Sockets or HTTP2, and this would represent a significant spending of resources

CalebRouleau: how does the CDP work for Safari, in detail?

simonstewart: the Safari implementation of the debug protocol is currently locked down

CalebRouleau: I'm trying to ask what is publicly available, but it sounds like it's basically nothing

brrian: we need specific questions and specific feature requests in order to get this conversation going for Safari

jgraham: at some point we need a clear decision about the debug protocol, some entry criteria... but as long as we can agree on the messaging itself, that will be sufficient

<AutomatedTester> scribenick: AutomatedTester

cb: the SauceLabs major use case is being able to access the browser internals to surface that to people

CalebRouleau: you don't care about the transport, you just want access

cb: yes

jgraham: for Mozilla, we think there should be a cross-browser protocol that facilitates automation. We need to be aware that a move to something like Puppeteer is a threat to the web, since it only covers 1 browser
... There has been some discussion around reusing CDP. One of the historic complaints is that CDP is unstable and only gives us certain things that are Chrome-specific. We would need to see what is stable and how we can solve those problems

simonstewart: and this is why we focused on use cases

jgraham: yes, my point is that there is implicit stability from the APIs that use CDP

brrian: it might seem great to expose the API and just add to it. That is a bad idea. We need to make sure that we have versioning, and even with that we break things all the time.
... it is very difficult to test at the moment
... whatever we do, we need to make sure it is testable and that we don't rely on browser internals
... in webkit we have processes that come and go and our tools should not have to follow that
... we are going to need some compat layer to figure this out.
... [discusses examples of where webdriver and a tool might look different]
... as for JSON RPC, we have used it for 11 years and it is solid, so I don't think we need to change that

jgraham: is this very different to what the chrome engineers have said ?

brrian: no, this is not going to be a lot of work, it's just going to need to be well tested

drousso: one thing I would caution against: a lot of things have browser leakage, and we need to make sure that browser implementation info is not bled out

CalebRouleau: there are a lot of issues, like site isolation; we need to make sure we are a level above that

bwalderman: we need to work top down from use cases like brrian suggested
... and it might be better to keep them separate from the debug protocols out there

AutomatedTester: are you suggesting 2 connections into the browser or that the shim would handle 2 and then do magic into the browser

bwalderman: it would be in the shim

AutomatedTester: good as there could be an attack vector into the browser if there are 2 and we need to be aware

jgraham: ok but we need to make sure that the connection can handle that
... and that the shaping of packets can handle it
... we have seen in Gecko that there are cases where this doesn't work. We are constrained by what the low-level remote protocols can handle.

bwalderman: I am not suggesting that we write from scratch

ato: is it reasonable to have something that maps down to other protocols?

<karl> Context about what ato is saying http://operasoftware.github.io/scope-interface/

<karl> https://dev.opera.com/blog/opera-scope-protocol-specification-released/

ato: we have seen multiple versions and there have been attempts to standardise
... we need to make sure we agree on what "protocol" actually means and what "transport layer" actually means

CalebRouleau: the new devtools team are interested in standards

<jgraham> Zakim: close the queue

JohnChen: we have been experimenting with webdriver and upgrade to a bidi connection via CDP

<jgraham> close the queue

brrian: as far as 2 protocols... for security... we have already 2 and they enter the stack at 2 different points.
... we have mechanisms to protect the user now
... [explains how a bad actor could attack]

<karl> things from the past https://github.com/WICG/devtools-protocol

brrian: and there are people writing a lot of adaptors for VSCode and they don't think that it is a lot of work

LUNCH

<jgraham> reopen the queue

<Hexcles> https://gist.github.com/Hexcles/69f44b94aa616981a564efff11e5f4bb

<mmerrell> jgraham: change in the agenda

<mmerrell> scribenick: mmerrell

jgraham: move the continuing discussion about bi-di to Friday morning

<JohnJansen> https://docs.google.com/document/d/1eJx437A9vKyngOQ49lYYD3GspDUwZ6KpKDgcE2eR00g/edit

jgraham: homework needs to be done, where Google team proposes beginnings of an outline for the bi-di protocol

CalebRouleau: it's not clear that we should start with load event--we should start with something else in order to satisfy the 3 other use cases

simonstewart: one clear thing is that we're going to re-hash bi-di on top of WD, which won't satisfy all needs
... need to put building blocks in place to move forward

CalebRouleau: to JohnChen: could you write up a proposal

JohnChen: today, you can't do these use cases with WD, but you can do some amount of "loading"

simonstewart: agree with jgraham: this will allow the network proposal to move forward

CalebRouleau: maybe simonstewart can come up with the network proposal while Google team works with jgraham to work on a proposal for Loading

jgraham: the loading proposal will look like CDP without the target stuff
... "how do I address a loading context?"
... not everyone has to participate in the side meeting, but it should be open
... if we're not making progress on loading, we can address later

AutomatedTester: moving bi-di discussion to Friday morning. Agenda has been updated
... need to start with custom selector strategies, then shadow DOM, then break, then ARIA driver, then the laundry list of other items on the list
... (the non-contentious ones)

[general assent ensued]

Custom selector strategies

AutomatedTester: from cb: custom selectors

cb: objective is to help user of WD protocol to automate tests for new frameworks like React/Vue
... friction is that new frameworks are built on browser features that aren't mapped well to WebDriver element locators
... automation engineers are having trouble getting to some of these new kinds of elements. Protocol needs to be extended to help locate these kinds of elements
... 2 proposals: drivers can share libraries of atoms. WDio can fetch elements by property name or by component property, taking advantage of shared components in new frameworks. Risk is that this creates organizational overhead for application teams
... other option is "custom selector strategy", allowing vendors to be able to intercept selectors and be smarter about actually locating the element. Advantage here is that there's no new work for this body. Downside is that implementations could be so different that there's more overhead for browser vendors, and behavior won't be standardized
... alternative is that users could register scripts that would collate and cache selectors and selector scripts to avoid unnecessary wire calls

<ato> q

titusfortner: this would allow the driver to operate much more quickly. TestCafe allows location by component, not just CSS. Don't know about implementation, but we need to know if all browsers work the same wrt WDio?

cb: yes it does--the abstraction goes through the virtual DOM

jgraham: bad idea to try to standardize on large chunks of JS atoms. it's not something that will be future-proof: web frameworks come and go, and standardized javascript ends up being brittle
... we should either double-down on vendor-specific extensions to the protocol, or register JS scripts (caching) to improve performance

titusfortner: this wouldn't be limited to just locators--it could be any javascript

cb: yes, it could be

ato: question: can you explain why this is needed?
... do you mean to register particular methods?

cb: yes

ato: these frameworks manipulate the DOM in a way that makes it difficult to just use CSS selector?

cb: yes--React developers write these components using more dynamic logic, and the QA engineer has trouble defining the locator

titusfortner: but it's not the QA eng we're concerned with. This is for developers

simonstewart: there's something here, about registering javascript snippets, which would return a handle. Use cases: good points about React and Vue, compiled and inflated, produced by the client and uploaded. New "friendly" locators in Selenium as well as Watir's JS atoms, all upload these snippets, which becomes chatty and expensive really quickly
... you don't want to inject these scripts on every page load--we want to do this per-session. This would amount to an overloaded version of the JavascriptExecutor
... the user-facing API would make it look like a selector, but underneath it's using the same handler, parameterized
... this will end up being a client concern--not a WD spec concern
... this will be hard to spec out

jgraham: it won't be that hard

ato: the latter proposal isn't too invasive--it makes sense to register scripts and allow devs to inject a JS library and re-use it per-session, but these solutions are so different that they address different problems
... the selector API needs to be locked down very tightly
... you'd want a capability to pre-register a selector and identify them by a string (the key in a hash), passed in the body of a function... it would be a wrapper accepting one argument, which would return the locator
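
A minimal sketch of the pre-registration idea ato outlines: scripts are stored as text under a string key for the lifetime of a session, and a later call by name is turned into an ordinary Execute Script body (`script` plus `args`). The `PinnedScripts` class, the `by-test-id` key, and the `data-test-id` attribute are hypothetical names chosen for illustration; the group had not agreed on any endpoint or storage model at this point.

```python
from typing import Any, Dict, List


class PinnedScripts:
    """Models a per-session map from a string key to JS function-body text."""

    def __init__(self) -> None:
        self._sources: Dict[str, str] = {}

    def register(self, name: str, function_body: str) -> None:
        # Only text is stored here; nothing runs until the script is called.
        self._sources[name] = function_body

    def execute_payload(self, name: str, args: List[Any]) -> Dict[str, Any]:
        """Build an Execute Script request body for a previously pinned script."""
        if name not in self._sources:
            raise KeyError(f"no script pinned under {name!r}")
        return {"script": self._sources[name], "args": args}


# A wrapper accepting one argument and returning the located element.
scripts = PinnedScripts()
scripts.register(
    "by-test-id",
    "return document.querySelector("
    "'[data-test-id=' + JSON.stringify(arguments[0]) + ']');",
)
payload = scripts.execute_payload("by-test-id", ["save-button"])
```

Once the source is pinned, only the key and the argument would need to cross the wire on each lookup, which is the chattiness concern raised above.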

titusfortner: is there a distinction between using a custom selector and pre-registering the scripts? how generic is the current mechanism?
... should we have a generic "driver registration", and then have a component-based registry, all as part of the spec?

simonstewart: if you have this registry for any of the javascript, you could use it for any of these use cases

titusfortner: would it make sense to explicitly make it generic?

simonstewart: you want the remote end to be able to have non-Javascript locators
... Jason Arbon demonstrated an ML-based element locator that wasn't Javascript
... but in most cases it will be JS

jgraham: we should pin this down to stuff that can be implemented in JS

simonstewart: yes, agreed

jgraham: we shouldn't allow scope creep, with all that's going on, to include non-JavaScript features into the "pinning" (registering) piece we're talking about

drousso: fundamentally, this isn't about pinning or caching--it's minimizing the chattiness of the comms
... this isn't fundamentally a concern of the WebDriver spec. This is about implementation, and adding this to the spec would add more questions and opens the door for complications

diervo: we have a huge group of web components. From our POV, we shouldn't worry about the spec for locators, we should be able to register anything before the session
... having this ability would foster custom elements, and would allow for advanced traversal, etc.
... we would like syntactic sugar for being able to select things with custom commands

simonstewart: this is, at its heart, JavaScript, though, right?

diervo: yes

cb: this proposal fosters the sharing of code, even when items are embedded in the code and in different places. This would allow shareable atoms

drousso: but you can already do that--you don't need to change the spec in order to make this happen. You can code it elsewhere

simonstewart: there's a difference between what browser developers need and what execution vendors need

[fast discussion about caching, sandboxing, bootstrapping ensues, from which no point arose]

bwalderman: if we're going to add this ability, we should also allow messages to pass between the client and the script
... it would be interesting to register these events. It would help users, but is a little outside the scope of custom selectors
... it's more in the scope of the bi-di discussion

ato: there's an alternative to consider--JS frameworks are sensitive to DOM modification [gives examples]
... there's risk in registering these snippets to the global window. It could result in race conditions (?) and other issues with dynamic changing of page contents when multiple regions of the page are affected

jgraham: qq: the proposal was to dump text into a map, which you can then later pull it out... the proposal was not to have live JS scripts executing at will within the browser. This would present a large interaction/security risk. The intent is to optimize performance, not to enhance feature function or to execute scripts different than is currently done

ato: not quite--the concern was about maintainability of these functions. The client should be able to pre-register a function, but the concern is that there will be a maintenance burden
... maybe the browser could be pre-loaded with a webdriver running?

cb: if we could pin these JS snippets to the session, we could provide these features easily. SL customers would benefit from these features, both for performance purposes as well as test stability

ato: wouldn't it be better if the React devs owned the selector?

cb: that's not important--we just need to be able to support the feature?

ato: it's not future-proof--old versions of React won't work the same way as current ones, and that will be a continuing risk to any project implementing this

simonstewart: agreed

jgraham: we shouldn't be discussing SL product strategy here

simonstewart: but pinning a script still makes sense
... it's a useful feature, which could be implemented in many ways... as a bootstrap script, this would alter the AUT, but it could also be put into the driver process, and that would allow better performance
... you need 2 endpoints: upload and call

jgraham: we already have call
... call could be used for this

titusfortner: what's the downside of having another endpoint?

<JohnJansen> isn't this the proposal?

<JohnJansen> https://docs.google.com/document/d/13ycIhXJxoCq0K6ti10VpFp_l9hLtTFrx941uC_aSwfk/edit

simonstewart: we shouldn't need another endpoint--but anyway, we're arguing over how much it will cost, not whether or not we need it, so it sounds like we've made our decision

<simonstewart> That proposal needs clearing up to talk about script pinning

JohnChen: this script will still need to be sent to Chrome, no matter what. There might not be a real way, at least in chrome, to pin these scripts in a way that will be markedly improved wrt performance
... the selector pinning strategy as proposed will be exceedingly difficult to implement due to how the already-huge chunks of JS are stored in Chrome
... we can store these snippets, but it will be difficult, and it won't help that much

jgraham: this would supplant the existing JS around "findElement"

JohnChen: not in every case. It varies depending on how many elements you're looking for, and where they're located in relation to each other. There's a way to do this, but we shouldn't try to merge this with existing findElement--we should offer a mechanism for uploading JS that would replace it

titusfortner: this might not need to be part of the spec

brrian: if these atoms are used in testing, and used in the app, can't the app include a snippet that would "assist" with locating the element?

simonstewart: quite often people minify their JS, which mangles the ability for the app to display the same element that is stored in the app
... testers are usually separate from the developers and have no influence there
... testers can't have the locators changed to something more friendly

mmerrell: the wall between devs and testers is the norm, not the exception

<AutomatedTester> scribenick: AutomatedTester

brrian: this is not going to make things faster

titusfortner: yes, but this is for larger connections, where we are sending lots of data all the time

<brrian> ... from safaridriver onward (to devices, other apps, etc)

brrian: why are we caring about people perf, they should

titusfortner: well a lot of people are doing things that are silly like they are sending 20 commands for finding 1 element

<scribe> ACTION: Saucelabs to write a proposal and send it to the group

<ato> CalebRouleau: ++

ShadowDOM Support

https://github.com/w3c/webdriver/pull/1320

<ato> ScribeNick: ato

AutomatedTester: We have discussed this before, briefly.

<brrian> 👆

AutomatedTester: I had a proposal in a PR from before.
... We need to think of a process, and happy to throw away this PR and start from scratch.
... Shadow DOM has interesting problems.
... We need to be able to interact with it, and there are two types of nodes.
... Sometime you can see into these nodes, and sometimes they operate as black boxes.
... Frameworks like React are looking into using Shadow DOM, and we need to support this use case otherwise it will be hard to automate.

<simonstewart> cb: use "/msg" to send a private message :)

jgraham: You have a shadow host element, which conceals a shadow DOM.
... If you want to go into the shadow DOM.
... Two shadow DOM host elements: open and closed.
... Theoretically with open elements the inner elements are exposed to JS.
... You can polyfill this today with JS.
... You can Execute Script on the DOM property that gives you the shadow root, and from there you have access to the tree below that to manipulate them further.
... My proposal is that we hoist that into a WebDriver endpoint.
... This should also work for closed shadow DOM trees.
... If you want to pierce the encapsulation field for testing, that is possible to do from content.
... DevTools allows you to pierce it.
... "element/shadow" could return a shadow host for that element.
... The alternative is for Find Element to take an extra parameter: but instead of returning the element, it would return the shadow DOM root inside that element.
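
A minimal sketch of the Execute Script workaround jgraham mentions, assuming an open shadow root (for a closed one, `shadowRoot` is null in content). It only builds the request description and does not talk to a browser. The function name and the returned dictionary layout are hypothetical; the `/execute/sync` path, the `script` and `args` fields, and the web element reference key come from the WebDriver specification.

```python
from typing import Any, Dict

# Key the WebDriver spec uses when serialising a web element reference.
WEB_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_in_shadow_request(session_id: str, host_element_id: str,
                           css_selector: str) -> Dict[str, Any]:
    """Describe an Execute Script call that searches inside a host's shadow root."""
    return {
        "method": "POST",
        "path": f"/session/{session_id}/execute/sync",
        "body": {
            # arguments[0] is the shadow host, arguments[1] the CSS selector.
            "script": "return arguments[0].shadowRoot.querySelector(arguments[1]);",
            "args": [{WEB_ELEMENT_KEY: host_element_id}, css_selector],
        },
    }
```

Hoisting this into a dedicated endpoint, as proposed, would remove the need for each client to ship its own script.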

lukebjerring: Can I see the original proposal?

jgraham: The original proposal worked like frames, that you’d have to switch into the shadow DOM.
... There are however a few things that work for frames that may not work for shadow DOM, so I don’t think it’s entirely workable.

AutomatedTester: When you want to click on something, you need to make sure you’re in the right space.
... For example, inside the <video> element you’d want the play button [element].

diervo: At Salesforce we will soon have developers working on web components.
... In Chrome and Safari it will throw when you click.

jgraham: Why would you click on the shadow root element?

diervo: You call a selector to find this, and people are sometimes confused.

<diervo> rniwa

rniwa: Shadow DOM is a weird parallel tree.
... It replaces the appearance of the shadow root element.
... A click will pierce down to the right element inside the shadow DOM.
... What will not work is elementFromPoint.
... If you focus or click through the browser’s normal event queue things will work.
... What will not work is the JS primitives available to you for the test tools.

lukebjerring: :next() selector on the shadow root is prone to bugs.
... Does this proposal include prose on piercing the shadow DOM through CSS selectors?

[?]

scribe: I.e. a parameter could define whether you want to pierce it.

<AutomatedTester> https://github.com/w3c/webcomponents/issues/771

rniwa: Historically there has been a proposal to have generic shadow-piercing combinators.
... Every use of that was bad, especially for performance.
... We don’t want to have this, was rejected by Google.
... But _specifically_ for WebDriver you could consider having this.
... A :descendants() selector could potentially be made to pierce the tree, specifically for WebDriver.

jgraham: Find Element From Element, where the root is the shadow DOM root element would be the easier option.
... Implementing a new CSS selector is more work.
... There is utility to have WebDriver-only CSS selectors for use in e.g. Execute Script and CSS selectors, but we should take baby steps and do the thing that is easy to do today.

pmdartus: We are facing huge, nasty shadow trees. Can go to the depth of eight levels.

<diervo> https://gist.github.com/diervo/7ce4437bde4a382679b22306af9b5b6c

pmdartus: We have worked around this by using page objects.
... You create abstraction that deals with the shadow DOM traversal for you.
... This is more resilient than using a selector to one element within the shadow tree.
... We see some value in the piercing shadow tree selector, but it wouldn’t be immediately applicable to us.

<diervo> Here is our utility for shadow dom:

<diervo> https://gist.github.com/diervo/7ce4437bde4a382679b22306af9b5b6c

lukebjerring: What I mean is that it could be solved from the client’s perspective much easier without WebDriver primitives.
... If you had shadow-piercing CSS selectors you could write compound selectors in your client code using helper functions.
... Whereas other proposal here actually requires traversal.

diervo: We are trying to force our test authors to do the right thing rather than crafting very complicated XPath selectors.

lukebjerring: Polymer translates basically to web components.

diervo: Historically people have relied on a lot of relativity in their element queries.
... This is fragile as the document structure changes and the component hierarchy changes.
... We need to strike the right balance if we introduce a selector because it can be misused.

rniwa: The shadow-piercing proposal in CSS 4 Selectors you can have a look at, but it forces you to define the boundaries of the shadow DOM.

jgraham: They have something similar in the thing diervo posted.
... The list of selectors thing is a thing we could implement in clients.
... The fundamental primitive lacking from WebDriver, is, I have an element and select the shadow root of this element if it has one.
... If there are more convenience APIs we can add in the future, then that is a second thing to build maybe later.
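
A minimal sketch of how the "list of selectors" idea could live in client code: the client turns a list of CSS selectors into one JS function body that steps into a shadow root whenever the previously matched element has an open one. The `deep_query_script` name and the example selectors are hypothetical, and this is not the utility diervo linked.

```python
import json
from typing import List


def deep_query_script(selectors: List[str]) -> str:
    """Build a JS function body that resolves each selector in turn,
    searching inside the previous match's shadow root when one exists."""
    return (
        "let node = document;"
        # json.dumps yields a valid JS array literal of string selectors.
        f"for (const selector of {json.dumps(selectors)}) {{"
        "  const scope = node.shadowRoot || node;"
        "  node = scope.querySelector(selector);"
        "  if (node === null) { return null; }"
        "}"
        "return node;"
    )


# One selector per level: host element first, then a node inside its tree.
script = deep_query_script(["my-app", "my-toolbar", "button.save"])
```

The resulting string could be sent through Execute Script as a single call, so the traversal costs one round trip regardless of depth. It cannot see into closed shadow roots, which is why a WebDriver-level primitive is still needed.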

diervo: Will this work for closed mode?

ato: Yes.

rniwa: Does the script WebDriver injects run in the same realm as the page does?

jgraham: If you can pass back a reference to an element inside the shadow DOM, then you can interact with it.
... Because it all lives in the same JS realm.

rniwa: WebKit has the ability to expose closed shadow DOMs as open.
... We have the ability to make the closed root open [under certain circumstances].

jgraham: The most straightforward approach would be for Find Element to return the shadow DOM root element if the found element is a shadow host element.

diervo: That sounds like what we have done.

jgraham: Potentially we have agreed here.

RESOLUTION: create a "Get Element Shadow" endpoint that takes the handle for an element and returns a handle for the associated content-defined shadow root, if any, or null if not

We will figure out the details later, but the intention is for the above to work on closed roots.
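
A rough sketch of the semantics resolved above, modelled in memory rather than over HTTP. The `Session`, `Element` and `ShadowRoot` classes and the `get_element_shadow` method name are assumptions for illustration only; the minutes leave the details of the endpoint (URL, payload, error cases) to be worked out later.

```python
import uuid
from typing import Dict, Optional


class ShadowRoot:
    def __init__(self, mode: str) -> None:
        self.mode = mode  # "open" or "closed"


class Element:
    def __init__(self, shadow_root: Optional[ShadowRoot] = None) -> None:
        self.shadow_root = shadow_root


class Session:
    """Toy session that hands out opaque handles for page objects."""

    def __init__(self) -> None:
        self._handles: Dict[str, object] = {}

    def handle_for(self, obj: object) -> str:
        # Reuse the existing handle if this object has been seen before.
        for handle, known in self._handles.items():
            if known is obj:
                return handle
        handle = str(uuid.uuid4())
        self._handles[handle] = obj
        return handle

    def resolve(self, handle: str) -> object:
        return self._handles[handle]

    def get_element_shadow(self, element_handle: str) -> Optional[str]:
        """Return a handle for the element's shadow root, or None if it has none."""
        element = self._handles[element_handle]
        if not isinstance(element, Element):
            raise TypeError("handle does not refer to an element")
        if element.shadow_root is None:
            return None
        # The mode is deliberately not checked: closed roots are returned too.
        return self.handle_for(element.shadow_root)


session = Session()
closed_root = ShadowRoot(mode="closed")
host_handle = session.handle_for(Element(shadow_root=closed_root))
plain_handle = session.handle_for(Element())
```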

<AutomatedTester> https://mit.webex.com/mit/j.php?MTID=mf0c6a95eedfa61b9e4d6dbdc08e3798a

<spectranaut> https://bocoup.github.io/presentation-aria-and-webdriver/#/

ARIA & WebDriver

https://bocoup.github.io/presentation-aria-and-webdriver/#/

zcorpan: Hi, I’m Simon.
... Promoting accessibility by incentivising web authors with tools.
... The idea behind this comes from jugglinmike.
... ARIA and WebDriver are similar because they both aim to enable [?] machines.
... With WebDriver you have a script that interacts with the webpage.
... When you use WebDriver you typically work with HTML directly: you get an element by its class name, ID, or name.
... Whereas with a screen reader they work with an accessibility tree which is influenced by ARIA information from the DOM.
... There is an opportunity to have the same code path for both assistive technology interaction and WebDriver.
... There’s also an idea about assuming semantics that are required or recommended in the ARIA practices guidelines to simplify testing.
... Specifically testing accessible applications.
... ARIA Practices Guidelines is a document that non-normatively recommends how to use ARIA.
... Design patterns and such.
... "How to make a modal dialogue?"
... "How to do a button?"
... For example it recommends specific keyboard interactions.
... Tab key should move focus, Escape should close the dialogue.
... The developer has to implement this with JS to follow these guidelines.

spectranaut: Less test code, improved test stability, …?
... [role="radio"] is an ARIA-defined role.
... accessibleName() does a difficult thing.
... Instead of these complicated code patterns, we’d like to see this built into WebDriver.
... webdriver.setRadio("Thin crust")
... Improved test stability: finding elements that have ARIA popups could cause synchronisation issues with elements not being found.
... We’d like to encapsulate these ARIA checks/patterns with a command.
... For example webdriver.OpenPopup().
... This would ensure for sure that the popups are there by running the checks internally.
... Improved test resiliency: tests become brittle when they rely only on CSS selectors.
... Page objects are one mitigation, but this is just another way of mitigating brittle test writing by relying on the accessibility API.
... Testing against the accessibility tree gives more stable tests over time.
... Accessibility verified: we can add to these commands accessibility verification.
... People don’t have to know ARIA to get the benefit of these.
... For example, on a toggle button the text on the button shouldn’t change when it becomes depressed.
... We could build this check into the WebDriver extension and have WebDriver return an error when the ARIA verification check fails.
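
To illustrate the kind of command being proposed, here is a minimal sketch of what a setRadio-style helper could do internally: locate a radio by ARIA role and accessible name instead of by CSS selector, and fail when the lookup is ambiguous. The `A11yNode` model, `AriaCheckError` and `set_radio` are hypothetical names for illustration; the sketch treats all radios on the page as one group and assumes the accessible name has already been computed.

```python
from dataclasses import dataclass
from typing import List


class AriaCheckError(Exception):
    """Raised when an ARIA-based lookup or check fails (hypothetical)."""


@dataclass
class A11yNode:
    """Toy accessibility-tree node with a precomputed role and accessible name."""
    role: str
    name: str
    checked: bool = False


def set_radio(nodes: List[A11yNode], name: str) -> A11yNode:
    """Check the radio whose accessible name is `name`; uncheck the others."""
    radios = [node for node in nodes if node.role == "radio"]
    matches = [node for node in radios if node.name == name]
    if len(matches) != 1:
        raise AriaCheckError(
            f"expected exactly one radio named {name!r}, found {len(matches)}"
        )
    # Simplification: every radio is treated as belonging to the same group.
    for node in radios:
        node.checked = False
    matches[0].checked = True
    return matches[0]


page = [
    A11yNode(role="radio", name="Regular crust", checked=True),
    A11yNode(role="radio", name="Thin crust"),
    A11yNode(role="button", name="Order"),
]
selected = set_radio(page, "Thin crust")
```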

zcorpan: jugglinmike has done an implementation of this, with API documentation you can read.
... I want to quickly step through the specification that jugglinmike wrote.

<zcorpan> https://bocoup.github.io/aria-practices/aria-practices.html#automation-pushbutton

zcorpan: This is a new WebDriver command that gets the role of the element and computes it as is specified by the relevant specs.
... Could be another extension to WebDriver to get the precomputed role from the browser, but this is not part of this specification.
... If the role is not a button, it will fail. This is a type check.
... It then gets the accessible name of the element and we return an error if the string is empty because all buttons in ARIA must have non-empty names.
... If it’s a toggle button, it verifies that the state changed after pressing it.
... It then tries to ‘use’ the button.
... This is a new term that depends on the interaction mode that the tester has.
... Mouse, keyboard, touch mode.
... Different code paths when you use the button.
... It then compares the old state value with the new one.
... The accessible name should not change as the result of using a button, and it returns an error if it has.
... Then if everything went fine, we’re done.
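
The steps zcorpan walks through can be sketched roughly as follows. This is not the text of jugglinmike's specification: `ButtonModel`, `AriaCheckError` and `use_button` are hypothetical names, the element is a toy in-memory model, and "using" the button is reduced to calling an `activate` callback that stands in for the mouse, keyboard or touch code path.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AriaCheckError(Exception):
    """Raised when one of the ARIA verification steps fails (hypothetical)."""


@dataclass
class ButtonModel:
    """Toy element with a precomputed role, accessible name and pressed state."""
    role: str
    name: str
    pressed: Optional[bool] = None  # None: not a toggle button


def use_button(element: ButtonModel, activate: Callable[[ButtonModel], None]) -> None:
    # Type check: the computed role must be "button".
    if element.role != "button":
        raise AriaCheckError(f"expected role 'button', got {element.role!r}")
    # Buttons must have a non-empty accessible name.
    old_name = element.name
    if old_name == "":
        raise AriaCheckError("button has an empty accessible name")
    # Remember the state before use if this is a toggle button.
    is_toggle = element.pressed is not None
    old_state = element.pressed
    # 'Use' the button via the tester's interaction mode.
    activate(element)
    # A toggle button must change state when used.
    if is_toggle and element.pressed == old_state:
        raise AriaCheckError("toggle button state did not change")
    # The accessible name must not change as a result of using the button.
    if element.name != old_name:
        raise AriaCheckError("accessible name changed after use")


def toggle(element: ButtonModel) -> None:
    """Well-behaved page handler: flips the pressed state only."""
    element.pressed = not element.pressed


def toggle_and_rename(element: ButtonModel) -> None:
    """Badly behaved page handler: also rewrites the label."""
    element.pressed = not element.pressed
    element.name = "Unmute"


mute = ButtonModel(role="button", name="Mute", pressed=False)
use_button(mute, toggle)  # passes every check
```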

spectranaut: Some points of discussion we have are listed.
... The stale element reference could happen inside the algorithm to use the button, and we don’t know how to solve this.
... We have some concerns about the ARIA recommendation spec stability.
... Is this something that belongs in WebDriver?

simonstewart: Is there a normative spec?

spectranaut: Not exactly
... They are non-normative suggestions for how to make an accessible website.

AutomatedTester: This should not live in the WebDriver specification, because it is more about low-level tasks.
... Here many things can happen along the way.

zcorpan: I agree APG stuff should probably not be in the core WebDriver spec.
... Exposing the accessible name and the computed role is something that could be in the core WebDriver spec.
... Computation of the accessible name etc. is normatively specified.

simonstewart: If there's a normative source for it, that would be the logical place to put the WebDriver extension for it.

<AutomatedTester> scribenick: automatedtester

ato: there is a similar impl. in Firefox webdriver. You can set a capability and then get a bunch of a11y checks using the a11y tree

<ato> ScribeNick: ato

ato: I think this is a good idea.

titusfortner: What is ARIA? Where are the roles stored?
... In Watir we’ve done some work to make it possible to retrieve ARIA elements.
... Or by ARIA selectors.

zcorpan: Accessible Name and Description Computation 1.2 looks at some of the attributes, but also looks at some of the existing attributes of elements, for example <img alt="whatever">.

ato: These are heuristics for assistive technology, right?

zcorpan: Yes, except the ARIA roles are normatively defined.

jcraig: ARIA practices guide is non-normative best practices.
... The ARIA spec is normative and defines roles, maps to accessibility technologies on the platform.
... Accessible names spec is normative and exposes the heuristics of finding the right elements according to roles.

<Zakim> jcraig, you wanted to mention label should not be tied to the accname spec. These are primitives that differ per implementation.

jcraig: The accessibility roles are implemented by the web browsers, and through adding this to WebDriver we could also check the browsers against themselves.

ato: Is ARIA tested in WPT?

<MikeSmith> https://github.com/web-platform-tests/wpt/tree/master/wai-aria

jcraig: Not currently to my knowledge, because it doesn’t have hooks for each browser’s accessibility engine.

<boaz> titusfortner: can you link to Watir?

<titusfortner> http://watir.com

jcraig: best ideas from the preso are the primitives: label & role #1439 and the “getter by label” #1441

<titusfortner> specifically: http://watir.com/guides/locating/#aria-attributes

simonstewart: If you set the preferred input device in WebDriver, it would be great if the high-level APIs in WebDriver would use that.
... WebDriver Element Click signals an intent that, say, a button would prefer this input method.

AutomatedTester: It would need to be opt-in, through a flag or capability.
... The default should be off.

zcorpan: I understand the concern that we don’t want to break existing tests.

<cb> ato: in geckodriver the user has to opt in

ato: It seems the question should be if the ARIA checks should be tied to Element Click/Element Send Keys, or if they should be separate ARIA-specific commands.
... simonstewart wanted the former, and this is how it currently works with the accessibility checks in geckodriver.
... But I’m on the fence.

[Technical discussion about how the extension should be done.]

simonstewart: I would like this to live in ARIA, but I’m not sure if WebDriver should explicitly call that.

AutomatedTester: What would be the performance hit of having the accessibility tree turned on?

jcraig: It’s non-zero. Sometimes significant.

zcorpan: Users pay if they use a screen reader.
... It seems a reasonable cost if you’re testing accessibility.

AutomatedTester: We could use Firefox as an example here and document what it does.

JohnJansen: How is the Firefox implementation different from the PR that is presented?

[Some discussion about how Firefox works.]

ato: I don’t think we should base this off the Firefox implementation. I was just saying that there is precedent for building accessibility checks into WebDriver primitives.

<jgraham> ack

<Zakim> jcraig, you wanted to discuss the more complex issues with APG-based testing

jcraig: ARIA can affect what the label is, as can the label attribute in HTML, but it’s not specifically tied to ARIA.

<AutomatedTester> https://github.com/w3c/webdriver/issues/1439

jcraig: There may be some way to write a library that can have some assumption about what roles are activated.
... But you quickly get in to complicated trees.

jgraham: Some of the other stuff we’re discussing today is fundamental architecture work that is quite important to people.
... In terms of resourcing I’m not sure if this is going to be our top priority.

<boaz> and https://github.com/w3c/webdriver/issues/1439

<boaz> sorry, and https://github.com/w3c/webdriver/issues/1441

brrian: I’m curious about the performance penalty?

jcraig: It should probably be opt-in as there is some initial cost to spinning up the accessibility tree.
... I don’t think the implementation is much, but there will be performance cost at runtime. For this reason it should be opt-in.

jgraham: Does anybody feel like they’re writing the spec text for this?

jcraig: I can commit to advising.

AutomatedTester: I don’t think it’s going to be that much.
... So I can commit to working with jcraig.

JohnJansen: I’m not against it at all.

AutomatedTester: I’m happy to get a PR and tests up.

jcraig: Are people optimistic about the second issue I filed?
... Get element by its label?

jgraham: I have a DOM tree, how do I tell what the accessible name is? Do I have to turn on the accessible tree?

jcraig: I believe each implementation has a reverse implementation from accessibility node back to the element.

jgraham: Is there a DOM API that is "get accessible name for node"?

jcraig: For example, if a cell is not contained inside a row then we don’t expose it as a valid cell.

zcorpan: On the question, is the information in the DOM enough?
... The answer is no.
... It includes CSS generated content and shadow DOM.
... You want the flat tree + CSS generated content.
... This makes up the accessibility tree.
... The accessibility tree is the same as the render tree, but for spoken output.
... The render tree is for the visual output.

[Discussion about how ARIA works.]

Get Element Text

simonstewart: We use an atom because there was at the time no interoperable implementation of getInnerText().
... We agreed in the past that we should move to use the platform API when all the browsers supported it.
... And now they do.

jgraham: There was agreement in the past that we should do this, but behind a capability.
... Now is the time to do it.

RESOLUTION: We should add a capability to opt in to the platform getInnerText() for Get Element Text

Maximum timeout to set over 1 minute

Text input spec modification

<JohnJansen> https://github.com/w3c/webdriver/issues/1430

JohnJansen: It’s a bug.
... We missed the case that you’ve already selected an element. The spec says you should always start at the end of the text.

simonstewart: Send Keys always resets the caret to the end of the text field.

JohnJansen: Firefox implements this the way I expect, but which is in violation of the spec.
... I think everyone agreed this was an oversight.

RESOLUTION: The caret in Element Send Keys should move on calling the command again

Adding the step to handle user prompts for screenshots

<simonstewart> Scribe: sstewart

<simonstewart> scribe: simonstewart

<scribe> ScribeNick: simonstewart

<AutomatedTester> scribenick: simonstewart

ato: do the js dialog prompts stop you from taking the screenshots?
... why do we want them to go away?

jgraham: implementations may not be able to take a screenshot if an alert is displayed

<scribe> ACTION: AutomatedTester to approve PR

CalebRouleau: wonders if alerts should block js at all

nerdery about specs follows

Clarify handling of unrecognised extension capabilities

JohnChen: should we accept or reject unrecognised extension capabilities

jgraham: two cases. All capabilities have a prefix. geckodriver might understand "moz:" but there might be something that means "moz:foo" isn't recognised

ato: capabilities used to pass in config to end point nodes. Also to provide additional configuration to intermediary nodes
... should intermediary nodes prune capabilities?

simonstewart: I had a spec change ready to allow intermediary nodes to delete capabilities as needed

JohnChen: there's one WPT test that expects the unrecognised capability should be rejected

ato: looks up the test and all browsers fail it

<ato> https://wpt.fyi/results/webdriver/tests/new_session/default_values.py?label=experimental&label=master&aligned

jgraham: explains how geckodriver works, and claims this is the desired spec behaviour
... the browser ignores capabilities that start with a prefix that's not specific to the browser. If there's a capability with a (eg) "moz:" prefix that's not recognised, the session fails
... the spec is currently in a degenerate state
... if any capability with that prefix is accepted by the driver, but the capability is unknown to the driver, an error should be thrown. Otherwise the extension capability should be accepted

ato: does this mean we need to look at all capabilities

jgraham: yes. You can't just pull out "known things"

ato: normally we ignore additional data in payloads

jgraham: explicitly not with capabilities
... suggests what he's suggested before as a change to the spec

ato: is that needed?

jgraham: the reason for this proposed handling is that if someone passes in something with a typo, they fail fast.

lukebjerring: but they could typo the prefix

jgraham: but this makes it easier to handle capabilities uniformly

JohnChen: for chrome there are some extensions that use "goog:" but aren't actually chrome extensions --- they're ones from other parts of google
... Maybe we should have used the "chrome" prefix

jgraham: if the point is that chromedriver is never throwing an error, then this change would be a breaking change for them

ato: I ran into a bug the other week
... to do with chromedriver extension capabilities. We stored the data in geckodriver, and it was waaay too large
... selenium clients send massive blobs all the time, since they do double locations (the protocol handshake with the old protocol)

lukebjerring: I'd argue there's not a clear benefit

jgraham: the other reason is to do with the matching. The spec spelled out what was allowed.

simonstewart: explains original behaviour and reasoning behind it
... suggests end nodes abandon session creation on any unrecognised capability, and intermediary nodes to strip capabilities that are specific to the intermediary

jgraham: wonders aloud whether end nodes are getting capabilities that they don't recognise

JohnChen: we might get some resistance from the internal test team

CalebRouleau: "extension capabilities are a free for all, and we should accept that"

lukebjerring: is there a need to reject unrecognised extension capabilities

jgraham: yes. eg. different browser versions that support different capability names

ato: it seems like what we're agreeing to is to accept jgraham's proposal?

jgraham: no. If there's a colon in the name and you don't recognise the name, we accept that capability
... compares chrome's lax approach with gecko's stricter approach

CalebRouleau: attempts to use the word SHOULD

ato: it MUST NOT have the word SHOULD

lukebjerring: the existing test doesn't highlight the difference in behaviour between chrome and firefox

simonstewart: we could just grab the "example" extension capability for ourselves
... so we're back to the original desired capabilities pattern?

jgraham: yes

RESOLUTION: For extension capabilities that are unknown to the implementation the result of validating capabilities must always be to accept the capability i.e. unknown extension capabilities never cause matching to fail irrespective of whether the extension prefix is known to the implementation
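
A minimal sketch contrasting the resolved behaviour with the stricter geckodriver-style behaviour described earlier in the discussion. The function name `validate_capabilities`, the `strict` flag and the `InvalidArgument` exception are assumptions for illustration; validation of capability values is omitted, and only the accept/reject decision on names is modelled.

```python
from typing import Any, Dict, Set

# Standard capability names from the WebDriver specification.
STANDARD_CAPABILITIES: Set[str] = {
    "acceptInsecureCerts", "browserName", "browserVersion", "pageLoadStrategy",
    "platformName", "proxy", "setWindowRect", "strictFileInteractability",
    "timeouts", "unhandledPromptBehavior",
}


class InvalidArgument(Exception):
    """Stand-in for the 'invalid argument' error (hypothetical class)."""


def validate_capabilities(
    capabilities: Dict[str, Any],
    known_extensions: Set[str],
    strict: bool = False,
) -> Dict[str, Any]:
    """Return the accepted capabilities, or raise InvalidArgument.

    strict=False models the resolution: unknown extension capabilities are
    always accepted. strict=True models the stricter behaviour discussed:
    an unknown name under a prefix the driver owns is rejected.
    """
    own_prefixes = {name.split(":", 1)[0] for name in known_extensions}
    accepted: Dict[str, Any] = {}
    for name, value in capabilities.items():
        if name in STANDARD_CAPABILITIES or name in known_extensions:
            accepted[name] = value
        elif ":" in name:
            prefix = name.split(":", 1)[0]
            if strict and prefix in own_prefixes:
                raise InvalidArgument(f"unknown extension capability: {name}")
            accepted[name] = value  # unknown extension capability: accept
        else:
            # Unprefixed and not a standard name, e.g. a typo.
            raise InvalidArgument(f"unknown capability: {name}")
    return accepted


requested = {"browserName": "firefox", "moz:foo": 1, "goog:foo": 2}
known = {"moz:firefoxOptions"}
lax_result = validate_capabilities(requested, known)
```

Under the resolution, both "moz:foo" and "goog:foo" are accepted even though neither is known to the driver; only an unprefixed unknown name still fails.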

<jgraham> ACTION: (someone) to update the tests to include moz:foo goog:foo etc. and ensure that matching doesn't fail

<jgraham> RRSAgent: make minutes

