W3C

– DRAFT –
Browser Tools & Testing @ TPAC 2023 - Day 1

14 September 2023

Attendees

Present
AutomatedTester, jamesn, jcraig, jdescottes, Jem, jgraham, JimEvans, jugglinmike, lola_, Matt_King, mattreynolds, orkon, patrickbrosset, sadym, sasha, shs96c, spectranaut_, thiagowfx, whimboo
Regrets
-
Chair
David Burns
Scribe
AutomatedTester, David Burns, jgraham, orkon

Meeting minutes

<jgraham_> RRSAgent: make logs public

<jgraham_> RRSAgent: make minutes

Agenda wrangling

<whimboo> when exactly we will have the breaks today?

<whimboo> is there a link to a schedule that we might want to follow?

<jgraham> Breaks are 11-11:30 CEST and 16:30-17:00

<jgraham> Lunch is 13:00-14:30

<jgraham> https://www.w3.org/2023/09/TPAC/schedule.html#thursday

Resize and positioning windows

Github: w3c/webdriver-bidi#398

whimboo: I've added this and was curious whether this is a high priority for client vendors

jgraham: Classic webdriver has support for resizing and positioning. From the point of view of supporting classic we should add this
… it has a lot of capabilities that are not available through device emulation
… in classic webdriver there is some confusion between top level browser context and the OS window
… the suggestion in the issue is that we have something that has an OS window id
… and have the top-level contexts in that OS window
… and then dimensions that are available
… and then you can set the state of the window (max/min)
… are there other use cases that we should see about addressing?
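
A minimal sketch of the window model described above: an OS window with its own id, the top-level contexts it holds, its dimensions, and a settable state. All names here are hypothetical and none come from the WebDriver BiDi spec; returning the values the window actually ended up with is also an assumption of the sketch.

```python
from dataclasses import dataclass, field


# Hypothetical model: an OS window that owns top-level browsing contexts.
@dataclass
class ClientWindow:
    window_id: str
    state: str = "normal"  # assumed states: normal, maximized, minimized, fullscreen
    width: int = 800
    height: int = 600
    contexts: list = field(default_factory=list)  # ids of top-level contexts


def set_window_state(window, state=None, width=None, height=None):
    """Apply a best-effort request and report the resulting window values."""
    if state is not None:
        window.state = state
    # Explicit dimensions only apply to a window in the normal state.
    if window.state == "normal":
        if width is not None:
            window.width = width
        if height is not None:
            window.height = height
    return {"window": window.window_id, "state": window.state,
            "width": window.width, "height": window.height}
```

Because the request can fail or be only partly honoured, a client would compare the returned values with what it asked for.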

shs96c: This is basically what we need to implement (points to classic spec section 11)
… we are just going to lift that and shift it?

jgraham: yes
… webdriver classic does a lot of things that say "please try" and can fail

shs96c: yes... like mobile can't do a lot of those things

jgraham: it is also fallible in cases where there is a window manager that has no idea how to do that, e.g. window managers that don't have maximize

shs96c: from where I am sitting this proposal looks good

orkon: from our perspective the proposal looks good
… for the other things these are outside the browser
… and more window manager controls
… it would be good to get messages coming back about what can/can't be done

shs96c: people historically run tests in as big a window as possible and know it's not always perfect

<whimboo> https://github.com/web-platform-tests/wpt/pull/41588/files

whimboo: one follow-up is that I have updated the tests recently
… so whatever we do in BiDi, it should be easy to copy these tests across

<shs96c> whimboo: can you please share a link to that spreadsheet?

whimboo: is the priority on this still correct, given it is marked as high priority?

<orkon> https://docs.google.com/spreadsheets/d/1Cg3rifrBZClIitU3aFW_WDv64gY3ge8xPtN-HE1qzrY/edit#gid=0

<whimboo> https://docs.google.com/spreadsheets/d/1Cg3rifrBZClIitU3aFW_WDv64gY3ge8xPtN-HE1qzrY/edit#gid=0

shs96c: for Selenium it is still a high priority, for the reasons I mentioned earlier
… one quick question, do we want to send events back or is it fire and forget?

jgraham: I think we should have a response with details of the window size it got to
… and we can have an event that is fired if a new window is created or a resize happens from something else

shs96c: that would end up with 2 events? a window resize, window create

jgraham: 3: one each for resize, create and destroy

orkon: for events we should discuss events separately as that is not part of the current proposal

jgraham: I agree. It is lower priority and a lot of it is covered by context created

Capture full page screenshots

github: w3c/webdriver-bidi#384

whimboo: We are already doing this in classic
… for the viewport or for an element
… but there are a lot of users asking to capture the area offscreen to get a fullpage
… so it will be great to discuss this

whimboo: we have already implemented this in Firefox in classic so we don't have an issue

orkon: we are also able to do a full page, but by making the viewport as big as possible
… we will not be able to capture full-page screenshots of elements with overflow or iframes

jgraham: I think with screenshots there are already scenarios that can't be handled, e.g. in iframes
… I don't think that people would want to handle the cases with scrolling of text in a box
… functionally we should add an extra attribute, e.g. fullscreen=true, that takes the viewport to be the scroll dimensions of the document
… it also makes taking a screenshot of an element easier, as it can do the whole element
… and I think that solves the main use cases
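
A minimal sketch of the idea, assuming a hypothetical `full_page` flag (not an existing protocol parameter) and rectangles expressed in document coordinates. With the flag set, the captured area is the document's scroll dimensions rather than the visible viewport.

```python
def screenshot_clip(viewport, document, full_page=False):
    """Return the rectangle to capture, in document coordinates.

    viewport: {"scroll_x", "scroll_y", "width", "height"} of the visible area.
    document: {"scroll_width", "scroll_height"} of the whole document.
    full_page: hypothetical flag, named here only for illustration.
    """
    if full_page:
        # Start at the document origin and cover the full scrollable size.
        return {
            "x": 0,
            "y": 0,
            "width": max(document["scroll_width"], viewport["width"]),
            "height": max(document["scroll_height"], viewport["height"]),
        }
    # Default: just the part of the document currently visible.
    return {
        "x": viewport["scroll_x"],
        "y": viewport["scroll_y"],
        "width": viewport["width"],
        "height": viewport["height"],
    }
```

Scrollable non-root elements and iframes are outside what this sketch covers.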

Jim Evans: just as an implementation detail on the spec
… is there any mileage in making a fullscreen clip rectangle or full page clip rectangle?

jgraham: I think the answer is yes
… it would be a viewport clip rectangle that could take an element
… the way it is written here doesn't take the whole matrix of choices in

orkon: I think that makes sense. I just want to point out that viewports can affect things here. We will need to make sure that we take the edge cases into account

sadym (IRC): is this different to what Mathias asked in the issue the other day?

Jim Evans: that makes sense to have a boolean property. I withdraw my previous suggestion

jgraham: I think the previous suggestion is quite good. the previous design is weird
… we could do the fullscreen like an element clip rectangle
… and it makes scroll into view mutually exclusive
… it's not very explicit in the protocol how to handle this
… I am not sure what the correct answer is here and don't know how to handle it right now. I like the clip rect suggestion

shs96c: I think the clip to viewport would be useful
… we can do viewport false to get a screenshot of everything
… the classic spec has a lot of "if in view" so a lot of people maximise the window to try to get as much as possible in the window to remove flakiness and then want screenshots
… and we still need to have resizing of windows

orkon: the browsing context can manipulate the viewport
… I don't have an opinion either way
… there is the question of scroll into view in screen shots

orkon: the issue is scroll into viewport has been merged into element screenshot
… it's currently an option
… it doesn't make sense for element screenshot to scroll if we are using it for full page

jgraham: so I agree it doesn't make sense there
… for fullscreenshot. We can make a new command or a new attribute
… the reason is due to classic
… we could make it separate commands
… if we can send 2 commands in one payload

automatedtester: The context for scrolling in element screenshots is that it was a request from Microsoft: sometimes you want to take a screenshot of an element in the viewport by scrolling to it, and other times you just want the element even though it is outside the viewport. This is why it was originally designed that way

shs96c: some of the colour for this: IE could only give you screenshots of the viewport
… since we only had viewport screenshots, we had to scroll as we took screenshots
… my preference is to scroll elements into view, with the ability to turn that off

jdescottes: I don't think that it will be enough for all cases e.g. scrollable non-root elements

whimboo: I wanted to add to jdescottes's item that we don't want to constantly scroll, e.g. twitter/facebook
… element interaction is going to need "scroll into viewport" so that we can move the viewport to be able to handle it
… this feature is part of interaction commands
… but not in actions as that assumes everything you need is already in the viewport

jgraham: irrespective of the screenshot command there should be a top level command for scroll into view
… and to whimboo's point there is going to be a use case to be able to scroll the element. I do think we should have a top level "scroll into view"

<orkon> that makes sense to me

shs96c: I second adding a command to scroll the element in an action chain
… and typing de de de is hard to type

Ability to upload files and fill out file inputs

github: w3c/webdriver-bidi#494

orkon: We started working on this feature. There are some open topics on this
… context: we tried to implement a method to set the file to the dialog
… and we need to get the events with this
… Q1: should the interception of the dialog be doable through the current mechanisms? Are people ok with this?

<jgraham> w3c/webdriver-bidi#514

shs96c: Not all UIs show the dialog. It might not be possible to get at the file upload. For the local case this is really easy to do
… for remote the clients are going to need to be able to send the file across intermediary nodes
… for the UIs doing text boxes they assume the file exists on the local file system
… but I think the remote case needs to be handled.
… for selenium people hate the sendkeys that doesn't upload the file if it's just a text box

orkon: should we intercept the dialog and be part of the event subscription

shs96c: For the remote case we take the file, upload it, get the new file address, and then typing the new file path is done as part of the sendkeys command
… if set file input had the file name and file contents as the payload it would solve this problem
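
A minimal sketch of a payload that carries the file name and file contents together, so a remote end would not need the file to exist on its own file system. The method name `example.setFileContents` and every parameter name are hypothetical; base64 is assumed only because the message has to be JSON-safe.

```python
import base64


def build_file_payload(files):
    """Turn a mapping of file name -> bytes into JSON-safe entries."""
    return [
        {"name": name, "contents": base64.b64encode(data).decode("ascii")}
        for name, data in files.items()
    ]


def build_set_files_command(command_id, context, element, files):
    """Build a hypothetical command message; not a command from any spec."""
    return {
        "id": command_id,
        "method": "example.setFileContents",  # hypothetical method name
        "params": {
            "context": context,
            "element": element,
            "files": build_file_payload(files),
        },
    }
```

An intermediary node could forward such a message unchanged, and the final remote end would decode the contents before filling the input.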

jgraham: for set files I think it would be fine to bypass dialogs
… if the dialog blocks the browser there should be a way to probably send that out but it might not be cancellable.

<whimboo> the file dialog is a native dialog from the OS and we cannot handle that in Firefox

jgraham: we could have an event for "a file dialog has appeared, please send files or cancel" so we don't block the UI

orkon: we envisioned the workflow that the person would intercept the dialog and then set the file path in the dialog
… and then people could choose what to do next
… we suppress the dialog from appearing on screen

shs96c: in selenium the uploads can be dependent on the UA as it can block the browser
… we were blocking the dialog from loading. <describes the entirely convoluted way to make it work in IE in the past>
… and this didn't work in Firefox

orkon: this is why we want to discuss it. We want to intercept it. We can stop here and then I will come back with an example.

shs96c: <describes a use case where we capture a display:none interaction causing a dialog to appear>

<shs96c> In that case, being able to block the dialogue from opening would be wonderful

AOM Accessibility testing

<jcraig> https://www.icloud.com/keynote/0eciRqzy6aGWyffs7OXZbkz8w#AccessibilityAutomation_2023SeptTPAC

Slideset: https://www.icloud.com/keynote/0eciRqzy6aGWyffs7OXZbkz8w#AccessibilityAutomation_2023SeptTPAC

[Slide 1]

[Slide 2]

jcraig (IRC): each web rendering engine maintains an internal model of accessibility before it exposes it to the platform-specific accessibility APIs. That will provide information to a11y API. In separate contexts, there is platform-specific automation and AT automation. Lola and Matt's AT-Driver session will be later this afternoon. Not directly related to this request. Some other existing "client-side automation" solutions (like Deque's Axe-Core) treat the DOM as source of accessibility truth.

[Slide 6]

jcraig (IRC): a11y tests added as part of wpt and in Interop. Computed Role and Computed Label already in the spec

[Slide 10]

jcraig (IRC): Have basic a11y tests running in all four engines. Currently 600 tests

[Slide 12]

[Slide 16]

jcraig (IRC): We can check for computed role and computed label for any element. There are 54 elements and attributes that affect a11y. We can currently test 3 of these. Still lots more we'd like to test.

[Slide 17]

jcraig (IRC): In addition to the 50+ other aria attributes, we want to test conflicts with host language attributes... e.g. required vs aria-required on the same element. Would like to trigger a11y actions / events. Stack is different to other kinds of events e.g. pointer click. This is not quite the same "actions" as in wpt.

[Slide 18]

jcraig (IRC): As changes happen to DOM they change a11y tree and then causes events in the accessibility system. Concept of a11y tree walker. Currently lots of implementation differences in those trees. Easy example is div with overflow:auto. Scroll view that creates will be represented differently between different engines. But we want to test what we can with a11y tree even if it's not fully interoperable. orkon wanted something similar for a CDP feature

[Slide 19]

<AutomatedTester> WICG/aom#197

<AutomatedTester> WICG/aom#203

jcraig (IRC): Two AOM issues. I think all the stakeholders involved in 197 agree on the following goals. Test-only web api for a11y feature. DOM-exposed API would over-complicate things. Need a way to get an a11y object and its attributes. Can currently get two attributes associated with real elements. Need to reconcile a11y tree elements with DOM tree elements, because the relationship between the trees isn't quite 1:1. Want to synthesize screen reader inputs.

[Slide 20]

[Slide 21]

jcraig (IRC): Don't need writable a11y nodes. Don't need a live tree representation. Don't need a11y node ids to persist after the DOM changes. Node can be destroyed and recreated if DOM elements are e.g. hidden and redisplayed. Trees aren't expected to be identical between browsers. Platform specific a11y APIs aren't in scope, neither is assistive technology (AT, e.g screen readers) automation itself. (Caveat: Driving AT is obvs in scope for separate issue: AT Driver)

[Slide 23]

jcraig (IRC): [Clarifies which bits are in scope]

jcraig (IRC): Been shopping this around. There seems to be general feeling this might be a good webdriver extension.

[Slide 24]

jcraig (IRC): Want a way to get the accessibility node from a specific element. Might want to get just an id or get all the properties in a single call.

[Slide 25]

jcraig (IRC): Need to be able to get a11y node by its own id, so that we can walk the a11y tree.

[Slide 26]

jcraig (IRC): This is an example of what the properties of an a11y node might look like.

jcraig (IRC): Reason for property bag rather than individual accessors is that it reduces the number of requests required per element.
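
A small sketch of the request-count argument, using a hypothetical in-memory stand-in for an accessibility node (the property names and the client class are illustrative only). Reading each property separately costs one request per property, while the property bag costs one request in total.

```python
# Hypothetical accessibility node; real nodes would carry many more properties.
NODE = {"role": "button", "label": "Save", "required": False}


class CountingClient:
    """Stand-in for a remote client that counts round trips."""

    def __init__(self, node):
        self._node = node
        self.requests = 0

    def get_property(self, name):
        # One round trip per individual accessor.
        self.requests += 1
        return self._node[name]

    def get_properties(self):
        # One round trip returns the whole property bag.
        self.requests += 1
        return dict(self._node)
```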

[Slide 27]

jcraig (IRC): Synthesize event. Events are not quite the same as the non-a11y events, they can also affect the a11y tooling in a way that other events don't.

[Slide 28]

jcraig (IRC): ARIA actions aren't like WebDriver actions.

[Slide 30]

jcraig (IRC): Might be a problem with ids being reused across different sessions, but session id might be enough to disambiguate. Can probably do the property bag quite quickly even if the id portion comes later.

jcraig (IRC): Inert hides things, want to test that works. Might be differences with some implementations marking nodes as hidden and some removing them from the tree.

orkon: You mentioned something about subscribing to events. BiDi seems like it's better for that. Could you use BiDi?

jcraig (IRC): We don't think there's a problem with taking these concepts and implementing them in BiDi.

jcraig (IRC): But we don't know about BiDi shipping schedule.

jcraig (IRC): Could put this on the BiDi roadmap.

jgraham: Seems like the design would work well in BiDi, not just for events but also because the tree properties match the way we do DOM in BiDi.

jcraig (IRC): Could maybe do a subset in classic today and then add other stuff to BiDi long-term

shs96c: Events are hard to do in classic.

jcraig (IRC): We could send events to the page?

shs96c: Yes. We're also talking about how to reformulate classic in terms of BiDi so we expect it to be a superset

orkon: Should this be an extension or a core part of the spec? In BiDi it seems like it could be a core part of the spec.

jcraig (IRC): What do you see as pros/cons of extension vs not

orkon: Don't think this should be an optional thing.

orkon: We have use cases in puppeteer. We want treewalker to get a11y tree snapshot. We want to query a11y tree e.g. finding nodes that have certain properties.

AutomatedTester: I'm with orkon on making this a core webdriver feature. In testing space people are starting to look at doing a lot more accessibility testing. WebDriver should take on the hard parts. This could simplify a lot.

jcraig (IRC): One of the reasons for suggesting an extension is that it would allow clear delineation of responsibilities. People in ARIA group would be willing to work on this. We're happy to take in whichever direction you want. Don't want to make unreasonable requests.

AutomatedTester: I think this is important and so should be in core.

jcraig (IRC): One complication might be that we'd really like to test parent child relationships, but right now they aren't the same between browsers. Is that OK?

<AutomatedTester> jgraham: where this lives doesn't really affect whether it is required to implement. The issue is more to do with the ownership of doing the work

<AutomatedTester> ... I think it being in a different spec might make sense

<AutomatedTester> ... as for trees that could be very different between browsers, that scares me but doesn't mean I think we shouldn't do it

<spectranaut_> maybe we should do: find accessibility child of role x

<AutomatedTester> ... I think we should be worried that people will assume the way a specific browser works is the correct behaviour for users

<spectranaut_> I think that would be the same across browsers...?

<AutomatedTester> ... e.g. Browser A returns a specific way and then other browsers are "not accessible"

<Jem> I wonder whether we heard any ideas about James Craig's question regarding the parent-child relationship

<AutomatedTester> ... there are enough legitimate use cases here to do it

<AutomatedTester> jcraig (IRC): Aaron Leventhal from Google suggested a variant "normalized" ax parent which could make these the same... possibly align on a single ax tree

<AutomatedTester> ... there is still utility in being able to access the actual tree

<AutomatedTester> ... I think we can still move forward on what we have here

jcraig (IRC): There's a concept of a normalised tree which would align with ARIA's definition of a11y parent and child. There's some benefits of getting the underlying tree to help align implementations.

<AutomatedTester> jgraham: a good analogy here is that we don't expose the layout tree between browsers

<AutomatedTester> ... we need to be wary of what can be returned

<AutomatedTester> ... and we need to explain that these should be behind flags with the explainer that things will be different between browsers

jcraig (IRC): Having the implementation tree accessors behind a flag seems reasonable.

shs96c: As long as you can express the same concepts in the tree between browsers, that seems fine. Could also base tree walking on find element-like API rather than walking the tree, so you'd skip over things that are different between implementations.

<shs96c> Stewart

orkon: There's also a discussion about find element, and there's a proposal to make those work with role, so that might affect the extensions question.

<Zakim> spectranaut_, you wanted to say maybe we should do "get accessible child of role x" which should mostly be the same across browsers, with some exceptions

spectranaut_ (IRC): I was going to recommend a similar solution "get accessible child with role <x>"

Jim Evans: I wanted to point out that in the BiDi spec there's prior art for serializing children of DOM nodes and getting a tree in that way up to a certain maximum depth, which might also work for the a11y tree.
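
A minimal sketch of depth-limited serialization applied to an accessibility-like tree. The node shape and the `childCount` field are assumptions for illustration; the sketch only shows the general technique of including children down to a maximum depth and reporting a count below that.

```python
def serialize(node, max_depth):
    """Serialize a tree of {"role": str, "children": [...]} nodes.

    Children are included while max_depth > 0. Below that, only the number
    of children is reported, so the caller can ask for more if needed.
    """
    out = {"role": node["role"]}
    children = node.get("children", [])
    if max_depth > 0:
        out["children"] = [serialize(child, max_depth - 1) for child in children]
    else:
        out["childCount"] = len(children)  # hypothetical field name
    return out
```

With a depth of 1, a `main` node containing a `list` would include the `list` node but only the count of its list items.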

[clarification that "prior art" is not meant in a legal sense]

Ability to upload files and fill out file inputs (contd)

<AutomatedTester> github: w3c/webdriver-bidi#494

orkon: We have a use case where a dialog would be shown to a user. We would like to have an event that shows that a dialog would appear. We would suppress the dialog from loading
… we would then notify the user so they can then decide to dismiss the dialog or complete the form
… we also want to have people to automatically handle the dialog if people aren't expecting it
… the interception of the dialog would happen if the person subscribes to the events

<jgraham> https://html.spec.whatwg.org/#show-the-picker,-if-applicable

jgraham: I have found the relevant part of the html spec for this
… we would effectively override steps 2 and 3
… there is an interesting edge case here: whether the element gets the cancel event
… and if you didn't respond, that's a case that won't happen with a real dialog
… I think if you are not subscribed to the events that we should cancel the dialog automatically
… the worst case is that we get the dialog and can't handle this
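
A minimal sketch of the flow discussed above, assuming the native dialog is always suppressed under automation. The function name, the response shape and the result strings are all hypothetical. Without a subscription the dialog is cancelled automatically; with one, the client decides whether to cancel or supply files.

```python
def handle_file_dialog(subscribed, response=None):
    """Decide what happens when a page would open a file picker.

    subscribed: whether the client subscribed to the (hypothetical) event.
    response: None while the client has not answered yet, otherwise
        {"action": "cancel"} or {"action": "setFiles", "files": [...]}.
    """
    if not subscribed:
        # Nobody is listening, so cancel rather than block on a dialog.
        return {"result": "cancelled", "files": []}
    if response is None:
        # The event has been emitted and the client has not answered yet.
        return {"result": "pending", "files": []}
    if response["action"] == "setFiles":
        return {"result": "selected", "files": list(response["files"])}
    return {"result": "cancelled", "files": []}
```

The "pending" state is the edge case noted above: it has no counterpart in a real dialog, where the user always either picks files or cancels.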

orkon: we have the same situation in alerts and we should handle in the same way

jgraham: yes, we need to be able to handle this. Currently people need to subscribe and handle the alerts as they appear but we should probably go back and check the spec in bidi here

jgraham: we should sort this with alerts

shs96c: we should do what classic does here

jgraham: we should raise an issue here and get it sorted

orkon: there could be cases for hybrid automation
… we should automatically throw errors here

ACTION: file an issue on how to handle alerts if there aren't any subscribers to the events

jgraham: the hybrid case is important here
… <explains a use case>
… and it not working in Safari is something we need to be aware of. We should probably do what shs96c said and follow the capability

Add support for FindElement(s) commands from WebDriver classic

github: w3c/webdriver-bidi#150

Jim Evans: in the github issue. In the comment I wrote is a strawman proposal for how we can do it
… there are some locator strategies that won't be able to be done via JavaScript execution, like the a11y discussion earlier
… having the findElement command also helps with non-web-platform implementations of webdriver (appium et al)
… I think it is important that we do this. I know in puppeteer they move things down to css or pseudo css selectors
… I would like to be able to do things without having to rely on JavaScript to make this happen

orkon: I wanted to comment on the proposal. We are in favour in having this command but we have some concerns
… e.g. how do we combine commands for finding elements
… we need to discuss the details. Also allow people to return an iterator etc and shadow roots

littledan: At Bloomberg we have a UI testing tool using a custom protocol
… the Bloomberg terminal is based off Chromium
… we have been using a CDP interface on top of that
… this is done via commands on the domain instead of JS
… we are in favour of a find element command
… and if we get this in then we can see about adopting this quicker
… it's very similar to the appium use case

shs96c: from the selenium point of view, this is very crucial here

<littledan> I also wanted to mention: We don't have any particular concerns about CSS selectors--we actually already implement a CSS selector system for querying the UI

shs96c: we also need findElements but with the ability to limit the amount of elements returned
… the ability to handle compound selectors would be very useful
… Jim Evans's proposal has some really useful ideas; it looks really nice and I support it

<Zakim> jgraham, you wanted to note that the scope of the proposal is far larger than we have in classic

jgraham: looking at this comment I would want to start making this map onto classic
… as this proposal is larger than classic
… the easiest approach is using a single locator and then making sure they map across to classic and then new items be a separate discussion e.g. shadowroot
… we need to make sure we work for the classic use cases and then move onto the longer discussion on compound as they can, not elegantly, be handled in the client at the moment
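
A minimal sketch of the single-locator approach, restricted to the five locator strategies defined by WebDriver classic. The method name `example.locateElements` and the `maxCount` parameter are hypothetical; the latter stands in for the earlier request to limit the number of elements returned.

```python
# The locator strategies defined by WebDriver classic.
CLASSIC_STRATEGIES = {
    "css selector",
    "link text",
    "partial link text",
    "tag name",
    "xpath",
}


def build_locate_command(command_id, context, strategy, value, max_count=None):
    """Build a hypothetical single-locator command message."""
    if strategy not in CLASSIC_STRATEGIES:
        raise ValueError(f"unknown locator strategy: {strategy!r}")
    params = {"context": context, "locator": {"type": strategy, "value": value}}
    if max_count is not None:
        params["maxCount"] = max_count  # hypothetical parameter name
    return {"id": command_id, "method": "example.locateElements", "params": params}
```

Compound or chained locators are deliberately left out, matching the suggestion to discuss them separately.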

Jim Evans: one of the reasons for this proposal came from looking at Puppeteer's and Playwright's notion of what a locator looks like and being able to chain locators
… the next reason for the complexity was to also minimise the amount of round trips
… unlike cdp based tools we have to be able to handle internet latency
… and minimising the round trips
… I haven't been through all the edge cases but the command is definitely important

shs96c: we should implement things to get classic to work on bidi, and I suggest that we get the text to move to innerText, which could be a breaking change

<jgraham> +1 to that.

shs96c: clients could potentially solve that via JavaScript to make sure that we don't break backwards compat
… and we need a JS locator

sadym: Another scenario is that we want to wait for an element to appear/disappear
… this is why it's in JS at the moment so that it can wait for elements to reach a state required
… do we want this to happen too?

shs96c: I think that is out of scope for right now and we should deal with it in a separate issue

jgraham: people could poll or create a mutation observer
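
A minimal client-side polling sketch, assuming a caller-supplied `find` callable that returns a falsy value while the element is absent. The function and parameter names are hypothetical; the clock and sleep functions are injectable only so the loop can be exercised without real delays.

```python
import time


def wait_for(find, timeout=5.0, interval=0.1,
             clock=time.monotonic, sleep=time.sleep):
    """Call find() until it returns a truthy value or the timeout passes."""
    deadline = clock() + timeout
    while True:
        result = find()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("element did not appear in time")
        sleep(interval)
```

Each iteration costs a round trip to the browser, which is the trade-off against registering a mutation observer in the page.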

shs96c: classic just says "it's here" or not where puppeteer/playwright can wait for a page load and an element loads

whimboo: classic does have implicit waiting

jgraham: I agree the compound locators have real uses but we should focus on find element so we can quickly get consensus
… I'm worried that ad hoc compounding of commands will not work well across the whole spec even though they make sense on their own

<orkon> an example of p-selectors (inspired by CSS extensions and deep selectors proposals) combining various strategies (css + text + aria + xpath + custom locator with light and shadow dom descendants):

<orkon> `div.my-cls ::-p-text(Test) >>> ::-p-aria(MyButton) >>>> ::-p-xpath(//div) >>> ::-p-vue(MyComponent)`

orkon: I just wanted to point out an example using the p selectors from puppeteer
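
A rough sketch of how the example selector breaks into chained steps, splitting only on the `>>>` and `>>>>` combinators. This is not Puppeteer's parser: it does not interpret the individual parts, and it would mis-split a part that itself contains `>>>`.

```python
import re


def split_p_selector(selector):
    """Split a selector into (combinator, part) steps.

    The first step has no combinator, so its combinator is None.
    """
    # Try the longer combinator first so ">>>>" is not read as ">>>" plus ">".
    tokens = re.split(r"\s*(>>>>|>>>)\s*", selector.strip())
    steps = [(None, tokens[0])]
    for i in range(1, len(tokens), 2):
        steps.append((tokens[i], tokens[i + 1]))
    return steps
```

Applied to the example, this yields four steps: the CSS-plus-text part, then the aria, xpath and vue parts, each paired with the combinator that precedes it.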

Ability to upload files and fill out file inputs

github: w3c/webdriver-bidi#494

<orkon> 1. Interception enabled by the event subscription to the input.fileDialogIntercepted event.... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/zhgDeKqAtTFNZpkaAtotzLTF>)

<orkon> this is the consensus

Session History

<jgraham> github: w3c/webdriver-bidi#502

jgraham: There are 2 questions here. Currently with navigation events we have an event when fragment is navigated but not when there is popstate
… we could create an event when navigating the history even if it doesn't change url

<shs96c> Classic has https://w3c.github.io/webdriver/#back

jgraham: do we want to be more in line with the navigation API or ...?

shs96c: do we know why puppeteer exposed this in the first place? Was this because it's a devtools protocol or was it a conscious decision?

orkon: it would be better to follow the fragment events and I'm happy to add a new event. For the history of puppeteer, we allow people to move back/forward, but we wanted to expose everything to support a larger and more flexible set of use cases

shs96c: this is why I was wondering how widely used it is?

<jgraham> https://developer.mozilla.org/en-US/docs/Web/API/Navigation_API is the Web Navigation API

orkon: there is a puppeteer use case that allows people We also need a mechanism to get the current URL

shs96c: forward and back is the most important but getting the complete history is lower priority

jgraham: it's definitely harder to expose all history across all UAs

<shs96c> @AutomatedTester: We also need a mechanism to get the current URL

jgraham: The web also has the navigation API (shared)

orkon: you can also inspect history and then decide what to do based on its content

jgraham: So we need to make sure that we have back/forward. There is support for an event when a navigation happens
… and exposing session history is what people would like, but more advanced use cases are lower priority
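
A minimal sketch of the back/forward behaviour, modelled as a list of entries and a current index. The class and method names are hypothetical; `traverse(-1)` corresponds to back and `traverse(1)` to forward, and a new navigation discards any forward entries, as browsers do.

```python
class SessionHistory:
    """Toy model of one tab's session history."""

    def __init__(self, url):
        self.entries = [url]
        self.index = 0

    @property
    def current_url(self):
        return self.entries[self.index]

    def navigate(self, url):
        # A new navigation drops everything after the current entry.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1

    def traverse(self, delta):
        # Negative delta goes back, positive delta goes forward.
        target = self.index + delta
        if not 0 <= target < len(self.entries):
            raise IndexError("no such history entry")
        self.index = target
        return self.current_url
```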

<orkon> agreed

Final decision if browsingContext module should be renamed or not

github: w3c/webdriver-bidi#91

jgraham: A while ago we discussed that the browsing context doesn't match what the html spec has
… I think we are reaching a point that renaming it is going to start annoying people
… and in spec land people might notice but not care
… so we can see about leaving it and just going through the spec and make sure that everything is documented properly

shs96c: I would advocate that we make sure the name is correct rather than telling people in the spec that the term is correct
… I think we should be thinking that the spec will be around for years
… so we have the opportunity so we should take it

orkon: is browsing context still used in the html spec? Has it been replaced by navigable?

jgraham: a document still has a browsing context in some circumstances
… there are cases when the routing can change things
… navigable is conceptually the tab/iframe

orkon: There is a discussion item later about prerendering
… so if we keep browsing context in there and don't change it, how will it work with that?

jgraham: I would be surprised if they have multiple browsing contexts
… I think navigable can only have 1 session history entry
… and 1 active document

orkon: on our side we would prefer not to rename things due to the amount of work
… but for future ambiguity then perhaps we should probably fix it
… I think it's constrained on the spec work

jgraham: <describes the spec work involved... tl;dr; it's not find/replace>
… we would need to see if we can find someone who can do the spec work, and then coordinate that with the work of also updating the tests
… we can't defer this any longer
… given the amount of work, part of me thinks this is not worth it, and HTML might churn things again in the future
… but we need to make sure that navigable is documented properly

<orkon> agreed not to rename

shs96c: I suggest leaving things as is and adding action on documenting navigable properly

ACTION: documenting navigable properly in the spec with browsing context

AT-Driver

Slideset: https://docs.google.com/presentation/d/1dgjLt7HdenkvNpI6UBoS6BWYvC16L4qnbDnZ1pH4Nnw/edit?usp=sharing

Lola: We have presented AT driver in the past
… and we are becoming part of this working group
… AT Driver is a protocol for doing automation of assistive technology
… the main people behind this are Lola, Mike P. and the ARIA group
… [talks about slide 5]. We have some implementations and interested groups
… we have issues that have come up, e.g. a security issue around keyboards
… what do we need to promote this to a working draft?

jcraig (IRC): thanks for handling the fundamental issue around security
… I think the utility of this project is great
… and if we can make it work in an interoperable way it will be amazing
… Apple does have a way of automating VoiceOver
… and there are unsupported paths in this case
… beyond supported and unsupported ways, there are no plans to support this on the roadmap at the moment. There are higher priorities
… I have fundamental concerns around security and design
… the security concern is that we simulate the key press
… which then goes to the screenreader which gives us full access to the machine
… from the design level we could see about changing things
… we need to have a better way to enumerate things rather than having the tests needing to figure out what to do

orkon: is there a demo of how it works and what it looks like?

jugglinmike (IRC): We are building out the infra on this and can show the harness
… the tests are written in aria-at and the harness then runs them

<jugglinmike> w3c/aria-at-automation-harness

automatedtester: we need to make sure that we follow the process, and that's a question for sideshowbarker. While we solve the design and security issues, it can stay as an editor's draft for now

sideshowbarker: the WG is out of charter and we would need to make sure that we follow the process and get it all documented properly
… we can't just have an open ended wg and deliver what we want

<Matt_King> The aria-at CG should be able to modify the current draft to address security and design concerns before the new charter is in place.

shs96c: there are 2 things. I looked at the interactions. There are no targets where interactions should happen
… from selenium people want to use their machine and hate having focus stolen. I think having a target would solve that
… and a comment... writing tests with IME in the old selenium was really painful. This is why in webdriver we have code paths for specific characters
… and we need to make sure that intent is handled in the commands

lola_: do you have an example?

shs96c: The concept of handling an alert can be different between a braille reader and other places but the intent shows that people just want an alert handled

Matt_King (IRC): This sounds a lot like what jcraig (IRC) is asking for

<Zakim> jcraig, you wanted to mention prior feedback, target, and the a possible braille misunderstanding"

jcraig (IRC): I want to make sure this spec is a success. The utility of the project is great but if we get to the point of the WD without things being fixed I will have to raise a formal objection
… as for targets, we potentially need to have multiple targets
… the target proposal is a really good one
… there was an item on a slide about braille
… screenreaders are not tied to a specific output
… and their input to the device can be different

jugglinmike (IRC): there are some assumptions that we have made
… the info being conveyed is appropriate for a screenreader
… whereas with braille it would be more passive
… e.g. what was the last thing you vocalised

jcraig: i'd see this as another accessor on the screen reader... (last spoken phrase, or current braille buffer... identical in most cases) .... that'd be a way to test aria-braillelabel for example

<Zakim> jgraham, you wanted to react to jgraham

jgraham: I think this does sound useful. This WG has always been a very specific group for browsers
… other than webdriver-bidi it doesn't feel like there is a lot of overlap

<jugglinmike> AutomatedTester: BTT is about tools that facilitate the ability to do testing

<jugglinmike> AutomatedTester: historically, BTT has meant "WebDriver"

<jugglinmike> AutomatedTester: The charter was initially to do WebDriver and developer tools

<jugglinmike> AutomatedTester: There's definitely overlap in that regard

<jugglinmike> AutomatedTester: If it doesn't have a reasonable home with people who understand the testing side of it, then it risks "falling between the cracks" of the various folks who have expertise with some part of it

Matt_King (IRC): the knowledge in this group is essential to AT-Driver being a success
… I don't think any other group would give us this feedback
… our user base is hopefully going to be the same as those who are webdriver users

Matt_King (IRC): to your point of targets we see the browser as the target
… <discusses a use case about the target being hit from a screenreader>
… what do we mean by target?

jcraig (IRC): this is about making sure we are pointing at the correct target since we are outside the process
… this might be solved in the implementation? Or is this showing the security issue?

Matt_King (IRC): do we send everything through the screenreader

jugglinmike (IRC): not sure
… I don't have a lot of knowledge there so would need to look into it

jcraig (IRC): if we allow any command through then there is a security issue since we could run AppleScript or change app
… and can get it to do internal things to the OS

<Zakim> shs96c, you wanted to react to shs96c

shs96c: it would be nice if AT Driver and WebDriver can interact together
… e.g. find element in one and pass to the other
… re: targeting it flags the window that is being automated
… Firefox says that it is under control and so does chrome. Safari has the "glass"
… and having a target prevents leaking from other windows
… and if we don't get this right it could negatively impact the rest of things we care about
… and finally there are some parallels between AT Driver and webdriver. E.g. Last APP is like getting the url of a website
… and then actions against an a11y tree

jcraig: I thought I recalled you wanted to "write" some AT prefs, like changing VO into "quick nav" or verbosity settings

jcraig (IRC): in previous meetings we didn't want to just read but be able to "write" to things

shs96c: Sending keys and hoping they go to the right place is likely to end up with them going to the wrong place
… whereas with a "what was the last thing spoken" query it is less likely to leak info

Matt_King (IRC): we would get webdriver to create the item that is spoken
… e.g. webdriver doesn't always have focus where assistive tech needs the focus

Matt_King (IRC): <describes example>

jgraham: this is analogous to writing automation at an OS level.
… and there are a lot of issues around scope and sandbox

jcraig (IRC): I will speak in defense. I do think it is possible to do this if we address the sec concerns
… as proposed, there is potential to control AT from one browser and use it to control another app on the system
… and I think my concerns are likely to be the same as the Narrator team's

<Zakim> jcraig, you wanted to mention install base differences between VO and other screen readers... much larger target for VO and Narrator than NVDA and JAWS

Matt_King (IRC): in some ways they are fundamentally different
… the tests could be platform specific
… we are reducing the work that is needed to write a test

shs96c: we have capabilities to reduce the OS level things into start up

jugglinmike (IRC): going back to the most recent discussion of sandbox/target. This speaks to the wider scope of AT Driver
… my conception is to do what the user can do like what webdriver can do
… and in webdriver people are limited to the browser
… wouldn't it be good to discuss AT Driver being able to drive anything that has an a11y tree

Matt_King (IRC): where we are now in terms of having an AT Driver protocol
… that can only have access to the browser for now is the right way to go
… it won't limit it in the future

jugglinmike (IRC): my concern is that in trying to standardise things, people will be concerned that things are too limiting?
… and there are potential issues around key codes between OSs
… and we can make the items that simplifies things for users
… and I don't have a specific question

<Zakim> jcraig, you wanted to volunteer to review the list of actions

jcraig (IRC): I volunteer to review that list of enumerated commands

shs96c: yes standards bodies are less nimble but it's worth it
… getting something started is hard but getting incremental improvements is faster
… and we started small and built it out
… we can't solve all the use cases at first

Matt_King (IRC): I understand your concerns jugglinmike (IRC) and share some of them
… we can build out something that will be easier to scrutinize

<whimboo> RRSAgent: make minutes

<JimEvans> Suppose I execute `script.callFunction` with `() => document.querySelector("iframe")`. I get back a serialized NodeRemoteValue. How do I get the `BrowsingContext` and subsequently the document hosted within that iframe?

<JimEvans> * the `BrowsingContext` of the frame and subsequently

<jgraham> JimEvans: You need `iframe.contentWindow`

<jgraham> jgraham: Which should give a `WindowProxyRemoteValue`

<JimEvans> jgraham: Brilliant. That's what I was missing.

<whimboo> JimEvans: and that latter has the browsing context, but not implemented in firefox yet
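
The exchange above can be sketched as two small helpers: one that builds a `script.callFunction` command returning `iframe.contentWindow`, and one that reads the browsing context out of the serialized window proxy. This is only a sketch. The message shapes (`functionDeclaration`, `awaitPromise`, `target`, and a `window`-typed result carrying `value.context`) are assumptions based on the WebDriver BiDi draft and should be checked against the current spec; the context ids and helper names are made up for illustration.

```python
from itertools import count

_command_ids = count(1)


def call_function_command(context_id: str, function_declaration: str) -> dict:
    """Build a script.callFunction command targeting a browsing context."""
    return {
        "id": next(_command_ids),
        "method": "script.callFunction",
        "params": {
            "functionDeclaration": function_declaration,
            "awaitPromise": False,
            "target": {"context": context_id},
        },
    }


def frame_context_from_result(remote_value: dict) -> str:
    """Extract the browsing context id from a serialized window proxy."""
    if remote_value.get("type") != "window":
        raise ValueError(f"expected a window proxy, got {remote_value.get('type')!r}")
    return remote_value["value"]["context"]


# Ask for the frame's window rather than the iframe element itself.
command = call_function_command(
    "top-level-context-id",  # hypothetical id
    '() => document.querySelector("iframe").contentWindow',
)

# Hypothetical serialized result, shaped as assumed in the lead-in.
sample_result = {"type": "window", "value": {"context": "frame-context-id"}}
frame_context = frame_context_from_result(sample_result)
```

The returned context id could then be used as the `target` of a second `script.callFunction` to reach the document inside the frame, subject to the implementation caveat whimboo mentions.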

Extension commands

<jgraham> github: w3c/webdriver-bidi#506

shs96c: one of the things people want to do is to extend the specification in certain ways. Later there is a discussion about WebUSB. Previously not all extensions were supported. So in WebDriver Classic we have a concept of extensions, and these primarily take the form of extension commands. If you are a vendor, your command begins with a well-known prefix.

<jgraham> RRSAgent: make minutes

shs96c: in web driver bidi it might be more complicated, we might want to allow specifying complete modules. For example, cdp commands could be in a cdp module.

shs96c: another place we want to do it is extensions to existing modules. E.g., adding specific methods to the browsing context. And finally the a11y conversation we would like to have additional locator strategies.

shs96c: what makes extensions for the spec different is that they will be defined in another spec. E.g., webusb webdriver extensions will be in the web usb spec.
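
One way to picture the whole-module case described above is a client that splits a method name into prefix, module and command. This is a rough sketch under a stated assumption: that extension modules are named `prefix:module` and commands are addressed as `module.command`. The names `vendor`, `usb` and `listDevices` are hypothetical, and extension commands added to existing modules are not covered.

```python
from typing import NamedTuple, Optional


class ParsedMethod(NamedTuple):
    prefix: Optional[str]  # extension prefix, None for core modules
    module: str
    command: str


def parse_method(method: str) -> ParsedMethod:
    """Split 'module.command' or 'prefix:module.command' into its parts."""
    module_part, dot, command = method.partition(".")
    if not dot or not module_part or not command:
        raise ValueError(f"not a module.command name: {method!r}")
    prefix, colon, module = module_part.partition(":")
    if not colon:
        # No colon: treat the whole thing as a core module name.
        return ParsedMethod(None, module_part, command)
    if not prefix or not module:
        raise ValueError(f"malformed extension module name: {module_part!r}")
    return ParsedMethod(prefix, module, command)


core = parse_method("browsingContext.navigate")
extension = parse_method("vendor:usb.listDevices")  # hypothetical extension
```

A router built on this could dispatch core modules to the main implementation and hand anything with a prefix to whichever spec or vendor registered it.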

AlexRudenko: what is missing from the spec?

jgraham: so in bidi at the moment we talk about extension commands but the spec doesn't say what other specs can and cannot do: 1) defining commands in existing modules 2) defining modules

jgraham: for example, if I am making a command, what is the style I should use?

jgraham: e.g., what string formatting does webdriver bidi use?

jgraham: it would also help us to have consistency

jgraham: another question for some places we need to add explicit extension points

jgraham: this is how the capability matching currently works when it delegates to the webdriver spec

jgraham: w3c/webdriver#1701

shs96c: there is an argument that we need to add extensions to existing modules. For example, extending the new features in a namespace before moving to the spec

shs96c: the guideline could be if the module is defined in the webdriver bidi spec, send PR to the webdriver bidi spec?

jgraham: for vendor extensions it makes sense that they could be anywhere like parameters on existing commands. For the stuff coming from other specs, everything I saw so far can be a separate module. So it should be a preferred way for specs to do that.

jgraham: there is an interesting non spec discussion on what tooling we need to make it work

jgraham: with cddl documents we have no way of generating the cddl derived from another spec

jgraham: there is a general agreement about it

<jgraham> w3c/webdriver#1725

<jgraham> * Topic: w3c/webdriver#1725

jfernandez: regarding the syntax in extensions, there is a conflict between one that was created by Chrome. It is SPC Transaction Mode.

<jgraham> * github: w3c/webdriver#1725

jfernandez: I added a new extension command in order to define that permissions should not be required enumeration (?)

jfernandez: there is a conflict in recommendations and it makes no sense to have two different enumerations with different formats

jfernandez: the extension we are implementing is not in the spec so we can do whatever we want

jfernandez: the only important thing is that we have an agreement on the format

jfernandez: what syntax do we want for enums?

jfernandez: there is an agreement that it should be camel case

jgraham: did an audit of what webdriver does

jgraham: in webdriver bidi we use camelCase for things that look like js code (further details: w3c/webdriver#1725 (comment))

jgraham: we use lowercase in a bunch of places

jgraham: and there are cases where we have taken the format from JS

<jfernandez> for reference, these are the casing rules defined in the W3C design principles

<jfernandez> https://w3ctag.github.io/design-principles/#casing-rules

jgraham: proposal is that those values are camel case and we fix realm values

jgraham: what do people think about changing realm types?

jgraham: we could make bidi locator strategies match the new convention

jgraham: compared to lowercase, camel case allows automatic conversion

jgraham: with lowercase it is more difficult
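
The conversion point can be illustrated with a short sketch: a camelCase value keeps its word boundaries, so a client can derive other casings mechanically, while an all-lowercase value gives a converter nothing to split on. The value `sharedWorker` is only an illustrative input, not a claim about what the spec currently uses.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def camel_to_pascal(name: str) -> str:
    """Uppercase the first character, leaving the rest untouched."""
    return name[:1].upper() + name[1:]


snake = camel_to_snake("sharedWorker")        # word boundary is recoverable
pascal = camel_to_pascal("sharedWorker")
unchanged = camel_to_snake("sharedworker")    # boundary information is lost
```

With the all-lowercase input a client library would need a hand-written lookup table to produce idiomatic names in its own language.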

<Zakim> shs96c, you wanted to add some background on how the casing used in webdriver came about. Happy to skip that if there's no interest

shs96c: originally webdriver was taking Java objects and the naming conventions matched Java

shs96c: with the exception when we needed to use keys in a dictionary

shs96c: it was before the design guidelines existed

shs96c: for locator strategies we can change them as they are not spec'ed

shs96c: similarly for error codes we could change it. Jim and I will be most affected but it is manageable.

shs96c: I see no potential problems following James' proposal

sadym (IRC): there are some clients that implemented parts of bidi selenium and webdriver. If we rename it, we should do it as early as possible

shs96c: for selenium it would not be much effort. Cannot speak for webdriver.io

shs96c: if it is a breaking change worth making for consistency, we can do it because it is still early in the spec and now is the time for changes like this

sadym (IRC): is the selenium version tied to the browser version?

shs96c: there is a protocol converter that allows mapping enums

shs96c: similarly we also don't pass locator names unchanged

AlexRudenko: it is manageable if it is not a difficult rename (only format changes). We could be breaking things for a while but we are not shipping it for the users so can afford it.

jgraham: sounds like we have agreement on the first part that we need to adopt camel case

jgraham: changing chromium/firefox/tests sounds like a lot of work

jgraham: making breaking changes is painful and hard

jgraham: for the value types and the error types, we just say it is inconsistent for historical reasons

jgraham: in practice no one is probably affected by the format

jgraham: for the value types we can make an argument that it matches the constructor name conversion so that they match HTML and EcmaScript

jgraham: that makes us consistent with other specs

jgraham: hopefully no one in other specs introduces case sensitive identifiers which would make us ambiguous

Jim Evans: for enumerated values it is not a big deal

Jim Evans: it is easy to map values in enums

* in enums (for me)

jgraham: what is the effort for renaming

sadym (IRC): serialization from our side could be complicated to rename but it is doable

sadym (IRC): it is not a show stopper but if we have other arguments they can outweigh it

AlexRudenko: probably WPT and serialization is the most effort

jgraham: we reached a half decision and can revisit tomorrow. Moving on to the next topic.

Adding Permissions Automation to the Roadmap

<jgraham> github: w3c/webdriver-bidi#523

<jgraham> RRSAgent: make minutes

mattreynolds (IRC): we have a webbluetooth api that has a chooser api for its permission UI

mattreynolds (IRC): we developed a webdriver api for testing but we need to communicate back the list of devices that comes from the chooser

mattreynolds (IRC): we are unable to bring the API into the webdriver classic so we are looking into an extension for the webbluetooth

* extension for bidi for the webbluetooth

mattreynolds (IRC): we are developing a proposal for the webdriver bidi extension. We eventually want to use it for WPT

mattreynolds (IRC): is there anything more to know about webdriver bidi for us?

mattreynolds (IRC): we have a pull request that adds permissions to the roadmap

shs96c: an extension sounds good

mattreynolds (IRC): what is the best way to reach out?

shs96c: this wg, e.g., via matrix

<jgraham> orkon: I'm familiar with device access API. This is probably the first extension for BiDi. You should be able to reach out to me to bring questions to the wg

jgraham: I agree that it sounds like an extension. If I understand the model, it sounds right to have the event based model where the backend sends the choices and the client makes a choice based on the event.

jgraham: this is exactly the kind of flow I would imagine in BiDi.
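
The event-based flow described here can be sketched as a client-side handler: the backend emits an event listing the devices currently in the chooser, and the client answers with a selection command. Everything named below is hypothetical (the `example:chooser.*` method names, the `prompt`, `devices`, `id` and `name` fields); no such extension has been specified. The handler picks by device name rather than list position, since the chooser list is dynamic.

```python
from typing import Optional


def choose_device(event: dict, wanted_name: str) -> Optional[dict]:
    """Return a selection command for the named device.

    Returns None if the device is not in this update, so the caller can
    wait for the next chooser event.
    """
    params = event["params"]
    for device in params["devices"]:
        if device["name"] == wanted_name:
            return {
                "method": "example:chooser.select",  # hypothetical command
                "params": {"prompt": params["prompt"], "device": device["id"]},
            }
    return None


# Two hypothetical chooser updates: the wanted device only shows up in the second.
first_update = {
    "method": "example:chooser.updated",
    "params": {"prompt": "prompt-1", "devices": [{"id": "d2", "name": "Heart Rate"}]},
}
second_update = {
    "method": "example:chooser.updated",
    "params": {
        "prompt": "prompt-1",
        "devices": [{"id": "d2", "name": "Heart Rate"}, {"id": "d1", "name": "Thermometer"}],
    },
}

early = choose_device(first_update, "Thermometer")
selected = choose_device(second_update, "Thermometer")
```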

mattreynolds (IRC): another kind of API is to bind/unbind to enable interception

jgraham: you need to subscribe to existing events

sadym (IRC): the question is not only about the device choice automation but also about other extensions. Should regular permissions be part of the bidi API?

jgraham: webdriver has a spec for permissions

jgraham: there is certainly a precedent for permissions being an extension

jgraham: but it might be that general permissions could be in the core spec. It depends?

jgraham: e.g., device choice has a different workflow compared to the yes/no permissions

jgraham: we want to reformulate classic on top of bidi, so we would need an equivalent in bidi

jfernandez: maybe I have not understood the use case completely. But it is very similar to the use case I tried to define for the protocol handlers API. SPC dialog also requires permissions. So in order to implement WPT they defined an extension command for webdriver which puts the feature into a specific mode that allows the feature to ignore the prompt.

jfernandez: it seems that there are several features that require bypassing the workflow. Another option for you would be another extension command for webdriver to bypass prompts. If that is the case, do we want a lot of extension commands for the same purpose?

mattreynolds (IRC): I would agree that for apis that do not have a chooser dialog it is a good way

mattreynolds (IRC): and the bluetooth api dialog has a dynamic list and we need to tell which item to select

mzgoddard (IRC): we talk about it as permissions but that is a chooser prompt that can get a list of many devices and they don't show up all at once. And bluetooth needs to wait for the wireless communication. The list does not always show up in the same order. And we cannot always say choose the first device and the order is not deterministic. And the second thing I would like to express: there are three other APIs, USB, Serial, WebHID, that also have the list of devices that updates based on the filter.

mattreynolds (IRC): technically there is also a font access dialog

shs96c: those things should live in their own specifications. One thing we can have is a non-normative section on how to perform common patterns so that there is consistency between related specs

Sandbox mode

<jgraham> github: w3c/webdriver-bidi#289

orkon: The problem is that Puppeteer supports a command based on CDP that allows partitioning browsing contexts into multiple profiles based on incognito modes. Terminology here is hard because of existing usage. Proposal is to call it "browsing context group". I've prepared a short draft we can discuss. Are there high level concerns about the need for this API?

* ack orkon

jgraham: Browsing context group is already a thing in HTML, but let's work out naming later.

<orkon> * ```... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/OskbYencEkxrfnjrRBJGMHRI>)

orkon: Three more operations. One to create a group. Chrome has properties that can be given to specific groups. Don't want those in specification, but it should be extensible. Group should get an id. That will close all contexts in a group without running unload handlers. Third method gets all groups. Add group parameters so you can get contexts in a specific group. Also allow defining permissions per group. Another aspect is that storage is separated between groups. User action causing new browsing context to be created causes it to be created in same group. Whether this affects on-disk storage of data could be implementation detail. Primary use case is not needing to restart browser to get isolated test environments and have easy cleanup.

shs96c: Browser groups are effectively totally independent of each other? Another model would be multiple independent sessions?

orkon: Groups are kind of independent. Shared browser process. Can connect multiple sessions to groups.
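
To make the proposal easier to discuss, here is a toy in-memory model of the operations orkon lists: create a group, list groups, list contexts in a group, and drop a group together with its contexts and storage. It is a sketch only. The class and method names are hypothetical, the second operation is assumed to be a "remove group" call, and nothing here reflects how any browser implements isolation.

```python
from itertools import count
from typing import Dict, List


class Browser:
    """Toy model of groups sharing one browser process."""

    def __init__(self) -> None:
        self._ids = count(1)
        # group id -> {"contexts": set of context ids, "storage": per-group data}
        self.groups: Dict[str, dict] = {}

    def create_group(self) -> str:
        group_id = f"group-{next(self._ids)}"
        self.groups[group_id] = {"contexts": set(), "storage": {}}
        return group_id

    def create_context(self, group_id: str) -> str:
        context_id = f"context-{next(self._ids)}"
        self.groups[group_id]["contexts"].add(context_id)
        return context_id

    def get_groups(self) -> List[str]:
        return sorted(self.groups)

    def get_contexts(self, group_id: str) -> List[str]:
        return sorted(self.groups[group_id]["contexts"])

    def remove_group(self, group_id: str) -> None:
        """Drop the group, its contexts and its storage in one step."""
        del self.groups[group_id]


browser = Browser()
group_a = browser.create_group()
group_b = browser.create_group()
context_a = browser.create_context(group_a)

# Storage written in one group is not visible from the other.
browser.groups[group_a]["storage"]["session-cookie"] = "abc"
isolated = "session-cookie" not in browser.groups[group_b]["storage"]

# Cleanup between tests is a single call; the browser object itself survives.
browser.remove_group(group_a)
remaining = browser.get_groups()
```

The point of the model is the use case stated above: isolated environments and easy cleanup without restarting the browser.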

shs96c: This seems like it might be specific to how chrome is implementing this feature. Need to talk to WebKit.

shs96c: Selenium wants fast startup time, not containerization. Sounds like this maps a bit to containers in firefox, but maybe not exactly. Might be worth going back to use cases i.e. fast startup time of independent concurrent sessions. Everyone should be behind that. I don't think we need to go this far e.g. chromedriver creates browser process and a group and new session creates a new group rather than a new group. Don't know if we need to expose this level of control to individual user. Completely new session could be a flag.

jgraham: nervous about using new session for this. Concerned about using it for this specific use case

jgraham: it is important it is supported by Webkit

jgraham: on Firefox side maybe it maps to exposing containers

shs96c: given my knowledge in WebKit it maps to profile

jgraham: the ability to have multiple private browsing windows might be Chrome only

jgraham: on the protocol level it sounds like we need to add a group id next to the context id

jgraham: does the format on the protocol level make it easy for clients to have separation?

jgraham: if it is a property on the browsing context object in theory it is possible

<jgraham> orkon: Goal is to enable separation on the browser side. Don't necessarily need clients to be able to enforce separations.

<jgraham> shs96c: Supposes that everything could support groups. We should think about the expected behaviour of not supporting groups. If that's just like calling New Session, we should consider the context e.g. is it connecting to an existing instance or are we creating a new isolated world. The features are incredibly desirable. Being able to connect to an existing browser process as a new session. Being able to have both of these things would be good. Might not need a new API. If every new group had a new socket URL you could distinguish between old and new processes.

<jgraham> RRSAgent: make minutes

<jgraham> zakim: bye

Summary of action items

  1. file an issue on how to handle alerts if there aren't any subscribers to the events
  2. documenting navigable properly in the spec with browsing context
Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: AlexRudenko, jfernandez, littledan, Lola, RRSAgent, sideshowbarker

All speakers: AlexRudenko, automatedtester, jcraig, jdescottes, jfernandez, jgraham, littledan, Lola, lola_, orkon, RRSAgent, sadym, shs96c, sideshowbarker, whimboo

Active on IRC: AutomatedTester, jamesn, jcraig, jdescottes, Jem, jfernandez, jgraham, jgraham_, JimEvans, jugglinmike, littledan, lola_, Matt_King, mattreynolds, mzgoddard, orkon, patrickbrosset, sadym, sasha, shs96c, spectranaut_, thiagowfx, whimboo