Meeting minutes
jugglinmike: let's review upcoming meetings. This group will meet on the 10th of February. The next BTT meeting will be the Wednesday after that
jugglinmike: two agenda items today: the future of the Windows AT Driver server, and the Activate Element user intent patch for AT Driver
Future of the Windows AT Driver server
jugglinmike: over a year ago, we built an implementation of AT Driver for Windows that uses a text-to-speech voice on the OS.
jugglinmike: Our usage hasn't panned out the way we expected. NVDA has its own add-on (built by PAC). Our server doesn't work for JAWS, and we don't yet know exactly why, due to a lack of resources. Windows Narrator is not on the roadmap yet either.
We still have this server code in the same repo as our VoiceOver macOS code, to reduce duplication, since they share the same CLI
Continuing to maintain the Windows server is a burden. We can continue, but Bocoup would like more clarity from this group on the roadmap to determine what to do next.
present+ mattking
mattking: we wanted to test vendor implementations in a generic way. NVDA's add-on has its own server built in, but we wanted the generic AT Driver server to be something PAC's NVDA implementation could be compared against. Does that rationale still hold?
jugglinmike: It does make a little bit of sense. We built this before the NVDA add-on was made, so we weren't thinking of this use case specifically. The issue with a generic server is that it's really limited in what we can do with AT Driver: sending keys and observing speech. As we develop more capabilities for AT Driver, it's unclear how we would implement them in an OS-agnostic way.
mattking: so the value proposition seems to keep shrinking. Why is it different on the Mac side?
jugglinmike: the macOS server won't be able to implement `ActivateElement`, for example
mattking: Ideally if Apple would implement AT Driver, then the Mac issue would go away
Given that JAWS is working on it and NVDA is functioning, it doesn't make sense to maintain the Windows server. Especially since Narrator isn't on the roadmap; if it were, we could ask Microsoft to implement AT Driver
jugglinmike: that would be ideal for the future of AT Driver
I still think the underlying motivation is compelling, so I'm sorry to leave it behind. It seems like the most authentic way to build this: there's less room for false negatives and false positives in this testing environment.
mattking: can you clarify?
jugglinmike: Let's say Apple does implement AT Driver; they could implement a command differently from what a user might expect.
jugglinmike: There may be issues down the line for AT Driver commands that try to change AT settings depending on the OS
mattking: there are currently keyboard commands for changing settings on most screen readers
mattking: for example, in NVDA, you can change the punctuation level with a keyboard command that uses the INSERT (NVDA) key
mattking: getting information from the screen reader is the harder thing in terms of dealing with AT settings.
Do we write anywhere in the AT Driver spec about implementation guidance for when the AT Driver response differs from the expected user response?
How could we test for that, too?
jugglinmike: I think that expectation is kind of implicit. We've decoupled the idea of a command from a response: everything is just an event. We're not saying commands have to behave in certain ways; they're not closely connected to responses. That would be more like the WebDriver architecture.
mattking: I wasn't fully aware of this architecture. When we send commands, I thought we still expected a response?
jugglinmike: No. A command is about simulating the user's input. We don't know whether there will be a response, what it will be, or on what timeline.
mattking: A command for "Say Element" should have a response, for example.
jugglinmike: I agree, and I'm thinking about how that would work in a protocol sense. The screen reader is at liberty to say anything at any time. How do we handle asynchronous responses at the protocol level? We'd have to specify some kind of buffer for responses
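For readers following along, here is a minimal TypeScript sketch of the decoupling described above, written against the `interaction.pressKeys` command and `interaction.capturedOutput` event from the current AT Driver draft. The endpoint URL and exact message shapes are illustrative assumptions, not normative.

```typescript
// Sketch: commands and speech are decoupled. A command acknowledgement only
// confirms that input was simulated; speech arrives later as independent
// events, on no guaranteed timeline.
import WebSocket from "ws";

const socket = new WebSocket("ws://localhost:4382/session"); // placeholder endpoint
let nextId = 1;

socket.on("message", (raw) => {
  const message = JSON.parse(raw.toString());
  if ("id" in message) {
    // Response to a command we sent: the key press was simulated,
    // but this says nothing about whether the AT will speak.
    console.log(`command ${message.id} acknowledged`);
  } else if (message.method === "interaction.capturedOutput") {
    // Speech event: it may be caused by our command, by the application,
    // or by nothing we did; a client must buffer and correlate these itself.
    console.log(`speech: ${message.params.data}`);
  }
});

socket.on("open", () => {
  // Fire-and-forget: nothing in the protocol ties this command to any
  // particular capturedOutput event that follows.
  socket.send(
    JSON.stringify({ id: nextId++, method: "interaction.pressKeys", params: { keys: ["down"] } })
  );
});
```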
mattking: Somewhere in the spec, it seems like we should specify the expected user-facing response for a given command. It should be a normative requirement.
jugglinmike: It's something that will come up as we develop out more user intents
mattking: It could apply to `PressKeys` too. The advantage of the AT Driver server is that it's closer to an actual user experience in terms of the relationship between commands and AT responses. We should have something in the spec that says that.
jugglinmike: Ok, I will keep that in mind. Maybe we can come back to it later.
It does seem like we can generally drop Windows support for now.
mattking: We'll only revisit if circumstances change
jugglinmike: it will be in the git history
The `ActivateElement` user intent in AT Driver
github: w3c/
jugglinmike: the idea for this PR was largely hammered out at TPAC. We wanted a way to activate the element under the current virtual cursor, to facilitate use cases where the user needs to interact with form fields without sending arbitrary key presses. With Activate Element, we can move the virtual cursor to the desired element, and then use WebDriver to interact with that element.
I initially wanted to use a lot of the web platform concepts to talk about focus and elements, but when I started editing the spec, I realized that only makes sense for WebDriver, not for AT Driver. AT Driver doesn't have concepts of browsers or browsing sessions; WebDriver knows about the DOM and browsing contexts. Do we want that in AT Driver or not?
I didn't want to go down that route because it involves a lot more spec writing. I also think it's not quite appropriate for the AT context.
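As a rough illustration of the workflow just described: the AT Driver command name `interaction.activateElement` below is hypothetical (the PR may settle on different naming), and the WebDriver half uses selenium-webdriver. This is a sketch of the intent, not the final protocol.

```typescript
// Hypothetical sketch: navigate the reading cursor with AT Driver, activate
// the element under it ("interaction.activateElement" is a placeholder name
// for the user intent proposed in the PR), then hand off to WebDriver to fill
// in the field without arbitrary simulated key presses.
import { Builder } from "selenium-webdriver";
import WebSocket from "ws";

let nextId = 1;
function sendCommand(socket: WebSocket, method: string, params: object): void {
  socket.send(JSON.stringify({ id: nextId++, method, params }));
}

async function fillFieldUnderCursor(socket: WebSocket, value: string): Promise<void> {
  // 1. Move the screen reader's reading cursor onto the form field
  //    (abbreviated; in practice this is a sequence of navigation commands).
  sendCommand(socket, "interaction.pressKeys", { keys: ["down"] });

  // 2. Activate the element under the cursor so keyboard focus lands on it.
  sendCommand(socket, "interaction.activateElement", {});

  // 3. Use WebDriver, not simulated AT key presses, to type into the
  //    now-focused element.
  const driver = await new Builder().forBrowser("firefox").build();
  await driver.switchTo().activeElement().sendKeys(value);
}
```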
mattking: Are we stuck on Mac development without this?
jugglinmike: Not exactly. If Apple won't implement 'Send Keys', we're still stuck.
mattking: I thought this would allow us to use WebDriver for 'Send Keys'
jugglinmike: Yes, but not for the interesting part in terms of AT Driver.
mattking: In the current macOS implementation, doesn't AppleScript give you a way to activate an element? We're doing it right now.
jugglinmike: Yes, but it's not recognized at a protocol level.
mattking: so we don't have a concept of a session or a document. What would the workarounds be?
jugglinmike: Right now, the PR's algorithm just instructs the implementor to perform the default action for the item under the virtual cursor.
mattking: we should use a different term
jugglinmike: are we headed in the right direction?
Beyond the general direction, is it a good thing that the PR is so agnostic about the browsing context? It could be useful, because it would be equally viable in other contexts, like native apps.
mattking: But "default action" is not enough. Did we have the discussion about "Activate Element" versus "Focus Element". Moving the Screen Reader Point of Regard does not necessarily activate the element. For instance, it doesn't click a button, but we might need to just move focus to the button.
jugglinmike: The motivating use case here is mostly for form fields, not buttons.
mattking: Say you want to test NVDA's response when reading the current element, but in NVDA focus mode rather than browse mode; do we have a way to move focus to the desired element in NVDA? In ARIA-AT, we have scripts that move focus and the reading cursor around.
jugglinmike: I don't know
mattking: so right now in ARIA-AT, we rely on scripts to set up the test conditions. But if someone were walking through a workflow with AT Driver, they might need focus-setting user intents.
jugglinmike: With moving focus, is that just basically using TAB?
mattking: it can be using TAB or arrow keys
jugglinmike: I think we should have an intent for anything a user might do, so yes we would have intents for focus movement
mattking: so where is "Default Action" defined?
jugglinmike: It's currently not defined. I'm not as familiar with the AAM specs. Is that something we need to define ourselves, or is there something we can normatively reference?
mattking: I know there are APIs with method names like `performDefaultAction`, but I can't answer that definitively right now. The concept of a default action is understood, but I'm not sure how, or whether, it's defined anywhere
jugglinmike: I can take this as an action item for myself, to get more understanding here. I can bring it to Valerie from Igalia, who is an editor of Core-AAM
mattking: So WebDriver doesn't have anything like "default action" in it?
We might have to normatively define it in WebDriver terms.
jugglinmike: As much as I want to leverage existing work and familiarity with web concepts, I'm not sure we can do that without dragging in a concept of a browsing session again.
mattking: It seems like it's hard to define this without bringing the concept of a browsing context.
jugglinmike: It seems like the only way to make this less hand-wavey is to bring in a lot of web platform concepts. I'm not sure these concepts will be helpful to a screen reader implementor.
mattking: I think we need the screen reader implementors to weigh in here.
jugglinmike: Is this something I should push on ahead of the February meeting?
mattking: It's going to be hard to move things along in the BTT meeting without some earlier discussions. Some things you can do in the near term: use "Reading Cursor" or "Screen Reader Point of Regard" instead of "item in Virtual Cursor". We also need some definition of "Default Action"; the word "item" in the AT's point of regard needs to have meaning. And we need to make sure we're not writing this in a way that's too specific to screen readers.
Zakim, end meeting