Meeting minutes
ChrisCuellar: What sets ARIA-AT apart from other accessibility testing frameworks or platforms you may have already encountered is that it is really pushing forward the concept of interoperability within screen readers themselves
ChrisCuellar: It's pushing beyond the boundaries of the browser
ChrisCuellar: This effort started with the ARIA-AT community group in 2016
ChrisCuellar: in that time, the framework has evolved to a high degree of sophistication
ChrisCuellar: this involves writing tests, running tests, and even automating test execution
ChrisCuellar: We're here to share an overview and give status updates
ChrisCuellar: We want to take a deeper dive into the infrastructure--how the tests work, how they are structured, and the underlying methodology
ChrisCuellar: And we'd like to give an indication of where the program is headed and share some pointers on how folks can get involved
ChrisCuellar: We're sharing a screenshot of the ARIA-AT app which documents support levels for our test plans
ChrisCuellar: It features a grid describing screen reader / web browser pairs
ChrisCuellar: Initially, we've been testing those pairs' renderings of design patterns from the ARIA Authoring Practices Guide
ChrisCuellar: The goal of the program is to help AT vendors improve interop through testing
ChrisCuellar: We've taken a lot of inspiration from the web-platform-tests project
ChrisCuellar: Along those lines, we're hoping to get more granular with accessibility-related features
Matt_King: What makes this different from other interop efforts is that normally, you start with a standard and have everyone test to that standard. Here, there is no standard for how ATs should behave. We're trying to solve that, but not by writing a standard. We're starting with tests as a basis to drive consensus about basic expectations
Matt_King: The biggest value to developers is having confidence that the experience you are designing is truly accessible across all platforms
ChrisCuellar: There's been a lot of evolution in this project's six-year lifespan
ChrisCuellar: We've learned a lot about the difficulty in testing against the accessibility stack
ChrisCuellar: We have a concept we've been calling the "four-mile journey"
ChrisCuellar: It describes what happens to the code that web authors write in order for it to reach AT users
ChrisCuellar: The first mile is about the code as authored by web developers. That's supported by the ARIA Authoring Practices Guide, WCAG, etc.
ChrisCuellar: The second mile is the accessibility tree. That's really the territory of browser developers. At this stage, we're able to track interop via web-platform-tests. There's been a lot of innovation in recent years around exposing that tree for testing
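For illustration, a minimal sketch of what a "second mile" check can look like in web-platform-tests, assuming the get_computed_role helper that testdriver.js provides; the element selector and expected role are made up for the example.

```typescript
// Sketch of a "second mile" check: ask the browser for the role it
// computed and exposed in its accessibility tree, WPT-style.
// These globals are provided by testharness.js / testdriver.js when the
// test runs inside web-platform-tests; they are declared here only so the
// sketch stands alone.
declare function promise_test(fn: () => Promise<void>, name: string): void;
declare function assert_equals(actual: unknown, expected: unknown): void;
declare const test_driver: { get_computed_role(el: Element): Promise<string> };

promise_test(async () => {
  const group = document.querySelector('[role="radiogroup"]')!;
  const role = await test_driver.get_computed_role(group);
  assert_equals(role, "radiogroup");
}, "radiogroup role is exposed in the accessibility tree");
```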
ChrisCuellar: At the third mile, we have the operating systems' accessibility APIs. Before we reach the screen reader, we have to pass through the operating system. There, we're dealing with AAM tests
ChrisCuellar: It gets harder and harder to access each layer we're describing here. But efforts are underway to tap into and to test the accessibility APIs. That's a new frontier for the web-platform-tests
ChrisCuellar: The last mile is what we've been talking about--it's where ARIA-AT really lives. It's the behavior of the ATs (e.g. screen readers) themselves
ChrisCuellar: We're validating that the various ATs we support (JAWS, NVDA, and VoiceOver at the moment) provide a roughly equivalent experience
Matt_King: The idea that the screen readers should behave "more or less the same" is an area that I'm sure many in attendance today will want to interrogate. That's where the ARIA-AT community group spends most of its efforts
ChrisCuellar: There are many distinct projects under active development for testing at each of these "miles"
ChrisCuellar: Since 2018, we have developed an overall approach to building consensus. I think that's the most unique part of our work. There is so much conversation between testers and AT vendors themselves
ChrisCuellar: We have a repeatable, scalable, and automatable test structure
ChrisCuellar: We have a testing and reporting platform
ChrisCuellar: And we have integrated JAWS, NVDA, and VoiceOver automation
Nigel: Is there a reason why TalkBack is not in that list?
ChrisCuellar: Why yes there is!
ChrisCuellar: It's on the roadmap!
ChrisCuellar: Mobile in general is something we started to work on this year, and we made some good progress on Android
ChrisCuellar: Likewise, we're also interested in moving beyond the English language
Matt_King: Our current scope is limited largely by resource availability
Matt_King: Our plans have shrunk over the years. Back in 2018, we were much more optimistic about the progress we would have made by this point. We wanted more screen reader/browser pairs, etc
Matt_King: This work all hinges on the availability of automation. Without that, it becomes impossible to keep up with the releases of new versions of platforms
Matt_King: So we continued narrowing our scope to a point where we could find success given our resources
Matt_King: But we've been designing everything to avoid limiting extensions to other kinds of ATs, other languages, etc
Nigel: Do ATs have a standard protocol for reporting their state?
ChrisCuellar: We're not trying to start with specs and standards--we're backing into that via testing. However, one area that is under development (one that is powering automation) is a standard protocol for remotely controlling ATs
ChrisCuellar: We're calling it AT-Driver, and it's modeled after WebDriver BiDi from the W3C
ChrisCuellar: That's enabling us to do the work driving consensus in AT users' experience
Nigel: So you've got some stimulus, you're expecting to observe the behavior, and what are you actually observing?
Matt_King: We're at the final level
ChrisCuellar: Right, "what is the attached device actually doing?"
ChrisCuellar: Initially, it started with pressing keys, but it's starting to evolve into more generic "user intents"
jugglinmike: We're actually capturing the text being spoken by the screen reader. So it's text data from the screen readers.
… AT-Driver is implemented as a WebDriver BiDi-style protocol. It speaks over WebSockets. We've implemented it in NVDA, on macOS, and in JAWS. As a separate effort, we're hoping that this will have other implementations.
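A rough sketch of what a client of an AT-Driver-style server could look like over WebSockets (Node.js with the ws package assumed); the endpoint, port, and the command and event names are illustrative rather than quoted from the AT Driver spec.

```typescript
// Hypothetical AT-Driver client: send a BiDi-style JSON command and listen
// for the speech the screen reader produced. Names are illustrative; the
// real vocabulary is defined by the AT Driver specification.
import WebSocket from "ws";

const socket = new WebSocket("ws://localhost:4382/session");

socket.on("open", () => {
  // Commands carry an id so responses can be correlated with them.
  socket.send(JSON.stringify({
    id: 1,
    method: "interaction.pressKeys", // illustrative command name
    params: { keys: ["x"] },
  }));
});

socket.on("message", (data) => {
  const message = JSON.parse(data.toString());
  // Events deliver the captured speech that ARIA-AT compares against its
  // assertions.
  if (message.method === "interaction.capturedOutput") { // illustrative event name
    console.log("Screen reader said:", message.params.data);
  }
});
```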
ChrisCuellar: Yeah! And hearing that, I was wondering: are there other implementations or use-cases that would be valued by folks here?
ChrisCuellar: What drew you here to this talk today?
florian: The use cases I've had are internationalization-related
florian: I've assumed that the well-trodden English paths are the best-tested
florian: So I'm concerned with how the accessibility tree is rendered in internationalization contexts
… I'm curious about CJK-related transforms
ChrisCuellar: So it would be useful to you to get the final output just to verify?
florian: Yeah, in a WPT-style context
… So if implementers really insist on what they're doing, we can recognize this and have a conversation
… Another case where I've wanted this is also related to CJK use-cases. Ruby is an assistive tool for sighted users who, for whatever reason, lack knowledge about the rendered text and need help interpreting it
… Bad information in this context is worse than no information
… There are language-specific things that happen with some technologies where verification is especially important
Nigel: One of my use-cases is in an implementation that sends audio description text via an aria-live=polite element
… and there's a related use-case where, if you imagine that you have a video that only has a description and it has hard-of-hearing subtitles
… In the BBC's player, we send the subtitles to the screen reader. Let's say that you have two people watching this video, and one of them can't hear, and one can't see. Your screen reader is on, and your subtitles are on. It would be really good to have a repeatable mechanism to understand the user-experience
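A minimal sketch of the live-region pattern Nigel describes, assuming the timed text comes from a video's <track> element; the selector and class name are illustrative.

```typescript
// Route timed text (audio description or subtitles) to screen reader users
// by updating a polite live region; screen readers queue the new text
// without interrupting current speech.
const liveRegion = document.createElement("div");
liveRegion.setAttribute("aria-live", "polite");
liveRegion.className = "visually-hidden"; // hidden visually, still exposed to AT
document.body.append(liveRegion);

function announceCue(text: string): void {
  liveRegion.textContent = text;
}

// Illustrative wiring to a <track id="descriptions"> element's cue changes.
const trackEl = document.querySelector<HTMLTrackElement>("#descriptions");
if (trackEl) {
  const textTrack = trackEl.track;
  textTrack.addEventListener("cuechange", () => {
    const cue = textTrack.activeCues?.[0] as VTTCue | undefined;
    if (cue) announceCue(cue.text);
  });
}
```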
Chiara: Sometimes, others think about people with physical impairments. Of course we know this work is also about people of advanced age
… We want to be sure that these experiences are designed properly
… Also, my manager asked me to implement something in a website because the target audience is over 70 years old.
ChrisCuellar: Thanks, everyone!
ChrisCuellar: Let's get into how this all works operationally
ChrisCuellar: The effort is hosted by the W3C
ChrisCuellar: And that's informed a very rigorous process design
ChrisCuellar: If you ever join a Community Group call, you'll hear Matt_King assigning Test Plans to testers (here on the slides, we're looking at a "radio group" test plan--specifically one that relies on aria-activedescendant)
ChrisCuellar: Generally, we want to have test plans executed by two testers. We're looking to corroborate the results
Matt_King: Right. The test plans are authored to be as specific as possible, but there's still plenty of room for people to make mistakes.
ChrisCuellar: So, reviewing a test plan like this one I'm sharing for JAWS and Chrome, you can see that there are a lot of instructions.
ChrisCuellar: Then, you get a list of different commands--these are steps in the test. You can see that here, we have one command for what happens after you hit the "x" key. Here, it's when JAWS is in a specific mode.
ChrisCuellar: Following that, we have the captured output from JAWS
ChrisCuellar: We're not just making assertions against the output itself. This is where the role of human testers is critical. Subjective judgements have to be made about the output
ChrisCuellar: We have a set of assertions about the output. E.g. "was the list boundary conveyed?"
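A hedged sketch of the shape of the data involved; the actual ARIA-AT test format differs in its details, but this illustrates the command / captured-output / assertion / verdict structure just described.

```typescript
// Illustrative types only; not the real ARIA-AT test format.
interface Assertion {
  statement: string;            // e.g. "The boundary of the list is conveyed"
  priority: "MUST" | "SHOULD";
}

interface Command {
  keys: string[];               // e.g. ["X"]
  atMode?: string;              // e.g. "virtual cursor active"
}

// A verdict is assigned by a human tester, not computed from the output.
type Verdict = "pass" | "fail";

interface TestResult {
  command: Command;
  capturedOutput: string;       // the speech recorded from the screen reader
  verdicts: { assertion: Assertion; verdict: Verdict }[];
}
```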
ChrisCuellar: Sometimes, people have different results, and we talk about that in the community group. The bot really helps with the velocity of this task. Today, the human testers work to verify that what the bot reports matches their own experience (rather than enter the data manually themselves)
Matt_King: The process is that: someone writes the test plan, then at least two people run it. Once we've ironed out the behaviors, we move from "draft review" to a state where we can run the test plan whenever a new version of the AT under test is released
Matt_King: Ultimately, this test is kind of defining "what do we mean when we say that the checkbox is supported?" What does that mean in real life? By writing these tests and gaining consensus with screen reader developers, everyone can have a shared understanding of what an element in HTML (or a role, state, or property in ARIA) means for users
ChrisCuellar: All of these get finalized into reports that we publish. Those give an overall sense of how the AT/browser combinations are performing
florian: This sounds reminiscent of something we used to have in Opera software for visual tests.
… Is this approach something that is or can be integrated with how WPT does tests?
jugglinmike: In WPT, there are ref tests that are somewhat relevant to this discussion. But the level of fuzziness involved in this kind of testing is different. We have a concept of verdicts in ARIA-AT. It's not enough to say that an assertion is passing or failing. The verdicts are subjective and fallible.
florian: In the pre-WPT days at Opera, we had reftests AND visual tests.
… In some cases, a fuzz factor would be sufficient
… I think we had cases in the visual context where there could be tremendous variability, and it would be obvious to a human if the result was right or wrong, but it would be very difficult to encode that in a query
ChrisCuellar: I think it might be a non-goal to get this into WPT, given that the level of infrastructure required to run screen readers seems undesirable for WPT maintainers
Matt_King: In the first 1.5 years, we researched what exists already and whether we could fit into off-the-shelf solutions. The result was that we really did need to build a bespoke solution
Matt_King: Over the years, we've learned about what we can and cannot abstract
Matt_King: We've had to make decisions about test design--how abstract or concrete to write the tests. Working with concrete tests has allowed us, in time, to see the opportunities for abstractions
florian: It seems valuable, though I'm sure it would involve a lot of work
… In CSS WG, we did not consult with the people who work on the system to learn what is feasible
… When people implemented our work, they got something wrong because there were no tests, and that was bad. Our system may have been great, but they did something else
Matt_King: We want people to be able to propose expectations for assistive technologies, place a test in this system, run the tests, and learn about the implementations' behavior
florian: I was hoping for a separation between the people writing the implementations and the people writing the tests. That's the WPT parallel I'm interested in
Matt_King: Agreed. We think this platform brings a lot of value to the community in terms of moving interoperability forward. We're trying to find the best way to work it into the needs of those working in these spaces
Matt_King: Testing new features and experimental implementations--are they potentially able to deliver the value to end-users that we want them to?
ChrisCuellar: This slide I'm showing now demonstrates how we're reporting support levels
ChrisCuellar: We've been testing against the APG patterns, and we're trying to open the door to writing other kinds of tests
Matt_King: The project actually has a lot of different needs
Matt_King: One is enlisting people to write tests for ARIA and HTML features
Matt_King: We also want to make this platform itself work--there is infrastructure and implementation work (three bots at the moment, and a desire for more)
Matt_King: If there are people who are passionate about this space and want to see assistive technology interoperability become a thing (or you know someone who is, or you know someone who can recruit talent), that would be a big help
Matt_King: You can come to us directly, or you can join the Community Group and we'll welcome you there