W3C

– DRAFT –
ARIA-AT AT Driver Subgroup Monthly Teleconference

08 September 2025

Attendees

Present
ChrisCuellar, jugglinmike, Matt_King, mfairchild
Regrets
-
Chair
-
Scribe
ChrisCuellar, jugglinmike

Meeting minutes

Changes to today's agenda?

Matt_King: Do we have any info from BTT about advancing the status of AT-Driver? What is an MVP spec? How close are we to that? Who decides that?

jugglinmike: I can try to get an answer from the BTT chairs at the next meeting on Wednesday

Matt_King: That would be great! I would really like to know what critical gaps we have to fill to start moving this forward before the end of the year

jugglinmike: I'll take that as an action item.

Matt_King: We'll discuss any follow-up to the ARIA-AT CG agenda.

Screen Reader Speech Interruptions

github: w3c/at-driver#94

jugglinmike: We've always known that a screen reader makes decisions about rendering text when there are multiple concurrent messages. aria-live being polite or assertive is an example, but it's also common when navigating any web page. Right now, AT-driver is ignorant of those interruptions. This isn't new to us; we anticipated that this would someday be important for us. I'm bringing it up today because of the work that Bocoup has been doing to improve the consistency of the ARIA-AT automation.
… In our observations, there appear to be cases where our tests catch multiple messages that don't seem to be rendered to a human user of the AT.
… So right now AT-driver may be reflecting what the AT is doing in a way that isn't meaningful to a human user.
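
For context, a rough sketch of the shape AT-driver reports today (TypeScript; the field names are illustrative, loosely based on the draft spec's interaction.capturedOutput event, and should not be read as authoritative):

    // Illustrative sketch only, loosely based on the draft spec's
    // "interaction.capturedOutput" event: today the payload is
    // essentially just the spoken text, with nothing to indicate
    // whether the utterance was later cut off.
    interface CapturedOutputEvent {
      method: "interaction.capturedOutput";
      params: {
        data: string; // text the screen reader sent to the synthesizer
      };
    }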

Matt_King: We saw an instance of this recently with NVDA. The bot recorded double speech but most human users couldn't detect the duplication.
… So the speed of interruption matters. It's possible that the AT renders parts of the utterance before it's interrupted. What's the most accurate way of recording this? Is it impossible to predict when this actually affects human users?
… I'm actually wondering if what we're really highlighting here is more that we discovered a limitation of relying so heavily on human review in the ARIA-AT testing protocols.

jugglinmike: It's hard to verify what human testers report.
… I'm also thinking about other possible users of AT-driver and what their goals might be, and about the case of aria-live, where it seems like this problem would come up a lot more.

Matt_King: AT-driver might not be able to detect product regressions.

jugglinmike: Incoming timestamps and priority levels of utterances might be helpful.
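
A hypothetical sketch of the enriched payload jugglinmike is describing; none of these fields exist in the AT Driver spec today:

    // Hypothetical extension, not part of the AT Driver spec:
    // annotate each captured utterance with when it arrived and,
    // if the AT exposes one at all, a priority level.
    interface TimestampedUtterance {
      data: string;                      // spoken text, as today
      timestampMs: number;               // when the AT emitted it
      priority?: "polite" | "assertive"; // speculative; may not be knowable
    }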

Matt_King: The new aria-notify API does have timestamps.
… At least this API has the concept of priority levels. I don't know if this exists generally in ATs, however. Probably not.
… I wonder if priority levels should be treated as a settings thing?
… Some utterances are the results of commands, others are from events (like aria-live). Some web apps like Google Docs heavily rely on aria-live to report what's happening.
… One of the inherent limitations of the AT-driver concept is that we don't know whether or not the speech that is presented through the driver is in response to a command or some other event.

jugglinmike: That's correct. Neither JAWS nor NVDA is technically capable of linking utterances with specific user interactions.
… Maybe the timestamp could be inferred by an AT-driver client. It's true that we can't know how interruptions are handled in the wild, but maybe we can make some assumptions based on what we know about human cognition. Maybe we can test for interruptions and make assertions about the expected spacing between utterances?

Matt_King: That would be hard to do.

mfairchild: We can probably safely assume that if the gap between two utterances' timestamps is less than 50 milliseconds, we can infer an interruption (unless the messages get queued).
… As for the issues we're running into, are the problems really obvious?

jugglinmike: Good question. I don't have the data about timestamps yet. Maybe that would be a good first step though. Capture timestamps and filter out messages that come in with too short of an interval between the timestamps.
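
A minimal sketch of that first step, assuming utterances shaped like the hypothetical TimestampedUtterance above and using mfairchild's 50 millisecond heuristic as the default threshold:

    // Hypothetical helper: flag utterances that were likely
    // interrupted. Any utterance followed by another one less than
    // `thresholdMs` later probably never finished rendering to a
    // human listener.
    function markLikelyInterrupted(
      utterances: TimestampedUtterance[],
      thresholdMs = 50, // mfairchild's suggested heuristic
    ): (TimestampedUtterance & { interrupted: boolean })[] {
      return utterances.map((u, i) => {
        const next = utterances[i + 1];
        return {
          ...u,
          interrupted:
            next !== undefined &&
            next.timestampMs - u.timestampMs < thresholdMs,
        };
      });
    }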

Matt_King: Yes, we can capture all of the utterances with their timestamps and then make decisions later about how to calculate the reliability of the captured utterances.

jugglinmike: It's starting to feel like storing this data as strings is not sufficient. We might need to talk about the data model at some point. This is connected to the issues with normalization that we had recently.
… I'm glad about where we got to today. This seems like something we can implement for the macOS voice, and it's probably a low lift for Vispero and NVDA. Do any APG patterns actually use aria-live?

Matt_King: We can look at the index by property in the APG for that info.
… There are five patterns: Alert, two different Carousel patterns, and two Date/Time Picker patterns. And Alert uses "assertive".
… But the phrase spoken is really short and it's only spoken once, so it's not useful for testing interruptions.

jugglinmike: This sounds like a test coverage issue.
… Would this fit into our roadmap at all?

Matt_King: I would love to have people submit test plans for aria-live specifically at some point in the future to address coverage gaps like this. APG doesn't cover everything, especially more atomic tests.

jugglinmike: Just to speculate on changing the Alert test to test sequences of utterances, what would a test like that look like?

Matt_King: As an aside, the other option for aria-live used to be "rude," but that became "assertive".
… But back to the question, we would just need to test how utterances are rendered with sequential longer strings of text. It would be very different from the current APG example.
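
A hypothetical sketch of such a test page, with invented strings and timing, pushing sequential long announcements into an assertive live region:

    // Hypothetical test page: push several long sentences into an
    // assertive live region in quick succession, forcing the AT to
    // decide whether to interrupt or queue each announcement.
    const region = document.createElement("div");
    region.setAttribute("aria-live", "assertive");
    document.body.append(region);

    const messages = [
      "First long announcement that takes several seconds to speak aloud.",
      "Second long announcement arriving before the first can finish.",
      "Third long announcement forcing another interruption decision.",
    ];
    messages.forEach((text, i) => {
      // Invented 1-second cadence; a real test plan would pin this down.
      setTimeout(() => { region.textContent = text; }, i * 1000);
    });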

jugglinmike: How would an ARIA-AT human tester capture the output?

Matt_King: I'm actually really curious how the ATs would perform now with a test like this. I really wonder what AT vendors would think of it.

jugglinmike: I'm guessing that the AT doesn't know whether or not an interruption happens, but on the level of platform APIs, it seems possible that the screen reader could be more aware of interruptions.

Matt_King: It's complex to think about how all of this makes its way to the actual speech synthesizer.

ChrisCuellar: It would be great to have an AT vendor here!

ChrisCuellar: We're trying to specify this, after all. How would we want this to actually work from the perspective of a human end user?

ChrisCuellar: It seems like interruptions should never happen unless there was an aria-live=assertive setting

Matt_King: By default, speech is always getting interrupted, and that's necessary

Matt_King: Imagine you are going through a list of files in Windows Explorer. You only need to hear the first two or three syllables to determine whether to continue to the next item or not

Matt_King: If you had to listen to every item queued up in full, you would never get anything done!

Matt_King: In a similar vein, a lot of keys are echoed. Screen readers can typically keep up with typing, but not always

Matt_King: So by default, speech is interrupted by subsequent key presses

Matt_King: And each AT even has an explicit interrupt key

ChrisCuellar: Is there any kind of user setting? Can you change the default interruption threshold?

Matt_King: Normally, you can only turn them on or off

Matt_King: I don't know anyone who has turned off interruptions

Matt_King: You could also just "ride" the control key. That would clear the buffer after every command in NVDA. I'm not sure what would happen in VoiceOver

Matt_King: I don't know if VoiceOver even has the ability to turn off interruptions, come to think of it

jugglinmike: So for next steps, we'll immediately specify a timestamp for speech events. We also want to explore the feasibility of a value capturing "assertiveness" or priority levels for utterances. We can also immediately extend our current consistency reports to see whether tracking timestamps and filtering for interruptions at different thresholds improves consistency.
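
A sketch of how that consistency-report experiment could look, reusing the hypothetical markLikelyInterrupted helper above; `run` stands for the utterances captured during one automated run, and the thresholds are arbitrary:

    // Hypothetical sweep over interruption thresholds: filter out
    // likely-interrupted utterances at each threshold and compare
    // which setting makes repeated runs agree.
    declare const run: TimestampedUtterance[];

    for (const thresholdMs of [10, 25, 50, 100]) {
      const kept = markLikelyInterrupted(run, thresholdMs)
        .filter((u) => !u.interrupted)
        .map((u) => u.data);
      console.log(`${thresholdMs} ms:`, kept.join(" | "));
    }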

Matt_King: How long does our test runner currently wait between sending sequential commands?

jugglinmike: It waits 1 second after navigation and 5 seconds after each key press.

Matt_King: So we wouldn't be in danger of losing or filtering out valid speech. Ok.

jugglinmike: We won't make any changes to the ARIA-AT app or the bots until we collect this info from our consistency reports first.

That's it for the day!

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).

Diagnostics

All speakers: ChrisCuellar, jugglinmike, Matt_King, mfairchild

Active on IRC: ChrisCuellar, jugglinmike, Matt_King