Meeting minutes
Review agenda and next meeting dates
https://
Matt_King: Requests for changes to agenda?
Carmen: I had a question about AT support with the bots, and another question from ChrisCuellar
Matt_King: We can definitely make room for both of those; we'll tee them up for after the agenda items that we've already planned
Carmen: Thank you!
Matt_King: Next AT Driver Subgroup meeting: Monday September 8
Matt_King: Next CG meeting: Wednesday September 10
Current status
Matt_King: The big thing this week is that the "accordion" plan advanced
Matt_King: That was a lot of work resolving conflicts! Thank you dean for your part in that
Matt_King: This bumps us up to 17 plans in "Candidate Review"
Matt_King: And we have seven plans in "draft review". Four of those are on today's agenda
Matt_King: Assuming we make good progress on these over the next week or so--I'm hoping that these seven (plus one more that's approaching "draft review") will be able to move forward
Matt_King: There was a JavaScript error with that eighth plan, though; I made a comment, and I'm hoping we can work it out
howard-e: Yes, I saw that, and I've responded with a suggestion for a fix
howard-e: We shouldn't be setting the ARIA property; we should be setting the HTML property
Matt_King: Ah, right. Why didn't I catch that?
howard-e: I also made another note in another pull request. This may be for a separate discussion, but the pattern says that something is required, but it isn't mentioned in the example
Matt_King: Ah, yes. Even in the references, it's not really a test of aria-checked; it's a test of HTML checked
Matt_King: It sounds like maybe I should go back to the APG documentation, because it might not be reflecting that fact
Matt_King: That's a good call-out. Thank you
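For context, a minimal sketch of the distinction howard-e describes, assuming the example in question uses a native HTML checkbox/switch input (the selector and values below are illustrative, not taken from the actual test plan):

```ts
// Hypothetical sketch (not the actual test plan script): for a native HTML
// checkbox/switch, set the DOM `checked` property to control its state.
// `aria-checked` is only needed for custom widgets built from non-native
// elements, so setting it here would duplicate the native semantics.
const input = document.querySelector<HTMLInputElement>('input[type="checkbox"]');
if (input) {
  input.checked = true;                          // correct: native HTML state
  // input.setAttribute('aria-checked', 'true'); // avoid: redundant on a native input
}
```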
Matt_King: Anyway, that's where we're at right now. It's very possible that we could reach 25 test plans by the end of the month
What was solution to VO accordion conflicts?
Matt_King: We landed accordion, and we discussed some of the issues last week, but we didn't discuss them in the context of a GitHub issue
Matt_King: According to the minutes from last week, dean was going to check on a different MacBook that he has with a fresh install
Matt_King: Ultimately, I don't know how this conflict was resolved. You were using the "b" key to navigate backwards, and instead of it doing "quick nav", it was doing autocomplete
Matt_King: I wanted to create a record here because I'm not sure it will be the last time we experience this problem. How did we solve it?
dean: In a follow-up e-mail, I confirmed it was doing exactly the same thing on a fresh Mac. If you start the test with "quick nav" on, and you go to that field and type the letter "b", it demonstrates the expected behavior. If you toggle quick nav first, then it demonstrates the behavior I originally reported
Matt_King: It is a confusing issue. If "quick nav" is on by default, then setting focus on a field is having a side-effect of disabling quick nav
dean: I'm guessing that's what it does, because you can toggle it on and off, and then it will work, too
Matt_King: I don't know if Apple would consider this a bug or not. It's marginal whether this is in scope for this project
dean: I think that it behaves correctly. I think the test is not really correct
dean: Because I think that when you're in a form field and you type "shift+b", you should get a capital B
Matt_King: Well if quick nav is indeed turned on, then the point of having quick nav on is that it does NOT type a "b"
dean: Okay, so then it is a problem with quick nav, then
Matt_King: The setup script does not touch any of the VoiceOver settings
Matt_King: We'll stick with the way that IsaDC handled it. I'm going to make a note to myself to ask James Craig or someone else at Apple about this
Matt_King: Maybe I'll phrase it like this: "When focus is set in a text field, should quick nav functions work if quick nav is on? Or does setting focus temporarily override the quick nav setting?"
Matt_King: Great; that's very helpful to know. Thank you
Matt_King: If Apple says this is a bug, then we might want to change this to document an unexpected side-effect
dean: Yeah, we could do that. It's only one test, as I recall
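For reference on Matt_King's point that the setup script does not touch any VoiceOver settings: setup scripts run only against the test page, so quick nav is outside their reach. A hypothetical example of that kind of script (the function name and selector are illustrative, not from the accordion plan):

```ts
// Hypothetical ARIA-AT-style setup script: it receives the test page's document
// and moves focus to an element before the tester issues commands. It has no
// access to VoiceOver settings, so it cannot turn quick nav on or off.
export function moveFocusToTextInput(testPageDocument: Document): void {
  const field = testPageDocument.querySelector<HTMLInputElement>('#example-text-input');
  field?.focus();
}
```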
Running Disclosure of Answers to Frequently Asked Questions test plan
Matt_King: There are 0 conflicts right now. JAWS and NVDA are complete
Matt_King: But the active testers are absent today, so there's actually not much to talk about right now
Running Switch test plan
Matt_King: We'll skip this as well, since the assigned testers are absent
Running test plan for Switch Example Using HTML Button
Matt_King: We have NVDA and JAWS responses available, ready for a volunteer to assign verdicts
dean: I had a question about my work here
dean: If the test is operating a switch that is off, and after pressing "ctrl+option+space" it says "no actions available", would it be untestable?
Matt_King: It doesn't sound untestable--it just didn't work
dean: So we just record the output that we got and fail it
Matt_King: Yes
dean: I can pick up the NVDA work, as well.
Running test plan for Tabs with Automatic Activation
Matt_King: None of the assigned Testers are present today, so we'll skip this, as well
Bot AT Version updates
Carmen: I have two questions
Carmen: First: should we update NVDA Bot and VoiceOver Bot to the latest version?
Carmen: The second is more about process: who should give the green light for updating the bots?
Matt_King: Can we currently pick a bot version? Or do we only support one version at a time?
Carmen: You can choose, but only from versions that we previously had working
Matt_King: In general, we always want to move to the latest version when it becomes public
ChrisCuellar: In that case, I think it's still a process question.
ChrisCuellar: At least on the NVDA and VoiceOver side (for now), we will probably notice when there is a new version sooner than the CG
ChrisCuellar: Should we flag it, bring it to the CG, get approval to update, and only then update? Or should we just go ahead and update it on our own and let everyone know?
Matt_King: I'm trying to think of any scenarios where updating the bot could be disruptive or cause problems
Matt_King: When we have plans in the test queue, and we do a new run for a new version...
Matt_King: I think that in general, due to the way we use the bots, that it's not going to be a problem
Matt_King: I don't see anything wrong with Bocoup being proactive with new releases and just informing the CG
ChrisCuellar: Before we started running our own macOS images--that was the case. With GitHub, they would just update the image without warning. Now, we have more control. And it's a little different for each bot
ChrisCuellar: But even JAWS is going to let us know about new releases
ChrisCuellar: Historically, with VoiceOver, we didn't know when it was upgrading--at least, not with major releases. We just had to keep an eye on GitHub's releases
ChrisCuellar: So far, it hasn't been a problem that we didn't know exactly which version was running
ChrisCuellar: But this leads me to what I think is a larger product question
ChrisCuellar: I was wondering if it would be better to move to a UI where an admin or tester could just select an available version of the bot to run
ChrisCuellar: Right now, there's some logic in the back-end that tries to match the best bot to the AT version that was selected for the test plan run
ChrisCuellar: ...but I'm wondering if it might be easier/simpler to just let the tester select the version that they want to use
Matt_King: Yeah, though there will probably always be cases where the tester wants to use an older version
Matt_King: It's pretty cool if we have those images there, now, and we can just let people do that
ChrisCuellar: I think with JAWS, we could request that the JAWS version be a flag that gets deployed on GitHub when we're running the test
Matt_King: So, for JAWS and NVDA, is it a script that builds the image?
ChrisCuellar: I think with JAWS, we just have a link to the latest build
Matt_King: So we're not storing a binary?
ChrisCuellar: Not currently
Matt_King: I think that might be an improvement. It feels like storing a binary that we use on-demand makes the infrastructure a little more robust (compared with relying on a link that could change)
ChrisCuellar: Definitely
ChrisCuellar: We have some changes to the bot-running UI coming up soon. We can revisit how much granularity we allow to admins and testers when they select a bot
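As a sketch of the kind of version pinning discussed here (storing the AT build we tested against rather than following a "latest" link), with the caveat that none of these names or values come from the actual aria-at automation configuration:

```ts
// Hypothetical sketch of pinning bot AT versions explicitly, instead of relying
// on a "latest" download link or whatever image happens to be current.
// All field names and values here are illustrative only.
interface BotVersionPin {
  at: 'JAWS' | 'NVDA' | 'VoiceOver';
  atVersion: string;        // the AT build the bot image was created from
  installerPath?: string;   // a stored installer/binary used on demand
}

const pinnedBots: BotVersionPin[] = [
  { at: 'NVDA', atVersion: '<pinned NVDA version>', installerPath: 'installers/nvda.exe' },
  { at: 'VoiceOver', atVersion: '<macOS release the image was built on>' },
];

// A tester- or admin-facing picker would then offer only versions in pinnedBots.
export function availableVersions(at: BotVersionPin['at']): string[] {
  return pinnedBots.filter(pin => pin.at === at).map(pin => pin.atVersion);
}
```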
<Carmen> w3c/
Add information about conflicts to the test plan status column for automated re-runs
github: w3c/
ChrisCuellar: This is kind of a gnarly product terminology question that we've been discussing a bit on the issue itself
ChrisCuellar: After an automated re-run completes for a new bot version, the way the bots and testers are rendered on the test queue right now makes things really confusing. Whenever there are response conflicts between the automated re-run and the previous report, we record the mismatch between the AT outputs and leave the verdict unassigned
ChrisCuellar: So the new run of the test plan is in a state where there are a certain number of unassigned verdicts. A human tester needs to go in and verify, maybe run the tests manually to confirm, etc.
ChrisCuellar: When that happens, it's very hard to find exactly how many tests in a test plan are in a certain state
Matt_King: Basically, we have two runs of a test plan: a prior report that's already been published, and a new run that we're comparing to that prior report
Matt_King: ...and we're comparing what the bot got for a response from the AT in the new one to what's in the prior report (whether from humans or bots)
Matt_King: This is very similar to what we do when we have two humans running it. We compare responses from person "a" and person "b"
Matt_King: The way we've done that, to date, is we've actually looked at verdict mismatches, only. When there are four conflicts, it means that there are four verdicts that don't match
Matt_King: We don't surface the response mismatches in the status.
Matt_King: That's where I think we're getting bogged down. We're talking about two different kinds of conflicts, but they have the same effect.
Matt_King: I'm hoping we can simplify things. We have four conflicting responses or four conflicting verdicts
ChrisCuellar: You were proposing to add a new kind of conflict terminology to describe the case where you're taking over an automated re-run and there are unassigned verdicts due to differences
ChrisCuellar: We were a little concerned about exactly what you just said. When we report conflicts in the "Status" column right now, it links to a conflict resolution screen, and there's a whole process
ChrisCuellar: So rather than adding a new type of conflict and muddying the meaning of the "status" column, I thought it might be cleaner to surface all of the stats that we currently surface for bot runs--to surface those for all human testers, too
ChrisCuellar: The statistic about "how many verdicts have been assigned" is what we're trying to chase right now, I think
ChrisCuellar: The human tester only has the number of tests complete.
ChrisCuellar: I don't remember the historical reasons why there are differences in how we report progress
Matt_King: In the very beginning, humans were always recording both the responses and verdicts at the same time. Historically, it was not a thing
ChrisCuellar: What's confusing about the UI right now is that when you re-assign a bot run to a human tester, it's hard to know how many conflicts you have to resolve at that point. You only know how many tests have been submitted, which is sort of a different number
Matt_King: It's really helpful to know if there were any differences in responses (and if so, how many). That tells you the magnitude of work that you have to do
Matt_King: It's the number of responses that tells you the amount of work that you have to do--not the number of verdicts
Matt_King: Once you know you have the responses settled, then rendering the verdicts is almost trivial
ChrisCuellar: To your point about not knowing how many responses just didn't match--I wonder if we keep that for human testers, too, and also report there how many responses were recorded but mismatched from the previous run--maybe that's a solution
ChrisCuellar: Essentially, I was suggesting that we keep the same stats for all testers whether they are a bot or a human because that seems more informative
Matt_King: I think if we're always talking about responses and verdicts (which is always applicable whether you are talking about bots or humans), then you could report on missing verdicts. We could say "X of Y verdicts complete" or "X of Y responses complete"
Matt_King: The idea of "X conflicting verdicts" and "X conflicting responses" would then make more sense
Matt_King: then we're always reporting the same two stats
Matt_King: The number of tests is not relevant
ChrisCuellar: Yeah, I was thinking that, too. The percent complete really reflects the assignment
Matt_King: We can get rid of that noise. We don't care about the number of tests
ChrisCuellar: To me, the "percent complete" was always very confusing
Matt_King: Let's figure out new content for the "testers" column where we only talk about responses and verdicts for both humans and bots. You could probably consolidate
Matt_King: It could be a little more compact--we could figure out a wording that's more concise
ChrisCuellar: Maybe it's better to always use percentages
Matt_King: Yeah, maybe. But we always want to show counts for conflicts because that's a statistic where the number truly matters
Matt_King: Maybe we put something like, "Progress: 40% of responses, X% of verdicts". Always put responses first, because that's what you record first
Matt_King: You might be able to make it pretty concise that way
ChrisCuellar: We'll brainstorm it a bit, but I hear you: make it concise, aim for shorter
ChrisCuellar: If we can get that progress statement as concise as possible for each tester, I think that's the goal
Matt_King: I would still want to go back to this concept of reporting the number of conflicting responses and reporting the number of conflicting verdicts
Matt_King: You could shorten it: "Conflicts: 0" or "Conflicts: 4 responses, 8 verdicts"
Matt_King: That would be pretty short
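To make the proposed wording concrete, a rough sketch of how a status cell could be composed from the two statistics (the type and field names are illustrative, not the app's actual data model):

```ts
// Hypothetical sketch of the wording discussed above: always report responses
// and verdicts (responses first), percentages for progress, counts for conflicts.
// Field names are illustrative only.
interface TesterProgress {
  responsesRecorded: number;
  responsesTotal: number;
  verdictsAssigned: number;
  verdictsTotal: number;
  conflictingResponses: number;
  conflictingVerdicts: number;
}

function statusText(p: TesterProgress): string {
  const pct = (n: number, d: number) => (d === 0 ? 0 : Math.round((100 * n) / d));
  const progress =
    `Progress: ${pct(p.responsesRecorded, p.responsesTotal)}% of responses, ` +
    `${pct(p.verdictsAssigned, p.verdictsTotal)}% of verdicts`;
  const conflicts =
    p.conflictingResponses || p.conflictingVerdicts
      ? `Conflicts: ${p.conflictingResponses} responses, ${p.conflictingVerdicts} verdicts`
      : 'Conflicts: 0';
  return `${progress}. ${conflicts}`;
}

// Example output: "Progress: 40% of responses, 25% of verdicts. Conflicts: 4 responses, 8 verdicts"
```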
ChrisCuellar: It's still a problem if we show it in the status--only one of them should be a link.
Matt_King: The whole text could be one link. "Conflicts: 4 responses, 8 verdicts" would be one link that leads to the conflicts page
ChrisCuellar: These aren't conflicts in the way that we typically think about them--we'll have a row with only one tester
Matt_King: Ah, I see. In that case, you're right
Matt_King: Now, I'm actually thinking that the percentage in the "testers" column is less helpful because the absolute numbers are better
ChrisCuellar: I do feel like the whole review process for an automated re-run is a different enough workflow to warrant a different screen/UI. I think trying to shoehorn all these concepts into the test queue as we know it is maybe a problem.
ChrisCuellar: I think that's why we have all this confusion right now
Matt_King: We did it this way because we wanted to limit the amount of new UI to develop
Matt_King: If we can solve this with content instead of code, I'd like to do that
Matt_King: I can revisit this issue with my new understanding of the problem (that is: the conflicts for automated re-runs being substantively different)
Matt_King: I'll put this on my to-do list to make a comment here
Matt_King: It sounds like we still made some good progress in terms of how we're thinking about this problem, though
ChrisCuellar: Agreed