Meeting minutes
<Matt_King> CHAIR: Matt King
Review agenda and next meeting dates
https://
Matt_King: Requests for changes to agenda?
lola: I would love to talk about the tests for the HTML-AAM features
Matt_King: Certainly. We'll discuss that as the third item (before we talk about the automated reports)
Matt_King: Next CG meeting: Wednesday May 20
Matt_King: I've been listing the AT Driver subgroup meetings as "TBD". Are you planning to be in a position to run more of those any time soon?
jugglinmike: No plans, but maybe we can talk about what that might look like offline
Matt_King: Okay. I have information about the Sovereign Tech Fund from ChrisC. Maybe we can consider two separate applications (one for ARIA-AT and one for AT Driver). I was planning on talking with ChrisC later today
jugglinmike: Got it
Current interop reporting status
Matt_King: Since our previous meeting (and with a lot of thanks to Joe_Humbert, hadi, dean, and mmoss), the quantity spin button has advanced to candidate review
Matt_King: From a "draft" review perspective, our tank is on "empty"
Matt_King: I am blocked on bringing in the "Grid" plan because of an issue with previews
Matt_King: So we're a little stuck right now. I hope we can address that before the next meeting because it's kind of hurting our progress
Matt_King: JAWS is almost perfect (except for these invalid ones)
Bot infra training Scheduled
Matt_King: I'll be talking with ChrisC later today and then sending out more details on the training
Matt_King: It's scheduled for Tuesday May 12 at 08:00 PT
Matt_King: I don't know how far we can get with one meeting, but the intent is that we will have a series. Probably not on a regular cadence, though
Matt_King: So others can perform tasks like adding NVDA 2026 (and others) to the bot infrastructure (so that we can update reports)
Matt_King: That's what we hope to be able to accomplish within the first couple of sessions within that training
mmoss: I should be able to attend that meeting, too
Matt_King: Anyone else is welcome, too!
ARIA-AT HTML Support
lola: As some of you know, I'm working on a project called "Accessibility Compatibility Data". We're amalgamating results from WPT and ARIA-AT to help see how accessibility is supported in the browser
lola: We will be including that data in Baseline (which is MDN's kind of "you can use this feature because it is available in all the browsers"). At the moment, that project does not include any accessibility data at all
lola: They generate reports by querying the DOM only. They don't consider the accessibility tree
lola: I spoke with ChrisC and jugglinmike at TPAC about this work, and I think it is awesome. I can also see that it's incomplete. It's unclear to me how the reports for the screen readers are being done
lola: The "button" for instance, it looks like the tests are being pulled in from other test plans (mainly accordion). I have a couple of questions around that
lola: When we combine the support data, we want to be able to say "the earliest version of an AT that this is supported in is X.Y"
lola: Would this always be based on the test plans that are already complete?
Matt_King: If you can get the early version of the screen reader. That is very hard to do for some screen readers and impossible for others. Narrator is completely impossible. VoiceOver might be feasible with a virtual machine
Joe_Humbert: It shouldn't be impossible for Narrator. I have VMs for Windows 7, etc.
Matt_King: I was just looking for old versions of JAWS last night, but I didn't look to see how far back you can go. I don't think it's super far back. They make one build (the final one) for each year
Matt_King: So it is not impossible to get the screen readers. If you have the screen reader installed, we have to add it to our "supported versions" list. All the test plans would have to be run manually with those older versions. It certainly could be done. Because it would all be manual, when it comes to HTML-AAM, you would want to look at the test plans that are shortest and simplest for each.
Matt_King: I don't know if that would be worth it because I don't think it's relevant to anyone. No one I know intentionally uses an old version of a screen reader
Matt_King: Maybe JAWS users might use a 3-to-4-year-old version because they don't have the resources to update
Matt_King: I suppose we can ask Vispero for data on that
lola: That would be good to know. Otherwise, we could simply say that "the screen reader is limited to this particular version". The history can begin from today's releases
Matt_King: That's what we've been doing here in this project. We have trend data based on what we tested first
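The "earliest supported version" idea lola raises could be computed roughly as below. This is only a sketch under stated assumptions: the per-version pass/fail map, the dotted numeric version strings, and the function names are hypothetical, and neither ARIA-AT nor the Accessibility Compatibility Data project is known to store results in this shape.

```python
def parse_version(text):
    """Turn a dotted version string such as '2024.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def earliest_supported(results):
    """Return the lowest tested version that passed, or None if none passed.

    results: hypothetical mapping of version string -> bool (True = passed).
    """
    passing = [version for version, ok in results.items() if ok]
    if not passing:
        return None
    # Compare numerically so that '2023.11' sorts after '2023.2'.
    return min(passing, key=parse_version)


# Made-up example data, not real test results.
results = {"2023.11": False, "2024.1": False, "2024.3": True, "2025.1": True}
earliest = earliest_supported(results)
print(earliest)
```

Because only tested versions appear in the map, the answer can never be earlier than the first version the project tested, which matches the point that the history begins from what was tested first. The sketch also ignores regressions in later versions.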
lola: Got it. In terms of the HTML features test plan: the results are coming from test plans that have been built on APG. Is that right?
lola: For JAWS it says "must-have behaviors" is over 500. That seems like a lot, but I assume it's fed by all the test plans that use a button. Is that right?
Matt_King: Yeah. For some features, it's certainly possible for the context to have an impact
Matt_King: For things that are part of composites (like menu items or states), you may have things that are supported better within some features than within others
Matt_King: For HTML-AAM attributes, those are more likely to have variable support than elements, I believe.
Matt_King: In this project, we've kind of mapped out how we would get there
Matt_King: For all test plans, we have an assertion like "the name of 'element X' is conveyed"
Matt_King: What we have is explicit assertions. In this case, "the name of the button 'print' is conveyed" and "the name of the button 'mute' is conveyed" are two different behaviors in our system
Matt_King: We'd need an additional layer of abstraction to link those. That would require a chunk of engineering time
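One way to picture the extra layer of abstraction is a function that maps each explicit assertion to a shared (property, role) key, so that the "print" and "mute" button assertions land in the same group. This is a minimal sketch, assuming assertions follow the wording quoted above; the pattern and function names are hypothetical and not part of the ARIA-AT codebase.

```python
import re
from collections import defaultdict

# Hypothetical wording: "the <property> of the <role> '<label>' is conveyed".
ASSERTION_PATTERN = re.compile(
    r"^the (?P<prop>\w+) of the (?P<role>[\w ]+?) '(?P<label>[^']+)' is conveyed$",
    re.IGNORECASE,
)


def abstract_key(assertion):
    """Return a (property, role) key, or None if the wording does not match."""
    match = ASSERTION_PATTERN.match(assertion.strip())
    if match is None:
        return None
    return (match.group("prop").lower(), match.group("role").lower())


def group_by_behavior(assertions):
    """Group explicit assertions that describe the same abstract behavior."""
    groups = defaultdict(list)
    for text in assertions:
        key = abstract_key(text)
        if key is not None:
            groups[key].append(text)
    return dict(groups)


assertions = [
    "the name of the button 'print' is conveyed",
    "the name of the button 'mute' is conveyed",
    "the state of the checkbox 'lettuce' is conveyed",
]
grouped = group_by_behavior(assertions)
print(grouped[("name", "button")])
```

Matching on wording is fragile. A real implementation would more likely attach the abstract key as structured data when the assertion is authored, which is part of why this would take engineering time.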
lola: Would that be Bocoup's engineering time, or someone else's engineering time?
Matt_King: Anyone who wanted to take on the engineering task. Bocoup would be able to take it on the quickest
lola: I know Bocoup has a lot of context on this. jugglinmike, do you think, based on the work I've shared with you, that this abstraction work is necessary?
lola: I think it would help a lot
lola: Do you think we can get by without it for now?
jugglinmike: One part I don't know about is when/how we will write new tests that aren't based on APG examples
Matt_King: We definitely have a need for more atomic tests. Everything needs at least minimal context, otherwise it's meaningless.
Matt_King: We already have the repository carved up so we have a directory for APG and also one for HTML
Matt_King: When we create test plans where the test case is specifically an HTML element or an HTML attribute, those would go in the HTML directory. We have one right now
Matt_King: The same thing for ARIA. There are three partially-written tests at the moment. I started one for "aria-required on a text field".
Matt_King: They get super-specific
Matt_King: The semantic layer would go on top of all these tests
Matt_King: My motivation for the semantic abstraction has to do with simplifying the process of writing tests. So that they become faster and easier to write
Matt_King: It would be so much faster to compose tests, and I think an AI could almost start suggesting test plans if we had that layer. That was my primary motivation
Matt_King: When it comes to providing useful support data for screen reader users and web developers, in general, I think the context still almost always matters. For example, using the aria-details attribute, if it's supported on one thing but not another, you can't tell anyone to go ahead and start using it
Matt_King: Or the new HTML accordion features. I guess context might matter there if they work with some CSS attributes but not others. I just learned last week from Daniel that layout-block is required for summary to work like a heading
Matt_King: Maybe that's purely a spec thing and not a support thing. I don't know the answer to that, yet
lola: I'm speaking to a bunch of funders. With one of them, we're in the last bits of talk; we're expecting to hear from them this week. I included HTML-AAM ARIA tests. This was something that jugglinmike helped scope out. If we were able to get funding and do this work, could it fit in?
lola: If we're just starting to think about atomic tests: is this a separate piece of work, or could that fit in in filling the testing gaps?
Matt_King: Anyone could write the tests. Coming up with the test cases is always the first step--having a well-scoped test case
Matt_King: If someone wants to author such a test case, we'd be glad to review it.
Matt_King: It's tricky to write them (there's a lot to learn in our current system), but there's nothing to stop them from writing tests, and we can review them
lola: So we can start doing them in a way that fits more with ACD, which seems to need the more atomic kind of test case that you were speaking about?
Matt_King: Yeah. They still have to use the format that enables the bots to run them, and they still need to follow the process that says we have alignment on which test cases are useful to screen reader developers and that they are complete (namely: navigating forward, navigating backwards, operation, and requesting information)
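The completeness criterion mentioned here (navigating forward, navigating backwards, operation, and requesting information) could be checked mechanically along these lines. This is a sketch only: the task labels and the list-of-strings input are assumptions made for illustration, not the format the bots actually consume.

```python
# Hypothetical labels for the four kinds of tasks named in the discussion.
REQUIRED_TASKS = (
    "navigate forward",
    "navigate backward",
    "operate",
    "request information",
)


def missing_tasks(test_plan_tasks):
    """Return the required task kinds that a draft test plan does not cover."""
    covered = {task.strip().lower() for task in test_plan_tasks}
    return [task for task in REQUIRED_TASKS if task not in covered]


draft_plan = ["Navigate forward", "Operate"]
gaps = missing_tasks(draft_plan)
print(gaps)
```

An empty result would only mean that every kind of task is present. Whether the test cases are useful to screen reader developers still needs the alignment process described above.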
lola: That's it for me, thanks!
Matt_King: It will be great if we can accelerate that dimension of the project!
Learning session: Completing auto-updated reports
Matt_King: This is about updating reports for test plans. We have a lot of test plans that we have run in the past for older versions of screen readers, and so for those test plans, the bot is trying to say, "get data for every test plan in the system for up to this version"
Matt_King: The latest version of JAWS available in the system is from last August
Matt_King: In conversation with Vispero, it was valuable for them to see the difference between JAWS 2026 and JAWS 2025 from last August--in terms of support levels
Matt_King: We currently have 8 test plans in the system that do not have reports for the 2025 version of JAWS. That version is readily available on their site (anyone can download it)
Matt_King: Joe_Humbert has proven that you don't need a full version of JAWS to run these test plans. You can do the work in the 40-minute window available to users of the preview build
Matt_King: If you go to the test queue, there are three buttons at the top which are filter buttons. "All test runs" (there are nine right now), "Manual test runs" (there's one), and "Automated updates" (there are 8)
Matt_King: I want to talk about the "Automated updates"
Hadi: Are we trying to test the test plans on older versions of JAWS?
Matt_King: The latest JAWS 2025--not JAWS 2026
Hadi: Why?
Matt_King: We will move to JAWS 2026, and we will have that done by the bot as soon as we add the JAWS 2026 bot to the system (we don't have it right now), but what Vispero would like to see (and what I would like to see) is data that compares the two
Hadi: They have resources; they could do it. Our mission is a little bit different. It's good for us to know the comparison between 2025 and 2026, but we have limited resources. I prefer that we focus on our main mission, not to satisfy Vispero's goals
Matt_King: There are two parts that are aligned with our mission. This is one of our first opportunities to show trend data and to show the way that ARIA-AT is having impact
Joe_Humbert: That's the version I'm currently running. I can make this easy and just run all eight test plans myself
Matt_King: I would like everyone to understand this process because we're also going to update the VoiceOver bot, though I'm not sure whether or not we want to bother running anything with macOS 15 at this point. We may delete those report runs.
Matt_King: But I do want to make sure everyone understands the basics of the process
Matt_King: When the bot runs, it compares its output to the previous output. So, the bot is running JAWS 2025, and all the reports it's comparing against are the reports we completed for JAWS 2024. If there is a difference between those sets of outputs, then it says, "I don't know if the new outputs satisfy the assertions or not", and it marks the relevant verdicts as conflicting.
Matt_King: Your job then is to review the output and ensure that the new output correctly represents the current behavior of the AT
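The comparison step described here can be sketched as follows. The data shapes, the "conflicting" marker, and the function name are hypothetical, and the reuse of earlier verdicts when the outputs match is an assumption inferred from the discussion rather than a description of the bot's actual code.

```python
CONFLICTING = "conflicting"  # hypothetical marker for verdicts needing human review


def update_report(previous, new_outputs):
    """Compare new AT output with the previous report, test by test.

    previous: {test_id: {"output": str, "verdicts": {assertion: bool}}}
    new_outputs: {test_id: str}
    """
    updated = {}
    for test_id, output in new_outputs.items():
        prior = previous.get(test_id)
        if prior is not None and prior["output"] == output:
            # Identical output: assume the earlier verdicts still apply.
            verdicts = dict(prior["verdicts"])
        else:
            # Output changed (or no earlier run): a tester must decide.
            assertions = prior["verdicts"] if prior is not None else {}
            verdicts = {assertion: CONFLICTING for assertion in assertions}
        updated[test_id] = {"output": output, "verdicts": verdicts}
    return updated


# Made-up example data, not real screen reader output.
previous = {
    "t1": {"output": "Print, button", "verdicts": {"name conveyed": True}},
    "t2": {"output": "Mute, button", "verdicts": {"name conveyed": True}},
}
new_outputs = {"t1": "Print, button", "t2": "Mute, toggle button"}
report = update_report(previous, new_outputs)
print(report["t2"]["verdicts"])
```

In this sketch only "t2" is flagged, so the tester's review is limited to tests whose output actually changed.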
Matt_King: I found one case where the bot recorded the wrong output. For "action menu button," the bot reported that there are no buttons available on this page. That might be a timing issue, but it's a new one to me.
Matt_King: Have you seen that, jugglinmike?
jugglinmike: No. We generally don't start executing tests until the page signals that it's ready
Matt_King: Ah, right. If I only went through three tests, and I only saw it once, maybe it's a true aberration
Matt_King: In any case, that was an instance where the bot-reported output was incorrect and needed manual modification
Matt_King: It's also possible that the output is truly different, and then it's on me, the tester, to make a judgment about whether the assertions are satisfied
Joe_Humbert: I run these just like I do with other test plans
Matt_King: The other thing is that the bots cannot determine negative side effects
Joe_Humbert: I should be able to get most of these done by the next meeting
Matt_King: That's fabulous, thank you! Joe the workhorse!
Matt_King: So we're all good there
Matt_King: Which means I don't have any new work for you, dean, hadi, or mmoss
Matt_King: That's my job: to get more work for you by our next meeting in two weeks' time
Matt_King: Thanks again, everyone!