W3C

– DRAFT –
ARIA-AT Automation Subgroup Meeting

26 June 2023

Attendees

Present
jugglinmike, Matt_King, mmoss
Regrets
-
Chair
Mike
Scribe
Sam_Shaw

Meeting minutes

<Sam_Shaw> Agenda: 1. Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group 2. Defining the workflow for automated AT response collection in ARIA-AT

We are not meeting next week as it's a holiday. Our next meeting will be July 17th.

Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group

jugglinmike: We presented on the status and future of AT Driver. Meeting notes can be found here: https://www.w3.org/2023/06/14-webdriver-minutes.html#t06

jugglinmike: I wanted to know if anyone had any lessons learned they could share from developing WebDriver BiDi.

jugglinmike: I asked about the possibility of bringing the AT Driver spec into the proposal so it's more formal and gets on the standards track. David Burns was very open to this. That working group is re-chartering in August, so this is a good time to do this.

jugglinmike: Another member of that group, Sam from Apple, said they would like to see discussion with implementers. I've spoken with Mac from NVDA, and we have it on our to-do list to talk to Vispero. Lola is going to work on setting up these conversations.

jugglinmike: I'm happy to answer any questions on this now or in the future via email

MK: How many people attended?

jugglinmike: I would say 10

MK: Let's talk offline about how much involvement we need from implementers. This is a time-sensitive topic; we should discuss this with Vispero tomorrow.

MK: We haven't talked about what level of commitment we want from Vispero

MK: We want to be careful about approaching them from different angles

jugglinmike: It may then be that we want to hold off and let Lola be more intentional about how to reach out to Vispero

Defining the workflow for automated AT response collection in ARIA-AT

jugglinmike: We began discussing this last week at the CG meeting

Here is the design document we have been iterating on: https://docs.google.com/document/d/10681gMmTM2KmVEw_Je-A4nmRvcqIkB6jgtJ8dt_Pp0M/edit

jugglinmike: Right now we are focused on the workflow proposal section

jugglinmike: What it could and should look like to operate the ARIA-AT app in a world where collecting AT responses is possible

jugglinmike: Where we paused last meeting was a suggestion from Matt that instead of making control of this system a capability of the test admin, we should implement it as an ability of a tester, who could optionally invoke the system to collect responses.

MK: The primary reason I was thinking that way was longer-term strategy, to simplify things. I originally introduced the idea that the bot is like another person; however, then we discussed analyzing responses.

MK: It really started to feel to me like this should be another tool available to anyone, and the job of the tool is to collect responses initially.

James Scholes present+

JS: Right now we assign two or more testers to a test plan. They perform the tests and assign verdicts. This has proved to be prudent because not only do we end up with differing verdict assignments, but we also end up with differing AT responses.

JS: If the AT is collecting these responses, we are eliminating a second person's interpretation. We are just collecting two sets of verdicts.

MK: I disagree; I wouldn't want a tester to not run the test themselves.

JS: So you are saying a job of the tester is to compare two sets of test results?

MK: They can say the bot's results are correct; if not, they can raise an issue.

jugglinmike: If that's an expectation, then as a tester what's my motivation to run the test at all?

MK: It saves me time.

MK: I don't have to start the test, pause it, compare results

JS: I'm not sure that's true; right now the tester runs the test and tracks the output. We would be asking them to run the test and compare the results.

JS: We should assume some testers will want to note their responses and then compare them, and in that case testing would take longer. Comparing two strings would take time.

MK: If we wanted to, we could certainly have one tester collect responses manually and one collect AT responses, and not have them do any comparing.

MK: Our test admin should work through the test to make sure the tests are designed to make them efficient

MK: Maybe the way we release this is that testers at PAC and I are the only ones who have the button to automatically collect responses.

MK: Then if there are conflicts with the manual testers, we resolve them. Over time we can reveal this button to collect results with the AT to more and more people.

MK: Right now this feature will only be available for NVDA, right Mike?

jugglinmike: Yes

JS: I just don't want to couple the time it takes a manual tester to collect results and the time it takes the AT to collect results.

JS: It seems unnecessary to have to wait for the AT to complete a test.

jugglinmike: If it's just the test admin running tests with the automation, that's simpler.

JS: I'm concerned we're already asking testers to do a lot, and this would add more confusion and complexity.

jugglinmike: There would also be an increase in coordination to run tests, depending on the timing of the AT running the tests.

MK: I was assuming running a test plan would take 5 minutes?

JS: I don't know where that is coming from.

JS: If the tester is just reviewing verdicts, then there is a huge time savings

MK: I don't want testers assigning verdicts without running plans themselves

MK: A tester would have to correctly interpret all of the text output, which seems like a leap

JS: I think that depends on the tester; some will recognize output that they hear every day. However, with JAWS output, I wouldn't recognize the output because I don't use it daily.

MK: I don't think I can make a judgement about a verdict without running the test and getting the context

JS: That's what I'm talking about; I would be comfortable assigning verdicts for NVDA, but not for JAWS.

JS: I don't think we should jump to a tester just reading a response and assigning verdicts. We agree on this. We still have some testers who aren't daily users of the AT; we couldn't rely on them to make these verdicts.

MK: I think if the AT could run all the tests, we could have a test admin use the output to assign all the verdicts.

MK: Then humans get involved with comparing results from AT and human testing. That is where the huge gain is; the AT can run tests for every new version of browsers or AT that is released.

JS: I think we need to be aware of minor differences between how a human tester and the AT collect responses, like a space at the end of the result.

MK: One way we could fix this fairly quickly is to have automation run new tests, then have a test admin review a spreadsheet to compare the responses quickly and determine if there are real conflicts or just editorial ones. Maybe that's the first thing to build, Mike.
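
As a rough illustration of the comparison being described here (this is not part of the ARIA-AT app; the normalization rules are assumptions made for the sketch), a check that separates editorial differences from real conflicts between a human-collected response and an automation-collected response might look like the following TypeScript:

    // Hypothetical helper, not the ARIA-AT implementation. Trimming, collapsing
    // whitespace, and ignoring case are illustrative normalization assumptions.
    function normalize(response: string): string {
      return response.trim().replace(/\s+/g, " ").toLowerCase();
    }

    type Conflict = "none" | "editorial" | "real";

    function classifyConflict(human: string, automated: string): Conflict {
      if (human === automated) return "none";
      return normalize(human) === normalize(automated) ? "editorial" : "real";
    }

    // A trailing space is only an editorial difference; different wording is a real conflict.
    console.log(classifyConflict("menu item, banana", "menu item, banana ")); // "editorial"
    console.log(classifyConflict("menu item, banana", "list item, banana"));  // "real"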

MK: Maybe the idea isn't to speed up human tests

MK: The human tests have all sorts of important aspects; we need to have the human fully engaged. Is that where we are aligned, James?

JS: I would love to speed up human testing, but I don't think the proposal says that. I think we are aligned on the advantages of having the AT re-run tests where there are small changes or an update.

MK: Let's agree on the goals of the MVP.

MK: So it would just be re-runs of existing plans, and an admin can assign a bot to them.

MK: I'm trying to imagine how this works for the admin

MK: Let's just consider new AT versions, for example.

MK: You add a new AT version; that updates all candidate, draft, or recommended plans that exist to need a retest.

MK: You can manually identify the tests to rerun.

MK: We wouldn't need to automate this; you could review the recommended report column, open the dialog, and add the missing ones to the test queue. When you do so, you could have the option of running with the bot.

MK: Then it would show up in the test queue, and if it shows up with no conflicts, maybe we can publish automatically.

MK: Maybe we still need a button to accept as complete

MK: We could just do that one use case: adding a new version of a test based on a new AT version.

JS: I don't think we should map to a specific test case

MK: We would then have to figure out how the system knows what comparison to make.

MK: I think that it might be intuitively obvious to the system

MK: Mike is this making you nervous or excited?

jugglinmike: It's definitely a departure from where we started writing this document. I will need some time to rethink some of these details; I don't think there's too much to change architecturally.

jugglinmike: There is some work to identify "familiar" responses

jugglinmike: One thing to confirm: we will not be using this automated response collection in the draft state?

MK: Correct.

MK: Maybe in the future, when we create a test, we would first have a bot run the test so everyone can compare results to it. However, not now, as we are hesitant about this.

jugglinmike: Another question: it's clear that when responses collected by the system match previous results, a verdict is assigned. If they don't match, should the incomplete test run show the previous AT response, the automated response, or be blank?

MK: I think it should be the original human response.

MK: We need a way to notate when responses have been collected with AT.
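
As a hedged sketch of that notation idea (the field names below are assumptions, not the actual ARIA-AT data model), each stored response could carry how it was collected along with whether it matched the earlier human-collected result:

    // Hypothetical record shape; not the actual ARIA-AT schema.
    type CollectionMethod = "human" | "automation";

    interface ScenarioResponse {
      output: string;                // the AT output recorded for the command
      collectedBy: CollectionMethod; // lets reports notate automation-collected responses
      matchesPriorHumanRun: boolean; // true when the output equals the earlier human result
    }

    // Verdicts are only carried over automatically when the automated output matched;
    // otherwise the mismatch is surfaced for a human to resolve.
    function canCarryOverVerdicts(r: ScenarioResponse): boolean {
      return r.collectedBy === "automation" && r.matchesPriorHumanRun;
    }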

jugglinmike: Does the existing UI compare results against different versions of AT?

MK: No, the test plans are agnostic.

MK: Older tests won't be in the queue.

JS: I think it makes sense to keep it agnostic; if for some reason we wanted to run a test plan with an older version of the AT, that would be great. We shouldn't prioritize changes based on AT version.

MK: I think there is great value in a test admin running a test with the AT to compare against the results of another tester, but then we are telling it to not just collect results.

MK: So you wouldn't be able to assign the bot to a test plan that hasn't been run

jugglinmike: Could a test plan be in the recommended phase with previous runs?

MK: We are limiting this to test plans that have complete test plan runs by humans

JS: In an MVP, do we need that limit? It's all test admin abilities anyway.

JS: We may not need to consider whether a test plan has been run before; the bot would just not make any verdicts.

MK: I think we need to figure this out: if the value and goal of the MVP is to rerun test plans and collect results, what is the best way to scope it so that we are building as little UI as possible?

MK: My goal is to simplify delivery

MK: Is that a clear vision for the MVP?

jugglinmike: Yes, I will need to update the design document and touch base with you again

Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).

Diagnostics

No scribenick or scribe found. Guessed: Sam_Shaw

Found 'Agenda:' not followed by a URL: '1. Update on our 2023-06-14 presentation to the Browser Testing and Tools Working Group 2.Defining the workflow for automated AT response collection in ARIA-AT'.

Maybe present: JS, MK

All speakers: JS, jugglinmike, MK

Active on IRC: jugglinmike, Matt_King, mmoss, Sam_Shaw