W3C

– DRAFT –
Bi-Weekly Meeting of Assistive Technology Automation Subgroup of ARIA-AT Community Group

05 June 2023

Attendees

Present
James Scholes, jugglinmike, Matt_King, mmoss, Sam_Shaw
Regrets
-
Chair
-
Scribe
Sam_Shaw

Meeting minutes

w3c/aria-at - #945 - Rethinking the wording for assertion verdicts

w3c/aria-at#945

jugglinmike: The working mode doesn't have the term "verdict" yet, but it's one we intend to add.

jugglinmike: The working mode refers to verdicts as supported, not supported, etc.

jugglinmike: Automation currently refers to the verdicts as good output, no output, incorrect output.

jugglinmike: I have a proposal for a new set of terms.

jugglinmike: The proposed new terms are acceptable, omitted, contradictory.

JS: I like the new terms you proposed. In terms of bubbling up the results, I wonder if no support, partial support, supported is clearer

MK: That's why I wanted to use numbers

MK: Partial support could mean anything from a little support to almost fully supported

JS: I agree but if something is 90% supported, the remaining 10% could still make it unusable

MK: I agree. Unless we have multiple layers of assertions, we don't need numbers. We also want to be diplomatic

MK: I think your solution is pretty solid

MK: We just need to decide if we extend the use of these terms, or bubble them up

jugglinmike: Yes, bubbling up is something we need to consider. In the case where a feature is supported for every assertion except one, is it not supported? For verdicts that can be in three states, understanding why something is partially supported is tough. I'm not sure bubbling up can work if we are looking for a percent score

MK: Yeah, supported needs to be binary

JS: I think we need all three states

MK: What do the responses tell us? Either there is some support there or there isn't. Then the reason is either that someone tried to support it, or that they didn't try

MK: If you're measuring something using a percentage, then it needs to be binary

JS: For the reports, are there three levels of support or two?

MK: Any level of support above the assertion level is a percentage.

MK: At the test level and at the AT level, it will all be a percentage

MK: So we would say, using Mike's terminology: at the assertion level, if the response is omitted or contradictory, that counts as a 0. If it's acceptable, it counts as a 1.

MK: We could also run other reports that say what percent is contradictory and what percent is omitted
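
A minimal sketch (added for illustration, not discussed in the meeting) of the scoring Matt describes above, assuming the hypothetical verdict labels and function names below: an acceptable assertion verdict counts as 1, an omitted or contradictory verdict counts as 0, and higher levels bubble up as a percentage.

    from collections import Counter

    # Hypothetical verdict labels, following the terms proposed above.
    ACCEPTABLE = "acceptable"
    OMITTED = "omitted"
    CONTRADICTORY = "contradictory"

    def percent_supported(verdicts):
        """Score each assertion verdict as 1 (acceptable) or 0 (omitted or
        contradictory), then bubble the scores up to a single percentage."""
        if not verdicts:
            return None  # no data yet; cf. the later question about incomplete runs
        return 100 * sum(1 for v in verdicts if v == ACCEPTABLE) / len(verdicts)

    def verdict_breakdown(verdicts):
        """A secondary report: what percent of verdicts fall into each category."""
        counts = Counter(verdicts)
        return {label: 100 * counts[label] / len(verdicts)
                for label in (ACCEPTABLE, OMITTED, CONTRADICTORY)}

    # Example: 3 acceptable, 1 omitted, 1 contradictory assertions
    example = [ACCEPTABLE, ACCEPTABLE, OMITTED, ACCEPTABLE, CONTRADICTORY]
    print(percent_supported(example))   # 60.0
    print(verdict_breakdown(example))   # {'acceptable': 60.0, 'omitted': 20.0, 'contradictory': 20.0}

The breakdown function covers the secondary reports mentioned above (percent omitted, percent contradictory) from the same list of verdicts.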

MK: I don't know that we need to bubble up these terms in the reports we have now

MK: We don't need terms for the working mode, it's just level of support

jugglinmike: I do think the working mode uses "supported" and "not supported".

MK: I can get rid of that

MK: I have some other issues for the working mode, particularly #950. I think we need to work on another iteration of the working mode and share it with the community

MK: We could have a binary state for assertions, and get rid of contradictory

JS: I agree, but we should rewrite the terms

JS: Let's add this to the agenda for the CG meeting Thursday

jugglinmike: What I'm hearing is, we like the terms I proposed, but we may not need three terms

JS: It will make the testing easier if we just have two states/terms

MK: Okay but if this task isn't on the critical path, I want to be conscious of that

JS: This could speed up the process

MK: But it's not a blocker, we can talk about enhancements in the near future

Michael Fairchild: Is there a third state where we publish a report with some of the data missing?

JS: Not really, but we need to consider this.

JS: If there is a situation where only 50% of tests have been completed, what does that look like for a percent supported?

MK: We made a decision to change the working mode, and to get rid of the three output terms

MK: The question before we change the UI is: do we go from 3 states to 2? Acceptable, not, contradictory

w3c/aria-at - #946 - Disambiguating 'Test Plan Run' in the Working Mode

MK: I'll comment on this issue and we can move it forward outside this meeting

Review of rationale for omitting explicit references to automation from the Working Mode - we touched on this during the 2023-05-22 meeting and agreed that Matt's perspective was critical

jugglinmike: This came up two weeks ago. We were talking about my task of describing how automation layers onto the working mode, but the working mode doesn't describe automation

jugglinmike: James was not convinced of the utility of organizing our work that way

jugglinmike: As this has been a theme of my work, I want to make sure we are aligned on our direction

JS: Yes I wasn't sure what we were trying to achieve.

JS: For our tests, it doesn't matter if the responses are entered by a human or a machine. But the results may need to be checked by a human. We are a long way away from automation checking responses, interpreting them, and providing its own verdicts

JS: Even if we get to that point, in many, many years, I still think it's valuable to have a human check the responses.

JS: The automation may be able to say a response is unexpected, but it won't be able to categorize how it's unexpected

MK: I asked Boaz about abstracting the working mode. I want to make sure that what the working mode states is how the business works. It's the process for generating the spec, but it's not an operations manual

MK: I think that there are some things about how the group currently uses the app that can be written into documentation.

MK: I think later on we can decide what a human does or a machine does, but that is outside the set of principles of the work

JS: That makes sense, but let's make that very clear. For someone new to the project, we want them to understand both angles, not just one dry article that describes who does what

MK: I still think we should get the roles out there

MK: The working mode does need to specify who does what, e.g. "Directors need to approve this." The scope of the working mode needs to include the scope of authority

JS: Okay I agree. The work that happens day to day is more practical, how the app works, what it does well, etc. I do think there is a disconnect between the working mode and how we actually do things. This is partially what Mike is bringing up.

JS: We need a document that outlines governance, and another document that defines how we work

JS: The governance document is more abstract, and you can't go directly from it to an implementation. There needs to be a step in between

MK: I agree, we are slowly building towards this. The wiki work I recently did describes how we write tests and how we onboard people. We don't have much in the way of app documentation

jugglinmike: There is one thing that comes to mind that is fundamental the work Im doing. When we talk about roles, who is responsible for intitating automation? I've been assuming thats a test admins job, if that's the case then we have to talk about what the test admin is doing. Theres another framing however that changes what we build, which is the testers responsibility, can matt assign louis some tests and then louis runs that automatio[CUT]

MK: It's feature design, we can say it both ways.

MK: We could make a feature where a tester uses AT to generate a response, and then adjusts it to be correct, and submits it as part of a manual test

MK: So right now I believe we said our MVP for automation is: somebody, we didn't say who, can collect responses for a test plan run

MK: The automation will know what AT to spin up, what the tests are, and run them

MK: We can add to that an MVP Prime: if there is a previous run of that same plan and any differences exist in the responses, flag those

MK: That's so we can identify regressions. If a new version of Chrome comes out, automation can recheck everything and say yep, it's still supported
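
As an illustration only (a sketch, not the project's actual implementation), the "MVP Prime" regression check above might look roughly like this, assuming a hypothetical data shape where each run maps a command identifier to the AT response collected for it:

    def flag_regressions(previous_run, current_run):
        """Compare responses from a previous run of the same test plan with a new
        automated run, and flag any commands whose response changed."""
        flagged = {}
        for command, old_response in previous_run.items():
            new_response = current_run.get(command)
            if new_response != old_response:
                flagged[command] = {"previous": old_response, "current": new_response}
        return flagged

    # Example: a new browser version changes one response, so that command gets flagged
    previous_run = {"Tab to checkbox": "Subscribe, checkbox, not checked",
                    "Press Space": "Subscribe, checkbox, checked"}
    current_run = {"Tab to checkbox": "Subscribe, checkbox, not checked",
                   "Press Space": "Checked"}
    print(flag_regressions(previous_run, current_run))
    # {'Press Space': {'previous': 'Subscribe, checkbox, checked', 'current': 'Checked'}}

Anything flagged this way would still go to a human for a verdict, in line with the earlier point that automation can detect that a response changed but not categorize how.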

jugglinmike: So, for the short term for me: can I propose a change to the working mode that would capture a test admin collecting AT responses?

MK: I don't think we need that

jugglinmike: Right now the working mode just describes running the tests. We need to split up running tests and assigning verdicts. And we need to define the actors who will do these things

JS: The more we abstract these details, the more it becomes vague.

JS: If we're saying this level of detail needs to go into another document, that's something else

MK: So a test admin can run tests, but there's nothing in the working mode that says what a tester needs to do to run a test, even if the first thing they do is press a button and the AT runs the test.

MK: The working mode doesn't care what buttons to press or what the scope of the test is. Running a test can be, I ran a test and got the same results as the AT. We can write a manual to describe that process, which is what we do now.

jugglinmike: So what we are saying is there is no change to the working mode.

MK: Yes I don't see a need to change.

MK: The working mode says the goal of the work is to make judgements about a test: how are the screen readers behaving, acceptable or not? That is the role of the tester. The test admin's role is to make sure they agree with what the testers are doing, and to resolve conflicts when there are any. The working mode doesn't say what buttons to press or how many characters to enter

jugglinmike: So should we give running automation to just the test admin, or to everyone?

MK: Whatever you think is better and faster.

JS: I don't think human testers need to be involved in that

JS: The pattern we follow now is that, granted, testers assign themselves to tests, but for the most part we gather info on who is willing to do which tests, then we assign the tests and work to resolve conflicts. The test admin is the gatekeeper who makes sure everything stays on track

JS: We don't want people assigning themselves to things we're not ready to review

JS: I see automation in a similar light, once in place it may make this easier. The more we use the system the more it may know what we want, but there still is a manual element of having humans run tests and review conflicts.

MK: I'm good with a conservative approach, we should roll out the smallest, simplest, least risky/most useful approach. Let's not give too much power to everyone on day 1

jugglinmike: I'm envisioning that the test admin can see who has been assigned to a test plan, but now they have a new ability to say "collect new responses for this tester."

jugglinmike: As responses came in they would be entered in the correct places.

jugglinmike: If we make space for the system to have errors, then we can retry certain commands

jugglinmike: In the case where there is an issue, we can have another tester run a particular assertion and compare results

MK: I think so. We may want to do it like this: the test plan is in the queue, and instead of assigning the test, they just press a "run" button that creates an unassigned data set; when it's done, we can assign someone to it who will complete and validate the report.

MK: Please put together a design proposal and let's go through it. I think you are on the right track

Minutes manually created (not a transcript), formatted by scribe.perl version 210 (Wed Jan 11 19:21:32 2023 UTC).
