19:04:42 RRSAgent has joined #aria-at
19:04:46 logging to https://www.w3.org/2023/03/20-aria-at-irc
19:05:06 Zakim, start the meeting
19:05:06 RRSAgent, make logs Public
19:05:07 please title this meeting ("meeting: ..."), jugglinmike
19:05:19 MEETING: ARIA-AT Community Group Automation Workstream
19:05:25 present+ jugglinmike
19:05:29 scribe+ jugglinmike
19:06:39 Topic: Considering the direction of automation
19:06:56 present+ Matt_King
19:07:34 present+ Michael_Fairchild
19:09:56 mzgoddard has joined #aria-at
19:10:17 jugglinmike: When do we want to allow/disallow automation in the collection of AT responses?
19:10:50 Matt_King: I'm going to make some proposals to change the glossary to separate "test results" into "AT responses" and "response analysis"
19:11:35 Matt_King: The "test case" (the thing you're testing) and the assertions are designed to be AT-agnostic (for the set of ATs which are in scope for the command)
19:12:11 Matt_King: Then you have commands or events which generate responses. Right now, we use the word "output", but we might want to use the word "response" instead.
19:12:34 Matt_King: I think those responses should be called "command responses" because they are tied to a command. That's one proposal
19:13:13 Matt_King: Command responses aren't part of the test. Running the test *generates* command responses
19:13:30 Matt_King: Should the analysis be considered part of "running the test"?
19:14:09 Matt_King: Another proposal I have for the glossary is to label the analysis of command responses "verdicts"
19:14:18 present+ mzgoddard
19:14:26 mzgoddard: Does that include unexpected responses?
19:14:33 Matt_King: No
19:15:32 Matt_King: Unexpected behaviors are an attribute of the response
19:18:17 Matt_King: I think there are two aspects of unexpected behavior: a token which documents whether it's present and what kind it is, and a textual description
19:20:22 Matt_King: When it comes to assertion verdicts, I think the realm of automation is pretty limited
19:21:34 Matt_King: Having automation simply detect congruence with previously interpreted responses is reasonable
19:22:28 Michael_Fairchild: In the future, we might be able to train an AI model to perform the analysis, but that seems far off right now
19:24:20 github: https://github.com/w3c/aria-at/issues/909
19:31:43 jugglinmike: Verdicts need to be tied to the content of the tests. A change to a test would mean the system should not "trust" or "reuse" verdicts previously reported by a human
19:32:16 Matt_King: I was anticipating that almost everything here is versioned
19:34:15 Matt_King: If we get feedback from an AT vendor that they want some assertion to change, that drives a change to the test plan
19:34:30 Matt_King: If a screen reader command changes, that also drives a change to the test plan
19:37:35 Matt_King: What if we change a test plan and want to generate new results for the updated test plan? We would expect the automation to collect all the responses, but in the cases where the assertions have changed, it would say "I don't have a verdict"
19:42:14 Matt_King: We need to be able to insert a test between "test 1" and "test 2" without invalidating either of their verdicts
19:42:45 Matt_King: Could changes to instructions ever invalidate a verdict?
19:42:58 Matt_King: The setup scripts definitely could. Ditto for the commands and the assertions
19:43:46 mzgoddard: Whatever we land on, we can agree that there is a group of traits which we can use to look up prior verdicts and reuse them.
19:43:55 Matt_King: Exactly
19:45:16 Matt_King: Automation is allowed to run a test, execute commands, and collect responses. In the event that human-approved verdicts exist, and IF those verdicts are still applicable, then assign those verdicts to the responses.
19:46:37 Michael_Fairchild: I hesitate to say that any change to the APG would invalidate the verdicts
19:49:59 Michael_Fairchild: It ought to be enough to look at the AT response, because a meaningful change to the APG example would change the AT response
19:52:54 jugglinmike: But if the source changes meaningfully and the AT incorrectly does not change its output, then the system would be "fooled" into reusing a prior verdict which is actually no longer valid
19:58:24 jugglinmike: It seems like these considerations need to be addressed in the working mode. Is that right?
19:59:49 Matt_King: I'm worried about the working mode becoming too opaque. I think the working mode is more for humans to understand the high level. Maybe certain sections of the work could link to separate documents which get into the mechanics
20:00:21 Matt_King: These considerations definitely need to be documented, though I'm not sure how yet
20:00:40 Matt_King: The first thing I want to do is update the glossary and then use that to inform how we want to update the working mode
20:01:17 Matt_King: There is a need for stakeholders (e.g. AT implementers) to understand the working mode. Like, "what's the appeals process?"
20:01:39 Matt_King: Complexity will be the enemy of clarity there
20:02:39 Zakim, end the meeting
20:02:39 As of this point the attendees have been jugglinmike, Matt_King, Michael_Fairchild, mzgoddard
20:02:41 RRSAgent, please draft minutes
20:02:42 I have made the request to generate https://www.w3.org/2023/03/20-aria-at-minutes.html Zakim
20:02:49 I am happy to have been of service, jugglinmike; please remember to excuse RRSAgent. Goodbye
20:02:49 Zakim has left #aria-at
20:02:53 RRSAgent: leave
20:02:53 I see no action items
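
[Editor's note: the verdict-reuse scheme discussed at 19:43:46–19:45:16 — look up prior human-approved verdicts by a group of identifying traits, and fall back to human review when any trait has changed — could be sketched roughly as below. This is an illustrative sketch only, not part of the minutes or the ARIA-AT codebase; the names `verdict_key` and `reuse_or_flag`, and the choice of traits (setup script, command, assertion, AT response), are hypothetical.]

```python
import hashlib
import json


def verdict_key(setup_script: str, command: str, assertion: str, at_response: str) -> str:
    """Hash the traits that must all be unchanged for a prior human verdict to apply.

    The trait set here (setup script, command, assertion, collected AT response)
    is an assumption for illustration; the group discussed that changes to any
    of these should invalidate reuse.
    """
    payload = json.dumps(
        {
            "setup_script": setup_script,
            "command": command,
            "assertion": assertion,
            "at_response": at_response,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reuse_or_flag(prior_verdicts: dict, setup_script: str, command: str,
                  assertion: str, at_response: str) -> str:
    """Return a previously human-approved verdict only if every trait matches;
    otherwise report that a human verdict is still needed."""
    key = verdict_key(setup_script, command, assertion, at_response)
    return prior_verdicts.get(key, "needs human verdict")
```

Under this sketch, inserting a new test between two existing ones would not disturb their keys, and a changed assertion or setup script would miss the lookup and route the response back to a human, matching the behavior Matt_King described at 19:37:35 and 19:42:14.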