15:00:51 RRSAgent has joined #aria-at
15:00:55 logging to https://www.w3.org/2024/09/25-aria-at-irc
15:00:55 RRSAgent, do not leave
15:00:56 RRSAgent, make logs public
15:00:57 Meeting: Assistive Technology Interop through ARIA-AT
15:00:57 Chair: Boaz Sender, Chris Cuellar
15:00:57 Agenda: https://github.com/w3c/tpac2024-breakouts/issues/83
15:00:57 Zakim has joined #aria-at
15:00:58 Zakim, clear agenda
15:00:58 agenda cleared
15:00:58 Zakim, agenda+ Pick a scribe
15:00:59 agendum 1 added
15:00:59 Zakim, agenda+ Reminders: code of conduct, health policies, recorded session policy
15:00:59 agendum 2 added
15:00:59 Zakim, agenda+ Goal of this session
15:01:00 agendum 3 added
15:01:00 Zakim, agenda+ Discussion
15:01:00 agendum 4 added
15:01:00 Zakim, agenda+ Next steps / where discussion continues
15:01:01 agendum 5 added
15:01:01 tpac-breakout-bot has left #aria-at
16:30:21 JackieFei has joined #aria-at
16:34:33 ChrisCuellar has joined #aria-at
16:48:34 ChrisCuellar has joined #aria-at
16:50:43 present+
16:50:57 Chris Cuellar (Bocoup)
16:51:38 scribenick: ChrisCuellar
16:55:02 boaz has joined #aria-at
16:55:18 present+ boaz
16:55:20 JackieFei has joined #aria-at
16:58:32 bruce_bailey has joined #aria-at
17:00:29 Joe_Humbert6 has joined #aria-at
17:00:46 murray_moss has joined #aria-at
17:00:56 present+
17:01:08 are we in the wrong zoom?
17:01:25 Joe_Humbert2 has joined #aria-at
17:01:51 Only Murray and I are in Zoom
17:02:12 Guessing it's the wrong Zoom...
17:02:48 giacomo-petri has joined #aria-at
17:03:07 Joe_Humbert2 I got the right Zoom from https://www.w3.org/events/meetings/8b398afb-04d4-4e27-81ed-fa5cd11d9a4a/?recurrenceId=20240925T100000
17:03:15 yup
17:03:16 Jem has joined #aria-at
17:03:32 present+
17:04:09 ChrisCuellar has joined #aria-at
17:05:25 present+
17:05:33 jugglinmike has joined #aria-at
17:05:39 jocelyntran has joined #aria-at
17:05:45 boaz: we're getting started. Reminder about the code of conduct and the anti-trust guidance. We're not recording today
17:05:46 Matt_King has joined #aria-at
17:05:50 present+ jugglinmike
17:05:53 spectranaut_ has joined #aria-at
17:05:55 present+
17:05:56 TylerWilcock has joined #aria-at
17:05:58 nathan_lapre has joined #aria-at
17:05:59 present+
17:06:05 boaz: we're doing a high-level overview of the ARIA-AT system
17:06:14 MJ has joined #aria-at
17:06:17 boaz: we're doing intros
17:06:19 gsnedders has joined #aria-at
17:06:20 Patrick_H_Lauke has joined #aria-at
17:06:32 present+
17:06:52 Jamie has joined #ARIA-AT
17:07:29 estellevw has joined #aria-at
17:07:57 hdv has joined #aria-at
17:08:00 present+
17:09:11 present+
17:10:01 Ben_Tillyer_ has joined #aria-at
17:10:12 CharlesL1 has joined #aria-at
17:10:17 present+
17:10:24 present+
17:13:06 Any objections if I stop the camera from panning around?
17:13:31 No objections if you know how to stop the camera panning!
17:14:23 boaz: we're here to talk about ARIA-AT testing and our approach to testing, compared to some of the specifics we got into yesterday
17:15:04 present+
17:15:05 boaz: we're testing AT interop based on test cases from the APG. It's the brainchild of Matt King, with input from Bocoup based on experience contributing to web-platform-tests
17:15:28 boaz: I have slides with image descriptions. The slide link is in the GitHub issue for this breakout
17:15:53 boaz: we're using verdict-based testing
17:15:56 link: https://docs.google.com/presentation/d/1gyz37dYtd9IznzAjxIXYVHqtNw-quWKuBcfegU7tL80/edit#slide=id.g3042961006a_0_0
17:16:13 Adam_Page has joined #aria-at
17:16:16 CurtBellew has joined #aria-at
17:16:18 ray-schwartz0 has joined #aria-at
17:16:19 howard-e has joined #aria-at
17:16:26 Charli has joined #aria-at
17:16:28 present+
17:16:42 ChrisLoiselle has joined #aria-at
17:16:52 alisonmaher has joined #aria-at
17:17:09 boaz: We start by writing tests based on examples from the APG, then collect screen reader utterances from manual testers to determine a verdict on whether the AT conveyed the correct meaning
17:17:26 boaz: if we have a good utterance, then we have stable software
17:17:33 chrisp has joined #ARIA-AT
17:17:55 boaz: if the verdict is a bad utterance, we either change the software or the tests and restart the verdict process
17:18:20 present+
17:18:43 present+
17:19:25 boaz: if the platform changes or a new AT version ships, we have an automation bot that re-checks the verdict. If the results pass and the utterances have not changed, then we have stable software. If not, then we restart the process. This is how we do regression testing against new versions of AT.
17:19:43 boaz: any questions?
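[Editor's note: a minimal TypeScript sketch of the regression re-check boaz describes above. All names are illustrative assumptions, not the project's actual code.]

    // Re-check recorded verdicts against results from a new AT version.
    // A "verdict" here is the human judgment on whether the captured
    // utterance conveyed the correct meaning.
    type Verdict = "pass" | "fail";

    interface TestResult {
      utterance: string; // captured screen reader speech
      verdict: Verdict;  // consensus judgment from the community group
    }

    // Per the description: software is "stable" only if every verdict still
    // passes and no utterance changed; otherwise the human process restarts.
    function recheckOnNewAtVersion(
      previous: TestResult[],
      current: TestResult[],
    ): "stable" | "restart-verdict-process" {
      const stable =
        current.length === previous.length &&
        current.every(
          (r, i) => r.verdict === "pass" && r.utterance === previous[i].utterance,
        );
      return stable ? "stable" : "restart-verdict-process";
    }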
17:20:49 ljoakley3 has joined #aria-at
17:20:49 boaz: next, we will describe the ARIA-AT working mode.
17:21:05 Would someone re-share the link to the slide deck, please?
17:21:13 boaz: in the first step, we are doing research on the test plan
17:21:15 https://docs.google.com/presentation/d/1gyz37dYtd9IznzAjxIXYVHqtNw-quWKuBcfegU7tL80/edit#slide=id.g3042961006a_0_0
17:21:47 boaz: in the second step, draft test plan review, we have at least 2 testers collect utterances and get a verdict
17:22:05 boaz: step 3 is candidate test plan review, where we determine the verdict
17:22:28 boaz: here test admins ask AT vendors to make product changes
17:23:24 boaz: if there's agreement, we go to step 4. If there's disagreement, then we facilitate a conversation to determine where the change needs to happen, on the web software side or the AT side. This can restart the whole testing process.
17:25:00 boaz: if there's agreement, this brings us to the recommended test plan reporting stage (step 4). From here we go to issue triage for recommended test plan reporting (step 5). This is where we start the regression testing against newer versions of the AT. If we detect a regression, then we restart the testing process.
17:25:24 boaz: Any questions about the working mode?
17:25:35 q+
17:25:50 Is there a way to automate person A or person B?
17:26:23 q+
17:26:46 boaz: Because we're not testing against a spec, we're relying on human interpretation to come up with consensus on what the utterances should be. We're not trying to automate that away
17:26:53 q+
17:27:09 human judgement needed at the moment, which i personally welcome...
17:27:14 q+
17:27:18 Matt_King: the whole community group is making the call
17:27:19 q+
17:27:24 ack hdv
17:28:34 Matt_King: we're only testing JAWS and NVDA on Windows and VoiceOver on Mac. This is a foundational phase, but we have hopes to go way beyond that. We have to figure out how to address iOS and Android. That's a big challenge in front of us. And then going beyond screen readers
17:28:37 ack Ben_Tillyer_
17:29:05 boaz: if there's disagreement, there's conversation in the community group
17:29:22 q+
17:29:36 q-
17:29:54 jcraig: there's an example of the checkbox on VoiceOver that was resolved with further conversation... The "unchecked" state is the implicit default for checkbox, not verbosely stated in speech.
17:31:33 Matt_King: there are a lot of things that can happen when running a screen reader test. There can be lots of extra speech that comes through, because we're testing in the real world. If there are unexpected side effects, we categorize them as moderate or severe. We try to track things like unexpected cursor movement or extra verbosity. There are only two levels of severity. This is to make sure the testers have a consistent understanding among the group
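[Editor's note: a small TypeScript sketch of the record shape Matt_King's description implies; field and type names are assumptions, not ARIA-AT's actual schema.]

    // Exactly two severity levels, so testers categorize consistently.
    type Severity = "moderate" | "severe";

    interface UnexpectedBehavior {
      kind: "extra-speech" | "cursor-movement" | "excess-verbosity" | "other";
      severity: Severity;
      note?: string; // tester's free-text description
    }

    interface TesterSubmission {
      utterance: string;                // what the screen reader said
      conveyedCorrectMeaning: boolean;  // the verdict input
      unexpected: UnexpectedBehavior[]; // side effects observed during the run
    }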
17:31:37 ack jcraig
17:31:39 ack Adam_Page
17:32:00 boaz: there have been numerous changes to ATs based on this process. We need to compile that list
17:32:47 Matt_King: hopefully the default is that no changes are needed in the test, because the tests can be really simple. In those cases, we hope to get consensus and move on to the recommended phase quickly
17:33:06 ack Patrick_H_Lauke
17:33:25 Patrick_H_Lauke: what about internationalization?
17:34:04 q+ for i18n
17:34:12 qv?
17:34:23 Matt_King: we don't currently have a plan for that. That's an important scoping question. We're trying to accomplish a level of interop but not totally debug the ATs. If there are localization bugs, is that an interop bug or an AT bug?
17:34:24 q+
17:35:12 Patrick_H_Lauke: I don't know enough to determine if i18n is in scope. There may be more ambiguous utterances in other languages, or there may even be cases where the AT in one language has a completely different utterance (e.g. not announcing the "checked" state, for whatever reason)
17:35:15 ack me
17:35:15 jcraig, you wanted to discuss i18n
17:35:21 ack jcraig
17:36:33 jcraig: localization would be complicated to test, because different localized strings come from different places. I acknowledge your example case of ambiguity in string localization. Ideally, there won't be that much of a difference. Sometimes these strings come from different parts of the OS stack. But that's not the core of what I understand the ARIA-AT project is trying to test.
17:37:13 jcraig: Unless there's a scenario with a polyfill that breaks a core feature. Like earlier versions of MathJax, which broke VoiceOver. (Thankfully resolved now.) That type of problem might be detectable with additional localization testing.
17:37:38 ack Ben_Tillyer_
17:38:14 Ben_Tillyer_: Are you looking at PC Talker?
17:38:36 Ben_Tillyer_: second question - how are you getting the sound? How exactly are you collecting the utterances?
17:39:16 Ben_Tillyer_: last question - do you get responses from the screen reader about focus change or window change events?
17:39:46 gregwhitworth has joined #aria-at
17:39:53 present+
17:39:55 q?
17:39:56 q+ : can someone put a link to the current tests in this chat?
17:40:09 q+
17:40:10 Matt_King: When we start to add more ATs, for example a Japanese-first screen reader, I think our framework should still work, except we'd have more localization work to do to make sure of that.
17:40:18 q+
17:41:41 https://docs.google.com/presentation/d/1gyz37dYtd9IznzAjxIXYVHqtNw-quWKuBcfegU7tL80/edit#slide=id.g3042961006a_0_8
17:42:02 jugglinmike: I'll explain the basics of the automation. We use AT Driver. Our AT Driver servers speak a WebDriver-like protocol. We maintain one server tied to NVDA; the other one communicates with macOS. They have different strengths and weaknesses. We don't collect any events other than speech.
17:42:33 Matt_King: But we do get the screen reader response to a focus change event. We capture that with AT Driver
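[Editor's note: a minimal sketch of driving an AT Driver server from Node, in TypeScript. The WebSocket address, port, payload shapes, and the method/event names are assumptions based on the editor's reading of the AT Driver editor's draft, not verified project code.]

    // Connect to a (hypothetical) local AT Driver server and log captured
    // speech. Uses the "ws" npm package.
    import WebSocket from "ws";

    const ws = new WebSocket("ws://localhost:4382/session"); // assumed address

    ws.on("open", () => {
      // Commands are JSON with id/method/params, in the WebDriver BiDi style.
      ws.send(
        JSON.stringify({ id: 1, method: "session.new", params: { capabilities: {} } }),
      );
    });

    ws.on("message", (raw) => {
      const msg = JSON.parse(raw.toString());
      // Speech arrives as events; per the discussion above, speech is the only
      // event collected, including responses to focus changes.
      if (msg.method === "interaction.capturedOutput") {
        console.log("AT said:", msg.params.data);
      }
    });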
17:42:40 ack estellevw
17:42:40 estellevw, you wanted to say can someone put a link to the current tests in this chat?
17:42:45 ack gregwhitworth
17:43:15 gregwhitworth: is there an expectation to add mobile support?
17:43:17 boaz: yes
17:43:44 gregwhitworth: Is there an expectation to make the driver and test suite more available?
17:44:36 q+
17:44:41 boaz: Yes, people are starting to copy this process. This software is open source. People are using this to test web apps. They're using this verdict-based approach on web apps, or just using the automation in their dev tool chain.
17:44:49 Here is one we did at MS in the past: https://blogs.windows.com/msedgedev/2016/05/25/accessibility-test-automation/
17:45:10 Neil has joined #aria-at
17:45:18 gregwhitworth: An example from Narrator to check UI regression
17:45:48 gregwhitworth: It would be useful to see how the tests are authored as well.
17:46:12 boaz: An SDK that evolved from this would be a great downstream benefit of this software
17:46:38 boaz: we'll hold the queue as I step through a demo.
17:46:49 https://aria-at.w3.org
17:48:18 boaz: there are interop reports, data management, and the test queue. For interop reports, we can show the percentage of passing tests per AT and browser combination that we're testing. The test plans have more info about the tests in each plan for every combo of browser and AT.
17:48:55 boaz: in data management, we can see the state of each of the test plans. It shows us the overall status and which stage of the working mode each plan is in.
17:49:43 Matt_King: we did a major refactor of our test format last year. We track the dates in all of our test reports. The older ones will be harder to read and understand.
17:50:42 side mention that the 65% result is disputed (invalid expectations IMO). Matt and I are discussing it and other results later today. Two issues of several more: https://github.com/w3c/aria-at/issues/1060 https://github.com/w3c/aria-at/issues/1061
17:50:59 boaz: In the Test Queue page, you can run the tests. In Candidate Review, we manage the process of working with AT vendors. Each AT has a list of where it's at with test plan support.
17:51:31 boaz: In the test plan page, you can see all the steps of a test plan. We can take actions on each test plan
17:51:48 q+
17:51:57 boaz: This is an open source web app. You can find it on GitHub and try forking it. You're also welcome to contribute!
17:52:05 ack CharlesL1
17:52:13 ack CharlesL
17:53:06 Daniel has joined #aria-at
17:53:11 q+ to mention driver complexity in the context of Greg's question about providing this via Chrome Dev Tools
17:53:18 Matt_King: Collecting speech utterances is difficult. It's hard to tell when we'll be ready to take on braille. Testing on mobile is our next big priority.
17:53:27 ack Patrick_H_Lauke
17:53:55 Patrick_H_Lauke: What happens when the problem is not on the AT side but on the browser side?
17:54:23 boaz: Acacia is more geared toward testing the accessibility APIs
17:54:39 Patrick_H_Lauke: ARIA-AT assumes that the browser is working correctly
17:55:32 jcraig: There are multiple layers of testing at each step of the stack to attempt to ensure everything is working correctly. Granular testing is more reliable for automation. WPT accessibility testing is all inside the browser, Acacia is testing how the accessibility comes out of the browser to the accessibility API, and ARIA-AT is testing what happens after the browser, in the screen readers (for now).
17:56:07 boaz: There are other a11y testing initiatives. WPT has one. There's the Acacia project. Many lanes of testing
17:56:16 ack Ben_Tillyer_
17:57:24 CharlesL1 has left #aria-at
17:57:33 Ben_Tillyer_: If the verdicts are more subjective than objective, is this interop even preferable for users?
17:57:49 Patrick_H_Lauke: here is the Acacia project, adding browser-exposed accessibility API testing to WPT: https://github.com/Igalia/rfcs/blob/wpt-for-aams-rfc/rfcs/wpt-for-aams.md
17:58:00 boaz: We don't have a process beyond the CG to check whether changes in ATs are desirable for end users. We would really like to have one
17:58:07 Ben_Tillyer_: how can we help?
17:58:37 boaz: we're looking for testers. You can join the community group. If you have resources to get broader user input, come to the CG.
17:59:12 aaronlev has joined #aria-at
17:59:24 q+
18:00:07 Matt_King: When it comes to the long-term health of this project, we need a broader funding base. This has been solely funded by Meta so far. We need to get the message out about how important AT interoperability is. We need to figure out how to replicate something like wpt.fyi for this initiative. We'd love to discuss that with folks.
18:00:16 ack jcraig
18:00:17 jcraig, you wanted to mention driver complexity in the context of Greg's question about providing this via Chrome Dev Tools
18:01:30 jcraig: Screen readers tend to be highly privileged on the OS. Getting this into Chrome DevTools would be a long-term security process. It's probably not feasible in the short term.
18:01:36 ack aaronlev
18:01:50 aaronlev: I would love to use this to catch regressions in Chrome.
18:02:02 boaz: Test your browsers too with this!
18:02:06 Thanks everyone!
18:02:22 howard-e has left #aria-at
18:02:51 RRSAgent, make minutes
18:02:53 I have made the request to generate https://www.w3.org/2024/09/25-aria-at-minutes.html gsnedders