Meeting minutes
Review agenda and next meeting dates
https://
Matt_King: Requests for changes to agenda?
Carmen: I had a question about AT support with the bots, and another question from ChrisCuellar
Matt_King: We can definitely make room for both of those; we'll tee them up for after the agenda items that we've already planned
Carmen: Thank you!
Matt_King: Next AT Driver Subgroup meeting: Monday September 8
Matt_King: Next CG meeting: Wednesday September 10
Current status
Matt_King: The big thing this week is that the "accordion" plan advanced
Matt_King: That was a lot of work resolving conflicts! Thank you dean for your part in that
Matt_King: This bumps us up to 17 plans in "Candidate Review"
Matt_King: And we have seven plans in "draft review". Four of those are on today's agenda
Matt_King: Assuming we make good progress on these over the next week or so--I'm hoping that these seven (plus one more that's approaching "draft review") will be able to move forward
Matt_King: There was a JavaScript error with that eighth plan, though; I made a comment, and I'm hoping we can work it out
howard-e: Yes, I saw that, and I've responded with a suggestion for a fix
howard-e: We shouldn't be setting the ARIA property; we should be setting the HTML property
Matt_King: Ah, right. Why didn't I catch that?
howard-e: I also made another note in another pull request. This may be for a separate discussion, but the pattern says that something is required, but it isn't mentioned in the example
Matt_King: Ah, yes. Even in the references, it's not really a test of aria-checked; it's a test of HTML checked
Matt_King: It sounds like maybe I should go back to the APG documentation, because it might not be reflecting that fact
Matt_King: That's a good call-out. Thank you
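For context, a minimal sketch of the distinction howard-e describes, assuming the example in question uses a native HTML checkbox/switch input (the selector and values below are illustrative, not taken from the actual test plan):

```ts
// Hypothetical sketch (not the actual test plan script): for a native HTML
// checkbox/switch, set the DOM `checked` property to control its state.
// `aria-checked` is only needed for custom widgets built from non-native
// elements, so setting it here would duplicate the native semantics.
const input = document.querySelector<HTMLInputElement>('input[type="checkbox"]');
if (input) {
  input.checked = true;                          // correct: native HTML state
  // input.setAttribute('aria-checked', 'true'); // avoid: redundant on a native input
}
```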
Matt_King: Anyway, that's where we're at right now. It's very possible that we could reach 25 test plans by the end of the month
What was solution to VO accordion conflicts?
Matt_King: We landed accordion, and we discussed some of the issues last week, but we didn't discuss them in the context of a GitHub issue
Matt_King: According to the minutes from last week, dean was going to check on a different MacBook that he has with a fresh install
Matt_King: Ultimately, I don't know how this conflict was resolved. You were using the "b" key to navigate backwards, and instead of it doing "quick nav", it was doing autocomplete
Matt_King: I wanted to create a record here because I'm not sure it will be the last time we experience this problem. How did we solve it?
dean: In a follow-up e-mail, I confirmed it was doing exactly the same thing on a fresh Mac. If you start the test with "quick nav" on, and you go to that field and type the letter "b", it demonstrates the expected behavior. If you toggle quick nav first, then it demonstrates the behavior I originally reported
Matt_King: It is a confusing issue. If "quick nav" is on by default, then setting focus on a field is having a side-effect of disabling quick nav
dean: I'm guessing that's what it does, because you can toggle it on and off, and then it will work, too
Matt_King: I don't know if Apple would consider this a bug or not. It's marginal whether this is in scope for this project
dean: I think that it behaves correctly. I think the test is not really correct
dean: Because I think that when you're in a form field and you type "shift+b", you should get a capital B
Matt_King: Well if quick nav is indeed turned on, then the point of having quick nav on is that it does NOT type a "b"
dean: Okay, so then it is a problem with quick nav, then
Matt_King: The setup script does not touch any of the VoiceOver settings
Matt_King: We'll stick with the way that IsaDC handled it. I'm going to make a note to myself to ask James Craig or someone else at Apple about this
Matt_King: Maybe I'll phrase it like this: "When focus is set in a text field, should quick nav functions work if quick nav is on? Or does setting focus temporarily override the quick nav setting?"
Matt_King: Great; that's very helpful to know. Thank you
Matt_King: If Apple says this is a bug, then we might want to change this to document an unexpected side-effect
dean: Yeah, we could do that. It's only one test, as I recall
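For reference on Matt_King's point that the setup script does not touch any VoiceOver settings: setup scripts run only against the test page, so quick nav is outside their reach. A hypothetical example of that kind of script (the function name and selector are illustrative, not from the accordion plan):

```ts
// Hypothetical ARIA-AT-style setup script: it receives the test page's document
// and moves focus to an element before the tester issues commands. It has no
// access to VoiceOver settings, so it cannot turn quick nav on or off.
export function moveFocusToTextInput(testPageDocument: Document): void {
  const field = testPageDocument.querySelector<HTMLInputElement>('#example-text-input');
  field?.focus();
}
```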
Running Disclosure of Answers to Frequently Asked Questions test plan
Matt_King: There are 0 conflicts right now. JAWS and NVDA are complete
Matt_King: But the active testers are absent today, so there's actually not much to talk about right now
Running Switch test plan
Matt_King: We'll skip this as well, since the assigned testers are absent
Running test plan for Switch Example Using HTML Button
Matt_King: We have NVDA and JAWS responses available, ready for a volunteer to assign verdicts
dean: I had a question about my work here
dean: If the test is operating a switch that is off, and after pressing "ctrl+option+space" it says "no actions available", would it be untestable?
Matt_King: It doesn't sound untestable--it just didn't work
dean: So we just record the output that we got and fail it
Matt_King: Yes
dean: I can pick up the NVDA work, as well.
Running test plan for Tabs with Automatic Activation
Matt_King: None of the assigned Testers are present today, so we'll skip this, as well
Bot AT Version updates
Carmen: I have two questions
Carmen: First: should we update NVDA Bot and VoiceOver Bot to the latest version?
Carmen: The second is more about process: who should give the green light for updating the bots?
Matt_King: Can we currently pick a bot version? Or do we only support one version at a time?
Carmen: You can choose, but only from versions that we previously had working
Matt_King: In general, we always want to move to the latest version when it becomes public
ChrisCuellar: In that case, I think it's still a process question.
ChrisCuellar: At least on the NVDA and VoiceOver side (for now), we will probably notice when there is a new version sooner than the CG
ChrisCuellar: Should we flag it, bring it to the CG, get approval to update, and only then update? Or should we just go ahead and update it on our own and let everyone know?
Matt_King: I'm trying to think of any scenarios where updating the bot could be disruptive or cause problems
Matt_King: When we have plans in the test queue, and we do a new run for a new version...
Matt_King: I think that in general, due to the way we use the bots, that it's not going to be a problem
Matt_King: I don't see anything wrong with Bocoup being proactive with new releases and just informing the CG
ChrisCuellar: Before we started running our own macOS images--that was the case. With GitHub, they would just update the image without warning. Now, we have more control. And it's a little different for each bot
ChrisCuellar: But even JAWS is going to let us know about new releases
ChrisCuellar: Historically, with VoiceOver, we didn't know when it was upgrading--at least, not with major releases. We just had to keep an eye on GitHub's releases
ChrisCuellar: So far, it hasn't been a problem that we didn't know exactly which version was running
ChrisCuellar: But this leads me to what I think is a larger product question
ChrisCuellar: I was wondering if it would be better to move to a UI where an admin or tester could just select an available version of the bot to run
ChrisCuellar: Right now, there's some logic in the back-end that tries to match the best bot to the AT version that was selected for the test plan run
ChrisCuellar: ...but I'm wondering if it might be easier/simpler to just let the tester select the version that they want to use
Matt_King: Yeah, though there will probably always be cases where the tester wants to use an older version
Matt_King: It's pretty cool if we have those images there, now, and we can just let people do that
ChrisCuellar: I think with JAWS, we could request that the JAWS version be a flag that gets deployed on GitHub when we're running the test
Matt_King: So, for JAWS and NVDA, is it a script that builds the image?
ChrisCuellar: I think with JAWS, we just have a link to the latest build
Matt_King: So we're not storing a binary?
ChrisCuellar: Not currently
Matt_King: I think that might be an improvement. It feels like storing a binary that we use on-demand makes the infrastructure a little more robust (compared with relying on a link that could change)
ChrisCuellar: Definitely
ChrisCuellar: We have some changes to the bot-running UI coming up soon. We can revisit how much granularity we allow to admins and testers when they select a bot
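As a sketch of the kind of version pinning discussed here (storing the AT build we tested against rather than following a "latest" link), with the caveat that none of these names or values come from the actual aria-at automation configuration:

```ts
// Hypothetical sketch of pinning bot AT versions explicitly, instead of relying
// on a "latest" download link or whatever image happens to be current.
// All field names and values here are illustrative only.
interface BotVersionPin {
  at: 'JAWS' | 'NVDA' | 'VoiceOver';
  atVersion: string;        // the AT build the bot image was created from
  installerPath?: string;   // a stored installer/binary used on demand
}

const pinnedBots: BotVersionPin[] = [
  { at: 'NVDA', atVersion: '<pinned NVDA version>', installerPath: 'installers/nvda.exe' },
  { at: 'VoiceOver', atVersion: '<macOS release the image was built on>' },
];

// A tester- or admin-facing picker would then offer only versions in pinnedBots.
export function availableVersions(at: BotVersionPin['at']): string[] {
  return pinnedBots.filter(pin => pin.at === at).map(pin => pin.atVersion);
}
```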
<Carmen> w3c/
Add information about conflicts to the test plan status column for automated re-runs
github: w3c/
ChrisCuellar: This is kind of a gnarly product terminology question that we've been discussing a bit on the issue itself
ChrisCuellar: After an automated re-run completes for a new bot version, the way the bots and testers are rendered on the test queue right now makes things really confusing. Whenever there are response conflicts between the automated re-run and the previous report, we record the mismatch between the AT outputs and leave the verdict unassigned
ChrisCuellar: So the new run of the test plan is in a state where there are a certain number of unassigned verdicts. A human tester needs to go in and verify, maybe run the tests manually to confirm, etc.
ChrisCuellar: When that happens, it's very hard to find exactly how many tests in a test plan are in a certain state
Matt_King: Basically, we have two runs of a test plan: a prior report that's already been published, and a new run that we're comparing to that prior report
Matt_King: ...and we're comparing what the bot got for a response from the AT in the new one to what's in the prior report (whether from humans or bots)
Matt_King: This is very similar to what we do when we have two humans running it. We compare responses from person "a" and person "b"
Matt_King: The way we've done that, to date, is we've actually looked at verdict mismatches, only. When there are four conflicts, it means that there are four verdicts that don't match
Matt_King: We don't surface the response mismatches in the status.
Matt_King: That's where I think we're getting bogged down. We're talking about two different kinds of conflicts, but they have the same effect.
Matt_King: I'm hoping we can simplify things. We have four conflicting responses or four conflicting verdicts
ChrisCuellar: You were proposing to add a new kind of conflict terminology to describe the case where you're taking over an automated re-run and there are unassigned verdicts due to differences
ChrisCuellar: We were a little concerned about exactly what you just said. When we report conflicts in the "Status" column right now, it links to a conflict resolution screen, and there's a whole process
ChrisCuellar: So rather than adding a new type of conflict and muddying the meaning of the "status" column, I thought it might be cleaner to surface all of the stats that we currently surface for bot runs--to surface those for all human testers, too
ChrisCuellar: The statistic about "how many verdicts have been assigned" is what we're trying to chase right now, I think
ChrisCuellar: The human tester only has the number of tests complete.
ChrisCuellar: I don't remember the historical reasons why there are differences in how we report progress
Matt_King: In the very beginning, humans were always recording both the responses and verdicts at the same time. Historically, it was not a thing
ChrisCuellar: What's confusing about the UI right now is that when you re-assign a bot run to a human tester, it's hard to know how many conflicts you have to resolve at that point. You only know how many tests have been submitted, which is sort of a different number
Matt_King: It's really helpful to know if there were any differences in responses (and if so, how many). That tells you the magnitude of work that you have to do
Matt_King: It's the number of responses that tells you the amount of work that you have to do--not the number of verdicts
Matt_King: Once you know you have the responses settled, then rendering the verdicts is almost trivial
ChrisCuellar: To your point about not knowing how many responses just didn't match--I wonder if we keep that for human testers, too, and also report there how many responses were recorded but mismatched from the previous run--maybe that's a solution
ChrisCuellar: Essentially, I was suggesting that we keep the same stats for all testers whether they are a bot or a human because that seems more informative
Matt_King: I think if we're always talking about responses and verdicts (which is always applicable whether you are talking about bots or humans), then you could report on missing verdicts. We could say "X of Y verdicts complete" or "X of Y responses complete"
Matt_King: The idea of "X conflicting verdicts" and "X conflicting responses" would then make more sense
Matt_King: then we're always reporting the same two stats
Matt_King: The number of tests is not relevant
ChrisCuellar: Yeah, I was thinking that, too. The percent complete really reflects the assignment
Matt_King: We can get rid of that noise. We don't care about the number of tests
ChrisCuellar: To me, the "percent complete" was always very confusing
Matt_King: Let's figure out new content for the "testers" column where we only talk about responses and verdicts for both humans and bots. You could probably consolidate
Matt_King: It could be a little more compact--we could figure out a wording that's more concise
ChrisCuellar: Maybe it's better to always use percentages
Matt_King: Yeah, maybe. But we always want to show counts for conflicts because that's a statistic where the number truly matters
Matt_King: Maybe we put something like, "Progress: 40% of responses, X% of verdicts". Always put responses first, because that's what you record first
Matt_King: You might be able to make it pretty concise that way
ChrisCuellar: We'll brainstorm it a bit, but I hear you: make it concise, aim for shorter
ChrisCuellar: If we can get that progress statement as concise as possible for each tester, I think that's the goal
Matt_King: I would still want to go back to this concept of reporting the number of conflicting responses and reporting the number of conflicting verdicts
Matt_King: You could shorten it: "Conflicts: 0" or "Conflicts: 4 responses, 8 verdicts"
Matt_King: That would be pretty short
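To make the proposed wording concrete, a rough sketch of how a status cell could be composed from the two statistics (the type and field names are illustrative, not the app's actual data model):

```ts
// Hypothetical sketch of the wording discussed above: always report responses
// and verdicts (responses first), percentages for progress, counts for conflicts.
// Field names are illustrative only.
interface TesterProgress {
  responsesRecorded: number;
  responsesTotal: number;
  verdictsAssigned: number;
  verdictsTotal: number;
  conflictingResponses: number;
  conflictingVerdicts: number;
}

function statusText(p: TesterProgress): string {
  const pct = (n: number, d: number) => (d === 0 ? 0 : Math.round((100 * n) / d));
  const progress =
    `Progress: ${pct(p.responsesRecorded, p.responsesTotal)}% of responses, ` +
    `${pct(p.verdictsAssigned, p.verdictsTotal)}% of verdicts`;
  const conflicts =
    p.conflictingResponses || p.conflictingVerdicts
      ? `Conflicts: ${p.conflictingResponses} responses, ${p.conflictingVerdicts} verdicts`
      : 'Conflicts: 0';
  return `${progress}. ${conflicts}`;
}

// Example output: "Progress: 40% of responses, 25% of verdicts. Conflicts: 4 responses, 8 verdicts"
```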
ChrisCuellar: It's still a problem if we show it in the status--only one of them should be a link.
Matt_King: The whole text could be one link. "Conflicts: 4 responses, 8 verdicts" would be one link that leads to the conflicts page
ChrisCuellar: These aren't conflicts in the way that we typically think about them--we'll have a row with only one tester
Matt_King: Ah, I see. In that case, you're right
Matt_King: Now, I'm actually thinking that the percentage in the "testers" column is less helpful because the absolute numbers are better
ChrisCuellar: I do feel like the whole review process for an automated re-run is a different enough workflow to warrant a different screen/UI. I think trying to shoehorn all these concepts into the test queue as we know it is maybe a problem.
ChrisCuellar: I think that's why we have all this confusion right now
Matt_King: We did it this way because we wanted to limit the amount of new UI to develop
Matt_King: If we can solve this with content instead of code, I'd like to do that
Matt_King: I can revisit this issue with my new understanding of the problem (that is: the conflicts for automated re-runs being substantively different)
Matt_King: I'll put this on my to-do list to make a comment here
Matt_King: It sounds like we still made some good progress in terms of how we're thinking about this problem, though
ChrisCuellar: Agreed