W3C

Accessibility Conformance Testing Teleconference

16 Oct 2017

See also: IRC log

Attendees

Present
Wilco, MaryJo, Anne, Tobias, Romain, Charu, Moe, Alistair
Regrets
Chair
Wilco, MaryJo
Scribe
Anne

Contents

Topics
  1. TPAC: work on agenda
  2. Do we need cannot-tell results
  3. Recommendation for Accuracy Benchmarking section
  4. Output data format for rule groups
  5. Terminology in Section 8 ACT Data Format
Summary of Action Items
Summary of Resolutions

TPAC: work on agenda https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/TPAC_2017

Wilco: I started looking at a TPAC agenda
... We are looking to move the ACT meeting, because it seems to be an issue for a lot of people
... Mary Jo and I are looking at the TPAC agenda. We have four topics we want to tackle. The first one is the test case repository: we want to look at how to set it up. We are looking to build a meta-repository that draws on all the repositories out there

Next item is the ACT Review process

Third, we want to have a talk with the chairs from the Silver community group, talking about the future of WCAG, or what it will be called

Fourth topic is presentations on test-driving the rule format: either of plans, or of results if you have already done the test drive

<rdeltour> looks pretty complete to me!

Wilco: Are there other things you would want to add to this agenda? Things you want to look at in the coming months, so we could get a kick start at TPAC?

<cpandhi> +1

Wilco: I have been trying to get some feedback for the ACT Rules Format. We have gotten some feedback already, so I will also add this to the agenda
... I guess the follow-up to the presentation question would be who would be presenting
... I will be presenting for Deque

Anne: Stein Erik and I will be presenting for Siteimprove

Charu: Mary Jo will be representing us at TPAC

<rdeltour> I can show what we do (or plan to do, depending) at DAISY

Mary Jo: I will need a handover from Charu, since she's working with the rules

Wilco: All right, I have updated the agenda for TPAC

Do we need cannot-tell results https://github.com/w3c/wcag-act/issues/69

Wilco: So the point of view is that every rule can be written so that it never produces a cannot-tell result
... An example is a color contrast rule where you first test whether you can find a background color; if you can't, the result is "cannot tell"
... The other way around is to only test where you can find a background color
... I would prefer the cannot tell approach, since this communicates to the user that this is something that should be tested manually
... How do you guys do it in your tools?

Tobias: We don't have a concept of cannot tell. We do have review items

Charu: We have potential violations and manual checks, which are similar to cannot tell. We clarify that automation can't validate these fully. If we make it clear up front that this will only cover automated testing, we can remove cannot tell. But there will be cases where automation can do part of these tests. I'm of two minds on how we can do this

Alistair: In Level Access we have three types of tests: automatics (pass, fail, inapplicable), guided automatics (potential fails), guided manual (no sophisticated tests at all, but can flag content type)
... One of the outcomes from the guided automated tests would be "user needs to review" or "inapplicable"
... "Needs manual review" is a much nicer way to put it than "cannot tell"

Wilco: Can we add something like "you can have cannot tell in semi-automated tests"?

Romain: Since "cannot tell" is used in EARL I would keep that

Wilco: It's just the data format; you don't have to show it like that in the UI

RESOLUTION: Add a note that cantTell results should be limited to semi-automated rules

Alistair: As I understand it, not many people are using EARL, so maybe we don't have to be completely bound by it
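
For illustration, a minimal sketch of what a cantTell result could look like in an EARL-based report, per the resolution above. This assumes the EARL 1.0 Schema vocabulary (earl:Assertion, earl:TestResult, earl:outcome, earl:mode) and a JSON-LD-style serialization, written here as a Python dict; the rule and page URLs are hypothetical placeholders, not from the minutes.

    # Sketch only: an EARL-style assertion for a semi-automated rule that
    # could not reach a verdict. Vocabulary terms are from the EARL 1.0
    # Schema; the rule and subject URLs are hypothetical.
    assertion = {
        "@type": "earl:Assertion",
        "earl:test": "https://example.org/rules/color-contrast",  # hypothetical rule
        "earl:subject": "https://example.org/page.html",          # hypothetical page
        "earl:mode": "earl:semiAuto",  # semi-automated, per the resolution above
        "earl:result": {
            "@type": "earl:TestResult",
            "earl:outcome": "earl:cantTell",  # a UI might show "needs manual review"
        },
    }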

Recommendation for Accuracy Benchmarking section https://github.com/w3c/wcag-act/issues/118

Wilco: So the comment we got here is that even experts won't reach 100% agreement. Do we need to change the text to reflect that?
... And there is a suggestion of using statistics for finding false positives

<Wilco> https://www.w3.org/TR/act-rules-format/#quality-accuracy

Wilco: I agree with Annika that this doesn't really take into account the accuracy of humans
... Do any of you have experience with combining manual results with automated results? The section assumes that false positives/negatives can be calculated as a percentage, but even experts get it wrong some of the time

Alistair: If my tool is running perfectly fine on all of the unit tests, but accessibility experts disagree...
... We are capturing what content causes these issues, have our experts look at that, and then feed that back into the algorithm running the tests and into our unit tests

Wilco: That assumes that everything is running automatically. What about those that require user input? If users get it wrong 10% of the time, how do you improve on that?

Alistair: The problem is that you only know that they have made a wrong choice if you are pooling results
... I think it might be too far outside of what we can do, if we need to cross-check a sample of what we can do and then also check with experts

Wilco: It's maybe more of a process question than a formatting question...

Tobias: It makes the most sense to do this for semi-automated tests, where the measure would be how many users get it wrong; otherwise, issues should maybe be handled as bugs in the rules
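
As a rough sketch of the percentage-based view being questioned here (assumptions mine, not from the spec): false positive and false negative rates computed against expert judgements treated as ground truth, which is exactly the simplification Annika's comment points at, since the experts themselves won't reach 100% agreement.

    # Sketch: benchmark a rule's outcomes against expert judgements, treating
    # the experts as ground truth. The group's concern is that this baseline
    # is itself imperfect.
    def accuracy_rates(results):
        """results: list of (tool_outcome, expert_outcome) pairs, 'pass'/'fail'."""
        fp = sum(1 for tool, expert in results if tool == "fail" and expert == "pass")
        fn = sum(1 for tool, expert in results if tool == "pass" and expert == "fail")
        expert_passes = sum(1 for _, expert in results if expert == "pass")
        expert_fails = sum(1 for _, expert in results if expert == "fail")
        return {
            "false_positive_rate": fp / expert_passes if expert_passes else 0.0,
            "false_negative_rate": fn / expert_fails if expert_fails else 0.0,
        }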

Output data format for rule groups https://github.com/w3c/wcag-act/issues/116

<Wilco> https://www.w3.org/TR/act-rules-format/#output

Wilco: It's a good question: how do rule groups impact the output data?
... So we have our tests, which would normally point to a single rule. But if you have a rule group, what do you point to? Do you point to the one that fails? Do you point to all of them?
... So you have a rule group: if any of the rules pass, the group passes

Alistair: So you would want to point to the ones that pass?

Romain: You should be able to have nested outcomes

<Wilco> https://www.w3.org/TR/act-rules-format/#appendix-data-example

Wilco: In the aggregations we do something similar. But you need to have a pointer for each rule

Alistair: So why don't you just have a supertest and then outline the business rules for that supertest?

Wilco: Yes, that is exactly what it is

Alistair: You would also need metadata, expressing what passed within the rule

Romain: How would the result from a rule group be different from an aggregation?

Wilco: It works the opposite way: in an aggregation any fail will cause the parent to fail; in a rule group it's the opposite

Alistair: It's a bit limiting to have any one of these rules passing cause a pass of the rule group; you might want different logic, for example requiring two of them to pass, or none of them

Anne: I think this would be needed to cover the use cases

Alistair: In the future you might not want to just find a pass/fail/cannot tell, but to find pieces of information, and then pull them into larger, more complicated tests that hold the business logic
... An example is a link in a bunch of text, where we tell users to judge whether the link is right in that context. In the future we might want machine learning to judge that, and want to pull in this context to make that judgement
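
To illustrate the contrast Wilco describes, a sketch under assumed outcome values (not normative): an aggregation fails when any child fails, while a rule group passes when any rule passes. The require parameter gestures at Alistair's point that other logic, such as requiring two rules to pass, may be wanted.

    # Sketch: the two opposite combinators discussed above, over a list of
    # outcome strings such as ["passed", "failed", ...].
    def aggregation_outcome(outcomes):
        # Any failing child fails the parent aggregation.
        return "failed" if "failed" in outcomes else "passed"

    def rule_group_outcome(outcomes, require=1):
        # The group passes when at least `require` rules pass (default: any one).
        return "passed" if outcomes.count("passed") >= require else "failed"

    print(aggregation_outcome(["passed", "failed"]))             # failed
    print(rule_group_outcome(["failed", "passed"]))              # passed
    print(rule_group_outcome(["passed", "failed"], require=2))   # failed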

Terminology in Section 8 ACT Data Format https://github.com/w3c/wcag-act/issues/115

Wilco: I think this one is pretty straightforward. We use the terms "test target" and "selected item", which basically mean the same thing. Which one do we prefer? One says what happened, the other what we are going to use it for

Alistair: I prefer "selected item" because of the CSS selectors

Moe: When I hear "selected item" the first thing that comes to mind is an item that is selected in the UI

Romain: When we talk about that object in the abstract, I think we should use "test target"

<tobias> +1 for "element"

Alistair: When defining the "test target" we need to say "the thing selected by the CSS selectors"

Romain: the items are not necessarily DOM nodes

Alistair: it could also be a widget, part of the page...

Charu: We use "context", so "selected context", "test context"

RESOLUTION: We're going with test target, it needs a definition

Wilco: That brings up other connotations

Summary of Action Items

Summary of Resolutions

  1. Add a note that cantTell results should be limited to semi-automated rules
  2. We're going with test target, it needs a definition
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2017/10/17 09:20:29 $