See also: IRC log
Wilco: I started looking at a TPAC agenda
... We are looking to move the ACT meeting, because it seems to be an issue for a lot of people
... Mary Jo and I are looking at the TPAC agenda. We have 4 topics we want to tackle. First one is the test case repository. We want to look at how we want to set it up. Looking to build a meta repository, using all the repositories out there
... Next item is the ACT Review process
... Third, we want to have a talk with the chairs from the Silver community group, talking about the future of WCAG, or what it will be called
... Fourth topic is presentations on test driving the rule format, either of plans or of results, if you have already done the test drive
<rdeltour> looks pretty complete to me!
Wilco: Are there other things you would want to add to this agenda? Things you want to look at in the coming months, so we could get a kick start at TPAC?
<cpandhi> +1
Wilco: I have been trying to get some feedback for the ACT Rules Format. We have gotten some feedback already, so I will also add this to the agenda
... I guess the follow up on the presentation question would be who would be presenting
... I will be presenting for Deque
Anne: Stein Erik and I will be presenting for Siteimprove
Charu: Mary Jo will be representing us at TPAC
<rdeltour> I can show what we do (or plan to do, depending) at DAISY
Mary Jo: I will need a handover from Charu, since she's working with the rules
Wilco: All right, I have updated the agenda for TPAC
Wilco: So the point of view is that every rule can be written so that it never produces a cannot tell result
... An example is a color contrast rule where you first test if you can find a background color; if you can't, it's "cannot tell"
... The other way around is to only test where you can find a background color
... I would prefer the cannot tell approach, since this communicates to the user that this is something that should be tested manually
... How do you guys do it in your tools?
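A minimal sketch of the two approaches Wilco describes, assuming a hard-coded 4.5:1 contrast threshold and a nullable contrast ratio standing in for whatever the tool could determine (both are assumptions for illustration, not from any tool discussed):

```typescript
type Outcome = "passed" | "failed" | "cantTell" | "inapplicable";

// `ratio` is null when the tool cannot determine a background color
// (for example, a background image); 4.5 is WCAG's AA threshold for normal text.

// Approach 1: the rule applies regardless, and reports cantTell when the
// background is unknown, signalling that a manual check is needed.
function outcomeWithCantTell(ratio: number | null): Outcome {
  if (ratio === null) return "cantTell";
  return ratio >= 4.5 ? "passed" : "failed";
}

// Approach 2: applicability is narrowed so that elements with an unknown
// background never enter the test at all.
function outcomeWithNarrowApplicability(ratio: number | null): Outcome {
  if (ratio === null) return "inapplicable";
  return ratio >= 4.5 ? "passed" : "failed";
}
```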
Tobias: We don't have a concept of cannot tell. We do have review items
Charu: We have potential violations and manual checks, which are similar to cannot tell. We clarify that automation can't validate these fully. If we make it clear up front that this will only cover automated testing, we can remove cannot tell. But there will be cases where automation can do part of these tests. I'm of two minds on how we can do this
Alistair: In Level Access we have three types of tests: automatics (pass, fail, inapplicable), guided automatics (potential fails), guided manual (no sophisticated tests at all, but can flag content type)
... One of the outcomes from the guided automated tests would be "user needs to review" or "inapplicable"
... "Needs manual review" is a much nicer way to put it than "cannot tell"
Wilco: Can we add something like "you can have cannot tell in semi-automated tests"
Romain: Since "cannot tell" is used in EARL I would keep that
Wilco: It's just the data format, you don't have to show it like that in the UI
RESOLUTION: Add a note that cantTell results should be limited to semi-automated rules
Alistair: As I understand it, not many people are using EARL, so maybe we don't have to be completely bound by it
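A small sketch of what the resolution implies for result data, with hypothetical type and field names; EARL itself is an RDF vocabulary, so this only illustrates the constraint, not the actual format:

```typescript
type RuleType = "automated" | "semi-automated" | "manual";

// cantTell stays in the data format (as in EARL), but per the resolution a
// fully automated rule should never produce it.
type Outcome = "passed" | "failed" | "cantTell" | "inapplicable";

interface RuleResult {
  ruleId: string;
  ruleType: RuleType;
  outcome: Outcome;
}

// Illustrative check of the resolution: cantTell is only acceptable for
// rules that involve a human in the loop.
function isOutcomeAllowed(result: RuleResult): boolean {
  if (result.outcome === "cantTell") {
    return result.ruleType !== "automated";
  }
  return true;
}
```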
Wilco: So the comment we got here is that even experts won't get 100% agreement. Do we need to change the text to reflect that?
... And there is a suggestion of using statistics for finding false positives
<Wilco> https://www.w3.org/TR/act-rules-format/#quality-accuracy
Wilco: I agree with Annika that this doesn't really take into account the accuracy of humans
... Do any of you have experience with combining manual results with automated results? It assumes that false positives/negatives can be calculated as a percentage, but even experts get it wrong some of the time
Alistair: If my tool is running perfectly fine on all of the unit tests, but accessibility experts disagree...
... We are capturing what content causes these issues, and have our experts look at that and then add that to our algorithm running the tests and our unit tests
Wilco: That assumes that everything is running automatically. What about those that require user input? If users give the wrong answer 10% of the time, how do you improve on that?
Alistair: The problem is that you only know that they have made a wrong choice if you are pooling results
... I think that it might be too far outside of what we can do, if we need to cross-check a sample of what we do and then also check with experts
Wilco: It's maybe more of a process question than a formatting question...
Tobias: It makes the most sense to do this for semi-automated tests, and then the measure will be how many users get it wrong; otherwise, issues should maybe be handled as bugs in rules
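A rough sketch of the accuracy concern raised here, treating expert-to-expert disagreement as a noise floor and attributing only the excess disagreement to the rule; the names and the simple subtraction are assumptions for illustration, not a method defined in the rules format:

```typescript
interface AccuracySample {
  totalCases: number;
  toolExpertDisagreements: number;   // cases where the rule and the expert panel differ
  expertExpertDisagreements: number; // cases where the experts differ among themselves
}

// Observed disagreement mixes two sources of error: the rule's own false
// positives/negatives and the experts' imperfect agreement with each other.
function estimatedRuleErrorRate(s: AccuracySample): number {
  const observed = s.toolExpertDisagreements / s.totalCases;
  const noiseFloor = s.expertExpertDisagreements / s.totalCases;
  return Math.max(0, observed - noiseFloor);
}
```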
<Wilco> https://www.w3.org/TR/act-rules-format/#output
Wilco: It's a good question: how do rule groups impact the output data?
... So we have got our tests that would normally point to a single rule, but if you have a rule group, what do you point to? Do you point to the one that fails? Do you point to all of them?
... So you have a rule group. If any of the rules pass, or all of them pass, the group passes
Alistair: So you would want to point to the ones that pass?
Romain: You should be able to have nested outcomes
<Wilco> https://www.w3.org/TR/act-rules-format/#appendix-data-example
Wilco: In the aggregations we do something similar. But you need to have a pointer for each rule
Alistair: So why don't you just have a supertest and then outline the business rules for that supertest
Wilco: yes, that is exactly what it is
Alistair: You would also need metadata, expressing what passed within the rule
Romain: How would the result from a rule group be different from an aggregation?
Wilco: It works in opposite ways: in an aggregation, any fail will cause the parent to fail; in a rule group, it's the opposite
Alistair: It's a bit limiting to have any one of these rules cause the rule group to pass. You might want different logic, for example requiring two of them to pass, or none of them
Anne: I think that would be needed to cover our needs
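A sketch of the two combination rules discussed above, with illustrative names; Alistair's point is that other logics (for example, requiring at least two rules to pass) would not fit the simple "any pass passes the group" function:

```typescript
type Outcome = "passed" | "failed" | "cantTell" | "inapplicable";

// Aggregation: any failing child fails the parent.
function aggregationOutcome(children: Outcome[]): Outcome {
  if (children.some(o => o === "failed")) return "failed";
  if (children.some(o => o === "cantTell")) return "cantTell";
  if (children.some(o => o === "passed")) return "passed";
  return "inapplicable";
}

// Rule group: any passing rule passes the group.
function ruleGroupOutcome(rules: Outcome[]): Outcome {
  if (rules.some(o => o === "passed")) return "passed";
  if (rules.some(o => o === "cantTell")) return "cantTell";
  if (rules.some(o => o === "failed")) return "failed";
  return "inapplicable";
}
```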
Alistair: In the future you might not want to just find a pass/fail/cannot tell, but want to find a piece of information, and then pull that into larger, more complicated tests that hold the business logic
... An example is a link in a bunch of text, and we tell users to judge if the link is right in that context. In the future we might want machine learning to judge that, and want to pull in this context to make that judgement
Wilco: I think this one is pretty straightforward. We use the terms "test target" and "selected item", which basically mean the same thing. Which one do we prefer? One says what happened, the other what we are going to use it for
Alistair: I prefer "selected item" because of the CSS selectors
Moe: When I hear "selected item" the first thing that comes to mind is an item that is selected in the UI
Romain: When we talk about that object in the abstract, I think we should use "test target"
<tobias> +1 for "element"
Alistair: When defining the "test target" we need to say "the thing selected by the CSS selectors"
Romain: the items are not necessarily DOM nodes
Alistair: it could also be a widget, part of the page...
Charu: We use "context", so "selected context", "text context"
Wilco: That brings up other connotations
RESOLUTION: We're going with "test target"; it needs a definition