IRC log of wcag-act on 2017-05-08

Timestamps are in UTC.

14:00:06 [RRSAgent]
RRSAgent has joined #wcag-act
14:00:06 [RRSAgent]
logging to http://www.w3.org/2017/05/08-wcag-act-irc
14:00:08 [trackbot]
RRSAgent, make logs public
14:00:08 [Zakim]
Zakim has joined #wcag-act
14:00:10 [trackbot]
Zakim, this will be
14:00:10 [Zakim]
I don't understand 'this will be', trackbot
14:00:11 [trackbot]
Meeting: Accessibility Conformance Testing Teleconference
14:00:11 [trackbot]
Date: 08 May 2017
14:00:44 [Wilco]
agenda?
14:00:46 [Wilco]
agenda+ Benchmark definition - Issue #81 https://github.com/w3c/wcag-act/issues
14:00:53 [Wilco]
agenda+ Topics to address in the Rules https://www.w3.org/TR/act-rules-format/
14:01:01 [maryjom]
maryjom has joined #wcag-act
14:01:03 [Wilco]
agenda+ Test case format https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Testing_Resources
14:01:11 [Wilco]
agenda+ Rules repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Rules_repository
14:01:17 [Wilco]
agenda?
14:01:45 [maryjom]
present+ MaryJoMueller
14:02:40 [Kathy]
Kathy has joined #wcag-act
14:04:20 [skotkjerra]
skotkjerra has joined #wcag-act
14:04:53 [shadi]
scribe: shadi
14:05:09 [shadi]
zakim, who is on the phone?
14:05:09 [Zakim]
Present: MaryJoMueller
14:05:39 [shadi]
present+ Wilco, Kathy, SteinErik, Debra
14:05:49 [shadi]
present+ Moe
14:06:34 [shadi]
present+ Sujasree
14:07:12 [shadi]
Topic: Introductions
14:07:21 [shadi]
Sujasree: leading Deque team in India
14:07:42 [shadi]
Debra: engagement manager for Deque federal account
14:07:55 [shadi]
...want to understand the direction and stay on top of it
14:08:11 [shadi]
...not a coder but happy to help
14:08:25 [shadi]
zakim, take up next
14:08:25 [Zakim]
agendum 1. "Benchmark definition - Issue #81 https://github.com/w3c/wcag-act/issues" taken up [from Wilco]
14:08:53 [Wilco]
https://github.com/w3c/wcag-act/issues/81
14:09:29 [MoeKraft]
MoeKraft has joined #wcag-act
14:09:36 [shadi]
WF: previous discussion - two weeks ago - brought this up
14:09:58 [shadi]
...topic keeps coming up
14:10:06 [shadi]
...confusion about what was meant
14:10:38 [shadi]
...initially was meant as a mechanism to figure out if rules will generate false positives in practice
14:10:44 [shadi]
...measure their accuracy
14:10:55 [shadi]
...no real solution proposed
14:11:20 [shadi]
...initially thought about comparing tools to manually tested results
14:11:39 [shadi]
...but that idea is changing
14:11:54 [shadi]
...more about collecting feedback by using our rules
14:12:12 [shadi]
...let users try out the rules, and react to feedback
14:12:44 [shadi]
...IBM and Deque have a kind of "beta" or "experimental" approach
14:12:52 [shadi]
...until tests are confirmed
14:13:25 [shadi]
SES: so validation is, it is accepted until someone complains?
14:13:52 [shadi]
WF: up for discussion
14:14:05 [shadi]
SES: so what would the use be of the benchmarking?
14:14:25 [shadi]
...not sure we will receive comments
14:14:40 [shadi]
...unless you have a mechanism to ensure testing
14:14:47 [shadi]
...but not guarantee
14:15:00 [shadi]
...so what is the purpose then?
14:15:04 [shadi]
q+
14:15:55 [MoeKraft]
Shadi: What I understood Alistair was proposing. When you develop a rule you develop test cases along with the rule. The two approaches are not mutually exclusive.
14:16:41 [shadi]
q-
14:16:43 [MoeKraft]
Shadi: W3C maturity model, things in draft or testing phase; however, we need at least a minimal amount of testing and ask for feedback for further validation.
14:16:57 [shadi]
WF: already part of the spec to write test cases
14:17:19 [shadi]
...that filters out the known potential problems
14:17:29 [shadi]
...but that is not benchmarking
14:18:29 [shadi]
WF: to respond to SteinErik, before rules are put on the repository, have test cases
14:18:51 [shadi]
...additionally a feedback mechanism
14:19:24 [shadi]
DM: already an existing repository of failure conditions?
14:19:59 [shadi]
WF: yes. currently different tools have their test case repositories, but want to merge these
14:20:11 [shadi]
DM: so common place to validate the rules
14:21:10 [shadi]
WF: thoughts?
14:21:23 [shadi]
...maybe need to include positive feedback too
14:21:46 [shadi]
SES: concern about the usefulness of the information we receive
14:21:52 [shadi]
...no clear view yet
14:21:57 [shadi]
q+
14:22:25 [Wilco]
ack s
14:23:13 [MoeKraft]
Shadi: We definitely are changing from what we originally had in mind for benchmarking. At least what we have in our work statement. If we do have a test suite, we could have tools run by themselves to provide information on how well developers support test suite.
14:23:40 [MoeKraft]
Shadi: Not sure how many tool vendors would want to expose false positives.
14:24:13 [skotkjerra]
q+
14:24:33 [MoeKraft]
Shadi: There would have to be self declaration. But this could be criteria. Force some useful information to come back here. If a tool implementation does not report that the rule comes back cleanly, then it is experimental.
14:24:43 [MoeKraft]
Shadi: Test cases would need to be versioned too.
14:24:54 [MoeKraft]
Shadi: Would be complex because of regression.
14:25:39 [MoeKraft]
Shadi: Encourage someone who proposes a rule to get it implemented first. Plus, we would get more competition because tool vendors would try to implement and get a green light.
14:25:51 [shadi]
WF: so what would be the evidence that rules work in practice?
14:26:04 [MoeKraft]
Wilco: What would be the evidence that a rule is accurate?
14:26:04 [shadi]
...if enough tools implement it?
14:26:53 [MoeKraft]
Shadi: First criteria, hope there is an active community that constantly reviews rules proposed. This is the first level of checking. Rules proposed by vendors
14:27:33 [MoeKraft]
Shadi: If rule is accepted by competitor this is a good sign. Consensus building. Raise the bar to 3 independent implementations. Disclose how many vendors implement rule.
14:28:33 [shadi]
WF: hesitant because implementation is only proof of accuracy if tools implement no false positives
14:29:07 [shadi]
...some rules indicate a level of accuracy
14:29:46 [shadi]
...for example if someone proposes a test rule with only 70% accuracy
14:30:13 [shadi]
DM: can run against failure conditions, but can also check for false positives
14:30:23 [shadi]
...then there is semi-automated testing, where human intervention is needed
14:30:38 [shadi]
...may or may not test to the full standard
14:30:44 [shadi]
...like maybe only 50%
14:30:50 [shadi]
...so it is a scale to test
14:30:53 [shadi]
q?
14:31:18 [shadi]
WF: agree that some rules are more reliable than others
14:31:32 [shadi]
DM: reliability and completeness
14:31:46 [shadi]
SES: yes, but what does accurate actually mean?
14:32:42 [shadi]
...consistent and repeatable versus correct
14:33:13 [shadi]
WF: "average between false positives and false negatives" (reads out from spec)
14:33:37 [Wilco]
https://w3c.github.io/wcag-act/act-rules-format.html#quality-accuracy
14:34:50 [shadi]
q+
14:34:55 [shadi]
ack sko
14:34:57 [Wilco]
ack sk
14:35:00 [Wilco]
ack sh
14:37:46 [shadi]
SAZ: think really trying to avoid a central group gate-keeping to scale up
14:38:00 [shadi]
...but need minimum bar of acceptance defined by the test cases
14:38:26 [shadi]
...this can be increased over time, as new situations and new technologies emerge
14:38:39 [shadi]
...may even need to pull rules at some point
14:39:01 [shadi]
WF: can publish rules at any time, with different maturity flags
14:39:13 [shadi]
...fits with the W3C process
14:39:16 [shadi]
SES: agree
14:39:22 [shadi]
MJM: so do i
14:39:27 [shadi]
SAZ: me too
14:40:11 [shadi]
WF: maybe not more than one flag
14:40:23 [shadi]
...just something like "beta" or "experimental"
14:41:16 [shadi]
SES: rely on implementers providing information that the tools function
14:42:13 [shadi]
WF: when implementers find an issue and make a change, want to encourage them to provide that feedback
14:42:37 [shadi]
SAZ: ideally by adding a new test case
14:43:00 [shadi]
WF: want to make sure that the rules stay in synch
14:43:59 [shadi]
WF: implementers should give feedback by way of test cases
14:44:14 [shadi]
...does not work unless implementers share their test cases
14:44:49 [shadi]
...need iterative cycles, but need a way to do that
14:44:57 [shadi]
...have to encourage tool vendors
14:45:49 [shadi]
MK: guess some vendors will not want to share all their rules
14:46:31 [shadi]
WF: so can't take rules unless test cases are shared back
14:46:51 [shadi]
MK: how to phrase that
14:47:43 [shadi]
SES: fair expectation to set rules will be shared, because it is a quality check
14:48:40 [shadi]
q+
14:49:20 [shadi]
WF: does somebody want to take over writing up this part?
14:49:26 [shadi]
...also need a name change
14:49:33 [shadi]
SES: how about validation?
14:49:48 [shadi]
WF: think publication requirements
14:50:03 [shadi]
...talking about how test rules get posted on the W3C website
14:50:05 [Wilco]
ack s
14:52:09 [skotkjerra]
q+
14:52:31 [shadi]
SAZ: can take this over
14:53:01 [shadi]
...like the idea of incentives
14:53:21 [shadi]
...and the cycle that the incentives drive
14:53:30 [shadi]
SES: happy to work on this too
14:54:13 [shadi]
RESOLUTION: SES will head up drafting the publication/validation/benchmarking piece, with SAZ supporting
14:54:25 [shadi]
WF: MaryJo maybe you can help too
14:54:45 [shadi]
DM: what are the failure conditions of a rule?
14:55:04 [shadi]
WF: think that is what we call test procedure
14:55:45 [shadi]
DM: clients want to know what in the rules triggers the "fail"
14:55:56 [Wilco]
https://auto-wcag.github.io/auto-wcag/rules/SC4-1-1-idref.html
14:55:57 [shadi]
WF: that is the test procedure
14:57:15 [Wilco]
ack sk
14:58:42 [shadi]
WF: SteinErik, draft by next week?
14:58:48 [shadi]
SES: yes, will try
14:59:14 [shadi]
SK: still behind but will catch up
14:59:31 [MoeKraft]
+1
14:59:40 [Kathy]
present+ Kathy
14:59:42 [shadi]
DM: amazing effort! complicated, but excellent to address
15:00:43 [shadi]
...shared effort
15:01:04 [shadi]
SK: do we create our own test files?
15:01:20 [shadi]
WF: we have some test cases, will send you the link
15:02:00 [shadi]
trackbot, end meeting
15:02:00 [trackbot]
Zakim, list attendees
15:02:00 [Zakim]
As of this point the attendees have been MaryJoMueller, Wilco, Kathy, SteinErik, Debra, Moe, Sujasree
15:02:08 [trackbot]
RRSAgent, please draft minutes
15:02:08 [RRSAgent]
I have made the request to generate http://www.w3.org/2017/05/08-wcag-act-minutes.html trackbot
15:02:09 [trackbot]
RRSAgent, bye
15:02:09 [RRSAgent]
I see no action items