14:00:06 RRSAgent has joined #wcag-act
14:00:06 logging to http://www.w3.org/2017/05/08-wcag-act-irc
14:00:08 RRSAgent, make logs public
14:00:08 Zakim has joined #wcag-act
14:00:10 Zakim, this will be
14:00:10 I don't understand 'this will be', trackbot
14:00:11 Meeting: Accessibility Conformance Testing Teleconference
14:00:11 Date: 08 May 2017
14:00:44 agenda?
14:00:46 agenda+ Benchmark definition - Issue #81 https://github.com/w3c/wcag-act/issues
14:00:53 agenda+ Topics to address in the Rules https://www.w3.org/TR/act-rules-format/
14:01:01 maryjom has joined #wcag-act
14:01:03 agenda+ Test case format https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Testing_Resources
14:01:11 agenda+ Rules repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Rules_repository
14:01:17 agenda?
14:01:45 present+ MaryJoMueller
14:02:40 Kathy has joined #wcag-act
14:04:20 skotkjerra has joined #wcag-act
14:04:53 scribe: shadi
14:05:09 zakim, who is on the phone?
14:05:09 Present: MaryJoMueller
14:05:39 present+ Wilco, Kathy, SteinErik, Debra
14:05:49 present+ Moe
14:06:34 present+ Sujasree
14:07:12 Topic: Introductions
14:07:21 Sujasree: leading the Deque team in India
14:07:42 Debra: engagement manager for the Deque federal account
14:07:55 ...want to understand the direction and stay on top of it
14:08:11 ...not a coder but happy to help
14:08:25 zakim, take up next
14:08:25 agendum 1. "Benchmark definition - Issue #81 https://github.com/w3c/wcag-act/issues" taken up [from Wilco]
14:08:53 https://github.com/w3c/wcag-act/issues/81
14:09:29 MoeKraft has joined #wcag-act
14:09:36 WF: previous discussion - two weeks ago - brought this up
14:09:58 ...topic keeps coming up
14:10:06 ...confusion about what was meant
14:10:38 ...initially was meant as a mechanism to figure out if rules will generate false positives in practice
14:10:44 ...measure their accuracy
14:10:55 ...no real solution proposed
14:11:20 ...initially thought about comparing tools to manually tested results
14:11:39 ...but that idea is changing
14:11:54 ...more about collecting feedback by using our rules
14:12:12 ...let users try out the rules, and react to feedback
14:12:44 ...IBM and Deque have a kind of "beta" or "experimental" approach
14:12:52 ...until tests are confirmed
14:13:25 SES: so validation is: it is accepted until someone complains?
14:13:52 WF: up for discussion
14:14:05 SES: so what would the use be of the benchmarking?
14:14:25 ...not sure we will receive comments
14:14:40 ...unless you have a mechanism to ensure testing
14:14:47 ...but not guarantee
14:15:00 ...so what is the purpose then?
14:15:04 q+
14:15:55 Shadi: What I understood Alistair was proposing: when you develop a rule, you develop test cases along with the rule. The two approaches are not mutually exclusive.
14:16:41 q-
14:16:43 Shadi: W3C maturity model: things can be in a draft or testing phase; however, we need at least a minimal amount of testing, and we ask for feedback for further validation.
14:16:57 WF: already part of the spec to write test cases
14:17:19 ...that filters out the known potential problems
14:17:29 ...but that is not benchmarking
14:18:29 WF: to respond to SteinErik, before rules are put in the repository, they have test cases
14:18:51 ...additionally a feedback mechanism
14:19:24 DM: already an existing repository of failure conditions?
14:19:59 WF: yes. currently different tools have their own test case repositories, but want to merge these
14:20:11 DM: so a common place to validate the rules
14:21:10 WF: thoughts?
14:21:23 ...maybe need to include positive feedback too
14:21:46 SES: concern about the usefulness of the information we receive
14:21:52 ...no clear view yet
14:21:57 q+
14:22:25 ack s
14:23:13 Shadi: We definitely are changing from what we originally had in mind for benchmarking, at least what we have in our work statement. If we do have a test suite, we could have tools run it by themselves to provide information on how well developers support the test suite.
14:23:40 Shadi: Not sure how many tool vendors would want to expose false positives.
14:24:13 q+
14:24:33 Shadi: There would have to be self-declaration. But this could be a criterion. It would force some useful information to come back here. If a tool implementation does not report that the rule comes back cleanly, then it is experimental.
14:24:43 Shadi: Test cases would need to be versioned too.
14:24:54 Shadi: Would be complex because of regression.
14:25:39 Shadi: Encourage someone who proposes a rule to get it implemented first. Plus, we would get more competition because tool vendors would try to implement and get a green light.
14:25:51 WF: so what would be the evidence that rules work in practice?
14:26:04 Wilco: What would be the evidence that a rule is accurate?
14:26:04 ...if enough tools implement it?
14:26:53 Shadi: First criterion: hope there is an active community that constantly reviews proposed rules. This is the first level of checking. Rules are proposed by vendors.
14:27:33 Shadi: If a rule is accepted by a competitor, this is a good sign. Consensus building. Raise the bar to 3 independent implementations. Disclose how many vendors implement a rule.
14:28:33 WF: hesitant, because implementation is only proof of accuracy if tools implement it with no false positives
14:29:07 ...some rules indicate a level of accuracy
14:29:46 ...for example if someone proposes a test rule with only 70% accuracy
14:30:13 DM: can run against failure conditions, but can also check for false positives
14:30:23 ...then there is semi-automated testing, where human intervention is needed
14:30:38 ...may or may not test to the full standard
14:30:44 ...like maybe only 50%
14:30:50 ...so it is a scale to test
14:30:53 q?
14:31:18 WF: agree that some rules are more reliable than others
14:31:32 DM: reliability and completeness
14:31:46 SES: yes, but what does accurate actually mean?
14:32:42 ...consistent and repeatable versus correct
14:33:13 WF: "average between false positives and false negatives" (reads out from spec)
14:33:37 https://w3c.github.io/wcag-act/act-rules-format.html#quality-accuracy
14:34:50 q+
14:34:55 ack sko
14:34:57 ack sk
14:35:00 ack sh
14:37:46 SAZ: think we are really trying to avoid a central group gate-keeping, in order to scale up
14:38:00 ...but need a minimum bar of acceptance defined by the test cases
14:38:26 ...this can be increased over time, as new situations and new technologies emerge
14:38:39 ...may even need to pull rules at some point
14:39:01 WF: can publish rules at any time, with different maturity flags
14:39:13 ...fits with the W3C process
14:39:16 SES: agree
14:39:22 MJM: so do I
14:39:27 SAZ: me too
14:40:11 WF: maybe not more than one flag
14:40:23 ...just something like "beta" or "experimental"
14:41:16 SES: rely on implementers providing information that the tools function
14:42:13 WF: when implementers find an issue and make a change, encourage them to provide that feedback
14:42:37 SAZ: ideally by adding a new test case
14:43:00 WF: want to make sure that the rules stay in sync
14:43:59 WF: implementers should give feedback by way of test cases
14:44:14 ...does not work unless implementers share their test cases
14:44:49 ...need iterative cycles, but need a way to do that
14:44:57 ...have to encourage tool vendors
14:45:49 MK: guess some vendors will not want to share all their rules
14:46:31 WF: so can't take rules unless test cases are shared back
14:46:51 MK: how to phrase that
14:47:43 SES: fair expectation to set that rules will be shared, because it is a quality check
14:48:40 q+
14:49:20 WF: does somebody want to take over writing up this part?
14:49:26 ...also need a name change
14:49:33 SES: how about validation?
14:49:48 WF: think publication requirements
14:50:03 ...talking about how test rules get posted on the W3C website
14:50:05 ack s
14:52:09 q+
14:52:31 SAZ: can take this over
14:53:01 ...like the idea of incentives
14:53:21 ...and the cycle that the incentives drive
14:53:30 SES: happy to work on this too
14:54:13 RESOLUTION: SES will head up drafting the publication/validation/benchmarking piece, with SAZ supporting
14:54:25 WF: MaryJo maybe you can help too
14:54:45 DM: what are the failure conditions of a rule?
14:55:04 WF: think that is what we call the test procedure
14:55:45 DM: clients want to know what in the rules triggers the "fail"
14:55:56 https://auto-wcag.github.io/auto-wcag/rules/SC4-1-1-idref.html
14:55:57 WF: that is the test procedure
14:57:15 ack sk
14:58:42 WF: SteinErik, draft by next week?
14:58:48 SES: yes, will try
14:59:14 SK: still behind but will catch up
14:59:31 +1
14:59:40 present+ Kathy
14:59:42 DM: amazing effort! complicated, but excellent to address
15:00:43 ...shared effort
15:01:04 SK: do we create our own test files?
15:01:20 WF: we have some test cases, will send you the link
15:02:00 trackbot, end meeting
15:02:00 Zakim, list attendees
15:02:00 As of this point the attendees have been MaryJoMueller, Wilco, Kathy, SteinErik, Debra, Moe, Sujasree
15:02:08 RRSAgent, please draft minutes
15:02:08 I have made the request to generate http://www.w3.org/2017/05/08-wcag-act-minutes.html trackbot
15:02:09 RRSAgent, bye
15:02:09 I see no action items
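
Illustrative sketch (an assumed reading, not the normative definition, which is in the spec section linked at 14:33:37): the "average between false positives and false negatives" accuracy measure quoted at 14:33:13 can be read, if both quantities are taken as rates between 0 and 1, as

\[ \text{accuracy} = 1 - \frac{\text{false positive rate} + \text{false negative rate}}{2} \]

For example, a hypothetical rule with a 10% false positive rate and a 30% false negative rate would score 1 - (0.10 + 0.30)/2 = 0.80, i.e. 80% accuracy on this reading; the "70% accuracy" and "maybe only 50%" figures mentioned at 14:29:46 and 14:30:44 would sit on the same scale.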