13:53:14 RRSAgent has joined #wcag-act
13:53:14 logging to http://www.w3.org/2017/04/03-wcag-act-irc
13:53:16 RRSAgent, make logs public
13:53:19 Zakim, this will be
13:53:19 I don't understand 'this will be', trackbot
13:53:19 Meeting: Accessibility Conformance Testing Teleconference
13:53:19 Date: 03 April 2017
13:55:55 Wilco has joined #wcag-act
13:56:13 topic+ Test case repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Test_case_repository
13:56:24 topic+ Rules repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Rules_repository
13:56:31 topic+ ACT benchmark https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Benchmark_requirements
13:56:39 topic+ Open Issues in Github https://github.com/w3c/wcag-act/issues
13:56:41 agenda?
14:00:05 Kathy has joined #wcag-act
14:00:39 annethyme has joined #wcag-act
14:02:02 maryjom has joined #wcag-act
14:02:37 rdeltour has joined #wcag-act
14:04:38 present+ Kathy
14:04:47 present+
14:04:49 present+
14:05:05 MoeKraft has joined #wcag-act
14:05:16 present+ MaryJoMueller
14:05:18 wilco: new members! please introduce yourself
14:05:32 present+ MoeKraft
14:06:54 zakim, pick up next
14:06:54 I don't understand 'pick up next', rdeltour
14:07:28 zakim, take up next
14:07:28 agendum 4. "ACT benchmark https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Benchmark_requirements" taken up [from Wilco]
14:07:53 agenda?
14:08:09 zakim, clear agenda
14:08:09 agenda cleared
14:08:47 cpandhi has joined #wcag-act
14:09:05 present+ cpandhi
14:09:17 https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Test_case_repository
14:09:18 agenda+ Test case repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Test_case_repository
14:09:38 agenda+ Rules repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Rules_repository
14:09:50 zakim, take up next
14:09:50 agendum 1. "Test case repository https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Test_case_repository" taken up [from maryjom]
14:09:57 agenda+ ACT benchmark https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Benchmark_requirements
14:10:03 wilco: we want to put together a set of test cases for running automated test tools
14:10:10 ... something very straightforward
14:10:17 agenda+ Open Issues in Github https://github.com/w3c/wcag-act/issues
14:10:25 ... 2 parts:
14:10:43 ... 1. a small test case (HTML file) with good/bad a11y practice
14:11:09 ... 2. some metadata on the SC and which element passes or fails the criteria
14:12:08 https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Testing_Resources
14:12:49 wilco: we started putting together a list of resources that could help with testing (OOA, axe Rules, QuailJS, etc.)
14:13:19 ... do you have your own list internally that you might be using and willing to share?
14:13:36 anne: Siteimprove has a list, I can ask whether we're willing to share
14:13:53 kathy: we have a list on github
14:14:10 wilco: IBM, you have the Va11ys project
14:14:21 moe: yes, with samples for the WCAG techniques
14:14:43 ... are you looking for working code, or rules? we also have a new a11y testing dashboard, with rules in English
14:14:54 wilco: we're looking specifically for test cases
14:15:06 charu: then I don't think our dashboard would be useful
14:15:23 moe: Va11ys can be useful
14:15:27 wilco: can you tell us more?
14:15:53 moe: it's a gh repo, we took the code snippets from the WCAG techniques and put them in individual HTML files
14:16:00 ... served on github.io
14:16:33 ... we have some additional ones, but the ones from the WCAG techniques are the same (but available as live code)
14:16:51 wilco: how did you document the results you want to get out of this?
14:17:02 moe: we wanted to provide a simple way for developers to see live working samples
14:17:13 ... we're not using it as a bucket of tests to run and produce results
14:17:22 ... just to demonstrate how to properly code for a11y
14:17:52 ... I don't say it can't be used for regression testing, it's a potential but not something we did
14:18:03 wilco: if we wanted to do something like that, how would we start?
14:18:16 https://ibma.github.io/Va11yS/
14:18:18 moe: we have no negative test cases
14:18:34 wilco: most of the tests written for tools are probably negative test cases
14:19:26 wilco: romain, you had a list as well?
14:19:56 romain: it's a short list that we just set up to manually evaluate checker tools, but it might not be useful in this context
14:20:24 wilco: the question is how do we want to put our tests together to have something that tool developers can easily run against?
14:20:57 charu: we may have a benchmark / regression test bucket. I will look into it and see if we can share that
14:21:19 wilco: with aXe core, we're using a test library, I can imagine doing something similar
14:21:51 ... as long as the test snippets are accessible through an HTTP request and come with metadata, there can be a way to test against them, does that make sense?
14:22:28 wilco: Anne, Stein Erik, thoughts?
14:23:01 anne: I'm thinking that might be a place to send my developers instead of me :-)
14:23:46 wilco: we could try to take all the test cases and put them in a single repository, but that would probably be difficult to maintain over time
14:24:13 charu: I was wondering, are we planning to vet all these cases at some point? we have a whole bunch from different places
14:25:02 ... say we are testing for 1.1.1, and say the test cases for OOA pass but those from QuailJS fail
14:25:31 ... do we plan to vet the cases and have a list of valid cases that can be used?
14:25:59 wilco: I'm pretty confident in the cases which come from WCAG, but for the rest I don't know much
14:26:33 charu: Va11ys are not really test cases but just HTML elements, we may have to do some formatting. They're not test cases, they're valid code samples
14:26:48 ... we could create test cases off of them
14:27:06 wilco: what's the difference you see between test cases and code samples?
14:28:06 charu: you want several test cases for a single SC, code samples are more about elements
14:28:56 wilco: ok, each of them is its own test case, but we'd need to combine these
14:29:26 ... it seems to me that the number of cases already makes a reasonable list, the test samples would be valuable
14:30:22 wilco: what would be the next step for us? we have a reasonably-sized bunch of code snippets for HTML elements, how do we make a single suite of tests from that?
14:31:32 romain: maybe we need to look at the test format first, how easy it is to run from tools, and then try to see if it's easy to transform these test materials?
14:31:52 wilco: I think we didn't want to touch the tests from the various projects
14:33:41 anne: I didn't have time to look at the list of the tests, but there might be a difference between tests targeting a single technique and tests for a whole SC?
14:34:17 anne: our tests are linked to a technique, I guess some others might be linked to a SC, so there might be some differences in the granularity of the tests
14:34:27 ... we have to make sure they're aligned
14:34:53 wilco: we want to get a big chunk of test cases and map them to SC
14:35:01 ... from there we can map to the Rules
14:36:28 stein erik: do we have a way to link techniques to SC, to say if they are mutually exclusive?
14:36:59 ... there are cases where implementing 2 different techniques could actually cause problems for another technique. do we have any way to describe these relations?
14:37:15 wilco: can we have an example?
14:37:22 stein erik: I will come up with an example
14:38:07 wilco: I can think of it as an accessibility support issue, so I would say no. but otherwise we would address that in the rule itself, so it probably shouldn't be part of the test case
14:38:16 ... I have a suggestion:
14:38:56 ... if we take all of these test repositories, can the owners create a metadata file that describes where to get the HTML files and what the expected outcome should be?
14:39:20 ... if we document that in a common format I would say we'd be able to run the tools against the tests automatically
14:39:39 anne: Can I take one step back and look at an example?
14:40:00 ... for 2.4.5 ("multiple ways") there is a need to combine techniques
14:40:17 wilco: right
14:40:52 ... the granularity of our tests is at the SC level, so an example would either need to use both techniques, or it would be a violation
14:41:36 wilco: charu, WDYT of a metadata file for the test suite?
14:41:51 charu: it seems it can be useful, I'd like to see an example
14:42:16 ... if some kind of scripting is able to get the locations of all the files, it can be useful
14:43:05 wilco: charu, do you have some time maybe next week to work on a proposal for this?
14:43:26 charu: I don't think I can commit to that...
14:43:56 wilco: I don't think we're in a big hurry with that one. for now let's just create an issue to start working on this
14:44:44 ACTION moe to create an issue to kick-start the work on a test metadata description file
14:44:44 Error finding 'moe'. You can review and register nicknames at .
14:45:07 zakim, take up next
14:45:07 agendum 2. "ACT benchmark https://www.w3.org/WAI/GL/task-forces/conformance-testing/wiki/Benchmark_requirements" taken up [from maryjom]
14:45:51 [document reads: "coming soon..."]
14:46:18 wilco: the deliverable is to write up what exactly we want to do with benchmarking rules
14:47:31 ... the way I see benchmarking: there's gonna have to be a set of real-world websites, reviewed by accessibility experts, who evaluate them on a per-page, per-SC level, and then evaluate them through the rules
14:47:45 ... when doing that multiple times, we get a sense of how accurate the rule is
14:48:10 ... how often the rules pick up an a11y violation, and how often they miss one
14:48:18 ... then improve them over time
14:49:23 romain: one issue is that it's gonna be a problem to document a benchmarking session if the real-world site evolves afterwards
14:49:41 wilco: it's gonna take a lot of resources to do the evaluation
14:50:00 ... you're gonna need to work with fairly recent data on that
14:50:40 anne: we have a bunch of customers who are happy to report when we have false positives
14:50:59 q+
14:51:14 ... we don't see all the weird edge cases in many sites, so we need to test many sites to identify some issues
14:51:27 ... sometimes an issue pops up after several years
14:52:22 stein erik: there are some challenges with what you mention, if you only do the evaluation per-page and per-SC, there is some degree of uncertainty on whether the manual check and the automated check apply to the same problem
14:52:44 ... we need to see what's efficient and what's effective and balance that accordingly
14:52:50 ack s
14:52:54 ack c
14:53:19 charu: all our internal product teams use the tools, so we collect all the edge cases, false positives, and go from there
14:53:28 ... a customer base using the rules is essential
14:53:55 ... I don't know if there is a way to capture everything in a test site that you create, there are so many different possible scenarios
14:54:20 wilco: right, that's why the benchmarking should be based on existing web sites
14:54:32 ... there are a lot of unknowns
14:54:50 ... we're all relying heavily on user feedback to improve the quality of our tools
14:55:00 ... it's a kind of whack-a-mole
14:55:23 anne: when a11y consultants use our tools, they'll find other things than when our users use them
14:55:39 ... there could be a combination of looking into the tool and into the site
14:56:16 ... the a11y consultants are more technical
14:56:47 wilco: all these points about the granularity and up-to-date data are absolutely valid
14:56:58 [wilco stuck on his own thoughts...]
14:58:08 wilco: what if we do our benchmarking by putting the rules in a draft or proposal mode, and they stay like that until they're implemented and we get enough feedback on them
14:58:23 charu: I like that idea, we do something similar to that.
14:58:34 ... some rules are in 'beta'
14:58:53 wilco: what is then the point at which you take them out of beta?
14:59:15 charu: maybe 3 months or so, to see if we need any tweaking or if they're efficient
14:59:32 wilco: so a fixed time period? certain rules are applied more frequently than others
15:00:01 charu: good point. we just started this recently...
15:00:32 anne: after internal testing we usually just release them and then wait for some customer testing
15:00:48 ... we have to put it out to the real world to get real insight
15:01:07 wilco: with axe we played around with the idea of having experimental rules
15:01:25 ... I will take these ideas and work on a proposal for the benchmark
15:01:33 ... in the meantime, worth noting:
15:01:51 ... we have had a call for consensus, and our FPWD has been approved!
15:02:09 ... shadi will publish it tomorrow or Wednesday
15:03:00 shadi has joined #wcag-act
15:03:36 trackbot, end meeting
15:03:36 Zakim, list attendees
15:03:36 As of this point the attendees have been Wilco, MaryJoMueller, cpandhi, shadi, Kathy, rdeltour, MoeKraft
15:03:44 RRSAgent, please draft minutes
15:03:44 I have made the request to generate http://www.w3.org/2017/04/03-wcag-act-minutes.html trackbot
15:03:45 RRSAgent, bye
15:03:45 I see no action items
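
[editor's sketch, appended after the log: to illustrate the metadata file Wilco proposed at 14:38, a machine-readable list of test case locations plus the success criterion and expected outcome, so that tools could be run against the cases automatically. No format was agreed during the meeting; the field names, example.org URLs, and helper functions below are illustrative assumptions only.]

# A minimal sketch of a possible common test-case metadata format and of how
# a tool vendor might consume it. Everything here is hypothetical: field names
# ("testcases", "url", "successCriterion", "expected", "selector"), the
# example URLs, and the functions are placeholders, not an agreed format.
import json
import urllib.request

EXAMPLE_METADATA = """
{
  "testcases": [
    {
      "url": "https://example.org/testcases/img-alt-missing.html",
      "successCriterion": "1.1.1",
      "expected": "fail",
      "selector": "#logo"
    },
    {
      "url": "https://example.org/testcases/img-alt-present.html",
      "successCriterion": "1.1.1",
      "expected": "pass",
      "selector": "#logo"
    }
  ]
}
"""

def load_testcases(metadata_json: str):
    """Parse the metadata file and yield one record per test case."""
    for case in json.loads(metadata_json)["testcases"]:
        yield case

def fetch_html(url: str) -> str:
    """Retrieve a test case HTML file over HTTP, as the proposal assumes."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    for case in load_testcases(EXAMPLE_METADATA):
        # A real tool would fetch case["url"] (e.g. with fetch_html), run its
        # rules for the listed success criterion, and compare its verdict
        # with case["expected"] to measure false positives and misses.
        print(case["successCriterion"], case["expected"], case["url"])

[in such a setup each repository owner keeps their own test files and only publishes the metadata file, which addresses Wilco's concern that a single combined repository would be difficult to maintain over time.]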