14:01:24 RRSAgent has joined #testing
14:01:24 logging to https://www.w3.org/2020/10/30-testing-irc
14:01:31 present+
14:01:38 RobSmith has joined #testing
14:01:55 present+
14:01:58 present+
14:02:06 present+
14:02:12 jes_daigle has joined #testing
14:02:32 Present+
14:02:32 RRSAgent, make logs world-visible
14:02:34 krosylight has joined #testing
14:02:39 Zakim has joined #testing
14:02:45 ScribeNick: smcgruer_[EST]
14:03:27 RRSAgent: make minutes v2
14:03:27 I have made the request to generate https://www.w3.org/2020/10/30-testing-minutes.html jgraham
14:03:31 fantasai has joined #testing
14:03:40 present+
14:03:47 present+
14:04:04 boazsender has joined #testing
14:04:08 present+
14:04:29 dom: Session is not being recorded as it is mostly discussion
14:05:26 leobalter: This is a follow-up discussion from TPAC sessions last week
14:05:44 leobalter: Topic is what coverage means for WPT, how we measure it, and how to reach out to new contributors
14:06:36 leobalter: I have a lot of background with test262, extracting information from tests and specs
14:06:37 q+ to share experience tracking coverage in WebRTC spec
14:06:50 ... first suggestion is to generate test plans, list observable parts
14:07:21 ... no-one wants to add complexity, create something to help new contributors instead
14:07:46 ... test-plans help guide new contributors, and help increase coverage
14:08:07 q+
14:08:13 ... in ecmascript, I always did manual test plans, not automated. For the specs we are talking about here it may be challenging.
14:08:26 q+
14:08:46 ... that is all the slides I have, so open the floor to discussion
14:08:48 q?
14:08:51 ack dom
14:08:51 dom, you wanted to share experience tracking coverage in WebRTC spec
14:09:26 dom: wanted to share my experience with the webrtc wg, tracking coverage in WPT for it
14:09:49 ... you can enable a 'test annotation' toggle in the webrtc spec UI, spec will be annotated with green and pink sections
14:10:10 ... green sections have associated WPTs, pink (or red) do not
14:10:36 ... based on heuristics, so not perfect but can be updated manually to fix up problems
14:10:49 where can I find current status/spec of WPT? Is there any url?
14:11:14 leobalter: is this based on tests pointing to spec?
14:11:21 dom: no, spec points to tests via respec
14:11:21 Jemma: https://github.com/web-platform-tests/wpt and https://web-platform-tests.org
14:11:50 dom: I am also involved in reffy, which crawls specs and can identify items from them (e.g. idl definitions, which are exported to WPT)
14:12:08 also wpt.fyi for test results
14:12:09 ... as part of reffy, we have some very early results on simplifying cross-linking between tests and specs
14:12:26 ... currently it goes in both directions, but is cumbersome and no way to automatically update. Hoping to discuss later this year.
14:12:28 q?
14:13:37 leobalter: This seems very useful. I want to contribute tests to WPT, something like that helps guide me (albeit I would likely turn it into a test plan)
14:13:42 ack florian
14:13:58 florian: In CSSWG, we have done something similar to what dom has shown
14:14:09 ... we have metadata in the tests, pointing back to sections of the specification
14:14:18 ... we have been fairly thorough about having this metadata
14:14:28 ... often point to multiple sections or specs, when testing interaction
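To make the metadata convention concrete: many WPT tests, especially under css/, carry <link rel="help"> elements pointing at the spec sections they exercise, and a rough test-to-section map built from that metadata is one way to approximate the green/pink view described above. The Python sketch below is illustrative only; it is not an existing WPT, reffy, or bikeshed tool, and the directory name and the simple regex (which assumes the usual <link rel="help" href="..."> attribute order) are assumptions made for the example.

    # Hypothetical sketch (not an existing WPT/reffy/bikeshed tool): build a rough
    # "spec section -> tests" map from the <link rel="help"> metadata that many
    # WPT tests carry, approximating the green/pink coverage view described above.
    import re
    from collections import defaultdict
    from pathlib import Path

    # Assumes the common attribute order <link rel="help" href="...">.
    HELP_LINK = re.compile(r'<link\s+rel=["\']help["\']\s+href=["\']([^"\']+)["\']', re.I)

    def coverage_map(wpt_dir):
        """Map spec URLs (including #section anchors) to the tests that cite them."""
        sections = defaultdict(list)
        for test in Path(wpt_dir).rglob("*.html"):
            for url in HELP_LINK.findall(test.read_text(errors="ignore")):
                sections[url].append(str(test))
        return sections

    if __name__ == "__main__":
        # "css/css-text" is just an example subdirectory of a local wpt checkout.
        for url, tests in sorted(coverage_map("css/css-text").items()):
            print(f"{url}: {len(tests)} test(s)")

Spec sections with no entry in such a map correspond to the 'pink' case; as the discussion below notes, having an entry says nothing about how deep the coverage actually is.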
14:14:42 florian: in the other direction, pointing from spec to test, we use bikeshed which also allows this
14:15:04 ... does not have the heuristics, just manual adding, but does let you 'watch' a directory in wpt and warn you if new tests appear that aren't listed
14:15:16 ... maintenance is somewhat cumbersome, but does give a good sense of coverage
14:15:41 florian: UX-wise, bikeshed does not let you toggle the test view on/off dynamically (have to rebuild the spec), but open issue to fix that
14:15:44 [for clarity, the heuristics bits I showed in the webrtc spec have been added to the WebRTC spec only, not in ReSpec as a whole https://github.com/w3c/webrtc-pc/blob/master/webrtc.js#L1 ]
14:16:17 florian: My experience with coverage; binary coverage is good for detecting no coverage at all, but hard to determine how much coverage is enough when there's some coverage
14:16:24 +1
14:16:27 +1 to Florian on evaluating detailed coverage needing more precision
14:16:28 ... maybe we should go with yellow when it has some tests, and require a human to mark it green
14:16:37 +1
14:16:50 florian: For getting specs to a better state, I do create manual test plans. Right now doing it on CSS text level 3
14:16:59 ... a thousand tests or so, so tedious (but getting there!)
14:17:00 q?
14:17:41 florian: Clarification - the tedious part is the double-tracking of metadata, having to update them, having to have PRs reviewed, etc
14:17:50 ... not the tests themselves!
14:18:17 leobalter: Sounds like a big part of the pain is having a lot of manual work
14:18:38 leobalter: The problem of having many tests per paragraph is also interesting, shows the difference between human language and tests
14:18:40 q?
14:19:09 ack jgraham
14:19:16 ack jgraham
14:19:20 ... think we should try to have these lists, but try to automate them
14:19:29 I think also just listing tests isn't enough to understand coverage, you have to understand what cases the test is covering. For example, we had tests for border-radius clipping content, but we didn't have any for clipping replaced elements.
14:19:49 jgraham: I wanted to ask how this works for test262.
14:20:00 ... seems the ecmascript spec is very formal and explicit in style, which likely means tooling is easier to write
14:20:09 ... whereas other web-platform specs tend to be more diverse in style
14:20:24 ... not always formal, different editors make different decisions
14:20:33 ... heuristics might end up having to be spec-specific?
14:20:35 q+
14:20:41 ... which is a lot of work.
14:20:57 ... But maybe I've misunderstood, would be interested in how test262 works
14:21:34 leobalter: Actually, test262 does not have any sort of automation around their annotation
14:22:20 ... even with 5 years of experience in test262, no way to do it yet! But in discussion with current editors to formalize more of the ecmascript spec to make it easier to do.
14:22:28 ... so far all test plans were manual work
14:22:39 q?
14:23:22 qq+ jgraham
14:23:40 leobalter: So overall, I would say that ecmascript is far behind what the rest of the web-platform has
14:23:42 ack jgraham
14:23:42 jgraham, you wanted to react to jgraham
14:24:05 jgraham: When you wrote the test plan for test262, was the goal to write tests for ???
14:24:25 ... for web-platform, we usually ask browser engineers to write the feature and the test
14:24:46 ... so it's different than when you have a dedicated QA team who have the time to write a dedicated test plan, take the necessary time, etc
14:24:58 leobalter: Can you clarify the question?
14:25:25 jgraham: Was your role when working in test262 QA-specific (writing tests), or also developing features?
14:25:43 leobalter: Mostly QA, I was a test262 maintainer. Using my time to facilitate others to write tests.
14:26:04 ... at tc39, whoever champions proposals has to write tests, but people aren't ???
14:26:22 ... test262 is also slightly more formal than WPT as it has required metadata/etc
14:26:31 florian: Jumping in; WPT has metadata but it's optional
14:26:45 ... I think the fact that WPT is mostly feature-implementor written causes this difference
14:27:08 ... from the CSSWG, we used to write tests before browsers were even sharing their tests, so it was a QA-driven effort outside the browsers and so metadata was part of the culture
14:27:18 ... once browsers joined in, CSS kept its metadata culture
14:27:34 ... this didn't hold for the rest of WPT, where for much of it it's more browser-driven and far less metadata
14:27:36 q?
14:27:38 ack florian
14:28:07 leobalter: Grain of salt, but feels like browser engineers don't want to add metadata?
14:28:10 q+
14:28:12 q+ to discuss tooling ideas
14:28:26 jgraham: I stand by not requiring metadata
14:28:35 ... getting browser vendors to contribute at all requires reducing the friction to writing tests
14:28:56 ... even now we have vendors with a substantial fraction of tests not shared due to friction even without metadata required
14:29:00 q?
14:29:03 q+ to share feedback I've heard from would-be external contributors (re the other topic)
14:29:15 Browser engineers have learned to comment their functions to explain to future engineers what it's supposed to do. They should also be able to comment their tests for the same reason.
14:29:16 ... even early in css/ tests, people didn't want to deal with the metadata when upstreaming large numbers of tests
14:29:21 q?
14:29:24 ack florian
14:29:25 Sure, some really simple tests are self-explanatory. But many aren't.
14:29:59 florian: Going back to annotating specs, I would welcome some automation but think we should be careful about which parts are most useful
14:30:02 If a test is to be maintainable, you have to understand what it's trying to test. Then if behavior changes in either the test or the infrastructure that sets it up, you can adjust the test without losing coverage.
14:30:08 You can't do that if you don't understand what it's covering.
14:30:17 ... reasonably easy to write heuristics to check if you have tests for things like idl blocks, css property defs, etc
14:30:18 And for a lot of tests, it isn't obvious.
14:30:34 ... but if we write this, we may cause people to write tests for syntax not behavior
14:30:54 q+ to mention a heuristic I've used for behaviors (which could use better formalization)
14:31:06 ... concerned that these syntax tests may lead to bad choices, e.g. due to PR pressure
14:31:14 ... as far as I know not happening a whole lot right now, but has happened in the past
14:31:21 ... want to avoid writing shallow tests
14:31:22 q?
14:31:26 ack dom
14:31:26 dom, you wanted to discuss tooling ideas and to share feedback I've heard from would-be external contributors (re the other topic) and to mention a heuristic I've used for
14:31:29 ... behaviors (which could use better formalization)
14:31:37 rtoyg_m2 has joined #testing
14:31:37 dom: Agree with florian's last point on shallow tests
14:31:55 ... heuristic for webrtc spec is for algorithmic scripts not just idl
14:32:13 ... but no formal shared markup for algorithmic content; could be a space where work with reffy could help
14:33:05 dom: One idea I have for webrtc is that when you do a PR it should tell you "oh this section has this test, please look at that test".
14:33:13 mjasso has joined #testing
14:33:15 ... not sure if there are active conversations around such tooling?
14:33:27 r12a has joined #testing
14:33:31 dom: Want to discuss onboarding new contributors to WPT as well
14:33:48 ... this is a topic which has received some strongly worded feedback so let's make sure we discuss it
14:33:49 q?
14:34:46 dom: One piece of feedback we've heard several times (in particular from one contributor) is that with the new automated submission from browser vendors (which has been positive), it feels harder as an outsider to contribute
14:34:48 q+
14:34:58 ... There's a large queue of PRs that don't get approved rapidly, so it creates a two-tier system
14:35:02 ... feels unwelcoming to new contributors
14:35:09 ... want to share that feedback from a motivated person
14:35:11 q?
14:35:40 leobalter: I've tried to mentor people to contribute to test262
14:35:47 ... most questions are 'what should I test'
14:35:57 ... so I think it comes back to coverage as well, in terms of a shared test-plan
14:36:34 ... think that helps (a) avoid shallow tests, and (b) help new contributors
14:36:38 q?
14:37:31 leobalter: Browser vendors tend to have a better understanding of the depth of a feature, because they did the implementation. New contributors tend to need some guidance to avoid writing shallow tests.
14:38:17 ... figuring out the set of tests to write takes a lot of time, and I'm used to it
14:38:24 On the other hand, browser vendors often don't write the obvious tests so sometimes there's entire "shallow" areas that are completely untested... you need both
14:38:39 ... identifying gaps is one of the best ways to get people involved
14:38:43 Like, we have tons of tests on box-shadow parsing, but had hardly any on rendering
14:38:58 q?
14:39:07 ack jgraham
14:40:00 jgraham: Agree with florian's point about the backlog queue of PRs, difficult to get people to review them, and it's nobody's job to review them
14:40:02 Also a long queue of "missing-coverage" issues: https://github.com/web-platform-tests/wpt/labels/type%3Amissing-coverage
14:40:21 jgraham: Need to find an incentive structure
14:40:49 ... for browser vendors, this is 'avoid web compat issues', or 'improve platform', but note until we made it easy they still didn't do it.
14:41:23 ... so we made it possible inside their incentive structure; put it inside their existing systems where they are incentivized to contribute
14:41:41 +1
14:41:50 jgraham: You can argue we've tried hard for the PR review problem; we have tooling to assign people, etc. Try to get it into workflow, etc. But limited success.
14:41:53 ziransun has joined #testing
14:42:15 ... if we really want to make progress, how do we make it so there exists an incentive for people to actually review PRs?
14:42:41 ... One problem - there's no core WPT team for the *tests*. Every spec has its own set of experts.
14:43:00 ... I can't review a PR for a css spec, because I don't have the expertise. The CSS reviewer probably can't review a PR for HTML or DOM.
14:43:11 ... so need to bring a lot of people onboard to make the situation good
14:43:29 ... most of those experts are already paid to work on a browser engine, so they have their incentives
14:43:51 ... so far our attempts are basically 'send emails saying this PR is assigned to you' and people ignore them
14:44:14 jgraham: I don't want to defend the existing situation, but if we want to make it better we need to have a plausible story about why it's going to work this time
14:44:33 dom: To be clear, I appreciate this is a challenging issue
14:44:34 q?
14:45:03 dom: Need to align our recommendation with our ability to have people review
14:45:11 ... leo's point was that having test-plans would help, I agree
14:45:15 q+
14:45:50 q+
14:45:55 ... but also need to realize that we'd better have a good story on how to make contributions meaningful. Maybe associate test-plans with attached people who are welcoming the contributors.
14:46:09 ... one thing that Mike and I discussed was to have each WG have an onboarding person assigned to it
14:46:34 ... I don't know if that would be enough, onboarding skills might not mean reviewing expertise
14:46:45 ... but generally having someone to smooth the challenges may recreate the incentive structure
14:46:46 q?
14:46:58 ack jgraham
14:47:09 qq+ jgraham
14:47:57 [to be clear, my proposal would not be that the onboarding person would do the reviews - they would ensure the reviews get done by their fellow group participants]
14:48:05 leobalter: If someone has the skills to guide new contributors, it seems like they should be writing tests instead. Reviewing PRs from new contributors can demand extra time to get it done.
14:48:25 ... so much to explain to the contributor
14:48:51 [agree that writing good tests is hard - if we don't feel that's something newcomers can meaningfully contribute to, we should also be clear about it :) ]
14:49:40 ... have to be careful about causing burnout in people reviewing (and contributors)
14:49:41 q?
14:49:43 ack jgraham
14:49:43 jgraham, you wanted to react to jgraham
14:50:26 [scribing doesn't leave much time for this, but my question would still be - do we *specifically* know who the people are we hope to review PRs? My suspicion is that it's mostly people paid by browser vendors, and those browser vendors have... made their decision? Would we need to push for a cultural shift in the browser vendors to achieve this?]
14:50:31 I learned to write tests by creating simplified testcases for specific bugs. Maybe that's easier than starting from a spec and trying to fix coverage?
14:51:03 jgraham: Think over the last decade, industry moved to a culture where engineers write the tests, and they find doing test plans extra work that they aren't rewarded for.
14:51:14 ... even if you believe it's in their long-term interests, they don't see that
14:51:29 ... doesn't mean I don't think we should do it, but we need to align interests
14:51:30 q?
14:51:32 ack florian
14:51:38 florian: Think there's a number of things we can do
14:51:45 ... friction of using WPT is reduced but far from zero
14:51:50 ... CI takes too long, fails too often
14:51:53 ... documentation is far from perfect
14:52:09 q+ to ask smcgruer_[EST]'s question
14:52:22 florian: But when it comes to inviting newcomers, disheartening to have people get through this friction and write the PR, but then have nobody review it.
14:52:29 ... so people just don't come back after their first contribution
14:53:00 ... sadly I think the onboarding person isn't going to work - I know everyone in CSSWG and yet I still couldn't get reviewers
14:53:10 ... can we just get W3C budget to fund someone to review test PRs?
14:53:16 ... but money doesn't fall from trees :)
14:53:33 q?
14:53:36 ack jgraham
14:53:36 jgraham, you wanted to ask smcgruer_[EST]'s question
14:53:41 qq+ jgraham
14:53:52 ack jgraham
14:53:52 jgraham, you wanted to react to jgraham
14:54:09 jgraham: Asking the question that smcgruer_[EST] posed, but he's scribing so asking for him
14:54:14 msanchez has joined #testing
14:54:31 q+
14:54:43 ... do we specifically know who the people are we want to review PRs? Seems mostly people who are paid by browser vendors, who seem to have decided not to do this. Should we change internal structure or is there another pool of people we're looking at?
14:54:54 florian: Not all spec editors work for a browser vendor, but many do
14:55:11 ... and nearly all spec editors seem uninterested, whether they are paid by browser vendors or not
14:55:12 q?
14:55:15 ack r12a
14:55:34 jgraham: Quick follow-up; spec editors are experts, implementors are experts, and then we're out of experts
14:56:28 r12a: Apologize for arriving late, but want to raise a question: do we need to review tests? Could we just accept the tests?
14:56:40 q+
14:57:05 ... one of three things happens: the test is ok, the test gives a false-positive (maybe people would come back and spot this), or the test doesn't work properly and delivers a false result
14:57:15 ... for this last one, the browser developer would triage this and see the test is bad
14:57:33 ... just an idea, throwing it out there :)
14:57:47 ... since we have an unmovable problem in terms of getting folks to review
14:57:48 q?
14:57:50 ack jgraham
14:57:59 jgraham: I have previously advocated for this eventual consistency
14:58:00 [in fact, I've found a surprising number of broken tests in the WebRTC test suite that no implementer flagged as problematic for one reason or another]
14:58:03 q+ to respond to r12a
14:58:14 ... think it has some place, but if you have a high fraction of broken tests, folks get grumpy and lose faith in the test suite
14:58:28 [Note: on the Chromium side already, we have teams pushing back against WPT saying there are too many bad tests - in their view]
14:58:37 [Which of course they think other people always added ;)]
14:58:51 q?
14:58:53 ack florian
14:58:53 florian, you wanted to respond to r12a
14:59:05 florian: There are different types of failures even within the categories you listed
14:59:14 ... tests that always pass - annoying but ultimately ok
14:59:25 ... but also tests that misunderstand the spec and work, but work wrong. Doesn't test the spec.
14:59:48 ... so people may fix their implementation to match the test, not the spec
14:59:56 ... not sure if review today actually catches this, but feels like a danger
15:00:26 [I wonder if there is a distinction to be made between new tests that fail current browsers vs those that don't (which the CI already identifies, although not very explicitly)]
15:00:42 florian: Note that a few folks in CSSWG today, myself included, have the power to merge tests without review
15:01:00 jgraham: I want to make clear - we are merging *many* PRs. They're just mostly (60%) coming from browser vendors
15:01:06 q+ to invest in dashboard / monitoring
15:01:14 ... so to me this is not existential, it's just that we're not doing as well as we could be for new contributors
15:01:18 ack dom
15:01:18 dom, you wanted to invest in dashboard / monitoring
15:01:31 dom: Do we know if this is a uniform problem across all specs, or for specific specs?
15:01:52 ... if it is specific specs, maybe we can do something about those specs specifically
15:02:00 ... generally want a better understanding of the queue
15:02:28 Maybe we can merge "unreviewed" PRs automatically, but label them as such either in the filename or the content, and surface them through the manifest.
15:02:29 jgraham: I know smcgruer_[EST] has some overall statistics, but not per-spec
15:02:32 ... also we're two minutes over time
15:03:12 leobalter: I've learned a lot here about the bigger picture, which is great
15:03:12 A good test * passes when it's supposed to pass * fails when it's supposed to fail * tests the thing it thinks it's testing.
15:03:27 +1
15:03:45 ... think there's challenges, there's work to do, but also things we can do and make progress
15:03:51 ... hope we can formalize a consistent format for test plans in the future
15:03:54 ... but baby steps first
15:04:13 dom: we used to have pullpanda, which is being turned down, but even that didn't allow us to filter out e.g. auto-exported PRs, which is something we really need when analyzing reviews
15:04:27 dom: Ok, thanks everyone. Thanks leo for organizing, good discussion. Follow-ups to happen in the #testing channel, the public test infra mailing list, and WPT issues
15:04:39 RRSAgent: make minutes v2
15:04:39 I have made the request to generate https://www.w3.org/2020/10/30-testing-minutes.html jgraham
15:05:02 RRSAgent: stop
19:35:41 RRSAgent has joined #testing
19:35:41 logging to https://www.w3.org/2020/10/30-testing-irc
19:36:14 RRSAgent:stop
19:37:39 Meeting: wpt-coverage
19:37:47 RRSAgent: make logs v2
19:37:57 RRSAgent: bye
19:38:05 RRSAgent: make logs public
19:38:22 RRSAgent: publish minutes v2
19:38:22 I have made the request to generate https://www.w3.org/2020/10/30-testing-minutes.html jgraham
19:38:29 RRSAgent: bye
19:38:29 I see no action items