09:11:51 RRSAgent has joined #testing
09:11:51 logging to http://www.w3.org/2016/09/22-testing-irc
09:12:27 present+ Boaz
09:12:37 present+ SamWeinig
09:12:44 present+ MikeTaylor
09:13:00 scribe: boazsender
09:13:00 present+
09:13:06 present+ JohnJansen
09:13:07 present+ NavidZolghadr
09:13:13 present+ gsnedders
09:13:15 rrsagent, make logs public
09:13:18 present+ jgraham
09:13:24 present+ rbyers
09:13:30 miketaylr from moz on web compat
09:13:45 mikesmith from w3c, no testing wg, but mike is responsible for testing for w3c
09:13:45 scribenick: boazsender
09:13:51 https://www.chromium.org/blink/platform-predictability
09:13:54 diminic from google working on blink
09:14:03 platform predictability
09:14:08 rick byers from google working on web predictability initiative
09:14:19 s/diminic/Dominic/
09:14:19 nab at google working on blink
09:14:33 s/nab/Navid/
09:14:45 john jahnsen from microsoft working on platform testing, layout, f12 and a11y, and OS testing.
09:15:07 jeffrey from google trying to sort out the huge mess of testing
09:15:12 s/jahnsen/jansen/
09:15:14 frank olivier
09:15:17 s/jeffrey/Geoffrey/
09:15:18 frank from microsoft edge team
09:15:32 s/jeffrey from google/Jeffrey on contract with Google/
09:15:40 present+
09:15:40 shane working on a11y, aria
09:15:44 Simon Pieters, Opera
09:16:01 simon from opera, edits specs html css, and writes web platform tests
09:16:14 RRSAgent, make minutes
09:16:14 I have made the request to generate http://www.w3.org/2016/09/22-testing-minutes.html MikeSmith
09:16:30 james from mozilla, responsible for the most hated parts of the testing infra (testharness.js, wptserve)
09:16:57 boaz sender from bocoup, test262, v8 conformance
09:17:07 s/html css/html cssom/
09:17:17 John: update on wpt at msft
09:17:41 john jansen: our team runs our internal test runner.
we have a test runner, an analyzer and a documenter
09:17:51 s/Geoffrey from google/Geoffrey on contract with Google
09:18:00 the test runner runs about, I think, 50 different OS test suites
09:18:24 when we identify one we want to run, we fork it, bring it into our share and update it weekly
09:18:41 i/miketaylr from moz/Topic: Intros
09:18:50 anything that uses testharness.js we can run for free
09:19:23 i/update on wpt at msft/Topic: Update on wpt at Microsoft
09:19:24 we run them, get the results, parse them out, and move them into the analyzer, where we store the results in the DB
09:19:43 then a bot spins up, asks what branch/build you are running on, is it the same as yesterday
09:20:25 we have an algorithm for flagging flaky tests based on day-to-day performance to be evaluated
09:21:05 initially we ran the entire WPT repo without virtualizing the servers, so it would take 6 hrs, we'd run out of disk space
09:21:26 it became very intensive, so we broke them out into multiple virtual test suites
09:22:02 we also run our tests in chrome for interop.
09:22:21 so, if we have a failure, we run it in chrome and if it also fails, then we deprioritize it
09:22:45 ... if it fails in all browsers, we say 'maybe this is a conversation to have at the w3c'
09:22:47 q?
09:22:54 and then we have a report
09:23:19 q: what is your process for new tests, asks rick from google
09:23:33 john jansen: when we decide to implement a feature...
09:23:53 our team is structured such that we have leads who are responsible for new features
09:24:16 we start by taking tests for that feature and running them in another browser that has implemented it
09:24:49 ideally, as we implement, we write tests as we work on the feature, and the idea is that they will be contributed to WPT
09:25:08 but it's hard, because it's so much easier to write tests for the internal tool chain and not WPT
09:25:32 at blink, testharness.js is a huge step up
09:26:09 new q from blink: if we're talking about implementing a feature, is the quality of WPT tests for the feature a factor
09:26:27 Jansen: no, customer feedback is the bigger driver
09:26:46 never seen a dev lead say no to a feature because the test suite sucks
09:27:00 byers: can you share your test data
09:27:05 jansen: yeah
09:27:23 jansen: I'm a strong advocate for sharing info
09:27:42 still need to work through the noise of the data
09:27:56 we have a lot of data for 'x test case failing in y browser'
09:28:12 but we don't have the analysis of whether the test case matches the spec
09:28:29 also, test coverage across directories
09:28:45 simon: the useful data would be which tests are flaky
09:29:07 james graham: I can give you that list... it's the list of tests disabled in gecko
09:29:29 jansen: does mozilla publicly show the disabled tests
09:29:40 ok, now, here's what we do at mozilla
09:30:12 at moz for gecko, we run the entire test suite apart from the disabled tests in CI on every commit... ish
09:30:27 for tests that almost never fail, we run them on 1 in 10 commits
09:30:36 ...
this is used for regression detection
09:31:00 we can see expectation data files for x fails in y environment
09:31:06 we update our tests occasionally
09:31:22 once every week or couple of weeks I kick off the update process
09:31:42 I pull down the latest tests, run them in the latest build, use that to update the expectation files
09:32:13 when we downstream stuff we also upstream stuff, so it is currently possible to add tests as though they were internal, and they get upstreamed
09:32:29 having that has significantly improved test contribution
09:32:43 we don't have any reporting about which tests we are failing
09:33:03 so I've been getting harangued by boris to report on which passing tests have started to fail
09:33:30 at the moment it's up to individual developers to look at tests and keep them up to date
09:33:40 ... and it happens sometimes, but never for legacy features
09:33:57 or if they are I don't know, and there's no structured way to go about it
09:34:02 rick now on blink:
09:34:10 we've long run a subset of the tests
09:34:21 we don't yet have an automated upstreaming process
09:34:49 every week, or more, we do an import of the tests, disable failing tests, and then those just form a list of tests failing
09:35:10 we are now running about 1800 of about 9k of the test files on every commit
09:35:15 otherwise it is ad hox
09:35:18 *hoc
09:35:28 s/hox/hoc/
09:35:28 i/what we do at mozilla/Topic: Update on wpt at Mozilla
09:35:46 some teams have put a lot of effort into going back and looking at the tests for their features
09:36:00 some teams are writing tests as they go on new features (pointers, storage)
09:36:13 but it's tough because there are multiple copies
09:36:14 i/rick now on blink/Topic: Update on wpt and Blink
09:36:19 ... we don't use the WPT server
09:36:42 for all of our tests, like moz, we have expected results
09:36:47 we don't run in any other browsers
09:37:07 we run in a cut-down browser called "content shell" with a bunch of flags turned on
09:37:26 so I'm curious, does edge test the bleeding edge of chrome or stable?
09:37:34 Jansen: stable
09:38:17 and teams, yeah so, my team was writing the service worker tests, and we started using testharness.js from the get-go and it was great
09:38:23 web components is doing that too
09:38:49 big problem is the massive legacy of tests we have (~30k files ??) that use our legacy runner
09:39:03 but the biggest problem is upstreaming our tests
09:39:16 having a separate repo is tough
09:39:54 rick again: many of the pointer events tests are manual, because there is no way to generate synthetic events
09:40:06 so we have a -manual convention in the file name to find them
09:40:08 Blink's flakiness data for imported test suites: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eimported%2F
09:40:52 we're working on a tool to make upstreaming easier, so that a blink dev can write tests with their product feature, get code review, land it, and then have a pull request opened automatically
09:41:22 james: we auto-merge the branch... because ppl complain that PRs sit there forever
09:41:34 q: how do you auto-run manual tests
09:42:33 rick: for manual tests we have special internal tools to automate them
09:43:03 ... but eventually we want to move them to chrome driver, and eventually webdriver
09:43:08 ...
*maybe*
09:43:32 rick: there's always going to be things that we can't get into webdriver, or will take a long time
09:43:50 like in chrome, we have window.internals that lets you mod geolocation, for example
09:44:30 Present+ FrankOlivier
09:44:35 also, we want to get to the point where features must have interop documentation, and that you need a test suite and info about how your tests are doing in other browsers, and that we could reject features that don't have that
09:44:44 Present+ DominicCooney
09:44:52 right now, engineers just put their hand over their heart and say they did it
09:44:55 and we don't ask
09:44:59 because it's too hard
09:45:09 Present+ TessOConnor
09:45:27 jansen: if you're implementing tests for feature foo, and we are, you're not publishing your code upstream
09:45:28 Present+ SimonPieters
09:45:34 Present+ MikeSmith
09:45:44 rick: no, we submit PRs on commit
09:45:51 so canary features have tests
09:46:11 at apple we pull other browsers' tests to look at them
09:46:39 james: I'm not concerned about test duplication
09:46:41 how's the temperature for everyone?
09:47:03 *cacophony of temp convo*
09:47:20 rick: we're trying to figure out where is the biggest bang for the buck?
09:47:51 adding 5k tests is non-trivial (because we have bots running on all commits), so we want to add the good ones first
09:48:09 james: I failed to mention that servo also uses these tests almost exclusively
09:48:27 ... so they are writing tests for bits of the platform that are behind the curve
09:48:50 mikesmith: that is pretty significant given that all the other browsers are so mature
09:48:55 \0/
09:49:01 sam from apple:
09:49:36 the webkit testing strategy has been historically one of regression testing.
we write tests not to check for correctness, but more to check if something has gone wrong
09:49:46 i/sam from apple/Topic: Update on wpt and WebKit
09:49:57 but it turns out correctness tests also help with regressions, so we've started importing other tests
09:50:06 we've started importing some WPT tests
09:50:30 we don't do this on any kind of regular basis, but people in that area tend to import as they work
09:50:42 we currently only run js and ref (automatic) tests
09:50:45 we skip manual tests
09:50:47 s/but it/samweinig: but it/
09:50:55 we use the test harness as much as we can
09:51:10 we've been trying to go back and fix dom/html errors in the past year
09:51:34 it's been very fruitful for finding areas where the specs were wrong, or have changed, etc
09:51:50 in terms of upstreaming, we haven't done that, but I think we should
09:52:01 I'm going to go back and see if people are interested in it
09:52:07 we have problems with perf
09:52:37 our tests are run on devs' machines as they work, and then when you post a patch for review, our bots run them immediately and then again when the commit lands
09:53:17 in terms of what you all are talking about, non-standard ways of interacting with the browser (e.g. webdriver-type stuff), we have a few things we've done, like event sender
09:54:15 but as the browser has evolved we've moved away from that to a more async model (called something like async io), so now what we have is a pure js environment running in a separate process, so we take a string of javascript that we can send to the ui process,
09:55:01 so if we were to standardize something looking to implement something like mouse events or scroll gestures, I think doing it in a way where you can separate the execution of the tests from the ui, and order the timing of them, that would be good.
09:55:10 maybe we could do that with network stuff
09:55:24 jameS: wpt serve is supposed to do that
09:55:46 for us at apple it's more about the ui thread
09:56:02 s/maybe/dominicc: maybe/
09:56:16 s/for us/samweinig: for us/
09:57:09 for what google mentioned with the geolocation api, or other prompts for users... because there is a need for the test harness to mock geolocation, maybe as this goes forward we should look at what the entry points are so we can have unified interfaces for testing these things
09:58:06 s/jamesS:/jgraham:/
09:58:18 many of our internal testing interfaces don't necessarily make sense for standards though
09:58:21 s/jameS:/jgraham:/
09:58:30 ... they test really internal state
09:58:37 ... that is not interoperable
09:59:14 with that in mind, it might be interesting for someone to go through in bits and see what all the browsers expose to their test harnesses, and see what we can standardize
09:59:17 q?
09:59:30 certainly the prompts/alerts will be there
09:59:39 ok, so that's what webkit does
09:59:52 coffee break
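Before the break, both the Blink and WebKit updates pointed at the same gap: tests that need to mock things like geolocation or synthesize input currently rely on per-engine internals (window.internals, event sender). As a rough illustration only, here is what a shared testharness.js entry point for that could look like; promise_test and assert_equals are real testharness.js APIs, but the testAutomation object is hypothetical, standing in for whatever unified interface the group might standardize:

<!DOCTYPE html>
<title>Sketch: geolocation mocking via a hypothetical automation interface</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
promise_test(() => {
  // Hypothetical call: not a real API, shown only to illustrate the shape of a unified entry point.
  return testAutomation.setGeolocation({latitude: 52.0, longitude: 4.3})
    .then(() => new Promise((resolve, reject) =>
      navigator.geolocation.getCurrentPosition(resolve, reject)))
    .then(pos => {
      assert_equals(pos.coords.latitude, 52.0);
      assert_equals(pos.coords.longitude, 4.3);
    });
}, "getCurrentPosition returns the mocked coordinates");
</script>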
10:19:35 TOPIC: what should we invest in?
10:19:55 rick: I think we want to improve developers' lives
10:20:22 James: here are the 2 infrastructure things that should be done next:
10:20:26 q+
10:21:43 1) some sharable way of writing the manual tests in a cross-browser way. the plan for that would be to write a webdriver api in js, that is then injected into the WPT repo somewhere, and then somewhere in JS you could write a test which can call out to a webdriver server, which does webdriver stuff, finishes, and resolves a promise on the client.
10:21:55 q+ to comment
10:23:00 boaz: defining the api is the main thing, figuring out when to go to the network etc. is separate
10:25:24 2) the second thing I think we want to work on is some way of getting insight into which tests are passing in specific implementations and which aren't
10:25:38 $ find . -name "*-manual.html" | wc -l
10:25:38 333
10:25:54 zcorpan: that's a sensible approach to get a number
10:26:30 it is not just about how many manual tests we have, because the problem is there are many requirements we have no tests for because nobody has or wants to write manual tests for them
10:26:56 zcorpan: 888 manual tests
10:26:59 zcorpan: many aren't HTML
10:27:23 the concern with number 2 is vanity metrics
10:27:40 this should not be a marketing report
10:28:01 I get 892 for "*-manual.*"
10:28:19 zcorpan: I'm looking at what the manifest has in it
10:28:20 the great thing about WPT is that it's massive, crowdsourced, and it's hard to game.
10:28:34 present+
10:28:46 ^ jansen says
10:29:21 Sam: the worry is that some browser is going to be most compatible, and will be "winning"
10:30:45 James: if there's a tool that allows you to show the latest results at test-directory resolution, then we don't telecast a competitive vanity metric
10:30:55 q+ boazsender
10:31:30 ack dcooney
10:31:47 dcooney agrees that the front end shouldn't roll up the test results
10:32:16 dcooney: what I want at my desk is: here is the thing that will increase compat the most today.
10:34:21 a second vector for this central system of test passing is API consumption telemetry
10:35:50 dcooney: if we want to reduce developer pain, moving testing into a client-side js webdriver lib would help this.
10:36:09 zakim, q?
10:36:09 I see MikeSmith, boazsender on the speaker queue
10:36:28 ack MikeSmith
10:36:28 MikeSmith, you wanted to comment
10:36:39 james: it should be easier to get new features into webdriver now
10:37:16 MikeSmith: we've been talking about the manual testing issue, we know how to solve it, but we don't have the budget to pay people to do it at the W3C, so the next step is solving this.
10:37:56 ACTION: mike to follow up with interested parties to fund work on webdriver tests
10:38:55 sam: it seems like there's basic work we could do to try this and make a strawman.
10:39:16 MikeSmith: I admire that as a strategy, but I think a more systematic stragey would work better.
10:39:31 MikeSmith: I think return on investment should be high here.
10:39:54 s/stragey/strategy/
10:40:41 Rick: I think we should invest in the central testing dashboard first.
10:40:41 reftest: 483
10:40:41 stub: 50
10:40:41 wdspec: 2
10:40:41 testharness: 7177
10:40:42 manual: 888
10:41:17 note that there is "manual" and manual.
Some are semi-automated...
10:41:32 Rick: we have a team at google thinking of prototyping a dashboard like this.
10:42:23 TOPIC: MVP reqs for the central WPT dashboard
10:42:40 https://docs.google.com/document/d/1ehJAwL03NM-ptSbw7rDUo_GkiaZBrxanMrvjjqtuqis/edit
10:43:58 q?
10:50:40 q- boazsender
10:57:43 q+ to suggest the relation to bug reports.
11:00:58 ack david_wood
11:00:58 david_wood, you wanted to suggest the relation to bug reports.
11:01:35 q+
11:06:48 Domic: web devs would submit tests if there was a jsfiddle-like interface
11:09:02 ack zcorpan_
11:11:11 q+ jgraham
11:11:26 q+ to talk about metadata
11:11:26 q+ boazsender
11:11:37 ack jgraham
11:11:52 dominic wants to write a tool to crawl metadata of tests and append results to margins of specs
11:12:01 s/dominic/Domenic/
11:12:03 james: we've tried this and it didn't work
11:12:46 q+
11:12:55 Present+ DomenicDenicola
11:13:45 It works somewhat on https://xhr.spec.whatwg.org/
11:13:54 But it's a lot of work to maintain
11:15:02 present- Sam Weinig
11:15:03 q?
11:15:05 ack ShaneM
11:15:05 ShaneM, you wanted to talk about metadata
11:15:10 q- boazsender
11:15:13 q+ to say something about the CSS test harness dying
11:15:39 james: maybe the api for this should be from the IDL side in the specs
11:16:26 Ms2ger, what works on https://xhr.spec.whatwg.org/ I don't see anything in the margins or links to tests
11:17:09 Click on "Add links to tests from requirements (beta)"
11:17:30 ack gsnedders
11:17:30 gsnedders, you wanted to say something about the CSS test harness dying
11:17:31 q- gsnedders
11:17:48 There's also some prior art from Philip`
11:18:17 Ms2ger: any link anywhere?
11:18:54 dcooney: I have an issue when a spec has an algorithm with 20 steps or something, and want to know where the test is for step n.
11:19:39 q?
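For context on the metadata discussion: csswg-test files already carry a <link rel="help"> pointing at the spec section they cover, often with a <meta name="assert"> summarizing the requirement, and that is the kind of hook a crawler like the one Domenic describes could key on to attach tests and results to spec sections. A minimal sketch, with an illustrative (made-up) spec URL and assertion text:

<!DOCTYPE html>
<meta charset="utf-8">
<title>Sketch: test metadata that a spec-annotation crawler could consume</title>
<!-- Illustrative spec section URL; a crawler could group tests and results by this link -->
<link rel="help" href="https://drafts.csswg.org/css-example-1/#example-section">
<meta name="assert" content="One-sentence statement of the requirement this test checks.">
<!-- test body (testharness.js script or reftest content) would follow -->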
11:19:44 ack dcooney
11:20:59 q+ to suggest we follow up the conversation on public-test-infra but from the high-level goals that rbyers described
11:21:42 https://lists.w3.org/Archives/Public/public-test-infra/
11:22:16 ack MikeSmith
11:22:16 MikeSmith, you wanted to suggest we follow up the conversation on public-test-infra but from the high-level goals that rbyers described
11:22:53 ACTION: MikeSmith to send dashboard outline to public-test-infra for feedback
11:29:31 rick: proposing a new "Defacto" directory for tests for features with no specs, e.g. all of hit-testing
11:30:00 james: no, I'd want a hit-testing dir
11:30:10 there is precedent for tests without specs
11:31:33 undecided
11:32:06 breaking for lunch
12:43:57 http://test.csswg.org/suites/geometry-1_dev/nightly-unstable/html/chapter-6.htm
12:44:01 HELLO
12:44:02 thx
13:05:00 http://www.w3.org/2016/09/22-testing-minutes.html
13:05:18 csswg-test:
13:05:20 testharness: 439
13:05:20 reftest: 10851
13:05:20 wdspec: 0
13:05:20 manual: 2240
13:05:23 stub: 0
13:05:25 visual: 3535
13:09:25 https://csswg-test.org/
13:10:01 scribenick: zcorpan
13:10:07 including https://csswg-test.org/submissions/ where PR branches are mirrored
13:10:25 C:\Windows\System32\drivers\etc
13:10:41 Topic: CSS WG merger into WPT
13:10:44 Topic: merging csswg-test and web-paltform-tests
13:11:13 s/Topic: CSS WG merger into WPT/
13:11:33 s/paltform/platform/
13:11:48 gsnedders: the basic status quo is that most of the policies that made it hard to run the css testsuite in the same way as wpt
13:11:51 gsnedders: are now gone
13:12:00 gsnedders: most of the policies are now the same between the two
13:12:10 gsnedders: some things are still different because of the css build system
13:12:22 gsnedders: we have agreement to get rid of the build system
13:12:41 gsnedders: the biggest thing that ppl are waiting on is a replacement of the css test harness which shows results across browsers
13:13:07 gsnedders: most of the work has been about wpt-tools, like manifest generation
13:13:15 gsnedders: to create an accurate list of tests in the csswg-test repo
13:13:31 gsnedders: that's still a PR, not finished. 99.9% done, 99.9% left to do
13:13:46 boazsender: what's the PR?
13:13:47 https://github.com/w3c/wpt-tools/pull/90
13:14:10 gsnedders: the comments aren't quite up to date
13:14:35 gsnedders: teh current css testsuite build system doesn't build the test unless it has a for something that Shepherd knows about
13:14:42 gsnedders: lots of such tests
13:15:03 s/teh current/the current/
13:15:12 gsnedders: once that PR is merged it should be possible to make manifests in the same way for the two repos
13:15:29 gsnedders: and to run the server and the lint tool
13:15:35 gsnedders: all the tooling should work
13:16:27 gsnedders: the build system is the only thing that currently works
13:16:55 gsnedders: the big thing is reaching the point where css is ready to get rid of the build system
13:16:59 s/all the tooling should work/all of the tooling is likely currently broken except the build system
13:17:07 gsnedders: so having results displayed for CR exit criteria
13:17:16 gsnedders: once that is done it should be easy to merge the two repos
13:17:47 JohnJansen: do they have the directory structure?
13:17:49 gsnedders: yes
13:18:04 JohnJansen: we make each folder into a virtual server
13:18:15 JohnJansen: we don't want to have too many tests in one folder
13:18:22 gsnedders: only top-level?
13:18:27 JohnJansen: arbitrary
13:18:44 gsnedders: the output of the build system is not great
13:18:53 boazsender: the source files are separate
13:18:55 gsnedders: yeah
13:19:14 boazsender: how is html different? oh, subdirectories
13:19:14 I think it would be nice if there weren't more than two levels of depth just for test selection reasons (when using the manual interface)
13:19:34 JohnJansen: css puts them all into one dir because that's the tool output
13:20:00 boazsender: like visual tests, "is the box blue?"
13:20:01 that sounds relatively easy to fix FWIW
13:20:15 gsnedders: the number of visual tests vs reftests
13:20:25 testharness: 439
13:20:26 reftest: 10851
13:20:26 wdspec: 0
13:20:26 manual: 2240
13:20:27 stub: 0
13:20:28 visual: 3535
13:20:36 JohnJansen: my point is that if you get a failure you have to run it manually
13:20:46 boazsender: who runs them?
13:21:07 JohnJansen: it takes 2 days
13:21:41 JohnJansen: converted many of them to reftests
13:21:54 boazsender: what's the difference between manual and visual?
13:22:08 gsnedders: manual needs user interaction. visual is screenshottable
13:22:38 Ms2ger: yes
13:22:38 gsnedders: multiple visual tests with the same expected rendering can be compared together
13:22:46 boazsender: can they be automated?
13:22:55 JohnJansen: we have a bunch automated internally
13:23:09 JohnJansen: a tool that was created before webdriver existed
13:23:28 JohnJansen: a lot of tests are about scrolling etc
13:23:43 JohnJansen: a lot of the tests are not best practices today
13:24:00 gsnedders: 2 things. a fair number can be converted to reftests, which is better for everyone
13:24:17 gsnedders: visual tests get invalidated because font rendering changed etc :-(((
13:24:45 gsnedders: webkit and blink have infrastructure for visual but don't like it
13:25:01 present+
13:25:11 dcooney: blink does not like pixel test results. prefer reftests
13:25:44 boazsender: there are duplicate tests for some of this stuff in css, webkit, html, and so on
13:25:46 gsnedders: yeah
13:26:01 boazsender: webdriver... could it make sense to start over?
13:26:03 JohnJansen: almost
13:26:16 JohnJansen: the webkit layout tests can't be shared
13:26:28 JohnJansen: have proparitary ???
13:26:37 s/proparitary/proprietary/
13:27:01 boazsender: if the browsers moved to webdriver we could replace these tests with new/rewritten tests
13:27:37 gsnedders: servo is implementing stuff and working on testing with wpt
13:28:02 jgraham: using wpt infrastructure, it's good enough that it runs
13:28:29 jgraham: trying to create tests. historically a problem with csswg-test is that upstreaming is difficult because of metadata requirements, the weird review system, etc
13:28:46 jgraham: we couldn't implement the system in servo in a good way
13:29:25 jgraham: once they have the same rules as wpt it will be much simpler to upstream
13:29:45 jgraham: rewriting everything is a vast amount of work
13:29:52 jgraham: everything that is manual...
13:30:08 jgraham: manual tests need to be rewritten to use webdriver typically
13:30:18 jgraham: for visual tests, need to get rid of those
13:30:41 jgraham: could write infrastructure for screenshot tests but it would be crap because it's always crap
13:30:51 jgraham: don't run anything manual
13:31:19 jgraham: individual developers may do, but no Test Days running through css tests, nobody thinks that's useful
13:31:29 JohnJansen: also css3 stuff
13:31:34 gsnedders: tends to be shallow
13:32:09 s/tends to be shallow/tends to be far less shallow than 2.1/
13:32:18 boazsender: csswg chair asked if we could write testharness.js tests for css
13:32:28 jgraham: I understand. and I agree. But during test development or when re-running tests to ensure something is fixed...
13:32:36 boazsender: so they want to move in this direction
13:32:39 gsnedders: I'm well aware
13:32:51 gsnedders: convincing them to do the sensible thing
13:32:56 jgraham: so status...
13:33:18 gsnedders: visual tests can be split into 2 categories
13:33:26 gsnedders: the first should be converted to reftests
13:33:39 gsnedders: the second don't have a guaranteed rendering per spec, so we can't create a reference for them
13:33:46 gsnedders: dashed borders for instance
13:33:57 gsnedders: firefox is not consistent across display types
13:34:12 gsnedders: display:table-cell is different from display:block
13:34:16 boazsender: on purpose?
13:34:23 gsnedders: dunno, legacy maybe
13:34:32 miketaylr: my blog relies on that
13:34:37 lol
13:35:13 gsnedders: we should convince csswg to define how to render these things
13:35:22 dcooney: should be a small set of things
13:35:47 gsnedders: dashed borders, dotted borders, outline, some other border styles... not much else
13:36:16 dcooney: could be impl-specific, like UA style
13:36:30 zcorpan: the HTML spec has a required UA stylesheet
13:36:38 zcorpan: it's normative for browsers
13:36:47 gsnedders: some things are not expressible in CSS
13:37:01 dcooney: may not be a big issue
13:37:21 gsnedders: it's not. there are some visual tests we can't get rid of in a cross-browser way, but we can get rid of the majority
13:37:51 gsnedders: the big thing is test results, which blocks getting rid of the build system and hence moving to web-platform-tests
13:38:23 http://test.csswg.org/harness/
13:38:33 gsnedders: test.csswg.org/harness
13:38:53 gsnedders: click on review results
13:38:59 e.g., http://test.csswg.org/harness/results/compositing-1_dev/grouped/
13:39:17 boazsender: can be improved upon ^_^
13:39:28 JohnJansen: it's slow
13:39:45 gsnedders: they want some visualization like this of results
13:40:03 boazsender: how do you flip the switch?
13:40:12 gsnedders: not a problem
13:40:17 jgraham: the build system does a couple of things
13:40:37 jgraham: it goes through and generates metadata that can be consumed by things
13:40:59 jgraham: you also write a test in one format and it generates HTML/XHTML/print/other versions
13:41:12 boazsender: are people using the different formats now?
13:41:17 jgraham: seemed like a good idea at the time
13:41:28 gsnedders: back in the day print UAs didn't support HTML
13:41:32 boazsender: that can go away?
13:41:37 jgraham: it has to go away
13:41:54 jgraham: we won't import different versions
13:42:56 jgraham: so anything to discuss about that? (the merge)
13:43:07 jgraham: if someone(TM) will build the dashboard
13:43:09 gsnedders: me
13:43:29 boazsender: this could be separate
13:43:56 JohnJansen: this was crowdsourced because nobody would run the tests except My Guy
13:44:05 JohnJansen: he didn't want to share the results
13:44:18 http://test.csswg.org/harness/details/compositing-1_dev/blending_in_a_group_with_opacity/engine/gecko/
13:44:31 JohnJansen: someone ran this in Edge, somebody said it passed, somebody else said it failed...
13:44:47 JohnJansen: ppl don't agree on a test passing or failing
13:45:23 zcorpan: had this problem at opera as well for visual tests
13:45:50 gsnedders: some of the differences in results can be explained by platform differences
13:46:08 boazsender: the visualization is difficult
13:46:32 JohnJansen: is anybody against you doing the port?
13:46:34 gsnedders: no
13:46:52 gsnedders: may be discussion about some particular feature, but in principle no
13:47:01 gsnedders: want to do this next quarter
13:47:34 jgraham: is there a fallback position? merging the repos but allowing existing tooling to continue working?
13:47:51 gsnedders: every file name is unique...
13:48:09 gsnedders: a harder one is that support files have to be in a support directory...
13:48:17 jgraham: can deal with that in the manifest
13:48:50 gsnedders: more difficult cases with tests referring to files elsewhere
13:48:59 jgraham: how do they enforce that now?
13:49:01 gsnedders: HAHAHA
13:49:46 gsnedders: lints for the css directory... would you rather merge sooner and then work on the build system?
13:49:51 jgraham: yeah, my preference
13:50:07 gsnedders: ok
13:50:15 jgraham: there's so much benefit to merging
13:50:41 jgraham: if you wait on the dashboard then we're blocked for months for it to be perfect with this and that feature
13:51:00 jgraham: if you just do it and then work on a replacement for the dashboard, we're not delayed
13:51:15 jgraham: could do a one-off mapping from old path to new path etc
13:51:39 jgraham: these tests are the worst. this one requires knowledge of PorterDuff operations -_-
13:51:50 JohnJansen: they're all terrible
13:51:51 http://test.csswg.org/harness/test/compositing-1_dev/single/line-with-svg-background/format/html5/
13:51:57 ;_;
13:52:48 boazsender: where's the pass condition?
13:52:53 jgraham: view source
13:53:28 gsnedders: this one is from test the web forward
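The discussion above keeps coming back to converting visual and manual tests to reftests. For readers unfamiliar with the format, a minimal sketch of a web-platform-tests style reftest pair follows; <link rel="match"> is the real linkage convention, but the file names and the trivial green square are made up, and in a real test the reference would reach the same rendering through a different code path than the feature under test:

green-square.html (the test):

<!DOCTYPE html>
<title>Sketch: reftest for some feature that should render a green square</title>
<link rel="match" href="green-square-ref.html">
<style>
  /* The feature under test would go here; a plain green square keeps the sketch simple. */
  div { width: 100px; height: 100px; background: green; }
</style>
<p>Test passes if there is a green square below.</p>
<div></div>

green-square-ref.html (the reference):

<!DOCTYPE html>
<title>Reference: green square</title>
<style>
  div { width: 100px; height: 100px; background: green; }
</style>
<p>Test passes if there is a green square below.</p>
<div></div>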
14:34:23 Topic: SVG
14:34:31 present+ nikos
14:34:46 nikos: there was an old plan for developing an SVG testsuite
14:34:57 nikos: we're going to host in wpt but still use Shepherd with metadata etc
14:35:14 nikos: we were planning on importing SVG 1.0 tests
14:35:25 nikos: lots of text has changed between the versions
14:35:36 nikos: looking for feedback and suggestions on what we should do
14:36:30 clarification: svg looking for advice
14:36:51 nikos: we've gotten advice on github
14:36:59 nikos: the old plan was old-school
14:37:14 https://github.com/w3c/svgwg/issues/269
14:37:16 nikos: Shepherd is out obviously
14:37:32 nikos: we have a requirement that the testsuite needs to be useful for non-browser implementations
14:37:47 nikos: reftests need to be separate from script tests
14:37:53 gsnedders: that can be done via the manifest
14:38:05 nikos: ok, as long as we can separate for one impl that's fine
14:38:25 jgraham: the idea for html is that we would take every section in the spec up to 3 levels deep and create a directory
14:38:36 jgraham: and then fill in tests where we didn't have coverage
14:38:42 jgraham: didn't really help enough
14:38:47 jgraham: having some structure is good
14:38:55 jgraham: using IDs in the spec is of less value
14:39:01 nikos: things move around...
14:39:09 jgraham: not always understandable names
14:39:19 jgraham: pick a structure that makes sense to you according to the structure of the spec
14:39:26 jgraham: but don't worry too much about it
14:39:37 jgraham: don't dump all tests into one dir
14:40:25 zcorpan: if you don't want to use that, then the last dir should match up to an id in the spec
14:40:49 nikos: with reftests it's obvious what passes and what fails
14:40:56 jgraham: write something that's easy to write a ref for
14:41:10 jgraham: if you're testing text rendering it doesn't have to be a square...
14:41:21 jgraham: generally do something sensible
14:41:41 zcorpan: you want to share references as much as possible
14:42:04 nikos: idea to use a PNG, e.g. for a rectangle
14:42:22 jgraham: should be a visual test, but minimize such tests as much as possible
14:42:50 JohnJansen: the fundamental stuff you don't need to test; you'll find out if it's broken anyway
14:43:08 nikos: approving tests in the repo, can people be given access to do that?
14:43:30 gsnedders: file PRs, if they don't do stupid things we give them push access
14:43:53 nikos: do SVG WG people approve SVG tests?
14:44:08 gsnedders: vague rules. if you think you're competent to review, go ahead and do it
14:44:29 gsnedders: PRs lie around for months...
14:44:44 gsnedders: until you get into complex SVG stuff people will be able to review
14:45:00 JohnJansen: I thought more people would be interested in reviewing
14:45:33 nikos: trying to recruit people into doing testing
14:45:47 nikos: people dropped off when we got closer to CR
14:46:17 nikos: any recommendations?
14:46:28 JohnJansen: be open if you're being told you're doing it wrong
14:46:46 nikos: I'm trying to gather information...
14:47:35 JohnJansen: we've discovered the hard way that if you hire people to write tests you get low-quality tests, which is more costly in the long run than just writing good tests yourself from the start
14:48:03 JohnJansen: test the web forward is an example, we got people in a room to write tests...
14:48:23 MikeSmith: as for culture recommendations: we spend time in the IRC channel and do very little other interaction
14:48:32 MikeSmith: if someone will contribute, join IRC
14:48:44 MikeSmith: we have good time coverage across timezones
14:50:07 zcorpan: a lot of things can be tested with testharness.js that you might not think are suitable
14:50:12 ... like, a lot of rendering stuff
14:50:26 ... if you prioritize UAs that support CSS and JavaScript
14:50:41 nikos: Inkscape... do not not to the details
14:50:48 scribenick: MikeSmith
14:50:54 boazsender: bet it's not JavaScript
14:51:04 nikos: it's not, yeah
14:51:48 JohnJansen: in some cases you might want to run all your tests in your desktop browser and mobile
14:52:05 ... and you may find that you have never found a bug that was in one place and not the other
14:52:33 ... so the question is if Inkscape is different enough to merit running the tests in it vs running just in a browser
14:52:45 nikos: it is different enough
14:52:57 zcorpan: so, what kind of tests would work in Inkscape
14:53:06 nikos: reftests
14:53:47 nikos: I need to talk to Tav (Inkscape rep) to find out more
14:56:39 RRSAgent, stop
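Tying together nikos's requirements (reftests preferred, script-free where possible, links back to the spec), here is a rough sketch of what an SVG rendering reftest could look like once hosted alongside wpt; the rel="match" and rel="help" conventions are the existing wpt/csswg ones, while the specific spec URL, file names, and the choice of drawing the reference with a path instead of a rect are only illustrative:

svg-rect.html (the test):

<!DOCTYPE html>
<title>Sketch: rect element renders as a filled square</title>
<link rel="help" href="https://svgwg.org/svg2-draft/shapes.html#RectElement">
<link rel="match" href="svg-rect-ref.html">
<p>Test passes if there is a green square below.</p>
<svg width="100" height="100">
  <rect width="100" height="100" fill="green"/>
</svg>

svg-rect-ref.html (the reference):

<!DOCTYPE html>
<title>Reference: green square drawn with a path</title>
<p>Test passes if there is a green square below.</p>
<svg width="100" height="100">
  <path d="M 0 0 H 100 V 100 H 0 Z" fill="green"/>
</svg>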