W3C

- DRAFT -

TPAC 2017 WPT (web-platform-tests) F2F

07 Nov 2017

Agenda

Attendees

Present
zcorpan
Regrets
Chair
SV_MEETING_CHAIR
Scribe
boazsender, wilhelm, wilhelm_, jeffcarp, zcorpan


<boazsender> :wave:

<gsnedders> RRSAgent: make logs public

<foolip> We're now using this channel to take notes for the TPAC meeting

<boazsender> https://www.w3.org/wiki/TPAC/2017/Testing

<miketaylr> sup

<gsnedders> ScribeNick: boazsender

<miketaylr> boazsender: my issue is that i can't log into the wiki

Intros

Foolip: I work for Google in Sweden. I lead the Ecosystem Infra team; we're trying to make the web platform more predictable. I want to figure out what to do this quarter for wpt.fyi, changing the metrics to incentivize interop. This data is already out there, but the presentation affects behaviour.

kereliuk: I work at Google on ChromeDriver and how it relates to wpt. I want to talk about figuring out what we can and can't test with the current tooling.

jeffcarp: I work on chrome ops. I work on wpt.fyi. Today I want to understand the needs of wpt.fyi.

johnjansen: I work at Microsoft on Microsoft WebDriver, and on infra for wpt consumption.

scottlow: I work at Microsoft as an interop PM. Interested in learning more about wpt.fyi and getting plugged in.

clmartin: I'm a program manager at Microsoft working on WebDriver, dev tools, and internal engineering on test infra.

<miketaylr> boazsender: I'm from Bocoup. I have a lot of interest in this meeting.

<miketaylr> boazsender: I'd like to add to the agenda: talk about test discoverability, and the workflow for figuring out where tests should go

<miketaylr> boazsender: test coverage analysis

miketaylr: I work at mozilla on the compat team. I'm here because I think tests are great. I contribute to this project ~4hrs every year. I'm here for emotional support.

twisniewski: I'm here to be a fly on the wall and to get to know everyone intimately, so we can get everyone's tests out of their trees and into wpt.

rbyers: I manage the web platform team in Waterloo for Google. I want developers to stop saying that the web is hard to build for. The main thing I want to get out of today is: what are the biggest bang-for-the-buck things we can do on the Chrome team such that interop is a natural outgrowth of the engineering discipline we apply to our work?

hexcles: I'm working on chromium import/export for google. I am interested in wpt.fyi. and interested in talking to people about import/export.

gsnedders: I'm a Google contractor. I want the web to become interoperable by the end of the meeting.

automatedtester: I work at Mozilla. I would like to learn how Mozilla can contribute better. I have a similar goal to Rick's of making web developers' lives better. I see web compat breakage as a bug of the web.

ato: I also work for Mozilla. I worked on Marionette and WebDriver and geckodriver. My primary goal today is how to expose a privileged API to help with testing, or whether we can instrument the web browser to automate these test cases that are hard to automate. What we discuss on this will have a direct impact on BTT.

jgraham: I work at Mozilla. I have done lots of the infra for wpt for a while, and also some internal Mozilla stuff like our two-way sync. I would like to know what, concretely, we should be concentrating on in the next year to achieve better interop.

Wilhelm: I am wilhelm. I used to work at opera. I am here lurking.

foolip: it looks like there is no one from apple here.

rbyers: it's worth pointing out that we have two people at Google working on WebKit (focused on iOS)

jgraham: Igalia is also contributing to WebKit.

<gsnedders> rniwa and youenn are both apparently coming according to who registered

miketaylr: at mozilla we are the compat team. we look at broken websites. I think testing and promoting interop through wpt is the future.

ato: this is how we shift this from reactive to proactive.

twisniewski: if anyone has any questions about our broken website logging, we're here and we're always on IRC.

foolip: let's try to sort out the agenda.

rbyers: can we add to the agenda an update from each browser?

jgraham: +1

foolip: can we front load the 2018 conversation?

rbyers: how do we measure our progress in 2018?

(The WebKit folks enter.)

youenn: I'm youenn, working on WebKit. I work on service workers, among other things. I work on wpt on the weekend.

miketaylr: can you change the name of edgeHTML?

clmartin: no
... I have an agenda item: what's everyone's policy for contributing tests back?

scottlow: new agenda item: what are people's thoughts on how these tests map back to real-world use cases? How do we clean up the tests so they catch real-world problems?

johnjansen: this is about mapping tests to real world problems.

foolip: so failing tests?

johnjansen: I think that fixing those wouldn't move the needle.

jgraham: maybe add this to the agenda.

twisniewski: also proposing clean up.

jgraham: I think we should add this as an agenda item. can we push goals to the end?

youenn: related to that, there is some concern that there may be duplicated tests between WebKit's in-tree LayoutTests and wpt, and that this may be slow. We would like to improve this.

we also have a breakout tomorrow

rbyers: we'd like to take the consensus of today to the break out tomorrow

foolip: I'm giving a talk about this tomorrow, and the breakout can be an opportunity for questions on that

jgraham: I suggest we start with the status updates

<foolip> https://bit.ly/hackfest-wpt is a talk from which I'll be pasting some links

foolip: so, Ecosystem Infra. The predictability effort in Chromium started two BlinkOns ago (1.5 years); Ecosystem Infra is a spinoff of that which focuses mostly on testing.

rbyers: we've done two things: a cross-cutting effort to give advice to every Blink engineer, plus a core team working on infra.

foolip: the first thing we did was to work on two-way sync, to make sure that Chromium engineers could make changes and land them in wpt. jeffcarp did that and it's working.
... I measured the contributions from Chromium for the year before and the year since, and got a 220% increase. This is because of two-way sync.
... people are using it.
... the other half of export is frequent import. We're always importing (a few times per day)

<MikeSmith> gsnedders, how many more you need?

<gsnedders> MikeSmith: currently we're just at 100%, not yet over 100%

foolip: we import them and run them through the CQ (commit queue), and we're working on a system for discovering regressions.
... then we started working on the wpt.fyi dashboard. Our goal is to run it for every commit, for as many configurations as we need (16), and do so in 50 minutes. We're not there yet, but we're on our way.

<gsnedders> MikeSmith: thx!

foolip: when we're making changes to wpt, we want to know that we're not breaking anything. James built a stability checker there, and another part we're trying to sort out now is how to tell why a test is flaky.
... inside of Blink again, we've started asking for wpt tests as part of intent-to-ship. It's not required, but you have to show us why a test is not a wpt test if it isn't one. Just asking makes people more likely to write a test as a wpt test.

<rbyers> https://web-confluence.appspot.com

foolip: we also did the web API confluence dashboard. What that does is look at the window object in different browsers and describe what's there, and then use that to help prioritize compat work for browser engineers. It gives you some things that wpt doesn't.

<rbyers> https://foolip.github.io/day-to-day/

<foolip> https://github.com/foolip/testing-in-standards/blob/master/lifecycle.md

<foolip> https://github.com/foolip/testing-in-standards/blob/master/policy.md

foolip: I've been working on getting testing to be an integrated part of the standards process. I recently rediscovered that rbyers was the first to do this, for pointer events.
... recently we started, in the WHATWG, to require a test PR with each spec PR. This worked better than we thought it would.
... the editors doing the work say they are more productive, and we are finding issues we didn't expect to find.

<foolip> https://foolip.github.io/ecosystem-infra-stats/testpolicy.html

foolip: on this graph that I just pasted, I am counting the number of specs that have this policy that we have in the WHATWG, of tests blocking spec PRs, in CR+.
... there are about 200 specs in the world that matter. We are past 100, and they are the most active 100, I think.
... so we can say that a lot more than 50% of the work is happening this way.
... greg said that all the other WGs besides CSS that Microsoft is involved in are doing things this way.
... I just got WebRTC on board, and web sec is coming. CSS is more tricky.

johnjansen: I don't...
... I know we have agreement from each of the WGs we work with to do this.

foolip: this is step one of 4 steps in my master plan.

wilhelm: plh is making this a soft requirement for new groups.

<foolip> the 4 steps are in https://github.com/whatwg/html/issues/1849

rbyers: we're going to keep ratcheting up the pressure in Chromium. Once we have the automation going, we'll eliminate all unit tests that are not wpt tests.

foolip: automation is a big deal. We just landed the bits that allow tests to click themselves. The model is that wpt is already controlled by WebDriver; the tests will just talk to WebDriver and ask it to do things.
... that's all wrapped up in a testdriver.click() API, which returns a promise.
... we're talking to Second Screen, and to Permissions about mocking devices.
... I want to have a solution for all automation problems for Chromium engineers.
... this may not be possible for Bluetooth.
... we're doing some work right now to figure out what all the manual tests are.
... we're working with teams to find their pain points. We tidied up the CSS tests.
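
As a concrete illustration (a sketch added to these minutes, not something presented at the meeting), a wpt test using this API looks roughly like the following; the script includes are the standard testharness.js/testdriver.js ones:

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Minimal test_driver.click() example</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script src="/resources/testdriver.js"></script>
<script src="/resources/testdriver-vendor.js"></script>
<button id="target">click me</button>
<script>
promise_test(() => {
  const button = document.getElementById("target");
  // Resolves when the button receives a real (trusted) click event.
  const clicked = new Promise(resolve => {
    button.addEventListener("click", resolve, {once: true});
  });
  // test_driver.click() asks the harness (ultimately WebDriver) to
  // synthesize a click on the element; it returns a promise.
  return test_driver.click(button).then(() => clicked);
}, "test_driver.click() dispatches a trusted click");
</script>
```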

<dom> Accelerated Shape Detection in Images

<foolip> 2018 OKRs: https://docs.google.com/document/d/1hCOWgU95pN0LaZLdkoJyjlS9A5sjyQQWW6gh8YaDF8o/edit?usp=sharing

foolip: our three 2018 objectives so far are: 1) the web platform is coherent and web developers are happier.

rbyers: we should add that we did a survey whose results we can't make public, but it was clear the biggest problem was interop.

foolip: 2) the web platform has high-quality shared test suites; everything about quality, automation, dedupe, etc. falls under this. Basically, in 2018 I want to delete all LayoutTests.
... 3) is that shared tests are a first-class citizen in Chromium.

clmartin: what do you think about users versus web developers?

foolip: it's similar to the relationship between engine devs and web devs.
... the value has to do with users being happier.

jgraham: I think that talking about users is more difficult.
... e.g. they might not like the web because of 5MB of tracking on a page, and that's not something we can fix.
... what we can fix is that the websites they visit don't work in certain browsers.
... we can give them choice over browsers.

zcorpan: another thing that affects users, and that I think you can measure, is when websites block a certain browser. That's a real problem that all of us have experienced, and interop reduces that problem.

jgraham: so that's a special case of "the site doesn't work".

johnjansen: from a user perspective, this is something we've discovered being a new browser in the market. A lot of the time we'll see that websites won't even look at Edge; they'll say it doesn't work in Edge, but it actually does.

rbyers: I think this still follows for interop. At Google our products do this; they think interop is such a problem that of course they have to test for it.

johnjansen: when you go into the js engine or networking stack, there are differences.

rbyers: also, we don't have a user goal, because we don't have an interop problem.
... but I think that FF and Edge should measure this.

kereliuk: the further down you go through the pipeline from users, to web devs, to browser devs, the harder and harder it is to measure.

johnjansen: we used to not have an interop problem either; that is worth keeping in mind. We had 98% market share. Google does have an interop problem, but you don't notice because you have such market share; that's where we were. At Microsoft, we did a hard turn (180 degrees on a cruise liner). It came down from up top: we decided to build a new browser, forked Trident, started implementing Chrome prefixes, and the primary goal was to build an interoperable engine.

<rachelandrew> this has had a review, but I think someone with write access needs to merge it https://github.com/w3c/web-platform-tests/pull/7364#pullrequestreview-74849500

johnjansen: in Edge, we use interop and compat interchangeably.
... we measure interop with a team of people really using the internet in all browsers and scoring their experience on 100 websites (a rolling 100). We say we should always be within 1% of success, and then we also measure Firefox. If we experience a greater-than-1% slip, a bug gets filed.
... we did A/B testing (Edge engine in IE, IE engine in Edge) to see how we were doing on this, and after a year we passed this test.
... so we had to build internal infra to run wpt, because we are tied to our OS and IE, and our infra predates the wpt infra.
... we also have legacy XML material.
... we have a fork of wpt; we run it on 4 branches of Edge. It takes about 4 hours.
... you can look at how each branch is doing against public releases, etc.

<gitbot> [13web-platform-tests] 15tomalec opened pull request #8096: Remove tomalec from url/OWNERS (06master...06remove-tomalec-from-url) 02https://github.com/w3c/web-platform-tests/pull/8096

johnjansen: we have an analyzer that ignores tests known to fail and prioritizes sudden regressions.
... we're trying to get our team to push to our internal GitHub server. This is not happening, because our internal DRTs (developer regression tests) are easier for our devs to write.
... we're working to convert these tests, and making better tools.
... I am able to represent the value to our team: 1) the tests already exist; 2) I'm running code coverage analysis.
... it's tough because we have a lot of history.

rbyers: how will that two step thing work?

johnjansen: we haven't gotten to this yet.

jgraham: if this is some source transformation thing, will the resulting tests be super weird?

clmartin: we converted our DRTs to the wpt harness

jgraham: let's assume that we find a way to have internal reviews auto-merge; if someone then fixes a bug, would you see it?

johnjansen: we're going to do a two-way sync.
... we plan to delete a DRT when we get it into wpt.
... the interesting thing we just discovered through code coverage is that we are getting only marginally more out of DRTs on top of wpt.
... I'm also looking at bugs logged for this, and I'm finding way more coming from wpt that we had missed.
... we are very close to having VP approval for changing individual contributor behavior.

foolip: have you made the wpt path maximally easy?

johnjansen: no.
... it's easy. The problem we have is a strongly templated approach for DRTs; we need to replicate it for wpt.

jgraham: sounds like there could be a feature request for wpt.

johnjansen: yes

rbyers: what about cultural/legal issues with the public nature of this?

clmartin: I think you see fewer MS engineers talking on wpt because our engineers are not working in it
... there is some fear, but it's changing.

johnjansen: there is also cultural baggage that weighs on us from the past.

tea break

<zcorpan> ScribeNick: wilhelm

<wilhelm_> jgraham: Mozilla are running wpt, more or less. We have two-way sync, which we've had for a couple of years

<boazsender> ScribeNick: wilhelm_

jgraham: Not as automated as the blink system, working on finalizing automation

jgraham: We are working on making our two-way sync more rapid, because that is hopefully going to improve the feedback that devs get, prevent conflicts, make the experience nicer
... I don't have statistics on how many tests we're writing, but we have some of the problems that other people have mentioned
... Still a culture of not always writing wpt
... Difficulty learning a new system; some tests can't be supported. Some of that is solved by click automation

<foolip> https://bit.ly/ecosystem-infra-stats has some stats about contribution origins, and usage for Chromium.

jgraham: I don't know the ergonomics of that vs. Gecko-specific tests. It's easier to write a single-browser test.
... Implemented stability testing upstream
... Internal work on stability, for other test types
... Internal infrastructure will catch some problems before we upstream
... We want to improve: tell people when the spec changes or tests start failing
... We want to surface that more obviously. When there is an import, you'll see a bug somewhere that tests started failing
... We can be more responsive to that kind of thing; the dashboard is helpful
... Discussion on dashboard for each PR
... "We started failing, this all works in Edge"
... We don't run wpt on Android

foolip: Neither do we

<ato> jgraham: (Would be nice if we hooked that up to mozreview/the new review tool.)

jgraham: (Describing legacy systems blocking some progress)

foolip: What is the strategy for wpt default?

jgraham: I'm less confident that we can remove browser-specific tests
... I don't know how to test: schedule a GC at a certain time in a cross-browser way
... We can't write a testing API for such things

foolip: I don't think we can delete layout tests, but we can move 80% of them. A clean separation of web platform vs. internal tests exists for a good reason

jgraham: If you write a non-wpt test for web platform things, a justification is needed
... we have also been doing work directly on upstream, more impactful. Test stability
... wptrun makes it easier to run, unmaintained testrunner being phased out
... Goal of making upstream ergonomic

<boazsender> Nell from microsoft and the web vr spec editor just joined us

<gitbot> [13web-platform-tests] 15gsnedders closed pull request #8097: The descriptive text was missing the word "filled" with respect to the reference file (06master...06visibility) 02https://github.com/w3c/web-platform-tests/pull/8097

clmartin: We have an XML file that has an inventory, looking at time budgeting. "You have this much time to run your legacy tests"

AutomatedTester: On the flight over I looked at one direction. We do have a lot of duplication, a lot of coverage in mochitests. Look at what is usable, migrate across. A lot of work.
... We can't meaningfully automate it over
... The tests should be testing something in a meaningful way

AutomatedTester: SW tests have been imported, appreciated not having to write tests

jgraham: We grabbed the SW tests, fixed them, upstreamed them. Now this would not happen, Chrome developers upstream due to better process

AutomatedTester: From Mozilla point of view, we had a lot fewer mochitests, wpt helped
... Certain bits of WebRTC and the like will need specific browser harnesses

foolip: They are writing more wpt tests by default, requiring test changes for normative changes
... Other areas are a lot worse than WebRTC

boazsender: For new features, there is momentum for converging on shared test suites. For old tests, there is work to do: an analysis of duplication across vendor test directories

jgraham: I don't particularly care if we run the same tests twice

boazsender: two reasons: speed, and the inverse of coverage: it's easier to reason about if we have coverage

jgraham: From the wpt perspective, code coverage from mochitests is not relevant
... If you see coverage on browser specific tests but not wpt, that is a good candidate from wpt upstreaming

zcorpan: Cost of duplicate tests: maintenance on spec changes. Not only do we need to find the relevant tests, we're editing five different tests... Man-hours

jgraham: You get a spec change, the wpt test fails, you fix the code, you see the mochitest failed, you delete the mochitest

boazsender: That's awesome, and I think that establishing that for all four teams is the cultural challenge

jgraham: Cultural challenge: if this test fails, delete it
... "Mochitests run on android"
... Traditionally, we run everything on every platform. No real reason why we don't run it on Android, it's just work.

youenn: We are currently slowly importing every test suite, planning to import all the test suites for the features we are implementing
... When it is imported once, we refresh it every 2 weeks, by doing painful manual uploading to our infrastructure
... Sometimes we see flakiness, handled manually
... People are using wpt more and more, trying to fix implementation gaps based on this
... We also have wpt serving webkit-specific tests, so we can use all the wpt features
... It may be convenient to use testharness.js and friends; tests will be easier to upstream
... Now we have a script under review to create a wpt PR from a bugzilla patch
... Long-term plan: one review, on bugzilla

foolip: Blockers?

youenn: Allocate time to do these scripts, busy implementing features

rbyers: Reach out to us for help

youenn: If you are moving wpt tests into WebKit, future changes will get lost. Working to upstream more
... When we start modifying bugzilla and bots, we need to work with the community working on these bots
... Interested in the latest WebDriver things, to run the manual tests through wptrunner
... That's the plan
... Hopefully we can advertise these new features, start thinking about using testharness.js
... wpt is evolving a lot, which is great. Nobody in WebKit has time to see what's happening there... I discovered the WebDriver features by chance last week, and was very happy. Click test automation
... Need a way to advertise this to the WebKit community

foolip: Would an email summary of what we're doing each quarter be useful?

youenn: Yes
... Nobody in WebKit is following wpt closely; we notice when things start breaking

foolip: Another way for visibility: use priority labels

youenn: If there is adoption needed on the WebKit side, CC WebKit team members

foolip: Who is primary contact?

youenn: At the moment, me
... I'll try to push things internally
... I'm not in a position to delegate...

Roadblocks

foolip: Roadblocks on the implementor side blocking contributions to wpt

jgraham: We want to hear from MS, Webkit, about what problems they are encountering

JohnJansen: I don't think there is anything inherently problematic about upstreaming; we need a cultural shift, which is in progress
... It's down to prioritization

clmartin: General question about automatic upstreaming: should we do a GH PR?

ato: Idea is that if it is reviewed internally, just upstream

jgraham: Historically, the policy has been a linked review
... If we say no, that is a policy change

foolip: Not possible if internal review happens with code review

jgraham: Concern: if people submit crap, that is harmful to everyone
... If you link to a code review, you can document that it happened, audit it

boazsender: can you describe internal code review?

clmartin: In BSO + git, pull request
... CI

boazsender: Is there a policy for test review?

clmartin: Test review is per team currently, 100x test runs, catch flakiness

boazsender: A combined change of Edge policy + wpt policy could achieve the review goal

foolip: Do you change the source code in the same commit on the same review?

clmartin: Bundled together

foolip: You cannot make wpt first-class without keeping that
... How much information can we upstream? Name of reviewer?

jgraham: Original policy was a change from the CSS process
... In the spirit of the current policy: hypothetically, the original reviewer approves the PR in GitHub.
... Can't enforce meaningful comments

foolip: In the spirit of getting Edge contributions: PR, edge export: Who wrote the code, who reviewed it.

twisniewski: Value of review beyond a rubber stamp: what is it we want from the review? Annotation, a link to a spec for why the test exists.
... Even if there is not a person to reach out to, there is a reference to a spec

jgraham: We have deliberately removed this requirement; CSS has it
... It is not enforceable at the same time as we request upstreaming

twisniewski: We have a problem that we have tests not covered by the spec

gsnedders: Plenty of CSS tests with links, fail for unrelated reasons, not that useful

jgraham: From Edge POV, would this make your job easier?

JohnJansen: No
... Code review not blocking us from contributing. Linking to spec lines would hurt, make work harder

jgraham: You commit to your DRT tests, code review is not exposed. Proposed change: this person reviewed this

gsnedders: Do a second upstream review, second Edge person rubber-stamps

jgraham: That provides no value

JohnJansen: Model we try to build: we check in DRTs, written in a way to ease upstreaming, we submit pull request, do the review on actual PR
... Getting DRT easily upstreamable is our biggest challenge, not code reviews

jgraham: In Gecko and Chrome, it has been important that people can use existing workflows, with magic upstreaming

zcorpan: If there is an internal review, and that is useful to make public, maybe there could be tooling in your review to export individual comments

JohnJansen: WebDriver tests upstreamed first, because we wanted feedback from other vendors

jgraham: For controversial things this is good

ato: Good for spec changes + tests

zcorpan: In some cases you want to export individual comments from test reviews

JohnJansen: I don't know how to flag "this needs review from webkit"

jgraham: Labels

ato: Labels could block integration

<gsnedders> RRSAgent: make minutes

jgraham: maybe we should make policy more relaxed

clmartin: (describing policies for keeping track of known failures on import)

JohnJansen: Who would be on point for bad tests?
... Pretend there are hypothetical bad tests from one vendor. How do we follow up?

rbyers: That is the reason for current policy, to have a way to handle that

boazsender: Inbox workflow: to not discourage test authors, auto-landing, backlog of tests

ato: How is this different from PRs with second review?

jgraham: There is an information visibility problem, dashboard is part of this. If you add tests that fail everywhere, that is more visible

JohnJansen: Challenge: vendor sees crap tests from other vendor. Visible feedback

gsnedders: You should see the pull request..

jgraham: It should be possible to see this data from history

JohnJansen: This is not something to surface on the dashboard. If MS engineers submit crap tests, there should be a way for MS to see this, learn, escalate

jgraham: If there is a systemic problem from one source, people will notice and complain

JohnJansen: I want to make sure our code reviews are sufficient, with good quality tests

ato: Removing tests is different

boazsender: Is there a way for you to tell if bad tests are written in your org?

nell: Does some of the flakiness come from the difference between browser engines? May work fine in one browser, only becomes visible once upstreamed

jgraham: We have tooling to assess flakiness in major browsers
... Question about bad tests is not just flakiness, also wrong

gsnedders: Two categories of wrong tests: false negatives and false positives.

jgraham: If some test just passes, but is wrong, it's not a big problem. Irrelevant test?

boazsender: We want to find the blockers from Edge
... Establishing that workflow.
... Independently of wpt

jgraham: I see that there isn't a problem. We can adjust the wording of our policy, what you do internally we can't solve
... If you or another vendor submit bad tests, people will notice

foolip: You want to do a good thing, we can accept little information.
... Everything is a PR, problems are not unique to MS

jgraham: Let's talk about WebKit

boazsender: Long-term plan for DRT conversion

<foolip> agenda is at https://www.w3.org/wiki/TPAC/2017/Testing

youenn: Concerns about efficiency, github issue about making test running faster
... Caching, performance improvements would be good
... Easier ways to use tools like wpttestrunner, stability checks
... From an organizational POV, we are fine with how things are done. Reviewed in WebKit, no need for double review
... There is this fear that we have webkit-specific tests and people will not see why they are that way. A particular test may be redundant for another browser, but for us it is not

jgraham: Generic problem. Skeptical of deleting duplicates for this reason
... Maybe a comment "special case in WebKit" would help

foolip: Deduplication efforts must be very careful so as not to destroy existing value

jgraham: Hypothetical: code coverage test per test

foolip: We can do that!

boazsender: We can deduplicate WebKit, Chrome, Firefox

youenn: There are things that are difficult to handle: memory cache. Easy to break the tests. We'd keep those tests internal for now? Maybe just a comment

jgraham: For specific tests, we may need a mechanism for indicating that "this test tests something special in gecko, don't change it without review from Moz engineer"

foolip: We have comments on references to exact bug reports

twisniewski: Some duplicates are just extra tests, may also be useful for other vendors

jgraham: As long as it's not testing browser-specific internals..

foolip: Gecko roadblocks?

jgraham: Covered in status update, not policy things, technical issues that need to be solved, cultural stuff

foolip: For Chromium, automation seems most important
... Make all people who do standards treat that as a focus
... Make wpt the default working mode
... Make layout tests smaller, wpt bigger.
... Needs coordination with webkit

youenn: Issue with license of tests?

foolip: If we move layout tests without coordinating with webkit, problems will arise on webkit side

youenn: There are people working on these issues, can ping them

foolip: Maybe licensing thing as well..

youenn: More priority on getting the export things running

foolip: I asked the layout team; the big pain point was that the CSS test suite was a big mess. gsnedders fixed that.
... Duplication with WebKit... that will land on WebKit's side if we solve our other problems in the wrong way

youenn: I can try to increase priority on this

foolip: Deduplication must take Chromium + webkit into account at the same time

youenn: Automation: there may be some sharing of GS layer

<gsnedders> testdriver-vendor.js

jgraham: Will you have a way to detect for testdriver-vendor.js if you get a different result upstream, instead of when running in your infrastructure?

foolip: I want to solve this by getting content_shell into wpt

jgraham: I want to avoid situation where chromium engineers make things that depend on vendor-specific APIs

foolip: By noticing the differences, we can sort out most of that

boazsender: Can you explain that more?

jgraham: The automation (testdriver.js) works through a common implementation on top of WebDriver. Not all browsers can run that infrastructure, and WebDriver can be slow. Vendors can replace the click method with their internal test API
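
A rough sketch of what that replacement can look like (test_driver_internal is the hook testdriver.js exposes for this; the eventSender calls below are Blink/WebKit test-shell APIs, used here purely as an example of "an internal test API"):

```js
// testdriver-vendor.js: loaded after testdriver.js; it may override the
// default backend, whose methods are otherwise fulfilled externally by
// the WebDriver-based runner.
window.test_driver_internal.click = function(element) {
  if (!window.eventSender)
    return Promise.reject(new Error("no internal automation available"));
  const rect = element.getBoundingClientRect();
  // Drive the click through the engine's internal event injection.
  eventSender.mouseMoveTo(rect.left + rect.width / 2,
                          rect.top + rect.height / 2);
  eventSender.mouseDown();
  eventSender.mouseUp();
  return Promise.resolve();
};
```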

foolip: content_shell is what we use to run our tests; there are some differences

jgraham: If you are running a vendored version, there may be differences to the standard ways

boazsender: Why is it a good idea to write content_shell tests?

jgraham: They don't want to run the full version

foolip: Historical reasons, could be slower
... We're running everything in content_shell, getting more stuff upstream is a matter of finding out where to go..

jgraham: Testdriver automation is specifically made for this use case, with known risk
... "don't ship stuff in Chrome if it doesn't work in the standard ways.."

zcorpan: To prevent content_shell-specific things from leaking in, make content_shell-specific calls log when they happen
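
One way that could look (a sketch, assuming a content_shell-style window.internals object; everything here is illustrative): wrap the test-shell-only API in a Proxy so every use of it shows up in the logs.

```js
// Run before any test code: log, but still allow, every access to a
// test-shell-only API so accidental dependencies are easy to spot.
if (window.internals) {
  const real = window.internals;
  window.internals = new Proxy(real, {
    get(target, prop) {
      console.warn(`content_shell-specific API used: internals.${String(prop)}`);
      const value = Reflect.get(target, prop);
      // Bind methods so they are still invoked on the real object.
      return typeof value === "function" ? value.bind(target) : value;
    }
  });
}
```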

Lunch

\o/

Resuming at 13:30

<gsnedders> ScribeNick: jeffcarp

reconvening

boazsender: next item: Assessing the compat impact of tests.

Assessing the compat impact of tests

JohnJansen: when prioritizing bugs, EdgeHTML gives high priority to bugs on highly trafficked websites
... but we always fix bugs before triaging test failures (a catch-22)
... fixing the failure on a live site fixes that site, but might not cover other API invocations that would be covered in a test

boazsender: I can think of examples of tests written specifically for compat issues

JohnJansen: we see that with <table>s, which is why there's a rejuvenated effort to get <table>s spec'd

JohnJansen: we test interop for the spec, not interop for the web

everyone: what are WPT tests for? should they test the spec or test for the web?

jgraham: we are not focused on tests that test the spec, we hope people upload tests that test the web, even if it does not correspond to a line in the spec

boazsender: would it be useful for us to try to categorize these tests?

jgraham: we want to emphasize that testing to 100% of the spec is not the goal

boazsender: should we have a mission statement / shared set of goals?

foolip: it's a group of many test suites, they might have different goals

zcorpan: the WPT documentation is technical only right now, it doesn't mention organization/test writing goals

jgraham: many interop bugs are from using many web platform features in unison, where do you put those interop tests? which folder in WPT?

boazsender: would it make sense to have a new top level directory for this?

zcorpan: a new directory is the wrong way to do this
... better to have an out of band way of annotating a specific test (e.g. this is testing web compat problems)

jgraham: one potential approach is to add comments to wpt.fyi

clmartin: putting in the actual test data is better

tom: if we're making it precise and formalized, we'd need to validate it

clmartin: we want to use this to find tests that would improve interop

boazsender: what is that number? (of tests?)

JohnJansen: yes, we want sub-prioritization (not just "interop")
... if we fix 100% of failures in WPT, we don't know if that will actually help real web devs and users

jgraham: you can never tell that, you can tell that a specific test fixes a bug in gmail, e.g.
... a test could be not important at all today, but tomorrow FB could rely on it

zcorpan: it might be possible to find tests that exercise things websites are doing

<boazsender> here's an example of a mapping between a compat issue and a test: https://github.com/w3c/web-platform-tests/pull/8080

zcorpan: if we get a mapping between browser implementation and tests, we can run through httparchive and see which code paths the tests & the sites are exercising
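
A toy sketch of that mapping idea (entirely hypothetical: it assumes per-test and per-site code-path coverage data already exists in some comparable form):

```js
// testCoverage maps test name to the engine code paths it exercises;
// siteCoverage maps a crawled site to the paths it hits. Tests whose
// paths overlap real-site paths are the "compat-relevant" ones.
function compatRelevantTests(testCoverage, siteCoverage) {
  const hitBySites = new Set();
  for (const paths of Object.values(siteCoverage)) {
    for (const path of paths) hitBySites.add(path);
  }
  return Object.entries(testCoverage)
      .filter(([, paths]) => paths.some(path => hitBySites.has(path)))
      .map(([test]) => test);
}
```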

andreastt: which tests are important changes over time
... e.g. when FF transitioned from single to multi process, a different set of tests mattered

miketaylr: one of the difficulties is that websites serve diff. versions to diff. browsers

JohnJansen: we've done the crawling, but typically the browser isn't going to go down the bad code path because the web devs use browser detection
... it feels like an unsolvable problem
... this breaks the web

boazsender: recency (what tests most recently went from 0 to 3 passing) might be a good metric

JohnJansen: if only Edge fails a case, why would they upstream it if all other platforms pass?

boazsender: we're working on running tests on every PR and collecting the time series data

jeffcarp: we should talk about using the wpt.fyi archives too

(that last one wasn't boazsender - that was me)

Prioritisation of infra backlog, future infra work

gsnedders: we have 213 issues & PRs labeled "infra" in WPT
... is there anything infra-wise that's blocking people?

andreastt: what kind of breakage?

jgraham: Mozilla has hundreds of thousands of issues open; having issues open in and of itself isn't bad

foolip: ecosystem infra rotation triages bugs and puts them on roadmap

Getting people involved in upstream issues/PRs, esp. insofar as triage

gsnedders: for quite a lot of test suites
... nobody notices the bugs that are filed against them in WPT

clmartin: so we need more resources there for issue triage?

foolip: the problem to be solved is that people have a bad first experience
... they create a PR and it doesn't get looked at (you have to know who to ping)

jgraham: we've spent a lot of time with OWNERS files
... can you make sure people are paid to do it?
... it's not specifically part of anyone's job, and we need to break the illusion that you can work on WPT entirely from within your own browser's repository

foolip: is there any part of the test suite that's under good control?

jgraham: in some cases the spec editors are taking control

andreastt: webdriver's doing very well

jgraham: the unique thing about webdriver is that about 80% of the people who care about it are in this room

boazsender: this is a major barrier, if you're not working on a web browser, it's difficult to work in WPT

<ato> jeffcarp: Hi, I’m andreastt.

ato: ahh thanks

jgraham: we did Test the Web Forward to teach hundreds of people to write WPT, but got a lot of junk tests

zcorpan: mentoring is important
... we could say if you contribute a test, you should review someone else's test

jgraham: that's hard to do at this level
... people don't have the context to review issues outside their spec (e.g. grids)

clmartin: the process should be painless to upstream, but devs should also care about upstream WPT
... having a shared channel to go to?

jgraham: WPT is unusual as a single repo because it's a conglomeration of 200+ repos

ato: I think we've made great strides
... the individual contributor experience is so much better than a few years ago
... for instance, `wpt run`
... it's beneficial for me to be in the OWNERS file, why aren't others?

jgraham: for every CSS PR there are probably 10 people CC'd there

boazsender: there are great things we can do for spec editors to improve enfranchisement
... we can build community for spec editors, e.g. in the context of TPAC

gsnedders: most people writing PRs in WPT are 1. web compat people who don't work outside their own browser, 2. a few slightly random people, 3. to a lesser degree, spec editors

foolip: have been talking about per-contributor metrics to reward contributors with awesome contributions

zcorpan: not just onboarding people making a PR, also onboarding implementors (how to read the spec, etc.)

JohnJansen: in a prev. meeting, we talked about having a rotating "cop" for maintainer duties

foolip: could build some tooling to give us and people we're onboarding a higher level view of things

zcorpan: could have a shared dashboard for what needs doing

foolip: something that tells you what issues need comment

clmartin: something that pings you if a comment is untriaged

boazsender: an intermittent call, quarterly?
... we are adding more headcount to WPT infra
... we are not a browser but we are increasing velocity and this will have an impact

<clmartin> 👍

jgraham: there are some things that need to be fixed, like the docs being out of date and the website being less than ideal (apologies to gsnedders)
... it's fun to go make a cool dashboard but it might be higher impact to update documentation

jgraham: there are people paid to work on infra, so writing docs happens, but looking at random PRs doesn't fall within the scope of what people are paid for

foolip: the browser vendors can fix that

jgraham: we don't have many people who prefer technical writing over coding

boazsender: I find that very exciting (JohnJansen +1s)
... there are a lot of moving parts to the user flow in WPT

boazsender: relationship between the WPT website and infra is not obvious
... thinking about the various personas of people who interact with WPT and laying out the paths we want them to take through this

jgraham: we've made a lot of progress
... there have been a lot of improvements in the WPT workflow in the last year
... e.g. you can run a test in the browser of your choice
... there are other things that haven't been worked on as much, e.g. documentation

boazsender: I flew to California for jgraham to show me an undocumented CLI argument

jgraham: it's easy for me to justify improving docs & workflow

clmartin: for EdgeHTML's internal tests, if it's not being viewed, it's not being managed - could ping relevant spec editors with a mail with a list of bugs/PRs

jgraham: there may not be general buy-in from the spec editing community that they're responsible for that

foolip: for each directory, there's someone who cares

JohnJansen: how do we see issues by directory?

foolip: sort by labels, there's a label per directory

ato: ... is there a bot that pings stale bugs/PRs?

everyone: no

ato: it would be useful

foolip: what I'd like to do is start manual
... ping people manually
... then if people are feeling it's repetitive, make a bot for it

JohnJansen: internally the rotation is a week

ato: one of the problems here is that most of us have general familiarity with the web platform but aren't domain experts
... in specific specs

boazsender: we could make spec buddies

foolip: I'll write a script for myself and then if it's useful others can use it

<scribe> ACTION: foolip will write a script

<scribe> ACTION: follow up on github issue about not following up on github issues
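
One possible shape for such a script (a sketch only, not the actual action item; it uses GitHub's public issue-search API, and the 90-day cutoff is an arbitrary choice):

```js
// List open web-platform-tests issues untouched for 90 days.
// Auth and pagination are omitted for brevity.
const cutoff = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000)
    .toISOString().slice(0, 10);
const query = encodeURIComponent(
    `repo:w3c/web-platform-tests is:issue is:open updated:<${cutoff}`);

fetch(`https://api.github.com/search/issues?q=${query}&sort=updated&order=asc`)
    .then(response => response.json())
    .then(data => {
      for (const issue of data.items) {
        console.log(`stale since ${issue.updated_at}: ${issue.html_url}`);
      }
    });
```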

foolip: if there are labels that nobody cares about...

boazsender: no on spec buddy idea? (buddying up with spec editors to help triage bugs)
... if everyone in this room befriends one editor
... everyone send an editor a postcard (with a github URL)
... follow spec editors on twitter
... I'll find you a spec buddy and follow up in 3 months

foolip: 3 months!

boazsender: one person from each browser?

JohnJansen: testing this, seeing how it works with a small group would be a great way to start

<boazsender> https://docs.google.com/spreadsheets/d/1CP4yv8bcyHZ5JDmQNiP0so-by1zZpndj_n3QsKoRP1I/edit#gid=0

boazsender: click on this link to sign up for a spec buddy

dknox: introductions: I'm a PM on the Web Platform team on Chrome

More introductions: Alex, working on WebKit at Apple

<miketaylr> harald kirschener

two more people: Andrew & Harald

dknox: the idea for metrics is that we have the WPT dashboard
... we use wpt.fyi to prioritize tests that need fixing, but it's a fairly generic tool, it just lists pass/fail
... thinking about how we can make the tool more useful
... for a while we were shying away from this because we don't want to encourage people to view it as a public interop dashboard
... had idea: have an "interop score" per spec directory / repo
... could be weighted sum of tests passing in 4/4, 3/4, etc
... more generally how do we quantify the interop for a repo?
... also how to help browsers prioritize interop work
... could create an ordered list per browser for prioritization

dknox: main goal is helping people working on implementations pick low hanging fruit
... main non-goal is making implementors feel pressure
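
A minimal sketch of that weighted-sum idea (the weights and the results shape are made up for illustration; results maps each test to how many of the 4 engines pass it):

```js
function interopScore(results,
                      weights = {4: 1.0, 3: 0.5, 2: 0.2, 1: 0, 0: 0}) {
  const counts = Object.values(results);
  if (counts.length === 0) return 0;
  const total = counts.reduce((sum, passing) => sum + weights[passing], 0);
  return total / counts.length; // 1.0 means every test passes in all 4
}

// interopScore({"a.html": 4, "b.html": 3, "c.html": 1}) === 0.5
```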

<foolip> WPT Dashboard Interop Stats PRD: https://docs.google.com/document/d/1g22NuZ82RqPUYEOUzihc3quCuWqZ2NcearOGOIYlF20/edit?usp=sharing

dknox: in this meeting we want to find out how to structure metrics to be most useful

jgraham: it seems this is aiding a manager-y point of view (deciding what people will work on)
... one disadvantage is that it could create pressure for people to not submit tests
... another constituency is browser developers who only care about passing tests in their browser
... seeing if a specific browser passes is still valuable

clmartin: like caniuse with more data?
... who are our customers?

foolip: this is for browser engineers?
... a goal for this is that it doesn't become benchmarking

<foolip> that is what I said, as a claim

dknox: if web developers chose to use any of this they'd want to use it like caniuse
... but we've been thinking of this only with browser vendors in mind

boazsender: I think that it will appear to web developers that this is useful information, but the information is not in the format in which web devs use features
... cultural difference

zcorpan: caniuse turns out to be useful for browser devs as well
... a single spec can have lots of features
... there's already a mapping between caniuse and specs (e.g. html) (bikeshed does this)
... being able to show a mapping between tests and features to see interop status could be useful

jgraham: what if we forgot about sharing results?

dknox: things we've been thinking about most specifically - unsorted list of tests for each browser where tests are passing in other browsers
... also: a top level view of each directory, with the number of tests passing in all 4 engines

<foolip> is it like https://github.com/w3c/wptdashboard/issues/83#issuecomment-333371334?

dknox: should we exclude tests passing nowhere?
... we don't want to impede ppl from adding new tests
... at a basic level, let people see high level interop

scottlow: in that case is it even valuable to show individual test results?

jgraham: it's still useful to see for implementors
... the data is relatively new, nobody's being pointed at it, nobody's integrating with it
... it would be useful to pass into a PR/CL a link with test results and what's expected to change

<zcorpan> ScribeNick: zcorpan

foolip: jeffcarp what are the options?

jeffcarp: who is using this dashboard is the best place to start

jgraham: i can give you a list of feature requests
... but unrelated to how we present the data
... how we present it is interesting, in that we shouldn't be making it look like an Acid4 thing

clmartin: from our perspective, the current information is a case of the right hand not knowing what the left hand is doing

boazsender: wpt.fyi shared metric, impacts ...

jgraham: I'm against it being gamified
... want to be able to point a dev to detailed results

foolip: i think we need to keep the current presentation
... have both and be able to flip between them
... make it without colors and boring, gray?
... flip side view of the thing with 3/4 4/4 breakdown

jgraham: for my use case I care about the test file and below that, not so much per directory
... when you drill down enough you see how well a particular browser does
... second feature request is the same thing but for the changes in a particular PR
... want to be able to paste a link showing which tests went from pass to fail

boazsender: we're also working on this at Bocoup
... these are things that people are going to leave this meeting and do this quarter
... interop score on the front page, and per-test-file browser breakdown...
... distribution and per-browser breakdown
... does that feel right to you, Edge team?

clmartin: if we can react quickly...

jgraham: i agree

boazsender: curious about reflection from webkit

alex: we regularly fix bugs that make tests pass...
... at some point, which browsers pass which tests needs to be available; we shouldn't try to hide it
... side effects like "don't add that test, it will ruin our score" are something we should try to avoid
... but this is a public repo ...

AutomatedTester: there can be cases where people aren't implementing...
... will reflect negatively on a particular feature
... "why aren't you doing web compat?"
... we are, but we only have so many engineers; prioritization...

AutomatedTester: a number might reflect badly

jgraham: if you don't implement a feature then that's that
... maybe you have judged that there are more important things to work on
... e.g. methods on prototype instead of instance
... might not be important but affects a lot of tests

gsnedders: it's not important

JohnJansen: another example is Acid3
... we didn't implement the last 3 tests
... got lots of bugs about that
... very difficult to look at something that shows a score and not treat it as a benchmark
... having a score inherently invites that, which we should try to avoid
... focus on things supported by 4/4 engines

jgraham: with Acid3 the process was (censored)
... we have intentionally not repeated that

boazsender: the thrust of this effort is to get data between browser teams
... if not drilldown, then how?

JohnJansen: i use the tools
... wpt run

clmartin: i know which tests edge fails that other browsers are passing

jgraham: we don't have that internally, if we have a tool it would be public

JohnJansen: we don't need this, so we're biased against it

boazsender: a new engine could engage in this workflow
... to the extent that we can move your workflow to interact with this material, it would be a service to the web

dknox: having ??? already be there fresh can put themselves into that pipeline

boazsender: once a month or so, people who are not in the standards process already can use the tools

JohnJansen: different conversation
... brave browser comes along, they don't want the drill ins, they want the 4/4 from the top
... DOM is 100% interop (LOL)
... you don't care about which tests are failing in Edge, ...

jgraham: it'd be useful to have someone from servo

clmartin: drew had a suggestion, what if drilling in just showed the data per test, but not a score

dknox: one thing i'm hearing, feels like there's high-level agreement about what we want to do
... james submitted a suggestion a while back ???
... we have 4 browsers with test results, hasn't done anything
... it seems like we're scarred, which is real, but how about we try something and be ready to pull back quickly
... just have a combined score, and everything else stays the same

AutomatedTester: implementation reports
... people do want drilldowns there
... should that be separate?
... for spec process point of view

gsnedders: typically out of scope
... spoke to plh about this
... not so useful except for ticking boxes for the process
... grabs all the results from the dashboard so you don't need to run the tests again

boazsender: that data is available

jgraham: you have to edit the url carefully etc, not trivial

foolip: how are we going to present it

alex: why are we trying to remove information
... shouldn't we embrace that competition improves the web

foolip: not going to remove it, but not default view

JohnJansen: terrible to be conformant to css2.1 but at the same time breaking the web

jgraham: want to avoid having people game the system by contributing tests that pass in their browser and fail everywhere else
... don't want it to be a thing you have to run by the marketing department
... to be clear, the data for browser developers will still be there
... dishonest because the tests are incomplete
... in 2dcontext tests, chrome passes some number of tests, firefox passes some other number of tests, that's useless info
... conclusion, try to make incremental changes

foolip: the interop view, will be there in a few weeks

dknox: there's general agreement, we'll try something, we'll push that out and see how it goes, can try something else, can discuss that
... need to try something concrete

foolip: are things percentages or numbers?

jgraham: this discussion isn't going to resolve

foolip: look at mockups after break

boazsender: what would it take for edge to use these tools?
... wpt results reporting

JohnJansen: we wouldn't

boazsender: so why are we making this

JohnJansen: for us, it's an external tool to help us,
... we don't like the idea of a score card, even at per-test level

boazsender: data sharing
... how about having shared harvesting of data?
... if we use the same tools, it becomes better

JohnJansen: it's an interesting conversation

<break>

<scribe> ScribeNick: jeffcarp

foolip: let's talk about metrics for 10 more minutes

https://github.com/w3c/wptdashboard/issues/83#issuecomment-333371334

<foolip> https://github.com/w3c/wptdashboard/issues/83

boazsender: also talking about data sharing

foolip: looking at wpt.fyi
... options - we can either have a score or have columns that show 4/4, 3/4, 2/4, etc

jgraham: we don't know how to compute a meaningful score
... you start focusing on the metrics, not on the higher goal
... since we don't have an idea what the right meaningful score is, we should avoid scores

ato: it also focuses on the purpose of the project - interoperability

foolip: going from 3/4 to 4/4 most important metric?

jgraham: we can always imagine edge cases where going from 3/4 to 4/4 is not super useful (or really useful?)

JohnJansen: a challenge is that the denominator is always different

jgraham: to fix that the plan is to get a union of all subtests run

boazsender: we would like to get to a place where the denominator was the same

foolip: this is on our roadmap for this quarter
... NOTRUN is a status a test can have
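
(Editor's illustrative sketch, not from the meeting: one way the 4/4, 3/4, ... buckets could be computed over the union of all tests, so every browser shares the same denominator. The data shape and names are assumptions, not wpt.fyi's actual code.)

  // results maps browser name -> { testName: "PASS" | "FAIL" | "NOTRUN" }.
  function interopBuckets(results) {
    const browsers = Object.keys(results);
    const allTests = new Set();
    for (const b of browsers) {
      for (const test of Object.keys(results[b])) allTests.add(test);
    }
    const buckets = new Map(); // e.g. "4/4" -> number of tests all browsers pass
    for (const test of allTests) {
      // A browser that never ran the test (NOTRUN or absent) counts as not passing.
      const passing = browsers.filter(b => results[b][test] === "PASS").length;
      const key = passing + "/" + browsers.length;
      buckets.set(key, (buckets.get(key) || 0) + 1);
    }
    return buckets;
  }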

gsnedders: I think percentages are more helpful

(we're talking about the 4/4, 3/4 columns design)

foolip: it becomes a spec to spec comparison, not a browser to browser comparison

dknox: we're thinking about this as a tool to aid thinking, not replace thinking
... gives you places to start and dig in, whereas now the problem is intractable

twisniewski: is this 4/4 agree? or 4/4 pass?

my bad, thx

ato: especially for webdriver, we
... have a lot of tests that no browser implements

foolip: the browser scores will be the way they are, or more boring

everyone: more boring!

<foolip> ato points out that 0/4 is also needed

(we begin to debate the hex values for each color on the dashboard)

foolip: please submit ideas for designs to public-test-infra

mdittmer in Waterloo is implementing

(the color debate will be conducted by raising hands, we'll be doing a binary search, those in favor of #000-#777 raise your hand)

Automating manual tests

gsnedders: we have a variety of harder cases, e.g. geolocation
... on the other hand, webbluetooth has an entire protocol
... proposal is to do the simple cases with webdriver, only testing JS APIs
... and put complex cases behind a flag

ato: there's been talk about adding duplex communication to webdriver
... talking about getusermedia
... mentioned on mailing list, gsnedders's code is very cool - if every browser is implementing the webdriver api, why don't we just expose this as a privileged web api?

jgraham: what is the question?

boazsender: I wonder how much having a synthetic interface will really exercise the platform
... how much should be done in the Webdriver spec vs. how much should be done by each feature?

jgraham: we've always wanted things to be in webdriver
... one of the things we've talked about as having more compat impact: easier testing of websites in mobile browsers
... web developers can't (/won't) test in content shell

foolip: the question about permissions was whether it'd be part of the WD spec or an extension to WD
... it'd be easier for implementors to write it in their spec

gsnedders: what I wanted from this agenda item was whether people are happy with where the line is drawn
... I thought we could do everything through WD

jgraham: a lot of specs we don't know how to deal with in a generic way
... we should have conversations with those specs
... for certain APIs we end up with browser-specific flags, like webrtc and webvr

foolip: given we have gsnedders's thing, if you can use WD, do it in your own spec and wrap it in testdriver.js
... if it's deeper than that, maybe go the way of webusb

jgraham: that's a discussion to have with ppl working on specific features

JohnJansen: adding to the WD spec?

foolip: the extension spec

jgraham: the point of testdriver is that it works across browsers
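
(Editor's illustrative sketch, not from the meeting: what a testdriver.js-backed subtest can look like. test_driver.click is the API from the PR linked below; the page markup and assertion are assumed examples.)

  // Assumes testharness.js, testharnessreport.js, testdriver.js and
  // testdriver-vendor.js are loaded, and the page contains a <button>.
  promise_test(async t => {
    const button = document.querySelector("button");
    const clicked = new Promise(resolve =>
      button.addEventListener("click", resolve, { once: true }));
    await test_driver.click(button); // routed through WebDriver in each browser
    await clicked;
  }, "test_driver.click dispatches a trusted click");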

JohnJansen: we'll talk more in WD wg on Thurs
... webvr is a good example of this

NellWaliczek: the challenge is that each browser has implemented webvr on a different VR platform stack

<foolip> https://github.com/w3c/web-platform-tests/pull/5535 is the PR being discussed

NellWaliczek: how can we allow the tools to make these work across browsers?
... we'd love to have someone join our conversation for how to approach these problems

jgraham: should we be advertising "if you have special testing requirements, meet with us" to working groups?
... I don't think we've solved testing on webvr for anything yet

NellWaliczek: e.g. if we're doing tests about moving the headset

jgraham: how do you test vr apps?

NellWaliczek: finding ways to prevent web developers from needing to buy every single piece of hardware to test their app
... we're not there yet for tooling, but we're on the verge of having these great tools for writing VR tests
... going to start VR testing as a service
... would be great if from the beginning we could set best testing practices

foolip: the PR mentioned above is very similar to the current issue in getusermedia

<zcorpan> https://www.w3.org/wiki/TPAC/2017/Testing

breakout session will be on the wiki ^^

Fuzzy reftests

jgraham: we're going to talk about fuzzy reftests
... it is not always easy or possible to write pixel perfect reftests
... e.g. sometimes GPU doesn't give you the same antialiasing every time you run it
... FF has implemented a system for fuzzy reftests
... WPT may need to implement this (this year)

Hexcles: when Chromium runs tests in content shell, it disables a bunch of AA

rbyers: (we avoid the problem by not testing what we're shipping)

gsnedders: Chrome also avoids the problem by not using GPU in the CQ

jgraham: for Gecko reftests, the default config is no GPU
... there's a subset that people run on GPU

clmartin: we have a lot of tests in this area

gsnedders: my understanding of what Edge does is that fuzziness is allowed within a given rectangle

JohnJansen: that's correct

<gsnedders> I heard of a case where Servo had issues with <span>foo</span> <span>bar</span> and "foo bar" rendering differently

JohnJansen: would using Ahem help?

twisniewski: it wouldn't fix it 100%
... there could be a way to do the fuzzing in a more sane manner if we have more control over the tests

boazsender: demonstrating how the scrollable behavior in an input is rendered differently and impossible to assert in javascript

<zcorpan> (that should then be a separate test)

<zcorpan> (imo)

jgraham: there are clearly cases where moving to Ahem won't fix certain GPU problems

foolip: in WebKit and Chromium there are pixel tests, for anything that's fuzzy we generate per-platform baselines
... we could get into generating png baselines for wpt.fyi

jgraham: where would we store those? we wouldn't run that in FF infra

foolip: you'd catch accidental changes to the test
... we can't do just pixel tests
... for the shared infra, fuzzy tests seems more palatable

Hexcles: I don't think fuzzy matching has anything to do with pixel tests

clmartin: should baselines be the responsibility of vendors?

boazsender: another example of this is audio
... the webaudio wg was talking about its testing agenda, including how to deal with variation in sample amplitude in ArrayBuffers; there's a reasonable amount of difference between platforms in the way sound works

JohnJansen: image comparison tests are always really flaky

rniwa: sometimes the browser isn't flaky, it's the test that's written flakily
... e.g. gradients can be drawn differently each time, but the test should not assert they're exactly the same
... in the case of AA, it is expected that the AA result will differ between runs

JohnJansen: allowing e.g. 10 pixels to differ will make you miss failures

wilhelm: the threshold should be on a per-test level

jgraham: that might make certain tests impossible

jgraham: the second possibility is that you could have a small set of parameters
... could have some masking that allows specific regions to flake

JohnJansen: I think the first thing is to use Ahem, a non-anti-aliased font, for non-font tests

jgraham: there are not many objections to some sort of global parameterized things

zcorpan: one objection is that it misses bugs - on the flip side, if you don't have fuzziness, you're not able to write a test that tests one bug and not two bugs

JohnJansen: if you have two bugs causing one failure, your tests are too general
... you're asking the test author to determine fuzziness, but I think it's on the browser vendor

jgraham: you could have a default fuzziness level and let vendors override it

zcorpan: make it a boolean and make vendors define it

rniwa: if a test is fuzzy in only one browser, do you mark it fuzzy in all browsers?

JohnJansen: how would that work in wpt.fyi

jgraham: could use fuzzy matching

zcorpan: wpt.fyi could store those baselines

jgraham: should we encode params for different browsers in the test? no

foolip: are we going to do fuzzy reftests?

jgraham: don't have specific plans to work on it, but there is some internal expectation at Mozilla that it should start working at some point (maybe 2018)

rniwa: wpt.fyi could run a failed test with fuzzy matching and suggest that the test enable fuzzy matching
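
(Editor's illustrative sketch, not from the meeting: one possible shape for the "small set of parameters" comparison discussed above; the parameter names are assumptions, not an agreed wpt syntax.)

  // Compare two same-sized RGBA pixel buffers (e.g. Uint8ClampedArray).
  // The match succeeds if no channel differs by more than maxChannelDiff
  // and at most maxDifferingPixels pixels differ at all.
  function fuzzyMatch(a, b, { maxChannelDiff = 0, maxDifferingPixels = 0 } = {}) {
    let differing = 0;
    for (let i = 0; i < a.length; i += 4) {
      let diff = 0;
      for (let c = 0; c < 4; c++) {
        diff = Math.max(diff, Math.abs(a[i + c] - b[i + c]));
      }
      if (diff > maxChannelDiff) return false; // a pixel is too different
      if (diff > 0) differing++;
    }
    return differing <= maxDifferingPixels;
  }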

Test discoverability and coverage

boazsender: how to onboard new contributors

zcorpan: another way of having test metadata is having it be part of the relevant spec
... sometimes tests fall through the cracks and browser devs only notice when they implement the change

rniwa: I've had many cases where a change to a dom api will affect unrelated tests

Alex: the important thing is that we have simple tests that don't use 10000 features

foolip: could experiment with putting metadata in specs themselves

JohnJansen: CSS wg uses bikeshed for this and it feels very fragile

clmartin: a comment on the top of the file is unmaintainable?

foolip: it's not useful enough

rniwa: in the perf wg, some of the prefetch cases are missing tests for resource types that don't work with CORS

<foolip> https://github.com/tabatkins/bikeshed/issues/1116 is the Bikeshed feature

twisniewski: want to make sure when someone imports a test, the intention of that test is clear

JohnJansen: there are some whatwg tests that should be ported or deduplicated?

jgraham: for some tests you can just write the JS file and it wraps it for you

<JohnJansen> https://wpt.fyi/dom/abort

jgraham: but we shouldn't put extra requirements on every test
... .worker.js - runs test in a worker

<Hexcles> http://web-platform-tests.org/writing-tests/testharness.html "Auto-generated test boilerplate"

jgraham: .window.js - runs test in a window
... .any.js - runs test in both

<foolip> JohnJansen: it's implemented in https://github.com/w3c/web-platform-tests/blob/master/tools/serve/serve.py

<foolip> in particular https://github.com/w3c/web-platform-tests/blob/master/tools/serve/serve.py#L131
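
(Editor's illustrative sketch, not from the meeting: a minimal .any.js test. The server generates both a window wrapper and a worker wrapper for the same file, so the test runs in both scopes; the test body is an assumed example.)

  // example.any.js -- the testharness.js globals are provided by the
  // generated wrapper, so the file contains only the test itself.
  test(() => {
    assert_equals(typeof self.fetch, "function");
  }, "fetch is exposed in both window and worker scopes");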

jgraham: some tests need to write bytes to a socket, e.g.

rniwa: the current requirement of using wptserve is painful because it can be slow

jgraham: random WPT developers are probably not going to add metadata to their tests about whether it needs wptserve
... but you could probably get a reasonable approximation using grep

rniwa: (concerns about adding number of necessary processes for using wptserve)

jgraham: wptserve could be faster but not multiple times faster, more like percentages faster

boazsender: wanted to talk about shared goals for the WPT project, but we might have hit the limit for the day

Adjourning for the day

Summary of Action Items

[NEW] ACTION: follow up on github issue about not following up on github issues
[NEW] ACTION: foolip will write a script
 

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2017/11/08 01:48:26 $
