SV_MEETING_TITLE -- 13 Nov 2013

<scribe> ScribeNick: leif

jgraham: This session is more on the nitty-gritty, unlike previous on policy etc.

… The state of testing for those who are not aware:

… Tests on GitHub, accepting pull requests

… Decent docs

…We now own t…twf.org, have docs there instead of a dozen wiki pages

SimonSapin: Which groups use that repo?

tobie: (lists groups)

jgraham: all but XML-oriented groups

<simonstewart> Or CSS

<simonstewart> :)

tobie: Hopefully soonish CSS

… takes some time because CSS tied itself to hg and Shepherd

… it's a bit complicated

jgraham: As for actually running the tests…

… changes coming soon to run them more easily

…a script coming soon to identify which files are tests in repo

…and what kind of test file

…Show what files do what
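[Editorial sketch, not from the meeting: one way a script like the one jgraham describes could sort repository files by kind. The heuristics (a `testharness.js` script tag, a `rel=match` link, a `-manual` filename suffix) and the names `classify` and `build_manifest` are assumptions for illustration, not the project's actual rules.]

```python
import re
from pathlib import PurePosixPath

# Assumed heuristics for illustration only.
HARNESS_RE = re.compile(r"<script[^>]+src=[\"'][^\"']*testharness\.js[\"']", re.I)
REF_LINK_RE = re.compile(r"<link[^>]+rel=[\"']?(?:match|mismatch)\b", re.I)
MARKUP_SUFFIXES = {".html", ".htm", ".xhtml", ".svg"}


def classify(path, source):
    """Return 'testharness', 'reftest', 'manual', or 'support' for one file."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in MARKUP_SUFFIXES:
        return "support"          # scripts, images, stylesheets, ...
    if p.stem.endswith("-manual"):
        return "manual"           # needs a human to run it
    if HARNESS_RE.search(source):
        return "testharness"      # script-driven test
    if REF_LINK_RE.search(source):
        return "reftest"          # compared against a reference rendering
    return "support"              # markup that is not itself a test


def build_manifest(files):
    """Group {path: source} entries by kind, with paths sorted."""
    manifest = {}
    for path in sorted(files):
        manifest.setdefault(classify(path, files[path]), []).append(path)
    return manifest
```

[Grouping by kind matters because each kind needs a different runner: script-driven tests report their own results, reference tests need a screenshot comparison, and manual tests cannot be automated at all.]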

…Other change is: Previously running tests req'd Apache and PHP, not fun to get CORS tests running on w3.org

…Individual contributors had to install heavyweight software

…At Moz we didn't want PHP on every single test slave. Sysadmins would never have spoken to us again.

…We have a custom Python-based solution replicating the dynamic things from the PHP solution, but with testing in mind

…easy to make HTTP response, but doesn't force you to stick to the standards, useful to diverge in testing

…currently in review, about 2/3 done

…Anyone who worked on XHR testsuite could help review

<zcorpan> https://critic.hoppipolla.co.uk/r/368

…Should be days < weeks away

…<< months

(i.e. much less than months)
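[Editorial sketch, not from the meeting: the point about a test server that lets you diverge from the standards can be illustrated with Python's standard library alone. This is not the server under review; `raw_response` and `serve_once` are hypothetical names. The idea is that the test author controls every byte on the wire, so a response with, say, two conflicting Content-Length headers is as easy to produce as a valid one.]

```python
import socket
import threading


def raw_response(status_line, headers, body):
    """Serialize a response exactly as given, with no validation at all."""
    head = b"".join(name + b": " + value + b"\r\n" for name, value in headers)
    return status_line + b"\r\n" + head + b"\r\n" + body


def serve_once(payload, host="127.0.0.1"):
    """Serve `payload` to the first client that connects, then stop."""
    listener = socket.create_server((host, 0))  # port 0: let the OS choose
    port = listener.getsockname()[1]

    def run():
        conn, _ = listener.accept()
        with conn, listener:
            conn.recv(65536)       # read and ignore the request
            conn.sendall(payload)  # send the bytes untouched

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return port, thread


# A deliberately non-conforming response: duplicate, conflicting headers.
broken = raw_response(
    b"HTTP/1.1 200 OK",
    [(b"Content-Length", b"999"), (b"Content-Length", b"2")],
    b"hi",
)
```

[Calling `serve_once(broken)` returns a port a browser can be pointed at to see how it handles the malformed response.]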

jgraham: Still haven't mentioned running tests :)

…ppl are working on it

<zcorpan> also https://critic.hoppipolla.co.uk/r/364

…Would like some discussion now on some issues

…One is the enormous code-review backlog. Need a strategy

…Another is working out whether we have tests for a certain thing

…Very interesting for a lot of reasons. One of the long-term ways of using the testsuite is, instead of stability marker for spec or going to caniuse.com, could map tests to spec parts. Req's us to obtain data from vendors on test results.

…Thoughts on Code review?

tobie: Some stuff I'd like to do if I had time.

…A system to easily run tests

…w3c-test.org (?)

…Run on different browsers automatically and report back to pull request

<simonstewart> Interestingly, this is what we do with the selenium project already for our own tests.

…Struck a deal with SauceLabs (?) that they can do that

…want to do asap

David Burns: They don't run nightlies

tobie: Right now just hooking the whole thing up, hopefully ask them to do nightlies later

…don't know how feasible, but this is a first step

jgraham: For unreviewed stuff, it's interesting. Security concerns though

…full test run data, you really want to leave that to vendors

…If you want your impl considered as an impl for parsing whatever, you should really be running tests.

…Not true today, but in the long term.

…We don't necessarily need a system for running every test every day in SauceLabs, but running these tests once is useful.

tobie: Both use cases are valuable. Review is obvious. But also aggregate into WebPlatform.org and feed to devs.

…Lot of value for devs.

David Burns: "dev" means "webdev"

jgraham: A problem with code review is that we try to do too much upfront. We should work out what fails, then come back and say that, vendor should look at the test.

…There's a tension between keeping quality standards and quick review.

tobie: and quantity

jgraham: Hard work, and nobody's paid to do it

tobie: [missed]

…it's a bottleneck

…I often see in CR that metadata is missing, other formalities. Could be automated.

…Immediate comment in CR.

(CR= code review)

zcorpan: Trailing whitespace. Don't bother whining about it myself

rebecca: [missed]

zcorpan, tobie: can't solve all the problems

tobie: But saves reviewers from going nuts over details. Reviewer is engaged immediately, instead of 2-6 months and then whitespace complaints. Encourages rude replies!

zcorpan: Test writers can be encouraged to run checks before submitting
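[Editorial sketch, not from the meeting: the automated formality checks discussed above (missing metadata, trailing whitespace) could look roughly like this. Which metadata is required is an assumption here (`author` and `help` links), and `lint` is a hypothetical name.]

```python
import re

REQUIRED_LINKS = ("author", "help")  # assumed metadata; adjust per test suite


def lint(path, source):
    """Return (path, line_number, message) tuples; line 0 means the whole file."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((path, number, "trailing whitespace"))
    for rel in REQUIRED_LINKS:
        if not re.search(r"<link[^>]+rel=[\"']?%s\b" % rel, source, re.I):
            problems.append((path, 0, "missing <link rel=%s>" % rel))
    return problems
```

[The same function could run in two places: on the contributor's machine before submitting, as zcorpan suggests, and as an immediate automated comment on the pull request, as tobie suggests.]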

simonstewart: Would like to edit pull requests

(may have been a joke)

simonstewart: I've only seen people volunteering at TTWF event, but afterwards engagement drops.

rhauck2: Yes, a problem … Shanghai and Baidu people stayed engaged

tobie: Want to set up assigning Pull Requests to people

… have a test coordinator for a spec

…Some automation plus finding the right people…it's my best offer at this point.

jgraham: I definitely agree that that's valuable

…GitHub's are one set of solutions, there are others

…Lots that I could review if it wasn't hard and boring

…checking that assertions about the spec are correct etc.

tobie: Intersection of skill sets often empty

rhauck2: (?) has good policies on this

jgraham: They have salaries

…Need someone to employ them essentially to review tests part time

wilhelm: Both …and review question are about resources

…Don't know what the right form is.

…Can go to employers rather than guilting people

tobie: Non-trivial

…as seen over the past year

jgraham: If non-trivial to the level that it won't happen, we need a different strategy

tobie: Right. Instead of thinking in terms of "making reviews happen quickly", "if not reviewed in 2 weeks, it's out"

…Build a toolset that makes quality possible with those constraints

…Quality on one side, quantity on the other. Put the cursor on the right place.

jgraham: A countdown timer incentivizes people. You might just find an issue to extend the timer

tobie: "It's no longer my problem"

jgraham: This is the reason I like to track the progress of reviews. Mark files reviewed. If it always says 100 % remaining…
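[Editorial sketch, not from the meeting: the two ideas in this exchange, a two-week review window and per-file progress tracking, combined in one small data structure. The `Review` class and its methods are hypothetical, and what happens to a submission once the window expires is left open, as it was in the discussion.]

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

REVIEW_WINDOW = timedelta(days=14)  # the "2 weeks" from the discussion


@dataclass
class Review:
    opened: date
    files: set
    reviewed: set = field(default_factory=set)
    extension: timedelta = timedelta(0)

    def mark_reviewed(self, path):
        if path not in self.files:
            raise KeyError(path)
        self.reviewed.add(path)

    def percent_remaining(self):
        if not self.files:
            return 0.0
        return 100.0 * len(self.files - self.reviewed) / len(self.files)

    def extend(self, days):
        """Assumed rule: raising an issue pushes the deadline back."""
        self.extension += timedelta(days=days)

    def deadline(self):
        return self.opened + REVIEW_WINDOW + self.extension

    def expired(self, today):
        """True once the window has passed with files still unreviewed."""
        return today > self.deadline() and self.percent_remaining() > 0
```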

rhauck2: Are there test submissions that can be special-cased? From vendors, e.g., that are scrutinized more beforehand

tobie: We've changed the process for this

David Burns: Would Chromium people be happy with Mozilla's? Being devil's advocate here.

rhauck2: It's a compromise

David Burns: More politics

jgraham: … could we have magic that turns a patch into a PR

tobie: Yeah, explicitly changed process. Two employees can write and review, as long as process is public.

rhauck2: Not quite the same. Same company.

… [missed]

…Can we special case certain things, like working together on Flexbox suite?

David: I personally don't see issues

jgraham: e.g. if one company has completely wrong model of a spec

… We would have accepted wrong specs

tobie: Not politics, mistakes

jgraham: Also can't prevent it completely

…There are issues, but might be worth it, otherwise we won't accept anything. A vendor could potentially submit 1000 tests

…Could have to wait a while

zcorpan: I noticed :)

tobie: Could be more valuable to just have it public and accessible. If it has a problem, just take it out!

jgraham: Yeah. If we're happy to automatically forward tests, makes it easier to work with the repo

Burns: If it's internal to Mozilla, should it be public to all?

rhauck2: Good question. Would be great if not too much trouble

jgraham: …

tobie: Two different things. One is acceptable, CR was in the open, you can track it, valuable info. Doesn't mean we shouldn't special-case tests coming from trusted people or orgs.

…Maybe we shouldn't, but keep in mind that they're different

…No-one questions open reviews

Burns: Review doesn't have to be on GitHub.

…Keep wording fluffy

…If a vendor doesn't already have an open process, have to submit PR.

jgraham: …

tobie: The policy says that same company can approve PR if review in the open

jgraham: Review in the open could be just look at internal bug tracker and marking reviewed.

Burns: …

tobie: You want a paper trail

rhauck2: Private reviews don't work very well

tobie: Yeah, no need to discuss it

…One thing is if Microsoft brings in 1000 tests, and someone else in MS OKs

…Is the use-case MS and Opera? How does Opera work re. Blink?

zcorpan: Going forward Opera uses Chromium testing infra, not old Opera infra.

lmcliste_: Do you have tests upstream of Chromium?

jgraham: Q is, if there was an Opera-developed feature, web-interacting and needing tests, and you submitted tests, would it be an open patch or reviewed behind closed doors?

zcorpan: Not sure how it works right now.

…the only changes were made by philipj

…I reviewed them in the open using Critic

jgraham: My feeling is that Opera is not the problem case here

tobie: How do Microsoft contribs usually work? My impression is that they are bulks coming in from time to time.

… Maybe better to decide that they need review from someone else, but that someone can communicate with them to ask how much internal review was done etc.

Burns: TBH, better if merge was instantaneous. Paper trail is the valuable thing.

jgraham: From vendors, we're willing to have forgiveness, not permission

…and if we find out a lot of crap tests have been coming, we want review from other vendor in the future.

tobie: That's exactly my intention

…We're all trying to do the right thing. If ppl behave like idiots, we have to deal with that

…This is a minor policy question when problems arise.

…Don't mean to pick on MS, but I mean closed impls.

…Can we make a decision on this now?

jgraham: Want to ask ML, might be dissent.

zcorpan: I don't mind blessing vendor tests, but I'd like some time window for review.

…In case people do want to review and they find problems.

…At least some way of identifying recently merged, unreviewed tests

…If they're just merged, I wouldn't normally look at them.

tobie: At this point, want to move to ML. We've nailed down problem, just need to make a decision.

ACTION: jgraham to ask mailing list

rhauck2: When you migrated to GitHub, did you …

jgraham: We moved everything that was obvious where to move

…some had a different hierarchy, didn't know how to reorg

…1000s of tests

…'old' directory not reviewed

???: Some tests contain errors that we found on Saturday. At some point someone needs to review

rhauck2: Can we run them and use them?

jgraham: yes

…Now that I can run tests in Gecko automatically (on the python server branch) I fixed a lot of broken tests

…Broken tests become very obvious

tobie: You're bringing that up because of worry about state of CSS testsuite

…I wouldn't worry too much, it's a fact of life. Start running on WebDriver, SauceLabs. You'll quickly see what's going on. If something works on 4 browsers, it's probably good, if fails everywhere, probably a broken test
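[Editorial sketch, not from the meeting: tobie's rule of thumb written as a function. The threshold of four passing browsers comes from his remark; the result labels and the name `triage` are hypothetical, and the outcome is a hint for a human, not a verdict.]

```python
def triage(results, min_passes=4):
    """Classify one test from its per-browser results.

    `results` maps a browser name to "PASS" or "FAIL".
    """
    passes = sum(1 for status in results.values() if status == "PASS")
    if results and passes == 0:
        return "suspect-test"         # fails everywhere: probably a broken test
    if passes >= min_passes:
        return "probably-good"        # works in enough browsers
    return "needs-investigation"      # mixed results: impl bug, spec bug, or bad test
```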

rhauck2: Right. We're going to refactor the dir structure

…krit and others are writing scripts that assume a structure

…How do we address this

tobie: This gets addressed by running them

rhauck2: Running them all over the place

tobie: This is why I push for time and money for doing this.

…SauceLabs runs on 10s of combos of OS and browsers

jgraham: Obviously need to take the effort. If you're a vendor and see failures, you need to look at them because they might be impl or spec bugs. If you're not looking at fails, you're not getting value out of testsuite.

…It's worth always running tests, even if failing.

rhauck2: Failing tests are a good thing to me.

tobie: Running the tests and analyzing the results are two different steps

rhauck2: …

tobie: Try to enforce using short names

<Ms2ger> Short names?

jgraham: That's what we wanted to discuss about CR

…Also want to discuss coverage, but worried tobie will kill himself

tobie: No :)

…Spec coverage, usually it's easier to measure that code has coverage rather than specs

…Specs are harder because two stakeholders with diff. requirements and interests

…Some are interested in broad but shallow coverage: they want to parse the spec for normative requirements, specs and algorithms, WebIDL, propdefs, and assume you need a certain number of tests for each

…Not a fantastic solution, but gives you a good idea of what you've tested and not

…estimate fairly precisely the engineering time needed

…some data points show that these estimates are rather solid
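[Editorial sketch, not from the meeting and not tobie's tool: a toy version of the broad-but-shallow estimate. It counts only RFC 2119 style keywords in spec text, ignoring algorithms, WebIDL and propdefs, and the figures for tests per requirement and hours per test are made-up placeholders.]

```python
import re

# Longer phrases come first so "must not" is not counted as "must".
KEYWORD_RE = re.compile(r"\b(must not|must|should not|should|may)\b", re.I)


def count_requirements(spec_text):
    """Count normative keywords found in a block of spec prose."""
    counts = {}
    for match in KEYWORD_RE.finditer(spec_text):
        keyword = match.group(1).lower()
        counts[keyword] = counts.get(keyword, 0) + 1
    return counts


def estimate(counts, tests_per_requirement=2, hours_per_test=0.5):
    """Return (tests needed, engineering hours) under the assumed ratios."""
    tests = sum(counts.values()) * tests_per_requirement
    return tests, tests * hours_per_test
```

[Comparing the estimated number of tests with the number that already exist for a spec gives the rough coverage picture described above.]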

…The other is for robustness and interop, you want to test known-brittle areas, known-non-interoperable

…We don't have good solutions to measure coverage for this. jgraham has some strategies

jgraham: I have two interests: one a bit like what you talked about as coarse-grained. Eventually we have a spec and we can believe we don't have major interop problems.

…reasonable number of tests given a spec.

…not perfect, obvs.

…The other is working out what you've missed. Hard to do: combos of features (web sockets and workers) rarely obvious…

rhauck2: Mihai Balan was perplexed for the same reason

jgraham: That's just hard, and requires afaict people who know that there are interactions

…One thing that might work is look at ? data from vendors

…You have 80 % of what we need

…Would be nice if for specific specs we could say what is covered.

…"We need worker tests"

<rhauck2> http://lists.w3.org/Archives/Public/public-css-testsuite/2013Nov/0000.html

…want to look at it in the future. Looked at it for an afternoon, clearly non-trivial

…(e.g. killing browser all the time loses data)

tobie: Points at where complexity of impl is

…Gives the big picture

…about codebase

jgraham: Points out places most likely to be brittle

…no good solutions atm, out of time

…anyone else want to say something briefly?

???: Would be useful to document the state of discussion

…coverage analysis etc.

tobie: The tool you saw in the previous meeting [at TPAC plenary], I'm planning to release probably in the next week the coverage tool that makes estimates for effort and cost

… [missed] ttwf….org/coverage

…some flaws, stale data, only shallow coverage

…but will at least be public.

…if somebody doesn't want me to do that, stop me now!

Meeting closed.
