Web Platform Testing

28 Oct 2015

See also: IRC log


Florian, JohnJansen, SimonSapin, dom, fantasai, gsnedders, jgraham, jyasskin, kawai, mkwst, ojan, r12a, rniwa, shoko, yosuke


jgraham: We have two places where we collect tests
... We have web platform tests, github repo, can submit tests using normal github workflow
... Has tests for most specs apart from CSS
... CSS has a separate repo for historical reasons
... It is actually an hg repo, but it has a mirror on GitHub
... There are slightly different requirements for each set of tests
... We have a site, testthewebforward.org, which attempts to explain, occasionally inaccurately, how you write and submit tests
... Kinds of things we can test atm:
... 1. Things you can access through JS DOM APIs, using testharness.js
... which gives you a way to write JS test
... 2. reftests, which are for things that depend on layout/rendering
... You create two versions of a document, one that uses feature you're testing, and another that's supposed to have identical rendering, but using simpler technologies (specifically, not the feature being tested)
... We also accept manual tests as well
... There's a desire to add some sort of automation to that, for things that can currently only be done with manual tests or only with browser-internal testing APIs
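As a sketch of the first kind, a minimal testharness.js test is an HTML file that loads the harness and makes assertions; the `/resources/` paths follow wpt's conventional layout:

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>document.title reflects the title element</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
test(() => {
  // document.title should reflect the text content of the <title> element
  assert_equals(document.title, "document.title reflects the title element");
}, "document.title matches the title element's text");
</script>
```

Run under the wpt server, the harness reports each `test()` as a separate pass/fail result.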

Florian: In Opera we distinguished between tests requiring interaction, and others for which you couldn't create a reference file, but could, after a first passing run, compare subsequent runs by screenshot

e.g. for gradients or something like that

SimonSapin: Gecko has fuzzy reftests, which pass when only a few pixels are off. You need to specify how many pixels, and by how much they may differ
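A reftest pair of the kind jgraham described might look like this minimal sketch, using wpt's `<link rel=match>` convention, with a gradient as an illustrative tested feature:

```html
<!-- test file: exercises the feature under test -->
<!DOCTYPE html>
<title>Reftest: single-stop gradient renders as solid green</title>
<link rel="match" href="green-square-ref.html">
<div style="width:100px; height:100px; background:linear-gradient(green, green)"></div>
```

```html
<!-- reference file: same rendering via simpler technology -->
<!DOCTYPE html>
<title>Reference: solid green square</title>
<div style="width:100px; height:100px; background:green"></div>
```

For the fuzzy case SimonSapin mentions, Gecko's reftest manifests allow an annotation along the lines of `fuzzy(2,350) == test.html ref.html` (syntax from memory; it specifies the maximum per-pixel difference and the number of pixels allowed to differ).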

<gsnedders> (Presto-based Opera, that was)

jgraham: Those are implementation details at Opera
... One of the goals that I have, at least, is to allow as many tests as possible to be run in a continuous integration system
... Even with all of Opera's infrastructure, couldn't run those in continuous integration

Florian: yes


gsnedders: Sort of

jgraham: You can do it if you're running a handful of integrations a day
... If you're running 100s of integrations a day, it doesn't scale if they all require manual fiddling with tests
... Even in Opera's case, slight tweaks would need re-evaluation; we ended up with thousands of possible renderings

gsnedders: I don't think anyone has a desire to do this

jgraham: Other thing is, in terms of where we are wrt running the tests
... At Mozilla we run almost all the web platform tests in automation for each commit
... Have an open source impl called wptrunner, which is somewhat browser-independent
... There's a pull request to get Edge support
... Also have it running in servo
... Also in Servo, run a subset of the CSS tests using same test runner
... That's the State of the Union

jyasskin: What proportion of tests is Chrome actually running?

jgraham: Hoping someone can say
... Chrome has a heavyweight import process
... As a result, nobody actually imports stuff
... So Chrome is only running tests they upstreamed, but not taking bugfixes down

ojan: Really ad hoc, so ppl who are motivated wrt a test suite will pull new versions

?: We made some effort to run wptserve so we can run the tests as written

mkwst: That work is 80% done, but we had problems finding people to do the last 20%

?: Once that's done, will be able to just run all of the tests

ojan: Are you working on this?

mkwst: An intern did most of it

gsnedders: So, CSS test suite
... Has basically same types of tests
... As wpt
... But also has more metadata, such that it's possible to find out which ones can be screenshotted
... Ideally we want to get rid of all the screenshot-compared tests

fantasai: Can get rid of most of them, but not quite all
... Some can't be turned into reftests
... E.g. can't test for underlines
... Can test whether it's underlined or not, but underline thickness and position aren't defined by the spec

JohnJansen: But converting the reftestable tests is a big undertaking

gsnedders: We should have all new tests with references

Florian: When possible

rniwa: That's only for things that need visual comparison
... Should be JS test if possible

<jyasskin> thx gsnedders

JohnJansen: CSSWG resolved on JS test first, reftest second, manual test third

fantasai: No, resolved on reftest or JS test (automatable) over manual
... Didn't want to prefer JS over reftest because CSS has non-JS implementations

r12a: ...
... Can we go over objectives for this session?

<gsnedders> the first was about viewing test results

jgraham: In regard to the first thing, there is a tool that was built for visualizing which tests pass
... It shows you the tests that pass
... but it doesn't work with the output from wptrunner
... There's another in-browser runner that people were using for getting specs to CR
... rather than continuous integration systems
... That will output in ways that can be read by this tool
... For general web platform tests in wptrunner, we don't have a way to visualize
... CSS has another system that allows running tests online and storing test results, including UA data
... Should allow slurping in test results, and displaying those
... And display results from 200,000 tests

gsnedders: Should integrate with continuous integration systems

jyasskin: Totally unrelated to what's been discussed so far. How do we do WebBluetooth or USB or geolocation testing?
... To test those, you have to tell the browser how to respond to those tests
... We don't currently have a way to do that
... This group should tell us how

rniwa: What are you saying?

jyasskin: For Web Bluetooth, the implementation we have right now doesn't go end to end. It tests the platform-independent part of Chrome
... As part of the test, need to make the API call that shows dialog, tell browser how to respond to dialog
... Spec says there should be some prompt, or that the user grants permission in some way
... Has a space for UI to appear
... You don't get the response until permission granted
... Want to configure a fake device that will respond in certain ways to bluetooth radio
... Then want to assert that those responses make it to the browser and ..

Florian: Or geolocation, pretend to be somewhere

rniwa: if you're mocking it, don't know if it actually works or not

jyasskin: we test half of the thing
... We can use same functions to configure a physical device that actually tests the whole stack.
... Would like tests to work for both
... Produce same results
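The pattern jyasskin describes (configure a fake device, then assert its canned responses surface through the API) can be sketched in plain JavaScript. Every name here, `FakeBluetoothDevice` and `requestDevice` included, is a hypothetical stand-in for a browser-internal mocking hook, not the real Web Bluetooth API:

```javascript
// Hypothetical sketch of mock-device testing; nothing here is a real API.
class FakeBluetoothDevice {
  constructor(name, services) {
    this.name = name;
    // Canned GATT data, e.g. { battery_service: { battery_level: 42 } }
    this.services = services;
  }
  // Respond to a characteristic read with the canned value.
  readCharacteristic(service, characteristic) {
    return this.services[service][characteristic];
  }
}

// Stand-in for the platform half of the stack: in a browser test this
// would be the real API backed by a fake adapter; in a lab it could be
// backed by a physical programmable test device, producing the same results.
function requestDevice(fakeAdapter, filter) {
  const device = fakeAdapter.find((d) => d.name === filter.name);
  if (!device) throw new Error("no matching device");
  return device;
}

// Configure the fake, make the API call, assert the response got through.
const adapter = [
  new FakeBluetoothDevice("HeartMonitor", {
    battery_service: { battery_level: 42 },
  }),
];
const device = requestDevice(adapter, { name: "HeartMonitor" });
console.log(device.readCharacteristic("battery_service", "battery_level")); // prints 42
```

The point of the design is that the same configuration calls could drive either the fake adapter or a real test device, so tests produce the same results in both setups.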

rniwa: Suppose it's a bluetooth device; you need to have a very specific device for a very specific result

jyasskin: There are test devices that allow responding in specific ways

rniwa: The Bluetooth Q tool?

jyasskin: yes
... We want to run these in a lab, but also want to run tests in multiple browsers, and to be able to run the tests if you don't have the lab
... I have a spec for this set of testing functions, but it happens to be what we wrote in Chrome; the names are terrible, and there's no consensus on this

rniwa: Seems like a prime candidate for webdriver
... Seems like the right place to do this kind of stuff

jyasskin: Talked to them yesterday
... For first part, controlling dialog, yes, but 2nd part maybe not

fantasai: We're off-topic. Going back to r12a's 2nd question, clarifying the topic

<jyasskin> Sorry for asking the off-topic question. :)

<gsnedders> fantasai: the topic is actually primarily getting everything synchronized and getting everyone running it, so we don't have duplicate tests being written and can then spend more resources on writing different tests


jgraham: Going back to synchronization
... The situation we have at Mozilla atm is worth talking about, because it's better than what anyone else has
... For web platform tests, we have a script that pulls in the upstream repository and replaces our copy with the upstream copy
... Which would be fine, and what we had at the beginning
... But didn't allow devs an easy workflow to submit tests
... So what we did then is we added functionality that allows devs to land patches on a local copy of the test
... And before we do a pull, those patches get upstreamed
... The web platform tests review policy is that as long as the review was public, it's accepted as a review
... We upstream the patches, and then pull down the changes from the w3c master
... We then have to update metadata about which changed tests pass/fail
... That takes about a day, mostly automatic and just waiting
... With CSS, the tests that we run aren't the tests that we submit
... We submit source files, but run built tests
... So our devs can't patch the tests

SimonSapin: CSS tests have a build system

jgraham: The build system changes the tests

r12a: Do we still need that?
... It was introduced years ago when we had to deal with XHTML-only implementations

gsnedders: Part of it was to get out of CR

fantasai: Was not just getting out of CR, but also that we wanted the tests to be able to run in more CSS implementations than just browsers

<gsnedders> fantasai: we can probably change the build system at this point so that the HTML copy of the tests is just a pass-through; we do need to parse it to extract the metadata, but we can probably change that


jgraham: Need that; also need to accept reviews from other organizations

rniwa: I have the opposite problem: I get reviews where I need to change the test, and don't have time to go back and fix the test.

Florian: Depending on who's reviewing the test, can get comments on "would be better to fix this, but not necessary" vs. "this test is incorrect"

?: The same way that browsers review patches, they also need to review tests

fantasai: I've noticed in a lot of cases, tests aren't reviewed, just "yay, you have tests, good good check it in"

jgraham: Once we have tests running everywhere, breakage on a different impl will highlight test errors

gsnedders: Better to run it, because ppl running tests will notice it

Florian: If the tests fail when they should not, will catch it. If the tests pass when they should not, will not catch it.

<gsnedders> == about:blank about:blank is a great test right?

Florian: e.g. a test that's written such that it can't fail

fantasai: I think it's fine to go with this approach for browser vendors in our community, just need to be clear that W3C tests might not be correct, need to read spec when implementing
... I'm concerned that people will try to fix implementation to match the tests instead of reporting errors in the tests
... Esp. implementers in China, Japan, places where they don't speak English and aren't as well integrated into this community

gsnedders: What do people need to get this to work?

jgraham: automate more tests
... Need to make it easy for people to fix tests locally and upstream the fixes

[discussion about out-of-date documentation]

JohnJansen: Having documentation all in one place makes it easy to integrate tests
... We import tests every week, run them every day
... Our automation for CSS tests is screenshot based, very problematic, want to switch to reftests
... It's hard for us
... There's tribal knowledge necessary to contribute to CSSWG tests

gsnedders: So need to make it easier to contribute tests

rniwa: Make it easier for browser vendors to sync
... Need to import all tests automatically
... Not set by set

fantasai: Mozilla has a directory that gets automatically synced, but nobody puts tests into that directory for some reason

?: For Blink, it isn't about import directories, but about it being on a different server

mkwst: Because of special headers etc.

mikewest: Setting up a server is historically difficult

jgraham: The difficulty at Mozilla has been that people are more familiar with existing tools, so use Mozilla-specific tools instead of wpt
... Or they need browser-specific APIs

mkwst: For some layout tests we use browser-specific stuff; it could possibly be done with DOM, but it's very tricky

JohnJansen: Blink's layout tests require internal calls

mkwst: We have things we want to do to set up the browser in certain ways; we need to use internal APIs

jyasskin: No consensus on standardizing these APIs

jgraham: Historically ppl have been hesitant to standardize these APIs

rniwa: Click events, etc., could be done through webdriver
... halfway there
... can add more of that stuff
... tricky ones, e.g. geolocation
... I'd imagine that feature would be very useful for random websites, too
... Let's say you're trying to show UI based on location
... If you could tell the browser to pretend it's in Japan, that could be useful
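rniwa's geolocation example can be sketched as follows; `fakeNavigator` and `setGeolocationOverride` are hypothetical test plumbing of the kind that could be exposed through WebDriver, not a real API (the stub object lets the sketch run outside a browser):

```javascript
// Hypothetical sketch: a test-only geolocation override.
const fakeNavigator = {
  geolocation: {
    // Default stub position before any override is applied.
    getCurrentPosition(success) {
      success({ coords: { latitude: 0, longitude: 0 } });
    },
  },
};

// Test hook: make the "browser" pretend to be somewhere else.
function setGeolocationOverride(nav, latitude, longitude) {
  nav.geolocation.getCurrentPosition = (success) =>
    success({ coords: { latitude, longitude } });
}

setGeolocationOverride(fakeNavigator, 35.68, 139.69); // roughly Tokyo
fakeNavigator.geolocation.getCurrentPosition((pos) => {
  console.log(pos.coords.latitude); // prints 35.68
});
```

A page's location-dependent UI could then be exercised under any pretended position without a physical move, which is the class of feature jgraham suggests adding to webdriver below.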

jgraham: Class of features would be useful for adding to web driver
... There's also a class of really internal stuff, that nobody wants to expose to authors
... Like "now trigger a garbage collection", which would be a disaster to have on the web
... That's the thing where ppl go, well, since I need to use this internal API in 1 in 10 tests, I'll write all the tests using that API

mkwst: Also tests that, for convenience, print things from browser internals
... For resource loading e.g.
... We wouldn't want that on the web
... There's a class of things we want to test
... Hard to test with content-level apis

fantasai: For tests where we have the ability, we need to address people writing tests in the wrong format without any real excuse
... So how do we do that?

[discussion of review policies requiring tests to be in the right format]

jgraham: For Servo, policy is if you can write a test that we can upstream, do it.
... If you can't, use the same harness if possible, but put it into a different directory
... But Servo doesn't have a history, so its devs haven't learned to do otherwise
... The problem with Google, Mozilla, etc. is they have 500 engineers who've been doing something different for 10 years

fantasai: If we can get just the reviewers to switch over and enforce that switch on the patches they review, then we can make that shift happen

jgraham: Need to make it easy enough for the reviewers to do that, so if it's hard currently need to fix that.

Florian: We also have presto-testo repo with lots of tests

jgraham: There has been some effort

Florian: There's still 80,000 files in it

gsnedders: Not much of it is interesting

gsnedders, jgraham: Let's keep talking about this

jgraham: feel free to chat with us
... Discussion is on public-test-infra@w3.org
... CSS also has public-css-testsuite@w3.org

gsnedders: Relatively low traffic atm

1. Change the build system at CSSWG

2. Fixing ttwf documentation to make CSSWG testing info findable and up-to-date

3. Automate more tests

4. Infrastructure for automating manual tests in cross-browser way

5. Get browser vendors to agree that all new tests should be wpt/testharness/reftest format

rniwa: testharness is verbose

jgraham: It's a lot better now

<jyasskin> FWIW, Blink folks _do_ write tests in testharness, but so many things are impossible there.

[need to reduce required metadata in tests]

r12a: People leaving out assertions is problematic, can't tell what's being tested


<gsnedders> ACTION: Change the build system at CSSWG [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action01]

<gsnedders> ACTION: Fixing ttwf documentation to make CSSWG testing info findable and up-to-date [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action02]

<gsnedders> ACTION: Automate more tests [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action03]

<gsnedders> ACTION: Infrastructure for automating manual tests in cross-browser way [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action04]

<gsnedders> ACTION: Get browser vendors to agree that all new tests should be wpt (testharness/reftest) format [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action05]

<gsnedders> ACTION: Reduce metadata requirements in CSSTS [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action06]

<gsnedders> RRSAgent: make the minutes


Summary of Action Items

[NEW] ACTION: Automate more tests [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action03]
[NEW] ACTION: Change the build system at CSSWG [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action01]
[NEW] ACTION: Fixing ttwf documentation to make CSSWG testing info findable and up-to-date [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action02]
[NEW] ACTION: Get browser vendors to agree that all new tests should be wpt (testharness/reftest) format [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action05]
[NEW] ACTION: Infrastructure for automating manual tests in cross-browser way [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action04]
[NEW] ACTION: Reduce metadata requirements in CSSTS [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action06]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.140 (CVS log)
$Date: 2015/10/28 07:19:38 $
