See also: IRC log
jgraham: We have two places where
we collect tests
... We have web platform tests, github repo, can submit tests
using normal github workflow
... Has tests for most specs apart from CSS
... CSS has a separate repo, for historical reasons
... It is actually an hg repo, but has a mirror in github
... There are slightly different requirements for each set of
tests
... We have a site, testthewebforward.org, which attempts to
explain, occasionally inaccurately, how you write and submit tests
... Kinds of things we can test atm:
... 1. Things you can access through JS DOM Apis, using
testharness.js
... which gives you a way to write JS test
... 2. reftests, which are for things that depend on
layout/rendering
... You create two versions of a document, one that uses
feature you're testing, and another that's supposed to have
identical rendering, but using simpler technologies
(specifically, not the feature being tested)
... We also accept manual tests as well
... There's a design to add some sort of automation to that,
for things that can currently only be done with manual tests or
only browser-internal testing APIs
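As a rough sketch of the first category, a testharness.js test is just an HTML page that loads the harness and registers script assertions (the test here is a trivial invented example):

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>document.title reflects the title element</title>
<!-- harness scripts, served from wpt's standard /resources/ path -->
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
// test() registers a synchronous test; assert_equals fails it on mismatch
test(function() {
  assert_equals(document.title,
                "document.title reflects the title element");
}, "document.title matches the <title> element");
</script>
```

A reftest, by contrast, is a pair of pages linked with `<link rel="match">`: the test page uses the feature under test, the reference reaches the same rendering by simpler means, and the two are compared as rendered output rather than via script.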
Florian: In Opera we distinguished between tests requiring interaction, and others for which you couldn't create a reference file, but could, after a passing run, compare subsequent runs by screenshot
e.g. for gradients or something like that
SimonSapin: Gecko has fuzzy reftests, where only a few pixels are off. You need to specify how many pixels may differ, and by how much
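The fuzzy comparison being described can be sketched as a predicate over two pixel buffers with two thresholds: the largest allowed per-channel difference, and the maximum number of pixels allowed to differ at all (a hypothetical sketch, not Gecko's actual implementation):

```javascript
// Hypothetical fuzzy-reftest comparison (not Gecko's real code).
// Images are flat arrays of 8-bit channel values (e.g. RGBA).
// maxDelta: largest tolerated per-channel difference for any pixel.
// maxDifferent: how many pixels may differ (within maxDelta) at all.
function fuzzyMatch(imgA, imgB, maxDelta, maxDifferent, channels = 4) {
  if (imgA.length !== imgB.length) return false;
  let differingPixels = 0;
  for (let px = 0; px < imgA.length; px += channels) {
    let delta = 0;
    for (let c = 0; c < channels; c++) {
      delta = Math.max(delta, Math.abs(imgA[px + c] - imgB[px + c]));
    }
    if (delta > maxDelta) return false;   // a pixel is too far off
    if (delta > 0) differingPixels++;     // off, but within tolerance
  }
  return differingPixels <= maxDifferent;
}
```

With both thresholds at 0 this degrades to an exact match; a gradient that antialiases slightly differently across platforms could pass with, say, a delta of 2 over a few hundred pixels.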
<gsnedders> (Presto-based Opera, that was)
jgraham: Those are implementation details at Opera
... One of the goals that I have, at least, is to allow as many
tests as possible to be run in a continuous integration
system
... Even with all of Opera's infrastructure, couldn't run those
in continuous integration
Florian: yes
gsnedders: Sort of
jgraham: You can do it if you're
running a handful of integrations a day
... If you're running 100s of integrations a day, they can't
all require manual fiddling with tests
... Even in Opera's case, slight tweaks would need
re-evaluation; we ended up with thousands of possible
renderings
gsnedders: I don't think anyone has a desire to do this
jgraham: Other thing is, in terms
of where we are wrt running the tests
... At Mozilla we run almost all the web platform tests in
automation for each commit
... Have an open source impl called wptrunner, which is
somewhat browser-independent
... There's a pull request to get edge support
... Also have it running in servo
... Also in Servo, run a subset of the CSS tests using same
test runner
... That's the State of the Union
jyasskin: What proportion of tests is Chrome actually running?
jgraham: Hoping someone can say
... Chrome has a heavyweight import process
... As a result, nobody actually imports stuff
... So Chrome is only running tests they upstreamed, but not
taking bugfixes down
ojan: Really adhoc, so ppl who are motivated wrt a test suite will pull new versions
?: We made some effort to run wptserve so we can run the tests as written
mikewest: That work is 80% done, but had problems finding people to do the last 20%
?: Once that's done, will be able to just run all of the tests
ojan: Are you working on this?
mikewest: Intern did most of it
gsnedders: So, CSS test
suite
... Has basically same types of tests
... As wpt
... But also has more metadata, such that it's possible to find
out which ones can be screenshotted
... Ideally we want to get rid of all the screenshot-compared
tests
fantasai: Can get rid of most of them, but not quite all
... Some can't be turned into reftests
... E.g. can't test for underlines
... Can test that it's not underlined, but underline
thickness and position aren't defined by the spec
JohnJansen: But converting the reftestable tests is a big undertaking
gsnedders: We should have all new tests with references
Florian: When possible
rniwa: That's only for things
that need visual comparison
... Should be JS test if possible
<jyasskin> thx gsnedders
JohnJansen: CSSWG resolved on JS test first, reftest second, manual test third
fantasai: No, resolved on reftest
or JS test (automatable) over manual
... Didn't want to prefer JS over reftest because CSS has
non-JS implementations
...
r12a: ...
... Can we go over objectives for this session?
<gsnedders> the first was about viewing test results
jgraham: In regard to first
thing, there is a tool that was built for visualizing which
tests pass
... show you tests that pass
... but that doesn't work with the output from wptrunner
... There's another in-browser runner that people were using
for getting specs to CR
... rather than continuous integration systems
... That will output in ways that can be read by this
tool
... For general web platform tests in wptrunner, we don't have a
way to visualize results
... CSS has other systems that allow running tests online and
storing test results, including UA data
... Should allow slurping in those test results and displaying
them
... And display results for 200,000 tests
gsnedders: Should integrate with continuous integration systems
jyasskin: Totally unrelated to
what's discussed so far. How do we do WebBluetooth or USB
testing or geolocation
... To test those, you have to tell the browser how to respond
to those tests
... We don't currently have a way to do that
... This group should tell us how
rniwa: What are you saying?
jyasskin: For Web Bluetooth, the
implementation we have right now doesn't go end to end. It tests
the platform-independent part of Chrome
... As part of the test, need to make the API call that shows
dialog, tell browser how to respond to dialog
... Spec says there should be some prompt, or that the user
grants permission in some way
... Has a space for UI to appear
... You don't get the response until permission granted
... Want to configure a fake device that will respond in
certain ways to bluetooth radio
... Then want to assert that those responses make it to the
browser and ..
Florian: Or geolocation, pretend to be somewhere
rniwa: if you're mocking it, don't know if it actually works or not
jyasskin: we test half of the
thing
... We can use same functions to configure a physical device
that actually tests the whole stack.
... Would like tests to work for both
... Produce same results
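The shape being described might look like the following: tests talk to one setup interface, and either an in-process fake or a bridge to physical lab hardware sits behind it. Every name here is invented; no such API has been standardized:

```javascript
// Hypothetical sketch; all names are invented, nothing here is a real API.
class FakeBluetoothDevice {
  constructor(responses) { this.responses = responses; }
  // Answer a characteristic read with the preconfigured value.
  readCharacteristic(name) { return this.responses[name]; }
}

// Test-facing helper: in a lab run this would configure real hardware
// behind the same interface; elsewhere it returns an in-process fake.
function configureTestDevice(responses, { lab = false } = {}) {
  if (lab) throw new Error("lab hardware bridge not part of this sketch");
  return new FakeBluetoothDevice(responses);
}

const device = configureTestDevice({ battery_level: 87 });
// The test then asserts the configured response reaches the caller.
console.log(device.readCharacteristic("battery_level")); // 87
```

The point is that a run against real lab hardware and a run against the fake would exercise the same test body and should produce the same results.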
rniwa: With a Bluetooth device, you'd need a very specific device for a very specific result
jyasskin: There are test devices that allow responding in specific ways
rniwa: The Bluetooth Q tool?
jyasskin: yes
... Want to run in a lab, but also want to run tests in multiple
browsers, and be able to run the tests if you don't have the
lab
... I have a spec for this set of testing functions, but it
happens to be what we wrote in Chrome; the names are terrible
and there's no consensus on it
rniwa: Seems like a prime
candidate for webdriver
... Seems like the right place to do this kind of stuff
jyasskin: Talked to them
yesterday
... For first part, controlling dialog, yes, but 2nd part maybe
not
fantasai: We're off-topic. Going back to r12a's 2nd question, clarifying the topic
<jyasskin> Sorry for asking the off-topic question. :)
<gsnedders> fantasai: the topic is actually primarily getting everything synchronized and getting everyone running it, so we don't have duplicate tests being written, and then we can spend more resources on writing different tests
[...]
jgraham: Going back to
synchronization
... Situation we have at Mozilla atm is worth talking about,
because better than anyone else has
... For web platform tests, we have a script that allows us
to.. it pulls in the upstream repository and replaces our copy
with the upstream copy
... Which would be fine, and what we had at the beginning
... But didn't allow devs an easy workflow to submit
tests
... So what we did then is we added functionality that allows
devs to land patches on a local copy of the test
... And before we do a pull, those patches get
upstreamed
... web platform test review policy is that as long as review
was public, it's accepted as a review
... We upstream the patches, and then pull down the changes
from the w3c master
... We then have to update metadata about which changed tests
pass/fail
... That takes about a day, mostly automatic and just
waiting
... With CSS, the tests that we run aren't the tests that we
submit
... We submit source files, but run built tests
... So our devs can't patch the tests
SimonSapin: CSS tests have a build system
jgraham: The build system changes the tests
r12a: Do we still need
that?
... It was introduced years ago when we had to deal with
XHTML-only implementations
gsnedders: Part of it was to get out of CR
fantasai: Was not just getting out of CR, but also that we wanted the tests to be able to run in more CSS implementations than just browsers
<gsnedders> fantasai: we can probably change the build system at this point so that the HTML copy of the tests is just a pass-through; we do need to parse it to extract the metadata, but we can probably change that
jgraham: Need that; also need to accept reviews from other
organizations
...
rniwa: I have the opposite problem: I get reviews saying the test needs to change, and don't have time to go back and fix the test.
Florian: Depending on who's reviewing the test, can get comments on "would be better to fix this, but not necessary" vs. "this test is incorrect"
?: The same way that browsers review patches, they also need to review tests
fantasai: I've noticed in a lot of cases, tests aren't reviewed, just "yay, you have tests, good good check it in"
jgraham: Once we have tests running everywhere, breakage on a different impl will highlight test errors
gsnedders: Better to run it, because people running the tests will notice it
Florian: If the tests fail when they should not, will catch it. If the tests pass when they should not, will not catch it.
<gsnedders> == about:blank about:blank is a great test right?
Florian: e.g. test that written to not fail
fantasai: I think it's fine to go
with this approach for browser vendors in our community, just
need to be clear that W3C tests might not be correct, need to
read spec when implementing
... I'm concerned that people will try to fix implementation to
match the tests instead of reporting errors in the tests
... Esp. implementers in China, Japan, places that don't speak
English and aren't as well integrated into this community
gsnedders: What do people need to get this to work?
jgraham: automate more
tests
...
... Need to make it easy for people to fix tests locally and
upstream the fixes
[discussion about out-of-date documentation]
JohnJansen: Having documentation
all in one place makes it easy to integrate tests
... We import tests every week, run them every day
... Our automation for CSS tests is screenshot based, very
problematic, want to switch to reftests
... It's hard for us
... There's tribal knowledge necessary to contribute to CSSWG
tests
gsnedders: So need to make it easier to contribute tests
rniwa: Make it easier for browser
vendors to sync
... Need to import all tests automatically
... Not set by set
fantasai: Mozilla has a directory that gets automatically synced, but nobody puts tests into that directory for some reason
?: For Blink, it isn't about import directories, but about it being on a different server
mkwst: Because of special headers etc.
mikewest: Setting up a server is historically difficult
jgraham: The difficulty at
Mozilla has been that people are more familiar with existing
tools, so use Mozilla-specific tools instead of wpt
... Or they need browser-specific APIs
mkwst: Some layout tests use browser-specific stuff that could possibly be done with DOM, but it's very tricky
JohnJansen: Blink's layout tests require internal calls
mkwst: We have things we want to do to set up the browser in certain ways; need to use internal APIs
jyasskin: No consensus on standardizing these APIs
jgraham: Historically ppl hesitant to standardize these APIs
rniwa: Click events, etc., could
be done through webdriver
... halfway there
... can add more of that stuff
... tricky ones, e.g. geolocation
... I'd imagine that feature would be very useful for random
websites, too
... Let's say you're trying to show UI based on location
... if you could tell browser to pretend it's in Japan could be
useful
jgraham: Class of features would
be useful for adding to web driver
... There's also a class of really internal stuff, that nobody
wants to expose to authors
... Like "now trigger a garbage collection", which would be a
disaster to have on the web
... That's the thing where people go, well, since I need to use
this internal API in 1/10 tests, I'll write all the tests using
that API
mkwst: Also tests that, for
convenience, print things from browser internals
... For resource loading e.g.
... We wouldn't want that on the web
... There's a class of things we want to test
... Hard to test with content-level apis
fantasai: For tests where we have
the ability, we need to address people writing tests in the
wrong format without any real excuse
... So how do we do that?
[discussion of review policies requiring tests to be in the right format]
jgraham: For Servo, policy is if
you can write a test that we can upstream, do it.
... If you can't, using the same harness is still policy, but
put it into a different directory
... But Servo doesn't have a history; its devs haven't
learned other workflows
... The problem with Google and Mozilla etc. is they have 500
engineers who've been doing something different for 10
years
fantasai: If we can get just the reviewers to switch over and enforce that switch on the patches they review, then we can make that shift happen
jgraham: Need to make it easy enough for the reviewers to do that, so if it's hard currently need to fix that.
Florian: We also have presto-testo repo with lots of tests
jgraham: There has been some effort
Florian: There's still 80,000 files in it
gsnedders: Not much interesting
gsnedders, jgraham: Let's keep talking about this
jgraham: feel free to chat with
us
... Discussion is on public-test-infra@w3.org
... CSS also has public-css-testsuite@w3.org
gsnedders: Relatively low traffic atm
1. Change the build system at CSSWG
2. Fixing ttwf documentation to make CSSWG testing info findable and up-to-date
3. Automate more tests
4. Infrastructure for automating manual tests in cross-browser way
5. Get browser vendors to agree that all new tests should be wpt/testharness/reftest format
rniwa: testharness is verbose
jgraham: It's a lot better now
<jyasskin> FWIW, Blink folks _do_ write tests in testharness, but so many things are impossible there.
need to reduce required metadata in tests
r12a: People leaving out assertions is problematic, can't tell what's being tested
<gsnedders> ACTION: Change the build system at CSSWG [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action01]
<gsnedders> ACTION: Fixing ttwf documentation to make CSSWG testing info findable and up-to-date [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action02]
<gsnedders> ACTION: Automate more tests [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action03]
<gsnedders> ACTION: Infrastructure for automating manual tests in cross-browser way [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action04]
<gsnedders> ACTION: Get browser vendors to agree that all new tests should be wpt (testharness/reftest) format [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action05]
<gsnedders> ACTION: Reduce metadata requirements in CSSTS [recorded in http://www.w3.org/2015/10/28-testing-minutes.html#action06]
Present: Florian, JohnJansen, SimonSapin, dom, fantasai, gsnedders, jgraham, jyasskin, kawai, mkwst, ojan, r12a, rniwa, shoko, yosuke
Scribe: fantasai
Date: 28 Oct 2015
Minutes: http://www.w3.org/2015/10/28-testing-minutes.html