W3C

- DRAFT -

Core Mobile Web Platform Community Group Teleconference

26 Jun 2012

See also: IRC log

Attendees

Present
Wonsuk_Lee, Ming_Jin
Regrets
Chair
Jo Rabin, Robin Berjon
Scribe
Josh_Soref

Contents


<trackbot> Date: 26 June 2012

<scribe> Scribe: Josh_Soref

Testing

darobin: topic for today is Testing Testing Testing
... with maybe a little on vendor prefixing
... yesterday we talked about QoI tests
... conformance tests
... prioritizing interop issues
... testing the untestable
... we had a notion of testing for areas
... "categorizing testing/levels"

[ darobin live edits a text file ]

rob: you might be interested in building a web app that's primarily an audio player
... you might really care about ring 2+3 and only ring 1 of typography

tobie: rob's point goes in the direction of the point that Josh_Soref made yesterday
... leveling doesn't make sense for extra features

dehgan: polling app developers
... "what features do you need for these themes"

DanSun: we might want a video category

[ Scribe isn't going to transcribe the text file ]

mattkelly: the need to automate tests....

[ chairs bicker at each other over testing the untestable ]

tobie: categorization is useful
... but a goal of this project is to fight fragmentation
... having a device that's a good fit for some apps and not others
... is a problem
... i want to raise a flag about this

jo: surely it's legitimate to have devices with a specific purpose in mind

tobie: for the vast majority of mobile devices people are interested in
... i'd argue it's less so

jo: say you're building a car navigation app

tobie: it's not mobile

jo: it's "mobile scoped, not mobile specific"
... rob, why don't you lead us on QoI?

rob: i don't know how to do this
... it's the thing that causes us the most problems:
... browsers not quite behaving right

jo: give us an example

rob: there are 3 examples that sum up the problems
... 1. password field
... if it has lots of dom elements before it, it hangs when you press backspace
... we attach a dom listener and clear it if it had one character
... 2. browser crashes if you have a thing to define a schema
... 3. browser clears local storage if you get a large calendar invite
... it took us 6 months to reach what we think is a reproducible test case for that last one

darobin: some of the tests you mention are egregious corner cases of one browser
... hopefully in a single version of the browser
... we could have a test suite for that
... but it would require automation driving
... and it's more in the field of regression testing
... than QoI

tobie: i agree w/ darobin
... you end up w/ test suites targeted at existing browser bugs
... and browser vendors don't like that

rob: absolutely
... and it makes the browsers you build for look like they're the worst
... conformance to spec is something we don't pay attention to
... we need to focus on real devices
... nuances that don't quite work
... we need to deliver now
... waiting for things to improve isn't an option

darobin: conformance testing brings a lessening
... of problems with time
... there's a reason no one's asking about GIFs or Tables

Josh_Soref: only in the last 5 years (gifs were crashing before)
... (tables may have been problematic more recently)

darobin: performance... not hardware accelerated graphics
... css animations
... where the frame rate suddenly drops to 1/5 s
... those are more common
... i think fixing those things can help

rob: i think we're close to the problem of defining what a device is capable of
... and detecting if it's doing well enough
... or doing badly
... we have flags to detect "fastish" or "slowish"
... and vary how much we do based on how fast we perceive the device to be
... that isn't correlated to the absolute performance of the hardware
... it correlates to the browser
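
[ A minimal sketch of the kind of runtime "fastish"/"slowish" flag rob describes; the workload size and the 50ms threshold are illustrative assumptions, not FT's actual code: ]

```js
// Rough runtime speed probe: time a small DOM-heavy workload, then bucket
// the device+browser combination as "fastish" or "slowish".
function probeSpeed() {
  var container = document.createElement('div');
  document.body.appendChild(container);
  var start = Date.now();
  for (var i = 0; i < 200; i++) {
    var el = document.createElement('span');
    el.textContent = 'x';
    container.appendChild(el);
  }
  container.offsetHeight; // force a synchronous layout so timing includes rendering cost
  var elapsed = Date.now() - start;
  document.body.removeChild(container);
  return elapsed < 50 ? 'fastish' : 'slowish';
}

// e.g. if (probeSpeed() === 'slowish') { /* scale back animations and effects */ }
```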

darobin: there's a relationship
... part of what we've talked about before wrt QoI
... is whether it's doable
... and people get performance testing wrong most of the time
... i'd like to find out if this group wants to do it
... and has the right resources to do it right

Josh_Soref: i want to praise FT for doing the right thing
... namely to detect performance
... and then adjusting what they do based on it

tobie: among the QoI issues
... are those that i added to the spec yesterday
... asked on and on again by game makers
... speed of canvas
... speed of css animation
... multiple sounds together
... latency
... - which is really terrible on some devices
... -- close to a second on some devices
... things which prevent the game industry from building html games

mattkelly: i'd add physics performance
... and GC pauses
... what i was focusing on in Ringmark early
... was page scrolling
... which affects everyone
... i'd assume including FT

darobin: page scrolling performance
... touch responsiveness is delayed to handle clicks

jo: people use native for touch reasons

darobin: it's deliberate and can be hackily disabled
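
[ A minimal sketch of the hacky workaround darobin alludes to: the ~300ms delay exists so browsers can distinguish taps from double-tap zoom, and acting on touchend sidesteps it. Real libraries such as FastClick also handle touchmove, scrolling and multi-touch; this sketch ignores those cases: ]

```js
// Minimal "fast tap": act on touchend instead of waiting for the delayed
// synthetic click, and suppress the click that follows.
function fastTap(element, handler) {
  var tapped = false;
  element.addEventListener('touchend', function (e) {
    tapped = true;
    e.preventDefault(); // suppress the delayed synthetic click
    handler.call(element, e);
  }, false);
  element.addEventListener('click', function (e) {
    if (tapped) { tapped = false; return; } // already handled via touchend
    handler.call(element, e); // mouse / non-touch fallback
  }, false);
}
```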

rob: jet, can you talk about testing video output

jet: mozilla has backdoors into firefox to do testing
... for fps
... for e.g. animations

darobin: there's the Browser Testing and Tools WG

jet: it may well be
... i haven't seen a proposal from them

darobin: the scope is anything related to testing a browser
... they'd be allowed to produce technology we're not

tobie: we could write a note to that group

darobin: if you have requirements around that
... then talk to them

jet: for our needs, our requirements are largely met
... for this group you want to be able to test across all
... browsers

itai: just wondering if the answer to these tests is highly dependent on the hardware perf
... to test one compared to another
... maybe we need a way to have a combined grade for a hardware platform
... combining memory bandwidth, computing power, ...
... say "i'm a class B platform"

darobin: that's possible, but it's hard
... we talked about yesterday
... to draw a line and say "this is a typical platform"
... on anything like this or better, you need to do this or better
... if you do something piggishly on a high end hardware, good for you
... for feature phones, you can say you're below that

itai: the idea is captured

mattkelly: my opinion is in line with darobin
... we should have a baseline and go from there
... for level 1, 50 sprites @30fps, any phone should run
... even an iPhone 3
... no Device Capabilities are in the fold
... e.g. NFC
... no one is building apps for that

darobin: we're about to get an NFC WG
... i hear interest in this
... how do we make it actionable
... does someone want to pick a baseline hardware
... i want speed of cpu/gpu

bkelley: you can't quantify performance with a couple of numbers
... different architectures
... memory bandwidth
... cache size

darobin: can we cut corners in a way to be meaningful
... we know it's wrong, but good enough for our purposes

bkelley: by establishing that baseline, we exclude devices

tobie: one issue at the bottom of this is whether we can look at a browser outside the device it's running on
... as an end user, i care about how quickly it runs on my browser on my phone
... they're tied together in a way much deeper than on desktop
... the other aspect is who the audience of these tests is
... for browser vendors, being able to compare matters
... for developers, it matters whether you can build to a phone

mattkelly: 500MHz, no memory
... and completely awesome browser, and does 50fps, and it passes
... maybe we can go w/ numbers for individual target bits
... don't worry about hardware

darobin: say targets for browser-device

Dong-Young: what matters is the combination of browser-hardware

darobin: we can test that
... it just makes more test results

tobie: you can do analysis to compare browsers on 200 different devices

jo: this conversation is going in the direction i want to talk about
... setting a particular hardware spec is the road to ruin
... many a young man has fallen on that road
... it's important to not talk about mobile phone
... say your purpose is to make a "video player"
... it should be testable
... relativistic measures
... are probably the only sensible way of testing
... if i produce a thing and it works abysmally on a device
... it's not useful

mattkelly: i'd argue we need very clear focus
... at least short term
... my opinion is the group should focus on where the market is
... to catch up w/ native
... enable 2d games
... and where people will buy in new markets
... when we hit critical mass
... then it's much easier to talk about more aspirational issues
... focus on current market
... where they're sold and why
... 2d games
... a/v apps
... camera apps

jo: i don't disagree
... i'd say categorizing in a limited and extensible way is a good thing
... i think relativistic measures is a good way

<Zakim> Josh_Soref, you wanted to say target UCs

<fantasai> Josh_Soref: I don't know if it's technically possible to count how many sprites are on the screen in Angry Birds, but a survey of the top N apps in the market, 2d games, video players...

<fantasai> Josh_Soref: Top 3 devices, top 10 apps for a thing, see what they're using

<fantasai> Josh_Soref: Maybe 25 sprites at 30 frames per second

<fantasai> Josh_Soref: You test at 15 frames, 30 frames, 60 frames

<fantasai> Josh_Soref: Figure out how many sounds, test for that

<fantasai> Josh_Soref: you build tests so it can test more than the target, so it can report that

<fantasai> Josh_Soref: then the tests can naturally scale up

<fantasai> Josh_Soref: you can go back and say "This year, we need twice as many sprites"

<fantasai> Josh_Soref: we don't need to rewrite the tests, just change the benchmarks

<fantasai> Josh_Soref: I don't think it's very hard to do most of this. Might be boring. Might be fun

jo: mattkelly you have done sprite counting, or you haven't done sprite counting?

mattkelly: we did this 8 months ago
... we were building jsgamebench
... we built a 2d game bench
... we launched sprite counting in ringmark about 2 weeks ago
... we measure sprites rendering @30fps
... bare minimum
... high games need @60fps
... but that's rare, even on xbox
... it's definitely testable
... but on devices, inbound push notifications can lead to a pause
... causing a fail, same for gc()
... from my perspective, if the pause happens, fail the test anyway
... we're definitely doing sprite counting
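
[ A minimal sketch of sprite counting in the spirit of Ringmark/JSGameBench: draw N sprites per frame for a few seconds and report the average fps. The 50-sprite/30fps numbers come from the discussion; everything else is an illustrative assumption: ]

```js
// Sprite benchmark: render spriteCount sprites per frame for `seconds`
// seconds and report the average fps to the callback.
function spriteBench(canvas, sprite, spriteCount, seconds, done) {
  var ctx = canvas.getContext('2d');
  var frames = 0;
  var start = performance.now();
  function frame(now) {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    for (var i = 0; i < spriteCount; i++) {
      ctx.drawImage(sprite,
                    Math.random() * (canvas.width - sprite.width),
                    Math.random() * (canvas.height - sprite.height));
    }
    frames++;
    if (now - start < seconds * 1000) {
      requestAnimationFrame(frame);
    } else {
      done(frames / ((now - start) / 1000)); // average fps over the run
    }
  }
  requestAnimationFrame(frame);
}

// e.g. spriteBench(canvas, img, 50, 5, function (fps) { /* pass if fps >= 30 */ });
```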

tobie: jo, you were asking about type of sprites in a game

darobin: jo was asking if sprite counting was done

tobie: the answer to that was "yes"

jo: mattkelly just answered that at more length

tobie: a point of cory's research for jsgamebench
... was to define types of games and sprites per game
... cards have max of 5 sprites concurrently
... 25 for 2d platform games

jo: action to tobie to put this into the public domain

<darobin> ACTION: Tobie to provide numbers for required sprites/fps in games [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action01]

<trackbot> Created ACTION-26 - Provide numbers for required sprites/fps in games [on Tobie Langel - due 2012-07-03].

jo: it seems publishing the numbers you're talking about
... it tells developers you need to target this
... and to browser vendors
... the test's job
... is to see if you can do 1fps, 2fps, 6...
... until it barfs
... at that point, you say "you did 25fps", "but you can't do X/Y/Z @fps"
... that's all it should say, not pass/fail
... but there are external qualifiers
... it doesn't matter if you haven't reached that
... external contemporaneous events on a device
... in the event you get an SMS during audio, what happens
... ok, you can do 60fps
... but what happens to the battery
... there's a range of metrics that are testable
... no Pass/Fail criteria
... but perfectly testable

tobie: cory's jsgamebench
... brought to this discussion
... to have anything smooth enough, you need 30fps
... you don't need more than that, except hard core 3d games
... and less doesn't work
... about Battery
... how badly running a game drains the battery
... it goes back to browser-hardware combo
... good browser on bad hardware
... will have the same perf as a bad browser on good hardware
... but good browser will probably drain the battery less than bad browser
... adding that would be good to test

jo: and you can directly compare to find 'good' / 'bad' browser on a single device

darobin: trying to summarize to reach actions
... anyone want to write tests?
... since you joined this group to do testing

jo: i joined this group to talk about testing

mattkelly: the question is who wants to write these tests
... i'm happy to port over what we've done w/ ringmark

jo: can we reverse out the underlying bits
... to codify the tests we want to accomplish

mattkelly: we've done a bit of research for jsgamebench

<girlie_mac> an interesting study on browser battery consumption: http://www2012.org/proceedings/proceedings/p41.pdf

mattkelly: GC pauses can be guessed based on dramatic framerate drops

vidhya: what's a GC pause

<jo> ACTION: mattkelly to document JSGameBench and the approach behind it [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action02]

<trackbot> Sorry, couldn't find user - mattkelly

mattkelly: sorry, Garbage Collection pause

Josh_Soref: GC runs at least partly on the main thread
... historically heavily so, recently less

mattkelly: for <audio>, we're testing from areweplayingyet
... you can't detect a pop, except w/ your ear
... page scrolling
... you need a high speed camera and a robot that flicks it

darobin: for audio testing
... we could have a background audio track
... and whenever you're supposed to have a file overlay
... you have a visual cue
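
[ True output latency needs a microphone or an ear, as noted above; a sketch of a rough in-page proxy follows. The file name is hypothetical, and the measurement misses hardware latency but exposes the gross scheduling delays tobie describes (~1s on some devices): ]

```js
// Rough proxy for audio start latency: time from calling play() to the
// element firing "playing".
function audioStartLatency(url, done) {
  var audio = new Audio(url);
  audio.addEventListener('canplaythrough', function onReady() {
    audio.removeEventListener('canplaythrough', onReady);
    var start = performance.now();
    audio.addEventListener('playing', function onPlaying() {
      audio.removeEventListener('playing', onPlaying);
      done(performance.now() - start);
    });
    audio.play();
  });
}

// e.g. audioStartLatency('beep.wav', function (ms) { console.log(ms + ' ms'); });
```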

tobie: it's doable to write a test
... it's harder to automate
... i wanted to add about physics testing
... and gc pauses
... the guy behind impact.js
... wrote extensively about it
... he had a 1 minute game with pre-controlled movements
... measuring movements
... to recognize GC pauses
... he explained why
... for physics, it's raw js engine perf
... it's not very difficult to script a physics scene and measure how many loops it does
... in a given time
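
[ A minimal sketch of the physics measurement tobie describes: script a scene and count how many simulation steps fit in a fixed time budget. The toy Euler integrator and body count are illustrative assumptions: ]

```js
// Physics throughput as raw JS engine performance: steps completed
// within budgetMs. Higher is better; compare across device+browser combos.
function physicsBench(budgetMs) {
  var bodies = [];
  for (var i = 0; i < 100; i++) {
    bodies.push({ x: i, y: 0, vx: 0.1, vy: 0 });
  }
  var steps = 0;
  var start = performance.now();
  while (performance.now() - start < budgetMs) {
    for (var j = 0; j < bodies.length; j++) {
      var b = bodies[j];
      b.vy += 0.98;        // gravity
      b.x += b.vx;
      b.y += b.vy;
      if (b.y > 500) {     // bounce off the floor
        b.y = 500;
        b.vy = -b.vy * 0.8;
      }
    }
    steps++;
  }
  return steps;
}
```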

jo: in the category of external interrupts
... sms, calls, gc
... anyone have a list?
... there are 2 different categories
... gc isn't really external
... it's part of what you want to test
... you have accidental external events
... it's QoI
... it's stupid if receiving an SMS busts gameplay
... but it isn't fair if it impacts results of the test
... it's hard to reproduce
... gc pauses will give a similar count if you run the test a number of times
... i want to scope this down
... tests in terms of Sprites, FPS
... don't want to characterize testing as what else is going on
... which will have an impact

darobin: but it wouldn't be fair

jo: let's decide SMS is out of scope
... objections?

[ None ]

jo: are we talking about Steady state perf or burst

<darobin> RESOLUTION: Interruptions and slowdowns due to factors external to the browser engine are out of scope for our tests

jo: sustained rate of 30fps but a burst of 60fps
... for 5s
... useful in network testing

Josh_Soref: offhand, not this year

mattkelly: are there UCs for this where things happen ... differently?
... e.g. drawing perf in canvas
... birds just sitting in slingshot
... there's 1 sprite
... when he hits the blocks+pigs, there are 50 sprites
... we should just test for 50 sprites steady

jo: a good example is network interface performance
... queuing effects

rob: some new devices have cameras that capture in burst mode

Josh_Soref: can we rule it out until the end of 2012?

[ Yes ]

jo: no one has mentioned dom manipulation performance

darobin: we have test suites for dom perf

mattkelly: i think every game developer's opinion is canvas is the future
... it has a very granular api
... but some game developers find dom manipulation is faster than canvas on Android
... but let's eliminate that from gaming perspective

Josh_Soref: do we care about accessibility?

jo: we need to put dom manipulation in scope

<marcos_lara> additional info on Benchmarking canvas.

<marcos_lara> "Benchmark Info: Tests the 2D canvas rendering performance for commonly used operations in HTML5 games: drawImage, drawImage scaling, alpha, composition, shadows and text functions."

Josh_Soref: canvas doesn't have an accessibility story today
... but there's an accessibility story coming to html5
... which doesn't have performance tests

<marcos_lara> http://www.kevs3d.co.uk/dev/asteroidsbench/

tobie: saying you can do games fast enough with dom manipulation on a mobile phone

<marcos_lara> test it out and it's open source

tobie: means there's no need to test it

darobin: if people aren't complaining about it
... then it's not an issue

tobie: it's no longer a real performance issue
... coming from a company that built a system like timeline
... which has a huge number of dom nodes
... it's not something we've heard as an issue

mattkelly: i'd agree
... there are more important things to push
... it's not dom manipulation that's important
... it's position:fixed
... i think it's when it's combined with other things
... that leads to problems
... and more important to focus on
... we have a massive feed in timeline
... but position:fixed killed timeline

rob: i was going to echo mattkelly 's point
... momentum scrolling+position:fixed
... they aren't well implemented
... you end up fiddling with them yourself

DanSun: video is an important thing too
... for perf
... do we want to test for video too?
... resolution/fps...

darobin: it's difficult
... one thing to test is battery consumption
... testing fps on <canvas> is easy
... i'm not sure we can do it for <video> w/o underlying engine helping
... i think it's a good idea, not sure how

tobie: your comment on video reminded me
... i heard from folks @orange
... that on a lot of devices, especially iPhone
... playing video isn't done in DOM
... but as a native plugin
... you can't overlay it with stuff
... like commercials

darobin: video controls

tobie: that's an issue
... but it's QoI

jo: jet+rob made a point
... about consistency/flow
... it may pass a 70fps test
... but not smoothly
... do we need to look out for it in QoI

rob: yes, but i'm not sure how other than using an external camera

jo: if it turns out to be impractical, it can drop out

rob: i'm happy to take an action to see if it's practical

jo: it would be nice to indicate to vendors that it's important for animations to be smooth

<darobin> ACTION: Shilston to expeditiously check whether it is practical to measure consistency of framerate [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action03]

<trackbot> Created ACTION-27 - Expeditiously check whether it is practical to measure consistency of framerate [on Robert Shilston - due 2012-07-03].
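
[ A minimal sketch of what ACTION-27 could look like in code: sample requestAnimationFrame deltas for a few seconds and report mean frame time, worst pause, and jitter. Long outliers are often GC pauses, per the "dramatic framerate drop" heuristic discussed earlier; the metric names are illustrative: ]

```js
// Frame-time consistency probe: collect rAF deltas, then summarize.
function frameConsistency(seconds, done) {
  var deltas = [];
  var last = null;
  var start = performance.now();
  function tick(now) {
    if (last !== null) deltas.push(now - last);
    last = now;
    if (now - start < seconds * 1000) {
      requestAnimationFrame(tick);
    } else {
      var mean = deltas.reduce(function (a, b) { return a + b; }, 0) / deltas.length;
      var variance = deltas.reduce(function (a, b) {
        return a + (b - mean) * (b - mean);
      }, 0) / deltas.length;
      done({ meanMs: mean,
             worstMs: Math.max.apply(null, deltas),   // longest pause
             jitterMs: Math.sqrt(variance) });        // standard deviation
    }
  }
  requestAnimationFrame(tick);
}
```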

<fantasai> Josh_Soref: So, on video I think that most browsers are starting to have APIs not standardized to check FPS in their engines

<fantasai> Josh_Soref: Don't know when you'll b eable to do it formally, but think sometime early next year it might be possible at least non-standardly

<fantasai> Josh_Soref: For other forms of testing, a lot of devices have HDMI or displayport or something else

<fantasai> Josh_Soref: Now that might not match the display output, but might be able to write blackbox tester that uses that

<fantasai> Josh_Soref: instead of a camera

<fantasai> Josh_Soref: Also, on some devices where you have platform access, there might be a debug tool that lets you capture video

<fantasai> Josh_Soref: at RIM we have something that captures 1fps

<fantasai> Josh_Soref: I think it may be possible at least on some platforms to capture frame buffers and store that to a file for testing later

<fantasai> Rob: Wondering Jet whether you were able to explain your HDMI capture etc.

<fantasai> Jet: I wouldn't hold that up as a best practice. Largely non-deterministic.

<fantasai> Jet: We try to get close

<fantasai> Jet: but in practice all the browser implementations upload a X to the GPU and ask the hardware to draw

<fantasai> Jet: Beyond that we can't measure

<tobie> GC test: http://html5-benchmark.com/ and related blog post: http://www.phoboslab.org/log/2011/08/are-we-fast-yet by ImpactJS author.

<fantasai> Jet: ... impacts our ability to get 60Hz

<fantasai> Jet: Definitely room for innovation, but need hardware vendors to come back with methods to measure hardware

<fantasai> mattkelly: Not sure how important to measure things like fps, given most devices defer to the native layer

<fantasai> mattkelly: But need things like adaptive streaming

<fantasai> mattkelly: They have a video that's 2 hours long, can actually dial up and down the bandwidth

<fantasai> mattkelly: important for audio as well

<fantasai> mattkelly: can then queue up the next bit at the correct rate

<darobin> RESOLUTION: We are not going to specify baseline hardware, instead we will test device+browser combos

<fantasai> Josh_Soref: we're not just testing device combos, we're testing to targets

<darobin> RESOLUTION: We will specify a number of metrics that will be used to assess the limits of performance of specific device+browser targets

<darobin> RESOLUTION: We will not be testing burst performance for now

<wesj> http://wiki.whatwg.org/wiki/Video_Metrics

<darobin> RESOLUTION: We will be testing in isolation

[ Break ]

darobin: we covered QoI
... i'm somewhat concerned we have Actions for things
... but not Actions to write actual tests
... writing tests is welcome

jo: can we clarify
... do we want text
... or bits of JS?

darobin: i mean actual code
... you may care about <audio> latency and parallelism
... and submit a proposed test to the group

jo: i wonder if there's scope for people who don't write JS to write text, and for others to write the JS to implement it

darobin: it may be useful, but it's hard to describe the JS w/o knowing how to write it

mattkelly: what i'd like to avoid is that people start writing random tests that add no value
... i think it's important to get consensus on level 1
... and the framework to produce them
... and get consensus on the harness
... and get a clear way to coordinate writing these tests
... preferably not the ML
... from my perspective, it's something like github

darobin: i'm not sure we need the same framework for QoI and Conformance tests

jo: do you have a harness you'd like to propose?

mattkelly: i think keeping a lot of the things in mind that we're trying to achieve
... particularly the ability to automate these things
... in Ringmark, we're using QUnit
... it may not be the right thing
... but people know how to use it
... QUnit can compile to the w3c test framework
... but not back the other way
... it may be a potential thing we can use

fantasai: what is ringmark?
... it's a bunch of tests?
... is it a harness?

mattkelly: Ringmark uses
... a lot of QUnit methodology
... it has a runner, a results page
... all of the tests
... and it's built so you could add in automatable tests
... so long as they don't require single page instances
... and you can run it through the QUnit test runner as well

fantasai: so it's a framework for running JS that has to be in the same top level page

mattkelly: they can use iframe fixtures
... if you go to http://rng.io

fantasai: if you put 10,000 iframes in a page
... that's a major conformance test on iframes

darobin: you test memory leaking fairly efficiently
... one thing i'm unclear about the differences between QUnit and testharness
... i've used both
... i can do the same thing in both

mattkelly: you can
... we ran into a lack of documentation + direction in how you write these things
... these are fixable things
... there might be some overhead
... documentation is a big thing
... how tests are set up
... it's a lot harder to run in an automated fashion
... each test is meant to have an entire page defined
... for a <canvas> test, you have to have <head>, <body>

darobin: the reason i'm pushing back here
... we need to integrate with existing test suites
... we have thousands of tests using testharness
... i'd like to avoid conversion

mattkelly: there are probably tens of thousands of tests
... they are of varying quality/implementations
... they're all over the map
... some include other harnesses
... it seems like tests were of mixed quality

darobin: one thing that would be useful would be to have documentation on these issues
... testharness is THE STANDARD for HTML, WebApps, DAP, etc.

scribe: even if we agreed there was a better alternative, i don't think we could convince them to convert
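
[ For reference, a minimal testharness.js test page of the kind being discussed; the /resources/ script paths follow the W3C test repository convention, and the localStorage check is purely illustrative: ]

```html
<!DOCTYPE html>
<title>localStorage round-trip</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
test(function () {
  localStorage.setItem("coremob", "42");
  assert_equals(localStorage.getItem("coremob"), "42",
                "value read back from localStorage");
  localStorage.removeItem("coremob");
}, "localStorage set/get round-trips a value");
</script>
```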

mattkelly: from ringmark's perspective, it was about moving fast
... we had limited resources
... we had a goal of automating these things
... from OEMs and vendors I talked to
... none seem to run these
... they don't run testharness when they do device QA
... a goal should be to have Vendors run these so they can fail them

jet: in general, we don't go running the entire W3 test suite
... before we ship a browser
... it takes more than 24 hours
... to the other extent, anything that claims to test the browser in 60s isn't trustworthy
... ringmark could be useful for something in the middle
... for Mozilla, we can't commit to a third, fourth or fifth harness

<darobin> http://w3c-test.org/framework/app/suite

darobin: you have a list of test suites
... suites test specifications
... you can look at results
... you can run tests
... you can load a runner
... there's a JSON API on this Database
... if you can have a Requirements Doc of what you'd like to see
... it would be possible for us, you, or a third party, to get a list of these tests
... run them, etc.
... to get something that could run in 15 minutes
... running 10,000 tests, and you could cherry-pick

jet: sure

darobin: you could find bugs in the test
... and presumably file them
... and hopefully find more bugs in the browsers

fantasai: i don't think you can cherry-pick a bunch of tests
... and say here's a test of the web stack

darobin: i meant cherrypicking whole suites

fantasai: like ACID tests,
... we shouldn't build an ACID test

darobin: i meant more the ones you can run automatically

jet: a basic need i ran into
... i'm hacking firefox
... i put it on my phone
... i couldn't find a way to run the w3c suite against us

fantasai: importing the suite into tinderbox

jet: that works for us, but we're trying to address everyone

<Zakim> Josh_Soref, you wanted to talk about flaws in tests

<fantasai> Josh_Soref: I wanted to talk about flaws in tests

<fantasai> Josh_Soref: Most browser testers have laughed at tests they've looked at, for the flaws they've found in the tests

<fantasai> Josh_Soref: But I don't think anyone has made a list of common mistakes

<fantasai> Josh_Soref: e.g. not scoping variables

<fantasai> Josh_Soref: Would be helpful to have a list for new test authors to write better tests

<darobin> ACTION: Josh to survey people and compile a list of common errors in test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action04]

<trackbot> Sorry, amibiguous username (more than one match) - Josh

<trackbot> Try using a different identifier, such as family name or username (eg. jkerr, jsoref)

<darobin> ACTION: Robin to write documentation for testharness.js [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action05]

<trackbot> Created ACTION-28 - Write documentation for testharness.js [on Robin Berjon - due 2012-07-03].

<darobin> ACTION: timeless to survey people and compile a list of common errors in test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action06]

<trackbot> Sorry, couldn't find user - timeless

<darobin> ACTION: Soref to survey people and compile a list of common errors in test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action07]

<trackbot> Created ACTION-29 - Survey people and compile a list of common errors in test writing [on Josh Soref - due 2012-07-03].

<fantasai> darobin: Problems I saw in Ringmark were feature tests, not perf tests

<fantasai> darobin: First 5% of testing something

<fantasai> darobin: rest was missing

<fantasai> mattkelly: goal of Ringmark isn't surface area testing

<fantasai> mattkelly: To be successful, you'd need 100s of thousands of tests

<fantasai> mattkelly: we were just trying to provide a framework for thinking about things

<fantasai> mattkelly: need to get consensus around that

<fantasai> fantasai noted above that testharness.js can't test CSS other than its parsing

<tobie> http://test262.ecmascript.org/

tobie: i'm not sure you're familiar w/ test262
... it's probably a good idea to know about performance
... running 27k tests takes about a quarter of an hour
... each in its own frame
... having been responsible for the architecture of ringmark
... about testharness.js and qunit
... the idea behind the original architecture
... having written for Prototype
... having JS test separated from the page in which it would run
... was extremely useful
... and a good architectural choice
... there's a lot of boilerplate

darobin: fantasai might have something to add to that
... i know the CSS WG uses a build tool
... notably for multiformat

fantasai: the tests we write @CSS WG have a bunch of metadata
... a lot of the boilerplate is XML
... a goal was tests be standalone
... that you could load in your browser
... rather than having to run a build to be able to see the results of your tests
... it made it easier to work on tests
... it was harder when we had the build system required for Selectors
... it's only a little more work to have <!DOCTYPE> at the top

tobie: i guess it makes more sense to have doctype in CSS
... that explains about how you did that
... for testharness, it's in github
... it's easy to submit patches
... the documentation exists
... but it's included in the comments
... i submitted a patch a while back
... to turn that documentation into markdown
... to be turned into a readme
... it was turned down
... AFAIK, the plan is to move the documentation into the wiki
... i don't think there's more overhead in testharness than any other Open Source project

mattkelly: i had something of value to add
... i'd like to stress pragmatism
... about building practical web apps
... which is the reason people buy smartphones these days
... we need lots of tests
... but it's easy to go overboard
... very strict testing
... is something to consider skipping
... e.g. ecmascript
... we shouldn't go overboard

jo: we could take a RESOLUTION not to go overboard
... so requirements for testharness

rob: how could it be made more friendly to newcomers
... a vm image?

darobin: like a git-clone of template project

tobie: it requires Node

darobin: who does not have Node.js?

rob: it has a bunch of dependencies
... the entry barrier could be lowered

darobin: the only thing you need is testharness.js and a test page
... you probably tried ringmark

rob: there are dependencies like php for AppCache

darobin: oh, right

tobie: there was a design disagreement about how that was done
... to write a testharness test, if you don't need server side stuff
... you don't need anything but an html page
... the coremob stuff on coremob's github repo
... requires both Node and a php runtime
... and that's stupid and should be fixed
... if all it requires is PHP, there are 1 click installers for it
... the existing code needs to be fixed
... and then documentation needs to be updated

ACTIONS?

<darobin> ACTION: Matt to remove the dependency on Node to get Ringmark running, and help make it easier to set up [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action08]

<trackbot> Created ACTION-30 - Remove the dependency on Node to get Ringmark running, and help make it easier to set up [on Matt Kelly - due 2012-07-03].

jo: who has requirements?
... i have a requirement that we not create another system for doing this

darobin: to address rob's point
... is whether it'd be useful to have something similar to jsFiddle
... but to have it preloaded w/ testharness
... and then be able to save it online

Josh_Soref: sounds useful

tobie: sounds like a good idea
... you just volunteered

darobin: if i can get time+budget...
... i can look into it

jo: jet , you expressed requirements earlier

<darobin> ACTION: Robin to look into something like jsFiddle for test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action09]

<trackbot> Created ACTION-31 - Look into something like jsFiddle for test writing [on Robin Berjon - due 2012-07-03].

jet: something that takes more than 60s but less than 24hrs
... proper scoring of tests
... not green/gray
... some depth to tests as well
... it's too easy to cheat on green

jo: i volunteer darobin to write requirements

darobin: i'll implement, but not write
... it wouldn't hurt if an OEM or Carrier did it
... how about jfmoy ?
... wouldn't that be helpful?

jfmoy: for sure.
... i don't know

darobin: what would you need to run automated tests

jfmoy: for now, we're working on automation tests
... which we committed to give back to the group
... we're going down that road

tobie: so if it's sharable, then you should be able to give it

<darobin> ACTION: Moy to provide requirements for an automated test runner of all tests [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action10]

<trackbot> Created ACTION-32 - Provide requirements for an automated test runner of all tests [on Jean-Francois Moy - due 2012-07-03].

jfmoy: some of our tests are interactive

darobin: if you'd like to present that

jfmoy: we compared 3 test platforms
... ours, html5test.com, rng.io

jfmoy: sometimes interaction is needed for things
... like forms

<jo> ISSUE: what are the requirements for a test framework?

<trackbot> Created ISSUE-29 - What are the requirements for a test framework? ; please complete additional details at http://www.w3.org/community/coremob/track/issues/29/edit .

jfmoy: all form bits for html5test/ringmark
... the proper keyboard display isn't tested
... for video, it isn't tested usefully
... ringmark is more automated than ours

mattkelly: starting a conversation in the group
... how QA processes work @ OEMs, Carriers, Browser Vendors
... making it as flexible as possible
... action to Orange, Mozilla, Qualcomm
... what would be the best way to get information out of ringmark

<darobin> ACTION: matt to document JSGameBench and the approach behind it [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action11]

<trackbot> Created ACTION-33 - Document JSGameBench and the approach behind it [on Matt Kelly - due 2012-07-03].

<darobin> ACTION: matt to talk to OEMs/carriers about what they would most usefully need to get out of Ringmark results [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action12]

<trackbot> Created ACTION-34 - Talk to OEMs/carriers about what they would most usefully need to get out of Ringmark results [on Matt Kelly - due 2012-07-03].

<darobin> COREMOB TESTING:

<darobin> - Quality of Implementation tests

[ todo today ]

<darobin>   - speed of canvas

<darobin>   - speed of CSS transitions

<darobin>   - audio latency

<darobin>   - audio parallelism

<darobin>   - physics performance (just raw JS performance)

<darobin>   - GC pauses (see ImpactJS)

<darobin>   - page scrolling performance

<darobin>   - touch responsiveness

<darobin>   ✓ DOM manipulation (not a real issue)

<darobin> - Conformance tests

<darobin>   - Ringmark

<darobin>   - blockers for test writing

<darobin>   - test automation

<darobin>   - things that have perceptual outcomes (reftests, audio reftests…)

<darobin> - Prioritising interoperability issues

<darobin>   - overlaying atop video

<darobin>   - integration with the W3C Test Framework facilities

<darobin> - Categorising testing/levels (but fragmentation is evil)

<darobin>   - Gaming 2D

<darobin>   - Gaming 3D

<darobin>   - Device-Aware functionality

<darobin>   - e-books

<darobin>   - Multimedia playback (Audio, Video…)

<darobin>   - Core (networking, application packaging & configuration, HTML…)

<darobin> - Testing the untestable

<darobin>   - things that don't have adequate test specs of their own (e.g. HTTP)

Testing Goals

darobin: if you could get 5 test suites, what would you like

<fantasai> Rob: I wonder if we could put a survey up

<fantasai> Rob: e.g. Tobie's been talking with people building apps, maybe he has some idea of what people need most

<fantasai> mattkelly: In ringmark we focused on audio, 2d gaming, and camera apps

<fantasai> mattkelly: And then going from there, drilling down into what features are missing

<fantasai> mattkelly: how can you test those features extensively to make sure they work well; that was the goal of Ringmark v1

<fantasai> darobin: Ringmark tries to cover a lot of ground, covers some of it very thinly

<fantasai> mattkelly: Whatever we agree on L1 is not that big

<fantasai> mattkelly: In Ring 1 it's only about 14 features

<fantasai> mattkelly: 1-2 that are large: one is DRM

<fantasai> mattkelly: I think the feature set is reasonably small, and feedback I'm hearing is we just don't have deep enough tests for each of those areas

<fantasai> mattkelly: want to go through the features and see if group agrees on them

<fantasai> mattkelly: features were determined by us working with developers

<fantasai> mattkelly: I think I have an action to put more research in the group on how we qualified what's in ring 1

<fantasai> mattkelly: based on what apps are out there today

<fantasai> mattkelly: that would be my proposal, to start with what we've done in Ringmark and figure out if we have any pieces missing or should be removed, and focus our test writing effort there

<fantasai> ...

<fantasai> mattkelly: probably makes sense to have deeper consensus on categories in L1

<fantasai> jfmoy: I put two links to our comparison

<fantasai> jfmoy: That's our results

<fantasai> jfmoy: We're pretty happy with L1 right now

<fantasai> tobie: Missed part of conversation

<fantasai> tobie: Robin, you wanted a couple areas of focus to work on?

<fantasai> tobie: Why not looking at what holes exist?

<fantasai> tobie: If what we want to do is to reuse existing tests and run those, makes sense to have a good understanding of what exists

<fantasai> tobie: and go through tests we want but aren't written, might not need to prioritize

<jo> ACTION: tobie to carry out a gap analysis of existing W3C test suites [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action13]

<trackbot> Created ACTION-35 - Carry out a gap analysis of existing W3C test suites [on Tobie Langel - due 2012-07-03].

<fantasai> darobin: a lot of work for one person, could split by section

<jo> ACTION-35?

<trackbot> ACTION-35 -- Tobie Langel to carry out a gap analysis of existing W3C test suites -- due 2012-07-03 -- OPEN

<trackbot> http://www.w3.org/community/coremob/track/actions/35

<fantasai> darobin: HTML5!

<fantasai> darobin: There are gaps we know aren't tested

<fantasai> darobin: Are there missing tests on things we care about there? Does someone want to look into that?

<fantasai> tobie: could be it's not a concern for companies/ppl

fantasai: what's the relationship between the tests we want to write
... in ringmark
... and level 1
... there's no way to get a solid set of tests for things in level 1 in any time reasonable
... if you can do 2% testing
... how is that representative of showing interop
... testing 5% of features at 50% effectiveness
... but you want this to be level 1
... and show interop by the end of the year

tobie: what's your proposed solution

fantasai: pick a few features, and prioritize those
... what's the goal of this document wrt testing?

tobie: that's true of every wg
... different wg's aren't building test suites for the specs they're publishing

fantasai: i can't figure out how reporting results would relate to this

tobie: as a group, we're editing that spec

fantasai: yes we want to contribute tests
... to a bunch of WGs
... and also have some other way to report things
... as in ringmark
... that's the main advantage of it, right?

jo: that's an assumption that needs to be verified
... it isn't an assumption of mine
... it isn't an assumption that this CG will produce a reporting framework

darobin: i'd like to get to the bottom of it
... fantasai has a good point
... the relationship between this document and the testsuite is unclear
... we should be able to reach consensus by the end of the year
... but how does that document relate to the testing effort it requires
... in the referenced specifications
... which we can't possibly accomplish by January
... unless aliens arrive
... that's where exoplanet research helps
... plus we have to verify those tests
... we shouldn't produce a testsuite and say "This fully tests level 1"
... we need to articulate this clearly
... what i'd like to get out is an improvement
... if we test 5% where before we tested 2%, then i'm happy
... not as happy as if we could test 10%, but happier
... the test suite for this will never be final in under 10 years
... but i wanted to focus on high value targets for interop
... maybe html5 parsing is mostly interoperable
... maybe it's tested at 2% and that's ok
... but maybe shades of red, green, or pink doesn't work in <canvas>
... but maybe it's more important to get matching on blue by January
... does that make sense to people?

dehgan: one thing that would make tests solid
... if we make tests a moving target
... i get a result today
... and a result tomorrow, and my score goes down

fantasai: i think it's great that people want to contribute to the testing effort @w3c
... but the goal of this CG seems to be to push for specific things to be fixed

darobin: we want to defrag the web

fantasai: right
... you want those fixed
... and to push for vendors to implement or fix those
... one thing that has not been done well
... at w3c
... is getting tests we've done
... and getting people excited
... ringmark did that
... well, making it a game
... the psychological pressure is lost if you don't seem to be going anywhere for 10 years
... this is gamification of testing
... but if level up takes 10 years
... then it isn't going to work

jo: it's impractical to do a suite for level 1

fantasai: one thing to think about is
... to break it down and prioritize
... to avoid spreading yourself too thinly
... and to focus communication effort
... more than even adding 3 CSS testing volunteers

darobin: focus on making things pretty
... probably having ringmark 1, 2, 3, ... 17 in the next few years
... making it identifiable
... having conformance targets to have PR
... to avoid getting lost

fantasai: it would be good
... to have a goal to release testing wise
... this document is a 10 year road map
... what will you get done by the end of the year
... and getting them involved and excited about

fantasai: if all you have is an extra 200 tests
... to the html5 parsing algorithm
... that won't get anyone excited

<Zakim> Josh_Soref, you wanted to note that w3c test suites rarely test perf

<fantasai> Josh_Soref: if we want perf tests, we either need to find someone who's written them and steal them

<fantasai> Josh_Soref: or write them ourselves

<fantasai> Josh_Soref: current w3c tests are conformance/interop tests

<fantasai> Josh_Soref: Wrt ringmark, don't like that failing the first ring prevents running the second ring

<darobin> ACTION: Robin to draft a test suite release strategy based on what fantasai and Josh_Soref described [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action14]

<trackbot> Created ACTION-36 - Draft a test suite release strategy based on what fantasai and Josh_Soref described [on Robin Berjon - due 2012-07-03].

<fantasai> Josh_Soref: HTML5 tests can have bonus points -- you can get them even if you didn't pass

<fantasai> Josh_Soref: people like getting points

<fantasai> Josh_Soref: different tests run on different tracks

<fantasai> Josh_Soref: same engineer doesn't work on all the different aspects of the web platform

<fantasai> Josh_Soref: can race up one track while another engineer works on other track

<fantasai> mattkelly: it boils down to focus

<fantasai> mattkelly: earlier point around the hesitation and concern that L1 spec can get unwieldy and large

<fantasai> mattkelly: I share the same concern

<fantasai> mattkelly: I feel strongly that for L1 spec we should focus on 14 different features, like we are in Ringmark

<fantasai> mattkelly: and focus intensely on that batch

<fantasai> mattkelly: and feel comfortable about our coverage of those 14 features by end of year

<fantasai> mattkelly: if we try to test all of HTML5, we'll go down a rabbithole

<fantasai> mattkelly: and will not ship a coherent suite of tests

<fantasai> mattkelly: another point wrt bonus points, and why ringmark stops running if it fails

<fantasai> mattkelly: primary reason it does that is to make the browser look like it failed

<fantasai> mattkelly: goal is to reduce the fragmentation

<fantasai> mattkelly: don't want to reward browser for jumping out and implementing WebGL from L2 when core features are not implemented

<fantasai> mattkelly: Think we should have many releases, and have different levels

<fantasai> mattkelly: L1 should have small amount of functionality, with ample test coverage

<fantasai> mattkelly: Ultimately, we don't know what the unknowns are until we start building this stuff

<fantasai> mattkelly: if we do small bite-size chunks, can cover more ground faster

<fantasai> mattkelly: I do feel that without having a test suite in this group, we'd just have another doc, have no impact on industry

<fantasai> mattkelly: need a product that encapsulates our vision. test suite is how we make this happen

<fantasai> mattkelly: Do think group should have some work around crafting message

<fantasai> mattkelly: .. need to own that message

<fantasai> mattkelly: sharing of message, where group formulates what the structure of the message is

<fantasai> mattkelly: OEMs figure out how you message that to end users, end developers

<fantasai> mattkelly: Unclear if that should be part of group's goal

<fantasai> mattkelly: wrt focus, should focus on structure of that message, not necessarily delivering it

<fantasai> Jo: I agree with everything said before

<fantasai> Jo: I think the whole thing would be more tractable if there was a L0 which was smaller in scope than L1

<fantasai> timeless: if there was a smaller level 1...

<fantasai> darobin: There's no useful reduction of the current document for which we would have sufficient tests

<fantasai> Jo: I am not convinced this group should present a flashy state of things

<fantasai> Jo: But to present tests that other people can show under a UI

<fantasai> darobin: But we already have that

<fantasai> darobin: we already have a number of test suites that can report results that can be reused by others

<fantasai> darobin: Why would we do that?

<fantasai> darobin: We have frameworks to do that

<fantasai> darobin: One thing missing so far is packaging the results in a way that creates market pressure to improve the situation

<fantasai> Jo: What's the point of making a pretty interface?

<fantasai> Jo: Let's make the tests reusable by anybody

<fantasai> darobin: But we already have that in the W3C test frameworks

<fantasai> discussion between Jo and Robin of whether we should use w3c test frameworks or not

<fantasai> ~_~

<fantasai> tobie: I think it would be reasonably easy for Ringmark to pull out tests from other WGs

<fantasai> tobie: Either by unbuilding a stage to existing tests

<fantasai> tobie: or by just changing ringmark so that it actually uses iframes and pulls existing tests into it

<fantasai> tobie: not a hard problem to solve
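
[ A minimal sketch of the iframe-based runner tobie describes: load each existing test page in an iframe and collect its results cross-frame. The postMessage message shape ({ name, status }) is an assumption for illustration; a real integration would hook testharness.js's completion callbacks instead: ]

```js
// Load one test page in an iframe and hand its reported result to `done`.
function runInFrame(url, done) {
  var frame = document.createElement('iframe');
  function onMessage(e) {
    if (e.source !== frame.contentWindow) return; // ignore other frames
    window.removeEventListener('message', onMessage);
    document.body.removeChild(frame);
    done(e.data); // e.g. { name: "...", status: "pass" }
  }
  window.addEventListener('message', onMessage);
  frame.src = url;
  document.body.appendChild(frame);
}

// Run a list of test pages one after another, collecting results.
function runAll(urls, results, done) {
  if (!urls.length) return done(results);
  runInFrame(urls[0], function (r) {
    results.push(r);
    runAll(urls.slice(1), results, done);
  });
}
```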

<fantasai> tobie: fantasai talked about test the web forward

fantasai: Test The Web Forward effort
... Adobe is spearheading it
... and teaching people to write tests for CSS and SVG
... primarily
... it's complementary to what you're doing here
... it isn't quite the same
... it's getting broader community to write tests
... and they're w3c contributions

darobin: also something Mosquito did

jo: can we have you talk to each other?

fantasai: there's events and you're welcome to attend

tobie: do you explain how testharness works

fantasai: we talk people through the process of creating tests
... and submitting them
... and reviewing each-other's tests

darobin: how did it go?

fantasai: it takes writing 20 tests to get good at it

darobin: i made a presentation similar to yours 2 weeks before
... about half an hour in, i realized no one had written tests for anything before

fantasai: i started in the mozilla project doing this with very little guidelines/guidance

jet: comments on testharness
... about depth of a test
... and fail on a test v. continue on a test
... there are very basic features that if you add them to ringmark
... no browser will pass ring 0
... i don't think that's the goal of testing
... i don't think testing every single thing in css 2.1 is good
... and by definition not testing other features

jet: we'd like /all to be the default config for ringmark

scribe: you can timebox
... level 1 december
... what you have in your tests at december is your level 1

darobin: if we did this today
... we'd have html3 and some level of scripting and styling

jet: right
... browsers claim support for html5
... we'll try to turn things green
... but that won't solve interop

darobin: one thing to do is testing
... every six months we release a new set of tests we'd like to turn things green

fantasai: so like an acid test with more tests?

darobin: right, with a lot more tests
... anything we have tests already, we take

fantasai: seems like a reasonable goal to me

DanSun: whatever we do
... testing, quality is the key
... ringmark testing, in 5s there's no chance at all

<fantasai> darobin^: wherever there's major gaps or interop problems, we add more tests, and package this all up nicely

DanSun: can we list test suites
... and which are the most trusted
... and maybe leverage that?
... and integrate w/ ringmark to show the results

tobie: the problem is, that needs work

scribe: and that requires resources

mattkelly: +1 for small packets of tests
... -1 on a 10 year plan for a doc

darobin: i'd find that weird

[ Lunch ]

<darobin> ACTION: Robin to assess which existing test suites can be reused and at what level of coverage they stand [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action15]

<trackbot> Created ACTION-37 - Assess which existing test suites can be reused and at what level of coverage they stand [on Robin Berjon - due 2012-07-03].

<darobin> ISSUE: should the document track the testing effort or not

<trackbot> Created ISSUE-30 - Should the document track the testing effort or not ; please complete additional details at http://www.w3.org/community/coremob/track/issues/30/edit .

<fantasai> Jo: Various side discussions happened over lunch in an attempt to break the logjam

<fantasai> Jo: Starting point is having concrete deliverables by end of year

<fantasai> Jo: Document seems perfectly achievable, but what are we going to deliver in terms of tests by the end of the year

<fantasai> Jo: So here's a plan, taking fantasai's point on board,

<fantasai> Jo: Yes, we need something nice and visual that ppl can rally around. But doesn't have to be this group

<fantasai> Jo: So we should provide infrastructure to do that,

<fantasai> Jo: What we need to do is an existence proof, an actual implementation of such a thing

<fantasai> Jo: Facebook is happy to refactor their existing ringmark output to fit in with what I'm about to say

<fantasai> Jo: In terms of meeting objective of having visual output, FB will provide that existence proof

<fantasai> Jo: Would be good for others to provide similar things

<fantasai> Jo: Browser vendors might want to work headless testing into ..

<fantasai> Jo: So objective of this group then is to produce a framework within which tests can be run and can be incorporated into other things

<fantasai> Jo: Next thing is what tests should be done by the end of the year

<fantasai> Jo: Well, actually, we have a whole slew of tests that exist today

<fantasai> Jo: If we said what we want by the end of the year is what exists today, could be done

<fantasai> Jo: But have some notion of prioritization, want to influence things

<fantasai> Jo: to influence browser vendors, device manufacturers, and users

<fantasai> Jo: some tests in ringmark, and lots of tests in WG

<fantasai> Jo: But we have to do some gap analysis

<fantasai> Jo: All that is so clear so far

<fantasai> Jo: What is the framework these tests are to be executed in?

<fantasai> Jo: Seems clear to me that there is only one option, and that is to use the existing W3C infrastructure

<fantasai> Jo: Sounds like doing that in tobie's output is not simple, but doable

<fantasai> Jo: So what we'll have by end of year, is a framework document that says what we're trying to do in some timeframe writ large

<fantasai> Jo: Then a prioritized list of features that goes into our initial test stuff

<fantasai> Jo: Won't be the whole of HTML5, but the HTML5 things that people find problematic

<fantasai> Jo: And then at least 1 visual representation of those results

<fantasai> Jo: If you don't like FB's version, can create your own!

<fantasai> Jo: So I think that's it. Hope it made some kind of sense

<fantasai> Rob: So, you've got a test suite for all tests which is already capturing data

<fantasai> rob: Ringmark is a nice way of showing the data, and the idea is to combine the two?

<fantasai> darobin: Bit more than that

<fantasai> darobin: To summarize,

<fantasai> darobin: 1. Keep document for L1, it's the shopping list of what devs need today, and guidance for finding gaps

<fantasai> darobin: 2. Write a smaller document, list of things to test for 2013

<fantasai> darobin: that document will match the release of the test system

<fantasai> darobin: That test system would ideally be able to use tests in W3C databases

<fantasai> darobin: Talked with matt wrt separating Ringmark visual representation from running the tests

<fantasai> darobin: Could also compare the test results across browsers

<fantasai> darobin: Has advantage that nonautomated test results can be included

<fantasai> Vidhya: I did not understand what you said.

<fantasai> Vidhya: You said, Ringmark will do what it does today plus it will show me other stuff that's in the database about my browser?

<fantasai> darobin: I don't know if this was clear in earlier explanation

<fantasai> darobin: There is an existing W3C system on w3c-test.org

<fantasai> darobin: Many of the tests by W3C WGs have been integrated

<fantasai> darobin: This contains a test runner, you can take your browser and run the tests

<fantasai> darobin: If the tests are automated, the results in your browser are automatically submitted

<fantasai> darobin: But for non-automated tests, the person looking at the test can say Pass/Fail/Can't Tell/etc

<fantasai> darobin: All that info is stored

<fantasai> darobin: So for all browsers we have stored data on pass/fail results on all these tests

<fantasai> darobin: You can query this data, it's in a database

<fantasai> darobin: Some of the things we want to test are not automatable, can't be used in Ringmark

<fantasai> darobin: But we can pull all that data and display it in a similar way to Ringmark

<fantasai> ...

<fantasai> darobin: The visual representation would be cleanly abstracted

<fantasai> Vidhya: The output here is what? Someone is going to define this api

<fantasai> darobin: That's up to FB

<fantasai> darobin: need to talk about what we need to feed into it

<fantasai> Vidhya: I think the reality is that we see a lot of browsers that people out there don't see

<fantasai> Vidhya: We'll see them before they're commercial

<fantasai> Vidhya: Would be great to go in and see that

<fantasai> Josh_Soref: Is there an action on you to fix the JSON to help people?

<fantasai> DanSun: So this team, or ringmark, is going to connect to test harness to get results?

<fantasai> mattkelly: Yes, the goal would be to integrate with the test harness

<fantasai> mattkelly: would need to make changes to do that, but that would be the goal

<fantasai> mattkelly: Ringmark would just be a results page, rather than a runner and a test suite and all that stuff

<fantasai> mattkelly: we would just sit on top of what the group produces
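
[ For illustration, a sketch of the separation mattkelly describes: a display component that only consumes a results object, whether it comes from a live test run or from stored results; every name below is hypothetical, not Ringmark's actual code ]

  // Hypothetical rendering component: it does not care where results came from.
  function renderResults(container, results) {
    results.forEach(function(r) {
      var row = document.createElement("div");
      row.className = "result-" + r.status;   // e.g. "result-pass" / "result-fail"
      row.textContent = r.test + ": " + r.status;
      container.appendChild(row);
    });
  }

  // Source A: results accumulated from a live in-page run.
  // Source B: stored results pulled from a database and passed in as JSON.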

<fantasai> some confusion over the W3C test harness and testharness.js

<fantasai> darobin: We want 2 things

<fantasai> darobin: go to a page, and it tells you your browser sucks, what we have today

<fantasai> darobin: that would run the tests right there, automated tests

<fantasai> darobin: Other thing is to use the same visual component to get results from the W3C database (or some private database)

<fantasai> darobin: Idea is to produce multiple reports that are buzzword-compliant

<fantasai> DanSun: Two step process or one step?

<fantasai> DanSun: Run first in W3C harness, then Ringmark?

<fantasai> darobin: Depends. One thing will run those automated tests directly and show you your results

<fantasai> darobin: Other thing will pull data from W3C test database, that will be 2 step process

<fantasai> DanSun: Are there documents to run the tests?

<fantasai> darobin: Ideally it should be user friendly enough that you won't need documentation to run the tests

<fantasai> Jo: Note there isn't any one method of running tests, or one visual representation, we're just outlining what FB would like to achieve

<fantasai> Jo: If anyone wants to volunteer for something else, that's great.

<fantasai> Jo gives some history of the mobileOK testing

<fantasai> Jo: Proposal is not to limit how reporting and test results happen, but just to make a start of it

<fantasai> Robin shows off the test runner

<fantasai> darobin: Let's imagine you want to run some tests

<fantasai> darobin: you go here, click on the button to run tests

<fantasai> darobin: It reports your UA, lets you choose which tests, and then starts running tests

<fantasai> darobin: Shows you the test with some buttons to choose the results, and some metadata about the test

<fantasai> darobin: The results you produce here, will appear in the results table

<fantasai> Robin shows off the table

<fantasai> darobin: The data used here you can have access to

<fantasai> darobin: in a JSON dump from the system
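
[ For illustration, a hypothetical shape for such a JSON dump; the field names below are invented for the sketch, not the framework's actual schema ]

  {
    "test": "/webstorage/storage_local_setitem.html",
    "results": [
      { "ua": "Browser A 1.0", "status": "pass",      "submitted": "2012-06-25" },
      { "ua": "Browser B 2.1", "status": "fail",      "submitted": "2012-06-26" },
      { "ua": "Browser C 0.9", "status": "cant_tell", "submitted": "2012-06-26" }
    ]
  }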

<fantasai> darobin: Are we all in agreement here?

<fantasai> Rob: The idea I was talking about was to create a short-term hit list

<fantasai> Rob: We can choose our own reporting and visualization

<fantasai> Rob: Everybody can take whatever data they like and show it off

<fantasai> Rob: But we can share the data

<fantasai> darobin: so long as there's a test suite

<fantasai> Rob: And we contribute our tests to the main W3C test suites, so it's valuable all around

<fantasai> Rob: And people can theoretically run private instances of this

<fantasai> tobie: and run the tests on their own devices, yes

<darobin> TAKEAWAY:

<darobin> - target: end of year

<darobin> - Level 1 document

<darobin>   - this is the aspirational documentation of what developers need to produce applications today

<darobin> - specific test suite, nice and visual

<darobin>   - this is pretty, can run atop testharness.js

<darobin> - document for the specific test suite

<darobin>   - this is the subset of the Level 1 document that describes the interoperability hit list that we are targeting for the current test release

<darobin> - refactoring Ringmark to be able to place the visual component atop results from a test run, or stored runs

<darobin> PROPOSED RESOLUTION: the target for this group for EOY 2012 is the above summary

this is the aspirational documentation of which APIs are needed by developers to produce applications today

fantasai: this CG is going to focus on which things need to be worked on
... by the end of the year

darobin: mattkelly indicated he had 14 features
... in ringmark
... and those might be what we focus on
... or maybe we trim things out

<darobin> [the test bundle could be called Hit List Zero]

<darobin> RESOLUTION: the target for this group for EOY 2012 is the above summary

<darobin> RESOLUTION: the primary input for Hit List Zero is the list of fourteen features currently focused upon by Ringmark

<darobin> ACTION: Tobie to make a fluffy picture out of the architecture described by Robin for the test system [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action16]

<trackbot> Created ACTION-38 - Make a fluffy picture out of the architecture described by Robin for the test system [on Tobie Langel - due 2012-07-03].

<darobin> RESOLUTION: The group will not try to boil the ocean nor make a perfect system for the first release; we only care about rough consensus and running code

<darobin> ACTION: Robin to draft the architecture of the test system [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action17]

<trackbot> Created ACTION-39 - Draft the architecture of the test system [on Robin Berjon - due 2012-07-03].

Vendor Prefixes

jo: No!!!!!!!!!!!!!!!!!

darobin: I think we're done w/ Testing

[ No ]

darobin: we had vendor prefixes on the agenda
... we agreed as chairs to drop the discussion
... the reason is that the proponent for text in that area isn't in attendance
... i think it's a solved problem in CSS WG

jet: I think it becomes a topic for the last question
... will our tests have prefixes?

darobin: they won't
... I believe the current opinion is that our tests won't have prefixes
... opinions mattkelly ?

mattkelly: can of worms
... we want to strike two balances
... give ability for vendors to move quickly
... and implement things
... move fast
... prefixes introduce fragmentation
... for ringmark we thought about allowing prefixes but marking as yellow
... passing but non standard
... for developers, they just want the feature
... long term, there needs to be a stigma attached to continuing to use prefixes
... we need to move quickly and get features in
... but also remove fragmentation
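
[ For illustration, a sketch of the kind of prefix-aware check that could back a "passing but non-standard" (yellow) result; the helper is hypothetical, not Ringmark's actual code, though the prefixed names are real ]

  // Returns "pass" for the unprefixed API, "prefixed" if only a
  // vendor-prefixed version exists, and "fail" otherwise.
  function checkRequestAnimationFrame() {
    if (window.requestAnimationFrame)
      return "pass";
    var prefixes = ["webkit", "moz", "ms", "o"];
    for (var i = 0; i < prefixes.length; i++) {
      if (window[prefixes[i] + "RequestAnimationFrame"])
        return "prefixed"; // would be reported as yellow
    }
    return "fail";
  }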

darobin: anyone want to react to that?

jo: I think vendors who have employed prefixes shouldn't be punished for supporting prefixes
... in their code
... but they shouldn't get credit for implementing the feature
... since they did what the CSS WG asked them to do
... what we should do is test for conformance to the spec as finally agreed

Josh_Soref: +1

Beyond Level 1

darobin: tobie, you wanted to talk about your UCs and Reqs doc

tobie: not really that ready
... i'm working on a document for UCs and Reqs for level 1
... i'm hoping to have something to share w/ the group in the near future
... i'm also going to bring UCs for AppConfig and Chromelessness

QoI Testing

darobin: we have a fairly clear plan for Conformance testing
... for QoI testing
... we have agreement that it's cool
... and ideas of what we would like to test
... but no commitment to producing tests

jo: i think we have enough on our plate already
... we could do something in that area
... but not yet
... at least not before december

darobin: i have too many Action items

jo: that's largely my feeling
... absent volunteers
... i think it won't be worked on yet

darobin: anytime someone feels like jumping into it
... we welcome that contribution

mattkelly: we have a giant action item for a testrunner-testresults thing
... seems like we can do general compliance testing in parallel
... testing things like speed of canvas is highly important to goals of the group
... it feels like we should dip our toes in the water
... w/o perf tests on things like <canvas>
... even if we get a feature in
... if it's slow and crappy, it defeats the purpose

darobin: <canvas> is the easy one to test

mattkelly: 2d <canvas> perf
... should be something we could tackle by the end of the year
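
[ For illustration, a minimal sketch of a 2D <canvas> performance probe: draw a fixed workload every frame for a few seconds and report frames per second; the workload, duration, and use of unprefixed requestAnimationFrame are arbitrary assumptions ]

  var canvas = document.createElement("canvas");
  canvas.width = 320;
  canvas.height = 480;
  document.body.appendChild(canvas);
  var ctx = canvas.getContext("2d");
  var frames = 0, start = Date.now();

  function frame() {
    // Fixed per-frame workload: 100 filled rectangles.
    for (var i = 0; i < 100; i++) {
      ctx.fillStyle = "rgb(" + (i % 256) + ",0,0)";
      ctx.fillRect((i * 7) % 300, (i * 13) % 460, 20, 20);
    }
    frames++;
    if (Date.now() - start < 5000) {
      window.requestAnimationFrame(frame);
    } else {
      var fps = frames / ((Date.now() - start) / 1000);
      console.log("2D canvas: " + fps.toFixed(1) + " fps");
    }
  }
  window.requestAnimationFrame(frame);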

jo: i think that's fairly generous of you to think of doing

rob: for us, we aren't using <canvas>
... about our own ports
... by being able to cherry-pick things
... we can use them to prove bugs to vendors
... but if we know there are multiple browsers failing
... then we know of places where we should suggest pain points for future devices
... but we can't do that until we can see where we are at the moment

jo: that's a tentative offer of contributing something in the future

rob: i think it's slightly firmer than that

darobin: any other offers on QoI testing?

http://arewefastyet.com

<fantasai> Josh_Soref: This is essentially a QoI test

<fantasai> Josh_Soref: compares FF and Chrome with v8

<fantasai> Josh_Soref: I don't actually use this thing, I just know it exists

<fantasai> Mozilla rep explains the tests

<fantasai> which are used internally to monitor performance

bkelley: it seems JS benchmarking has been done to death
... i think we should stay away from that
... unless there's something we can do that addresses a UC more directly
... maybe a physics computation benchmark
... just stealing + rebranding won't add value
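
[ For illustration, the sort of use-case-directed benchmark suggested here: a crude physics step timed in plain JS; this is entirely hypothetical, not an existing benchmark ]

  // Integrate n point masses under gravity for a fixed number of steps
  // and return the elapsed time in milliseconds.
  function physicsBenchmark(n, steps) {
    var ys = [], vys = [];
    for (var i = 0; i < n; i++) { ys.push(Math.random()); vys.push(0); }
    var dt = 0.016, t0 = Date.now();
    for (var s = 0; s < steps; s++) {
      for (var i = 0; i < n; i++) {
        vys[i] += 9.8 * dt;                                    // gravity
        ys[i] += vys[i] * dt;
        if (ys[i] > 1) { ys[i] = 1; vys[i] = -vys[i] * 0.5; }  // bounce
      }
    }
    return Date.now() - t0;
  }

  // e.g. physicsBenchmark(1000, 10000)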

jo: i feel inspired

<darobin> RESOLUTION: For QoI testing, we're open to input, but we won't move on it before someone proposes something specific (FT & FB have tentatively suggested they might think about it)

Wrap

jo: AOB

darobin: is there AOB?
... next F2F?

jo: proposal for group telecons?
... darobin isn't enamored of the idea
... i'd like to try it
... meetings are difficult to coordinate based on time zones

rob: could we try dual-location F2F?

darobin: jo was talking about Phone Bridges
... separately to plan a single location F2F
... probably close to London

jo: if not @Orange, perhaps @FT

darobin: we know others in London, perhaps @Vodafone

jfmoy: I'll try to do my best if we can host
... but if it's more people than today, that'll be tough in London
... 40 people max
... i need to check
... if we had to do it in Orange, we could do it in Paris
... I prefer London
... but we have more space in Paris

ACTION jfmoy check on hosting @Orange Oct 2-3

<trackbot> Sorry, couldn't find user - jfmoy

ACTION: moy check on hosting @Orange Oct 2-3, in London (alt Paris)

<trackbot> Created ACTION-40 - Check on hosting @Orange Oct 2-3, in London (alt Paris) [on Jean-Francois Moy - due 2012-07-03].

<darobin> ACTION: Jo to figure out teleconference logistics, timing, and critical mass [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action18]

<trackbot> Created ACTION-41 - Figure out teleconference logistics, timing, and critical mass [on Jo Rabin - due 2012-07-03].

jo: AOB?

[ None ]

darobin: many thanks to everyone for coming
... special thanks to Josh_Soref and fantasai (who got dragged in) for scribing

[ Applause ]

darobin: thanks to FB for hosting in this cool location with great logistics

[ Applause ]

Josh_Soref: thanks for calling in, lgombos

<darobin> RESOLUTION: The CG thanks Facebook for great organisation, location, and logistics

<darobin> RESOLUTION: The CG thanks Josh and fantasai for their outstanding scribing

tobie: thanks to the chairs

[ Applause ]

trackbot, end meeting

Summary of Action Items

[NEW] ACTION: Jo to figure out teleconference logistics, timing, and critical mass [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action18]
[NEW] ACTION: Josh to survey people and compile a list of common errors in test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action04]
[NEW] ACTION: matt to document JSGameBench and the approach behind it [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action11]
[NEW] ACTION: Matt to remove the dependency on Node to get Ringmark running, and help make it easier to set up [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action08]
[NEW] ACTION: matt to talk to OEMs/carriers about what they would most usefully need to get out of Ringmark results [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action12]
[NEW] ACTION: Moy to provide requirements for an automated test runner of all tests [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action10]
[NEW] ACTION: Robin to assess which existing test suites can be reused and at what level of coverage they stand [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action15]
[NEW] ACTION: Robin to draft a test suite release strategy based on what fantasai and Josh_Soref described [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action14]
[NEW] ACTION: Robin to draft the architecture of the test system [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action17]
[NEW] ACTION: Robin to look into something like jsFiddle for test writing [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action09]
[NEW] ACTION: Robin to write documentation for testharness.js [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action05]
[NEW] ACTION: Shilston to expeditiously check whether it is practical to measure consistency of framerate [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action03]
[NEW] ACTION: tobie to carry out a gap analysis of existing W3C test suites [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action13]
[NEW] ACTION: Tobie to make a fluffy picture out of the architecture described by Robin for the test system [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action16]
[NEW] ACTION: Tobie to provide numbers for required sprites/fps in games [recorded in http://www.w3.org/2012/06/26-coremob-minutes.html#action01]
 
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2012/06/26 22:41:55 $
