W3C

Testing

29 Jan 2013

Attendees

Present
David Baron (Mozilla), Michael Cooper (W3C), Art Barstow (Nokia), Judy Brewer (W3C), Clarke Stevens (CableLabs), Glenn Adams (Cox Communications), Mark Vickers (Comcast), Tobie Langel (Facebook), Lars Erik Bolstad (Opera), Rebecca Hauck (Adobe), Larry McLister (Adobe), Filip Maj (Adobe), Charles McCathie Nevile (Yandex), Jet Villegas (Mozilla), Kaz Ashimura (W3C), Yosuke Funahashi (Tomo Digi), Peter Linss (HP), Hiroyuki Aizu (Toshiba), Alan Stearns (Adobe), Graham Clift (Sony), Jeff Jaffe (W3C), Bob Lund (CableLabs), Michael Champion (Microsoft), Philippe Le Hegaret (W3C), Dan Sun (Verizon), Masahiro Wada (KDDI), Bryan Sullivan (AT&T), Edward O'Connor (Apple), Paul Irish (Google), Cyril Rickelton-Abdi (Turner), Andrea Trasatti (Nokia), Jennifer Leong (AT&T), Olu Akiwumi-Assani (Verizon)
Chair
plh
Scribe
Robin, Bryan, Jeff, David

<masinter> Larry Masinter, Adobe, lame duck TAG, just interested in encouraging testing, testing-sourced spec review. helped manage interop testing for some IETF specs. some references http://larry.masinter.net/draft-ietf-newtrk-interop-reports-00.html http://blogs.adobe.com/standards/2013/01/16/testing-the-third-pillar-of-standards/

Introduction

[slides]

plh: we will try to make as much of the material here today public
... Hi! I'm PLH, welcome

slide #3

plh: if all the devices that claim to be HTML5 ready were, we wouldn't be having this meeting today
... testing is not something that we have a good track record of doing
... the core Process requirements at W3C are actually rather weak
... it's only a SHOULD
... the Director can approve a Rec even with limited testing

slide #5

scribe: WGs tend to demonstrate implementability in general rather than interoperability

slide #6

scribe: WGs tend to do the minimum to declare victory
... and there is only limited incentive to maintain test suites
... so they get abandoned and forgotten over time

slide #7

scribe: to get HTML5 to Rec, the HTML WG is going to use its judgement and not necessarily test absolutely everything
... it's a really low bar
... if we're counting on the HTML WG to produce a full fledged test suite, we could wait a while

Bryan: that enters in the definition of a Rec
... in this case, a Rec is not necessarily a verifiably interoperable document
... everyone needs to be aware of that

plh: yes, and if we did that it would delay shipping

slide #8

plh: I looked at the mobile and TV profiles

(techs listed on the slide)

CoreMob 2012 and DLNA HTML5 for TV

scribe: if you are serious about a profile, you have to be serious about testing as well
... there is a lot of overlap between the two
... the overlap is probably what we want to target for testing first

Mike: when they list a spec, is it complete or a subset?

Mark: for DLNA, we say "all the mandatory parts of $givenSpec"

Mark: e.g. HTML does not require a given syntax, or JS
... so all the specs are in effect 100% included

tobie: same for CoreMob

tobie: we consider it to be the role of the WG to cut specs down if they're not implemented in devices

jeff: these common parts, are they 95%, 30%, etc. of each side?

mark: it's well over 90%, even if only because HTML is so big, and we
... are converging more
... some things are not listed because they're required indirectly by specs we require

tobie: same for us

plh: there's also a document in the wiki that lists the differences
... CoreMob has more things that are not in TV
... e.g. touch events, geolocation, things that don't make sense for TV
... conversely, TV lists the image formats which CoreMob does not

Mark: to be clear, the DLNA profile is not TV-specific, it's for all DLNA devices
... so if we sat down with CoreMob we'd probably align
... e.g. for touch, you might touch a DLNA device

tobie: I think we have a common view, same apps

robin: align?

mark & tobie: yeah we should

plh: some of the documents listed in the overlap are very stable, others rather unstable

slide #9

plh: a bunch of those documents have no tests
... e.g. HTTP, Web origin
... others have some tests, but are far from complete

<lmclister> I thought there were B& B tests?

bryan: when you say "we", who is that?

plh: W3C

bryan: we need to figure out where to source the tests

dan: HTTP is not W3C, right?

plh: correct, neither is ES
... but both are referenced
... we like tests

slide #10

plh: we also like test tooling, review, coverage, results, documentation
... first and foremost we like consistency across WGs
... right now we don't have that, and it's painful and problematic
... it keeps biting us

slide #11

plh explains the terminology on the slide


slide #13

plh: we can do more than the minimal target!
... we can also do regression tests

<glenn> need better specification of "features" also; some specs normatively enumerate "features", e.g., http://www.w3.org/TR/ttaf1-dfxp/#features

slide #14

plh: there are various strategies to increase coverage
... including crowdsourcing and subcontracting

slide #15

plh: for crowdsourcing, we need a lot more documentation that can help people get there
... the quality of the tests that we receive varies a lot

hober joins physically after joining in IRC

slide #16

plh: for subcontracting, you have to count at least $100 per feature test
... HTML5 is roughly estimated at 10000 features
... you do the maths
... that's a minimum price, the quotes vary a lot
... we don't know the quality of the result
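
[the maths: 10,000 features × $100 per feature test ≈ $1,000,000]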

mike: how do you map features and tests in there?

plh: it's one feature, one test in this case (so it might contain multiple unit tests, obviously)
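
[scribe note: for concreteness, a sketch of one "feature test" file holding several unit tests in the testharness.js style; the feature and assertions here are invented for illustration:

  <!DOCTYPE html>
  <title>Feature: sessionStorage basics</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <script>
  // one feature-test file, several independently passing/failing unit tests
  test(function() {
    sessionStorage.setItem("key", "value");
    assert_equals(sessionStorage.getItem("key"), "value");
  }, "setItem/getItem round-trips a string");

  test(function() {
    sessionStorage.clear();
    assert_equals(sessionStorage.length, 0);
  }, "clear() empties the store");
  </script>
]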

tobie: and $1 million doesn't count the fact that you have to review the tests, which doubles the amount at least

Mark_Vickers: there's a substantial cost, but we're paying cost now without the tests because we have to deal with interop, and the problem increases exponentially
... if you multiply that by the number of companies, the expense is vastly greater than a couple million dollars
... if you divide that by a number of contributors, you get a very reasonable number
... we would certainly be willing to contribute our share

bryan: certainly the numbers are big, we've contracted before and it's expensive
... but I think that this overestimates the cost of focusing on the priorities
... a lot of those features have been around so long that they don't have interop problems

<bryan> Counting # of features overestimates the cost of developing an effective test suite, as many features have been around for 10 years or more and should not be priorities for testing.

bryan: so if we focus on priorities, it costs less

jeff: I think both Mike and Bryan are correct
... in general I encourage the companies to come back with their own perspectives about how they see this happening

<masinter> you might be underestimating the work, because some of the documents haven't been reviewed for testability, and trying to test features will come up with many document bugs and ambiguities

slide #17

plh: we have to figure out our priorities; you folks have to tell us where to put our resources
... CSS Animations is a moving target, should we test?
... HTTP, should we leave that to the IETF?

slide #18

plh: a test management system, identifying coverage and gaps

<masinter> IETF doesn't do testing itself; HTTP testing was done with self-reporting of interop

plh: several different groups use different things
... CSS uses Shepherd, HTML is using GitHub
... how do we do test reviewing
... documentation, consistency across groups
... it took us 18 months to get all groups using testharness.js
... so we have to consider a similar timeline for the rest
... the CoreMob people want a test framework
... the current one we have has problems, we have to figure out how to move forward
... and we have to figure out how important reporting is as part of that

slide #21

plh: need to document the full process of test writing
... especially at introduction level

slide #22

plh: I want three things from presentations: GOALS, REQUIREMENTS, PRIORITIES
... if you can't list those, I'm not interested in your presentation :)
... then based on that, figure out how we achieve this and what resources we have
... if we have priorities that people aren't providing resources for, then we'll drop them

slide #23

<Zakim> kaz, you wanted to mention the assertion manager software which VBWG and MMIWG have been using to manage test assertions and test suites (though there are some more candidates)

kaz: there's a tool used by Voice Browser
... we have a simple kind of DB with HTML UI, that we use to manage tests

Testing Accessibility

[slides]

Judy: some accessibility testing requirements
... integrating a11y testing where possible

slide #3

scribe: a11y is distributed across the OWP
... a lot of features support a11y in a variety of ways
... the ways in which that can be tested varies, might require additional expertise around the table

slide #4

scribe: in some cases a11y is required, at times by law
... problems with a11y can shut out a market

slide #5

scribe: examples of how this is handled in some cases, in some browsers
... within HTML5 there is extensive embedding of ARIA
... there are 60 roles, lots of properties that apply
... so about 1,000 feature tests are required here
... also need to test focus management and ARIA events
... reference material for requirements on the slides
... testing of AAPI is particularly important
... the benefit of AAPIs is that UAs don't need to have direct knowledge of AT but just expose an API for ATs to plug into
... AAPI tooling could make it possible/easier to automate such testing

[I wonder if WebDriver has AAPI support]

slide #7

scribe: PFWG has a harness that is worth looking into

slide #8

scribe: one of the challenges of testing for a11y is the broad range of ATs out there
... we have a project to compile information about actual AT deployment, support, usage
... automatable tests expressible through WebIDL
... we need to ensure that we have a11y experts coordinate with other test suites

<Zakim> bryan, you wanted to ask: To get started on accessibility, is there a wiki for developers that identifies the most broadly valuable features, as a way to get started and clarify

bryan: is there any information that we could use to identify priorities to build a11y into our tests, make sure we target the most important cases?

Judy: yes, I think part of the issue is that the information is distributed rather than centralised
... if you want to do a11y testing for x features, it's not necessarily clear where to go
... for ARIA, there's already a fair amount, but outside of that we're missing centralised repo

Michael: we don't have a central place because we're driven by feature maturity
... we circle around to needing testing in order to answer your question
... but we could certainly make a list of priorities

<Zakim> darobin, you wanted to ask about WebDriver having AAPI support, and how automated tests are expressed in WebIDL

<Zakim> chaals, you wanted to suggest that this is a question markets will answer, rather than W3C trying to suggest who matters most

chaals: one of the issues that we will face, which is made clear by a11y, is that there are different priorities for different markets
... in particular for a11y
... W3C is probably not the organisation that we should ask to choose between the priority of things that only affect blind users, as opposed
... to those that affect only some subset of hearing-impaired users
... so there's an issue of the [...] that we the members need to figure out for testing

Judy: I have no idea what chaals is getting at [due to strobed audio]

<chaals> [there are priority decisions that we won't effectively make in committee...]

plh: we are here because of market conditions, and one way of driving the priorities is through the market

Judy: I think there's a value in planning for functionality that we spent a while building in

<bryan> thanks Judy and Michael, I would like to followup with you on developer priority-clarifying resources, similar to our use-case-focused CoreMob 2012 spec

Judy: I hear the point, but I would rather do good planning for testing, then let the market take its best shot

<chaals> [and we get to a point where looking for prioritisation between orthogonal requirements isn't something we can do from inside W3C]

<chaals> [I think Judy and I are largely in agreement]

AT&T Goals for W3C Web Testing

[slides]

slide #1

bryan: CoreMob has been publicly mentioned by our CTO at the ATT summit, we are going HTML5 all the way
... we need to avoid overselling
... and we need to check that specs have been implemented and tested
... we need to be able to tell developers what works
... our goal is NOT to help W3C produce specs, it's to help the Web be a stronger place

we === ATT

bryan: priority is a detail question
... we can't do 10k tests right now, we have to focus on 1k to do quickly
... testing priorities needs to be a living document

slide #4

bryan: we want the test framework to be a core part of WGs' work
... and we need to be able to export it and run our own copy
... we need to make sure that everything that can be automated is
... we need to expose a resource to the web community about data gathered through testing

slide #5

scribe: test assets need to be associable with features
... tests need to have a clear life cycle that we can document
... and the life cycle needs to be the same across groups
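
[scribe note: a sketch of how a test asset can carry its feature/spec association in-file, following metadata conventions already used in some W3C suites; the spec link and assertion text are illustrative only:

  <!DOCTYPE html>
  <title>canvas fillRect paints the given rectangle</title>
  <link rel="help" href="http://www.w3.org/TR/2dcontext/#dom-context-2d-fillrect">
  <meta name="assert" content="fillRect paints the specified rectangle">
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <canvas id="c"></canvas>
  <script>
  test(function() {
    var ctx = document.getElementById("c").getContext("2d");
    ctx.fillStyle = "#00ff00";
    ctx.fillRect(0, 0, 10, 10);
    // sample one pixel inside the painted rectangle; green channel should be 255
    assert_equals(ctx.getImageData(5, 5, 1, 1).data[1], 255);
  }, "fillRect paints");
  </script>
]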

slide #6

scribe: we're very much behind CoreMob and its inspiration
... W3C tests are all over the place
... and nothing is clear
... and sometimes there are no tests

slide #7

scribe: in house we have lots of tests that we use for all the devices we ship
... for the Web we want to focus on what is used by everybody else
... we need to find and target gaps
... and we can take on test writing
... if it's focused enough on a feature, we can help
... this is a priority for 2013

jennifer: we're interested in doing a lot of this testing for pre-launch devices
... so we need private instances of the test framework
... to avoid information about devices leaking
... manufacturers should be able to use them

bryan: so if we can clone the test framework, then the manufacturers can run it
... and when we go public, the data is available already

<Zakim> darobin, you wanted to ask if features can == spec section or not

robin: can features be considered to just map to spec sections?

bryan: that could be good enough, so long as we can map things properly
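
[scribe note: e.g. the shared-repository convention, where the path encodes the spec section; layout hypothetical:

  html/
    semantics/
      embedded-content/
        the-img-element/      <- one directory per spec section
          img-alt-missing.html
          img-src-empty.html
]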

plh: you said that you could help with test writing, but not with the framework, yet you make the framework a priority
... so how does ATT see their participation?

bryan: that's a very good question
... we want to contribute more time to this
... but we're not framework experts

plh: funds are always welcome

bryan: if there's some kind of sponsorship programme for this I can definitely speak to the right people

glenn: comment on feature
... specs could formally enumerate and identify features
... we did this in TTML and it's been very useful

<glenn> http://www.w3.org/TR/ttaf1-dfxp/#features

glenn: notably for test evaluation

plh: but that shifts the burden to WGs, which we'd like to avoid
... you'll get push back

glenn: this can be done in separate documents that aren't the main document

bob: a thought about the feature issue
... I think that what we want is to demonstrate conformance against specs
... we run a bunch of tests, and we can say "yes, this is conformant"
... we'll have to think about feature to get there, but for us the goal is really conformance to the whole spec
... we may accept demonstration of lower levels of conformance in the short term, but the endgame is a high level of conformance demonstration
... so we shouldn't spend too much time wondering about features

bryan: I didn't show the spreadsheet, but I think it can provide input to test coverage assertion

<masinter> suggestion: you'll never get to perfect coverage or even coverage-in-depth. Focus on breadth and regression testing: get *some* tests for every spec, then focus on testing against complaints.

Comcast

[slides]

Mark_Vickers: just a couple caveats
... Web&TV just started a testing TF
... I'm just providing my input, not the TF or the group

<Zakim> bryan, you wanted to note that the CoreMob 2012 coverage analysis I did is intended as input to an aligned effort of test coverage assessment, to which we will contribute

Mark_Vickers: we deliver video apps to "screens"
... I don't believe in separate profiles for different devices, we're just focused on video apps

slide #3

scribe: we shouldn't change the way that groups use testing to ship specs, that's fine
... but we should do more in testing
... the cost of developing a cross-browser app is still high, despite improvements
... this is a reason to improve the consistency of the OWP
... and this cost is multiplied by devices, number of developers, etc. — a really high cost
... there are three legs to an API definition: spec, docs, tests
... we have specs, and now we have webplatform.org for docs
... but we're missing the third leg of that stool, so it falls over
... we want a webplatform.org for tests
... some ideas on how to do this
... mechanism for developers to report inconsistencies between browsers
... maybe it could make sense to use webplatform.org
... then generate tests for that problem
... sometimes it's a browser bug
... but sometimes it's a spec bug
... another angle is to review libraries
... since they deal with browser inconsistencies, you can just go through the code and everything that is papering over problems is a bug
... we also need outreach to ensure that we prioritise based on the needs of web developers
... we need to formally take it on rather than consider it as side work

<bryan> +1 to creation of a resource for inconsistency publication, consensus based and allowing for explanation by vendors

scribe: so that it can be funded, resourced, etc.
... when DLNA references the OWP, we made a clear commitment not to define new specs
... and other orgs are doing the same
... many of those provide testing and certification
... it's important because sometimes you need to be able to claim that you're aligned with e.g. a national standard
... we need to make it easy for those organisations to use our tests
... in DLNA we wondered about creating our own tests for that
... but if you define your own tests, you're defining a new standard
... external groups have a lot of problems reusing W3C tests
... they move, they break, etc.
... I do not think that W3C should take on certification
... but we should provide those organisations with the ruler they need to use
... we need One Home for all tests
... need to configure which tests you want to run
... One Click To Run Them All

<bryan> +1 to W3C enabling certification providers to serve a market based upon W3C recommendations - but maybe they are also stakeholders to which W3C should look to for support?

scribe: need to be able to save a detailed test run

[shows the Khronos tests for WebGL]

scribe: DLNA finds that the WebGL tests are great and are a goal
... it would be useful to be able to load an existing profile that can configure the run so you don't have to repeat that over and over again
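
[scribe note: a rough sketch of such a profile-driven runner; it assumes each test page's testharnessreport.js is set up to forward results to the parent window, which is a harness-integration detail, not a given:

  // runner.js (sketch only)
  var profile = ["dom/nodes/Node-appendChild.html" /* ...loaded from a saved profile */];
  var results = [];
  var frame = document.createElement("iframe");
  document.body.appendChild(frame);

  // assumed hook: the test page calls this when its harness completes
  window.result_callback = function(url, subtests) {
    results.push({ test: url, subtests: subtests });
    next();
  };

  function next() {
    var url = profile.shift();
    if (!url) { localStorage.results = JSON.stringify(results); return; } // save the detailed run
    frame.src = url;
  }
  next();
]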

<dbaron> he's showing a URL that looks a bit like http://www.khronos.org/.../.../tests/webgl-conformance-tests.html but I can't read the middle part

jeff: so when plh established the objectives, I hear it very much around the technology of what to test, picking the right profiles to prioritise
... but hearing you talking about linking to W3C tests, looking inside libraries, I also heard more of a curation and management role than what plh described
... is that just a different expression, or something you're adding?

Mark_Vickers: there are definitely two aspects
... the external organisation, could rely on a centralised test runner
... if we have that, then they can just use it

<dbaron> https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html

Mark_Vickers: I don't think we need curation so long as we have a clear way of setting things up

<Zakim> dbaron, you wanted to say that getting feedback from authors or libraries is great, and I think we'd be likely to find that much of the feedback is about bugs in old browser

jeff: so if we do it right then it won't require curation, good point

dbaron: building a way for authors to provide feedback about what's not-interoperable would be great
... but we need to make sure we don't then get to target old browsers for bugs that have already been fixed

Mark_Vickers: yes, filtering would be an issue

<Zakim> bryan, you wanted to ask if the test results output should include a signature from the test server to validate that the results are certified?

bryan: just a note, as Mark_Vickers is describing the output of this test run
... we could add a signature to that run to validate that it did occur

robin: it's too easy to break that

Mark_Vickers: we can handle honesty by contract instead

Aizu: I think a permanent link to each test is very useful
... because it encourages external testing

<chaals> [(in response to dbaron) Agree that it is important for us to be looking at what is new. But there are many people who would like to have a historical record of support over the last 7 years (because it is as important to their work as understanding what will be released in 3 months) - and after 7 years we will have enabled others to collect that information if they want it.]

CableLabs

[slides]

bob_lund: intro to cablelabs
... supportive of the vision outlined by Mark
... building a ref impl of the DLNA RUI as defined by DLNA
... (shows a dongle with HDMI & USB interfaces) which connects to the back of a TV
... looking for wider implementations and OEMs to take advantage of it
... DLNA product vendors will be a primary adoption target
... DLNA expects compliance of the implementations to be verified
... DLNA needs a way to run tests & show compliance, wants the framework and tests to be W3C developed
... beyond HTML5 there are specific requirements of DLNA e.g. for the video tag
... product-specific tests will be defined for these
... also e.g. multiple audio tracks e.g. for accessibility
... these additional tests should use the W3C framework and be made available to W3C
... creation of test media will be included
... extending test scope beyond W3C scope e.g. for manual tests is another objective
... Cert Orgs like DLNA would like to use a framework in the three areas mentioned
... (shows W3C & related specs of interest to DLNA)
... re Test Framework Requirements, shows gaps e.g. a single URL to the framework
... local instance of the framework - work in progress with W3C help e.g. Robin
... product devs need to create their own test suites - the framework does (kind of) support that
... multiple test formats e.g. JS, Ref-Test, Manual, are needed - unclear if all are supported
... need more flexibility in reporting individual results & how they are aggregated
... also need to include the Framework in nightly build tests

plh: anyone disagree with one or more of the requirements shown (slide "Test Framework Requirements")?

tobie: the requirements are great

<dbaron> Though some of the requirements do seem like nice-to-have and some really seem like requirements

Clarke: a comment - refining Web & TV is a goal of the TF starting next week - please sign up if you have requirements for that

Dan: anyone have ROM estimates for building such a framework as described?

plh: how far is the W3C framework from your requirements

bob_lund: it's a great starting point

darobin: one problem is syncing with the repository

bob_lund: recording is another major area

Mark_Vickers: this is not just about writing down ideas, we are looking for commitments on getting it done

Mozilla perspective

[slides]

jet: W3C can help us be both correct and fast in how we test the Web
... we need to address development focused tests, and compliance focused tests

<Clarke> Clarification of my comment above: The Web & TV Testing TF is starting next week. The primary objective is to define testing requirements for Web & TV use cases. Please sign up if you are interested in testing Web & TV stuff.

jet: development tests need to support experimentation, local results (not uploaded to a server), automation, inclusion in a regression harness, ability to extend to special APIs used in development
... compliance tests need to focus on stable UA versions, enable server-based results collection, support automated and manual...
... numeric scores vs % passed, having a specific number of passed tests to enable rolling up the counts per spec
... in summary, automation is key, focusing on testing vs keeping score, and lowering barriers to contribution

dbaron: one more thought on the testing vs scoring: the more broadly we give a score, the more time we spend arguing about the score and managing reports by others on the score
... we need to avoid debates on scores and weighting within a score

jet: we also have more granular requirements that we can provide

mark_vickers: re the scope issue, agree that W3C should not get into certification or posting scores - that would just cause fights

plinss: WG's need the ability to score to keep track for REC

jenleong: scores will be more useful to devs at the level of feature categories, showing how well each is implemented, vs a single overall score

tobie: devs need to know what features are supported where, understand this may have issues and puts vendors on the spot, but the value for devs is tremendous

tobie: e.g. external sites can access the results and provide that value to devs

jet: will format a list of the detailed requirements for later

girlie_mac: (agree) about value of making data usable externally
... curious about Mozilla's own web APIs, is it expected to support proprietary features in test results

dbaron: we write tests to include all that we work on, re.the "special powers" API this is something we use internally

Bryan: it's very important that W3C not be seen as implying a certification level; let's be data-focused so external providers can extract what matters to them. Having access to the raw counts is important

andrea: thinking about the web platform, it was mentioned that W3C will not be a cert body. but on webplatform.org there are caniuse-style tables - we need to make sure this is not taken as a cert statement

andrea: we should also have ability to import other data into webplatform.org

tobie: re webdriver, there is value in using it to automate reftests, we should look into it
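
[scribe note: a sketch of webdriver-automated reftests with the selenium-webdriver Node bindings; the URLs are hypothetical, and a real runner needs tolerant image comparison rather than the naive byte-for-byte check shown:

  // reftest-webdriver.js (sketch only)
  var webdriver = require('selenium-webdriver');
  var driver = new webdriver.Builder().forBrowser('firefox').build();

  function shot(url) {
    return driver.get(url).then(function() {
      return driver.takeScreenshot(); // base64-encoded PNG
    });
  }

  shot('http://localhost:8000/test.html').then(function(testPng) {
    shot('http://localhost:8000/ref.html').then(function(refPng) {
      console.log(testPng === refPng ? 'PASS' : 'FAIL'); // naive comparison
      driver.quit();
    });
  });
]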

plh: re test results, we look at webplatform.org as part of W3C, so access to the test results is a given

mark_vickers: we just don't want the data becoming a barrier, an unhelpful influence

bob_lund: we see webdriver as very useful; re the special powers API, it's similar in objective, and we would like to see it align with webdriver

Testing the Open Web Platform by Tobie

[slides]

tobie: as Facebook AC rep, this is our perspective in 3 parts
... it makes the life of Engineers easier
... Platform developers
... people using FB
... the end goal is to improve the Web; state of a spec is less important than use of a feature
... driving to fewer bugs, better interop, ...
... devs need to know feature support
... to focus browser development e.g. through CoreMob
... and to make bug reporting easier - there is no tracking of bugs across browsers - when we hit issues, we don't know how to report them to the vendors
... the framework should enable bug-driven tests to feedback to vendors
... including a social aspect to this for crowdsourcing, e.g. using github

jeff: re scoring or not, it seems to be a nice to have from FB's perspective. On this slide it seems more fundamental as a requirement
... it seems here to say this needs to be very visible

tobie: yes, to clarify, this is not intended to lead to browser vendor fights, just support devs

dbaron: no real disagreement, exposing results at a feature level is fine, but a higher rollup is problematic

???: a complication is when external sites use the info, there will be possibilities for number games

tobie: we believe this is a long term effort

tobie: including aspects for infrastructure, process, outreach, education, and data driven

chaals: We can't stop people from extracting some simple number, and there is a certain amount of motivation to do so. But we should clearly disdain such raw numbers as far as practicable.

mark_vickers: consumer reports protects their report numbers through license restriction

jeff: heard a very clear message on partnering with other orgs e.g. DLNA
... not getting into the scoring game
... want to cover specs without needing to hit every corner case

<chaals> [+4 to Tobie's point about this being a long-term exercise, BTW]

jeff: support different consumers e.g. devs, engineers

plh: we will cover how to increase test coverage after lunch

W3C Testing by Larry McLister

[slides]

lmclister: skipping to slide "test the web forward"

lmclister: part of getting more tests is more contributors e.g. TTWF - next is Syndney sponsored by Google
... preparing event kits to enable more events, smaller ocused group events, reviewer training
... growing the community by developing tests outside events, virtually
... re WG Documentation, it's hard to find the test suites, samples, know the review process, and know who is the owner
... who can help us with issues, links to WG resources, and backlog of tests and reviews needed

mark_vickers: have you looked at the cost of the events, relative to the end results, to estimate the cost per test produced, etc.?

rebecca: have tried to do some informal tracking of the events, but that has been really hard

lmclister: other test drivers e.g. selenium communities may also be leveraged to add tests

robin: what about the idea that some API groups have to just use the same single GitHub repository as the HTML WG for all the tests?

lmclister: a good approach

dan: for each spec we need to know where coverage is, as clear info to contributors

jeff: any sense of what #s of tests we could get outside TTWF - crowdsourcing is financially the most attractive, but we need to predict how well it will be able to deliver

lmclister: putting a # on tests we can expect is hard - we have historical data but that future prediction is difficult

jeff: if we put focus on funding critical-path tests, would that de-motivate the crowd-sourced lower-priority tests?

plh: are you considering webinar type events?

<darobin> [worldwide test hackathon!]

lmclister: we are trying to get the community online principally, not really/necessarily in a realtime webinar type context

plh: have you tried to approach and leverage universities
... are you looking at docs on how to write and contribute to tests?

lmclister: we are looking to the WGs to write docs for their tests

plh: so we need a central repository of the docs so testers know how to use/develop

darobin: things are currently spread all over

alan: you can't necessarily score past events as people are learning in the process

alan: one idea for a focused meetup would be to convert mochitests to W3C tests etc.; this would encourage those involved with a specific engine to get involved
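
[scribe note: the mechanical part of such a conversion is often a one-to-one assertion mapping; the mochitest-style lines below are illustrative:

  // mochitest style (assumed):
  //   var div = document.createElement("div");
  //   is(div.tagName, "DIV", "tagName is uppercase");
  //   ok(div instanceof HTMLDivElement, "prototype chain");
  // testharness.js equivalent:
  test(function() {
    var div = document.createElement("div");
    assert_equals(div.tagName, "DIV", "tagName is uppercase");
    assert_true(div instanceof HTMLDivElement, "prototype chain");
  }, "createElement('div') basics");
]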

tobie: one issue from Ringmark development is that knowing which specs belong to which WGs would help
... 2nd thing is that contractors need documentation/process similar to crowdsourcing, so we have to do that anyway
... we need to move faster on test approval, as we will lose test devs if it takes too long

jeff: I had an expectation that browser vendors would talk about internal tests, and aspects of how they are/can be made available to W3C - what are the obstacles etc

bryan: the first browser testing workshop covered a lot of that, it would be interesting to see what has changed

mike: the inertia in moving internal tests to the public, e.g. the different frameworks, was documented in the last workshop

plh: one of the issues we face is that we could throw money at converting tests...

lars: we have donated the bulk of our tests to W3C already

rebecca: some obstacles are the lack of clarity on how the specific things for the different test sources can be harmonized for use by W3C

Personal Views on the Test Plan, Masahiro Wada

[slides]

wada: intro to KDDI. mobile phones & software are complex and many tests are required. This talk is based on our experience. KDDI is a leading Japanese mobile company.
3M strategy: multi-use, multidevice, multi-network. Music, mobile, games, money on mobile devices, smartphones, e-books, PCs, TV, etc. Networks: 4GLTE, WIMAX, FTTH, 3G, WI-FI, CATV
KDDI wants HTML5 to underpin services in a hardware-independent fashion. Anywhere, anytime, seamless. E.g. download maps at home using pc/tablet, then while driving, the mobile network allows us to use this data on the road. We can also download additional data using smartphone, which then communicates via wi-fi with the vehicle.
note: this represents my personal view and not any organization, including KDDI.
the platform is HW-independent...
slide: Is HTML5 perfect? Implementation differences exist. Problems porting from other platforms. Security issues.
In Japan, content providers are waiting until HTML5 becomes more stable & mature. Some believe that this cannot be achieved by the current ability of mobile phones. Also, they have no incentive to move from native to HTML5 b/c they are successful in native already. "conducting wire" necessary for HTML5 to become successful for business
<slide: Goal of Tests>. To obtain trust from the industry that the open web platform is reliable, and...
<slide: Necessary Tests>. Specification-based tests: Works against the spec, performed mainly by W3C. User-viewpoint test: verify the functionality, quality, reliability of the products. Performed by industries. Test use cases not thought of by spec writers.
For specification-based tests: Who will conduct the tests? W3C? How will we keep on schedule? For outsourcing, tests must be well-defined in detail beforehand. Who will analyze and feedback the results? Are there sufficient tests? What about duplication? Who will make this analysis?
For user-viewpoint tests, crowdsourcing will be essential. The test platform will be mandatory.
<slide: development of test platform>
<slide: Development of Test Platform> We can read & write tests, and obtain expected results on this platform. External parties can also use this for their functional or interoperability verification.
<slide: Test Platform> Should be open to parties outside W3C. Need automation, management of a test content db. Mechanism to utilize test results must be provided. Also need user support (e.g. Q&A)
<slide: Way of proceeding> Dedicated test lab is necessary for executing, hosting, and developing tests. Probably not achievable on a volunteer basis. Analysis will also be performed in the lab.
Emphasis on keeping a schedule. We will wish to exhibit at an appropriate venue, e.g. CES

Tobie: Happy to see alignment on many issues. What do you mean by 'test lab'? Is it a working group?

wada: practical dedicated resources

tobie: so team who can lead day-to-day?

bob_lund: you also mentioned hosting

wada: and manage outsourced resources

stearns: We should have W3C-wide test owners

philippe: I agree except that we haven't had much success with test suite owners so far. They cannot be counted on to review the tests

stearns: the idea is rather for the owner to coordinate, to act as a contact point.

darobin: ... pull requests will make this easier. I've been doing this. If we keep up the plan of a shared repository, then this will be doable. The past process did not work, but the new one for HTML is better

plh: how do we hire these people?

rebecca: A lot of it boils down to what the owner is/isn't. If we define the expectations, then they can delegate tasks. Test suite owner may own a to-do list. This is similar to "test facilitator". Document the duties clearly. Set term (6mo, yr)

mike (mc): What is the model for moving forward? Should we be thinking harder about the process? Don't want a multi-year conversation. Let's define the next step. What are some achievable goals that we have resources for?

plh: tobie, do you want to do a document?

tobie: sure

Improving W3C Testing Activity with "Testing and Interop Lab"

[slides]

yosuke: This presentation is my own research from Keio Research Institute, not that of any other organization

yosuke: Keio University is planning to create a new lab for testing. Keio has money & resources to foster interoperability testing,
which can be used. Want to enable industries to adopt open web standards more quickly and easily. Feasibility study ends Feb. Tracer-bullet project starts Mar. Eval in June
<slide: Short and Medium Term Objectives>. Develop and polish testing tools and frameworks, testing infrastructures. Try to get industries to adopt living standards. Develop methods to test devices effectively & efficiently. ISO already has a framework for certification, but organizing the program is very heavy and ineffective. Need a more lightweight framework for certification. Get them to use W3C tests for their certification.
<slide: My Today's Agenda> Using the tracer-bullet project. It has a small budget and small resources. Want to use this meeting as a point to get ideas, prioritize them, and align with W3C
<slide Initial ideas on how to improve HTML5...> For crowdsourcing, we need: visualization of test-suite status. Also need links between spec features, docs, test code, reviews and central test runner. Also need gamification
<slide Initial ideas on how to improve HTML5...(cont'd)> Need fully functional HTML5 spec doc. Link to corresponding test code in github. Let test writers "reserve" the part of the spec they are going to write tests for. ...
<next slide> Write test code the UA manufacturers don't have time to. Organize test writers from SE Asia & India academia. Refine existing tools & integrate them. E.g. improve idlharness.js with better automation. Video tests are the hard part from the point of view of browser vendors b/c checking test results could slow down the testing site significantly
testing visual elements ...
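
[scribe note: for reference, the idlharness.js flow that such refinement would build on looks roughly like this; the IDL fragment and test object are illustrative, and the helper script names are approximate:

  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <script src="/resources/WebIDLParser.js"></script>
  <script src="/resources/idlharness.js"></script>
  <script>
  var idl_array = new IdlArray();
  idl_array.add_idls("interface Example { attribute DOMString name; };");
  idl_array.add_objects({ Example: ["window.example_instance"] }); // live objects to check
  idl_array.test(); // emits one subtest per IDL assertion
  </script>
]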

<darobin> [it has been mentioned before to use something like jsFiddle plugged into test submission]

<Mike5> "Organize test writers from .. academia"?

jeff: Thanks for the offer to partner with Keio University. What is the quantity of test cases we could count on from the Keio initiative? We need lots of test cases, lots of sources of writers.

yosuke: We have not decided to use our resources & money to improve individual specs or platform or framework. As for the tracer bullet, we have concrete budget. Once we decide what we want to do, we can figure out how much we can accomplish.

<Zakim> tobie, you wanted to comment on having a centralized place for info on test status.

tobie: Two previous speakers mentioned the necessity for a place to keep test data, what was covered, what was missing, in the specs themselves. This is a key deliverable for the near future. I'd like to add more discussion on this when we dig into the test frameworks. How do we get the tests and which tests do we write?

darobin: You mentioned mediawiki. We haven't prototyped this but perhaps we could use jsFiddle for people to enter tests. It's familiar to developers, you just enter your code. We can include testharness directly in it. Leah, who is on the W3C team, could help us

plh: what is jsFiddle?

<girlie_mac> http://jsfiddle.net/

darobin: online code editor that allow execution

dan: question for robin. Do we have an idea on how much coverage we have?

darobin: Philippe has started this document. It's not up-to-date with the latest spec and needs work, but we have the basics. For each section, we have the tests and how many

darobin: some of the sections will have 0, but once we have the data we can incorporate it into the spec itself.

darobin: it's all automated. previous version uses metadata about which section it's for. however copy-paste issues arise. to help people get it right, the tests get put into a directory structure which maps to the spec.
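
[scribe note: a toy version of that automation, assuming the directory-maps-to-section convention; Node, root path hypothetical:

  // coverage-count.js (sketch only)
  var fs = require('fs'), path = require('path');

  function countTests(dir, counts) {
    fs.readdirSync(dir).forEach(function(name) {
      var p = path.join(dir, name);
      if (fs.statSync(p).isDirectory()) {
        countTests(p, counts);
      } else if (/\.html?$/.test(name)) {
        counts[dir] = (counts[dir] || 0) + 1; // one bucket per section directory
      }
    });
    return counts;
  }

  console.log(countTests('html', {}));
]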

tobie: It bothers me that we know how many tests we have for each section, but we don't know how many we need!

tobie: is there any way to measure coverage in a more precise way? Can we get information from parsing the specs? I will try again because there could be value in it. Otherwise it needs to be done by hand

mchampion: I've heard a lot about HTML5... Do webapps & HTML use the same test framework?

darobin: yes. We will move to using github for everything and this will simplify things.

mchampion: someone has to put in the metadata

darobin: the tests don't care what wg they belong to. but we do have a way of mapping back to specs that works.

plh: good transition into the specs

plh: television/mobile profile. I would like to get a sense from the room about priorities. If there were only room to do one in 2013, which one would we do? HTTP 1.1, Web Origin Concept, ECMAScript 5.1. Raise your hands if you think it's a priority

Goals and Requirements

<plh> http://www.w3.org/wiki/Testing/Common

plh: <reading off groups of specs to vote on>. This is a complete list. <entering list into IRC>

<jenleong> HTTP 1.1: 0

<jenleong> Web Origin Concept: 0

<jenleong> ECMAScript 5.1: 0

tobie: coverage is actually very good for ECMAScript

<jenleong> HTML5 Canvas 2D Context: 16

<jenleong> HTML5: most of the people in the room

<fantasai> tobie, test coverage is hard to measure unless you break out a spec into test assertions (*not* the same as 'testable assertions')

<jenleong> CSS 2.1: 6

<glenn> CSS2.1+, CSSOM+, DOM4+

rebecca: this is being broken into other specs.

bryan: but some work needs to be done in 2013?

<jenleong> CSS 2.1: 2

<chaals> html5 canvas, h5, over CSS 2.1

<glenn> CSS2.1 ++

<chaals> CSS animation we would like tests for

plh: spec is unstable?

<jenleong> CSS Animations: 13

<fantasai> tobie, Melinda Grant and I did this for css3-page back in 2008 or so; I can show you that as an example. The actual number of tests was much higher because we didn't break it down quite enough in the first pass...

<jenleong> CSS Background & Borders: 17

<jenleong> CSS Color Level 3: 1

<fantasai> RRSAgent: pointer

<chaals> transform: probably

<jenleong> CSS Transforms: 20

<jenleong> CSS Fonts Level 3: 12

<chaals> transitions.

<jenleong> CSS Transitions: 21

darobin: Rodney Reihm already wrote a test suite for this

<glenn> +1 for CSSOM

<glenn> +1 for CSSOM View

<tobie> fantasai: would love to see this and get your feedback on this effort.

<glenn> there is considerable divergence in behavior in CSSOM among UAs now

Bryan: what is everyone's criteria?

Mike: it's in CR

Bryan: we care if it's in Coremob 2012

<glenn> +1

stearns: if something is shipping but the spec isn't solid that is priority

<jenleong> CSS Object Model: 15

plh: CORS just moved to CR

<jenleong> CORS: 8

<jenleong> DOM 3 Events:

Bryan: not referenced by Coremob?

tobie: I added it

<jenleong> DOM 3 Events

<jenleong> skipping that one

<chaals> D3E over DOM4

<jenleong> DOM 4: 11

<jenleong> Progress Events: complete, only 1 vote

<jenleong> Web Storage: 1

<jenleong> XHR: 17

<dbaron> did plh ask for hands for Web storage?

<jenleong> Web sockets is not common enough

<dbaron> er, sorry, workers

<jenleong> he didn't b/c it's complete already

<jenleong> Web Workers: 16

<jenleong> Web Sockets: 8

<jenleong> Indexed DB: 14

<jenleong> SVG: 0

<jenleong> WOFF: 2

plh: a lot of people chose HTML5. How shall we test this? Crowdsourcing, vendors, outsourcing?

<bryan> http://www.w3.org/2011/10/28-testing-minutes

bryan: i'm linking to the minutes from the previous workshop

darobin: if we try to buy tests from a company, we may get low quality. how about we pay a company to convert the tests from one system to another?

Jet: There are a lot of our tests that can't be put into a boilerplate, with server/client components. When exceptions are thrown outside a test/test_step function, they don't cause test failure.
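
[scribe note: the pattern at issue, for concreteness -- testharness.js only records a failure when the exception is raised inside a test or step, so async callbacks are wrapped, e.g. with step_func; the event used here is arbitrary:

  async_test(function(t) {
    var img = new Image();
    img.onload = t.step_func(function() { // a throw here fails only this subtest
      assert_equals(img.width, 100);
      t.done();
    });
    img.src = "green-100x100.png";
  }, "image decodes to expected width");
]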

jet: test functions have longer names. other limitations of testharness.js: can't run server-side code and capture results

jet: reasons for this complexity: avoid relying on window.onerror, and to be able to put several independent logical tests into a single file while having them pass/fail independently of each other (a test failing may cause downstream tests to not run and report failure)

jet: Doesn't seem to be much value for either of these. We are looking at webdriver to circumvent. We can set browser preferences, go outside of browser sandbox (check back button enabled in history stack > 1, e.g.)

jet: currently webdriver doesn't have a good way to wait for given events

jet: if we want these tests in the w3c framework, we need to get these to other browsers in a secure fashion

<Zakim> andreatrasatti, you wanted to ask about device API's like geolocation, device orientation and other sensors

<Zakim> tobie, you wanted to comment on async test in testharness.js

andrea: i will speak later, about prioritization

tobie: I used testharness a lot. None of your changes pose a problem. I'm concerned with special-power APIs that don't have anything to do with specs (e.g. back button functionality). Standardizing on a server-side component may be necessary, so we can include what you need.

<chaals> [The problem in picking a server-side standard is that it is untested for interoperability… and testing servers is a useful thing to do]

jeff: most of the issues identified were syntactic issues. Even if we outsourced it, it would take less time to put some syntactic sugar on your test cases than to generate them from scratch

<Zakim> darobin, you wanted to clarify about the server-side requirement

darobin: it's easier to have no server-side testing, but how do you test that stuff?

<Zakim> andreatrasatti, you wanted to ask about device API's like geolocation, device orientation and other sensors

dbaron: we run server-side code locally, it's just not on another machine. however, what we run is probably not portable to others

plh: We have enough work already to take us through a few years. if your company can work on _________, you are certainly welcome. We are trying to prioritize what to do with our current resources

<chaals> [Robin's point is important. Because for different stakeholders, different tests matter (or not). But this assumes the cost of handling that exercise is cheap enough not to worry]

someone: there are 3-4 things where almost the whole group voted for them.

mark: regarding webdriver, does it need to be more integrated into the mainstream tests?

jet: we support moving functionality to webdriver, but the webdriver spec is missing key use cases.

jet: we're happy to share our tests, but we would want them to come back upstream. They are currently organized into folders. We would need to set it up to handle that

<chaals> [so provenance metadata in the test would help you, jet?]

<bryan> re Andrea's comment, we developed spec priorities through the profiling efforts e.g. under CoreMob and Web&TV. Beyond that I prioritize based upon availability of tests, automation of tests, integration of tests into the W3C framework

bryan: responding to andrea. The importance of context should not be missed. Device-specific features like geolocation are more important for phones than TVs. Also prioritize those tests which are more incomplete.

plh: let's move to tooling for 2013.

<tobie> https://gist.github.com/4668636

tobie: Took Bob's work and did copy/paste
... An agreement on requirements for test framework
... so we know what we need to build

<fantasai> chaals, I'm not sure that exactly is necessary, but there needs to be careful tracking of which copy is master and which is slave; and if that relationship switches, that needs to be updated in both systems as well

tobie: [reads requirement list]

Test Framework Requirements
===========================
- Single URL to W3C Framework.
- Ability to use the framework to run the tests locally.
- Ability to define and run test suites for specific profiles.
- Single test run.
- Ability to run testharness.js, ref and manual tests (formats sketched after this list).
- Reporting individual and aggregated results.
- Allow browser vendors to run the tests as part of their CI strategy.
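
[scribe note: of the three formats in the list above, testharness.js tests self-report results, while a reftest pairs a test page with a reference page that must render identically; sketch below, and rel="mismatch" also exists for must-differ pairs:

  <!-- test.html -->
  <!DOCTYPE html>
  <title>Reftest: green square via CSS</title>
  <link rel="match" href="ref.html">
  <style>div { width: 100px; height: 100px; background: green; }</style>
  <div></div>

  <!-- ref.html renders the same square by other means; a runner compares
       screenshots of the two pages. Manual tests are the fallback when
       neither self-checking nor comparison is possible. -->
]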

DBaron: Some features on list push to manual; some to automation
... "manual" ones are dangerous... will lead to manual
... seen it at CSS
... so let's deemphasize manual stuff

BobL: Agree that automatically is best

DBaron: So drop "single URL to framework"

<fantasai> I agree with that, but think they still need to be handled somehow; that's the escape hatch we have for anything that we can't automate (or can't automate yet)

[discussion whether this is done already]

MarkV: That one doesn't drive you to manual

DBaron: For ref tests it would drive you to manual

<fantasai> that would be bad...

DBaron: encourages manual operation
... would like to gather data through automation

MarkV: That was not the intent. We can add requirement that everything should be as automatable as possible.

BryanS: +1 to Mark

DBaron: Consider Khronos. Load a web page and press a few buttons.
... this should be firable from the browser

MarkV: But the Khronos thing is automated

Tobie: Conflicting interests. Some people feel manual could be a feature

DBaron: Issue is design point. What does it encourage people to do. Should be automated.
... not concerned about minutiae - just the result of what it encouraged people to do

Bryan: Problem with single tests?

DBaron: Depends on how it is presented.

Bryan: That is an education issue.

PeterL: Yes, you should be able to see them manually when needed

MarkV: As automatable as possible. Solves it for me and David.

[wordsmithing of requirements list]

PLH: Don't need to drill more.

rhauck: This is a running issue; not like the other broader issues

<fantasai> Yes, it should be possible to pull up an individual test and examine it. Somehow.

<fantasai> Yes, we should be able to aggregate results from manual tests, for cases where we cannot automate particular assertions

Tobie: I also want to talk about resource center for data and documentation.

dbaron: I think I want the requirement to be that it should encourage running the tests in an automated way, through the way it works (not through text).

Tobie: this is just tool for running tests

<fantasai> And yes, we should design the systems to encourage automated tests and automated running of tests.

Tobie: I will say "encourage automated testing".

PLH: Anything else on tooling?

hober: enthusiastically agrees with dbaron

<fantasai> And not to encourage the manual running of automated tests, since that wastes manual effort

<tobie> http://tobie.github.com/w3c-testing-plan/unofficial-w3c-testing-plan-20120116.html#create-a-centralized-resource

Tobie: This document (starting at 3.3.2) provides a centralized resource to host test activities
... writing, running tests; consuming results

<dbaron> Tobie's first and second requirements also *sound* contradictory (single URL vs. ability to run locally), though I think they're not meant that way. (single URL where you *can* find/run all the tests, not a single URL that is the only place for the tests)

[reads 3.3.2 of document]

<dbaron> (Tobie's requirements being https://gist.github.com/4668636 .)

Tobie: Tooling also helps review process
... Is a centralized resource useful?

All in unison: Yes.

Dan: Can we do all of this in 2013?

Tobie: Documentation and test coverage is first priority
... Q1
... test harness let's you run tests
... test framework is second step. not a hard requirement.

Jennifer: With TTWF, should this be community based to leverage branding?

Tobie: Yes. Adobe is amazing. Also Web Platform stakeholders.
... something agile, work on quickly, find a home for this

Dan: What can we expect about test coverage?

PLH: Later in agenda.

<Zakim> bryan, you wanted to suggest that tools (e.g. wiki) aiding in the organization of the overall activity is a key first thing to put in place

BryanS: Organization over next few months is key.

Tobie: Yes. Otherwise become inefficient.
... wiki won't scale.

PLH: Topic: Documentation.

RBerjon: We need centralized documentation
... First introductory stuff
... Many people have tried and are lost trying to provide tests
... cf. TTWF
... need simple templates
... set expectations about time to review submitted tests

Second, documentation that digs into the spec: conformance, etc.

PLH: Assuming our framework is stable

RB: We document what we have. If we change something, then we change documentation.
... and 20K tests are already using harness
... new stuff would need shim

Dan: Representation has converged to be on github?

RB: Yes.

RH: Lots of information out there... much inconsistency
... Need audit, scrub, and consolidation

<bryan> re Robin's proposal to collect documentation, I agree that introductory guidelines are essential to getting new test submitters onboard. It exists but needs to be coalesced under an easy-to-find access point.

RH: what is authoritative?

RB: +1
... Today it is confusing... search engines send you to different places

[War stories of trying to find information]

Robin: i made this at one of the TTWF events. http://documentup.com/paulirish/testharness.js

RH: That said, there can be WG specific direction as well.

TL: Sure; small specifics within a coherent framework

RB: WG specific stuff should also go into central directory
... 5 years later, everyone may do the same thing.

PLH: Break; followed by discussion of priorities

plh: Paul, can you talk about WebKit's perspective?

Paul Irish: I'm Paul Irish, dev rel team for Chrome. Not a WebKit engineer, but can speak to my experience.
... The WebKit tests -- I took on trying to upstream WebKit layout tests at the first testtwf event.
... There's a big challenge because the way the tests are organized doesn't have a clear mapping to specs.
... So any bulk upstreaming is tough.
... Also, many tests are WebKit-specific and not part of standard.
... I did end up finding a few unit tests that were successfully upstreamed and migrated to testharness.
... But it's a very manual process.
... I think the tricky thing here is to find a way to make sure that the w3c tests are able to be pulled down and run against builds of WebKit as it's being developed.
... That seems like the best way to ... as far as helping WebKit to understand that this can be valuable ... is demonstrating that ... I've seen features (pointer lock, IndexedDB), where Mozilla authored some tests and WebKit just used those.
... Showing that again and again would make the case that it's worth adding tooling around the testing infrastructure for WebKit to allow running all the W3C tests continually.

plh: We have several companies here using WebKit already.

Jeff: So the characteristic of WebKit tests not necessarily mapping to w3c specs -- I don't know if that's a bad thing or a good thing.
... The reality is that a lot of what we're trying to do is test the Web platform.
... While we often map it to the specs, if WebKit has good tests that go across specs in interesting ways, that would be an important enhancement to have in the w3c test suite.
... So not sure exactly what you meant, but interested in clarification.

PaulIrish: I'd be fine with seeing tests that go across a few specs.

MarkVickers: Since we're getting towards the action item part of the day -- hearing this and what we heard from Mozilla, it seems like WebKit, Mozilla, Microsoft (Opera, it seems, already did) should each generate a written engineering plan going through the issues of what needs to be done for contributions and tools.
... Is that something we'd need to fund, or ...?

hober: Just to respond to Jeff's question -- the WebKit layout tests are primarily organized on functional basis.
... Sometimes the folder structure is an accident of history.
... I think that's something we'd like to improve over time.
... Overall, I think the platform tests we're talking about here have ended up being organized by WG or by spec (Conway's law, code structure matches org structure).
... We can try to combat the inertia of doing this per-wG or per-spec.

<darobin> +1 to hober on combating Conway's Law

<jeff> David: Mozilla has a similar problem

<jeff> ... lots of tests not organized by spec

<jeff> ... much of the work is figuring out where these tests go

Next steps

plh: So I think one thing that became clear is that we want to do a framework.
... If we want to get serious about it, we should create a task force so the people who are interested actually set up a proposal for what this framework is about within a month or so, and come back to this group with a proposal.

Robin: Can we reuse public-test-infra rather than a new task force?

Jeff: To me a task force is somebody leading it, and others working on it.

Robin: We don't have a chair for that IG, do we?

plh: Who'd be interested in ... the framework is obviously a priority in 2013 ... who is interested in leading a task force on a framework?
... ... or finding out if we should write a new tool.

Tobie: I can lead a Task Force on test suite framework.

plh: Who would like to be part of this task force?

Framework Task Force: Tobie (lead), Jet, Yosuke, Aizu, Robin, Larry, Rebecca, Bob

<bryan> Three main focuses for work being needed: organization of the work (e.g. documentation/outreach, prioritization, collaboration/task management), test framework development, test asset development (e.g. importing or new test development).

<hober> +1 for re-using the existing mailing list

<Mark_Vickers> Volunteering Bob Lund, who had to go to airport.

<bryan> The current discussion is for one of those three focuses.

plh: the goal of this task force is to tell us what we need to do on the framework, and whether any existing frameworks can fulfill those needs

Bryan: I just dropped something in IRC. I see that as essential, not necessarily something we have the skills to support. 3 areas of focus: framework, test assets, organization of this whole thing.
... We're definitely willing to support the latter two; we leave the framework to the experts.

plh: Do we need a separate task force to deal with test submission?

Tobie: It should be separate.

plh: And this task force on test management systems would need to review what is being done across working groups and propose a solution that could be adopted across working groups.
... Process and tooling around test management.

Tobie: I'm happy to chair that too, but I think they should be two different things.

Bryan: Documentation, outreach, prioritization.... everything other than framework and test asset development.

Tobie?: documentation should be separate

plh: In terms of crowdsourcing, need to help people submit tests. This task force is about figuring out what's there.

Bryan: We could support that effort, depends on how much effort it is to do it completely.
... If we could have a collaborative exercise...

plh: Could you lead the task force or would you rather have Tobie do it?

Tobie: If you feel like this is something you want to lead, please do, but if you won't have the time, please don't.

Jeff: My feeling is that we need a 4 week sprint and then get around the next level down. So if people are thinking about volunteering either to participate or to lead, think about whether it can get done within 4 weeks.

plh: So Tobie as well for test management systems, then.
... So who wants to be part of test management systems task force?

Bryan: Would the management system also include results database exposure?

Tobie: probably not

?: I'd expect it to be part of the framework

Robin: Let's make the bricks as small as possible.
... lifecycle... documentation ...

plh: On the documentation aspect, I wonder if we create a task force on this one?

<darobin> same list of volunteers as the previous one minus Aizu

Test Management Task Force: Tobie (lead), Jet, Yosuke, Robin, Larry, Rebecca, Bob

plh: On the documentation front, do we need a task force?

Task force on documentation: Yosuke, Robin, Tobie (lead), Rebecca, Bob

plh: Big chunk remains the test coverage.

plh: One is figuring out what we cover.

Robin: i.e., what is the existing coverage

Rebecca: Are you talking about finding the denominator?
... We have 100 tests out of what?

Bryan: It's a research effort, documentation effort, and then prioritization effort

Rebecca: I'd argue it's about a gating consensus effort

Robin: Better to do it and then let people complain if they disagree.

plh: In the list of specs we discussed earlier, we had at least 5 with >15 people in support
... It seems to me those are the first specs we should start with.

Robin: Not the easiest ones.

plh: Robin, the HTML Testing TF doesn't have a lot of resources?
... If we wanted a test coverage report on HTML5 test suites and proposal on what we need to do, could we count on the HTML Testing TF to do that?

Robin: That would end up as an action item to me.

plh: can you do that in 4 weeks?

Robin: At least something rough, yes.
... As long as I can convince the HTML chairs that it's a priority for editors.
... One thing I can easily do is list tests-per-section and contrast that with simple heuristics (normative statements, lines of algorithms).
... And probably a few other metrics.
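
[As a rough illustration of the heuristic Robin describes, a counting script might look like the sketch below. This is a sketch only: the input file name and the heading/keyword regexes are assumptions, and counting RFC 2119 keywords is a crude proxy for testable statements, not an authoritative metric.]

    // count-normative.js -- run with: node count-normative.js spec.txt
    // Counts RFC 2119 keywords per numbered section of a plain-text spec dump.
    var fs = require('fs');

    var text = fs.readFileSync(process.argv[2], 'utf8');
    // Naive split on lines that start with a section number like "4.8.11 ".
    var sections = text.split(/\n(?=\d+(?:\.\d+)*\s)/);
    // Longest alternatives first so "MUST NOT" isn't counted as "MUST".
    var keywords = /\b(MUST NOT|MUST|SHOULD NOT|SHOULD|MAY)\b/g;

    sections.forEach(function (section) {
      var heading = section.split('\n')[0].slice(0, 60);
      var matches = section.match(keywords) || [];
      console.log(matches.length + '\t' + heading);
    });
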

ACTION: Robin to report on the HTML5 test coverage

ACTION: Rebecca to report on the CSS transform test coverage

plh: What I'd like to do with the data when it comes back to us is go around this group again and see who can sign up to do what.

<Mark_Vickers> Please add these six specs to the Common spec list: CSS Selectors, CSS Media Queries, Web Messaging, CSS Image Values and Replaced Content, CSS Text, CSS Values and Units

Bryan: Assessment and prioritization, a two-step process.

Jeff: What was the task Robin just got? How many tests HTML5 needs total? Or how many we have?

Robin: Both. Find what we have, and what relationship to an estimate of what we need.

Jeff: So expect, in 4 weeks, two numbers of the form: we need X tests, we have Y already, most of them in browser test repositories.

Robin: I wouldn't expect to go through browser repositories in 4 weeks.
... We can compare number of tests per section against the size of section.

Jeff: Can browser vendors help by looking at tests that already exist?

Alan: It would be easier to hand the browser vendors a specific section and ask if they have tests for it.

Jeff: Would you be able to do the identification early enough that you can hand browser vendors the list in 2 weeks?

Robin: If I put it at the top of my list I can come up with a rough and ready estimate in 2 weeks.

Rebecca: If it's per-spec, I could do transforms.

plh: We were going to start with HTML5....

Jeff: But you had a volunteer...

plh: that would be great
... Jet, Mike, Paul, David: could you answer whether you have tests for X in a few weeks?

Jet: We can respond to that email.

mcham: What are we assuming about shims or conversions?

Robin: Really depends on the browser vendor

lbolstad: Should also add Opera to the list of people to send email to

Jeff: In your response, you could say that we have tests to cover the section, but we'd need the following types of shims to adapt it.

Robin: They could, e.g., send sample tests and we could figure out what's needed for a shim.
... Opera has made all of their tests available, but the vast majority are not integrated into the current repository.
... Some are in the submitted section and not in pull requests, others are on a public server that Opera has made available but not integrated into the repo.
... It's a lot of work to go through all of Opera's tests and integrate them into the repo.
... They don't necessarily map onto the structure we have, even though they do use testharness. Would be great to have time from James for doing that.

lbolstad: Opera will happily assist in converting our tests

plh: Any volunteers for any other specifications that were on the list?

Bryan: We have a list; I have a working analysis of where we have tests against CoreMob 2012. But they all have to be covered.

Robin: While I'm doing html I can do canvas.

plh: I'm wondering if you're going to come back and say "we have 0 tests for X" or "we need Y tests"

Bryan: Both. Find the state, then talk about the need.
... Then what we tackle is prioritization

Tobie: Format... what do we need to measure? Per-section vs. global across the spec, percentage vs. ???

plh: I think needs to be per-section at the minimum.

Tobie: We need to agree what it is and standardize it.

plh: How many tests we have and what we think we need, per-section.

Tobie: How deep within sections?
... high-level sections or each heading?

plh: I think the deeper the better

Tobie: agree on a format and a place to store it?

Robin: now?

Tobie: Before everyone goes and does it

Alan: I think we should try starting and iterate, on the list.
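
[A strawman for the per-section record Tobie is asking to standardize might look like the sketch below; every field name here is hypothetical and would need agreement within the task force.]

    // coverage-entry.js -- strawman shape for one per-section record.
    // All field names are hypothetical, pending task-force agreement.
    var coverageEntry = {
      spec: "html5",
      section: "4.8.11 The canvas element",
      have: 34,          // tests currently in the W3C repo for this section
      need: 150,         // rough estimate from normative-statement heuristics
      sources: ["w3c repo", "vendor submissions"],
      notes: "estimate to be refined as conversion work proceeds"
    };
    console.log(JSON.stringify(coverageEntry, null, 2));
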

Bryan: Earlier point about prioritization. All the CSS stuff is in a bundle, and I'll need to talk to you about it... a whole bunch of things for which we have no tests.

Tobie: I'll see if I can have a look at workers (soft commitment).

plh: One thing I'm curious about: we talked to vendors about being able to produce some tests, and they came back with various numbers.
... How realistic would it be, once we know what we need to produce, to try throwing money at vendors to write some tests? Or to hire some engineers to write tests?
... And probably some browser vendors have been exposed to tests produced by such a vendor.
... Was it a good experience?

<Zakim> kaz, you wanted to ask whether we should ask the other browser vendors about their tests or not

kaz: I was wondering if we should include ???

Jeff: [list of other people who might have tests]
... I wanted to support the idea of having multiple tricks in our arsenal and of managing outsourced vendors very carefully.

Kaz: will try to create a list

Jeff: I suspect that by the time we're done there will be a large number of testcases that have to be written or transformed.

<fantasai> plh, it depends highly on who you're hiring.

Jeff: With a large programming task... if members want to create a ... I don't see W3C staffing this in the long term. Help from the outside is something we could do (maybe from lower-cost countries). We don't want to take an off-the-street programmer who's never heard of the technology before.

<fantasai> plh, if they're competent, then it's good! if not, then it's a lot of time spent reviewing and rewriting the tests. (I've had both experiences with contractors hired by HP and Mozilla, respectively.)

Jeff: I'd also want to report on experiences at CES; I met with some companies, and discovered that it's not the case that we're just finding unvarnished programmers off the street who don't know about Web technology. They shared descriptions of some testing they'd done with a number of vendors across the world related to Web technology.
... I was surprised that in each case they showed me the HTML5 set-top box that they had each developed. They're starting to invest in our technology, not just doing outsourced testcases. But also every organization has its stars/needs to be managed carefully.

<fantasai> plh, familiarity with Web technology is not enough; you also have to be able to think like a tester

<fantasai> plh, and read specs meticulously

plh: Thing I forgot earlier -- improving testharness.js. Bottom line is working with whoever from Mozilla, Microsoft??,

mcham: Who'd step up with improvements, to make it easier for you to submit tests?

Lars: and James from Opera

Tobie: I'm happy to do a more-specific effort on testharness.js

ACTION: Tobie to look at possible improvements for testharness.js

Jet: sounds fine.

Tobie: or fine within other task force

plh: Also people from the WebKit side...

Alan: Two issues: Make it easier to upstream existing tests, and make it good enough that vendors want to use it for their internal tests.

Tobie: Could we have commitment from browser vendors to help with this?

plh: Jet said yes; Paul will try to find someone

Tobie: Can I have points of contact: Jet, Paul, hober, ...

fil: ... cordova ...

mcham: Kris for MS

ACTION: Tobie to investigate and report on converting existing vendor tests into W3C tests

Mark: Looking at this whole project... the key resources are engineers. There are other companies who may not have the right engineers, but might be able to provide some money.
... We have a large number of members, I expect more who could contribute money than engineers.

Robin: And we know of at least 2-3 freelance people who would be good.

Mark: I can't go and ask for money unless there's a plan.
... I'd encourage that.

Bryan: If we have a proposal for specific things that could be sponsored, we can consider them.

plh: That's the point of these task forces.

Jeff: I appreciate the offer from Mark and Bryan for financial support; I agree we need to size the problem first. Typically, when we have an additional effort above and beyond the normal effort, we'll send out a notice to the AC and get a sense for that. Preparatory work is helpful.

<glenn> 10M is probably a better estimate

Jeff: Earlier today, Philippe said it's going to cost $1M. Then people said there are tests around, or added other things we need to do. How you approach that from a funding perspective...
... If there's any conversation among the membership that's useful in the next few weeks, or should we hold off after reconvening in about 4 weeks?
... In my mind you represent large industries that have an interest in this but not necessarily a lot of engineers to contribute.

Mark: I've already asked up the chain some; they want to see a plan.

Jeff: I'm talking about communicating more among the membership.

Mark: I have a list of overlapping members who might be interested.

Bryan: If we want to outline a program, and begin to share that with stakeholders to get their feedback, I think that'd be valuable.

Jeff: To be very practical, I think that from the day that some AC rep wants to help with such a fund... that's when they start the conversation inside their company, and that takes time.

Dan: I talked with my management... they're willing to help with test running... they want a proposal and a ballpark estimate.
... Some vendors are not able to give reasonable proposals. So we're looking to W3C for a concrete plan we can take to management.

<darobin> [I wonder if some companies that have engineers but not necessarily with the required knowledge would be interested in training around this]

plh: We know some vendors we could get a price estimate from... not saying we should go to vendors for the plan

<fantasai> Wrt any infrastructure we build... let's make sure to get some UX people involved up front!

<fantasai> We've got people like plinss who can think through backend architecture, but the front-end architecture is also important...

Yosuke: Quick comment about financial support: the test ??? project. Several options: hiring programmers, or money to hire them. We'd also like to see and discuss the task force plan, and if it's a good option we'd support it.

<Zakim> darobin, you wanted to mention training

plh: ...

Robin: An idea in case it's interesting: I've noticed in speaking to companies about testing that in some cases companies do have engineers they'd be willing to dedicate to testing, but those engineers understand Web tech without knowing how to write spec tests.
... Wondering if we could provide some training to train engineers... send somebody to a company to do training for writing tests?
... Would make it possible to contribute tests afterwards.

Tobie: An enterprise edition of test the web forward?

Bryan: Train the trainers?

Robin: Need a course that people can follow.
... There's a difference between documentation and a course.

<fantasai> Robin: Much cheaper to come up with $5000 to bring over an expert than $50,000 to hire a test-writer. And increases the capacity to contribute tests.

plh: Need documentation first

<fantasai> Robin: [ facilitates cross-company collaboration, etc ]

Dan: What's going to be the next step after the 4 weeks?

plh: Report results to the participants. Then it depends on the proposed actions from the task forces.
... Figure out how much it's going to cost.
... ... resources to improve what we have...
... Similar one for the test management system.
... For documentation, less clear.
... ... what conclusions can the documentation TF come up with?
... What do you expect the docs TF to do in 4 weeks?

Robin: I'd expect, maybe in 2 days, to figure out what topics we need docs on as a priority
... And where we're going to host the docs
... Anything that allows someone who wants to write docs to know what to do

plh: One other thing to explore: relationship of this project with webplatform.org

Tobie: I can take that action item

mcham: One reason webplatform.org is not w3.org/webplatform is that it's broader than W3C.
... At least webplatform.org is somewhat neutral.
... There could be a test subtree parallel to the docs subtree; I don't have strong opinions.

Dan: A question: for the test framework, we're talking about ??? profile. We should be able to repeat tests if there is a failure; a vendor may need to rerun to verify. Can we take a snapshot to repeat the same tests at a different time?

Tobie: I think that's technical details for within the TF.

Mark: On webplatform.org, I think you have to think about communities. For both of these we're trying to get Web app developers as part of the community.
... If we get people to come for the Web platform... think about it as a community. So they should be at least integrated, if not the same thing.

plh: Going back to the platform... if we need X for documentation, the next task force needs to figure out who is going to write those ??? tests.

Tobie: There are some people who gravitate around W3C and might be available.

plh: Regarding ??? on coverage itself. It's a matter of going around the table and seeing who can contribute tests.
... Then estimating what it takes to write tests.

Tobie: Or maybe people in here willing to lead crowdsourcing.

plh: Sure. I don't want to prejudge methods for coming up with tests.
... Anything else we need to discuss / conclude?

Bryan: I was going to send a note... describing the approach.
... We talked about HTML5, CSS, other things.
... As part of the CoreMob CG... we need to determine, based on that work, how it fits into prioritization in this effort.
... My assumption is it's guiding prioritization of this effort.
... To clarify, prioritization is important.

plh: We're going to have WGs looking at the results of this meeting.
... Maybe we motivate the WGs to write more tests?

kaz: Minor question-- which group do the TFs belong to?

plh: Not attached to a WG
... the Web testing IG?

Tobie: What's the point in doing that?
... That would mean I'd need to get legal approval to join the group I was going to chair.

plh: I wouldn't expect IP commitments. Not a group under the patent policy.

<bryan> here is a link to the email that I mentioned, further describing the priority approach I recommend for test asset development: http://lists.w3.org/Archives/Public/public-test-infra/2013JanMar/0005.html

plh: Not sure if there's value in doing this. The concept of a TF is outside the process. And it's not working on a REC-track document.

Robin: I think we should use the public-test-infra mailing list but not make it part of the group.

Bryan: We're interested in helping that IG; don't want to chair an empty group.

<Zakim> kaz, you wanted to wonder which group do the TFs belong to

Bryan: I dropped link to the email in IRC.

<andreatrasatti> thank you everyone

plh: Thanks to everyone for coming, Mozilla for hosting at short notice, and to scribes.

Robin: Does anyone object to making the minutes public?

[no objections]

<chaals> ok, have a good evening folks.

<fantasai> bye~

Summary of Action Items

[NEW] ACTION: Robin to report on the HTML5 test coverage
[NEW] ACTION: Rebecca to report on the CSS transform test coverage
[NEW] ACTION: Tobie to look at possible improvements for testharness.js
[NEW] ACTION: Tobie to investigate and report on converting existing vendor tests into W3C tests

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.134 (CVS log)
$Date: 2013/02/05 17:22:40 $