W3C

- DRAFT -

Web Testing

29 Jan 2013

See also: IRC log

Attendees

Present
Art_Barstow, Glenn_Adams
Regrets
Chair
plh
Scribe
darobin, bryan, jeff

Contents


<darobin> scribenick: darobin

plh: we will try to make as much of the material here today public
... Hi! I'm PLH, welcome

<Mark_Vickers> Mark Vickers

<tobie> Tobie Langel

<lbolstad> Lars Erik Bolstad

<rhauck1> Rebecca Hauck

<lmclister> Larry McLister

<fil> Filip Maj

<chaals> Chaals McCathie Nevile, Yandex (currently IRC only)

<jet> Jet Villegas

<kaz> Kaz Ashimura

<yosuke> Yosuke Funahashi from Tomo-Digi Corporation and Keio Research Institute at SFC

<plinss> Peter Linss

<a12u> Hiroyuki Aizu from TOSHIBA

<glenn> whoever is on the phone, please mute if you aren't speaking... 61# mutes 60# unmutes

<dbaron> everybody prior to Mike Champion was around the table, now we're doing the people behind on the benches

<stearns> Alan Stearns

<Graham> Graham Clift, Sony

<jeff> Jeff Jaffe

<masinter> Larry Masinter, Adobe, lame duck TAG, just interested in encouraging testing, testing-sourced spec review. helped manage interop testing for some IETF specs

intros finished

<masinter> some references http://larry.masinter.net/draft-ietf-newtrk-interop-reports-00.html http://blogs.adobe.com/standards/2013/01/16/testing-the-third-pillar-of-standards/

slide #3

plh: if all the devices that claim to be HTML5 ready were, we wouldn't be having this meeting today
... testing is not something that we have a good track record of doing
... the core Process requirements at W3C are actually rather weak
... it's only a SHOULD
... the Director can approve a Rec even with limited testing

slide #5

scribe: WGs tend to demonstrate implementability in general rather than interoperability

slide #6

scribe: WGs tend to do the minimum to declare victory
... and there is only limited incentive to maintain test suites
... so they get abandoned and forgotten over time

slide #7

scribe: to get HTML5 to Rec, the HTML WG is going to use its judgement and not necessarily test absolutely everything
... it's a really low bar
... if we're counting on the HTML WG to produce a full fledged test suite, we could wait a while

Bryan: that enters into the definition of a Rec
... in this case, a Rec is not necessarily a verifiably interoperable document
... everyone needs to be aware of that

plh: yes, and if we did that it would delay shipping

slide #8

plh: I looked at the mobile and TV profiles

(techs listed on the slide)

CoreMob 2012 and DLNA HTML5 for TV

scribe: if you are serious about a profile, you have to be serious about testing as well
... there is a lot of overlap between the two
... the overlap is probably what we want to target for testing first

Mike: when they list a spec, is it complete or a subset?

Mark: for DLNA, we say "all the mandatory parts of $givenSpec"

<soaa> Olu Akiwumi-Assani from Verizon wireless

Mark: e.g. HTML does not require a given syntax, or JS
... so all the specs are in effect 100% included

tobie: same for CoreMob

<jenleong> Jennifer Leong from AT&T

tobie: we consider it to be the role of the WG to cut specs down if they're not implemented in devices

jeff: these common parts, are they 95%, 30%, etc. of each side?

mark: it's well over 90%, even if only because HTML is so big, and we
... are converging more
... some things are not listed because they're required indirectly by specs we require

tobie: same for us

plh: there's also a document in the wiki that lists the differences
... CoreMob has more things that are not in TV
... e.g. touch events, geolocation, things that don't make sense
... conversely, TV lists the image formats which CoreMob does not

Mark: to be clear, the DLNA profile is not TV-specific, it's for all DLNA devices
... so if we sat down with CoreMob we'd probably align
... e.g. for touch, you might touch a DLNA device

tobie: I think we have a common view, same apps

robin: align?

mark & tobie: yeah we should

plh: some of the documents listed in the overlap are very stable, others rather unstable

slide #9

plh: a bunch of those documents have no tests
... e.g. HTTP, Web origin
... others have some tests, but are far from complete

<lmclister> I thought there were B&B tests?

bryan: when you say "we", who do you mean?

plh: W3C

bryan: we need to figure out where to source the tests

dan: HTTP is not W3C, right?

plh: correct, neither is ES
... but both are referenced
... we like tests

slide #10

plh: we also like test tooling, review, coverage, results, documentation
... first and foremost we like consistency across WGs
... right now we don't have that, and it's painful and problematic
... it keeps biting us

slide #11

plh explains the terminology on the slide


slide #13

plh: we can do more than the minimal target!
... we can also do regression tests

<glenn> need better specification of "features" also; some specs normatively enumerate "features", e.g., http://www.w3.org/TR/ttaf1-dfxp/#features

slide #14

plh: there are various strategies to increase coverage
... including crowdsourcing and subcontracting

slide #15

plh: for crowdsourcing, we need a lot more documentation that can help people get there
... the quality of the tests that we receive varies a lot

hober joins physically after joining in IRC

slide #16

plh: for subcontracting, you have to count at least $100 per feature test
... HTML5 is roughly estimated at 10000 features
... you do the maths
... that's a minimum price, the quotes vary a lot
... we don't know the quality of the result

mike: how do you map features and tests in there?

plh: it's one feature one test in this case (so it might contain multiple unit tests obviously)

tobie: and $1 million doesn't count the fact that you have to review the tests, which doubles the amount at least
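[Illustrative arithmetic based on the figures above; the numbers are plh's and tobie's rough estimates, not quotes:]

    // Back-of-the-envelope cost of subcontracting HTML5 feature tests
    var features    = 10000;                  // plh: HTML5 ~ 10,000 features
    var costPerTest = 100;                    // plh: at least $100 per feature test
    var writing     = features * costPerTest; // $1,000,000 minimum to write
    var withReview  = writing * 2;            // tobie: review at least doubles it
    console.log(writing, withReview);         // 1000000 2000000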

Mark_Vickers: there's a substantial cost, but we're paying a cost now without the tests because we have to deal with interop, and the problem increases exponentially
... if you multiply that by the number of companies, the expense is vastly greater than a couple million dollars
... if you divide that by a number of contributors, you get a very reasonable number
... we would certainly be willing to contribute our share

bryan: certainly the numbers are big, we've contracted before and it's expensive
... but I think that this overestimates the cost of focusing on the priorities
... a lot of those features have been around so long that they don't have interop problems

<bryan> Counting # of features overestimates the cost of developing an effective test suite, as many features have been around for 10 years or more and should not be priorities for testing.

bryan: so if we focus on priorities, it costs less

jeff: I think both Mike and Bryan are correct
... in general I encourage the companies to come back with their own perspectives about how they see this happening

<masinter> you might be underestimating the work, because some of the documents haven't been reviewed for testability, and trying to test features will come up with many document bugs and ambiguities

slide #17

plh: we have to figure out our priorities; you folks have to tell us where to put our resources
... CSS Animations is a moving target, should we test?
... HTTP, should we leave that to the IETF?

slide #18

plh: a test management system, identifying coverage and gaps

<masinter> IETF doesn't do testing itself; HTTP testing was done with self-reporting of interop

plh: several different groups use different things
... CSS uses Shepherd, HTML is using GitHub
... how do we do test reviewing
... documentation, consistency across groups
... it took us 18 months to get all groups using testharness.js
... so we have to consider a similar timeline for the rest
... the CoreMob people want a test framework
... the current one we have has problems, we have to figure out how to move forward
... and we have to figure out how important reporting is as part of that
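[Illustrative aside: a minimal testharness.js test, for readers unfamiliar with the harness plh mentions. The /resources/ script paths follow the W3C test-suite convention; the behaviour tested is a made-up example.]

    <!DOCTYPE html>
    <title>document.title is writable</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <script>
    test(function() {
      document.title = "hello";
      assert_equals(document.title, "hello");
    }, "assigning document.title updates it");
    </script>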

slide #21

plh: need to document the full process of test writing
... especially at introduction level

slide #22

plh: I want three things from presentations: GOALS, REQUIREMENTS, PRIORITIES
... if you can't list those, I'm not interested in your presentation :)
... then based on that, figure out how we achieve this and what resources we have
... if we have priorities that people aren't providing resources for, then we'll drop them

slide #23

<Judy> Slides for Testing Accessibility are here: https://www.w3.org/2013/Talks/0129-testing-accessibility/

<Zakim> kaz, you wanted to mention the assertion manager software which VBWG and MMIWG have been using to manage test assertions and test suites (though there are some more candidates)

kaz: there's a tool used by Voice Browser
... we have a simple kind of DB with HTML UI, that we use to manage tests

Judy: some accessibility testing requirements
... integrating a11y testing where possible

slide #3

scribe: a11y is distributed across the OWP
... a lot of features support a11y in a variety of ways
... the ways in which that can be tested varies, might require additional expertise around the table

slide #4

scribe: in some cases a11y is required, at times by law
... problems with a11y can shut out a market

slide #5

scribe: examples of how this is handled in some cases, in some browsers
... within HTML5 there is extensive embedding of ARIA
... there are 60 roles, lots of properties that apply
... so about 1k feature tests are required here
... also need to test focus management and ARIA events
... reference material for requirements on the slides
... testing of AAPI is particularly important
... the benefit of AAPIs is that UAs don't need to have direct knowledge of AT but just expose an API for ATs to plug into
... AAPI tooling could make it possible/easier to automate such testing

[I wonder if WebDriver has AAPI support]

slide #7

scribe: PFWG has a harness that is worth looking into

slide #8

scribe: one of the challenges of testing for a11y is the broad range of ATs out there
... we have a project to compile information about actual AT deployment, support, usage
... automatable tests expressible through WebIDL
... we need to ensure that we have a11y experts coordinate with other test suites

<Zakim> bryan, you wanted to ask: To get started on accessibility, is there a wiki for developers that identifies the most broadly valuable features, as a way to get started and clarify

bryan: is there any information that we could use to identify priorities to build a11y into our tests, make sure we target the most important cases?

Judy: yes, I think part of the issue is that the information is distributed rather than centralised
... if you want to do a11y testing for x features, it's not necessarily clear where to go
... for ARIA, there's already a fair amount, but outside of that we're missing a centralised repo

Michael: we don't have a central place because we're driven by feature maturity
... we circle around to needing testing in order to answer your question
... but we could certainly make a list of priorities

<Zakim> darobin, you wanted to ask about WebDriver having AAPI support, and how automated tests are expressed in WebIDL

<Zakim> chaals, you wanted to suggest that this is a question markets will answer, rather than W3C trying to suggest who matters most

chaals: one of the issues that we will face, which is made clear by a11y, is that there are different priorities for different markets
... in particular for a11y
... W3C is probably not the organisation that we should ask to choose between the priority of things that only affect blind users, as opposed
... to those that affect only some subset of hearing-impaired users
... so there's an issue of the [...] that we the members need to figure out for testing

Judy: I have no idea what chaals is getting at [due to strobed audio]

<chaals> [there are priority decisions that we won't effectively make in committee...]

plh: we are here because of market conditions, and one way of driving the priorities is through the market

Judy: I think there's a value in planning for functionality that we spent a while building in

<bryan> thanks Judy and Michael, I would like to followup with you on developer priority-clarifying resources, similar to our use case - focused CoreMob 2012 spec

Judy: I hear the point, but I would rather do good planning for testing, then let the market take its best shot

<chaals> [and we get to a point where looking for prioritisation between orthogonal requirements isn't something we can do from inside W3C]

<bryan> http://bkaj.net/w3c/20130129-WebTesting.html

<chaals> [I think Judy and I are largely in agreement]

slide #1

bryan: CoreMob has been publicly mentioned by our CTO at the ATT summit, we are going HTML5 all the way
... we need to avoid overselling
... and we need to check that specs have been implemented and tested
... we need to be able to tell developers what works
... our goal is NOT to help W3C produce specs, it's to help the Web be a stronger place

we === ATT

bryan: priority is a detail question
... we can't do 10k tests right now, we have to focus on 1k to do quickly
... testing priorities needs to be a living document

slide #4

bryan: we want the test framework to be a core part of WGs' work
... and we need to be able to export it and run our own copy
... we need to make sure that everything that can be automated is
... we need to expose a resource to the web community about data gathered through testing

slide #5

scribe: test assets need to be associable with features
... tests need to have a clear life cycle that we can document
... and the life cycle needs to be the same across groups

slide #6

scribe: we're very much behind CoreMob and its inspiration
... W3C [tests] are all over the place
... and nothing is clear
... and sometimes there are no tests

slide #7

scribe: in house we have lots of tests that we use for all the devices we ship
... for the Web we want to focus on what is used by everybody else
... we need to find and target gaps
... and we can take on test writing
... if it's focused enough on a feature, we can help
... this is a priority for 2013

jennifer: we're interested in doing a lot of this testing for pre-launch devices
... so we need private instances of the test framework
... to avoid information about devices leaking
... manufacturers should be able to use them

bryan: so if we can clone a TF, then the manufacturers can run it
... and when we go public, the data is available already

<Zakim> darobin, you wanted to ask if features can == spec section or not

robin: can features be considered to just map to spec sections?

bryan: that could be good enough, so long as we can map things properly

plh: you said that you could help with test writing, but not with the framework, yet you make the framework a priority
... so how does ATT see their participation?

bryan: that's a very good question
... we want to contribute more time to this
... but we're not framework experts

plh: funds are always welcome

bryan: if there's some kind of sponsorship programme for this I can definitely speak to the right people

glenn: comment on feature
... specs could formally enumerate and identify features
... we did this in TTML and it's been very useful

<glenn> http://www.w3.org/TR/ttaf1-dfxp/#features

glenn: notably for test evaluation

plh: but that shifts the burden to WGs, which we'd like to avoid
... you'll get push back

glenn: this can be done in separate documents that aren't the main document

bob: a thought about the feature issue
... I think that what we want is to demonstrate conformance against specs
... we run a bunch of tests, and we can say "yes, this is conformant"
... we'll have to think about features to get there, but for us the goal is really conformance to the whole spec
... we may accept demonstration of lower levels of conformance in the short term, but the endgame is a high level of conformance demonstration
... so we shouldn't spend too much time wondering about features

bryan: I didn't show the spreadsheet, but I think it can provide input to test coverage assessment

<masinter> suggestion: you'll never get to perfect coverage or even coverage-in-depth. Focus on breadth and regression testing: get *some* tests for every spec, then focus on testing against complaints.

Mark_Vickers: just a couple caveats
... Web&TV just started a testing TF
... I'm just providing my input, not the TF or the group

<Zakim> bryan, you wanted to note that the CoreMob 2012 coverage analysis I did is intended as input to an aligned effort of test coverage assessment, to which we will contribute

Mark_Vickers: we deliver video apps to "screens"
... I don't believe in separate profiles for different devices, we're just focused on video apps

slide #3

scribe: we shouldn't change the way that groups use testing to ship specs, that's fine
... but we should do more in testing
... the cost of developing a cross-browser app is still high, despite improvements
... this is a reason to improve the consistency of the OWP
... and this cost is multiplied by devices, number of developers, etc. — a really high cost
... there are three legs to an API definition: spec, docs, tests
... we have specs, and now we have webplatform.org for docs
... but we're missing the third leg of that stool, so it falls over
... we want a webplatform.org for tests
... some ideas on how to do this
... mechanism for developers to report inconsistencies between browsers
... maybe it could make sense to use webplatform.org
... then generate tests for that problem
... sometimes it's a browser bug
... but sometimes it's a spec bug
... another angle is to review libraries
... since they deal with browser inconsistencies, you can just go through the code and everything that is papering over problems is a bug
... we also need outreach to ensure that we prioritise based on the needs of web developers
... we need to formally take it on rather than consider it as side work

<bryan> +1 to creation of a resource for inconsistency publication, consensus based and allowing for explanation by vendors

scribe: so that it can be funded, resourced, etc.
... when DLNA references the OWP, we made a clear commitment not to define new specs
... and other orgs are doing the same
... many of those provide testing and certification
... it's important because sometimes you need to be able to claim that you're aligned with e.g. a national standard
... we need to make it easy for those organisations to use our tests
... in DLNA we wondered about creating our own tests for that
... but if you define your own tests, you're defining a new standard
... external groups have a lot of problems reusing W3C tests
... they move, they break, etc.
... I do not think that W3C should take on certification
... but we should provide those organisations with the ruler they need to use
... we need One Home for all tests
... need to configure which tests you want to run
... One Click To Run Them All

<bryan> +1 to W3C enabling certification providers to serve a market based upon W3C recommendations - but maybe they are also stakeholders to which W3C should look to for support?

scribe: need to be able to save a detailed test run

[shows the Khronos tests for WebGL]

scribe: DLNA finds that the WebGL tests are great and are a goal
... it would be useful to be able to load an existing profile that can configure the run so you don't have to repeat that over and over again

<dbaron> he's showing a URL that looks a bit like http://www.khronos.org/.../.../tests/webgl-conformance-tests.html but I can't read the middle part

jeff: so when plh established the objectives, I hear it very much around the technology of what to test, picking the right profiles to prioritise
... but hearing you talking about linking to W3C tests, looking inside libraries, I also heard more of a curation and management role than what plh described
... is that just a different expression, or something you're adding?

Mark_Vickers: there are definitely two aspects
... the external organisation, could rely on a centralised test runner
... if we have that, then they can just use it

<dbaron> https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html

Mark_Vickers: I don't think we need curation so long as we have a clear way of setting things up

<Zakim> dbaron, you wanted to say that getting feedback from authors or libraries is great, and I think we'd be likely to find that much of the feedback is about bugs in old browser

jeff: so if we do it right then it won't require curation, good point

dbaron: building a way for authors to provide feedback about what's not-interoperable would be great
... but we need to make sure we don't then get to target old browsers for bugs that have already been fixed

Mark_Vickers: yes, filtering would be an issue

<Zakim> bryan, you wanted to ask if the test results output should include a signature from the test server to validate that the results are certified?

bryan: just a note, as Mark_Vickers is describing the output of this test run
... we could add a signature to that run to validate that it did occur

robin: it's too easy to break that

Mark_Vickers: we can handle honesty by contract instead
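[Illustrative sketch of bryan's suggestion, assuming a hypothetical Node.js results server; RESULTS_SIGNING_KEY is a made-up name. The server attaches an HMAC to a stored run so later tampering is detectable; as robin notes, this cannot prove how the client actually ran the tests.]

    var crypto = require('crypto');
    var key = process.env.RESULTS_SIGNING_KEY; // secret held by the test server

    // Sign a serialized test run; anyone holding the key can verify it later.
    function signResults(resultsJson) {
      var mac = crypto.createHmac('sha256', key)
                      .update(resultsJson)
                      .digest('hex');
      return { results: resultsJson, signature: mac };
    }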

a12u: I think a permanent link to each test is very useful
... because it encourages external testing

[break]

<chaals> [(in response to dbaron) Agree that it is important for us to be looking at what is new. But there are many people who would like to have a historical record of support over the last 7 years (because it is as important to their work as understanding what will be released in 3 months) - and after 7 years we will have enabled others to collect that information if they want it.]

<bryan> scribenick: bryan


bob_lund: intro to cablelabs
... suportive of the vision outlined by Mark
... building a ref impl of the DLNA RUI as defined by DLNA
... (shows a dongle with HDMI & USB interfaces) which connects to the back of a TV
... looking for wider implementations and OEMs to take advantage of it
... DLNA product vendors will be a primary adoption target
... DLNA expects compliance of the implementations to ve verified
... DLNA needs a way to run tests & show compliance, wants the framework and tests to be W3C developed
... beyond HTML5 there are specific requirements of DLNA e.g. for the video tag
... product-specific tests will be defined for these
... also e.g. multiple audio tracks e.g. for accessibility
... these additional tests should use the W3C framework and be made available to W3C
... creation of test media will be included
... extending test scope beyond W3C scope e.g. for manual tests is another objective
... Cert Orgs like DLNA would like to use a framework in the three areas mentioned
... (shows W3C & related specs of interest to DLNA)
... re Test Framework Requirements, shows gaps e.g. a single URL to the framework
... local instance of the framework - work in progress with W3C help e.g. Robin
... product devs need to create their own test suites - the framework does (kind of) support that
... multiple test formats e.g. JS, Ref-Test, Manual, are needed - unclear if all are supported
... need more flexibility in reporting individual results & how they are aggregated
... also need to include the Framework in nightly build tests

plh: anyone disagree with one or more of the requirements shown (slide "Test Framework Requirements")

tobie: the requirements are great

<dbaron> Though some of the requirements do seem like nice-to-have and some really seem like requirements

Clarke: a comment - refining Web & TV is a goal of the TF starting next week - please sign up if you have requirements for that

???: anyone have ROM estimates for building such a framework as described?

plh: how far is the W3C framework from your requirements

bob_lund: it's a great starting point

darobin: one problem is syncing with the repository

bob_lund: recording is another major area

Mark_Vickers: this is not just about writing down ideas, we are looking for commitments on getting it done

Mozilla perspective

jet: W3C can help us be both correct and fast in how we test the Web
... we need to address development focused tests, and compliance focused tests

<Clarke> Clarification of my comment above: The Web & TV Testing TF is starting next week. The primary objective is to define testing requirements for Web & TV use cases. Please sign up if you are interested in testing Web & TV stuff.

jet: development tests need to support experimentation, local results (not uploaded to a server), automation, inclusion in a regression harness, ability to extend to special APIs use in development
... compliance tests need to focus on stable UA versions, enable server-based results collection, support automated and manual...
... numeric scores vs % passed, having a specific number of passed tests to enable rolling up the counts per spec
... in summary, automation is key, focusing on testing vs keeping score, and lowering barriers to contribution

dbaron: one more thought on the testing vs scoring: the more broadly we give a score, the more time we spend arguing about the score and managing reports by others on the score
... we need to avoid debates on scores and weighting within a score

jet: we also have more granular requirements that we can provide

mark_vickers: re the scope issue, agree that W3C should not get into certification or posting scores - that would just cause fights

plinss: WGs need the ability to score to keep track for REC

<Zakim> tobie, you wanted to ask about the extra reqs

jenleong: scores for categories of features will be more useful for devs than a single score, to see how well features are implemented

tobie: devs need to know what features are supported where, understand this may have issues and puts vendors on the spot, but the value for devs is tremendous


tobie: e.g. external sites can access the results and provide that value to devs

jet: will format a list of the detailed requirements for later

girlie_mac: (agree) about value of making data usable externally
... curious about Mozilla's own web APIs, is it expected to support proprietary features in test results

dbaron: we write tests to include all that we work on; re the "special powers" API, this is something we use internally

<plh> Bryan: it's very important that W3C not be seen as implying a certification level

<plh> ... but let's be data-focused for external providers

<plh> ... for what matters to them

<plh> ... having access to the raw count is important

<Zakim> Andrea, you wanted to talk about results and if and how these are published on webplatform.org

andrea: thinking about the web platform, it was mentioned that W3C will not be a cert body. but on webplatform.org there are caniuse-style tables - we need to make sure this is not taken as a cert statement

<Zakim> tobie, you wanted to comment on WebDriver

andrea: we should also have the ability to import other data into webplatform.org

tobie: re webdriver, there is value in using it to automate reftests, we should look into it
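[Illustrative sketch of what tobie suggests, using the Node selenium-webdriver bindings; the test/reference URLs are hypothetical. A reftest passes when the test page and its reference render identically, so the naive automation is to screenshot both and compare.]

    var webdriver = require('selenium-webdriver');
    var driver = new webdriver.Builder().forBrowser('firefox').build();

    // Load a page and resolve to its screenshot as a base64 PNG string.
    function shoot(url) {
      return driver.get(url).then(function () {
        return driver.takeScreenshot();
      });
    }

    shoot('http://example.test/css/float-001.html').then(function (testPng) {
      return shoot('http://example.test/css/float-001-ref.html').then(function (refPng) {
        console.log(testPng === refPng ? 'PASS' : 'FAIL'); // naive exact match
      });
    }).then(function () { return driver.quit(); });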

plh: re test results, we look at webplatform.org as part of W3C, so access to the test results is a given

mark_vickers: we just don't want the data becoming a barrier, an unhelpful influence

bob_lund: we see webdriver as very useful; re the special powers API, it's similar in objective and we would like to see it align with webdriver

Testing the Open Web Platform by Tobie

tobie: as Facebook AC rep, this is our perspective in 3 parts
... it makes the life of Engineers easier
... Platform developers
... people using FB
... the end goal is to improve the Web; state of a spec is less important than use of a feature
... driving to fewer bugs, better interop, ...
... devs need to know feature support
... to focus browser development e.g. through CoreMob
... and to make bug reporting easier - there is no tracking of bugs across browsers - when we hit issues, we don't know how to report them to the vendors
... the framework should enable bug-driven tests to feedback to vendors
... including a social aspect to this for crowdsourcing, e.g. using github

jeff: re scoring or not, it seems to be a nice to have from FB's perspective. On this slide it seems more fundamental as a requirement
... it seems here to say this needs to be very visible

tobie: yes, to clarify, this is not intended to lead to browser vendor fights, just support devs

dbaron: no real disagreement, exposing results at a feature level is fine, but a higher rollup is problematic

???: a complication is when external sites use the info, there will be possibilities for number games

tobie: we believe this is a long term effort

<Zakim> chaals, you wanted to type that We can't stop people from extracting some simple number, and there is a certain amount of motivation to do so. But we should clearly disdain such

tobie: including aspects for infrastructure, process, outreach, education, and being data-driven

chaals: We can't stop people from extracting some simple number, and there is a certain amount of motivation to do so. But we should clearly disdain such raw numbers as far as practicable.

mark_vickers: consumer reports protects their report numbers through license restriction

jeff: heard a very clear message on partnering with other orgs e.g. DLNA
... not getting into the scoring game
... want to cover specs without needing to hit every corner case

<chaals> [+4 to Tobie's point about this being a long-term exercise, BTW]

jeff: support different consumers e.g. devs, engineers

plh: we will cover how to increase test coverage after lunch

W3C Testing by Larry McLister

lmclister: skipping to slide "test the web forward"

<rhauck1> you can advance one more bullet

lmclister: part of getting more tests is more contributors e.g. TTWF - next is Sydney, sponsored by Google
... preparing event kits to enable more events, smaller focused group events, reviewer training
... growing the community by developing tests outside events, virtually
... re WG Documentation, it's hard to find the test suites, samples, know the review process, and know who is the owner
... who can help us with issues, links to WG resources, and backlog of tests and reviews needed

mark_vickers: have you looked at the cost for the events, re the end results, to estimate the cost per test produced etc

rebecca: have tried to do some informal tracking of the events, but that has been really hard

lmclister: other test drivers e.g. selenium communities may also be leveraged to add tests

robin: what about the idea that some API groups have to just use the same single GitHub repository as the HTML WG for all the tests?

lmclister: a good approach

dan: for each spec we need to know where coverage is, as clear info to contributors

jeff: any sense of what #s of tests we could get outside TTWF - crowdsourcing is financially the most attractive, but we need to predict how well it will be able to deliver

lmclister: putting a # on tests we can expect is hard - we have historical data but that future prediction is difficult

jeff: if we put focus on funding critical-path tests, would that de-motivate the crowdsourced lower-priority tests?

plh: are you considering webinar type events?

<darobin> [worldwide test hackathon!]

lmclister: we are trying to get the community online principally, not really/necessarily in a realtime webinar type context

plh: have you tried to approach and leverage universities
... are you looking at docs on how to write and contribute to tests?

lmclister: we are looking to the WGs to write docs for their tests

plh: so we need a central repository of the docs so testers know how to use/develop

darobin: things are currently spread all over

alan: you can't necessarily score past events as people are learning in the process

<Zakim> tobie, you wanted to comment on WGs

alan: one idea for a focused meetup would be to convert mochitests to W3C tests etc. this would prompt those involved with a specific engine to get involved

tobie: one issue with Ringmark development is that knowing which specs relate to which WGs would help
... 2nd thing is that contractors need documentation/process similar to crowdsourcing, so we have to do that anyway
... we need to move faster on test approval, as we will lose test devs if it takes too long

jeff: I had an expectation that browser vendors would talk about internal tests, and aspects of how they are/can be made available to W3C - what are the obstacles etc

bryan: the first browser testing workshop covered a lot of that, it would be interesting to see what has changed

mike: the inertia in moving internal tests to the public, e.g. different frameworks, was documented in the last workshop

plh: one issue we face is that we can throw money at converting tests...

lars: we have donated the bulk of our tests to W3C already

rebecca: some obstacles are the lack of clarity on how the specific things for the different test sources can be harmonized for use by W3C

<jenleong> wada: intro to KDDI. mobile phones & software are complex and many tests are required. This talk is based on our experience. KDDI is a leading Japanese mobile company.

<jenleong> 3M strategy: multi-use, multidevice, multi-network. Music, mobile, games, money on mobile devices, smartphones, e-books, PCs, TV, etc. Networks: 4GLTE, WIMAX, FTTH, 3G, WI-FI, CATV

<jenleong> KDDI wants HTML5 to work in a hardware-independent fashion. Anywhere, anytime, seamless. E.g. download maps at home using pc/tablet, then while driving, the mobile network allows us to use this data on the road. We can also download additional data using smartphone, which then communicates via wi-fi with the vehicle.

<jenleong> ... note: this represents my personal view and not any organization, including KDDI.

<jenleong> ... the platform is HW-independent...

<jenleong> ... slide: Is HTML5 perfect? Implementation differences exist. Problems porting from other platforms. Lack of security issues.

<jenleong> ... In Japan, content providers are waiting until HTML5 becomes more stable & mature. Some believe that this cannot be achieved by the current ability of mobile phones. Also, they have no incentive to move from native to HTML5 b/c they are successful in native already. "conducting wire" necessary for HTML5 to become successful for business

<jenleong> ... <slide: Goal of Tests>. To obtain trust from the industry that the open web platform is reliable, and...

<jenleong> <slide: Necessary Tests>. Specification-based tests: Works against the spec, performed mainly by W3C. User-viewpoint test: verify the functionality, quality, reliability of the products. Performed by industries. Test use cases not thought of by spec writers.

<jenleong> ... For specification-based tests: Who will conduct the tests? W3C? How will we keep on schedule? For outsourcing, tests must be well-defined in detail beforehand. Who will analyze and feedback the results? Are there sufficient tests? What about duplication? Who will make this analysis?

<jenleong> ... For user-viewpoint tests, crowdsourcing will be essential. The test platform will be mandatory.


<jenleong> ... <slide: Development of Test Platform> We can read & write tests, and obtain expected results on this platform. External parties can also use this for their functional or interoperability verification.

<jenleong> ... <slide: Test Platform> Should be open to parties outside W3C. Need automation, management of test content db. Mechanism to utilize test results must be provided. Also need users support (e.g. Q&A)

<jenleong> ... <slide: Way of proceeding> Dedicated test lab is necessary for executing, hosting, and developing tests. Probably not achievable on a volunteer basis. Analysis will also be performed in the lab.

<jenleong> ... Emphasis on keeping a schedule. We will wish to exhibit at an appropriate venue, e.g. CES

<Zakim> tobie, you wanted to understand the idea of a test lab better

<jenleong> tobie: Happy to see alignment on many issues. What do you mean by 'test lab'? Is it a working group?

<jenleong> wada: practical dedicated resources

<jenleong> tobie: so team who can lead day-to-day?

bob_lund: you also mentioned hosting

<jenleong> wada: and manage outsourced resources

<jenleong> stearns: We should have W3C-wide test owners


<jenleong> phillipe: I agree except that we haven't had much success with test suite owners so far. They cannot be counted on to review the test

<jenleong> stearns: the idea is rather for the owner to coordinate, to act as a contact point.


<jenleong> darobin: ... pull requests will make this easier. I've been doing this. If we keep up the plan of a shared repository, then this will be doable. The past process did not work, but the new one for HTML is better

<jenleong> plh: how do we hire these people?

<jenleong> rebecca: A lot of it boils down to what the owner is/isn't. If we define the expectations, then they can delegate tasks. Test suite owner may own a to-do list. This is similar to "test facilitator". Document the duties clearly. Set term (6mo, yr)

<Graham> anyone else on remote side having bad audio experience?

<jenleong> mike (mc): What is the model for moving forward? Should we be thinking harder about the process? Don't want a multi-year conversation. Let's define the next step. What are some achievable goals that we have resources for?

<Zakim> dbaron, you wanted to discuss an administrative question about the leftover food

<jenleong> plh: tobie, do you want to do a document?

<jenleong> tobie: hahaha

<jenleong> leftover food is being disposed of into the gullets of mozilla

<Cyril_Rickelton-Abdi> We shall set up an iPhone running Vine in front of the leftover table

<jenleong> plh: Yosuke, you're up next

<jenleong> <setting up slides>

<jenleong> Slide deck: Improving W3C Testing Activity with "Testing and Interop Lab"

<jenleong> yosuke: This presentation is my research from Keio Research Institute and not any other organization

<jenleong> yosuke: Keio University is planning to create a new lab for testing. Keio has money & resources to foster interoperability testing,

<jenleong> ... which can be used. Want to enable industries to adopt open web standards more quickly and easily. Feasibility study ends Feb. Tracer-bullet project starts Mar. Eval in June

<jenleong> <slide: Short and Medium Term Objectives>. Develop and polish testing tools and frameworks, testing infrastructures. Try to get industries to adopt living standards. Develop methods to test devices effectively & efficiently. ISO already has a framework for certification, but organizing the program is very heavy and ineffective. Need a more lightweight framework for certification. Get them

<jenleong> to use W3C tests for their certification.

<jenleong> ... <slide: My Today's Agenda> Using the tracer-bullet project. It has a small budget and small resources. Want to use this meeting as a point to get ideas, prioritize them, and align with W3C

<jenleong> ... <slide Initial ideas on how to improve HTML5...> For crowdsourcing, we need: visualization of test-suite status. Also need links between spec features, docs, test code, reviews and central test runner. Also need gamification

<darobin> [it has been mentioned before to use something like jsFiddle plugged into test submission]

<jenleong> ... <slide Initial ideas on how to improve HTML5...(cont'd)> Need fully functional HTML5 spec doc. Link to corresponding test code in github. Let test writers "reserve" the part of the spec they are going to write. ...

<jenleong> ... <next slide> Write test code the UA manufacturers don't have time to. Organize test writers from SE Asia & India academia. Refine existing tools & integrate them. E.g. improve idlharness.js with better automation. Video & audio tests are the hard part from the point of view of browser vendors b/c checking test results could slow down the testing site significantly

<jenleong> ... testing visual elements ...
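[Illustrative aside on the idlharness.js mentioned above; the interface is a made-up example. Given WebIDL text, the harness auto-generates testharness.js tests that the listed members exist with the right types.]

    // Assumes testharness.js and idlharness.js are already loaded.
    var idl_array = new IdlArray();
    idl_array.add_idls('interface Example { attribute DOMString name; };');
    idl_array.add_objects({ Example: ['new Example()'] }); // instances to probe
    idl_array.test(); // emits one test per generated IDL assertion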

<Mike5> "Organize test writers from .. academia"?

<jenleong> *writers

<jenleong> thanks

<Mike5> jenleong, well I was actually wondering about the "academia" part :)

<jenleong> jeff: Thanks for the offer to partner with Keio University. What is the quantity of test cases we could count on from the Keio initiative? We need lots of test cases, lots of sources of writers.

<jenleong> yosuke: We have not decided whether to use our resources & money to improve individual specs or the platform or framework. As for the tracer bullet, we have a concrete budget. Once we decide what we want to do, we can figure out how much we can accomplish.

<Zakim> tobie, you wanted to comment on having a centralized place for info on test status.

<jenleong> tobie: Two previous speakers mentioned the necessity for a place to keep test data, what was covered, what was missing, in the specs themselves. This is a key deliverable for the near future. I'd like to add more discussion on this when we dig into the test frameworks. How do we get the tests and which tests do we write?

<Zakim> darobin, you wanted to mention jsFiddle

<jenleong> darobin: You mentioned mediawiki. We haven't prototyped this but perhaps we could use jsFiddle for people to enter tests. It's familiar to developers, you just enter your code. We can include testharness directly in it. Leah, who is on the W3C team, could help us

<jenleong> plh: what is jsFiddle?

<girlie_mac> http://jsfiddle.net/

<jenleong> darobin: online code editor that allow execution

<jenleong> dan: question for robin. Do we have an idea on how much coverage we have?

<jenleong> darobin: Philippe has started this document. It's not up-to-date with the latest spec and needs work, but we have the basics. For each section, we have the tests and how many

<jenleong> darobin: some of the sections will have 0 but once we have the data we can incorporate it into the spec itself.

<jenleong> darobin: it's all automated. the previous version used metadata about which section a test was for; however, copy-paste issues arose. to help people get it right, the tests get put into a directory structure which maps to the spec.

<jenleong> tobie: It bothers me that we know how many tests we have for each section, but we don't know how many we need!

<jenleong> tobie: is there any way to measure coverage in a more precise way? Can we get information from parsing the specs? I will try again because there could be value in it. Otherwise it needs to be done by hand

<jenleong> mchampion: I've heard a lot about HTML5... Do webapps & HTML use the same test framework?

<jenleong> darobin: yes. We will move to using github for everything and this will simplify things.

<jenleong> mchampion: someone has to put in the metadata

<jenleong> darobin: the tests don't care what wg they belong to. but we do have a way of mapping back to specs that works.
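[Illustrative sketch of the directory-based mapping darobin describes, assuming a hypothetical Node.js script and layout: if each spec section has its own test directory, per-section counts fall out of a directory walk. As tobie notes, this counts tests written, not tests needed.]

    var fs = require('fs'), path = require('path');

    // Count .htm/.html test files in each section directory under a spec root.
    function coverage(root) {
      var counts = {};
      fs.readdirSync(root).forEach(function (section) {
        var dir = path.join(root, section);
        if (fs.statSync(dir).isDirectory()) {
          counts[section] = fs.readdirSync(dir).filter(function (f) {
            return /\.html?$/.test(f);
          }).length;
        }
      });
      return counts;
    }

    console.log(coverage('html/semantics')); // hypothetical spec-root path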

<jenleong> plh: good transition into the specs

<jenleong> plh: television/mobile profile. I would like to get a sense from the room about priorities. If there were only room to do one in 2013, which one would we do? HTTP 1.1, Web Origin Concept, ECMAScript 5.1. Raise your hands if you think it's a priority

<plh> http://www.w3.org/wiki/Testing/Common

<jenleong> plh: <reading off groups of specs to vote on>. This is a complete list. <entering list into IRC>

<jenleong> HTTP 1.1: 0

<jenleong> Web Origin Concept: 0

<jenleong> ECMAScript 5.1: 0

<jenleong> tobie: coverage is actually very good for ECMAScript

<jenleong> HTML5 Canvas 2D Context: 16

<jenleong> HTML5: most of the people in the room

<fantasai> tobie, test coverage is hard to measure unless you break out a spec into test assertions (*not* the same as 'testable assertions')

<jenleong> CSS 2.1: 6

<glenn> CSS2.1+, CSSOM+, DOM4+

<jenleong> rebecca: this is being broken into other specs.

<jenleong> bryan: but some work needs to be done in 2013?

<jenleong> CSS 2.1: 2

<chaals> html5 canvas, h5, over CSS 2.1

<glenn> CSS2.1 ++

<chaals> CSS animation we would like tests for

<jenleong> plh: spec is unstable?

<jenleong> CSS Animations: 13

<fantasai> tobie, Melinda Grant and I did this for css3-page back in 2008 or so; I can show you that as an example. The actual number of tests was much higher because we didn't break it down quite enough in the first pass...

<jenleong> CSS Background & Borders: 17

<jenleong> CSS Color Level 3: 1


<chaals> transform: probably

<jenleong> CSS Transforms: 20

<jenleong> CSS Fonts Level 3: 12

<chaals> transitions.

<jenleong> CSS Transitions: 21

<jenleong> darobin: Rodney Rehm already wrote a test suite for this

<glenn> +1 for CSSOM

<glenn> +1 for CSSOM View

<tobie> fantasai: would love to see this and get your feedback on this effort.

<glenn> there is considerable divergence in behavior in CSSOM among UAs now

<jenleong> Bryan: what are everyone's criteria?

<jenleong> Mike: it's in CR

<jenleong> Bryan: we care if it's in Coremob 2012

<glenn> +1

<jenleong> stearns: if something is shipping but the spec isn't solid, that is a priority

<jenleong> CSS Object Model: 15

<jenleong> plh: CORS just moved to CR

<jenleong> CORS: 8

<jenleong> DOM 3 Events:

<jenleong> Bryan: not referenced by Coremob?

<jenleong> tobie: I added it

<jenleong> DOM 3 Events

<jenleong> skipping that one

<chaals> D3E over DOM4

<jenleong> DOM 4: 11

<jenleong> Progress Events: complete, only 1 vote

<jenleong> Web Storage: 1

<jenleong> XHR: 17

<dbaron> did plh ask for hands for Web storage?

<jenleong> Web sockets is not common enough

<dbaron> er, sorry, workers

<jenleong> he didn't b/c it's complete already

<jenleong> Web Workers: 16

<jenleong> Web Sockets: 8

<jenleong> Indexed DB: 14

<jenleong> SVG: 0

<jenleong> WOFF: 2

<jenleong> plh: a lot of people chose HTML5. How shall we test this? Crowdsourcing, vendors, outsourcing?

<bryan> http://www.w3.org/2011/10/28-testing-minutes

<jenleong> bryan: i'm linking to the minutes from the previous workshop

<jenleong> darobin: if we try to buy tests from a company, we may get low quality. how about we pay a company to convert the tests from one system to another?

<jenleong> Jet: There are a lot of our tests that can't be put into a boilerplate, with server/client components. When exceptions are thrown outside a test/test_step function, it doesn't cause a test failure.

<jenleong> jet: test functions have longer names. other limitations of testharness.js: can't run server-side code and capture results

<jenleong> jet: reasons for this complexity: avoid relying on window.onerror, and to be able to put several independent logical tests into a single file while having them pass/fail independently of each other (a test failing may cause downstream tests to not run and report failure)

<jenleong> jet: Doesn't seem to be much value for either of these. We are looking at webdriver to circumvent. We can set browser preferences, go outside of browser sandbox (check back button enabled in history stack > 1, e.g.)

<jenleong> jet: currently webdriver doesn't have a good way to wait for given events

<jenleong> jet: if we want these tests in the w3c framework, we need to get these to other browsers in a secure fasion

<Zakim> andreatrasatti, you wanted to ask about device API's like geolocation, device orientation and other sensors

<Zakim> tobie, you wanted to comment on async test in testharness.js

<jenleong> andrea: i will speak later, about prioritization

<jenleong> tobie: I used testharness a lot. None of your changes pose a problem. I'm concerned with special-power APIs that don't have anything to do with specs (e.g. back button functionality). Standardizing on a server-side component may be necessary, so we can include what you need.

<chaals> [The problem in picking a server-side standard is that it is untested for interoperability… and testing servers is a useful thing to do]

<jenleong> jeff: most of the issues identified were syntactic issues. Even if we outsourced it, it would take less time to put some syntactic sugar on your test case than to generate it from scratch

<Zakim> darobin, you wanted to clarify about the server-side requirement

<jenleong> darobin: it's easier to have no server-side testing, but how do you test that stuff?

<Zakim> andreatrasatti, you wanted to ask about device API's like geolocation, device orientation and other sensors

<jenleong> dbaron: we run locally, just not on another machine. however, what we run is probably not portable to others

<jenleong> plh: We have enough work already to take us through a few years. if your company can work on _________, you are certainly welcome. We are trying to prioritize what to do with our current resources

<chaals> [Robin's point is important. Because for different stakeholders, different tests matter (or not). But this assumes the cost of handling that exercise is cheap enough not to worry]

<jenleong> someone: there are 3-4 things where almost the whole group voted for it.

<jenleong> mark: regarding webdriver, does it need to be more integrated into the mainstream tests?

<jenleong> jet: we support moving functionality to webdriver, but the webdriver spec is missing key use cases.

<jenleong> jet: we're happy to share our tests, but we would want them to come back upstream. They are currently organized into folders. We would need to set it up to handle that

<chaals> [so provenance metadata in the test would help you, jet?]

re Andrea's comment, we developed spec priorities through the profiling efforts e.g. under CoreMob and Web&TV. Beyond that I prioritize based upon availability of tests, automation of tests, integration of tests into the W3C framework

<jenleong> bryan: responding to andrea. The importance of context should not be missed. Device-specific features like geolocation are more important for phones than TVs. Also prioritize those tests which are more incomplete.

<jenleong> plh: let's move to tooling for 2013.

<jeff> scribenick: jeff

<tobie> https://gist.github.com/4668636

tobie: Took Bob's work and did copy/paste
... An agreement on requirements for test framework
... so we know what we need to build

<fantasai> chaals, I'm not sure that exactly is necessary, but there needs to be careful tracking of which copy is master and which is slave; and if that relationship switches, that needs to be updated in both systems as well

tobie: [reads requirement list]

Test Framework Requirements

===========================

- Single URL to W3C Framework.

- Ability to use the framework to run the tests locally.

- Ability to define and run test suites for specific profiles.

<Zakim> dbaron, you wanted to comment on single url and manual vs. automated running

- Single test run.

- Ability to run testharness.js, ref and manual tests.

- Reporting individual and aggregated results.

- Allow browser vendors to run the tests as part of their CI strategy.

DBaron: Some features on the list push toward manual testing; some toward automation

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2013-01-29 22:41:35 $
