W3C

Browser testing meeting

28 Oct 2011

Agenda

See also: IRC log

Attendees

Present
RakeshTiwary, Jeanne_Spellman, Bryan_Sullivan, Wilhelm_Andersen, James_Graham, Elika_Etemad, Jason_Leyba, Simon_Stewart, Kris_Krueger, John_Jansen, Peter_Linss, Mike_Smith, Alan_Stearns, Narayana_Babu_Maddhuri, Duane_O'Brien, Charlie_Scheinost, Ken_Kania, Jeff_Hammel, Clint_Talbert, Tab_Atkins, Michael_Cooper, Philippe_Le_Hégaret
Chair
Wilhelm_Andersen

Contents


<MichaelC_SJC> scribeNick: MichaelC_SJC

Introductions

wa: testing helps everybody

figure out how to make best possible test suites

<plh> Wilhelm: I'd like to figure how to make the best possible test suite, how to make the Web better

I work for Opera as testmonkey, test manager

in various parts

jg: also work for Opera

<missed the rest>

ee: also known as fantasai

work on testing in CSS WG

jl: work on testing in Google

want to improve the ecosystem so it all works better

ss: created WebDriver, working on Selenium

very aware of the differences between browsers, would love to sort it out

kk: worked in testing at Microsoft

more recently on Web standards

jj: also at Microsoft

interested in automation, test suites

pl: co-chair of CSS WG

have contributed extensively to that test suite

and working on test shepherd for <missed>

ms: work for W3C, staff contact to HTML WG

work on testing for HTML, extensive contributions to framework

as: working for Adobe

interested in tests working across browsers

nm: represent Nokia

learn what's up

do: <missed>

<MikeSmith> https://browserlab.adobe.com/en-us/index.html <- Adobe BrowserLab

cs: represent adobe

<simonstewart> Ken_Kania

kk: work for google, Webdriver

bs: AT&T, mobile data services

interoperability in various fora

want to understand the challenges browser vendors have in automation

and how to leverage tools in repeatable continuous framework

to certify new devices as they come out, get updated, etc.

jh: Mozilla, test automation

ct: Mozilla, testing

ta: Google, work on Chrome

not as closely involved in testing, but have worked in CSS on some

<plh> involved in WAI, staff contact for PF, developing ARIA. We're struggling with testing; hoping to contribute to the test framework

<plh> ... we have requirements that we'd like to bring as well

plh: W3C, Interaction Domain, lots of your favourite groups

want a common framework, common way to write tests

Agenda Overview

wa: first, want browser vendors to introduce how they do testing

then, presentations of a few testing approaches

finally, discussion of how to write tests for different types of functionality

90% of tests cover how something is rendered to screen in a particular way

or script returns an expected result

or user fills out form and certain result

WebDriver API

ss: WebDriver is an API for automation of WebApps

developer-focused, guides people to writing better tests

Merged with Selenium a couple years ago

fairly simple, load page, find element, perform actions like focus, click, read, etc.
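
[Illustrative sketch, not from the meeting: the load/find/act flow just described, using the Selenium Python bindings. The URL, element id, and text are placeholders; only the WebDriver calls themselves come from the Selenium project.]

from selenium import webdriver

driver = webdriver.Firefox()                # start a browser session
driver.get("http://example.com/search")     # load a page
field = driver.find_element_by_id("q")      # find an element
field.click()                               # perform an action on it
field.send_keys("w3c browser testing")      # type into it
print(field.text)                           # read its text back
driver.quit()                               # end the session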

kk: does it simulate user input at driver level, or elsewhere?

ss: in past user interactions were done by simulating events in DOM

but browsers inconsistent in how they handle those

when they do what etc.

so events at script level not feasible

so do events at OS level

that is high fidelity but terrible machine utilization

and wastes developer's time

so now, allow window not to have focus and send events via various OS APIs

but OS not designed to send high fidelity user input to background window

so now, Opera and Chrome pump events into event loop of browser

<scribe not sure that was caught right>

Webdriver has become a de facto standard for browser automation

most popular open source framework

as can be seen by job postings requiring familiarity with it

has reasonable browser support

Opera, Chrome, and Android add-on, Mozilla starting

uses Apache2 license

business-friendly license

nm: tried on mobile browsers?

ss: yes, in various <lists>

it's a small team

covering wide range of browsers and platforms

see 3 audiences for automation

1) App developers are vast majority

need to test applications

hard to get developers to write tests, and can only get them to write to one API when you get it at all

first audience for WebDriver

2) browser vendors

desire to automate their testing as much as possible

bs: how does WebDriver relate to QUnit?

ss: <didn't catch details>

bs: so Webdriver isn't a framework, it's an API for automating events

ss: clearly a browser automation API

e.g., understand Opera runs 2 million tests / day with this

3) Spec authors

some specs can be articulated entirely in script

and tested that way

others need additional support, this provides that

ee: more spec testers than authors?

ss: yes, those focusing on test aspects
... user perspective

it's a series of controlled APIs

to interrogate the DOM

execute script with elevated privileges

and provide APIs to interact, so not just read-only

jj: <question missed>

ss: <answer missed>

jj: avoids cross origin vulnerability?

ss: yes

bs: good, some complicated scenarios

ss: implementer view

neutral to transport and encoding

provide JSON

which brings clients that can handle it immediately

also have released JavaScript APIs
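
[Illustrative sketch, not from the meeting: what a single command looks like in the JSON wire protocol (the JsonWireProtocol wiki linked later in these minutes). The session id and URL are placeholders.]

POST /session/1a2b3c/url HTTP/1.1
Content-Type: application/json; charset=utf-8

{"url": "http://example.com/"}

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"sessionId": "1a2b3c", "status": 0, "value": null}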

ss: Security

<JohnJansen> My question was regarding the bypass of the x-origin security restriction

ss: automation and security are opposite concerns

<JohnJansen> answer: the jscript still honors that restriction, though webdriver itself ignores it.

generally, build support into browser

and enable it via an additional component

or command line features

ss: Demo

<shows short script, then executes>

kk: how Opera?

ss: Watir on top of WebDriver
... API designed to be extensible

expose capabilities via a simple interface or casting

jj: How are visual verifications handled?

ss: can take a screenshot, platform-dependent

Opera has extended with ability to get hash of the screenshot

attempt to capture entire area described by DOM, not just viewport

deals with difficulties like fixed positioning etc.

but very browser specific

jj: human comparison mechanism?

ss: in google, teams of people do that

we just provide the mechanism

don't want to over-prescribe how to process images, as state of the art continually changes

bs: to compare layout between different browsers

capture screens, or query position of elements?

ss: can do both

can get location of an element

and size

bs: how about different screen sizes

interested specifically in how things are rendered in various circumstances

ss: the locatable interface can provide various types of measures

kk: differences among browsers are wide for many reasons

it's part of the landscape

ss: was able to use same tests using same APIs

at rendering level can be different

plh: platform AAPIs use similar services

hope e.g., ARIA can use WebDriver

ss: have looked at AAPIs, can look at elements by ARIA role etc.

on relationship to AAPIs

sometimes they're enough, sometimes not

one of the next big things in hybridized apps, part native and part Web

may need to use AAPIs to test

plh: think ARIA can be tested using this

ss: have applied WebDriver to native app testing using AAPIs

kk: there has been a path starting with MSAA

ss: AAPIs are extremely low-level

e.g., a combobox is represented as a few different controls together

kk: developers create all kinds of crazy things

so UI automation allows patterns

mc: can speak to AAPI from WebDriver

ss: Webdriver sits on top of AAPI

but because of script interface, could talk back and forth a bit

wa: Opera has a layer "Watir" on top of WebDriver

<shows sample>

test file looks like a manual test, e.g., a human could interact with it

<demos manual execution of test>

<that can also be executed using the script showed previously>

for each test file, there's a block in the automation script

ss: WebDriver similar

nm: <missed>

ss: <answer related to webelement.gettext>

jj: why wrap in Watir?

wa: was done before projects had merged

now doesn't matter as much

plan to submit Opera set of tests to HTML WG for official test suite

but want them in a format other browser vendors could use

Opera uses Ruby bindings, Mozilla uses Python bindings

need to automate in all browsers, WebDriver seems the way to go

for official W3C tests, question of what language binding to use?

ss: Javascript is hugely known

Python is the other one being explored by Mozilla and Chrome

also is "politically unencumbered"

vs some other candidates out there

<MikeSmith> I vote for Javascript

wa: how complete are JS bindings?

js: still finalizing

kk: <something detailed>

js: API stable

loading script within browser is the part that still needs working on, to get around sandbox

it's usable now, but have debugging etc. to do

ss: so maybe Python preferable?

jg: having dependency on core could be a big stability issue

<^ not sure that's scribed right>

kk: dangerous to build on things that are changing

otoh, need bindings to be something that's available on all targets

ss: normally test and browser communicate like a client / server

can do over a web socket

and run test on machine independent of browser

wa: was able to test a mobile device on a different continent this way

plh: if we set up a test server on W3C site, could you allow it to just run tests at you?

ss: can connect from browser to a test server

so in theory, this works

but security concerns

need a manual intervention to put browser in testing mode

mc: have to trust W3C server from security POV

how we allow tests to be contributed needs to be careful

<general view of usefulness of this approach>

as: <missed>

<JohnJansen> as: is there support for IME? how good is it?

ss: support varies by platform as we prioritize development

<mentions wherefores and whynots>

do support internationalized text input

for testing I18N but could be used to test other stuff

do: how well documented is JS API?

ss: fairly extensive

<jhammel> http://code.google.com/p/selenium/wiki/JsonWireProtocol

Facebook developed PHP bindings using this documentation

Selenium stuff hosted under the Software Freedom Conservancy

can use w/o the open source stuff, but also handy to use the open source stuff

wa: Just started browser tools and @@ WG

<jhammel> http://www.w3.org/2011/08/browser-testing-charter

primary goal is to standardize Webdriver API at W3C

<jhammel> (i think)

welcome you all to join to make this happen

also want to explore whether all browser vendors can handle official test suites using Webdriver API

ss: aware of support from Google, Opera, Mozilla

explicit non-support from Microsoft, Apple, Nokia, HP

also support from RIM

plh: would Microsoft be able to accommodate tests using this?

kk: depends

standardization of the API will help a lot

<Another link for the WG is http://www.w3.org/testing/browser/>

also need tests structured in certain ways we can work with

<fantasai> kk: having the tests be self-describing is very important. If I was a TV browser vendor that doesn't support webdriver, I would want to be able to leverage the W3C tests as well

jg: tests always structured so you could run manually, though would be ridiculous to do so with them all in practice

ms: first thing we need is a spec

doesn't matter where the editor's draft is hosted, can do at W3C

IP commitments kick in when we publish a Working Draft

ss, wa: ready to move right away on that

kk: W3C would own code?

ss: W3C would maintain spec

and a reference implementation

but there could be other implementations

mc: reference implementation doesn't necessarily have to be W3C

plh: spec is most important for W3C

ss: all Google testing in some way related to WebDriver

bs: supported in mobile?

ss: chrome and android

wa: also opera for mobile

bs: so other platforms are just lacking an implementation?

ss: right; Nokia and Apple haven't implemented

just need a driver

kk: support IE6? want to get rid of that

ss: drop support when usage drops below a certain level

plh: support from Microsoft for Webdriver API will help HTML WG a lot

jj: even if Opera submits tests and HTML adopts, they're self-describing so still testable manually

plh: what does Nokia think?

nm: Nokia not really interested

focused on Webkit stuff

today is first time hearing about it

ss: it's not just about testing a spec, it's about ensuring users can use content in your browser

so that market force should drive interest even if internal interest is elsewhere

nm: how is performance?

ss: rapid on Android, but slow on emulator

iPhone is fast directly and in emulator

<something else> fast

nm: <missed>

<jhammel> ^ pixel verification

ss: haven't seen a lot of pixel verification on mobile devices

<scribe having a hard time hearing or understanding remainder of discussion>

<MikeSmith> agenda: http://lists.w3.org/Archives/Public/public-test-infra/2011OctDec/0014.html

<dobrien> Could we get the minutes updated again as well please?

jj: propose not requiring webdriver in first version of test suite

<bryan> Scribenick: bryan

Testing IE

kk: To walk thru testing of IE
... shows slides "Standards and Interoperability"

<fantasai> IE testing diagram: Standards, Customer Feedback, Privacy, Accessibility, Performance, Security

<fantasai> (these are pictured as hexagrams around a central "Internet Explorer" label)

kk: IE testing has various chunks as shown on the slide (slides to be shared)

<fantasai> "Internet Explorer Testing Lab" w/ photo

<fantasai> IE5 -> IE10

<fantasai> 948 Workstations

<fantasai> 119 servers

<fantasai> 1200 virtual machines

<fantasai> remotely configurable

<fantasai> 152 versions of IE shipped every "Patch Tuesday"

<fantasai> Green Lab Initiative saves ~218 tons of CO2/Year

kk: IE testing lab using a lot of machines with a lot of IE versions tested every week

<fantasai> "Standards Engagement"

<fantasai> ECMA

<fantasai> TC39 (Ecmascript 5)

<fantasai> W3C

<fantasai> - CSS

<fantasai> -WebApps

<fantasai> -HTML

<fantasai> -SVG

<simonstewart> Slides for the webdriver notes: https://docs.google.com/present/edit?id=0AVrYfCxRNKUGZGc5Nm1ocGhfNzFnaGd2bmZnYw

<fantasai> -XML

<fantasai> cycle diagram: Testing -> spec editing -> implementations -> (loop back to Testing)

<fantasai> "Standard Contributions"

<fantasai> - Spec editing

<fantasai> -co-chairing

<fantasai> -test case contributions w3c and ecma

kk: encourage standards engagement and participation in various groups

<fantasai> -- 14623 tests submitted

<fantasai> -- across IE9/IE10 features

<fantasai> - hardware (Mercurial server)

<fantasai> - IE Platform Preview Builds

kk: have contributed a lot of tests and hardware
... preview builds allow early access and feedback

<fantasai> "IE10 Standards Support"

<fantasai> CSS2.1, 2D Transforms, 3D Transforms, Animations, Backgrounds and Borders, Color, Flexbox, Fonts, Grid Alignment, Hyphenation, Image Values (Gradients), Media Queries, Multi-col, Namespaces, OM Views, Positioned Floats, Selectors, Transitions, Values and Units

<fantasai> DOM element traversal, HTML, L3 Core, L3 Events, Style, Traversal and Range

<fantasai> ECMASCRIPT 5

<fantasai> File Reader API

<fantasai> File Saving

<fantasai> FormData

<fantasai> Geolocation

kk: IE 10 will support a lot of standards: CSS, HTML5, Web APIs, ... http://ietestdrive.com

<fantasai> HTML5 appcache, async canvas, drag and drop, forms and validation, structured clone, history API, parser sandbox, selection, semantic elements, video and audio

<fantasai> ICC Color profiles

<fantasai> Indexed DB

<fantasai> Page Visibility

<fantasai> Selectors API L2

<fantasai> SVG Filter Effects

<fantasai> SVG standalone and in HTML

kk: also look at the IE blog

<fantasai> Web Sockets

<fantasai> Web Workers

<fantasai> XHTML/XML

<fantasai> XMLHttpRequest L2

<fantasai> "Items for Discussion"

<fantasai> * WG Testing Inconsistent

<fantasai> - when are tests created? Before LC? CR?

<fantasai> - When are tests reviewed?

<fantasai> - vendor prefixes

<fantasai> - 2+ impls passing tests required for CR?

<fantasai> * Review Tools (none)

kk: issues are inconsistent testing across WGs

<fantasai> Note -- that's not quite true anymore, plinss wrote one for csswg :)

kk: when tests are created e.g. related to last call or earlier
... soft rules for how a spec is allowed to progress are maybe not enough

plh: these are soft rules currently

jj: test tools recently developed have helped with consistency; flushing out remaining inconsistencies is a goal
... different test platforms result in different tests as submitted to W3C

Michael_Cooper: experience has convinced that tests should be available by last call

Kris_Krueger: why would this not be a rec across W3C?

plh: its not easy to enforce
... some WGs will complain

jj: amping up the expectations on testing will help

mc: it should be the rule, with exceptions allowed

<Zakim> MichaelC_SJC, you wanted to say I now believe tests need to be ready by Last Call

Elika_Etemad: implementations are needed to see how tests are working

James_Graham: the process does not map to browser development reality

Elika_Etemad: it's difficult to say when spec development is done, which makes a hard deadline difficult


John_Jansen: problems often cause the specs to move backward


Elika_Etemad: CR is test the spec phase, not fixing bugs in browsers
... having to move CR back due to bugs is an issue, we need an errata process to allow edits in CR

plh: we are not here to fix the W3C process

John_Jansen: the more times you go thru the circle (edit/implement/test) the better, and also the earlier

James_Graham: when we implement we write the tests... test suites should not be closed

<fantasai> James_Graham: The state of the spec is irrelevant to when we write tests

Mike_Smith: the Testing IG is scoped broadly perhaps too much so. The IG will decide what its products will be, e.g. a best practice on when test suites are developed.
... writing this down even if we do not fix the process will help others avoid the same mistakes of the past
... it will still have some value

Wilhelm_Andersen: how do you run tests, what is automated, is development in-house

Kris_Krueger: write our own tests

plh: from JQuery?

Kris_Krueger: no, customer feedback is also considered
... e.g. Gmail support provides feedback
... have a lot of automated tests, ship every Tuesday, and get quick feedback from users/developers

Narayana_Babu_Maddhuri: is there any review of the test cases to determine whether a test is valid, validation of the test results?

plh: the metadata of the test log should clarify what is being tested

Kris_Krueger: pointing to where the test relates to the spec is helpful

plh: we cannot force metadata into tests, but we can encourage this info to help ensure test value clarity

Narayana_Babu_Maddhuri: good reporting would be helpful

plh: knowing e.g. what property works across devices and platforms is a goal, and matching tests to specs would support that

James_Graham: knowing why something is failing is sometimes difficult, dependencies are not clear and why the test failed is unclear

<plh> [lunch]

<MichaelC_SJC> == Lunch break is 1 hour ==

<ctalbert_> http://people.mozilla.org/~ctalbert/automationpresentation/Automation.html

Testing Firefox

<krisk_> Firefox Testing Presentation

<krisk_> clint: Tools automation lead at Mozilla

<krisk_> Clint: overview of their testing

<krisk_> Grown over the years

<krisk_> Test Harnesses

<fantasai> "Automation Structure: Test Harnesses"

<fantasai> - C++ Unit

<krisk_> C++ Unit testing, XPCShell, not too interesting for this group

<fantasai> - XPCShell (javascript objects)

<fantasai> - Reftest

<fantasai> -Mochitest

<fantasai> -UI Automation Frameworks

<fantasai> - Marionette

<krisk_> Mochitest - tests dom stuff

<krisk_> New UI automation framework - Marionette

<krisk_> Reftest drill down

<fantasai> "Reftest: style and layout visual comparison testing"

<fantasai> Reference: <p><b>This is bold</b></p>

<fantasai> Test: <p style="font-weight: bold">This is bold</p>

<fantasai> clint: The test and the reference create the same rendering in different ways.

<fantasai> clint: Then we take screenshots and compare them pixel by pixel
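
[Illustrative sketch, not from the meeting: a complete test/reference pair in the style of the example above, plus a Mozilla-style reftest manifest line pairing them. File names are placeholders.]

bold-test.html:
<!DOCTYPE html>
<title>Test: styled paragraph renders bold</title>
<p style="font-weight: bold">This is bold</p>

bold-ref.html:
<!DOCTYPE html>
<title>Reference</title>
<p><b>This is bold</b></p>

reftest.list (one line per test/reference pair):
== bold-test.html bold-ref.html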

<fantasai> clint: Mochitest is an HTML file with some javascript in it.

<fantasai> clint: One of the libraries it pulls in is the SimpleTest library.

<fantasai> clint: It has the normal asserts: ok, is, stuff to control whether asynchronous or not

<fantasai> clint: This other file here (in this example) turns off the geolocation security prompts

<fantasai> clint shows a geolocation test

<jhammel> ^ http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/geolocation/test_allowWatch.html

<fantasai> plh: How does this route around the security checks?

<fantasai> clint: uses an add-on

<fantasai> clint: has a special powers api
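
[Illustrative sketch, not from the meeting: the shape of a Mochitest as described above, an HTML file pulling in SimpleTest and using its ok/is asserts, with waitForExplicitFinish/finish for asynchronous tests. The script path follows Mozilla's test environment and the assertions are placeholders.]

<!DOCTYPE html>
<html>
<head>
  <script src="/tests/SimpleTest/SimpleTest.js"></script>
  <link rel="stylesheet" href="/tests/SimpleTest/test.css">
</head>
<body>
<script>
SimpleTest.waitForExplicitFinish();        // run asynchronously
ok(true, "harness loaded");                // boolean assert
is(1 + 1, 2, "basic equality assert");     // equality assert
setTimeout(function() {
  ok(document.body, "body is present");
  SimpleTest.finish();                     // report results back to the harness
}, 0);
</script>
</body>
</html>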

<fantasai> "Marionette: Driving Gecko into the future"

<fantasai> This is a mechanism we can use to drive any gecko-based application either by UI or by inserting script actions into its various script contexts.

<fantasai> How it works -

<fantasai> 1. socket opened from inside gecko

<fantasai> 2. Connect to socket from test harness, either local or remote

<fantasai> 3. Send JSON protocol to it

<fantasai> 4. Translates JSON protocol into browser actions

<simonstewart> uses webdriver json protocol streamed over sockets directly

<fantasai> 5. Send results back to harness in JSON

<jhammel> wiki page: https://wiki.mozilla.org/Auto-tools/Projects/Marionette

<jhammel> (WIP)

<fantasai> clint: We run all of these tests on every checkin, on every tree we build on.

<fantasai> clint: Goes into a dashboard

<fantasai> slide: shows screenshot of TinderboxPushLog

<fantasai> wilhelm: Can we steal your Mochitests? What do we need to do to do so?

<fantasai> clint: Check them out of the tree and see how well they run in Opera

<fantasai> clint: Some of the stuff we did, e.g. special powers extension,

<fantasai> clint: but it's now a specific API (used to be scattered randomly throughout tests)

<fantasai> clint: If you had something similar and named it specialpowers, then you could use that to get into your secure system

<fantasai> clint: So should be possible.

<fantasai> clint: A lot of tests we have in the tree are completely agnostic; don't do anything special at all, should work today

<jhammel> mochitests are at http://hg.mozilla.org/mozilla-central/file/tip/testing/mochitest

<fantasai> wilhelm: Are there plans to release these tests to geolocation wg?

<fantasai> clint: I think they already did. The guy who wrote the tests is on that wg

<fantasai> kk: ... they're hard-coded to use the Google service. If you don't use it, they don't run...

<fantasai> kk: Not too many though

<fantasai> some discussion of sharing tests

<fantasai> Alan: I think WebKit is using some Mozilla reftests, but not using them as reftests

<fantasai> kk: I'm fine w/ reftests. But of course won't work for everything.

<fantasai> kk: CSS tests we wrote are self-describing.

<fantasai> Alan: do you have automation?

<fantasai> kk: Yes

<fantasai> rakesh: Do you run the tests every day?

<fantasai> clint: Every checkin

<fantasai> clint: Different trees run different numbers of tests.

<jhammel> https://tbpl.mozilla.org/

<fantasai> clint: Our goal is to have test results back within 2 hours. Right now we're averaging 2.5hrs

<fantasai> fantasai: You're responsible for watching the tree and backing out if you broke something.

<fantasai> discussion of test coverage

<fantasai> discussion of subsetting tests during development

<fantasai> wilhelm: How much noise do you have?

<fantasai> clint: Don't know about false positives

<fantasai> clint: Probably not many; once we find one, we check for that pattern elsewhere

<jhammel> orange factor, for tracking failures: http://brasstacks.mozilla.com/orangefactor/

<fantasai> clint: Thing we really have is intermittent failures

<fantasai> clint: We're trying really really hard to bring it down

<fantasai> clint: Used to be on every checkin you'd get, on average, 8 intermittent failures

<fantasai> clint: we pushed it down to 2

<fantasai> clint: And then we added the Android tests

<fantasai> clint: trying to bring it down again

<fantasai> duane: Can I instrument Marionette today in FF7?

<fantasai> clint: No, code we're depending on now is landing currently on Nightly

<fantasai> clint: Released probably... May?

<fantasai> clint: Depending on work done by Developer Tools group

<fantasai> clint: They have a remote debugging protocol they're implementing

<fantasai> clint: Will be really nice; decided this would be great to piggyback on. Don't need two sockets in lower-level Gecko.

<fantasai> clint: So won't be available until that's released.

<fantasai> clint: Currently in a project repo... land in Nightly in ~2.5 weeks

<fantasai> plh: Marionette is only for Fennec, not for desktop version?

<fantasai> clint: For Fennec right now. Planning to go backwards and use for Desktop as well.

<fantasai> clint: My goal is to move all our infrastructure towards that

<fantasai> kk asks about reducing orange

<fantasai> clint: It's mostly a one-by-one effort of fixing the tests

<simonstewart> Interesting comment about avoiding using setTimeout in tests

<fantasai> kk: Are you going to take Mochitests into W3C? Anything preventing you?

<fantasai> clint: Nothing right now. We'd have to clean them up and make them cross-browser. Good for everyone, not opposed, just a matter of finding people and time

<fantasai> jgraham: there's a bug on making testharness.js look like Mochitest to Mozilla

Testing Opera

<fantasai> "This looks vaguely familiar"

<fantasai> wilhelm: Say a few words about testing at Opera

<fantasai> wilhelm: We have a mainline, which is supposedly always stable, and then when we're developing a feature, it gets branched and at some point tests start passing (that's the yellow, b/c out of sync with mainline) and then we merge and that becomes mainline

<fantasai> diagram shows mainline with six green dots going forward

<fantasai> branch goes off, two red dots, one yellow

<fantasai> arrow from mainline to green dot on feature branch

<ctalbert_> The wiki page we(mozilla) wrote that details our "lessons learned" from fixing intermittently failing tests is here: https://developer.mozilla.org/en/QA/Avoiding_intermittent_oranges

<fantasai> arrow from green dot back to green dot on mainline

<fantasai> jgraham: ...

<fantasai> jgraham: Our setup's a bit different

<fantasai> jgraham: All the tests are in subversion in their own repository that's separate from the code. It's just a normal webserver: Apache, PHP

<fantasai> jgraham: When you ask for tests to be run, they get assigned from the server and we send them out to a couple hundred virtual machines

<fantasai> jgraham: not quite MSFT's setup

<fantasai> jgraham: And then we store every result of every test

<fantasai> jgraham: I think you just store whether all the tests passed... we store, in this build, this test passed.

<fantasai> jgraham: We have a huge database of this information

<fantasai> jgraham: Theoretically we can delete stuff, but we store everything.

<fantasai> jgraham: In a mainline build from yesterday, we ran quarter of a million tests

<fantasai> jgraham: That's not quarter million files -- it's 60,000 files, some of which produce multiple results

<fantasai> jgraham: e.g. some tests from HTML5 test in W3C, one file might produce 10,000 results

<fantasai> jgraham: Typically it's a JS thing and it just runs a bunch of code and at the end it has some results

<fantasai> jgraham: Dumps them to the browser in some way

<fantasai> jgraham: The way we do that right now is pretty stupid, so I won't talk about it

<fantasai> slide: Visual tests, JS tests, Unit tests, Watir tests, Manual tests :(

<fantasai> jgraham: System was designed 7 years ago or so

<fantasai> jgraham: For visual tests, you just take a screenshot, and then we store the screenshot.

<fantasai> jgraham: Someone manually marks whether that screenshot was a pass or fail.

<fantasai> jgraham: Don't do that. You have to do it once per test, and then once any time anything changes very slightly

<fantasai> jgraham: e.g. introduce anti-aliasing test, have to re-annotate all tests

<fantasai> jgraham: this format is deprecated

<fantasai> wilhelm: We have 20,000 tests on 3 different Opera configurations...

<fantasai> wilhelm: We want to kill these tests and use reftests instead

<fantasai> jgraham: Oh, reftests should be on that list too

<fantasai> jgraham: Recently we implemented reftests, and we're actively trying to move tests to reftests.

<fantasai> jgraham: You can't test everything with reftest, but when you can it's much better

<fantasai> Alan: Do you keep track of when the reference file bitmap changes?

<fantasai> Alan: What if both the reference and the test change identically such that the test should fail but doesn't?

<fantasai> plinss: In the case of the CSSWG when we have a fragile reference, we have multiple references that use different techniques

<fantasai> jgraham: We have a very lightweight framework we used to use for JS tests. Only allowed one test per page.

<fantasai> jgraham: Easy to use, but required a lot of convoluted logic for each pass/fail result.

<fantasai> jgraham: For new test suites, we're using testharness.js

<fantasai> jgraham: similar to Mozilla's MochiKit

<fantasai> jgraham: Unit tests are C++ level things not worth talking about here

<fantasai> jgraham: When things need automation, we use Watir -- discussed this morning

<fantasai> jgraham: When all else fails, we have manual tests

<fantasai> wilhelm: Notice that the monkey looks really unhappy

<fantasai> jgraham: For the core of Opera, we schedule a test day and just run tests

<fantasai> plh: How many manually tests do you have?

<fantasai> wilhelm: around 2000 before, less now...

<fantasai> wilhelm: Probably spend about a man-year on manual tests per year

<fantasai> wilhelm: Say some things about challenges we have, things we need to take into account when writing tests internally and for W3C

<fantasai> wilhelm: First thing is device independence

<fantasai> wilhelm: We run 3 different configurations of Opera: Desktop profile, Smartphone profile, and TV profile

<fantasai> wilhelm: Almost every time someone requests a build, it will be tested on those three profiles

<fantasai> wilhelm: We notice that if you have a static timeout in your test, e.g. wait 2s before checking result, that will break on stupid profile with low resources

<fantasai> wilhelm: On some platforms we automatically double or triple it, and we hope it works, but it's not really good solution

<fantasai> jgraham: How do you deal with ... ?

<fantasai> clint: we time out our tests after a set time period and mark it as failed

<fantasai> jgraham: The main assumption is: don't depend on device size or speed -- the test will randomly fail.

<fantasai> wilhelm: Brings me to the next problem: random

<fantasai> wilhelm: If you have so many tests and even small percentage fail randomly, going to spend man-years investigating those failures

<fantasai> wilhelm: When we add new configurations, when we steal tests from source of unknown quality, we spend many man-years stamping out randomness in the tests

<fantasai> wilhelm: The more complex the test, the more likely to randomly fail

<fantasai> wilhelm: Simplest tests are JS.

<fantasai> wilhelm: For imported tests from random sources, could be very bad

<fantasai> wilhelm: Then comes visual tests

<fantasai> wilhelm: Sometimes complexity is needed, but if can simplify will do that

<fantasai> wilhelm: We have a quarantine system: run 200 times on test machines first to make sure its good

<fantasai> wilhelm: Still, sometimes things slip through.

<fantasai> wilhelm: We steal your tests. Thank you.

<fantasai> slide: jQuery, Opera, Chrome, Microsoft, mozilla, W3C

<fantasai> wilhelm: Keeping in sync with the origin of the test is difficult

<fantasai> wilhelm: When someone updates a test elsewhere, we don't automatically get that

<fantasai> wilhelm: When we muck about with the test to get it to work on our system, we have to maintain patches

<fantasai> wilhelm: If we fix bad tests, sometimes easy to contribute back, but sometime not

<fantasai> wilhelm: Automating tests to use our Watir scripts, can also become a problem.

<fantasai> wilhelm: Our current approach is not usable

<fantasai> wilhelm: need a better way for us all to keep in sync

<fantasai> kk: This is why we have submitted and approved folders

<fantasai> jgraham: The problem from our POV is really... part of it is a version control problem on our end

<fantasai> jgraham: Don't have a good way to keep our patches separate from upstream changes

<fantasai> jgraham: If we have w3C tests, and we pull new version, don't have a way to say "these are bits we changed to make it work on our version"

<fantasai> jgraham: ... reporting and script file separate

<fantasai> jgraham: if we pull some tests from Mozilla, say, and they're JS engine tests and they update them, if we try and merge them.. someone has to work out how to do that by hand. It's kind of a nightmare.

<fantasai> wilhelm: Last thing about randomness, esp imported

<fantasai> wilhelm: Some tests rely on external tests.

<fantasai> wilhelm: Great when we only had a few tests

<fantasai> wilhelm: But now it's a problem. Servers go down, etc.

<fantasai> wilhelm: Conclusion there is: don't do that. :)

<fantasai> wilhelm: That's it!

<fantasai> jhammel: Wrt upstream tests, standardizing on formats and standardizing on process

<fantasai> wilhelm: We set up time at 3:15 today to discuss this exact issue

<fantasai> mc: You say you have to fix tests to work on your product.

<fantasai> mc: Question is how do you separate fixing test to be not random, vs. making them work on a particular product

<fantasai> jgraham: When we pull in tests, we try not to change anything to do with the test.

<fantasai> jgraham: We don't require the tests to pass to be in our system.

<fantasai> jgraham: The thing we need to change is, can this test report back to our servers.

<fantasai> jgraham: But external tests are usually not designed that way.

<fantasai> wilhelm: I think testharness.js approach is good, because those are separated.

<krisk_> That is the end of Opera

<MichaelC_SJC> 's presentation

<krisk_> The next person up is peter from HP on css wg update (10 minutes)

<krisk_> Then a discussion on rendering tests for about 1 hour

Testing in the CSS WG

<krisk_> test.csswg.org

<krisk_> has lots of information on CSS WG testing

<krisk_> Tests are 'built' from xml into multiple formats - html, xhtml, etc...

<krisk_> Test harness is a wrapper around the tests that are loaded in an iframe

<krisk_> It loads the tests that have the least number of results

<krisk_> The harness has a filter for spec section, etc..

<krisk_> The harness has meta-data description for each of the tests

<stearns> test format requirements: http://wiki.csswg.org/test/css2.1/format

<krisk_> The harness also has test results that can be shown for each of the browser/engine versions

<krisk_> Build process has requirements that will be improved over time - meta data, ref test, title, etc...

<krisk_> Adding meta-data helps review process, though most submitters don't like to add this data
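
[Illustrative sketch, not from the meeting: the kind of per-test metadata the CSS test format linked above asks for -- author, spec link, reference link, flags, and an assertion. The specific values are placeholders.]

<link rel="author" title="Test Author" href="mailto:author@example.com">
<link rel="help" href="http://www.w3.org/TR/CSS21/box.html#collapsing-margins">
<link rel="match" href="margin-collapse-001-ref.htm">
<meta name="flags" content="">
<meta name="assert" content="Adjoining vertical margins of sibling blocks collapse.">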

<krisk_> Multiple refs for the same test exist and a negative ref test as well

<krisk_> You can have two ref tests if the spec has two different results - for example margin collapsing

<krisk_> If a ref test can't be used then in some cases a self-describing test works

<plinss> http://test.csswg.org/annotations/css21/

<krisk_> Spec annotations are used that map back to the annotated spec

<krisk_> The annotated spec has total tests and results for each section of the spec

<krisk_> Now on to the test review system

<krisk_> http://test.csswg.org/shephard/

<krisk_> Very tight coupling to the css test metadata

<krisk_> Tracks history and other information about a test case

<krisk_> jgraham: is this tied to the test file?

<krisk_> peter: no it's possible to have this information in another file

<krisk_> jgraham: can this handle a case when multiple files are used to create a lot of tests

<krisk_> peter: yes we have the same issue for the media query test cases

<krisk_> Wilhelm: So does css still use visual non-ref tests?

<krisk_> fantasai: for css3 we require ref-tests, so no


<krisk_> peter: The system is built to save time and automate parts

<krisk_> peter: for example when a test is approved it is moved from submitted to approved

<krisk_> Michael: Does the system have access control checks for approval?

<krisk_> peter: yes

Testing Chrome

<krisk_> Ken: Chrome Testing Information

<simonstewart> kk: works on the chrome automation team

<simonstewart> kk: not an automation group in the same sense as mozilla

<simonstewart> chrome depends on webkit

<krisk_> kk is not krisk

<simonstewart> webkit layout tests, pixel-based tests

<simonstewart> kk == ken_kania

<simonstewart> kk: dom dump tree tests

<simonstewart> kk: not got a lot of insight into the specifics of the webkit tests. Focuses mainly on the chrome browser

<simonstewart> kk: couple of layers of testing

<simonstewart> kk: lowest layer is the c++ browser tests

<simonstewart> kk: probably more than other browsers do. Special builds of chrome which will run C++ in the ui thread

<simonstewart> kk: relatively low level, though

<simonstewart> kk: beyond those, there is the ui test framework. Based on the automation proxy (AP)

<simonstewart> kk: ap is pretty old, but is an ipc mechanism

<simonstewart> kk: very much internal facing

<simonstewart> those tests are still fairly low level, despite being called ui tests

<simonstewart> kk: higher than that, Ken's team work on something called the chrome bot

<simonstewart> kk: runs on real and virtual machines

<simonstewart> kk: a cache of a large number of sites. Often used for crash testing. Also include tests that perform random ui actions

<simonstewart> kk: a little bit smarter than pure random, but that's the gist

<simonstewart> kk: qa level tests. Tests that are done by manual testers. Piggy back off the ui test automation framework. things like creating bookmarks, installing extensions, etc

<simonstewart> kk: break down manual testing into parts: first, app compat (push a new release of chrome and check it continues to work), and testing chrome at the ui level

<simonstewart> Most of the ui is "based on the web"

<simonstewart> For the chrome specific native widgets there are manual tests

<simonstewart> kk: app compat depends on webdriver

<simonstewart> kk: lots of google teams depend on webdriver to verify that sites work.

<simonstewart> kk: guess that at a high level, the testing strategy tends to be developer focused.

<simonstewart> kk: devs should write the tests in whatever tool and harness is most expedient for their purpose

<simonstewart> kk: piggy back a lot on the fact that chrome does rapid releases. 4 channels release to users (canary, dev, beta, stable)

<simonstewart> kk: different release schedules

<simonstewart> kk: depend a lot on user feedback from the canaries

<simonstewart> kk: that's the gist of it

<simonstewart> tab: sounds good to me

<simonstewart> jhammel: do chrome do performance testing?

<simonstewart> kk: we do. Using the AP and the ui testing framework mentioned earlier

<simonstewart> http://build.chrome.org

<simonstewart> to view the tests that have been run

<simonstewart> plh: do we run jquery tests

<jhammel> ^ correction: http://build.chromium.org

<simonstewart> kk: not really. webkit guys might, and we pick that up

<simonstewart> krisk_: do you create tests and feed them back

<simonstewart> TabAtkins: we don't do much, but we do

<simonstewart> krisk_: is that because it doesn't fit with the systems

<simonstewart> TabAtkins: the ways we write and run tests isn't really compatible with the existing w3 systems.

<simonstewart> TabAtkins: would like to change that!

<simonstewart> TabAtkins: some tests are html/js. which might be used where possible. Doesn't happen that regularly

<simonstewart> krisk_: how do you know that you're interoperable?

<simonstewart> TabAtkins: in terms of webkit stuff, it's a case of testing being done by different browser vendors

<simonstewart> kk: lots of c++ tests that are specific to chrome

<jhammel> simonstewart: np :)

<simonstewart> krisk_: v8?

<simonstewart> TabAtkins + kk: v8 team live in europe. Who knows?

<simonstewart> wilhelm: also has legacy stuff for opera. New tests written in a way that (in theory) is usable outside. Can chrome do the same thing?

<simonstewart> TabAtkins: will agitate for that. Involved in spec writing rather than active dev, so might be tricky

<simonstewart> wilhelm: This is a great forum to raise those issues. Opera happy to share with Chrome if Chrome does the same :)

<simonstewart> krisk_: do chrome try and pass a bunch of the w3c test suites?

<simonstewart> TabAtkins: yes. Some of them might be integrated into the chromium waterfall. Some of them might be run by hand

<simonstewart> ?? does anyone know about webkit testing

<simonstewart> TabAtkins: the people who'd I'd like to ask aren't around

<simonstewart> webkit does seem to take in test suites from mozilla. They're running against a bitmap that's different from the moz rendering

<simonstewart> TabAtkins: we don't have a good infrastructure for ref tests

<simonstewart> TabAtkins: the test infrastructure people _do_ want to fix that

<simonstewart> TabAtkins: every time a new port is added to webkit, there are more pixel tests. Provides pressure to do better

<simonstewart> plh: any other questions?

<simonstewart> 15 minute break coming up

Info available from webkit: https://trac.webkit.org/wiki

also see http://www.webkit.org/quality/testing.html

<krisk_> Next agenda Item jgraham talking about testharness.js


<MichaelC_SJC> scribe: krisk_


testharness.js


<fantasai> scribenick: fantasai

jgraham: testharness.js is something I wrote to run tests.
... It runs JS tests specifically
... It's a bit like MochiTest or QUnit which JQuery uses, or various things

<plh> --> http://w3c-test.org/resources/testharness.js testharness.js

jgraham: Every JS framework has invented its own testharness
... This has slightly different design goals
... The overarching goal is that it's something we can use to test low-level specs like HTML and DOM
... So it can't rely on lots of HTML and DOM :)
... The design goals were to provide some API for writing readable and consistent tests in JS

jgraham: Our previous harness at Opera, as I mentioned, didn't result in very readable tests

jgraham: The other is to support testing the entire DOM level of behavior
... There are 2 test types : asynchronous tests and synchronous tests
... the second is purely syntactic sugar
... Another design goal was to allow possibility of the test to have multiple assertions, and all have to be true for test to pass
... typical example might be checking that some node has a set of children.
... Might want to first test for any children before testing that 4th child is a <p>
... Multiple tests per file was a requirement; learning from Opera's 1/file, which was painful for test writers and discouraged many tests
... ... runs everything in try-catch blocks
... One feature of that is that every bit of the test is like a function, basically
... it tries to handle some housekeeping.
... if you have 1000 tests in a file, nice if you can time out those tests individually
... Uses setTimeout(); can override that if you want, e.g. if running on slow hardware
... and a design goal was easy integration with browsers' existing test systems
... Should be easy to use on top of MochiKit or whatever you use for reporting results
... next thing I thought I'd do is go through creating a test.

jgraham's text editor:

<script src="resources/testharnessreport.js"></script.

<script src="resources/testharness.js"><script>

<div id="log"></div>

jgraham: By default testharnessreport.js is blank. It's for you to integrate into your testing system.
... the order is not at the moment relevant
... we might later check in testharness.js that testharnessreport.js was included

added to file:

(at the top)

<title> Dispatching custom events</title>

(at the bottom)

<script>

var t = async_test("Custom event dispatch");

</script>

jgraham: Each test has a number of steps, and each step is a function that gets called
... It gets called inside a try-catch block, and we can check if the test failed. We don't put anything as top-level code.

(added at the bottom)

t.step(function() {

(ok, that's too much to type)

jgraham: Here it's adding an event listener before the second step
... When it gets called, it'll call this other function here, which will run this other step, which is another function. Can get a bit verbose.
... There's a convenience method that will make this easier.. all documented in testharness.js
... Simple assert_equals() with value we get, value we expect, and then you can optionally have a string that describes what it is you're asserting.
... At this point everything we want done is done, so we say t.done();
... If you load this in a browser, because we have div#log, it will show whether it passes or fails and what assert failed

<plh> --> http://w3c-test.org/webapps/ElementTraversal/tests/submissions/W3C/Element-childElementCount.html Example of testharness.js

jgraham: That's all
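
[Illustrative consolidation, not from the meeting: the file built up above, with the custom-event plumbing filled in. Only async_test, t.step, t.step_func, assert_equals, and t.done are testharness.js calls; the event details are placeholders.]

<!DOCTYPE html>
<title>Dispatching custom events</title>
<script src="resources/testharness.js"></script>
<script src="resources/testharnessreport.js"></script>
<div id="log"></div>
<script>
var t = async_test("Custom event dispatch");
t.step(function() {
  var target = document.getElementById("log");
  // step_func wraps the listener so assertion failures are caught by the harness
  target.addEventListener("test", t.step_func(function(e) {
    assert_equals(e.type, "test", "event type seen by the listener");
    t.done();
  }), false);
  var ev = document.createEvent("Event");
  ev.initEvent("test", true, true);
  target.dispatchEvent(ev);
});
</script>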

jj: Is there an id on the steps, so that you can say you failed step 4 of test foo?

jgraham: If there's demand, there could be a second argument there.

jj: would be nice to know where it failed so I can set a breakpoint there

jgraham: If you get a huge number of tests per file, it's usually auto-generated
... if it's failing in an assert, then it'll tell you which assert failed

plh shows his example

plh: everything shown here is generated by testharness.js

jgraham: There's a failure in this, and it seems everyone fails that.

plh: Bug in testharness.js

jj: Easiest way to debug the test. Is there an error in the test, error in testharness.js, or error in browsers

jgraham: There are various types of assertions. Usually corresponds to webIDL
... But what's in webIDL isn't always the same

kk: It's pretty well-written, only 700 lines or so

clint: If it's synchronous, you don't have to do t.step()

jgraham: A test that is synchronous implicitly creates a step

wilhelm: Opera currently uses this tool for all the new tests that we write. Can others use this?

clint: Yeah, I think so

kk: There used to be some NUnit or something that W3C had
... Was in IE, but some browsers couldn't run it.
... Very complicated

[server problems]

plinss: Are tests grouped by section into files?

jgraham: In this case, it checks reflection section, plus section of each part of the spec that defines a reflected attribute

topic change

wilhelm: plh wanted to talk about test harness, fantasai wanted to talk about syncing problem

How should we organize public test suites so that they are as easy as possible to contribute to and reuse?

http://w3c-test.org/framework/

MikeSmith: This is an instance of the framework peter demoed

Mike: I'm going to show you what has been added here to make it easier for test suite maintainers to add data to the system.
... There's this area called Maintainer Login
... It'll give you an http_auth, which authenticates against W3C's user database
... Email me if you want access to the system
... Once you go in there you'll see 2 options: add metadata, change metadata
... Can add a specification
... one early piece of feedback I got was they have tests they want to run that are not associated with a spec.
... So in this instance of the system, it's not a requirement to have a spec for your test suite
... You can give it an arbitrary ID as long as not a duplicate
... Title of the spec
... URL for the spec
... It expects you'll point it to a single-page version of the spec
... If you have a multi-page spec, don't point it at the TOC. You need the full version of the spec.
... Could change later, but initially set up this way 'cuz easier
... This will get added to the list here
... Next thing you can do is needed if you want to do what Peter was demoing earlier, which was associating testcases with specific sections of the spec -- or specific IDs in the spec
... Structured around idea that you put your IDs per section
... But some WGs like WOFF WG they're putting assertions at the sentence level
... They don't actually have section titles, so needed to accommodate that too

Peter: Alan and fantasai did some work on that, too.
... Shepherd tool will be able to parse out spec to find test anchors
... and then can report testing coverage of the spec, so this is something we will automate

Alan: What fantasai and I worked out was based on WOFF work, but will be simpler for spec editors. A bit harder to automate, though

Mike: This part add spec metadata.
... Instead of a form to fill out, it lists existing specs in the system
... once you go here, if there's already data in the system, will show you data in the system alread
... otherwise it'll show you generated data
... This parses the spec and pulls out the headings. If it looks ok, you press submit
... It'll put these section titles into the database.
... If you have IDs below the section title level, then you'll have to use a different way to get it into the DB
... You might have to get me to do it for now :)
... Those steps are optional right now.
... What is necessary is going in and giving info about the test suite itself.
... you can give it an arbitrary ID
... Title, longer description
... to better explain the test suite
... base URL of where your test suites are stored
... Difference from CSS is, that one requires format subdirectories

plinss: it's optional

Mike: This one doesn't expect subdirectories. Expects all tests in this one directory
... If you have separate subdirectories...
... Need to make different test suites or ...
... Simplest case you have all tests in one directory

plinss: The code's actually a lot more flexible wrt formats. We'll talk offline.

MikeSmith: Then you have contact information for someone who can answer questions about test suites
... Then you indicate format of the test suite
... Then you have a list of flags, you can select which ones indicate optional tests
... There are ways to add flags to the system
... No ui for it, so contact me
... Last thing you then do is upload a manifest file
... You have to have a test suite
... You select a test suite
... and then what I have it do right now is that you need to point it to the url for a manifest file, and it'll grab that and read it in
... Right now two forms of manifest files that it will recognize
... second one here is just a TSV that expects path/filename, references, flags, links, assertions
... links are the spec links
... The other big change is, I was talking with some people e.g. annevk and ms2ger
... the format they're using is just listing the filenames
... it marks support files as support files
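
[Illustrative sketch, not from the meeting: one tab-separated line of the manifest format described above, with columns path/filename, references, flags, links, assertions. The file names, flag, link, and assertion text are placeholders.]

margin-collapse-001.htm	margin-collapse-001-ref.htm	ahem	http://www.w3.org/TR/CSS21/box.html#collapsing-margins	Adjoining vertical margins of sibling blocks collapse.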

kk: Mozilla guys wanted to know what files were needed to pull to run a test case

plinss: In the CSSWG, the large manifest file with metadata -- that gets built by the build system

MikeSmith: This form expects the full filename, not just the extensionless filename
... Because that's what they had
... Once you have that, you should be able to get your test cases into the test database
... and it'll show up on the welcome page
... Long way to go on this.
... Goal when I started on this was to get it to the point where I didn't have to manually do INSERT in SQL to get specs into the database
... What would be really nice is if ppl start using this and getting more test suites in there so that we can ..

plinss: But right now only limited set of ppl can contribute to that code

MikeSmith: I created two groups in our database
... I created a group for developers -- anyone who wants to contribute to framework
... That'll give you write access to hg repo for the source code for this
... Take a look at source code and see problems, send me patches or I'll give you direct access
... Second thing is if you want to have access to use this UI to submit test suite data, I'll have to add you to a particular group

fantasai: how is this code related to plinss's code?

MikeSmith: It's forked from that.
... I've just been pulling the upstream changes
... been able to merge everything without it breaking.
... Think it's in good enough shape that we could port it back upstream

plinss: This system and the Shepherd share a lot of the same base code
... Lots of things I was going to port Shepherd system back into this system, and then pull your stuff in too
... Mike also has code that ties into the testharness.js code, and will automatically submit results from that

MikeSmith: If you go to enter data, it gives you some choices about whether you want to run full test suite or not
... There's a button here that will pull automatic results where possible
... Be careful, this will submit the data publicly!

jgraham: Not saying it's a bad idea, but from our POV, we're not going to use it offline.

(Brian was talking about trying out the system privately offline)

plinss: The system tracks who's submitting the data. By login if you're logged in, by IP if not

Brian: Privacy is useful

plinss: goal is for pulling data from as many sources as possible

wilhelm: fantasai wanted to talk about keeping things in sync

<dobrien> Is someone scribing? I can't keep up on the iPad

<ctalbert_> This is the writeup that we are planning to set up at Mozilla for the CSS tests specifically: https://wiki.mozilla.org/Auto-tools/Projects/W3C_CSS_Test_Mirroring

<krisk_> Mozilla has a way to move tests from mozilla -> w3c -> mozilla

<ctalbert_> wilhelm: how will this cope with local patches?

<krisk_> fantasai: The master copy only lives in one place...

<ctalbert_> jgraham: probably not a problem with the css tests

<krisk_> fantasai: approved is the master in w3c

<krisk_> fantasai: submitted is the master for submissions

<ctalbert_> jgraham: opera is thinking of having the master from w3c which is intact, and our checkout from that master will have the local patches, and when we pull we'll rebase our patches atop the w3c master

<ctalbert_> this should be possible now that hg is in the w3c side and our (opera) side

<ctalbert_> fantasai: we'll probably have to do something similar

<krisk_> wilhelm: how does this handle local patches?

<ctalbert_> jhammel: is there a technical limitation to not have people editing the w3c tests

<ctalbert_> fantasai: no

<krisk_> fantasai: this is only for css, which doesn't seem to have this problem

<ctalbert_> jgraham: probably make it a commit hook

<ctalbert_> ctalbert_: agreed

<ctalbert_> peter: if someone pushes to the approved directory without actually being approved then the system just automatically denies them

<ctalbert_> that may be incorrect ^ (scribe error)

<ctalbert_> wilhelm: might be an idea to split test suites down at lower granularity levels so that you can have test suites with different levels of maturity

<ctalbert_> jgraham: don't think that would make a difference tbh

<ctalbert_> peter: our repo would keep all the data from all the suites in the repo so that our build system could build any version of them from any suite

<ctalbert_> wilhelm: are there other things we can do to make it easier to contribute test suites?

<ctalbert_> fantasai: one problem on the mozilla side - there's no place to put tests that should go to the w3c - we depend on a manual process to sort out which should be submitted and then it is done later

<ctalbert_> fantasai: these tests just sit in a random place and are forgotten

<ctalbert_> fantasai: once we have a directory that goes to w3c and we tell the reviewers, then it will help quite a bit.

<ctalbert_> fantasai: the basic idea is to make the process obvious what developers need to do with that test to indicate that it is appropriate and ready for w3c then it should "just happen"

<ctalbert_> jgraham: we have a similar problem. it's hard to surface those tests and bugfixes without a policy and a place for those tests

<ctalbert_> peter: if we have a standard format among the test writers then it will be easier to help developers to upload the tests to the w3c. If the developers have to convert the tests it's too difficult and people won't expend the effort to make it happen

<ctalbert_> krisk_: sometimes it depends on the editors as to when they allow tests into the spec, and you find that tests sometimes lag the spec by quite a bit

<ctalbert_> fantasai: we found that with the css tests - the person writing the spec is often nominally tasked with also writing the test suite, but because the skill sets are different and the spec editor is usually swamped, the tests get neglected

<ctalbert_> fantasai: we really need a dedicated person to manage these tests and testing effort for each spec

<ctalbert_> MikeSmith: is there some way to motivate people to do that?

<ctalbert_> MikeSmith: maybe we should publicly track the testsuite owner?

<ctalbert_> fantasai: we can do that, but the burden is on getting resources for that, really.

<ctalbert_> MikeSmith: yeah, the question is how do you encourage the managers to allow their people to spend time on w3c work

<ctalbert_> MichaelC_SJC: you might be able to convince your company to do that, but we also need to have the working group chairs understand that this needs to happen

<ctalbert_> jgraham: if we have them already in an interoperable format then it's pretty easy, but for our existing tests that are in a different format, we aren't going to spend the time to convert them

<ctalbert_> fantasai: we might just have a place at w3c to take those tests, and just post them publicly and have someone else do the conversion work

<ctalbert_> jgraham: I suspect that's a wide problem

<ctalbert_> krisk_: if you get in the habit of submitting stuff as you're doing development, that seems reasonable.

<ctalbert_> krisk_: keeping things not super complex is a win, and being consistent will pay dividends

fantasai^: Because for Opera it may not be valuable to do the conversion, but e.g. Microsoft might want those tests, and decide that the cost of converting is less than the cost of rewriting tests from scratch, so to them it'll be worth it to do the conversion

<ctalbert_> fantasai: thanks, I'm not too good at this :/

<ctalbert_> (scribe note ^)

<ctalbert_> wilhelm: the more I think of this, the more I realize that facilitating the handover of tests is a full time job

<Zakim> MichaelC_SJC, you wanted to ask how much should there be a "W3C format" vs how much does W3C framework need to format (nearly) any format?

<ctalbert_> wilhelm: if we could get every browser vendor to commit one person to do this work on their team then that would be good.

<ctalbert_> fantasai: the problem we're at now is that people haven't adopted the w3c formats internally

<ctalbert_> it will be less work once that happens

<ctalbert_> it's not w3c's responsibility to convert your tests to w3c

<ctalbert_> fantasai: you can write a conversion script to convert your test to w3c format

<ctalbert_> better to do that than to have w3c accept all the different formats
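A rough sketch of what such a conversion script could look like, assuming a made-up legacy convention where each vendor test is a standalone JS file defining a runTest() function; the file layout and the runTest() convention are hypothetical, and only the testharness.js calls in the generated page are the real harness API:

    // Hypothetical conversion sketch: wrap legacy vendor test scripts into
    // testharness.js pages. The "one runTest() per file" convention is invented
    // for illustration; it is not an actual vendor or W3C format.
    var fs = require('fs');
    var path = require('path');

    function convert(legacyJsFile, outDir) {
      var body = fs.readFileSync(legacyJsFile, 'utf8');   // legacy test source
      var name = path.basename(legacyJsFile, '.js');
      var page = [
        '<!DOCTYPE html>',
        '<title>' + name + '</title>',
        '<script src="/resources/testharness.js"></script>',
        '<script src="/resources/testharnessreport.js"></script>',
        '<div id="log"></div>',
        '<script>',
        body,                                              // defines runTest()
        'test(function() { runTest(); }, ' + JSON.stringify(name) + ');',
        '</script>'
      ].join('\n');
      fs.writeFileSync(path.join(outDir, name + '.html'), page);
    }

    // Usage: node convert.js <legacy-test-dir> <output-dir>
    var srcDir = process.argv[2], outDir = process.argv[3];
    fs.readdirSync(srcDir).filter(function(f) {
      return /\.js$/.test(f);
    }).forEach(function(f) {
      convert(path.join(srcDir, f), outDir);
    });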

<ctalbert_> jgraham: the problem is that many of these harnesses are not built for portability

<ctalbert_> MichaelC_SJC: the problem with a common format (and I may be wrong) is that you run into things you can't test

<ctalbert_> jgraham: if we run into that, then in that case maybe we can find some lightweight format for those tests, or in that case maybe we use a different type of harness

<ctalbert_> scribe: ctalbert has to step out

<ctalbert_> fantasai: ^


kk: If you can write it with testharness.js, do that. If not, try reftest, if not, try self-describing test
... In your case you have the difficulty of needing a screenreader or something
...
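For reference, a minimal testharness.js test page looks roughly like this; the assertion is an arbitrary example, and the /resources/ paths assume the harness is served from the usual shared location:

    <!DOCTYPE html>
    <title>document.title reflects the title element</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <div id="log"></div>
    <script>
      // A single synchronous test; testharness.js collects and reports the result
      test(function() {
        assert_equals(document.title, "document.title reflects the title element");
      }, "document.title matches the content of the title element");
    </script>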

jgraham: If you can get ppl to contribute in one format, at least you solve the problem once per platform rather than once per test

mc: I can agree with the idea that there's a hierarchy of goodness
... The framework should have at least the possibility of hooking in new formats

general agreement

wilhelm: For the Watir cases, we noticed areas where we'd want to add tests for something very obscure and specific. What we've done is add support at a low level in Opera and use an API
... Such things could be later added to WebDriver


Alan: For tests where there isn't a w3c version, but browsers have something, is there a list of most-wanted specs that need tests on the w3c site

fantasai: All of them? :)

Alan: We were talking about poking people, committing people to translating browser tests to w3c tests
... Would be more successful at getting resources if we have a specific list of things we need

jj: Also possibility to ask specific people.
... Rather than saying, please all submit tests for HTML5
... Say, can you submit tests for WebWorkers
... need a specific ask to get things done
... It might not cause an immediate surge in test submissions, but for me, coming from outside to inside, the idea of submitting tests seemed impossible. I didn't know where to submit them, figured they'd be rejected, didn't know what a reftest was, etc.
... So the process was hard, and the ask wasn't specific
... Better way to get things done is asking
... Would like Opera to submit WebWorker tests

wilhelm: Can I get that in writing so I can show it to my manager?

Alan: Identify the tests, see who has those tests, then request them

plh: We've been working on the testing framework a little bit, but part of the task is also going out there in the wild, finding tests and getting them to W3C
... Need to get to the point where we have the framework and start asking for tests

Alan: Use framework to identify areas, since it annotates the spec

jj: We have no idea how much coverage those 47 tests have -- the number isn't meaningful from a coverage perspective
... 1 is better than 0, but maybe 100 is needed, not 47

ss: Test coverage is a negative measure: it only tells you when something is not covered, not how well something is covered

jj: Even if you say you have 100% on that normative statement, still doesn't tell you if you got all the edge cases

jgraham: At the moment for HTML we have nothing, though.


jgraham: We have our tests organized by section in the repo, but it's not explicit
... Being able to say per normative statement, do we have a test for this, is pretty nice

<plh> --> http://www.w3.org/2011/10/timer.html (annoying) timer

jgraham: If you look somewhere, there's an annotation per sentence in the spec showing tests for section X
... But that's really complicated, because spec isn't marked up to make that easy
... and testing dozens of disconnected statements

kk: The problem we're struggling with is not how to get perfect coverage. There's a spec, and there's no coverage.
... Browsers all have this feature, and they don't work the same. So having some is a good start.

Bryan: If you look at most of WebAPIs near LC or at LC, only 1/3 have tests available

<jhammel> fantasai: setup a process for getting tests from *your* organization to w3c, and *going forward*, you should write w3c-submittable tests *and* submit the tests. Once that is in place, we can go back and convert legacy tests


<jhammel> fantasai: we need to get the webkit people to commit to this

<jhammel> fantasai: you can require that when checked into repo, they become reftests

<jhammel> fantasai: plan going forward is to convert to reftest

<jhammel> jgraham: if you're comparing to something bitmap-based, it may take 2x time, but it will save time going forward

fantasai^: Because then the number of legacy tests that are not w3c-formatted stops growing, and we can work on making that number smaller
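Since the plan going forward mentions converting tests to reftests: a reftest pairs a test page with a reference page that must render identically. A minimal sketch follows; the file names and the property being tested are invented for the example, but rel="match" is the real linking convention:

    <!-- background-color-green-001.html: the test page -->
    <!DOCTYPE html>
    <title>background-color paints the element green</title>
    <link rel="match" href="green-square-ref.html">
    <style>div { width: 100px; height: 100px; background-color: green; }</style>
    <p>Test passes if there is a green square below.</p>
    <div></div>

    <!-- green-square-ref.html: the reference, built without the tested property -->
    <!DOCTYPE html>
    <title>Reference: green square</title>
    <style>div { width: 100px; height: 100px; background: green; }</style>
    <p>Test passes if there is a green square below.</p>
    <div></div>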

Additional Items

example of a test that has to be self-describing: This tests that the blurring algorithm produces results within 5% of a Gaussian blur

http://test.csswg.org/source/contributors/mozilla/submitted/css3-background/box-shadow/box-shadow-blur-definition-001.xht

bryan: We developed a number of specs for device APIs
... We recognize these APIs are quite sophisticated, and it'll take some time, but we're continuing the development of these capabilities for web runtimes
... We have developer program, global ... ecosystem

bryan (from AT&T): wanted very briefly ...

bryan: show you these links to the specs, the APIs, but more importantly the test framework
... Test framework is based on QUnit
... Pulls in a file from a test directory, which has the list of tests associated with this particular API.
... Tests individual JS files in the same directory
... will run them one by one
... This is packaged up as a widget file, which is available for download
... So we can run all the tests for example using this widget framework.
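A rough sketch of what one of these QUnit-based API tests might contain, using the QUnit 1.x global functions that were current at the time; the file name and the choice of geolocation are just for illustration:

    // hypothetical tests/geolocation-api.js, loaded by the widget's test runner
    test("navigator.geolocation is exposed", function() {
      ok(navigator.geolocation, "geolocation object exists");
      equal(typeof navigator.geolocation.getCurrentPosition, "function",
            "getCurrentPosition is a function");
    });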

bryan shows pie charts of results

bryan: Automatically uploaded and made available to vendor

plh: Say 1000 tests for core web standards?

bryan: No, for the APIs
... What comes from the underlying platform is inherently tested by that community
... We need to cover device variation
... identify things that we reference
... We have individual tests for these, test scripts
... this is more than an acid-level test, but not what we hope to see from W3C in the long run
... We don't want to develop and maintain this level of detail in WAC. Want to leverage W3C test suites
... If you look at the tests, you can see for example the geolocation test suite, which we reference.
... We want to auto-generate the tests as widget

jj: So if the test suite changes, do you update your widget?

bryan: Our goal is to create frameworks where we can pull in tests and run them in this runtime environment without having to necessarily maintain the tests ourselves
... We would benefit from a common test framework
... What exactly these tests are is basically just a JS procedure
... We test existence of methods, call qunit functions for pass/fail, not necessarily married to this format, but it was the most common one at the time we developed this.
... So to summarize our goal is to have the scalability to support this widget-based ecosystem across dozens of devices across the world
... So we have to have scalability
... To depend on the core standards as something we don't spend a lot of effort on
... rather than duplicate things that will eventually come from W3C.
... We'd like to see this developed at W3C so we can directly leverage it.

fantasai comments on how this shows having a few common formats is better than having w3c accept many similarly-capable formats -- it better supports reuse of the tests

Conclusions and Action Items

1. Vendors commit to running W3C tests

2. Vendors push internally to adopt W3C test formats

plh says W3C should make it easier for vendors to import suites

fantasai: what does that entail?

plh: make guidelines for WG

jgraham: I feel the problem is more on our side than on W3C side

wilhelm, jgraham: but of course, using hg instead of cvs is important for tests

wilhelm: W3C should commit resources to get tests from vendors

plh: start with webapps

wilhelm: Any conclusions on WebDriver discussion?
... We commit to work on the spec, and get that into our browser

plh: MS and Apple should look into that

Mike: normal people at apple are interested, but they're not the ones who sign off on things

kk: Using testharness.js seems to me a very low-hanging fruit, rather than writing a whole bunch of APIs

<jhammel> "not buy Apple" would be more effective

wilhelm: There should be a spec that talks about it, for the IP stuff, we need to get a spec out so there's less risk for those implementing

jgraham: There was some discussion, but no decision, about which bindings W3C would accept tests in

wilhelm: I'd list that as an open issue

MikeSmith: We want to follow up with the testing IG, [other group]


MikeSmith: Spec discussion would go to [... mailing list ...]

wilhelm: Dumping ground for non-W3C-format tests

kk: You can put whatever you want in submitted folder

<MikeSmith> public-browser-tools-testing@w3.org

jgraham: It would be nice, if people dump random test suites in random formats, to separate those out from things that would be approved in roughly their current form

<MikeSmith> http://lists.w3.org/Archives/Public/public-browser-tools-testing/

kk: We should have an old_stuff directory

jgraham: And encourage people to dump stuff there

<MikeSmith> for the Testing IG, http://lists.w3.org/Archives/Public/public-test-infra/ and public-test-infra@w3.org

plh: We can associate a repo with the testing IG, and then anyone in that IG can push to the repo

<plh> ACTION: Mike to create mercurial repositories for Web Testing IG and Browser Tools WG [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action01]

fantasai: Should be clear that dumping things here is not the same as submitting to an official W3C test suite

bryan: Should also have a wiki that documents what's there


jj: Right, should be clear these are not submitted for review; they're there, and someone can take them and convert them and submit them

<MikeSmith> http://www.w3.org/wiki/Testing

jgraham: Come up with a prioritized list of things that need tests

jj: anything that's in CR? :)

plh: I'll take an action item to do that

<scribe> ACTION: plh to make a list of things that need tests [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action02]

bryan: Need a list of what's available, what are the key gaps, what do we need to get there

kk: Identify specs that are in a bad situation.

fantasai: Also want to track not just what needs testing, but ask vendors whether they have tests for any of these.
... Can then go pester people to submit those tests

<scribe> ACTION: MikeSmith to Create repos for testing IG and testing framework group [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action03]

plh: Need places to dump tests for groups that don't have repos atm
... more and more groups have their own test repo

<plh> ACTION: plh to convince the geolocation WG to use mercurial for their tests [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action04]

3. Vendors commit to finding a person to facilitate submission and use of W3C tests

wilhelm: need to make a formal request to each organization

bryan: Someone should pull together format descriptions and include the guidelines

<plh> --> http://www.w3.org/html/wg/wiki/Testing/Authoring/ Authoring Tests

discussion of where to collect this information

<plh> --> http://www.w3.org/testing/ Testing

jgraham: should be in a place not specific to a given working group
...

plinss: There's a lot to be gained by standardizing metadata

jgraham: hard to do the CSS way for an HTML test
... Could have n ways to do it, where n is a small number

Alan: It would be nice to have everything on a wiki so we don't have to go through a staff member
... What if this page was a redirect to a wiki?

jgraham: Could have that page be a link to a wiki

MikeSmith: I like redirect idea, minimizes work I have to do :)

wilhelm: So when should we meet again?

jj: I think we should definitely make this a regular meeting.
... Seems like everyone in every WG is going to be solving the same problems
...

plh: WebDriver will be under browser tools WG

mc: Who's "we"?

wilhelm: I don't know, but this crowd is great.

plh: We can put under the IG

fantasai: We can say at least that we'll meet again at next TPAC

plh: Would be in France next year

fantasai: Since not everyone will be travelling to TPAC, would we want to meet in another place at a different time as well?

jj: Does everyone agree we should meet?

kk: Depends on deliverables.

MikeSmith: If we meet 6 months from now, when would that be?

?: April

mc: Just want to be sure who the "we" is the invite would go out to

wilhelm is designated in charge

Meeting closed.

RRSAgent: make minutes

Summary of Action Items

[NEW] ACTION: Mike to create mercurial repositories for Web Testing IG and Browser Tools WG [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action01]
[NEW] ACTION: MikeSmith to Create repos for testing IG and testing framework group [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action03]
[NEW] ACTION: plh to convince the geolocation WG to use mercurial for their tests [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action04]
[NEW] ACTION: plh to make a list of things that need tests [recorded in http://www.w3.org/2011/10/28-testing-minutes.html#action02]
 
[End of minutes]