This document describes the infrastructure and test development plan and budget for the 2013 Open Web Platform Testing Effort.
This document is merely a W3C-internal document. It has no official standing of any kind and does not represent consensus of the W3C Membership.
More precise metrics TBD.
This section describes the different components of the testing infrastructure and estimates their costs.
This includes basic infrastructure for hosting markdown and HTML pages, simple navigation, and overall website design and branding.
The documentation center contains documentation on authoring, submitting and reviewing tests, along with documentation on the various APIs, the test runner, etc.
This is a database containing references to all W3C and non-W3C specs referenced by W3C specifications. It allows basic CRUD editing, and exposes an API used by various components of the testing infrastructure and spec authoring tools such as ReSpec.
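The CRUD surface of such a spec database can be sketched as follows. This is a minimal in-memory illustration, not the real service; the actual database sits behind an HTTP JSON API, and the field names used here (title, status) are assumptions.

```python
# Hypothetical sketch of the spec database's CRUD operations, backed by
# an in-memory dict instead of the real HTTP JSON API.
specs = {}

def create(shortname, data):
    specs[shortname] = data

def read(shortname):
    return specs.get(shortname)

def update(shortname, **changes):
    specs[shortname].update(changes)

def delete(shortname):
    specs.pop(shortname, None)

create("html5", {"title": "HTML5", "status": "CR"})
update("html5", status="REC")
print(read("html5")["status"])  # prints: REC
```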
Spec coverage is determined by comparing heuristic estimates of the number of tests needed with the number of existing tests for each section and subsection of a spec. This process is fully automated, but can be overridden in specific areas through manual input (usually by test coordinators).
Test coverage data is available through a JSON API, as a widget (which can be embedded directly in specs or on webplatform.org), and in a dashboard which allows getting coverage info at the spec level or drilling down to find the precise coverage of nested subsections.
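Consuming such a coverage payload and drilling down through nested subsections might look like the sketch below. The JSON shape and field names (neededTests, existingTests, subsections) are assumptions for illustration, not the real API.

```python
import json

# Hypothetical shape of the coverage JSON API response; the endpoint
# and field names are assumptions for illustration.
SAMPLE = json.loads("""
{
  "spec": "html5",
  "sections": [
    {"id": "2.1", "neededTests": 40, "existingTests": 30,
     "subsections": [{"id": "2.1.1", "neededTests": 10,
                      "existingTests": 2, "subsections": []}]},
    {"id": "2.2", "neededTests": 20, "existingTests": 20,
     "subsections": []}
  ]
}
""")

def coverage(section):
    """Recursively aggregate needed/existing test counts for a section."""
    needed = section.get("neededTests", 0)
    existing = section.get("existingTests", 0)
    for sub in section.get("subsections", []):
        n, e = coverage(sub)
        needed += n
        existing += e
    return needed, existing

needed, existing = coverage({"subsections": SAMPLE["sections"]})
print(f"{existing}/{needed} tests ({100 * existing // needed}% coverage)")
# prints: 52/70 tests (74% coverage)
```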
Ref tests may be fully automated in WebDriver-capable browsers. However, this cannot be done by simply visiting a URL: the test runner has to be installed locally to run ref tests using WebDriver. Note that testharness.js and manual tests can also be run in that scenario. Although much more complex to install and use than regular web-based testing, WebDriver is an interesting solution for running ref tests. It also seems it could enable automated testing of certain accessibility specs such as [WAI-ARIA], by providing access to the accessibility tree.
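A ref test passes when the test page renders identically to its reference page (or differently, for mismatch references). The comparison step can be sketched as below; in a real WebDriver-based runner the two byte strings would come from the screenshot command, whereas here they are stand-ins.

```python
# Sketch of only the comparison step of a ref test. In a real runner
# the two byte strings would be screenshots captured via WebDriver.
def ref_test_passes(test_png: bytes, ref_png: bytes,
                    match: bool = True) -> bool:
    """match=True for == (match) reftests, match=False for != ones."""
    return (test_png == ref_png) if match else (test_png != ref_png)
```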
The test runner relies on a server module that is specially designed to allow testing Open Web Platform specs such as [XMLHTTPREQUEST] or [POSTMSG].
An instance of the Web test runner is hosted on the testing website. Test run results are stored in a database.
Semi-automated testing and manual testing are both needed to test implementations of certain types of accessibility-supporting features. This requirement is met through a dedicated test runner which is integrated with the rest of the testing infrastructure and which provides integrated output on test results.
Test results are stored in a database along with information about the user agent that was tested (such as the user agent string).
Test result data is available through a JSON API, as a widget (which can be embedded directly in specs or on webplatform.org), and in a dashboard which displays support status of features for each browser.
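A minimal sketch of the results store follows, with one row per (test, user agent) run. The table and column names are illustrative, not the real schema, and sqlite stands in for whatever database the infrastructure actually uses.

```python
import sqlite3

# Illustrative results store: one row per (test, user agent) run.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE results
              (test TEXT, ua TEXT, status TEXT, run_at TEXT)""")
rows = [
    ("dom/Node-appendChild.html",
     "Mozilla/5.0 ... Firefox/20.0", "PASS", "2013-04-23"),
    ("dom/Node-appendChild.html",
     "Mozilla/5.0 ... Chrome/26.0", "FAIL", "2013-04-23"),
]
db.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)

# Dashboard-style query: support status of one test per browser.
for ua, status in db.execute(
        "SELECT ua, status FROM results WHERE test = ? ORDER BY ua",
        ("dom/Node-appendChild.html",)):
    print(ua, status)
```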
Test management is done through Git and GitHub. In order to mitigate the risk of relying on tools provided by a third party, the test repository is synced to W3C infrastructure. Comments on pull requests and issues are archived using GitHub's API.
A budget is reserved to help working groups that rely heavily on other systems to migrate.
The website also offers the possibility to link GitHub and W3C accounts together and to sign up for writing tests, and it showcases contributors and test coordinators within a dashboard.
Experience shows that test review is the main bottleneck to increasing coverage. A number of tools and process changes are designed to simplify the work of the reviewer. This system is designed to be easily extensible.
It includes a continuous integration solution where submissions are automatically tested against a subset of user agents and stress-tested to reject unstable tests upfront. This solution can also be used to seed the test results database rather than relying solely on crowd-sourced test runs to do so.
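The stress-testing step can be sketched as follows: a submitted test is run several times and rejected as unstable if its results disagree across runs. The attempt count and the stand-in test functions are illustrative assumptions.

```python
import itertools

# Sketch of flaky-test rejection: run a submission repeatedly and
# reject it if the results are not identical across runs.
def is_stable(run_test, attempts: int = 10) -> bool:
    return len({run_test() for _ in range(attempts)}) == 1

# Illustrative stand-ins: a deterministic test and a flaky one that
# fails on every third run.
steady = lambda: "PASS"
ticks = itertools.count()
flaky = lambda: "FAIL" if next(ticks) % 3 == 0 else "PASS"
```

A real CI system would additionally vary the user agent and machine load between runs, since instability often only shows under contention.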
Other tools can be used to verify metadata, validate markup, run linters, check CLA, etc.
Results are presented in a publicly accessible dashboard and are linked from the GitHub pull request.
Improving the [WEBIDL] parser and idlharness to generate better and more numerous test cases is high leverage as it impacts all specs which have a dependency on [WEBIDL].
Currently, idlharness is a client-side tool. Adding a server-side mode could allow pre-building the tests which would speed up test runs.
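The idea behind idlharness-style generation can be illustrated with a toy sketch: extract interface and member names from a WebIDL fragment and emit existence assertions. The real idlharness.js handles far more (inheritance, extended attributes, argument types), and the emitted assertion names here merely mimic testharness.js conventions.

```python
import re

# Toy WebIDL fragment; the real tool consumes the IDL blocks of a spec.
IDL = """
interface Node {
  readonly attribute unsigned short nodeType;
  Node appendChild(Node node);
};
"""

def generate_checks(idl: str) -> list:
    """Emit one existence assertion per interface and per member."""
    checks = []
    for iface in re.finditer(r"interface\s+(\w+)\s*\{([^}]*)\}", idl):
        name, body = iface.group(1), iface.group(2)
        checks.append(f"assert_true('{name}' in window)")
        # Members end in ';' (attributes) or open a '(' (operations).
        for member in re.finditer(r"(\w+)\s*[(;]", body):
            checks.append(
                f"assert_own_property({name}.prototype, "
                f"'{member.group(1)}')")
    return checks

for check in generate_checks(IDL):
    print(check)
```

A server-side mode would run this generation once per spec and serve the pre-built tests, instead of re-parsing the IDL in every client on every run.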
We are focusing our effort on automatable unit tests (in-depth conformance testing), testable through either user agent scripting (testharness.js tests) or WebDriver (ref tests). This does not rule out manual tests per se, but makes them the exception rather than the norm.
The total scope of this effort is the union of the Coremob and TV profiles. Whether this scope can be met is essentially a matter of how much funding is obtained. Obviously, the larger the funding, the more comprehensive the effort. We describe below alternate milestones which we might aim for depending on the level of funding and on other aspects such as preserving the community.
In order to provide the best cost estimate possible, we have devised a tool that parses the specifications and, through different heuristics, estimates the number of tests necessary for each section and subsection of the document.
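One plausible heuristic of the kind the tool combines is counting RFC 2119 conformance keywords per section as a proxy for the number of tests needed. The sketch below is an assumption about the approach, not the actual tool, and the tests-per-keyword factor is invented for illustration.

```python
import re

# Count RFC 2119 keywords (MUST, SHOULD, MAY, ...) as a rough proxy
# for the number of testable assertions in a section of spec prose.
RFC2119 = re.compile(
    r"\b(MUST NOT|MUST|SHALL NOT|SHALL|SHOULD NOT|SHOULD|MAY)\b")

def estimate_tests(section_text: str, tests_per_keyword: int = 2) -> int:
    return len(RFC2119.findall(section_text)) * tests_per_keyword

section = ("The user agent MUST fire a load event. "
           "It MUST NOT fire it more than once, and MAY queue a task.")
print(estimate_tests(section))  # prints: 6
```

In practice such estimates are then corrected by manual input from test coordinators, since keyword density varies widely between specs.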
Through various means, we obtained data on the number of existing tests and submissions for each of these specs and derived the number of tests that still needed to be developed per specification.
We estimated the cost of writing and reviewing each of these tests, accounting for the various benefits our planned infrastructure would provide.
We then made projections for best and worst case scenarios which we used in the below estimations.
In order to account for different funding amounts, we divided the project into three levels. Note that none of these estimates accounts for some of the unknown costs described in section 4.4.4 Accounting for Unknowns, nor do they consider external contributions (notably from vendors) or test repurposing as described in section 4.4.3 Repurposing existing tests.
This first level focuses on [HTML5] and a small number of other specs, picked to help validate our estimation model and develop a good understanding of the different types of testing.
A second level which increases the scope of the effort to the intersection of the Coremob and TV profiles and which comprises all the specs listed in section A. Intersection of the Coremob and TV Profiles (this amount includes the estimated cost of the [HTML5] spec listed in section 4.3.1 Mostly HTML5 above).
A third level which includes the specs described in section B. Symmetric Difference of the Coremob and TV Profiles to cover the union of the two profiles.
Choosing which level to aim for is essentially a matter of how much funding is available. Yet there are a number of other factors to keep in mind. Some are listed below.
One of the goals of this effort is to foster a community around testing that can continue its work and increase its effort in both breadth (as new specs mature) and depth (looking into performance, quality of implementation and interaction testing, notably). Aggressively out-sourcing all the specs might hinder the development of this community. We need to find the right balance between out-sourcing to meet the aggressive deadlines some of our members have in mind and crowd-sourcing in order to foster this nascent community.
We are at the very beginning of this effort and have yet to understand and account for the contributions made by implementors, Working Group participants, and Web developers. The infrastructure and process changes we are planning will certainly drive these contributions up, but we do not yet know by how much.
There are significant opportunities to increase test coverage at lower cost by repurposing existing tests and tapping into not-yet-reviewed submissions. There are costs associated with these efforts that need to be estimated, but they can substantially lower the cost of test acquisition.
We are still looking for data on existing tests for a number of specs and/or tuning our test coverage tool to gather better test requirements. These specs are marked by the dagger symbol (‡) in section 4.3.2 Intersection of the Coremob and TV profiles.
In view of the above, we recommend raising funds for test development, picking the low-hanging fruit described above (repurposing tests, submissions from implementors, etc.), and focusing on developing tests for [HTML5] and a small number of other, sufficiently diverse specs. This will help us validate our model, refine our estimates, and understand the progress we can make through crowd-sourcing and other contributions. We can provide an update in late Q3 2013, and re-adjust funding then.
A community manager or dedicated task force is needed to foster a community around testing. This should represent a 50% FTE but could be spread among multiple people.
As discussed during our January meeting, contributions from implementors can be an important source of tests.
After further discussion with implementors, there's clear agreement that the real value lies in:
This requires guarantees in terms of the quality of the tests contained in this repository. These are covered in section 2.7 Continuous Integration & Review tools.
In order to enable this key effort to succeed, implementors need to dedicate resources to:
This lists all of the specs which are present in both profiles and that we intend to test.
Specs marked by the symbol (‡) are specs for which cost estimates are still pending.
This lists all of the specs which are present in either one profile or the other, but not both.
This lists all of the specs which are present in either one of the profiles but which we won't be developing tests for, as they are not W3C specs. Where possible and when licensing permits, we will enable running these existing test suites on our infrastructure.
$Revision: 1.4 $ of $Date: 2013-04-23 18:01:47 $