This content is out of date. (2013-02-05)
This content is now even more out of date. (2014-01-07)
This page collects and categorizes information about goals and requirements for the common W3C test-suite framework for browser testing. [SAZ: afaik, the scope should not be limited to browser alone][JS: ATAG 2.0 and UAAG 2.0 will need to test authoring tools and media players in addition to browsers]
This version is an attempt by FD to merge previous version with the alternate approach to test requirements suggested by Michael Cooper, grouping requirements per functional unit so that they become actionable.
The term testing framework is used below to mean the whole W3C test-suite framework under consideration. It consists of:
- a Web test server to serve test cases over the Web
- a test runner to run a series of tests and gather results for all of them
- a test case review mechanism to ensure the correctness of submitted test cases
- a test suite management system to ease management
- a reporting tool to produce implementation and interoperability reports
- a spec annotation tool to assess some level of spec coverage
List of known frameworks:
- CSS Framework (http://wiki.csswg.org/test)
[WAI comment: add "repository" to the list]
Requirements for the testing framework
The testing framework must be intended for Candidate Recommendation and post Candidate Recommendation phases
The test suite should be suitable to evaluate whether the spec is implementable, but it should also be used to promote interoperability.
- testing of precise technical requirements such as parsing and validity rules
- testing of technical requirements that can only be tested in the context of other requirements.
- testing of more general requirements for specification conformance that cannot be evaluated with simple unit tests.
[WAI comment: clarify that this is of general value -- just wording issue with "must be"]
The testing framework must support simple and complex tests
It should be possible to run unit tests (e.g. testing the value of an attribute) as well as complex tests (e.g. acid or stress tests).
The testing framework should be intended for user agent conformance testing
It may not be an immediate goal to perform user agent conformance testing, but the creation of a test harness naturally meets many of the requirements for this, and there is likely to be interest in using the test harness for this purpose.
The testing framework should help improve interoperability
While a W3C goal is to test specification conformance, more important to the community may be interoperability testing. Knowing which user agents produce what results for a given test, regardless of specification requirements related to that test, allows identification of areas of generally consistent and generally inconsistent user agent behaviour.
See also Accessibility Support Database
The testing framework must distinguish the roles of test files, test cases, test suites, test results and provide respective repositories
The architecture must expose these classes even though some of these layers may be merged in practice to improve automation.
The testing framework must allow many-to-many relationships between test files, test cases, and test results
There should not be an assumption of one-to-one relationship between elements at the various layers. A given test case may require several test files. A given test file may be used by several test cases. A given test execution may be repeated by different users and results stored separately.
The testing framework must equally support test case metadata defined in test files and in external files
To improve reuse of test files, test case metadata should be stored separately from test files when possible. Metadata stored within test files could also potentially introduce side effects on the test outcome.
Notwithstanding the above, the harness must allow test case metadata to be included in test files, as that can facilitate automation in various ways (authoring, review, execution).
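As an illustration, an external metadata file accompanying a test file might look like the following. The format and all field names and values are purely hypothetical; no such format has been agreed upon.

```json
{
  "id": "dom-attr-001",
  "title": "getAttribute returns the attribute value",
  "spec": "http://example.org/spec#section-4.2",
  "type": "script",
  "files": ["dom/attr-value.html", "common/utils.js"]
}
```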
The testing framework must be explicit about the test license
Contributors and users of the system must be clear about the license applied to content submitted to the repository.
The testing framework must allow for multiple test licenses
The testing framework must allow testing of different layers
For instance, network (HTTP, low bandwidth/latency, server throttling), syntax, DOM, layout model, rendering.
[WAI comment: should disambiguate the term "layer" in this context]
The testing framework must be able to serve test cases over the Web
See below for requirements for the Web test server.
The testing framework must use a decentralized version control system for test files and test cases
W3C uses Mercurial.
[WAI comment: seems overly restrictive as a core requirement; some W3C WGs use other systems]
The testing framework must include a test runner
See below for requirements for the test runner.
The testing framework must provide a mechanism for test case review
See below for requirements for the test case review mechanism.
The testing framework must provide a user-friendly tool to ease test suite management
See below for requirements for the test suite management system.
[WAI comment: assuming that accessibility of W3C systems is a given anyway]
The testing framework must provide a reporting tool
See below for requirements for the reporting tool.
The testing framework must provide "coverage" information
In order to know which areas of a spec are well tested, and hence to have a sense of (an upper bound on) the completeness of a test suite, as well as of the areas where it would be most profitable to direct new testing effort, it would be beneficial to produce an annotated version of the spec that associates each testable assertion with a link to one or more test cases for that assertion.
See below for requirements for spec annotation.
The testing framework must allow for direct contributions from external individuals or entities
The public at large should be able to submit test files, test cases, as well as test results.
Requirements for the Web test server
The Web test server must be able to run server-side scripts
The exact list of languages that the Web test server must support remains to be specified.
PHP and Python should be available.
XMLHttpRequest, CORS, EventSource, HTML5, Widgets WARP, and WCAG will all need a setup like this.
Note: We no longer support PHP on w3c-test.org. There was a built-in review process for the PHP code in the Mercurial repository, but it is no longer relevant since the test suite has been converted to being self-hosting in Python.
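As a sketch of why server-side scripting is needed: a CORS test resource, for instance, must set response headers dynamically based on the incoming request. The following minimal Python handler is purely illustrative and not the actual w3c-test.org setup; all names are made up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cors_headers(request_origin):
    """Headers a hypothetical CORS test resource might send back:
    echo the requesting origin, or allow any origin if none was sent."""
    return {
        "Access-Control-Allow-Origin": request_origin or "*",
        "Content-Type": "text/plain;charset=utf-8",
    }


class TestResource(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        for name, value in cors_headers(self.headers.get("Origin")).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(b"PASS")

# To serve it (not run here):
# HTTPServer(("", 8000), TestResource).serve_forever()
```

The point is that static file serving cannot express this: the response depends on the request, which is exactly what XMLHttpRequest, CORS and EventSource tests exercise.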
The Web test server should pull out content from test case repository automatically
Test cases submitted to the test case repository should appear automatically on the Web server, except for test cases that make use of server-side scripting, which should first be approved for security reasons.
See also Dom's 17-Feb-2011 clarifications regarding the constraints of these hosts for PHP usage.
[WAI comment: also client-side test cases need pre-approval for several reasons; and the review status of test cases must be clearly indicated to the repository user]
The Web test server must run on a dedicated domain name
For security reasons, the server must use a dedicated domain name.
The W3C Web test server, launched in February 2011 (see PLH's announcement), uses w3c-test.org.
[WAI comment: it may be good to reassure that the test files and procedures themselves will not be bound to a particular domain name]
The Web test server should allow configuration settings to be tweaked on a per-test-case basis
For instance, the Web test server should leave full control over media types and charsets, e.g. through the use of .htaccess configuration files.
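For instance, assuming an Apache-style server, an illustrative .htaccess fragment overriding the media type and charset for test files might look like:

```apache
# Serve .xht test files with the XHTML media type
AddType application/xhtml+xml .xht
# Force UTF-8 on .html test files by default
AddCharset UTF-8 .html
# Override the charset for a single test file (file name illustrative)
<Files "latin1-decl.html">
  AddCharset ISO-8859-1 .html
</Files>
```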
The Web test server may need to run additional libraries
Some test suites may require the use of specific libraries. For instance, to test the Web Sockets protocol and its client API, a Web Sockets library such as http://code.google.com/p/pywebsocket/ needs to be installed (we might need something else, ideas?).
The Web test server must be available through different domain names
The W3C Web test server exposes the following domain names for testing purposes as of 2011-06-07:
The Web test server must be available through different ports
HTTP servers for w3c-test.org are available on ports 80, 81, 82, and 83.
The Web test server must be available through HTTPS
Different certificates may be needed, such as a certificate with Extended Validation and an invalid certificate.
With SSL support:
Requirements for the test runner
The test runner is responsible for running a series of tests and gathering results for all of them.
[WAI comment: Requirements that begin "The test runner must..." seem to be requirements that it be possible to create test runners for that requirement. However, not all test runners may need to meet all of these requirements. Therefore suggest language like "It must be possible for test runners to...". We made this change the first time we encountered it but haven't done it for all of them yet.]
The test runner must support multiple test methods (including self-describing, reftest, and script)
The following test methods are considered.
Self-describing tests, aka human or manual tests.
This is the most basic level. A file (or more) is displayed and a human indicates if the test is passed or failed. Ideally, we should avoid those types of tests as much as possible since it requires a human to operate. Some folks want to have a comment field as well.
[WAI comment: s/A file (or more) is displayed and a human indicates if the test is passed or failed/A human is provided with one or more test files and a corresponding test procedure (which may be included as part of the test files), and is asked to indicate if the test passes or fails.]
Plain text output
This is equivalent to doing saveAsText on two files and comparing the output.
[WAI comment: a little unclear what is meant]
Reftest: two pages are displayed and the rendered pages are compared for differences.
For comparison, we might be able to use HTML5 Canvas, or an extension to get screenshots. Worst-case scenario is to use a human to compare the rendered pages.
compare equivalent pages
(@@ through screen shots?) Not sure how this one differs from the one above...
Some engines could dump their in-memory view/layout model, i.e. the one directly affecting the rendering.
The test result is established through scripting:
We're looking at using testharness.js for those. Note that it doesn't preclude human intervention sometimes, such as authorizing geo information, pressing a key or a button, etc.
The test runner must be able to load tests automatically based on manifest files
Manifest files should contain the metadata necessary to load the tests (URI, type, etc.)
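A sketch of what such a manifest and its loader could look like, in Python. The JSON format and field names are purely hypothetical; no manifest format has been defined.

```python
import json

# Hypothetical manifest: one entry per test with the metadata the runner
# needs to load it (id, URI, test type, and a reference page for reftests).
MANIFEST = """
[
  {"id": "dom-001", "uri": "/dom/attr-value.html", "type": "script"},
  {"id": "css-001", "uri": "/css/float-left.html", "type": "reftest",
   "ref": "/css/float-left-ref.html"},
  {"id": "svg-001", "uri": "/svg/anim-begin.svg", "type": "manual"}
]
"""


def load_manifest(text):
    """Parse a manifest and check each entry carries the minimum metadata."""
    entries = json.loads(text)
    for entry in entries:
        # Every test needs at least an id, a URI, and a test type.
        assert {"id", "uri", "type"} <= entry.keys()
    return entries


tests = load_manifest(MANIFEST)
```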
The test runner must be able to order test cases smartly
Purely automated tests should be grouped together to avoid a situation where the user is solicited on a random basis. This may be done when creating manifest files.
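A minimal Python sketch of such grouping, assuming a hypothetical "type" field in the test metadata: automated tests (script and reftest) are sorted ahead of manual tests so the user is solicited only once, at the end of the run.

```python
# Test types the runner can execute without human intervention (illustrative).
AUTOMATED = ("script", "reftest")


def group_for_run(tests):
    """Stable sort: automated tests first, manual tests grouped at the end."""
    return sorted(tests, key=lambda t: t["type"] not in AUTOMATED)


run_order = group_for_run([
    {"id": "t1", "type": "manual"},
    {"id": "t2", "type": "script"},
    {"id": "t3", "type": "manual"},
    {"id": "t4", "type": "reftest"},
])
```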
The test runner must allow for tests to be run in random order and repetitively
The goal is to detect failure under certain conditions
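A Python sketch of randomized, repeated execution orders. Seeding the shuffle is one way to make a failing order reproducible; the function and parameter names are illustrative.

```python
import random


def repeated_random_orders(tests, repetitions, seed=None):
    """Yield the test list `repetitions` times, reshuffled each time.
    A fixed seed lets a failing order be replayed exactly."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        order = list(tests)
        rng.shuffle(order)
        yield order


runs = list(repeated_random_orders(["t1", "t2", "t3"], repetitions=4, seed=42))
```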
The test runner must allow for complete and partial execution of tests
Selection of subset can be based on the metadata describing the test; for instance, to select all tests that apply to a certain feature, element, or other aspect of the test.
It must be possible to create test runners that work on various platforms
Test runners should be available that work on main operating systems (e.g. Windows, MacOS, Ubuntu), most user agents, and on various types of terminals (e.g. desktop, mobile).
Some environments might require specific developments. For instance, on mobile devices, test suites might need to be split or packaged differently beyond a certain size to cope with the limitations of the platform.
This requirement might be met by providing different test runners for different environments.
The test runner must provide some way to output collected results
This might either take the form of a raw text file format, XML, JSON, or internal database storage.
The test runner must allow for automatic and manual gathering of context information
This context information includes the browser versions, the OS platform, as well as relevant configuration settings and assistive technology if applicable.
The test runner must include context information in collected results
Result records must be complete with information about the test case, the tester, the revision if applicable, the user agent, etc.
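A Python sketch of a result record bundling outcome and context. The field names are illustrative, not a defined W3C format; some fields are gathered automatically and others supplied manually.

```python
import json
import platform
from datetime import datetime, timezone


def result_record(test_id, outcome, user_agent, tester=None, comment=None):
    """One collected result together with the context it was gathered in."""
    return {
        "test": test_id,
        "outcome": outcome,            # e.g. "pass", "fail", "unknown"
        "user_agent": user_agent,      # supplied or sniffed
        "os": platform.system(),       # gathered automatically
        "tester": tester,              # manual, if applicable
        "comment": comment,            # free-text evaluator note
        "collected": datetime.now(timezone.utc).isoformat(),
    }


record = result_record("dom-001", "pass", "Browser/4.3")
serialized = json.dumps(record)  # raw JSON output, reusable by other tools
```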
The test runner must support positive and negative testing
- It must be possible to define positive tests of specification requirements.
- It must be possible to define negative tests that actively test failure to meet specification requirements or test error handling behaviour.
The test runner must support testing of time based information
This requirement is needed for SVG animation and HTML video, for instance.
The test runner must allow a test to report its result automatically
Some hook must be available so that automated tests can report their results without human intervention.
The test runner must allow humans to report on manual test outcome
There should be some pass/fail/unknown submission procedure available for manual tests.
The test runner must allow reftests to be run by humans
Even if reftests can be automated, the test runner should provide a way for humans to report on a reftest, possibly switching between test view and reference view several times per second and asking if the user sees flickering.
Automatic running of reftests requires browser-specific code and is explicitly out of scope.
The test runner should allow for humans to comment on a test outcome
Allows a text comment field for human evaluator notes (e.g. test conditions, failure notes) on the individual test result that can be included in the reporting. E.g. they might write: "the authoring tool implements this SC with a button that automatically sends the content being edited to the XXX Checker accessibility checking service".
The test runner must allow tests to be built from smaller tests
This would allow one action to be repeated several times within the same test, for instance to detect failure under certain conditions.
The test runner must be usable by external entities and individuals
Note though that some test suites may need specific conditions to run.
Requirements for the test case review mechanism
The test case review mechanism must enable review without putting a Working Group on the critical path for every single test
See the work of the WCAG 2.0 Test Samples Development Task Force (TSD TF) which included the development of a review process that allowed the Task Force to pre-review tests yet allow the Working Group to make the final decision.
[WAI comment: we may also want to pursue public review and rating systems (though there are several concerns, including critical mass to make the system useful, avoiding spam, and avoiding disruptive or bogus entries)]
The test case review mechanism must provide an easy way to submit a test
A Web author should be able to submit a test to the W3C. See also the Policies for Contribution of Test Cases to W3C.
The test case review mechanism must allow anyone to easily give feedback on tests
In particular, this should not be restricted to named reviewers or people with W3C accounts.
The test case review mechanism should integrate with Mercurial
The distributed version control system should be used as much as possible.
Requirements for the test suite management system
The test suite management system must scale to a large number of tests
There may be more than 100,000 test cases per specification.
The test suite management system must track the state of test cases
Test cases may be:
- under review
The test suite management system should allow association of a test case with issues, action items or mailing-list threads
Integration with W3C tracker tool?
The test suite management system should allow stable dated release of test suites
Test suite revisions will be used in particular to link back collected results to the appropriate versions of a test suite and to create snapshots when needed (e.g. for an implementation report).
Requirements for the reporting tool
The reporting tool must be able to produce a machine-readable report
The actual format needs to be specified. It could be XML or non-XML. The Evaluation and Report Language (EARL), for instance, provides a machine-readable format for expressing test results in RDF with an XML serialization.
The output should be reusable by other applications. It should also be usable to answer questions such as:
- Is feature X supported on Browser 4.3?
- What does Browser 4.3 support?
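As an illustration, a single test result expressed as an EARL assertion in Turtle might look like the following. The subject and test identifiers are made up; only the earl: vocabulary terms are real.

```turtle
@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix : <http://example.org/results#> .

:assertion1 a earl:Assertion ;
    earl:subject :browser43 ;
    earl:test <http://example.org/tests/feature-x-001> ;
    earl:result [ a earl:TestResult ; earl:outcome earl:passed ] .
```

A report in this form can be queried to answer questions such as the two above (which tests pass on a given subject, and which subjects pass a given test).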
The reporting tool should be able to produce an agglomerated report
Multiple test results may be available for a given test case. The reporting tool should be able to combine them and report the most likely test outcome.
When multiple test results for a given test case exist, there must be a mechanism to compare results and determine an authoritative result. This must be limited to privileged users.
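A Python sketch of one possible combination rule: majority vote, with ties reported as "unknown" and left for an authoritative decision by a privileged user. The rule itself is an assumption; the document does not prescribe one.

```python
from collections import Counter


def most_likely_outcome(results):
    """Combine multiple results for one test case by majority vote;
    a tie is reported as 'unknown' pending an authoritative decision."""
    counts = Counter(results).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "unknown"
    return counts[0][0]


combined = most_likely_outcome(["pass", "pass", "fail"])
```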
Requirements for the spec annotation tool
[WAI comment: it is important to further explain what the "spec annotation tool" is. Also, one should not assume that spec annotation is the only method for identifying testable statements from the spec.]
The spec annotation tool must map each test case onto a part of the spec
In turn, this creates a requirement on the metadata test cases must define. The definition of "part" is up to the spec under test. It may mean:
- the section that contains the conformance statement
- the paragraph that contains the conformance statement
- the conformance statement itself
The spec annotation tool must react smoothly to spec modifications, deletions, insertions and rearrangements
A one-word update should not invalidate the mapping.
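One way to approach this, sketched in Python, is fuzzy matching: re-locate the annotated assertion in the revised spec text rather than relying on fixed offsets or exact strings. The 0.6 threshold and all names are illustrative assumptions.

```python
import difflib


def relocate_assertion(assertion, revised_spec_paragraphs):
    """Find the paragraph of the revised spec that best matches a
    previously annotated assertion, so a one-word edit does not
    invalidate the mapping. Returns None below an (arbitrary) threshold,
    flagging the annotation for human re-review."""
    scored = [(difflib.SequenceMatcher(None, assertion, p).ratio(), p)
              for p in revised_spec_paragraphs]
    ratio, best = max(scored)
    return best if ratio > 0.6 else None


match = relocate_assertion(
    "The user agent must ignore unknown attributes.",
    ["Abstract.",
     "The user agent must ignore unrecognized attributes.",
     "Conformance criteria apply to documents and producers."])
```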
Requirements for test cases and test files
Test cases must not depend on the test runner
A test may be able to generate its result automatically (such as Script test) or not (such as Self describing test). If it is automatic, it is the responsibility of the test to report its result to the test runner above it using some hook. Otherwise, it is the responsibility of the test runner to gather the result from an alternate source (such as a human).
Test cases should be designed for multiple purposes
Test files and test cases should be designed as neutrally as possible so they can be repurposed. Multiple Working Groups may have reasons to re-use test files and should not be forced to create redundant versions. Even within a specification, a given test file may be used to test multiple things.
Test cases must have a unique ID
Test cases (and test files) must have a unique ID. A URI may be sufficient for test files. The ID should not be expected to contain metadata about the test in its lexical form, although as a convenience many IDs may have some structure.
Test cases must identify the relevant specification section(s) and/or conformance statement(s) under test
The targeted granularity may vary depending on the specification. For some specifications, it may be enough to link back to the section that contains the conformance statement. For others, a more precise link to the actual conformance statement may be needed.
[WAI comment: this relates to the spec annotation and this relationship should be explicit and clearly explained]
Note a test case may apply to more than one specification.
[WAI comment: it is mainly test files rather than test cases that may apply to more than one specification]
Test cases may apply to the same conformance statement as other test cases
There may be more than one test case per conformance statement.
Test files may depend on other test files
Test files consisting of a single file (singleton test files) are preferred for simplicity and portability, but it must be possible for test files to have dependencies on external resources such as images, scripts, etc.
It must be possible for resources, such as images, scripts, etc., to be shared by multiple test files. The test file repository structure must accommodate actual "test files" as well as resources that are not themselves considered test files.
Test files may generate test files
Some of the test files may be generators for a collection of test files and test cases created e.g. by varying a single parameter.
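A minimal Python sketch of such a generator, varying a single CSS property value to emit a family of test files with matching test-case ids. The template and id scheme are illustrative.

```python
# Illustrative template for a family of generated test files.
TEMPLATE = """<!DOCTYPE html>
<title>float: {value}</title>
<div style="float: {value}">x</div>
"""


def generate(values):
    """Yield (test id, test file content) pairs, one per parameter value."""
    for value in values:
        test_id = "css-float-{}".format(value)
        yield test_id, TEMPLATE.format(value=value)


generated = dict(generate(["left", "right", "none"]))
```

Note the generated ids are derived deterministically from the varied parameter, which is one way to reconcile generation with the requirement for unique, stable test case identifiers raised in the WAI comment.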
[WAI comment: this may interfere with the requirement for unique and constant identifiers for test cases]
Requirements and ideas not yet categorized
- allow more than one way to test functionality
- tests that require a top level browsing context
- be suitable for HTTP 1.1, HTML5, CSS 2.1, CSS 3, ES5, Web APIs (HTML DOM, DOM L2, Selectors, Geolocation, XHR, etc.), MathML 1.0, SVG 1.1, Web sockets Protocols, etc.
- ideally, the browser vendors should help us get what we need to run the tests on their products.
- How can the framework help ensure the completeness of a test suite with regards to a particular specification?
- regroup a set of existing tests from different sources (DOM, CSS, SVG, HTML, etc.). Can we create a test runner to run them all? Is it possible to convert them?
- regroup the set of metadata needed/provided in the existing testing framework/tests.
Existing work to consider:
- CSS test suite (MWI testing framework (self describing and DOM testing), metadata associated with tests, test format, submission process, review process)
- SVG test suite
- jQuery test suite (DOM testing)
- Selectors API test suite
- webkit test suite (testing methods, metadata associated with tests, test format)
- mozilla test suite (testing methods, metadata associated with tests, test format)
- QA Framework: Test Guidelines
- QA taxonomy of tests
- Conformance Test Suites for mobile web technologies
- Mobile Web Test Suites Working Group
- WHATWG Test suite
- Windows Internet Explorer Testing Center
- Browser Tests
That one is highly interesting; it looks like an attempt at what we need for pixel comparison.
- <canvas> tests
- HTML 4.01 Test Suite - Assertions
- Brad Pettit presentation during a DOM face-to-face meeting
- Design Notes for a Test Review System
- Test Swarm
- SVG Conformance Test Suite
- Watir and FireWatir. Watir allows one to automate tests: it drives browsers the same way people do, clicking links, filling in forms, pressing buttons. Watir also checks results, such as whether expected text appears on the page. It does not seem to provide screenshot facilities, unfortunately. No support for Safari on Windows?
- Browserscope is a community-driven project for profiling web browsers. The goals are to foster innovation by tracking browser functionality and to be a resource for web developers.
- WCAG 2.0 Test Samples include a metadata format for describing the tests and a review process to review the tests without bottlenecking a Working Group, yet comply with the W3C Process
- W3C Evaluation and Report Language (EARL) 1.0 is a machine-readable format for expressing test results from quality assurance reviews (including accessibility) -- this can be used to output test results