Testing/Requirements


This content is out of date. (2013-02-05)

This content is now even more out of date. (2014-01-07)


This page collects and categorizes information about goals and requirements for the common W3C test-suite framework for browser testing. [SAZ: afaik, the scope should not be limited to browser alone][JS: ATAG 2.0 and UAAG 2.0 will need to test authoring tools and media players in addition to browsers]

This version is an attempt by FD to merge the previous version with the alternative approach to test requirements suggested by Michael Cooper, grouping requirements by functional unit so that they become actionable.

Overview

The term testing framework is used below to mean the whole W3C test-suite framework that is being considered. It consists of:

  • a Web test server to serve test cases over the Web
  • a test runner to run a series of tests and gather results for all of them
  • a test case review mechanism to ensure the correctness of submitted test cases
  • a test suite management system to ease management
  • a reporting tool to produce implementation and interoperability reports
  • a spec annotation tool to assess some level of spec coverage

List of known frameworks:

[WAI comment: add "repository" to the list]

Requirements

Requirements for the testing framework

The testing framework must be intended for Candidate Recommendation and post-Candidate Recommendation phases

The test suite should be suitable for evaluating whether the spec is implementable, but it should also be used to promote interoperability.

This includes:

  • testing of precise technical requirements such as parsing and validity rules
  • testing of technical requirements that can only be tested in the context of other requirements.
  • testing of more general requirements for specification conformance that cannot be evaluated simply with unit tests.

[WAI comment: clarify that this is of general value -- just wording issue with "must be"]

The testing framework must support simple and complex tests

It should be possible to run unit tests (e.g. testing the value of an attribute) as well as complex tests (e.g. acid or stress tests).

The testing framework should be intended for user agent conformance testing

It may not be an immediate goal to perform user agent conformance testing, but the creation of a test harness naturally meets many of the requirements for this, and there is likely to be interest in using the test harness for this purpose.

The testing framework should help improve interoperability

While a W3C goal is to test conformance to specifications, interoperability testing may be more important to the community. Knowing which user agents produce what results for a given test, regardless of specification requirements related to that test, allows identification of areas of generally consistent and generally inconsistent user agent behaviour.

See also Accessibility Support Database

The testing framework must distinguish the roles of test files, test cases, test suites, and test results, and provide corresponding repositories

The architecture must expose these classes even though some of these layers may be merged in practice to improve automation.

The testing framework must allow many-to-many relationships between test files, test cases, and test results

There should not be an assumption of a one-to-one relationship between elements at the various layers. A given test case may require several test files. A given test file may be used by several test cases. A given test may be executed repeatedly by different users, with the results stored separately.
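
As an illustration only (the field names below are hypothetical and not part of any agreed data model), the relationships between the layers could be captured along these lines:

  // Hypothetical sketch of the layered data model; names are illustrative only.
  var testFile = { id: "files/canvas/clear-rect-1.html" };        // a physical file
  var testCase = { id: "canvas-clearRect-001",
                   files: ["files/canvas/clear-rect-1.html",
                           "files/common/canvas-helpers.js"] };   // one case, several files
  var testResult = { testCase: "canvas-clearRect-001",            // many results per case
                     tester: "alice",
                     userAgent: "ExampleBrowser/4.3",
                     outcome: "pass" };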

The testing framework must equally support test case metadata defined within test files and externally

To improve reuse of test files, test case metadata should be stored separately from test files when possible. Metadata stored within test files could also potentially introduce side effects on the test outcome.

Notwithstanding the above, the harness must allow test case metadata to be included in test files, as that can facilitate automation in various ways (authoring, review, execution).

The testing framework must be explicit about the test license

Contributors and users of the system must be clear about the license applied to content submitted to the repository.

The testing framework must allow for multiple test licenses

See multiple test licenses.

The testing framework must allow testing of different layers

For instance, network (HTTP, low bandwidth/latency, server throttling), syntax, DOM, layout model, rendering.

[WAI comment: should disambiguate the term "layer" in this context]

The testing framework must be able to serve test cases over the Web

See below for requirements for the Web test server.

The testing framework must use a decentralized version control system for test files and test cases

W3C uses Mercurial.

[WAI comment: seems overly restrictive as a core requirement; some W3C WGs use other systems]

The testing framework must include a test runner

See below for requirements for the test runner.

The testing framework must provide a mechanism for test case review

See below for requirements for the test case review mechanism.

The testing framework must provide a user-friendly tool to ease test suite management

See below for requirements for the test suite management system.

[WAI comment: assuming that accessibility of W3C systems is a given anyway]

The testing framework must provide a reporting tool

See below for requirements for the reporting tool.

The testing framework must provide "coverage" information

In order to know which areas of a spec are well tested, and hence to have a sense of (an upper bound on) the completeness of a test suite and of the areas where new testing effort would be most profitable, it would be beneficial to produce an annotated version of the spec that associates each testable assertion with a link to one or more test cases for that assertion.

See below for requirements for spec annotation.

The testing framework must allow for direct contributions from external individuals or entities

The public at large should be able to submit test files and test cases, as well as test results.


Requirements for the Web test server

The Web test server must be able to run server-side scripts

The exact list of languages that the Web test server must support remains to be specified. PHP and Python should be available.

XMLHttpRequest, CORS, EventSource, HTML5, Widgets WARP, and WCAG will all need a setup like this.

Note: We no longer support PHP on w3c-test.org. There was a built-in review process for the PHP code in the Mercurial repository, but it is no longer relevant since the test suite has been converted to be self-hosting in Python.

The Web test server should pull content from the test case repository automatically

Test cases submitted to the test case repository should appear automatically on the Web server, except for test cases that make use of server-side scripting, which should first be approved for security reasons.

See also Dom's 17-Feb-2011 clarifications regarding the constraints of these hosts for PHP usage.

[WAI comment: also client-side test cases need pre-approval for several reasons; and the review status of test cases must be clearly indicated to the repository user]

The Web test server must run on a dedicated domain name

For security reasons, the server must use a dedicated domain name.

The W3C Web test server, launched in February 2011 (see PLH's announcement), uses w3c-test.org.

[WAI comment: it may be good to reassure that the test files and procedures themselves will not be bound to a particular domain name]

The Web test server should allow configuration settings to be tweaked on a per-test-case basis

For instance, the Web test server should give test authors full control over media types and charsets, e.g. through the use of .htaccess configuration files.

The Web test server may need to run additional libraries

Some test suites may require the use of specific libraries. For instance, to test the Web Sockets protocol and its client API, a Web Sockets library such as http://code.google.com/p/pywebsocket/ needs to be installed (we might need something else; ideas?).

The Web test server must be available through different domain names

Different domains, e.g. http://foo.example.org vs http://bar.example.org, but also http://example.org vs http://example.invalid (different as far as http://publicsuffix.org/ is concerned)

The W3C Web test server exposes the following domain names for testing purposes as of 2011-06-07:

The Web test server must be available through different ports

e.g. http://w3c-test.org:80 vs http://w3c-test.org:81

HTTP servers for w3c-test.org are available on ports 80, 81, 82, and 83.

The Web test server must be available through HTTPS

Different certificates may be needed, such as a certificate with Extended Validation and an invalid certificate.

With SSL support:

Requirements for the test runner

The test runner is responsible for running a series of tests and gathering results for all of them.

[WAI comment: Requirements that begin "The test runner must..." seem to be requirements that it be possible to create test runners for that requirement. However, not all test runners may need to meet all of these requirements. Therefore suggest language like "It must be possible for test runners to...". We made this change the first time we encountered it but haven't done it for all of them yet.]

The test runner must support multiple test methods (including self-describing, reftest, and script)

The following test methods are considered.

Self-describing

aka human or manual tests.

This is the most basic level. A file (or more) is displayed and a human indicates if the test is passed or failed. Ideally, we should avoid these types of tests as much as possible since they require a human to operate. Some folks want to have a comment field as well.

[CSS 2.1 test]

[WAI comment: s/A file (or more) is displayed and a human indicates if the test is passed or failed/A human is provided with one or more test files and a corresponding test procedure (which may be included as part of the test files), and is asked to indicate if the test passes or fails.]

Plain text output

This is equivalent to doing saveAsText on two files and comparing the output.

[WAI comment: a little unclear what is meant]

Reftest

Two pages are displayed and the rendered pages are compared for differences.

For comparison, we might be able to use HTML5 Canvas, or an extension to get screenshots. The worst-case scenario is to have a human compare the rendered pages.

[test] [reference]

See also

compare equivalent pages

(@@ through screen shots?) Not sure how this one differs from the one above...
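
As a sketch of the pixel comparison mentioned above, and assuming that screenshots of the test page and of the reference page have already been captured as canvas ImageData (e.g. via a browser extension, since pages cannot in general be rasterised from script), the comparison itself is straightforward:

  // Compare two ImageData objects of the same size; returns true when identical.
  // How the screenshots are obtained (extension, automation API, ...) is out of scope here.
  function sameRendering(imageA, imageB) {
    if (imageA.width !== imageB.width || imageA.height !== imageB.height) {
      return false;
    }
    var a = imageA.data, b = imageB.data;   // RGBA byte arrays
    for (var i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) {
        return false;                       // first differing pixel component
      }
    }
    return true;
  }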

Descriptive dump

Some engines could dump their in-memory view/layout model, i.e. the one directly affecting the rendering.

Script

The test result is established through scripting:

  • compare two DOM trees using JavaScript for differences,
  • test the result of a JavaScript function or attribute,
  • etc.

We're looking at using testharness.js for those. Note that this doesn't preclude occasional human intervention, such as authorizing geolocation information, pressing a key or a button, etc.

[HTML test] [HTML parser test]
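
As an example, a minimal script test written with testharness.js might look as follows (the page is assumed to include the usual testharness.js and testharnessreport.js scripts):

  // Minimal testharness.js script test: one synchronous sub-test with one assertion.
  test(function () {
    var div = document.createElement("div");
    div.setAttribute("title", "hello");
    assert_equals(div.getAttribute("title"), "hello",
                  "getAttribute should return the value set by setAttribute");
  }, "setAttribute/getAttribute round-trip on a div element");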

The test runner must be able to load tests automatically based on manifest files

Manifest files should contain the metadata necessary to load the tests (URI, type, etc.)
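
The manifest format itself is not defined here; as a purely hypothetical sketch, an entry might carry at least an identifier, a URI, and a test type:

  // Hypothetical manifest excerpt; the format is not defined by this document.
  var manifest = [
    { id: "canvas-clearRect-001",
      url: "/canvas/clear-rect-1.html",
      type: "script" },                                // fully automated testharness.js test
    { id: "css-float-017",
      url: "/css21/floats/float-017.html",
      ref: "/css21/floats/float-017-ref.html",
      type: "reftest" },                               // compared against a reference rendering
    { id: "video-controls-002",
      url: "/html5/video/controls-002.html",
      type: "manual" }                                 // self-describing, needs a human
  ];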

The test runner must be able to order test cases smartly

Purely automated tests should be grouped together to avoid situations where the user is prompted for input at unpredictable points during the run. This may be done when creating manifest files.

The test runner must allow for tests to be run in random order and repetitively

The goal is to detect failures that only occur under certain conditions.

The test runner must allow for complete and partial execution of tests

Selection of a subset can be based on the metadata describing the tests; for instance, selecting all tests that apply to a certain feature, element, or other aspect of the test.
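
Assuming manifest entries carry such metadata (the "feature" field below is hypothetical, reusing the manifest sketch given earlier), subset selection becomes a simple filter:

  // Select only the tests whose metadata mentions a given feature.
  function selectTests(manifest, feature) {
    return manifest.filter(function (entry) {
      return entry.feature === feature;
    });
  }

  // e.g. run only the geolocation tests
  var subset = selectTests(manifest, "geolocation");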

It must be possible to create test runners that work on various platforms

Test runners should be available that work on the main operating systems (e.g. Windows, MacOS, Ubuntu), most user agents, and various types of devices (e.g. desktop, mobile).

Some environments might require specific development work. For instance, on mobile devices, test suites might need to be split or packaged differently beyond a certain size to cope with the limitations of the platform.

This requirement might be met by providing different test runners for different environments.

The test runner must provide some way to output collected results

This might take the form of a raw text file, XML, JSON, or internal database storage.

The test runner must allow for automatic and manual gathering of context information

This context information includes the browser version and the OS platform, as well as relevant configuration settings and assistive technology, if applicable.

The test runner must include context information in collected results

Result records must be complete with information about the test case, the tester, the revision if applicable, the user agent, etc.
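
As an illustration (all field names hypothetical), a single result record might combine the automatically gathered context with the reported outcome:

  // Hypothetical result record; the user agent string and platform can be gathered
  // automatically in the browser, assistive technology details usually cannot.
  var resultRecord = {
    testCase: "canvas-clearRect-001",
    testSuiteRevision: "2011-06-07",
    outcome: "pass",                       // pass | fail | unknown
    tester: "anonymous",
    userAgent: navigator.userAgent,        // gathered automatically
    platform: navigator.platform,          // gathered automatically
    assistiveTechnology: null,             // entered manually when applicable
    comment: ""
  };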

The test runner must support positive and negative testing

  • It must be possible to define positive tests of specification requirements.
  • It must be possible to define negative tests that actively test failure to meet specification requirements or test error handling behaviour.

The test runner must support testing of time based information

The requirement is needed for SVG animation, HTML video for instance.

The test runner must allow a test to report its result automatically

Some hook must be available so that automated tests can report their results without human intervention.
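
One possible hook, sketched here under the assumption that tests are written with testharness.js and run in an iframe controlled by the runner, is to forward the harness completion callback to the runner page with postMessage (storeResults() is a hypothetical runner function, and the message format is invented):

  // In the test page: forward results to the runner once the harness finishes.
  add_completion_callback(function (tests, harnessStatus) {
    var results = tests.map(function (t) {
      return { name: t.name, status: t.status, message: t.message };
    });
    window.parent.postMessage({ type: "test-complete", results: results }, "*");
  });

  // In the runner page: collect the reported results.
  window.addEventListener("message", function (event) {
    if (event.data && event.data.type === "test-complete") {
      storeResults(event.data.results);    // hypothetical runner function
    }
  });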

The test runner must allow humans to report on manual test outcome

There should be some pass/fail/unknown submission procedure available for manual tests.

The test runner must allow reftests to be run by humans

Even if reftests can be automated, the test runner should provide a way for humans to report on a reftest, possibly switching between the test view and the reference view several times per second and asking whether the user sees flickering.

Automatic running of reftests requires browser-specific code and is explicitly out of scope.
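
A sketch of the flicker technique, assuming the test and the reference are loaded into two iframes stacked on top of each other (so that no reload is needed when switching):

  // Toggle the visibility of two stacked iframes several times per second;
  // the human reports a failure if any flickering is visible.
  function startFlicker(testFrame, refFrame, intervalMs) {
    var showTest = true;
    return setInterval(function () {
      showTest = !showTest;
      testFrame.style.visibility = showTest ? "visible" : "hidden";
      refFrame.style.visibility = showTest ? "hidden" : "visible";
    }, intervalMs || 250);
  }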

The test runner should allow for humans to comment on a test outcome

Allows a text comment field for human evaluator notes (e.g. test conditions, failure notes) on the individual test result that can be included in the reporting. E.g. they might write: "the authoring tool implements this SC with a button that automatically sends the content being edited to the XXX Checker accessibility checking service".

The test runner must allow tests to be built from smaller tests

This would allow one action to be repeated several times within the same test, for instance to detect failure under certain conditions.
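
For instance, with testharness.js the same small check can be generated in a loop so that it is exercised many times within one test file (a sketch; the check itself is arbitrary):

  // Each iteration becomes its own sub-test, which helps expose intermittent failures.
  for (var i = 0; i < 100; i++) {
    test(function () {
      var div = document.createElement("div");
      assert_equals(div.childNodes.length, 0,
                    "a freshly created div should have no children");
    }, "createElement returns an empty div, iteration " + i);
  }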

The test runner must be usable by external entities and individuals

Note though that some test suites may need specific conditions to run.

Requirements for the test case review mechanism

The test case review mechanism must enable review without putting a Working Group on the critical path for every single test

See the work of the WCAG 2.0 Test Samples Development Task Force (TSD TF) which included the development of a review process that allowed the Task Force to pre-review tests yet allow the Working Group to make the final decision.

[WAI comment: we may also want to pursue public review and rating systems (though there are several concerns, including critical mass to make the system useful, avoiding spam, and avoiding disruptive or bogus entries)]

The test case review mechanism must provide an easy way to submit a test

A Web author should be able to submit a test to the W3C. See also the Policies for Contribution of Test Cases to W3C.

The test case review mechanism must allow anyone to easily give feedback on tests

In particular, this should not be restricted to named reviewers or people with W3C accounts

The test case review mechanism should integrate with Mercurial

The distributed version control system should be used as much as possible.

Requirements for the test suite management system

The test suite management system must scale to a large number of tests

There may be more than 100,000 test cases per specification.

The test suite management system must track the state of test cases

Test cases may be:

  • under review
  • approved
  • rejected

The test suite management system should allow association of a test case with issues, action items or mailing-list threads

Integration with W3C tracker tool?

The test suite management system should allow stable, dated releases of test suites

Test suite revisions will be used in particular to link back collected results to the appropriate versions of a test suite and to create snapshots when needed (e.g. for an implementation report).


Requirements for the reporting tool

The reporting tool must be able to produce a machine-readable report

The actual format needs to be specified. It could be XML or non-XML. The Evaluation and Report Language (EARL), for instance, provides a machine-readable format for expressing test results in RDF with an XML serialization.

The output should be reusable by other applications. It should also be usable to answer questions such as:

  • Is feature X supported on Browser 4.3?
  • What does Browser 4.3 support?

The reporting tool should be able to produce an agglomerated report

Multiple test results may be available for a given test case. The reporting tool should be able to combine them and report the most likely test outcome.
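
A minimal sketch of such agglomeration, simply taking the most frequently reported outcome for one test case (the tie-breaking and weighting policy would need to be defined elsewhere):

  // Return the most frequently reported outcome ("pass", "fail", "unknown", ...)
  // among all results submitted for a single test case.
  function mostLikelyOutcome(results) {
    var counts = {};
    results.forEach(function (r) {
      counts[r.outcome] = (counts[r.outcome] || 0) + 1;
    });
    return Object.keys(counts).reduce(function (best, outcome) {
      return counts[outcome] > (counts[best] || 0) ? outcome : best;
    }, null);
  }

  // e.g. mostLikelyOutcome([{outcome: "pass"}, {outcome: "pass"}, {outcome: "fail"}]) === "pass"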

The reporting tool should support authoritative result

When multiple test results for a given test case exist, there must be a mechanism to compare results and determine an authoritative result. This must be limited to privileged users.


Requirements for the spec annotation tool

[WAI comment: it is important to further explain what the "spec annotation tool" is. Also, one should not assume that spec annotation is the only method for identifying testable statements from the spec.]

The spec annotation tool must map each test case onto a part of the spec

In turn, this creates a requirement on the metadata test cases must define. The definition of "part" is up to the spec under test. It may mean:

  • the section that contains the conformance statement
  • the paragraph that contains the conformance statement
  • the conformance statement itself

The spec annotation tool must react smoothly to spec modifications, deletions, insertions and rearrangements

A one-word update should not invalidate the mapping.
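
One way to achieve this, offered as an assumption rather than a chosen design, is to key the mapping on stable identifiers of conformance statements (e.g. fragment identifiers maintained in the spec source) rather than on positional information such as section numbers or character offsets:

  // Hypothetical mapping keyed on stable fragment identifiers in the spec, so that
  // editorial rewording or reordering does not invalidate the links.
  var specAnnotations = {
    "https://example.org/spec/#attr-title-processing": [
      "canvas-clearRect-001",              // test case IDs covering this statement
      "dom-setattribute-017"
    ],
    "https://example.org/spec/#event-loop-ordering": []   // not yet covered
  };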

Requirements for test cases and test files

Test cases must not depend on the test runner

A test may be able to generate its result automatically (such as a script test) or not (such as a self-describing test). If it is automatic, it is the responsibility of the test to report its result to the test runner above it using some hook. Otherwise, it is the responsibility of the test runner to gather the result from an alternative source (such as a human).

Test cases should be designed for multiple purposes

Test files and test cases should be designed as neutrally as possible so they can be repurposed. Multiple Working Groups may have reasons to re-use test files and should not be forced to create redundant versions. Even within a specification, a given test file may be used to test multiple things.

Test cases must have a unique ID

Test cases (and test files) must have a unique ID. A URI may be sufficient for test files. The ID should not be expected to contain metadata about the test in its lexical form, although as a convenience many IDs may have some structure.

Test cases must identify the relevant specification section(s) and/or conformance statement(s) under test

The targeted granularity may vary depending on the specification. For some specifications, it may be enough to link back to the section that contains the conformance statement. For other specifications, a more precise link to the actual conformance statement may be needed.

[WAI comment: this relates to the spec annotation and this relationship should be explicit and clearly explained]

Note that a test case may apply to more than one specification.

[WAI comment: it is mainly test files rather than test cases that may apply to more than one specification]

Test cases may apply to the same conformance statement as other test cases

There may be more than one test case per conformance statement.

Test files may depend on other test files

Test files consisting of a single file (singleton test files) are preferred for simplicity and portability, but it must be possible for test files to have dependencies on external resources such as images, scripts, etc.

Test files may depend on shared resources

It must be possible for resources, such as images, scripts, etc., to be shared by multiple test files. The test file repository structure must accommodate actual "test files" as well as resources that are not themselves considered test files.

Test files may generate test files

Some of the test files may be generators for a collection of test files and test cases created e.g. by varying a single parameter.

[WAI comment: this may interfere with the requirement for unique and constant identifiers for test cases]
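
A sketch of such a generator (Node.js; the template and the parameter values are invented for illustration) writing one test file per value of a single parameter:

  // Generate one self-contained test file per value of a single parameter.
  var fs = require("fs");

  var displayValues = ["block", "inline", "inline-block", "none"];

  displayValues.forEach(function (value) {
    var html =
      "<!DOCTYPE html>\n" +
      "<title>display: " + value + "</title>\n" +
      "<div style=\"display: " + value + "\">test content</div>\n";
    fs.writeFileSync("display-" + value + "-001.html", html);
  });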

Requirements and ideas not yet categorized

  • allow more than one way to test functionality
  • tests that require a top level browsing context
  • be suitable for HTTP 1.1, HTML5, CSS 2.1, CSS 3, ES5, Web APIs (HTML DOM, DOM L2, Selectors, Geolocation, XHR, etc.), MathML 1.0, SVG 1.1, the Web Sockets protocol, etc.
  • ideally, the browser vendors should help us get what we need to run the tests on their products.
  • How can the framework help ensure the completeness of a test suite with regards to a particular specification?
  • regroup a set of existing tests from different sources (DOM, CSS, SVG, HTML, etc.). Can we create a test runner to run them all? Is it possible to convert them?
  • regroup the set of metadata needed/provided in the existing testing framework/tests.


See also

Existing work to consider:

  that one is highly interesting. Looks like the guy is trying to do what we need for pixel comparison