Testing/Requirements

This page collects and categorizes information about goals and requirements for the common W3C test-suite framework for browser testing. [SAZ: afaik, the scope should not be limited to browser alone][JS: ATAG 2.0 and UAAG 2.0 will need to test authoring tools and media players in addition to browsers]

Alternate approach to test requirements suggested by Michael Cooper

Test-case serving (Web server)

The following are requirements for any mechanisms for serving test cases over the Web.

Basics

Server separate from w3.org that can run server-side scripts (e.g. PHP / Python), is backed by Mercurial or Git, and has a decent amount of freedom when it comes to configuring details via .htaccess, e.g. full control over media types and charsets.

XMLHttpRequest, CORS, EventSource, HTML5, Widgets WARP will all need a setup like this.
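As a rough illustration of why this level of control matters: a cross-origin XMLHttpRequest test can only pass if the server is able to emit the appropriate Access-Control-Allow-Origin header on the resource under test. A minimal sketch using testharness.js, where the second host name and the resource path are placeholders rather than an agreed layout:

    // Sketch: cross-origin XHR test; assumes the test server can be
    // configured to send "Access-Control-Allow-Origin: *" on this resource.
    async_test(function(t) {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "http://www2.example.test/resources/cors-allowed.txt", true);
      xhr.onload = t.step_func(function() {
        assert_equals(xhr.status, 200);
        t.done();
      });
      xhr.onerror = t.step_func(function() {
        assert_unreached("CORS request was blocked");
      });
      xhr.send();
    }, "Cross-origin GET succeeds when the server opts in via CORS");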

On 14-Feb-2011, PLH announced the following are "live" e.g. for testing:

See also Dom's 17-Feb-2011 clarifications re the constraints of these hosts e.g. PHP only.

WebSockets

To test the WebSocket protocol and its client API, we would need to install something like http://code.google.com/p/pywebsocket/ on top of the basic server and run it.

Might need something else: http://code.google.com/p/pywebsocket/issues/detail?id=65 Ideas?
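On the client side, the WebSocket API tests would then talk to such a server. A minimal sketch with testharness.js, assuming an echo handler (e.g. one installed with pywebsocket) is available at the placeholder URL shown:

    // Sketch: echo round-trip over WebSocket; the URL below is a
    // placeholder for wherever the echo handler ends up being hosted.
    async_test(function(t) {
      var ws = new WebSocket("ws://example.test:8880/echo");
      ws.onopen = t.step_func(function() { ws.send("hello"); });
      ws.onmessage = t.step_func(function(e) {
        assert_equals(e.data, "hello");
        ws.close();
        t.done();
      });
      ws.onerror = t.step_func(function() {
        assert_unreached("WebSocket connection failed");
      });
    }, "WebSocket echoes a text message");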

Security

To test browser security more completely when it comes to HTTP you need the following:

Test-case execution (in-browser/client)

The current test runner used by the HTML WG is optimised mainly for manual tests. An in-browser test runner that deals automatically with the results of JavaScript tests is essential. It should also deal with self-describing reftests and manual tests in a sane way.

The following requirements are not categorized yet:

  • allow tests to be created from smaller tests, so that one action can be repeated several times within the same test, e.g. to detect failures that only occur under certain conditions (see the sketch after this list)
  • allow testing of error handling
  • allow testing of time based information (SVG animation, HTML video)
  • allow more than one way to test functionality:
  • allow direct contributions to the test-runner and framework code from external individuals or entities
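For the first bullet above, one way to build tests out of smaller pieces is to wrap the repeated action in a helper function and instantiate it once per input, so that each repetition becomes its own sub-test. A sketch with testharness.js; the helper name and the inputs are made up for illustration:

    // Sketch: reuse one checking function for several inputs so that a
    // single action is exercised repeatedly as separate sub-tests.
    function check_parse(input, expectedTagName) {
      test(function() {
        var doc = document.implementation.createHTMLDocument("");
        doc.body.innerHTML = input;
        assert_equals(doc.body.firstChild.tagName, expectedTagName);
      }, "parsing " + input);
    }

    check_parse("<p>x</p>", "P");
    check_parse("<div>x</div>", "DIV");
    check_parse("<span>x</span>", "SPAN");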

And here are some possible considerations:

  • Tests that require a top level browsing context

Test methods

A test must NOT depend on the test runner used to run a set of tests. A test may be able to generate its result automatically (such as a script test) or not (such as a self-describing test). If it is automatic, it is the responsibility of the test to report its result to the test runner above it. Otherwise, it is the responsibility of the test runner to gather the result from an alternate source (such as a human).
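For script tests, one concrete way to meet this requirement is testharness.js's completion hook, which hands the final per-test results to whatever page embeds the test. The message shape below is only an illustration, not an agreed protocol:

    // Sketch (inside a test page): forward the harness results to a
    // parent test runner; the message format here is an assumption.
    add_completion_callback(function(tests, harness_status) {
      if (window.parent !== window) {
        window.parent.postMessage({
          type: "test-complete",
          results: tests.map(function(t) {
            return { name: t.name, status: t.status, message: t.message };
          })
        }, "*");
      }
    });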

Self describing

aka human or manual tests.

This is the most basic level. One or more files are displayed and a human indicates whether the test passed or failed. Ideally, we should avoid these types of tests as much as possible since they require a human operator. Some folks want to have a comment field as well.

[CSS 2.1 test]

Plain text output

This is equivalent to doing saveAsText on two files and comparing the output.

Reftest

Two pages are displayed and the rendered pages are compared for differences.

For comparison, we might be able to use HTML5 Canvas, or an extension to get screenshots. The worst-case scenario is to have a human compare the rendered pages.

[test] [reference]
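If screenshots of the two renderings can be obtained as same-origin images of equal size, the pixel comparison itself is straightforward with canvas; a sketch (how the screenshots are produced is deliberately left open, since that part is browser-specific):

    // Sketch: compare two screenshots pixel by pixel. Assumes both are
    // available as same-origin images with identical dimensions.
    function screenshots_match(imgA, imgB) {
      var canvas = document.createElement("canvas");
      canvas.width = imgA.width;
      canvas.height = imgA.height;
      var ctx = canvas.getContext("2d");
      ctx.drawImage(imgA, 0, 0);
      var a = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.drawImage(imgB, 0, 0);
      var b = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
      for (var i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) return false;
      }
      return true;
    }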

compare equivalent pages

(@@ through screen shots?) Not sure how this one differs from the one above...

Descriptive dump

Some engines could dump their in-memory view/layout model, i.e. the one directly affecting the rendering.

Script

The test result is established through scripting:

  • compare two DOM trees for differences using JavaScript,
  • test the result of a JavaScript function or attribute,
  • etc.

We're looking at using testharness.js for those. Note that this doesn't preclude occasional human intervention, such as authorizing access to geolocation information, pressing a key or a button, etc.

[HTML test] [HTML parser test]
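For reference, a minimal script test written against testharness.js looks like the following; the harness records the outcome of the assertions without any human involvement:

    // Sketch: a synchronous script test; testharness.js records the
    // result as soon as the function returns or an assertion throws.
    test(function() {
      var div = document.createElement("div");
      div.setAttribute("id", "x");
      assert_equals(div.id, "x", "id attribute should be reflected");
    }, "setAttribute('id') is reflected by the id IDL attribute");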

Test Runner

The test runner (see diagram) is responsible for running a series of tests and gathering the results for all of them.

  • Loads tests automatically based on test manifest files containing metadata such as the test URI, type, etc.
  • Allows all or a subset of tests to be run [SAZ: selection of subset is based on the metadata describing the test; for instance, to select all tests that apply to a certain feature, element, or other aspect of the test]
  • allow tests to be run in random order and repeatedly (to detect failures that only occur under certain conditions)
  • allow the test suite to be run on multiple platforms (mobile, Windows, Mac OS, Ubuntu)
  • Output the results in some way (XML, JSON, database?)
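A rough sketch of such a runner, covering script tests only: read a manifest, filter it down to the requested subset, load each test in an iframe, and collect the results the test pages post back. The manifest fields, the iframe id, and the message format are all assumptions, matching the reporting sketch in the Test methods section:

    // Sketch of a manifest-driven runner; manifest entries are assumed
    // to look like { "uri": "...", "type": "script", "feature": "..." },
    // and the page is assumed to contain <iframe id="testframe">.
    var collected = [];
    var queue = [];

    window.addEventListener("message", function(e) {
      if (e.data && e.data.type === "test-complete") {
        collected.push(e.data.results);
        runNext();
      }
    }, false);

    function runSubset(manifest, feature) {
      queue = manifest.filter(function(entry) {
        return !feature || entry.feature === feature;
      });
      runNext();
    }

    function runNext() {
      var entry = queue.shift();
      if (!entry) { return reportAll(collected); }
      document.getElementById("testframe").src = entry.uri;
    }

    function reportAll(results) {
      // e.g. POST the JSON to a results server, or just display it
      console.log(JSON.stringify(results));
    }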

List of known test runners:


Gathering results

  • Allow a test to report its result automatically, e.g. hooks into testharness.js to extract the results of JavaScript tests without human intervention
  • Allows manual tests to be run by humans, i.e. have pass/fail/unknown buttons.
  • Allows reftests to be run by humans, e.g., by automatically switching between the test view and the reference view several times per second and asking the user whether they see flickering (automatic running of reftests will require browser-specific code and is explicitly out of scope); see the sketch after this list [SAZ: does not have to be automatic switching -- could also have the user manually switch between the views to compare the outputs]
  • [SAZ: allow automatic and manual gathering of context information, such as the browser version, OS platform, and relevant configuration settings and assistive technology if applicable]
  • [JS: Allows a text comment field for human evaluator notes (e.g. test conditions, failure notes) on the individual test result that can be included in the reporting. E.g. they might write: "the authoring tool implements this SC with a button that automatically sends the content being edited to the XXX Checker accessibility checking service".]
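A sketch of the flicker technique mentioned above, alternating between the test view and the reference view a few times per second so a human can watch for any visible difference; it assumes two iframes with the placeholder ids shown, positioned exactly on top of each other:

    // Sketch: alternate test and reference views so a human can watch
    // for flicker; any visible change means the reftest fails.
    var testFrame = document.getElementById("test-view");
    var refFrame = document.getElementById("ref-view");
    var showTest = true;
    setInterval(function() {
      showTest = !showTest;
      testFrame.style.visibility = showTest ? "visible" : "hidden";
      refFrame.style.visibility = showTest ? "hidden" : "visible";
    }, 250);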

Test-case review

Most test review in working groups is currently done informally via a mailing list. This doesn't work so well, especially for large test suites. Maybe there is an existing tool that can help us here.

Requirements:

Test-case management/tracking

  • allow management of 100,000 or more tests per spec
  • track the state of a test through repeated review cycles (under review, then approved or rejected)
  • associate issues, action items, mailing list threads to tests (integration with W3C tracker?)
  • allow stable, dated releases of test suites. @@version control per test?
  • track the state of a test suite (use case: browser vendors want to track changes to test suite in order to stay in sync)

Test-results reporting/output

  • Produce a machine-readable report in some format (could be the current XML or some other, possibly non-XML, format). [SAZ: the Evaluation and Report Language (EARL) provides a machine-readable format for expressing test results (in RDF, but with an XML serialization)]
  • Output should be reusable by other applications (such as validators? [SAZ: yes, and accessibility checkers etc.]) or in answering questions such as "is feature X supported on Browser 4.3? What does Browser 4.3 support?"
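Whatever format is chosen in the end, the runner only needs one serialization step over the results it has collected. A sketch producing a simple JSON report; the field names are illustrative only, and an EARL report would instead express the same data in RDF:

    // Sketch: turn collected results into a machine-readable report.
    // The field names are placeholders, not a proposed format.
    function buildReport(results) {
      return JSON.stringify({
        date: new Date().toISOString(),
        userAgent: navigator.userAgent,
        results: results  // e.g. [{ name: ..., status: ..., message: ... }, ...]
      }, null, 2);
    }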

Test-case spec annotations

In order to know which areas of a spec are well-tested, and hence to have a sense of (an upper bound on) the completeness of the test suite as well as of the areas where it would be most profitable to direct new testing effort, it would be beneficial to produce an annotated version of the spec that associates each testable assertion in the spec with a link to one or more test cases for that assertion. Requirements:

  • Map each test onto a piece of spec
  • Fine-grained definition of "piece"; some sections are long and contain many normative requirements, so paragraph level is probably the minimum useful granularity
  • Good behaviour in the face of spec modifications, deletions, insertions and rearrangements.

[SAZ: the work of the WCAG 2.0 Test Samples Development Task Force (TSD TF) included a metadata format that is based on the more elaborate Test Case Description Language (TCDL)]
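One lightweight way to obtain the mapping is to carry spec links in the per-test metadata (e.g. in the manifest) and generate the annotated spec from them. The entry below is purely illustrative; the field names and URLs are not a proposal:

    // Sketch: a manifest entry tying a test to the spec fragments
    // (section anchors) it covers; all values are placeholders.
    var manifestEntry = {
      uri: "tests/semantics/the-a-element/href-001.html",
      type: "script",
      specLinks: [
        "http://www.w3.org/TR/html5/text-level-semantics.html#the-a-element"
      ]
    };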

Requirements not yet categorized

  • be intended for CR and post-CR phases. The test suite should be suitable for evaluating whether the spec is implementable, but it should also be used to promote interoperability
  • allow the test suite to be run by external entities or individuals (it may be that the test suite can only be run under specific conditions) (should it be available as a W3C widget to facilitate deployment on mobile devices?)
  • allow simple tests (e.g. testing the value of an attribute) or complex tests (e.g. acid or stress tests) to be part of the test suite
  • allow a test to cover multiple specifications and sections of specifications
  • be suitable for HTTP 1.1, HTML5, CSS 2.1, CSS 3, ES5, Web APIs (HTML DOM, DOM L2, Selectors, Geolocation, XHR, etc.), MathML 1.0, SVG 1.1, the WebSocket protocol, etc.
  • allow testing of different layers: network (HTTP, low bandwidth/latency, server throttling), syntax, DOM, layout model, rendering
  • Ideally, the browser vendors should help us get what we need to run the tests on their products.
  • allow for multiple test licenses
  • How can the framework help ensure the completeness of a test suite with regards to a particular specification?
  • regroup a set of existing tests from different sources (DOM, CSS, SVG, HTML, etc.). Can we create a test runner to run them all? Is it possible to convert them?
  • regroup the set of metadata needed/provided in the existing testing framework/tests.

See also

Existing work to consider:

  that one is highly interesting. Looks like the guy is trying to do what we need for pixel comparison