SVG and WebCGM Test Suites

Lofton Henderson

Prepared for: W3C QA Workshop
NIST (Gaithersburg), 3-4 April 2001

Latest Revision: April 1, 2001

Introduction

Over the past year, we have finished work on a preliminary SVG conformance test suite, and work is in progress on a test suite for WebCGM 1.0. Some of the issues and methodologies of these projects are unique to graphics, but much is common to conformance work for a wider range of specifications. We will present the principles and methodologies, summarize the test suite contents, identify issues and shortcomings, and extract lessons which might be useful for other conformance work.

In interpreting the material in this paper, it is useful to know:

The SVG test suite was constructed from scratch within the SVG Working Group over a period of 12 months, from a late and "stable" working draft through two CR draft public releases.
The WebCGM test suite is being constructed by adapting extensive pre-existing CGM materials in the NIST-ATA test suite for the ATA GRexchange 2.4 profile of CGM, and new materials are being designed and added as needed for the significant dynamic functionalities of WebCGM 1.0.

Characteristics of the Suites

Focus - Conformance of What?

In the domain of graphics standards, conformance work has focused on three areas:

conformance of graphics format file instances;
conformance of graphics format generators;
conformance of graphics format interpreters and viewers.

Both the SVG and WebCGM projects focused on the third area, viewer conformance.

Note. There is an existing, very complete WebCGM instance validator, MetaCheck. There is no complete SVG instance validator, other than various XML tools that validate against the DTD. There has been a lot of activity, but not much useful result, on the notoriously difficult topic of generator conformance.

Principal Purposes

For both the SVG and WebCGM projects, these principal purposes were agreed:

provide self-assessment tool for implementation builders;
help implementation builders achieve interoperability of implementations;
help users assess the fidelity and completeness of implementations to the respective (SVG or WebCGM) specification.

Although there have been other benefits, such as improvements to the standards (Recommendations) themselves, these were not principal goals at the start of work.

Non-Goals

It is not a goal of either project to build:

a certification suite or a certification service. In general, this requires much more rigor, formality, and "defensibility" in the test suite materials.
"goodness" tests, i.e., tests to measure such parameters as viewer performance, optional non-normative features, etc.
a demo suite, although one effect of a thorough conformance test suite is to demonstrate all of the functionality of the standard.

(Note. Demo suites tend to be more for the marketing of the standard or products, prioritizing the attractive and entertaining, and generally sacrificing some testing principles such as "atomicity". We make this point because of an audience suggestion at the March 2001 W3C Technical Plenary, "...should contain lots of realistic, typical, legal files.")

Methodology

Overview

There are two meanings of "methodology" in the context of building these graphics test suites:

the contents of the suite, and how it approaches the testing of viewers;
the process of building the suites.

(Note. Methodology could also refer to the methodology of applying the tests, which question becomes particularly interesting in the context of a certification service. But for the most part, it is outside of the scope of this paper.)

Kinds of Tests in the Suites

Extending prior work, in the SVG project we further developed and defined a notion of progressive testing. Progressive is used both in the sense of a logical order in which to expose a viewer to tests, and a sensible order in which to develop test materials.

Basic Effectivity (BE) - verify rudimentary capability across all functional areas;
Detailed (DT) - comprehensively probe all testable assertions.

The SVG suite was scoped initially as a BE-plus-DT (full BE, substantial DT) project, but the scope was narrowed to BE in the face of higher than expected labor to build the tests and infrastructure. BE-level is the defined scope of the WebCGM suite, although some pre-existing content that is being adapted for the WebCGM suite more closely resembles DT tests.

Error Tests (ER) - test viewer adherence to normative specifications in the standard for handling of erroneous content.

While ER tests are applicable and potentially in the scope of the SVG work, no ER tests are yet built. ER tests are inapplicable to WebCGM, as there are no normative specifications of error response - WebCGM defines conforming viewer behavior on conforming content.

Notwithstanding the disclaimer about demo test suites, in both projects we identified that a few demo file instances would be desirable. These could be considered to be a "BE test of combinations" (going beyond the testing principle of "atomicity", which is to identify one atomic functionality and test it in isolation):

Demo (DM) - "real world" file instances, from SVG or WebCGM generator products, ideally complex and not hand-crafted.

Test Suite Contents

Both the SVG and WebCGM suites contain:

a collection of Test Case (TC) instances;
for each TC instance, a Reference Image which illustrates the expected result (a correct rendering of the content);
an Operator Script, which describes how to run the test and what constitutes pass/fail ("Verdict Criteria"), and gives a verbal description of the graphical content;
an XML database describing the Test Cases;
one or more harnesses, for organizing the presentation of the materials and navigation through the suite.

The following table compares some details of the two suites.

Comparison of SVG & WebCGM Test Suite Contents
	SVG	WebCGM
Test Cases	127 BE test cases (complete BE suite).	~230 existing BE/DT static-graphics tests; ~25 additional (est) for dynamic BE tests.
Reference Images	PNG raster images, 450x450	GIF in existing static, 1000x1000 (will be converted to PNG); PNG for new tests.
Operator Scripts	Prose descriptions of test purpose, of expected visual result, and of what deviations are permissible for "pass".	Existing static: terse operator instructions and checkpoints for certification testing. New dynamic: more descriptive (like SVG).
Test Harnesses	4 different harnesses for different viewer types. HTML or SVG linked pages, one per test case, which present reference image, OS, rendered content, and links through test suite. Generated from XML database via XSLT.	Existing: Single HTML frameset with pull-down forms for test case navigation, button to view reference image, and OS presented in right frame. Modifications: add button to access test itself. Generated from XML base via JavaScript-DOM program.
XML database	One XML description file per test case, including the Operator Script, and identification of link neighbors.	Single XML file with TestCase elements, each of which contains test purpose, version information, operator script, etc.

You can see sample content of each test suite in the respective references.
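
As a rough illustration of the per-test-case XML description and harness generation summarized in the table, the following Python sketch reads one description and emits a bare-bones HTML harness page. The element and attribute names (`testcase`, `purpose`, `operatorScript`, `neighbor`), the test case identifiers, the file-naming scheme, and the function name are all assumptions made for this example. The actual SVG harnesses were generated via XSLT, and the real description format is documented in the SVG reference.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical test-case description; element names are assumptions for this sketch.
SAMPLE = """\
<testcase id="path-lines-BE-01" kind="BE">
  <purpose>Verify basic straight-line path commands.</purpose>
  <operatorScript>Pass if the rendering matches the reference image.</operatorScript>
  <neighbor rel="previous" ref="color-prop-BE-01"/>
  <neighbor rel="next" ref="shapes-rect-BE-01"/>
</testcase>
"""


def build_harness_page(xml_text: str) -> str:
    """Build one HTML harness page from one test-case description."""
    tc = ET.fromstring(xml_text)
    tc_id = escape(tc.get("id", ""))
    purpose = escape(tc.findtext("purpose", default=""))
    script = escape(tc.findtext("operatorScript", default=""))
    # Navigation links to the neighboring test cases named in the description.
    links = " | ".join(
        f'<a href="{escape(n.get("ref", ""))}.html">{escape(n.get("rel", ""))}</a>'
        for n in tc.findall("neighbor")
    )
    return "\n".join([
        "<html><body>",
        f"<h1>{tc_id}</h1>",
        f"<p>Purpose: {purpose}</p>",
        # Reference image next to the content rendered by the viewer under test.
        f'<img src="{tc_id}.png" alt="Reference image">',
        f'<object data="{tc_id}.svg" type="image/svg+xml"></object>',
        f"<p>Operator Script: {script}</p>",
        f"<p>{links}</p>",
        "</body></html>",
    ])


if __name__ == "__main__":
    print(build_harness_page(SAMPLE))
```

Keeping the Operator Script and the link neighbors in the same description file means a harness page can be regenerated from a single source whenever a test case changes.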

How they Were Built

Idealized Process

The following idealized process for test suite construction has been widely applied in conformance suite work, graphics and otherwise.

analyze the standard (Recommendation) and extract all testable assertions - Test Requirements (TR).
synthesize and associate with the TRs a set of Test Purposes (TP).
write and implement a set of Test Cases (TC) which realize the Test Purposes.

Content Guidelines

The reference documents for the two projects discuss in some detail the guidelines and principles we followed in generating the actual test cases - atomicity, consolidation for conciseness, self-documenting, etc. We emphasize one of the principles in particular, because it is one of the most important and at the same time proved to be one of the most problematic:

Traceability. A test must be traceable back to a statement or statements in the standard's text (Recommendation text).
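
One way to picture the TR, TP, TC chain and the traceability principle is as a small set of linked records that can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the class names, field names, identifiers, and section labels are hypothetical and do not describe the data model of either suite.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """Test Requirement (TR): a testable assertion extracted from the standard."""
    tr_id: str
    spec_section: str  # where in the Recommendation text the assertion is stated


@dataclass
class Purpose:
    """Test Purpose (TP): associated with one or more TRs."""
    tp_id: str
    tr_ids: list[str]


@dataclass
class Case:
    """Test Case (TC): realizes one or more TPs."""
    tc_id: str
    tp_ids: list[str] = field(default_factory=list)


def trace(case, purposes, requirements):
    """Return the spec sections that a test case traces back to."""
    sections = []
    for tp_id in case.tp_ids:
        purpose = purposes.get(tp_id)
        if purpose is None:
            continue
        for tr_id in purpose.tr_ids:
            req = requirements.get(tr_id)
            if req is not None and req.spec_section not in sections:
                sections.append(req.spec_section)
    return sections


def untraceable_cases(cases, purposes, requirements):
    """Test cases that cannot be traced to any statement in the standard."""
    return [c.tc_id for c in cases if not trace(c, purposes, requirements)]


def uncovered_requirements(cases, purposes, requirements):
    """Test Requirements that no test case reaches through a Test Purpose."""
    covered = set()
    for case in cases:
        for tp_id in case.tp_ids:
            purpose = purposes.get(tp_id)
            if purpose is not None:
                covered.update(purpose.tr_ids)
    return sorted(set(requirements) - covered)
```

With records like these, both directions of the check are cheap: a test case with no path to the Recommendation text and a Test Requirement with no test case are each reported by a single function call.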

Actual Processes & Results

Here is a summary comparison of the two suites:

SVG & WebCGM Process Details
	SVG	WebCGM
TR extraction	For BE tests: implicit and informal - read chapter and identify major functional components which should be touched by a BE test.	For existing BE/DT static graphics tests: nothing done originally; and, won't retrofit because of cost. New dynamic tests: TR extraction done (into a HTML table and XML database) - see WebCGM reference for the full TR set.
TP synthesis	For BE tests: implicit and informal (note that the SVG reference does contain bibliographic reference to a sample formal TR/TP process for DT-level tests for 'path' operator.)	Existing BE/DT static-graphics tests: nothing done. New dynamic tests: TP synthesis done (into a HTML table and XML database) - see WebCGM reference for the full TR set.
TC instance	Hand edit in standard text editor.	Existing static graphics tests: apply global changes to adapt to WebCGM with hand edit ClearText and convert, or by MetaWiz script. New dynamic: hand edit new ClearText and convert, or construct in MetaWiz.
Reference image	Direct SVG rasterization to PNG from implementations; or, screen capture and SaveAs PNG; or, ... See SVG reference for details.	Existing static graphics tests: convert existing GIF files to PNG. New dynamic: as SVG (note also, reference image will sometimes be HTML). See WebCGM reference for details.
Operator Script	One XML description file per test case, including the Operator Script, and identification of link neighbors.	Single XML file with TestCase elements, each of which contains test purpose, version information, operator script, etc.
Repository	CVS on a centralized server, with R/W access to authorized test suite contributors.	Simple disk cache/repository.
Public access	ZIP archive release of CVS repository to public at reasonable intervals. Also browsable/executable online.	TBD (but likely similar to SVG).
Serialization	Automated via CVS $Revision$ keyword.	Manual, or via an automated serialization feature built into MetaWiz.

Notes on table:

In some of the "dynamic" WebCGM tests (esp. CGM-to-HTML navigation tests), the reference image (expected result) will not be a picture, but rather a browser snapshot of some presented HTML.
"MetaWiz" refers to a Windows tool that provides a drag-and-drop interface for CGM test case generation, featuring a "meta-CGM" language with looping, includes, and other useful control structures.
See serialization description in "Lessons" section.
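
The serialization row of the table can be made concrete with a small check. The sketch below assumes that each test case file carries an expanded CVS keyword of the form `$Revision: 1.7 $`, and that the revision from which a reference image was produced has been recorded somewhere alongside the image. That recorded value, and both function names, are assumptions for this example and not a description of how either suite stores the information.

```python
import re

# Matches an expanded CVS keyword such as "$Revision: 1.7 $".
REVISION_RE = re.compile(r"\$Revision:\s*([0-9]+(?:\.[0-9]+)+)\s*\$")


def extract_revision(source_text):
    """Return the revision string embedded in a test case file, or None."""
    match = REVISION_RE.search(source_text)
    return match.group(1) if match else None


def reference_is_current(source_text, recorded_revision):
    """True only if the reference image was made from this test case revision.

    recorded_revision is the (hypothetical) revision noted when the
    reference image was generated.
    """
    revision = extract_revision(source_text)
    return revision is not None and revision == recorded_revision


if __name__ == "__main__":
    test_case = "<!-- $Revision: 1.7 $ -->\n<svg></svg>"
    print(reference_is_current(test_case, "1.7"))
    print(reference_is_current(test_case, "1.6"))
```

An unexpanded keyword (`$Revision$`) yields no revision, so the check fails in that case rather than silently passing.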

Shortcomings

There are some things we would do differently or better, and things we should have done but didn't.

traceability - this should be a key requirement in any test suite. It wasn't done in SVG (but the negative impact may be mitigated by the BE nature). It wasn't done in existing static-graphics BE/DT tests for WebCGM. It certainly should be done for any DT tests in either project. It will be done in the new dynamic tests for WebCGM.
too much manual effort (more automation needed), especially in the TR extraction and traceability implementation.
imprecise visual methods - in graphics test suites, the pass/fail criterion is largely visual, and is easily subject to operator error.

Lessons Learned & Issues Identified

The process of building test suites confers tremendous benefit on the standard (Recommendation) itself. SVG was done during the standardization and led to many changes; WebCGM was done after the fact, and is leading to numerous defect corrections.
Get started early in the standardization cycle - ideally at the first "stable" Working Draft.
Companion to the previous point. Beware the pitfalls of working against early, unstable specifications.
Not only does early conformance work detect ambiguities and defects in the standards, it also forces consideration of "fuzzy" conformance statements, optional features, "recommended" behaviors, and the like, all of which are inimical to the goal of building a cadre of strongly interoperable applications and implementations of the standards.
For the graphics test suites, a full DT suite is probably something like 10 times the number of test cases as the BE suite.
Opinion. The greatest value for effort invested arguably comes from the BE suite. There is probably something like a 90-10 rule here: 90% of the benefit (to implementations, the standard, etc) from 10% of the test materials (a comprehensive BE-level suite).
The preceding opinion notwithstanding, a full DT suite is essential for guiding and enforcing completely interoperable implementations (aside: in such mission-critical application areas as aircraft maintenance manuals, "98% interoperable" is not good enough).
The processes we described are labor-intensive. More automation is needed.
Especially labor-intensive are the TR/TP phases and the construction of trace-back. The standards documents typically don't facilitate this. (Note. The OASIS XSLT/XPath Conformance TC has considered this issue and has designed some interesting labor-saving and error-reducing methodologies.)
Labor requirements can be reduced by leveraging existing test suites and QA materials (e.g., from members), but this introduces a new set of problems: retroactive quality screening of large numbers of tests for correctness, retrofitting traceability features, etc. (Again, the XSLT/XPath Conformance TC has experience here.)
The TR extraction often involves interpretation, paraphrasing, or synthesis of the text of the specification. I.e., the TR is sometimes not stated obviously. We are not sure whether or not this is always avoidable.
Social comment. The WG definitely should play a role in the building of test suites, but the WG members who are willing to invest much effort in test suite construction make up a significant minority - the prevailing view (I believe) is that the WG is an arena for technical invention.
Resources. Be prepared to spend at least one person-year of labor, even for a BE suite. For DT, be ready for 1-1/2 to 4 years, or more. See SVG reference document for a cursory survey of several efforts, graphical and non-graphical.
Serialization. It is imperative to provide a versioning interlock between test case instance and "expected result" (reference image for graphical suites), so that it is always clear whether or not a reference image came from a particular test case file version. Automation is important here - manual serialization is tedious and easily overlooked.
Issue. An interoperability conformance suite might (should?) test features in the standard which are optional or recommended, whereas a strict certification suite would not.
Visual comparison of rendered content with the Reference Image (expected result) might be okay at BE level. However, it is both imprecise and labor-intensive. In graphics, this is a difficult technical issue, and some attractive ideas such as "XOR the images" have significant problems. (Reliably deterministic automated methods to declare pass/fail may be unattainable; however, we think that some automated techniques that provide indicative aids to manual inspection might be attainable.)
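
As one possible shape for such an indicative aid, the following Python sketch compares a rendering with a reference image pixel by pixel and reports how much differs and where, leaving the verdict to the operator. It is not a method used by either suite. The in-memory image representation (rows of `(r, g, b)` tuples), the function name, and the per-channel tolerance of 16 are assumptions chosen only for illustration.

```python
def diff_report(rendered, reference, tolerance=16):
    """Summarize pixel differences between two same-sized RGB images.

    Each image is a list of rows, and each row is a list of (r, g, b) tuples.
    A pixel counts as differing when any channel differs by more than
    `tolerance`. The result is a hint for manual inspection, not a verdict.
    """
    if len(rendered) != len(reference) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(rendered, reference)
    ):
        raise ValueError("images must have the same dimensions")

    total = 0
    differing = 0
    min_x = min_y = max_x = max_y = None
    for y, (row_a, row_b) in enumerate(zip(rendered, reference)):
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            total += 1
            if max(abs(a - b) for a, b in zip(pixel_a, pixel_b)) > tolerance:
                differing += 1
                # Grow the bounding box that encloses all differing pixels.
                min_x = x if min_x is None else min(min_x, x)
                max_x = x if max_x is None else max(max_x, x)
                min_y = y if min_y is None else min(min_y, y)
                max_y = y if max_y is None else max(max_y, y)

    return {
        "fraction": differing / total if total else 0.0,
        "bbox": (min_x, min_y, max_x, max_y) if differing else None,
    }
```

The bounding box tells the operator where to look first. A tolerance softens, but does not remove, the sensitivity to small rendering differences that makes a strict XOR of the images problematic.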

References

You can find much more detail about the SVG and WebCGM projects, including extensive bibliographies, in:

"SVG Conformance Test Suite - Test Builder's Manual", at http://www.w3.org/Graphics/SVG/Test/svgTest-manual.htm.
"WebCGM Conformance Test Suite - Methodology & Contents Proposal", at http://www.cgmopen.org/technical/testing/task2-report-TOC.html

The SVG test suite itself is available online from the Web page:

http://www.w3.org/Graphics/SVG/Test
From this page you can link to one of 4 harnesses, for example: if you don't have an installed SVG plugin, the bare-bones harness can be viewed; or, if you have a plugin, you can see all of the test suite components in the frame-based harness.