W3C

GRDDL Test Cases

W3C Proposed Recommendation 16 July 2007

This version:
http://www.w3.org/TR/2007/PR-grddl-tests-20070716/
Latest version:
http://www.w3.org/TR/grddl-tests/
Previous versions:
http://www.w3.org/TR/2007/WD-grddl-tests-20070502/
Editor:
Chimezie Ogbuji, Cleveland Clinic Foundation, <ogbujic@ccf.org>
Authors:
see Acknowledgements

Abstract

This document describes and includes test cases for software agents that extract RDF from XML source documents by following the set of mechanisms outlined in the Gleaning Resource Descriptions from Dialects of Languages [GRDDL] specification. The test cases demonstrate the expected behavior of a GRDDL-aware agent by specifying one (or more) RDF graph serializations which are the GRDDL results associated with a single source document.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document reconciles tests from other documents in the repository (see Acknowledgements).

This 16 July 2007 release of the GRDDL Test Cases is a W3C Proposed Recommendation by the W3C GRDDL Working Group (part of the Semantic Web Activity). It has been widely reviewed and contributes to the requirements documented in the GRDDL Charter. The tests within have been well implemented by a variety of software.

A pair of tests within contributes to addressing the Web Architecture issue xmlFunctions-34 and the notion of an elaborated infoset.

On 6 June 2007, the Working Group resolved to postpone issue-faithful-infoset in anticipation of ongoing dialog about the issue and the XML Processing Model Working Group's work to answer questions about transformation signaling and a default processing model.

This document enters a Proposed Recommendation review period. W3C Advisory Committee Members are invited to send formal review comments until 24 August 2007.

Aside from formal W3C membership reviews from Advisory Committee Representatives, please send comments about this document to public-grddl-comments@w3.org (with public archive). A log of changes is maintained for the convenience of editors and reviewers.

The Working Group's implementation report demonstrates that the goals for interoperable implementations stated in the May 2007 Candidate Recommendation draft of the GRDDL specification were achieved.

Publication as a Proposed Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Introduction

A set of test cases is provided as part of the definition of [GRDDL]. This document presents those test cases. They are intended to provide examples for, and clarification of, the normative behavior of a GRDDL-aware agent. They should be used for testing the conformance of GRDDL-aware agents. The normative tests cover behavior expected of a GRDDL-aware agent. The informative tests demonstrate other permitted behavior with respect to the issues resolved by the Working Group. This document itself has (as a GRDDL result) a manifest describing the test cases in RDF. For convenience, serializations of the GRDDL result are available as RDF/XML and Turtle.

This document and its files serve as a framework to test implementations of the GRDDL specification. It contains assertions demonstrating which of the GRDDL rules are relevant for each test contained within. In this way, it formally exercises (via EARL) the mechanisms exhibited by GRDDL-aware Agents within a Semantic Web of XML Documents.

Deliverables

The deliverables included as part of the test case collection are:

Note: the zip archive does not include tests that require network connectivity to calculate their GRDDL results. In addition, for convenience, two manifest files are included that contain only the normative tests: grddl-tests-normative.n3 (expressed in Turtle) and grddl-tests-normative.rdf (expressed in RDF/XML).

Test Manifest Format

This test collection uses an RDF vocabulary for manifests developed for the RDF Test Cases Recommendation. A GRDDL-aware agent can extract the test collection and automatically test compliance by attempting to reproduce the expected GRDDL result(s) associated with each test case. Some input documents have multiple output documents; see Tests with Multiple GRDDL Results.

Documenting Test Coverage

Each test starts with: a title (in bold), links to the input and output documents, and a list of terms which indicate which GRDDL rules are exercised by the test. Each term links to a section in the Test Coverage appendix which has additional information about the rule definition and the other tests which exercise that particular rule.

The test manifest includes statements which identify the rules exercised by each test. For every test, there will be an assertion between the test and a URI associated with the rule, using the following property URI:

http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#exercisesRule

The rule URIs correspond to anchors within the GRDDL specification document and are of the form:

http://www.w3.org/TR/grddl/#rule_RULE_IDENTIFIER
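As an illustration only, the following Python sketch shows how a harness might build rule URIs and coverage statements from the short terms used in this document. The mapping of terms to rule identifiers is copied from the Test Coverage appendix; the function names and the representation of statements as plain string tuples are assumptions made for this sketch and are not part of the test driver.

```python
# Property linking a test to a rule it exercises (quoted from this document).
EXERCISES_RULE = (
    "http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#exercisesRule"
)

# Common prefix of the rule anchors in the GRDDL specification.
RULE_BASE = "http://www.w3.org/TR/grddl/#rule_"

# Coverage terms and rule identifiers as listed in the Test Coverage appendix.
RULE_IDS = {
    "xml": "GRDDL_transformation",
    "merge": "merge",
    "ns": "nstx",
    "rdfx-base": "rdfxbase",
    "grddl-profile": "tlrel",
    "other-profile": "profiletrans",
}


def rule_uri(term):
    """Return the rule URI for a coverage term such as 'ns'."""
    return RULE_BASE + RULE_IDS[term]


def coverage_triples(test_uri, terms):
    """Build (test, exercisesRule, rule) statements as string tuples.

    This tuple form is a hypothetical stand-in for real RDF triples.
    """
    return [(test_uri, EXERCISES_RULE, rule_uri(term)) for term in terms]
```

A real harness would read these statements from the manifest with an RDF library rather than construct them by hand.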

Using the Test Driver

We provide testft.py, a test driver written in Python and based on rdflib 2.3.3. Run it as follows:

$ python testft.py --run your_grddl_impl testlist1.rdf >earl_out.rdf
All tests were passed!

It has options for --debug and such; invoke it with no arguments (or with --help) for details:

Options:
  -r, --run              path to a GRDDL implementation to use to process the
                         source document (checking results)
  -u, --update           path to a GRDDL Implementation to use to process the
                         source document
      --tester           The URI of an agent associated with the EARL test assertions.
                         A BNode is used if none is given
      --project          The URI of the EARL 'subject' (the implementation being tested).
                         A BNode is used if none is given
      --local            A boolean flag (false by default) which indicates whether to run only the local tests

The tests do not require the use of this driver.
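For those who want to script the driver, the following Python sketch assembles a command line from the options listed above. It is a convenience wrapper under stated assumptions, not part of testft.py: the function names are hypothetical, and it assumes that --tester and --project take their URI as the following argument and that the manifest comes last, as in the example invocation.

```python
import subprocess


def build_driver_command(impl_path, manifest, tester=None, project=None,
                         local=False):
    """Assemble an argument list for testft.py from the documented options."""
    cmd = ["python", "testft.py", "--run", impl_path]
    if tester:
        # Assumption: the URI is passed as a separate argument.
        cmd += ["--tester", tester]
    if project:
        cmd += ["--project", project]
    if local:
        cmd.append("--local")
    cmd.append(manifest)
    return cmd


def run_driver(cmd, report_path):
    """Run the driver and save what it writes to STDOUT in report_path."""
    with open(report_path, "wb") as out:
        return subprocess.run(cmd, stdout=out).returncode
```

Calling build_driver_command("your_grddl_impl", "testlist1.rdf") yields the same arguments as the example invocation shown earlier; run_driver then plays the role of the shell redirection.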

EARL Reporting

In addition to writing various diagnostic messages to STDERR, the test harness writes additional RDF data to STDOUT: an [EARL] test assertion about each test it runs.

To tell it about the person running the tests and the software project being tested, point it to a tester (a URI in a [FOAF] RDF graph) and a test subject (a URI in a [DOAP] RDF graph).

Protocol Tracing

We find TCPWatch useful for debugging HTTP [RFC2616] protocol interactions. If you start TCPWatch like so:

$ python tcpwatch.py -p 6543 &

then you can use it as a proxy:

$ http_proxy=http://127.0.0.1:6543 python testft.py --run your_grddl_impl testharness.rdf

GRDDL Transform Library

A library of standard transforms is available for widespread use by authors.

Local Policies, Faithful Rendition, and Conformance

The GRDDL specification states that any transformation identified by an author of a GRDDL source document will provide a Faithful Rendition of the information expressed in the source document. The specification also grants a GRDDL-aware agent the license to determine whether or not to apply a particular transformation, guided by user interaction, a local security policy, or the agent's capabilities. However, in defining these tests it was assumed that the GRDDL-aware agent being tested uses a security policy that does not prevent it from applying the transformations identified in each test. Such an agent should produce the GRDDL result associated with each normative test, except as specified immediately below.

Tests with Multiple GRDDL Results

[Figure: multiple GRDDL results]

Certain tests have multiple GRDDL results as a direct consequence of Faithful Infoset considerations, information resources with multiple representations, and separate GRDDL mechanisms which produce distinct GRDDL results.

Tests of this kind can be considered as groups of N, where N is the number of valid GRDDL results for the common input document.

Testing Faithful Infosets

In section 6, GRDDL Transformations, of [GRDDL], the question is raised of how a Faithful Rendition of an XML document's infoset (a Faithful Infoset) can be assured. GRDDL is silent about whether or not any XInclude [XINCLUDE] processing occurs before an XPath data model is created for use with GRDDL and any nominated transformations. GRDDL suggests the use of XProc [XPROC] where more complex or sophisticated transformations are required. XProc's XInclude component (see: 1.6 XInclude) can be used in an XML pipeline to explicitly specify the application of XInclude semantics against an XML infoset [XML INFOSET].

In the absence of an explicit XProc test, "Testing GRDDL when XInclude processing is enabled" and "Testing GRDDL when XInclude processing is disabled" are examples of tests which share the same source document, but have different XPath data models depending on whether any XInclude processing occurs. For such tests, a GRDDL-aware agent should output at least one of the GRDDL results associated with the single source document.

The test manifest includes a symmetric property [OWL] (http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#alternative) asserted between them. A GRDDL-aware agent running the tests can take this into consideration.
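One way a harness might take the alternative property into consideration is sketched below in Python. The sketch assumes that manifest statements are available as (subject, predicate, object) string tuples and that tests linked by the property can be treated as one group in which a single passing member is sufficient. Both the grouping strategy and the function names are assumptions made for illustration; this document does not prescribe them.

```python
ALTERNATIVE = (
    "http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#alternative"
)


def alternative_groups(manifest_triples):
    """Map each test URI to the set of tests it is an alternative of.

    The property is symmetric, so a statement in either direction places
    both tests in the same group.
    """
    groups = {}
    for subj, pred, obj in manifest_triples:
        if pred != ALTERNATIVE:
            continue
        merged = groups.get(subj, {subj}) | groups.get(obj, {obj})
        for test in merged:
            groups[test] = merged
    return groups


def group_passes(group, outcomes):
    """Return True if at least one test in the group passed.

    `outcomes` maps test URIs to booleans; missing tests count as failed.
    """
    return any(outcomes.get(test, False) for test in group)
```

With this approach an agent that never performs XInclude processing is not reported as failing merely because it cannot produce the result of the XInclude-enabled variant.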

Testing for Multiple Representations

Information resources can also have multiple representations in response to content negotiation. In addition to the GRDDL results associated with each representation, a test for the maximal result is included: the GRDDL result which consists of the merge of all possible GRDDL results.

Note, however, that the maximal result is not isomorphic with the other results. To aid a test harness in determining compliance for scenarios such as these, the tests have a property (http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#subsumes) asserted from the test for the maximal result to the other tests in the group. A GRDDL-aware agent running the tests can take this into consideration.
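The relationship between a maximal result and the results it subsumes can be sketched as follows in Python. This is a simplification under a stated assumption: graphs are modeled as sets of ground triples (no blank nodes), so that a merge is a plain set union and comparison is set equality. Graphs containing blank nodes require a proper RDF merge and an isomorphism check from an RDF library. The function names are hypothetical.

```python
def merge(*graphs):
    """Merge ground graphs by taking the union of their triples.

    Assumption: no blank nodes, so no renaming is needed before the union.
    """
    result = set()
    for graph in graphs:
        result |= set(graph)
    return frozenset(result)


def is_maximal(candidate, others):
    """Return True if `candidate` equals the merge of all graphs in `others`."""
    return frozenset(candidate) == merge(*others)


def subsumes(maximal, other):
    """Return True if every triple of `other` also appears in `maximal`."""
    return set(other) <= set(maximal)
```

A harness could use a check like subsumes to recognize that an agent which produced the result for a single representation still produced part of the maximal result, rather than an unrelated graph.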

Testing for Maximal Result

The remaining set of tests with multiple results are those where there is no ambiguity with the XPath data model associated with the source document, there is a single representation, and multiple GRDDL mechanisms apply. In the absence of a policy which prevents each GRDDL result from being computed, a GRDDL-aware agent should produce the maximal result.

Test Naming Convention

Every test has a URI of the form:

http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#LOCALNAME

The test collection can either be run locally (see "Localized Tests") or over a network. Certain tests are marked as requiring a network connection with an open circle as their list item marker. These tests are asserted as members of the http://www.w3.org/2001/sw/grddl-wg/td/grddl-test-vocabulary#NetworkedTest class in the test manifest. A GRDDL-aware agent running the tests can take this into consideration.

The tests which require a network connection use absolute URIs (in the test manifest) to refer to their test material (input and output).

Tests which do not require a network connection use relative URIs (in the test manifest) instead.
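The naming convention can be illustrated with a short Python sketch. The base URI is the one given above; the helper names are hypothetical, and the check on input and output references is only a heuristic based on the absolute versus relative URI convention. The NetworkedTest class membership in the manifest remains the definitive indication.

```python
from urllib.parse import urlparse

# Base of every test URI (quoted from this document).
TEST_BASE = "http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#"


def make_test_uri(local_name):
    """Return the full URI of a test given its local name."""
    return TEST_BASE + local_name


def is_absolute(ref):
    """Return True if a manifest reference carries a URI scheme."""
    return bool(urlparse(ref).scheme)


def looks_networked(input_ref, output_ref):
    """Heuristic: networked tests refer to their material by absolute URI."""
    return is_absolute(input_ref) or is_absolute(output_ref)
```

For example, a relative reference such as a bare file name is treated as local test material, while a reference beginning with http:// is treated as requiring a network connection.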

Normative Tests

Each test has an input document and an output document. The output document is an RDF/XML document and represents a GRDDL result of the input document.

Localized Tests

For the sake of convenience, this first set of normative tests covers simple scenarios where neither namespace documents nor absolute URIs are used. Such tests can be run offline rather easily.

Namespace Documents and Absolute Locations

These tests include the use of namespace documents and absolute URIs and are more difficult to run offline.

Library Tests

The following tests are tests primarily of the library code.

Ambiguous Infosets, Representations, and Traversals

These tests help check for robustness of implementations in the face of various odd cases.

Primer Material

This section includes material from the [PRIMER].

Informative Tests

This section includes tests that are not covered explicitly by the normative text of the GRDDL specification but that demonstrate additional behavior a GRDDL-aware agent may exhibit. They reflect behavior suggested by the Working Group as a result of resolving certain issues.

Security Tests

The following security tests are provided for implementers to adapt and use for their implementation. Security issues are usually system specific, and it may be possible for a malicious party to access XSLT version and vendor information concerning a specific GRDDL agent instance.

We do not provide instructions on how to test your system against these tests, since they are unlikely to be directly applicable. Developers of GRDDL-aware agents are encouraged to understand these tests and to consider how their own systems may have potential security weaknesses.

A Test Coverage

This section groups the tests according to the GRDDL rules they exercise as described in the specification. Each group leads with a link into the specification where the formal semantics of the corresponding rule are defined.

A.1 Nominating GRDDL Transformations in well-formed XML - xml

See rule (#rule_GRDDL_transformation).

A.2 Merging GRDDL Results - merge

See rule (#rule_merge).

A.3 Nominating Namespace Transformations - ns

See rule (#rule_nstx).

A.4 RDF/XML Base Rule - rdfx-base

See rule (#rule_rdfxbase).

A.5 Nominating Transformations via GRDDL Metadata Profile - grddl-profile

See rule (#rule_tlrel).

A.6 Identifying Metadata Profile Transformations - other-profile

See rule (#rule_profiletrans).

B References

B.1 Normative

[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL), Dan Connolly, ed., 2007/07/16
[RDF Concepts]
RDF Concepts and Abstract Syntax, Graham Klyne and Jeremy J. Carroll, Editors, W3C Recommendation 10 February 2004. Latest version available at http://www.w3.org/TR/rdf-concepts/ .
[RDF Syntax]
RDF/XML Syntax Specification (Revised). Dave Beckett, Editor, W3C Recommendation 10 February 2004. Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .

B.2 Informative

[XINCLUDE]
XML Inclusions (XInclude) Version 1.0 (Second Edition), J. Marsh, D. Orchard, D. Veillard, W3C Recommendation, 15 November 2006, http://www.w3.org/TR/2006/REC-xinclude-20061115/ . Latest version available at http://www.w3.org/TR/xinclude/ .
[XPROC]
XProc: An XML Pipeline Language, N. Walsh, A. Milowski, W3C Working Draft (work in progress), 5 April 2007, http://www.w3.org/TR/2007/WD-xproc-20070405/ . Latest version available at http://www.w3.org/TR/xproc/ .
[XML INFOSET]
XML Information Set (Second Edition), J. Cowan, R. Tobin, W3C Recommendation, 4 February 2004, http://www.w3.org/TR/2004/REC-xml-infoset-20040204/ . Latest version available at http://www.w3.org/TR/xml-infoset/ .
[TURTLE]
Turtle - Terse RDF Triple Language. Dave Beckett, Editor, 04 December 2006.
[PRIMER]
GRDDL Primer, I. Davis, Editor, W3C Working Draft (work in progress), 2 October 2006, http://www.w3.org/TR/2006/WD-grddl-primer-20061002/ . Latest version available at http://www.w3.org/TR/grddl-primer/ .
[EARL]
Evaluation and Report Language (EARL) 1.0 Schema. Shadi Abou-Zahra and Charles McCathieNevile, Editors, W3C Working Draft 27 September 2006, http://www.w3.org/TR/EARL10-Schema/ .
[WEBARCH]
Architecture of the World Wide Web, Volume One, N. Walsh, I. Jacobs, Editors, W3C Recommendation, 15 December 2004. Latest version available at http://www.w3.org/TR/webarch/ .
[OWL]
OWL Web Ontology Language Reference, S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, L. Andrea Stein, W3C Recommendation, 10 February 2004. Latest version available at http://www.w3.org/TR/owl-ref/ .
[RFC2616]
IETF RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.
[FOAF]
FOAF Vocabulary Specification, Dan Brickley, Libby Miller, 27 July 2005.
[DOAP]
DOAP: Description of a Project, Edd Dumbill.

C Acknowledgements

The editor gratefully acknowledges the contributions of the following Working Group members and personnel:

The security tests were created during the development of the Jena GRDDL Reader which uses the Saxon8.8 XSLT processor. They hence illustrate how a malicious party may try to abuse features of such an implementation.

D Change Log

Changes since the Working Group's decision to publish on 27 June:

$Log: Overview.html,v $
Revision 1.7  2007/07/16 21:40:17  connolly
membership reviews are due 24 Aug

Revision 1.6  2007/07/16 21:20:41  connolly
update GRDDL spec cite

Revision 1.5  2007/07/16 21:17:22  connolly
cite implementation report from status section

Revision 1.4  2007/07/16 21:04:39  connolly
refine link to formal membership reviews

Revision 1.3  2007/07/16 21:00:00  connolly
#appendices, #ref-XPath links don't go anywhere; removed them

Revision 1.2  2007/07/16 20:44:41  connolly
based on 1.68 2007/07/16 17:24:30 of draft from chime

Revision 1.68  2007/07/16 17:24:30  cogbuji
pubrules fix attempt

Revision 1.67  2007/07/16 17:21:45  cogbuji
added link to membership questionaire

Revision 1.66  2007/07/16 17:15:50  cogbuji
changed publication date/period to Jul 16 - Aug 13

Revision 1.65  2007/06/29 12:49:55  cogbuji
added link to issue-faithful-infoset

Revision 1.64  2007/06/29 01:25:32  cogbuji
pubrules fix: SOTD fixes

Revision 1.63  2007/06/29 01:21:49  cogbuji
pubrules fix: invalid CSS color values and removed "visible" HTTP URI

Revision 1.62  2007/06/29 01:11:50  cogbuji
well formedness fix

Revision 1.61  2007/06/29 01:10:34  cogbuji
fix CR->PR.  Added approval links for inline-rdf*.  Updated SOTD to match PR request

Revision 1.60  2007/06/28 13:14:33  cogbuji
one fix to pubrules - added id to h2

Revision 1.59  2007/06/28 01:49:03  cogbuji
pubrules fixes

Revision 1.58  2007/06/28 01:43:40  cogbuji
well-formedness error

Revision 1.57  2007/06/28 01:42:41  cogbuji
proposal date update

Revision 1.56  2007/06/28 01:41:37  cogbuji
XHTML Transitional logo

Revision 1.55  2007/06/28 01:40:12  cogbuji
fixes for XHTML Transitional validation

Revision 1.54  2007/06/28 01:18:10  cogbuji
CR request

Revision 1.53  2007/06/26 16:55:13  cogbuji
added text and links to normative-only manifests.  added informative references to: XInclude/XProc/XML Infoset.  Removed all remaining editorial questions

Revision 1.52  2007/06/26 13:14:43  cogbuji
changed 'corresponds' to 'illustrates' - per D.Booth's suggestion

Revision 1.51  2007/06/25 21:03:11  cogbuji
removed note about note about not listing redundant rules

Revision 1.50  2007/06/25 20:57:44  cogbuji
added todo for normative-only manifests

Revision 1.49  2007/06/25 20:43:50  cogbuji
added a class for informative test lists, removed questions about faithful-infoset, updated #xinclude / #noxlinclude approval links

Revision 1.48  2007/06/18 13:05:07  cogbuji
added more prose about test coverage and the corresponding statements in GRDDL

Revision 1.47  2007/06/18 03:20:17  cogbuji
fixed WFdness error

Revision 1.46  2007/06/18 03:16:28  cogbuji
removed approval links for renamed inline-rdf* tests

Revision 1.45  2007/06/18 03:11:10  cogbuji
per action to add test coverage triples to manifest and renaming of embedded-rdf*

Revision 1.44  2007/06/15 18:32:18  cogbuji
added todo indicators about wording of text having to do with faithful-infoset resolution/postponement

Revision 1.43  2007/06/04 18:55:09  cogbuji
Broke out a new appendix.  Added references/ack/changelog to it.  Added a Test Coverage section in the appendex.  Added a section describing the test coverage nomenclature.  "tagged up" each test with GRDDL rule terms.  Elaborated on Faithful Infoset testing and the XInclude tests (added diagram).  Added approval link for embedded-rdf4. Added approved test (error1)

Revision 1.42  2007/05/03 16:47:55  cogbuji
fixed #changelog anchor

Revision 1.41  2007/05/03 16:40:34  cogbuji
fixed anchor links to GRDDL reference

Revision 1.40  2007/05/03 16:35:10  cogbuji
fixed more broken links

Revision 1.39  2007/05/03 14:49:25  cogbuji
fixed broken links

Revision 1.38  2007/04/30 15:56:16  cogbuji
changed per thread on #xmlbase3.  See: http://lists.w3.org/Archives/Public/public-grddl-wg/2007Apr/0266.html

Revision 1.37  2007/04/30 15:42:38  cogbuji
removed  base-detail.  See: http://lists.w3.org/Archives/Public/public-grddl-wg/2007Apr/0264.html

Revision 1.36  2007/04/28 05:24:57  cogbuji
removed bad css color

Revision 1.35  2007/04/28 04:57:03  cogbuji
fixed date

Revision 1.32  2007/04/28 04:43:26  cogbuji
fixed pubrules violations and merged conflicts

Revision 1.31  2007/04/27 22:36:20  hhalpin
updated status text, removed embeddedrdf-4 approval

Revision 1.30  2007/04/27 20:07:33  cogbuji
added text for xmlbase1-4

Revision 1.29  2007/04/27 19:12:44  cogbuji
fixed correct output for xmlbase2 and xmlbase4

Revision 1.28  2007/04/27 15:22:20  cogbuji
fixed output files for xmlbase1-4 tests

Revision 1.27  2007/04/27 15:01:34  cogbuji
- changed all test input/output to absolute URIs
- updated approval indications
- removed tests per WG decision (httpHeaders and primer-hotel-data)
- added base-detail

Revision 1.26  2007/04/25 14:20:41  cogbuji
well-formedness-fixes

Revision 1.25  2007/04/25 05:59:18  cogbuji
removed obsolete todos

Revision 1.24  2007/04/25 05:50:38  cogbuji
added ack for Dom..

Revision 1.23  2007/04/25 03:39:25  cogbuji
fixed CSS class for network test (requires .htaccess magic)

Revision 1.22  2007/04/25 02:48:29  cogbuji
fixed double ref-WEBARCH, removed incorrect approval citation, fixed test li id syntax, added networked tests

Revision 1.21  2007/04/23 16:57:46  cogbuji
- moved in fixed versions of missing tests
- removed use of 'will'

Revision 1.20  2007/04/23 16:23:34  cogbuji
- added text to tests (from john-l suggestions)
- synched in commentary from jeremy
- added missing tests
- moved in additional tests from the pending list
- updated approved tests

Revision 1.19  2007/04/16 20:03:31  cogbuji
- updated multiple output section to clarify the 3 kinds of multiple output scenarios
- removed background color for approved test links
- added approval links for tests approved during 4/11 teleconference
- Collapsed single infoset / representation multiple output tests into maximal result

Revision 1.18  2007/04/09 22:09:03  cogbuji
- added CSS hooks for maximal result tests
- added security tests section
- clarifications to multiple output section
- removed three-transforms

Revision 1.17  2007/04/08 05:23:37  cogbuji
- Minor editorial fixes
- added note about Primer editorial draft material
- moved library tests to normative section
- added primer material test section
- updated documentation for testft.py (--local option)
- fixed links to GRDDL spec LC draft
- shrunk multiple output diagram and floated left
- removed uneccessary 'should's
- fixed input link to loop instead of loop.xml (see result)
- added primer material section
- informative link to primer

Revision 1.16  2007/04/06 20:32:42  cogbuji
xhtmlWithGrddlEnabledProfile properly marked as a NetworkedTest (and movd appropriately)

Revision 1.15  2007/04/06 19:29:27  cogbuji
Fixed erroneous text about (and proper generation of) g:alternative and added further text using an explicit example to demonstrate multiple outputs and their effect on compliance.

Revision 1.14  2007/04/06 18:11:40  cogbuji
Fixes towards WG actions:

- migrated all remaining tests from testlist* and pendinglist
- added CSS styling for test approval and network tests
- moved hcarda (networked test)
- removed WD indications
- added text for naming conventions and use of NetworkedTest and alternative in test manifest

Revision 1.13  2007/04/05 04:42:48  cogbuji
Moved in most of remaining tests from repository.  Added editorial todos / notes.  Added clear indication that this is an editors draft.

Revision 1.12  2007/04/04 15:50:36  cogbuji
removed indication of WD (commented styling to that effect)

Revision 1.11  2007/03/27 23:28:43  cogbuji
fixed patent policy link

Revision 1.10  2007/03/24 19:25:06  cogbuji
more pubrules fixes: ids for references headings, fixed cascade of css

Revision 1.9  2007/03/24 05:48:34  cogbuji
removed recursive log directive

Revision 1.8  2007/03/24 05:45:47  cogbuji
fixed change log link and added retrospective change log entries

Revision 1.7  2007/03/24 05:42:42  cogbuji
fixed changelog entries

Revision 1.6 2007/03/24 05:09:36  cogbuji
fixed broken links to manifest files

Revision 1.5 2007/03/24 04:59:19 cogbuji
added SOTD and fixed css for WD (per pubrules)


Revision 1.4 2007/03/24 04:39:30 cogbuji
various XHTML validity modifications, synched up with deprecated doc50/grddl-tests.html (which was subject of WG approval), and other pre-transtion pubrules checks
