A schema for serialized infosets

Richard Tobin and Henry Thompson, LTG, University of Edinburgh

This is a schema that describes an XML serialization of XML infosets. There are two main versions: one for the basic infoset, and one for the post-schema-validation (PSV) infoset.

Our main goal in defining this serialization was to allow comparison of the infosets generated by different processors (including parsers and schema validators). It has also proved useful for finding flaws in the Infoset and XML Schema specifications themselves, and stylesheets can convert the serializations to HTML for display.

The top-level schemas are:

XMLInfoset.xsd
The basic infoset
XMLInfoset-strict.xsd
"Strict" version of the basic infoset (see below)
PSVInfoset.xsd
PSV infoset (uses the strict version of the basic infoset)

Notes

All properties and infoitems are represented as elements. There are two reasons for this:

A type is declared for each infoitem and property. Type names are camel-case with an initial capital; element names are camel-case with an initial lower-case letter.
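The naming convention can be expressed as a pair of small functions. This Python sketch makes some assumptions about how the rule applies to multi-word Infoset names: it strips the square brackets the Infoset specification puts around property names, treats spaces and hyphens as word boundaries, and leaves the rest of each word untouched, so an acronym such as URI keeps its capitals. The function names are hypothetical.

```python
import re

def _words(infoset_name):
    # "[in-scope namespaces]" -> ["in", "scope", "namespaces"]  (assumed word splitting)
    return [w for w in re.split(r"[\s\-]+", infoset_name.strip().strip("[]")) if w]

def type_name(infoset_name):
    """Camel-case with an initial capital: 'base URI' -> 'BaseURI'."""
    return "".join(w[0].upper() + w[1:] for w in _words(infoset_name))

def element_name(infoset_name):
    """Camel-case with an initial lower-case letter: 'base URI' -> 'baseURI'."""
    t = type_name(infoset_name)
    return t[0].lower() + t[1:]

for name in ["document", "[local name]", "[base URI]", "[in-scope namespaces]"]:
    print(f"{name}: type {type_name(name)}, element {element_name(name)}")
```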

All properties are represented as elements whose name is the property name. These elements are declared globally; where a property has the same name as an infoitem, "Property" is appended to the property's element name. A consequence of global declaration is that properties with the same name must have the same type; this holds for both the basic and PSV infosets. Their types fall into several categories:

Since a processor is not required to produce every infoitem or property, all properties are optional in the basic infoset schema. In addition, so that extensions of the infoset can be validated against the basic schema, the content model of every infoitem ends with

<s:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
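To make the shape of an infoitem type concrete, the Python sketch below uses xml.etree.ElementTree to build a complexType in the style described here: each property is referenced with minOccurs="0", and the sequence ends with the ##other wildcard. It assumes the content model is an s:sequence of references to the global property elements; the type name, the i: prefix, and the property names are illustrative and not copied from XMLInfoset.xsd.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("s", XS)  # serialize with the "s:" prefix used above

def infoitem_type(type_name, property_refs):
    """Build a basic-schema-style complexType: every property reference is
    optional, and the sequence ends with a lax ##other wildcard."""
    ctype = ET.Element(f"{{{XS}}}complexType", name=type_name)
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for ref in property_refs:
        # A processor need not supply any given property, so each one is optional.
        ET.SubElement(seq, f"{{{XS}}}element", ref=ref, minOccurs="0")
    # Extension properties from other namespaces are accepted (laxly) here.
    ET.SubElement(seq, f"{{{XS}}}any", namespace="##other",
                  processContents="lax", minOccurs="0", maxOccurs="unbounded")
    return ctype

# Hypothetical property references; "i:" stands for a prefix bound to the infoset namespace.
t = infoitem_type("Element", ["i:namespaceName", "i:localName", "i:attributes"])
print(ET.tostring(t, encoding="unicode"))
```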

There is a "strict" version of the schema which requires all the properties (but still allows extra properties from other namespaces).
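The relationship between the two versions can be sketched as a transformation: starting from a basic-schema infoitem type, drop minOccurs="0" from each property reference (so the XML Schema default of 1 applies) and leave the wildcard alone. This is an assumption about how a strict type could be derived, not a description of how XMLInfoset-strict.xsd was actually produced; make_strict and the sample type are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical basic-style infoitem type (property names are illustrative).
BASIC = f"""<s:complexType xmlns:s="{XS}" name="Element">
  <s:sequence>
    <s:element ref="i:namespaceName" minOccurs="0"/>
    <s:element ref="i:localName" minOccurs="0"/>
    <s:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </s:sequence>
</s:complexType>"""

def make_strict(basic_type):
    """Copy a basic-style infoitem type, making every property reference required."""
    strict = copy.deepcopy(basic_type)
    seq = strict.find(f"{{{XS}}}sequence")
    for decl in seq.findall(f"{{{XS}}}element"):
        # Without minOccurs, XML Schema's default of 1 applies: the property is required.
        decl.attrib.pop("minOccurs", None)
    # The trailing <s:any> keeps minOccurs="0", so extra properties from
    # other namespaces are still allowed but never required.
    return strict

strict = make_strict(ET.fromstring(BASIC))
```

The sketch copies its input rather than modifying it in place, so the basic type remains available alongside the strict one.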

The serialization of the basic infoset uses the namespace http://www.w3.org/2001/05/XMLInfoset and corresponds to what is expected to be the Candidate Recommendation (CR) draft of the Infoset specification.

The serialization of the PSV infoset uses the namespace http://www.w3.org/2001/05/PSVInfosetExtension for added properties and infoitems, and corresponds to the XML Schema Recommendation.
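Because the PSV additions live in their own namespace, a consumer can tell them apart from basic-infoset properties by namespace alone, and the ##other wildcard at the end of each basic infoitem is what lets them through the basic schema. The Python sketch below groups the children of a serialized element infoitem by namespace. The namespace URIs are those given above; the element names and values in the sample (localName, validity, and so on) are hypothetical renderings of Infoset and PSVI property names under the camel-case convention, not excerpts from the schemas.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

INFOSET_NS = "http://www.w3.org/2001/05/XMLInfoset"
PSV_NS = "http://www.w3.org/2001/05/PSVInfosetExtension"

# Hypothetical serialized element infoitem with two PSV extension properties.
SAMPLE = f"""<element xmlns="{INFOSET_NS}" xmlns:p="{PSV_NS}">
  <namespaceName>http://www.example.org/book</namespaceName>
  <localName>title</localName>
  <p:validationAttempted>full</p:validationAttempted>
  <p:validity>valid</p:validity>
</element>"""

def split_tag(tag):
    """'{uri}local' -> ('uri', 'local'); an un-namespaced tag gets uri ''."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

def properties_by_namespace(item):
    """Group the local names of an infoitem's child elements by namespace URI."""
    groups = defaultdict(list)
    for child in item:
        uri, local = split_tag(child.tag)
        groups[uri].append(local)
    return dict(groups)

props = properties_by_namespace(ET.fromstring(SAMPLE))
print("basic:", props.get(INFOSET_NS, []))
print("PSV:", props.get(PSV_NS, []))
```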

Future work

The schemas are still incomplete in some respects, which we intend to rectify. In particular, no serialization has yet been defined for ID/IDREF tables or identity-constraint tables. The schemas could also be tightened up in several places (for example, by adding facets).

We intend to make the schemas compatible with the RDF schema for the basic infoset, so that a serialization can be valid according to both.

There are no doubt many bugs in these schemas, which we will try to correct. Please send corrections to Richard Tobin (richard@cogsci.ed.ac.uk).