Guide to Versioning XML Languages using XML Schema 1.1

Based on XML Schema 1.1 Working Draft of 31 Aug 2006

W3C Working Draft 28 September 2006

David Orchard, BEA Systems, Inc. <David.Orchard@BEA.com>


This document is a guide to versioning XML languages using XML Schema 1.1. It shows many of the new Schema 1.1 mechanisms, provides context beyond the Schema 1.1 WD, and solicits reader input.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is the First Public Working Draft, published 28 September 2006, of a document which when complete is expected to become a W3C Working Group Note. It has been developed by the W3C XML Schema Working Group, as part of the W3C XML Activity, to illustrate the use of XML Schema 1.1 in defining XML languages. XML Schema 1.1 introduces a number of new features intended to make it easier to define XML languages which are flexible enough to tolerate later revision in a forward-compatible way. The current draft is not complete, but it illustrates several techniques important for the versioning of XML languages defined using XML Schema 1.1. It will be updated to make it more complete and to reflect further technical changes in the development of XML Schema 1.1. The Working Group has consensus that this draft should be published, but does not necessarily have consensus on every aspect of the exposition.

Please send comments on this document to the archived versioning mailing list public-xml-versioning@w3.org (archive).

The English version of this specification is the only officially maintained version. Information about translations of this document is available at http://www.w3.org/2003/03/Translations/byTechnology?technology=xmlschema.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Wildcards
3 Updated All Group
4 Negative Wildcards
5 Multiple Namespaces
6 End of WD Mechanisms
7 Default Wildcard
8 Not in Schema wildcard
    8.1 NIS Variant #1: namespace keyword
    8.2 NIS Variant #2: notQnames keyword
    8.3 NIS Variant #3: processContents keyword
    8.4 NIS Variant #4: enumerated namespaces
9 Type extension with wildcards
10 Fallback to Declared Type
11 Guidance for selecting specific mechanisms TBD should we have this section?
12 Some mechanisms are currently not in scope.
    12.1 Not in Content Model wildcard
    12.2 Extension replacing wildcard
    12.3 Fallback
    12.4 FallbackElement in instance
        12.4.1 Multiple Versions with Fallback Type
    12.5 Version numbers
    12.6 MustUnderstand
13 References
14 Acknowledgements

1 Introduction

This document, Versioning Guide to XML Schema 1.1, is intended to provide an easily approachable description of the versioning features in the XML Schema definition language, and should be used alongside the formal descriptions of the language contained in Parts 1 and 2 of the XML Schema specification. The intended audience of this document includes application developers whose programs read and write schema documents, and schema authors who need to know about the features of the language, especially features that provide functionality above and beyond what is provided by DTDs. The text assumes that you have a basic understanding of XML, Namespaces in XML, and XML Schema.

The guide is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of the XML Schema language. The examples and other explanatory material in this document are provided to help you understand XML Schema, but they may not always provide definitive answers. In such cases, you will need to refer to the XML Schema specification. To help you do this, we provide many links pointing to the relevant parts of the specification.

The W3C TAG is working on a versioning finding that provides a language-independent rationale and description of languages, extensibility, versioning, and compatibility. It is suggested reading for these topics.

As suggested in the TAG finding, creating and using multiple versions of a language is common and useful. As described, extensibility is a key contributor to versioning. It can enable forwards and backwards compatible versioning. The majority of this guide focuses on Schema 1.1 extensibility techniques that enable forwards-compatible versioning. In schema terms, this is when a schema processor with an older schema can process and validate an instance that is valid against a newer schema.

This guide focuses on describing the different ways extra content can be added to create new versions of a schema. These different ways reflect different conditions faced by schema authors. These conditions may manifest themselves as technical constraints (e.g. location of extension points where content can be added) or non-technical constraints (e.g. ownership over a schema).

XML Schema 1.1 contains a number of new extensibility mechanisms. The ones described in the 31 August 2006 draft are:

  1. Weak wildcards - permit wildcards adjacent to optional elements

  2. Updated All Group - wildcards within All Group

  3. Negative wildcard - exclude specific namespaces and names

The mechanisms that are possible/probable/under discussion for the next draft are:

  1. Fallback To Declared Type - use declared type if xsi:type is unknown

  2. Auto-insertion of wildcards

  3. Not in Schema wildcard - a wildcard that allows anything not defined in the current schema

This document does not discuss the Post Schema Validation Infoset properties, as that would result in a significantly less readable document.

2 Wildcards

Let us start quickly by considering a name instance. It describes a name:
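A minimal illustrative instance, assuming the name prefix and the http://www.example.org/name/1 namespace that this guide uses later; the element values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative instance; the values are placeholders. -->
<name:name xmlns:name="http://www.example.org/name/1">
  <name:given>John</name:given>
  <name:family>Smith</name:family>
</name:name>
```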

As mentioned earlier, extensibility in the type is desired. Schema wildcards are a significant mechanism for allowing extensibility. Using XML Schema 1.1, the name owner might like to write a schema such as:
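A sketch of such a schema, offered as an illustration rather than the draft's own listing. The type name nameType and the choice of a lax, unbounded ##any wildcard after family are assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:nameType"/>

  <!-- nameType is a hypothetical type name used for illustration. -->
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <!-- Extension point: any further elements, laxly validated. -->
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With a trailing wildcard of this kind, a processor holding only this schema can still accept instances that carry elements added by later versions.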

One possible extension is adding a middle name. The extended instance describes a name:
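An illustrative extended instance, assuming the new middle element is placed after family so that a trailing wildcard in the earlier schema can absorb it; the values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative instance; middle is the newly added element. -->
<name:name xmlns:name="http://www.example.org/name/1">
  <name:given>John</name:given>
  <name:family>Smith</name:family>
  <name:middle>Paul</name:middle>
</name:name>
```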

The next version of the schema with middle name added might look like:
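One way such a revision can run into trouble is sketched here, as an assumption about the shape of the problem rather than the draft's own listing. The original trailing wildcard is kept, and an optional middle and a second wildcard are appended. Because middle is optional, an unknown element following family could be attributed to either wildcard:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a content model that is NOT a legal schema:
     the two wildcards compete with each other. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <!-- Reachable without passing through middle, so it
           overlaps with the wildcard above. -->
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```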

This schema is illegal in Schema 1.0 and remains illegal in Schema 1.1 because there is a UPA (Unique Particle Attribution) rule violation with the additional optional middle and wildcard. However, Schema 1.1 has partially relaxed the UPA constraints such that the following is now legal, and it will validate the previous two instances:
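A sketch of one formulation that relies on the relaxed rule, assuming the behaviour of the final XML Schema 1.1 Recommendation, in which an element declaration takes precedence over a wildcard that could match the same element. It is an illustration, and the draft's own listing may have differed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <!-- Optional element directly followed by a wildcard that
           could also match it; the element declaration wins. -->
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Under these assumptions, a name with only given and family is valid, and so is a name where middle follows family.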

Alternatively, each of the elements could be wrapped in a sequence followed by a wildcard:
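A sketch of the wrapped form, assuming a lax ##any wildcard after each element and the element-over-wildcard precedence of the final XML Schema 1.1 Recommendation. The type name nameType is hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <!-- given, followed by its own extension point -->
      <xs:sequence>
        <xs:element name="given" type="xs:string"/>
        <xs:any namespace="##any" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- family, followed by its own extension point -->
      <xs:sequence>
        <xs:element name="family" type="xs:string"/>
        <xs:any namespace="##any" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```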

Another alternative is each of the elements could be a reference to a model group that is the sequence of the element followed by a wildcard:
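A sketch of the model group form under the same assumptions (lax ##any wildcards, final XML Schema 1.1 semantics). The group names givenGroup and familyGroup are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <!-- Each group pairs one element with a trailing wildcard. -->
  <xs:group name="givenGroup">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <xs:group name="familyGroup">
    <xs:sequence>
      <xs:element name="family" type="xs:string"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:group ref="name:givenGroup"/>
      <xs:group ref="name:familyGroup"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Compared with inline wrapping, the groups keep the wildcard next to the element it follows and can be reused by other types.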

The wildcard construct enables authors to create schemas that are both forwards and backwards compatible. The new schema is backwards compatible because it will validate old and new instances. The exception is instances that have content that is legal in the wildcard but not in the new content. An example might be a middle name that has structure or digits. However, that scenario means that one author created a middle name instance in the name namespace according to one schema and another author defined a new name in the same namespace according to a different schema. Arguably there is an authority over the namespace that will prevent such clashes, and so in practice this exception will not happen. Alternatively, we can make a slightly different compatibility guarantee: the new schema is backwards compatible in that it validates old and new instances, provided the new instances do not have any extensions in the defined namespaces. The old schema is forwards compatible because it will validate old and new instances; of course, it sees these as current and future instances.

The new schema is created by the replacement of a repeating wildcard in the original, with an optional-wildcard, optional-element, optional-wildcard sequence, in the later schema. The new schema explicitly states the entire new content model, including everything from the original schema as well as the new explicit declaration for middle, and for that reason we call it a "Complete Respecification" of the type.

The new type declared above using wildcards could be declared as an explicit <xs:restriction/> of the original type, because every document accepted by the new type is also accepted by the old. XML Schema's type <xs:restriction/> allows alteration of wildcards anywhere in the content model, like Complete Respecification, but allows the original type to be preserved. Alternatively, XML Schema's type extension mechanism <xs:extension/> [provide ref to Recommendation] provides a different way of specifying a modified type, in which the original content is not restated, but only the new elements are explicitly referenced. The differences are: (1) xs:extension allows new content only at the end of the model and (2) using wildcards as shown above, the original type will accept not only documents in the original language, but also documents containing the middle name, something that is not true in typical uses of xs:extension. Thus the schema author of a new version of a type has the three options outlined above: 1) Complete Respecification without explicit use of xs:restriction; 2) Complete Respecification with explicit use of xs:restriction; 3) xs:extension.

3 Updated All Group

All-groups can use the updated wildcard:
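A sketch of an all-group with a wildcard, written against the final XML Schema 1.1 Recommendation, which permits xs:any and maxOccurs greater than 1 inside xs:all. Making middle the repeating element is an assumption made for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:all>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <!-- maxOccurs greater than 1 inside an all-group -->
      <xs:element name="middle" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
      <!-- wildcard inside an all-group -->
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:schema>
```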

This highlights two changes to all-groups: the addition of wildcards and of maxOccurs > 1. These allow names such as:
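An illustrative instance, assuming an all-group that contains given, family, a repeatable middle and a wildcard. The children appear in arbitrary order and middle occurs twice; the values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Children in arbitrary order; middle repeats and is interleaved. -->
<name:name xmlns:name="http://www.example.org/name/1">
  <name:family>Smith</name:family>
  <name:middle>Paul</name:middle>
  <name:given>John</name:given>
  <name:middle>George</name:middle>
</name:name>
```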

There are a number of other potential changes to all-groups which the WG is still considering.

  1. Providing an option to allow the schema author to specify clustering semantics (in which all elements matching a given particle in the all-group must be together in the input, rather than interleave semantics)

  2. Allowing all-groups to appear within choices and sequences.

  3. Allowing choices and sequences (and nested all-groups?) to appear within all-groups.

  4. Defining the extension of an all-group as adding new items to the all-group, rather than as producing a sequence consisting of the all-group followed by the new material.

4 Negative Wildcards

The previous schemas allowed extra content that was already defined because the wildcard could match anything, such as:
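An illustrative instance of the problem, assuming a content model of given, family and a trailing ##any wildcard: the second given is not wanted, yet the wildcard accepts it. The values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<name:name xmlns:name="http://www.example.org/name/1">
  <name:given>John</name:given>
  <name:family>Smith</name:family>
  <!-- Already-defined element slipping in through the wildcard. -->
  <name:given>Johnny</name:given>
</name:name>
```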

We can preclude the "extra" occurrences by using the negative wildcard to disallow certain elements, i.e.:

The first wildcard allows anything other than name:name, name:given, name:family. The second wildcard allows any element in the http://www.example.org/name/1 namespace except for name, given, family. The last wildcard allows for any namespace other than the http://www.example.org/name/1 namespace.
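Three negative wildcards can be sketched as follows. The notQName and notNamespace attributes are written as they appear in the final XML Schema 1.1 Recommendation, which is an assumption relative to the 31 August 2006 draft. Each wildcard is placed in its own hypothetical named group so that the three can be shown side by side without competing in one content model:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <!-- 1: any element except the three names already defined -->
  <xs:group name="anyButKnownNames">
    <xs:sequence>
      <xs:any namespace="##any"
              notQName="name:name name:given name:family"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <!-- 2: only the name namespace, minus the three known names -->
  <xs:group name="newNamesInNameNamespace">
    <xs:sequence>
      <xs:any namespace="##targetNamespace"
              notQName="name:name name:given name:family"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <!-- 3: anything outside the name namespace -->
  <xs:group name="anyOtherNamespace">
    <xs:sequence>
      <xs:any notNamespace="##targetNamespace"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
```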

It is possible to re-use the excluded QNames by use of a model group with a wildcard containing the QNames.

5 Multiple Namespaces

Many XML languages use multiple namespaces. Imagine a name whose given and family elements come from namespaces different from that of name, with a wildcard at the end. What wildcard should be there? Saying "##any" means already existing terms could be inserted. ##other only prevents items from the target namespace (name) from appearing. If we want to allow only elements that are not in the givenns, familyns, or name namespaces, we can use negative wildcards:
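A sketch of such a wildcard, using the notNamespace attribute as defined in the final XML Schema 1.1 Recommendation. The namespace URIs bound to givenns and familyns are hypothetical, and the two imports assume that schemas declaring given and family are available to the processor:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           xmlns:givenns="http://www.example.org/given/1"
           xmlns:familyns="http://www.example.org/family/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <!-- Hypothetical namespaces; their schemas are assumed to exist. -->
  <xs:import namespace="http://www.example.org/given/1"/>
  <xs:import namespace="http://www.example.org/family/1"/>

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element ref="givenns:given"/>
      <xs:element ref="familyns:family"/>
      <!-- Excludes all three namespaces already in use. -->
      <xs:any notNamespace="##targetNamespace
                            http://www.example.org/given/1
                            http://www.example.org/family/1"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```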

A common versioning scenario is to extend an existing namespace with new names, as mentioned in the TAG finding (http://www.w3.org/2001/tag/doc/namespaceState.html). If we want to exclude existing terms from a namespace, we can also list them, as shown earlier and repeated here for the multiple-namespace case:
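A sketch that lists the existing terms by QName, so that new names in any of the three namespaces remain allowed. The notQName attribute follows the final XML Schema 1.1 Recommendation, and the givenns and familyns namespace URIs are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           xmlns:givenns="http://www.example.org/given/1"
           xmlns:familyns="http://www.example.org/family/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:import namespace="http://www.example.org/given/1"/>
  <xs:import namespace="http://www.example.org/family/1"/>

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element ref="givenns:given"/>
      <xs:element ref="familyns:family"/>
      <!-- Existing terms are excluded one by one; anything else,
           including new names in these namespaces, is allowed. -->
      <xs:any namespace="##any"
              notQName="name:name givenns:given familyns:family"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```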

6 End of WD Mechanisms

This ends the description of the mechanisms that are in the WD. The following are under active discussion.

7 Default Wildcard

The previous schemas can be hard to read because of the wildcards sprinkled through the types. Also, the language designer has to remember to put the wildcards in everywhere. To solve this problem, we introduce default wildcards, as in:
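The idea can be sketched with xs:defaultOpenContent, the construct that the final XML Schema 1.1 Recommendation provides for this purpose. Using it here is an assumption: the syntax under discussion for the draft may have differed. The wildcard is stated once for the whole schema, and the type itself contains no wildcard:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <!-- One schema-wide wildcard; interleave mode lets matching
       elements appear anywhere among the declared children. -->
  <xs:defaultOpenContent mode="interleave">
    <xs:any namespace="##any" processContents="lax"/>
  </xs:defaultOpenContent>

  <xs:element name="name" type="name:nameType"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The alternative mode="suffix" would restrict the extra elements to the end of the content, which is closer to a single trailing wildcard.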

8 Not in Schema wildcard

We see that negative wildcards can become difficult to manage as the number of wildcards and definitions grows. If there were 1000 element definitions and another one was added, then potentially 1000 wildcards would need to be updated to preclude the new element. To help with this manageability issue, the Not In Schema wildcard specifies that no element already defined in the schema is allowed in the wildcard; it is equivalent to a negative wildcard that lists every such element. For ease of authoring, the examples are based upon the default wildcard mentioned above.
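A sketch of the idea using the ##defined keyword of notQName from the final XML Schema 1.1 Recommendation, which excludes every element that has a global declaration in the schema. This is an assumption standing in for the variants under discussion, whose syntax may have differed. The elements are declared globally and referenced so that ##defined covers them:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <!-- Schema-wide wildcard that rejects globally declared elements. -->
  <xs:defaultOpenContent mode="interleave">
    <xs:any notQName="##defined" processContents="lax"/>
  </xs:defaultOpenContent>

  <xs:element name="name" type="name:nameType"/>
  <xs:element name="given" type="xs:string"/>
  <xs:element name="family" type="xs:string"/>

  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element ref="name:given"/>
      <xs:element ref="name:family"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Adding a new global element declaration to this schema excludes it from the wildcard automatically, with no wildcard needing to be edited.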

9 Type extension with wildcards

We acknowledged earlier that the use of wildcards in places other than the end of a content model meant that the existing type couldn't be used to extend the content model. If we restrict ourselves to wildcards at the end of a content model, then we can use type extension to add content to the content model, such as
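A sketch of such an extension. The type names name and nameWithMiddle are the ones this guide mentions in the section on fallback to declared type. Making middle required is an assumption chosen so that, under the rules of the final XML Schema 1.1 Recommendation, the two wildcards do not compete:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:name"/>

  <!-- Original type: the wildcard appears only at the end. -->
  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension: new content is appended after the base content. -->
  <xs:complexType name="nameWithMiddle">
    <xs:complexContent>
      <xs:extension base="name:name">
        <xs:sequence>
          <xs:element name="middle" type="xs:string"/>
          <xs:any namespace="##any" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```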

This results in a type which has two wildcards: the first between family and middle, the 2nd after middle. It is roughly:
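The effective content model can be sketched as a single type, assuming a base of given, family and a trailing wildcard that is extended with middle and a further wildcard. The type name nameWithMiddleEffective is hypothetical and exists only to show the combined shape:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:complexType name="nameWithMiddleEffective">
    <xs:sequence>
      <!-- content inherited from the base type -->
      <xs:sequence>
        <xs:element name="given" type="xs:string"/>
        <xs:element name="family" type="xs:string"/>
        <xs:any namespace="##any" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- content added by the extension -->
      <xs:sequence>
        <xs:element name="middle" type="xs:string"/>
        <xs:any namespace="##any" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```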

This may not be the desired approach because we may want to replace the trailing wildcard with the additional content. Currently, this replacement can only be done using Restriction:
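A sketch of the restriction form, intended as a valid restriction under the final XML Schema 1.1 Recommendation, on the assumption that middle is accepted by the base type's lax ##any wildcard. The base type is repeated so that the example is self-contained:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:name="http://www.example.org/name/1"
           targetNamespace="http://www.example.org/name/1"
           elementFormDefault="qualified">

  <xs:element name="name" type="name:name"/>

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The whole content model is restated; middle takes the place
       of the base wildcard and a new wildcard follows it. -->
  <xs:complexType name="nameWithMiddle">
    <xs:complexContent>
      <xs:restriction base="name:name">
        <xs:sequence>
          <xs:element name="given" type="xs:string"/>
          <xs:element name="family" type="xs:string"/>
          <xs:element name="middle" type="xs:string" minOccurs="0"/>
          <xs:any namespace="##any" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```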

Note that the restriction forces us to replicate the content model of the restricted type. This does allow us to perform significant updates to the content model, for example

10 Fallback to Declared Type

The previous example creates a new type but retains the element name. When a consumer receives the existing element name with an xsi:type specifying the new type, it may not know about the new type. A fallback from the xsi:type to the declared type allows a consumer to "cast" an element whose type it does not know about into the declared type.
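An illustrative instance that keeps the element name but announces the new type through xsi:type, assuming the name:nameWithMiddle type name used in this section; the values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The element is still name:name; only xsi:type names the new type. -->
<name:name xmlns:name="http://www.example.org/name/1"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="name:nameWithMiddle">
  <name:given>John</name:given>
  <name:family>Smith</name:family>
  <name:middle>Paul</name:middle>
</name:name>
```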

If the consumer of this element didn't understand the xsi:type definition (name:nameWithMiddle), they could "cast" it to the declared type (name:name), resulting in roughly:

11 Guidance for selecting specific mechanisms TBD should we have this section?

Place Holder:

Use wildcards not at the end when not worried about by-reference extension.

Use wildcards when allowing current or other ns:

Use fallback when items should be removed from PSVI or configurable

Use Negative wildcard to exclude specific items

Use NIS as a shorthand for negative wildcards

Use fallback Type when new type is sent

12 Some mechanisms are currently not in scope.

Note that the syntax used herein is probably incorrect, as it isn't maintained as closely as that of the in-scope mechanisms.

12.4 FallbackElement in instance

The previous example creates a new type and preserves the existing name. Another possibility is that new element names will be created. When a consumer receives a new element name, it may not know about the new element name or type. An xsi:fallbackElement can be specified in the instance, and a consumer can "cast" an element it does not know about into the xsi:fallbackElement.

If the consumer of this element didn't understand the element definition, they could "cast" it to the fallbackElement, resulting in:

This would then combine with the fallback to Declared type as mentioned previously.

13 References

14 Acknowledgements

Gilbert Pilz, BEA Systems. W3C Schema Working Group Members.