[Editorial Draft] Extending and Versioning Languages: Strategies

Draft TAG Finding 04 July 2007

This version:
http://www.w3.org/2001/tag/doc/versioning-strategies-20070704.html ( xml )
Latest version:
Previous versions:
Unapproved Editors Drafts: http://www.w3.org/2001/tag/doc/versioning-20070518.html, http://www.w3.org/2001/tag/doc/versioning-20070326.html, http://www.w3.org/2001/tag/doc/versioning-20061212.html, http://www.w3.org/2001/tag/doc/versioning-20060726.html, http://www.w3.org/2001/tag/doc/versioning-20060717.html, http://www.w3.org/2001/tag/doc/versioning-20060710.html, http://www.w3.org/2001/tag/doc/versioning-20031116.html, http://www.w3.org/2001/tag/doc/versioning-20031003.html
David Orchard, BEA Systems, Inc. <David.Orchard@BEA.com>


This document provides motivation for versioning, a number of questions that language designers must answer, and a variety of version identification strategies. Separate documents contain the terminology definitions and XML language specific discussion.

Status of this Document

This document has been developed for discussion by the W3C Technical Architecture Group. It does not yet represent the consensus opinion of the TAG.

Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction
    1.1 Why Do Languages Change?
    1.2 How Do Languages Change?
        1.2.1 Why Extend languages?
    1.3 Kinds of Languages
2 Versioning Strategies
    2.1 Why Have a Strategy?
    2.2 Versioning Designs
        2.2.1 Big Bang/Incompatible
        2.2.2 Forwards Compatible
            Must Accept Unknown Extensions
            Fallback Provided
            Understanding unknown version identifiers
            Supporting functionality
        2.2.3 Backwards compatible
        2.2.4 Mixtures
3 Language Requirements
    3.1 What language form
    3.2 Can 3rd parties extend the language?
    3.3 Can 3rd parties extend the language in a compatible way?
    3.4 Can 3rd parties extend the language in an incompatible way?
    3.5 Can the designer extend the language in a compatible way?
    3.6 Can the designer extend the language in an incompatible way?
    3.7 Is the vocabulary a stand-alone language or an extension of another vocabulary?
    3.8 What Schema language(s)?
    3.9 Should extensions or versions be expressible in the Schema language?
    3.10 Requirements Summary
4 Language Design
    4.1 Schema language design choices or constraints.
    4.2 Substitution Mechanism.
    4.3 Component identification
    4.4 Identification of incompatible extensions
    4.5 Design Summary
5 Identifying Languages
    5.1 Version Numbers
    5.2 XML Namespaces
6 Case Studies
    6.1 HTML
7 Extension versus Versioning
8 Conclusion
9 References
10 Acknowledgements


A Change Log (Non-Normative)

1 Introduction

The evolution of languages by adding, deleting, or changing syntax or information is called versioning. Making versioning work in practice is one of the most difficult problems in computing. Arguably, the Web rose dramatically in popularity because evolution and versioning were built into HTML and HTTP. Both systems provide explicit extensibility points and rules for understanding extensions that enable their decentralized extension and versioning.

This finding describes general problems and techniques in evolving systems in compatible and incompatible ways. These techniques are designed to allow compatible changes with or without schema propagation. A number of questions, design patterns and rules are discussed with a focus towards enabling versioning in XML vocabularies, making use of XML Namespaces and XML Schema constructs. This includes not only general rules, but also rules for working with languages that provide an extensible container model, such as SOAP and RDF/OWL.

The terminology used throughout is defined in [Versioning].

1.2 How Do Languages Change?

One of the most important aspects of a language change is whether texts in the revised language are backwards or forwards compatible with the unrevised language.

Some typical backwards- and forwards-compatible changes:

Some typical forwards-compatible changes:

Some typical backwards-compatible changes:

Some typical incompatible changes:

2 Versioning Strategies

In broad terms, versioning strategies fall into a number of classes ranging from "none" to a "big bang":

There's no single approach that's always correct. Different application domains will choose different approaches. But by the same token, the approaches that are available depend on other choices or constraints. One very important constraint is whether the language can be evolved by distributed parties such that parallel evolutionary development can occur. These dependencies make it imperative to plan for versioning from the start. If versioning is not planned from the start, then the possible versioning strategies may be constrained by decisions that have already been made.

A language commonly goes through a lifecycle of iterative development, followed by deployment, followed by deployment of new versions. The point in the lifecycle of the language may also affect the selection of the versioning strategy for the language.

Just as there are a number of strategies, there are a number of designs for implementing a strategy. Internet technologies, including MIME, markup languages, and XML languages, have successfully used various strategies, either singly or in combination. Summaries of strategies and requirements, such as [Web Architecture: Extensible Languages], have been produced for earlier technologies and guided the design of XML Namespaces and Schema.

2.1 Why Have a Strategy?

Different kinds of languages and different versioning strategies expose different problems. Attempting to deploy a system that provides no versioning mechanism puts the burden of version "discovery" on consumers and is often impractical in anything except a closed system.

At the other end of the spectrum is the "big bang" approach which is also problematic.

"Big bang" is a very coarse-grained approach to versioning. It establishes a single version identifier, either a version number or namespace name, for an entire text.

The semantics of the "big bang" are that applications decide on the basis of the text version whether or not they know how to process that text. If the version isn't recognized, the entire text is rejected. Typically, when introducing a new version using the big bang approach, all of the software that produces or consumes the texts is updated in a sweeping overhaul in which the entire system is brought down, the new software is deployed, and the system is restarted. This big bang approach to versioning is practical only in circumstances where there is a single controlling authority, and even in that case, it carries with it all manner of problems. The process can take a considerable amount of time, leaving the system out of commission for hours if not days. This can result in significant losses if the system is a key component of a revenue generating business process, and coordinating the system overhaul can itself be quite costly.

The "big bang" approach is appropriate when the new version is radically different from its predecessor. But in many cases, the changes are incremental and often a consumer could, in practice, cope with the new version. For example, it might be that there are many messages that don't use any features of the new version or perhaps it is appropriate to simply ignore components that are not recognized.

Recall our Name example and consider a producer and a consumer exchanging name messages. Imagine that some future version of the name language defines a new "middle" component. Because producers and consumers are distributed, it may happen that an old consumer, one unprepared for a middle component, encounters a message with a middle component sent by a newer producer.

If big bang versioning is used, old systems will reject the new message. However, if the versioning strategy allowed the old consumer to simply ignore unrecognized content, it's quite possible that other components of the system could simply adapt to the previous behavior. In effect, the old system would ignore the middle component and its descendants so it would "see" a message that looks just like the old message it is expecting.
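The behavior of such an old consumer can be sketched as follows. This is a minimal illustration, not part of the Name language itself: the element names "name", "first", "last" and "middle", the sample values, and the function read_name are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Components the old consumer knows about (assumed element names).
KNOWN = {"first", "last"}

def read_name(xml_text):
    """Return the recognized name components, ignoring anything unknown."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root if child.tag in KNOWN}

# A message in the old language and one from a newer producer (hypothetical data).
old_message = "<name><first>Jane</first><last>Doe</last></name>"
new_message = "<name><first>Jane</first><middle>Q</middle><last>Doe</last></name>"
```

With this sketch, read_name returns the same result for both messages, which is the sense in which the old consumer "sees" the message it is expecting.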

For the producer, the result would be that the request is fulfilled, though perhaps not quite the way it may have hoped. For example, a request that results in a name response may return a name without the middle. In many cases this may be better behavior than receiving an error. In particular, producers using the new language can be written to cope with the possibility that they will be speaking to older consumers.

If the new system needs to make sure that middle is respected, then it can change the language in an incompatible way to indicate that the new behavior is not considered backwards compatible.

Often, what is needed is some sort of middle ground solution.

2.2 Versioning Designs

For any given strategy, there are various designs that achieve the strategy, and some may be more appropriate than others. Among them we find:

2.2.2 Forwards Compatible

Forwards compatible evolution of a language typically means that producers should be able to produce texts with new additional content without consumers having to change existing implementations. It may also allow a centralized authority greater opportunities for versioning. The common characteristic of a compatible change is the use of extensibility in a language.

A prime example of the benefits of extensibility is HTML. The first version of HTML was designed for extensibility; it said that "unknown markup" may be encountered. An example of this extensibility in use is the later addition of the IMG tag.

The first rule introduced in this Finding relating to extensibility is:

Good Practice

Forwards-compatible requires extensibility rule: Any Language intended for forwards-compatible versioning SHOULD have extensibility.

A caveat is that a language may be changed in a forwards-compatible way by reducing the range or maximum allowed number of occurrences of a component in the language. However, extremely few languages have been revised in a forwards-compatible way with just these reduction changes. Unless the language designer knows that reduction will be the only kind of change to the language, a language intended to allow forwards-compatible versioning MUST have extensibility.

Further, a language that allows extensibility also requires a specification of what happens when the extensibility is used. If the extensibility is used in a forwards-compatible way, then by definition the software consuming the extension does not know about the extension and we can call the extension an unknown extension. If the software consuming the extension "knows" about the extension, then it has been revised and uses the revised language that incorporates the extension. The behavior of software when it encounters an unknown extension should be clear. For this, we introduce the next rule:

Good Practice

Provide Extension handling Rule: Languages SHOULD specify how unknown extensions are handled.

We will shortly describe a variety of extension handling models for unknowns, but there is a constant across all the models: an extension that invalidates an existing component is an incompatible change.

Good Practice

Preserve existing information Rule: Any Language intended for forwards-compatible versioning MUST require that extensions MUST NOT invalidate the non-extension text's information.

There are a variety of specific patterns for handling extensions, which may be used in combination.

Must Accept Unknown Extensions

Perhaps the simplest extension handling model that enables forwards-compatible changes is to accept content that is unknown. This rule is:

Good Practice

Must Accept Unknowns Rule: Consumers MUST accept any text portion that they do not recognize.

This is sometimes called the "MUST Ignore Unknowns" rule. HTML 4.01 follows this approach: "If a user agent encounters an element it does not recognize, it should try to render the element's content." Under HTML 4.01, a user agent is free to remove or preserve an element that is not recognized. The Must Accept Unknowns rule for XML was first standardized in the WebDAV specification RFC 2518 [6] section 14 and later separately published as the Flexible XML Processing Profile [3].

More specific variants of the Must Accept rule are to qualify what kind of handling beyond accepting is required. One model is to remove the unknown:

Good Practice

Must Accept and Remove Unknowns Rule: Consumers MUST accept and remove any text portion that they do not recognize.

There is a history of usage of the Must Accept and Remove Unknowns rule. HTML 1, 2 and 3.2 follow the Must Accept and Remove Unknowns Rule as they specify that any unknown start tags or end tags are mapped to nothing during tokenization.
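Mapping unknown tags to nothing at the token level might look like the following sketch. It uses Python's standard html.parser module rather than any tokenizer defined by the HTML specifications, and the set of known tags is an assumption chosen for the example. Attributes are dropped to keep the sketch short.

```python
from html.parser import HTMLParser

# Tags this hypothetical consumer recognizes (assumed vocabulary).
KNOWN_TAGS = {"p", "em", "b"}

class RemoveUnknownTags(HTMLParser):
    """Re-emit known tags and all character data; unknown tags map to nothing."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            self.out.append(f"<{tag}>")  # attributes omitted for brevity

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # content of unknown elements is kept

def remove_unknown_tags(markup):
    parser = RemoveUnknownTags()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For example, remove_unknown_tags("<p>See <blink>this</blink> text</p>") drops the unrecognized blink tags while keeping the text they enclose.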

Another forwards-compatible extension handling model is to preserve the unknown. This rule is:

Good Practice

Must Accept and Preserve Unknowns Rule: Consumers MUST accept and preserve any text portion that they do not recognize.

HTTP 1.1 [7] specifies that a transparent proxy should accept and preserve any headers it doesn't understand: "Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies."
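As a rough sketch of accept-and-preserve, an intermediary can act only on the headers it understands while passing every header along unchanged. The set of understood headers and the function name proxy_forward are assumptions for illustration; this is not an HTTP implementation.

```python
# Header names this hypothetical proxy acts on (assumed, lower-cased).
UNDERSTOOD = {"content-type", "content-length"}

def proxy_forward(headers):
    """Return (acted_on, forwarded).

    acted_on holds only the headers the proxy understands;
    forwarded preserves every header, including unrecognized ones.
    """
    acted_on = {k: v for k, v in headers.items() if k.lower() in UNDERSTOOD}
    forwarded = dict(headers)  # unknown headers are preserved, not dropped
    return acted_on, forwarded
```

The design point is that ignoring a header for local processing and preserving it for downstream consumers are independent decisions.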

There are two broad types of Must Ignore rules for dealing with extensions, either ignoring the entire tree or just the unknown part of the tree. The rule for ignoring the entire tree is:

Good Practice

Must Accept All Rule: The Must Accept rule applies to unrecognized texts and their descendants in tree-based formats.

This variation on must accept requires the consumer to accept the text and any children it does not understand. Most data applications, such as Web services that use SOAP header blocks or WSDL extensions, adopt this approach to dealing with unexpected markup. For example, if a message is received with unrecognized elements in a SOAP header block, they must be ignored unless marked as "Must Understand" (see Rule 10 below). Note that this rule is not broken if the unrecognized elements are written to a log file. That is, "accepted" or "ignored" doesn’t mean that unrecognized extensions can’t be processed; only that they can’t be the grounds for failure to process.
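The SOAP header case can be sketched as below. The namespace and the mustUnderstand attribute are those of SOAP 1.2; the set of understood blocks, the example namespaces, and the function check_header_blocks are hypothetical, and real SOAP processing involves more than is shown here (roles and fault generation, for instance).

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
MUST_UNDERSTAND = f"{{{SOAP_NS}}}mustUnderstand"

# Header blocks this hypothetical consumer understands.
UNDERSTOOD = {"{urn:example:known}priority"}

def check_header_blocks(envelope_xml):
    """Return the tags of ignored header blocks.

    Unknown blocks (and their descendants) are ignored unless they are
    marked mustUnderstand, in which case processing fails.
    """
    envelope = ET.fromstring(envelope_xml)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    ignored = []
    for block in (header if header is not None else []):
        if block.tag in UNDERSTOOD:
            continue
        if block.get(MUST_UNDERSTAND) in ("true", "1"):
            raise ValueError(f"mandatory header block not understood: {block.tag}")
        ignored.append(block.tag)  # could also be logged; it is not grounds for failure
    return ignored
```

Returning the ignored tags rather than silently dropping them reflects the note above: an ignored extension may still be recorded, as long as it does not cause processing to fail.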

Other applications may need a different rule as the application may want to retain the content of an unknown component, perhaps for display purposes. The rule for accepting the component only is:

Good Practice

Must accept Container Rule: The Must accept rule applies only to the smallest portion of the tree.

This variation on must accept requires the consumer to accept the smallest part of the text that is ignorable. For markup languages, this could be just an element or attribute that it does not understand; in the case of an element, the consumer still processes the children of that element. The Must accept Container practice was described in [HTML 2.0].

This retains the element's descendants in the processing model so that they can still affect interpretation of the text, such as for display purposes.
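The difference between the Must Accept All and Must accept Container variations can be sketched on a small tree. The element names and the function visible_text are assumptions for illustration; the example extracts displayable text under each rule.

```python
import xml.etree.ElementTree as ET

# Elements this hypothetical consumer recognizes.
KNOWN = {"doc", "para"}

def visible_text(elem, accept_all):
    """Collect text for display.

    accept_all=True  : Must Accept All, an unknown element and all of
                       its descendants contribute nothing.
    accept_all=False : Must accept Container, only the unknown element
                       itself is ignored; its content is still processed.
    """
    if elem.tag not in KNOWN and accept_all:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child, accept_all))
        parts.append(child.tail or "")  # text after the child belongs to elem
    return "".join(parts)

sample = ET.fromstring(
    "<doc><para>Hello <shiny>new <para>world</para></shiny>!</para></doc>"
)
```

Under the first rule the unknown shiny subtree disappears entirely; under the second, its content still reaches the display.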

Ignoring content is a simple solution to the problem of substitution. In order to achieve a compatible evolution, the newer texts of a language must be transformable (or substitutable) into older texts. Object systems typically call this "polymorphism", where a new type can behave as the old type.

Fallback Provided

A language can provide mechanisms for explicit fallback if the text is not supported. [MIME] provides multipart/alternative for equivalent, and hence fallback, representations of content. [HTML 4.0] uses this approach in the NOFRAMES element. In XML, the XML Inclusions specification [XInclude] provides a fallback element to handle the case where the putatively included resource cannot be retrieved. There are many variations on where the fallback content can be found. For example, a schema language could specify that fallback content is found in a text, in a schema, or even in the schema for the schema language.

Understanding unknown version identifiers

Providing forwards compatibility often requires more than a substitution model for texts; the language must also provide a substitution model for any version identifiers.

Good Practice

Provide Version Identification substitution model: Languages MUST provide a substitution model for version identifiers for forwards-compatible evolution.

The use of a version identifier requires a substitution from an unknown version to a known version for a consumer that doesn't understand the version identifier.

There could be an algorithmic approach. For version numbers, one could say that version numbers will only have a "major" change if there is an incompatible change. For example, version 1.1 of a language is by definition compatible with version 1.0 and version 2.0 is incompatible. Then, when the producer puts 1.0, 1.1, or 2.0, a consumer at any level will know whether it can process the content. This also means that there is a choice about which version number to put in, the lowest or the highest. A document that contains "1.1" means that any 1.X processor can process it. A "2.0" document means that a 1.X processor cannot process it, but any "2.X" processor can.

The language must then have wording about processing unknown version numbers. Sample wording for a substitution model for version identifiers: "A processor of this version MUST NOT fault if it receives a document that contains the same major version number." This rule would be used in conjunction with a forwards-compatible design for the texts, such as "Must accept Unknowns".
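The major/minor substitution model described above could be checked along these lines. This is a sketch assuming version identifiers of the form "major.minor"; the function name can_process is hypothetical.

```python
def can_process(document_version, processor_version):
    """Return True if a processor may accept a document.

    Assumes "major.minor" identifiers where only a major change is
    incompatible, so an unknown minor version is substituted by the
    known version with the same major number.
    """
    doc_major = int(document_version.split(".")[0])
    proc_major = int(processor_version.split(".")[0])
    return doc_major == proc_major
```

Under this model a 1.0 processor accepts a "1.1" document, relying on a rule such as Must Accept Unknowns for any new content, and rejects a "2.0" document.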

2.2.3 Backwards compatible

In general, providing backwards compatibility is easier than providing forwards compatibility. Backwards compatibility means supporting the previous versions of text in a newer consumer. There are two significant ways that backward compatibility can be supported.

3 Language Requirements

Given the types of versioning strategies and designs that are available, there are some key requirements the language designer must consider in choosing a strategy and design.

3.10 Requirements Summary

Every language design will make decisions about these requirements. These requirements can be expressed in a table form:

Language form
Schema Lang
3rd party compatibly extend
3rd party incompatibly extend
Designer incompatibly extend

4 Language Design

Upon answering these questions, there are some key decisions that a language developer makes, whether they are consciously made or not.

4.5 Design Summary

Every language design will make a decision in these areas. These designs can also be expressed in a table form:

Schema design
Substitution Mechanism
Component Identification
Incompatible Ext identification

5 Identifying Languages

In many cases, an important aspect of versioning is being able to determine the specific language of a given text and its sub-texts. This is often done by providing an identifier of the version of the text.

Good Practice

Language Identification rule: Any Language intended for versioning SHOULD have a version identification strategy.

5.1 Version Numbers

Having multiple versions naturally leads to the need to identify versions. Version identification has traditionally been done with a decimal separating the major versions from the minor versions, e.g. "8.1" or "1.0". Often the definition of a "major" change is that it is incompatible, and the definition of a "minor" change is that it is forwards- and/or backwards-compatible. Usually the first broadly available version starts at "1.0". A compatible version change from 1.0 might be identified as "1.1" and an incompatible change as "2.0".

The version numbers can be contained in the texts, in the protocol messages containing the texts, or in the address for the protocol messages. Some examples are shown below:

It should be noted that associating version number changes with compatibility changes may be idealistic, as there are abundant cases where this system does not hold. New major version identifiers are often aligned with product releases, or incompatible changes are identified as a "minor" change. A good example of an incompatible change identified as a minor change is XML 1.1. XML 1.0 processors cannot process all XML 1.1 documents because XML 1.1 extended XML 1.0 where XML 1.0 does not allow such extension.

Unfortunately, version numbers often wind up looking very similar to the big bang approach. In many approaches, each language is given a version identifier, almost always a number, that's incremented each time the language changes. Although it's possible to design a system with version numbers that enables both backward and forward compatibility, for example XSLT, typically a version change is treated as if the new language is not backwards compatible with the old language.

Some efforts, such as HTTP, try to have the best of both worlds by allowing for extensibility (in HTTP's case, via headers) as well as version numbers that explicitly identify when a new version is backwards compatible with an old version.

One argument in favor of version numbers is that they allow one to determine what is a 'new version' and what is an 'old version'. But in practice this is not necessarily true. For example, RSS has 0.9x, 1.x, and 2.x versions, all being actively developed in parallel. In effect the version numbers, even though they appear to be ordered, are simply opaque identifiers. Using version numbers does not guarantee that version 1+x has any particular relationship to version 1.

Version numbers typically work best when versioning and extending a language is done in a centralized and linear manner. The makeup of each version can then be consistent and well described.

6 Case Studies

6.1 HTML

Language form                   | Markup               | Markup                      | Markup + attribute names
Schema Lang                     | DTD with changes     | Backus-Naur Form            | uFormat specific
3rd party compatibly extend     | Yes                  | No                          | Yes
3rd party incompatibly extend   | No                   | No                          | No
Designer incompatibly extend    | Yes                  | Yes                         | Yes
stand-alone                     | Yes                  | Yes                         | inside HTML
Schema design                   | Extensible           | BNF with no extensibility   | Constraints on HTML elems+attribute values
Substitution Mechanism          | Must Accept Unknowns | None                        | HTML's
Component Identification        | DTD + Name           | Name or Qualified Name      | string in class attribute
Incompatible Ext identification | None                 | None                        | None

7 Extension versus Versioning

Languages that are designed for decentralized extensibility, notably but not limited to XML, have the interesting situation where the distinction between an extension and a version can be quite blurred, depending upon the language designer’s choices.

The typical way of thinking of these two concepts is that extension is typically the addition of components over space; that is, designers other than the language’s creator are adding components. Versioning is typically the addition of components over time, under the designer’s explicit control. In either case, a change to the language may be done in a compatible or an incompatible way. We typically distinguish the terms through the simple cases: extensions are compatible decentralized additions, and versions are compatible or incompatible centralized changes. But these distinctions break down depending upon how the language is designed.

There are a couple of scenarios that illustrate the ambiguity in these terms. Imagine that version 1.0 of a Name consists of "First" and "Last" elements. A 3rd party author extends the Name with a "middle" element in a new namespace which they control.

In scenario 1, the Name author decides to formally incorporate the middle name as an optional (and hence compatible) addition to the name, producing version 1.1 of the Name type. They do this by referring to the third party’s definition for middle names. This is typically considered a new "version" of the Name and would probably result in a new definition. If the Name author re-uses the existing names for compatible revisions, there will be no difference in a text containing middle that is of Version 1.0 or Version 1.1 type. The texts are the same, and thus the distinction between a "version" and an "extension" is meaningless for an individual text.

In scenario 2, the middle author decides that the middle name is a mandatory part of the Name type. They were provided a mechanism for indicating an incompatible change and they use it. Now an instance of Name with the middle is incompatible with version 1.0 of the Name. What "version" of the Name is this middle, and is the middle an "extension" or a "version"? It isn’t 1.0. It’s probably more accurately thought of as a version defined by the 3rd party. Again, the presence of the "extension" is actually an incompatible change.

These two examples, a 3rd party extension being added into a compatible version and a 3rd party extension resulting in an incompatible version, show that the ability to specify (in)compatibility has blurred the distinction between these two terms.

8 Conclusion

This Finding is intended to motivate language designers to plan for versioning and extensibility in their languages from the very first version. It details the downsides of ignoring versioning. To help the language designer provide versioning in their language, the finding describes a number of questions, decisions and rules for use in language construction and extension. The main goal of the set of rules is to allow language designers to know their options for language design, and to make backwards- and forwards-compatible changes to their languages to achieve loose coupling between systems, should that be desirable.

9 References

Free Online Dictionary of Computing. (See http://wombat.doc.ic.ac.uk/foldoc/.)
Flexible XML Processing Profile. (See http://www.upnp.org/download/draft-goland-fxpp-01.txt.)
RFC 793, TCP (See http://www.ietf.org/rfc/rfc793.txt.)
RFC 1521, MIME. (See http://www.ietf.org/rfc/rfc1521.txt.)
HTML 2.0
RFC 1866, HTML 2.0. (See http://www.ietf.org/rfc/rfc1866.txt.)
WebDAV XMLIgnore post
Yaron Goland. XML Ignore proposed for WebDAV (See http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html.)
RFC 2518, WebDAV (See http://www.ietf.org/rfc/rfc2518.txt.)
RFC 2616, HTTP (See http://www.ietf.org/rfc/rfc2616.txt.)
HTML 4.0
HTML 4.0. (See http://www.w3.org/TR/1998/REC-html40-19980424/.)
TBL Mandatory Extensions
Berners-Lee. Web Architecture: Mandatory extensions. (See http://www.w3.org/DesignIssues/Mandatory.html.)
TBL Extensible languages
Berners-Lee. Web Architecture: Extensible languages. (See http://www.w3.org/DesignIssues/Extensible.html.)
TBL Evolution
Berners-Lee. Web Architecture: Evolvability. (See http://www.w3.org/DesignIssues/Evolution.html.)
Web Architecture: Extensible Languages
Berners-Lee and Connolly, ed. Web Architecture: Extensible Languages World Wide Web Consortium, 1998. (See http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210.)
HTML Document types
Connolly, ed. HTML Document dialects World Wide Web Consortium, 1996. (See http://www.w3.org/MarkUp/WD-doctypes.)
SOAP 1.2
W3C Recommendation, SOAP 1.2 Part 1: Messaging Framework (See http://www.w3.org/TR/SOAP/.)
Unapproved DRAFT TAG Finding, Versioning: Terminology (See http://www.w3.org/2001/tag/doc/versioning.)
WSDL 1.1
W3C Note, WSDL 1.1 (See http://www.w3.org/TR/WSDL/.)
WS-Policy 1.2
W3C Note, WS-Policy 1.2 (See http://www.w3.org/Submissions/WS-Policy/.)
XML 1.0
W3C Recommendation, XML 1.0 (See http://www.w3.org/TR/REC-xml.)
W3C Working Draft, XML Inclusions (See http://www.w3.org/TR-Xinclude.)
XML Namespaces
W3C Recommendation, XML Namespaces (See http://www.w3.org/TR/REC-xml-names.)
XML Schema Part 2
W3C Recommendation, XML Schema, Part 2 (See http://www.w3.org/TR/xmlschema-2.)
XML Schema Wildcard Test Collection
XML Schema Wildcard Test collection (See http://www.w3.org/XML/2001/05/xmlschema-test-collection/result-ms-wildcards.htm.)
XFront Schema Best Practices
XFront Schema Best Practices (See http://www.xfront.com/BestPracticesHomepage.html.)
XML.com Schema Design Patterns
Dare Obasanjo. XML.com Schema design patterns (See http://www.xml.com/pub/a/2002/07/03/schema_design.html.)
Dave Orchard writings on Extensibility and Versioning
Dave Orchard writings on extensibility and versioning (See http://www.pacificspirit.com/Authoring/Compatibility.)

10 Acknowledgements

The author thanks Norm Walsh for many contributions as co-editor until 2005. Thanks also go to the many reviewers who have contributed to the document, particularly David Bau, William Cox, Ed Dumbill, Chris Ferris, Yaron Goland, Rhys Lewis, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh.

A Change Log (Non-Normative)

DBO  20070518  Incorporated Rhys' comments, added version identifier story to forwards compatible evolution, split part 1 into terminology and strategies documents.
DBO  20070518  Incorporated WG comments from May f2f which involved many updates to, made 1 table for case studies.