W3C

[Editorial Draft] Extending and Versioning Languages: Compatibility Strategies

Draft TAG Finding 28 March 2008

This version:
http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20080328.html
Latest version:
http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies
Previous versions:
Unapproved Editors Drafts: http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20071113.html, http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20071026.html, http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20070920.html, http://www.w3.org/2001/tag/doc/versioning-compatibility-2007017.html, http://www.w3.org/2001/tag/doc/versioning-compatibility-20070704.html, http://www.w3.org/2001/tag/doc/versioning-strategies-20070518.html, http://www.w3.org/2001/tag/doc/versioning-20070518.html, http://www.w3.org/2001/tag/doc/versioning-20070326.html, http://www.w3.org/2001/tag/doc/versioning-20061212.html, http://www.w3.org/2001/tag/doc/versioning-20060726.html, http://www.w3.org/2001/tag/doc/versioning-20060717.html, http://www.w3.org/2001/tag/doc/versioning-20060710.html, http://www.w3.org/2001/tag/doc/versioning-20031116.html, http://www.w3.org/2001/tag/doc/versioning-20031003.html
Editor:
David Orchard, BEA Systems, Inc. <David.Orchard@BEA.com>

Abstract

This document focuses on providing information on how a language can be designed for forwards compatible versioning, often the hardest type of versioning to plan for. It also provides motivation for versioning and some discourse on incompatible and backwards compatible versioning. Separate documents contain the terminology definitions and XML language specific discussion.

Status of this Document

This document has been developed for discussion by the W3C Technical Architecture Group. It does not yet represent the consensus opinion of the TAG.

Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction
    1.1 Why Do Languages Change?
    1.2 Kinds of Languages
2 Versioning Strategies
    2.1 Why Have a Strategy?
        2.1.1 Identifying Languages
            2.1.1.1 Version Numbers
            2.1.1.2 XML Namespaces
3 Incompatible
4 Backwards compatible
    4.1 Replacement
    4.2 Side-by-side
5 Forwards Compatible
    5.1 Must Accept Unknowns
        5.1.1 Ignore all or only unknown part
    5.2 Fallback Provided
    5.3 Understanding unknown version identifiers
    5.4 Supporting functionality
6 Mixtures
7 Conclusion
8 References
9 Acknowledgements

Appendix

A Change Log (Non-Normative)


1 Introduction

The evolution of languages by adding, deleting, or changing syntax or information is called versioning. Making versioning work in practice is one of the most difficult problems in computing. Arguably, the Web rose dramatically in popularity because support for evolution and versioning were built into HTML and HTTP. Both systems provide explicit extensibility points and rules for understanding extensions that enable their decentralized extension and versioning.

This finding describes general problems and techniques in evolving systems in compatible ways. The terminology definitions used throughout are expressed in [Versioning]. A number of design patterns and rules are discussed with a focus towards enabling language changes such that newer version(s) of a language are processable by software that only understands the older version(s) of the language, aka forwards-compatibility. There are a few crucial good practices that enable forwards compatible versioning in a language:

  1. the language should be extensible;

  2. any extensions in a text of the language should have a well-defined default meaning (which often is that the extension conveys no information and can be ignored);

  3. if the texts of the language contain version identifiers, then a given language version should define a set of compatible future version identifiers.

1.1 Why Do Languages Change?

There are many reasons why a different version of a language may be needed. A few of them include:

  1. Bugs may need to be fixed. Production use may reveal defects or oversights that need to be fixed. This may involve changes to texts of the language or changes to the information of existing texts.

  2. Changing requirements may motivate changes in the language. For example, a person name structure may be extended with a middle name, prefix, suffix, and/or common name.

  3. Different variations of a language may be desirable. For example, the XHTML 1.0 Recommendation defines strict, transitional, and frameset languages. All three of those languages purport to define the same namespace, but they describe different languages. Additional languages may be defined by other specifications, such as the XHTML Basic Recommendation.

Whether ten, a hundred, or a million applications have been deployed, if a language is changed in such a way that all those applications will determine that texts of the new language are invalid, a versioning problem with real costs has been introduced. Whatever the cause, over time, different versions of the language exist, and designing applications that use the language to deal with the changes in a predictable, useful way requires a versioning strategy.

1.2 Kinds of Languages

Ultimately, there are different kinds of languages. The versioning approaches and strategies that are appropriate for one kind of language may not be appropriate for another. Among the various kinds of languages, we find:

  • Just strings: some languages are just set(s) of strings. Using strings to identify countries, states or provinces, airport 3 letter codes, and traffic light colors are examples of "just string" languages.

  • Non-markup Text: languages designed with a character based text format. These may be programming languages such as Java or ECMAScript, or data formats like CSS or Comma Separated Values. Typically these are intended for humans to author and/or view.

  • Markup: SGML, HTML, and the non-SGML variants of HTML are all character based markup languages. Versioning XML languages is described in ???

  • Binary: languages that are not in a text format. These may be image formats like GIF or JPEG, or even binary encoded XML.

This is by no means an exhaustive or exclusive list. Many languages have mixed modes. For example, XQuery has a non-XML text mode and an XML mode.

Languages may be composed of different languages, including different kinds of languages. If a language is composed of other languages, then languages are considered nested. There may be different versioning strategies for each nested language, and they all combine together into the overall versioning strategy.

2 Versioning Strategies

In broad terms, strategies for versioning fall into a number of classes ranging from "none" to "compatible" to "incompatible".

There's no single approach that's always correct. Different choices or decisions may be appropriate for different applications. But by the same token, the approaches that are available depend on other decisions or constraints. One very important decision is whether the language can be evolved by distributed parties such that parallel evolutionary development can occur. The point in the lifecycle of the language may also affect the selection of the versioning strategy for the language. A language commonly goes through a lifecycle of iterative development followed by deployment followed by deployment of new versions. A decision for the development cycle of the language could be different from the decision at deployment. For example, many W3C languages adopt a strategy in which incompatible changes are allowed between Working Drafts and up to Candidate Recommendation, but Proposed Recommendation and Recommendation are compatible versions.

The variety of decisions makes it imperative to plan for versioning from the start. If versioning is not planned from the start, then the possible versioning strategies may be constrained by decisions that have been made implicitly rather than explicitly.

Just as there are a number of strategies, there are a number of designs for implementing a strategy. Internet technologies, including MIME, markup languages, and XML languages, have successfully used various strategies, either singly or in combination. Summaries of strategies and requirements were produced for earlier technologies [Web Architecture: Extensible Languages] and guided XML Namespaces and Schema.

2.1 Why Have a Strategy?

Different kinds of languages and different versioning strategies expose different problems. Attempting to deploy a system that provides no versioning mechanism puts the burden of version "discovery" on consumers and is often impractical in anything except a closed system.

At the other end of the spectrum is the incompatible versioning approach, which is also problematic. "Incompatible" versioning is a very coarse-grained approach to versioning. It requires that all of the text must be understood and known by the consumer or the consumer will fault. It often establishes a single version identifier, such as a version number or namespace name, for an entire text.

The semantics of the incompatible approach are that applications decide on the basis of the version of the text (with or without a version identifier) whether or not they know how to process that text. If the version isn't recognized, the entire text is rejected. Typically, when introducing a new version using the incompatible approach, all of the software that produces or consumes the texts is updated in a sweeping overhaul in which the entire system is brought down, the new software is deployed, and the system is restarted. This incompatible approach to versioning is practical only in circumstances where there is a single controlling authority, and even in that case, it carries with it all manner of problems. The process can take a considerable amount of time, leaving the system out of commission for hours if not days. This can result in significant losses if the system is a key component of a revenue generating business process, and coordinating the system overhaul can itself be quite costly.

The incompatible approach is appropriate when the new version is radically different from its predecessor. But in many cases, the changes are incremental and often a consumer could, in practice, cope with the new version. For example, it might be that there are many messages that don't use any features of the new version or perhaps it is appropriate to simply ignore components that are not recognized.

Consider a producer and a consumer exchanging messages of a particular language. Imagine that some future version of the language defines a new component. Because producers and consumers are distributed, it may happen that an old consumer, one unprepared for a new component, encounters a message with a new component sent by a newer producer.

If incompatible versioning is used, old consumers will reject the new message. However, if the versioning strategy allowed the old consumer to process the text even with the unrecognized content, it's quite possible that other components of the system could adapt to the previous behavior. In effect, the old system would ignore the new component and so it would "see" a message that looks just like the old message it is expecting.

For the producer, the result would be that the request is fulfilled, though perhaps not taking into account the new features. For example, a request that results in a response may return a text of the language without the new component. In many cases this may be better behavior than receiving an error. In particular, producers using the new language can be written to cope with the possibility that they will be communicating with older consumers.

If the new system needs to make sure that the new component is understood, then it can change the language in a way to indicate that the new behavior is not considered optional, aka backwards compatible.

Often, what is needed is some sort of middle ground solution.

2.1.1 Identifying Languages

As part of a strategy for language design, it is often necessary to be able to determine the specific language of a given text. This is often done by providing an identifier of the version of text, such as a version number or some other structure such as an XML Namespace. Regardless of any particular technologies chosen, the language should have an explicit version identification strategy.

Good Practice

Language Identification rule: Any language intended for versioning SHOULD have a version identification strategy.

The difficult, important, and often overlooked part of a version identification strategy is specifying the meaning and interpretation when a consumer encounters a version identifier it does not know about. A typical problem with version identifiers is that it is unclear what is being identified with the version identifiers. The common misconception is that a version identifier, such as a number, indicates the version of the document. That may be true, but in many cases it is an incomplete and unusable assertion.

In the scenario of documents, there are many possibilities for what a version identifier identifies. For a document that is potentially in many different versions of a language, an identifier that is a number could be:

  • The highest version of the language the document is compatible with

  • The lowest version of the language the document is compatible with

  • The lowest version of the language that has all the features the document uses

  • The range of versions of the language the document is compatible with

Imagine a name language in which version 1 consists of first, last, and extensions, and version 2 consists of first, last, optional middle, and extensions. If a name contains first and last, should the identifier be version 1 or 2? The previous options yield answers of: 2, 1, 1, 1-2. If a name contains first, last, and middle, then the previous options yield answers of: 2, 1, 2, 1-2.
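The four interpretations can be worked through mechanically. The following is only a sketch under stated assumptions: the dictionary of versions, the function names, and the treatment of any field a version does not define as an extension are all hypothetical and introduced here for illustration.

```python
# Hypothetical model of the name language: each version is the set of
# fields it defines. Both versions also allow extensions.
LANGUAGE_VERSIONS = {
    1: {"first", "last"},
    2: {"first", "last", "middle"},
}
REQUIRED_FIELDS = {"first", "last"}


def compatible_versions(doc_fields):
    """Versions the document is compatible with.

    Assumption: a field that a version does not define is treated as an
    extension, so any document with the required fields is compatible
    with every version.
    """
    return sorted(v for v in LANGUAGE_VERSIONS if REQUIRED_FIELDS <= doc_fields)


def lowest_version_with_all_features(doc_fields):
    """Lowest version that defines every field the document uses."""
    return min(v for v, fields in LANGUAGE_VERSIONS.items() if doc_fields <= fields)


def identifier_options(doc_fields):
    """The four candidate meanings of a numeric version identifier."""
    versions = compatible_versions(doc_fields)
    return {
        "highest_compatible": max(versions),
        "lowest_compatible": min(versions),
        "lowest_with_all_features": lowest_version_with_all_features(doc_fields),
        "compatible_range": (min(versions), max(versions)),
    }
```

Calling `identifier_options({"first", "last"})` gives 2, 1, 1, and the range 1-2, and adding `"middle"` changes only the third answer to 2, matching the answers listed above.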

Usually, the document contains the version number of the latest version of the language that the producing application understands. Thus the "newest" version of the identifier is used, even if the document itself is valid under older versions of the language. This usually works fine if the producer and consumer are at the same version, or even if the consumer understands the older and the newer version. But forwards compatibility requires that a consumer that doesn't understand the newer version must somehow treat the document as if it were an older version. Approaches for using version identifiers to enable forwards compatibility are covered in 5 Forwards Compatible.

As a side note, version identifiers are often used in protocols that exchange documents. One scenario is that the version number identifies the highest version of the language the document is compatible with AND the highest version of the language that the producer will understand when it is treated as a consumer. In this case it is a protocol version identifier, not just a format identifier. HTTP is a good example. The HTTP specification says "The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further HTTP communication, rather than the features obtained via that communication." Because HTTP is a request-response protocol, the capacity for further HTTP communication is the crucial "version" information conveyed. Most documents that might have version numbers do not fit that "future capacity" use case; they are just documents that do not have a protocol. There are cases where a document format is combined with a protocol, such as Atom. These combined protocol/format cases are fairly rare and not generally applicable.

2.1.1.1 Version Numbers

Version identification has traditionally been done with a decimal point separating the major version from the minor version, e.g. "8.1", "1.0". Often the definition of a "major" change is that it is incompatible, and the definition of a "minor" change is that it is forwards and/or backwards compatible. Usually the first broadly available version starts at "1.0". A compatible version change from 1.0 might be identified as "1.1" and an incompatible change as "2.0".

The version numbers can be contained in the texts, in the protocol messages containing the text, or in the address for the protocol messages. Some examples are shown below:

Example 1: Name examples with version identifiers.
<name version="2.0">
  <given>Dave</given>
  <family>Orchard</family>
</name>

<span class="fn20">Dave Orchard</span>

urn:nameschemev2:given:Dave:family:Orchard

<?xml version="1.1"?>

GET /name/123456789 HTTP/1.1

GET /name/v2/123456789/ HTTP/1.1

It should be noted that associating version number changes with compatibility changes may be idealistic, as there are abundant cases where this system does not hold. New major version identifiers are often aligned with product releases, or incompatible changes are identified as a "minor" change. A good example of an incompatible change identified as a minor change is XML 1.1. XML 1.0 processors cannot process all XML 1.1 documents because XML 1.1 extended XML 1.0 where XML 1.0 does not allow such extension.

Unfortunately, version numbers often wind up looking very similar to the incompatible approach. In many approaches, each language is given a version identifier, almost always a number, that's incremented each time the language changes. Although it's possible to design a system with version numbers that enables both backward and forward compatibility (XSLT, for example), typically a version change is treated as if the new language is not backwards compatible with the old language.

Some efforts, such as HTTP, try to have the best of both worlds by allowing for extensibility (in HTTP's case, via headers) as well as version numbers that explicitly identify when a new version is backwards compatible with an old version.

One argument in favor of version numbers is that they allow one to determine what is a 'new version' and what is an 'old version'. But in practice this is not necessarily true. For example, RSS has 0.9x, 1.x, and 2.x versions, all being actively developed in parallel. In effect the version numbers, even though they appear to be ordered, are simply opaque identifiers. Using version numbers does not guarantee that version 1+x has any particular relationship to version 1.

Version numbers typically work best when versioning and extending a language is done in a centralized and linear manner. The makeup of each version can then be consistent and well described.

2.1.1.2 XML Namespaces

There are many cases where decentralized and non-linear versioning is desired. The desire for decentralized and non-linear versioning and extensibility was a large motivator for XML and for XML Namespaces. The self-describing and extensible nature of XML markup, and the addition of XML Namespaces, provides a framework for developing languages that can evolve in a decentralized manner. XML Namespaces [XML Namespaces 1.0] provide a mechanism for associating a URI with an XML element or attribute name, thus specifying the language of the name. This also serves to prevent name collisions.

3 Incompatible

As desirable as compatible evolution often is, sometimes a language may not want to allow it. In this model, a consumer will generate a fault if it finds a component it doesn’t understand. An example might be a security specification where a consumer must understand each and every extension. This suffers from the significant drawback that it does not allow compatible changes to occur in the language, as any change requires both consumer and producer to change. As this finding focuses on compatible versioning, incompatible evolution is not discussed further.

4 Backwards compatible

Backwards compatible evolution of a language means that producers of texts in a language should be able to produce texts that consumers that have been updated with a newer version of the language will understand. It generally means supporting previous versions of text in a newer consumer. There are two significant ways that backwards compatibility can be supported.

4.1 Replacement

In the replacement design, a new version of software replaces the old and the new version of the software supports the old and the new version. The producer may or may not need to distinguish between the old and the newer consumer or the texts produced. For example, a web resource that supports additional Name Information as input may not need to change the URI of the resource.

As our definition of backwards compatibility specifies that the newer language's Defined Text Set must be a superset of the older language's Defined Text Set, the typical change is the addition of optional content into the newer version of the language. The older producer simply won't produce texts with the newer content. It is possible to reduce the Defined Text Set by removing items and still achieve backwards compatibility, as long as the newer language's Accept Text Set contains all the texts originally in the Defined Text Set. One mechanism to do this is to replace the content with a construct that allows the removed construct.
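For a "just strings" language, the text-set condition described here can be checked with plain set operations. This is a minimal sketch: the function name and the traffic-light sets are hypothetical, and it checks only the text sets, not whether the information conveyed by each text remains compatible.

```python
def accepts_all_old_texts(old_defined_texts, new_accept_texts):
    """True if every text defined by the old language is accepted by the new one."""
    return old_defined_texts <= new_accept_texts


# Hypothetical "just strings" language of traffic light colors.
OLD_DEFINED = {"red", "yellow", "green"}

# Typical change: optional content is added, so the new Defined Text Set
# is a superset of the old one.
NEW_DEFINED_ADDITIVE = OLD_DEFINED | {"flashing red"}

# Reduction: "yellow" is removed from the new Defined Text Set, but the
# new Accept Text Set still contains it.
NEW_DEFINED_REDUCED = {"red", "green"}
NEW_ACCEPT_REDUCED = {"red", "yellow", "green"}
```

Here `accepts_all_old_texts(OLD_DEFINED, NEW_ACCEPT_REDUCED)` holds even though "yellow" is no longer defined, whereas checking against `NEW_DEFINED_REDUCED` alone would fail.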

4.2 Side-by-side

In the side-by-side design, the new version of the software and the old version of the software are deployed "side-by-side". One variant of the approach offers both versions of the system, for example by using different URIs for the old and the new, with a particular focus on enabling older versions of applications to operate on inputs that make use of newer language features. A request to one resource gets mapped to the other resource behind the scenes using a proxy or gateway. This "alternative" approach works when the intermediary can completely handle or generate the new information (for backwards compatibility) or accept the new information (for forwards compatibility). For example, adding SSL security to a resource changes the URI, but a Web server can typically handle mapping the https: URI to the older http: URI. If both URIs are maintained, then the addition is a compatible change. Another example is where new information is required, such as the priority, and the intermediary can apply a default value to provide the required priority. However, this too has its costs, as multiple versions of the software must be supported and maintained over time, and there is the added cost of developing the proxy or gateway between the two environments. Further, this does not work in scenarios where the intermediary cannot generate the new required content. For example, if a middle name is required in V2, a middle name cannot be generated from just a family and a given name.
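The intermediary's role can be sketched as a small mapping function. Everything named here is an assumption made for illustration: the message is modelled as a dictionary, and `DEFAULT_PRIORITY`, `upgrade_v1_to_v2`, and the `middle_required` flag are hypothetical.

```python
DEFAULT_PRIORITY = "normal"  # hypothetical default applied by the intermediary


def upgrade_v1_to_v2(message, middle_required=False):
    """Map a V1 message onto the input expected by a V2 resource.

    A required value that has a sensible default (the priority) can be
    generated by the intermediary. A required value that cannot be
    derived (a middle name) cannot, so the mapping fails.
    """
    upgraded = dict(message)
    upgraded.setdefault("priority", DEFAULT_PRIORITY)
    if middle_required and "middle" not in upgraded:
        raise ValueError("a middle name cannot be generated from given and family names")
    return upgraded
```

With `middle_required=False` the old message is forwarded with a default priority added; with `middle_required=True` the function raises, which is the case where the side-by-side design does not work.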

5 Forwards Compatible

Forwards compatible evolution of a language means that producers of texts in a language should be able to produce texts in a revision of the language without consumers having to change existing implementations that know of only the original language. The most common characteristic of a compatible change is the addition of syntax and/or features in a language, usually using the original language's extensibility mechanisms. However, languages may change in a forwards compatible way through the removal of features or syntax. This finding deals with the common case, in which features and/or syntax are added not removed, and in which the goal is to maximize compatibility when making such changes.

A good example of the benefits of extensibility is HTML. The first version of HTML was designed for extensibility; it specified that "unknown markup" may be encountered. An example of an extension made under this rule is the addition of the IMG tag. Perhaps the opposite example is XML 1.0. The first version of XML was designed for language authors to create their own elements and attributes, but it did not allow for new unknown content: it did not allow for new characters in element or attribute names, or for new punctuation such as different quote characters. This made it very difficult to move to XML 1.1 and to add new characters to element names for internationalization and other purposes.

The first rule introduced in this Finding relating to extensibility is:

ednote

Please select one of the following 3 alternatives for the finding

Good Practice

Be Extensible rule: Languages should be Extensible.

Good Practice

Be Extensible rule: Languages that will be versioned should be Extensible.

We have observed that languages that are successfully versioned are generally extensible.

5.1 Must Accept Unknowns

A language whose syntax allows additional unknown text also requires a specification of what happens when a text contains such additional and unknown text. By the definition of Extensibility, there must be a default rule for interpreting any additional unknown text. If the extensibility is used in a forwards-compatible way, then by definition the software consuming the extension does not know about the extension and we call such extension an unknown extension. If the software consuming the extension "knows" about the extension, then it has been revised and uses the revised language that incorporates the extension. The behavior of software when it encounters an unknown extension should be clear.

The simplest model that enables forwards-compatible changes is to require that a language consumer must accept content that is unknown. This rule is:

Good Practice

Must Accept Unknowns Rule: Consumers MUST accept text portions that they do not recognize where the language has allowed extensibility.

HTML 4.01 follows this approach in "If a user agent encounters an element it does not recognize, it should try to render the element's content". The Must Accept Unknowns rule for XML was first standardized in the WebDAV specification RFC 2518 [6] section 14 and later separately published as the Flexible XML Processing Profile [3].

An extension that affects an existing component in an incompatible way is an incompatible change because a consumer that is unaware of the extension will produce Information from the text that is incompatible with the intended Information. An additional rule is required:

Good Practice

Preserve existing information Rule: An Extensible Language SHOULD require that any text with extensions be compatible with a text without the extensions.

An example of an incompatible extension because of a violation of this rule is a purchase order with a payment amount that is in US dollars that is extended by an element that specifies the payment amount is in Euros. An older consumer will incorrectly assume that the payment amount is still in US dollars when in fact it is in Euros.
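The failure can be made concrete with a sketch of an older consumer. The dictionary representation of the purchase order and the names `read_amount_v1` and `currency` are hypothetical, chosen only to illustrate the example above.

```python
def read_amount_v1(order):
    """Older consumer: the payment amount is always in US dollars.

    Unknown fields, including a later "currency" extension, are ignored.
    """
    return {"amount": order["amount"], "currency": "USD"}


# A newer producer extends the order so that the amount is in Euros.
extended_order = {"amount": 100, "currency": "EUR"}

# The older consumer accepts the text but derives incompatible information.
misread = read_amount_v1(extended_order)
```

The older consumer reports 100 US dollars for an order that was meant as 100 Euros, so the extension violates the rule even though the text was accepted.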

There are two commonly deployed refinements of the Must Accept rule that qualify what kind of handling beyond accepting is required.

Must Accept and Ignore Unknowns Rule: Consumers SHOULD accept and ignore any text portion that they do not recognize. This is commonly shortened to "Must Ignore Unknowns Rule". HTML 1, 2 and 3.2 follow the Must Accept and Ignore Unknowns Rule as they specify that any unknown start tags or end tags are mapped to nothing during tokenization.

Ignoring content is a simple solution for the default processing rule for unknown content. In order to achieve a compatible evolution, the newer texts of a language must be able to be treated as older texts if the unknown content is ignored. Object systems typically call this "polymorphism", where a new type can behave as the old type.
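A minimal sketch of a consumer following the Must Ignore Unknowns Rule, using Python's standard `xml.etree.ElementTree` module. The element names follow the name example used earlier in this finding; the `middle` element, its placeholder value, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that the older (version 1) consumer knows about.
KNOWN_V1_ELEMENTS = {"given", "family"}


def read_name_v1(xml_text):
    """Read a name, accepting and ignoring any unknown child elements."""
    root = ET.fromstring(xml_text)
    name = {}
    for child in root:
        if child.tag in KNOWN_V1_ELEMENTS:
            name[child.tag] = child.text
        # Any other element is an unknown extension: it is accepted
        # (no fault is raised) and ignored (it contributes nothing).
    return name


newer_text = (
    "<name><given>Dave</given><middle>X</middle>"
    "<family>Orchard</family></name>"
)
```

`read_name_v1(newer_text)` yields the same result as it would for a text without the `middle` element, which is the sense in which the newer text can be treated as an older text.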

Another model is to preserve the unknown. Must Accept and Preserve Unknowns Rule: Consumers SHOULD accept and preserve any text portion that they do not recognize. HTTP 1.1 [7] specifies that a transparent proxy should accept and preserve any headers it doesn't understand: "Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies."
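The preserve model can be sketched for an intermediary that passes header fields along. This is an illustration only, not an implementation of an HTTP proxy: headers are modelled as a dictionary, and `KNOWN_HEADERS` and `relay_headers` are hypothetical names.

```python
# Header names this hypothetical intermediary understands.
KNOWN_HEADERS = {"host", "content-type"}


def relay_headers(headers):
    """Return (understood, forwarded) header dictionaries.

    Only recognized headers are acted on locally, but every header,
    recognized or not, is preserved in what is forwarded.
    """
    understood = {k: v for k, v in headers.items() if k.lower() in KNOWN_HEADERS}
    forwarded = dict(headers)
    return understood, forwarded
```

An unrecognized header appears in `forwarded` but not in `understood`, so a newer recipient further along can still make use of it.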

These models may be combined together. Web browsers often implement both variants. Unknown elements are ignored for the purposes of rendering, but the elements are still placed in the DOM and are available for CSS or other DOM related technologies.

Another way of looking at this combination is that there are two languages. The browser renderer understands HTML, which involves ignoring unknown elements or attributes. By our language definitions, the renderer's Defined Text Set does not include unknown elements or attributes, though the Accept Text Set does. The browser DOM understands any HTML elements or attributes. The DOM's Defined Text Set and Accept Text Set include any elements or attributes.

5.1.1 Ignore all or only unknown part

In tree based languages, which includes most markup languages, there are two variants of the Must Accept rules for dealing with extensions, either ignoring the entire tree or just the unknown part of the tree.

The rule for ignoring the entire tree is: Must Accept All Rule: The Must Accept rule applies to unrecognized texts and any tree descendants. This variation on Must Accept requires the consumer to accept the text and any children it does not understand. Most data applications, such as Web services that use SOAP header blocks or WSDL extensions, adopt this approach to dealing with unexpected markup. For example, if a SOAP message is received with a SOAP header block that contains unrecognized child elements, the child elements and their children must be ignored unless marked as "Must Understand". Note that this rule is not broken if the unrecognized elements are written to a log file. That is, "accepted" or "ignored" doesn’t mean that unrecognized extensions can’t be processed; only that they can’t be the grounds for failure to process.

Other applications may need a different rule as the application may want to retain the content of an unknown component, perhaps for display purposes. Must Accept Container Node Rule: The Must Accept rule applies only to the smallest node in the tree. This variation on Must Accept requires the consumer to accept the smallest part of the text or node that is ignorable. For markup languages, this could be just an element or attribute that the consumer does not understand; in the case of an element, the consumer still processes the children of that element. HTML is an example of this, where the start or end tag is ignored and any children are processed in place of the start or end tag. The Must Accept Container Node practice was described in [HTML 2.0]. This retains the element descendants in the processing model so that they can still affect interpretation of the text, such as for display purposes.
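The difference between the two variants can be seen by extracting displayable text from a small tree. This is a sketch using Python's standard `xml.etree.ElementTree` module; the set of known elements, the sample markup, and the `visible_text` function are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

KNOWN_ELEMENTS = {"p", "b"}  # hypothetical set of recognized elements


def visible_text(element, known, ignore_descendants):
    """Collect displayable text from a tree that may contain unknown elements.

    ignore_descendants=True  follows the Must Accept All rule: an unknown
                             element is skipped together with its subtree.
    ignore_descendants=False follows the Must Accept Container Node rule:
                             only the unknown tag itself is skipped and its
                             children are processed in its place.
    """
    if element.tag not in known and ignore_descendants:
        return ""
    parts = [element.text or ""]
    for child in element:
        parts.append(visible_text(child, known, ignore_descendants))
        parts.append(child.tail or "")  # text following the child
    return "".join(parts)


doc = ET.fromstring("<p>Hello<blink> big <b>wide</b></blink> world</p>")
```

With the unknown `blink` element, the container-node variant keeps the text inside it while the accept-all variant drops the whole subtree.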

Each variant has different costs and benefits. Choosing to ignore the container node only helped HTML considerably, but there are some elements whose children also should be ignored for rendering, particularly the Script element. It is possible to design a more complicated system that mixes the two together. For example, HTML could have provided inline syntax such as an ignore-children attribute that allowed an author to specify that the element and its children should be treated with the Must Accept All rule. This has various problems too, such as a cumbersome syntax for such additions, especially when the extension becomes part of the language. The language designers could have made the ignore-children attribute optional in the first official version of the Script element, with absence of the attribute meaning a default of ignore all children. That would enable the old and the new browsers to correctly process the Script element. Then authors or even the language designers could decide at some point to remove the attribute from the Script element.

5.2 Fallback Provided

A language can provide mechanisms for explicit fallback if the text is not supported. [MIME] provides multipart/alternative for equivalent, and hence fallback, representations of content. [HTML 4.0] uses this approach in the NOFRAMES element. In XML, the XML Inclusions specification [XInclude] provides a fallback element to handle the case where the putatively included resource cannot be retrieved. There are many variations on where the fallback content can be found. For example, a schema language could specify that fallback content is found in a text, in a schema, or even in the schema for the schema language.
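
As a minimal sketch of explicit fallback, the following Python function selects among alternative representations in the style of multipart/alternative, where parts are listed in increasing order of preference and the consumer takes the last one it supports. The function name choose_alternative, the sample parts and the media type application/x-future are hypothetical.

```python
def choose_alternative(parts, supported):
    """Pick the most preferred representation the consumer supports.

    `parts` is a list of (media_type, body) pairs in increasing order
    of preference. Returns None if nothing is supported."""
    for media_type, body in reversed(parts):
        if media_type in supported:
            return media_type, body
    return None

parts = [
    ("text/plain", "Hello"),
    ("text/html", "<p>Hello</p>"),
    ("application/x-future", "Hello in a format invented later"),
]
# This consumer predates application/x-future and falls back to HTML.
chosen = choose_alternative(parts, supported={"text/plain", "text/html"})
```

The producer pays the cost of supplying equivalent content more than once; in return, an older consumer always has something it can process.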

5.3 Understanding unknown version identifiers

Providing forwards compatibility often requires more than a substitution model for texts; the language must also provide a substitution model for any version identifiers. The fundamental problem with most designs of a single version identifier in a document is that they usually don't allow a given document to be valid under more than one version. For forwards compatibility, the version identifier must specify a "space of versions" that a given document is valid under, whether that's a list, a regular expression, or some other algorithm. In particular, the version identification strategy must specify how unknown versions are dealt with.

Good Practice

Default Unknown Version Identifier Handling Rule: Languages MUST provide a default model for unknown version identifiers for forwards-compatible evolution.

The handling model could be an algorithmic approach. For version numbers, one could say that version numbers will only have a "major" change if there is an incompatible change. For example, version 1.1 of a language is by definition compatible with version 1.0 and version 2.0 is incompatible. Then, when the producer puts 1.0, 1.1, or 2.0, a consumer at any level will know whether it can process the content. This also means that there is a choice about which version number to put in, the lowest or the highest. A document that contains "1.1" means that any 1.X processor can process it. A "2.0" document means that a 1.X processor cannot process it, but any "2.X" processor can.

Then the language should have wording about processing unknown version numbers. Sample wording for a handling model for version identifiers is, "A processor of this version MUST NOT fault if it receives a document that contains the same major version number." This rule would be in conjunction with forwards-compatible design for the texts, such as "Must Accept Unknowns".
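
The major/minor handling model can be sketched in a few lines of Python. The "major.minor" syntax and the function names parse_version and can_process are assumptions for illustration; a real language would define its own identifier syntax.

```python
def parse_version(identifier):
    """Split a 'major.minor' identifier into two integers."""
    major, minor = identifier.split(".")
    return int(major), int(minor)

def can_process(document_version, processor_version):
    """A processor accepts any document with the same major version,
    including minor versions it has never seen."""
    doc_major, _ = parse_version(document_version)
    proc_major, _ = parse_version(processor_version)
    return doc_major == proc_major

known_minor = can_process("1.0", "1.0")     # the version the processor was built for
unknown_minor = can_process("1.7", "1.0")   # unknown minor, same major
unknown_major = can_process("2.0", "1.0")   # incompatible by definition
```

Under this convention a 1.0 processor does not fault on a 1.7 document; it relies on the language's Must Accept rules to skip whatever 1.7 added.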

HTML handled unknown version numbers in a forwards compatible way because browsers ignored the version numbers they didn't understand. On the other hand, XML 1.0 did not specify what an XML 1.0 processor should do when an XML document identified as XML 1.1 or XML 2.0 was encountered. As such, many XML 1.0 processors faulted when encountering XML 1.1 documents, whether those documents contained XML 1.1 content or not. Note that XML 1.1 specified that documents should only be specified as XML 1.1 documents if they had XML 1.1 content. Perhaps if XML 1.0 had specified that any document marked XML 1.X should be processable as an XML 1.0 document, then the migration to XML 1.1 would have been easier.

5.4 Supporting functionality

Additional functionality can be provided in a language for determining the capabilities of the system that the text is being interpreted in. A language can provide a mechanism for explicit testing. The XSLT Specification provides a conditional logic element and a function to test for the existence of extension functions. This allows designers of stylesheets to deal with different consumer capabilities in an explicit fashion.

6 Mixtures

Languages can choose a mixture of approaches. For example, XSLT provides both an explicit fallback mechanism for some conditions and explicit testing for others. The SOAP specification, another example, specifies Must accept as the default strategy and the ability to dynamically mark components as being in the Must Understand strategy.
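
The SOAP mixture can be sketched as follows. The namespace URI and the mustUnderstand attribute are those of SOAP 1.2; the header namespaces urn:example:tx and urn:example:ext, the UNDERSTOOD set, the exception class and the function name check_headers are hypothetical. The sketch ignores SOAP roles and does not build a real SOAP fault.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical: the only header block this consumer understands.
UNDERSTOOD = {"{urn:example:tx}transaction"}

class MustUnderstandFault(Exception):
    """Raised for an unrecognized header block marked mustUnderstand."""

def check_headers(envelope, understood=UNDERSTOOD):
    """Return the tags of header blocks to process. Unknown blocks are
    ignored by default, but cause a fault if marked mustUnderstand."""
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    to_process = []
    if header is None:
        return to_process
    for block in header:
        if block.tag in understood:
            to_process.append(block.tag)
            continue
        flag = block.get(f"{{{SOAP_ENV}}}mustUnderstand", "false")
        if flag in ("true", "1"):
            raise MustUnderstandFault(block.tag)
        # Otherwise: Must Accept, so the unknown block is ignored.
    return to_process

message = ET.fromstring(
    f'<env:Envelope xmlns:env="{SOAP_ENV}" '
    'xmlns:tx="urn:example:tx" xmlns:x="urn:example:ext">'
    '<env:Header><tx:transaction>5</tx:transaction>'
    '<x:tracing>on</x:tracing></env:Header>'
    '<env:Body/></env:Envelope>'
)
processed = check_headers(message)
```

If the x:tracing block carried env:mustUnderstand="true", the same call would raise MustUnderstandFault instead of silently ignoring the block.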

7 Conclusion

This Finding is intended to motivate language designers to plan for versioning and extensibility in their languages from the very first version. It details the downsides of ignoring versioning. To help language designers provide versioning in their languages, the Finding describes a number of good practices for use in language construction and extension. The main goal of the set of rules is to let language designers know their options for language design, and make backwards- and forwards-compatible changes to their languages to achieve loose coupling between systems, should that be desirable.

8 References

FOLDOC
Free Online Dictionary of Computing. (See http://wombat.doc.ic.ac.uk/foldoc/.)
FlexXMLP
Flexible XML Processing Profile. (See http://www.upnp.org/download/draft-goland-fxpp-01.txt.)
tcp
RFC 793, TCP (See http://www.ietf.org/rfc/rfc793.txt.)
MIME
RFC 1521, MIME. (See http://www.ietf.org/rfc/rfc1521.txt.)
HTML 2.0
RFC 1866, HTML 2.0. (See http://www.ietf.org/rfc/rfc1866.txt.)
WebDAV XMLIgnore post
Yaron Goland. XML Ignore proposed for WebDAV. (See http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html.)
WebDAV
RFC 2518, WebDAV (See http://www.ietf.org/rfc/rfc2518.txt.)
HTTP
RFC 2616, HTTP (See http://www.ietf.org/rfc/rfc2616.txt.)
HTML 4.0
HTML 4.0. (See http://www.w3.org/TR/1998/REC-html40-19980424/.)
TBL Mandatory Extensions
Berners-Lee. Web Architecture: Mandatory extensions. (See http://www.w3.org/DesignIssues/Mandatory.html.)
TBL Extensible languages
Berners-Lee. Web Architecture: Extensible languages. (See http://www.w3.org/DesignIssues/Extensible.html.)
TBL Evolution
Berners-Lee. Web Architecture: Evolvability. (See http://www.w3.org/DesignIssues/Evolution.html.)
Web Architecture: Extensible Languages
Berners-Lee and Connolly, ed. Web Architecture: Extensible Languages World Wide Web Consortium, 1998. (See http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210.)
HTML Document types
Connolly, ed. HTML Document dialects World Wide Web Consortium, 1996. (See http://www.w3.org/MarkUp/WD-doctypes.)
SOAP 1.2
W3C Recommendation, SOAP 1.2 Part 1: Messaging Framework (See http://www.w3.org/TR/SOAP/.)
Versioning
Unapproved DRAFT TAG Finding, Versioning: Terminology (See http://www.w3.org/2001/tag/doc/versioning.)
Versioning XML including XML Schema
Unapproved DRAFT TAG Finding, Extending and Versioning Languages: XML Languages (See http://www.w3.org/2001/tag/doc/versioning-xml.)
WSDL 1.1
W3C Note, WSDL 1.1 (See http://www.w3.org/TR/WSDL/.)
WS-Policy 1.2
W3C Note, WS-Policy 1.2 (See http://www.w3.org/Submissions/WS-Policy/.)
XML 1.0
W3C Recommendation, XML 1.0 (See http://www.w3.org/TR/REC-xml.)
XInclude
W3C Working Draft, XML Inclusions (See http://www.w3.org/TR-Xinclude.)
XML Namespaces
W3C Recommendation, XML Namespaces (See http://www.w3.org/TR/REC-xml-names.)
XML Schema Part 2
W3C Recommendation, XML Schema, Part 2 (See http://www.w3.org/TR/xmlschema-2.)
XML Schema Wildcard Test Collection
XML Schema Wildcard Test collection (See http://www.w3.org/XML/2001/05/xmlschema-test-collection/result-ms-wildcards.htm.)
XFront Schema Best Practices
XFront Schema Best Practices (See http://www.xfront.com/BestPracticesHomepage.html.)
XML.com Schema Design Patterns
Dare Obasanjo. XML.com Schema design patterns. (See http://www.xml.com/pub/a/2002/07/03/schema_design.html.)
Dave Orchard writings on Extensibility and Versioning
Dave Orchard writings on extensibility and versioning (See http://www.pacificspirit.com/Authoring/Compatibility.)

9 Acknowledgements

The author thanks Norm Walsh for many contributions as co-editor until 2005. Thanks also to the many reviewers who have contributed to the document, particularly David Bau, William Cox, Ed Dumbill, Chris Ferris, Yaron Goland, Rhys Lewis, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh.

A Change Log (Non-Normative)

Changes
Who | When | What
DBO | 20070518 | Incorporated Rhys' comments, added version identifier story to forwards compatible evolution, split part 1 into terminology and strategies documents.
DBO | 20070518 | Incorporated WG comments from May f2f which involved many updates to 2.2.2.2, made 1 table for case studies.
DBO | 20080328 | Incorporated WG comments from Feb f2f and Noah's Feb comments which all involved many updates.