W3C TAG Editor's Draft:
Architectural Considerations for Language Versioning for the Web

Abstract

Introduction

This document focuses on how languages evolve, with the aim of providing guidance for W3C Working Groups considering how to write extensible languages, whether to provide version indicators, and how to control the extensibility of languages when introducing new versions.

The TAG has spent many years looking at versioning in general, and there are a number of drafts on which consensus has been elusive. The primary motivation for continuing this work has been to look specifically at the issues and requirements for versioning in HTML, between HTML4 and HTML5.

W3C defines languages. Languages evolve, creating new "versions" of the language. Evolution is either incremental or drastic. After evolution, new readers need to work with older content, and new content needs to not "break" inappropriately with older readers. How should language designers plan for evolution, and how should they advise implementors of consumers and creators of content about how to "future proof" what they are making?

Background

The history of TAG work in versioning is summarized in Appendix I. More recently, the TAG reviewed the history of previous work: http://lists.w3.org/Archives/Public/www-tag/2009Apr/0028.html , and updated ISSUE-41 http://lists.w3.org/Archives/Public/www-tag/2009Apr/0061.html.

Terminology

Language, Version, Version Indicator, Sender, Receiver

Nature of Language Changes for New Versions

Analyze the kinds of language changes that have occurred in the past and might occur in the future, so we can correlate them to the utility of version indicators (not really started)

Reasons for Language Changes

Why do languages evolve, in ways that might need to be called out as separate versions? Specifically, why might HTML evolve?

How can we define HTML5 today such that, if problems are discovered that require incompatible language changes, we don't have rampant compatibility problems when implementations are updated to a later version?

In the history of computer science, it is difficult to come up with any language that has not evolved, been extended, or otherwise "versioned" as long as the language has been in use. This history of extension and change applies to network protocols, character encoding standards, programming languages, and certainly to every known technology found on the web.

It is difficult to come up with cases where a language hasn't gone through at least some minor incompatible change.

The standards process is established as a way of evolving specifications and implementations in a way to reduce the likelihood of complete failure to interoperate, but certainly not to guarantee that no incompatible changes will be needed in the future.

Reasons why Languages (and HTML in particular) will need changes in the future:

  1. Requirements change: This is the main reason for evolution of languages -- people want the language to support some new feature that hadn't been thought of at the time of the original language design. Often requirements can be accommodated without actually changing the behavior of anything else, but at times, something resembling a "version" is necessary.
  2. Difficulties are uncovered after CR: Two implementations aren't representative. The "Candidate Recommendation" exit criteria require only two implementations, and do not even require spanning the breadth of applicable hardware and software. Can HTML5/CSS3 work well on an electronic paper display such as Kindle? Can it work well in a collaborative multi-pointer system? Is there a single "focus" or "tab order"? Does it work well with typical "remote control" devices used for TVs? These are current platforms on which the language is not required to work well in order to exit CR.
  3. Ambiguities appear: Implementors get together and write a specification. They're happy because the spec matches what they implement -- or so they think! However, all of the implementors were part of the spec development process and, amazingly, there are some things they know and agree on that aren't part of the spec (no matter how brilliant and wise the spec editor). Later, someone else comes along and implements the spec as written, but, either because of confusing wording or missing information, their implementation is incompatible. Then there's a desire to update the spec to resolve the ambiguity, but there is no way for authors to create material that acknowledges that the author has chosen the new (unambiguous) definition over the previous (ambiguous) one.

Certainly there are other reasons for language evolution and there's some overlap between these.

Version Indicators: How New Versions of languages might be marked

There are many ways in which the "version" or "nature" of an entity might be indicated. This section enumerates the kinds of version indicators available generally (out of band, in-band global, in-band local) and specifically for HTML  (MIME types, comments, DOCTYPE, new tags, namespaces).

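Version indicators can either be out-of-band (carried outside the content, for example in a MIME type) or in-band (carried in the content itself).

In-content version indicators can either be global (applying to the whole document) or local (applying only to part of it).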

Nature of Version Changes

Languages can change through augmentations, restrictions, clarifications, or incompatible changes.

Whether language changes can be recognized without version indicators depends on the type of change: Some augmentations might be recognized by the appearance of syntax that wasn't previously recognized (i.e., the "version indicator" is the use of the feature itself). Augmentations might be ignored or merely processed incorrectly by old implementations rather than being recognized as intended with a formerly unimplemented interpretation. Restrictions, clarifications, and incompatible changes cannot readily be detected by scanning, though.
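As a rough illustration of an augmentation acting as its own version indicator, the following Python sketch scans a document for element names outside a known vocabulary. The vocabulary KNOWN_V1 and the sample document are assumptions made up for this example; HTMLParser comes from the Python standard library.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary of an "old" language version (an assumption).
KNOWN_V1 = {"html", "head", "title", "body", "p", "em"}

class UnknownTagScanner(HTMLParser):
    """Collects start tags that the old vocabulary does not define."""

    def __init__(self):
        super().__init__()
        self.unknown = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag not in KNOWN_V1:
            self.unknown.add(tag)

def unknown_tags(text):
    """Return the set of element names in text that are not in KNOWN_V1."""
    scanner = UnknownTagScanner()
    scanner.feed(text)
    scanner.close()
    return scanner.unknown

augmented = "<html><body><p>Hi <video src='a.ogv'></video></p></body></html>"
print(unknown_tags(augmented))
```

A scan of this kind can only notice new syntax. If a later version restricted or reinterpreted an element that is already in KNOWN_V1, the scanner would report nothing.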

Version Detection: How can receiving software detect versions

Use of "modes" in browsers ("quirks mode", "near standards mode", etc.) seems like it would have some correlation to "versions".

Even though it is possible to avoid having out-of-band or global-scope version indicators for augmentations, this does not mean that there are no advantages or uses for in-band global indicators. If there are multiple languages (whether Algol 60 vs Algol 68 or just multiple "modes"), having a global-scope in-band version indicator allows for switching between one interpreter and another. Indicating the version in-band but requiring parsing of the content means that it isn't possible to evolve syntax or parsers.
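The switching role of a global-scope in-band indicator can be sketched as follows. The "#version N" first-line syntax and both interpreters are hypothetical, chosen only to show that the indicator can be read without parsing the rest of the content.

```python
# Hypothetical interpreters for two language versions (assumptions).
def interpret_v1(body):
    return ("v1", body.upper())

def interpret_v2(body):
    return ("v2", body[::-1])

INTERPRETERS = {"1": interpret_v1, "2": interpret_v2}

def dispatch(text):
    """Read a first-line indicator such as '#version 2' and pick an interpreter.

    Only the first line is examined, so the body syntax is free to differ
    between versions.
    """
    first_line, _, body = text.partition("\n")
    prefix = "#version "
    if not first_line.startswith(prefix):
        raise ValueError("missing version indicator")
    version = first_line[len(prefix):].strip()
    if version not in INTERPRETERS:
        raise ValueError(f"unknown version {version!r}")
    return INTERPRETERS[version](body)

print(dispatch("#version 1\nabc"))
print(dispatch("#version 2\nabc"))
```

The same body text "abc" is interpreted in two different ways depending on the indicator, which is the "modulating" use of a version indicator.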

To modulate the interpretation of the text in question. That is, depending on what the version indicator is, interpreting agents might have to interpret the same text in two different ways.

Utility of Version Indicators outside of Publication / Distribution

Review the compatibility and development workflow strategies for using different kinds of version indicators (future content with current readers,  distinguishing current from future content with future readers) http://lists.w3.org/Archives/Public/www-tag/2009Apr/0064.html

To syntactically characterize the text in question.

One use case for embedded version indicators is to track versions during authoring, production and deployment before they are sent over the wire. Authors and authoring tools may well know which version of a language they are editing or producing content for, which features they are assuming and so forth. Without any way of marking the intended version in the content itself, it is likely that version indicators will be carried outside, and subject to loss. As has been seen with MIME types, external metadata is at risk of being separated from the content, and authors often lack control over how it is handled at deployment. Right now, new HTML features seem to be deployed on the web by advanced sites "sniffing" the User Agent version string and using it to determine which version of an HTML page should be generated. This process is subject to some significant failures, mainly because new or otherwise unrecognized browsers have no way of indicating to such sniffers that they, too, intend to interpret the same features as one or another proprietary browser. We need to consider the use cases of language version management during pre-publication processes, and also the use case of "browser version" sniffing and the failure cases. This touches on the "content negotiation" issue (as the sub-case of negotiating versions).
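The sniffing failure case can be made concrete with a small sketch. The browser tokens, file names, and matching rule below are all hypothetical; real sites use many different heuristics.

```python
# Hypothetical table of sniffed browser tokens and the page variant each gets.
VARIANTS_BY_TOKEN = {
    "BrowserA/3": "page-with-new-features.html",
    "BrowserB/9": "page-with-new-features.html",
}
FALLBACK = "page-basic.html"

def choose_variant(user_agent):
    """Pick a page variant by substring-matching the User-Agent header."""
    for token, variant in VARIANTS_BY_TOKEN.items():
        if token in user_agent:
            return variant
    return FALLBACK

# A recognized browser receives the richer page.
print(choose_variant("Mozilla/5.0 BrowserA/3.1"))
# A new browser that implements the same features is not recognized
# and is served the basic page anyway.
print(choose_variant("NewBrowser/1.0"))
```

Nothing in the request lets "NewBrowser" state which features it supports, so the only way for it to receive the richer page under this scheme is to imitate the token of a recognized browser.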

Evaluation Criteria: reasons for and against using version indicators

Evaluate the use of version indicators against possible future language changes to determine what the reasons are for and against using version indicators (to be done). Version indicators should be approached with some amount of skepticism.

Sometimes versions can be determined by scanning the text to see whether it syntactically conforms to the corresponding language specification; the purpose of the indicator is to make it unnecessary for the consumer to do this.

Versioning formalism

http://w3.org/mid/063031F1-1645-4A4C-A350-2DF0077B9722@creativecommons.org

Motivation of Implementors of agents

See Appendix III. This formalism ought to be modified to account for differential payoff to sender and receiver.

Situation: a sender sends a message containing parts that are not understood by the receiver

and distinguish between points in sender/receiver payoff space with good (positive payoff), punishing (negative payoff), or neutral (zero payoff) coordinates along the sender/receiver axes.

The payoff to the sender will shape the sender's behavior, and the payoff to the receiver will shape the receiver's behavior.
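A minimal sketch of the payoff space described above, assuming payoffs are plain numbers (the numeric values are invented for illustration):

```python
def classify(payoff):
    """Label one coordinate of the payoff space."""
    if payoff > 0:
        return "good"
    if payoff < 0:
        return "punishing"
    return "neutral"

def outcome(sender_payoff, receiver_payoff):
    """Label a point in sender/receiver payoff space along both axes."""
    return (classify(sender_payoff), classify(receiver_payoff))

# Hypothetical numbers: the sender gains, the receiver is left worse off.
print(outcome(2.0, -0.5))
```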

The payoff to the sender depends on what the receiver does (unless communicating just "feels good" or is required by law), but will often be indirect. E.g. in advertising, it's not the payoff from any particular transaction that matters, but the amortized payoff from many transactions. When a message is broadcast to multiple audiences with different capacities, it matters a lot whether the sender knows that this is the case.

Example: Creators of good children's TV shows (which I hereby define to be the ones I like to watch) know that there are two audiences and craft their material so that both appreciate the content. Creators of bad ones don't and only aim for one audience. But including material meant only for grownups (positive receiver 2 payoff), perhaps disguised or deemphasized so as not to make those who don't understand anxious (negative receiver 1 payoff), is, to me at least, the essence of craft and quality.

We have the same situation with content creators

Those exercising craft have a more difficult job in creation and testing - they have to think - and this extra effort will only be made if the perceived benefit outweighs the cost. In a sense material that is knowingly destined for different audiences constitutes multiple communications channels, and the question of server/client compatibility (payoff) might be better thought of not as a language extensibility problem but a multiplexing problem.

If in-line language extensibility (think: child-inaccessible puns) is outlawed, the new material will be communicated *somehow*. (This is similar to the rewrite-based extensibility question in programming languages. Macros happen whether a language spec supports them or not; it's just a question of how extensibility will be managed - forking new languages (think: content-types), external preprocessors, or in-language macro facilities.)

It's not clear what purpose version markers in HTML, indicating the presence of stuff not understood by some clients, should have for clients that don't understand that stuff. I guess it could lead to "upgrade your browser" or "get this plugin" dialogs, or "save this file or choose application", the payoff of which is the ultimate benefit minus the annoyance of having to deal with the dialog. But in some cases knowing that you don't understand will only lead to receiver anxiety.

Analogy: Suppose I go to Peru and am spoken to in Spanish, which I don't understand. How can I tell how important what's being said should be to me? Sometimes very, sometimes not; sometimes I can tell whether it has must-understand status, sometimes I can't; in the latter case I only have anxiety (perhaps a lot, if it's an armed soldier who's speaking) and in a sense I might prefer to have heard nothing, like the child who happily doesn't know that an adult pun has just flown past in their TV show.

Adoption of these recommendations

Any decision to require or encourage use of version indicators as a way of modulating behavior will require some agreement of current browsers to do work that will only pay off in the future, and getting that agreement requires buy-in by the affected parties. However, I don’t want to start with the presumption that “they won’t go along” without first making the case for why allowing for future non-compatible extensions in current browsers is good practice, even when such changes should be avoided if at all possible.

Specific HTML recommendations

Use of <!DOCTYPE HTML> vs. specific version identified HTML5. Version indicator useful and traditional for authoring software. Some other DOCTYPE might signal validation behavior. No incompatible changes expected.
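For authoring software, telling the two DOCTYPE styles apart could look roughly like the sketch below. The regular expression is a simplification written for this example, not the parsing algorithm of any specification, and the return value "unversioned" is a made-up label.

```python
import re

# Simplified pattern (an assumption): optional PUBLIC identifier in double quotes.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)")?', re.IGNORECASE
)

def doctype_version(text):
    """Return the public identifier, 'unversioned', or None if no DOCTYPE."""
    match = DOCTYPE_RE.match(text.lstrip())
    if match is None:
        return None
    return match.group(1) or "unversioned"

html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         '"http://www.w3.org/TR/html4/strict.dtd">')
print(doctype_version(html4))
print(doctype_version("<!DOCTYPE HTML>"))
print(doctype_version("<p>no doctype</p>"))
```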

Ownership of application/xhtml+xml and the way in which the application/xhtml+xml migration might or might not be assigned to one or another development path

Conclusions

References

 

Appendix I: Review of past W3C TAG work on versioning

Appendix II: Use of Version Indicators in Other Languages

It is empirically true that one can version a language without having inline version indicators. For example, Algol 60 and Algol 68 do not have version indicators. Supplying version indicators is a design choice.

Appendix III: Versioning Formalism

JAR: an attempt at my own formalism of language compatibility.

This framework was inspired by a paper by Peter L. Hurd (J. Theor. Biol. 174:217-222 1995 [3]).

Two agents, a sender and a receiver, are playing a game that goes as follows:

  1. An objective (target action) o is chosen somehow from A = a space of possible actions (meanings, interpretations)
  2. Given o, the sender chooses a message m = S(o) via a function S: A->M where M = a space of possible messages (texts, strings)
  3. Given m, the receiver chooses an action a = R(m) via a function R: M->A
  4. Given o and a, success is judged according to a success criterion Z(a,o). I.e. if Z(R(S(o)),o) then communication has been successful.

The simplest situation would be where the objective is simply to perform the desired action:

Z(a,o) iff a = o.

Note: M = message space includes all possible messages, including those that are not used for communication.

Note: A = action space includes all possible actions, not just those that might be achieved through communication. Examples: displaying some text with certain visual or behavioral effects, or the results (output, effects) that one might want to expect from a program.
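The game can be written out directly. The spaces A and M and the particular functions S and R below are toy assumptions; only the shape S: A->M, R: M->A, and the criterion Z come from the text.

```python
# Toy action and message spaces (assumptions for illustration only).
A = {"stop", "go"}
M = {"red", "green", "blue"}  # "blue" is a message never used for communication

def S(o):
    """Sender: choose a message for objective o."""
    return {"stop": "red", "go": "green"}[o]

def R(m):
    """Receiver: choose an action for message m."""
    return {"red": "stop", "green": "go", "blue": "stop"}[m]

def Z(a, o):
    """Simplest success criterion: the action performed equals the objective."""
    return a == o

def communication_succeeds(o):
    """True when the receiver's action on the sender's message meets o."""
    return Z(R(S(o)), o)

print(all(communication_succeeds(o) for o in A))
```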

The functions S and R are not determined by A, so the sender and receiver will need to agree on a correspondence. I'll define a "language" to be a contract that might be entered into between a sender and a receiver, presumably for the purpose of maximizing communication success.

The simplest kind of language would be to agree on the function R that is to be implemented by the receiver. Then given an objective o the sender can choose any message m for which Z(R(m),o).

However, we are interested in language extensibility. A language defined to merely specify the behavior of R cannot be extended because the receiver cannot change the interpretation of any message for fear that a sender unaware of the change might send it, in which case there would be no way to guarantee that R's action would meet the objective. Therefore we consider languages in which some messages are reserved for future expansion (i.e. sent messages are limited):

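definition: A language L is a pair (F,I) where

  • F is a subset of M, the messages that a sender may send
  • I is an interpretation function I: M->A that assigns an action to every message in M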

The unused messages are those that are in M but not in F. Note that I defines actions for unused messages. This will come in handy later.

definition: A sender speaks L if the image of S is a subset of F.

definition: A receiver understands L if R(m) = I(m) for all m in F.
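These two definitions translate into short predicates. The sketch assumes that F is a set of messages and that I is a table giving an action for every message in M; the example senders and receiver are hypothetical.

```python
A = {"stop", "go"}
M = {"red", "green", "yellow"}
F = {"red", "green"}
I = {"red": "stop", "green": "go", "yellow": "stop"}

def speaks(S, objectives, F):
    """A sender speaks L if every message it can produce lies in F."""
    return {S(o) for o in objectives} <= F

def understands(R, F, I):
    """A receiver understands L if it agrees with I on every message in F."""
    return all(R(m) == I[m] for m in F)

def careful_sender(o):
    return {"stop": "red", "go": "green"}[o]

def sloppy_sender(o):
    # Sends the unused message "yellow", so it does not speak L.
    return {"stop": "yellow", "go": "green"}[o]

def receiver(m):
    # Agrees with I on F; what it does with "yellow" is unconstrained here.
    return {"red": "stop", "green": "go", "yellow": "go"}[m]

print(speaks(careful_sender, A, F), speaks(sloppy_sender, A, F))
print(understands(receiver, F, I))
```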

(The sender will generally choose to further constrain S in order to maximize success.)

Now consider a language change - a language L changing to become a language L'.

definition. L' is backward compatible with L iff any receiver that understands L' understands L in the same way on F.

Put more simply (trivial theorem): L' is backward compatible with L iff I'(m) = I(m) for m in F.
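In code, the trivial theorem becomes a one-line check. The interpretation tables are toy assumptions.

```python
def backward_compatible(F_old, I_old, I_new):
    """L' is backward compatible with L iff I'(m) = I(m) for every m in F."""
    return all(I_new[m] == I_old[m] for m in F_old)

F = {"red", "green"}
I = {"red": "stop", "green": "go", "yellow": "unassigned"}

# Changes only the meaning of a message that was unused in L.
I_ok = {"red": "stop", "green": "go", "yellow": "caution"}
# Reinterprets "red", which senders of L may already be sending.
I_bad = {"red": "go", "green": "go", "yellow": "caution"}

print(backward_compatible(F, I, I_ok))
print(backward_compatible(F, I, I_bad))
```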

We would like to also have some notion of forward compatibility: Any sender that speaks L' also speaks L. Since a receiver that understands L cannot know in advance what new messages (in F') are supposed to mean, we have a problem: what should R(m) be when m is new?

To answer this, we introduce the notion of adequate defaulting. The idea is that there might be communication success if we substitute some action a^ for the unknown future desired action a. In this situation we write a^ ~> a. To simplify the formalism we also allow a ~> a for all a in A.

definition: A receiver respects L iff R(m) ~> I(m) for all m in M.

We can define forward compatible as follows: L' is forward compatible with L iff any receiver that respects L also respects L', i.e. R(m) ~> I(m) implies R(m) ~> I'(m) for all m in M. Trivial theorem: Forwards compatibility holds iff I(m) ~> I'(m) for m in M.
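A sketch of "respects" and of forward compatibility, assuming the relation ~> is given as an explicit set of (default, desired) pairs plus reflexivity. The relation ORDER and the tables are invented for the example.

```python
# Assumed defaulting relation: pairs (a_hat, a) meaning a_hat ~> a.
ORDER = {("unassigned", "caution"), ("unassigned", "stop"), ("unassigned", "go")}

def defaults_to(a_hat, a):
    """a_hat ~> a, with a ~> a allowed for every a."""
    return a_hat == a or (a_hat, a) in ORDER

def respects(R, M, I):
    """A receiver respects L iff R(m) ~> I(m) for all m in M."""
    return all(defaults_to(R[m], I[m]) for m in M)

def forward_compatible(I_old, I_new, M):
    """Trivial-theorem form: I(m) ~> I'(m) for all m in M."""
    return all(defaults_to(I_old[m], I_new[m]) for m in M)

M = {"red", "green", "yellow"}
I_old = {"red": "stop", "green": "go", "yellow": "unassigned"}
I_new = {"red": "stop", "green": "go", "yellow": "caution"}
R = dict(I_old)  # a receiver that does exactly what I_old says

print(respects(R, M, I_old), respects(R, M, I_new))
print(forward_compatible(I_old, I_new, M))
```

The old receiver still respects the new language because its default action for "yellow" is an adequate stand-in for "caution" under the assumed relation.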

definition: L weakly extends to L' iff L' is backward and forward compatible with L, i.e.

  • F' superset of F
  • I(m) = I'(m) for all m in F
  • I(m) ~> I'(m) for all m in {F' - F}

In order for forward compatibility to be transitive, we also need to make sure nothing happens with non-final messages to break future extensibility:

definition: L extends to L' iff it weakly extends to L' and preserves or improves defaults defined by L:

Def. L is extensible if there is an L' that extends it.

Note. If ~> is transitive (is a partial order), then extends and the other relations are all transitive. [requires proof?]

 

Example:
  • A = {stop, go, caution, reverse, unassigned}
  • M = {red, yellow, green}
  • unassigned ~> o for all o in A
  • L = (F,I), L'=(F',I')
  • I = {green:go, red:stop, yellow:unassigned}
  • F = {red, green}
  • I' = {green:go, red:stop, yellow:caution}
  • F' = {red, green, yellow}
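The traffic-light example can be checked mechanically against the three conditions for weak extension. In this sketch the interpretation tables are keyed by message, and ~> is implemented only as far as the example needs (reflexivity plus "unassigned ~> anything").

```python
A = {"stop", "go", "caution", "reverse", "unassigned"}
M = {"red", "yellow", "green"}

def defaults_to(a_hat, a):
    """a_hat ~> a: reflexive, and 'unassigned' defaults adequately to anything."""
    return a_hat == a or a_hat == "unassigned"

F = {"red", "green"}
I = {"green": "go", "red": "stop", "yellow": "unassigned"}
F2 = {"red", "green", "yellow"}                         # F'
I2 = {"green": "go", "red": "stop", "yellow": "caution"}  # I'

def weakly_extends(F, I, F2, I2):
    """Check the three conditions: superset, agreement on F, defaulting on F' - F."""
    return (
        F2 >= F
        and all(I[m] == I2[m] for m in F)
        and all(defaults_to(I[m], I2[m]) for m in F2 - F)
    )

print(weakly_extends(F, I, F2, I2))
print(weakly_extends(F2, I2, F, I))
```

The second call fails because F is not a superset of F': once "yellow" has been given a meaning, it cannot be withdrawn by a weak extension.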

We want to be able to "kill off" a message - to decide in a future extension that it shouldn't have any meaning and shouldn't be sent. We can specify I(m) = k with Z(k,o) always false, and then the sender won't send it. This only works if either

  1. k is not on any "upgrade path" to an action that succeeds (i.e. not k ~> a), or
  2. k is not upgradable (k is in F).

(2) puts fewer constraints on A - it requires only one special undefined action, e.g. unassigned as in the example, instead of two that behave differently in the partial order. However, (1) is better formally as one simply specifies a ~> k for any a, and then one may upgrade any message's action from a to k. (2) would require a change in some of our definitions to allow the meaning of a message to pass from a default action a to k, which is not related to it by ~>. (Maybe we could introduce a set K of killed messages as a component of a language.)

Note: If an action can only be done using a defaulted message, why would a sender not "cheat" by sending a message outside F in order to achieve that objective? We can remove the temptation by making sure that a defaulted objective is always achievable by a non-defaulted message: For all m, if Z(I(m),o), then there is some m' in F such that Z(I(m'),o).
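The condition in this note can be tested exhaustively on small spaces. The sketch assumes the simplest criterion Z(a,o) iff a = o, and the two interpretation tables are invented to show one failing and one passing case.

```python
def Z(a, o):
    """Simplest success criterion."""
    return a == o

def no_temptation(M, F, I, A):
    """For all m and o: if Z(I(m),o) then some m' in F has Z(I(m'),o)."""
    return all(
        any(Z(I[m2], o) for m2 in F)
        for m in M
        for o in A
        if Z(I[m], o)
    )

A = {"stop", "go", "caution"}
M = {"red", "green", "yellow"}
F = {"red", "green"}

# "caution" is reachable only through the defaulted message "yellow".
tempting = {"red": "stop", "green": "go", "yellow": "caution"}
# Every achievable objective is also achievable from inside F.
safe = {"red": "stop", "green": "go", "yellow": "stop"}

print(no_temptation(M, F, tempting, A))
print(no_temptation(M, F, safe, A))
```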

Correspondence with [1]:

sender = producer

receiver = consumer

message = text

final = in defined set OR undefined by virtue of having been "killed"

message's action succeeds for some objective = in accept set

Enrichments:

We could extend the framework to nondeterministic sender and receiver behavior, to quantitative payoffs, and/or to differential sender/receiver payoffs. Evolutionary biologists do these things in order to explain the presence and absence of cheating in natural communication systems. There is probably no need to do this in a protocols and formal languages engineering setting, except by way of explaining why a sender would choose a richer action when a default would suffice.

We should be able to model distributed extensibility in this setting: how precoordination can enable the existence of upper bounds among the compatibility relations.

Thanks to Alan Bawden for checking my math. Any remaining errors are mine. - Jonathan

[1] http://www.w3.org/2001/tag/doc/versioning

[2] http://www.w3.org/2001/tag/doc/versioning-strategies

[3] http://scholar.google.com/scholar?hl=en&lr=&cluster=11821292421815354248

Appendix IV: Other considerations

This section is for things that haven't been integrated into the main doc. To modulate the interpretation of the text in question. That is, depending on what the version indicator is, interpreting agents might have to interpret the same text in two different ways.

This choice has profound consequences for the design of future versions. Suppose that an A (old) text is marked with indicator A. #1 does not in itself imply that a text generated by an A-interpreter will lead to the desired payoff for a B producer, for any text. That will only be true if when we designed the versioning regime we made a stipulation that all future versions will have this property (new producers "must be" happy with what old consumers do with all texts). If we stipulate only sense #1, then future version designers do not have the freedom to transition a given interpretation of a given text from acceptable (in A) to less acceptable (in B) - or vice versa.

If there is in fact forwards and backwards compatibility in a language series, there may be no strong need for a version indicator, other than as a convenience (so that agents who care don't have to scan the document to see if it contains constructs they don't understand).

So in any discussion, you need to be clear about the sense of the version indicator.

Sense 1 is economical in that a consumer can always just use a B-interpreter to interpret according to language A. There is strong incentive for a consumer to assume it even when doing so isn't in spec. Sense 2 is harder to implement since the consumer needs two interpreters or two interpretation modes, one for A texts and one for B texts.

Version indicators can be helpful, but they just push off the problem one level - they are really part of the language(s) in question, so they have to be evaluated according to exactly the same criteria that one would apply to a language series that doesn't have them. Suppose you have language versions A and B, and then a "sum" language C = A + B whose texts consist of a version indicator followed by a text of either A or B. (If A and B both already have version indicators you *may* be able to take C texts = A texts union B texts.) You still have to agree ahead of time - before language B is invented - on how to interpret texts of C - that is, everyone concerned needs a priori knowledge of how to parse and understand version indicators, even if it's just to say that rejecting unknown versions, or unknown texts, is OK. When you design a language series initially, you may set aside a place for version indicators, and specify that the indicator "sublanguage" is extensible (i.e. new indicators may come along). If you get the indicator language wrong in the first place, e.g. if you define it to specify sense 1 instead of sense 2 or vice versa, then you may find yourself stuck, either underconstraining the series (so that old consumers can't consume new content with confidence) or overconstraining the series (so that new content will be rejected by conforming old consumers).

So version indicators only support extensibility (or whatever other goal you're after) if the future consequences for both old and new consumers are articulated and documented before the whole process gets started.

Notes: Saying that C = A + B where B is not yet invented is not as nonsensical as it sounds. An extension may be thought of as a secret that is somehow known in principle, but not revealed to producers and consumers until some future date. I think of versioning and extension as being similar to the concept of single assignment or "future" in programming languages.

2. JAR used a different definition of "language" in his formal writing... I think that language (or language version) as class/predicate of interpreters, or equivalently requirements/specification/constraints on interpreters, is probably a more useful definition than either language as set of strings or language as single interpretation function on set of strings.

Languages can be, and have been, evolved without changes that require implementations to implement forks to consume content from different versions of the language in different ways. On the Web, for example, URIs have evolved that way, and with some unnecessary exceptions, so have CSS, HTML, and the DOM APIs.

In fact, in the case of CSS and HTML, the only versioning has been quirks vs standards mode, a versioning that wasn't sanctioned by the specifications contemporary to its introduction, and which would have been unnecessary had the deviations from the original design required by deployed content been codified as standard, as we have been doing for the past few years with CSS 2.1 and HTML5.

Forking the language makes implementations orders of magnitude more complex. Watching the Internet Explorer engineers' pained expressions when one discusses the implications of their decision to ship multiple versions of their rendering engine makes this abundantly clear. It also makes the language less suitable for constrained devices (instead of one language to support, one effectively ends up with multiple languages to support), harder to test (instead of testing one language implementation, one effectively ends up testing multiple implementations, as well as their interactions in edge cases), and harder to document (instead of just specifying the weird behaviours that end up de-facto part of the language due to wide deployment of implementation bugs, one has to also specify the other behaviours expected in each version).

Language designers should strive to make their languages versionless at the syntax level.

Version indicators only support extensibility (or whatever other goal you're after) if the future consequences for both old and new consumers are articulated and documented before the whole process gets started.

That's not uniquely true of version indicators.

That's true no matter what technique is used to distinguish one version from another. The alternative, where there aren't any version identifiers, requires consumers to deal with both old and new markup as well.

For some languages and some applications, it may be reasonable to define a universal semantics for all versions, such as the HTML rule of ignoring wrappers it doesn't recognize. (Not that that hasn't introduced problems of its own, with special elements created over time just to work around the consequences of the "ignore wrappers" rule.)

For other languages and other applications, it may not be reasonable to define a universal semantics. Applications must be expected in that case to do something else. Version identifiers offer a convenient mechanism to help users distinguish between versions, even if machines don't need them: "Unexpected element 'fribble' encountered in this V1.2.3 document. The element 'fribble' is not defined in V1.2.3."
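A diagnostic of this kind is easy to produce once the version identifier is available. In the sketch below, the vocabularies and the later version "V1.3.0" are hypothetical; only the message wording is taken from the text above.

```python
# Hypothetical per-version vocabularies (assumptions for illustration).
VOCABULARY = {
    "V1.2.3": {"doc", "para"},
    "V1.3.0": {"doc", "para", "fribble"},
}

def check_elements(version, elements):
    """Return one human-readable diagnostic per element unknown in version."""
    known = VOCABULARY[version]
    return [
        f"Unexpected element '{name}' encountered in this {version} document. "
        f"The element '{name}' is not defined in {version}."
        for name in elements
        if name not in known
    ]

for line in check_elements("V1.2.3", ["doc", "fribble"]):
    print(line)
print(check_elements("V1.3.0", ["doc", "fribble"]))
```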

To allow any particular input to be flagged as an error is itself (what I would call) semantics. We're having so much trouble with "error", "must accept", "must reject", "must understand", and so on, that it ought to be useful to deconstruct a bit and just talk about the desirability (payoffs) of these various outcomes for producer and consumer. I expect it to be helpful to treat outcomes such as reject, ignore, default, and understand uniformly, and talk about semantics (or specification) not as giving the single "correct" outcome for all consumers but as saying which possible outcomes are acceptable across consumers of varying abilities and inclinations. So ask not "should the consumer accept (or reject) X" but rather "should it be OK with the producer if the consumer rejects (or accepts) X".
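One way to read this proposal is as a table of acceptable outcomes rather than a single correct one. The input kinds and the sets assigned to them below are invented purely to show the shape of such a specification.

```python
# The outcomes named in the text, treated uniformly.
OUTCOMES = {"reject", "ignore", "default", "understand"}

# Hypothetical specification: for each kind of input, the outcomes
# that the producer is OK with.
ACCEPTABLE = {
    "core markup": {"understand"},
    "optional decoration": {"ignore", "default", "understand"},
    "must-understand extension": {"reject", "understand"},
}

def ok_with_producer(kind, outcome):
    """Is it OK with the producer if a consumer handles this input this way?"""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome {outcome!r}")
    return outcome in ACCEPTABLE[kind]

print(ok_with_producer("optional decoration", "ignore"))
print(ok_with_producer("must-understand extension", "ignore"))
```

Under this reading, consumers of varying abilities can all conform as long as each lands on some outcome in the acceptable set.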