HTML and version mechanisms
Disclaimer: This article doesn't represent any kind of consensus in the HTML WG. It is an attempt at capturing the different opinions expressed on the mailing-list.
There has been a lot of debate in April on the HTML WG mailing list about versioning: should the new HTML language have a version mechanism? It is a difficult topic with interesting arguments. The debate will certainly influence discussions in the Technical Architecture Group, since versioning is one of the topics addressed in Web Architecture.
no versioning
HTML is one language, and every implementation must be able to read its content, whatever happens. Future “versions” of HTML should never dismiss what has been done in the past. Any program starting from scratch has to implement everything from the start. The semantics of elements will never change either, because a change might break the intent of past authors who wrote according to a previous version.
versioning
People are requesting a version number in order to be able
- to switch between two different rendering modes
- to author with specific requirements and/or semantics
- to convert from one version to another
- to create helper tools for authoring documents
- to validate
- to evolve the semantics of elements and attributes
html fragment
There is no simple way to identify an html fragment used in another application. A version attribute could be placed on the root element of the fragment. This puts a difficult constraint on authoring tools: if the root of the fragment changes, the tool has to move the version attribute to the new root element. For example, going from
<p version="foo">babar</p>
to
<div version="foo"><p>babar</p></div>
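The bookkeeping an authoring tool would need for this change can be sketched in a few lines. This is only an illustration under stated assumptions: the fragment is well-formed XML with a single root (the standard-library parser used here is an XML parser, not an HTML one), and the function name `wrap_fragment` is hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_fragment(fragment: str, wrapper_tag: str = "div") -> str:
    """Wrap a single-rooted fragment in a new root and move `version` up."""
    old_root = ET.fromstring(fragment)
    new_root = ET.Element(wrapper_tag)
    # Remove the attribute from the old root, if it is present at all.
    version = old_root.attrib.pop("version", None)
    if version is not None:
        # The new root now carries the version information.
        new_root.set("version", version)
    new_root.append(old_root)
    return ET.tostring(new_root, encoding="unicode")


print(wrap_fragment('<p version="foo">babar</p>'))
# prints <div version="foo"><p>babar</p></div>
```

Every operation that changes the root of a fragment would need an equivalent step, which is the constraint on authoring tools described above.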
Authors and version
Author: a version mechanism which is constrained to the head, the DOCTYPE or the html element is difficult to change for authors with no access to the html template (e.g. a CMS giving access to content only). On the other hand, a version number accessible from the body would be easy to change.
Authoring tool: The mechanism for changing the version is not defined in a conversion context (e.g. HTML editor X taking over a document from HTML editor Y).
Template designer: A version number which is not accessible to authors might be a feature, since it constrains authors to a given set of elements.
Converter / Helping tools
A version number is useful for converting a document to or from an earlier or later version of the language. It is also useful for creating a helper tool which gives recommendations depending on the semantics of a feature.
Possible Version Syntax mechanism
“version” attribute
The version attribute is found on the html element.
<!DOCTYPE html> <html version="something"> … </html>
The format of “something” is not defined here, but the attribute already exists in HTML 4.01, where it is defined as:
version = cdata [CN]
Deprecated. The value of this attribute specifies which HTML DTD version governs the current document. This attribute has been deprecated because it is redundant with version information provided by the document type declaration.
and it is declared in the DTD for HTML 4.01 Transitional only:
<!ENTITY % HTML.Version "-//W3C//DTD HTML 4.01 Transitional//EN">
<!ENTITY % version "version CDATA #FIXED '%HTML.Version;'">
<!ATTLIST HTML %i18n;
%version;
>
Then an HTML 4.01 Transitional document would be:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html version="-//W3C//DTD HTML 4.01 Transitional//EN">
<head>
...
</head>
<body>
...
</body>
</html>
In HTML 3.2, the DTD declares
<!ENTITY % HTML.Version "-//W3C//DTD HTML 3.2 Final//EN">
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">
<!ATTLIST HTML %version.attr;>
It means an HTML 3.2 document with version information would look like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML version="-//W3C//DTD HTML 3.2 Final//EN">
<HEAD>
<TITLE>document title</TITLE>
</HEAD>
<BODY>
... document body
</BODY>
</HTML>
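As a rough illustration of how a consumer could read this attribute, here is a minimal sketch using Python's standard `html.parser`. The class and function names (`RootVersionParser`, `root_version`) are hypothetical, and nothing here says what a consumer should do with the value once it has it.

```python
from html.parser import HTMLParser


class RootVersionParser(HTMLParser):
    """Record the `version` attribute of the first html start tag."""

    def __init__(self):
        super().__init__()
        self.version = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <HTML VERSION=...>
        # and <html version=...> are handled the same way.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.version = dict(attrs).get("version")


def root_version(document: str):
    """Return the version attribute of the html element, or None."""
    parser = RootVersionParser()
    parser.feed(document)
    parser.close()
    return parser.version


doc = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n'
    '<HTML version="-//W3C//DTD HTML 3.2 Final//EN">'
    "<HEAD><TITLE>document title</TITLE></HEAD>"
    "<BODY>document body</BODY></HTML>"
)
print(root_version(doc))
```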
“meta” element for versioning
Another possibility for versioning is to use a meta element with a dedicated name:
<meta name="version" content="something"/>
The syntax of “something” is not defined. Many authoring tools use a similar mechanism to advertise that they created the document, and often include a version number.
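A consumer could pick up such a meta element along the following lines. This is a sketch only: the name `version` is the one proposed above and is not a standardized meta name, and `MetaVersionParser` and `meta_version` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaVersionParser(HTMLParser):
    """Record the content of the first <meta name="version"> element."""

    def __init__(self):
        super().__init__()
        self.version = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.version is not None:
            return
        attributes = dict(attrs)
        # An attribute written without a value is reported as None.
        name = (attributes.get("name") or "").lower()
        if name == "version":
            self.version = attributes.get("content")


def meta_version(document: str):
    """Return the declared version, or None if no such meta element exists."""
    parser = MetaVersionParser()
    parser.feed(document)
    parser.close()
    return parser.version


print(meta_version('<head><meta name="version" content="something"/></head>'))
# prints something
```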
version in HTTP headers
In the same way that the content type can be specified with HTTP headers, the version could be given through HTTP headers. This is difficult for authors to modify, as they rarely have access to the server configuration.
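On the consumer side this would amount to a simple header lookup. In the sketch below the header name `X-HTML-Version` is purely hypothetical, since no such header is defined anywhere; `email.message.Message` stands in for the header object an HTTP client would return.

```python
from email.message import Message

# Hypothetical header name, used for illustration only.
HYPOTHETICAL_HEADER = "X-HTML-Version"


def version_from_headers(headers: Message):
    """Return the version announced in the headers, or None."""
    # Header lookup on Message objects is case-insensitive.
    return headers.get(HYPOTHETICAL_HEADER)


headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
headers["x-html-version"] = "something"
print(version_from_headers(headers))
# prints something
```

The value lives outside the document, so it is lost as soon as the file is saved to disk or served from a differently configured server.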
version in comments
<!-- version: something -->
A syntax that could be freely consumed by consumers (various user agents) or created by producers (authoring tools), but that is not required for any class of products. The syntax is still undefined. An opt-in mechanism specific to one browser vendor would be difficult to manage in an interoperable way.
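A tool that chose to honour such a comment might do something like the following. The `version: something` pattern is taken from the example above; since the syntax is undefined, the regular expression and the names `CommentVersionParser` and `comment_version` are assumptions made for this sketch.

```python
import re
from html.parser import HTMLParser

# Assumed syntax: the whole comment is "version:" followed by one token.
VERSION_COMMENT = re.compile(r"^\s*version:\s*(\S+)\s*$")


class CommentVersionParser(HTMLParser):
    """Record the version given in the first matching comment."""

    def __init__(self):
        super().__init__()
        self.version = None

    def handle_comment(self, data):
        # `data` is the text between "<!--" and "-->".
        match = VERSION_COMMENT.match(data)
        if match and self.version is None:
            self.version = match.group(1)


def comment_version(document: str):
    """Return the version found in a comment, or None."""
    parser = CommentVersionParser()
    parser.feed(document)
    parser.close()
    return parser.version


print(comment_version("<!-- version: something --><p>babar</p>"))
# prints something
```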
Neither version in comments nor version in HTTP headers is a viable approach in my opinion, for the same reason:
If there is a need for version information about an HTML document, it has to be part of the document.
The HTTP header is obviously not part of the document. And comments, as far as I understand them, may be left out, deleted or whatever without losing any information that belongs to "the document". In the context of a programming language, the software still works if I delete all comments from the source code, doesn't it?
I see your point. It is fair indeed.
About programming languages which contain information in comments, there is the case of Python, which puts the encoding declaration for a program in a comment.
Some CMSes also rely on comments to manage the information in the document. That is tricky. I guess part of the problem is that authoring tool developers and content management system designers are not involved enough in the design of HTML. They might have requirements which are slightly different from those of browser vendors, such as requirements for managing content specifically.
A tricky problem, and one that needs a solution! One use for a version mechanism is in being able to decide whether to, say, serve XHTML1.1 or XHTML5, based upon user agent capabilities or user preferences. Ideally this would be done at the HTTP level; however, as both have the application/xhtml+xml MIME type, this is impossible.
The HTML5 spec seems to discourage the XML serialization, though, and instead recommends HTML with the text/html MIME type. As this is backwards-compatible with HTML4, there is less of a need for a version mechanism.
So, is HTML5 no longer considered an application of SGML?
If a "proper" SGML DOCTYPE declaration is to be omitted then this may break many existing SGML editing applications that support validation using any arbitrary DTD. These applications should otherwise be able to handle the new HTML5 DTD, provided they can be given access to it and the DOCTYPE declaration is allowed to reference it, either using a PUBLIC or SYSTEM id. The workaround for these editors would likely need to be to use a "proper" DOCTYPE during editing and then have a "save for the web" mode that alters it to suit the new requirements of HTML5 (by putting in this seemingly odd DOCTYPE that does not contain any version information).