Re: ISSUE-4: html-versioning / ISSUE-84: legacy-doctypes - Straw Poll for Objections

On Aug 13, 2010, at 5:39 AM, Robin Berjon wrote:

> On Aug 12, 2010, at 19:41 , Roy T. Fielding wrote:
>> On Aug 12, 2010, at 5:55 AM, Robin Berjon wrote:
>>> On Jul 22, 2010, at 11:02 , Maciej Stachowiak wrote:
>>>> The poll is available here, and it will run through Friday, July 30th:
>>>> 
>>>> http://www.w3.org/2002/09/wbs/40318/issues-4-84-objection-poll/
>>>> 
>>>> Please read the introductory text before entering your response.
>>>> 
>>>> In particular, keep in mind that you don't *have* to reply. You only need to do so if you feel your objection to one of the options is truly strong, and has not been adequately addressed by a clearly marked objection contained within a Change Proposal or by someone else's objection. The Chairs will be looking at strength of objections, and will not be counting votes.
>>> 
>>> I was on vacation while this poll was open, but I wanted to register my strong objection to the addition of a versioning indicator of any kind. It is an approach that with respect I can only deem naïve and that adds complexity without addressing the issue of compatible behaviour across change.
>>> 
>>> I have covered the topic previously, going to some lengths to describe architectural issues with version indicators as part of a discussion with the TAG in http://lists.w3.org/Archives/Public/www-tag/2009Dec/0116.html, as well as in a lighter-weight description that ends with a decision tree about the cases in which you need a version indicator at http://berjon.com/blog/2009/12/xmlbp-naive-versioning.html.
>> 
>> I find it incredible that such a large group of people can manage
>> to make identical "strong objections" based on an argument that
>> only holds true if browsers are the only consumers of HTML.
>> For all other consumers and producers, every objection made in
>> that poll is demonstrably false.  The problem is that none of
>> the other implementors of HTML bother to participate here because
>> their requirements are routinely ignored.
> 
> My argument very definitely does not come from any manner of browser-oriented point of view. My background here is entirely in robust processing of evolving vocabularies (typically XML) in situations that don't even remotely involve a browser. I have seen no convincing argument that what applies to the processing of XML would not apply equally to HTML, whether in a browser context or not.

Your argument does not apply to the processing of XML in general.
It presumes the recipient is processing the document for uniform
interpretation of presentation (i.e., what browsers do).

> Speaking of arguments, since you seem to disagree with the stated objections and since you claim that every objection in this poll is "demonstrably false", why not bring forward said demonstration? I'd be thrilled to see proof of its existence, as life would indeed be a lot simpler if version indicators provided a usable solution to versioning problems.

That is a paper tiger.  The issue is not about versioning problems,
let alone finding a universal design solution for versioning languages.
Issue 4 is "Should the new HTML language bear a version mechanism?" and
Issue 84 is "Should spec discourage use of legacy doctypes?"

There have been extensive discussions on the mailing list, including
many examples where an *optional* but standard way of indicating an HTML
language version is useful for authoring tools and systems that allow
workflows to be established based on such version indicators.  Such an
indicator is not intended to solve versioning in general -- it is intended
to record metadata about the rules followed during mark-up generation.
It is useful for any application of HTML that may process the language
differently (for its own reasons) based on the value of that indicator.

The fact that it is not useful for specific applications that intend
to provide uniform processing of all versions of a language does not
in any way detract from its usefulness for other types of applications
that have an equal right to influence HTML as a language.  Applications
that don't use the version will ignore the indicator.  Applications that
do use the version need to have a defined, compliant place to find it
in a standard way so that authoring and workflow tools built by
independent vendors can interoperate.

In regard to your argument:

> Versioning, as explained previously, is important enough that it deserves to be done right. Yet some languages have taken a rather naïve approach to it, typically consisting of a version attribute on the root element or other such simplistic schemes built on the presence of a version indicator. That is fine if the purpose is to die immediately when a given version is not supported (in which case simply changing the namespace would be less verbose and just as effective), but will not produce any useful effect if the intent is to allow processors to work across versions.

Of course not, since that isn't the intent of the version indicator.
Version handling in HTML includes things like "ignore all unknown tags"
and other default rendering algorithms.  The version indicator in HTML
is only useful during the authoring process, wherein custom edit dialogs
are frequently enabled/disabled based on the chosen version, and custom
workflow actions can be triggered or alternatively processed based on
the chosen language version (usually in the form of strict validation
instead of non-strict, though there is no limitation on such processing
in general).

> Indeed, what is such a processor to do if it sees a version attribute with a value greater than the version it supports?

Whatever it wants to do.  Browsers will of course want to ignore this.
Editing tools might dynamically load (and thereafter "know") a set of
instructions for version-specific handling, or they might simply adopt
the highest version they do support and highlight the language elements
that are not understood.  It is not the role of the language designer
to determine how all readers behave.
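
As a rough illustration of the second option above (adopt the highest
supported version and highlight what is not understood), here is a
minimal Python sketch.  The abbreviated element tables, the integer
version numbers, and the names KNOWN_ELEMENTS, TagCollector, and
open_in_editor are all assumptions made for the example; a real editing
tool would load complete vocabularies.

```python
from html.parser import HTMLParser

# Hypothetical, heavily abbreviated vocabularies keyed by version number.
KNOWN_ELEMENTS = {
    4: {"html", "head", "title", "body", "p", "div", "a"},
    5: {"html", "head", "title", "body", "p", "div", "a",
        "section", "article"},
}


class TagCollector(HTMLParser):
    """Record each start tag together with the line it appears on."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        line, _column = self.getpos()
        self.tags.append((tag, line))


def open_in_editor(markup, declared_version):
    """Return (effective_version, elements_to_highlight)."""
    # An unrecognized version falls back to the highest one the tool knows.
    if declared_version in KNOWN_ELEMENTS:
        effective = declared_version
    else:
        effective = max(KNOWN_ELEMENTS)
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    vocabulary = KNOWN_ELEMENTS[effective]
    highlights = [(tag, line) for tag, line in collector.tags
                  if tag not in vocabulary]
    return effective, highlights


doc = "<html><body>\n<section><gadget>x</gadget></section>\n</body></html>"
print(open_in_editor(doc, 6))  # version 6 is unknown to this toy tool
```

Nothing here rejects the document: the unknown version only changes
which vocabulary the tool uses for highlighting.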

> Nothing useful comes to mind, short of warning the user that there may be rendering issues, a message which said user will either ignore, or will cause him to panic, but will not yield any useful result.

Again, clearly written from the standpoint of a browser.

> Conversely, if the version attribute points to an earlier version, should features from later versions be ignored? That would make implementations unduly complex.

I suggest you consider what an editor would do with that information.
It is actually possible for an authoring system to attempt validation
of the same document according to every known variant of the language,
all in the background, and then provide a hint to the user if the document
is only valid under a later version.

And, no, this is not "unduly complex".  What would be unduly complex is
an authoring system that provides every potential language form to the
user, including options that are obsolete, when the user has already
configured the tool to only let them author in strict HTML6.
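
The background validation described above can be sketched as follows.
This is only an illustration: "validity" is reduced to checking element
names against toy, abbreviated vocabularies, the ordering of the variants
is an assumption, and VARIANTS, used_tags, valid_variants, and
version_hint are hypothetical names.

```python
from html.parser import HTMLParser

_COMMON = {"html", "head", "title", "body", "p", "div", "a"}

# Ordered from earliest to latest as this sketch chooses to rank them.
# Toy vocabularies, not the real element lists.
VARIANTS = [
    ("HTML 4.01 Strict", _COMMON),
    ("HTML 4.01 Transitional", _COMMON | {"font", "center"}),
    ("HTML5", _COMMON | {"section", "article"}),
]


def used_tags(markup):
    """Collect the set of element names that appear in the markup."""
    found = set()

    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            found.add(tag)

    parser = Collector()
    parser.feed(markup)
    parser.close()
    return found


def valid_variants(markup):
    """Names of every variant whose vocabulary covers the document."""
    tags = used_tags(markup)
    return [name for name, vocab in VARIANTS if tags <= vocab]


def version_hint(markup, configured):
    """Return a hint if the document is only valid under a later variant."""
    names = [name for name, _ in VARIANTS]
    valid = valid_variants(markup)
    limit = names.index(configured)
    if valid and all(names.index(n) > limit for n in valid):
        return "Document is only valid under a later version: " + valid[0]
    return None
```

A tool could run version_hint in the background on each save and show
the returned string, if any, next to the configured version.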

> Furthermore, if a language is extended in a modular fashion rather than through linear versions, this approach breaks down with the complexity of specifying the modules in use and their respective versions.

Yes, which is why even modular frameworks end up being grouped together
at some point in the (standard) future and given a version indicator.

> When producing content, it is easily admitted that using the smallest possible version that includes all of the needed functionality is a good practice as it will enable the largest usage by older implementations.

Only if you accept the earlier assumption that a version indicator
exists to cause the unknowing recipient to catch fire.

> But doing so properly requires authors to know for a given list of language constructs which is the lowest version number that comprises them all. That is asking a lot, and in practice authors will likely fall back in such situations to using the highest version number that they can get away with.

In practice, authors use tools.

> Either way, version information will often be out of synch (either through error, or organic growth, or because the content is composed from multiple sources) with the actual content. This tendency is strong enough that relying on version metadata is largely useless, unless applied to content that is exclusively produced under tight control by programmatic means — a situation that is exceedingly rare for web formats.

Actually, very little web content is produced via emacs and vi these
days.  Most is generated using tools, translated from other
languages (that use strict versioning because they don't like entropy),
or dropped into fields within a template.

In any case, the purpose of the version indicator is to preserve intent,
which can then be mechanically checked against the syntax actually used
by those few tools responsible for content maintenance.  If there is no
specific intent, then the version indicator is not used.
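
One way to picture that mechanical check, as a sketch only: assume the
indicator is the public identifier of a legacy doctype (this thread does
not settle where an indicator would live), and assume toy vocabularies.
DECLARED_VOCABULARIES, IntentChecker, and check_intent are hypothetical
names.  A document with no recognized indicator expresses no intent, so
nothing is checked.

```python
import re
from html.parser import HTMLParser

_COMMON = {"html", "head", "title", "body", "p", "div", "a"}

# Public identifiers from HTML 4.01, mapped to toy (abbreviated)
# vocabularies chosen for this example.
DECLARED_VOCABULARIES = {
    "-//W3C//DTD HTML 4.01//EN": _COMMON,
    "-//W3C//DTD HTML 4.01 Transitional//EN": _COMMON | {"font", "center"},
}


class IntentChecker(HTMLParser):
    """Capture the doctype's public identifier and every start tag."""

    def __init__(self):
        super().__init__()
        self.public_id = None
        self.tags = []

    def handle_decl(self, decl):
        match = re.search(r'PUBLIC\s+"([^"]+)"', decl, re.IGNORECASE)
        if match:
            self.public_id = match.group(1)

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def check_intent(markup):
    """Return None when no intent is declared, else the offending tags."""
    checker = IntentChecker()
    checker.feed(markup)
    checker.close()
    vocabulary = DECLARED_VOCABULARIES.get(checker.public_id)
    if vocabulary is None:
        return None  # no recognized indicator: nothing to check
    return [tag for tag in checker.tags if tag not in vocabulary]
```

An empty list means the markup matches the declared intent; a non-empty
list names the elements a maintenance tool would flag.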

> But even without considerations that involve hard-to-demonstrate behaviour from putative users, the following very short decision tree can be followed when looking at whether a version indicator would be useful:
> 
> Are processors expected to process content across version boundaries?
> 
> No.
>     Then each version is actually a different language (i.e. it is not mutually intelligible). Just change the namespace (or if there is no namespace, any other global indicator such as the root element or media type and extension). You don't need a version indicator.
> Yes.
>     The processors will have to be defined so that they can apply lacunae values, language-level error handling, and other similar rules intended to render unknown constructs sufficiently intelligible. There is nothing which a version indicator could add on top of what they already do. You don't need a version indicator.

Again, entirely based on the assumption that the processor is a browser
(or the equivalent read-only, uniform rendering machine that you refer
to as an XML processor).

....Roy

Received on Friday, 13 August 2010 23:20:53 UTC