Re: Updated DOCTYPE versioning change proposal (ISSUE-4)

Contrary to popular belief, I'm actually a huge proponent of MIME.
The reason MIME is important is that it lets us multiplex different
protocols over HTTP without "cross-talk" where the server thinks it's
saying one thing but the client thinks it's saying something else.
This cross-talk is a large source of security issues today.  The case
here is different because we explicitly *don't* want to multiplex
different HTML dialects.

This whole discussion hinges on the costs and benefits of making this
change (whether it's "adding" or "removing" something is a red
herring).  As you say below, the opportunity cost of not making this
change is merely a time delay (i.e., between steps 1 and 2) in the
hypothetical future in which we need to make an incompatible change to
HTML.  Let's do a back of the envelope cost/benefit analysis.

First, we need to estimate various input quantities:

1) Probability that we'll make an incompatible change to HTML in the
future that necessitates these version markers.  You admit that this
is unlikely.  Let's say the probability is 1% (I actually think it's
much lower, and you might think it's higher).

2) At what time in the future will this change be needed?  Let's say
15 years.  Ian estimates that HTML5 will be a proposed recommendation
in 2022, so that gives us a couple years of headroom to realize we
screwed up without being able to change the spec.

3) Delay between steps 1 and 2 caused by not changing the spec in this
way.  Let's say it takes 5 years for people to update their validators
to understand that version indicators are ok.  This seems ample to me.
(Notice that if people could update their validators instantly, there
would be no delay between steps 1 and 2 and the change to the spec
would have zero value.)

4) Long term, risk-free rate of return.  We need this quantity to
estimate the time-value of various things.  Five year treasuries are
trading at 2.62% today, but that's probably on the lower side of
historical averages.  Let's say 5% as a more normal discount factor.

Ok, now we're ready to do the calculation.  Suppose the benefit is
$1,000,000.  Deflating by the total probability of this eventuality
gives us $10,000.  Now, we're only extracting the time value of this
benefit, since we still get the benefit, just five years later.  That
gives us the utility lost due to delay of roughly $2165 = $10,000 -
$10,000 / (1.05)^5 (see http://en.wikipedia.org/wiki/Present_Value for
an explanation of this equation).  Now, we need the present value of
that quantity, which is roughly $1041 = $2165 / (1.05)^15.
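
For anyone who wants to check the arithmetic or plug in their own
estimates, here is a minimal sketch of the same back-of-the-envelope
calculation.  The function and parameter names are mine, purely for
illustration; the inputs are the four quantities estimated above plus
the assumed $1,000,000 benefit.

```python
def delayed_benefit_cost(benefit, probability, rate, delay_years, years_until_needed):
    """Present value of the utility lost by receiving `benefit` late.

    All names here are illustrative; the inputs mirror the estimates
    in the message (probability, discount rate, delay, time horizon).
    """
    # Weight the benefit by the chance that the change is ever needed.
    expected = benefit * probability
    # Value lost by receiving the expected benefit `delay_years` later.
    lost_to_delay = expected - expected / (1 + rate) ** delay_years
    # Discount that loss from the time the change is needed back to today.
    present_value = lost_to_delay / (1 + rate) ** years_until_needed
    return expected, lost_to_delay, present_value


expected, lost, pv = delayed_benefit_cost(
    benefit=1_000_000,
    probability=0.01,
    rate=0.05,
    delay_years=5,
    years_until_needed=15,
)
print(f"expected benefit: ${expected:,.0f}")
print(f"lost to delay:    ${lost:,.0f}")
print(f"present value:    ${pv:,.0f}")
```

With these inputs the present-value loss comes out at about a
thousand dollars per million dollars of benefit.  Changing the
probability or the delay scales the result, so it is easy to see how
sensitive the conclusion is to each estimate.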

In summary, for every million dollars of benefit we would get from
having this versioning feature, we only lose $1000 of utility today by
not making the change to the spec.  That means the benefit of making
this change must be 1000x greater than the cost to have a positive
payoff.  Sounds like a bad investment to me.

Adam


On Thu, Jan 7, 2010 at 6:04 PM, Larry Masinter <masinter@adobe.com> wrote:
> From change proposal rationale:
>>> While everyone *hopes* there are never going to be any further
>>> incompatible changes to HTML in the future, there *is* a
>>> possibility that in some unfortunate situation, it will be
>>> necessary to introduce incompatible changes. In that case, it will
>>> be necessary to introduce a new version indicator, to allow (alas)
>>> processors to determine which of the incompatible interpretations
>>> was meant.
>
>> If the new version is going to be incompatible anyway, why not add
>> the version indicator at that time?
>
> If you have a production workflow that only deals in valid documents,
> and you later introduce a version indicator which isn't valid now,
> then the existing workflow would reject the new version indicator as
> invalid.
>
>>> While this will be unfortunate, it would be doubly unfortunate to
>>> have to introduce a new "place" for a version indicator that was
>>> previously non-conforming, which would cause even worse uproar,
>>> because documents that *didn't* want the new incompatible behavior
>>> would have no place to say explicitly which version of the
>>> incompatible behavior they wanted.
>
>> We wouldn't need to invent a new "place" for this information, we
>> could just resurrect this proposal to use the old place.  The
>> documents that don't want the new behavior can just use the HTML5
>> doctype of "<!DOCTYPE html>".  If we were certain that this
>> eventuality would come to pass, we might want to optimize for it by
>> providing a more elegant alternative, but the current indications
>> are that this is not a likely course of events.
>
> If you remember from the issue with "Referer", using the absence of an
> indicator to indicate something is ambiguous; a user specifying
> <!DOCTYPE html> might be saying they want the HTML5 behavior, or they
> might be saying they don't care about the behavior because they're not
> using the feature that has the incompatible change, or that they are
> willing to accept either behavior.
>
>>> By *allowing* a version indicator in conforming content today, we
>>> can avert more serious damage. Having a location for a version
>>> indicator, even if it isn't explicitly used, allows it to be used
>>> at some point in the future.
>
>> This is the core of your argument, but if the future version we're
>> planning for is incompatible anyway, why does it matter if it
>> re-introduces versioned doctypes?
>
> This isn't the core of my argument, by the way; it's only one of
> several arguments. And I think "incompatible" is not a binary "yes
> it's incompatible" or "no it is not". In most cases, incompatibilities
> are minor, only affect edge cases, may not be particularly significant
> to most users and only apply in particular cases.
>
>> You're not asking for anything to change for user agent
>> implementations, so it's not like user agents will act differently
>> in your alternative future.
>
> I think you have to be careful to allow staging of incompatible
> changes, basically, to allow for new browsers which implement the
> incompatible behavior to be deployed asynchronously with new content
> that is explicit that it wants the current behavior and not the new
> behavior. The way that can happen is:
>
> NOW:
> 1. allow current browsers and validators to accept a current version
>   indicator
> 2. allow (but do not encourage) content to be deployed which
>   explicitly calls for current behavior using a current version indicator
>
> LATER: (after deciding to introduce incompatible change):
>
> 3. encourage current content which depends on old behavior
>   to identify old version (only allowed by 2)
> 4. deploy new browsers which implement the new incompatible behavior,
>   if there is an explicit new version indicator
> 5. deploy content which uses the version indicator to call
>   for new behavior rather than old
>
> This change proposal basically does 1 and 2.  Step 3 is possible
> because conforming validators allow current version indicators in 2.
> Steps 3 and 4 are asynchronous, and step 5 (which is what you
> need to get any benefit) is delayed until 4 reaches an acceptable
> threshold.
>
> If you do not allow explicit version indicators now, then
> we would need another step before 3 which would be to deploy
> validators which accepted a current version indicator.
>
> This would delay consistent adoption; more likely people would
> deploy incompatible content labeled with "Best Viewed With
> Firefox 12" instead.
>
>> You explicitly tell authors not to use the extended syntax: "A
>> PublicIdentifier SHOULD NOT be used," so the extant documents on the
>> world-wide web at this future time when we need an incompatible
>> version will likely be the same.
>
> I use "SHOULD NOT" carefully in the RFC 2119 sense, that the practice
> is not recommended except when there are good, well-understood
> reasons. There are some such justifications now (for use within
> editing or polyglot pipelines), but should an incompatible change be
> necessary, then the reasons would increase. Well before browsers that
> implement the incompatible change are widely deployed, content which
> is explicit as to the version intended can be deployed. And plans for
> incompatible changes would add to the justification for using
> version indicators.
>
> Possibly the SHOULD NOT in fact should be softened
> to MAY, or only applied in the context of hand-edited or manually
> assembled content, or situations where the actual version is
> unknown.
>
>> All that seems to change is the conformance status of documents
>> produced *after* the new incompatible spec is issued.
>
> No, documents with an explicit version indicator would be conforming
> NOW. That's important in staging the deployment of (unfortunate,
> hopefully unnecessary) incompatible changes.
>
>> Moreover, it's only the conformance status of those documents
>> w.r.t. the *old* specification.  That seems like pedantry in the
>> extreme.
>
> Pedantry:
>  * the character, qualities, practices, etc., of a pedant,
>   esp. undue display of learning.
>  * slavish attention to rules, details, etc.
>
> I'm not sure how "pedantry" applies. Is supplying technical
> analysis based on facts and previous experience "undue"?
> We're engaged in writing a technical specification for a language
> used by millions, and being careful about rules and details.
> Is it "slavish" to do so?
>
> I think "pedantry" is inappropriately pejorative in
> this context.
>
> I said:
>>> In the history of computer languages, there are no languages that
>>> have not evolved, been extended, or otherwise "versioned" as long
>>> as the language has been in use.
>
> to which you replied:
>> Really?  Where are the version indicators for C++?  The C++
>> language has certainly evolved since its inception, but it hasn't
>> needed an explicit version indicator.
>
> I didn't say that there are no languages without in-band version
> indicators. C++ programs are not self-contained. They come with
> make files, version compatibility installers, and other documents
> which indicate -- usually pretty clearly -- which version of which
> language of which compiler is to be used to compile the program.  C++
> versioning is in fact problematic, and there are numerous ad-hoc
> solutions to managing the evolution of programming languages and
> compiler implementations.
>
> It is not possible to retrieve a C++ program and run it without
> getting out of band information about which version of C++ was
> intended, or using version information embedded in the README or
> configuration files or scripts.
>
> One of the key innovations of the web (in the early '90s) over
> previous distributed network information systems was the adoption of
> the MIME architecture, previously invented for email, to make
> interpretation of message bodies and content self-contained, so that a
> large amount of contextual information wasn't necessary in order to
> discern the meaning of an exchange.  This allows HTTP to be stateless,
> which allows for load balancing, distribution, cloud computing,
> caching, and a wide variety of other facilities that were not possible
> with other distributed information systems.
>
> And part of the MIME architecture is that the content-type label in a
> message indicates the general category of information contained in the
> message, while any other versioning information is contained within
> the message itself.
>
> The adoption of MIME in HTTP wasn't a slam-dunk; it followed from the
> resolution to use MIME in Gopher at GopherCon '93; see
> http://prentissriddle.com/trips/gophercon1993.html.
>
> The adoption of the MIME architecture in HTTP between HTTP 0.9 (which
> had no content labels at all) and HTTP 1.0 was again one of the major
> innovations in the web which has led to its growth and evolution over
> such a long period of time.
>
> Most of the MIME types in use on the web for stand-alone content
> contain in-band version indicators, whether for the whole file
> (as with image/gif, image/jpeg, application/ogg, application/pdf,
> Flash, Java), or through version indicators on chunks in an
> unversioned file structure (as with image/png and audio/mp3).
>
> While CSS and JavaScript don't have versions, they are also
> not standalone content -- a CSS style sheet doesn't constitute
> a "message" in any meaningful way, and so the type and the
> version of the type can be managed as part of the bundle.
>
>> It seems entirely likely that HTML will continue to evolve without a
>> version indicator because the mechanism we've been using for
>> versioning has been more or less ignored because authors screw it up
>> too much.
>
> "more or less ignored" is in browsers; it seems to me that the
> majority of HTML editors use version indicators as part of the
> HTML authoring process.
>
> That the web has been successful, but that much current
> web content seems to be sloppily constructed, is not evidence
> of a causal relationship. By itself, it is not an argument
> for reifying sloppy construction or adding a requirement
> for sloppy construction (by, for example, not allowing authors
> to identify in a standard way which specification(s)
> they are attempting to be compatible with). The sloppy
> construction was a result of the success of the web and
> the DotCom bubble, not a cause.
>
>
> When I said:
>>> This applies to network protocols, character encoding standards,
>>> programming languages, and certainly to every known technology
>>> found on the web.
> You replied:
>> That's quite a bold claim and certainly untrue.
>
> But the "this" was in context of languages evolving without
> incompatible changes, and subsequently when I asserted:
>
>>> There are no known cases where a language hasn't gone through some
>>> at least minor incompatible change.
>
> you replied:
>> Right, but that doesn't mean we need a version indicator.
>
> So I think your "certainly untrue" is contradicted by your "right". If
> you had a counter-example of a web language that hasn't had
> at least a minor incompatible change, you would have supplied it.
>
>> HTML has gone through a number of minor incompatible changes and the
>> world has managed not to end in spite of everyone ignoring the
>> version indicator.
>
> I don't think the criteria for continuing an HTML 4 feature in HTML 5
> includes the requirement that "the world will end if we don't".
>
> And *everyone* does not ignore version indicators. In fact, the change
> proposal is much more explicit than before in requiring BROWSERS to
> not exhibit different rendering behavior in the face of version
> indicators, but to allow validators, editors, and content
> production pipelines to use them.
>
> Yes, HTML can survive without a global version indicator, and only
> specifying <!DOCTYPE html> may continue to work in the narrow context
> of communication between web server and current browser, but leaving
> the ability to provide a PublicIdentifier and SystemIdentifier will
> allow some current production and editing tools to work better, and,
> if used carefully, will cause no harm. I think that's all we ask for
> new features, and the threshold for retaining old features should be
> lower than the threshold for adding new ones.
>
>> In summary, we don't need to add versioning now to future-proof the
>> spec because the effects of this change are felt only after we
>> discover an incompatible version is required.  Attempting to prepare
>> for that eventuality as described in your change proposal doesn't
>> actually do anything substantive to help.
>
> The Change Proposal is not to "add versioning" but to "leave (some
> parts of) HTML4 versioning as an HTML5 feature". Couching this as an
> "addition" is misleading.
>
> The argument for "future proofing" was just one of several
> arguments. And I think I've made the case for why attempting
> to add a version indicator later would require an extra step
> in staging the deployment of what would be, presumably, a
> fix important enough to introduce an incompatibility.
>
> Larry
> --
> http://larry.masinter.net
>
>

Received on Friday, 8 January 2010 19:24:57 UTC