Technical Architecture Group

16 Apr 2007


See also: IRC log


Stuart, Norm, Dave, Dan, Henry, Raman, Rhys
TimBL, Noah



Accept minutes of 2 April 2007?


Stuart gives regrets for 23 Apr

Rhys gives regrets for 23 Apr

Norm volunteers to prepare an agenda, Dan will chair.

Next meeting: 23 Apr, David to scribe.

September f2f proposal: 17-18 Sep in University of Southampton, UK.

Resolved: proposal accepted, we will meet 17-18 Sep in Southampton, UK.

<scribe> ACTION: Stuart to inform TimBL so that logistics can be resolved. [recorded in http://www.w3.org/2007/04/16-tagmem-minutes.html#action01]

Panel at the AC meeting (Banff, May 2007) has been resolved. TAG participation still invited.

Any other administrative business?

We'll skip item 2 on the agenda; Ed cannot attend this week for passwordsInTheClear-52.

Issue XMLVersioning-42 (and TagSoupIntegration-54)

Dan: The HTML WG is putting together design principles and comparing them with TAG good practice notes in WebArch.
... There's pushback on a number of them, but for today: "Formats should bear version information"
... Some folks don't want HTML v.next to include version info.
... There's the general question of whether formats in general should include version information, then there's the specific case for HTML.

DanC: There are specific arguments against for HTML

TV: The issue of versioning has become conflated with the question of how browsers react. This is unfortunate because these should be orthogonal.

<DanC> (one of Baron's msgs... not sure this is the best one... http://lists.w3.org/Archives/Public/public-html/2007Apr/0279.html )

TV: But if you believe there are more things in the world than browsers, then maybe it does make sense for HTML to have versioning information.

<ht> http://lists.w3.org/Archives/Public/www-tag/2007Mar/0042.html starts the thread, continues in April here: http://lists.w3.org/Archives/Public/www-tag/2007Apr/0000.html

DanC: One argument is: I think we should be documenting the behavior that browsers need to implement in order for new browsers to come into the web.
... That's why I'm chairing this WG.

<ht> Here's David Baron's contribution, in a thread on another list: http://lists.w3.org/Archives/Public/public-html/2007Apr/0279.html

TV: The 2, 3, and 4 browser vendors want to replace #1, but they're also not strongly motivated to encourage vendors 5, 6, and 7 from entering the market.
... Entry into the browser space is hard if you have to understand everything that's out there.
... It would be easier if there was (conceptually at least) a filter in front of the existing HTML that created XHTML.

DanC: Clean XHTML isn't an option; the options are HTML with or without versioning information.
... If you want to make the playing field more level, I don't think version numbers help.

TV: I don't think those two things are related.
... The dominant vendor gets to decide when V+1 becomes the default. If the world is bigger than browsers, then it seems like it would be valuable to have a version identifier in the DOM. How that's serialized is a separate question.
... The <!DOCTYPE syntax is already a trigger for "standards mode". That should be divorced from the versioning question.

DanC: It's a huge difference about whether it's serialized or not.

TV: If the property is in the DOM and if it shows up when serialized that's fine. The separable question is whether or not authors should write it. I can serialize all sorts of things that the author never wrote.

DanC: What the WebArch document says is that formats should have version information.

TV: I think it is a good idea to have version information in HTML. I don't believe that anyone is smart enough to get it all right in the first version. Everything gets revved.

DanC: The question is whether you revise it in place or give it a new name.

TV: We've had the same argument about CSS and I think CSS is hurting quite badly because it doesn't have version information. But the CSS WG feels differently and this may be an unbridgeable divide.

<DanC> (the CSS argument I still haven't finished thinking thru)

Stuart: Version numbers have been called a solution in search of a problem.
... And are we trying to talk about versioning a language, or are we making claims about an instance document?

DanC: The instance document.

<Zakim> ht, you wanted to try to understand the "walled garden" argument

Stuart: I leave the question open then of what problem they're trying to address.

<DanC> (hmm... maybe instance document isn't as explicit in webarch as I thought... "A data format specification SHOULD provide for version information." -- http://www.w3.org/TR/webarch/#pr-version-info )

Henry: Several contributors to the thread have listed reasons why version information would be valuable.
... The response has been "validation is a red herring" without addressing the other points raised.
... Another argument about version numbers is that they create walled gardens.

Henry, TV, DanC: I don't understand what that means.

TV explains walled gardens.

David: The question of adding version information surprised me, because I thought HTML already had version information.
... I'm not sure where this goes. The arguments about version numbers that suggest that newer versions will somehow cause a problem don't make sense to me.

DanC: One obvious application of version numbers is the pickle tag. Suppose in V4, the pickle tag is green but in V5 it's blue. So the semantics of some parts of the document change depending on the version number.
... HTML doesn't have those, by design I think.
... There's a big question of whether you can trust anything the author puts at the top of the document.
... Less than 0.1% of the documents on the web actually conform to what's in their doctype declaration

David: If the version information is effectively inaccurate then maybe the TAG finding should be qualified to say "accurate" or none at all.

<raman> if you add tag <foo>bar plus bas </foo> to a version of HTML -- then this will change display semantics based on version.

DanC: Or opt out of the "should" because authors of this format are too sloppy.

David: Can someone explain "to use the version information to keep improvements in standards compliance to new versions on the web"?

Some discussion of what it might mean

<Stuart> I note that WebArch states that formats should "provide for" version info. I take that to mean that the format should provide a means for document authors to declare what version (they think) an instance conforms to...

DanC: Imagine that the world thinks HTML 12 is cool and then Gorrillasoft does something stupid with HTML 12 pages.

TV: What's special about version numbers in this case?

<Stuart> ie. it's a statement about providing a means to make a claim in a document.

Henry: Why does the version number make that bad thing happen? Why is that a likely cause?

DanC: Suppose they invent a new, proprietary feature that depends on the version 12 identifier.
... This will cause people to stick the standard version number in there in order to get proprietary behavior.

Henry: People will lie to get into that space, but they'll lie about anything not just version numbers.

DanC: So we should arrange the world so that we aren't encouraging bad behavior.

TV: There are many other uses for version numbers. Killing them just to avoid this is a bad idea because you shut out all the other use cases.
... If your world view is that only browsers matter, then it's a fine argument. But the world is bigger than browsers, isn't it?

DanC: Should ASCII documents have version numbers at the top?

<Zakim> dorchard, you wanted to support Henry's point about validation only part of the discussion

David: I wanted to support Henry's point earlier that the discussion hasn't done a good job of following through all the points.

<DanC> (can we just stipulate that the email discussion is messy and just have whatever discussion we want to have now?)

David: Can we call this "version identifiers" instead of "version numbers"? Because that includes namespaces and other features.

<raman> ascii documents == text/plain --- I've not seen versions of ascii documents that need different processing. Incidentally, IETF RFC documents that are ASCII are indeed distinctively recognizable compared to random strings of octets that don't use the high bit

David: Validation has supporters and detractors. But the various flavors of validation aren't even always well specified. RELAX NG is different from XSD, etc.
... Just being able to identify versions has a whole lot of benefits that haven't been addressed.

<Zakim> DanC, you wanted to think out loud about past/future issues User-Agent: , and the organic alternative, ala gnu autoconf

DanC: The proposition to demote the good practice isn't getting any favor, but I think we should beef it up a bit.

<ht> Here are three posts which list benefits of version numbers which have not been replied to as far as I can see: http://lists.w3.org/Archives/Public/www-tag/2007Mar/0043.html, http://lists.w3.org/Archives/Public/www-tag/2007Apr/0028.html, http://lists.w3.org/Archives/Public/www-tag/2007Apr/0053.html

DanC: Predicting the future often goes bad. Consider POSIX C programs; using the header doesn't really work. What really works is something like GNU autoconf that tests as much of the environment as it can.
... It checks to see what actual behavior exists.
... It's horrible, but it's awfully robust. There's a lot of JavaScript that works this way these days.
... Saying that "this and such identifier means that" hasn't worked out really well in practice.
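
A minimal sketch of the contrast DanC draws, probing for actual behavior rather than trusting a declared identifier. The helper names has_feature and strip_prefix are hypothetical and chosen only for illustration; str.removeprefix is used as the probed feature because it exists only in newer Python releases.

```python
def has_feature(obj, name):
    """Probe for actual behavior instead of trusting a declared version."""
    return callable(getattr(obj, name, None))


def strip_prefix(text, prefix):
    # Use the native method when the probe finds it; otherwise fall back
    # to an equivalent hand-written implementation.
    if has_feature(text, "removeprefix"):
        return text.removeprefix(prefix)
    return text[len(prefix):] if text.startswith(prefix) else text
```

The caller never asks which version it is running on; it only asks whether the behavior it needs is present, which is the autoconf-style approach described above.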

TV: GNU autoconf does this horrible thing. But you didn't have to write it so you just use it and you don't have to worry about it.
... The reason this works is because you can write down all these little tests and you can run them.
... What fixed this was the DejaGNU project that gave the world a declarative syntax for all these options, expressed in M4 in the case of autoconf.
... But if you don't provide any formalisms, which is another thread, then you're really screwed.
... The reason that configuration has been able to proceed is because DejaGNU gave us a formalism that you could hang your hat on.

Stuart: What can the TAG do to help, Dan?

Henry: I don't see any of the arguments put forward in favor of version identification as providing a better foundation for the webarch principle.
... My experience to date is that they don't hear claims of value either from validation, which is sometimes useful, or that there are other uses for version identification.

DanC: My question that started this thread was, mix two things and we get a sloppy result. First proposition: version identifiers are good most of the time. The HTML WG might stipulate that but not want to do it for HTML.

Henry: I think the burden is on the HTML WG to explain why the cost-benefit analysis comes out negative for version identification in the particular case of HTML

DanC: One of the more coherent messages I'm getting is that sometimes version numbers are good and sometimes they aren't and it depends.

Henry: I agree. We're not saying "when I put version identifier banana in my document, it must determine what happens forevermore". We're saying that for those cases, where it has value, there's a standard place to put it.
... For version information to be of any use, it has to be interoperable and for that to happen it has to be in the specification.
... I'm not saying that it has to be required, just that there is a standard notation when I do want to say.

David: You just want a framework for carrying the version information?

DanC: Suppose I said, you must always put the letter "Q" near the top of your document for version information.

Henry: I'd push back on that at Last Call because it doesn't seem like a very good technique.

Some discussion of how this might work and what the minimum bar is.

DanC: So you need a version number, but you don't care if it includes differences from the past or predicts the future.

TV: So the WHAT WG folk will say <!DOCTYPE gives you that.

Henry: That's not sufficient, because there will likely be more versions in the future.

<Stuart> If the versioning mechanism is intrinsic to the document type.... and it varies by version.... how do we get to the version id without already knowing the version id?

DanC: So, Henry, you do want them to say something about the past and predict the future.

<DanC> <!DOCTYPE html>

Henry: !DOCTYPE isn't any good because it doesn't distinguish between versions in the past.
... I don't believe that you will never change the spec in ways that I'm not happy with.
... I want to be able to state unequivocally what version of the standard I authored against.
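
A rough sketch of why the bare doctype carries no version claim, assuming a tool looks for the public identifier used by older HTML doctypes. The function name declared_version and the regular expression are illustrative assumptions, not anything proposed in the discussion.

```python
import re

# Matches the public identifier in doctypes such as the HTML 4.01 one.
DOCTYPE_PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE
)


def declared_version(markup):
    """Return the public identifier from the doctype, or None if absent."""
    match = DOCTYPE_PUBLIC_ID.search(markup)
    return match.group(1) if match else None
```

Under this sketch, an HTML 4.01 Strict doctype yields its public identifier, while `<!DOCTYPE html>` yields nothing an author or tool could use to state which edition of the specification was targeted.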

TV: And why do you want this?

Henry: Here's an example. The last message I cited was from Karl Dubost and he gives several reasons.
... Suppose when I edit this document, I want to stay within the vocabulary of the standard that I originally authored to.
... I want exactly the accelerators that are applicable to the 2008 spec and not the 2010 version.

<DanC> (the "authoring tool versions" requirement seems more like a feature; I suspect there's a more tangible, user-oriented requirement behind it.)

TV outlines the question of why it matters if a future version includes support for some new tag, "foo"

Henry: No. I'm authoring to this version because I have legacy software that only understands this version.
... For backwards compatibility and legacy reasons, I want to be able to stay with an older version.

TV: In fact, you'll just see the tag and use it and bad things will happen downstream.

<Zakim> dorchard, you wanted to say why is for properly doing dispatch

David: I wanted to mention two things: I agree about authoring tools, and I also wanted to mention dispatch.
... The issue of having a version identifier arises when you have software that supports more than one version.
... One strategy is "dispatch at the top" where you look at the version identifier and dispatch the right code path.
... the converse is "late dispatch" and as you progress through you find the places where you care about versions and you do tests in different places in your software.

<ht> HST remembers a quote he uses a lot: "validate at trust boundaries"

David: Dispatch is one really important reason to be able to identify versions.
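
The two strategies David describes can be sketched as follows. The document shape (a dict with "version" and "body" keys) and the two renderers are hypothetical, made up purely to show where the version test sits in each approach.

```python
def render_v1(doc):
    return "v1:" + doc["body"].upper()


def render_v2(doc):
    return "v2:" + doc["body"].title()


RENDERERS = {"1": render_v1, "2": render_v2}


def dispatch_at_top(doc):
    """Choose a whole code path once, from the declared version identifier."""
    renderer = RENDERERS.get(doc["version"])
    if renderer is None:
        raise ValueError("unsupported version: %r" % doc["version"])
    return renderer(doc)


def late_dispatch(doc):
    """Share one code path and test the version only where behavior differs."""
    body = doc["body"]
    # Only this step cares about the version; everything else is common.
    if doc["version"] == "2":
        body = body.title()
    else:
        body = body.upper()
    return "v" + doc["version"] + ":" + body
```

Both functions give the same result for the versions they know about; they differ in what happens with an unknown identifier. Dispatch at the top rejects it outright, while late dispatch quietly falls through to the default branch.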

<ht> The author of that quote: DanC :-)

David: One of the knocks against versions has to do with how they're currently being used.
... The reason that I think people like version numbers is because of issues related to backwards compatibility.
... People are used to the idea of major and minor version numbers.
... But people tend to do straight string comparisons, so we don't get the benefits we might.
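
A small sketch of the difference between comparing version identifiers as strings and comparing them as major/minor numbers. The "major.minor" format and the compatibility rule (same major number, minor number no newer than what the consumer supports) are assumptions for illustration; the function names are hypothetical.

```python
def parse_version(identifier):
    """Split a 'major.minor' identifier into a tuple of integers."""
    major, minor = identifier.split(".")
    return int(major), int(minor)


def is_compatible(document_version, supported_version):
    """Same major number, and a minor number no newer than supported."""
    doc_major, doc_minor = parse_version(document_version)
    sup_major, sup_minor = parse_version(supported_version)
    return doc_major == sup_major and doc_minor <= sup_minor
```

A plain string test treats "1.9" and "1.10" as simply different, and a lexicographic comparison even orders "1.10" before "1.9"; the parsed form lets a consumer that supports 1.10 accept a 1.9 document.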

<DanC> (I recall Ed's advice that we institutionalize the "bump the major number for incompatible changes" pattern, which the XML 1.1 experience supports.)

<Zakim> DanC, you wanted to note that Chris Wilson's argument sounds a lot like this dispatch argument. Baron argues that the HTML did this (quirks-mode) once and shouldn't do it again. I

DanC: The dispatching argument is basically Chris Wilson's argument. They introduced quirks mode and maybe they have three code paths now.
... He's arguing that the HTML WG should allow him to add one more. Others argue that quirks mode hurt and so they shouldn't allow it.
... I think the market dynamics actually dominate a lot of the technical arguments in this case.

TV: The other way to think about this is as a bunch of moving parts: a moving spec and a bunch of moving browsers.
... If you believe that all future specs will be a superset of all previous specs, then you can say that backwards compatibility is handled.
... But there are lots of other moving parts, authoring tools, PHP libraries, JSP libraries, etc.
... Library X of version Y producing HTML version Z.
... I'm not convinced that with that many moving parts you can make something that works without version numbers.

DanC: The argument I hear is that, yeah, it won't work very well. But it won't work much better with version numbers so why bother.

<DanC> (hmm... a version attribute allowed on every element, to accommodate the case where different parts of the doc are produced.)

Henry: Yeah, but where's the harm?

DanC: They are trying to tell you.

Henry: I haven't gotten it yet, can you try again?

DanC: It's got to do with the way authors are motivated to do things for internet explorer. I haven't quite got it.

Stuart: There was a very use-case oriented argument. It doesn't matter what you feel, you need to express the use cases you actually want to solve.
... We should talk about the problems that these design decisions are trying to address.
... Maybe the TAG ought to offer stronger motivation for the WebArch good practice.
... Do we, the TAG, have an obligation to try to provide stronger rationale?

David: I'd support that.
... The way some of this got into WebArch is from an early version of the Versioning finding.
... We could beef it up in the Finding or we could extend WebArch.

TV: We should also keep an open mind, we put it in early, maybe we were wrong and don't really need it.

David: Maybe we could add some notes about understanding your marketplace.

<Stuart> use case centric message was at: http://lists.w3.org/Archives/Public/public-html/2007JanMar/0440.html

DanC: The folks against version numbers claim that the whole world evolves HTML together.

David: That doesn't make much sense to me.

DanC: 99.9999% of the world doesn't even know if there are different versions of HTML.

<Zakim> ht, you wanted to support the idea of trying to identify dimensions

TV: And if they have, then it's only about different behavior in browsers.

Henry: I'd like to support the idea that we ought to try to understand it by stipulating that we're wrong and the HTML WG is right. It's not always a good idea to identify versions. In that case, then it's our job to identify the dimensions in which languages might differ.
... Then we can say how different classifications are related to whether version identifiers are or are not a good thing.

DanC: The dimensions are extrinsic, it's about the marketplace.

Henry: I don't know if programming languages are an example or a counter example.

<DanC> (I'd like to hear/learn more about the 1.4/1.5 transition)

<DanC> (I'm pretty familiar with the python version transitions. very, very messy.)

DanC: That boils down to compilers being smart enough to put version numbers in the .o files but programmers not being able to put it in the source.

TV: I've heard both arguments for Python.

DanC: On the flip side, that penalizes everyone who has stuff that does work with the new version automatically.

TV: The other Python trick is the "from __future__ import" hack.
... That enables future features but becomes a nop when the new version finally ships.

Stuart: Do we have any action items to take away?

<DanC> (it's not "shall". it's "should")

TV: I don't think we understand the problem deeply enough yet to make profound architectural statements about it.

Scribe lost that in echo

David: The current Versioning finding assumes that you want to do versioning and then talks about how you can implement a policy.
... It doesn't really attempt to justify why you'd want version identifiers in the first place.

Stuart: Anything else we can do today?
... Is this something we should come back to, and if so how and when?

DanC: I think we should come back, but I think it'll come up again naturally.
... One possibility is to review the design principles.

<DanC> possibility: review http://esw.w3.org/topic/HTML/ProposedDesignPrinciples

TV: Brokenness is in the eye of the beholder.

DanC: I prefer "honor existing content", but that's too short too.
... There's a one-paragraph description on the wiki. There's a version a couple of pages long by David Baron too.

<ht> The obvious tension is that "honor existing content" makes the web vulnerable to Gresham's Law. . .

DanC: I haven't argued against the things that I disagree with yet

Henry: I hope we can come back to the "honor existing content" principle.

<DanC> (Gresham's Law? "bad money drives out good". hm. new to me.)

Stuart: I think it's probably well covered by both issues 41 and 54.

<DanC> (to the extent I understand it, issue 54 is all about Gresham's Law.)

<Rhys> draft is at http://lists.w3.org/Archives/Member/tag/2007Mar/att-0063/HttpRange-14.html

Stuart: Rhys has put together a draft for a Finding on httpRange-14.

<ht> Gresham's law (about counterfeiting): "Bad money drives out good" -- when counterfeiting is common, people hoard non-counterfeit money, so the portion of counterfeit in circulation rises rapidly

Rhys: Noah gave me some feedback that I think needs to be incorporated before we go public.
... The general feeling I'm getting is that it probably is getting ready for public review.

<DanC> (yet another example of how economics (and information theory) are more relevant to Web Arch (and Internet Arch) than traditional design reflects.)

Rhys: There's a set of open questions that I have in editorial notes that I think would benefit from public discussion.

Stuart: Any objection to putting it in the public?

DanC: On the contrary...

Stuart: Rhys, please put it out whenever you're ready.

Any other business?

We're meeting on 23 Apr unless Norm sends a cancellation notice on Friday.


Summary of Action Items

[NEW] ACTION: Stuart to inform TimBL so that logistics can be resolved. [recorded in http://www.w3.org/2007/04/16-tagmem-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.128 (CVS log)
$Date: 2007/04/16 17:29:18 $