XML/HTML Task Force -- 21 Dec 2010

Date: 21 December 2010

<scribe> Agenda: N/A

<scribe> Meeting: 1

<scribe> Scribe: Norm

<scribe> ScribeNick: Norm

Norm attempts to describe some of the background of the task force.

MC: TV Raman led a discussion on AC-Forum back in the April time-frame.

James: Perhaps someone could make that discussion public, as I don't have member access.

MC: It may all have been copied to www-tag

<scribe> ACTION: Norm to review the ac-forum mail and see if he can summarize what wasn't made public. [recorded in http://www.w3.org/2010/12/21-html-xml-minutes.html#action01]

<hsivonen> see also the tag list (as opposed to www-tag)

<hsivonen> are we going to use the queue?

yes, henri, sorry

Norm decides to minute this telcon fairly lightly.

Some discussion of what we imagine the TAG's goal to have been in creating the task force.

Henri observes that there are two plausible goals: adding namespaces to HTML and making it possible to parse HTML with an XML parser.

Henri: It appears that the popularity of namespaces is waning even in the XML community, so it doesn't make sense to add it to HTML.
... And it seems unlikely that the majority of HTML authors are going to produce XML-well-formed content, so that's not likely to be broadly successful.

<jcowan> +1 to Henri's points

Henri: I think something like TagSoup or my HTML5 parser, which exposes an XML stream from HTML5, is an approach that is more likely to succeed.

<hsivonen> for the record, I think neither goal is "plausible" as a goal to pursue. they are goals I've heard from TAG members. :-)

JJC: Two goals expressed to me: figure out how to use an XML toolchain to produce web pages and in the future how to reduce the divergence.
... Looking forward ten or twelve years, I think we should be thinking about how to make things better in the long run.

<jcowan> We already know how people process HTML as XML: they use TagSoup or Tidy or NekoHTML.

JCowan: I think convergence has a use beyond parsing the wild XML; it's true it only works in closed contexts, but there are a lot of those.
... the ability to embed HTML as a rich text island in "data XML" is a valuable thing and I think there should be a standard way to do this.
... Polyglot documents focus on XML validity which I'm inclined to think is less valuable than it used to be. I'm more interested in XML well-formedness and HTML validity.
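
Illustrative note (not from the call): one existing way to carry an HTML "rich text island" inside data-oriented XML is to put the island in the XHTML namespace. The sketch below uses Python's standard library; the `product`, `price`, and `description` elements and the `sku` attribute are hypothetical names chosen for the example, not a proposed standard.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical "data XML" record; only the description holds rich text.
record = ET.Element("product", {"sku": "A-100"})
ET.SubElement(record, "price", {"currency": "EUR"}).text = "19.90"
description = ET.SubElement(record, "description")

# The rich text island: ordinary HTML vocabulary, in the XHTML namespace.
p = ET.SubElement(description, f"{{{XHTML}}}p")
p.text = "A "
em = ET.SubElement(p, f"{{{XHTML}}}em")
em.text = "sturdy"
em.tail = " widget."

# Round-trip through an XML serializer and an XML parser.
xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)

# A data consumer reads the fields; a renderer can pick out the island.
island = parsed.find(f"description/{{{XHTML}}}p")
island_text = "".join(island.itertext())
```

The result is well-formed XML throughout, and the namespace lets a consumer tell the HTML island apart from the surrounding data elements.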

Yves: During the last TAG f2f we discussed the issue. I remember Raman saying that having two different stacks, one for XML and one for HTML, was costing a lot to all parties involved.
... He wanted more compatibility between tools and libraries.
... At least that was my understanding.

Henri: Two points: first, it sounds like the existence of XHTML5 is getting forgotten. The HTML5 WG is already defining XHTML5 alongside HTML5. There's already a way to express the whole HTML5 vocabulary in XML.
... The main difference is that you can have namespaces that the parser can't output. There are some fringe differences that you can have in HTML but not in XML, for example the FF character is whitespace in HTML but not XML.
... So you can do distributed extensibility with HTML and you can embed HTML in XML with XHTML5.
... Second, the question about software stacks, I think the problem is that people think that we're adding stuff when they see HTML5. But it doesn't add a stack, it documents the existing stack.
... XML is the second stack, but it's not useful to point fingers about which is first or second, except to recognize that HTML5 isn't adding stuff.
... Both stacks are more than a decade old, so neither is being added. One is simply being documented at this point. I think it's way past the point of avoiding adding a second stack.
... There are already at least three stacks and different communities: HTML, XML, and RDF. Treating the situation as if something is being added isn't really productive, I don't think.
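
Illustrative note (not from the call): the form feed difference Henri mentions can be observed with Python's standard library. This is a rough sketch. `html.parser` is a tolerant HTML tokenizer, not a conforming HTML5 parser, and is used here only as a stand-in. In XML 1.0 the form feed (U+000C) is not a permitted character at all, so an XML parser rejects the document.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTags(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def html_start_tags(markup):
    """Tokenize markup as HTML and return the start tags found."""
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.seen


def xml_accepts(markup):
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Same element, with a form feed or a space between tag name and attribute.
FF_MARKUP = '<p\x0cclass="a">text</p>'
SPACE_MARKUP = '<p class="a">text</p>'

ff_as_html = html_start_tags(FF_MARKUP)   # form feed acts as a separator
ff_as_xml = xml_accepts(FF_MARKUP)        # rejected by the XML parser
space_as_xml = xml_accepts(SPACE_MARKUP)  # accepted
```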

JCowan: While those are all valid points, it seems to me that characterizing browser behavior as a stack makes it a kind of truncated stack. It simply renders. There's no transformation facility or other post-processing steps that can intervene.

Henri: The situation before the HTML5 spec is that IE was implementing DOM Level 1 so IE didn't recognize DOM Level 2 in the implementation sense. But Gecko, Presto, and WebKit were implementing DOM Level 2.
... So in all browsers except IE, the view to the data model has been the same for years. There were inconsistencies across the XML/HTML data models, especially with respect to namespaces.
... HTML5 has codified the resolution of these inconsistencies. Now the data model is the same for XML or HTML, with a few small differences in the details.
... Once the parser is done, the data model is the same now. That's something that's an achievement of HTML5. The same approach already existed on the non-browser side.
... First TagSoup and now HTML5-conformant parsers provide the same kind of API for both XML and HTML5. So I think we've gone a long way to unify the data model.
... This means that as far as the stack goes, we've already done much of the unification. You can, for example, use an XSLT engine on HTML5 using the output of my HTML5 parser. It just works, whether the input is XML or HTML5.
... I think it's a win that the stack is shallow, limited just to the parser and the serializer.
... The question is can we unify the parser and the serializer? I think we could unify the serializer, but it seems unlikely to me that we can get more unification on the parser side. It would do violence to one side or the other.
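
Illustrative note (not from the call): the idea that an HTML parser can feed the same tree API that XML tools use can be sketched with Python's standard library. This is a deliberately crude stand-in and not Henri's parser or TagSoup. It does not implement the HTML5 tree-construction rules, it assumes a single root element, and the `SoupToTree` class name is hypothetical. It only shows that once the parser has run, ordinary XML tooling works on the result.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have an end tag in HTML syntax.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Feed HTML tokens into an ElementTree builder (simplified)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as "".
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def to_tree(self, markup):
        self.feed(markup)
        self.close()
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


# Not well-formed XML: unquoted attribute, bare <br>, missing </p>.
tree = SoupToTree().to_tree("<div><p class=note>Hello<br>world</div>")

# From here on, only XML-side tooling is used.
note_class = tree.find("p").get("class")
as_xml = ET.tostring(tree, encoding="unicode")
reparsed = ET.fromstring(as_xml)
```

The serialized form parses cleanly with an XML parser even though the input did not, which is the sense in which the remaining differences are confined to the parser and the serializer.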

Norm: I sometimes struggle to see what we should do. On the one hand, long-term harmonization seems like it would be good; on the other, in the short term, Henri's HTML5 parser and an HTML5 serializer do sort of "fix" the problem of how to read/write HTML5/XML together.

JCowan: That makes me think that a possible outcome is a set of recommendations for the XML toolset to be able to serialize HTML5 instead of the current HTML serializer which is incomplete.

<hsivonen> XSLT should definitely get an HTML5 output mode

Norm: Yes, clearly the XML serialization spec could/would/should/will get an "HTML5" serialization method.

MKay: Yes. We decided a year ago that it was too early to start looking at that; if we looked again now we might feel differently.
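
Illustrative note (not from the call): the effect of a separate HTML output method can be seen in Python's standard library, where `xml.etree.ElementTree.tostring` accepts `method="xml"` or `method="html"`. This is only a loose analogue of the serialization methods discussed above, and Python's `html` method is not an HTML5 serializer. The sketch just shows the same tree written out under the two sets of rules.

```python
import xml.etree.ElementTree as ET

# One in-memory tree: a paragraph containing a line break.
p = ET.Element("p")
p.text = "line one"
br = ET.SubElement(p, "br")
br.tail = "line two"

# XML rules: the empty element is written as a self-closing tag.
as_xml = ET.tostring(p, encoding="unicode", method="xml")

# HTML rules: void elements such as br get a start tag only.
as_html = ET.tostring(p, encoding="unicode", method="html")
```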

James: I don't agree with Henri; I think there's plenty that one can do to make things better. But the way to go forward on that is probably to make some concrete use cases as Noah suggested.

Norm: Yes, perhaps some use cases would be a good work item.

MKay: I think one of the use cases is the one John Cowan mentioned, that is handling files that are data rich but include rich textual parts.
... The other is the inverse of that, rich textual files that contain data either XML or RDF. Whether it's an existing XML vocabulary or a new one or a user defined one.
... An important part of that is looking not just at the formats on the wire but also at the programming experience: both in generation and consuming/rendering.
... We need to look at that whole picture from the perspective of processing, not just syntax on the wire.

Henri: Do you mean browsers providing a way to edit non-HTML data natively? Or do you mean JavaScript that might provide editing for the private data?

MKay: I mean the whole spectrum from wikis and form-based data across the whole spectrum.

Henri: The editing story for HTML is actually rather bad in terms of what actually works. I wouldn't expect browsers to be interested in addressing problems beyond editing HTML5 and perhaps SVG for a long time because they've already got lots of issues.

MKay: So there's room for improvement?

Henri: Yes, but I wouldn't expect generic editing to become part of the browser feature set anytime soon beyond what comes along naturally.

MKay: Perhaps architecturally what we'll see is editors as a client tool become a separate kind of tool from browsers.

Henri: I'd expect editing in the browser to be custom JavaScript.

Norm: What can we glean from the past 40 minutes or so for next steps?
... use cases seems like a possibility.

MChampion: I had some good conversations at TPAC about some specific problems.
... Could we write down and triage some of those?

Henri: Terminology-wise, "foreign" means MathML and SVG.

Norm: Is there a term for random XML?

Henri: No, because it's not possible in text/html.
... The specific issue that David Carlisle mentioned is about non-intuitive error handling.
... If you stick to the cases where HTML5 is expected in foreign markup, then things work ok now.
... The error handling isn't intuitive if you put them elsewhere.

JCowan: And is it too late to fix this in HTML5?

Henri: It's not a bug, it's a feature. It minimizes the risk of getting MathML and SVG support deployed in browsers.
... There is existing web content that contains math or svg tags. In order to keep those pages more-or-less backwards compatible, we have to have the current rules.

<jjc> +q

Henri: The counter-intuitive behavior only arises if the document is an error. If you try to do sensible stuff, you don't see this behavior.
... Even if we decided it was a problem, it would be too late to fix it. It's already shipping in Chrome and will ship in Firefox 4.

<jcowan> This spec is

James: I'm troubled by this idea that there's nothing that can be changed in HTML5. HTML5 is a WD; if the W3C process means anything, the idea that something is frozen and static before it gets into last call is off base.
... I also completely disagree that one has to be constrained by what existing browsers do. There used to be two modes but folks have judged that that's not good. But the case could be made for the other decision.
... The idea that there should be one mode and standards mode should be quirky is very disappointing.

JCowan: I think there's a distinction between prospective and retrospective standardization. This is retrospective standardization and that does make things less fixable.
... This may come to an end at some point, but I don't think it's appropriate to complain that they're not behaving like a prospective standardization group. They aren't because that's not where we are.

Henri: As far as the process goes, I think the W3C process is out of touch with reality as far as the implementation overlap with the specification process goes.
... In theory you're supposed to start implementing after CR. But in practice, for something as complex as a browser, you need to have a constant feedback cycle.
... It's unfortunate that the process document doesn't recognize this.
... It seems that the HTML5 WG gets more scrutiny on this point; I think the problem isn't the WG but the process document.
... About the modes: there's a big difference between browser vendors on this point. In IE8, there are 4 modes; I think there are 7 in IE9. Other vendors, with the experience of having 2.5 or 3 modes, have been pushing to remove modes.

<hsivonen> http://hsivonen.iki.fi/doctype/#ie8

Henri: I think it's unrealistic for a WG or process to impose modes. Doing HTML5 with no new modes is how it has to be.

ack

<MikeK> I regret I have to leave you now for another call. I'll stick around on IRC

MChampion: I'd like to address Henri's point. This is implementation feedback, this is rapid integration with the waterfall model. There's a problem with real use cases. This isn't even a LC WD; in principle it should be open to a bug report from the XML community saying that this isn't going to work, especially if a reasonable fix was proposed.
... I think it would be reasonable for this TF to triage the problem report. Does it affect enough users? Is it worth fixing, even if it introduces some churn in the HTML5 spec?
... I wouldn't propose or preclude any particular solution. The mission I'd like to see for this TF is to assess how severe the problem is and to see if a solution can be proposed.
... It may be too hard to change, but I don't think we should make that decision a priori.

Norm: We're losing folks.

Adjourned.
