See also: IRC log
Chair apologizes for lack of agenda and careful planning in prep for this meeting. Expresses goal as simply an initial meeting, continuing the conversations that have started on the list about our goals as a task force.
MKay: Could you give us some background?
Norm attempts to describe some of the background of the task force. It arose from the TAG issue HTML-XML-Divergence-67.
MC: TV Raman led a discussion on AC-Forum back in the April time-frame.
James: Perhaps someone could make that discussion public, as I don't have member access.
MC: It may all have been copied to www-tag
<scribe> ACTION: Norm to review the ac-forum mail and see if he can summarize what wasn't made public. [recorded in http://www.w3.org/2010/12/21-html-xml-minutes.html#action01]
<hsivonen> see also the tag list (as opposed to www-tag)
Scribe struggles to work out the right level of detail for scribing this meeting. Probably unsuccessfully.
Some discussion of what we imagine the TAG's goal to have been in creating the task force.
Henri observes that there are two plausible goals: adding namespaces to HTML and making it possible to parse HTML with an XML parser.
Henri: It appears that the popularity of namespaces is waning even in the XML community, so it doesn't make sense to add them to HTML.
... And it seems unlikely that the majority of HTML authors are going to produce XML-well-formed content, so that's not likely to be broadly successful.
<jcowan> +1 to Henri's points
Henri: I think something like TagSoup, or my HTML5 parser that exposes an XML stream from HTML5, is an approach more likely to be successful.
<hsivonen> for the record, I think neither goal is "plausible" as a goal to pursue. they are goals I've heard from TAG members. :-)
James: Two goals were expressed to me: figuring out how to use an XML toolchain to produce web pages and, in the future, how to reduce the divergence.
... Looking forward ten or twelve years, I think we should be thinking about how to make things better in the long run.
<jcowan> We already know how people process HTML as XML: they use TagSoup or Tidy or NekoHTML.
JCowan: I think convergence has a use beyond parsing the wild web; it's true that it only works in closed contexts, but there are a lot of those.
... The ability to embed HTML as a rich text island in "data XML" is a valuable thing, and I think there should be a standard way to do this.
... Polyglot documents focus on XML validity, which I'm inclined to think is less valuable than it used to be. I'm more interested in XML well-formedness and HTML validity.
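As an illustration of the "rich text island" idea (not something shown at the meeting), here is a minimal Python sketch. It assumes a hypothetical `product` record whose `description` element holds a `p` element in the XHTML namespace, and uses only the standard-library `xml.etree.ElementTree`. The record format and the `rich_text_island` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical "data XML" record with an XHTML rich-text island in <description>.
RECORD = """<product id="w1">
  <name>Widget</name>
  <description><p xmlns="http://www.w3.org/1999/xhtml">A <em>very</em> useful widget.</p></description>
</product>"""


def rich_text_island(record_xml: str) -> ET.Element:
    """Return the XHTML <p> element embedded in the record's <description>."""
    root = ET.fromstring(record_xml)
    # ElementPath accepts {namespace}local names inside a path step.
    island = root.find(f"description/{{{XHTML_NS}}}p")
    if island is None:
        raise ValueError("no XHTML island found")
    return island


p = rich_text_island(RECORD)
print(p.tag)                  # {http://www.w3.org/1999/xhtml}p
print("".join(p.itertext()))  # A very useful widget.
# Re-serialize just the island, with XHTML as the default namespace.
print(ET.tostring(p, encoding="unicode", default_namespace=XHTML_NS))
```

The data part (`name`) and the rich-text part live in one well-formed document; a consumer that understands XHTML can hand the island to an HTML-aware renderer unchanged.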
Yves: During the last TAG f2f we discussed the issue. I remember that Raman said that having two different stacks, one for XML and one for HTML, was costing a lot to all parties involved.
... He wanted more compatibility between tools and libraries.
... At least that was my understanding.
Henri: Two points. First, it sounds like the existence of XHTML5 is getting forgotten. The HTML5 WG is already defining XHTML5 alongside HTML5. There's already a way to express the whole HTML5 vocabulary in XML.
... The main difference is that you can have namespaces that the parser can't output. There are some fringe differences that you can have in HTML but not in XML; for example, the FF character is whitespace in HTML but not in XML.
... So you can do distributed extensibility with HTML, and you can embed HTML in XML with XHTML5.
... Second, on the question about software stacks, I think the problem is that people think that we're adding stuff when they see HTML5. But it doesn't add a stack; it documents the existing stack.
... XML is the second stack, but it's not useful to point fingers about which is first or second, except to recognize that HTML5 isn't adding stuff.
... Both stacks are more than a decade old, so neither is being added. One is simply being documented at this point. I think it's way past the point of avoiding adding a second stack.
... There are already at least three stacks and different communities: HTML, XML, and RDF. Treating the situation as if something is being added isn't really productive, I don't think.
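Henri's form feed example can be checked directly. The following is a minimal sketch, assuming Python's expat-backed `xml.etree.ElementTree` as the XML parser. The two character sets transcribe HTML5's space characters and the XML 1.0 `S` production. Note that U+000C is not merely non-whitespace in XML 1.0: it is not an allowed character at all, so the parser rejects the document outright. The helper name `xml_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

# HTML5 space characters: SPACE, TAB, LF, FF, CR.
HTML_SPACE = frozenset("\x20\x09\x0a\x0c\x0d")
# XML 1.0 production S: SPACE, TAB, LF, CR (no form feed).
XML_SPACE = frozenset("\x20\x09\x0a\x0d")


def xml_well_formed(text: str) -> bool:
    """True if Python's expat-based parser accepts the text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


FORM_FEED = "\x0c"
print(FORM_FEED in HTML_SPACE)             # True
print(FORM_FEED in XML_SPACE)              # False
print(xml_well_formed("<p>a b</p>"))       # True
print(xml_well_formed("<p>a\x0cb</p>"))    # False: U+000C is not an XML 1.0 Char
```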
JCowan: While those are all valid points, it seems to me that characterizing browser behavior as a stack makes it a kind of truncated stack. It simply renders. There's no transformation facility or other post-processing step that can intervene.
Henri: The situation before the HTML5 spec was that IE was implementing DOM Level 1, so IE didn't recognize DOM Level 2 in the implementation sense. But Gecko, Presto, and WebKit were implementing DOM Level 2.
... So in all browsers except IE, the view of the data model has been the same for years. There were inconsistencies across the XML/HTML data models, especially with respect to namespaces.
... HTML5 has codified the resolution of these inconsistencies. Now the data model is the same for XML or HTML, with a few small differences in the details.
... Once the parser is done, the data model is now the same. That's an achievement of HTML5. The same approach already existed on the non-browser side.
... First TagSoup and now HTML5-conformant parsers provide the same kind of API for both XML and HTML5. So I think we've gone a long way toward unifying the data model.
... This means that as far as the stack goes, we've already done much of the unification. You can, for example, use an XSLT engine on HTML5 using the output of my HTML5 parser. It just works, whether the input is XML or HTML5.
... I think it's a win that the stack is shallow, limited just to the parser and the serializer.
... The question is: can we unify the parser and the serializer? I think we could unify the serializer, but it seems unlikely to me that we can get more unification on the parser side. It would do violence to one side or the other.
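Henri's point that the data model is the same once the parser is done can be sketched in Python, under a loud assumption. The standard-library `html.parser` is not an HTML5-conformant parser (it has none of the HTML5 tree-construction rules), so the hypothetical `HTMLToElementTree` adapter below handles void elements and unclosed tags only naively. What it shows is the shape of the idea: `link_targets` is written once against ElementTree and does not care which parser built the tree, much as an XSLT engine can consume the output of an HTML5 parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements never take an end tag in the HTML syntax.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HTMLToElementTree(HTMLParser):
    """Hypothetical adapter feeding HTMLParser events into an ET.TreeBuilder.

    Not HTML5 tree construction: it only auto-closes void elements,
    ignores stray end tags, and closes unclosed tags at the next
    matching end tag or at end of input.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:  # stray end tags are ignored
            while True:
                top = self.open_tags.pop()
                self.builder.end(top)
                if top == tag:
                    break

    def handle_data(self, data):
        self.builder.data(data)

    def result(self):
        self.close()
        while self.open_tags:  # implicitly close whatever is left open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def parse_html(text):
    parser = HTMLToElementTree()
    parser.feed(text)
    return parser.result()


def link_targets(root):
    """Works on any ElementTree element, whichever parser produced it."""
    return [a.get("href") for a in root.iter("a")]


# HTML syntax: unclosed <p>, void <br> without a slash.
HTML_SRC = '<html><body><p>See <a href="/a">A</a> and <a href="/b">B</a><br></body></html>'
# XML syntax for the same content.
XML_SRC = '<html><body><p>See <a href="/a">A</a> and <a href="/b">B</a><br/></p></body></html>'

print(link_targets(parse_html(HTML_SRC)))    # ['/a', '/b']
print(link_targets(ET.fromstring(XML_SRC)))  # ['/a', '/b']
```

In a real toolchain the adapter would be replaced by a conformant HTML5 parser; the downstream code would stay the same.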
Norm: I sometimes struggle to see what we should do. On the one hand, long-term harmonization seems like it would be good; on the other, in the short term, Henri's HTML5 parser and an HTML5 serializer do sort of "fix" the problem of how to read/write HTML5/XML together.
JCowan: That makes me think that a possible outcome is a set of recommendations for the XML toolset to be able to serialize HTML5, instead of relying on the current HTML serializer, which is incomplete.
<hsivonen> XSLT should definitely get an HTML5 output mode
Norm: Yes, clearly the XML serialization spec could/would/should/will get an "HTML5" serialization method.
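The following minimal sketch shows why a dedicated HTML serialization method matters. It uses Python's ElementTree rather than an XSLT processor. ElementTree's `method="html"` is a legacy HTML serializer, not the "HTML5" method under discussion, but it exhibits the kind of difference such a method must handle. The XML output writes an empty `script` as `<script src="app.js" />`, which a text/html parser does not treat as closed, so the rest of the page would be swallowed as script content.

```python
import xml.etree.ElementTree as ET

# One tree containing a void element (<br>) and an empty non-void element (<script>).
page = ET.Element("div")
ET.SubElement(page, "br")
ET.SubElement(page, "script", src="app.js")

as_xml = ET.tostring(page, encoding="unicode", method="xml")
as_html = ET.tostring(page, encoding="unicode", method="html")

print(as_xml)   # <div><br /><script src="app.js" /></div>
print(as_html)  # <div><br><script src="app.js"></script></div>
```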
MKay: Yes. We decided a year ago that it was too early to start looking at that, if we looked again now we might feel differently.
James: I don't agree with Henri; I think there's plenty that one can do to make things better. But the way to go forward on that is probably to make some concrete use cases as Noah suggested.
Norm: Yes, perhaps some use cases would be a good work item.
MKay: I think one of the use cases is the one John Cowan mentioned, that is, handling files that are data-rich but include rich textual parts.
... The other is the inverse of that: rich textual files that contain data, either XML or RDF, whether in an existing XML vocabulary, a new one, or a user-defined one.
... An important part of that is looking not just at the formats on the wire but also at the programming experience, both in generation and in consuming/rendering.
... We need to look at that whole picture from the perspective of processing, not just syntax on the wire.
Henri: Do you mean browsers providing a way to edit non-HTML data natively? Or do you mean JavaScript that might provide editing for the private data?
MKay: I mean the whole spectrum, from wikis to form-based data.
Henri: The editing story for HTML is actually rather bad in terms of what actually works. I wouldn't expect browsers to be interested in addressing problems beyond editing HTML5 and perhaps SVG for a long time because they've already got lots of issues.
MKay: So there's room for improvement?
Henri: Yes, but I wouldn't expect generic editing to become part of the browser feature set anytime soon beyond what comes along naturally.
MKay: Perhaps architecturally what we'll see is editors as a client tool become a separate kind of tool from browsers.
Henri: I'd expect editing in the browser to be custom JavaScript.
Norm: What can we glean from the past 40 minutes or so for next steps?
... Use cases seem like a possibility.
MChampion: I had some good conversations at TPAC about some specific problems.
... Could we write down and triage some of those?
Henri: Terminology-wise, "foreign" means MathML and SVG.
Norm: Is there a term for random XML?
Henri: No, because it's not possible in text/html.
... The specific issue that David Carlisle mentioned is about non-intuitive error handling.
... If you stick to the cases where HTML5 is expected in foreign markup, then things work OK now.
... The error handling isn't intuitive if you put them elsewhere.
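To make the error-handling rule concrete, here is a simplified Python model of the "breakout" behavior in foreign content. It is written from a reading of the current WHATWG HTML parsing algorithm, and its tag list may differ from the 2010 drafts discussed here. It assumes a flat list of nested start tags (no end tags, no attributes) and ignores the `font` special case and MathML integration points. A listed HTML start tag such as `p` inside `svg` closes the SVG subtree and lands back in HTML, unless it appears at an HTML integration point such as `foreignObject`, which is the case Henri describes as working. The function `assign_namespaces` is hypothetical.

```python
# Start tags that, inside SVG or MathML content, make the HTML parser
# abandon the foreign subtree (simplified "breakout" rule).
# Not modeled: <font> with color/face/size attributes, end tags </br> and </p>.
BREAKOUT_TAGS = {
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
    "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
    "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
    "table", "tt", "u", "ul", "var",
}

# SVG elements whose children are parsed with HTML rules (HTML integration points).
SVG_HTML_INTEGRATION_POINTS = {"foreignObject", "desc", "title"}


def _children_parsed_as_html(entry):
    tag, ns = entry
    return ns == "html" or (ns == "svg" and tag in SVG_HTML_INTEGRATION_POINTS)


def assign_namespaces(start_tags):
    """Return (tag, namespace, parent_tag) for each start tag.

    Each tag is assumed to nest inside the one before it; there are no end tags.
    """
    stack = []  # open elements as (tag, namespace), outermost first
    result = []
    for tag in start_tags:
        if not stack or _children_parsed_as_html(stack[-1]):
            # HTML rules: <svg>/<math> open foreign content, anything else is HTML.
            ns = {"svg": "svg", "math": "mathml"}.get(tag, "html")
        elif tag in BREAKOUT_TAGS:
            # Breakout: pop foreign elements until an HTML element or
            # integration point, then treat the tag as HTML.
            while stack and not _children_parsed_as_html(stack[-1]):
                stack.pop()
            ns = "html"
        else:
            # Any other tag stays in the current foreign namespace.
            ns = stack[-1][1]
        parent = stack[-1][0] if stack else None
        result.append((tag, ns, parent))
        stack.append((tag, ns))
    return result


print(assign_namespaces(["div", "svg", "g", "p"]))
# [('div', 'html', None), ('svg', 'svg', 'div'), ('g', 'svg', 'svg'), ('p', 'html', 'div')]
print(assign_namespaces(["div", "svg", "foreignObject", "p"]))
# [('div', 'html', None), ('svg', 'svg', 'div'), ('foreignObject', 'svg', 'svg'), ('p', 'html', 'foreignObject')]
```

In the first call the `p` escapes the SVG subtree entirely and becomes a child of the `div`. This is the non-intuitive outcome for authors who expected it to stay inside the drawing.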
JCowan: And is it too late to fix this in HTML5?
Henri: It's not a bug, it's a feature. It minimizes the risk involved in deploying MathML and SVG support in browsers.
... There is existing web content that contains math or svg tags. In order to keep those pages more or less backwards compatible, we have to have the current rules.
Henri: The counter-intuitive behavior only arises if the document is in error. If you try to do sensible stuff, you don't see this behavior.
... Even if we decided it was a problem, it would be too late to fix it. It's already shipping in Chrome and will ship in Firefox 4.
James: I'm troubled by this idea that there's nothing that can be changed in HTML5. HTML5 is a WD; if the W3C process means anything, the idea that something is frozen and static before it gets to Last Call is off base.
... I also completely disagree that one has to be constrained by what existing browsers do. There used to be two modes, but folks have judged that that's not good. But the case could be made for the other decision.
... The idea that there should be one mode and standards mode should be quirky is very disappointing.
JCowan: I think there's a distinction between prospective and retrospective standardization. This is retrospective standardization, and that does make things less fixable.
... This may come to an end at some point, but I don't think it's appropriate to complain that they're not behaving like a prospective standardization group. They aren't, because that's who they [the HTML WG] are.
Henri: As far as the process goes, I think the W3C process is out of touch with reality regarding how implementation overlaps with the specification process.
... In theory you're supposed to start implementing after CR. But in practice, for something as complex as a browser, you need a constant feedback cycle.
... It's unfortunate that the process document doesn't recognize this.
... It seems that the HTML5 WG gets more scrutiny on this point; I think the problem isn't the WG but the process document.
... About the modes: there's a big difference between browser vendors on this point. In IE8, there are 4 modes; I think there are 7 in IE9. Other vendors, with the experience of having 2.5 or 3 modes, have been pushing to remove modes.
<hsivonen> http://hsivonen.iki.fi/doctype/#ie8
Henri: I think it's unrealistic for a WG or process to impose modes. Doing HTML5 with no new modes is how it has to be.
<MikeK> I regret I have to leave you now for another call. I'll stick around on IRC
MChampion: To address Henri's point: this is implementation feedback; this is rapid integration with the waterfall model. There's a problem with real use cases. This isn't even a LC WD; in principle it should be open to a bug report from the XML community saying that this isn't going to work, especially if a reasonable fix was proposed.
... I think it would be reasonable for this TF to triage the problem report. Does it affect enough users? Is it worth fixing, even if it introduces some churn in the HTML5 spec?
... I wouldn't propose or preclude any particular solution. The mission I'd like to see for this TF is to assess how severe the problem is and to see if a solution can be proposed.
... It may be too hard to change, but I don't think we should make that decision a priori.
Norm: We're losing folks.
Adjourned.