XML/HTML Task Force -- 04 Jan 2011

Accept this agenda?

http://www.w3.org/2010/html-xml/2011/01/04-agenda

Accepted.

Accept minutes from the previous meeting?

-> http://www.w3.org/2010/html-xml/12/21-minutes.html

Accepted.

Next meeting: 11 January 2011

No regrets heard.

Welcome to new members

Welcome Anne and TV.

Use cases and articulating goals

MKay: There's been a bit of traffic over the holiday.
... I've only skimmed it.

Henri: One thing I'd like to highlight is my observation that HTML5 has been converging standards mode and quirks mode, and the HTML and XML data models.
... One thing to keep in mind is that trying to make progress by adding new modes may lead to divergence in other areas.

Norm: In what sense are new modes convergence?

Henri: Last time we talked about the possibility of a new mode that would make the parser more XML-like. I'd like to point out that that would be divergent from the legacy code path.
... Convergence in one place may cause divergence in another place. Trying to converge just with XML, ignoring legacy HTML, would be taking a step backwards.

Norm: I think that's a fair point.
... I do think the email became a bit argumentative and I was tempted to try to correct that but decided instead to let it play out.

<Zakim> noah, you wanted to suggest summarizing the thread

Noah: I had tuned out for a bit, but came back to it a few days ago.
... I think someone could pretty succinctly set down the points of the arguments.
... Distributed extensibility is good or bad, in or out of the spirit of a certain kind of markup.
... Or what matters is really the code or broader matters.
... We should try to capture them, without taking sides. Then as the discussion moves forward there will be tradeoffs to make.
... Some folks will remain in favor of or opposed to certain points regardless, but we could capture those points.

Norm: Yes, that sounds like a good idea.

Noah: I think that might also clear the email thread.
... I think opinions fall on a scale: the degree to which the user agent community represents a definitive or only a significant fraction of what we need to worry about.

Henri: I prefer to go over the use cases and start there rather than listing disagreements about things like distributed extensibility which isn't a use case in and of itself.

<noah> FWIW, I agree with Henri on focusing on use cases. I was just observing that a lot of the email thread was thrashing on pros and cons of distributed extensibility, and I think we can net out the disagreement and move on. For now.

MChampion: I agree that focusing on the use cases is a good place to start. Identifying old arguments may be a good idea, but I like where [Norm] started. It may be more important to converge with one side or another, but that's not a filter on the list of use cases.

<hsivonen> http://lists.w3.org/Archives/Public/public-html-xml/2010Dec/0064.html

Use cases

Norm: Use case 1 was an XML toolchain to consume HTML5 content

Norm attempts to summarize the world.

Henri: In this case, if the application receives application/xhtml+xml content, it should instantiate an XML parser and if it gets text/html, it should instantiate an HTML5 parser.
... From that point on, HTML, MathML, and SVG elements will appear in their respective namespaces.
... The HTML5 parser won't report elements in any other namespaces, but if you've written your code to handle arbitrary namespaces, then whatever the HTML5 parser gives you will be a subset of the stuff you're already ready to handle.
... There's no need for the application to abstract anything about namespaces. The HTML5 parser already does that for you.
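
A minimal sketch of the dispatch Henri describes, in Python. The function name `choose_parser` and the string labels it returns are hypothetical, and treating every `+xml` media type as XML is an assumption that goes beyond what was said in the meeting. The three namespace URIs are the standard ones for HTML, MathML, and SVG.

```python
# Namespaces in which an HTML5 parser reports elements.
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"
HTML5_PARSER_NAMESPACES = frozenset({HTML_NS, MATHML_NS, SVG_NS})


def choose_parser(content_type: str) -> str:
    """Return "xml" or "html5" for a Content-Type value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html5"
    # Assumption: application/xml, text/xml and any "+xml" type go to the XML parser.
    if media_type in {"application/xml", "text/xml"} or media_type.endswith("+xml"):
        return "xml"
    raise ValueError(f"unsupported media type: {media_type!r}")
```

Code written to handle arbitrary namespaces needs no special case downstream: whatever the HTML5 branch produces falls inside `HTML5_PARSER_NAMESPACES`.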

<Zakim> noah, you wanted to ask if this is the place to promote enlarging the polyglot subset?

Noah: I'm not sure this is the right point to raise it, but is this where we should ask about enlarging the polyglot subset?
... I think Henri may have suggested that we're already heading in that direction.

Henri: Surely extending the polyglot subset isn't a use case. It might be a solution to a use case, but as long as we're considering use cases, that's not a use case.

Noah: I guess I was asking a terminology question in some sense.

Norm: I'll try to make WF XML vs. polyglot specifically clearer.

Anne proposes a new use case

Anne: I think one interesting use case is the ability to generate XML w/o XML tools. You might do this by just generating strings, for example, the way that WordPress generates feeds.
... Pretty much anyone who generates XML using string concatenation runs into problems with, for example, certain characters, and then gets a non-well-formed page.
... I think that's one of the problems that people have with doing XML properly.
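
The failure Anne describes can be reproduced in a few lines. This is only an illustrative sketch using the Python standard library; the sample title and the helper `is_well_formed` are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """Report whether an XML parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Made-up sample text containing characters that are special in XML.
title = "Q&A: is 1 < 2?"

# Plain string concatenation copies "&" and "<" straight into the markup.
naive = "<title>" + title + "</title>"

# Escaping the text first keeps the result well-formed.
escaped = "<title>" + escape(title) + "</title>"
```

An XML parser rejects `naive` outright, while `escaped` parses and yields the original text back. An HTML parser would instead recover from the unescaped characters, which is the asymmetry under discussion.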

<hsivonen> I agree that this is major problem for people who try to use XML

Norm: How does that bear on the intersection of HTML and XML?

Anne: I think this is pretty much what HTML was designed to support. About 95% of HTML wasn't valid in 2005, but the upside is that a lot more people can publish it.

Norm: So the use case is wanting to produce structured markup and if you're producing HTML it's easy but if you're producing XML it's hard.

Norm: Use case 2 was the other way around, I have HTML5 tools and I want to process XML

Norm summarizes his email.

Henri: Now that HTML5 has unified the data model between HTML and XML, the goal is to be able to use the XML toolchain except for the parser or the serializer.
... What *is* the HTML5 "toolchain"?
... Whose use case is this?

<noah> Norm: I want to do things like put Docbook XML into an otherwise HTML stream, and use CSS and maybe Javascript to style it.

<anne> So you want to embed raw XML into HTML? Or does the HTML parser need to handle XML documents?

Norm: I imagine there will be HTML5 tools for doing things, like say offline rendering, and I might want to use them with XML even if they were designed to do HTML5.

Henri: I can't think of any system that doesn't, or wouldn't, naturally support XML too.

<noah> That use case seems significant to me.

<noah> Henri, isn't the question in part what the specifications say about the use case, as well as noting that current implementations tend to support it?

MKay: Thinking of things that typically form part of an authoring or publishing platform, like validation, transclusion, and differencing of documents: those are all things that one can do by processing the HTML into XML and using XML tools.
... But that's always a slightly undesirable thing to do if you end up modifying content that human authors are going to touch again.
... So how should those things be done? Should they be done all in an HTML world, or should we convert them to XML?

<noah> I think people sometimes use server-side tooling, like PHP, for transclusion scenarios.

<hsivonen> noah, maybe, but specs can't make tools support stuff that the tool authors don't see demand for. Hence, HTML5 allows implementations of only text/html or only application/xhtml+xml

Anne: Why does this require an HTML5 toolchain?

MKay: Yes, you can do it with XML, but the question is can you get back what you started with, or something that the original author recognizes?

Norm: The HTML community is large and I can't imagine that there won't some day be HTML tools that I'll need or want to use with my XML content.

Henri: I think if the tool vendor makes the tool HTML only, then the tool vendor doesn't care about XML. So you want to make the tool work with XML even though the tool vendor doesn't care about you.
... So you're trying to do something to work around the vendor not caring.

Norm: Fair enough. But I imagine this *will be the case*.

Noah: +1 to what Norm said. Having tools that are immensely valuable but don't have the APIs you need is extremely common.
... This tool is 85% of exactly what I want but there's this big gap.

Norm: I think this is largely about making XML play nicer with HTML

Henri: It seems to me that having a simpler XML doesn't solve your problem if you already have the XML. You already have legacy content, so you already have the problem.

Noah: I think that depends on what the subset is.
... If you ruled out an odd character set, for example, then that might still be useful.
... Processing instructions are somewhere near the middle.
... In some cases the subset might be more useful.

Norm: Yes, I think that's what I was getting at. Dropping namespaces and throwing out PIs might be a lot easier than converting to HTML.
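
As a rough illustration of what "dropping namespaces and throwing out PIs" could look like, here is a sketch using Python's standard `xml.etree.ElementTree`. The function names are hypothetical, and this is one possible reading of the idea, not a proposal made in the meeting. It ignores the risk that two names from different namespaces collide once the namespaces are gone.

```python
import xml.etree.ElementTree as ET


def local_name(name: str) -> str:
    """Strip ElementTree's "{namespace-uri}" prefix from a qualified name."""
    return name.rsplit("}", 1)[-1]


def simplify(xml_text: str) -> str:
    """Return the document without namespaces or processing instructions."""
    # ElementTree's default parser already discards processing instructions
    # and comments, so only the namespaces need explicit handling.
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = local_name(element.tag)
        attributes = {local_name(k): v for k, v in element.attrib.items()}
        element.attrib.clear()
        element.attrib.update(attributes)
    return ET.tostring(root, encoding="unicode")


# Made-up DocBook-flavoured input for the example.
sample = (
    '<book xmlns="http://docbook.org/ns/docbook" '
    'xmlns:xl="http://www.w3.org/1999/xlink">'
    '<?page-break?><para xl:href="x">Hi</para></book>'
)
simplified = simplify(sample)
```

The result keeps the element structure and text but carries no namespace declarations and no processing instructions.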

Anne: It would be nice if it was a little clearer that it was about embedding XML in HTML.
... Parsing a real XML document with an HTML5 parser is probably never going to be possible.

Norm: This use case really wasn't about embedding to me.

Anne: They seem very closely related.

<hsivonen> Anne, you are now restricting the use case with assumptions about solutions :-)

Norm: Fair enough. I was trying to tease apart the use cases to help the discussion.

<anne> hsivonen, heh, fair enough

Any other business?

Norm: Sounds like we're happy to talk about use cases, so perhaps we should be thinking about producing a use cases document.

General sounds of assent.

Adjourned.

- DRAFT -

XML/HTML Task Force

Meeting 2, 04 Jan 2011
