HTML/XML Task Force -- 25 Jan 2011

Accept this agenda?

http://www.w3.org/2010/html-xml/2011/01/25-agenda

Accepted.

Accept minutes from the previous meeting?

-> http://www.w3.org/2010/html-xml/2011/01/18-minutes

Accepted.

Next meeting 1 Feb 2011

None heard.

Review wiki use cases

-> http://esw.w3.org/HTML_XML_Use_Cases

Norm: First use case is 01: http://esw.w3.org/HTML_XML_Use_Case_01

Henri summarizes the use case.

Norm: Does anyone think that Henri has failed to capture the use case or overlooked a solution that was discussed?

John: I think he underrates polyglot markup; it's true that polyglot markup doesn't let you handle arbitrary HTML, but it does let you handle non-arbitrary HTML.

Norm: Second use case is 02: http://esw.w3.org/HTML_XML_Use_Case_02

MChampion summarizes the use case.

<Zakim> noah, you wanted to ask suggest a clarification

Noah: When I read the wiki, my first reaction was "do you mean someone has content that they probably thought of as XHTML and they want to consume that or do you mean something like a purchase order".

Mike: I think it was probably the PO

<hsivonen> the wiki page says "non-XHTML"

Noah: Right. I think that was the case and the page could be clarified.
... Sophisticated users may have some idea that handing arbitrary XML to HTML5 will produce something that might be useful.

Mike: I think the vast majority of users don't have a clear sense of what's in HTML

Noah: All I was suggesting is that clarifying the page would be better.

Mike: Right. I can do that. I thought it was about arbitrary stuff with angle brackets. Either they don't know or don't care about the tag set.

Mike: What were you thinking of, Norm?

Norm: I was thinking of DocBook or Chemical Markup Language or a Purchase Order. Nothing vaguely like XHTML.

<scribe> ACTION: Mike to update the wiki to clarify that point. [recorded in http://www.w3.org/2011/01/25-html-xml-minutes.html#action01]

<Zakim> Norm, you wanted to ask if Henri's solution to 01 is possibly applicable here

Norm: I wondered if Henri's solution to use case 01 would be applicable. If you had an XML parser that produced an event stream that could be read by an HTML tool and produced (let's say no namespaces) elements named LINK with content etc. Would that just work?
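A minimal sketch of the kind of event stream Norm describes, assuming Python's standard `xml.etree.ElementTree` as the XML parser. The sample document and the `xml_events` helper are hypothetical and were not part of the discussion; whether an HTML tool could consume such a stream is exactly the open question.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: a non-namespaced element named LINK that has content.
SAMPLE = b'<doc><LINK href="style.css">some content</LINK></doc>'

def xml_events(xml_bytes):
    """Return (event, tag, text) tuples produced by an XML parser.

    Text is only reported on "end" events, because an element's text is
    not guaranteed to be complete when its "start" event fires.
    """
    events = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        text = elem.text if event == "end" else None
        events.append((event, elem.tag, text))
    return events

if __name__ == "__main__":
    for item in xml_events(SAMPLE):
        print(item)
```

Note that the XML parser reports LINK as an ordinary element with text content; nothing at this level knows that `link` is a void element in HTML.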

John: I think that sounds like use case 05

Norm: Is closing link elements and such only a parse time thing?

Henri: It's only parse time, but it depends on the interface to the tool.
... If the tool you have has a text or byte-oriented interface so that you have to give it a text/html document, then you can't do it.
... The internals of an HTML5 tool would be like the internals of an XML tool. In the browser for example, the DOM is namespace-aware.
... If you define the toolchain as stuff that happens after parsing and you can give a DOM to it, then there's no problem
... But I disagree with the wiki page where it says there's no parsing problem. It's not just about namespaces, there's also the void elements and other elements that have specific processing associated with them. And the empty element syntax in XML. There's much more to the parsing algorithm than you might expect.
... If the use case is something like offline rendering, and you define the conversion broadly enough, then it might work, but if you want to do some trivial serialization of the XML and then parse that, then there are more problems than what the wiki page says.
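As a rough illustration of the serialization point (a sketch, not something shown at the meeting), Python's standard `xml.etree.ElementTree` can write the same tree with XML rules or with its built-in `method="html"` rules. The sample document is an assumption made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one HTML void element (link) and one non-void element (div),
# both written with XML empty-element syntax.
root = ET.fromstring('<doc><link href="a.css"/><div/></doc>')

# Trivial XML serialization keeps the empty-element syntax for both.
as_xml = ET.tostring(root, encoding="unicode")

# The "html" output method omits the end tag for void elements such as link
# and writes an explicit end tag for everything else.
as_html = ET.tostring(root, encoding="unicode", method="html")

if __name__ == "__main__":
    print(as_xml)   # <doc><link href="a.css" /><div /></doc>
    print(as_html)  # <doc><link href="a.css"><div></div></doc>
```

An HTML parser ignores the trailing slash on a non-void element, so feeding it the first form would leave the `div` open and swallow whatever follows; the second form avoids that particular trap but still does nothing about namespaces or elements with special parsing rules.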

Mike: What edit would you suggest?
... Or do you think the use case is pointless?

Henri: I'd go for the lower expectations solution unless you're willing to make very broad conversions.

John: Are there HTML tools outside the browser that don't deal in HTML syntax?

Henri: The validator.nu validator is I guess an HTML5 tool, but it's also an XHTML5 tool.

John: We know some things do both.

Henri: So far, I don't think there are any HTML tools that are only HTML5 aware and not XHTML aware.

John: And I expect they'd all deal in syntax. Like an HTML editor that reads syntax and writes syntax.
... What does it mean to have a tool that deals in a DOM?

Henri: I think Saxon is one such example.

Anne: There are HTML editors that take a DOM as well, like BlueGriffon (I think)

<hsivonen> (BlueGriffon operates on a DOM, but I don't know if you can give it one except by letting it parse HTML or XML)

Norm: Next use case is 03: http://esw.w3.org/HTML_XML_Use_Case_03

Noah checks in his text and the TF reads for five minutes.

Henri: When it says that the HTML might be generated by a tool over which the user has no control, in that case, you might also apply an HTML parser to turn that content into XHTML and then you're back to the first bullet point.

Noah: I think I agree. Under solutions there's a third one: take the HTML, parse it, reserialize as XHTML, and then process that.

Henri: At the point where you're generating the larger XML file, you can serialize to XHTML.

Noah: I understand; I'll have to think about how to split that across the problem statement and a solution.

Noah describes how he might edit the page to general agreement.
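The "parse it, reserialize as XHTML" step can be sketched as follows. This is only an illustration under loud assumptions: it uses Python's standard `html.parser`, which is a tokenizer and not an HTML5 parser, so it does not implement the HTML5 tree-building rules (implied end tags, misnesting repair, namespaces). The names `XhtmlWriter`, `html_fragment_to_xhtml`, and the `envelope` wrapper element are hypothetical.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements: these never take an end tag in text/html.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class XhtmlWriter(HTMLParser):
    """Re-emit tokenized HTML as well-formed markup (simplified sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        attr_text = "".join(
            ' %s="%s"' % (name, escape(value if value is not None else name, quote=True))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag (or end tag of a void element): ignore it
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close anything the author left open.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())

def html_fragment_to_xhtml(fragment):
    writer = XhtmlWriter()
    writer.feed(fragment)
    writer.close()
    return "".join(writer.out)

if __name__ == "__main__":
    xhtml = html_fragment_to_xhtml('<p>Hello<br>world<img src="a.png" alt="x & y">')
    # The result can now be embedded in a larger XML document.
    doc = ET.fromstring("<envelope>" + xhtml + "</envelope>")
    print(ET.tostring(doc, encoding="unicode"))
```

A production pipeline would use a conforming HTML5 parser for the first step; the point here is only that the conversion happens before the larger XML file is assembled.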

Norm: Does anyone think the problem is misstated or that solutions we discussed were overlooked?

Anne: This looks complete to me.
... I'm not sure this is a problem that needs any more solution than we already have.

Norm: Any further discussion?

None heard.

Norm: The next use case is 04: http://esw.w3.org/HTML_XML_Use_Case_04

John summarizes

John: Kurt put a bunch of stuff in the discussion section that I haven't looked at yet.

<noah> I would find it helpful to have this wiki page using the separate problem-statement/solution-statement style that the other pages use.

Norm: Thank you. As Noah suggests, would you be willing to rework the page to have problem statement/solution as the other pages do?

John: I'll try; it's a busy week for me.

Norm: Does anyone think this fails to capture the problem or the solutions that were discussed?

Noah: I'm having trouble grokking it in this form, but I'll work through it.

Henri: I think that what John wrote captures the discussion pretty well; what Kurt added goes into new areas that weren't discussed very much.

Norm: John, when you look at Kurt's discussion, will you please let us know if you find anything that's genuinely new?

John: Yes.

Noah: I was hoping for that level of exposition on the original page.

<hsivonen> Norm, you are aware of srcdoc, right?

<Norm> hsivonen, yes and I *hate* it with a fiery passion.

<hsivonen> (My time is up. Gotta go. Regrets.)

<anne> bye hsivonen

Some discussion of the nature of markup. Structured attributes, order, anonymity, etc.

Norm: Next is 05: http://esw.w3.org/HTML_XML_Use_Case_05

Anne summarizes.

John: I believe it's true, however, that concatenation doesn't work with 100% reliability for HTML either.

Anne: True, it's 95% or so.

John: So you end up with something that's more complex than XML conceptually because you have all of XML plus all the recovery strategy.

Anne: That makes the parser simpler...

John: I didn't say the parser, I mean it makes the language more complicated.

Noah: Getting into XML5 under string concatenation seems a bit backwards.
... I think the use case is that you think you have WF XML, you want to have WF XML, but sometimes you blow it. You want to serve XML5 because you want error handling if you get it wrong. XML without draconian error handling.
... For me, that's the use case for XML5. Let's gently move the world towards better markup.
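The "you think you have WF XML, but sometimes you blow it" situation is easy to reproduce with string concatenation. The sketch below assumes Python's standard `xml.etree.ElementTree` as the draconian consumer; the function names and the `item`/`title` vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def build_naive(title):
    # Plain concatenation: well-formed only if the data happens to be safe.
    return "<item><title>" + title + "</title></item>"

def build_escaped(title):
    # Same concatenation, but markup-significant characters are escaped first.
    return "<item><title>" + escape(title) + "</title></item>"

def is_well_formed(text):
    # A draconian XML parser rejects the whole document on the first error.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    print(is_well_formed(build_naive("Tom and Jerry")))    # True
    print(is_well_formed(build_naive("Tom & Jerry")))      # False
    print(is_well_formed(build_escaped("Tom & Jerry")))    # True
```

With draconian handling the consumer sees nothing at all for the second document; the error-recovery use case is about what a consumer should see instead.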

<anne> ("better" imo)

Noah: Half the time when you make a mistake, it's a mistake you want to fix.

<anne> I am not sure where this is going? Is Noah volunteering for writing up another use case?

Norm: I think that's a little different than 05, Noah, would you be willing to write that up as an 07 use case?

Noah: Yes, of course. I don't think the overlap is a problem.

<scribe> ACTION: Noah to write up the use case for XML+error recovery as Use Case 07. [recorded in http://www.w3.org/2011/01/25-html-xml-minutes.html#action02]

John: Having spent the last week writing a MicroXML parser, I had to work a lot harder than if I was going to be draconian. I had to always be in a recoverable place.

Anne: I did it too and I didn't think it was that hard. It's actually simpler because XML has a lot of places where you have to check the character ranges and with recovery you don't have to do that.
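For reference, one of the character-range checks Anne mentions is the `Char` production of XML 1.0, which a conforming parser must apply to every character of the document. A minimal sketch follows; the function name `is_xml_char` is hypothetical.

```python
def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

if __name__ == "__main__":
    for sample in ("A", "\t", "\x0b", "\ufffe"):
        print(repr(sample), is_xml_char(sample))
```

XML names have further, more detailed range tables (NameStartChar and NameChar); a recovering parser that accepts any character in those positions can skip such checks, which is the simplification being described.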

John: I was talking about much more complex recoveries; for example, when you get a not well formed comment, what do you do?

Some discussion about how hard the recovery problem is.

Norm: Ok, next week we'll look at 06 and 07 and any others that have changed significantly. Then I think we need to consider next steps. Are there any?

Any other business?

None heard.

Adjourned.

- DRAFT -

HTML/XML Task Force

Meeting 5, 25 Jan 2011
