Summary: HTML WG October 2008 face-to-face meeting

1. Spec-splitting and recruiting editors

See the minutes of the discussion about spec-splitting and recruiting editors.

The group discussed some of the challenges involved in splitting out parts of the current HTML5 draft into separate specifications, and in finding and recruiting additional editors. Ian Hickson took an action to provide an evaluation of which parts of the current draft could potentially be split out and taken on by separate editors, along with an assessment of the level of effort needed to maintain each of them.

A few days after the face-to-face meeting, Hixie followed up on that action item by posting “HTML5 Specification - List of sections and corresponding work estimates”. He later followed that with a list of revisions to those initial estimates, commenting: “Upon further review I noticed some estimates that should probably be revised down”.

2. Joint meeting with W3C Technical Architecture Group (TAG)

See the minutes of the discussion with the TAG.

The joint meeting between the HTML WG and TAG began with a discussion of the issue of modularization of the HTML5 specification, in particular the idea of producing a separate specification that normatively defines just the HTML markup language itself.

Henry Thompson from the TAG led off with this comment:

I care a lot about distinguishing about the definition of the language as a formal artifact and the discussion of the behavior of what browsers do with something that purports to be of that language.

The discussion drifted a bit into areas concerning the stability of the HTML5 draft and the need to reference it normatively in other specifications, before returning to the subject of modularization, in particular the topic of producing a separate normative “markup language” specification. Noah Mendelsohn commented:

My personal preference is that you do create such a spec. I don’t have a reason, just abstract intuition… Do you write one or more non-normative informative guides to help authors write stuff… I heard [here today] that in general they should encourage the creation of clean content… The more controversial option is should you write a normative and precise document that specifies only the clean language and its semantics.

What I have in mind is a document would be a document that would include the syntax, as well as the normative definitions of what a table is, a paragraph, etc. Ideally, I would then NOT repeat those semantics in the “larger” user agent spec.; I would have the user agent spec refer to the language spec for that.

It should also be noted that prior to the meeting, Noah posted two messages related to the separate-spec topic: “Suggestions regarding creation of an "Authoring Specification" for HTML 5”, posted to the public-html-comments mailing list, and “Re: HTML5: clean and non-clean”, posted to the www-tag mailing list. Here are some selected excerpts from those messages:

I think the specification for the authoring of correct HTML 5 documents is of great importance. I understand that you are hoping that the need can be met in part by, eventually, using scripts to produce a stripped down version of the existing draft, leaving out much of the parsing and error recovery detail. Perhaps this will lead to a first class result, but I have some nervousness that the result might not be as effective as one might like.

I think it would be a good idea to generate representative drafts sooner rather than later. If practical, this could be done by marking up the existing draft and running the full automated process. If that’s impractical soon, as I suspect may be the case, I would think that one or two members of the HTML working group could be tasked with manually producing a partial skeleton for evaluation, including at least some of the key sections such as 8.1, and representative slices of some of the others. I think the resulting draft should be circulated for comment, and should be used to inform planning for how the final HTML 5 authoring draft will eventually be prepared.

There’s a risk that, if all one does is to strip the existing spec. to produce the authoring spec, [some] key aspects of correct HTML 5 will be unduly hard to discover.

I understand that to some extent there is already an intention to produce a separate “Authoring Specification” that is somewhat similar to what I propose, and I’m glad that’s being considered. I would prefer that such a specification be viewed not as an “authoring specification” but as “The HTML 5 Language Specification”, I.e. a document that’s of equal interest whether you are writing or reading an HTML 5 document. It would allow you to answer two questions: 1) is this a legal HTML 5 document? and 2) if yes, what does this document mean? If automatic extraction of the pertinent bits from the current drafts produces a first class exposition of such language specification, that’s great, but I have some suspicion that a far cleaner, smaller, and easier to read specification could be written either by hand, or by careful manual adaptation of the current work.

So, the net result of the proposed separation would be two documents:

The HTML 5 Language Specification (as proposed above)

The HTML 5 Browser Specification

Larry Masinter later made a related comment:

I think rather than thinking about browsers and authors, since most content today is created by other software, you should use the terms producers and consumers.

Following the main part of the discussion about the markup-language spec, the topic then turned to other types of potential modularization of the spec; for example, the possible value in taking the section of the HTML5 draft that defines what URLs are and how user-agents should handle them, and making it a separate specification.

The editor of the HTML5 draft, Ian Hickson, and a number of members of the HTML WG pointed out that the complexities among dependencies in the HTML5 draft make it a challenge to separate out certain parts of the spec.

Another issue that came up during that discussion was in regard to parts of the HTML5 draft that conflict with other existing specifications. T.V. Raman commented:

The meta-issue here isn’t about bits and bytes, but about whether this WG should be codifying existing violations of existing RFCs.

The editor and members of the group pointed out that the parts of the HTML5 draft that conflict with existing specifications generally do so because, in those cases, UA/browser behavior actually conflicts with those specifications. The HTML5 specification, by design, attempts to precisely document “real world” interoperable UA/browser behavior, even for cases where there is general agreement that certain specific behaviors (such as content-type sniffing) in browsers may be less than ideal.

The discussion closed with two action items: one assigned to the HTML WG co-chair to lead an HTML WG response to the TAG discussion and report back to the TAG at some later time, and one assigned to the W3C Director, Tim Berners-Lee, to write up a summary of the discussion from the TAG perspective, along with a description of what kind of modularization of the HTML5 work the TAG would recommend exploring further.

3. Forms in HTML5

See the minutes of the discussion about forms.

A portion of the HTML WG face-to-face meeting was structured as an “open house”, with members of other groups being invited to attend and bring their questions and concerns about HTML5 to the HTML WG. The first part of that portion of the meeting was a discussion of handling of forms in HTML5.

The editor of the HTML5 draft, Ian Hickson, gave a status update on the integration of forms support in the HTML5 draft, reporting that he had completed integration of the previous Web Forms 2 specification into the HTML5 draft, and had a large number of comments related to forms that he would be responding to by e-mail.

Charlie Wiecha and Nick Van den Bleeken from the W3C Forms Working Group were in attendance for this part of the meeting. Charlie talked briefly about work on an “attribute-oriented forms notation” that could potentially be integrated with HTML5, and pointed to a document related to that work, “WebFormsA: Streamlined Expression of Data-Rich Web Applications”.

4. MathML in text/html

See the minutes of the discussion about MathML.

Neil Soiffer, representing the MathML working group, attended this part of the meeting to discuss the fact that the MathML plugin for IE requires the use of namespace prefixes in MathML content, and the potential (in)compatibility of the HTML5 parsing algorithm with MathML content that contains such prefixes.

5. GRDDL, RDFa, @rel value registry, extensibility in text/html

See the minutes of this portion of the meeting.

Harry Halpin, the chair of the W3C GRDDL Working Group, attended this portion of the meeting and led a discussion about what extensibility mechanisms the HTML5 language provides in text/html (non-XML) content.

During the discussion, it was pointed out that Atom, XHTML2 and HTML5 all need a shared mechanism for keeping an ongoing record of common/standard values of the rel attribute (because that set of values is not static; it grows as new uses/values for the rel attribute emerge). That led to discussion about a potential need for a “rel registry” for those values.

The discussion turned to RDFa and CURIEs, with Henri Sivonen commenting:

I was vocal in that discussion against taking RDFa as-is. A lot of it revolves around the way RDFa uses XML namespace.

The other objection was related to people who aren’t in the RDF community having to pay the “RDF tax”… I’d like a solution similar to GRDDL. People who want it can, but doesn’t cause problems for people that don’t.

Harry followed up with a question, “Would HTML5 be willing to sort out some way to use short CURIE like values?” and Henri’s response was, “I suggest you use a registry, and concatenate the short values with a base URI.”
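Henri’s suggestion can be illustrated with a minimal sketch. Nothing in it comes from the minutes beyond the idea itself: the base URI, the set of registered names, and the function name expand_rel are all hypothetical, chosen only to show how short values plus a registry could yield full URIs without any prefix or namespace declarations in the markup.

```python
# Hypothetical registry: the base URI and the registered names below are
# illustrative assumptions, not an actual registry discussed at the meeting.
REGISTRY_BASE = "http://example.org/rel-registry#"
REGISTERED_RELS = {"alternate", "license", "next", "prev", "stylesheet"}


def expand_rel(value, base=REGISTRY_BASE, registered=REGISTERED_RELS):
    """Expand each registered token of a rel attribute value to a full URI.

    Tokens are split on whitespace and compared case-insensitively.
    Unregistered tokens are skipped rather than guessed at.
    """
    uris = []
    for token in value.split():
        name = token.lower()
        if name in registered:
            # The "concatenate the short value with a base URI" step.
            uris.append(base + name)
    return uris


print(expand_rel("Next license made-up-value"))
```

In this sketch an author writes only the short value (for example rel="license"), and a consumer that wants URIs derives them from the registry; consumers that do not care about URIs can ignore the registry entirely.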

6. Fragment identifiers for audio and video content

See the (short) minutes of the discussion about media fragments.

Silvia Pfeiffer and Raphaël Troncy of the W3C Media Fragments Working Group gave a short presentation to inform the HTML WG about the work they are doing and how it potentially relates to HTML5 media (audio and video) content.

7. Table headers

See the minutes of the discussion about table headers.

Joshue O Connor led a discussion about how the spec should best provide markup for header associations in complex data tables, and provided some relevant links:

Al Gilman "function and impacts (was: @scope and >@headers reform)" posting to public-html
ESW Wiki: headers attribute Issue
HTML WG bugzilla issue #5822: The headers attribute should be able to reference a td
Gez’s complex table example
another table example from Anne van Kesteren
Smart span algorithm for table cells (James Graham)

Al Gilman gave a summary of the issues and current state of the ongoing dialog around them; there was then some discussion about proposed solutions, followed by discussion about the timeline for trying to get a resolution on the issues. Al concluded by noting “What we want to do in face time, we’ve done”.

8. HTML integration point for HTTP authentication

See the minutes of the discussion about an HTML integration point for HTTP authentication.

Julian Reschke led a discussion about issues with HTTP authentication and specifically about limitations in the user interface that browsers provide for it.

Some relevant links:

Hixie’s list of messages related to HTTP authentication that he is still planning to (re)read, evaluate, and eventually respond to
HTML WG tracker issue 13: Handling HTTP status 401 responses / User Agent Authentication Forms
WHATWG mailing-list thread (initiated by Aaron Swartz) on fixing the authentication problem

Julian gave a summary of the issue, and discussion followed about the general brokenness of HTTP authentication and whether we really want to be doing anything to try to help it succeed, which led to discussion about OAuth and OpenID and SAML.

Harry Halpin commented: “this is really important, someone should eventually sort this whole identity thing out” and there was general agreement about that, but not about where the sorting out should actually get done. Jonas Sicking commented: “I think HTML5 should be completely silent on this. I think that should be a separate work item. Is it a W3C matter?” and Julian’s response to that was, “IETF is waiting for W3C; it’s a user agent issue.”

Thomas Roessler (W3C Security Activity Lead) stepped in near the end of the discussion to offer some insights around the security issues and security UI issues.

9. Progress review/evaluation of stability of sections of HTML5

See the minutes for both the morning and afternoon portions of this discussion.

The group spent roughly half of day two of the face-to-face meeting reading through the spec together and updating its accompanying marginal annotations to reflect the current status of each section of the spec in terms of stability and implementation support.

The results are reflected in the marginal annotations in the WHATWG copy of the spec (the W3C copy does not support the annotations mechanism).

10. ARIA implicit roles

See the minutes of the discussion about ARIA implicit roles.

Ian Hickson, Ben Millard, Anne van Kesteren, Cynthia Shelly, Michael Cooper, Henri Sivonen, and Marcos Caceres met for a separate “breakout session” to discuss ARIA implicit roles.

Anne suggested that there was a need to focus on two things: “figuring out where holes with current HTML to MSAA mapping are in existing implementations and Figuring out what the mapping should be going forward and for new HTML elements”, and there was discussion about the need to document [existing | proposed] mapping between HTML features, ARIA features, MSAA, UIA, IA2, ATK, and AX.

The ARIA User Agent Implementors Guide was noted as being a good starting point for creating further mappings.

As a forum for further discussion, the attendees agreed to use the wai-xtech@w3.org mailing list, with message subjects flagged with the prefix [Role].

11. Status report on authoring guide to HTML5

See the minutes of the discussion about the authoring guide.

Lachlan Hunt gave a report on the Editor’s Draft of a non-normative Web Developer’s Guide to HTML 5 that he has been working on. There was some discussion about what level of background knowledge the current introduction to the document expects from readers. Karl Dubost said that after the end of November, he would be able to spend time working collaboratively on editing parts of the document.

12. SVG in text/html (joint meeting with SVG WG)

See the minutes of the discussion about SVG in text/html.

Henri Sivonen commented on his experiences in evaluating the SVG WG proposal for SVG in text/html and the “commented out” section on SVG in text/html in the HTML5 draft:

I’ve implemented the proposal that was commented out. I estimated what work it would take to implement it in Gecko and in Java SE, and my assessment is that it’s much easier in both cases to implement the commented-out proposal. I sent comments about why the SVG WG [proposal] is not implementable. I fundamentally disagree with having an XML parser inside the HTML parser.

Chris Lilley expressed misgivings about any implementor’s assessment that is based on implementing just one of the two proposals.

Charles McCathieNevile responded by giving Opera’s perspective on the proposals:

In our assessment, in practical terms, the two proposals will work [equally well] -- after looking at the costs, based on a “desk check”, I think it is a goal that where it is feasible, you should be able to take SVG content out of text/html content, and stick it into, e.g., an editor. We don’t want to break the use case of cut-and-paste in that scenario.

Erik Dahlström from Opera added:

To me it’s not a goal to allow something to be very different from the syntax we have now; it’s important to stay as close as possible to what is out there already.

Henri Sivonen responded:

I agree about the importance of the copy-and-paste from browser into editor, but following the line of thought, it leads to breaking the fundamental permissive nature of text/html (the “host” format); you can get around this without making fundamental changes to HTML parsing.

There followed some discussion about browsers being able to help address the copy/paste problem by providing mechanisms to take any given text/html document (including those containing SVG content) and output a well-formed XML serialized representation of it.
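As a rough illustration of that idea, the following sketch re-emits permissive markup with balanced tags, quoted attribute values, and escaped text. It is only a sketch under loud assumptions: it uses Python’s standard html.parser module, which is not the HTML5 parsing algorithm; that module lowercases tag and attribute names, so mixed-case SVG names are not preserved; no namespace declarations are added; and the class name XmlReserializer, the function name to_xml, and the partial list of void elements are hypothetical. The output is well-formed XML only if the input has a single root element and its names are legal XML names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never take an end tag in text/html (partial list, for illustration).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class XmlReserializer(HTMLParser):
    """Re-emit permissive markup as balanced, quoted, escaped markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # output fragments, joined at the end
        self.open_tags = []  # stack of elements still awaiting an end tag

    def handle_starttag(self, tag, attrs):
        seen = {}
        for name, value in attrs:
            # Keep the first occurrence of a repeated attribute; an
            # attribute written without a value gets an empty value.
            seen.setdefault(name, "" if value is None else value)
        attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in seen.items())
        if tag in VOID_ELEMENTS:
            self.parts.append(f"<{tag}{attr_text}/>")
        else:
            self.parts.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, or end tag of a void element: ignore
        # Close any elements left open inside the one being closed.
        while True:
            closing = self.open_tags.pop()
            self.parts.append(f"</{closing}>")
            if closing == tag:
                break

    def handle_data(self, data):
        self.parts.append(escape(data))

    def result(self):
        self.close()  # flush any text still buffered by the parser
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")
        return "".join(self.parts)


def to_xml(markup):
    serializer = XmlReserializer()
    serializer.feed(markup)
    return serializer.result()


print(to_xml("<p>Hi<br><svg><circle r=5></svg>"))
```

A browser-provided mechanism of the kind discussed would work from the browser’s own parsed document rather than from a second, simpler parser as this sketch does.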

That discussion was followed by one about the need to first (re)focus on what problem the two groups both wanted to solve, and what the goals of the solution should be.

Both groups agreed on the goal that the solution should not stop people from putting well-formed SVG content into text/html. There was also agreement on the goal that “markup should be as easy to edit by hand as regular HTML, modulo complications due to the vocabulary itself”.

The groups did not reach agreement about the goal of “don’t break legacy pages”. Chris Lilley suggested that goal should be refined to state, “SVG pages which currently work — which produce some useful output — should continue working.”

The discussion then turned to the issue of how to deal with certain cases of use of namespaces in SVG content. Jonas Sicking offered the following comment and questions:

Seems like the issue of wrong namespaces is the only one we don’t have agreement on. Do we want it to be possible to use an off-the-shelf XML parser? Are we OK to restricting ourselves to non-off-the-shelf XML parsers? If I as an implementor am not OK with writing my own XML parser, then that definitely excludes some implementors.

The final item discussed was the issue of “whether to use case-sensitive or case-insensitive tag and attribute names at the syntax level should be driven from implementation performance choices, not conformance.” Henri Sivonen stated the performance concerns were a very serious issue.

To summarize the perspective of the members of the SVG WG who attended: while stating that they recognized the need for browsers to do error correction of Web content, they reiterated their position that SVG must maintain its XML syntax, and in particular that specifications should mandate that conforming authoring tools create SVG as well-formed XML, since other output might create unpredictable behavior. The SVG WG is not opposed to error correction similar to that for HTML5 (including unquoted attribute values and case-insensitivity), but unlike HTML5, wants to emphasize that this is error correction, and not canonically correct content.

The outcome of the discussion seemed to indicate that while members of the SVG WG still did not agree that the commented-out section on SVG in text/html in the HTML5 draft was the right solution, there was some acknowledgment that it might eventually help lead to the right solution. Specifically, the commented-out proposal excludes certain SVG elements (“metadata” and “font”), while the SVG WG believes that there should be no implicit white-list of elements within HTML5, that the list of elements, attributes, and attribute values should be referenced directly from the SVG family of specifications, and that the final solution should allow for this.

In summary, both groups expressed agreement with the overall goal of getting SVG to work in text/html, have a strong commitment to making it happen, and plan to continue working out the details of a solution along with implementors.