Notes on HTML 5 Issues

Noah Mendelsohn
2 Sept 2009

These are very rough, work-in-progress notes on issues that the TAG might want to raise with respect to the HTML 5 working draft. Most of these are notes that I gathered while reading a few sections, but some of the issues were mentioned by other TAG members. I am not necessarily convinced that all of these are legitimate issues, and even when they are, details may be wrong.

For most issues, references are provided to pertinent sections of the HTML 5 draft. Since section numbering has changed since the TAG began its review, some sections are listed in the form XXXX(YYYY). This is to be read as section XXXX in the 25th Aug. working draft of HTML 5, and section YYYY in the version that the TAG divvied up for review.

The TAG has not reviewed these. The sections below are divided according to what I suggest might be the significance of these issues if the TAG decides that any of them have merit.

Most significant

These are likely to have the most impact on the operation of the Web and the success of HTML 5 overall.

Potential issue: lack of clarity on what is an error and what isn't; need for a consistent editorial approach

It's understood that the draft is in part trying to specify what constitutes legal HTML 5 and associated processing, and also to specify the behavior of user agents parsing and processing input that is not legal HTML. In some cases, the distinctions seem not to be sufficiently clear. Picking nearly at random some representative examples:


Potential issue: declarative vs. imperative expositions of validity checks and mappings to DOM

Many of the rules for distinguishing correct syntax, and for mapping input into the DOM, are expressed imperatively. Some of the concerns I've heard TAG members express about this choice include:


Throughout the specification.
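
To make the declarative/imperative distinction concrete, here is a small sketch that is not taken from the HTML 5 draft. It assumes a toy rule invented for illustration (a "name" is one ASCII letter followed by any number of ASCII letters or digits) and states it once declaratively, as a pattern, and once imperatively, as the kind of numbered steps the draft tends to use. The function names are hypothetical.

```python
import re
import string

# Declarative form: the rule is stated once as a pattern, and the
# regex engine decides how to check it.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_name_declarative(text: str) -> bool:
    """Return True if text matches the toy 'name' rule."""
    return NAME_PATTERN.fullmatch(text) is not None


# Imperative form: the same toy rule written as explicit steps.
def is_name_imperative(text: str) -> bool:
    """Return True if text matches the toy 'name' rule."""
    # Step 1: if the input is empty, fail.
    if not text:
        return False
    # Step 2: if the first character is not an ASCII letter, fail.
    if text[0] not in string.ascii_letters:
        return False
    # Step 3: if any remaining character is neither an ASCII letter
    # nor an ASCII digit, fail.
    allowed = string.ascii_letters + string.digits
    for ch in text[1:]:
        if ch not in allowed:
            return False
    # Step 4: otherwise, succeed.
    return True
```

Both functions accept and reject the same inputs. The declarative form is shorter and easier to check by eye, while the imperative form makes it more obvious where error handling for invalid input could be attached, which is one reason a specification might prefer it.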

Potential issue: algorithm complexity

Many of the step-by-step algorithms are presented in a way that is extremely difficult for a human to parse and check. Even if the suggestion that the rules be made more declarative is rejected (in some or all cases), it may be useful to look for other ways of setting out the algorithms. The case can be made that a more formal programming notation might be just as easy or easier to follow, more precise in its semantics, and perhaps easier to check or implement automatically in some cases.


This is one example of a section that seems particularly long and hard to follow:

Potential issue: document.write() not supported from XML serialization

This may be old news, but I was surprised to see that document.write() is not supported when parsing the XML serialization. This seems to put the nail in the coffin of XML as a serialization format for colloquial HTML. I understand that there are a variety of issues in making a sensible definition of how this would work, but my intuition is that it could be done reasonably cleanly (albeit not with most off-the-shelf XML parsers).


Potential issue: does HTML 5 establish appropriate policies for extensibility?

See discussion below under references.


Potential issue: language associated with "must" is sometimes informal

There are many instances in which it is stated that "XXXX must happen", but the explanation of XXXX is vague or informal, or terms that probably should be hyperlinks are not linked.


There are many, many examples of this, and probably the whole draft should be checked by multiple readers to find them. The following are a few selected more or less at random to illustrate the concern:

A case could be made that the prevalence of problems like this illustrates a more structural weakness relating to the somewhat informal style of significant parts of the specification.

Potential issue: Web addresses and URL terminology

The HTML 5 draft uses the term URL, not URI. It's unclear whether the factoring to reference WebAddr and/or IRI-bis will be retained.


To be supplied

Potential issue: Content-Type sniffing

HTML 5 calls for user agents to ignore the normative Content-Type header in certain cases and to infer the media type from the content itself.
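
As a rough illustration of what sniffing means in practice, the sketch below shows a user agent overriding a declared type after inspecting the first bytes of the body. The function name `effective_type`, the 512-byte window, and the rule itself (only a declared `text/plain` is second-guessed, and only in favor of `text/html`) are assumptions made for this example; this is not the algorithm given in the HTML 5 draft.

```python
def effective_type(declared_type: str, body: bytes) -> str:
    """Return the media type a sniffing user agent would act on.

    Hypothetical, deliberately simplified rule for illustration only.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = declared_type.split(";")[0].strip().lower()

    # Assumption: every declared type other than text/plain is trusted.
    if media_type != "text/plain":
        return declared_type

    # Look only at the start of the body, ignoring leading whitespace.
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        # The server said text/plain, but the content looks like HTML.
        return "text/html"

    return declared_type
```

For example, `effective_type("text/plain", b"<html><p>hi</p></html>")` yields `"text/html"`, even though the server declared otherwise. That override is the behavior at issue here.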


To be supplied

Potential issue: inconsistent formality/informality

There are many examples where seemingly simple things are spelled out in very careful detail, such as the character codes for the letters H-T-M-L. It then seems surprising when very important concepts are not clearly spelled out or hyperlinked.


Potential issue: "willful violations" of other specifications

HTML 5 acknowledges in several places that it is in "willful violation" of other specifications from the W3C and IETF. Potential concerns include:


A few representative examples:

Potential issue: doctype and explicit version identifiers

The TAG has been exploring the pros and cons of using explicit version identifiers in HTML (and other) documents. Considerations include:


To be supplied

Moderate significance

To be supplied.

Minor issues

To be supplied.

Noah Mendelsohn for TAG
$Revision: 1.8 $ of $Date: 2009/09/03 16:56:23 $
