Bug 20740 - document outlining issues
Product: WHATWG
Classification: Unclassified
Component: HTML
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assigned To: Ian 'Hixie' Hickson
Keywords: a11y
Depends on:
Blocks: 22050
Reported: 2013-01-22 21:06 UTC by ben
Modified: 2013-05-15 19:06 UTC (History)

Description ben 2013-01-22 21:06:01 UTC
I think it's fairly clear at this point that there is a painful gulf between how the spec describes document outlining and how authors in the real world are actually building web pages. Luke Stevens' many posts on this topic (whether one agrees with him on all points or not) are a visible symptom of this situation. Most recently, check out (including comments):


Given the "at-risk" status of the outline algorithm, I think something's got to give here. (Stevens, unfortunately, has no constructive suggestions about outlining proper, which strikes me as somewhat defeatist.)

I definitely don't have all the answers, but having watched this evolve from the sidelines for years, I think there are some things that could be done to nudge the spec and real-world practice towards a greater state of parity and clarity. Here are a couple of modest proposals:

1) Rename the terms "sectioning content", "sectioning root", and "section" in the spec. Here's the problem: the terminological overlap between the term "sectioning content" (used to refer to content that defines the scope of headings and footers, for example article, aside, nav, and section) and the term "section" (used to refer to the <section> element) is deeply confusing. The reuse of the same noun (or is the former sometimes a gerund?) in two very different but conceptually adjacent contexts is compounding the overall mess here.

The clarification in HTML 5.1 Nightly, "Creating An Outline", that says: "The sections in the outline aren't section elements, though some may correspond to such elements — they are merely conceptual sections" doesn't really do much to clarify things. What, pray tell, is a "conceptual section"? In philosophy (and also typically in technical writing), a "concept" is something that has a "definition". It's distinct from a "notion," for example, which may not have a definition. But since all the key terms in the spec have definitions, the adjective "conceptual" provides no meaningful qualification. Try explaining to a first-year web design student the difference between "sections" and "conceptual sections"!

For a technical specification, this is unacceptably confusing. I'm sure the English language is large enough that the W3C can find two suitably different terms for two different concepts. Here's my stab at a remapping of terms within the spec:

"sectioning content" => "outlining content"
"sectioning root" => "outlining root"
"section" (not the element) => "outline container"
<section> (the element) stays the same.

Since the main purpose of "section" (again, not the element) in the spec is outlining, why not spell this out explicitly, and sidestep a bit of confusion in the process? I think this would be a win on all sides for clarity in the spec, which has been justly criticized on this point.

2) Introduce unnumbered headers, e.g. an <h> element.

By way of comparison, every beginning web developer instantly "gets" the idea of the un-numbered <li> tag. It has the great and obvious virtue that scripts can add or subtract elements from an unordered list dynamically without renumbering each element in the entire list in markup. Wouldn't it be great if headings and document outlining had the same flexibility?

As you know, the spec's current suggestion is to use all <h1> tags: "Sections may contain headings of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level." The idea was to avoid introducing new elements, and maintain backwards compatibility.

This spares browsers' parsers the relatively trivial task of recognizing a new block element, but it introduces massive confusion for any implementation of document outlining. Predictably, there has been chaos. For any particular page, is a screen reader (for example) supposed to use the new HTML5 outlining algorithm or the old? And based on what factors? There are no good answers here that I know of, not even provisional ones. No wonder vendors have dragged their feet on this! The "all h1, all the time" approach tries to split the difference between two completely different ideas about page structure, and winds up totally breaking the old outlining model without providing a satisfactory indication that the new model is in use on the page. This offers neither enhancement nor degradation--it's pure breakage, and it's not suited to the backwards-compatible, incrementalist approach that the web requires.
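
To make the ambiguity concrete, here is a minimal Python sketch that reads the same all-<h1> markup under both models. It is not the spec's outline algorithm: as a simplifying assumption, the "nested" level is just one plus the number of enclosing article, aside, nav, or section elements, and the "legacy" level is the digit in the tag name. The class name HeadingLevels, the function heading_levels, and the sample PAGE are made up for illustration.

```python
from html.parser import HTMLParser

# Elements the spec calls "sectioning content".
SECTIONING = {"article", "aside", "nav", "section"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingLevels(HTMLParser):
    """Collect (text, legacy_level, nested_level) for every heading."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open sectioning elements
        self.current = None   # tag name of the heading being read, if any
        self.text = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag in HEADINGS:
            self.current = tag
            self.text = []

    def handle_data(self, data):
        if self.current:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag == self.current:
            legacy = int(tag[1])      # old model: rank comes from the tag name
            nested = self.depth + 1   # simplified new model: rank comes from nesting
            self.headings.append(("".join(self.text).strip(), legacy, nested))
            self.current = None


def heading_levels(markup):
    parser = HeadingLevels()
    parser.feed(markup)
    parser.close()
    return parser.headings


# Hypothetical page written in the "all h1" style.
PAGE = """
<h1>Site</h1>
<article>
  <h1>Post</h1>
  <section>
    <h1>Comments</h1>
  </section>
</article>
"""

for text, legacy, nested in heading_levels(PAGE):
    print(f"{text}: legacy level {legacy}, nested level {nested}")
```

Under these assumptions, all three headings sit at level 1 in the legacy reading, while the nested reading puts them at levels 1, 2, and 3. Nothing in the markup itself tells a user agent which of the two readings the author intended.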

Instead, the "one h element to bind them all" approach needs to be taken to its logical conclusion, making a clean break with the old outline model, and forging a very clear path toward the new "section" (renamed, please!) based outlining model, which has real virtues that should not be simply shelved for lack of implementation so far. The advantages of this unnumbered <h> tag approach are:

1) Nearly total backwards compatibility. Existing popular HTML5 JavaScript shims could very easily be tweaked to include an unnumbered <h> element.
2) No styling issues. Authors just use classes for styling. Personally, the idea of adding class names to my <h> tags to style them doesn't bother me in the least. (I disagree with Luke Stevens on this point.) Similarly, Hickson's idea that adding a single class to an element is somehow "hard" for developers doesn't resonate with me.
3) Lucid developer aesthetics. Developers have it hammered home that the place of the <h> within the outline is determined by context, just like an <li> element. It's easy to learn, it's simple to type, and it's meaningfully contextualized.
4) A clear criterion for HTML5 outline interpretation in user agents. User agents can be advised to switch their outlining based on the presence of unnumbered <h> tags in the markup. It could be as simple as: "If there's a single <h> on the page, do it the new way. If there aren't any, do it the old way." With this more backwards-compatible implementation path, we might succeed in getting some implementations. :) Some developer evangelism would still naturally be required.
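
The switching rule in point 4 could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: <h> is the element proposed in this bug and does not exist in any spec, and the names UnnumberedHeadingDetector and outline_mode, as well as the "nested" and "legacy" labels, are invented for the example.

```python
from html.parser import HTMLParser


class UnnumberedHeadingDetector(HTMLParser):
    """Record whether the markup contains at least one proposed <h> element."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Exact match only, so <header>, <hr>, <h1>, etc. do not count.
        if tag == "h":
            self.found = True


def outline_mode(markup):
    """Return "nested" if any <h> is present, otherwise "legacy"."""
    detector = UnnumberedHeadingDetector()
    detector.feed(markup)
    detector.close()
    return "nested" if detector.found else "legacy"


print(outline_mode("<section><h>Title</h></section>"))
print(outline_mode("<h1>Title</h1><h2>Subtitle</h2>"))
```

The first call finds an <h> and selects the nesting-based model; the second finds only numbered headings and falls back to the rank-based model. A real user agent would make this decision on its parsed DOM rather than by re-scanning the source, so this shows only the shape of the criterion.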

Thanks for reading.
Comment 1 Silvia Pfeiffer 2013-02-17 03:20:26 UTC
Bug 20068 has a discussion of more of the issues, and as a consequence some examples were added to the outline algorithm at https://github.com/w3c/html/commit/fffcc35c0fe5e9a2bd0aadb4c4ae0658cfe0ce88. Does this satisfy some of your concerns?
Comment 2 ben 2013-02-18 18:49:17 UTC
Thanks for the reply Silvia, but that thread and the linked "examples" don't address my concerns. If anything they further my point. This passage from Hixie on the linked thread:

"...then there's three top-level sections. So if we stick that in an <article>, we need to end up with three top-level sections, essentially three articles. Stick a section before the first heading, and it can't be in the same section as the heading's. It's a sibling of the /section/ (not <section>) that was generated by the first part of the <section>."

is illustrative fodder for my first argument about the need to differentiate the terminology. I've read this passage five times, I'm quite familiar with the topic, and I have no idea what Hixie's talking about. Is "heading's" with the apostrophe a possessive? Is he referring to the section that belongs to the heading? Or does the apostrophe indicate a plural, referring to multiple headings? (That usage is frowned upon, but common enough.) And again, the need to distinguish /section/ from <section> !

Ian, Michael, (and hopefully others!) can you comment on my original post?
Comment 3 steve faulkner 2013-04-06 13:48:36 UTC
Assigning to Ian; this appears to be one he should deal with, as it's questioning the use of terminology around sectioning etc.
Comment 4 Ian 'Hixie' Hickson 2013-04-14 07:54:32 UTC
Please file only one issue per bug.

I can't really make head or tail of comment 0. The suggested terms don't seem any clearer than the current terms. If you would like to discuss specific terminology changes, I recommend filing a separate bug that just covers that topic (http://whatwg.org/newbug if you want me to be the one to process it.)

Regarding <h>, this has been considered before, and is a non-starter. <h1> is exactly what the proposed <h> would be, and already works, and has numerous additional benefits (like being compatible with <h2>-<h6> which you may wish to use to remain compatible with legacy UAs).

Regarding comment 2, I don't understand what's unclear (and yes, the grammar there seems to be correct English). Please file a new bug if there is something unclear that you would like me to handle (again, use http://whatwg.org/newbug if you want me to be the one to process it).

Regarding comment 3, I assume it was an oversight that this was assigned to me but left in the HTMLWG component. Moving to WHATWG component since apparently the HTMLWG doesn't want the bug any more (?).

Marking WONTFIX since I'm not changing the spec yet, but really this is NEEDSINFO except that I'd like the additional information in separate new bugs.
Comment 5 steve faulkner 2013-04-14 08:27:51 UTC
(In reply to comment #4)
> Regarding comment 3, I assume it was an oversight that this was assigned to
> me but left in the HTMLWG component. Moving to WHATWG component since
> apparently the HTMLWG doesn't want the bug any more (?).
My fault, due to not understanding the appropriate procedure. As for the HTML WG not wanting the bug: I resolve the bugs I have the knowledge to resolve, and if I find a bug that I think is best passed on to someone else, I do so.