22050 – document outlining issues

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC All

Importance:	P3 editorial
Target Milestone:	---
Assignee:	steve faulkner
QA Contact:	HTML WG Bugzilla archive list


Depends on:	20740

Reported:	2013-05-15 19:06 UTC by Edward O'Connor
Modified:	2015-06-17 03:17 UTC
CC List:	10 users

Description Edward O'Connor 2013-05-15 19:06:27 UTC

+++ This bug was initially created as a clone of Bug #20740 +++

I think it's fairly clear at this point that there is a painful gulf between how the spec describes document outlining and how authors in the real world are actually building web pages. Luke Stevens's many posts on this topic (whether one agrees with him on all points or not) are a visible symptom of this situation. For the most recent, see the following (including comments):

http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-1/
http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-2/
http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-3/

Given the "at-risk" status of the outline algorithm, I think something's got to give here. (Stevens, unfortunately, has no constructive suggestions about outlining proper, which strikes me as somewhat defeatist.)

I definitely don't have all the answers, but having watched this evolve from the sidelines for years, I think there are some things that could be done to nudge the spec and real-world practice towards a greater state of parity and clarity. Here are a couple of modest proposals:

1) Rename the terms "sectioning content", "sectioning root", and "section" in the spec. Here's the problem: the terminological overlap between the term "sectioning content" (used to refer to content that defines the scope of headings and footers, for example article, aside, nav, and section) and the term "section" (used to refer to the <section> element) is deeply confusing. The reuse of the same noun (or is the former sometimes a gerund?) in two very different but conceptually adjacent contexts is compounding the overall mess here.

The clarification in HTML 5.1 Nightly "4.4.11.1 Creating An Outline" that says: "The sections in the outline aren't section elements, though some may correspond to such elements — they are merely conceptual sections" doesn't really do much to clarify things. What, pray tell, is a "conceptual section"? In philosophy (and also typically in technical writing), a "concept" is something that has a "definition". It's distinct from a "notion," for example, which may not have a definition. But since all the key terms in the spec have definitions, the adjective "conceptual" provides no meaningful qualification. Try explaining to a first year web design student the difference between "sections" and "conceptual sections"!

For a technical specification, this is unacceptably confusing. I'm sure the English language is large enough that the W3C can find two suitably different terms for two different concepts. Here's my stab at a remapping of terms within the spec:

"sectioning content" => "outlining content"
"sectioning root" => "outlining root"
"section" (not the element) => "outline container"
<section> (the element) stays the same.

Since the main purpose of "section" (again, not the element) in the spec is outlining, why not spell this out explicitly, and sidestep a bit of confusion in the process? I think this would be a win on all sides for clarity in the spec, which has been justly criticized on this point.

2) Introduce unnumbered headers, e.g. an <h> element.

By way of comparison, every beginning web developer instantly "gets" the idea of the unnumbered <li> tag. It has the great and obvious virtue that scripts can add or subtract elements from an unordered list dynamically without renumbering each element in the entire list in markup. Wouldn't it be great if headings and document outlining had the same flexibility?

As you know, the spec's current suggestion is to use all <h1> tags: "Sections may contain headings of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level." The idea was to avoid introducing new elements, and maintain backwards compatibility.

This spares browsers' parsers the relatively trivial task of recognizing a new block element, but it introduces massive confusion for any implementation of document outlining. Predictably, there has been chaos. For any particular page, is a screen reader (for example) supposed to use the new HTML5 outlining algorithm or the old one? And based on what factors? There are no good answers here that I know of, not even provisional ones. No wonder vendors have dragged their feet on this! The "all h1, all the time" approach tries to split the difference between two completely different ideas about page structure, and winds up totally breaking the old outlining model without providing a satisfactory indication that the new model is in use on the page. This offers neither enhancement nor degradation--it's pure breakage, and it's not suited to the backwards-compatible, incrementalist approach that the web requires.

Instead, the "one h element to bind them all" approach needs to be taken to its logical conclusion, making a clean break with the old outline model, and forging a very clear path toward the new "section" (renamed, please!) based outlining model, which has real virtues that should not be simply shelved for lack of implementation so far. The advantages of this unnumbered <h> tag approach are:

1) Nearly total backwards compatibility. Existing popular HTML5 JavaScript shims could very easily be tweaked to include an unnumbered <h> element.
2) No styling issues. Authors just use classes for styling. Personally, the idea of adding class names to my <h> tags to style them doesn't bother me in the least. (I disagree with Luke Stevens on this point.) Similarly, Hickson's idea that adding a single class to an element is somehow "hard" for developers doesn't resonate with me.
3) Lucid developer aesthetics. Developers have it hammered home that the place of the <h> within the outline is determined by context, just like an <li> element. It's easy to learn, it's simple to type, and it's meaningfully contextualized.
4) A clear criterion for HTML5 outline interpretation in user agents. User agents can be advised to switch their outlining based on the presence of unnumbered <h> tags in the markup. It could be as simple as: "If there's a single <h> on the page, do it the new way. If there aren't any, do it the old way." With this more backwards-compatible implementation path, we might succeed in getting some implementations. :) Some developer evangelism would still naturally be required.
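
To make point 4 concrete, here is a rough sketch in Python of how a user agent might apply that switch. It is only an illustration under stated assumptions: the <h> element is the hypothetical one proposed above, the names HeadingScanner and outline are made up for this example, and the "new way" is reduced to "rank equals nesting depth of article, aside, nav, and section plus one", which is a simplification of the spec's outline algorithm rather than an implementation of it.

```python
from html.parser import HTMLParser

# Elements the spec calls "sectioning content".
SECTIONING = {"article", "aside", "nav", "section"}
NUMBERED = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingScanner(HTMLParser):
    """Collects headings and records how deeply each one is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current nesting depth of sectioning elements
        self.headings = []    # (tag, depth, text) tuples in document order
        self._open = None     # heading currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag == "h" or tag in NUMBERED:   # "h" is the hypothetical element
            self._open = [tag, self.depth, ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth = max(0, self.depth - 1)
        elif self._open is not None and tag == self._open[0]:
            name, depth, text = self._open
            self.headings.append((name, depth, text.strip()))
            self._open = None


def outline(markup):
    """Return (mode, [(level, text), ...]) for a fragment of markup."""
    scanner = HeadingScanner()
    scanner.feed(markup)
    scanner.close()
    # The proposed switch: a single unnumbered <h> selects the new model.
    new_model = any(tag == "h" for tag, _, _ in scanner.headings)
    items = []
    for tag, depth, text in scanner.headings:
        if new_model:
            level = depth + 1      # assumption: rank comes from nesting alone
        else:
            level = int(tag[1])    # old model: rank is the digit in h1..h6
        items.append((level, text))
    return ("new" if new_model else "old"), items


mode, items = outline(
    "<h>Site</h><section><h>News</h><article><h>Story</h></article></section>"
)
print(mode, items)
```

On a page with no <h> at all, the same function falls back to reading the digit from h1 through h6, which is the "old way" branch of the rule.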

Thanks for reading.

Comment 1 Silvia Pfeiffer 2013-05-15 22:17:17 UTC

This whole outline business baffles me still - I wasn't there when it got created, so I may be missing something.

Was there ever any attempt to learn from existing "table of contents" approaches, such as the one employed in LaTeX: http://en.wikibooks.org/wiki/LaTeX/Document_Structure#Table_of_contents ?

Why don't we have an element that can actually visually expose the outline of a document? Something like <tableOfContents></tableOfContents>, which is filled by the browser with the result of the outline algorithm.

I don't see much use of the outline algorithm unless it actually gets visually exposed. Having an actual element for exposing the outline would force us to implement it and fix it.
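
As a rough illustration of what such an element might be filled with, here is a small Python sketch that turns a list of (level, title) pairs into LaTeX-style numbered entries. The function name table_of_contents and the input format are assumptions made up for this example; it presumes some outline algorithm has already produced the levels, and it does not implement that algorithm.

```python
def table_of_contents(headings):
    """Turn (level, title) pairs into indented, numbered entries."""
    counters = []   # counters[i] is the current number at level i + 1
    lines = []
    for level, title in headings:
        del counters[level:]            # returning to a shallower level resets deeper ones
        while len(counters) < level:    # a skipped level shows up as 0, e.g. "1.0.1"
            counters.append(0)
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        lines.append("  " * (level - 1) + number + " " + title)
    return lines


toc = table_of_contents(
    [(1, "Intro"), (2, "Scope"), (2, "Terms"), (1, "Outline")]
)
print("\n".join(toc))
```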

Comment 2 steve faulkner 2015-06-12 14:40:47 UTC

WONTFIX to reflect WHATWG resolution.