This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 25394 - outline depth calculation should not include empty sections
Summary: outline depth calculation should not include empty sections
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on: 25393
Blocks:
Reported: 2014-04-19 12:42 UTC by steve faulkner
Modified: 2014-09-08 23:53 UTC (History)
CC List: 7 users

Description steve faulkner 2014-04-19 12:42:21 UTC
+++ This bug was initially created as a clone of Bug #25393 +++

Including sections without Hx leads to illogical outline depths for headings
http://www.w3.org/html/wg/drafts/html/master/sections.html#outline-depth

example:

<body>
<h1>heading</h1>

<section>
<section>
<section>
<h1>heading</h1>
</section>
</section>
</section>

results in 

<h1>
<h4>
</body>

given that use of sectioning elements in the wild is prone to this type of misuse, ignoring sections without headings seems prudent.
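
To make the arithmetic behind the example concrete, here is a minimal sketch in Python. It is not the spec's outline algorithm: it assumes a document that uses only <h1>, and it simply treats every open sectioning element as adding one level below the body. The names `HeadingDepth` and `heading_levels` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: only these elements add a nesting level in this sketch.
SECTIONING = {"section", "article", "aside", "nav"}


class HeadingDepth(HTMLParser):
    """Record each <h1> with the level implied by sectioning-element nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 1           # the document root (body) counts as level 1
        self.in_heading = False
        self.levels = []         # list of (level, heading text)

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1      # empty sections still add a level
        elif tag == "h1":
            self.in_heading = True
            self.levels.append((self.depth, ""))

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            level, text = self.levels[-1]
            self.levels[-1] = (level, text + data)


def heading_levels(markup):
    parser = HeadingDepth()
    parser.feed(markup)
    parser.close()
    return parser.levels


example = """
<body>
<h1>heading</h1>
<section><section><section>
<h1>heading</h1>
</section></section></section>
</body>
"""

print(heading_levels(example))  # [(1, 'heading'), (4, 'heading')]
```

The second heading lands on level 4 purely because of the three wrappers, two of which carry no heading of their own.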
Comment 1 steve faulkner 2014-04-19 13:00:53 UTC
some data: 252 HTML5 pages using the <section> element
http://www.html5accessibility.com/HTML5data/section/section.html
Comment 2 steve faulkner 2014-04-19 13:06:10 UTC
some data: 170 HTML5 pages that use the <article> element http://www.html5accessibility.com/HTML5data/article/index.html
Comment 3 Ian 'Hixie' Hickson 2014-04-22 18:30:42 UTC
It should result in:

   + heading
      + (untitled section)
         + (untitled section)
            + heading

If people don't want this, then they shouldn't use <section> that way. If we start trying to DWIM the logic here, (a) the outline algorithm will become even more absurdly complex (it's already insane enough that people keep getting confused), and (b) it'll make it impossible to do some things, like having untitled sections, for which there are plenty of use cases (e.g. lots of sidebars have no titles in practice; same with sections in long fiction books, etc).
Comment 4 steve faulkner 2014-04-23 10:29:14 UTC
(In reply to Ian 'Hixie' Hickson from comment #3)
> It should result in:
> 
>    + heading
>       + (untitled section)
>          + (untitled section)
>             + heading
> 
> If people don't want this, then they shouldn't use <section> that way. 

people don't understand that the heading in the example will result in it being exposed as a <h4> to AT users. It's a really brittle and short-sighted view to expect that devs will stop misusing elements.

If we
> start trying to DWIM the logic here, (a) the outline algorithm will become
> even more absurdly complex (it's already insane enough that people keep
> getting confused), and (b) it'll make it impossible to do some things, like
> having untitled sections, for which there are plenty of use cases (e.g. lots
> of sidebars have no titles in practice; same with sections in long fiction
> books, etc).

it does not appear you are taking into account the AT use case where a user is moving through a document, not a document outline view.

what they get is 
heading level 1

content
...

content

heading level 4
Comment 5 Jason Kiss 2014-04-28 11:00:09 UTC
I tend to agree with Hixie on this one: The algorithm is already too confusing, and sectioning elements need to consistently either affect heading levels and the outline or not at all.
Comment 6 steve faulkner 2014-04-28 11:07:35 UTC
(In reply to Jason Kiss from comment #5)
> I tend to agree with Hixie on this one: The algorithm is already too
> confusing, and sectioning elements need to consistently either affect
> heading levels and the outline or not at all.

hi Jason, so you are saying that having gaps in nesting levels is OK? This appears to go against advice not to skip heading levels.
Comment 7 Jason Kiss 2014-04-28 11:23:43 UTC
(In reply to steve faulkner from comment #6)
> hi Jason, so you are saying that having gaps in nesting levels is OK? This
> appears to go against advice not to skip heading levels.

I think gaps in nesting levels are to be avoided, but just like skipped heading levels, they are a fact/consequence of misuse of HTML, and I think it's more confusing to incorporate additional rules to account for such misuse, especially when the base rule set is already as complex as the html5 outline algorithm.
Comment 8 steve faulkner 2014-04-28 11:36:04 UTC
(In reply to Jason Kiss from comment #7)
> (In reply to steve faulkner from comment #6)
> > hi Jason, so you are saying that having gaps in nesting levels is OK? This
> > appears to go against advice not to skip heading levels.
> 
> I think gaps in nesting levels are to be avoided, but just like skipped
> heading levels, they are a fact/consequence of misuse of HTML, and I think
> it's more confusing to incorporate additional rules to account for such
> misuse, especially when the base rule set is already as complex as the html5
> outline algorithm.

I don't think we are talking about misuse. Take this example from the spec, with a second heading added to illustrate:

the top level heading has a level=2, the second heading has a level=7, how is this 5-level gap to be conveyed meaningfully to a user? 

<article>
 <h1><a href="http://bacon.example.com/?blog=109431">Bacon on a crowbar</a></h1>
 <article>
  <header><strong>t3yw</strong> 12 points 1 hour ago</header>
  <p>I bet a narwhal would love that.</p>
  <footer><a href="?pid=29578">permalink</a></footer>
  <article>
   <header><strong>greg</strong> 8 points 1 hour ago</header>
   <blockquote><p>I bet a narwhal would love that.</p></blockquote>
   <p>Dude narwhals don't eat bacon.</p>
   <footer><a href="?pid=29579">permalink</a></footer>
   <article>
    <header><strong>t3yw</strong> 15 points 1 hour ago</header>
    <blockquote>
     <blockquote><p>I bet a narwhal would love that.</p></blockquote>
     <p>Dude narwhals don't eat bacon.</p>
    </blockquote>
    <p>Next thing you'll be saying they don't get capes and wizard
    hats either!</p>
    <footer><a href="?pid=29580">permalink</a></footer>
    <article>
     <article>

<!-- heading added equates to h7 due to nesting -->
<h1>heading text </h1>

      <header><strong>boing</strong> -5 points 1 hour ago</header>
      <p>narwhals are worse than ceiling cat</p>
      <footer><a href="?pid=29581">permalink</a></footer>
     </article>
    </article>
   </article>
  </article>
  <article>
   <header><strong>fred</strong> 1 points 23 minutes ago</header>
   <blockquote><p>I bet a narwhal would love that.</p></blockquote>
   <p>I bet they'd love to peel a banana too.</p>
   <footer><a href="?pid=29582">permalink</a></footer>
  </article>
 </article>
</article>
Comment 9 Jason Kiss 2014-04-28 12:25:16 UTC
(In reply to steve faulkner from comment #8)
> I don't think we are talking about misuse. Take this example from the spec,
> with a second heading added to illustrate:
> 
> the top level heading has a level=2, the second heading has a level=7, how
> is this 5-level gap to be conveyed meaningfully to a user? 

On a side note, will the outline algorithm permit infinite heading levels, or so many levels greater than six? I don't see why it shouldn't, but presumably that involves some adjustment in UAs and AT.

In terms of your example, I'd suggest that the gap is meaningfully conveyed by being a level 7 heading, which is 5 levels deeper than the level 2 heading that started the thread, suggesting that there are five comments that come before it.

Additionally, I'd recommend that the system used to post comments generate some default heading for each nested comment in the case that no explicit heading is provided, e.g. make the <header> content the heading, at which point there would be no gaps.
Comment 10 Duff Johnson 2014-04-28 14:36:56 UTC
I concur with comment 4. The problem's severity scales with content volume / complexity. It's not future-proof.

Historical sloppy handling of heading levels appears unsustainable in the HTML5 paradigm. It's not a legacy worthy of protection. HTML5 is the right break-point.

Yes, accommodating more than six heading levels is implied - and why not? What's the (defensible) argument to exclude H7, H8... H22?

IMO: Fix it now or you'll be fixing it in HTML6. ;-)
Comment 11 Ian 'Hixie' Hickson 2014-04-28 21:57:36 UTC
Could someone better explain how this is impacting AT users, e.g. by describing a real-world result of this? Maybe a video showing how an AT reacts to a page of the form being discussed here? That would be helpful in designing a better solution that doesn't require changing the outline algorithm.
Comment 12 Duff Johnson 2014-04-29 04:43:57 UTC
Hi Ian,

A quick answer - I hope it's responsive to your question.

Here's a video made by a blind screen reader user that discusses the matter of headings.

https://www.youtube.com/watch?v=AmUPhEVWu_E

- He emphasizes the "proper" use of headings (admittedly, without being specific about what that is - nonetheless, he has a clear sense of proper vs. improper use)
- While allowing for headings used to indicate "importance" (an HTML 4 rubric not carried over in HTML 5 so far as I am aware), he highlights the significance of headings as section indicators.

Without headings the user cannot distinguish between section headings and subsection headings, forcing him to plow through very minor sub-sub-sections because they might miss a major heading if they don't.

Logical heading levels make it relatively easy to distinguish main body content from supplementary content, surf a lot of content, or surf highly structured content.

Illogical headings indicate to the user that they cannot trust the structures they encounter, and therefore must default to surfing them all, whether in the main body of the content or in sidebars or comments, without distinction. 

Note that WebAIM's survey of screen reader users shows that headings are the most significant means of web content navigation to screen reader users:

http://webaim.org/projects/screenreadersurvey5/#finding

I don't see how the concept of an "untitled section" (per your Comment 3) is conveyable to AT users without resort to the approach Steve highlighted in his initial statement of the problem. Given their means of consumption, if they don't impute headings (somehow) from each <section> they encounter, AT users could only assume that sections without headings are simply continuations of the body content associated with the immediately prior heading.

This is emphatically not the desired effect.

I hope this is useful input. I'm off to bed.
Comment 13 Ian 'Hixie' Hickson 2014-04-29 21:51:27 UTC
That video seems like things would work fine with sections lacking headers, actually. He doesn't seem to rely on heading structure at all — he just treats them like a flat list. If ATs want to use this approach, then they don't need to use the outline algorithm at all — they can just take all the headings and let users step through them. There's no need for a change to the spec to do that.

If they want to do better, then they can use the outline algorithm. Note that <section> elements don't match 1:1 to sections. A <section> element can generate multiple sections and thus have multiple headings. A screen reader has to generate the outline tree for the page to make sense of it.

For example:

   <section>
    <h1>A</h1>
    aaa...
    <h1>B</h1>
    bbb...
   </section>

...is two sections. Similarly, this is three sections:

   <section>
    <h1>A</h1>
    aaa...
    <section>
    ccc...
    </section>
    <h1>B</h1>
    bbb...
   </section>

...and if you were to walk the outline, you'd find it had a section A, with a subsection with no name, and a section B, sibling to A. There's no reason that I can see that the AT can't say this.

But there's also no reason I can see for the AT to not just say that there's two headings, A, and B, and ignore the outline. It depends on what UI the AT wants to expose.
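
The two examples above can be walked with a small sketch. This is a simplification under stated assumptions, not the spec's outline algorithm: it handles only <h1>, each sectioning element opens a section one level deeper, and a second heading inside the same element starts a sibling section at the same depth. `OutlineSketch` and `outline_of` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption: these elements each open a new (possibly untitled) section.
SECTION_TAGS = {"body", "section", "article", "aside", "nav"}


class OutlineSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.outline = []    # entries are [depth, title or None]
        self.open = []       # current outline entry of each open element
        self.in_heading = False

    def _new_section(self, depth):
        entry = [depth, None]
        self.outline.append(entry)
        return entry

    def handle_starttag(self, tag, attrs):
        if tag in SECTION_TAGS:
            self.open.append(self._new_section(len(self.open) + 1))
        elif tag == "h1":
            if not self.open:
                # heading outside any element: assume an implicit root
                self.open.append(self._new_section(1))
            current = self.open[-1]
            if current[1] is not None:
                # the element already has a heading: start a sibling section
                current = self._new_section(current[0])
                self.open[-1] = current
            current[1] = ""
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in SECTION_TAGS and self.open:
            self.open.pop()
        elif tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.open[-1][1] += data


def outline_of(markup):
    parser = OutlineSketch()
    parser.feed(markup)
    parser.close()
    return [(depth, title if title is not None else "(untitled section)")
            for depth, title in parser.outline]


two = "<section><h1>A</h1>aaa...<h1>B</h1>bbb...</section>"
three = ("<section><h1>A</h1>aaa..."
         "<section>ccc...</section>"
         "<h1>B</h1>bbb...</section>")

print(outline_of(two))    # [(1, 'A'), (1, 'B')]
print(outline_of(three))  # [(1, 'A'), (2, '(untitled section)'), (1, 'B')]
```

An AT could read the resulting list as a tree (depth gives the nesting) or keep only the titled entries for a flat list of headings.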
Comment 14 Duff Johnson 2014-05-02 16:38:34 UTC
I think the video is ambiguous, and interpret it differently. He's pretty clear that heading levels matter, it’s just unclear from the demo itself why they matter. In that sense, certainly, the video does not clearly present the message I feel it delivers.

However… I contend that if we asked him to extend the demo to encompass, say, a detailed technical document with 132 headings, or a page with 3 headings in the document’s enclosing <section> and 18 headings in sidebar(s), or a page with lots of deep comments which themselves leveraged headings (or sections), he would want "proper" use of heading levels (no skips), however he consumed the content.

Or so I believe. I’ve asked for a video in that context. ☺

HTML5 makes big definitional changes in headings - removing "importance" from the concept, providing <section>, etc. The idea (it seems) is to facilitate precision in semantics, provide richer, more valuable options for navigation and presentation, better support for richer content, etc. All good!

In the pre-HTML5 paradigm, of course, "headings" have never been <em>only</em> about "sections". The video reflects present-day assumptions that HTML5 is (I had thought) supposed to negate, specifically:

- Most web-pages have relatively few headings (as expressed to the user).
- Heading levels, per-se, aren't very useful because the semantics aren't clear.

A "flat" approach to headings makes some very cramped assumptions. It’s a reasonable behavior for low numbers of headings (or sections), or with low proportions of headings used for content other than "main" content. But when the gross volume / depth of headings goes up.... well, that's why we want an outline model! So we need both, and the more you need an outline the more you want headings. Or so it seems to me.

But as I try to ingest your point (and I may well be missing it), it seems your answer is that AT (or its users) can/will choose the outline or focus on the headings. How is a screen-reader to decide, or even, help a screen-reader user to decide? Doesn't forcing the choice reduce the value of both to the point that AT users will just… give up on outlines, and per Steve's initial point – just continue to "…[ignore] sections without headings"?

It feels as if what's on-offer is a facilitation of current practice (HTML 4 mentality) in an HTML5 setting. Maybe this is precisely the design-intent? I can understand the appeal – but in a world that leverages HTML5 this feels like a big fork in the road that will increase confusion over structure instead of leveraging the opportunity afforded by HTML5 to reduce it.

At the end of the day, I can’t tell you how to clarify the representation of semantic structures in documents and pages in HTML’s model for outlines and headings. Nor should I try – you guys spend waaay more time in it than I do. All I can tell you is that to me, this specific approach feels like it either won’t help at all or could hurt a lot.

If I find a video by an AT user navigating a highly structured document I will post it.
Comment 15 Ian 'Hixie' Hickson 2014-05-02 18:16:25 UTC
I think this is missing the original point of the bug.

Steve makes the assertion that if an AT is heading-focused, and a page has sections without headings, the resulting user interface will be confusing.

Before I can fix this, I first need to understand it. So the question I have is, "what is the confusion in question?". Concretely, that is: if a user navigates a page with sections with missing headings, what is the user experience? What exactly is confusing about it?

I'm not saying that it's not confusing; I'm saying I don't understand the issue well enough to fix it.
Comment 16 Duff Johnson 2014-05-02 19:15:43 UTC
What is a section without a heading? How should AT express <section> in the absence of heading tags? Offhand I'm having difficulty envisioning a use-case for a <section> without a heading.

To me, a <section> without a heading tag implies (a) an entry in a Table of Contents, and (b) that said entry is blank. What would you have a user understand from such a construct? Would headings appearing within a section appear to be... subsections to that section? Or to the heading above? Or would it depend on the respective heading levels, irrespective of intervening <section>s? Or would the heading levels be driven by sections in addition to the headings themselves? 

I don't see how providing "outline" or "headings-only" modes in a UI, as you suggested, would answer these questions. At best the user gets 2 (or more) potentially inconsistent "pictures" of the document's structure.

In the common (and IMHO, reasonable) understanding, headings are the delineators for semantic structure within a given high-level block of content (like an <article> or an <aside>). Without a heading I have trouble understanding how <section> is usefully distinguished from a <div>.

I like what Bruce (and Steve) wrote here:

http://html5doctor.com/the-section-element/
Comment 17 Ian 'Hixie' Hickson 2014-05-05 22:40:24 UTC
Sample use cases for sections without headings are:
 - sidebars (typically using <aside>)
 - navigation (typically using <nav>)
 - narrative streams in books (sometimes marked up with <hr>s separating the 
   sections rather than <section>)
 - microposts (e.g. tweets) or comments (typically using <article>)
 - different panels in an application UI
 - especially long footers for which <footer> might be insufficient
 - segments of applications, e.g. the comments part of this bug page, or the
   game board on an online game application
 - parts of a page for which the context is sufficiently clear that the
   section does not need a heading, e.g. a list of endnotes

<div> doesn't mean anything. You can add or remove <div> elements from a page and, modulo some effects on implied paragraphs, it has no effect on the semantics.

<section> means "a generic section of a document or application", that is, "a thematic grouping of content".
Comment 18 steve faulkner 2014-05-06 06:04:31 UTC
(In reply to Ian 'Hixie' Hickson from comment #17)
> Sample use cases for sections without headings are:
>  - sidebars (typically using <aside>)
labelled in acc layer via role
>  - navigation (typically using <nav>)
labelled in acc layer via role

>  - narrative streams in books (sometimes marked up with <hr>s separating the 
>    sections rather than <section>)
EPUB (for example) provides labelling advice for untitled sections
http://www.idpf.org/accessibility/guidelines/content/xhtml/sections.php

>  - microposts (e.g. tweets) or comments (typically using <article>)
comments often have implicit or explicit titles/labels
>  - different panels in an application UI

typically have a heading/role/label for identification


>  - segments of applications, e.g. the comments part of this bug page, or the

would benefit from a label/heading


>  - parts of a page for which the context is sufficiently clear that the
>    section does not need a heading, e.g. a list of endnotes

sufficiently clear for who?

suggest any cases where no hx is provided, for AT users a *useful* role/label be exposed via outline algo. What we already know is that simply exposing role label for article/section is considered to be noise/annoying by AT users.
Comment 19 Duff Johnson 2014-05-06 14:48:36 UTC
(In reply to Ian 'Hixie' Hickson from comment #17)
> Sample use cases for sections without headings are:
>  - sidebars (typically using <aside>)
>  - navigation (typically using <nav>)
>  - narrative streams in books (sometimes marked up with <hr>s separating the 
>    sections rather than <section>)
>  - microposts (e.g. tweets) or comments (typically using <article>)
>  - different panels in an application UI
>  - especially long footers for which <footer> might be insufficient
>  - segments of applications, e.g. the comments part of this bug page, or the
>    game board on an online game application
>  - parts of a page for which the context is sufficiently clear that the
>    section does not need a heading, e.g. a list of endnotes

I concur with Steve - some of these include their own semantics (<aside>, etc), others can't be clear without additional content.

I guess one could argue that <section> acquires useful meaning if one can be assured of its occurrence within an <article> or <aside> context (for example)... but such usage doesn't appear to be required.

> <div> doesn't mean anything. You can add or remove <div> elements from a
> page and, modulo some effects on implied paragraphs, it has no effect on the
> semantics.

Indeed - and the same may be said of a disembodied <section> absent other contextualizing elements (headings, <article>, etc) to lend it scope.

> <section> means "a generic section of a document or application", that is,
> "a thematic grouping of content".

...and as such, useful only if the applicable "theme" is unambiguous, IMO.
Comment 20 Léonie Watson 2014-05-09 09:34:26 UTC
Comment 15
"Steve makes the assertion that if an AT is heading-focused, and a page has sections without headings, the resulting user interface will be confusing."

Yes, speaking as a screen reader user, it would be confusing.

"a page with sections with missing headings, what is the user experience? What exactly is confusing about it?"

The heading hierarchy is the best way to understand the relationship between different sections of content. If the hierarchy is logical (cascades without skipping levels), then those relationships are easy to determine.

Headings also assist with content location. Moving between headings that have a logical hierarchy enables you to drill down into the page content, narrowing down the part of the page you need to examine in detail to find the content you're looking for.

If the page doesn't have a logical heading hierarchy, then neither of these things is possible. This significantly reduces the UX from the point of view of a blind person.

HTH.
Comment 21 Ian 'Hixie' Hickson 2014-05-09 17:26:47 UTC
Can you give a concrete example of this? I'm not understanding what the difference in experience would be. Let's start with some simple examples. What is the user experience for these two examples?

   <body>
    <h1>Hello</h1>
    <p>foo foo foo</p>
    <section>
      <h1>Welcome</h1>
      <p>bar bar bar</p>
    </section>
   </body>

   <body>
    <h1>Hello</h1>
    <p>foo foo foo</p>
    <section><section>
      <h1>Welcome</h1>
      <p>bar bar bar</p>
    </section></section>
   </body>
Comment 22 steve faulkner 2014-05-09 18:37:53 UTC
(In reply to Ian 'Hixie' Hickson from comment #21)
> Can you give a concrete example of this? I'm not understanding what the
> difference in experience would be. Let's start with some simple examples.
> What is the user experience for these two examples?
> 
>    <body>
>     <h1>Hello</h1>
>     <p>foo foo foo</p>
>     <section>
>       <h1>Welcome</h1>
>       <p>bar bar bar</p>
>     </section>
>    </body>
> 
>    <body>
>     <h1>Hello</h1>
>     <p>foo foo foo</p>
>     <section><section>
>       <h1>Welcome</h1>
>       <p>bar bar bar</p>
>     </section></section>
>    </body>

If you want to get user experience feedback, why not provide working examples and examples that have meaningful content?
Comment 23 Léonie Watson 2014-05-09 20:13:50 UTC
"> What is the user experience for these two examples?"

In this example, the first heading would be announced by a screen reader as "Level 1" and the next heading as "Level 2". The parent/child relationship is easy to understand.

In the second example, the first heading would be announced as "Level 1", but the next heading as "Level 3". A grandparent/grandchild relationship (without an intervening parent) isn't quite so easy to understand.

So because you can't make use of the visual cues, you question the relationship you're being presented with. Is the content under the second heading really a sub-section of the content before it, or is it just an arbitrary heading?

The deeper the nesting, the weaker the relationship - and the understanding of that relationship becomes. The closest thing I can liken it to, would be trying to locate some content in a sub-section of a section within the fourth chapter of a book, without knowing where any of the chapters or sections began.
Comment 24 Ian 'Hixie' Hickson 2014-05-09 23:14:42 UTC
Thanks for the explanation. Based on that, I loaded VoiceOver and explored how it would feel.

I have to say, I really don't think this is as confusing as is made out above. I mean, don't get me wrong, we should absolutely recommend against it in our advocacy of how to write good markup. But I actually ended up way more confused when I tried to read the VoiceOver help and it read me a filename in place of an image, for example, than I was when I was navigating a page with missing levels.

Honestly it's no more confusing than when people use just <h1> and <h3> today, with no <h2>. Yes, it's not ideal, but AT users aren't dumb. They can tell that for some reason the author just skipped a level.

I mean, look at how the guy in the video cited above uses Jaws. He's often stopping the playback before Jaws even has a chance to tell him the level.

There's also the complexity around what it would mean to re-level headings. For example, consider markup like this:

   <section>
    <h1>...</h1>
    ...
    <section>
      ...
      <section>
       <h1>...</h1> <!-- X -->
      </section>
      <h1>...</h1>
      ...
    </section>
    <h1>...</h1>
    ...
   </section>

If we start skipping levels, then what level is the heading marked X? is it a sibling of the following one? What about cases like:

   <section>
     <h1>...</h1>
     ...
     <section>
      <h1>...</h1>
     </section>
   </section>
   <section>
     ...
     <section>
      <h1>...</h1>
     </section>
   </section>
   <section>
     <h1>...</h1>
     ...
     <section>
      <h1>...</h1>
     </section>
   </section>

Are the three subsections the same level? Or different levels? Just because the author forgot to put a heading in one of the sections, should we renumber the other headings? I think if we did that, it would be at least as confusing, and probably significantly MORE confusing, to a user navigating the page for the first time.
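
The ambiguity can be shown with a sketch that compares the current rule with two hypothetical "skip untitled sections" rules. Neither skip rule was proposed in this thread; both are assumptions made up here for illustration, as are the names `Releveler` and `levels`. "whole" skips a <section> only if it has no direct <h1> anywhere inside it; "so_far" skips it if no direct <h1> has appeared yet when the heading is reached.

```python
from html.parser import HTMLParser


class Releveler(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []       # ids of the currently open <section> elements
        self.next_id = 0
        self.titled = set()   # ids of sections with a direct <h1> seen so far
        self.headings = []    # [text, ancestor ids, ids titled at that point]
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "h1" and self.stack:
            self.titled.add(self.stack[-1])
            self.headings.append(
                ["", tuple(self.stack), frozenset(self.titled)])
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "section" and self.stack:
            self.stack.pop()
        elif tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1][0] += data


def levels(markup):
    parser = Releveler()
    parser.feed(markup)
    parser.close()
    result = {}
    for text, ancestors, seen in parser.headings:
        result[text] = {
            # current rule: every enclosing <section> adds a level
            "spec": len(ancestors),
            # hypothetical: count sections that have a direct <h1> anywhere
            "whole": sum(1 for s in ancestors if s in parser.titled),
            # hypothetical: count sections whose <h1> has already appeared
            "so_far": sum(1 for s in ancestors if s in seen),
        }
    return result


sample = """
<section>
 <h1>a</h1>
 <section>
  <section>
   <h1>X</h1>
  </section>
  <h1>b</h1>
 </section>
 <h1>c</h1>
</section>
"""

for text, by_rule in levels(sample).items():
    print(text, by_rule)
```

For the first markup example above, "whole" leaves X at level 3 (its parent section does get a heading, just later), while "so_far" puts X at level 2, the same level as the heading that follows it. Two plausible readings of "ignore empty sections" disagree on the same page.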

Having said all that, I would definitely recommend that ATs seriously consider presenting the outline including implicit headers. Also, I think that the "level 2" kind of language is a bit obscure — people don't generally talk about "level 2 headings" of books and so on. "Heading" and "Subheading", at least for level 1 and 2, would be way clearer, IMHO. And faster.
Comment 25 steve faulkner 2014-05-10 08:32:34 UTC
(In reply to Ian 'Hixie' Hickson from comment #24)
> Thanks for the explanation. Based on that, I loaded VoiceOver and explored
> how it would feel.
> 
> I have to say, I really don't think this is as confusing as is made out
> above. I mean, don't get me wrong, we should absolutely recommend against it
> in our advocacy of how to write good markup. But I actually ended up way
> more confused when I tried to read the VoiceOver help and it read me a
> filename in place of an image, for example, than I was when I was navigating
> a page with missing levels.

So you as a sighted non screen reader user are telling a blind screen reader user that they are making out that something which they find confusing is less confusing for you, therefore it's not as confusing as they suggest.

 
> Honestly it's no more confusing than when people use just <h1> and <h3>
> today, with no <h2>. 

Nobody has argued that the present situation as regards skipped levels is any more or less confusing. What we should be trying to do is not embed the issue in the outline algorithm. It may well not involve re-ordering levels, but it is obvious that at least the advice in the spec about how the outline is conveyed to users needs to be less vague.

>Yes, it's not ideal, but AT users aren't dumb. They can
> tell that for some reason the author just skipped a level.

can they?

from Comment 23

"So because you can't make use of the visual cues, you question the relationship you're being presented with. Is the content under the second heading really a sub-section of the content before it, or is it just an arbitrary  heading?"
Comment 26 Ian 'Hixie' Hickson 2014-05-10 15:39:43 UTC
I'm not saying it's not confusing. I'm just saying that the confusion is being overblown here, especially in comparison to the suggested change to deal with it.

I agree that what Léonie said in comment 23 is entirely accurate:

> So because you can't make use of the visual cues, you question the
> relationship you're being presented with. Is the content under the second
> heading really a sub-section of the content before it, or is it just an
> arbitrary heading?

The same could be said for tons of stuff. Because you can't see the images, you question what it means when VoiceOver help says "p x 1 2 8 dot p n g" or whatnot. Because you don't have the visual cues, you question whether a checkbox is associated with the text before it or after it, when there's no <label> element.

The question that matters here is, are the solutions better or worse than the problems?

As I see it, there's two solutions on the table here. One is to make the algorithm different in some unspecified way. I don't see how we can do this in a way that is less confusing than what we have already.

The other way is to have browsers/ATs report implicit headings. The spec actually already suggests doing this. This lets users see the exact structure that the author used, and removes the confusion, at the cost of higher verbosity.
Comment 27 steve faulkner 2014-05-10 15:51:50 UTC
How should this document outline be presented in user agents?

http://validator.w3.org/nu/?showoutline=yes&doc=http%3A%2F%2Fwww.terminix.com%2F#outline
Comment 28 Austin hicks 2014-05-10 16:06:03 UTC
I'm not really sure where to start replying to comments, so I'm just going to provide my general thoughts.  I just saw this this morning via Twitter, so I'm trying to respond to everything at once; thus the length.  I am a blind Computer Science undergraduate who is extremely skilled at using the screen reader NVDA, to the point of writing scripts for it.  I have also used Jaws and VoiceOver (both Mac and iOS) for extended periods.
I believe that the only real way to make headway on this issue, if it is even a problem in the first place, is to get a screen reader or other AT to implement it and get feedback.  I am going to cross-link this from the appropriate NVDA issue shortly, as someone has officially requested the algorithm.  The examples provided here of the alternatives do not actually make anything clear: they need actual content to actually be useful, and I need to come across them without them being "examples".  The rest of this is, therefore, hypothetical until I use the algorithm in the wild-and, according to the documentation linked in this issue, no one has yet implemented it (Jaws may have, as I saw someone claim they had once, but I haven't touched nor wanted to touch Jaws for over a year now).
I would divide users into three categories, but this division is also arbitrary.  First are those just learning AT.  This group may or may not be affected by this algorithm: if it is what they learn, I doubt it will be confusing to them, but their teachers may teach them inaccurate practices from HTML4 for the next few years.  The second is those who navigate the same sites over and over.  This is what I would call "heading-based", but not really-I'll come to that in a moment.  The biggest effect on this group is going to be that the algorithm changes familiar page layouts and some of them will have difficulty with that.  The third group--mine--researches widely, browsing many different pages and many different web sites, all using a wide variety of different HTML practices; this group is small, as far as I can tell.  This group is also the only perspective I can write from.
This last group is also the one who gets confronted with "heading-based navigation" being a myth.  While not dead yet, heading-based navigation started dying, as far as I can tell, about 5 years ago.  I will check for headings and will be happy to see them, but don't expect or anticipate them being there anymore. I have watched at least one familiar web site remove them much to my annoyance.  Everyone I know who is both sighted and making small web sites doesn't believe in them, but this is a small sample size of college students and may be an inaccurate generalization.  Nevertheless, nowadays, the most useful keystroke in my web browsing has become "next non-link text", not so much next heading-it's more important to get to the article than it is to figure out the sidebar in many cases, and that usually does the trick.  I'm not saying headings aren't popular among the blind, merely that assuming they are anything more than preferred is a mistake-in lots of cases, they don't exist or don't exist meaningfully.
I am aware of no AT that is specifically oriented around only the heading.  Proper web browsing support adds something like 50 keystrokes: navigate by link, list, Aria landmark.  There are even more esoteric ones than that: how about "move out of this element", which basically combines a leave list and leave table keystroke?  I suggest we do not get caught in the trap of saying that AT is heading-based; it's the user, not the software, that that term may apply to.  This is a problem with sighted users trying to test: they don't know about the huge group of keystrokes they aren't using.
What  the heading actually implies depends on the user.  I am aware of what a heading looks like, i.e. that it tends to be indented, so it makes sense to me when people leave them out-that is, I understand that it is a stylistic choice without understanding why it looks good in this case.  The only way the traditional semantic meaning of the heading is going to continue to have meaning is if we force web authors to actually use these practices, not if we fix the algorithm.  I do not believe that adding spurious headings will fix anything: a heading implies a section, and if I have to pass two, three, or seven headings before the actual section content I will not be pleased.  The most common use I've seen for the next heading key is basically synonymous with next section.
My advice is leave it alone for now until it becomes widespread, and then fix it if it becomes a problem.  The algorithm is too complex, and the support for it too sparse to understand what affect it will have on my future.  There's also no real guarantee that AT will implement this exact algorithm anyway.
Comment 29 steve faulkner 2014-05-11 10:18:38 UTC
I am a little confused why, as per the html5 outline algorithm, it would be better to report the outline produced by 
http://validator.w3.org/nu/?showoutline=yes&doc=http%3A%2F%2Fwww.terminix.com%2F#outline which is full of <h8>'s and <h9>'s

as against the old outline:

<h1>(Missing heading)
<h2>INTRODUCING OUR KILLER NEW MOSQUITO SERVICE
<h2>ALERT: Termites are swarming now
<h2>When you see one cockroach, there could be hundreds
<h2>The Ultimate Protection ® Guarantee. Only from Terminix
<h2>(No heading text)
<h1>We don’t just eliminate termites and pests. We eliminate worries
<h2>Bed bug reports are on the rise
<h2>Protect your family from mosquitoes
<h2>Have you seen a pest?
<h2>Our Services
<h2>Why Terminix ®
<h2>Our services
<h2>Manage account
<h3>(Missing heading)
<h4>An ironclad guarantee to rid your home of termites and pests.
<h3>Your local Terminix branch
<h3>Your local Terminix branch
<h3>Local time
<h3>Top current pest threats in , Edit
<h3>Termites
<h3>Rodents
<h3>Spiders
<h3>Bed Bugs
<h3>Ants
<h3>Cockroaches
<h2>Termites
<h2>Rodents
<h2>Spiders
<h2>Bed Bugs
<h2>Ants
<h2>Cockroaches
<h3>Termite swarm map
<h3>Want more information?
<h2>Confirm your ZIP code
Comment 30 Léonie Watson 2014-05-11 11:44:30 UTC
Comment 24
"I have to say, I really don't think this is as confusing as is made out above. I mean, don't get me wrong, we should absolutely recommend against it in our advocacy of how to write good markup. But I actually ended up way more confused when I tried to read the VoiceOver help and it read me a filename in place of an image, for example, than I was when I was navigating a page with missing levels."

Yes, hearing a file name instead of an alt text is another problem screen reader users come up against. Headings are another one.

"Honestly it's no more confusing than when people use just <h1> and <h3> today, with no <h2>."

True, but the current situation is based on authoring practices; what we're talking about now is entrenching that behaviour into the specification. If authors sometimes struggle to get h1 through h6 into a logical hierarchy, what do you think the chances of them getting it right based on the outline algorithm might be?

"Yes, it's not ideal, but AT users aren't dumb. They can tell that for some reason the author just skipped a level."

We can tell when the author skipped some levels, that's announced by the screen reader. What we can't tell is why the levels were skipped. Was the h5 used instead of an h2 because it looked better, or because the content was genuinely a great-great-grandchild of the content prefaced by the h1 (and if so what happened to the intervening headings)?

This isn't just the opinion of one person, much as it's important to me personally. The 4th WebAIM screen reader survey asked screen reader users how helpful they found heading levels. 47% said "very helpful" and a further 34% said "somewhat useful". Only 16% found them either "not very useful" or "not at all useful".
http://webaim.org/projects/screenreadersurvey4/#levels

"Having said all that, I would definitely recommend that ATs seriously consider presenting the outline including implicit headers."

I'm not sure how a screen reader might convey an implicit header to a user?

"Also, I think that the "level 2" kind of language is a bit obscure — people don't generally talk about "level 2 headings" of books and so on. "Heading" and "Subheading", at least for level 1 and 2, would be way clearer, IMHO. And faster."

I think the terminology has been fairly standard since screen readers began supporting HTML. You're right that using "heading" and "sub-heading" would be simpler for h1 and h2, but it would come unravelled fairly swiftly after that. Not sure I have the patience to listen to "sub-sub-sub-sub-sub-heading" for every h5, or worse given that infinite heading levels would be possible :)
Comment 31 Ian 'Hixie' Hickson 2014-05-12 01:47:30 UTC
www.terminix.com is a very useful example, thanks. That page is definitely a mess, and I agree that reporting implicit headings in that page wouldn't be useful.
Comment 32 Ian 'Hixie' Hickson 2014-05-12 02:09:27 UTC
In fact that page has all kinds of crazy stuff.

For example:

   <h2 class="session-pop-up-h2-like-h1">

...or:

   <section class="footer">
     <span class="ico-phone modal-link">Prefer to call us directly?</span>
     <h2>866.399.0453</h2>
   </section>

...or (I've only rearranged the spacing here):

   <table class="choice new">
     <tr>
       <td>
         <label>
           I need a representative to call me as soon as possible.
         </label>
       </td>
     </tr>
     <tr>
       <td></td>
       <td></td>
     </tr>
   </table>

...or:

   <section class="header">
       How can we help?
   </section>

I don't think anything with the outline algorithm is going to save this page.

It might be worth ATs providing both an outline-driven page navigation and a heading-driven page navigation. Or, maybe ATs should apply heuristics similar to how they handle layout-table pages, or <div>itis pages that only use <div>, etc. For example, if a page has multiple <section> elements that contain only intra-element whitespace and no headings, but the page has <h2> elements, maybe <section> elements are worthless on that page and the outline algorithm should be run by treating <section>s as <div>s. But this isn't something we can really specify, since it's something ATs would have to experiment with to get the exact logic optimised for users (and maybe different users have different needs).
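As an illustration only, the kind of heuristic floated above could be sketched roughly as follows. This is a minimal Python sketch, not anything specified or implemented by an AT: the names `SectionAudit` and `sections_look_worthless` and the `threshold` parameter are hypothetical, and the condition is read loosely as "the page has several <section> elements with no heading of their own, yet it does use <h2>". Other sectioning elements are ignored for simplicity.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionAudit(HTMLParser):
    """Counts <section> elements that have no heading of their own."""

    def __init__(self):
        super().__init__()
        self.open_sections = []  # one bool per open <section>: heading seen yet?
        self.bare_sections = 0   # closed sections that never saw a heading
        self.has_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.open_sections.append(False)
        elif tag in HEADINGS:
            if tag == "h2":
                self.has_h2 = True
            if self.open_sections:
                # Credit the heading to the innermost open section only.
                self.open_sections[-1] = True

    def handle_endtag(self, tag):
        if tag == "section" and self.open_sections:
            if not self.open_sections.pop():
                self.bare_sections += 1


def sections_look_worthless(html, threshold=2):
    """Hypothetical heuristic: True if <section> should be treated like <div>."""
    audit = SectionAudit()
    audit.feed(html)
    audit.close()
    return audit.has_h2 and audit.bare_sections >= threshold
```

The threshold is an arbitrary choice; the exact logic would need the kind of experimentation with real users that the comment describes.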
Comment 33 steve faulkner 2014-05-12 08:35:44 UTC
We have dealt with (in the W3C HTML spec) a related issue of unnecessary use of <section> elements by a number of means:

(In response to SR users reporting overuse of sections which were announced to users due to the mapping of section to region.)

We have tightened up the author advice around sections:
"The theme of each section should be identified, typically by including a heading (h1-h6 element) as a child of the section element."
http://www.w3.org/html/wg/drafts/html/master/sections.html#the-section-element

also by adding advice for AT:

"Note:It is strongly recommended that user agents such as screen readers only convey the presence of, and provide navigation for section elements, when the section element has an accessible name."

http://www.w3.org/html/wg/drafts/html/master/dom.html#sec-implicit-aria-semantics

We don't expect the changes to the section author requirements to have any immediate effect, but the AT advice has already been taken on board by the vendor whose software was announcing unlabelled sections as regions, and the changes they made in response have resulted in an improved user experience.

PS: I have not provided the above to start a debate on the merits of the changes on this bug; if anyone wishes to give feedback on the changes, please file a bug against the W3C HTML spec.
https://www.w3.org/Bugs/Public/enter_bug.cgi?comment=&product=HTML%20WG&component=HTML5%20spec
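To make the AT advice quoted above concrete, here is a minimal Python sketch of the check it implies. It is a deliberate simplification and an assumption on several points: real accessible-name computation is more involved (for example, `aria-labelledby` has to be resolved against the referenced elements), and the function names `has_accessible_name` and `announce_as_region` are hypothetical, not taken from any screen reader.

```python
def has_accessible_name(attrs):
    """attrs: dict mapping attribute names to values for one element.

    Simplified: treats any non-empty aria-labelledby, aria-label or title
    attribute as providing a name, without resolving references.
    """
    for name in ("aria-labelledby", "aria-label", "title"):
        if (attrs.get(name) or "").strip():
            return True
    return False


def announce_as_region(tag, attrs):
    """Only convey a <section> to the user when it has an accessible name."""
    return tag == "section" and has_accessible_name(attrs)
```

Under this sketch, `<section class="footer">` would be skipped, while `<section aria-label="Sidebar">` would be announced.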
Comment 35 Ian 'Hixie' Hickson 2014-08-01 23:10:36 UTC
> http://www.tomshardware.com/

This page is an accessibility nightmare in general. I literally got lost while walking the DOM of this page at one point, four tables deep. This page would work ok in an "only show headings" mode, ignoring section depth, though, at least as far as section navigation goes.

> http://www.gilt.com/

This page's markup was actually not that unreasonable, as far as I can tell. It actually works _better_ if you don't ignore sections with no headings, because it uses untitled sections to group the content. That's exactly what untitled sections are good for.

> http://www.html5accessibility.com/HTML5data/section/index-1748.html

No idea what's going on on this page. Even ignoring the outline, and looking at the page itself, it seems a mess. Maybe it's missing some scripts.

> http://www.html5accessibility.com/HTML5data/section/index-1803.html

This page isn't perfect, but I don't think blank headings are particularly harming it.

> http://www.html5accessibility.com/HTML5data/section/index-1814.html

Again, the blank headings here seem like a minor problem.

> http://www.html5accessibility.com/HTML5data/section/index-1912.html

I assume this page is missing some scripts or style sheets or something. It's not clear that any heading navigation would be particularly helpful on this page. Certainly blank section headings aren't helping here.

> http://www.kelkoo.co.uk/c-138701-flowers-plants-supplies.html

This has some untitled sections, but they don't seem harmful. Really there's just one major one, the sidebar. Ironically it actually has a title in the markup (its ID). I would assume that if that site hires an accessibility consultant, they'd have them add some ARIA annotations to fix the problem (or just a heading that's hidden in the visual view, which would be enough).

> http://news.lycos.com/

This page seems fine, heading-wise.

> http://www.aftenposten.no/nyheter/uriks/

The top of this page is painfully full of untitled sections. A heading mode would probably be more useful.

> http://www.skyteam.com/en/Supporting-your-business/

The untitled sections are all just in the footer. Probably not a serious problem in practice.

> http://www.buzzsugar.com/
> http://www.popsugar.com/

This page is a disaster, but we can't save it by tweaking the outline algorithm: the page has no headings at all. (These two sites are the same.)

> http://www.bhg.com/decorating/

No serious problem here.

> http://www.bt.dk/politik/venstre-om-grandprix-jelved-total-mangel-paa-situations-fornemmelse

A few spurious sections, but nothing that serious. Most untitled sections are actually used for grouping.

> http://www.cristalab.com/cursos/

Not really a serious problem here; the untitled sections are real, and not in fact titled.

> http://www.ps4italia.com/data-ed-conferenza-sony-alle3-2014/

This page is a mess, but I think a heading-only navigation mode would be sufficient to navigate it.

> http://www.artfire.com/browse/vintage/ephemera

The untitled sections are real, they're just not titled.

> http://www.mojo-themes.com/categories/html-css/

There are a few bogus sections, but the outline is mostly ok.

> http://www.html5accessibility.com/HTML5data/section/index-7561.html
> http://www.rushlimbaugh.com/

Bunch of untitled sections at the top of this page, for reasons I can't determine, but a heading-only navigation mode would be sufficient.

> http://www.coasttocoastam.com/

The untitled sections here are actually real, and have no heading, so navigating by outline would actually work better than navigating by heading. But yeah, this page is a disaster.

> http://www.komonews.com/

I've no idea what this site is doing with <article>, but it's a mess.



I found this set of sites actually somewhat encouraging. There's a lot of inaccessible content as usual, but it's nowhere near as bad as this bug suggests.

Most of the sites that are problematic can be saved just by ATs offering two navigation modes: one to walk the outline tree, and one to walk the heading list. I think that would be more effective than attempting to figure out which sections should be skipped when navigating the tree, because in practice, a number of sites have untitled sections that are in fact real sections that it's useful to navigate to. I'll add some text to that effect.

There are a few sites in this list that are lost causes, even with those two modes. But I don't think there's anything we can do to the outline algorithm to help those (e.g. the site with no headings).
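The difference between the two navigation modes discussed above can be sketched in a few lines of Python. This is a simplified illustration, not the spec's outline algorithm: depth here comes only from the nesting of sectioning elements (heading rank inside a section is ignored), and the names `TwoViews` and `two_views` are hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SECTIONING = {"section", "article", "aside", "nav"}


class TwoViews(HTMLParser):
    def __init__(self):
        super().__init__()
        self.heading_list = []  # flat view: (rank from tag name, text)
        self.outline = []       # tree view: [depth, title or None]
        self._open = []         # indexes into self.outline for open sections
        self._rank = None       # rank of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            # The body is depth 1, so a top-level section is depth 2.
            self.outline.append([len(self._open) + 2, None])
            self._open.append(len(self.outline) - 1)
        elif tag in HEADINGS:
            self._rank = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._rank is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in SECTIONING and self._open:
            self._open.pop()
        elif tag in HEADINGS and self._rank is not None:
            text = "".join(self._text).strip()
            self.heading_list.append((self._rank, text))
            if self._open and self.outline[self._open[-1]][1] is None:
                # First heading in a section becomes that section's title.
                self.outline[self._open[-1]][1] = text
            else:
                self.outline.append([len(self._open) + 1, text])
            self._rank = None


def two_views(html):
    """Return (heading list, outline) for the given markup."""
    parser = TwoViews()
    parser.feed(html)
    parser.close()
    outline = [(d, t or "(untitled section)") for d, t in parser.outline]
    return parser.heading_list, outline
```

Fed the markup from this bug's description, the heading list contains two rank-1 headings, while the outline shows the second heading at depth 4 beneath two untitled sections. Offering both views leaves the choice to the user rather than to the algorithm.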
Comment 36 contributor 2014-08-01 23:15:09 UTC
Checked in as WHATWG revision r8698.
Check-in comment: Add a section encouraging user agents (especially ATs) to expose the outline and headings
http://html5.org/tools/web-apps-tracker?from=8697&to=8698
Comment 37 Ian 'Hixie' Hickson 2014-09-08 23:53:53 UTC
Please reopen if I missed something (see in particular comment 35 and comment 36).
Thanks for filing this bug; hopefully the added section will help AT implementers and users to get the best of both worlds (the old-style heading navigation, and the new-style outline navigation, where available).