This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24100 - Bug in the HTML outline algorithm
Summary: Bug in the HTML outline algorithm
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
Duplicates: 24107
Reported: 2013-12-15 07:31 UTC by Michael[tm] Smith
Modified: 2014-02-21 21:09 UTC

Description Michael[tm] Smith 2013-12-15 07:31:04 UTC
+++ This bug was initially created as a clone of Bug #24097 +++

Quoting Marc Hoyois's description from bug 24097:

[[
The determination of the *current section* when exiting a sectioning root is wrong and can lead to several weird behaviors, including an actual error. Here are a couple of examples.

## Example 1 (error)

<body>
<h1>A</h1>
<section></section>
<figure></figure>
<h2>B</h2>
</body>

After exiting the sectioning root <figure>, the algorithm sets the current section to be the *deepest section* in the current outline, which is the section corresponding to the <section> element. Then, when entering <h2>, it will compare the rank of <h2> with the rank of the implied heading of that section, which is undefined.

## Example 2 (no error but nonsensical outline)

<body>
<h1>A</h1>
<section><h1>B</h1></section>
<figure></figure>
<h2>C</h2>
</body>

In this case the algorithm produces the outline

1. A
   1.1. B
      1.1.1. C

If we remove the <figure> element, we get the correct outline:

1. A
   1.1. B
   1.2. C

## Solution

The problem is this: when exiting a sectioning root, the current section should be set to whichever section was current upon entering the root, but this is not always the deepest section. The algorithm could ask that the correct section be remembered, or else that section can be determined as follows (when exiting the sectioning root):

- let *current section* be the last section of the current outline
- if the last child section of *current section* exists and is an *implicit* section, then go to the step *finding the deepest child*, otherwise do nothing
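
As a rough illustration of the two bullet points above, here is a minimal Python sketch that compares the old "deepest section" rule with the proposed determination. The `Section` class, its `implicit` flag (true for a section created by a heading, false for one created by a sectioning element), and both function names are assumptions made for this sketch; the section tree is built by hand rather than by running the outline algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    implicit: bool  # assumed flag: True if created by a heading, not by an element
    children: list = field(default_factory=list)


def deepest_child(section):
    # The "finding the deepest child" step: follow last children to the bottom.
    while section.children:
        section = section.children[-1]
    return section


def current_after_root_old(outline):
    # Old behavior: always take the deepest section of the last section.
    return deepest_child(outline[-1])


def current_after_root_proposed(outline):
    # Proposed behavior: only descend if the last child section is implicit.
    current = outline[-1]
    if current.children and current.children[-1].implicit:
        current = deepest_child(current)
    return current


# Section tree of Example 2 at the point where <figure> is exited:
# A is the section of <body>, B is the section of the <section> element.
b = Section("B", implicit=False)
a = Section("A", implicit=False, children=[b])

print(current_after_root_old([a]).title)       # B
print(current_after_root_proposed([a]).title)  # A
```

With the old rule the following <h2> is compared against B and ends up nested under it; with the proposed rule it is compared against A and becomes a sibling of B.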

## Another bug?

There is a related point which I'm not sure is intended. Consider the document:

<body>
<figure></figure>
<h1>Title</h1>
</body>

The algorithm computes the outline:

1. Untitled document
2. Title

If sectioning roots are supposed to be "invisible" in the outline, then the outline should simply be

1. Title

If the latter is indeed the intended behavior, then the algorithm should not create an implied heading when entering a sectioning root.
]]
Comment 1 Michael[tm] Smith 2013-12-16 07:57:51 UTC
*** Bug 24107 has been marked as a duplicate of this bug. ***
Comment 2 Ian 'Hixie' Hickson 2013-12-16 22:11:06 UTC
For the record, the reason this logic exists at all (distinguishing sectioning roots from sectioning content) is so that this:

   <h1>...</h1>
   <h2>...</h2>
   <figure></figure>
   <h3>...</h3>

...results in:

    h1 section
      h2 section
        (figure)
        h3 section

...while this:

   <h1>...</h1>
   <h2>...</h2>
   <section></section>
   <h3>...</h3>

...results in:

    h1 section
      h2 section
      anon section
      h3 section

...in the outline.
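
The difference between the two cases can be reproduced with a deliberately simplified Python sketch. It is not the spec algorithm: it only handles a flat list of headings and empty <figure>/<section> elements, and the names `Section`, `build_outline`, and `shape` are hypothetical. The one rule it is meant to show is that an empty sectioning root leaves the current section as it was on entry, while sectioning content resets it to the last top-level section of the outline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    label: str = "anon"
    rank: Optional[int] = None  # 1 for h1 (highest) down to 6 for h6
    children: list = field(default_factory=list)
    parent: Optional["Section"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)


def build_outline(nodes):
    root = Section()  # section of the enclosing root; its heading is set later
    top = [root]      # top-level sections of the outline
    current = root
    for node in nodes:
        if node == "figure":
            # Sectioning root (empty here): its outline is separate, and the
            # current section on exit is whatever it was on entry.
            continue
        if node == "section":
            # Sectioning content (empty here): the current section becomes the
            # last top-level section, and the element's section nests under it.
            current = top[-1]
            current.add(Section())
            continue
        rank = int(node[1])  # "h2" -> 2
        if current.rank is None:
            current.label, current.rank = node, rank
        elif rank <= top[-1].rank:
            current = Section(node, rank)
            top.append(current)
        else:
            # Walk up until an ancestor with a strictly higher-ranked heading.
            candidate = current
            while rank <= candidate.rank:
                candidate = candidate.parent
            current = Section(node, rank)
            candidate.add(current)
    return top


def shape(sections):
    # Nested (label, children) tuples, convenient for printing and comparing.
    return [(s.label, shape(s.children)) for s in sections]


print(shape(build_outline(["h1", "h2", "figure", "h3"])))
# [('h1', [('h2', [('h3', [])])])]
print(shape(build_outline(["h1", "h2", "section", "h3"])))
# [('h1', [('h2', []), ('anon', []), ('h3', [])])]
```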
Comment 3 Ian 'Hixie' Hickson 2013-12-16 22:17:31 UTC
Split off the second part to bug 24118.
Comment 4 contributor 2013-12-16 22:30:41 UTC
Checked in as WHATWG revision r8357.
Check-in comment: Make the outline algorithm easier to edit by making it all explicit steps and breaking out the (currently still identical) steps for entering sectioning content vs sectioning roots.
http://html5.org/tools/web-apps-tracker?from=8356&to=8357
Comment 5 Ian 'Hixie' Hickson 2013-12-16 22:44:26 UTC
Actually never mind about that splitting off, I fixed bug 24118 at the same time as this one anyway. Heh. Thanks Marc! Please do reopen this bug if it's not properly fixed (or bug 24118 if that part of it isn't fixed). Thanks!
Comment 6 contributor 2013-12-16 22:44:28 UTC
Checked in as WHATWG revision r8358.
Check-in comment: Make the outline algorithm handle sectioning roots more sensibly
http://html5.org/tools/web-apps-tracker?from=8357&to=8358
Comment 7 Marc Hoyois 2013-12-17 08:33:42 UTC
Everything looks good, except that you removed the penultimate step "Associate current outline target with current section" when entering a sectioning root. Without it a sectioning root ends up being associated with its *parent section* (which is null for <body>).
Comment 8 Ian 'Hixie' Hickson 2014-01-03 22:34:14 UTC
You want it to be associated with its parent section, otherwise it disappears from the section it was a part of, which makes no sense (consider a <blockquote>; it's not a subsection, it's just a part of the section that happens to have its own outline). In the case of a root <body>, it gets associated with its own section because "current section" is set when you enter the <body> and is never unset.

No? Maybe I'm missing something.
Comment 9 Marc Hoyois 2014-01-04 00:32:52 UTC
You're right about <body>.

I see your point, but you could also argue the other way. If <body> is to be associated with the top section in the outline it creates, then you might expect the same for other sectioning roots. I guess it depends on what exactly is the purpose of these associations; the spec doesn't say.
Comment 10 Ian 'Hixie' Hickson 2014-01-04 22:01:50 UTC
The purpose is up to the implementation, but for example: if you had the element, which entry in the table of contents should you highlight? If you associate the <blockquote> with the sections of its internal outline only, there's no link from that outline to the parent outline. It's like you've orphaned the element entirely. There'd be no way to know what section the element was in:

   <h1>Aaa</h1>
   <h2>Bbb</h2>
   <blockquote>...</blockquote>
   <h2>Ccc</h2>

What section is the blockquote in?
Comment 11 Marc Hoyois 2014-01-04 22:35:01 UTC
> What section is the blockquote in?

I'd say the answer depends on which outline you're looking at: it's in section Bbb of the main outline and it's also in the first section of another outline. But if each node must be associated with a single section of a single outline, that section should not depend on which root you run the algorithm from, as it currently does. The outline of the <blockquote> element will not be linked to the main outline either way, and the problem of figuring out which section to highlight also applies to any child of <blockquote>, so changing the section associated with <blockquote> does not solve it.
Comment 12 Ian 'Hixie' Hickson 2014-01-09 19:04:41 UTC
The children of the blockquote belong to the outline of the blockquote, and the blockquote itself belongs to the outline of the document. That way you can walk your way up the chain. If we associate the blockquote with the inner outline's section, then the chain is broken.

I suppose we could have a 1:many association model, but I'm not really sure what that would mean, exactly.
Comment 13 Marc Hoyois 2014-01-09 20:46:04 UTC
> That way you can walk your way up the chain.

I don't see how. You can't possibly link the outlines with a one-to-one association of nodes with sections.

With either model a user agent that wants to figure out in which section of the main outline a given node is cannot do it using only the trees of sections and the node→section mapping. Given this, it seems more consistent to associate sectioning roots with a section in their own outline, since that's how it's done for the top root and for sectioning content elements.
Comment 14 Ian 'Hixie' Hickson 2014-01-14 00:13:04 UTC
You walk the outline chain by going element -> section, section -> outline, outline -> element, loop.
Comment 15 Marc Hoyois 2014-01-14 01:42:07 UTC
The new algorithm provides no way of getting the element from the outline.

Anyway, in the hope of moving things forward, let me suggest a couple of solutions. I'm assuming the goal is to determine which section to highlight in a table of contents given an element.

Solution 1: the algorithm simply treats all sectioning roots (except the top one) and their descendants as generic nodes. That way the algorithm produces only one outline, that of the root given as input, and all nodes are associated with a section in that outline. This seems like a clean and practical solution.

Solution 2: leave the algorithm as is, but restore the "associate node with section" step as I proposed in comment #7. Read literally, each sectioning root now has an "associated section" (same as in the pre-December algorithm) as well as a "parent section". Using this data you can walk up the chain.
Comment 16 Ian 'Hixie' Hickson 2014-01-14 19:16:37 UTC
The outline is the outline of the element. I don't understand what you mean.

I don't understand the problem that the solutions are attempting to solve. As far as I can tell, the issue in comment 7 isn't an issue. I thought what we were discussing is why it _is_ an issue. :-)
Comment 17 Marc Hoyois 2014-01-19 17:47:00 UTC
Say you're given some descendant of that <blockquote> element, and you want to highlight the section in the outline of <body> where the node belongs. The problem is that the output of the algorithm does not contain the necessary information to figure out which section that is. What you proposed in comment #14 assumes that outlines are some sort of object having the element as a property, but the algorithm does not define outlines as such: they're just trees of sections.
Comment 18 Ian 'Hixie' Hickson 2014-01-21 21:39:11 UTC
An outline is "for a sectioning content element or a sectioning root element" (quoting from the definition of "outline" at http://whatwg.org/html#outline ).

So if you get to an outline, you can go to the element for which it was created.
Comment 19 Marc Hoyois 2014-02-03 06:21:57 UTC
I don't know what to say except to repeat the last sentence of comment #17. I doubt that any implementor would understand the sentence "an outline is for a sectioning element" as "an outline must point to a sectioning element". (I'm terribly busy at the moment, so I may not be very responsive.)
Comment 20 Ian 'Hixie' Hickson 2014-02-07 18:17:05 UTC
I don't know what "must point" would mean; we can't very well give conformance criteria for the shapes of internal data structures.

But I think it's eminently reasonable to assume that if X is for Y, a property of X is that it is for Y, and a property of Y is that X is for it. The outline doesn't exist in isolation, it exists only in the context of the element for which it was created. Would the spec be more acceptable if I simply added the sentence "The element for which the outline was created is said to be the outline's owner"?
Comment 21 Ian 'Hixie' Hickson 2014-02-21 21:09:03 UTC
I've added the sentence I suggested in comment 20. Reopen the bug if it's not enough or if I am still missing something here.
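
For illustration, the walk from comment 14 (element -> section, section -> outline, outline -> element, loop) could be sketched in Python as below, assuming each outline records its owner. The classes `Element`, `Outline`, and `Section`, the `section_chain` function, and the <p> inside the <blockquote> are all hypothetical; the spec does not prescribe any data structure, so this is only one possible reading.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Element:
    name: str
    section: Optional["Section"] = None  # section the element is associated with


@dataclass(eq=False)
class Outline:
    owner: Element  # element for which the outline was created
    sections: list = field(default_factory=list)


@dataclass(eq=False)
class Section:
    title: str
    outline: Outline  # outline this section belongs to


def section_chain(element):
    """Walk element -> section -> outline -> owner until the root is reached."""
    chain = []
    while element.section is not None:
        section = element.section
        owner = section.outline.owner
        if owner is element:
            # A root such as <body> is associated with its own section.
            break
        chain.append(section.title)
        element = owner
    return chain


# Document from comment 10, plus a hypothetical <p> inside the <blockquote>.
body = Element("body")
doc = Outline(owner=body)
aaa, bbb, ccc = (Section(title, doc) for title in ("Aaa", "Bbb", "Ccc"))
doc.sections.append(aaa)
body.section = aaa

quote = Element("blockquote", section=bbb)  # associated with its parent section
inner = Outline(owner=quote)
quoted = Section("(quoted text)", inner)
inner.sections.append(quoted)
p = Element("p", section=quoted)

print(section_chain(p))  # ['(quoted text)', 'Bbb']
```

In this model the walk only works because the <blockquote> is associated with a section of the outer outline; if it were associated with a section of its own outline, the loop would stop at the <blockquote> without ever reaching Bbb.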
Comment 22 contributor 2014-02-21 21:09:45 UTC
Checked in as WHATWG revision r8499.
Check-in comment: Try to clarify that outlines are owned by elements.
http://html5.org/tools/web-apps-tracker?from=8498&to=8499