27314 – Parser: "adjusted current node" not clear enough

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 27314 - Parser: "adjusted current node" not clear enough

Summary: Parser: "adjusted current node" not clear enough

Status:	RESOLVED MOVED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML (show other bugs)
Version:	unspecified
Hardware:	Other All

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	https://html.spec.whatwg.org/#adjuste...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-11-12 13:31 UTC by contributor
Modified:	2019-03-29 19:43 UTC (History)
CC List:	8 users (show)

See Also:

Attachments

Description contributor 2014-11-12 13:31:14 UTC

Specification: https://html.spec.whatwg.org/multipage/syntax.html
Multipage: https://html.spec.whatwg.org/multipage/#adjusted-current-node
Complete: https://html.spec.whatwg.org/#adjusted-current-node
Referrer: https://html.spec.whatwg.org/multipage/

Comment:
"adjusted current node" not clear enough.
<http://www.w3.org/mid/546285C8.1050701@mit.edu>

Posted from: 94.234.170.93 by simonp@opera.com
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.52 Safari/537.36 OPR/26.0.1656.17 (Edition beta)

Comment 1 Ian 'Hixie' Hickson 2014-11-12 19:14:23 UTC

How would you like this editorial issue addressed?

Comment 2 Boris Zbarsky 2014-11-13 16:39:19 UTC

I think it would make more sense to me we introduce a concept like "override current node" (name tbd) and define adjusted current node like so:

  The adjusted current node is override current node if the stack of open
  elements has only one element in it and the override current node is set;
  otherwise, the adjusted current node is the current node

And then the fragment parsing algorithm would set the override current node.  This would encapsulate all the information needed to determine the current node in the parser bits, with a concrete API entry point the fragment parsing algorithm uses.

Comment 3 Ian 'Hixie' Hickson 2014-11-13 23:52:14 UTC

I don't understand how this differs from what we have now.

Comment 4 Boris Zbarsky 2014-11-14 00:03:54 UTC

It differs by not referring to non-local "context object" state from some other algorithm.  Instead, all the state the parser needs is right on the parser.

Comment 5 Ian 'Hixie' Hickson 2014-11-19 21:59:12 UTC

How is "override current node" any more local than "context object"?

Comment 6 Boris Zbarsky 2014-11-19 22:15:45 UTC

The former is a property of the parser, but the latter is a property of some algorithm that later calls into the parser.

Comment 7 Boris Zbarsky 2014-11-19 22:16:29 UTC

I mean, if you have better ideas, that would be good too.  I just have a data point: at least 3 people including myself read this spec bit and were totally confused about the behavior.  I _think_ with my phrasing they would not have been confused.

Comment 8 Ian 'Hixie' Hickson 2014-11-20 00:24:59 UTC

I guess I don't understand why "context element" is not a property of the parser. It's even hyperlinked to its definition right there at the top of the "HTML fragment parsing algorithm". If we introduce yet another term, which itself would just be defined in terms of the "context element", that just seems like a net increase in complexity. I mean, your definition is isomorphic with the current one, except with one layer of indirection added, no?

I guess I don't understand what's confusing about the current text. The link in comment 0 doesn't really explain what the problem is.

Comment 9 Boris Zbarsky 2014-11-20 02:14:28 UTC

I don't know why people can't get that out of the spec text until it's explicitly pointed out to them.  We just have empirical evidence they can't.

I should also note that afaict the behavior is underdefined right now when there is no context element, unless I'm missing something...  my proposed phrasing would in fact fix that.

Comment 10 Ian 'Hixie' Hickson 2014-11-20 17:59:58 UTC

My problem is that if I don't know why people don't understand the spec, I can't fix it. I have no way to evaluate whether your proposed fix would improve clarity. My judgement is that it would be net worse since it has more indirection. I could be wrong. I don't know. That's the point.

Looks like the context node being optional is no longer needed, so I've dropped all that. I've also reordered some sentences and added some notes to make some things hopefully a little clearer. But again, without understanding the source of the confusion, I can't really fix this.

Comment 11 contributor 2014-11-20 18:08:39 UTC

Checked in as WHATWG revision r8856.
Check-in comment: Try to clarify stuff around fragment parsing (this removes the ability to call the fragment parsing algorithm without a context node; I couldn't find anyone doing that, but if I missed a case please let me know)
https://html5.org/tools/web-apps-tracker?from=8855&to=8856

Comment 12 Boris Zbarsky 2014-11-20 18:39:23 UTC

I think from my point of view there is less indirection in my proposal.  The parser just directly uses the state of the parser instead of using the ambient state of being of some other algorithm that invoked the parser...

Comment 13 Ian 'Hixie' Hickson 2014-11-21 22:03:40 UTC

The HTML fragment parsing algorithm is part of the parser. (Technically it's actually the only defined entry-point into the parser... the other uses of the parser are much more handwavy.)

Comment 14 Boris Zbarsky 2014-11-22 02:18:01 UTC

Maybe I should have been more precise.  The relevant state is consumed by the tree builder, or whatever it is that creates actual element instances.  That's where it should be stored, imo.

Comment 15 Ian 'Hixie' Hickson 2014-11-24 18:46:10 UTC

Oh it's definitely a mistake to consider the tree building a separate component from the rest of the parser. The tree constructor is intricately intertwined with lost of other parts of the platform, including e.g. the tokeniser, the initial parser setup, and so on. I really don't think that adding a new term just in the tree construction section, which is merely defined as being something outside the tree construction section, would help simplify anything.

Comment 16 Boris Zbarsky 2014-11-24 19:20:38 UTC

> Oh it's definitely a mistake to consider the tree building a separate component
> from the rest of the parser.

In actual implementations, the tree building and tokenization don't even run on the same thread.  So they're somewhat separate.  Tightly coupled, to be sure, but separate.

Comment 17 Ian 'Hixie' Hickson 2014-11-25 07:13:20 UTC

I disagree that running on a second thread implies any sort of design or conceptual separation. But now we're getting to rather philosophical points.

How's the current text in the spec? Do you still find it confusing? Can you elaborate on why you find it confusing, if so? What interpretations of the current spec do you initially have?

Comment 18 Boris Zbarsky 2014-11-25 13:58:48 UTC

It's really hard to evaluate that now that I know what the answer is.

I suggest finding people who haven't looked at this spec before and asking them a concrete question, like so:

Given this markup:

   <svg><g></g></svg>

and this script:

   document.querySelector("g").innerHTML = "<a></a>";

what is the namespace of the resulting <a> element?  This is the question that I saw multiple people answer incorrectly due to the way this spec section is written...

Comment 19 Boris Zbarsky 2014-11-25 14:00:07 UTC

> what is the namespace of the resulting <a> element? 

"And why?" of course.

Comment 20 Ian 'Hixie' Hickson 2014-11-26 19:59:55 UTC

In bug 26783 comment 8 Henri suggests that we just go through and change all the uses of "html" as a sentinel node to just check the size of the stack, so then we can just put the context node on the stack and solve this entire problem with much less prose.

Comment 21 Henri Sivonen 2014-11-27 14:06:25 UTC

(In reply to Ian 'Hixie' Hickson from comment #20)
> In bug 26783 comment 8 Henri suggests that we just go through and change all
> the uses of "html" as a sentinel node to just check the size of the stack,
> so then we can just put the context node on the stack and solve this entire
> problem with much less prose.

I approve of making the spec match the Validator.nu/Gecko code more closely, since I think the result would be conceptually clearer. ;-)

(In reply to Boris Zbarsky from comment #7)
> I mean, if you have better ideas, that would be good too.  I just have a
> data point: at least 3 people including myself read this spec bit and were
> totally confused about the behavior.  I _think_ with my phrasing they would
> not have been confused.

In the Servo context? Or has there been a need to others than me (and wchen while reviewing) to look at this in the Gecko context?

Comment 22 Boris Zbarsky 2014-11-27 14:38:55 UTC

This was in Gecko, when triaging bug reports about innerHTML inside SVG stuff.

Comment 23 Anthony Ramine 2015-09-03 08:45:27 UTC

This vagueness also make parsing template fragments wrong. https://html.spec.whatwg.org/multipage/syntax.html#appropriate-place-for-inserting-a-node ends up returning the only element in the stack of opened elements.

See web-platform-test:
/html/semantics/scripting-1/the-template-element/definitions/template-contents.html
"The template contents must be a DocumentFragment (nested template containing a text node)"

According to spec, #t2 should be appended to the outer template itself, and not its template contents. That sounds like a bug.

Comment 24 Anne 2015-09-04 07:38:52 UTC

I filed https://github.com/w3c/DOM-Parsing/issues/1 on the <template> issue.

Comment 25 Domenic Denicola 2019-03-29 19:43:08 UTC

https://github.com/whatwg/html/issues/4467