This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26074 - HTML parsing algorithm doesn't allow the rt element under the rtc element.
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Robin Berjon
QA Contact: HTML WG Bugzilla archive list
Reported: 2014-06-12 10:49 UTC by Yuki Sekiguchi
Modified: 2014-06-12 13:33 UTC

Description Yuki Sekiguchi 2014-06-12 10:49:06 UTC
The HTML5 spec and its editor's draft say:
http://www.w3.org/TR/html5/syntax.html#parsing-main-inbody
> A start tag whose tag name is "rt"
> If the stack of open elements has a ruby element in scope, then generate implied end tags, except for rtc elements. If the current node is not then a ruby element, this is a parse error.
> Insert an HTML element for the token.

IIUC, the HTML parser does not allow the rt element under the rtc element.
However, the description of the rt element allows this:
http://www.w3.org/TR/html5/text-level-semantics.html#the-rt-element

> Contexts in which this element can be used:
> As a child of a ruby or of an rtc element.

I think this is a bug in the parser algorithm.
Comment 1 Robin Berjon 2014-06-12 11:12:45 UTC
(In reply to Yuki Sekiguchi from comment #0)
> HTML 5 spec and its editor's draft says
> http://www.w3.org/TR/html5/syntax.html#parsing-main-inbody
> > A start tag whose tag name is "rt"
> > If the stack of open elements has a ruby element in scope, then generate implied end tags, except for rtc elements. If the current node is not then a ruby element, this is a parse error.
> > Insert an HTML element for the token.
> 
> IIUC, HTML parser doesn't allow the rt element under the rtc element.
> However, the description of rt element allows this.

Those are not references to the editor's draft; they are references to a published snapshot. The editor's draft for 5.0 is at:

    http://www.w3.org/html/wg/drafts/html/CR/Overview.html

Furthermore, what you are pointing at is not a bug in the parsing algorithm. It says to generate implied end tags *except* for rtc elements. This leaves containing rtc elements open.
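
To illustrate the "except for rtc elements" step, here is a minimal Python sketch. It is not taken from any real parser: the stack is modelled as a plain list of tag names, `generate_implied_end_tags` and its `exclude` parameter are hypothetical names, and the tag list reflects my reading of the W3C HTML5 definition of implied end tags.

```python
# Tags whose end tags may be implied (assumed from the W3C HTML5 definition).
IMPLIED_END_TAGS = frozenset(
    ["dd", "dt", "li", "option", "optgroup", "p", "rb", "rp", "rt", "rtc"]
)


def generate_implied_end_tags(open_elements, exclude=()):
    """Pop the current node while it is an implied-end-tag element
    that is not listed in `exclude`. Mutates and returns the stack."""
    while (
        open_elements
        and open_elements[-1] in IMPLIED_END_TAGS
        and open_elements[-1] not in exclude
    ):
        open_elements.pop()
    return open_elements


# With rtc excluded, an open rt is closed but the containing rtc stays open.
print(generate_implied_end_tags(["html", "body", "ruby", "rtc", "rt"], exclude=("rtc",)))
# -> ['html', 'body', 'ruby', 'rtc']

# Without the exclusion, the rtc is closed as well.
print(generate_implied_end_tags(["html", "body", "ruby", "rtc", "rt"]))
# -> ['html', 'body', 'ruby']
```

The first call shows why the current node can legitimately be an rtc element once this step has run.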

Here is an example of a parser change that matches the ruby behaviour:

    https://github.com/html5lib/html5lib-python/pull/126/files
Comment 2 Yuki Sekiguchi 2014-06-12 11:44:56 UTC
Thank you for the quick reply.

The spec says:
> If the current node is not then a ruby element, this is a parse error.

This phrase is the same as in the algorithms for "rb", "rp" and "rtc".

I think this phrase disallows an rb, rp or rtc element under any element other than a ruby element.
If this is correct, the phrase for the rt element should be "If the current node is not then a ruby element or an rtc element, this is a parse error."
Comment 3 Koji Ishii 2014-06-12 12:45:20 UTC
Robin, I think Yuki is correct.

What you replied was about the first sentence of the spec:
> A start tag whose tag name is "rt"
> ----------------------------------
> If the stack of open elements has a ruby element in scope,
> then generate implied end tags, except for rtc elements.

The problem is in the next sentence:
> If the current node is not then a ruby element, this is a parse error.

Since rt does not close rtc, the parser must allow rtc as the current node in the second sentence.
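
As a rough sketch of the difference, assuming a stack modelled as a list of tag names and a simplified "in scope" test (plain membership, not the real scope algorithm), the two readings can be compared like this. The function `start_tag_rt` and its `allow_rtc_parent` flag are hypothetical names used only for illustration, not spec or WebKit code.

```python
# Tags whose end tags may be implied (assumed from the W3C HTML5 definition).
IMPLIED_END_TAGS = frozenset(
    ["dd", "dt", "li", "option", "optgroup", "p", "rb", "rp", "rt", "rtc"]
)


def start_tag_rt(open_elements, allow_rtc_parent):
    """Handle an "rt" start tag; return the list of parse errors raised.

    allow_rtc_parent=False models the second sentence as currently written;
    allow_rtc_parent=True models the check suggested in this bug.
    """
    errors = []
    # Simplification: membership stands in for "has a ruby element in scope".
    if "ruby" in open_elements:
        # Generate implied end tags, except for rtc elements.
        while open_elements[-1] in IMPLIED_END_TAGS and open_elements[-1] != "rtc":
            open_elements.pop()
        allowed = ("ruby", "rtc") if allow_rtc_parent else ("ruby",)
        if open_elements[-1] not in allowed:
            errors.append("unexpected current node: " + open_elements[-1])
    # Insert an HTML element for the token.
    open_elements.append("rt")
    return errors


# <ruby><rtc><rt>: the current text reports an error, the suggested text does not.
print(start_tag_rt(["html", "body", "ruby", "rtc"], allow_rtc_parent=False))
print(start_tag_rt(["html", "body", "ruby", "rtc"], allow_rtc_parent=True))
```

In both cases the rt element ends up as a child of rtc; only the reported parse error differs.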

The current WebKit implementation allows both ruby and rtc, as in line 891 of this patch:
http://trac.webkit.org/changeset/167437/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp
I did this because I was primarily reading "4.5 Text-level semantics" and "8.1.2.4 Optional tags", without reading "8.2.5 Tree construction" very well.

That indicates that 4.5 and 8.2.5 contradict each other, which will result in different implementations depending on which section a developer reads.

It looks to me that, in this case, 4.5 is correct.
Comment 4 Robin Berjon 2014-06-12 13:33:07 UTC
(In reply to Yuki Sekiguchi from comment #2)
> The spec says:
> > If the current node is not then a ruby element, this is a parse error.
> 
> This phrase is same as algorithm for "rb", "rp" and "rtc".
> 
> I think this phrase denies rb, rp or rtc element under any elements except
> ruby element.

Ah! Sorry, I misunderstood your bug as referring to the "If the stack of open elements has a ruby element in scope" part and couldn't see what was wrong.

Indeed you are correct that the parse error is spurious. This is now fixed in the CR draft (updated online every 10 minutes).

Good catch, thanks!