Bug 22026: For <pre>, <listing>, and <textarea>, the "next token" is not well-defined. For example, does a NULL character token count, if it is ignored by tree construction?

Opened: 2013-05-14 04:23:47 +0000
Last modified: 2013-06-17 22:18:12 +0000
Classification: Unclassified
Product: WHATWG
Component: HTML
Version: unspecified
Platform: Other
OS: other
Status: RESOLVED FIXED
URL: http://www.whatwg.org/specs/web-apps/current-work/#the-after-head-insertion-mode
Priority: P3
Severity: normal
Target milestone: Unsorted
People: contributor, ian, jukka.k.korpela, mike, mikeday, zcorpan

Comment 0 (ID 87626), contributor, 2013-05-14 04:23:47 +0000:
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html
Multipage: http://www.whatwg.org/C#the-after-head-insertion-mode
Complete: http://www.whatwg.org/c#the-after-head-insertion-mode
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/

Comment:
For <pre>, <listing>, and <textarea>, the "next token" is not well-defined. For example, does a NULL character token count, if it is ignored by tree construction?

Posted from: 110.142.158.46
User agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0

Comment 1 (ID 87629), mikeday, 2013-05-14 04:28:39 +0000:
*** Bug 22027 has been marked as a duplicate of this bug. ***

Comment 2 (ID 87634), jukka.k.korpela, 2013-05-14 07:36:53 +0000:
This raises the question which characters are allowed. Is it specified somehow?

It seems that indirectly it is specified for the XHTML syntax, since it must follow XML 1.0 rules, and they define the allowed characters. In particular, U+0000 NULL is not allowed. NULL is not allowed in HTML 4.01 either.

I think browsers usually ignore NULL, but validators may not, and this has caused some confusion, especially since NULL usually appears due to some feature in some software rather than an author’s informed action.

If rules are set for character repertoire, they could also specify some general processing rules, e.g. requiring that some characters, though forbidden, must be ignored by user agents when in HTML mode.
(In XHTML mode, XML 1.0 rules imply that e.g. NULL is a well-formedness error, with Draconian implications.)

Comment 3 (ID 87636), zcorpan, 2013-05-14 08:01:41 +0000:
(In reply to comment #2)
> This raises the question which characters are allowed. Is it specified
> somehow?

Yes. See "parse error" in e.g.
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody

But this is a bit off-topic for this bug.

Comment 4 (ID 87703), mikeday, 2013-05-15 05:17:04 +0000:
There are two test cases:

<pre>NULL
next line

and:

<textarea>NULL
next line

where "NULL" is a literal NULL character (U+0000) expressed in the appropriate character encoding.

For <pre> the NULL will be tokenized in the data state, and passed up to tree construction as a character token, but then ignored by the "in body" insertion mode. Since the token is generated, but ignored, does it count as the "next token" or not? The browsers seem to think not, and they still strip the following newline. So the spec could be clarified to define "next token" in a way that reflects this.

For <textarea> the NULL will be tokenized in the rcdata state, which generates a character token containing the replacement character (U+FFFD) instead. This is clearly the "next token", so the following newline should *not* be stripped. Chrome acts as expected, but Firefox strips it anyway. This appears to be a bug in Firefox.

Given the lack of a definition for "next token", there may be other inconsistencies and ambiguous cases that we have not noticed yet.

Comment 5 (ID 88963), ian, 2013-06-08 00:12:53 +0000:
The U+0000 token is the "next token" in these cases.

Comment 6 (ID 88964), contributor, 2013-06-08 00:14:01 +0000:
Checked in as WHATWG revision r7949.
Check-in comment: Clarify 'next token' in the HTML parser.
http://html5.org/tools/web-apps-tracker?from=7948&to=7949

Comment 7 (ID 89048), mikeday, 2013-06-11 02:33:54 +0000:
So Firefox and Chrome are incorrect then, given that they both strip the newline even though it follows an (ignored) NUL character?

Comment 8 (ID 89421), ian, 2013-06-17 22:18:12 +0000:
Yup. File bugs. :-)
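
Illustration (not part of the original thread): a minimal sketch of the reading settled on in comments 5 and 8, assuming the tokenizer output can be modelled as a flat list of (kind, value) tuples. The tuple shape, the set name NEWLINE_STRIPPING_TAGS, and the function strip_leading_newline are hypothetical and are not taken from the spec text or from any browser. The point it shows is that the newline check looks at whatever token comes immediately after the start tag, so a U+0000 character token (or the U+FFFD that the rcdata state emits for <textarea>) uses up the check even if tree construction later ignores it.

```python
# Hypothetical token model: ("start", tag_name) or ("char", character).
NEWLINE_STRIPPING_TAGS = {"pre", "listing", "textarea"}


def strip_leading_newline(tokens):
    """Drop a LF character token only if it is the very next token
    after a <pre>, <listing>, or <textarea> start tag."""
    out = []
    check_next = False
    for kind, value in tokens:
        if check_next:
            # Whatever this token is, it is the "next token"; the check
            # is spent here, even for a NUL that is ignored later on.
            check_next = False
            if kind == "char" and value == "\n":
                continue  # ignore the newline and move on
        if kind == "start" and value in NEWLINE_STRIPPING_TAGS:
            check_next = True
        out.append((kind, value))
    return out


# <pre> followed directly by a newline: the newline is stripped.
pre_plain = [("start", "pre"), ("char", "\n"), ("char", "x")]

# <pre>, then U+0000, then a newline: the NUL is the next token,
# so the newline survives this step.
pre_nul = [("start", "pre"), ("char", "\x00"), ("char", "\n"), ("char", "x")]

# <textarea>: the rcdata state has already turned NUL into U+FFFD.
textarea_nul = [("start", "textarea"), ("char", "\ufffd"), ("char", "\n")]

print(strip_leading_newline(pre_plain))
print(strip_leading_newline(pre_nul))
print(strip_leading_newline(textarea_nul))
```

Under this model, the behaviour described in comment 7 (stripping the newline after an ignored NUL) would correspond to discarding the NUL before the check runs, which is the reading comment 8 rejects.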