This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
In the tokenization.html page, in the section "8.2.4.69 Tokenizing character references", after the table, it says:
"""
Otherwise, return a character token for the Unicode character whose code point is that number. If the number is in the range 0x0001 to 0x0008, 0x000E to 0x001F, 0x007F to 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.
"""
As far as I understand, the character is still returned even if it's a parse error, but this is not clear. The current wording could be read as "the character is returned, /but/ if the number is in those ranges, it's a parse error", which leaves open what should be returned in the parse-error case.
I suggest rephrasing it to state explicitly that the character corresponding to that value is returned in both cases.
(In reply to comment #0)
> As far as I understand, the character is still returned even if it's a parse
> error, but this is not clear.
It's pretty clear to me that the first sentence already covers all cases. Otherwise, the first sentence and the second, very long sentence would have been switched. Having said that, I am not the editor and he might agree with you.
> I suggest rephrasing it a bit to state explicitly that the character
> corresponding to that value is returned in both the cases.
Why don't you propose some text, by the way?
I'm with Kenny on this. I don't really see how to make it clearer. If you have any proposals though I'm happy to entertain them.
One solution would be to use a list like the one in the rest of the page, so something like:
...
→ 0xD800 to 0xDFFF
→ greater than 0x10FFFF
    Parse error. Return U+FFFD.
→ 0x0001 to 0x0008
→ 0x000E to 0x001F
→ ...
    Parse error. Treat it as per the "anything else" entry below.
→ Anything else
    Return a character token for the Unicode character whose code point is that number.
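To make the intended behaviour concrete, here is a minimal Python sketch of the three-way split proposed in the list above. It is only an illustration, not spec text: the names `resolve_numeric_reference` and `is_flagged` are hypothetical, the function returns a (character, parse_error) pair instead of emitting tokens, and the replacement table that precedes this passage in the spec is deliberately left out.

```python
from typing import Tuple

REPLACEMENT_CHARACTER = "\uFFFD"


def is_flagged(number: int) -> bool:
    """True if the number is in the ranges quoted in comment #0.

    These code points are a parse error, but the character is still returned.
    """
    if 0x0001 <= number <= 0x0008 or 0x000E <= number <= 0x001F:
        return True
    if 0x007F <= number <= 0x009F or 0xFDD0 <= number <= 0xFDEF:
        return True
    if number == 0x000B:
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, ..., 0x10FFFF: the last two code points
    # of each of the 17 planes.
    return number <= 0x10FFFF and (number & 0xFFFF) in (0xFFFE, 0xFFFF)


def resolve_numeric_reference(number: int) -> Tuple[str, bool]:
    """Return (character, parse_error) for a numeric character reference.

    Assumption: the spec's replacement table is NOT applied here.
    """
    # Surrogates and out-of-range values: parse error, return U+FFFD.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return REPLACEMENT_CHARACTER, True
    # Flagged ranges: parse error, then fall through to "anything else".
    if is_flagged(number):
        return chr(number), True
    # Anything else: return the character, no parse error.
    return chr(number), False
```

Under this reading, `resolve_numeric_reference(0x0B)` reports a parse error and still yields U+000B, which is the case the original wording left ambiguous.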
This bug was cloned to create bug 18021 as part of operation convergence.
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the Editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the Tracker Issue; or you may create a Tracker Issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html
Status: Accepted
Change Description: applied patch https://github.com/w3c/html/commit/6ce78faff3937f156ea217bba6d290de3f456de0
Rationale: adopted resolution by WHATWG