This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
As explained in the non-normative section of HTML5 (which is based on the normative sections), http://www.w3.org/TR/html5/introduction.html#syntax-errors, under "Errors involving fragile syntax constructs":

The correct way to express the above cases is as follows:

<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference -->
<a href="?art&amp;copy">Art and Copy</a> <!-- the & has to be escaped, since &copy is a named character reference -->

Thus, the error "& did not start a character reference" should only appear when the "&" precedes a named character reference.
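To make the distinction concrete, here is a minimal sketch using Python's standard `html.unescape`, which decodes named character references using the HTML5 table. It is only an illustration of the rule quoted above, not the validator's code, and the helper name `ampersand_changes_meaning` is made up for this example.

```python
from html import unescape

def ampersand_changes_meaning(attr_value):
    """Return True if decoding the value alters it, i.e. some '&' in it
    is read as the start of a character reference."""
    return unescape(attr_value) != attr_value

# "&ted" is not a named character reference, so the value is left alone.
print(ampersand_changes_meaning("?bill&ted"))   # False
# "&copy" is a named character reference, so the raw "&" changes the URL.
print(ampersand_changes_meaning("?art&copy"))   # True
# Escaping the ampersand yields the intended query string.
print(unescape("?art&amp;copy"))                # ?art&copy
```

Note that `html.unescape` does not apply the attribute-specific exceptions of the HTML parsing algorithm, so this shows only the general rule.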
I wrote an experimental patch for this and pushed it to http://qa-dev.w3.org:8888/ So for now you can test it there, and please let me know if you find any problems. I'll try to get the patch landed in the sources soon and pushed out to the production validator.
Created attachment 1240: Test case
Tested with the attached test case. The error never showed up where it shouldn't. Tested it on other sites as well. Looks like the patch is working. Based on http://www.w3.org/TR/html5/named-character-references.html, an error should have shown up for "&dollar" and "&minus", but the live validator (http://validator.w3.org) does not recognize them as named character references, so I imagine that is a separate bug.
(In reply to comment #3)
> Tested with the attached test case. The error never showed up where it
> shouldn't. Tested it on other sites as well. Looks like the patch is working.

Excellent. Thanks very much for taking the time to test -- I really appreciate it.

> Based on http://www.w3.org/TR/html5/named-character-references.html, an
> error should have shown up for "&dollar" and "&minus", but the live
> validator (http://validator.w3.org) does not recognize them as named
> character references, so I imagine that is a separate bug.

Yes, I can confirm from inspection of the current validator source code that the code currently does not recognize "dollar" and "minus" as named characters. The only characters it recognizes as such are the ones in the NAMES array in this file: http://hg.mozilla.org/projects/htmlparser/raw-file/default/src/nu/validator/htmlparser/impl/NamedCharacters.java

So please do file a bug noting that "dollar" and "minus" are missing from that (along with any other missing ones you might find).
Bug 19718 created to address the issue.
(In reply to comment #3)
> Tested with the attached test case. The error never showed up where it
> shouldn't. Tested it on other sites as well. Looks like the patch is working.
>
> Based on http://www.w3.org/TR/html5/named-character-references.html, an
> error should have shown up for "&dollar" and "&minus", but the live
> validator (http://validator.w3.org) does not recognize them as named
> character references, so I imagine that is a separate bug.

The validator does recognize "&dollar;" and "&minus;" as valid named character references. The current spec actually does not require it to recognize semicolon-less "&dollar" and "&minus" as special in any way, and they are not errors, so the per-spec behavior for them is to report nothing at all.

I realize that the validator (actually the HTML parser used by the validator) does report "Named character reference was not terminated by a semicolon" errors for semicolon-less versions of some named character references such as "&reg". I'd need to look at the code more to figure out why it does that for some and not for others. I suspect it just has to do with length. But regardless, the current spec doesn't actually define "&reg" as a parse error, so I think the actual bug here might be that the parser is emitting any error message at all for the "&reg" case.
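For anyone who wants to check which names are listed with and without the trailing semicolon, here is a rough sketch against Python's `html.entities.html5` table, which mirrors the HTML5 named character references list. This is not the validator's NamedCharacters.java, and the helper name `has_semicolonless_form` is made up for this example.

```python
from html.entities import html5

def has_semicolonless_form(name):
    """Return True if the table lists `name` without a trailing ';'
    in addition to the usual `name;` entry."""
    return name in html5 and (name + ";") in html5

for name in ("reg", "copy", "dollar", "minus"):
    # Print whether "name;" exists and whether a bare "name" is also listed.
    print(name, (name + ";") in html5, has_semicolonless_form(name))
```

In that table "reg" and "copy" appear both with and without the semicolon, while "dollar" and "minus" appear only as "dollar;" and "minus;", which matches the different treatment of "&reg" versus "&dollar" and "&minus" described above.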