<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>17418</bug_id>
          
          <creation_ts>2012-06-05 15:31:46 +0000</creation_ts>
          <short_desc>&amp; did not start a character reference and Errors involving fragile syntax constructs</short_desc>
          <delta_ts>2015-08-23 06:58:46 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML Checker</product>
          <component>General</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>rasamassen</reporter>
          <assigned_to name="Michael[tm] Smith">mike+validator</assigned_to>
          <cc>mike</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>68731</commentid>
    <comment_count>0</comment_count>
    <who name="">rasamassen</who>
    <bug_when>2012-06-05 15:31:46 +0000</bug_when>
    <thetext>As explained in a non-normative section of HTML5 (which is based on the normative sections), http://www.w3.org/TR/html5/introduction.html#syntax-errors, under &quot;Errors involving fragile syntax constructs&quot;:

The correct way to express the above cases is as follows:

&lt;a href=&quot;?bill&amp;ted&quot;&gt;Bill and Ted&lt;/a&gt; &lt;!-- &amp;ted is ok, since it&apos;s not a named character reference --&gt;
&lt;a href=&quot;?art&amp;amp;copy&quot;&gt;Art and Copy&lt;/a&gt; &lt;!-- the &amp; has to be escaped, since &amp;copy is a named character reference --&gt;


Thus, the error &quot;&amp; did not start a character reference&quot; should only appear when the &quot;&amp;&quot; precedes a named character reference.</thetext>
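<!--
A minimal sketch of the rule proposed in this comment, for readers who want to experiment with it. An ampersand is treated as fragile only when the characters after it spell a named character reference that is not terminated by a semicolon, as with the unescaped "?art&copy" case. "?bill&ted" is left alone. The NAMED_REFERENCES set is a hypothetical five-name excerpt, not the validator's table. The sketch also ignores the tokenizer's extra attribute-value rules (for example, a name followed directly by an alphanumeric character), so it illustrates the proposal and does not reproduce the parser.

```python
# Hypothetical excerpt of the named character reference table; the real
# table is far larger and lives in the parser, not in this sketch.
NAMED_REFERENCES = {"amp", "copy", "lt", "gt", "quot"}


def fragile_ampersands(value):
    """Return the index of every '&' in value that begins a named character
    reference without a terminating ';' (the ones that should be escaped)."""
    flagged = []
    for i, ch in enumerate(value):
        if ch != "&":
            continue
        rest = value[i + 1:]
        # Use the longest matching name, as a reference lookup would.
        matches = [name for name in NAMED_REFERENCES if rest.startswith(name)]
        if not matches:
            continue  # e.g. "&ted": no named reference, so nothing to report
        name = max(matches, key=len)
        if not rest[len(name):].startswith(";"):
            flagged.append(i)  # e.g. "&copy" with no ';' after it
    return flagged


print(fragile_ampersands("?bill&ted"))
print(fragile_ampersands("?art&copy"))
print(fragile_ampersands("?art&amp;copy"))
```

On the hrefs from the spec example, the sketch reports index 4 for "?art&copy". It reports nothing for "?bill&ted" or for the escaped "?art&amp;copy".
-->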
  </long_desc><long_desc isprivate="0" >
    <commentid>77138</commentid>
    <comment_count>1</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-10-26 11:21:32 +0000</bug_when>
    <thetext>I wrote an experimental patch for this and pushed it to http://qa-dev.w3.org:8888/

So for now you can test it there, and please let me know if you find any problems.

I&apos;ll try to get the patch landed in the sources soon and pushed out to the production validator.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77146</commentid>
    <comment_count>2</comment_count>
      <attachid>1240</attachid>
    <who name="">rasamassen</who>
    <bug_when>2012-10-26 12:33:34 +0000</bug_when>
    <thetext>Created attachment 1240
Test case</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77148</commentid>
    <comment_count>3</comment_count>
    <who name="">rasamassen</who>
    <bug_when>2012-10-26 12:39:19 +0000</bug_when>
    <thetext>Tested with the attached test case. The error never showed up where it shouldn&apos;t. Tested it on other sites as well. Looks like the patch is working.

Based on http://www.w3.org/TR/html5/named-character-references.html, an error should have shown up for &quot;&amp;dollar&quot; and &quot;&amp;minus&quot;, but the live validator (http://validator.w3.org) does not recognize them as named character references, so I imagine that is a separate bug.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77149</commentid>
    <comment_count>4</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-10-26 12:48:52 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Tested with the attached test case. The error never showed up where it
&gt; shouldn&apos;t. Tested it on other sites as well. Looks like the patch is working.

Excellent. Thanks very much for taking the time to test -- I really appreciate it.

&gt; Based on http://www.w3.org/TR/html5/named-character-references.html, an
&gt; error should have shown up for &quot;&amp;dollar&quot; and &quot;&amp;minus&quot;, but the live
&gt; validator (http://validator.w3.org) does not recognize them as named
&gt; character references, so I imagine that is a separate bug.

Yes, I can confirm from inspection of the current validator source code that the code currently does not recognize &quot;dollar&quot; and &quot;minus&quot; as named characters. The only characters it recognizes as such are the ones in the NAMES array in this file:

  http://hg.mozilla.org/projects/htmlparser/raw-file/default/src/nu/validator/htmlparser/impl/NamedCharacters.java

So please do file a bug noting that &quot;dollar&quot; and &quot;minus&quot; are missing from that (along with any other missing ones you might find).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77153</commentid>
    <comment_count>5</comment_count>
    <who name="">rasamassen</who>
    <bug_when>2012-10-26 15:40:09 +0000</bug_when>
    <thetext>Bug 19718 created to address the issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>77824</commentid>
    <comment_count>6</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2012-11-04 10:23:38 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Tested with the attached test case. The error never showed up where it
&gt; shouldn&apos;t. Tested it on other sites as well. Looks like the patch is working.
&gt; 
&gt; Based on http://www.w3.org/TR/html5/named-character-references.html, an
&gt; error should have shown up for &quot;&amp;dollar&quot; and &quot;&amp;minus&quot;, but the live
&gt; validator (http://validator.w3.org) does not recognize them as named
&gt; character references, so I imagine that is a separate bug.

The validator does recognize &quot;&amp;dollar;&quot; and &quot;&amp;minus;&quot; as valid named character references. The current spec actually does not require it to recognize semicolon-less &quot;&amp;dollar&quot; and &quot;&amp;minus&quot; as special in any way, and they are not errors, so the per-spec behavior for them is to report nothing at all.

I realize that the validator (actually the HTML parser used by the validator) does report &quot;Named character reference was not terminated by a semicolon&quot; errors for semicolon-less versions of some named character references such as &quot;&amp;reg;&quot;. I&apos;d need to look at the code more to figure out why it does that for some and not for others. I suspect it just has to do with length. But regardless, the current spec doesn&apos;t actually define &quot;&amp;reg&quot; as a parse error, so I think the actual bug here might be that the parser is emitting any error message at all for the &quot;&amp;reg&quot; case.</thetext>
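<!--
A minimal sketch of the distinction this comment turns on. It assumes the spec table's split between older names that may match without a trailing semicolon (such as reg) and names that match only with one (such as dollar and minus). The LEGACY and SEMICOLON_ONLY sets are hypothetical excerpts, not the NAMES array in NamedCharacters.java. The sketch deliberately leaves out whether an unterminated match should be reported as an error.

```python
# Hypothetical excerpts of the named-reference table. LEGACY names may match
# with or without a trailing ';'; SEMICOLON_ONLY names match only with it.
LEGACY = {"amp", "copy", "reg", "Agrave"}
SEMICOLON_ONLY = {"dollar", "minus"}


def match_reference(text):
    """Match the longest named reference at the start of text (the characters
    just after an '&'). Return (name, terminated) or None if nothing matches."""
    candidates = []
    for name in LEGACY | SEMICOLON_ONLY:
        if text.startswith(name + ";"):
            candidates.append((len(name) + 1, name, True))
        elif name in LEGACY and text.startswith(name):
            candidates.append((len(name), name, False))
    if not candidates:
        return None
    # The longest candidate wins, counting the ';' when it is present.
    _, name, terminated = max(candidates)
    return name, terminated


for sample in ["dollar", "dollar;", "minus", "reg", "reg;"]:
    print(sample, match_reference(sample))
```

Under those assumptions, a bare &dollar or &minus matches nothing, so there is nothing to report. &dollar; matches normally, and a bare &reg still matches as an unterminated reference.
-->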
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>1240</attachid>
            <date>2012-10-26 12:33:34 +0000</date>
            <delta_ts>2012-10-26 12:33:34 +0000</delta_ts>
            <desc>Test case</desc>
            <filename>amp-test.html</filename>
            <type>text/html</type>
            <size>289</size>
            <attacher>rasamassen</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIGh0bWw+CjxodG1sPgo8aGVhZD4KCTxtZXRhIGNoYXJzZXQ9InV0Zi04Ij4KCTx0
aXRsZT5BbXAgVGVzdDwvdGl0bGU+CjwvaGVhZD4KPGJvZHk+CjxhIGhyZWY9Ij9ib2ImMTIzIj5J
cyB0aGlzIE9LPzwvYT4KPGEgaHJlZj0iP2JvYiZkb3QmYXJlJnJ1bm5pbmciPklzIHRoaXMgT0s/
PC9hPgo8YSBocmVmPSI/cnVuJmNvcHkmcGFzdGUiPklzIHRoaXMgT0s/PC9hPgo8YSBocmVmPSI/
Ym9iJmZvbyZkb2xsYXImbWludXMmQWdyYXZlIj5JcyB0aGlzIE9LPzwvYT4KCjwvYm9keT4KPC9o
dG1sPg==
</data>

          </attachment>
      

    </bug>

</bugzilla>