<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>15192</bug_id>
          
          <creation_ts>2011-12-15 00:28:52 +0000</creation_ts>
          <short_desc>section 8.1.4 Character references; section 8.2.2.2 Character encodings In section 8.2.2.2, we say, &quot;User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more.&quot; In section 8.1.4, we say, &quot;The numeric character refere</short_desc>
          <delta_ts>2012-01-18 15:35:15 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#top</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>mike</cc>
    
    <cc>Ms2ger</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>61575</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-12-15 00:28:52 +0000</bug_when>
    <thetext>Specification: http://www.w3.org/TR/2011/WD-html5-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
section 8.1.4 Character references; section 8.2.2.2 Character encodings

In section 8.2.2.2, we say, &quot;User agents must at a minimum support the UTF-8
and Windows-1252 encodings, but may support more.&quot;

In section 8.1.4, we say, &quot;The numeric character reference forms described
above are allowed to reference any Unicode code point other than U+0000,
U+000D, permanently undefined Unicode characters (noncharacters), and control
characters other than space characters.&quot;

What about the code points in the range 0x80 to 0x9F, which the Windows-1252
encoding maps to printable characters?

For example, am I allowed to use a Windows-1252 byte value, &quot;&amp;#x80;&quot;, to
reference the euro sign, &quot;&amp;#x20AC;&quot;? Does the browser have to reinterpret
the resulting string after replacing character references?

I suggest we add a note to 8.1.4 Character references:
&quot;The numeric character references are to Unicode code points, so instead of
using character references in the range of &amp;#x80; to &amp;#x9F; from the
Windows-1252 encoding, use the appropriate Unicode character. Instead of using
character references in the range of &amp;#xD800; to &amp;#xDFFF; as surrogate pairs
from the UTF-16 encoding, use the appropriate Unicode character.&quot;
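
To illustrate the question, here is a minimal sketch using Python's standard
html.unescape, which is documented to decode character references following
the HTML5 rules. It shows only how one parser recovers from such references,
not whether they are conforming for authors. The helper name decode and the
variable names are my own, and the ampersand is built with chr(38) so the
snippet needs no escaping inside this comment.

```python
import html

# chr(38) is the ampersand; building it this way keeps the snippet free of
# characters that would need escaping in XML text.
AMP = chr(38)


def decode(ref_body: str) -> str:
    """Decode one numeric character reference body such as '#x80'."""
    return html.unescape(AMP + ref_body + ";")


euro_via_1252 = decode("#x80")       # Windows-1252 byte value used as a reference
euro_via_unicode = decode("#x20AC")  # the actual Unicode code point for the euro sign
lone_surrogate = decode("#xD800")    # UTF-16 surrogate used as a reference

# Print the resulting code points in hex.
print(hex(ord(euro_via_1252)), hex(ord(euro_via_unicode)), hex(ord(lone_surrogate)))
# prints: 0x20ac 0x20ac 0xfffd
```

Under these assumptions the Windows-1252 value is remapped to U+20AC and the
surrogate becomes U+FFFD, which is why the proposed note steers authors toward
the real Unicode code points.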


Posted from: 96.53.31.86
User agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>