This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 15192 - section 8.1.4 Character references; section 8.2.2.2 Character encodings In section 8.2.2.2, we say, "User agents must at a minimum support the UTF-8 and Windows-1252 encodings, but may support more." In section 8.1.4, we say, "The numeric character refere
Summary: section 8.1.4 Character references; section 8.2.2.2 Character encodings In se...
Status: RESOLVED WORKSFORME
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-15 00:28 UTC by contributor
Modified: 2012-01-18 15:35 UTC
CC List: 4 users

See Also:


Attachments

Description contributor 2011-12-15 00:28:52 UTC
Specification: http://www.w3.org/TR/2011/WD-html5-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
section 8.1.4 Character references; section 8.2.2.2 Character encodings

In section 8.2.2.2, we say, "User agents must at a minimum support the UTF-8
and Windows-1252 encodings, but may support more."

In section 8.1.4, we say, "The numeric character reference forms described
above are allowed to reference any Unicode code point other than U+0000,
U+000D, permanently undefined Unicode characters (noncharacters), and control
characters other than space characters."

What about the characters in the range 0x80 to 0x9F, which the Windows-1252
encoding maps to printable characters?

For example, am I allowed to use a Windows-1252 code point, "&#x80;", to
reference the Euro character, "€"? Does the browser have to further
interpret strings after replacing character references?
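
As a rough illustration of the distinction being asked about, the Python
sketch below compares the Windows-1252 byte 0x80 with the Unicode code point
U+0080. It uses Python's standard `html.unescape`, which the Python
documentation describes as following the HTML5 rules for valid and invalid
character references. It is only a stand-in for a browser's parser, not the
spec text itself, and the variable names are illustrative.

```python
import html

# Byte 0x80 interpreted as Windows-1252 is the Euro sign, U+20AC.
euro_from_byte = b"\x80".decode("cp1252")

# The Unicode code point U+0080 itself is a C1 control character,
# which is a different character from the Euro sign.
c1_control = chr(0x80)

# html.unescape applies the HTML5 handling of the reference &#x80;.
euro_from_reference = html.unescape("&#x80;")

# A reference to a lone surrogate is replaced rather than passed through.
lone_surrogate = html.unescape("&#xD800;")

for label, value in [
    ("cp1252 byte 0x80", euro_from_byte),
    ("code point U+0080", c1_control),
    ("unescape('&#x80;')", euro_from_reference),
    ("unescape('&#xD800;')", lone_surrogate),
]:
    print(f"{label}: U+{ord(value):04X}")
```

In this sketch the reference is remapped in a single step while it is being
decoded, so no second pass over the resulting string is involved.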

I suggest we add a note to 8.1.4 Character references:
"The numeric character references are to Unicode code points, so instead of
using character references in the range of &#x80; to &#x9F; from the
Windows-1252 encoding, use the appropriate Unicode character. Instead of using
character references in the range of &#xD800; to &#xDFFF; as surrogate pairs
from the UTF-16 encoding, use the appropriate Unicode character."
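
The substitution the suggested note describes could be sketched as follows.
Both function names are hypothetical, and Python's `cp1252` codec is assumed
as the Windows-1252 mapping table. That codec leaves five bytes in the range
(0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, so the sketch rejects them rather
than guessing.

```python
def preferred_reference(code_point: int) -> str:
    """Return a hex character reference that names the Unicode character.

    Hypothetical helper: values in 0x80..0x9F are treated as Windows-1252
    bytes and translated to the Unicode code point they stand for.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            code_point = ord(bytes([code_point]).decode("cp1252"))
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for this byte.
            raise ValueError(f"0x{code_point:02X} has no cp1252 mapping") from None
    return f"&#x{code_point:X};"


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(preferred_reference(0x80))                               # &#x20AC;
print(preferred_reference(0x9F))                               # &#x178;
print(preferred_reference(combine_surrogates(0xD83D, 0xDE00)))  # &#x1F600;
```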


Posted from: 96.53.31.86
User agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)