[whatwg] Fwd: Entity parsing

On Fri, 24 Apr 2009, Øistein E. Andersen wrote:
> 
> When a named character reference is followed by a semicolon, it clearly 
> has to be expanded, but how to handle non-semicolon-terminated character 
> references is less obvious.
> 
> Let &IE4 (resp. &HTML4, &HTML5) be a non-semicolon-terminated named 
> character reference from the IE4 (resp. HTML4, HTML5) set, and let . 
> (full stop) represent any character other than semicolon, and ^ 
> (circumflex) any character which is (roughly) not an ASCII letter or 
> digit (i.e., [^a-zA-Z0-9]).  Not completely unreasonable sets of 
> character references to expand (outside of attribute values) include:
> 
> 	1) &IE4^
> 	2) &IE4.
> 	3) &HTML4^
> 	4) &IE4. &HTML4^
> 	5) &HTML4.
> 	6) &IE4. &HTML5^
> 	7) &HTML4. &HTML5^
> 	8) &HTML5.
> 
> (The set of character references to be expanded in attribute values 
> could be obtained by replacing . by ^ above.)
> 
> Currently, Opera follows 1), IE 2), and Safari and Firefox 3).
> 
> My main concern is that &HTML4^ is actually legitimate in HTML4 and 
> works in both Safari and Firefox today, and that HTML5 should not change 
> the rendering of valid HTML4 pages unless there is a good reason to do 
> so.

Could you give an example of what you mean? I'm having trouble following 
your description above.
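
One tentative reading of the quoted notation, sketched in Python so that the eight options can be compared on concrete input. Everything here is an assumption made for illustration: the tiny TIERS sets do not reflect the real IE4, HTML4, or HTML5 entity lists, the names POLICIES, follower_ok, and expand are hypothetical, and end of input is treated as an acceptable follower.

```python
# Illustrative tiers only: the real IE4/HTML4/HTML5 membership is NOT
# reproduced here; these small nested sets are assumptions for the sketch.
TIERS = {
    "IE4": {"amp", "copy"},
    "HTML4": {"amp", "copy", "alpha"},
    "HTML5": {"amp", "copy", "alpha", "check"},
}
CHARS = {"amp": "&", "copy": "\u00a9", "alpha": "\u03b1", "check": "\u2713"}

# Options 1)-8) from the quoted list, as tier -> follower class.
# "." = any character other than semicolon; "^" = not an ASCII letter/digit.
POLICIES = {
    1: {"IE4": "^"},
    2: {"IE4": "."},
    3: {"HTML4": "^"},
    4: {"IE4": ".", "HTML4": "^"},
    5: {"HTML4": "."},
    6: {"IE4": ".", "HTML5": "^"},
    7: {"HTML4": ".", "HTML5": "^"},
    8: {"HTML5": "."},
}


def follower_ok(kind, nxt):
    """Return True if nxt (one character, or "" at end of input) matches kind."""
    if kind == ".":
        return nxt != ";"
    return not (nxt.isascii() and nxt.isalnum())


def expand(text, option, in_attribute=False):
    """Expand named references in text under one of the eight options."""
    policy = POLICIES[option]
    names = sorted(CHARS, key=len, reverse=True)  # try longest names first
    out, i = [], 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        for name in names:
            if not text.startswith(name, i + 1):
                continue
            end = i + 1 + len(name)
            nxt = text[end:end + 1]
            if nxt == ";":
                # Semicolon-terminated references are always expanded.
                out.append(CHARS[name])
                i = end + 1
                break
            # In attribute values, "." is replaced by "^" as the quote suggests.
            if any(
                name in TIERS[tier]
                and follower_ok("^" if in_attribute else kind, nxt)
                for tier, kind in policy.items()
            ):
                out.append(CHARS[name])
                i = end
                break
        else:
            out.append("&")  # no reference recognised; keep the ampersand
            i += 1
    return "".join(out)
```

Under these assumptions, expand("&copyright", 1) leaves the text unchanged because "r" is an ASCII letter, while expand("&copyright", 2) yields "©right". Likewise expand("&alpha+", 1) is unchanged because "alpha" is outside the sample IE4 tier, but expand("&alpha+", 3) yields "α+".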

As far as I can tell HTML5 more or less matches what legacy pages need, 
but if there are specific entities that should be parsed in a different 
way than HTML5 says they should, I'm happy to fix this.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 4 June 2009 16:49:04 UTC