[whatwg] Entity parsing

From section 9.2.3.1. Tokenising entities:
> For some entities, UAs require a semicolon, for others they don't.

This applies to IE.

FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters,
the other HTML 3.2 entities (&amp, &gt and &lt), as well as &quot and the uppercase variants (&AMP, &COPY, &GT, &LT, &QUOT and &REG).
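
A minimal sketch of that rule in Python, for illustration only. It builds the
semicolon-optional set from the description above using the standard
html.entities.name2codepoint table (HTML 4 names). The function name
decode_ie_style and the longest-prefix matching are assumptions made for this
sketch, not a description of IE's actual implementation.

```python
import re
from html.entities import name2codepoint

# Entities encoding Latin-1 characters (U+00A0..U+00FF).
LATIN1 = {n for n, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF}
# Uppercase variants are not in the HTML 4 table, so list them by hand.
UPPER = {"AMP": "&", "COPY": "\u00a9", "GT": ">", "LT": "<",
         "QUOT": '"', "REG": "\u00ae"}
NO_SEMICOLON = LATIN1 | {"amp", "gt", "lt", "quot"} | set(UPPER)


def lookup(name):
    """Return the character for a known entity name."""
    return UPPER[name] if name in UPPER else chr(name2codepoint[name])


def decode_ie_style(text):
    """Hypothetical model: any known name works with ';', but only
    names in NO_SEMICOLON are recognised without one (as a prefix)."""
    def repl(m):
        run, semi = m.group(1), m.group(2)
        if semi and (run in name2codepoint or run in UPPER):
            return lookup(run)
        # Assumption: take the longest prefix that may omit the semicolon.
        for end in range(len(run), 0, -1):
            if run[:end] in NO_SEMICOLON:
                return lookup(run[:end]) + run[end:] + semi
        return m.group(0)  # not recognised: leave the text untouched
    return re.sub(r"&([A-Za-z0-9]+)(;?)", repl, text)
```

Under these assumptions, decode_ie_style("na&iumlve") decodes the &iuml prefix,
while "&Deltax" and "HA&Yuml LES ROSES" are left as written, because Delta and
Yuml are not Latin-1 entities.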

IE/mac has its very own interpretation of what `requiring a semicolon' means; it treats
&Deltax as &Deltax, but &Deltax; as Δx; (with the final semicolon rendered).

Firefox and Safari, on the other hand, seem to have implemented the SGML notion
of entities (mostly) correctly, not requiring a semicolon before whitespace, tags, etc.
(the definition of `etc.' varies slightly) and not giving preferential treatment to
certain entities.
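
The SGML-style rule can be sketched in the same way. In this hypothetical model
(decode_sgml_style is a made-up name), a reference runs to the first character
that is not an SGML name character, an optional semicolon is consumed, and every
known entity is treated alike. Counting '-' and '.' as name characters follows
SGML's reference concrete syntax; that this matches any particular browser's
definition of `etc.' is an assumption.

```python
import re
from html.entities import name2codepoint

# A name starts with a letter and continues with letters, digits, '.' or '-'.
# The reference ends at the first other character; a ';' there is consumed.
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*)(;?)")


def decode_sgml_style(text):
    """Hypothetical model: the whole name must be a known entity,
    with no special list of names that may omit the semicolon."""
    def repl(m):
        name = m.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return m.group(0)  # unknown name: leave the text untouched
    return REFERENCE.sub(repl, text)
```

With this model, "&agrave la" and "HA&Yuml LES ROSES" are decoded, whereas
"na&iumlve" and "Ha&yuml-les-Roses" are not, since the name is read as
"iumlve" and "yuml-les-Roses" respectively.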

Opera apparently allows omission of a semicolon only when both IE and Firefox/Safari do.

This means that `&agrave la' is rendered as intended in all (these) browsers, whereas
`na&iumlve' is not (it works in IE only); `Ha&yuml les Roses' works fine, but not the
hyphenated `Ha&yuml-les-Roses' (fails in Firefox) or the capitalised `HA&Yuml LES ROSES'
(works in Firefox/Safari only). Making omission of semicolons conforming (in specific
cases) therefore does not seem very compelling, as it would either be confusing
and apparently arbitrary or make conforming documents render inconsistently.

(Parsing still has to be defined, of course, but bear in mind that constructions
like `na&iumlve' are IE-only.)

-- 
Øistein E. Andersen

Received on Sunday, 5 November 2006 06:52:18 UTC