Re: XHTML character entity support

James Graham scripsit:

> I would be interested in seeing that, if you can dig up some kind of 
> reference.

http://conferences.idealliance.org/extreme/html/2004/Siefkes01/EML2004Siefkes01.html
is the paper.  It's a two-pass algorithm, but schemaless.

> Note that a requirement is that the algorithm not need to use lookahead; 
> it must be possible to implement an incremental, error handling, parser.

Yes, if you need to be both streaming and schemaless, Anne's version is
probably the best you can do.  My TagSoup library is streaming, but
requires a schema (it's distributed with an HTML schema).  Unfortunately,
the schema language is a one-off, because standard XML schema languages
don't carry the information it needs, such as entity declarations and
default element parents (if the first tag you see is an LI, what should
you interpolate in front of it?  Answer: HTML, BODY, UL in that order).
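
To make the default-parent idea concrete, here is a rough sketch in Python.
It is not TagSoup's code or its schema format: the DEFAULT_PARENT table and
the function names are hypothetical, made up for illustration.  It only
shows the interpolation step and never closes an open element, which a real
error-handling parser would also have to do.

```python
# Hypothetical default-parent table (an assumption for illustration,
# not TagSoup's actual schema language or contents).
DEFAULT_PARENT = {
    "html": None,      # root element: nothing to interpolate above it
    "head": "html",
    "body": "html",
    "title": "head",
    "ul": "body",
    "p": "body",
    "li": "ul",
}


def missing_ancestors(tag, open_stack):
    """Return the elements to open before `tag`, outermost first.

    Walks up the default-parent chain until it reaches the root or an
    element that is already open.  No lookahead is needed: the decision
    depends only on the current tag and the stack of open elements.
    """
    chain = []
    parent = DEFAULT_PARENT.get(tag)
    while parent is not None and parent not in open_stack:
        chain.append(parent)
        parent = DEFAULT_PARENT.get(parent)
    chain.reverse()
    return chain


def feed_start_tag(tag, open_stack):
    """Open any missing default ancestors, then open `tag` itself."""
    for name in missing_ancestors(tag, open_stack):
        open_stack.append(name)
    open_stack.append(tag)
    return open_stack


if __name__ == "__main__":
    # The first tag seen is an LI, with nothing open yet.
    print(feed_start_tag("li", []))  # ['html', 'body', 'ul', 'li']
```

Because each start tag is resolved against the stack alone, the same lookup
works one event at a time in a streaming parser.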

-- 
It was impossible to inveigle           John Cowan <cowan@ccil.org>
Georg Wilhelm Friedrich Hegel           http://www.ccil.org/~cowan
Into offering the slightest apology
For his Phenomenology.                      --W. H. Auden, from "People" (1953)

Received on Friday, 13 November 2009 16:36:16 UTC