W3C

HTML Parser and Generator Implementations

This is a "family tree" of HTML parser implementations, annotated with notes on their features and bugs.

I'm working on updating the HTML parser in our reference code. See: A Lexical Analyzer for HTML and Basic SGML.

See also: HTML Testing and Certification

SGML.c in LibWWW
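The first HTML parser ever released was in the library/linemode distribution, around 1992. It accepted broken markup, such as a stray end tag inside XMP or an unquoted attribute value containing characters that SGML requires to be quoted: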
<xmp>... </foo> ... </xmp>
<a href=http://foo.bar/>...</a>
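A minimal sketch of what tolerating these two cases can look like. This is not the SGML.c code; the names ATTR, parse_attributes, XMP_END and xmp_content are hypothetical, and the approach (regular expressions over the raw text) is only an assumption made for illustration.

```python
import re

# Hypothetical attribute pattern: the value may be double-quoted,
# single-quoted, or bare (a run of characters up to whitespace or '>').
ATTR = re.compile(
    r"""([A-Za-z][-.A-Za-z0-9]*)   # attribute name
        \s*=\s*
        (?: "([^"]*)" | '([^']*)' | ([^\s>]+) )""",
    re.VERBOSE,
)

# Inside XMP, only a literal </xmp> is taken to end the element;
# any other end tag, such as </foo>, is kept as character data.
XMP_END = re.compile(r"</xmp\s*>", re.IGNORECASE)


def parse_attributes(tag_text):
    """Return {name: value} for a start tag, quoted or not."""
    attrs = {}
    for m in ATTR.finditer(tag_text):
        # Exactly one of groups 2-4 matched; take whichever did.
        value = next(g for g in m.group(2, 3, 4) if g is not None)
        attrs[m.group(1).lower()] = value
    return attrs


def xmp_content(text):
    """Split the text following an <xmp> start tag into (content, rest)."""
    m = XMP_END.search(text)
    if m is None:
        return text, ""  # unterminated: treat everything as content
    return text[:m.start()], text[m.end():]
```

With these definitions, parse_attributes('<a href=http://foo.bar/>') yields the href value even though it is unquoted, and xmp_content() passes the </foo> through untouched.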
htmllib.py used in grail
Based on regular expressions. Guido wrote the first web spider, I believe. This parser treats P, LI, DT, and DD as empty elements. Nifty formatter code.
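To illustrate what treating P, LI, DT, and DD as empty elements means in a regexp-based parser, here is a rough sketch. It is not taken from htmllib.py; the names EMPTY, TAG and events are hypothetical, and the event format is an assumption. The point is that these tags act as separators: nothing is pushed on the element stack for them, so a missing end tag is never an error.

```python
import re

# Assumption for illustration: the elements handled as empty
# (no content model, no end tag expected).
EMPTY = {"p", "li", "dt", "dd"}

# Crude tag pattern: optional slash, a name, then anything up to '>'.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def events(html):
    """Yield ('start' | 'end' | 'empty' | 'text', value) pairs."""
    pos = 0
    open_elements = []
    for m in TAG.finditer(html):
        if m.start() > pos:
            yield ("text", html[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1), m.group(2).lower()
        if name in EMPTY:
            if not closing:
                yield ("empty", name)  # a separator; nothing is pushed
            # a stray </p>, </li>, ... is silently dropped
        elif closing:
            if name in open_elements:
                # close everything up to and including the match
                while open_elements:
                    top = open_elements.pop()
                    yield ("end", top)
                    if top == name:
                        break
        else:
            open_elements.append(name)
            yield ("start", name)
    if pos < len(html):
        yield ("text", html[pos:])
```

Fed "<ul><li>one<li>two</li></ul>", this produces one 'empty' event per LI start tag and ignores the lone </li>, so the two list items come out the same whether or not they were closed.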
SGML Lexical Analyzer

Tools that Write HTML

LaTeX2HTML
Generates documents in which the quotes around attribute values are missing.
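For contrast, a generator can avoid this class of bug by always quoting and escaping attribute values when it writes a start tag. The sketch below is only an illustration of that idea, not anything from LaTeX2HTML; start_tag is a hypothetical helper, and it relies on html.escape from the Python standard library.

```python
from html import escape


def start_tag(name, attrs):
    """Build a start tag whose attribute values are always double-quoted."""
    parts = [name]
    for key, value in attrs.items():
        # quote=True also escapes '"' and "'" so the value cannot
        # terminate its own quoting.
        parts.append('%s="%s"' % (key, escape(str(value), quote=True)))
    return "<" + " ".join(parts) + ">"
```

For example, start_tag("a", {"href": "http://foo.bar/"}) gives <a href="http://foo.bar/">, which a strict SGML parser accepts where the unquoted form would not pass.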

Connolly
$Id: implementations.html,v 1.1 2000/06/19 17:13:03 janet Exp $