Tolerating broken HTML writers

The constructs below are illegal according to SGML, but they are so prevalent that the sample implementation supports them.

Please stop generating HTML in this style!

NeXT editor

broken in Linemode Browser 1.3c

Looks for name in end tag before determining whether to end an RCDATA section:

This example section ends here: </foo .

Even though the above ETAGO begins a markup error, this text is in a normal paragraph in conforming implementations.

But, alas, it's not on the linemode browser.

The following XMP tag is an error on conforming systems:

This anchor's name starts with a digit, which is not a name start character.
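The difference in how the example section ends can be sketched in a few lines of Python. This is only an illustration, not the sample implementation: the function names and the sample string are hypothetical, and the sketch assumes the reference concrete syntax, where a name start character is a letter. The first function ends the section at any ETAGO followed by a name start character. The second waits for the element's own name, which is roughly the behavior attributed to the Linemode Browser above.

```python
import re

# ETAGO ("</") only counts as a delimiter when a name start character
# (assumed here to be an ASCII letter) comes right after it.
ETAGO = re.compile(r"</(?=[A-Za-z])")


def end_conforming(text, start=0):
    """Index where the section ends under the SGML rule: the first ETAGO
    followed by a name start character, whatever name comes next."""
    match = ETAGO.search(text, start)
    return match.start() if match else len(text)


def end_name_matching(text, element, start=0):
    """Index where the section ends if the parser insists on finding the
    element's own name in the end tag (hypothetical model of the
    non-conforming behavior)."""
    pattern = re.compile(
        r"</" + re.escape(element) + r"(?![A-Za-z0-9.-])", re.IGNORECASE
    )
    match = pattern.search(text, start)
    return match.start() if match else len(text)


# Made-up sample modeled on the example text above.
sample = "ends here: </foo . more text </XMP> after"
conforming_part = sample[:end_conforming(sample)]
name_matching_part = sample[:end_name_matching(sample, "xmp")]
```

With this sample, the conforming rule stops at `</foo`, while the name-matching rule carries on until `</XMP>`, so the two parsers disagree about everything in between.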

This anchor's href contains a #, which is not a name character. The NeXT browser writes all its anchors like this. Unfortunately, many systems copied this behavior.
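The two anchor complaints come down to character classes. As a minimal sketch, assuming the SGML reference concrete syntax (name start characters are letters; name characters are letters, digits, period, and hyphen) and ignoring name length limits, the checks could look like the following. The helper names and the sample values are hypothetical, since the original anchor markup is not shown here.

```python
import string

# Assumption: reference concrete syntax character classes.
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | {".", "-"}


def is_name(value):
    """True if value begins with a name start character and contains
    only name characters."""
    return (
        bool(value)
        and value[0] in NAME_START
        and all(c in NAME_CHARS for c in value)
    )


def may_be_unquoted(value):
    """True if an attribute value could be written without quotes,
    i.e. it is non-empty and made up only of name characters."""
    return bool(value) and all(c in NAME_CHARS for c in value)


# Hypothetical values resembling the anchors described above.
digit_name_ok = is_name("0")             # starts with a digit
hash_href_ok = may_be_unquoted("#foo")   # contains "#"
```

A writer that quotes every attribute value failing `may_be_unquoted` would avoid the unquoted `#` problem described for the NeXT anchors.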

html-mode.el

See NeXT.

Viola

Any known problems? I hear it's going to use SGMLs.

MidasWWW 1.0

MidasWWW parses HTML into its internal data structures and then offers the option to extract the data and write it to a file.

It doesn't always write the file out correctly.

Treats all out-of-context tags as data:

www_and_frame

Go get the latest version -- it should be current with this spec.