The following sections contain the formal SGML definition of HTML 4.0, including the SGML declaration and the Document Type Definition (DTD), as well as a sample SGML catalog.
Many authors rely on a limited set of browsers to check the documents they produce, assuming that if the browsers can render their documents, the documents are valid. Unfortunately, this is an unreliable means of verifying a document's validity, precisely because browsers are designed to cope with invalid documents by rendering them as well as they can, so as not to frustrate users.
The following sample SGML catalog can be used with an SGML parser, such as nsgmls (see [SP]), to verify that HTML documents conform to the HTML 4.0 DTD. It assumes that the strict, transitional, and frameset DTDs have been saved as the files "HTML4-strict.dtd", "HTML4-loose.dtd" and "HTML4-frameset.dtd", and that the entity sets are in the files "HTMLlat1.ent", "HTMLsymbol.ent" and "HTMLspecial.ent". Since the HTML 4.0 SGML declaration uses ISO 10646 (Unicode) as the document character set, make sure your SGML parser is capable of handling Unicode. See your validation tool documentation for further details.
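As a minimal sketch of the validation procedure described above, the following Python snippet builds and runs an nsgmls command line. It assumes that the catalog has been saved as a file named "HTML4.cat" and, optionally, that the HTML 4.0 SGML declaration has been saved in a separate file such as "HTML4.decl"; both names are hypothetical. The -s option suppresses the parser's normal output so that only error messages are reported, and -c names the catalog file. Treating a non-zero exit status as failure is an assumption about nsgmls's behavior; check the SP documentation for your installation.

```python
import shutil
import subprocess


def nsgmls_command(document, catalog="HTML4.cat", sgml_decl=None):
    """Build the argument list for validating one document with nsgmls.

    "HTML4.cat" and any sgml_decl file name are hypothetical local paths.
    """
    args = ["nsgmls", "-s", "-c", catalog]  # -s: report errors only; -c: catalog file
    if sgml_decl is not None:
        args.append(sgml_decl)  # the declaration is read before the document itself
    args.append(document)
    return args


def validate(document, catalog="HTML4.cat", sgml_decl=None):
    """Run nsgmls and return (ok, messages); raise if nsgmls is not installed."""
    if shutil.which("nsgmls") is None:
        raise FileNotFoundError("nsgmls (from SP) was not found on PATH")
    result = subprocess.run(
        nsgmls_command(document, catalog, sgml_decl),
        capture_output=True,
        text=True,
    )
    # Assumption: nsgmls exits with a non-zero status when it reports errors.
    return result.returncode == 0, result.stderr
```

The messages nsgmls writes to standard error are returned unchanged, so they can be printed or logged as-is.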
Beware that such validation, although useful and highly recommended, does not guarantee that a document fully conforms to the HTML 4.0 specification. This is because an SGML parser relies solely on the given SGML DTD, which does not express all aspects of a valid HTML 4.0 document. Specifically, an SGML parser ensures that the syntax, the structure, the list of elements, and their attributes are valid. It cannot, for instance, catch errors such as setting the width attribute of an IMG element to an invalid value (e.g., "foo" or "12.5"). Although the specification restricts the value for this attribute to an "integer representing a length in pixels", the DTD only defines it to be CDATA, which allows any value. Only a specialized program could capture the complete specification of HTML 4.0.
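The kind of specialized check the paragraph above describes can be sketched for that one rule. The Python snippet below uses the standard library's html.parser module to report IMG width values that the CDATA declaration accepts but that are not an integer number of pixels, the restriction quoted above. It illustrates a single rule, not a conformance checker; the names ImgWidthChecker and check_img_widths are hypothetical.

```python
import re
from html.parser import HTMLParser

# The rule quoted above: the value must be an integer number of pixels.
_PIXELS = re.compile(r"[0-9]+")


class ImgWidthChecker(HTMLParser):
    """Collect IMG width values that CDATA allows but the prose rule rejects."""

    def __init__(self):
        super().__init__()
        self.bad = []  # (line number, offending value) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IMG" arrives as "img".
        if tag != "img":
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <img width>.
            if name == "width" and (value is None or not _PIXELS.fullmatch(value.strip())):
                self.bad.append((self.getpos()[0], value))


def check_img_widths(html):
    """Return the (line, value) pairs of invalid IMG width attributes in html."""
    checker = ImgWidthChecker()
    checker.feed(html)
    checker.close()
    return checker.bad
```

A DTD-driven parser would accept every width value in a document, whereas this sketch flags values such as "foo" and "12.5" while accepting "100".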
Nevertheless, this type of validation is still highly recommended since it permits the detection of a large set of errors that make documents invalid.
This catalog includes the OVERRIDE directive to ensure that processing software such as nsgmls uses public identifiers in preference to system identifiers. As a result, documents whose DOCTYPE declarations contain URL-based system identifiers can be validated against local copies of the DTD and entity files, without a connection to the Web.
OVERRIDE YES

PUBLIC "-//W3C//DTD HTML 4.0//EN" HTML4-strict.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" HTML4-loose.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" HTML4-frameset.dtd
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" HTMLlat1.ent
PUBLIC "-//W3C//ENTITIES Special//EN//HTML" HTMLspecial.ent
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML" HTMLsymbol.ent
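The effect of the OVERRIDE directive can be illustrated with a small resolver. The Python sketch below is a deliberate simplification of the SGML Open catalog format: it understands only the OVERRIDE and PUBLIC entries used in the sample catalog, treats a catalog with no OVERRIDE entry as OVERRIDE NO (an assumption), applies one OVERRIDE setting to every entry, and keeps the first entry for a repeated public identifier. Given a document's public and system identifiers, it returns the file or URL a parser would open. The function names are hypothetical.

```python
import re

# One token per match: a catalog comment (--...--), a double- or
# single-quoted literal, or a bare word such as a keyword or file name.
_TOKEN = re.compile(r"--.*?--|\"([^\"]*)\"|'([^']*)'|(\S+)", re.DOTALL)


def _normalize(public_id):
    """Collapse runs of white space so equivalent public identifiers compare equal."""
    return " ".join(public_id.split())


def parse_catalog(text):
    """Return (override, {public_id: file}) from OVERRIDE and PUBLIC entries."""
    tokens = []
    for m in _TOKEN.finditer(text):
        if m.lastindex is None:
            continue  # a comment: no capturing group matched
        tokens.append(m.group(m.lastindex))

    override = False  # assumption: a missing OVERRIDE entry means NO
    public = {}
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        if keyword == "OVERRIDE":
            override = tokens[i + 1].upper() == "YES"
            i += 2
        elif keyword == "PUBLIC":
            # Simplification: the first entry for a public identifier wins.
            public.setdefault(_normalize(tokens[i + 1]), tokens[i + 2])
            i += 3
        else:
            raise ValueError(f"unsupported catalog entry: {tokens[i]}")
    return override, public


def resolve(catalog_text, public_id=None, system_id=None):
    """Return the file (or URL) used for an external identifier."""
    override, public = parse_catalog(catalog_text)
    if system_id is not None and not override:
        return system_id  # OVERRIDE NO: the document's system identifier wins
    if public_id is not None:
        local = public.get(_normalize(public_id))
        if local is not None:
            return local  # public identifier found: use the local copy
    return system_id  # fall back to the system identifier (may be None)
```

With OVERRIDE YES, resolving "-//W3C//DTD HTML 4.0//EN" together with a URL-based system identifier yields the local file name listed in the catalog rather than the URL, which is why no network connection is needed.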