The Structure of HTML 3.0 Documents

In HTML documents, tags define the start and end of headings, paragraphs, lists, character highlighting, links and so on. Most HTML elements appear in a document as a start tag, which gives the element name and attributes, followed by the content, followed by the end tag. Start tags are delimited by < and >, while end tags are delimited by </ and >. For example:

    <title>This is a Title</title>
    <h1>This is a Heading</h1>
    <P>This is a paragraph.
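
The start tag, content, end tag pattern can be observed with a small tokenizer. The following is a minimal sketch that assumes Python's standard html.parser module as a stand-in; it is not an SGML parser and is not part of this specification, and the names TagLogger and tag_events are hypothetical.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start tags, end tags and non-blank content in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports element names in lower case.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace between elements
            self.events.append(("content", text))


def tag_events(markup):
    """Return the list of (kind, value) events found in markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


EXAMPLE = """<title>This is a Title</title>
<h1>This is a Heading</h1>
<P>This is a paragraph.
"""

events = tag_events(EXAMPLE)
for kind, value in events:
    print(kind, value)
```

No end event is reported for P, because the example leaves that end tag out and this tokenizer does not infer it.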

Every HTML document must have at least a title. To identify the document as being HTML 3.0, it is recommended that documents start with the prologue:

    <!doctype HTML public "-//W3O//DTD W3 HTML 3.0//EN">

When absent, this prologue is implied by the MIME content type for HTML 3.0 together with the associated version parameter.
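
A tool that wants to know whether a document explicitly declares itself as HTML 3.0 could test for this prologue. The sketch below is one possible approach, assuming a simple regular expression is adequate for well-formed prologues; the function name declares_html3 is hypothetical.

```python
import re

# Public identifier taken from the recommended prologue.
HTML3_PUBLIC_ID = "-//W3O//DTD W3 HTML 3.0//EN"

# Keywords are matched case-insensitively; the quoted identifier is captured
# and compared exactly afterwards.
_PROLOGUE = re.compile(
    r'\s*<!doctype\s+html\s+public\s+"([^"]*)"\s*>',
    re.IGNORECASE,
)


def declares_html3(text):
    """Return True if text starts with the HTML 3.0 prologue."""
    match = _PROLOGUE.match(text)
    return match is not None and match.group(1) == HTML3_PUBLIC_ID


with_prologue = '<!doctype HTML public "-//W3O//DTD W3 HTML 3.0//EN">\n<title>Demo</title>'
without_prologue = "<title>Demo</title>"

print(declares_html3(with_prologue))
print(declares_html3(without_prologue))
```

A negative result does not by itself mean the document is not HTML 3.0, since the prologue may be implied by the MIME content type and version parameter.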


Document Structure

HTML 3.0 documents formally have the following structure:

    <HTML>
    <HEAD> head elements ...
    <BODY> body elements ...
    </HTML>

In most cases, the HTML, HEAD and BODY tags can be safely omitted. Note that the formal syntax of HTML 3.0 is defined by the document type definition, which is included as an appendix of this specification. The details of the HEAD and BODY elements will be described in subsequent sections.
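
To illustrate that omitting the HTML, HEAD and BODY tags leaves the remaining markup unchanged, the sketch below lists the other elements of a fully tagged document and of an abbreviated one. It assumes Python's html.parser as a convenient tokenizer, which does not infer omitted tags; ElementLister and content_elements are hypothetical names.

```python
from html.parser import HTMLParser

# The three structural elements whose tags may be omitted.
STRUCTURAL = {"html", "head", "body"}


class ElementLister(HTMLParser):
    """Collects the names of all start tags except HTML, HEAD and BODY."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRUCTURAL:
            self.names.append(tag)


def content_elements(text):
    """Return the non-structural element names in document order."""
    lister = ElementLister()
    lister.feed(text)
    lister.close()
    return lister.names


full = ("<HTML><HEAD><TITLE>Demo</TITLE></HEAD>"
        "<BODY><H1>Demo</H1><P>Text.</BODY></HTML>")
short = "<TITLE>Demo</TITLE><H1>Demo</H1><P>Text."

print(content_elements(full))
print(content_elements(short))
```

Both calls list the same elements, which is the sense in which the structural tags are optional here; a conforming SGML parser would infer them from the DTD.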

The permitted syntax of HTML 3.0 compliant documents is specified by the DTD. This includes the content model for each element, which defines what markup is permitted within that element. The DTD uses SGML entities in content models to express regular features of HTML 3.0; for example, %body.content defines what markup is permitted within the BODY element. A number of other elements also share this content model, e.g. BQ, DIV, FORM, TH and TD.

The description of each tag includes the content model and the permitted context (which elements can contain this tag). Where practical, these properties are given with the same entity names as used in the DTD, which should help newcomers get to grips with the DTD itself. For example, the description of the NOTE element starts with:

    The NOTE element

    Permitted context: %block
    Content model: %flow

This says that the NOTE element (used for admonishments such as notes, cautions and errors) can occur in any element which includes %block in its content model. Similarly, any element with %flow as part of its permitted context can occur within a NOTE element.
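
The relationship between permitted context and content model can be expressed as a simple lookup. The sketch below uses only the element and entity names mentioned in this section; the ENTITY_INCLUDES table is an assumption made for illustration, since the authoritative expansion of each entity is given by the DTD, and may_contain is a hypothetical helper.

```python
# Content model entity used by each element, as stated in this section.
CONTENT_MODEL = {
    "BODY": "%body.content",
    "BQ": "%body.content",
    "DIV": "%body.content",
    "FORM": "%body.content",
    "TH": "%body.content",
    "TD": "%body.content",
    "NOTE": "%flow",
}

# Permitted context of each element, as stated in its description.
PERMITTED_CONTEXT = {
    "NOTE": "%block",
}

# ASSUMPTION, for illustration only: which entities each content model
# entity includes. Consult the DTD for the real definitions.
ENTITY_INCLUDES = {
    "%body.content": {"%block"},
    "%flow": {"%block"},
}


def may_contain(parent, child):
    """Return True if child's permitted context appears in parent's content model."""
    model = CONTENT_MODEL.get(parent)
    context = PERMITTED_CONTEXT.get(child)
    if model is None or context is None:
        return False  # nothing is known about one of the elements
    return context == model or context in ENTITY_INCLUDES.get(model, set())


print(may_contain("BODY", "NOTE"))
print(may_contain("TD", "NOTE"))
```

Because BQ, DIV, FORM, TH and TD share the BODY content model, a single table entry for %body.content answers the question for all of them.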

The HTML element

This has three attributes:

VERSION
This is fixed by the DTD as the string "-//W3O//DTD W3 HTML 3.0//EN"
URN
The universal resource name for the document (optional)
ROLE
An optional space-separated list of SGML NAME tokens that define the role this document plays, e.g. table of contents. The conventions for these names are outside the scope of this specification.

Editorial note: would it be better to leave this to a link to a URC?

Note that both the start and end tag for the HTML element can be omitted.
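
As an illustration of how a tool might read these attributes, the sketch below extracts them from the HTML start tag when it is present. It assumes Python's html.parser as the tokenizer; the URN value and the ROLE names in the sample are invented for this example, and html_element_attributes is a hypothetical helper.

```python
from html.parser import HTMLParser

# Value fixed by the DTD for the VERSION attribute.
HTML3_VERSION = "-//W3O//DTD W3 HTML 3.0//EN"


class HtmlAttributes(HTMLParser):
    """Captures the attributes of the first HTML start tag, if any."""

    def __init__(self):
        super().__init__()
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.attributes is None:
            # Attribute names arrive in lower case; values keep their case.
            self.attributes = dict(attrs)


def html_element_attributes(text):
    """Return the VERSION, URN and ROLE values for a document."""
    reader = HtmlAttributes()
    reader.feed(text)
    reader.close()
    found = reader.attributes or {}  # the HTML start tag may be omitted
    return {
        "version": found.get("version", HTML3_VERSION),
        "urn": found.get("urn"),  # optional, so None when absent
        "role": (found.get("role") or "").split(),  # space-separated NAME tokens
    }


# Hypothetical sample: the URN and ROLE values are made up for illustration.
sample = '<HTML URN="urn:example:demo" ROLE="toc index"><TITLE>Demo</TITLE>'
info = html_element_attributes(sample)
print(info)
```

When the HTML start tag is omitted, the sketch falls back to the fixed VERSION string, no URN and an empty ROLE list.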