8 The global structure of an HTML document

Contents

  1. HTML version information
  2. The HTML element
  3. The HEAD element
    1. Titles: the TITLE element and the title attribute
    2. Meta information
  4. The BODY element
    1. Element identifiers: the id and class attributes
    2. Grouping elements: the DIV and SPAN elements
    3. Headings: The H1, H2, H3, H4, H5, H6 elements
    4. The ADDRESS element

An HTML 4.0 document generally consists of three parts: a line containing version information, a descriptive header section, and a body, which contains the document's actual content.

8.1 HTML version information

The first line of an HTML 4.0 document should be the document type declaration. This includes an identifier string naming the DTD used by the document (see [GOLD90]). We recommend that authors use one of the following declarations:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">

For documents following the strict HTML 4.0 DTD, which excludes presentation attributes and elements that W3C expects to phase out as support for style sheets matures. If you need these features, please use the transitional DTD.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

For documents following the HTML 4.0 "transitional" DTD, which includes presentation attributes and elements that W3C expects to phase out as support for style sheets matures.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">

For documents conforming to the HTML 4.0 frameset DTD, where a FRAMESET element replaces the BODY element.

The binding between public identifiers and files can be specified using a catalog file following the format recommended by the SGML Open Consortium (see [SGMLOPEN]). A sample catalog file for HTML 4.0 is included at the beginning of the section on SGML reference information for HTML. The last two letters of the declaration indicate the language of the DTD. For HTML, this is always English ("EN"). User agents may ignore this information.

Some user agents also support the use of a URL in a system identifier, where the URL references the DTD, as in:

<!DOCTYPE HTML SYSTEM "http://www.w3.org/DTD/HTML4-strict.dtd">

or better still together with the public identifier as in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/DTD/HTML4-strict.dtd">

The user agent is then able to download the DTD and entity sets as needed. The following URLs are supported in relation to this specification:

Notes:

  1. The HTML 3.2 specification proposed the use of the terms "Draft" or "Final" in the public identifier to distinguish between the draft and final recommendation. Experience suggests that these terms were ineffective so they are not being proposed for HTML 4.0.
  2. We recommend that authors avoid using the DTD subset due to the lack of widespread support for this feature.

8.2 The HTML element

<!ENTITY % version "version CDATA #FIXED '%HTML.Version;'">

<![ %HTML.Frameset; [
<!ENTITY % html.content "HEAD, FRAMESET">
]]>

<!ENTITY % html.content "HEAD, BODY">

<!ELEMENT HTML O O (%html.content;) -- document root element -->
<!ATTLIST HTML
  %version;
  %i18n;                           -- lang, dir --
  >

Start tag: optional, End tag: optional

Attribute definitions
version = url
This attribute specifies (with a URL) the location of the DTD for the version of HTML governing the current document. Since the same information must appear in the DOCTYPE header, the usefulness of this attribute is uncertain.

Attributes defined elsewhere

After version information, the remainder of an HTML document is contained by the HTML element. Thus, a typical HTML document has this structure:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
...The head, body, etc. go here...
</HTML>

8.3 The HEAD element

<!-- %head.misc; defined earlier on as "SCRIPT | STYLE | META | LINK" -->
<!ENTITY % head.content "TITLE & ISINDEX? & BASE?">

<!ELEMENT HEAD O O  (%head.content;) +(%head.misc;) -- document head -->
<!ATTLIST HEAD
  %i18n;                           -- lang, dir --
  profile     %URL;      #IMPLIED  -- named dictionary of meta info --
  >

Start tag: optional, End tag: optional

Attribute definitions

profile = url
This attribute specifies the location of one or more meta data profiles, separated by white space. For future extensions, user agents should consider the value to be a list even though this specification only considers the first URL to be significant. Profiles are discussed below in the section on meta information.

Attributes defined elsewhere

The HEAD element contains information about the current document, such as its title, keywords that may be useful to search engines, and other data that is not considered document content. Elements within the HEAD declaration must not be rendered by conforming user agents unless otherwise specified.

8.3.1 Titles: the TITLE element and the title attribute

<!-- The TITLE element is not considered part of the flow of text.
       It should be displayed, for example as the page header or
       window title. Exactly one title is required per document.
    -->
<!ELEMENT TITLE - -  (#PCDATA) -(%head.misc;) -- document title -->
<!ATTLIST TITLE %i18n>

Start tag: required, End tag: required

Attributes defined elsewhere

Every HTML document must have exactly one TITLE element in the HEAD section. User agents generally use the title to give people some idea about the document's contents, for example, by displaying the title as a caption, or speaking it.

For reasons of accessibility, user agents must always make the value of the TITLE element available to users (including TITLE elements that occur in frames). The mechanism for doing so depends on the user agent.

Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup. Here is a sample document title:

<HTML>
<HEAD>
<TITLE>A study of population dynamics</TITLE>
... other head elements...
</HEAD>
<BODY>
... document body...
</BODY>
</HTML>
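
As a further sketch of the rule on character entities, the following hypothetical title (its wording is an assumption, not taken from this specification) uses entities for an accented letter and for an ampersand, and contains no other markup:

```html
<TITLE>Caf&eacute; menus &amp; prices</TITLE>
```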

Related to the TITLE element is the title attribute.

Attribute definitions
title = cdata
This attribute offers advisory information about the element for which it is set.
Unlike the TITLE element, which provides information about an entire document and may only appear once, the title attribute may annotate any number of elements. Please consult an element's definition to verify that it supports this attribute. Values of the title attribute may be rendered by user agents in a variety of ways. For instance, visual browsers frequently display the title as a "tool tip" (a short message that appears when the pointing device pauses over an object). Audio user agents may speak the title information in a similar context. For example, setting the attribute on a link allows user agents (visual and non-visual) to tell users about the nature of the linked resource:
...some text...
Here's a photo of 
<A href="http://someplace.com/neatstuff.gif" title="Me scuba diving">
   me scuba diving last summer
</A>
...some more text...

The title attribute has an additional role when used with the LINK element to designate an external style sheet. Please consult the section on links and style sheets for details.
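
A minimal sketch of this role follows. The file names compact.css and large.css are hypothetical, and the rel value "alternate stylesheet" is assumed from the section on links and style sheets; please consult that section for the normative behavior. Here the title gives each style sheet a name under which a user agent may offer it to the user:

```html
<HEAD>
 <TITLE>A study of population dynamics</TITLE>
 <!-- Preferred style sheet: the title attribute gives it a name -->
 <LINK rel="stylesheet" type="text/css"
       title="Compact" href="compact.css">
 <!-- Alternate style sheet, offered under a different name -->
 <LINK rel="alternate stylesheet" type="text/css"
       title="Large print" href="large.css">
</HEAD>
```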

Note: To improve the quality of speech synthesis for cases handled poorly by standard techniques, future versions of HTML may include an attribute for encoding phonemic and prosodic information.

8.3.2 Meta information

As this specification is being written, a number of approaches are being proposed for allowing authors to assign richer machine-readable information about documents and other network-accessible resources to an HTML document.

The current HTML specification allows authors to assign meta data to their documents as follows:

Note that since a profile is defined for the HEAD element, the same profile applies to all META and LINK elements in the document head.

The META element 

<!ELEMENT META - O EMPTY        -- generic metainformation -->
<!ATTLIST META
  %i18n;                           -- lang, dir, for use with content string --
  http-equiv  NAME       #IMPLIED  -- HTTP response header name  --
  name        NAME       #IMPLIED  -- metainformation name --
  content     CDATA      #REQUIRED -- associated information --
  scheme      CDATA      #IMPLIED  -- select form of content --
  >

Start tag: required, End tag: forbidden

Attribute definitions

For the following attributes, the permitted values and their interpretation are profile dependent:

name = cdata
This attribute specifies a property name.
content = cdata
This attribute specifies a property's value.
scheme = cdata
This attribute names a scheme to be used to interpret the property's value.
http-equiv = cdata
This attribute may be used in place of the name attribute. HTTP servers use this attribute to gather information for HTTP response message headers.

Attributes defined elsewhere

The META element can be used to describe properties of a document (e.g., author, expiration date, a list of key words, etc.) and assign values to those properties. This specification does not define a normative set of properties.

The name attribute specifies a property and the content attribute specifies the property's value. For example, the following declaration sets a value for the Author property:

<META name="Author" content="Dave Raggett">

The lang attribute can be used with META to specify the language for the value of the content attribute. This enables speech synthesizers to apply language dependent pronunciation rules.

In this example, the author's name is declared to be French:

<META name="Author" lang="fr" content="Arnaud Le Hors">

Here's another example that illustrates how some user agents support the use of META to refresh the current page after a few seconds, perhaps replacing it with another page:

<META http-equiv="refresh" content="3,http://www.acme.com/intro.html">

The content is a number specifying the delay in seconds, followed by the URL to load when the time is up. This mechanism is generally used to show users a fleeting introductory page. However, since some user agents do not support this mechanism, authors should include content on the introductory page to allow users to navigate away from it (so they don't remain "stranded" on the introductory page).
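
The recommendation above might be followed as in this sketch of an introductory page. The page title and the wording of the link are hypothetical, and the use of the http-equiv form of the refresh declaration is an assumption about what deployed user agents recognize. The link in the body lets users of agents that ignore the refresh continue by hand:

```html
<HTML>
<HEAD>
 <TITLE>Welcome to Acme</TITLE>
 <!-- Ask the user agent to load intro.html after 3 seconds -->
 <META http-equiv="refresh" content="3,http://www.acme.com/intro.html">
</HEAD>
<BODY>
 <P>Welcome! If the next page does not load by itself, please
 <A href="http://www.acme.com/intro.html">continue to the introduction</A>.
</BODY>
</HTML>
```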

META and HTTP headers 

The http-equiv attribute can be used in place of the name attribute and has a special significance when documents are retrieved via the Hypertext Transfer Protocol (HTTP). HTTP servers may use the property name specified by the http-equiv attribute to create an [RFC822]-style header in the HTTP response. Please see the HTTP specification ([RFC2068]) for details on valid HTTP headers.

The following sample META declaration:

<META http-equiv="Expires" content="Tue, 20 Aug 1996 14:25:27 GMT">

will result in the HTTP header:

Expires: Tue, 20 Aug 1996 14:25:27 GMT

This can be used by caches to determine when to fetch a fresh copy of the associated document.

META and search engines 

A common use for META is to specify keywords that a search engine may use to improve the quality of search results. When several META elements provide language-dependent information about a document, search engines may filter on the lang attribute to display search results using the language preferences of the user. For example,

<META name="keywords" lang="en" 
         content="vacation,Greece,sunshine">
<META name="keywords" lang="fr" 
         content="vacances,Gr&egrave;ce,soleil">

The effectiveness of search engines can also be increased by using the LINK element to specify links to translations of the document in other languages, links to versions of the document in other media (e.g., PDF), and, when the document is part of a collection, links to an appropriate starting point for browsing the collection.
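
The three kinds of link just described might be declared as in the sketch below. All URLs and titles are hypothetical. The hreflang and media attributes and the link type "start" are assumed to be available on LINK; please consult the definition of the LINK element to verify this.

```html
<HEAD>
 <TITLE>The manual in English</TITLE>
 <!-- A translation of this document -->
 <LINK rel="alternate" type="text/html" hreflang="fr"
       title="Le manuel en fran&ccedil;ais"
       href="http://someplace.com/manual/french.html">
 <!-- The same document in another medium -->
 <LINK rel="alternate" type="application/pdf" media="print"
       title="The manual in PDF"
       href="http://someplace.com/manual/manual.pdf">
 <!-- Starting point for browsing the collection -->
 <LINK rel="start" type="text/html"
       title="The first page of the manual"
       href="http://someplace.com/manual/start.html">
</HEAD>
```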

META and PICS

The Platform for Internet Content Selection [PICS] is an infrastructure for associating labels (meta data) with Internet content. Originally designed to help parents and teachers control what children can access on the Internet, it also facilitates other uses for labels, including code signing, privacy, and intellectual property rights management.

This example illustrates how one can use a META declaration to include a PICS 1.1 label:

<HEAD>
 <META http-equiv="PICS-Label" content='
 (PICS-1.1 "http://www.gcf.org/v2.5"
    labels on "1994.11.05T08:15-0500"
      until "1995.12.31T23:59-0000"
      for "http://w3.org/PICS/Overview.html"
    ratings (suds 0.5 density 0 color/hue 1))
 '>
</HEAD>

META and default information

The META element may be used to specify the default information for a document in the following instances:

The following example specifies the character encoding for a document as being ISO-8859-5:

<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5"> 

Meta data profiles 

The profile attribute of the HEAD specifies the location of a meta data profile. The value of the profile attribute is a URL. User agents may use this URL in two ways:

This example refers to a hypothetical profile that defines useful properties for document indexing. The properties defined by this profile (including "author", "copyright", "keywords", and "date") have their values set by subsequent META declarations.

 <HEAD profile="http://www.acme.com/profiles/core">
  <TITLE>How to complete Memorandum cover sheets</TITLE>
  <META name="author" content="John Doe">
  <META name="copyright" content="&copy; 1997 Acme Corp.">
  <META name="keywords" content="corporate,guidelines,cataloging">
  <META name="date" content="1994-11-06T08:49:37+00:00">
 </HEAD>

As this specification is being written, it is common practice to use the date formats described in [RFC2068]. As these formats are relatively hard to process, we recommend that authors use the [ISO8601] date format. For more information, see the sections on the INS and DEL elements.

The scheme attribute is used to identify the expected format of the value of the content attribute, for cases when a property supports multiple formats. The values permitted for the scheme attribute depend on the property name and the profile.

The first META declaration in the following example refers to the Dewey Decimal System (dds) scheme. The second refers to the ISBN scheme.

<META scheme="dds" name="description" 
         content="04.251 Supercomputers systems design">
<META scheme="ISBN"  name="identifier" content="0-8230-2355-9">

Note: One sample profile is the Dublin Core [DCORE]. This profile defines a set of recommended properties for electronic bibliographic descriptions, and is intended to promote interoperability among disparate description models.

8.4 The BODY element

<!ENTITY % block "%blocklevel; | %inline;">

<!ENTITY % Color "CDATA" -- a color using sRGB: #RRGGBB as Hex values -->

<!-- There are also 16 widely known color names with their sRGB values:

    Black  = #000000    Green  = #008000
    Silver = #C0C0C0    Lime   = #00FF00
    Gray   = #808080    Olive  = #808000
    White  = #FFFFFF    Yellow = #FFFF00
    Maroon = #800000    Navy   = #000080
    Red    = #FF0000    Blue   = #0000FF
    Purple = #800080    Teal   = #008080
    Fuchsia= #FF00FF    Aqua   = #00FFFF
 -->

<!ENTITY % bodycolors "
  bgcolor %Color;   #IMPLIED
  text    %Color;   #IMPLIED
  link    %Color;   #IMPLIED
  vlink   %Color;   #IMPLIED
  alink   %Color;   #IMPLIED
  ">

<!ELEMENT BODY O O  (%block;)+ +(INS|DEL) -- document body -->
<!ATTLIST BODY
  %attrs;                          -- %coreattrs, %i18n, %events --
  background  %URL;      #IMPLIED  -- texture tile for document background --
  %bodycolors;                     -- bgcolor, text, link, vlink, alink --
  onload      %Script;   #IMPLIED  -- the document has been loaded --
  onunload    %Script;   #IMPLIED  -- the document has been removed --
  >

Start tag: optional, End tag: optional

Attribute definitions

background = url
Deprecated. The value of this attribute is a URL that designates an image resource. The image generally tiles the background (for visual browsers).
text = color
Deprecated. This attribute sets the foreground color for text (for visual browsers).
link = color
Deprecated. This attribute sets the color of text marking unvisited hypertext links (for visual browsers).
vlink = color
Deprecated. This attribute sets the color of text marking visited hypertext links (for visual browsers).
alink = color
Deprecated. This attribute sets the color of text marking hypertext links when selected by the user (for visual browsers).

Attributes defined elsewhere

The body of a document contains the document's content. The content may be presented by a user agent in a variety of ways. For example, for visual browsers, you can think of the body as a canvas where the content appears: text, images, colors, graphics, etc. For audio user agents, the same content may be spoken. Since style sheets are now the preferred way to specify a document's presentation, the presentational attributes of BODY have been deprecated.

The following line of HTML illustrates the use of the deprecated attributes. It sets the background color of the canvas to white, the text foreground color to black, and the color of hyperlinks to red initially, fuchsia when activated, and maroon once visited.

DEPRECATED EXAMPLE:

<HTML>
<HEAD>
 <TITLE>A study of population dynamics</TITLE>
</HEAD>
<BODY bgcolor="white" text="black"
  link="red" alink="fuchsia" vlink="maroon">
  ... document body...
</BODY>
</HTML>

Using style sheets, the same effect could be accomplished as follows:

<HTML>
<HEAD>
 <TITLE>A study of population dynamics</TITLE>
 <STYLE type="text/css">
  BODY { background: white; color: black}
  A:link { color: red }
  A:visited { color: maroon }
  A:active { color: fuchsia }
 </STYLE>
</HEAD>
<BODY>
  ... document body...
</BODY>
</HTML>

Using external (linked) style sheets gives you the flexibility to change the presentation without revising the source HTML document:

<HTML>
<HEAD>
 <TITLE>A study of population dynamics</TITLE>
 <LINK rel="stylesheet" type="text/css" href="smartstyle.css">
</HEAD>
<BODY>
  ... document body...
</BODY>
</HTML>

Framesets and HTML bodies. Documents that contain framesets replace the BODY element with the FRAMESET element. Please consult the section on frames for more information.
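
As a minimal sketch, a frameset document might look as follows. The frame sources contents.html and chapter1.html and the column widths are hypothetical; the section on frames gives the actual definitions of FRAMESET and FRAME.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">
<HTML>
<HEAD>
 <TITLE>A study of population dynamics</TITLE>
</HEAD>
<!-- FRAMESET takes the place of BODY -->
<FRAMESET cols="30%,70%">
 <FRAME src="contents.html">
 <FRAME src="chapter1.html">
</FRAMESET>
</HTML>
```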

8.4.1 Element identifiers: the id and class attributes

Attribute definitions

id = name
This attribute assigns a document-wide name to a specific instance of an element. Values for id must be unique within a document. Furthermore, this attribute shares the same name space as the name attribute.
class = cdata-list
This attribute assigns a class or set of classes to a specific instance of an element. Any number of elements may be assigned the same class name or names. Multiple class names must be separated by white space characters.

The id and class attributes assign identifiers to an element instance.

An identifier specified by id must be unique within a document. A class name specified by class may be shared by several element instances. Class values should be chosen to distinguish the role of the element the class is associated with, e.g., note, example, warning.

These attributes can be used in the following ways:

Almost every HTML element may be assigned identifier and class information.

Suppose, for example, that we are writing a document about a programming language. The document is to include a number of preformatted examples. We use the PRE element to format the examples. We also assign a background color (green) to all instances of the PRE element belonging to the class "example".

<HEAD>
<STYLE type="text/css">
PRE.example { background : green }
</STYLE>
</HEAD>
<BODY>
<PRE class="example" id="example-1">
...example code here...
</PRE>
</BODY>

By setting the id attribute for this example, we can (1) create a hyperlink to it and (2) override class style information with instance style information.
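
Both uses might look as in the following sketch, which assumes CSS as the style sheet language. The yellow background and the wording of the link are hypothetical. The rule selected by id applies only to this one element and takes precedence over the rule for the class "example":

```html
<HEAD>
<STYLE type="text/css">
PRE.example { background : green }
/* Instance rule: applies only to the element whose id is "example-1" */
#example-1 { background : yellow }
</STYLE>
</HEAD>
<BODY>
<P>See <A href="#example-1">the first example</A>.
<PRE class="example" id="example-1">
...example code here...
</PRE>
</BODY>
```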

8.4.2 Grouping elements: the DIV and SPAN elements

<!ELEMENT DIV - - (%block;)+ -- generic language/style container -->
<!ATTLIST DIV
  %attrs;                          -- %coreattrs, %i18n, %events --
  %align;                          -- align, text alignment --
  >
<!ELEMENT SPAN - - (%inline;)*      -- generic language/style container -->
<!ATTLIST SPAN
  %attrs;                          -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These are the only two HTML elements that do not add presentation to their enclosed content. Thus, by creating instances and classes of elements and applying style sheets to them, authors may specialize HTML according to their needs and tastes.

Suppose we wanted to generate a document from a database of client information. Since HTML does not include elements that identify objects such as "client", "telephone number", "email address", etc., we use DIV and SPAN to tailor HTML to our own needs.

In this example, every client's last name belongs to the class "client-last-name", etc. We also assign a unique identifier to each client ("client-boyera", "client-lafon", etc.).

<DIV id="client-boyera" class="client">
Last name: <SPAN class="client-last-name">Boyera</SPAN>,
First name: <SPAN class="client-first-name">Stephane</SPAN>,
Tel: <SPAN class="client-tel">(212) 555-1212</SPAN>,
Email: <SPAN class="client-email">sb@foo.org</SPAN>
</DIV>

<DIV id="client-lafon" class="client">
Last name: <SPAN class="client-last-name">Lafon</SPAN>,
First name: <SPAN class="client-first-name">Yves</SPAN>,
Tel: <SPAN class="client-tel">(671) 555-1212</SPAN>,
Email: <SPAN class="client-email">yves@bar.com</SPAN>
</DIV>

Later, we may easily add style sheet declarations to fine-tune the presentation of these database entries.
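
For instance, assuming CSS as the style sheet language, a few rules such as these could be added to the document head. The particular properties and values are illustrative choices only:

```html
<HEAD>
<STYLE type="text/css">
/* Separate one client entry from the next */
DIV.client { margin-bottom: 1em }
/* Emphasize last names; set email addresses in a fixed-width font */
SPAN.client-last-name { font-weight: bold }
SPAN.client-email { font-family: monospace }
</STYLE>
</HEAD>
```

Because the rules select on class names, they apply to every client entry without any change to the entries themselves.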

SPAN is an inline element and can be used within paragraphs, list items, etc. when you want to assign class or language information to a group of words. SPAN cannot be used to group block-level elements. SPAN has no inherent effect on rendering until you apply a style, e.g., via a style attribute, or a linked style sheet.
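
A short sketch of both uses follows. The class name "foreign" and the sample sentences are hypothetical. The first SPAN only carries language and class information; the second changes the rendering because a style is applied to it:

```html
<P>The phrase <SPAN lang="fr" class="foreign">raison d'&ecirc;tre</SPAN>
is borrowed from French.
<P>This word is <SPAN style="color: red">red</SPAN> only because a
style has been applied to the SPAN.
```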

DIV, by contrast, is a block-level element. It can be used to group other block-level elements, but cannot be used within paragraph elements. A DIV element following an unclosed P element will terminate that paragraph.

User agents generally place a line break before and after DIV elements, for instance:

<P>aaaaaaaaa<DIV>bbbbbbbbb</DIV><DIV>ccccc<P>ccccc</DIV>

This is typically rendered as:

aaaaaaaaa
bbbbbbbbb
ccccc

ccccc

8.4.3 Headings: The H1, H2, H3, H4, H5, H6 elements

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!--
  There are six levels of headings from H1 (the most important)
  to H6 (the least important).
-->

<!ELEMENT (%heading;)  - -  (%inline;)* -- heading -->
<!ATTLIST (%heading;)
  %attrs;                          -- %coreattrs, %i18n, %events --
  %align;                          -- align, text alignment --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

A heading element briefly describes the topic of the section it introduces. Heading information may be used by user agents, for example, to construct a table of contents for a document automatically.

There are six levels of headings in HTML with H1 as the most important and H6 as the least. Visual browsers usually render more important headings in larger fonts than less important ones.

The following example shows how to use the DIV element to associate a heading with the document section that follows it. Doing so allows you to define a style for the section (color the background, set the font, etc.) with style sheets.

<DIV class="section" id="forest-elephants" >
<H1>Forest elephants</H1>
In this section, we discuss the lesser known forest elephants.
...this section continues...
<DIV class="subsection" id="forest-habitat" >
<H2>Habitat</H2>
Forest elephants do not live in trees but among them.
...this subsection continues...
</DIV>
</DIV>

This structure may be decorated with style information such as:

<HEAD>
<STYLE type="text/css">
DIV.section { text-align: justify; font-size: 12pt}
DIV.subsection { text-indent: 2em }
H1 { font-style: italic; color: green }
H2 { color: green }
</STYLE>
</HEAD>

Numbered sections and references

HTML does not itself cause section numbers to be generated from headings. This facility may be offered by user agents, however. Soon, style sheet languages such as CSS will allow authors to control the generation of section numbers (handy for forward references in printed documents, as in "See section 7.2").

Some people consider skipping heading levels to be bad practice. They accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading level H2 is skipped.
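
The distinction can be sketched with a hypothetical outline (the heading texts are illustrative only). The first four headings move between levels without a gap; the last one is the kind of jump that is objected to:

```html
<H1>Forest elephants</H1>
<H2>Habitat</H2>
<H2>Diet</H2>
<H1>Savanna elephants</H1>
<!-- Skipped level: H3 directly under H1, with no H2 between them -->
<H3>Habitat</H3>
```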

8.4.4 The ADDRESS element

<!ELEMENT ADDRESS - - ((%inline;) | P)+ -- information on author -->
<!ATTLIST ADDRESS
  %attrs;                          -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

For lack of a better place, we include the definition of the ADDRESS element here. This element adds author and contact information to a document, e.g.,

<ADDRESS>
Newsletter editor<BR>
J. R. Brown<BR>
8723 Buena Vista, Smallville, CT 01234<BR>
Tel: +1 (123) 456 7890
</ADDRESS>