Hypertext Markup Language (HTML)

An Application Conforming to International Standard ISO 8879 -- Standard Generalized Markup Language

About this Document

This document describes the current practice and current proposals for future standardisation of HTML, as a basis for review and enhancement.

The document is a draft form of a standard for interchange of information on the network which is proposed to be registered as a MIME (RFC1521) content type.

Please send comments to connolly@hal.com or the discussion list www-html@info.cern.ch.

Version

This is version 2.0 of this document. It introduces forms for user input of information. This feature is known as a level 2 feature of HTML. All other specified features are known as level 1 features. Features of higher levels that are under discussion (such as tables, figures, and mathematical formulae) are, where mentioned, described as "proposed".

The latest version of this document is currently available in hypertext on the World-Wide Web as http://www.w3.org/hypertext/WWW/MarkUp/HTML.html

Abstract

HyperText Markup Language (HTML) can be used to represent

Hypertext news, mail, online documentation, and collaborative hypermedia;
Menus of options;
Database query results;
Simple structured documents with inlined graphics;
Hypertext views of existing bodies of information.

The World Wide Web (W3) initiative links related information throughout the globe. HTML provides one simple format for providing linked information, and all W3 compatible programs are required to be capable of handling HTML. W3 uses an Internet protocol (Hypertext Transfer Protocol, HTTP), which allows transfer representations to be negotiated between client and server, the result being returned in an extended MIME message. HTML is therefore just one, but an important one, of the representations used with W3.

HTML is proposed as a MIME content type.

HTML refers to the URI specification RFCxxxx.

Implementations of HTML parsers and generators can be found in the various W3 servers and browsers, in the public domain W3 code, and may also be built using various public domain SGML parsers such as [SGMLS]. HTML documents are SGML documents with fairly generic semantics appropriate for representing information from a wide range of applications.

Status of this memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are working documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

Distribution of this document is unlimited.

Vocabulary

This specification uses the words below with the precise meaning given.

Representation: The encoding of information for interchange. For example, HTML is a representation of hypertext.
Rendering: The form in which information is presented to the human reader.

Imperatives

may: The implementation is not obliged to follow this in any way.
must: If this is not followed, the implementation does not conform to this specification.
shall: as "must"
should: If this is not followed, though the implementation officially conforms to the standard, undesirable results may occur in practice.
typical: Typical rendering is described for many elements. This is not a mandatory part of the standard but is given as guidance for designers and to help explain the uses for which the elements were intended.

Notes

Sections marked "Note:" are not mandatory parts of the specification but are for guidance only.

Status of features

Mandatory: These features must be implemented in the rendering. Features are mandatory unless otherwise mentioned.
Optional: Standard HTML features which may safely be ignored by parsers. It is legal to ignore these and treat the contents as though the tags were not there (e.g. EM, and processing instructions). Authors should be aware that these features may be ignored by some applications.
Proposed: The specification of these features is not final. They should not be regarded as part of the standard, but indicate possible directions for future versions.
Obsolete: Not standard HTML. Parsers should implement these features as far as possible in order to preserve back-compatibility with previous versions of this specification.

HTML and MIME

The definition of the HTML content subtype is

MIME type name: text
MIME subtype name: html
Required parameters: none
Optional parameters: level, version, charset

Level

The level parameter specifies the feature set which is used in the document. The level is an integer number, implying that any features of the same or lower level may be present in the document. Levels are defined by this specification.

Version

In order to help avoid future compatibility problems, the version parameter may be used to give the version number of this specification to which the document conforms. The version number appears at the front of this document and within the public identifier for the SGML DTD.
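As an illustration of how a receiving program might read the level and version parameters, the following Python sketch splits a content type value into the type and its parameters using the standard library. The header value shown and the helper name html_content_type are assumptions made for this example; they are not defined by this specification.

```python
from email.message import Message


def html_content_type(header_value):
    """Split a Content-Type value into (type, parameter dict).

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns the type itself as the first entry; skip it.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


ctype, params = html_content_type("text/html; level=2; version=2.0")
level = int(params.get("level", "1"))  # assumed default when level is absent
```

A reader that only implements level 1 could compare the integer level against its own supported level before deciding how to treat the document.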

Character sets

The base character set (the SGML BASESET) for HTML is ISO Latin-1. This is the set referred to by any numeric character references. The actual character set used in the representation of an HTML document may be ISO Latin-1, or its 7-bit subset, which is ASCII. There is no obligation for an HTML document to contain any characters above decimal 127. It is possible that a transport medium such as electronic mail imposes constraints on the number of bits in a representation of a document, though the HTTP access protocol used by W3 always allows 8 bit transfer.

When an HTML document is encoded using 7-bit characters, then the mechanisms of character references and entity references may be used to encode characters in the upper half of the ISO Latin-1 set. In this way, documents may be prepared which are suitable for mailing through 7-bit limited systems.
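The 7-bit encoding described above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name to_seven_bit is hypothetical, and the sketch uses numeric character references only (named entity references would serve equally well). It does not escape markup characters such as < and &.

```python
def to_seven_bit(text):
    """Replace characters above decimal 127 with numeric character references.

    Only ISO Latin-1 characters (code points 0 to 255) are accepted,
    since that is the base character set referred to by the references.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError("character outside ISO Latin-1: %r" % ch)
        out.append(ch if code < 128 else "&#%d;" % code)
    return "".join(out)


encoded = to_seven_bit("Kurt G\u00f6del")
```

The result contains only ASCII characters and so can pass through a mail system limited to 7 bits.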

Character set option (proposed)

The SGML declaration specifies ISO Latin-1 as the base character set. The charset parameter is reserved for future use. Its intended significance is to override the base character set of the SGML declaration. Support of character sets other than ISO Latin-1 is not a requirement for conformance with this specification.

HTML and SGML

This section describes the relationship between HTML and SGML, and guides the newcomer through interpretation of the DTD. (This is not a full tutorial on SGML, and in the event of any apparent conflict, the SGML standard is definitive.)

The HyperText Markup Language is an application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language [SGML]. SGML is a system for defining structured document types, and markup languages to represent instances of those document types.

Every SGML document has three parts:

An SGML declaration, which binds SGML processing quantities and syntax token names to specific values. For example, the SGML declaration in the HTML DTD specifies that the string that opens an end tag is </ and the maximum length of a name is 34 characters.
A prologue including one or more document type declarations, which specify the element types, element relationships and attributes, and references that can be represented by markup. The HTML DTD specifies, for example, that the HEAD element contains at most one TITLE element.
An instance, which contains the data and markup of the document.

We use the term HTML to mean both the document type and the markup language for representing instances of that document type.

The SGML declaration for HTML is given in the appendix ``SGML Declaration for HTML.'' It is implicit among WWW implementations.

The prologue for an HTML document should look like:

      <!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">

NOTE 2: Many extant HTML documents do not contain a prologue. Implementations are encouraged to infer the above prologue if the document does not begin with <!.

Structured Text

An HTML instance is like a text file, except that some of the characters are interpreted as markup. The markup gives structure to the document.

The instance represents a hierarchy of elements. Each element has a name, some attributes, and some content. Most elements are represented in the document as a start tag, which gives the name and attributes, followed by the content, followed by the end tag. For example:

	<!DOCTYPE HTML PUBLIC
	 	"-//W3 Organization//DTD W3 HTML 2.0//EN">
	<HTML>
	  <HEAD>
	    <TITLE>
	      A sample HTML document
	    </TITLE>
	  </HEAD>

	  <BODY>
	    <H1>
	      An Example of Structure
	      <br>
	      In HTML
	    </H1>
	    <P>
	      Here's a typical paragraph.
	    <UL>
	      <LI>
	        Item one has an
	        <A NAME="anchor">
	          anchor
	        </A>
	      <LI>
	        Here's item two.
	    </UL>
	  </BODY>
	</HTML>

Some elements (e.g. BR) are empty. They have no content. They show up as just a start tag.

For the rest of the elements, the content is a sequence of data characters and nested elements. Some things such as forms and anchors cannot be nested, in which case this is mentioned in the text. Anchors and character highlighting may be put inside other constructs.

Undefined tag and attribute names

It is a principle to be conservative in that which one produces, and liberal in that which one accepts. HTML parsers should be liberal except when verifying code. HTML generators should generate strictly conforming HTML.

WWW applications that read HTML documents and encounter tag or attribute names which they do not understand should behave, in the case of a tag, as though the whole tag had not been there but its content had, and in the case of an attribute, as though the attribute had not been present.
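The liberal behaviour described above can be sketched with Python's standard html.parser module. The sets KNOWN_TAGS and KNOWN_ATTRS below are a small illustrative subset chosen for the example, not the full HTML DTD, and the class and function names are hypothetical. Comments and declarations are simply dropped by this sketch.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative subsets only; a real reader would derive these from the DTD.
KNOWN_TAGS = {"html", "head", "title", "body", "h1", "p", "a", "ul", "li", "br"}
KNOWN_ATTRS = {"a": {"href", "name"}}


class LiberalFilter(HTMLParser):
    """Re-emit markup, dropping unknown tags (keeping their content)
    and unknown attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            return  # behave as though the tag were not there
        allowed = KNOWN_ATTRS.get(tag, set())
        kept = "".join(
            ' %s="%s"' % (name.upper(), escape(value, quote=True))
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.out.append("<%s%s>" % (tag.upper(), kept))

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append("</%s>" % tag.upper())

    def handle_data(self, data):
        self.out.append(data)  # content is always kept

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def filter_unknown(markup):
    parser = LiberalFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


result = filter_unknown('<P>Some <BLINK>urgent</BLINK> <A HREF="x.html" GLOW="yes">news</A>')
```

Here BLINK and GLOW stand in for names the reader does not understand: the text "urgent" survives, while the unknown tag and attribute disappear.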

Character Data

The characters between the tags represent text in the ISO-Latin-1 character set, which is a superset of ASCII. Because certain characters will be interpreted as markup, they should be "escaped"; that is, represented by markup -- entity or numeric character references. For example:

                When a&#60;b, we can show that...
                Brought to you by AT&amp;T

The HTML DTD includes entities for each of the non-ASCII characters so that one may reference them by name if it is inconvenient to enter them directly:

           Kurt G&ouml;del was a famous logician and mathematician.

NOTE 1: To ensure that a string of characters has no markup, it is sufficient to represent all occurrences of <, >, and & by character or entity references.
NOTE 2: There are SGML features (CDATA, RCDATA) to allow most <, >, and & characters to be entered without the use of entity or character references. Because these features tend to be used and implemented inconsistently, and because they require 8-bit characters to represent non-ASCII characters, they are not employed in this version of the HTML DTD. An earlier HTML specification included an XMP element whose syntax is not expressible in SGML. Inside the XMP, no markup was recognized except the </XMP> end tag. While implementations are encouraged to support this idiom, its use is obsolete.
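The rule in NOTE 1 can be sketched in Python as below. The function name escape_text is hypothetical. The ampersand is replaced first so that the references produced for < and > are not themselves escaped a second time; the standard library function html.escape with quote=False performs the same three replacements.

```python
import html


def escape_text(text):
    """Represent &, < and > by entity references so the text carries no markup."""
    text = text.replace("&", "&amp;")  # must come first
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


safe_one = escape_text("When a<b, we can show that...")
safe_two = escape_text("Brought to you by AT&T")
```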

Comments

To include comments in an HTML document that will be ignored by the parser, surround them with <!-- and -->. After the comment delimiter, all text up to the next occurrence of -- is ignored. Hence comments cannot be nested. Whitespace is allowed between the closing -- and >. (But not between the opening <! and --.)

For example:

<HEAD>
<TITLE>HTML Guide: Recommended Usage</TITLE>
<!-- Id: Text.html,v 1.6 1994/04/25 17:33:48 connolly Exp -->
</HEAD>

NOTE 3: Some historical implementations incorrectly consider a > sign to terminate a comment.
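The comment rules above can be approximated with a regular expression, as in this Python sketch. It is an approximation for illustration, not a full SGML comment parser, and the name strip_comments is hypothetical. The pattern requires <!-- with no intervening whitespace, skips to the next --, and allows whitespace before the closing >, so a bare > inside the comment does not end it.

```python
import re

# "<!--", then anything up to the next "--", optional whitespace, then ">".
COMMENT = re.compile(r"<!--.*?--\s*>", re.DOTALL)


def strip_comments(markup):
    """Remove comments of the form described in the text above."""
    return COMMENT.sub("", markup)


cleaned = strip_comments("<HEAD>\n<TITLE>Guide</TITLE>\n<!-- a > b -- >\n</HEAD>")
```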

HTML Elements

This is a discussion of the elements in the HTML language, and how they interact to represent documents.

The HTML Document Element

An HTML document is organized as a HEAD and a BODY, much like a memo or a mail message.

The HEAD element is a small unordered collection of information about the document, whereas the BODY is an ordered sequence of information elements of arbitrary length. This organization allows an implementation to determine certain properties of a document -- the title, for example -- without parsing the entire document.

Information in the HEAD Element

TITLE: The title of the document
ISINDEX: Sent by a server in a searchable document
NEXTID: A parameter used by editors to generate unique identifiers
LINK: Relationship between this document and another. See also the Anchor element, Relationships. A document may have many LINK elements.
BASE: A record of the URL of the document when saved

Proposed head elements

EXPIRES: The date after which the document is invalid. Semantics as in the HTTP specification.

Obsolete head elements

META: A wrapper for an HTTP element

Body Elements (level 1)

The order of the contents of the BODY element should be preserved when it is rendered on the output device.

Hypertext Anchors

Anchors: Sections of text which form the beginning and/or end of hypertext links are called "anchors" and defined by the A tag.

Block Elements

These elements typically stack vertically in the rendered flow of text. Whitespace between them is ignored.

Headings: Several levels of heading are supported.
Paragraph: The P element represents a paragraph.
Horizontal Rule: A horizontal dividing line
Address style: Used to represent authorship or status of a document
Blockquote style: A block of text quoted from another source.
Lists: Bulleted lists, glossaries, etc.
Preformatted text: Sections in fixed-width font for preformatted text.

Inline Elements

These elements fall left to right in the rendered flow of text. Whitespace between them separates words, except in the PRE element, where it has its literal ASCII meaning.

Special Phrases: Emphasis, typographic distinctions, etc.
Line Breaks: Indicates a line break in a flow of text.
IMG: The IMG tag allows inline graphics.

Body elements (level 2)

Elements for forms

The FORM element and various other elements allowed only within it describe forms which allow user input.

FORM elements: FORM, INPUT, SELECT, OPTION, TEXTAREA, etc

Obsolete elements

The other elements are obsolete but should be recognised by parsers for back-compatibility.

HEAD

The HEAD element contains all information about the document in general. It does not contain any text which is part of the document: this is in the BODY. Within the head element, only certain elements are allowed.

BODY

The BODY element contains all the information which is part of the document, as opposed to information about the document, which is in the HEAD.

The elements within the BODY element are in the order in which they should be presented to the reader.

See the list of things which are allowed within a BODY element.

Anchors

An anchor is a piece of text which marks the beginning and/or the end of a hypertext link.

The text between the opening tag and the closing tag is either the start or destination (or both) of a link. Attributes of the anchor tag are as follows.

HREF: OPTIONAL. If the HREF attribute is present, the anchor is sensitive text: the start of a link. If the reader selects this text, (s)he should be presented with another document whose network address is defined by the value of the HREF attribute. The format of the network address is specified elsewhere. This allows for the form HREF="#identifier" to refer to another anchor in the same document. If the anchor is in another document, the attribute may be a relative name, relative to the document's address (or specified base address if any). NOTE: This refers to the URI specification, which does not cover relative addresses. There is no specification of how to distinguish relative addresses from absolute addresses.
NAME: OPTIONAL. If present, the attribute NAME allows the anchor to be the destination of a link. The value of the attribute is an identifier for the anchor. Identifiers are arbitrary strings but must be unique within the HTML document. Another document can then make a reference explicitly to this anchor by putting the identifier after the address, separated by a hash sign. NOTE: This feature is representable in SGML as an ID attribute, if we restrict the identifiers to be SGML names.
REL: OPTIONAL. An attribute REL may give the relationship(s) described by the hypertext link. The value is a comma-separated list of relationship values. Values and their semantics will be registered by the HTML registration authority. The default relationship if none other is given is void. REL should not be present unless HREF is present. See Relationship values, REV.
REV: OPTIONAL. The same as REL, but the semantics of the link type are in the reverse direction. A link from A to B with REL="X" expresses the same relationship as a link from B to A with REV="X". An anchor may have both REL and REV attributes.
URN: OPTIONAL. If present, this specifies a uniform resource number for the document. See note.
TITLE: OPTIONAL. This is informational only. If present the value of this field should equal the value of the TITLE of the document whose address is given by the HREF attribute. See note.
METHODS: OPTIONAL. The value of this field is a string which if present must be a comma separated list of HTTP METHODS supported by the object for public use. See note.

All attributes are optional, although one of NAME and HREF is necessary for the anchor to be useful. See also: LINK.

Example of use:

	See <A HREF="http://www.w3.org/">CERN</A>'s information for
	more details.

	A <A NAME=serious>serious</A> crime is one which is associated
	with imprisonment. 
			...
	The Organization may refuse employment to anyone convicted
	of a <a href="#serious">serious</A> crime.

Address

This element is for address information, signatures, authorship, etc, often at the top or bottom of a document.

Typical rendering

Typically, an address element is italic and/or right justified or indented. The address element implies a paragraph break. Paragraph marks within the address element do not cause extra white space to be inserted.

Examples of use:

		<ADDRESS><A HREF="Author.html">A.N.Other</A></ADDRESS>


		<ADDRESS>
		Newsletter editor<p>
		J.R. Brown<p>
		JimquickPost News, Jumquick, CT 01234<p>
		Tel (123) 456 7890
		</ADDRESS>

BASE

This element allows the URL of the document itself to be recorded in situations in which the document may be read out of context. URLs within the document may be in a "partial" form relative to this base address.

Where the base address is not specified, the reader will use the URL it used to access the document to resolve any relative URLs.
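The resolution order described above (the BASE address if given, otherwise the URL used to retrieve the document) can be sketched in Python. This is an illustration under stated assumptions: the function name resolve and its parameters are hypothetical, and the joining rules are those of Python's urllib.parse, which follow later URL specifications rather than anything defined in this document.

```python
from urllib.parse import urldefrag, urljoin


def resolve(href, base_href=None, retrieved_from=None):
    """Resolve an HREF value to (absolute URL, anchor identifier).

    base_href is the value of the BASE element's HREF, if any;
    retrieved_from is the URL the reader used to fetch the document.
    """
    base = base_href or retrieved_from
    absolute = urljoin(base, href)
    url, fragment = urldefrag(absolute)  # split off the part after the hash sign
    return url, fragment


doc = "http://www.w3.org/hypertext/WWW/MarkUp/HTML.html"
other = resolve("Text.html#serious", retrieved_from=doc)
same = resolve("#serious", retrieved_from=doc)
```

In the second call the HREF has the form "#identifier", so the resolved URL is that of the document itself and only the anchor identifier changes.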

The one attribute is:

HREF: the URL

Line Break

The line break element marks that a new line must be started at the given point.

Typical rendering

A new line with indent the same as that of line-wrapped text.

Examples

		<ADDRESS>Tim Berners-Lee<BR>
		World Wide Web project<BR>
		CERN<BR>1211 Geneva 23<BR>Switzerland
		</ADDRESS>

		I think that I shall never see<BR>
		A hoarding lovely as a tree<BR>
		In fact, unless the hoardings fall<BR>
		I'll never see a tree at all.<P>

BLOCKQUOTE

The BLOCKQUOTE element allows text quoted from another source to be rendered specially.

Typical rendering

A typical rendering might be a slight extra left and right indent, and/or italic font. BLOCKQUOTE causes a paragraph break, and typically a line or so of white space will be allowed between it and any text before or after it.

Single-font rendition may for example put a vertical line of ">" characters down the left margin to indicate quotation in the Internet mail style.

Example

I think it ends
<BLOCKQUOTE>Soft you now, the fair Ophelia. Nymph, in thy orisons, 
be all my sins remembered.
</BLOCKQUOTE>
but I am not sure.

Fill-out Forms and Input fields

Forms are composed by placing input fields within paragraphs, preformatted/literal text, lists and tables. This gives considerable scope in designing the layout of forms. The form features use the following elements which are all known as HTML level 2 elements.

FORM: a form within a document.
INPUT: one input field
TEXTAREA: a multiline input field
SELECT: A selection from a finite set of options
OPTION: one option within a SELECT

Each field is defined by an INPUT element and must have a NAME attribute which uniquely names the field in the document. Additional optional attributes can be used to specify the type of the field (defaults to free text), its size/precision, its initial value and whether the field is currently disabled or in error:

<FORM ACTION="mailto:www_admin@info.cern.ch">
<MH HIDDEN>Subject: WWW Questionnaire</MH>
Please help us to improve the World Wide Web by filling in the
following questionnaire:
<P>Your organization? <INPUT NAME="org" SIZE="48">
<P>Commercial? <INPUT NAME="commerce" TYPE=checkbox>
How many users? <INPUT NAME="users" TYPE=int>
<P>Which browsers do you use?
<UL>
<LI>X Mosaic <INPUT NAME="browsers" TYPE=checkbox VALUE="xmosaic">
<LI>Cello <INPUT NAME="browsers" TYPE=checkbox VALUE="cello">
<LI>Others <TEXTAREA NAME="others" COLS=48 ROWS=4></TEXTAREA>
</UL>
A contact point for your site: <INPUT NAME="contact" SIZE="42">
<P>Many thanks on behalf of the WWW central support team.
<P ALIGN=CENTER><INPUT TYPE=submit> <INPUT TYPE=reset>
</FORM>

This fictitious example is a questionnaire that will be emailed to www_admin@info.cern.ch. The FORM element is used to delimit the form. There can be several forms in a single document, but the FORM element can't be nested. The ACTION attribute specifies a URL that designates an HTTP server or an email address. If missing, the URL for the document itself will be assumed. The effect of the action can be modified by including a method prefix, e.g. ACTION="POST http://....". This prefix is used to select the HTTP method when sending the form's contents to an HTTP server. Would it be cleaner to use a separate attribute, e.g. METHOD?

Servers can disable forms by sending an appropriate header or by an attribute on the optional HTMLPLUS element at the very start of the document, e.g. <htmlplus forms=off>.

Here, the <P> and <UL> elements have been used to lay out the text (and input fields). The browser has changed the background color within the FORM element to distinguish the form from other parts of the document. The browser is responsible for handling the input focus, i.e. which field will currently get keyboard input.

For many platforms there will be existing conventions for forms, e.g. Tab and shift-Tab keys to move the keyboard focus forwards and backwards between fields, while an Enter key submits the form. In the example, the Submit and Reset buttons are specified explicitly with special purpose fields. The Submit button is used to email the form or send its contents to the server as specified by the ACTION attribute, while the Reset button resets the fields to their initial values. When the form consists of a single text field, it may be appropriate to leave such buttons out and rely on the Enter key.

The INPUT element is used for a large variety of types of input fields.

When you need to let users enter more than one line of text, you should use the TEXTAREA element.

The RADIO and CHECKBOX types of INPUT field can be used to specify multiple choice forms in which every alternative is visible as part of the form. An alternative is to use the SELECT element which is generally rendered in a more compact fashion as a pull down combo list.

FORM

The FORM element is used to delimit the form. There can be several forms in a single document, but the FORM element can't be nested.

The ACTION attribute specifies a URL that designates an HTTP server or an email address. If missing, the URL for the document itself will be assumed. The effect of the action can be modified by including a method prefix, e.g. ACTION="POST http://....". This prefix is used to select the HTTP method when sending the form's contents to an HTTP server. Would it be cleaner to use a separate attribute, e.g. METHOD?

INPUT

The INPUT element represents a field whose contents may be edited by the user. It has the following attributes.

NAME: Symbolic name used when transferring the form's contents. This attribute is always needed and should uniquely identify this field.
TYPE: Defines the type of data the field accepts. Defaults to free text.
SIZE: Specifies the size or precision of the field according to its type.
MAXLENGTH: The maximum number of characters that will be accepted as input. This can be greater than that specified by SIZE, in which case the field will scroll appropriately. The default is unlimited.
VALUE: The initial value for the field, or the value when checked for checkboxes and radio buttons. This attribute is required for radio buttons.
SRC: A URL or URN specifying an image - for use only with TYPE=IMAGEMAP.
ALIGN: Vertical alignment of the image - for use only with TYPE=IMAGEMAP.

Proposed

CHECKED: When present indicates that a checkbox or radio button is selected.
DISABLED: When present indicates that this field is temporarily disabled. Browsers should show this by "greying it out" in some manner.
ERROR: When present indicates that the field's initial value is in error in some way, e.g. because it is inconsistent with the values of other fields. Servers should include an explanatory error message with the form's text.

Types

The following types of fields can be defined with the TYPE attribute:

TEXT: Single line text entry fields. Use the SIZE attribute to specify the visible width in characters, e.g. SIZE="24" for a 24 character field. The MAXLENGTH attribute can be used to specify an upper limit to the number of characters that can be entered into a text field, e.g. MAXLENGTH=72. Use the TEXTAREA element for text fields which can accept multiple lines (see below).
HIDDEN: No field is presented to the user, but the content of the field is sent with the submitted form. This value may be used to transmit state information about client/server interaction.
CHECKBOX: Used for simple Boolean attributes, or for attributes which can take multiple values at the same time. The latter is represented by a number of checkbox fields each of which has the same NAME.
RADIO: For attributes which can take a single value from a set of alternatives. Each radio button field in the group should be given the same NAME.
SUBMIT: This is a button that when pressed submits the form. It offers authors control over the location of this button. You can use an image as a submit button by specifying a URL with the SRC attribute.
RESET: This is a button that when pressed resets the form's fields to their initial values as specified by the VALUE attribute. You can use an image as a reset button by specifying a URL with the SRC attribute.

Proposed types

RANGE: This allows you to specify an integer range with the MIN and MAX attributes, e.g. MIN=1 MAX=100. Users can select any value in this range.
INT: For entering integer numbers, the maximum number of digits can be specified with the SIZE attribute (excluding the sign character), e.g. size=3 for a three digit number.
FLOAT: For fields which can accept floating point numbers.
SCRIBBLE: A field upon which you can write with a pen or mouse. The size of the field in millimeters is given as SIZE=width,height. The units are absolute as they relate to the dimensions of the human hand, rather than pixels of varying resolution. The scribble may involve time and pressure data in addition to the basic ink data. You can use scribble for signatures or sketches. The field can be initialised by setting the SRC attribute to a URL which contains the ink. The VALUE attribute is ignored.
AUDIO: This provides a way of entering spoken messages into a form. Browsers might show an icon which when clicked pops-up a set of tape controls that you can use to record and replay messages. The initial message can be set by specifying a URL with the SRC attribute. The VALUE attribute is ignored.

Obsolete types

DATE: Fields which can accept a recognized date format.
URL: For fields which expect document references as URLs or URNs.
IMAGE: This allows you to specify an image field upon which you can click with a pointing device. The SRC and ALIGN attributes are exactly the same as for the IMG and IMAGE elements. The symbolic names for the x and y coordinates of the click event are specified with name.x and name.y for the name given with the NAME attribute. The VALUE attribute is ignored.

When you need to let users enter more than one line of text, you should use the TEXTAREA element.

OPTION

The OPTION element can take the following attributes:

SELECTED: Indicates that this option is initially selected.
VALUE: When present indicates the value to be returned if this option is chosen. The returned value defaults to the contents of the option element.

Proposed attributes

DISABLED: When present indicates that this option is temporarily disabled. Browsers should show this by "greying it out" in some manner.

The contents of the OPTION element are presented to the user to represent the option. They are used as the returned value if the VALUE attribute is not present.

SELECT

The SELECT element allows the user to choose one of a set of alternatives described by textual labels. Every alternative is represented by the OPTION element.

Attributes

MULTIPLE: The MULTIPLE attribute is needed when users are allowed to make several selections, e.g. <SELECT MULTIPLE>.

Proposed attributes

ERROR: The ERROR attribute can be used to indicate that the initial selection is in error in some way, e.g. because it is inconsistent with the values of other fields.

Typical rendering

SELECT is typically rendered as a pull down or pop-up list.

Example

<SELECT NAME="flavor">
<OPTION>Vanilla
<OPTION>Strawberry
<OPTION>Rum and Raisin
<OPTION>Peach and Orange
</SELECT>

TEXTAREA

When you need to let users enter more than one line of text, you should use the TEXTAREA element, e.g.

<TEXTAREA NAME="address" ROWS=6 COLS=64>
Hewlett Packard Laboratories
1501 Page Mill Road
Palo Alto, California 94304-1126
</TEXTAREA>

The text up to the end tag is used to initialize the field's value. This end tag is always required even if the field is initially blank. The ROWS and COLS attributes determine the visible dimension of the field in characters. Browsers are recommended to allow text to grow beyond these limits by scrolling as needed. In the initial design for forms, multi-line text fields were supported by the INPUT element with TYPE=TEXT. Unfortunately, this causes problems for fields with long text values as SGML limits the length of attribute literals. The HTML+ DTD allows for up to 1024 characters (the SGML default is only 240 characters!).

Headings

Six levels of heading are supported. (Note that a hypertext node within a hypertext work tends to need fewer levels of heading than a work whose only structure is given by the nesting of headings.)

A heading element implies all the font changes, paragraph breaks before and after, and white space (for example) necessary to render the heading. Further character emphasis or paragraph marks are not required in HTML.

H1 is the highest level of heading, and is recommended for the start of a hypertext node. It is suggested that the text of the first heading be suitable for a reader who is already browsing in related information, in contrast to the title tag which should identify the node in a wider context.

The heading elements are

		<H1>, <H2>, <H3>, <H4>, <H5>, <H6>

It is not normal practice to jump from one heading to a heading level more than one below, for example to follow an H1 with an H3. Although this is legal, it is discouraged, as it may produce strange results, for example when generating other representations from the HTML.

Example:

		<H1>This is a heading</H1>
		Here is some text
		<H2>Second level heading</H2>
		Here is some more text.

Parser Note:

Parsers should not require any specific order to heading elements, even if the heading level increases by more than one between successive headings.
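
An authoring tool, as opposed to a parser, might still wish to warn about skipped levels. The following is a minimal sketch in Python, assuming the standard library's html.parser module is an acceptable stand-in for an SGML parser; the class and function names are hypothetical and not part of this specification.

```python
from html.parser import HTMLParser


class HeadingLevelChecker(HTMLParser):
    """Collect warnings for headings that skip a level (e.g. H1 then H3)."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.previous_level = 0   # 0 means "no heading seen yet"
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        level = self.HEADINGS.get(tag)
        if level is None:
            return
        # Legal but discouraged: going down by more than one level.
        if self.previous_level and level > self.previous_level + 1:
            self.warnings.append(
                "H%d follows H%d" % (level, self.previous_level))
        self.previous_level = level


def skipped_heading_levels(document):
    """Return a list of warnings; the document is never rejected."""
    checker = HeadingLevelChecker()
    checker.feed(document)
    checker.close()
    return checker.warnings
```

The sketch only reports; it accepts every ordering of headings, in keeping with the parser note above.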

Typical Rendering

H1: Bold very large font, centered. One or two lines clear space between this and anything following. If printed on paper, start new page.
H2: Bold, large font, flush left against left margin, no indent. One or two clear lines above and below.
H3: Italic, large font, slightly indented from the left margin. One or two clear lines above and below.
H4: Bold, normal font, indented more than H3. One clear line above and below.
H5: Italic, normal font, indented as H4. One clear line above.
H6: Bold, indented same as normal text, more than H5. One clear line above.

These typical values are just an indication, and it is up to the designer of the presentation software to define the styles. The reader may have options to customize these. When writing documents, you should assume that whatever styles are used are designed to have the same sort of effect as the styles above.

The rendering software is responsible for generating suitable vertical white space between elements, so it is NOT normal or required to follow a heading element with a paragraph mark.

Horizontal Rule

Typical Rendering

Some sort of divider between sections of text such as a full width horizontal rule or equivalent graphic.

Example

The horizontal rule is typically used for separating heading information (when more than just a heading) from content, etc.

		<H1>The Albatross</H1>
		<Address>The Bumstead Monthly, 1948</Address>
		The following information is culled from
		this and successive issues of the magazine.
		Thanks are due to the editor-in-chief,
		A.R. Bunstead, for her help and advice.
		<H2>Copyright IQR Inc.</h2>
		This recording may not be sold, resold,
		hired out, used, or talked about in too great
		a depth without the publisher's written or
		videotaped consent.
		<HR>
		The Albatross, most fabled and infamous of ..

IMG: Embedded Images

Status: Extra

The IMG element allows another document to be inserted inline. The document is normally an icon or small graphic, etc. This element is NOT intended for embedding other HTML text.

Browsers which are not able to display inline images ignore IMG elements. Authors should note that some browsers will be able to display (or print) linked graphics but not inline graphics. If the graphic is essential, it may be wiser to make a link to it rather than to put it inline. If the graphic is essentially decorative, then IMG is appropriate.

The IMG element is empty: it has no closing tag. It has three attributes:

SRC: The value of this attribute is the URL of the document to be embedded. Its syntax is the same as that of the HREF attribute of the A tag. SRC is mandatory.
ALIGN: Takes the value TOP, MIDDLE or BOTTOM, defining whether the tops, middles or bottoms of the graphic and the text should be aligned vertically.
ALT: Optional alternative text as an alternative to the graphics for display in text-only environments.

Note that IMG elements are allowed within anchors.

Example

	Warning: <IMG SRC="triangle.gif" ALT="Warning:"> This must be done by a
	qualified technician.

	<A HREF="Go.html"><IMG SRC="Button.ps" ALT="GO"></A>
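
How a text-only environment might use the ALT attribute can be sketched as follows. This is an illustration in Python, assuming the standard library's html.parser module; the names TextOnlyRenderer and render_text_only are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """Render character data, replacing each IMG with its ALT text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs, names lower-cased.
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)
        # All other tags contribute nothing themselves; their content
        # still arrives through handle_data().

    def handle_data(self, data):
        self.parts.append(data)


def render_text_only(document):
    renderer = TextOnlyRenderer()
    renderer.feed(document)
    renderer.close()
    return "".join(renderer.parts)
```

An IMG without ALT simply disappears in this sketch, which is why ALT text matters when the graphic carries meaning.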

ISINDEX

This element informs the reader that the document is an index document. As well as reading it, the reader may use a keyword search.

The node may be queried with a keyword search by suffixing the node address with a question mark, followed by a list of keywords separated by plus signs. See the network address format.
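
The construction of such a query address can be sketched in Python. The function name and the example address are hypothetical, and the percent-encoding of unsafe characters within a keyword is an assumption of this sketch rather than something stated here.

```python
from urllib.parse import quote


def isindex_query(node_address, keywords):
    """Return node_address + "?" + keywords joined by plus signs."""
    # Assumption: characters that are unsafe in an address are
    # percent-encoded. With safe="" a literal "+" inside a keyword is
    # encoded too, so it cannot be mistaken for the separator.
    encoded = [quote(word, safe="") for word in keywords]
    return node_address + "?" + "+".join(encoded)
```

For example, isindex_query("http://example.org/index", ["hypertext", "markup"]) yields the address http://example.org/index?hypertext+markup.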

Note that this tag is normally generated automatically by a server. If it is added by hand to an HTML document, then the client will assume that the server can handle a search on the document. Obviously the server must have this capability for it to work: simply adding <ISINDEX> in the document is not enough to make searches happen if the server does not have a search engine!

Status: standard.

Example of use:

		<ISINDEX>

LINK

The LINK element occurs within the HEAD element of an HTML document. It is used to indicate a relationship between the document and some other object. A document may have any number of LINK elements.

The LINK element is empty, but takes the same attributes as the anchor element.

Typical uses are to indicate authorship, related indexes and glossaries, older or more recent versions, etc. Links can indicate a static tree structure in which the document was authored by pointing to a "parent" and "next" and "previous" document, for example.

Servers may also allow links to be added by those who do not have the right to alter the body of a document.

Forms of list in HTML

These lists may be nested

Glossaries

A glossary (or definition list) is a list of paragraphs each of which has a short title alongside it. Apart from glossaries, this element is useful for presenting a set of named elements to the reader. The items within a glossary are introduced by these elements:

DT: The "term", typically placed in a wide left indent
DD: The "definition", which may wrap onto many lines

These elements should appear in pairs. A single occurrence of DT without a following DD is nevertheless allowed, and has the same significance as if the DD had been present with no text. The one attribute which DL can take is

COMPACT: suggests that a compact rendering be used, because the enclosed elements are individually small, or the whole glossary is rather large, or both.

Typical rendering

The definition list DT, DD pairs are arranged vertically. For each pair, the DT element is on the left, in a column of about a third of the display area, and the DD element is in the right hand two thirds of the display area. The DT term is normally small enough to fit on one line within the left-hand column. If it is longer, it will either extend across the page, in which case the DD section is moved down to separate them, or it is wrapped onto successive lines of the left hand column.

This is sometimes implemented with the use of a large negative first line indent.

White space is typically left between successive DT,DD pairs unless the COMPACT attribute is given. The COMPACT attribute is appropriate for lists which are long and/or have DT,DD pairs which each take only a line or two. It is of course possible for the rendering software to discover these cases itself and make its own decisions, and this is to be encouraged.

The COMPACT attribute may also reduce the width of the left-hand (DT) column.

Examples of use

	<DL>
	<DT>Term the first<DD>definition paragraph is reasonably
	long but is still displayed clearly
	<DT>Term2 follows<DD>Definition of term2
	</DL>

	<DL COMPACT>
	<DT>Term<DD>definition paragraph
	<DT>Term2<DD>Definition of term2
	</DL>
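
A program generating glossaries might produce this markup as in the following Python sketch. The function name is hypothetical; escaping of "&", "<" and ">" with html.escape is an assumption added so that the generated terms cannot be mistaken for markup.

```python
from html import escape


def definition_list(pairs, compact=False):
    """Build DL markup from (term, definition) pairs."""
    lines = ["<DL COMPACT>" if compact else "<DL>"]
    for term, definition in pairs:
        # A missing definition still gets an (empty) DD, which has the
        # same significance as a DT with no DD at all.
        lines.append("<DT>%s<DD>%s" % (escape(term, quote=False),
                                       escape(definition or "", quote=False)))
    lines.append("</DL>")
    return "\n".join(lines)
```

Called with the two pairs of the second example and compact=True, the sketch reproduces that example line for line.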

Lists

A list is a sequence of paragraphs, each of which may be preceded by a special mark or sequence number. The syntax is:

		<UL>
		<LI> list element
		<LI> another list element ...
		</UL>

The opening list tag may be any of UL, OL, MENU or DIR. It must be immediately followed by the first list element.

Typical rendering

The representation of the list is not defined here, but a bulleted list for unordered lists, and a sequence of numbered paragraphs for an ordered list would be quite appropriate. Other possibilities for interactive display include embedded scrollable browse panels.

List elements with typical rendering are:

UL: A list of multi-line paragraphs, typically separated by some white space and/or marked by bullets, etc.
OL: As UL, but the paragraphs are typically numbered in some way to indicate the order as significant.
MENU: A list of smaller paragraphs. Typically one line per item, with a style more compact than UL.
DIR: A list of short elements, typically less than 20 characters. These may be arranged in columns across the page, typically 24 characters in width. If the rendering software is able to optimize the column width as a function of the widths of individual elements, so much the better.

Example of use

		<OL>
		<LI> When you get to the station, leave
		by the southern exit, on platform one.
		<LI>Turn left to face toward the mountain
		<LI>Walk for a mile or so until you reach the
		"Asquith Arms" then 
		<LI>Wait and see...
		</OL>

		<MENU>
		<LI>The oranges should be pressed fresh
		<LI>The nuts may come from a packet
		<LI>The gin must be good quality
		</MENU>

		<DIR>
		<LI>A-H<LI>I-M
		<LI>M-R<LI>S-Z
		</DIR>

Next ID

This tag takes a single attribute, N, whose value is the next document-wide identifier to be allocated, of the form z123.

When modifying a document, old anchor ids should not be reused, as there may be references stored elsewhere which point to them. The NEXTID tag is read and generated by hypertext editors. Human writers of HTML usually use mnemonic alphabetical identifiers. Browser software may ignore this tag.

Example of use:

		<NEXTID N=z27>
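
The bookkeeping a hypertext editor performs on this value can be sketched as follows. This is a Python illustration only; the function name is hypothetical, and the sketch assumes identifiers consist of letters followed by decimal digits, as in z27.

```python
import re


def allocate_id(next_id):
    """Return (id_to_use, new_nextid_value) for a value such as "z27"."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", next_id)
    if match is None:
        raise ValueError("expected letters followed by digits: %r" % next_id)
    prefix, number = match.group(1), int(match.group(2))
    # The current value is handed out; the numeric part is incremented
    # so that the same identifier is never allocated twice.
    return next_id, "%s%d" % (prefix, number + 1)
```

Given the example above, the editor would use z27 for the new anchor and rewrite the tag with N=z28.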

P: Paragraph

The empty P element represents a paragraph. The exact rendering of this (indentation, leading, etc) is not defined here, and may be a function of other tags, style sheets etc.

You do NOT need to use <P> to put white space around heading, list, address or blockquote elements. It is the responsibility of the rendering software to generate that white space. An empty paragraph has undefined effect and should be avoided.

Typical rendering

Typically, paragraphs are surrounded by a small vertical space (of a line or half a line). This is not the case (typically) within ADDRESS or (ever) within PRE elements. With some implementations, normal paragraphs may have a small extra left indent on the first line.

Examples of use

	<h1>What to do</h1>
	<p>This is one paragraph.<P>This is a second.
	<P>
	This is a third.

Bad example

	<h1><P>What not to do</h1>
	<address><p>I found that on my XYZ browser it looked prettier to
	me if I put some paragraph tags</address>
	<p>
	<ul><p><li>Around lists, and
	<li>Inside headings.
	</ul>
	<p>
	<h2>None of the paragraph tags in this example should
	be there.</h2>

PRE: Preformatted text

Preformatted elements in HTML are displayed with text in a fixed width font, and so are suitable for text which has been formatted for a teletype by some existing formatting system.

The optional attribute is:

WIDTH: This attribute gives the maximum number of characters which will occur on a line. It allows the presentation system to select a suitable font and indentation. Where the WIDTH attribute is not recognized, it is recommended that a width of 80 be assumed. Where WIDTH is supported, it is recommended that at least widths of 40, 80 and 132 characters be presented optimally, with other widths being rounded up.
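
The width recommendations can be sketched in Python as below. The function name is hypothetical, and the treatment of widths greater than 132 is an assumption of the sketch, since no guidance is given for them here.

```python
SUPPORTED_WIDTHS = (40, 80, 132)


def presentation_width(width=None):
    """Pick the column width to lay out a PRE element for."""
    if width is None:
        return 80              # WIDTH absent or not recognized
    for candidate in SUPPORTED_WIDTHS:
        if width <= candidate:
            return candidate   # round up to the next supported width
    # Assumption: wider than 132, this sketch simply honours the
    # requested width.
    return width
```

A request for WIDTH=60 is thus laid out as 80 columns, and a request for WIDTH=100 as 132.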

Within a PRE element,

Line boundaries within the text are rendered as a move to the beginning of the next line, except for one immediately following or immediately preceding a tag.
The <p> tag should not be used. If found, it should be rendered as a move to the beginning of the next line.
Anchor elements and character highlighting elements may be used.
Elements which define paragraph formatting (Headings, Address, etc) must not be used.
The ASCII Horizontal Tab (HT) character must be interpreted as the smallest positive nonzero number of spaces which will leave the number of characters so far on the line as a multiple of 8. Its use is not recommended however.
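
The tab rule above can be sketched in Python as follows; the function name is hypothetical. For a single line, the result agrees with Python's built-in str.expandtabs(8).

```python
def expand_tabs(line, tab_stop=8):
    """Replace each HT with spaces up to the next multiple of tab_stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Always at least one space, at most tab_stop spaces.
            spaces = tab_stop - (column % tab_stop)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

A tab after two characters becomes six spaces; a tab at a position that is already a multiple of 8 becomes a full eight spaces, never zero.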

Example of use

			<PRE WIDTH="80">
			This is an example line
			</PRE>

Note: Highlighting

Within a preformatted element, the constraint that the rendering must be on a fixed horizontal character pitch may limit or prevent the ability of the renderer to render highlighting elements specially.

Note: Margins

The above references to the "beginning of a new line" must not be taken as implying that the renderer is forbidden from using a (constant) left indent for rendering preformatted text. The left indent may of course be constrained by the width required.

TITLE

The title of a document is specified by the TITLE element. The TITLE element must occur in the HEAD of the document.

There may only be one title in any document. It should identify the content of the document in a fairly wide context.

It may not contain anchors, paragraph marks, or highlighting. The title may be used to identify the node in a history list, to label the window displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings. The title should ideally be less than 64 characters in length, because many applications display document titles in window titles, menus, etc., where there is only limited room. Whilst there is no limit on the length of a title (as it may be automatically generated from other data), information providers are warned that it may be truncated if long.

Examples of use

Appropriate titles might be

		<TITLE>Rivest and Neuman. 1989(b)</TITLE>

		<TITLE>A Recipe for Maple Syrup Flap-Jack</TITLE>

		<TITLE>Introduction -- AFS user's Guide</TITLE>

Examples of inappropriate titles are those which are only meaningful within context,

		<TITLE>Introduction</TITLE>

or too long,

	<TITLE>Remarks on the Quantum-Gravity effects of "Bean
	Pole" diversification in Mononucleosis patients in Developing
	Countries under Economic Conditions Prevalent during
	the Second half of the Twentieth Century, and Related Papers:
	a Summary</TITLE>
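
An authoring tool could check titles against the guidance above along the following lines. This Python sketch is illustrative only: the function name is hypothetical, and testing for a "<" character is a deliberately crude way of detecting markup inside a title.

```python
def title_problems(title, advised_length=64):
    """Return a list of reasons why a TITLE may display poorly."""
    problems = []
    text = " ".join(title.split())   # collapse line breaks and runs of spaces
    if len(text) >= advised_length:
        problems.append("%d characters; may be truncated" % len(text))
    if "<" in text:
        problems.append("contains markup; anchors and highlighting are not allowed")
    return problems
```

The sketch cannot judge whether a title is meaningful out of context, as in the "Introduction" example; that remains a matter for the author.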

Character highlighting

Status: Extra

These elements allow sections of text to be formatted in a particular way, to provide emphasis, etc. The tags do NOT cause a paragraph break, and may be used on sections of text within paragraphs.

Where not supported by implementations, like all tags, these tags should be ignored but the content rendered.

All these tags have related closing tags, as in

		This is <EM>emphasized</EM> text.

Some of these styles are more explicit than others about how they should be physically represented. The logical styles should be used wherever possible, unless for example it is necessary to refer to the formatting in the text. (Eg, "The italic parts are mandatory".)

Note:

Browsers unable to display a specified style may render it in some alternative, or the default, style, with some loss of quality for the reader. Some implementations may ignore these tags altogether, so information providers should attempt not to rely on them as essential to the information content.

These element names are derived from TeXInfo macro names.

Physical styles

TT: Fixed-width typewriter font.
B: Boldface, where available, otherwise alternative mapping allowed.
I: Italic font (or slanted if italic unavailable).
U: Underline.

Logical styles

EM: Emphasis, typically italic.
STRONG: Stronger emphasis, typically bold.
CODE: Example of code. Typically monospaced font. (Do not confuse with PRE.)
SAMP: A sequence of literal characters.
KBD: In an instruction manual, text typed by a user.
VAR: A variable name.
DFN: The defining instance of a term. Typically bold or bold italic.
CITE: A citation. Typically italic.
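
The fallback behaviour described in the note above, using the typical renderings just listed, can be sketched in Python as follows. The table and function names are hypothetical; elements for which no typical rendering is stated (SAMP, KBD, VAR) fall through to the default style in this sketch.

```python
# Typical renderings as listed above, in order of preference.
TYPICAL_RENDERINGS = {
    "TT": ["fixed-width"],
    "B": ["bold"],
    "I": ["italic", "slanted"],
    "U": ["underline"],
    "EM": ["italic"],
    "STRONG": ["bold"],
    "CODE": ["fixed-width"],
    "DFN": ["bold", "bold italic"],
    "CITE": ["italic"],
}


def pick_rendering(element, available):
    """Return the first typical style the display supports, else "default"."""
    for style in TYPICAL_RENDERINGS.get(element.upper(), []):
        if style in available:
            return style
    # Unknown element or no suitable style: render the content plainly.
    return "default"
```

In every case the content is still rendered; only the highlighting is lost.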

Examples of use

	This text contains an <em>emphasized</em> word.
	<strong>Don't assume</strong> that it will be italic!
	It was made using the <CODE>EM</CODE> element. A citation is
	typically italic and has no formal necessary structure:
	<cite>Moby Dick</cite> is a book title.

Obsolete elements

The following elements of HTML are obsolete. It is recommended that client implementors implement the obsolete forms for compatibility with old servers.

Plaintext

Status: Obsolete.

The empty PLAINTEXT tag terminates the HTML entity. What follows is not SGML. Instead, there's an old HTTP convention that what follows is an ASCII (MIME "text/plain") body.

An example of its use is:

			<PLAINTEXT>
			0001 This is line one of a long listing
			0002 file from <any@host.inc.com> which is sent

This tag allows the rest of a file to be read efficiently without parsing. Its presence is an optimization. There is no closing tag. The rest of the data is not in SGML.

XMP and LISTING : Example sections

Status: Obsolete. These are in use and should be recognized by browsers. New servers should use <PRE> instead.

These styles allow text of fixed-width characters to be embedded absolutely as is into the document. The syntax is:

			<LISTING>
				...
			</LISTING>

			<XMP>
				...
			</XMP>

The text between these tags is to be portrayed in a fixed width font, so that any formatting done by character spacing on successive lines will be maintained. Between the opening and closing tags:

The text may contain any ISO Latin printable characters, but not the end tag opener. (See Historical note.)
Line boundaries are significant, except any occurring immediately after the opening tag or before the closing tag, and are to be rendered as a move to the start of a new line.
The ASCII Horizontal Tab (HT) character must be interpreted as the smallest positive nonzero number of spaces which will leave the number of characters so far on the line as a multiple of 8. Its use is not recommended however.

The LISTING element is portrayed so that at least 132 characters will fit on a line. The XMP element is portrayed in a font so that at least 80 characters will fit on a line but is otherwise identical to LISTING.

Highlighted Phrase HP1 etc

Status: Obsolete. These tags, like all others, should be ignored if not implemented. Replaced with more meaningful elements -- see character highlighting.

Examples of use:

		<HP1>...</HP1>   <HP2>... </HP2> etc.

Comment element

Status: Obsolete

A comment element used for bracketing off unneeded text and comments has been introduced in some browsers but will be replaced by the SGML comment feature in new implementations.

Historical Note: XMP and LISTING

The XMP and LISTING elements used historically to have non SGML conforming specifications, in that the text could contain any ISO Latin printable characters, including the tag opener, so long as it does not contain the closing tag in full.

This form is not supported by SGML and so is not the specified HTML interpretation. Providers should be warned that implementations may vary on how they interpret end tags apparently within these elements.

Entities

The following entity names are used in HTML, always prefixed by ampersand (&) and followed by a semicolon as shown. They represent particular graphic characters which have special meanings in places in the markup, or may not be part of the character set available to the writer.

&lt;: The less than sign <
&gt;: The "greater than" sign >
&amp;: The ampersand sign & itself.
&quot;: The double quote sign "
&nbsp;: A non-breaking space

Also allowed are references to any of the ISO Latin-1 alphabet, using the entity names in the following table.

ISO Latin 1 character entities

This list is derived from "ISO 8879:1986//ENTITIES Added Latin 1//EN".

&AElig; (Æ): capital AE diphthong (ligature)
&Aacute; (Á): capital A, acute accent
&Acirc; (Â): capital A, circumflex accent
&Agrave; (À): capital A, grave accent
&Aring; (Å): capital A, ring
&Atilde; (Ã): capital A, tilde
&Auml; (Ä): capital A, dieresis or umlaut mark
&Ccedil; (Ç): capital C, cedilla
&ETH; (Ð): capital Eth, Icelandic
&Eacute; (É): capital E, acute accent
&Ecirc; (Ê): capital E, circumflex accent
&Egrave; (È): capital E, grave accent
&Euml; (Ë): capital E, dieresis or umlaut mark
&Iacute; (Í): capital I, acute accent
&Icirc; (Î): capital I, circumflex accent
&Igrave; (Ì): capital I, grave accent
&Iuml; (Ï): capital I, dieresis or umlaut mark
&Ntilde; (Ñ): capital N, tilde
&Oacute; (Ó): capital O, acute accent
&Ocirc; (Ô): capital O, circumflex accent
&Ograve; (Ò): capital O, grave accent
&Oslash; (Ø): capital O, slash
&Otilde; (Õ): capital O, tilde
&Ouml; (Ö): capital O, dieresis or umlaut mark
&THORN; (Þ): capital THORN, Icelandic
&Uacute; (Ú): capital U, acute accent
&Ucirc; (Û): capital U, circumflex accent
&Ugrave; (Ù): capital U, grave accent
&Uuml; (Ü): capital U, dieresis or umlaut mark
&Yacute; (Ý): capital Y, acute accent
&aacute; (á): small a, acute accent
&acirc; (â): small a, circumflex accent
&aelig; (æ): small ae diphthong (ligature)
&agrave; (à): small a, grave accent
&aring; (å): small a, ring
&atilde; (ã): small a, tilde
&auml; (ä): small a, dieresis or umlaut mark
&ccedil; (ç): small c, cedilla
&eacute; (é): small e, acute accent
&ecirc; (ê): small e, circumflex accent
&egrave; (è): small e, grave accent
&eth; (ð): small eth, Icelandic
&euml; (ë): small e, dieresis or umlaut mark
&iacute; (í): small i, acute accent
&icirc; (î): small i, circumflex accent
&igrave; (ì): small i, grave accent
&iuml; (ï): small i, dieresis or umlaut mark
&ntilde; (ñ): small n, tilde
&oacute; (ó): small o, acute accent
&ocirc; (ô): small o, circumflex accent
&ograve; (ò): small o, grave accent
&oslash; (ø): small o, slash
&otilde; (õ): small o, tilde
&ouml; (ö): small o, dieresis or umlaut mark
&szlig; (ß): small sharp s, German (sz ligature)
&thorn; (þ): small thorn, Icelandic
&uacute; (ú): small u, acute accent
&ucirc; (û): small u, circumflex accent
&ugrave; (ù): small u, grave accent
&uuml; (ü): small u, dieresis or umlaut mark
&yacute; (ý): small y, acute accent
&yuml; (ÿ): small y, dieresis or umlaut mark
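
For implementors, the mapping from entity names to characters can be illustrated with Python's standard library, whose html.entities table includes the Added Latin 1 names (for example AElig for Æ) along with many later additions. This is a sketch only; the helper name entity_to_character is hypothetical.

```python
from html import escape, unescape
from html.entities import name2codepoint


def entity_to_character(name):
    """Map an entity name such as "AElig" to the character it denotes."""
    # name2codepoint maps names (without "&" and ";") to code points.
    return chr(name2codepoint[name])


# Reading: replace entity references in text by the characters they denote.
decoded = unescape("caf&eacute; &lt;P&gt; &amp; co.")

# Writing: protect the characters that have special meaning in markup.
encoded = escape("a < b & c", quote=False)
```

Note that the library table is a superset of the list above, so a name it recognizes is not necessarily one defined here.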

The SGML Declaration for HTML

The SGML declaration of HTML follows. Its relationship to the content of an SGML document is explained in the section "HTML and SGML".

<!SGML  "ISO 8879:1986"
--
	SGML Declaration for HyperText Markup Language (HTML)
	as used by the World-Wide Web (WWW) application.

--

CHARSET
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0   9   UNUSED
                  9   2   9
                  11  2   UNUSED
                  13  1   13
                  14  18  UNUSED
                  32  95  32
                  127 1   UNUSED
     BASESET   "ISO Registration Number 100//CHARSET
                ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
     DESCSET   128 32 UNUSED
               160 95 32

CAPACITY        SGMLREF
                TOTALCAP        150000
                GRPCAP          150000
  
SCOPE    DOCUMENT
SYNTAX   
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                           19 20 21 22 23 24 25 26 27 28 29 30 31 127
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0 128 0
         FUNCTION
              --  SPACE       32
                  TAB SEPCHAR  9
                  LF  SEPCHAR 10
                  FF  SEPCHAR 12
                  CR  SEPCHAR 13 --

	-- The above is an accurate description of the usage of FUNCTION --
	-- characters in HTML implementations; that is, there is no      --
	-- Record Start or Record End character, and no occurrences of   --
	-- character 10 or 13 are "ignored" by the parser.               --
	-- But because few SGML implementations support this concrete    --
	-- syntax, we include the one below.                             --

	-- Note that in order to get correct behaviour w.r.t. newline    --
	-- processing, you will have to play some tricks in constructing --
	-- the document entity for parsing in order to keep the parser   --
	-- from ignoring newlines in surprising ways                     --

		  RE          13
                  RS          10
                  SPACE       32
                  TAB SEPCHAR  9
	

         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-"
                  UCNMCHAR ".-"
                  NAMECASE GENERAL YES
                           ENTITY  NO
         DELIM    GENERAL  SGMLREF
                  SHORTREF SGMLREF
         NAMES    SGMLREF
         QUANTITY SGMLREF
                  NAMELEN  34
                  TAGLVL   100
                  LITLEN   1024
                  GRPGTCNT 150
                  GRPCNT   64                   

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO    NONE
>
<!-- 
	$Id: html.decl,v 1.6 1994/05/18 17:23:34 connolly Exp $

	Author: Daniel W. Connolly <connolly@hal.com>

	See also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
		  http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
 -->

The HTML DTD

The HTML DTD follows. Its relationship to the content of an SGML document is explained in the section "HTML and SGML".

<!--	html.dtd

        Document Type Definition for the HyperText Markup Language
        as used by the World Wide Web (HTML DTD).

	$Id: html.dtd,v 1.13 1994/05/18 17:23:29 connolly Exp $

	Author: Daniel W. Connolly <connolly@hal.com>
	See Also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
		  http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->

<!ENTITY HTML.Version
	"-//connolly hal.com//DTD WWW HTML $Date 1994/04/19 17:24:06 $//EN"
	-- public identifier for "current practice" version              -- 
	-- actually, take the $'s out to get the real public identifier, --
	-- since $ is illegal in public identifier. When DTD stabilizes, --
	-- we'll need to stop using RCS keywords to version the pub id   --

        -- Typical usage:

            <!DOCTYPE HTML PUBLIC "-//connolly hal.com//DTD WWW HTML
						$Date: 1994/05/18 17:23:29 $//EN">
	    <html>
	    ...
	    </html>
	--
	>


<!-- Feature Test Entities -->

<!-- To use these, write your document like:
	<!DOCTYPE HTML [
	<!ENTITY % HTML.Optional "INCLUDE">
	<!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
	%html;
	]>
	<TITLE>Here's my doc</TITLE>
	<p>It uses lots of optional features

 In practice, if you're using sgmls to validate your docs,
 you can stick the <!DOCTYPE [...]> in a separate file and
 validate with:
	sgmls -s doctype.sgml foo.html
 -->

<!ENTITY % HTML.Minimal  "IGNORE">
<!ENTITY % HTML.Obsolete "IGNORE">
<!ENTITY % HTML.Prescriptive "IGNORE">

<![ %HTML.Minimal [
	<!ENTITY % HTML.linkRelationships "IGNORE">
	<!ENTITY % HTML.linkMethods "IGNORE">
	<!ENTITY % HTML.linkRedundantInfo "IGNORE">
	<!ENTITY % HTML.forms "IGNORE">
	<!-- @@ nested lists -->
	<!-- @@ phrases -->
	<!-- @@ headers inside A -->
	<!-- @@ nested phrases, fonts -->
	]]>
	
<![ %HTML.Obsolete [
	<!ENTITY % HTML.titleCDATA "INCLUDE">
	<!ENTITY % HTML.litCDATA "INCLUDE">
	<!ENTITY % HTML.pSeparator "INCLUDE">
	]]>

<![ %HTML.Prescriptive [
	<!--
	This feature test entity prescribes that certain
	idioms detract from the structural integrity of an
	HTML document, and are therefore disallowed.
	-->
	<!ENTITY % HTML.font-phrase "IGNORE">
	<!ENTITY % HTML.anchorNameCDATA "IGNORE">
	<!ENTITY % HTML.PLAINTEXT "IGNORE">
	<!ENTITY % HTML.bodyBlockOnly "INCLUDE">
	]]>

<!ENTITY % HTML.bodyBlockOnly "IGNORE"
	-- only allow block elements in the BODY element
	This means all paragraphs need to start with a <P> tag.
	-->

<!ENTITY % HTML.pSeparator "IGNORE"
	-- use P element as paragraph separator, rather than container.
	-->

<!ENTITY % HTML.linkRelationships "INCLUDE"
	-- Adding markup to links to show the relationship between
	ends of a link
	see http://www.w3.org/hypertext/WWW/MarkUp/Relationships.html
	-->

<!ENTITY % HTML.linkMethods "INCLUDE"
	-- Adding markup to links to show the methods supported
	by the referent object
	see http://www.w3.org/hypertext/WWW/MarkUp/Elements/A.html
	-->

<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
	-- Adding markup to links to give redundant information
	like URN, content type, title...
	-->

<!ENTITY % HTML.anchorNameCDATA "INCLUDE"
	-- Anchor names should be distinct. SGML parser can validate
	this if the NAME attribute of the A element is declared as ID.
	But that restricts the syntax of an anchor name to an SGML name,
	i.e. a letter followed by letters, numbers, periods and dashes,
	up to NAMELEN (34) characters long.
	-->

<!ENTITY % HTML.PLAINTEXT "INCLUDE"
	-- Support for the <PLAINTEXT> tag as a sign of the
	end of the HTML data stream and the beginning of a stream
	of text/plain data
	-->

<!ENTITY % HTML.titleCDATA "IGNORE"
	-- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
	On Mosaic, it's #PCDATA, but in the linemode browser,
	it's more like CDATA, but not quite.
	-->

<!ENTITY % HTML.NEXTID "INCLUDE"
	-- Used by the NeXT implementation to keep track of the
	next anchor id to use
	-->

<!ENTITY % HTML.font-phrase "INCLUDE"
	-- allow B, I, TT, U outside PRE,
	CITE, VAR, etc. inside PRE
	-->

<!ENTITY % HTML.KEY "IGNORE"
	-- There was once a KEY element, for keyboard keys, menu items,
	buttons, etc. but it's not supported or widely documented
	-->

<!ENTITY % HTML.U "IGNORE"
	-- There was also a U element, but since it clashes with
	the common practice of underlining hypertext links, it is
	not widely supported
	-->

<!ENTITY % HTML.litCDATA "IGNORE"
	-- treat XMP, LISTING as CDATA, as per linemodeWWW
	-->

<!ENTITY % HTML.forms "INCLUDE"
	-- Support for forms as per
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
	-->

<!-- DTD definitions -->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
<!ENTITY % list " UL | OL | DIR | MENU ">
<!ENTITY % literal " XMP | LISTING ">

<!ENTITY % URI "CDATA"
        -- The term URI means a CDATA attribute
           whose value is a Uniform Resource Identifier,
           as defined by 
	"Universal Resource Identifiers" by Tim Berners-Lee
	aka http://www.w3.org/hypertext/WWW/Addressing/URL/URI_Overview.html

	Note that CDATA attributes are limited by the LITLEN
	capacity (1024 in the current version of html.decl),
	so that URIs in HTML have a bounded length.

	@@ Need to discuss relative addresses.
        -->

<!ENTITY % Content-Type "CDATA"
	-- meaning a MIME content type, as per RFC1521
	-->

<![ %HTML.anchorNameCDATA [ <!ENTITY % anchor-name "CDATA"> ]]>
<!ENTITY % anchor-name "ID">

<![ %HTML.linkRelationships [ <!ENTITY % linkRelAttrs "
        REL CDATA #IMPLIED -- forward relationship type --
        REV CDATA #IMPLIED -- reversed relationship type
                              to referent data:

                                PARENT CHILD, SIBLING, NEXT, TOP,
                                DEFINITION, UPDATE, ORIGINAL etc. --
	"> ]]>
<!ENTITY % linkRelAttrs "">

<![ %HTML.linkRedundantInfo [ <!ENTITY % linkRedundantAttrs "
        URN CDATA #IMPLIED -- universal resource number --

        TITLE CDATA #IMPLIED -- advisory only --
	"> ]]>
<!ENTITY % linkRedundantAttrs "">

<![ %HTML.linkMethods [ <!ENTITY % linkMethodAttrs "
        METHODS NAMES #IMPLIED -- supported public methods of the object:
                                        TEXTSEARCH, GET, HEAD, ... --
	"> ]]>
<!ENTITY % linkMethodAttrs "">

<!ENTITY % linkattributes
        "NAME %anchor-name #IMPLIED
        HREF %URI;  #IMPLIED
	%linkRelAttrs;
	%linkRedundantAttrs;
	%linkMethodAttrs;
        ">


<!-- Document Element -->


<![ %HTML.PLAINTEXT [ <!ENTITY % obsolete-plaintext ", PLAINTEXT?"> ]]>
<!ENTITY % obsolete-plaintext "">

<!ENTITY % html-content "HEAD, BODY %obsolete-plaintext;">
<!ELEMENT HTML O O  (%html-content)>

<![ %HTML.NEXTID [  <!ENTITY % head-content "TITLE? & ISINDEX? & LINK* & BASE?
			& NEXTID?"> ]]>
<!ENTITY % head-content "TITLE & ISINDEX? & LINK* & BASE?">
<!ELEMENT HEAD O O  (%head-content)>

<![ %HTML.titleCDATA [ <!ENTITY % title-content "CDATA"> ]]>
<!ENTITY % title-content "(#PCDATA)">
<!ELEMENT TITLE - -  %title-content
          -- The TITLE element is not considered part of the flow of text.
             It should be displayed, for example as the page header or
             window title.
          -->

<!ELEMENT ISINDEX - O EMPTY
          -- WWW clients should offer the option to perform a search on
             documents containing ISINDEX.
          -->

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N %anchor-name #REQUIRED
          -- The number should be a name suitable for use
             for the ID of a new element. When used, the value
             has its numeric part incremented. EG Z67 becomes Z68
          -->
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
        %linkattributes>
        
<!ELEMENT BASE - O EMPTY    -- Reference context for URIs -->
<!ATTLIST BASE

        HREF %URI; #REQUIRED

        >

<![ %HTML.KEY [
	<!ENTITY % key-emph "| KEY">
	]]>
<!ENTITY % key-emph "">

<![ %HTML.U [
	<!ENTITY % u-font "| U">
	]]>
<!ENTITY % u-font "">

<!ENTITY % font "TT | B | I %u-font">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | DFN | CITE
	 | STRIKE %key-emph">


<![ %HTML.font-phrase [
	<!ENTITY % obsolete-font "| %font">
	<!ENTITY % obsolete-phrase "| %phrase">
	]]>
<!ENTITY % obsolete-font "">
<!ENTITY % obsolete-phrase "">
<![ %HTML.pSeparator [
	<!ENTITY % obsolete-p "| P">
	]]>
<!ENTITY % obsolete-p "">

<!ENTITY % inline "%phrase %obsolete-font">
<!ENTITY % pre-inline "%font %obsolete-phrase %obsolete-p">

<!ENTITY % text "#PCDATA | IMG | %inline | BR %obsolete-p">

<!ENTITY % htext "A | %text"    -- Plus links, no structure -->

<![ %HTML.font-phrase [ <!ENTITY % font-content "(%htext)+"> ]]>
<!ENTITY % font-content "#PCDATA">
<!ELEMENT (%font;) - - (%font-content;)>

<!ELEMENT (%phrase;) - - (%htext)+>

<!ENTITY % pre "PRE | XMP | LISTING">

<![ %HTML.forms [ <!ENTITY % block-form "| FORM | ISINDEX"> ]]>
<!ENTITY % block-form "">

<![ %HTML.pSeparator [
	<!ENTITY % obsolete-htext "| %htext">
	<!ENTITY % block-p "">
	]]>
<!ENTITY % obsolete-htext "| A">
<!ENTITY % block-p "| P ">

<!ENTITY % block "HR | %list | DL
		| %pre | BLOCKQUOTE | ADDRESS 
		%block-form %block-p">


<![ %HTML.bodyBlockOnly [
	<!ENTITY % current-htext "">
	]]>
<!ENTITY % current-htext "| %htext">

<!ENTITY % body-content "%heading | %block %current-htext">
<!ELEMENT BODY O O  (%body-content)*>


<!ELEMENT A     - - (%heading|%block|%text)+ -(A)
	-- @# Technically, this allows silliness like:
		<H2><A>xyz<H1>h1</H1></A></H2>
	The right way to do anchors outside of %htext is more like:
		<as id=z1><H2>lkjlkj</h2><ae start=z1>
	-->
<!ATTLIST A
        %linkattributes;
        >

<!ELEMENT IMG    - O EMPTY --  Embedded image -->
<!ATTLIST IMG
        SRC %URI;  #IMPLIED     -- URI of document to embed --
	ALT CDATA #IMPLIED
	ALIGN (top|middle|bottom) #IMPLIED
	ISMAP (ISMAP) #IMPLIED
        >


<![ %HTML.pSeparator [ <!ENTITY % p-content "EMPTY"> ]]>
<!ENTITY % p-content "(%htext)+">
<!ELEMENT P     - O %p-content>
<!ELEMENT HR    - O EMPTY -- horizontal rule -->
<!ELEMENT BR    - O EMPTY -- @# BR -> &br; -->

<!ELEMENT ( %heading )  - -  (%htext;)+>

<!ELEMENT DL    - -  (DT*, DD?)+>
<!ATTLIST DL
	COMPACT (COMPACT) #IMPLIED>

<!ELEMENT DT    - O (%htext)+>
<!ELEMENT DD    - O (%htext|%block)+>

<!ELEMENT (%list) - -  (LI)+>

<!ELEMENT LI    - O (%htext|%block)+>

<!ELEMENT BLOCKQUOTE - - (%htext|%block)+ -- @# Hmm... --
        -- for quoting some other source -->

<!ELEMENT ADDRESS - - (%htext;|%block)+>

<!ELEMENT PRE - - (#PCDATA|%pre-inline|A)+>
<!ATTLIST PRE
        WIDTH NUMBER #implied
        >

<!-- Mnemonic character entities. -->

<!ENTITY % ISOlat1 PUBLIC
  "ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1;

<!ENTITY #DEFAULT SDATA "&#38;unknown;" --display the markup-->
<!ENTITY amp CDATA "&#38;"     -- ampersand -->
<!ENTITY gt CDATA "&#62;"      -- greater than -->
<!ENTITY lt CDATA "&#60;"      -- less than -->
<!ENTITY quot CDATA "&#34;"    -- double quote -->

<!-- Processing Entities -->

<!ENTITY nbsp "<? nonbreaking-space>">
<!-- @# should add entities for processing instructions
	for line break, centering, etc. -->


<!-- Forms  -->
<![ %HTML.forms [

<!ENTITY % HTTP-Method "(GET | POST)">
<!ELEMENT FORM - - (%body-content)* -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
	ACTION %URI #REQUIRED
	METHOD %HTTP-Method #IMPLIED -- @# MAILTO? --
	ENCTYPE %Content-Type; #IMPLIED
	>

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
			RADIO | SUBMIT | RESET |
			IMAGE | HIDDEN )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
	TYPE %InputType #IMPLIED -- @# defaults to TEXT?? --
	NAME CDATA #IMPLIED -- required for all but submit and reset --
	VALUE CDATA #IMPLIED
	SRC %URI #IMPLIED -- for image inputs -- 
	CHECKED (CHECKED) #IMPLIED
	SIZE CDATA #IMPLIED -- @# should be NUMBERS: delimit with space, not comma --
	MAXLENGTH NUMBER #IMPLIED
	ALIGN (top|middle|bottom|left|center|right) #IMPLIED --@#supported?--
	>

<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
	NAME CDATA #REQUIRED
	SIZE NUMBER #IMPLIED
	MULTIPLE (MULTIPLE) #IMPLIED
	>

<!ELEMENT OPTION - O (#PCDATA)>
<!ATTLIST OPTION
	SELECTED (SELECTED) #IMPLIED
	VALUE CDATA #IMPLIED
	>

<!ELEMENT TEXTAREA - - (#PCDATA)>
<!ATTLIST TEXTAREA
	NAME CDATA #REQUIRED
	ROWS NUMBER #REQUIRED -- @#implied? --
	COLS NUMBER #REQUIRED
	>
]]>

<!-- Obsolete Elements  -->

<![ %HTML.litCDATA [ <!ENTITY % lit-content "CDATA"> ]]>
<!ENTITY % lit-content "RCDATA">
<!ELEMENT (%literal) - -  %lit-content>

<![ %HTML.PLAINTEXT [
<!ELEMENT PLAINTEXT - O EMPTY>
]]>
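
As an illustration only, the #REQUIRED attributes declared above (ACTION on FORM; NAME on SELECT; NAME, ROWS and COLS on TEXTAREA; HREF on BASE; N on NEXTID) can be checked with a short program. The following Python sketch is not part of this specification and is not an SGML validator: it relies on the tolerant parser in Python's standard html.parser module, and the names REQUIRED, RequiredAttrChecker, missing_required and SAMPLE, as well as the example.org URI, are assumptions made for the example.

```python
from html.parser import HTMLParser

# Attributes declared #REQUIRED in the DTD above. Names are lowercased
# because html.parser lowercases tag and attribute names.
REQUIRED = {
    "form": {"action"},
    "select": {"name"},
    "textarea": {"name", "rows", "cols"},
    "base": {"href"},
    "nextid": {"n"},
}


class RequiredAttrChecker(HTMLParser):
    """Collects (tag, missing attribute) pairs; not a full SGML validator."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        present = {name for name, _ in attrs}
        for missing in sorted(REQUIRED.get(tag, set()) - present):
            self.problems.append((tag, missing))


def missing_required(document):
    """Return the required attributes that are absent from start tags."""
    checker = RequiredAttrChecker()
    checker.feed(document)
    checker.close()
    return checker.problems


# Hypothetical sample document; TEXTAREA deliberately lacks ROWS.
SAMPLE = """<TITLE>Feedback</TITLE>
<FORM ACTION="http://example.org/feedback" METHOD=POST>
<SELECT NAME="topic"><OPTION SELECTED>General<OPTION>Bug</SELECT>
<TEXTAREA NAME="comments" COLS=40></TEXTAREA>
<INPUT TYPE=SUBMIT VALUE="Send">
</FORM>"""

print(missing_required(SAMPLE))
```

The sample omits ROWS on TEXTAREA, so the sketch reports that one attribute as missing. A validating SGML parser working from the DTD itself would be expected to catch this and the many constraints the sketch ignores, such as content models and attribute value types.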

Security Considerations

Anchors, embedded images, and all other elements which contain URIs as parameters may cause the URI to be dereferenced, in which case the security considerations of the URI specification apply.
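
To make the scope of this concern concrete, the following Python sketch lists the URI-valued attributes that a client might dereference when processing a document. It is an illustration under stated assumptions, not a prescribed procedure: the table URI_ATTRS is read off the attributes declared with the URI type in the DTD (HREF on A, LINK and BASE; SRC on IMG and INPUT; ACTION on FORM), parsing uses Python's standard html.parser module rather than an SGML parser, and the names URICollector and collect_uris and the sample URIs are hypothetical.

```python
from html.parser import HTMLParser

# Element -> attribute declared with the URI type in the DTD.
# Names are lowercased because html.parser lowercases them.
URI_ATTRS = {
    "a": "href",
    "link": "href",
    "base": "href",
    "img": "src",
    "input": "src",
    "form": "action",
}


class URICollector(HTMLParser):
    """Records (element, URI) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        wanted = URI_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.uris.append((tag, value))


def collect_uris(document):
    """Return every URI-valued attribute found in the document."""
    collector = URICollector()
    collector.feed(document)
    collector.close()
    return collector.uris


# Hypothetical fragment with one anchor and one inline image.
sample = '<A HREF="http://example.org/next">Next</A> <IMG SRC="logo.gif" ALT="logo">'
for element, uri in collect_uris(sample):
    print(element, uri)
```

A client could use such a list to show the reader where a link leads before it is followed, or to decide which embedded objects to fetch automatically.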

Documents may be constructed whose visible contents mislead the reader into following a link to unsuitable or offensive material.

Acknowledgements

The HTML document type was designed initially at CERN in 1990 for the World-Wide Web project. The DTD was written, and the specification tightened up, by Dan Connolly, after much discussion on the network and some enhancement, in particular the addition of inline images introduced by the NCSA "Mosaic" software for WWW. The FORMS material is derived from the HTML+ specification with the help of Dave Raggett. This document is the work of many contributors. Many thanks to Erik Naggum and James Clark for making SGML technology available, and to Terry Allen, Dave Raggett, Marc Andreessen, William Perry, and the rest of the WWW community.

References

SGML: ISO 8879:1986, Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML).
sgmls: an SGML parser by James Clark <jjc@jclark.com>, derived from the ARCSGML parser materials which were written by Charles F. Goldfarb. The source is available on the ifi.uio.no FTP server in the directory /pub/SGML/SGMLS.
W3: The World-Wide Web, a global information initiative. For bootstrap information, telnet to info.cern.ch or find documents at ftp://ftp.w3.org/pub/www/doc
URI: Universal Resource Identifiers. RFCxxx. Currently available by anonymous FTP from info.cern.ch in /pub/www/doc/url*.{ps,txt}

Authors' addresses

				Daniel W. Connolly
		Affiliation:	HaL Software Systems
				Austin, TX
				USA
		email:		connolly@hal.com


				Tim Berners-Lee
		Address:	CERN
				1211 Geneva 23
				Switzerland
		Telephone: 	+41(22)767 3755
		Fax:       	+41(22)767 7155
		email:	   	timbl@info.cern.ch