An Application Conforming to International
Standard ISO 8879 -- Standard Generalized
Markup Language
About this Document
This document describes the current
practice and current proposals for
future standardisation of HTML, as
a basis for review and enhancement.
The document is a draft form of
a standard for interchange of information
on the network which is proposed
to be registered as a MIME (RFC1521)
content type.
Please send comments to connolly@hal.com
or the discussion list www-html@info.cern.ch.
Version
This is version 2.0 of this document.
It introduces forms for user input
of information. This feature is known
as a level 2 feature of HTML. All
other specified features are known
as level 1 features. Features of
higher levels which are under discussion
(such as tables, figures, and mathematical
formulae) are described as "proposed"
where they are mentioned.
The latest version of this document
is currently available in hypertext
on the World-Wide Web as http://www.w3.org/hypertext/WWW/MarkUp/HTML.html
Abstract
HyperText Markup Language (HTML)
can be used to represent
- Hypertext news, mail, online documentation,
and collaborative hypermedia;
- Menus of options;
- Database query results;
- Simple structured documents with
inlined graphics.
- Hypertext views of existing bodies
of information
The World Wide Web (W3) initiative
links related information throughout
the globe. HTML provides one simple
format for providing linked information,
and all W3 compatible programs are
required to be capable of handling
HTML. W3 uses an Internet protocol
(Hypertext Transfer Protocol, HTTP),
which allows transfer representations
to be negotiated between client and
server, the result being returned
in an extended MIME message. HTML
is therefore just one, but an important
one, of the representations used
with W3.
HTML is proposed as a MIME content
type.
HTML refers to the URI specification
RFCxxxx.
Implementations of HTML parsers and
generators can be found in the various
W3 servers and browsers, in the public
domain W3 code, and may also be built
using various public domain SGML
parsers such as [SGMLS]. HTML documents
are SGML documents with fairly generic
semantics appropriate for representing
information from a wide range of
applications.
Status of this memo
This document is an Internet Draft.
Internet Drafts are working documents
of the Internet Engineering Task
Force (IETF), its Areas, and its
Working Groups. Note that other
groups may also distribute working
documents as Internet Drafts.
Internet Drafts are working documents
valid for a maximum of six months.
Internet Drafts may be updated, replaced,
or obsoleted by other documents at
any time. It is not appropriate
to use Internet Drafts as reference
material or to cite them other than
as a "working draft" or "work in
progress".
Distribution of this document is
unlimited.
Vocabulary
This specification uses the words
below with the precise meaning given.
- Representation
- The encoding of information
for interchange. For example, HTML
is a representation of hypertext.
- Rendering
- The form of presentation
of information to the human reader.
Imperatives
- may
- The implementation is not obliged
to follow this in any way.
- must
- If this is not followed, the
implementation does not conform to
this specification.
- shall
- as "must"
- should
- If this is not followed, though
the implementation officially conforms
to the standard, undesirable results
may occur in practice.
- typical
- Typical rendering is described
for many elements. This is not a
mandatory part of the standard but
is given as guidance for designers
and to help explain the uses for
which the elements were intended.
Sections marked "Note:" are not mandatory
parts of the specification but for
guidance only.
Status of features
- Mandatory
- These features must be
implemented in the rendering. Features
are mandatory unless otherwise mentioned.
- Optional
- Standard HTML features which
may safely be ignored by parsers.
It is legal to ignore these and treat
the contents as though the tags were
not there (e.g. EM, and processing
instructions). Authors should be
aware that these features may be ignored
by some applications.
- Proposed
- The specification of these
features is not final. They should
not be regarded as part of the standard,
but indicate possible directions
for future versions.
- Obsolete
- Not standard HTML. Parsers
should implement these features as
far as possible in order to preserve
back-compatibility with previous
versions of this specification.
HTML and MIME
The definition of the HTML content
subtype is
- MIME Type name
- text
- MIME subtype name:
- html
- Required parameters:
- none
- Optional parameters:
- level, version,
charset
Level
The level parameter specifies the
feature set which is used in the
document. The level is an integer
number, implying that any features
of same or lower level may be present
in the document. Levels are defined
by this specification.
Version
In order to help avoid future compatibility
problems, the version parameter may
be used to give the version number
of this specification to which the
document conforms. The version
number appears at the front of this
document and within the public identifier
for the SGML DTD.
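As an illustration only, the following Python sketch reads the level, version and charset parameters from a Content-Type header value and applies the rule that a document of a given level may contain features of that level or lower. The function names are hypothetical, and returning None for an absent level is an assumption, since this section does not state a default.

```python
from email.message import Message


def html_media_params(content_type):
    """Return the optional text/html parameters from a Content-Type value.

    'level' is an int, or None when absent (an assumption: no default is
    specified); 'version' and 'charset' are strings or None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/html":
        raise ValueError("not a text/html type: " + content_type)
    level = msg.get_param("level")
    return {
        "level": int(level) if level is not None else None,
        "version": msg.get_param("version"),
        "charset": msg.get_param("charset"),
    }


def feature_permitted(doc_level, feature_level):
    """A level-N document may contain features of level N or lower."""
    return feature_level <= doc_level
```

For example, "text/html; level=2; version=2.0" yields level 2 and version "2.0", so a level 1 feature is permitted in that document while a level 2 feature would not be permitted in a level 1 document.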
Character sets
The base character set (the SGML
BASESET) for HTML is ISO Latin-1.
This is the set referred to by any
numeric character references. The
actual character set used in the
representation of an HTML document
may be ISO Latin-1, or its 7-bit
subset, ASCII. There is no
obligation for an HTML document to
contain any characters above decimal
127. A transport medium such as
electronic mail may impose constraints
on the number of bits in a
representation of a document, though
the HTTP access protocol used by W3
always allows 8-bit transfer.
When an HTML document is encoded
using 7-bit characters, then the
mechanisms of character references
and entity references may be used
to encode characters in the upper
half of the ISO Latin-1 set. In this
way, documents may be prepared which
are suitable for mailing through
7-bit limited systems.
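A minimal Python sketch of preparing a document for a 7-bit channel, assuming the document is held as a Python string: each character in the upper half of ISO Latin-1 is replaced by a decimal numeric character reference, and anything beyond ISO Latin-1 is rejected. The name to_7bit is hypothetical.

```python
def to_7bit(document):
    """Re-encode an HTML document (a str) as 7-bit ASCII text.

    Upper-half ISO Latin-1 characters become decimal numeric character
    references such as &#246;; ASCII text and markup are left unchanged.
    """
    for ch in document:
        if ord(ch) > 255:
            # Numeric references refer to the ISO Latin-1 base set, so
            # characters outside it cannot be encoded this way.
            raise ValueError("character outside ISO Latin-1: %r" % ch)
    return document.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Because the xmlcharrefreplace error handler only touches characters that ASCII cannot encode, existing markup such as <P> passes through unchanged.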
Character set option (proposed)
The SGML declaration specifies ISO
Latin-1 as the base character set.
The charset parameter is reserved
for future use. Its intended significance
is to override the base character
set of the SGML declaration. Support
of character sets other than ISO-Latin-1
is not a requirement for conformance
with this specification.
HTML and SGML
This section describes the relationship
between HTML and SGML, and guides
the newcomer through interpretation
of the DTD. (This is not a full
tutorial on SGML, and in the event
of any apparent conflict, the SGML
standard is definitive.)
The HyperText Markup Language is
an application conforming to International
Standard ISO 8879 -- Standard Generalized
Markup Language [ SGML ]. SGML is
a system for defining structured
document types, and markup languages
to represent instances of those document
types.
Every SGML document has three parts:
- An SGML declaration, which binds
SGML processing quantities and syntax
token names to specific values. For
example, the SGML declaration in
the HTML DTD specifies that the string
that opens an end tag is </ and the maximum
length of a name is 34 characters.
- A prologue including one or more
document type declarations, which
specify the element types, element
relationships and attributes, and
references that can be represented
by markup. The HTML DTD specifies,
for example, that the HEAD element
contains at most one TITLE element.
- An instance, which contains the data
and markup of the document.
We use the term HTML to mean both
the document type and the markup
language for representing instances
of that document type.
The SGML declaration for HTML is
given in the appendix ``SGML Declaration
for HTML.'' It is implicit among
WWW implementations.
The prologue for an HTML document
should look like:
<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">
- NOTE 2:
- Many extant HTML documents
do not contain a prologue. Implementations
are encouraged to infer the above
prologue if the document does not
begin with <!.
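The inference suggested in NOTE 2 could look like the following Python sketch. Skipping leading whitespace before the "<!" test is an assumption of the sketch, and the function name is hypothetical. A document that starts with a comment also begins with "<!", so under this rule it is treated as already having a prologue.

```python
HTML2_PROLOGUE = '<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">'


def ensure_prologue(document):
    """Infer the HTML 2.0 prologue when the document does not begin with "<!"."""
    # Ignoring leading whitespace before the test is an assumption.
    if document.lstrip().startswith("<!"):
        return document
    return HTML2_PROLOGUE + "\n" + document
```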
Structured Text
An HTML instance is like a text file,
except that some of the characters
are interpreted as markup. The markup
gives structure to the document.
The instance represents a hierarchy
of elements. Each element has a name,
some attributes, and some content.
Most elements are represented in
the document as a start tag, which
gives the name and attributes, followed
by the content, followed by the end
tag. For example:
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>
A sample HTML document
</TITLE>
</HEAD>
<BODY>
<H1>
An Example of Structure
<br>
In HTML
</H1>
<P>
Here's a typical paragraph.
<UL>
<LI>
Item one has an
<A NAME="anchor">
anchor
</A>
<LI>
Here's item two.
</UL>
</BODY>
</HTML>
Some elements (e.g. BR) are empty.
They have no content. They show up
as just a start tag.
For the rest of the elements, the
content is a sequence of data characters
and nested elements. Some things
such as forms and anchors cannot
be nested, in which case this is
mentioned in the text. Anchors and
character highlighting may be put
inside other constructs.
Most elements start and end with
tags. Empty elements have no end
tag. Start tags are delimited by
< and >, and end tags are delimited
by </ and >. For example:
<h1> ... </H1> <!-- uppercase = lowercase -->
<h1 > ... </h1 > <!-- spaces OK before > -->
The following are not valid tags:
< h1> <!-- this is not a tag at all -->
<H1/> <H=1> <!-- these are markup errors -->
- NOTE:
- The SGML declaration for HTML
specifies SHORTTAG YES, which means
that there are some other valid syntaxes
for tags, e.g. NET tags: <em/.../,
empty start tags: <>, and empty end
tags: </>. Until such time as support
for these idioms is widely deployed,
their use is strongly discouraged.
The start and end tags for the HTML,
HEAD, and BODY elements are omissible.
The end tags of some other elements
(e.g. P, LI, DT, DD) can be omitted
(see the DTD for details). This does
not change the document structure
-- the following documents are equivalent:
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<TITLE>Structural Example</TITLE>
<H1>Structural Example</H1>
<P>A paragraph...
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML><HEAD>
<TITLE>Structural Example</TITLE>
</HEAD>
<BODY>
<H1>Structural Example</H1>
<P>A paragraph...</P>
</BODY></HTML>
The element name immediately follows
the tag open delimiter. Names consist
of a letter followed by up to 33
letters, digits, periods, or hyphens.
Names are not case sensitive. For
example:
A H1 h1 another.name name-with-hyphens
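The naming rule can be checked with a regular expression. This Python sketch assumes that "letter" means an ASCII letter; the helper names are hypothetical.

```python
import re

# A letter followed by up to 33 letters, digits, periods, or hyphens
# (34 characters at most). "Letter" is assumed to mean an ASCII letter.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,33}")


def is_valid_name(s):
    return NAME_RE.fullmatch(s) is not None


def same_name(a, b):
    """Names are not case sensitive."""
    return a.upper() == b.upper()
```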
In a start tag, whitespace and attributes
are allowed between the element name
and the closing delimiter. An attribute
consists of a name, an equal sign,
and a value. Whitespace is allowed
around the equal sign.
The value is either:
- A string literal, delimited by single
quotes or double quotes, or
- A name token; that is, a sequence
of letters, digits, periods, or hyphens.
For example:
<A HREF="http://host/dir/file.html">
<A HREF=foo.html >
<IMG SRC="mrbill.gif" ALT="Mr. Bill says, &#34;Oh Noooo&#34;">
The length of an attribute value
(after replacing entity and numeric
character references) is limited
to 1024 characters.
- NOTE 1:
- Some implementations allowed
any character except space or '>'
in a name token, for example <A HREF=foo/bar.html>.
As a result, there are many documents
that contain attribute values that
should be quoted but are not. While
parser implementors are encouraged
to support this idiom, its use in
future documents is strictly prohibited.
- NOTE 2:
- Some implementations also
consider any occurrence of the > character
to signal the end of a tag. For compatibility
with such implementations, it may
be necessary to represent > with
an entity or numeric character reference;
for example: <IMG SRC="eq1.ps" ALT="a &#62; b">
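Following both notes, a generator can always emit attribute values as double-quoted string literals and replace &, ", < and > with decimal numeric character references. The Python sketch below does that and enforces the 1024-character limit on the unescaped value; escaping < as well is a choice of this sketch rather than a requirement stated here, and quote_attr is a hypothetical name.

```python
MAX_ATTR_LEN = 1024

# Decimal numeric character references for the characters that could
# otherwise end the literal or (for some parsers) the tag.
_ATTR_REFS = {"&": "&#38;", '"': "&#34;", "<": "&#60;", ">": "&#62;"}


def quote_attr(value):
    """Return value as a double-quoted attribute value literal."""
    # The limit applies after references are replaced, i.e. to the
    # unescaped value passed in here.
    if len(value) > MAX_ATTR_LEN:
        raise ValueError("attribute value longer than %d characters" % MAX_ATTR_LEN)
    return '"' + "".join(_ATTR_REFS.get(ch, ch) for ch in value) + '"'
```

For instance, '<IMG SRC="mrbill.gif" ALT=' + quote_attr('Mr. Bill says, "Oh Noooo"') + '>' produces a tag that even a parser ending tags at the first > reads correctly.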
Attributes with a declared value
of NAME (e.g. ISMAP, COMPACT) may
be written using a minimized syntax.
The markup:
<UL COMPACT="COMPACT">
can be written as
<UL COMPACT>
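Python's standard html.parser module reports a minimized attribute such as COMPACT with the value None, so a sketch of expanding the minimized form can fill in the attribute's own name. Note that html.parser is not an SGML parser: it lowercases element and attribute names and accepts any valueless attribute, whereas SGML permits this minimization only for attributes with declared NAME values. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Collect start tags, expanding minimized attributes such as COMPACT."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports <UL COMPACT> as ("compact", None); for
        # attributes such as COMPACT and ISMAP the full form repeats the
        # name as the value.
        expanded = {name: (name if value is None else value) for name, value in attrs}
        self.tags.append((tag, expanded))


def start_tags(markup):
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags
```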
Undefined tag and attribute names
It is a principle to be conservative
in that which one produces, and liberal
in that which one accepts. HTML
parsers should be liberal except
when verifying code. HTML generators
should generate strictly conforming
HTML.
The behaviour of WWW applications
reading HTML documents and discovering
tag or attribute names which they
do not understand should be to behave
as though, in the case of a tag,
the whole tag had not been there
but its content had, or in the case
of an attribute, that the attribute
had not been present.
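The rule for unknown names can be sketched with Python's html.parser. The KNOWN table below is a hypothetical, deliberately small subset of HTML used only for illustration; comments and declarations are simply discarded, and tag names are re-emitted in lowercase because html.parser normalizes them.

```python
from html.parser import HTMLParser

# Hypothetical table of the elements and attributes this client understands.
KNOWN = {
    "p": set(),
    "em": set(),
    "a": {"href", "name", "rel", "rev"},
}


def _quote(value):
    # Replace & first so the reference added for " is not escaped again.
    return '"' + value.replace("&", "&#38;").replace('"', "&#34;") + '"'


class LiberalFilter(HTMLParser):
    """Drop unknown tags (keeping their content) and unknown attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN:
            return  # behave as though the tag had not been there
        parts = [tag]
        for name, value in attrs:
            if name in KNOWN[tag]:  # unknown attributes are treated as absent
                parts.append(name if value is None else name + "=" + _quote(value))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(data)  # content is always kept

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")


def filter_unknown(markup):
    f = LiberalFilter()
    f.feed(markup)
    f.close()
    return "".join(f.out)
```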
The characters between the tags represent
text in the ISO-Latin-1 character
set, which is a superset of ASCII.
Because certain characters will be
interpreted as markup, they should
be "escaped"; that is, represented
by markup -- entity or numeric character
references. For example:
When a&lt;b, we can show that...
Brought to you by AT&amp;T
The HTML DTD includes entities for
each of the non-ASCII characters
so that one may reference them by
name if it is inconvenient to enter
them directly:
Kurt G&ouml;del was a famous logician and mathematician.
- NOTE 1:
- To ensure that a string of
characters has no markup, it is sufficient
to represent all occurrences of <,
>, and & by character or entity
references.
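A Python sketch of escaping character data along the lines of NOTE 1, which additionally uses named entity references for upper-half ISO Latin-1 characters. The names come from Python's html.entities.codepoint2name table (the HTML 4 names); that they match the entity names in the HTML DTD is an assumption of this sketch, which falls back to a numeric reference when no name is found.

```python
from html.entities import codepoint2name


def escape_text(text):
    """Escape character data so that it contains no markup.

    <, > and & become entity references (NOTE 1). Characters 128-255 use
    a named entity when Python knows one, otherwise a decimal reference.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "<>&":
            out.append("&%s;" % codepoint2name[code])  # lt, gt, amp
        elif 128 <= code <= 255:
            name = codepoint2name.get(code)
            out.append("&%s;" % name if name else "&#%d;" % code)
        elif code > 255:
            raise ValueError("character outside ISO Latin-1: %r" % ch)
        else:
            out.append(ch)
    return "".join(out)
```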
- NOTE 2:
- There are SGML features
(CDATA, RCDATA) to allow most <,
>, and & characters to be entered
without the use of entity or character
references. Because these features
tend to be used and implemented inconsistently,
and because they require 8-bit characters
to represent non-ASCII characters,
they are not employed in this version
of the HTML DTD. An earlier HTML
specification included an XMP element
whose syntax is not expressible in
SGML. Inside the XMP element, no markup
was recognized except the </XMP>
end tag. While implementations are
encouraged to support this idiom,
its use is obsolete.
Comments
To include comments in an HTML document
that will be ignored by the parser,
surround them with <!-- and -->.
After the comment delimiter, all
text up to the next occurrence of
-- is ignored. Hence comments cannot
be nested. Whitespace is allowed
between the closing -- and >. (But
not between the opening <! and --.)
For example:
<HEAD>
<TITLE>HTML Guide: Recommended Usage</TITLE>
<!-- Id: Text.html,v 1.6 1994/04/25 17:33:48 connolly Exp -->
</HEAD>
- Note 3:
- Some historical implementations
incorrectly consider a > sign to
terminate a comment.
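The comment rule above translates into a regular expression. In this Python sketch a ">" inside the comment does not end it, unlike the historical behaviour described in Note 3. A comment whose text contains a further "--" not followed by ">" is left untouched; full SGML comment declarations, which may hold several "-- ... --" pairs, are outside the scope of the sketch.

```python
import re

# "<!--", then text containing no "--", then "--", optional whitespace
# and ">". A ">" inside the comment text does not end the comment.
COMMENT_RE = re.compile(r"<!--(?:(?!--).)*--\s*>", re.DOTALL)


def strip_comments(markup):
    return COMMENT_RE.sub("", markup)
```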
This is a discussion of the elements
in the HTML language, and how they
interact to represent documents.
The HTML Document Element
An HTML document is organized as
a HEAD and a BODY, much like a memo
or a mail message:
HTML
|
|_head
|_body
The HEAD element is a small unordered
collection of information about the
document, whereas the BODY is an
ordered sequence of information elements
of arbitrary length. This organization
allows an implementation to determine
certain properties of a document
-- the title, for example -- without
parsing the entire document.
- TITLE
- The title of the document
- ISINDEX
- Sent by a server in a searchable
document
- NEXTID
- A parameter used by editors
to generate unique identifiers
- LINK
- Relationship between this document
and another. See also the Anchor
element, Relationships. A document
may have many LINK elements.
- BASE
- A record of the URL of the document
when saved
Proposed head elements
- EXPIRES
- The date after which the
document is invalid. Semantics as
in the HTTP specification.
Obsolete head elements
- META
- A wrapper for an HTTP element
The order of the contents of the
BODY element should be preserved
when it is rendered on the output
device.
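The HEAD-before-BODY organization is what lets a client read the title without parsing the entire document. The Python sketch below feeds a document to html.parser chunk by chunk and stops at the first start tag that cannot belong to the head. Because the HTML, HEAD and BODY tags may be omitted, it does not wait for </HEAD>; the set of head element names is taken from the lists above, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that may appear before the body starts: the head elements
# listed above, plus HTML and HEAD themselves.
HEAD_ELEMENTS = {"html", "head", "title", "isindex", "nextid", "link", "base", "meta"}


class _EndOfHead(Exception):
    pass


class TitleScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag not in HEAD_ELEMENTS:
            raise _EndOfHead  # BODY or a body element: the head is over

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            raise _EndOfHead

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_title(chunks):
    """Return (title, number of chunks read), stopping once the head ends."""
    scanner = TitleScanner()
    read = 0
    try:
        for chunk in chunks:
            read += 1
            scanner.feed(chunk)
    except _EndOfHead:
        pass
    return scanner.title.strip(), read
```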
Hypertext Anchors
- Anchors
- Sections of text which form
the beginning and/or end of hypertext
links are called "anchors" and defined
by the A tag.
Block Elements
These elements typically stack vertically
in the rendered flow of text. Whitespace
between them is ignored.
- Headings
- Several levels of heading
are supported.
- Paragraph
- The P element represents
a paragraph.
- Horizontal Rule
- A horizontal dividing
line
- Address style
- Used to represent authorship
or status of a document
- Blockquote style
- A block of text
quoted from another source.
- Lists
- Bulleted lists, glossaries,
etc.
- Preformatted text
- Sections in fixed-width
font for preformatted text.
Inline Elements
These elements fall left to right
in the rendered flow of text. Whitespace
between them separates words, except
in the PRE element, where it has
its literal ASCII meaning.
- Special Phrases
- Emphasis, typographic
distinctions, etc.
- Line Breaks
- Indicates a line break
in a flow of text.
- IMG
- The IMG tag allows inline graphics.
Body elements (level 2)
Elements for forms
The FORM element and various other
elements allowed only within it describe
forms which allow user input.
- FORM elements
- FORM, INPUT, SELECT,
OPTION, TEXTAREA, etc
The other elements are obsolete but
should be recognised by parsers for
back-compatibility.
HEAD
The HEAD element contains all information
about the document in general. It
does not contain any text which is
part of the document: this is in
the BODY. Within the HEAD element,
only certain elements are allowed.
BODY
The BODY element contains all the
information which is part of the
document, as opposed to information
about the document which is in the
HEAD.
The elements within the BODY element
are in the order in which they should
be presented to the reader.
See the list of things which are
allowed within a BODY element.
An anchor is a piece of text which
marks the beginning and/or the end
of a hypertext link.
The text between the opening tag
and the closing tag is either the
start or destination (or both) of
a link. Attributes of the anchor
tag are as follows.
- HREF
- OPTIONAL. If the HREF attribute
is present, the anchor is sensitive
text: the start of a link. If the
reader selects this text, (s)he should
be presented with another document
whose network address is defined
by the value of the HREF attribute.
The format of the network address
is specified elsewhere. This allows
for the form HREF="#identifier" to
refer to another anchor in the same
document. If the anchor is in another
document, the attribute is a relative
name, relative to the document's
address (or specified base address
if any). @@NOTE:
- This refers to the
URI specification, which does not
cover relative addresses. There is
no specification of how to distinguish
relative addresses from absolute
addresses.
- NAME
- OPTIONAL. If present, the attribute
NAME allows the anchor to be the
destination of a link. The value
of the attribute is an identifier
for the anchor. Identifiers are arbitrary
strings but must be unique within
the HTML document. Another document
can then make a reference explicitly
to this anchor by putting the identifier
after the address, separated by a
hash sign. @@NOTE:
- This feature
is representable in SGML as an ID
attribute, if we restrict the identifiers
to be SGML names .
- REL
- OPTIONAL. An attribute REL may
give the relationship(s) described
by the hypertext link. The value
is a comma-separated list of relationship
values. Values and their semantics
will be registered by the HTML registration
authority. The default relationship
if none other is given is void. REL
should not be present unless HREF
is present. See Relationship values,
REV.
- REV
- OPTIONAL. The same as REL , but
the semantics of the link type are
in the reverse direction. A link
from A to B with REL="X" expresses
the same relationship as a link from
B to A with REV="X". An anchor may
have both REL and REV attributes.
- URN
- OPTIONAL. If present, this specifies
a uniform resource name for the
document. See note.
- TITLE
- OPTIONAL. This is informational
only. If present the value of this
field should equal the value of the
TITLE of the document whose address
is given by the HREF attribute. See
note.
- METHODS
- OPTIONAL. The value of this
field is a string which if present
must be a comma-separated list of
HTTP METHODS supported by the object
for public use. See note.
All attributes are optional, although
one of NAME and HREF is necessary
for the anchor to be useful. See
also: LINK.
Example of use:
See <A HREF="http://www.w3.org/">CERN</A>'s information for
more details.
A <A NAME=serious>serious</A> crime is one which is associated
with imprisonment.
...
The Organization may refuse employment to anyone convicted
of a <a href="#serious">serious</A> crime.
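A document checker might verify two of the rules above: NAME identifiers are unique within the document, and every HREF of the form "#identifier" refers to an existing NAME. This is a Python sketch using html.parser; comparing identifiers case-sensitively is an assumption, since the text only calls them arbitrary strings, and the names used are hypothetical.

```python
from html.parser import HTMLParser


class AnchorChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.names = []       # NAME values, in document order
        self.local_refs = []  # identifiers from HREF="#identifier"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("name"):
            self.names.append(attrs["name"])
        href = attrs.get("href")
        if href and href.startswith("#"):
            self.local_refs.append(href[1:])


def check_anchors(markup):
    """Return (duplicate NAMEs, local HREFs with no matching NAME)."""
    checker = AnchorChecker()
    checker.feed(markup)
    checker.close()
    seen, duplicates = set(), []
    for name in checker.names:
        if name in seen:
            duplicates.append(name)
        seen.add(name)
    dangling = [ref for ref in checker.local_refs if ref not in seen]
    return duplicates, dangling
```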
ADDRESS
This element is for address information,
signatures, authorship, etc, often
at the top or bottom of a document.
Typical rendering
Typically, an address element is
italic and/or right justified or
indented. The address element implies
a paragraph break. Paragraph marks
within the address element do not
cause extra white space to be inserted.
Examples of use:
<ADDRESS><A HREF="Author.html">A.N.Other</A></ADDRESS>
<ADDRESS>
Newsletter editor<p>
J.R. Brown<p>
JimquickPost News, Jumquick, CT 01234<p>
Tel (123) 456 7890
</ADDRESS>
BASE
This element allows the URL of the
document itself to be recorded in
situations in which the document
may be read out of context. URLs
within the document may be in a "partial"
form relative to this base address.
Where the base address is not specified,
the reader will use the URL it used
to access the document to resolve
any relative URLs.
The one attribute is:
- HREF
- the URL
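The choice of base address can be sketched in Python with urllib.parse.urljoin. The resolution rules urljoin applies are those later standardized in RFC 3986; as the @@NOTE under the HREF attribute points out, this draft does not itself define relative addresses, so relying on them here is an assumption. resolve_href is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_href(href, retrieval_url, base_href=None):
    """Resolve a possibly partial URL found in a document.

    The BASE element's HREF, when present, takes precedence over the URL
    the reader used to retrieve the document.
    """
    base = base_href if base_href else retrieval_url
    return urljoin(base, href)
```

A copy saved as file:///tmp/copy.html with BASE HREF="http://host/dir/file.html" therefore still resolves "other.html" to http://host/dir/other.html.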
Line Break
The line break element marks that
a new line must be started at the
given point.
Typical rendering
A new line with indent the same as
that of line-wrapped text.
Examples
<ADDRESS>Tim Berners-Lee<BR>
World Wide Web project<BR>
CERN<BR>1211 Geneva 23<BR>Switzerland
</ADDRESS>
I think that I shall never see<BR>
A hoarding lovely as a tree<BR>
In fact, unless the hoardings fall<BR>
I'll never see a tree at all.<P>
See also:
Paragraph marks
BLOCKQUOTE
The BLOCKQUOTE element allows text
quoted from another source to be
rendered specially.
Typical rendering
A typical rendering might be a slight
extra left and right indent, and/or
italic font. BLOCKQUOTE causes a
paragraph break, and typically a
line or so of white space will be
allowed between it and any text before
or after it.
Single-font rendition may for example
put a vertical line of ">" characters
down the left margin to indicate
quotation in the Internet mail style.
Example
I think it ends
<BLOCKQUOTE>Soft you now, the fair Ophelia. Nymph, in thy orisons,
be all my sins remembered.
</BLOCKQUOTE>
but I am not sure.
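A sketch of the single-font, mail-style rendering in Python: the quoted text is refilled with the standard textwrap module and each line is prefixed with "> ". The 72-column default width and the function name are assumptions, not part of this specification.

```python
import textwrap


def render_quote(text, width=72):
    """Render BLOCKQUOTE content for a single-font display, mail style.

    Each output line starts with "> " and is at most `width` characters.
    """
    # Collapse runs of whitespace before refilling, as a browser would.
    lines = textwrap.wrap(" ".join(text.split()), width=width - 2)
    return "\n".join("> " + line for line in lines)
```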
Fill-out Forms and Input fields
Forms are composed by placing input
fields within paragraphs, preformatted/literal
text, lists and tables. This gives
considerable scope in designing the
layout of forms. The form features
use the following elements which
are all known as HTML level 2 elements.
- FORM
- a form within a document.
- INPUT
- one input field
- TEXTAREA
- a multi-line input field
- SELECT
- A selection from a finite
set of options
- OPTION
- one option within a SELECT
Each field is defined by an INPUT
element and must have a NAME attribute
which uniquely names the field in
the document. Additional optional
attributes can be used to specify
the type of the field (defaults to
free text), its size/precision, its
initial value and whether the field
is currently disabled or in error:
<FORM ACTION="mailto:www_admin@info.cern.ch">
<MH HIDDEN>Subject: WWW Questionnaire</MH>
Please help us to improve the World Wide Web by filling in the
following questionnaire:
<P>Your organization? <INPUT NAME="org" SIZE="48">
<P>Commercial? <INPUT NAME="commerce" TYPE=checkbox>
How many users? <INPUT NAME="users" TYPE=int>
<P>Which browsers do you use?
<UL>
<LI>X Mosaic <INPUT NAME="browsers" TYPE=checkbox VALUE="xmosaic">
<LI>Cello <INPUT NAME="browsers" TYPE=checkbox VALUE="cello">
<LI>Others <TEXTAREA NAME="others" COLS=48 ROWS=4></TEXTAREA>
</UL>
A contact point for your site: <INPUT NAME="contact" SIZE="42">
<P>Many thanks on behalf of the WWW central support team.
<P ALIGN=CENTER><INPUT TYPE=submit> <INPUT TYPE=reset>
</FORM>
This fictitious example is a questionnaire
that will be emailed to www_admin@info.cern.ch.
Servers can disable forms by sending
an appropriate header or by an attribute
on the optional HTMLPLUS element
at the very start of the document,
e.g. <htmlplus forms=off>.
Here, the <P> and <UL> elements have
been used to lay out the text (and
input fields). A browser may change
the background color within the FORM
element to distinguish the form from
other parts of the document. The
browser is responsible for handling
the input focus, i.e. which field
will currently get keyboard input.
For many platforms there will be
existing conventions for forms, e.g.
Tab and Shift-Tab keys to move the keyboard
focus forwards and backwards between
fields, while an Enter key submits the
form. In the example, the SUBMIT and RESET
buttons are specified explicitly with
special-purpose fields. The SUBMIT button
is used to email the form or send its contents
to the server as specified by the
ACTION attribute, while the RESET button
resets the fields to their initial
values. When the form consists of
a single text field, it may be appropriate
to leave such buttons out and rely
on the Enter key.
The INPUT element is used for a large
variety of types of input fields.
When you need to let users enter
more than one line of text, you should
use the TEXTAREA element.
The RADIO and CHECKBOX types of
INPUT field can be used to specify
multiple choice forms in which every
alternative is visible as part of
the form. An alternative is to use
the SELECT element which is generally
rendered in a more compact fashion
as a pull-down combo list.
FORM
The FORM element is used to delimit
the form. There can be several forms
in a single document, but the FORM
element can't be nested.
The ACTION attribute specifies a
URL that designates an HTTP server
or an email address. If missing,
the URL for the document itself will
be assumed. The effect of the action
can be modified by including a method
prefix, e.g. ACTION="POST http://....".
This prefix is used to select the
HTTP method when sending the form's
contents to an HTTP server. Would
it be cleaner to use a separate attribute,
e.g. METHOD?
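Splitting an ACTION value into its optional method prefix and URL might look like the Python sketch below. Treating the prefix as an all-uppercase word separated from the URL by whitespace is an assumption drawn from the single example above; because no default method is stated here, the sketch returns None when there is no prefix and leaves the choice to the caller. parse_action is a hypothetical name.

```python
def parse_action(action, document_url):
    """Split a FORM ACTION value into (method, url).

    method is None when no prefix is given; the default method is left
    to the caller, as this section does not define one.
    """
    if not action:
        # A missing ACTION means the URL of the document itself.
        return None, document_url
    parts = action.split(None, 1)
    if len(parts) == 2 and parts[0].isalpha() and parts[0].isupper():
        return parts[0], parts[1].strip()
    return None, action.strip()
```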
INPUT
The INPUT element represents a field
whose contents may be edited by the
user. It has the following attributes.
- NAME
- Symbolic name used when transferring
the form's contents. This attribute
is always needed and should uniquely
identify this field.
- TYPE
- Defines the type of data the
field accepts. Defaults to free text.
- SIZE
- Specifies the size or precision
of the field according to its type.
- MAXLENGTH
- The maximum number of characters
that will be accepted as input. This
can be greater than that specified by
SIZE, in which case the field will
scroll appropriately. The default
is unlimited.
- VALUE
- The initial value for the field,
or the value when checked for checkboxes
and radio buttons. This attribute
is required for radio buttons.
- SRC
- A URL or URN specifying an image,
for use with TYPE=IMAGE, SUBMIT, or RESET.
- ALIGN
- Vertical alignment of the image,
for use only with TYPE=IMAGE.
Proposed
- CHECKED
- When present indicates that
a checkbox or radio button is selected.
- DISABLED
- When present indicates that
this field is temporarily disabled.
Browsers should show this by "greying
it" out in some manner.
- ERROR
- When present indicates that
the field's initial value is in error
in some way, e.g. because it is inconsistent
with the values of other fields.
Servers should include an explanatory
error message with the form's text.
Types
The following types of fields can
be defined with the TYPE attribute:
- TEXT
- Single line text entry fields.
Use the SIZE attribute to specify
the visible width in characters,
e.g. SIZE="24" for a 24 character
field. The MAXLENGTH attribute can be used
to specify an upper limit to the
number of characters that can be
entered into a text field, e.g. MAXLENGTH=72.
Use the TEXTAREA element for text
fields which can accept multiple
lines (see below).
- HIDDEN
- No field is presented to the
user, but the content of the field
is sent with the submitted form.
This value may be used to transmit
state information about client/server
interaction.
- CHECKBOX
- Used for simple Boolean
attributes, or for attributes which
can take multiple values at the same
time. The latter is represented by
a number of checkbox fields each
of which has the same NAME.
- RADIO
- For attributes which can take
a single value from a set of alternatives.
Each radio button field in the group
should be given the same NAME.
- SUBMIT
- This is a button that when
pressed submits the form. It offers
authors control over the location
of this button. You can use an image
as a submit button by specifying
a URL with the SRC attribute.
- RESET
- This is a button that when
pressed resets the form's fields
to their initial values as specified
by the VALUE attribute. You can use
an image as a reset button by specifying
a URL with the SRC attribute.
Proposed types
- RANGE
- This allows you to specify
an integer range with the MIN and
MAX attributes, e.g. MIN=1 MAX=100.
Users can select any value in this
range.
- INT
- For entering integer numbers,
the maximum number of digits can
be specified with the SIZE attribute
(excluding the sign character), e.g.
size=3 for a three digit number.
- FLOAT
- For fields which can accept
floating point numbers.
- SCRIBBLE
- A field upon which you can
write with a pen or mouse. The size
of the field in millimeters is given
as SIZE="width,height". The units
are absolute as they relate to the
dimensions of the human hand, rather
than pixels of varying resolution.
The scribble may involve time and
pressure data in addition to the
basic ink data. You can use scribble
for signatures or sketches. The field
can be initialised by setting the
SRC attribute to a URL which contains
the ink. The VALUE attribute
is ignored.
- AUDIO
- This provides a way of entering
spoken messages into a form. Browsers
might show an icon which when clicked
pops up a set of tape controls that
you can use to record and replay
messages. The initial message can
be set by specifying a URL with the
SRC attribute. The VALUE attribute
is ignored.
Obsolete types
- DATE
- Fields which can accept a recognized
date format.
- URL
- For fields which expect document
references as URLs or URNs.
- IMAGE
- This allows you to specify
an image field upon which you can
click with a pointing device. The
SRC and ALIGN attributes are exactly
the same as for the IMG and IMAGE
elements. The symbolic names for
the x and y coordinates of the click
event are formed by appending .x and .y
to the name given with the NAME attribute.
The VALUE attribute is ignored.
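The following Python sketch collects the name/value pairs a browser might send for the field types described above. Several details are assumptions not defined in this section and are marked in the comments: that unchecked boxes send nothing, that a checked CHECKBOX without a VALUE sends "on", and that SUBMIT and RESET fields contribute no pairs. The encoding used on the wire is not shown. The Field type and the field names "state" and "map" in the test are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    name: str
    type: str = "TEXT"          # TYPE defaults to free text
    value: Optional[str] = None
    checked: bool = False


def submitted_pairs(fields, click=None):
    """Return the (name, value) pairs a form would send.

    click is an optional (name, x, y) for an IMAGE field that was clicked.
    """
    pairs = []
    for f in fields:
        t = f.type.upper()
        if t in ("TEXT", "HIDDEN"):
            pairs.append((f.name, f.value or ""))
        elif t in ("CHECKBOX", "RADIO"):
            # Assumption: unchecked boxes send nothing, and a checked
            # CHECKBOX without VALUE sends "on" (not defined here).
            if f.checked:
                pairs.append((f.name, f.value if f.value is not None else "on"))
        elif t == "IMAGE" and click and click[0] == f.name:
            # Coordinates are sent as name.x and name.y.
            pairs.append((f.name + ".x", str(click[1])))
            pairs.append((f.name + ".y", str(click[2])))
        # SUBMIT, RESET and other types contribute nothing in this sketch.
    return pairs
```

Several CHECKBOX fields sharing one NAME each add their own pair, which is how a multi-valued attribute such as "browsers" in the questionnaire would be reported.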
OPTION
The OPTION element can take the following
attributes:
- SELECTED
- Indicates that this option
is initially selected.
- VALUE
- When present indicates the
value to be returned if this option
is chosen. The returned value defaults
to the contents of the option element.
Proposed attributes
- DISABLED
- When present indicates that
this option is temporarily disabled.
Browsers should show this by "greying
it"
The contents of the OPTION element
are presented to the user to represent
the option. They are used as the returned
value if the VALUE attribute is not
present.
SELECT
The SELECT element allows the user
to choose one of a set of alternatives
described by textual labels. Every
alternative is represented by an
OPTION element.
Attributes
- MULTIPLE
- The MULTIPLE attribute is
needed when users are allowed to
make several selections, e.g. <SELECT
MULTIPLE>.
Proposed attributes
- ERROR
- The ERROR attribute can be
used to indicate that the initial
selection is in error in some way,
e.g. because it is inconsistent with
the values of other fields.
Typical rendering
SELECT is typically rendered as a pull-down
or pop-up list.
Example
<SELECT NAME="flavor">
<OPTION>Vanilla
<OPTION>Strawberry
<OPTION>Rum and Raisin
<OPTION>Peach and Orange
</SELECT>
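The defaulting rule for OPTION values can be sketched in Python with the standard html.parser module. This is a minimal sketch, not a conforming SGML parser: the class OptionCollector is hypothetical, it only handles the omitted OPTION end tags used in the example, the VALUE="rr" option is invented for illustration, and collapsing white space in option text is an assumption about how contents are normalized.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the OPTIONs of a SELECT as (value, selected) pairs.

    OPTION end tags are omitted in the example, so an option's text
    runs until the next <OPTION> or the </SELECT> end tag.
    """

    def __init__(self):
        super().__init__()
        self.options = []       # list of (value, selected) tuples
        self._value = None      # VALUE attribute of the open option, if any
        self._text = None       # option content; None when no option is open
        self._selected = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names.
        if tag == "option":
            self._finish()
            attributes = dict(attrs)
            self._value = attributes.get("value")
            self._selected = "selected" in attributes
            self._text = ""

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._finish()

    def handle_data(self, data):
        if self._text is not None:
            self._text += data

    def _finish(self):
        if self._text is None:
            return
        content = " ".join(self._text.split())
        # The returned value defaults to the contents of the OPTION element.
        value = self._value if self._value is not None else content
        self.options.append((value, self._selected))
        self._text = None


example = """<SELECT NAME="flavor">
<OPTION>Vanilla
<OPTION SELECTED>Strawberry
<OPTION VALUE="rr">Rum and Raisin
</SELECT>"""
collector = OptionCollector()
collector.feed(example)
collector.close()
print(collector.options)
```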
TEXTAREA
When you need to let users enter
more than one line of text, you should
use the TEXTAREA element, e.g.
<TEXTAREA NAME="address" ROWS=6 COLS=64>
Hewlett Packard Laboratories
1501 Page Mill Road
Palo Alto, California 94304-1126
</TEXTAREA>
The text up to the end tag is used
to initialize the field's value.
This end tag is always required even
if the field is initially blank.
The ROWS and COLS attributes determine
the visible dimensions of the field
in characters. Browsers are recommended
to allow text to grow beyond these
limits by scrolling as needed. In
the initial design for forms, multi-line
text fields were supported by the
INPUT element with TYPE=TEXT. Unfortunately,
this caused problems for fields with
long text values, as SGML limits the
length of attribute literals. The
HTML+ DTD allows for up to 1024 characters
(the SGML default is only 240 characters!).
Headings
Six levels of heading are supported.
(Note that a hypertext node within
a hypertext work tends to need fewer
levels of heading than a work whose
only structure is given by the nesting
of headings.)
A heading element implies all the
font changes, paragraph breaks before
and after, white space and so on
necessary to render the heading.
No further character emphasis or paragraph
marks are required in HTML.
H1 is the highest level of heading,
and is recommended for the start
of a hypertext node. It is suggested
that the text of the first heading
be suitable for a reader who is already
browsing in related information,
in contrast to the title tag which
should identify the node in a wider
context.
The heading elements are
<H1>, <H2>, <H3>, <H4>, <H5>, <H6>
It is not normal practice to jump
from one heading to a heading level
more than one below, for example
to follow an H1 with an H3. Although
this is legal, it is discouraged,
as it may produce strange results
for example when generating other
representations from the HTML.
Example:
<H1>This is a heading</H1>
Here is some text
<H2>Second level heading</H2>
Here is some more text.
Parser Note:
Parsers should not require any specific
order to heading elements, even if
the heading level increases by more
than one between successive headings.
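The practice above can be checked with a lint-style sketch like the following, which only reports skipped levels instead of rejecting them, in line with the parser note. The regular-expression approach and the function name skipped_levels are assumptions for illustration; a real tool would use an SGML parser rather than pattern matching.

```python
import re

# Matches heading start tags such as <H1> or <h3 ...>, but not </H1> or <HR>.
HEADING_TAG = re.compile(r"<h([1-6])\b", re.IGNORECASE)


def skipped_levels(html_text):
    """Return (previous, current) level pairs where a heading is more
    than one level below the heading before it."""
    problems = []
    previous = None
    for match in HEADING_TAG.finditer(html_text):
        level = int(match.group(1))
        if previous is not None and level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems


doc = "<H1>Title</H1>\n<H3>Too deep</H3>\n<H2>Fine</H2>"
print(skipped_levels(doc))  # [(1, 3)]
```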
Typical Rendering
- H1
- Bold very large font, centered.
One or two lines clear space between
this and anything following. If printed
on paper, start new page.
- H2
- Bold, large font, flush left
against left margin, no indent. One
or two clear lines above and below.
- H3
- Italic, large font, slightly indented
from the left margin. One or two
clear lines above and below.
- H4
- Bold, normal font, indented more
than H3. One clear line above and
below.
- H5
- Italic, normal font, indented
as H4. One clear line above.
- H6
- Bold, indented same as normal
text, more than H5. One clear line
above.
These typical values are just an
indication, and it is up to the designer
of the presentation software to define
the styles. The reader may have options
to customize these. When writing
documents, you should assume that
whatever rendering is used is designed to
have the same sort of effect as the
styles above.
The rendering software is responsible
for generating suitable vertical
white space between elements, so
it is NOT normal or required to follow
a heading element with a paragraph
mark.
Horizontal Rule
Typical Rendering
Some sort of divider between sections
of text such as a full width horizontal
rule or equivalent graphic.
Example
The horizontal rule is typically
used for separating heading information
(when more than just a heading) from
content, etc.
<H1>The Albatross</H1>
<Address>The Bumstead Monthly, 1948</Address>
The following information is culled from
this and successive issues of the magazine.
Thanks are due to the editor-in-chief,
A.R. Bunstead, for her help and advice.
<H2>Copyright IQR Inc.</h2>
This recording may not be sold, resold,
hired out, used, or talked about in too great
a depth without the publisher's written or
videotaped consent.
<HR>
The Albatross, most fabled and infamous of ..
IMG: Embedded Images
Status: Extra
The IMG element allows another document
to be inserted inline. The document
is normally an icon or small graphic,
etc. This element is NOT intended
for embedding other HTML text.
Browsers which are not able to display
inline images ignore IMG elements.
Authors should note that some browsers
will be able to display (or print)
linked graphics but not inline graphics.
If the graphic is essential, it may
be wiser to make a link to it rather
than to put it inline. If the graphic
is essentially decorative, then IMG
is appropriate.
The IMG element is empty: it has
no closing tag. It has three attributes:
- SRC
- The value of this attribute is
the URL of the document to be embedded.
Its syntax is the same as that of
the HREF attribute of the A tag.
SRC is mandatory.
- ALIGN
- Takes the value TOP, MIDDLE or
BOTTOM, defining whether the tops,
middles or bottoms of the graphics
and text should be aligned vertically.
- ALT
- Optional alternative text as
an alternative to the graphics for
display in text-only environments.
Note that IMG elements are allowed
within anchors.
Example
<IMG SRC="triangle.gif" ALT="Warning:"> This must be done by a
qualified technician.
<A HREF="Go.html"><IMG SRC="Button.ps" ALT="GO"></A>
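To show how ALT is meant to be used, here is a hedged Python sketch of a hypothetical text-only renderer. It substitutes each IMG with its ALT text and drops images that have no ALT, matching the rule that browsers unable to display inline images ignore IMG elements. TextOnlyRenderer and render_text are illustrative names, and white space is simply collapsed rather than properly filled.

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """Collect character data, replacing each IMG with its ALT text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # no ALT: the image is simply ignored
                self.parts.append(alt)

    def handle_data(self, data):
        self.parts.append(data)


def render_text(html_text):
    renderer = TextOnlyRenderer()
    renderer.feed(html_text)
    renderer.close()
    # Collapse white space as a very simple stand-in for line filling.
    return " ".join("".join(renderer.parts).split())


print(render_text('<A HREF="Go.html"><IMG SRC="Button.ps" ALT="GO"></A>'))  # GO
```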
ISINDEX
This element informs the reader that
the document is an index document.
As well as reading it, the reader
may use a keyword search.
The node may be queried with a keyword
search by suffixing the node address
with a question mark, followed by
a list of keywords separated by plus
signs. See the network address format.
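The query address described above can be sketched as follows. This assumes the user types keywords separated by white space and that each keyword is percent-escaped; that escaping is an assumption, since this section does not say how reserved characters inside keywords are handled. The function name isindex_query_url and the example.com address are hypothetical.

```python
from urllib.parse import quote


def isindex_query_url(node_address, search_text):
    """Build the address used to query an index document.

    Keywords are separated by white space in search_text and joined with
    '+' after a question mark. Percent-escaping each keyword is an
    assumption made here so that '+' or '?' inside a keyword survives.
    """
    keywords = search_text.split()
    return node_address + "?" + "+".join(quote(k, safe="") for k in keywords)


print(isindex_query_url("http://www.example.com/catalog", "maple syrup"))
# http://www.example.com/catalog?maple+syrup
```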
Note that this tag is normally generated
automatically by a server. If it
is added by hand to an HTML document,
then the client will assume that
the server can handle a search on
the document. Obviously the server
must have this capability for it
to work: simply adding <ISINDEX>
in the document is not enough to
make searches happen if the server
does not have a search engine!
Status: standard.
Example of use:
<ISINDEX>
LINK
The LINK element occurs within the
HEAD element of an HTML document.
It is used to indicate a relationship
between the document and some other
object. A document may have any number
of LINK elements.
The LINK element is empty, but takes
the same attributes as the anchor
element.
Typical uses are to indicate authorship,
related indexes and glossaries, older
or more recent versions, etc. Links
can indicate a static tree structure
in which the document was authored
by pointing to a "parent" and "next"
and "previous" document, for example.
Servers may also allow links to be
added by those who do not have the
right to alter the body of a document.
Forms of list in HTML
These lists may be nested.
A glossary (or definition list) is
a list of paragraphs each of which
has a short title alongside it. Apart
from glossaries, the DL element is
useful for presenting a set of named
elements to the reader. The entries
within a glossary are introduced
by these elements:
- DT
- The "term", typically placed in
a wide left indent
- DD
- The "definition", which may wrap
onto many lines
These elements normally appear in pairs.
A DT without a following DD is allowed,
and has the same significance as if
the DD had been present with no text.
The one attribute which DL can take is
- COMPACT
- suggests that a compact rendering
be used, because the enclosed elements
are individually small, or the whole
glossary is rather large, or both.
Typical rendering
The definition list DT, DD pairs
are arranged vertically. For each
pair, the DT element is on the left,
in a column of about a third of the
display area, and the DD element
is in the right hand two thirds of
the display area. The DT term is
normally small enough to fit on one
line within the left-hand column.
If it is longer, it will either extend
across the page, in which case the
DD section is moved down to separate
them, or it is wrapped onto successive
lines of the left hand column.
This is sometimes implemented with
the use of a large negative first
line indent.
White space is typically left between
successive DT,DD pairs unless the
COMPACT attribute is given. The COMPACT
attribute is appropriate for lists
which are long and/or have DT,DD
pairs which each take only a line
or two. It is of course possible
for the rendering software to discover
these cases itself and make its own
decisions, and this is to be encouraged.
The COMPACT attribute may also reduce
the width of the left-hand (DT) column.
<DL>
<DT>Term the first<DD>definition paragraph is reasonably
long but is still displayed clearly
<DT>Term2 follows<DD>Definition of term2
</DL>
<DL COMPACT>
<DT>Term<DD>definition paragraph
<DT>Term2<DD>Definition of term2
</DL>
Lists
A list is a sequence of paragraphs,
each of which may be preceded by
a special mark or sequence number.
The syntax is:
<UL>
<LI> list element
<LI> another list element ...
</UL>
The opening list tag may be any of
UL, OL, MENU or DIR. It must be
immediately followed by the first
list element.
Typical rendering
The representation of the list is
not defined here, but a bulleted
list for unordered lists, and a sequence
of numbered paragraphs for an ordered
list would be quite appropriate.
Other possibilities for interactive
display include embedded scrollable
browse panels.
List elements with typical rendering
are:
- UL
- A list of multi-line paragraphs,
typically separated by some white
space and/or marked by bullets, etc.
- OL
- As UL, but the paragraphs are
typically numbered in some way to
indicate that the order is significant.
- MENU
- A list of smaller paragraphs.
Typically one line per item, with
a style more compact than UL.
- DIR
- A list of short elements, typically
less than 20 characters. These may
be arranged in columns across the
page, typically 24 characters wide.
If the rendering software is able
to optimize the column width as a function
of the widths of individual elements,
so much the better.
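The DIR rendering suggestion can be sketched in Python as a simple column layout. The 80-character page width, the two-space gutter used when optimizing, and the function name layout_dir are all assumptions for illustration.

```python
def layout_dir(items, page_width=80, column_width=24, optimize=True):
    """Arrange short DIR items in columns across the page.

    With optimize=False every column is column_width characters wide.
    With optimize=True the width is taken from the widest item plus a
    two-space gutter, so more columns fit when the items are short.
    """
    if optimize and items:
        column_width = max(len(item) for item in items) + 2
    per_row = max(1, page_width // column_width)
    rows = []
    for start in range(0, len(items), per_row):
        row = items[start:start + per_row]
        rows.append("".join(item.ljust(column_width) for item in row).rstrip())
    return rows


for line in layout_dir(["A-H", "I-M", "M-R", "S-Z"], optimize=False):
    print(line)
```

With the default 24-character columns, three items fit on an 80-character line; with optimization all four short items fit on one line.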
Example of use
<OL>
<LI> When you get to the station, leave
by the southern exit, on platform one.
<LI>Turn left to face toward the mountain
<LI>Walk for a mile or so until you reach the
"Asquith Arms" then
<LI>Wait and see...
</OL>
<MENU>
<LI>The oranges should be pressed fresh
<LI>The nuts may come from a packet
<LI>The gin must be good quality
</MENU>
<DIR>
<LI>A-H<LI>I-M
<LI>M-R<LI>S-Z
</DIR>
NEXTID
This tag takes a single attribute, N,
which is the next document-wide
numeric identifier to be allocated,
of the form z123.
When modifying a document, old anchor
ids should not be reused, as there
may be references stored elsewhere
which point to them. This is read
and generated by hypertext editors.
Human writers of HTML usually use
mnemonic alphabetical identifiers.
Browser software may ignore this
tag.
Example of use:
<NEXTID N=z27>
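The way an editor might consume and advance the NEXTID value can be sketched as below. It follows the rule given in the DTD comment for NEXTID (the numeric part is incremented, so Z67 becomes Z68); the function name allocate_id is hypothetical, and leading zeros in the number are not preserved.

```python
import re


def allocate_id(next_id):
    """Return (id_to_use, new_next_id) for a NEXTID value such as "z27".

    The alphabetic prefix is kept and the numeric part is incremented.
    """
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", next_id)
    if match is None:
        raise ValueError("unexpected NEXTID value: %r" % next_id)
    prefix, number = match.groups()
    return next_id, prefix + str(int(number) + 1)


anchor_id, next_value = allocate_id("z27")
print(anchor_id, next_value)  # z27 z28
```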
P: Paragraph
The empty P element represents a
paragraph. The exact rendering of
this (indentation, leading, etc)
is not defined here, and may be a
function of other tags, style sheets
etc.
You do NOT need to use <P> to put
white space around heading, list,
address or blockquote elements. It
is the responsibility of the rendering
software to generate that white space.
An empty paragraph has undefined
effect and should be avoided.
Typical rendering
Typically, paragraphs are surrounded
by a small vertical space (of a line
or half a line). This is not the
case (typically) within ADDRESS or
(ever) within PRE elements. With
some implementations, normal paragraphs
may have a small extra left indent
on the first line.
Examples of use
<h1>What to do</h1>
<p>This is one paragraph.<P>This is a second.
<P>
This is a third.
Bad example
<h1><P>What not to do</h1>
<address><p>I found that on my XYZ browser it looked prettier to
me if I put some paragraph tags</address>
<p>
<ul><p><li>Around lists, and
<li>Inside headings.
</ul>
<p>
<h2>None of the paragraph tags in this example should
be there.</h2>
See also: Line Break.
PRE: Preformatted Text
Preformatted elements in HTML are
displayed with text in a fixed width
font, and so are suitable for text
which has been formatted for a teletype
by some existing formatting system.
The optional attribute is:
- WIDTH
- This attribute gives the maximum
number of characters which will occur
on a line. It allows the presentation
system to select a suitable font
and indentation. Where the WIDTH
attribute is not recognized, it is
recommended that a width of 80 be
assumed. Where WIDTH is supported,
it is recommended that at least widths
of 40, 80 and 132 characters be presented
optimally, with other widths being
rounded up.
Within a PRE element,
- Line boundaries within the text are
rendered as a move to the beginning
of the next line, except for one
immediately following or immediately
preceding a tag.
- The <p> tag should not be used. If
found, it should be rendered as a
move to the beginning of the next
line.
- Anchor elements and character highlighting
elements may be used.
- Elements which define paragraph formatting
(Headings, Address, etc) must not
be used.
- The ASCII Horizontal Tab (HT) character
must be interpreted as the smallest
positive nonzero number of spaces
which will leave the number of characters
so far on the line as a multiple
of 8. Its use is not recommended
however.
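The tab rule above can be written out as a short Python function. It handles a single line; for such a line it gives the same result as Python's built-in str.expandtabs(8), which the last line uses as a cross-check.

```python
def expand_tabs(line):
    """Replace each HT with the smallest positive number of spaces that
    makes the count of characters so far on the line a multiple of 8."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            spaces = 8 - column % 8  # between 1 and 8, never 0
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)


sample = "ab\tc"
print(repr(expand_tabs(sample)))                    # 'ab      c'
print(expand_tabs(sample) == sample.expandtabs(8))  # True
```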
Example of use
<PRE WIDTH="80">
This is an example line
</PRE>
Note: Highlighting
Within a preformatted element, the
constraint that the rendering must
be on a fixed horizontal character
pitch may limit or prevent the ability
of the renderer to render highlighting
elements specially.
Note: Margins
The above references to the "beginning
of a new line" must not be taken
as implying that the renderer is
forbidden from using a (constant)
left indent for rendering preformatted
text. The left indent may of course
be constrained by the width required.
The title of a document is specified
by the TITLE element. The TITLE element
must occur in the HEAD of the document.
There may only be one title in any
document. It should identify the
content of the document in a fairly
wide context.
It may not contain anchors, paragraph
marks, or highlighting. The title
may be used to identify the node
in a history list, to label the window
displaying the node, etc. It is not
normally displayed in the text of
a document itself. Contrast titles
with headings. The title should
ideally be less than 64 characters
in length. This is because many applications
will display document titles in window
titles, menus, etc., where there is
only limited room. Whilst there is
no limit on the length of a title
(as it may be automatically generated
from other data), information providers
are warned that it may be truncated
if long.
Examples of use
Appropriate titles might be
<TITLE>Rivest and Neuman. 1989(b)</TITLE>
or
<TITLE>A Recipe for Maple Syrup Flap-Jack</TITLE>
or
<TITLE>Introduction -- AFS user's Guide</TITLE>
Examples of inappropriate titles
are those which are only meaningful
within context,
<TITLE>Introduction</TITLE>
or too long,
<TITLE>Remarks on the Quantum-Gravity effects of "Bean
Pole" diversification in Mononucleosis patients in Developing
Countries under Economic Conditions Prevalent during
the Second half of the Twentieth Century, and Related Papers:
a Summary</TITLE>
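As a sketch of why short titles are preferred, the following hypothetical helper shows how a browser might fit a title into a 64-character window label, cutting at a word boundary and appending an ellipsis. The limit, the ellipsis and the function name window_label are assumptions; this specification only warns that long titles may be truncated.

```python
def window_label(title, limit=64):
    """Fit a TITLE into `limit` characters, cutting at a word boundary
    and appending '...' when the title has to be truncated."""
    title = " ".join(title.split())  # collapse line breaks and runs of spaces
    if len(title) <= limit:
        return title
    cut = title[: limit - 3]
    if " " in cut:
        cut = cut[: cut.rindex(" ")]
    return cut + "..."


print(window_label("A Recipe for Maple Syrup Flap-Jack"))
```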
Character highlighting
Status: Extra
These elements allow sections of
text to be formatted in a particular
way, to provide emphasis, etc. The
tags do NOT cause a paragraph break,
and may be used on sections of text
within paragraphs.
Where not supported by implementations,
like all tags, these tags should
be ignored but the content rendered.
All these tags have related closing
tags, as in
This is <EM>emphasized</EM> text.
Some of these styles are more explicit
than others about how they should
be physically represented. The logical
styles should be used wherever possible,
unless for example it is necessary
to refer to the formatting in the
text. (For example, "The italic parts are
mandatory.")
Note:
Browsers unable to display a specified
style may render it in some alternative,
or the default, style, with some
loss of quality for the reader. Some
implementations may ignore these
tags altogether, so information providers
should attempt not to rely on them
as essential to the information content.
These element names are derived from
Texinfo macro names.
Physical styles
- TT
- Fixed-width typewriter font.
- B
- Boldface, where available, otherwise
alternative mapping allowed.
- I
- Italic font (or slanted if italic
unavailable).
- U
- Underline.
Logical styles
- EM
- Emphasis, typically italic.
- STRONG
- Stronger emphasis, typically
bold.
- CODE
- Example of code; typically monospaced
font. (Do not confuse with PRE.)
- SAMP
- A sequence of literal characters.
- KBD
- Text typed by a user, for example
in an instruction manual.
- VAR
- A variable name.
- DFN
- The defining instance of a term.
Typically bold or bold italic.
- CITE
- A citation. Typically italic.
Examples of use
This text contains an <em>emphasized</em> word.
<strong>Don't assume</strong> that it will be italic!
It was made using the <CODE>EM</CODE> element. A citation is
typically italic and has no formal necessary structure:
<cite>Moby Dick</cite> is a book title.
Obsolete elements
The following elements of HTML are
obsolete. It is recommended that
client implementors implement the
obsolete forms for compatibility
with old servers.
PLAINTEXT
Status: Obsolete.
The empty PLAINTEXT tag terminates
the HTML entity. What follows is
not SGML. Instead, there's an old
HTTP convention that what follows
is an ASCII (MIME "text/plain") body.
An example of its use is:
<PLAINTEXT>
0001 This is line one of a long listing
0002 file from <any@host.inc.com> which is sent
This tag allows the rest of a file
to be read efficiently without parsing.
Its presence is an optimization.
There is no closing tag. The rest
of the data is not in SGML.
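A client could exploit this optimization by splitting the data at the PLAINTEXT tag, parsing only the part before it as HTML and treating the rest as text/plain. The sketch below uses a case-insensitive regular expression; it is a simplification (it would also match the tag inside a comment), and the function name split_at_plaintext is hypothetical.

```python
import re

# Case-insensitive match for the empty PLAINTEXT start tag.
PLAINTEXT_TAG = re.compile(r"<plaintext\s*>", re.IGNORECASE)


def split_at_plaintext(data):
    """Return (html_part, plain_part). plain_part is None when there is
    no PLAINTEXT tag; otherwise it is everything after the tag, unparsed."""
    match = PLAINTEXT_TAG.search(data)
    if match is None:
        return data, None
    return data[:match.start()], data[match.end():]


html_part, plain_part = split_at_plaintext(
    "<TITLE>Listing</TITLE>\n<PLAINTEXT>\n0001 <not a tag>\n")
print(repr(plain_part))  # '\n0001 <not a tag>\n'
```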
XMP and LISTING
Status: Obsolete. These are in use
and should be recognized by browsers.
New servers should use <PRE> instead.
These styles allow text of fixed-width
characters to be embedded absolutely
as is into the document. The syntax
is:
<LISTING>
...
</LISTING>
or
<XMP>
...
</XMP>
The text between these tags is to
be portrayed in a fixed width font,
so that any formatting done by character
spacing on successive lines will
be maintained. Between the opening
and closing tags:
- The text may contain any ISO Latin
printable characters, but not the
end tag opener. (See the historical
note below.)
- Line boundaries are significant,
except any occurring immediately
after the opening tag or before the
closing tag, and are to be rendered
as a move to the start of a new line.
- The ASCII Horizontal Tab (HT) character
must be interpreted as the smallest
positive nonzero number of spaces
which will leave the number of characters
so far on the line as a multiple
of 8. Its use is not recommended
however.
The LISTING element is portrayed
so that at least 132 characters will
fit on a line. The XMP element is
portrayed in a font so that at least
80 characters will fit on a line
but is otherwise identical to LISTING.
Highlighted Phrase HP1 etc
Status: Obsolete. These tags, like
all others, should be ignored if not
implemented. They have been replaced with
more meaningful elements; see Character
highlighting.
Examples of use:
<HP1>...</HP1> <HP2>... </HP2> etc.
Comment element
Status: Obsolete
A comment element used for bracketing
off unneeded text and comments has been
introduced in some browsers but will
be replaced by the SGML comment feature
in new implementations.
Historical Note: XMP and LISTING
The XMP and LISTING elements historically
had specifications that did not conform
to SGML, in that the text could contain
any ISO Latin printable characters,
including the tag opener, so long as
it did not contain the closing tag in full.
This form is not supported by SGML
and so is not the specified HTML
interpretation. Providers should
be warned that implementations may
vary on how they interpret end tags
apparently within these elements.
Entities
The following entity names are used
in HTML , always prefixed by ampersand
(&) and followed by a semicolon as
shown. They represent particular
graphic characters which have special
meanings in places in the markup,
or may not be part of the character
set available to the writer.
- &lt;
- The less than sign <
- &gt;
- The "greater than" sign >
- &amp;
- The ampersand sign & itself.
- &quot;
- The double quote sign "
- &nbsp;
- A non-breaking space
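When generating HTML, characters that would otherwise be taken as markup can be replaced with the entities above. The escape_text function below is a minimal sketch with a hypothetical name; note that the ampersand must be replaced first so that the other substitutions are not themselves re-escaped. Python's standard html.escape does something similar, though by default it also escapes the single quote as &#x27;.

```python
def escape_text(text, quote=True):
    """Replace characters that would otherwise be read as markup."""
    text = text.replace("&", "&amp;")  # first, so later entities are not re-escaped
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    if quote:  # needed inside double-quoted attribute values
        text = text.replace('"', "&quot;")
    return text


print(escape_text('if a < b && c > "d"'))
# if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```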
Also allowed are references to any
character of the ISO Latin-1 alphabet,
using the entity names in the following
list.
ISO Latin 1 character entities
This list is derived from "ISO 8879:1986//ENTITIES
Added Latin 1//EN".
- &AElig; (Æ)
- capital AE diphthong (ligature)
- &Aacute; (Á)
- capital A, acute accent
- &Acirc; (Â)
- capital A, circumflex accent
- &Agrave; (À)
- capital A, grave accent
- &Aring; (Å)
- capital A, ring
- &Atilde; (Ã)
- capital A, tilde
- &Auml; (Ä)
- capital A, dieresis or umlaut mark
- &Ccedil; (Ç)
- capital C, cedilla
- &ETH; (Ð)
- capital Eth, Icelandic
- &Eacute; (É)
- capital E, acute accent
- &Ecirc; (Ê)
- capital E, circumflex accent
- &Egrave; (È)
- capital E, grave accent
- &Euml; (Ë)
- capital E, dieresis or umlaut mark
- &Iacute; (Í)
- capital I, acute accent
- &Icirc; (Î)
- capital I, circumflex accent
- &Igrave; (Ì)
- capital I, grave accent
- &Iuml; (Ï)
- capital I, dieresis or umlaut mark
- &Ntilde; (Ñ)
- capital N, tilde
- &Oacute; (Ó)
- capital O, acute accent
- &Ocirc; (Ô)
- capital O, circumflex accent
- &Ograve; (Ò)
- capital O, grave accent
- &Oslash; (Ø)
- capital O, slash
- &Otilde; (Õ)
- capital O, tilde
- &Ouml; (Ö)
- capital O, dieresis or umlaut mark
- &THORN; (Þ)
- capital THORN, Icelandic
- &Uacute; (Ú)
- capital U, acute accent
- &Ucirc; (Û)
- capital U, circumflex accent
- &Ugrave; (Ù)
- capital U, grave accent
- &Uuml; (Ü)
- capital U, dieresis or umlaut mark
- &Yacute; (Ý)
- capital Y, acute accent
- &aacute; (á)
- small a, acute accent
- &acirc; (â)
- small a, circumflex accent
- &aelig; (æ)
- small ae diphthong (ligature)
- &agrave; (à)
- small a, grave accent
- &aring; (å)
- small a, ring
- &atilde; (ã)
- small a, tilde
- &auml; (ä)
- small a, dieresis or umlaut mark
- &ccedil; (ç)
- small c, cedilla
- &eacute; (é)
- small e, acute accent
- &ecirc; (ê)
- small e, circumflex accent
- &egrave; (è)
- small e, grave accent
- &eth; (ð)
- small eth, Icelandic
- &euml; (ë)
- small e, dieresis or umlaut mark
- &iacute; (í)
- small i, acute accent
- &icirc; (î)
- small i, circumflex accent
- &igrave; (ì)
- small i, grave accent
- &iuml; (ï)
- small i, dieresis or umlaut mark
- &ntilde; (ñ)
- small n, tilde
- &oacute; (ó)
- small o, acute accent
- &ocirc; (ô)
- small o, circumflex accent
- &ograve; (ò)
- small o, grave accent
- &oslash; (ø)
- small o, slash
- &otilde; (õ)
- small o, tilde
- &ouml; (ö)
- small o, dieresis or umlaut mark
- &szlig; (ß)
- small sharp s, German (sz ligature)
- &thorn; (þ)
- small thorn, Icelandic
- &uacute; (ú)
- small u, acute accent
- &ucirc; (û)
- small u, circumflex accent
- &ugrave; (ù)
- small u, grave accent
- &uuml; (ü)
- small u, dieresis or umlaut mark
- &yacute; (ý)
- small y, acute accent
- &yuml; (ÿ)
- small y, dieresis or umlaut mark
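The list above covers exactly the ISO Latin-1 code points 192 to 255, apart from the multiplication sign (215) and the division sign (247). A hedged sketch of converting such characters to entity references in Python might look like this. It uses the standard html.entities.codepoint2name table, whose names for these code points match the list above; to_entities is a hypothetical name.

```python
from html.entities import codepoint2name

# Code points covered by the list above: 192-255 except 215 and 247.
LISTED_CODEPOINTS = {cp for cp in range(192, 256) if cp not in (215, 247)}


def to_entities(text):
    """Replace each listed Latin-1 letter with its entity reference;
    all other characters are left unchanged."""
    out = []
    for ch in text:
        if ord(ch) in LISTED_CODEPOINTS:
            out.append("&" + codepoint2name[ord(ch)] + ";")
        else:
            out.append(ch)
    return "".join(out)


print(to_entities("Français"))  # Fran&ccedil;ais
```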
SGML Declaration for HTML
The SGML declaration of HTML follows.
Its relationship to the content
of an SGML document is explained
in the section "HTML and SGML".
<!SGML "ISO 8879:1986"
--
SGML Declaration for HyperText Markup Language (HTML)
as used by the World-Wide Web (WWW) application.
--
CHARSET
BASESET "ISO 646:1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
BASESET "ISO Registration Number 100//CHARSET
ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET 128 32 UNUSED
160 95 32
CAPACITY SGMLREF
TOTALCAP 150000
GRPCAP 150000
SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 127
BASESET "ISO 646:1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 128 0
FUNCTION
-- SPACE 32
TAB SEPCHAR 9
LF SEPCHAR 10
FF SEPCHAR 12
CR SEPCHAR 13 --
-- The above is an accurate description of the usage of FUNCTION --
-- characters in HTML implementations; that is, there is no --
-- Record Start or Record End character, and no occurrences of --
-- character 10 or 13 are "ignored" by the parser. --
-- But because few SGML implementations support this concrete --
-- syntax, we include the one below. --
-- Note that in order to get correct behaviour w.r.t. newline --
-- processing, you will have to play some tricks in constructing --
-- the document entity for parsing in order to keep the parser --
-- from ignoring newlines in surprising ways --
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING LCNMSTRT ""
UCNMSTRT ""
LCNMCHAR ".-"
UCNMCHAR ".-"
NAMECASE GENERAL YES
ENTITY NO
DELIM GENERAL SGMLREF
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
NAMELEN 34
TAGLVL 100
LITLEN 1024
GRPGTCNT 150
GRPCNT 64
FEATURES
MINIMIZE
DATATAG NO
OMITTAG YES
RANK NO
SHORTTAG YES
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL YES
APPINFO NONE
>
<!--
$Id: html.decl,v 1.6 1994/05/18 17:23:34 connolly Exp $
Author: Daniel W. Connolly <connolly@hal.com>
See also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->
The HTML DTD
The HTML DTD follows. Its relationship
to the content of an SGML document
is explained in the section "HTML
and SGML".
<!-- html.dtd
Document Type Definition for the HyperText Markup Language
as used by the World Wide Web (HTML DTD).
$Id: html.dtd,v 1.13 1994/05/18 17:23:29 connolly Exp $
Author: Daniel W. Connolly <connolly@hal.com>
See Also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->
<!ENTITY HTML.Version
"-//connolly hal.com//DTD WWW HTML $Date 1994/04/19 17:24:06 $//EN"
-- public identifier for "current practice" version --
-- actually, take the $'s out to get the real public identifier, --
-- since $ is illegal in public identifier. When DTD stabilizes, --
-- we'll need to stop using RCS keywords to version the pub id --
-- Typical usage:
<!DOCTYPE HTML PUBLIC "-//connolly hal.com//DTD WWW HTML
$Date: 1994/05/18 17:23:29 $//EN">
<html>
...
</html>
--
>
<!-- Feature Test Entities -->
<!-- To use these, write your document like:
<!DOCTYPE HTML [
<!ENTITY % HTML.Optional "INCLUDE">
<!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
%html;
]>
<TITLE>Here's my doc</TITLE>
<p>It uses lots of optional features
In practice, if you're using sgmls to validate your docs,
you can stick the <!DOCTYPE [...]> in a separate file and
validate with:
sgmls -s doctype.sgml foo.html
-->
<!ENTITY % HTML.Minimal "IGNORE">
<!ENTITY % HTML.Obsolete "IGNORE">
<!ENTITY % HTML.Prescriptive "IGNORE">
<![ %HTML.Minimal [
<!ENTITY % HTML.linkRelationships "IGNORE">
<!ENTITY % HTML.linkMethods "IGNORE">
<!ENTITY % HTML.linkRedundantInfo "IGNORE">
<!ENTITY % HTML.forms "IGNORE">
<!-- @@ nested lists -->
<!-- @@ phrases -->
<!-- @@ headers inside A -->
<!-- @@ nested phrases, fonts -->
]]>
<![ %HTML.Obsolete [
<!ENTITY % HTML.titleCDATA "INCLUDE">
<!ENTITY % HTML.litCDATA "INCLUDE">
<!ENTITY % HTML.pSeparator "INCLUDE">
]]>
<![ %HTML.Prescriptive [
<!--
This feature test entity prescribes that certain
idioms detract from the structural integrity of an
HTML document, and are therefore disallowed.
-->
<!ENTITY % HTML.font-phrase "IGNORE">
<!ENTITY % HTML.anchorNameCDATA "IGNORE">
<!ENTITY % HTML.PLAINTEXT "IGNORE">
<!ENTITY % HTML.bodyBlockOnly "INCLUDE">
]]>
<!ENTITY % HTML.bodyBlockOnly "IGNORE"
-- only allow block elements in the BODY element
This means all paragraphs need to start with a <P> tag.
-->
<!ENTITY % HTML.pSeparator "IGNORE"
-- use P element as paragraph separator, rather than container.
-->
<!ENTITY % HTML.linkRelationships "INCLUDE"
-- Adding markup to links to show the relationship between
ends of a link
see http://www.w3.org/hypertext/WWW/MarkUp/Relationships.html
-->
<!ENTITY % HTML.linkMethods "INCLUDE"
-- Adding markup to links to show the methods supported
by the referent object
see http://www.w3.org/hypertext/WWW/MarkUp/Elements/A.html
-->
<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
-- Adding markup to links to give redundant information
like URN, content type, title...
-->
<!ENTITY % HTML.anchorNameCDATA "INCLUDE"
-- Anchor names should be distinct. An SGML parser can validate
this if the NAME attribute of the A element is declared as ID.
But that restricts the syntax of an anchor name to an SGML name,
i.e. a letter followed by letters, numbers, periods and dashes,
up to NAMELEN (34) characters long.
-->
<!ENTITY % HTML.PLAINTEXT "INCLUDE"
-- Support for the <PLAINTEXT> tag as a sign of the
end of the HTML data stream and the beginning of a stream
of text/plain data
-->
<!ENTITY % HTML.titleCDATA "IGNORE"
-- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
On Mosaic, it's #PCDATA, but in the linemode browser,
it's more like CDATA, but not quite.
-->
<!ENTITY % HTML.NEXTID "INCLUDE"
-- Used by the NeXT implementation to keep track of the
next anchor id to use
-->
<!ENTITY % HTML.font-phrase "INCLUDE"
-- allow B, I, TT, U outside PRE,
CITE, VAR, etc. inside PRE
-->
<!ENTITY % HTML.KEY "IGNORE"
-- There was once a KEY element, for keyboard keys, menu items,
buttons, etc. but it's not supported or widely documented
-->
<!ENTITY % HTML.U "IGNORE"
-- There was also a U element, but since it clashes with
the common practice of underlining hypertext links, it is
not widely supported
-->
<!ENTITY % HTML.litCDATA "IGNORE"
-- treat XMP, LISTING as CDATA, as per linemodeWWW
-->
<!ENTITY % HTML.forms "INCLUDE"
-- Support for forms as per
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
-->
<!-- DTD definitions -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
<!ENTITY % list " UL | OL | DIR | MENU ">
<!ENTITY % literal " XMP | LISTING ">
<!ENTITY % URI "CDATA"
-- The term URI means a CDATA attribute
whose value is a Uniform Resource Identifier,
as defined by
"Universal Resource Identifiers" by Tim Berners-Lee
aka http://www.w3.org/hypertext/WWW/Addressing/URL/URI_Overview.html
Note that CDATA attributes are limited by the LITLEN
capacity (1024 in the current version of html.decl),
so that URIs in HTML have a bounded length.
@@ Need to discuss relative addresses.
-->
<!ENTITY % Content-Type "CDATA"
-- meaning a MIME content type, as per RFC1521
-->
<![ %HTML.anchorNameCDATA [ <!ENTITY % anchor-name "CDATA"> ]]>
<!ENTITY % anchor-name "ID">
<![ %HTML.linkRelationships [ <!ENTITY % linkRelAttrs "
REL CDATA #IMPLIED -- forward relationship type --
REV CDATA #IMPLIED -- reversed relationship type
to referent data:
PARENT CHILD, SIBLING, NEXT, TOP,
DEFINITION, UPDATE, ORIGINAL etc. --
"> ]]>
<!ENTITY % linkRelAttrs "">
<![ %HTML.linkRedundantInfo [ <!ENTITY % linkRedundantAttrs "
URN CDATA #IMPLIED -- universal resource number --
TITLE CDATA #IMPLIED -- advisory only --
"> ]]>
<!ENTITY % linkRedundantAttrs "">
<![ %HTML.linkMethods [ <!ENTITY % linkMethodAttrs "
METHODS NAMES #IMPLIED -- supported public methods of the object:
TEXTSEARCH, GET, HEAD, ... --
"> ]]>
<!ENTITY % linkMethodAttrs "">
<!ENTITY % linkattributes
"NAME %anchor-name #IMPLIED
HREF %URI; #IMPLIED
%linkRelAttrs;
%linkRedundantAttrs;
%linkMethodAttrs;
">
<!-- Document Element -->
<![ %HTML.PLAINTEXT [ <!ENTITY % obsolete-plaintext ", PLAINTEXT?"> ]]>
<!ENTITY % obsolete-plaintext "">
<!ENTITY % html-content "HEAD, BODY %obsolete-plaintext;">
<!ELEMENT HTML O O (%html-content)>
<![ %HTML.NEXTID [ <!ENTITY % head-content "TITLE? & ISINDEX? & LINK* & BASE?
& NEXTID?"> ]]>
<!ENTITY % head-content "TITLE & ISINDEX? & LINK* & BASE?">
<!ELEMENT HEAD O O (%head-content)>
<![ %HTML.titleCDATA [ <!ENTITY % title-content "CDATA"> ]]>
<!ENTITY % title-content "(#PCDATA)">
<!ELEMENT TITLE - - %title-content
-- The TITLE element is not considered part of the flow of text.
It should be displayed, for example as the page header or
window title.
-->
<!ELEMENT ISINDEX - O EMPTY
-- WWW clients should offer the option to perform a search on
documents containing ISINDEX.
-->
<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N %anchor-name #REQUIRED
-- The value should be a name suitable for use
as the ID of a new element. When used, the value
has its numeric part incremented, e.g. Z67 becomes Z68.
-->
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
%linkattributes>
<!ELEMENT BASE - O EMPTY -- Reference context for URIs -->
<!ATTLIST BASE
HREF %URI; #REQUIRED
>
<![ %HTML.KEY [
<!ENTITY % key-emph "| KEY">
]]>
<!ENTITY % key-emph "">
<![ %HTML.U [
<!ENTITY % u-font "| U">
]]>
<!ENTITY % u-font "">
<!ENTITY % font "TT | B | I %u-font">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | DFN | CITE
| STRIKE %key-emph">
<![ %HTML.font-phrase [
<!ENTITY % obsolete-font "| %font">
<!ENTITY % obsolete-phrase "| %phrase">
]]>
<!ENTITY % obsolete-font "">
<!ENTITY % obsolete-phrase "">
<![ %HTML.pSeparator [
<!ENTITY % obsolete-p "| P">
]]>
<!ENTITY % obsolete-p "">
<!ENTITY % inline "%phrase %obsolete-font">
<!ENTITY % pre-inline "%font %obsolete-phrase %obsolete-p">
<!ENTITY % text "#PCDATA | IMG | %inline | BR %obsolete-p">
<!ENTITY % htext "A | %text" -- Plus links, no structure -->
<![ %HTML.font-phrase [ <!ENTITY % font-content "(%htext)+"> ]]>
<!ENTITY % font-content "#PCDATA">
<!ELEMENT (%font;) - - (%font-content;)>
<!ELEMENT (%phrase;) - - (%htext)+>
<!ENTITY % pre "PRE | XMP | LISTING">
<![ %HTML.forms [ <!ENTITY % block-form "| FORM | ISINDEX"> ]]>
<!ENTITY % block-form "">
<![ %HTML.pSeparator [
<!ENTITY % obsolete-htext "| %htext">
<!ENTITY % block-p "">
]]>
<!ENTITY % obsolete-htext "| A">
<!ENTITY % block-p "| P ">
<!ENTITY % block "HR | %list | DL
| %pre | BLOCKQUOTE | ADDRESS
%block-form %block-p">
<![ %HTML.bodyBlockOnly [
<!ENTITY % current-htext "">
]]>
<!ENTITY % current-htext "| %htext">
<!ENTITY % body-content "%heading | %block %current-htext">
<!ELEMENT BODY O O (%body-content)*>
<!ELEMENT A - - (%heading|%block|%text)+ -(A)
-- @# Technically, this allows silliness like:
<H2><A>xyz<H1>h1</H1></A></H2>
The right way to do anchors outside of %htext is more like:
<as id=z1><H2>lkjlkj</h2><ae start=z1>
-->
<!ATTLIST A
%linkattributes;
>
<!ELEMENT IMG - O EMPTY -- Embedded image -->
<!ATTLIST IMG
SRC %URI; #IMPLIED -- URI of document to embed --
ALT CDATA #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
ISMAP (ISMAP) #IMPLIED
>
<![ %HTML.pSeparator [ <!ENTITY % p-content "EMPTY"> ]]>
<!ENTITY % p-content "(%htext)+">
<!ELEMENT P - O %p-content>
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ELEMENT BR - O EMPTY -- @# BR -> &br; -->
<!ELEMENT ( %heading ) - - (%htext;)+>
<!ELEMENT DL - - (DT*, DD?)+>
<!ATTLIST DL
COMPACT (COMPACT) #IMPLIED>
<!ELEMENT DT - O (%htext)+>
<!ELEMENT DD - O (%htext|%block)+>
<!ELEMENT (%list) - - (LI)+>
<!ELEMENT LI - O (%htext|%block)+>
<!ELEMENT BLOCKQUOTE - - (%htext|%block)+ -- @# Hmm... --
-- for quoting some other source -->
<!ELEMENT ADDRESS - - (%htext;|%block)+>
<!ELEMENT PRE - - (#PCDATA|%pre-inline|A)+>
<!ATTLIST PRE
WIDTH NUMBER #IMPLIED
>
<!-- Mnemonic character entities. -->
<!ENTITY % ISOlat1 PUBLIC
"ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1;
<!ENTITY #DEFAULT SDATA "&unknown;" --display the markup-->
<!ENTITY amp CDATA "&" -- ampersand -->
<!ENTITY gt CDATA ">" -- greater than -->
<!ENTITY lt CDATA "<" -- less than -->
<!ENTITY quot CDATA '"' -- double quote -->
<!-- Processing Entities -->
<!ENTITY nbsp "<? nonbreaking-space>">
<!-- @# should add entities for processing instructions
for line break, centering, etc. -->
<!-- Forms -->
<![ %HTML.forms [
<!ENTITY % HTTP-Method "(GET | POST)">
<!ELEMENT FORM - - (%body-content)* -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
ACTION %URI #REQUIRED
METHOD %HTTP-Method #IMPLIED -- @# MAILTO? --
ENCTYPE %Content-Type; #IMPLIED
>
<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
IMAGE | HIDDEN )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
TYPE %InputType #IMPLIED -- @# defaults to TEXT?? --
NAME CDATA #IMPLIED -- required for all but submit and reset --
VALUE CDATA #IMPLIED
SRC %URI #IMPLIED -- for image inputs --
CHECKED (CHECKED) #IMPLIED
SIZE CDATA #IMPLIED -- @# should be NUMBERS: delimit with space, not comma --
MAXLENGTH NUMBER #IMPLIED
ALIGN (top|middle|bottom|left|center|right) #IMPLIED --@#supported?--
>
<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
NAME CDATA #REQUIRED
SIZE NUMBER #IMPLIED
MULTIPLE (MULTIPLE) #IMPLIED
>
<!ELEMENT OPTION - O (#PCDATA)>
<!ATTLIST OPTION
SELECTED (SELECTED) #IMPLIED
VALUE CDATA #IMPLIED
>
<!ELEMENT TEXTAREA - - (#PCDATA)>
<!ATTLIST TEXTAREA
NAME CDATA #REQUIRED
ROWS NUMBER #REQUIRED -- @#implied? --
COLS NUMBER #REQUIRED
>
]]>
<!-- Obsolete Elements -->
<![ %HTML.litCDATA [ <!ENTITY % lit-content "CDATA"> ]]>
<!ENTITY % lit-content "RCDATA">
<!ELEMENT (%literal) - - %lit-content>
<![ %HTML.PLAINTEXT [
<!ELEMENT PLAINTEXT - O EMPTY>
]]>
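The NEXTID declaration in the DTD says that its N value is used as the ID of a new element and that, when used, the value has its numeric part incremented (Z67 becomes Z68). The DTD does not define "the numeric part" precisely, so the following minimal Python sketch assumes it is the trailing run of decimal digits and keeps any zero padding. The function name next_id and the generator workflow at the end are hypothetical, for illustration only.

```python
import re

# Assumption: a NEXTID value is a name-start letter, then any name
# characters, then a trailing run of decimal digits (e.g. "Z67").
NEXTID_PATTERN = re.compile(r"([A-Za-z][A-Za-z0-9.\-]*?)(\d+)")


def next_id(value: str) -> str:
    """Return the NEXTID value with its trailing number incremented.

    The digit width is preserved where possible ("Z007" -> "Z008") and
    grows only on overflow ("Z99" -> "Z100").
    """
    match = NEXTID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"NEXTID value {value!r} has no trailing numeric part")
    prefix, digits = match.groups()
    incremented = int(digits) + 1
    # Zero-pad to the original digit count; wider numbers are not truncated.
    return f"{prefix}{incremented:0{len(digits)}d}"


# Hypothetical generator workflow: use the current value as the ID of a
# new anchor, then store the incremented value back in NEXTID.
current = "Z67"
new_anchor_id = current
current = next_id(current)
print(f"assigned {new_anchor_id}, NEXTID is now {current}")
```

Raising an error on a value with no trailing digits, rather than guessing a numbering scheme, keeps a generator from silently issuing a duplicate ID.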
Security Considerations
Anchors, embedded images, and all
other elements which contain URIs
as parameters may cause the URI to
be dereferenced, in which case the
security considerations of the URI
specification apply.
Documents may be constructed whose
visible contents mislead the reader into
following a link to unsuitable or
offensive material.
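Both considerations depend on knowing which markup carries URIs. In the DTD these are the attributes declared with the %URI; type: HREF on A, LINK and BASE, SRC on IMG and INPUT, and ACTION on FORM. The minimal Python sketch below (class and variable names are hypothetical) lists those URIs before anything is dereferenced and pairs each anchor's visible text with its HREF, so a user agent could show the real destination. It assumes the standard library's html.parser, which is a lenient tag scanner rather than a validating SGML parser, and it does not resolve relative URIs against BASE.

```python
from html.parser import HTMLParser

# Element/attribute pairs declared with the %URI; type in the HTML DTD.
URI_ATTRIBUTES = {
    "a": {"href"},
    "link": {"href"},
    "base": {"href"},
    "img": {"src"},
    "input": {"src"},
    "form": {"action"},
}


class URIAttributeCollector(HTMLParser):
    """Hypothetical helper: record URIs a user agent might dereference,
    plus the visible text of each anchor next to its HREF."""

    def __init__(self):
        super().__init__()
        self.found = []           # (element, attribute, URI) triples
        self.anchors = []         # (visible text, HREF) pairs
        self._anchor_href = None
        self._anchor_text = None  # text chunks collected while inside <A>

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        wanted = URI_ATTRIBUTES.get(tag, set())
        for name, value in attrs:
            if name in wanted and value is not None:
                self.found.append((tag, name, value))
        if tag == "a":
            # The DTD excludes nested A elements, so one slot suffices.
            self._anchor_href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_data(self, data):
        if self._anchor_text is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_text is not None:
            # Collapse runs of whitespace in the visible link text.
            text = " ".join("".join(self._anchor_text).split())
            self.anchors.append((text, self._anchor_href))
            self._anchor_text = None
            self._anchor_href = None


sample = """<BASE HREF="http://example.org/">
<P>See <A HREF="http://example.org/report">the annual report</A>.
<IMG SRC="logo.gif" ALT="logo">
<FORM ACTION="/cgi-bin/query"><INPUT TYPE=image SRC="go.gif"></FORM>"""

collector = URIAttributeCollector()
collector.feed(sample)
collector.close()
for tag, attr, uri in collector.found:
    print(f"{tag.upper()} {attr.upper()}: {uri}")
for text, href in collector.anchors:
    print(f"link text {text!r} leads to {href}")
```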
Acknowledgements
The HTML document type was designed
initially at CERN in 1990 for the
World-Wide Web project. The DTD was
written, and the specification tightened
up, by Dan Connolly, after much discussion
on the network and some enhancement,
in particular the addition of inline
images introduced by the NCSA "Mosaic"
software for WWW. The FORMS material
is derived from the HTML+ specification
with the help of Dave Raggett. This
document is the work of many contributors.
Many thanks to Erik Naggum and James
Clark for making SGML technology
available, and to Terry Allen, Dave
Raggett, Marc Andreessen, William
Perry, and the rest of the WWW community.
References
- SGML
- ISO 8879:1986, Information Processing --
Text and Office Systems -- Standard
Generalized Markup Language (SGML).
- sgmls
- an SGML parser by James Clark
<jjc@jclark.com> derived from the
ARCSGML parser materials which were
written by Charles F. Goldfarb. The
source is available on the ifi.uio.no
FTP server in the directory /pub/SGML/SGMLS.
- W3
- The World-Wide Web, a global
information initiative. For bootstrap
information, telnet info.cern.ch
or find documents at ftp://ftp.w3.org/pub/www/doc
- URI
- Universal Resource Identifiers.
RFCxxx. Currently available by
anonymous FTP from info.cern.ch in
/pub/www/doc/url*.{ps,txt}
Author's addresses
Daniel W. Connolly
Affiliation: HaL Software Systems
Austin, TX
USA
email: connolly@hal.com
Tim Berners-Lee
Address: CERN
1211 Geneva 23
Switzerland
Telephone: +41(22)767 3755
Fax: +41(22)767 7155
email: timbl@info.cern.ch