Structured Text
An HTML instance is like a text file,
except that some of the characters
are interpreted as markup. The markup
gives structure to the document.
The instance represents a hierarchy
of elements. Each element has a name,
some attributes, and some content.
Most elements are represented in
the document as a start tag, which
gives the name and attributes, followed
by the content, followed by the end
tag. For example:
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>
A sample HTML document
</TITLE>
</HEAD>
<BODY>
<H1>
An Example of Structure
<br>
In HTML
</H1>
<P>
Here's a typical paragraph.
<UL>
<LI>
Item one has an
<A NAME="anchor">
anchor
</A>
<LI>
Here's item two.
</UL>
</BODY>
</HTML>
Some elements (e.g. BR) are empty.
They have no content. They show up
as just a start tag.
For the rest of the elements, the
content is a sequence of data characters
and nested elements. Some elements,
such as forms and anchors, cannot
be nested; the text notes each such
restriction. Anchors and
character highlighting may be placed
inside other constructs.
Most elements start and end with
tags. Empty elements have no end
tag. Start tags are delimited by
< and >, and end tags are delimited
by </ and >. For example:
<h1> ... </H1> <!-- uppercase = lowercase -->
<h1 > ... </h1 > <!-- spaces OK before > -->
The following are not valid tags:
< h1> <!-- this is not a tag at all -->
<H1/> <H=1> <!-- these are markup errors -->
- NOTE:
- The SGML declaration for HTML
specifies SHORTTAG YES, which means
that there are some other valid syntaxes
for tags, e.g. NET tags: <em/.../,
empty start tags: <>, and empty end
tags: </>. Until such time as support
for these idioms is widely deployed,
their use is strongly discouraged.
The start and end tags for the HTML,
HEAD, and BODY elements are omissible.
The end tags of some other elements
(e.g. P, LI, DT, DD) can be omitted
(see the DTD for details). This does
not change the document structure
-- the following documents are equivalent:
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<TITLE>Structural Example</TITLE>
<H1>Structural Example</H1>
<P>A paragraph...
<!DOCTYPE HTML PUBLIC
"-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML><HEAD>
<TITLE>Structural Example</TITLE>
</HEAD>
<BODY>
<H1>Structural Example</H1>
<P>A paragraph...</P>
</BODY>
</HTML>
The element name immediately follows
the tag open delimiter. Names consist
of a letter followed by up to 33
letters, digits, periods, or hyphens.
Names are not case sensitive. For
example:
A H1 h1 another.name name-with-hyphens
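The naming rule above can be sketched as a small check. This is only
an illustration, assuming the 33-character limit stated in the text;
NAME_RE, is_valid_name, and same_name are hypothetical names, not part
of any specification.

```python
import re

# Assumption: a letter followed by up to 33 letters, digits, periods,
# or hyphens, as stated in the paragraph above.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,33}")


def is_valid_name(name: str) -> bool:
    """Return True if name is a letter followed by up to 33 name characters."""
    return NAME_RE.fullmatch(name) is not None


def same_name(a: str, b: str) -> bool:
    """Names are not case sensitive, so compare them after upper-casing."""
    return a.upper() == b.upper()
```

For instance, is_valid_name("another.name") holds, while a name that
starts with a digit is rejected.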
In a start tag, whitespace and attributes
are allowed between the element name
and the closing delimiter. An attribute
consists of a name, an equal sign,
and a value. Whitespace is allowed
around the equal sign.
The value is either:
- A string literal, delimited by single
quotes or double quotes, or
- A name token; that is, a sequence
of letters, digits, periods, or hyphens.
For example:
<A HREF="http://host/dir/file.html">
<A HREF=foo.html >
<IMG SRC="mrbill.gif" ALT="Mr. Bill says, &#34;Oh Noooo&#34;">
The length of an attribute value
(after replacing entity and numeric
character references) is limited
to 1024 characters.
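As a rough sketch of how a generator might apply these rules, the
helper below writes a value as a bare name token when it qualifies and
as a double-quoted literal otherwise. The function and constant names
are hypothetical; the 1024-character limit and the name-token
character set are taken from the text above.

```python
import re

# Assumption: a name token is one or more letters, digits, periods,
# or hyphens, as described above.
NAME_TOKEN_RE = re.compile(r"[A-Za-z0-9.-]+")
MAX_VALUE_LENGTH = 1024


def format_attribute(name: str, value: str) -> str:
    """Render NAME=value, quoting the value unless it is a name token."""
    # The limit applies to the value after references are replaced,
    # which is the plain string passed in here.
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueError("attribute value exceeds 1024 characters")
    if NAME_TOKEN_RE.fullmatch(value):
        return f"{name}={value}"
    # Escape & first, then the double quote that delimits the literal.
    escaped = value.replace("&", "&amp;").replace('"', "&#34;")
    return f'{name}="{escaped}"'
```

A generator that prefers to be conservative could simply quote every
value; the name-token branch is shown only to mirror the two forms
listed above.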
- NOTE 1:
- Some implementations allowed
any character except space or '>'
in a name token, for example <A HREF=foo/bar.html>.
As a result, there are many documents
that contain attribute values that
should be quoted but are not. While
parser implementors are encouraged
to support this idiom, its use in
future documents is strictly prohibited.
- NOTE 2:
- Some implementations also
consider any occurrence of the > character
to signal the end of a tag. For compatibility
with such implementations, it may
be necessary to represent > with
an entity or numeric character reference;
for example: <IMG SRC="eq1.ps" ALT="a &gt; b">
Attributes with a declared value
of NAME (e.g. ISMAP, COMPACT) may
be written using a minimized syntax.
The markup:
<UL COMPACT="COMPACT">
can be written as
<UL COMPACT>
Undefined tag and attribute names
It is a principle to be conservative
in that which one produces, and liberal
in that which one accepts. HTML
parsers should be liberal except
when verifying code. HTML generators
should generate strictly conforming
HTML.
WWW applications that read HTML
documents and encounter tag or
attribute names they do not understand
should behave as though, in the case
of a tag, the tag itself had not been
there but its content had, or, in the
case of an attribute, the attribute
had not been present.
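The lenient behaviour described above can be sketched with Python's
standard html.parser module. This is a minimal illustration, not a
conforming SGML parser: the KNOWN table is a made-up subset of element
and attribute names, and LenientFilter and filter_unknown are
hypothetical names.

```python
import html
from html.parser import HTMLParser

# Assumption: a tiny illustrative subset of element and attribute
# names, not the full HTML DTD.
KNOWN = {"p": set(), "em": set(), "a": {"href", "name"}}


class LenientFilter(HTMLParser):
    """Drops unknown tags and attributes but keeps their content."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag not in KNOWN:
            return  # unknown tag: act as though it were absent
        rendered = ""
        for name, value in attrs:
            if name not in KNOWN[tag]:
                continue  # unknown attribute: act as though it were absent
            if value is None:
                rendered += f" {name.upper()}"
            else:
                rendered += f' {name.upper()}="{html.escape(value)}"'
        self.out.append(f"<{tag.upper()}{rendered}>")

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append(f"</{tag.upper()}>")

    def handle_data(self, data):
        self.out.append(data)  # content survives even inside unknown tags

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def filter_unknown(markup: str) -> str:
    parser = LenientFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Given markup containing an element and an attribute that are not in
the table, the filter keeps the text and the recognized attribute and
drops the rest.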
The characters between the tags represent
text in the ISO-Latin-1 character
set, which is a superset of ASCII.
Because certain characters will be
interpreted as markup, they should
be "escaped"; that is, represented
by markup -- entity or numeric character
references. For example:
When a&lt;b, we can show that...
Brought to you by AT&amp;T
The HTML DTD includes entities for
each of the non-ASCII characters
so that one may reference them by
name if it is inconvenient to enter
them directly:
Kurt G&ouml;del was a famous logician and mathematician.
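A generator that cannot emit non-ASCII characters directly might
rewrite them as references along these lines. This sketch assumes
Python's html.entities.codepoint2name table, which holds the later
HTML 4 entity set rather than the exact set in this DTD, so it is an
approximation; to_entity_references is a hypothetical name.

```python
from html.entities import codepoint2name


def to_entity_references(text: str) -> str:
    """Replace each non-ASCII character with a named or numeric reference."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)  # ASCII passes through unchanged
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")  # named entity
        else:
            pieces.append(f"&#{code};")  # fall back to a numeric reference
    return "".join(pieces)
```

Note that this helper leaves <, >, and & untouched; escaping those is
a separate step.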
- NOTE 1:
- To ensure that a string of
characters has no markup, it is sufficient
to represent all occurrences of <,
>, and & by character or entity
references.
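In Python, the standard html.escape function performs this
replacement; the wrapper name escape_text below is hypothetical and
the sketch is only one way to do it.

```python
import html


def escape_text(text: str) -> str:
    """Replace &, <, and > so the result contains no markup."""
    # quote=False leaves " and ' alone; only &, <, and > are replaced.
    return html.escape(text, quote=False)
```

Applied to the string "AT&T", this yields "AT&amp;T".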
- NOTE 2:
- There are SGML features (CDATA,
RCDATA) to allow most <, >, and &
characters to be entered
without the use of entity or character
references. Because these features
tend to be used and implemented inconsistently,
and because they require 8-bit characters
to represent non-ASCII characters,
they are not employed in this version
of the HTML DTD. An earlier HTML
specification included an XMP element
whose syntax is not expressible in
SGML. Inside the XMP element, no markup
was recognized except the </XMP>
end tag. While implementations are
encouraged to support this idiom,
its use is obsolete.
Comments
To include comments in an HTML document
that will be ignored by the parser,
surround them with <!-- and -->.
After the comment delimiter, all
text up to the next occurrence of
-- is ignored. Hence comments cannot
be nested. Whitespace is allowed
between the closing -- and >. (But
not between the opening <! and --.)
For example:
<HEAD>
<TITLE>HTML Guide: Recommended Usage</TITLE>
<!-- Id: Text.html,v 1.6 1994/04/25 17:33:48 connolly Exp -->
</HEAD>
- NOTE:
- Some historical implementations
incorrectly consider a > sign to
terminate a comment.
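The comment syntax can be approximated with a regular expression. The
pattern below is a simplified sketch, not full SGML comment handling:
it allows whitespace between the closing -- and >, requires <!-- to be
written without internal whitespace, and does not treat a lone > as
the end of a comment. COMMENT_RE and strip_comments are hypothetical
names.

```python
import re

# "<!--", then the shortest run of text, then "--", optional
# whitespace, and ">". DOTALL lets a comment span several lines.
COMMENT_RE = re.compile(r"<!--.*?--\s*>", re.DOTALL)


def strip_comments(markup: str) -> str:
    """Remove comments from markup, leaving everything else in place."""
    return COMMENT_RE.sub("", markup)
```

Because the match stops at the first closing delimiter, a second <!--
inside a comment does not open a nested one.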