This document is a draft of a standard for the interchange of information on the network. The format it defines is proposed for registration as a MIME (RFC1521) content type.
Please send comments to connolly@hal.com or the discussion list www-html@info.cern.ch.
The latest version of this document is currently available in hypertext on the World-Wide Web as http://www.w3.org/hypertext/WWW/MarkUp/HTML.html
HTML is proposed as a MIME content type.
HTML refers to the URI specification RFCxxxx.
Implementations of HTML parsers and generators can be found in the various W3 servers and browsers and in the public domain W3 code; they may also be built using various public domain SGML parsers such as [SGMLS]. HTML documents are SGML documents with fairly generic semantics appropriate for representing information from a wide range of applications.
Internet Drafts are working documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".
Distribution of this document is unlimited.
When an HTML document is encoded using 7-bit characters, then the mechanisms of character references and entity references may be used to encode characters in the upper half of the ISO Latin-1 set. In this way, documents may be prepared which are suitable for mailing through 7-bit limited systems.
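As a rough illustration of the paragraph above, the following Python sketch rewrites every character in the upper half of ISO Latin-1 as a numeric character reference, so that the result contains only 7-bit characters. The function name `to_seven_bit` is hypothetical and not part of this specification, and the sketch assumes that markup-significant characters in the input have already been dealt with.

```python
def to_seven_bit(text: str) -> str:
    """Replace upper-half ISO Latin-1 characters (codes 128-255) with &#NNN; references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)              # already representable in 7 bits
        elif code < 256:
            out.append(f"&#{code};")    # numeric character reference
        else:
            # Outside ISO Latin-1: this sketch does not try to handle it.
            raise ValueError(f"not an ISO Latin-1 character: {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    print(to_seven_bit("Kurt G\u00f6del"))
```

Numeric references are used here rather than entity names only because they need no lookup table; a generator could equally emit the named entities from the DTD.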
The HyperText Markup Language is an application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language [SGML]. SGML is a system for defining structured document types, and markup languages to represent instances of those document types.
Every SGML document has three parts: an SGML declaration, a prologue, and an instance.
The SGML declaration for HTML is given in the appendix ``SGML Declaration for HTML.'' It is implicit among WWW implementations.
The prologue for an HTML document should look like:
<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">
The instance represents a hierarchy of elements. Each element has a name, some attributes, and some content. Most elements are represented in the document as a start tag, which gives the name and attributes, followed by the content, followed by the end tag. For example:
<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE> A sample HTML document </TITLE>
</HEAD>
<BODY>
<H1> An Example of Structure <br> In HTML </H1>
<P> Here's a typical paragraph.
<UL>
<LI> Item one has an <A NAME="anchor"> anchor </A>
<LI> Here's item two.
</UL>
</BODY>
</HTML>
Some elements (e.g. BR) are empty. They have no content. They show up as just a start tag.
For the rest of the elements, the content is a sequence of data characters and nested elements. Some things such as forms and anchors cannot be nested, in which case this is mentioned in the text. Anchors and character highlighting may be put inside other constructs.
<h1> ... </H1> <!-- uppercase = lowercase -->
<h1 > ... </h1 > <!-- spaces OK before > -->
The following are not valid tags:
< h1> <!-- this is not a tag at all -->
<H1/> <H=1> <!-- these are markup errors -->
<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">
<TITLE>Structural Example</TITLE>
<H1>Structural Example</H1>
<P>A paragraph...

<!DOCTYPE HTML PUBLIC "-//W3 Organization//DTD W3 HTML 2.0//EN">
<HTML><HEAD>
<TITLE>Structural Example</TITLE>
</HEAD>
<BODY>
<H1>Structural Example</H1>
<P>A paragraph...</P>
</BODY>
A H1 h1 another.name name-with-hyphens
The value is either:
<A HREF="http://host/dir/file.html">
<A HREF=foo.html >
<IMG SRC="mrbill.gif" ALT="Mr. Bill says, &#34;Oh Noooo&#34;">
The length of an attribute value (after replacing entity and numeric character references) is limited to 1024 characters.
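The length limit applies to the value after references have been replaced, which a short Python sketch can make concrete. The helper name `attribute_value_fits` is hypothetical. Python's `html.unescape` is used as a stand-in for reference replacement; it recognizes a larger entity set than the HTML DTD defines, so this is an approximation rather than a conforming check.

```python
import html

LITLEN = 1024  # the limit on attribute value length quoted in the text


def attribute_value_fits(raw_value: str) -> bool:
    """Return True if the value, with references replaced, is within LITLEN characters."""
    replaced = html.unescape(raw_value)  # approximation of SGML reference replacement
    return len(replaced) <= LITLEN


if __name__ == "__main__":
    # 1024 quotation marks written as numeric references: long in source, legal in value.
    print(attribute_value_fits("&#34;" * 1024))
    print(attribute_value_fits("x" * 1025))
```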
<UL COMPACT="COMPACT">
can be written as
<UL COMPACT>
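A reader of HTML therefore has to treat the two spellings alike. The sketch below, which uses Python's standard `html.parser` module rather than a real SGML parser, shows one way to normalize the short form; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects start tags, expanding minimized attributes to name=name."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports a minimized attribute with the value None;
        # substitute the attribute name so both spellings compare equal.
        normalized = {name: (value if value is not None else name)
                      for name, value in attrs}
        self.seen.append((tag, normalized))


def start_tags(markup: str):
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


if __name__ == "__main__":
    print(start_tags('<UL COMPACT="COMPACT">'))
    print(start_tags('<UL COMPACT>'))
```

Note that `html.parser` lowercases tag and attribute names but leaves values untouched, so a comparison of the two results should ignore the case of the value.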
A WWW application that reads an HTML document and finds a tag or attribute name it does not understand should behave as follows: in the case of a tag, as though the whole tag had not been there but its content had; in the case of an attribute, as though the attribute had not been present.
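The rule for unknown names can be sketched in Python with the standard `html.parser` module. The vocabulary `KNOWN_TAGS` below is a deliberately tiny, hypothetical table chosen for the illustration; it is not the element set defined by this specification, and the class and function names are likewise assumptions.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary: tag name -> attribute names the reader understands.
KNOWN_TAGS = {"p": {"align"}, "em": set(), "a": {"href", "name"}}


class TolerantReader(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            return  # unknown tag: ignore the tag itself, content still flows through
        kept = [(k, v) for k, v in attrs if k in KNOWN_TAGS[tag]]  # drop unknown attributes
        self.events.append(("start", tag, kept))

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def read(document: str):
    reader = TolerantReader()
    reader.feed(document)
    reader.close()
    return reader.events


if __name__ == "__main__":
    for event in read('<p blink="yes">Hello <sparkle>world</sparkle></p>'):
        print(event)
```

In this run the unknown `sparkle` tags and the unknown `blink` attribute vanish, while the text "world" is still delivered.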
When a&lt;b, we can show that...
Brought to you by AT&amp;T
The HTML DTD includes entities for
each of the non-ASCII characters
so that one may reference them by
name if it is inconvenient to enter
them directly:
Kurt G&ouml;del was a famous logician and mathematician.
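For readers who want to see the reverse direction, the following Python sketch expands entity and numeric character references back into the characters they name. It relies on the standard library function `html.unescape`, whose entity table is a superset of the entities in the HTML DTD, so it is an approximation; the wrapper name `expand_references` is hypothetical.

```python
import html


def expand_references(text: str) -> str:
    """Expand entity and numeric character references into the characters they name."""
    return html.unescape(text)


if __name__ == "__main__":
    print(expand_references("Kurt G&ouml;del"))
    print(expand_references("Brought to you by AT&amp;T"))
    print(expand_references("When a&lt;b"))
```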
For example:
<HEAD>
<TITLE>HTML Guide: Recommended Usage</TITLE>
<!-- Id: Text.html,v 1.6 1994/04/25 17:33:48 connolly Exp -->
</HEAD>
HTML
|
|_head
|_body
The HEAD element is a small unordered
collection of information about the
document, whereas the BODY is an
ordered sequence of information elements
of arbitrary length. This organization
allows an implementation to determine
certain properties of a document
-- the title, for example -- without
parsing the entire document.
The elements within the BODY element are in the order in which they should be presented to the reader.
See the list of things which are allowed within a BODY element.
The text between the opening tag and the closing tag is either the start or destination (or both) of a link. Attributes of the anchor tag are as follows.
See <A HREF="http://www.w3.org/">CERN</A>'s information for more details.

A <A NAME=serious>serious</A> crime is one which is associated with imprisonment.
...
The Organization may refuse employment to anyone convicted of a <a href="#serious">serious</A> crime.
<ADDRESS><A HREF="Author.html">A.N.Other</A></ADDRESS>

<ADDRESS>
Newsletter editor<p>
J.R. Brown<p>
JimquickPost News, Jumquick, CT 01234<p>
Tel (123) 456 7890
</ADDRESS>
Where the base address is not specified, the reader will use the URL it used to access the document to resolve any relative URLs.
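The fallback described above can be sketched in Python. The function name `resolve` and its parameters are hypothetical. The standard library's `urllib.parse.urljoin` is used for the resolution step; it follows later URL resolution rules than this draft refers to, so treat the result as illustrative rather than normative.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve(reference: str, document_url: str, base_href: Optional[str] = None) -> str:
    """Resolve a possibly relative URL.

    If the document specified a base address, use it; otherwise fall back
    to the URL that was used to access the document.
    """
    return urljoin(base_href or document_url, reference)


if __name__ == "__main__":
    print(resolve("foo.html", "http://host/dir/file.html"))
    print(resolve("foo.html", "http://host/dir/file.html", "http://other/base/"))
```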
The one attribute is:
<ADDRESS>Tim Berners-Lee<BR>
World Wide Web project<BR>
CERN<BR>
1211 Geneva 23<BR>
Switzerland
</ADDRESS>

I think that I shall never see<BR>
A hoarding lovely as a tree<BR>
In fact, unless the hoardings fall<BR>
I'll never see a tree at all.<P>
Single-font rendition may for example put a vertical line of ">" characters down the left margin to indicate quotation in the Internet mail style.
I think it ends <BLOCKQUOTE>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all my sins remembered. </BLOCKQUOTE> but I am not sure.
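A single-font rendition in the Internet mail style might be produced along the following lines. This is only a sketch of one possible renderer: the function name `render_quotation` and the default line width are assumptions, and the input is taken to be the already extracted text content of the BLOCKQUOTE element.

```python
import textwrap


def render_quotation(text: str, width: int = 60) -> str:
    """Wrap quoted text and mark each line with '> ' down the left margin."""
    collapsed = " ".join(text.split())                 # treat all white space as word breaks
    wrapped = textwrap.wrap(collapsed, width=width - 2)  # leave room for the '> ' marker
    return "\n".join("> " + line for line in wrapped)


if __name__ == "__main__":
    print(render_quotation(
        "Soft you now, the fair Ophelia. Nymph, in thy orisons, "
        "be all my sins remembered.",
        width=40,
    ))
```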
<FORM ACTION="mailto:www_admin@info.cern.ch">
<MH HIDDEN>Subject: WWW Questionnaire</MH>
Please help us to improve the World Wide Web by filling in the following questionnaire:
<P>Your organization? <INPUT NAME="org" SIZE="48">
<P>Commercial? <INPUT NAME="commerce" TYPE=checkbox>
How many users? <INPUT NAME="users" TYPE=int>
<P>Which browsers do you use?
<UL>
<LI>X Mosaic <INPUT NAME="browsers" TYPE=checkbox VALUE="xmosaic">
<LI>Cello <INPUT NAME="browsers" TYPE=checkbox VALUE="cello">
<LI>Others <TEXTAREA NAME="others" COLS=48 ROWS=4></TEXTAREA>
</UL>
A contact point for your site: <INPUT NAME="contact" SIZE="42">
<P>Many thanks on behalf of the WWW central support team.
<P ALIGN=CENTER><INPUT TYPE=submit> <INPUT TYPE=reset>
</FORM>
This fictitious example is a questionnaire that will be emailed to www_admin@info.cern.ch. The FORM element is used to delimit the form. There can be several forms in a single document, but the FORM element can't be nested. The ACTION attribute specifies a URL that designates an HTTP server or an email address. If missing, the URL for the document itself will be assumed. The effect of the action can be modified by including a method prefix, e.g. ACTION="POST http://....". This prefix is used to select the HTTP method when sending the form's contents to an HTTP server. Would it be cleaner to use a separate attribute, e.g. METHOD?
Servers can disable forms by sending an appropriate header or by an attribute on the optional HTMLPLUS element at the very start of the document, e.g. <htmlplus forms=off>.
Here, the <P> and <UL> elements have been used to lay out the text (and input fields). The browser has changed the background color within the FORM element to distinguish the form from other parts of the document. The browser is responsible for handling the input focus, i.e. which field will currently get keyboard input.
For many platforms there will be existing conventions for forms, e.g. Tab and shift-Tab keys to move the keyboard focus forwards and backwards between fields, while an Enter key submits the form. In the example, the Submit and Reset buttons are specified explicitly with special purpose fields. The Submit button is used to email the form or send its contents to the server as specified by the ACTION attribute, while the Reset button resets the fields to their initial values. When the form consists of a single text field, it may be appropriate to leave such buttons out and rely on the Enter key.
The INPUT element is used for a large variety of types of input field.
When you need to let users enter more than one line of text, you should use the TEXTAREA element.
The RADIO and CHECKBOX types of INPUT field can be used to specify multiple choice forms in which every alternative is visible as part of the form. An alternative is to use the SELECT element which is generally rendered in a more compact fashion as a pull down combo list.
URL For fields which expect document references as URLs or URNs.
<SELECT NAME="flavor">
<OPTION>Vanilla
<OPTION>Strawberry
<OPTION>Rum and Raisin
<OPTION>Peach and Orange
</SELECT>
<TEXTAREA NAME="address" ROWS=6 COLS=64>
Hewlett Packard Laboratories
1501 Page Mill Road
Palo Alto, California 94304-1126
</TEXTAREA>
The text up to the end tag is used to initialize the field's value. This end tag is always required even if the field is initially blank. The ROWS and COLS attributes determine the visible dimension of the field in characters. Browsers are recommended to allow text to grow beyond these limits by scrolling as needed. In the initial design for forms, multi-line text fields were supported by the INPUT element with TYPE=TEXT. Unfortunately, this causes problems for fields with long text values, as SGML limits the length of attribute literals. The HTML+ DTD allows for up to 1024 characters (the SGML default is only 240 characters!).
A heading element implies all the font changes, paragraph breaks before and after, and white space necessary to render the heading. Further character emphasis or paragraph marks are not required in HTML.
H1 is the highest level of heading, and is recommended for the start of a hypertext node. It is suggested that the text of the first heading be suitable for a reader who is already browsing in related information, in contrast to the title tag, which should identify the node in a wider context.
The heading elements are
<H1>, <H2>, <H3>, <H4>, <H5>, <H6>
It is not normal practice to jump from one header to a header level more than one below, for example to follow an H1 with an H3. Although this is legal, it is discouraged, as it may produce strange results, for example when generating other representations from the HTML.
<H1>This is a heading</H1>
Here is some text
<H2>Second level heading</H2>
Here is some more text.
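An author or a checking tool may wish to detect the discouraged practice of skipping heading levels. The following Python sketch does so with a simple regular expression scan. It is a rough heuristic rather than a parser (it would, for instance, also count heading tags that appear inside comments), and the function name `skipped_heading_levels` is hypothetical.

```python
import re

# Matches heading start tags such as <H1> or <h3 ...>; end tags begin with "</" and do not match.
HEADING_START = re.compile(r"<h([1-6])\b", re.IGNORECASE)


def skipped_heading_levels(document: str):
    """Return (previous, current) pairs where a heading is more than one level below its predecessor."""
    skips = []
    previous = None
    for match in HEADING_START.finditer(document):
        level = int(match.group(1))
        if previous is not None and level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


if __name__ == "__main__":
    print(skipped_heading_levels("<H1>Top</H1> text <H3>Too deep</H3>"))
```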
The rendering software is responsible for generating suitable vertical white space between elements, so it is NOT normal or required to follow a heading element with a paragraph mark.
<H1>The Albatross</H1>
<Address>The Bumstead Monthly, 1948</Address>
The following information is culled from this and successive issues of the magazine. Thanks are due to the editor-in-chief, A.R. Bunstead, for her help and advice.
<H2>Copyright IQR Inc.</h2>
This recording may not be sold, resold, hired out, used, or talked about in too great a depth without the publisher's written or videotaped consent.
<HR>
The Albatross, most fabled and infamous of ..
The IMG element allows another document to be inserted inline. The document is normally an icon or small graphic, etc. This element is NOT intended for embedding other HTML text.
Browsers which are not able to display inline images ignore IMG elements. Authors should note that some browsers will be able to display (or print) linked graphics but not inline graphics. If the graphic is essential, it may be wiser to make a link to it rather than to put it inline. If the graphic is essentially decorative, then IMG is appropriate.
The IMG element is empty: it has no closing tag. It has two attributes:
Warning: <IMG SRC="triangle.gif" ALT="Warning:"> This must be done by a qualified technician.
<A HREF="Go.html"><IMG SRC="Button.ps" ALT="GO"></A>
The node may be queried with a keyword search by suffixing the node address with a question mark, followed by a list of keywords separated by plus signs. See the network address format.
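The construction of such a search address can be sketched in Python as follows. The function name `search_address` is hypothetical, and the percent-encoding of each keyword is an assumption added for safety; the text above specifies only the question mark and the plus-sign separators.

```python
from urllib.parse import quote


def search_address(node_address: str, keywords) -> str:
    """Append '?' and the keywords, joined by '+', to the node address."""
    # Assumption: escape each keyword so that a literal '+' or space inside
    # a keyword cannot be confused with the separator.
    encoded = [quote(word, safe="") for word in keywords]
    return node_address + "?" + "+".join(encoded)


if __name__ == "__main__":
    print(search_address("http://host/index", ["maple", "syrup"]))
```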
Note that this tag is normally generated automatically by a server. If it is added by hand to an HTML document, then the client will assume that the server can handle a search on the document. Obviously the server must have this capability for it to work: simply adding <ISINDEX> in the document is not enough to make searches happen if the server does not have a search engine!
Status: standard.
<ISINDEX>
The LINK element is empty, but takes the same attributes as the anchor element.
Typical uses are to indicate authorship, related indexes and glossaries, older or more recent versions, etc. Links can indicate a static tree structure in which the document was authored by pointing to a "parent" and "next" and "previous" document, for example.
Servers may also allow links to be added by those who do not have the right to alter the body of a document.
This is sometimes implemented with the use of a large negative first line indent.
White space is typically left between successive DT,DD pairs unless the COMPACT attribute is given. The COMPACT attribute is appropriate for lists which are long and/or have DT,DD pairs which each take only a line or two. It is of course possible for the rendering software to discover these cases itself and make its own decisions, and this is to be encouraged.
The COMPACT attribute may also reduce the width of the left-hand (DT) column.
<DL>
<DT>Term the first<DD>definition paragraph is reasonably long but is still displayed clearly
<DT>Term2 follows<DD>Definition of term2
</DL>

<DL COMPACT>
<DT>Term<DD>definition paragraph
<DT>Term2<DD>Definition of term2
</DL>
<UL>
<LI> list element
<LI> another list element ...
</UL>
The opening list tag may be any of UL, OL, MENU or DIR. It must be immediately followed by the first list element.
List elements with typical rendering are:
<OL>
<LI> When you get to the station, leave by the southern exit, on platform one.
<LI>Turn left to face toward the mountain
<LI>Walk for a mile or so until you reach the "Asquith Arms" then
<LI>Wait and see...
</OL>

<MENU>
<LI>The oranges should be pressed fresh
<LI>The nuts may come from a packet
<LI>The gin must be good quality
</MENU>

<DIR>
<LI>A-H<LI>I-M
<LI>M-R<LI>S-Z
</DIR>
When modifying a document, old anchor ids should not be reused, as there may be references stored elsewhere which point to them. This is read and generated by hypertext editors. Human writers of HTML usually use mnemonic alphabetical identifiers. Browser software may ignore this tag.
<NEXTID N=z27>
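A hypertext editor that generates anchor ids might advance the NEXTID value as sketched below. The behaviour follows the comment on NEXTID in the DTD, which says that the numeric part of the value is incremented (for example, Z67 becomes Z68). The function name `next_id` is hypothetical, and the sketch assumes the value consists of a non-numeric prefix followed by decimal digits.

```python
import re


def next_id(current: str) -> str:
    """Return the identifier that follows `current`, incrementing its numeric part."""
    match = re.fullmatch(r"(\D*)(\d+)", current)
    if match is None:
        raise ValueError(f"no trailing numeric part in {current!r}")
    prefix, digits = match.groups()
    return f"{prefix}{int(digits) + 1}"


if __name__ == "__main__":
    print(next_id("z27"))
```

Because the counter only ever moves forward, ids handed out earlier are never reissued, which is the property the text above asks for.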
You do NOT need to use <P> to put white space around heading, list, address or blockquote elements. It is the responsibility of the rendering software to generate that white space. An empty paragraph has undefined effect and should be avoided.
<h1>What to do</h1>
<p>This is one paragraph.<P>This is a second.
<P>
This is a third.
<h1><P>What not to do</h1>
<address><p>I found that on my XYZ browser it looked prettier to me if I put some paragraph tags</address>
<p>
<ul><p><li>Around lists, and
<li>Inside headings.
</ul>
<p>
<h2>None of the paragraph tags in this example should be there.</h2>
The optional attribute is:
<PRE WIDTH="80"> This is an example line </PRE>
There may only be one title in any document. It should identify the content of the document in a fairly wide context.
It may not contain anchors, paragraph marks, or highlighting. The title may be used to identify the node in a history list, to label the window displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings. The title should ideally be less than 64 characters in length, since many applications will display document titles in window titles, menus, etc., where there is only limited room. Whilst there is no limit on the length of a title (as it may be automatically generated from other data), information providers are warned that it may be truncated if long.
<TITLE>Rivest and Neuman. 1989(b)</TITLE>
or
<TITLE>A Recipe for Maple Syrup Flap-Jack</TITLE>
or
<TITLE>Introduction -- AFS user's Guide</TITLE>
Examples of inappropriate titles are those which are only meaningful within context,
<TITLE>Introduction</TITLE>
or too long,
<TITLE>Remarks on the Quantum-Gravity effects of "Bean Pole" diversification in Mononucleosis patients in Developing Countries under Economic Conditions Prevalent during the Second half of the Twentieth Century, and Related Papers: a Summary</TITLE>
These elements allow sections of text to be formatted in a particular way, to provide emphasis, etc. The tags do NOT cause a paragraph break, and may be used on sections of text within paragraphs.
Where not supported by implementations, like all tags, these tags should be ignored but the content rendered.
All these tags have related closing tags, as in
This is <EM>emphasized</EM> text.
Some of these styles are more explicit than others about how they should be physically represented. The logical styles should be used wherever possible, unless for example it is necessary to refer to the formatting in the text. (E.g., "The italic parts are mandatory".)
These element names are derived from TeXInfo macro names.
This text contains an <em>emphasized</em> word. <strong>Don't assume</strong> that it will be italic! It was made using the <CODE>EM</CODE> element. A citation is typically italic and has no formal necessary structure: <cite>Moby Dick</cite> is a book title.
The empty PLAINTEXT tag terminates the HTML entity. What follows is not SGML. Instead, there's an old HTTP convention that what follows is an ASCII (MIME "text/plain") body.
An example of its use is:
<PLAINTEXT>
0001 This is line one of a long listing
0002 file from <any@host.inc.com> which is sent
This tag allows the rest of a file to be read efficiently without parsing. Its presence is an optimization. There is no closing tag. The rest of the data is not in SGML.
These styles allow text of fixed-width characters to be embedded absolutely as is into the document. The syntax is:
<LISTING> ... </LISTING>
or
<XMP> ... </XMP>
The text between these tags is to be portrayed in a fixed width font, so that any formatting done by character spacing on successive lines will be maintained. Between the opening and closing tags:
<HP1>...</HP1> <HP2>... </HP2> etc.
A comment element used for bracketing off unneeded text and comments has been introduced in some browsers, but will be replaced by the SGML comment feature in new implementations.
This form is not supported by SGML and so is not the specified HTML interpretation. Providers should be warned that implementations may vary in how they interpret end tags apparently within these elements.
<!SGML "ISO 8879:1986"
--
SGML Declaration for HyperText Markup Language (HTML)
as used by the World-Wide Web (WWW) application.
--
CHARSET
BASESET "ISO 646:1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
BASESET "ISO Registration Number 100//CHARSET
ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET 128 32 UNUSED
160 95 32
CAPACITY SGMLREF
TOTALCAP 150000
GRPCAP 150000
SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 127
BASESET "ISO 646:1983//CHARSET
International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 128 0
FUNCTION
-- SPACE 32
TAB SEPCHAR 9
LF SEPCHAR 10
FF SEPCHAR 12
CR SEPCHAR 13 --
-- The above is an accurate description of the usage of FUNCTION --
-- characters in HTML implementations; that is, there is no --
-- Record Start or Record End character, and no occurrences of --
-- character 10 or 13 are "ignored" by the parser. --
-- But because few SGML implementations support this concrete --
-- syntax, we include the one below. --
-- Note that in order to get correct behaviour w.r.t. newline --
-- processing, you will have to play some tricks in constructing --
-- the document entity for parsing in order to keep the parser --
-- from ignoring newlines in surprising ways --
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9
NAMING LCNMSTRT ""
UCNMSTRT ""
LCNMCHAR ".-"
UCNMCHAR ".-"
NAMECASE GENERAL YES
ENTITY NO
DELIM GENERAL SGMLREF
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
NAMELEN 34
TAGLVL 100
LITLEN 1024
GRPGTCNT 150
GRPCNT 64
FEATURES
MINIMIZE
DATATAG NO
OMITTAG YES
RANK NO
SHORTTAG YES
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL YES
APPINFO NONE
>
<!--
$Id: html.decl,v 1.6 1994/05/18 17:23:34 connolly Exp $
Author: Daniel W. Connolly <connolly@hal.com>
See also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->
<!-- html.dtd
Document Type Definition for the HyperText Markup Language
as used by the World Wide Web (HTML DTD).
$Id: html.dtd,v 1.13 1994/05/18 17:23:29 connolly Exp $
Author: Daniel W. Connolly <connolly@hal.com>
See Also: http://www.hal.com/%7Econnolly/html-spec/HTML.html
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->
<!ENTITY HTML.Version
"-//connolly hal.com//DTD WWW HTML $Date 1994/04/19 17:24:06 $//EN"
-- public identifier for "current practice" version --
-- actually, take the $'s out to get the real public identifier, --
-- since $ is illegal in public identifier. When DTD stabilizes, --
-- we'll need to stop using RCS keywords to version the pub id --
-- Typical usage:
<!DOCTYPE HTML PUBLIC "-//connolly hal.com//DTD WWW HTML
$Date: 1994/05/18 17:23:29 $//EN">
<html>
...
</html>
--
>
<!-- Feature Test Entities -->
<!-- To use these, write your document like:
<!DOCTYPE HTML [
<!ENTITY % HTML.Optional "INCLUDE">
<!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
%html;
]>
<TITLE>Here's my doc</TITLE>
<p>It uses lots of optional features
In practice, if you're using sgmls to validate your docs,
you can stick the <!DOCTYPE [...]> in a separate file and
validate with:
sgmls -s doctype.sgml foo.html
-->
<!ENTITY % HTML.Minimal "IGNORE">
<!ENTITY % HTML.Obsolete "IGNORE">
<!ENTITY % HTML.Prescriptive "IGNORE">
<![ %HTML.Minimal [
<!ENTITY % HTML.linkRelationships "IGNORE">
<!ENTITY % HTML.linkMethods "IGNORE">
<!ENTITY % HTML.linkRedundantInfo "IGNORE">
<!ENTITY % HTML.forms "IGNORE">
<!-- @@ nested lists -->
<!-- @@ phrases -->
<!-- @@ headers inside A -->
<!-- @@ nested phrases, fonts -->
]]>
<![ %HTML.Obsolete [
<!ENTITY % HTML.titleCDATA "INCLUDE">
<!ENTITY % HTML.litCDATA "INCLUDE">
<!ENTITY % HTML.pSeparator "INCLUDE">
]]>
<![ %HTML.Prescriptive [
<!--
This feature test entity prescribes that certain
idioms detract from the structural integrity of an
HTML document, and are therefore disallowed.
-->
<!ENTITY % HTML.font-phrase "IGNORE">
<!ENTITY % HTML.anchorNameCDATA "IGNORE">
<!ENTITY % HTML.PLAINTEXT "IGNORE">
<!ENTITY % HTML.bodyBlockOnly "INCLUDE">
]]>
<!ENTITY % HTML.bodyBlockOnly "IGNORE"
-- only allow block elements in the BODY element
This means all paragraphs need to start with a <P> tag.
-->
<!ENTITY % HTML.pSeparator "IGNORE"
-- use P element as paragraph separator, rather than container.
-->
<!ENTITY % HTML.linkRelationships "INCLUDE"
-- Adding markup to links to show the relationship between
ends of a link
see http://www.w3.org/hypertext/WWW/MarkUp/Relationships.html
-->
<!ENTITY % HTML.linkMethods "INCLUDE"
-- Adding markup to links to show the methods supported
by the referent object
see http://www.w3.org/hypertext/WWW/MarkUp/Elements/A.html
-->
<!ENTITY % HTML.linkRedundantInfo "INCLUDE"
-- Adding markup to links to give redundant information
like URN, content type, title...
-->
<!ENTITY % HTML.anchorNameCDATA "INCLUDE"
-- Anchor names should be distinct. SGML parser can validate
this if the NAME attribute of the A element is declared as ID.
But that restricts the syntax of an anchor name to an SGML name,
i.e. a letter followed by letters, numbers, periods and dashes,
up to NAMELEN (34) characters long.
-->
<!ENTITY % HTML.PLAINTEXT "INCLUDE"
-- Support for the <PLAINTEXT> tag as a sign of the
end of the HTML data stream and the beginning of a stream
of text/plain data
-->
<!ENTITY % HTML.titleCDATA "IGNORE"
-- Is the TITLE element #PCDATA, RCDATA, or CDATA content?
On Mosaic, it's #PCDATA, but in the linemode browser,
it's more like CDATA, but not quite.
-->
<!ENTITY % HTML.NEXTID "INCLUDE"
-- Used by the NeXT implementation to keep track of the
next anchor id to use
-->
<!ENTITY % HTML.font-phrase "INCLUDE"
-- allow B, I, TT, U outside PRE,
CITE, VAR, etc. inside PRE
-->
<!ENTITY % HTML.KEY "IGNORE"
-- There was once a KEY element, for keyboard keys, menu items,
buttons, etc. but it's not supported or widely documented
-->
<!ENTITY % HTML.U "IGNORE"
-- There was also a U element, but since it clashes with
the common practice of underlining hypertext links, it is
not widely supported
-->
<!ENTITY % HTML.litCDATA "IGNORE"
-- treat XMP, LISTING as CDATA, as per linemodeWWW
-->
<!ENTITY % HTML.forms "INCLUDE"
-- Support for forms as per
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
-->
<!-- DTD definitions -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
<!ENTITY % list " UL | OL | DIR | MENU ">
<!ENTITY % literal " XMP | LISTING ">
<!ENTITY % URI "CDATA"
-- The term URI means a CDATA attribute
whose value is a Uniform Resource Identifier,
as defined by
"Universal Resource Identifiers" by Tim Berners-Lee
aka http://www.w3.org/hypertext/WWW/Addressing/URL/URI_Overview.html
Note that CDATA attributes are limited by the LITLEN
capacity (1024 in the current version of html.decl),
so that URIs in HTML have a bounded length.
@@ Need to discuss relative addresses.
-->
<!ENTITY % Content-Type "CDATA"
-- meaning a MIME content type, as per RFC1521
-->
<![ %HTML.anchorNameCDATA [ <!ENTITY % anchor-name "CDATA"> ]]>
<!ENTITY % anchor-name "ID">
<![ %HTML.linkRelationships [ <!ENTITY % linkRelAttrs "
REL CDATA #IMPLIED -- forward relationship type --
REV CDATA #IMPLIED -- reversed relationship type
to referent data:
PARENT CHILD, SIBLING, NEXT, TOP,
DEFINITION, UPDATE, ORIGINAL etc. --
"> ]]>
<!ENTITY % linkRelAttrs "">
<![ %HTML.linkRedundantInfo [ <!ENTITY % linkRedundantAttrs "
URN CDATA #IMPLIED -- universal resource number --
TITLE CDATA #IMPLIED -- advisory only --
"> ]]>
<!ENTITY % linkRedundantAttrs "">
<![ %HTML.linkMethods [ <!ENTITY % linkMethodAttrs "
METHODS NAMES #IMPLIED -- supported public methods of the object:
TEXTSEARCH, GET, HEAD, ... --
"> ]]>
<!ENTITY % linkMethodAttrs "">
<!ENTITY % linkattributes
"NAME %anchor-name #IMPLIED
HREF %URI; #IMPLIED
%linkRelAttrs;
%linkRedundantAttrs;
%linkMethodAttrs;
">
<!-- Document Element -->
<![ %HTML.PLAINTEXT [ <!ENTITY % obsolete-plaintext ", PLAINTEXT?"> ]]>
<!ENTITY % obsolete-plaintext "">
<!ENTITY % html-content "HEAD, BODY %obsolete-plaintext;">
<!ELEMENT HTML O O (%html-content)>
<![ %HTML.NEXTID [ <!ENTITY % head-content "TITLE? & ISINDEX? & LINK* & BASE?
& NEXTID?"> ]]>
<!ENTITY % head-content "TITLE & ISINDEX? & LINK* & BASE?">
<!ELEMENT HEAD O O (%head-content)>
<![ %HTML.titleCDATA [ <!ENTITY % title-content "CDATA"> ]]>
<!ENTITY % title-content "(#PCDATA)">
<!ELEMENT TITLE - - %title-content
-- The TITLE element is not considered part of the flow of text.
It should be displayed, for example as the page header or
window title.
-->
<!ELEMENT ISINDEX - O EMPTY
-- WWW clients should offer the option to perform a search on
documents containing ISINDEX.
-->
<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N %anchor-name #REQUIRED
-- The number should be a name suitable for use
for the ID of a new element. When used, the value
has its numeric part incremented. EG Z67 becomes Z68
-->
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
%linkattributes>
<!ELEMENT BASE - O EMPTY -- Reference context for URIs -->
<!ATTLIST BASE
HREF %URI; #REQUIRED
>
<![ %HTML.KEY [
<!ENTITY % key-emph "| KEY">
]]>
<!ENTITY % key-emph "">
<![ %HTML.U [
<!ENTITY % u-font "| U">
]]>
<!ENTITY % u-font "">
<!ENTITY % font "TT | B | I %u-font">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | DFN | CITE
| STRIKE %key-emph">
<![ %HTML.font-phrase [
<!ENTITY % obsolete-font "| %font">
<!ENTITY % obsolete-phrase "| %phrase">
]]>
<!ENTITY % obsolete-font "">
<!ENTITY % obsolete-phrase "">
<![ %HTML.pSeparator [
<!ENTITY % obsolete-p "| P">
]]>
<!ENTITY % obsolete-p "">
<!ENTITY % inline "%phrase %obsolete-font">
<!ENTITY % pre-inline "%font %obsolete-phrase %obsolete-p">
<!ENTITY % text "#PCDATA | IMG | %inline | BR %obsolete-p">
<!ENTITY % htext "A | %text" -- Plus links, no structure -->
<![ %HTML.font-phrase [ <!ENTITY % font-content "(%htext)+"> ]]>
<!ENTITY % font-content "#PCDATA">
<!ELEMENT (%font;) - - (%font-content;)>
<!ELEMENT (%phrase;) - - (%htext)+>
<!ENTITY % pre "PRE | XMP | LISTING">
<![ %HTML.forms [ <!ENTITY % block-form "| FORM | ISINDEX"> ]]>
<!ENTITY % block-form "">
<![ %HTML.pSeparator [
<!ENTITY % obsolete-htext "| %htext">
<!ENTITY % block-p "">
]]>
<!ENTITY % obsolete-htext "| A">
<!ENTITY % block-p "| P ">
<!ENTITY % block "HR | %list | DL
| %pre | BLOCKQUOTE | ADDRESS
%block-form %block-p">
<![ %HTML.bodyBlockOnly [
<!ENTITY % current-htext "">
]]>
<!ENTITY % current-htext "| %htext">
<!ENTITY % body-content "%heading | %block %current-htext">
<!ELEMENT BODY O O (%body-content)*>
<!ELEMENT A - - (%heading|%block|%text)+ -(A)
-- @# Technically, this allows silliness like:
<H2><A>xyz<H1>h1</H1></A></H2>
The right way to do anchors outside of %htext is more like:
<as id=z1><H2>lkjlkj</h2><ae start=z1>
-->
<!ATTLIST A
%linkattributes;
>
<!ELEMENT IMG - O EMPTY -- Embedded image -->
<!ATTLIST IMG
SRC %URI; #IMPLIED -- URI of document to embed --
ALT CDATA #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
ISMAP (ISMAP) #IMPLIED
>
<![ %HTML.pSeparator [ <!ENTITY % p-content "EMPTY"> ]]>
<!ENTITY % p-content "(%htext)+">
<!ELEMENT P - O %p-content>
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ELEMENT BR - O EMPTY -- @# BR -> &br; -->
<!ELEMENT ( %heading ) - - (%htext;)+>
<!ELEMENT DL - - (DT*, DD?)+>
<!ATTLIST DL
COMPACT (COMPACT) #IMPLIED>
<!ELEMENT DT - O (%htext)+>
<!ELEMENT DD - O (%htext|%block)+>
<!ELEMENT (%list) - - (LI)+>
<!ELEMENT LI - O (%htext|%block)+>
<!ELEMENT BLOCKQUOTE - - (%htext|%block)+ -- @# Hmm... --
-- for quoting some other source -->
<!ELEMENT ADDRESS - - (%htext;|%block)+>
<!ELEMENT PRE - - (#PCDATA|%pre-inline|A)+>
<!ATTLIST PRE
WIDTH NUMBER #implied
>
<!-- Mnemonic character entities. -->
<!ENTITY % ISOlat1 PUBLIC
"ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1;
<!ENTITY #DEFAULT SDATA "&unkown;" --display the markup-->
<!ENTITY amp CDATA "&#38;" -- ampersand -->
<!ENTITY gt CDATA "&#62;" -- greater than -->
<!ENTITY lt CDATA "&#60;" -- less than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->
<!-- Processing Entities -->
<!ENTITY nbsp "<? nonbreaking-space>">
<!-- @# should add entities for processing instructions
for line break, centering, etc. -->
<!-- Forms -->
<![ %HTML.forms [
<!ENTITY % HTTP-Method "(GET | POST)">
<!ELEMENT FORM - - (%body-content)* -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
ACTION %URI #REQUIRED
METHOD %HTTP-Method #IMPLIED -- @# MAILTO? --
ENCTYPE %Content-Type; #IMPLIED
>
<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
IMAGE | HIDDEN )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
TYPE %InputType #IMPLIED -- @# defaults to TEXT?? --
NAME CDATA #IMPLIED -- required for all but submit and reset --
VALUE CDATA #IMPLIED
SRC %URI #IMPLIED -- for image inputs --
CHECKED (CHECKED) #IMPLIED
SIZE CDATA #IMPLIED -- @# should be NUMBERS: delimit with space, not comma --
MAXLENGTH NUMBER #IMPLIED
ALIGN (top|middle|bottom|left|center|right) #IMPLIED --@#supported?--
>
<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
NAME CDATA #REQUIRED
SIZE NUMBER #IMPLIED
MULTIPLE (MULTIPLE) #IMPLIED
>
<!ELEMENT OPTION - O (#PCDATA)>
<!ATTLIST OPTION
SELECTED (SELECTED) #IMPLIED
VALUE CDATA #IMPLIED
>
<!ELEMENT TEXTAREA - - (#PCDATA)>
<!ATTLIST TEXTAREA
NAME CDATA #REQUIRED
ROWS NUMBER #REQUIRED -- @#implied? --
COLS NUMBER #REQUIRED
>
]]>
<!-- Obsolete Elements -->
<![ %HTML.litCDATA [ <!ENTITY % lit-content "CDATA"> ]]>
<!ENTITY % lit-content "RCDATA">
<!ELEMENT (%literal) - - %lit-content>
<![ %HTML.PLAINTEXT [
<!ELEMENT PLAINTEXT - O EMPTY>
]]>
Documents may be constructed whose visible contents mislead the reader into following a link to unsuitable or offensive material.
Daniel W. Connolly
Affiliation: HaL Software Systems
Austin, TX USA
email: connolly@hal.com

Tim Berners-Lee
Address: CERN
1211 Geneva 23
Switzerland
Telephone: +41(22)767 3755
Fax: +41(22)767 7155
email: timbl@info.cern.ch