- HTML 4.01
1 December 2008 Recommendation (Second Edition)
| Ok to use? | Yes, this is a Web Standard. |
|---|---|

About the Document
Abstract
This specification defines the HyperText Markup Language (HTML), the publishing language of the World Wide Web. This specification defines HTML 4.01, which is a subversion of HTML 4. In addition to the text, multimedia, and hyperlink features of the previous versions of HTML (HTML 3.2 [HTML32] and HTML 2.0 [RFC1866]), HTML 4 supports more multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities. HTML 4 also takes great strides towards the internationalization of documents, with the goal of making the Web truly World Wide.
HTML 4 is an SGML application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language [ISO8879].
Table of Contents
- About the HTML 4 Specification
- Introduction to HTML 4
1 About the HTML 4 Specification
1.1 How the specification is organized
This specification is divided into the following sections:
- Sections 2 and 3: Introduction to HTML 4
  - The introduction describes HTML's place in the scheme of the World Wide Web, provides a brief history of the development of HTML, highlights what can be done with HTML 4, and provides some HTML authoring tips.
  - The brief SGML tutorial gives readers some understanding of HTML's relationship to SGML and gives summary information on how to read the HTML Document Type Definition (DTD).
- Sections 4 - 24: HTML 4 reference manual
  - The bulk of the reference manual consists of the HTML language reference, which defines all elements and attributes of the language.
  - This document has been organized by topic rather than by the grammar of HTML. Topics are grouped into three categories: structure, presentation, and interactivity. Although it is not easy to divide HTML constructs perfectly into these three categories, the model reflects the HTML Working Group's experience that separating a document's structure from its presentation produces more effective and maintainable documents.
  - The language reference consists of the following information:
    - What characters may appear in an HTML document.
    - Basic data types of an HTML document.
    - Elements that govern the structure of an HTML document, including text, lists, tables, links, and included objects, images, and applets.
    - Elements that govern the presentation of an HTML document, including style sheets, fonts, colors, rules, and other visual presentation, and frames for multi-windowed presentations.
    - Elements that govern interactivity with an HTML document, including forms for user input and scripts for active documents.
    - The SGML formal definition of HTML:
      - The SGML declaration of HTML.
      - Three DTDs: strict, transitional, and frameset.
      - The list of character references.
- Appendixes
  - The first appendix contains information about changes from HTML 3.2 to help authors and implementors with the transition to HTML 4, and changes from the 18 December 1997 specification. The second appendix contains performance and implementation notes, and is primarily intended to help implementors create user agents for HTML 4.
- References
  - A list of normative and informative references.
- Indexes
  - Three indexes give readers rapid access to the definition of key concepts, elements and attributes.
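For orientation, the three DTDs named in the list above are selected by a document type declaration at the top of a document. The sketch below shows the three declarations used for HTML 4.01; a real document carries exactly one of them, and the comments are added here only for illustration.

```html
<!-- Strict DTD: excludes deprecated elements and attributes, and frames -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">

<!-- Transitional DTD: the strict DTD plus deprecated elements and attributes -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<!-- Frameset DTD: the transitional DTD plus frames -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
        "http://www.w3.org/TR/html4/frameset.dtd">
```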
1.2 Document conventions
This document has been written with two types of readers in mind: authors and implementors. We hope the specification will provide authors with the tools they need to write efficient, attractive, and accessible documents, without over-exposing them to HTML's implementation details. Implementors, however, should find all they need to build conforming user agents.
The specification may be approached in several ways:
- Read from beginning to end. The specification begins with a general presentation of HTML and becomes more and more technical and specific towards the end.
- Quick access to information. In order to get information about syntax and semantics as quickly as possible, the online version of the specification includes the following features:
  - Every reference to an element or attribute is linked to its definition in the specification. Each element or attribute is defined in only one location.
  - Every page includes links to the indexes, so you are never more than two links away from finding the definition of an element or attribute.
  - The front pages of each section of the language reference manual extend the initial table of contents with more detail about that section.
1.2.1 Elements and attributes
Element names are written in uppercase letters (e.g., BODY). Attribute names are written in lowercase letters (e.g., lang, onsubmit). Recall that in HTML, element and attribute names are case-insensitive; the convention is meant to encourage readability.
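As a small illustration of this convention, the two paragraphs in the sketch below should be treated identically by a user agent, since only the case of the element and attribute names differs. The text and attribute values are placeholders chosen for this example.

```html
<!-- Equivalent markup: element and attribute names are case-insensitive -->
<P lang="en" title="Greeting">Hello, world.</P>
<p LANG="en" TITLE="Greeting">Hello, world.</p>
```

Attribute values are a separate matter: whether a value is case-sensitive depends on the attribute, as indicated in each attribute definition.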
Element and attribute names in this document have been marked up and may be rendered specially by some user agents.
Each attribute definition specifies the type of its value. If the type allows a small set of possible values, the definition lists the set of values, separated by a bar (|).
After the type information, each attribute definition indicates the case-sensitivity of its values, between square brackets ("[]"). See the section on case information for details.
1.2.2 Notes and examples
Informative notes are emphasized to stand out from surrounding text and may be rendered specially by some user agents.
All examples illustrating deprecated usage are marked as "DEPRECATED EXAMPLE". Deprecated examples also include recommended alternate solutions. All examples that illustrate illegal usage are clearly marked "ILLEGAL EXAMPLE".
Examples and notes have been marked up and may be rendered specially by some user agents.
2 Introduction to HTML 4
2.1 What is the World Wide Web?
The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
- A uniform naming scheme for locating resources on the Web (e.g., URIs).
- Protocols, for access to named resources over the Web (e.g., HTTP).
- Hypertext, for easy navigation among resources (e.g., HTML).
The ties between the three mechanisms are apparent throughout this specification.
2.1.1 Introduction to URIs
Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Universal Resource Identifier, or "URI".
URIs typically consist of three pieces:
- The naming scheme of the mechanism used to access the resource.
- The name of the machine hosting the resource.
- The name of the resource itself, given as a path.
Consider the URI that designates the W3C Technical Reports page:
http://www.w3.org/TR
This URI may be read as follows: There is a document available via the HTTP protocol (see [RFC2616]), residing on the machine www.w3.org, accessible via the path "/TR". Other schemes you may see in HTML documents include "mailto" for email and "ftp" for FTP.
Here is another example of a URI. This one refers to a user's mailbox:
...this is text...
For all comments, please send email to
<A href="mailto:joe@someplace.com">Joe Cool</A>.
Note. Most readers may be familiar with the term "URL" and not the term "URI". URLs form a subset of the more general URI naming scheme.
2.1.2 Fragment identifiers
Some URIs refer to a location within a resource. This kind of URI ends with "#" followed by an anchor identifier (called the fragment identifier). For instance, here is a URI pointing to an anchor named section_2:
http://somesite.com/html/top.html#section_2
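The following sketch shows how such a fragment identifier might be used in markup. It assumes that top.html defines the destination with the name attribute of the A element; the heading and link text are placeholders.

```html
<!-- In top.html: define the destination anchor named section_2 -->
<H2><A name="section_2">Section 2</A></H2>
<P>...text of the section...

<!-- In any other document: link to that anchor with a full URI -->
<P>See <A href="http://somesite.com/html/top.html#section_2">Section 2</A>.

<!-- Within top.html itself: the fragment identifier alone is enough -->
<P>Jump to <A href="#section_2">Section 2</A>.
```

In HTML 4 the id attribute of an element can also serve as the destination of such a link.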
2.1.3 Relative URIs
A relative URI doesn't contain any naming scheme information. Its path generally refers to a resource on the same machine as the current document. Relative URIs may contain relative path components (e.g., ".." means one level up in the hierarchy defined by the path), and may contain fragment identifiers.
Relative URIs are resolved to full URIs using a base URI. As an example of relative URI resolution, assume we have the base URI "http://www.acme.com/support/intro.html". The relative URI in the following markup for a hypertext link:
<A href="suppliers.html">Suppliers</A>
would expand to the full URI "http://www.acme.com/support/suppliers.html", while the relative URI in the following markup for an image
<IMG src="../icons/logo.gif" alt="logo">
would expand to the full URI "http://www.acme.com/icons/logo.gif".
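As a sketch of the resolution just described, the document below states its base URI explicitly with the BASE element, so the two relative URIs from the text resolve as indicated in the comments. The third link and its target are hypothetical additions that show a fragment identifier combined with a relative path.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>Support introduction (illustrative)</TITLE>
<!-- The base URI against which relative URIs are resolved -->
<BASE href="http://www.acme.com/support/intro.html">
</HEAD>
<BODY>
<!-- Resolves to http://www.acme.com/support/suppliers.html -->
<P><A href="suppliers.html">Suppliers</A>
<!-- Resolves to http://www.acme.com/icons/logo.gif -->
<P><IMG src="../icons/logo.gif" alt="logo">
<!-- Hypothetical: resolves to http://www.acme.com/index.html#news -->
<P><A href="../index.html#news">News</A>
</BODY>
</HTML>
```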
In HTML, URIs are used to:
- Link to another document or resource (see the A and LINK elements).
- Link to an external style sheet or script (see the LINK and SCRIPT elements).
- Include an image, object, or applet in a page (see the IMG, OBJECT, APPLET and INPUT elements).
- Create an image map (see the MAP and AREA elements).
- Submit a form (see FORM).
- Create a frame document (see the FRAME and IFRAME elements).
- Cite an external reference (see the Q, BLOCKQUOTE, INS and DEL elements).
- Refer to metadata conventions describing a document (see the HEAD element).
Please consult the section on the URI type for more information about URIs.
2.2 What is HTML?
To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language).
HTML gives authors the means to:
- Publish online documents with headings, text, tables, lists, photos, etc.
- Retrieve online information via hypertext links, at the click of a button.
- Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
- Include spreadsheets, video clips, sound clips, and other applications directly in their documents.
2.2.1 A brief history of HTML
HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the course of the 1990s it has blossomed with the explosive growth of the Web. During this time, HTML has been extended in a number of ways. The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.
HTML 2.0 (November 1995, see [RFC1866]) was developed under the aegis of the Internet Engineering Task Force (IETF) to codify common practice in late 1994. HTML+ (1993) and HTML 3.0 (1995, see [HTML30]) proposed much richer versions of HTML. Despite never receiving consensus in standards discussions, these drafts led to the adoption of a range of new features. The efforts of the World Wide Web Consortium's HTML Working Group to codify common practice in 1996 resulted in HTML 3.2 (January 1997, see [HTML32]). Changes from HTML 3.2 are summarized in Appendix A.
Most people agree that HTML documents should work well across different browsers and platforms. Achieving interoperability lowers costs to content providers since they must develop only one version of a document. If the effort is not made, there is much greater risk that the Web will devolve into a proprietary world of incompatible formats, ultimately reducing the Web's commercial potential for all participants.
Each version of HTML has attempted to reflect greater consensus among industry players so that the investment made by content providers will not be wasted and that their documents will not become unreadable in a short period of time.
HTML has been developed with the vision that all manner of devices should be able to use information on the Web: PCs with graphics displays of varying resolution and color depths, cellular telephones, handheld devices, devices for speech output and input, computers with high or low bandwidth, and so on.
2.3 HTML 4
HTML 4 extends HTML with mechanisms for style sheets, scripting, frames, embedding objects, improved support for right to left and mixed direction text, richer tables, and enhancements to forms, offering improved accessibility for people with disabilities.
HTML 4.01 is a revision of HTML 4.0 that corrects errors and makes some changes since the previous revision.
2.3.1 Internationalization
This version of HTML has been designed with the help of experts in the field of internationalization, so that documents may be written in every language and be transported easily around the world. This has been accomplished by incorporating [RFC2070], which deals with the internationalization of HTML.
One important step has been the adoption of the ISO/IEC:10646 standard (see [ISO10646]) as the document character set for HTML. This is the world's most inclusive standard dealing with issues of the representation of international characters, text direction, punctuation, and other world language issues.
HTML now offers greater support for diverse human languages within a document. This allows for more effective indexing of documents for search engines, higher-quality typography, better text-to-speech conversion, better hyphenation, etc.
2.3.2 Accessibility
As the Web community grows and its members diversify in their abilities and skills, it is crucial that the underlying technologies be appropriate to their specific needs. HTML has been designed to make Web pages more accessible to those with physical limitations. HTML 4 developments inspired by concerns for accessibility include:
- Better distinction between document structure and presentation, thus encouraging the use of style sheets instead of HTML presentation elements and attributes.
- Better forms, including the addition of access keys, the ability to group form controls semantically, the ability to group SELECT options semantically, and active labels.
- The ability to mark up a text description of an included object (with the OBJECT element).
- A new client-side image map mechanism (the MAP element) that allows authors to integrate image and text links.
- The requirement that alternate text accompany images included with the IMG element and image maps included with the AREA element.
- Support for the title and lang attributes on all elements.
- Support for the ABBR and ACRONYM elements.
- A wider range of target media (tty, braille, etc.) for use with style sheets.
- Better tables, including captions, column groups, and mechanisms to facilitate non-visual rendering.
- Long descriptions of tables, images, frames, etc.
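Several of the features listed above can be seen together in a short document. The following is a minimal sketch rather than a recommended template: the file name logo.gif, the form action URI, and the control names are all hypothetical.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<HTML lang="en">
<HEAD>
<TITLE>Accessible order form (illustrative)</TITLE>
</HEAD>
<BODY>
<!-- alt supplies the required alternate text; title adds advisory text -->
<P><IMG src="logo.gif" alt="Acme Corporation" title="Acme logo">
<!-- ABBR with title gives the expansion of an abbreviation -->
<P>Orders are handled by the
   <ABBR title="Customer Service Department">CSD</ABBR>.
<FORM action="http://www.example.com/order" method="post">
  <!-- FIELDSET and LEGEND group related form controls semantically -->
  <FIELDSET>
    <LEGEND>Contact details</LEGEND>
    <!-- LABEL is an active label; accesskey provides an access key -->
    <P><LABEL for="fullname" accesskey="N">Name</LABEL>
       <INPUT type="text" id="fullname" name="fullname">
    <!-- OPTGROUP groups SELECT options semantically -->
    <P><LABEL for="region">Region</LABEL>
       <SELECT id="region" name="region">
         <OPTGROUP label="Europe">
           <OPTION value="fr">France</OPTION>
           <OPTION value="de">Germany</OPTION>
         </OPTGROUP>
         <OPTGROUP label="Asia">
           <OPTION value="jp">Japan</OPTION>
         </OPTGROUP>
       </SELECT>
  </FIELDSET>
  <P><INPUT type="submit" value="Send">
</FORM>
</BODY>
</HTML>
```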
Authors who design pages with accessibility issues in mind will not only receive the blessings of the accessibility community, but will benefit in other ways as well: well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies.
Note. For more information about designing accessible HTML documents, please consult [WAI].
2.3.3 Tables
The new table model in HTML is based on [RFC1942]. Authors now have greater control over structure and layout (e.g., column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.
Note. At the time of writing, some HTML authoring tools rely extensively on tables for formatting, which may easily cause accessibility problems.
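To make the table features described in this section concrete, here is a small sketch of a table that declares a caption, column groups, and recommended column widths before any cell data. The figures and labels are placeholder data invented for this example.

```html
<TABLE summary="Placeholder sales figures by region and quarter" border="1">
<CAPTION>Sales by region (illustrative data)</CAPTION>
<!-- Column groups and widths are declared before the rows arrive, so a
     user agent can lay the table out without waiting for all the data -->
<COLGROUP>
  <COL width="40%">
</COLGROUP>
<COLGROUP span="2" width="30%">
</COLGROUP>
<THEAD>
  <TR><TH scope="col">Region</TH><TH scope="col">Q1</TH><TH scope="col">Q2</TH></TR>
</THEAD>
<TBODY>
  <TR><TH scope="row">North</TH><TD>10</TD><TD>12</TD></TR>
  <TR><TH scope="row">South</TH><TD>8</TD><TD>9</TD></TR>
</TBODY>
</TABLE>
```

The summary and scope attributes are among the mechanisms intended to help non-visual rendering.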
2.3.4 Compound documents
HTML now offers a standard mechanism for embedding generic media objects and applications in HTML documents. The OBJECT element (together with its more specific ancestor elements IMG and APPLET) provides a mechanism for including images, video, sound, mathematics, specialized applications, and other objects in a document. It also allows authors to specify a hierarchy of alternate renderings for user agents that don't support a specific rendering.
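The hierarchy of alternate renderings can be sketched with nested OBJECT elements. In this example the file names clock.mpeg and clock.gif are hypothetical; a user agent that cannot render the outer object is expected to try the content nested inside it.

```html
<!-- First choice: a video clip -->
<OBJECT data="clock.mpeg" type="video/mpeg">
  <!-- Otherwise: a still image -->
  <OBJECT data="clock.gif" type="image/gif">
    <!-- Otherwise: plain text -->
    A clock showing the current time.
  </OBJECT>
</OBJECT>
```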
2.3.5 Style sheets
Style sheets simplify HTML markup and largely relieve HTML of the responsibilities of presentation. They give both authors and users control over the presentation of documents -- font information, alignment, colors, etc.
Style information can be specified for individual elements or groups of elements. Style information may be specified in an HTML document or in external style sheets.
The mechanism for associating a style sheet with a document is independent of the style sheet language.
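The sketch below illustrates these options in one document: an external style sheet, rules for groups of elements in the document head, and style for an individual element. It assumes CSS as the style sheet language, and the file name site.css is hypothetical. In each case the type information names the language, which is what keeps the association mechanism independent of it.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>Style sheet association (illustrative)</TITLE>
<!-- Default style sheet language for style attributes -->
<META http-equiv="Content-Style-Type" content="text/css">
<!-- External style sheet; site.css is a hypothetical file -->
<LINK rel="stylesheet" type="text/css" href="site.css">
<!-- Rules for groups of elements, kept in the document itself -->
<STYLE type="text/css">
  H1 { text-align: center }
  P.note { color: green }
</STYLE>
</HEAD>
<BODY>
<H1>Release notes</H1>
<P class="note">This paragraph is styled through its class.
<!-- Style for one individual element -->
<P style="font-style: italic">This paragraph carries its own style.
</BODY>
</HTML>
```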
Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also exploited tables and images as a means for laying out pages. The relatively long time it takes for users to upgrade their browsers means that these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes. Throughout the specification, elements and attributes at risk are marked as "deprecated". They are accompanied by examples of how to achieve the same effects with other elements or style sheets.
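As a rough illustration of the kind of substitution involved, the sketch below pairs a deprecated construct with one possible style sheet alternative. It assumes CSS as the style sheet language; the wording and colors are placeholders.

```html
<!-- DEPRECATED EXAMPLE: presentation expressed with HTML elements and attributes -->
<CENTER><FONT color="red" size="+1">Warning</FONT></CENTER>

<!-- One possible alternative: similar presentation expressed with a style sheet -->
<P style="text-align: center; color: red; font-size: larger">Warning
```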
2.3.6 Scripting
Through scripts, authors may create dynamic Web pages (e.g., "smart forms" that react as users fill them out) and use HTML as a means to build networked applications.
The mechanisms provided to include scripts in an HTML document are independent of the scripting language.
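A "smart form" of the kind mentioned above might look like the following sketch. It assumes JavaScript as the scripting language, declared through the type attribute and a default script type; the function checkName, the control names, and the form action URI are hypothetical.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>Smart form (illustrative)</TITLE>
<!-- Default scripting language for intrinsic event attributes -->
<META http-equiv="Content-Script-Type" content="text/javascript">
<SCRIPT type="text/javascript">
  // Hypothetical helper: block submission while the name field is empty.
  function checkName(form) {
    if (form.elements["fullname"].value == "") {
      alert("Please enter your name.");
      return false;
    }
    return true;
  }
</SCRIPT>
</HEAD>
<BODY>
<!-- onsubmit runs the check before the form is sent -->
<FORM action="http://www.example.com/register" method="post"
      onsubmit="return checkName(this)">
  <P><LABEL for="fullname">Name</LABEL>
     <INPUT type="text" id="fullname" name="fullname">
     <INPUT type="submit" value="Register">
</FORM>
<!-- Alternate content for user agents that do not run scripts -->
<NOSCRIPT>
  <P>This form is checked by a script when scripting is available.
</NOSCRIPT>
</BODY>
</HTML>
```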
2.3.7 Printing
Sometimes, authors will want to make it easy for users to print more than just the current document. When documents form part of a larger work, the relationships between them can be described using the HTML LINK element or using W3C's Resource Description Framework (RDF) (see [RDF10]).
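One way to describe such relationships is with LINK elements in the document head, as in the sketch below. The chapter file names and print.css are hypothetical. What a user agent does with the links, such as offering to print a whole series, is up to that user agent.

```html
<HEAD>
<TITLE>Reference manual: Chapter 2</TITLE>
<!-- Relationships between this document and the rest of the work -->
<LINK rel="Index" href="../index.html">
<LINK rel="Prev" href="chapter1.html">
<LINK rel="Next" href="chapter3.html">
<!-- A style sheet intended for printed output; print.css is hypothetical -->
<LINK rel="stylesheet" type="text/css" media="print" href="print.css">
</HEAD>
```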
2.4 Authoring documents with HTML 4
We recommend that authors and implementors observe the following general principles when working with HTML 4.
2.4.1 Separate structure and presentation
HTML has its roots in SGML, which has always been a language for the specification of structural markup. As HTML matures, more and more of its presentational elements and attributes are being replaced by other mechanisms, in particular style sheets. Experience has shown that separating the structure of a document from its presentational aspects reduces the cost of serving a wide range of platforms, media, etc., and facilitates document revisions.
2.4.2 Consider universal accessibility to the Web
To make the Web more accessible to everyone, notably those with disabilities, authors should consider how their documents may be rendered on a variety of platforms: speech-based browsers, braille-readers, etc. We do not recommend that authors limit their creativity, only that they consider alternate renderings in their design. HTML offers a number of mechanisms to this end (e.g., the alt attribute, the accesskey attribute, etc.).
Furthermore, authors should keep in mind that their documents may be reaching a far-off audience with different computer configurations. In order for documents to be interpreted correctly, authors should include in their documents information about the natural language and direction of the text, how the document is encoded, and other issues related to internationalization.
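The sketch below shows one way an author might supply this information: the natural language with the lang attribute, the text direction with the dir attribute, and the character encoding with a META declaration. The UTF-8 encoding and the sample words are choices made for this example only.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!-- lang on HTML sets the default natural language of the document -->
<HTML lang="en">
<HEAD>
<!-- Declares how the document is encoded -->
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<TITLE>Greetings (illustrative)</TITLE>
</HEAD>
<BODY>
<!-- lang on an element overrides the inherited language -->
<P>The French word for "hello" is <SPAN lang="fr">bonjour</SPAN>.
<!-- dir sets the base text direction; the Hebrew word is written with
     numeric character references into the document character set -->
<P lang="he" dir="rtl">&#x5E9;&#x5DC;&#x5D5;&#x5DD;
</BODY>
</HTML>
```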
2.4.3 Help user agents with incremental rendering
By carefully designing their tables and making use of new table features in HTML 4, authors can help user agents render documents more quickly. Authors can learn how to design tables for incremental rendering (see the TABLE element). Implementors should consult the notes on tables in the appendix for information on incremental algorithms.
Status of This Document
This is the status of the document at the time of its publication.
Editors, Authors, Groups
This document was produced by the W3C HTML Working Group, part of the HTML Activity.
- Editors
  - Dave Raggett, W3C
  - Arnaud Le Hors, W3C
  - Ian Jacobs, W3C
The authors of this specification, the members of the W3C HTML Working Group, deserve much applause for their diligent review of this document, their constructive comments, and their hard work: John D. Burger (MITRE), Steve Byrne (JavaSoft), Martin J. Dürst (University of Zurich), Daniel Glazman (Electricité de France), Scott Isaacs (Microsoft), Murray Maloney (GRIF), Steven Pemberton (CWI), Robert Pernett (Lotus), Jared Sorensen (Novell), Powell Smith (IBM), Robert Stevahn (HP), Ed Tecot (Microsoft), Jeffrey Veen (HotWired), Mike Wexler (Adobe), Misha Wolf (Reuters), and Lauren Wood (SoftQuad).
Thanks to everyone who has helped to author the working drafts that went into the HTML 4 specification, and to all those who have sent suggestions and corrections.
Many thanks to the Web Accessibility Initiative task force (WAI HC group) for their work on improving the accessibility of HTML and to T.V. Raman (Adobe) for his early work on developing accessible forms.
Thank you Dan Connolly (W3C) for rigorous and bountiful input as part-time editor and thoughtful guidance as chairman of the HTML Working Group. Thank you Sally Khudairi (W3C) for your indispensable work on press releases.
Thanks to David M. Abrahamson and Roger Price for their careful reading of the specification and constructive comments.
Lastly, thanks to Tim Berners-Lee without whom none of this would have been possible.
Changes
See the changes from the Proposed Recommendation (diff-marked version) and changes from the previous edition (diff-marked version).
Comments
Public discussion on HTML features takes place on www-html@w3.org (archives of www-html@w3.org).
Patent Disclosure Request
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
How Stable is This?
This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
W3C recommends that user agents and authors (and in particular, authoring tools) produce HTML 4.01 documents rather than HTML 4.0 documents. W3C recommends that authors produce HTML 4 documents instead of HTML 3.2 documents. For reasons of backward compatibility, W3C also recommends that tools interpreting HTML 4 continue to support HTML 3.2 and HTML 2.0 as well.
Update Policy
W3C will make every effort to make available this document as-is at the following permanent URI:
http://www.w3.org/TR/1999/REC-html401-19991224
See the in-place modification policy for W3C technical reports.
