2 Introduction to HTML 4.0

Contents

  1. What is the World Wide Web?
    1. Introduction to URLs
    2. Fragment identifiers
    3. Relative URLs
  2. What is HTML?
    1. A brief history of HTML
  3. HTML 4.0
    1. Internationalization
    2. Accessibility
    3. Tables
    4. Compound documents
    5. Style sheets
    6. Scripting
    7. Printing
  4. Designing documents with HTML 4.0
    1. Separate structure and presentation
    2. Consider universal accessibility to the Web
    3. Help user agents with incremental rendering

2.1 What is the World Wide Web?

The World Wide Web is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:

  1. A uniform naming scheme for locating resources on the Web (e.g., URLs).
  2. Protocols, for access to named resources over the Web (e.g., HTTP).
  3. Hypertext, for easy navigation among resources (e.g., HTML).

The ties between the three mechanisms are apparent throughout this specification.

2.1.1 Introduction to URLs

Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Uniform Resource Locator, or "URL".

URLs typically consist of three pieces:

  1. The naming scheme of the mechanism used to access the resource.
  2. The name of the machine hosting the resource.
  3. The name of the resource itself, given as a path.

Consider the URL that designates the current HTML specification:

http://www.w3.org/TR/PR-html4/cover.html

This URL may be read as follows: There is a document available via the HTTP protocol (see [RFC2068]), residing on the machine www.w3.org, accessible via the path "/TR/PR-html4/cover.html". Other schemes you may see in HTML documents include "mailto" for email and "ftp" for FTP.

Here is another example of a URL. This one refers to a user's mailbox:

...this is text...
For all comments, please send email to 
<A href="mailto:joe@someplace.com">Joe Cool</A>.

2.1.2 Fragment identifiers

Some URLs refer to a location within a resource. This kind of URL ends with "#" followed by an anchor identifier (called the "fragment identifier"). For instance, here is a URL pointing to an anchor named section_2:

http://somesite.com/html/top.html#section_2

2.1.3 Relative URLs

A relative URL doesn't contain any naming scheme information. Its path generally refers to a resource on the same machine as the current document. Relative URLs may contain relative path components (".." means one level up in the hierarchy defined by the path), and may contain fragment identifiers.

Relative URLs are resolved to full URLs using a base URL. As an example of relative URL resolution, assume we have the base URL "http://www.acme.com/support/intro.html". The relative URL in the following markup for a hypertext link:

  <A href="suppliers.html">Suppliers</A>

would expand to the full URL "http://www.acme.com/support/suppliers.html", while the relative URL in the following markup for an image

  <IMG src="../icons/logo.gif" alt="logo">

would expand to the full URL "http://www.acme.com/icons/logo.gif".

In HTML, URLs play a role in these situations:

Please consult the section on the URL type for more information about URLs.

2.2 What is HTML?

To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language).

HTML gives authors the means to:

2.2.1 A brief history of HTML

HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the course of the 1990s it has blossomed with the explosive growth of the Web. During this time, HTML has been extended in a number of ways. The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.

HTML 2.0 (November 1995, see [RFC1866]) was developed under the aegis of the Internet Engineering Task Force (IETF) to codify common practice in late 1994. HTML+ (1993) and [HTML30] (1995) proposed much richer versions of HTML. Despite never receiving consensus in standards discussions, these drafts led to the adoption of a range new features. The efforts of the World Wide Web Consortium's HTML working group to codify common practice in 1996 resulted in HTML 3.2 (January 1997, see [HTML32]). Changes from HTML 3.2 are summarized in Appendix A

Most people agree that HTML documents should work well across different browsers and platforms. Achieving interoperability lowers costs to content providers since they must develop only one version of a document. If the effort is not made, there is much greater risk that the Web will devolve into a proprietary world of incompatible formats, ultimately reducing the Web's commercial potential for all participants.

Each version of HTML has attempted to reflect greater consensus among industry players so that the investment made by content providers will not be wasted and that their documents will not become unreadable in a short period of time.

HTML has been developed with the vision that all manner of devices should be able to use information on the Web: PCs with graphics displays of varying resolution and color depths, cellular telephones, hand held devices, devices for speech for output and input, computers with high or low bandwidth, and so on.

2.3 HTML 4.0

HTML 4.0 extends HTML with mechanisms for style sheets, scripting, frames, embedding objects, improved support for right to left and mixed direction text, richer tables, and enhancements to forms, offering improved accessibility for people with disabilities.

2.3.1 Internationalization

This version of HTML has been designed with the help of experts in the field of internationalization, so that documents may be written in every language and be transported easily around the world. This has been accomplished by incorporating [RFC2070], which deals with the internationalization of HTML.

One important step has been the adoption of the ISO/IEC:10646 standard (see [ISO10646]) as the document character set for HTML. This is the world's most inclusive standard dealing with issues of the representation of international characters, text direction, punctuation, and other world language issues.

HTML now offers greater support for diverse human languages within a document. This allows for more effective indexing of documents for search engines, higher-quality typography, better text-to-speech conversion, correct hyphenation, etc.

2.3.2 Accessibility

As the Web community grows and its members diversify in their abilities and skills, it is crucial that the underlying technologies be appropriate to their specific needs. HTML has been designed to make Web pages more accessible to those with physical limitations. HTML 4.0 developments in the area of accessibility include:

Authors who design pages with accessibility issues in mind will not only receive the blessings of the accessibility community, but will benefit in other ways as well: well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies.

2.3.3 Tables

The new table model in HTML is based on [RFC1942]. Authors now have greater control over structure and layout (e.g., column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.

Note. At the time of writing, some HTML authoring tools rely extensively on tables for formatting, which may easily cause accessibility problems.

2.3.4 Compound documents

HTML now offers a standard mechanism for embedding generic media objects and applications in HTML documents. The OBJECT element (together with its more specific ancestor elements IMG and APPLET) provides a mechanism for including images, video, sound, mathematics, specialized applications, and other objects in a document. It also allows authors to specify a hierarchy of alternate renderings for user agents that don't support a specific rendering.

2.3.5 Style sheets

Style sheets simplify HTML markup and largely relieve HTML of the responsibilities of presentation. They give both authors and users control over the presentation of documents -- font information, alignment, colors, etc.

Style information can be specified for specific elements or groups of elements either within an HTML document or in separate style sheets.

The mechanism for associating a style sheet with a document is independent of the style sheet language.

Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also exploited tables and images as a means for laying out pages. The relatively long time it takes for users to upgrade their browsers means that these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes. Throughout the specification elements and attributes at risk are marked as "deprecated". They are usually accompanied by examples of how to achieve the same effects with other elements or style sheets.

This specification includes three Document Type Definitions (DTDs) that may be used to validate HTML 4.0 documents. One for use with framesets, a loose DTD for transitional documents and a strict DTD that excludes presentation elements and attributes.

2.3.6 Scripting

Through scripts, authors may create "smart forms" that react as users fill them out. Scripting allows designers to create dynamic Web pages, and to use HTML as a means to build networked applications. The mechanisms provided to associate HTML with scripts are independent of particular scripting languages.

2.3.7 Printing

Sometimes, authors will want to make it easy for users to print more than just the current document. When documents form part of a larger work, the relationships between them can be described using the HTML LINK element or using W3C's Resource Description Language, see [RDF].

2.4 Designing documents with HTML 4.0

We recommend that authors and implementors observe the following general principles when working with HTML 4.0.

2.4.1 Separate structure and presentation

HTML has its roots in SGML which has always been a language for the specification of structural markup. As HTML matures, more and more of its presentational elements and attributes are being replaced by other mechanisms, in particular style sheets. Experience has shown that separating the structure of a document from its presentational aspects reduces the cost of serving a wide range of platforms, media, etc., and facilitates document revisions.

2.4.2 Consider universal accessibility to the Web

To make the Web more accessible to everyone, notably those with disabilities, authors should consider how their documents may be rendered on a variety of platforms: speech-based browsers, braille-readers, etc. We do not recommend that designers limit their creativity, only that they consider alternate renderings in their design. HTML offers a number of mechanisms to this end (e.g., the alt attribute, the accesskey attribute, etc.)

Furthermore, authors should keep in mind that their documents may be reaching a far-off audience with different computer configurations. In order for documents to be interpreted correctly, designers should include in their documents information about the natural language and direction of the text, how the document is encoded, and other issues related to internationalization.

2.4.3 Help user agents with incremental rendering

By carefully designing their tables and making use of new table features in HTML 4.0, designers can help user agents render documents more quickly. Authors can learn how to design tables for incremental rendering in the definition of the TABLE element. Implementors should consult the notes on tables in the appendix for information on incremental algorithms.