2 Introduction to HTML 4.0

Contents

  1. A brief history of HTML
  2. HTML 4.0
    1. Internationalization
    2. Accessibility
    3. Tables
    4. Compound documents
    5. Style sheets
    6. Scripting
    7. Printing
  3. Designing documents with HTML 4.0
    1. Separate structure and presentation
    2. Consider universal accessibility to the Web
    3. Help user agents with incremental rendering

To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language).

HTML gives authors the means to:

2.1 A brief history of HTML

HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the course of the 1990s it has blossomed with the explosive growth of the Web. During this time, HTML has been extended in a number of ways. The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.

HTML 2.0 (November 1995, see [RFC1866]) was developed under the aegis of the Internet Engineering Task Force (IETF) to codify common practice in late 1994. HTML+ (1993) and [HTML30] (1995) proposed much richer versions of HTML. Despite never receiving consensus in standards discussions, these drafts led to the adoption of a range of new features. The efforts of the World Wide Web Consortium's HTML working group to codify common practice in 1996 resulted in HTML 3.2 (January 1997, see [HTML32]).

While most people agree that HTML documents should work well across different browsers and platforms, achieving interoperability implies higher costs to content providers since they must develop different versions of documents. If the effort is not made, however, there is much greater risk that the Web will devolve into a proprietary world of incompatible formats, ultimately reducing the Web's commercial potential for all participants.

Each version of HTML has attempted to reflect greater consensus among industry players so that the investment made by content providers will not be wasted and that their documents will not become unreadable in a short period of time.

HTML has been developed with the vision that all manner of devices should be able to use information on the Web: PCs with graphics displays of varying resolution and color depths, cellular telephones, handheld devices, devices for speech input and output, computers with high or low bandwidth, and so on.

2.2 HTML 4.0

HTML 4.0 extends HTML with mechanisms for style sheets, scripting, frames, embedding objects, improved support for right-to-left and mixed-direction text, richer tables, and enhancements to forms. It also offers improved accessibility for people with disabilities.

2.2.1 Internationalization

This version of HTML has been designed with the help of experts in the field of internationalization, so that documents may be written in every language and be transported easily around the world. This has been accomplished by incorporating [RFC2070], which deals with the internationalization of HTML.

One important step has been the adoption of the ISO/IEC 10646 standard (see [ISO10646]) as the document character set for HTML. This is the world's most inclusive standard for representing international characters, text direction, punctuation, and other world language issues.

HTML now offers greater support for diverse human languages within a document. This allows for more effective indexing of documents for search engines, higher-quality typography, better text-to-speech conversion, correct hyphenation, etc.
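As an illustration of the two points above, the following fragment is a minimal sketch. It marks the language of a phrase with the lang attribute, sets the direction of a right-to-left paragraph with the dir attribute, and uses numeric character references, which refer to ISO/IEC 10646 code positions regardless of how the file itself is encoded. The sentences are invented sample content.

```html
<!-- The lang attribute lets user agents pick hyphenation, fonts, and speech rules. -->
<P lang="en">The French greeting <SPAN lang="fr">bonjour</SPAN> means "hello".</P>

<!-- dir="rtl" sets the base direction; the Hebrew letters shin, lamed, vav, mem
     are written as numeric character references. -->
<P lang="he" dir="rtl">&#1513;&#1500;&#1493;&#1501;</P>

<!-- Decimal and hexadecimal references both name ISO/IEC 10646 code positions. -->
<P>&#229; is "a" with a ring above, &#1048; is the Cyrillic capital letter I,
and &#x6C34; is the Chinese character for water.</P>
```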

2.2.2 Accessibility

As the Web community grows and its members diversify in their abilities and skills, it is crucial that the underlying technologies be appropriate to their specific needs. HTML has been designed to make Web pages more accessible to those with physical limitations. HTML 4.0 developments in the area of accessibility include:

Authors who design pages with accessibility issues in mind will not only receive the blessings of the accessibility community, but will benefit in other ways as well: well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies.

2.2.3 Tables

The new table model in HTML is based on [RFC1942]. Authors now have greater control over structure and layout (e.g., column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.
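A minimal sketch of these features follows. The column widths declared with COLGROUP and COL are recommendations that let a user agent lay out the table before all rows have arrived. The caption, summary text, and figures are invented sample data.

```html
<TABLE summary="Sample sales figures for two regions over two quarters">
<CAPTION>Sales by region (sample data)</CAPTION>
<!-- First column group: one wide column for the row labels. -->
<COLGROUP>
  <COL width="120">
</COLGROUP>
<!-- Second column group: two data columns, each 80 pixels wide. -->
<COLGROUP span="2" width="80">
</COLGROUP>
<THEAD>
  <TR><TH>Region</TH><TH>Q1</TH><TH>Q2</TH></TR>
</THEAD>
<TBODY>
  <TR><TD>North</TD><TD>10</TD><TD>12</TD></TR>
  <TR><TD>South</TD><TD>8</TD><TD>9</TD></TR>
</TBODY>
</TABLE>
```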

Beware: at the time of writing, some HTML authoring tools rely extensively on tables for formatting, which can easily cause accessibility problems.

2.2.4 Compound documents

HTML now offers a standard mechanism for embedding generic media objects and applications in HTML documents. The OBJECT element (together with its more specific predecessor elements IMG and APPLET) provides a mechanism for including images, video, sound, mathematics, specialized applications, and other objects in a document. It also allows authors to specify a hierarchy of alternate renderings for user agents that don't support a specific rendering.
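The hierarchy of alternate renderings can be sketched by nesting OBJECT elements: a user agent that cannot render the outer object falls back to its content, and so on inward until it reaches plain text. The file names and the example.com URL below are hypothetical.

```html
<!-- First choice: an interactive application. -->
<OBJECT classid="http://www.example.com/TheEarth.py">
  <!-- Second choice: a video. -->
  <OBJECT data="TheEarth.mpeg" type="video/mpeg">
    <!-- Third choice: a still image. -->
    <OBJECT data="TheEarth.gif" type="image/gif">
      <!-- Last resort: text. -->
      The <STRONG>Earth</STRONG> as seen from space.
    </OBJECT>
  </OBJECT>
</OBJECT>
```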

2.2.5 Style sheets

Style sheets simplify HTML markup and largely relieve HTML of the responsibilities of presentation. They give both authors and users control over the presentation of documents: font information, alignment, colors, etc.

Stylistic information can be:

The mechanism for associating a style sheet with a document is independent of the style sheet language.

Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also exploited tables and images as a means for laying out pages. The relatively long time it takes for users to upgrade their browsers means that these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes. Throughout the specification, elements and attributes at risk are marked as "deprecated". They are usually accompanied by examples of how to achieve the same effects using style sheets.
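To make the contrast concrete, here is a minimal sketch that first shows deprecated presentational markup and then a roughly equivalent rendering requested through a style sheet. CSS is assumed as the style sheet language. The class name "warning" and the file name "house.css" are hypothetical.

```html
<!-- Deprecated approach: presentation mixed into the markup. -->
<CENTER><FONT color="red" size="+1">Disk space is low.</FONT></CENTER>

<!-- Style sheet approach: rules in the document head. The type attribute names
     the style sheet language, so the mechanism is not tied to CSS. -->
<STYLE type="text/css">
  P.warning { color: red; font-size: larger; text-align: center }
</STYLE>

<!-- Alternatively, the same rules could live in an external style sheet. -->
<LINK rel="stylesheet" type="text/css" href="house.css">

<!-- The body markup then carries structure only. -->
<P class="warning">Disk space is low.</P>
```

The STYLE and LINK elements belong in the document's HEAD; they are shown together here only to keep the fragment short.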

This specification includes three Document Type Definitions (DTDs) that may be used to validate HTML 4.0 documents: one for use with framesets, a loose DTD for transitional documents, and a strict DTD that excludes presentation elements and attributes.
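An author selects one of the three DTDs with a document type declaration on the first line of the document. The declarations below are a sketch using the public identifiers and system identifiers published for HTML 4.0; a document carries exactly one of them.

```html
<!-- Strict DTD: excludes presentation elements and attributes. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">

<!-- Loose DTD: for transitional documents that still use deprecated features. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
    "http://www.w3.org/TR/REC-html40/loose.dtd">

<!-- Frameset DTD: for documents that use frames. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN"
    "http://www.w3.org/TR/REC-html40/frameset.dtd">
```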

2.2.6 Scripting

Through scripts, authors may create "smart forms" that react as users fill them out. Scripting allows designers to create dynamic Web pages, and to use HTML as a means to build networked applications. The mechanisms provided to associate HTML with scripts are independent of particular scripting languages.
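A "smart form" might look like the following sketch, in which a total is recalculated whenever the user changes the quantity field. JavaScript is assumed as the scripting language purely for illustration. The function name updateTotal, the field names, the unit price, and the example.com URL are all hypothetical.

```html
<HEAD>
<TITLE>Order form (sample)</TITLE>
<!-- Declares the default language for scripts in event attributes. -->
<META http-equiv="Content-Script-Type" content="text/javascript">
<SCRIPT type="text/javascript">
// Recompute the total from the quantity field; unit price of 5 is sample data.
function updateTotal(form) {
  var qty = parseInt(form.quantity.value, 10);
  form.total.value = isNaN(qty) ? "" : String(qty * 5);
}
</SCRIPT>
</HEAD>
<BODY>
<FORM action="http://www.example.com/order" method="post">
<P>
<LABEL for="quantity">Quantity:</LABEL>
<INPUT type="text" id="quantity" name="quantity"
       onchange="updateTotal(this.form)">
<LABEL for="total">Total:</LABEL>
<INPUT type="text" id="total" name="total" readonly>
<INPUT type="submit" value="Order">
</P>
</FORM>
</BODY>
```

The type attribute on SCRIPT is what keeps the mechanism independent of any one scripting language: another language could be substituted by changing the content type and the script body.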

2.2.7 Printing

HTML features (the LINK element) allow user agents to print a collection of documents in an intelligent manner based on descriptions of the relationships among documents acting as parts of a larger work.
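As a sketch of how such relationships might be declared, the head of one chapter in a multi-part work could contain LINK elements like the following. A user agent could follow the "Next" links to print the whole work in order. The file names are hypothetical.

```html
<HEAD>
<TITLE>Chapter 2</TITLE>
<!-- Relationships to the other documents that make up the larger work. -->
<LINK rel="Contents" href="toc.html">
<LINK rel="Index" href="index.html">
<LINK rel="Prev" href="chapter1.html">
<LINK rel="Next" href="chapter3.html">
</HEAD>
```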

2.3 Designing documents with HTML 4.0

We recommend that authors and implementors observe the following general principles when working with HTML 4.0.

2.3.1 Separate structure and presentation

HTML has its roots in SGML, which has always been a language for the specification of structural markup. As HTML matures, more and more of its presentational elements and attributes are being replaced by other mechanisms, in particular style sheets. Experience has shown that separating the structure of a document from its presentational aspects reduces the cost of serving a wide range of platforms, media, etc., and facilitates document revisions.

2.3.2 Consider universal accessibility to the Web

To make the Web more accessible to everyone, notably those with disabilities, authors should consider how their documents may be rendered on a variety of platforms: speech-based browsers, braille-readers, etc. We do not recommend that designers limit their creativity, only that they consider alternate renderings in their design. HTML offers a number of mechanisms to this end, such as the alt and accesskey attributes.
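The two attributes mentioned above might be used as in this minimal sketch. The alt text stands in for the image in speech-based or text-only user agents, and accesskey gives keyboard access to a link or form control. The file names, key letters, and example.com URL are hypothetical.

```html
<!-- Alternate text is rendered when the image cannot be displayed. -->
<P><IMG src="sitemap.gif" alt="Site map: home, products, and support pages"></P>

<!-- The link can be reached from the keyboard with the access key "S". -->
<P><A href="search.html" accesskey="S" title="Search this site">Search</A></P>

<!-- An active label: activating the label or its access key moves focus
     to the associated control. -->
<FORM action="http://www.example.com/subscribe" method="post">
<P>
<LABEL for="email" accesskey="E">Email address:</LABEL>
<INPUT type="text" id="email" name="email">
<INPUT type="submit" value="Subscribe">
</P>
</FORM>
```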

Furthermore, authors should keep in mind that their documents may be reaching a far-off audience with different computer configurations. In order for documents to be interpreted correctly, designers should include in their documents information about the language and direction of the text, how the document is encoded, and other issues related to internationalization.
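A minimal sketch of such declarations at the document level follows. The META element states the character encoding, and the lang and dir attributes on HTML set the default language and text direction for the whole document. The choice of Arabic, UTF-8, and the title text are assumptions made for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
    "http://www.w3.org/TR/REC-html40/strict.dtd">
<!-- Default language is Arabic, with right-to-left base direction. -->
<HTML lang="ar" dir="rtl">
<HEAD>
<!-- Tells the user agent how the bytes of this document are encoded. -->
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<TITLE>Sample document</TITLE>
</HEAD>
<BODY>
<!-- An embedded English phrase overrides both defaults locally. -->
<P><SPAN lang="en" dir="ltr">HTML 4.0</SPAN></P>
</BODY>
</HTML>
```

Where the server can be configured, sending the encoding in the HTTP Content-Type header is generally more reliable than relying on META alone.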

2.3.3 Help user agents with incremental rendering

By carefully designing their tables and making use of new table features in HTML 4.0, designers can help user agents render documents more quickly.