XHTML Modularization - an Overview

02 February 2000, revised April 2001

Introduction

XHTML is a reformulation of HTML into a language that conforms to the XML 1.0 Recommendation. The ultimate goal of this reformulation is that XHTML and its descendants will be useful in environments where there are no preconceived notions about the semantics of any element or attribute (generic, adaptive XML environments). The realization of this goal is still some way off. In the interim, the W3C's HTML Working Group has been taking steps to ensure that documents developed using XHTML will be portable into these adaptive environments even if, today, the documents must be processed by user agents with hard-coded knowledge of some elements and attributes (e.g. <form>, <applet>).

One critical step along the way to the goal is modularization. Many members of the user and alternate client communities have indicated that they wish to subset and extend HTML in a variety of ways. They want to do this to accommodate device-specific functionality, to limit the content that is sent to smaller-footprint devices, or to enhance their ability to produce useful Internet content. The HTML Working Group has determined that the best way to satisfy the requirements of the various constituencies is to define a framework that can be used to develop markup languages derived from HTML. Once defined, the framework would serve both as a means of defining extensions to XHTML and as a set of building blocks that markup language designers could use to bring those extensions together with the base into a cohesive whole.

The result of this work is collectively called XHTML Modularization. This document is intended to provide an overview of the architecture of Modularization, and a description of what aspects of Modularization each Working Draft defines.

This document is informative, and is not intended as a replacement for any text in any current or future W3C Recommendation.

Architecture

The architecture of XHTML's modularization is simple: a basic framework of XHTML modules enables the development of XHTML-conforming markup languages. These new languages must use the basic framework, and may also use other XHTML-provided modules, other W3C-defined modules, or indeed any other module that is correctly defined. The modules plug together within the XHTML framework to define a markup language that is task- or client-specific, but which is based upon the familiar (X)HTML structure. Such a markup language is appropriate for the development of portable, XHTML-conforming content. Documents developed against it will be usable on any XHTML-conforming client. In many cases, the content will also be portable to existing HTML 4 browsers.

[Figure: XHTML Family Architectural Diagram]

Implementation

XHTML does not require the use of a specific markup language description format. Instead, it defines its modules using abstract prose and implements the abstraction using formats such as XML DTDs and XML Schema. Each implementation places some structural requirements on the way in which modules can be plugged together. In general, however, each one permits the definition of an XHTML-conforming markup language by merely taking a (supplied) markup language template and adding to that template references to the modules the markup language needs.
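The template-plus-module-references approach can be sketched in a few lines of Python. This is a hypothetical illustration, not the normative XHTML DTD implementation: the template, the entity names, the public identifiers (all under an invented `-//EXAMPLE//` owner) and the file names are assumptions made up for the example. Each module is pulled in the usual DTD way, by declaring an external parameter entity and then referencing it.

```python
# Hypothetical sketch: assemble a DTD "driver" by adding module references
# to a template. All names and identifiers below are illustrative only;
# they are NOT the public identifiers used by the real XHTML DTD modules.

TEMPLATE = """<!-- {name} DTD driver (illustrative) -->
{module_refs}
<!-- end of {name} driver -->
"""

# (parameter entity name, public identifier, system identifier)
MODULES = [
    ("ex-struct.mod", "-//EXAMPLE//ELEMENTS Example Structure//EN", "ex-struct.mod"),
    ("ex-text.mod", "-//EXAMPLE//ELEMENTS Example Text//EN", "ex-text.mod"),
    ("ex-image.mod", "-//EXAMPLE//ELEMENTS Example Image//EN", "ex-image.mod"),
]


def module_reference(entity: str, public_id: str, system_id: str) -> str:
    """Declare an external parameter entity for a module, then reference it."""
    return (
        f'<!ENTITY % {entity} PUBLIC "{public_id}" "{system_id}" >\n'
        f"%{entity};"
    )


def build_driver(name: str, modules: list[tuple[str, str, str]]) -> str:
    """Fill the template with one module reference per requested module, in order."""
    refs = "\n".join(module_reference(*m) for m in modules)
    return TEMPLATE.format(name=name, module_refs=refs)


if __name__ == "__main__":
    print(build_driver("Example Markup", MODULES))
```

Adding or dropping a module is then a one-line change to the list, which is the property the template approach is meant to provide.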

XML Namespaces

XHTML Modularization is a framework for defining markup languages. Such languages are independent grammars with their own schemas. These schemas define the structure of content developed using the languages in addition to defining the elements and attributes that make up the vocabulary of the languages. Such markup languages are orthogonal to the concept of "XML Namespaces". An XML Namespace is a way of mapping specific element or attribute references within a document to collections of elements and attributes through the use of prefixes on the element or attribute names. The combination of elements and attributes from various grammars via the XML Namespace mechanism results in a compound document. XHTML-family markup languages can be used in these documents. The use, or lack of use, of an XML Namespace in relation to an XHTML-family markup language is independent of the language's use as a complete, freestanding markup language. Both uses are possible. However, a document is XHTML-conforming only when it uses an XHTML-family markup language as its document type, and when it validates against the schema for that markup language.
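To make the namespace mechanism concrete, the following Python sketch parses a small compound document mixing XHTML and MathML and groups its elements by namespace URI, using the standard `xml.etree.ElementTree` module. The namespace URIs are the standard XHTML and MathML ones; the document content and the helper name `elements_by_namespace` are invented for the example. Note that this only resolves names: ElementTree does not validate, so it says nothing about whether the document is XHTML-conforming in the sense described above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Illustrative compound document: XHTML is the default namespace,
# MathML elements are bound to the (arbitrary) prefix "m".
COMPOUND_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <head><title>Compound document</title></head>
  <body>
    <p>The value is <m:math><m:mi>x</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:math>.</p>
  </body>
</html>"""


def elements_by_namespace(xml_text: str) -> dict[str, list[str]]:
    """Group element local names by namespace URI, in document order."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for elem in root.iter():
        # ElementTree reports qualified names as "{namespace-uri}local-name";
        # elements in no namespace have a plain tag.
        if elem.tag.startswith("{"):
            uri, _, local = elem.tag[1:].partition("}")
        else:
            uri, local = "", elem.tag
        result.setdefault(uri, []).append(local)
    return result


if __name__ == "__main__":
    for uri, names in elements_by_namespace(COMPOUND_DOC).items():
        print(uri, names)
```

The prefix `m` itself carries no meaning; renaming it throughout the document would leave the grouping unchanged, because elements are identified by namespace URI.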

Modularization Documents

Modularization of XHTML

The first and most important document in this family is "Modularization of XHTML". This document defines the way in which XHTML-family modules are defined at the abstract level, and decomposes "XHTML 1.0" into sets of related elements and attributes. These sets are called modules. They are designed around a small number of core modules that must be present in all XHTML-family document types. These core modules provide structure for the document type as well as simple text markup. These minimal requirements for XHTML-family document types are designed to ensure that XHTML-conforming documents will render on all XHTML-conforming systems, regardless of the XHTML document type to which they are written.
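The core-module requirement amounts to a simple set-containment check. The Python sketch below uses the four core modules named in "Modularization of XHTML" (Structure, Text, Hypertext and List); representing a document type as a plain set of module names, and the function names, are simplifications invented for illustration.

```python
# Core modules that every XHTML-family document type must include.
CORE_MODULES = frozenset({"Structure", "Text", "Hypertext", "List"})


def missing_core_modules(document_type_modules) -> list[str]:
    """Return the core modules absent from a document type, sorted by name."""
    return sorted(CORE_MODULES - set(document_type_modules))


def has_core_modules(document_type_modules) -> bool:
    """True when a document type includes every core module."""
    return not missing_core_modules(document_type_modules)


if __name__ == "__main__":
    candidate = ["Structure", "Text", "Image"]
    print(missing_core_modules(candidate))
```

A real conformance check involves more than module names (for example, validating the resulting DTD), so this captures only the "must be present" rule.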

This document also defines the way in which those abstract definitions should be mapped onto an XML DTD, and provides the XML DTD implementation of XHTML Modularization. In the future, instructions for mapping onto XML Schema will also be provided.

The modules defined in this document are based upon the element and attribute definitions in "HTML 4.01". However, all of the deprecated functionality and most of the "transitional" aspects of HTML 4 have been either removed from these modules or relegated to the "legacy" module. The modules are further broken down such that presentational elements are separated from structural ones. The result is a collection that forms a strong basis for future markup language definitions. Further, it is possible to define markup languages that are purely structural, relying upon stylesheets for information about style and presentation. Such languages are better suited to use in generic XML applications.

Modularization of XHTML in XML Schema

The purpose of this document, "Modularization of XHTML in XML Schema", is to describe a modularization framework for languages within the XHTML Namespace using XML Schema. This document provides a complete set of XML Schema modules for XHTML. In addition to the schema modules themselves, the framework presented here describes a means of further extending and modifying XHTML. Once it matures, this document is expected to be merged into "Modularization of XHTML".

XHTML 1.1 - Module-based XHTML

The purpose of XHTML Modularization is to permit the definition of document types using the modules. The first of these is defined in "XHTML 1.1 - Module-based XHTML". In this document type, the non-presentational modules from "Modularization of XHTML" are brought together into a markup language that relies upon style sheets (like CSS or XSL) for presentation. This version of XHTML is geared toward use in generic, adaptive XML environments once stylesheet languages are mature enough to support it. In the interim, documents developed in XHTML 1.1 will render well in existing and older browsers. Moreover, since they are by definition valid XML documents, they are also processable by XML search engines and other related technologies.

XHTML Basic

Another module-based XHTML-family document type is "XHTML Basic". The XHTML Basic document type includes the minimal set of modules required to be an XHTML Host Language document type, and in addition includes modules such as Image, Basic Forms, Basic Tables and Object. It is a subset of XHTML 1.1, and is designed for Web clients that do not support the full set of XHTML features. The document type is simple, yet rich enough for content authoring.

XHTML Basic is designed as a common base that may be extended. For example, it could be extended with an event module that is more generic than the traditional HTML 4 event system, or with additional modules from XHTML Modularization such as the Scripting Module. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.
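Treating a document type as a set of module names gives a rough model of "common base plus extensions". In the Python sketch below, the XHTML Basic set is abridged to the core modules (Structure, Text, Hypertext, List) plus the modules named above, so it is not the complete normative list, and the `extend` helper is hypothetical. Extending the base with the Scripting Module yields a new document type that is a strict superset of XHTML Basic and still contains every core module.

```python
# Core modules required in every XHTML-family document type.
CORE_MODULES = frozenset({"Structure", "Text", "Hypertext", "List"})

# Abridged XHTML Basic: the core modules plus the modules named in the text.
XHTML_BASIC = CORE_MODULES | frozenset({"Image", "Basic Forms", "Basic Tables", "Object"})


def extend(base: frozenset[str], *extra: str) -> frozenset[str]:
    """Return a new document type with extra modules; the base is left unchanged."""
    return base | frozenset(extra)


# Hypothetical extended document type: XHTML Basic plus scripting.
BASIC_PLUS_SCRIPTING = extend(XHTML_BASIC, "Scripting")

if __name__ == "__main__":
    print(sorted(BASIC_PLUS_SCRIPTING - XHTML_BASIC))
```

Because frozensets are immutable, every extension produces a distinct document type, and content written to the common base stays valid module-wise for each of them.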

Conclusions

The architecture and technology represented by these working drafts are already in use both within the W3C and throughout the industry. The W3C's MathML, Internationalization, and SYMM working groups are using the modularization methods to define XML DTDs that conform to XHTML. External organizations like OASIS and Project Gutenberg are using Modularization of XHTML to build new markup languages that extend XHTML in important ways. Groups like the WAP Forum and the ATSC are looking to use Modular XHTML as the basis for their client-specific markup languages. All in all, the early releases of XHTML Modularization have proven to be very popular. The HTML Working Group continues to work to make this important technology quickly available to meet the community's ever-increasing need for content portability.