Modularization of XHTML

This working draft specifies an abstract modularization of XHTML 1.0. A companion document, Building XHTML Modules, implements this abstraction as a collection of component XML Document Type Definitions (DTDs). This modularization provides a means for subsetting and extending XHTML, a feature desired for extending XHTML's reach onto emerging platforms.

Status of this document

This document is nearly complete, and is being circulated for a final public review prior to last call.

This document is a working draft of the W3C's HTML Working Group. This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.

Please send detailed comments on this document to www-html-editor@w3.org. We cannot guarantee a personal response, but we will try when it is appropriate. Public discussion on HTML features takes place on the mailing list www-html@w3.org.

1. Introduction

1.1. What is XHTML?

XHTML is the reformulation of HTML 4.0 as an application of XML. XHTML 1.0 [XHTML1] specifies three XML document types that correspond to the three HTML 4.0 DTDs: Strict, Transitional, and Frameset. XHTML 1.0 is the basis for a family of document types that subset and extend HTML.

1.2. What is XHTML Modularization?

XHTML Modularization is a decomposition of XHTML 1.0, and by reference HTML 4.0, into a collection of abstract modules that provide specific types of functionality. These abstract modules are implemented in the XHTML 1.1 specification using the XML Document Type Definition language, but other implementations are possible and expected. The mechanism for defining the abstract modules in this document, and for implementing them using XML DTDs, is defined in the document "Building XHTML Modules" [BUILDING].

These modules may be combined with each other and with other modules to create XHTML subset and extension document types that qualify as members of the XHTML family of document types.

1.3. Why Modularize XHTML?

The modularization of XHTML refers to the task of specifying well-defined sets of XHTML elements that can be combined and extended by document authors, document type architects, other XML standards specifications, and application and product designers to make it economically feasible for content developers to deliver content on a greater number and diversity of platforms.

Over the last couple of years, many specialized markets have begun looking to HTML as a content language. There is a great movement toward using HTML across increasingly diverse computing platforms. Currently there is activity to move HTML onto mobile devices (hand held computers, portable phones, etc.), television devices (digital televisions, TV-based web browsers, etc.), and appliances (fixed function devices). Each of these devices has different requirements and constraints.

Modularizing XHTML provides a means for product designers to specify which elements are supported by a device using standard building blocks and standard methods for specifying which building blocks are used. These modules serve as "points of conformance" for the content community. The content community can now target the installed base that supports a certain collection of modules, rather than worry about the installed base that supports this permutation of XHTML elements or that permutation of XHTML elements. The use of standards is critical for modularized XHTML to be successful on a large scale. It is not economically feasible for content developers to tailor content to each and every permutation of XHTML elements. By specifying a standard, either software processes can autonomously tailor content to a device, or the device can automatically load the software required to process a module.

Modularization also allows for the extension of XHTML's layout and presentation capabilities, using the extensibility of XML, without breaking the XHTML standard. This development path provides a stable, useful, and implementable framework for content developers and publishers to manage the rapid pace of technological change on the Web.

The modularization of XHTML is accomplished on two major levels: at the abstract level, and at the document type level. Roughly speaking, the abstract level provides a conceptual approach to the modularization of XHTML, while the document type level provides DTD-level building blocks that allow document type designers to support the abstract modules.

1.3.1. Abstract modules

An XHTML document type is defined as a set of abstract modules. An abstract module defines, in a document type, one kind of data that is semantically different from all others. Abstract modules can be combined into document types without a deep understanding of the underlying schema that defines the modules.

1.3.2. DTD modules

A DTD module consists of a set of element types, a set of attribute list declarations, and a set of content model declarations, where any of these three sets may be empty. An attribute list declaration in a DTD module may modify an element type outside the element types in the module, and a content model declaration may modify an element type outside the element type set.

An XML DTD is a means of describing the structure of a class of XML documents, collectively known as an XML document type. XML document types are currently represented as DTDs, as described in the XML 1.0 Recommendation [XML]. Where possible, this document also allows for the potential use of other schema languages that are currently under consideration by the W3C XML Schema Working Group (e.g., DCD, SOX, DDML, XSchema).

1.3.3. Hybrid document types

A hybrid document type is an XML DTD composed from a collection of XML DTDs or DTD Modules. The primary purpose of the modularization framework described in this document is to allow a DTD author to combine elements from multiple abstract modules into a hybrid document type, to develop documents against that hybrid document type, and to validate those documents against the associated hybrid document type definition.

One of the most valuable benefits of XML over SGML is that XML reduces the barrier to entry for standardization of element sets that allow communities to exchange data in an interoperable format. However, the relatively static nature of HTML as the content language for the Web has meant that any one of these communities has previously held out little hope that its XML document types would see widespread adoption as part of Web standards. The modularization framework allows for the dynamic incorporation of these diverse document types within the XHTML family of document types, further reducing the barriers to the incorporation of these domain-specific vocabularies in XHTML documents.

1.3.4. Validation

The use of well-formed, but not valid, documents is an important benefit of XML. In the process of developing a document type, however, the additional leverage provided by a validating parser for error checking is important. The same statement applies to XHTML document types with elements from multiple abstract modules.

The general problem of fragment validation - validation of XML documents with different schemas from multiple XML Namespaces [XMLNAMES] in different portions of the document - is beyond the scope of this framework. An essential feature of this framework, however, is a collection of conventions for creating, from a set of abstract modules, hybrid DTDs.

2. Terms and Definitions

While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.

3. Conformance Definition

In order to ensure that XHTML-family documents are maximally portable among XHTML-family user agents, this specification rigidly defines conformance requirements for both documents and user agents, as well as for XHTML-family document types. While the conformance definitions can be found in this section, they necessarily reference normative text within this document, within the base XHTML specification [XHTML1], and within other related specifications. It is only possible to fully comprehend the conformance requirements of XHTML through a complete reading of all normative references.

3.1. XHTML Family Document Type Conformance

It is possible to modify existing document types and define wholly new document types using both modules defined in this specification and other modules. Such a document type conforms to this specification when it meets the following criteria:

3.2. XHTML Family Document Conformance

Documents that rely upon XHTML-family document types are considered XHTML conforming if they validate against their referenced document type.

3.3. XHTML Family User Agent Conformance

A conforming user agent must meet all of the following criteria (as defined in [XHTML1]):

3.4. Naming Rules

Names for XHTML-conforming document types must adhere to strict naming conventions so that it is possible for software and users to readily determine the relationship of document types to XHTML. The names for document types implemented as XML Document Type Definitions are defined through XML Formal Public Identifiers (FPIs). Within FPIs, fields are separated by double slash character sequences (//). The various fields MUST be composed as follows:

Using these rules, the name for an XHTML family conforming document type might be

-//MyCompany//DTD XHTML-MyML 1.0//EN
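
As an informal illustration of the double-slash field separation described above, the following Python sketch splits the example identifier into its fields. The function name split_fpi is hypothetical, and the sketch makes no attempt to check the composition rules for the individual fields.

```python
def split_fpi(fpi):
    """Split a Formal Public Identifier into its //-separated fields."""
    # Collapse runs of whitespace first, since an identifier may be
    # wrapped across lines in running text.
    normalized = " ".join(fpi.split())
    return normalized.split("//")


fields = split_fpi("-//MyCompany//DTD XHTML-MyML 1.0//EN")
print(fields)  # ['-', 'MyCompany', 'DTD XHTML-MyML 1.0', 'EN']
```

Software that needs to relate a document type to XHTML could inspect the third field of such a list; how it should interpret that field is governed by the rules of this section, not by this sketch.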

4. XHTML Abstract Modules

This section specifies the contents of the XHTML abstract modules. These modules are abstract definitions of collections of elements, attributes, and their content models. These abstract modules can be mapped onto any appropriate specification mechanism. The XHTML 1.1 Specification, for example, maps these modules onto DTDs as described in [XML].

Content developers and device designers should view this section as a guide to the definition of the functionality provided by the various XHTML-defined modules. When developing documents or defining a profile for a class of documents, content developers can determine which of these modules are essential for conveying their message. When designing clients, device designers should develop their device profiles by choosing from among the abstract modules defined here.

4.1. Common Characteristics of Modules

Many of the abstract modules in this section describe elements, attributes on those elements, and minimal content models for those elements or element sets. This section identifies some shorthand expressions that are used throughout the abstract module definitions. These expressions should in no way be considered normative or mandatory. They are an editorial convenience for this document. When used in the remainder of this section, it is the expansion of the term that is normative, not the term itself.

4.1.1. Syntactic Conventions

The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions (as defined in Building XHTML Modules [BUILDING]). These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.

element name: When an element is included in a content model, its explicit name will be listed.
Content set: Some modules define lists of explicit element names called content sets. When a content set is included in a content model, its name will be listed.
expr ?: Zero or one instances of expr are permitted.
expr +: One or more instances of expr are required.
expr *: Zero or more instances of expr are permitted.
a , b: Expression a is required, followed by expression b.
a | b: Either expression a or expression b is required.
a - b: Expression a is permitted, omitting elements in expression b.
parentheses: When an expression is contained within parentheses, evaluation of any subexpressions within the parentheses takes place before evaluation of expressions outside of the parentheses (starting at the deepest level of nesting first).
extending pre-defined elements: In some instances, a module adds attributes to an element. In these instances, the element name is followed by an ampersand (&).
Defining the type of attribute values: When a module defines the type of an attribute value, it does so by listing the type in parentheses after the attribute name.
Defining the legal values of attributes: When a module defines the legal values for an attribute, it does so by listing the explicit legal values (enclosed in quotation marks), separated by vertical bars |, inside of parentheses following the attribute name.
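
The occurrence, sequence, and choice conventions above behave much like regular expressions over sequences of element names. The following Python sketch is one informal way to experiment with them; it is not part of the framework. The names to_regex and matches are hypothetical, content sets must be expanded by hand before use, and the a - b and & conventions are deliberately not supported.

```python
import re

# A token is an element name or one of the operator characters listed above.
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[(),|?+*])")


def to_regex(model):
    """Translate a minimal content model into a regular expression."""
    parts = []
    pos = 0
    model = model.strip()
    while pos < len(model):
        match = TOKEN.match(model, pos)
        if match is None:
            raise ValueError("unsupported syntax at position %d" % pos)
        token = match.group(1)
        pos = match.end()
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in (")", "|", "?", "+", "*"):
            parts.append(token)
        else:
            # Each name consumes itself plus one trailing space separator.
            parts.append("(?:" + re.escape(token) + " )")
    return "".join(parts)


def matches(model, children):
    """Return True if the list of child names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(to_regex(model), sequence) is not None


print(matches("head, body", ["head", "body"]))    # True
print(matches("(dt | dd)+", []))                  # False
print(matches("caption?, tr+", ["tr", "tr"]))     # True
```

As in XML DTDs, the sketch assumes that sequence and choice operators are not mixed at the same level without parentheses.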

4.1.2. Content Types

The abstract module definitions in this document define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.

4.1.3. Attribute Types

In some instances, the types of attribute values or the explicit set of permitted values for attributes are defined. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the Abstract Modules:

Attribute Type	Definition
CDATA	Character data
ID	A document-unique identifier
IDREF	A reference to a document-unique identifier
NAME	A name with the same character constraints as ID above
NMTOKEN	A name composed of CDATA characters but no whitespace
NMTOKENS	Multiple names composed of CDATA characters separated by whitespace
PCDATA	Processed character data

4.1.4. Attribute Collections

The following basic attribute sets are used on many elements. In each case where they are used, their use is identified via their name rather than enumerating the list.

Collection Name	Attributes in Collection
Core	class (NMTOKEN), id (ID), title (CDATA)
I18N	dir ("rtl" \| "ltr"), xml:lang (NMTOKEN)
Events	onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onkeypress, onkeydown, onkeyup
Style	style (CDATA)
Common	Core + Events + I18N + Style

Note that the Events collection is only defined when the Intrinsic Events abstract module is selected. Otherwise, the Events collection is empty.

Also note that the Style collection is only defined when the Stylesheet Module is selected. Otherwise, the Style collection is empty.
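
The dependency of the Common collection on module selection can be pictured with a short Python sketch. The function name common_attributes and the module name strings are illustrative assumptions; the attribute names are taken from the table above.

```python
CORE = {"class", "id", "title"}
I18N = {"dir", "xml:lang"}
EVENTS = {
    "onclick", "ondblclick", "onmousedown", "onmouseup", "onmouseover",
    "onmousemove", "onmouseout", "onkeypress", "onkeydown", "onkeyup",
}
STYLE = {"style"}


def common_attributes(selected_modules):
    """Expand the Common collection for a given set of selected modules."""
    # Events and Style are empty unless their defining module is selected.
    events = EVENTS if "Intrinsic Events" in selected_modules else set()
    style = STYLE if "Stylesheet" in selected_modules else set()
    return CORE | events | I18N | style


print(sorted(common_attributes(set())))
# ['class', 'dir', 'id', 'title', 'xml:lang']
```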

4.2. Basic Modules

4.2.1. Structure Module

The Structure Module defines the major structural elements for XHTML. These elements effectively act as the basis for the content model of many XHTML family document types. The elements and attributes included in this module are:

Elements	Attributes	Minimal Content Model
body	Common	(Heading \| Block \| List)*
div	Common	(Heading \| Block \| List)*
head	I18n, profile	title
html	I18n, version, xmlns	head, body
span	Common	(PCDATA \| Inline)*
title	I18n	PCDATA

This module is the basic structural definition for XHTML content. The html element acts as the root element for all XHTML Family Document Types. The div element is added to the Block content set and the span element is added to the Inline content set as these are defined in the Basic Text Module below.

4.2.2. Basic Text Module

This module defines all of the basic text container elements, attributes, and their content model:

Element	Attributes	Minimal Content Model
abbr	Common	(PCDATA \| Inline)*
acronym	Common	(PCDATA \| Inline)*
address	Common	(PCDATA \| Inline)*
blockquote	Common, cite	(PCDATA \| Heading \| Block)*
br	Core	EMPTY
cite	Common	(PCDATA \| Inline)*
code	Common	(PCDATA \| Inline)*
dfn	Common	(PCDATA \| Inline)*
em	Common	(PCDATA \| Inline)*
h1	Common	(PCDATA \| Inline)*
h2	Common	(PCDATA \| Inline)*
h3	Common	(PCDATA \| Inline)*
h4	Common	(PCDATA \| Inline)*
h5	Common	(PCDATA \| Inline)*
h6	Common	(PCDATA \| Inline)*
kbd	Common	(PCDATA \| Inline)*
p	Common	(PCDATA \| Inline)*
pre	Common	(PCDATA \| Inline)*
q	Common	(PCDATA \| Inline)*
samp	Common	(PCDATA \| Inline)*
strong	Common	(PCDATA \| Inline)*
var	Common	(PCDATA \| Inline)*

4.2.3. Hypertext Module

The Hypertext Module provides the element that is used to define hypertext links to other resources. This module supports the following element and attributes:

Element	Attributes	Minimal Content Model
a	Common, charset, href, hreflang, rel, rev, type	(PCDATA \| Inline - a)*

This module adds the a element to the Inline content set of the Basic Text Module.
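
Statements of the form "this module adds element X to content set Y" can be read as set unions, and an expression such as Inline - a as a set difference. The Python sketch below models only contributions that are stated in the prose of this section for four of the modules; the elements that the Basic Text Module itself places in each set are not modeled, and the names CONTRIBUTIONS and content_sets are hypothetical.

```python
from collections import defaultdict

# module name -> {content set name -> elements the module adds to it}
CONTRIBUTIONS = {
    "Structure": {"Block": {"div"}, "Inline": {"span"}},
    "Hypertext": {"Inline": {"a"}},
    "Presentation": {
        "Block": {"hr"},
        "Inline": {"b", "big", "i", "small", "sub", "sup", "tt"},
    },
    "Image": {"Inline": {"img"}},
}


def content_sets(selected_modules):
    """Union the content set contributions of the selected modules."""
    sets = defaultdict(set)
    for module in selected_modules:
        for set_name, elements in CONTRIBUTIONS[module].items():
            sets[set_name] |= elements
    return dict(sets)


sets = content_sets(["Structure", "Hypertext", "Image"])
# "Inline - a": everything in Inline except the a element itself,
# which is how the a element avoids containing another a element.
print(sorted(sets["Inline"] - {"a"}))  # ['img', 'span']
```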

4.2.4. List Module

As its name suggests, the List Module provides list-oriented elements. Specifically, the List Module supports the following elements and attributes:

Elements	Attributes	Minimal Content Model
dl	Common	(dt \| dd)+
dt	Common	(PCDATA \| Inline)*
dd	Common	(PCDATA \| Inline)*
ol	Common	li+
ul	Common	li+
li	Common	(PCDATA \| Inline)*

This module also defines the content set List with the minimal content model (dl | ol | ul)+ and adds this set to the Flow content set of the Basic Text Module.

4.3. Applet Module

The Applet Module provides elements for referencing external applications. Specifically, the Applet Module supports the following elements and attributes:

Element	Attributes	Minimal Content Model
applet	Core, alt, archive, code, codebase, height, name, object, width	param?
param	id (ID), name (CDATA), type, value, valuetype	EMPTY

When the Applet Module is used, it adds the applet element to the Inline content set of the Basic Text Module.

4.4. Text Extension Modules

4.4.1. Presentation Module

This module defines elements, attributes, and a minimal content model for simple presentation-related markup:

Element	Attributes	Minimal Content Model
b	Common	(PCDATA \| Inline)*
big	Common	(PCDATA \| Inline)*
hr	Common	EMPTY
i	Common	(PCDATA \| Inline)*
small	Common	(PCDATA \| Inline)*
sub	Common	(PCDATA \| Inline)*
sup	Common	(PCDATA \| Inline)*
tt	Common	(PCDATA \| Inline)*

When this module is used, the hr element is added to the Block content set of the Basic Text Module. In addition, the b, big, i, small, sub, sup, and tt elements are added to the Inline content set of the Basic Text Module.

4.4.2. Edit Module

Element	Attributes	Minimal Content Model
del	Common	(PCDATA \| Inline)*
ins	Common	(PCDATA \| Inline)*

When this module is used, the del and ins elements are added to the Inline content set of the Basic Text Module.

4.4.3. BDO Module

The BDO module defines an element that can be used to declare the bi-directional rules for the element's content.

Elements	Attributes	Minimal Content Model
bdo	Common	(PCDATA \| Inline)*

When this module is used, the bdo element is added to the Inline content set of the Basic Text Module.

4.5. Forms Modules

4.5.1. Basic Forms Module

The Basic Forms Module provides the forms features found in HTML 3.2. Specifically, the Basic Forms Module supports the following elements, attributes, and minimal content model:

When this module is used, it adds the Form content set to the Block content set and it adds the Formctrl content set to the Inline content set as these are defined in the Basic Text Module.

Elements	Attributes	Minimal Content Model
form	Common, action, method, enctype	Heading \| Block - form
input	Common, checked, maxlength, name, size, src, type, value	EMPTY
select	Common, multiple, name, size	option+
option	Common, selected, value	Inline*
textarea	Common, columns, name, rows	PCDATA*

4.5.2. Forms Module

The Forms Module provides all of the forms features found in HTML 4.0. Specifically, the Forms Module supports:

When this module is used, it adds the Form content set to the Block content set and it adds the Formctrl content set to the Inline content set as these are defined in the Basic Text Module.

The Forms Module is a superset of the Basic Forms Module. These modules may not be used together in a single document type.

Elements	Attributes	Minimal Content Model
form	Common, accept, accept-charset, action, method, enctype	(Heading \| Block - form \| fieldset)+
input	Common, accept, accesskey, alt, checked, disabled, maxlength, name, readonly, size, src, tabindex, type, value	EMPTY
select	Common, disabled, multiple, name, size, tabindex	(optgroup \| option)+
option	Common, disabled, label, selected, value	PCDATA
textarea	Common, accesskey, columns, disabled, name, readonly, rows, tabindex	PCDATA
button	Common, accesskey, disabled, name, tabindex, type, value	(PCDATA \| Heading \| List \| Block - Form \| Inline - Formctrl)*
fieldset	Common	(PCDATA \| legend \| Flow)*
label	Common, accesskey, for	(PCDATA \| Inline - label)*
legend	Common, accesskey	(PCDATA \| Inline)+
optgroup	Common, disabled, label	option+
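
The statement that the Forms Module is a superset of the Basic Forms Module can be checked, at the level of elements and attributes, against the two tables. The Python sketch below transcribes the attribute lists from those tables (omitting the Common collection, which every element carries) and compares them. The function name is_superset is hypothetical, and content models are not compared.

```python
BASIC_FORMS = {
    "form": {"action", "method", "enctype"},
    "input": {"checked", "maxlength", "name", "size", "src", "type", "value"},
    "select": {"multiple", "name", "size"},
    "option": {"selected", "value"},
    "textarea": {"columns", "name", "rows"},
}

FORMS = {
    "form": {"accept", "accept-charset", "action", "method", "enctype"},
    "input": {"accept", "accesskey", "alt", "checked", "disabled",
              "maxlength", "name", "readonly", "size", "src", "tabindex",
              "type", "value"},
    "select": {"disabled", "multiple", "name", "size", "tabindex"},
    "option": {"disabled", "label", "selected", "value"},
    "textarea": {"accesskey", "columns", "disabled", "name", "readonly",
                 "rows", "tabindex"},
    "button": {"accesskey", "disabled", "name", "tabindex", "type", "value"},
    "fieldset": set(),
    "label": {"accesskey", "for"},
    "legend": {"accesskey"},
    "optgroup": {"disabled", "label"},
}


def is_superset(larger, smaller):
    """True if larger has every element and attribute found in smaller."""
    return all(
        element in larger and attributes <= larger[element]
        for element, attributes in smaller.items()
    )


print(is_superset(FORMS, BASIC_FORMS))  # True
print(is_superset(BASIC_FORMS, FORMS))  # False
```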

4.6. Table Modules

4.6.1. Basic Tables Module

The Basic Tables Module provides table-related elements, but only in a limited form. Specifically, the Basic Tables Module supports:

Elements	Attributes	Minimal Content Model
caption	Common	(PCDATA \| Inline)*
table	Common, border, cellpadding, cellspacing, summary, width	caption?, tr+
td	Common, abbr, align, axis, colspan, headers, rowspan, scope, valign	(PCDATA \| Flow)*
th	Common, abbr, align, axis, colspan, headers, rowspan, scope, valign	(PCDATA \| Flow)*
tr	Common, align, valign	(th \| td)+

When this module is used, it adds the table element to the Block content set as defined in the Basic Text Module.

4.6.2. Tables Module

As its name suggests, the Tables Module provides table-related elements that are more readily accessible to non-visual user agents. Specifically, the Tables Module supports the following elements, attributes, and content model:

Elements	Attributes	Minimal Content Model
caption	Common	(PCDATA \| Inline)*
table	Common, border, cellpadding, cellspacing, datapagesize, frame, rules, summary, width	caption?, ( col* \| colgroup* ), (( thead?, tfoot?, tbody+ ) \| ( tr+ ))
td	Common, abbr, align, axis, colspan, headers, rowspan, scope, valign	(PCDATA \| Inline)*
th	Common, abbr, align, axis, colspan, headers, rowspan, scope, valign	(PCDATA \| Inline)*
tr	Common, align, valign	(td \| th)+
col	Common, align, span, valign, width	EMPTY
colgroup	Common, align, span, valign, width	col*
tbody	Common, align, valign	tr+
thead	Common, align, valign	tr+
tfoot	Common, align, valign	tr+

When this module is used, it adds the table element to the Block content set of the Basic Text Module.

4.7. Image Module

The Image Module provides basic image embedding, and may be used in some implementations independently of client side image maps. The Image Module supports the following element and attributes:

Elements	Attributes	Minimal Content Model
img	Common, alt, height, longdesc, src, width	EMPTY

When this module is used, it adds the img element to the Inline content set of the Basic Text Module.

4.8. Client-side Image Map Module

The Client-side Image Map Module provides elements for client side image maps. It requires that the Image Module (or another module that supports the img element) be included. The Client-side Image Map Module supports the following elements:

Elements	Attributes	Minimal Content Model
a&	coords, shape	n/a
area	Common, accesskey, alt, coords, href, nohref, shape, tabindex	EMPTY
img&	usemap	n/a
map	Common, name	((Heading \| Block) \| area)+
object&	usemap	Note: Only when the object module is included

When this module is used, the map element is added to the Block content set of the Basic Text Module.

4.9. Server-side Image Map Module

The Server-side Image Map Module provides support for image-selection and transmission of selection coordinates. It requires that the Image Module (or another module that supports the img element) be included. The Server-side Image Map Module supports the following attributes:

Elements	Attributes	Minimal Content Model
img&	ismap	n/a

4.10. Object Module

The Object Module provides elements for general-purpose object inclusion. Specifically, the Object Module supports:

Elements	Attributes	Minimal Content Model
object	Common, archive, classid, codebase, codetype, data, declare, height, standby, tabindex, type, width	(PCDATA \| Flow \| param)*
param	id, type, value, valuetype	EMPTY

When this module is used, it adds the object element to the Inline content set of the Basic Text Module.

4.11. Frames Module

As its name suggests, the Frames Module provides frame-related elements. Specifically, the Frames Module supports:

Elements	Attributes	Minimal Content Model
frameset	Core, cols, rows	(frame \| noframes)+
frame	Core, frameborder, longdesc, marginheight, marginwidth, noresize, scrolling, src	EMPTY
noframes	Common	body
a&	target	n/a

4.12. Iframe Module

The Iframe Module defines an element for the definition of inline frames. The element and attributes included in this module are:

Elements	Attributes	Minimal Content Model
iframe	Core, frameborder, height, longdesc, marginheight, marginwidth, scrolling, src, width	Flow

When this module is used, the iframe element is added to the Block content set as defined by the Basic Text Module.

4.13. Intrinsic Events

Intrinsic events are attributes that associate specific actions with an element, to be carried out when the user triggers certain events. The attributes indicated in the following table are added to the attribute set for their respective elements ONLY when the modules defining those elements are selected. Note also that selection of this module defines the attribute collection Events as described above. Attributes defined by this module are:

Elements	Attributes	Notes
a&	onblur, onfocus
area&	onblur, onfocus	When the Client-side Image Map module is also used
form&	onreset, onsubmit	When the Basic Forms or Forms module is used
body&	onload, onunload
label&	onblur, onfocus	When the Forms module is used
input&	onblur, onchange, onfocus, onselect	When the Basic Forms or Forms module is used
select&	onblur, onchange, onfocus	When the Basic Forms or Forms module is used
textarea&	onblur, onchange, onfocus, onselect	When the Basic Forms or Forms module is used
button&	onblur, onfocus	When the Forms module is used
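
The conditional nature of these additions can be sketched in Python. The table rows are transcribed below together with the modules that must be selected for each row to apply; for a and body, whose Notes column is empty, the sketch assumes the defining modules are the Hypertext and Structure Modules respectively. The names INTRINSIC_EVENTS and event_attributes are hypothetical.

```python
# element -> (event attributes, modules of which at least one must be selected)
INTRINSIC_EVENTS = {
    "a": ({"onblur", "onfocus"}, {"Hypertext"}),
    "area": ({"onblur", "onfocus"}, {"Client-side Image Map"}),
    "form": ({"onreset", "onsubmit"}, {"Basic Forms", "Forms"}),
    "body": ({"onload", "onunload"}, {"Structure"}),
    "label": ({"onblur", "onfocus"}, {"Forms"}),
    "input": ({"onblur", "onchange", "onfocus", "onselect"},
              {"Basic Forms", "Forms"}),
    "select": ({"onblur", "onchange", "onfocus"}, {"Basic Forms", "Forms"}),
    "textarea": ({"onblur", "onchange", "onfocus", "onselect"},
                 {"Basic Forms", "Forms"}),
    "button": ({"onblur", "onfocus"}, {"Forms"}),
}


def event_attributes(selected_modules):
    """Return the per-element event attributes for a set of selected modules."""
    selected = set(selected_modules)
    if "Intrinsic Events" not in selected:
        return {}
    return {
        element: attributes
        for element, (attributes, required) in INTRINSIC_EVENTS.items()
        if required & selected
    }


profile = {"Intrinsic Events", "Structure", "Basic Forms"}
print(sorted(event_attributes(profile)))
# ['body', 'form', 'input', 'select', 'textarea']
```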

4.14. Metainformation Module

The Metainformation Module defines an element that describes information within the declarative portion of a document (in XHTML within the head element). This module includes the following element:

Elements	Attributes	Minimal Content Model
meta	I18n, content, http-equiv, name, scheme	EMPTY

4.15. Scripting Module

The Scripting Module defines elements that are used to contain information pertaining to executable scripts or the lack of support for executable scripts. Elements and attributes included in this module are:

Elements	Attributes	Minimal Content Model
noscript	Common	(Heading \| List \| Block)+
script	charset, defer, src, type	PCDATA

When this module is used, the script and noscript elements are added to the Block content set of the Basic Text Module.

4.16. Stylesheet Module

The Stylesheet Module enables style sheet processing. Note also that selection of this module defines the attribute collection Style as described above. The element and attributes defined by this module are:

Elements	Attributes	Minimal Content Model
style	I18n, media, title, type	PCDATA

When this module is used, it adds the style element to the Block content set of the Basic Text Module.

4.17. Link Module

The Link Module defines an element that can be used to define links to external resources. These resources are often used to augment the user agent's ability to process the associated XHTML document. The element and attributes included in this module are:

Elements	Attributes	Minimal Content Model
link	Common, charset, href, hreflang, media, rel, rev, type	EMPTY

When this module is used, it adds the link element to the content model of the head element as defined in the Structure Module.

4.18. Base Module

The Base Module defines an element that can be used to define a base URL against which relative URIs in the document will be resolved. The element and attribute included in this module are:

Elements	Attributes	Minimal Content Model
base	href	EMPTY

When this module is used, it adds the base element to the content model of the head element of the Structure Module.

A. References

A.1. Normative References

A.2. Informative References

B. Design Goals

B.1. Requirements

The design goals listed in the previous section lead to a large number of requirements for the modularization framework. These requirements, summarized in this section, can be further classified according to the major features of the framework to be described.

B.1.1. Granularity

Collectively the requirements in this section express the desire that the modules defined within the framework hit the right level of granularity:

B.1.2. Composability

The composability requirements listed here are intended to ensure that the modularization framework be able to express the right set of target modules required by the communities that will be served by the framework:

B.1.3. Ease of Use

The modularization framework will only receive widespread adoption if it describes mechanisms that make it easy for our target audience to use the framework:

B.1.4. Compatibility

The intent of this document is that the modularization framework described here should work well with the XML and other standards being developed by the W3C Working Groups:

B.1.5. Conformance

The effectiveness of the framework will also be measured by how easy it is to test the behavior of modules developed according to the framework, and to test the documents that employ those modules for validation:

W3C Working Draft 10 September 1999