W3C

Modularization of XHTML

W3C Working Draft 5 January 2000

This version:
http://www.w3.org/TR/2000/WD-xhtml-modularization-20000105
(Single HTML file, Postscript version, PDF version, ZIP archive, or Gzip'd TAR archive)
Latest version:
http://www.w3.org/TR/xhtml-modularization
Previous version:
http://www.w3.org/TR/1999/WD-xhtml-modularization-19990910/
Diff-marked version:
xhtml-modularization-diff-20000105.html
Editors:
Murray Altheim, Sun Microsystems
Frank Boumphrey, HTML Writers Guild
Sam Dooley, IBM
Shane McCarron, Applied Testing and Technology
Ted Wugofski, Gateway

Abstract

This working draft specifies an abstract modularization of XHTML and an implementation of the abstraction using XML Document Type Definitions (DTDs). This modularization provide a means for subsetting and extending XHTML, a feature needed for extending XHTML's reach onto emerging platforms.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is the "Last Call Working Draft" of "Modularization of XHTML". The Last Call review period ends on 2359Z on 1 Feburary 2000. Please send review comments before the review period ends to www-html-editor@w3.org.

The Working Group anticipates asking the W3C Director to advance this document to Proposed Recommendation after the Working Group processes Last Call review comments and incorporates resolutions into the Guidelines.

This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).

This is a W3C Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by, or the consensus of, either W3C or participants of the HTML WG Group.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

Quick Table of Contents

Full Table of Contents

1. Introduction

This section is normative.

1.1. What is XHTML?

XHTML is the reformulation of HTML 4.0 as an application of XML. XHTML 1.0 [XHTML1] specifies three XML document types that correspond to the three HTML 4.0 DTDs: Strict, Transitional, and Frameset. XHTML 1.0 is the basis for a family of document types that subset and extend HTML.

1.2. What is XHTML Modularization?

XHTML Modularization is a decomposition of XHTML 1.0, and by reference HTML 4.0, into a collection of abstract modules that provide specific types of functionality. These abstract modules are implemented in this specification using the XML Document Type Definition language, but an implementation using XML Schemas is expected. The mechanism for defining the abstract modules defined in this document, and for implementing them using XML DTDs, is defined in the document "Building XHTML Modules" [BUILDING].

These modules may be combined with each other and with other modules to create XHTML subset and extension document types that qualify as members of the XHTML family of document types.

1.3. Why Modularize XHTML?

The modularization of XHTML refers to the task of specifying well-defined sets of XHTML elements that can be combined and extended by document authors, document type architects, other XML standards specifications, and application and product designers to make it economically feasible for content developers to deliver content on a greater number and diversity of platforms.

Over the last couple of years, many specialized markets have begun looking to HTML as a content language. There is a great movement toward using HTML across increasingly diverse computing platforms. Currently there is activity to move HTML onto mobile devices (hand held computers, portable phones, etc.), television devices (digital televisions, TV-based web browsers, etc.), and appliances (fixed function devices). Each of these devices has different requirements and constraints.

Modularizing XHTML provides a means for product designers to specify which elements are supported by a device using standard building blocks and standard methods for specifying which building blocks are used. These modules serve as "points of conformance" for the content community. The content community can now target the installed base that supports a certain collection of modules, rather than worry about the installed base that supports this permutation of XHTML elements or that permutation of XHTML elements. The use of standards is critical for modularized XHTML to be successful on a large scale. It is not economically feasible for content developers to tailor content to each and every permutation of XHTML elements. By specifying a standard, either software processes can autonomously tailor content to a device, or the device can automatically load the software required to process a module.

Modularization also allows for the extension of XHTML's layout and presentation capabilities, using the extensibility of XML, without breaking the XHTML standard. This development path provides a stable, useful, and implementable framework for content developers and publishers to manage the rapid pace of technological change on the Web.

1.3.1. Abstract modules

An XHTML document type is defined as a set of abstract modules. A abstract module defines one kind of data that is semantically different from all others. Abstract modules can be combined into document types without a deep understanding of the underlying schema that define the modules.

1.3.2. DTD modules

A DTD module consists of a set of element types, a set of attribute list declarations, and a set of content model declarations, where any of these three sets may be empty. An attribute list declaration in a DTD module may modify an element type outside the element types in the module, and a content model declaration may modify an element type outside the element type set.

An XML DTD is a means of describing the structure of a class of XML documents, collectively known as an XML document type. XML document types are currently represented as DTDs, as described in the XML 1.0 Recommendation [XML]. Where possible, this document also allows for the potential use of other schema languages that are currently under consideration by the W3C XML Schema Working Group. (e.g. DCD, SOX, DDML, XSchema)

1.3.3. Hybrid document types

A hybrid document type is an XML DTD composed from a collection of XML DTDs or DTD Modules. The primary purpose of the modularization framework described in this document is to allow a DTD author to combine elements from multiple abstract modules into a hybrid document type, develop documents against that hybrid document type, and to validate that document against the associated hybrid document type definition.

One of the most valuable benefits of XML over SGML is that XML reduces the barrier to entry for standardization of element sets that allow communities to exchange data in an interoperable format. However, the relatively static nature of HTML as the content language for the Web has meant that any one of these communities have previously held out little hope that their XML document types would be able to see widespread adoption as part of Web standards. The modularization framework allows for the dynamic incorporation of these diverse document types within the XHTML family of document types, further reducing the barriers to the incorporation of these domain-specific vocabularies in XHTML documents.

1.3.4. Validation

The use of well-formed, but not valid, documents is an important benefit of XML. In the process of developing a document type, however, the additional leverage provided by a validating parser for error checking is important. The same statement applies to XHTML document types with elements from multiple abstract modules.

The general problem of fragment validation - validation of XML documents with different schemas from multiple XML Namespaces [XMLNAMES] in different portions of the document - is beyond the scope of this framework. An essential feature of this framework, however, is a collection of conventions for creating, from a set of abstract modules, hybrid DTDs.

2. Terms and Definitions

This section is informative.

While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.

abstract module
a unit of document type specification corresponding to a distinct type of content, corresponding to a markup construct reflecting this distinct type.
compound document
A compound document is a document that uses more than one XML Namespace. Compound documents may be defined as documents that contain elements or attributes from multiple document types.
content model
the declared markup structure allowed within instances of an element type. XML 1.0 differentiates two types: elements containing only element content (no character data) and mixed content (elements that may contain character data optionally interspersed with child elements). The latter are characterized by a content specification beginning with the "#PCDATA" string (denoting character data).
document model
the effective structure and constraints of a given document type. The document model constitutes the abstract representation of the physical or semantic structures of a class of documents.
document type
a class of documents sharing a common abstract structure. The ISO 8879 [SGML] definition is as follows: "a class of documents having similar characteristics; for example, journal, article, technical manual, or memo. (4.102)"
document type definition (DTD)
a formal, machine-readable expression of the XML structure and syntax rules to which a document instance of a specific document type must conform; the schema type used in XML 1.0 to validate conformance of a document instance to its declared document type. The same markup model may be expressed by a variety of DTDs.
driver
a generally short file used to declare and instantiate the modules of a DTD. A good rule of thumb is that a DTD driver contains no markup declarations that comprise any part of the document model itself.
element
an instance of an element type.
element type
the definition of an element, that is, a container for a distinct semantic class of document content.
entity
an entity is a logical or physical storage unit containing document content. Entities may be composed of parse-able XML markup or character data, or unparsed (ie., non-XML, possibly non-textual) content. Entity content may be either defined entirely within the document entity ("internal entities") or external to the document entity ("external entities"). In parsed entities, the replacement text may include references to other entities.
entity reference
a mnemonic or numeric string used as a reference to the content of a declared entity (eg., "&amp;" for "&", "&#60;" for "<", "&copy;" for "©".)
generic identifier
the name identifying the element type of an element. Also, element type name.
instantiate
to replace an entity reference with an instance of its declared content.
markup declaration
a syntactical construct within a DTD declaring an entity or defining a markup structure. Within XML DTDs, there are four specific types: entity declaration defines the binding between a mnemonic symbol and its replacement content. element declaration constrains which element types may occur as descendants within an element. See also content model. attribute definition list declaration defines the set of attributes for a given element type, and may also establish type constraints and default values. notation declaration defines the binding between a notation name and an external identifier referencing the format of an unparsed entity
markup model
the markup vocabulary (ie., the gamut of element and attribute names, notations, etc.) and grammar (ie., the prescribed use of that vocabulary) as defined by a document type definition (ie., a schema) The markup model is the concrete representation in markup syntax of the document model, and may be defined with varying levels of strict conformity. The same document model may be expressed by a variety of markup models.
module
an abstract unit within a document model expressed as a DTD fragment, used to consolidate markup declarations to increase the flexibility, modifiability, reuse and understanding of specific logical or semantic structures.
modularization
an implementation of a modularization model; the process of composing or de-composing a DTD by dividing its markup declarations into units or groups to support specific goals. Modules may or may not exist as separate file entities (ie., the physical and logical structures of a DTD may mirror each other, but there is no such requirement).
modularization model
the abstract design of the document type definition (DTD) in support of the modularization goals, such as reuse, extensibility, expressiveness, ease of documentation, code size, consistency and intuitiveness of use. It is important to note that a modularization model is only orthogonally related to the document model it describes, so that two very different modularization models may describe the same document type.
parameter
entity an entity whose scope of use is within the document prolog (ie., the external subset/DTD or internal subset). Parameter entities are disallowed within the document instance.
parent document type
A parent document type of a compound document is the document type of the root element.
tag
descriptive markup delimiting the start and end (including its generic identifier and any attributes) of an element.

3. Conformance Definition

This section is normative.

In order to ensure that XHTML-family documents are maximally portable among XHTML-family user agents, this specification rigidly defines conformance requirements for both of these and for XHTML-family document types. While the conformance definitions can be found in this section, they necessarily reference normative text within this document, within the base XHTML specification [XHTML1], and within other related specifications. It is only possible to fully comprehend the conformance requirements of XHTML through a complete reading of all normative references.

3.1. XHTML Family Document Type Conformance

It is possible to modify existing document types and define wholly new document types using both modules defined in this specification and other modules. Such a document type conforms to this specification when it meets the following criteria:

  1. The document type must be defined using one of the implementation methods defined by the W3C (currently this is limited to XML DTDs, but XML Schema will be available soon).
  2. The document type must have a unique identifier as defined in Naming Rules.
  3. The document type must include, at a minimum, the Structure, Hypertext, Basic Text, and List modules defined in this specification.
  4. For each of the W3C-defined modules that are included, all of the elements, attributes, and any required minimal content models must be included (and optionally extended) in the document type's content model.
  5. The document type may define additional elements and attributes. However, these must be in their own XML Namespace [XMLNAMES].

3.2. XHTML Family Document Conformance

Documents that rely upon XHTML-family document types are considered XHTML conforming if they validate against their referenced document type.

3.3. XHTML Family User Agent Conformance

A conforming user agent must meet all of the following criteria (as defined in [XHTML1]):

  1. In order to be consistent with the XML 1.0 Recommendation [XML], the user agent must parse and evaluate an XHTML document for well-formedness. If the user agent claims to be a validating user agent, it must also validate documents against their referenced DTDs according to [XML].
  2. When the user agent claims to support facilities defined within this specification or required by this specification through normative reference, it must do so in ways consistent with the facilities' definition.
  3. When a user agent processes an XHTML document as generic XML, it shall only recognize attributes of type ID (e.g. the id attribute on most XHTML elements) as fragment identifiers.
  4. If a user agent encounters an element it does not recognize, it must render the element's content.
  5. If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification (i.e., the attribute and its value).
  6. If a user agent encounters an attribute value it doesn't recognize, it must use the default attribute value.
  7. If it encounters an entity reference (other than one of the predefined entities) for which the User Agent has processed no declaration (which could happen if the declaration is in the external subset which the User Agent hasn't read), the entity reference should be rendered as the characters (starting with the ampersand and ending with the semi-colon) that make up the entity reference.
  8. When rendering content, User Agents that encounter characters or character entity references that are recognized but not renderable should display the document in such a way that it is obvious to the user that normal rendering has not taken place.
  9. The following characters are defined in [XML] as whitespace characters:

    The XML processor normalizes different system's line end codes into one single line-feed character, that is passed up to the application. The XHTML user agent in addition, must treat the following characters as whitespace:

    In elements where the 'xml:space' attribute is set to 'preserve', the user agent must leave all whitespace characters intact (with the exception of leading and trailing whitespace characters, which should be removed). Otherwise, whitespace is handled according to the following rules:

    Whitespace in attribute values is processed according to [XML].

3.4. Naming Rules

Names for XHTML-conforming document types must adhere to strict naming conventions so that it is possible for software and users to readily determine the relationship of document types to XHTML. The names for document types implemented as XML Document Type Definitions are defined through XML Formal Public Identifiers (FPIs). Within FPIs, fields are separated by double slash character sequences (//). The various fields MUST be composed as follows:

  1. The leading field identifies the resources relationship to a formal standard. For privately defined resources, this field MUST be "-". For formal standards, this field MUST be the formal reference to the standard (e.g. ISO/IEC 15445:1999).
  2. The second field MUST contain the name of the organization responsible for maintaining the named item. There is no formal registry for these organization names. Each organization SHOULD define a name that is unique. The name used by the W3C is, for example, W3C.
  3. The third field MUST take the form DTD XHTML- followed by an organization-defined unique identifier (e.g. MyML 1.0). This identifier SHOULD be composed of a unique name and a version identifier that can be updated as the document type evolves.
  4. The fourth field defines the language in which the item is developed (e.g. EN).

Using these rules, the name for an XHTML family conforming document type might be -//MyCompany//DTD XHTML-MyML 1.0//EN.

3.4.1. Rationale for Naming Rules

Naming Rules are critical for portability of user agents and XHTML-conforming tools. These rules need to be simple enough that they can be readily adhered to, and need to convey upon document type and module designers the power to readily associate their creations with XHTML (for marketing purposes, if nothing else). The above rules address these concerns. There were some other possibilities for naming conventions, and they were not used for the following reasons:

4. XHTML Abstract Modules

This section is normative.

This section specifies the contents of the XHTML abstract modules. These modules are abstract definitions of collections of elements, attributes, and their content models. These abstract modules can be mapped onto any appropriate specification mechanism. The XHTML 1.1 Specification, for example, maps these modules onto DTDs as described in [XML].

Content developers and device designers should view this section as a guide to the definition of the functionality provided by the various XHTML-defined modules. When developing documents or defining a profile for a class of documents, content developers can determine which of these modules are essential for conveying their message. When designing clients, device designers should develop their device profiles by choosing from among the abstract modules defined here.

4.1. Common Characteristics of Modules

Many of the abstract modules in this section describe elements, attributes on those elements, and minimal content models for those elements or element sets. This section identifies some shorthand expressions that are used throughout the abstract module definitions. These expressions should in no way be considered normative or mandatory. They are an editorial convenience for this document. When used in the remainder of this section, it is the expansion of the term that is normative, not the term itself.

4.1.1. Syntactic Conventions

The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions (as defined in Building XHTML Modules [BUILDING]). These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.

element name
When an element is included in a content model, its explicit name will be listed.
Content set
Some modules define lists of explicit element names called content sets. When a content set is included in a content model, its name will be listed.
expr ?
Zero or one instances of expr are permitted.
expr +
One or more instances or expr are required.
expr *
Zero or more instances of expr are permitted.
a , b
Expression a is required, followed by expression b.
a | b
Either expression a or expression b is required.
a - b
Expression a is permitted, omitting elements in expression b.
parentheses
When an expression is contained within parentheses, evaluation of any subexpressions within the parentheses take place before evaluation of expressions outside of the parentheses (starting at the deepest level of nesting first).
extending pre-defined elements
In some instances, a module adds attributes to an element. In these instances, the element name is followed by an ampersand (&).
Defining the type of attribute values
When a module defines the type of an attribute value, it does so by listing the type in parentheses after the attribute name.
Defining the legal values of attributes
When a module defines the legal values for an attribute, it does so by listing the explicit legal values (enclosed in quotation marks), separated by vertical bars |, inside of parentheses following the attribute name.

4.1.2. Content Types

The abstract module definitions in this document define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.

4.1.3. Attribute Types

In some instances, the types of attribute values or the explicit set of permitted values for attributes are defined. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the Abstract Modules:

Attribute Type Definition
CDATA Character data
ID A document-unique identifier
IDREF A reference to a document-unique identifier
IDREFS A space-separated list of references to document-unique identifiers
NAME A name with the same character constraints as ID above
NMTOKEN A name composed of CDATA characters but no whitespace
NMTOKENS Multiple names composed of CDATA characters separated by whitespace
PCDATA Processed character data

In addition to these pre-defined data types, XHTML Modularization defines the following data types and their semantics (as appropriate):

Data types Description
Character A single character from [ISO10646].
Charset A character encoding, as per [RFC2045].
Charsets A space separated list of character encodings, as per [RFC2045].
Color

The attribute value type "Color" refers to color definitions as specified in [SRGB]. A color value may either be a hexadecimal number (prefixed by a hash mark) or one of the following sixteen color names. The color names are case-insensitive.

Color names and sRGB values
Black = "#000000" Green = "#008000"
Silver = "#C0C0C0" Lime = "#00FF00"
Gray = "#808080" Olive = "#808000"
White = "#FFFFFF" Yellow = "#FFFF00"
Maroon = "#800000" Navy = "#000080"
Red = "#FF0000" Blue = "#0000FF"
Purple = "#800080" Teal = "#008080"
Fuchsia = "#FF00FF" Aqua = "#00FFFF"

Thus, the color values "#800080" and "Purple" both refer to the color purple.

ContentType A media type, as per [RFC2045].
ContentTypes A comma-separated list of media types, as per [RFC2045].
Datetime Date and time information.
LanguageCode A language code, as per [RFC1766].
Length The value may be either a Pixels or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
LinkTypes

Authors may use the following recognized link types, listed here with their conventional interpretations. A LinkTypes value refers to a space-separated list of link types. White space characters are not permitted within link types.

These link types are case-insensitive, i.e., "Alternate" has the same meaning as "alternate".

User agents, search engines, etc. may interpret these link types in a variety of ways. For example, user agents may provide access to linked documents through a navigation bar.

Alternate
Designates substitute versions for the document in which the link occurs. When used together with the xml:lang attribute, it implies a translated version of the document. When used together with the media attribute, it implies a version designed for a different medium (or media).
Stylesheet
Refers to an external style sheet. See the Style Module for details. This is used together with the link type "Alternate" for user-selectable alternate style sheets.
Start
Refers to the first document in a collection of documents. This link type tells search engines which document is considered by the author to be the starting point of the collection.
Next
Refers to the next document in a linear sequence of documents. User agents may choose to preload the "next" document, to reduce the perceived load time.
Prev
Refers to the previous document in an ordered series of documents. Some user agents also support the synonym "Previous".
Contents
Refers to a document serving as a table of contents. Some user agents also support the synonym ToC (from "Table of Contents").
Index
Refers to a document providing an index for the current document.
Glossary
Refers to a document providing a glossary of terms that pertain to the current document.
Copyright
Refers to a copyright statement for the current document.
Chapter
Refers to a document serving as a chapter in a collection of documents.
Section
Refers to a document serving as a section in a collection of documents.
Subsection
Refers to a document serving as a subsection in a collection of documents.
Appendix
Refers to a document serving as an appendix in a collection of documents.
Help
Refers to a document offering help (more information, links to other sources information, etc.)
Bookmark
Refers to a bookmark. A bookmark is a link to a key entry point within an extended document. The title attribute may be used, for example, to label the bookmark. Note that several bookmarks may be defined in each document.
MediaDesc

The MediaDesc attribute is a comma separated list of media descriptors. The following is a list of recognized media descriptors:

screen
Intended for non-paged computer screens.
tty
Intended for media using a fixed-pitch character grid, such as teletypes, terminals, or portable devices with limited display capabilities.
tv
Intended for television-type devices (low resolution, color, limited scrollability).
projection
Intended for projectors.
handheld
Intended for handheld devices (small screen, monochrome, bitmapped graphics, limited bandwidth).
print
Intended for paged, opaque material and for documents viewed on screen in print preview mode.
braille
Intended for braille tactile feedback devices.
aural
Intended for speech synthesizers.
all
Suitable for all devices.

Future versions of XHTML may introduce new values and may allow parameterized values. To facilitate the introduction of these extensions, conforming user agents must be able to parse the media attribute value as follows:

  1. The value is a comma-separated list of entries. For example,
    media="screen, 3d-glasses, print and resolution > 90dpi"
    

    is mapped to:

    "screen"
    "3d-glasses"
    "print and resolution > 90dpi"
    
  2. Each entry is truncated just before the first character that isn't a US ASCII letter [a-zA-Z] (ISO 10646 hex 41-5a, 61-7a), digit [0-9] (hex 30-39), or hyphen (hex 2d). In the example, this gives:
    "screen"
    "3d-glasses"
    "print"
    
  3. A case-sensitive match is then made with the set of media types defined above. User agents may ignore entries that don't match. In the example we are left with screen and print.

Note. Style sheets may include media-dependent variations within them (e.g., the CSS @media construct). In such cases it may be appropriate to use "media =all".

MultiLength The value may be a Length or a relative length. A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and the 3* will be alloted 30 pixels.
Number One or more digits
Pixels The value is an integer that represents the number of pixels of the canvas (screen, paper). Thus, the value "50" means fifty pixels. For normative information about the definition of a pixel, please consult [CSS1]
Script

Script data can be the content of the "script" element and the value of intrinsic event attributes. User agents must not evaluate script data as HTML markup but instead must pass it on as data to a script engine.

The case-sensitivity of script data depends on the scripting language.

Please note that script data that is element content may not contain character references, but script data that is the value of an attribute may contain them.

Text Arbitrary textual data, likely meant to be human-readable.
URI A Uniform Resource Identifier, as per [URI].
URI A space separated list of Uniform Resource Identifiers, as per [URI].

4.1.4. Attribute Collections

The following basic attribute sets are used on many elements. In each case where they are used, their use is identified via their name rather than enumerating the list.

Collection Name Attributes in Collection
Core class (NMTOKEN), id (ID), title (CDATA)
I18N dir ("rtl" | "ltr"), xml:lang (NMTOKEN)
Events onclick (Script), ondblclick (Script), onmousedown (Script), onmouseup (Script), onmouseover (Script), onmousemove (Script), onmouseout (Script), onkeypress (Script), onkeydown (Script), onkeyup (Script)
Style style (CDATA)
Common Core + Events + I18N + Style

Note that the Events collection is only defined when the Intrinsic Events abstract module is selected. Otherwise, the Events collection is empty.

Also note that the Style collection is only defined when the Stylesheet Module is selected. Otherwise, the Style collection is empty.

4.2. Basic Modules

The basic modules are modules that are required to be present in any XHTML Family Conforming Document Type.

4.2.1. Structure Module

The Structure Module defines the major structural elements for XHTML. These elements effectively act as the basis for the content model of many XHTML family document types. The elements and attributes included in this module are:

Elements Attributes Minimal Content Model
body Common (Heading | Block | List)*
head I18N, profile (URI) title
html I18N, version (CDATA), xmlns (URI) head, body
title I18N PCDATA

This module is the basic structural definition for XHTML content. The html element acts as the root element for all XHTML Family Document Types.

4.2.2. Basic Text Module

This module defines all of the basic text container elements, attributes, and their content model:

Element Attributes Minimal Content Model
abbr Common (PCDATA | Inline)*
acronym Common (PCDATA | Inline)*
address Common (PCDATA | Inline)*
blockquote Common, cite (URI) (PCDATA | Heading | Block)*
br Core EMPTY
cite Common (PCDATA | Inline)*
code Common (PCDATA | Inline)*
dfn Common (PCDATA | Inline)*
div Common (Heading | Block | List)*
em Common (PCDATA | Inline)*
h1 Common (PCDATA | Inline)*
h2 Common (PCDATA | Inline)*
h3 Common (PCDATA | Inline)*
h4 Common (PCDATA | Inline)*
h5 Common (PCDATA | Inline)*
h6 Common (PCDATA | Inline)*
kbd Common (PCDATA | Inline)*
p Common (PCDATA | Inline)*
pre Common (PCDATA | Inline)*
q Common, cite (URI) (PCDATA | Inline)*
samp Common (PCDATA | Inline)*
span Common (PCDATA | Inline)*
strong Common (PCDATA | Inline)*
var Common (PCDATA | Inline)*

The minimal content model for this module defines some content sets:

Heading
h1 | h2 | h3 | h4 | h5 | h6
Block
address | blockquote | div | p | pre
Inline
abbr | acronym | br | cite | code | dfn | em | kbd | q | samp | span | strong | var
Flow
Heading | Block | Inline

4.2.3. Hypertext Module

The Hypertext Module provides the element that is used to define hypertext links to other resources. This module supports the following element and attributes:

Element Attributes Minimal Content Model
a Common, accesskey (Character), charset (Charset), href (URI), hreflang (LanguageCode), rel (LinkTypes), rev (LinkTypes), tabindex (Number), type (ContentType) (PCDATA | Inline - a)*

This module adds the a element to the Inline content set of the Basic Text Module.

4.2.4. List Module

As its name suggests, the List Module provides list-oriented elements. Specifically, the List Module supports the following elements and attributes:

Elements Attributes Minimal Content Model
dl Common (dt | dd)+
dt Common (PCDATA | Inline)*
dd Common (PCDATA | Inline)*
ol Common li+
ul Common li+
li Common (PCDATA | Inline)*

This module also defines the content set List with the minimal content model (dl | ol | ul)+ and adds this set to the Flow content set of the Basic Text Module.

4.3. Applet Module

The Applet Module provides elements for referencing external applications. Specifically, the Applet Module supports the following elements and attributes:

Element Attributes Minimal Content Model
applet Core, alt (Text), archive (CDATA), code (CDATA), codebase (URI), height (Length), name (CDATA), object (CDATA), width (Length) param?
param id (ID), name (CDATA), type (ContentType), value (CDATA), valuetype ("data" | "ref" | "object") EMPTY

When the Applet Module is used, it adds the applet element to the Inline content set of the Basic Text Module.

4.4. Text Extension Modules

This section defines a variety of additional textual markup modules.

4.4.1. Presentation Module

This module defines elements, attributes, and a minimal content model for simple presentation-related markup:

Element Attributes Minimal Content Model
b Common (PCDATA | Inline)*
big Common (PCDATA | Inline)*
hr Common EMPTY
i Common (PCDATA | Inline)*
small Common (PCDATA | Inline)*
sub Common (PCDATA | Inline)*
sup Common (PCDATA | Inline)*
tt Common (PCDATA | Inline)*

When this module is used, the hr element is added to the Block content set of the Basic Text Module. In additional, the b, big, i, small, sub, sup, and tt elements are added to the Inline content set of the Basic Text Module.

4.4.2. Edit Module

This module defines elements and attributes for use in editing-related markup:

Element Attributes Minimal Content Model
del Common, cite (URI), datetime (Datetime) (PCDATA | Inline)*
ins Common, cite (URI), datetime (Datetime) (PCDATA | Inline)*

When this module is used, the del and ins elements are added to the Inline content set of the Basic Text Module.

4.4.3. BDO Module

The BDO module defines an element that can be used to declare the bi-directional rules for the element's content.

Elements Attributes Minimal Content Model
bdo Common (PCDATA | Inline)*

When this module is used, the bdo element are added to the Inline content set of the Basic Text Module.

4.5. Forms Modules

4.5.1. Basic Forms Module

The Basic Forms Module provides the forms features found in HTML 3.2. Specifically, the Basic Forms Module supports the following elements, attributes, and minimal content model:

Elements Attributes Minimal Content Model
form Common, action (URI), method ("get" | "put"), enctype (ContentType) Heading | Block - form
input Common, checked ("checked"), maxlength (Number), name (CDATA), size (Number), src (URI), type ("text", "password", "checkbox", "radio", "submit", "reset", "file", "hidden", "image"), value (CDATA) EMPTY
select Common, multiple ("multiple"), name (CDATA), size (Number) option+
option Common, selected ("selected"), value (CDATA) Inline*
textarea Common, columns (Number), name (CDATA), rows (Number) PCDATA*

This module defines two content sets:

Form
form
Formctrl
input | select | textarea

When this module is used, it adds the Form content set to the Block content set and it adds the Formctrl content set to the Inline content set as these are defined in the Basic Text Module.

4.5.2. Forms Module

The Forms Module provides all of the forms features found in HTML 4.0. Specifically, the Forms Module supports:

Elements Attributes Minimal Content Model
form Common, accept (ContentTypes), accept-charset (Charsets), action (URI), method ("get" | "put"), enctype (ContentType) (Heading | Block - form | fieldset)+
input Common, accept (ContentTypes), accesskey (Character), alt (CDATA), checked ("checked"), disabled ("disabled"), maxlength (Number), name (CDATA), readonly ("readonly"), size (Number), src (URI), tabindex (Number), type ("text", "password", "checkbox", "radio", "submit", "reset", "file", "hidden", "image"), value (CDATA) EMPTY
select Common, disabled ("disabled"), multiple ("multiple"), name (CDATA), size (Number), tabindex (Number) (optgroup | option)+
option Common, disabled ("disabled"), label (Text), selected ("selected"), value (CDATA) PCDATA
textarea Common, accesskey (Character), columns (Number), disabled ("disabled"), name (CDATA), readonly ("readonly"), rows (Number), tabindex (Number) PCDATA
button Common, accesskey (Character), disabled ("disabled"), name (CDATA), tabindex (Number), type ("button" | "submit" | "reset"), value (CDATA) (PCDATA | Heading | List | Block - Form | Inline - Formctrl)*
fieldset Common (PCDATA | legend | Flow)*
label Common, accesskey (Character), for (IDREF) (PCDATA | Inline - label)*
legend Common, accesskey (Character) (PCDATA | Inline)+
optgroup Common, disabled ("disabled"), label (Text) option+

This module defines two content sets:

Form
form | fieldset
Formctrl
input | select | textarea | label | button

When this module is used, it adds the Form content set to the Block content set and it adds the Formctrl content set to the Inline content set as these are defined in the Basic Text Module.

The Forms Module is a superset of the Basic Forms Module. These modules may not be used together in a single document type.

4.6. Table Modules

4.6.1. Basic Tables Module

The Basic Tables Module provides table-related elements, but only in a limited form. Specifically, the Basic Tables Module supports:

Elements Attributes Minimal Content Model
caption Common (PCDATA | Inline)*
table Common, border (Pixels), cellpadding (Length). cellspacing(Length), summary (Text), width (Length) ( caption?, tr+
td Common, abbr (Text), align ("left" | "center" | "right"), axis (CDATA), colspan (Number), headers (IDREFS), rowspan (Number), scope ("row" | "col" | "rowgroup" | "colgroup"), valign ("top" | "middle" | "bottom") (PCDATA | Flow)*
th Common, abbr (Text), align ("left" | "center" | "right"), axis (CDATA), colspan (Number), headers (IDREFS), rowspan (Number), scope ("row" | "col" | "rowgroup" | "colgroup"), valign ("top" | "middle" | "bottom") (PCDATA | Flow)*
tr Common, align ("left" | "center" | "right"), valign ("top" | "middle" | "bottom") (th | td)+

When this module is used, it adds the table element to the Block content set as defined in the Basic Text Module.

4.6.2. Tables Module

As its name suggests, the Tables Module provides table-related elements that are better able to be accessed by non-visual user agents. Specifically, the Tables Module supports the following elements, attributes, and content model:

Elements Attributes Minimal Content Model
caption Common (PCDATA | Inline)*
table Common, border (Pixels), cellpadding (Length), cellspacing (Length), datapagesize (CDATA), frame ("void" | "above" | below" | "hsides" | "lhs" | "rhs" | "vsides" | "box" | "border"), rules ("none" | "groups" | "rows" | "cols" | "all"), summary (Text), width (Length) caption?, ( col* | colgroup* ), (( thead?, tfoot?, tbody+ ) | ( tr+ ))
td Common, abbr (Text), align ("left" | "center" | "right" | "justify" | "char"), axis (CDATA), char (Character), charoff (Length), colspan (Number), headers (IDREFS), rowspan (Number), scope ("row", "col", "rowgroup", "colgroup"), valign ("top" | "middle" | "bottom" | "baseline") (PCDATA | Inline)*
th Common, abbr (Text), align ("left" | "center" | "right" | "justify" | "char"), axis (CDATA), char (Character), charoff (Length), colspan (Number), headers (IDREFS), rowspan (Number), scope ("row", "col", "rowgroup", "colgroup"), valign ("top" | "middle" | "bottom" | "baseline") (PCDATA | Inline)*
tr Common, align ("left" | "center" | "right" | "justify", "char"), char (Character), charoff (Length), valign ("top" | "middle" | "bottom" | "baseline") (td | th)+
col Common, align ("left" | "center" | "right" | "justify", "char"), char (Character), charoff (Length), span (Number), valign ("top" | "middle" | "bottom" | "baseline"), width (MultiLength) EMPTY
colgroup Common, align ("left" | "center" | "right" | "justify", "char"), char (Character</