Please refer to the errata for this document, which may include some normative corrections.
This specification defines the Third Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines [XHTMLMIME].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
On 19 May 2009 the W3C Director rescinded this document due to process issues. The community is advised not to use this document.
Please report errors in this document to firstname.lastname@example.org (archive). Public discussion on HTML features takes place on the mailing list email@example.com (archive).
This document has been produced by the W3C XHTML 2 Working Group as part of the HTML Activity. The goals of the XHTML 2 Working Group are discussed in the XHTML 2 Working Group charter.
This document is governed by the 24 January 2002 CPP as amended by the W3C Patent Policy Transition Procedure. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
This section is informative.
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4]. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in more detail in [XHTMLMOD].
XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.
HTML 4 [HTML4] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.
SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.
SGML has been around since the middle 1980's and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.
HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.
In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to interoperability problems for documents across different platforms.
XML™ is the shorthand name for Extensible Markup Language [XML].
XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features.
While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.
The benefits of migrating to XHTML 1.0 are described above. Some of the benefits of migrating to XHTML in general are:
This section is normative.
The following terms are used in this specification. These terms extend the definitions in [RFC2119] in ways based upon similar definitions in ISO/IEC 9945-1:1990 [POSIX.1]:
This section is normative.
This version of XHTML provides a definition of strictly conforming XHTML 1.0 documents, which are restricted to elements and attributes from the XML and XHTML 1.0 namespaces. See Section 3.1.2 for information on using XHTML with other namespaces, for instance, to include metadata expressed in RDF within XHTML documents.
A Strictly Conforming XHTML Document is an XML document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:
It must conform to the constraints expressed in one of the three DTDs found in DTDs and in Appendix B.
The root element of the document must be
The root element of the document must contain an
xmlns declaration for the XHTML namespace [XMLNS]. The namespace for XHTML is
defined to be
http://www.w3.org/1999/xhtml. An example root element might look like:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in DTDs using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
The DTD subset must not be used to override any parameter entities in the DTD.
An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol. Here is an example of an XHTML document. In this example, the XML declaration is included.
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Virtual Library</title> </head> <body> <p>Moved to <a href="http://example.org/">example.org</a>.</p> </body> </html>
The XHTML namespace may be used with other XML namespaces as per [XMLNS], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Work by W3C is addressing ways to specify conformance for documents involving multiple namespaces. For an example, see [XHTML+MathML].
The following example shows the way in which XHTML 1.0 could be used in conjunction with the MathML Recommendation:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>A Math Example</title> </head> <body> <p>The following is MathML markup:</p> <math xmlns="http://www.w3.org/1998/Math/MathML"> <apply> <log/> <logbase> <cn> 3 </cn> </logbase> <ci> x </ci> </apply> </math> </body> </html>
The following example shows the way in which XHTML 1.0 markup could be incorporated into another XML namespace:
<?xml version="1.0" encoding="UTF-8"?> <!-- initially, the default namespace is "books" --> <book xmlns='urn:loc.gov:books' xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en"> <title>Cheaper by the Dozen</title> <isbn:number>1568491379</isbn:number> <notes> <!-- make HTML the default namespace for a hypertext commentary --> <p xmlns='http://www.w3.org/1999/xhtml'> This is also available <a href="http://www.w3.org/">online</a>. </p> </notes> </book>
A conforming user agent must meet all of the following criteria:
idattribute on most XHTML elements) as fragment identifiers.
White space is handled according to the following rules. The following characters are defined in [XML] white space characters:
The XML processor normalizes different systems' line end codes into one single LINE FEED character, that is passed up to the application.
The user agent must use the definition from CSS for processing whitespace characters [CSS2]. Note that the CSS2 recommendation does not explicitly address the issue of whitespace handling in non-Latin character sets. This will be addressed in a future version of CSS, at which time this reference will be updated.
Note that in order to produce a Canonical XHTML document, the rules above must be applied and the rules in [XMLC14N] must also be applied to the document.
This section is informative.
Due to the fact that XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 [HTML4] must be changed.
Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest properly.
Although overlapping is illegal in SGML, it is widely tolerated in existing browsers.
CORRECT: nested elements.
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive e.g. <li> and <LI> are different tags.
In SGML-based HTML 4 certain elements were permitted to omit the end tag; with the elements that followed implying closure. XML does not allow end tags to be omitted. All elements other than those
declared in the DTD as
EMPTY must have an end tag. Elements that are declared in the DTD as
EMPTY can have an end tag or can use empty element shorthand (see Empty Elements).
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
INCORRECT: unquoted attribute values
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as
checked cannot occur in elements without
their value being specified.
CORRECT: unminimized attributes
INCORRECT: minimized attributes
Empty elements must either have an end tag or the start tag must end with
/>. For instance,
<hr></hr>. See HTML Compatibility Guidelines for information on ways to ensure this is backward compatible with HTML 4 user agents.
CORRECT: terminated empty elements
INCORRECT: unterminated empty elements
When user agents process attributes, they do so according to Section 3.3.3 of [XML]:
In XHTML, the script and style elements are declared as having
#PCDATA content. As a result,
& will be treated as the start of markup, and
entities such as
& will be recognized as entity references by the XML processor to
& respectively. Wrapping the
content of the script or style element within a
CDATA marked section avoids the expansion of these entities.
CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model, see Section 1.3 of the DOM Level 1 Recommendation [DOM].
An alternative is to use external script and style documents.
SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.
For example, the HTML 4 Strict DTD forbids the nesting of an '
a' element within another '
a' element to any descendant depth. It is not possible to spell out such
prohibitions in XML. Even though these prohibitions cannot be defined in the DTD, certain elements should not be nested. A summary of such elements and the elements that should not be nested in them
is found in the normative Element Prohibitions.
HTML 4 defined the
name attribute for the elements
map. HTML 4 also introduced the
id attribute. Both of these attributes are designed to be used as fragment identifiers.
In XML, fragment identifiers are of type
ID, and there can only be a single attribute of type
ID per element. Therefore, in XHTML 1.0 the
id attribute is
defined to be of type
ID. In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the
id attribute when defining fragment
identifiers on the elements listed above. See the HTML Compatibility Guidelines for information on ensuring such anchors are backward compatible when serving
XHTML documents as media type
Note that in XHTML 1.0, the
name attribute of these elements is formally deprecated, and will be removed in a subsequent version of XHTML.
HTML 4 and XHTML both have some attributes that have pre-defined and limited sets of values (e.g. the
type attribute of the
input element). In SGML and XML, these are
called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of
TEXT was equivalent to a value of
Under XML, the interpretation of these values is case-sensitive, and in XHTML 1 all of these values are defined in lower-case.
SGML and XML both permit references to characters by using hexadecimal values. In SGML these references could be made using either &#Xnn; or &#xnn;. In XML documents, you must use the lower-case version (i.e. &#xnn;)
This section is informative.
Although there is no requirement for XHTML 1.0 documents to be compatible with existing user agents, in practice this is easy to accomplish. Guidelines for creating compatible documents can be found in [XHTMLMIME].
XHTML Documents which follow the guidelines set forth in [XHTMLMIME] may be labeled with the Internet Media Type "text/html" [RFC2854], as they are compatible with most HTML browsers. Those documents, and any other document conforming to this specification, may also be labeled with the Internet Media Type "application/xhtml+xml" as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME].
This appendix is normative.
These DTDs and entity sets form a normative part of this specification. The complete set of DTD files together with an XML declaration and SGML Open Catalog is included in the zip file and the gzip'd tar file for this specification. Users looking for local copies of the DTDs to work with should download and use those archives rather than using the specific DTDs referenced below.
These DTDs approximate the HTML 4 DTDs. The W3C recommends that you use the authoritative versions of these DTDs at their defined SYSTEM identifiers when validating content. If you need to use these DTDs locally you should download one of the archives of this version. For completeness, the normative versions of the DTDs are included here:
The file DTD/xhtml1-strict.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml1-transitional.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml1-frameset.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The XHTML entity sets are the same as for HTML 4, but have been modified to be valid XML 1.0 entity declarations. Note the entity for the Euro currency sign (
€) is defined as part of the special characters.
The file DTD/xhtml-lat1.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml-special.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml-symbol.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
This appendix is normative.
The following elements have prohibitions on which elements they can contain (see SGML Exclusions). This prohibition applies to all depths of nesting, i.e. it contains all the descendant elements.
This appendix is informative.
The contents of this appendix have been moved into a separate Note - see [XHTMLMIME].
This appendix is informative.
This specification was written with the participation of the members of the W3C XHTML 2 Working Group (formerly the HTML Working Group).
At publication of the third edition, the membership was:
At publication of the second edition, the membership was:
At publication of the first edition, the membership was:
This appendix is informative.