Many people want to use XHTML to author their web pages, but are confused about the best ways to deliver those pages in such a way that they will be processed correctly by various user agents. This Note contains suggestions about how to format XHTML to ensure it is maximally portable, and how to deliver XHTML to various user agents - even those that do not yet support XHTML natively. This document is intended to be used by document authors who want to use XHTML today, but want to be confident that their XHTML content is going to work in the greatest number of environments.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Note made available by the World Wide Web Consortium (W3C) for your information. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members.
This document has been produced by the W3C XHTML 2 Working Group as part of the HTML Activity. The goals of the XHTML 2 Working Group are discussed in the XHTML 2 Working Group charter. The document represents working group consensus on the usage of Internet media types for various XHTML Family documents. However, this document is not intended to be a normative specification. Instead, it documents a set of recommendations to maximize the interoperability of XHTML documents with regard to Internet media types. This document does not address general issues on media types and namespaces.
Comments on this document may be sent to firstname.lastname@example.org (archive). Public discussion on this document may take place on the mailing list email@example.com (archive).
XHTML 1.0 [XHTML1] reformulated HTML 4 [HTML4] as an XML application, and Modularization of XHTML [XHTMLM12N] provided a means to define XHTML-based markup languages using XHTML modules, collectively called the "XHTML Family". However, for historical reasons, the recommended way to serve such XHTML Family documents, in particular with regard to Internet media types, was somewhat unclear.
After the publication of [XHTML1], the RFC for XML media types was revised and published as RFC 3023 [RFC3023], which introduced the '+xml' suffix convention for XML-based media types. The 'application/xhtml+xml' media type [RFC3236] was registered following that convention. There are now at least four possible media types for labeling XHTML Family documents: 'text/html', 'application/xhtml+xml', and the generic XML media types 'application/xml' and 'text/xml'.
This document summarizes the current best practice for using those various Internet media types for XHTML Family documents.
In general, 'application/xhtml+xml' should be used for XHTML Family documents, and the use of 'text/html' should be limited to HTML-compatible XHTML Family documents intended for delivery to user agents that do not explicitly state in their HTTP Accept header that they accept 'application/xhtml+xml'. The media types 'application/xml' and 'text/xml' may also be used, but whenever appropriate, 'application/xhtml+xml' or 'text/html' should be used rather than those generic XML media types.
Note that, because of the lack of explicit support for XHTML (and XML in general) in some user agents, only very careful construction of documents can ensure their portability (see Appendix A). If you do not require the advanced features of XHTML Family markup languages (e.g., XML DOM, XML Validation, extensibility via XHTML Modularization, semantic markup via XHTML+RDFa, Assistive Technology access via the XHTML Role and XHTML Access modules, etc.), you may want to consider using HTML 4.01 [HTML] in order to reduce the risk that content will not be portable to HTML user agents. Even in that case authors can help ensure their portability AND ease their eventual migration to the XHTML Family by ensuring their documents are valid [VALIDATOR] and by following the relevant guidelines in Appendix A.
Note: While this document sometimes uses terms like "must" and "should", this document is not normative and those terms do not have the same meaning as when they are used in a normative W3C specification.
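Elements and attributes in XHTML Family document types belong to the XHTML namespace (except those from the XML namespace, such as xml:lang), but an XHTML Family document type may also include elements and attributes from other namespaces, such as MathML [MathML2].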
This section summarizes which Internet media type should be used for XHTML Family documents. Note that while some suggestions are made in this section with regard to content delivery, this section is by no means a comprehensive discussion of content negotiation techniques.
That being said, a combination of these rules, in conjunction with a careful examination of the HTTP Accept header, can be useful in determining which media type to use when a document adheres to the guidelines in Appendix A. Specifically:
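If the Accept header explicitly contains application/xhtml+xml (with either no "q" parameter or a positive "q" value), deliver the document using that media type.
If the Accept header explicitly contains text/html (with either no "q" parameter or a positive "q" value), deliver the document using that media type.
If the Accept header contains "*/*" (a convention some user agents use to indicate that they accept anything), deliver the document using text/html.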
In other words, requestors that advertise they support XHTML family documents will receive the document in the XHTML media type, and all other requestors that (at least claim to) support HTML or "everything" will receive the document using the HTML media type. Dealing with user agents that satisfy none of these criteria is outside the scope of this document.
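The selection rules above can be sketched in script form. This is a minimal illustration, not a complete content negotiation implementation: the function names parseAccept and chooseMediaType are hypothetical, and treating "*/*" as the "everything" case is an assumption based on the wording of the preceding paragraph. It also assumes the document adheres to the guidelines in Appendix A.

```javascript
// Hypothetical helper: split an Accept header into { type, q } entries.
function parseAccept(header) {
  return String(header || "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [type, ...params] = part.split(";").map((s) => s.trim());
      let q = 1; // no "q" parameter counts as a positive preference
      for (const param of params) {
        const [name, value] = param.split("=").map((s) => s.trim());
        if (name.toLowerCase() === "q") {
          const parsed = parseFloat(value);
          q = Number.isNaN(parsed) ? 0 : parsed; // unreadable q: treat as not accepted
        }
      }
      return { type: type.toLowerCase(), q };
    });
}

// Hypothetical helper: choose the media type for a document that follows Appendix A.
function chooseMediaType(acceptHeader) {
  const entries = parseAccept(acceptHeader);
  const accepts = (type) => entries.some((e) => e.type === type && e.q > 0);
  if (accepts("application/xhtml+xml")) return "application/xhtml+xml";
  if (accepts("text/html")) return "text/html";
  if (accepts("*/*")) return "text/html"; // assumption: "*/*" is the "everything" case
  return null; // none of the criteria matched: outside the scope of these rules
}
```

A server-side script might call chooseMediaType with the value of the request's Accept header and use the result in the Content-Type response header. A null result means the request falls outside these rules and needs separate handling.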
When an XHTML document does NOT adhere to the guidelines, it should only be delivered as media type application/xhtml+xml.
The 'application/xhtml+xml' media type [RFC3236] is the primary media type for XHTML Family documents. 'application/xhtml+xml' should be used for serving XHTML documents to XHTML user agents (agents that explicitly indicate they support this media type). This media type must be used when writing documents using XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].
The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML conforms to the guidelines in Appendix A. In particular, 'text/html' is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].
XHTML documents served as 'text/html' will not be processed as XML [XML10], e.g., well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see guidelines 11 and 13).
Authors should also be careful about character encoding issues. See guideline 1 and guideline 9 for details.
This appendix summarizes design guidelines for authors who wish their XHTML documents to render on both XHTML-aware and modern HTML user agents. The purpose of providing these guidelines is to supply a simple collection that, if followed, will give reasonable, predictable results in modern user agents. Document authors should treat these as best practices that were considered correct at the time this document was published. Like all of this document, this Appendix is informative. It contains no absolute requirements, and should NEVER be used as the basis for creating conformance or validation rules of any sort. Period.
For an example document that reflects the use of the guidelines from this section, see Appendix B.
DO NOT include XML processing instructions NOR the XML declaration.
Rationale: Some HTML user agents render XML processing instructions. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML. Such user agents may not render the document as expected. For compatibility with these types of HTML browsers, you should avoid using processing instructions and XML declarations.
Consequence: Remember, however, that when the XML declaration is not included in a document, AND the character encoding is not specified by a higher level protocol such as HTTP, the document can only use the default character encodings UTF-8 or UTF-16. See, however, guideline 9 below.
If an element has an EMPTY content model, DO use the minimized tag syntax permitted by XML (e.g., <br />). DO NOT use the alternative syntax (e.g., <br></br>) allowed by XML, since this may be unsupported by HTML user agents. Also, DO include a space before the trailing />.
Empty elements in the XHTML family include, for example, br, hr, img, input, link, and meta.
Rationale: HTML user agents ignore the / at the end of a tag, but without the space before it they may incorrectly parse the tag or its attributes. HTML user agents also may not recognize the alternate <br></br> syntax permitted by XML.
If an element permits content (e.g., the p element) but an instance of that element has no content (e.g., an empty paragraph), DO NOT use the "minimized" tag syntax (e.g., <p />).
Rationale: HTML user agents may give uncertain results when the minimized syntax permitted by XML is used on an element that has no content.
DO use external style sheets if your style sheet uses <, &, ]]>, or --. DO NOT use an internal stylesheet if the style rules contain any of the above characters.
DO use external scripts if your script uses <, &, ]]>, or --. DO NOT embed a script in a document if it contains any of these characters.
Rationale: XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible may not work as expected in XML-based user agents. While XML provides the CDATA method to embed data such as this, that method will not work correctly should the document be delivered as text/html.
Note that if you really need to embed scripts or stylesheets, the following patterns can be used:
Portably escaping embedded script contents:
<script>//<![CDATA[ ... //]]></script>
Portably escaping embedded style contents:
<style>/*<![CDATA[*/ ... /*]]>*/</style>
DO ensure that attribute values are on a single line and only use single whitespace characters. DO NOT use line breaks and multiple consecutive white space characters within attribute values.
Rationale: These are handled inconsistently by user agents - user agents are permitted to collapse multiple whitespace characters to a single white space character.
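For attribute values that are generated by a script or build tool, the guideline above can be applied mechanically. The following is a minimal sketch; the function name collapseWhitespace is hypothetical, and it only collapses runs of whitespace (including line breaks) to a single space, without otherwise changing the value.

```javascript
// Hypothetical helper: make an attribute value single-line with single spaces.
function collapseWhitespace(value) {
  // \s matches spaces, tabs, and line breaks; each run becomes one space.
  return String(value).replace(/\s+/g, " ");
}
```

Applying the function before the value is written into the markup means that a user agent that collapses whitespace and one that does not will see the same value.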
DO use both the lang and xml:lang attributes when specifying the language of an element in markup languages that support the use of both.
Rationale: HTML 4 documents use the lang attribute to identify the language of an element. XML documents use the xml:lang attribute. CSS has a "lang" pseudo selector that automatically uses the appropriate attribute depending on the media type. Therefore, specifying both attributes ensures that single CSS selectors will work in both modes.
DO use the id attribute to identify elements. DO ensure that the values used for the id attribute are limited to the pattern [A-Za-z][A-Za-z0-9:_.-]*. DO NOT use the name attribute to identify elements, even in languages that permit the use of name such as XHTML 1.0.
Rationale: In HTML 3.2 and earlier the name attribute on some elements could be used to define an anchor, but HTML 4 introduced the id attribute. In an XML dialect, only attributes of type ID are permitted to be used as anchors, and the id attribute is defined to be of type ID. Relying upon the id attribute as an anchor will work well in modern HTML and XHTML-aware user agents.
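A check for portable id values could look like the sketch below. It assumes the intended pattern is [A-Za-z][A-Za-z0-9:_.-]*, the fragment identifier pattern recommended in the HTML compatibility guidelines of XHTML 1.0. The name isPortableId is hypothetical.

```javascript
// Assumed pattern: a letter followed by letters, digits, ":", "_", "." or "-".
const PORTABLE_ID = /^[A-Za-z][A-Za-z0-9:_.-]*$/;

// Hypothetical helper: true when the value is safe to use as an id in both HTML and XHTML.
function isPortableId(value) {
  return typeof value === "string" && PORTABLE_ID.test(value);
}
```

An authoring tool might run such a check over every id it generates and reject or rewrite values that fail it.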
DO encode your document in UTF-8 or UTF-16. When delivering the document from a server, DO set the character encoding for a document via the charset parameter of the HTTP Content-Type header. When not delivering the document from a server, DO set the encoding via a "meta http-equiv" statement in the document (e.g., <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />). However, note that doing so will explicitly bind the document to a single content type.
Rationale: Since these guidelines already recommend that documents NOT contain the XML declaration, setting the encoding via the HTTP header is the only reliable mechanism compatible with HTML and XML user agents. When that mechanism is not available, the only portable fallback is the "meta http-equiv" statement.
DO use the full form for boolean attributes, as required by XML (e.g., disabled="disabled").
Such attributes include, for example, checked, disabled, readonly, multiple, and selected.
Rationale: The compact form of these attributes is not well-formed XML, and therefore invalid.
DO ensure that any examination of element and attribute names in scripts that use the DOM, as defined in the Document Object Model Level 1 Recommendation [DOM], is case-insensitive.
Rationale: This will ensure maximum portability of scripts, since the DOM methods will return element and attribute names in uppercase when served as text/html and in lowercase when served as application/xhtml+xml.
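One way to make such comparisons case-insensitive is to normalize both sides before comparing, as in this minimal sketch. The helper name hasName is hypothetical; node is assumed to be any DOM node exposing a nodeName property.

```javascript
// Hypothetical helper: compare a node's name with an expected name, ignoring case.
function hasName(node, expected) {
  // nodeName may be "DIV" (text/html) or "div" (application/xhtml+xml).
  return String(node.nodeName).toLowerCase() === String(expected).toLowerCase();
}
```

A script would then write hasName(element, "div") instead of comparing element.nodeName directly against "DIV" or "div".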
DO ensure that when content or attribute values contain the reserved character & it is used in its escaped form &amp;.
Rationale: If ampersands are not encoded, the characters after them up to the next semicolon can be interpreted as the name of an entity by the user agent.
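When markup is generated from raw text (for example, a URI with a query string placed in an href attribute), the escaping can be done in one place. This is a minimal sketch with a hypothetical function name; it assumes the input is raw, unescaped text, since applying it to already escaped text would escape it a second time.

```javascript
// Hypothetical helper: escape every ampersand in raw (not yet escaped) text.
function escapeAmpersands(rawText) {
  return String(rawText).replace(/&/g, "&amp;");
}
```

For instance, the raw URI search?x=1&y=2 would be written into the document as search?x=1&amp;y=2.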
DO use lower case element and attribute names in style sheets.
DO create rules that include inferred elements (e.g., the tbody element in a table).
Rationale: These simple rules will help increase the portability of CSS rules regardless of the media type the document is processed as.
DO NOT use XML stylesheet declarations to identify style sheets. DO use the style or link elements to identify style sheets.
Rationale: Since XML processing instructions may be rendered by some HTML user agents, using the standard XML stylesheet declaration mechanism may not work well. However, since XHTML user agents are required to process style and link elements and interpret stylesheets referenced from those elements, documents constructed to use them will work as expected.
DO NOT use the formfeed character (U+000C).
Rationale: This character is recognized as white space in HTML 4, but is NOT considered white space in all versions of XML.
DO use &#39; to specify an escaped apostrophe. DO NOT use &apos;.
Rationale: The entity &apos; is not defined in HTML 4.
DO NOT use the XML DTD internal subset mechanism as part of the DOCTYPE declaration.
Rationale: The subset mechanism is not supported by non-XML user agents.
DO NOT use the XML CDATA mechanism except as described in guideline 4 above.
Rationale: This mechanism is not supported in non-XML user agents.
DO use explicit tbody elements within tables.
Rationale: While the content model of the table element permits the tbody element to be skipped, in HTML 4 this element is implicit. HTML 4 user agents will silently add this element, thus potentially confusing scripts or style sheets.
DO use the base element if you need to establish an alternate base URI for your document. DO NOT use the xml:base attribute.
Rationale: XHTML Family markup languages do not even support xml:base, relying instead upon the (X)HTML base. HTML user agents do not support xml:base at all.
DO NOT use document.write or document.writeln to change the document. DO use DOM manipulation to achieve the same effect.
Rationale: Native XML user agents may not support this technique for modifying the content of the document.
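As an illustration of the DOM-based alternative, the sketch below adds a paragraph to an existing element without writing markup into the document stream. The function name appendParagraph and its parameters are hypothetical; doc is assumed to be a DOM Document and parent an element already in the tree.

```javascript
// Hypothetical helper: append a paragraph using DOM methods only.
function appendParagraph(doc, parent, text) {
  const xhtmlNs = "http://www.w3.org/1999/xhtml";
  // Prefer the namespace-aware factory where the user agent provides it.
  const p = typeof doc.createElementNS === "function"
    ? doc.createElementNS(xhtmlNs, "p")
    : doc.createElement("p");
  // A text node carries the text as data, so no markup is parsed from it.
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}
```

In a browser this might be called as appendParagraph(document, document.getElementById("main"), "Hello"), where "main" is an id assumed to exist in the page.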
DO NOT use the innerHTML property (supported by some user agents) to update a document dynamically. DO use standard DOM manipulation techniques instead. If you choose to use this method, ensure that it is available in the user agent and then ensure that any content inserted via the method is well-formed AND conforms to these guidelines so it is HTML 4 compatible.
Rationale: There are many other standard Document Object Model methods for updating a document that will do so more portably than this non-standard method.
DO ensure that if tbody elements are omitted from tables, scripts that examine the document tree are capable of working both with and without the element.
Rationale: See guideline 19 above.
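A script can tolerate both tree shapes by looking for rows directly under the table and under any tbody child. The following is a minimal sketch with a hypothetical function name; it deliberately ignores thead and tfoot, and it assumes table is a DOM element exposing childNodes.

```javascript
// Hypothetical helper: collect body rows whether or not a tbody is present in the tree.
function tableRows(table) {
  const rows = [];
  for (const child of Array.from(table.childNodes)) {
    const name = String(child.nodeName).toLowerCase();
    if (name === "tr") {
      rows.push(child); // row placed directly in the table (XML parse, no tbody)
    } else if (name === "tbody") {
      for (const inner of Array.from(child.childNodes)) {
        if (String(inner.nodeName).toLowerCase() === "tr") rows.push(inner);
      }
    }
  }
  return rows;
}
```

Names are lowercased before comparison so that the same function works under both media types.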
DO ensure that any CSS properties on the html element are also specified on the body element.
Rationale: Some HTML user agents only recognize CSS properties on the body element.
DO NOT use the noscript element.
Rationale: The contents of this element are treated differently depending on whether the document is evaluated as XML or HTML, as well as whether scripting is enabled in the user agent or not.
DO NOT embed content in iframe elements. DO use the src attribute to incorporate data via the iframe element.
Rationale: Content embedded in this element is treated differently depending on whether the document is evaluated as XML or HTML, as well as whether scripting is enabled in the user agent or not. The contents of the frame can be imported from an external source via the src attribute.
DO create new Document Object Model elements via the createElementNS method if it is available. If it is not, fall back to the createElement method.
Rationale: The createElementNS method will work as expected in both HTML and XML contexts, but it is not supported in all user agents.
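The feature test and fallback described above might be written as follows. This is a sketch rather than a definitive implementation: the helper name createXhtmlElement is hypothetical, and the namespace URI is the XHTML namespace used in the example document in Appendix B.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: create an element in the XHTML namespace when possible.
function createXhtmlElement(doc, localName) {
  if (typeof doc.createElementNS === "function") {
    return doc.createElementNS(XHTML_NS, localName);
  }
  // Fallback for user agents without createElementNS.
  return doc.createElement(localName);
}
```

Scripts can then call createXhtmlElement(document, "div") everywhere instead of repeating the feature test at each call site.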
The following is an example document that adopts the conventions described in Appendix A to ensure its portability among XHTML and HTML user agents.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>sample</title>
    <link href="style/style.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <div id="main">
      <h1>heading</h1>
      <img src="http://www.w3.org/Icons/w3c_main" alt="W3C logo" />
      <!-- defined as an "EMPTY" element, do not use <img></img> or <img/> -->
      <p>Some material &amp; some <!-- use escaped ampersand, &amp; -->
      <br /> <!-- defined as an "EMPTY" element, do not use <br></br> or <br/> -->
      that should be split.</p>
      <p></p> <!-- NOT defined as an "EMPTY" element, just no content, so do not use <p/> nor <p /> -->
      <input type="reset" disabled="disabled" /> <!-- defined as an "EMPTY" element, do not use <input></input> nor <input/> -->
      <hr /> <!-- defined as an "EMPTY" element, do not use <hr></hr> nor <hr/> -->
    </div>
  </body>
</html>
"HTML 4.01 Specification", W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds., 24 December 1999. Available at: http://www.w3.org/TR/1999/REC-html401-19991224
The latest version of HTML 4.01 is available at: http://www.w3.org/TR/html401
The latest version of HTML 4 is available at: http://www.w3.org/TR/html4
"Mathematical Markup Language (MathML) Version 2.0", W3C Recommendation, D. Carlisle, P. Ion, R. Miner, N. Poppelier, eds., 21 February 2001. Available at: http://www.w3.org/TR/2001/REC-MathML2-20010221
The latest version is available at: http://www.w3.org/TR/MathML2
The W3C Markup Validation Service available at http://validator.w3.org.
"XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition): A Reformulation of HTML 4 in XML 1.0", W3C Recommendation, S. Pemberton et al., August 2002. Available at: http://www.w3.org/TR/2002/REC-xhtml1-20020801
The first edition is available at: http://www.w3.org/TR/2000/REC-xhtml1-20000126
The latest version is available at: http://www.w3.org/TR/xhtml1
"XHTML™ 1.1 - Module-based XHTML", W3C Recommendation, M. Altheim, S. McCarron, eds., 31 May 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml11-20010531
The latest version is available at: http://www.w3.org/TR/xhtml11
"XHTML™ Basic", W3C Recommendation, M. Baker, M. Ishikawa, S. Matsui, P. Stark, T. Wugofski, T. Yamakami, eds., 19 December 2000. Available at: http://www.w3.org/TR/2000/REC-xhtml-basic-20001219
The latest version is available at: http://www.w3.org/TR/xhtml-basic
"Modularization of XHTML™", W3C Recommendation, M. Altheim, F. Boumphrey, S. Dooley, S. McCarron, S. Schnitzenbaumer, T. Wugofski, eds., 10 April 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410
The latest version is at: http://www.w3.org/TR/xhtml-modularization
"Extensible Markup Language (XML) 1.0 Specification (Second Edition)", T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, eds., 6 October 2000. Available at: http://www.w3.org/TR/2000/REC-xml-20001006
The latest version is available at: http://www.w3.org/TR/REC-xml
"Namespaces in XML", T. Bray, D. Hollander, A. Layman, eds., 14 January 1999. Available at: http://www.w3.org/TR/1999/REC-xml-names-19990114
The latest version is available at: http://www.w3.org/TR/REC-xml-names
"Associating Style Sheets with XML documents Version 1.0", W3C Recommendation, J. Clark, ed., 29 June 1999. Available at: http://www.w3.org/1999/06/REC-xml-stylesheet-19990629
The latest version is available at: http://www.w3.org/TR/xml-stylesheet
In 3.5. Summary, changed 'text/html' for HTML 4 as SHOULD rather than MAY.
Updated reference to XHTML 1.0 to refer to the Second Edition.