Copyright ©2002-2008 W3C ® ( MIT , INRIA , Keio ), All Rights Reserved. W3C liability , trademark , document use and software licensing rules apply.
This document summarizes the current best practice for using various Internet media types when serving XHTML Family documents to relatively modern user agents - even those that do not yet support XHTML natively. In summary, 'application/xhtml+xml' SHOULD be used for XHTML Family documents, and the use of 'text/html' SHOULD be limited to HTML-compatible XHTML Family documents intended for delivery to user agents that do not explicitly state in their HTTP Accept header that they accept 'application/xhtml+xml'. The media types 'application/xml' and 'text/xml' MAY also be used, but whenever appropriate, 'application/xhtml+xml' or 'text/html' SHOULD be used rather than those generic XML media types.
Note that, because of the lack of explicit support for XHTML (and XML in general) in some user agents, only very careful construction of documents can ensure their portability (see Appendix A ). If you do not require the advanced features of XHTML Family markup languages (e.g., XML DOM, XML Validation, extensibility via XHTML Modularization, semantic markup via XHTML+RDFa, Assistive Technology access via the XHTML Role and XHTML Access modules, etc.), you may want to consider using HTML 4.01 [ HTML ] in order to reduce the risk that content will not be portable to HTML user agents. Even in that case authors can help ensure their portability AND ease their eventual migration to the XHTML Family by ensuring their documents are valid [ VALIDATOR ] and by following the relevant guidelines in Appendix A .
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Note made available by the World Wide Web Consortium (W3C) for your information. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members.
This document has been produced by the W3C XHTML 2 Working Group as part of the HTML Activity . The goals of the XHTML 2 Working Group are discussed in the XHTML 2 Working Group charter . The document represents working group consensus on the usage of Internet media types for various XHTML Family documents. However, this document is not intended to be a normative specification. Instead, it documents a set of recommendations to maximize the interoperability of XHTML documents with regard to Internet media types. This document does not address general issues on media types and namespaces.
Comments on this document may be sent to www-html-editor@w3.org ( archive ). Public discussion on this document may take place on the mailing list www-html@w3.org ( archive ).
XHTML 1.0 [ XHTML1 ] reformulated HTML 4 [ HTML4 ] as an XML application, and Modularization of XHTML [ XHTMLM12N ] provided a means to define XHTML-based markup languages using XHTML modules, collectively called the "XHTML Family". However, due to historical reasons, a recommended way to serve such XHTML Family documents, in particular with regard to Internet media types, was somewhat unclear.
After the publication of [ XHTML1 ], an RFC for XML media types was revised and published as RFC 3023 [ RFC3023 ], and it introduced the '+xml' suffix convention for XML-based media types. The 'application/xhtml+xml' media type [ RFC3236 ] was registered following that convention. Now there are at least four possibilities on media type labeling for XHTML Family documents - 'text/html', 'application/xhtml+xml', and generic XML media types 'application/xml' and 'text/xml'.
This document summarizes the current best practice for using those various Internet media types for XHTML Family documents.
The key words " MUST ", " MUST NOT ", " REQUIRED ", " SHALL ", " SHALL NOT ", " SHOULD ", " SHOULD NOT ", " RECOMMENDED ", " MAY ", and " OPTIONAL " in this document are to be interpreted as described in RFC 2119 [ RFC2119 ].
http://www.w3.org/1999/xhtml .
xml:lang ), but an XHTML Family document type MAY also include elements and attributes from other namespaces, such as MathML [ MathML2 ].
This section summarizes which Internet media type SHOULD be used for which XHTML Family document for which purpose.
A combination of these rules, in conjunction with a careful examination of the HTTP Accept header, can be useful in determining which media type to use when a document adheres to the guidelines in Appendix A . Specifically:
If the Accept header explicitly contains application/xhtml+xml, deliver the document using that media type.
If the Accept header explicitly contains text/html, deliver the document using that media type.
Otherwise, deliver the document using text/html.
In other words, requestors that advertise they support XHTML family documents will receive the document in the XHTML media type, and all other requestors will receive the document using the HTML media type.
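The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_accept` and `choose_media_type` are hypothetical, the q-value handling is a simplification of HTTP content negotiation, and the document is assumed to follow the Appendix A guidelines.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into a {media_type: q} mapping."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing "q" parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: an unparsable q-value means "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header):
    """Pick the media type for an XHTML document that follows Appendix A."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    # Every other requestor, including one that sends only "*/*" or no
    # Accept header at all, receives the HTML media type.
    return "text/html"
```

A server would call `choose_media_type` with the raw Accept header value and use the result in the Content-Type header of the response.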
When a document does NOT adhere to the guidelines, it SHOULD NOT be delivered as media type text/html. If such documents need to be delivered to requestors who do not explicitly support the XHTML family, those documents should be transformed into valid HTML and then delivered as such.
Note: It is possible that in the future XHTML Modularization will define rules for indicating which specific XHTML family members are supported by a requestor (e.g., via the profile parameter of the media type in the Accept header). Such rules, when used in conjunction with the "quality" parameter of the media type, could help a server determine which of several versions of a document to deliver.
The 'text/html' media type [ RFC2854 ] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML is carefully constructed (see Appendix A ). In particular, 'text/html' is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [ XHTML+MathML ].
XHTML documents served as 'text/html' will not be processed as XML [ XML10 ], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see guidelines 11 and 13 ).
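The difference in error handling can be illustrated with the Python standard library. In this sketch `html.parser` merely stands in for a lenient HTML user agent and `xml.etree.ElementTree` for an XML processor; real browsers differ in detail, and the sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start-tag names the way a lenient HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical document with a missing </p> end tag.
broken = "<html><body><p>unclosed paragraph</body></html>"

collector = TagCollector()
collector.feed(broken)  # the HTML parser reports no error for the missing end tag
collector.close()
```

Here `is_well_formed_xml(broken)` is false, while the HTML parser still reports the `html`, `body` and `p` start tags without complaint.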
Authors should also be careful about character encoding issues. A typical misunderstanding is that since an XHTML document is an XML document, the character encoding of an XHTML document should be treated as UTF-8 or UTF-16 in the absence of explicit character encoding information. This is NOT the case when an XHTML document is served as 'text/html'. "6. Charset default rules" of [ RFC2854 ] notes as follows:
The use of an explicit charset parameter is strongly recommended. While [ MIME ] specifies "The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII." [ HTTP ] Section 3.7.1, defines that "media subtypes of the 'text' type are defined to have a default charset value of 'ISO-8859-1'". Section 19.3 of [ HTTP ] gives additional guidelines. Using an explicit charset parameter will help avoid confusion.
Using an explicit charset parameter also takes into account that the overwhelming majority of deployed browsers are set to use something else than 'ISO-8859-1' as the default; the actual default is either a corporate character encoding or character encodings widely deployed in a certain national or regional community. For further considerations, please also see Section 5.2 of [ HTML40 ].
"5.2.2 Specifying the character encoding" of the HTML 4 specification [ HTML4 ] also notes that user agents must not assume any default value for the "charset" parameter. Therefore, authors SHOULD NOT assume any default value for an XHTML document served as 'text/html', and as mentioned in [ RFC2854 ], the use of an explicit charset parameter is STRONGLY RECOMMENDED.
When it is difficult to specify an explicit charset parameter through a higher-level protocol (e.g., HTTP), authors SHOULD include the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?> ) and a meta http-equiv statement (e.g. <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" /> ). See guideline 9 for details.
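As a minimal sketch, the explicit charset parameter and the matching meta http-equiv statement could be generated as follows. The helper names are hypothetical, and the functions do no validation of the media type or charset they are given.

```python
def content_type_header(media_type, charset):
    """Build a Content-Type header value with an explicit charset parameter."""
    return f"{media_type}; charset={charset}"


def meta_http_equiv(media_type, charset):
    """Build the matching meta http-equiv statement for use inside the document."""
    value = content_type_header(media_type, charset)
    return f'<meta http-equiv="Content-Type" content="{value}" />'
```

Generating both strings from one pair of values keeps the HTTP header and the in-document statement from drifting apart.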
The 'application/xhtml+xml' media type [ RFC3236 ] is the primary media type for XHTML Family document types, and in particular it is suitable for all XHTML Host Language document types. XHTML Family document types suitable for this media type include [ XHTML1 ], [ XHTMLBasic ], [ XHTML11 ] and [ XHTML+MathML ]. An XHTML Host Language document type that adds elements and attributes from foreign namespaces MAY identify its profile with the optional 'profile' parameter or other means such as the "Content-features" MIME header described in RFC 2912 [ RFC2912 ]. Each namespace SHOULD be explicitly identified through a namespace declaration [ XMLNS ]. This document does not preclude the registration of a dedicated media type for a specific XHTML Host Language document type.
In general, this media type is NOT suitable for XHTML Integration Set document types. This document does not define which media type should be used for XHTML Integration Set document types.
'application/xhtml+xml' SHOULD be used for serving XHTML documents to XHTML user agents (agents that explicitly indicate their support for this media type). Authors who wish to support both XHTML and HTML user agents MAY utilize content negotiation by serving carefully constructed XHTML documents both as 'text/html' and as 'application/xhtml+xml'. Alternately, authors may serve HTML versions of such documents as 'text/html' and XHTML versions as 'application/xhtml+xml'. Also note that it is not necessary for XHTML documents served as 'application/xhtml+xml' to follow the HTML4 Compatibility Guidelines .
When serving an XHTML document with this media type, authors MAY include the XML stylesheet processing instruction [ XMLstyle ] to associate style sheets. This is not generally necessary when documents are to be processed by XHTML-aware user agents, but generic XML document processors may handle such processing instructions.
As for character encoding issues, as mentioned in "6. Charset default rules" of [ RFC3236 ], 'application/xhtml+xml' has the same considerations as 'application/xml'. See section 3.3 for details.
The 'application/xml' media type [ RFC3023 ] is a generic media type for XML documents, and the definition of 'application/xml' does not preclude serving XHTML documents as that media type. Any XHTML Family document MAY be served as 'application/xml'.
However, authors should be aware that such a document may not always be processed as XHTML (e.g. hyperlinks may not be recognized), depending on user agents. Generic XML processors might recognize it as just an XML document which includes elements and attributes from the XHTML namespace (and others), and may not have a priori knowledge of what to do with such a document beyond what they can do for generic XML documents.
Authors SHOULD explicitly identify the XHTML namespace through the namespace declaration when they serve an XHTML Family document as 'application/xml' to improve the chance of reliable processing. The XML stylesheet PI SHOULD be used to associate style sheets.
Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'application/xml'.
As for character encoding issues, "3.2 Application/xml Registration" of [ RFC3023 ] says that the use of the charset parameter is STRONGLY RECOMMENDED, and also specifies a rule that "[i]f an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header". This means that conforming XML processors MUST follow the requirements described in section 4.3.3 of [ XML10 ].
Therefore, while it is STRONGLY RECOMMENDED to specify an explicit charset parameter through a higher-level protocol, authors SHOULD include the XML declaration (e.g. <?xml version="1.0" encoding="EUC-JP"?> ). Note that a meta http-equiv statement will not be recognized by XML processors, and while authors MAY include such a statement in an XHTML document served as 'application/xml', it will not affect processing of the document since the higher level protocol and the XML declaration both take precedence.
The 'text/xml' media type [ RFC3023 ] is another generic media type for XML documents, and the definition of 'text/xml' does not preclude serving XHTML documents as that media type, either. Any XHTML Family document MAY be served as 'text/xml'. The considerations for 'application/xml' also apply to 'text/xml'. Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'text/xml'.
Authors should also be aware of the difference between 'application/xml' (and for that matter 'application/xhtml+xml' as well) and 'text/xml' with regard to the treatment of character encoding. According to "3.1 Text/xml Registration" of [ RFC3023 ], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii" [ ASCII ]. This default value is authoritative over the encoding information specified in the XML declaration, or the XML default encodings of UTF-8 and UTF-16 when no encoding declaration is supplied, so omitting the charset parameter of a 'text/xml' entity might cause an unexpected result. As mentioned in [ RFC3023 ], the use of the charset parameter is STRONGLY RECOMMENDED.
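The different charset defaults for 'text/xml' and 'application/xml' can be made concrete with a small sketch. The function `effective_encoding` is a hypothetical helper and a deliberate simplification: it looks only at a UTF-16 byte order mark and at an XML declaration in an ASCII-compatible encoding, which is less than the full detection procedure of [ XML10 ] section 4.3.3.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of an ASCII-compatible byte stream.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def effective_encoding(media_type, charset, body):
    """Return the encoding a processor is expected to use for the entity."""
    if charset:
        return charset.lower()  # an explicit charset parameter is authoritative
    if media_type == "text/xml":
        return "us-ascii"  # RFC 3023 default; the XML declaration is not consulted
    # application/xml and application/xhtml+xml: fall back to the XML rules.
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"  # byte order mark
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when nothing else is declared
```

With the same body bytes and no charset parameter, the sketch yields the declared encoding for 'application/xml' but "us-ascii" for 'text/xml', which is the unexpected result the paragraph above warns about.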
The following table summarizes the recommendations to content authors for labeling XHTML documents. HTML 4 is also listed for comparison.
Media type | HTML 4 | XHTML Family (HTML 4 compatible) | XHTML Family (other) | XHTML Family + Extensions |
---|---|---|---|---|
text/html | SHOULD | MAY | SHOULD NOT * | SHOULD NOT |
application/xhtml+xml | MUST NOT | MAY | SHOULD | SHOULD |
application/xml | MUST NOT | MAY | MAY | MAY |
text/xml | MUST NOT | MAY | MAY | MAY |
* However, see transformation .
This appendix summarizes design guidelines for authors who wish their XHTML documents to render on both XHTML-aware and modern HTML user agents. The purpose of providing these guidelines is to supply a simple collection that, if followed, will give reasonable, predictable results in modern user agents. Document authors should treat these as best practices that were considered correct at the time this document was published. Like all of this document, this Appendix is informative . It contains no absolute requirements, and should NEVER be used as the basis for creating conformance or validation rules of any sort. Period.
For an example document that reflects the use of the guidelines from this section, see Appendix B .
DO NOT include XML processing instructions NOR the XML declaration.
Rationale : Some HTML user agents render XML processing instructions. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML. Such user agents may not render the document as expected. For compatibility with these types of HTML browsers, you should avoid using processing instructions and XML declarations.
Consequence : Remember, however, that when the XML declaration is not included in a document, AND the character encoding is not specified by a higher level protocol such as HTTP, the document can only use the default character encodings UTF-8 or UTF-16. See, however, guideline 9 below.
If an element has an EMPTY content model DO use the minimized tag syntax permitted by XML (e.g., <br /> ). DO NOT use the alternative syntax (e.g., <br></br> ) allowed by XML, since this may be unsupported by HTML user agents. Also, DO include a space before the trailing / and > . Empty elements in the XHTML family include: area, base, basefont, br, col, hr, img, input, isindex, link, meta, and param.
Rationale: HTML user agents ignore the /> at the end of a tag, but without it they may incorrectly parse the tag or its attributes. HTML user agents also may not recognize the alternate syntax permitted by XML.
If an element permits content (e.g., the p element) but an instance of that element has no content (e.g., an empty paragraph), DO NOT use the "minimized" tag syntax (e.g., <p /> ).
Rationale: HTML user agents may give uncertain results when using the minimized syntax permitted by XML when an element has no content.
DO use external style sheets if your style sheet uses < or & or ]]> or -- . DO NOT use an internal stylesheet if the style rules contain any of the above characters.
DO use external scripts if your script uses < or & or ]]> or -- . DO NOT embed a script in a document if it contains any of these characters.
Rationale : XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible may not work as expected in XML-based user agents.
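The following sketch illustrates both points in Python. `should_be_external` is a hypothetical helper that checks for the character sequences listed above, and the second part shows one XML parser (ElementTree's default parser, used here only as an example of an XML-based processor) discarding a script that was "hidden" in a comment.

```python
import xml.etree.ElementTree as ET

# Character sequences named in the guideline above.
RISKY_SEQUENCES = ("<", "&", "]]>", "--")


def should_be_external(source):
    """Return True if a script or style sheet should live in its own file."""
    return any(seq in source for seq in RISKY_SEQUENCES)


# Hypothetical inline script hidden the "historical" way, inside a comment.
hidden = "<script><!-- if (a < b) { run(); } //--></script>"

# ElementTree's default parser drops comments, so the script body is lost
# and the parsed element is left with no text and no children.
script = ET.fromstring(hidden)
```

A build step could run `should_be_external` over every inline script or style sheet and move the flagged ones into separate files.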
DO ensure that attribute values are on a single line and only use single whitespace characters.
DO NOT use line breaks and multiple consecutive white space characters within attribute values.
Rationale : These are handled inconsistently by user agents.
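One way to enforce this guideline when generating markup is to normalize attribute values before writing them out. The helper below is a hypothetical sketch; note that it also strips leading and trailing white space, which goes slightly beyond what the guideline asks for.

```python
def normalize_attribute_value(value):
    """Collapse line breaks and runs of white space into single spaces."""
    return " ".join(value.split())
```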
lang and xml:lang Attributes
DO use both lang and xml:lang attributes when specifying the language of an element in markup languages that support the use of both. DO NOT use only the lang attribute, even in languages that include it such as XHTML 1.0.
Rationale : HTML 4 documents use the lang attribute to identify the language of an element. XML documents use the xml:lang attribute. CSS has a "lang" pseudo selector that automatically uses the appropriate attribute depending on the document type. Therefore, specifying both attributes ensures that single CSS selectors will work in both modes.
DO use the id attribute to identify elements. DO ensure that the values used for the id attribute are limited to the pattern [A-Za-z][A-Za-z0-9:_.-]* . DO NOT use the name attribute to identify elements, even in languages that permit the use of name such as XHTML 1.0.
Rationale : In HTML 3.2 and earlier the name attribute on some elements could be used to define an anchor, but HTML 4 introduced the id attribute. In an XML dialect, only attributes with type ID are permitted to be used as anchors, and the id attribute is defined to be of type ID . Relying upon the id attribute as an anchor will work well in modern HTML and XHTML-aware user agents.
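The id pattern given in this guideline can be checked mechanically. The sketch below uses Python's `re` module; `is_portable_id` is a hypothetical name, and the regular expression is copied from the guideline unchanged.

```python
import re

# Pattern taken verbatim from the guideline.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_portable_id(value):
    """Return True if the whole value matches the recommended id pattern."""
    return ID_PATTERN.fullmatch(value) is not None
```

For example, "main" and "sec-1.2" are accepted, while "1main" (leading digit) and "my id" (contains a space) are rejected.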
DO encode your document in UTF-8 or UTF-16. When delivering the document from a server, DO set the character encoding for a document via the charset parameter of the HTTP Content-Type header. When not delivering the document from a server, DO set the encoding via a "meta http-equiv" statement in the document (e.g., <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" /> ). However, note that doing so will explicitly bind the document to a single content type.
Rationale : Since these guidelines already recommend that documents NOT contain the XML declaration, setting the encoding via the HTTP header is the only reliable mechanism compatible with HTML and XML user agents. When that mechanism is not available, the only portable fallback is the "meta http-equiv" statement.
DO use the full form for boolean attributes, as required by XML (e.g., disabled="disabled" ). Such attributes include: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, and defer.
Rationale : The compact form of these attributes is not well formed XML, and therefore invalid.
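The well-formedness point can be demonstrated with any XML parser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for an XML processor; the two `input` fragments are hypothetical examples.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the markup is accepted by an XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


full_form = '<input type="checkbox" checked="checked" />'
compact_form = '<input type="checkbox" checked />'  # HTML-style minimized attribute
```

The parser accepts `full_form` and rejects `compact_form`, because an attribute name without a value is not allowed in XML.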
DO rely upon the HTML 4 DOM as defined in The Document Object Model level 1 Recommendation [ DOM ] for scripting. This means, in particular, that the names of elements and attributes will be returned (from functions that return such things) in upper case.
Rationale : Using the HTML DOM will result in maximum portability of scripts, since the HTML DOM is supported in both HTML and XHTML documents in modern user agents.
DO ensure that when content or attribute values contain the reserved character & it is used in its escaped form &amp; .
Rationale : If ampersands are not encoded, the characters after them up to the next semicolon can be interpreted as the name of an entity by the user agent.
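When markup is generated by a program, the escaping can be delegated to a library. The sketch below uses `escape` and `quoteattr` from Python's `xml.sax.saxutils`; the sample strings are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Element content: "&" becomes "&amp;".
text = escape("Some material & some more")

# Attribute value: quoteattr escapes "&" and also adds the surrounding quotes.
href = quoteattr("search?q=xhtml&lang=en")
```

After this runs, `text` is `Some material &amp; some more` and `href` is `"search?q=xhtml&amp;lang=en"`, including the double quotes.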
DO use lower case element and attribute names in style sheets. DO create rules that include inferred elements (e.g., the tbody element in a table).
Rationale : These simple rules will help increase the portability of CSS rules regardless of the media type the document is processed as.
DO NOT use xml stylesheet declarations to identify style sheets.
DO use the style or link elements to define stylesheets.
Rationale : Since XML processing instructions may be rendered by some HTML user agents, using the standard XML stylesheet declaration mechanism may not work well. However, since XHTML user agents are required to process style and link elements and interpret stylesheets referenced from those elements, documents constructed to use them will work as expected.
DO NOT use the formfeed character (U+000C).
Rationale : This character is recognized as white space in HTML 4, but is NOT considered white space in XML.
DO use &#39; to specify an escaped apostrophe. DO NOT use &apos; .
Rationale : The entity &apos; is not defined in HTML 4.
The following is an example document that adopts the conventions described in Appendix A to ensure its portability among XHTML and HTML user agents.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>sample</title>
    <link href="style/style.css" rel="stylesheet" type="text/css" />
  </head>
  <body>
    <div id="main">
      <h1>heading</h1>
      <img src="http://www.w3.org/Icons/w3c_main" alt="W3C logo" />
      <!-- defined as an "EMPTY" element, do not use <img></img> or <img/> -->
      <p>Some material &amp; some
      <!-- use escaped ampersand, &amp; -->
      <br />
      <!-- defined as an "EMPTY" element, do not use <br></br> or <br/> -->
      that should be split.</p>
      <p></p>
      <!-- NOT defined as an "EMPTY" element, just no content, so do not use <p/> nor <p /> -->
      <input type="reset" disabled="disabled" />
      <!-- defined as an "EMPTY" element, do not use <input></input> nor <input/> -->
      <hr />
      <!-- defined as an "EMPTY" element, do not use <hr></hr> nor <hr/> -->
    </div>
  </body>
</html>
" HTML 4.01 Specification ", W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds. , 24 December 1999. Available at: http://www.w3.org/TR/1999/REC-html401-19991224
The latest version of HTML 4.01 is available at: http://www.w3.org/TR/html401
The latest version of HTML 4 is available at: http://www.w3.org/TR/html4
" Mathematical Markup Language (MathML) Version 2.0 ", W3C Recommendation, D. Carlisle, P. Ion, R. Miner, N. Poppelier, eds. , 21 February 2001. Available at: http://www.w3.org/TR/2001/REC-MathML2-20010221
The latest version is available at: http://www.w3.org/TR/MathML2
The W3C Markup Validation Service available at http://validator.w3.org.
" XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition): A Reformulation of HTML 4 in XML 1.0 ", W3C Recommendation, S. Pemberton et al. , August 2002. Available at: http://www.w3.org/TR/2002/REC-xhtml1-20020801
The first edition is available at: http://www.w3.org/TR/2000/REC-xhtml1-20000126
The latest version is available at: http://www.w3.org/TR/xhtml1
" XHTML™ 1.1 - Module-based XHTML ", W3C Recommendation, M. Altheim, S. McCarron, eds. , 31 May 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml11-20010531
The latest version is available at: http://www.w3.org/TR/xhtml11
" XHTML™ Basic ", W3C Recommendation, M. Baker, M. Ishikawa, S. Matsui, P. Stark, T. Wugofski, T. Yamakami, eds. , 19 December 2000. Available at: http://www.w3.org/TR/2000/REC-xhtml-basic-20001219
The latest version is available at: http://www.w3.org/TR/xhtml-basic
" Modularization of XHTML™ ", W3C Recommendation, M. Altheim, F. Boumphrey, S. Dooley, S. McCarron, S. Schnitzenbaumer, T. Wugofski, eds. , 10 April 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410
The latest version is at: http://www.w3.org/TR/xhtml-modularization
" Extensible Markup Language (XML) 1.0 Specification (Second Edition) ", T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, eds. , 6 October 2000. Available at: http://www.w3.org/TR/2000/REC-xml-20001006
The latest version is available at: http://www.w3.org/TR/REC-xml
" Namespaces in XML ", T. Bray, D. Hollander, A. Layman, eds. , 14 January 1999. Available at: http://www.w3.org/TR/1999/REC-xml-names-19990114
The latest version is available at: http://www.w3.org/TR/REC-xml-names
" Associating Style Sheets with XML documents Version 1.0 ", W3C Recommendation, J. Clark, ed. , 29 June 1999. Available at: http://www.w3.org/1999/06/REC-xml-stylesheet-19990629
The latest version is available at: http://www.w3.org/TR/xml-stylesheet
In 3.5. Summary , changed 'text/html' for HTML 4 as SHOULD rather than MAY .
Updated reference to XHTML 1.0 to refer to the Second Edition.