Copyright ©2002 W3C ® ( MIT , INRIA , Keio ), All Rights Reserved. W3C liability , trademark , document use and software licensing rules apply.
This document summarizes the current best practice for using various Internet media types when serving XHTML Family documents. In summary, 'application/xhtml+xml' SHOULD be used for XHTML Family documents, and the use of 'text/html' SHOULD be limited to HTML-compatible XHTML Family documents intended for delivery to user agents that do not explicitly accept 'application/xhtml+xml'. 'application/xml' and 'text/xml' MAY also be used, but whenever appropriate, 'application/xhtml+xml' or 'text/html' SHOULD be used rather than those generic XML media types.
Note that, because of the lack of explicit support for XHTML (and XML in general) in some legacy user agents, only very careful construction of documents can ensure their portability (see Appendix A ). Authors who do not require the advanced features of XHTML Family markup languages (e.g., XML DOM, XML Validation, semantic markup via XHTML+RDFa, Assistive Technology access via the XHTML Role and XHTML Access modules, etc.) may want to consider using HTML 4.01 [ HTML ] in order to reduce the risk that content will not be portable to legacy user agents. Even in that case, authors can help ensure the portability of their documents and ease their eventual migration to the XHTML Family by ensuring their documents are valid [ VALIDATOR ] and by following the guidelines in Appendix A .
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Note made available by the World Wide Web Consortium (W3C) for your information. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members.
This document has been produced by the W3C XHTML 2 Working Group as part of the HTML Activity . The goals of the XHTML 2 Working Group are discussed in the XHTML 2 Working Group charter . The document represents working group consensus on the usage of Internet media types for various XHTML Family documents. However, this document is not intended to be a normative specification. Instead, it documents a set of recommendations to maximize the interoperability of XHTML documents with regard to Internet media types. This document does not address general issues on media types and namespaces.
Comments on this document may be sent to www-html-editor@w3.org ( archive ). Public discussion on this document may take place on the mailing list www-html@w3.org ( archive ).
XHTML 1.0 [ XHTML1 ] reformulated HTML 4 [ HTML4 ] as an XML application, and Modularization of XHTML [ XHTMLM12N ] provided a means to define XHTML-based markup languages using XHTML modules, collectively called the "XHTML Family". However, for historical reasons, the recommended way to serve such XHTML Family documents, in particular with regard to Internet media types, has been somewhat unclear.
After the publication of [ XHTML1 ], the RFC for XML media types was revised and published as RFC 3023 [ RFC3023 ], which introduced the '+xml' suffix convention for XML-based media types. The 'application/xhtml+xml' media type [ RFC3236 ] was registered following that convention. There are now at least four possibilities for labeling XHTML Family documents with a media type: 'text/html', 'application/xhtml+xml', and the generic XML media types 'application/xml' and 'text/xml'.
This document summarizes the current best practice for using those various Internet media types for XHTML Family documents.
The key words " MUST ", " MUST NOT ", " REQUIRED ", " SHALL ", " SHALL NOT ", " SHOULD ", " SHOULD NOT ", " RECOMMENDED ", " MAY ", and " OPTIONAL " in this document are to be interpreted as described in RFC 2119 [ RFC2119 ].
http://www.w3.org/1999/xhtml.
xml:lang), but an XHTML Family document type MAY also include elements and attributes from other namespaces, such as MathML [ MathML2 ].
This section summarizes which Internet media type SHOULD be used for which XHTML Family document for which purpose.
The 'text/html' media type [ RFC2854 ] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML is carefully constructed (see Appendix A ). In particular, 'text/html' is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [ XHTML+MathML ].
XHTML documents served as 'text/html' will not be processed as XML [ XML10 ], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see guidelines 11 and 13 ).
Authors should also be careful about character encoding issues. A typical misunderstanding is that since an XHTML document is an XML document, the character encoding of an XHTML document should be treated as UTF-8 or UTF-16 in the absence of explicit character encoding information. This is NOT the case when an XHTML document is served as 'text/html'. "6. Charset default rules" of [ RFC2854 ] notes as follows:
The use of an explicit charset parameter is strongly recommended. While [ MIME ] specifies "The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII." [ HTTP ] Section 3.7.1, defines that "media subtypes of the 'text' type are defined to have a default charset value of 'ISO-8859-1'". Section 19.3 of [ HTTP ] gives additional guidelines. Using an explicit charset parameter will help avoid confusion.
Using an explicit charset parameter also takes into account that the overwhelming majority of deployed browsers are set to use something else than 'ISO-8859-1' as the default; the actual default is either a corporate character encoding or character encodings widely deployed in a certain national or regional community. For further considerations, please also see Section 5.2 of [ HTML40 ].
"5.2.2 Specifying the character encoding" of the HTML 4 specification [ HTML4 ] also notes that user agents must not assume any default value for the "charset" parameter. Therefore, authors SHOULD NOT assume any default value for an XHTML document served as 'text/html', and as mentioned in [ RFC2854 ], the use of an explicit charset parameter is STRONGLY RECOMMENDED. When it is difficult to specify an explicit charset parameter through a higher-level protocol, authors SHOULD include the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />). See guideline 9 for details.
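As an illustration of keeping the three declarations in agreement for a document served as 'text/html', the following minimal Python sketch derives all of them from one charset value. The function name `charset_declarations` and the dictionary keys are hypothetical, chosen only for this example.

```python
def charset_declarations(charset):
    """Return matching charset declarations for an XHTML document
    served as 'text/html'. Names and keys are illustrative only."""
    return {
        # Preferred: explicit charset parameter on the HTTP header.
        "http_header": f"Content-Type: text/html; charset={charset}",
        # Fallbacks inside the document when the header cannot be set.
        "xml_declaration": f'<?xml version="1.0" encoding="{charset}"?>',
        "meta": (
            '<meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}" />'
        ),
    }
```

Generating every declaration from a single value is one way to avoid the header, the XML declaration and the meta statement disagreeing with one another.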
The 'application/xhtml+xml' media type [ RFC3236 ] is the primary media type for XHTML Family document types, and in particular it is suitable for all XHTML Host Language document types. XHTML Family document types suitable for this media type include [ XHTML1 ], [ XHTMLBasic ], [ XHTML11 ] and [ XHTML+MathML ]. An XHTML Host Language document type that adds elements and attributes from foreign namespaces MAY identify its profile with the 'profile' optional parameter or other means such as the "Content-features" MIME header described in RFC 2912 [ RFC2912 ]. Each namespace SHOULD be explicitly identified through namespace declaration [ XMLNS ]. This document does not preclude the registration of a dedicated media type for a specific XHTML Host Language document type.
In general, this media type is NOT suitable for XHTML Integration Set document types. This document does not define which media type should be used for XHTML Integration Set document types.
'application/xhtml+xml' SHOULD be used for serving XHTML documents to XHTML user agents (agents that explicitly indicate their support for this media type). Authors who wish to support both XHTML and HTML user agents MAY utilize content negotiation by serving carefully constructed XHTML documents both as 'text/html' and as 'application/xhtml+xml'. Alternatively, authors may serve HTML versions of such documents as 'text/html' and XHTML versions as 'application/xhtml+xml'. Also note that it is not necessary for XHTML documents served as 'application/xhtml+xml' to follow the HTML Compatibility Guidelines.
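The content negotiation described above could be sketched in Python as follows. This is a simplified illustration, not a complete HTTP implementation: the function names `parse_accept` and `choose_media_type` are hypothetical, Accept parameters other than `q` are ignored, and a wildcard such as `*/*` is deliberately not treated as explicit support for 'application/xhtml+xml'.

```python
def parse_accept(header):
    """Parse an HTTP Accept header value into {media_type: q}.
    Simplified: parameters other than q are ignored."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header):
    """Pick the media type for a carefully constructed XHTML document."""
    prefs = parse_accept(accept_header or "")
    # Only an explicit, non-zero entry counts as support.
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"
```

For example, `choose_media_type("text/html,*/*;q=0.8")` falls back to 'text/html' because the user agent never names 'application/xhtml+xml' itself.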
When serving an XHTML document with this media type, authors SHOULD include the XML stylesheet processing instruction [ XMLstyle ] to associate style sheets.
As for character encoding issues, as mentioned in "6. Charset default rules" of [ RFC3236 ], 'application/xhtml+xml' has the same considerations as 'application/xml'. See section 3.3 for details.
The 'application/xml' media type [ RFC3023 ] is a generic media type for XML documents, and the definition of 'application/xml' does not preclude serving XHTML documents as that media type. Any XHTML Family document MAY be served as 'application/xml'.
However, authors should be aware that, depending on the user agent, such a document may not always be processed as XHTML (e.g. hyperlinks may not be recognized). Generic XML processors might recognize it as just an XML document which includes elements and attributes from the XHTML namespace (and others), and may not have a priori knowledge of what to do with such a document beyond what they can do for generic XML documents.
Authors SHOULD explicitly identify the XHTML namespace through the namespace declaration when they serve an XHTML Family document as 'application/xml', to improve the chance of reliable processing. The XML stylesheet PI SHOULD be used to associate style sheets.
Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'application/xml'.
As for character encoding issues, "3.2 Application/xml Registration" of [ RFC3023 ] says that the use of the charset parameter is STRONGLY RECOMMENDED, and also specifies a rule that "[i]f an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header". This means that conforming XML processors MUST follow the requirements described in section 4.3.3 of [ XML10 ].
Therefore, while it is STRONGLY RECOMMENDED to specify an explicit charset parameter through a higher-level protocol, authors SHOULD include the XML declaration (e.g. <?xml version="1.0" encoding="EUC-JP"?> ). Note that a meta http-equiv statement will not be recognized by XML processors, and authors SHOULD NOT include such a statement in an XHTML document served as 'application/xml' (and 'application/xhtml+xml' as well for that matter).
The 'text/xml' media type [ RFC3023 ] is another generic media type for XML documents, and the definition of 'text/xml' does not preclude serving XHTML documents as that media type, either. Any XHTML Family document MAY be served as 'text/xml'. The considerations for 'application/xml' also apply to 'text/xml'. Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'text/xml'.
Authors should also be aware of the difference between 'application/xml' (and for that matter 'application/xhtml+xml' as well) and 'text/xml' with regard to the treatment of character encoding. According to "3.1 Text/xml Registration" of [ RFC3023 ], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii" [ ASCII ]. This default value is authoritative over the encoding information specified in the XML declaration, or the XML default encodings of UTF-8 and UTF-16 when no encoding declaration is supplied, so omitting the charset parameter of a 'text/xml' entity might cause an unexpected result. As mentioned in [ RFC3023 ], the use of the charset parameter is STRONGLY RECOMMENDED.
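The difference between the XML media types can be made concrete with a small Python sketch of the rules summarized above. The function name `effective_charset` is hypothetical, and the sketch is simplified: when neither a charset parameter nor an encoding declaration is present for 'application/xml' or 'application/xhtml+xml', it returns "utf-8" and does not attempt the byte-order-mark detection of UTF-16 described in section 4.3.3 of [ XML10 ].

```python
def effective_charset(media_type, charset_param=None, xml_encoding=None):
    """Return the charset a conforming processor would use.

    media_type    -- 'text/xml', 'application/xml' or 'application/xhtml+xml'
    charset_param -- value of the charset parameter, or None if omitted
    xml_encoding  -- encoding named in the XML declaration, or None
    """
    media_type = media_type.lower()
    if media_type not in ("text/xml", "application/xml", "application/xhtml+xml"):
        raise ValueError(f"not an XML media type covered here: {media_type}")
    if charset_param:
        # An explicit charset parameter is authoritative.
        return charset_param
    if media_type == "text/xml":
        # The default overrides the XML declaration.
        return "us-ascii"
    # application/xml and application/xhtml+xml: the header gives no
    # information, so the XML rules apply (simplified: no BOM detection).
    return xml_encoding or "utf-8"
```

Note how `effective_charset("text/xml", None, "EUC-JP")` yields "us-ascii" even though the document declares EUC-JP, which is the unexpected result the paragraph above warns about.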
The following table summarizes the recommendations to content authors for labeling XHTML documents. HTML 4 is also listed for comparison.
Media type | HTML 4 | XHTML 1.0 (HTML compatible) | XHTML 1.0 (other) | XHTML Basic / 1.1 | XHTML+MathML |
---|---|---|---|---|---|
text/html | SHOULD | MAY | SHOULD NOT | SHOULD NOT | SHOULD NOT |
application/xhtml+xml | MUST NOT | SHOULD | SHOULD | SHOULD | SHOULD |
application/xml | MUST NOT | MAY | MAY | MAY | MAY |
text/xml | MUST NOT | MAY | MAY | MAY | MAY |
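For tooling that needs to consult these recommendations, the table can be transcribed into a lookup structure. The following Python sketch copies the table cell by cell; the names `DOCUMENT_TYPES`, `RECOMMENDATIONS` and `recommendation` are hypothetical.

```python
# Column order matches the summary table.
DOCUMENT_TYPES = (
    "HTML 4",
    "XHTML 1.0 (HTML compatible)",
    "XHTML 1.0 (other)",
    "XHTML Basic / 1.1",
    "XHTML+MathML",
)

# One row per media type, transcribed from the summary table.
RECOMMENDATIONS = {
    "text/html": ("SHOULD", "MAY", "SHOULD NOT", "SHOULD NOT", "SHOULD NOT"),
    "application/xhtml+xml": ("MUST NOT", "SHOULD", "SHOULD", "SHOULD", "SHOULD"),
    "application/xml": ("MUST NOT", "MAY", "MAY", "MAY", "MAY"),
    "text/xml": ("MUST NOT", "MAY", "MAY", "MAY", "MAY"),
}


def recommendation(media_type, document_type):
    """Look up the requirement level for serving document_type as media_type."""
    column = DOCUMENT_TYPES.index(document_type)
    return RECOMMENDATIONS[media_type][column]
```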
This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.
Be aware that processing instructions are rendered on some user agents. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML, and therefore may not render the document as expected. For compatibility with these types of legacy browsers, you may want to avoid using processing instructions and XML declarations. Remember, however, that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16.
Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) do not use the minimized form (e.g. use <p> </p> and not <p />).
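A serializer can apply the two guidelines on empty elements mechanically. The Python sketch below is one possible approach, not a complete XHTML serializer: the function name `empty_instance` is hypothetical, and the set lists the elements declared EMPTY in the XHTML 1.0 Strict DTD (the Transitional and Frameset DTDs add a few more, which are omitted here).

```python
from xml.sax.saxutils import quoteattr

# Elements whose content model is EMPTY in XHTML 1.0 Strict.
EMPTY_ELEMENTS = frozenset(
    {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
)


def empty_instance(name, attrs=None):
    """Serialize an element that has no content.

    EMPTY elements use the minimized form with a space before '/>';
    every other element gets explicit start and end tags.
    """
    attr_text = "".join(
        f" {key}={quoteattr(value)}" for key, value in (attrs or {}).items()
    )
    if name in EMPTY_ELEMENTS:
        return f"<{name}{attr_text} />"
    return f"<{name}{attr_text}></{name}>"
```

With this sketch, `empty_instance("br")` produces `<br />`, while `empty_instance("p")` produces `<p></p>` rather than the minimized `<p />`.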
Use external style sheets if your style sheet uses < or & or ]]> or --. Use external scripts if your script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible is likely to not work as expected in XML-based user agents.
Avoid line breaks and multiple white space characters within attribute values. These are handled inconsistently by user agents.
Don't include more than one isindex element in the document head. The isindex element is deprecated in favor of the input element.
lang and xml:lang Attributes
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.
In XML, URI-references [ RFC2396 ] that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g., the id attribute in HTML 4. Many existing HTML clients don't support the use of ID-type attributes in this way, so identical values may be supplied for both of these attributes to ensure maximum forward and backward compatibility (e.g., <a id="foo" name="foo">...</a>).
Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.3, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs. Because of this change, care must be taken when converting existing HTML documents. The values of these attributes must be unique within the document, valid, and any references to these fragment identifiers (both internal and external) must be updated should the values be changed during conversion.
Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [ HTML4 ] for more information.
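The backward-compatible pattern given above can be checked directly with a regular expression. In this minimal Python sketch the function name `is_backward_compatible_id` is hypothetical; the pattern itself is copied from the text.

```python
import re

# Pattern for backward-compatible fragment identifiers, as given in the text.
_BACKWARD_COMPATIBLE_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_backward_compatible_id(value):
    """Return True if value is safe to use as both an id and a name."""
    # fullmatch ensures the whole string matches, not just a prefix.
    return _BACKWARD_COMPATIBLE_ID.fullmatch(value) is not None
```

A conversion tool could run such a check over every id and name value and report those that need renaming, remembering that references to renamed fragment identifiers must be updated as well.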
Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions.
Historically, the character encoding of an HTML document is either specified by a web server via the charset parameter of the HTTP Content-Type header, or via a meta element in the document itself. In an XML document, the character encoding of the document is specified on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>). In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both an XML declaration with an encoding declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). In XHTML-conforming user agents, the value of the encoding declaration of the XML declaration takes precedence.
Note: be aware that if a document must include the character encoding declaration in a meta http-equiv statement, that document may always be interpreted by HTTP servers and/or user agents as being of the internet media type defined in that statement. If a document is to be served as multiple media types, the HTTP server must be used to set the encoding of the document.
Some HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
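To show what the full (non-minimized) form looks like for the attributes listed above, the following Python sketch expands a boolean attribute name into the form XML requires. The names `BOOLEAN_ATTRIBUTES` and `full_form` are hypothetical; the attribute list is copied from the text.

```python
# Boolean attributes named in the guideline above.
BOOLEAN_ATTRIBUTES = frozenset({
    "compact", "nowrap", "ismap", "declare", "noshade", "checked",
    "disabled", "readonly", "multiple", "selected", "noresize", "defer",
})


def full_form(name):
    """Expand a minimized boolean attribute, e.g. checked -> checked="checked"."""
    name = name.lower()  # XHTML attribute names are lower-case
    if name not in BOOLEAN_ATTRIBUTES:
        raise ValueError(f"not a boolean attribute: {name}")
    return f'{name}="{name}"'
```

For instance, the HTML 4 markup `<input type="checkbox" checked>` would become `<input type="checkbox" checked="checked" />` in XHTML.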
The Document Object Model level 1 Recommendation [ DOM ] defines document object model interfaces for XML and HTML 4. The HTML 4 document object model specifies that HTML element and attribute names are returned in upper-case. The XML document object model specifies that element and attribute names are returned in the case they are specified. In XHTML 1.0, elements and attributes are specified in lower-case. This apparent difference can be addressed in two ways:
Software that accesses XHTML documents served as text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper-case from those interfaces.
Software that accesses XHTML documents served as text/xml, application/xml, or application/xhtml+xml can also use the XML DOM. Elements and attributes will be returned in lower-case.
Also, some XHTML elements may or may not appear in the object tree because they are optional in the content model (e.g. the tbody element within table). This occurs because in HTML 4 some elements were permitted to be minimized such that their start and end tags are both omitted (an SGML feature). This is not possible in XML. Rather than require document authors to insert extraneous elements, XHTML has made the elements optional. User agents need to adapt to this accordingly. For further information on this topic, see [ DOM2 ].
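The XML DOM behavior described in this guideline can be observed with a generic XML parser. The Python sketch below uses the standard library's `xml.dom.minidom` as a stand-in for an XML-based user agent; the helper name `element_names` and the sample markup are made up for this example.

```python
from xml.dom.minidom import parseString

SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><table><tr><td>cell</td></tr></table></body>"
    "</html>"
)


def element_names(xml_text):
    """Return the tag names of all elements, in document order."""
    document = parseString(xml_text)
    # "*" matches every element; names come back exactly as written.
    return [element.tagName for element in document.getElementsByTagName("*")]
```

For the sample document, the names are returned in lower-case as specified in the source, and no tbody element appears in the tree because the XML parser does not infer omitted elements. Scripts and style sheets that assume an implied tbody therefore need to handle both shapes of the tree.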
In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification.
In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.
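When markup is generated by a program, the escaping of ampersands is best left to a library routine. The following Python sketch uses `escape` and `quoteattr` from the standard library's `xml.sax.saxutils`; the helper name `link` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def link(url, text):
    """Build an a element with the URL and link text safely escaped."""
    # quoteattr escapes &, < and > and adds the surrounding quotes;
    # escape does the same for character data, without quotes.
    return f"<a href={quoteattr(url)}>{escape(text)}</a>"
```

Applied to the CGI example above, `link("http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user", "Guest & user")` writes the ampersand in both the attribute value and the link text as `&amp;`.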
The Cascading Style Sheets level 2 Recommendation [ CSS2 ] defines style properties which are applied to the parse tree of the HTML or XML documents. Differences in parsing will produce different visual or aural results, depending on the selectors used. The following hints will reduce this effect for documents which are served without modification as both media types:
In HTML 4 and XHTML, the style element can be used to define document-internal style rules. In XML, an XML stylesheet declaration is used to define style rules. In order to be compatible with this convention, style elements should have their fragment identifier set using the id attribute, and an XML stylesheet declaration should reference this fragment. For example:

    <?xml-stylesheet href="http://www.w3.org/StyleSheets/TR/W3C-REC.css" type="text/css"?>
    <?xml-stylesheet href="#internalStyle" type="text/css"?>
    <!DOCTYPE html
         PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>An internal stylesheet example</title>
    <style type="text/css" id="internalStyle">
      code {
        color: green;
        font-family: monospace;
        font-weight: bold;
      }
    </style>
    </head>
    <body>
    <p>
      This is text that uses our
      <code>internal stylesheet</code>.
    </p>
    </body>
    </html>
Some characters that are legal in HTML documents are illegal in XML documents. For example, in HTML the form feed character (U+000C) is treated as white space, whereas in XHTML it is illegal because of XML's definition of characters.
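Authors converting existing HTML content can scan for such characters before serving it as XML. The Python sketch below tests text against the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF); the function name `find_illegal_xml_chars` is hypothetical.

```python
import re

# Matches any character outside the XML 1.0 Char production.
_ILLEGAL_XML_CHAR = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters not allowed in XML 1.0."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _ILLEGAL_XML_CHAR.finditer(text)
    ]
```

For a string containing a form feed, such as `"a\x0cb"`, the function reports the offending character at offset 1 as U+000C, whereas tabs and line breaks pass unreported.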
The named character reference &apos; (the apostrophe, U+0027) was introduced in XML 1.0 but does not appear in HTML. Authors should therefore use &#39; instead of &apos; to work as expected in HTML 4 user agents.
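For existing markup, the substitution can be applied with a simple text replacement. The Python sketch below is a naive illustration with a hypothetical function name, `make_apos_html_safe`; it assumes the input contains no CDATA sections or comments in which the literal text `&apos;` must be preserved.

```python
def make_apos_html_safe(markup):
    """Replace the XML-only reference &apos; with the numeric &#39;.

    Naive textual replacement: assumes no CDATA sections or comments
    that contain the literal text '&apos;'.
    """
    return markup.replace("&apos;", "&#39;")
```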
" HTML 4.01 Specification ", W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds. , 24 December 1999. Available at: http://www.w3.org/TR/1999/REC-html401-19991224
The latest version of HTML 4.01 is available at: http://www.w3.org/TR/html401
The latest version of HTML 4 is available at: http://www.w3.org/TR/html4
" Mathematical Markup Language (MathML) Version 2.0 ", W3C Recommendation, D. Carlisle, P. Ion, R. Miner, N. Poppelier, eds. , 21 February 2001. Available at: http://www.w3.org/TR/2001/REC-MathML2-20010221
The latest version is available at: http://www.w3.org/TR/MathML2
The W3C Markup Validation Service available at http://validator.w3.org.
" XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition): A Reformulation of HTML 4 in XML 1.0 ", W3C Recommendation, S. Pemberton et al. , August 2002. Available at: http://www.w3.org/TR/2002/REC-xhtml1-20020801
The first edition is available at: http://www.w3.org/TR/2000/REC-xhtml1-20000126
The latest version is available at: http://www.w3.org/TR/xhtml1
" XHTML™ 1.1 - Module-based XHTML ", W3C Recommendation, M. Altheim, S. McCarron, eds. , 31 May 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml11-20010531
The latest version is available at: http://www.w3.org/TR/xhtml11
" XHTML™ Basic ", W3C Recommendation, M. Baker, M. Ishikawa, S. Matsui, P. Stark, T. Wugofski, T. Yamakami, eds. , 19 December 2000. Available at: http://www.w3.org/TR/2000/REC-xhtml-basic-20001219
The latest version is available at: http://www.w3.org/TR/xhtml-basic
" Modularization of XHTML™ ", W3C Recommendation, M. Altheim, F. Boumphrey, S. Dooley, S. McCarron, S. Schnitzenbaumer, T. Wugofski, eds. , 10 April 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410
The latest version is at: http://www.w3.org/TR/xhtml-modularization
" Extensible Markup Language (XML) 1.0 Specification (Second Edition) ", T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, eds. , 6 October 2000. Available at: http://www.w3.org/TR/2000/REC-xml-20001006
The latest version is available at: http://www.w3.org/TR/REC-xml
" Namespaces in XML ", T. Bray, D. Hollander, A. Layman, eds. , 14 January 1999. Available at: http://www.w3.org/TR/1999/REC-xml-names-19990114
The latest version is available at: http://www.w3.org/TR/REC-xml-names
" Associating Style Sheets with XML documents Version 1.0 ", W3C Recommendation, J. Clark, ed. , 29 June 1999. Available at: http://www.w3.org/1999/06/REC-xml-stylesheet-19990629
The latest version is available at: http://www.w3.org/TR/xml-stylesheet
In 3.5. Summary , changed 'text/html' for HTML 4 as SHOULD rather than MAY .
Updated reference to XHTML 1.0 to refer to the Second Edition.