W3C

XHTML Media Types - Second Edition

W3C Note HERELONGDATEHERE

This version:
http://www.w3.org/TR/2008/NOTE-xhtml-media-types-HEREDATEHERE
Latest version:
http://www.w3.org/TR/xhtml-media-types
Previous version:
http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801
Diff from previous version:
xhtmlmime-diff.html
Editor:
Shane McCarron, Applied Testing and Technology, Inc.
First Edition Editor:
石川 雅康 (Ishikawa Masayasu), W3C

Abstract

This document summarizes the current best practice for using various Internet media types when serving XHTML Family documents to relatively modern user agents - even those that do not yet support XHTML natively. In summary, 'application/xhtml+xml' SHOULD be used for XHTML Family documents, and the use of 'text/html' SHOULD be limited to HTML-compatible XHTML Family documents intended for delivery to user agents that do not explicitly accept 'application/xhtml+xml'. 'application/xml' and 'text/xml' MAY also be used, but whenever appropriate, 'application/xhtml+xml' or 'text/html' SHOULD be used rather than those generic XML media types.

Note that, because of the lack of explicit support for XHTML (and XML in general) in some user agents, only very careful construction of documents can ensure their portability (see Appendix A). If you do not require the advanced features of XHTML Family markup languages (e.g., XML DOM, XML Validation, extensibility via XHTML Modularization, semantic markup via XHTML+RDFa, Assistive Technology access via the XHTML Role and XHTML Access modules, etc.), you may want to consider using HTML 4.01 [HTML] in order to reduce the risk that content will not be portable to legacy user agents. Even in that case authors can help ensure their portability AND ease their eventual migration to the XHTML Family by ensuring their documents are valid [VALIDATOR] and by following the relevant guidelines in Appendix A.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Note made available by the World Wide Web Consortium (W3C) for your information. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members.

This document has been produced by the W3C XHTML 2 Working Group as part of the HTML Activity. The goals of the XHTML 2 Working Group are discussed in the XHTML 2 Working Group charter. The document represents working group consensus on the usage of Internet media types for various XHTML Family documents. However, this document is not intended to be a normative specification. Instead, it documents a set of recommendations to maximize the interoperability of XHTML documents with regard to Internet media types. This document does not address general issues on media types and namespaces.

Comments on this document may be sent to www-html-editor@w3.org (archive). Public discussion on this document may take place on the mailing list www-html@w3.org (archive).

1. Introduction

XHTML 1.0 [XHTML1] reformulated HTML 4 [HTML4] as an XML application, and Modularization of XHTML [XHTMLM12N] provided a means to define XHTML-based markup languages using XHTML modules, collectively called the "XHTML Family". However, for historical reasons, the recommended way to serve such XHTML Family documents, in particular with regard to Internet media types, has been somewhat unclear.

After the publication of [XHTML1], an RFC for XML media types was revised and published as RFC 3023 [RFC3023], and it introduced the '+xml' suffix convention for XML-based media types. The 'application/xhtml+xml' media type [RFC3236] was registered following that convention. Now there are at least four possibilities on media type labeling for XHTML Family documents - 'text/html', 'application/xhtml+xml', and generic XML media types 'application/xml' and 'text/xml'.

This document summarizes the current best practice for using those various Internet media types for XHTML Family documents.

2. Terms and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

XHTML
The Extensible HyperText Markup Language. XHTML is not the name of a single, monolithic markup language, but the name of a family of document types which collectively form a family of related markup languages. The namespace URI for the XHTML family is http://www.w3.org/1999/xhtml.
XHTML Family document type
A document type which belongs to the family of XHTML document types. Such document types include [XHTML1], and XHTML Host Language document types such as XHTML 1.1 [XHTML11] and XHTML Basic [XHTMLBasic]. Elements and attributes in those document types belong to the XHTML namespace (except those from the XML namespace, such as xml:lang), but an XHTML Family document type MAY also include elements and attributes from other namespaces, such as MathML [MathML2].
XHTML Host Language document type
A document type which conforms to the "XHTML Host Language Document Type Conformance" as defined in section 3.1 of [XHTMLM12N].
XHTML Integration Set document type
A document type which conforms to the "XHTML Integration Set Document Type Conformance" as defined in section 3.2 of [XHTMLM12N].

3. Recommended Media Type Usage

This section summarizes which Internet media type SHOULD be used for which XHTML Family document for which purpose.

3.1. 'text/html'

The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML is carefully constructed (see Appendix A). In particular, 'text/html' is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].

XHTML documents served as 'text/html' will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see guidelines 11 and 13).

Authors should also be careful about character encoding issues. A typical misunderstanding is that since an XHTML document is an XML document, the character encoding of an XHTML document should be treated as UTF-8 or UTF-16 in the absence of explicit character encoding information. This is NOT the case when an XHTML document is served as 'text/html'. "6. Charset default rules" of [RFC2854] notes as follows:

The use of an explicit charset parameter is strongly recommended. While [MIME] specifies "The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII." [HTTP] Section 3.7.1, defines that "media subtypes of the 'text' type are defined to have a default charset value of 'ISO-8859-1'". Section 19.3 of [HTTP] gives additional guidelines. Using an explicit charset parameter will help avoid confusion.

Using an explicit charset parameter also takes into account that the overwhelming majority of deployed browsers are set to use something else than 'ISO-8859-1' as the default; the actual default is either a corporate character encoding or character encodings widely deployed in a certain national or regional community. For further considerations, please also see Section 5.2 of [HTML40].

"5.2.2 Specifying the character encoding" of the HTML 4 specification [HTML4] also notes that user agents must not assume any default value for the "charset" parameter. Therefore, authors SHOULD NOT assume any default value for an XHTML document served as 'text/html', and as mentioned in [RFC2854], the use of an explicit charset parameter is STRONGLY RECOMMENDED. When it is difficult to specify an explicit charset parameter through a higher-level protocol (e.g., HTTP), authors SHOULD include the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />). See guideline 9 for details.

3.2. 'application/xhtml+xml'

The 'application/xhtml+xml' media type [RFC3236] is the primary media type for XHTML Family document types, and in particular it is suitable for all XHTML Host Language document types. XHTML Family document types suitable for this media type include [XHTML1], [XHTMLBasic], [XHTML11] and [XHTML+MathML]. An XHTML Host Language document type that adds elements and attributes from foreign namespaces MAY identify its profile with the 'profile' optional parameter or other means such as the "Content-features" MIME header described in RFC 2912 [RFC2912]. Each namespace SHOULD be explicitly identified through namespace declaration [XMLNS]. This document does not preclude the registration of a dedicated media type for a specific XHTML Host Language document type.

In general, this media type is NOT suitable for XHTML Integration Set document types. This document does not define which media type should be used for XHTML Integration Set document types.

'application/xhtml+xml' SHOULD be used for serving XHTML documents to XHTML user agents (agents that explicitly indicate their support for this media type). Authors who wish to support both XHTML and HTML user agents MAY utilize content negotiation by serving carefully constructed XHTML documents both as 'text/html' and as 'application/xhtml+xml'. Alternatively, authors may serve HTML versions of such documents as 'text/html' and XHTML versions as 'application/xhtml+xml'. Also note that it is not necessary for XHTML documents served as 'application/xhtml+xml' to follow the HTML Compatibility Guidelines.
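
The content negotiation described above can be sketched roughly as follows. This is an illustrative sketch only, not part of any specification: the function names parse_accept and choose_media_type are hypothetical, and the rule applied (serve 'application/xhtml+xml' only when the Accept header lists it explicitly with a non-zero q value) is one assumed reading of "explicitly indicate their support".

```python
def parse_accept(header):
    """Return {media_range: q} parsed from an HTTP Accept header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_media_type(accept_header):
    """Pick a media type for an HTML-compatible XHTML document (hypothetical helper)."""
    prefs = parse_accept(accept_header or "")
    # Assumption: only an explicit listing counts. Wildcards such as */*
    # are not taken as support for 'application/xhtml+xml'.
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_media_type("text/html, */*"))
```

A server that varies its response in this way would normally also send "Vary: Accept" so that caches keep the two representations apart.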

When serving an XHTML document with this media type, authors MAY include the XML stylesheet processing instruction [XMLstyle] to associate style sheets. This is not generally necessary when documents are to be processed by XHTML-aware user agents, but generic XML document processors may handle such processing instructions.

As for character encoding issues, as mentioned in "6. Charset default rules" of [RFC3236], 'application/xhtml+xml' has the same considerations as 'application/xml'. See section 3.3 for details.

3.3. 'application/xml'

The 'application/xml' media type [RFC3023] is a generic media type for XML documents, and the definition of 'application/xml' does not preclude serving XHTML documents as that media type. Any XHTML Family document MAY be served as 'application/xml'.

However, authors should be aware that such a document may not always be processed as XHTML (e.g. hyperlinks may not be recognized), depending on user agents. Generic XML processors might recognize it as just an XML document which includes elements and attributes from the XHTML namespace (and others), and may not have a priori knowledge of what to do with such a document beyond what they can do for generic XML documents.

Authors SHOULD explicitly identify the XHTML namespace through the namespace declaration when they serve an XHTML Family document as 'application/xml' to improve the chance of reliable processing. The XML stylesheet PI SHOULD be used to associate style sheets.

Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'application/xml'.

As for character encoding issues, "3.2 Application/xml Registration" of [RFC3023] says that the use of the charset parameter is STRONGLY RECOMMENDED, and also specifies a rule that [i]f an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header. This means that conforming XML processors MUST follow the requirements described in section 4.3.3 of [XML10].

Therefore, while it is STRONGLY RECOMMENDED to specify an explicit charset parameter through a higher-level protocol, authors SHOULD include the XML declaration (e.g. <?xml version="1.0" encoding="EUC-JP"?>). Note that a meta http-equiv statement will not be recognized by XML processors, and while authors MAY include such a statement in an XHTML document served as 'application/xml', it will not affect processing of the document since the higher-level protocol and the XML declaration both take precedence.

3.4. 'text/xml'

The 'text/xml' media type [RFC3023] is another generic media type for XML documents, and the definition of 'text/xml' does not preclude serving XHTML documents as that media type, either. Any XHTML Family document MAY be served as 'text/xml'. The considerations for 'application/xml' also apply to 'text/xml'. Whenever appropriate, 'application/xhtml+xml' SHOULD be used rather than 'text/xml'.

Authors should also be aware of the difference between 'application/xml' (and for that matter 'application/xhtml+xml' as well) and 'text/xml' with regard to the treatment of character encoding. According to "3.1 Text/xml Registration" of [RFC3023], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii" [ASCII]. This default value is authoritative over the encoding information specified in the XML declaration, or the XML default encodings of UTF-8 and UTF-16 when no encoding declaration is supplied, so omitting the charset parameter of a 'text/xml' entity might cause an unexpected result. As mentioned in [RFC3023], the use of the charset parameter is STRONGLY RECOMMENDED.
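
The charset rules discussed in sections 3.1 through 3.4 can be summarized in a small sketch. This is illustrative only: the function effective_charset and its parameters are hypothetical names, and the placeholder string returned for the XML default case merely stands for "UTF-8 or UTF-16, as determined by section 4.3.3 of [XML10]".

```python
def effective_charset(media_type, charset_param=None, xml_encoding=None):
    """Return the charset a receiver would assume, or None if none is defined.

    charset_param is the charset parameter of the Content-Type header;
    xml_encoding is the encoding named in the XML declaration, if any.
    """
    if charset_param:
        # An explicit charset parameter is authoritative for all four types.
        return charset_param.lower()
    if media_type == "text/xml":
        # RFC 3023, section 3.1: the default is us-ascii, and it takes
        # precedence over the encoding given in the XML declaration.
        return "us-ascii"
    if media_type in ("application/xml", "application/xhtml+xml"):
        # RFC 3023, section 3.2: fall back to XML 1.0, section 4.3.3,
        # i.e. the encoding declaration, else the XML default encodings.
        return xml_encoding.lower() if xml_encoding else "utf-8 or utf-16"
    if media_type == "text/html":
        # Section 3.1 of this document: no default value may be assumed.
        return None
    raise ValueError("unsupported media type: " + media_type)


print(effective_charset("text/xml", xml_encoding="EUC-JP"))         # us-ascii
print(effective_charset("application/xml", xml_encoding="EUC-JP"))  # euc-jp
```

The first call shows why omitting the charset parameter of a 'text/xml' entity can give an unexpected result: the encoding declaration is ignored.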

3.5. Summary

The following table summarizes the recommendations to content authors for labeling XHTML documents. HTML 4 is also listed for comparison.

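Media types summary for serving XHTML documents

Media type            | HTML 4   | XHTML Family (HTML compatible) | XHTML Family (other) | XHTML Family + Extensions
----------------------+----------+--------------------------------+----------------------+--------------------------
text/html             | SHOULD   | MAY                            | SHOULD NOT           | SHOULD NOT
application/xhtml+xml | MUST NOT | MAY                            | SHOULD               | SHOULD
application/xml       | MUST NOT | MAY                            | MAY                  | MAY
text/xml              | MUST NOT | MAY                            | MAY                  | MAY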

Appendix A. Compatibility Guidelines

This appendix summarizes design guidelines for authors who wish their XHTML documents to render on both XHTML-aware and modern HTML user agents. Our goal in providing these guidelines is to supply a simple collection that, if followed, will give reasonable, predictable results in modern user agents. Document authors should treat these as best practices that were considered correct at the time this document was published. Like all of this document, this Appendix is informative. It contains no absolute requirements, and should NEVER be used as the basis for creating conformance rules of any sort. Period.

A.1. Processing Instructions and the XML Declaration

DO NOT include XML processing instructions NOR the XML declaration.

Rationale: Some legacy user agents render XML processing instructions. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML. Such user agents may not render the document as expected. For compatibility with these types of legacy browsers, you may want to avoid using processing instructions and XML declarations. Remember, however, that when the XML declaration is not included in a document, AND the character encoding is not specified by a higher level protocol such as HTTP, the document can only use the default character encodings UTF-8 or UTF-16.
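
As a rough illustration of how an author might check a document against this guideline, the following sketch lists the targets of any processing instructions (including the XML declaration) found in a document. It uses Python's standard html.parser module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ProcessingInstructionFinder(HTMLParser):
    """Collects the target name of every <?...> construct encountered."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_pi(self, data):
        # data is the text between "<?" and ">", e.g. 'xml version="1.0"?'
        words = data.split(None, 1)
        self.targets.append(words[0] if words else "")


def processing_instruction_targets(markup):
    finder = ProcessingInstructionFinder()
    finder.feed(markup)
    finder.close()
    return finder.targets


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet href="a.css" type="text/css"?>\n'
    "<html><head><title>t</title></head><body></body></html>"
)
print(processing_instruction_targets(sample))
```

An empty result would indicate that the document contains neither an XML declaration nor other processing instructions.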

Original text:

Be aware that processing instructions are rendered on some user agents. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML, and therefore may not render the document as expected. For compatibility with these types of legacy browsers, you may want to avoid using processing instructions and XML declarations. Remember, however, that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16.

A.2. Empty Elements

DO include a space before the trailing / and > of elements that have no content (e.g., <br />, <hr /> and <img src="karen.jpg" alt="Karen" />).

Rationale: Legacy user agents ignore the  /> at the end of a tag, but without it they may incorrectly parse the tag or its attributes.

DO use the "minimized" tag syntax for empty elements (e.g., <br />).

Rationale: Legacy user agents may give uncertain results when using the alternative syntax <br></br> allowed by XML.

Original text:

Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.

A.3. Element Minimization and Empty Element Content

DO NOT use the "minimized" form of elements that are permitted to have content (e.g., DO NOT express an empty paragraph as <p />).

Rationale: HTML user agents are not aware of the XML-permitted minimization notation.
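
A check for this guideline might look like the following sketch, which reports elements written in minimized form although their content model is not EMPTY. The names are hypothetical, and the set of EMPTY elements is taken here from the XHTML 1.0 DTDs (Strict, Transitional and Frameset) as an assumption about which document types are in use.

```python
from html.parser import HTMLParser

# Elements declared EMPTY in the XHTML 1.0 DTDs (assumed document types).
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


class MinimizedTagFinder(HTMLParser):
    """Records minimized tags (<x />) for elements that may have content."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_startendtag(self, tag, attrs):
        # Called only for tags written in the <x ... /> form.
        if tag not in EMPTY_ELEMENTS:
            self.offenders.append(tag)


def minimized_non_empty_elements(markup):
    finder = MinimizedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.offenders


print(minimized_non_empty_elements('<p /><br /><div class="x" /><p> </p>'))
```

Here <br /> is accepted because br is an EMPTY element, while <p /> and <div class="x" /> are reported.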

Original text:

Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) do not use the minimized form (e.g. use <p> </p> and not <p />).

A.4. Embedded Style Sheets and Scripts

DO use external style sheets if your style sheet uses < or & or ]]> or --.

DO use external scripts if your script uses < or & or ]]> or --.

Rationale: XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible may not work as expected in XML-based user agents.

@@@@Ed. Note: Is this really an issue? We are giving advice about modern browsers and text/html compatibility here. Are there any browsers that would actually eat the content of comments?

Original text:

Use external style sheets if your style sheet uses < or & or ]]> or --. Use external scripts if your script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible is likely to not work as expected in XML-based user agents.

A.5. Line Breaks within Attribute Values

DO NOT use line breaks and multiple white space characters within attribute values.

Rationale: These are handled inconsistently by user agents.
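
The following sketch shows one way to locate attribute values that contain line breaks or runs of white space. It is illustrative only; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# A line break anywhere, or two or more consecutive white space characters.
SUSPECT_WHITESPACE = re.compile(r"[\r\n]|\s{2,}")


class AttributeWhitespaceFinder(HTMLParser):
    """Records (element, attribute) pairs whose value has suspect white space."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <x ... /> tags via the default handle_startendtag.
        for name, value in attrs:
            if value is not None and SUSPECT_WHITESPACE.search(value):
                self.offenders.append((tag, name))


def attributes_with_suspect_whitespace(markup):
    finder = AttributeWhitespaceFinder()
    finder.feed(markup)
    finder.close()
    return finder.offenders


sample = (
    '<p title="two  spaces">x</p>'
    '<img alt="line\nbreak" src="a.png" />'
    '<a href="ok">y</a>'
)
print(attributes_with_suspect_whitespace(sample))
```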

Original text:

Avoid line breaks and multiple white space characters within attribute values. These are handled inconsistently by user agents.

A.6. isindex

DO NOT include more than one isindex element in the document head.

Rationale: The isindex element is deprecated in favor of the input element.

@@@@Ed. Note: This seems silly. No one uses isindex, and in any event it was NEVER permitted to have more than one of them. We should just remove this guideline altogether. It has nothing to do with text/html compatibility as far as I know.

Original text:

Don't include more than one isindex element in the document head. The isindex element is deprecated in favor of the input element.

A.7. The lang and xml:lang Attributes

DO use the xml:lang attribute when specifying the language of an element.

Rationale: While XHTML 1.0 included both of these attributes, in modern user agents the xml:lang attribute can be used as an indicator for assistive technologies AND as a selection mechanism for CSS2 stylesheets. Modern implementations of these two technologies treat xml:lang correctly even in documents delivered as media type text/html.

Original text:

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.

A.8. Fragment Identifiers

DO use the id attribute to identify elements.

DO ensure that the values used for the id attribute are limited to the pattern [A-Za-z][A-Za-z0-9:_.-]*.

Rationale: In HTML 3.2 and earlier the name attribute on some elements could be used to define an anchor, but HTML 4 introduced the id attribute. In an XML dialect, only attributes with type ID are permitted to be used as anchors, and the id attribute is defined to be of type ID. Relying upon the id attribute as an anchor will work well in modern HTML and XHTML-aware user agents.
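
The pattern given in this guideline can be applied directly as a regular expression. The sketch below is a minimal illustration; the function name is_portable_id is hypothetical.

```python
import re

# The pattern from this guideline, matched against the whole value.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_portable_id(value):
    """Return True if value is usable as a backward-compatible id."""
    return ID_PATTERN.fullmatch(value) is not None


for candidate in ("section-1", "note:2.b_c", "1st", "two words", ""):
    print(repr(candidate), is_portable_id(candidate))
```

Values that start with a digit, contain white space, or are empty are rejected.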

Original text:

In XML, URI-references [RFC2396] that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g., the id attribute in HTML 4. Many existing HTML clients don't support the use of ID-type attributes in this way, so identical values may be supplied for both of these attributes to ensure maximum forward and backward compatibility (e.g., <a id="foo" name="foo">...</a>).

Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.3, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs. Because of this change, care must be taken when converting existing HTML documents. The values of these attributes must be unique within the document, valid, and any references to these fragment identifiers (both internal and external) must be updated should the values be changed during conversion.

Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information.

Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions.

A.9. Character Encoding

DO set the character encoding for a document via the charset parameter of the HTTP Content-Type header. When this is not possible, DO set the encoding via a "meta http-equiv" statement in the document (e.g., <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />). However, note that doing so will explicitly bind the document to a single content type.

Rationale: Since these guidelines already recommend that documents NOT contain the XML declaration, setting the encoding via the HTTP header is the only reliable mechanism compatible with legacy and XML user agents. When that mechanism is not available, the only portable fallback is the "meta http-equiv" statement.

Original text:

Historically, the character encoding of an HTML document is either specified by a web server via the charset parameter of the HTTP Content-Type header, or via a meta element in the document itself. In an XML document, the character encoding of the document is specified on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>). In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both the XML declaration an encoding declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). In XHTML-conforming user agents, the value of the encoding declaration of the XML declaration takes precedence.

Note: be aware that if a document must include the character encoding declaration in a meta http-equiv statement, that document may always be interpreted by HTTP servers and/or user agents as being of the internet media type defined in that statement. If a document is to be served as multiple media types, the HTTP server must be used to set the encoding of the document.

A.10. Boolean Attributes

DO use the full form for boolean attributes, as required by XML (e.g., disabled="disabled").

Rationale: The compact form of these attributes is not well-formed XML, and therefore invalid.
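
As an illustration, the sketch below reports boolean attributes that are not written in their full form. The names are hypothetical, and the attribute list is the one given in the original XHTML 1.0 guideline text.

```python
from html.parser import HTMLParser

BOOLEAN_ATTRIBUTES = {
    "compact", "nowrap", "ismap", "declare", "noshade", "checked",
    "disabled", "readonly", "multiple", "selected", "noresize", "defer",
}


class BooleanAttributeFinder(HTMLParser):
    """Records boolean attributes that are not written as name="name"."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A minimized attribute is reported with the value None; any
            # value other than the attribute's own name is flagged as well.
            if name in BOOLEAN_ATTRIBUTES and value != name:
                self.offenders.append((tag, name))


def minimized_boolean_attributes(markup):
    finder = BooleanAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.offenders


sample = (
    '<input type="checkbox" checked disabled="disabled" />'
    "<option selected>A</option>"
)
print(minimized_boolean_attributes(sample))
```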

Original text:

Some HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.

A.11. Document Object Model and XHTML

DO rely upon the HTML 4 DOM as defined in The Document Object Model level 1 Recommendation [DOM] for scripting. This means, in particular, that the names of elements and attributes will be returned (from functions that return such things) in upper case.

Rationale: Using the HTML DOM will result in maximum portability of scripts, since the HTML DOM is supported in both HTML and XHTML documents in modern user agents.

@@@@Ed. Note: Is this really true?

Original text:

The Document Object Model level 1 Recommendation [DOM] defines document object model interfaces for XML and HTML 4. The HTML 4 document object model specifies that HTML element and attribute names are returned in upper-case. The XML document object model specifies that element and attribute names are returned in the case they are specified. In XHTML 1.0, elements and attributes are specified in lower-case. This apparent difference can be addressed in two ways:

  1. User agents that access XHTML documents served as Internet media type text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper-case from those interfaces.
  2. User agents that access XHTML documents served as Internet media types text/xml, application/xml, or application/xhtml+xml can also use the XML DOM. Elements and attributes will be returned in lower-case. Also, some XHTML elements may or may not appear in the object tree because they are optional in the content model (e.g. the tbody element within table). This occurs because in HTML 4 some elements were permitted to be minimized such that their start and end tags are both omitted (an SGML feature). This is not possible in XML. Rather than require document authors to insert extraneous elements, XHTML has made the elements optional. User agents need to adapt to this accordingly. For further information on this topic, see [DOM2]

A.12. Using Ampersands in Attribute Values (and Elsewhere)

DO ensure that the reserved character & is included in content in its escaped form &amp;.

Rationale: If ampersands are not encoded, the characters after them up to the next semicolon can be interpreted as the name of an entity by the user agent.
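
When markup is generated by a program, the escaping can be delegated to a library routine. The sketch below uses escape from Python's standard html module; the wrapper name attribute_safe is hypothetical, and the URL is the example used in the original guideline text.

```python
from html import escape


def attribute_safe(value):
    """Escape &, <, > and quote characters for use in an attribute value."""
    return escape(value, quote=True)


url = "http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user"
print('<a href="%s">guest</a>' % attribute_safe(url))
# prints <a href="http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user">guest</a>
```

Such a routine should be applied only to unescaped text. Applying it to a value that already contains &amp; would escape the ampersand a second time.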

Original text:

In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents - treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.

A.13. Cascading Style Sheets (CSS) and XHTML

DO use lower case element and attribute names in style sheets.

DO create rules that include inferred elements (e.g., the tbody element in a table).

DO explicitly set the style on the html element, since in XHTML the style set on the body element is not automatically reflected there.

Rationale: These simple rules will help increase the portability of CSS rules regardless of the media type the document is processed as.

Original text:

The Cascading Style Sheets level 2 Recommendation [CSS2] defines style properties which are applied to the parse tree of the HTML or XML documents. Differences in parsing will produce different visual or aural results, depending on the selectors used. The following hints will reduce this effect for documents which are served without modification as both media types:

  1. CSS style sheets for XHTML should use lower case element and attribute names.
  2. In tables, the tbody element will be inferred by the parser of an HTML user agent, but not by the parser of an XML user agent. Therefore you should always explicitly add a tbody element if it is referred to in a CSS selector.
  3. Within the XHTML namespace, user agents are expected to recognize the "id" attribute as an attribute of type ID. Therefore, style sheets should be able to continue using the shorthand "#" selector syntax even if the user agent does not read the DTD.
  4. Within the XHTML namespace, user agents are expected to recognize the "class" attribute. Therefore, style sheets should be able to continue using the shorthand "." selector syntax.
  5. CSS defines different conformance rules for HTML and XML documents; be aware that the HTML rules apply to XHTML documents delivered as HTML and the XML rules apply to XHTML documents delivered as XML.

A.14. Referencing Style Elements when serving as XML

DO NOT use XML stylesheet declarations to identify style sheets.

DO use the style or link elements to define stylesheets.

Rationale: Since XML processing instructions may be rendered by some legacy user agents, using the standard XML stylesheet declaration mechanism may not work well. However, since XHTML user agents are required to process style and link elements and interpret stylesheets referenced from those elements, documents constructed to use them will work as expected.

Original text:

In HTML 4 and XHTML, the style element can be used to define document-internal style rules. In XML, an XML stylesheet declaration is used to define style rules. In order to be compatible with this convention, style elements should have their fragment identifier set using the id attribute, and an XML stylesheet declaration should reference this fragment. For example:

<?xml-stylesheet href="http://www.w3.org/StyleSheets/TR/W3C-REC.css" type="text/css"?>
<?xml-stylesheet href="#internalStyle" type="text/css"?>
<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>An internal stylesheet example</title>
<style type="text/css" id="internalStyle">
  code {
	color: green;
	font-family: monospace;
	font-weight: bold;
  }
</style>
</head>
<body>
<p>
  This is text that uses our 
  <code>internal stylesheet</code>.
</p>
</body>
</html>

A.15. White Space Characters in HTML vs. XML

DO NOT use the formfeed character (U+000C).

Rationale: This character is recognized as white space in HTML 4, but is NOT considered white space in XML.
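
A document can be scanned for the formfeed character with a few lines of code. The following is a minimal sketch; the function name formfeed_lines is hypothetical.

```python
def formfeed_lines(text):
    """Return the 1-based numbers of the lines that contain U+000C."""
    # split("\n") is used rather than splitlines(), because splitlines()
    # itself treats the formfeed character as a line boundary.
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if "\x0c" in line
    ]


print(formfeed_lines("<p>one</p>\n<p>two\x0c</p>\n<p>three</p>"))  # [2]
```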

Original text:

Some characters that are legal in HTML documents, are illegal in XML document. For example, in HTML, the Formfeed character (U+000C) is treated as white space, in XHTML, due to XML's definition of characters, it is illegal.

A.16. The Named Character Reference &apos;

DO use &#39; to specify an escaped apostrophe. DO NOT use &apos;.

Rationale: The entity &apos; is not defined in HTML 4.
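
For existing content, the substitution can be done mechanically. The sketch below is illustrative only; the function name is hypothetical, and it assumes that the string &apos; does not occur inside script or style content where it should be left alone.

```python
def replace_apos_reference(markup):
    """Replace the named reference &apos; with the numeric reference &#39;."""
    return markup.replace("&apos;", "&#39;")


print(replace_apos_reference("<p>It&apos;s a test</p>"))
# prints <p>It&#39;s a test</p>
```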

Original text:

The named character reference &apos; (the apostrophe, U+0027) was introduced in XML 1.0 but does not appear in HTML. Authors should therefore use &#39; instead of &apos; to work as expected in HTML 4 user agents.

Appendix B. References

[ASCII]
"Information Systems -- Coded Character Sets -- 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)", ANSI X3.4-1986, 1986.
[HTML4]

"HTML 4.01 Specification", W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds., 24 December 1999. Available at: http://www.w3.org/TR/1999/REC-html401-19991224

The latest version of HTML 4.01 is available at: http://www.w3.org/TR/html401

The latest version of HTML 4 is available at: http://www.w3.org/TR/html4

[HTML40]
"HTML 4.0 Specification", W3C Recommendation, D. Raggett, A. Le Hors, I. Jacobs, eds., 18 December 1997, revised on 24 April 1998. Available at: http://www.w3.org/TR/1998/REC-html40-19980424
[HTTP]
"Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, J. Gettys, R. Fielding, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee, June 1999. Available at: http://www.rfc-editor.org/rfc/rfc2616.txt
[MathML2]

"Mathematical Markup Language (MathML) Version 2.0", W3C Recommendation, D. Carlisle, P. Ion, R. Miner, N. Poppelier, eds., 21 February 2001. Available at: http://www.w3.org/TR/2001/REC-MathML2-20010221

The latest version is available at: http://www.w3.org/TR/MathML2

[MIME]
"Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, N. Freed, N. Borenstein, November 1996. Available at: http://www.rfc-editor.org/rfc/rfc2046.txt
[RFC2119]
"Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, S. Bradner, March 1997. Available at: http://www.rfc-editor.org/rfc/rfc2119.txt
[RFC2854]
"The 'text/html' Media Type", RFC 2854, D. Connolly, L. Masinter, June 2000. Available at: http://www.rfc-editor.org/rfc/rfc2854.txt
[RFC2912]
"Indicating Media Features for MIME Content", RFC 2912, G. Klyne, September 2000. Available at: http://www.rfc-editor.org/rfc/rfc2912.txt
[RFC3023]
"XML Media Types", RFC 3023, M. Murata, S. St.Laurent, D. Kohn, January 2001. Available at: http://www.rfc-editor.org/rfc/rfc3023.txt
[RFC3236]
"The 'application/xhtml+xml' Media Type", RFC 3236, M. Baker, P. Stark, January 2002. Available at: http://www.rfc-editor.org/rfc/rfc3236.txt
[VALIDATOR]

The W3C Markup Validation Service. Available at: http://validator.w3.org

[XHTML1]

"XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition): A Reformulation of HTML 4 in XML 1.0", W3C Recommendation, S. Pemberton et al., August 2002. Available at: http://www.w3.org/TR/2002/REC-xhtml1-20020801

The first edition is available at: http://www.w3.org/TR/2000/REC-xhtml1-20000126

The latest version is available at: http://www.w3.org/TR/xhtml1

[XHTML11]

"XHTML™ 1.1 - Module-based XHTML", W3C Recommendation, M. Altheim, S. McCarron, eds., 31 May 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml11-20010531

The latest version is available at: http://www.w3.org/TR/xhtml11

[XHTMLBasic]

"XHTML™ Basic", W3C Recommendation, M. Baker, M. Ishikawa, S. Matsui, P. Stark, T. Wugofski, T. Yamakami, eds., 19 December 2000. Available at: http://www.w3.org/TR/2000/REC-xhtml-basic-20001219

The latest version is available at: http://www.w3.org/TR/xhtml-basic

[XHTMLM12N]

"Modularization of XHTML™", W3C Recommendation, M. Altheim, F. Boumphrey, S. Dooley, S. McCarron, S. Schnitzenbaumer, T. Wugofski, eds., 10 April 2001. Available at: http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410

The latest version is available at: http://www.w3.org/TR/xhtml-modularization

[XHTML+MathML]
"XHTML plus Math 1.1 DTD", "A.2 MathML as a DTD Module", Mathematical Markup Language (MathML) Version 2.0. Available at: http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd
[XML10]

"Extensible Markup Language (XML) 1.0 Specification (Second Edition)", T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, eds., 6 October 2000. Available at: http://www.w3.org/TR/2000/REC-xml-20001006

The latest version is available at: http://www.w3.org/TR/REC-xml

[XMLNS]

"Namespaces in XML", T. Bray, D. Hollander, A. Layman, eds., 14 January 1999. Available at: http://www.w3.org/TR/1999/REC-xml-names-19990114

The latest version is available at: http://www.w3.org/TR/REC-xml-names

[XMLstyle]

"Associating Style Sheets with XML documents Version 1.0", W3C Recommendation, J. Clark, ed., 29 June 1999. Available at: http://www.w3.org/1999/06/REC-xml-stylesheet-19990629

The latest version is available at: http://www.w3.org/TR/xml-stylesheet

Appendix C. Changes from Previous Version

In 3.5. Summary, changed the use of 'text/html' for HTML 4 from MAY to SHOULD.

Updated reference to XHTML 1.0 to refer to the Second Edition.