Serving HTML & XHTML

This article very briefly describes some aspects of how XHTML is sent from the server to the user agent (eg. a browser), and how common user agents handle the markup they receive. It describes implementation-specific issues, rather than W3C standards.

These topics have a bearing on how to declare the character encoding of an HTML or XHTML document. This information is also helpful in explaining why some aspects of CSS styling do not appear as expected, or vary from user agent to user agent.

MIME types

When a server serves (ie. sends) a document to a browser it also sends some additional information with the document, called the HTTP header.

The Content-Type field of the HTTP header identifies the data format of the document being sent. This information is expressed using a MIME media type label. Here is an example of an HTTP header for an HTML file using the MIME type text/html. Note that the Content-Type entry can also express the character encoding of the document.

HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: CSS2-REC.en.html
Vary: negotiate,accept-language,accept-charset
TCN: choice
P3P: policyref=http://www.w3.org/2001/05/P3P/p3p.xml
Cache-Control: max-age=21600
Expires: Wed, 05 Nov 2003 16:46:04 GMT
Last-Modified: Tue, 12 May 1998 22:18:49 GMT
ETag: "3558cac9;36f99e2b"
Accept-Ranges: bytes
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=utf-8
Content-Language: en
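
As a rough illustration of how the Content-Type line can be read, the following Python sketch extracts the MIME type and the character encoding from a few of the header fields shown above. The function name media_type_and_charset is a hypothetical helper, and the use of Python's standard email parser is simply a convenient choice for this sketch, not something a browser or server is known to do.

```python
from email.parser import Parser

# A few of the header fields from the example above. The status line
# ("HTTP/1.1 200 OK") is not a header field, so it is left out here.
RAW_HEADERS = """\
Content-Length: 10734
Connection: close
Content-Type: text/html; charset=utf-8
Content-Language: en
"""


def media_type_and_charset(raw_headers):
    """Return (MIME type, charset) taken from the Content-Type field.

    The charset is None when the header does not declare one.
    """
    message = Parser().parsestr(raw_headers, headersonly=True)
    return message.get_content_type(), message.get_content_charset()


print(media_type_and_charset(RAW_HEADERS))
```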

The text/html MIME type is the normal choice for HTML files. A browser that receives a file with this MIME type will assume that the markup follows the HTML syntax, and will use an HTML parser to interpret the meaning of the markup.

Unlike HTML, XHTML is an XML-based markup language. The syntax of XML is slightly different from that of HTML, and XML processors are less forgiving if you make mistakes. XML-based content development puts a lot of emphasis on well-formedness and validity, and can be readily integrated with all the processing tools, data, and automation available in the XML world. Many developers prefer to use XHTML because of the advantages XML's rigor brings for editing or processing of documents.
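
The difference in strictness can be seen with a small Python sketch. It feeds the same sloppy markup to a tolerant HTML parser and to an XML parser, both from the Python standard library. These parsers only stand in for the ones built into a browser, and the names SLOPPY, TagCollector and parses_as_xml are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup that is acceptable as HTML but is not well-formed XML:
# the br element is never closed and the href value is not quoted.
SLOPPY = "<p>An unclosed line break<br>and an unquoted <a href=page.html>link</a></p>"
WELL_FORMED = '<p>A closed line break<br />and a quoted <a href="page.html">link</a></p>'


class TagCollector(HTMLParser):
    """Record the start tags that the forgiving HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(SLOPPY)
print(collector.tags)              # the HTML parser carries on regardless
print(parses_as_xml(SLOPPY))       # the XML parser rejects the same text
print(parses_as_xml(WELL_FORMED))
```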

To send XHTML markup to a browser with a MIME type that says that it is XML, you need to use one of the following MIME types: application/xhtml+xml, application/xml or text/xml. The W3C recommends that you serve XHTML as XML using only the first of these MIME types – ie. application/xhtml+xml.

When a browser reads XML it uses an XML parser, not an HTML parser.
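
The choice of parser described above can be summarised in a few lines of Python. This is only a sketch of the decision, assuming the browser supports XML at all. The function name parser_for and the return values "html", "xml" and "other" are labels invented for this example.

```python
# The three MIME types listed above that tell a browser the content is XML.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_for(content_type):
    """Return which kind of parser the Content-Type value calls for."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_MIME_TYPES:
        return "xml"
    return "other"


print(parser_for("text/html; charset=utf-8"))
print(parser_for("application/xhtml+xml"))
```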

Unfortunately, up to and including version 8, Internet Explorer doesn't support XHTML files served as XML using application/xhtml+xml, although a number of other browsers do. To get around the fact that not all browsers support content served as XML, many XHTML files are actually served using the text/html MIME type. In this case, the user agent will read the file as if it were HTML, and use the HTML parser.

Since the browser treats the XHTML as if it were HTML, you need to take into account some of the differences between XML and HTML syntax when writing your XHTML code, so that they do not trip up the browser. This includes the different ways of declaring the character encoding and the language inside the document.

Appendix C of the XHTML 1.0 specification recommends a small number of compatibility guidelines when serving XHTML as HTML. These guidelines are particularly important for legacy versions of browsers. Among other things, they recommend that you leave a space before the '/>' at the end of an empty tag (such as img, hr or br), that you use HTML's lang attribute as well as XML's xml:lang attribute, and that you always use both id and name attributes for fragment identifiers.
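
Two of these guidelines are simple enough to check mechanically. The Python sketch below is a rough, regex-based check for a missing space before '/>' and for elements that carry only one of lang and xml:lang. It is not a validator and it ignores the other guidelines. The function name appendix_c_warnings and the wording of the warnings are invented for this example.

```python
import re


def appendix_c_warnings(markup):
    """Return a list of warnings for two Appendix C guidelines."""
    warnings = []
    # Empty-element tags written as <br/> rather than <br />.
    for match in re.finditer(r"<(\w+)[^<>]*?(?<!\s)/>", markup):
        warnings.append(f"<{match.group(1)}>: add a space before '/>'")
    # Start tags that carry xml:lang without lang, or lang without xml:lang.
    for match in re.finditer(r"<(\w+)([^<>]*)>", markup):
        attributes = match.group(2)
        has_xml_lang = re.search(r"\sxml:lang\s*=", attributes) is not None
        has_lang = re.search(r"\slang\s*=", attributes) is not None
        if has_xml_lang != has_lang:
            warnings.append(f"<{match.group(1)}>: use both lang and xml:lang")
    return warnings


SAMPLE = '<html xml:lang="en"><body><p lang="fr" xml:lang="fr">a<br/>b<hr /></p></body></html>'
for warning in appendix_c_warnings(SAMPLE):
    print(warning)
```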

'Standards' vs 'Quirks' modes

Current mainstream browsers may display an HTML file in either standards mode or quirks mode. This means that different rules are applied to the display of the file, one conforming to an interpretation of expected behavior according to W3C standards, the other to expectations based on the non-standard behavior of older browsers.

In the latest versions of major browsers, standards mode is turned on by the presence of a DOCTYPE declaration. Lack of a DOCTYPE can lead to different rendering from browser to browser.

The screen captures below illustrate some of these differences.

[Figure: a document rendered in standards mode.]
[Figure: the same document rendered in quirks mode.]

Click on the pictures to see the actual HTML pages. If you view in Internet Explorer you will see the same effect.

The two pictures show two pages with exactly the same markup and CSS styling, apart from one thing. The only difference between the source of the two files is that the first has a DOCTYPE at the top, and the second doesn't. A file with an appropriate DOCTYPE declaration should normally be rendered in standards mode by recent versions of most browsers. No DOCTYPE, and you get quirks.

The visual differences illustrated above arise from the following implementation differences in a browser such as Internet Explorer:
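
One widely documented difference of this kind is the box model. In standards mode the CSS width applies to the content of a box, with padding and border added outside it. In Internet Explorer's quirks mode the CSS width includes the padding and border. The Python sketch below works through the arithmetic for a div with a width of 170px, padding of 50px and a border of 6px, the values used in the test file. The function name rendered_box_width is hypothetical, and the sketch ignores margins, which lie outside the box in both modes.

```python
def rendered_box_width(css_width, padding, border, quirks_box_model):
    """Return (content width, width including padding and border) in px."""
    extras = 2 * padding + 2 * border  # left and right padding and border
    if quirks_box_model:
        # Quirks mode in Internet Explorer: the CSS width already
        # includes padding and border, so the content area shrinks.
        return css_width - extras, css_width
    # Standards mode: padding and border are added outside the CSS width.
    return css_width, css_width + extras


print(rendered_box_width(170, 50, 6, quirks_box_model=False))  # (170, 282)
print(rendered_box_width(170, 50, 6, quirks_box_model=True))   # (58, 170)
```

Under these assumptions the same style rule produces a box that is 282px wide in standards mode but only 170px wide in quirks mode.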

The original purpose of the DOCTYPE is to identify the definition of the markup language used. The following shows the source text of the test file, with the DOCTYPE declaration at the top.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 
    <title>XHTML document</title> 
    <style type="text/css">
    body { background: white; color: black; font-family: arial, sans-serif; font-size: 12px; }
    p { font-size: 100%; }
    h1 { font-size: 16px; }
    div { margin: 20px; width: 170px; padding: 50px; border: 6px solid teal; }
    table { border: 1px solid teal; }
    </style> 
    </head> 

<body> 
    <h1>Test file for Standards/Quirks</h1> 
    <div>
        A div with CSS width:170px, margin:20px, padding:50px and border:6px.
        </div> 
    <p>Text in a p element.</p>
    <table> 
        <tr><td>Text in a table.</td></tr> 
        </table>
    </body> 
</html> 
			

Browsers that switch in this way between standards and quirks modes are often said to do DOCTYPE switching.

It is generally a good idea to always serve your pages in standards mode – ie. always include a DOCTYPE declaration.

The XML declaration and DOCTYPEs

There is one aspect of using DOCTYPEs that is critically important for character encoding declarations and for predictable styling results.

Because XHTML 1.0 is based on XML, it is possible to add an XML declaration at the beginning of the markup, even if it is served as HTML. This would make the top of the above file look like this, with the XML declaration on the first line:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
...

In browsers such as Internet Explorer 7, Firefox, Safari, Opera, Google Chrome, and others, with or without the XML declaration, a page served with a DOCTYPE declaration will be rendered in standards mode.

With Internet Explorer 6, however, if anything other than a byte-order mark appears before the DOCTYPE declaration the page is rendered in quirks mode.
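
The behaviour described in the last two paragraphs can be modelled with a short Python sketch. It is a deliberate simplification: it looks only at whether a DOCTYPE is present and what precedes it, whereas real browsers also take into account which DOCTYPE is used. The function name rendering_mode and its ie6 flag are invented for this example, and the IE6 branch follows the rule exactly as stated above.

```python
import re

BOM = "\ufeff"
DOCTYPE = re.compile(r"<!DOCTYPE\s", re.IGNORECASE)
# Optional white space and an optional XML declaration at the start.
PROLOGUE = re.compile(r"^\s*(<\?xml[^>]*\?>)?\s*")


def rendering_mode(markup, ie6=False):
    """Return "standards" or "quirks" for a page served as text/html."""
    text = markup[1:] if markup.startswith(BOM) else markup
    if not ie6:
        # Other browsers: an XML declaration before the DOCTYPE is harmless.
        text = PROLOGUE.sub("", text)
    # IE6: only a byte-order mark may come before the DOCTYPE.
    return "standards" if DOCTYPE.match(text) else "quirks"


DOCTYPE_LINE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
)
WITH_XML_DECLARATION = '<?xml version="1.0" encoding="ISO-8859-1"?>\n' + DOCTYPE_LINE

print(rendering_mode(WITH_XML_DECLARATION))            # standards
print(rendering_mode(WITH_XML_DECLARATION, ie6=True))  # quirks
print(rendering_mode(DOCTYPE_LINE, ie6=True))          # standards
```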

If Internet Explorer 6 users still account for a significant proportion of your intended audience, this may be an issue. If you want to ensure that your pages are rendered in Internet Explorer 6 in the same way as in other standards-compliant browsers, you need to think carefully about how you deal with this.

Obviously, if your document contains no constructs that are affected by the difference between standards and quirks mode, this is a non-issue. Otherwise, you will have to either add workarounds to your CSS to overcome the differences, or omit the XML declaration.

Note that if you decide to omit the XML declaration you should choose either UTF-8 or UTF-16 as the encoding for the page. (See Declaring character encodings in HTML for more information about the impact on encoding declarations.)
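
The reason is that, in the absence of an XML declaration or other encoding information such as the HTTP header, an XML processor assumes that the document is in UTF-8 or UTF-16. The Python sketch below illustrates that rule for the XML case only. It does not model how a browser treats a page served as text/html, where the HTTP header and the meta element come into play. The function name xml_encoding is hypothetical, and the detection is reduced to a byte-order mark check and a simple pattern match.

```python
import re

# Matches an XML declaration at the very start and captures its encoding.
XML_DECLARATION = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*['\"]([A-Za-z][A-Za-z0-9._-]*)['\"]"
)


def xml_encoding(document):
    """Return the encoding an XML processor would assume for these bytes."""
    # A UTF-16 byte-order mark settles the question straight away.
    if document.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Skip a UTF-8 byte-order mark, if there is one.
    body = document[3:] if document.startswith(b"\xef\xbb\xbf") else document
    match = XML_DECLARATION.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No declaration and no UTF-16 byte-order mark: UTF-8 is assumed.
    return "utf-8"


print(xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><html />'))
print(xml_encoding(b"<!DOCTYPE html><html />"))
```

In other words, a page in ISO-8859-1 that drops its XML declaration would be misread by an XML processor unless the encoding is supplied some other way.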