Choosing the right doctype for your HTML documents

From W3C Wiki
Revision as of 22:28, 21 July 2011 by Plehegar

Introduction

The previous article dissected the anatomy of the head section of an HTML document, looking briefly at the different things the head can contain and what they do. In this Web Standards Curriculum article I will look at the doctype in much more detail: what it does and how it helps you validate your HTML, how to choose a doctype for your document, and what the XML declaration is, which you’ll rarely need but will sometimes come across.

The doctype comes first

The very first thing you should make sure to have in any HTML document you create is a document type declaration, which names the DTD the document is written against. If you haven’t heard anyone mention a document type declaration before, don’t worry. To keep things simple, it is usually referred to as a “doctype”, which is what I’ll call it in the rest of this article.

You might be wondering what a “DTD” or doctype is. DTD is short for “Document Type Definition”, and among other things it defines which elements and attributes are allowed in a certain flavor of HTML. Yes, that’s right: there are different versions of HTML in use on the Web today. Don’t let this worry you, though; you’ll only really need to concern yourself with one.

The doctype is used for two things, by different kinds of software:

  1. Web browsers use it to determine which rendering mode they should use (more on rendering modes later).
  2. Markup validators look at the doctype to determine which rules they should check the document against (more on that later as well).

Both of these will affect you, but in different ways, which will be explained later on in this article.

Here is an example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

Now, that may look like a lot of nonsense to you, so let me offer a somewhat simplified explanation of how it is constructed. For a much more detailed look at exactly what each character refers to, see the article !DOCTYPE.

The most important parts of the doctype are the two strings delimited by quotes. "-//W3C//DTD HTML 4.01//EN" states that this is a DTD document published by the W3C, that the DTD describes HTML version 4.01, and that the language used in the DTD is English.

The second string, "http://www.w3.org/TR/html4/strict.dtd", is a URL that points to the DTD document used for this doctype.
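As a rough guide, the parts of the example doctype can be annotated like this. The comment sits below the doctype because the doctype itself has to stay first in the document. The descriptions of html and PUBLIC are my additions to the explanation above.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!--
  html     the name of the root element of the document
  PUBLIC   a public identifier follows, rather than a private one

  "-//W3C//DTD HTML 4.01//EN"   the public identifier
      W3C             the organization that published the DTD
      DTD HTML 4.01   the DTD describes HTML version 4.01
      EN              the DTD itself is written in English

  "http://www.w3.org/TR/html4/strict.dtd"   the URL of the DTD
-->
```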

Even though a doctype may look a bit strange, it is required by the HTML and XHTML specifications. If you don’t include one you will get a validation error when you check the syntax of your document with the W3C Markup validator or other tools that check HTML documents for errors. Some web browsers even contain such functionality by default, while others can have it added by installing an extension.

Doctype switching and rendering modes

If you do not provide a doctype, browsers will handle and render the document anyway—they need to make an attempt to render all sorts of strange things that they come across on the Web, so they can’t always be very picky. However, without a doctype, the results may not look like you intended, because of something called “doctype sniffing” or “doctype switching”.

Most web browsers released in the 21st century look at the doctype of any HTML documents they encounter and use that to decide whether the author of the documents took care to write their HTML and CSS properly according to web standards.

If they find a doctype that indicates that the document is coded well, they use something called “Standards mode” when they lay out the page. In Standards mode, browsers generally try to render the page according to the CSS specifications, because they trust that the person who created the document knew what they were doing.

On the other hand, if they find an outdated or incomplete doctype, they use “Quirks mode”, which is more backwards compatible with old practices and old browsers. Quirks mode assumes that the document is old or that it has not been created with web standards in mind. The web page will still render, but you’ll likely get a strange or ugly result that you weren’t expecting.

The differences are mostly related to how CSS is rendered, and only in a few cases to how the actual HTML is treated. As a web designer or developer, you will get the most consistent results by making sure that all browsers use their Standards rendering mode. So stick to web standards, and use a proper doctype!
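If you are curious which mode a browser has chosen for a page, one way to check is the document.compatMode property, which scripts can read. The page below is a minimal sketch: the id "mode" and the wording of the messages are my own choices for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Which rendering mode is this?</title>
</head>
<body>
  <p id="mode">Rendering mode: unknown (the script has not run).</p>
  <script type="text/javascript">
    // document.compatMode is "CSS1Compat" in Standards mode
    // and "BackCompat" in Quirks mode.
    var mode = document.compatMode === "CSS1Compat" ? "Standards mode" : "Quirks mode";
    // Replace the placeholder text of the paragraph with the result.
    document.getElementById("mode").firstChild.nodeValue = "Rendering mode: " + mode;
  </script>
</body>
</html>
```

With the doctype in place the page should report Standards mode. Delete the first two lines, reload, and it should report Quirks mode instead.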

Validation

As I mentioned earlier, the doctype is also used by validators, which you will learn more about later in this article series. For now, all you need to know is that a validator is used to check that the syntax of your HTML document is correct and does not contain any mistakes. The validator programs look at the doctype you have used to determine what rules to use. It’s a bit like telling a spell checker which language a document is written in. If you don’t tell it, it won’t know which spelling and grammar rules to use.
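To give an idea of what a validator looks for, here is a small page with two deliberate mistakes. It is only an illustration: the file name photo.jpg is made up, and the exact wording of the error messages depends on the validator you use.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>A page with two mistakes</title>
</head>
<body>
  <!-- Mistake 1: the em element is opened but never closed. -->
  <p>This is <em>important.</p>
  <!-- Mistake 2: the img element lacks the alt attribute that HTML 4.01 requires. -->
  <p><img src="photo.jpg"></p>
</body>
</html>
```

Because the doctype says HTML 4.01 Strict, the validator checks the page against those rules and should flag both lines.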

Choosing a doctype

So, now that you know that you need to insert a doctype and what it is used for, how will you know which one to choose? There isn’t just one, after all; there are many. You could even create your own if you feel up to something more advanced. But I’m not going to mention a whole lot of different doctypes. I’ll try to keep things simple and settle for two.

If your document is HTML, use this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

If your document is XHTML, use this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Note: "Real" XHTML should be delivered to the web browser as XML, but the details of how and when to do that, and the implications it has, are beyond the scope of this particular article.

Both of these doctypes will ensure that browsers use their Standards mode when dealing with your document. The most noticeable effect that will have on your work is that you will get more consistent results when styling the document with CSS. To see some of the other doctypes that you could use, consult the W3C’s list of Recommended DTDs to use in your Web document.
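To show where the doctype goes, here is a minimal sketch of a complete HTML 4.01 Strict document. The lang attribute, the character encoding, and the text are placeholders I have assumed for the example; the point is that the doctype is the first thing in the file.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Example page</title>
</head>
<body>
  <p>Hello, world.</p>
</body>
</html>
```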

You may notice that both of the doctypes I mention here are called “Strict”. While that may sound a bit scary, it isn’t.

There are strict and transitional flavours of both HTML and XHTML. Strict in this case means that the doctypes allow less presentational markup than the transitional doctypes do. The presentational markup that isn’t allowed shouldn’t really be there anyway, since you should use HTML to define the structure and meaning of your documents, and CSS to determine how they are presented. Using a strict doctype will help you with that, since the validator will alert you to any presentational elements or attributes that have sneaked their way into your code.
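The fragment below sketches the difference. The center and font elements and the align attribute are presentational markup that HTML 4.01 only allows under the transitional doctype. The class names and the style rules in the last comment are hypothetical, chosen just for this example.

```html
<!-- Accepted under a transitional doctype, reported as errors under a strict one. -->
<center><font color="red">Sale ends Friday</font></center>
<p align="right">Terms apply.</p>

<!-- Accepted under a strict doctype: the markup only describes structure and meaning. -->
<p class="notice">Sale ends Friday</p>
<p class="terms">Terms apply.</p>

<!-- The presentation then moves to a style sheet, for example:
     .notice { color: red; text-align: center; }
     .terms  { text-align: right; } -->
```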

The XML declaration

I stated earlier that the doctype needs to be the very first thing in your HTML documents. Well, that is in fact a slightly simplified version of the truth. There is also the XML declaration to consider.

You may have seen a code snippet that looks like this before the doctype in some XHTML documents:

<?xml version="1.0" encoding="UTF-8"?>

This is called an XML declaration, and when it is present it needs to be inserted before the doctype.

Internet Explorer version 6 has a problem with that: an XML declaration before the doctype causes it to switch into Quirks mode, and as I explained earlier, you most likely do not want that.

Luckily the XML declaration is not required unless you are really sending your XHTML documents as XML to web browsers (see the note about XHTML above) *and* you are using a character encoding other than UTF-8 or UTF-16 *and* your server is not sending an HTTP header that determines the character encoding.

The probability of all that happening all at once is quite slim, so the easiest way to solve the Internet Explorer problem is to simply omit the XML declaration. Don’t forget the doctype though!
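A minimal sketch of an XHTML 1.0 Strict document without the XML declaration might look like this. It assumes the file is saved as UTF-8; the language attributes and the text are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Example page</title>
</head>
<body>
  <p>Hello, world.</p>
</body>
</html>
```

Since nothing comes before the doctype, Internet Explorer 6 stays in Standards mode.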

Summary

Always include one of the doctypes mentioned here as the very first thing in all of your HTML documents. It will make sure that validators know what version of HTML you are using, so they can correctly report any mistakes you have made. It will also make sure that all recent web browsers use their Standards mode, which will give you more consistent results when you are styling the document with CSS.

Exercise questions

Here are a few questions that you should be able to answer after reading this:

  • What are the two main purposes of including a doctype in HTML documents?
  • What are the benefits of using a strict doctype instead of a transitional one?
  • Why is the XML declaration problematic?
  • One doctype I haven’t mentioned in this article is the frameset doctype—research what this does, and why it shouldn’t be used.

Further reading

Note: This material was originally published as part of the Opera Web Standards Curriculum, available as 14: Choosing the right doctype for your HTML documents, written by Roger Johansson. Like the original, it is published under the Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.