Doctypes and markup styles

From W3C Wiki

Introduction

The doctype appears just above the <html> tag, at the very start of each document you write:

<!DOCTYPE html>
<html lang="en-GB">
<head>
  <meta charset="utf-8">
  <title>My fabulous document</title>

   ... etc

The doctype explains what type of HTML is to be expected and therefore what spec validators (for example the W3C HTML validator) should validate your document against. In this article of the Web Standards Curriculum we will explore the different doctypes you are likely to come across on your journey around the Web, as well as looking at how XHTML and HTML differ.

Standards versus quirks mode

The doctype also serves to make the browser render the page in so called "standards mode". In standards mode, browsers generally try to render the page according to the CSS specifications — they trust that the person who created the document knew what they were doing.

On the other hand, if a browser finds an outdated, incomplete of missing doctype at the start of the page, they use “quirks mode”, which is more backwards compatible with old practices and old browsers. Quirks mode assumes that the document is old or that it has not been created with web standards in mind — it means that the web page will still render, but the browser will work a bit harder in doing so, and you’ll likely get a strange or ugly result, which you weren’t quite expecting. The differences are mostly related to how CSS is rendered, and only in a few cases about how the actual HTML is treated. As a web designer or developer, you will get the most consistent results by making sure that all browsers use their Standards rendering mode, hence you should stick to web standards, and use a proper doctype!

The HTML5 doctype

In this course, we are sticking with the HTML5 docype:

<!DOCTYPE html>

There is no disadvantage to using this doctype, and it is certainly a lot easier to remember than the others! This is the one we'd recommend you use going forward, as even if you don't plan to start using any of the new HTML5 features in your work yet, it will still validate most of the HTML 4/XHTML 1.0 features (there are a couple of exceptions where features have been removed from the spec, but as you'll learn later, validation is merely a useful tool, and not the be all and end all of everything), and will be future proof for when you do start using new HTML5 features. There are however, other doctypes that you may come across when working on various projects. Let's look at some of the others you might bump into.

The HTML 4.01 strict doctype

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

The HTML 4.01 strict doctype validates against the HTML 4.01 spec, although it doesn't allow any presentational markup or deprecated elements (such as <font> elements) or framesets to be used. It validates loose HTML style markup, such as minimized attributes and non-quoted attributes (eg required, rather than required="required").

The HTML 4.01 transitional doctype

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

The HTML 4.01 transitional doctype validates against the HTML 4.01 spec. It allows some presentational markup and deprecated elements (such as <font> elements) but not framesets. Again, it validates loose HTML style markup.

The XML 1.0 strict and transitional doctypes

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

and

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

These are the exact XHTML 1.0 equivalents of the HTML 4.01 doctypes we discussed above, so functionally they are the same, except that they won't validate loose HTML style markup: it all needs to be well formed XML.

The HTML 4.01 and XML 1.0 frameset doctypes

If you want to use framesets and still have your markup validate, you can use one of these two doctypes:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

They are functionally the same as HTML 4.01 transitional and XHTML 1.0 transitional respectively, but they allow the use of framesets. But please don't use framesets: they are evil and outdated. We don't even tell you have to use them anywhere in this course. Everytime you use a frameset on a web page, a fairy dies. So please don't. Think of the fairies.

Other doctypes

You may also occasionally see other doctypes on your travels, but they are not mentioned here because they are outdated. Don't use any other doctypes. Not by choice, anyway.

HTML versus XHTML syntax

Table 1 shows the main syntax differences between HTML and XHTML:

HTML XHTML
Elements and attributes are case insensitive, eg <h1> is the same thing as <H1>. Elements and attributes are case sensitive; they are all lowercase.
Certain elements don’t need a closing tag (eg paragraphs, <p>), while others (called “empty elements”) shouldn't have a closing tag (eg images, <img>). All elements must be explicitly closed (eg <p>A paragraph</p>). Elements without content should be closed using a slash in the start tag (eg <hr></hr> and <hr/> mean the same thing).
Attribute values may be written without being enclosed in quotes. Attribute values must be enclosed by quotes.
Shorthand can be used for certain attributes (ie <input required>). The full attribute form must be used for all attributes (eg <input required="required">).

Table 1: Differences between HTML and XHTML.

In terms of what syntax style you should use, pick something you are comfortable with. We'd recommend that you start off using strict XML syntax, as it is guaranteed to work in any situation. Later on you can adjust your style to suit, when you understand what you are doing a bit better. Just remember these rules:

  • If you are using the HTML5 doctype or an HTML 4.01 doctype, you can use whatever style you want
  • If you are using an XHTML doctype, you need to use XML well-formed syntax
  • Whichever style you are using, best practice is to:
    • Close all open elements, so for example <p>paragraph</p>, not <p>paragraph
    • Nest them properly, for example <p>paragraph with bold <strong>word</strong></p>, not <p>paragraph with bold <strong>word</p></strong>

This last rule is very important. If you don't do this, your HTML will not be well formed, and CSS and JavaScript may not work reliably, as they rely on having a well-formed Document Object Model (DOM). For more on the HTML DOM, read Growing trees.

Serving "real" XML

You may also be interested to know that most of the XHTML on the Web is actually HTML written with well-formed XML syntax. Even if the doctype is an XHTML one, it will be sent to the client as HTML unless you:

  • save the file with an .xhtml file extension
  • Include a line of code called the XML declaration before your doctype, which looks like this:
<?xml version="1.0" encoding="UTF-8"?>

We wouldn't recommend that you this however: old versions of Internet Explorer (6 and below) switch into Quirks mode if they find the XML declaration at the start of the document, which is bad, as we looked at earlier. In addition, IE 6-8 will try to download files saved as XHTML rather than display them in the browser, which you definitely do not want!

Therefore you should just stick to not trying to serve documents as proper XHTML! Carry on people, nothing to see here.

Summary

And that's it for the doctype - pretty much all you need to know.


Next: The HTML <head> element.