From HTML WG Wiki

It seems that many Web authors have argued in favor of having stricter rules for writing HTML. It would be beneficial for this group to know what kind of syntax people would like to see and document how much benefit it gives to them.

It would not necessarily be mandatory, but it could be a good guide for professional Web authors as a platform for writing HTML.

text/html documents

It is possible to author text/html documents that follow the rules of XML well-formedness, although this is not a requirement. This document offers an unbiased comparison of the two options and does not favour either. The decision about syntax rests with the author.

text/html documents with "classic" syntax

"Classic" HTML syntax is fine in text/html documents (but not in application/xhtml+xml documents, see below). Element and attribute names are not case sensitive (element names are traditionally written in UPPERCASE). Well-formedness is not a requirement of HTML: many elements have optional end tags (for example, a <TD> element is known to have ended when the next <TD>, <TH> or <TR> is encountered), and empty elements do not require the '/>' shorthand.
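As a rough illustration of why classic syntax is not XML, the following Python sketch feeds two table fragments to an XML parser from the standard library. The fragments and the helper name is_well_formed_xml are made up for this example; the first fragment omits the optional </TD> and </TR> end tags, the second spells them out.

```python
import xml.etree.ElementTree as ET

# Hypothetical classic-syntax fragment: uppercase names, optional end tags omitted.
CLASSIC_FRAGMENT = "<TABLE><TR><TD>one<TD>two<TR><TD>three<TD>four</TABLE>"

# The same table with every end tag written out.
EXPLICIT_FRAGMENT = (
    "<TABLE><TR><TD>one</TD><TD>two</TD></TR>"
    "<TR><TD>three</TD><TD>four</TD></TR></TABLE>"
)


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name, fragment in [("classic", CLASSIC_FRAGMENT), ("explicit", EXPLICIT_FRAGMENT)]:
        print(name, is_well_formed_xml(fragment))
```

An HTML parser accepts both fragments and builds the same table from them; only the XML parser rejects the first one, because it has no rule for inferring the missing end tags.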

Advantages of HTML syntax:

  • concise, saves bytes
  • consistent with HTML 4 syntax
  • familiar for HTML authors
  • Since the author knows it is not XML, (s)he is less likely to get confused by things that don't mean what they mean in XML

Disadvantages of HTML syntax:

  • inconsistent with XHTML (and by extension XML)
  • unfamiliar for XML and XHTML authors (this does not matter much when writing)
  • complicated migration to XHTML (in practice, though, dependence on e.g. document.write() is a much bigger obstacle to migration)
  • it can be difficult to read complex markup with optional close tags omitted

Recommended for authors who solely work on HTML documents. Offers a concise, forgiving markup syntax that works.

text/html documents with a syntax that would be well-formed if parsed as XML

As of HTML 5, it is possible to author text/html documents using a syntax subset that would be well-formed if treated as XML.

Please note that doing so does not automatically convert HTML to XHTML documents; such documents will continue to be processed as text/html unless the MIME type is changed (usually through a filename extension change or HTTP server MIME type header).
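The point that the MIME type, not the markup style, selects the processing mode can be sketched as follows. This is only an illustration: the extension table, the function names mime_type_for and parser_for, and the "html"/"xml" labels are assumptions made for this example, not the configuration of any particular server or browser.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-MIME table, similar in spirit to a server's configuration.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}


def mime_type_for(filename: str) -> str:
    """Pick a MIME type from the filename extension (case-insensitive)."""
    suffix = PurePosixPath(filename).suffix.lower()
    return MIME_BY_EXTENSION.get(suffix, "application/octet-stream")


def parser_for(mime_type: str) -> str:
    """Name the kind of parser a user agent would apply for this MIME type."""
    if mime_type == "text/html":
        return "html"
    if mime_type == "application/xhtml+xml":
        return "xml"
    raise ValueError(f"not an HTML or XHTML MIME type: {mime_type}")


if __name__ == "__main__":
    for name in ("index.html", "index.xhtml"):
        print(name, mime_type_for(name), parser_for(mime_type_for(name)))
```

The same bytes served as index.html and as index.xhtml end up in different parsers; writing XML-style markup in the former does not change that.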

The one XML-like syntax that cannot be deployed in text/html documents is the use of explicit closing tags for elements required to be empty (such as <link>, <meta>, <base>, <img>, <br>, and <hr>). Unless deployed with an actual XML MIME type, these elements must not have explicit close tags (e.g., <meta></meta>). Use the abbreviated self-closing tag instead (e.g., <meta />).
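The three spellings of an empty element can be produced side by side with Python's standard xml.etree.ElementTree serializer, which is used here only as a convenient way to generate the forms. The charset attribute is an arbitrary example value.

```python
import xml.etree.ElementTree as ET

# An empty element with one example attribute.
meta = ET.Element("meta", {"charset": "utf-8"})

# XML serialization with the abbreviated self-closing form.
xml_short = ET.tostring(meta, encoding="unicode")

# XML serialization with an explicit close tag (not allowed in text/html).
xml_long = ET.tostring(meta, encoding="unicode", short_empty_elements=False)

# Classic HTML serialization: no slash and no close tag.
html_form = ET.tostring(meta, encoding="unicode", method="html")

if __name__ == "__main__":
    print(xml_short)   # <meta charset="utf-8" />
    print(xml_long)    # <meta charset="utf-8"></meta>
    print(html_form)   # <meta charset="utf-8">
```

Of these, the first and third forms are the ones suitable for text/html; the second is the form that only works under an actual XML MIME type.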

Advantages of XML syntax:

  • works in all browsers
  • easy migration to application/xhtml+xml (unless we define an inconsistent DOM across the two HTML5 serializations for things like document.write(), etc.)
  • syntax allows easier reading of source particularly for complex markup
  • easier to explain and teach because it is more consistent
  • it is a relatively well-beaten cowpath
  • it is a huge use case; it is an industry

Questionable advantages of XML syntax:

  • consistent with XHTML syntax (on the face of it, but there are things that aren't really consistent; see disadvantages)
  • familiar for XHTML and XML authors (except for things that don't mean what they mean in XML; see disadvantages)
  • rigid syntax (fewer rules/exceptions simplifies code production and validation) (except the exceptions are still lurking there for things that don't mean what they mean in XML; see disadvantages)
  • closing tags can help CSS work. Styles allow us to create rules based on tags. Close tags are needed to tell a UA when to stop applying a style rule. If you omit the closing tag the UA may not know when to stop applying the rule. (This is simply not true of the kind of tag omission that the current HTML 5 draft permits. The end points are unambiguous.)
  • closing tags can help DOM scripting work. See Jonathan Snook link below. (the Jonathan Snook link talks about stuff that is forbidden in the "classic HTML" syntax anyway)

Disadvantages of XML syntax:

  • verbose, uses more bytes
  • Some pieces of syntax that look like XML don't actually mean what they mean in XML (e.g. <div/> or <![CDATA[ or …; on the other hand, the problem of something like "…" is really a problem with the current HTML5 draft and its DOM/XML-centeredness on issues such as content models. Also, CDATA sections and PIs can easily be avoided, so this really falls under the disadvantage starting out "may mask subtle difficulties"):
      1. syntactic aspects of XML will not work "as expected" (i.e. as they do in XML), as they are undefined in HTML, e.g. CDATA sections, processing instructions, …
      2. may mask subtle difficulties from authors who later deploy content as actual XML
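To make the XML side of this difference concrete, the sketch below shows what an XML parser does with a self-closing non-empty element and with a CDATA section. The fragments are invented for this example, and Python's xml.etree.ElementTree stands in for "an XML parser" in general.

```python
import xml.etree.ElementTree as ET

# In XML, <span/> is a complete, empty element; "after" follows it as a sibling text node.
self_closing = ET.fromstring("<div><span/>after</div>")
span = self_closing[0]

# In XML, a CDATA section is plain character data; the markup inside is not parsed.
cdata = ET.fromstring("<p><![CDATA[<b>literal</b>]]></p>")

if __name__ == "__main__":
    print(span.text, repr(span.tail))   # None 'after'
    print(repr(cdata.text), len(cdata))  # '<b>literal</b>' 0
```

A text/html parser is not expected to give the same result: the trailing slash on a non-empty element such as span is not treated as closing it, so the text that follows would end up inside the element, and a CDATA section in ordinary HTML content is not read as character data. An author who relies on the XML reading will therefore see different documents depending on the MIME type.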

Questionable disadvantages of XML syntax:

  • inflexible syntax (both syntaxes have their own rules and, when violated, trouble can result)
  • unfamiliar for HTML authors (HTML without needing to remember when quotes and tags may be omitted is not really unfamiliar to HTML authors)

Recommended for authors who must also deal with XML documents on a regular basis: offers familiar, consistent syntax. Recommended for authors considering migrating to application/xhtml+xml in the future without the help of a parser and serializer.

Related References:

application/xhtml+xml documents

application/xhtml+xml documents with "classic" HTML syntax

XHTML documents must be well-formed XML, and classic HTML syntax would render such documents not well-formed. Use text/html with classic syntax instead.

application/xhtml+xml documents with XML syntax

XML syntax is the only option for authoring application/xhtml+xml documents. For further information please refer to the HTML and XHTML Frequently Answered Questions.