Shorttags - the odd side of HTML 4.01

Author(s) and publish date

By:

Olivier Thereaux

Published:

9 October 2007

HTML 4.01 or XHTML 1.0? The choice between the two popular ways of authoring for the web seldom yields a clear answer: after all, the two languages share the same semantics, and the differences are mostly about the writing style.

Advocates of the XHTML style will hail the potential of XML for transformation and processing. Advocates of HTML 4.01 will generally reply that Internet Explorer, as of today, does not recognize the preferred media type for XHTML. As a result, most people serve XHTML in a way tantamount to serving tag soup to browsers: by that logic, using HTML 4.01 is actually the "strict" choice.

Both are quite correct, but for anyone authoring (X)HTML by hand, there is one very good reason, often overlooked, to prefer the XHTML syntax to the "classic" HTML one: shorttags.

Shortwhat?

Let's look at the following piece of HTML markup.

    <p<a href="/">first part of the text</> second part

Now for the surprising part: the above is proper HTML. Valid, conformant, everything. It uses a little-known feature of SGML called shorthand markup, which was allowed in HTML up to HTML 4.01. But what used to be a "cool" feature for SGML experts becomes a liability in HTML, where the construct is more likely to appear as a typo than as a conscious choice.

All would be fine if this kind of typo-that-happens-to-be-legal were properly implemented in contemporary HTML user agents. It is not. In the example above, </> is supposed to close the <a> element. In most browsers today, it does not, and the second part of the text becomes part of the link when it should not. Try it.
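
Read according to the SGML shorthand rules, the example should be equivalent to the longhand markup below. This is a sketch of the intended parse, not validator output: the unclosed <p start tag is ended by the < that follows it, and the empty end tag </> closes the most recently opened element, here <a>. The </p> end tag stays implicit, as HTML 4.01 allows.

    <p><a href="/">first part of the text</a> second part

Only "first part of the text" belongs to the link; "second part" is plain paragraph text.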

Validation as a helping tool

This is reason enough for me, as a clumsy author of HTML, to prefer the XHTML document types, notwithstanding the whole media type debate: validation is an incentive to keep my code clean. XHTML forces me to close my elements and put my attribute values in quotes, and it won't let disruptive typos pass as valid.
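
As an illustration, the following fragment uses only constructs that HTML 4.01 permits: an unquoted attribute value and omitted </p> and </li> end tags. The class name and the text are made up for the example.

    <p class=intro>First paragraph
    <ul>
      <li>one
      <li>two
    </ul>

The XHTML 1.0 equivalent has to spell everything out, which is what lets a validator notice a stray or missing tag:

    <p class="intro">First paragraph</p>
    <ul>
      <li>one</li>
      <li>two</li>
    </ul>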

That does not mean we are leaving HTML 4.01 authors in the awkward company of shorttags: since the HTML specification lists these as not recommended, the upcoming release of the Markup Validator will flag any detected use of shorthand markup with a warning. SGML hackers can still use it at their own risk. Others will be warned about their typos and advised to fix them.
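
Another shorthand form that is easy to produce by accident is the null end tag (NET), where one slash stands in for the end of the start tag and a second slash for the end tag. The sketch below assumes the SGML reading of HTML 4.01; whether the validator's warning covers this form exactly as shown is an assumption, not something stated above.

    <em/important/

is read as

    <em>important</em>

The same rule explains why an XHTML-style <br /> in a document declared as HTML 4.01 is read by an SGML parser as a br element followed by a literal ">" character.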

Comments (9)

Anne van Kesteren - 9 October 2007 at 06:29:26 UTC

This is simply a problem with HTML 4.01 not being realistic and the validator assuming it is (and being based on SGML tools). There's an HTML5 validator project that no longer has this problem and the same goes for the HTML5 specification which carefully defines a custom HTML syntax that is more realistic.
David Hammond - 9 October 2007 at 12:56:41 UTC

This is precisely why I developed the HTML 4 Good Practice Checker (http://www.webdevout.net/test?html4-good-practice). This is a tool that modifies the way the W3C HTML Validator handles the document so that it enforces rules which normally aren't enforced. Using this tool on an HTML 4 document, you will see "validation errors" for things like shorttags, missing end tags (where allowed), unquoted attribute values, etc.
Daniel Aleksandersen - 9 October 2007 at 17:36:05 UTC

The example code and example page work fine in Opera 9.5 alpha, build 1589. Says something about which browser is better. ;-)
Olivier Théreaux - 10 October 2007 at 05:59:34 UTC

@David Hammond: Cool. Replacing the document type with a stricter DTD is quite a clever solution. The next version of the markup validator (coming tomorrow if all goes well) will give warnings for shorttag constructs, since they are, strictly speaking, allowed. Yielding errors for conforming but not recommended practices tends to make some people very, very angry ;)...
@Anne: Most HTML specifications were standardizing the current state of development in different browser projects, trying to bring consensus between all the players when implementations or goals differed.
Back to 1995:
HTML 2.0 becomes IETF Proposed Standard
This is an effort (started in 1994) to create a specification for interoperability among implementations of HTML.
… Does that sound familiar? Not surprisingly, the development of the web follows a cycle. Right now we're in the part of the cycle where those wanting to make a clear map of existing implementations have the upper hand. Sooner or later, we'll get to the other side of the cycle, the innovation side:
HTML 3.0 internet draft
"provides additional capabilities over previous versions such as tables, text flow around figures and math."
Since the days of early HTML, however, Quality Assurance, and in particular the requirements for interoperability and implementation, has improved. Had this been around in the early days of the web, HTML 4.01 would have been a better specification, and better implemented. The way HTML5 is being built from the current implementations has nothing to do with being realistic: it is, in a laudable way and from the ground up, following a better, contemporary quality process.
David Hammond - 11 October 2007 at 11:58:11 UTC

The warning is a great idea. I would, however, like to see a warning also given to people who use XHTML as text/html, informing them that typical browsers won't parse the document as XML, and there could be differences in parsing, CSS, and scripting depending on how the user agent treats it. Unfortunately, if you have your browser set up to parse all "XHTML" pages as real XML, most websites show significant problems, including the websites of many "standards experts". Check out this list: http://www.webdevout.net/articles/beware-of-xhtml#broken_xhtml
This is an issue that is not really understood by most web developers, and putting it right on the validation page would help bring it to more people's attention.
Anne van Kesteren - 12 October 2007 at 18:28:17 UTC

Olivier, sure, but implementations never implemented SGML, so basing HTML on it never seems to have reflected the interoperability of anything.
Thomas Mayer - 21 October 2007 at 05:41:24 UTC

I personally stay with HTML because Internet Explorer is not the only user agent that can't handle XHTML's media type. Lynx, for example, can't either.
Also, sending XHTML as text/html defeats the purpose for me. And this is more problematic than even most "experts" think. Only recently I found out why DOM-1-Methods won't work in (real) XHTML.
Writing good HTML doesn't seem very hard to me. And since the HTML 5 WG suggests that validating HTML 5 will be better than validating HTML 4, it seems the only problematic thing I can see with HTML 4 is going to go away.
???? ??????????? - 8 November 2007 at 11:46:06 UTC

Latest available stable Opera version does not display the thing correctly (no line-break, no link).
4c - 21 March 2008 at 18:05:04 UTC

As tags can only be closed in the reverse order of being opened, tag names in end tags are useless in HTML.

I also think that every tag should be just that, and attributes should be instead expressed with child elements.
A location is always a location, for example take:

<a href="gold.html"><img src="gold.png"/></a>

Wouldn't this be more semantic and make more sense:

<a><url>gold.html</><img><url>gold.png</></></>