One of the things that made the Web so popular from its earliest days was its ease of access: HTML was simple. Anyone could write a Web page. This is still true to some extent, and thanks to a number of Web authoring tools, or services such as wikis, blog software and CMSes, anyone can create a Web page. But Web technologies have grown richer: CSS, scripting, the DOM, SVG, widgets… From this increased richness and complexity rose a new group of people: the Web Professionals.
To the outside eye, Web Professionals are pragmatists, knowledgeable about technologies. They know Web Architecture; their bedside reading is the W3C Specifications. But insiders know that Web Professionals are a highly dedicated and disciplined caste, following the age-old teachings of the Way (or Tao) of Web Standards, striving to achieve the seven virtues.
Disclaimer: this article is a humorous look at principles of Web quality, viewed through the filter of the Bushido, the Samurai’s code of Honor. It is a companion to a talk given at the Days of Web Standards Conference, in Tokyo on July 15th, 2007. The metaphor should be taken with a smile, the principles of Web Architecture it showcases, seriously.
Update 2007-07-28: the slide set for the talk (PDF, 15MB) is now available.
The seven virtues of Web Standards
Honesty: Use Semantic Markup for profit
Most professional creators of Web content will certainly cite “valid markup” as one of the things they care most about when working on the Web. This is an apt goal, and one with many benefits: among others, it makes content more portable across platforms and easier to style consistently.
But the benefits of using HTML to its full potential go well beyond validity. HTML is a well-structured language whose elements carry meaning, and making good use of these semantics pays off. For example:
- Using semantic elements instead of styling unstructured markup (e.g. using headings instead of bold text) will make the content easier for search engines to index, and thus easier to find on the Web.
- Declaring the language of a document or a block (e.g. <html lang="ja" xml:lang="ja">) will let tools and external services know which language your content is in: voice browsers can adopt the proper voice settings, and some services will even automatically provide a free translation of the content.
- The <link rel=""> construct can be used for smooth navigation in collections of documents (e.g. rel="next" and rel="prev"). Some browsers will also pre-fetch documents linked this way, resulting in a faster, more pleasant user experience.
- Rich markup can also be queried, reused and remixed: GRDDL can be used to extract and reuse data from documents that use rich markup such as RDFa or Microformats.
Many Web sites invest a lot of time and money building complex APIs to give access to their information, when they could often simply use rich, semantic HTML markup: HTML can be a cheap and efficient API.
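To make the “HTML as an API” idea concrete, here is a minimal sketch in Python, using only the standard library's html.parser module. It reads the declared language and the heading outline of a page, the kind of information a search engine or a voice browser relies on. The class and function names are hypothetical, and a real crawler would have to cope with far messier markup than this.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect the declared language and the heading outline of a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.lang = None
        self.headings = []      # list of (level, text) tuples
        self._current = None    # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and self.lang is None:
            # lang for HTML, xml:lang for XHTML served as XML
            self.lang = attrs.get("lang") or attrs.get("xml:lang")
        elif tag in self.HEADINGS:
            self._current = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current is not None:
            self.headings.append((self._current, "".join(self._text).strip()))
            self._current = None


def outline(markup):
    """Return (declared language, [(heading level, heading text), ...])."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.lang, parser.headings


page = ('<html lang="ja" xml:lang="ja"><body><h1>Seven virtues</h1>'
        '<p>Intro</p><h2>Honesty</h2></body></html>')
print(outline(page))
# ('ja', [(1, 'Seven virtues'), (2, 'Honesty')])
```

Bold text styled to look like a heading would be invisible to this parser, which is exactly the point of the first bullet above.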
Respect/Etiquette: Use HTTP for Content/Language Negotiation
In a social context, etiquette is the art of acting and communicating in a manner appropriate to the context, taking into consideration whom you are communicating with. This virtue can be followed in our use of Web technologies: when serving Web content, it is important to take into account the capabilities and preferences of the people, and the software, on the receiving end.
That does not mean browser sniffing, the practice of serving different content depending on which browser engine is detected. Instead, HTTP provides mechanisms for a user agent to declare which types of content it supports and prefers (a feed reader, for instance, will claim a preference for the Atom format, while a graphical browser will typically prefer HTML), and which languages are acceptable and preferred (based on user preferences).
Using language negotiation, we can provide a single resource under a single URI but still serve it in different languages. For example, with the CSS Validator:
- Tom, who speaks English as his mother tongue and whose browser sends the header Accept-Language: en, will see the page in English.
- Tomoyo, who speaks Japanese and German and whose browser is set up to send Accept-Language: ja, de;q=0.8, will get the page in Japanese.
- Finally, Tommi speaks fluent English but prefers Finnish, so his browser sends an Accept-Language header reflecting his preferences: Accept-Language: fi, en;q=0.9. Since the CSS Validator is not available in Finnish, he will receive his second choice, English.
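The three cases above boil down to a small algorithm: read the language ranges and their q values, sort them by preference, and serve the first one that is actually available. Here is a minimal Python sketch of that idea; it is not how the CSS Validator is implemented, the function names and the list of available translations are hypothetical, and it simplifies the HTTP rules (a full implementation would also let q=0 exclude a language that “*” would otherwise match). Note the Vary header: since one URI now has several variants, caches must know that the response depends on Accept-Language.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, most preferred first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # default quality value when none is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((tag, q, position))
    # Highest q first; ties keep the order in which the client listed them
    ranges.sort(key=lambda r: (-r[1], r[2]))
    return [(tag, q) for tag, q, _ in ranges]


def best_language(header, available):
    """First available language matching the client's preferences, or None."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for lang in available:
            lang_lower = lang.lower()
            # A range matches itself and its subtags ("en" matches "en-GB");
            # "*" matches anything
            if tag == "*" or lang_lower == tag or lang_lower.startswith(tag + "-"):
                return lang
    return None


def negotiate_language(header, available, default="en"):
    """Choose a language and the response headers that go with it."""
    chosen = best_language(header, available) or default
    return chosen, {
        "Content-Language": chosen,
        # One URI, several variants: caches must key on this request header
        "Vary": "Accept-Language",
    }


AVAILABLE = ["en", "ja", "de", "fr"]  # hypothetical set of translations
for who, header in [("Tom", "en"), ("Tomoyo", "ja, de;q=0.8"), ("Tommi", "fi, en;q=0.9")]:
    print(who, negotiate_language(header, AVAILABLE)[0])
# Tom en
# Tomoyo ja
# Tommi en
```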
Another benefit: even though they may each see the page in a different language, Tommi, Tomoyo and Tom can all link to the same resource and exchange links, and the content will automatically adapt to each of them.
To learn more about language negotiation, find out how to set up language preferences in browsers and enable negotiation on a Web server, see the techniques on the W3C Internationalization Web site.
Benevolence: Use Caching capabilities to save time and money
Most Web professionals know and fear this scenario: a Web site is getting some attention. Visitors are flowing in, users are mashing up its rich, interesting content. But the system administrators are worried. The servers don’t seem to cope with the load very well. They’ll have to get some budget for a new server, and replication will be complicated. The site becomes slow, hardly usable. Users start to walk away and use the competition, which may not be as cool but at least works. By the time the budget for a new server is granted, it’s too late: the site has lost its chance to go from cool to successful.
Scalability is a complex issue, and sometimes its problems cannot be avoided. But often they can be avoided altogether, or alleviated, by making smarter use of Web technologies.
Smaller page weight can have a dramatically positive effect on page load times: this is one of many reasons to use clean, structured markup and CSS stylesheets rather than bloated presentational tag soup.
But there’s a part of the equation too often overlooked: caching. Images and style sheets seldom change: are you sure your Web server properly tells browsers, proxies and search engines that they have not changed and can be considered “fresh”? Even dynamically generated content has a certain life span, and there are techniques to reflect that in how it is served, so that costly, server-heavy dynamic content is not requested in vain when a cached copy would have worked.
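“Fresh” has a precise meaning in HTTP: a cache computes a freshness lifetime from the response headers, and may reuse its copy without contacting the server as long as the copy is younger than that lifetime. The Python sketch below is simplified from the HTTP/1.1 caching rules and assumes the response headers arrive as a plain dict with canonical capitalization; the function names are hypothetical.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a browser cache may reuse a response without contacting the server.

    Simplified from the HTTP/1.1 caching rules: a Cache-Control max-age
    directive wins over Expires; with neither, this returns 0, although
    real caches may apply a heuristic instead.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                return max(0, int(value.strip().strip('"')))
            except ValueError:
                return 0
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            return 0  # an unparseable Expires counts as "already expired"
    return 0


def is_fresh(headers, age_seconds):
    """True if a cached copy that is age_seconds old may still be served."""
    return freshness_lifetime(headers) > age_seconds


one_day = {"Date": "Sun, 15 Jul 2007 10:00:00 GMT",
           "Expires": "Mon, 16 Jul 2007 10:00:00 GMT"}
print(freshness_lifetime(one_day))  # 86400
print(is_fresh(one_day, 3600))      # True
```

A response that sends neither header gets a lifetime of 0 here, so every page view costs a request: that is the situation the server-side settings below are meant to avoid.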
This practice is a win-win solution for the server and the client:
- For the server, this reduces network traffic dramatically. Large sites can save gigabytes of bandwidth per week simply by caching static documents, stylesheets, and especially images, videos and multimedia content. Fewer requests also mean less heavily loaded servers and faster response times.
- For the client, this simply means faster page loads. Stylesheets and layout images, for instance, are loaded once and for all, making browsing faster, providing a better user experience. Remember the findings of Jakob Nielsen: wait more than a second, and you are already losing users.
How is this done?
- Switching on caching for a directory holding your images or stylesheets, on a server like Apache, is as trivial as adding a handful of lines to your configuration (or to an .htaccess file, without the <Directory> wrapper, which .htaccess does not allow). With Apache, the mod_expires module, if enabled, takes care of sending Expires and Cache-Control headers for you; Apache itself already sends Last-Modified for static files.
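<IfModule mod_expires.c>
  <Directory /path/to/my/staticstuff/>
    # mod_expires only adds headers once it is switched on
    ExpiresActive On
    # Consider files fresh for 4 weeks after their last modification
    ExpiresDefault "modification plus 4 weeks"
  </Directory>
</IfModule>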
- PHP scripts are often used to draw content from a database. If the database has a field with the timestamp of the last relevant change, this can be forwarded to user agents. For example, Simon Willison’s method:
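<?php
// $timestamp: Unix time of the last relevant change, read from the database
// HTTP dates must be in GMT, so gmdate() is used rather than date()
$last_modified = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
$etag = '"' . md5($last_modified) . '"';
// Send the headers
header("Last-Modified: $last_modified");
header("ETag: $etag");
?>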
or, if you just want browsers to keep content in cache for a day:
<?php
// Let browsers and proxies reuse the response for one day (in seconds)
$offset = 60 * 60 * 24;
header("Cache-Control: max-age=$offset, must-revalidate");
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT");
?>
- For more techniques with Apache, other servers and scripting languages such as PHP, a must-read reference is Mark Nottingham’s site on caching, with explanations and code samples.
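Sending Last-Modified and ETag is only half of the story: the payoff comes on the next visit, when the browser asks “has this changed?” with If-Modified-Since or If-None-Match, and the server can answer 304 Not Modified with no body at all. Here is a minimal sketch of that server-side decision, written in Python rather than PHP, with hypothetical function names; like the PHP snippet above, it derives the ETag from an MD5 hash of the Last-Modified date. It ignores weak ETags and other details of the full HTTP rules.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def validators(timestamp):
    """Last-Modified and ETag values for content last changed at `timestamp`."""
    last_modified = formatdate(timestamp, usegmt=True)  # HTTP date, in GMT
    etag = '"%s"' % hashlib.md5(last_modified.encode("ascii")).hexdigest()
    return last_modified, etag


def conditional_get(request_headers, timestamp):
    """Return (status, response headers): 304 if the client's copy is current."""
    last_modified, etag = validators(timestamp)
    headers = {"Last-Modified": last_modified, "ETag": etag}

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When the client sends ETags, they take precedence over the date
        tags = [t.strip() for t in if_none_match.split(",")]
        return (304 if etag in tags or "*" in tags else 200), headers

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            return 200, headers  # ignore a malformed date
        if int(timestamp) <= since:
            return 304, headers  # unchanged since the client's copy

    return 200, headers


lm, tag = validators(1184493600)
print(conditional_get({"If-None-Match": tag}, 1184493600)[0])        # 304
print(conditional_get({"If-Modified-Since": lm}, 1184493600 + 60)[0])  # 200
```

On a 304 the server skips the expensive work of rebuilding and sending the page, which is where the bandwidth and load savings described above come from.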
Courage: Use XML as a testing tool
HTML or XHTML? Proponents of HTML will say that “it works fine”, which is mostly true. Fans of XHTML will praise its XML-based syntax and the possibility of using XHTML with XML-based editors and processors, and of transforming it with XSLT. That, too, is true.
The tricky situation is not a problem of format (tags and brackets and slashes), but one of serving content: the current Web market has one major browser holding a large share of users, and it will not recognise XHTML served as XML. That is, if a server is set up to deliver XHTML with the Internet media type application/xhtml+xml, this browser will simply refuse to show the content, offering to download it instead. While there is hope that this may be fixed in the future, serving XHTML with its XML media type is unacceptable for most Web content providers; instead, most people today serve XHTML “as compatible with legacy HTML”, that is, with the text/html media type.
With most XHTML content being served “as HTML”, XHTML detractors argue that it is pointless to use XHTML in the first place. Endless debates ensue.
Debates are fun, but we are talking business here: in the current situation, what can a Web team do with XHTML, to make better Web sites, and grow more successful businesses?
- Use it! Even if your public Web server ultimately serves content as “text/html”, you can still use XML tools, editors and transformations in your authoring process.
- Use it for Quality Control! Unlike HTML engines, XML processors are required to be very strict about the syntax they accept. This is sometimes used as an argument against serving XHTML as XML on a public Web site, because errors in the site’s markup will cause XML-aware browsers to throw an error and refuse to display the pages. No-one really wants to inflict that on their users. However, this makes a great, strict quality control tool for a staging server: set your test Web server to serve XHTML content with its XML media type, browse your test site with any XML-aware browser (most of today’s open source graphical browsers are), and you will quickly spot syntax mistakes in your code.
- Use your economic power to push for it! There is a large economy behind the Web, and it can be leveraged to lobby for better implementations. This has been done successfully in the past, with groups such as the WaSP pushing for better interoperability and better CSS support in modern browsers. Does the Japanese Web market want features such as Ruby Annotations, which would be extremely useful for displaying “furigana”? This feature is part of XHTML 1.1 but poorly supported by the current lineup of browsers, partly because of the lack of support for the application/xhtml+xml media type. The market has the economic leverage to push for a better Web ecosystem, and for the sake of better Web business, it should use that leverage to press for better specs and implementations.
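The same strictness an XML-aware browser applies can also be scripted, for instance to check every page of a staging site before release. A minimal sketch, assuming the pages are your own XHTML documents available as strings, and using Python's standard xml.etree.ElementTree parser; the function name is hypothetical. One caveat: without loading the XHTML DTD, this parser does not know named entities such as &nbsp;, so it will flag them where a browser may not (numeric references such as &#160; are fine).

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if `markup` is well-formed XML, else "line:column message".

    An XML parser must stop at the first well-formedness error, which is
    the same strictness a browser applies to a page served as
    application/xhtml+xml.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return "%d:%d %s" % (line, column, err)
    return None


good = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>OK<br/></p></body></html>'
bad = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Oops<br></p></body></html>'
print(check_well_formed(good))  # None
print(check_well_formed(bad))   # reports a "mismatched tag" error and its position
```

Run on each page as part of a build or deployment script, a check like this catches the unclosed <br> before any visitor does.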
Rectitude: Use tools and processes to fix issues before they hurt
Mieux vaut prévenir que guérir. (French: “Better to prevent than to cure.”)
An ounce of prevention is worth a pound of cure.
Fixing issues as early as possible, even before they appear, is so deeply ingrained in most cultures that applying it on the Web should be a no-brainer. Yet in most cases, Quality Assurance comes right after development and just before the Web site’s release… Many sites skip that step entirely and fix issues only when users start complaining.
And then the site evolves. Content gets modified, sometimes broken. Users get to input their own content, sometimes with dangerous consequences.
Can we do any better? Can we use our common wisdom to prevent rather than fix?
There is no perfect solution, only a right attitude, and good reflexes. Here are a few ideas:
- In all the tools you develop, try to include some atomic quality checks. When buying tools or choosing free software, ask for features such as quality and consistency checking.
- Test XHTML markup with XML-aware browsers as soon as possible
- Include input validation checks and filters for every user input method to your site
- Use a permanent, step-by-step quality checking and fixing process rather than occasional big rounds of validation and cleanup. We have tools to help you do that.
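As an illustration of “input validation checks and filters”, here is a minimal Python sketch for a comment form: reject what should never be accepted, then escape the rest so it is displayed as text rather than interpreted as markup. The length limit and function name are hypothetical policy choices; the control-character rule reflects what XML 1.0 forbids, so accepted comments cannot break an XHTML page.

```python
import html
import re

# Hypothetical policy for a comment form; tune the limit for your own site
MAX_LENGTH = 2000
# C0 control characters other than tab, LF and CR are not allowed in XML 1.0
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_comment(text):
    """Validate a user comment, then escape it for safe inclusion in (X)HTML.

    Raises ValueError for input that should be rejected outright.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty comment")
    if len(text) > MAX_LENGTH:
        raise ValueError("comment longer than %d characters" % MAX_LENGTH)
    if FORBIDDEN.search(text):
        raise ValueError("comment contains control characters")
    # Escape &, <, > and both quote characters so markup is shown, not interpreted
    return html.escape(text, quote=True)


print(clean_comment("  <b>Great</b> article & slides! "))
# &lt;b&gt;Great&lt;/b&gt; article &amp; slides!
```

Validating at the point of entry is the “prevention” half; escaping at output time remains necessary for any content that did not go through this filter.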
Loyalty: Use a consistent URI space for discoverability and trust
What happens when a restaurant that has been open for a few years at a given address closes? Residents of Tokyo will likely know the answer: it is replaced by a hair salon. But what happens to the restaurant’s faithful customers if no-one gives them the new address of their favorite spot? Many will just find another place to eat. A few may eagerly look for the restaurant’s new address. Meanwhile, in its new location, the restaurant staff will probably be struggling to build up a customer base from zero.
For a Web site too, a persistent URI space means visitors will keep coming back: they will bookmark, they will link. Search engines will discover the resources and tend to give extra credit to stable content that has been around for a while. On the Web, longevity creates a virtuous circle of popularity.
This does not mean that a site should be static and stale. The web world tends to proceed on a rather short life cycle, and we want sites that evolve and get reborn every few years with new, exciting styles and features. But that is perfectly compatible with keeping the URI space consistent, so long as the “brand new” site:
- knows the URI space of the old site
- maps each known URI to the equivalent updated content, through proxying, internal mapping or redirection
How can this be done?
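One common approach is to keep a table of the old site’s URIs and answer requests for them with a permanent redirect to their new home, rather than a bare 404. The Python sketch below shows the decision logic only; the paths and the function name are hypothetical. On Apache, the same mapping is typically expressed with Redirect permanent or RewriteRule directives.

```python
# Hypothetical mapping from the old site's URIs to their new locations
MOVED = {
    "/products.html": "/products/",
    "/about/contact.php": "/contact/",
}
# Hypothetical old URIs whose content was retired on purpose
GONE = {"/promo-2005.html"}


def resolve(path, current_paths):
    """Return (status, Location header or None) for a request on the new site."""
    if path in current_paths:
        return 200, None
    if path in MOVED:
        # 301 Moved Permanently: browsers follow it, and search engines and
        # link checkers can update their records to the new URI
        return 301, MOVED[path]
    if path in GONE:
        # 410 Gone: the resource was removed deliberately, not lost by accident
        return 410, None
    return 404, None


NEW_SITE = {"/", "/products/", "/contact/"}
print(resolve("/products.html", NEW_SITE))  # (301, '/products/')
```

Bookmarks and inbound links to the old URIs keep working, which is exactly the loyalty the restaurant lost.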
Honor: Use Web Standards for better business
Following the bushido, the way of the warrior, the samurai class of ancient Japan strove to achieve what they considered canonical virtues: Honesty, Respect, Benevolence, Courage, Rectitude, Loyalty, and Honor. But these values were not sought only as ideals: following them allowed the warrior class to grow strong, powerful, respected. Honored. Being virtuous and honest was the path to solid alliances: the virtues, not merely beautiful, were the way to achieve the samurai’s goals in old feudal Japan.
Today, Web Standards are considered by most of the Web industry to be “the way to go”. Yet adopting Web Standards merely for their beauty largely misses the point.
Web Standards and Web Architecture are not just pleasant, fuzzy concepts. They are solutions for building robust, efficient and long-lasting technologies:
- Rich, standard markup means easy, cheap reuse of Web content.
- A smart usage of HTTP standard technologies means saving a lot in infrastructure costs, and providing rich content to a diverse audience without needing to build, or pay for, complex systems.
- Standard technologies can be used for a smarter, more efficient quality process, saving labor cost, allowing engineers to focus on what matters.
- Persistent, well-addressed Web content means that your users will come back.
This is all practical business sense: making good content, building trust and brand attachment, building services on solid foundations… saving and making money. Web Standards lead to good business: such is the Way of Web Standards.
Show the Way
Is there anything we forgot? Anything we got wrong? If you have data, figures or experience proving or challenging the assertions of this article, we would love to hear about it.
Also, while this article shows some techniques and gives pointers to more, you certainly have your own techniques and practices. How about sharing them?
Please use the comments form below for your feedback and ideas.