W3C

HTML Classes of Products and Authoring

Rene Saarsoo has published a survey of Coding practices of Web pages. It contains a lot of very useful information for those trying to understand how the Web is authored in the wild. One of the major concerns of the HTML WG is to design HTML 5 in a way that is mostly compatible with what authors actually do on the Web.

It is not an easy task. There are different types of authors on the Web, and different types of requirements for different products. A while ago, I posted to the mailing list to try to work out some of the possible categories of products.

Web author (hand coding)

From the point of view of the author, HTML is a set of tags with a clearly defined meaning (e.g. ‘q’) or functional semantics (e.g. ‘a’). Sometimes the definitions given by previous specifications, books, and tutorials lead to misunderstandings, and the features are then not used properly. There are many categories of HTML hand coders, with different capabilities and knowledge. Some authors see HTML merely as a support for CSS, for example, and do not care much about meaning. Others are very precise and are frustrated by the lack of defined elements.

Web author (wysiwyg)

By far the most common kind of author on the Web, these people basically do not know what HTML is at all. Most of them use a form into which they type simple text, sometimes enriched with a JavaScript toolbar; some send HTML emails; some save their office documents as Web documents to be loaded by the CMS.

CMS developer, scripting libraries.

HTML is a language that, in the best case, has some nesting rules for tags and helps to put content on a Web page. It is a way to put bits of content coming from a database on the Web. It is very rare that the semantics are understood or even cared about. It is very rare for a CMS to include a quality process in the publishing step. Their conception is more of an HTML fragment than of a document.

Web authoring Wysiwyg tool

HTML is a very difficult thing to implement. Past specifications were not written with Wysiwyg tools in mind. These tools had to produce a document which respects the syntactic rules of the language, but there is little or no guidance on implementing the language at the UI level. Right now, we tend to define how to render much more than how to create.

Web Visual Browser

From the point of view of a visual Web browser (and of its developers), HTML is a blob of tags, most of the time not written very well. Browsers have to handle HTML, JavaScript, CSS rules, and plug-ins to give something mostly usable by a random person on the Web.

Assistive Technologies Browser

They see HTML as a powerful language for giving easy access to content to people who had no access to it in the past. Giving someone who is blind access to a paper book has a high cost; on the Web it becomes easy. Yet it is also difficult to implement a useful tool, because not many Web authors and CMSs care about accessibility. So the people using these browsers fill the gap themselves when they can, using their own skills and intelligence.

Strange world. It is not a uniform world. There are at least two big sub-classes:

Web search services (Yahoo!, MS Live, Google and Quaero)

These services need to parse Web content which is not only HTML, and which is mostly a few tags and a lot of content. They are interested in links and in some of the meaningful tags, but not that much.
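
As a rough illustration of this "links plus text" view of a page, here is a minimal Python sketch built on the standard library's `html.parser`. It is only an assumption about how such a service might skim a page, not a description of how any of the services named above actually work; the class name `LinkAndTextExtractor` and the helper `extract` are made up for this example.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Hypothetical skimmer: keeps link targets and text, ignores other markup."""

    def __init__(self):
        super().__init__()
        self.links = []       # href values found on <a> tags
        self.text_parts = []  # non-empty text chunks, in document order

    def handle_starttag(self, tag, attrs):
        # Only the 'a' element is treated as meaningful here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Note: this also picks up the content of script and style elements.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract(html):
    """Return (list of link targets, text joined with single spaces)."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


if __name__ == "__main__":
    sample = '<p>Hello <a href="/a">world</a> and <a href="http://example.org/">more</a></p>'
    print(extract(sample))
```

Everything except the `a` element is reduced to plain text here, which matches the "a few tags and a lot of content" point of view.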

Web search engines (ht://Dig, Nutch, etc.)

More skilled and more powerful, they are used on corporate, academic, and personal Web sites. They are crafted to index all kinds of metadata and semantics. For them, HTML is a fully meaningful language. It helps users get a more precise answer within the context of a corporate site. Initiatives like explicit data (RDFa, microformats), metadata in the head, etc. are very important for them. Some of these engines work on the desktop and are then a tool for desktop users (Apple's Spotlight, for example).

Validators, Conformance checkers, Helper tools

HTML is a set of rules and definitions that helps to determine whether a document contradicts these rules. Some of the rules can be checked easily by a machine; others are a lot more difficult.
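
To make the "easy for a machine" side concrete, here is a small Python sketch of two mechanical checks: an `img` element without an `alt` attribute, and an `id` value used more than once. It is a toy under stated assumptions, not a real conformance checker; `SimpleRuleChecker` and `check` are hypothetical names. Whether an `alt` text is actually a useful description of the image is the kind of rule it cannot decide.

```python
from html.parser import HTMLParser


class SimpleRuleChecker(HTMLParser):
    """Hypothetical checker for two rules a machine can verify mechanically."""

    def __init__(self):
        super().__init__()
        self.seen_ids = set()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Rule 1: every img element carries an alt attribute.
        if tag == "img" and "alt" not in attributes:
            self.problems.append("img without alt")
        # Rule 2: an id value appears at most once in the document.
        element_id = attributes.get("id")
        if element_id is not None:
            if element_id in self.seen_ids:
                self.problems.append(f"duplicate id: {element_id}")
            self.seen_ids.add(element_id)


def check(html):
    """Return the list of problems found, in document order."""
    checker = SimpleRuleChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = '<p id="x"><img src="a.png"></p><p id="x"><img src="b.png" alt="B"></p>'
    for problem in check(sample):
        print(problem)
```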

Other Specifications

HTML is a set of rules and syntactic constraints with defined semantics that can be used by, or encapsulated in, another technology.

2 thoughts on “HTML Classes of Products and Authoring”

  1. Standards, in short, must adapt to the ‘Chaos’ theory? [rhetorical] Glad I don’t have your job.

    Further evolution of two standards: HTML [chaos] and XHTML [linear and controlled]?
