Questions and Answers about the HDL Proposal

This document is a companion piece to the proposal for a Hypertext Delivery Language (HDL) to be developed upon an existing standard, SDL. It poses and answers some of the more basic questions relating to HDL and its relationship to other delivery methods.

What would be the relationship between HDL and authoring languages?

Like SDL, the HDL contemplated by the proposal is not intended to be an authoring format but rather the target of a conversion process from some other format optimized for authoring or data retrieval. The most obvious source format for HDL would be HTML, presumably HTML 3.0, which is here assumed to be a refinement of the existing HTML 2.0 without the addition of typographical controls. A conversion script to produce HDL from HTML is obviously a prerequisite to this plan, but such a script would be relatively easy to develop.

It may be wondered what advantages a conversion process would have over the implementation of format controls directly in HTML. There are several aspects to this question.

First, it is necessary to distinguish between the need to control the typography of documents in order to produce a distinctive appearance and the need to adopt a completely different appearance for each document. The former is quite common; the latter is quite rare (except in advertising). Typically, individual users and organizations seek to develop a distinctive look for their documents and demand the controls necessary to achieve the visual effect required to distinguish their documents from those produced by others. Once such an effect has been achieved, however, the distinctive look becomes associated with individual or corporate identity and is seldom changed. It is often necessary to develop a suite of styles for different subdocument types -- the different sections of a newspaper or magazine, for example -- but once developed, such styles change infrequently.

A second, complementary observation is that when styles do change, they tend to change massively, and such change creates legacy conversion problems if it cannot be accomplished at a single point of control. Style changes can most reliably be performed if the original format-independent tagging is preserved.

For both of these reasons, it is far better in practice for typographical specifications to be compartmentalized in a single place than to be scattered throughout source documents. This is why it has long been considered good practice in the publishing world to confine all typographical specifications to named sets in stylesheets rather than applying unnamed specifications to individual elements.

The conversion model for HDL production would implement this practice in the world of WWW delivery. Documents would be prepared in a markup language free of formatting instructions (HTML), thus decoupling them from future changes to their overall visual identity. Specific styles would be associated with logical elements during the conversion to HDL. Tuning the styles would be accomplished over an entire set of documents by changing the converter (which could be table-driven to make such changes easy) or for individual documents by changing the HDL output. The second alternative is made possible by the fact that styles in SDL are maintained in a separate section in each document, the Table of Semantics and Styles (TOSS). Changing the specification of a named style in the TOSS changes its behavior throughout the document.
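
The table-driven conversion described above can be illustrated with a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the style names, the style properties, and the output structure (a plain Python dictionary with a "styles" section standing in for a TOSS-like table) are hypothetical and do not represent actual SDL or HDL syntax. The sketch also assumes that every mapped HTML element is explicitly closed.

```python
from html.parser import HTMLParser

# Hypothetical conversion table: HTML element -> named style.
# Editing this table changes the styling of every converted document.
STYLE_TABLE = {
    "h1": "title",
    "h2": "section-head",
    "p": "body",
}

# Hypothetical style definitions, standing in for a TOSS-like section.
STYLE_DEFS = {
    "title": {"font-size": "18pt", "weight": "bold"},
    "section-head": {"font-size": "14pt", "weight": "bold"},
    "body": {"font-size": "10pt", "weight": "medium"},
}


class TableDrivenConverter(HTMLParser):
    """Collects (style_name, text) blocks for elements listed in the table."""

    def __init__(self, style_table):
        super().__init__()
        self.style_table = style_table
        self.blocks = []
        self._current = None  # style name of the open mapped element, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.style_table:
            self._current = self.style_table[tag]
            self._text = []

    def handle_data(self, data):
        # Text inside unmapped inline elements is kept; their tags are dropped.
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and self.style_table.get(tag) == self._current:
            self.blocks.append((self._current, "".join(self._text).strip()))
            self._current = None


def convert(html, style_table=STYLE_TABLE, style_defs=STYLE_DEFS):
    """Return the styles actually used plus the styled blocks of the document."""
    parser = TableDrivenConverter(style_table)
    parser.feed(html)
    parser.close()
    used = sorted({name for name, _ in parser.blocks})
    return {
        "styles": {name: style_defs[name] for name in used},
        "blocks": parser.blocks,
    }
```

In this sketch, changing an entry in STYLE_TABLE or STYLE_DEFS and rerunning the conversion restyles an entire document set, while editing the "styles" section of one converted result changes only that document, which mirrors the two tuning alternatives described above.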

A third, unrelated reason for preferring a conversion model is that HDL could provide a single, standard target format for an unlimited number of source formats, not just HTML and its derivatives. The possible source formats could include large, general-purpose markup languages such as ISO 12083; industry markup languages such as DocBook; or non-structured input formats that are even simpler and easier to use than HTML. Anything that could serve as input to a conversion script could serve as source material for HDL delivery.

Perhaps the most significant of these alternative sources is the huge amount of legacy data currently in word processing and desktop publishing formats. HDL would be particularly well designed to serve as the target format for such data. It is possible to use HTML as the target for a conversion from flat, format-rich legacy data, but it is not well suited to the purpose and is incapable (as it should be) of expressing the look of the original. HDL, on the other hand, could capture almost all of the original formatting information and present legacy documents online in a form that expressed the intention of the publisher as well as could be accomplished within the limitations of the medium.

Why not a page description language instead?

People who have not had much experience with online document delivery in real-world, cross-platform situations often wonder why documents can't simply be served out in a fixed format such as PostScript or RTF. The answer, briefly, is that it is not possible to achieve complete page fidelity -- that is, to completely capture and retain the look of a particular page layout -- and still make the document usable across platforms with different fonts, display sizes, and window aspect ratios.

Vendors of page-oriented rendering engines such as Acrobat presume that the user will be satisfied to allow the document to occupy the display completely. This is true only in demo situations and in very limited, single-tasking operating environments. Full-function operating environments of the present, and all common operating environments of the future, will present a completely different situation. In environments that allow it, users quickly learn to work in a multitasking world controlled through multiple windows. And their first requirement is that those windows be capable of arbitrary, user-definable shapes and sizes. Documents that depend on a fixed page display don't work in such environments. They also don't work for people with special presentational needs (e.g., large type and Braille) and on devices with unusual display requirements (e.g., PDAs and airline cockpit displays).

Discussions of this point often divide people into two camps: one that insists on completely format-independent delivery and another that insists on complete page fidelity. The truth, borne out by several years of successful online publishing in high-level hypertext systems, is that a great deal of typographical design -- perhaps 70 to 80 percent of the basic visual information traditionally specified by publishers -- can and should remain under the control of the designer. But there is a significant part that can't be specified in cross-platform online environments, and that is the part relating to page geometry. The typographic controls that still make sense in online environments are the ones addressed by the stylesheet languages of universal SGML browsers and by the style attributes that would be included in HDL.

How does HDL fit into the larger picture?

Any discussion of the future of HTML must take place in the context of the imminent arrival of universal SGML Web browsers. An HTML browser such as Mosaic is a browser that can parse and present documents written in a single ISO 8879-compliant markup language, HTML. Similarly, an SDL browser such as the CDE help viewer is a browser that can parse and present documents written in a single ISO 8879-compliant markup language, SDL. By contrast, a universal SGML browser such as DynaText is one that can parse and present documents written in _any_ ISO 8879-compliant markup language.

Universal SGML browsers have been available for some time, but they were sold only as components of expensive SGML publishing systems and were not Web-aware. The first free universal SGML Web browser, SoftQuad's Panorama, was announced and demonstrated at the WWW '94 conference in Chicago. It will be bundled with future releases of NCSA Mosaic. The appearance of Panorama and similar Web browsers capable of delivering any ISO 8879-compliant markup completely changes the landscape of WWW authoring and document delivery.

It may be helpful to distinguish different viewer technologies along the axis of "hardwired" versus "programmable" in two key areas: tag sets and style definitions. Categorized in this way, HTML, SDL/HDL, and universal SGML browsers can be placed in a hierarchy of increasing functionality for the document publisher.

First level: hardwired tag set, hardwired styles (HTML browsers)

Second level: hardwired tag set, programmable styles (SDL/HDL browsers)

Third level: programmable tag set, programmable styles (universal SGML browsers)

This picture is easiest to understand if we look first at the two extremes. Tools at the first level (HTML browsers) deny the publisher any control over the basic rules of document structure or the typographical behavior associated with a given element; both the markup language and the styles are determined by the developers of the browser and are bound in at compile time. Tools at the third level (universal SGML browsers) put complete control over the structural rules and typographical behavior in the hands of the document publisher; both the definition of the markup language and the styles associated with each element are specified in control files that are loaded with the document at run time or kept available in a local cache.

The proposed HDL would occupy a space between these two ends of the functionality spectrum. Like HTML, the HDL markup language would be hardwired into the browser, but styles would be definable on a document-by-document basis by the publisher. The key difference between style specification in HDL and style specification in universal SGML browsers is that HDL styles would be encapsulated in the document itself, whereas styles in universal SGML browsers are generally specified in separate stylesheets that are expected to be cached after the first document of a given type has been downloaded from the server.
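
The three ways of locating a style can be contrasted in a short Python sketch. It is illustrative only: the function names, the dictionary layout of a document, the style strings, and the fetch callback are all hypothetical, and no real browser is claimed to work this way internally.

```python
# First level: styles are compiled into the browser (hypothetical values).
BUILTIN_STYLES = {"title": "bold 18pt", "body": "medium 10pt"}


def hardwired_style(element):
    """HTML-browser case: the publisher cannot change the result."""
    return BUILTIN_STYLES[element]


def embedded_style(element, document):
    """HDL-like case: each document carries its own style table."""
    return document["styles"][element]


class StylesheetCache:
    """Universal-browser case: one stylesheet per document type,
    fetched once and then reused from a local cache."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callback: doctype -> stylesheet
        self._cache = {}
        self.fetch_count = 0

    def get(self, doctype):
        if doctype not in self._cache:
            self._cache[doctype] = self._fetch(doctype)
            self.fetch_count += 1
        return self._cache[doctype]


def cached_style(element, document, cache):
    """Look the element up in the stylesheet for the document's type."""
    return cache.get(document["doctype"])[element]
```

With embedded styles, two documents of the same type can look different without any further request to the server; with cached stylesheets, all documents of a type share one look after a single fetch.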

Why HDL in a world of universal SGML browsers?

The key question raised by this analysis is why an SDL-based approach should be proposed in a world where free universal SGML Web browsers will soon be widely available. There are several reasons to support HDL as at least an interim standard.

First, HDL would be more efficient for online document delivery than raw SGML served out with stylesheet files. The overhead needed to parse and render a fixed tag set like HDL is much less than what is needed to parse and render arbitrary markup. This is admittedly a short-term concern, but a real one.

A corollary is that the relative simplicity of the HDL delivery standard would keep the level of programming needed to implement a Web browser within reach of the entrepreneurs who have been so successful in making the WWW a reality over the past year and a half. Universal SGML browsers are vastly more complex than browsers built around a hardwired tag set, and the programming effort needed to compete directly with them is simply beyond the reach of most independent developers. The proposed functional split would allow cooperative efforts to be divided between content production tools and delivery tools, keeping both aspects within manageable bounds. Standardization on HDL as a browser standard would level the playing field for smaller players, encourage the development of free conversion tools, and keep the Web open for experimentation and innovation a while longer.

A third reason to propose HDL is that, as noted above, it is so well suited to the quick conversion of legacy data produced in word processing and desktop publishing formats. It is entirely possible to implement a suitably flat, format-rich markup language to accomplish this purpose in a universal SGML browser, but such a language would end up as no more than a clone of HDL. An HDL browser could provide the same results with much less programming overhead.

A final reason for preferring HDL as a Web delivery standard is that it would standardize HDL's internal method of style specification. The corresponding single standard for the stylesheet language that is most likely to be used by universal SGML viewers, DSSSL, is still about a year away from general implementation. The SGML viewers can work around this lack of standardization by remapping each other's stylesheet specifications on the fly, but for the interim, HDL's single format would give it an additional performance advantage.

Having said all this, it must be acknowledged that there is no capability of the proposed HDL that cannot be implemented, and will not be implemented, in universal SGML browsers. The most important reason for standardizing on HDL is to give a coherent focus to the current rather poorly received efforts to add formatting controls to HTML, efforts that go contrary to the design principles of the language. In the worst case, however, failure to standardize on HDL will simply hasten the adoption of universal SGML browsers. The biggest losers in such a scenario will be the current developers of HTML tools, not the larger WWW community.

Jon Bosak, Novell Corporate Publishing Services     jb@novell.com