HTML Futures

Joel Nava, Bruce Hunt, T.V. Raman

Adobe Systems Inc.

Abstract

Adobe believes that the future of HTML lies in retaining the bulk of HTML's behavioral semantics, its tag set, and the associated CSS standard, while moving to a fully XML-compliant syntax. In our view, this evolution should take a phased approach. In the first phase, HTML is enhanced minimally to accommodate XML. In the second phase, HTML gains a well-formed XML syntax alongside its current syntax(es). In the third phase, the semantics of HTML are exposed uniformly through an XML document object model via CSS, so that any tag can access the semantic behavior of HTML; in this way, XML integrates with HTML. In the last phase, HTML is subsumed by XML, while its semantics (layout behaviors) are retained as a presentation capability. We believe this approach provides a smooth migration path for HTML's future.

Introduction

All languages grow and change. HTML continues to be a useful content expression language that powers the Web's growth. It is now time for HTML to mature and to support both content-driven and layout-driven forms across the sensate spaces. In an ideal world, HTML would have a simpler, more coherent syntax with well-specified content flow semantics. In that same world, non-conflicting custom tag sets would capture application-specific structure and allow it to be presented in a variety of formats, each constrained to present the information in the manner best perceived by the content consumer. This would include the ability to select precision rendering in any of the sensate spaces (visual, auditory, tactile, or even smell!), driven by either content-driven or layout-driven forms. A dynamically drivable precision graphics capability, one that identifies and exposes graphics structure along with document structure, not only provides the basis for superb graphic layout but also holds the key to rendering graphics in other sensate spaces. To achieve this kind of high-utility growth and to permit standardized, interpreted semantic tag sets, HTML should be aligned with the work that opens up these new vistas: XML. It is time to plan an enhancement of HTML that achieves these goals.

We believe it is now time to adopt a plan that moves HTML forward and keeps it at the center of content presentation, leveraging its widespread acceptance. To drive HTML forward, the changes must provide a seamless path that retains HTML's utility while transitioning it to embrace the extensibility of XML. In addition, HTML should become the long-term method for content-driven document rendering on the Web, interfacing seamlessly with the now-emerging layout-driven document rendering method for the Web.

Achieving these goals requires a plan. The proposed plan is a series of phases, each of which ends when its goals have been met. First, HTML must accommodate XML with a view toward engagement. Next, HTML must be integrated with XML. Then classic HTML must be assimilated into its XML form, until HTML is finally subsumed by XML. The transition can occur in the following phases.

Accommodate

The goal of the Accommodate phase is to provide an XML injection facility for down-level browsers, to provide an XML portal in next-generation browsers, and to ready HTML for integration into an XML-based browser.

Inserting XML into an HTML page for a down-level browser was considered at length at the XML in HTML workshop. The conclusion of that meeting was to use the <SCRIPT> tag, but not to encourage that use. In this phase, embedding XML in down-level browsers is a significant concern, because it lets early adopters experiment with XML in a production prototype context, determine the tactical utility of extensibility, and plan for more accessible and regular semantics.
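To make the SCRIPT-based approach concrete, the following Python sketch pulls XML islands out of an HTML page and parses each one as XML. It relies on the fact that Python's standard html.parser treats SCRIPT content as raw text, so the island reaches the handler verbatim. The type value "text/xml" and the class and function names are assumptions made for illustration; the workshop conclusion only names the SCRIPT tag.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class XMLIslandExtractor(HTMLParser):
    """Collect XML carried inside <SCRIPT type="text/xml"> elements.

    The "text/xml" type value is an illustrative assumption.
    """

    def __init__(self):
        super().__init__()
        self.islands = []        # raw XML strings, one per island
        self._in_island = False  # True while inside an XML-bearing SCRIPT
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "script" and dict(attrs).get("type") == "text/xml":
            self._in_island = True
            self._buffer = []

    def handle_data(self, data):
        # SCRIPT content arrives unparsed, so the XML text is intact.
        if self._in_island:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_island:
            self.islands.append("".join(self._buffer).strip())
            self._in_island = False


def extract_islands(html_text):
    """Return each embedded XML island parsed as an ElementTree element."""
    parser = XMLIslandExtractor()
    parser.feed(html_text)
    parser.close()
    return [ET.fromstring(xml_text) for xml_text in parser.islands]


page = """<HTML><BODY>
<P>Classic HTML around the island.
<SCRIPT>var counter = 1;</SCRIPT>
<SCRIPT type="text/xml"><order id="42"><item qty="2">widget</item></order></SCRIPT>
</BODY></HTML>"""

roots = extract_islands(page)
print(roots[0].tag, roots[0].get("id"), roots[0].find("item").text)  # order 42 widget
```

An ordinary script (the untyped SCRIPT above) is skipped, so only the island is handed to the XML parser; the surrounding classic HTML never needs to be well formed.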

The XML in HTML workshop recommended that an XML tag be added to HTML to provide the portal. This is the key to accommodation and sets the stage for integrating XML with HTML. This approach yields islands of XML in a sea of HTML.

Substantial work would be undertaken to articulate clearly and describe precisely the semantics of HTML's tags, including the general flow model. This work is motivated by the goal of rationalizing the behavior of HTML across browsers so that a more predictable layout model and a unified tag set can be achieved. It would yield the specification of the HTML functionality that the DOM would describe. It may also expose a need for additional tags representing individual functions that were previously composed into a single function and represented by a single tag. This expanded syntax and its semantics are referred to as canonical HTML syntax and semantics, while standard HTML versions 4.0 and below are referred to as classic HTML.

Additional work is undertaken to provide a precision graphics layout tag set and semantics, as well as styles, templates, and tools. Wherever possible, core functionality is shared between the precision graphics semantics and canonical HTML semantics to minimize the number of constructs required in the DOM.

In this phase, XML usage is experimental and limited to prototypes. It is not used for mainstream Web pages.

This phase concludes when:

Integrate

The goal of the integration phase is to complete the integration of HTML syntax and semantics into an XML-processor-based browser that also supports classic HTML.

In the Integration phase, HTML's canonical syntax is recast in an XML form. The semantics normally associated with HTML are completely expressible in CSS (which is then also expressed in an XML syntax). The HTML namespace includes the tags normally associated with HTML, and each tag has a default association with its canonical semantics, now expressed as a CSS style or attribute. Browsers support classic HTML as well as the new well-formed HTML namespace, referred to as modern HTML. In this phase, classic HTML appears as islands in a sea of XML, and modern HTML is just another XML namespace.

Authoring tools deprecate the use of classic HTML, and many tools are created to recast classic HTML into modern (XML) HTML. For modern HTML, authoring tools should be able to provide more precision and better cross-browser control, as well as increase the utility and accessibility of the content it represents.

Any tag in any (XML) namespace can access canonical HTML semantics by being assigned a CSS style. In addition, a mechanism is provided to associate a script with any tag in a namespace; the script can access any function exposed by the DOM. All functionality available in CSS and PGML is, of course, exposed in the DOM. Because every browser provides this functionality, these mechanisms let all namespaces tap the rich presentation semantics. We further expect functionality such as that provided by aural CSS to become standard and to be exposed by the DOM. Browsers provide the core interpreting engines for the presentation semantics wrapped up in namespace tag sets. As a consequence, a wider variety of presentations, in many different types of media, is routinely delivered on the Web.
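The following Python sketch illustrates, under stated assumptions, how a default per-tag association in the HTML namespace and a style assignment on a foreign tag could combine. We borrow the XHTML namespace URI purely to stand in for "the HTML namespace"; urn:example:orders is a made-up vocabulary, and the tables and function names are hypothetical. The cascade is greatly simplified: the CSS initial value, then HTML defaults, then author rules.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace URIs (assumptions, not part of any proposal).
HTML_NS = "http://www.w3.org/1999/xhtml"
ORDERS_NS = "urn:example:orders"

# Hypothetical default stylesheet: each tag in the HTML namespace is bound
# to its canonical semantics, expressed here as CSS declarations.
HTML_DEFAULTS = {
    "body": {"display": "block"},
    "p": {"display": "block", "margin-top": "1em", "margin-bottom": "1em"},
    "em": {"font-style": "italic"},
    "li": {"display": "list-item"},
}

# Hypothetical author rules keyed by ElementTree's "{uri}local" tag names:
# a tag from another namespace borrows HTML list-item semantics via style alone.
AUTHOR_RULES = {
    f"{{{ORDERS_NS}}}item": HTML_DEFAULTS["li"],
}


def split_tag(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def computed_style(element):
    """Simplified cascade: initial value, then HTML defaults, then author rules."""
    uri, local = split_tag(element.tag)
    style = {"display": "inline"}  # initial value of the CSS 'display' property
    if uri == HTML_NS:
        style.update(HTML_DEFAULTS.get(local, {}))
    style.update(AUTHOR_RULES.get(element.tag, {}))
    return style


doc = ET.fromstring(
    f'<body xmlns="{HTML_NS}" xmlns:o="{ORDERS_NS}">'
    "<p>Order <em>42</em></p><o:item>widget</o:item></body>"
)
item = doc.find(f"{{{ORDERS_NS}}}item")
print(computed_style(item))  # {'display': 'list-item'}
```

The point of the design is that the foreign `o:item` element gets list-item behavior without any HTML tag being involved: the semantics live in the style tables, and the HTML namespace is just the namespace whose tags carry those styles by default.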

The integration phase lasts until XML authoring tools that produce modern HTML and PGML, and that rely on the standardized semantics of the DOM, dominate the authoring market. Although classic HTML pages still dominate by number, new pages are almost always constructed in modern HTML. The number of classic HTML pages on the Web peaks, and their creation rate steadily declines.

Assimilate

The goal of this phase is to assimilate classic HTML into the modern HTML namespace.

Achieving this will require processing tools that convert classic HTML to canonical HTML and thence to modern HTML. These tools are likely to use the classic HTML parsing engines of current browsers, plus some additional code to convert the parse tree to canonical HTML and then re-export it as modern HTML.
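As a rough illustration of the re-export step, the Python sketch below uses the standard library's html.parser as a stand-in for a browser's classic HTML parsing engine and writes the parse back out as well-formed markup. It performs only a few canonicalizing steps (lowercase names, quoted attribute values, self-closed empty elements, and implied end tags for a handful of elements such as LI and P). The element lists and the class and function names are our own assumptions; a real converter would need the full canonical HTML semantics.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content in classic HTML; emitted self-closed.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Elements whose end tag classic HTML lets authors omit before a sibling.
IMPLIED_END = {"p", "li", "dt", "dd", "tr", "td", "th", "option"}


class ClassicToModern(HTMLParser):
    """Re-export a classic HTML parse as well-formed (XML) markup.

    Comments and DOCTYPE declarations are silently dropped.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the well-formed output
        self.stack = []  # currently open (non-void) elements

    def _close_through(self, tag):
        # Emit end tags for open elements down to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        # Simplification: imply an end tag only when the same element is
        # the innermost open one (e.g. <LI>a<LI>b).
        if tag in IMPLIED_END and self.stack and self.stack[-1] == tag:
            self._close_through(tag)
        # Minimized attributes (e.g. CHECKED) get their name as the value.
        rendered = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # End tags for void elements and stray end tags are dropped.
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        # Character references are already decoded; re-escape for XML.
        self.out.append(escape(data))


def classic_to_modern(classic_html):
    converter = ClassicToModern()
    converter.feed(classic_html)
    converter.close()
    while converter.stack:  # close anything left open at end of input
        converter.out.append(f"</{converter.stack.pop()}>")
    return "".join(converter.out)


print(classic_to_modern("<UL><LI>One<LI>Two &amp; three<BR></UL><P CLASS=note>Done"))
# <ul><li>One</li><li>Two &amp; three<br/></li></ul><p class="note">Done</p>
```

The stack of open elements is what turns classic HTML's omitted end tags into explicit ones; everything else is spelling (case, quoting, self-closing) rather than structure.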

Work may continue on additional requirements for content-driven layout semantics, which are naturally expressed as HTML. These changes are not expected to substantially affect the overall phased plan.

In this phase, there is a substantial opportunity for tools vendors to provide styles, templates, and custom semantic tag sets that deliver new vertical applications to many different groups. This should enhance the Web, continue to open it to multiple cultures, and significantly widen the accessibility of the information it provides.

At this stage, authoring tools provide conversion support from classic HTML to modern HTML but no longer write out classic HTML. Site management tools examine sites for classic HTML so that it can be converted to modern HTML. Browsers continue to support classic HTML.
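A site management tool's check for classic HTML could start from a simple heuristic: a page that parses as well-formed XML is treated as modern HTML, and anything else is flagged for conversion. The Python sketch below applies that heuristic to a hypothetical in-memory site and also reports the fraction of classic pages, which is the quantity the end of this phase is measured by. The function names and the `pages` mapping are assumptions; well-formedness is only a proxy, since it says nothing about whether a page uses the modern HTML namespace.

```python
import xml.etree.ElementTree as ET


def is_well_formed(page_text):
    """Heuristic: a page that parses as well-formed XML counts as modern HTML."""
    try:
        ET.fromstring(page_text)
    except ET.ParseError:
        return False
    return True


def audit_site(pages):
    """Return (paths still in classic HTML, fraction of pages that are classic).

    `pages` is a hypothetical mapping of path -> page text; a real site
    management tool would crawl the site instead.
    """
    classic = sorted(path for path, text in pages.items() if not is_well_formed(text))
    fraction = len(classic) / len(pages) if pages else 0.0
    return classic, fraction


site = {
    "index.html": "<html><body><p>Welcome</p></body></html>",
    "old.html": "<HTML><BODY><P>Unclosed paragraph</BODY></HTML>",
    "news.html": "<html><body><p>Fish &amp; chips<br/></p></body></html>",
    "legacy.html": "<p>Line one<br>Line two</p>",
}
print(audit_site(site))  # (['legacy.html', 'old.html'], 0.5)
```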

The Assimilation phase lasts until classic HTML wanes to a small fraction (10% to 20%) of the pages on the Web.

Subsume

The goal of this phase is to complete the transition of HTML on the Web to fully XML syntax and semantics.

By this phase, modern HTML has subsumed classic HTML.

In this phase, browsers no longer support classic HTML and treat it as an error. Independent conversion tools remain, but authoring tools no longer provide support for converting classic HTML.

HTML, PGML, CSS and XSL provide the core representations for presentation semantics available on the Web through the DOM.

Conclusion

Classic HTML has been a major component in the emergence of a worldwide content communication system. It is now time to move HTML forward so that it becomes one of several tools for presenting content. Classic HTML should be transformed into modern HTML: an XML namespace with associated canonical semantics. The semantics of canonical HTML, along with other standardized semantics, should be available to all XML tag sets via the DOM. In this way, a set of focused, rich vertical vocabularies can arise that make communication efficient and fast while also permitting widespread understanding and dynamic re-presentation for the widest possible access. To achieve this, we have proposed a plan that moves HTML from accommodation to integration to assimilation, until it is finally subsumed.