Position Paper for the W3C Device Independence Working Group Workshop on Metadata for Content Adaptation, Dublin, October 2004

Summary

As part of the SEKT project we have developed a software platform that supports device independence by using declarative templates to format structured data. Device characteristics are specified using the CC/PP UAProf standard. We aim to extend this by creating an ontology of devices using RDFS and OWL, and by combining it with information about user preferences to provide a more meaningful set of attributes for device identification. The key ideas are the use of generic XML as a device independent data format, and the use of purely declarative templates to format this data. The selection of templates, and the selection of sections within templates, can be made conditional upon attributes of the current device profile.

Introduction

The aim of device independence is to provide an effective and functional presentation of the same resource on different devices. In the context of the World Wide Web, devices include PCs, laptops, PDAs, WAP phones and printers, and resources are located at URLs.

The key requirements are to:

Identify the relevant characteristics of the device.
Select appropriate structured content.
Arrange the content in a suitable layout.
Present the layout in a suitable style.

This document describes the Device-independent Web Architecture Framework (DWARF), which allows authors to build device independent web applications from four basic components:

Device characteristics held in device profiles.
Content held in a database.
Layout defined in templates.
Style defined in style sheets.

The framework uses DELI to handle CC/PP profiles, XML to store structured page content, and CSS to define style sheets. The focus of this document is the core architecture for the selection of content and its arrangement in different layouts.

Philosophy

The DWARF architecture is based upon the use of declarative templates to select and arrange data content. Selection and arrangement take place at two levels. On the first level, identical content can be displayed in different ways by selecting different templates, which can provide a different structure and even a different presentation language (e.g. HTML or WML). On the second level, minor modifications such as the addition or omission of individual elements can be made within a single template. At both levels, the selection is based upon conditions associated with the template or template element. These conditions are evaluated against the request context, which provides pointers to both the current device and the data currently requested.

A critique of XSL

The use of templates is analogous to the use of XSLT style sheets to format data, but with significant differences. XSLT has the power of a Turing machine, and can carry out complex processing tasks such as tree inversion, sorting and numerical analysis, but it suffers from the serious drawback that it combines programming with data and design. Consequently, on the one hand it is too complex to be authored by non-technical users. On the other hand, its awkward syntax and semantics make it a poor choice as a data processing language. Finally, it causes considerable difficulty in reuse, since it fails to separate data from code.

XSLT is based upon the assumption that business data will be held in XML format, in some XML database, and can be extracted, processed and formatted for output in a single step. In practice, however, this turns out to be a poor model. Firstly, combining extraction, processing and formatting in a single step is bad software engineering practice, for the reasons mentioned above. Secondly, XML is a poor choice for data storage in most applications. On the one hand it is verbose, which has serious consequences for large databases. On the other hand it is tree structured, whereas most data is cyclical, making it unsuitable for anything but the most trivial data model.

XML as a device independent data format

Although business data is normally cyclical, information as presented on a page is always tree structured, since text is by its nature serial. XML therefore makes an excellent device independent data format, sitting between extraction and processing on one side and output on the other. In our approach, the input to the display engine is structured, processed data. This data is matched against purely declarative templates containing just two primitives, sections and variables, corresponding to XML elements and XML attributes. This MVC (Model View Controller) architecture separates data storage, processing and display, with great benefits for simplicity, reuse and maintainability. In simple standalone web applications the data may be supplied directly in the form of XML. In more complex applications, data may be generated by an SQL query, or built dynamically by the application. This differs from other template-based approaches, which use specific metadata tags to control layout generation.

Example

The key component of the framework is the template-matching algorithm, which matches tree-structured data against templates. The data and templates below illustrate the idea.

Catalog.xml
<Catalogue>
	<CatalogueInfo date="02/09/2004">
		Nice Nurseries Spring Collection
	</CatalogueInfo>
	<Item code="123" image="crocus" name="Crocus" type="bulb">
		An attractive spring flowering bulb
	</Item>
	<Item code="456" image="clematis" name="Clematis" type="bulb">
		A vigorous and hardy climber
	</Item>
</Catalogue>

Template1.htm
<html>
	<section name="Item">
	<P><b>@@name@@</b>
	<br>@@code@@
	<br>@@Item@@
	</section>
	
	<section name="Item" condition="ccppAccept='image/jpeg'">
	<img src="@@image@@.jpg"/>
	</section>
	<section name="Item" condition="ccppAccept='image/gif'">
	<img src="@@image@@.gif"/>
	</section>
	</html>

Template2.htm
<wml>
<section name="CatalogueInfo">@@CatalogueInfo@@</section>

<section name="Item" condition="type='bulb'">
<br>@@code@@ @@name@@
</section>

<section name="CatalogueInfo">@@date@@</section>
</wml>

In the example above the data is represented as XML, and consists of a single "CatalogueInfo" element and a number of "Item" elements, each with associated attributes. The templates each consist of a number of sections, corresponding to element names, and variables, corresponding to attribute names. For example, the first template has no section displaying "CatalogueInfo" data, a single section displaying "Item" data, and a choice of sections for displaying an image, depending on the formats accepted by the current device. Assuming that the current device accepts jpeg format images, the output from template1 is shown below.

<html>
	<P><b>Crocus</b>
	<br>123
	<br> An attractive spring flowering bulb
	<img src="crocus.jpg"/>
	
	<P><b>Clematis</b>
	<br>456
	<br>A vigorous and hardy climber
	<img src="clematis.jpg"/>
</html>

Data elements and attributes may be omitted or repeated in the template, and elements may have conditions attached to them. The data elements and corresponding sections may be nested to any level, or may be recursive, and the conditions may be Boolean expressions using the AND, OR and NOT operators. Notice the use of the @@Item@@ attribute. This is a feature of the XML interface, which treats text elements as attributes of the parent element, with the attribute name set to the parent element's name.
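The treatment of text as an attribute of its parent element can be illustrated with a minimal Python sketch. It is not the DWARF implementation: the function name load_node and the dictionary layout are assumptions made for illustration, and only the rule that text is stored under the parent element's name comes from the description above.

```python
import xml.etree.ElementTree as ET

def load_node(element):
    """Convert an XML element into a plain tree node (hypothetical layout).

    Text content is stored as an attribute named after the element
    itself, which is what makes @@Item@@ resolvable in a template.
    """
    attributes = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        attributes[element.tag] = text
    children = [load_node(child) for child in element]
    return {"name": element.tag, "attributes": attributes, "children": children}

CATALOGUE = """
<Catalogue>
  <CatalogueInfo date="02/09/2004">Nice Nurseries Spring Collection</CatalogueInfo>
  <Item code="123" image="crocus" name="Crocus" type="bulb">
    An attractive spring flowering bulb
  </Item>
</Catalogue>
"""

root = load_node(ET.fromstring(CATALOGUE.strip()))
info, item = root["children"]
print(item["attributes"]["Item"])
```

With this representation a template variable needs only a single dictionary lookup, whether it names a true XML attribute such as "code" or the element text.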

The DWARF template-matching algorithm matches the data against the template so that each template section is repeated for each data element that matches the section name and conditions. In the first template the condition on the second section refers to the "ccppAccept" attribute of the device profile. This section will only be rendered if the profile of the current device allows jpeg images to be displayed. In contrast, in the second template, the condition refers to the "type" attribute of the catalogue data, acting as a crude query facility on the data. There is no distinction between the sources of these attributes in the conditions.
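The matching rule described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the DWARF algorithm itself: sections are given as already-parsed dictionaries, a condition is a single attribute='value' test rather than a full Boolean expression, nesting is ignored, and the output is produced section by section. All function names are hypothetical.

```python
import re

def lookup(name, attributes, profile):
    """Resolve a name from the data element first, then the device profile."""
    if name in attributes:
        return attributes[name]
    return profile.get(name)

def condition_holds(condition, attributes, profile):
    """Evaluate one (attribute, value) test; None means unconditional."""
    if condition is None:
        return True
    name, expected = condition
    actual = lookup(name, attributes, profile)
    if isinstance(actual, (list, tuple, set)):
        return expected in actual  # multi-valued, e.g. accepted MIME types
    return actual == expected

def substitute(body, attributes):
    """Replace each @@name@@ variable with the matching data attribute."""
    return re.sub(r"@@(\w+)@@", lambda m: str(attributes.get(m.group(1), "")), body)

def render(sections, elements, profile):
    """Repeat each section for every data element matching its name and condition."""
    lines = []
    for section in sections:
        for element in elements:
            if element["name"] != section["name"]:
                continue
            if condition_holds(section["condition"], element["attributes"], profile):
                lines.append(substitute(section["body"], element["attributes"]))
    return lines

elements = [
    {"name": "CatalogueInfo", "attributes": {"date": "02/09/2004"}},
    {"name": "Item", "attributes": {"code": "123", "image": "crocus", "name": "Crocus", "type": "bulb"}},
    {"name": "Item", "attributes": {"code": "456", "image": "clematis", "name": "Clematis", "type": "bulb"}},
]
sections = [
    {"name": "Item", "condition": ("type", "bulb"), "body": "<br>@@code@@ @@name@@"},
    {"name": "Item", "condition": ("ccppAccept", "image/jpeg"), "body": '<img src="@@image@@.jpg"/>'},
    {"name": "Item", "condition": ("ccppAccept", "image/gif"), "body": '<img src="@@image@@.gif"/>'},
    {"name": "CatalogueInfo", "condition": None, "body": "@@date@@"},
]
profile = {"ccppAccept": ["image/jpeg", "text/html"]}  # hypothetical device profile
result = render(sections, elements, profile)
print("\n".join(result))
```

Because lookup consults the data element and the device profile through the same call, a condition on "type" and a condition on "ccppAccept" are evaluated identically, which mirrors the lack of distinction between attribute sources.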

How does the software select the appropriate template? The template file names are listed in a configuration file, and each template may have a Boolean condition associated with it.

Templates.xml
<templates>
	<group name="catalogue">
		<template filename="template1.htm" condition="InputEnabled"/>
		<template filename="template2.htm" />
	</group>
</templates>

Templates are placed into groups, each representing alternative presentations of the same content. The software selects the first template listed whose condition is met when evaluated in the current request context. The overall effect of this data, template group and configuration file is that if the current device is input enabled (i.e. not a WAP phone), the first template will be displayed, and the jpeg version of the image, the gif version, or neither will be shown depending on the capabilities of the device. If the device is not input enabled, the second template will be displayed. This shows additional metadata about the catalogue, and only displays entries about bulbs.
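The first-match selection rule can be sketched in a few lines of Python. The sketch assumes that a condition is a single Boolean attribute name looked up in the device profile, which is sufficient for the configuration file shown above; the function select_template and the dictionary form of the profile are hypothetical.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<templates>
  <group name="catalogue">
    <template filename="template1.htm" condition="InputEnabled"/>
    <template filename="template2.htm"/>
  </group>
</templates>
"""

def select_template(config_xml, group_name, profile):
    """Return the first template in the group whose condition is met.

    A template without a condition always matches, so it acts as the
    default when placed last in its group.
    """
    root = ET.fromstring(config_xml.strip())
    for group in root.findall("group"):
        if group.get("name") != group_name:
            continue
        for template in group.findall("template"):
            condition = template.get("condition")
            if condition is None or bool(profile.get(condition)):
                return template.get("filename")
    return None

desktop = select_template(CONFIG, "catalogue", {"InputEnabled": True})
phone = select_template(CONFIG, "catalogue", {"InputEnabled": False})
print(desktop, phone)
```

Since the order of entries decides the outcome, the more specific templates should be listed before the unconditional fallback.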

Comparison with similar approaches

There are a number of alternative solutions on the market, including MobileAware and Volantis, that similarly make use of templates to reformat data. In each of these applications, an extension to HTML is defined which allows the page author to embed device independence information in the HTML file. The software described here is a generalisation of these solutions, in that users may define their own tags through the section names in the templates. Both the MobileAware and Volantis solutions can be implemented in DWARF by constructing appropriate templates.

Another advantage of the current approach is the facility to select different templates, as well as to select content within templates. This simplifies the template files, which need only contain the sections required for a particular device.

A third advantage of the DWARF approach is that it is not tied to XML or HTML. Content is represented internally as a tree, and this data can be read in directly from XML, but it is also possible to provide an SQL interface to the data.

Future Work

DWARF has been developed as part of the SEKT project, funded under the IST programme of the European Commission. A key part of the project is to provide a focussed, knowledge-based response to a user search, by performing a semantic analysis of target resources, based partly on RDF annotations and partly on the text itself. Natural Language Processing techniques can then be applied to the results in order to adapt the response to the device at hand. For example, knowledge summarisation could be used to generate a sufficiently concise summary of a document to send to a mobile phone via an SMS service. To this end we plan to extend the device independence software in two ways. Firstly, device profiles will be integrated with personal profiles using OWL, the W3C recommendation for web ontologies. This will make it possible to define concepts such as "smallScreen", which take into account not only the physical properties of the device but also the preferred font size of the user. Secondly, the integration of DWARF with natural language processing technology will make it possible to generate content 'on the fly' which meets parameters such as a maximum length of text.