Content Adaptation and Generation Principles for Heterogeneous Clients

Tayeb Lemlouma and Nabil Layaïda
OPERA Project, INRIA Rhône Alpes
E-Mail: Tayeb.Lemlouma@inrialpes.fr, Nabil.Layaida@inrialpes.fr
Position Paper for the W3C Workshop on Device Independent Authoring Techniques

Abstract
In this paper, we propose a general framework for device independent authoring and presentation. Our approach relies on negotiated adaptation and generation techniques. These techniques allow the creation of customized presentations for different clients starting from a single, more abstract content representation. We focus on some key aspects of this framework through the lessons learned from an experimental system called NAC (Negotiation and Adaptation Core), which is under development in our project. Particular attention is given to the document model, the document transformation and the media adaptation process. The role of the proxy in such a framework is also discussed.

1. Introduction
Providing suitable content and presentation for different clients in heterogeneous environments is becoming increasingly important. There is already a plethora of exotic electronic devices such as pagers, PDAs and color cellular phones, and there is no sign that the diversity of their characteristics will diminish anytime soon. Making web content useful on such a range of devices and user agents is challenging and still very hard to achieve. At the lower layers, one of the basic requirements of a successful framework is the provision of minimal knowledge about clients, servers and network contexts. Starting from such knowledge, efficient mechanisms are needed to deliver the best service in the best possible manner.
The design of complete frameworks with sufficient performance depends largely on the content representation and model initially used. This has a direct impact on the flexibility available during content manipulation (transformation, adaptation, etc.).
The initial content representation (i.e. the content creation) is not the only factor of primary importance in the process of content delivery. Transformations are the other factor, as they can be modularized and re-used in different situations and contexts. The negotiation and adaptation core of the NAC system uses such a scheme and includes an adaptation layer that operates on the original content sent by the server in a proxy (or server) based architecture. The adaptation process is controlled using a negotiation strategy that reconciles client limitations with server adaptation capabilities following the notion of profiles (document profile, client profile, etc.). The adaptation layer includes media adaptations such as image transcoding, and structural (or tree) transformations using, for instance, XSLT.
This paper presents some device independent authoring principles from the perspective of content generation using adaptation techniques. Throughout the presentation of the NAC system, we highlight some key issues and difficulties encountered during its development.

2. Choosing a Document Model
Choosing a suitable general purpose, device independent content model is one of the most important steps in designing an efficient content delivery system. The document model influences the future scenarios and adaptation strategies applied by the content server. The flexibility and efficiency of the adaptation process depend largely on the potential offered by the original content model. Depending on that model, successful content delivery for different contexts can be easy, difficult or sometimes impossible.
This problem is clearly illustrated when trying to apply an adaptation architecture to current Web content. In fact, Web content is very difficult to process because it often does not rigorously respect a clean syntax in the first place, and thus cannot be processed correctly. One possible solution is to "clean" the content before processing it, using intermediary processors. The Tidy utility [1] is a good example of a simple but very useful tool for cleaning HTML content by fixing syntactic errors automatically, for instance by adding the missing '/' in end tags for anchors or recovering from mixed up tags. Moreover, when the configuration option 'output-xml' is set to yes, it yields a well formed and valid XML structure for the input document, which helps XML transformation methods operate correctly.
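As a rough illustration of what such a cleaning step involves, the following Python sketch repairs a few common problems (unclosed elements, stray end tags, unescaped '&' characters) using only the standard library. It is not Tidy's algorithm and covers far fewer cases; the class name WellFormer, the helper to_well_formed and the short list of empty elements are assumptions introduced for this example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content; an illustrative subset, not a complete list.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class WellFormer(HTMLParser):
    """Rebuilds sloppy HTML as a properly nested, XML-style string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the cleaned document
        self.stack = []  # names of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "checked") reuse their name as value.
        attr_text = "".join(f" {name}={quoteattr(value or name)}" for name, value in attrs)
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        # Close every element opened after the matching start tag, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()  # flush any buffered text first
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def to_well_formed(html):
    parser = WellFormer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For instance, to_well_formed('<html><body><p>Fish & chips<br><b>bold<i>both</b></i></body>') returns a string that an XML parser accepts. Note that the repair reflects a guess about the author's intent, which is exactly the risk discussed in this section.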
Unfortunately, the task of cleaning the content or making legacy content well structured is not trivial. It requires inferring missing data (structural and formatting), restructuring unstructured data and sometimes creating new structures that were missing initially. In some cases, the conversion requires correcting mistakes introduced when the original document was authored. The correction, when it is possible, can sometimes result in new content which does not necessarily correspond to the intent of the original authors.
Several domain specific document models are available today and are used in different applications. These models have different functionalities and characteristics and do not have the same expressive power. The presentation description of such models (i.e. the way the presentation is described and written in a document) has many facets that differentiate these models and allow them to be compared. The presentation is generally related in some way to the logical structure of the document, so a natural approach is to use structure driven transformations such as XSLT.
Some models are more suitable for particular applications. For example, SMIL and XHTML + SMIL [5][15] can be used more efficiently than HTML for multimedia representations where the timing aspect is central. In heterogeneous environments, the clients reside on a wide range of devices, from desktops to ubiquitous information appliances such as personal digital assistants, mobile phones, digital televisions, etc. These devices differ in terms of hardware and also in terms of software: two identical devices can use different operating systems with different functionalities. This implies that clients that are identical in terms of hardware may not be able to render the same presentation. Furthermore, errors can occur when the client tries to display the server content, and sometimes the client cannot display the content at all. This kind of problem is due to:
a) The original content characteristics, and
b) The client capabilities (software user agent and hardware capabilities).
The content model gathers two aspects: the structure of the content together with the text media, and the other media included by reference (images, audio, video). In many cases, the encoding of the external media is complex and may cause serious problems for some clients, especially those with limited processing capacities.
Therefore, original content such as external media can be useless for clients with limited capabilities. Authors who aim to broaden access to their content for heterogeneous devices should not make too many assumptions about the capabilities of the client device.
Thanks to some recent efforts, authors can choose an appropriate document model for limited devices. These models are called language profiles and are designed specifically for particular client capabilities. Among these language profiles, we find the SMIL Basic language profile [4], which consists of a reduced subset of the full SMIL modules [5]. The defined subset can be supported by a wide variety of SMIL players, even those running on mobile devices with limited resources such as small displays, a small number of supported network transactions, limited input methods, etc. An example application of SMIL Basic is its use by the 3GPP in the scene descriptions of PSS clients and servers [16]. The PSS SMIL collection includes the SMIL 2.0 Basic language profile plus three additional modules.
Other language profiles exist, such as SVG Tiny and SVG Basic [14], Compact HTML [12], etc. These languages are useful and can be used by both limited and 'rich' clients. Authoring the content in a flexible model helps servers provide their content to a variety of users by enabling rich adaptation mechanisms on the original content and making them easy to achieve either at the server side or at intermediary proxies.

3. The Modularization Principle and its Application in SMIL
Modularization represents an efficient approach to defining the capabilities of the client. The main goal behind modularization is to define, on a module basis, which particular profile to use for every terminal. Modularization also comes with an extensibility framework which allows the capability of a given language profile to be scaled. This allows a smoother transition between minimal and full blown language profiles.
In addition to the benefits of modularization at the server side (i.e. the content provider), the concept makes it possible to describe the user capabilities in terms of supported modules and functionalities. This approach allows different selections to be applied and custom adaptation to be achieved according to the target context.
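To make the idea concrete, the following Python sketch describes a client by the set of modules it supports and picks the richest profile that the client can handle. The identifiers are SMIL 2.0 module names, but the grouping of modules into the two profiles "minimal" and "rich" is an assumption made for this example and does not reproduce any normative profile definition.

```python
# Hypothetical profiles, each described by the modules it requires.
PROFILES = {
    "minimal": {"Structure", "BasicMedia"},
    "rich": {"Structure", "BasicMedia", "BasicLayout", "BasicAnimation"},
}


def supported_profiles(profiles, client_modules):
    """Return the names of the profiles whose modules are all supported by the client."""
    client = set(client_modules)
    return [name for name, modules in profiles.items() if set(modules) <= client]


def pick_profile(profiles, client_modules):
    """Return the supported profile with the most modules, or None if there is none."""
    candidates = supported_profiles(profiles, client_modules)
    if not candidates:
        return None
    return max(candidates, key=lambda name: len(profiles[name]))
```

A client declaring Structure, BasicMedia and BasicLayout would be served the "minimal" profile here, since it lacks BasicAnimation; adding that module to its description moves it to the "rich" profile without changing the server logic.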
As we are targeting the generation of content, the SMIL language [13] and its modularization framework represent a good approach to follow. Indeed, SMIL satisfies the need for definitions and descriptions of collections of abstract functionalities on either the client or the server side. SMIL [5] offers extensibility for encoding several aspects of multimedia presentations: it specifies what media items will be presented, where and how. It also accounts for whom the media is played and, most important in our context, how the presentation is adapted for different end users in different contexts. This last point can be ensured, in part, by the SMIL variant selection performed within the document itself, without any additional processing. The SMIL switch element and the test attributes [2] specify how to select among a set of alternatives for inclusion at individual locations in the temporal hierarchy of the SMIL content [11]. Alternatives can be used, for example, to provide different formats of a media object according to client preferences and capabilities.
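The selection performed by a switch element can be sketched in a few lines of Python: the children are examined in document order and the first one whose test attributes all evaluate to true is chosen. Only the systemBitrate and systemLanguage test attributes are handled, language matching is simplified to exact comparison, and the media file names and the layout of the client dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A switch with three alternatives; the last one carries no test attribute
# and therefore acts as a fallback.
SWITCH_XML = """
<switch>
  <video src="high.mpg" systemBitrate="115200"/>
  <img src="still.png" systemBitrate="28800" systemLanguage="fr"/>
  <text src="fallback.txt"/>
</switch>
"""


def is_acceptable(element, client):
    """Evaluate the test attributes of one alternative against the client description."""
    bitrate = element.get("systemBitrate")
    if bitrate is not None and client["bitrate"] < int(bitrate):
        return False
    languages = element.get("systemLanguage")
    if languages is not None:
        accepted = {code.strip() for code in languages.split(",")}
        if not accepted & set(client["languages"]):
            return False
    return True


def select_from_switch(switch, client):
    """Return the first acceptable child of the switch, or None."""
    for child in switch:
        if is_acceptable(child, client):
            return child
    return None
```

With switch = ET.fromstring(SWITCH_XML), a client described by {"bitrate": 56000, "languages": ["fr"]} obtains the still image, while a client with a 9600 bit/s connection falls back to the text alternative. The author expresses the alternatives once, and the choice is made on the client side.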
The SMIL selection mechanism is a very simple model, ideal for devices that present several limitations and constraints. Furthermore, the different objects used in SMIL presentations (which constitute the basic elements of any multimedia document) can be stored anywhere on the Web and accessed by means of URLs. Currently, the SMIL modules are widely used, for example by the Third Generation Partnership Project (3GPP) [16] or in the Multimedia Messaging Service as the presentation language for minimal presentation descriptions [9].

4. Intermediary Proxy Solution: NAC application
The negotiation and adaptation core, called NAC, is an architecture developed to provide a solution for the delivery of multimedia content in heterogeneous environments. The goal of the architecture is to enable the delivery of content to a wide range of clients, from desktops to ubiquitous appliances. For content generation, NAC uses dynamic and static adaptation of the server content. The adaptation is controlled using an adaptation and negotiation module (called ANM) and an optional module on the user side (the user context module, or UCM). The UCM enriches the ANM's knowledge of the client description in terms of profiles. By default, NAC belongs to the category of proxy-based architectures, but the proxy entity can be omitted by installing the ANM at the original server side, which causes some differences in the global behavior (Figure 1). These differences are directly related to the fact that, in a server based architecture, the ANM can have total control of the server content, including the existing variants and their characteristics.

Figure 1: NAC Architecture

One of the main difficulties encountered in NAC is the handling and processing of server content encoded according to a non adaptable model (for example, a non markup language) or content which does not validate against its model (for example, an invalid HTML page). Unfortunately, this corresponds to a huge amount of the legacy content available on most web servers (see section 2).

4.1 Proxy Role
The proxy architecture, which consists of adding a third entity between the server(s) and the client(s), represents a good approach to addressing the heterogeneity of clients and servers. Indeed, in a proxy-based architecture the network platform is not modified, and all the environment characteristics that are already there are taken into account.
In the context of content generation by adaptation, the proxy is the entity responsible for retrieving client requests and contexts and for performing possible adaptations on the content received from the server. The generated content is then sent to the client with respect to its characteristics. The proxy can transform existing multimedia content, and thus existing content does not have to be produced in multiple versions. All the proxy tasks are designed to be transparent to clients and content servers.

4.2 The Versioning Principle: a static content adaptation approach
In some situations, the content server maintains different variants of the same original content in order to provide them in different contexts. For example, we find some content providers that store several versions of original videos with different sizes and according to various connection speeds (using modems, LAN, etc.).
Authoring the content in different variants can be useful in particular cases where no transformation methods for the original content are available, or when the existing methods generate content that the target context cannot understand. A good example of this is the adaptation of images for different display sizes. In [7] we introduced an XSLT transformation of HTML to WML that uses image versioning in order to substitute the original images with their equivalents in WBMP format (the format generally used in WML documents).
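The substitution step of such a versioning scheme can be sketched as follows. This Python fragment is not the XSLT style sheet of [7]; it only shows the principle of replacing each image reference with a pre-authored variant when one is listed. The variant table, the file names and the function name substitute_images are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def substitute_images(document_xml, variants):
    """Replace the src of each img element by its listed variant, if any.

    `variants` maps an original image URL to the URL of its stored
    alternative (for instance a WBMP version authored in advance).
    Images without a listed variant are left untouched.
    """
    root = ET.fromstring(document_xml)
    for image in root.iter("img"):
        alternative = variants.get(image.get("src"))
        if alternative is not None:
            image.set("src", alternative)
    return ET.tostring(root, encoding="unicode")
```

For example, calling substitute_images('<p><img src="logo.png"/><img src="photo.jpg"/></p>', {"logo.png": "logo.wbmp"}) rewrites only the first reference, which shows the limit of the approach: every image needs its own stored variant.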
Maintaining variants of each kind of document and media is costly and has various disadvantages:
o It requires a lot of storage space on the content server, especially with large numbers of documents (a large database, for example).
o It requires intensive processing for document authoring and updates.
Furthermore, it is impossible to predict all of the different types of devices beforehand, and new ones may appear later, placing an additional burden on the entire process. This shows that both the client and the server must cooperate in the global architecture in order to offer the best possible scheme.

4.3 Adaptation Process Control: a dynamic content adaptation approach
Unfortunately, many current content servers do not consider client characteristics and contexts when delivering their content. We carried out a simple test in which we used a personal digital assistant to request image resources from different servers. The requested images, which are encoded in different sizes and formats on the servers, were received on the PDA without any adaptation to the display limitations of the device. The received resources are intended for large display areas, and on the PDA screen they cannot be displayed entirely without using the player's scrollbars.
Here, the useful information about the client context that was not considered by the content server concerns the display capabilities. These capabilities were conveyed inside the HTTP request using the two HTTP header fields "UA-color: color16" and "UA-pixels: 240x320", but they did not appear to have any effect.
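A server or proxy that did honor these header fields could proceed roughly as in the Python sketch below, which reads the display size from the request and computes image dimensions that fit the screen while preserving the aspect ratio. The function names and the representation of the headers as a dictionary are assumptions made for this example; the actual resizing of the image data is left out.

```python
def parse_display(headers):
    """Extract the display description from the UA-pixels and UA-color header fields."""
    width, height = (int(value) for value in headers["UA-pixels"].lower().split("x"))
    return {"width": width, "height": height, "color": headers.get("UA-color", "")}


def fit_within(image_width, image_height, max_width, max_height):
    """Return image dimensions scaled down to fit the display, never scaled up."""
    scale = min(max_width / image_width, max_height / image_height, 1.0)
    return max(1, int(image_width * scale)), max(1, int(image_height * scale))
```

With the header values of our test, parse_display returns a 240x320 display, and fit_within(960, 640, 240, 320) reduces a 960x640 image to 240x160, so that it can be shown without scrollbars.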

4.3.1 Strategy
The way in which clients use the server content differs from one client to another. A client requests the content using a device (PDA, laptop, phone, etc.) that has its own specific characteristics and capabilities. Consequently, the presentation provided by the content server, which has its own semantics, should not be the same for all these devices. The content negotiation concept aims to guide content servers to deliver the appropriate content according to the user context, i.e. the client capabilities and the user preferences. For example, a client may use a PDA to access HTML pages with French as the preferred language. In this case, a good negotiation strategy must lead to the delivery of small HTML pages (which takes the client capabilities into account) with content written in French.
Generally, in an architecture composed of at least a client and a content server, a content negotiation solution requires the following basic elements (Figure 2):
a) A description tool of the context in which the content is used: such as the description of the client context, the server capabilities, the document profile, etc.
b) An exchange protocol: a well determined dialogue and request format used in the exchange of control messages and the communication of the user context to the server or other entities.
c) Adaptation methods and content versioning: used to adapt or substitute the content with the appropriate variant.
d) A matching strategy: an algorithm which is applied generally at the server side and which aims to match the different profiles (clients, document, server, etc.) in order to determine the best service context and adaptation methods.
Content negotiation techniques are mainly applied in two ways:

A- Variant selection: consists of choosing the best variant on the content server on behalf of the user agent. The selection is applied to the list of available variants and is based on the variant descriptions and the user requirements. Selection parameters include the language, the media type, the character set, etc. The selection decision can be made using an algorithm that covers the different possibilities or simply using a formula that combines different selection factors and returns the presentation level of a given variant [10][3].

Figure 2: Delivery Framework

B- Content adaptation: in many situations the available content cannot be sent directly to the client because of the nature of the content or the characteristics of the client. In such cases, the client requirements can be met only after applying some more complex transformation. The adaptation process can take the form of a program, a script, an XSLT style sheet, etc. Adaptation techniques belong to two categories:
1) Media resources transformation category: in this category, we find transformation methods that concern media adaptation, such as image and video adaptation (color reduction, resizing, etc.), media transcoding, and other methods that operate directly at the encoding level.
2) Structural transformation category: concerns transformations that are applied to the global document organization or logical tree. Examples include transforming HTML to WML, filtering HTML documents, transforming XML to SVG, etc. A structural transformation can keep the same media resource used by the original document, filter it, or use an external media transformation to adapt the media for the target context.
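The formula based variant selection mentioned under A can be illustrated with the following Python sketch. It multiplies a source quality value by the client's preference for the media type, language and character set of each variant, which is a common way of combining selection factors in HTTP content negotiation. The exact formula, the attribute names and the numeric values are assumptions made for this example and are not taken from a specific algorithm.

```python
def variant_score(variant, preferences):
    """Combine the selection factors of one variant into a single value in [0, 1]."""
    score = variant["qs"]  # quality assigned to the variant by the server
    score *= preferences["types"].get(variant["type"], 0.0)
    score *= preferences["languages"].get(variant["language"], 0.0)
    score *= preferences["charsets"].get(variant["charset"], 0.0)
    return score


def best_variant(variants, preferences):
    """Return the variant with the highest score, or None if nothing is acceptable."""
    best, best_score = None, 0.0
    for variant in variants:
        score = variant_score(variant, preferences)
        if score > best_score:
            best, best_score = variant, score
    return best
```

A variant whose language or media type is unknown to the client receives a score of zero and is never selected; when every variant scores zero, the server has to fall back on content adaptation as described under B.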

The set of user context requirements can be seen as a set of constraints that the content provider must satisfy in order to find an agreement between what the client demands and what the server can provide. In our approach, the constraint resolution strategy consists of progressively adding the constraints to the original content. It ends with a representation which corresponds to the content to be delivered.
In order to achieve efficient content adaptation control, the environment constraints must be expressed in a manner that covers a wide range of contexts, possibly all of them. Constraints must:

1- Offer enough flexibility, in order to avoid empty solutions.
2- Ease the resolution strategies.
3- Avoid ambiguity: a constraint expression must lead to a unique and clear solution.
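One simple way to picture the progressive addition of constraints is to model each constraint as an interval on a named parameter and to intersect the intervals one by one, as in the Python sketch below. This representation is an assumption chosen for illustration and is not the constraint format used in NAC; the parameter name image_width is likewise hypothetical.

```python
def add_constraint(state, name, low=float("-inf"), high=float("inf")):
    """Return a new state in which `name` is further restricted to [low, high]."""
    current_low, current_high = state.get(name, (float("-inf"), float("inf")))
    merged = (max(current_low, low), min(current_high, high))
    if merged[0] > merged[1]:
        # An empty interval means that the constraints cannot all be satisfied.
        raise ValueError(f"constraints on {name!r} leave no solution")
    return {**state, name: merged}


def resolve(constraints):
    """Add the constraints one after the other, starting from an unconstrained state."""
    state = {}
    for name, low, high in constraints:
        state = add_constraint(state, name, low, high)
    return state
```

If the document accepts image widths between 100 and 800 pixels and the client accepts at most 240, resolve([("image_width", 100, 800), ("image_width", 0, 240)]) narrows the width to the range 100 to 240. A client limited to 50 pixels would raise an error instead, which is the empty solution that the first requirement above asks constraint authors to avoid.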

The universal profiling schema (UPS) [8] was defined to play a central role in the generation of adapted content. UPS identifies three main categories of contexts: the client category, the server category and the network category. On the content side, the server category includes the document instance profile, which describes the document characteristics and functionalities. It also includes the resource profile, which describes a media resource used by the document, and the adaptation method profile, which describes an adaptation method available on the server or the proxy side.
In our approach, the client constraints are extracted directly from the HardwarePlatform, SoftwarePlatform and BrowserUA components of the UPS client profiles. From the RDF bags OnlySupportedResources, PreferredSupportedResource and NonSupportedResources, we extract additional constraints in terms of capabilities and preferences. During the negotiation matching, the client profile and the document instance profile of the requested content are parsed, and the constraints they include are stored in memory according to their types. The server refers to the document instance profile. According to its content, the server can retrieve, using the exchange protocol, the client resource profile [8] that corresponds to the resource used by the requested content. For example, the server retrieves the client resource profile for WBMP images if the requested document uses WBMP images.
The server then checks whether the resource (media or document) is supported by the client. If it is, the resource is sent directly to the client without any modification. If it is not, the server checks whether there is an existing version of the resource that can meet the client requirements. Links to the list of versions related to the resource are included in the document instance profile or the client resource profile [8]. If the server succeeds in finding a variant that meets the client requirements, the original resource is substituted by this variant; otherwise the server tries to adapt the original resource. To achieve this operation, the server compares the original resource description (using the resource profile) with the input requirements of each available adaptation method, which are included in the RDF bag InputRequirements of the adaptation method profile [8]. If the resource description matches the input requirements of an adaptation method, the server checks whether the output description of this method (included in the RDF bag OutputDescription of the adaptation method profile) matches the client requirements. If it does, the server applies this adaptation method to the original resource and delivers the created resource to the client. Otherwise, i.e. when no adaptation method can be applied, the server sends a negative reply.
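The decision procedure described above can be summarized by the following Python sketch. Profiles are reduced to plain dictionaries with a format and an optional width, which is a strong simplification of the RDF based UPS profiles; the dictionary keys and the function names are assumptions made for this example.

```python
def client_supports(client, description):
    """True if the described resource respects the client's formats and display width."""
    return (description["format"] in client["formats"]
            and description.get("width", 0) <= client["max_width"])


def matches(description, requirements):
    """True if the resource description satisfies every input requirement of a method."""
    return all(description.get(key) == value for key, value in requirements.items())


def negotiate(resource, client, variants, methods):
    """Return the server decision as an (action, object) pair."""
    # 1. The original resource is supported: send it unchanged.
    if client_supports(client, resource):
        return ("send", resource)
    # 2. An existing variant meets the client requirements: substitute it.
    for variant in variants:
        if client_supports(client, variant):
            return ("substitute", variant)
    # 3. An adaptation method accepts the resource and produces supported output.
    for method in methods:
        if (matches(resource, method["input_requirements"])
                and client_supports(client, method["output_description"])):
            return ("adapt", method)
    # 4. Nothing can be done: negative reply.
    return ("reject", None)
```

The order of the tests reflects the cost of each option: sending or substituting an existing resource is cheap, whereas adaptation requires processing and is therefore tried last.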

5. Content Creation Genericity using XSLT
XSLT (eXtensible Stylesheet Language Transformations) is used to transform an XML document into another XML document. The XSL language (eXtensible Stylesheet Language) specifies the styling of XML content by using XSLT to describe how the document is transformed into another document that uses a formatting vocabulary.
The problem with XSLT transformations is that a single XSLT style sheet performs a single transformation according to a specific content model. If we require a transformation with small changes in the generated content, we generally need to rewrite the style sheet. Providing general XSLT style sheets for the various user contexts (or profiles) is very attractive because, once this is achieved, the corresponding server only has to apply the generated style sheet and provide the adapted content to the user. One solution to this problem is to concatenate, each time, the original content with the user agent profile and then to apply a style sheet that operates according to the profile part of the input tree. This solution is very complicated to achieve using XSLT templates. Indeed, in addition to the task of concatenating the two XML documents (the profile and the original document) into one valid XML document, which must be done each time there is content to deliver, it requires very complicated processing of the initial tree because the two parts of the input tree (the constraints profile and the original content) are separated. Even more processing is then required to make the two parts cooperate.
The scheme proposed in [6] defines a generic style sheet that takes the client profile as input and generates a style sheet as output. The generated style sheet adapts all the concerned content according to the set of constraints included in the client profile. This approach works reasonably well, but unfortunately it is bound to a single document model (SMIL in this case).
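The principle of generating a style sheet from a profile can be sketched as follows. In [6] the generator is itself a style sheet; here, Python is used in its place to keep the example short. The generated style sheet copies the document unchanged except for the elements that the client does not support, which are dropped. Reducing the client profile to a list of unsupported element names is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def generate_stylesheet(unsupported_elements):
    """Build an XSLT 1.0 style sheet that filters out the given element names."""
    ET.register_namespace("xsl", XSL_NS)
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})

    # Identity template: copy every node and attribute by default.
    identity = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": "@*|node()"})
    copy = ET.SubElement(identity, f"{{{XSL_NS}}}copy")
    ET.SubElement(copy, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    # One empty template per unsupported element: matching nodes produce no output.
    for name in unsupported_elements:
        ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": name})

    return ET.tostring(sheet, encoding="unicode")
```

Calling generate_stylesheet(["video", "audio"]) yields a style sheet that any XSLT 1.0 processor can apply to remove video and audio elements. The server generates it once per client profile and can then reuse it for every document that follows the same model.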

6. Conclusions
In this paper, we have discussed some basic ideas related to a device independent authoring and adaptation framework. We explored some key issues identified through the implementation of the negotiation and adaptation core (NAC), and we reported the difficulties encountered in the manipulation of content that does not follow appropriate models.
As we have seen, adopting more abstract models not only helps the authoring task but also allows efficient adaptation control of the content in order to meet different contexts. Other aspects, such as the selection among multiple variants, should also be considered. Selection mechanisms are generally faster and may complement transformation in the adaptation process.

References

[1] Dave R. Tidy. http://www.w3.org/People/Ragget/tidy/, W3C working group, August 2000.
[2] Dick B. and Jeffrey A., The SMIL 2.0 Content Control Modules, http://www.w3.org/TR/smil20/smil-content.html
[3] Holtman K. and Mutz A. Transparent Content Negotiation in HTTP. RFC 2295, Network Working Group, March 1998.
[4] Kenichi K., Aaron C. and Michelle K. SMIL 2.0 Basic Profile and Scalability Framework. http://www.w3.org/TR/smil20/smil-basic.html.
[5] Layaïda N. and Van Ossenbruggen J. SMIL 2.0 Language Profile. http://www.w3.org/TR/smil20/smil20-profile.html.
[6] Lemlouma T. and Layaïda N. Adapted Content Delivery for Different Contexts. Submitted to the SAINT 2003 Conference.
[7] Lemlouma T. and Layaïda N. A Framework for Media Resource Manipulation in an Adaptation and Negotiation Architecture. Opera Project, INRIA, August 2001.
[8] Lemlouma T. and Layaïda N. Universal Profiling for Content Negotiation and Adaptation in Heterogeneous Environments. http://opera.inrialpes.fr/people/Tayeb.Lemlouma/NegotiationSchema/index.htm, January 2002.
[9] Multimedia Messaging Service (MMS) Conformance Document, Version 2.0.0. February 6th 2002. http://www.nokia.com.
[10] Network Working Group. Hypertext Transfer Protocol – HTTP/1.0. RFC 1945: http://www.ietf.org/rfc/rfc1945.txt, May 1996.
[11] Rutledge L., Hardman L. and Ossenbruggen J. V. The use of SMIL: Multimedia Research Currently Applied on a Global Scale. CWI (Centrum voor Wiskunde en Informatica), NL-1090 GB Amsterdam, The Netherlands.
[12] W3C. Compact HTML for Small Information Appliances. W3C Note, 09 February 1998. http://www.w3.org/TR/1998/NOTE-compactHTML-19980209/#www2.
[13] W3C. Synchronized Multimedia Integration Language (SMIL 2.0), W3C Recommendation 07 August 2001.
[14] W3C. Mobile SVG Profiles: SVG Tiny and SVG Basic, W3C Candidate Recommendation 30 April 2002. http://www.w3.org/TR/SVGMobile/.
[15] W3C. XHTML: The Extensible Hyper Text Markup Language, A Reformulation of HTML 4 in XML 1.0, http://www.w3.org/TR/xhtml/
[16] 3GPP, Technical Specification Group Services and System Aspects, Transparent end-to-end PSS, protocols and codecs (Release 4), 3GPP TS 26.234 v1.5.1, March 2001.
