Position Paper: The MONA Project
Telecommunications Research Center Vienna (ftw.)
Rainer Simon, Florian Wegscheider, Georg Niklfeld

1 Abstract

This position paper presents the current status of the MONA ("Mobile multimOdal Next-generation Applications") project [1]. Within this application-oriented research project, we have developed the "MONA Presentation Server", which enables a new class of mobile applications featuring rich multimodal user interfaces for a wide range of mobile devices in wireless LAN and mobile phone networks. Developers of MONA applications provide a single implementation of their user interface. The presentation server transforms it into a graphical or multimodal user interface and adapts it dynamically for devices as diverse as low-end WAP phones, Symbian-based smartphones or Pocket PC PDAs.

2 Device- and Modality-Independent User Interfaces

Many future applications will feature both multimodal interaction and device-independent access in combination. Mobile multimedia, in-car information and entertainment, as well as home automation and control are but a few scenarios where multimodality will beneficially complement device independence. Our work is motivated by the vision that future user interface technologies must enable both in a single, integrated solution.

Within the MONA project, we have developed an experimental XML language for the device- and modality-independent representation of user interfaces. We have implemented a prototype system that translates this representation into a concrete graphical or multimodal user interface. Graphical user interfaces are automatically adapted according to each device's display capabilities. We are currently implementing two example MONA applications - a unified messaging client and a quiz game - to demonstrate that our system can generate complex user interfaces that support dynamic real-time interaction among multiple users.

2.1 Requirements

Our user interface description language is based on the following requirements:

- It must describe user interfaces independently of any particular device or interaction modality.
- It must leave the author full control over the user experience on the content level.
- It must support a natural design workflow, even for authors without prior experience in voice or multimodal user interfaces.

2.2 The MONA User Interface Description Language

We have based our language on the User Interface Markup Language UIML [11], currently in the process of standardization by the OASIS consortium [8]. The UIML vocabulary we have developed describes user interfaces without implying any preferred presentation or interaction modality. Figure 1 illustrates the language structure.

Figure 1. Structure of the MONA user interface description vocabulary

The core of the language is a set of abstract widgets. These widgets represent intention (e.g. the selection of one from multiple possible options) rather than appearance (e.g. a drop-down list box or set of radio buttons). Similar approaches are found in other languages such as AUIML [2] or XForms [14] .

The abstract widgets are embedded in a structure of (nested) group elements. The groups contain rules that guide the layout algorithm and help to control the dialog flow in voice user interfaces.

In addition to the group structure, the user interface is structured on a larger scale. So-called task units are the key concept for determining both the global dialog flow of the voice user interface and the pagination properties in case the GUI has to be split over multiple screens. A task unit contains groups that belong together in the sense that they are all related to performing a particular user task (e.g. "sending a message" or "viewing all received messages"). Each group may also be a member of multiple task units.

Finally, our language features a clear separation of structure and content. Each widget contains one or more "content elements" that define labels, choice options, help text, and the like. In the terminology of our language, a content element is a set of alternative contents for different modalities. The author may specify alternative text, text-to-speech phrases or source URLs for images or audio files. This concept ensures that, while the structure of the language remains modality- and device-independent, the author can still exercise full control over the user experience on the content level.
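
To give a flavor of these concepts, the following is a minimal, hypothetical sketch of a task unit containing one group with an abstract selection widget; the element and attribute names are invented for illustration and do not reproduce the actual MONA vocabulary:

    <task-unit id="send-message">
      <group layout="vertical">
        <!-- Abstract widget: expresses the intention "select one option",
             not a concrete control such as a drop-down list or radio buttons. -->
        <select-one id="recipient">
          <!-- A content element: alternative contents for different modalities. -->
          <content role="label">
            <text>Recipient</text>
            <tts>Who should receive your message?</tts>
          </content>
          <option value="alice">Alice</option>
          <option value="bob">Bob</option>
        </select-one>
      </group>
    </task-unit>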

3 MONA Presentation Server Implementation

The MONA Presentation Server is a prototype Java implementation that generates concrete graphical and multimodal user interfaces from our model-based description at runtime. It uses proprietary partner technology [4], [7], [10] that enables synchronized browser-based voice and graphical input and output on a number of mobile client devices such as Pocket PC PDAs, Symbian OS-based smartphones and low-end WAP phones. GUI layouts are computed by a layout algorithm that makes optimum use of the available screen width. Thanks to the server's modular architecture, support for new devices and target markup languages (e.g. X+V [15], SVG [9]) can easily be added.
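
To illustrate what such a modular architecture might look like, the following Java sketch shows a minimal, pluggable renderer registry; all type and method names are hypothetical and are not taken from the actual MONA code base:

    import java.util.ArrayList;
    import java.util.List;

    // Placeholder types standing in for the real device profile and UI model.
    interface DeviceProfile { String userAgent(); }
    interface UiDescription { }

    // A renderer turns the abstract UI description into concrete markup
    // (e.g. WML, XHTML, X+V) for one family of devices.
    interface Renderer {
        boolean supports(DeviceProfile profile);
        String render(UiDescription ui, DeviceProfile profile);
    }

    class RendererRegistry {
        private final List<Renderer> renderers = new ArrayList<>();

        void register(Renderer renderer) {
            renderers.add(renderer);
        }

        // Pick the first renderer that claims responsibility for the device.
        Renderer select(DeviceProfile profile) {
            for (Renderer renderer : renderers) {
                if (renderer.supports(profile)) {
                    return renderer;
                }
            }
            throw new IllegalArgumentException("No renderer for this device");
        }
    }

In such a design, supporting a new device class or target markup amounts to registering one additional Renderer implementation.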

4 MONA Authoring

Enabling a natural design workflow despite the abstract nature of our language was one of our key requirements. When work on the two MONA applications started, our designers immediately began with classical pen-and-paper GUI sketches based on the scenarios they had in mind. From these sketches, they mostly found it quite straightforward to identify an appropriate group and task unit structure for their user interfaces. Notably, most of our designers had no prior experience with either voice user interfaces or voice/multimodal markup languages; nevertheless, our language made it easy for them to create their first simple multimodal UIs in a short amount of time.

Our designers had based their GUI concept sketches on devices with PDA screen size and aspect ratio. Using our language, they were able to reproduce their imagined user interfaces with high accuracy on these devices. They were, however, sometimes surprised, positively as well as negatively, by what the layout algorithm made of their designs on devices with smaller screens. Clearly, there is still a gap in predictability that is irritating to designers, and good tool support will be essential to close it.

As a future authoring solution for modality- and device-independent user interfaces, we envision authoring tools similar to today's visual web authoring environments. Such tools could, for example, offer GUI previews emulating different devices as well as visual representations of the voice dialog. Editing would be possible in each of these views, as well as in a separate source code window, with all changes made in one view instantly reflected in the others. As part of the MONA project, we have begun a concept study for such a tool and are currently exploring ways to implement a small prototype.

5 Relation to Standards

We see our work as closely related to a number of existing W3C Multimodal Interaction and Device Independence standards and documents.

5.1 Multimodal Interaction Framework

The W3C Multimodal Interaction Framework [5] formalizes the major components of multimodal systems. Each component represents a set of related functions and comes with markup languages used to describe data flowing through its interfaces. The MONA architecture follows the lines of this framework.

In the terms defined by the Multimodal Interaction Framework, the MONA Presentation Server covers the Interaction Manager, the System and Environment component, the Generation and Styling components and, in part, the Session component.

To the best of our knowledge, the "internal representation" language for describing the output of the Generation component is still under discussion. We believe that our experimental language addresses many of the requirements for this language, as well as for the interface language to the application.

5.2 XForms

It is important to note that although our language is expressed in UIML syntax, the language itself is neither specific nor bound to UIML. Its elements and structure could equally be expressed in any other XML format, for example as a namespace extending other languages. After developing our language over two design cycles, we have in particular identified a large number of parallels to XForms, most notably its use of abstract, intent-based form controls.

5.3 CC/PP

The W3C's Composite Capability/Preference Profiles (CC/PP) recommendation [3] is an RDF-based notation for describing client device capabilities and user preferences. A server-side presentation system can use the information provided in a CC/PP profile to perform runtime UI adaptation.

The MONA adaptation process is currently based on browser sniffing and a set of locally stored proprietary device profiles. These profiles contain a number of parameters that are directly required in the UI generation and layout process (e.g. a priority parameter is used to decide whether less important content, as marked by the author, should be omitted for lack of screen space). Such parameters are not part of CC/PP but could easily be derived from it by a simple rule or heuristic. For our current work it is very convenient to be able to manipulate these parameters directly: trying different values and evaluating the results can help us determine good rules for deriving them from a CC/PP profile.
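
As an illustration of the kind of rule we have in mind, the following Java sketch derives a content-priority cut-off from the display width reported in a CC/PP profile; the thresholds, the names and the mapping itself are invented for this example:

    // Hypothetical heuristic: derive the MONA-internal content-priority
    // cut-off from the display width found in a CC/PP profile. Content
    // whose author-assigned priority falls below the cut-off is omitted.
    final class PriorityHeuristic {
        private PriorityHeuristic() { }

        static int priorityCutoff(int displayWidthPx) {
            if (displayWidthPx < 128) {
                return 3;   // low-end WAP phone: keep only essential content
            } else if (displayWidthPx < 240) {
                return 2;   // smartphone: omit the least important content
            }
            return 1;       // PDA-class screen: render everything
        }
    }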

5.4 Others

In addition to CC/PP, the W3C Device Independence working group [12] has published notes on techniques for device-independent authoring, which we have found helpful for our efforts.

Accessibility will be one major application of multimodal interfaces. Therefore we see the work of the Web Accessibility Initiative [13] as relevant to our research.

6 Conclusion

In this position paper we have introduced an experimental XML language for the modality- and device-independent description of user interfaces. We have described the implementation of our presentation server that generates rich graphical and multimodal user interfaces for different types of mobile devices. We have presented our ideas on single authoring based on our language and concluded by explaining the relation of our work to ongoing W3C activities.

7 References

[1] Project MONA homepage

[2] Azevedo, P., Merrick, R., Roberts, D. "OVID to AUIML - User Oriented Interface Modeling"

[3] Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies 1.0. W3C Recommendation, 15 Jan. 2004

[4] Kirusa company website

[5] Multimodal Interaction Framework. W3C Note, 6 May 2003

[6] Myers, B., Hudson, S. E., Pausch, R. "Past, Present, and Future of User Interface Software Tools". ACM Transactions on Computer-Human Interaction (TOCHI), Volume 7, Issue 1 (March 2000). ISSN: 1073-0516

[7] Nuance company website

[8] OASIS - Organization for the Advancement of Structured Information Standards

[9] Scalable Vector Graphics (SVG)

[10] SVOX company website

[11] UIML.org website

[12] W3C Device Independence

[13] Web Accessibility Initiative (WAI)

[14] XForms 1.0. W3C Recommendation, 14 Oct. 2003

[15] XHTML+Voice Profile 1.0. W3C Note, 21 Dec. 2001


The MONA project is funded by Kapsch Carrier-Com AG, Mobilkom Austria AG and Siemens Austria AG together with the Austrian competence centre programme Kplus.
