W3C Technologies as Consumers of Multimodal Interfaces

Workshop on W3C's Multimodal Architecture and Interfaces

November 16-17, 2007, Fujisawa, Japan

Author: Doug Schepers (W3C Staff Contact, SVG, CDF, and WebAPI Working Groups)


The W3C has traditionally developed and standardized technologies that worked only in a mediating User Agent, such as a browser, rather than directly at the operating-system level. This was a necessity, since the technologies have had to work across a variety of platforms with different internal APIs and capabilities. Consequently, both the input and output methods defined in W3C specifications have represented a limited subset of device capabilities, with none of the richness of native desktop applications.

W3C technologies are meant to serve as an abstraction layer on top of devices and input methods. Key technologies like DOM (Document Object Model) define a layer of events that emulates the most common user-interface inputs, while remaining agnostic about how those events are actually produced.
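That abstraction can be illustrated with a minimal sketch (this is illustrative plumbing, not the DOM Events API itself): a listener subscribes to a generic event type and never needs to know which physical device produced the event.

```javascript
// A minimal event target, sketched to show the abstraction DOM Events
// provide: the handler reacts to a generic type, not to a device.
function createEventTarget() {
  const listeners = {};
  return {
    addEventListener(type, fn) {
      (listeners[type] = listeners[type] || []).push(fn);
    },
    dispatchEvent(event) {
      (listeners[event.type] || []).forEach(fn => fn(event));
    }
  };
}

const button = createEventTarget();
const log = [];
// The handler subscribes to the generic "activate" type only.
// ("activate" and the `source` field are illustrative names.)
button.addEventListener("activate", e => log.push(e.source));

// The same listener fires whether the event came from a mouse click,
// the Enter key, or a voice command.
button.dispatchEvent({ type: "activate", source: "mouse" });
button.dispatchEvent({ type: "activate", source: "keyboard" });
button.dispatchEvent({ type: "activate", source: "voice" });
// log is now ["mouse", "keyboard", "voice"]
```

The point of the generalization is exactly this: content authored against the abstract event layer keeps working when a new input device is mapped onto the same event types.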


But with the advent of Web applications, made possible by more powerful computers, more ubiquitous and faster Internet access, and the hard work of pioneers, the experience of interacting with Web resources has become much richer, and the demand for still richer interactivity has grown accordingly. More and more, users expect the input and output of the Web to be as powerful as the platform on which they use it.

Additionally, W3C technologies are spreading into a wider variety of devices, which means that we need to respond to the specific needs and abilities of those devices. There are alternate outputs, including screens of all sizes, form factors, and resolutions, as well as assistive technologies such as voice or tactile output; there are alternate inputs, such as IMEs, virtual keyboards, mice, pen-tablets, jogwheels, voice recognition, breath switches, and gyroscopic pointers. Often, the type of output determines the mode and scope of the inputs (for example, selection and navigation on small-form-factor screens). There are even devices that merge the two interfaces, acting as both output and input and yielding a more direct and concrete experience; the Apple iPhone is the exemplar of this, Microsoft's Surface project is similar, and many others want to follow that lead in unifying the input and output interface. The Web needs a model that can generalize this in a meaningful way.

The Scalable Vector Graphics (SVG), Compound Document Formats (CDF), and Web Application Programming Interfaces (WebAPI) Working Groups (and the implementors of those technologies) are in a very real sense the consumers of the lower-level integration standards underway in the MMI Activity. As such, we need to work closely with MMI to make sure that our needs are met, while at the same time sharing insight from our own higher-level models to inform how these interfaces should be shaped.



SVG is proving to be a very versatile output method, because content can be authored to fit automatically to arbitrary sizes or shapes of screen (or indeed, the printed page, using the SVG Print specification). Interactive maps, charts, graphs, diagrams, annotation systems, and practically any other data visualization can be done with ease at any scale, with varying levels of detail. SVG can also be combined with XHTML, the work of the CDF Working Group, to optimize graphical and textual output together. But in order for SVG and HTML Web applications to expand their functionality to meet market needs, we need to expand the scope of the inputs that manipulate these outputs.
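The automatic fitting described above comes down to a simple computation, sketched here in the spirit of SVG's viewBox with preserveAspectRatio="xMidYMid meet" (the function name and return shape are illustrative, not part of any specification): scale the content uniformly to fit the viewport and center it on both axes.

```javascript
// Compute a uniform scale and a centering translation for fitting a
// drawing of one size into a viewport of another, as an SVG renderer
// does for viewBox with "meet" scaling. Illustrative sketch only.
function fitViewBox(viewBox, viewport) {
  // Uniform scale: the smaller ratio, so the whole drawing fits.
  const scale = Math.min(viewport.width / viewBox.width,
                         viewport.height / viewBox.height);
  return {
    scale,
    // Translation that centers the scaled content in the viewport.
    tx: (viewport.width - viewBox.width * scale) / 2,
    ty: (viewport.height - viewBox.height * scale) / 2
  };
}

// A 100x100 drawing fit into a 300x200 screen scales by 2 and is
// centered horizontally with a 50-unit offset.
const placement = fitViewBox({ width: 100, height: 100 },
                             { width: 300, height: 200 });
```

This is why the same SVG map or chart renders sensibly on a phone, a desktop, or a printed page without the author changing the content.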


The WebAPI Working Group is currently working on the next version of DOM (Document Object Model) Events, which exposes mediated user-interface and timing events to the author of scripts or declarative animations. There are many innovative or platform-specific input methods that are not currently available in DOM Events, and which are of particular interest.

Among these missing pieces of functionality are multiple-pointer systems, in which a single user can touch multiple points on the screen and have both points respond to movement, as in the "zoom-in" gesture of the Apple iPhone made by spreading two fingers apart, or multi-user systems like the Nintendo Wii and other game consoles, which allow multiple users to interact with the screen at the same time. Other examples are enhancements of the traditional single-pointer system, such as a pen-tablet with pressure sensitivity, which adds a dimension beyond position. Being able to reflect that information in the DOM, and to utilize it in SVG, would allow the user to vary the width, color, or other properties of a line being drawn on a virtual canvas, giving the Web capabilities similar to desktop drawing applications in that regard; without that information, such intuitive interaction is all but impossible.
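The computations these richer events would enable are straightforward once the data is exposed. The following sketch assumes hypothetical event payloads (two touch points with coordinates; a normalized pen pressure), since DOM Events does not yet define them:

```javascript
// Euclidean distance between two touch points.
function distance(a, b) {
  return Math.hypot(b.x - a.x, b.y - a.y);
}

// Zoom factor for a two-finger pinch gesture: the ratio of the current
// finger separation to the separation when the gesture started
// (> 1 zooms in, < 1 zooms out).
function pinchZoom(startTouches, currentTouches) {
  return distance(currentTouches[0], currentTouches[1]) /
         distance(startTouches[0], startTouches[1]);
}

// Map a normalized pen pressure reading (0..1) onto a stroke-width
// range, as a drawing application might for a pressure-sensitive pen.
function strokeWidth(pressure, minWidth, maxWidth) {
  return minWidth + (maxWidth - minWidth) * pressure;
}

const zoom = pinchZoom(
  [{ x: 100, y: 100 }, { x: 200, y: 100 }],   // fingers 100 units apart
  [{ x: 50,  y: 100 }, { x: 250, y: 100 }]);  // fingers 200 units apart
// zoom === 2: the fingers moved twice as far apart
```

None of this logic is difficult; what is missing is a standardized event model that delivers the second touch point and the pressure value to the script in the first place.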

There are more understated input methods as well that are increasingly available on devices, and which should be reflected in the DOM. Geo-location (be it from GPS, cell triangulation, or even static IP lookup) can dramatically enhance maps and other location-based services. Battery life, network connectivity, and other device-status information can also be useful to Web applications, if exposed securely, making it easier to provide compelling services to users.
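As a sketch of what exposed geo-location data makes possible, here is the great-circle (haversine) distance between two coordinates, the core of any "nearest point of interest" feature; no standard DOM API for obtaining the coordinates is assumed:

```javascript
// Great-circle distance between two latitude/longitude points, using
// the haversine formula and a mean Earth radius of 6371 km.
function haversineKm(lat1, lon1, lat2, lon2) {
  const R = 6371;
  const toRad = d => d * Math.PI / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
            Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Approximate distance from Fujisawa (35.34N, 139.49E) to Tokyo
// Station (35.68N, 139.77E): roughly 45 km.
const km = haversineKm(35.34, 139.49, 35.68, 139.77);
```

With the device's own position as one endpoint, the same function would let a map application sort search results by proximity, regardless of whether the position came from GPS, cell triangulation, or IP lookup.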

Which work gets done by which group within the W3C matters little, provided that the solutions meet the needs of all interested parties, with the goal of empowering content authors and maximizing the end-user experience. The SVG, CDF, and WebAPI Working Groups look forward to collaborating with MMI to achieve this.