What is W3C doing with regard to standards for multimodal user interfaces to the Web? This page sets out what has already been done and what W3C plans to do next.
Traditional Web browsers present a visual rendering of Web pages written in HTML, and allow you to interact through the keyboard and a pointing device such as a mouse, roller ball, touch pad or stylus. Voice user interfaces, by contrast, present information using a combination of synthetic speech and pre-recorded audio, and allow you to interact via spoken commands or phrases. You may also be able to use touch-tone (DTMF) keypads.
Multimodal user interfaces support multiple modes of interaction, for example visual displays, keyboards and pointing devices, spoken input and output, touch-tone (DTMF) keypads, and electronic ink. Electronic ink is the term for information that describes the motion of a stylus in terms of position, velocity and pressure. It can be used for handwriting and gesture recognition.
Here are just a few ideas for ways to exploit multimodal user interfaces:
Presenting complementary information on different output modes:
When using a cellphone to ask a voice portal for information about the local weather forecast, a picture could be sent to the cellphone to complement the spoken forecast. When asking for walking directions to a nearby restaurant, a map could be displayed. For an incoming call, the display could show a photograph of the caller.
Allowing you to switch between different modes depending on the context:
It could be too noisy for speech recognition to work, or you may be unable or simply not allowed to speak. Under these circumstances, you may want to use the keypad or pointing device instead of speech input. You may be comfortable looking at a form on the display, but choose to use speech to fill in text fields, rather than struggling with the cellphone keypad.
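As a purely illustrative sketch, and not any published W3C specification, a markup language for this kind of interaction might bind a visual form field to a speech grammar, so that the same field can be filled in either from the keypad or by voice. All element names, attributes and the grammar file below are invented for illustration:

```xml
<!-- Hypothetical multimodal markup: every name here is invented -->
<form id="weather">
  <field name="city">
    <visual>
      <label>City:</label>
      <input type="text"/>
    </visual>
    <voice>
      <prompt>Which city would you like the forecast for?</prompt>
      <grammar src="cities.grxml"/>
    </voice>
    <!-- Synchronization: whichever mode supplies input first fills the field -->
    <sync policy="first-input"/>
  </field>
</form>
```

Under a sketch like this, whichever mode supplies the input first would fill the field, letting you switch freely between keypad and speech depending on context.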
The W3C Voice Browser working group published a set of requirements for multimodal interaction in July 2000. The working group also invited participants to demonstrate proof-of-concept multimodal applications; a number of such demonstrations were shown at the group's face-to-face meeting held in Paris in May 2000.
To get a feel for future work, W3C and the WAP Forum held a joint workshop on the Multimodal Web in Hong Kong on 5-6 September 2000. The workshop addressed the convergence of W3C and WAP standards, and the emerging importance of speech recognition and synthesis for the Mobile Web. Its recommendations encouraged W3C to set up a multimodal working group to develop standards for multimodal user interfaces for the Web.
Although the Voice Browser working group developed requirements for multimodal interaction, the pressure of work on spoken dialogs and related specifications has made it impractical to devote time to further work on multimodal standards. As a result, W3C now expects to create a new multimodal working group later this year.
To ensure that the new multimodal working group can act swiftly to fulfil commercial requirements, W3C member organizations are invited to submit detailed proposals to W3C for the markup language and synchronization protocols needed to support multimodal interaction. Submissions should consider the following points:
W3C Members are encouraged to collaborate on proposals, as this will make it easier to ascertain broad industry support. In late Summer 2001, a charter for a Multimodal working group will be drawn up based upon the proposals that get the broadest industry backing.
Some ideas that have been suggested include:
Information on how to make a submission to W3C is available on the W3C Web site.