8522 154th Avenue NE
Redmond, Washington 98052
Why a Voice Browser?
Features of Conversa Web
Voice Friendly HTML
Conversational Computing has produced a voice browser called Conversa Web [CC98]. This browser replaces the mouse with voice input in most instances to enable hands-free browsing [HT95]. Using voice input to select links, for example, is often less tedious than using the mouse. Voice input provides direct "see and say" access to links, eliminating the wrist strain associated with holding the mouse, often for hours at a time. It also helps on devices where a mouse is inconvenient, including some notebook computers, PDAs, cell phones with displays, and set-top boxes.
There is a growing trend to use voice browsers for various applications. Many body-worn computers incorporate voice recognition for inspection, repair, and maintenance. Other applications include information kiosks and voice-driven presentations [HM97]. Furthermore, many applications being built today use the browser interface as the GUI.
To voice-enable links, Conversa Web includes all likely ways of speaking the link. For example, to speak the link 1998 CUI Introduction, the user might say "nineteen ninety-eight C U I Introduction". Because Conversa Web uses text-to-speech rules for unknown words, it also allows the user to say "coo-ee" for CUI. As the user speaks a link, Conversa Web triggers as soon as it hears enough to distinguish it from other choices. As a result, users need not speak a long link in its entirety. After the user speaks a link, Conversa Web briefly highlights the link to confirm the selection. Once Conversa Web requests the page from the Web, it begins to play music to provide aural feedback in case of Web delays during page retrieval.
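The early-trigger behavior described above can be illustrated with a small sketch. It is not Conversa Web's recognizer, which works on speech rather than text; it only computes, for a set of link texts, how many leading words are needed before a link can no longer be confused with any other. The function name `trigger_lengths` and the sample links are assumptions made for illustration.

```python
def trigger_lengths(links):
    """Return, for each link text, the number of leading words needed
    before the link is distinguishable from every other link.

    Illustrative only: a real recognizer compares acoustic hypotheses,
    not exact word strings.
    """
    tokenized = [link.lower().split() for link in links]
    result = {}
    for i, words in enumerate(tokenized):
        # Default: the whole link must be spoken. This also covers a link
        # that is a complete prefix of another link.
        needed = len(words)
        for n in range(1, len(words) + 1):
            prefix = words[:n]
            clash = any(
                other[:n] == prefix
                for j, other in enumerate(tokenized)
                if j != i
            )
            if not clash:
                needed = n
                break
        result[links[i]] = needed
    return result


if __name__ == "__main__":
    page_links = [
        "1998 CUI Introduction",
        "1998 CUI Workshop Schedule",
        "Contact Us",
    ]
    for text, count in trigger_lengths(page_links).items():
        print(f"{text!r}: trigger after {count} word(s)")
```

In this sketch, "Contact Us" can trigger after its first word, while the two links that begin with "1998 CUI" need their third word before they can be told apart.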
Conversa Web uses Saycons for links associated with images. The term "Saycon" is a shortened form of "sayable icon". For each image associated with a link, Conversa Web superimposes a cartoon dialog bubble that contains a number. Users simply say the number shown in the Saycon to activate an image-based link. This mechanism solves the problem of bad alt tags: we have found that fewer than 20% of alt tags accurately reflect what a user might expect to say based on the associated image. Saycons are also used to disambiguate identical textual links associated with distinct URLs.
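As a rough sketch of the numbering rule described above, the following code assigns a Saycon number to every image-only link and to every text link whose text is shared by links pointing to distinct URLs. The `Link` type, the `assign_saycons` function, and the choice to number in document order starting from 1 are assumptions for illustration; the source does not specify how Conversa Web numbers its Saycons.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    url: str
    text: Optional[str] = None  # None (or empty) for an image-only link


def _spoken_key(link):
    """Normalize link text so that case and spacing differences are ignored."""
    return " ".join((link.text or "").lower().split())


def assign_saycons(links):
    """Map link index -> Saycon number for links that cannot be spoken by text.

    Hypothetical policy: numbers are handed out in document order.
    """
    # Collect the distinct URLs behind each spoken text.
    urls_by_text = defaultdict(set)
    for link in links:
        key = _spoken_key(link)
        if key:
            urls_by_text[key].add(link.url)

    saycons = {}
    next_number = 1
    for index, link in enumerate(links):
        key = _spoken_key(link)
        is_image_only = not key
        is_ambiguous = bool(key) and len(urls_by_text[key]) > 1
        if is_image_only or is_ambiguous:
            saycons[index] = next_number
            next_number += 1
    return saycons


if __name__ == "__main__":
    page = [
        Link("/home"),                  # image link, no usable text
        Link("/files/a", "Download"),   # same text, different URLs
        Link("/files/b", "Download"),
        Link("/about", "About"),        # unique text, no Saycon needed
    ]
    print(assign_saycons(page))
```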
While surfing, users may add pages to the favorites list simply by saying "add to favorites". Conversa Web uses the title of the page to create an entry in the favorites list. At any time after that, users may select from the list of favorite items by voice.
Voice commands for surfing include the standard ones such as "page up" and "go back", but they also include commands to make the browser "go to sleep" and "wake up". Voice commands for scrolling allow users to move the page to a desired position or to read while the text scrolls automatically on the screen. Conversa Web also includes voice-activated help via the command "Conversa Help Me".
Most Web page authors create pages without considering that users might surf them by voice. Using some simple conventions, authors can often enhance the voice browsing experience without changing the experience for traditional surfing. Some examples include
While much of the responsibility of creating a voice friendly Web page lies with the author, the authoring tool can also help. The authoring tools should
Finally, certain extensions to the HTML language can facilitate voice input. As an example, selection lists are specifically designed for operation with the mouse. It ought to be possible to easily ask for the options in a selection list by voice, but HTML does not support this. We can accommodate the selection with Saycons, but a selection phrase directly associated with the selection list might offer a better alternative. The same discussion applies to form field values and other HTML elements.
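To make the selection-list case concrete, here is a minimal sketch of how a voice browser could collect the option labels of each selection list so that they can be offered as speakable choices. It uses Python's standard `html.parser` module; the class name `SelectOptionCollector`, the helper `select_vocabulary`, and the sample form are assumptions for illustration and do not describe how Conversa Web handles forms.

```python
from html.parser import HTMLParser


class SelectOptionCollector(HTMLParser):
    """Collect option labels for each <select> element, keyed by its name."""

    def __init__(self):
        super().__init__()
        self.options = {}       # select name -> list of option labels
        self._select = None     # name of the <select> currently open
        self._in_option = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._select = dict(attrs).get("name") or ""
            self.options.setdefault(self._select, [])
        elif tag == "option" and self._select is not None:
            # HTML allows </option> to be omitted, so a new <option>
            # implicitly closes the previous one.
            self._flush_option()
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush_option()
        elif tag == "select":
            self._flush_option()
            self._select = None

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def _flush_option(self):
        if self._in_option:
            # Collapse runs of whitespace inside the label.
            label = " ".join("".join(self._buffer).split())
            if label:
                self.options[self._select].append(label)
        self._in_option = False
        self._buffer = []


def select_vocabulary(html):
    """Return {select name: [option labels]} for the given HTML text."""
    collector = SelectOptionCollector()
    collector.feed(html)
    collector.close()
    return collector.options


if __name__ == "__main__":
    form = (
        '<form><select name="state">'
        "<option>Washington<option>Oregon</option>"
        "<option> New  York </option>"
        "</select></form>"
    )
    print(select_vocabulary(form))
```

A selection phrase attached to the list itself, as proposed above, would give the browser a natural prompt to pair with these labels; without it, the browser has only the `name` attribute, which is rarely written with speech in mind.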
Because of past misuse or disuse of tags such as the alt tag, a new tag ought to indicate that the page has been designed with speech input in mind. The new tag might indicate, for example, that the voice browser could trust the alt tags and thereby include them in the active vocabulary set.
[HT95] "Surfing the Web by Voice", by C. Hemphill and P. Thrift. In Proceedings of ACM Multimedia, San Francisco, CA, November 7-9, 1995, pp. 215-222.
[HM97] "Developing Web-based Speech Applications", by C. Hemphill and Y. Muthusamy. In Proceedings of Eurospeech '97, Rhodes, Greece, September 1997, Vol. 2, pp. 895-898.