Papers submitted to the W3C Voice Browser Workshop

13th October 1998, Cambridge, Mass.

A W3C Note describing features needed for effective interaction with Web browsers that are based upon voice input and output. Some extensions are proposed to HTML 4.0 and CSS2 to support voice browsing, and some work is proposed in the area of speech recognition and synthesis to make voice browsers more effective.

IBM Special Needs Self Voicing Browser

"IBM Home Page Reader™ (HPR) is a voice browser designed by a blind researcher in IBM Tokyo Research Labs. Chieko Asakawa found browsing the Web somewhere between difficult and impossible using today's Windows screen readers. Home Page Reader was released as an IBM product in Japan in October of 1997. IBM Special Needs Systems in Austin picked it up and has been adapting it to the North American blind user, adding recommended access features, and aspects of HTML 4.0 support."

Conversational Web Access

"We describe current telephone-to-web dialog projects at BBN, as well as some of the problems BBN experienced in building them. Building on this work, we present our thoughts on why the web isn't currently very suitable for voice-only conversational access, and how it might be made better."

Voice Browsing the Web for Information Access

"There is a large amount of information on the World Wide Web that is at the fingertips of anyone with access to the Internet. However, so far this information has primarily been used by people who connect to the web via a traditional computer. This is about to change. Recent advances in wireless communication, speech recognition, and speech synthesis technologies have made it possible to access this information from any place, at any time, by using only a cellular phone. Some possible applications are browsing the web, getting stock quotes, verifying flight schedules, getting maps and directions for various locations, or checking E-mail. In this paper, we discuss different types of web-based applications, briefly describe our system architecture with examples of applications we have developed, and discuss some of the key issues in building spoken dialog applications for the web."

PhoneBrowser: A Web-Content-Programmable Speech Processing Platform

The PhoneBrowser is a system for browsing the World Wide Web using only a telephone as the terminal. Different synthesized voices are used to signify particularly interesting text on the page, most notably hyperlink titles. Other text styles, such as bold or heading text, may also have special voices assigned. The HyperVoice description of page layout includes information about images, forms, tables, etc. To the extent possible, information about the content of the page is summarized and transformed into a concise verbal form without heavy reliance on special programming.

At any time, the user can ask questions to get greater detail or can speak hyperlink titles into a speech recognizer, interrupting TTS output, to navigate to other Web pages. Other speech commands can control operation of the browser and how the information is rendered. In this way the user has control over the presentation and navigation processes. Thus, the PhoneBrowser makes the Web accessible to traveling business people and to the 60% of the U.S. market that does not own a computer.
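The idea of assigning different synthesized voices to hyperlink titles, bold text, and headings can be illustrated with a minimal sketch. This is not the PhoneBrowser implementation: the voice names, the tag-to-voice table, and the `segment_for_speech` function are all hypothetical, and the sketch only splits a page into (voice, text) pairs that a text-to-speech engine could then read aloud.

```python
from html.parser import HTMLParser

# Hypothetical voice names; the real system's voices are not described here.
VOICE_FOR_TAG = {
    "a": "link-voice",
    "b": "emphasis-voice",
    "strong": "emphasis-voice",
    "h1": "heading-voice",
    "h2": "heading-voice",
}
DEFAULT_VOICE = "body-voice"


class VoiceSegmenter(HTMLParser):
    """Collects (voice, text) pairs based on the innermost styled tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags that carry a special voice
        self.segments = []   # output: list of (voice, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOICE_FOR_TAG:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        voice = VOICE_FOR_TAG[self.stack[-1]] if self.stack else DEFAULT_VOICE
        self.segments.append((voice, text))


def segment_for_speech(html):
    """Return the page text as a list of (voice, text) pairs."""
    parser = VoiceSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments


if __name__ == "__main__":
    page = '<h1>News</h1><p>Read the <a href="/w">weather</a> now, <b>today</b>.</p>'
    for voice, text in segment_for_speech(page):
        print(f"[{voice}] {text}")
```

A real voice browser would also need to summarize images, forms, and tables, which this sketch ignores.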

Web Authoring Strategies for Voice Browsers

The HTML Writers Guild is committed to developing, distributing, and teaching principles of Universally Accessible Design to our members and the web authoring community. The following strategies recommend specific ways in which these principles can be applied to designing pages that are usable by voice browsers.

Requirements for a markup language for HTTP-mediated interactive voice response services

Voice browsing involves access to the Web via a device, such as a telephone, that has no display. Our joint experience with markup languages for IVR (Interactive Voice Response) systems suggests that HTML cannot be easily extended in ways that would make voice browsing possible. In fact, voice browsing suffers from many of the same obstacles that make so many IVR systems unpleasant and difficult to use. Web contents should nonetheless be accessible to voice browsing communities. This goal can be achieved by a structured markup language that is expressly designed for IVR services. Such a language could be used to create voice browsers along with Web applications that parallel their visual counterparts. We offer some requirements for such a language.

MIT Spoken Language Systems Group

The Spoken Language Systems Group at the MIT Laboratory for Computer Science is devoted to research that will lead to the development of interactive conversational systems. We formulate and test computational models and develop algorithms that are suitable for human-computer interaction using verbal dialogues. These research results are funneled into the development of experimental conversational systems with varying capabilities. For example, our GALAXY system handles queries in three domains: weather, air travel, and city guide. Our GALAXY architecture uses a Java-enabled web browser as a graphical user interface and a telephone line for spoken interaction. Our Jupiter system provides conversational access to weather information for 500+ cities worldwide via a standard telephone.

Towards Improving Audio Web Browsing

At Siemens Corporate Research, we have been designing audio HTML browsers since mid-1996. We have focused our efforts on two applications: an automobile-based browser called LIAISON and a telephone-based browser called DICE. Both of these systems rely upon our underlying WIRE (Web-based Interactive Radio Environment) technology for audio rendering of Internet content.

Standards for Voice Browsing

PipeBeach is developing a voice browser for enterprise servers. It provides interactive access to web pages over conventional and cellular telephone lines. The product supports both DTMF and speech input from the user, as well as speech synthesis and digital audio output. We believe that the major obstacle to wide-scale commercial deployment of voice browsers for the web is not the technology, but the ease (or difficulty!) with which web page designers can add speech support to their site. From this perspective, it would be desirable for the voice browser to render interactive speech dialogs from standard HTML web pages. Our experience has shown that it is indeed possible.
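As a rough illustration of rendering a spoken dialog from an unmodified HTML page, the sketch below turns the links on a page into a DTMF (touch-tone) menu. It is only a sketch under stated assumptions and does not describe the PipeBeach product: the `LinkCollector` class, the `dtmf_menu` function, and the prompt wording are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (title, href) pairs for every <a href=...> on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None


def dtmf_menu(html, max_keys=9):
    """Build a spoken prompt and a key-to-URL map from the page's links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    choices = collector.links[:max_keys]  # keys 1..9 only in this sketch
    prompt = " ".join(
        f"Press {i} for {title}." for i, (title, _) in enumerate(choices, 1)
    )
    keymap = {str(i): href for i, (_, href) in enumerate(choices, 1)}
    return prompt, keymap


if __name__ == "__main__":
    page = '<ul><li><a href="/news">News</a></li><li><a href="/weather">Weather</a></li></ul>'
    prompt, keymap = dtmf_menu(page)
    print(prompt)
    print(keymap)
```

The prompt would be passed to a speech synthesizer, and a key press received from the caller would be looked up in the key map to choose the next page to fetch.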

Considerations in Producing a Commercial Voice Browser

Conversational Computing has produced a voice browser that works in conjunction with a standard HTML browser. We describe some possible uses for a voice browser and some of the features incorporated into this browser to facilitate voice interaction. Toward the goal of voice-enabling content on the Web, we offer some examples of how page design and HTML extensions might enhance the voice browser experience.