Voice Browser Workshop Agenda

Cambridge, Massachusetts
13th October 1998

Summary

       8:30  - 9:00       Registration & Coffee
       9:00  - 10:30      Intro, Presentations
       10:30 - 10:50      Refreshment Break
       10:50 - 12:20      Presentations
       12:20 - 1:00       Lunch
       1:00  - 3:30       Presentations
       3:30  - 4:00       Refreshment Break
       4:00  - 5:00       Panel Session
       5:00  - 5:30       Next Steps for W3C

When asking questions, please use the microphones provided and start with your name so that we can capture it in the minutes!

8:30 Registration and Coffee

Meet up and collect your badges. Participants who are not from W3C member organizations need to pay a registration fee.

9:00 Welcome

Introduction to the W3C and the goals of the workshop.

9:15 Web Authoring Strategies for Voice Browsers

Kynn Bartlett <kynn@hwg.org>

Vice President of Marketing and Outreach,
HTML Writers Guild

The HTML Writers Guild is committed to developing, distributing, and teaching principles of Universally Accessible Design to our members and the web authoring community. The presentation will recommend specific ways to apply these principles to designing pages that are usable by voice browsers.

9:40 Web Accessibility, Universal Design, and Voice Browsing

Mark Hakkinen <hakkinen@dev.prodworks.com>

The Productivity Works, Inc.,
Trenton, New Jersey

Mark Hakkinen is Senior Vice President of The Productivity Works, a firm which develops products that emphasise non-visual interfaces to web-based content. With a background in human factors engineering, he has worked with audio- and speech-based systems since the late '70s, held user interface R&D positions at established and start-up firms, and co-founded his present firm three years ago. In addition to his current work in voice browsing, he is active in an international project applying W3C's SMIL, HTML, and XML to next-generation digital talking books. His firm is a member of W3C and participates in the Web Accessibility Initiative.

The challenge of providing web access to persons with visual and print disabilities spawned developments in User Agent design and HTML to improve access for those who could not browse visually. These developments made it possible for the visually disabled to effectively browse the web using auditory interfaces. This achievement is readily applicable to opening the benefits of the web to a much larger audience through the telephone and small devices. Can the web be equally accessible on visual and non-visual clients? The concept of universal clients to the web is key: author once and browse everywhere. HTML accessibility, Cascading Style Sheets and DOM have proven instrumental in the development of non-visual browsing, and it is through this path that we see the web opened to a significantly wider audience, in an open, standards-based manner. Examples will be presented via demonstration using a telephone-based voice browser.

10:05 IBM Special Needs Self Voicing Browser

James Thatcher <thatch@us.ibm.com>

Dr. Jim Thatcher is the Technology Consultant on vision issues for IBM Special Needs Systems in Austin, Texas. He has been working in the area of access to computing with speech for 15 years. Jim is the father of the IBM Screen Readers, having developed the prototype for IBM Screen Reader for DOS well before "screen reader" was a phrase in our vocabulary.

Today Jim focuses on access to the web with IBM's forthcoming Home Page Reader, and on Java access with IBM's experimental Self Voicing Kit for Java.

10:30 Refreshment break

10:50 Voice Browsers and the Web

Dave Raggett <dsr@w3.org> (W3C lead for HTML)
Or Ben-Natan <orben@microsoft.com> (Microsoft Corporation)

We describe features needed for effective interaction with Web browsers using voice input and output. Some extensions are proposed to HTML 4.0 and CSS2 to support voice browsing, and some work is proposed in the area of speech recognition and synthesis to make voice browsers more effective.

11:15 Voice Access to The Internet

George White <gwhite@genmagic.com>

General Magic

This talk explains the technologies behind the General Magic telephone service, Portico. Portico provides a speech-recognizing communication assistant to access information on the Internet and devices connected to the Internet such as PCs and PDAs. It provides telephone access to public and personal information and provides sophisticated control over telephony functions such as dial-by-name, find-me-follow-me, automated call-back and call-screening. It features a powerful, server-based voice user interface with automatic speech recognition, text-to-speech and personality simulation technology. It accepts continuous speech input over the phone for limited domains, reads e-mail with TTS, and has a high quality recorded voice to embody personality. It provides telephone access to a unified Voice-Mail / Email / Fax Message Box, a unified phone-book & address-book, a personal calendar, news, and stocks. Portico also provides Internet GUI access to the same data, and it synchronizes GUI and VUI calendars, address books, voice mail and email. Portico will be demonstrated as part of the presentation.

11:40 Conversational Web Access

David Stallard <stallard@bbn.com>

BBN Technologies

We describe current telephone-to-web dialog projects at BBN, as well as some of the problems we experienced in building them. Building on this work, we present our thoughts on why the web isn't currently very suitable for voice-only conversational access, and how it might be made better.

12:05 Voice Browsing the Web for Information Access

Rajeev Agarwal, Yeshwant Muthusamy, and Vishu Viswanathan

Media Technologies Lab
Texas Instruments Incorporated
P.O. Box 655303, MS 8374, Dallas, TX 75265
[rajeev | yeshwant | vishu]@csc.ti.com

There is a large amount of information on the World Wide Web that is at the fingertips of anyone with access to the internet. However, so far this information has primarily been used by people who connect to the web via a traditional computer. This is about to change. Recent advances in wireless communication, speech recognition, and speech synthesis technologies have made it possible to access this information from any place, at any time, by using only a cellular phone. Some possible applications are browsing the web, getting stock quotes, verifying flight schedules, getting maps and directions for various locations, or checking E-mail. In this paper, we discuss different types of web-based applications, briefly describe our system architecture with examples of applications we have developed, and discuss some of the key issues in building spoken dialog applications for the web.

12:20 Lunch Break

1:00 Towards Improving Audio Web Browsing

Michael Wynblatt <wynblatt@scr.siemens.com>,
Stuart Goose <sgoose@scr.siemens.com>

Multimedia and Video Technology Group
Siemens Corporate Research, Inc.
Princeton, NJ, USA

At Siemens Corporate Research, we have been designing audio HTML browsers since mid-1996. We have focused our efforts on two applications: an automobile-based browser called LIAISON and a telephone-based browser called DICE. Both of these systems rely upon our underlying WIRE (Web-based Interactive Radio Environment) technology for audio rendering of Internet content.

1:25 Considerations in Producing a Commercial Voice Browser

Michael B. Robin <mikero@conversa.com>,
Charles T. Hemphill <hemphill@conversa.com>

Conversational Computing
8522 154th Avenue NE
Redmond, Washington 98052
mikero@conversa.com

Conversational Computing has produced a voice browser that works in conjunction with a standard HTML browser. We describe some possible uses for a voice browser and some of the features incorporated into this browser to facilitate voice interaction. Toward the goal of voice enabling content on the Web, we offer some examples of how page design and HTML extensions might enhance the voice browser experience.

1:50 PhoneBrowser: A Web-Content-Programmable Speech Processing Platform

Michael Brown <mkb@research.bell-labs.com>

Dr. Brown holds a PhD and has been a Member of Technical Staff with Bell Laboratories for almost 18 years. He has worked on speech recognition throughout most of that time, working on HMM decoding, language modeling, semantics and dialogue. He has also worked on robotics (speech-controlled, of course), sensors, handwriting recognition, optical flow and neural networks. He has over 50 publications and more than a dozen patents.

The PhoneBrowser is a system for browsing the World Wide Web using only a telephone as the terminal. Different synthesized voices are used to signify particularly interesting text on the page, most notably hyperlink titles. Other text styles, such as bold text or heading text, may also have special voices assigned. The HyperVoice description of page layout includes information about images, forms, tables, etc. To the extent possible, information about the content of the page is summarized and transformed into a concise verbal form without heavy reliance on special programming.

At any time the user can ask questions to get greater detail or can speak Hyperlink titles into a speech recognizer, interrupting TTS output, to navigate to other Web pages. Other speech commands can control operation of the browser and how the information is rendered. In this way the user has control over the presentation and navigation processes. Thus, the PhoneBrowser makes the Web accessible to traveling business people and to the 60% of the U.S. market that does not own a computer.

2:15 SABLE: A Standard for TTS Markup

Andrew Hunt <hunt@east.sun.com> (Sun Microsystems Laboratories)
Richard Sproat <rws@research.bell-labs.com> (Bell Laboratories, Lucent Technologies)

Andrew Hunt works on speech applications and platforms, as well as various research topics in text-to-speech synthesis. Richard Sproat works on text processing for text-to-speech synthesis.

Currently, speech synthesizers are controlled by a multitude of proprietary tag sets. These tag sets vary substantially across synthesizers and are an inhibitor to the adoption of speech synthesis technology by developers. SABLE is an SGML-based markup scheme for text-to-speech synthesis, developed to address the need for a common TTS control paradigm. SABLE supports two kinds of markup: "text description" marks properties of the text structure that are relevant for rendering a document in speech; "speaker directives" control various aspects of how the speech is to be produced. Unlike some other recent proposals for voice applications markup, SABLE is a community effort in the sense that it has been developed by a team of speech synthesis experts from a variety of institutions. There is a public mailing list (sable@east.sun.com), which anyone can join, and the SABLE specification is available from a variety of public web sites.

2:40 ADML - the language to create AudioWeb: a hyperlinked collection of audio pages

Tomasz Imielinski <imielins@cs.rutgers.edu>

Tomasz Imielinski is Professor and Chair of the Computer Science Department at Rutgers. His research interests are in mobile and wireless computing, and he leads the DataMan research group at Rutgers, which is funded by DARPA, NSF, and several companies.

We will discuss our experience with, and the development history of, the audio web research project at Rutgers University, which aims to create audio-accessible content from World Wide Web resources. We will describe services that were created through numerous student projects and summarize the lessons we have learned so far. We will also describe the current status of the AudioWeb browser implementation at Rutgers University.

3:05 Requirements for a markup language for HTTP-mediated interactive voice response services

Nils Klarlund <klarlund@research.att.com>,
Kenneth G. Rehor <krehor@research.bell-labs.com>,
David Ladd <ladd@icsd.mot.com>

Nils Klarlund joined AT&T Bell Labs in 1995; his interests are verification, programming languages, and user interfaces. Kenneth G. Rehor works on the design of languages and platforms for the integration of telephony and computer networks in the Software Production Research Department at Bell Labs. David Ladd works in the Internet and Connectivity Services Division of Motorola, where he is the Architect and Program Manager of the VoxML Project.

Voice browsing involves access to the Web via a device, such as a telephone, that has no display. Our joint experience with markup languages for IVR (Interactive Voice Response) systems suggests that HTML cannot be easily extended in ways that would make voice browsing possible. In fact, voice browsing suffers from many of the same obstacles that make so many IVR systems unpleasant and difficult to use. Web contents should nonetheless be accessible to voice browsing communities. This goal can be achieved by a structured markup language that is expressly designed for IVR services. Such a language could be used to create voice browsers along with Web applications that parallel their visual counterparts. We offer some requirements for such a language.

3:30 Refreshment break

4:00 Panel Session

The presenters will be asked to give their views on the future of voice interaction and the Web, and what standards are needed to achieve this. This will be followed by questions from the audience.

5:00 Wrap up - what should W3C do next?

This session will consider whether W3C should set up a Voice Browser Interest Group, and attempt to outline the goals and opportunities the group would address. This would form the basis for a briefing package for review by W3C members on setting up a formal activity on Voice Browsers.