Position Paper for the WAP/W3C Workshop, Sept. 5-6, 2000,
in Hong Kong
Philips Speech Processing
Philips Speech Processing, a business unit of Royal Philips
Electronics, is a leader in speech recognition and true dialog
systems. For information about the company, see
www.speech.philips.com. We are members of the W3C Voice Browser
Working Group.
Author: Dr. Volker Steinbiss,
CTO
Subjects That We Would Like to Address:
- Standards around the system architecture
- Standards required for dialog description, including
multi-modal dialogs
- Understanding of how this work relates to other standards
(GPRS, UMTS, 3G)
- How the community can access the massive amount of third-party
content coded in HTML or WML, e.g. automatic translation of WML to
VoiceXML, guidelines for voice-friendly coding of WAP content,
etc.
We expect the workshop participants to converge on a shared
vision of the business and technical directions, and that
concrete steps, each with an owner, will be defined at the end of
day 2.
We have ample experience with high-level speech dialog
systems, both with commercial deployments and from the research
angle, and would like to contribute regarding multi-modality
issues, architecture, and dialog description languages.
Some ideas on the subject in a nutshell:
Rendering Information
Raw content (existing in a database) can be presented in
different ways, e.g.:
- 1. HTML pages
- 2. WML pages
- 3. VoiceXML scripts (VoiceXML assumed to be the dialog
description language)
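As a minimal sketch of this idea, the same raw record could be rendered for all three channels. The field names and templates below are illustrative assumptions, not normative HTML, WML, or VoiceXML:

```python
# Sketch: one raw record (as it might sit in a database) rendered
# for three channels.  Field names and templates are illustrative
# assumptions.
record = {"title": "Weather Hong Kong", "body": "Sunny, 31 degrees."}

def to_html(r):
    # Visual, browser-oriented rendering.
    return "<html><body><h1>%(title)s</h1><p>%(body)s</p></body></html>" % r

def to_wml(r):
    # Compact card for a WAP handset display.
    return '<wml><card title="%(title)s"><p>%(body)s</p></card></wml>' % r

def to_vxml(r):
    # Spoken rendering: the content becomes a prompt.
    return ("<vxml><form><block><prompt>%(title)s. %(body)s</prompt>"
            "</block></form></vxml>") % r
```

The point is that only the rendering layer differs; the raw content is shared.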
Multi-Modality
- The architecture must support the simultaneous use of voice and
data
- The dialog description language must support the simultaneous
use of different modalities (speech, text, graphics), e.g. saying
"show this" while pointing
- Application execution must understand display-dependent
commands (what does "lower" mean?)
- Application execution must be modality-transparent (pressing
"OK" or saying "yes" has the same effect)
- There should be a standardized way to locally voice-control
streamed content (e.g. play, stop, or fast-forward music, or
switch to an inbound phone call while listening)
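Modality transparency can be sketched as a normalization step: raw input from any modality is mapped to one modality-independent dialog event before the dialog logic sees it. The event names and vocabularies below are illustrative assumptions:

```python
# Minimal sketch of modality-transparent input handling.  Touch/key
# events and recognized speech are normalized to the same dialog
# event; vocabularies and event names are illustrative assumptions.
CONFIRM_WORDS = {"yes", "ok", "sure"}
CANCEL_WORDS = {"no", "cancel", "stop"}

def normalize_input(modality, value):
    """Map raw input from any modality to a modality-independent event."""
    if modality == "key":
        if value == "OK":
            return "CONFIRM"
        if value == "BACK":
            return "CANCEL"
    if modality == "speech":
        word = value.lower()
        if word in CONFIRM_WORDS:
            return "CONFIRM"
        if word in CANCEL_WORDS:
            return "CANCEL"
    return "UNKNOWN"

# Pressing "OK" and saying "yes" yield the same dialog event.
assert normalize_input("key", "OK") == normalize_input("speech", "yes")
```

The dialog description language would then only need to react to events such as CONFIRM, regardless of which modality produced them.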
Translation of WML to VoiceXML
- There are far more HTML and WML pages than VoiceXML pages, so
automatic translation from HTML or WML to VoiceXML would give
access to a lot of content in a short time and with little
effort
- Translating WML to VoiceXML will be much easier than translating
HTML to VoiceXML, for several reasons: WML's information chunks
are small, and there is no need to interpret graphical structures
and icons
- Fast access to content (generating voice portals on the fly) is
important; options include:
  - A translation guideline from WML to VoiceXML commands
  - Perhaps a language comprising both
  - A language that would generate both WML and VoiceXML
content
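To make the translation idea concrete, here is a minimal sketch that turns a simple WML selection card into a VoiceXML-style menu. The element mapping (card to menu, option to choice, card prose to the spoken prompt) is an illustrative assumption, not a normative translation rule from either standard:

```python
# Illustrative sketch: translating a simple WML selection card into a
# VoiceXML-style menu.  The card->menu and option->choice mapping is an
# assumption for illustration only.
import xml.etree.ElementTree as ET

WML_CARD = """\
<card id="news" title="News">
  <p>Pick a section:</p>
  <select>
    <option onpick="#sports">Sports</option>
    <option onpick="#weather">Weather</option>
  </select>
</card>"""

def wml_card_to_vxml(wml_source):
    """Translate one WML card with a <select> list into a VoiceXML menu."""
    card = ET.fromstring(wml_source)
    menu = ET.Element("menu", id=card.get("id", "menu"))
    prompt = ET.SubElement(menu, "prompt")
    # The card's prose becomes the spoken prompt.
    prompt.text = " ".join(p.text for p in card.findall("p") if p.text)
    # Each pickable option becomes a spoken menu choice.
    for option in card.iter("option"):
        choice = ET.SubElement(menu, "choice", next=option.get("onpick", "#"))
        choice.text = option.text
    return ET.tostring(menu, encoding="unicode")

print(wml_card_to_vxml(WML_CARD))
```

A translation guideline would pin down exactly such mappings; the small, list-like structure of WML cards is what makes this tractable, whereas free-form HTML layout is not.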
Standards are Required for
- Architecture
- Dialog Description (maybe common WML / VoiceXML or
translation)