Position Paper for the WAP/W3C Workshop, Sept. 5-6, 2000,
in Hong Kong
Philips Speech Processing
Philips Speech Processing, a business unit of Royal Philips
Electronics, is a leader in speech recognition and true dialog
systems. For information about the company, see
www.speech.philips.com. We are members of the W3C Voice Browser
Working Group.
Author: Dr. Volker Steinbiss,
CTO
Subjects That We Would Like to Address:
- Standards around the system architecture
- Standards required for dialog description, including
multi-modal dialogs
- Understanding of how this work relates to other standards
(GPRS, UMTS, 3G)
- How the community can access the massive amount of third-party
content coded in HTML or WML, e.g. automatic translation of WML to
VoiceXML, guidelines for voice-friendly coding of WAP content,
etc.
We expect the workshop participants to converge on a shared
vision of the business and technical directions, and that
concrete steps, each with an owner, will be defined at the end of
day 2.
We have ample experience with high-level speech dialog
systems, both with commercial deployments and from the research
angle, and would like to contribute regarding multi-modality
issues, architecture, and dialog description languages.
Some ideas on the subject in a nutshell:
Rendering Information
Raw content (existing in a database) can be presented in
different ways, e.g.:
- 1. HTML pages
- 2. WML pages
- 3. VoiceXML scripts (VoiceXML assumed to be the dialog
description language)
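As a minimal sketch of this idea, the same raw record could be rendered for all three channels. The field names and templates below are illustrative assumptions, not normative HTML, WML, or VoiceXML:

```python
# Sketch: one raw record (as it might sit in a database) rendered
# for three channels.  Field names and templates are illustrative
# assumptions.
record = {"title": "Weather Hong Kong", "body": "Sunny, 31 degrees."}

def to_html(r):
    # Visual, browser-oriented rendering.
    return "<html><body><h1>%(title)s</h1><p>%(body)s</p></body></html>" % r

def to_wml(r):
    # Compact card for a WAP handset display.
    return '<wml><card title="%(title)s"><p>%(body)s</p></card></wml>' % r

def to_vxml(r):
    # Spoken rendering: the content becomes a prompt.
    return ("<vxml><form><block><prompt>%(title)s. %(body)s</prompt>"
            "</block></form></vxml>") % r
```

The point is that only the rendering layer differs; the raw content is shared.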
Multi-Modality
- The architecture must support the simultaneous use of voice and
data
- The dialog description language must support the simultaneous
use of different modalities (speech, text, graphics), e.g. saying
"show this" while pointing
- Application execution must understand display-dependent
commands (what does "lower" mean?)
- Application execution must be modality-transparent (pressing
"OK" or saying "yes" has the same effect)
- There should be a standardized way to locally voice-control
streamed content (e.g. play, stop, or fast-forward music, or
switch to an inbound phone call while listening)
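Modality transparency can be sketched as a normalization step: raw input from any modality is mapped to one modality-independent dialog event before the dialog logic sees it. The event names and vocabularies below are illustrative assumptions:

```python
# Minimal sketch of modality-transparent input handling.  Touch/key
# events and recognized speech are normalized to the same dialog
# event; vocabularies and event names are illustrative assumptions.
CONFIRM_WORDS = {"yes", "ok", "sure"}
CANCEL_WORDS = {"no", "cancel", "stop"}

def normalize_input(modality, value):
    """Map raw input from any modality to a modality-independent event."""
    if modality == "key":
        if value == "OK":
            return "CONFIRM"
        if value == "BACK":
            return "CANCEL"
    if modality == "speech":
        word = value.lower()
        if word in CONFIRM_WORDS:
            return "CONFIRM"
        if word in CANCEL_WORDS:
            return "CANCEL"
    return "UNKNOWN"

# Pressing "OK" and saying "yes" yield the same dialog event.
assert normalize_input("key", "OK") == normalize_input("speech", "yes")
```

The dialog description language would then only need to react to events such as CONFIRM, regardless of which modality produced them.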
Translation of WML to VoiceXML
- There are far more HTML and WML pages than VoiceXML pages, so
automatic translation from HTML or WML to VoiceXML would give
access to a lot of content in a short time and with little
effort
- Translating WML to VoiceXML will be much easier than translating
HTML to VoiceXML, for several reasons: WML's information chunks
are small, and there is no need to interpret graphical structures
and icons
- Fast access to content (generating voice portals on the fly) is
important; options include:
  - A translation guideline from WML to VoiceXML commands
  - Perhaps a language comprising both
  - A language that would generate both WML and VoiceXML
content
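To make the translation idea concrete, here is a minimal sketch that turns a simple WML selection card into a VoiceXML-style menu. The element mapping (card to menu, option to choice, card prose to the spoken prompt) is an illustrative assumption, not a normative translation rule from either standard:

```python
# Illustrative sketch: translating a simple WML selection card into a
# VoiceXML-style menu.  The card->menu and option->choice mapping is an
# assumption for illustration only.
import xml.etree.ElementTree as ET

WML_CARD = """\
<card id="news" title="News">
  <p>Pick a section:</p>
  <select>
    <option onpick="#sports">Sports</option>
    <option onpick="#weather">Weather</option>
  </select>
</card>"""

def wml_card_to_vxml(wml_source):
    """Translate one WML card with a <select> list into a VoiceXML menu."""
    card = ET.fromstring(wml_source)
    menu = ET.Element("menu", id=card.get("id", "menu"))
    prompt = ET.SubElement(menu, "prompt")
    # The card's prose becomes the spoken prompt.
    prompt.text = " ".join(p.text for p in card.findall("p") if p.text)
    # Each pickable option becomes a spoken menu choice.
    for option in card.iter("option"):
        choice = ET.SubElement(menu, "choice", next=option.get("onpick", "#"))
        choice.text = option.text
    return ET.tostring(menu, encoding="unicode")

print(wml_card_to_vxml(WML_CARD))
```

A translation guideline would pin down exactly such mappings; the small, list-like structure of WML cards is what makes this tractable, whereas free-form HTML layout is not.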
Standards are Required for
- Architecture
- Dialog Description (maybe common WML / VoiceXML or
translation)