Workshop on Speaker biometrics and VoiceXML 3.0
— Summary
On 5-6 March 2009, the W3C Voice Browser Working Group held a
Workshop on Speaker biometrics and VoiceXML 3.0
in Menlo Park, California, US, hosted by SRI International.
The minutes of the workshop are available on the W3C Web server:
http://www.w3.org/2008/08/siv/minutes.html
There were 16 attendees from
the following organizations:
- SRI International
- Recognition Technologies, Inc.
- J. Markowitz, Consultants
- Intervoice
- EIG
- Deutsche Telekom AG, Laboratories
- Centrelink
- iBiometrics, Inc.
- Daon
- Cisco Systems, Inc.
- Voxeo
- Nuance
- General Motors
- W3C
This workshop was narrowly focused on identifying and prioritizing
directions for Speaker Identification and Verification (SIV) standards
work as a means of making SIV more useful in current and emerging
markets.
Topics discussed during the workshop included:
- SIV use cases (Application requirements for SIV in VoiceXML 3.0)
- SIV users (design philosophy, uncertainty, security, identity)
- Audio formats for SIV (Wav, PCM, alaw, ulaw, OGG, etc.)
- Data format for multimodal applications (EMMA, etc.)
- SIV-related standards from outside W3C (CBEFF, INCITS 456, BIAS, BioAPI)
- SIV and MRCP V2
- Architecture and functionality (features, configuration, APIs, etc.)
During the workshop we clarified why SIV functionality should be
added to VoiceXML, as follows:
- The system would be more responsive: VoiceXML could shorten
customer-perceived latency and provide performance benefits to
users.
- It would be easier for developers to build applications, because
the programming interface would be consistent with the way they use
other VoiceXML resources and low-level operations would be hidden
from them.
- Adding SIV to a standard would make it portable and facilitate
integration with the Web model, because it makes SIV applications
consistent with that model and provides efficiencies of scale in
hosted environments.
- Standardization of an easy-to-use API would minimize vendor
lock-in and grow the market.
- Support in VoiceXML would enable SIV use without the application
server when connectivity is intermittent or offline.
- Standards are a sign of technology maturity.
The major takeaway is that we confirmed that SIV fits into the
VoiceXML space and produced the "Menlo Park Model", an SIV-enabled
VoiceXML architecture, as shown below.
The discussion of the "Menlo Park Model" above included the following points:
- The main hidden security issues that people are concerned about were
identified, and realistic ways of addressing them were discussed.
Those issues do not disappear, but we now know we can address them.
- VoiceXML 3.0 could serve as an example of an Interaction Manager
within the W3C's MMI Architecture.
The synchronization and markup integration of multiple
modalities should be addressed.
An interaction using VoiceXML is likely to involve multiple
modalities/factors. Consequently, developers need a way to avoid
separating those modalities completely.
- Collaboration with other W3C Working Groups and other standards
bodies, e.g., OASIS/BIAS, is expected.
Note that the discussion during the workshop focused mainly on
the application side and provided little guidance for engine
providers. A good standard that wraps speech engines cleanly is
therefore still needed. Classification, segmentation and
identification should also be considered, and the group needs to
determine whether or not to include them in VoiceXML 3.0.
The Call for Participation, the Logistics, the Presentation Guideline,
the Agenda and the Minutes are also available on the W3C Web server.
Judith Markowitz, Ken Rehor and Kazuyuki Ashimura,
Workshop co-Chairs
$Id: summary.html,v 1.12 2009/06/23 14:19:55 ashimura Exp $