W3C Speech Interface Implementations

Applying Web technology to enable users to access services from their telephone via a combination of speech and DTMF.

VoiceXML 2.0 and 2.1

The BeVocal Café is a web-based VoiceXML development environment providing a carrier-grade VoiceXML 2.0 (and 1.0) interpreter and the tools necessary to debug applications and test their usability. The BeVocal VoiceXML Interpreter features support for the latest W3C Working Draft, including many enhancements such as Speaker Verification, Voice Enrollment, XML data, pre-tuned grammars and professional audio.

Conita has implemented a VoiceXML interpreter based upon OpenVXI (see below for more details).

Eloquant has implemented a VoiceXML 2.0 interpreter from the W3C specifications. Supporting full semantic interpretation based upon the current W3C Semantic Interpretation for Speech Recognition drafts, this VoiceXML interpreter is the base for Eloquant's VoiceXML hosting infrastructure which services vocal applications in Europe.

HeyAnita, an early promoter sponsor of the VoiceXML Forum, supports most of the current voice standards, including VoiceXML 2.0. HeyAnita offers its FreeSpeech Gateway Server, a carrier-grade media server, for the development and on-premise or hosted deployment of voice applications. In addition to supporting the leading operating systems, it allows companies to utilize the ASR and TTS software and telephony hardware of their choice.

HP offers the OpenCall Media platform, a carrier-grade VoiceXML 2.0 platform on HP-UX and Linux with support for ISUP/SIP/ISDN telephony protocols, MRCP TTS/ASR (e.g. Nuance and SpeechWorks), and CCXML 1.0. A free PC-based VoiceXML SDK is available for application development. A voice portal framework, OpenCall Speech Web, is also available for efficient management and deployment of large-scale voice portals. For further information visit HP VoiceXML Developer Resources.

IBM has a variety of products and tools for the development, debugging, and deployment of VoiceXML applications. IBM's VoiceXML products incorporate many VBWG specifications, including VoiceXML 2.0 and SRGS 1.0.

Interact Incorporated's SPOT SIP Engine is a software platform that provides a "toolbox" for developers, integrators, VARs and hosted providers to create interactive voice applications integrated with VoIP telephony on a Linux platform. The engine provides a fully compliant 2.0 and 2.1 VoiceXML interpreter coupled with a fully compliant CCXML 1.0 interpreter for telephony support, an integrated MRCP V2 stack for interfacing 3rd party ASR/TTS engines, and HMP software for managing audio. A SPOT test portal is available for casual users to explore VoiceXML.

Intervoice's Omvia Media Server includes a VoiceXML 2.0 browser. Intervoice offers its Omvia Media Server in either a customer premise or managed service option. Intervoice also offers InVision Studio, a full featured design, development and debugging tool for the creation of VoiceXML applications. The Omvia Media Server supports multiple ASR and TTS engines in a highly scalable and robust solution.

JVoiceXML is an open source VoiceXML interpreter for Java supporting Java APIs such as JSAPI and JTAPI. JVoiceXML is an implementation of VoiceXML 2.0, the Voice Extensible Markup Language specified at http://www.w3.org/TR/voicexml20/. The major goal is to have a platform-independent implementation that can be used for free.

Loquendo has developed a VoiceXML Interpreter that manages both VoiceXML 2.0 and 2.1 documents. The interpreter is currently integrated in Loquendo's VoxNauta Platform and used in a vast range of voice services, including very large-scale ones. Furthermore, Loquendo Café provides developers with resources and tools to learn about creating speech applications, as well as enabling them to run their VoiceXML application on a platform and listen to the service they've created over the phone - in a variety of languages.

Lucent Technologies' MiLife™ VoiceXML Gateway provides telephone access to voice-enabled web services. Implemented on the Enhanced Media Resource Server, it provides a cost-effective, highly reliable solution.

Motorola source licenses the VoxGateway, a carrier-grade VoiceXML 2.0 system which has been integrated into the offerings of a number of voice platform vendors. Contact Jim Ferrans for details.

Nuance offers a VoiceXML platform and graphical VoiceXML development tools, a Voice Site Staging Center for rapid prototyping and testing, and a VoiceXML-based voice browser to developers at no cost.

OpenVXI (official German page) is a portable open source VoiceXML interpreter available from Carnegie Mellon and developed by SpeechWorks. It may be used free of charge in commercial applications and allows the addition of proprietary modifications if desired. OpenVXI closely follows the VoiceXML 2.0 draft specification.

OptimSys offers a modular, flexible and highly scalable VoiceXML platform OptimTalk which works on leading operating systems. It includes VoiceXML and CCXML compliant interpreters and an SDK for adding VoiceXML and CCXML support to your existing solutions. OptimTalk allows for easy integration with ASR and TTS engines and telephony hardware of your choice. OptimTalk also comes with pre-integrated configurations for deployment of voice applications as well as in-house application development. For this purpose it includes a genuine SRGS and SISR implementation.

Phonologies InterpreXer™ is a robust speech application engine to 'voice enable' Enterprise Applications in the cloud or in the network. It fully implements the W3C VoiceXML 2.1 Specification. InterpreXer™ can be deployed as an abstract VoiceXML Interpreter or as a full SIP VoiceXML Platform running standalone or within your existing IP-based network. InterpreXer™ has been deployed in a range of environments, including the cloud, network servers and embedded CPE platforms, and supports speech recognition and synthesized speech (both using MRCPv2 engines). InterpreXer™ comes packaged with a web services "trigger" to generate outbound calls from within an enterprise application.

PublicVoiceXML is an open source implementation of a complete VoiceXML 2.0 browser. It is designed to work on low-cost telephony hardware using DTMF navigation, with hooks to 3rd party text-to-speech and speech recognition modules. Support and sample applications for the mobile world are available. The source code is available on SourceForge and a Linux version will be available soon.

SpeechWorks offers a modular toolkit for system developers, OpenSpeech Browser, to simplify adding complete VoiceXML 2.0 support to new and existing platforms. Its flexible design is the foundation of many commercial VoiceXML solutions. Available for both Windows and Linux operating systems, OpenSpeech Browser includes a VoiceXML interpreter, Internet fetching, data caching and logging components along with integrated speech recognition and TTS engines. Full sources are available for unlimited customization.

Tellme Networks, Inc. answers over one million VoiceXML 2.0 phone calls every day for the Fortune 500 on its carrier-grade network. VoiceXML developers can access Tellme Studio to build their own VoiceXML applications for free.

Vocalocity has implemented the latest specification of VoiceXML 2.0. Vocalocity's platform software is designed specifically for OEM and Channel Partners who provide unique open solutions to their customers. The Vocalocity platform supports multiple telephony, ASR and TTS engines as well as multiple operating systems.

VoiceGenie provides carrier-grade VoiceXML Gateways fully supporting VoiceXML 2.0 plus advanced call control extensions. The platform can be deployed on premises or hosted, and is available via a range of service providers. VoiceGenie supports both PSTN and SIP simultaneously, 4 major ASRs (SpeechWorks, BBN, AT&T Watson and Nuance) and 7 TTS resources (including SpeechWorks, AT&T Natural Voices, Rhetorical, SVOX and others). VoiceGenie runs a popular developer site where applications can be deployed and tested for free. VoiceGenie also offers carrier-grade CCXML solutions as part of the VoiceGenie 7 product line. Available as a standalone platform or integrated with the VoiceGenie Media Platform, VoiceGenieCCXML implements the latest version of the CCXML specification, and interoperates with VoiceGenie's certified VoiceXML implementation and other standards-compliant solutions.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C VoiceXML 2.0 recommendation and the VoiceXML 2.1 working draft. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Voxpilot offers a free VoiceXML online development and deployment environment, with a multilanguage implementation of VoiceXML 2.0 including local access numbers in all major European markets, provided over a robust VoIP infrastructure.

Speech Recognition Grammar Specification (SRGS)

IBM's VoiceXML products incorporate many VBWG specifications, including SRGS 1.0.

Interact Incorporated's SPOT SIP Engine is a software platform that provides a "toolbox" for developers, integrators, VARs and hosted providers to create interactive voice applications integrated with VoIP telephony on a Linux platform. The engine supports SRGS 1.0 to the extent required by conformance with the 2.0, 2.1 VoiceXML Recommendations for its VoiceXML Interpreter, implementation of DTMF grammars, and its integrated MRCP V2 stack for interfacing 3rd party ASR engines.

Both Loquendo ASR and Loquendo Speech Suite support SRGS 1.0 grammars (both XML and ABNF formats).

The Microsoft Speech Server supports the SRGS grammar XML format. The Microsoft Speech Application SDK is a suite of tools for authoring speech applications for voice-only telephony and multimodal applications, and contains a Grammar Editor for the building, editing and debugging of SRGS grammars. SRGS is also supported as the SALT grammar format in Microsoft's SALT add-in to Internet Explorer.

NuGram from Nu Echo is a complete solution for authoring, debugging, deploying, and maintaining speech recognition grammars based on W3C's Speech Recognition Grammar Specification (SRGS) 1.0 and Semantic Interpretation for Speech Recognition (SISR) 1.0. It comprises a full-featured development environment based on Eclipse and a Java-servlet based infrastructure to deploy dynamic grammars.

OptimSys' VoiceXML platform OptimTalk can work in various desktop configurations for in-house application development. For this purpose it includes a genuine SRGS implementation which is, among other things, able to interface with Microsoft Speech API 5.1 compliant speech recognition engines.

Grammar Studio from Voice Web Solutions is a Java-based, stand-alone WYSIWYG GRXML editor for the desktop. It is aimed at voice web speech developers who author grammars in the W3C Speech Recognition Grammar Format for VoiceXML telephony applications, SALT telephony and multimodal applications, and X+V multimodal applications that implement SRGS grammars.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Speech Recognition Grammar Specification (SRGS) 1.0 recommendation. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Voxpilot provides an implementation of SRGS (XML form) as an important component of its VoiceXML 2.0 platform.
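
For readers unfamiliar with the XML form of SRGS mentioned in the entries above, the following is a minimal sketch, not taken from any of the listed products. It assumes a very small grammar whose root rule is a flat list of alternatives. The rule name `answer`, the phrases, and the helper functions are hypothetical, and the matcher handles only this flat case (no rule references, repeats, weights or semantic tags).

```python
import xml.etree.ElementTree as ET

# Namespace of SRGS 1.0 grammars in XML form.
SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical grammar: the root rule accepts one of three fixed phrases.
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
      <item>maybe later</item>
    </one-of>
  </rule>
</grammar>
"""


def root_alternatives(source):
    """Return the phrases listed under <one-of> in the grammar's root rule."""
    grammar = ET.fromstring(source)
    root_id = grammar.get("root")
    for rule in grammar.findall(f"{SRGS_NS}rule"):
        if rule.get("id") == root_id:
            items = rule.findall(f"{SRGS_NS}one-of/{SRGS_NS}item")
            # Collapse internal whitespace so phrases compare cleanly.
            return [" ".join(item.text.split()) for item in items]
    return []


def matches(source, utterance):
    """Check a text utterance against the flat alternatives (case-insensitive)."""
    normalized = " ".join(utterance.lower().split())
    return normalized in root_alternatives(source)


if __name__ == "__main__":
    print(root_alternatives(GRAMMAR))
    print(matches(GRAMMAR, "Maybe later"))
```

A real recognizer compiles such a grammar into a search network over audio; the text comparison here only illustrates what the grammar document declares.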

Speech Synthesis Markup Language (SSML)

Interact Incorporated's SPOT SIP Engine is a software platform that provides a "toolbox" for developers, integrators, VARs and hosted providers to create interactive voice applications integrated with VoIP telephony on a Linux platform. The engine supports SSML 1.0 to the extent required by conformance with the 2.0, 2.1 VoiceXML Recommendations for its VoiceXML Interpreter, and its integrated MRCP V2 stack for interfacing 3rd party TTS engines.

Loquendo TTS engine and Loquendo Speech Suite support the complete SSML 1.0 Recommendation.

The Microsoft Speech Server supports SSML for text-to-speech synthesis. The Microsoft Speech Application SDK is a suite of tools for authoring applications for voice-only telephony and multimodal applications. SSML is supported as the default SALT TTS format in Microsoft's add-in to Internet Explorer.

OptimSys' VoiceXML platform OptimTalk provides means for integration with SSML compliant speech synthesis engines. In addition, OptimTalk supports using SSML even with speech synthesis engines that do not have native SSML support, e.g. Microsoft Speech API 5.1 compliant engines. The SSML support might be limited to some extent by the abilities of a particular engine in this case. Several speech engines have been preintegrated.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Speech Synthesis Markup Language (SSML) 1.0 specification. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.
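
As an illustration of the kind of SSML 1.0 document these engines accept, the sketch below builds one with Python's standard library. It is a minimal example under stated assumptions: the function name `build_ssml`, its parameters, and the sample text are hypothetical, and only the `speak`, `break` and `emphasis` elements are used.

```python
import xml.etree.ElementTree as ET

# Namespace of SSML 1.0 and the qualified name of the xml:lang attribute.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(first, second, pause="500ms", lang="en-US"):
    """Return an SSML string: plain text, a pause, then emphasized text."""
    # Serialize SSML elements without a prefix (default namespace).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    speak.text = first
    # <break time="..."/> asks the synthesizer to insert a pause.
    ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": pause})
    emphasis = ET.SubElement(speak, f"{{{SSML_NS}}}emphasis")
    emphasis.text = second
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your call is important to us.", "Please hold."))
```

How faithfully the pause and emphasis are rendered depends on the synthesis engine, as the OptimTalk entry above notes for engines without native SSML support.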

Semantic Interpretation for Speech Recognition (SISR)

Interact Incorporated's SPOT SIP Engine is a software platform that provides a "toolbox" for developers, integrators, VARs and hosted providers to create interactive voice applications integrated with VoIP telephony on a Linux platform. The engine supports SISR 1.0 to the extent required by conformance with the 2.0, 2.1 VoiceXML Recommendations for its VoiceXML Interpreter, implementation of DTMF grammars, and its integrated MRCP V2 stack for interfacing 3rd party ASR engines.

The Loquendo ASR engine and Loquendo Speech Suite support the Semantic Interpretation for Speech Recognition (SISR 1.0) within SRGS 1.0 grammars (see above) based on the SISR 1.0 Recommendation.

The Microsoft Speech Server implements semantic interpretation within SRGS grammars (see above) based upon the working draft SI specification.

NuGram from Nu Echo is a complete solution for authoring, debugging, deploying, and maintaining speech recognition grammars based on W3C's Speech Recognition Grammar Specification (SRGS) 1.0 and Semantic Interpretation for Speech Recognition (SISR) 1.0. It comprises a full-featured development environment based on Eclipse and a Java-servlet based infrastructure to deploy dynamic grammars.

OptimSys' VoiceXML platform OptimTalk contains an SISR-compliant semantic interpreter. It can be used together with OptimSys' SRGS implementation or it can be used separately for semantic interpretation of speech recognition results provided by other grammar systems.

Voice Browser Call Control (CCXML)

Interact Incorporated's SPOT SIP Engine is a software platform that provides a "toolbox" for developers, integrators, VARs and hosted providers to create interactive voice applications integrated with VoIP telephony on a Linux platform. The engine provides a fully compliant 2.0 and 2.1 VoiceXML interpreter coupled with a fully compliant CCXML 1.0 interpreter managing a SIP call control interface to the external telephony world (VoIP or TDM, the latter via a media gateway). Interact Inc. was a key participant in the Candidate Recommendation validation process and submitted an Implementation Report.

Loquendo's VoxNauta Platform fully supports the W3C Call Control XML (CCXML) specification, which provides advanced call control functionality with the ease and power of XML. The current implementation is based on the CCXML 1.0 Last Call Working Draft.

OptimSys' VoiceXML platform OptimTalk includes a modular, flexible and highly scalable implementation of the CCXML specification. The CCXML interpreter is independent of the telephony hardware and protocols, and can be integrated easily with your existing telephony infrastructure using the OptimTalk SDK.

Phonologies' Oktopous™ v1.2 ccXML Interpreter PIK is a comprehensive Linux based ccXML tool kit that conforms to the April 2010 Working Draft spec of CCXML 1.0 published by W3C. It is designed for portability, targeting high-density telephony platforms as an OEM offering for vendors and system integrators looking to implement CCXML functionality in their product offerings.
Phonologies' Open Source Oktopous™ v1.1 ccXML Interpreter PIK was originally developed by Phonologies and has been adopted by more telephony platforms than any other open source ccXML Browser.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Call Control XML (CCXML) specification which provides advanced call control functionality with the ease and power of XML. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

State Chart XML (SCXML): State Machine Notation for Control Abstraction

IBM has an open source implementation of SCXML that it has donated to the Apache Community. The Commons SCXML codebase is maintained and updated regularly to match the SCXML specification. Feedback on the Apache Commons SCXML is welcome; developers as well as users can post bugs, comments and suggestions using Apache's Commons mailing lists.

To allow people to get a feeling for what State Chart XML (SCXML) is all about, and to elicit feedback in order that we can improve upon it, Torbjörn Lager has implemented SCXML Web Laboratory - a web interface to a prototype implementation of SCXML in the Oz programming language.

Nokia Qt Software is using SCXML in its Qt state-machine framework. Qt (http://www.qtsoftware.com) is a leading cross-platform application and UI framework. The State Machine Framework lets you define and run state machines in C++, and it includes classes to load SCXML files. The Qt State Machine Framework is also used as the backbone of the Qt Animation Framework.
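
To give a feeling for the kind of document such implementations load, the sketch below reads a small SCXML file and steps through it. It is a deliberately simplified illustration and does not reflect how Qt, Commons SCXML or the SCXML Web Laboratory work internally. The state and event names are hypothetical, and the interpreter covers only flat states with exact event-name matching (no nested or parallel states, conditions, data model or executable content).

```python
import xml.etree.ElementTree as ET

# Namespace of SCXML documents.
SCXML_NS = "{http://www.w3.org/2005/07/scxml}"

# Hypothetical state chart for a phone line with three flat states.
DOC = """
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <transition event="call.incoming" target="ringing"/>
  </state>
  <state id="ringing">
    <transition event="call.answered" target="connected"/>
    <transition event="call.hangup" target="idle"/>
  </state>
  <state id="connected">
    <transition event="call.hangup" target="idle"/>
  </state>
</scxml>
"""


def load_table(source):
    """Return (initial state id, {(state id, event): target state id})."""
    root = ET.fromstring(source)
    table = {}
    for state in root.findall(f"{SCXML_NS}state"):
        for transition in state.findall(f"{SCXML_NS}transition"):
            key = (state.get("id"), transition.get("event"))
            table[key] = transition.get("target")
    return root.get("initial"), table


def run(source, events):
    """Feed events in order and return the id of the final state."""
    current, table = load_table(source)
    for event in events:
        # Events with no matching transition are ignored in this sketch.
        current = table.get((current, event), current)
    return current


if __name__ == "__main__":
    print(run(DOC, ["call.incoming", "call.answered"]))
```

A conforming SCXML processor does considerably more, including hierarchical states and event queues; the table lookup above only shows how states, events and targets relate in the markup.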

Authoring Tools

PHP Voice is an open source library for generating VoiceXML applications via the PHP server-side scripting language.
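
The general idea of generating VoiceXML from a server-side script can be sketched as follows. This is not the PHP Voice API; it is a minimal Python illustration that uses only the standard library, and the function name `build_greeting` and the form id `greeting` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting(prompt_text):
    """Return a minimal VoiceXML 2.0 document that speaks one prompt."""
    # Serialize VoiceXML elements without a prefix (default namespace).
    ET.register_namespace("", VXML_NS)
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form", {"id": "greeting"})
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    # ElementTree escapes characters such as & and < when serializing.
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting("Welcome to the voice portal."))
```

Building the document through an XML library rather than string concatenation keeps dynamic prompt text correctly escaped, which matters when the text comes from a database or user input.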

W3C Team Contacts

Kazuyuki Ashimura <ashimura@w3.org> - Voice Browser Activity Lead

Copyright © 1995-2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements. This page was last updated on $Date: 2014/02/11 05:47:55 $