"Voice Browser" Activity

Applying Web technology to enable users to access services from their telephone via a combination of speech and DTMF.

News

Answers to some frequently asked questions and a list of implementations can be found at the end of this page.

Introduction

The telephone was invented in the 1870s, and continues to be a very important means for us to communicate with each other. The Web by comparison is very recent, but has rapidly become a competing communications channel. The convergence of telecommunications and the Web is now bringing the benefits of Web technology to the telephone, enabling Web developers to create applications that can be accessed via any telephone, and allowing people to interact with these applications via speech and telephone keypads. The W3C Speech Interface Framework is a suite of markup specifications aimed at realizing this goal. It covers voice dialogs, speech synthesis, speech recognition, telephony call control for voice browsers and other requirements for interactive voice response applications, including use by people with hearing or speaking impairments.

Some possible applications include:

Current Situation

The Voice Browser Working Group was first established on 26 March 1999 following a Workshop held the previous October. It was subsequently rechartered on 25 September 2002, and has now been rechartered through 31 January 2007 to continue its work on maintaining and enhancing the W3C Speech Interface Framework suite of specifications. It operates under the terms of the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. The Working Group is co-chaired by Jim Larson and Scott McGlashan. The W3C Team Contacts are Max Froumentin and Kazuyuki Ashimura.

We want to hear from you!

We are very interested in your comments on our published documents and suggestions for improvements and future work. To subscribe to the discussion list send an email to www-voice-request@w3.org with the word subscribe in the subject header. Previous discussion can be found in the public archive. To unsubscribe send an email to www-voice-request@w3.org with the word unsubscribe in the subject header.

How to join the Working Group

If your organization is already a member of W3C, ask your W3C Advisory Committee Representative (member only link) to fill out the online registration form to confirm that your organization is prepared to commit the time and expense involved in participating in the group. You will be expected to attend all Working Group meetings (about 3 or 4 times a year) and to respond in a timely fashion to email requests. Further details about joining are available on the Working Group (member only link) page. Requirements for patent disclosures, as well as terms and conditions for licensing essential IPR are given in the W3C Patent Policy.

More information about the W3C is available, as is information about joining W3C.

Work Under Development

This is intended to give you a brief summary of each of the major work items under development by the Voice Browser Working Group. The suite of specifications is known as the W3C Speech Interface Framework.

Translations of some of the documents below are available. See the W3C Translations page.

VoiceXML

VoiceXML 2.0 is based upon extensive industry experience. It is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. For an introduction, there is a tutorial. Further tutorials and other resources can be found on the VoiceXML Forum Web site. W3C and VoiceXML Forum have signed a memorandum of understanding setting out mutual goals.
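
As a rough illustration of the kind of audio dialog described above, the following sketch embeds a minimal VoiceXML 2.0 document in a Python string and checks that it is well-formed XML. The dialog itself, the field name drink and the grammar file drink.grxml are hypothetical and chosen only for this example; a real application would be served to a voice browser rather than parsed locally.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: one form with a single field that accepts
# spoken input matching an external grammar (drink.grxml is made up).
VXML_DOC = f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="{VXML_NS}">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee, tea, or milk?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def field_names(vxml_text: str) -> list:
    """Return the names of all <field> elements in a VoiceXML document."""
    root = ET.fromstring(vxml_text.encode("utf-8"))
    return [field.get("name") for field in root.iter(f"{{{VXML_NS}}}field")]


print(field_names(VXML_DOC))
```

The noinput and nomatch handlers show how a dialog can recover when the caller stays silent or says something outside the grammar.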

VoiceXML 2.1 provides a small set of additional features. These will help developers to build even more powerful, maintainable and portable voice-activated services, with complete backwards compatibility with the VoiceXML 2.0 specification.

VoiceXML 3.0 is the next major release of VoiceXML. Its purpose is to provide powerful dialog capabilities that can be used to build advanced speech applications, and to provide these capabilities in a form that can be easily and cleanly integrated with other W3C languages. It will provide enhancements to existing dialog and media control, as well as major new features (e.g. modularization, a cleaner separation between data/flow/dialog, and asynchronous external eventing) to facilitate interoperation with external applications and media components.

Speech Recognition Grammar Specification (SRGS)

The Speech Recognition Grammar Specification (SRGS) covers both speech and DTMF input. DTMF is valuable in noisy conditions or when the social context makes it awkward to speak. Grammars can be specified in either an XML or an equivalent augmented BNF (ABNF) syntax, which some authors may find easier to deal with. Speech recognition is an inherently uncertain process. Some speech engines may be able to ignore extraneous "um's" and "aah's" and be able to perform partial matches. Recognizers may report confidence values. If the utterance has several possible parses, the recognizer may be able to report the most likely alternatives (N-best results).
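
To make the two syntaxes and the idea of N-best results more concrete, here is a small sketch. It holds the same hypothetical three-word grammar in the XML form and in the ABNF form, extracts the alternatives from the XML form, and then picks the most confident in-grammar hypothesis from an N-best list. The list of (text, confidence) pairs and the 0.5 threshold are assumptions made for this example; actual recognizers report results in their own formats.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar in the XML form.
SRGS_XML = f"""<grammar version="1.0" xmlns="{SRGS_NS}"
         xml:lang="en-US" mode="voice" root="drink">
  <rule id="drink" scope="public">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>milk</item>
    </one-of>
  </rule>
</grammar>"""

# The same grammar in the ABNF form (shown for comparison, not parsed here).
SRGS_ABNF = """#ABNF 1.0;
language en-US;
mode voice;
root $drink;
public $drink = coffee | tea | milk;
"""


def alternatives(grammar_xml):
    """List the text of every <item> in the XML grammar."""
    root = ET.fromstring(grammar_xml)
    return [item.text.strip() for item in root.iter(f"{{{SRGS_NS}}}item")]


def pick(nbest, allowed, threshold=0.5):
    """Return the most confident in-grammar hypothesis, or None."""
    candidates = [(conf, text) for text, conf in nbest
                  if text in allowed and conf >= threshold]
    return max(candidates)[1] if candidates else None


# Assumed N-best list: (hypothesis, confidence) pairs.
nbest = [("toffee", 0.62), ("coffee", 0.58), ("tea", 0.10)]
print(pick(nbest, alternatives(SRGS_XML)))
```

Here the top hypothesis is rejected because it is not in the grammar, so the second alternative is used instead.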

Stochastic language models are used with open-ended prompts (e.g., "How can I help?") where context-free grammars would be unwieldy. N-Gram models cover the likelihood that a given word will occur after certain other words. Such models are widely used for dictation systems, but can also be combined with word spotting rules that determine how to route a help desk call etc. The current draft N-Gram specification was published on 3 January 2001. The resumption of work on stochastic language models will depend upon progress in other higher priority areas.
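
The idea behind an N-Gram model can be sketched with a bigram (N=2) model estimated from a few sentences. This is only an illustration of the underlying statistics, not the data format defined in the draft N-Gram specification; the tiny corpus and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def probability(counts, prev, word):
    """Estimate P(word | prev) by relative frequency (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


# Hypothetical help desk utterances.
corpus = [
    "i need help with my bill",
    "i need a new phone",
    "i want to pay my bill",
]
model = train_bigrams(corpus)
print(probability(model, "i", "need"))
print(probability(model, "my", "bill"))
```

A production model would be trained on far more data and would use smoothing so that unseen word pairs do not receive a probability of zero.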

Speech Synthesis Markup Language (SSML)

The speech synthesis specification (SSML) defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation.
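
The features listed above can be seen together in a short SSML document. The sketch below is illustrative only: the audio file welcome.wav, the wording of the prompt and the phonetic string are assumptions, and an actual synthesis engine may support only some voices or attribute values.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical prompt combining prerecorded audio, voice selection,
# prosody control, emphasis, and a pronunciation override.
SSML_DOC = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <audio src="welcome.wav">Welcome to the weather service.</audio>
  <voice gender="female" age="30">
    <prosody rate="slow" volume="loud" pitch="high">
      Today will be <emphasis level="strong">very</emphasis> windy.
    </prosody>
    I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.
  </voice>
</speak>"""


def element_names(ssml_text):
    """Return the local names of all elements, in document order."""
    root = ET.fromstring(ssml_text)
    return [el.tag.split("}")[1] for el in root.iter()]


print(element_names(SSML_DOC))
```

The text inside the audio element is a fallback that would be spoken if the recording cannot be played.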

There is currently interest in support for a greater range of languages, greater expressivity for more natural sounding speech as well as other ideas for a next generation version of SSML. Work on this is unlikely to get underway during 2005 while the Working Group focuses on higher priorities.

The Voice Browser Working Group has been collaborating with the CSS Working Group to develop a CSS3 module for speech synthesis based upon SSML for use in rendering XML documents to speech. This is intended to replace the aural cascading style sheet properties in CSS2. The current Working Draft was published on 16 December 2004 and will be updated to track progress on the SSML say-as element.

SSML 1.1 provides a small set of additional features to make SSML more useful in current and emerging markets especially for non-English languages.

Pronunciation Lexicon Specification (PLS)

Pronunciation Lexicons describe phonetic information for use in speech recognition and synthesis. The requirements were first published on 12 March 2001, and updated on 29 October 2004. The pronunciation lexicon is designed to enable developers to provide supplemental information on pronunciation for things like place names, proper names and abbreviations.
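
A lexicon of this kind might look roughly like the sketch below, which maps an abbreviation to its spoken expansion and a word to a phonetic transcription. The element names and namespace follow the PLS syntax as later published and should be treated as assumptions here, since the text above only refers to the requirements; the entries themselves are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and element names (lexicon, lexeme, grapheme,
# alias, phoneme); check them against the PLS version you target.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

PLS_DOC = f"""<lexicon version="1.0" xmlns="{PLS_NS}"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(text):
    """Map each written form to its alias or phonetic transcription."""
    def q(name):
        return f"{{{PLS_NS}}}{name}"

    entries = {}
    for lexeme in ET.fromstring(text).iter(q("lexeme")):
        grapheme = lexeme.findtext(q("grapheme"))
        entries[grapheme] = (lexeme.findtext(q("alias"))
                             or lexeme.findtext(q("phoneme")))
    return entries


print(load_lexicon(PLS_DOC)["W3C"])
```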

Semantic Interpretation for Speech Recognition (SISR)

The semantic interpretation specification describes annotations to grammar rules for extracting the semantic results from recognition. The annotations are expressed in a syntax based upon a subset of ECMAScript, and when evaluated, yield a result represented either as XML or as a value that can be held in an ECMAScript variable. A target for the XML output is Extensible Multimodal Annotation Markup Language (EMMA) which is being developed in the W3C Multimodal Interaction Activity.
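
The effect of such annotations can be sketched in Python, which stands in here for the ECMAScript-based annotation syntax. Each spoken alternative is paired with the value its annotation would produce, and the result is shown both as a structured value and as XML. The vocabulary, the property names and the result element are hypothetical; this is neither the SISR tag syntax nor EMMA.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations: spoken word -> semantic value.
COUNT = {"one": 1, "two": 2, "three": 3}
SIZE = {"small": "S", "medium": "M", "large": "L"}


def interpret(utterance):
    """Map e.g. 'two large' to a structured result, or None if not covered."""
    words = utterance.lower().split()
    if len(words) != 2 or words[0] not in COUNT or words[1] not in SIZE:
        return None
    return {"quantity": COUNT[words[0]], "size": SIZE[words[1]]}


def to_xml(result):
    """Serialize the structured result as a simple XML fragment."""
    root = ET.Element("result")
    for key, value in result.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(interpret("two large"))
print(to_xml(interpret("two large")))
```

The application then works with the extracted values (a quantity of 2 and a size code) rather than with the raw words the caller spoke.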

Call Control (CCXML)

W3C is designing the CCXML markup language to enable fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform. CCXML's scope is controlling resources in a platform on the network edge (not building network-based call processing applications in a telephone switching system or controlling an entire telecom network). Application developers will be able to use CCXML to perform call screening, whisper call waiting, and call transfer. CCXML enables applications to offer users the ability to place outbound calls, conditionally answer calls, and to initiate or receive outbound communications such as another call.
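
CCXML documents are event driven: the platform delivers events about a connection and the document decides what to do. The following Python sketch mimics that style for simple call screening. It is an analogy rather than CCXML syntax; the event dictionaries, the blocked number and the returned action names are assumptions made for the example, although the event names are modelled on those used in the CCXML drafts.

```python
# Hypothetical screening list of caller numbers to turn away.
BLOCKED = {"+15550100"}


def on_event(event):
    """Return the action a call control script might take for an event."""
    name = event["name"]
    if name == "connection.alerting":
        # Conditionally answer: reject screened callers, accept the rest.
        return "reject" if event.get("caller") in BLOCKED else "accept"
    if name == "connection.connected":
        # Once the call is answered, hand over to a voice dialog.
        return "start_dialog"
    if name == "connection.disconnected":
        return "exit"
    return "ignore"


print(on_event({"name": "connection.alerting", "caller": "+15550100"}))
print(on_event({"name": "connection.alerting", "caller": "+15550123"}))
```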

State Chart XML (SCXML): State Machine Notation for Control Abstraction

SCXML is a candidate for the control language within VoiceXML 3.0 (currently under development by the Voice Browser working group), CCXML 2.0 (anticipated development in 2006 by the Voice Browser working group), and the multimodal authoring language (under development by the Multimodal Interaction working group).
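
The notion of a state machine acting as a control layer can be illustrated with a plain transition table. The sketch below is not SCXML markup; the state names and event names are invented, and it omits features such as nested or parallel states that a state chart notation would normally provide.

```python
# Hypothetical dialog flow: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "call.incoming"): "greeting",
    ("greeting", "prompt.done"): "collecting",
    ("collecting", "input.filled"): "confirming",
    ("collecting", "input.nomatch"): "collecting",
    ("confirming", "user.yes"): "done",
    ("confirming", "user.no"): "collecting",
}


def run(events, state="idle"):
    """Feed events to the machine; unknown events leave the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


events = ["call.incoming", "prompt.done", "input.filled",
          "user.no", "input.filled", "user.yes"]
print(run(events))
```

Keeping the flow in a table like this separates the control logic from the prompts and grammars used in each state.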

The Data-Flow-Presentation framework

The Working Group has made available a document presenting the DFP framework, which explains how Voice Browser specifications can be used together to create modular voice applications.

Frequently asked questions

Far more people today have access to a telephone than have access to a computer with an Internet connection. In addition, sales of cell phones are booming, so that many of us already have, or soon will have, a phone within reach wherever we go. Voice Browsers offer the promise of allowing everyone to access Web based services from any phone, making it practical to access the Web any time and anywhere, whether at home, on the move, or at work.

It is common for companies to offer services over the phone via menus traversed using the phone's keypad. Voice Browsers offer a great fit for the next generation of call centers, which will become Voice Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet. Users will be able to choose whether to respond by a key press or a spoken command. Voice interaction holds the promise of natural dialogs with Web-based services.

Voice browsers allow people to access the Web using speech synthesis, pre-recorded audio, and speech recognition. This can be supplemented by keypads and small displays. Voice may also be offered as an adjunct to conventional desktop browsers with high resolution graphical displays, providing an accessible alternative to using the keyboard or screen, for instance in automobiles where hands/eyes free operation is essential. Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller.

Hitherto, speech recognition and spoken language technologies have had for the most part to be handcrafted into applications. The Web offers the potential to vastly expand the opportunities for voice-based applications. The Web page provides the means to scope the dialog with the user, limiting interaction to navigating the page, traversing links and filling in forms. In some cases, this may involve the transformation of Web content into formats better suited to the needs of voice browsing. In others, it may prove effective to author content directly for voice browsers.

Information supplied by authors can increase the robustness of speech recognition and the quality of speech synthesis. Text to speech can be combined with pre-recorded audio material in a manner analogous to the use of images in visual media, drawing upon experience with radio broadcasting. The lessons learned in designing for accessibility can be applied to the broader voice browsing marketplace, making it practical to deliver services to a wide range of platforms.

Q. What are the greatest improvements that VoiceXML 2.0 offers?

A. VoiceXML 2.0 brings the advantages of Web-based development and content delivery to interactive voice response applications. VoiceXML controls the dialog between the application and the user. It is downloaded from HTTP servers in the same way as HTML. This means that application developers can take full advantage of widely deployed and industry proven Web technologies.

Q. While applications for VoiceXML 2.0 have been cited for customer service use, what other groups of users could benefit from VoiceXML 2.0 (e.g. disabled persons, emergency workers, military, transportation workers, manufacturing workers)?

A. VoiceXML 2.0 is being applied to a wide range of applications, for instance, call centers, government offices and agencies, banks and financial services, utilities, healthcare, retail sales, travel and transportation, and many more. VoiceXML has tremendous potential for improved accessibility for a wide range of services for people with visual impairments, and via text phones, for people with speaking and/or hearing impairments.

Q. What/when is the next milestone for VoiceXML 2.0 now that it has been accepted as a W3C recommendation?

A. VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS) 1.0 are the first two components in the W3C Speech Interface Framework to reach W3C Recommendation status. SRGS allows applications to specify the words and phrases that users are prompted to speak. This enables robust speaker independent recognition. The next in line is SSML, the speech synthesis markup language that is used with VoiceXML to prompt users and to provide answers to questions. After that we have Semantic Interpretation for Speech Recognition which provides the means for developers to extract application data from the results of speech recognition, and CCXML, an XML telephony call control language for VoiceXML.

Q. Where is VoiceXML 2.0 taking us in man/machine relationships?

A. VoiceXML offers a demonstrable improvement in the user experience when people call up companies on the phone, compared with waiting for ages for a human customer service representative to become free, or having to put up with the "press one for this, press two for that" style of interaction with the previous generation of interactive voice response services.

Q. What makes your organization so interested in VoiceXML 2.0?

A. W3C's mission is to fulfil the potential of the Web. With an estimated 2 billion fixed line and mobile phones world-wide, VoiceXML will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services via key pads, spoken commands, listening to pre-recorded speech, synthetic speech and music.

Q. What are the new features in VoiceXML 2.1?

A. VoiceXML 2.1 defines a small set of widely implemented additional features. These include using computed expressions for referencing grammars and scripts, detecting where barge-in occurred within a prompt, prompting more conveniently for dynamic lists of values, downloading data without having to move to the next page, recording the user's speech during recognition for later analysis, passing data with a disconnect, and enhanced control over transfer.
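
Three of these features can be sketched in one small document: a computed grammar reference, a data fetch that does not leave the current page, and a prompt over a dynamic list. The URLs, variable names and list contents below are hypothetical, and the Python code only checks which elements rely on the new features.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical VoiceXML 2.1 page. In practice the list of titles would
# be derived from the fetched data rather than written inline.
VXML21_DOC = f"""<vxml version="2.1" xmlns="{VXML_NS}">
  <form id="movies">
    <var name="cinema" expr="'downtown'"/>
    <var name="titles" expr="['Alpha', 'Beta']"/>
    <data name="listing" srcexpr="'/listing?cinema=' + cinema"/>
    <block>
      <prompt>
        Now showing:
        <foreach item="title" array="titles">
          <value expr="title"/>
        </foreach>
      </prompt>
    </block>
    <field name="choice">
      <grammar srcexpr="'/grammars/' + cinema + '.grxml'"
               type="application/srgs+xml"/>
      <prompt>Which movie would you like?</prompt>
    </field>
  </form>
</vxml>"""


def new_feature_elements(text):
    """Names of elements that use data, foreach, or a srcexpr attribute."""
    root = ET.fromstring(text)
    found = set()
    for el in root.iter():
        local = el.tag.split("}")[1]
        if local in {"data", "foreach"} or "srcexpr" in el.attrib:
            found.add(local)
    return sorted(found)


print(new_feature_elements(VXML21_DOC))
```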

Q. Why not just use HTML instead of inventing a new language for voice-enabled web applications?

A. HTML was originally designed as a visual language with emphasis on visual layout and appearance. Voice differs from visual interfaces in that spoken prompts are transient: you hear them and they are gone. Furthermore, speech recognition is not 100% accurate. As a result, telephony voice interfaces are much more dialog oriented, with emphasis on verbal presentation and response, so a new markup language was especially designed for speech dialogs. However, now that XHTML is evolving beyond the visual focus, it may be possible to combine speech markup with XHTML.

Q. What is the W3C Speech Interface Framework?

A. A collection of interrelated languages for developing speech applications being developed by the Voice Browser Working Group. Currently the W3C Speech Interface Framework includes

Q. What is the purpose of the mailing list www-voice@w3.org?

A. Anyone may use www-voice@w3.org to comment on any of the W3C Speech Interface Framework language specifications. Because VoiceXML 2.0, SRGS and SSML have passed the last call working draft status, comments on these languages will apply to the next version of these languages. This mailing list should not be used to ask for help in developing speech applications. To subscribe send an email to www-voice-request@w3.org with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).

Q. What is the status of the Speech Application Language Tags (SALT) proposal from the SALT Forum?

A. The Voice Browser Working Group will soon begin to specify the requirements for the next version of VoiceXML. The Working Group will consider the SALT contribution as it develops the requirements and specifications for the next version.

Q. What is the status of the XHTML + Voice Profiles submission from IBM, Motorola, and Opera?

A. An updated version of XHTML + Voice Profiles (X+V) was contributed to the Working Group on 11 March 2003. It incorporates implementation feedback and is synchronized with the VoiceXML 2.0 Candidate Recommendation; see the Team Comment for details of associated IPR disclosures. The Voice Browser Working Group will soon begin to specify the requirements for the next version of VoiceXML. The Working Group will consider the XHTML + Voice Profiles contribution as it develops the requirements and specifications for the next version.

Q. In what other environments can the languages of the Speech Interface Framework be used?

A. SRGS can be used in systems where user input needs to be mapped into a constrained set of words. SRGS is already being used in speech recognition systems, for example the SALT specification uses SRGS as its speech grammar format. It has also been applied to handwriting recognition systems and could conceivably be used for key strokes. VoiceXML is already being used inside XHTML in XHTML+Voice for developing multimodal user interfaces. For people with speaking or hearing impairments, VoiceXML applications can in principle be used with text phones as well as with regular telephones.

The Semantic Interpretation language can be used with SRGS to further generate semantic values from the input. Similarly, SSML is a general mechanism for marking up text for speech output that can be used in a variety of contexts. Again, it is used in SALT interpreters, and it clearly has value for browsers that support the reading of text in non-dialog scenarios (e.g. screen readers). SSML could also be used by any speech synthesis applications, including book reading, e-mail readers, etc.

Q. What is the status of the Natural Language Semantics Markup Language?

A. Work in this area has been transferred to the Multimodal Interaction Working Group, which has begun work on the successor of the Natural Language Semantics Markup Language, called Extensible Multimodal Annotation Markup Language (EMMA).

Q. What is the status of work on N-Grams, Pronunciation Lexicon and Voice Browser Interoperation Requirements?

A. The Voice Browser Working Group has suspended work in these areas in order to concentrate on bringing the remaining languages in the W3C Speech Interface Framework to W3C Recommendation status.

Q. Where can I find more information about the W3C Speech Interface Framework languages?

A. You can access the latest published specification on this web site. If you are implementing a speech application using the W3C Speech Interface Framework languages, refer to the documentation provided by the vendor of your VoiceXML 2.0 browser. There are several books available for learning about the W3C Speech Interface Framework languages, including the following:

Q. Is there a conformance test for languages in the W3C Speech Interface Framework?

A. The VoiceXML Forum has stated its intention to develop plans for a conformance-testing program for VoiceXML 2.0.

Q. Are there validation tests for the Speech Synthesis Recognition Specification?

A. Yes, the Voice Browser Working Group has created a suite of validation tests as part of the Implementation Report plan for the Candidate Recommendation phase.

Q. Are there IP issues associated with VoiceXML 2.0?

A. Every member of the Voice Browser Working Group is required to make a disclosure statement regarding its IP claims relevant to essential technology for Voice Browsers. There are no known impediments to royalty free implementations of the W3C Speech Interface Framework. For more information, please view the summary of the patent disclosures.

Q. How will patent policy issues affect future work on VoiceXML?

A. The Voice Browser working group was rechartered in September 2002 as a royalty free group under the terms of W3C's Current Patent Practice Note, with the goal of producing a W3C Recommendation for VoiceXML and related specifications that can be implemented and distributed without the need to pay any royalties.

W3C does not take a position regarding the validity or scope of any intellectual property right or other rights that might be claimed to pertain to the implementation or use of the technology, nor the extent to which any license under such rights might or might not be available. Copyright of Working Group deliverables is vested in the W3C.

Q. How can I find out about VoiceXML related talks, conferences and seminars?

A. See the Speech Technology magazine website, and the VoiceXML Forum website. See also Ken Rehor's World of VoiceXML.

Q. Who has implemented interpreters for the W3C Speech Interface specifications?

A. To be listed below, the implementation must be working and available for use by developers. If you would like to be mentioned please contact the Team contact (see bottom). Entries should be relevant, factual and brief (one short paragraph).

W3C Speech Interface Implementations

VoiceXML 2.0 and 2.1

The BeVocal Café is a web-based VoiceXML development environment providing a carrier-grade VoiceXML 2.0 (and 1.0) interpreter, and the tools necessary to debug, and test usability of applications. The BeVocal VoiceXML Interpreter features support of the latest W3C Working Draft including many enhancements such as Speaker Verification, Voice Enrollment, XML data, pre-tuned grammars and professional audio.

Conita have implemented a VoiceXML interpreter based upon Open VXI (see below for more details).

Eloquant has implemented a VoiceXML 2.0 interpreter from the W3C specifications. Supporting full semantic interpretation based upon the current W3C Semantic Interpretation for Speech Recognition drafts, this VoiceXML interpreter is the base for Eloquant's VoiceXML hosting infrastructure which services vocal applications in Europe.

As an early promoter sponsor of the VoiceXML Forum, HeyAnita's technology fully supports most of the current voice standards including VoiceXML 2.0. HeyAnita offers its FreeSpeech Gateway Server, a carrier-grade media server, for the development and in-premise or hosted deployment of voice applications. In addition to supporting the leading operating systems, it allows companies to utilize the ASR and TTS software and telephony hardware of their choice.

HP offers OpenCall Media platform, a carrier-grade VoiceXML 2.0 platform on HP-Unix and Linux with support for ISUP/SIP/ISDN telephony protocols, MRCP TTS/ASR (e.g. Nuance and SpeechWorks), and CCXML 1.0. A free PC-based VoiceXML SDK is available for application development. A voice portal framework, OpenCall Speech Web, is also available for efficient management and deployment of large-scale voice portals. For further information visit HP VoiceXML Developer Resources.

IBM has a variety of products and tools for the development, debugging, and deployment of VoiceXML applications. IBM's VoiceXML products incorporate many VBWG specifications, including VoiceXML 2.0 and SRGS 1.0.

Intervoice's Omvia Media Server includes a VoiceXML 2.0 browser. Intervoice offers its Omvia Media Server in either a customer premise or managed service option. Intervoice also offers InVision Studio, a full featured design, development and debugging tool for the creation of VoiceXML applications. The Omvia Media Server supports multiple ASR and TTS engines in a highly scalable and robust solution.

JVoiceXML is an Open Source VoiceXML interpreter for JAVA supporting JAVA APIs such as JSAPI and JTAPI. JVoiceXML is an implementation of VoiceXML 2.0, the Voice Extensible Markup Language specified at http://www.w3.org/TR/voicexml20/. The major goal is to have a platform independent implementation that can be used for free.

Loquendo has developed a VoiceXML Interpreter that manages both VoiceXML 1.0 and 2.0 documents. The interpreter is currently integrated in Loquendo's Platform Solutions and used in a vast range of voice services, including very large-scale ones. Furthermore, Loquendo Café provides developers with resources and tools to learn about creating speech applications, as well as enabling them to run their VoiceXML application on a platform and listen to the service they've created over the phone - in a variety of languages.

Lucent Technologies' MiLife™ VoiceXML Gateway provides telephone access to voice-enabled web services. Implemented on the Enhanced Media Resource Server, it provides a cost-effective, highly reliable solution.

Motorola source licenses the VoxGateway, a carrier-grade VoiceXML 2.0 system which has been integrated into the offerings of a number of voice platform vendors. Contact Jim Ferrans for details.

Nuance offers a VoiceXML platform and graphical VoiceXML development tools, a Voice Site Staging Center for rapid prototyping and testing, and a VoiceXML-based voice browser to developers at no cost.

OpenVXI is a portable open source VoiceXML interpreter available from Carnegie Mellon and developed by SpeechWorks. It may be used free of charge in commercial applications and allows the addition of proprietary modifications if desired. OpenVXI closely follows the VoiceXML 2.0 draft specification.

OptimSys offers a modular, flexible and highly scalable VoiceXML platform OptimTalk which works on leading operating systems. It includes VoiceXML and CCXML compliant interpreters and an SDK for adding VoiceXML and CCXML support to your existing solutions. OptimTalk allows for easy integration with ASR and TTS engines and telephony hardware of your choice. OptimTalk also comes with pre-integrated configurations for deployment of voice applications as well as in-house application development. For this purpose it includes a genuine SRGS and SISR implementation.

PublicVoiceXML is an open source implementation of a complete VoiceXML 2.0 browser. It is designed to work on low cost telephony hardware using DTMF navigation with hooks to 3rd party text to speech and speech recognition modules. Support and sample applications for the mobile world are available. The source code is available on SourceForge and a Linux version will be available soon.

SpeechWorks offers a modular toolkit for system developers, OpenSpeech Browser, to simplify adding complete VoiceXML 2.0 support to new and existing platforms. Its flexible design is the foundation of many commercial VoiceXML solutions. Available for both Windows and Linux operating systems, OpenSpeech Browser includes a VoiceXML interpreter, Internet fetching, data caching and logging components along with integrated speech recognition and TTS engines. Full sources are available for unlimited customization.

Tellme Networks, Inc. answers over one million VoiceXML 2.0 phone calls every day for the Fortune 500 on its carrier-grade network. VoiceXML developers can access Tellme Studio to build their own VoiceXML applications for free.

Vocalocity has implemented the latest specification of VoiceXML 2.0. Vocalocity's platform software is designed specifically for OEM and Channel Partners who provide unique open solutions to their customers. The Vocalocity platform supports multiple telephony, ASR and TTS engines as well as multiple operating systems.

VoiceGenie provides carrier-grade VoiceXML Gateways fully supporting VoiceXML 2.0 plus advanced call control extensions. The platform can be deployed on premises or hosted, available via a range of service providers. VoiceGenie supports both PSTN and SIP simultaneously, and 4 major ASRs (Speechworks, BBN, AT&T Watson and Nuance) and 7 TTS resources (including Speechworks, AT&T Natural Voices, Rhetorical, SVOX and others). VoiceGenie runs a popular developer site where applications can be deployed and tested for free. VoiceGenie also offers carrier-grade CCXML solutions as part of the VoiceGenie 7 product line. Available as a standalone platform or integrated with the VoiceGenie Media Platform, VoiceGenieCCXML implements the latest version of the CCXML specification, and interoperates with VoiceGenie's certified VoiceXML implementation and other standards-compliant solutions.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C VoiceXML 2.0 recommendation and the VoiceXML 2.1 working draft. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Voxpilot offers a free VoiceXML online development and deployment environment, with a multilanguage implementation of VoiceXML 2.0 including local access numbers in all major European markets, provided over a robust VoIP infrastructure.

Speech Recognition Grammar Specification (SRGS)

The SRGS 1.0 Implementation Report includes reports from BeVocal, IBM, Loquendo, Lucent, Microsoft, and SpeechWorks (now part of Scansoft).

IBM's VoiceXML products incorporate many VBWG specifications, including SRGS 1.0.

Both Loquendo ASR and Loquendo Speech Server support SRGS 1.0 grammars (both XML and ABNF formats).

The Microsoft Speech Server supports the SRGS grammar XML format. The Microsoft Speech Application SDK is a suite of tools for authoring speech applications for voice-only telephony and multimodal applications, and contains a Grammar Editor for the building, editing and debugging of SRGS grammars. SRGS is also supported as the SALT grammar format in Microsoft's SALT add-in to Internet Explorer.

OptimSys' VoiceXML platform OptimTalk can work in various desktop configurations for in-house application development. For this purpose it includes a genuine SRGS implementation which is, among other things, able to interface with Microsoft Speech API 5.1 compliant speech recognition engines.

Grammar Studio from Voice Web Solutions is a Java-based WYSIWYG stand-alone GRXML editor desktop software tool for voice web speech developers authoring the W3C Speech Recognition Grammar Format within VoiceXML telephony and multimodal SALT, telephony and X+V multimodal applications which implement SRGS grammars.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Speech Recognition Grammar Specification (SRGS) 1.0 recommendation. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Voxpilot provides an implementation of SRGS (XML form) as an important component of its VoiceXML 2.0 platform.

Speech Synthesis Markup Language (SSML)

There are several internal implementations but few that are currently available to developers.

Loquendo TTS engine and server solutions support the complete SSML 1.0 specification.

The Microsoft Speech Server supports SSML for text-to-speech synthesis. The Microsoft Speech Application SDK is a suite of tools for authoring applications for voice-only telephony and multimodal applications. SSML is supported as the default SALT TTS format in Microsoft's add-in to Internet Explorer.

OptimSys' VoiceXML platform OptimTalk provides means for integration with SSML compliant speech synthesis engines. In addition, OptimTalk supports using SSML even with speech synthesis engines that do not have native SSML support, e.g. Microsoft Speech API 5.1 compliant engines. The SSML support might be limited to some extent by the abilities of a particular engine in this case. Several speech engines have been preintegrated.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Speech Synthesis Markup Language (SSML) 1.0 specification. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Semantic Interpretation for Speech Recognition (SISR)

The Loquendo ASR engine supports the Semantic Interpretation for Speech Recognition (SISR 1.0) within SRGS 1.0 grammars (see above) based on the Last Call Working Draft of SISR.

The Microsoft Speech Server implements semantic interpretation within SRGS grammars (see above) based upon the working draft SI specification.

OptimSys' VoiceXML platform OptimTalk contains an SISR-compliant semantic interpreter. It can be used together with OptimSys' SRGS implementation or it can be used separately for semantic interpretation of speech recognition results provided by other grammar systems.

CCXML

Send us details of your implementations!

OptimSys' VoiceXML platform OptimTalk includes a modular, flexible and highly scalable implementation of the CCXML specification. The CCXML interpreter is independent of the telephony hardware and protocols and allows for its easy integration with your existing telephony infrastructure using OptimTalk SDK.

Phonologies has made available Oktopous™ PIK, an abstract C++ implementation of the W3C Call Control XML (CCXML 1.0) specification, under a BSD-style open source licensing model. Oktopous PIK is independent of the underlying telephony platform and protocols and is best suited for OEM customers / System Integrators looking to implement CCXML functionality in their product offerings. Oktopous PIK complies with the latest last call working draft and is available from the SourceForge download section.

Phonologies has also implemented CCXML (Oktopous), along with VoiceXML (InterpreXer™3.0) and SIP, as part of its IP Media Server, Oktopous Media Server.

Voxeo Corporation offers the VoiceCenter™ IVR platform that fully supports the W3C Call Control XML (CCXML) specification which provides advanced call control functionality with the ease and power of XML. Voxeo offers VoiceCenter IVR™ in both hosted and on premise configurations.

Authoring Tools

PHP Voice is an open source library for generating VoiceXML applications via the PHP server-side scripting language.

W3C Team Contact

Max Froumentin <mf@w3.org> - Voice Activity Lead
Kazuyuki Ashimura <kazuyuki@w3.org> - Team contact

Copyright © 1995-2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements. This page was last updated on 2006/03/22 07:00:48 by ashimura.