World Wide Web Consortium Issues SSML 1.0 as a W3C Recommendation

High-Quality Synthesized Speech Bolsters Speech Interface Framework

Contact Americas, Australia --
Karen Myers, <karen@w3.org>, +1.617.253.5884 or +1.978.502.6218
Contact Europe --
Marie-Claire Forgue, <mcf@w3.org>, +33.492.38.75.94
Contact Asia --
Yasuyuki Hirakawa, <yasuyuki@w3.org>, +81.466.49.1170

(also available in French and Japanese)

Testimonials are also available.

http://www.w3.org/ -- 8 September 2004 -- Strengthening the voice of the Web, the World Wide Web Consortium (W3C) has published the Speech Synthesis Markup Language (SSML) 1.0 as a W3C Recommendation. SSML 1.0, a fundamental specification in the W3C Speech Interface Framework, elevates the role of high-quality synthesized speech in Web interactions. Application designers for mobile phones, personal digital assistants (PDAs), and a host of emerging technologies use SSML 1.0 to achieve both coarse- and fine-grained control of important aspects of speech synthesis, including pronunciation, volume, and pitch. Like its companion W3C Recommendations, VoiceXML 2.0 and the Speech Recognition Grammar Specification (SRGS), both published by the W3C Voice Browser Working Group, SSML 1.0 is built to integrate with other Web technologies and to promote interoperability across different synthesis-capable platforms.

"I am excited about the progress the Voice Browser Working Group has made in providing improved access to services over the telephone through the use of Web technologies," said W3C Director Tim Berners-Lee, who will be delivering a keynote address at the SpeechTEK Conference next week. He added, "Companies can now offer Web access to their customers via the telephone as well as from a personal computer."

Aimed at the world's estimated two billion fixed line and mobile phones, W3C's Speech Interface Framework, a collection of specifications for building voice applications for the Web, will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services, using keypads and spoken commands and listening to pre-recorded speech, synthetic speech, and music.

A World Wide Web Consortium (W3C) Recommendation is understood by industry and the Web community at large as a Web standard. Each Recommendation is a stable specification developed by a W3C Working Group and reviewed by the W3C Membership. Recommendations promote interoperability of Web technologies by explicitly conveying the industry consensus formed by the Working Group.

A Rich Vocabulary for High-Quality Speech

Pronunciation is one of the primary challenges that SSML addresses in strengthening the voice of the Web. For example, how do you pronounce "1/2"? The SSML 1.0 specification uses this simple example to illustrate some of the challenges of turning general-purpose text into meaningful synthesized speech. Without additional context, one would not know whether to say "one half", "January second", "February first", or "one divided by two". SSML 1.0 constructs help eliminate this sort of ambiguity. The SSML vocabulary allows word-level, phoneme-level, and even waveform-level control of the output to satisfy a wide spectrum of application scenarios and authoring requirements.
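The "1/2" ambiguity can be made concrete with a short sketch. The Python below wraps the text in an SSML say-as element so that a synthesizer is told to read it as a date. The say-as element and its interpret-as and format attributes come from SSML 1.0, but the Recommendation does not fix their permitted values; the values "date", "md", and "dm" used here are assumptions, and the helper name say_as_date and the surrounding sentence are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace used by SSML 1.0
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize SSML elements without a prefix (default namespace).
ET.register_namespace("", SSML_NS)


def say_as_date(text: str, fmt: str) -> str:
    """Hypothetical helper: mark `text` as a date with the given field order.

    Assumption: "date" with format "md" (month-day) or "dm" (day-month).
    SSML 1.0 itself does not define the allowed attribute values.
    """
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: "en-US"})
    speak.text = "The meeting is on "
    say_as = ET.SubElement(
        speak, f"{{{SSML_NS}}}say-as", {"interpret-as": "date", "format": fmt}
    )
    say_as.text = text
    say_as.tail = "."
    return ET.tostring(speak, encoding="unicode")


# Same characters, two different intended readings.
january_second = say_as_date("1/2", "md")
february_first = say_as_date("1/2", "dm")
print(january_second)
print(february_first)
```

The text content is identical in both documents; only the markup tells the synthesizer which reading is intended. How a given platform interprets these attribute values should be checked against its own documentation.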

"SSML builds on the work of the pioneers in speech synthesis to provide application developers with a powerful and flexible means to deliver a high quality mix of synthetic and pre-recorded speech as part of interactive voice response services," said Dave Raggett, Activity Lead for W3C's work on voice browsers, and a W3C Fellow from Canon. He added, "SSML allows VoiceXML-based services to be accessed via textphones for people with speaking or hearing impairments. In addition, SSML has great promise beyond its use with VoiceXML, as we look forward to emerging standards for multimodal interaction."

Like XHTML, SSML is a markup language based on the widely deployed XML standard. SSML content can stand alone or be included in other XML content in order to improve rendering as synthesized speech. Naturally, SSML is particularly well-suited for use with a VoiceXML wrapper when building an interactive voice response application.
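As an illustration of stand-alone SSML content, the sketch below holds a small document as a string and uses Python's standard XML parser to confirm that it is well-formed XML and to list the SSML elements it uses. The element and attribute names (speak, prosody, phoneme, audio) come from SSML 1.0; the sentence text, the IPA string, the audio URL, and the helper name local_names are invented for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace used by SSML 1.0

# Illustrative document: the text, IPA transcription, and audio URL are made up.
SSML_DOC = """<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <prosody volume="soft" pitch="low">Welcome back.</prosody>
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>.
  <audio src="http://www.example.com/chime.wav">chime</audio>
</speak>
"""


def local_names(ssml: str) -> list[str]:
    """Parse the document and return the SSML element names in document order.

    ET.fromstring raises ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(ssml)
    prefix = f"{{{SSML_NS}}}"
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


print(local_names(SSML_DOC))
```

This checks well-formedness only. It does not validate the document against the SSML schema, and it does not synthesize any speech.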

SSML 1.0 is built for Web integration in other ways as well. The Voice Browser Working Group worked closely with other W3C groups to ensure that the design of SSML 1.0 is consistent with principles of accessibility, internationalization, and general Web architecture. Indeed, one important application of SSML involves textphones, which may be used by people with hearing disabilities. The same content can also be output as speech through an ordinary telephone. SSML 1.0 is also consistent with previous work at W3C on describing pronunciation with Cascading Style Sheets (CSS). W3C's CSS Working Group is developing a speech module in CSS3 for rendering XML documents with SSML-based speech engines.

Early Industry Adoption

W3C's Voice Browser Working Group has been particularly successful at ensuring adoption of its specifications before they reach Recommendation status. A test suite (discussed in the July 2004 SSML implementation report) has helped ensure consistent behavior and quality among the already numerous implementations of SSML 1.0. Vendors that have already implemented SSML 1.0 and that are participating in the Working Group include: Aspect Communications, France Telecom, Hewlett-Packard, IBM, Loquendo, Microsoft, MITRE, Nuance Communications, SAP, ScanSoft, Sun Microsystems, VoiceGenie Technologies, Voxeo, and Voxpilot.

The Working Group will now focus its energies on the remainder of the Speech Interface Framework. "After VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS), SSML is the third language of the W3C Speech Interface Framework to become a full W3C Recommendation," said Jim Larson, manager, advanced human input/output, for Intel and also co-chair of W3C's Voice Browser Working Group. "We are working to complete work on other languages of the W3C Speech Interface Framework, including VoiceXML 2.1, Semantic Interpretation, and the Call Control eXtensible Markup Language (CCXML)."

The Working Group is among the largest and most active in W3C. Its participants include: Aspect Communications, BeVocal, Brooktrout Technology, Canon, Comverse Technology, Convedia, Electronic Data Systems, France Telecom, Genesys Telecommunications Laboratories, HeyAnita, Hitachi, Hewlett-Packard, IBM, Intel, IWA-HWG, Korea Association of Information and Telecommunication, Loquendo, Microsoft, MITRE, Mitsubishi Electric, Motorola, Nokia, Nuance Communications, Openstream, SAP, ScanSoft, Siemens, Sun Microsystems, Syntellect, Tellme Networks, Verascape, Vocalocity, VoiceGenie Technologies, Voxeo, and Voxpilot.

About the World Wide Web Consortium [W3C]

The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France, and Keio University in Japan. Services provided by the Consortium include a repository of information about the World Wide Web for developers and users, as well as various prototype and sample applications that demonstrate the use of new technology. To date, nearly 400 organizations are Members of the Consortium. For more information see http://www.w3.org/