World Wide Web Consortium Issues SSML 1.0 as a W3C Recommendation


Published: 8 September 2004

High-Quality Synthesized Speech Bolsters Speech Interface Framework

http://www.w3.org/ -- 8 September 2004 -- Strengthening the voice of the Web, the World Wide Web Consortium (W3C) has published the Speech Synthesis Markup Language (SSML) 1.0 as a W3C Recommendation. SSML 1.0, a fundamental specification in the W3C Speech Interface Framework, elevates the role of high-quality synthesized speech in Web interactions. Application designers for mobile phones, personal digital assistants (PDAs), and a host of emerging technologies use SSML 1.0 to achieve both coarse- and fine-grained control of important aspects of speech synthesis, including pronunciation, volume, and pitch. Like its companion W3C Recommendations, VoiceXML 2.0 and the Speech Recognition Grammar Specification (SRGS), both published by the W3C Voice Browser Working Group, SSML 1.0 is built for integration with other Web technologies and to promote interoperability across different synthesis-capable platforms.

"I am excited about the progress the Voice Browser Working Group has made in providing improved access to services over the telephone through the use of Web technologies," said W3C Director Tim Berners-Lee, who will be delivering a keynote address at the SpeechTEK Conference next week. He added, "Companies can now offer Web access to their customers via the telephone as well as from a personal computer."

Aimed at the world's estimated two billion fixed-line and mobile phones, W3C's Speech Interface Framework, a collection of specifications for building voice applications for the Web, will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services through keypads and spoken commands, and to listen to pre-recorded speech, synthetic speech, and music.

A World Wide Web Consortium (W3C) Recommendation is understood by industry and the Web community at large as a Web standard. Each Recommendation is a stable specification developed by a W3C Working Group and reviewed by the W3C Membership. Recommendations promote interoperability of Web technologies by explicitly conveying the industry consensus formed by the Working Group.

A Rich Vocabulary for High-Quality Speech

One of the primary challenges SSML addresses in strengthening the voice of the Web is pronunciation. For example, how do you pronounce "1/2"? The SSML 1.0 specification uses this simple example to illustrate some of the challenges of turning general-purpose text into meaningful synthesized speech. Without additional context, one would not know whether to say "one half", "January second", "February first", or "one divided by two". SSML 1.0 constructs help eliminate this sort of ambiguity. The SSML vocabulary allows word-level, phoneme-level, and even waveform-level control of the output to satisfy a wide spectrum of application scenarios and authoring requirements.
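As a minimal sketch of how markup can settle the "1/2" question, the Python below (standard library only) wraps the token in a standalone SSML 1.0 document. The `speak`, `sub`, and `say-as` elements, the required `version` and `xml:lang` attributes, and the namespace `http://www.w3.org/2001/10/synthesis` come from the Recommendation. SSML 1.0 itself does not fix the values of the `say-as` attributes, so `interpret-as="date"` with `format="md"` or `"dm"` is an assumption that a given synthesis processor may or may not honor. The helper name `speak_half` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the SSML 1.0 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", SSML_NS)  # write SSML as the default namespace


def _q(tag: str) -> str:
    """Qualify a local element name with the SSML namespace."""
    return f"{{{SSML_NS}}}{tag}"


def speak_half(reading: str, lang: str = "en-US") -> str:
    """Hypothetical helper: mark up "1/2" so it is spoken as `reading`."""
    root = ET.Element(_q("speak"), {"version": "1.0", XML_LANG: lang})
    if reading in ("one half", "one divided by two"):
        # <sub> gives the exact words to speak in place of the written text.
        el = ET.SubElement(root, _q("sub"), {"alias": reading})
    elif reading in ("January second", "February first"):
        # Assumed say-as values: "md" = month/day, "dm" = day/month.
        # Processor support for these values varies.
        fmt = "md" if reading == "January second" else "dm"
        el = ET.SubElement(root, _q("say-as"), {"interpret-as": "date", "format": fmt})
    else:
        raise ValueError(f"unknown reading: {reading!r}")
    el.text = "1/2"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for r in ("one half", "January second", "February first", "one divided by two"):
        print(speak_half(r))
```

Using `sub` for the two arithmetic readings sidesteps any dependence on processor-specific `say-as` vocabularies, since the alias spells out the words directly.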

"SSML builds on the work of the pioneers in speech synthesis to provide application developers with a powerful and flexible means to deliver a high quality mix of synthetic and pre-recorded speech as part of interactive voice response services," said Dave Raggett, Activity Lead for W3C's work on voice browsers, and a W3C Fellow from Canon. He added, "SSML allows VoiceXML-based services to be accessed via textphones for people with speaking or hearing impairments. In addition, SSML has great promise beyond its use with VoiceXML, as we look forward to emerging standards for multimodal interaction."

Like XHTML, SSML is a markup language based on the widely deployed XML standard. SSML content can stand alone or be included in other XML content in order to improve rendering as synthesized speech. Naturally, SSML is particularly well-suited for use with a VoiceXML wrapper when building an interactive voice response application.
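To illustrate content that can either stand alone or sit inside other XML, the sketch below takes a standalone SSML document and moves its children into a VoiceXML 2.0 `<prompt>`. It assumes, following the examples in the VoiceXML 2.0 Recommendation, that SSML elements such as `prosody` and `break` are written inside a prompt in the VoiceXML namespace `http://www.w3.org/2001/vxml`. The function names and the sample sentence are hypothetical, and the sketch does no schema validation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
VXML_NS = "http://www.w3.org/2001/vxml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", VXML_NS)  # write VoiceXML as the default namespace

# Hypothetical standalone SSML 1.0 document (speak carries version and xml:lang).
STANDALONE = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
Your payment is <prosody volume="loud" pitch="high">overdue</prosody>.<break time="500ms"/>Goodbye.
</speak>"""


def _rehome(el: ET.Element) -> ET.Element:
    """Copy an SSML element and its subtree into the VoiceXML namespace."""
    local = el.tag.split("}", 1)[-1]
    copy = ET.Element(f"{{{VXML_NS}}}{local}", dict(el.attrib))
    copy.text, copy.tail = el.text, el.tail
    for child in el:
        copy.append(_rehome(child))
    return copy


def ssml_to_vxml_prompt(ssml_text: str) -> str:
    """Hypothetical helper: wrap standalone SSML in a minimal VoiceXML 2.0 form."""
    speak = ET.fromstring(ssml_text)
    if speak.tag != f"{{{SSML_NS}}}speak":
        raise ValueError("not a standalone SSML 1.0 document")
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    if speak.get(XML_LANG):
        vxml.set(XML_LANG, speak.get(XML_LANG))  # carry the language over
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = speak.text  # leading text before the first child element
    for child in speak:
        prompt.append(_rehome(child))
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(ssml_to_vxml_prompt(STANDALONE))
```

The same spoken content is therefore authored once and can be delivered either directly to an SSML processor or as part of an interactive VoiceXML dialog.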

SSML 1.0 is built for Web integration in other ways as well. The Voice Browser Working Group worked closely with other W3C groups to ensure that the design of SSML 1.0 is consistent with principles of accessibility, internationalization, and general Web architecture. Indeed, one important application of SSML involves "text phones" that may be used by people with some hearing disabilities. The same content can also be output as speech through a common telephone. SSML 1.0 is also consistent with previous work at W3C on describing pronunciation with Cascading Style Sheets (CSS). W3C's CSS Working Group is developing a speech module in CSS3 for rendering XML documents with SSML-based speech engines.

Early Industry Adoption

W3C's Voice Browser Working Group has been particularly successful at ensuring adoption of its specifications before they reach Recommendation status. A test suite (discussed in the July 2004 SSML implementation report) has helped ensure consistent behavior and quality among the already numerous implementations of SSML 1.0. Vendors that have already implemented SSML 1.0 and that are participating in the Working Group include: Aspect Communications, France Telecom, Hewlett-Packard, IBM, Loquendo, Microsoft, MITRE, Nuance Communications, SAP, ScanSoft, Sun Microsystems, VoiceGenie Technologies, Voxeo, and Voxpilot.

The Working Group will now focus its energies on the remainder of the Speech Interface Framework. "After VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS), SSML is the third language of the W3C Speech Interface Framework to become a full W3C Recommendation," said Jim Larson, manager, advanced human input/output, for Intel and also co-chair of W3C's Voice Browser Working Group. "We are working to complete work on other languages of the W3C Speech Interface Framework, including VoiceXML 2.1, Semantic Interpretation, and the Call Control eXtensible Markup Language (CCXML)."

The Working Group is among the largest and most active in W3C. Its participants include: Aspect Communications, BeVocal, Brooktrout Technology, Canon, Comverse Technology, Convedia, Electronic Data Systems, France Telecom, Genesys Telecommunications Laboratories, HeyAnita, Hitachi, Hewlett-Packard, IBM, Intel, IWA-HWG, Korea Association of Information and Telecommunication, Loquendo, Microsoft, MITRE, Mitsubishi Electric, Motorola, Nokia, Nuance Communications, Openstream, SAP, ScanSoft, Siemens, Sun Microsystems, Syntellect, Tellme Networks, Verascape, Vocalocity, VoiceGenie Technologies, Voxeo, and Voxpilot.

About the World Wide Web Consortium [W3C]

The W3C was created to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is an international industry consortium jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France and Keio University in Japan. Services provided by the Consortium include: a repository of information about the World Wide Web for developers and users, and various prototype and sample applications to demonstrate use of new technology. To date, nearly 400 organizations are Members of the Consortium. For more information see http://www.w3.org/

Contact Americas, Australia: Karen Myers, <karen@w3.org>, +1.617.253.5884 or +1.978.502.6218
Contact Europe: Marie-Claire Forgue, <mcf@w3.org>, +33.492.38.75.94
Contact Asia: Yasuyuki Hirakawa, <yasuyuki@w3.org>, +81.466.49.1170

Testimonials for W3C's Speech Synthesis Markup Language (SSML) 1.0 Recommendation

EDS

EDS helps business and government clients in 60 countries achieve maximum returns from IT investments. The natural sounding speech enabled by the SSML standard will increase our ability to automate more of our customers' business processes, cut costs and delight our customers' customers. EDS is pleased to be a part of this standards activity that will enable us to add mobility and natural interactions to the information age.

-- Balaji Prasad, EDS Chief Technologist for Automotive Telematics, Electronic Data Systems

Intel

SSML, already used by both VoiceXML 2.0 and SALT to specify verbal prompts in telephone and multimodal applications, will enhance computer-user interactions by enabling computers to speak to users in new and creative applications.

-- Timothy A. Moynihan, Director of Marketing, Modular Communication Platform Division, Intel

IWA/HWG

International Webmasters Association / HTML Writers Guild (IWA/HWG) and VoiceXML Italian User Group, taking part in the Voice Browser Working Group, are glad that Speech Synthesis Markup Language (SSML) has become a W3C Recommendation. This is an important step, not only for voice applications development, but above all for the enrichment of Internet content.

Furthermore, the creation of a standard markup language that allows authors to create more versatile Web content and users to converse with speech engines represents a significant advance in the field of multimodality and accessibility.

-- Roberto Scano, W3C Advisory Committee Representative, IWA/HWG; and
Fabrizio Gramuglio, VoiceXML Italia User Group.

RNIB

The Royal National Institute of the Blind (RNIB) is the United Kingdom's premier agency for blind and partially sighted people. We support SSML since it enables control of text to speech in a standard way, avoiding proprietary mark-up. This enables efficient production of voice alternatives for our customers.

-- Stephen King, Director Technical and Consumer Services, Royal National Institute of the Blind

Loquendo

As a leading player in speech technologies and voice platforms, Loquendo believes that the SSML 1.0 Recommendation is an essential step in completing the Speech Interface Framework. Indeed, it will help promote the speech application market, not only by enabling service providers, content creators, operators and voice portals to deliver a much richer user experience, but also by lowering barriers to Web access for some users with disabilities.

The Loquendo TTS product already fully supports the SSML 1.0 specification, which can be used in 16 languages. Loquendo's high-quality, high-performance technologies and platforms power over 2,000,000 calls every day in the telecommunications and enterprise markets throughout the world.

Loquendo is very pleased to contribute to the development of this specification, and will continue to give strong support to W3C Voice Browser and Multimodal Interaction Working Groups.

-- Daniele Sereno, Vice President Product Engineering, Loquendo

ScanSoft

ScanSoft is pleased to have been an active participant in the development and proposal of Speech Synthesis Markup Language (SSML) specification, as we are committed to the advancement of open standards, and our solutions are uniquely optimized to support these standards. We congratulate the W3C Voice Browser Working Group on reaching the Recommendation milestone for Speech Synthesis Markup Language (SSML), and applaud the organization's efforts, as we look to work together to continue to advance the proliferation of standards-based speech technology into the future of the speech business.

-- Peter Mahoney, vice president of worldwide marketing, SpeechWorks, a Division of ScanSoft

SUN

Sun Microsystems congratulates the Voice Browser Working Group on the announcement of SSML as a W3C Recommendation. Sun is pleased to see that our initial contribution of the Java Speech Markup Language (JSML) has served as the basis for this W3C Recommendation. Sun supports this Recommendation and commends the W3C Voice Browser Working Group for its efforts in developing and bringing this specification to Recommendation status.

-- Glenn Edens, Senior vice president, Director Sun Labs

UIUC

As one of the original developers of the SABLE speech synthesis markup language, and one of the early participants in the W3C's discussions on SSML, I am very pleased at the release of SSML 1.0 as a W3C Recommendation. SSML 1.0 is an important landmark in the standardization of voice interfaces to the Web.

-- Professor Richard Sproat, Department of Linguistics and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Vocalocity

SSML has an important position in an evolving family of W3C standards that are changing the way telephony applications are built. Where SRGS defined acceptable speech input, SSML eases the process of generating natural sounding speech output. As evidenced by the rapid adoption of VoiceXML, the industry is embracing standards for benefits including more innovation, lower costs, and greater flexibility. The final Recommendation of SSML is another key milestone in the advancement of open telephony platforms.

-- Jeff Haynie, CTO, Vocalocity

VoiceXML Forum

We are pleased that SSML 1.0 has joined VoiceXML 2.0 as a W3C Recommendation. We anticipate that the high-quality text-to-speech capabilities enabled by SSML will increase the adoption of powerful VoiceXML-based applications. The combination of the open standards that comprise the W3C Speech Interface Framework will unlock the true value of Web content.

-- Bruce Pollock, Chairman, VoiceXML Forum

Voxpilot

Voxpilot is thrilled to see SSML 1.0 reach W3C Recommendation and is proud to have contributed to the efforts of the W3C Voice Browser Working Group during the specification's development and testing. SSML offers a flexible, Web-based and open standard paradigm for controlling speech synthesis resources, thereby enabling the creation of high quality speech interfaces based on dynamic information sources. Voxpilot offers a complete SSML Processor as a component of its Open Media Platform for carriers and enterprises, which supports integrations with all the leading TTS engine vendors and includes its own optimised streaming engine for rendering Web-based audio resources.

-- Dr. Dave Burke, CTO, Voxpilot Ltd.
