Voice Browser Working Group Charter

This charter is written in accordance with the W3C Process, section 4.2.2 (Working Group and Interest Group Charters).

$Date: 2002/09/20 13:48:21 $Author: dsr $

Table of Contents

  1. Mission Statement
  2. Scope
  3. Deliverables
  4. Duration
  5. Success Criteria
  6. Release Policy
  7. Milestones
  8. Confidentiality
  9. Relationship with other W3C Activities
  10. Coordination with External Groups
  11. Communication Mechanisms
  12. Voting Mechanisms
  13. Participation
  14. Intellectual Property

1. Mission Statement

— Voice enabling the Web!

Executive Summary

The Voice Browser Working Group was originally chartered in February 1999 with the goal of extending the Web to support access from any telephone to suitably designed applications. Users would be able to use their voice for input and their ears to listen to recorded and synthetic speech, music and other sounds. The Working Group is now being rechartered on a royalty-free basis under the terms of W3C's Current Patent Practice; see Section 14 of this charter for details. This follows the 13 June 2002 Director's Decision (Members only) on the results of the Voice Browser Patent Advisory Group. The Working Group will focus on driving VoiceXML and associated specifications through to Recommendation status, and on continuing work on new features based upon extensive industry experience with interactive voice response systems. Scope and deliverables for the Working Group are identified in Sections 2 and 3.


Far more people today have access to a telephone than to a computer with an Internet connection. In addition, sales of mobile phones are booming, so many of us already have, or soon will have, a phone within reach wherever we go. Voice Browsers offer the promise of allowing everyone to access Web-based applications from any phone, making it practical to access the Web anytime and anywhere, whether at home, on the move, or at work.

It is common for companies to offer services over the phone via menus traversed using the phone's keypad. Voice Browsers are a great fit for the next generation of call centers, which will become Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet. Users will be able to choose whether to respond with a key press or a spoken command. Voice interaction holds the promise of naturalistic dialogs with Web-based applications using speech synthesis, pre-recorded audio, and speech recognition. Voice interaction can also escape the physical limitations of keypads and displays as mobile devices become ever smaller.

By switching to markup and Web-based technologies, it becomes much cheaper and easier to develop interactive voice response applications. Users dial into voice browsers that in turn download VoiceXML and other resources from Web servers. Information supplied by authors can increase the robustness of speech recognition and the quality of speech synthesis. Text-to-speech can be combined with pre-recorded audio material, which can be used to enliven the user experience in a similar manner to the use of images in visual content. The lessons learned in designing for accessibility can be applied to the broader voice browsing marketplace, making it practical to author content that is also accessible to people using Braille-based browsers, even if they are unable to hear or see.

W3C held a workshop on "Voice Browsers" in October 1998. The workshop brought together people involved in developing voice browsers for accessing Web-based services, and concluded that the time was ripe for W3C to bring together interested parties to collaborate on the development of joint specifications for voice browsers. In response, W3C set up the "Voice Browser" Working Group. The Working Group is now being rechartered on a royalty-free basis to drive the existing work through to Recommendation status, and to develop support for new features based upon extensive industry experience.

What has been done already?

The Working Group started by developing a suite of requirements and followed up with work on the corresponding specifications.

VoiceXML 2.0
A dialog language for interactive voice response applications. VoiceXML entered last call on 24 April 2002.
Speech Synthesis Markup Language (SSML)
A language for controlling speech synthesis engines, also used for prompts in VoiceXML. SSML is about to re-enter last call following revisions arising out of the initial last call review.
Speech Recognition Grammar Specification
A language for describing speech grammars for use by speech recognition engines. This became a Candidate Recommendation on 26 June 2002.
Semantic Interpretation
Annotations to speech recognition grammars for computing information to return to an application.
Call Control
A language designed to provide telephony call control support for VoiceXML or other dialog systems.

The Working Group suspended work on several areas to free up teleconference time for VoiceXML. Work currently suspended includes stochastic grammars (N-Grams), the pronunciation lexicon, and voice browser interoperation. In addition, the natural language semantics markup language (NLSML) specification has been transferred to the Multimodal Interaction Activity. The Voice Browser Working Group is cooperating with the CSS Working Group to develop a replacement for the CSS2 aural properties. This is expected to result in two modules for CSS3: one for speech synthesis, based upon SSML, and another for adding aural effects to visual Web pages.

2. Scope

The Voice Browser Working Group is tasked with the development of specifications covering the following goals:

  1. A dialog language for interactive voice response applications with support for speech recognition and touch-tone (DTMF) input, and audio and synthetic speech for output.
  2. A means for describing speech grammars for use in guiding speech recognition.
  3. A means for annotating speech grammars as a basis for returning information as the result of speech recognition.
  4. A means for controlling speech synthesis engines.
  5. A means for fine-grained control of telephony resources for voice browsers.
  6. A means for voice browsers and other call sites to cooperate by sharing data to create a seamless caller experience.

The Working Group is free to prioritize these goals as appropriate, and to drop individual goals, for example if there is insufficient interest or there are not enough resources to meet them within the timeframe set out in Section 7.

The Working Group is expected to cooperate with other W3C Working Groups (see Section 9). The Working Group will also serve as a coordination body with existing industry Groups working on related specifications, and will provide a pool of experts on voice browsers, some of whom will participate in the other W3C Working Groups relevant to voice browsers.

3. Deliverables

This Section describes an initial set of deliverables for achieving the goals stated in Section 2. At the discretion of the Chair, the Working Group can adapt this set as needed during the course of its work. However, all deliverables must fall within the scope of this charter, and sufficient resources to address them need to be available within the Working Group.

The Voice Browser Working Group is expected to advance the following specifications along the W3C Recommendation track. The milestones in Section 7 show estimated dates for progressing the high priority items. Low priority items may be dropped if the resources for working on them are not realised:

High priority items:

Low priority items:

In parallel with work on VoiceXML 2.0, the Working Group is expected to start work on the next version of VoiceXML, drawing upon public comment and extensive industry experience with earlier versions. This dual-track approach is essential to maintaining the flow of innovation.

4. Duration

This Working Group is scheduled to last for slightly more than two years, from September 25th, 2002 to December 31st, 2004.

5. Success Criteria

The Working Group will have fulfilled its mission if it succeeds in developing W3C Recommendations covering the goals stated in Section 2.

6. Release Policy

By default, all documents under development by the Working Group are available to W3C Members from the Working Group's Web page. Selected documents will be made publicly available via W3C's technical reports page after approval from W3C management. The types of documents (Notes, Working Drafts, etc.) are defined by the W3C Process.

Documents must have at least one editor and one or more contributors. Documents should have a date by which they will be declared stable. Any remaining issues at this date will be described in the document to avoid delaying its wider release.

7. Milestones

This is a provisional list of milestones for the deliverables identified in Section 3, and is subject to change. The Voice Browser Working Group will be tasked with maintaining publicly accessible information describing the documents under development and the schedule for their standardization. The table below uses the following abbreviations: Q for Quarter, WD for Working Draft, LCWD for Last Call Working Draft, CR for Candidate Recommendation, PR for Proposed Recommendation, and REC for Recommendation.

High Priority Work Items
Date     VoiceXML 2.0   SRGML   SSML   Semantic Interpretation
2002Q3   -              -       -      WD2
2003Q1   -              -       LCWD   LCWD
2003Q2   CR             CR      -      -
2003Q3   PR             PR      CR     CR
2003Q4   REC            REC     PR     PR
2004Q1   -              -       REC    REC

A Note describing the goals for future versions of dialog markup might be released in 2003 Q2 or Q3. A first Working Draft might follow in 2004 Q1 (after VoiceXML 2.0 reaches Candidate Recommendation status), with a goal of reaching Last Call by the end of 2004. Work on N-gram grammars and the pronunciation lexicon might resume in 2003, after the grammar specification has reached Recommendation. Work on Voice Browser Interoperation will be put on a slow track.

8. Confidentiality

Access to email discussions and to documents developed by the Working Group will be limited to W3C Members and Invited Experts until released for publication by joint agreement of the Working Group and the W3C management team. Working Group members are required to honor the confidentiality of the Group's discussions and working documents until such time as the work is publicly released. Invited experts are bound by the W3C Invited Expert and Collaborators Agreement. Participants working for W3C Member organizations are bound by their contract with W3C.

9. Relationship with other W3C Activities

The Voice Browser Working Group will need to take into account technologies developed by other Groups within W3C, to advise them about the requirements for Voice Browsers, and to ask them to review specifications prepared by the Working Group, including proposals for extensions to existing or future Web standards. At the time of writing, the following ongoing W3C Activities are concerned (listed in alphabetical order):

10. Coordination with External Groups

The following is a list of Groups that are known or presumed to be working on, or interested in, standards relating to voice browsers, with pointers to the respective projects. The W3C Voice Browser Working Group will need to liaise with these Groups.

3rd Generation Partnership Project (3GPP)
3GPP is studying different ways to include speech-enabled services, comprising both speech-only and multimodal services, in 3G networks. One option for distributed speech recognition is based on ETSI's STQ Aurora developments. Other options depend on the general study on speech-enabled services. 3GPP may be interested in working on integrating remote access to speech synthesis resources, and W3C should keep a watching brief. There is a possible connection to proposals (e.g. MRCP) for the IETF to develop protocols for accessing remote speech synthesis and speech recognition resources.
Daisy Consortium
Publishes talking books for people with visual impairments.
DARPA Communicator program
The program carries out research on the next generation of intelligent conversational interfaces to distributed information. The goal is to support the creation of speech-enabled interfaces that scale gracefully across modalities, from speech-only to interfaces that include graphics, maps, pointing and gesture.
ECMA
ECMA TC32-TG11 is developing a standardised interface for computer-telephony integration (CSTA). This work is potentially related to W3C's work on call control.
Enterprise Computer Telephony Forum
ECTF works to remove obstacles to interoperability for computer telephony systems. Its specifications address voice mail, unified messaging, media gateways, voice-activated services and more. See the ECTF Solutions FAQ.
European Telecommunications Standards Institute (ETSI)
A non-profit organization whose mission is "to determine and produce the telecommunications standards that will be used for decades to come". ETSI's work is complementary to W3C's. Of particular note is ETSI STQ Aurora work on Distributed Speech Recognition, and ETSI DES/HF-00021, a standard spoken vocabulary for command, control and editing.
Internet Engineering Task Force (IETF)
The IETF Speech Services Control (speechsc) Working Group is developing protocols to support distributed media processing of audio streams. The focus of the Working Group is to develop protocols to support speech recognition, speech synthesis and speaker verification, and it expects to take advantage of W3C's work on SSML and SRGML. The IETF SIP Working Group is developing the Session Initiation Protocol (SIP). The W3C Voice Browser Working Group should ensure that its specifications work smoothly with SIP.
International Telecommunication Union (ITU)
The International Telecommunication Union's Study Group 16 is working on distributed speech recognition and verification.
Java Community Process: JSR-113
The Java Community Process provides for the development of standards for Java-based APIs. JSR-113 is chartered to work on the Java Speech API for the Java 2 Platform, Micro Edition (J2ME). This is potentially related to W3C's work on speech synthesis and speech recognition.
National Library Service for the Blind and Physically Handicapped & NISO Digital Talking Book Committee
Concerned with standards relating to players for digital talking books.
Open Mobile Alliance
The Open Mobile Alliance aims to grow the market for the entire mobile industry by removing the barriers to global user adoption and by ensuring seamless application interoperability while allowing businesses to compete through innovation and differentiation. Mobile phones are particularly important to voice applications.
SALT Forum
The SALT Forum was launched on 15th October 2001 with a mission to develop standards for speech enabling HTML and XHTML. The announcement states their intention to submit specifications to a standards body during 2002. The SALT specification has been contributed to the W3C Voice Browser and Multimodal Interaction Working Groups, and the Working Group consensus process will determine which ideas in SALT will be taken up. W3C Members can view the contribution letter.
SIP Forum
The SIP Forum is a non-profit association whose mission is to promote awareness and provide information about the benefits and capabilities that are enabled by SIP. The increasing importance of SIP to telephony makes it appropriate for the Voice Browser Working Group to ensure that its specifications can be used together with SIP.
VoiceXML Forum
The VoiceXML Forum is an industry organization providing educational, marketing, and conformance testing services for VoiceXML. The Forum originally developed VoiceXML, but the specification is now maintained by W3C. Both organizations have signed a memorandum of understanding setting out the goals of both parties.

11. Communication Mechanisms

11.1 Email

The archived member-only mailing list w3c-voice-wg@w3.org is the primary means of discussion within the Group.

Certain topics need coordination with external Groups. The Chair and the Working Group can agree to discuss these topics on a public mailing list. The archived mailing list www-voice@w3.org is used for public discussion of W3C proposals for Voice Browsers, and Working Group members are encouraged to subscribe. As a precaution against spam, you must be subscribed in order to send a message to the list. To subscribe, send a message with the word subscribe in the subject line to www-voice-request@w3.org.
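For readers who script their mail, the subscription request above can be sketched with Python's standard email package. This is a minimal sketch, assuming only what the charter states (the request address and the word subscribe in the Subject line); the helper name build_subscribe_request and the sender address are hypothetical and used for illustration only.

```python
from email.message import EmailMessage

# Request address given in the charter for the public www-voice list.
LIST_REQUEST_ADDRESS = "www-voice-request@w3.org"


def build_subscribe_request(sender: str) -> EmailMessage:
    """Build a subscription request for www-voice@w3.org (hypothetical helper).

    The charter only specifies the word "subscribe" in the Subject line,
    so no message body is added here.
    """
    msg = EmailMessage()
    # Use the address you intend to post from: only subscribers may send to the list.
    msg["From"] = sender
    msg["To"] = LIST_REQUEST_ADDRESS
    msg["Subject"] = "subscribe"
    return msg


if __name__ == "__main__":
    # Hypothetical sender address, for illustration only.
    request = build_subscribe_request("jane.doe@example.org")
    print(request["To"], request["Subject"])
    # Actually sending it is left to the reader, for example via
    # smtplib.SMTP(host).send_message(request) with your own mail server.
```

The message is only constructed, not sent, so the sketch makes no assumptions about the reader's outgoing mail server.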

11.2 Phone

The Working Group meets by phone on Tuesdays and Thursdays. The exact details, dates and times are published in advance on the Working Group page. Additional phone conferences may be scheduled as necessary on specific topics.

11.3 Meetings

Face-to-face meetings will be arranged 3 to 4 times a year. Meeting details are made available on the W3C Member Calendar and from the Working Group page. The Chair is responsible for providing publicly accessible summaries of Working Group face-to-face meetings, which will be announced on www-voice@w3.org.

11.4 Public Web pages

The Voice Browser Activity will maintain public pages on the W3C website to describe the status of work and pointers to the Working Group, charter, Activity statement, and email archives.

12. Voting Mechanisms

The Group works by consensus. In the event of failure to achieve consensus, the Chair may resort to a vote as described in the Process Document. Each Member company that has at least one Group member in good standing may vote, with one vote per W3C Member company. Votes are held by email to allow all participants a chance to vote; there is a two-week voting period followed by a period of two working days for the announcement of the result. W3C staff and invited experts do not vote; however, in the event of a tie the Chair has a casting vote. If the issue is resolved by consensus during the voting period, the vote is cancelled.

Note: the term good standing is defined in the W3C Process.

13. Participation

by W3C Team

The W3C staff contact and Activity Lead will be Dave Raggett (W3C Fellow on assignment from Openwave Systems). Additional W3C Team resources will be required for some of the deliverables, should the conditions for starting these deliverables be met.

by W3C Members

Requirements for meeting attendance and timely response are described in the Process document. Participation (meetings, reviewing, and writing drafts) is expected to take about one day per week for the lifetime of the Group. Working Group participants are required not to disclose information obtained during participation until that information is publicly available.

W3C Members may also offer to review one or more Working Drafts from the Group for clarity, consistency, technical merit, fitness for purpose and conformance with other W3C specifications. The only participation requirement is to provide the review comments by the agreed-to date.

by invited experts

As decided on a case by case basis, invited experts may attend a single meeting or a series; they may in some cases be subscribed to the Group mailing list. For the duration of their participation, invited experts are encouraged to adopt the same requirements for meeting attendance and timely response as are required of W3C Members. Invited experts are subject to the same requirement for information disclosure as are required of W3C Members.

by W3C Team

The W3C team will be responsible for the mailing lists, public and Working Group pages, for the posting of meeting minutes, and for liaison with the W3C communications staff for the publication of Working Drafts. W3C team members are expected to adopt the same requirements for meeting attendance, timely response and information disclosure as are required of W3C Members. The W3C staff contact will be expected to devote 40% of his time to this Activity.

14. Intellectual Property

W3C promotes an open working environment. Whenever possible, technical decisions should be made unencumbered by intellectual property right (IPR) claims.

This is a Royalty Free Working Group, as described in W3C's Current Patent Practice, see also the Director's decision of 13th June 2002 (W3C Members only).

Working Group participants disclose patent claims by sending email to <patent-issues@w3.org>; please see Current Patent Practice for more information about disclosures.

Director's decision on Voice Browser PAG Recommendation

The Director's Decision on the Voice Browser PAG Recommendation makes provision for work on RAND extensions to an RF core specification:

The Working Group will be rechartered as a royalty-free Working Group as defined by the Current Patent Practice note (CPP). The core specifications ought to enable basic interoperability for voice browser applications across the Web, but might not include certain advanced or specialized features over which participants hold patents that they will not currently make available RF as defined in the CPP.
Should any part of a specification be removed from the core version because it is not available on a royalty-free basis, and should the WG decide to continue to work on this part, a PAG should be formed that could recommend issuing the particular part as a RAND specification coming out of W3C, or another organization.