Voice Browser Working Group Charter

The mission of the Voice Browser Working Group, part of the Voice Browser Activity, is to enable users to speak and listen to Web applications by creating standard languages for developing Web-based speech applications. The Voice Browser Working Group concentrates on languages for capturing and producing speech and managing the dialog between user and computer, while a related Group, the Multimodal Interaction Working Group, concentrates on additional input modes including keyboard and mouse, ink and pen, etc.

Summary Table

End date	31 January 2009
Confidentiality	Proceedings are Member-only, but the group sends regular summaries of ongoing work to the public mailing list.
Initial Chairs	Jim Larson, Scott McGlashan
Initial Team Contacts (FTE %: 100)	Kazuyuki Ashimura, new hire
Usual Meeting Schedule	Teleconferences: weekly; face-to-face meetings: 3 to 4 per year

Background

The telephone was invented in the 1870s and continues to be a very important means for us to communicate with each other. The Web by comparison is very recent, but has rapidly become a competing communications channel. The convergence of telecommunications and the Web is now bringing the benefits of Web technology to the telephone, enabling Web developers to create applications that can be accessed via any telephone, and allowing people to interact with these applications via speech and telephone keypads. The W3C Speech Interface Framework is a suite of markup specifications aimed at realizing this goal. It covers voice dialogs, speech synthesis, speech recognition, telephony call control for voice browsers and other requirements for interactive voice response applications, including use by people with hearing or speaking impairments.

Some possible applications include:

Accessing business information, including the corporate "front desk" asking callers who or what they want, automated telephone ordering services, support desks, order tracking, airline arrival and departure information, cinema and theater booking services, and home banking services.
Accessing public information, including community information such as weather, traffic conditions, school closures, directions and events; local, national and international news; national and international stock market information; and business and e-commerce transactions.
Accessing personal information, including calendars, address and telephone lists, to-do lists, shopping lists, and calorie counters.
Helping users communicate with other people by sending and receiving voice mail and email messages.

Under previous charters, going back to 2000, the Voice Browser Working Group has created the W3C Speech Interface Framework suite of specifications, which includes:

VoiceXML 2.0 Recommendation, 16 March 2004 (press release, testimonials), and VoiceXML 2.1 Last Call Working Draft, 28 July 2004, specify the flow control and exchange of information between users and computers.
Speech Recognition Grammar Specification 1.0 Recommendation, 16 March 2004, specifies the words and phrases which a speech recognition system can convert from speech to text.
Semantic Interpretation for Speech Recognition Last Call Working Draft, 8 November 2004, specifies how text returned from a speech recognition system can be modified and reformatted.
SSML 1.0 Recommendation, 7 September 2004 (press release, testimonials), specifies how a speech synthesis system renders text as human-like speech.
Pronunciation Lexicon 1.0 Last Call Working Draft, 26 October 2006, specifies how words are pronounced. This information is used by the speech synthesis system to render words as human-like speech, and is used by the speech recognition system to convert human speech to text.
CCXML Working Draft, 22 November 2006, specifies how to manage the telephone system (answer incoming calls, initiate outgoing calls, create conference calls, etc.).
State Chart XML (SCXML): State Machine Notation for Control Abstraction Working Draft, 24 January 2006, specifies the dialog flow of a speech or multimodal application. The dialog flow is separate from the capture and rendering of information.

In addition to the above, the Voice Browser Activity has produced the following documents:

SSML 1.0 say-as attribute values Note, 26 May 2005
Updated pronunciation lexicon requirements, 29 October 2004
Voice Browser Interoperation: Requirements, 8 August 2002
Call Control Requirements in a Voice Browser Framework, 13 April 2001
Stochastic Language Models (N-Gram) Specification, 3 January 2001
Introduction and Overview of W3C Speech Interface Framework, 28 November 2000
Dialog Requirements for Voice Markup Languages, 23 December 1999
Grammar Representation Requirements for Voice Markup Languages, 23 December 1999
Speech Synthesis Markup Requirements for Voice Markup Languages, 23 December 1999

Scope

All work items carried out under this Charter must fall within the scope defined by this section.

VoiceXML 2.1

VoiceXML 2.1 extends VoiceXML 2.0 with eight new features. The Group plans to take VoiceXML 2.1 through to Recommendation status.

VoiceXML 3.0

VoiceXML 3.0 is the next major release of VoiceXML. It will provide powerful dialog capabilities for building advanced speech applications, in a form that can be easily and cleanly integrated with other W3C languages. VoiceXML 3.0 will enhance existing dialog and media control and add major new features (e.g., multimedia prompts, VCR controls, speaker identification and verification, modularization, a cleaner separation between data/flow/dialog, and asynchronous external eventing) to facilitate interoperation with external applications and media components. The Group will create multiple profiles of VoiceXML 3.0 that enable subsets of the language to target specific use cases (e.g., handheld computers and cell phones with too few resources for full VoiceXML). The Group plans to continue work on VoiceXML 3.0 and to publish several iterations of the document.

State Chart XML

SCXML 1.0 is a generic XML control language based on Harel State Charts. Although SCXML was designed as a control language for VoiceXML 3.0 and for Multimodal Interaction dialog management, it may also be used to control other types of applications. The Group plans to take SCXML 1.0 through to Recommendation status.
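SCXML documents describe states and the event-driven transitions between them. As an illustration only, the following Python sketch reads a small, hypothetical SCXML document and steps it through a sequence of events. It handles only flat state, transition and final elements with exact event-name matching; the nested and parallel states that Harel State Charts add, SCXML's prefix-based event matching, and all executable content are deliberately left out, so this is not an SCXML processor. The state and event names (idle, call.incoming, and so on) and the helper functions are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical example document using the SCXML namespace.
SCXML_DOC = """
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <transition event="call.incoming" target="answering"/>
  </state>
  <state id="answering">
    <transition event="call.connected" target="dialog"/>
    <transition event="call.failed" target="idle"/>
  </state>
  <state id="dialog">
    <transition event="dialog.exit" target="done"/>
  </state>
  <final id="done"/>
</scxml>
"""

NS = {"sc": "http://www.w3.org/2005/07/scxml"}


def load_flat_chart(text):
    """Return (initial_state, transitions, final_states) for a flat chart.

    transitions maps (source_state, event_name) -> target_state.
    """
    root = ET.fromstring(text.strip())
    transitions = {}
    for state in root.findall("sc:state", NS):
        for t in state.findall("sc:transition", NS):
            transitions[(state.get("id"), t.get("event"))] = t.get("target")
    finals = {f.get("id") for f in root.findall("sc:final", NS)}
    return root.get("initial"), transitions, finals


def run(text, events):
    """Feed events in order and return the list of states visited.

    Events with no matching transition in the current state are ignored,
    and processing stops once a final state is reached.
    """
    current, transitions, finals = load_flat_chart(text)
    trace = [current]
    for event in events:
        if current in finals:
            break
        target = transitions.get((current, event))
        if target is not None:
            current = target
            trace.append(current)
    return trace
```

For example, run(SCXML_DOC, ["call.incoming", "call.connected", "dialog.exit"]) returns the trace ['idle', 'answering', 'dialog', 'done']. Keeping the control flow in a separate document like this, rather than in the dialog markup itself, is the kind of separation of flow from dialog that the charter describes for VoiceXML 3.0.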

Speech synthesis

SSML 1.1 enhances SSML 1.0 to better support widely spoken East Asian, Indian and Middle Eastern languages in a manner that also improves its usefulness for other languages. It also updates SSML 1.0 to be more consistent with PLS, SISR and the expected functionality of VoiceXML 3.0. The Group plans to take SSML 1.1 through to Recommendation status. The Group may begin work on SSML 2.0, which will restructure SSML 1.1, enhance the <say-as> element and the role attribute, and possibly provide additional enhancements (for example, emotion elements).

Speech recognition grammars

This area covers context-free grammars and statistical models of speech, together with DTMF input. SRGS 1.0, which defines context-free grammars, is already a full Recommendation. The Group may resume work on the N-Gram (statistical) model of speech.

Pronunciation Lexicon

Pronunciation Lexicon Specification (PLS 1.0) provides the basis for describing pronunciation information for use in speech recognition and synthesis, for example to tune applications for proper names that have irregular pronunciations. The Group plans to take PLS 1.0 through to Recommendation status. The Group may enhance the role attribute, possibly with a registry.

Semantic interpretation for speech recognition

SISR 1.0 describes annotations to grammar rules for extracting semantic results from recognition, either as XML or as a value that can be held in an ECMAScript variable. The target for the XML output is EMMA (Extensible Multimodal Annotation Markup Language), which is being developed in the W3C Multimodal Interaction Activity.

Telephony call control for voice browsers (CCXML 1.0)

CCXML 1.0 is an XML language for controlling connections, conferences, and dialogs in a Voice Browser context. The Group plans to take CCXML 1.0 through to Recommendation status and may consider enhancing it.

Maintenance work

The Working Group will maintain its existing and forthcoming Recommendations: VoiceXML 2.0, VoiceXML 2.1, SRGS 1.0, SSML 1.1, SISR 1.0, PLS 1.0, SCXML 1.0, and CCXML 1.0. Maintenance consists of responding to questions and requests on the public mailing list, issuing errata as needed, and possibly publishing minor updates to the specifications.

Success Criteria

For each document to advance to Proposed Recommendation, the group will typically produce an implementation report showing two independent and interoperable implementations of each feature.
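Read literally, this criterion is a counting rule over implementation results: every feature needs passing results from two implementations. The following Python sketch applies that rule to a hypothetical report format of (implementation, feature, passed) rows. The row format, the feature names and the implementation names are assumptions for illustration, and the sketch cannot judge whether two implementations are truly independent; that judgment remains with the Group.

```python
# Hypothetical implementation-report rows: (implementation, feature, passed).
# Distinct implementation names are assumed to be independent implementations.
REPORT = [
    ("ImplA", "prompt", True),
    ("ImplB", "prompt", True),
    ("ImplA", "transfer", True),
    ("ImplB", "transfer", False),
]


def implementations_per_feature(rows):
    """Map each feature to the set of implementations that passed its tests."""
    passing = {}
    for impl, feature, passed in rows:
        impls = passing.setdefault(feature, set())  # record every feature seen
        if passed:
            impls.add(impl)
    return passing


def lacking_features(rows, required=2):
    """Return, sorted, the features with fewer than `required` passing implementations."""
    counts = implementations_per_feature(rows)
    return sorted(f for f, impls in counts.items() if len(impls) < required)
```

With the sample rows, lacking_features(REPORT) returns ['transfer'], because only one implementation passed that feature's tests; an empty result would mean every listed feature meets the two-implementation threshold.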

Deliverables

The following documents are expected to become W3C Recommendations:

Voice Browser Call Control: CCXML Version 1.0 (currently WD)
Pronunciation Lexicon Specification (PLS) Version 1.0 (currently Last Call)
State Chart XML (SCXML): State Machine Notation for Control Abstraction (currently WD)
Semantic Interpretation for Speech Recognition (SISR) Version 1.0 (currently Last Call)
Speech Synthesis Markup Language (SSML) Version 1.1 (expected to be published as WD in 1Q of 2007)
Voice Extensible Markup Language (VoiceXML) 2.1 (currently Last Call)
Voice Extensible Markup Language (VoiceXML) 3.0 (expected to be published as WD in 3Q of 2007)

The following documents are either notes or are not expected to advance toward Recommendation:

Pronunciation Lexicon Specification (PLS) Version 1.0 Requirements (currently WD)
Speech Synthesis Markup Language Version 1.1 Requirements (WD)
CSS3 Speech Module (WD)

The following documents may be revised depending upon the interest of working group members:

Stochastic Language Models (N-Gram) Specification (WD; may be revisited and advanced depending upon interest of working group members)
SSML 1.0 say-as attribute values Note, 26 May 2005

Milestones

This Working Group is chartered to last until 31 January 2009. The first face-to-face meeting after re-chartering will be held in May or June 2007.

Here is a list of milestones identified at the time of re-chartering. Others may be added later at the discretion of the Working Group. The dates are for guidance only and subject to change.

Document	Requirements	First Public Working Draft	Last Call Working Draft	Candidate Recommendation	Proposed Recommendation	Recommendation
CCXML 1.0	Completed	Completed	1Q2007	2Q2007	3Q2007	3Q2007
PLS 1.0	Completed	Completed	Completed	2Q2007	3Q2007	4Q2007
SISR 1.0	Completed	Completed	Completed	2Q2007	3Q2007	4Q2007
SSML 1.1	1Q2007	1Q2007	2Q2007	3Q2007	4Q2007	1Q2008
VoiceXML 2.1	Completed	Completed	Completed	11/2006	12/2006	1Q2007
VoiceXML 3.0	1Q2007	3Q2007	3Q2008	TBD	TBD	TBD
SCXML 1.0	1Q2007	Completed	3Q2007	1Q2008	3Q2008	3Q2008
Note: The group will document significant changes from this initial schedule on the group home page.

Dependencies

The Group may need to interact with the following related activities in ways yet to be determined, for example by asking them to review this Group's draft specifications, or by building on their work to meet its own needs. Collaboration across working groups will be essential to realizing the mission of the Voice Browser Activity.

W3C-related activities

The following groups are identified as being related to the work of this group.

Internationalization — The specifications of the VBWG are expected to be usable worldwide and adaptable to a wide variety of languages. An ongoing strong relationship with the I18N groups is essential to achieve this goal.
Multimodal Interaction WG — The MMIWG has a strong link to the VBWG, as it is chartered to develop specifications that allow the Web to be used with any modality, not just voice.
Synchronized Multimedia — VoiceXML 3.0 will introduce advanced media controls, involving timing and synchronization features borrowed from SMIL.
WAI Protocols and Formats — The VBWG expects its work to be reviewed by the WAI-PF group in order to ensure universal accessibility of the produced specifications.
Hypertext Coordination Group — The "backplane" framework being developed by the groups belonging to the HCG (HTML, Web Applications, XForms, Compound Document Formats, etc.) needs to be compatible with the VBWG's Data-Presentation-Flow framework introduced in the design of VoiceXML 3.0.
XML and Semantic Web Activities — Because the specifications developed in the VBWG are all based on XML, the group will follow the work of the XML Activity in order to keep them compatible with the ongoing evolution of XML. Similarly, many VBWG specifications express metadata using RDF, so cooperation with the Semantic Web Best Practices group is expected should questions arise on the use of RDF.
Security — The Speaker Verification and Identification features of VoiceXML 3.0 will benefit from review by the Web Security Activity.
Emotion Incubator Group — The Group may consider extensions to support the recognition and presentation of emotions in speech.

External groups

Here is a list of external groups with complementary goals to the Voice Browser activity:

ECMA TC32-TG11 — computer supported telecommunications applications (CSTA)
ETSI — work on DSR codecs, call control, human factors and command vocabularies
IETF SpeechSC working group or its successor — protocols for accessing speech engines
ISO/IEC JTC 1/SC 37 Biometrics — user authentication
ITU — telecommunication standards
SALT Forum — tags for adding speech to HTML and other markup languages
VoiceXML Forum — an industry association for VoiceXML, see memorandum of understanding

Participation

To be successful, the Voice Browser Working Group is expected to have 15 or more active participants for its duration. Effective participation in the Voice Browser Working Group is expected to consume one work day per week for each participant and two days per week for editors. The Voice Browser Working Group will also allocate the necessary resources for building Test Suites for each specification. In order to make rapid progress, the Voice Browser Working Group consists of several subgroups, each working on a separate document. Voice Browser Working Group members may participate in one or more subgroups.

Participants are reminded of the Good Standing requirements of the W3C Process.

To become a participant of the Working Group, a representative of a W3C Member organization must be nominated by their Advisory Committee Representative as described in the W3C Process. The associated IPR disclosure must further satisfy the requirements specified in the W3C Patent Policy (5 February 2004 Version).

Experts from appropriate communities may also be invited to join the working group, following the provisions for this in the W3C Process.

Working Group participants are not obligated to participate in every work item; however, the Working Group as a whole is responsible for reviewing and accepting all work items.

Face-to-face meetings will be arranged 3 to 4 times a year. The Chair will make Working Group meeting dates and locations available to the group in a timely manner according to the W3C Process. The Chair is also responsible for providing publicly accessible summaries of Working Group face-to-face meetings, which will be announced on www-voice@w3.org.

About this Charter

This charter for the Voice Browser Working Group has been created according to section 6.2 of the Process Document. In the event of a conflict between this document (or the provisions of any charter) and the W3C Process, the W3C Process shall take precedence.

Note: This charter was modified on 26 November 2007 to include the informative note in section 4.1 referring readers to the home page of the group for updated milestone information.