"Voice Browser" Working Group Charter

This charter is written in accordance with section 3.2.2 of the W3C Process.

Mission Statement
Scope and Deliverables
Duration
Success Criteria
Release Policy
Milestones
Confidentiality
Relationship with other W3C Activities
Coordination with External Groups
Communication Mechanisms
Voting Mechanisms
Participation

1. Mission Statement

Far more people today have access to a telephone than have access to a computer with an Internet connection. In addition, sales of cellphones are booming, so that many of us already have, or soon will have, a phone within reach wherever we go. Voice Browsers offer the promise of allowing everyone to access Web-based services from any phone, making it practical to access the Web anytime and anywhere, whether at home, on the move, or at work.

It is common for companies to offer services over the phone via menus traversed using the phone's keypad. Voice Browsers offer a great fit for the next generation of call centers, which will become Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet. Users will be able to choose whether to respond by a key press or a spoken command. Voice interaction holds the promise of naturalistic dialogs with Web-based services.

Voice browsers allow people to access the Web using speech synthesis, pre-recorded audio, and speech recognition. This can be supplemented by keypads and small displays. Voice may also be offered as an adjunct to conventional desktop browsers with high resolution graphical displays, providing an accessible alternative to using the keyboard or screen, for instance in automobiles where hands/eyes free operation is essential. Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller.

Hitherto, speech recognition and spoken language technologies have, for the most part, had to be handcrafted into applications. The Web offers the potential to vastly expand the opportunities for voice-based applications. The Web page provides the means to scope the dialog with the user, limiting interaction to navigating the page, traversing links and filling in forms. In some cases, this may involve the transformation of Web content into formats better suited to the needs of voice browsing. In others, it may prove effective to author content directly for voice browsers.

Information supplied by authors can increase the robustness of speech recognition and the quality of speech synthesis. Text to speech can be combined with pre-recorded audio material in an analogous manner to the use of images in visual media, drawing upon experience with radio broadcasting. The lessons learned in designing for accessibility can be applied to the broader voice browsing marketplace, making it practical to author content that is accessible on a wide range of platforms, covering voice, visual displays and Braille.

W3C held a workshop on "Voice Browsers" in October 1998. The workshop brought together people involved in developing voice browsers for accessing Web-based services. The workshop concluded that the time was ripe for W3C to bring together interested parties to collaborate on the development of joint specifications for voice browsers, particularly since these efforts concern subsetting or extending some of the core W3C technologies, for example HTML and CSS. In response, a briefing package has been written to establish a W3C "Voice Browser" Activity and Working Group as a first step.

This Working Group will have the mission:

To prepare and review documents related to Voice Browsers, for instance, relating to dialog management, extensions to existing Web standards, speech grammar formats and authoring guidelines.
To serve as a coordination body with existing industry groups working on related specifications.
To serve as a pool of experts on Voice Browsers, some of whom will participate in the other W3C working groups relevant to Voice Browsers.

An associated public mailing list (www-voice@w3.org) is proposed for public review of proposals prepared by the working group. A public web page will be provided (http://www.w3.org/Voice) describing the status of the activity, with a link to an archive of the public email list. Access to the private email list for the working group and its associated web page will be limited to W3C members and invited experts.

2. Scope and Deliverables

The following list of topics and deliverables is based on discussions at the October 1998 Voice Browser workshop. Additional topics may be added during the lifetime of the Working Group, if there is enough interest. The timescales for deliverables will be defined at the initial face to face meeting proposed for May 1999.

The working group is not limited to the operation of voice browser clients. For instance, access via regular telephones places all of the effort on the proxy server acting as a portal to Web-based services. It is anticipated that some Web content will be authored specifically for voice browsers. Other content is likely to be written for visual browsers with annotations appropriate to voice browsing. Such annotations may involve media specific markup, style sheets and scripting.

An analysis of ways to handle dialog management

This will study different options for effective dialog management for Voice Browsers, for instance, using state transition graphs or frames. How can dialog repair be supported, along with the construction of modular sub-dialogs that can be combined into larger dialogs? How can skimming be supported? How can aural content be combined with visual displays? In some situations, it may be appropriate to apply a specialized style sheet for aural rendering of regular HTML markup. In others, it may be appropriate to produce content in a specialized markup language.

Discussion in the Working Group on the needs and opportunities for voice applications is expected to shed light on whether specialized formats are needed for applications developed specifically for voice interaction, whether extensions to existing Web formats offer a better solution, or whether there is a role for both approaches. Depending on the outcome of the discussion, this work may lead to a Proposed Recommendation for a voice-dialog markup language written in XML.
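
As an illustration only, the state transition graph option mentioned above might be sketched as follows. The class name, prompts, repair wording and the max_repairs limit are all assumptions made for this example; the charter does not define any of them, and a frame-based approach would look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """One node in a hypothetical state transition graph."""
    prompt: str
    # Maps an expected utterance (or key press) to the name of the next state.
    transitions: dict = field(default_factory=dict)


def run_dialog(states, start, utterances, max_repairs=2):
    """Walk the graph and return the list of prompts spoken to the user.

    Unrecognized input triggers a simple dialog repair: an apology is spoken
    and the prompt is repeated, up to max_repairs times per state.
    """
    spoken = []
    current = start
    repairs = 0
    inputs = iter(utterances)
    while True:
        state = states[current]
        spoken.append(state.prompt)
        if not state.transitions:        # terminal state, nothing more to ask
            return spoken
        heard = next(inputs, None)
        if heard is None:                # no more input, e.g. the caller hung up
            return spoken
        key = heard.strip().lower()
        if key in state.transitions:
            current = state.transitions[key]
            repairs = 0
        elif repairs < max_repairs:
            repairs += 1
            spoken.append("Sorry, I did not understand.")
        else:
            spoken.append("Transferring you to an operator.")
            return spoken


# Assumed example graph: spoken commands and key presses lead to the same states.
STATES = {
    "main": DialogState(
        "Say weather or news.",
        {"weather": "weather", "1": "weather", "news": "news", "2": "news"},
    ),
    "weather": DialogState("Here is the weather forecast."),
    "news": DialogState("Here are the headlines."),
}

if __name__ == "__main__":
    for line in run_dialog(STATES, "main", ["wether", "Weather"]):
        print(line)
```

In this sketch a misrecognized utterance causes the prompt to be repeated, and a key press is treated as just another transition label, so the same graph serves both input styles.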

Extensions to HTML, CSS and DOM for Voice Browsers

Proposals in the form of W3C Notes for extending HTML, CSS and the Document Object Model to better suit the needs of Voice Browsers, e.g. events for handling timeouts and incomprehensible utterances, ways to support skimming through document content, ways to identify hypertext links, and ways to pronounce given words or phrases. This work is expected to result in one or more W3C Working Drafts and should be coordinated with the respective working groups, see section 8.

Speech Grammar Formats

An analysis of the requirements for representing expected utterances, and proposals meeting these requirements. A number of different approaches may be appropriate: grammar rules, templates, statistical language models, and phrase lists. These increase the robustness of speech recognition, and make it easier to support naturalistic dialogs. These could be supplied by authors or added by transformation agents. This work is expected to result in a Working Draft.
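
To make the idea of templates and phrase lists concrete, here is a minimal sketch that compiles author-supplied templates into regular expressions and matches a recognized utterance against them. The template syntax with angle-bracket slots, the phrase lists and the function names are assumptions for illustration; they do not correspond to any format the Working Group has proposed.

```python
import re

# Hypothetical phrase lists that an author might supply for each slot.
PHRASE_LISTS = {
    "city": ["boston", "new york", "san francisco"],
    "day": ["today", "tomorrow"],
}

# Hypothetical templates; each slot name may appear once per template.
TEMPLATES = [
    "weather in <city> <day>",
    "weather in <city>",
]


def compile_template(template, phrase_lists):
    """Turn a template such as 'weather in <city>' into a compiled regex."""
    parts = []
    for token in re.split(r"(<\w+>)", template):
        if token.startswith("<") and token.endswith(">"):
            name = token[1:-1]
            alternatives = "|".join(re.escape(p) for p in phrase_lists[name])
            parts.append(f"(?P<{name}>{alternatives})")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def match_utterance(utterance, templates, phrase_lists):
    """Return (template, slot values) for the first matching template, else None."""
    normalized = " ".join(utterance.lower().split())
    for template in templates:
        found = compile_template(template, phrase_lists).fullmatch(normalized)
        if found:
            return template, found.groupdict()
    return None


if __name__ == "__main__":
    print(match_utterance("Weather in New York tomorrow", TEMPLATES, PHRASE_LISTS))
```

Restricting the recognizer to a small set of expected utterances in this way is what allows author-supplied information to improve robustness; statistical language models would replace the exact match with a scored one.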

Authoring guidelines

These authoring guidelines should describe in layman's terms how to use HTML and CSS so that you can "author once, render anywhere", instead of having to redevelop every site several times for different platforms. This work will alleviate the need for the complex rules of thumb that are currently required to interpret pages designed only for visual browsing. This work is expected to result in a Working Draft and should be coordinated with the Web Accessibility Initiative, see section 8.

Device profiles

These provide a way to describe the capabilities of Voice Browser clients as well as user preferences, and can be used by servers to find matching content, or to apply transformations that map existing content into a form better suited to Voice Browsers. This work is expected to result in a W3C Working Draft and should be coordinated with the Mobile Access Interest Group, see section 8.
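
The following sketch shows one way a server might use such a profile to find matching content. The profile fields, the variant names and the selection rule (first acceptable variant in a preference-ordered list) are all assumptions for this example; the charter leaves the actual profile vocabulary to the Working Group.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical description of a client's capabilities."""
    audio_output: bool
    speech_input: bool
    display_lines: int      # 0 means the device has no display


@dataclass(frozen=True)
class Variant:
    """Hypothetical content variant and the capabilities it requires."""
    name: str
    needs_audio: bool
    needs_speech_input: bool
    min_display_lines: int


def select_variant(profile: DeviceProfile,
                   variants: Sequence[Variant]) -> Optional[Variant]:
    """Return the first variant whose requirements the profile meets, or None."""
    for variant in variants:
        if variant.needs_audio and not profile.audio_output:
            continue
        if variant.needs_speech_input and not profile.speech_input:
            continue
        if profile.display_lines < variant.min_display_lines:
            continue
        return variant
    return None


# Assumed variants, listed from richest to simplest.
VARIANTS = [
    Variant("multimodal", True, True, 4),
    Variant("voice-dialog", True, True, 0),
    Variant("keypad-menu", True, False, 0),
    Variant("text-only", False, False, 2),
]

if __name__ == "__main__":
    telephone = DeviceProfile(audio_output=True, speech_input=True, display_lines=0)
    print(select_variant(telephone, VARIANTS).name)
```

A server that finds no matching variant could instead fall back to transforming existing content, which is the second use of profiles described above.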

3. Duration

This Working Group is scheduled to last for two years, from 26th March, 1999 to 24th March, 2001.

4. Success Criteria

The Working Group has fulfilled its mission if it succeeds in unifying the efforts of vendors and content providers to stimulate the development and widespread use of Voice Browsers without causing the Web to fragment.

5. Release Policy

By default, all documents developed by the Working Group are available from the group's web page. Selected documents may be published via the W3C's technical reports page after approval from W3C management.

Documents must have an editor and one or more contributors. Documents should have a date by which they will be declared stable. Any remaining issues at this date will be described in the document to avoid delaying its wider release.

Documents that do not fulfill the criteria above (e.g. longer documents describing specific technical solutions brought up by one member of the Working Group) have to be submitted to W3C before they can be published on the W3C technical reports page.

6. Milestones

Milestones are only set for the first months of this Working Group. Additional milestones may be added when the group decides to take on additional work items.

9th February, 1999: Working Group Proposal/Call for Participation
12th March, 1999: Deadline for Advisory Committee representatives to submit their review of the proposal.
26th March, 1999: Director's Decision. Work starts via email and teleconferences.
3rd - 4th May, 1999: Face to Face meeting, West Coast, USA (host to be determined). This meeting will list and prioritize work items, and assign editors to working drafts.
26th - 27th July, 1999: Face to Face meeting (location and host to be determined). The date for this meeting is provisional and will be finalized at the first face to face meeting in May.
24th March, 2001: Working Group Closes

7. Confidentiality

Access to email discussions and to documents developed by the working group will be limited to W3C members and invited experts, until released for publication by the joint agreement of the working group and the W3C management team. Working group members are required to honor the confidentiality of the group's work until such time as the work is publicly released.

8. Relationship with other W3C Activities

The Voice Browser Working Group will have to take into account technologies developed by other groups within W3C, advise those groups about the requirements of Voice Browsers, and ask them to review W3C Notes, prepared by the Working Group, covering proposals for extensions to existing or future Web standards. As of today, the following ongoing W3C activities are concerned:

Hypertext Coordination Group: this has the responsibility for ensuring that reviews between working groups are planned and carried out so as to meet requirements for deliverables and deadlines. The Voice Browser working group will be represented in the coordination group by its chair.
CSS & FP WG, XSL WG: Style sheets are a very important means to achieve reusability of Web content on different devices. CSS already includes some features for aural rendering and additional features may be needed for Voice Browsers. Proposals for changes to CSS will need to be reviewed by the CSS & FP working group.
DOM Working Group: May be asked to review proposals for a programming interface (API) for Voice Browsers, e.g. for controlling the dialog, for setting up phone calls, and for controlling streaming audio.
HTML WG: The display of Web content on Voice Browsers may result in new requirements for the work on a future version of HTML. The limited memory size of mobile devices may lead to the need for HTML subsets. The HTML Working Group is chartered to develop a modularized version of HTML, which should be usable to express a Voice Browser specific subset. Furthermore, Voice Browsers may create the need for new HTML elements, e.g. to support the dialog with the user. Proposals for extensions to HTML will need to be reviewed by the HTML working group.
I18N WG/IG: The "Voice Browser" WG has to take into consideration the requirements for internationalization. Any proposals made by the working group should be reviewed by the I18N WG in this regard.
Mobile Access Interest Group: The Voice Browser community shares many technical interests and requirements with the mobile community, such as subsetting HTML, or describing device profiles. Both groups should use the same approaches and technologies.
Synchronized Multimedia: This is likely to be valuable for synchronizing streaming audio with text to speech and content rendered on small displays. The Voice Browser Working Group should coordinate its work in this area with the Synchronized Multimedia Interest Group and Working Group.
Web Accessibility Initiative (WAI): With the ability to reach very large numbers of users, Voice Browsers have great potential for people with mobility or sight limitations. The ability to repurpose content for use with Voice Browsers will be dependent on authoring tools and guidelines. Furthermore, people with hearing impairments may want access to content authored directly for Voice Browsers. Any proposals by the working group should be reviewed by the WAI for their accessibility implications.
XML Working Groups: XML will be the basis of any potential new markup language produced for the needs of Voice Browsers. The limited memory size of mobile devices may require subsetting XML, e.g. by eliminating internal entities.

9. Coordination with External Groups

The following is a list of groups that are known or presumed to be working on, or interested in, standards relating to Voice Browsers, with pointers to the respective projects. The W3C Voice Browser working group will need to liaise with these groups.

Daisy Consortium: Publishes talking books for people with visual impairments.
DARPA Communicator program: The program carries out research on the next generation of intelligent conversational interfaces to distributed information. The goal is to support the creation of speech-enabled interfaces that scale gracefully across modalities, from speech-only to interfaces that include graphics, maps, pointing and gesture.
Enterprise Computer Telephony Forum: The ECTF works to remove obstacles to interoperability for computer telephony systems. The use of telephones as voice browsers will make it desirable for W3C to coordinate its work with the ECTF.
Library of Congress-National Library Service/NISO Digital Talking Book Committee: Concerned with standards relating to "talking books".
SABLE Consortium: An international group of researchers who collaborate via email to discuss suggestions relating to the SABLE specification for text-to-speech markup. Coordination with this group is desirable to harmonize the control paradigm for text to speech used in SABLE and in the aural features in W3C's Cascading Style Sheets specification (ACSS).
VXML Forum: The VXML Forum is an industry organization founded by AT&T, Lucent and Motorola, and chartered with establishing and promoting the Voice eXtensible Markup Language (VXML) which has been designed for authoring voice interaction services.

10. Communication Mechanisms

10.1 Email

The archived member-only mailing list w3c-voice-wg@w3.org is the primary means of discussion within the group.

The archived mailing list www-voice@w3.org is used for public discussion of proposals for Voice Browsers, and Working Group members are encouraged to subscribe.

10.2 Phone

A weekly one-hour phone conference will be held. The exact details, dates and times will be published in advance on the working group page.

10.3 Meetings

Face to face meetings will be arranged 3 to 4 times a year. Meeting details are made available on the W3C Member Calendar and from the Working Group page.

11. Voting Mechanisms

The Group works by consensus. In the event of failure to achieve consensus, the Group may resort to a vote as described in the Process Document. Each Member company which has at least one Group member in good standing may vote. There is one vote per W3C Member company. Votes are held by email to allow all participants a chance to vote; there is a two week voting period followed by a period of two working days for the announcement of the result. W3C staff and invited experts do not vote; however in the event of a tie the chair has a casting vote. If the issue is solved by consensus during the voting period, the vote is cancelled.
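
Purely as an illustration of the counting rules above, a tally might be sketched as follows. The function name, the yes/no ballot format and the choice to let a later ballot from the same company replace an earlier one are assumptions of this example; the Process Document remains the authority on how votes are actually conducted.

```python
def tally(ballots, eligible_members, chair_casting_vote=None):
    """Count one vote per eligible W3C Member company.

    ballots is an iterable of (member_company, choice) pairs, where choice is
    "yes" or "no". Ballots from companies not in eligible_members (for
    example, invited experts or W3C staff) are ignored. Assumption: a later
    ballot from the same company replaces its earlier one. If the count is
    tied, the chair's casting vote decides.
    """
    votes = {}
    for company, choice in ballots:
        if company in eligible_members:
            votes[company] = choice
    yes = sum(1 for choice in votes.values() if choice == "yes")
    no = sum(1 for choice in votes.values() if choice == "no")
    if yes > no:
        return "yes"
    if no > yes:
        return "no"
    return chair_casting_vote


if __name__ == "__main__":
    members = {"Company A", "Company B", "Company C"}
    ballots = [
        ("Company A", "yes"),
        ("Company A", "yes"),   # a second ballot still counts only once
        ("Company B", "no"),
        ("Invited Expert", "yes"),
    ]
    print(tally(ballots, members, chair_casting_vote="no"))
```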

12. Participation

by W3C Members

Requirements for meeting attendance and timely response are described in the Process document. Participation (meetings, reviewing and writing drafts) is expected to consume between half a day and one day per week for the lifetime of the group. Working group participants are required not to disclose information obtained during participation, until that information is publicly available.

W3C Members may also offer to review one or more working drafts from the group for clarity, consistency, technical merit, fitness for purpose and conformance with other W3C specifications. The only participation requirement is to provide the review comments by the agreed-to date.

by invited experts

As decided on a case by case basis, invited experts may attend a single meeting or a series; they may in some cases be subscribed to the Group mailing list. For the duration of their participation, invited experts are encouraged to adopt the same requirements for meeting attendance and timely response as are required of W3C Members. Invited experts are subject to the same requirement for information disclosure as is required of W3C Members.

by W3C Team

The W3C team will ensure that the mailing lists, public and Group pages are adequately maintained and that public Working Drafts are made available on the Technical Reports page. The W3C team will arrange to take minutes at teleconferences and face to face meetings and post these to the Group mailing list and to the Group page.

A W3C team member will provide liaison between non-team document editors and the W3C team, including posting revisions of Working Drafts to the Group page. W3C team members are expected to adopt the same requirements for meeting attendance, timely response and information disclosure as are required of W3C Members.

The Voice Browser Working Group will be chaired by Tomasz Imielinski of Rutgers University. The W3C staff contact and activity lead will be Dave Raggett. Resources of additional W3C team members will be required for some of the deliverables, should the conditions for starting these deliverables be met.