W3C

Reusable Dialog Requirements
for Voice Markup Language

W3C Working Draft 26 April 2000

This Version:
http://www.w3.org/TR/2000/WD-reusable-dialog-reqs-20000426
Latest Version:
http://www.w3.org/TR/reusable-dialog-reqs
 
Editor:
Daniel C. Burnett <burnett@nuance.com>

Abstract

The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of requirements studies for voice browsers, and provides details of the requirements for reusable components for spoken dialogs.

Status of this document

This document describes the requirements for reusable dialog components for spoken interaction, as a precursor to starting work on specifications. Related requirement drafts are linked from the introduction. The requirements are being released as working drafts but are not intended to become proposed recommendations.

This specification is a Working Draft of the Voice Browser working group for review by W3C members and other interested parties. This is the first public version of this document. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

Publication as a Working Draft does not imply endorsement by the W3C membership or by members of the Voice Browser working group.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review. Comments should be sent to the public mailing list <www-voice@w3.org> (archive).

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

NOTE: Text in an italicized teal font is a comment.

 

1. Introduction

Reusable dialog components provide pre-packaged functionality "out-of-the-box" that enables developers to quickly build applications by providing standard default settings and behavior. They shield developers from having to worry about many of the intricacies associated with building a robust speech dialogue, e.g., confidence score interpretation, error recovery mechanisms, prompting, etc. This behavior can be customized by a developer if necessary to provide application-specific prompts, vocabulary, retry settings, etc.

The main goal of this subgroup is to develop a specification for reusable dialog components within the context of an overall specification for a markup language. The purpose of this document is to establish a prioritized list of requirements for reusable dialog components which any proposed markup language (or extension thereof) should address.

1.1. Scope

Although it is desirable to standardize the interface to all dialog components, such standardization is impractical for many dialogs. In order to standardize the interface, one would have to standardize the call flow, since the specifics of the call flow determine the parameters that can be configured. If this document attempted to standardize the call flow (and hence the interface) for more complex and debatable dialog components, the resulting standard components would likely contain only the lowest common denominator of functionality and therefore be of limited usefulness. Even the control flows for such common tasks as acquiring telephone numbers and postal codes can differ from one application and vendor to another. To preserve implementation flexibility for more complex components, this document sorts components into two categories -- those requiring only a return semantics specification and those requiring both configuration and return semantics specifications. This document also provides some general requirements applicable to all components.

Note that the document provides no suggestions as to how these components should be accessed (i.e., through a generic external call interface, through specific markup elements, or through reference to a markup language page in a standard library); rather, it is important merely that the functionality described is packaged in an easy-to-use fashion. There may consequently be some overlap between components described here and sections of the Dialog Requirements document. In general, the Dialog Requirements document describes dialog features, while this document describes packaged/contained dialogs. In practice this distinction may disappear for some of the components described, but there is no requirement that a component be implemented using core features of the language itself.

1.2. Organization

The document is divided into the following sections:

  2. General requirements
  3. Component requirements
  4. Future study
  5. Component tables

2. General requirements

Requirements described in this section apply to all components.

2.1. Internationalization and localization

2.1.1. Support for localization (must specify)

Components must provide support for different locales where appropriate. For example, telephone number, postal code, and address all vary in structure from one country to another. "Support", in this case, can mean either configuration parameters or simply multiple versions covering the range of locales where usage is expected.

2.1.2. Locale information (should specify)

Locale information should be available from the application context for use by components. Such information must conform to existing ISO standards for countries and locales.
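
As a non-normative illustration, the sketch below shows one way a component might check the shape of locale information received from the application context. It assumes that the ISO standards in question are ISO 639-1 (two-letter language codes) and ISO 3166-1 (two-letter country codes); this document does not name them. The names `Locale` and `parse_locale` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str            # assumed: ISO 639-1 two-letter code, lowercase
    country: Optional[str]   # assumed: ISO 3166-1 alpha-2 code, uppercase


# Accepts "en", "en-US" or "en_US". Only the shape is checked; the ISO
# code registries themselves are not consulted.
_LOCALE_RE = re.compile(r"^([A-Za-z]{2})(?:[-_]([A-Za-z]{2}))?$")


def parse_locale(tag: str) -> Locale:
    """Split a language[-COUNTRY] tag into normalized parts."""
    match = _LOCALE_RE.match(tag)
    if match is None:
        raise ValueError(f"not a language[-COUNTRY] tag: {tag!r}")
    language, country = match.group(1).lower(), match.group(2)
    return Locale(language, country.upper() if country else None)
```

A component could use the parsed country to choose among its locale-specific versions, for example a postal code dialog per country.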

2.2. Simultaneous activation of components (nice to specify)

For mixed initiative interactions, it would be nice to allow multiple components to be active simultaneously during some stages of the overall dialog. For example, possibly only the initial grammars for multiple components would be active simultaneously, with the grammar that matched the user's response determining which component would handle the next interaction.
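
The example above can be sketched as follows. This is a minimal, non-normative illustration in which a "grammar" is reduced to a hypothetical predicate over the recognized text; real grammars and recognizers are outside the scope of the sketch. The names `Grammar`, `select_component`, and `active` are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a grammar: a predicate over recognized text.
Grammar = Callable[[str], bool]


def select_component(utterance: str,
                     initial_grammars: Dict[str, Grammar]) -> Optional[str]:
    """Return the name of the first component whose initial grammar matches.

    Returns None when no active component accepts the utterance.
    """
    for name, matches in initial_grammars.items():
        if matches(utterance):
            return name
    return None


# Two components whose initial grammars are active at the same time.
active: Dict[str, Grammar] = {
    "yes_no": lambda text: text in {"yes", "no"},
    "digits": lambda text: text.replace(" ", "").isdigit(),
}
```

The component named by `select_component` would then handle the next interaction with the user.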

2.3. Return values

2.3.1. NL Format (must specify)

Return result(s) from a component must conform to the NL format developed by the Natural Language subgroup.

2.3.2. Keys and value formats (must specify)

Note: the terminology in this requirement will be updated to match that used by the Natural Language Semantics subgroup. For now, the term "key" refers to a text label that a) does not vary from one invocation of the component to another and b) has an associated "value" that can vary from one invocation of the component to another.

Each component must specify

  1. The semantics of standard return keys
  2. The formats of the values associated with these keys

2.3.3. Implementation-specific information (must specify)

Components must also provide a means to return additional implementation-specific keys and values.
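
A non-normative sketch of a return structure that keeps standard keys apart from implementation-specific ones is given below. It does not model the NL format required by Section 2.3.1, which is defined elsewhere. The class name, the field names, and the `x-` prefix used to mark implementation-specific keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ComponentResult:
    # Keys whose semantics and value formats the component specifies.
    standard: Dict[str, Any]
    # Additional implementation-specific keys and values.
    extensions: Dict[str, Any] = field(default_factory=dict)

    def as_dict(self) -> Dict[str, Any]:
        """Flatten into one mapping, prefixing extension keys with 'x-'.

        The prefix is a hypothetical convention that keeps extension keys
        from colliding with standard keys.
        """
        merged = dict(self.standard)
        for key, value in self.extensions.items():
            merged[f"x-{key}"] = value
        return merged
```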

2.4. Error/exception handling (must specify)

Components must be able to catch, pass on, or generate exceptions. All exceptions passed on or generated must be catchable by the calling application. Components are complete dialogs and, to an extent appropriate for each component, are expected to handle many recognizer-level conditions and exceptions themselves rather than always passing them through.

For example, simple user timeouts will most commonly be captured and handled by the component, while an exception signifying the inability to access a server needed by the component would likely be partially handled within the component and then passed on to the calling application. A user hangup would likely just be passed on to the calling application.
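
The three cases in the example above can be sketched as follows. The sketch is non-normative; the exception classes, the `listen` callable, and the retry limit are hypothetical names and values chosen for illustration.

```python
class UserTimeout(Exception):
    """Hypothetical: the caller said nothing before the timeout expired."""


class UserHangup(Exception):
    """Hypothetical: the caller disconnected."""


class ServerUnavailable(Exception):
    """Hypothetical: a server needed by the component cannot be reached."""


def run_component(listen, max_timeouts=2, log=print):
    """Run one interaction, handling what the component can handle itself."""
    timeouts = 0
    while True:
        try:
            return listen()
        except UserTimeout:
            # Handled inside the component: try again, up to a limit.
            timeouts += 1
            if timeouts > max_timeouts:
                raise
        except ServerUnavailable:
            # Partially handled, then passed on to the calling application.
            log("server unavailable; abandoning component")
            raise
        # UserHangup is deliberately not caught: it passes straight
        # through to the calling application.
```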

2.5. Coordination between language and component features (must specify)

Components must have clearly published behavior for the cases in which potential conflicts between "always active" language features and those of the component can occur.

2.6. Component composition (nice to specify)

Where reasonable, components will be built using other components to increase consistency in behavior across components.



3. Component requirements

3.1. Introduction

One of the goals in providing packaged dialogs is to reduce application developers' effort through appropriate abstraction. Such abstraction can occur at many levels and can be implemented in a number of different ways. For example, parameterized dialogs can be implemented as markup elements with attributes and sub-elements, as scripts built of markup elements and variables (perhaps stored in a standard library of such dialogs), or as native, precompiled, or otherwise non-markup language objects or modules. Since this document does not in general address implementation issues, there are no suggestions in this section regarding the appropriateness of the dialogs below being implemented in one way over another.

The components in this section are presented in that context. They represent actual dialogs (or dialog templates) and not dialog features (like help prompts or the ability to construct confirmation dialogs).

 

3.2. Organization

The components in this section can be categorized along 3 dimensions:

Return only vs. Configuration/Return
For all components, the return semantics will be specified (see Section 2.3). However, some components will have dialog flows that vary considerably by vendor and application. Since the dialog structure determines the parameters that could possibly be configured, it is infeasible and restrictive to specify the list of component-specific configuration parameters for such components. Therefore, for some components only the return semantics will be specified; for others, both the configuration parameters and return semantics will be specified.
Specification priority
Must specify, should specify, and nice to specify indicate the order in which components will be considered for inclusion in the specification document.
Task vs. template
Task components actually obtain some piece or pieces of information. Template components merely provide some call flow framework. Although task components most likely will be configured, they will operate as-is. Template components will not work without configuration.

This section is sorted first along Config/Return vs. Return only, then along Specification priority. Tables containing alternate sorting orders can be found in Section 5.

 

3.3. Components requiring configuration and return specifications

Components in this section are considered to have control flows that will not change significantly between application or vendor. For these components, both component-specific configurable parameters and the return semantics (see Section 2.3) must be specified. In addition, any general configurable parameters required by Section 2 must still be specified.

3.3.1. Yes/No (must specify, task)

This component is intended to be used as a simple confirmation and will prompt the user with a question or statement. The grammar will handle a variety of affirmative and negative responses and return a "yes" or "no", respectively.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Additional affirmative response grammar (should specify)
  3. Additional negative response grammar (should specify)
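
A minimal, non-normative sketch of how the additional response grammars might extend a default vocabulary is given below. Grammars are reduced to sets of phrases, and the default word lists and all names are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Optional

# Hypothetical default vocabularies; a real component would use grammars.
DEFAULT_YES: FrozenSet[str] = frozenset({"yes", "yeah", "correct", "right"})
DEFAULT_NO: FrozenSet[str] = frozenset({"no", "nope", "wrong"})


def interpret_yes_no(utterance: str,
                     extra_yes: Iterable[str] = (),
                     extra_no: Iterable[str] = ()) -> Optional[str]:
    """Map a response to "yes" or "no"; return None if it matches neither."""
    text = utterance.strip().lower()
    if text in DEFAULT_YES or text in {p.lower() for p in extra_yes}:
        return "yes"
    if text in DEFAULT_NO or text in {p.lower() for p in extra_no}:
        return "no"
    return None
```

For instance, an application could pass `extra_yes=["affirmative"]` to accept a domain-specific confirmation without replacing the defaults.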

3.3.2. Natural numbers (must specify, task)

This component will prompt for and recognize natural numbers. As an example, this could be useful in applications where a certain number of items are being ordered or processed. This component should be able to recognize natural numbers between 0 and some large number (e.g. 99,999,999).

The component may also have the following requirements:

Range restriction (should specify)
Allow for a range restriction to be set on recognized numbers in order to improve recognition accuracy.
N-best list (should specify)
Be capable of returning an n-best list of recognized numbers so that application-level filtering/restriction can be applied.
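
The sketch below illustrates application-level filtering of an n-best list by range. It is non-normative and assumes that each n-best entry is a (value, score) pair with higher scores being better; this document does not define the shape of an n-best list. Note that a range restriction applied inside the recognizer, as described above, would act earlier than this kind of post-filtering.

```python
from typing import List, Optional, Tuple

# Hypothetical n-best entry: (recognized number, confidence score).
Hypothesis = Tuple[int, float]


def filter_n_best(n_best: List[Hypothesis],
                  low: int = 0,
                  high: Optional[int] = None) -> List[Hypothesis]:
    """Keep hypotheses within [low, high], best score first."""
    kept = [(value, score) for value, score in n_best
            if value >= low and (high is None or value <= high)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```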

3.3.3. Simple digit string (must specify, task)

This component will prompt for and recognize a fixed-length string of digits.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Expected length of digit string (must specify)
  3. List of expected digit string lengths (nice to specify)

The component may also have the following requirements:

N-best list (should specify)
Be capable of returning an n-best list of recognized digit strings so that application-level filtering/restriction can be applied.
Mixed digits and natural numbers (should specify)
Supports strings that mix both digits and naturally-spoken numbers, e.g., "one two three four seven thousand".
Validation options (should specify)
The component will optionally do matching against
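
As a non-normative illustration of the mixed digits and natural numbers requirement, the sketch below converts a spoken sequence into a digit string. It assumes that the example "one two three four seven thousand" is intended to yield "12347000", and it handles only single digits optionally followed by "hundred" or "thousand". The function and table names are hypothetical.

```python
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
# Simplification: a multiplier only appends trailing zeros, so forms such
# as "seven thousand five" are not interpreted correctly by this sketch.
MULTIPLIERS = {"hundred": "00", "thousand": "000"}


def spoken_to_digit_string(utterance: str) -> str:
    """Convert space-separated number words into a string of digits."""
    out = []
    for word in utterance.lower().split():
        if word in DIGITS:
            out.append(DIGITS[word])
        elif word in MULTIPLIERS and out:
            out.append(MULTIPLIERS[word])
        else:
            raise ValueError(f"unexpected word: {word!r}")
    return "".join(out)
```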

 

3.3.4. Fully-specified date (must specify, task)

This component will collect a fully-specified date. Any ambiguity in the user's initial statement of the date will be cleared up by the component without benefit of application-maintained context. For example, the component may use additional prompting to disambiguate. The component will return a fully-specified date. If unable to return a fully-specified date, the component will generate an error.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Calendar system - e.g., Gregorian, Islamic, Jewish (nice to specify)

3.3.5. Partially-specified date (should specify, task)

This component will collect a date. Any ambiguity in the user's initial statement of the date will be cleared up by the component without benefit of application-maintained context. For example, the component may use additional prompting to disambiguate. The component will then return as much of the date as it has obtained. Note that this means the component may return either a fully-specified date or a partially-specified date.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Boolean flag indicating whether or not to disambiguate a partially-specified date (should specify)
  3. Calendar system - e.g., Gregorian, Islamic, Jewish (nice to specify)
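
The difference between a fully-specified and a partially-specified date result can be sketched as follows. This is a non-normative illustration; the year/month/day decomposition and the names `DateResult` and `require_full` are assumptions, and the return format actually required is governed by Section 2.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DateResult:
    # Any field the user did not supply stays None.
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def fully_specified(self) -> bool:
        return None not in (self.year, self.month, self.day)


def require_full(date: DateResult) -> DateResult:
    """Behave like the fully-specified date component: error if incomplete."""
    if not date.fully_specified:
        raise ValueError("date is only partially specified")
    return date
```

Under these assumptions, the partially-specified date component would return a `DateResult` as-is, while the fully-specified date component would pass it through `require_full` first.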

3.3.6. Simple error-recovery template dialog (should specify, template)

This component would provide only simple error detection and reprompting.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Grammar (must specify)
  3. Replacement for default error prompts (should specify)
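
A minimal, non-normative sketch of such a template is given below, using the three parameters listed above. The grammar is again reduced to a hypothetical predicate, `ask` stands in for playing a prompt and recognizing the reply, and the default error prompt and attempt limit are invented for illustration.

```python
def collect(prompt, grammar, ask,
            error_prompt="Sorry, I did not understand.",
            max_attempts=3):
    """Prompt until the reply matches the grammar or attempts run out.

    prompt       -- initial prompt text
    grammar      -- predicate returning True for an acceptable reply
    ask          -- callable that plays a prompt and returns the reply
    error_prompt -- replacement for the default error prompt
    """
    current = prompt
    for _ in range(max_attempts):
        reply = ask(current)
        if grammar(reply):
            return reply
        # Simple error detection: reprompt with the error prompt first.
        current = f"{error_prompt} {prompt}"
    raise RuntimeError(f"no valid response after {max_attempts} attempts")
```

Because it supplies only the call flow, the template does nothing useful until the caller provides at least a prompt and a grammar.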

3.3.7. Simple alpha string (should specify, task)

This component will prompt for and recognize a fixed-length string of letters.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Expected length of alpha string (must specify)
  3. List of expected alpha string lengths (nice to specify)

The component may also have the following requirements:

N-best list (should specify)
Be capable of returning an n-best list of recognized alpha strings so that application-level filtering/restriction can be applied.
Validation options (should specify)
The component will optionally do matching against

 

3.3.8. Simple alphanumeric string (should specify, task)

This component will prompt for and recognize a fixed-length string of letters and digits.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. Expected length of alphanumeric string (must specify)
  3. List of expected alphanumeric string lengths (nice to specify)

The component may also have the following requirements:

N-best list (should specify)
Be capable of returning an n-best list of recognized alphanumeric strings so that application-level filtering/restriction can be applied.
Mixed alphadigits and natural numbers (should specify)
Supports strings that mix both alphadigits and naturally-spoken numbers, e.g., "a b three four seven thousand".
Validation options (should specify)
The component will optionally do matching against

 

3.4. Components requiring only a return semantics specification

Components in this section are considered to have dialog flows that vary considerably by application. Thus, only the component's return semantics will be specified (as per Section 2.3). However, any general configurable parameters required by Section 2 must still be specified.

3.4.1. Time (must specify, task)

The Time component will provide generic acquisition of clock times -- for example, "three forty-five AM" or "fourteen twenty-three". If the time is ambiguous (hour < 13 and before-/after-noon designation not provided), the component should conduct additional dialog as needed to clarify. Although time zone specifiers must be recognized if spoken by the user, they are not required of the user.
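
The ambiguity test stated above (hour < 13 and no before-/after-noon designation) can be sketched as follows. The sketch is non-normative; the function names and the use of the strings "AM" and "PM" are assumptions made for illustration.

```python
from typing import Optional


def needs_clarification(hour: int, meridiem: Optional[str]) -> bool:
    """True when the rule above calls for additional dialog."""
    return hour < 13 and meridiem is None


def to_24_hour(hour: int, meridiem: Optional[str]) -> int:
    """Resolve an unambiguous spoken hour to a 24-hour clock hour."""
    if needs_clarification(hour, meridiem):
        raise ValueError("ambiguous time: AM or PM is required")
    if meridiem is None:
        return hour                       # e.g. "fourteen twenty-three"
    if meridiem not in ("AM", "PM") or not 1 <= hour <= 12:
        raise ValueError("invalid 12-hour time")
    if meridiem == "AM":
        return 0 if hour == 12 else hour  # 12 AM is midnight
    return hour if hour == 12 else hour + 12
```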

3.4.2. Menu (must specify, template)

This component plays a prompt offering the caller a menu of items, from which the caller may select a single item, and then returns an identifier corresponding to the chosen item.

3.4.3. Currency (must specify, task)

The purpose of this component is to obtain a money amount. The grammar will accept any common means of specifying amounts in the currency. For example, US Currency would allow "three dollars and twenty five cents" while German currency might allow "zwei Mark fuenfzig".

3.4.4. Context-compensating date (should specify, task)

This component will collect a date. Any ambiguity in the user's initial statement of the date will be cleared up with the benefit of application-maintained context. This component will then return as much of the date as it has obtained. Note that this means the component may return either a fully-specified or partially-specified date.

3.4.5. Telephone number (should specify, task)

This component will encapsulate the task of acquiring a telephone number. If a single confirmed number is desired, the component will confirm the number and perform error-recovery, if necessary. Otherwise, it should provide an n-best list of recognized telephone numbers that the application can filter.

3.4.6. Sectioned digit string (should specify, task)

This component will prompt for and recognize digit-strings consisting of sections, such as might be found in a credit card number, social services identification number, and the like.

The component may also have the following requirements:

Multiple sectionings (should specify)
Allow for multiple sectionings of the string to be configured, where each sectioning Note the implication that different-length strings may be allowed.
N-best list (should specify)
Be capable of returning an n-best list of recognized digit strings so that application-level filtering/restriction can be applied.
Mixed digits and natural numbers (should specify)
Supports sections that mix both digits and naturally-spoken numbers, e.g., "one two three four seven thousand".
Validation options (should specify)
The component will optionally do matching against
 

3.4.7. Sectioned alphanumeric string (should specify, task)

This component will prompt for and recognize alphanumeric strings consisting of sections, such as might be found in a product code, user identifier, automobile license plate, etc.

The component may also have the following requirements:

Multiple sectionings (should specify)
Allow for multiple sectionings of the string to be configured, where each sectioning Note the implication that different-length strings may be allowed.
N-best list (should specify)
Be capable of returning an n-best list of recognized alphadigit strings so that application-level filtering/restriction can be applied.
Mixed alphadigits and natural numbers (should specify)
Supports sections that mix both alphadigits and naturally-spoken numbers, e.g., "a b three four seven thousand".
Validation options (should specify)
The component will optionally do matching against

 

3.4.8. Confirmation and correction dialog (should specify, template)

This component would be responsible for confirming one or more items of information given by the user. If the user indicates that one or more items of information are wrong, this component could walk the user through the process of correcting them by calling other pages and/or components.

3.4.9. Browsable selection list (should specify, template)

This component allows the caller to hear items in a list in sequence and optionally navigate through the list. By default, the list would play prompts associated with items, one after another. Note that any item may be selected at any time by speaking the appropriate grammar entry. In addition, the component allows the user to select an item from the list. The component then returns the index of the selected item.

3.4.10. Browsable action list (should specify, template)

This component allows the caller to hear items in a list in sequence and optionally navigate through the list. By default, the list would play prompts associated with items, one after another. Note that any item may be selected at any time by speaking the appropriate grammar entry. In addition, the component would allow the user to say application-specific commands that cause corresponding event handlers to be executed.

3.4.11. Postal code (should specify, task)

This component will ask for and recognize a valid postal code.

3.4.12. Spelled name (should specify, task)

This component will obtain the correct spelling of one or more names (for example, a given name, middle name or initial, and family name). The names may be obtained individually and confirmed together, possibly with some error correction dialog.

3.4.13. Spoken and spelled name (should specify, task)

This component will prompt for and obtain one or more spoken names, optionally followed by the spelling of the names. As with the Spelled name component, this component may obtain the names separately and confirm them together, possibly with some error correction dialog.

3.4.14. Credit card information (should specify, task)

This component will encapsulate the task of acquiring credit card information from a caller. It will collect the card type, card number, and expiration date (month and year). It must support a wide variety of standard cards (Visa, MasterCard, American Express, Diners Club, Discover, etc.).

The component may also have the following requirements:

Validation (should specify)
The component will verify that the card number is a valid number for the type of card.
Cardholder's name (should specify)
The component will optionally acquire the cardholder's spelled name.
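
This document does not say how a card number is to be validated. As a non-normative illustration, the sketch below applies the Luhn checksum, a check commonly used for payment card numbers; its use here is an assumption. Checks specific to a card type (such as number length or leading digits) are not shown. The function name is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```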

3.4.15. Email address (should specify, task)

This component will obtain the user's email address.

3.4.16. Time range (should specify, task)

This component will provide generic acquisition of clock time ranges -- for example, "between one PM and three PM". If either or both of the times are ambiguous (hour < 13 and before-/after-noon designation not provided), the component should conduct additional dialog as needed to clarify.

3.4.17. Duration (should specify, task)

This component will acquire a duration. The component will be able to accept any common duration unit (seconds, minutes, etc.) and will conduct additional dialog as necessary to resolve the units if ambiguous.

3.4.18. URL (should specify, task)

This component will acquire Uniform Resource Locators as described in IETF RFC XXXX.

3.4.19. Address (nice to specify, task)

This component will obtain the physical or mailing address of the caller.

3.4.20. Multiple choice selections (nice to specify, template)

This component will allow a user to select one or more items from a set of valid options.

3.4.21. Non-fixed alphanumeric string (nice to specify, task)

This component acquires an arbitrary-length sequence of letters and digits.

 

 

4. Future study

Items in this section are ones that will be considered for future study. Although not a part of the requirements document per se, they are included here to inform the reader of other topics that have been discussed and consciously postponed for future study.

4.1. Optional initial grammar embedding

Components may optionally allow the initial grammar to be embedded within an extended initial grammar provided by the developer.

An example of this might occur in an implementation of the Date component. In this hypothetical implementation, there is a single initial "date" grammar that will be used to recognize the user's first utterance. This particular implementation is written under the assumption that the user will be asked to give the complete date (as opposed to asking for the day, the month, and the year separately). The developer may wish, then, to change the initial grammar to be something like the following:

I would like to book an appointment for <date>

This future study item is listed as optional because it presumes that the dialog for the component will prompt for and expect all of its data items in the first interaction with the user. Removing the optionality of this requirement would unnecessarily restrict the implementation of the dialog.
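
The embedding described in this section can be sketched with regular expressions standing in for grammars. This is a non-normative illustration only: the simplified date pattern, the `<date>` slot syntax as a substitution marker, and the function name `embed` are assumptions, and real grammar formats are outside the scope of this document.

```python
import re

# Hypothetical, heavily simplified stand-in for the component's initial
# "date" grammar: a month name followed by a day number.
DATE_PATTERN = (
    r"(?P<date>(?:january|february|march|april|may|june|july|august|"
    r"september|october|november|december) \d{1,2})"
)


def embed(template: str, slot: str, pattern: str):
    """Embed a component pattern in a developer-provided carrier phrase."""
    marker = f"<{slot}>"
    if marker not in template:
        raise ValueError(f"template has no {marker} slot")
    prefix, _, suffix = template.partition(marker)
    # The carrier phrase is matched literally; only the slot is a pattern.
    return re.compile(re.escape(prefix) + pattern + re.escape(suffix),
                      re.IGNORECASE)


extended = embed("I would like to book an appointment for <date>",
                 "date", DATE_PATTERN)
```

Matching "I would like to book an appointment for April 26" against `extended` yields "April 26" in the `date` group, which the component could then process as if the user had spoken the date alone.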

4.2. Optional confirmation

Components may optionally perform a final confirmation of any data items acquired. The developer can specify how the confirmation prompt is to be built from the result.

This requirement is listed as optional because it presumes there will be a single confirmation prompt. Making this requirement non-optional would unnecessarily restrict the implementation of the dialog.

4.3. Component -- Give help

This component will prompt for an online help service.

Possible configurable parameters are:

  1. Initial prompt (must specify)
  2. List of help items (must specify)

4.4. Component -- Transfer to agent

This component will pass the phone call to an operator.

Possible configurable parameters are:

  1. Initial prompt (must specify)

4.5. Component -- Physical measurements

This component will acquire a physical measurement (e.g., "five pounds, four ounces" or "three light years").

4.6. Component -- Ordinals

This component will collect an ordinal number.

Note: this may be challenging in Japanese where the ordinal used depends on the object being ordered.

 

 

5. Component tables

For convenience, these tables list all the components in different sorting orders. For explanations of the three dimensions along which components are organized, see Section 3.2.

5.1. Alphabetical

Component | Specification priority | Specification level | Task vs. Template
Address | Nice to | Return only | Task
Address, email | Should | Return only | Task
Alpha string, simple | Should | Configure & Return | Task
Alphanumeric string, non-fixed | Nice to | Return only | Task
Alphanumeric string, sectioned | Should | Return only | Task
Alphanumeric string, simple | Should | Configure & Return | Task
Browsable action list | Should | Return only | Template
Browsable selection list | Should | Return only | Template
Confirmation & correction dialog | Should | Return only | Template
Credit card information | Should | Return only | Task
Currency | Must | Return only | Task
Date, context-compensating | Should | Return only | Task
Date, fully-specified | Must | Configure & Return | Task
Date, partially-specified | Should | Configure & Return | Task
Digit string, sectioned | Should | Return only | Task
Digit string, simple | Must | Configure & Return | Task
Duration | Should | Return only | Task
Error-recovery dialog, simple | Should | Configure & Return | Template
Menu | Must | Return only | Template
Multiple choice selections | Nice to | Return only | Template
Name, spelled | Should | Return only | Task
Name, spoken & spelled | Should | Return only | Task
Natural numbers | Must | Configure & Return | Task
Postal code | Should | Return only | Task
Telephone number | Should | Return only | Task
Time | Must | Return only | Task
Time range | Should | Return only | Task
URL | Should | Return only | Task
Yes/No | Must | Configure & Return | Task

5.2. Priority, then Config/Return vs. Return only, then Task vs. Template

Component | Specification priority | Specification level | Task vs. Template
Yes/No | Must | Configure & Return | Task
Natural numbers | Must | Configure & Return | Task
Simple digit string | Must | Configure & Return | Task
Fully-specified date | Must | Configure & Return | Task
Time | Must | Return only | Task
Currency | Must | Return only | Task
Menu | Must | Return only | Template
Partially-specified date | Should | Configure & Return | Task
Simple alpha string | Should | Configure & Return | Task
Simple alphanumeric string | Should | Configure & Return | Task
Simple error-recovery dialog | Should | Configure & Return | Template
Context-compensating date | Should | Return only | Task
Telephone number | Should | Return only | Task
Sectioned digit string | Should | Return only | Task
Sectioned alphanumeric string | Should | Return only | Task
Postal code | Should | Return only | Task
Spelled name | Should | Return only | Task
Spoken & spelled name | Should | Return only | Task
Credit card information | Should | Return only | Task
Email address | Should | Return only | Task
Time range | Should | Return only | Task
Duration | Should | Return only | Task
URL | Should | Return only | Task
Confirmation & correction dialog | Should | Return only | Template
Browsable selection list | Should | Return only | Template
Browsable action list | Should | Return only | Template
Address | Nice to | Return only | Task
Non-fixed alphanumeric string | Nice to | Return only | Task
Multiple choice selections | Nice to | Return only | Template

Acknowledgements

The editor wishes to thank the members of the reusable dialog components subgroup for their help in preparing this draft:

Michael Brown (Lucent/Bell Labs)
Daniel C. Burnett (Nuance)
Deborah Dahl (Unisys)
Carolina di Cristo (CSELT)
Linda Dorrian (Productivity Works)
Andrew Hunt (SpeechWorks)
Robert Keiller (Canon)
Andreas Kellner (Philips)
David Ladd (Motorola)
Jens Marschner (Philips)
Stephen Potter (Entropic)
Dave Raggett (W3C/HP)
Ramesh Sarukkai (Yahoo Inc.)
Frank Scahill (BT)
Kuansan Wang (Microsoft)