W3C

SpeechObjects Specification V1.0

W3C Note 14 November 2000

This Version:
http://www.w3.org/TR/2000/NOTE-speechobjects-20001114
Latest version:
http://www.w3.org/TR/speechobjects
Editors:
Daniel C. Burnett, Nuance Communications <burnett@nuance.com>

Abstract

This document describes SpeechObjects, a core set of reusable dialog components that are callable through a dialog markup language such as VoiceXML to perform specific dialog tasks, such as getting a date or a credit card number. The major goal of SpeechObjects is to complement the capabilities of the dialog markup language and to leverage best practices and reusable component technology in the development of speech applications.

Status of this document

This document is a submission to the World Wide Web Consortium from Nuance Communications, Inc. (see Submission Request, W3C Staff Comment). For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.

This document is a Note made available by W3C for discussion only. This work does not imply endorsement by, or the consensus of the W3C membership, nor that W3C has, is, or will be allocating any resources to the issues addressed by the Note. This document is a work in progress and may be updated, replaced, or rendered obsolete by other documents at any time.

A list of current W3C technical documents can be found at the Technical Reports page.


Table of Contents


  1. Introduction
  2. Background
    1. Architectural model
    2. Goals and principles of design
    3. Implementation platform and requirements
      1. Invocation process
      2. Runtime environment requirements
      3. SpeechChannel
  3. Concepts
    1. Universal commands
    2. Error handling and recovery
    3. Playables
    4. Grammars
    5. Results
    6. Redo Objects
    7. Identifiable interface
    8. Inherited parameters and/or return values
  4. SpeechObject specifications
  5. Appendix



1. Introduction

SpeechObjects are reusable software components that encapsulate discrete pieces of conversational dialog. SpeechObjects are based on an open architecture that can be deployed on any of the major server and IVR (interactive voice response) platforms. This paper describes a specification based on Nuance's Java implementation of SpeechObjects.

Simply stated, a SpeechObject is a reusable software component that implements a dialog flow and is packaged with the audio prompts and recognition grammars that support that dialog. A Java call to a SpeechObject can be as simple as:

// Initialize the SpeechObject
SODate date = new SODate();
// Invoke the SpeechObject (sc, dc, and cs are the SpeechChannel,
// DialogContext, and CallState supplied by the runtime environment)
SODate.Result result = date.invoke(sc, dc, cs);
// Look at the results
int month = result.getMonth();
int day = result.getDayOfMonth();
int year = result.getYear();

In this document we will present both the configuration parameters (JavaBean properties) and the return values for each of the SpeechObjects.

 

2. Background

2.1 Architectural model

The Java SpeechObjects architecture was designed to be portable and extensible, as well as easy to use. To this end SpeechObjects are all based on a primary interface, SpeechObject. This simple interface defines:

From the SpeechObject interface and a set of supporting interfaces, SpeechObject developers can build objects of any complexity that can be run with a single method call. The invoke method for any given SpeechObject executes the entire dialog for that SpeechObject. A simple invoke method might just play a standard prompt, wait for speech, and return the results after recognition completes. A more complicated invoke method could include multiple dialog states, smart prompts, intelligent error handling for both user and system errors, context-sensitive help, and any other features built in by the SpeechObject developer.

Calling a SpeechObject from your application, however, doesn't require you to know anything about how the invoke method is implemented. You only need to provide the correct arguments and know what information you want to extract from the results.

The SpeechChannel is the object that provides recognition functionality to a SpeechObject. When an application is launched, the environment allocates a SpeechChannel for each supported port. This SpeechChannel is passed to the application for each incoming call and persists until the application terminates. The SpeechObjects that make up the application use the SpeechChannel to interact with the caller: requesting recognition services, playing prompts, setting configuration parameters, and so on. Telephony control is an optional component of the SpeechChannel.

2.2 Goals and principles of design

In short, SpeechObjects

2.3 Implementation platform and requirements

This section briefly describes the process of invoking a SpeechObject as a motivation for the runtime requirements and then presents the SpeechChannel.

2.3.1 Invocation process

To use a SpeechObject from your application, you simply instantiate it and call its invoke method.

The invoke method executes the dialog defined by the SpeechObject, and returns an instance of the Result class used by that SpeechObject. This result provides your application with the data that was accumulated during the dialog.

The invoke method takes several arguments:

2.3.2 Runtime environment requirements

In order for the invocation described above to work, a platform must implement a SpeechChannel and provide a launcher that creates the DialogContext and CallState and invokes the object.
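The division of labor between launcher and SpeechObject can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nuance API: the empty SpeechChannel, DialogContext, and CallState types, the Map-based result, and the Launcher and FixedAnswerObject classes are hypothetical stand-ins; only the invoke method and its three kinds of argument come from the text.

```java
import java.util.HashMap;
import java.util.Map;

class InvocationSketch {
    // Hypothetical stand-ins for the platform-provided types named in the text.
    interface SpeechChannel {}
    interface DialogContext {}
    interface CallState {}

    // Assumed shape of the primary interface: one call runs the whole dialog.
    interface SpeechObject {
        Map<String, Object> invoke(SpeechChannel sc, DialogContext dc, CallState cs);
    }

    // A toy object that returns a fixed answer instead of recognizing speech.
    static class FixedAnswerObject implements SpeechObject {
        private final String answer;

        FixedAnswerObject(String answer) {
            this.answer = answer;
        }

        @Override
        public Map<String, Object> invoke(SpeechChannel sc, DialogContext dc, CallState cs) {
            Map<String, Object> result = new HashMap<>();
            result.put("answer", answer);
            return result;
        }
    }

    // The launcher holds the channel, creates per-call state, and invokes the object.
    static class Launcher {
        private final SpeechChannel channel = new SpeechChannel() {};

        Map<String, Object> handleCall(SpeechObject so) {
            DialogContext dc = new DialogContext() {};
            CallState cs = new CallState() {};
            return so.invoke(channel, dc, cs);
        }
    }
}
```

In this sketch, new Launcher().handleCall(new FixedAnswerObject("yes")) returns a result whose "answer" entry is "yes"; the application never touches the channel or the per-call state directly.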

2.3.3 SpeechChannel

The SpeechChannel is an integral part of a SpeechObjects application. Application developers use SpeechObjects to implement the dialog flow, and SpeechObject developers use SpeechChannel methods to implement the recognition functionality of the dialog.

This section describes the abstract SpeechChannel architecture in more detail.

SpeechChannel interfaces

Functionality provided by the SpeechChannel is actually separated into five interfaces: the main speech channel interface that provides recognition functions, and four separate interfaces that define the functionality for:

The SpeechChannel is the primary object and provides access to the corresponding implementation of the other interfaces. SpeechObjects work with the single SpeechChannel object passed to them and can access the other interfaces when needed:

[Figure: SpeechChannel]

These interfaces can be implemented in the same class, or in separate classes, as appropriate for the platform. In either case, the SpeechChannel interface defines methods that return each of the other interfaces. For example, if a SpeechObject wanted to access dynamic grammar functionality, it would call the SpeechChannel getDynamicGrammarControl method and use the returned object to make dynamic grammar requests.
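The accessor pattern described above might look like the following sketch. Only the method name getDynamicGrammarControl comes from the text; the DynamicGrammarControl interface name, its addPhrase method, and the SingleClassChannel class are assumptions made for illustration. The sketch shows the single-class option, in which the accessor simply returns the channel itself.

```java
import java.util.ArrayList;
import java.util.List;

class ChannelAccessSketch {
    // Hypothetical secondary interface; its single method is invented for this sketch.
    interface DynamicGrammarControl {
        void addPhrase(String grammarName, String phrase);
    }

    // Only the accessor name getDynamicGrammarControl comes from the text.
    interface SpeechChannel {
        DynamicGrammarControl getDynamicGrammarControl();
    }

    // One class implementing both interfaces, so the accessor returns the object itself.
    static class SingleClassChannel implements SpeechChannel, DynamicGrammarControl {
        final List<String> requests = new ArrayList<>();

        @Override
        public DynamicGrammarControl getDynamicGrammarControl() {
            return this;
        }

        @Override
        public void addPhrase(String grammarName, String phrase) {
            requests.add(grammarName + ": " + phrase);
        }
    }

    // A SpeechObject sees only the SpeechChannel and asks it for what it needs.
    static void addEmployee(SpeechChannel sc, String name) {
        sc.getDynamicGrammarControl().addPhrase("employees", name);
    }
}
```

Because callers go through the accessor, a platform could later move the grammar functionality into a separate class without changing any SpeechObject code.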

 

3. Concepts

This section briefly describes some concepts underlying SpeechObjects that may provide a context for understanding the specific parameters and return values described in Section 4.

3.1 Universal commands

A universal command is an utterance that the speaker can say at any point during any dialog. The framework includes a grammar allowing recognition of a small set of universal commands, and provides default handling when these utterances are recognized. The universal commands currently defined by the SpeechObjects framework are:

Through method calls the application can substitute its own handling for any of the supported universals, including disabling them.

As mentioned earlier, these universal commands are based on standards proposed by the Telephone Speech Standards Committee (TSSC).

3.2 Error handling and recovery

All SpeechObjects provide default handling, including prompts and logic (and grammar adjustments, if necessary), for all of the common recognizer error conditions: rejection, no speech timeout, too much speech, spoke too early, recognizer too slow, and unexpected key.

The default error handlers for each of these error types play an error prompt and then attempt to reexecute the dialog. The error prompt is generated by combining an application-wide error prompt that is specific to the type of error with a generic prompt provided by the current SpeechObject. For example, if a no-speech timeout occurs while a Yes/No SpeechObject dialog is executing, the framework concatenates the application-wide error prompt "I'm sorry, I didn't hear you" and the Object error prompt "Please say 'yes' or 'no'."

The default error handling mechanism continually reexecutes the dialog until a valid response is generated or the error threshold for the object is reached. When the threshold is reached, an exception is thrown and you can implement whatever error handling behavior you prefer, such as transferring the caller to a live agent. You can also override the default error handlers for any of the defined error types through class method calls.

All prompts can be overridden through configuration parameters.
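The default recovery loop described in this section can be approximated as follows. This is a sketch under stated assumptions: recognition is simulated by an iterator in which null stands for any recognizer error, the exception type name is hypothetical, and the threshold is assumed to be reached when the error count equals the configured maximum.

```java
import java.util.Iterator;
import java.util.List;

class ErrorRecoverySketch {
    // Hypothetical exception type signalling that the error threshold was reached.
    static class ErrorThresholdException extends RuntimeException {
        ErrorThresholdException(String message) {
            super(message);
        }
    }

    // Each element of 'attempts' is one recognition attempt; null simulates an error
    // such as a no-speech timeout. Prompts that would be played are appended to 'played'.
    static String runDialog(Iterator<String> attempts, int maxErrorCount,
                            String appErrorPrompt, String objectErrorPrompt,
                            List<String> played) {
        int errors = 0;
        while (true) {
            String heard = attempts.hasNext() ? attempts.next() : null;
            if (heard != null) {
                return heard; // valid response: the dialog is done
            }
            errors++;
            if (errors >= maxErrorCount) {
                throw new ErrorThresholdException("gave up after " + errors + " errors");
            }
            // Application-wide prompt for the error type, then the object's own prompt.
            played.add(appErrorPrompt + " " + objectErrorPrompt);
        }
    }
}
```

An application would catch the exception and apply its own policy, for example transferring the caller to a live agent.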

3.3 Playables

All prompts used by SpeechObjects are encapsulated using classes that implement the core interface Playable, which defines the protocol for objects that can be played over an audio channel. The framework defines a set of classes that implement Playable that provide additional prompt behavior, including:

3.4 Grammars

SpeechObjects are designed to let you define grammars in a variety of ways, based on the requirements for each dialog and the need to customize the grammar at runtime. Most SpeechObjects use the Nuance dynamic grammar mechanism, meaning that the grammar for a given SpeechObject is compiled and loaded onto the recognition server when the SpeechObject is constructed. This allows SpeechObjects to be reused much more easily, as you don't have to compile the grammar for each SpeechObject into a recognition package before using it.

The SpeechObjects framework grammar classes let you build your grammars in a variety of ways:

From a text file
    This format allows grammars to easily be updated as the application is developed and tuned. You don't need to recompile your application when you change the grammar, as the grammar is dynamically generated from the file at runtime.
From a grammar you define programmatically
    This is harder to tune but is useful when a grammar's contents are determined by criteria only available at runtime. The Alpha Digit String SpeechObject, for example, generates its grammar at runtime based on how the object is configured, including the number of letters/digits in the string and where pauses or delimiter phrases such as "dash" might occur.

You can also initialize a grammar from a file and subsequently update it programmatically.

The framework also allows compound grammars. Compound grammars let you define a single grammar object composed of multiple grammars to be used in parallel. For example, in a corporate dialing application you might use a compound grammar containing a set of employee names and a set of employee extensions, to allow the speaker to dial either by name or by number. The framework uses compound grammars to combine each SpeechObject's grammar with the grammar defining the set of universal commands.
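To make the idea of grammars "used in parallel" concrete, here is a small sketch in which a grammar is reduced to a set of accepted phrases. The Grammar, PhraseGrammar, and CompoundGrammar types are hypothetical simplifications, not the framework's grammar classes; a real recognition grammar is far richer than a phrase set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CompoundGrammarSketch {
    interface Grammar {
        boolean accepts(String utterance);
    }

    // A grammar that accepts a fixed set of phrases.
    static class PhraseGrammar implements Grammar {
        private final Set<String> phrases;

        PhraseGrammar(String... phrases) {
            this.phrases = new HashSet<>(Arrays.asList(phrases));
        }

        @Override
        public boolean accepts(String utterance) {
            return phrases.contains(utterance);
        }
    }

    // A compound grammar accepts whatever any of its member grammars accepts.
    static class CompoundGrammar implements Grammar {
        private final Grammar[] members;

        CompoundGrammar(Grammar... members) {
            this.members = members;
        }

        @Override
        public boolean accepts(String utterance) {
            for (Grammar g : members) {
                if (g.accepts(utterance)) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

In the corporate dialing example, a CompoundGrammar built from a names grammar and an extensions grammar accepts an utterance from either set.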

3.5 Results

The Result class is a subclass of a utility class KVSet, which defines an object used to encapsulate a set of key/value pairs. This structure is analogous to natural language slots and the values they are filled in with during recognition. Because the value stored in a KVSet can be any type of object, SpeechObjects have the flexibility to populate Result objects with any set of values that are appropriate. For example:

The value at any given key can also be another KVSet, providing the ability to nest result structures if appropriate.

Each Result class defined by the SpeechObjects includes convenience methods allowing easy access to the specific information it encapsulates.

Result subclasses also have another characteristic, which is that they can be played over the current audio output device. They implement the Playable interface, which allows objects to be appended to the prompt queue and then played by a SpeechChannel or other object that supports audio playback.

This lets you easily play the recognized information, for example, for confirmation dialogs or during testing.
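The relationship between KVSet, Result, and Playable can be sketched roughly as below. The class and interface names KVSet and Playable come from the text, but every method shown here (put, get, render), the key names, and the DateResult class are assumptions chosen for illustration; playback is reduced to producing a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResultSketch {
    // Simplified stand-in: "playing" is reduced to producing the text to be spoken.
    interface Playable {
        String render();
    }

    // A set of key/value pairs whose values may be any object, including another KVSet.
    static class KVSet {
        private final Map<String, Object> slots = new LinkedHashMap<>();

        KVSet put(String key, Object value) {
            slots.put(key, value);
            return this;
        }

        Object get(String key) {
            return slots.get(key);
        }
    }

    // A result type with convenience accessors that can also be played back.
    static class DateResult extends KVSet implements Playable {
        int getMonth() {
            return (Integer) get("month");
        }

        int getDayOfMonth() {
            return (Integer) get("day");
        }

        @Override
        public String render() {
            return getMonth() + "/" + getDayOfMonth();
        }
    }
}
```

A DateResult can itself be stored as the value of a key in another KVSet, which is how nested result structures would arise in this model.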

3.6 Redo Objects

Many SpeechObjects are used in conjunction with ConfirmAndCorrect, which confirms all of the information obtained by those SpeechObjects and, upon a negative confirmation, identifies which information needs to be corrected (e.g., "Would you like to change the date, the time, or the telephone number?"). The SpeechObject or SpeechObjects corresponding to the piece ("the date") or pieces ("the date and the time") of information that need correction are then re-invoked (e.g., the Date object is re-invoked), prompting the user for the information again.

To promote better dialog, rather than simply re-invoking the same SpeechObject again during this error-correcting phase, a SpeechObject may offer a "RedoObject" which should be used to re-obtain the desired information. This RedoObject may simply ask for the information in a different manner, by changing prompts as appropriate ("Please say the date again."). Alternatively, the "RedoObject" may actually employ a different dialog strategy, perhaps breaking up the task into a set of smaller tasks in order to facilitate recognition of complex items. RedoObjects typically share the same SOKey ('object instance name') as their original SpeechObject in order to share n-best information from the original SpeechObject. SpeechObjects that do not employ a RedoObject may return "null" to indicate that the same instance should be used during this error-correction phase of a confirmation dialog.
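The selection rule for the correction phase can be sketched as follows. The property names RedoObject and SOKey come from this document; the DialogObject class and the objectForCorrection helper are hypothetical and stand in for whatever the confirmation dialog actually does internally.

```java
class RedoObjectSketch {
    // Minimal stand-in for a SpeechObject with the RedoObject and SOKey properties.
    static class DialogObject {
        private final String soKey;
        private final String prompt;
        private DialogObject redoObject; // null means "reuse this instance"

        DialogObject(String soKey, String prompt) {
            this.soKey = soKey;
            this.prompt = prompt;
        }

        String getSOKey() {
            return soKey;
        }

        String getPrompt() {
            return prompt;
        }

        DialogObject getRedoObject() {
            return redoObject;
        }

        void setRedoObject(DialogObject redoObject) {
            this.redoObject = redoObject;
        }
    }

    // Pick the object to invoke when the caller rejects the original result.
    static DialogObject objectForCorrection(DialogObject original) {
        DialogObject redo = original.getRedoObject();
        return redo != null ? redo : original;
    }
}
```

Giving the redo object the same SOKey as the original, as the text recommends, is what would let the two share n-best information.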

3.7 Identifiable interface

Many of the SpeechObjects implement the Identifiable interface, which enables them to be used in the Confirm and Correct SpeechObject. The Identification phase of the Confirm and Correct process makes use of

Here's a sample of how the prompt might be used by the system (with the prompt highlighted):

Which would you like to change - the departure city or the arrival city?

The corresponding grammar expression for a derived ArrivalCity SpeechObject might then accept phrases like "the arrival city", "the destination", or "destination city".

3.8 Inherited parameters and/or return values

Many SpeechObjects inherit parameters, return values, and behavior from other SpeechObjects. These relationships are helpful in understanding what parameters might possibly be common (in syntax and behavior) across a large number of Objects. A simplified inheritance diagram for all of the SpeechObjects in this document is shown below.

[Figure: SpeechObject inheritance diagram]

 

4. SpeechObject specifications

Although SpeechObjects as implemented in Java have method calls for setting and getting various values, the specification below is restricted to listing only JavaBean properties of the SpeechObjects, i.e. properties for which there are both "get" and "set" methods. While this restriction limits configuration to discrete parameters which may be changed but not added to (1), it also results in a cleaner interface for the users of the Objects - these properties may be edited in a GUI, set and retrieved in a scripting environment, etc.

Parameter type and return type descriptions can be found in the appendix.

Parameters and return values common to all SpeechObjects

Configuration parameters:

| Parameter | Type | Description |
|---|---|---|
| RedoObject | SpeechObject | New object to call in case the caller negatively confirms the result from the original object in a confirmation scenario |
| SOKey | String | Name for this instance's family (i.e., the object itself plus any redo objects for this object) |

Return values:

| Return value | Type | Description |
|---|---|---|
| getNextResult | Result | Next Result in n-best list, or null if no more |
| requiredAdditionalInteraction | boolean | Boolean indicating whether or not additional interaction between the SO and the caller was required in obtaining this result. Typically, this means that the SO has already done any needed disambiguation |
| isAutoConfirmed | boolean | Boolean indicating whether or not this Result has already been confirmed |

 

Dialog SpeechObject

Description:

This SpeechObject does not implement a specific dialog -- it simply provides the framework for a dialog. The default behavior is:

Configuration parameters:

| Parameter | Type | Description |
|---|---|---|
| Filter | ResultFilter | Used to examine n-best SpeechObject.Results and filter out invalid results |
| Grammar | Grammar | The grammar used for recognition |
| HelpPrompt | Playable | This prompt is played if the user requests help |
| InitialPrompt | Playable | Unless an error occurs or the user requests help, this is the prompt that is played before recognition |
| MaxErrorCount | int | The maximum number of errors (rejections, timeouts, or unexpected DTMF keypresses) permitted before the SpeechObject gives up |
| MaxHelpCount | int | The maximum number of help requests permitted before the SpeechObject gives up |
| NoResultFoundPrompt | Playable | This prompt is played after a valid recognition but when none of the candidates in the n-best list are successfully processed into a SpeechObject.Result (e.g. if the entries fail to pass this Object's Filter) |
| NoSpeechTimeoutPrompt | Playable | This prompt is played when a recognition error code of "no speech timeout" is returned by the recognizer |
| RecognitionErrorPrompt | Playable | This prompt is played by default when a recognition error occurs unless a more specific error prompt is defined |
| RecognizerTooSlowTimeoutPrompt | Playable | This prompt is played when a recognition error code of "recognizer too slow timeout" is returned by the recognizer |
| RejectedPrompt | Playable | This prompt is played when a recognition error code of "rejected" is returned by the recognizer |
| ReturnAllPossibleResults | boolean | If true, this Object returns an entire n-best list of SpeechObject.Results. Otherwise, it will return only the first valid result it interprets and processes |
| SpeechTooEarlyPrompt | Playable | This prompt is played when a recognition error code of "speech too early" is returned by the recognizer |
| TooMuchSpeechTimeoutPrompt | Playable | This prompt is played when a recognition error code of "too much speech timeout" is returned by the recognizer |
| UnexpectedKeyPrompt | Playable | This prompt is played when a recognition error code of "unexpected_key" is returned by the recognizer |

 

Return results:

| Return value | Type | Description |
|---|---|---|
| toString | String | A String representation of this Object's recognized result |

 

 

Yes/No

Description:

This SpeechObject expects an answer to a yes-or-no question.

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| StrictGrammar | boolean | If true, loads and uses a limited (strict) grammar to maximize performance |

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| YesNo | String | String indicating yes or no |
| saidYes | boolean | True if the user said yes, false otherwise |
| saidNo | boolean | True if the user said no, false otherwise |

 

Quantity

Description:

This SpeechObject recognizes quantities of items. By default, this SpeechObject recognizes 1-4 digit (0-9,999) quantities and has an absolute range of 1-8 digits (0-99,999,999). A developer can (and should) configure this SpeechObject to recognize quantities only within a certain range by setting the MinDigits and MaxDigits properties, as appropriate for a specific domain and application. The Quantity SpeechObject does not itself perform any confirmation or validity checking. Instead, the range of numbers that the speaker is allowed to say is restricted by limiting the grammar used for recognition to that range. If the speaker says a number that is outside the current range, the utterance is rejected by the recognizer.
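The mapping from digit counts to value ranges can be worked through in a few lines. This sketch assumes that a one-digit minimum admits 0 (consistent with the default range of 0-9,999) and that a larger minimum excludes values with leading zeros (so 2 digits starts at 10); the class and method names are hypothetical.

```java
class QuantityRangeSketch {
    static final int ABSOLUTE_MAX_DIGITS = 8; // absolute range stated in the text

    private static void checkDigits(int digits) {
        if (digits < 1 || digits > ABSOLUTE_MAX_DIGITS) {
            throw new IllegalArgumentException("digits must be between 1 and 8");
        }
    }

    private static int pow10(int exponent) {
        int value = 1;
        for (int i = 0; i < exponent; i++) {
            value *= 10;
        }
        return value;
    }

    // Smallest quantity with at least minDigits digits (0 when minDigits is 1).
    static int smallestValue(int minDigits) {
        checkDigits(minDigits);
        return minDigits == 1 ? 0 : pow10(minDigits - 1);
    }

    // Largest quantity with at most maxDigits digits.
    static int largestValue(int maxDigits) {
        checkDigits(maxDigits);
        return pow10(maxDigits) - 1;
    }

    // True if the quantity would fall inside a grammar limited to this digit range.
    static boolean inGrammarRange(int quantity, int minDigits, int maxDigits) {
        return quantity >= smallestValue(minDigits) && quantity <= largestValue(maxDigits);
    }
}
```

With MinDigits set to 2 and MaxDigits set to 4, the sketch yields a range of 10 through 9999, so a spoken "seven" would fall outside the grammar.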

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| MaxDigits | int | Maximum allowed number of digits for the quantity that will be recognized (e.g. 4 => '9999') |
| MinDigits | int | Minimum allowed number of digits for the quantity that will be recognized (e.g. 2 => '10') |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| Quantity | int | The quantity recognized |

 

Simple Digit String

Description:

This SpeechObject can be configured to recognize a string of digits of a fixed length. When NumberDigits is set, the SpeechObject automatically creates a grammar for recognizing that number of digits (without natural numbers). The Simple Digit String SpeechObject does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid number string for the controlling application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.
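Filtering an n-best list against an application rule might look like the sketch below. It does not use the framework's ResultFilter type, whose methods are not described in this document; instead the application constraint is modeled as a plain java.util.function.Predicate, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DigitFilterSketch {
    // Keep hypotheses that have exactly numberDigits digits and satisfy the application rule.
    static List<String> allValid(List<String> nBest, int numberDigits, Predicate<String> appRule) {
        String pattern = "\\d{" + numberDigits + "}";
        return nBest.stream()
                .filter(h -> h.matches(pattern))
                .filter(appRule)
                .collect(Collectors.toList());
    }

    // First valid hypothesis in n-best order, if any.
    static Optional<String> firstValid(List<String> nBest, int numberDigits, Predicate<String> appRule) {
        return allValid(nBest, numberDigits, appRule).stream().findFirst();
    }
}
```

The two methods loosely mirror the two settings of ReturnAllPossibleResults: the whole filtered list, or only the first valid hypothesis. An empty result corresponds to the case in which the NoResultFoundPrompt would be played.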

 

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| NumberDigits | int | Number of digits to be recognized |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| DigitString | String | The recognized digit string |

 

Date

Description:

This SpeechObject prompts for and interprets a date. The date may be specified in one of many formats, including a day of the week, a day of the month, a relative date (today, tomorrow, yesterday, next Tuesday), and so forth. More complex expressions that specify the date in multiple ways are allowed (tomorrow, December 12th); the consistency of such dates is checked, and if the date is inconsistent, the user is reprompted for the date with an appropriate error message. Invalid dates, such as April 31, are similarly disallowed and cause reprompting for a date.

The SpeechObject makes an effort to interpret the date intelligently:

If the day of week is given, such as Thursday, the SpeechObject interprets the date as if it were the upcoming Thursday. For example, if today is Monday, February 15, 1999, and the caller said "Thursday", the SpeechObject would interpret this as Thursday, February 18, 1999.

If the day of month is given, such as the 27th, and this day is later than the current day (for example, February 15), this SpeechObject assumes the date is in the same month as the current date. For example, the 27th would be interpreted as Saturday, February 27, 1999.

If the day of month is given, such as the 5th, and this day is a number less than the current date (for example, February 15), the SpeechObject assumes the day is for the next month. In this example, the 5th would be interpreted as Friday, March 5, 1999. If the next month is January, then the SpeechObject assumes the date is in the following year as well.

If the month is before the current month, the Date SpeechObject assumes the caller intends this date in the following year. For example, if the caller said January 3, this would be interpreted as January 3, 2000. If the caller says "today", the SpeechObject determines the current date unless specified by the developer.

When the caller says only a month, the SpeechObject follows up by prompting the caller to specify the day of the month. This is implemented by invoking a default DayOfMonth SpeechObject (see the DayOfMonthSO parameter), which may be overridden.
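The defaulting rules above can be reproduced with java.time as a rough sketch. Two points are assumptions because the text does not settle them: a day of week equal to today's is taken to mean the same day next week, and a day of month equal to today's is taken to mean today. The class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

class DateDefaultsSketch {
    // "Thursday" -> the upcoming Thursday (strictly after today; an assumption).
    static LocalDate fromDayOfWeek(LocalDate today, DayOfWeek dayOfWeek) {
        return today.with(TemporalAdjusters.next(dayOfWeek));
    }

    // "the 27th" -> this month if not yet past, otherwise next month.
    // Rolling from December into January also advances the year.
    // Throws java.time.DateTimeException if the target month has no such day.
    static LocalDate fromDayOfMonth(LocalDate today, int day) {
        LocalDate month = day >= today.getDayOfMonth() ? today : today.plusMonths(1);
        return month.withDayOfMonth(day);
    }

    // "January 3" -> next year if the month is before the current month.
    static LocalDate fromMonthAndDay(LocalDate today, int month, int day) {
        int year = month < today.getMonthValue() ? today.getYear() + 1 : today.getYear();
        return LocalDate.of(year, month, day);
    }
}
```

With today set to Monday, February 15, 1999, the sketch resolves "Thursday" to February 18, 1999, "the 27th" to February 27, 1999, "the 5th" to March 5, 1999, and "January 3" to January 3, 2000, matching the examples in the text.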

The SpeechObject performs the following validation checking of the recognized date:

When inconsistent information is provided by the caller, such as a conflicting day of month and day of week (for example, Tuesday, February 15, 1999), the Date SpeechObject plays a prompt that identifies the correct information (February 15th is a Monday) and then reprompts the caller.

Likewise, if a modifier such as "today" is inconsistent with the day of month, the SpeechObject plays a prompt specifying what today's date is and reprompts the caller.

Invalid date handling:

When the caller responds with an invalid date such as "April 31", the SpeechObject plays a prompt that explains why this date is invalid ("... there are only 30 days in April") and then reprompts the caller.
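Both checks in this subsection, day-of-week consistency and impossible dates, can be sketched with java.time. The class name, the method signature, and the exact wording of the returned messages are assumptions; in the framework the corresponding text would come from the configured prompts.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.Year;
import java.time.format.TextStyle;
import java.util.Locale;

class DateValidationSketch {
    // Returns null when the date is acceptable, otherwise the text of a corrective prompt.
    // 'stated' is the day of week the caller said, or null if none was given.
    static String check(int year, int month, int day, DayOfWeek stated) {
        Month m = Month.of(month);
        String monthName = m.getDisplayName(TextStyle.FULL, Locale.ENGLISH);
        int daysInMonth = m.length(Year.isLeap(year));
        if (day < 1 || day > daysInMonth) {
            return "There are only " + daysInMonth + " days in " + monthName + ".";
        }
        DayOfWeek actual = LocalDate.of(year, month, day).getDayOfWeek();
        if (stated != null && stated != actual) {
            String dayName = actual.getDisplayName(TextStyle.FULL, Locale.ENGLISH);
            return monthName + " " + day + " is a " + dayName + ".";
        }
        return null;
    }
}
```

For "Tuesday, February 15, 1999" the sketch produces "February 15 is a Monday.", and for April 31 it produces "There are only 30 days in April."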

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| DateTooEarlyPrompt | Playable | Prompt played if the stated date is before the lower DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 1985' but that day is too far in the past" |
| DateTooLatePrompt | Playable | Prompt played if the stated date is after the upper DateLimit, e.g. "I'm sorry, I thought you said 'February 12th, 2085' but that day is too far in the future" |
| DayOfMonthSO | SODayOfMonth | SODayOfMonth instance used to obtain a day of month when just the month or just the month and year are specified |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| InconsistentDayOfWeekPrompt | Playable | Prompt played if the stated date includes a day of week that doesn't match, e.g. "I'm sorry, I thought you said 'Tuesday, December 10th', but December 10th is a Friday." |
| InconsistentModifiedDayOfWeekPrompt | Playable | Prompt played if the stated date includes a modified day of week that doesn't match, e.g. "I'm sorry, I thought you said 'next Tuesday, December 10th', but next Tuesday is December 14th." |
| InconsistentNamedTodayEtcPrompt | Playable | Prompt played if the stated date includes a today expression as well as a day of week that actually refers to another date, e.g. "I'm sorry, I thought you said 'today, Tuesday, December 10th', but today is December 14th." |
| InconsistentTodayEtcPrompt | Playable | Prompt played if the stated date includes a today expression that refers to another date, e.g. "I'm sorry, I thought you said 'today, December tenth', but today is December fourteenth." |
| InvalidDatePrompt | Playable | Prompt played if the stated date is invalid (the day of month exceeds the number of days in the month) |
| LowerDate | java.util.Calendar or int or SODate.DateLimit | Earliest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |
| UpperDate | java.util.Calendar or int or SODate.DateLimit | Latest permissible date, represented by a Calendar object, an offset in days, or a DateLimit object |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| Calendar | java.util.Calendar | The date as a Calendar object |
| DayOfMonth | int | Day of month specified by the caller |
| DayOfWeek | int | Day of week specified by the caller |
| Month | int | Month represented as an integer between 1 and 12 |
| Year | int | Year represented as a four-digit integer |

 

Time

Description:

This SpeechObject defines a generic dialog for getting a time expression from the speaker.

The Time SpeechObject is generic and may be specialized (through modification of parameters, prompts, and/or grammars) for use in a range of applications, for example, flight information and reservation systems, personal agenda management, or package delivery/pickup scheduling.

In response to a prompt requesting the time, the caller speaks the time in a natural way, using expressions such as "in the morning" or "at night" as well as "am" or "pm". The Time SpeechObject recognizes a clock time, for example, "three forty-five am". If the time is ambiguous (am/pm not specified), the SpeechObject conducts any additional dialog with the caller needed to ensure that an unambiguous time is obtained. This dialog is implemented by invoking an instance of the DisambiguateTime SpeechObject.
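The ambiguity being resolved here can be illustrated with a small sketch. The hour windows assigned to "morning", "afternoon", "evening", and "night" are assumptions chosen for the example, as are the class and method names; the document does not define where these boundaries fall.

```java
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class TimeDisambiguationSketch {
    // The two readings of a 12-hour clock time spoken without am/pm.
    static List<LocalTime> candidates(int hour12, int minute) {
        if (hour12 < 1 || hour12 > 12) {
            throw new IllegalArgumentException("hour must be between 1 and 12");
        }
        int h = hour12 % 12; // 12 o'clock maps to hour 0 (am) or hour 12 (pm)
        return Arrays.asList(LocalTime.of(h, minute), LocalTime.of(h + 12, minute));
    }

    // Whether a 24-hour time is compatible with a spoken modifier (assumed windows).
    static boolean matches(LocalTime time, String modifier) {
        int h = time.getHour();
        switch (modifier) {
            case "am":
            case "morning":
                return h < 12;
            case "pm":
                return h >= 12;
            case "afternoon":
                return h >= 12 && h < 18;
            case "evening":
            case "night":
                return h >= 18;
            default:
                throw new IllegalArgumentException("unknown modifier: " + modifier);
        }
    }

    // Empty when the modifier fits neither reading, i.e. the inconsistent case.
    static Optional<LocalTime> resolve(int hour12, int minute, String modifier) {
        return candidates(hour12, minute).stream()
                .filter(t -> matches(t, modifier))
                .findFirst();
    }
}
```

Under these assumed windows, "11 o'clock" followed by "in the evening" resolves to 23:00, while "in the afternoon" fits neither reading, which is the situation the InconsistentTimePrompt is meant to handle.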

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| DisambigObject | SODisambiguateTime | Sets object that disambiguates ambiguous times, e.g. '10' => 10 am or 10 pm |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| InconsistentTimePrompt | Playable | Prompt played when the user response in the disambiguation dialog is inconsistent with the original time they said. For example, the user is asked to disambiguate if 11 o'clock is in the morning or evening, and replies "in the afternoon". |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| AM_PM | String | Returns whether the time said by the caller was AM or PM |
| Calendar | java.util.Calendar | Returns the time in a Calendar representation |
| ClockTime | int | A numerical representation of the time |
| ClockTimePlayable | Playable | The time as a Playable in the standard format (with trailing "am" or "pm") |
| Hours | int | The hour portion of the time said by the caller |
| Minutes | int | The minutes portion of the time said by the caller |
| SmartTimePlayable | Playable | A time Playable in the "intelligent" (colloquial) format (for example, "7 in the evening", "5 in the morning", "noon") |
| UserStatedModifier | String | Any user-stated modifier that disambiguated the time |

 

Menu

Description:

The Menu SpeechObject does not itself define a default dialog. The dialog is generated dynamically based on the number of items defined. The dialog presents the list of menu items and allows the caller to choose one of them. It enables the developer to dynamically build menus from pairs of grammars and prompt atoms, and in addition it permits the developer to associate a listener with any of the items so that the listener's action is performed in response to selecting the item.

The menu may be defined dynamically by calling a method that adds menu items sometime before invocation. Each menu item is defined in terms of:

 

Note that at this time the menu items cannot be set merely by setting JavaBean properties.
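Building a menu by method calls rather than by JavaBean properties might look like the following sketch. The addItem signature, the ItemListener interface, and the way the item list prompt is assembled from prompt atoms are all assumptions for illustration; grammars are reduced to sets of accepted phrases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MenuSketch {
    // Hypothetical listener invoked when its item is selected.
    interface ItemListener {
        void itemSelected(String itemName);
    }

    static class Menu {
        private final Map<String, String> itemByPhrase = new LinkedHashMap<>();
        private final Map<String, ItemListener> listeners = new LinkedHashMap<>();
        private final List<String> promptAtoms = new ArrayList<>();

        // Each item pairs a (simplified) grammar with a prompt atom and an optional listener.
        void addItem(String name, String promptAtom, ItemListener listener, String... phrases) {
            for (String phrase : phrases) {
                itemByPhrase.put(phrase, name);
            }
            if (listener != null) {
                listeners.put(name, listener);
            }
            promptAtoms.add(promptAtom);
        }

        // The list of items, generated from the prompt atoms in the order they were added.
        String itemListPrompt() {
            return String.join(", ", promptAtoms);
        }

        // Returns the selected item name, or null if the utterance matches no item.
        String select(String utterance) {
            String name = itemByPhrase.get(utterance);
            if (name != null && listeners.containsKey(name)) {
                listeners.get(name).itemSelected(name);
            }
            return name;
        }
    }
}
```

The sketch shows why the dialog is described as generated dynamically: both the listing prompt and the accepted phrases follow from whatever items were added before invocation.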

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| ErrorPromptPostfix | Playable | Audio to use as the postfix of the error prompt (if the prompt is being auto-generated) |
| ErrorPromptPrefix | Playable | Audio to use as the prefix of the error prompt (if the prompt is being auto-generated) |
| HelpPromptPostfix | Playable | Audio to use as the postfix of the help prompt (if the prompt is being auto-generated) |
| HelpPromptPrefix | Playable | Audio to use as the prefix of the help prompt (if the prompt is being auto-generated) |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| InitialPromptPostfix | Playable | Audio to use as the postfix of the initial prompt (if the prompt is being auto-generated) |
| InitialPromptPrefix | Playable | Audio to use as the prefix of the initial prompt (if the prompt is being auto-generated) |
| ItemListPrompt | Playable | Prompt that is an explicit listing of all the menu items |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| ItemName | String | The name of the selected item |
| RecResult | RecResult | The recognition result of the interaction |

 

US Currency

Description:

This SpeechObject prompts for and recognizes a dollar and cent amount in one utterance. If necessary, disambiguation is performed for utterances like "seven fifty". This disambiguation is performed by invoking a default instance of the DisambiguateCurrency SpeechObject (which may of course be overridden). This SpeechObject provides a DTMF backoff strategy if the caller encounters recognition problems.
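The ambiguity in an utterance like "ten fifty" can be made concrete with the sketch below, which lists the two readings and drops any that fall outside an allowed range. Amounts are held as a whole number of cents to keep the arithmetic exact; the class and method names, and the representation of the range as two bounds in cents, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class CurrencyDisambiguationSketch {
    // Readings, in cents, of an utterance made of two numbers such as "ten fifty":
    // dollars and cents ($10.50) or whole dollars ($1050).
    // Readings outside [minCents, maxCents] are dropped.
    static List<Long> candidatesInCents(int first, int second, long minCents, long maxCents) {
        if (first < 1 || second < 1 || second > 99) {
            throw new IllegalArgumentException("expected a positive number followed by 1..99");
        }
        long joined = first * 100L + second;
        long[] readings = { joined, joined * 100L };
        List<Long> kept = new ArrayList<>();
        for (long cents : readings) {
            if (cents >= minCents && cents <= maxCents) {
                kept.add(cents);
            }
        }
        return kept;
    }

    // A follow-up question is only needed when more than one reading survives.
    static boolean needsDisambiguation(List<Long> candidates) {
        return candidates.size() > 1;
    }
}
```

This also suggests why the Range parameter is propagated to the disambiguation object: a tight enough range can leave a single reading and make the follow-up question unnecessary.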

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| DisambigObject | SODisambiguateCurrency | Object that disambiguates ambiguous currencies, e.g. 'ten fifty' => $10.50 or $1050 |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| Range | Range | Sets the allowed value range (also propagated to the disambiguation object) |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| Amount | float | Floating point number indicating the recognized amount of dollars and cents |
| Cents | int | Integer indicating the recognized amount of cents |
| Dollars | int | Integer indicating the recognized dollar amount, not including cents |
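
The ambiguity handled by the disambiguation object can be shown with a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the utterance is assumed to have already been split into two numbers (for "seven fifty", 7 and 50), and `Decimal` is used here instead of the float type listed above to keep the arithmetic exact.

```python
from decimal import Decimal
from typing import List, Tuple


def interpretations(first: int, second: int) -> List[Decimal]:
    """Candidate amounts for an utterance such as "seven fifty" (first=7, second=50)."""
    dollars_and_cents = Decimal(first) + Decimal(second) / 100   # $7.50
    whole_dollars = Decimal(first * 100 + second)                # $750
    return [dollars_and_cents, whole_dollars]


def split_amount(amount: Decimal) -> Tuple[int, int]:
    """Split an amount into (Dollars, Cents), as in the return results above."""
    dollars = int(amount)
    cents = int((amount - dollars) * 100)
    return dollars, cents


candidates = interpretations(10, 50)   # the 'ten fifty' example
```

A disambiguation dialog would read these candidates back to the caller and keep the one the caller chooses.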

 

North American Telephone Number

Description:

This SpeechObject prompts for and obtains a telephone number from the user, in the standard 10-digit format used in Canada, Mexico, and USA.

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| UseNatural | boolean | If true, allows natural numbers within each section, e.g. 'six five oh, eight four seven, eleven fifty five' |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| AreaCode | String | The first 3 digits of the 10-digit recognized phone number |
| Exchange | String | The second set of 3 digits of the 10-digit recognized phone number |
| Subscriber | String | The last 4 digits of the 10-digit recognized phone number |
| PhoneNumber | String | Entire phone number as a string |
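
The relationship between the four return values can be sketched in Python as follows. The function name `split_phone_number` is hypothetical, and the sketch assumes the recognized number is already available as a string of digits, possibly with separators.

```python
from typing import Dict


def split_phone_number(phone_number: str) -> Dict[str, str]:
    # Keep only the digits, so "650-847-1155" and "6508471155" behave alike.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected exactly 10 digits")
    return {
        "AreaCode": digits[:3],      # first 3 digits
        "Exchange": digits[3:6],     # second set of 3 digits
        "Subscriber": digits[6:],    # last 4 digits
        "PhoneNumber": digits,       # entire number
    }


# 'six five oh, eight four seven, eleven fifty five'
parts = split_phone_number("650-847-1155")
```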

 

Alpha Digit String

Description:

This SpeechObject can be configured to prompt for and recognize alphanumeric digit strings that may consist of sections, such as an account number, credit card number, social security number, and the like. Natural numbers are optionally allowed within each section.

As with the Sectioned Digit String SpeechObject, each format of the sectioning is specified as a '-' delimited string. Each format should be of the form:

DDD-DD-DDDD or DDD-DD-AADD

and so forth. The first formatting specifies that the digitstring grammar should recognize a section of three digits, two digits, and four digits (e.g., a Social Security Number). The letter D is used for a digit (0-9), and the letter A is used for any alpha (A-Z).

One can also use a user-defined group to recognize a subset of the alphabet and optionally allow for digits in certain positions as well as an alpha. For example, one could define "V" to correspond to "AEIOU" - the vowels. This is useful for when only certain letters are allowed in a position within the digit-string - the automatically generated grammar can reflect this constraint directly.

Configuration parameters:

All configuration parameters of the Dialog and Sectioned Digit String SpeechObjects, plus

| Parameter | Type | Description |
|---|---|---|
| Group | Group[] | Defines the groups |
| Group | (int, Group) | Defines a specific group |

 

Return results:

All return results of the Dialog and Sectioned Digit String SpeechObjects.
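
To make the format notation concrete, the following Python sketch turns a format string into a regular expression over the recognized text. It is an illustration only: the real SpeechObject generates a recognition grammar, not a regular expression, and the names `BUILT_IN` and `format_to_pattern` are hypothetical. User-defined groups are modeled as a mapping from a single-letter name to the letters it covers. The option for a group to also allow digits is not modeled.

```python
import re
from typing import Dict, Optional

# Built-in format letters described above: D = digit, A = any alpha.
BUILT_IN = {"D": "[0-9]", "A": "[A-Z]"}


def format_to_pattern(fmt: str, groups: Optional[Dict[str, str]] = None) -> "re.Pattern":
    classes = dict(BUILT_IN)
    for name, letters in (groups or {}).items():
        classes[name] = "[" + re.escape(letters) + "]"
    sections = []
    for section in fmt.split("-"):
        # An unknown format letter raises KeyError.
        sections.append("".join(classes[ch] for ch in section))
    # Accept the string with or without a '-' between sections.
    return re.compile("-?".join(sections))


ssn_like = format_to_pattern("DDD-DD-AADD")
vowel_code = format_to_pattern("VDD", {"V": "AEIOU"})
```

With the hypothetical group "V", `vowel_code` accepts "E42" but not "B42", which is the kind of positional constraint the user-defined groups are meant to express.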

 

Confirm and Correct

Description:

The Confirm and Correct SpeechObject can be used to have the caller confirm one or more pieces of information together, and correct (by invoking suitable SpeechObjects) any pieces of information that are incorrect. The items that can be so confirmed, identified, and/or corrected must be SpeechObjects implementing the Identifiable interface. This is done in the three phases of Confirmation, Identification, and Correction:

  1. The first phase is the Confirmation phase. During the Confirmation phase, the inner Confirmation object is invoked to play the confirmation prompt ("...is this correct?"), so that the caller can indicate whether all the information is correct. If the caller answers in the affirmative, this SpeechObject is finished.
  2. If the caller indicates that information is not completely correct, Confirm and Correct moves on to the Identification phase, invoking the inner Identify object. During this phase, the caller identifies which piece(s) of information need to be corrected. The caller can respond by indicating up to two items that are incorrect -- or the caller can specify that all of the information is wrong; the caller can also answer that none of the items is incorrect, in which case execution returns to the Confirmation phase, and starts over.
  3. If the caller identifies at least one incorrect piece of information during the Identification phase, execution moves on to the Correction phase. During the Correction phase, Confirm and Correct obtains the re-do object for each SpeechObject whose Result is wrong. After getting and invoking each re-do object, execution returns to the Confirmation phase, to confirm all of the contained SpeechObject.Results.

At the end of a successful invocation (i.e., after all results have been confirmed), the Confirm and Correct SpeechObject returns a Result that contains all the results of the contained SpeechObjects, with each contained SpeechObject Result stored under the contained SpeechObject's SO key. For example, if the contained SpeechObjects are SODate and SOTime, the Result instance returned by Confirm and Correct will contain an SODate.Result stored under SODate's SO key, and an SOTime.Result stored under SOTime's SO key.
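
The three phases can be summarized as a loop. The Python sketch below is a simplified model, not the actual implementation. The function and parameter names are hypothetical, results are plain strings keyed by SO key, and the callables `confirm`, `identify`, and `redo` stand in for the Confirmation object, the Identify object, and the re-do objects. The treatment of the retry limit is an assumption, since the exact semantics of MaxRetryCount are not described here.

```python
from typing import Callable, Dict, List


def confirm_and_correct(
    results: Dict[str, str],
    confirm: Callable[[Dict[str, str]], bool],
    identify: Callable[[Dict[str, str]], List[str]],
    redo: Dict[str, Callable[[], str]],
    max_retries: int = 3,
) -> Dict[str, str]:
    for _ in range(max_retries + 1):
        # Phase 1, Confirmation: "... is this correct?"
        if confirm(results):
            return results
        # Phase 2, Identification: which items are wrong? An empty list
        # means "none", so the loop returns to the Confirmation phase.
        wrong = identify(results)
        # Phase 3, Correction: invoke the re-do object for each wrong item.
        for key in wrong:
            results[key] = redo[key]()
    raise RuntimeError("maximum retry count exceeded")


# Scripted caller: rejects the first confirmation, corrects the date, then accepts.
answers = iter([False, True])
result = confirm_and_correct(
    {"SODate": "May 4th", "SOTime": "9 am"},
    confirm=lambda r: next(answers),
    identify=lambda r: ["SODate"],
    redo={"SODate": lambda: "May 5th", "SOTime": lambda: "10 am"},
)
```

The returned mapping plays the role of the combined Result: each contained result is stored under the key of the SpeechObject that produced it.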

Configuration parameters:

| Parameter | Type | Description |
|---|---|---|
| Confirmation | Confirmation | The object that performs confirmation |
| GetInitialResultsIfNeeded | boolean | If true, will initially invoke all contained SpeechObjects that have not yet obtained results |
| Identify | Identify | The object that identifies which information needs to be corrected |
| MaxRetryCount | int | Maximum number of retries attempted by Confirm and Correct |
| SpeechObject | SpeechObject[] | SpeechObjects to be contained (confirmed/corrected) |
| SpeechObject | (int, SpeechObject) | Adds/sets a SpeechObject for confirmation/correction |

 

Return results:

| Return value | Type | Description |
|---|---|---|
| SOKeysEnum | Enumeration | Enumeration of contained SpeechObjects' Result keys |

 

Browsable Selection

Description:

The Browsable Selection SpeechObject acts similarly to the Browsable List SpeechObject except that it also supports a "select" command the caller can use to select the current item being browsed.

Configuration parameters:

All configuration parameters of the Browsable List SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| SelectionExpression | Expression | Application-specific grammar rule to specify that the current item is to be selected |

 

Return results:

All return results of the Browsable List SpeechObject.

 

Browsable Action

Description:

The Browsable Action List SpeechObject acts similarly to the Browsable List SpeechObject except that the user can add any custom command and an associated handler into the list. When a custom command is spoken, the corresponding handler is fired to handle it.

Note that at this time there is no way to specify these custom commands and handlers using JavaBean properties.

Configuration parameters:

All configuration parameters of the Browsable List SpeechObject.

Return results:

All return results of the Browsable List SpeechObject.

 

US Zip Code

Description:

This SpeechObject will collect either a 5- or 9-digit US ZIP code. The filter used to validate 5-digit codes is based on a list of currently existing codes issued by the U.S. Postal Service. The 4-digit extension, if spoken, is not validated. It is possible to disable the filter if you want to accept any 5-digit code.

 

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| FilterDisabled | boolean | True prevents the recognized result from being validated |
| FiveDigitsOnly | boolean | True restricts recognition to 5 digits |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| Extension | String | Either a string representation of the last 4 digits (if a 9-digit zip code was recognized) or null (if a 5-digit zip code was recognized) |
| ZipCode | String | String representation of the 5-digit zip code (or first 5 digits, if a 9-digit zip code was recognized) |
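
The interaction between the filter and the two boolean parameters can be sketched in Python. The function `parse_zip` and the tiny `valid_codes` set are hypothetical stand-ins. The real filter is based on the Postal Service list mentioned above, which is not reproduced here.

```python
from typing import Dict, Optional, Set


def parse_zip(
    digits: str,
    valid_codes: Set[str],
    filter_disabled: bool = False,
    five_digits_only: bool = False,
) -> Optional[Dict[str, Optional[str]]]:
    if not digits.isdigit() or len(digits) not in (5, 9):
        return None
    if five_digits_only and len(digits) != 5:
        return None
    zip_code = digits[:5]
    extension = digits[5:] or None   # None when only 5 digits were given
    # Only the 5-digit part is validated; the extension is never checked.
    if not filter_disabled and zip_code not in valid_codes:
        return None
    return {"ZipCode": zip_code, "Extension": extension}


known = {"94025"}   # illustrative stand-in for the list of existing codes
nine = parse_zip("940251234", known)
```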

 

Credit Card Info

Description:

This speech object encapsulates the functionality of acquiring information on credit card type, credit card number, and credit card expiration date.

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| AcceptExpiredCard | boolean | If true, credit cards that expired before today are accepted |
| AllTypesEnabled | boolean | If true, all eight built-in credit card types are acceptable |
| CardTypeEnabled | (int, boolean) | Sets whether a specific card type is accepted or not |
| CreditCardExpirationDateSpeechObject | SpeechObject | Internal expiration date speech object |
| CreditCardInfoCANDCSpeechObject | SpeechObject | Internal confirm and correct speech object |
| CreditCardNumberSpeechObject | SpeechObject | Internal card number speech object |
| CreditCardTypeSpeechObject | SpeechObject | Internal card type speech object |
| InitialState | String | Initial state for the call-flow |
| PreamblePrompt | Playable | Prompt that is played at the beginning of the dialog |
| TypeQueryExplicit | boolean | If true, the credit card type is queried explicitly |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| CreditCardExpirationMonth | int | Credit card expiration month |
| CreditCardExpirationYear | int | Credit card expiration year |
| CreditCardNumber | String | Credit card number as digit string |
| CreditCardType | String | Credit card type as string |
| ResultStatus | int | Result status |
| isResultOk | boolean | Whether the result is Ok or not |

 

Sectioned Digit String

Description:

This SpeechObject can be configured to recognize a string of digits broken up into various sections. Alternate sectionings can be provided, and the use of natural numbers in the grammar can be enabled. The maximum length of a section is six digits.

Each format of the sectioning is specified as a '-' delimited string; the number of digits in each section of a given format is specified by a sequence of 'D' characters. For example,

"DDD-DDD-DDDD"

specifies a sectioning of three digits, three digits, and four digits.

"DDD-DD-DDDDD"

specifies a sectioning of three digits, two digits, and five digits.

Developers can also set the delimiter that may be spoken by callers when reading the sectioned digitstring. By default, this is a "dash", but this can be changed to any single word or valid GSL expression, such as "dot" or "[dash dot]" or null, etc.

Natural numbers can also be enabled through a simple property setting.

This SpeechObject does not itself perform any confirmation or validity checking. If there are specific constraints on what constitutes a valid digit string for your application, using the result filter mechanism to filter out inconsistent hypotheses is highly recommended.

The configuration of the digit string -- that is, the number of sections and the length of each section -- determines the construction of the grammar used for recognition. If the speaker says a digit string that does not match one of the defined patterns, the recognizer rejects the utterance.

Configuration parameters:

All configuration parameters of the Dialog SpeechObject, plus

| Parameter | Type | Description |
|---|---|---|
| DelimiterExpression | String | The (optional) delimiter expression used in the grammar between sections of the alphadigit string (dash, dot, etc.) |
| DelimiterPrompt | Playable | Audio played between sections of the recognized digit string (e.g. 'dash.wav') |
| Format | String[] | Defines formats of all sections of the string |
| Format | (int, String) | Defines the format for the given section of the string (e.g. 'DDD-DD-DDDD') |
| Format | WeightedFormat[] | Defines formats of all sections of the string |
| Format | (int, WeightedFormat) | Defines the format for the given section of the string |
| IdentifyExpression | Expression | Identification expression for the Identifiable interface |
| IdentifyPrompt | Playable | Identification prompt for the Identifiable interface |
| UseNatural | boolean | If true, allows natural numbers within each numeric section, e.g. 'three six two, fifty seven, eleven hundred' |

 

Return results:

All return results of the Dialog SpeechObject, plus

| Return value | Type | Description |
|---|---|---|
| DigitString | String | The recognized string without any section delimiters |
| Section | String[] | The sections of the recognized string |
| SectionedDigitString | String | The recognized string with '-' between sections, e.g. "52-764" |
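
The three return values are different views of the same recognized string. The Python sketch below shows how they relate for a given set of formats. The function name `section_digits` is hypothetical, and the sketch picks the first format whose total length matches, which is a simplification: in the real SpeechObject the sectioning comes from how the caller spoke the string.

```python
from typing import Dict, List, Optional


def section_digits(digits: str, formats: List[str]) -> Optional[Dict[str, object]]:
    if not digits.isdigit():
        return None
    for fmt in formats:
        # "DDD-DDD-DDDD" -> [3, 3, 4]
        lengths = [len(part) for part in fmt.split("-")]
        if sum(lengths) != len(digits):
            continue
        sections, pos = [], 0
        for n in lengths:
            sections.append(digits[pos:pos + n])
            pos += n
        return {
            "DigitString": digits,
            "Section": sections,
            "SectionedDigitString": "-".join(sections),
        }
    return None  # no format matches; the recognizer would reject the utterance


phone_like = section_digits("6508471155", ["DDD-DD-DDDD", "DDD-DDD-DDDD"])
```

Here the first format covers only nine digits, so the ten-digit input falls through to the second, alternate sectioning.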

 

Browsable List

Description:

The Browsable List SpeechObject allows the caller to hear items in a list in sequence, and navigate through the list. The object provides methods through which the developer may dynamically define the list. The list of items to be browsed is encapsulated within a Browsable object.

When invoked, the list plays prompts associated with items, one after another. Depending on configuration, the list advances automatically or through a "next" command to the next item. Based on configuration, the list may terminate automatically at the end, or as the result of an "exit list" command.

The general dialog flow is as follows:

The dialog begins with a preamble, if enabled (a recognition state that accepts the relevant navigational commands), which automatically advances to the first item. Some commands are invalid in the preamble (for example, "previous", or application-specific commands such as "delete"); invalid commands are handled as errors. For each item, the item prompt is played, with optional pre-pending and post-pending prompts, and navigation and application-specific commands are active. If enabled, a timeout automatically advances the list to the next item. If enabled, a timeout automatically exits the list after the last item.

The default navigation commands are: next, previous, last, first, and exit.

 

Configuration parameters:

| Parameter | Type | Description |
|---|---|---|
| AutoAdvance | boolean | If set to true, forces the list to advance automatically to the next item if there is no response from the user |
| Browsable | Browsable | Object providing access to items to be browsed |
| ExitPrompt | Playable | Prompt played when the list exits |
| FirstPrePrompt | Playable | The pre-prompt played before the first item |
| Grammar | Grammar | The grammar used for both preamble and list item recognitions |
| LastPrePrompt | Playable | The pre-prompt played before the last item |
| ListNoSpeechTimeoutPrompt | Playable | The prompt played if there is a "no-speech" timeout when the user says a command after the list item |
| ListRecognizerTooSlowTimeoutPrompt | Playable | The prompt played if there is a "recognizer-too-slow" timeout when the user says a command after the list item |
| ListRejectedPrompt | Playable | The prompt played if there is a rejection when the user says a command after the list item |
| ListSpeechTooEarlyPrompt | Playable | The prompt played if there is a "speech-too-early" condition when the user says a command after the list item |
| ListTooMuchSpeechTimeoutPrompt | Playable | The prompt played if there is a "too-much-speech" timeout when the user says a command after the list item |
| ListUnexpectedKeyPrompt | Playable | The prompt played if the user presses a DTMF key instead of speaking a command after the list item |
| MultiItemListErrorPrompt | Playable | The recognition error prompt when the list has more than one item |
| MultiItemListHelpPrompt | Playable | The help prompt when the list has more than one item |
| NextPrePrompt | Playable | The pre-prompt played when the user says "next", or when the list auto-advances to the next item |
| OnlyItemListErrorPrompt | Playable | The recognition error prompt when the list has only one item |
| OnlyItemListHelpPrompt | Playable | The help prompt when the list has only one item |
| OnlyItemPrePrompt | Playable | The pre-prompt played when there is only one item |
| PreambleHelpPrompt | Playable | The prompt played if the user asks for help during the preamble |
| PreamblePrompt | Playable | The prompt that is played only once when the user first enters the list |
| PreambleRecognitionErrorPrompt | Playable | The prompt played if there is a recognition error in the preamble |
| PreviousPrePrompt | Playable | The pre-prompt played when the user says "previous" |
| ReturnAtEnd | boolean | If true, list will automatically exit when it reaches the end |

 

Return results:

| Return value | Type | Description |
|---|---|---|
| exitedFromList | boolean | True if the user exited during the list portion |
| exitedFromPreamble | boolean | True if the user exited during the preamble |
| Index | int | The index of the item on which the list exited |
| toString | String | The index of the item on which the list exited, in string form. If the list exited in the preamble, returns the string "PREAMBLE" |
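
The navigation behaviour of the Browsable List can be modeled as a small state machine. The Python sketch below is a rough illustration under several assumptions that the text does not settle: the class and method names are hypothetical, the preamble is represented by index -1, "previous" on the first item is treated as an error, "first" and "last" are allowed in the preamble, and "next" on the last item behaves like an auto-advance timeout.

```python
from typing import Dict, List, Optional


class BrowsableListSketch:
    PREAMBLE = -1   # assumption: the preamble is modeled as index -1

    def __init__(self, items: List[str], return_at_end: bool = False):
        self.items = items
        self.return_at_end = return_at_end
        self.index = self.PREAMBLE
        self.exited = False

    def handle(self, command: Optional[str]) -> str:
        """Apply one command (None models a timeout) and return what is played."""
        last = len(self.items) - 1
        if command == "exit":
            self.exited = True
            return "exit"
        if command in ("next", None):
            if self.index == last:
                if self.return_at_end:
                    self.exited = True
                    return "exit"
                return "error"
            self.index += 1
        elif command == "first":
            self.index = 0
        elif command == "last":
            self.index = last
        elif command == "previous" and self.index > 0:
            self.index -= 1
        else:
            return "error"   # e.g. "previous" in the preamble
        return self.items[self.index]

    def result(self) -> Dict[str, object]:
        in_preamble = self.index == self.PREAMBLE
        return {
            "exitedFromPreamble": self.exited and in_preamble,
            "exitedFromList": self.exited and not in_preamble,
            "Index": self.index,
            "toString": "PREAMBLE" if in_preamble else str(self.index),
        }


browser = BrowsableListSketch(["message one", "message two"], return_at_end=True)
played = [browser.handle(c) for c in ("previous", None, "next", None)]
```

In this run the invalid "previous" in the preamble is reported as an error, the timeouts auto-advance, and the final timeout exits the list after the last item.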

5. Appendix

This appendix describes the types of the configuration parameters and return values.

Browsable
One class implementing this interface is BrowsableVector, which includes methods to add to, set, and check items in the list of browsable items. An "item" consists of a Playable, which will be played as the name of the item when the list is browsed, and optional arbitrary user data (to be returned if the item is selected).
Confirmation
This class is used by Confirm and Correct during the Confirmation phase to confirm the values of the Speech Objects being managed by Confirm and Correct. It will ask a question like "I have you flying from Boston to New York on Friday, May 5th. Is that correct?" It is rare that a developer will need to override the default instance for this class.
Expression
An Expression is basically an encoded representation of the right-hand side of an arbitrarily complex grammar production rule, including semantic tags and probabilities.
Grammar
Extensions of this class include both static and dynamic grammars, specified in code or in files, and combinations thereof. For all grammars it is possible to set the top-level Rulename (a String) that should be used for recognition. These grammars can include probabilities and semantic tags.
Group
This class allows you to define a group: a single-letter name that represents a collection of letters. This group name, along with others, can then be used in a format string for the Alpha Digit String SpeechObject. Each group can also enable or disable digit recognition.
Identify
This class is used by Confirm and Correct during the Identification phase to obtain the user's selection of which object's value was incorrect. It will ask a question like "Which would you like to change - X, Y, or Z?" It is rare that a developer will need to override the default instance for this class.
Playable
There are many classes that implement the Playable interface. Approximately speaking, a Playable can be any concatenation of silence, recorded audio, TTS, random prompts (2), and escalating prompts (3). See Section 3.3 for a longer explanation of Playables.
Range
This class is a property of US Currency that captures the range as an order of magnitude. For example, Range[1,3] captures a range of $0 to $999. By default, the Range is [1,8].
RecResult
This is the standard result object from a single recognition and includes such things as a recognized text string, a confidence score, natural language interpretations and their scores.
ResultFilter
This class (actually an interface) provides one method, pass, which examines a SpeechObject.Result and returns a boolean indicating whether or not the specified result passes this filter. It is used internally by the Dialog SpeechObject and its subclasses to postprocess recognition results.
SODate.DateLimit
Base class allowing absolute and relative date specifiers to set a lower or upper date for use by the Date SpeechObject.
SODayOfMonth
The SpeechObject responsible for getting the date of the month. It is invoked by the Date SpeechObject if it is determined that the day of the month has not been specified and cannot be inferred.
SODisambiguateCurrency
The SpeechObject responsible for disambiguating an ambiguous currency expression. When invoked by the Currency SpeechObject, it lists all the possible interpretations of the recognition result and asks the caller to specify the actual amount in dollars and cents. For example, if the caller said "two fifty" it asks, "Did you mean two dollars fifty cents or two hundred and fifty dollars?". The caller inputs an amount in this state that will override the amount obtained by the Currency SpeechObject.
SODisambiguateTime
The SpeechObject responsible for disambiguating an ambiguous time expression. When invoked by the Time SpeechObject, it asks whether the time is "in the morning or in the evening", or "in the afternoon or in the morning", and so on, depending on the candidate time (i.e. the time to be disambiguated). For the value "12", for example, it asks "Is that twelve noon or midnight?"
WeightedFormat
This is a convenience class used by the Sectioned Digit String SpeechObject to represent a weighted format. It consists of a string representing the sectioning, for example "DDDD-DD-DD", and of an associated probability.
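
As a worked example of the Range entry above, the following Python sketch converts an order-of-magnitude range into dollar bounds. The function name `dollar_bounds` is hypothetical. The upper bound follows the Range[1,3] example ($0 to $999); the handling of a lower bound greater than 1 is an assumption, since only ranges starting at 1 are illustrated in this document.

```python
from typing import Tuple


def dollar_bounds(low_digits: int, high_digits: int) -> Tuple[int, int]:
    # Range[1,3] covers amounts of one to three digits: $0 to $999.
    # Assumption: a lower bound above 1 excludes amounts with fewer digits.
    minimum = 0 if low_digits <= 1 else 10 ** (low_digits - 1)
    maximum = 10 ** high_digits - 1
    return minimum, maximum


default_bounds = dollar_bounds(1, 8)   # the default Range [1,8]
```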

Footnotes

(1) e.g. one could set a parameter to have a linked list as a value but not to add an element to the end of the list - setting to a value is allowed, but executing a function on the value is not. Of course, the calling application is free to check the value, compute a new value using this value, and set the parameter to the new value.

(2) a set of Playables, one of which is selected at random each time the random prompt is to be played.

(3) an ordered set of Playables 1 ... n such that 1 is played the first time the escalating prompt is to be played, 2 the second, and so on.
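
Footnotes (2) and (3) can be illustrated with a short Python sketch. The class names and the `next` method are hypothetical, strings stand in for Playables, and repeating the last prompt once an escalating prompt runs out of entries is an assumption that the footnote does not state.

```python
import random
from typing import List, Optional


class RandomPrompt:
    """One of the prompts is selected at random each time it is played."""

    def __init__(self, prompts: List[str], rng: Optional[random.Random] = None):
        self.prompts = prompts
        self.rng = rng or random.Random()

    def next(self) -> str:
        return self.rng.choice(self.prompts)


class EscalatingPrompt:
    """Prompt 1 is played the first time, prompt 2 the second, and so on."""

    def __init__(self, prompts: List[str]):
        self.prompts = prompts
        self.plays = 0

    def next(self) -> str:
        # Assumption: after the last prompt is reached, it is repeated.
        prompt = self.prompts[min(self.plays, len(self.prompts) - 1)]
        self.plays += 1
        return prompt


help_prompt = EscalatingPrompt(["Say a date.", "Please say a date, such as May 5th."])
heard = [help_prompt.next() for _ in range(3)]
```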

