Speech Recognition Grammar Specification
for the W3C Speech Interface Framework

W3C Working Draft 10 July 2000

This version:: http://www.w3.org/TR/2000/WD-grammar-spec-20000710
Latest version:: http://www.w3.org/TR/grammar-spec
Editors:: Andrew Hunt, SpeechWorks International
Scott McGlashan, PipeBeach

Abstract

This document defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms. The W3C Voice Browser Working Group is seeking input on whether the final specification should include both forms or be narrowed to a specific form.

Status of this Document

This specification is a Working Draft of the Voice Browser working group for review by W3C members and other interested parties. This is the first public version of this document. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

Publication as a Working Draft does not imply endorsement by the W3C membership, nor by members of the Voice Browser Working Group.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review. Comments should be sent to the public mailing list <www-voice@w3.org> (archive).

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

1. Introduction
2. Rule Expansions
- 2.1 Tokens
- 2.2 Rule References
- 2.3 Sequences
- 2.4 Choices
- 2.5 Counts
- 2.6 Tags
- 2.7 Precedence
3. Rule Definitions
- 3.1 Basic Rule Definition
- 3.2 Scoping of Rule Definitions
- 3.3 Example Phrases
4. Grammar Documents
- 4.1 Grammar Header and Character Encoding
- 4.2 Grammar Declaration and Locale
- 4.3 Imports
- 4.4 Comments
5. Future Study
- 5.1 Augmented BNF and/or XML Form
- 5.2 Semantic Interpretation
- 5.3 Statistical Language Models: n-grams
- 5.4 Dynamic Grammars
- 5.5 Embedding Partial Rule Definitions
- 5.6 Multi-lingual Grammars
- 5.7 Phonemic Pronunciations for Tokens
- 5.8 Tag Element
- 5.9 Robust Recognition
- 5.10 Top-level Rules
6. Acknowledgements
Appendix A. Example Grammar in ABNF and XML Forms
Appendix B. Sample DTD for the XML Format
Appendix C. Formal Syntax for Augmented BNF
Appendix D. Sample Style Sheet to Convert XML to the ABNF Form
Appendix E. Requirements Analysis

1. Introduction

This document defines the syntax for grammar representation. The grammars are intended for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer.

The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms. The W3C Voice Browser Working Group is seeking input on whether the final specification should include both forms or be narrowed to a specific form.

Augmented BNF syntax (ABNF): this is a plain-text (non-XML) representation which is similar to traditional BNF grammar and to many existing BNF-like representations commonly used in the field of speech recognition including the JSpeech Grammar Format from which this specification is derived. Augmented BNF should not be confused with Extended BNF which is used in DTDs for XML and SGML.
XML: This syntax uses XML elements to represent the grammar constructs and adapts designs from the PipeBeach grammar (W3C Members only), TalkML and a research XML variant of the JSpeech Grammar Format.

Section 5 outlines areas of Future Study around grammar representations for speech recognition. In addition to the decision about supporting an XML form, the ABNF form or both, the committee is currently considering a proposal for representing statistical language models -- specifically "n-grams" -- that are used in many speech recognition systems.

The W3C Standard is known as the Speech Recognition Grammar Specification and is based upon the JSGF specification, which is owned by Sun Microsystems, Inc., California, U.S.A.

Sun, Sun Microsystems, Inc., the Sun logo, Java and all Java-based marks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. ©Sun Microsystems.

2. Rule Expansions

A rule expansion is a regular expression that defines patterns of tokens, rule references and combinations of these.

2.1: Token
2.2: Rule Reference
2.3: Sequence
2.4: Choices
2.5: Counts
2.6: Tags
2.7: Precedence

2.1 Token

A token (aka, a terminal symbol) is the part of a grammar that defines words or other entities that may be spoken. In both the XML and ABNF forms, any unmarked text is a token. For now, the grammar format assumes that tokens can be resolved as lexicon entries by the recognizer.

ABNF Form

Any plain text is a token. Tokens are delimited by white-space or by symbols with special syntactic function (e.g. ; = | * + <> () [] {} /* */ //). Tokens may be explicitly quoted if they contain white-space or special symbols.
hello
bon voyage
this is a test    (sequence of four tokens)
"San Francisco"
2
XML Form

Follows the same style as the ABNF tokens including the same use of quotes. XML delimiters act as token delimiters.
hello
bon voyage
this is a test    (sequence of four tokens)
"San Francisco"
2
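
The tokenization described above can be illustrated with a minimal Python sketch. It assumes that only double quotes delimit quoted tokens and that the input contains no special symbols; the function name split_tokens is hypothetical and is not part of this specification.

```python
import re

# Assumption: a token is either double-quoted text or a run of
# non-white-space characters. Special symbols are not handled here.
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')

def split_tokens(text):
    """Split a plain token sequence into tokens, keeping quoted text together."""
    tokens = []
    for match in _TOKEN_RE.finditer(text):
        quoted, bare = match.group(1), match.group(2)
        tokens.append(quoted if quoted is not None else bare)
    return tokens

print(split_tokens('this is a test'))
print(split_tokens('fly to "San Francisco"'))
```

A quoted token is returned without its quotes, so "San Francisco" is treated as a single lexicon entry rather than two.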

Issues

It is desirable that sequences of tokens be interpreted identically in both the ABNF and XML forms. However, the surrounding syntactic constraints are different so that the list of special symbols identified above for ABNF will not necessarily be the same for XML. This issue will be addressed in a Future Revision.
Section 5.6 in Future Study proposes a mechanism by which tokens of multiple languages can be mixed in a single grammar.
Section 5.7 in Future Study proposes a mechanism by which pronunciations can be provided for tokens.
Need to define handling for tokens like "9", "+", "&", "'", """, "<". Usually it's better if developers use full words (i.e. "plus", "ampersand" or "and"...).

2.2 Rule Reference

Rulenames: Every rule definition has a local name that must be unique within the scope of the grammar in which it is defined. Legal rule names must be legal XML IDs as defined in the XML specification as the "Name" production in Section 2.3. Section 3.1 documents the rule definition mechanism and the legal naming of rules.

Local References:

When referencing rules defined locally (defined in the same grammar that contains the reference), always use a simple rulename reference which consists of the local rulename only. The ABNF and XML forms have a different syntax for representing a simple rulename reference.

ABNF Form

The simple rulename reference is prefixed by a "$" character.
    $city
    $digit
XML Form

The "ruleref" element is an empty element with a "uri" attribute that specifies the rule reference as a fragment.
<ruleref uri="#city"/>
<ruleref uri="#digit"/>

External Reference by URI:

References to rules defined in other grammars are legal under the conditions defined in Section 3. The external reference must identify the external grammar and a rule within that grammar.

ABNF Form

The URL for the external grammar and rulename fragment are enclosed in parentheses following the "$" symbol.
$(http://www.grammars.com/world-cities.xml#canada)
$(http://www.example.com/numbers.xml#digit)
XML Form

The "ruleref" element is an empty element with a "uri" attribute that specifies the URI of the external grammar followed by the rulename as a fragment.
<ruleref uri="http://www.grammars.com/world-cities.xml#canada"/>
<ruleref uri="http://www.example.com/numbers.xml#digit"/>

External Reference by Import:

Section 4.3 defines import declarations that act to bind a local alias for an external grammar identified by its URI. The rule reference syntax has a special mechanism to support reference to rules in grammars that are imported. As with reference by URI, the reference must include both the grammar name and the name of a rule defined within that grammar.

ABNF Form

A reference by import consists of the "$" symbol to mark the reference followed by the import alias, a period symbol "." and the rulename within the imported grammar.
$places.city
XML Form

Instead of the "uri" attribute, an import attribute is used with a special syntax (that is intended to look somewhat like a URI with a fragment). The value of the import attribute is the import alias followed by the hash separator "#" and then the rulename within the imported grammar.
<ruleref import="places#city"/>
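
The correspondence between the three ABNF reference styles and their XML equivalents can be sketched as follows. This is an illustrative sketch only: the function name abnf_ref_to_xml is hypothetical, and it assumes that rulenames and import aliases contain no period, although the XML "Name" production permits one.

```python
import re
from xml.sax.saxutils import quoteattr

def abnf_ref_to_xml(ref):
    """Translate one ABNF rule reference into the matching 'ruleref' element."""
    match = re.fullmatch(r'\$\((.+)\)', ref)
    if match:  # external reference by URI: $(uri#rulename)
        return '<ruleref uri=%s/>' % quoteattr(match.group(1))
    match = re.fullmatch(r'\$([^.\s]+)\.([^.\s]+)', ref)
    if match:  # external reference by import: $alias.rulename
        return '<ruleref import=%s/>' % quoteattr('%s#%s' % match.groups())
    match = re.fullmatch(r'\$([^.\s]+)', ref)
    if match:  # local reference: $rulename
        return '<ruleref uri=%s/>' % quoteattr('#' + match.group(1))
    raise ValueError('not a rule reference: %r' % ref)

print(abnf_ref_to_xml('$city'))
print(abnf_ref_to_xml('$places.city'))
```

The URI form is tested first because a URI normally contains periods, which would otherwise be mistaken for the alias separator.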

Special Rules:

The following are specially defined rulenames. These rule names are defined appropriately by the recognizer and are treated as fully-qualified rulenames.

$NULL <ruleref uri="#NULL"/>
Defines a rule that is automatically matched: that is, matched without the user speaking any word.
$VOID <ruleref uri="#VOID"/>
Defines a rule that can never be spoken. Inserting VOID into a sequence automatically makes that sequence unspeakable.

Issues

The Working Group is considering the use of XPath for more advanced rule references. In particular, it would be desirable for a dialog document that contains multiple fragments of grammar to define the same rulename in more than one fragment. The current use of XML IDs explicitly prohibits this usage.
There's still an open discussion about whether <rule> or $rule would be better as the syntax for rulename references in the ABNF format.
Additional special rules could be defined such as $GARBAGE or $FILLER.
The proposal on naming and imports defined a "toplevel" mechanism. That's only partially incorporated into this document since the committee did not discuss it in any detail.

2.3 Sequences

A sequence of legal rule expansions is itself a legal rule expansion.

ABNF Form

A sequence of legal expansions is a white-space-separated string of the concatenated sub-expansions. Where necessary, the sequence can be delimited at the start and end by parentheses.
this is a test           (sequence of tokens)
$action $object          (sequence of rule references)
the $object is $color    (sequence of tokens and rule references)
(fly to $city)           (parentheses for explicit boundaries)
XML Form

With the exception of the "choice" element, a sequence in the XML syntax is defined by a sequence of contained elements and by space-separated tokens in CDATA. If necessary an "item" element can surround the elements of a sequence to allow tags or other data to annotate the sequence. (The "weight" attribute supported for item elements has meaning only if used within a "choice" element.)
this is a test                                           (sequence of tokens)
<ruleref uri="#action"/> <ruleref uri="#object"/>        (sequence of rule references)
the <ruleref uri="#object"/> is <ruleref uri="#color"/>  (sequence of tokens and rule references)
<item>fly to <ruleref uri="#city"/> </item>                (sequence container)

2.4 Choices

A set of alternative rule expansions is itself a legal rule expansion.

Weights may be optionally provided for each alternative expansion. The weights should indicate the occurrence likelihood for each choice. Weights are simple floating point values (nnn[.nnn]) and must be zero or greater. In the absence of weights or if the weights are not properly specified (e.g. one or more missing) then the choices are assumed to be equally likely.

ABNF Form

A set of alternative choices is identified as a list of legal expansions separated by the vertical bar symbol. If necessary, the set of alternative choices may be delimited by parentheses.
Michael | Yuriko | Mary | Duke | $otherNames
(1 | 2 | 3)
A weight is surrounded by forward slashes and placed before each item in the alternatives list.
/10/ small | /2/ medium | /1/ large
/3.1415/ pie | /1.414/ root beer
XML Form

The "choice" element identifies a set of alternative elements. Each alternative expansion is contained in an "item" element. Weights are indicated by the "weight" attribute on the "item" element.
<choice>
  <item>Michael</item>
  <item>Yuriko</item>
  <item>Mary</item>
  <item>Duke</item>
  <item><ruleref uri="#otherNames"/></item>
</choice>

<choice>
  <item>1</item>
  <item>2</item>
  <item>3</item>
</choice>

<choice>
  <item weight="3.1415">pie</item>
  <item weight="1.414">root beer</item>
</choice>
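
The weight handling described in this section can be sketched in Python as follows. Normalizing the weights into probabilities is an assumption made for illustration; the specification only states that weights indicate occurrence likelihood. Treating an all-zero set of weights as "equally likely" is likewise an assumption, and the function name choice_probabilities is hypothetical.

```python
def choice_probabilities(weights):
    """Turn per-alternative weights into probabilities.

    weights: one entry per alternative, a number or None when the weight
    is missing. If any weight is missing or negative, all alternatives
    are treated as equally likely, as the current draft describes.
    """
    count = len(weights)
    if count == 0:
        return []
    valid = all(w is not None and w >= 0 for w in weights)
    total = sum(weights) if valid else 0.0
    if not valid or total == 0:
        # Assumption: an all-zero set of weights also falls back to equal.
        return [1.0 / count] * count
    return [w / total for w in weights]

print(choice_probabilities([10, 2, 1]))       # small, medium, large
print(choice_probabilities([3.1415, None]))   # one weight missing
```

The second call shows the fallback: because one weight is missing, the remaining weight is ignored and both alternatives receive the same probability.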

Issues

The XML form is verbose.
The current specification states that if any weight is missing or improper then all weights on a choice are ignored (effectively defaulting to equal weights). Other proposals being considered are that (1) the default weight is always "1" (which can lead to unexpected behavior if all other weights are either very large or very small) or (2) that the default weight is the min or max of the correctly defined weights (which is slightly less prone to unexpected behavior).
The current specification does not support log-likelihood weights.

2.5 Counts: Optional, *, +

Operators are provided that define a legal rule expansion as being another sub-expansion that is optional, that is repeated zero or more times, or that is repeated one or more times.

ABNF Form

Optional expansions are delimited by square brackets: [...]. The postfix operators, * and +, are attached to expansions that are to be repeated zero or more times, or one or more times respectively.
[very] big
pizza with ([and] $topping)+
$digit +
XML Form

The "count" element has a "number" attribute that indicates the number of times the contained expansion may be repeated. Defined values are "optional" or "?", "0+", "1+".
<count number="optional">very</count> big
pizza with <count number="1+"> <count number="optional">and</count> <ruleref uri="#topping"/> </count>
<count number="1+"> <ruleref uri="#digit"/> </count>
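
The meaning of the defined "number" values can be made concrete with a small sketch that maps each value to a minimum and maximum number of repetitions. The names count_bounds and repetitions_allowed are hypothetical and serve only to illustrate the text above.

```python
# Assumed mapping from the XML "number" attribute to (minimum, maximum)
# repetitions; None means that there is no upper limit.
_BOUNDS = {
    "optional": (0, 1),   # ABNF: [ ... ]
    "?": (0, 1),          # alternative spelling of "optional"
    "0+": (0, None),      # ABNF: postfix *
    "1+": (1, None),      # ABNF: postfix +
}

def count_bounds(number):
    """Return the (minimum, maximum) repetitions for a "number" value."""
    try:
        return _BOUNDS[number]
    except KeyError:
        raise ValueError("undefined count value: %r" % number)

def repetitions_allowed(number, repetitions):
    """Check whether a given number of repetitions satisfies the count."""
    low, high = count_bounds(number)
    return repetitions >= low and (high is None or repetitions <= high)

print(count_bounds("1+"))
print(repetitions_allowed("optional", 2))
```

A table of this kind would also accommodate range values such as "0-3" if they were adopted, since each value reduces to a pair of bounds.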

Issues

We could extend the number attribute to support values such as "0-3", "6" etc. This will simplify certain grammars (e.g. telephone numbers). In the XML form this changes the set of legal values of the number attribute. In ABNF, the values could be placed in square brackets or between slashes: e.g. $digit /3-6/
Do we want a short-hand name for the number attribute: e.g. "n" or "num"?

2.6 Tags

A tag is an arbitrary string that may be attached to any legal rule expansion. Tags do not affect the legal word patterns defined by the grammars. Tags instead provide information that is typically used in post-processing of speech recognition results that match a grammar (more specifically match rule definitions and rule expansions).

ABNF Form

A tag is delimited by curly braces and is a postfix attachment to a rule expansion. The number of opening curly braces matches the number of closing curly braces. This is useful when the contained text contains curly braces, for example, when the contained text is a scripting language. Alternatively, contained closing braces may be escaped with a backslash. A backslash must also be escaped with a backslash.
this is a test {tag attached to "test"}
open {action=open;} | close {action=shut;}
XML Form

A "tag" attribute may be attached to any of the rule expansion elements: "ruleref", "choice", "item", "count".
this is a <item tag='tag attached to "test"'>test</item>
<choice>
   <item tag="action=open;"> open </item>
   <item tag="action=shut;"> close </item>
</choice>
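
The brace-matching and escaping rules for ABNF tags can be sketched as a small scanner. This is one possible reading of the rules in this section, not a normative algorithm; the function name read_tag is hypothetical, and a backslash is assumed to make the following character literal.

```python
def read_tag(text, start=0):
    """Read one ABNF tag that begins at text[start] == '{'.

    Returns (tag_content, index_after_closing_brace).
    """
    if start >= len(text) or text[start] != '{':
        raise ValueError('a tag must start with "{"')
    depth = 0
    content = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == '\\' and i + 1 < len(text):
            content.append(text[i + 1])  # escaped character is kept literally
            i += 2
            continue
        if ch == '{':
            depth += 1
            if depth > 1:                # nested brace belongs to the content
                content.append(ch)
        elif ch == '}':
            depth -= 1
            if depth == 0:               # brace that closes the tag
                return ''.join(content), i + 1
            content.append(ch)
        else:
            content.append(ch)
        i += 1
    raise ValueError('unterminated tag')

print(read_tag('{action=open;} | close {action=shut;}'))
print(read_tag('{if (x) {y = 1;}}'))
```

Counting nested braces allows script fragments to be embedded unchanged, while the backslash escape covers content with unbalanced braces.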

Issues

Section 5.2 outlines ongoing study of a semantic interpretation mechanism for speech grammars.
Section 5.8 outlines a Future Study item in which an XML tag element will be introduced with equivalent capability to the tag attribute.

2.7 Precedence

This section defines the precedence of the rule expansion syntax. Because XML documents explicitly indicate structure there is no ambiguity and thus a precedence definition is not required. The precedence definitions for the ABNF form are intended to minimize the need for parentheses.

ABNF Form

The following is the ordering of precedence of rule expansions, from highest to lowest. Parentheses are used when necessary to explicitly control rule structure.

1. Rulename denoted by the dollar sign '$', and a quoted or unquoted token.
2. "()" parentheses for grouping and "[]" for optional grouping.
3. The unary operators ('+', '*', and tag attachment) apply to the tightest immediately preceding rule expansion. (To apply them to a sequence or to alternatives, use '()' or '[]' grouping.)
4. Sequence of rule expansions.
5. '|' separated set of alternative rule expansions.

XML Form

None required. XML structure is explicit.
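
The ABNF precedence ordering can be illustrated with a small recursive-descent parser in which each level of precedence becomes one function. This is a sketch under simplifying assumptions: tags, weights and comments are not handled, characters the lexer does not recognize are skipped, and the tuple-based tree and the name parse_expansion are hypothetical.

```python
import re

# Lexer: rule references, quoted tokens, operators, then bare tokens.
_LEX = re.compile(r'\$[\w.]+|"[^"]*"|[()\[\]|+*]|[^\s()\[\]|+*"$]+')

def parse_expansion(text):
    """Parse an ABNF rule expansion into nested tuples."""
    toks = _LEX.findall(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expect(symbol):
        if peek() != symbol:
            raise ValueError('expected %r' % symbol)
        take()

    def alternatives():              # lowest precedence: '|'
        items = [sequence()]
        while peek() == '|':
            take()
            items.append(sequence())
        return items[0] if len(items) == 1 else ('choice', items)

    def sequence():                  # next: sequence of expansions
        items = []
        while peek() not in (None, '|', ')', ']'):
            items.append(postfix())
        if not items:
            raise ValueError('empty expansion')
        return items[0] if len(items) == 1 else ('seq', items)

    def postfix():                   # next: unary '+' and '*'
        node = primary()
        while peek() in ('+', '*'):
            node = ('count', '1+' if take() == '+' else '0+', node)
        return node

    def primary():                   # highest: tokens, references, grouping
        tok = take()
        if tok == '(':
            node = alternatives()
            expect(')')
            return node
        if tok == '[':
            node = ('count', 'optional', alternatives())
            expect(']')
            return node
        if tok in (')', ']', '|', '+', '*'):
            raise ValueError('unexpected %r' % tok)
        if tok.startswith('$'):
            return ('ref', tok[1:])
        return ('token', tok.strip('"'))

    tree = alternatives()
    if pos != len(toks):
        raise ValueError('unexpected %r' % peek())
    return tree

print(parse_expansion('pizza with ([and] $topping)+'))
```

Because sequence() is called from alternatives(), and postfix() from sequence(), an operator such as '+' binds to the single preceding expansion unless parentheses widen its scope.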

3. Rule Definitions

A rule definition associates a legal rule expansion with a rulename. The rule definition is also responsible for defining the scope of the rule definition: whether it is local to the grammar in which it is defined or whether it may be imported into and referenced within other grammars. Finally, the rule definition may additionally include documentation comments and other pragmatics.

The rulename must be unique within a grammar. The same rulename may be used in multiple grammars with the rulename resolution specification defining how to uniquely identify each rule definition.

3.1: Basic Rule Definition
3.2: Scoping of Rule Definitions
3.3: Example Phrases

3.1 Basic Rule Definition

The core purpose of a rule definition is to associate a legal rule expansion with a rulename.

ABNF Form

The rule definition consists of an optional scoping declaration (explained in the next section) followed by a legal rule name, an equals sign, a legal rule expansion and a closing semi-colon. The rule definition has one of the following legal forms:
$ruleName = ruleExpansion;
public $ruleName = ruleExpansion;
private $ruleName = ruleExpansion;
For example:
$city = Boston | "New York" | Madrid;
$command = $action $object;
XML Form

A rule definition is represented by the "rule" element. The "name" attribute of the element indicates the name of the rule and, being an XML ID, must be unique within the grammar (this is enforced by XML). The contents of the "rule" element may be any legal rule expansion defined in Section 2. The "scope" attribute is explained in the next section.
<rule name="city">
   <choice>
      <item>Boston</item>
      <item>"San Francisco"</item>
      <item>Madrid</item> 
   </choice>
</rule>
<rule name="command">
   <ruleref uri="#action"/>
   <ruleref uri="#object"/>
</rule>
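
Because the XML form is ordinary XML, rule definitions can be generated with general-purpose XML tools. The following sketch uses the ElementTree module of the Python standard library to build a rule that consists of a single set of alternatives. The function name make_choice_rule is hypothetical, and the element and attribute names follow the examples in this section.

```python
import xml.etree.ElementTree as ET

def make_choice_rule(name, alternatives, scope=None):
    """Build a "rule" element whose expansion is one set of alternatives."""
    rule = ET.Element('rule', {'name': name})
    if scope is not None:
        rule.set('scope', scope)
    choice = ET.SubElement(rule, 'choice')
    for alternative in alternatives:
        # Each alternative is plain token text inside an "item" element.
        ET.SubElement(choice, 'item').text = alternative
    return rule

rule = make_choice_rule('city', ['Boston', '"San Francisco"', 'Madrid'])
print(ET.tostring(rule, encoding='unicode'))
```

The serialized result is written on one line without indentation; it is otherwise equivalent to the hand-written "city" rule above.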

Issues

Because the rulename is an XML ID, a rulename must be unique to a document. If one or more XML grammars are embedded in another document (e.g. DialogML) then they cannot use the same rulename. The Working Group plans to consider XPath to address this constraint.

3.2 Scoping of Rule Definitions

A rule definition may be defined as local to a grammar or may be referenceable within other grammars. The intent of scoping is to allow a grammar author to separate working rules from exported rules that are intended for use elsewhere. The scoping mechanism defined here is closest to that of the Java™ Programming Language. Section 4 explains the import mechanism and namespace resolution.

ABNF Form

A rule definition may be annotated as "public" or "private". If no scope is provided, the default is "private".
$town = Townsville | Beantown;
private $city = Boston | "New York" | Madrid;
public $command = $action $object;
XML Form

The "scope" attribute of the "rule" element defines the scope of the rule definition. Defined values are "public" and "private". If omitted, the default scope is "private".
<rule name="town">
   <choice>
      <item>Townsville</item>
      <item>Beantown</item> 
   </choice>
</rule>
<rule name="city" scope="private">
   <choice>
      <item>Boston</item>
      <item>"San Francisco"</item>
      <item>Madrid</item> 
   </choice>
</rule>
<rule name="command" scope="public">
   <ruleref uri="#action"/>
   <ruleref uri="#object"/>
</rule>
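
The effect of the default scope can be sketched by listing the rules of an XML grammar that other grammars may reference. The function name public_rule_names is hypothetical, and the sketch assumes that the grammar carries no XML namespace.

```python
import xml.etree.ElementTree as ET

def public_rule_names(grammar_xml):
    """Return the names of rules that may be referenced from other grammars."""
    root = ET.fromstring(grammar_xml)
    return [
        rule.get('name')
        for rule in root.iter('rule')
        # An omitted "scope" attribute defaults to "private".
        if rule.get('scope', 'private') == 'public'
    ]

GRAMMAR = """
<grammar>
  <rule name="town"><choice><item>Townsville</item><item>Beantown</item></choice></rule>
  <rule name="city" scope="private"><choice><item>Boston</item></choice></rule>
  <rule name="command" scope="public"><ruleref uri="#action"/><ruleref uri="#object"/></rule>
</grammar>
"""
print(public_rule_names(GRAMMAR))
```

Only "command" is reported: "city" is explicitly private and "town" is private by default.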

3.3 Example Phrases

It is often desirable to include examples of phrases that match rule definitions along with the definition. Zero, one or many example phrases may be provided for any rule definition. Because the examples are explicitly marked, automated tools can be used for regression testing and for generation of grammar documentation.

ABNF Form

A documentation comment is a C/C++/Java comment that starts with the sequence of characters /** and which immediately precedes the relevant rule definition. Zero or more "@example" tags may be contained at the end of the documentation comment. The tokenization of the example follows the tokenization and sequence rules defined in Section 2.
/**
 * A simple directive to execute an action.
 *
 * @example open the window
 * @example close the door
 */
public $command = $action $object;
XML Form

Any number of "example" elements may be provided as the initial content within a "rule" element. The tokenization of the example follows the tokenization and sequence rules defined in Section 2.
<rule name="command" scope="public">
    <example> open the window </example>
    <example> close the door </example>
    <ruleref uri="#action"/> <ruleref uri="#object"/>
</rule>
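
An automated tool of the kind mentioned above could collect the example phrases from an ABNF documentation comment as sketched below. The sketch assumes one "@example" tag per line and splits each phrase on white-space only; the function name example_phrases is hypothetical.

```python
import re

# Assumption: each @example tag sits on its own line, optionally after
# the leading "*" of a documentation comment.
_EXAMPLE_RE = re.compile(r'\s*\*?\s*@example\s+(.*\S)')

def example_phrases(doc_comment):
    """Return the example phrases of a documentation comment as token lists."""
    phrases = []
    for line in doc_comment.splitlines():
        match = _EXAMPLE_RE.match(line)
        if match:
            phrases.append(match.group(1).split())
    return phrases

COMMENT = """/**
 * A simple directive to execute an action.
 *
 * @example open the window
 * @example close the door
 */"""
print(example_phrases(COMMENT))
```

Each phrase could then be submitted to a grammar matcher in a regression test to confirm that the rule still accepts it.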

4. Grammar Documents

A grammar document specifies a set of associated rules. The grammar is named and all rules defined within that grammar are scoped within the grammar's namespace and all must have unique names within that namespace.

4.1: Grammar Header and Character Encoding
4.2: Grammar Declaration and Locale
4.3: Imports
4.4: Comments

4.1 Grammar Header and Character Encoding

The character encoding indicates the symbol set used in the document. For example, for US applications it would be common to use ASCII or the superset of ISO8859. For Japanese grammars, character sets such as JIS and Unicode could be used. For both the ABNF and XML forms, the omission of the character encoding passes responsibility for determining encoding to the recognizer or host platform.

ABNF Form

The ABNF form defines the character encoding in the opening line of the grammar. A legal grammar must start with the "#" symbol and the characters leading to the first newline symbol are of the style:
#ABNF version optional-char-encoding;
#ABNF V1.0;
#ABNF V1.0 ISO8859-5;
#ABNF V1.0 JIS;
XML Form

XML defines character encodings as part of the document's XML declaration on the first line of the document. (Note that the version number in this declaration refers to the XML version and not the version of the grammar specification.)
    <?xml version="1.0" ?>
    <?xml version="1.0" encoding="ISO8859-5" ?>
    <?xml version="1.0" encoding="JIS" ?>
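
Reading the self-identifying ABNF header can be sketched with a regular expression. The pattern below is an assumption derived from the header examples in this section rather than from the formal syntax in Appendix C, and the function name parse_abnf_header is hypothetical.

```python
import re

# Assumed shape: "#ABNF", a version, an optional encoding, then ";".
_HEADER_RE = re.compile(r'#ABNF\s+([^\s;]+)(?:\s+([^\s;]+))?\s*;')

def parse_abnf_header(first_line):
    """Return (version, encoding); encoding is None when it is omitted."""
    match = _HEADER_RE.fullmatch(first_line.strip())
    if match is None:
        raise ValueError('not an ABNF grammar header: %r' % first_line)
    return match.group(1), match.group(2)

print(parse_abnf_header('#ABNF V1.0;'))
print(parse_abnf_header('#ABNF V1.0 ISO8859-5;'))
```

When the encoding is reported as None, the choice of encoding falls to the recognizer or host platform, as described at the start of this section.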

4.2 Grammar Declaration and Locale

The Locale of a grammar indicates the primary language contained by the document. The locale follows RFC 1766 which defines a language code and an optional national or regional variant. If the locale is not defined, the recognizer or host platform should assume a reasonable default locale.

ABNF Form

An optional language declaration should be the first non-comment declaration of an ABNF grammar file following the self-identifying header.
language en-US;
XML Form

Following the XML convention the language and variant are indicated by an "xml:lang" attribute on the root "grammar" element.
<grammar xml:lang="en-US">
... imports
... rule definitions
</grammar>

Issues

This draft assumes a mono-lingual grammar but does not restrict a recognizer from loading and using separate grammars each containing different languages. Specifically, this implies that the xml:lang attribute can be ignored on elements other than "grammar".
Section 5.6 in Future Study describes additional capabilities being considered for supporting multi-lingual grammars.

4.3 Imports

An import is a convenience mechanism for referencing externally defined grammars. An import is effectively a local name -- an alias -- for an external grammar identified by its URI. Rule references (as defined in Section 2.2) can use the alias instead of the URI when referencing rules of the imported grammar. A document should never import two grammars and assign the same local alias.

Note: the import declaration does not copy the referenced grammar. Also, it is not possible to reference externally-defined rules as if they were local rules (using only the simple rulename).

ABNF Form

Zero, one or many "import" declarations follow the optional "language" declaration, but precede the rule definitions in the body of the grammar. The following import statements define local aliases for imported grammars.
 import http://www.example.com/grammar.gram as mygrammar;
 import http://www.grammars.com/cities-states.xml as places;
 
 ... $places.city ...
XML Form

Zero, one or many "import" elements may be contained as the leading elements within a "grammar" element. The "import" elements must precede the "rule" elements. The "import" element is empty. The "uri" and "name" attributes are required.
 <import uri="http://www.example.com/grammar.xml" name="mygrammar"/>
 <import uri="http://www.grammars.com/cities-states.xml" name="places"/>

 ... <ruleref import="places#city"/> ...
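
Since an import is only an alias for a URI, resolving a reference by import amounts to a table lookup. The following sketch shows one way a processor might do this, including a check that no alias is assigned twice. The function names are hypothetical, and the "alias#rulename" syntax is that of the XML "import" attribute described in Section 2.2.

```python
def build_alias_table(imports):
    """Build an alias table from a list of (uri, alias) pairs."""
    table = {}
    for uri, alias in imports:
        if alias in table:
            # A document should never assign the same alias to two grammars.
            raise ValueError('alias %r is assigned twice' % alias)
        table[alias] = uri
    return table

def resolve_import_reference(reference, table):
    """Resolve "alias#rulename" to the URI of the rule it refers to."""
    alias, separator, rulename = reference.partition('#')
    if not separator or not rulename:
        raise ValueError('expected alias#rulename: %r' % reference)
    if alias not in table:
        raise KeyError('unknown import alias: %r' % alias)
    return '%s#%s' % (table[alias], rulename)

aliases = build_alias_table([
    ('http://www.example.com/grammar.xml', 'mygrammar'),
    ('http://www.grammars.com/cities-states.xml', 'places'),
])
print(resolve_import_reference('places#city', aliases))
```

The resolved value has the same form as an external reference by URI, which reflects the fact that the import declaration does not copy the referenced grammar.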

Issues

The current specification allows a grammar in ABNF format to reference/import a grammar in the XML format and vice versa. There is no clear reason to prevent this behaviour unless stated otherwise in a compliance document.

4.4 Comments

Comments may be placed in most places in a grammar document. For XML, use XML comments. For ABNF, we allow documentation comments and C/C++/Java-style comments. Thus, comments in the XML and ABNF formats are not directly mappable since the ABNF form has more detailed comments.

ABNF Form

C/C++/Java comments are permitted. Documentation comments are permitted before grammar, language and import declarations and before each rule definition.
// C++/Java-style single-line comment
/* C/C++/Java-style comment */
/** Java-style documentation comment */
XML Form

An XML comment has the following syntax.
<!-- XML comment -->

Issues

There's an inconsistency between XML and ABNF comments for documentation.

5. Future Study

5.1 Augmented BNF and/or XML Form

The W3C Voice Browser Working Group is closely studying the issue of whether the final grammar format specification should include the Augmented BNF form, the XML form or both. We are very interested in comments of reviewers regarding this issue. The following are some of the issues that have been identified.

XML Pro: leverage the ever-expanding set of tools for generating, manipulating, transforming and parsing XML documents.
ABNF Pro: grammars are generally more human-readable in ABNF format.
XML Pro: Appendix D provides a style sheet (XSL) that transforms an XML grammar document into the ABNF format. The reverse transformation is possible with an automated tool but requires a parser that is specific to ABNF.
ABNF & XML: both forms permit the embedding of entire grammars or partial grammars into parent XML documents. In particular, it is a requirement that grammars be embeddable into Dialog Markup Language documents (a specification being concurrently developed by the Working Group).
ABNF Pro: BNF formats are widely used in the speech recognition community and in existing products. They are known to serve the intended purpose well and are understood by a significant body of developers.

5.2 Semantic Interpretation

A speech recognition grammar defines what a user can say. Technically, it defines the syntax of the spoken input that can be heard by a speech recognizer.

The W3C Voice Browser Working Group is currently working on a draft for a Natural Language Semantics specification which will represent interpreted spoken input: what a user means.

The group has initiated work on defining a mechanism by which the semantic interpretation for a spoken input sentence can be derived from the sequence of spoken words and the grammar(s) that it matches. A draft proposal is planned for the next release of this document. We are interested in comments and requirements from reviewers of this document.

The group is currently exploring means by which semantic interpretation can be attached to the grammar using the "tagging" mechanisms defined in this document (specifically in Section 2.6). It must be possible to represent the semantic result in the NL Semantic format and it must also be possible to use the semantic result in the processing of the Dialog Markup Language. The first release is intended to support stateless interpretation of spoken input. The following are amongst the approaches under consideration.

Simple tags: interpret a tag as a value string that represents the meaning of the object to which it is attached.
Action tags: embed a scripting language (e.g. ECMAScript) in the tags. It should have access to spoken words, to matched sub-rules and to other information which assists in semantic interpretation. The return value should represent the meaning of each defined rule.
Declarative tags: the tag element can contain constructs to generate semantic representation in the NL semantic markup language. The constructs should follow the DOM document generating principles outlined in Sec. 7 of W3C XSLT recommendation, "Creating the result tree". Each CFG rule should be regarded and treated as a template rule in the XSLT context. As such, the plain text output is a special case of using declarative tags by adhering strictly to Sec. 7.2, "creating text" of the XSLT specifications.

Specific proposals and general requirements for semantic interpretation are welcomed.

5.3 Statistical Language Models: n-grams

The current specification is restricted to representations in the form of regular grammars and context-free grammars. Some classes of speech recognizer additionally support statistical language models (technically, n-grams) that represent legal patterns of words by statistical occurrence.

The W3C Voice Browser Working Group is currently drafting a proposal for an n-gram specification planned for release in the next few months.

The group welcomes comments and requirements for this document.

5.4 Dynamic Grammars

The current specification makes no statement about when grammars are loaded into a voice browser or speech recognizer. Furthermore, the current specification makes no statement about how or when the definition of a grammar can be modified after its initial loading. The following are issues under consideration.

A static vs. volatile attribute might be attached to individual rules or to an entire grammar. Where rules are known to be static or volatile some speech recognition systems are able to optimize run-time performance. A boolean "static" or "volatile" attribute may be attached to any rule element in the XML form or to the grammar element to indicate the default value for the grammar. In the ABNF format a "static" or "volatile" keyword would be permitted on the equivalent rule and grammar declarations.
To reduce the size of grammar files when dynamic changes are made, some consideration has been given to allowing a document to define only those rules which are modified. Unchanged rules may be stubbed out or omitted.

5.5 Embedding Partial Rule Definitions

The current specification is intended to support the embedding of fully-defined grammars into parent documents, in particular, into the Dialog Markup Language currently in development.

It is also desirable that it be possible to embed just the fragment of a rule definition that represents the right side of a rule definition. This could be any legal combination of the entities defined in Section 2. There is nothing in the current specification that prohibits this, but study of the namespace issues is required.

For the ABNF form, the embedded grammar may look like the following:

<grammar> apple | melon | banana | peach </grammar>

For the XML form, the embedded grammar may look like the following:

<grammar xmlns:nl='http://www.example.com/Voice/grammar/schema'>
  <choice> 
     <item>apple</item>
     <item>melon</item>
     <item>banana</item>
     <item>peach</item>
  </choice> 
</grammar>

Please note that the namespace is fictional and will change.
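
The correspondence between the two embedded forms above can be illustrated with a short Python sketch. It handles only a flat list of alternatives separated by "|" and omits the namespace declaration. The function name abnf_alternatives_to_xml is hypothetical and not part of this specification.

```python
import xml.etree.ElementTree as ET


def abnf_alternatives_to_xml(expansion: str) -> ET.Element:
    """Convert a flat ABNF alternatives list into a <grammar><choice> tree.

    Assumption: the expansion contains only tokens separated by "|",
    with no nesting, weights, tags, or rule references.
    """
    grammar = ET.Element("grammar")
    choice = ET.SubElement(grammar, "choice")
    for alternative in expansion.split("|"):
        item = ET.SubElement(choice, "item")
        item.text = alternative.strip()
    return grammar


embedded = abnf_alternatives_to_xml("apple | melon | banana | peach")
print(ET.tostring(embedded, encoding="unicode"))
```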

5.6: Multi-lingual grammars

The current specification limits grammars to containing a single language (see Section 4.2). There is, however, no prohibition against grammars in more than one language being loaded into a voice browser or speech recognizer simultaneously if the platform supports all the languages.

The working group is considering extending both the ABNF and XML formats to support more than one language in a single grammar document. It may be possible to attach a language identifier to each rule definition, to some or any of the rule expansions defined in Section 2, or specifically to any token. The following is one way in which per-token language identifiers could be attached (using RFC 1766 values).

For the ABNF form, augment tokens with the language identifier separated by a special character. This example includes the French, English and Japanese words for "yes".

oui!fr | yes!en | hai!ja

For XML, introduce a token element with the standard xml:lang attribute. An alternative might be to allow xml:lang on any element, but this introduces interpretation complexities in trees of grammar entities in which the language can be different at each node and leaf.

<choice> 
  <item><token xml:lang="fr">oui</token></item>
  <item><token xml:lang="en">yes</token></item>
  <item><token xml:lang="ja">hai</token></item>
</choice>

For some words, especially proper names, the same orthographic text is used in multiple languages. In these cases it may also be desirable to allow attachment of multiple locales to the same token. For example:

Robert!en,fr | Roger!en,fr
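
As an illustration only, the per-token annotation shown in the ABNF examples above could be read as in the following Python sketch. It assumes "!" as the separator character and "," between multiple identifiers, exactly as in the examples; neither is settled syntax. The function names are hypothetical.

```python
def parse_token(annotated: str) -> tuple[str, list[str]]:
    """Split "token!lang1,lang2" into the token and its language identifiers.

    Assumption: "!" separates the token from its identifiers and ","
    separates multiple identifiers, as in the illustrative examples.
    A token without "!" yields an empty identifier list.
    """
    token, _, langs = annotated.partition("!")
    return token.strip(), [lang for lang in langs.strip().split(",") if lang]


def parse_alternatives(expansion: str) -> list[tuple[str, list[str]]]:
    """Parse a flat list of alternatives separated by "|"."""
    return [parse_token(part) for part in expansion.split("|")]


print(parse_alternatives("oui!fr | yes!en | hai!ja"))
print(parse_alternatives("Robert!en,fr | Roger!en,fr"))
```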

5.7: Phonemic pronunciations for tokens

For many words, the written form does not accurately indicate the correct pronunciation of the word. For example, in languages that use the Chinese character set a single character may have many pronunciations amongst which only one might make sense in a given context. Similarly, written forms such as abbreviations, acronyms, proper names, and foreign words do not always reliably indicate correct pronunciation.

Because a recognizer needs to know a word's pronunciation to be able to hear it, the Working Group is considering an enhancement to both the ABNF and XML grammar formats to allow a grammar document to explicitly specify pronunciations. This mechanism may be supported in addition to any existing platform mechanism for supporting vocabularies and pronunciations. It is expected that, if pronunciations are supported, they will be optional and will use a format similar to the pronunciation element defined in the parallel specification for the Speech Synthesis Markup Language (e.g. supporting the same phonetic alphabets, including the International Phonetic Alphabet).

For the ABNF form, augment tokens with the pronunciation.

// Following ":" is the US pronunciation as IPA characters
tomato:t&#x252;m&#x251;to&#x28A;

For XML, add a token element with optional phoneme and phonetic alphabet attributes.

<!-- The attribute provides the US pronunciation as IPA -->
<token phoneme="t&#x252;m&#x251;to&#x28A;">tomato</token>

5.8: Tag element

Both the XML and ABNF forms in the current specification permit tags to be attached to any rule expansion (see Section 2.6). In the XML form the tag is attached to an expansion as an attribute. In the ABNF form the tag is attached to a legal rule expansion as a post-fix entity contained within curly braces. In both forms the content of the tag is an arbitrary string; however, the Working Group expects that semantic attachment will be an important special-case use of tags in a future revision of this specification (see Section 5.2).

The Working Group plans to introduce a tag element for the XML form of the specification in its next release. This would complement the existing tag attribute. The separate element should have the following advantages:

CDATA contained within the tag element would have fewer formatting constraints than tags attached as an attribute.
Multi-line tags would be easier to read.
The tag data could contain arbitrary XML data, most likely in a different XML namespace.

There are two forms in which the tag element could be attached to the existing rule expansion elements.

The tag element could contain the rule expansion to which it is attached. A downside of this approach is that the tag CDATA would need to be separated from the expansion so an additional element may be needed for the tag data.
Tag elements could be interpreted as if attached to the element that contains them. Some of the design choices available include (1) allow a single tag element as the first contained sub-element, (2) allow a single tag element as the last contained sub-element, (3) allow multiple tag elements within a single parent element. In all cases, there may also be a tag attribute on the parent element and for (3) there is explicitly an ability to attach multiple tags: multiple tags will require an explicit interpretation model.

5.9: Robust Recognition

The Working Group plans to consider constructs that address certain problems encountered in robust speech recognition. Issues that may be considered include handling of out-of-vocabulary input, disfluencies and noise management.

5.10: Top-level rules

Many existing speech recognition grammar formats explicitly or implicitly treat one rule defined within the grammar as a special "top-level" rule. The top-level rule is the one that is activated for recognition or is the rule that may be referenced from within other grammars. In the current specification, this is similar to marking a single rule as "public".

The Working Group has discussed extending the rule naming and referencing semantics to support a "top-level" rule. This may be achieved by marking one or more rules as "top-level" or by adding a top-level declaration to a grammar which names one or more top-level rules (e.g. as an attribute of the grammar element in the XML format).

The referencing semantics have not yet been resolved. In particular, if grammar "X" contains top-level rules called "Y" and "Z", then an external reference to "$X" would intuitively suggest a reference to the top-level rules. However, "$X" appears to be a local rule reference. To work around this ambiguity the "toplevel" name could be reserved for this special purpose.

Comments are welcomed on whether the top-level representation is useful in developing applications, and on how to define an appropriate syntax.

6. Acknowledgements

This document was written with the participation of the members of the W3C Voice Browser Working Group (listed in alphabetical order):

Mike Brown, Lucent Bell Labs
Dan Burnett, Nuance Communications
Andrew Hunt, SpeechWorks International
Bruce Lucas, IBM
Scott McGlashan, PipeBeach
Dave Raggett, HP
Kuansan Wang, Microsoft

Appendix A: Example Grammars in ABNF and XML Forms

The following shows a simple grammar that supports commands such as "open a file" and "please move the window". It references a separately-defined grammar for politeness which is not shown here.

ABNF Form

#ABNF V1.0 ISO8859-1;

language en;

import http://www.sayplease.com/politeness.xml as polite;

/**
 * Basic command.
 * @example please move the window
 * @example open a file
 */

public $basicCmd = $polite.startPolite $command $polite.endPolite;

$command = $action $object;
$action = /10/ open {OPEN} | /2/ close {CLOSE}
                 | /1/ delete {DELETE} | /1/ move {MOVE};
$object = [the | a] (window | file | menu);

XML Form

<?xml version="1.0"?>

<grammar xml:lang="en">

<import uri="http://www.sayplease.com/politeness.xml" name="polite"/>

<rule id="basicCmd" scope="public">
  <example> please move the window </example>
  <example> open a file </example>

  <ruleref import="polite#startPolite"/>
  <ruleref uri="#command"/>
  <ruleref import="polite#endPolite"/>
</rule>

<rule id="command">
  <ruleref uri="#action"/> <ruleref uri="#object"/>
</rule>

<rule id="action">
   <choice>
      <item weight="10" tag="OPEN">   open </item>
      <item weight="2"  tag="CLOSE">  close </item>
      <item weight="1"  tag="DELETE"> delete </item>
      <item weight="1"  tag="MOVE">   move </item>
    </choice>
</rule>

<rule id="object">
  <count number="optional">
    <choice> <item> the </item> <item> a </item> </choice>
  </count>
  <choice>
      <item> window </item>
      <item> file </item>
      <item> menu </item>
  </choice>
</rule>

</grammar>
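
To make the coverage of this grammar concrete, the following Python sketch enumerates every phrase matched by the command rule together with the tag of its action. It is an illustration under stated assumptions: the rules are transcribed by hand rather than parsed, weights and the imported politeness rules are ignored, and the tags follow the XML form. The name command_phrases is hypothetical.

```python
from itertools import product

# Alternatives of the action rule with their tags (weights are ignored here).
ACTIONS = {"open": "OPEN", "close": "CLOSE", "delete": "DELETE", "move": "MOVE"}
# The determiner is optional; the empty string stands for its omission.
DETERMINERS = ["", "the", "a"]
OBJECTS = ["window", "file", "menu"]


def command_phrases():
    """Yield (phrase, tag) pairs for the rule: command = action object."""
    for (action, tag), det, obj in product(ACTIONS.items(), DETERMINERS, OBJECTS):
        words = [w for w in (action, det, obj) if w]
        yield " ".join(words), tag


phrases = dict(command_phrases())
print(len(phrases))
print(phrases["open a file"], phrases["move the window"])
```

The four actions, three determiner options and three objects give 4 x 3 x 3 = 36 distinct phrases before the politeness rules are applied.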

The next two grammars, an imported grammar and the grammar that imports it, are each shown in both ABNF and XML formats.

ABNF: http://www.example.com/places.gram

#ABNF V1.0 ISO8859-1;

language en;

// No imports in this grammar.

public $city = Boston | Philadelphia | Fargo;

public $state = Florida | Idaho | New York;

// References to local rules
// Artificial example allows "Boston, Florida!"

public $city_state = $city $state;

ABNF: http://www.example.com/booking.gram

#ABNF V1.0 ISO8859-1;

language en;

import http://www.example.com/places.xml as someplaces;

// Reference by URI syntax
$flight = I want to fly to $(http://www.example.com/places.xml#city);

// Reference using imported name
$exercise = I want to walk to $someplaces.state;

// Reference using top level import feature
// Issue: $someplaces looks like a local reference
$wet = I want to swim to $someplaces;

XML Grammar: http://www.example.com/places.xml

<?xml version="1.0"?>

<grammar xml:lang="en">

   <rule id="city" scope="public">
     <choice>
       <item>Boston</item> <item>Philadelphia</item> <item>Fargo</item>
     </choice>
   </rule>

   <rule id="state" scope="public">
     <choice>
       <item>Florida</item> <item>Idaho</item> <item>New York</item>
     </choice>
   </rule>

   <!-- Reference by URI to a local rule -->
   <!-- Artificial example allows "Boston, Florida"! -->
   <rule id="city_state" scope="public">
     <ruleref uri="#city"/> <ruleref uri="#state"/>
   </rule>
</grammar>

XML Grammar: http://www.example.com/booking.xml

<?xml version="1.0"?>

<grammar xml:lang="en">
   <import uri="http://www.example.com/places.xml" name="someplaces"/>

   <!-- Using URI syntax -->
   <rule id="flight">
     I want to fly to <ruleref uri="http://www.example.com/places.xml#city"/>
   </rule>

   <!-- Using import syntax -->
   <rule id="exercise">
     I want to walk to <ruleref import="someplaces#state"/>
   </rule>

   <!-- Using import syntax to the toplevel import -->
   <rule id="wet">
     I want to swim to <ruleref import="someplaces"/>
   </rule>
</grammar>

Appendix B: Sample DTD for the XML Format

The DTD has the following known limitations.

The count, item and choice elements may be empty. The specification does not describe the interpretation for empty versions of these elements. We could modify the DTD (bar emptiness) or modify the document.
"NULL" and "VOID" are not explicitly listed as values of a "ruleref".

<?xml version="1.0" encoding="ISO-8859-1"?>

<!-- Speech Recognition Grammar v0.5 20000615 -->

<!ENTITY % rule-expansion " #PCDATA | ruleref | item | choice | count " >

<!ELEMENT ruleref EMPTY>
<!ATTLIST ruleref
     uri CDATA #IMPLIED
     import CDATA #IMPLIED
     tag CDATA #IMPLIED>

<!ELEMENT choice (item)*>
<!ATTLIST choice
     tag CDATA #IMPLIED>

<!ELEMENT item ( %rule-expansion; )*>
<!ATTLIST item
    weight NMTOKEN #IMPLIED
    tag CDATA #IMPLIED>

<!ELEMENT count ( %rule-expansion; )*>
<!ATTLIST count
    number CDATA #IMPLIED
    tag CDATA #IMPLIED>

<!ELEMENT rule ( %rule-expansion; | example )*>
<!ATTLIST rule 
    id ID #REQUIRED
    scope (private | public) "private">

<!ELEMENT example (#PCDATA)>

<!ELEMENT import EMPTY>
<!ATTLIST import
    uri CDATA #REQUIRED
    name CDATA #REQUIRED>

<!ELEMENT grammar (import*,rule*)>
<!ATTLIST grammar
    xml:lang CDATA #REQUIRED>

Appendix C: Formal Syntax for Augmented BNF

A future revision of this document will include a formal specification of the syntax of the Augmented BNF format.

Appendix D: Sample Style Sheet to Convert XML to ABNF

<?xml version="1.0"?> 

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:strip-space elements="import rule ruleref example item choice count token"/>

<xsl:output method="text"/>

<xsl:template name="addtag">
<xsl:if test="string(@tag)!=''">{<xsl:value-of select="@tag"/>}</xsl:if>
</xsl:template>

<xsl:template name="addweight">
<xsl:if test="string(@weight)!=''">/<xsl:value-of select="@weight"/>/</xsl:if>
</xsl:template>

<xsl:template match="grammar">
#ABNF V1.0 <xsl:value-of select="system-property('xsl:encoding')"/>;
<xsl:text> </xsl:text>
language <xsl:value-of select="@xml:lang"/>;
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="import">
import <xsl:value-of select="@uri"/> as <xsl:value-of select="@name"/>;
</xsl:template>

<xsl:template match="rule">
<xsl:value-of select="@scope"/> $<xsl:value-of select="@id"/> = 
<xsl:apply-templates/>
;
</xsl:template>

<xsl:template match="ruleref">
<xsl:choose>
<xsl:when test="string(@import)!=''">
$<xsl:value-of select="translate(@import,'#','.')"/>
</xsl:when>
<xsl:otherwise>
<xsl:choose>
<xsl:when test="starts-with(string(@uri),'#')">
$<xsl:value-of select="substring-after(@uri,'#')"/>
</xsl:when>
<xsl:otherwise>
$(<xsl:value-of select="@uri"/>)
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
<xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="example">
</xsl:template>

<xsl:template match="choice">
(
<xsl:apply-templates/>
)<xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="item">
<xsl:apply-templates/> 
<xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="choice/item">
<xsl:call-template name="addweight"/>
<xsl:apply-templates/> 
<xsl:call-template name="addtag"/>
<xsl:if test="not(position()=last())">|</xsl:if>
</xsl:template>

<xsl:template match="count[@number='optional']|count[@number='?']">
[<xsl:apply-templates/>
]<xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="count[@number='0+']">
(<xsl:apply-templates/>
)*<xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="count[@number='1+']">
(<xsl:apply-templates/>
)+<xsl:call-template name="addtag"/>
</xsl:template>

</xsl:stylesheet>
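
The three cases handled by the ruleref template of the style sheet can be restated as a small Python sketch. This is an illustration only: the function name ruleref_to_abnf and its parameter names are hypothetical. Unlike the style sheet, it raises an error when neither attribute is present.

```python
def ruleref_to_abnf(uri=None, import_ref=None):
    """Render one <ruleref> as an ABNF rule reference.

    Mirrors the three cases of the style sheet's ruleref template:
    an import attribute, a local "#name" URI, and any other URI.
    """
    if import_ref:
        # import="polite#startPolite" becomes $polite.startPolite
        return "$" + import_ref.replace("#", ".")
    if uri is None:
        raise ValueError("ruleref needs a uri or an import attribute")
    if uri.startswith("#"):
        # uri="#command" becomes the local reference $command
        return "$" + uri[1:]
    # Any other URI is wrapped in $( ... )
    return "$(" + uri + ")"


print(ruleref_to_abnf(import_ref="polite#startPolite"))
print(ruleref_to_abnf(uri="#command"))
print(ruleref_to_abnf(uri="http://www.example.com/places.xml#city"))
```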

Appendix E: Requirements Analysis

The W3C Voice Browser Working Group previously published Grammar Representation Requirements for Voice Markup Languages. This draft specification largely follows those requirements. The group plans to revise the requirements document and to ensure that the requirements and the specification are matched.
