Speech Recognition Grammar Specification
for the W3C Speech Interface Framework

W3C Working Draft 3 January 2001

This version: http://www.w3.org/TR/2001/WD-speech-grammar-20010103
Latest version: http://www.w3.org/TR/speech-grammar
Previous version: http://www.w3.org/TR/2000/WD-grammar-spec-20000710
Editors: Andrew Hunt, SpeechWorks International; Scott McGlashan, PipeBeach

Abstract

This document defines syntax for representing grammars for use in speech recognition so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and allow automatic transformations between the two forms.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is the 3 January 2001 last call Working Draft of the "Speech Grammar Markup Language Specification". This last call review period ends 31 January 2001. You are encouraged to subscribe to the public discussion list <www-voice@w3.org> and to mail in your comments before the review period ends. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). A public archive is available online.

This specification describes markup for grammars for use in speech recognition, and forms part of the proposals for the W3C Speech Interface Framework. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). Note that the name of the document has been changed from "grammar-spec" to "speech-grammar" to match how people have been referring to the specification.

To help the Voice Browser Working Group build an implementation report (as part of advancing the document on the W3C Recommendation Track), you are encouraged to implement this specification and to indicate to W3C which features have been implemented and any problems that arose.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Working Drafts as other than "work in progress". A list of current public W3C Working Drafts can be found at http://www.w3.org/TR/.

1. Introduction
- 1.1 Grammar Processors
- 1.2 Scope
2. Rule Expansions
- 2.1 Tokens
- 2.2 Rule References
  - 2.2.1 Local References
  - 2.2.2 External Reference by URI
  - 2.2.3 External Reference by Import
  - 2.2.4 Special Rules
  - 2.2.5 Referencing N-gram Documents
- 2.3 Sequences
- 2.4 Alternatives
- 2.5 Counts
- 2.6 Tags
- 2.7 Language and Lexicon
- 2.8 Precedence
3. Rule Definitions
- 3.1 Basic Rule Definition
- 3.2 Scoping of Rule Definitions
- 3.3 Example Phrases
4. Grammar Documents
- 4.1 Grammar Header and Character Encoding
- 4.2 Grammar Locale
- 4.3 Grammar Mode
- 4.4 Root Rule Declaration
- 4.5 Imports
- 4.6 Comments
5. Conformance
6. Future Study
- 6.1 Semantic Interpretation
- 6.2 Morphology and Tokens
- 6.3 Dynamic Grammars
- 6.4 Embedding Partial Rule Definitions
- 6.5 Phonemic Pronunciations for Tokens
- 6.6 Lexicons
- 6.7 Tag Element
- 6.8 MIME Types
7. Acknowledgements
Appendix A. Example Grammar in ABNF and XML Forms
Appendix B. DTD for the XML Format
Appendix C. Formal Syntax for Augmented BNF
Appendix D. DTMF Grammars
Appendix E. XSLT Style Sheet to Convert XML Grammars to the ABNF Form

1. Introduction

This document defines the syntax for grammar representation. The grammars are intended for use by speech recognizers and other grammar processors so that developers can specify the words and patterns of words to be listened for by a speech recognizer.

The syntax of the grammar format is presented in two forms, an Augmented BNF (ABNF) syntax and an XML syntax. The specification ensures that the two representations are directly mappable to allow automatic transformations between the two forms.

Augmented BNF syntax (ABNF): this is a plain-text (non-XML) representation which is similar to traditional BNF grammar and to many existing BNF-like representations commonly used in the field of speech recognition including the JSpeech Grammar Format from which this specification is derived. Augmented BNF should not be confused with Extended BNF which is used in DTDs for XML and SGML.
XML: This syntax uses XML elements to represent the grammar constructs and adapts designs from the PipeBeach grammar, TalkML and a research XML variant of the JSpeech Grammar Format.

Section 2, Section 3 and Section 4 define the ABNF and XML grammar formats. Readers may find the Examples in Appendix A instructive in understanding the specification. Section 5 defines the Conformance criteria for grammar documents and for grammar processors such as speech recognizers. Section 6 identifies a number of areas of Future Study for the grammar specification under consideration by the W3C Voice Browser Working Group.

This W3C Standard is known as the Speech Recognition Grammar Specification and is based upon the JSpeech Grammar Format (JSGF) specification, which is owned by Sun Microsystems, Inc., California, U.S.A.

1.1 Grammar Processors

A grammar processor is any entity that accepts as input grammars as described in this specification. As the specification title implies, speech recognizers are considered to be an important class of grammar processor. Another class of grammar processor anticipated by this specification is a DTMF detector (see Section 4.3 and Appendix D).

For simplicity, throughout this document references to a speech recognizer apply to other types of grammar processor unless explicitly stated otherwise.

A speech recognizer is a grammar processor with the following inputs and outputs:

Input: A grammar or multiple grammars as defined by this specification. These grammars inform the recognizer of the words and patterns of words to listen for.
Input: An audio stream that may contain speech content that matches the grammar(s).
Output: Descriptions of results that indicate details about the speech content detected by the speech recognizer. The format and details of the content of the result are outside the scope of this specification. For informative purposes, most practical recognizers will include at least a transcription of any detected words.
Output: Error and other performance information may be provided to the host environment: e.g. to a Voice Browser that incorporates a grammar processor. The method of interaction with the host environment is outside the scope of this document. The specification does, however, require that a conformant grammar processor inform the environment of errors in parsing and other processing of grammar documents.

1.2 Scope

The primary use of a speech recognizer grammar is to permit a speech application to indicate to a recognizer what it should listen for, specifically:

Words that may be spoken,
Patterns in which those words may occur,
Language of the spoken words.

Many speech recognizers also support the Speech Recognition N-Gram Grammar Specification. Both specifications define ways to set up a speech recognizer to detect spoken input but define the words and patterns of words by different and complementary means. Some recognizers permit cross-references between grammars in the two formats. The rule reference element of this specification describes how to reference an N-gram document.

The grammar specification does not address a number of other issues that affect speech recognition performance. Most of the following capabilities are addressed by the context in which a grammar is referenced or invoked: for example, through the Dialog Markup Language or through a Speech Recognizer API.

Speaker adaptation data: some speech recognizers support the ability to dynamically adjust to the voice of a speaker and often the ability to store adaptation data for that voice for future use. The speaker data may also include lists of words more often spoken by the user. The grammar format does not explicitly address these capabilities.
Speech recognizer configuration: the grammar format does not incorporate features for setting recognizer features such as timeouts, recognition thresholds, search sizes or N-best result counts.
Lexicon: the grammar format does not address the loading of lexicons or the pronunciation of words referenced by the grammar. The W3C Voice Browser Working Group is considering the development of a standard lexicon format. If and when a format is developed appropriate updates will be made to this grammar specification.

2. Rule Expansions

A rule expansion is a regular expression that defines patterns of tokens, rule references and combinations of these.

2.1: Token
2.2: Rule Reference
2.3: Sequence
2.4: Alternatives
2.5: Counts
2.6: Tags
2.7: Language and Lexicon
2.8: Precedence

2.1 Token

A token (aka, a terminal symbol) is the part of a grammar that defines words or other entities that may be spoken. In both the XML and ABNF forms any unmarked text is a token. The grammar format assumes that tokens can be resolved as lexicon entries and associated pronunciations by the recognizer.

ABNF Form

Any plain text is a token. Tokens are delimited by white-space or by symbols with special syntactic function (e.g. ; = | * + <> () [] {} /* */ //). Tokens may be explicitly quoted if they contain white-space or special symbols.
hello
bon voyage
this is a test    (sequence of four tokens)
"San Francisco"
2
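
As an informative illustration of the delimiting rule above, the following Python sketch splits a plain ABNF token sequence into tokens. The function name `split_tokens` is hypothetical and not part of this specification. The sketch assumes the input contains only white-space-separated and double-quoted tokens; the special symbols and comments are deliberately not handled.

```python
import re

# Either a double-quoted run (group 1, quotes removed) or a run of
# non-white-space characters (group 2).
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_tokens(text):
    """Split a plain token sequence, keeping quoted text as one token."""
    tokens = []
    for match in _TOKEN.finditer(text):
        quoted, bare = match.group(1), match.group(2)
        tokens.append(quoted if quoted is not None else bare)
    return tokens
```

Under these assumptions, `this is a test` yields four tokens and `"San Francisco"` yields a single token containing a space.
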
XML Form

Follows the same style as the ABNF tokens including the same use of quotes. XML delimiters act as token delimiters.

As an alternative to quoting tokens, in the XML form a token may be explicitly delimited by a "token" element. The content of the element is PCDATA representing a single token. The permitted attributes are "xml:lang" and "lexicon" as described in Section 2.7.
hello
bon voyage
this is a test    (sequence of four tokens)
"San Francisco"
<token xml:lang="en-US">San Francisco</token>
2

Issues

Need to define handling for tokens like "9", "+", "&", "'", """, "<". Usually it's better if developers use full words (i.e. "plus", "ampersand" or "and"...).

2.2 Rule Reference

Rulenames: Every rule definition has a local name that must be unique within the scope of the grammar in which it is defined. Legal rule names must be legal XML IDs as defined in the XML specification as the "Name" production in Section 2.3. Section 3.1 documents the rule definition mechanism and the legal naming of rules.

This table summarizes the various forms of rule reference that are possible within and across grammar documents.

Reference type	ABNF Form	XML Form
2.2.1: Local rule reference	`$rulename`	`<ruleref uri="#rulename"/>`
2.2.2: Reference to a named rule of a grammar identified by a URI	`$(grammarURI#rulename)`	`<ruleref uri="grammarURI#rulename"/>`
2.2.2: Reference to the root rule of a grammar identified by a URI	`$(grammarURI)`	`<ruleref uri="grammarURI"/>`
2.2.3: Reference to a named rule of a grammar identified by an import alias (alias for a URI)	`$$alias#rulename`	`<ruleref import="alias#rulename"/>`
2.2.3: Reference to the root rule of a grammar identified by an import alias (alias for a URI)	`$$alias`	`<ruleref import="alias"/>`
2.2.4: Special rule definitions	`$NULL $VOID $GARBAGE`	`<ruleref special="#NULL"/> <ruleref special="#VOID"/> <ruleref special="#GARBAGE"/>`
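
The correspondence in the table can be sketched as a small conversion routine. The following Python example is informative only: the function name `abnf_ref_to_xml` is hypothetical, the input is assumed to be a single well-formed ABNF rule reference, and no validation of rulenames or URIs is attempted.

```python
from xml.sax.saxutils import quoteattr

# Rulenames with a reserved meaning (Section 2.2.4).
SPECIAL_RULES = {"NULL", "VOID", "GARBAGE"}


def abnf_ref_to_xml(ref):
    """Map one ABNF rule reference to the equivalent XML ruleref element."""
    if ref.startswith("$$"):
        # Reference by import alias: $$alias or $$alias#rulename
        return "<ruleref import=%s/>" % quoteattr(ref[2:])
    if ref.startswith("$(") and ref.endswith(")"):
        # Reference by URI: $(grammarURI) or $(grammarURI#rulename)
        return "<ruleref uri=%s/>" % quoteattr(ref[2:-1])
    if ref.startswith("$") and len(ref) > 1:
        name = ref[1:]
        if name in SPECIAL_RULES:
            return "<ruleref special=%s/>" % quoteattr("#" + name)
        # Local reference: the rulename becomes a URI fragment.
        return "<ruleref uri=%s/>" % quoteattr("#" + name)
    raise ValueError("not an ABNF rule reference: %r" % ref)
```

The order of the checks matters: `$$` must be tested before the single `$` prefix, otherwise an import reference would be mistaken for a local one.
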

2.2.1 Local References

When referencing rules defined locally (that is, defined in the same grammar that contains the reference), always use a simple rulename reference, which consists of the local rulename only. The ABNF and XML forms have different syntax for representing a simple rulename reference.

ABNF Form

The simple rulename reference is prefixed by a "$" character.
$city
$digit
XML Form

The "ruleref" element is an empty element with a "uri" attribute that specifies the rule reference as a fragment.
<ruleref uri="#city"/>
<ruleref uri="#digit"/>

2.2.2 External Reference by URI

References to rules defined in other grammars are legal under the conditions defined in Section 3. The external reference must identify the external grammar and may identify a specific rule within that grammar. If the rulename fragment is omitted then the reference targets the "root" rule of the external grammar.

ABNF Form

The URI for the external grammar and the optional rulename fragment are enclosed in parentheses following a "$" symbol.
// References to specific rules of an external grammar
$(http://www.mygrammars.com/world-cities.gram#canada)
$(http://www.example.com/numbers.gram#digit)

// Reference to the root rule of an external grammar
$(../date.gram)
XML Form

The "ruleref" element is an empty element with a "uri" attribute that optionally specifies the rule reference as a fragment.

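<!-- References to specific rules of an external grammar -->
<ruleref uri="http://www.grammars.com/world-cities.xml#canada"/>
<ruleref uri="http://www.example.com/numbers.xml#digit"/>

<!-- Reference to the root rule of an external grammar -->
<ruleref uri="../date.gram"/>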
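
The split between the grammar URI and the optional rulename fragment can be illustrated with the Python standard library. This is an informative sketch: the function name `resolve_reference` is hypothetical, and resolving a relative reference against the URI of the referencing grammar is an assumption of the sketch, not a requirement stated in this section.

```python
from urllib.parse import urldefrag, urljoin


def resolve_reference(reference, base_uri):
    """Return (grammar_uri, rulename); rulename is None for the root rule."""
    # Assumption: relative references resolve against the referencing grammar.
    absolute = urljoin(base_uri, reference)
    grammar_uri, fragment = urldefrag(absolute)
    # An absent fragment means the root rule of the external grammar.
    return grammar_uri, (fragment or None)
```

For example, with the hypothetical base `http://www.example.com/grammars/travel/main.gram`, the reference `../date.gram` resolves to `http://www.example.com/grammars/date.gram` with no rulename, so the root rule is targeted.
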

Issues

There is a proposal to add a "type" attribute to the "ruleref" element to indicate the MIME type of the referenced grammar. Representing the same information in the ABNF form may be difficult.
There is a proposal to add a "mode" attribute to the "ruleref" element to permit a grammar to indicate the type of the grammar that it references. In considering this modification there needs to be study of when a grammar of one mode (e.g. "speech" or "dtmf") would refer to a grammar of the another mode. Representing the same information in the ABNF form may be difficult.

2.2.3 External Reference by Import

Section 4.5 defines import declarations. An import defines a local alias (short-hand reference) for an external grammar identified by its URI. The rule reference syntax has a special mechanism to support reference to rules in grammars that are imported. As with reference by URI, the reference must identify the grammar and may optionally specify the name of a rule defined within that grammar. If the rulename is omitted, the root rule of the grammar is referenced.

ABNF Form

A reference by import consists of the "$$" symbols to mark the reference followed by the import alias and optionally by the fragment separator "#" and the rulename within the imported grammar.
// Reference a specific rule of the imported grammar
$$places#city

// Reference the root rule of the imported grammar
$$places
XML Form

Instead of the "uri" attribute, an import attribute is used with a special syntax (that is intended to look somewhat like a URI with a fragment). The value of the import attribute is the import alias followed optionally by the hash separator "#" and then the rulename within the imported grammar.

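<!-- Reference a specific rule of the imported grammar -->
<ruleref import="places#city"/>

<!-- Reference the root rule of the imported grammar -->
<ruleref import="places"/>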

2.2.4 Special Rules

Several rulenames are defined to have specific interpretation and processing by a speech recognizer. Grammars should not attempt to redefine these rulenames.

$NULL <ruleref special="#NULL"/>
Defines a rule that is automatically matched: that is, matched without the user speaking any word.
$VOID <ruleref special="#VOID"/>
Defines a rule that can never be spoken. Inserting VOID into a sequence automatically makes that sequence unspeakable.
$GARBAGE <ruleref special="#GARBAGE"/>
Defines a rule that matches any speech up until the next rule match, the next token or until the end of spoken input. For example, given definitions of US cities and states, the following ABNF and XML rule definitions match "Philadelphia in the great state of Pennsylvania" as well as simply "Philadelphia Pennsylvania".
```
$location = $city $GARBAGE $state;
```
```
<rule id="location">
  <ruleref uri="#city"/>
  <ruleref special="#GARBAGE"/>
  <ruleref uri="#state"/>
</rule>
```
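
The effect of $GARBAGE can be approximated at the text level with a regular expression. The following Python sketch is informative only: a real recognizer works on audio rather than transcribed text, and the city and state lists, the names `LOCATION` and `matches_location`, and the treatment of garbage as "any run of words" are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the $city and $state rules.
CITY = r"(?:Philadelphia|Boston)"
STATE = r"(?:Pennsylvania|Massachusetts)"

# Text-level approximation of $GARBAGE: zero or more filler words,
# matched lazily so that the following rule is reached as early as possible.
GARBAGE = r"(?:\s+\S+)*?"

LOCATION = re.compile(CITY + GARBAGE + r"\s+" + STATE)


def matches_location(utterance):
    """True if the whole utterance fits $city $GARBAGE $state."""
    return LOCATION.fullmatch(utterance) is not None
```

Both "Philadelphia Pennsylvania" and "Philadelphia in the great state of Pennsylvania" are accepted by this pattern, while "Philadelphia" alone is not.
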

Issues

Unlike the XML form with the "special" attribute, the ABNF form does not syntactically distinguish special rules from other rules. There could be yet another rule reference syntax for special rules: e.g. $NULL$.

2.2.5 Referencing N-gram Documents

The Voice Browser Working Group is developing the Speech Recognition N-Gram Grammar Specification in parallel with this specification. These two specifications represent different and complementary ways of informing a speech recognizer of which words and patterns of words to listen for.

A speech recognizer may choose to support the Speech Recognition N-Gram Grammar Specification in addition to the Speech Recognition Grammar Specification defined in this document.

If a speech recognizer supports both grammar representations it may optionally support references between the two formats. Grammars defined in the ABNF or XML formats may reference start symbols of N-Gram documents and vice versa.

The syntax for referencing an N-Gram is the same as referencing externally defined ABNF or XML grammar documents. Both URI references and import referencing methods are permitted in both the ABNF and XML forms. The fragment identifier (a rulename when referencing ABNF and XML grammars) identifies a start symbol as defined by the N-Gram specification. If the start symbol is absent the N-Gram, as a whole, is referenced as defined in the N-Gram specification.

ABNF Form

URI references and import references to N-Gram documents follow the same syntax as reference to other ABNF/XML grammar documents. The following are examples of URI reference and import reference each with an explicit rule reference and a reference to the root rule.
$(http://www.mygrammars.com/ngram.xml#StartSymbol)
$(http://www.mygrammars.com/ngram.xml)
$$ngram#StartSymbol
$$ngram
XML Form

URI references and import references to N-Gram documents follow the same syntax as reference to other ABNF/XML grammar documents. The following are examples of URI reference and import reference each with an explicit rule reference and a reference to the root rule.
<ruleref uri="http://www.mygrammars.com/ngram.xml#StartSymbol"/>
<ruleref uri="http://www.mygrammars.com/ngram.xml"/>
<ruleref import="ngram#StartSymbol"/>
<ruleref import="ngram"/>

2.3 Sequences

A sequence of legal rule expansions is itself a legal rule expansion.

The sequence of rule expansions in the grammar implies the temporal order in which the expansions must be spoken by the user and detected by the speech recognizer. This constraint applies to sequences of tokens, sequences of rule references, parentheticals and all combinations of these rule expansions.

ABNF Form

A sequence of legal expansions is a white-space-separated string of the concatenated sub-expansions. Where necessary, the sequence can be delimited at the start and end by parentheses.
this is a test           // sequence of tokens
$action $object          // sequence of rule references
the $object is $color    // sequence of tokens and rule references
(fly to $city)           // parentheses for explicit boundaries
XML Form

A sequence of XML rule expansion elements (<ruleref>, <item>, <one-of>, <count>, <token>) and CDATA sections containing space separated tokens must be recognized in temporal sequence. (The only exception is where one or more "item" elements appear within a "one-of" element.)

If necessary an "item" element can surround the elements of a sequence to allow tags or other data to annotate the sequence. (The "weight" attribute of "item" is ignored unless the element appears within a "one-of" element.)

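<!-- sequence of tokens -->
this is a test

<!-- sequence of rule references -->
<ruleref uri="#action"/> <ruleref uri="#object"/>

<!-- sequence of tokens and rule references -->
the <ruleref uri="#object"/> is <ruleref uri="#color"/>

<!-- "item" element for explicit boundaries -->
<item>fly to <ruleref uri="#city"/> </item>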

2.4 Alternatives

A set of alternative rule expansions is itself a legal rule expansion.

Weights may be optionally provided for each alternative expansion. The weights should indicate the occurrence likelihood for each choice. Weights are simple floating point values (nnn[.nnn]) and must be zero or greater. In the absence of weights or if the weights are not properly specified (e.g. one or more missing) then the choices are assumed to be equally likely.

ABNF Form

A set of alternative choices is identified as a list of legal expansions separated by the vertical bar symbol. If necessary, the set of alternative choices may be delimited by parentheses.
Michael | Yuriko | Mary | Duke | $otherNames
(1 | 2 | 3)
A weight is surrounded by forward slashes and placed before each item in the alternatives list.
/10/ small | /2/ medium | /1/ large
/3.1415/ pie | /1.414/ root beer
XML Form

The "one-of" element identifies a set of alternative elements. Each alternative expansion is contained in an "item" element. Weights are optionally indicated by the "weight" attribute on the "item" element.
<one-of>
  <item>Michael</item>
  <item>Yuriko</item>
  <item>Mary</item>
  <item>Duke</item>
  <item><ruleref uri="#otherNames"/></item>
</one-of>

<one-of><item>1</item> <item>2</item> <item>3</item></one-of>

<one-of>
  <item weight="3.1415">pie</item>
  <item weight="1.414">root beer</item>
</one-of>
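
One possible reading of the weight rules is sketched below in Python, for information only. Normalizing weights so that they sum to one is an assumption of this sketch (the specification only says that weights should indicate likelihood), as are the function name `choice_probabilities`, the use of `None` for a missing weight, and the fallback to equal likelihood when all weights are zero.

```python
def choice_probabilities(weights):
    """Turn per-alternative weights into relative likelihoods.

    `weights` holds one entry per alternative; None marks a missing weight.
    """
    count = len(weights)
    if count == 0:
        return []
    # Weights are usable only if every alternative has one and none is negative.
    usable = all(w is not None and w >= 0 for w in weights)
    total = sum(weights) if usable else 0
    if not usable or total == 0:
        # Missing or improperly specified weights: equally likely choices.
        return [1.0 / count] * count
    return [w / total for w in weights]
```

With the weights `/10/ small | /2/ medium | /1/ large`, this sketch gives "small" a relative likelihood of 10/13.
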

2.5 Counts: Optional, *, +

Operators are provided that define a legal rule expansion as being another sub-expansion that is optional, that is repeated zero or more times, or that is repeated one or more times.

ABNF Form

Optional expansions are delimited by square brackets: [...]. The postfix operators, * and +, are attached to expansions that are to be repeated zero or more times, or one or more times respectively.
// "pizza"
// "big pizza with pepperoni"
// "very big pizza with cheese and pepperoni"
[[very] big] pizza ([with | and] $topping)* 

// "1"
// "1234"
$digit +
XML Form

The "count" element has a "number" attribute that indicates the number of times the contained expansion may be repeated. Defined values are "optional" or "?", "0+", "1+".



<count number="optional"> 
   <count number="optional"> very </count>
   big 
</count> 
pizza
<count number="0+">
   <count number="optional">
      <one-of>
         <item>with</item>
         <item>and</item>
      </one-of>
   </count>
   <ruleref uri="#topping"/>
</count>



<count number="1+"> <ruleref uri="#digit"/> </count>
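
Because a rule expansion is a regular expression, the count operators map directly onto the optional, star and plus operators of ordinary regular expressions. The following informative Python sketch builds the pizza example from small helper functions. All helper names, the contents of the hypothetical `$topping` rule, and the convention of matching against space-terminated words are assumptions of the sketch.

```python
import re


def tok(word):
    # Each token consumes the word plus one trailing space.
    return re.escape(word) + " "


def seq(*parts):
    return "".join(parts)


def alt(*parts):
    return "(?:" + "|".join(parts) + ")"


def optional(part):      # ABNF [...]  /  number="optional"
    return "(?:" + part + ")?"


def star(part):          # ABNF *      /  number="0+"
    return "(?:" + part + ")*"


def plus(part):          # ABNF +      /  number="1+"
    return "(?:" + part + ")+"


# Hypothetical definition of $topping for this example.
topping = alt(tok("cheese"), tok("pepperoni"))

# [[very] big] pizza ([with | and] $topping)*
pizza = seq(
    optional(seq(optional(tok("very")), tok("big"))),
    tok("pizza"),
    star(seq(optional(alt(tok("with"), tok("and"))), topping)),
)

# $digit +
digits = plus(alt(*(tok(d) for d in "0123456789")))


def accepts(pattern, utterance):
    """True if the space-separated words of `utterance` match `pattern`."""
    normalized = " ".join(utterance.split()) + " "
    return re.fullmatch(pattern, normalized) is not None
```

The three pizza phrases listed in the ABNF comments are accepted, while "very pizza" is rejected because "very" is only optional inside the optional "big" group.
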

Issues

The number attribute could support values such as "0-3", "6" etc. This will simplify certain grammars (e.g. telephone numbers). In the XML form this changes the set of legal values of the number attribute. In ABNF, the values could be placed in square brackets or between slashes: e.g. $digit /3-6/
Note: There are ways to represent all these count patterns with the existing specification. The change would address clarity/compactness only.

2.6 Tags

A tag is an arbitrary string that may be attached to any legal rule expansion. Tags do not affect the legal word patterns defined by the grammars or the process of recognizing speech input given a grammar.

Tags instead provide information that is typically used in post-processing of speech recognition results that match a grammar (more specifically match rule definitions and rule expansions).

Section 6.1 describes work of the W3C Voice Browser Working Group in parallel to this grammar specification towards defining a semantic interpretation language. That language is likely to be contained within grammar tags.

ABNF Form

A tag is delimited by curly braces and is a postfix attachment to a rule expansion. The number of opening curly braces matches the number of closing curly braces. This is useful when the contained text contains curly braces, for example, when the contained text is a scripting language. Alternatively, contained closing braces may be escaped with a backslash. A backslash must also be escaped with a backslash.
this is a test {tag attached to "test"}
open {action=open;} | close {action=shut;}
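
The brace rules can be illustrated by a small scanner that extracts one tag. This Python sketch is informative and rests on an assumed reading of the text above: unescaped braces inside the tag are taken to nest, so the tag ends at the closing brace that balances the first opening brace, and a backslash makes the next character literal. The function name `read_tag` is hypothetical.

```python
def read_tag(text, start=0):
    """Read a tag beginning at text[start] == "{".

    Returns (tag_content, index_after_closing_brace).
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("a tag must start with '{'")
    depth = 0
    content = []
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep it literally, drop the backslash.
            content.append(text[i + 1])
            i += 2
            continue
        if ch == "{":
            depth += 1
            if depth > 1:          # nested brace belongs to the content
                content.append(ch)
        elif ch == "}":
            depth -= 1
            if depth == 0:         # balances the opening brace: tag ends
                return "".join(content), i + 1
            content.append(ch)
        else:
            content.append(ch)
        i += 1
    raise ValueError("unterminated tag")
```

Under this reading, script-like content such as `{if (x) { y } }` is read as one tag without any escaping.
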
XML Form

A "tag" attribute may be attached to any of the rule expansion elements: "ruleref", "one-of", "item", "count".
this is a <item tag='tag attached to "test"'>test</item>
<one-of>
   <item tag="action=open;"> open </item>
   <item tag="action=shut;"> close </item>
</one-of>

Issues

Section 6.1 outlines ongoing study of a semantic interpretation mechanism for speech grammars.
Section 6.7 outlines a Future Study item in which an XML tag element will be introduced with equivalent capability to the tag attribute.

2.7 Language and Lexicon

In situations where applications target a multilingual user community it may be required that grammars contain words in different languages. For instance, in response to a prompt such as: "Do you want to talk to Michel Tremblay?" (a combination of an English sentence with a French name), the response may be either "yes" or "oui". In this case, it is possible to define two grammars, one each for the English and French responses. However, it is useful and simpler to define a grammar that contains both possibilities.

There is a related challenge for multilingual applications that deal with proper names (people, streets, companies, etc.) that may be spoken with different pronunciations or accents depending upon the language of origin and the speaking language. It is often impossible to predict the language that users will use to pronounce certain tokens. In fact, users may actually use different languages for different words in the same sentence, and in unpredictable ways. For instance, the name "Robert Jones" might be pronounced by a French-speaking user using the French pronunciation for "Robert" but an English pronunciation for "Jones", whereas a mono-lingual English speaker would use the English pronunciation for both words.

Both the ABNF and XML grammar forms support three ways of indicating language and lexicon.

Token attachment: one or more languages may be specified optionally on a per-token basis. If no language is specified for a token then it is inherited from the parent rule expansions that define language, or otherwise from the language declaration of the document. If more than one language is specified then the recognizer should apply pronunciations, phonetic inventory and acoustic models for each language for the token in parallel.
Rule expansion attachment: a single language may be specified for any rule expansion. (Multiple languages may not be attached except on a per-token basis.)
Document declaration: Section 4.2 defines the way to declare the default language of a grammar. Unless a token attachment or rule expansion attachment overrides the document locale declaration, all tokens in the grammar are of the same locale.

Language scoping: language declarations are scoped locally to a document and to a rule definition. In XML terminology, the language attribute is inherited down the document tree. Where a language change encompasses a reference to another grammar, the referenced rule is defined in the language of the referenced grammar, not by the local language at the point of the reference.

Attaching a language identifier to grammar constructs is an indication to the speech recognizer that for the tokens contained within the construct, it should be using pronunciation rules, phonetic inventory and acoustic models corresponding to the specified language identifier. If several languages are specified for a given token, then the pronunciation, phonetic inventory and acoustic models of each language should all be used in parallel.

ABNF Form

For the ABNF form a language identifier is attached to a token or a rule expansion using the exclamation mark as delimiter. The language identifier applies to all tokens within the rule expansion. For token attachment with multiple languages, the language identifiers are comma separated.
#ABNF 1.0 ISO8859-1;

// default grammar language is US English
language en-US;

// single language attachment to tokens
$yes = oui!fr-CA | yes!en-US;

// single language attachment to a rule expansion
$request = May I speak to (Michel Tremblay | André Roy)!fr-CA;

// multiple language attachment to a token
// and the equivalent single-language attachment expansion
Robert!en-US,fr-CA
Robert!en-US | Robert!fr-CA
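
The equivalence between a multiple-language token attachment and a set of single-language alternatives can be sketched as follows. The Python example is informative only. The function names are hypothetical, and the sketch assumes a single unquoted token with an optional "!" attachment.

```python
def split_language(token):
    """Split "word!lang1,lang2" into ("word", ["lang1", "lang2"])."""
    word, separator, languages = token.partition("!")
    return word, (languages.split(",") if separator else [])


def expand_multilingual(token):
    """Rewrite a multi-language token as single-language alternatives."""
    word, languages = split_language(token)
    if not languages:
        return word
    return " | ".join("%s!%s" % (word, lang) for lang in languages)
```

Applied to `Robert!en-US,fr-CA`, the expansion produces `Robert!en-US | Robert!fr-CA`, the second of the two equivalent forms shown in the example above.
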
XML Form

For the XML form the "xml:lang" attribute can be attached to any of the rule expansion elements: "one-of", "item" or "count". The "xml:lang" attribute may be attached to a "ruleref" element but has no effect because of the scoping rules. In addition, the "token" element can be used with the "lexicon" attribute for specifying multiple languages for an individual token by a comma-separated list of locales.
<?xml version="1.0"?>


<grammar xml:lang="en-US" version="1.0">


<rule id="yes">
<one-of>
  <item xml:lang="fr-CA">oui</item>
  <item xml:lang="en-US">yes</item>
</one-of> 
</rule> 


<rule id="request">
may I speak to
<one-of xml:lang="fr-CA">
  <item>Michel Tremblay</item>
  <item>André Roy</item>
</one-of>
</rule>


<rule id="people1">
<token lexicon="en-US,fr-CA"> Robert </token>
</rule>


<rule id="people2">
<one-of>
  <item xml:lang="en-US">Robert</item>
  <item xml:lang="fr-CA">Robert</item>
</one-of>
</rule>

</grammar>
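
The inheritance of "xml:lang" down the document tree can be illustrated with a short tree walk. This Python sketch is informative: the function name `item_languages` is hypothetical, only "item" elements are reported, and the "lexicon" attribute of "token" and references to other grammars are not handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def item_languages(grammar_xml):
    """Return (text, effective language) for every item in the grammar."""
    root = ET.fromstring(grammar_xml)
    found = []

    def walk(element, inherited):
        # A local xml:lang overrides the value inherited from the parent.
        language = element.get(XML_LANG, inherited)
        if element.tag == "item":
            text = "".join(element.itertext()).strip()
            found.append((text, language))
        for child in element:
            walk(child, language)

    walk(root, None)
    return found
```

An "item" without its own "xml:lang" takes the language of the nearest enclosing element that declares one, and ultimately the language declared on the "grammar" element.
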

Issues

In a future revision the attribute name "lexicon" may be replaced by "lang-list" for greater consistency with "xml:lang". [Note that xml:lang cannot be used for lists of locales.]
Some issues have been raised regarding the potential for inefficiency in loading and recognizing against grammars with multiple locales. These issues are under study and may lead to changes in the handling of locales in a future revision of this document.
This mechanism could be extended to specify a special set of phonetic models available on the platform. For instance, an application might want to use a set of phonetic models specially tuned for recognizing stock quotes.

2.8 Precedence

This section defines the precedence of the rule expansion syntax. Because XML documents explicitly indicate structure there is no ambiguity and thus a precedence definition is not required. The precedence definitions for the ABNF form are intended to minimize the need for parentheses.

ABNF Form

The following is the ordering of precedence of rule expansions, from highest (binds most tightly) to lowest. Parentheses are used when necessary to explicitly control rule structure.

Rulename denoted by the dollar sign '$', and a quoted or unquoted token.

"()" parentheses for grouping and "[]" for optional grouping.

The unary operators (`+', `*', and tag attachment) apply to the tightest immediate preceding rule expansion. (To apply them to a sequence or to alternatives, use `()' or `[]' grouping.)

Sequence of rule expansions.

`|' separated set of alternative rule expansions.
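
The precedence ordering can be made concrete with a small recursive-descent parser in which each level of the ordering becomes one function. The following Python sketch is informative and covers only a subset of the ABNF form: tokens, local rule references, grouping, optional grouping, the `*` and `+` operators, sequences and alternatives. Tags, weights, language attachments and external references are omitted, and all function names and the tuple-based tree are assumptions of the sketch.

```python
import re

_LEX = re.compile(r'\s*(\$\w+|"[^"]*"|[()\[\]|*+]|[^\s()\[\]|*+"$]+)')


def lex(text):
    text, pos, out = text.strip(), 0, []
    while pos < len(text):
        match = _LEX.match(text, pos)
        if match is None:
            raise ValueError("cannot tokenize at position %d" % pos)
        out.append(match.group(1))
        pos = match.end()
    return out


def parse(text):
    toks = lex(text)
    tree, i = _alternatives(toks, 0)
    if i != len(toks):
        raise ValueError("unexpected %r" % toks[i])
    return tree


def _alternatives(toks, i):          # lowest precedence: a | b
    node, i = _sequence(toks, i)
    choices = [node]
    while i < len(toks) and toks[i] == "|":
        node, i = _sequence(toks, i + 1)
        choices.append(node)
    return (choices[0] if len(choices) == 1 else ("alt", *choices)), i


def _sequence(toks, i):              # sequence: a b c
    items = []
    while i < len(toks) and toks[i] not in ("|", ")", "]"):
        node, i = _unary(toks, i)
        items.append(node)
    if not items:
        raise ValueError("empty expansion")
    return (items[0] if len(items) == 1 else ("seq", *items)), i


def _unary(toks, i):                 # postfix operators: a* a+
    node, i = _primary(toks, i)
    while i < len(toks) and toks[i] in ("*", "+"):
        node = ("star" if toks[i] == "*" else "plus", node)
        i += 1
    return node, i


def _primary(toks, i):               # highest: token, $rule, ( ), [ ]
    tok = toks[i]
    if tok in ("(", "["):
        node, i = _alternatives(toks, i + 1)
        if i >= len(toks) or toks[i] != (")" if tok == "(" else "]"):
            raise ValueError("unbalanced %r" % tok)
        return (("opt", node) if tok == "[" else node), i + 1
    if tok in ("*", "+"):
        raise ValueError("operator %r has no operand" % tok)
    return tok, i + 1
```

For instance, `a b | c d*` parses as `("alt", ("seq", "a", "b"), ("seq", "c", ("star", "d")))`: the `*` applies only to `d`, sequences bind tighter than `|`, and parentheses would be needed to repeat `c d` as a whole.
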

XML Form

None required. XML structure is explicit.

3. Rule Definitions

A rule definition associates a legal rule expansion with a rulename. The rule definition is also responsible for defining the scope of the rule definition: whether it is local to the grammar in which it is defined or whether it may be imported into and referenced within other grammars. Finally, the rule definition may additionally include documentation comments and other pragmatics.

The rulename must be unique within a grammar. The same rulename may be used in multiple grammars with the rulename resolution specification defining how to uniquely identify each rule definition.

3.1: Basic Rule Definition
3.2: Scoping of Rule Definitions
3.3: Example Phrases

3.1 Basic Rule Definition

The core purpose of a rule definition is to associate a legal rule expansion with a rulename.

ABNF Form

The rule definition consists of an optional scoping declaration (explained in the next section) followed by a legal rule name, an equals sign, a legal rule expansion and a closing semi-colon. The rule definition has one of the following legal forms:
$ruleName = ruleExpansion;
public $ruleName = ruleExpansion;
private $ruleName = ruleExpansion;
For example:
$city = Boston | "New York" | Madrid;
$command = $action $object;
XML Form

A rule definition is represented by the "rule" element. The "id" attribute of the element indicates the name of the rule and must be unique within the grammar (this is enforced by XML). The contents of the "rule" element may be any legal rule expansion defined in Section 2. The "scope" attribute is explained in the next section.
<rule id="city">
   <one-of>
      <item>Boston</item>
      <item>"New York"</item>
      <item>Madrid</item> 
   </one-of>
</rule>
<rule id="command">
   <ruleref uri="#action"/>
   <ruleref uri="#object"/>
</rule>

Issues

Because the rulename is an XML ID, a rulename must be unique to a document. If one or more XML grammars are embedded in another document (e.g. DialogML) then they cannot use the same rulename. The Working Group may consider XPath to address this constraint.

3.2 Scoping of Rule Definitions

A rule definition may be defined as local to a grammar or may be referenceable within other grammars. Rules with local scope are private. Rules that may be referenced from other grammars are public. A rule defaults to private unless the scope is explicitly stated in the rule definition.

Rules with public scope may be activated for recognition. That is, they may define the top-level syntax of spoken input. For instance, VoiceXML grammar activation may explicitly reference a single public rule or multiple public rules.

The intent of scoping is to allow a grammar author to separate working rules from exported rules that are intended for use elsewhere. The scoping mechanism defined here is closest to that of the Java(TM) Programming Language. Section 4 explains the import mechanism and namespace resolution.

ABNF Form

A rule definition may be annotated with the keywords "public" or "private". If no scope is provided, the default is "private".
$town = Townsville | Beantown;
private $city = Boston | "New York" | Madrid;
public $command = $action $object;
XML Form

The "scope" attribute of the "rule" element defines the scope of the rule definition. Defined values are "public" and "private". If omitted, the default scope is "private".
<rule id="town">
   <one-of>
      <item>Townsville</item>
      <item>Beantown</item> 
   </one-of>
</rule>
<rule id="city" scope="private">
   <one-of>
      <item>Boston</item>
      <item>"New York"</item>
      <item>Madrid</item> 
   </one-of>
</rule>
<rule id="command" scope="public">
   <ruleref uri="#action"/>
   <ruleref uri="#object"/>
</rule>
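The default-to-private behaviour described in this section can be sketched in Python for the ABNF form. The regular expression and the parse_rule_definition name are assumptions for illustration; the sketch handles only a single rule definition on its own and does not validate the rule expansion.

```python
import re

# Optional scope keyword, '$' rulename, '=', rule expansion, closing ';'.
RULE_RE = re.compile(
    r'^\s*(?:(public|private)\s+)?\$(\w+)\s*=\s*(.+?)\s*;\s*$', re.S)

def parse_rule_definition(text):
    match = RULE_RE.match(text)
    if match is None:
        raise ValueError("not a rule definition: %r" % text)
    scope, name, expansion = match.groups()
    # A rule with no explicit scope is private.
    return {"name": name, "scope": scope or "private", "expansion": expansion}
```

Applied to the three ABNF definitions in this section, the sketch reports "town" and "city" as private and "command" as public.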

Issues

Many recognizers can enhance performance if activatable rules and exported rules are distinguished, for example, by using a separate keyword for the two separate meanings of public.

3.3 Example Phrases

It is often desirable to include examples of phrases that match rule definitions along with the definition. Zero, one or many example phrases may be provided for any rule definition. Because the examples are explicitly marked, automated tools can be used for regression testing and for generation of grammar documentation.

ABNF Form

A documentation comment is a C/C++/Java comment that starts with the sequence of characters /** and which immediately precedes the relevant rule definition. Zero or more "@example" tags may be contained at the end of the documentation comment. The tokenization of the example follows the tokenization and sequence rules defined in Section 2.
/**
 * A simple directive to execute an action.
 *
 * @example open the window
 * @example close the door
 */
public $command = $action $object;
XML Form

Any number of "example" elements may be provided as the initial content within a "rule" element. The tokenization of the example follows the tokenization and sequence rules defined in Section 2.
<rule id="command" scope="public">
    
    <example> open the window </example>
    <example> close the door </example>
    <ruleref uri="#action"/> <ruleref uri="#object"/>
</rule>
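As a hedged sketch of how a tool might collect the marked examples for regression testing, the following Python function reads the "example" elements of each rule in an XML grammar. The example_phrases name is hypothetical, and the code assumes unqualified element names because this draft does not yet define an XML namespace for grammar documents.

```python
import xml.etree.ElementTree as ET

def example_phrases(grammar_xml):
    """Map each rule id to its list of whitespace-normalized example phrases."""
    grammar = ET.fromstring(grammar_xml)
    phrases = {}
    for rule in grammar.findall("rule"):
        phrases[rule.get("id")] = [
            " ".join((example.text or "").split())
            for example in rule.findall("example")
        ]
    return phrases
```

A regression harness could then feed each phrase to the recognizer's grammar parser and report any phrase that the owning rule fails to match.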

4. Grammar Documents

A grammar document specifies a set of associated rules. All rules defined within that grammar are scoped within the grammar's namespace. Each rule defined within the grammar must have a unique name. In the XML format the grammar name is an XML ID and must be a unique ID within the complete XML document.

4.1: Grammar Header and Character Encoding
4.2: Grammar Locale
4.3: Grammar Mode
4.4: Root Rule Declaration
4.5: Imports
4.6: Comments

4.1 Grammar Header and Character Encoding

The character encoding indicates the symbol set used in the document. For example, for US applications it would be common to use ASCII or the superset of ISO8859. For Japanese grammars, character sets such as JIS and Unicode could be used. For both the ABNF and XML forms, the omission of the character encoding passes responsibility for determining encoding to the recognizer or host platform.

ABNF Form

The ABNF form defines the character encoding in the opening line of the grammar. A legal grammar must start with the "#" symbol and the characters leading to the first newline symbol are of the style:
#ABNF version-number optional-char-encoding;
#ABNF 1.0;
#ABNF 1.0 ISO8859-5;
#ABNF 1.0 JIS;
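The header style above could be recognized with a short Python sketch such as the following. The pattern and the parse_abnf_header name are assumptions for this example; in particular, the sketch assumes a version number of the form "digits.digits" and does not check that the encoding name is one the platform knows.

```python
import re

# '#ABNF', a version number, an optional character encoding, then ';'.
HEADER_RE = re.compile(r'^#ABNF\s+(\d+\.\d+)(?:\s+([^\s;]+))?\s*;\s*$')

def parse_abnf_header(first_line):
    match = HEADER_RE.match(first_line)
    if match is None:
        raise ValueError("not a legal ABNF header: %r" % first_line)
    version, encoding = match.groups()
    # encoding is None when omitted; the recognizer or host platform
    # is then responsible for determining the encoding.
    return version, encoding
```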
XML Form

XML defines character encodings as part of the document's XML declaration on the first line of the document. (Note that the version number in this declaration refers to the XML version and not the version of the grammar specification.)
<?xml version="1.0" ?>
<?xml version="1.0" encoding="ISO8859-5" ?>
<?xml version="1.0" encoding="JIS" ?>

Issues

Need to define the XML namespace for the grammar document.
Need to define the DOCTYPE.

4.2 Grammar Locale

The Locale of a grammar indicates the primary language contained by the document. The locale follows RFC 1766 which defines a language code and an optional national or regional variant. If the locale is not defined, the recognizer or host platform should assume a reasonable default locale.

ABNF Form

An optional language declaration should be the first non-comment declaration of an ABNF grammar file following the self-identifying header.
language en-US;
XML Form

Following the XML convention, the language and variant are indicated by an "xml:lang" attribute on the "grammar" element. The version of the grammar is represented by the "version" attribute, which is "1.0" for this specification.
<grammar xml:lang="en-US" version="1.0">
... imports
... rule definitions
</grammar>

4.3 Grammar Mode

The mode of a grammar indicates the type of input that the speech recognizer should be detecting. The default mode is "speech". An alternative and optional input mode defined in Appendix D is "dtmf" input.

The mode attribute indicates how to interpret the tokens contained by the grammar. Speech tokens are expected to be detected as speech audio that sounds like the token. DTMF tokens, if supported, are detected as per the ITU Recommendation Q.24.

It is often the case that a different processor is used for detecting DTMF tones than for speech recognition. The same may be true for other modes defined in future revisions of the specification.

ABNF Form

An optional mode declaration should follow the language declaration if present or otherwise be the first non-comment declaration of an ABNF grammar file following the self-identifying header. If the mode declaration is omitted the default of "speech" is assumed. If the mode declaration is "dtmf" then the tokens contained by the grammar comprise DTMF events as defined in Appendix D.
mode speech;
XML Form

The mode declaration is provided as an optional "mode" attribute on the root "grammar" element. Legal values are "speech" (the default) and "dtmf". Other values are permitted but should be considered vendor-specific.
<grammar mode="speech" version="1.0" xml:lang="en-US"> 
... imports
... rule definitions
</grammar>

4.4 Root Rule Declaration

The grammar specification permits rule references to target either a specific public rule definition of an external grammar or the implicit or explicit "root" rule defined within the grammar.

Explicit root rule: both the XML and ABNF forms permit the grammar header to declare a single rule to be the root rule of the grammar. The rule must be a public rule. The specified rule is the rule that is referenced when a "ruleref" element references the grammar without a rulename identifier (applies both to URI references and import references).

Implicit root rule: for both the XML and ABNF forms, if there is no explicit definition of the root rule, the speech recognizer should generate a root rule as the disjunction of all the public rules defined within the grammar. In effect this is equivalent to defining a rule with all the public rules as alternatives. In other respects the implicit root rule is equivalent to an explicit definition.

Although a grammar is not required to declare a root rule it is good practice to declare the root rule of any grammar.
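The selection of explicit and implicit root rules can be sketched in Python for the XML form. The root_rule_names function is hypothetical, and the code assumes unqualified element names (this draft does not yet define the grammar namespace). It returns the rule names that act as alternatives of the root: one name for an explicit root, or every public rule for an implicit root.

```python
import xml.etree.ElementTree as ET

def root_rule_names(grammar_xml):
    grammar = ET.fromstring(grammar_xml)
    rules = {rule.get("id"): rule for rule in grammar.findall("rule")}

    def is_public(rule):
        # A rule without a "scope" attribute is private.
        return rule.get("scope", "private") == "public"

    explicit = grammar.get("root")
    if explicit is not None:
        rule = rules.get(explicit)
        if rule is None or not is_public(rule):
            raise ValueError("root must name a public rule of this grammar")
        return [explicit]
    # Implicit root: all public rules, treated as alternatives.
    return [name for name, rule in rules.items() if is_public(rule)]
```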

ABNF Form

An optional root rule declaration should follow the language and mode declarations (if present) or otherwise be the first non-comment declaration of an ABNF grammar file following the self-identifying header. The root declaration must identify one public rule defined elsewhere within the same grammar.
root rulename;
XML Form

The root rulename declaration is provided as an optional "root" attribute on the "grammar" element. The root declaration must identify one public rule defined elsewhere within the same grammar.
<grammar root="rulename" ...> 
... imports
... rule definitions
</grammar>

4.5 Imports

An import is a convenience mechanism for referencing externally defined grammars and public rules of those grammars. An import is effectively a local name -- an alias -- for an external grammar identified by its URI. Rule references (as defined in Section 2.2.2) can use the alias instead of the URI when referencing rules of the imported grammar. A document should never import two grammars and assign the same local alias.

The import declaration does not copy the referenced grammar. An import is analogous to the "import" statement of the Java Programming Language or to a hyperlink in an HTML document. Thus, it is not possible to reference externally-defined rules as if they were local rules (using only the simple rulename).

ABNF Form

Zero, one or many "import" declarations follow the optional "language" declaration, but precede the rule definitions in the body of the grammar. The following import statements define local aliases for imported grammars.
import http://www.mygrammars.com/cities-states.xml as places;

// References the "city" rule defined in the "places" grammar
... $$places#city ...
XML Form

Zero, one or many "import" elements may be contained as the leading elements within a "grammar" element. The "import" elements must precede the "rule" elements. The "import" element is empty. The "uri" and "name" attributes are required.
<import uri="http://www.mygrammars.com/cities-states.xml"
        name="places"/>

 
 ... <ruleref import="places#city"/> ...
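Because an import is only an alias for a URI, resolving an import reference amounts to a table lookup. The following Python sketch shows one possible reading; the add_import and resolve_reference names and the dictionary representation are assumptions, not part of the specification.

```python
def add_import(aliases, uri, name):
    """Record an import, rejecting a second grammar under the same alias."""
    if name in aliases:
        raise ValueError("alias %r is already assigned" % name)
    aliases[name] = uri

def resolve_reference(aliases, reference):
    """Turn 'alias#rulename' (or a bare 'alias') into a URI reference."""
    alias, separator, rulename = reference.partition("#")
    uri = aliases[alias]  # KeyError if the alias was never imported
    # A reference without a rulename targets the root rule of the grammar.
    return uri + "#" + rulename if separator else uri
```

For the import shown above, the reference "places#city" resolves to the "city" rule of the grammar at http://www.mygrammars.com/cities-states.xml.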

Issues

There is a proposal to add a "type" attribute to the "import" element to indicate the MIME type of the referenced grammar. Representing the same information in the ABNF form may be difficult.
There is a proposal to add a "mode" attribute to the "import" element to permit a grammar to indicate the type of the grammar that it references. In considering this modification there needs to be study of when a grammar of one mode (e.g. "speech" or "dtmf") could refer to a grammar of another mode. Representing the same information in the ABNF form may be difficult.
There is a request to change the element name to "alias" to more closely reflect the usage of import statements. The semantics and documentation would be largely unaffected since "imports" are largely referred to as "aliases" throughout the specification.

4.6 Comments

Comments may be placed in most places in a grammar document. For XML, use XML comments. For ABNF there are documentation comments and C/C++/Java-style comments.

ABNF Form

C/C++/Java comments are permitted. Documentation comments are permitted before grammar, language and import declarations and before each rule definition.

Section 3.3 defines the format for representing examples in documentation comments before a rule definition.
// C++/Java-style single-line comment
/* C/C++/Java-style comment */
/** Java-style documentation comment */
XML Form

An XML comment has the following syntax.
<!-- XML comment -->

5. Conformance

This section is Normative.

Different sets of grammar conformance criteria exist for:

5.1: Conforming XML Grammar Fragments
5.2: Conforming Stand-Alone XML Grammar Document
5.3: Conforming Included XML Grammar Fragments
5.4: Conforming XML Grammar Processors
5.5: Conforming Stand-Alone ABNF Grammar Documents
5.6: Conforming ABNF Grammar Processors
5.7: Conforming ABNF/XML Grammar Processors

Issues

A future revision of this specification may specify conformance criteria for entities that generate the XML Grammar Format.

5.1: Conforming XML Grammar Fragments

An XML grammar document fragment is a Conforming XML Document Fragment if it adheres to the specification described in this document (Speech Recognition Grammar Format Specification) including the DTD (see Document Type Definition) and also:

(relative to XML) is well-formed.
if all non-grammar namespace elements and attributes and all xmlns attributes which refer to non-grammar namespace elements are removed from the given document, and if an appropriate XML declaration (i.e., <?xml...?>) is included at the top of the document, and if an appropriate document type declaration (i.e., <!DOCTYPE grammar ... >) which points to the Grammar DTD is included immediately thereafter, the result is a valid XML document.
conforms to the following W3C Recommendations:
- the XML 1.0 specification (Extensible Markup Language (XML) 1.0).
- (if any namespaces other than grammar are used in the document) Namespaces in XML.

Neither the XML grammar language nor these conformance criteria place designated size limits on any aspect of grammar documents. There are no maximum values on the number of elements, the amount of character data, or the number of characters in attribute values.

5.2: Conforming Stand-Alone XML Grammar Document

A file is a Conforming Stand-Alone XML Grammar Document if:

it is an XML document.
its root element is a 'grammar' element.
it conforms to the criteria for Conforming XML Grammar Fragments.

5.3: Conforming Included XML Grammar Fragments

Issues

It is anticipated that fragments of grammar definitions will be incorporated into other documents, most importantly, into Dialog Markup Language. In a future revision of this specification the conformance criteria for Included XML Grammar Fragments will be defined.

5.4: Conforming XML Grammar Processors

An XML Grammar processor is a program that can parse and process XML Grammar fragments. Examples include speech recognizers and DTMF detectors that accept the XML Grammar format.

In a Conforming XML Grammar Processor, the XML parser must be able to parse and process all XML constructs defined within XML 1.0 and XML Namespaces.

A Conforming XML Grammar Processor must correctly understand and apply the semantics of each possible grammar feature defined by this document.

A Conforming XML Grammar Processor should inform its hosting environment if it encounters a language, locale or lexicon that it is unable to support. A Conforming XML Grammar Processor should also inform its hosting environment if it encounters an illegal grammar document, an unknown XML entity reference or any other grammar content that it is unable to process. [See the issues note below on consideration being given to more precise and complete handling regarding multi-lingual conformance.]

A Conforming XML Grammar Processor is not required to support recursive grammars, that is, grammars in which rule references include direct or indirect self-reference.

There is, however, no conformance requirement with respect to performance characteristics of the XML Grammar Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of a speech recognizer or DTMF detector. No statement is made regarding the size of grammar or size of grammar vocabulary that an XML Grammar Processor must support.
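Since support for recursive grammars is optional, a processor or authoring tool may want to detect direct or indirect self-reference before activation. The following Python sketch is one way to do so; it assumes the rule references have already been extracted into a dictionary mapping each rulename to the rulenames it references, and the find_recursive_rules name is hypothetical.

```python
def find_recursive_rules(references):
    """Return the set of rules that can reach themselves via rule references."""
    recursive = set()
    for start in references:
        pending = list(references.get(start, ()))
        visited = set()
        while pending:
            rule = pending.pop()
            if rule == start:
                # Direct or indirect self-reference found.
                recursive.add(start)
                break
            if rule in visited:
                continue
            visited.add(rule)
            pending.extend(references.get(rule, ()))
    return recursive
```

A rule that merely references a recursive rule, without being reachable from itself, is not reported.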

Issues

The Voice Browser Working Group is developing a more precise statement regarding conformance requirements for multi-lingual grammars and other multi-lingual/multi-locale behavior. The following is draft wording. Comments will be welcomed!

DRAFT: A Conforming XML Grammar Processor must meet the following requirements for handling of languages and locales:

A Conforming Grammar Processor is required to parse all language declarations successfully.
A Conforming Grammar Processor should inform its hosting environment if it encounters a language or locale that it cannot support.
A Conforming Grammar Processor that can support a given language/locale, must be able to activate the root, any single public rule, or any set of public rules or roots of one or many grammars where each rule or root and all directly or indirectly referenced sub-rules are for this same given language/locale.
A Conforming Grammar Processor may be able to activate the root, any single public rule, or any set of public rules or roots of one or many grammars where each rule or root and all directly or indirectly referenced sub-rules contain a single language and locale but not all the same language and locale, or where at least one rule or root or at least one of the directly or indirectly referenced sub-rules contain more than one language and locale. When a processor is able to support each language or locale in the set but is unable to handle them concurrently it should inform the hosting environment. When the set includes one or more languages or locales that are not supported by the processor it should inform the hosting environment.
A Conforming Grammar Processor may implement locales by approximate substitutions according to a documented, platform-specific behavior. For example, using a US English speech recognizer to process British English input.

5.5: Conforming Stand-Alone ABNF Grammar Documents

An ABNF grammar document is a Conforming ABNF Document if it adheres to the specification described in this document (Speech Recognition Grammar Format Specification) including the Formal BNF Specification.

5.6: Conforming ABNF Grammar Processors

An ABNF Grammar processor is a program that can parse and process ABNF Grammar documents. Examples include speech recognizers and DTMF detectors that accept the ABNF Grammar format.

A Conforming ABNF Grammar Processor must correctly understand and apply the semantics of each possible grammar feature defined by this document.

A Conforming ABNF Grammar Processor should inform its hosting environment if it encounters a language, locale or lexicon that it is unable to support. A Conforming ABNF Grammar Processor should also inform its hosting environment if it encounters an illegal grammar document or other grammar content that it is unable to process. [See the issues note in Section 5.4 on consideration being given to more precise and complete handling regarding multi-lingual conformance.]

A Conforming ABNF Grammar Processor is not required to support recursive grammars, that is, grammars in which rule references include direct or indirect self-reference.

There is, however, no conformance requirement with respect to performance characteristics of the ABNF Grammar Processor. For instance, no statement is required regarding the accuracy, speed or other characteristics of a speech recognizer or DTMF detector. No statement is made regarding the size of grammar or size of grammar vocabulary that an ABNF Grammar Processor must support.

5.7: Conforming ABNF/XML Grammar Processors

A Conforming ABNF/XML Grammar Processor must meet all the conformance criteria defined in Section 5.4 and in Section 5.6.

Additionally an ABNF/XML Grammar Processor must be able to resolve and apply references from XML Grammars to ABNF Grammars, and references from ABNF Grammars to XML Grammars.

6. Future Study

This section is Informative.

6.1: Semantic Interpretation (Future Study)

A speech recognition grammar defines what a user can say. Similarly, it defines the syntax of the spoken input that can be heard by a speech recognizer.

The W3C Voice Browser Working Group is currently working on a draft for a Natural Language Semantics specification which will represent interpreted spoken input: what a user means.

The group has initiated work on defining a mechanism by which the semantic interpretation for a spoken input sentence can be derived from the sequence of spoken words and the grammar(s) that it matches. A draft proposal is planned for the next release of this document. We are interested in comments and requirements from reviewers of this document.

The group is currently exploring means by which semantic interpretation can be attached to the grammar using the "tagging" mechanisms defined in this document (specifically in Section 2.6). It must be possible to represent the semantic result in the NL Semantic format and it must also be possible to use the semantic result in the processing of the Dialog Markup Language. The first release is intended to support stateless interpretation of spoken input. The following are amongst the approaches under consideration.

Simple tags: interpret a tag as a value string that represents the meaning of the object to which it is attached.
Action tags: embed a scripting language (e.g. ECMAScript) in the tags. It should have access to spoken words, to matched sub-rules and to other information which assists in semantic interpretation. The return value should represent the meaning of each defined rule.
Declarative tags: the tag element can contain constructs to generate semantic representation in the NL semantic markup language. The constructs should follow the DOM document generating principles outlined in Sec. 7 of W3C XSLT recommendation, "Creating the result tree". Each CFG rule should be regarded and treated as a template rule in the XSLT context. As such, the plain text output is a special case of using declarative tags by adhering strictly to Sec. 7.2, "creating text" of the XSLT specifications.

Specific proposals and general requirements for semantic interpretation are welcomed.

6.2: Morphology and Tokens (Future Study)

The current specification assumes a direct mapping from tokens appearing in a grammar to lexicon entries and pronunciations used by the speech recognizer. In the design of some application grammars and for many classes of language it is simpler to reference morphological variants of tokens within the grammar and permit the speech recognizer to perform a broader mapping from the token to a class of lexical entries and associated pronunciations.

The morphological variations of a token may depend upon the grammatical class of the token (e.g. verb, noun, male noun, adjective). Morphological rules are intrinsically language-specific and some languages have much richer morphological behaviour than others.

For illustrative purposes the following example is in English (a language with a moderate level of morphological richness).

<morph>dog</morph> <morph>drink</morph> faster 

dog:: dog | dogs
drink:: drink | drank | drunk | drinks

Issues: This simple grammar permits ungrammatical input such as "dogs drinks faster" (not legal because of subject-verb disagreement). The morphological variants of "drink" include both past and present tense ("drink" vs. "drank") and number ("drinks" vs. "drink"). In many cases automated variations may exceed the variations intended by the grammar author. Finally, morphological variation will complicate semantic interpretation of spoken input.

Status: The Working Group is not aware of any existing grammar format that supports this kind of morphological inference by a speech recognizer for a grammar. For this reason and because of the standing issues listed above our disposition is towards not supporting this capability in the specification.

Work-around: a grammar author may explicitly include all variations into a grammar. Tools may also help in the process of identifying and selecting morphological variants.

6.3: Dynamic Grammars (Future Study)

The current specification makes no statement about when grammars are loaded into a voice browser or speech recognizer. Furthermore, the current specification makes no statement about how or when the definition of a grammar can be modified after its initial loading. The following are issues under consideration.

A static vs. volatile attribute might be attached to individual rules or to an entire grammar. Where rules are known to be static or volatile some speech recognition systems are able to optimize run-time performance. A boolean "static" or "volatile" attribute may be attached to any rule element in the XML form or to the grammar element to indicate the default value for the grammar. In the ABNF format a "static" or "volatile" keyword would be permitted on the equivalent rule and grammar declarations.
To reduce the size of grammar files when dynamic changes are made, some consideration has been given to allowing a document to define only those rules which are modified. Unchanged rules may be stubbed out or omitted.
Weights of choices and tokens may be dynamically generated by a Voice Browser. For example, an element with similar capability to the <value> element of VoiceXML might be used to infer grammar definitions.

6.4: Embedding Partial Rule Definitions (Future Study)

The current specification is intended to support the embedding of fully-defined grammars into parent documents, in particular, into the Dialog Markup Language currently in development. A full grammar document contains the header, import declarations, and one or many rule definitions.

It is also desirable that it be possible to embed just the fragment of a rule definition that represents the right side of a rule definition. This could be any legal combination of the entities defined in Section 2. There is nothing in the current specification that prohibits this, but study of the namespace issues is required.

For the ABNF form, the embedded grammar may look like the following:

<grammar> apple | melon | banana | peach </grammar>

For the XML form, the embedded grammar may look like the following:

<grammar xmlns:nl='http://www.w3.org/2001/01/speech-grammar'>
  <one-of> 
     <item>apple</item>
     <item>melon</item>
     <item>banana</item>
     <item>peach</item>
  </one-of> 
</grammar>

6.5: Phonemic pronunciations for tokens (Future Study)

For many words, the written form does not accurately indicate the correct pronunciation of the word. For example, in languages that use the Chinese character set a single character may have many pronunciations amongst which only one might make sense in a given context. Similarly, written forms such as abbreviations, acronyms, proper names, and foreign words do not always reliably indicate correct pronunciation.

Because a recognizer needs to know a word's pronunciation to be able to hear it, the Working Group is considering an enhancement to both the ABNF and XML grammar formats to allow a grammar document to explicitly specify pronunciations. This mechanism may be supported in addition to any existing platform mechanism for supporting vocabularies and pronunciations. It is expected that pronunciations, if supported, will be optional and will use a format similar to that of the pronunciation element defined in the parallel specification for the Speech Synthesis Markup Language (e.g. supporting the same phonetic alphabets, including the International Phonetic Alphabet).

For the ABNF form, augment tokens with the pronunciation language.

// Following ":" is the US pronunciation as IPA characters 
tomato:t&#x252;m&#x251;to&#x28A;

For XML, add a token element with optional phoneme and phonetic alphabet attributes.

<!-- The attribute provides the US pronunciation as IPA -->
<token phoneme="t&#x252;m&#x251;to&#x28A;">tomato</token>

6.6 Lexicons (Future Study)

The W3C Voice Browser Working Group recently initiated work on a Lexicon Format but has not yet determined whether to proceed to development of a full specification.

If a specification is developed it is likely that it would permit a lexicon document to define a set of words and the pronunciation or pronunciations for each word. The Speech Recognition Grammar Format could permit a grammar to reference one or more lexicons with the intent that a speech recognizer use the pronunciations contained within the lexicon document.

It might be possible to reference a lexicon document in the header of a grammar or reference individual words within a lexicon within rule definitions.

lexicon http://www.acme.com/mylexicon.xml
...
$rule = $(http://www.acme.com/mylexicon.xml#word);

<lexicon uri="http://www.acme.com/mylexicon.xml"/>
...
<token uri="http://www.acme.com/mylexicon.xml#word"/>

Status: the W3C Voice Browser Working Group will reconsider this enhancement once a decision is made on whether to proceed with a Lexicon specification.

6.7: Tag element (Future Study)

Both the XML and ABNF forms in the current specification permit tags to be attached to any rule expansion (see Section 2.6). In the XML form the tag is attached to an expansion as an attribute. In the ABNF form the tag is attached to a legal rule expansion as a post-fix entity contained within curly braces. In both forms the content of the tag is an arbitrary string; however, the Working Group expects that semantic attachment will be an important special-case use of tags in a future revision of this specification (see Section 6.1).

The Working Group plans to introduce a tag element for the XML form of the specification in its next release. This would complement the existing tag attribute. The separate element should have the following advantages:

CDATA contained within the tag element would have fewer formatting constraints than tags attached as an attribute.
Multi-line tags would be easier to read.
The tag data could contain arbitrary XML data, most likely in a different XML namespace.

There are two forms in which the tag element could be attached to the existing rule expansion elements.

The tag element could contain the rule expansion to which it is attached. A downside of this approach is that the tag CDATA would need to be separated from the expansion so an additional element may be needed for the tag data.
Tag elements could be interpreted as if attached to the element that contains them. Some of the design choices available include (1) allow a single tag element as the first contained sub-element, (2) allow a single tag element as the last contained sub-element, (3) allow multiple tag elements within a single parent element. In all cases, there may also be a tag attribute on the parent element and for (3) there is explicitly an ability to attach multiple tags: multiple tags will require an explicit interpretation model.

6.8: MIME Types (Future Study)

The W3C Voice Browser Working Group is applying for standard MIME types for the XML and ABNF forms of grammar documents. The proposal for the MIME type of an XML Grammar document is "application/grammar+xml". The proposal for the MIME type of an Augmented BNF grammar document is "application/grammar".

7. Acknowledgements

This document was written with the participation of the members of the W3C Voice Browser Working Group (listed in alphabetical order):

Mike Brown, Lucent Bell Labs
Dan Burnett, Nuance Communications
Debbie Dahl, Unisys
Andrew Hunt, SpeechWorks International
Bruce Lucas, IBM
Scott McGlashan, PipeBeach
Yves Normandin, Locus Dialogue
Dave Raggett, HP
David Ramsthaler, Cisco
Luc Van Tichelen, Lernout & Hauspie
Kuansan Wang, Microsoft

Appendix A: Example Grammars in ABNF and XML Forms

This appendix is Informative.

The following shows a simple grammar that supports commands such as "open a file" and "please move the window". It references a separately-defined grammar for politeness which is not shown here.

ABNF Form

#ABNF 1.0 ISO8859-1;

language en;

import http://www.sayplease.com/politeness.xml as polite;

/**
 * Basic command.
 * @example please move the window
 * @example open a file
 */

public $basicCmd = 
          $$polite#startPolite $command $$polite#endPolite;

$command = $action $object;
$action = /10/ open {OPEN} | /2/ close {CLOSE} 
                 | /1/ delete {DELETE} | /1/ move {MOVE};
$object = [the | a] (window | file | menu);

XML Form

<?xml version="1.0"?>

<grammar xml:lang="en" version="1.0">

<import name="polite"
  uri="http://www.sayplease.com/politeness.xml"/>

<rule id="basicCmd" scope="public">
  <example> please move the window </example>
  <example> open a file </example>

  <ruleref import="polite#startPolite"/>
  <ruleref uri="#command"/>
  <ruleref import="polite#endPolite"/>
</rule>

<rule id="command">
  <ruleref uri="#action"/> <ruleref uri="#object"/>
</rule>

<rule id="action">
   <one-of>
      <item weight="10" tag="OPEN">   open </item>
      <item weight="2"  tag="CLOSE">  close </item>
      <item weight="1"  tag="DELETE"> delete </item>
      <item weight="1"  tag="MOVE">   move </item>
    </one-of>
</rule>

<rule id="object">
  <count number="optional">
    <one-of>
      <item> the </item>
      <item> a </item>
    </one-of>
  </count>
  <one-of>
      <item> window </item>
      <item> file </item>
      <item> menu </item>
  </one-of>
</rule>

</grammar>
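
As an informal illustration of what the example grammar accepts, the following Python sketch enumerates the word sequences matched by the command rule. The lists are transcribed by hand from the grammar above, and the names ACTION, ARTICLE, OBJECT and command_phrases are assumptions of this sketch rather than part of the specification. The politeness rules are not shown in this document and are therefore left out; weights and tags are ignored.

```python
from itertools import product

# Hand-transcribed from the example grammar; this layout is an assumption
# made for illustration only.
ACTION = ["open", "close", "delete", "move"]
ARTICLE = ["", "the", "a"]  # "" models the optional [the | a]
OBJECT = ["window", "file", "menu"]


def command_phrases():
    """Yield every word sequence matched by $command = $action $object."""
    for action, article, obj in product(ACTION, ARTICLE, OBJECT):
        # Drop the empty string that stands for the omitted article.
        yield " ".join(word for word in (action, article, obj) if word)


phrases = list(command_phrases())
print(len(phrases))              # 4 actions x 3 article choices x 3 objects
print("open a file" in phrases)
```

Treating the optional article as an empty alternative keeps the enumeration a plain Cartesian product, which is sufficient for a grammar this small.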

The following grammars show an imported grammar and the grammar that imports it, each in both ABNF and XML forms.

ABNF: http://www.example.com/places.gram

#ABNF 1.0 ISO8859-1;

language en;

// No imports in this grammar.

public $city = Boston | Philadelphia | Fargo;

public $state = Florida | North Dakota | New York;

// References to local rules
// Artificial example allows "Boston, Florida!"

public $city_state = $city $state;

ABNF: http://www.example.com/booking.gram

#ABNF 1.0 ISO8859-1;

language en;

import http://www.example.com/places.xml as someplaces;

// Reference by URI syntax
$flight = I want to fly to
   $(http://www.example.com/places.xml#city);

// Reference using imported name
$exercise = I want to walk to $$someplaces#state;

// Reference to root rule using an import reference
$wet = I want to swim to $$someplaces;

XML Grammar: http://www.example.com/places.xml

<?xml version="1.0"?>

<grammar xml:lang="en" version="1.0">

   <rule id="city" scope="public">
     <one-of>
       <item>Boston</item>
       <item>Philadelphia</item>
       <item>Fargo</item>
     </one-of>
   </rule>

   <rule id="state" scope="public">
     <one-of>
       <item>Florida</item>
       <item>North Dakota</item>
       <item>New York</item>
     </one-of>
   </rule>

   <!-- Reference by URI to a local rule -->
   <!-- Artificial example allows "Boston, Florida"! -->
   <rule id="city_state" scope="public">
     <ruleref uri="#city"/> <ruleref uri="#state"/>
   </rule>
</grammar>

XML Grammar: http://www.example.com/booking.xml

<?xml version="1.0"?>

<grammar xml:lang="en" version="1.0">
   <import name="someplaces"
    uri="http://www.example.com/places.xml"/>

   <!-- Using URI syntax -->
   <rule id="flight">
     I want to fly to 
     <ruleref uri="http://www.example.com/places.xml#city"/>
   </rule>

   <!-- Using import syntax -->
   <rule id="exercise">
     I want to walk to <ruleref import="someplaces#state"/>
   </rule>

   <!-- Reference to root rule of an imported grammar -->
   <rule id="wet">
     I want to swim to <ruleref import="someplaces"/>
   </rule>
</grammar>
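
The relationship between import references and URI references in the booking grammar can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: the function name resolve_import_refs and the abbreviated BOOKING_XML string are hypothetical, and the sketch assumes that an import reference of the form name#rule corresponds to the imported grammar's URI followed by "#rule", while a bare name refers to the root rule of the imported grammar.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the booking grammar above (assumption: only the
# rules that use import references are kept).
BOOKING_XML = """<?xml version="1.0"?>
<grammar xml:lang="en" version="1.0">
   <import name="someplaces"
    uri="http://www.example.com/places.xml"/>
   <rule id="exercise">
     I want to walk to <ruleref import="someplaces#state"/>
   </rule>
   <rule id="wet">
     I want to swim to <ruleref import="someplaces"/>
   </rule>
</grammar>
"""


def resolve_import_refs(grammar_xml):
    """Return (rule id, target URI) pairs for each import-style ruleref."""
    root = ET.fromstring(grammar_xml)
    # Map each import name to the URI it was declared with.
    imports = {imp.get("name"): imp.get("uri") for imp in root.iter("import")}
    resolved = []
    for rule in root.iter("rule"):
        for ref in rule.iter("ruleref"):
            target = ref.get("import")
            if target is None:
                continue  # URI-style or special reference, not handled here
            name, sep, rule_name = target.partition("#")
            # A bare import name yields the grammar URI itself (root rule).
            resolved.append((rule.get("id"), imports[name] + sep + rule_name))
    return resolved


for rule_id, uri in resolve_import_refs(BOOKING_XML):
    print(rule_id, uri)
```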

Appendix B: DTD for the XML Format

This appendix is Normative.

The DTD has the following known limitations.

The DTD allows the count, item and one-of elements to be empty. The specification does not describe the interpretation of empty versions of these elements and should be modified.
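
Until that interpretation is defined, a grammar author may wish to detect such elements before deployment. The following Python sketch is one possible check; the function name find_empty_expansions and the SAMPLE document are hypothetical, and treating whitespace-only content as empty is an assumption of this sketch rather than a rule of the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar containing three empty elements and one non-empty item.
SAMPLE = """<grammar xml:lang="en" version="1.0">
  <rule id="r">
    <one-of/>
    <count number="optional"></count>
    <one-of><item> yes </item><item/></one-of>
  </rule>
</grammar>"""


def find_empty_expansions(grammar_xml):
    """List the tags of count, item and one-of elements with no content."""
    root = ET.fromstring(grammar_xml)
    empty = []
    for element in root.iter():
        if element.tag in ("count", "item", "one-of"):
            has_children = len(element) > 0
            has_text = bool((element.text or "").strip())
            if not has_children and not has_text:
                empty.append(element.tag)
    return empty


print(find_empty_expansions(SAMPLE))
```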

<?xml version="1.0" encoding="ISO-8859-1"?>

<!-- Speech Recognition Grammar Format DTD v0.6 20001026 -->

<!ENTITY % rule-expansion "#PCDATA | token | ruleref
                              | item | one-of | count " >

<!ELEMENT ruleref EMPTY>
<!ATTLIST ruleref
     uri CDATA #IMPLIED
     import CDATA #IMPLIED
     special CDATA #IMPLIED
     xml:lang NMTOKEN #IMPLIED
     tag CDATA #IMPLIED>

<!ELEMENT token (#PCDATA)>
<!ATTLIST token
     lexicon CDATA #IMPLIED
     xml:lang NMTOKEN #IMPLIED>

<!ELEMENT one-of (item)*>
<!ATTLIST one-of
     tag CDATA #IMPLIED
     xml:lang NMTOKEN #IMPLIED>

<!ELEMENT item ( %rule-expansion; )*>
<!ATTLIST item
    weight NMTOKEN #IMPLIED
    tag CDATA #IMPLIED
    xml:lang NMTOKEN #IMPLIED>

<!ELEMENT count ( %rule-expansion; )*>
<!ATTLIST count
    number CDATA #IMPLIED
    tag CDATA #IMPLIED
    xml:lang NMTOKEN #IMPLIED>

<!ELEMENT rule ( %rule-expansion; | example )*>
<!ATTLIST rule 
    id ID #REQUIRED
    scope (private | public) "private">

<!ELEMENT example (#PCDATA)>

<!ELEMENT import EMPTY>
<!ATTLIST import
    uri CDATA #REQUIRED
    name CDATA #REQUIRED>

<!ELEMENT grammar (import*,rule*)>
<!ATTLIST grammar
    version CDATA #REQUIRED
    xml:lang NMTOKEN #REQUIRED
    root IDREF #IMPLIED
    mode (speech | dtmf) "speech">

Appendix C: Formal Syntax for Augmented BNF

This appendix will be Normative when complete.

A future revision of this document will include a formal specification of the syntax of the Augmented BNF format.

Appendix D: DTMF Grammars

This appendix is Informative.

This section defines a Normative representation of a grammar consisting of DTMF tokens. A DTMF grammar can be used by a DTMF detector to determine sequences of legal and illegal DTMF events. However, not all grammar processors are required to support DTMF input.

DTMF (Dual Tone Multiple Frequency) is an ITU standard for telephony signaling. ITU Recommendation Q.23 defines DTMF generation. ITU Recommendation Q.24 defines DTMF reception.

If the grammar mode is declared as "dtmf" then tokens contained by the grammar are treated as DTMF tones (rather than as speech tokens, which is the default).

There are sixteen (16) DTMF tones. Of these, twelve (12) are commonly found on telephone sets as the digits "0" through "9" plus "*" (star) and "#" (pound). The four DTMF tones not typically present on telephones are "A", "B", "C", "D".

Each of the DTMF symbols is a legal DTMF token in a DTMF grammar. Space-separated DTMF symbols represent temporal sequences of DTMF entry. Non-space-separated DTMF sequences are also permitted for clarity (under study, see below).

In the ABNF form the "*" symbol is confusable with the "*" post-fix operator (Section 2.5). It is recommended that the "*" and "#" symbols be quoted to avoid ambiguity including when those symbols appear in sequences. As an alternative the tokens "star" and "pound" are acceptable synonyms.

In all other respects a DTMF grammar is syntactically the same as a speech grammar.

The following is a simple DTMF grammar that accepts a 4-digit PIN followed by a pound terminator. It also permits "*9" (e.g. to receive a help message).

#ABNF 1.0 ISO8859-1;

mode dtmf;

$digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
public $pin = $digit $digit $digit $digit "#" | "*9";

<?xml version="1.0"?>

<grammar mode="dtmf" version="1.0">

<rule id="digit">
 <one-of>
   <item> 0 </item>
   <item> 1 </item>
   <item> 2 </item>
   <item> 3 </item>
   <item> 4 </item>
   <item> 5 </item>
   <item> 6 </item>
   <item> 7 </item>
   <item> 8 </item>
   <item> 9 </item>
 </one-of>
</rule>

<rule id="pin" scope="public">
 <one-of>
   <item>
     <ruleref uri="#digit"/> <ruleref uri="#digit"/>
     <ruleref uri="#digit"/> <ruleref uri="#digit"/>
     #
   </item>
   <item>
     *9
   </item>
 </one-of>
</rule>

</grammar>
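
The set of DTMF sequences accepted by the PIN grammar above can be sketched in Python with a regular expression. This is an informal rendering for illustration only: the names PIN_PATTERN and matches_pin_grammar are assumptions of this sketch, and it models only which sequences are legal, not timeouts or any other behavior of a DTMF detector.

```python
import re

# Regular-expression rendering of:
#   $pin = $digit $digit $digit $digit "#" | "*9";
PIN_PATTERN = re.compile(r"(?:[0-9]{4}#|\*9)")


def matches_pin_grammar(dtmf_events):
    """Return True if the sequence of DTMF symbols is accepted by $pin."""
    # Each event is one DTMF symbol; joining them gives the temporal sequence.
    return PIN_PATTERN.fullmatch("".join(dtmf_events)) is not None


print(matches_pin_grammar(["1", "2", "3", "4", "#"]))  # four digits plus pound
print(matches_pin_grammar(["*", "9"]))                 # help sequence
print(matches_pin_grammar(["1", "2", "3", "#"]))       # only three digits
```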

Issues

A grammar must specify its locale by including an xml:lang attribute on the root element. What should the locale be for a DTMF grammar? Does it matter?
It may be helpful to define convenience rules for $digits (0-9) or $any (0-9, "#", "*") since these are so commonly used. However, developers could simply copy the definitions from this document.
The current definition permits sequences of DTMF tokens to be defined without space separation. This difference from speech mode is a useful shorthand but does mean that parsers and other processors of grammars must handle DTMF grammars differently. The distinction may be removed in a future revision of this specification.
DTMF input processors are typically configured with timeout parameters to determine when DTMF input has failed. A common timeout is the "inter-digit timeout". For speech recognition, analogous configuration parameters are not part of the speech grammar specification but are instead controlled by the environment in which the grammar is invoked -- e.g. through DialogML/VoiceXML. Should the same apply to DTMF grammars?
There has been a request to allow a grammar to distinguish DTMF events by length. For example, the following might be a way to indicate a DTMF event of at least 1 second in length:
<token dtmf_min_duration="1000ms"> # </token>
Semantic interpretation should follow the same specification as is used for interpreting speech input. Since DTMF input tends to be simple and direct in nature, simplicity in the semantic interpretation specification would be an asset for interpreting DTMF input.

Appendix E: XSLT Style Sheet to Convert XML Grammars to ABNF

This appendix is Informative.

<?xml version="1.0"?> 

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements=
    "import rule example item one-of count token"/>

<xsl:output method="text"/>

<xsl:template name="addtag">
  <xsl:if test="string(@tag)!=''">
    {<xsl:value-of select="@tag"/>}
  </xsl:if>
</xsl:template>

<xsl:template name="addweight">
  <xsl:if test="string(@weight)!=''">
    /<xsl:value-of select="@weight"/>/
  </xsl:if>
</xsl:template>

<xsl:template name="addlang">
  <xsl:if test="string(@xml:lang)!=''">
    !<xsl:value-of select="@xml:lang"/>
  </xsl:if>
</xsl:template>

<xsl:template name="addlexicon">
  <xsl:if test="string(@lexicon)!=''">
    <xsl:choose>
      <xsl:when test="string(@xml:lang)!=''">,</xsl:when>
      <xsl:otherwise>!</xsl:otherwise>
    </xsl:choose>
    <xsl:value-of select="@lexicon"/> 
  </xsl:if>
</xsl:template>

<xsl:template match="grammar">
  #ABNF
  <xsl:value-of select="@version"/>
  <xsl:value-of select="system-property('xsl:encoding')"/>;
  <xsl:text> </xsl:text>
  language <xsl:value-of select="@xml:lang"/>;
  <xsl:if test="string(@mode)!=''">
    mode <xsl:value-of select="@mode"/>;
  </xsl:if>
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="import">
  import <xsl:value-of select="@uri"/>
  as <xsl:value-of select="@name"/>;
</xsl:template>

<xsl:template match="rule">
  <xsl:value-of select="@scope"/>
  $<xsl:value-of select="@id"/> = 
  <xsl:apply-templates/>
  ;
</xsl:template>

<xsl:template match="token">
  "<xsl:value-of select="text()"/>"
  <xsl:call-template name="addlang"/>
  <xsl:call-template name="addlexicon"/>
</xsl:template>

<xsl:template match="ruleref">
  <xsl:choose>
    <xsl:when test="string(@special)!=''">
      $<xsl:value-of select="@special"/> 
    </xsl:when>
    <xsl:when test="string(@import)!=''">
      $$<xsl:value-of select="@import"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:choose>
        <xsl:when test="starts-with(string(@uri),'#')">
          $<xsl:value-of select="substring-after(@uri,'#')"/>
        </xsl:when>
        <xsl:otherwise>
          $(<xsl:value-of select="@uri"/>)
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
  <xsl:call-template name="addlang"/>
  <xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="example"/>

<xsl:template match="one-of">
 (<xsl:apply-templates/>)
 <xsl:call-template name="addlang"/>
 <xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="item">
  <xsl:apply-templates/> 
  <xsl:call-template name="addlang"/>
  <xsl:call-template name="addtag"/>
</xsl:template>

<xsl:template match="one-of/item">
  <xsl:call-template name="addweight"/>
  <xsl:apply-templates/> 
  <xsl:call-template name="addlang"/>
  <xsl:call-template name="addtag"/>
  <xsl:if test="not(position()=last())">|</xsl:if>
</xsl:template>

<xsl:template 
match="count[@number='optional']|count[@number='?']">
  <xsl:call-template name="addtag"/>
    [<xsl:apply-templates/>]
  <xsl:call-template name="addlang"/>
</xsl:template>

<xsl:template match="count[@number='0+']">
  <xsl:call-template name="addtag"/>
    (<xsl:apply-templates/>)* 
  <xsl:call-template name="addlang"/>
</xsl:template>

<xsl:template match="count[@number='1+']">
  <xsl:call-template name="addtag"/>
    (<xsl:apply-templates/>)+
  <xsl:call-template name="addlang"/>
</xsl:template>

</xsl:stylesheet>
