W3C

Semantic Interpretation for Speech Recognition (SISR) Version 1.0

W3C Candidate Recommendation 11 January 2006

This version:
http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/
Latest version:
http://www.w3.org/TR/semantic-interpretation/
Previous version:
http://www.w3.org/TR/2004/WD-semantic-interpretation-20041108/
Editors:
Luc Van Tichelen, Nuance Communications (Editor-in-Chief)
Dave Burke, Voxpilot

Abstract

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.

Semantic Interpretation may be useful in combination with other specifications, such as Stochastic Language Models [N-GRAM], but its use with N-grams has not yet been studied.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 11 January 2006 W3C Candidate Recommendation of "Semantic Interpretation for Speech Recognition (SISR) Version 1.0". W3C publishes a technical report as a Candidate Recommendation to indicate that the document is believed to be stable and to encourage implementation by the developer community. Candidate Recommendation status is described in section 7.1.1 of the Process Document. Comments can be sent until 20 February 2006.

Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced as part of the Voice Browser Activity (activity statement), following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is for public review, and comments and discussion are welcomed on the (archived) public mailing list <www-voice@w3.org>.

This document is based upon the Semantic Interpretation for Speech Recognition (SISR) Version 1.0 Last Call Working Draft of 8 November 2004 and feedback received during the review period (see the Disposition of Comments document). The Voice Browser Working Group (member-only link) believes that this specification addresses its requirements and all Last Call issues.

The entrance criteria to the Proposed Recommendation phase require at least two independently developed interoperable implementations of each required feature, and at least one or two implementations of each optional feature depending on whether the feature's conformance requirements have an impact on interoperability. The Voice Browser Working Group considers the optional feature specified in Section 7 to be a "feature at risk" since it will be removed if no implementation of it is reported to the group. Detailed implementation requirements and the invitation for participation in the Implementation Report are provided in the Implementation Report Plan. We expect to meet all requirements of that report within the Candidate Recommendation period closing 20 February 2006.


1 Introduction

This section is informative.

1.1 Semantic Interpretation

Grammar Processors, and in particular speech recognizers, use a grammar that defines the words and sequences of words to define the input language that they can accept. The major task of a grammar processor consists of finding the sequence of words described by the grammar that (best) matches a given utterance, or reporting that no such sequence exists.

In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information present in the user utterance. What is needed is a computer-processable representation of the information, the Semantic Result, rather than a natural language transcript. The process of producing a Semantic Result representing the meaning of a natural language utterance is called Semantic Interpretation (SI).

The Semantic Interpretation process described in this specification uses Semantic Interpretation Tags (SI Tags) (see section 3.2) to provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar. When used with a [VOICEXML20] Processor, it is expected that a Semantic Interpretation Grammar Processor will convert the result generated by an [SRGS] speech grammar processor into an ECMAScript object that can then be processed as specified in section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML Forms in [VOICEXML20].

The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.

This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification [SRGS]. It is possible that Semantic Interpretation Tags as defined here can also be used with Stochastic Language Models [N-GRAM], but the current specification does not specifically address such use and does not guarantee that the Semantic Interpretation Tags as defined here meet the needs of such use.

1.2 Basic Principles

The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:

This specification uses the ECMAScript Compact Profile [ECMA-327], which is a strict subset of [ECMA-262]. [ECMA-327] has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.

2 Notational Conventions

In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119]. Requirement levels for conforming Semantic Interpretation for Speech Recognition implementations are defined in Appendix A.

The sections in the main body of this document are normative unless otherwise specified. The appendices and examples in this document are informative unless otherwise indicated explicitly.

This specification normatively references [ECMA-327], which in turn references [ECMA-262]. The notation ES n is used in this document as shorthand for section number n in [ECMA-262].

3 Expressions in Semantic Interpretation Tags

3.1 Rule Variables and Semantic Values

SI Tags compute semantic values. During the semantic interpretation process, these values can be assigned to variables that are associated with the rules in the grammar. These variables are known as Rule Variables.

Every grammar rule has a single Rule Variable that holds a semantic value. The Rule Variable is typically assigned its value by the SI Tags within its grammar rule. SI Tags also have access to the Rule Variables of any other rules referenced by the current grammar rule and already processed up to that point in the utterance (according to the visibility constraints defined in section 6). The Rule Variables of other rules are referenced by the name of their grammar rule, as described in section 3.3.2.

Rule Variables can hold semantic values of any type defined in [ECMA-327]. They are not explicitly typed. Rule Variables that have not been assigned a value are not defined. SI authors will typically use scalar types, e.g. string or numeric values, in lower level rules and more structured objects in higher level rules (particularly root rules).

In addition to semantic values, certain other values corresponding to Rule Variables are available during SI processing.

For every Rule Variable there is an associated variable named text, of type String, which holds the substring (the series of tokens) in the utterance that is governed by the corresponding grammar rule. Text variables are not part of the Rule Variable (see section 3.3.3) and the value of the text variables cannot be modified.

Likewise, for every Rule Variable, there is an associated variable called score, of type Number, which holds a value that is related to the confidence or probability of the corresponding grammar rule or some similar measure. Higher score values indicate higher confidence or probability over the corresponding grammar rule. Processors that don't compute or don't have access to such values must return undefined as the score value. Score variables are not part of the Rule Variable and the value of the score variables cannot be modified.

For every Rule Variable there are two associated variables named starttime and endtime, of type Number, which hold the starting time and ending time of the portion of the utterance that is governed by the corresponding grammar rule. Each value must be an absolute timestamp expressed as the number of milliseconds since 1 January 1970 00:00:00 GMT, or must be undefined if the SI processor either does not have time information for the input or cannot process time information. For any given Rule Variable, if neither the starttime nor the endtime value is undefined, then the starttime value must not be greater than the endtime value. For any given Rule Variable, both the starttime value and the endtime value related to the corresponding grammar rule must not be smaller than the starttime value related to the referencing grammar rule. Similarly, for any given Rule Variable, both the starttime and the endtime value related to the corresponding grammar rule must not be greater than the endtime value related to the referencing grammar rule. Undefined values for starttime or endtime cannot impose constraints on referenced rules and are not subject to constraints from referencing rules. These variables are not part of the Rule Variable and their values cannot be modified.
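The timing constraints above can be checked mechanically. The following is a minimal sketch, assuming each rule's times are available as a plain object with optional starttime and endtime properties; the function name checkTiming and the wording of its messages are illustrative only and are not part of this specification.

```javascript
// Sketch: checks the SISR starttime/endtime constraints between a referencing
// rule ("parent") and a rule it references ("child"). Times are milliseconds
// since 1 January 1970 00:00:00 GMT, or undefined when unknown.
function checkTiming(parent, child) {
  const problems = [];
  const { starttime: start, endtime: end } = child;

  // A rule must not end before it starts (only checked when both are known).
  if (start !== undefined && end !== undefined && start > end) {
    problems.push("child starttime > child endtime");
  }

  // Each known child time must lie within the known bounds of the parent.
  // Undefined values neither impose nor receive constraints.
  for (const [name, t] of [["starttime", start], ["endtime", end]]) {
    if (t === undefined) continue;
    if (parent.starttime !== undefined && t < parent.starttime) {
      problems.push(`child ${name} < parent starttime`);
    }
    if (parent.endtime !== undefined && t > parent.endtime) {
      problems.push(`child ${name} > parent endtime`);
    }
  }
  return problems;
}
```

An empty array means the pair is consistent. Note that the sketch deliberately allows sibling references to overlap or leave gaps, since the constraints only relate a referenced rule to its referencing rule.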

The semantic result for an utterance is the value of the Rule Variable of the root rule when all semantic interpretation evaluations have been completed. For certain result formats (e.g. [EMMA]), this value is serialized into an XML document according to the description in section 7. It is outside the scope of this specification to define how the semantic result is communicated to the application.

3.1.1 Implementation Notes

This section is informative.

In the context of the W3C Voice Browser architecture, the semantic result will be directly cast into ECMAScript variables in the VoiceXML interpreter (see section 3.1.6 in [VOICEXML20]). In the W3C Multimodal Interaction Framework [MMI-FRAMEWORK], the semantic result is expected to be transformed into EMMA following the mechanism described in section 7. In other contexts, the mechanism described in section 7 can be used to transform the semantic result into other XML formats.

Score values are highly dependent on the processor's implementation. In most implementations using speech recognition, scores are likely to be dependent on factors such as audio channel quality, grammar contents, grammar weights, language, individual speaker characteristics, and others. Scores for a particular word or phrase within a grammar are typically comparable over instances of the same word or phrase over time. Scores for different words in a single grammar are also typically comparable to one another. Scores across grammars, scores for words versus word sequences, or scores between different processors are very often not comparable. It is anticipated that scores will be useful only for annotating the results, not for influencing the results during SI processing. Note that an SI processor does not require a speech recognizer, so the score need not even be related to speech recognition.

The starttime and endtime variables may be useful in a variety of application contexts, for instance temporal annotations in a multimodal application which integrates semantic results from different modalities (e.g. events from speech and gesture modalities). Note that the constraints on starttime and endtime values for referenced rules do not imply that starttime and endtime values for sequentially referenced rules cannot overlap, and do not exclude gaps between the endtime of one referenced rule and the starttime of the next referenced rule. The starttime and endtime values may be dependent on the processor's implementation, and the accuracy of the values may be dependent on the speech signal quality or other factors. This specification does not define accuracy requirements that a processor should meet.

3.2 Semantic Interpretation Tags

Semantic Interpretation Tags are added in the string content of the tag elements in the grammar rule expansion, as described in section 2.6 of [SRGS]. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.

This specification defines two different Semantic Interpretation tag syntaxes. The value of the tag-format declaration in the grammar selects which of the two syntaxes is used. The choice of syntax only changes how tags are processed during Semantic Interpretation; in all other respects the grammar behaves identically.

The "Script" tag syntax, enabled by setting the tag-format to semantics/1.0, defines the contents of tags to be ECMAScript. Each tag is a valid [ECMA-327] program. Section 3.2.2 describes the processing of this tag syntax in more detail.

The "String Literal" tag syntax, enabled by setting the tag-format to semantics/1.0-literals, defines the contents of tags to be strings. This syntax does not have the expressive power of a full scripting language, but does provide a way to produce semantic results consisting of simple strings. Section 3.2.3 describes this tag syntax in more detail.

Within one grammar, it is not possible to mix the two tag syntaxes: all tags in one grammar must have the same tag-format. However, an externally referenced grammar may have a different tag-format from the parent grammar that references it.

3.2.1 Adding Semantic Interpretation Tags to Grammars

Below are two example formats of SI Tags in the Speech Recognition Grammar Specification [SRGS] (tag-content represents the content of the tag which can be either ECMAScript code or a String Literal).

In the XML grammar format, SI Tags are specified as the content of the <tag> element:

<tag> tag-content </tag>

In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}':

{ tag-content }
{!{ tag-content }!}
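The reason for the second delimiter form is that tag-content in single curly braces cannot itself contain a closing brace. The sketch below illustrates this by pulling tag-content out of an ABNF rule expansion; it is not a full ABNF parser. The function name extractAbnfTags is hypothetical, and the sketch assumes that braces never occur inside quoted tokens.

```javascript
// Sketch: extracts tag-content from an ABNF rule expansion. The {!{ ... }!}
// form is tried first so that its content may contain "}" characters.
// Simplification: braces inside quoted tokens are not handled.
function extractAbnfTags(expansion) {
  const tagPattern = /\{!\{([\s\S]*?)\}!\}|\{([^}]*)\}/g;
  const contents = [];
  for (const match of expansion.matchAll(tagPattern)) {
    // Group 1 holds {!{...}!} content; group 2 holds {...} content.
    contents.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return contents;
}
```

For example, applied to `$r = a {!{if (x) { out = 1; }}!};`, the sketch returns the single script `if (x) { out = 1; }`, whose inner braces would have ended a single-brace tag early.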

3.2.2 Semantic Interpretation Scripts

A Semantic Interpretation Script (SI Script) holds a string that is treated as the source text of a valid [ECMA-327] Program ("Program" is defined by ES 14).

The environment in which SI Tags are embedded may introduce escaped characters, character references, or other markup that has to be resolved by the environment. The result after resolution is treated as ECMAScript code.

It is illegal to make an assignment to a variable that has not been previously declared (either implicitly as is the case for Rule Variables or explicitly by using a var statement). Attempting to assign to an undeclared variable will result in a runtime error.
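In a present-day ECMAScript engine, strict mode gives comparable behavior: assigning to an undeclared identifier throws a ReferenceError. The following sketch uses that as an analogy only; it is not an [ECMA-327] implementation, and the function name runStrictTag is hypothetical. The Rule Variable is passed in as the parameter out, so it counts as declared, as it does in SISR.

```javascript
// Sketch: runs SI Script source in strict mode so that assigning to an
// undeclared variable raises a ReferenceError. "out" is a parameter, so it
// behaves like the implicitly declared Rule Variable.
function runStrictTag(source) {
  const tag = new Function("out", '"use strict";\n' + source + "\nreturn out;");
  return tag({}); // the Rule Variable starts as an empty Object
}
```

With this sketch, `var total = 2; total = total + 1; out = total;` yields 3, whereas `undeclaredName = 1;` fails at runtime.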

3.2.3 Semantic Interpretation String Literals

A tag using the String Literal tag syntax has content that is a sequence of zero or more characters. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4.

During processing, a tag with a String Literal has the same effect as a script that assigns the content of the tag, as a string literal, to the Rule Variable of the rule the tag is in.

3.2.4 Authoring Notes

This section is informative.

If multiple tags are present in the rule expansion, the Rule Variable is set to the value of the last tag in the expansion. Prior tags are overwritten by the final tag.

A grammar using the Script tag syntax can reference rules of a grammar using the String Literal tag syntax. The value of the string literal can be obtained by the parent rule using the Rule Variable of the referenced rule. The recognized text of the referenced rule is also available in the meta.latest().text and meta.rulename.text variables (where rulename is the name of the rule).

A grammar using the String Literal tag syntax can reference rules in other grammars (which can be using either the Script tag syntax or the String Literal tag syntax). See section 5 for the way semantic results from a referenced grammar can be used in a grammar with String Literal tag syntax.

Authors should take care to set the tag-format correctly. Using the String Literal tag syntax when the tag-format is set to semantics/1.0 will generally result in a runtime error. However, the converse (using the Script tag syntax when the tag-format is set to semantics/1.0-literals) will not produce a runtime error but rather result in erroneously populating Rule Variables with ECMAScript code.

Examples:

Examples of equivalent grammars, one using the Script tag syntax and the other using the String Literal tag syntax, are given below for both the XML Form and ABNF Form.

XML Form
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item><token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui<tag>yes</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>no</tag>
  </rule>
</grammar>

The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>out="yes";</tag></item>
      <item><token>you bet</token><tag>out="yes";</tag></item>
      <item xml:lang="fr-CA">oui<tag>out="yes";</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>out="no";</tag>
  </rule>
</grammar>
ABNF Form
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $answer;
public $answer = $yes | $no;
$yes = yes | yeah {yes} | "you bet" {!{yes}!} | "oui"!fr-CA {yes};
$no = (no | nope | no way) {no};

The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $answer;
public $answer = $yes | $no;
$yes = yes | yeah {out="yes";} | "you bet" {!{out="yes";}!} |
       "oui"!fr-CA {out="yes";};
$no = (no | nope | no way) {out="no";};

3.3 Syntax for Rule Variables

An SI Script can access Rule Variables using the syntax defined in this section. This syntax applies only to documents whose SI Tags hold SI Scripts (and not to documents whose SI Tags use the String Literal tag syntax).

3.3.1 Accessing the Rule Variable

Every grammar rule has a single Rule Variable that holds an [ECMA-327] value. This Rule Variable can be both evaluated and assigned to.

The Rule Variable is identified by out.

Properties of the Rule Variable can be individually accessed by out.identifier, where identifier is the name of the property.

out              (identifies the Rule Variable)
out.pizza        (identifies the pizza property of the Rule Variable)

3.3.1.1 Authoring Notes

This section is informative.

The Semantic Interpretation Script typically assigns a value to the Rule Variable of its embedding grammar rule. The Rule Variable is initialized to an empty Object before the first tag in the grammar rule is executed (see section 6.3). The SI author will usually either add properties to this Object or alternatively discard it by assigning a primitive value (e.g. String or Number) to the Rule Variable. Since the Rule Variable is initialized before the tag is executed, a var statement is not required prior to assigning to it.

As a consequence of normal ECMAScript behavior, the SI author is free to override the Rule Variable type as well as value within the bounds of legal ECMAScript. Note that [ECMA-327] enforces rules that affect Semantic Interpretation Scripts. For example, [ECMA-327] reserved words cannot be used as a property. Thus, out.for is illegal because it uses the [ECMA-327] reserved word for.

Examples:
// An Object with property name prop
out.prop = "my property";

// A String with value "my value"
out = "my value";

// A String with value "my value"
out.prop = "my property"; out = "my value";

// A String with value "my value"
out = "my value"; out.prop = "my property";

// A String with value "ab"
out.prop1 = "a"; out.prop2 = "b"; out = out.prop1 + out.prop2;

// An Object with property name prop
out = "my value"; out = new Object(); out.prop = "my property";
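The results listed in the examples above can be reproduced in a present-day ECMAScript engine with a small harness. This is a sketch under stated assumptions: evalTag is a hypothetical name, and code created with new Function runs in non-strict mode, where assigning a property to a primitive String is silently ignored. That matches the listed results; in strict mode the same assignment would throw a TypeError instead.

```javascript
// Sketch: evaluates one SI Script with the Rule Variable "out" initialized to
// an empty Object and returns its final value. Function-constructor code is
// non-strict, so "out.prop = ..." on a String value is silently ignored.
function evalTag(source) {
  const tag = new Function("out", source + "\nreturn out;");
  return tag({});
}
```

For instance, evalTag('out = "my value"; out.prop = "my property";') returns the String "my value", because the property is set on a temporary wrapper object that is immediately discarded.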

3.3.2 Accessing the Rule Variable of a Referenced Grammar Rule

SI Scripts can access the Rule Variable associated with a referenced grammar rule only from SI Tags that appear after (to the right of or below) the rule reference in the grammar expansion, and only if the referenced rule was used in the expansion that matched the input utterance. See the visibility rules in section 6 for a more detailed description of when Rule Variables associated with rule references can be referenced in SI Tags, using the concept of the logical parse structure and the flat parse list.

Rule Variables associated to referenced rules can both be evaluated and assigned to. Every SI Script has access to a rules object that has a property holding the Rule Variable value for every visible rule. The Rule Variable associated to a rule reference is identified by rules.rulename, where rulename is the rulename of the rule, as defined in Section 3.1 Basic Rule Definition in [SRGS]. Individual properties of a Rule Variable can be identified by rules.rulename.identifier, where rulename is the name of the rule and identifier is the name of the property.

The Rule Variable for the latest rule reference that was used in the expansion matching the utterance up to the position of the SI Tag can also be referenced through rules.latest().

In an expression, both the Rule Variables of the current grammar rule and the referenced rules can be evaluated and assigned to.

Special rules (NULL, VOID, GARBAGE) cannot be evaluated.
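One way to picture the rules object is as a plain object with one property per visible Rule Variable and a non-enumerable latest() method. The sketch below is a hypothetical model for illustration, not a required implementation: createRulesScope and recordMatch are invented names, and recordMatch stands in for whatever the processor does when a matched rule reference becomes visible.

```javascript
// Hypothetical model of the rules object seen by an SI Script: one property
// per visible Rule Variable plus a latest() method. recordMatch is an
// illustrative processor hook, not part of SISR.
function createRulesScope() {
  const rules = {};
  let latestName; // undefined until some rule reference has been matched

  // latest() is non-enumerable so it does not look like a Rule Variable.
  Object.defineProperty(rules, "latest", {
    value: () => (latestName === undefined ? undefined : rules[latestName]),
  });

  function recordMatch(ruleName, value) {
    rules[ruleName] = value; // visible as rules.<ruleName>
    latestName = ruleName;   // and as rules.latest()
  }
  return { rules, recordMatch };
}
```

Because latest() looks the value up by name each time it is called, assignments made through rules.rulename and through rules.latest() act on the same Rule Variable in this model.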

3.3.2.1 Authoring Notes

This section is informative.

The rules.rulename notation (where rulename is the name of a referenced rule) can be used only for explicit local rule references and for explicit references to a named rule of a grammar, not for implicit rule references (see SRGS Section 2.2 Rule Reference in [SRGS] for a definition of explicit and implicit rule references). To refer to the Rule Variable for a rule that is referenced by an implicit reference to the root rule of a grammar, the rules.latest() notation can be used.

Examples:
// The Rule Variable associated to the referenced rule "rulename"
rules.rulename

// The property "prop" of the Rule Variable associated with the referenced
// rule "rulename"
rules.rulename.prop

// The Rule Variable associated to the latest matching rule reference before
// the SI Tag
rules.latest()

// The property "prop" of Rule Variable associated to latest matching rule
// reference before the SI Tag
rules.latest().prop

Section 6 describes the visibility rules for accessing Rule Variables. If according to these rules a Rule Variable is not visible, one can still evaluate or declare and assign to the variable with that name (it is just a property on the rules object). The value assigned to a property of the rules object that has the name of a Rule Variable will be overwritten when that Rule Variable is visible according to section 6. This behavior can be used to "initialize" Rule Variables to handle cases where a referenced rule may not actually be matched depending on the input to the grammar.

In the following grammar, because rules.foodsize is assigned a default value, the value for the drink rule will always be:

{
  drinksize: "medium",
  type: "coke"
}

regardless of whether the input is 'coke' or 'medium coke':

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="drink">
  <rule id="drink">
    <!-- Note: rules object always exists in scope -->
    <tag>rules.foodsize="medium"; </tag>
    <item repeat="0-1">
      <ruleref uri="#foodsize"/>
    </item>
    <ruleref uri="#kindofdrink"/>
    <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag>
  </rule>
  <rule id="foodsize">
    <one-of>
      <item>small</item>
      <item>medium</item>
      <item>large</item>
    </one-of>
  </rule>
  <rule id="kindofdrink">
    <one-of>
      <item>coke</item>
      <item>pepsi</item>
    </one-of>
  </rule>
</grammar>
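The effect of the default value can be traced step by step. The sketch below replays the tags of the drink rule by hand for a token sequence that is assumed to have already been matched by the grammar; interpretDrink is a hypothetical name, and the sketch does not perform recognition. Because foodsize and kindofdrink carry no tags, their Rule Variables hold the matched text (see the default assignment in section 5).

```javascript
// Sketch: replays the tags of the "drink" rule for already-matched tokens.
// foodsize and kindofdrink have no tags, so by default assignment their
// Rule Variables hold the matched text.
function interpretDrink(tokens) {
  const rules = {}; // the rules object always exists in scope
  const out = {};   // Rule Variable of "drink", initialized to an empty Object

  rules.foodsize = "medium"; // first tag: default used when no size is spoken

  let i = 0;
  if (["small", "medium", "large"].includes(tokens[i])) {
    rules.foodsize = tokens[i]; // the optional foodsize reference matched
    i += 1;
  }
  rules.kindofdrink = tokens[i];

  // Final tag: out.drinksize=rules.foodsize; out.type=rules.kindofdrink;
  out.drinksize = rules.foodsize;
  out.type = rules.kindofdrink;
  return out;
}
```

Both ["coke"] and ["medium", "coke"] produce { drinksize: "medium", type: "coke" }, while a spoken size such as "large" overwrites the default.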

3.3.3 Accessing Variables Associated with a Grammar Rule or Referenced Grammar Rule

A Rule Variable's text variable is identified by meta.rulename.text, where rulename is the name of the Rule Variable. The text variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().text. The text variable associated to the current grammar rule is identified by meta.current().text. The text variable of the current grammar rule is read-only.

A Rule Variable's score variable is identified by meta.rulename.score, where rulename is the name of the Rule Variable. The score variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().score. The score variable associated to the current grammar rule is identified by meta.current().score. The score variable of the current grammar rule is read-only.

A Rule Variable's starttime and endtime variables are identified by meta.rulename.starttime and meta.rulename.endtime, where rulename is the name of the Rule Variable. The starttime and endtime variables of the Rule Variable referred to by rules.latest() are identified by meta.latest().starttime and meta.latest().endtime. The starttime and endtime variables associated to the current grammar rule are identified by meta.current().starttime and meta.current().endtime. The starttime and endtime variables of the current grammar rule are read-only.

3.3.3.1 Authoring Notes

This section is informative.

Since the text, score, starttime, and endtime variables of the current grammar rule are read-only, they behave as read-only properties as defined in [ECMA-327]. As a consequence, attempts to assign to the text, score, starttime, or endtime variable associated with the Rule Variable of the current grammar rule will be ignored. Note, however, that the text, score, starttime, and endtime properties of a referenced rule (i.e. those properties of meta.rulename, where rulename is the referenced rule, or of meta.latest()) are not read-only.
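The read-only behavior can be modelled with non-writable properties. In the sketch below, createMetaEntry and createMeta are hypothetical names, and representing each meta entry as a plain object is an assumption made for illustration. In non-strict code, a write to a non-writable property is silently ignored, which corresponds to the behavior described above.

```javascript
// Sketch: one meta entry per rule. Entries for the current rule are
// non-writable; entries for referenced rules stay writable.
function createMetaEntry(info, readOnly) {
  const entry = {};
  for (const key of ["text", "score", "starttime", "endtime"]) {
    Object.defineProperty(entry, key, {
      value: info[key], // undefined when the processor has no such value
      writable: !readOnly,
      enumerable: true,
    });
  }
  return entry;
}

// Hypothetical meta object: "referenced" maps rule names to their info and
// latestName names the most recently matched rule reference.
function createMeta(currentInfo, referenced, latestName) {
  const meta = {};
  for (const [name, info] of Object.entries(referenced)) {
    meta[name] = createMetaEntry(info, false);
  }
  const current = createMetaEntry(currentInfo, true);
  Object.defineProperty(meta, "current", { value: () => current });
  Object.defineProperty(meta, "latest", { value: () => meta[latestName] });
  return meta;
}
```

In strict-mode code, the same write to meta.current().text would throw a TypeError instead of being ignored.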

Examples:
// The text variable of the Rule Variable called "rulename"
meta.rulename.text

// The text variable of the Rule Variable referred to by rules.latest()
meta.latest().text

// The text (read-only) variable of the current grammar rule
meta.current().text

4 Semantic Interpretation Grammars

4.1 Semantic Interpretation Grammars

This specification defines a Semantic Interpretation Grammar to be a Speech Recognition Grammar as defined by [SRGS] that

4.2 Global Variable Declarations and Initialization

The header of an [SRGS] grammar may contain one or more global SI Tags. In grammars using the Script tag syntax, these tags are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (see section 4.1 of [SRGS]). Global tags are ignored in grammars using the String Literal tag syntax.

These global SI Tags are evaluated only once, in a global scope that is shared by all evaluations (see section 6.3).

Whereas all evaluations of SI Tags in flat parse lists for matching rules have read-only access to the global scope, the SI Tags in the grammar header have write access to it. This is the primary function of these tags: to initialize the global scope for use in the SI Tags of the grammar rules.
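The split between write access for header tags and read-only access for rule tags can be sketched as follows. This is an illustration under a simplifying assumption: the global scope is modelled as a plain object that is frozen once the header tags have run, whereas a real processor manages ECMAScript global variables. The function name initGlobalScope is hypothetical.

```javascript
// Sketch: header (global) tags run once and may write to the global scope;
// afterwards the scope is frozen so that rule tags can only read it.
// Modelling the global scope as a plain object is an assumption.
function initGlobalScope(headerTags) {
  const globals = {};
  for (const tag of headerTags) {
    tag(globals); // e.g. the equivalent of: var x=1;
  }
  return Object.freeze(globals);
}
```

With header tags equivalent to `var x=1;` and `var y='abcd';` (as in the examples that follow), rule tags in this model can read x and y, but a non-strict write such as x = 99 has no effect.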

Examples:
XML Form

In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header and before the first rule.

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="rule">
  <tag>var x=1;</tag>
  <tag>var y='abcd';</tag>
  <rule id="rule">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
ABNF Form

In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header and before the first rule. Both tag delimiting syntaxes are illustrated in the example.

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $rule;
{var x=1;};
{!{var y='abcd';}!};
$rule = yes | no;

5 Default Assignment

For a given parse, if there is no SI Tag attached to the expansion in the grammar rule that is used to match the utterance, then the value for the out Rule Variable is determined as follows. If there are no rule references in the parse, the value for the text meta variable (meta.current().text) is automatically copied into the Rule Variable (which then becomes of type String). Otherwise, the value of the Rule Variable of the last rule reference in the parse (rules.latest()) is automatically copied into the Rule Variable.
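The default assignment rule reduces to a two-way choice. The sketch below states it as a function over a hypothetical summary of a matched expansion that carries no SI Tag: the matched text plus the Rule Variable values of the rule references used, in order. The function name and the shape of the parse object are assumptions made for illustration.

```javascript
// Sketch of the default assignment for a matched expansion with no SI Tag.
// "parse" is a hypothetical summary: { text, ruleRefValues }, where
// ruleRefValues lists the Rule Variable values of the references, in order.
function defaultAssignment(parse) {
  if (parse.ruleRefValues.length === 0) {
    return parse.text; // copy meta.current().text; the result is a String
  }
  // Otherwise copy rules.latest(), the value of the last rule reference.
  return parse.ruleRefValues[parse.ruleRefValues.length - 1];
}
```

Applied to the airport grammars below, the only reference in "I want to fly to Boston" already evaluates to "BOS", which is therefore copied upward. With two references, only the last one survives.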

Examples:

For the following rule, rules.drink is either "coke", "pepsi" or "coca cola". Similarly for meta.drink.text.

<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola</item>
  </one-of>
</rule>

For the following rule, there is a String Literal tag associated with "coca cola" and hence rules.drink is either "coke" or "pepsi". However, meta.drink.text is either "coke", "coca cola", or "pepsi".

<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola<tag>coke</tag></item>
  </one-of>
</rule>

For the following grammar, the utterance "I want to fly to Boston" will return the result "BOS".

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
   <rule id="flight" scope="public">
     I want to fly to
     <ruleref uri="#airports"/>
   </rule>
   <rule id="airports" scope="private">
     <one-of>
       <item><ruleref uri="#USairport"/></item>
       <item><ruleref uri="#otherairport"/></item>
     </one-of>
   </rule>
   <rule id="USairport" scope="private">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>
   <rule id="otherairport" scope="private">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>
</grammar>

Note that the default assignment has been designed to handle only the simplest, yet most frequent, cases. It cannot combine information from different rule references. For example, the grammar below returns information about the last airport only, not about both airports: the utterance "I want to fly from Chicago to Boston" returns the result "BOS".

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
   <rule id="flight" scope="public">
     I want to fly from
     <one-of>
       <item><ruleref uri="#USairport"/></item>
       <item><ruleref uri="#otherairport"/></item>
     </one-of>
     to
     <one-of>
       <item><ruleref uri="#USairport"/></item>
       <item><ruleref uri="#otherairport"/></item>
     </one-of>
   </rule>
   <rule id="USairport" scope="private">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>
   <rule id="otherairport" scope="private">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>
</grammar>

In order to make this grammar return both airports, one would have to use the Script tag syntax, as shown below. This functionality cannot be achieved by relying only on literal tags and default assignments.

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="flight">
   <rule id="flight" scope="public">
     I want to fly from
     <one-of>
       <item>
         <ruleref uri="http://www.example.com/places.grxml"/>
       </item>
       <item>
         <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
       </item>
     </one-of>
     <tag>out.departure = rules.latest();</tag>
     to
     <one-of>
       <item>
         <ruleref uri="http://www.example.com/places.grxml"/>
       </item>
       <item>
         <ruleref uri="http://www.example.com/places.grxml#otherairport"/>
       </item>
     </one-of>
     <tag>out.arrival = rules.latest();</tag>
   </rule>
</grammar>

Grammar http://www.example.com/places.grxml:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="USairport">
   <rule id="USairport" scope="public">
     <one-of>
       <item>Boston<tag>BOS</tag></item>
       <item>New York<tag>JFK</tag></item>
       <item>Chicago<tag>ORD</tag></item>
     </one-of>
   </rule>
   <rule id="otherairport" scope="public">
     <one-of>
       <item>Brussels<tag>BRU</tag></item>
       <item>Paris<tag>CDG</tag></item>
       <item>Rome<tag>FCO</tag></item>
     </one-of>
   </rule>
</grammar>

6 Visibility Rules and Order of Tag Evaluation for SRGS Grammars

This section defines the visibility rules and order of tag evaluation for SI Tags used in [SRGS] grammars (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in [N-GRAM]), the visibility rules and order of evaluation may be defined differently.

6.1 Logical Parse Structure

After the initialization of the global scope (see section 6.3), the visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure, as defined in Appendix H (Logical Parse Structure) of [SRGS].

Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion must be handled in any specific way beyond what is required for a conformant speech recognition grammar processor. The Logical Parse Structure is only a means to illustrate the order of evaluation and visibility rules for SI Tags. Implementations are not required to expose the logical structure and may use different internal representations, as long as these yield the results described here.

The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.

The Logical Parse output is represented as an array of output entities e1, e2, ..., en, for example [e1, e2, e3].

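Output entities can be one of three kinds: tokens (e.g. turn), SI Tags (e.g. {out="airco";}), and rule references, written as $rulename followed by the array of output entities of the referenced rule (e.g. $state [off, {out="0";}]).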

Appendix H in [SRGS] contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.

For the purpose of building the logical parse, all String Literals are assumed to be converted into the equivalent SI Script as defined in section 3.2.3.

Examples:

The sentence "turn the heating off", applied to the following XML Form grammar

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="command">
   <rule id="command">
      <one-of>
         <item>set</item>
         <item>turn</item>
      </one-of>
      <ruleref uri="#object"/>
      <ruleref uri="#state"/>
      <tag>out.o=rules.object; out.s=rules.state;</tag>
   </rule>
   <rule id="object">
      <item repeat="0-1">the</item>
      <one-of>
         <item>
            <one-of>
               <item>heating</item>
               <item>cooling</item>
            </one-of>
            <tag>out="airco";</tag>
         </item>
         <item>radio<tag>out="radio";</tag></item>
         <item>lights<tag>out="lights";</tag></item>
      </one-of>
   </rule>
   <rule id="state">
      <one-of>
         <item>to</item>
         <item><ruleref special="NULL"/></item>
      </one-of>
      <one-of>
         <item>on<tag>out="1";</tag></item>
         <item>off<tag>out="0";</tag></item>
         <item>warm<tag>out="w";</tag></item>
         <item>cool<tag>out="c";</tag></item>
         <item>cold<tag>out="c";</tag></item>
      </one-of>
   </rule>
</grammar>

or to the equivalent ABNF Form grammar

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $command;
$command = (set | turn)
           $object $state {out.o=rules.object; out.s=rules.state;};
$object = [the] ((heating | cooling) {out="airco";} | radio {out="radio";} |
          lights {out="lights";});
$state = (to|$NULL) (on{out="1";} | off{out="0";} | warm{out="w";} |
         cool{out="c";} | cold{out="c";});

will result in the logical parse

[$command [turn,
           $object [the,
                    heating,
                    {out="airco";}],
           $state  [off,
                    {out="0";}],
           {out.o=rules.object; out.s=rules.state;}]
]
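For illustration, the logical parse above can be held in ordinary ECMAScript values. The encoding is an assumption of this sketch, not something [SRGS] prescribes: tokens are strings, SI Tags are { tag } objects, and rule references are { rule, children } objects. The hypothetical render function prints the structure in the bracket notation used above, and the hypothetical tokens function recovers the input sentence.

```javascript
// Logical parse for "turn the heating off" (hypothetical encoding).
const parse = [
  { rule: "command", children: [
    "turn",
    { rule: "object", children: ["the", "heating", { tag: 'out="airco";' }] },
    { rule: "state", children: ["off", { tag: 'out="0";' }] },
    { tag: "out.o=rules.object; out.s=rules.state;" },
  ] },
];

// Print one output entity in the notation used in this section.
function render(entity) {
  if (typeof entity === "string") return entity;       // token
  if ("tag" in entity) return "{" + entity.tag + "}";  // SI Tag
  // Rule reference: $name followed by its own array of output entities.
  return "$" + entity.rule + " [" + entity.children.map(render).join(", ") + "]";
}

// Collect the tokens, left to right, through all rule references.
function tokens(entities) {
  return entities.flatMap((e) =>
    typeof e === "string" ? [e] : "rule" in e ? tokens(e.children) : []);
}

console.log("[" + parse.map(render).join(", ") + "]");
console.log(tokens(parse).join(" ")); // turn the heating off
```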

6.2 Flat Parse List

The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented as a flattened list of parses, with one parse for every grammar rule application.

The flat parse for a given rule application is represented as:

The output entities are as in the logical parse structure, except that rule references are written without their array of output entities and are instead followed by a sequence number in parentheses.

Examples:

The equivalent flat parse list for the above example is:

$command(1): turn, $object(1),
             $state(1), {out.o=rules.object; out.s=rules.state;}
$object(1): the, heating, {out="airco";}
$state(1): off, {out="0";}

The following example illustrates the use of the sequence number for rules that are applied more than once. Consider the grammar with String Literals, in XML Form:

<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="a">
   <rule id="a">
      <item repeat="1-"><ruleref uri="#b"/></item>
      <ruleref uri="#c"/>
      <one-of>
         <item>
            <item repeat="0-1">t1</item>
            <tag>tag1</tag>
         </item>
         <item>
            <ruleref uri="#d"/>
            <tag>tag2</tag>
         </item>
      </one-of>
   </rule>
   <rule id="b">
      <one-of>
         <item>t2</item>
         <item>t3<tag>tag3</tag></item>
         <item>t4</item>
      </one-of>
   </rule>
   <rule id="c">
      <item repeat="1-2">t5<tag>tag5</tag></item>
   </rule>
   <rule id="d">
      t6 <ruleref uri="#c"/>
   </rule>
</grammar>

or equivalently in ABNF Form:

#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $a;
$a = ($b)<1-> $c ((t1)<0-1> {tag1} | $d {tag2});
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;

Given the input "t2 t3 t5 t5", the logical parse structure is:

[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}]]

and the flat parse list is:

$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}
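The flattening step can be sketched as follows, using an assumed encoding of the logical parse (tokens as strings, { tag } objects, { rule, children } objects). This hypothetical flatten function numbers each rule name in the order its references are listed, which reproduces the example above; the numbering scheme for more deeply nested grammars is an assumption of the sketch. As in the example, the root rule is printed without a sequence number.

```javascript
// Hypothetical sketch: turn a logical parse tree into a flat parse list.
function flatten(root) {
  const counters = new Map(); // rule name -> last sequence number used
  const lines = [];

  function visit(label, children) {
    const pending = []; // referenced rules, flattened after this line
    const items = children.map((e) => {
      if (typeof e === "string") return e;          // token
      if ("tag" in e) return "{" + e.tag + "}";     // SI Tag
      const n = (counters.get(e.rule) || 0) + 1;    // rule reference
      counters.set(e.rule, n);
      const name = "$" + e.rule + "(" + n + ")";
      pending.push([name, e.children]);
      return name;
    });
    lines.push(label + ": " + items.join(", "));
    for (const [name, kids] of pending) visit(name, kids);
  }

  visit("$" + root.rule, root.children);
  return lines;
}

// Logical parse of "t2 t3 t5 t5" for the grammar above.
const logical = { rule: "a", children: [
  { rule: "b", children: ["t2"] },
  { rule: "b", children: ["t3", { tag: "tag3" }] },
  { rule: "c", children: ["t5", { tag: "tag5" }, "t5", { tag: "tag5" }] },
  { tag: "tag1" },
] };

console.log(flatten(logical).join("\n"));
```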

6.3 Scoping and Visibility Rules for Script Tag Syntax Grammars

These scoping and visibility rules are defined on the basis of the flat parse list as specified in section 6.2.

6.3.1 The Global Scope

Before evaluating any scripts in the flat parse list, a global anonymous ECMAScript scope is created for the grammar. This global scope is initialized by executing the scripts that are in the global tags in the grammar header (see section 4.2).

During evaluation of a script in the flat parse list, the global scope is accessible for reading only.

Every script is associated with exactly one global scope: the global scope of the grammar in which the script appears. Scripts in referenced rules that are located in a referenced external grammar are thus executed with access to that referenced grammar's global scope, and do not have access to the referencing grammar's global scope.

The tags within a flat parse are executed in the order in which they appear, left to right. The global tags (in the grammar header) are executed in document order. See section 6.4 for details.

6.3.2 Scope Chains and Access to Variables

For each flat parse, a new anonymous ECMAScript scope is created as a direct child of the global scope object of the grammar in which the related rule is defined. Every ECMAScript scope chain thus has that global scope as its top-level object, and the scope belonging to the flat parse as its successor.

Variable references in executing tags are resolved along the scope chain according to the ECMAScript rules (ES 10.1.4).

The variable object, as defined in [ECMA-327], is the scope object created for the rule. This means that local variables defined in tags belonging to a rule reference are created in the scope object that was created for that rule.

Before the first tag in a flat parse is executed, the environment of a new scope is set up in the following way:

When execution of the flat parse is finished, the scope object of this flat parse is removed from the scope chain. The scope belonging to the referencing flat parse is then updated in the following way (replace rulename with the name of the rule in what follows):

If any of these variables already exist, they are overwritten.

Note: Whether or not the out, rules and meta variables are enumerated when enumerating the scope object is not defined by this specification and may vary between implementations. Authors are discouraged from enumerating the scope object.
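The scope chain described above maps naturally onto ECMAScript prototype chains. The following sketch is one possible model, not a normative one: the helper names newRuleScope, resolve and tryAssign are hypothetical, the global scope is assumed to be a frozen object with a null prototype, and out, rules and meta are simply initialized to empty objects as a simplification of the processor's set-up.

```javascript
// Read-only global scope of one grammar (null prototype keeps lookups short).
const globalScope = Object.freeze(Object.assign(Object.create(null), { currency: "EUR" }));

// One new scope per flat parse, created as a direct child of the global scope.
function newRuleScope(global) {
  const scope = Object.create(global);
  scope.out = {};   // simplification: a real processor's set-up may differ
  scope.rules = {};
  scope.meta = {};
  return scope;
}

// Resolve an identifier along the chain, innermost scope first.
function resolve(scope, name) {
  for (let s = scope; s !== null; s = Object.getPrototypeOf(s)) {
    if (Object.prototype.hasOwnProperty.call(s, name)) return s[name];
  }
  throw new ReferenceError(name + " is not defined");
}

// A strict-mode assignment; returns false if it is rejected.
function tryAssign(scope, name, value) {
  "use strict";
  try {
    scope[name] = value;
    return true;
  } catch (e) {
    return false;
  }
}

const first = newRuleScope(globalScope);
const second = newRuleScope(globalScope);
first.out.x = 1;                                  // each flat parse has its own out
console.log(resolve(second, "currency"));         // EUR, found in the global scope
console.log("x" in second.out);                   // false
console.log(tryAssign(first, "currency", "USD")); // false: the global scope is read-only
console.log(tryAssign(first, "total", 3));        // true: a new local property
```

Because the inherited currency property is non-writable, ECMAScript rejects the assignment through the child scope, which mirrors the read-only access described in section 6.3.1.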

6.3.3 Visibility

The consequences of these scoping rules are:

6.3.4 Global Variables

Since the global scope is read-only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see section 4.2).

Examples:

The following rule contains two Rule Variables associated with the same rule "city". The XML Form is:

<rule id="fromto">
   from
   <ruleref uri="#city"/>
   <tag>out.fromcity=rules.city.name;</tag>
   to
   <ruleref uri="#city"/>
   <tag>out.tocity=meta.city.text;</tag>
</rule>

and the equivalent ABNF Form is:

$fromto = from $city {out.fromcity=rules.city.name;} to
          $city {out.tocity=meta.city.text;};

To determine which of the Rule Variable instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:

$fromto: from, $city(1), {out.fromcity=rules.city.name;}, to,
         $city(2), {out.tocity=meta.city.text;}

From this it follows that rules.city.name in the first tag refers to the first Rule Variable rules.city in the rule, and that meta.city.text in the second tag refers to the text matched by the second reference to rule city.

In the following rule, the flat parse depends on whether the input matches the optional rule b. The XML Form is:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <tag>out.x=rules.b.x;</tag>
</rule>

and the equivalent ABNF Form is:

$a = $b [$b] {out.x=rules.b.x;};

The two possible flat parses are:

$a: $b(1), {out.x=rules.b.x;}
$a: $b(1), $b(2), {out.x=rules.b.x;}

The reference rules.b.x in the tag will thus refer to either the first or the last rule b, depending on whether the optional rule b was matched in the input.

The SI Tag in the rule below contains references to Rule Variables that are undefined whenever there is no Rule Variable with that name before the tag in the flat parse. The XML Form is:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1"><ruleref uri="#c"/></item>
   <tag>out.x=rules.c; out.y=rules.d; out.z=rules.e;</tag>
   <ruleref uri="#e"/>
</rule>

and the equivalent ABNF Form is:

$a = $b [$c] {out.x=rules.c; out.y=rules.d; out.z=rules.e;} $e;

The two possible flat parses are:

$a: $b(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
$a: $b(1), $c(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)

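This means that rules.c is defined only in the second flat parse, where the optional rule c was matched; rules.d is always undefined, because rule d is not referenced at all; and rules.e is always undefined, because the reference to rule e comes after the tag.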

6.4 Order of Tag Execution for Script Tag Syntax Grammars

Within a single SI Tag, the order of evaluation is determined by [ECMA-327] for the evaluation of a valid [ECMA-327] Program (ES 14).

All global SI Tags (the tags in the grammar header) are executed once, before any SI Tag within a grammar rule is executed (see section 4.2).

The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI Tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating String Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see section 6.2).

The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI Tag using that reference's value is processed.

Examples:

Consider the following rules in XML Form:

<rule id="a">
   <ruleref uri="#b"/>
   <tag>out.y=rules.b.x;</tag>
   <item repeat="0-1">
     <ruleref uri="#b"/><tag>out.y=out.y+rules.b.x;</tag>
   </item>
</rule>
<rule id="b">
   foo
   <tag>out.x=1;</tag>
   <one-of>
      <item>bar<tag>out.x=3;</tag></item>
      <item>
         <item repeat="1-">boo<tag>out.x=out.x+1;</tag></item>
      </item>
   </one-of>
</rule>

or equivalently in ABNF Form:

$a = $b  {out.y=rules.b.x;} [$b {out.y=out.y+rules.b.x;}];
$b = foo {out.x=1;} (bar {out.x=3;} | (boo {out.x=out.x+1;})<1->);

For the input "foo boo boo boo", the flat parse lists are:

$a: $b(1), {out.y=rules.b.x;}
$b(1): foo, {out.x=1;}, boo, {out.x=out.x+1;}, boo, {out.x=out.x+1;},
       boo, {out.x=out.x+1;}

and out.y evaluates to 4.

For the input "foo bar foo boo", the flat parse lists are:

$a: $b(1), {out.y=rules.b.x;}, $b(2), {out.y=out.y+rules.b.x;}
$b(1): foo, {out.x=1;}, bar, {out.x=3;}
$b(2): foo, {out.x=1;}, boo, {out.x=out.x+1;}

and out.y evaluates to 5.
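The two evaluations above can be reproduced with a small evaluator. In this sketch (an assumed encoding, not an interface defined by this specification), each SI Tag is written as an ECMAScript function of the rule's scope, a rule reference carries the flat parse of the referenced rule, and each referenced rule is evaluated as soon as it is reached, which satisfies the condition that a rule reference's value is computed before any later tag uses it. Default assignment and meta variables are left out.

```javascript
// Hypothetical flat-parse encoding for this sketch:
//   "foo"                        a token (ignored by the evaluator)
//   (s) => { ... }               an SI Tag acting on the rule scope s
//   { ref: "b", parse: [...] }   a rule reference with the referenced flat parse
function evaluate(parse) {
  const scope = { out: {}, rules: {} };
  for (const entry of parse) {
    if (typeof entry === "function") {
      entry(scope);                                   // tags run once, left to right
    } else if (typeof entry === "object") {
      scope.rules[entry.ref] = evaluate(entry.parse); // a later reference overwrites rules.b
    }
  }
  return scope.out;
}

// Tags of rule b.
const setOne = (s) => { s.out.x = 1; };
const setThree = (s) => { s.out.x = 3; };
const inc = (s) => { s.out.x = s.out.x + 1; };
// Tags of rule a.
const first = (s) => { s.out.y = s.rules.b.x; };
const add = (s) => { s.out.y = s.out.y + s.rules.b.x; };

// "foo boo boo boo"
const one = evaluate([
  { ref: "b", parse: ["foo", setOne, "boo", inc, "boo", inc, "boo", inc] },
  first,
]);
// "foo bar foo boo"
const two = evaluate([
  { ref: "b", parse: ["foo", setOne, "bar", setThree] }, first,
  { ref: "b", parse: ["foo", setOne, "boo", inc] }, add,
]);
console.log(one.y, two.y); // 4 5
```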

6.5 Examples

The rules.b.x and rules.c.x refer to the respective Rule Variable properties:

<rule id="a">
   <ruleref uri="#b"/>
   <ruleref uri="#c"/>
   <tag>out.x = rules.b.x + rules.c.x;</tag>
</rule>

Here rules.c.x causes a run-time error because it is used to the left of the reference to rule c, where rules.c is still undefined:

<rule id="a">
   <ruleref uri="#b"/>
   <tag>out.x = rules.b.x + rules.c.x;</tag>
   <ruleref uri="#c"/>
</rule>

The rules.b.x evaluates to the x property of rules.b if rule b is matched on the input utterance. Otherwise it causes a run-time error:

<rule id="a">
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <ruleref uri="#c"/>
   <tag>out.x = rules.b.x + rules.c.x;</tag>
</rule>

A safer way to write this rule could be (assuming x is of type Number):

<rule id="a">
   <tag>out.x=0;</tag>
   <item repeat="0-1"><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
   <ruleref uri="#c"/>
   <tag>out.x = out.x + rules.c.x;</tag>
</rule>
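The difference between the unsafe rule and the safer variant can be checked directly in ECMAScript. In this sketch, a hypothetical rules object stands for the Rule Variables that are defined when the final tag runs; a missing rules.b models the case where the optional rule b was not matched, and the if statement stands in for the tag that only appears in the flat parse when rule b matched.

```javascript
// Final tag of the unsafe rule: out.x = rules.b.x + rules.c.x;
function unsafeRule(rules) {
  const out = {};
  out.x = rules.b.x + rules.c.x; // TypeError when rules.b is undefined
  return out;
}

// The safer rule: the tag inside the optional item only runs if rule b matched.
function saferRule(rules) {
  const out = {};
  out.x = 0;                                    // <tag>out.x=0;</tag>
  if (rules.b !== undefined) out.x = rules.b.x; // <tag>out.x=rules.b.x;</tag>
  out.x = out.x + rules.c.x;                    // <tag>out.x = out.x + rules.c.x;</tag>
  return out;
}

console.log(saferRule({ c: { x: 5 } }).x);              // 5
console.log(saferRule({ b: { x: 2 }, c: { x: 5 } }).x); // 7
try {
  unsafeRule({ c: { x: 5 } });
} catch (e) {
  console.log(e instanceof TypeError);                  // true
}
```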

The rules.b.x evaluates to the last occurrence of rule b in the repeat:

<rule id="a">
   <item repeat="1-"><ruleref uri="#b"/></item>
   <ruleref uri="#c"/>
   <tag>out.x=rules.b.x+rules.c.x;</tag>
</rule>

If the purpose is to add or concatenate over each occurrence of rules.b, out.x must first be initialized (to 0 for addition, or to "" for concatenation), and the rule should be written as:

<rule id="a">
   <tag>out.x=0;</tag>
   <item repeat="1-">
     <ruleref uri="#b"/><tag>out.x=out.x+rules.b.x;</tag>
   </item>
   <ruleref uri="#c"/>
   <tag>out.x=out.x+rules.c.x;</tag>
</rule>

The rules.b evaluates to the last occurrence of rules.b in the repeat="0-" expansion, if any, otherwise it is undefined:

<rule id="a">
   <item repeat="0-"><ruleref uri="#b"/><ruleref uri="#d"/></item>
   <ruleref uri="#c"/>
   <tag>out.x=rules.b+rules.c.x;</tag>
</rule>

Either rules.b.x or rules.c.x will cause a run-time error depending on the input utterance:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <tag>out.x=rules.b.x+rules.c.x;</tag>
</rule>

This could be better written as:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item>
   </one-of>
</rule>

The rules.b.x refers to whichever rules.b actually matched:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/> a</item>
      <item>a <ruleref uri="#b"/></item>
   </one-of>
   <ruleref uri="#c"/>
   <tag>out.x=rules.b.x+rules.c.x;</tag>
</rule>

Here, depending on the input utterance, one of the operands of each addition causes a run-time error:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <one-of>
      <item><ruleref uri="#d"/></item>
      <item><ruleref uri="#e"/></item>
   </one-of>
   <tag>out.x=(rules.b.x+rules.c.x) * (rules.d.x+rules.e.x);</tag>
</rule>

This rule can be better written as:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item>
   </one-of>
   <one-of>
      <item><ruleref uri="#d"/><tag>out.x=out.x*rules.d.x;</tag></item>
      <item><ruleref uri="#e"/><tag>out.x=out.x*rules.e.x;</tag></item>
   </one-of>
</rule>

Evaluation of rules.b.x always causes a run-time error: the expression is evaluated only when rule c matches, and in that case rule b has not matched, so rules.b is undefined. (When rule b matches, the flat parse contains no SI Tag, and the default assignment sets out to rules.b.)

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/><tag>out.x=rules.b.x+rules.c.x;</tag></item>
   </one-of>
</rule>

A more useful rule could be:

<rule id="a">
   <one-of>
      <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item>
      <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item>
   </one-of>
</rule>

The expression is only evaluated if rule c matches; in that case both rules.b and rules.c are defined:

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-1">
      <ruleref uri="#c"/>
      <tag>out.x=rules.b.x+rules.c.x;</tag>
   </item>
</rule>

The expression is evaluated for every occurrence of rule c. Note that, because every evaluation overwrites the previous result, out.x ends up as rules.b.x plus the value of rules.c.x for the last occurrence of rule c.

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="1-">
      <ruleref uri="#c"/>
      <tag>out.x = rules.b.x + rules.c.x;</tag>
   </item>
</rule>

This has the same effect as the previous example, except that the expression is not evaluated at all if rule c does not match even once.

<rule id="a">
   <ruleref uri="#b"/>
   <item repeat="0-">
      <ruleref uri="#c"/>
      <tag>out.x = rules.b.x + rules.c.x;</tag>
   </item>
</rule>

These rules concatenate the matched digits into a string. Note that the ds property is first initialized to "" because otherwise, in the first evaluation of the expression, out.ds would be undefined; ECMAScript would convert it to the string "undefined", producing a wrong result:

<rule id="digits">
   <tag>out.ds="";</tag>
   <item repeat="1-">
      <ruleref uri="#digit"/>
      <tag>out.ds = out.ds + rules.digit;</tag>
   </item>
</rule>
<rule id="digit">
   <one-of>
      <item>"0"</item>
      <item>"1"</item>
      <item>"2"</item>
      <item>"3"</item>
      <item>"4"</item>
      <item>"5"</item>
      <item>"6"</item>
      <item>"7"</item>
      <item>"8"</item>
      <item>"9"</item>
   </one-of>
</rule>
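The effect of the initialization tag can be seen by running the two tags of rule digits directly. In this hypothetical sketch, the digits function receives the matched digit strings (the default-assigned values of rules.digit), and the initialize flag models whether the tag out.ds=""; is present.

```javascript
// Evaluate the tags of rule "digits" for a list of matched digits.
function digits(matched, initialize) {
  const out = {};
  if (initialize) out.ds = "";       // <tag>out.ds="";</tag>
  for (const d of matched) {
    const rules = { digit: d };      // rules.digit after default assignment
    out.ds = out.ds + rules.digit;   // <tag>out.ds = out.ds + rules.digit;</tag>
  }
  return out.ds;
}

console.log(digits(["4", "0", "2"], true));  // 402
console.log(digits(["4", "0", "2"], false)); // undefined402
```

In plain ECMAScript, the uninitialized case shows up as a wrong string rather than an exception, because undefined + "4" evaluates to "undefined4".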

The rules.latest() resolves to rules.c:

<rule id="a">
   <ruleref uri="#b"/>
   <ruleref uri="#c"/>
   <tag>out=rules.latest();</tag>
</rule>

The rules.latest() resolves to rules.b:

<rule id="a">
   <ruleref uri="#c"/>
   <ruleref uri="#b"/>
   <tag>out=rules.latest();</tag>
</rule>

The rules.latest() returns undefined:

<rule id="a">
   b c
   <tag>out=rules.latest();</tag>
</rule>

If rule b matches, rules.latest() resolves to rules.b. If rule c matches, rules.latest() resolves to rules.c:

<rule id="x">
   <ruleref uri="#a"/>
   <one-of>
      <item><ruleref uri="#b"/></item>
      <item><ruleref uri="#c"/></item>
   </one-of>
   <tag>out=rules.latest();</tag>
</rule>

This is equivalent to:

<rule id="x">
   <ruleref uri="#a"/>
   <one-of>
      <item><ruleref uri="#b"/><tag>out=rules.latest();</tag></item>
      <item><ruleref uri="#c"/><tag>out=rules.latest();</tag></item>
   </one-of>
</rule>

The rules.latest() resolves to rules.b if rule b matches; otherwise, it resolves to rules.a:

<rule id="x">
   <ruleref uri="#a"/>
   <item repeat="0-1"><ruleref uri="#b"/></item>
   <tag>out=rules.latest();</tag>
</rule>

The effect is equivalent to:

<rule id="x">
   <ruleref uri="#a"/><tag>out=rules.latest();</tag>
   <item repeat="0-1"><ruleref uri="#b"/><tag>out=rules.latest();</tag></item>
</rule>

The rules.latest() resolves to the last occurrence of rules.a:

<rule id="x">
   <item repeat="1-"><ruleref uri="#a"/></item>
   <tag>out=rules.latest();</tag>
</rule>

The effect is equivalent to:

<rule id="x">
   <item repeat="1-"><ruleref uri="#a"/><tag>out=rules.latest();</tag></item>
</rule>
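One way to picture rules.latest() is a rules object that remembers the most recently completed rule reference. This sketch reflects an assumed implementation strategy, not part of this specification: makeRules and record are hypothetical names, and latest is added as a non-enumerable property so that it does not appear alongside the rule names when rules is enumerated.

```javascript
// Hypothetical rules object whose latest() returns the last completed rule reference.
function makeRules() {
  let last; // undefined until a rule reference has been matched
  const rules = {};
  Object.defineProperty(rules, "latest", { value: () => last }); // non-enumerable
  // Called when a rule reference finishes; overwrites any earlier value of the same rule.
  const record = (name, value) => {
    rules[name] = value;
    last = value;
  };
  return { rules, record };
}

// <rule id="x">: a, then an optional b, followed by out=rules.latest();
const { rules, record } = makeRules();
console.log(rules.latest());     // undefined (nothing matched yet)
record("a", "A");
console.log(rules.latest());     // A  (optional b not matched)
record("b", "B");
console.log(rules.latest());     // B
console.log(Object.keys(rules)); // [ 'a', 'b' ]
```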

7 Using Semantic Interpretation to Generate XML Results

Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting [EMMA]).

If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI Tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.

The serialization into XML has been designed as a convenient mechanism to generate XML fragments directly from SI grammars. It is not a generic conversion mechanism from [ECMA-327] objects into XML fragments, for at least the following reasons:

7.1 Serialization of an ECMAScript Result into an XML Fragment

The serialization of the ECMAScript result into an XML fragment is governed by the following transformation rules:

  1. If the ECMAScript top-level Rule Variable is not an Object but a simple scalar type (String, Number, Boolean, Null or Undefined) then the resulting XML fragment only consists of character data without any mark-up. The character data will be the value of the top-level Rule Variable as if the ToString() operation had been performed on an argument of this type (e.g., for Boolean, the result would be true or false).
  2. Each property (see note below) in the ECMAScript top-level Rule Variable becomes an XML element. The name of the element will be the same as the name of the property.
  3. If the value of the property is a simple scalar type (String, Number, Boolean, Null or Undefined) then the character data content of the XML element will be the value of this property as if the ToString() operation had been performed on an argument of this type.
  4. If the property is of type Object, then each child property of this object becomes a child element, and the contents of these child elements are in turn processed.
  5. Indexed elements of an Array object (e.g. a[0], a[1], etc.) become XML child elements named <item>. Each <item> element has an attribute named index, which is the index of the corresponding element in the array. In addition, the XML element containing the <item> elements includes an attribute named length, whose value is given by the length property of the ECMAScript Array object. Any other properties of an Array object, for instance the keys of an associative array (e.g. a["prop"]), are subject to the same transformation rules as the regular properties of an object. In a sparse array, only those elements which hold defined values will be serialized.
  6. Properties with the name _attributes, _value, _nsdecl and _nsprefix will be treated according to the rules described in the sections below.

Notes:

Examples:

Following the above principles, consider the top-level Rule Variable, with the properties drink and pizza, that is produced by the example grammar in section 8:

{
   drink: {
      liquid:"coke",
      drinksize:"medium"},
   pizza: {
      number: "3",
      pizzasize: "large",
      topping: [ "pepperoni", "mushrooms" ]
   }
}

SI processing in an XML environment would generate the following XML fragment:

<drink>
   <liquid>coke</liquid>
   <drinksize>medium</drinksize>
</drink>
<pizza>
   <number>3</number>
   <pizzasize>large</pizzasize>
   <topping length="2">
      <item index="0">pepperoni</item>
      <item index="1">mushrooms</item>
   </topping>
</pizza>

The following example ECMAScript object would cause an error because $size$, while a valid property name in ECMAScript, is not a valid XML element name:

{
   drink: {
      liquid:"coke",
      $size$:"medium"}
}
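Rules 1 to 5 can be sketched as a short recursive function. This is a hedged illustration, not a complete serializer: the NAME pattern only approximates the XML Name production (ASCII subset), only &, < and > are escaped, rule 6 (_attributes, _value, _nsdecl and _nsprefix) is ignored, and the function names are hypothetical. No whitespace is added between elements.

```javascript
// Approximation of a legal XML element name (ASCII subset only).
const NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

const escapeText = (s) =>
  String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

const isScalar = (v) => v === null || typeof v !== "object";

// Rules 2 to 5: one element per property; arrays get a length attribute.
function element(name, value, attrs = "") {
  if (!NAME.test(name)) throw new Error("not a valid XML element name: " + name);
  if (Array.isArray(value)) attrs += ' length="' + value.length + '"';
  return "<" + name + attrs + ">" + content(value) + "</" + name + ">";
}

function content(value) {
  if (isScalar(value)) return escapeText(value); // String(value) acts as ToString()
  let xml = "";
  if (Array.isArray(value)) {
    // Rule 5: indexed elements become <item index="i">; holes are skipped.
    value.forEach((v, i) => {
      if (v !== undefined) xml += element("item", v, ' index="' + i + '"');
    });
  }
  for (const key of Object.keys(value)) {
    if (Array.isArray(value) && /^\d+$/.test(key)) continue; // already emitted as <item>
    xml += element(key, value[key]);
  }
  return xml;
}

// Rule 1: a scalar result is plain character data; an object yields elements.
const serialize = (result) => content(result);

console.log(serialize({
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] },
}));
```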

7.2 Use of _attributes and _value

Variables named _attributes and _value can be created and used by the author to enable the generation of richer XML results, including the following structures:

The _attributes object is used to hold property name/value pairs which will be rendered as XML attributes of the object which contains _attributes.

The _value variable is used to hold a scalar value for character data contained in an element or to hold the value of an attribute.

Semantic Interpretation processors treat these objects in the following way:

  1. Properties specified in the _attributes object are rendered as XML attributes of the containing object.
  2. The value of _value is treated as character data content of the containing object or the value of an attribute if the containing object is a child of _attributes.

If the value of _value is not a scalar type, the ToString() operation is performed to generate a string value.

It is an error to transform into XML an ECMAScript object that contains properties whose names are not allowed in XML. This can occur, for example, when a property of _attributes has a name that is not a legal XML attribute name.
Examples:

The following ECMAScript object:

{
   martini: {
      gin: {
         _value: "Bombay Sapphire",
         _attributes: {
            ratio: 8
         }
      },
      vermouth: {
         _value: "Noilly Prat",
         _attributes: {
            ratio: 1
         }
      },
      _attributes: {
         method: "shaken"
      }
   }
}

would generate the following XML result:

...
<martini method="shaken">
   <gin ratio="8">Bombay Sapphire</gin>
   <vermouth ratio="1">Noilly Prat</vermouth>
</martini>
...
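A sketch of how a processor might apply these two rules follows. It is not a normative algorithm: it uses a simplified ASCII-only name check, handles only objects and scalars (arrays and namespaces are omitted), and uses hypothetical function names. Attribute values may be scalars or objects carrying a _value, and any other _value is converted with ToString() via String().

```javascript
const NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/; // approximate XML Name check
const esc = (v) => String(v).replace(/&/g, "&amp;").replace(/</g, "&lt;")
  .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
const isObj = (v) => v !== null && typeof v === "object";

function checkName(name) {
  if (!NAME.test(name)) throw new Error("not a valid XML name: " + name);
  return name;
}

function element(name, value) {
  let attrs = "";
  let body = "";
  if (!isObj(value)) {
    body = esc(value);
  } else {
    // Properties of _attributes become attributes; an object value uses its _value.
    const a = value._attributes || {};
    for (const k of Object.keys(a)) {
      const v = isObj(a[k]) ? a[k]._value : a[k];
      attrs += " " + checkName(k) + '="' + esc(v) + '"';
    }
    // _value becomes the character data of this element.
    if ("_value" in value) body += esc(value._value);
    for (const k of Object.keys(value)) {
      if (k !== "_attributes" && k !== "_value") body += element(k, value[k]);
    }
  }
  return "<" + checkName(name) + attrs + ">" + body + "</" + name + ">";
}

const serialize = (result) =>
  isObj(result) ? Object.keys(result).map((k) => element(k, result[k])).join("") : esc(result);

console.log(serialize({
  martini: {
    gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
    vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
    _attributes: { method: "shaken" },
  },
}));
```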

7.3 Namespaces

The object named _nsdecl is used to declare a namespace [XML Names] in an element. The property named _nsprefix enables the SI author to associate an XML element or attribute with a particular namespace.

When an object contains the _nsdecl property, the namespace declaration is attached to the resultant XML serialized element for this object. The _prefix property of _nsdecl indicates the namespace prefix and the _name property of _nsdecl indicates the corresponding namespace name (usually a URI reference). If the _prefix property is an empty string, the default namespace is declared. If both _prefix and _name are empty strings, the namespace declaration xmlns="" applies.

When an Array object contains the _nsprefix property, the prefix also applies to the automatically generated <item> elements and length and index attributes.

Note that this transformation produces an XML fragment; see [XML Names] for the rules on valid namespace usage in XML.

Informative Note:
The _nsprefix can be used for example to generate XML attributes such as emma:hook or emma:tokens when generating XML fragments to be embedded in EMMA documents. See Appendix C of the [EMMA] specification for more information and examples. The namespace declaration with _nsdecl may not be needed when provided by the XML document in which the fragment will be embedded.
Examples:

The following ECMAScript object:

{
   drink: {
      _nsdecl: {
         _prefix:"n1",
         _name:"http://www.example.com/n1"
      },
      _nsprefix:"n1",
      liquid: {
         _nsdecl: {
             _prefix:"n2",
             _name:"http://www.example.com/n2"
         },
         _attributes: {
             color: {
                _nsprefix:"n2",
                _value:"black"
             }
         },
         _value:"coke"
      },
      size:"medium"
   }
}

would generate the following XML result:

<n1:drink xmlns:n1="http://www.example.com/n1">
   <liquid n2:color="black" xmlns:n2="http://www.example.com/n2">coke</liquid>
   <size>medium</size>
</n1:drink>

Note that the _nsprefix property applies only to the object that contains it, not to that object's children; hence neither the <liquid> element nor the <size> element is associated with a namespace in this fragment.
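The following ECMAScript sketch applies the rules of this section to the example object and reproduces the XML result above, without the indentation whitespace. It is a minimal sketch under stated assumptions: only plain objects and string values are handled (no Arrays or numbers), attributes are given as objects with a _value as in the example, properties whose names begin with "_" are treated as special SI properties rather than child elements, and all function names are hypothetical.

```javascript
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// An object's own _nsprefix (if any) qualifies its element or attribute name.
function qualify(name, value) {
  const isObject = typeof value === "object" && value !== null;
  return isObject && value._nsprefix ? value._nsprefix + ":" + name : name;
}

function nsDeclaration(nsdecl) {
  return nsdecl._prefix === ""
    ? ' xmlns="' + escapeXml(nsdecl._name) + '"'
    : " xmlns:" + nsdecl._prefix + '="' + escapeXml(nsdecl._name) + '"';
}

function serialize(name, value) {
  if (typeof value !== "object" || value === null) {
    return "<" + name + ">" + escapeXml(value) + "</" + name + ">";
  }
  const tag = qualify(name, value);
  let attrs = "";
  for (const [attrName, attr] of Object.entries(value._attributes || {})) {
    attrs += " " + qualify(attrName, attr) + '="' + escapeXml(attr._value) + '"';
  }
  // The namespace declaration is attached to this element only.
  if (value._nsdecl) attrs += nsDeclaration(value._nsdecl);
  let content = value._value !== undefined ? escapeXml(value._value) : "";
  for (const [childName, child] of Object.entries(value)) {
    // Children inherit no _nsprefix: each child is qualified by its own properties.
    if (!childName.startsWith("_")) content += serialize(childName, child);
  }
  return "<" + tag + attrs + ">" + content + "</" + tag + ">";
}

const example = {
  drink: {
    _nsdecl: { _prefix: "n1", _name: "http://www.example.com/n1" },
    _nsprefix: "n1",
    liquid: {
      _nsdecl: { _prefix: "n2", _name: "http://www.example.com/n2" },
      _attributes: { color: { _nsprefix: "n2", _value: "black" } },
      _value: "coke"
    },
    size: "medium"
  }
};

const xml = Object.entries(example).map(([k, v]) => serialize(k, v)).join("");
console.log(xml);
```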

8 Example Grammars with Semantic Interpretation Tags

8.1 Example 1

With the grammar illustrated below, the following utterance

"I would like a coca cola and three large pizzas with pepperoni and mushrooms."

would create the following Rule Variable for the rule order:

{
  drink: {
    liquid:"coke",
    drinksize:"medium"},
  pizza: {
    number: "3",
    pizzasize: "large",
    topping: [ "pepperoni", "mushrooms" ]
  }
}
XML Form
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         version="1.0" mode="voice" tag-format="semantics/1.0" root="order">
   <rule id="order">
      I would like a
      <ruleref uri="#drink"/>
      <tag>out.drink = new Object(); out.drink.liquid=rules.drink.type;
           out.drink.drinksize=rules.drink.drinksize;</tag>
      and
      <ruleref uri="#pizza"/>
      <tag>out.pizza=rules.pizza;</tag>
   </rule>
   <rule id="kindofdrink">
      <one-of>
         <item>coke</item>
         <item>pepsi</item>
         <item>coca cola<tag>out="coke";</tag></item>
      </one-of>
   </rule>
   <rule id="foodsize">
      <tag>out="medium";</tag> <!-- "medium" is default if nothing said -->
      <item repeat="0-1">
         <one-of>
            <item>small<tag>out="small";</tag></item>
            <item>medium</item>
            <item>large<tag>out="large";</tag></item>
            <item>regular<tag>out="medium";</tag></item>
         </one-of>
      </item>
   </rule>
   <!-- Construct Array of toppings, return Array -->
   <rule id="tops">
      <tag>out=new Array;</tag>
      <ruleref uri="#top"/>
      <tag>out.push(rules.top);</tag>
      <item repeat="1-">
         and
         <ruleref uri="#top"/>
         <tag>out.push(rules.top);</tag>
      </item>
   </rule>
   <rule id="top">
      <one-of>
         <item>anchovies</item>
         <item>pepperoni</item>
         <item>mushroom<tag>out="mushrooms";</tag></item>
         <item>mushrooms</item>
      </one-of>
   </rule>
   <!-- Two properties (drinksize, type) on left hand side Rule Variable -->
   <rule id="drink">
      <ruleref uri="#foodsize"/>
      <ruleref uri="#kindofdrink"/>
      <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag>
   </rule>
   <!-- Three properties on rules.pizza -->
   <rule id="pizza">
      <ruleref uri="#number"/>
      <ruleref uri="#foodsize"/>
      <tag>out.pizzasize=rules.foodsize; out.number=rules.number;</tag>
      pizzas with
      <ruleref uri="#tops"/>
      <tag>out.topping=rules.tops;</tag>
   </rule>
   <rule id="number">
      <one-of>
         <item>
            <tag>out="1";</tag>
            <one-of>
               <item>a</item>
               <item>one</item>
            </one-of>
         </item>
         <item>two<tag>out="2";</tag></item>
         <item>three<tag>out="3";</tag></item>
      </one-of>
   </rule>
</grammar>
ABNF Form
#ABNF 1.0 UTF-8;
language en;
mode voice;
tag-format <semantics/1.0>;
root $order;
$order = I would like a $drink {out.drink = new Object();
         out.drink.liquid = rules.drink.type;
         out.drink.drinksize = rules.drink.drinksize;}
         and $pizza {out.pizza=rules.pizza;};
$kindofdrink = coke | pepsi | "coca cola"{out="coke";};

// "medium" is default if nothing said
$foodsize = {out="medium";}
            [small {out="small";} | medium |
            large {out="large";}| regular {out="medium";}];

// Construct Array of toppings, return Array
$tops = {out=new Array;} $top {out.push(rules.top);}
        (and $top {out.push(rules.top);})<1->;
$top = anchovies | pepperoni | mushroom{out="mushrooms";} | mushrooms;

// Two properties (drinksize, type) on left hand side Rule Variable
$drink = $foodsize $kindofdrink
         {out.drinksize=rules.foodsize; out.type=rules.kindofdrink; };

// Three properties on rules.pizza's Rule Variable
$pizza = $number $foodsize
         {out.pizzasize=rules.foodsize; out.number=rules.number;} pizzas
         with $tops {out.topping=rules.tops;};
$number = (a | one){out="1";} | two{out="2";} | three{out="3";};
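To make the flow of values through out and rules concrete, the following ECMAScript sketch hand-traces the tags of the ABNF form for the single parse of the sample utterance. Each function mirrors the tag code of one rule, and its (hypothetical) parameters stand for the tokens matched by that rule. It assumes, as the expected result implies for "pepperoni", that a rule whose tags never assign out yields the text it matched. This is an illustration of the data flow only, not an SI processor, which would derive these steps from the parse itself.

```javascript
// Hand-trace of the Example 1 tags (ABNF form) for one parse of
// "I would like a coca cola and three large pizzas with pepperoni and mushrooms".

function kindofdrink(text) {            // coke | pepsi | "coca cola"{out="coke";}
  return text === "coca cola" ? "coke" : text;
}

function foodsize(text) {               // text is undefined when no size was spoken
  let out = "medium";                   // {out="medium";}
  if (text === "small") out = "small";
  else if (text === "large") out = "large";
  else if (text === "regular") out = "medium";
  return out;
}

function top(text) {                    // mushroom{out="mushrooms";}; others yield their text
  return text === "mushroom" ? "mushrooms" : text;
}

function tops(texts) {                  // {out=new Array;} then out.push(rules.top) per topping
  const out = new Array();
  for (const text of texts) out.push(top(text));
  return out;
}

function number(text) {                 // string values, as in the ABNF form
  if (text === "a" || text === "one") return "1";
  if (text === "two") return "2";
  if (text === "three") return "3";
  return undefined;
}

function drink(sizeText, drinkText) {
  const rules = { foodsize: foodsize(sizeText), kindofdrink: kindofdrink(drinkText) };
  const out = {};
  out.drinksize = rules.foodsize; out.type = rules.kindofdrink;
  return out;
}

function pizza(numberText, sizeText, toppingTexts) {
  const rules = { number: number(numberText), foodsize: foodsize(sizeText) };
  const out = {};
  out.pizzasize = rules.foodsize; out.number = rules.number;
  rules.tops = tops(toppingTexts);      // $tops is matched after "pizzas with"
  out.topping = rules.tops;
  return out;
}

function order() {
  const rules = {};
  const out = {};
  rules.drink = drink(undefined, "coca cola");
  out.drink = new Object();
  out.drink.liquid = rules.drink.type;
  out.drink.drinksize = rules.drink.drinksize;
  rules.pizza = pizza("three", "large", ["pepperoni", "mushrooms"]);
  out.pizza = rules.pizza;
  return out;
}

const result = order();
console.log(JSON.stringify(result));
```

Apart from property order, the printed object matches the Rule Variable shown at the start of this example.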

8.2 Example 2

The following grammar demonstrates the use of Semantic Interpretation for computation within a grammar.

This simple number grammar accepts as input whole numbers between 0 and 99,999 inclusive. It demonstrates how a rule may be referenced multiple times and the returned SI information processed differently each time. The grammar also shows how the Rule Variable may be given a default value (0 in this case) and then used as an intermediate variable during computation, essentially accumulating a running total in the Rule Variable. In this example, the Rule Variable type is changed from an Object to a Number, but an alternative strategy could just as easily store the number as a property of the Rule Variable object.

XML Form
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         version="1.0" mode="voice" tag-format="semantics/1.0" root="main">
    <rule id="main">
        <one-of>
            <item>
              <ruleref uri="#sub_hundred_thousand"/>
              <tag>out = rules.sub_hundred_thousand;</tag>
            </item>
            <item>
              <ruleref uri="#sub_thousand"/>
              <tag>out = rules.sub_thousand;</tag>
            </item>
            <item>
              <ruleref uri="#sub_hundred"/>
              <tag>out = rules.sub_hundred;</tag>
            </item>
        </one-of>
    </rule>
    <rule id="sub_hundred_thousand">
        <ruleref uri="#sub_hundred"/>
        <tag>out = (1000 * rules.sub_hundred);</tag>
        thousand
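        <item repeat="0-1">
            <item repeat="0-1">and</item>
            <one-of>
                <item><ruleref uri="#sub_thousand"/><tag>out += rules.sub_thousand;</tag></item>
                <item><ruleref uri="#sub_hundred"/><tag>out += rules.sub_hundred;</tag></item>
            </one-of>
        </item>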
    </rule>
    <rule id="sub_thousand">
        <ruleref uri="#sub_hundred"/>
        <tag>out = (100 * rules.sub_hundred);</tag>
        hundred
        <item repeat="0-1">
            <item repeat="0-1">and</item>
            <ruleref uri="#sub_hundred"/><tag>out += rules.sub_hundred;</tag>
        </item>
    </rule>
    <rule id="sub_hundred">
        <tag>out = 0;</tag>
        <one-of>
            <item>zero</item>
            <item><ruleref uri="#teens"/><tag>out += rules.teens;</tag></item>
            <item>
                <ruleref uri="#tens"/><tag>out += rules.tens;</tag>
                <item repeat="0-1">
                  <ruleref uri="#digit"/>
                  <tag>out += rules.digit;</tag>
                </item>
            </item>
            <item><ruleref uri="#digit"/><tag>out += rules.digit;</tag></item>
        </one-of>
    </rule>
    <rule id="tens">
        <one-of>
            <item>twenty<tag>out = 20;</tag></item>
            <item>thirty<tag>out = 30;</tag></item>
            <item>forty<tag>out = 40;</tag></item>
            <item>fifty<tag>out = 50;</tag></item>
            <item>sixty<tag>out = 60;</tag></item>
            <item>seventy<tag>out = 70;</tag></item>
            <item>eighty<tag>out = 80;</tag></item>
            <item>ninety<tag>out = 90;</tag></item>
        </one-of>
    </rule>
    <rule id="teens">
        <one-of>
            <item>ten<tag>out = 10;</tag></item>
            <item>eleven<tag>out = 11;</tag></item>
            <item>twelve<tag>out = 12;</tag></item>
            <item>thirteen<tag>out = 13;</tag></item>
            <item>fourteen<tag>out = 14;</tag></item>
            <item>fifteen<tag>out = 15;</tag></item>
            <item>sixteen<tag>out = 16;</tag></item>
            <item>seventeen<tag>out = 17;</tag></item>
            <item>eighteen<tag>out = 18;</tag></item>
            <item>nineteen<tag>out = 19;</tag></item>
        </one-of>
    </rule>
    <rule id="digit">
        <one-of>
            <item>one<tag>out = 1;</tag></item>
            <item>two<tag>out = 2;</tag></item>
            <item>three<tag>out = 3;</tag></item>
            <item>four<tag>out = 4;</tag></item>
            <item>five<tag>out = 5;</tag></item>
            <item>six<tag>out = 6;</tag></item>
            <item>seven<tag>out = 7;</tag></item>
            <item>eight<tag>out = 8;</tag></item>
            <item>nine<tag>out = 9;</tag></item>
        </one-of>
    </rule>
</grammar>
ABNF Form
#ABNF 1.0 UTF-8;
language en;
mode voice;
tag-format <semantics/1.0>;
root $main;
$main = $sub_hundred_thousand { out = rules.sub_hundred_thousand; } |
        $sub_thousand { out = rules.sub_thousand; } |
        $sub_hundred { out = rules.sub_hundred; };
$sub_hundred_thousand = $sub_hundred { out = (1000 * rules.sub_hundred); }
                        thousand
                        [ [and] ($sub_thousand { out += rules.sub_thousand; } |
                                 $sub_hundred { out += rules.sub_hundred; }) ];
$sub_thousand = $sub_hundred { out = (100 * rules.sub_hundred); } hundred
                [ [and] $sub_hundred { out += rules.sub_hundred; } ];
$sub_hundred = { out = 0; } (zero | $teens { out += rules.teens; } |
               $tens { out += rules.tens; }
               [ $digit { out += rules.digit; } ] |
               $digit { out += rules.digit; });
$tens = twenty { out = 20; } | thirty { out = 30; } | forty { out = 40; } |
        fifty { out = 50; } | sixty { out = 60; } | seventy { out = 70; } |
        eighty { out = 80; } | ninety { out = 90; };
$teens = ten { out = 10; } | eleven { out = 11; } | twelve { out = 12; } |
         thirteen { out = 13; } | fourteen { out = 14; } |
         fifteen { out = 15; } | sixteen { out = 16; } |
         seventeen { out = 17; } | eighteen { out = 18; } |
         nineteen { out = 19; };
$digit = one { out = 1; } | two { out = 2; } | three { out = 3; } |
         four { out = 4; } | five { out = 5; } | six { out = 6; } |
         seven { out = 7; } | eight { out = 8; } | nine { out = 9; };
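The arithmetic performed by these tags can be checked with a small ECMAScript sketch: a hand-written recursive-descent recognizer over a list of words whose functions (hypothetical names) follow the rules above and perform the same out = ... and out += ... steps. It is not how an SRGS grammar processor works; it only shows the values the tags compute. After "thousand", the sketch accepts either a $sub_thousand group or a bare $sub_hundred group, so "twenty thousand five" yields 20005.

```javascript
// Word tables for $digit, $teens and $tens.
const DIGIT = new Map([["one", 1], ["two", 2], ["three", 3], ["four", 4], ["five", 5],
                       ["six", 6], ["seven", 7], ["eight", 8], ["nine", 9]]);
const TEENS = new Map([["ten", 10], ["eleven", 11], ["twelve", 12], ["thirteen", 13],
                       ["fourteen", 14], ["fifteen", 15], ["sixteen", 16],
                       ["seventeen", 17], ["eighteen", 18], ["nineteen", 19]]);
const TENS = new Map([["twenty", 20], ["thirty", 30], ["forty", 40], ["fifty", 50],
                      ["sixty", 60], ["seventy", 70], ["eighty", 80], ["ninety", 90]]);

// Each function takes the word list and a start index and returns
// [value, nextIndex], or null if the rule does not match there.

function subHundred(w, i) {
  let out = 0;                                     // { out = 0; }
  if (w[i] === "zero") return [out, i + 1];
  if (TEENS.has(w[i])) return [out + TEENS.get(w[i]), i + 1];
  if (TENS.has(w[i])) {
    out += TENS.get(w[i]);                         // { out += rules.tens; }
    if (DIGIT.has(w[i + 1])) return [out + DIGIT.get(w[i + 1]), i + 2];
    return [out, i + 1];
  }
  if (DIGIT.has(w[i])) return [out + DIGIT.get(w[i]), i + 1];
  return null;
}

function subThousand(w, i) {
  const head = subHundred(w, i);
  if (head === null || w[head[1]] !== "hundred") return null;
  let out = 100 * head[0];                         // { out = (100 * rules.sub_hundred); }
  let next = head[1] + 1;
  const start = w[next] === "and" ? next + 1 : next; // optional "and"
  const tail = subHundred(w, start);
  if (tail !== null) { out += tail[0]; next = tail[1]; } // { out += rules.sub_hundred; }
  return [out, next];
}

function subHundredThousand(w, i) {
  const head = subHundred(w, i);
  if (head === null || w[head[1]] !== "thousand") return null;
  let out = 1000 * head[0];                        // { out = (1000 * rules.sub_hundred); }
  let next = head[1] + 1;
  const start = w[next] === "and" ? next + 1 : next;
  const tail = subThousand(w, start) || subHundred(w, start);
  if (tail !== null) { out += tail[0]; next = tail[1]; }
  return [out, next];
}

// $main: try each alternative and keep the one that consumes every word.
function parseNumber(text) {
  const words = text.trim().split(/\s+/);
  for (const rule of [subHundredThousand, subThousand, subHundred]) {
    const match = rule(words, 0);
    if (match !== null && match[1] === words.length) return match[0];
  }
  return null;
}

console.log(parseNumber("twelve thousand three hundred and forty five")); // 12345
```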

A Conformance

This section is normative.

A.1 Conforming Semantic Interpretation Tags

A Semantic Interpretation Tag (SI Tag) is a Conforming SI Tag if its content matches the syntax defined in the normative sections of this document.

There is no normative restriction on the size of an SI Tag.

A.2 Conforming Semantic Interpretation Grammars

A Conforming Semantic Interpretation Grammar is a stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment where:

  1. The document or fragment is a conforming ABNF or XML document or XML fragment as defined by the conformance requirements in [SRGS].
  2. The tag-format [SRGS] for the grammar fragment or document is semantics/1.0 or semantics/1.0-literals.
  3. Every tag in the grammar document or fragment is a Conforming SI Tag.

A grammar that contains tags in a format other than the one specified by this document or its successors must have a tag format declaration whose value does not begin with the string semantics/x.y, where x and y are digits (see section 4.8, Tag Format Declaration, of [SRGS]).
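Taken together, the rules of this section partition tag-format values into three groups. The ECMAScript sketch below (hypothetical function name) classifies a tag-format string accordingly. Reading "digits" as one or more digits is an assumption of the sketch.

```javascript
// Classify an SRGS tag-format value with respect to this specification:
//   "si-1.0"   - a value allowed for a Conforming SI Grammar (A.2, item 2)
//   "reserved" - begins with semantics/x.y but is not one of those values;
//                such values are left to this specification and its successors
//   "other"    - any other tag format
function classifyTagFormat(value) {
  if (value === "semantics/1.0" || value === "semantics/1.0-literals") return "si-1.0";
  // Assumption: x and y may each be one or more digits.
  if (/^semantics\/\d+\.\d+/.test(value)) return "reserved";
  return "other";
}

console.log(classifyTagFormat("semantics/1.0-literals")); // si-1.0
```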

A.3 Conforming Semantic Interpretation Processors

A Semantic Interpretation Processor is a program that can parse and process Conforming SI Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor).

A Conforming Semantic Interpretation Processor:

  1. Must be capable of accepting and executing Conforming SI Tags.
  2. Should inform the hosting environment at the time it evaluates a Conforming SI Tag that causes a runtime error.
  3. Must inform the hosting environment when it encounters a non-conforming Semantic Interpretation Tag. A processor is free to inform the hosting environment of such a non-conforming tag any time between loading the non-conforming SI Tag and evaluating the offending language construct in the non-conforming SI Tag. There is no requirement for a processor to continue processing after encountering a non-conforming tag.
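One way a processor might meet items 2 and 3 is sketched below in ECMAScript: each tag is compiled when it is loaded and evaluated later with out and rules in scope. The onError callback stands in for the hosting environment and, like the function names, is hypothetical. As simplifications, the sketch uses the engine's own syntax check in place of a check against the syntax defined in this document, and it compiles each tag separately.

```javascript
// Load time: compile the tag body. A syntax error is reported to the host as a
// non-conforming tag (permitted any time between loading and evaluation).
function loadTag(source, onError) {
  try {
    return new Function("out", "rules", source + "\nreturn out;");
  } catch (e) {
    onError({ kind: "non-conforming-tag", source: source, message: e.message });
    return null;
  }
}

// Evaluation time: run the compiled tag and report runtime errors as they occur.
function evaluateTag(compiled, out, rules, onError) {
  try {
    return compiled(out, rules);
  } catch (e) {
    onError({ kind: "runtime-error", message: e.message });
    return out; // Assumption: return out unchanged and let the host decide what to do.
  }
}

// Usage with a host that simply records what it is told.
const reports = [];
const host = (report) => reports.push(report);

const doubled = loadTag("out = rules.number * 2;", host);
const value = evaluateTag(doubled, {}, { number: 21 }, host); // 42
const bad = loadTag("out = ;", host);                         // reported at load time
const failing = loadTag("out = rules.missing.field;", host);
evaluateTag(failing, {}, {}, host);                           // TypeError reported at evaluation
console.log(reports.map((r) => r.kind));
```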

A.4 Conforming Semantic Interpretation Grammar Processors

A Semantic Interpretation Grammar Processor is a system that can parse and process Conforming Semantic Interpretation Grammars. Specifically, a Semantic Interpretation Grammar Processor is a conforming processor if:

  1. It is a conforming ABNF or XML Grammar Processor as defined in the Speech Recognition Grammar Specification [SRGS].
  2. It is a conforming Semantic Interpretation Processor.

A.5 Conformance Statements

A.5.1 Conformance Statement for Conforming Documents

Anyone wishing to state that a Grammar Fragment or Grammar Document with SI Tags (a document) conforms to this specification should use the following wording:

This document conforms to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/.

A.5.2 Conformance Statement for Conforming Processors

Anyone wishing to state conformance of a processor to this specification should use the following wording:

[PROCESSOR] is a Conforming [ (1) ABNF, (2) XML, (3) ABNF and XML ] Semantic Interpretation Grammar Processor according to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/ [with support for XML Transformation].

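Make the appropriate substitutions:

  1. Replace [PROCESSOR] with the name of the processor.
  2. Select exactly one of the options (1) ABNF, (2) XML, or (3) ABNF and XML.
  3. Include the text "with support for XML Transformation" only if the processor supports the XML serialization of results described in this specification.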

B Glossary

ABNF
Augmented BNF, a syntax used for specifying Speech Recognition Grammars (defined in [SRGS]).
ASR (Automatic Speech Recognition)
The process of using an automatic computation algorithm to analyze spoken utterances to determine what words and phrases were present.
ECMA
Ecma International (see [ECMA]) is an industry association founded in 1961, dedicated to the standardization of information and communication systems. ECMAScript is a standard published by Ecma International.
ECMA Compact Profile
ECMAScript Compact Profile (see [ECMA-327]) is a subset of ECMAScript 3rd Edition tailored to resource-constrained devices such as battery-powered embedded devices.
ECMAScript
See Script.
Grammar
Shorthand for Speech Recognition Grammar.
Grammar Document
An XML or ABNF Grammar Document as defined in sections 5.2 and 5.5 of [SRGS].
Grammar Fragment
An XML Fragment as defined in section 5.1 of [SRGS].
Hosting environment
The grammar processor, VoiceXML processor, or other computer program that contains a processor for Semantic Interpretation.
Logical Parse Structure
A representation of a parse as a hierarchical structure. See section 6.1.
Parse
Noun (1): A structured representation of the (possible) application of Grammar Rules to the sequence of Tokens in an utterance. See section 6 for the definitions of Parse structure and Parse list in this specification.
Noun (2): A structured representation of the contents of a document by analyzing the stream of characters against the defined model for the document.
Verb: The process of creating a Parse.
Parse List (Flat Parse List)
A representation of a parse as a linear sequence of applied rules. See section 6.2.
Rule (Grammar Rule)
A Rule Definition describes the composition of a possible utterance in terms of other Rule Definitions and Tokens. See details in section 3.1 of [SRGS].
Script (ECMAScript)
A computer program listing the instructions to be executed. In SI, scripts are written in the ECMAScript programming language. (See [ECMA-262])
Semantic Interpretation
A process to produce a Semantic Result representing the meaning of a natural language utterance.
Semantic Result or Semantic Value
A computer processable representation of the information (the meaning, or "semantics") contained in a user input. In the context of this specification, the user input is a natural language utterance. A Semantic Result is used here in the relatively narrow sense of representing the information that is relevant to the application that is intended to process it, typically using ad-hoc conventions for the representation. See section 1.1.
Speech Recognizer
A program or device that performs Automatic Speech Recognition.
Speech Recognition Grammar
A description of the candidate words and phrases for use by a Speech Recognizer. Speech Recognition Grammars for use with this specification are defined in [SRGS], a standardized format for context-free grammars.
SRGS
Speech Recognition Grammar Specification for the W3C Speech Interface Framework. See [SRGS].
String Literal
A sequence of zero or more characters. String Literals in this specification are defined in section 3.2.3.
Token
A token (a.k.a. a terminal symbol) is the part of a Grammar that defines words or other entities that may be spoken (see section 2 of [SRGS]).
VoiceXML
VoiceXML is a markup language designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. VoiceXML is part of the W3C Speech Interface Framework. See [VOICEXML20].
XML
A simple dialect of SGML intended to enable generic SGML to be served, received, and processed on the Web. See W3C Glossary for XML.

C Normative References

ECMA
ECMA International - Standardizing Information and Communication Systems, http://www.ecma-international.org/ .
ECMA-262
Standard ECMA-262, 3rd Edition, December 1999, http://www.ecma-international.org/publications/standards/Ecma-262.htm .
EMMA
EMMA: Extensible MultiModal Annotation Markup Language, M. Johnston, W. Chou, D. A. Dahl, G. McCobb, D. Raggett, Editors, W3C Working Draft, 16 September 2005, http://www.w3.org/TR/2005/WD-emma-20050916/ . Latest version available at http://www.w3.org/TR/emma/ .
ECMA-327
Standard ECMA-327, 3rd Edition Compact Profile, June 2001, http://www.ecma-international.org/publications/standards/Ecma-327.htm .
N-GRAM
Stochastic Language Models (N-Gram) Specification, N. K. Brown, A. Kellner, D. Raggett, Editors. W3C Working Draft (work in progress), 3 January 2001, http://www.w3.org/TR/2001/WD-ngram-spec-20010103/ . Latest version available at http://www.w3.org/TR/ngram-spec/ .
MMI
W3C Multimodal Interaction Activity, http://www.w3.org/2002/mmi/ .
MMI-FRAMEWORK
W3C Multimodal Interaction Framework, T. V. Raman, D. Raggett, J. , Editors, W3C Working Group Note, 6 May 2003, http://www.w3.org/TR/2003/NOTE-mmi-framework-20030506/ . Latest version available at http://www.w3.org/TR/mmi-framework/ .
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997. http://www.ietf.org/rfc/rfc2119.txt .
SRGS
Speech Recognition Grammar Specification Version 1.0, A. Hunt, S. McGlashan, Editors, W3C Recommendation, 16 March 2004, http://www.w3.org/TR/2004/REC-speech-grammar-20040316/ . Latest version available at http://www.w3.org/TR/speech-grammar/ .
VBWG
W3C Voice Browser Activity, http://www.w3.org/Voice/ .
VOICEXML20
Voice Extensible Markup Language (VoiceXML) Version 2.0, J. Ferrans, B. Lucas, K. G. Rehor, B. Porter, A. Hunt, S. McGlashan, S. Tryphonas, D. C. Burnett, J. Carter, P. Danielsen, Editors, W3C Recommendation, 16 March 2004, http://www.w3.org/TR/2004/REC-voicexml20-20040316/ . Latest version available at http://www.w3.org/TR/voicexml20/ .
XML-NAMES
Namespaces in XML, T. Bray, D. Hollander, A. Layman, Editors, W3C Recommendation, 14 January 1999, http://www.w3.org/TR/1999/REC-xml-names-19990114/ . Latest version available at http://www.w3.org/TR/REC-xml-names/ .

D Acknowledgments

This document was written with the participation of members of the W3C Voice Browser Working Group [VBWG]. The following have significantly contributed to writing this specification:

E Summary of Changes Since the Last Call Working Draft

The following is a summary of the major changes since the Last Call Working Draft was published on November 8, 2004, based on input from reviewers and the working group:
