W3C

Semantic Interpretation for Speech Recognition

W3C Working Draft 1 April 2003

This version:
http://www.w3.org/TR/2003/WD-semantic-interpretation-20030401/
Latest version:
http://www.w3.org/TR/semantic-interpretation/
Previous version:
http://www.w3.org/TR/2001/WD-semantic-interpretation-20011116/
Editors:
Luc Van Tichelen, ScanSoft

Abstract

This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification.

Semantic Interpretation may be useful in combination with other specifications, such as the Stochastic Language Models (N-Gram) Specification, but its use with N-grams has not yet been studied.

The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity is defining a data format (EMMA) for representing information contained in user utterances, and has published the requirements for this data format (EMMA Requirements). It is believed that semantic interpretation will be able to produce results that can be included in EMMA.

Status of this document

This document is a public W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C Working Drafts can be found at http://www.w3.org/TR.

This specification describes the syntax and semantics for semantic interpretation tags in speech recognition grammars, and forms part of the proposals for the W3C Speech Interface Framework. It is intended to be used with Speech Recognition grammars as defined in Speech Recognition Grammar Specification.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

Patent disclosures relevant to this specification may be found on the Working Group's patent disclosure page in conformance with W3C policy.

This document is for public review, and comments and discussion are welcomed on the public mailing list <www-voice@w3.org>. Note that, as a precaution against spam, you should first subscribe to this list by sending an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). The archive for the list is accessible online.

The working group's intention is to advance this specification to last call Working Draft during the 2nd quarter of 2003 (see Work Items of the Voice Browser Activity). Reviewers are encouraged to send their comments on this working draft before 2 May 2003.


1. Introduction

This section is informative.

1.1. Semantic Results

Grammar Processors, and in particular speech recognizers, use a grammar that defines the words and sequences of words making up the input language that they can accept. The major task of a grammar processor is to find the sequence of words described by the grammar that (best) matches a given utterance, or to report that no such sequence exists.

In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information that is presented in the user utterance. What is needed is a computer processable representation of the information, the semantic result, more than a natural language transcript.

Semantic Interpretation Tags provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar.

When used with a VoiceXML Processor, it is expected that a Semantic Interpretation Tag Processor will convert the result generated by an SRGS speech grammar processor into an ECMAScript object that can then be processed as specified in the VoiceXML 2.0 specification section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms.

The W3C Multimodal Interaction working group is defining a data format (EMMA) for the representation of information contained in the user's input (a spoken utterance or other forms of input available through the modalities in the interaction). It is expected that Semantic Interpretation for Speech Recognition will be generating results that can be integrated into EMMA.

This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification.

It is possible that Semantic Interpretation Tags as defined here can also be used with the N-Gram Specification, but the current specification does not specifically address such use and does not guarantee that the Semantic Interpretation Tags as defined here meet the needs of such use.

1.2. Basic Principles

The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:

1.3. ECMAScript Compact Profile

While there was no explicit requirements document created for the properties of a semantic interpretation syntax, the working group gradually learned that there are some conflicting desires to be met.

Certainly, the Semantic Interpretation Tags must be easy for developers to use, and they should provide at least the expressive power that is needed for the majority of applications. ECMAScript (ECMA-262) would meet these requirements.

On the other hand, there are concerns about performance and other implications of using ECMAScript (such as variable scoping, platform access, etc.).

The ECMAScript Compact Profile (ECMA 327) is a strict subset of the third edition of ECMA-262. It has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.

2. Normative References and Conformance

2.1. Normative References

This document normatively references the ECMA-327 Standard "ECMAScript 3rd Edition Compact Profile", June 2001, further referenced as ES-CP.

The ES-CP itself references the ECMA-262 Standard "ECMAScript Language Specification", 3rd Edition - December 1999.

For informative purposes, some text from the ECMA-262 has been copied in this document. Where that is done, unless otherwise specified, such text should be considered informative and the corresponding reference to the ECMA-262 standard is normative.

All sections in this specification are normative, unless otherwise indicated.

2.2. Notational Conventions

Throughout this specification the following abbreviations are used:

Abbreviation  Description
GRN           Grammar Rule Name, the rule name that is in the left-hand side of a speech grammar rule definition.
GRR           Grammar Rule Reference, a rule reference in the rule expansion that is in the right-hand side of a speech grammar rule definition.
ES n          Shorthand notation for ECMA-262 Section number n.
SI            Semantic Interpretation.

This specification uses the notational conventions for Syntactic and Lexical Grammars as given in ES 5.1, and the same Algorithm Conventions as in ES 5.2.

3. Expressions in Semantic Interpretation Tags

3.1. Attributes and Semantic Values

Semantic Interpretation Tags compute semantic values. During the semantic interpretation processing, semantic values are stored in attributes that are associated with the rules in the grammar.

Every rule always has exactly one attribute that holds a semantic value. The attribute is not named, but is referred to by referring to the name of the rule as defined in section 3.3.1.

During semantic interpretation evaluation, attribute values of a rule that is referenced (Grammar Rule Reference, or GRR) can be evaluated to compute the value for the attribute of the rule that is named in the grammar rule definition (Grammar Rule Name, or GRN).

Attributes hold ECMAScript values. Attributes that have not been assigned a value are Undefined.

In addition to the attribute, every GRR also has an associated text variable of type string, which holds the substring (series of tokens) in the utterance that is governed by that GRR. Text variables are not part of the GRR's attribute and cannot be modified.

The semantic result for an utterance is the final computed value of the attribute of the rule that was activated by the application. It is outside the scope of this specification to define how the semantic result is communicated to the application.

Informative Note:

In the context of the W3C Voice Browser architecture, the semantic result will directly be cast into ECMAScript variables in the VoiceXML interpreter (see VoiceXML2.0 section 3.1.6. Mapping Semantic Interpretation Results to VoiceXML forms).

In the W3C Multimodal architecture, the semantic result is expected to be transformed into EMMA following the mechanism described in section 7.

In other contexts, the mechanism described in section 7 can be used to transform the semantic result into other XML formats.

3.2. Semantic Interpretation Tags

Semantic Interpretation Tags exist in two forms: Semantic Interpretation Scripts and Semantic Interpretation Literals.

3.2.1 Semantic Interpretation Scripts

A Semantic Interpretation Script holds a string that is treated as the source text of a valid ES-CP Program (with Program as defined by ES 14).

The environment in which SI tags are embedded may introduce escaped characters, character references or other markup that has to be resolved by the environment. The result after resolution is treated as ES-CP.

Informative:

Semantic Interpretation Scripts are added in the string content of the tag elements in the grammar rule expansion, as described in Section 2.6 Tags of the Speech Recognition Grammar Specification. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.

Below are two example formats of SI Tags in the Speech Recognition Grammar Specification.

ABNF Form

In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}'.

ABNFSemanticTag :
{}
{ EcmaScript-327 }
{!{ EcmaScript-327 }!} 

XML Form

In the XML grammar format, SI Tags are specified as the content of the tag element.

XMLSemanticTag:
<tag/>
<tag> </tag>
<tag> EcmaScript-327 </tag>

3.2.2. Semantic Interpretation Literals

A Semantic Interpretation Literal (SI Literal) is a sequence of zero or more characters that is attached to a rule expansion. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4

Attaching an SI Literal is equivalent to enclosing the expansion in an expansion container and adding an SI Tag element that assigns the String Literal to the GRN as the last expansion in this container.

Note: This description does not prohibit attaching an SI Literal to an expansion that itself contains SI Tags or SI Literals. Instead, it defines behavior that is independent of whether the contained expansion includes other SI Tags or SI Literals: the value assigned by any contained SI Literal or SI Script is overwritten by the attached SI Literal.
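
The equivalence between an attached SI Literal and an SI Tag can be sketched as a small rewriting function. This is an illustrative sketch only, not part of the specification: the function name literalToTag and its string-based arguments are assumptions made for the example, and it handles only literals that contain no double quote.

```javascript
// Hypothetical helper: rewrite an ABNF expansion with an attached SI Literal
// into the equivalent container whose last expansion is an SI Tag.
function literalToTag(expansion, literal) {
  if (literal.indexOf('"') !== -1) {
    // Quoting of literals that contain quotes is outside this sketch.
    throw new Error("literals containing a double quote are not handled here");
  }
  // Enclose the expansion in a container and assign the literal to the GRN ($).
  return "(" + expansion + ' {$="' + literal + '"})';
}

console.log(literalToTag("yeah", "yes"));                // (yeah {$="yes"})
console.log(literalToTag("(no | nope | no way)", "no")); // ((no | nope | no way) {$="no"})
```

Because the tag is placed last in the container, it runs after any tag inside the original expansion, which is why the attached literal overwrites earlier assignments.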

Any legal rule expansion that has an attached SI Literal is itself a legal rule expansion. Both the ABNF Form and the XML Form permit a legal SI Literal to be attached to any token, rule reference, sequence or set of alternatives.

The syntax for the ABNF Form and for the XML Form are provided below.

ABNF Form

In the ABNF Form an SI Literal may be right-attached to any legal rule expansion. The attachment consists of a colon (':') followed immediately by an SI Literal delimited by single or double quotes. There must be no white space between the colon and the first quote.

If the SI Literal contains a single quote, i.e. if it follows the DoubleStringCharacters production of ES 7.8.4, it must be delimited by double quotes. If the SI Literal contains a double quote, i.e. if it follows the SingleStringCharacters production of ES 7.8.4, it must be delimited by single quotes.

The SI Literal has higher precedence than sequences or alternatives. To attach an SI Literal to these rule expansion types the expansion should be delimited by parentheses.

The SI Literal has the same precedence as the repeat operator and language attachment.

#ABNF 1.0 ISO-8859-1;
language en-US;

// SI Literal attachment to tokens
$yes = yes | yeah:"yes" | "you bet":'yes' | "oui"!fr-CA:"yes";

// SI Literal attachment to a sequence container
$no = (no | nope | no way):"no";

public $answer = $yes | $no;

The SI Literals in the above grammar are equivalent to the SI Scripts in the SI Tags in the grammar below:

#ABNF 1.0 ISO-8859-1;
language en-US;

// SI Literal attachment to tokens
$yes = yes | (yeah {$="yes"}) | ("you bet" {$="yes"}) | ("oui"!fr-CA {$="yes"});

// SI Literal attachment to a sequence container
$no = ((no | nope | no way) {$="no"});

public $answer = $yes | $no;

XML Form

In the XML Form an SI Literal may be attached to a rule expansion by providing a tag attribute on any one-of, token, ruleref or item element.

According to the rules of XML 1.0 an attribute value must not contain the quotation mark used as a delimiter for that value. Therefore, if the SI Literal contains a single quote, i.e. if it follows the DoubleStringCharacters production of ES 7.8.4, it must be delimited by double quotes. If the SI Literal contains a double quote, i.e. if it follows the SingleStringCharacters production of ES 7.8.4, it must be delimited by single quotes.

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
 
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0">
 
  <!-- 
     SI Literal attachment to tokens and items
  -->
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item tag="yes">yeah</item>
      <item> <token tag='yes'>you bet</token></item>
      <item tag="yes" xml:lang="fr-CA">oui</item>
    </one-of> 
  </rule> 
  
  <!-- SI Literal attachment to one-of -->
  <rule id="no">
    <one-of tag="no">
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
  </rule>
  

  <rule id="answer" scope="public">  
    <one-of>
      <item> <ruleref uri="#yes"/> </item>
      <item> <ruleref uri="#no"/> </item>
    </one-of>
  </rule>
</grammar>
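
The quoting rule stated above for both the ABNF Form and the XML Form can be sketched as follows. This is a minimal illustration under assumptions: the helper name delimitLiteral is hypothetical, and a literal that contains both kinds of quote is rejected rather than escaped.

```javascript
// Hypothetical helper: wrap an SI Literal in a delimiter it does not contain.
function delimitLiteral(literal) {
  var hasSingle = literal.indexOf("'") !== -1;
  var hasDouble = literal.indexOf('"') !== -1;
  if (hasSingle && hasDouble) {
    // Neither delimiter can be used without escaping; not covered by this sketch.
    throw new Error("literal contains both quote characters");
  }
  // A literal with a double quote needs single-quote delimiters;
  // everything else (including literals with a single quote) uses double quotes.
  return hasDouble ? "'" + literal + "'" : '"' + literal + '"';
}

console.log(delimitLiteral("yes"));         // "yes"
console.log(delimitLiteral("rock'n roll")); // "rock'n roll"
console.log(delimitLiteral('say "hi"'));    // 'say "hi"'
```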

3.3. Syntax for semantic attributes

3.3.1. Syntax for attributes of Grammar Rule Names (GRN)

The GRN's attribute can both be evaluated and assigned to.

It is identified by a dollar sign $.

Properties of the GRN's attribute can be individually accessed by $.propertyname

Examples

$       identifies the GRN attribute
$.pizza identifies the pizza property of the GRN attribute

A GRN cannot be a reserved word nor can literals in its name be a reserved word. A GRN cannot be a ReadOnly property, but parts of it can.

Examples

$.for   illegal reference to the for property of the GRN attribute

$a      illegal: $a is a ReadOnly property
$.$a    legal reference to the $a property of the GRN attribute

Issue-1:

As in ECMA-327, assignments to read-only object properties are ignored. Note that as a consequence, attempts to assign to a GRR or to the text variable of a GRR or GRN will be ignored. The group is interested in reviewer feedback on this issue.

3.3.2. Syntax for attributes of Grammar Rule References (GRR)

The GRR's attributes and properties are ReadOnly and can only be evaluated, not assigned to.

A GRR's attribute is identified by $Rulename, where Rulename is the name of the GRR.

Individual properties of a GRR attribute can be identified by $Rulename.Identifier, where Rulename is the name of the GRR and Identifier is the name of the property.

A GRR's attribute and properties can only be referenced in SI Tags that appear after (to the right or below) the GRR in the grammar expansion, and only if the GRR was used in the expansion that matched the input utterance. See visibility rules in section 6 for a more detailed description of when GRR can be referenced in SI Tags, using the concept of the logical parse structure and the flat parse list.

A GRR can also be referenced through $$, which denotes the last GRR that was used in the expansion matching the utterance and that appears before the tag (i.e. the last element of type rule reference before the tag in the flat parse, see section 6.2).

In an expression, both the GRR's and GRN's attributes and properties can be evaluated.

Special Rules (NULL, VOID, GARBAGE) cannot be evaluated.

Examples

$             the attribute of the GRN
$.prop        the property prop of the GRN's attribute

$rname        the attribute of GRR $rname
$rname.prop   the property prop of the attribute of GRR $rname


$$            $$ is shorthand for the closest matching GRR before the SI Tag
$$.prop       Property prop of closest matching GRR before the SI Tag

3.3.3. Syntax for text variables associated with GRRs

A GRR's text variable is identified by $Rulename$.text, where Rulename is the name of the GRR.

The text variable of $$ is identified by $$$.text.

$rname$.text  the text variable of the GRR $rname
$$$.text      the text variable of the closest matching GRR before the SI Tag
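
The reference forms listed in sections 3.3.1 to 3.3.3 can be told apart with a few pattern checks. The sketch below is illustrative only: the function name classifyReference and the shape of its result are assumptions, and rule and property names are simplified to ASCII identifiers, which is narrower than what SRGS and ES-CP allow.

```javascript
// Hypothetical classifier for the reference syntax of sections 3.3.1 - 3.3.3.
// Returns null when the string is not one of the recognized forms.
function classifyReference(ref) {
  var m;
  if (ref === "$") return { target: "GRN", kind: "attribute" };
  if (ref === "$$") return { target: "lastGRR", kind: "attribute" };
  if (ref === "$$$.text") return { target: "lastGRR", kind: "text" };
  if ((m = /^\$\$\.([A-Za-z_]\w*)$/.exec(ref))) {
    return { target: "lastGRR", kind: "property", property: m[1] };
  }
  // Properties of the GRN may themselves start with a dollar sign ($.$a).
  if ((m = /^\$\.([A-Za-z_$][\w$]*)$/.exec(ref))) {
    return { target: "GRN", kind: "property", property: m[1] };
  }
  if ((m = /^\$([A-Za-z_]\w*)\$\.text$/.exec(ref))) {
    return { target: "GRR", rule: m[1], kind: "text" };
  }
  if ((m = /^\$([A-Za-z_]\w*)\.([A-Za-z_]\w*)$/.exec(ref))) {
    return { target: "GRR", rule: m[1], kind: "property", property: m[2] };
  }
  if ((m = /^\$([A-Za-z_]\w*)$/.exec(ref))) {
    return { target: "GRR", rule: m[1], kind: "attribute" };
  }
  return null;
}

console.log(classifyReference("$rname$.text").kind); // text
console.log(classifyReference("$$.prop").target);    // lastGRR
console.log(classifyReference("$.$a").property);     // $a
```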

Issue-2:

It is possible to extend the proposal to other built-in variables than the text variable, such as the list of tokens, the unmodified string, the confidence values, etc. This is left for a future version, if any.

Issue-3:

The working group discussed the following alternative, which provides access to the associated text variable through the GRN instead of the GRR:

A GRN's (read-only) text variable is identified by _$.text.

_$.text  identifies the GRN's (read-only) text variable 

The working group will make a choice between associating the text variables to either the GRN or the GRR before advancing this specification, and welcomes reviewer feedback on this alternative.

Issue-4:

The working group has some concern that the use of the dollar sign is overloaded, and is considering the following alternative notation:

A GRR's text variable is identified by _$Rulename.text, where Rulename is the name of the GRR.

The text variable of $$ is identified by _$$.text.

_$rname.text  the text variable of the GRR $rname
_$$.text      the text variable of the closest matching GRR before the SI Tag

4. Semantic Interpretation Grammars

4.1. Semantic Interpretation Grammars

This specification defines a Semantic Interpretation Grammar to be a Speech Recognition Grammar as defined by SRGS that

4.2. Extensions to the SRGS Elements

4.2.1. Tokens

Tokens are defined in section 2.1 of the Speech Grammar Format Specification, and are extended with an optional tag attribute.

When attached to a token the SI Literal is equivalent to enclosing the token in a sequence container and adding an SI Tag assigning the literal to the GRN value as the last expansion of this sequence.

ABNF Form

An SI Literal may be attached to any token.

Appendix A.2. ABNF Syntax for Semantic Interpretation Grammars normatively defines the token parsing behavior.

XML Form

The token element may include an optional SI Literal attribute.

4.2.2. Rule References

Rule References are defined in section 2.2 of the Speech Grammar Format Specification, and are extended with an optional tag attribute.

When attached to a rule reference the SI Literal is equivalent to enclosing the rule reference in a sequence container and adding an SI Tag assigning the literal to the GRN value as the last expansion of this sequence.

ABNF Form

An SI Literal may be attached to any rule reference.

XML Form

The ruleref element may include an optional SI Literal attribute.

4.2.3. Sequences

Sequences are defined in section 2.3 of the Speech Grammar Format Specification, and are extended with an optional tag attribute.

When attached to a sequence the SI Literal is equivalent to enclosing the sequence in a sequence container and adding an SI Tag assigning the literal to the GRN value as the last expansion of this sequence.

ABNF Form

An SI Literal may be attached to any sequence container (parentheses).

The SI Literal attachment has a higher precedence than sequences. To attach an SI Literal to a sequence the expansion should be enclosed in a sequence container.

XML Form

The item element may include an optional SI Literal attribute.

4.2.4. Alternatives

Alternatives are defined in section 2.4 of the Speech Grammar Format Specification, and are extended with an optional tag attribute.

ABNF Form

The SI Literal attachment has a higher precedence than alternatives. To attach an SI Literal to alternatives the expansion should be enclosed in a sequence container.

XML Form

The one-of element may include an optional SI Literal attribute.

When attached to a one-of the SI Literal is equivalent to enclosing the alternatives in a sequence container and adding an SI Tag assigning the literal to the GRN value as the last expansion of this sequence.

4.2.5. Precedence

This section extends the precedence rules of the ABNF rule expansion syntax given in the Section 2.8 of the SRGS. Because XML documents explicitly indicate structure there is no ambiguity and thus a precedence definition is not required.

ABNF Form

The precedence rules for grammars with semantic interpretation are the same as those defined in SRGS, with the addition that the SI Literal attachment has the same precedence level as the repeat operator (e.g. "<0-1>") and language attachment (e.g. "!en-AU").

4.3. Global Variable Declarations and Initialization

The header of an SRGS grammar may contain one or more SI Tags that are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (section 4.1 of SRGS).

The SI Tags are evaluated only once, in a global scope that will be shared by all evaluations (see 6.3).

Whereas all evaluations for SI Tags in flat parse lists for matching rules have access to the global scope for reading only, the SI Tags in the grammar header have write access to the global scope. This is the primary function of these tags: to initialize the global scope for use in the SI Tags.

ABNF Form

In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header, before the first rule. Both tag delimiting syntaxes can be used.

Example:

#ABNF 1.0;
language en-US;
{var x=1};
{!{var y='low{1}';}!};
$rule = . . .;

XML Form

In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header, before the first rule.

Example:

<grammar xml:lang="en-US">
  <tag>var x=1</tag>
  <tag>var y='low{1}';</tag>
...
  <rule id="rule">. . .</rule>
</grammar>
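
For the ABNF Form, recognizing global SI Tags in a grammar header can be sketched as below. This is an illustration under simplifying assumptions, not a conformant parser: the function name extractGlobalTags is hypothetical, each header item is assumed to sit on its own line, and the body of a plain {...} tag is assumed to contain no braces.

```javascript
// Hypothetical helper: return the script bodies of the global SI Tags found
// in a list of ABNF header lines. Both delimiting syntaxes are recognized.
function extractGlobalTags(headerLines) {
  var bodies = [];
  for (var i = 0; i < headerLines.length; i++) {
    var line = headerLines[i].trim();
    // Try the {!{ ... }!} form first, then the plain { ... } form.
    var m = /^\{!\{([\s\S]*)\}!\};$/.exec(line) || /^\{([^{}]*)\};$/.exec(line);
    if (m) bodies.push(m[1].trim());
  }
  return bodies;
}

var header = [
  "#ABNF 1.0;",
  "language en-US;",
  "{var x=1};",
  "{!{var y='low{1}';}!};"
];
// Yields two script bodies: var x=1 and var y='low{1}';
console.log(extractGlobalTags(header));
```

The second header tag shows why the longer delimiters exist: its body contains curly braces, which the plain form in this sketch cannot carry.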

5. Default Assignment

If there are no Semantic Interpretation Tags attached to any of the expansions in the grammar rule expansion that were used to match the utterance, then the value of the text variable of the GRN is automatically copied into the GRN's attribute (which then becomes of type string).

Examples:

$drink = coke | pepsi | coca cola;

no tags: $drink is either "coke", "pepsi" or "coca cola", and so is $drink$.text

$drink = coke | pepsi | "coca cola":"coke";

SI Literal on one alternative:
now both coke and coca cola result in "coke" on $drink
$drink$.text still returns "coke", "pepsi" or "coca cola"

$drink = I want to have a {$.action = "order"}
   (coke | pepsi | "coca cola":"coke");

possibly flawed rule: returns either $.action="order", or "coke" if the utterance contained coca cola. If there is at least one tag in the expansions used, then the default mechanism doesn't work. This means coke or pepsi are never automatically assigned to $drink in this rule.
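
The default assignment rule can be sketched as follows. This is a simplified model, not the normative algorithm: the function name evaluateRule is hypothetical, SI Tags are modeled as JavaScript functions that receive a scope object holding the GRN attribute in its $ property, and the last tag is a stand-in that assigns a whole object instead of executing {$.action = "order"}.

```javascript
// Hypothetical model of one rule application. `text` is the matched token
// string; `tags` are the SI Tags that appear in the expansions actually used.
function evaluateRule(text, tags) {
  if (tags.length === 0) {
    return text; // default assignment: the text variable becomes the value
  }
  var scope = { $: undefined };
  for (var i = 0; i < tags.length; i++) {
    tags[i](scope); // tags run in order; each may assign to the attribute
  }
  return scope.$;
}

// $drink = coke | pepsi | "coca cola":"coke";
console.log(evaluateRule("pepsi", []));                                        // pepsi
console.log(evaluateRule("coca cola", [function (s) { s.$ = "coke"; }]));      // coke

// With at least one tag in the used expansions the default no longer applies.
var order = evaluateRule("I want to have a pepsi",
                         [function (s) { s.$ = { action: "order" }; }]);
console.log(order.action);                                                     // order
```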

6. Visibility Rules and order of tag evaluation for ABNF/XML Speech Recognition Grammar Format

This section defines the visibility rules and order of tag evaluation for SI Tags used in the Speech Recognition Grammar Format (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in N-grams), the visibility rules and order of evaluation may be defined differently.

6.1. Logical Parse Structure

The visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure as defined in Appendix H. Logical Parse Structure of the Speech Recognition Grammar Specification.

Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion should be handled in any specific way beyond what is required for a conformant speech recognition grammar processor. The Logical Parse Structure is only a means to illustrate the order of evaluation and visibility rules for Semantic Interpretation Tags. Implementations are not required to expose the logical structure and may use a different internal representation as long as it yields the results described here.

The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.

The Logical Parse output is represented as an array of output entities en, e.g. [e1, e2, e3].

Output entities can be one out of three kinds:

Appendix H of the Speech Recognition Grammar Specification contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.

For the purpose of building the logical parse, all SI Literals are assumed to be converted into the equivalent SI Tag element as defined in 3.2.2.

Example

The sentence "turn the heating off" on the following grammar (in ABNF Form)

root $command;
$command = (set | turn) $object $state {$.o=$object;$.s=$state};
$state = (to|$NULL) (on:"1" | off:"0" | warm:"w" | cool:"c" | cold:"c");
$object = [the] ((heating | cooling):"airco" | radio:"radio" | lights:"lights");

would result in the logical parse

[$command [turn,
           $object [the,
                    heating,
                    {$="airco"}],
           $state  [off,
                    {$="0"}],
           {$.o=$object;$.s=$state}]
]

6.2. Flat Parse List

The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented in a flattened list of parses for every grammar rule application.

The flat parse list for a given rule application is represented as:

The output elements are as in the logical parse structure, except that rule references are represented without an output array but followed by a sequence number in parentheses.

Examples

The equivalent flat parse lists for the above example are:

$command(1): turn, $object(1), $state(1), {$.o=$object;$.s=$state}

$object(1): the, heating, {$="airco"}

$state(1): off, {$="0"}

The following example illustrates the use of the sequence number for rules that are applied more than once:

root $a;
$a = ($b)<1-> $c (t1)<0-1> {tag1} | $d {tag2};
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;

Given the input "t2 t3 t5 t5", the logical parse structure is:

[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}]]

Flat parse lists per rule application:

$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}
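
Deriving flat parse lists from a logical parse structure can be sketched as a tree walk. This is an illustrative sketch under assumptions: the function name flatParseLists and the node shape (objects with rule and children, with tokens and tags as plain strings) are invented for the example, sequence numbers are counted per rule name over the whole parse, and the root is labeled with a sequence number as in the $command(1) example.

```javascript
// Hypothetical flattening of a logical parse into one list per rule application.
function flatParseLists(root) {
  var counters = {}; // rule name -> number of applications seen so far
  var lists = [];    // one entry per rule application, in pre-order

  function nextNumber(rule) {
    counters[rule] = (counters[rule] || 0) + 1;
    return counters[rule];
  }

  function visit(node, label) {
    var items = [];
    lists.push({ label: label, items: items });
    for (var i = 0; i < node.children.length; i++) {
      var child = node.children[i];
      if (typeof child === "string") {
        items.push(child); // token or tag: copied as is
      } else {
        // Rule reference: keep only its name and sequence number here,
        // and emit its contents as a separate flat parse list.
        var childLabel = child.rule + "(" + nextNumber(child.rule) + ")";
        items.push(childLabel);
        visit(child, childLabel);
      }
    }
  }

  visit(root, root.rule + "(" + nextNumber(root.rule) + ")");
  return lists.map(function (e) { return e.label + ": " + e.items.join(", "); });
}

// Logical parse for the input "t2 t3 t5 t5" from the example above.
var parseTree = { rule: "$a", children: [
  { rule: "$b", children: ["t2"] },
  { rule: "$b", children: ["t3", "{tag3}"] },
  { rule: "$c", children: ["t5", "{tag5}", "t5", "{tag5}"] },
  "{tag1}"
] };
console.log(flatParseLists(parseTree).join("\n"));
// $a(1): $b(1), $b(2), $c(1), {tag1}
// $b(1): t2
// $b(2): t3, {tag3}
// $c(1): t5, {tag5}, t5, {tag5}
```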

6.3. Scoping and Visibility Rules

These scoping and visibility rules are defined on the basis of the flat parse list as specified in section 6.2.
A flat parse list consists of terminals, tags, and rule references. Each rule reference in a flat parse list defines the beginning of another flat parse list, which again may consist of terminals, tags and rule references.

The Global Scope

Semantic interpretation takes the recognition result and computes the semantic result according to the information given in the semantic tags. To do so, a global anonymous ECMA-262 scope is created that is read-only for scripts. Before the tags for an utterance are executed, the global scope is initialized with the global members visible in the semantic actions of all tags by means of global tags in the grammar header (see 4.3).

Order of Tag Execution

The tags within a flat parse list are executed in the order in which they appear, left to right.

Scope Chains and Access to Variables

For each flat parse list, including embedded parse lists defined by rule references, a new anonymous ECMA scope is created that is a direct child of the global scope object. The ECMA scope chains thus always have the global scope (the scope of the whole parse) as top-level object, and the scope belonging to the parse list as successor.

Access to variables during tag execution is resolved with the scope chain according to the ECMAScript rules (cf. ES 10.1.4).

The variables object according to ECMA-262 is the scope object created for this rule. This means that local variables that are defined in tags belonging to a rule reference are created in the scope object that was created for this rule. See below for global variable handling.
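
The two-level scope chain described above can be approximated with ordinary JavaScript objects. This is an analogy rather than the ECMA-262 scope mechanism itself: a prototype link stands in for the scope chain, Object.freeze stands in for the read-only global scope, and the names globalScope and newRuleScope are hypothetical.

```javascript
// Global scope: initialized by the header tags, then treated as read-only.
var globalScope = Object.freeze({ x: 1, y: "low{1}" });

// Each flat parse list gets its own scope whose parent is the global scope.
function newRuleScope() {
  return Object.create(globalScope);
}

var scopeB = newRuleScope();
var scopeC = newRuleScope();
scopeB.count = 2; // a local variable created by a tag of one rule application

console.log(scopeB.x);               // 1, found in the global scope via the chain
console.log(scopeB.count);           // 2, found in the rule's own scope
console.log(scopeC.count);           // undefined, locals are not shared between rules
console.log("count" in globalScope); // false, the global scope is unchanged
```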

Before the first tag is executed, the environment of a new scope is set up in the following way:

When execution of the referenced rule is finished, the scope object of this rule is removed from the scope chain and the scope belonging to the embedding parse list is updated in the following way:

When any of these variables already existed, they are overwritten. All these variables are read-only.

Note: Whether or not the $, $rulename and $rulename$.text variables are enumerated when enumerating the scope object is not defined by this specification and may vary across implementations. Authors are discouraged from enumerating the scope object.

Visibility

The consequences of these scoping rules are:

Global Variables

Since the global scope is read-only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see 4.3).

Examples

The following rule contains two GRRs to the same rule ($city).

$fromto = from $city {$.fromcity=$city.name} to $city {$.tocity=$city$.text};

To determine which of the GRR instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:

$fromto: from, $city(1), {$.fromcity=$city.name}, to, $city(2), {$.tocity=$city$.text}

From this it follows that $city.name in the first tag refers to the first GRR $city in the rule, and that the reference to $city$.text in the second tag is to the second GRR named $city.

In the following rule, the flat parse depends on whether the input matches the optional GRR $b:

$a = $b [$b] {$.x=$b.x};

The two possible flat parses are:

$a: $b(1), {$.x=$b.x}
$a: $b(1), $b(2), {$.x=$b.x}

The reference $b.x in the tag will thus refer to either the first or the last $b in the rule, depending on whether the optional rule $b was matched in the input.

The SI Tag in the rule below contains a couple of references to GRRs that are undefined since there is no GRR with that name before the tag in the flat parse:

$a = $b [$c] {$.x=$c; $.y=$d; $.z=$e} $e;

The two possible flat parses are:

$a: $b(1), {$.x=$c; $.y=$d; $.z=$e}, $e(1)
$a: $b(1), $c(1), {$.x=$c; $.y=$d; $.z=$e}, $e(1)

This means that:

$.x is undefined if $c didn't match in the utterance
$.y is undefined because $d is not in the rule expansion at all
$.z is undefined because $e doesn't appear before the tag
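The resolution behaviour shown in these examples can be sketched as a lookup over the flat parse: a reference in a tag refers to the closest rule reference of that name that precedes the tag. The sketch below is a minimal model under that reading. The flat parse is represented as a hypothetical array of entries, and the names ref, tag and resolveReference, as well as the placeholder values "B1", "C1" and "E1", are illustrative and not defined by this specification.

```javascript
// Hypothetical flat-parse model: an array of entries in matching order.
function ref(name, value) { return { kind: "ref", name: name, value: value }; }
function tag(label) { return { kind: "tag", label: label }; }

// Returns the value of the closest rule reference called `name` that
// precedes position `tagIndex` in the flat parse, or undefined if none does.
function resolveReference(flatParse, tagIndex, name) {
  for (let i = tagIndex - 1; i >= 0; i--) {
    const entry = flatParse[i];
    if (entry.kind === "ref" && entry.name === name) {
      return entry.value;
    }
  }
  return undefined;
}

// $a: $b(1), {tag}, $e(1)          ($c did not match)
const withoutC = [ref("b", "B1"), tag("t"), ref("e", "E1")];
// $a: $b(1), $c(1), {tag}, $e(1)   ($c matched)
const withC = [ref("b", "B1"), ref("c", "C1"), tag("t"), ref("e", "E1")];
// $a: $b(1), $b(2), {tag}          (the optional second $b matched)
const twoBs = [ref("b", "first"), ref("b", "second"), tag("t")];

const xWithoutC = resolveReference(withoutC, 1, "c"); // no $c before the tag
const xWithC = resolveReference(withC, 2, "c");       // the matched $c
const y = resolveReference(withC, 2, "d");            // $d is not in the rule at all
const z = resolveReference(withC, 2, "e");            // $e only appears after the tag
const lastB = resolveReference(twoBs, 2, "b");        // the last $b before the tag
```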

6.4. Order of tag execution

Within a single SI Tag, the order of evaluation is determined by ES-CP for the evaluation of a valid ES-CP Program (ES 14).

All global SI Tags (in tags in the grammar header) are executed once, before any SI Tags within a grammar rule are executed (see 4.3.).

The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating SI Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see 6.2.).

The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI tag using that reference's value is processed.

Example

Given the following rules:

$a = $b  {$.y=$b.x} [$b {$.y=$.y+$b.x}];
$b = foo {$.x=1} (bar {$.x=3} | (boo {$.x=$.x+1})<1->);

The value of $.y in rule $a is shown below for a few input sentences:

input: foo boo boo boo
flat parses:
$a: $b(1), {$.y=$b.x}
$b(1): foo, {$.x=1}, boo, {$.x=$.x+1}, boo, {$.x=$.x+1}, boo, {$.x=$.x+1}
Result: $.y = 4

input: foo bar foo boo
flat parses:
$a: $b(1), {$.y=$b.x}, $b(2), {$.y=$.y+$b.x}
$b(1): foo, {$.x=1}, bar, {$.x=3}
$b(2): foo, {$.x=1}, boo, {$.x=$.x+1}
Result: $.y = 5
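The two results above can be checked by replaying the tags in flat-parse order. The following sketch does this by hand for the two rules of the example; it is not a grammar parser. The names evalA, evalB and rv (standing for the rule variable $) are hypothetical, and the split of the input into the tokens matched by each $b is supplied by the caller rather than computed.

```javascript
// Each function mirrors one rule from the example; tags run in flat-parse order.
// `rv` stands for the rule variable `$`. All names are hypothetical.

// $b = foo {$.x=1} (bar {$.x=3} | (boo {$.x=$.x+1})<1->);
function evalB(tokens) {
  const rv = {};
  for (const token of tokens) {
    if (token === "foo") rv.x = 1;             // {$.x=1}
    else if (token === "bar") rv.x = 3;        // {$.x=3}
    else if (token === "boo") rv.x = rv.x + 1; // {$.x=$.x+1}
    else throw new Error("unexpected token: " + token);
  }
  return rv;
}

// $a = $b {$.y=$b.x} [$b {$.y=$.y+$b.x}];
// `bMatches` holds the token list matched by each occurrence of $b (one or two).
function evalA(bMatches) {
  const rv = {};
  const first = evalB(bMatches[0]);    // $b(1) is computed before the tag that uses it
  rv.y = first.x;                      // {$.y=$b.x}
  if (bMatches.length > 1) {
    const second = evalB(bMatches[1]); // $b(2)
    rv.y = rv.y + second.x;            // {$.y=$.y+$b.x}
  }
  return rv;
}

const result1 = evalA([["foo", "boo", "boo", "boo"]]); // input: foo boo boo boo
const result2 = evalA([["foo", "bar"], ["foo", "boo"]]); // input: foo bar foo boo
```

Here result1.y is 4 and result2.y is 5, matching the values given in the example.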

6.5. Examples

1/
$a = $b $c { $.x = $b.x + $c.x; };
$b.x and $c.x refer to the x properties of the respective GRRs.

2/
$a = $b { $.x = $b.x + $c.x; } $c;
$c.x causes a run-time error because it is used to the left of $c.

3/
$a = [ $b ] $c { $.x = $b.x + $c.x; };
$b.x evaluates to the x property of $b if $b matched in the input utterance.
Otherwise it causes a run-time error.
A safer way to write this rule could be (assuming x is of type number):
$a = {$.x=0} [$b {$.x=$b.x}] $c {$.x=$.x+$c.x};

4/
$a = $b<1-> $c { $.x = $b.x + $c.x; };
$b.x evaluates to the x property of the last occurrence of $b in the repeat.
If the purpose was to add over each occurrence of $b, it should be written as (assuming x is of type number, and initializing $.x first as in 3/):
$a = {$.x=0} ($b {$.x = $.x + $b.x})<1-> $c { $.x = $.x + $c.x; };

4a/  
$a = $b<1-> $c { $.x = $b + $c.x; };
Similar to 4/; it should be written e.g. as
$a = ($b {$.x = $.x + $b})<1-> $c { $.x = $.x + $c.x; };

4b/
$a = ($b $d)<0-> $c { $.x = $b + $c.x; };
Similar: $b evaluates to the last occurrence of $b in the expansion ($b $d)<0->, if any. 
Otherwise undefined.

5/
$a = ( $b | $c ) { $.x = $b.x + $c.x; };
Either $b.x or $c.x will cause a run-time error depending on the input utterance.
This is better written as:
$a = ( $b {$.x=$b.x} | $c {$.x=$c.x} );

5a/
$a = ( $b a | a $b ) $c { $.x = $b.x + $c.x; };
$b.x refers to whichever $b actually matched.

6/
$a = ($b | $c) ($d | $e) { $.x=($b.x + $c.x) * ($d.x + $e.x); } ;
One of the operands of each addition causes a run-time error here,
depending on the input utterance.
This rule is better rewritten, e.g. as
$a = ($b {$.x=$b.x} | $c {$.x=$c.x}) ($d {$.x=$.x*$d.x} | $e {$.x=$.x*$e.x}) ;

7/
$a = $b | $c { $.x = $b.x + $c.x; };
Evaluation of $b.x always causes a run-time error,
because the expression is evaluated only when $c matches, not $b.
(When $b matches, the default assignment would cause $=$b$.text.)
A more useful rule could be:
$a = $b { $.x = $b.x} | $c { $.x = $c.x };

8/
$a = $b [ $c { $.x = $b.x + $c.x; } ];
The expression is only evaluated if $c matches; in that case both $b and $c are defined.

9/
$a = $b ( $c { $.x = $b.x + $c.x; } )<1-> ;
The expression is evaluated for every occurrence of $c.
Note that the final result is $b.x added to $c.x for the last occurrence of $c only,
because every evaluation overwrites the previous result.

10/
$a = $b ( $c { $.x = $b.x + $c.x; } )<0->;
Same effect as 9/, except that now the expression is not evaluated if $c didn't match once.

11/
$digits = {$.ds=""} ( $digit { $.ds = $.ds + $digit; } )<1-> ;
$digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
These rules concatenate the matched digits into a single string.
Note that ds is first initialized to "" because otherwise, in the first evaluation
of the expression, ds would be undefined and cause a run-time error.

12/
$a = $b $c { $$ };
$$ resolves to $c.

13/
$a = $b c { $$ };
$$ resolves to $b.

14/
$a = b c { $$ };
$$ can’t be resolved and causes a run-time error.

15/
$x = $a ($b | $c) {$$};
If $b matches, $$ resolves to $b. If $c matches, $$ resolves to $c.
This is equivalent to $x = $a ($b {$$} | $c {$$}).

16/
$x = $a [$b] {$$};
$$ resolves to $b if $b matches; otherwise it resolves to $a.
The effect is equivalent to $x = $a {$$} [$b {$$}].

17/
$x = $a<1-> {$$};
$$ resolves to the last occurrence of $a.
The effect is equivalent to $x = ($a {$$})<1->.
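Examples 12/ to 17/ suggest that $$ refers to the last rule reference that precedes the tag in the flat parse, and that it is an error if there is none. The sketch below models only that reading, as inferred from the examples. The entry format and the name latestRuleReference are hypothetical, and the function returns the name of the resolved rule rather than its value, to keep the illustration short.

```javascript
// Hypothetical flat-parse entries: { kind: "ref" | "token" | "tag", name?: string }.
// Returns the name of the last rule reference before position `tagIndex`.
function latestRuleReference(flatParse, tagIndex) {
  for (let i = tagIndex - 1; i >= 0; i--) {
    if (flatParse[i].kind === "ref") {
      return flatParse[i].name;
    }
  }
  // Mirrors example 14/: nothing to resolve to.
  throw new Error("$$ cannot be resolved: no rule reference precedes the tag");
}

const R = function (name) { return { kind: "ref", name: name }; };
const T = function (name) { return { kind: "token", name: name }; };
const TAG = { kind: "tag" };

const ex12 = latestRuleReference([R("b"), R("c"), TAG], 2);   // $a = $b $c { $$ }
const ex13 = latestRuleReference([R("b"), T("c"), TAG], 2);   // $a = $b c { $$ }
const ex16with = latestRuleReference([R("a"), R("b"), TAG], 2); // $x = $a [$b] {$$}, $b matched
const ex16without = latestRuleReference([R("a"), TAG], 1);      // $x = $a [$b] {$$}, $b absent
```

For example 14/ ($a = b c { $$ }), a call such as latestRuleReference([T("b"), T("c"), TAG], 2) throws, since only tokens precede the tag.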

7. Using Semantic Interpretation to generate XML results

Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting EMMA, the forthcoming W3C specification for the representation of user input).

If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.

7.1. Serialization of ECMAScript result into an XML fragment

The serialization of the ECMAScript result into an XML fragment consists of the following general transformations:

  1. Each property in the ECMAScript top-level Rule Variable becomes an XML node. The name of the node will be the same as the name of the property.
  2. If the value of the property is a simple scalar type (e.g. string, number) then the CDATA text content of the XML node will be the value of this property.
  3. If the property is of type Object, then each child property of this object becomes a child node, and the contents of these child nodes are in turn processed.
  4. Array child properties become child nodes with name 'item'.
  5. If the value of the property is none of the above types (e.g. it is of type Boolean), then the CDATA text content of the XML node will be the value of this property as if the ToString() operation had been performed on an object of this type (e.g., for Boolean, "true" or "false").
  6. Properties with the name _attributes or _value will be treated in the ways described below.
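Transformations 1 to 5 above can be sketched as a small recursive function. This is a minimal illustration, not a normative algorithm: the names escapeXml, serializeProperty and serializeObject and the vegetarian property are hypothetical, the escaping of XML special characters is an assumption the rules do not discuss, no whitespace padding is emitted, and the special _attributes and _value properties of transformation 6 are not handled.

```javascript
// Escaping of XML special characters is an assumption of this sketch;
// the transformation rules do not discuss it.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Serializes one property (name/value pair) into an XML element string.
function serializeProperty(name, value) {
  let content;
  if (Array.isArray(value)) {
    // Rule 4: array members become child nodes named 'item'.
    content = value.map(function (member) {
      return serializeProperty("item", member);
    }).join("");
  } else if (value !== null && typeof value === "object") {
    // Rule 3: each child property becomes a child node, processed in turn.
    content = serializeObject(value);
  } else {
    // Rules 2 and 5: scalars and other types contribute their string value.
    content = escapeXml(value);
  }
  return "<" + name + ">" + content + "</" + name + ">";
}

// Rule 1: every property of the object becomes an XML node of the same name.
function serializeObject(object) {
  return Object.keys(object).map(function (name) {
    return serializeProperty(name, object[name]);
  }).join("");
}

const xml = serializeObject({
  pizza: { number: "3", vegetarian: false, topping: ["pepperoni", "mushrooms"] }
});
```

With this input, xml holds a single pizza element whose children are number, vegetarian (rendered as "false") and topping, the latter containing one item element per array member.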

Example
Following the above principles, take the example ECMAScript object from section 8:

   order: {
      drink: {
         liquid: "coke",
         drinksize: "medium"
      },
      pizza: {
         number: "3",
         pizzasize: "large",
         topping: [ "pepperoni", "mushrooms" ]
      }
   }

SI processing in an XML environment would generate the following document:

        <order>
          <drink>
            <liquid> coke </liquid> 
            <drinksize> medium </drinksize>
          </drink>
          <pizza>
            <number> 3 </number>
            <pizzasize> large </pizzasize>
            <topping>
              <item> pepperoni </item>
              <item> mushrooms </item>
            </topping>
          </pizza>
        </order>

Issue-5:

Should arrays generate length and indexing information? For example:

        <topping length="2">
           <item index="0"> pepperoni </item>
           <item index="1"> mushrooms </item>
        </topping>

7.2. Use of _attributes and _value

Variables named _attributes and _value can be created and used by the SI author to enable the generation of richer XML results, including the following structures:

The _attributes object is used to hold property name/value pairs which will be rendered as XML attribute nodes of the object which contains _attributes.

The _value variable is used to hold a scalar value for CDATA.

Semantic Interpretation processors treat these objects in the following way:

  1. properties specified in the _attributes object are rendered as attribute nodes of the containing object.
  2. the value of _value is treated as CDATA content of the containing object.

If the properties specified in _attributes or the value of _value are not scalar types, the ToString() operation is performed to generate a string value.

To give an example, the following ECMAScript object:

        {
          martini: {
            gin: {
                _value: "Bombay Sapphire",
                _attributes: {
                   ratio: 8
                }
            },
            vermouth: {
                _value: "Noilly Prat",
                _attributes: {
                   ratio: 1
                }
            },
            _attributes: {
                method: "shaken"
            }
          }
        }

would generate the following XML result:

          ...
              <martini method="shaken">
                <gin ratio="8"> Bombay Sapphire </gin>
                <vermouth ratio="1"> Noilly Prat </vermouth>
              </martini>
          ...
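The treatment of _attributes and _value can be sketched as follows. This is an illustrative function only: the names escapeForXml and serializeNode are hypothetical, character escaping is an assumption not covered by the rules, no whitespace padding is emitted, and arrays are left out to keep the example focused on the two special properties.

```javascript
// Escaping is an assumption of this sketch; String() models the ToString()
// step applied to non-string attribute values and _value contents.
function escapeForXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serializes `value` as an element called `name`.
// Arrays are not handled here.
function serializeNode(name, value) {
  let attributes = "";
  let content = "";
  if (value !== null && typeof value === "object") {
    Object.keys(value).forEach(function (key) {
      if (key === "_attributes") {
        // Properties of _attributes become attributes of the containing node.
        const attrs = value._attributes;
        Object.keys(attrs).forEach(function (attrName) {
          attributes += " " + attrName + '="' + escapeForXml(attrs[attrName]) + '"';
        });
      } else if (key === "_value") {
        // _value becomes the character content of the containing node.
        content += escapeForXml(value._value);
      } else {
        content += serializeNode(key, value[key]);
      }
    });
  } else {
    content = escapeForXml(value);
  }
  return "<" + name + attributes + ">" + content + "</" + name + ">";
}

const martiniXml = serializeNode("martini", {
  gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
  vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
  _attributes: { method: "shaken" }
});
```

Apart from whitespace, martiniXml corresponds to the martini element shown in the XML result above.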

8. Example Grammar with Semantic Interpretation Tags

Example in ABNF Form:

$order = I would like a $drink {$.drink = new Object(); $.drink.liquid = $drink.type;
        $.drink.drinksize = $drink.drinksize}
   and $pizza {$.pizza=$pizza};
// two properties on $order, both are structs
// drink was passed property by property to change a property name
// pizza is passed as whole struct

$kindofdrink = coke | pepsi | "coca cola":"coke";


$foodsize = [ {"medium"} | small | medium | large | regular {"medium"}]; 
// medium is default if nothing said
  
$tops = {$=new Array;} $top {$.push($top)}
  (and $top {$.push($top)})<1-> ;
// construct Array of toppings, return Array
 
$top = anchovies | pepperoni | mushroom:"mushrooms" | mushrooms;
 
$drink = $foodsize $kindofdrink {$.drinksize=$foodsize; $.type=$kindofdrink };
// two named properties (drinksize and type) on left hand side attribute

$pizza = $number $foodsize {$.pizzasize=$foodsize; $.number=$number} pizzas
   with $tops {$.topping=$tops};
// three properties on $pizza’s attribute 


$number = (a | one):"1" | two:"2"| three:"3";

With the above grammar, the following utterance

"I would like a coca cola and three large pizzas with pepperoni and mushrooms."

would create the following struct on $order:

{
  drink: {
    liquid: "coke",
    drinksize: "medium"
  },
  pizza: {
    number: "3",
    pizzasize: "large",
    topping: [ "pepperoni", "mushrooms" ]
  }
}
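To see how this struct comes together, the tags that fire for the example utterance can be replayed by hand in ECMAScript. The sketch below is only an emulation of the tag logic, not a grammar processor: each hypothetical function mirrors one rule of the grammar and returns that rule's value, and the words matched by each rule are passed in by the caller instead of being recognized.

```javascript
// Hand-written emulation of the tags that fire for:
// "I would like a coca cola and three large pizzas with pepperoni and mushrooms"
// Each function returns the rule variable ($) of the rule it mirrors.

function kindofdrink(words) {            // $kindofdrink
  return words === "coca cola" ? "coke" : words;
}
function foodsize(word) {                // $foodsize; undefined means nothing was said
  if (word === undefined || word === "regular") return "medium";
  return word;
}
function top(word) {                     // $top
  return word === "mushroom" ? "mushrooms" : word;
}
function tops(words) {                   // $tops: {$=new Array;} then $.push($top) per topping
  const rv = [];
  words.forEach(function (word) { rv.push(top(word)); });
  return rv;
}
function number(word) {                  // $number
  return { a: "1", one: "1", two: "2", three: "3" }[word];
}
function drink(sizeWord, drinkWords) {   // $drink
  return { drinksize: foodsize(sizeWord), type: kindofdrink(drinkWords) };
}
function pizza(numberWord, sizeWord, toppingWords) { // $pizza
  return {
    pizzasize: foodsize(sizeWord),
    number: number(numberWord),
    topping: tops(toppingWords)
  };
}
function order(drinkRv, pizzaRv) {       // $order
  const rv = {};
  rv.drink = new Object();
  rv.drink.liquid = drinkRv.type;         // property renamed from type to liquid
  rv.drink.drinksize = drinkRv.drinksize;
  rv.pizza = pizzaRv;                     // passed as a whole struct
  return rv;
}

const result = order(
  drink(undefined, "coca cola"),
  pizza("three", "large", ["pepperoni", "mushrooms"])
);
```

The drink size is "medium" because no size was spoken before "coca cola", which is the default case of $foodsize.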


Example in XML Form:

<rule id="order">
  I would like a
  <ruleref uri="#drink"/>
  <tag> $.drink = new Object(); $.drink.liquid=$drink.type; $.drink.drinksize=$drink.drinksize</tag>
  and
  <ruleref uri="#pizza"/>
  <tag> $.pizza=$pizza </tag>
</rule>

<rule id="kindofdrink">
  <one-of>
    <item> coke </item>
    <item> pepsi </item>
    <item tag="coke"> coca cola </item>
    <!-- quote for string constant -->

  </one-of>
</rule>

<rule id="foodsize">
  <tag> $='medium' </tag> <!-- no more need for NULL -->
  <item repeat="0-1">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
      <item tag='medium'>  regular </item>
    </one-of>
  </item>
</rule>

<rule id="tops">
  <tag> $=new Array;</tag>
  <ruleref uri="#top"/> 

  <tag> $.push($top) </tag>
  <item repeat="1-">
    and
    <ruleref uri="#top"/> 

    <tag> $.push($top); </tag>
  </item>
</rule>

<rule id="top">
  <one-of>
    <item> anchovies </item>
    <item> pepperoni </item>
    <item tag='mushrooms'>  mushroom </item>
    <item> mushrooms </item>
  </one-of>
</rule>

<rule id="drink">
  <ruleref uri="#foodsize"/>
  <ruleref uri="#kindofdrink"/> 
  <tag> $.drinksize=$foodsize; $.type=$kindofdrink </tag>
</rule>
 
<rule id="pizza">
  <ruleref uri="#number"/>
  <ruleref uri="#foodsize"/> 
  <tag> $.pizzasize=$foodsize; $.number=$number </tag>
  pizzas with
  <ruleref uri="#tops"/>
  <tag> $.topping=$tops </tag>
</rule>
 
<rule id="number">
  <one-of>
    <item tag="1">

      <one-of> 
        <item> a </item>
        <item> one </item>
      </one-of>
    </item>
    <item tag="2"> two </item>
    <item tag="3"> three </item>
  </one-of>
</rule>

9. Conformance

9.1. Conforming Semantic Interpretation Tags

A Semantic Interpretation Tag (SI Tag) is a conforming SI Tag if its content matches the syntax defined in the normative sections of this document.

There is no normative restriction on the size of an SI Tag.

9.2. Conforming Grammar Fragments and Grammar Documents with SI Tags

A stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment with SI Tags is conforming if:

Informative

The Speech Recognition Grammar Specification provides a tag-format declaration that identifies the format of the contents of the tag element in a speech grammar. The tag-format to reference Semantic Interpretation Tags conforming with the present specification is defined here as "semantics/1.0". Note that this is the default tag-format in the current Speech Recognition Grammar Specification when no explicit tag-format is specified.

It is expected that future revisions of this specification will use higher version numbers.

Other tag-formats can be used with Speech Recognition Grammars; in this case the tag-format must be explicitly declared and must not be "semantics/x.y" (where x and y are any digits).
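A tool that checks tag-format declarations might test for the reserved pattern as sketched below. This is an illustration only: the function names are hypothetical, and the pattern reads "x and y are any digits" literally as one digit each, which is an assumption about the intended meaning.

```javascript
// "semantics/x.y" with x and y read as single digits (an assumption).
const RESERVED_TAG_FORMAT = /^semantics\/[0-9]\.[0-9]$/;

// True if the value has the form reserved for Semantic Interpretation Tags.
function isReservedTagFormat(tagFormat) {
  return RESERVED_TAG_FORMAT.test(tagFormat);
}

// Hypothetical helper: may a grammar that uses some other tag syntax
// declare this tag-format value? It must be declared explicitly and
// must not have the reserved form.
function isAllowedForOtherTagSyntax(tagFormat) {
  return typeof tagFormat === "string" &&
    tagFormat.length > 0 &&
    !isReservedTagFormat(tagFormat);
}
```

For instance, isReservedTagFormat("semantics/1.0") holds, while a made-up value such as "vendor/1.0" is acceptable for a grammar using another tag syntax.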

Issue-6:

The Speech Recognition Grammar Specification doesn't currently reserve "semantics/x.y" for SI tags. It is expected that a future update of SRGS may do so.

9.3. Conforming Semantic Interpretation Processors

A Semantic Interpretation Processor is a program that can parse and process Semantic Interpretation Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor or VoiceXML processor).

A Conforming Semantic Interpretation Processor

Issue-7:

The group is considering adding a "strict" mode in which a semantic processor MUST inform the hosting environment when it encounters a non-conforming SI Tag, even if it can process it.

Informative

We anticipate that a processor may encounter the following non-conforming conditions:

  1. Non-conforming document by developer error (or error in automatic document generation).
  2. Not conforming by use of a proprietary semantic interpretation syntax in the grammar tags.
  3. Not conforming by use of proprietary extensions to SI Tags.

The W3C Voice Browser Working Group has applied to IETF to register MIME types for both the ABNF and XML grammar forms (see Appendix G. Media Types and File Suffix of the Speech Recognition Grammar Specification).

The ABNF MIME type will identify ABNF grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.

Similarly, the XML grammar MIME type will identify XML grammars containing only conforming SI Tags. If the grammar contains tags of any other format then a different MIME type must be used.

A grammar that contains tags in a format other than conforming SI Tags must have an explicit tag format declaration specifying the format (see Speech Recognition Grammar Specification 4.8 Tag Format Declaration). The tag-format for a grammar that contains conforming Semantic Interpretation Tags is "semantics/1.0".

Note: a VoiceXML 2.0 processor will require support for Semantic Interpretation Tags as defined here, but may additionally support other grammar formats or SRGS with other tags (probably identified by another MIME type).

9.4. Conforming ABNF and XML Grammar Processors

An ABNF or XML Grammar Processor is a conforming processor if:

Appendix A. XML Schema and ABNF Syntax

A.1. XML Form: XML Schema

This Schema extends the Schema for XML Form grammars as defined in SRGS as follows:

    <xsd:attributeGroup name="tag.attrib">
        <xsd:annotation>
            <xsd:documentation/>
        </xsd:annotation>
        <xsd:attribute name="tag" type="tag"/>
    </xsd:attributeGroup>
                
    <xsd:attributeGroup name="Token.attribs">
        <xsd:annotation>
            <xsd:documentation/>
        </xsd:annotation>
        <xsd:attribute ref="xml:lang"/>
        <xsd:attributeGroup ref="tag.attrib"/>
    </xsd:attributeGroup>
                
    <xsd:attributeGroup name="One-of.attribs">
        <xsd:annotation>
            <xsd:documentation/>
        </xsd:annotation>
        <xsd:attribute ref="xml:lang"/>
        <xsd:attributeGroup ref="tag.attrib"/>
    </xsd:attributeGroup>
                
    <xsd:attributeGroup name="Item.attribs">
        <xsd:annotation>
            <xsd:documentation/>
        </xsd:annotation>
        <xsd:attributeGroup ref="Repeat-prob.attrib"/>
        <xsd:attributeGroup ref="Repeat.attrib"/>
        <xsd:attributeGroup ref="Weight.attrib"/>
        <xsd:attribute ref="xml:lang"/>
        <xsd:attributeGroup ref="tag.attrib"/>
    </xsd:attributeGroup>

    <xsd:attributeGroup name="Ruleref.attribs">
        <xsd:annotation>
            <xsd:documentation/>
        </xsd:annotation>
        <xsd:attributeGroup ref="Type.attrib"/>
        <xsd:attribute name="uri" type="xsd:anyURI"/>
        <xsd:attributeGroup ref="Special.attrib"/>
        <xsd:attribute ref="xml:lang"/>
        <xsd:attributeGroup ref="tag.attrib"/>
    </xsd:attributeGroup>


    <xsd:complexType name="grammar">
        <xsd:sequence>
            <xsd:choice minOccurs="0" maxOccurs="unbounded">
                <xsd:element name="lexicon" type="lexicon"/>
                <xsd:element name="meta" type="meta"/>
                <xsd:element name="metadata" type="metadata"/>
                <xsd:element name="tag" type="tag"/>
            </xsd:choice>
            <xsd:choice minOccurs="0" maxOccurs="unbounded">
                <xsd:element name="rule" type="rule"/>
            </xsd:choice>
        </xsd:sequence>
        <xsd:attributeGroup ref="Grammar.attribs"/>
    </xsd:complexType>


A.2. ABNF Form: Formal Syntax

This Syntax extends the Formal syntax for ABNF as defined in SRGS as follows:

Lexical Grammar for ABNF with Semantic Interpretation

TagAttachment ::=
    ':"' DoubleStringCharacters? '"'
    | ':'' SingleStringCharacters? '''

--------------------------------------------------------------------
SingleStringCharacters is defined by the ES-262 
production SingleStringCharacters in ES-7.8.4.
DoubleStringCharacters is defined by the ES-262 
production DoubleStringCharacters in ES-7.8.4.
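As an informal illustration of the TagAttachment production, the sketch below recognizes an attachment at the start of an input string using regular expressions. It is a simplification and not a replacement for the lexical grammar: a backslash followed by any character other than a carriage return or line feed is accepted as an escape, escapes are returned undecoded, and the full ES-7.8.4 definitions of escape sequences and line terminators are not reproduced. The name matchTagAttachment and the shape of its result are hypothetical.

```javascript
// ':"' DoubleStringCharacters? '"'  (simplified character classes)
const DOUBLE_ATTACHMENT = /^:"((?:[^"\\\r\n]|\\[^\r\n])*)"/;
// ":'" SingleStringCharacters? "'"  (simplified character classes)
const SINGLE_ATTACHMENT = /^:'((?:[^'\\\r\n]|\\[^\r\n])*)'/;

// Tries to match a TagAttachment at the start of `input`.
// Returns the raw (undecoded) string body and the number of characters
// consumed, or null if the input does not start with a TagAttachment.
function matchTagAttachment(input) {
  const match = DOUBLE_ATTACHMENT.exec(input) || SINGLE_ATTACHMENT.exec(input);
  if (match === null) {
    return null;
  }
  return { raw: match[1], length: match[0].length };
}

const fromDouble = matchTagAttachment(':"coke" | pepsi');
const fromSingle = matchTagAttachment(":'mushrooms';");
```

In the first call the attachment body is coke, and the remainder of the input (starting at the returned length) is left for the rest of the ABNF tokenizer.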

Syntactic Grammar for ABNF with Semantic Interpretation

declaration     ::=
    baseDecl | languageDecl | modeDecl | rootRuleDecl
    | tagFormatDecl | lexiconDecl | metaDecl | tagDecl
                
tagDecl     ::=
    Tag ';'
                
                
attachment      ::=
    LanguageAttachment | TagAttachment 
    | LanguageAttachment TagAttachment
    | TagAttachment LanguageAttachment
                
subexpansion    ::=
    Token attachment?
    | ruleRef attachment?
    | Tag
    | '(' ')'
    | '(' ruleExpansion ')' attachment?
    | '[' ruleExpansion ']' attachment?

Acknowledgments

This document was written with the participation of members of the W3C Voice Browser Working Group. The following have significantly contributed to writing this specification:

References

[ECMA]
ECMA International - Standardizing Information and Communication Systems
http://www.ecma-international.org/
[ECMA-262]
ECMAScript Language Specification, 3rd Edition - December 1999, published by ECMA.
http://www.ecma-international.org/publications/standards/ECMA-262.HTM
[EMMA Requirements]
Requirements for EMMA (Extensible MultiModal Annotation), W3C Multimodal Interaction Activity.
http://www.w3.org/TR/EMMAreqs/
[ES-CP]
ECMA-327 Standard "ECMAScript 3rd Edition Compact Profile", June 2001, published by ECMA.
http://www.ecma-international.org/publications/standards/ECMA-327.HTM
[N-grams]
Stochastic Language Models (N-Gram) Specification, W3C Voice Browser Activity
http://www.w3.org/TR/ngram-spec
[MMI]
W3C Multimodal Interaction Activity
http://www.w3.org/2002/mmi/
[SRGS]
Speech Recognition Grammar Specification for the W3C Speech Interface Framework, W3C Voice Browser Activity
http://www.w3.org/TR/speech-grammar
[Voice]
W3C Voice Browser Activity
http://www.w3.org/Voice/
[VoiceXML]
Voice Extensible Markup Language (VoiceXML) Version 2.0, W3C Voice Browser Activity
http://www.w3.org/TR/voicexml20
