Copyright ©2003-2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars to compute information to return to an application on the basis of rules and tokens that were matched by the speech recognizer. In particular, it defines the syntax and semantics of the contents of Tags in the Speech Recognition Grammar Specification [SRGS].
The results of semantic interpretation describe the meaning of a natural language utterance. The current specification represents this information as an ECMAScript object, and defines a mechanism to serialize the result into XML. The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.
Semantic Interpretation may be useful in combination with other specifications, such as Stochastic Language Models [N-GRAM], but its use with N-grams has not yet been studied.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 11 January 2006 W3C Candidate Recommendation of "Semantic Interpretation for Speech Recognition (SISR) Version 1.0". W3C publishes a technical report as a Candidate Recommendation to indicate that the document is believed to be stable and to encourage implementation by the developer community. Candidate Recommendation status is described in section 7.1.1 of the Process Document. Comments can be sent until 20 February 2006.
Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced as part of the Voice Browser Activity (activity statement), following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).
This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is for public review, and comments and discussion are welcomed on the (archived) public mailing list <www-voice@w3.org>.
This document is based upon the Semantic Interpretation for Speech Recognition (SISR) Version 1.0 Last Call Working Draft of 8 November 2004 and feedback received during the review period (see the Disposition of Comments document). The Voice Browser Working Group (member-only link) believes that this specification addresses its requirements and all Last Call issues.
The entrance criteria to the Proposed Recommendation phase require at least two independently developed interoperable implementations of each required feature, and at least one or two implementations of each optional feature depending on whether the feature's conformance requirements have an impact on interoperability. The Voice Browser Working Group considers the optional feature specified in Section 7 to be a "feature at risk" since it will be removed if no implementation of it is reported to the group. Detailed implementation requirements and the invitation for participation in the Implementation Report are provided in the Implementation Report Plan. We expect to meet all requirements of that report within the Candidate Recommendation period closing 20 February 2006.
This section is informative.
Grammar Processors, and in particular speech recognizers, use a grammar that defines the words and sequences of words that make up the input language they can accept. The major task of a grammar processor consists of finding the sequence of words described by the grammar that (best) matches a given utterance, or reporting that no such sequence exists.
In an application, knowing the sequence of words that were uttered is sometimes interesting but often not the most practical way of handling the information that is present in the user utterance. What is needed is a computer processable representation of the information, the Semantic Result, more than a natural language transcript. The process of producing a Semantic Result representing the meaning of a natural language utterance is called Semantic Interpretation (SI).
The Semantic Interpretation process described in this specification uses Semantic Interpretation Tags (SI Tags) (see section 3.2) to provide a means to attach instructions for the computation of such semantic results to a speech recognition grammar. When used with a [VOICEXML20] Processor, it is expected that a Semantic Interpretation Grammar Processor will convert the result generated by an [SRGS] speech grammar processor into an ECMAScript object that can then be processed as specified in section 3.1.6 Mapping Semantic Interpretation Results to VoiceXML Forms in [VOICEXML20].
The W3C Multimodal Interaction Activity [MMI] is defining an XML data format [EMMA] for containing and annotating the information in user utterances. It is expected that the EMMA language will be able to integrate results generated by Semantic Interpretation for Speech Recognition.
This document defines the syntax and the semantics of Semantic Interpretation Tags for use with the Speech Recognition Grammar Specification [SRGS]. It is possible that Semantic Interpretation Tags as defined here can also be used with Stochastic Language Models [N-GRAM], but the current specification does not specifically address such use and does not guarantee that the Semantic Interpretation Tags as defined here meet the needs of such use.
The basic principles for the Semantic Interpretation mechanism defined in this specification are the following:
This specification uses the ECMAScript Compact Profile [ECMA-327], which is a strict subset of [ECMA-262]. [ECMA-327] has been designed to meet the needs of resource-constrained environments. Special attention has been paid to constraining ECMAScript features that require proportionately large amounts of system memory, and continuous or proportionately large amounts of processing power. In particular, it is designed to facilitate prior compilation for execution in a lightweight environment. This makes it attractive for use in association with speech grammar rules for extracting semantic results from speech recognition.
In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119]. Requirement levels for conforming Semantic Interpretation for Speech Recognition implementations are defined in Appendix A.
The sections in the main body of this document are normative unless otherwise specified. The appendices and examples in this document are informative unless otherwise indicated explicitly.
This specification normatively references [ECMA-327], which in turn references [ECMA-262]. The notation ES n is used in this document as shorthand for section number n in [ECMA-262].
SI Tags compute semantic values. During the semantic interpretation process, these values can be assigned to variables that are associated with the rules in the grammar. These variables are known as Rule Variables.
Every grammar rule has a single Rule Variable that holds a semantic value. The Rule Variable is typically assigned its value by the SI Tags within its grammar rule. SI Tags also have access to the Rule Variables of any other rules referenced by the current grammar rule and already processed up to that point in the utterance (according to the visibility constraints defined in section 6). The Rule Variables of other rules are referenced by the name of their grammar rule, as described in section 3.3.2.
Rule Variables can hold semantic values of any type defined in [ECMA-327]. They are not explicitly typed. Rule Variables that have not been assigned a value are not defined. SI authors will typically use scalar types, e.g. string or numeric values, in lower level rules and more structured objects in higher level rules (particularly root rules).
In addition to semantic values, certain other values corresponding to Rule Variables are available during SI processing.
For every Rule Variable there is an associated variable named text, of type String, which holds the substring (the series of tokens) in the utterance that is governed by the corresponding grammar rule. Text variables are not part of the Rule Variable (see section 3.3.3) and the value of the text variables cannot be modified.
Likewise, for every Rule Variable, there is an associated variable called score, of type Number, which holds a value that is related to the confidence or probability of the corresponding grammar rule or some similar measure. Higher score values indicate higher confidence or probability over the corresponding grammar rule. Processors that don't compute or don't have access to such values must return undefined as the score value. Score variables are not part of the Rule Variable and the value of the score variables cannot be modified.
For every Rule Variable there are two associated variables named starttime and endtime, of type Number, which hold the starting time and ending time of the utterance that is governed by the corresponding grammar rule. The value must be an absolute timestamp in terms of the number of milliseconds since 1 January 1970 00:00:00 GMT, or otherwise must be undefined if the SI processor either does not have time information for the input or cannot process time information. For any given Rule Variable, if both starttime and endtime values are not undefined, then the starttime value must not be greater than the endtime value.
For any given Rule Variable, both the starttime value and the endtime value related to the corresponding grammar rule must not be smaller than the starttime value related to the referencing grammar rule. Similarly, for any given Rule Variable, both the starttime and the endtime value related to the corresponding grammar rule must not be greater than the endtime value related to the referencing grammar rule. Undefined values for starttime or endtime cannot impose constraints on referenced rules and are not subject to constraints from referencing rules.
These variables are not part of the Rule Variable and their values cannot be modified.
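The timing constraints above can be checked mechanically. The following is a minimal sketch in plain ECMAScript and is not part of this specification; the function name checkTimes and the { starttime, endtime } record shape are assumptions made purely for illustration.

```javascript
// Hypothetical helper (not defined by this specification): checks the
// starttime/endtime constraints between a referenced rule and the rule
// that references it. Each argument is a plain record { starttime, endtime }
// whose fields are Numbers (milliseconds since 1 January 1970 00:00:00 GMT)
// or undefined.
function checkTimes(referenced, referencing) {
  var s = referenced.starttime;
  var e = referenced.endtime;

  // Within one rule: starttime must not be greater than endtime
  // when both values are defined.
  if (s !== undefined && e !== undefined && s > e) {
    return false;
  }

  // Against the referencing rule: undefined values impose no constraint
  // and are not subject to any.
  var ps = referencing.starttime;
  var pe = referencing.endtime;
  var values = [s, e];
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (v === undefined) {
      continue;
    }
    if (ps !== undefined && v < ps) {
      return false;
    }
    if (pe !== undefined && v > pe) {
      return false;
    }
  }
  return true;
}
```

For instance, a referenced rule spanning 1000 to 1500 inside a referencing rule spanning 900 to 2000 satisfies the constraints, whereas one starting at 800 does not.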
The semantic result for an utterance is the value of the Rule Variable of the root rule when all semantic interpretation evaluations have been completed. For certain result formats (e.g. [EMMA]), this value is serialized into an XML document according to the description in section 7. It is outside the scope of this specification to define how the semantic result is communicated to the application.
This section is informative.
In the context of the W3C Voice Browser architecture, the semantic result will be directly cast into ECMAScript variables in the VoiceXML interpreter (see section 3.1.6 in [VOICEXML20]). In the W3C Multimodal Interaction Framework [MMI-FRAMEWORK], the semantic result is expected to be transformed into EMMA following the mechanism described in section 7. In other contexts, the mechanism described in section 7 can be used to transform the semantic result into other XML formats.
Score values are highly dependent on the processor's implementation. In most implementations using speech recognition, scores are likely to be dependent on factors such as audio channel quality, grammar contents, grammar weights, language, individual speaker characteristics, and others. Scores for a particular word or phrase within a grammar are typically comparable over instances of the same word or phrase over time. Scores for different words in a single grammar are also typically comparable to one another. Scores across grammars, or scores for words and word sequences, or scores between different processors, are very often not comparable. It is anticipated that scores will be useful only for annotating the results, not for influencing the results during SI processing. Note that an SI processor doesn't require a speech recognizer, and thus that the score does not even have to be related to speech recognition.
The starttime and endtime variables may be useful in a variety of application contexts, for instance temporal annotations in a multimodal application which integrates semantic results from different modalities (e.g. events from speech and gesture modalities). Note that the constraints on starttime and endtime values for referenced rules do not imply that starttime and endtime values for sequentially referenced rules cannot overlap, and do not exclude that there may be gaps between the endtime time of a first referenced rule and the starttime time of the next referenced rule. The starttime and endtime values may be dependent on the processor's implementation, and the accuracy of the values may be dependent on the speech signal quality or other factors. This specification does not define accuracy requirements that a processor should meet.
Semantic Interpretation Tags are added in the string content of the tag elements in the grammar rule expansion, as described in section 2.6 of [SRGS]. This specification further uses the term Semantic Interpretation Tag (or SI Tag) to refer to such a tag.
This specification defines two different Semantic Interpretation tag syntaxes. The two possible values of the tag-format declaration in the grammar define which of the two syntaxes is being used. The different syntaxes only change the processing of tags during Semantic Interpretation; in all other respects the grammar behaves identically.
The "Script" tag syntax, enabled by setting the tag-format to semantics/1.0, defines the contents of tags to be ECMAScript. Each tag is a valid [ECMA-327] program. Section 3.2.2 describes the processing of this tag syntax in more detail.
The "String Literal" tag syntax, enabled by setting the tag-format to semantics/1.0-literals, defines the contents of tags to be strings. This syntax does not have the expressive power of a full scripting language, but does provide a way to produce semantic results consisting of simple strings. Section 3.2.3 describes this tag syntax in more detail.
Within one grammar, it is not possible to mix the two tag syntaxes. All tags in one grammar must have the same tag-format. However, it is possible for externally referenced grammars to have a different tag-format from that of the parent grammar that references them.
Below are two example formats of SI Tags in the Speech Recognition Grammar Specification [SRGS] (tag-content represents the content of the tag which can be either ECMAScript code or a String Literal).
In the XML grammar format, SI Tags are specified as the content of the <tag> element:
<tag> tag-content </tag>
In the ABNF grammar format, SI Tags are enclosed in curly braces or in the three-character sequences '{!{' and '}!}':
{ tag-content }
{!{ tag-content }!}
A Semantic Interpretation Script (SI Script) holds a string that is treated as the source text of a valid [ECMA-327] Program ("Program" is defined by ES 14).
The environment in which SI Tags are embedded may introduce escaped characters, character references, or other markup that has to be resolved by the environment. The result after resolution is treated as ECMAScript code.
It is illegal to make an assignment to a variable that has not been previously declared (either implicitly as is the case for Rule Variables or explicitly by using a var statement). Attempting to assign to an undeclared variable will result in a runtime error.
A tag using the String Literal tag syntax has content that is a sequence of zero or more characters. If the character sequence is not empty, it has to follow either the DoubleStringCharacters or the SingleStringCharacters production of ES 7.8.4.
During processing, a tag with a String Literal has the same effect as a script that assigns the content of the tag, as a string literal, to the Rule Variable of the rule the tag is in.
This section is informative.
If multiple tags are present in the rule expansion, the Rule Variable is set to the value of the last tag in the expansion. Prior tags are overwritten by the final tag.
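As an illustration of this overwriting behavior, the sketch below models one rule expansion as an array of String Literal tag contents in matching order. The function name literalResult and the array representation are assumptions made for this example only; a real processor works on the parse rather than on a plain array, and the default assignment that applies when no tag is present is not modeled here.

```javascript
// Hypothetical model (not defined by this specification) of String Literal
// tag processing within one rule expansion. Each tag behaves like a script
// that assigns its content, as a string, to the Rule Variable.
function literalResult(tagContents) {
  var out; // remains undefined when the expansion contains no tags
  for (var i = 0; i < tagContents.length; i++) {
    // Each later tag overwrites the value set by the previous one.
    out = String(tagContents[i]);
  }
  return out;
}
```

With this model, an expansion carrying the tags "yes" and then "no" yields "no" as the value of the Rule Variable.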
A grammar using the Script tag syntax can reference rules of a grammar using the String Literal tag syntax. The value of the string literal can be obtained by the parent rule using the Rule Variable of the referenced rule. The recognized text of the referenced rule is also available in the meta.latest().text and meta.rulename.text variables (where rulename is the name of the rule).
A grammar using the String Literal tag syntax can reference rules in other grammars (which can be using either the Script tag syntax or the String Literal tag syntax). See section 5 for the way semantic results from a referenced grammar can be used in a grammar with String Literal tag syntax.
Authors should take care to set the tag-format correctly. Using the String Literal tag syntax when the tag-format is set to semantics/1.0 will generally result in a runtime error. However, the converse (using the Script tag syntax when the tag-format is set to semantics/1.0-literals) will not produce a runtime error but rather result in erroneously populating Rule Variables with ECMAScript code.
Examples of equivalent grammars, one using the Script tag syntax and the other using the String Literal tag syntax, are given below for both the XML Form and ABNF Form.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>yes</tag></item>
      <item><token>you bet</token><tag>yes</tag></item>
      <item xml:lang="fr-CA">oui<tag>yes</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>no</tag>
  </rule>
</grammar>
The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah<tag>out="yes";</tag></item>
      <item><token>you bet</token><tag>out="yes";</tag></item>
      <item xml:lang="fr-CA">oui<tag>out="yes";</tag></item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
      <item>no way</item>
    </one-of>
    <tag>out="no";</tag>
  </rule>
</grammar>
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {yes} | "you bet" {!{yes}!} | "oui"!fr-CA {yes};
$no = (no | nope | no way) {no};
The grammar above with the String Literal tag syntax is equivalent to the grammar below with the Script tag syntax:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $answer;

public $answer = $yes | $no;
$yes = yes | yeah {out="yes";} | "you bet" {!{out="yes";}!} | "oui"!fr-CA {out="yes";};
$no = (no | nope | no way) {out="no";};
An SI Script can access Rule Variables using the syntax defined in this section. This syntax applies only to documents for which the SI Tags hold SI Scripts (and not to documents where SI Tags use the String Literal tag syntax).
Every grammar rule has a single Rule Variable that holds an [ECMA-327] value. This Rule Variable can both be evaluated and assigned to.
The Rule Variable is identified by out.
Properties of the Rule Variable can be individually accessed by out.identifier, where identifier is the name of the property.
out        (identifies the Rule Variable)
out.pizza  (identifies the pizza property of the Rule Variable)
This section is informative.
The Semantic Interpretation Script typically assigns a value to the Rule Variable of its embedding grammar rule. The Rule Variable is initialized to an empty Object before the first tag in the grammar rule is executed (see section 6.3). The SI author will usually either add properties to this Object or alternatively discard it by assigning a primitive value (e.g. String or Number) to the Rule Variable. Since the Rule Variable is initialized before the tag is executed, a var statement is not required prior to assigning to it.
As a consequence of normal ECMAScript behavior, the SI author is free to override the Rule Variable type as well as value within the bounds of legal ECMAScript. Note that [ECMA-327] enforces rules that affect Semantic Interpretation Scripts. For example, [ECMA-327] reserved words cannot be used as a property. Thus, out.for is illegal because it uses the [ECMA-327] reserved word for.
// An Object with property name prop
out.prop = "my property";

// A String with value "my value"
out = "my value";

// A String with value "my value"
out.prop = "my property";
out = "my value";

// A String with value "my value"
out = "my value";
out.prop = "my property";

// A String with value "ab"
out.prop1 = "a";
out.prop2 = "b";
out = out.prop1 + out.prop2;

// An Object with property name prop
out = "my value";
out = new Object();
out.prop = "my property";
SI Scripts can access the Rule Variable associated with grammar rules referenced in SI Tags that appear after (to the right or below) the rule reference in the grammar expansion, and only if the referenced rule was used in the expansion that matched the input utterance. See visibility rules in section 6 for a more detailed description of when Rule Variables associated to rule references can be referenced in SI Tags, using the concept of the logical parse structure and the flat parse list.
Rule Variables associated to referenced rules can both be evaluated and assigned to. Every SI Script has access to a rules object that has a property holding the Rule Variable value for every visible rule. The Rule Variable associated to a rule reference is identified by rules.rulename, where rulename is the rulename of the rule, as defined in Section 3.1 Basic Rule Definition in [SRGS]. Individual properties of a Rule Variable can be identified by rules.rulename.identifier, where rulename is the name of the rule and identifier is the name of the property.
The Rule Variable for the latest rule reference that was used in the expansion matching the utterance up to the position of the SI Tag can also be referenced through rules.latest().
In an expression, both the Rule Variables of the current grammar rule and the referenced rules can be evaluated and assigned to.
Special rules (NULL, VOID, GARBAGE) cannot be evaluated.
This section is informative.
The rules.rulename notation (where rulename is the name of a referenced rule) can be used only for explicit local rule references and for explicit references to a named rule of a grammar, not for implicit rule references (see Section 2.2 Rule Reference in [SRGS] for a definition of explicit and implicit rule references). To refer to the Rule Variable for a rule that is referenced by an implicit reference to the root rule of a grammar, the rules.latest() notation can be used.
// The Rule Variable associated to the referenced rule "rulename"
rules.rulename

// The property "prop" of the Rule Variable associated with the referenced
// rule "rulename"
rules.rulename.prop

// The Rule Variable associated to the latest matching rule reference before
// the SI Tag
rules.latest()

// The property "prop" of Rule Variable associated to latest matching rule
// reference before the SI Tag
rules.latest().prop
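To make the relationship between rules.rulename and rules.latest() concrete, the following sketch models the rules object in plain ECMAScript. It is an illustration only and not how any particular processor is implemented: makeRules and record are hypothetical names, record stands in for the processor making a Rule Variable visible after a rule reference has matched, and returning undefined from latest() before any rule reference has matched is an assumption of this sketch.

```javascript
// Hypothetical model (not defined by this specification) of the `rules`
// object that SI Scripts see.
function makeRules() {
  var latestName; // name of the most recently matched rule reference

  var rules = {
    // Returns the Rule Variable of the latest matched rule reference.
    latest: function () {
      return latestName === undefined ? undefined : rules[latestName];
    }
  };

  // Stands in for the processor: called once a referenced rule has matched
  // and its Rule Variable becomes visible.
  function record(rulename, value) {
    rules[rulename] = value;
    latestName = rulename;
  }

  return { rules: rules, record: record };
}

// Usage: two rule references matched in sequence.
var model = makeRules();
model.record("USairport", "ORD");
var departure = model.rules.latest();  // value of rules.USairport at this point
model.record("otherairport", "BRU");
var arrival = model.rules.latest();    // value of rules.otherairport at this point
```

Reading rules.latest() immediately after each rule reference, as done here, is the pattern that lets a script capture two references in sequence even when they are implicit root references.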
Section 6 describes the visibility rules for accessing Rule Variables. If according to these rules a Rule Variable is not visible, one can still evaluate or declare and assign to the variable with that name (it is just a property on the rules object). The value assigned to a property of the rules object that has the name of a Rule Variable will be overwritten when that Rule Variable is visible according to section 6. This behavior can be used to "initialize" Rule Variables to handle cases where a referenced rule may not actually be matched depending on the input to the grammar.
In the following grammar, by declaring and assigning rules.foodsize a default value, the value for the drink rule will always be:
{ drinksize: "medium", type: "coke" }
regardless of whether the input is 'coke' or 'medium coke':
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="drink">
  <rule id="drink">
    <!-- Note: rules object always exists in scope -->
    <tag>rules.foodsize="medium"; </tag>
    <item repeat="0-1">
      <ruleref uri="#foodsize"/>
    </item>
    <ruleref uri="#kindofdrink"/>
    <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag>
  </rule>
  <rule id="foodsize">
    <one-of>
      <item>small</item>
      <item>medium</item>
      <item>large</item>
    </one-of>
  </rule>
  <rule id="kindofdrink">
    <one-of>
      <item>coke</item>
      <item>pepsi</item>
    </one-of>
  </rule>
</grammar>
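The effect of the default assignment in the drink rule can be traced with a small simulation. The sketch below is a hand-written model of the evaluation order for that one rule, not a general SI processor; the function name evaluateDrink and its two parameters are assumptions made for illustration.

```javascript
// Hypothetical simulation (not defined by this specification) of the tags in
// the "drink" rule. `sizeWord` is the token matched by the optional foodsize
// reference, or undefined when that optional item was skipped; `drinkWord`
// is the token matched by the kindofdrink reference.
function evaluateDrink(sizeWord, drinkWord) {
  var rules = {};
  var out = {}; // the Rule Variable starts as an empty Object

  // First tag: assign a default before any rule reference is processed.
  rules.foodsize = "medium";

  // Optional reference to foodsize: when it matches, the processor
  // overwrites the default. The foodsize rule has no tags, so its Rule
  // Variable is simply its matched text.
  if (sizeWord !== undefined) {
    rules.foodsize = sizeWord;
  }

  // Reference to kindofdrink: always matched; its value is its matched text.
  rules.kindofdrink = drinkWord;

  // Final tag of the drink rule.
  out.drinksize = rules.foodsize;
  out.type = rules.kindofdrink;
  return out;
}
```

Calling evaluateDrink(undefined, "coke") and evaluateDrink("medium", "coke") both produce an object whose drinksize is "medium" and whose type is "coke", which mirrors the behavior described for the inputs 'coke' and 'medium coke'.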
A Rule Variable's text variable is identified by meta.rulename.text, where rulename is the name of the Rule Variable. The text variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().text. The text variable associated to the current grammar rule is identified by meta.current().text. The text variable of the current grammar rule is read-only.
A Rule Variable's score variable is identified by meta.rulename.score, where rulename is the name of the Rule Variable. The score variable of the Rule Variable referred to by rules.latest() is identified by meta.latest().score. The score variable associated to the current grammar rule is identified by meta.current().score. The score variable of the current grammar rule is read-only.
A Rule Variable's starttime and endtime variables are identified by meta.rulename.starttime and meta.rulename.endtime, where rulename is the name of the Rule Variable. The starttime and endtime variables of the Rule Variable referred to by rules.latest() are identified by meta.latest().starttime and meta.latest().endtime. The starttime and endtime variables associated to the current grammar rule are identified by meta.current().starttime and meta.current().endtime. The starttime and endtime variables of the current grammar rule are read-only.
This section is informative.
Since the text, score, starttime, and endtime variables of the current grammar rule are read-only, they behave as read-only properties as defined in [ECMA-327]. As a consequence, attempts to assign to the text, score, starttime or endtime variable associated to the Rule Variable of the current grammar rule will be ignored. Note, however, that the text, score, starttime, and endtime properties of a referenced rule (i.e. those properties of meta.rulename, where rulename is the referenced rule, or of meta.latest()) are not read-only.
// The text variable of the Rule Variable called "rulename"
meta.rulename.text

// The text variable of the Rule Variable referred to by rules.latest()
meta.latest().text

// The text (read-only) variable of the current grammar rule
meta.current().text
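The difference between the read-only variables of the current rule and the writable ones of a referenced rule can be imitated in a general-purpose ECMAScript engine. The sketch below is an assumption-laden illustration: makeMetaEntry and tryAssignText are hypothetical names, and Object.defineProperty comes from later ECMAScript editions and is not available in [ECMA-327]. The try/catch is there because engines running in strict mode throw on such an assignment instead of ignoring it.

```javascript
// Hypothetical model (not defined by this specification) of one entry of the
// `meta` object. When readOnly is true the entry behaves like
// meta.current(); otherwise like meta.rulename or meta.latest().
function makeMetaEntry(text, readOnly) {
  var entry = {};
  Object.defineProperty(entry, "text", {
    value: text,
    writable: !readOnly,
    enumerable: true
  });
  return entry;
}

// Attempts an assignment and reports the value that is visible afterwards.
function tryAssignText(entry, value) {
  try {
    entry.text = value;
  } catch (e) {
    // Strict-mode engines throw a TypeError here; this sketch treats that
    // the same as the assignment being ignored.
  }
  return entry.text;
}
```

In this model, tryAssignText(makeMetaEntry("coke", true), "pepsi") still reports "coke", while the same call on a writable entry reports "pepsi".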
semantics/1.0 or semantics/1.0-literals
<tag> element to the grammar header for the purpose of setting global variables
The header of an [SRGS] grammar may contain one or more global SI Tags. In grammars using the Script tag syntax, these tags are executed before any of the SI Tags in the matching grammar rules are evaluated. There are no ordering constraints between SI Tags and other valid SRGS grammar header items (see section 4.1 of [SRGS]). Global tags are ignored in grammars using the String Literal tag syntax.
The global SI Tags are evaluated only once, in a global scope that will be shared by all evaluations (see section 6.3).
Whereas all evaluations for SI Tags in flat parse lists for matching rules have access to the global scope for reading only, the SI Tags in the grammar header have write access to the global scope. This is the primary function of these tags: to initialize the global scope for use in the SI Tags.
In the XML Form, global SI Tags are SI Tags that appear outside all rules in the grammar header and before the first rule.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0" root="rule">
  <tag>var x=1;</tag>
  <tag>var y='abcd';</tag>
  <rule id="rule">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
In the ABNF Form, global SI Tags are SI Tags followed by a semicolon, that appear outside all rules in the grammar header and before the first rule. Both tag delimiting syntaxes are illustrated in the example.
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $rule;

{var x=1;};
{!{var y='abcd';}!};

$rule = yes | no;
For a given parse, if there is no SI Tag attached to the expansion in the grammar rule that is used to match the utterance, then the value for the out Rule Variable is determined as follows. If there are no rule references in the parse, the value for the text meta variable (meta.current().text) is automatically copied into the Rule Variable (which then becomes of type String). Otherwise, the value of the Rule Variable of the last rule reference in the parse (rules.latest()) is automatically copied into the Rule Variable.
For the following rule, rules.drink is either "coke", "pepsi" or "coca cola". Similarly for meta.drink.text.
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola</item>
  </one-of>
</rule>
For the following rule, there is a String Literal tag associated with "coca cola" and hence rules.drink is either "coke" or "pepsi". However, meta.drink.text is either "coke", "coca cola", or "pepsi".
<rule id="drink">
  <one-of>
    <item>coke</item>
    <item>pepsi</item>
    <item>coca cola<tag>coke</tag></item>
  </one-of>
</rule>
For the following grammar, the utterance "I want to fly to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight">
  <rule id="flight" scope="public">
    I want to fly to
    <ruleref uri="#airports"/>
  </rule>
  <rule id="airports" scope="private">
    <one-of>
      <item><ruleref uri="#USairport"/></item>
      <item><ruleref uri="#otherairport"/></item>
    </one-of>
  </rule>
  <rule id="USairport" scope="private">
    <one-of>
      <item>Boston<tag>BOS</tag></item>
      <item>New York<tag>JFK</tag></item>
      <item>Chicago<tag>ORD</tag></item>
    </one-of>
  </rule>
  <rule id="otherairport" scope="private">
    <one-of>
      <item>Brussels<tag>BRU</tag></item>
      <item>Paris<tag>CDG</tag></item>
      <item>Rome<tag>FCO</tag></item>
    </one-of>
  </rule>
</grammar>
Note that the default assignment has been designed to handle the simplest but most frequent cases only. It cannot cope with combining information from different rule references. For example, the grammar below would return the information about the last airport only, not about both airports. For the following grammar, the utterance "I want to fly from Chicago to Boston" will return the result "BOS".
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0-literals" root="flight"> <rule id="flight" scope="public"> I want to fly from <one-of> <item><ruleref uri="#USairport"/></item> <item><ruleref uri="#otherairport"/></item> </one-of> to <one-of> <item><ruleref uri="#USairport"/></item> <item><ruleref uri="#otherairport"/></item> </one-of> </rule> <rule id="USairport" scope="private"> <one-of> <item>Boston<tag>BOS</tag></item> <item>New York<tag>JFK</tag></item> <item>Chicago<tag>ORD</tag></item> </one-of> </rule> <rule id="otherairport" scope="private"> <one-of> <item>Brussels<tag>BRU</tag></item> <item>Paris<tag>CDG</tag></item> <item>Rome<tag>FCO</tag></item> </one-of> </rule> </grammar>
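The default assignment described above can be sketched in plain ECMAScript. This is only an illustration: the function defaultRuleValue and the shape of its argument are hypothetical, and it assumes that a String Literal tag behaves like an assignment to out (so the last literal in a rule wins) and that the text is copied only when the rule contains no rule references.

```javascript
// Hypothetical summary of one rule application in a parse.
// text:          the tokens matched by the rule, joined by spaces
// literalTags:   String Literal tags met in the rule, in order
// ruleRefValues: Rule Variables of the rule references, in order
function defaultRuleValue({ text, literalTags = [], ruleRefValues = [] }) {
  if (literalTags.length > 0) {
    // Assumption: each literal behaves like out="literal", so the last one wins.
    return literalTags[literalTags.length - 1];
  }
  if (ruleRefValues.length === 0) {
    return text; // copy of meta.current().text
  }
  return ruleRefValues[ruleRefValues.length - 1]; // copy of rules.latest()
}

// "I want to fly from Chicago to Boston"
const departure = defaultRuleValue({ text: "Chicago", literalTags: ["ORD"] });
const arrival = defaultRuleValue({ text: "Boston", literalTags: ["BOS"] });
const flight = defaultRuleValue({
  text: "I want to fly from Chicago to Boston",
  ruleRefValues: [departure, arrival],
});
console.log(flight); // prints BOS
```

Only the value of the last rule reference survives, which is why the departure airport is lost.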
In order to make this grammar return both airports, one would have to use the Script tag syntax, as shown below. This functionality cannot be achieved by relying only on literal tags and default assignments.
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0" root="flight"> <rule id="flight" scope="public"> I want to fly from <one-of> <item> <ruleref uri="http://www.example.com/places.grxml"/> </item> <item> <ruleref uri="http://www.example.com/places.grxml#otherairport"/> </item> </one-of> <tag>out.departure = rules.latest();</tag> to <one-of> <item> <ruleref uri="http://www.example.com/places.grxml"/> </item> <item> <ruleref uri="http://www.example.com/places.grxml#otherairport"/> </item> </one-of> <tag>out.arrival = rules.latest();</tag> </rule> </grammar>
Grammar http://www.example.com/places.grxml:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0-literals" root="USairport"> <rule id="USairport" scope="public"> <one-of> <item>Boston<tag>BOS</tag></item> <item>New York<tag>JFK</tag></item> <item>Chicago<tag>ORD</tag></item> </one-of> </rule> <rule id="otherairport" scope="public"> <one-of> <item>Brussels<tag>BRU</tag></item> <item>Paris<tag>CDG</tag></item> <item>Rome<tag>FCO</tag></item> </one-of> </rule> </grammar>
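The effect of the two Script tags in the referencing grammar can be traced by hand. The following is a minimal sketch, not processor code: airportValues and latestValue are hypothetical names that stand in for the results of the two airport references for the utterance "I want to fly from Chicago to Boston".

```javascript
// Stand-in for the Rule Variables produced by places.grxml, in utterance order.
const airportValues = ["ORD", "BOS"]; // "Chicago", then "Boston"

let latestValue; // what rules.latest() currently returns
const rules = { latest: () => latestValue };
const out = {};

latestValue = airportValues[0]; // the first airport reference has completed
out.departure = rules.latest(); // <tag>out.departure = rules.latest();</tag>

latestValue = airportValues[1]; // the second airport reference has completed
out.arrival = rules.latest();   // <tag>out.arrival = rules.latest();</tag>

console.log(JSON.stringify(out)); // prints {"departure":"ORD","arrival":"BOS"}
```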
This section defines the visibility rules and order of tag evaluation for SI Tags used in the Speech Recognition Grammar Format (ABNF and XML Form). When SI Tags are embedded in other markup languages (e.g. in [N-GRAM]), the visibility rules and order of evaluation may be defined differently.
After the initialization of the global scope (see section 6.3), the visibility rules and the order of evaluation of semantic interpretation tags are defined in terms of the logical parse structure as defined in Appendix H Logical Parse Structure in [SRGS] .
Note that while this appendix is informative for the Speech Recognition Grammar Specification, it is normative for the Semantic Interpretation specification. This does not imply that grammar processors must implement a logical parse structure, nor that ambiguities or recursion should be handled in any specific way over what is required for a conformant speech recognition grammar processor. The Logical Parse Structure is only a means to illustrate the order of evaluation and visibility rules for SI Tags. Implementations are not required to expose the logical structure and may use different internal representation as long as these yield the results described here.
The Logical Parse Structure is a formal syntax for describing the sequence and relation of tags and rule references to the tokens that are input to the grammar processor.
The Logical Parse output is represented as an array of output entities e1 to en, e.g. [e1, e2, e3].
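Output entities can be one of three kinds:
- tokens matched in the input (e.g. turn);
- tags, written in curly braces (e.g. {out="airco";});
- rule references, written as $rulename followed by an array of output entities (e.g. $object [the, heating, {out="airco";}]).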
Appendix H in [SRGS] contains a full description of how to create the logical parse on a grammar for a given input to a grammar processor.
For the purpose of building the logical parse, all String Literals are assumed to be converted into the equivalent SI Script as defined in section 3.2.3.
The sentence "turn the heating off" on the following XML Form grammar
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0" root="command"> <rule id="command"> <one-of> <item>set</item> <item>turn</item> </one-of> <ruleref uri="#object"/> <ruleref uri="#state"/> <tag>out.o=rules.object; out.s=rules.state;</tag> </rule> <rule id="object"> <item repeat="0-1">the</item> <one-of> <item> <one-of> <item>heating</item> <item>cooling</item> </one-of> <tag>out="airco";</tag> </item> <item>radio<tag>out="radio";</tag></item> <item>lights<tag>out="lights";</tag></item> </one-of> </rule> <rule id="state"> <one-of> <item>to</item> <item><ruleref special="NULL"/></item> </one-of> <one-of> <item>on<tag>out="1";</tag></item> <item>off<tag>out="0";</tag></item> <item>warm<tag>out="w";</tag></item> <item>cool<tag>out="c";</tag></item> <item>cold<tag>out="c";</tag></item> </one-of> </rule> </grammar>
or equivalent ABNF Form grammar
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0>;
root $command;
$command = (set | turn) $object $state {out.o=rules.object; out.s=rules.state;};
$object = [the] ((heating | cooling){out="airco";} | radio{out="radio";} | lights{out="lights";});
$state = (to|$NULL) (on{out="1";} | off{out="0";} | warm{out="w";} | cool{out="c";} | cold{out="c";});
will result in the logical parse
[$command [turn, $object [the, heating, {out="airco";}], $state [off, {out="0";}], {out.o=rules.object; out.s=rules.state;}] ]
The logical parse structure is a tree-like structure that shows all terminals, tags and rule references governed by a given rule. This tree can also be represented in a flattened list of parses, with one parse for every grammar rule application.
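The flat parse for a given rule application is represented as the rule name with its sequence number, followed by a colon and the comma-separated output entities of that rule application:
$rulename(n): e1, e2, e3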
The output entities are as in the logical parse structure, except that rule references are represented without an array of output entities but followed by a sequence number in parentheses.
The equivalent flat parse list for the above example is:
$command(1): turn, $object(1), $state(1), {out.o=rules.object; out.s=rules.state;}
$object(1): the, heating, {out="airco";}
$state(1): off, {out="0";}
The following example illustrates the use of the sequence number for rules that are applied more than once. Consider the grammar with String Literals, in XML Form:
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" tag-format="semantics/1.0-literals" root="a"> <rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <one-of> <item> <item repeat="0-1">t1</item> <tag>tag1</tag> </item> <item> <ruleref uri="#d"/> <tag>tag2</tag> </item> </one-of> </rule> <rule id="b"> <one-of> <item>t2</item> <item>t3<tag>tag3</tag></item> <item>t4</item> </one-of> </rule> <rule id="c"> <item repeat="1-2">t5<tag>tag5</tag></item> </rule> <rule id="d"> t6 <ruleref uri="#c"/> </rule> </grammar>
or equivalently in ABNF Form:
#ABNF 1.0;
language en-US;
tag-format <semantics/1.0-literals>;
root $a;
$a = ($b)<1-> $c ((t1)<0-1> {tag1} | $d {tag2});
$b = t2 | t3 {tag3} | t4;
$c = (t5 {tag5})<1-2>;
$d = t6 $c;
Given the input "t2 t3 t5 t5", the logical parse structure is:
[$a [$b [t2], $b [t3, {tag3}], $c [t5, {tag5}, t5, {tag5}], {tag1}] ]
and the flat parse list is:
$a: $b(1), $b(2), $c(1), {tag1}
$b(1): t2
$b(2): t3, {tag3}
$c(1): t5, {tag5}, t5, {tag5}
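The conversion from a logical parse structure to a flat parse list can be sketched as follows. The in-memory representation (strings for tokens, { tag } for tags, { rule, children } for rule references) and the function flatten are hypothetical, and the sketch assumes that sequence numbers are counted per rule name over the whole parse. It always prints the sequence number, so the root appears as $a(1).

```javascript
// Hypothetical in-memory form of a logical parse: a token is a string,
// a tag is { tag: "..." }, a rule reference is { rule: name, children: [...] }.
function flatten(root) {
  const counters = {}; // last sequence number used, per rule name
  const lines = [];
  function visit(node) {
    const seq = (counters[node.rule] = (counters[node.rule] || 0) + 1);
    const index = lines.length;
    lines.push(null); // reserve the slot so that a rule precedes the rules it references
    const parts = node.children.map((child) => {
      if (typeof child === "string") return child; // token
      if ("tag" in child) return `{${child.tag}}`;  // tag
      return visit(child);                          // rule reference
    });
    lines[index] = `$${node.rule}(${seq}): ${parts.join(", ")}`;
    return `$${node.rule}(${seq})`;
  }
  visit(root);
  return lines;
}

// Logical parse for the input "t2 t3 t5 t5".
const logicalParse = {
  rule: "a",
  children: [
    { rule: "b", children: ["t2"] },
    { rule: "b", children: ["t3", { tag: "tag3" }] },
    { rule: "c", children: ["t5", { tag: "tag5" }, "t5", { tag: "tag5" }] },
    { tag: "tag1" },
  ],
};
console.log(flatten(logicalParse).join("\n"));
```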
Before evaluating any scripts in the flat parse list, a global anonymous ECMAScript scope is created for the grammar. This global scope is initialized by executing the scripts that are in the global tags in the grammar header (see section 4.2).
During evaluation of a script in the flat parse list, the global scope is accessible for reading only.
Every script has only one global scope associated: the global scope for the grammar in which the script appears. Scripts in referenced rules that are located in a referenced external grammar are thus executed with access to that referenced grammar's global scope, and don't have access to the referencing grammar's global scope.
The tags within a flat parse are executed in the order in which they appear, left to right. The global tags (in the grammar header) are executed in document order. See section 6.4 for details.
For each flat parse, a new anonymous ECMAScript scope is created that is a direct child of the global scope object for the grammar in which the related rule is defined. The ECMAScript scope chains thus always have the global scope (the scope of the whole parse) as the top-level object, and the scope belonging to the parse list as the successor.
Access to variables in tag executions is resolved with the scope chain according to the ECMAScript rules (ES 10.1.4).
The variables object according to [ECMA-327] is the scope object created for this rule. This means that local variables that are defined in tags belonging to a rule reference are created in the scope object that was created for this rule.
Before the first tag in a flat parse is executed, the environment of a new scope is set up in the following way:
- out is initialized to a new object as constructed by the expression new Object().
- rules is initialized to a new object as constructed by the expression new Object().
- meta.current().text is initialized (read-only) to the text variable of the current grammar rule.
- meta.current().score is initialized (read-only) to the score value related to the current grammar rule.
- meta.current().starttime is initialized (read-only) to the starting time related to the current grammar rule.
- meta.current().endtime is initialized (read-only) to the ending time related to the current grammar rule.
- rules.latest() returns undefined.
- meta.latest() returns undefined.
When execution of the flat parse is finished, the scope object of this flat parse is removed from the scope chain. The scope belonging to the referencing flat parse is then updated in the following way (replace rulename with the name of the rule in what follows):
- rules.rulename of the scope of the referencing rule is set to the value of the variable out of the child scope.
- meta.rulename.text of the scope of the referencing rule is set to the concatenation of all terminals within the rule reference.
- meta.rulename.score of the scope of the referencing rule is set to the score value for the referenced rule.
- meta.rulename.starttime of the scope of the referencing rule is set to the starting time value for the referenced rule.
- meta.rulename.endtime of the scope of the referencing rule is set to the ending time value for the referenced rule.
- rules.latest() = rules.rulename (both variables are in the scope of the referencing rule).
- meta.latest().text = meta.rulename.text (both variables are in the scope of the referencing rule).
- meta.latest().score = meta.rulename.score (both variables are in the scope of the referencing rule).
- meta.latest().starttime = meta.rulename.starttime (both variables are in the scope of the referencing rule).
- meta.latest().endtime = meta.rulename.endtime (both variables are in the scope of the referencing rule).
Note: Whether or not the out, rules and meta variables are enumerated when enumerating the scope object is not defined by this specification and may vary across implementations. Authors are discouraged from enumerating the scope object.
The result of a rule reference is available as rules.rulename (where rulename is the name of the referenced rule). rules.latest() always refers to the result of the previous reference in the current scope; meta.latest().text refers to the corresponding text utterance; meta.latest().score refers to the corresponding score value; and meta.latest().starttime and meta.latest().endtime refer to the corresponding starting and ending time values.
Since the global scope is read-only, assignments to global variables are not allowed in SI Tags in rules. They are only possible in the global SI Tags in the grammar header (see section 4.2).
The following rule contains two Rule Variables associated with the same rule "city". The XML Form is:
<rule id="fromto"> from <ruleref uri="#city"/> <tag>out.fromcity=rules.city.name;</tag> to <ruleref uri="#city"/> <tag>out.tocity=meta.city.text;</tag> </rule>
and the equivalent ABNF Form is:
$fromto = from $city {out.fromcity=rules.city.name;} to $city {out.tocity=meta.city.text;};
To determine which of the Rule Variable instances the tags refer to, we can build the flat parse for $fromto, which is always of the form:
$fromto: from, $city(1), {out.fromcity=rules.city.name;}, to, $city(2), {out.tocity=meta.city.text;}
From this it follows that rules.city.name in the first tag refers to the first Rule Variable rules.city in the rule, and that meta.city.text in the second tag refers to the second reference to rule city.
In the following rule, the flat parse depends on whether the input matches the optional rule b. The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out.x=rules.b.x;</tag> </rule>
and the equivalent ABNF Form is:
$a = $b [$b] {out.x=rules.b.x;};
The two possible flat parses are:
$a: $b(1), {out.x=rules.b.x;}
$a: $b(1), $b(2), {out.x=rules.b.x;}
The reference rules.b.x in the tag will thus refer to either the first or the last rule b, depending on whether the optional rule b was matched in the input.
The SI Tag in the rule below contains references to Rule Variables that are undefined whenever no Rule Variable with that name appears before the tag in the flat parse. The XML Form is:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"><ruleref uri="#c"/></item> <tag>out.x=rules.c; out.y=rules.d; out.z=rules.e;</tag> <ruleref uri="#e"/> </rule>
and the equivalent ABNF Form is:
$a = $b [$c] {out.x=rules.c; out.y=rules.d; out.z=rules.e;} $e;
The two possible flat parses are:
$a: $b(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
$a: $b(1), $c(1), {out.x=rules.c; out.y=rules.d; out.z=rules.e;}, $e(1)
This means that:
- out.x is undefined if rule c didn't match in the utterance.
- out.y is undefined because rule d is not in the rule expansion at all.
- out.z is undefined because rule e doesn't appear before the tag.
Within a single SI Tag, the order of evaluation is determined by [ECMA-327] for the evaluation of a valid [ECMA-327] Program (ES 14).
All global SI Tags (in tags in the grammar header) are executed once, before any SI Tags within a grammar rule are executed (see section 4.2).
The order of evaluating multiple SI Tags within a grammar rule is the order in which the SI Tags appear in the flat parse list for that rule application. The flat parse list also determines how many SI elements will be generated from an SI Tag that occurs in a grammar rule. Every SI Tag element in a flat parse list is evaluated exactly once. The order of evaluating String Literals is determined by the order in which the equivalent SI Tag appears in the flat parse list (see section 6.2).
The computation of the semantic value of a rule reference in a flat parse list may occur at any time during the processing of the entire logical parse structure, subject to the following condition: the semantic value of a rule reference must be computed before any SI Tag using that reference's value is processed.
Consider the following rules in XML Form:
<rule id="a"> <ruleref uri="#b"/> <tag>out.y=rules.b.x;</tag> <item repeat="0-1"> <ruleref uri="#b"/><tag>out.y=out.y+rules.b.x;</tag> </item> </rule> <rule id="b"> foo <tag>out.x=1;</tag> <one-of> <item>bar<tag>out.x=3;</tag></item> <item> <item repeat="1-">boo<tag>out.x=out.x+1;</tag></item> </item> </one-of> </rule>
or equivalently in ABNF Form:
$a = $b {out.y=rules.b.x;} [$b {out.y=out.y+rules.b.x;}]; $b = foo {out.x=1;} (bar {out.x=3;} | (boo {out.x=out.x+1;})<1->);
For the input "foo boo boo boo", the flat parse lists are:
$a: $b(1), {out.y=rules.b.x;}
$b(1): foo, {out.x=1;}, boo, {out.x=out.x+1;}, boo, {out.x=out.x+1;}, boo, {out.x=out.x+1;}
and out.y evaluates to 4.
For the input "foo bar foo boo", the flat parse lists are:
$a: $b(1), {out.y=rules.b.x;}, $b(2), {out.y=out.y+rules.b.x;}
$b(1): foo, {out.x=1;}, bar, {out.x=3;}
$b(2): foo, {out.x=1;}, boo, {out.x=out.x+1;}
and out.y evaluates to 5.
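The two results can be checked by running the tags in flat parse order. In the sketch below the functions applyB and applyA are hypothetical; they hard-code the tags of rules b and a from the example rather than interpret a grammar.

```javascript
// Tags of rule b, executed in flat parse order for one application of the rule.
function applyB(tokens) {
  const out = {};
  for (const token of tokens) {
    if (token === "foo") out.x = 1;               // foo {out.x=1;}
    else if (token === "bar") out.x = 3;          // bar {out.x=3;}
    else if (token === "boo") out.x = out.x + 1;  // boo {out.x=out.x+1;}
  }
  return out;
}

// Tags of rule a: the first reference to b assigns, the optional second one adds.
function applyA(bApplications) {
  const out = {};
  const rules = {};
  bApplications.forEach((tokens, i) => {
    rules.b = applyB(tokens); // rules.b now refers to this occurrence of b
    if (i === 0) out.y = rules.b.x;    // {out.y=rules.b.x;}
    else out.y = out.y + rules.b.x;    // {out.y=out.y+rules.b.x;}
  });
  return out;
}

console.log(applyA([["foo", "boo", "boo", "boo"]]).y);    // prints 4
console.log(applyA([["foo", "bar"], ["foo", "boo"]]).y);  // prints 5
```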
rules.b.x and rules.c.x refer to the respective Rule Variable properties:
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
rules.c.x causes a run-time error because it is used to the left of rule c, where rules.c is still undefined:
<rule id="a"> <ruleref uri="#b"/> <tag>out.x = rules.b.x + rules.c.x;</tag> <ruleref uri="#c"/> </rule>
rules.b.x evaluates to the x property of rules.b if rule b is matched on the input utterance. Otherwise it causes a run-time error:
<rule id="a"> <item repeat="0-1"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </rule>
A safer way to write this rule could be (assuming x is of type Number):
<rule id="a"> <tag>out.x=0;</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <ruleref uri="#c"/> <tag>out.x = out.x + rules.c.x;</tag> </rule>
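The difference between the two rules can be reproduced in plain ECMAScript. This is a minimal sketch: the rules object below is a hypothetical stand-in for the scope of rule a when the optional rule b did not match, and the if statement models the fact that the tag inside the optional item only runs when b matched.

```javascript
// Rule b did not match, so rules.b is undefined; rule c returned { x: 2 }.
const rules = { c: { x: 2 } };

// Unsafe rule: the single tag reads rules.b.x unconditionally.
let threwTypeError = false;
try {
  const sum = rules.b.x + rules.c.x; // property access on undefined
  console.log(sum);
} catch (e) {
  threwTypeError = e instanceof TypeError;
}

// Safer rule: out.x starts at 0 and rules.b.x is read only inside the optional item.
const out = {};
out.x = 0;                                    // <tag>out.x=0;</tag>
if (rules.b !== undefined) out.x = rules.b.x; // tag inside <item repeat="0-1">
out.x = out.x + rules.c.x;                    // <tag>out.x = out.x + rules.c.x;</tag>

console.log(threwTypeError, out.x); // prints true 2
```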
rules.b.x evaluates to the x property of the last occurrence of rule b in the repeat:
<rule id="a"> <item repeat="1-"><ruleref uri="#b"/></item> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
If the purpose was to add or concatenate over each occurrence of rules.b, it should be written as follows (out.x is initialized first, here assuming x is of type Number):
<rule id="a"> <tag>out.x=0;</tag> <item repeat="1-"> <ruleref uri="#b"/><tag>out.x=out.x+rules.b.x;</tag> </item> <ruleref uri="#c"/> <tag>out.x=out.x+rules.c.x;</tag> </rule>
rules.b evaluates to the last occurrence of rules.b in the repeat="0-" expansion, if any; otherwise it is undefined:
<rule id="a"> <item repeat="0-"><ruleref uri="#b"/><ruleref uri="#d"/></item> <ruleref uri="#c"/> <tag>out.x=rules.b+rules.c.x;</tag> </rule>
Either rules.b.x or rules.c.x will cause a run-time error, depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
This could be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> </rule>
rules.b.x refers to whichever rules.b actually matched:
<rule id="a"> <one-of> <item><ruleref uri="#b"/> a</item> <item>a <ruleref uri="#b"/></item> </one-of> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </rule>
One of the operands to every addition causes a run-time error here depending on the input utterance:
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <one-of> <item><ruleref uri="#d"/></item> <item><ruleref uri="#e"/></item> </one-of> <tag>out.x=(rules.b.x+rules.c.x) * (rules.d.x+rules.e.x);</tag> </rule>
This rule can be better written as:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> <one-of> <item><ruleref uri="#d"/><tag>out.x=out.x*rules.d.x;</tag></item> <item><ruleref uri="#e"/><tag>out.x=out.x*rules.e.x;</tag></item> </one-of> </rule>
Evaluation of rules.b.x always causes a run-time error because the expression will be evaluated only when rule c matches, not rule b. (When rule b matches, the default assignment would cause out=meta.b.text).
<rule id="a"> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/><tag>out.x=rules.b.x+rules.c.x;</tag></item> </one-of> </rule>
A more useful rule could be:
<rule id="a"> <one-of> <item><ruleref uri="#b"/><tag>out.x=rules.b.x;</tag></item> <item><ruleref uri="#c"/><tag>out.x=rules.c.x;</tag></item> </one-of> </rule>
The expression is only evaluated if rule c matches; in that case both rules.b and rules.c are defined:
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-1"> <ruleref uri="#c"/> <tag>out.x=rules.b.x+rules.c.x;</tag> </item> </rule>
The expression is evaluated for every occurrence of rule c. Note that out.x ends up as the sum of rules.b.x and the x property of the last occurrence of rule c, because every evaluation overwrites the previous result.
<rule id="a"> <ruleref uri="#b"/> <item repeat="1-"> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </item> </rule>
Same effect as the previous example, except that now the expression is not evaluated at all if rule c did not match at least once.
<rule id="a"> <ruleref uri="#b"/> <item repeat="0-"> <ruleref uri="#c"/> <tag>out.x = rules.b.x + rules.c.x;</tag> </item> </rule>
These rules do the obvious concatenation of digits. Note that the ds property is first initialized to "" because otherwise in the first evaluation of the expression, ds would be undefined and would cause a run-time error:
<rule id="digits"> <tag>out.ds="";</tag> <item repeat="1-"> <ruleref uri="#digit"/> <tag>out.ds = out.ds + rules.digit;</tag> </item> </rule> <rule id="digit"> <one-of> <item>"0"</item> <item>"1"</item> <item>"2"</item> <item>"3"</item> <item>"4"</item> <item>"5"</item> <item>"6"</item> <item>"7"</item> <item>"8"</item> <item>"9"</item> </one-of> </rule>
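The accumulation performed by the digits rule can be traced as follows. The function concatDigits is hypothetical, and the sketch assumes that each reference to rule digit yields the matched token text through the default assignment.

```javascript
// One call corresponds to one application of rule "digits".
function concatDigits(matchedDigits) {
  const out = {};
  out.ds = ""; // <tag>out.ds="";</tag>
  for (const digit of matchedDigits) {
    const rules = { digit }; // result of the latest reference to rule "digit"
    out.ds = out.ds + rules.digit; // <tag>out.ds = out.ds + rules.digit;</tag>
  }
  return out;
}

console.log(concatDigits(["4", "0", "2"]).ds); // prints 402
```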
rules.latest() resolves to rules.c:
<rule id="a"> <ruleref uri="#b"/> <ruleref uri="#c"/> <tag>out=rules.latest();</tag> </rule>
rules.latest() resolves to rules.b:
<rule id="a"> <ruleref uri="#c"/> <ruleref uri="#b"/> <tag>out=rules.latest();</tag> </rule>
rules.latest() returns undefined:
<rule id="a"> b c <tag>out=rules.latest();</tag> </rule>
If rule b matches, rules.latest() resolves to rules.b. If rule c matches, rules.latest() resolves to rules.c:
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/></item> <item><ruleref uri="#c"/></item> </one-of> <tag>out=rules.latest();</tag> </rule>
This is equivalent to:
<rule id="x"> <ruleref uri="#a"/> <one-of> <item><ruleref uri="#b"/><tag>out=rules.latest();</tag></item> <item><ruleref uri="#c"/><tag>out=rules.latest();</tag></item> </one-of> </rule>
rules.latest() resolves to rules.b if rule b matches; if not, it resolves to rules.a:
<rule id="x"> <ruleref uri="#a"/> <item repeat="0-1"><ruleref uri="#b"/></item> <tag>out=rules.latest();</tag> </rule>
The effect is equivalent to:
<rule id="x"> <ruleref uri="#a"/><tag>out=rules.latest();</tag> <item repeat="0-1"><ruleref uri="#b"/><tag>out=rules.latest();</tag></item> </rule>
rules.latest() resolves to the last occurrence of rules.a:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/></item> <tag>out=rules.latest();</tag> </rule>
The effect is equivalent to:
<rule id="x"> <item repeat="1-"><ruleref uri="#a"/><tag>out=rules.latest();</tag></item> </rule>
Semantic Interpretation processors may be used in environments where a return result is expected in XML format (for example, those supporting [EMMA]).
If returning XML results, the following serialization rules must be used to generate an XML fragment from the Semantic Interpretation process. Notice that these serialization rules apply to semantic values generated by authored SI Tags during SI processing, and do not preclude the addition of further information into the XML result by an individual SI processor (for example, recognizer annotations corresponding to acoustic confidence scores or other such information). This specification does not define the XML documents in which the generated fragment can be embedded.
The serialization into XML has been designed as a convenient mechanism to generate XML fragments directly from SI grammars. It has not been designed as a generic conversion mechanism from [ECMA-327] objects into XML fragments. It is not a generic conversion mechanism for at least the following reasons:
- DontEnum properties are not serialized.
The serialization of the ECMAScript result into an XML fragment is governed by the following transformation rules:
- If the top-level Rule Variable is not of type Object but a simple scalar type (String, Number, Boolean, Null or Undefined) then the resulting XML fragment only consists of character data without any mark-up. The character data will be the value of the top-level Rule Variable as if the ToString() operation had been performed on an argument of this type (e.g., for Boolean, the result would be true or false).
- For a property of a simple scalar type, the character data is the value of the property as if the ToString() operation had been performed on an argument of this type.
- Indexed elements of an Array object (e.g. a[0], a[1], etc.) become XML child elements with name <item>. Each <item> element has an attribute named index, which is the index of the corresponding element in the array. In addition, the XML element containing the <item> elements includes an attribute named length, whose value is given by the length property of the ECMAScript Array object. Any other properties of an Array object, for instance the keys of an associative array (e.g. a["prop"]), are subject to the same transformation rules as the regular properties of an object. In a sparse array, only those elements which hold defined values will be serialized.
- Properties named _attributes, _value, _nsdecl and _nsprefix will be treated according to the rules described in the sections below.
Notes:
- Properties with the DontEnum attribute (see ES 8.6.1) are not serialized. This prevents functions and built-in properties from being serialized.
- If the top-level Rule Variable is an Array object, the length attribute will not be present because there will be no XML element containing the <item> child elements.
Following the above principles, consider the top-level Rule Variable with the properties drink and pizza of the example grammar in section 8:
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
SI processing in an XML environment would generate the following XML fragment:
<drink> <liquid>coke</liquid> <drinksize>medium</drinksize> </drink> <pizza> <number>3</number> <pizzasize>large</pizzasize> <topping length="2"> <item index="0">pepperoni</item> <item index="1">mushrooms</item> </topping> </pizza>
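The basic transformation can be sketched in a few lines of ECMAScript. The functions escapeText and serialize are hypothetical names, not part of this specification; the sketch covers scalars, nested objects and arrays only, does not check that property names are legal XML names, ignores the special _attributes, _value, _nsdecl and _nsprefix properties, and emits no whitespace between elements.

```javascript
// Escape character data so that the output stays well-formed.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function serialize(value) {
  if (value === null || typeof value !== "object") {
    return escapeText(value); // scalar: character data only
  }
  let xml = "";
  for (const name of Object.keys(value)) { // own enumerable properties only
    const child = value[name];
    const isItem = Array.isArray(value) && /^\d+$/.test(name);
    if (isItem && child === undefined) continue; // sparse array: skip undefined elements
    const tag = isItem ? "item" : name;
    let attrs = isItem ? ` index="${name}"` : "";
    if (Array.isArray(child)) attrs += ` length="${child.length}"`;
    xml += `<${tag}${attrs}>${serialize(child)}</${tag}>`;
  }
  return xml;
}

const order = {
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] },
};
console.log(serialize(order));
```

Apart from whitespace, the output for order corresponds to the fragment shown above. A top-level Array produces <item> elements without a length attribute, because there is no containing element to carry it.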
The following example ECMAScript object would cause an error because the $size$ property, while a valid name in ECMAScript, is not a valid name for an XML Element:
{ drink: { liquid:"coke", $size$:"medium"} }
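A processor or an author could detect such names before serialization. The check below is a deliberately simplified, ASCII-only approximation of the XML name rules; the names XML_NAME_ASCII and isSerializableName are hypothetical, and the full definition in the XML specification also admits many non-ASCII characters.

```javascript
// ASCII-only approximation: a letter or underscore, followed by letters,
// digits, underscores, hyphens or periods.
const XML_NAME_ASCII = /^[A-Za-z_][A-Za-z0-9_.-]*$/;

function isSerializableName(name) {
  return XML_NAME_ASCII.test(name);
}

console.log(isSerializableName("drinksize")); // prints true
console.log(isSerializableName("$size$"));    // prints false
```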
Variables named _attributes and _value can be created and used by the author to enable the generation of richer XML results, including the following structures:
The _attributes object is used to hold property name/value pairs which will be rendered as XML attributes of the object which contains _attributes.
The _value variable is used to hold a scalar value for character data contained in an element or to hold the value of an attribute.
Semantic Interpretation processors treat these objects in the following way:
- Properties of the _attributes object are rendered as XML attributes of the containing object.
- _value is treated as character data content of the containing object or the value of an attribute if the containing object is a child of _attributes.
If the value of _value is not a scalar type, the ToString() operation is performed to generate a string value.
It is an error if a property of _attributes has a name that is not a legal name for an XML attribute.
The following ECMAScript object:
{ martini: { gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } }, vermouth: { _value: "Noilly Prat" , _attributes: { ratio: 1 } }, _attributes: { method: "shaken" } } }
would generate the following XML result:
... <martini method="shaken"> <gin ratio="8">Bombay Sapphire</gin> <vermouth ratio="1">Noilly Prat</vermouth> </martini> ...
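The handling of _attributes and _value can be sketched as follows. The helper names escapeXml, attributesOf and element are hypothetical; the sketch leaves out arrays and namespaces, does not validate attribute names, and emits no whitespace between elements.

```javascript
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Render the properties of obj._attributes as XML attributes.
function attributesOf(obj) {
  const attrs = obj._attributes || {};
  return Object.keys(attrs)
    .map((name) => {
      const a = attrs[name];
      // An attribute is either a scalar or an object holding its value in _value.
      const v = a !== null && typeof a === "object" ? a._value : a;
      return ` ${name}="${escapeXml(v)}"`;
    })
    .join("");
}

function element(name, value) {
  if (value === null || typeof value !== "object") {
    return `<${name}>${escapeXml(value)}</${name}>`;
  }
  let content = "_value" in value ? escapeXml(value._value) : "";
  for (const key of Object.keys(value)) {
    if (key === "_attributes" || key === "_value") continue;
    content += element(key, value[key]);
  }
  return `<${name}${attributesOf(value)}>${content}</${name}>`;
}

const martini = {
  gin: { _value: "Bombay Sapphire", _attributes: { ratio: 8 } },
  vermouth: { _value: "Noilly Prat", _attributes: { ratio: 1 } },
  _attributes: { method: "shaken" },
};
console.log(element("martini", martini));
```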
The object named _nsdecl is used to declare a namespace [XML Names] in an element. The property named _nsprefix enables the SI author to associate an XML element or attribute with a particular namespace.
When an object contains the _nsdecl property, the namespace declaration is attached to the resultant XML serialized element for this object. The _prefix property of _nsdecl indicates the namespace prefix and the _name property of _nsdecl indicates the corresponding namespace name (usually a URI reference). If the _prefix property is an empty string, the default namespace is declared. If both _prefix and _name are empty strings, the namespace declaration xmlns="" applies.
When an Array object contains the _nsprefix property, the prefix also applies to the automatically generated <item> elements and length and index attributes.
Note that this transformation produces an XML fragment - see [XML Names] for rules on valid namespace usage in XML.
_nsprefix can be used, for example, to generate XML attributes such as emma:hook or emma:tokens when generating XML fragments to be embedded in EMMA documents. See Appendix C of the [EMMA] specification for more information and examples. The namespace declaration with _nsdecl may not be needed when provided by the XML document in which the fragment will be embedded.
The following ECMAScript object:
{ drink: { _nsdecl: { _prefix:"n1", _name:"http://www.example.com/n1" }, _nsprefix:"n1", liquid: { _nsdecl: { _prefix:"n2", _name:"http://www.example.com/n2" }, _attributes: { color: { _nsprefix:"n2", _value:"black" } }, _value:"coke" }, size:"medium" } }
would generate the following XML result:
<n1:drink xmlns:n1="http://www.example.com/n1"> <liquid n2:color="black" xmlns:n2="http://www.example.com/n2">coke</liquid> <size>medium</size> </n1:drink>
Note that the _nsprefix property only applies to its parent object and hence neither the <liquid> element nor the <size> element is associated with a namespace in this fragment.
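The namespace rules can be sketched by combining _nsdecl and _nsprefix with the attribute handling of the previous section. All helper names below (isObject, esc, qname, nsDecl, attrs, element) are hypothetical. The sketch omits arrays, emits ordinary attributes before the namespace declaration simply to mirror the example, and emits no whitespace between elements.

```javascript
const RESERVED = new Set(["_attributes", "_value", "_nsdecl", "_nsprefix"]);
const isObject = (v) => v !== null && typeof v === "object";
const esc = (v) =>
  String(v).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");

// _nsprefix applies only to the object that holds it.
function qname(name, holder) {
  const prefix = isObject(holder) ? holder._nsprefix : undefined;
  return prefix ? `${prefix}:${name}` : name;
}

// An empty _prefix declares the default namespace.
function nsDecl(obj) {
  const d = obj._nsdecl;
  if (!d) return "";
  return d._prefix === ""
    ? ` xmlns="${esc(d._name)}"`
    : ` xmlns:${d._prefix}="${esc(d._name)}"`;
}

function attrs(obj) {
  const a = obj._attributes || {};
  return Object.keys(a)
    .map((name) => {
      const v = isObject(a[name]) ? a[name]._value : a[name];
      return ` ${qname(name, a[name])}="${esc(v)}"`;
    })
    .join("");
}

function element(name, value) {
  const tag = qname(name, value);
  if (!isObject(value)) return `<${tag}>${esc(value)}</${tag}>`;
  let content = "_value" in value ? esc(value._value) : "";
  for (const key of Object.keys(value)) {
    if (!RESERVED.has(key)) content += element(key, value[key]);
  }
  return `<${tag}${attrs(value)}${nsDecl(value)}>${content}</${tag}>`;
}

const drink = {
  _nsdecl: { _prefix: "n1", _name: "http://www.example.com/n1" },
  _nsprefix: "n1",
  liquid: {
    _nsdecl: { _prefix: "n2", _name: "http://www.example.com/n2" },
    _attributes: { color: { _nsprefix: "n2", _value: "black" } },
    _value: "coke",
  },
  size: "medium",
};
console.log(element("drink", drink));
```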
With the grammar illustrated below, the following utterance
"I would like a coca cola and three large pizzas with pepperoni and mushrooms."
would create the following Rule Variable on the rule order:
{ drink: { liquid:"coke", drinksize:"medium"}, pizza: { number: "3", pizzasize: "large", topping: [ "pepperoni", "mushrooms" ] } }
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd"> <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd" version="1.0" mode="voice" tag-format="semantics/1.0" root="order"> <rule id="order"> I would like a <ruleref uri="#drink"/> <tag>out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize;</tag> and <ruleref uri="#pizza"/> <tag>out.pizza=rules.pizza;</tag> </rule> <rule id="kindofdrink"> <one-of> <item>coke</item> <item>pepsi</item> <item>coca cola<tag>out="coke";</tag></item> </one-of> </rule> <rule id="foodsize"> <tag>out="medium";</tag> <!-- "medium" is default if nothing said --> <item repeat="0-1"> <one-of> <item>small<tag>out="small";</tag></item> <item>medium</item> <item>large<tag>out="large";</tag></item> <item>regular<tag>out="medium";</tag></item> </one-of> </item> </rule> <!-- Construct Array of toppings, return Array --> <rule id="tops"> <tag>out=new Array;</tag> <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> <item repeat="1-"> and <ruleref uri="#top"/> <tag>out.push(rules.top);</tag> </item> </rule> <rule id="top"> <one-of> <item>anchovies</item> <item>pepperoni</item> <item>mushroom<tag>out="mushrooms";</tag></item> <item>mushrooms</item> </one-of> </rule> <!-- Two properties (drinksize, type) on left hand side Rule Variable --> <rule id="drink"> <ruleref uri="#foodsize"/> <ruleref uri="#kindofdrink"/> <tag>out.drinksize=rules.foodsize; out.type=rules.kindofdrink;</tag> </rule> <!-- Three properties on rules.pizza --> <rule id="pizza"> <ruleref uri="#number"/> <ruleref uri="#foodsize"/> <tag>out.pizzasize=rules.foodsize; out.number=rules.number;</tag> pizzas with <ruleref uri="#tops"/> <tag>out.topping=rules.tops;</tag> </rule> <rule 
id="number"> <one-of> <item> <tag>out=1;</tag> <one-of> <item>a</item> <item>one</item> </one-of> </item> <item>two<tag>out=2;</tag></item> <item>three<tag>out=3;</tag></item> </one-of> </rule> </grammar>
#ABNF 1.0 UTF-8;
language en;
mode voice;
tag-format <semantics/1.0>;
root $order;
$order = I would like a $drink {out.drink = new Object(); out.drink.liquid = rules.drink.type; out.drink.drinksize = rules.drink.drinksize;} and $pizza {out.pizza=rules.pizza;};
$kindofdrink = coke | pepsi | "coca cola"{out="coke";};
// "medium" is default if nothing said
$foodsize = {out="medium";} [small {out="small";} | medium | large {out="large";}| regular {out="medium";}];
// Construct Array of toppings, return Array
$tops = {out=new Array;} $top {out.push(rules.top);} (and $top {out.push(rules.top);})<1->;
$top = anchovies | pepperoni | mushroom{out="mushrooms";} | mushrooms;
// Two properties (drinksize, type) on left hand side Rule Variable
$drink = $foodsize $kindofdrink {out.drinksize=rules.foodsize; out.type=rules.kindofdrink; };
// Three properties on rules.pizza's Rule Variable
$pizza = $number $foodsize {out.pizzasize=rules.foodsize; out.number=rules.number;} pizzas with $tops {out.topping=rules.tops;};
$number = (a | one){out="1";} | two{out="2";} | three{out="3";};
The following grammar demonstrates the use of Semantic Interpretation for computation within a grammar.
This simple number grammar accepts as input whole numbers between 0 and 99,999 inclusive. It demonstrates how rule references may be reused multiple times and the returned SI information processed differently each time. The grammar also shows how the Rule Variable may be given a default value (0 in this case) and also used as an intermediate variable during computation (essentially incrementing the running total stored in the Rule Variable). In this example, the Rule Variable type is changed from an Object to a Number but an alternative strategy might just as easily store the number as a property of the Rule Variable object.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd"
         version="1.0" mode="voice" tag-format="semantics/1.0" root="main">

  <rule id="main">
    <one-of>
      <item>
        <ruleref uri="#sub_hundred_thousand"/>
        <tag>out = rules.sub_hundred_thousand;</tag>
      </item>
      <item>
        <ruleref uri="#sub_thousand"/>
        <tag>out = rules.sub_thousand;</tag>
      </item>
      <item>
        <ruleref uri="#sub_hundred"/>
        <tag>out = rules.sub_hundred;</tag>
      </item>
    </one-of>
  </rule>

  <rule id="sub_hundred_thousand">
    <ruleref uri="#sub_hundred"/>
    <tag>out = (1000 * rules.sub_hundred);</tag>
    thousand
    <item repeat="0-1">
      <item repeat="0-1">and</item>
      <ruleref uri="#sub_thousand"/><tag>out += rules.sub_thousand;</tag>
    </item>
  </rule>

  <rule id="sub_thousand">
    <ruleref uri="#sub_hundred"/>
    <tag>out = (100 * rules.sub_hundred);</tag>
    hundred
    <item repeat="0-1">
      <item repeat="0-1">and</item>
      <ruleref uri="#sub_hundred"/><tag>out += rules.sub_hundred;</tag>
    </item>
  </rule>

  <rule id="sub_hundred">
    <tag>out = 0;</tag>
    <one-of>
      <item>zero</item>
      <item><ruleref uri="#teens"/><tag>out += rules.teens;</tag></item>
      <item>
        <ruleref uri="#tens"/><tag>out += rules.tens;</tag>
        <item repeat="0-1">
          <ruleref uri="#digit"/>
          <tag>out += rules.digit;</tag>
        </item>
      </item>
      <item><ruleref uri="#digit"/><tag>out += rules.digit;</tag></item>
    </one-of>
  </rule>

  <rule id="tens">
    <one-of>
      <item>twenty<tag>out = 20;</tag></item>
      <item>thirty<tag>out = 30;</tag></item>
      <item>forty<tag>out = 40;</tag></item>
      <item>fifty<tag>out = 50;</tag></item>
      <item>sixty<tag>out = 60;</tag></item>
      <item>seventy<tag>out = 70;</tag></item>
      <item>eighty<tag>out = 80;</tag></item>
      <item>ninety<tag>out = 90;</tag></item>
    </one-of>
  </rule>

  <rule id="teens">
    <one-of>
      <item>ten<tag>out = 10;</tag></item>
      <item>eleven<tag>out = 11;</tag></item>
      <item>twelve<tag>out = 12;</tag></item>
      <item>thirteen<tag>out = 13;</tag></item>
      <item>fourteen<tag>out = 14;</tag></item>
      <item>fifteen<tag>out = 15;</tag></item>
      <item>sixteen<tag>out = 16;</tag></item>
      <item>seventeen<tag>out = 17;</tag></item>
      <item>eighteen<tag>out = 18;</tag></item>
      <item>nineteen<tag>out = 19;</tag></item>
    </one-of>
  </rule>

  <rule id="digit">
    <one-of>
      <item>one<tag>out = 1;</tag></item>
      <item>two<tag>out = 2;</tag></item>
      <item>three<tag>out = 3;</tag></item>
      <item>four<tag>out = 4;</tag></item>
      <item>five<tag>out = 5;</tag></item>
      <item>six<tag>out = 6;</tag></item>
      <item>seven<tag>out = 7;</tag></item>
      <item>eight<tag>out = 8;</tag></item>
      <item>nine<tag>out = 9;</tag></item>
    </one-of>
  </rule>

</grammar>
#ABNF 1.0 UTF-8;

language en;
mode voice;
tag-format <semantics/1.0>;
root $main;

$main = $sub_hundred_thousand { out = rules.sub_hundred_thousand; }
      | $sub_thousand { out = rules.sub_thousand; }
      | $sub_hundred { out = rules.sub_hundred; };

$sub_hundred_thousand = $sub_hundred { out = (1000 * rules.sub_hundred); } thousand
                        [ [and] $sub_thousand { out += rules.sub_thousand; } ];

$sub_thousand = $sub_hundred { out = (100 * rules.sub_hundred); } hundred
                [ [and] $sub_hundred { out += rules.sub_hundred; } ];

$sub_hundred = { out = 0; }
               (zero
               | $teens { out += rules.teens; }
               | $tens { out += rules.tens; } [ $digit { out += rules.digit; } ]
               | $digit { out += rules.digit; });

$tens = twenty { out = 20; }
      | thirty { out = 30; }
      | forty { out = 40; }
      | fifty { out = 50; }
      | sixty { out = 60; }
      | seventy { out = 70; }
      | eighty { out = 80; }
      | ninety { out = 90; };

$teens = ten { out = 10; }
       | eleven { out = 11; }
       | twelve { out = 12; }
       | thirteen { out = 13; }
       | fourteen { out = 14; }
       | fifteen { out = 15; }
       | sixteen { out = 16; }
       | seventeen { out = 17; }
       | eighteen { out = 18; }
       | nineteen { out = 19; };

$digit = one { out = 1; }
       | two { out = 2; }
       | three { out = 3; }
       | four { out = 4; }
       | five { out = 5; }
       | six { out = 6; }
       | seven { out = 7; }
       | eight { out = 8; }
       | nine { out = 9; };
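The arithmetic performed by the tags in the number grammar can be traced by hand. The following ECMAScript sketch is one way to do so, assuming hypothetical helper functions named after the grammar rules. These functions are illustrative only, do no parsing, and do not enforce the grammar's constraints on which alternatives may be combined.

```javascript
// Hypothetical helpers mirroring the tags of the number grammar.
// They are not part of any SISR processor API.

// $sub_hundred: the Rule Variable defaults to 0 and the matched
// alternative adds its value to the running total.
function subHundred({ teens = 0, tens = 0, digit = 0 } = {}) {
  let out = 0;    // { out = 0; }
  out += teens;   // $teens { out += rules.teens; }
  out += tens;    // $tens  { out += rules.tens; }
  out += digit;   // $digit { out += rules.digit; }
  return out;
}

// $sub_thousand: the first $sub_hundred is scaled by 100, then the
// optional second $sub_hundred is added.
function subThousand(hundreds, rest) {
  let out = 100 * hundreds;             // { out = (100 * rules.sub_hundred); }
  if (rest !== undefined) out += rest;  // { out += rules.sub_hundred; }
  return out;
}

// $sub_hundred_thousand: the leading $sub_hundred is scaled by 1000,
// then the optional $sub_thousand is added.
function subHundredThousand(thousands, rest) {
  let out = 1000 * thousands;           // { out = (1000 * rules.sub_hundred); }
  if (rest !== undefined) out += rest;  // { out += rules.sub_thousand; }
  return out;
}

// Trace for "twenty one thousand three hundred and five"
const thousands = subHundred({ tens: 20, digit: 1 });
const remainder = subThousand(subHundred({ digit: 3 }), subHundred({ digit: 5 }));
const total = subHundredThousand(thousands, remainder);
console.log(thousands, remainder, total); // 21 305 21305
```

The same $sub_hundred rule is referenced in three places, and each referencing rule treats the returned value differently: multiplied by 1000, multiplied by 100, or added unchanged.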
This section is normative.
A Semantic Interpretation Tag (SI Tag) is a Conforming SI Tag if its content matches the syntax as defined in the normative sections in this document.
There is no normative restriction on the size of an SI Tag.
A Conforming Semantic Interpretation Grammar is a stand-alone ABNF or XML Grammar Document or an XML Grammar Fragment where:
The tag format declaration is semantics/1.0 or semantics/1.0-literals. A grammar that contains tags in a format other than that specified by this document or its successors must have a tag format declaration whose value does not begin with the string semantics/x.y (where x and y are digits) (see Speech Recognition Grammar Specification, Section 4.8 Tag Format Declaration [SRGS]).
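The tag format rule above can be expressed as a simple string check. The following ECMAScript sketch is an informal illustration, not a normative test: the function name classifyTagFormat and its three return labels are hypothetical, and the vendor value in the usage lines is an invented example.

```javascript
// Tag format values named by this specification.
const SISR_FORMATS = ["semantics/1.0", "semantics/1.0-literals"];

// Values beginning with "semantics/x.y" (x and y digits) are reserved for
// this specification and its successors.
const RESERVED_PREFIX = /^semantics\/\d\.\d/;

// Hypothetical classifier; the labels "sisr", "reserved" and "other" are
// illustrative only.
function classifyTagFormat(value) {
  if (SISR_FORMATS.includes(value)) return "sisr";      // tags follow this document
  if (RESERVED_PREFIX.test(value)) return "reserved";   // not usable for other tag formats
  return "other";                                       // acceptable for a different tag format
}

console.log(classifyTagFormat("semantics/1.0"));          // sisr
console.log(classifyTagFormat("semantics/1.0-literals")); // sisr
console.log(classifyTagFormat("semantics/2.0"));          // reserved
console.log(classifyTagFormat("x-vendor/tags"));          // other (invented example value)
```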
A Semantic Interpretation Processor is a program that can parse and process Conforming SI Tags to produce semantic results. Semantic Interpretation Processors are executed in a hosting environment (e.g. a grammar processor).
A Conforming Semantic Interpretation Processor:
A Semantic Interpretation Grammar Processor is a system that can parse and process Conforming Semantic Interpretation Grammars. Specifically, a Semantic Interpretation Grammar Processor is a conforming processor if:
Anyone wishing to state conformance of a Grammar Fragment or Grammar Document with SI Tags (document) to this specification should use the following wording:
This document conforms to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/.
Anyone wishing to state conformance of a processor to this specification should use the following wording:
[PROCESSOR] is a Conforming [ (1) ABNF, (2) XML, (3) ABNF and XML ] Semantic Interpretation Grammar Processor according to W3C's "Semantic Interpretation for Speech Recognition", available at http://www.w3.org/TR/2006/CR-semantic-interpretation-20060111/ [with support for XML Transformation].
Make the appropriate substitutions:
This document was written with the participation of members of the W3C Voice Browser Working Group [VBWG]. The following have significantly contributed to writing this specification:
The following is a summary of the major changes since the Last Call Working Draft was published on November 8, 2004, based on input from reviewers and the working group: