Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
This document details the responses made by the Voice Browser Working Group to issues raised during the Candidate Recommendation (beginning 28th January 2003 and ending 10th April 2003) review of Voice Extensible Markup Language (VoiceXML) Version 2.0. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.
This document of the W3C's Voice Browser Working Group describes the disposition of comments as of January 19, 2004 on the Voice Extensible Markup Language (VoiceXML) Version 2.0 Candidate Recommendation. It may be updated, replaced or rendered obsolete by other W3C documents at any time.
For background on this work, please see the Voice Browser Activity Statement.
This document describes the disposition of comments in relation to the Voice Extensible Markup Language (VoiceXML) Version 2.0 (http://www.w3.org/TR/2003/CR-voicexml20-20030220/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.
The full set of Issues raised for the Voice Extensible Markup Language (VoiceXML) Version 2.0 since August 2000, their resolution and in most cases the reasoning behind the resolution are available from http://www.w3.org/Voice/Group/2004/voicexml-change-requests.htm [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review.
Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.
Item | Commentator | Nature | Disposition |
CR1-1 | Arnaud Vallee | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR2-1 | Arnaud Vallee | Technical Error (§2.2) | accepted (no reply) |
CR3-1 | Arnaud Vallee | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR4-1 | Arnaud Vallee | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR5-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-5 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-6 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-7 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-8 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-9 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-10 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-11 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-12 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-13 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-14 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-15 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR5-16 | Guillaume Berche | Change to Existing Feature (§2.3) | accepted |
CR6-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-3 | Guillaume Berche | Change to Existing Feature (§2.3) | accepted |
CR6-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-5 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-6 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-7 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-8 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-9 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-10 | Guillaume Berche | Technical Error (§2.2) | accepted |
CR6-11 | Guillaume Berche | Technical Error (§2.2) | accepted |
CR6-12 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR6-13 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR7-1 | Max Froumentin | Clarification / Typographical / Editorial (§2.1) | accepted |
CR8-1 | Matt Porter | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR9-1 | John Voger | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR10-1 | Philippe Le Hegaret | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-1 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-2 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-3 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-4 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-5 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-6 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-7 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR11-8 | C. M. Sperberg-McQueen | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR12-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR12-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR12-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR13-1 | Greg FitzPatrick | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR14-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR14-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | accepted |
CR15-1 | Ufuk Kayserilioglu | Clarification / Typographical / Editorial (§2.1) | accepted |
CR16-1 | Mark Clark | Clarification / Typographical / Editorial (§2.1) | accepted (no reply) |
CR17-1 | Robert Barkan | Clarification / Typographical / Editorial (§2.1) | accepted |
CR18-1 | Mark Clark | Change to Existing Feature (§2.3) | accepted (no reply) |
CR19-1 | Pavel Cenek | Feature Request (§2.4) | accepted |
CR19-2 | Pavel Cenek | Feature Request (§2.4) | accepted |
From Arnaud Vallee
Analysis: I have a question about where the error.badfetch is thrown and caught when a called document has a non-existent root document. Take the following scenario. Document 1 makes a transition to document 2, whose root document does not exist. Document 1 and document 2 have an error.badfetch handler at the document level. Where is the error supposed to be caught? I think the question could be the same for the following assertion: If a document's application attribute refers to a document that also has an application attribute specified, an error.semantic event is thrown. As I did not get any answer to the message, I post my query one more time. The issue is as follows: In a document named doc1.vxml, which is a root document (does not specify an application attribute in the vxml tag), we transition to a document doc2.vxml. doc2.vxml refers to a non-existing root document (i.e., application attribute set to doc2-root-unexisting.vxml). As the spec says (chap 1.5.2), " If a document refers to a non-existent application root document, an error.badfetch event is thrown ", an error.badfetch is thrown in this case. The question: where is the error thrown, or in other words, where do I put the error.badfetch handler to catch the error? I see 2 possibilities: - in doc1.vxml, which means that if a document refers to a non-existing root document, it is a badfetch to try to get this document. - in doc2.vxml, which means that the current document has to be initialized before getting and initializing the root document. I think this is the same issue with the following assertion in chapter 1.5.2: "If a document's application attribute refers to a document that also has an application attribute specified, an error.semantic event is thrown. " except that, in this case, the error.semantic could also be caught in the first root document.
[Pavel Cenek] I am not member of WBWG, so my answer is only a guess. I also waited for an authorized answer and therefore haven't reacted on your first attempt. > The issue is as follows: > In a document named doc1.vxml, which is a root document (do not specify an application attribute in the vxml tag), we transition to a document doc2.vxml. > doc2.vxml refers to a non existing root document (i.e., application attribute set to doc2-root-unexisting.vxml). > > As the spec says (chap 1.5.2), > " If a document refers to a non-existent application root document, an error.badfetch event is thrown ", > an error.badfetch is thrown in this case. > > The question: where is the error thrown, or in other way, where do i put > the error.badfetch handler to catch the error? The transition is caused by <goto> or <submit>, etc, therefore I would apply the rules for these tags (which should be the same for all of them). For <goto>, spec says: "Note that for errors which occur during a dialog or document transition, the scope in which errors are handled is platform specific." > I see 2 possibilities: > - in doc1.vxml, which means that if a document refers to a non existing root document, it is a badfecth to try to get this document. In my opinion this possibility is more logical. > - in doc2.vxml, which means that current document has to be initialized before getting and initializing the root document. > I think this is the same issue with the following assertion in chapter 1.5.2: > "If a document's application attribute refers to a document that also has an application attribute specified, an error.semantic event is thrown. " I think it would be valuable to mention the citation above also in the chapter one.
Resolution: rejected
The specification allows the error.badfetch event to be thrown in either the referring document or the referred document. To guarantee that the error is caught, catch handlers need to be specified in both documents. This error handling pattern is illustrated in numerous tests in our implementation report.
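The pattern can be sketched as follows. This is a minimal illustration only: the file names are taken from the comment above, while the form ids and prompt wording are assumptions made for the example. Because the handling scope is platform specific, each document carries its own document-level handler.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- doc1.vxml: the referring document (itself a root document) -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Catches the error if the platform throws it in the referring document -->
  <catch event="error.badfetch">
    <prompt>Sorry, the next page could not be loaded.</prompt>
    <exit/>
  </catch>
  <form id="start">
    <block>
      <goto next="doc2.vxml"/>
    </block>
  </form>
</vxml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- doc2.vxml: the referred document, naming a root document that does not exist -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="doc2-root-unexisting.vxml">
  <!-- Catches the error if the platform throws it in the referred document -->
  <catch event="error.badfetch">
    <prompt>Sorry, the application root could not be loaded.</prompt>
    <exit/>
  </catch>
  <form id="main">
    <block>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
```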
Email Trail:
From Arnaud Vallee
Chapter 2.4 of the VoiceXML (24 April 2002) specification says: Attributes of filled are: mode Either all (the default), or any. If any, this action is executed when any of the specified input items is filled by the last user input. If all, this action is executed when all of the mentioned input items are filled, and at least one has been filled by the last user input. A <filled> element in an input item cannot specify a mode. namelist The input items to trigger on. For a <filled> in a form, namelist defaults to the names (explicit and implicit) of the form's input items. A <filled> element in an input item cannot specify a namelist; the namelist in this case is the input item name. Note that control items are not permitted in this list. As I understand it, these attributes are not permitted in filled elements which are children of an input item. But the spec does not say what happens in this case: - ignore those attributes? - throw an error (semantic)? Furthermore, control items are not permitted in namelist. I suppose other ECMAScript variables are not permitted either. But how should a voice browser handle that case? Ignore the non-input variable elements or throw an error (semantic)?
Resolution: accepted with modifications
The specification will be modified so that upon encountering a document containing a <filled> element specifying either a 'mode' or 'namelist' attribute as a child of an input item, then an error.badfetch is thrown by the platform. In addition, the specification will also make clear that an error.badfetch is thrown when the document contains a <filled> element with a namelist attribute referencing a control item variable.
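A hedged sketch of the permitted usage follows. The form id, field names, and prompts are assumptions for illustration, and the builtin field types assume a platform that supports them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <field name="account" type="digits">
      <prompt>Which account number?</prompt>
      <!-- Inside an input item: neither mode nor namelist is given.
           The implied namelist is the item's own name, "account". -->
      <filled>
        <prompt>Got account <value expr="account"/>.</prompt>
      </filled>
    </field>
    <field name="amount" type="currency">
      <prompt>How much?</prompt>
    </field>
    <!-- At form level: mode and namelist are allowed, and the
         namelist names input items only. -->
    <filled mode="all" namelist="account amount">
      <prompt>Transferring now.</prompt>
    </filled>
  </form>
</vxml>
```

Under the modified text, adding mode="any" or a namelist to the <filled> inside the "account" field, or listing the name of a control item such as a <block> in the form-level namelist, would be expected to cause error.badfetch.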
Email Trail:
From Arnaud Vallee
The bargeintype property is defined as follows: "speech: The prompt will be stopped as soon as speech or DTMF input is detected. The prompt is stopped irrespective of whether or not the input matches a grammar. " Would this mean that even if no DTMF grammar is active and the user enters a DTMF, the prompt should be stopped?
Resolution: accepted with modifications
Yes. If bargeintype is speech, the prompt is stopped as soon as speech or DTMF input is detected, whether or not the input matches a grammar. Whether DTMF grammars are active does not affect this. Setting inputmodes to voice should prevent DTMF from barging in on the prompts (although some platforms may have difficulty separating in-band DTMF from speech). The specification will be clarified as follows: addition of the words "and irrespective of which grammars are active." to the end of the sentence "The prompt is stopped irrespective of whether or not the input matches a grammar" from table 38.
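As an illustration only, the two properties mentioned in this resolution could be combined as below. The form id, field name, and prompt are assumptions, and the builtin boolean type assumes platform support.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- Any detected input stops the prompt, matched or not -->
    <property name="bargeintype" value="speech"/>
    <!-- Restricting input to voice should keep DTMF from barging in -->
    <property name="inputmodes" value="voice"/>
    <field name="answer" type="boolean">
      <prompt bargein="true">Would you like to take a short survey?</prompt>
    </field>
  </form>
</vxml>
```

As the resolution notes, a platform that cannot separate in-band DTMF from speech may still stop the prompt on a key press.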
Email Trail:
From Guillaume Berche
0- Precise the value of the _dtmf special variable when a grammar element is specified in a choice element. As specified in the section "2.2 Menus", paragraph "Choice element": "If a <grammar> element is specified in <choice>, then the external grammar is used instead of an automatically generated grammar." However, in such case it is not clear what value will be assigned in the _dtmf special variable while executing an enumerate element. Suggested text modification to "2.2.4 ENUMERATE": "This specifier may refer to two special variables: _prompt is the choice's prompt, and _dtmf is the choice's assigned DTMF sequence. **If no DTMF sequence is assigned to the choice element or if a <grammar> element is specified in <choice> then the _prompt variable is assigned the ECMAScript undefined value.**"
Resolution: accepted with modifications
We accept the suggested text but will re-word it more precisely (e.g. '_dtmf' instead of '_prompt').
Email Trail:
From Guillaume Berche
1- Precise semantics of id attribute of form and menu The id attribute is optional according to the schema. However the specifications do not seem to precise how the interpreter should handle dialogs without specified id. Suggested text modification to section "2.1 Forms": "id The optional name of the form. If specified, the form can be referenced within the document or from another document. For instance <form id="weather">, <goto next="#weather">. **If not specified, an internal name is generated by the interpreter instead.**" Suggested text modification to section "2.2 Menus": "id The optional identifier of the menu. It allows the menu to be the target of a <goto> or a <submit>. **If not specified, an internal name is generated by the interpreter instead.**"
Resolution: rejected
If no explicit id is specified, then the developer is not interested in referring to the form or menu element. Whether or not the platform generates an internal name is a vendor-specific issue.
Email Trail:
From Guillaume Berche
2- Precise that <value> should be ignored if the expression resolves to ECMAScript undefined There are cases where it is difficult to know whether a variable (such as special variable as _dtmf) has a non-null value without writing an explicit if statement. To avoid this, it would be convenient if value elements would be silently ignored if their expressions resolved into the ECMAScript undefined value (whereas references to undeclared variables would keep throwing an error.semantic event). Suggested text modification to section "4.1.4 <value> Element": "expr The ECMAScript expression which provides the text to render, or resolves into a special variable such as _prompt or _dtmf as specified in section "2.2 Menus" paragraph "Enumerate element". If the expression resolves into the ECMAScript undefined value, then the value element is silently ignored. However, if the expression refers to an undeclared variable, then an error.semantic event is thrown."
Resolution: rejected
As pointed out, the developer can always write explicit code to check the value of variables. The value of providing a 'convenience' interpretation is not clear to us.
Email Trail:
From Guillaume Berche
3- Precise the value of _prompt when an option has no nested CDATA As specified in "2.3.1.3. Fields Using Option Lists": "The default assignment is the CDATA content of the <option> element with leading and trailing white space removed. If this does not exist, then the DTMF sequence is used instead." Since the value of the _prompt variable is computed from the CDATA content, what values is assigned to the _prompt variable when no CDATA content is available in an option element? If the undefined value is assigned to the _prompt special variable, would a <value expr="_prompt"> element fail? Suggested modification: "if no CDATA is available from the <option> or <choice> element, then the _prompt special variable is assigned the undefined ECMAScript value."
Resolution: rejected
Having considered various alternatives including your suggestion, the group felt that at this stage in the process it is better to leave the behavior undefined and thereby platform-specific. A later version of VoiceXML may provide a more optimal solution.
Email Trail:
From Guillaume Berche
4- precise the semantics of the value attribute of option elements Section "2.3.1.3. Fields Using Option Lists" specifies the following: "value The string to assign to the field's form item variable when a user selects this option, whether by speech or DTMF. The default assignment is the CDATA content of the <option> element with leading and trailing white space removed. If this does not exist, then the DTMF sequence is used instead. " However, the DTMF sequence is optional according to the schema. Consequently, it would be useful to precise the behavior if unspecified Suggested text modification to section "2.3.1.3. Fields Using Option Lists": "Each <option> element contains PCDATA that is used to generate a speech grammar. This follows the grammar generation method described for <choice> in Section 2.2. Attributes may be used to specify a DTMF sequence for each option and to control the value assigned to the field's form item variable. Each option should at least define a DTMF sequence through the dtmf attribute or contain CDATA content specifying the matching speech element, otherwise an error.badfetch event is thrown."
Resolution: accepted with modifications
We will modify the specification so that in the situation where neither CDATA content nor a dtmf sequence is specified, then the default for the value attribute is undefined and the form field item is not filled.
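The fallback order for the value attribute can be sketched as follows; the form id, field name, and option texts are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="dinner">
    <field name="maincourse">
      <prompt>Please select an entree.</prompt>
      <!-- value given explicitly: maincourse is set to "fish" -->
      <option dtmf="1" value="fish">swordfish</option>
      <!-- no value: falls back to the CDATA content, "roast beef" -->
      <option dtmf="2">roast beef</option>
      <!-- no value and no CDATA: falls back to the DTMF sequence, "3" -->
      <option dtmf="3"/>
    </field>
  </form>
</vxml>
```

An option with no value attribute, no CDATA content, and no dtmf attribute is the case this resolution covers: the default is then undefined and the field is not filled.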
Email Trail:
From Guillaume Berche
5- Precise the format of the _dtmf special variable. Section "2.2 Menus", paragraph "Enumerate element" states that "specifier may refer to two special variables: _prompt is the choice's prompt, and _dtmf is the choice's assigned DTMF sequence." However it does not precise how the DTMF sequence is formatted (whether there are white space delimiters that makes the string suitable for direct inclusion within a speech prompt) Suggested text modification to section "2.2 Menus", paragraph "Enumerate element": "_prompt is the choice's prompt, and _dtmf is the choice's assigned DTMF sequence formatted as a string holding the DTMF keystrokes separated by white spaces (making it suitable for inclusion within a speech prompt)"
Resolution: accepted with modifications
The specification will be modified so that the format of _dtmf is a normalized representation of the dtmf sequence (i.e. single whitespace between DTMF tokens).
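A small sketch of how the normalized form would surface in an enumerated prompt. The menu and form ids and the choice texts are assumptions, and the rendering noted in the comment is our reading of the resolution rather than quoted specification text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice dtmf="1" next="#news">news</choice>
    <!-- Written without whitespace; with normalization, _dtmf is
         expected to read "2 5" (one space between DTMF tokens). -->
    <choice dtmf="25" next="#weather">weather</choice>
  </menu>
  <form id="news"><block>Here is the news.</block></form>
  <form id="weather"><block>Here is the weather.</block></form>
</vxml>
```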
Email Trail:
From Guillaume Berche
6- Precise the semantics of the dtmf attribute of option elements Suggested modification to section "2.3.1.3. Fields Using Option Lists": "dtmf An **optional** DTMF sequence for this option. It is equivalent to a simple DTMF <grammar> and DTMF properties (Section 6.3.3) apply to recognition of the sequence. Unlike DTMF grammars, whitespace is optional: dtmf="123#" is equivalent to dtmf="1 2 3 #". **If unspecified, no DTMF grammar is associated to this option, meaning that this option can not be matched using a DTMF**" Rationale: it would make sense to add an option similar to the menu's dtmf attribute so that dtmf sequence is automatically generated. Without this attribute, how would an VXML author prevent the automatic generation of DTMF grammars that may override other grammars (such as links)? In addition, we would also need to specify what happens if a specified option's dtmf attributes overlaps an automatically assigned dtmf. Should this throw an "error.semantic" event as for choice elements or should we rather apply the default grammar precedence algorithm to select the matching element?
Resolution: accepted with modifications
We accept the suggested modification to 2.3.1.3 concerning the description of the dtmf attribute based on an alternative rationale; namely, that this is good clarification independent of the new features you mentioned in your rationale.
Email Trail:
From Guillaume Berche
7- Precise semantics of Clear element. Section "5.3.3 CLEAR" states that "The <clear> element resets one or more form items" However, the definition of the namelist attribute adds that "this [i.e. the namelist] can include variable names other than form items" Besides, in the case where the namelist includes variable names other than form items, what is the variable scope in which the variable must be defined to be cleared? Since a Clear element is an executable which may be included in a catch element, which variable scope does it targets? In other words, would the reset of a non-form item variable target the anonymous, dialog, document or application-level scope? [In addition, the Clear element may be invoked outside of the FIA (such as during the document initialization), in which the notion of active element is not clear, so relying on the scope of the active element as the scope in which a variable should be cleared is ambiguous.] Suggested text modification to Section "5.3.3 CLEAR": "The <clear> element resets one or more form items, and possibly other variables which are not form items. For each specified variable name, the variable is resolved in the closest enclosing scope of the currently active element as described in section "5.1.3 Referencing Variables". To remove ambiguity, each variable name in the namelist may be prefixed with a scope name as described in section "5.1.3 Referencing Variables". Once a declared variable has been identified as declared in a given scope S, its value is assigned the ECMAScript undefined value. In addition, if the variable name corresponds to a form item in scope S, then the form item's prompt counter and event counters are reset."
Resolution: accepted with modifications
We accept that the clear element should be clarified as your text suggests. However, we will modify the wording so that (a) variable references are resolved relative to the current scope as described in section 5.1.3, and (b) in the case of initialization, variable references are handled the same as for other ECMAScript variables.
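For illustration, a <clear> whose namelist mixes form items with an ordinary dialog-scope variable might look like this. All names are assumptions, and the builtin field types assume platform support.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- An ordinary variable in dialog scope, not a form item -->
    <var name="note" expr="'none'"/>
    <field name="quantity" type="digits">
      <prompt>How many would you like?</prompt>
    </field>
    <field name="ok" type="boolean">
      <prompt>You asked for <value expr="quantity"/>. Is that right?</prompt>
      <filled>
        <if cond="!ok">
          <!-- Each name is resolved relative to the current scope
               (section 5.1.3); here all three resolve in dialog scope. -->
          <clear namelist="quantity ok note"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Clearing the two fields sends the form interpretation algorithm back to "quantity"; "note" is simply reset along with them.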
Email Trail:
From Guillaume Berche
8- Precise that var name attribute does not support scope prefixes Suggested text modification to section "5.3.1 VAR": "name The name of the variable that will hold the result. **Unlike the name attribute of assign element, this attribute should not contain dots (and in particular a scope prefix). The scope in which the variable is defined is determined from the position in the document at which the var element is declared.**"
Resolution: accepted with modifications
We accept the suggestion but will modify the text style for consistency with the rest of the document.
Email Trail:
From Guillaume Berche
9- Precise that the assign's name attribute does support scope prefixes The scope in which a variable is resolved is currently not clear. The accepted scope prefix in the name attribute is also not clear. Suggested text modification to section "5.3.2 ASSIGN" "name The name of the variable being assigned to. As specified in section "5.1.2 Variable Scopes", the corresponding variable should have been previously declared otherwise an error.semantic event is thrown. By default, the scope in which the variable is resolved is the closest enclosing scope of the currently active element. To remove ambiguity, the variable name may be prefixed with a scope name as described in section "5.1.3 Referencing Variables". Note however that the name must refer to a variable and can not refer to a property of an ECMAScript object or can not be a complex ECMAScript expression."
Resolution: accepted with modifications
We accept the suggested text modification but not the final line beginning "Note however", since it is permissible to assign to the property of an object; the second example in 5.3.2 makes this clear - <assign name="document.mycost" expr="document.mycost+14"/>.
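The contrast between the two name attributes can be sketched as below. The variable name mycost follows the example cited from 5.3.2; the form id, the fee variable, and the prompt are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- <var>: no scope prefix; document scope follows from position -->
  <var name="mycost" expr="0"/>
  <form id="addfee">
    <block>
      <!-- Declared inside a block, so this one lives in anonymous scope -->
      <var name="fee" expr="14"/>
      <!-- <assign>: a scope prefix is allowed and removes ambiguity -->
      <assign name="document.mycost" expr="document.mycost + fee"/>
      <prompt>Your total is <value expr="document.mycost"/>.</prompt>
    </block>
  </form>
</vxml>
```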
Email Trail:
From Guillaume Berche
10- Precise evaluation order of log attributes versus nested text/value, and constraints on attributes Suggested modification to section "5.3.13 LOG": "label An **optional** string which may be used, for example, to indicate the purpose of the log. expr An **optional** ECMAscript expression evaluating to a string. " "The <log> element may contain any combination of text (CDATA) and <value> elements. The generated message consists of the concatenation of the evaluation of the ECMAscript expression followed in their respective order by the nested text and the string form of the value of the "expr" attribute of the <value> elements."
Resolution: accepted with modifications
We accept the clarification of 'optional' but not the last paragraph describing the order of evaluation - the order is already specified as document order.
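A minimal <log> sketch, with an assumed label and variable; the optional expr attribute is simply omitted here, and the nested text and <value> element contribute to the message in document order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <block>
      <var name="attempts" expr="2"/>
      <!-- label is optional; content is evaluated in document order -->
      <log label="login">Attempts so far: <value expr="attempts"/></log>
    </block>
  </form>
</vxml>
```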
Email Trail:
From Guillaume Berche
11- Precise ordering of anonymous grammar generated for dtmfterm As specified in section "2.3.6. RECORD": "The <record> element contains a 'dtmfterm' attribute as a developer convenience. A 'dtmfterm' attribute with the value 'true' is equivalent to the definition of a local DTMF grammar which matches any DTMF input. " However, it is legal to have nested grammars in a record element. For instance, a DTMF grammar that matches only the # key. It is not clear which grammar would match because the precedence is not described. Suggested text modification to section "2.3.6. RECORD": "The <record> element contains a 'dtmfterm' attribute as a developer convenience. A 'dtmfterm' attribute with the value 'true' is equivalent to the definition of a local DTMF grammar which matches any DTMF input. Any nested grammar element will have precedence over this anonymous local grammar (even though usefulness of such nested grammar is not clear)."
Resolution: accepted with modifications
We accept the suggested clarification of the 'dtmfterm' attribute, but reject the suggested priority order when both the attribute and local grammars are specified. That is, we maintain that the dtmfterm attribute has priority over local grammars. Developers who want full control can omit the dtmfterm attribute and write their own local grammar.
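One way to take full control, sketched under assumptions: dtmfterm is set to false explicitly so that the anonymous any-key grammar does not apply, and a local DTMF grammar matching only the pound key is supplied instead. The ids, durations, and prompt are illustrative, and whether a given platform ends the recording on this grammar match should be checked against its documentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="voicemail">
    <record name="msg" beep="true" maxtime="30s" dtmfterm="false">
      <!-- Local DTMF grammar: only the pound key is matched -->
      <grammar mode="dtmf" version="1.0" root="stop">
        <rule id="stop" scope="public">#</rule>
      </grammar>
      <prompt>Record your message after the beep, then press pound.</prompt>
    </record>
  </form>
</vxml>
```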
Email Trail:
From Guillaume Berche
12- Precise the semantics of the timeout property for the record element The specs currently state the following "A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property. If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown. " However, how the "recording begins" is not clearly defined. I would assume that when the platform supports speech recognition during recording, the recording begins as soon as speech is provided by the remote end. However the specification is not clear on whether in this case the platform should remove the silence from the end of the first beep prompt up to the first recognised speech. It is not clear either whether background noise or music should trigger beginning of recording. For platforms not supporting speech recognition during recording I believe this timeout property should be ignored. Suggested text modification to section "2.3.6. RECORD": "A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property. If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown. When the platform supports detection of silence, the recording begins as soon as leading silence (following the 'beep' tone if defined) completes. Note that whether the recording would include the leading silence is platform specific. For platforms not supporting silence detection, this property is ignored and no <noinput> event is ever raised during a recording."
Resolution: accepted with modifications
We believe that when recording begins is clearly defined: in Section 2.3.6, it states:
"A recording begins at the earliest after the playback of any prompts (including the 'beep' tone if defined). As an optimization, a platform may begin recording when the user starts speaking."
i.e. the recording may include initial silence, etc. if the platform does not use the optimization (e.g. voice activity detection). With the optimization, the recording can begin with the user's speech. Whether music or other audio triggers voice activity detection is platform-specific. Note that this behavior applies independent of whether speech recognition is supported (while the recording and recognition processes use the same audio data stream, these processes are independent and therefore their voice activity detection mechanisms may be different).
The timeout interval is clearly defined: "A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property."
The timeout interval has an effect on both recording and recognition (which are logically independent).
For recording, the impact is specified in "If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown." In the case of non-optimized recording, recording always begins after prompt playback, so <noinput> would never be thrown. With optimized recording, however, <noinput> may be thrown if no voice activity is detected before timeout interval elapses.
For recognition, the situation is more complex. We are modifying the specification (due to implementation report feedback) so that if recognition is supported during recording (this is an optional feature), then only non-local speech grammars are active. If a non-local speech grammar is matched by audio input, then execution is immediately transferred to its enclosing element. This raises the issue of whether a <noinput> or <nomatch> could be thrown by the recognition process. A <noinput> could be generated if the timeout interval has elapsed. A <nomatch> could be generated if the audio triggers recognition but does not match the active grammar. Our belief is that throwing these events by the recognition process during recording is undesirable and not what VoiceXML authors expect. Consequently, we are considering clarifying the specification to make it clear that <noinput> and <nomatch> events are never thrown from the recognition process during recording.
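The recording-side behavior can be illustrated with a sketch like the following; the form id, durations, and wording are assumptions. The <noinput> handler is only reachable on platforms that use the voice-activity optimization described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <record name="msg" beep="true" maxtime="10s" finalsilence="4000ms"
            type="audio/x-wav">
      <!-- The timeout interval starts after the prompt and the beep -->
      <prompt timeout="5s">Record a message after the beep.</prompt>
      <!-- Non-optimized recording starts right after the beep, so this
           handler would never run; with voice activity detection it
           runs if nothing is detected within 5 seconds. -->
      <noinput>I didn't hear anything, please try again.</noinput>
    </record>
  </form>
</vxml>
```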
Email Trail:
From Guillaume Berche
13- Precise that maxtime record attribute is mandatory and has no defaults Suggested text modification to section "2.3.6. RECORD": "maxtime The maximum duration to record. **This attribute must be specified as it has no default value. If not specified an error.badfetch event is thrown.**"
Resolution: rejected
The default value of the maxtime attribute is already specified as platform-dependent (see Table 16).
Email Trail:
From Guillaume Berche
14- Precise that if value is used outside of a prompt element it inherits default prompt parameters The prompt element defines that if its attributes are not specified, they default to values specified by properties. However, for the value element, the specification do not precise how default values are computed. Suggested text addition to section "4.1.4 <value> Element": "The manner in which the value attribute is played is controlled by the surrounding speech synthesis markup in the case the expression resolves to a string. In the case the expression resolves to a special variable such as _prompt, then the prompt attributes are inherited from the enclosing element of the definition of the referenced element. If no surrounding prompt element nor SSML tag is available, then the default attributes of a prompt element (such as bargein, timeout or language) are applied. Consequently, the two following constructions are equivalent. <catch event="noinput"> <value expr="'please retry'"> </catch> <catch event="noinput"> <prompt> <value expr="'please retry'"> </prompt> </catch> "
Resolution: accepted with modifications
We accept that clarification is required but not the proposed modification. We will clarify in 4.1.2 that where prompt content is specified without a <prompt> element, the attributes are defined as specified in Table 33.
Email Trail:
From Guillaume Berche
1- Precise that buffered non-matching DTMF are discarded when an ASR grammar matches. It is unclear in the specifications whether the following document <form name="form1"> <field> <grammar src="builtin:grammar/boolean"/> <grammar src="builtin:dtmf/digits?length=4"/> <field> <filled> <goto next="#form2"> </filled> </form> <form name="form2"> <field> <grammar src="builtin:dtmf/digits?length=1"/> <field> <filled> <prompt>thanks for the dtmf</prompt> </filled> <noinput> <prompt>DTMF was discarded</prompt> </noinput> </form> By pressing the 1 key and speaking "yes" and waiting for the input timeout. Should the interpreter play the "thanks for the dtmf" prompt or the "DTMF was discarded" prompt? Suggested solution: specify that partially buffered data are flushed in case of grammar match in another mode.
Resolution: accepted with modifications
We will modify the specification to make it clear that this is a platform-specific issue (i.e. platforms may differ in whether or not they discard buffered non-matching DTMF when an ASR grammar matches).
Email Trail:
From Guillaume Berche
2a- Rationale for not accepting local ruleref in inline SRGS grammars? Can you please provide rationale for not accepting ruleref elements with pure fragment URLs? Why would this be rejected in grammars provided inline in VXML documents? What is the reason driving this restriction and forcing to use remote grammars for any grammar using private rules?
Resolution: accepted with modifications
This is probably a misunderstanding on both sides. In section 3.1.1.4, the paragraph beginning "When referencing an external grammar, the value of src attribute ...", describes which values for the src attribute are permitted and which are not (the last paragraph of this section). It makes no statement about inline grammars. In particular, "Local rule reference: a fragment-only URI is not permitted. (See definition in Section 2.2.1 of [SRGS]). A fragment-only URI value for the src attribute causes an error.semantic event." is intended to indicate that it is not permitted to have a fragment-only URI value for the src attribute in a VoiceXML <grammar> element. The simplest clarification is to start the last paragraph of this section "**And** the following are the forms of rule reference defined by [SRGS] that are not supported in VoiceXML 2.0. ...". For <ruleref>s in inline grammars, it is possible to refer to rules within the same grammar, or in an external grammar. What is not possible is to reference rules within a different inline grammar in a VoiceXML document, since the URI would then point at a VoiceXML document, not a grammar document. We believe that this is clearly implied by VoiceXML and SRGS (especially with the clarification above) and that a separate clarification is not required.
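As an illustration of what "fragment-only" means for the src attribute of a <grammar> element, the following sketch classifies a URI reference with Python's standard `urllib.parse`. The helper name `is_fragment_only` and the sample URIs are hypothetical; an interpreter may well perform this check differently.

```python
from urllib.parse import urlsplit

def is_fragment_only(uri: str) -> bool:
    """True if the URI reference consists of nothing but a fragment (e.g. '#rule')."""
    parts = urlsplit(uri)
    has_other_component = bool(parts.scheme or parts.netloc or parts.path or parts.query)
    return parts.fragment != "" and not has_other_component

# A src value of this form is the case that causes error.semantic.
print(is_fragment_only("#digits"))                             # True
# An external grammar with a rule name is a different form of reference.
print(is_fragment_only("http://example.com/g.grxml#digits"))   # False
```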
Email Trail:
From Guillaume Berche
3- Precise that when transitionning to a document (without fragment in the URI) and the transitionned document has no form, then the interpreter exits Rationale: it can not be requested that every document have at least a dialog (because a root application may only define variables or links), however when transitionning to a document (without specifying a dialog) and this document has no dialog defined, then the execution stops. Suggested modification to section "5.3.7 GOTO" "If the form item, dialog or document to transition to is not valid (i.e. the form item, dialog or document does not exist), an error.badfetch must be thrown. Note that for errors which occur during a dialog or document transition, the scope in which errors are handled is platform specific. For errors which occur during form item transition, the event is handled in the dialog scope. If the document to transition has no dialog defined (and no specific dialog was specified), then the execution stops."
Resolution: rejected
We believe the specification is already precise: a target document without a dialog is not valid, so an error.badfetch is thrown, as already stated in 5.3.7.
Email Trail:
From Guillaume Berche
4- Precise Prompt selection algorithm when the Prompt element appears as executable content. It does not seem clear from the examples provided in section "4.1.6 Prompt Selection" whether the "prompt tappering" mechanism is supposed to be applied when a prompt element appears as executable content. For instance in the following case: <field ...> <help> <prompt count="1"> prompt 1 </prompt> <prompt count="3"> prompt 2 </prompt> <goto next="#form2"/> <prompt count="4"> prompt 3 </prompt> </help> </field> Which prompt should be heard when the prompt counter of the current form item (the field in this same) is 4? Applying the algorithm described in section "4.1.6 Prompt Selection" would result in having the "prompt 3" speech text to be heard, however it would be very confusing from the VXML author point of view because it would be expected that after the goto element no more executable content would be executed as specified in Appendix C in the definition of the "execute" term. Suggested modification to section "4.1.6 Prompt Selection": "Each input item, <initial>, and menu has an internal prompt counter that is reset to one each time the form or menu is entered. Whenever the system uses a prompt, its associated prompt counter is incremented. This is the mechanism supporting tapered prompts within form item elements. **When a prompt element is specified as executable content (e.g. inside a catch or filled element) then its count element is ignored and all prompts contained in this element as queued in document order)**"
Resolution: rejected
As stated in 5.3.5 the count attribute on prompts in executable content is meaningless.
Email Trail:
From Guillaume Berche
5- Precise the value of name$.inputmode when a transfer is not interrupted by user input Suggested modification to "Table 22: <transfer> Shadow Variables" "name$.inputmode The input mode of the terminating command (dtmf or voice) or **undefined if the transfer was not interrupted by a grammar match**"
Resolution: accepted
We will apply the suggested modification.
Email Trail:
From Guillaume Berche
6- Correct typo in example of Section "4.1.3 Audio Prompting" The extension of the file should rather be .vxml to not introduce confusion. "<goto next="./make_bid.html"/>"
Resolution: accepted
We will correct the typo.
Email Trail:
From Guillaume Berche
7- Precise that alternate audio is recursive: According to the schema, the following vxml fragment is legal <prompt> <audio src="http://www.dummy.org/main.wav" > <audio src="http://www.dummy.org/alternate1.wav" > <audio src="http://www.dummy.org/alternate2.wav"/ > </audio> </audio> </prompt> Can you please confirm my understanding of the specification: I understand that if both main.wav and alternate1.wav can not be played, but alternate2.wav can be played, then alternate2.wav will be played and no error will be thrown.
Resolution: accepted
Your understanding is correct. No modifications will be made to the text since we believe this is sufficiently clear already.
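The confirmed behavior can be sketched as a recursive search through the nested <audio> elements. The sketch below is an assumption-laden illustration: `first_playable` and the `can_play` callback are hypothetical names, and only nested <audio> alternates are handled (alternate text content is ignored).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

def first_playable(audio: ET.Element,
                   can_play: Callable[[str], bool]) -> Optional[str]:
    """Return the src of the first playable <audio>, descending into alternates."""
    src = audio.get("src")
    if src is not None and can_play(src):
        return src
    # The element content is the alternate; try nested <audio> children in order.
    for child in audio.findall("audio"):
        found = first_playable(child, can_play)
        if found is not None:
            return found
    return None

fragment = ET.fromstring(
    '<prompt>'
    '<audio src="http://www.dummy.org/main.wav">'
    '<audio src="http://www.dummy.org/alternate1.wav">'
    '<audio src="http://www.dummy.org/alternate2.wav"/>'
    '</audio></audio></prompt>'
)
# Pretend that only the innermost alternate can be fetched and played.
playable = {"http://www.dummy.org/alternate2.wav"}
chosen = first_playable(fragment.find("audio"), lambda src: src in playable)
print(chosen)  # http://www.dummy.org/alternate2.wav
```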
Email Trail:
From Guillaume Berche
8- Precise behavior of submit if undeclared/unvalid variables are references in submit's namelist attributes The specifications section "5.3.8 SUBMIT" states the following "The list of variables to submit. By default, all the named input item variables are submitted. If a namelist is supplied, it may contain individual variable references which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced." It does not specify the expected behavior in case an undeclared variable or an invalid variable name is referenced in the namelist attribute. Suggested modification to section "5.3.8 SUBMIT": "namelist The list of variables to submit. By default, all the named input item variables are submitted. If a namelist is supplied, it may contain individual variable references which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced. **If an undeclared or invalid variable name is referenced then an "error.semantic" event is thrown**"
Resolution: accepted with modifications
We will modify the specification to clarify that an error.semantic is thrown when an undeclared variable is referenced, including reference within the namelist of a submit element (as well as exit, return, and subdialog elements).
Email Trail:
From Guillaume Berche
11- Typo in section "2.3.6. RECORD" The second sentence of the extract below seems incomplete, I don't get the impact of the timeout interval on having a record variable unfilled. "If no audio is collected during execution of <record>, then the record variable remains unfilled (note). This can occur, for example, when DTMF or speech input is received during prompt playback or the timeout interval (if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout). "
Resolution: accepted
We will modify the text so that the second sentence reads "This can occur, for example, when DTMF or speech input is received during prompt playback or *before* the timeout interval *expires* ..."
Email Trail:
From Guillaume Berche
12- Typo in "Last Call Disposition of Comments" The table in section "2. Comments" has an invalid "disposition" content: all items are marked as accepted whereas this is not the case.
Resolution: accepted
No action since this document will be replaced by a CR disposition of comments document.
Email Trail:
From Max Froumentin
I would like to object that all the examples in VoiceXML2 come with an XML declaration and a schemaLocation attribute. It makes the language appear unneccesarily complex. The Hello World example would be much simpler as: <vxml xmlns="http://www.w3.org/2001/vxml" version="2.0"> <form> <block>Hello World!</block> </form> </vxml> schemaLocation bothers me more than by just making the examples hard to read. It suggests that the declaration is mandatory (which the XMLSchema refutes), or even that the use of the schema is.
Resolution: rejected
It is good practice to provide the XML declaration (even though it is not mandatory). Providing the schemaLocation allows documents to be validated automatically by various tools, although, as you correctly point out, neither the attribute nor the schema is mandatory.
Email Trail:
From Matt Porter
this has to do Guillaume's question... with let me elaborate on an issue with <record> that i dont understand. Given this dialog... <?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0"> <form> <record name="msg" beep="true" maxtime="10s" finalsilence="4000ms" dtmfterm="true" type="audio/x-wav"> <prompt timeout="5s">Record a message after the beep.</prompt> <noinput> I didn't hear anything, please try again. </noinput> </record> </form> </vxml> what if the user does not say anything ( no audio is collected because of silence detection or whatever ), but terminates the recording with a DTMF. it seems to me the "termchar" shadow variable should hold the key they pressed, and the "noinput" event would still be thrown...is this correct? The <record> section seems to need more clarification....
Resolution: rejected with modifications
If dtmfterm is set to true, recording is terminated when any DTMF key is pressed ("Any DTMF keypress matching an active grammar terminates recording"), but if no audio has been collected, then the record variable is not filled ("If no audio is collected during execution of <record>, then the record variable remains unfilled.") and consequently no shadow variables are assigned. The FIA then applies as normal without a noinput event being thrown; in your example, the prompt would be read again and another attempt at recording initiated. This is analogous to the situation with complex grammar results which do not assign any values to form input item variables: no noinput event is thrown and the FIA applies as normal. Finally, note that there may be information available in these situations via application.lastresult$, as described in 5.1.5. We will modify the specification to make it clearer that information may be available via application.lastresult$ in these situations.
Email Trail:
From John Voger
Under section 3.1.1.3 Grammar Weight. The last paragraph contains ..... real speech and textual data on a paricular platform." Please replace "paricular" with "particular"
Resolution: accepted
We will correct the typo.
Email Trail:
From Philippe Le Hegaret
[ECMASCRIPT] " Standard ECMA-262 ECMAScript Language Specification ", Standard ECMA-262, December 1999. See http://www.ecma.ch/ecma1/STAND/ECMA-262.htm should read [ECMASCRIPT] " Standard ECMA-262 ECMAScript Language Specification ", Standard ECMA-262, December 1999. See http://www.ecma-international.org/publications/standards/ECMA-262.HTM
Resolution: accepted
We will update the reference.
Email Trail:
1. Several complex type definitions in vxml.xsd have <choice> model groups that contain a single particle consisting of a reference to a group. For example: <xsd:complexType name="basic.event.handler" mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:group ref="executable.content" /> </xsd:choice> <xsd:attributeGroup ref="EventHandler.attribs" /> </xsd:complexType> Since the particle in the group executable.content is also a <choice>, this content model becomes <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:choice> <xsd:group ref="audio"/> <xsd:element ref="assign"/> <xsd:element ref="clear"/> ... ... </xsd:choice> </xsd:choice> The outer <choice> is clearly redundant. The complex type definition can be simplified to: <xsd:complexType name="basic.event.handler" mixed="true"> <xsd:group ref="executable.content" minOccurs="0" maxOccurs="unbounded" /> <xsd:attributeGroup ref="EventHandler.attribs" /> </xsd:complexType> We think such a simplification makes the schema easier to follow and we recommend the change.
Resolution: accepted
Change applied.
Email Trail:
2. Some contents may usefully be constrained more tightly than the schema now constrains them. For example, the <if> element is declared as: <xsd:element name="if"> <xsd:complexType mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:group ref="executable.content" /> <xsd:element ref="elseif" /> <xsd:element ref="else" /> </xsd:choice> <xsd:attributeGroup ref="If.attribs" /> </xsd:complexType> </xsd:element> Since there is no order or occurence constraint, instances such as the following are all valid, which seems too flexible. <if> ... <else/> ... <else/> ... <elseif/> ... </if> The content can be changed to the following to ensure that all <elseif> elements occur before <else> and that there is no more than one <else> element: <xsd:element name="if"> <xsd:complexType mixed="true"> <xsd:sequence> <xsd:group ref="executable.content minOccurs="0" maxOccurs="unbounded" /> <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="elseif" /> <xsd:group ref="executable.content minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> <xsd:sequence minOccurs="0" maxOccurs="1"> <xsd:element ref="else" /> <xsd:group ref="executable.content minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> </xsd:sequence> <xsd:attributeGroup ref="If.attribs" /> </xsd:complexType> </xsd:element> (In passing, we note that on general principles, we believe the language would be easier to describe and use if the 'elseif' and 'else' elements (and a 'then' element) were not empty elements followed by appropriate executable content, but non-empty elements which contained the appropriate executable content. We recognize that this may not be a feasible change at this stage in the life of VoiceXML.)
Resolution: accepted
Change applied. We will look into changing the if-then-else structure in a future version of the language.
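The ordering constraint in the revised content model can be checked with a short sketch. Here each child of <if> is reduced to a single letter and matched against a regular expression that mirrors the proposed sequence; the function name and the letter encoding are hypothetical and exist only for illustration.

```python
import re

# x = any executable content element, i = <elseif/>, e = <else/>.
# Executable content, then any number of elseif sections, then at most one else section.
IF_CONTENT = re.compile(r"x*(?:ix*)*(?:ex*)?")

def valid_if_children(tags):
    """Check the order of child element names inside an <if> element."""
    letters = "".join(
        "i" if tag == "elseif" else "e" if tag == "else" else "x"
        for tag in tags
    )
    return IF_CONTENT.fullmatch(letters) is not None

print(valid_if_children(["prompt", "elseif", "prompt", "else", "prompt"]))  # True
print(valid_if_children(["prompt", "else", "prompt", "else", "elseif"]))    # False
```

The second call corresponds to the over-flexible instance quoted in the comment, with two <else/> elements followed by an <elseif/>.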
Email Trail:
3. The element "output" in vxml.xsd is declared as abstract, and not used or referenced anywhere else. The declaration may be removed.
Resolution: accepted
Element removed.
Email Trail:
4. The VariableName.datatype in vxml-datatypes.xsd has a pattern: xsd:pattern value="['$'\c]+" /> The character '$' in the range doesn't need the quotation mark, and as written the value will accept single quotation marks where a dollar sign or \c is expected. We suspect this is not intended.
Resolution: accepted
Change applied.
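The effect of the stray quotation marks can be reproduced approximately in Python. Two assumptions apply: Python's `re` module has no `\c` escape, so XML name characters are approximated by `[\w.:-]`, and the corrected pattern is assumed to be `[$\c]+`, which the comment implies but this document does not state.

```python
import re

# [\w.:-] is only an approximation of the XML Schema \c (name character) escape.
AS_WRITTEN = re.compile(r"['$'\w.:-]+")   # stands in for ['$'\c]+
ASSUMED_FIX = re.compile(r"[$\w.:-]+")    # stands in for [$\c]+ (assumed correction)

for candidate in ["user$.utterance", "'quoted'"]:
    print(candidate,
          bool(AS_WRITTEN.fullmatch(candidate)),
          bool(ASSUMED_FIX.fullmatch(candidate)))
# user$.utterance True True
# 'quoted' True False
```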
Email Trail:
5. The ContentType.datatype in vxml-datatypes.xsd is defined as a list of string. Since string may contain whitespaces, the definition should perhaps be changed to a list of token; this is less subject to misunderstanding by readers of the schema.
Resolution: accepted
Change applied.
Email Trail:
6. According to the comments in the annotations, VariableNames.datatype, RestrictedVariableNames.datatype, and EventNames.datatype are lists of atomic VariableName.datatype, RestrictedVariableName.datatype and EventNames.datatype respectively. We believe they should be defined as such rather than as NMTOKENS or other types: <xsd:simpleType name="RestrictedVariableNames.datatype"> <xsd:annotation> <xsd:documentation>space separated list of restricted variable names </xsd:documentation> </xsd:annotation> <xsd:list itmeType="RestrictedVariableName.datatype"/> </xsd:simpleType> <xsd:simpleType name="VariableNames.datatype"> <xsd:annotation> <xsd:documentation>space separated list of variable names including shadow variables</xsd:documentation> </xsd:annotation> <xsd:list itemType="VariableName.datatype"> </xsd:simpleType> <xsd:simpleType name="EventNames.datatype"> <xsd:annotation> <xsd:documentation>space separated list of EventName.datatype</xsd:documentation> </xsd:annotation> <xsd:list itmeType="EventName.datatype"/> </xsd:simpleType>
Resolution: accepted
Change applied.
Email Trail:
7. Some suggestions for simple type Repeat-prob.datatype in grammar-core.xsd: a. The base type might better be made decimal instead of float. It should be noted that decimal is not a subtype of float and their mappings from the lexical space to the value space are different. For example, '1.1' may be rounded to some float value different from exactly 1.1. Such behavior is not expected in decimal. b. The maxInclusive value is 1.0, while the patterns allow any positive values less than 10. They should be made consistent. c. The pattern ([0-9]+)? should probably be replaced with the equivalent pattern [0-9]*.
Resolution: accepted
Changes applied.
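Points (a) and (c) of this comment can be illustrated with a short sketch. It relies on the fact that xsd:float is an IEEE 754 single-precision type; the helper `as_float32` is a hypothetical name, and Python's `Decimal` is used only as a stand-in for xsd:decimal.

```python
import re
import struct
from decimal import Decimal

def as_float32(lexical: str) -> float:
    """Round a lexical value to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(lexical)))[0]

# (a) As a single-precision float, '1.1' is not stored exactly;
#     as a decimal it is.
print(as_float32("1.1") == 1.1)               # False
print(Decimal("1.1") == Decimal("11") / 10)   # True

# (c) The two patterns accept exactly the same strings.
for s in ["", "0", "42", "4a"]:
    assert bool(re.fullmatch(r"([0-9]+)?", s)) == bool(re.fullmatch(r"[0-9]*", s))
```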
Email Trail:
8. The commented-out pattern constraint in RestrictedVariableName.datatype in vxml.xsd needs to be removed or fixed.
Resolution: accepted
Change applied.
Email Trail:
From Guillaume Berche
1- precise behavior when only activated grammars are disabled by "inputmodes" property In the following example, what is the expected behavior? Should an error.semantic be thrown as would if no grammar was activated as described in section "3.1.4 Activation of Grammars"? Should the grammars considered rather as activated but would not match as described in section "6.3.6 Miscellaneous Properties" (inputmodes property) ", and thus lead to a nomatch event to be thrown? Section "3.1.4 Activation of Grammars" states that "If no grammars are active when an input is expected, the platform must throw an error.semantic event". Section "6.3.6 Miscellaneous Properties" states that "For instance, voice-only grammars may be active when the inputmode is restricted to DTMF. Those grammars would not be matched, however, because the voice input modality is not active. " <menu> <prompt> Choose wind speed and after temperature then finaly ask for leave choice test. </prompt> <choice next="#exacte_rain"> rain humidity </choice> <choice next="#approx_wind"> wind speed </choice> <choice next="#approx_weat">temperature celcius</choice> <choice next="#exacte_leave">Leave choice test </choice> </menu> Suggested modification to Section "6.3.6 Miscellaneous Properties" (inputmodes definition) "[..] For instance, voice-only grammars may be active when the inputmode is restricted to DTMF. Those grammars would not be matched, however, because the voice input modality is not active. If among all grammars active none can be matched because their associated input modality is not enabled, then a nomatch event is thrown."
Resolution: rejected
Your question is not very clear, but given a menu with active speech grammars and no user input, a noinput event would be thrown. This also applies if the input mode is set to DTMF only; an error.semantic event would not be thrown, since the statement in 3.1.4 only applies when there are no active grammars, and there are active grammars in this example, even though their input mode is disabled. In essence, grammar activation is separate from input mode activation.
Email Trail:
From Guillaume Berche
2- out-of-date fetching algorithm: maxage defaults to property value Section "6.1.2 Caching" specifies the following: "[...] If a maxage value is provided, [...] Otherwise, If the resource has expired, Perform maxstale check. Otherwise, use the cached copy." I understand that the predicate "If a maxage value is provided" is always true, as there are default values for the different maxage properties (audiomaxage, documentmaxage, grammarmaxage, objectmaxage, scriptmaxage...) as specified in section "6.3.5 Fetching Properties" Suggested modification to section "6.1.2 Caching": remove the "If a maxage value is provided, " part and the corresponding "otherwise" statement.
Resolution: accepted with modifications
We will clarify that maxage and maxstale properties are allowed to have no default value whatsoever. If the value is not provided by the author, and the platform does not provide a default value, then the value is undefined and the 'Otherwise' clause of the algorithm applies. All other properties must provide a default value (either as given by the specification or by the platform).
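The resolution can be pictured as two small steps: resolving the effective property value, and falling through to the 'Otherwise' clause when maxage is undefined. The sketch below covers only those two steps. The function names are hypothetical, the branch for a provided maxage is left out because it is elided in the quoted text, and the outcome of the maxstale check (use the cached copy on success, fetch otherwise) is an assumption.

```python
from typing import Optional

def effective_value(author: Optional[float],
                    platform: Optional[float]) -> Optional[float]:
    """The author's value wins, then the platform default; None means undefined."""
    return author if author is not None else platform

def otherwise_clause(expired: bool, maxstale_check_passes: bool) -> str:
    """The 'Otherwise' branch of the quoted algorithm (maxage undefined)."""
    if expired:
        # "Perform maxstale check"; what follows the check is assumed here.
        return "use cached copy" if maxstale_check_passes else "fetch"
    return "use cached copy"

maxage = effective_value(author=None, platform=None)
if maxage is None:
    print(otherwise_clause(expired=False, maxstale_check_passes=False))  # use cached copy
```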
Email Trail:
From Guillaume Berche
3- schema forbids empty catch event name Section "5.2.4 Catch Element Selection " specifies the following "The name of a thrown event matches the catch element event name if it is an exact match, a prefix match or the catch event name is not specified. A prefix match occurs when the catch element event attribute is a token prefix of the name of the event being thrown, where the dot is the token separator, all trailing dots are removed, and the empty string matches everything. " However, the schema forbids an empty string event specification as illustrated below: <xsd:element name="catch"> <xsd:complexType> <xsd:complexContent mixed="true"> <xsd:extension base="basic.event.handler"> <xsd:attribute name="event" type="EventNames.datatype"/> </xsd:extension> </xsd:complexContent> </xsd:complexType> </xsd:element> <xsd:simpleType name="EventNames.datatype"> <xsd:annotation> <xsd:documentation>space separated list of EventName.datatype</xsd:documentation> </xsd:annotation> <xsd:restriction base="xsd:NMTOKENS"/> </xsd:simpleType> The schema specifications (http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#NMTOKENS) defines NMTOKENS as the following: "3.3.5 NMTOKENS [Definition:] NMTOKENS represents the NMTOKENS attribute type from [XML 1.0 (Second Edition)]. The value space of NMTOKENS is the set of finite, non-zero-length sequences of NMTOKENs."
Resolution: accepted with modifications
We will modify the description in 5.2.4 to make it clearer that event names cannot be empty strings (i.e. event="" is illegal) but can be unspecified (i.e. <catch> ....) and can prefix match when dots are removed (e.g. event="." will match any event).
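The matching rule quoted from 5.2.4 can be sketched as follows. This is an illustration for a single event name only (the event attribute may hold a space-separated list), the function name `catch_matches` is hypothetical, and schema validation, which rejects event="", is a separate step not modelled here.

```python
from typing import Optional

def catch_matches(catch_event: Optional[str], thrown: str) -> bool:
    """Return True if a <catch> with this event name selects the thrown event.

    catch_event is None when the event attribute is not specified.
    """
    if catch_event is None:
        return True                      # unspecified: matches every event
    name = catch_event.rstrip(".")       # all trailing dots are removed
    if name == "":
        return True                      # e.g. event=".": empty string matches everything
    wanted = name.split(".")
    actual = thrown.split(".")
    # Exact match, or token-prefix match with the dot as token separator.
    return actual[:len(wanted)] == wanted

print(catch_matches("error", "error.semantic"))  # True
print(catch_matches("err", "error.semantic"))    # False
print(catch_matches(".", "noinput"))             # True
```

Note that "err" does not match "error.semantic": the prefix comparison is made token by token, not character by character.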
Email Trail:
From Greg FitzPatrick
4.1 Prompts (from Version 2.0 - 20 February 2003) does not mention the version attribute as described by the DTD and schema <!ATTLIST prompt bargein %boolean; #IMPLIED bargeintype %bargeintype; #IMPLIED cond %expression; #IMPLIED count %integer; #IMPLIED xml:lang NMTOKEN #IMPLIED timeout %duration; #IMPLIED xml:base %uri; #IMPLIED version CDATA #FIXED "1.0" >
Resolution: rejected
Since the version is fixed, we see no reason to discuss it further in the text.
Email Trail:
From Guillaume Berche
1- Precise that timeout attribute of the <prompt/> element only applies if the prompt element is not empty The <prompt> element is designed to queue prompt element for play. As a side effect it also set the timeout for the next input collection. However, the specifications do not describe the expected behavior if the prompt element is empty: should tghe side-effect still apply? This would make little sense: this would be a synonym for a "set next timeout" command without queueing any prompt. Suggested addition to section "4.1 Prompts": "Note that an empty prompt such as "<prompt [...]/>" will be silently ignored and in particular would not set the timeout of the next input collection phase" Dependency: IR testsuite case #539 (389/389.vxml)
Resolution: rejected
The timeout property applies as normal even if there is no content in the prompt.
Email Trail:
From Guillaume Berche
2- Precise behavior if "cond" attribute of form item does not resolve into an EcmaScript boolean value The VXML specification state the following in section "2.1.3 Form Item Variables and Conditions": "cond An expression to evaluate in conjunction with the test of the form item variable. If absent, this defaults to true, or in the case of <initial>, a test to see if any input item variable has been filled in." However, the specifications do not detail the expected behavior if the expression does not resolve to a boolean. Suggested modification to "2.1.3 Form Item Variables and Conditions" "cond An expression to evaluate in conjunction with the test of the form item variable. If absent, this defaults to true, or in the case of <initial>, a test to see if any input item variable has been filled in. If the evaluation of the expression results into an error or does not resolve into an ECMAScript boolean value, then an error.semantic event is thrown."
Resolution: rejected
In the specific description of cond attributes it states that it is "An expression that must evaluate to true after conversion to boolean in order for the form item to be visited". Boolean conversion of an ECMAScript expression always returns either true or false.
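To show why the conversion always yields true or false, the sketch below emulates the ECMAScript ToBoolean rules using Python stand-ins. The `UNDEFINED` sentinel and the mapping of Python values to ECMAScript values are assumptions made purely for illustration.

```python
import math

UNDEFINED = object()   # stand-in for the ECMAScript undefined value

def to_boolean(value) -> bool:
    """Emulate ECMAScript ToBoolean for a few Python stand-in values."""
    if value is UNDEFINED or value is None:       # undefined and null
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # +0, -0 and NaN convert to false; every other number to true.
        is_nan = isinstance(value, float) and math.isnan(value)
        return not (value == 0 or is_nan)
    if isinstance(value, str):
        return value != ""                        # only the empty string is false
    return True                                   # every object converts to true

print(to_boolean("false"))   # True: a non-empty string
print(to_boolean(0))         # False
```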
Email Trail:
From Ufuk Kayserilioglu
We are trying to implement the <record> tag in our Voice Browser in a comformant way; however, we cannot understand what, clearly, are the requirements from a browser for this tag. My points can be summed up as follows: I) The main confusion arises form the behaviour of bargein="true" prompts in <record>. According to Fig 7 in section 2.3.6 (lower left corner) bargein controls apply to audio queued within <record>. On the other hand, a few lines below, it is stated: "A /recording begins/ at the earliest after the playback of any prompts (including the 'beep' tone if defined). As an optimization, a platform may begin recording when the user starts speaking." Now, if recording does not begin DURING the prompt playback, then how can those prompts be barged-in? Or, should we understand that if the user barges-in with voice during prompt playback THEN recording should be started? A clarification of how <record> and audio queued within <record> with barge-in interacts, in our opinion, is badly needed. II) The second comment that baffles us in the spec is: "If no audio is collected during execution of <record>, then the record variable remains unfilled (note <http://www.w3.org/TR/voicexml20/#unfilled_record>). This can occur, for example, when DTMF or speech input is received during prompt playback or the timeout interval (if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout)." (Section 2.3.6) This comment is weird in two ways: 1) How can record variable be unfilled "when DTMF or speech input is received during ... the timeout interval"? This seems to be the primary method of filling a record variable. 2) We cannot grasp, in any way, how it would be possible to achieve what the spec author has stated within the parantheses. 
If there is preceeding <field> with zero timeout then: i) if the user starts speaking while the prompts in the <field> are playing then the input goes to the processing of the field and will be matched to whatever grammar is specified for it, or will throw a "nomatch", ii) else if the user waits for the prompts to finish, then a "noinput" event will be thrown. In neither case, will the input be going into the <record> tag that succeeds the <field> tag. If the spec is trying to say something else then it should be clearly explained.
Resolution: rejected with modifications
I) Prompts can be barged in on if active DTMF grammars are defined (active speech grammars too, but the ability to combine recognition and recording may be removed from the specification due to a lack of implementation support). II.1) This can occur with DTMF input when recording is triggered by voice activity detection (i.e. as a platform optimization, instead of recording starting immediately after prompt playback, recording begins only when voice activity is detected). II.2) We agree this is confusing (it was intended to cover another use case), so we will remove the parenthetical text "(if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout)".
Email Trail:
From Mark Clark
In section 4.1.5 you make the following statement: "In the case where several prompts are queued, the bargein attribute of each prompt is honored during the period of time in which that prompt is playing" I am concerned about the scenario where a barge in *true* prompt is followed by a barge in *false* prompt while waiting for speech input. I have not yet encountered a speech recognition engine that allows recognition to ignore input once recognition waiting has begun. It would seem reasonable to me that for Speech Recognition, if a barge in *true* is followed by a barge in *false* prompt, that the *false* (and any subsequent false settings) would be ignored by the speech recognition until the next transition state. I see no problem for the reverse condition. If the first prompt is barge in *false*, then just play the prompt without starting recognition. Only when a barge in *true* prompt is encountered is the recognition waiting started.
Resolution: rejected
It is possible to implement *true* to *false* bargein by re-starting recognition. We realize this may not be the perfect solution, but we are reluctant to change it at this stage in the standards process. A future version of the language may provide a better solution.
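For illustration, the scenario under discussion can be sketched as a field that queues a bargein *true* prompt followed by a bargein *false* prompt. This is a minimal sketch only: the grammar file name drink.grxml and the prompt wording are assumptions made for this example and do not come from the specification or the email thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <!-- Hypothetical external grammar; the file name is an assumption. -->
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <!-- The caller may interrupt this prompt. -->
      <prompt bargein="true">Welcome back.</prompt>
      <!-- The caller may not interrupt this prompt. -->
      <prompt bargein="false">Please note that our menu has changed.</prompt>
      <!-- The caller may interrupt this prompt again. -->
      <prompt bargein="true">What would you like to drink?</prompt>
    </field>
  </form>
</vxml>
```

Under the approach described in the resolution, a platform might stop recognition when the second prompt begins and restart it when the third prompt begins, so that input during the bargein *false* prompt does not interrupt playback.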
Email Trail:
From Robert Barkan
In all revs of the VXML 2.0 spec, Appendix J (Changes from VoiceXML 1.0), "Modified Elements" section, it says: added "error.unsupported.language" pre-defined error type (5.3.6) However, the reference to section 5.3.6 points to "REPROMPT", which doesn't have this error listed, and I don't understand any scenarios where REPROMPT could throw this event. We are working on a project porting a product from VXML 1.0 to 2.0, and if this change actually does impact the REPROMPT element, we need to understand it better. Alternately, is it possible that this is a typo in the spec, and that instead of "5.3.6", it should really refer to section "5.2.6" - "Event Types" which would make complete sense?
Resolution: accepted
It is a typo and will be corrected to 5.2.6.
Email Trail:
From Arnaud Vallee
Analysis: I need some clarification on this point: 2.3.7.1 Blind Transfer. "With a blind transfer, an attempt is made to connect the original caller with the callee. Any prompts preceding the <transfer>, as well as prompts within the <transfer>, are queued and played before the transfer attempt begins; bargein properties apply as normal." As the transfer is modal, a bargein can happen only if we define a grammar under transfer. But what is the consequence of matching the grammar with a recognition result while the prompts are played? What will be the value of the transfer item variable?
[Teemu Tingander] As you said, transfer is always modal, so the grammars inside the <transfer> element are field item grammars, and as such they should fill the field item specified by the name attribute. But because this is a transfer, and the specification says that a match in a grammar of the transfer should terminate the transfer, my opinion is that the field should be filled with 'near_end_disconnect' and the shadow variables set as they should be: f$.duration=0.0, f$.utterance=<what-was-recognized>, f$.inputmode=inputmode. You have a point here: the specification really does distinguish between the cases "The possible outcomes for a bridge transfer before the connection to the callee is established are:" and "The possible outcomes for a bridge transfer after the connection to the callee is established are:", and it is not clearly said what should be done if bargein happens. This case should be defined in the first of those cases. The same issue arises with blind as well as bridged transfer, and I used 'near_end_disconnect' to indicate that the caller has requested to cancel or disconnect the call. As for tagging in those grammars, if someone really finds a reason for it, could they explain it in more depth?
[Ken Rehor] Your summary is correct. The result should be 'near_end_disconnect' if a caller cancels a transfer by barging in on a prompt, for both blind and bridge transfers. This is because prompts are queued and played to completion before the call transfer begins in either case. The shadow variables would be filled as you describe. This will be clarified in a future revision of the specification.
Resolution: accepted
Following the thread responses by Teemu Tingander and Ken Rehor, the specification will be modified to indicate that the transfer item variable will have the value 'near_end_disconnect' if a caller cancels a transfer by barging in on a prompt, for both blind and bridge transfers, and that the shadow variables will be filled as described above.
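The behaviour agreed in this resolution can be sketched as follows. This is a hedged illustration, not text from the specification: the item name mycall, the destination number, and the grammar file cancel.grxml are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="xfer">
    <!-- Destination number and item name are assumptions for this sketch. -->
    <transfer name="mycall" dest="tel:+1-555-555-0100" bridge="true">
      <!-- Queued and played before the transfer attempt begins. -->
      <prompt bargein="true">
        Connecting you now. Say cancel to stay on this line.
      </prompt>
      <!-- Hypothetical grammar matching "cancel". -->
      <grammar src="cancel.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Per the resolution, barging in on the prompt with a grammar
             match sets the item variable to 'near_end_disconnect'. -->
        <if cond="mycall == 'near_end_disconnect'">
          <prompt>
            Transfer cancelled. I heard <value expr="mycall$.utterance"/>.
          </prompt>
        </if>
      </filled>
    </transfer>
  </form>
</vxml>
```

The shadow variable mycall$.utterance would then hold what was recognized, and mycall$.duration would be 0, as described in the thread above.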
Email Trail:
From Guillaume Berche
9- Inconsistent variable scope description: In section "5.1.2 Variable Scopes", the dialog variable scope is defined as: "dialog Each dialog (<form> or <menu>) has a dialog scope that exists while the user is visiting that dialog, and which is visible to the elements of that dialog. Dialog variables are declared by <var> and <script> child elements of <form> and by the various form item elements. " However, a block element is also a form item; as such, variables defined in it are part of the dialog scope. Then the anonymous variable scope is defined as: "(anonymous) Each <block>, <filled>, and <catch> element defines a new anonymous scope to contain variables declared in that element." This definition contradicts the first definition, which specified that the block element had its variables assigned to the dialog scope. Correction suggestion to section "5.1.2 Variable Scopes": "(anonymous) Each <filled>, and <catch> element defines a new anonymous scope to contain variables declared in that element." (note block was removed from the description to make it consistent with the definition of the dialog scope)
Resolution: accepted with modifications
We reject the suggested correction, but accept that a clarification is required in the definition of dialog scope, namely, that form element item names are being referred to.
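The distinction drawn in this resolution can be illustrated with a short sketch. The form and variable names below are assumptions chosen for the example: the form item name of a <block> lives in the dialog scope, while a <var> declared inside the <block> lives in that block's anonymous scope.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="scopes">
    <!-- Declared by a <var> child of <form>: dialog scope. -->
    <var name="total" expr="0"/>

    <!-- The form item variable "setup" is itself in dialog scope. -->
    <block name="setup">
      <!-- Declared inside the block: anonymous scope of this block only. -->
      <var name="tmp" expr="5"/>
      <assign name="total" expr="total + tmp"/>
    </block>

    <block>
      <!-- "tmp" is not visible here; "total" is reachable via dialog scope. -->
      <prompt>The total is <value expr="dialog.total"/>.</prompt>
    </block>
  </form>
</vxml>
```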
Email Trail:
From Guillaume Berche
10- Catch applies only to input items, not to control items. Section "5.2.2 Catch" states that "The catch element associates a catch with a document, dialog, or form item." However, the schema for block is the following: " <xsd:element name="block"> <xsd:complexType mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:group ref="executable.content"/> </xsd:choice> <xsd:attributeGroup ref="Form-item.attribs"/> </xsd:complexType> </xsd:element>" Suggested correction to Section "5.2.2 Catch": "The catch element associates a catch with a document, dialog, or a form item except for blocks."
Resolution: accepted
We will apply the suggested correction.
Email Trail:
From Guillaume Berche
15- Inconsistent behavior of option vs. choice.
- Choice can have nested grammars that override the default grammar, whereas options cannot.
- Choice can have nested audio as alternate prompts, whereas options cannot.
- Menus can have a dtmf boolean flag to turn on automatic DTMF grammar generation, whereas options within fields cannot.
Suggested modification: upgrade options so that they are equivalent to choice and differ only in the treatment of a match (which in the case of options does not trigger a transition).
Resolution: rejected
Making them consistent at this stage in the specification is problematic. However, we will consider this issue for a future version of VoiceXML.
Email Trail:
From Guillaume Berche
2b- Schema requires that grammar rule roots and [private] rule ids be unique among grammar elements in the same VXML document. The VXML schema imposes the following constraint on the root attribute of the grammar element: <xsd:simpleType name="Root.datatype"> <xsd:annotation> <xsd:documentation>does not expression the constraint that NULL VOID GARBAGE are illegal as rule name</xsd:documentation> </xsd:annotation> <xsd:restriction base="xsd:IDREF"> <xsd:pattern value="[^.:-]+"/> </xsd:restriction> </xsd:simpleType> I understand this implies that it is illegal to have two different grammars referring to distinct root rules with the same name. In addition, the VXML schema imposes the following constraint on the id attribute of the rule element: <xsd:simpleType name="Id.datatype"> <xsd:annotation> <xsd:documentation> does not expression the constraint that NULL VOID GARBAGE are illegal as rule name </xsd:documentation> </xsd:annotation> <xsd:restriction base="xsd:ID"> <xsd:pattern value="[^.:-]+"/> </xsd:restriction> </xsd:simpleType> I understand this implies that it is illegal in the same VXML document to have two different grammars with private rules that have the same id. To me this defeats the purpose of SRGS private rules (even if referencing an inline private rule is currently forbidden in VXML, as noted in remark #2a). Suggested modification: investigate modification of the VXML schema to waive the restrictions described above.
Resolution: rejected
This is a common problem with using ids on elements from multiple namespaces within the same document. The W3C Schema working group are aware of the problem and we may be able to provide a better solution in future versions of VoiceXML.
Email Trail:
From Mark Clark
Is there really no way to specify a URI for an external SSML file in VXML 2.0? I am looking for an "src=" attribute of the <prompt> element that could specify resources whose mime types are either "application/ssml+xml" or "text/plain". This would be analogous to the "src=" attribute of the <grammar> element that takes a URI specifying a resource whose mime type is "application/srgs+xml" or "application/srgs". Currently it appears that all Speech markup must be in line. Am I missing something?
Resolution: rejected
The specification doesn't provide a mechanism to reference external SSML documents. This will be considered for a future version of the language.
Email Trail:
From Pavel Cenek
1. What did you say?
--------------------
In a real dialog in a noisy environment, or when a user is not concentrating, it can happen that the user does not properly understand a system prompt and wants the system to repeat it. The last prompt should be repeated without increasing the prompt counter, so that exactly the same prompt is played again. It is not possible to do this in the current version of VoiceXML. I suggest the following solution: add an attribute to the <reprompt> tag that allows prompts to be repeated without increasing the prompt counter.
Resolution: rejected
This will be considered for a future version of the language.
Email Trail:
From Pavel Cenek
2. detection and handling of multiple fills of one slot
-------------------------------------------------------
VoiceXML provides no means of detecting and handling the situation where a slot value is re-specified. In real conversation it can happen that a participant specifies a piece of information twice, with different values. The normal reaction is that the other participant detects this situation and asks the first one for clarification. VoiceXML has no means of doing this. I suggest the following solution: define a standard event, e.g. slotredefinition.slotname, that would be thrown in such a case, with the old value contained in the _message variable in the <catch> tag's anonymous scope.
Resolution: rejected
This can be done within the current specification by storing the values and, when new input is received, comparing the stored values with the latest values. A future version of the language may provide a more flexible approach along the lines you suggest.
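One possible way to apply the store-and-compare approach is sketched below. It is only an illustration under stated assumptions: the form-level grammar booking.grxml, the field names, and the helper variable stored_city are hypothetical and are not taken from the specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="booking">
    <!-- Hypothetical form-level grammar that can fill "city" and "day". -->
    <grammar src="booking.grxml" type="application/srgs+xml"/>
    <!-- Last accepted value of the "city" slot; initially undefined. -->
    <var name="stored_city"/>

    <field name="city">
      <prompt>Which city are you travelling to?</prompt>
      <filled>
        <if cond="stored_city != undefined &amp;&amp; stored_city != city">
          <!-- The slot was re-specified with a different value. -->
          <prompt>
            Earlier you said <value expr="stored_city"/>,
            but now I heard <value expr="city"/>.
          </prompt>
          <!-- Forget both values so that the field is collected again. -->
          <assign name="stored_city" expr="undefined"/>
          <clear namelist="city"/>
        <else/>
          <!-- First fill, or the same value repeated: remember it. -->
          <assign name="stored_city" expr="city"/>
        </if>
      </filled>
    </field>

    <field name="day">
      <prompt>On which day?</prompt>
    </field>
  </form>
</vxml>
```

In this sketch, a caller who fills "city" again through the form-level grammar while being asked for "day" triggers the comparison in the <filled> element, and the application asks for the city once more instead of silently overwriting it.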
Email Trail: