W3C

VoiceXML 2.1 Disposition of Comments

This version:
April 11, 2007
Editor:
Matt Oshry, Tellme Networks

Abstract

This document details the responses made by the Voice Browser Working Group to issues raised during the Last Call (beginning 15 September 2006 and ending 06 October 2006) review of Voice Extensible Markup Language (VoiceXML) Version 2.1. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

Status

This document of the W3C's Voice Browser Working Group describes the disposition of comments as of 28th November 2006 on the Last Call Working Draft of Voice Extensible Markup Language (VoiceXML) Version 2.1. It may be updated, replaced or rendered obsolete by other W3C documents at any time.

For background on this work, please see the Voice Browser Activity Statement.

Table of Contents

1. Introduction
2. Summary
   2.1 Clarifications, Typographical, and Other Editorial
   2.2 Technical Errors
   2.3 Requests for Change to Existing Features
   2.4 New Feature Requests

1. Introduction

This document describes the disposition of comments in relation to Voice Extensible Markup Language (VoiceXML) Version 2.1 (http://www.w3.org/TR/2006/WD-voicexml21-20060915/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.

The full set of issues raised for the Voice Extensible Markup Language (VoiceXML) Version 2.1 since 19th August 2003, their resolution and in most cases the reasoning behind the resolution are available from http://www.w3.org/Voice/Group/vxml21/cr/voicexml21-cr.html [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review. It includes issues that were submitted outside the official review period, up to 28th November 2006.

Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.

2. Summary

Item | Commentator            | Nature                           | Status   | Disposition
-----|------------------------|----------------------------------|----------|---------------------
R116 | Shane Smith            | Clarification / Typo / Editorial | Rejected | Accepted
R117 | Tobias Gobel           | Clarification / Typo / Editorial | Rejected | Accepted
R118 | Srinivas R Thota       | Clarification / Typo / Editorial | Accepted | Accepted (no reply)
R119 | Titus von der Malsburg | Feature Request                  | Deferred | Accepted
R120 | Greg Inglis            | Feature Request                  | Deferred | Accepted
R121 | Teemu Tingander        | Clarification / Typo / Editorial | Rejected | Accepted
R122 | Srinivas R Thota       | Feature Request                  | Deferred | Accepted
R123 | Rethish Kumarps        | Clarification / Typo / Editorial | Accepted | Accepted (no reply)
R124 | Harbhanu               | Change to Existing Feature       | Rejected | Rejected
R125 | Petr Kuba              | Technical Error                  | Accepted | Accepted (no reply)
R126 | Jason Hanna            | Clarification / Typo / Editorial | Accepted | Accepted (no reply)

2.1 Clarifications, Typographical, and Other Editorial

Issue R116

From Shane Smith:

Hey Folks,
I think the behavior of application.lastresult$ needs clarification. From the 2.0 spec:
The number of application.lastresult$ elements is guaranteed to be greater than or equal to one and less than or equal to the system property "maxnbest". If no results have been generated by the system, then "application.lastresult$" shall be ECMAScript undefined.
The behavior on most platforms is that this array only exists when a valid result occurs. But in 2.1 we introduce new behavior concerning utterance recording. While recording user utterances on recognition is valuable, it's even *more* valuable to gather invalid recordings: things that triggered a nomatch. In fact, the example from the LCWD shows exactly this:
   <nomatch count="3">
     <var name="the_recording"
        expr="application.lastresult$.recording"/>
     <submit method="post"
       enctype="multipart/form-data"
       next="upload.cgi"
       namelist="the_recording"/>
   </nomatch>

Even reading the first three paragraphs of Section 7 gives the impression that you need an actual valid recognition for these shadow variables to become available. I have yet to find a 2.1-compliant vendor that offers anything in the lastresult array when a nomatch occurs, and I think we should offer some clarification on this change from 2.0. If it's in 2.1, then I missed it, sorry.

Resolution: Rejected

According to 5.1.5 of VXML2, "[a]ll of the shadow variables described above (aka application.lastresult$) are set immediately after any recognition. In this context, a <nomatch> event counts as a recognition, and causes the value of "application.lastresult$" to be set, though the values stored in application.lastresult$ are platform dependent." The VBWG believes that no further clarification is required.

Email Trail:

Issue R117

From Tobias Gobel:

I have tested a number of platforms so far that support utterance recording. All except one fill lastresult$ on a NoMatch event, and the one that currently doesn't has said it will fix this soon. The 2.0 spec says:
"All of the shadow variables described above are set immediately after any recognition. In this context, a <nomatch> event counts as a recognition, and causes the value of "application.lastresult$" to be set" So it explicitly mentions that lastresult$ must be set in a NoMatch scenario. To make things clearer, though, I agree that the 2.1 spec could and should also explicitly mention this fact.
In the example you mention, the application.lastresult$.recording is first assigned to a variable and then put in the submit's namelist. Is this really required? Again, all except one platform I've tested support having the application.lastresult$.recording itself in the namelist, without assigning it to a variable first. The spec should be clearer about this, too.

Resolution: Rejected

The example is informative. application.lastresult$.recording can be submitted without the use of a temporary variable. The VBWG feels that no clarifying text is required here. The behavior of <submit> is described adequately in 5.3.8 of VXML2:
"If a namelist is supplied, it may contain individual variable references which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced."
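
The direct approach can be sketched as follows, mirroring the example from R116 (upload.cgi is an illustrative URL):

```xml
<nomatch count="3">
  <!-- The shadow variable can be referenced directly in the namelist;
       no intermediate <var> assignment is required -->
  <submit method="post"
          enctype="multipart/form-data"
          next="upload.cgi"
          namelist="application.lastresult$.recording"/>
</nomatch>
```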

Email Trail:

Issue R118

From Srinivas R Thota:

Section 5, "Using <data> to Fetch XML Without Requiring a Dialog Transition" (http://www.w3.org/TR/voicexml21/#sec-data), of the latest VoiceXML 2.1 document says that only a read-only subset of DOM bindings is needed, but "Appendix D ECMAScript Language Binding for DOM" (http://www.w3.org/TR/voicexml21/#sec-data-dom) specifies methods beyond the read-only subset, such as set, insert, remove, and create. These were not present in the previous version (http://www.w3.org/TR/2005/CR-voicexml21-20050613/). Is this a bug in the new document?

Resolution: Accepted

This was not intended. Thanks for pointing out the error. We'll correct it as soon as possible.

Email Trail:

Issue R121

From Teemu Tingander:

Can an implementation be conformant with the VoiceXML 2.1 specification if the DOM exposed by <data/> is read-write? If not, please explain why (and I don't mean "because the latest candidate says 'read-only subset'"). Also, did you evaluate the possibility of using E4X (ECMA-357) instead of the W3C DOM? (If I remember correctly, there was some discussion about this.) The E4X interface is more ECMAScript-oriented than the DOM mapping. Please also be more precise about what the DOM subset is; createElement is hardly a read-only method. By "read-only", do you mean that the user is not able to create new documents?

Resolution: Rejected

Implementations may expose more than the read-only subset of DOM L2 if they wish, but only the read-only subset described in Appendix D is required for interoperability. As acknowledged in R118 [1], it was an editorial blunder that Appendix D describes more than the read-only subset (e.g. Document.createElement). This will be corrected in the next draft of VXML21. Regarding E4X, as stated in [2], the DOM approach is tried and tested, is a W3C Recommendation, and is widely implemented. E4X is still young, and was even more so at the time the WG designed this feature of VoiceXML 2.1.
[1] http://lists.w3.org/Archives/Public/www-voice/2006JulSep/0065.html
[2] http://lists.w3.org/Archives/Public/www-voice/2006JulSep/0032.html
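
A minimal sketch of the intended read-only usage (the URL and document shape are invented for illustration; assume the response's root element contains a single text node):

```xml
<form>
  <block>
    <!-- Fetch an XML document without a dialog transition -->
    <data name="quote" src="http://example.com/getquote.xml"/>
    <!-- Only read-only DOM Level 2 accessors such as documentElement,
         firstChild, and nodeValue are required for interoperability -->
    <prompt>
      The current price is
      <value expr="quote.documentElement.firstChild.nodeValue"/>.
    </prompt>
  </block>
</form>
```

Mutating methods such as Document.createElement may be offered as a platform extension, but portable documents should rely only on the read-only subset.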

Email Trail:

Issue R123

From Rethish Kumarps:

The <mark> element in VoiceXML 2.1 specifies that the "markname" in application.lastresult$ is to be set as the "name of the mark last executed by the SSML processor before barge-in occurred".
But there could be a cross-over scenario when the speech synthesizer and speech recognizer are not part of the same MRCP session.
In this case, the speech recognizer would send a START-OF-INPUT on barge-in to the client, which would send a BARGE-IN-OCCURRED to the speech synthesizer.
But there could be a delay in sending the latter message, and a new <mark> could have been executed by this time.
So, is the mark information received before sending the BARGE-IN-OCCURRED to be honoured, or is the last one to be honoured?

Resolution: Accepted

The statement from the VoiceXML 2.1 specification, "the mark last executed by the SSML processor before barge-in occurred", is protocol- and implementation-agnostic. To comply with the spec, your implementation must use the mark that was executed closest to, but no later than, the moment the actual barge-in occurred as determined by the recognizer. In your case, you will likely need to use timestamp information to choose the correct mark and set the lastresult$ properties to the correct values.
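
For reference, a minimal fragment (prompt text and grammar URI invented for illustration) that surfaces the mark shadow variables defined by VoiceXML 2.1 after barge-in:

```xml
<field name="choice">
  <grammar src="choice.grxml" type="application/srgs+xml"/>
  <prompt bargein="true">
    <mark name="greeting"/> Welcome.
    <mark name="menu"/> Say sales or support.
  </prompt>
  <filled>
    <!-- markname/marktime identify the last mark executed before barge-in -->
    <log>mark: <value expr="application.lastresult$.markname"/>,
         elapsed: <value expr="application.lastresult$.marktime"/> ms</log>
  </filled>
</field>
```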

Email Trail:

Issue R126

From Jason Hanna:

I'm trying to fetch the schema documents from http://www.w3.org/TR/voicemail21/ and I'm finding two XSDs that appear to be unpublished in the 2.1 tree.
* grammar-core.xsd * synthesis-core.xsd
I checked the 2.0 schema tree, and sure enough they were found. I presume it's safe to use these. Should these two documents be published into the 2.1 schema folder?

Resolution: Accepted

The W3C Webmaster has added the missing files. The complete set of VoiceXML 2.1 schema files, including those dependencies, is available from http://www.w3.org/TR/2006/WD-voicexml21-20060915/vxml-schema.zip. This .zip is referenced from Appendix B of the VoiceXML 2.1 spec. If you do reference the remote schema, we recommend that you reference it by dated URI (http://www.w3.org/TR/2006/WD-voicexml21-20060915/vxml.xsd) to ensure stability. Latest-version links are inherently volatile.
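
For instance, a document that validates against the dated schema might begin as follows (the schemaLocation pairing shown here follows the usual XML Schema convention):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1"
      xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
        http://www.w3.org/TR/2006/WD-voicexml21-20060915/vxml.xsd">
  <form>
    <block>Hello world.</block>
  </form>
</vxml>
```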

Email Trail:

2.2 Technical Errors

Issue R125

From Petr Kuba:

We greatly appreciate the changes to the specification of <foreach> made in the LCWD of 15 September 2006. However, we found a few discrepancies between what is written in the specification and what is written in the corresponding XML Schema.
1. Content of <foreach> in executable content except within a <prompt>
----------------------------------------------------------------------
Original text (first paragraph of Section 6):
"Within executable content, except within a <prompt>, the <foreach> element may contain any elements of executable content"
Comment: We believe it was meant that it may contain any elements of executable content and nothing more. However, the foreach-full.type definition in the XML Schema, which applies to <foreach> in executable content except within a <prompt>, also allows the following children: break, emphasis, mark, phoneme, prosody, say-as, sub, voice, p, s. This is probably not what was meant; it would introduce an inconsistency because in other situations the named elements must be enclosed in a <prompt> element.
Proposed change:
Remove the elements that cannot appear in executable content from the XML Schema.
2. Differences in <prompt> and <enumerate> content
--------------------------------------------------
The text in the first paragraph of Section 6 explicitly enumerates the differences between <prompt> and <enumerate> content but omits the <foreach> tag.
Original text:
"When <foreach> appears within a <prompt> element, it may contain only those elements valid within <enumerate> (i.e. the same elements allowed within <prompt> less <meta>, <metadata>, and <lexicon>); ..."
Proposed change: "When <foreach> appears within a <prompt> element, it may contain only those elements valid within <enumerate> (i.e. the same elements allowed within <prompt> less <meta>, <metadata>, <lexicon>, and <foreach>); ..."
3. Nesting of <foreach> in <prompt>
-----------------------------------
The XML Schema allows the <foreach> tag to be only a direct child of the <prompt> tag. Thus, nesting is not possible. Is there any rationale behind not allowing nesting of <foreach> in prompts? Allowing the <foreach> tag to be a child of another <foreach> tag in prompts would cause no harm and could be sometimes helpful. Moreover, nesting of <foreach> within executable content except within a prompt is possible.
Proposed change:
We do not propose any change in this respect; we would just like some rationale for the current situation. Perhaps it could be explicitly stated in the spec that nesting of <foreach> in prompts is not possible?
Any comments to our proposals are appreciated.

Resolution: Accepted

The schema will be updated as follows:

Email Trail:

2.3 Requests for Change to Existing Features

Issue R124

From Harbhanu:

The VoiceXML 2.1 specification extends the <grammar> element to support dynamically generated grammar URIs (by adding the 'srcexpr' attribute).
I suggest that this should only be allowed for form-item-level grammars. Otherwise, for all grammars declared at dialog scope (or above), the URI has to be re-evaluated and the grammar has to be defined again for each input item. This actually defies the concept of keeping a grammar at dialog level.
Also, I am not able to find any use case for this (at dialog level or above) in any VoiceXML document.
Since this is not added as a restriction in the VoiceXML 2.1 specification, all compliant processors will have to support it anyway. So, this could be added as a restriction for the <grammar> element with the 'srcexpr' attribute.

Resolution: Rejected

As stated in the VoiceXML 2.1 specification, the expression associated with the srcexpr attribute of the <grammar> tag "must be evaluated each time the grammar needs to be activated." The ability to specify form and application level grammars is critical for real-world applications (e.g. banking, airline reservations, portal) where application behavior is dependent on user information (i.e. personalization). While such applications can be built using different VoiceXML pages for each user, this is not efficient. In contrast, one can attain optimum efficiency by using a single VoiceXML page populated with state from CCXML (session.connection.ccxml.values), <data>, or <script>. This state can then be used to drive the user interaction including both personalized prompts and grammars at any scope (e.g. a global menu implemented using a <form> or <link> declared at application scope). The VoiceXML page can be cached according to HTTP 1.1 caching semantics.
With respect to ASR processing time (i.e. the time it takes the recognizer to fetch and compile a grammar), ASR engines can and do implement caching, so if evaluating srcexpr yields a URI that was previously used by a recognizer, the recognizer need not perform much additional work to re-load the grammar in pre-compiled form. Even if 'srcexpr' is not employed, HTTP cache control can lead to forced re-compilation of grammars. As an extreme example, consider an HTTP server supplying a grammar that sets max-age to 1 and responds with a 200 and a fresh grammar resource. Execution of each grammar 'src' request, regardless of scope, may then require a new grammar to be compiled at runtime. Even without the use of srcexpr, runtime compilation of application- and dialog-scoped grammars is required of VoiceXML 2.x implementations.
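
The single-page pattern described above can be sketched as follows (all URLs and the shape of the profile document are hypothetical; <data> and getAttribute are used only as illustration):

```xml
<form id="main">
  <!-- Fetch per-caller state without a page transition -->
  <data name="profile" src="http://example.com/profile.xml"/>
  <field name="choice">
    <!-- srcexpr is re-evaluated each time the grammar needs to be activated,
         so the active grammar can track the caller's profile -->
    <grammar type="application/srgs+xml"
             srcexpr="'http://example.com/grammars/' + profile.documentElement.getAttribute('tier') + '.grxml'"/>
    <prompt>What would you like to do?</prompt>
  </field>
</form>
```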

Email Trail:

2.4 New Feature Requests

Issue R119

From Titus von der Malsburg:

Maybe I'm doing something wrong, but my VoiceXML code tends to get awfully redundant. I use subdialogs, event handling, and XML macros (courtesy of a nice XSLT hack) to factor out common code as much as possible. Unfortunately, there are many cases where I could use subdialogs if it were possible to parameterize them with grammars. While it's possible to pass a grammar to a subdialog, the grammar element doesn't provide a means for defining grammars programmatically. (srcexpr is applicable in some situations, but in others the grammar cannot be determined until runtime, so srcexpr isn't very helpful.) What's missing here is an expr attribute like the one provided by other elements.
I can think of many situations beyond subdialogs where expr could be put to good use. Since VoiceXML 2.1 already introduced the srcexpr attribute for the grammar element, I suspect there are no technical reasons that rule out an expr attribute. Therefore I humbly propose adding it in some future version of VoiceXML.

Resolution: Deferred

Thank you for suggesting this feature. The VBWG will consider it for a future version of VoiceXML.

Email Trail:

Issue R120

From Greg Inglis:

I agree with Titus von der Malsburg's comments of 8/10/06. With the addition of the <data> tag in 2.1, there seems to be some acknowledgement that VoiceXML needed more client-side processing ability (this reduces the number of complete VoiceXML page transitions in an application). This is somewhat like the AJAX paradigm.
As an alternative, maybe using a <value> tag within <grammar> tags to dynamically specify the grammar content would be a bit less ambiguous? For example:
...
<grammar> 
   <value expr="GenerateSomeGrammar()"/>
</grammar>

Resolution: Deferred

Thank you for suggesting this feature. The VBWG will consider it for a future version of VoiceXML.

Email Trail:

Issue R122

From Srinivas R Thota:

Unlike the CCXML <fetch> element, the VoiceXML <data> element has no type attribute. Could we also have a type attribute on the <data> element, so that application authors can explicitly state the type of data they are processing, and so that type validation can be performed as with <fetch>?

Resolution: Deferred

Thank you for suggesting this feature. The VBWG will consider it for a future version of VoiceXML.

Email Trail: