This document details the responses made by the Voice Browser Working Group to issues raised during the Last Call (beginning 15 September 2006 and ending 06 October 2006) review of Voice Extensible Markup Language (VoiceXML) Version 2.1. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the firstname.lastname@example.org (archive) mailing list.
This document of the W3C's Voice Browser Working Group describes the disposition of comments as of 28th November 2006 on the Last Call Working Draft of Voice Extensible Markup Language (VoiceXML) Version 2.1. It may be updated, replaced or rendered obsolete by other W3C documents at any time.
For background on this work, please see the Voice Browser Activity Statement.
This document describes the disposition of comments in relation to Voice Extensible Markup Language (VoiceXML) Version 2.1 (http://www.w3.org/TR/2006/WD-voicexml21-20060915/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.
The full set of issues raised for the Voice Extensible Markup Language (VoiceXML) Version 2.1 since 19th August 2003, their resolution and in most cases the reasoning behind the resolution are available from http://www.w3.org/Voice/Group/vxml21/cr/voicexml21-cr.html [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review. It includes issues that were submitted outside the official review period, up to 28th November 2006.
Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.
|R116||Shane Smith||Clarification / Typo / Editorial||Rejected||Accepted|
|R117||Tobias Gobel||Clarification / Typo / Editorial||Rejected||Accepted|
|R118||Srinivas R Thota||Clarification / Typo / Editorial||Accepted||Accepted (no reply)|
|R119||Titus von der Malsburg||Feature Request||Deferred||Accepted|
|R120||Greg Inglis||Feature Request||Deferred||Accepted|
|R121||Teemu Tingander||Clarification / Typo / Editorial||Rejected||Accepted|
|R122||Srinivas R Thota||Feature Request||Deferred||Accepted|
|R123||Rethish Kumarps||Clarification / Typo / Editorial||Accepted||Accepted (no reply)|
|R124||Harbhanu||Change to Existing Feature||Rejected||Rejected|
|R125||Petr Kuba||Technical Error||Accepted||Accepted (no reply)|
|R126||Jason Hanna||Clarification / Typo / Editorial||Accepted||Accepted (no reply)|
From Shane Smith:
I think the behavior of application.lastresult$ needs clarification. From the 2.0 spec:
The number of application.lastresult$ elements is guaranteed to be greater than or equal to one and less than or equal to the system property "maxnbest". If no results have been generated by the system, then "application.lastresult$" shall be ECMAScript undefined.
The behavior on most platforms is that this array only exists when a valid result occurs. But, in 2.1, we introduce new behavior concerning utterance recording. While recording user utterances on recognition is valuable, it's even *more* valuable to gather invalid recordings... things that triggered a nomatch. In fact, the example from the LCWD shows exactly this:
<nomatch count="3">
  <var name="the_recording" expr="application.lastresult$.recording"/>
  <submit method="post" enctype="multipart/form-data" next="upload.cgi" namelist="the_recording"/>
</nomatch>
Even reading the first 3 paragraphs of section seven gives the impression that you need to actually have a valid recognition for these shadow variables to become available. I have yet to find a 2.1 compliant vendor that offers anything in the lastresult array when a nomatch occurs, and I think we should offer some clarification on this change from 2.0. If it's in 2.1, then I missed it, sorry.
From Tobias Gobel:
I have tested a number of platforms so far which support utterance recording. All except one fill the lastresult$ on a NoMatch event. And the one that currently doesn't claimed they will fix this some time soon. The 2.0 spec says:
"All of the shadow variables described above are set immediately after any recognition. In this context, a <nomatch> event counts as a recognition, and causes the value of "application.lastresult$" to be set" So it explicitly mentions that lastresult$ must be set in a NoMatch scenario. To make things clearer, though, I agree that the 2.1 spec could and should also explicitly mention this fact.
In the example you mention, the application.lastresult$.recording is first assigned to a variable and then put in the submit's namelist. Is this really required? Again, all except one platform I've tested support having the application.lastresult$.recording itself in the namelist, without assigning it to a variable first. The spec should be clearer about this, too.
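The pattern discussed in the two comments above can be sketched as follows. This is a minimal illustration, not text from the specification: the grammar file `citystate.grxml`, the field name, and the guard on `application.lastresult$` are assumptions added here, while `upload.cgi` and `the_recording` come from the quoted LCWD example. The guard is included only because the comments report that platforms differ on whether `application.lastresult$` is populated after a nomatch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="collect_city">
    <!-- Ask the platform to record each utterance during recognition. -->
    <property name="recordutterance" value="true"/>
    <field name="city_state">
      <prompt>Say a city and state.</prompt>
      <!-- Hypothetical grammar file, used only for this sketch. -->
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <nomatch>
        I'm sorry, I didn't get that.
        <reprompt/>
      </nomatch>
      <nomatch count="3">
        <!-- Assumed guard: only submit if the platform actually set a recording. -->
        <if cond="application.lastresult$ != undefined &amp;&amp; application.lastresult$.recording != undefined">
          <!-- Conservative form: copy the recording into a variable first. -->
          <var name="the_recording" expr="application.lastresult$.recording"/>
          <submit method="post" enctype="multipart/form-data"
                  next="upload.cgi" namelist="the_recording"/>
        <else/>
          <reprompt/>
        </if>
      </nomatch>
    </field>
  </form>
</vxml>
```

Whether `application.lastresult$.recording` may appear directly in `namelist` is the open question raised above, so the sketch uses the form that assigns it to a variable first.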
From Srinivas R Thota:
Section "5 Using <data> to Fetch XML Without Requiring a Dialog Transition" (http://www.w3.org/TR/voicexml21/#sec-data) of the latest VoiceXML 2.1 document mentions that only a read-only subset of the DOM bindings is needed, but "Appendix D ECMAScript Language Binding for DOM" (http://www.w3.org/TR/voicexml21/#sec-data-dom) specifies methods outside the read-only subset, such as set, insert, remove, create, etc. These were not present in the previous version (http://www.w3.org/TR/2005/CR-voicexml21-20050613/). Is this a bug in the new document?
From Teemu Tingander:
Can an implementation be conformant with the VoiceXML 2.0 specification if the DOM exposed by <data/> is read-write? And if not, please explain why (and I don't mean why in the sense of: because it reads "a read-only subset" in the latest candidate). And did you evaluate the possibility of using E4X (ECMA-357) instead of the W3C DOM (if I remember correctly there was some discussion about this)? The E4X interface IS more ECMA oriented than the DOM mapping. And please be more precise about what the DOM subset is. createElement is hardly a read-only method. What do you mean by read-only: that the user is not able to create new documents?
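As an illustration of the read-only usage these comments refer to, the following sketch reads a value from fetched XML using only DOM accessors that do not modify the tree. The document `quote.xml` and its `<last>` element are hypothetical, chosen only for this example; which accessors belong to the supported subset is exactly what the comments ask the specification to state.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="get_quote">
    <block>
      <!-- Fetch XML without a dialog transition; "quote" then refers to a DOM document. -->
      <data name="quote" src="quote.xml"/>
      <!-- Read-only traversal: documentElement, getElementsByTagName, item, firstChild, data. -->
      <var name="price"
           expr="quote.documentElement.getElementsByTagName('last').item(0).firstChild.data"/>
      <prompt>The last price was <value expr="price"/>.</prompt>
    </block>
  </form>
</vxml>
```

The sketch assumes that `quote.xml` contains at least one `<last>` element with text content; no mutating method such as `createElement` or `appendChild` is used.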
From Rethish Kumarps:
The <mark> element in VoiceXML 2.1 specifies that the "markname" in application.lastresult$ is to be set as the "name of the mark last executed by the SSML processor before barge-in occurred".
But there could be a cross-over scenario when the Speech synthesizer and Speech Recognizer are not part of the same MRCP session.
In this case, the Speech recognizer would send a START-OF-INPUT on barge-in to the client, which would send a BARGE-IN-OCCURRED to the Speech Synthesizer.
But there could be a delay in sending the latter message, and a new <mark> could have been executed by this time.
So, is the mark information received before sending the BARGE-IN-OCCURRED to be honoured, or is the last one to be honoured?
From Jason Hanna:
I'm trying to fetch the schema documents from http://www.w3.org/TR/voicemail21/ and I'm finding two XSDs that appear to be unpublished in the 2.1 tree.
* grammar-core.xsd
* synthesis-core.xsd
I checked the 2.0 schema tree, and sure enough they were found. I presume it's safe to use these. Should these two documents be published into the 2.1 schema folder?
From Petr Kuba:
We very much appreciate the changes to the specification of foreach made in the LCWD of 15 September 2006. However, we found a few discrepancies between what is written in the specification and what is written in the corresponding XML Schema.
1. Content of <foreach> in executable content except within a <prompt>
Original text (first paragraph of Section 6):
"Within executable content, except within a <prompt>, the <foreach> element may contain any elements of executable content"
Comment: We believe this was meant to say that it may contain any elements of executable content and nothing more. However, the foreach-full.type definition in the XML Schema, which applies to <foreach> in executable content except within a <prompt>, also allows the following children: break, emphasis, mark, phoneme, prosody, say-as, sub, voice, p, s. This is probably not what was meant. It would introduce an inconsistency, because in other situations the named elements must be enclosed in a <prompt> element.
Remove the elements that cannot appear in executable content from the XML Schema.
2. Differences in <prompt> and <enumerate> content
The text in the first paragraph of Section 6 explicitly enumerates the differences between <prompt> and <enumerate> content but omits the <foreach> tag.
"When <foreach> appears within a <prompt> element, it may contain only those elements valid within <enumerate> (i.e. the same elements allowed within <prompt> less <meta>, <metadata>, and <lexicon>); ..."
Proposed change: "When <foreach> appears within a <prompt> element, it may contain only those elements valid within <enumerate> (i.e. the same elements allowed within <prompt> less <meta>, <metadata>, <lexicon>, and <foreach>); ..."
3. Nesting of <foreach> in <prompt>
The XML Schema allows the <foreach> tag to be only a direct child of the <prompt> tag. Thus, nesting is not possible. Is there any rationale behind not allowing nesting of <foreach> in prompts? Allowing the <foreach> tag to be a child of another <foreach> tag in prompts would cause no harm and could be sometimes helpful. Moreover, nesting of <foreach> within executable content except within a prompt is possible.
We do not propose any change in this respect, we would just like to get some rationale for the current situation. Perhaps it could be explicitly stated in the spec that nesting of <foreach> in prompts is not possible?
Any comments to our proposals are appreciated.
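The two contexts distinguished in the points above can be sketched as follows. This is an illustrative example under stated assumptions: the `movies` array and the `count` variable are hypothetical, and the sketch shows only content that both the quoted specification text and the comment treat as valid in each context.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- Hypothetical data used only for this sketch. -->
  <var name="movies" expr="['Movie One', 'Movie Two', 'Movie Three']"/>
  <form id="list_movies">
    <block>
      <!-- Executable content: <foreach> contains executable elements. -->
      <var name="count" expr="0"/>
      <foreach item="movie" array="movies">
        <assign name="count" expr="count + 1"/>
        <log expr="'movie ' + count + ': ' + movie"/>
      </foreach>

      <!-- Inside <prompt>: <foreach> contains only elements valid within <enumerate>. -->
      <prompt>
        There are <value expr="count"/> movies:
        <foreach item="movie" array="movies">
          <value expr="movie"/>
          <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>
```

Under the schema behaviour described in point 1, the `<break>` element would also validate inside the first `<foreach>`, which is the inconsistency the comment reports.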
From Harbhanu:
The VoiceXML 2.1 specification extends the <grammar> element to support dynamically generated grammar URIs (by the addition of the 'srcexpr' attribute).
I suggest that this should be allowed only for form-item-level grammars. Otherwise, for all grammars declared at dialog scope (or above), the URI has to be re-evaluated and the grammar has to be defined again for each input item. This defeats the purpose of keeping a grammar at dialog level.
Also, I am not able to find any use case for this (at dialog level or above) in any VoiceXML document.
Since this is not stated as a restriction in the VoiceXML 2.1 specification, all compliant processors will have to support it anyway. So, this could be added as a restriction on the grammar element with the 'srcexpr' attribute.
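To make the distinction in this comment concrete, the sketch below places one `srcexpr` grammar at form (dialog) scope and one at field (form item) scope. The variable `lang` and the grammar file names are hypothetical. The comment's concern is with the first of the two, whose URI, as the comment describes it, would have to be re-evaluated for each input item.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form id="pick_city">
    <!-- Hypothetical variable that selects a language-specific grammar. -->
    <var name="lang" expr="'en'"/>

    <!-- Dialog-scope grammar with a dynamically computed URI (the case the comment proposes to restrict). -->
    <grammar type="application/srgs+xml" srcexpr="'commands-' + lang + '.grxml'"/>

    <field name="city">
      <!-- Form-item-scope grammar with a dynamically computed URI (the case the comment proposes to keep). -->
      <grammar type="application/srgs+xml" srcexpr="'cities-' + lang + '.grxml'"/>
      <prompt>Which city?</prompt>
    </field>
  </form>
</vxml>
```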
From Titus von der Malsburg:
Maybe I'm doing something wrong, but my VoiceXML code tends to get a hell of a lot redundant. I use subdialogs, event-handling and XML macros (courtesy of a nice XSLT hack) to factor out common code as much as possible. Unfortunately there are many cases where I could use subdialogs if it were possible to parameterize them with grammars. Whereas it's possible to pass a grammar to a subdialog, the grammar element doesn't provide a means for defining grammars programmatically. (Srcexpr is applicable in some situations, but in others the grammar cannot be determined until runtime, so srcexpr isn't very helpful.) What's missing here is an expr attribute as provided by other elements.
I can think of many situations beyond subdialogs where expr could be beneficially used. Since VoiceXML 2.1 already introduced the srcexpr attribute for the grammar element, I suspect that there are no technical reasons that rule out an expr attribute. Therefore I humbly propose to add it in some future version of VoiceXML.
From Greg Inglis:
I agree with Titus von der Malsburg's comments on 8/10/06. With the addition of the <data> tag in v2.1 there seems to be some acknowledgement that VXML needed more client processing ability (this reduces the number of complete VXML page transitions in an application). This is somewhat like the AJAX paradigm.
As an alternative, maybe using a <value> tag within <grammar> tags to dynamically specify the grammar content would be a bit less ambiguous? For example:
...
<grammar>
  <value expr="GenerateSomeGrammar()"/>
</grammar>
From Srinivas R Thota:
Unlike the CCXML <fetch> element, the VoiceXML <data> element has no type attribute. Could a type attribute also be added to the <data> element, so that application writers can explicitly state the type of the data they are processing and type validation can be performed as it is for <fetch>?