Copyright ©2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document details the responses made by the Voice Browser Working Group to issues raised during the Last Call review (beginning 24 April 2002 and ending 24 May 2002) of Voice Extensible Markup Language (VoiceXML) Version 2.0. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.
This document of the W3C's Voice Browser Working Group describes the disposition of comments as of 29 November 2002 on the Voice Extensible Markup Language (VoiceXML) Version 2.0 Last Call. It may be updated, replaced, or rendered obsolete by other W3C documents at any time.
For background on this work, please see the Voice Browser Activity Statement.
This document describes the disposition of comments in relation to the Voice Extensible Markup Language (VoiceXML) Version 2.0 (http://www.w3.org/TR/2002/WD-voicexml20-20020424/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.
The full set of issues raised against the Voice Extensible Markup Language (VoiceXML) Version 2.0 since August 2000, their resolutions, and in most cases the reasoning behind those resolutions are available from http://www.w3.org/Voice/Group/2002/voicexml-change-requests.htm [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review. It includes issues that were submitted outside the official review period, up to 1 October 2002.
Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.
Item | Commentator | Nature | Disposition |
R419-1 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R426-1 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R426-2 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R426-3 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R467-1 | Lyndel McGee | Clarification / Typographical / Editorial (§2.1) | Accepted |
R468-1 | Lyndel McGee | Clarification / Typographical / Editorial (§2.1) | Accepted |
R469-1 | Deborah Dahl | Feature Request (§2.4) | Accepted |
R469-2 | Deborah Dahl | Change to Existing Feature (§2.3) | Accepted |
R469-3 | Deborah Dahl | Feature Request (§2.4) | Accepted |
R469-4 | Deborah Dahl | Clarification / Typographical / Editorial (§2.1) | Accepted |
R470-1 | Stefan Hamerich | Change to Existing Feature (§2.3) | Accepted |
R470-2 | Stefan Hamerich | Feature Request (§2.4) | Accepted |
R470-3 | Stefan Hamerich | Change to Existing Feature (§2.3) | Accepted |
R470-4 | Stefan Hamerich | Feature Request (§2.4) | Accepted |
R471-1 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-2 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-3 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-4 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-5 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-6 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-7 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-8 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R471-9 | Matthew Wilson | Clarification / Typographical / Editorial (§2.1) | Accepted |
R472-1 | Bogdan Blaszczak | Change to Existing Feature (§2.3) | Accepted |
R472-2 | Bogdan Blaszczak | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-1 | Guillaume Berche | Technical Error (§2.2) | Accepted |
R477-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-5 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-6 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-7 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-8 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R477-9 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R478-1 | Al Gilman | Change to Existing Feature (§2.3) | Accepted |
R478-2 | Al Gilman | Clarification / Typographical / Editorial (§2.1) | Accepted |
R478-3 | Al Gilman | Clarification / Typographical / Editorial (§2.1) | Accepted |
R478-4 | Al Gilman | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-5 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R495-6 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R496-1 | Guillaume Berche | Change to Existing Feature (§2.3) | Accepted |
R496-2 | Guillaume Berche | Change to Existing Feature (§2.3) | Accepted |
R502-1 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R503-1 | Teemu Tingander | Clarification / Typographical / Editorial (§2.1) | Accepted |
R505-1 | Ray Whitmer | Technical Error (§2.2) | Accepted |
R505-2 | Ray Whitmer | Technical Error (§2.2) | Accepted |
R505-3 | Ray Whitmer | Technical Error (§2.2) | Accepted |
R505-4 | Ray Whitmer | Change to Existing Feature (§2.3) | Accepted |
R507-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R507-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R507-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R511-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R511-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R511-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R511-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-1 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-2 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-3 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-4 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-5 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-6 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-7 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-8 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-9 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-10 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-11 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-12 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-13 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
R519-14 | Guillaume Berche | Clarification / Typographical / Editorial (§2.1) | Accepted |
From Teemu Tingander
While reading the 2.0 spec I found the following "problem". Taken from the W3C VoiceXML 2.0 draft, chapter 4.1.6: "Each form item and each menu has an internal prompt counter that is reset to one each time the form or menu is entered. Whenever the system uses a prompt, its associated prompt counter is incremented. This is the mechanism supporting tapered prompts." So the question is: are prompt counters maintained for <form/> and <menu/> elements only, or are they maintained for each <*form item*> as specified in chapter 2.1.2? If they work like events, counters are reset when entering <form/> or <menu/>, each <*form item*> has its own counter, and <initial> uses the <form/>'s counters. Is this right? Does this mean that when <goto item="this"/> occurs in field "that" while field "this" has prompt counter 3, the prompt with count 3 is selected? So entering <field name="this">..</field> does not reset the counter, and returning to a <*form item*> that has prompted something before maintains that item's prompt count and allows us to do "tapered prompting", yeah?
Resolution: Accepted
In Section 5.2.2 of the CR specification, we have clarified in the description of the "count" attribute of <catch> when counters are reset: "The occurrence of the event (default is 1). The count allows you to handle different occurrences of the same event differently. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited. Item-level event counters are used for events thrown while visiting individual form items and while executing <filled> elements contained within those items. Form-level and menu-level counters are used for events thrown during dialog initialization and while executing form-level <filled> elements. Form-level and menu-level event counters are reset each time the <menu> or <form> is re-entered. Form-level and menu-level event counters are not reset by the <clear> element. Item-level event counters are reset each time the <form> containing the item is re-entered. Item-level event counters are also reset when the item is reset with the <clear> element. An item's event counters are not reset when the item is re-entered without leaving the <form>. Counters are incremented against the full event name and every prefix matching event name; for example, occurrence of the event "event.foo.1" increments the counters for "event.foo.1" plus "event.foo" and "event"."
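A minimal sketch of the counter behavior described above, showing both tapered prompts and an item-level event counter. The field name, grammar URI, and prompt wording are hypothetical, not taken from the specification:

```xml
<form id="order">
  <field name="drink">
    <!-- tapered prompts: selected by the field's prompt counter -->
    <prompt count="1">What would you like to drink?</prompt>
    <prompt count="3">Please say coffee, tea, or juice.</prompt>
    <grammar src="drink.grxml" type="application/srgs+xml"/>
    <!-- item-level event counter: fires on the third nomatch
         while visiting this field -->
    <catch event="nomatch" count="3">
      <prompt>Sorry, let's move on.</prompt>
      <assign name="drink" expr="'unknown'"/>
    </catch>
  </field>
</form>
```

Per the clarified text, re-entering the field without leaving the form does not reset either counter; re-entering the form, or clearing the field with <clear>, does.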
From Teemu Tingander
EVENT HANDLING: As Jean-Michel Reghem [reghem@babeltech.com] already pointed out in his mail about events, there seem to be some unanswered questions about event handling in field elements. From the event-handling point of view we seem to have two different kinds of field elements, which makes it difficult for an application developer to always know where the FIA is going. These problems may be easily solved in the FIA, but I hope that is not the path we want to follow. <object> is outside this discussion because it is already quite difficult for the application developer.

In the field elements <field> and <initial>, an event thrown in the collect phase (such as "nomatch") prevents the field from being filled unless otherwise assigned in a catch handler, and thereby prevents <filled> execution. This makes sense to me. However, the handling of events in <subdialog> is not clear. Section 5.3.10 of the VXML 2.0 draft says: "In returning from a subdialog, an event can be thrown at the invocation point, or data is returned as an ECMAScript object", and also, after an example: "The subdialog event handler for <nomatch/> is triggered on the third failure to match; when triggered, it returns from the subdialog, and includes the nomatch event to be thrown in the context of the calling dialog. In this case, the calling dialog will execute its <nomatch/> handler, rather than the <filled/> element, where the resulting action is to execute a <goto/> element. Under normal conditions, the <filled> element of the subdialog is executed after a recognized social security number is obtained, and then this value is returned to the calling dialog, and is accessible as result.ssn". It would clarify things considerably if that example did not contain the <goto/> element, because it causes the FIA to exit anyway. Should the field remain unfilled and be visited again, as with <field>? I think it should. But then we come to VoiceXML.org's conformance examples and the first subdialog example.

In my opinion it should end in an endless loop. This would make sense with <field>-like event handling. This needs to be clarified in the specification. In the event case, if the field does not need to be visited again, the <subdialog> element's cond could be changed, or <assign> could be used to fill the field, as with <field>.

The case of the <record> element is almost the same as <subdialog>, but even more complicated. Let's start from the FIA side of the view: as the specification says, the FIA has only two states, processing input and documents, and collecting user input (events OR utterances). Combining the first and second examples of <record/>:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="greeting" beep="true" maxtime="10s" finalsilence="4000ms" dtmfterm="true" type="audio/wav">
      .. no change here ..
    </record>
    <field name="confirm" type="boolean">
      .. no change here ..
    </field>
    <catch name="telephone.disconnect.hangup">
      <if cond="greeting">
        <submit next="save_greeting.pl" method="post" namelist="greeting"/>
      </if>
    </catch>
  </form>
</vxml>

Much of this depends on the underlying system, but I think that if something was recorded in the underlying layer (considering silence detection and similar systems) and then a hang-up occurred, the collection phase for the field "greeting" should return the utterance; then, in the collection phase for "confirm", the telephone.hangup event will be delivered from the underlying system and normal catch handling will take place. This is the case if the collection phase produces only events OR utterances, not both. If I haven't already brought it up, I think a field gets filled only if no event is thrown in the collection phase. A <form>-level <catch> will handle this quite neatly.

The field element <transfer> is like <field> to me. Should it be (in the case of a blind transfer) that the event is handled and the field is not filled, hopefully with the <catch> doing an <exit/> or similar? Is this the case with <disconnect>? The specification says for blind transfer: "The interpreter disconnects from the session and continues execution (if anything remains to execute) but cannot regain control of the call. The caller and callee remain connected in a conversation". This is all about event handling. There were a few clarifications about event-counting issues that I pointed out in my earlier mails.
Resolution: Rejected
We are unclear exactly what this issue is about since it spans a number of different points. Please re-formulate more clearly.
From Teemu Tingander
DOCUMENT TRANSITIONS: The VXML 2.0 draft introduces more transitions within documents, and some of these are not clearly explained. I would like some clarifications in the specification to make application developers' lives a little easier. I hope we all know the scoping of attributes and parameters. I'll try to explain the problem. We have document 1, which is an application root document for document 2, as in figure 3 of the spec.

Step 1: Doc 1 is an application root, but we don't know this at the time. So should we initialize the variables in the doc scope of doc 1 into the document scope? Yes. But as said, a root document is an application by itself (in the spec's Root2Root case: "The root context is initialized with the new application root document, even if the documents are the same or have the same application name"), so should the variables (and parameters) in the application scope be identical (application == document)? OK, let's keep it this way. Should there be variables in the application scope at all?

Step 2: The doc scope should be replaced with the variables of leaf doc 2. Right, because this is the "doc" that we are executing. This is easy.

Step 3: The same thing, but with the variables of doc 3.

Step 4: Do we just need to copy the variables and parameters from the application scope to the doc scope, because they are the "doc" variables of this document?

This is my interpretation of the specification; I hope it is right. So I would like the changes in scopes to be clarified for the Root2Leaf and Leaf2Root transitions.
Resolution: Accepted
We have clarified in Section 5.1 of the CR specification that: "Note that while executing inside the application root document, document.x is equivalent to application.x."
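The equivalence can be sketched as follows; the variable name and log text are hypothetical, and the snippet assumes it is the application root document itself being executed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- declared at document scope of the root document -->
  <var name="x" expr="'hello'"/>
  <form>
    <block>
      <!-- inside the root document, both names resolve to the same variable -->
      <log>document.x = <value expr="document.x"/>,
           application.x = <value expr="application.x"/></log>
    </block>
  </form>
</vxml>
```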
From Teemu Tingander
LINKS (CHOICES) AND CATCHES: As specified, the <link> element (also <choice>) has three attributes that are derived from other components (or should I say shorthand): event, next, and expr. So:

<link dtmf="0" next="something"/>
<link dtmf="0" expr="'something'+'something'"/>
<link dtmf="0" event="help"/>

The first two cases could be implemented as:

<link dtmf="0"> <goto next="something"/> </link>

or, for the third:

<link dtmf="0"> <throw event="help"/> </link>

I think this would make it clearer what happens, and would make it possible to use <submit> too. In <choice> elements this could be a little tricky, but it would also work. This approach would make it possible to support the <log> tag inside a link too. And when it comes to links, it seems to me that they share a lot in common with catches: they catch an invocation of their grammar and process a throw or goto, almost like:

<catch grmr="link1_grmr"> <goto next="something"/> </catch>

or

<catch event="link1_grmr"> <goto next="something"/> </catch>

because with a link it is only necessary to know which grammar triggered. So are <link> and <choice> special cases of <catch>?

Another thing is <prompt>, and to me it is a big thing :) I would like to know why it is permitted to write "prompts" without proper tagging. I personally think that it wouldn't change the way pages are made radically if <prompt> tags were mandatory; it would clarify the content quite a lot.
Resolution: Rejected
While you are correct that there are similarities between <link> and <catch>, <link> is a simplified form and we want to keep it as simple as possible. If you want to do more powerful things with a <link>, then you can use a document-scope <form> instead.
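The suggested alternative can be sketched as a document-scope form; the form name, grammar URI, and server URI below are hypothetical:

```xml
<!-- a <link> can only goto or throw; for richer behavior such as
     <log> or <submit>, a document-scope <form> keeps its grammar
     active throughout the document -->
<form id="helpdesk" scope="document">
  <field name="topic">
    <grammar src="helpdesk.grxml" type="application/srgs+xml"/>
    <filled>
      <log>user asked for help on: <value expr="topic"/></log>
      <submit next="help.cgi" namelist="topic"/>
    </filled>
  </field>
</form>
```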
From Lyndel McGee
Section 2.1.5 specifies (the 2nd sentence): "To make a form mixed initiative, where both the computer and the human direct the conversation, it must have one or more <initial> form items and one or more form-level grammars." That implies that <initial> is a required element of a mixed-initiative form. The examples use <initial>, too. However, the FIA does not seem to have any provisions to enforce it. Our questions:

- Is <initial> required for a form to be mixed-initiative?
- Or does a form-level grammar alone imply mixed initiative?

If <initial> is not required, then there seems to be no benefit in defining directed and mixed-initiative forms (a VoiceXML language structure). Instead, the directed and mixed-initiative behaviors should be discussed in terms of item modality and grammar types and scoping (VoiceXML use cases). For example, the following language could be used:

- A 'directed dialog' can be implemented by using form item-level grammars rather than form-level grammars. If it is desired to restrict user options to just the item's grammar, the form item should be made modal. Otherwise, grammars in wider scopes may still accept user utterances (e.g. links with 'restart', 'new order', etc.) and restart interpretation at a different form.
- A 'mixed-initiative dialog' can be implemented by using form-level grammars that may return multiple slots and thus allow multiple form items to be filled from a single caller utterance. The <initial> form item can be used in this scenario to prompt for and collect an utterance before executing any input items of the form (which may have their own specialized grammars and may potentially capture the recognition results as their own input).

Otherwise, if <initial> is required for a form to be mixed-initiative, a form without <initial> would be a directed form regardless of the presence of a form-level grammar. In such a case, any utterances would be processed in the context of individual input items rather than in the form context. The form items will be filled one at a time.
Resolution: Accepted
In the CR specification, we have clarified that (a) mixed initiative is a style of dialog (not a form sub-type), and (b) <initial> isn't necessary for mixed-initiative dialog but is one way of doing it. In particular, the first paragraph of 2.1.5 now reads: "The last section talked about forms implementing rigid, computer-directed conversations. To make a form mixed initiative, where both the computer and the human direct the conversation, it must have one or more form-level grammars. The dialog may be written in several ways. One common authoring style combines an <initial> element that prompts for a general response with <field> elements that prompt for specific information. This is illustrated in the example below. More complex techniques, such as using the 'cond' attribute on <field> elements, may achieve a similar effect."
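The authoring style described above can be sketched as follows. The form name, grammar URI, slot names, and prompts are hypothetical; the sketch assumes the form-level grammar returns slots named "from" and "to" that match the field names:

```xml
<form id="travel">
  <!-- form-level grammar: a single utterance may fill both fields -->
  <grammar src="travel.grxml" type="application/srgs+xml"/>
  <initial name="start">
    <prompt>Where would you like to travel from and to?</prompt>
  </initial>
  <!-- fields without their own grammars; the FIA visits whichever
       is still unfilled after each recognition -->
  <field name="from">
    <prompt>Which city are you leaving from?</prompt>
  </field>
  <field name="to">
    <prompt>Which city are you going to?</prompt>
  </field>
</form>
```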
From Lyndel McGee
Let's consider interpretation of a VoiceXML document where:
- there is a form with multiple fields,
- the form has a form-level grammar that can return multiple slots,
- the fields do not have their own grammars,
- it is a mixed-initiative form (see also problem #1 above),
- the first recognition result fills some fields, but not all of them,
- another caller utterance is needed to fill the remaining fields.

Our questions:
[a] Is it expected that the form will switch to the 'directed dialog' mode after the first utterance and then consider only unfilled items for subsequent utterances (see also problem #1 above)?
[b] Or will the form remain in the 'mixed-initiative dialog' mode, with user utterances continuing to be mapped to multiple input fields (as the 2nd table in section 3.1.6.3 seems to imply)?
[c] And, if the form is to remain in the 'mixed-initiative dialog' mode, can the next user utterance overwrite fields that have already been filled, or will those fields retain their previous values?

To illustrate the problem, let's assume that:
- the fields are 'size', 'color', and 'shape',
- the first utterance is 'big square',
- the second prompt says 'Please provide the color',
- the second utterance is 'blue triangle'.

Will the completed form be 'big blue square' or 'big blue triangle'? The 2nd table in section 3.1.6.3 should be updated to cover all (canonical) combinations of user input and dynamic states of form components.
Resolution: Accepted
[a] and [b] don't require spec changes; accepted [c]. See the response to R467 for clarification of the terms 'directed dialog' and 'mixed-initiative dialog'. [a]/[b]: Given that you only have a form-level grammar in your example, it is the only grammar that can be matched by user input. When the FIA visits the form, it will go to the prompts in an <initial> if present, read out the prompts in <initial>, and activate the form-level grammar. If there is no <initial>, it will go to the first field and do the same thing there. After the first recognition fills in some but not all fields, the <initial> can no longer be visited, and the FIA will go to the next unfilled field, queuing its prompts and again activating the form-level grammar (there are no other grammars in your example!). This will continue until all fields are filled. [c] We have clarified in 3.1.6.1 of the CR specification that matching form-level grammars can override existing values in input items and that <filled> processing of these items takes place as described in Section 2.4 and Appendix C.
From Deborah Dahl
Grammar tag's content model: Section 3.1.1 should state explicitly that the SRGS <grammar> tag is extended to allow PCDATA for inline grammar formats besides SRGS. It currently says that SRGS tags, including <grammar>, have not been redefined. Priority: High
Resolution: Accepted
We have clarified in Sections 3.1.1 and 3.1.1.4 of the CR specification that the SRGS <grammar> element is extended in VoiceXML 2.0 to allow PCDATA for inline grammar formats besides the XML format of SRGS.
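The extension can be sketched as follows, assuming a platform that supports an inline JSGF grammar; the media type, field name, and grammar content are illustrative assumptions, and support for any non-SRGS format is platform-dependent:

```xml
<field name="size">
  <!-- PCDATA content in a non-SRGS-XML inline format; the platform
       must support the declared type, here assumed to be JSGF -->
  <grammar type="application/x-jsgf" mode="voice">
    small | medium | large
  </grammar>
  <prompt>What size would you like?</prompt>
</field>
```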
From Matthew Wilson
The index between Appendix N and Appendix P appears to be labeled "Appendix 0" (the digit zero), not "Appendix O".
Resolution: Accepted
Corrected in CR specification.
From Matthew Wilson
In section 6.5, "Time Designations", the example "+1.5s" still contradicts the text, which describes the format as "an unsigned number followed by an optional time unit identifier".
Resolution: Accepted
Clarified in section 6.3 that a time designator is a non-negative number which must be followed by ms or s (i.e. it is fully aligned with Time in CSS2).
From Matthew Wilson
Section 2.1.2.1 "Input Items" says that "implementations must handle the <object> element by throwing error.unsupported.object.objectname if the particular platform-specific object is not supported". Section 2.3.5 "OBJECT" says that "implementations must handle the <object> element by throwing error.unsupported.object if the particular platform-specific object is not supported" (i.e. it does not include the object name in the event name). Section 5.2.6 "Event Types" does not list any error.unsupported.object events, but does include error.unsupported.format, which is raised if "The requested resource has ... e.g. an unsupported ... object type". Could this be clarified?
Resolution: Accepted
In the CR specification, changed and clarified in sections 2.1.2.1 and 2.3.5 that if an implementation does not support a specific object, it throws error.unsupported.objectname. Clarified in section 5.2.6 that the event error.unsupported.format is not thrown for unsupported object types. Section 5.2.6 is not intended as an exhaustive list of event types (no change).
From Matthew Wilson
Events such as error.unsupported.uri, error.unsupported.language, error.unsupported.format are ambiguous, since they could also be occurrences of error.unsupported.<element> if incorrect elements have been used in the VoiceXML document.
Resolution: Accepted
Clarified 5.2.6 by adding that <element> in error.unsupported.<element> refers to "elements defined in this specification."
From Matthew Wilson
Section 6.1.2.1 says that "VoiceXML allows the author to control the caching policy for each use of each resource." Is this true of the application root document?
Resolution: Accepted
Clarified in 6.1.2.1 that there is no markup mechanism to specify the caching policy on a root document.
From Matthew Wilson
Regarding the "builtin" URI scheme, http://www.w3.org/Addressing/schemes#unreg says that "Unregistered schemes should not be deployed widely and should not be used except experimentally." Is there any intention to register the "builtin" scheme?
Resolution: Accepted
We are still trying to determine if there is an existing URI scheme which we can reuse for the VoiceXML builtin. If we are unable to find one, then we will register the builtin scheme.
From Matthew Wilson
There is a typo "attibute" in the schema in Appendix O (in the xsd:annotation for the Accept.attrib attributeGroup).
Resolution: Accepted
Corrected.
From Matthew Wilson
The "minimal Conforming VoiceXML document" in Appendix F1 is not minimal. As the text itself states, the XML declaration and the xmlns:xsi and xsi:schemaLocation attributes are not required for conformance.
Resolution: Accepted
In appendix F, changed description of example so that it is not described as minimal.
From Matthew Wilson
Section 5.1.3 says, when referring to XML-escaping characters such as <, >, and &: "For clarity, examples in this document do not use XML escapes." I strongly disagree with this decision. I think that having examples in the spec which do not work is more likely to lead to confusion than anything else. If there is a lack of clarity in VoiceXML in places, then I believe the spec should not try to hide the fact.
Resolution: Accepted
Removed this text and updated all examples to use escaped XML characters.
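The escaping issue can be sketched with a small fragment of executable content; the variable name and prompt text are hypothetical. The comparison operator must be written as an XML entity for the document to be well-formed:

```xml
<!-- "count < 3" must be escaped as "count &lt; 3" inside an attribute -->
<if cond="count &lt; 3">
  <prompt>You have fewer than three items.</prompt>
<else/>
  <prompt>You have three or more items.</prompt>
</if>
```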
From Bogdan Blaszczak
A standard solution for playback control can be based on SSML ideas. For example, VoiceXML may allow additional attributes to be used in the <audio> tag. Such attributes can be modeled on selected attributes of <prosody> (see also section 2.2.4 of the SSML spec). The attributes would be optional and possibly ignored by some platforms. The following additional <audio> attributes would be useful: - speed: the playback speed in percent of the normal speed (e.g. 50%, 100%, 200%) or the values 'slow', 'normal', 'fast'. - volume: the playback volume in percent of the normal volume (e.g. 50%, 100%, 200%) or the values 'soft', 'normal', 'loud'. - position: the playback start position in seconds from the beginning of the audio recording (e.g. 1s, 100s).
Resolution: Rejected
This issue has been discussed many times by the team, and has been decided as beyond the scope of VoiceXML 2.0. However, it could be addressed in the next version of VoiceXML.
From Guillaume Berche
In section 4.1.5, when playing a prompt with a false bargein attribute, is matching DTMF or speech input buffered or discarded? Suggested fix: in section 4.1.5, modify the text to "When the bargein attribute is false, any DTMF input buffered in a transition state is deleted from the buffer (Section 4.1.8 describes input collection during transition states). In addition, while in the waiting state and playing a prompt whose bargein attribute is false, any user input (speech or DTMF) is simply ignored."
Resolution: Accepted
Clarified in section 4.1.5 that when a prompt's "bargein" attribute is false, no input is buffered while the prompt is playing (any DTMF already buffered is discarded).
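The clarified behavior can be sketched as follows; the prompt wording is hypothetical:

```xml
<block>
  <!-- bargein="false": must play to completion; input during playback
       is ignored and any previously buffered DTMF is discarded -->
  <prompt bargein="false">
    This call may be monitored for quality purposes.
  </prompt>
  <!-- bargein="true": the caller may interrupt this prompt -->
  <prompt bargein="true">
    You can say or key in your account number at any time.
  </prompt>
</block>
```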
From Guillaume Berche
It is not clear whether DTMF input which does not match currently active grammars should interrupt a prompt whose bargein attribute is true. Suggested fix: in section 4.1.5, correct the first sentence as follows: "If an implementation platform supports barge-in, the application author can specify whether a user can interrupt, or "barge-in" on, a prompt using speech or DTMF input. In the case of DTMF, any input (even input not matching an active grammar) will interrupt a prompt, and will be handled in the same way as non-matching DTMF entered outside of a prompt."
Resolution: Accepted
We have a slightly different solution, though. We have clarified in section 4.1.5.1 that the "bargeintype" attribute of <prompt> applies to DTMF input as well as speech input.
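The clarified attribute can be sketched as follows; the field name and grammar URI are hypothetical. With bargeintype="hotword", only input that matches an active grammar stops the prompt, for DTMF as well as speech:

```xml
<field name="account">
  <grammar src="account.grxml" type="application/srgs+xml"/>
  <!-- hotword: non-matching speech or DTMF is ignored while the
       prompt plays; a grammar match interrupts it -->
  <prompt bargein="true" bargeintype="hotword">
    Please say or key in your account number.
  </prompt>
</field>
```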
From Guillaume Berche
Just clarify 4.1.5 with respect to interruption of a chain of queued prompts. Suggested fix: in section 4.1.5, modify the text to: "Users can interrupt a prompt whose bargein attribute is true, but must wait for completion of a prompt whose bargein attribute is false. In the case where several prompts are queued, the bargein attribute of each prompt is honored during the period of time in which that prompt is playing. If bargein occurs during any prompt in a sequence, all subsequent prompts are not played **(even those whose bargein attributes are set to false)**."
Resolution: Accepted
Edit applied.
From Guillaume Berche
Analysis: For completeness and convenience, the extract from section 4.1.5 below should be reproduced, or at least mentioned, in section 4.1.8. Suggested fix: add the sentence below before "Before the interpreter exits all ...": "As stated in section 4.1.5, when the bargein attribute is false, any DTMF input buffered in a transition state is deleted from the buffer".
VBWG: Rejected. We didn't see a clear use case or motivation for this change. Berche: The motivation for my comment is that information is spread around in the specification. My feedback is that it is often hard to have a good understanding of the behavior of the language because general behavior such as input processing is scattered in different sections. Adding such cross references would make the specifications easier to read and understand. In the specific case of this request, the section "4.1.8 Prompt Queueing and Input Collection" provides a general description of the algorithm and state the interpreter uses to process input and play prompts. However it omits the description of bargein which impacts both input processing and prompt queueing.
Resolution: Accepted
Added in Section 4.1.8 a cross-reference to Section 4.1.5.Email Trail:
From Guillaume Berche
Concerning ECMAScript variables holding non-scalar values (such as the field item variable for a <record> field, or the special _prompt variable mentioned in my previous mail): what ECMAScript type do they have? Is it indeed an ECMAScript host object as defined in the ECMAScript specification (or an Array object containing other objects, in the case of the _prompt variable)? If so, what is their exact list of properties, along with their types and attributes (ReadOnly, DontEnum, DontDelete, Internal)? As a side question, what does the ECMAScript typeof operator return on these objects? Concerning ECMAScript special variables (such as <name>$.<shadow_var> in fields): can they be modified by (or as a side effect of) ECMAScript code evaluation (such as evaluating a guard condition, or an expr attribute)? Suggested fix: Add a specific section about ECMAScript evaluation. This section could specify the runtime errors that occur during ECMAScript evaluation, possible side effects of ECMAScript evaluation (such as cond attribute evaluation), and also the type of shadow variables, with the text below: "Shadow variables are host objects as defined in the ECMAScript specification. The properties of these shadow variables are read-only. Any attempt by ECMAScript code evaluation (either in a script element or as a side effect of the evaluation of an expr attribute) to modify those properties will result in an error.semantic being thrown"
Resolution: Rejected/Accepted
Many questions here! The properties of ECMAScript variables in VoiceXML are not specified unless necessary. With <record>, we have clarified that its implementation is platform-dependent (so different implementations can have different ECMAScript properties) but that all must be playable by <audio>, submittable by <submit>, and so on, as described in the specification. For shadow variables, we have clarified that they are writable, so they can be modified. For runtime errors encountered during the FIA, we have clarified that in the FIA Appendix.Email Trail:
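For example (the form and submit target are invented), a <record> variable whose internal representation is platform-dependent must still be usable like this:

```xml
<form id="message">
  <record name="greeting" beep="true" maxtime="10s">
    <prompt>Record a greeting after the beep.</prompt>
    <filled>
      <!-- Whatever ECMAScript properties the platform gives the
           recording, it must be playable by <audio> ... -->
      <prompt>You recorded <audio expr="greeting"/></prompt>
      <!-- ... and submittable by <submit>. -->
      <submit next="save_greeting.cgi" method="post"
              enctype="multipart/form-data" namelist="greeting"/>
    </filled>
  </record>
</form>
```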
From Guillaume Berche
Section 2.2 describes that the _prompt special variable "is the choice's prompt". The type of this variable is fuzzy, and the specification does not specify the behavior of <value expr="_prompt"/> in the case where a choice element contains mixed audio prompts and TTS. Suggested fix: add the following text to section 2.2 in the enumerate element section: "This specifier may refer to two special variables: _prompt is the choice's prompt, and _dtmf is the choice's assigned DTMF sequence. The _prompt special variable is a host object which has no visible properties and should only be used within a <value expr="_prompt"> element. If the choice contains more than one prompt element (such as TTS elements, or a nested <value> element), then executing the <value expr="_prompt"> would queue all of the prompt elements and would also execute the nested <value> element. If the nested <value> element itself references the _prompt variable, this would lead to an infinite recursive loop, which the interpreter may detect and handle by throwing an error.semantic event. The _dtmf special variable is of type string and may be used as such by ECMAScript code within the expr attribute of the <value> element."
Resolution: Rejected
The assumption that the <choice> element can contain SSML is no longer true: in the CR version of the specification a <choice> element can now only contain CDATA.Email Trail:
From Guillaume Berche
Clarification to the FIA with respect to run-time errors: When the evaluation of a guard condition results in a run-time exception, how does this modify the FIA? The FIA algorithm in Appendix C seems to consider only exceptions generated during the execution phase, and remains fuzzy about those that occur during previous phases (such as initialization, queuing of prompts such as <value>, or evaluation of guard conditions). Suggested fix: modify the FIA so that it states that "any runtime error occurring during the select phase (e.g. a runtime error while evaluating a guard condition) or the collect phase (e.g. a runtime error at prompt queuing, for instance during <value> element execution), up to input collection, results in control being passed directly to the process phase."
Resolution: Accepted
Clarified that FIA handles run-time errors in all phases.Email Trail:
From Guillaume Berche
In the FIA, appendix C, for the collection of active grammars when not modal, it says that these include "elements up the <subdialog> call chain." This seems to be in contradiction with the section on <subdialog> which says each subdialog has a totally separate context from the caller, and shares/inherits absolutely no elements with it. Suggested fix: Remove the "and then elements up the <subdialog> call chain." from the FIA description.
Resolution: Accepted
Correction applied.Email Trail:
From Al Gilman
Analysis: Voice is the center, not the limit, of the domain of application. [reference: Appendix H] The appendix paints the applicability of this technology too narrowly. Even in cases where the dialog designer only thinks in a voice dialog context, the resulting dialogs have been thought through thoroughly and are expected to be, for example, highly usable as transcoded into a text-telephone delivery context, and likewise as transcoded into a Braille environment for those who are both deaf and blind. The idea that this technology is 'final form' should be eliminated and the language brought more in line with that in section 2.1 of the latest Member draft of the Speech Recognition Grammar Specification where it touches on this point.
The current accessibility appendix H is not acceptable to them. They have offered to provide a re-working of it where voice is center, but not limit, of domain of application (fits in with the goals of text input/output for accessibility). VBWG Response: Accepted.
Resolution: Accepted
PFWG provided the draft of a re-written Appendix H: Accessibility. This was subsequently edited, reviewed by the PFWG and then incorporated into the CR specification. This appendix also includes additional guidelines for enabling persons with disabilities to access VoiceXML applications.Email Trail:
From Al Gilman
Analysis: a. Completeness of key-access to function. Control of the application through key-presses by way of DTMF catches allows people with unrecognizable speech access to the application. Can the format specification enforce complete functionality in this mode of interaction? If it can, the group should consider requiring this. If not, we should discuss a lower minimum requirement, including orientation to alternate modes of accessing the same operational service through another channel of communication (like giving an 800 number on your website). b. Always define a global ENTER user-action. [reference: 3.1 Grammars] The functionality intended here is that there is something the user can do that serves as an executeImmediate or justDoIt verb, comparable to the use of the ENTER key on desktop keyboards. This could be 'Yes, please' in an English speech catch grammar, the hash (#) key in DTMF, or what you like. This mode of operation is used by people with severe motor limitations in accessing computer applications. The same "wait and act" mode of user action that they use there is adaptive here in the same personal circumstances and for the same reasons. http://www.abilityhub.com/switch/ Key-bindings that might make sense to standardize in this specification include this function and zero (0) for Help.
PFWG prefer that VoiceXML documents have 'DTMF functional completeness': for all inputs by speech, there is an equivalent DTMF input.
Resolution: Rejected
Specific reasons against this approach: (a) The application of speech recognition as an input medium is severely limited if there always needs to be a DTMF equivalent grammar. Speech provides a natural, 'wide' interface which is very difficult, if not impossible in mixed-initiative applications, to capture with DTMF input due to the limited token set and syntax of DTMF. (b) The menu structure of DTMF input can be very different from the menu structure allowed by speech input. It is unclear how these structures can be reconciled by requiring DTMF grammars to parallel speech grammars. (c) The prompts will need to be different: DTMF input patterns need to be explained and taught to the user, while speech input is more intuitive, so complex explanatory prompts are not required. Providing both simultaneously leads to prompts which can be highly confusing for the end user. (d) There are alternative mechanisms available today to provide DTMF input in addition to speech input, including (i) providing an alternative DTMF-only dialog within the same VoiceXML document, or (ii) providing an alternative VoiceXML application (for example by means of a separate telephone line). (e) It is unclear how this requirement would be enforced, and there are grave doubts about whether it would be complied with. (f) The alternative of using SRGS with text rather than speech input provides a better alternative since it allows a wider input channel than DTMF. (g) It is unclear how DTMF would address internationalization, which SRGS input addresses. In summary, the VBWG felt that it was inappropriate to use W3C technology specifications to enforce this type of policy. Guidelines describing best practice for accessibility with respect to Voice Browsers are preferred. It was also unclear whether this policy is within the scope of W3C at all - further clarification and guidance from W3C management is required.Email Trail:
From Al Gilman
5. Regular, proven and familiar dialog structures. [reference: 3.1 Grammars] The above specific device is a special case of a broader issue. This has to do with creating simple, regular navigation structures that are highly usable and leverage learning across multiple applications that use the same best practices. Compare with http://www.w3.org/TR/WCAG10/#gl-facilitate-navigation In addition to the general concept set forth in this guideline, the following two examples of reference designs which have been developed for delivery contexts that stress the mnemonic appeal of the dialog flow are worth noting: Website design for those with severe learning disabilities: http://www.learningdisabilities.org.uk/html/content/webdesign.cfm Note in particular the five global functions in this dialog design. Navigation modes for the ANSI/NISO X39-86-2002 Digital Talking Book Standard. Start at http://www.loc.gov/nls/niso/ Some preliminary experience with designing VoiceXML applications with this as the general operational model has been highly encouraging. 6. Complete safety net. [reference: 1.3.5 Events 1.5.4 Final Processing 5.2 Event Handling 5.2.2 Catch 5.2.4 Catch Element Selection 5.2.5 Default Catch Elements] Each element in which an event can occur SHOULD specify catch elements, including one with a fail-soft or recovery functionality. Examples: <catch event="noinput"> <reprompt/> </catch> <catch event="nomatch"> <audio> I am sorry. I did not understand your command. Please re-enter your key choice. </audio> <reprompt/> </catch> <catch event="help"> Please say visa, mastercard, or amex. </catch> 7. Timeouts [reference: Appendix D - Timing Properties 4.1.7 Timeout] People with disabilities sometimes need a little extra time to respond or complete an input action. Generous time allowances should be available. Prompt the user that the timeout will expire, and give the option to extend the time. 
Making extra time available as an ask-for option may be the most effective way to a) keep the application accessible to those who need it without b) impairing the functionality for others through excessive delays. 8. Human or other option. [reference: 1.5.2 Executing a Multi-Document Application 5.3.9 EXIT] Advertise alternate modes through which comparable service is available. This may involve transfer to a human operator, text telephone service through transfer or re-dial, etc. Particularly during final processing as a safeguard. Examples: Specify (or allow for) a "wait" or "help" for barge-in during final processing, with an option to <transfer> to a human operator. "Goodbye" (computer) "wait....wait!" (human) "Would you like me to repeat your new account number?" (computer) "Yes....I didn't get it the first time" (human) "The new account number established during this call for Mary Jane Jones, is 6652281. Does this answer all your questions?" (computer) "Yes" (human) "If you need to speak with an operator, just say "Operator". Would you like to transfer to an operator?" (computer) "No" (human) "Thank you for using the National Bank's automated voice system. Goodbye" ** DON'T LOSE THESE KEY FEATURES We couldn't resist the following affirmations. Please take these as a "Not just yes, but He** yes!" response: These features and design aspects will prove critical in disability access situations: 9. Layered help. [reference: 2.3 Form Items 2.5 Links 3.1.1.3 Grammar Weight 4.1.6 Prompt Selection 5.2 Event Handling 5.2.2 Catch 5.2.5 Default Catch Elements 5.2.6 Event Types 5.3 Executable Content] This is good. Thank you. Get people to use it. Example: aMenu[1].items[0].helptext = "If this is the entry you want please press the pound key"; aMenu[1].items[0].morehelptext = "Press 1 to start over"; 10. Application-scope grammars. Best to specify application-scope form grammars in the root of a multi-document application. 
IMPORTANT: It is stated in the document body, as well as in the "Clarifications" section, why this is a good idea in general. This goes redoubled for accessibility.
Resolution: Accepted
After consultation with the PFWG, it was agreed that this issue is more concerned with application design guidelines than with the specification itself. In the current specification, Appendix H: Accessibility includes some basic guidelines. The VBWG and PFWG have initiated joint work on expanding these guidelines into a separate 'Voice Design Guidelines' document.Email Trail:
From Guillaume Berche
Clarify the execution of catch handlers in section "5.2.2 Catch". Section "5.2.2 Catch" seems to imply that handlers are called synchronously: "If a <catch> element contains a <throw> element with the same event, then there may be an infinite loop: <catch event="help"> <throw event="help"/> </catch>" Suggested text addition: "The FIA Appendix C details the execution after a catch element is executed (in its definition of the "execute" term)"
Resolution: Rejected
Unclear what the problem is: we don't see any inconsistency between the 5.2.2 text and the FIA.Email Trail:
From Guillaume Berche
Clarify the definition of "execution" in the FIA Appendix C to cover executables from handlers. Suggested text modification: "execute: To execute executable content: either a block, a filled action, or a set of filled actions. If an event is thrown during execution, the execution of the executable content is aborted. The appropriate event handler is then executed, and this may cause control to resume in a form item, in the next iteration of the form's main loop, or outside of the form. If a transition element (such as <goto>, <link>, <return> or <submit>) is executed, the transition takes place immediately, and the remaining executable content is not executed. During the execution of the event handler, the same rules apply as for the execution of executable content described above (with respect to aborting execution and transitions)."
Resolution: Accepted
We have already clarified the execution of executable content in response to other requests. Note that <link> cannot appear in executable content.Email Trail:
From Guillaume Berche
Clarify error handling during document initialization (e.g. in document-level <script> and <var> elements). Suggested modification: Move the following modified text from section "5.2.6 Event Types" to section "5.2.2 Catch" (or to a new section, as suggested in comment #4): "Errors encountered during document loading, including transport errors (no document found, HTTP status code 404, and so on) and syntactic errors (no <vxml> element, etc.) result in a badfetch error event raised in the calling document, while errors after loading (including document initialization), such as semantic errors during <script> and <var> initialization, are raised and handled in the document itself." I could not understand the rationale behind the following statement in section "5.2.6 Event Types", near error.badfetch: "Whether or not variable initialization is considered part of executing the new document is platform-dependent." Can someone please explain why this behavior would be platform-dependent?
Resolution: Accepted
We have added the following text to 5.2.6: "Errors encountered during document loading, including transport errors (no document found, HTTP status code 404, and so on) and syntactic errors (no <vxml> element, etc) result in a badfetch error event raised in the calling document. Errors that occur after loading and before entering the initialization phase of the Form Interpretation Algorithm are handled in a platform-specific manner. Errors that occur after entering the FIA initialization phase, such as semantic errors, are raised in the new document. The handling of errors encountered during the loading of the first document in a session is platform-specific." Variable initialization may be platform-dependent since a platform may use a SAX-based document construction technique, where initialization of variables takes place as each statement is reached during document loading, or a DOM-based technique, where the whole document is constructed first and then any initialization takes place.Email Trail:
From Guillaume Berche
Clarify document initialization. As described above in comment #3, some events are handled at document initialization. However, since elements are initialized in document order, event handlers may not yet be active at the time an event is thrown. Take for instance the usual case of a vxml document starting with a script element: no document handlers are yet initialized, and an error in the <script> element would not be handled by defined event handlers. Suggested modification: add a specific section concerning document initialization, similar to the FIA, which specifies the order of element initialization: "1.5.0 Document initialization. Document initialization starts once transport and XML schema validation have been performed. As described in section "5.2.2 Catch", errors occurring during this phase are raised and handled in the document itself. During handling of events, the variable scope chain may not be complete (there might be no chained dialog scope yet), but the _event shadow variable is still defined in an anonymous variable scope. Each element is initialized in document order, including event handlers. Consequently, it is advised to define document-level handlers first in the document. ... Once all elements are initialized, document execution begins. As described in section "1.5.1 Execution within One Document", document execution begins at the first dialog by default."
Resolution: Accepted
We have clarified in the FIA Appendix the description of initialization: "foreach ( <var>, <script> and form item, in document order ) if ( the element is a <var> ) Declare the variable, initializing it to the value of the "expr" attribute, if any, or else to undefined. else if ( the element is a <script> ) Evaluate the contents of the script if inlined or else from the location specified by the "src" attribute. else if ( the element is a form item ) Create a variable from the "name" attribute, if any, or else generate an internal name. Assign to this variable the value of the "expr" attribute, if any, or else undefined." and clarified error handling during FIA execution: "During FIA execution, events may be generated at several points. These events are processed differently depending on which phase is active. Before a form item is selected (i.e. during the Initialization and Select phases), events are generated at the dialog level. The corresponding catch handler is located and executed. If the catch does not result in a transition from the current dialog, FIA execution will terminate. Similarly, events triggered after a form item is selected (i.e. during the Collect and Process phases) are usually generated at the form item level. There is one exception: events triggered by a dialog level <filled> are generated at the dialog level. The corresponding catch handler is located and executed. If the catch does not result in a transition, the current FIA loop is terminated and Select phase is reentered." Note that XML Schema validation is NOT compulsory in VoiceXML (see Appendix F - Conformance).Email Trail:
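Given that initialization proceeds in document order, a document-level handler declared before the elements it protects is a safe pattern. A sketch (the variable, script content, and prompt text are invented):

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Declared first, so it is already active when the <var> and
       <script> below are evaluated during FIA initialization. -->
  <catch event="error.semantic">
    <prompt>Sorry, an initialization error occurred.</prompt>
    <exit/>
  </catch>
  <var name="attempts" expr="0"/>
  <script>
    var greeting = "Hello caller number " + (attempts + 1) + ".";
  </script>
  <form id="main">
    <block><value expr="greeting"/></block>
  </form>
</vxml>
```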
From Guillaume Berche
Refine the anonymous variable scope during event handling. Section "5.2.2 Catch" states that "The catch element's anonymous variable scope includes the special variable _event which contains the name of the event that was thrown." To me, this implies that the handler is invoked when the FIA is currently running (that is, a form and a form item are active). However, this might not be the case for events handled during document initialization. Consequently, the variable scope chain as described in section "5.1.2 Variable Scopes" would not work; in particular there would be no chained dialog scope. The suggested modification is included in comment #4.
Resolution: Rejected
We don't see the 'implication' that the existence of the _event variable implies the FIA is currently running.Email Trail:
From Guillaume Berche
Clarify that a <field> item with neither an implicit nor an explicit grammar should throw an error.semantic event. See if it is possible to refine the schema to enforce this. Alternative suggested text modification to the end of section "2.3.1 FIELD": "[...] The use of <option> does not preclude the simultaneous use of <grammar>. The result would be the match from either grammar, not unlike the occurrence of two <grammar> elements in the same <field> representing a disjunction of choices. However, a field item with neither an implicit nor an explicit grammar would result in an error.semantic event being thrown at document initialization time".
Resolution: Rejected
The specification doesn't state or imply that a field without grammars is an error, so we cannot make it more precise.Email Trail:
From Teemu Tingander
The case was this: <field name="order"> <prompt> Make Your Order </prompt> <grammar mode="voice" src="order.grxml" type="application/srgs+xml"/> <filled> <submit src="someurl" mode="???"> </filled> </field> So if the order field is filled with a structured object like: order: { drink: "coke" pizza: { number: "3" size: "large" topping: [ "pepperoni"; "mushrooms" ] } } what is the correct way to create the POST request and the GET request? Like, in GET: http://someurl?order.drink=coke&order.pizza.number=3&order.pizza.size=large&order.pizza.topping=pepperoni&order.pizza.topping=mushrooms Issues arise with arrays (order?); should they be numbered, etc.? The POST request is more complicated to write, so I leave it out here.
Resolution: Accepted
The April 2002 VoiceXML 2.0 specification makes it clear that developers should decompose objects themselves for submission; see Section 5.3.8 (default submission: stringOf on object). Decomposition as you suggest is reasonable, and since you control the recomposition at the other end, any issues with arrays, etc., you should be able to resolve yourself.Email Trail:
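A sketch of the recommended manual decomposition (the variable names, array convention, and URL are invented): scalar values are copied out of the structured result and submitted by name.

```xml
<filled>
  <!-- Flatten the structured "order" object into scalars before
       submission; the server recomposes them on the other end. -->
  <var name="drink" expr="order.drink"/>
  <var name="pizzaNumber" expr="order.pizza.number"/>
  <var name="pizzaSize" expr="order.pizza.size"/>
  <!-- One possible array convention: join toppings with commas. -->
  <var name="toppings" expr="order.pizza.topping.join(',')"/>
  <submit next="http://example.com/order" method="get"
          namelist="drink pizzaNumber pizzaSize toppings"/>
</filled>
```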
From Teemu Tingander
What is the scoping of properties? How many scopes are there, and what is the top (field?) scope? An expr attribute for property would also be nice and very useful while tuning ASR through <property>. Should a property reset back to what it was in the previous scope, or should there be a scope attribute for <property>, like scope (universal | document | form | dialog | location) "location", which specifies how "deeply" it affects? <?xml version="1.0"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"> <property name="noicefilter" value="small"/> <form id="first"> <field name="location"> <property name="noicefilter" value="large"/> <prompt> Say the location of the person you would like to call. </prompt> <filled> <if expr="location$.noicelevel > 0.3"> <!-- property name="noicefilter" value="huge"/ --> <prompt> Please repeat</prompt> <clear/> <else/> <goto next="#second"/> </if> </filled> </field> <!-- second case: how should it work if the <filled> were here? --> </form> <form id="second"> <!-- it would be nice to have that property as "huge" here by default if it is a noisy environment! --> <field name="location"> <prompt> Say the second location of the person you would like to call. </prompt> </field> <filled> </filled> </form> </vxml> Is the flow the following: 1- setProperty ( DOC.scope, "noicefilter", "small" ) 2- setProperty ( FIELD.scope, "noicefilter", "large" ) 3- Say the location of the person you would like to call. 4(?)- Field gets filled, with a high noise level (just an example shadow variable; it could be confidence or...) 5- FIA exits field scope, goes to form scope 6- property noicefilter resets to small, because that was its value in doc scope. 7- FIA searches for field location, enters it and sets setProperty ( FIELD.scope, "noicefilter", "large" ) 8- gets filled, executes, goes to #second 9- property noicefilter resets to small, because that was its value in doc scope. 10- .... Is this what you had in mind?
Resolution: Rejected
The team has already discussed adding an expr attribute on <property> and has previously rejected it. The scope of properties is described at the beginning of Section 6.3 and seems to be consistent with your description.Email Trail:
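The Section 6.3 scoping that the response refers to can be illustrated with the standard timeout property (the forms, fields, and prompt text here are invented): a property set in a smaller scope applies only while that scope is being visited, after which the outer value applies again.

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <property name="timeout" value="5s"/>         <!-- document scope -->
  <form id="first">
    <field name="location">
      <property name="timeout" value="10s"/>    <!-- field scope -->
      <prompt>Say the first location.</prompt>
    </field>
  </form>
  <form id="second">
    <!-- The field-level value no longer applies here: the
         document-scope timeout of 5s is in effect again. -->
    <field name="location2">
      <prompt>Say the second location.</prompt>
    </field>
  </form>
</vxml>
```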
From Guillaume Berche
Clarify the behavior if <reprompt> is executed outside of a catch element. Suggested text addition to section "5.3.6 REPROMPT": "If a <reprompt> is executed outside of a catch element (such as in a block or filled element), then an "error.semantic" event is thrown."
Resolution: Accepted
It has been clarified that a <reprompt/> outside a catch has no effect (since the FIA performs normal selection and queuing of prompts outside catches).Email Trail:
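The clarified rule in a small sketch (the field, grammar URI, and prompt text are invented): <reprompt/> only changes behavior inside a catch.

```xml
<field name="city">
  <prompt>Which city?</prompt>
  <grammar src="cities.grxml" type="application/srgs+xml"/>
  <catch event="nomatch">
    Sorry, I did not understand.
    <reprompt/>  <!-- inside a catch: prompts are selected and
                      queued again on the next FIA iteration -->
  </catch>
  <filled>
    <reprompt/>  <!-- outside a catch: has no effect, since the FIA
                      performs normal prompt selection here anyway -->
  </filled>
</field>
```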
From Guillaume Berche
Clarify which event should be thrown for malformed ECMAScript expressions in <var>, <assign>, <script>, and ECMAScript expression evaluation (such as the "cond" attribute and the "expr" attribute variants). Suggested text addition to Appendix C, FIA: "During the execution of the FIA, various ECMAScript expressions are evaluated, such as the "cond" attribute of input or prompt items and the different variants of the "expr" attribute. If the evaluation of such an ECMAScript expression defined in the document results in an error, then an "error.semantic" event is thrown. These events are handled in the same way as events thrown during execution, as documented at the beginning of this section."
Resolution: Rejected
The behavior you describe is clearly implied by a number of points where error.semantic is discussed; e.g. in 5.2.6, error.semantic is thrown if an undefined variable is referenced. We have clarified this in '2.1.6.2.1 Select phase': "If an error occurs while checking guard conditions, the event is thrown which skips the collect phase, and is handled in the process phase."Email Trail:
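The clarified Select phase rule might play out as follows (the form, field, and variable names are invented): the misspelled variable in the cond makes guard-condition evaluation fail, the collect phase is skipped, and the event is handled in the process phase.

```xml
<form id="checkout">
  <catch event="error.semantic">
    <prompt>A scripting error occurred.</prompt>
    <exit/>
  </catch>
  <var name="total" expr="0"/>
  <!-- "totl" is a deliberate typo for an undefined variable:
       evaluating this guard condition throws error.semantic,
       which skips the collect phase and is caught above. -->
  <field name="extra" type="boolean" cond="totl > 10">
    <prompt>Would you like anything else?</prompt>
  </field>
</form>
```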
From Guillaume Berche
Analysis: Clarify that the block's prompt queuing occurs prior to executing the block. In the example below, it seems unclear from the specification whether the second prompt would be heard, because prompts are executable content and the definition of "execute" states that as soon as a "<goto> is executed, the transfer takes place immediately, and the remaining executable content is not executed." However, the collect phase states that appropriate prompt elements should be selected for the form item (including blocks). <block> This is my first prompt text <goto next="#another_dialog"/> This is my second prompt text </block> Suggested modification to Appendix C: "else if ( a <block> was chosen ) { Set the block's form item variable to a defined value. Execute the block's executable context (except for prompts which were previously queued in the select phase). }"
VBWG: This appears to be confusion. A block is not an input item. A block's prompts are not collected and queued a la prompt selection in form items. A block is fully executed in the collect phase; in your example, when <goto> is executed, no further content is executed, so the second prompt is never executed. Berche: Sorry about the incorrect wording; I meant that a block is a form item and as such conforms to the Appendix C algorithm, which states in the collect phase: "Select the appropriate prompts for the form item." and makes no exception for the block element. I agree that "a block is fully executed in collect phase", but as described in Appendix C the collect phase is split into 3 steps: 1) queue prompts for the form item, 2) activate grammars, 3) execute the form item. My feedback was to clarify steps 1 and 3 with respect to the block element. If the block's prompts are to be considered as executable content, then I would suggest that Appendix C exclude blocks from the sentence "Select the appropriate prompts for the form item." so that there is no ambiguity as to whether the block's prompts are treated in step #1 or step #3, which lead to different results as illustrated in my comment. As a block is a form item, it seems a legitimate interpretation of the specification to queue its prompts in step #1 and only execute non-prompt executable content in step #3. A simple correction to the specification to remove the ambiguity would be to replace "form item" with "input item" in the following Appendix C extract: // // Collect Phase: [...] // // Queue up prompts for the form item. unless ( the last loop iteration ended with a catch that had no <reprompt> ) { Select the appropriate prompts for the form item. Queue the selected prompts for play prior to the next collect operation. 
Increment the form item's prompt counter. }
Resolution: Accepted
Following further analysis, we have made the change you suggest in the FIA appendix extract, replacing 'form item' with 'input item'.Email Trail:
From Guillaume Berche
Incorrect time designation pattern in the schema: The time designation pattern "Duration.datatype" is defined as "\+?[0-9]+(m?s)?" in the schema. However, this does not allow real numbers such as "1.5s", as specified by CSS2 section "4.3.1 Integers and real numbers". Suggested modification to the definition of "Duration.datatype" in the schema: <xsd:restriction base="xsd:string"> <xsd:pattern value="\+?[0-9]+(\.[0-9]+)?(m?s)?" /> </xsd:restriction>
Resolution: Accepted
Time designation pattern now correctly follows CSS2 model.Email Trail:
From Guillaume Berche
Clarify that the <exit> expr attribute is an **ECMAScript** expression, which may resolve to a defined variable. Suggested text modification to section "5.3.9 EXIT": "expr: An **ECMAScript** expression that is evaluated as the return value (e.g. "0", "'oops!'", or "field1")."
Resolution: Accepted
Corrected in the CR specification. Email Trail:
From Guillaume Berche
Clarify which event is thrown if the nextitem or expritem attribute of a <goto> element refers to a non-existent **form item**. Suggested text modification to section "5.3.7 GOTO": "If the **form item**, dialog, or document to transition to is not valid (i.e. the **form item**, dialog or document does not exist), an error.badfetch must be thrown."
Resolution: Accepted
Corrected in the CR specification. Email Trail:
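A minimal sketch of the clarified behaviour (the form, field, and item names are invented for illustration):

```xml
<form id="order">
  <field name="qty" type="number">
    <prompt>How many would you like?</prompt>
    <filled>
      <!-- No form item named "confirm" exists in this form,
           so the platform must throw error.badfetch. -->
      <goto nextitem="confirm"/>
    </filled>
  </field>
  <catch event="error.badfetch">
    <prompt>Sorry, an application error occurred.</prompt>
    <exit/>
  </catch>
</form>
```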
From Guillaume Berche
I also have a question concerning the "Mapping Semantic Interpretation Results to VoiceXML forms" that I could not answer. When an input item contains a grammar with dialog scope, would this grammar be considered a form-level grammar (and therefore be semantically equivalent to a grammar element defined in the form), or would the interpretation of its results differ from that of a form-level grammar? In particular, if this grammar matches, would the other input items be inspected for matches of their slot names against this result? If such a grammar is handled as a form-level grammar, I don't quite understand the benefit for developers of having it as a child of an input item rather than as a child of the form. Can somebody please point me to the appropriate section in the specification which details this, or provide me with details?
Resolution: Accepted
Text is clear in the CR specification, so no change. The distinction between field-level versus form-level grammars is based on where they are defined, whilst scoping determines when they are activated. So there is no direct connection between them: a field grammar only fills its own field's variables, while form-level grammars can potentially fill any field within a form. Email Trail:
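The distinction can be sketched as follows (form, field names, and grammar URIs are hypothetical). The dialog-scoped field grammar stays active throughout the form, but a match against it can still only fill its own field, whereas the form-level grammar may fill either field:

```xml
<form id="travel">
  <!-- Form-level grammar: a single utterance may fill both fields. -->
  <grammar src="citypair.grxml"/>
  <field name="from">
    <!-- Field grammar with dialog scope: active for the whole form,
         but a match can only fill the "from" field. -->
    <grammar src="city.grxml" scope="dialog"/>
    <prompt>Where are you leaving from?</prompt>
  </field>
  <field name="to">
    <prompt>Where are you going to?</prompt>
  </field>
</form>
```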
From Guillaume Berche
Analysis: Clarify the property scope from which executable content within catch elements is evaluated. For instance, would a prompt element in a document-level catch element use the PropertyScope of the document or of the active element at the time the event is handled? Suggested text addition to section "5.3 Executable Content": "Note that the property scope in which default values are resolved is the property scope of the active element at the time the executable content is executed. For example, a prompt element executed as the result of a document-level catch element would use the PropertyScope of the active element to resolve the "timeout" property if no "timeout" attribute was specified in the prompt itself."
VBWG: Rejected. We believe that the text is already sufficiently explicit on this issue - see Section 5.2, last paragraph, describing 'as if by copy' semantics. Berche: In the last paragraph of Section 5.2 describing 'as if by copy' semantics, the specification only describes variable resolution and not property resolution, as the extract below illustrates. This was the point of my change request. From your answer I understand that my interpretation is right, although the suggested changes are considered superfluous by the VBWG.
Resolution: Accepted
We have modified the extract from 5.2 so that it is clear that properties, like variables, are resolved relative to the scope in which the event is thrown (i.e. NOT where the <catch> is defined). Email Trail:
From Guillaume Berche
Analysis: Clarify that <block> elements also have a prompt counter, like any form item. Block elements may be executed more than once without their prompt counters being cleared, through transitions using a <goto> element. Consequently, it seems logical that they have a prompt counter like any other form item. In addition, this removes an unnecessary exception statement in the specification and makes the definition of form items uniform. Suggested text modification to section "4.1.6 Prompt Selection": "Each form item and menu has an internal prompt counter that is reset to one each time the form or menu is entered. Whenever the system uses a prompt, its associated prompt counter is incremented. This is the mechanism supporting tapered prompts."
VBWG: Rejected. Multiple blocks can achieve the same purpose. No clear use case is offered for this change. Berche: I understand the prompt counter is a convenience feature, so as not to require VXML authors to use <if> statements relying on ECMAScript variables to select the prompts to play. I believe that, for consistency of the specification, prompt counters should apply in blocks as in any other form item. I don't see any motivation for excluding blocks from the prompt counter feature and for adding an exception case to the VXML specification. As the prompt counter feature is a convenience, there are ways to do the same thing with more work from VXML authors. I believe that removing the exception that blocks have no internal prompt counters would make the specification simpler and more consistent, with additional benefits to VXML authors.
Resolution: Rejected
We believe that the current approach provides a consistent treatment of prompts inside fields and catches, and that adding a prompt counter to <block> at this stage may do more harm than good. Email Trail:
From Guillaume Berche
Clarify which prompt counter to use when handling a runtime error during the FIA select phase. While in the FIA select phase, no form item is active. However, if a runtime error occurs (such as during the evaluation of a cond attribute), prompts may be played by catch elements. Which prompt counter would then be used? Suggested text modification to section "4.1.6 Prompt Selection": "Each form item and menu has an internal prompt counter that is reset to one each time the form or menu is entered. Whenever the system uses a prompt, its associated prompt counter is incremented. This is the mechanism supporting tapered prompts. Note that when a prompt is used while no form item or menu is active, the current prompt counter value is one. This condition may happen when a runtime error occurs while the FIA is in the select phase (e.g. the cond expression of a form item generates an ECMAScript evaluation error)."
Resolution: Rejected
The prompt counter would default to 1. Email Trail:
From Guillaume Berche
Typo: Invalid cross-reference in "1.4 VoiceXML Elements" The <block> element is defined in section "2.3.2 BLOCK" and not "2.3.1 FIELD"
Resolution: Accepted
Corrected in the CR specification. Email Trail:
From Guillaume Berche
Clarify the behavior if an unsupported built-in grammar is referenced in a document. Suggested text addition to section "2.3.1 FIELD": "type: The type of field, i.e., the name of a builtin grammar type (see Appendix P). Platform support for builtin grammar types is optional. If the specified built-in type is not supported by the platform, an error.unsupported.format event will be thrown. In this case, <grammar> elements can be specified instead."
Resolution: Accepted
The CR specification contains an "error.unsupported.builtin" error type for this purpose; "_msg" can provide more information, such as the builtin type. Email Trail:
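A sketch of how an application might guard against a missing builtin (the "date" type is real, but the prompts and the fallback dialog name are invented):

```xml
<field name="flight_date" type="date">
  <prompt>What day would you like to travel?</prompt>
  <!-- Thrown if the platform has no builtin "date" grammar. -->
  <catch event="error.unsupported.builtin">
    <!-- Fall back to a dialog that supplies an explicit
         <grammar> instead of relying on the builtin type. -->
    <goto next="#date_fallback"/>
  </catch>
</field>
```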
From Guillaume Berche
Clarify the behavior for an unsupported language specified in the xml:lang attribute of <vxml>. Suggested text modification to section "1.5.1 Execution within One Document": "xml:lang The language identifier for this document as defined in [RFC3066]. If omitted, the value is a platform-specific default. When an unsupported language is requested, the platform throws an error.unsupported.language event which specifies the unsupported language in its message variable."
Resolution: Rejected
The platform only rejects a language at the point it is used in a prompt or grammar in the document. Email Trail:
From Guillaume Berche
Clarify the event thrown when an invalid property value is assigned. Section "6.3 Property" usually specifies the valid values for a property. However, it does not specify the behavior of the browser if an invalid value is specified for a property. Since the schema does not provide validation support for property values, it seems important to specify the browser behavior in such a condition (for instance, if the bargein property is assigned the value "maybe"). Suggested text addition to section "6.3 Property": "If a property element provides a value that falls outside the set of valid values specified for the corresponding property name in this section, an error.badfetch event is thrown at document initialization."
Resolution: Accepted
If a platform detects that the value of a legal property name is illegal, then it should throw an error.semantic (some platforms may deal with this situation by using an appropriate default value). Email Trail:
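For instance, using the illegal bargein value from the comment, a platform following the resolution could behave like this (the error prompt is illustrative):

```xml
<!-- "maybe" is not a legal value for the bargein property. -->
<property name="bargein" value="maybe"/>

<!-- A platform detecting this should throw error.semantic;
     other platforms may silently substitute a default value. -->
<catch event="error.semantic">
  <prompt>A configuration error occurred.</prompt>
  <exit/>
</catch>
```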
From Guillaume Berche
Clarify that a field with a built-in type may contain additional nested grammar elements. I believe it can make much sense in real applications to define a field with one of the built-in types and to have, in addition, other grammars that can match. One possible example is to complete a platform built-in grammar with alternative tokens (e.g. for boolean, complete with "sure" and "of course" and associate them with the "true" tag value). Suggested text addition to section "2.3.1 FIELD" (at the end of the type attribute definition): "When this attribute is defined, use of nested grammar elements is still legal and can possibly be used to extend the built-in grammar with application-specific tokens (e.g. for boolean, complete with "sure" and "of course" and associate them with the "true" tag value)."
Resolution: Accepted
Corrected in the CR specification. Email Trail:
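The commenter's boolean example might be written as follows. This is a sketch: the field name and prompt are invented, and the exact content of the <tag> element depends on the platform's semantic interpretation format:

```xml
<field name="confirm" type="boolean">
  <prompt>Shall I place the order?</prompt>
  <!-- Extends the builtin boolean grammar with extra affirmatives. -->
  <grammar version="1.0" root="extra_yes"
           xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="extra_yes" scope="public">
      <one-of>
        <item>sure</item>
        <item>of course</item>
      </one-of>
      <!-- Map the extra tokens onto the builtin "true" value. -->
      <tag>true</tag>
    </rule>
  </grammar>
</field>
```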
From Guillaume Berche
Clarify and make uniform the units for time values. The timeout attribute specification of the prompt element does not specify a unit, nor does the timeout property. It seems desirable to me to add cross-references to section "6.5 Time Designations" to remove the ambiguity. The ambiguity is made stronger by the fact that some properties representing time intervals do not conform to section "6.5 Time Designations": this is the case for maxstale and maxage. There are numerous time designations in the specification. Following are two examples of modifications that would clarify time units. Suggested text modification to section "4.1.7 Timeout": "The timeout attribute is a time designation as specified in section "6.5 Time Designations" which specifies the interval of silence allowed while waiting for user input after the end of the last prompt." Suggested text modification to section "6.1.1 Fetching": fetchtimeout: The interval to wait (as specified in section "6.5 Time Designations") ... maxage: Indicates that the document is willing to use content whose age is no greater than the specified time (in the format specified in section "6.5 Time Designations") ... maxstale: Indicates that the document is willing to use content that has exceeded its expiration time (in the format specified in section "6.5 Time Designations") ...
Resolution: Accepted
Change applied in the CR specification. Note that maxage and maxstale are derived from HTTP 1.1 and are clearly integers indicating seconds. We see no reason to coerce these into CSS2 time durations. Email Trail:
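Under the clarified wording, time-valued properties use CSS2-style time designations while the maxage/maxstale family remain integer seconds as in HTTP 1.1. A sketch, using property names from section 6.3 of the specification:

```xml
<!-- CSS2-style time designations (section 6.5): -->
<property name="timeout" value="3.5s"/>
<property name="interdigittimeout" value="300ms"/>

<!-- maxage/maxstale properties follow HTTP 1.1:
     plain integers indicating seconds. -->
<property name="documentmaxage" value="60"/>
<property name="documentmaxstale" value="0"/>
```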
From Guillaume Berche
Clarify the semantics of the "xml:lang" attribute of prompt elements. It is not clear how the xml:lang attribute of the prompt element integrates with the xml:lang attributes in nested SSML markup such as paragraph or sentence. Suggested text modification to section "4.1 Prompts": "xml:lang The language identifier as defined in [RFC3066]. If omitted, it defaults to the value specified in the document's "xml:lang" attribute. For speech output, this attribute has the same semantics as the SSML xml:lang attribute. Refer to SSML section "2.1.2 "xml:lang" Attribute: Language". For audio output, this attribute is ignored."
Resolution: Rejected
This is already clear -- we don't see any confusion. Email Trail:
From Guillaume Berche
Clarify the behavior of queued prompts when a prompt fails to be played. In the current specification, prompts are queued during the transitioning phase. Then, during the waiting phase, they start being played. It seems unclear how the browser should react to a prompt which cannot be played (e.g. an unsupported language, or an audio prompt which cannot be fetched and has no alternative prompt): an event would be thrown, however the following questions remain: a) does the interpreter enter the transitioning phase? b) do the remaining prompts get played? I believe that the answer to a) is yes, and the answer to b) is no, because otherwise partial audio would be delivered to the end user without the application being able to control it in any way. Suggested text modification to section "4.1.8 Prompt Queueing and Input Collection": - when a prompt fails to be played, the appropriate event is thrown (such as error.badfetch, error.unsupported.format, or error.unsupported.language) and the interpreter enters the transitioning phase. The remaining prompts do not get played. As described in section "4.1.3 Audio Prompting", events thrown as a result of failed prompts are not designed to support programmatic recovery by the application.
Resolution: Accepted
Corrected in the CR specification. Email Trail:
From Guillaume Berche
Comment on the issue concerning the grammar mode attribute in section "3.1.1.4 Grammar Element". I believe that this attribute should be ignored for external grammars. This is because a default value exists in SRGS and, as noted, in the case where a mode value is provided in the grammar itself, a conflict can occur. Suggested text modification to section "3.1.1.4 Grammar Element": "mode Defines the mode of the contained grammar following the modes of the W3C Speech Recognition Grammar Specification [SRGS]. Defined values are "voice" and "dtmf" for DTMF input. If the mode value is in conflict with the mode of the grammar itself, a "badfetch" event is thrown. This attribute is ignored for referenced grammars."
Resolution: Accepted
Already fixed in the CR specification due to earlier change requests. Email Trail:
From Guillaume Berche
Analysis: Enforce consistency between the "xml:base" attributes of the application and the document, and clarify the precedence order. Suggested text modification to section "1.5.1 Execution within One Document": "xml:base The base URI for this document as defined in [XML-BASE]. As in [HTML], a URI which all relative references within the document take as their base. If both the root application and the leaf document define an "xml:base" attribute and the values of this attribute differ, then an error.semantic event is thrown. If either the root application or the leaf document defines the xml:base attribute, then it becomes the base URI for the current document. If the xml:base attribute is defined in neither the root application nor the leaf document, then the root and leaf documents must be loaded from URIs with the same base, otherwise an error.semantic event is thrown." Rationale: it may be difficult for VXML authors to develop VXML applications in which the base URI is different for the root than for the leaf. This is because links defined in the application root are active while the leaf is active. Therefore, relative URIs in root-defined links would probably fail if activated while the leaf is active. Consequently, not enforcing consistent base URIs prevents such root documents from using relative URIs, since their leaf documents might override the base.
VBWG: xml:base is by definition a document-oriented concept, not an application-oriented concept. Berche: Section "5.2 Event Handling" states that "Similarly, relative URL references in a catch element are resolved against the active document and not relative to the document in which they were declared." In addition, section "5.2.4 Catch Element Selection" states that "Form an ordered list of catches consisting of all catches in the current scope and all enclosing scopes (form item, form, document, application root document, interpreter context), ordered first by scope (starting with the current scope), and then within each scope by document order." Consequently, a catch element in a root application with a relative URI may be resolved from the xml:base of a leaf document. If no consistency is enforced between the application and the document even though logic is shared between the two (such as catch elements, or links), this reduces the benefits of shared logic provided by application root documents.
Resolution: Rejected
We are struggling to understand your point. We believe that the use of xml:base is clear in the specification. A root document and a leaf document can have different xml:bases by assigning different values to their attributes. When relative URIs are evaluated, they are evaluated together with the xml:base value of the document which contains the active dialog. For example, take a <link> defined in a root document. If the active dialog is in the leaf document, then the leaf's xml:base would be used; if the active dialog is in the root document, then the root's xml:base would be used. This seems to us consistent and coherent. Email Trail:
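The resolution's example can be made concrete with a pair of hypothetical documents (all URIs and file names invented):

```xml
<!-- root.vxml -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xml:base="http://example.com/root/">
  <!-- A root-defined link with a relative URI. -->
  <link next="help.vxml" dtmf="9"/>
</vxml>

<!-- leaf.vxml -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="root.vxml"
      xml:base="http://example.com/leaf/">
  <!-- While a dialog in THIS document is active, pressing 9
       resolves the root-defined link against this document's
       xml:base, yielding http://example.com/leaf/help.vxml;
       while a dialog in root.vxml is active, it would yield
       http://example.com/root/help.vxml. -->
</vxml>
```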
From Guillaume Berche
Preventing use of caches for submit requests. I believe that the following sentence in section "5.3.8 SUBMIT" is dangerous and not consistent with the described intent of the submit element: "Note that although the URI is always fetched and the resulting document is transitioned to, some <submit> requests can be satisfied by intermediate caches. This might happen if the method is "get", the namelist is empty, there is no query string in the URI, and the application and the origin web server both allowed the document to be cached." The submit element is designed to expressly communicate with the origin server, as stated in section "5.3.8 SUBMIT": "The <submit> element is used to submit information to the origin web server and then transition to the document sent back in the response." I therefore believe that <submit> requests should never be satisfied by intermediate caches. I don't see any real-life case in which this would be desirable. It could lead to situations where the cached page would be transitioned to even though fetching the URI might fail later on. A submit with a "get" method, an empty namelist, and no query string in the URI should logically be a goto rather than a submit. A cached submit request might also be dangerous for VXML browsers supporting persistent HTTP headers such as cookies, because the remote server may expect to maintain session information and may rely on receiving the submit request even if it always provides the same result page back. Suggested text modification to section "5.3.8 SUBMIT": "Note that the URI is always fetched, the resulting document is transitioned to, and no <submit> request should ever be satisfied by intermediate caches. VXML authors wishing for cacheable behavior should use a goto element instead."
Resolution: Accepted
We have applied a similar clarification to the wording in this section. Email Trail:
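The practical difference can be shown side by side (URIs invented for illustration):

```xml
<!-- Expressly contacts the origin web server and submits data;
     under the clarified wording, never satisfied from a cache. -->
<submit next="http://example.com/servlet/confirm" namelist="qty"/>

<!-- A plain transition: may legitimately be satisfied by an
     intermediate cache, subject to the usual caching policy. -->
<goto next="http://example.com/static/menu.vxml"/>
```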
From Guillaume Berche
In Section 4.1.8, it seems incorrect to state that "While in the transitioning state various prompts are queued, [...] by the <prompt> element in field items", since the queuing of prompt elements in field items is part of the FIA collect phase (Appendix C), which itself is part of the waiting phase ("the waiting state is entered in the collect phase of a field item"). Preferred suggested fix: modify the comments in Appendix C so that there is a "prepare" phase in which prompts are queued and grammars are activated. The "collect" phase would then only start after the comment "// Execute the form item." Then modify section 4.1.8 to the following: "The waiting and transitioning states are related to the phases of the Form Interpretation Algorithm as follows: - the waiting state is entered in the collect phase of an input item, and - the transitioning state encompasses the process, select and **prepare** phases". I believe this additional FIA phase makes the definition of the waiting and transitioning states clearer. Alternative fix: modify section 4.1.8 to the following: "The waiting and transitioning states are related to the phases of the Form Interpretation Algorithm as follows: - the waiting state is entered in the collect phase of an input item **at the point at which the interpreter waits for input**, and - the transitioning state encompasses the process and select phases, the collect phase for control items (such as <block>s), and the collect phase for input items up until the point at which the interpreter waits for input."
Resolution: Rejected
The queueing of prompts is part of the collect phase of the FIA, but the collect phase is part of BOTH the waiting state and the transitioning state, per the description in 4.1.8. However, we have clarified in section 4.1.8 of [5] the relationship between entering the waiting state and the phases of the FIA ("the waiting state is eventually entered in the collect phase of an input item (at the point at which the interpreter waits for input)"). Email Trail:
From Ray Whitmer
VoiceXML Events as DOM Events Section 5.2 on event handling claims that "An interpreter may implement VoiceXML event handling using a DOM 2 event processor". It is difficult to see how this is true, and the following sub-issues are examples of why this is not true.
Resolution: Accepted
We originally believed that a modified DOM2 event processor could implement the VoiceXML event model. However, since it is a 'modified' processor - in order to handle your points [2] and [3] - it is not strictly a DOM2 processor. Hence, in the candidate recommendation version, all references to the DOM2 event model will be removed. Email Trail:
From Ray Whitmer
Handler Order: Later in the document, section 5.2.4 states that the event delivery algorithm is described as a constrained version of XML Events and DOM 2 event processing, where the catch events are explicitly ordered by document order. This makes it impossible to implement VoiceXML event handling using a normal DOM 2 event processor in any reasonable fashion.
Resolution: Accepted
In the candidate recommendation version, this statement will be removed. Email Trail:
From Ray Whitmer
Canceling on Current Level: Also, section 5.2.4 states that an event handler which handles an event stops propagation of the event, and implies that other event handlers declared on the same element will not be called. While DOM event handling has the ability to cancel handlers declared on ancestor nodes, all handlers will always still be called on a single node if any handlers are called on that node, regardless of any cancelling that occurs during delivery.
Resolution: Accepted
In the candidate recommendation version, this statement will be removed. Email Trail:
From Deborah Dahl
NLSML It would be useful to understand how NLSML formatted results can be used to populate VoiceXML field items. The VoiceXML specification includes a comprehensive discussion of mapping ASR results in the form of ECMAScript objects to VoiceXML forms, but says very little about NLSML format. Priority: Medium High
Resolution: Rejected
Again, this has been deferred until the next version of VoiceXML. NLSML is not mature as a specification and is currently evolving into EMMA under the auspices of the MMWG. When mature, the specification may be reconsidered for the next version of VoiceXML. Email Trail:
From Stefan Hamerich
<subdialog> in mixed-initiative dialogue: as written in the last draft, and in the DTD as well, <subdialog> is only allowed as a child element of <form>. Why can't <subdialog> be allowed as a child element of <field> or <filled>? This would be useful for getting, e.g., confirmation of given values, and would allow the processing of different values separately. At the moment, considerably more work has to be done to provide this ability with the given possibilities. We would appreciate it if you would at least consider widening the group of allowed parent elements of <subdialog>.
Resolution: Rejected
<subdialog> involves collecting user input, and that is not part of executable content (such as <filled>) according to the FIA. As you point out, there are workarounds already available in VoiceXML for confirmation and for processing different values separately. However, this issue may be addressed in the next version of VoiceXML, where one tentative requirement is that the FIA be more flexible and extensible. Email Trail:
From Stefan Hamerich
Filled fields in mixed-initiative dialogues: in mixed-initiative dialogues, which use one grammar for several fields, values which have been set correctly can simply be overwritten by new utterances from the user. Sometimes this behaviour is desired and good to have, but there are situations where we would wish to deactivate fields which were filled correctly. Has any work been done in this area? At the moment we solve this by adding an extra variable for each field, but maybe there is a more elegant solution available?
Resolution: Rejected
When to correctly override variables is an application issue. There is a workaround: copy variables into a separate space as soon as they are instantiated; this prevents them from being overwritten. The issue may be revisited in the next version of VoiceXML, when we have the opportunity to provide a better separation of presentation from data structure in VoiceXML forms (e.g. XForms) and to provide more detailed control of variable filling. Email Trail:
From Bogdan Blaszczak
Additional control over a start position, speed and volume of audio playback would be a useful feature in some applications. Section 6.3.1 has an example of a volume control provided as a platform-specific property. However, it also correctly states that "platform-specific properties introduce incompatibilities".
Resolution: Accepted
In section 6.3.1 we have clarified conformance behavior when the interpreter encounters properties it cannot process: it must (rather than should) not throw an error.unsupported.property and must (rather than should) ignore the property. Email Trail:
From Al Gilman
Analysis: Text equivalents for all recorded-speech prompts should be required as a validity condition of the format. These make the difference between a dialog where access by text telephone, for example, is readily achievable and one where it is quite difficult to achieve. Rationale: Text telephones are widely used by people who are Deaf or Hard of Hearing to access the services that others access by voice telephony. The dialog design of a voice dialog would work in a text-telephone delivery context, so long as the dialog elements are available as text.
A number of meetings between the VBWG and PFWG were held to mutually clarify the different perspectives and assumptions and to clarify the requirements (http://www.w3.org/2002/09/04-pf-irc). The following proposals were discussed:

1. VoiceXML 2.0 provides a mechanism for a 'text equivalent' of an audio file. VBWG Response: See response to (3) below.

2. The text equivalence could be expressed as: (a) an attribute such as 'alt' (comparable with the HTML approach), or (b) the content of the <audio> element itself. PFWG prefer (b) since it provides more flexibility in terms of content (Ruby was mentioned as one method by which the text equivalence could be expressed; this isn't compatible with expressing it as an attribute). VBWG Response: See response to (3) below.

3. If the text equivalent is expressed as <audio> content, it could be achieved by: (a) using a separate, dedicated element, e.g. <audio><alt>access content</alt>normal fallback content</audio>, or (b) the content of the <audio> itself (i.e. accessibility content and fallback text are identical): <audio> accessibility/fallback content </audio>. PFWG prefer (b) on the grounds that they do not see any justification for a separation of accessibility content from fallback content. What is our justification for this? VBWG Response: There was some agreement that 3(b) was acceptable. However, two concerns were raised: (i) there may be differences between fallback and text equivalence, especially where the fallback is an error message (i.e. 'cannot find file') which is not a text equivalent; and (ii) it is unclear whether any such change is feasible -- if it will have no impact on the current VoiceXML user agent implementations, how will a Voice Browser know about the end user's needs, and how will it detect special end user devices? That is, if we accept 3(b), it is unclear precisely what needs to change in the specification without a substantial investigation into the whole issue of Accessibility and Voice Browser User Agents. This investigation is already being planned by members of the VBWG and PFWG, and will have an impact on the next version of VoiceXML.

4. PFWG prefer that the text equivalent in <audio> be obligatory: e.g. a conformant VoiceXML document must contain a text equivalent. VBWG Response: Rejected (tentative; discussion was incomplete). There were specific concerns about whether mandatory text equivalence is possible in (i) cases where audio is dynamically generated (e.g. recording a voice mail message), and (ii) cases where there are differences between spoken and written languages. There were also some concerns as to whether this was desirable for accessibility at all, especially where the audio was stylistic rather than contentful. Furthermore, it is unclear whether this could be enforced, and, like (6), whether it was appropriate to use a technology specification to enforce this type of policy (Voice Browser Guidelines for best practice in writing accessible applications seemed more appropriate).
Resolution: Accepted
The VoiceXML specification uses the <audio> element provided by SSML 1.0. In the CR version of that specification, a <desc> element is introduced as a child of <audio> to provide a description of non-speech audio. When the audio source of <audio> is not available, or if the processor can detect that text-only output is required, the content of <audio> is rendered, including, where appropriate, a <desc> element. Email Trail:
From Guillaume Berche
Problem with section "1.5.4 Final Processing". This section states that "While in the final processing state the application must remain in the transitioning state and may not enter the waiting state (as described in Section 4.1.8). Thus for example the application should not enter <field>, <record>, or <transfer> while in the final processing state. The VoiceXML interpreter must exit if the VoiceXML application attempts to enter the waiting state while in the final processing state." Meanwhile, section "4.1.8 Prompt Queueing and Input Collection" states: "Similarly, asynchronously generated events not related directly to execution of the transition should also be buffered until the waiting state (e.g. connection.disconnect.hangup)." However, since a single event triggers a transition to the transitioning state, those two descriptions conflict. Imagine the following situation, in which a remote user sends a series of DTMF tones and then hangs up: since the events would be delivered in sequence, that input would normally trigger a transition to another field, which then requests input collection. As currently described in section "1.5.4 Final Processing", this would result in the interpreter exiting without letting the application catch the connection.disconnect.hangup event. Suggested modification to section "1.5.4 Final Processing": "The final processing state is entered when the connection.disconnect.hangup event is handed to the application. As described in section "4.1.8 Prompt Queueing and Input Collection", the remote user may already be disconnected, and DTMF may be provided from a previous buffer, before the application receives the connection.disconnect.hangup event. During the period of time in which the remote user is disconnected and the final processing state is not yet entered, the application may queue prompts and request input as in normal processing. The buffered input will be used and compared against the requested input; only the timeouts terminating DTMF grammars would be shortened. While in the final processing state the application must remain in the transitioning state and may not enter the waiting state (as described in Section 4.1.8). Thus for example the application should not enter <field>, <record>, or <transfer> while in the final processing state (i.e. while handling the connection.disconnect.hangup event). However, the <submit> tag is legal. The VoiceXML interpreter must exit if the VoiceXML application attempts to enter the waiting state while in the final processing state."
Resolution: Rejected
We believe there is some confusion here. The final processing state doesn't occur until the disconnect event occurs, so the problem you have identified should not happen. Email Trail:
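As an illustrative sketch of what the final processing state permits, the handler below performs cleanup without entering the waiting state. The logging URL and variable name are assumptions for the example, and the variable is assumed to have been declared at document scope:

```xml
<catch event="connection.disconnect.hangup">
  <!-- Final processing state: the interpreter remains in the transitioning
       state, so no new input may be collected here (no <field>, <record>,
       or <transfer>). -->
  <assign name="callOutcome" expr="'caller_hung_up'"/>
  <!-- <submit> remains legal in the final processing state, e.g. to log
       the outcome of the call before the interpreter exits. -->
  <submit next="http://example.com/log" namelist="callOutcome" method="post"/>
</catch>
```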
From Guillaume Berche
Modify section "5.3.11 DISCONNECT" Section "5.3.11 DISCONNECT" states that "Causes the interpreter context to disconnect from the user. As a result, the interpreter context will throw a connection.disconnect.hangup event, which may be caught to do cleanup processing, e.g." I believe it is not a good idea to throw an event in this case, because a catch clause would not be able to differentiate between a real user hang-up and application logic that requested a disconnection. The suggested cleanup phase can easily be done by the application by throwing a custom event, performing the necessary clean-up in the catch clause, and then using the <disconnect> element. Suggested text modification to section "5.3.11 DISCONNECT": "As a result, the interpreter context will disconnect the remote user and exit the interpreter. Note that applications that wish to perform tasks upon disconnection (such as clean-up) may instead throw a custom event, and in the catch clause perform the necessary processing prior to invoking the <disconnect> element."
Resolution: Rejected
The application can always tell the difference between a 'real hangup' and an application-generated one, since the developer can always use scripting to indicate that it is application-generated (e.g. set a variable). Email Trail:
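A minimal sketch of the pattern described in this resolution (the variable name and <log> messages are illustrative): the application sets a flag before calling <disconnect>, so a single catch handler can distinguish the two cases:

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <var name="appHangup" expr="false"/>
  <catch event="connection.disconnect.hangup">
    <if cond="appHangup">
      <!-- the disconnect was requested by the application itself -->
      <log>application-initiated disconnect</log>
    <else/>
      <!-- the remote user really hung up -->
      <log>caller hung up</log>
    </if>
  </catch>
  <form>
    <block>
      <assign name="appHangup" expr="true"/>
      <disconnect/>
    </block>
  </form>
</vxml>
```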
From Ray Whitmer
Expect combination of VoiceXML with other markup, such as XHTML, SVG, SSML, etc., when defining multimodal presentations. In such cases, ECMAScript throughout the document should be consistent and interoperable. We would expect content authors to call functions in the global scope throughout the document, access all parts of the document through the DOM, register event handlers, etc. The intertwining of ECMAScript scopes and VoiceXML-based declaration of variables visible to ECMAScript, as described in section 5.1, is unusual. Ignoring implementation issues, it seems like it could cause usage problems. For example, if a script uses the DOM to add an event handler, how does the event handler script get access to the field values it needs to get or set in order to respond to the event? If a script tries to access or modify a field value through the DOM, how does that relate to the in-scope variable?
Resolution: Accepted
We don't expect these problems to arise in VoiceXML 2 since it was never designed for embedding in other execution containers. We are aware that VoiceXML needs to be aligned with W3C best practices in terms of document model, event model, and so on, but doing so in VoiceXML 2 would be too fundamental a change this late in the process. In the next version of the language, which is intended for embedding in other environments, we are committed to addressing these model issues at a fundamental level, and look forward to receiving requirements from, and working with, the DOM WG on these issues in the future. Email Trail:
From Deborah Dahl
VoiceXML Modularization Modularization would partition VoiceXML constructs into distinct modules. This would allow the constructs to be used in a multimodal language as components that can be embedded in multimodal documents. Priority: High
Resolution: Rejected
This issue has been deferred until the next version of VoiceXML. Attempting to introduce it at this stage is problematic since it requires bringing VoiceXML into line with XHTML modularization principles (e.g. no tag should have non-local effects, such as determination of active grammars), and this may require a fundamental restructuring of parts of VoiceXML. For the next version, the VBWG will take this and other MMWG requirements into account from the beginning of the specification process. We encourage the MMWG to become actively involved in the process once it is initiated. Email Trail:
From Deborah Dahl
XML Events A modularized VoiceXML should support XML Events. VoiceXML components embedded in multimodal XML documents would share a multimodal document's DOM and DOM events. Priority: High
Resolution: Rejected
Again this has been deferred until the next version of VoiceXML, where integration with event models, such as DOM and XML, can be addressed at a fundamental level. Feedback from the DOM WG has indicated that the current VoiceXML event model is not compatible with the current DOM event model. Email Trail:
From Stefan Hamerich
Recognize from file: the <record> element allows the recording of spoken utterances. With <audio> the resulting files can be played back to the user. But we lack a way to supply an audio file in place of real spoken input and to recognize from this file for further processing. This could be especially interesting for off-line processing of dialogues.
Resolution: Rejected
The use case is not fundamental to VoiceXML 2.0, which focuses on real-time interaction with a user. There is a workaround where user input can be recorded and then analysed by an external ASR web service. This is really a batch use case (also applicable to speaker verification, multiple ASR passes, messaging, etc.) which may be considered for the next version of VoiceXML. Email Trail:
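The workaround mentioned above could be sketched roughly as follows; the ASR service URL and field names are assumptions for illustration, not part of the specification:

```xml
<form id="capture">
  <record name="utterance" beep="true" maxtime="30s">
    <prompt>Please speak after the beep.</prompt>
  </record>
  <block>
    <!-- Post the recorded audio to a (hypothetical) external ASR web
         service, which performs recognition outside the interpreter. -->
    <submit next="http://example.com/asr" namelist="utterance"
            method="post" enctype="multipart/form-data"/>
  </block>
</form>
```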
From Stefan Hamerich
VoiceXML for embedded applications: VoiceXML is mainly suited to telephony applications. But for embedded applications it takes too much space because of all the needed components, such as an HTTP server, an interpreter for CGI scripts, and the VoiceXML interpreter itself. Is there any work being done by the W3C Voice Browser Working Group in this area?
Resolution: Accepted
There is a wide variety of embedded devices, and VoiceXML interpreters have already been used on some of them, depending on their available resources. Putting the interpreter, the media resources and the application all on the device is more problematic (although this is clearly possible on some PDA devices today). We may address modularization of VoiceXML and device profiling in the next version of VoiceXML, which should facilitate running smaller interpreter profiles. Email Trail: