W3C

Voice Extensible Markup Language (VoiceXML) Version 2.0
Last Call Disposition of Comments

This version:
28 January 2003
Editor:
Scott McGlashan, PipeBeach

Abstract

This document details the responses made by the Voice Browser Working Group to issues raised during the Last Call (beginning 24 April 2002 and ending 24 May 2002) review of Voice Extensible Markup Language (VoiceXML) Version 2.0. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

Status

This document of the W3C's Voice Browser Working Group describes the disposition of comments as of 29 November 2002 on the Voice Extensible Markup Language (VoiceXML) Version 2.0 Last Call. It may be updated, replaced, or rendered obsolete by other W3C documents at any time.

For background on this work, please see the Voice Browser Activity Statement.

Table of Contents


1. Introduction

This document describes the disposition of comments in relation to the Voice Extensible Markup Language (VoiceXML) Version 2.0 (http://www.w3.org/TR/2002/WD-voicexml20-20020424/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.

The full set of issues raised for the Voice Extensible Markup Language (VoiceXML) Version 2.0 since August 2000, their resolution, and in most cases the reasoning behind the resolution are available from http://www.w3.org/Voice/Group/2002/voicexml-change-requests.htm [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call review. It includes issues that were submitted outside the official review period, up to 1st October 2002.

Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.

2. Comments

Item Commentator Nature Disposition
R419-1    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R426-1    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R426-2    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R426-3    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R467-1    Lyndel McGee    Clarification / Typographical / Editorial (§2.1)     Accepted   
R468-1    Lyndel McGee    Clarification / Typographical / Editorial (§2.1)     Accepted   
R469-1    Deborah Dahl    Feature Request (§2.4)     Accepted   
R469-2    Deborah Dahl    Change to Existing Feature (§2.3)     Accepted   
R469-3    Deborah Dahl    Feature Request (§2.4)     Accepted   
R469-4    Deborah Dahl    Clarification / Typographical / Editorial (§2.1)     Accepted   
R470-1    Stefan Hamerich    Change to Existing Feature (§2.3)     Accepted   
R470-2    Stefan Hamerich    Feature Request (§2.4)     Accepted   
R470-3    Stefan Hamerich    Change to Existing Feature (§2.3)     Accepted   
R470-4    Stefan Hamerich    Feature Request (§2.4)     Accepted   
R471-1    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-2    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-3    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-4    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-5    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-6    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-7    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-8    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R471-9    Matthew Wilson    Clarification / Typographical / Editorial (§2.1)     Accepted   
R472-1    Bogdan Blaszczak    Change to Existing Feature (§2.3)     Accepted   
R472-2    Bogdan Blaszczak    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-1    Guillaume Berche    Technical Error (§2.2)     Accepted   
R477-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-4    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-5    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-6    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-7    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-8    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R477-9    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R478-1    Al Gilman    Change to Existing Feature (§2.3)     Accepted   
R478-2    Al Gilman    Clarification / Typographical / Editorial (§2.1)     Accepted   
R478-3    Al Gilman    Clarification / Typographical / Editorial (§2.1)     Accepted   
R478-4    Al Gilman    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-4    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-5    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R495-6    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R496-1    Guillaume Berche    Change to Existing Feature (§2.3)     Accepted   
R496-2    Guillaume Berche    Change to Existing Feature (§2.3)     Accepted   
R502-1    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R503-1    Teemu Tingander    Clarification / Typographical / Editorial (§2.1)     Accepted   
R505-1    Ray Whitmer    Technical Error (§2.2)     Accepted   
R505-2    Ray Whitmer    Technical Error (§2.2)     Accepted   
R505-3    Ray Whitmer    Technical Error (§2.2)     Accepted   
R505-4    Ray Whitmer    Change to Existing Feature (§2.3)     Accepted   
R507-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R507-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R507-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R511-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R511-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R511-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R511-4    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-4    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-5    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-6    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-7    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-8    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-9    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-10    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-11    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-12    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-13    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   
R519-14    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     Accepted   

2.1 Clarifications, Typographical, and Other Editorial

Issue R419-1

From Teemu Tingander

While reading the 2.0 spec I found the following "problem".
        --> Taken from the W3C VoiceXML 2.0 draft,
        chapter 4.1.6:
        "Each form item and each menu has an internal prompt counter that is
        reset to one each time the form or menu is entered. Whenever the
        system uses a prompt, its associated prompt counter is
        incremented. This is the mechanism supporting tapered prompts."

So the question is:
Are prompt counters maintained for <form/> and <menu/> elements only, or
are they maintained for each <*form item*> specified in chapter 2.1.2?
If we are working as with events, counters are reset when entering
<form/> or <menu/> and we have an individual counter for each <*form
item*>, but <initial> uses the <form/>'s counters. Is this right? So does
this mean that when <goto item="this"/> occurs in field "that" while field
"this" has prompt counter 3, the prompt with count 3 is selected? So
entering <field name="this">..</field> does not reset the counter, and
returning to a <*form item*> that has prompted something before maintains
that <*form item*>'s prompt count and allows us to do "tapered prompting",
yeah?

Resolution: Accepted

In Section 5.2.2 of the CR specification, we have clarified in the description of "count" attribute of <catch> when counters are reset: "The occurrence of the event (default is 1). The count allows you to handle different occurrences of the same event differently. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited. Item-level event counters are used for events thrown while visiting individual form items and while executing <filled> elements contained within those items. Form-level and menu-level counters are used for events thrown during dialog initialization and while executing form-level <filled> elements. Form-level and menu-level event counters are reset each time the <menu> or <form> is re-entered. Form-level and menu-level event counters are not reset by the <clear> element. Item-level event counters are reset each time the <form> containing the item is re-entered. Item-level event counters are also reset when the item is reset with the <clear> element. An item's event counters are not reset when the item is re-entered without leaving the <form>. Counters are incremented against the full event name and every prefix matching event name; for example, occurrence of the event "event.foo.1" increments the counters for "event.foo.1" plus "event.foo" and "event"."
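These reset rules can be illustrated with a hedged sketch (the prompt wording, field names, and the "#operator" target are invented for illustration):

```xml
<form id="pizza">
  <field name="size">
    <!-- prompt counters taper the prompts; they are reset when the
         form is re-entered, so the taper then starts over -->
    <prompt count="1">What size pizza would you like?</prompt>
    <prompt count="2">Please say small, medium, or large.</prompt>
    <!-- item-level event counter: the third nomatch while visiting
         this field triggers this handler; a <clear> on the field
         resets the count, but re-entering the field without leaving
         the form does not -->
    <catch event="nomatch" count="3">
      <prompt>Let's try something else.</prompt>
      <goto next="#operator"/>
    </catch>
  </field>
</form>
```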
Email Trail:

Issue R426-1

From Teemu Tingander

EVENT HANDLING:

As Jean-Michel Reghem [reghem@babeltech.com] already pointed out in his mail
about events, there seem to be some unanswered questions about event
handling in field elements.

From the event-handling point of view we seem to have two different kinds of
field elements. This makes it difficult for an application developer to
always know where the FIA is going. These problems may be easily solved in
the FIA, but I hope that this is not the path that we want to follow.
<object> is left out of this discussion because it is already quite
difficult for application developers.

In the field elements <field> and <initial>, an event thrown in the collect
phase (like the event "nomatch") prevents the field from being filled unless
otherwise assigned in a catch handler, and by doing so prevents <filled>
execution. This makes sense to me.
        
However, the handling of events in <subdialog> is not clear. Section 5.3.10
of the VXML 2.0 draft specification says:

        "In returning from a subdialog, an event can be thrown at the
        invocation point, or data is returned as an ECMAScript object"

and also, after an example:

        "The subdialog event handler for <nomatch/> is triggered on the
        third failure to match; when triggered, it returns from the subdialog,
        and includes the nomatch event to be thrown in the context of the
        calling dialog. In this case, the calling dialog will execute its
        <nomatch/> handler, rather than the <filled/> element, where the
        resulting action is to execute a <goto/> element. Under normal
        conditions, the <filled> element of the subdialog is executed after
        a recognized social security number is obtained, and then this value
        is returned to the calling dialog, and is accessible as result.ssn"

        It would clarify things quite a bit if that example did not have the
        <goto/> element, because it causes the FIA to exit anyway. So should
        the field remain unfilled and be visited again, as in <field>? I
        think it should. BUT then we go to VoiceXML.org's conformance
        examples and the first subdialog example; in my opinion it should
        end in an endless loop. That would make sense with <field>-like
        event handling. This needs to be clarified in the specification. In
        the event case, if the field does not need to be visited again, the
        <subdialog> element's cond could be changed, or <assign> could be
        used to fill the field, as in <field>.

The case of the <record> element is almost the same as <subdialog>, but even
more complicated. Let's start from the FIA's point of view: as the
specification says, the FIA has only two states, processing input and
documents, and collecting user input (events OR utterances). Combining the
first and second examples of <record/>:

<?xml version="1.0"?>
<vxml version="2.0">
   <form>
     <record name="greeting" beep="true" maxtime="10s"
             finalsilence="4000ms" dtmfterm="true" type="audio/wav">
        .. no change here ..
     </record>

     <field name="confirm" type="boolean">
        .. no change here ..
     </field>
     <catch event="telephone.disconnect.hangup">
        <if cond="greeting">
                <submit next="save_greeting.pl" method="post"
                        namelist="greeting"/>
        </if>
     </catch>
   </form>
</vxml>

This depends a lot on the underlying system, but I think that if something
was recorded on the underlying layer (here we have to think of silence
detection etc.) and then a hang-up occurs, the collection phase for field
"greeting" should return the utterance; then, in the collection phase for
"confirm", the event telephone.disconnect.hangup will be collected from the
underlying system and normal catch handling will take place. This is the
case if the collection phase produces only events OR utterances, not BOTH.
If I haven't already brought it up, I think that a field gets filled only if
no event is thrown in the collection phase. A <form>-level <catch> will
handle this quite neatly.

The field element <transfer> is like <field> to me. Should it be that (in
the case of a blind transfer) the event is handled and the field is not
filled? Hopefully the <catch> then does an <exit/> or similar. Is this the
case with <disconnect>? In the specification, for a blind transfer: "The
interpreter disconnects from the session and continues execution (if
anything remains to execute) but cannot regain control of the call. The
caller and callee remain connected in a conversation."

That is all about event handling. There were a few clarifications about
event counting issues that I pointed out in my earlier mails.

Resolution: Rejected

We are unclear exactly what this issue is about since it spans a number of different points. Please re-formulate more clearly.
Email Trail:

Issue R426-2

From Teemu Tingander

DOCUMENT TRANSITIONS ..

The VXML 2.0 draft introduces more transitions within documents, and some of
these are not clearly explained. I would like some clarifications in the
specification to make application developers' lives a little easier.

I hope we all know the scoping of attributes and parameters. I'll try to
explain the problem.

We have document 1, which is an application root document for document 2, as
in figure 3 of the spec.

        In step 1, doc 1 is an application root, but we don't know this at
        the time. So should we initialize the variables in the doc scope of
        doc 1 into the document scope? Yes. But as said, a root document is
        an application by itself (in the spec, Root2Root: "The root context
        is initialized with the new application root document, even if the
        documents are the same or have the same application name"), so
        should the variables (and parameters) in the application scope be
        identical (application == document)? OK, let's keep it this way.
        Should there be variables in the application scope at all?

        In step 2, the doc scope should be replaced with the variables of
        leaf doc 2. Right, because this is the "doc" that we are executing.
        This is easy.

        In step 3, the same thing, but with the variables of doc 3.

        In step 4, do we just need to copy the variables and parameters from
        the application scope to the doc scope, because they are the "doc"
        variables of this document?

        This is my interpretation of the specification; I hope it is right.

So I would like the changes in scopes to be clarified for the Root2Leaf
and Leaf2Root transitions.

Resolution: Accepted

We have clarified in Section 5.1 of the CR specification that: "Note that while executing inside the application root document, document.x is equivalent to application.x."
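A minimal sketch of that equivalence (the variable name is invented): in the application root document itself, the document and application scopes denote the same variables:

```xml
<?xml version="1.0"?>
<!-- root.vxml: this document is the application root -->
<vxml version="2.0">
  <var name="userid" expr="'anon'"/>
  <form>
    <block>
      <!-- both references resolve to the same variable while
           executing inside the root document -->
      <log>doc scope: <value expr="document.userid"/></log>
      <log>app scope: <value expr="application.userid"/></log>
    </block>
  </form>
</vxml>
```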
Email Trail:

Issue R426-3

From Teemu Tingander

LINKS (CHOICES) AND CATCHES

As specified, the <link> element (and also <choice>) has three attributes
that are derived from other components (or should I say shorthand): event,
next, and expr. So:

        <link dtmf="0" next="something"/>
        <link dtmf="0" expr="'something'+'something'"/>
        <link dtmf="0" event="help"/>

They could be implemented as:

        - the first two cases:
        <link dtmf="0">
                <goto next="something"/>
        </link>
        or
        <link dtmf="0">
                <throw event="help"/>
        </link>

I think this would make it clearer what happens, and would make it possible
to use <submit> too. In <choice> elements this could be a little tricky, but
it would also work. This approach would make it possible to support the
<log> tag inside a link too.

        And when it comes to links, it seems to me that they share a lot in
common with catches: they catch an invocation of their grammar and process a
throw or goto, almost like
        <catch grmr="link1_grmr">
                <goto next="something"/>
        </catch>
        or
        <catch event="link1_grmr">
                <goto next="something"/>
        </catch>

because in a link it is only necessary to know which grammar was triggered.
So are links special cases of catch?

Another thing is <prompt>, and to me it is a big thing :) I would like to
know why it is permitted to write "prompts" without proper tagging. I
personally think that making the <prompt> tag mandatory wouldn't radically
change the way pages are made; it would clarify the content quite a lot.

Resolution: Rejected

While you are correct that there are similarities between <link> and <catch>, <link> is a simplified form and we want to keep it as simple as possible. If you want to do more powerful things with a <link>, then you can use a document-scope <form> instead.
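As a hedged sketch of the suggested alternative (the grammar URI, the "dest" slot it is assumed to return, and the route.pl target are all hypothetical): a <form> whose form-level grammar has document scope behaves like a <link> in that a match anywhere in the document transfers control to the form, but its <filled> may contain arbitrary executable content such as <log> and <submit>:

```xml
<form id="operator" scope="document">
  <!-- because the form's scope is "document", this form-level
       grammar is active anywhere in the document, much like a
       <link> grammar; matching it moves the FIA to this form -->
  <grammar src="operator.grxml"/>
  <field name="dest">
    <prompt>Which department do you want?</prompt>
    <filled>
      <log>caller asked for <value expr="dest"/></log>
      <submit next="route.pl" namelist="dest"/>
    </filled>
  </field>
</form>
```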
Email Trail:

Issue R467-1

From Lyndel McGee

Section 2.1.5 specifies (the 2nd sentence): 
" To make a form mixed initiative, where both the computer and the human
direct the conversation, it must have one or more <initial> form items and
one or more form-level grammars. "

That implies that <initial> is a required element of a mixed-initiative
form. The examples use <initial>, too. However, the FIA does not seem to
have any provisions to enforce this.

Our questions: 
- Is <initial> required for a form to be mixed-initiative ? 
- Or, does a form-level grammar alone imply mixed-initiative? 

If <initial> is not required, then there seems to be no benefit in defining
directed and mixed-initiative forms (a VoiceXML language structure). Instead,
the directed and mixed-initiative behaviors should be discussed in terms of
item modality and grammar types and scoping (VoiceXML use cases).

For example, the following language could be used: 

- A 'directed dialog' can be implemented by using form item-level grammars
rather than form-level grammars. If it is desired to restrict user options to
just the item's grammar, the form item should be made modal. Otherwise,
grammars in wider scopes may still accept user utterances (e.g. links with
'restart', 'new order', etc.) and restart interpretation at a different form.

- A 'mixed-initiative dialog' can be implemented by using form-level grammars
that may return multiple slots and thus allow multiple form items to be filled
from a single caller utterance. The <initial> form item can be used in this
scenario to prompt for and collect an utterance before executing any input
items of the form (which may have their own specialized grammars and may
potentially capture the recognition results as their own input).


Otherwise, if <initial> is required for a form to be mixed-initiative, a
form without <initial> would be a directed form regardless of the presence
of a form-level grammar. In such case, any utterances would be processed in
the context of individual input items rather than in the form context. The
form items will be filled one at a time.

Resolution: Accepted

In the CR specification, we have clarified that (a) mixed initiative is a style of dialog (not a form sub-type), and (b) that <initial> isn't necessary for mixed-initiative dialog but one way of doing it. In particular, the first paragraph of 2.1.5 now reads: "The last section talked about forms implementing rigid, computer-directed conversations. To make a form mixed initiative, where both the computer and the human direct the conversation, it must have one or more form-level grammars. The dialog may be written in several ways. One common authoring style combines an <initial> element that prompts for a general response with <field> elements that prompt for specific information. This is illustrated in the example below. More complex techniques, such as using the 'cond' attribute on <field> elements, may achieve a similar effect."
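That authoring style can be sketched as follows (a hedged illustration: the grammar URI is hypothetical and the form-level grammar is assumed to return "from" and "to" slots):

```xml
<form id="travel">
  <!-- form-level grammar: its presence is what enables mixed
       initiative; assumed to fill the "from" and "to" slots -->
  <grammar src="flight.grxml"/>
  <initial name="start">
    <!-- an open prompt; "from Boston to Denver" could fill
         both fields in a single utterance -->
    <prompt>Where would you like to travel?</prompt>
  </initial>
  <field name="from">
    <prompt>Which city are you leaving from?</prompt>
  </field>
  <field name="to">
    <prompt>Which city are you going to?</prompt>
  </field>
</form>
```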
Email Trail:

Issue R468-1

From Lyndel McGee

Let's consider interpretation of a VoiceXML document where: 
- there is a form with multiple fields, 
- the form has a form-level grammar that can return multiple slots, 
- the fields do not have their own grammars, 
- it is a mixed-initiative form (see also the problem #1 above), 
- the first recognition result fills some fields, but not all of them, 
- another caller utterance is needed to fill the remaining fields. 

Our questions: 
[a] - Is it expected that the form will switch to the 'directed dialog' mode 
after the first utterance and then consider only unfilled items for the 
subsequent utterances (see also problem #1 above) ? 
[b] - Or, will the form remain in the 'mixed-initiative dialog' mode and will
user utterances continue to be mapped to multiple input fields (as the 2nd
table in section 3.1.6.3 seems to imply) ?
[c] - And, if the form is to remain in the 'mixed-initiative dialog' mode, can
the next user utterance overwrite fields that have been already filled or will
those fields retain their previous values ?

To illustrate the problem, let's assume that: 
- the fields are 'size', 'color', and 'shape', 
- the first utterance is 'big square', 
- the second prompt says 'Please provide the color', 
- the second utterance is 'blue triangle'. 
Will the completed form be 'big blue square' or 'big blue triangle' ? 
The 2nd table in section 3.1.6.3 should be updated to cover all
(canonical) combinations of user input and dynamic states of form components. 

Resolution: Accepted

[a] and [b] don't require spec changes; accepted [c]. See the response to R467 for clarification of the terms 'directed dialog' and 'mixed-initiative dialog'. [a]/[b]: Given that you only have a form-level grammar in your example, it is the only grammar that can be matched by user input. When the FIA visits the form, it will go to the prompts in an <initial> if present, read out the prompts in <initial>, and activate the form-level grammar. If there is no <initial>, it will go to the first field and do the same thing there. After the first recognition fills in some but not all fields, the <initial> can no longer be visited, and the FIA will go to the next unfilled field, queuing its prompts and again activating the form-level grammar (there are no other grammars in your example!). This will continue until all fields are filled. [c] We have clarified in 3.1.6.1 of the CR specification that matching form-level grammars can override existing values in input items and that <filled> processing of these items takes place as described in Section 2.4 and Appendix C.
Email Trail:

Issue R469-4

From Deborah Dahl

Grammar tag's content model
Section 3.1.1 should state explicitly that the SRGS grammar tag is extended to
allow PCDATA for inline grammar formats besides SRGS.  It currently says that
SRGS tags including grammar have not been redefined.
Priority:  High

Resolution: Accepted

We have clarified in Sections 3.1.1 and 3.1.1.4 of the CR specification that the SRGS <grammar> element is extended in VoiceXML 2.0 to allow PCDATA for inline grammar formats besides the XML format of SRGS.
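As a sketch of the distinction (the rule content is invented): the XML form of SRGS uses child elements, while a non-SRGS inline format such as JSGF is carried as PCDATA under the extension described above:

```xml
<!-- inline grammar in the XML form of SRGS: element content -->
<grammar version="1.0" root="color" mode="voice"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="color">
    <one-of>
      <item>red</item>
      <item>green</item>
    </one-of>
  </rule>
</grammar>

<!-- inline grammar in a non-XML format: PCDATA content
     (JSGF shown; note the XML-escaped angle brackets) -->
<grammar type="application/x-jsgf" mode="voice">
  #JSGF V1.0;
  grammar color;
  public &lt;color&gt; = red | green;
</grammar>
```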
Email Trail:

Issue R471-1

From Matthew Wilson

The index between Appendix N and Appendix P appears to be Appendix zero, not
Appendix O.

Resolution: Accepted

Corrected in CR specification.
Email Trail:

Issue R471-2

From Matthew Wilson

In section 6.5, "Time Designations", the example "+1.5s" still contradicts the
text, which describes the format as "an unsigned number followed by an
optional time unit identifier".

Resolution: Accepted

Clarified in section 6.3 that a time designator is a non-negative number which must be followed by ms or s (i.e. it is fully aligned with Time in CSS2).
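A brief sketch of the corrected rule as applied in markup (the "timeout" attribute of <prompt> is used here purely for illustration):

```xml
<prompt timeout="3s">...</prompt>      <!-- valid: seconds -->
<prompt timeout="250ms">...</prompt>   <!-- valid: milliseconds -->
<prompt timeout="0.5s">...</prompt>    <!-- valid: fractional, unsigned -->
<!-- invalid: "+1.5s" (signed) and "3" (missing unit) -->
```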
Email Trail:

Issue R471-3

From Matthew Wilson

Section 2.1.2.1 "Input Items" says that "implementations must handle the
<object> element by throwing error.unsupported.object.objectname if the
particular platform-specific object is not supported". Section 2.3.5 "OBJECT"
says that "implementations must handle the <object> element by throwing
error.unsupported.object if the particular platform-specific object is not
supported" (i.e. it does not include the object name in the event name).
Section 5.2.6 "Event Types" does not list any error.unsupported.object events,
but does include error.unsupported.format, which is raised if "The requested
resource has ... e.g. an unsupported ...  object type". Could this be
clarified?

Resolution: Accepted

In the CR specification, we changed and clarified sections 2.1.2.1 and 2.3.5 so that if an implementation does not support a specific object, it throws error.unsupported.objectname. We also clarified in section 5.2.6 that the event error.unsupported.format is not thrown for unsupported object types. Section 5.2.6 is not intended as an exhaustive list of event types (no change).
Email Trail:

Issue R471-4

From Matthew Wilson

Events such as error.unsupported.uri, error.unsupported.language, and
error.unsupported.format are ambiguous, since they could also be occurrences
of error.unsupported.<element> if incorrect elements have been used in the
VoiceXML document.

Resolution: Accepted

Clarified 5.2.6 by adding that <element> in error.unsupported.<element> refers to "elements defined in this specification."
Email Trail:

Issue R471-5

From Matthew Wilson

Section 6.1.2.1 says that "VoiceXML allows the author to control the caching
policy for each use of each resource." Is this true of the application root
document?

Resolution: Accepted

Clarified in 6.1.2.1 that there is no markup mechanism to specify the caching policy on a root document.
Email Trail:

Issue R471-6

From Matthew Wilson

Regarding the "builtin" URI scheme, http://www.w3.org/Addressing/schemes#unreg
says that "Unregistered schemes should not be deployed widely and should not
be used except experimentally."  Is there any intention to register the
"builtin" scheme?

Resolution: Accepted

We are still trying to determine if there is an existing URI scheme which we can reuse for VoiceXML builtin. If we are unable to find one, then we will register the builtin scheme.
Email Trail:

Issue R471-7

From Matthew Wilson

There is a typo "attibute" in the schema in Appendix O (in the xsd:annotation
for the Accept.attrib attributeGroup).

Resolution: Accepted

Corrected.
Email Trail:

Issue R471-8

From Matthew Wilson

The "minimal Conforming VoiceXML document" in appendix F1 is not minimal. As
the text itself states, the XML declaration and the xmlns:xsi and
xsi:schemaLocation attributes are not required for conformance.

Resolution: Accepted

In appendix F, changed description of example so that it is not described as minimal.
Email Trail:

Issue R471-9

From Matthew Wilson

Section 5.1.3 says, when referring to XML-escaping characters such as
<, >, and & :

"For clarity, examples in this document do not use XML escapes."

I strongly disagree with this decision. I think that having examples in the
spec which do not work is more likely to lead to confusion than anything
else. If there is a lack of clarity in VoiceXML in places, then I believe the
spec should not try to hide the fact.

Resolution: Accepted

Removed this text and updated all examples to use escaped XML characters.
Email Trail:

Issue R472-2

From Bogdan Blaszczak

A standard solution for playback control can be based on SSML ideas. For
example, VoiceXML may allow additional attributes to be used on the <audio>
tag. Such attributes could be modeled on selected attributes of <prosody>
(see also section 2.2.4 of the SSML spec). The attributes would be optional
and possibly ignored by some platforms.

The following additional <audio> attributes would be useful:

- speed:     the playback speed as a percentage of the normal speed
(e.g. 50%, 100%, 200%) or the values 'slow', 'normal', 'fast'.

- volume:    the playback volume as a percentage of the normal volume
(e.g. 50%, 100%, 200%) or the values 'soft', 'normal', 'loud'.

- position:  the playback start position in seconds from the beginning of
the audio recording (e.g. 1s, 100s).

Resolution: Rejected

This issue has been discussed many times by the team, and has been decided to be beyond the scope of VoiceXML 2.0. However, it could be addressed in the next version of VoiceXML.
Email Trail:

Issue R477-2

From Guillaume Berche

In section 4.1.5, when playing a prompt with a false bargein attribute, is
matching DTMF or speech input buffered or discarded? Suggested fix: in
section 4.1.5, modify the text to "When the bargein attribute is false, any
DTMF input buffered in a transition state is deleted from the buffer
(Section 4.1.8 describes input collection during transition states). In
addition, while in the waiting state and playing a prompt whose bargein
attribute is false, any user input (speech or DTMF) is simply ignored."

Resolution: Accepted

Clarified in section 4.1.5 that when a prompt's "bargein" attribute is false, no input is buffered while the prompt is playing (any DTMF already buffered is discarded).
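A minimal sketch of the clarified behavior (the prompt wording is invented):

```xml
<!-- bargein="false": input during playback is not buffered, and
     any DTMF buffered in the preceding transition state is
     discarded -->
<prompt bargein="false">
  Calls may be recorded for quality purposes.
</prompt>
<!-- bargein="true" (the default): the caller may interrupt -->
<prompt bargein="true">
  Please say or key in your account number at any time.
</prompt>
```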
Email Trail:

Issue R477-3

From Guillaume Berche

It is not clear whether DTMF input which does not match currently active
grammars should interrupt a prompt whose bargein attribute is true.
Suggested fix: in section 4.1.5, correct the first sentence with the
following:

"If an implementation platform supports barge-in, the application author can
specify whether a user can interrupt, or "barge-in" on, a prompt using speech
or DTMF input. In the case of DTMFs, any input (even not matching active
grammar) will interrupt a prompt, and will be handled in the same way as non
matching DTMFs entered outside of a prompt."

Resolution: Accepted

We have a slightly different solution, though. We have clarified in section 4.1.5.1 that the "bargeintype" attribute of <prompt> applies to DTMF input as well as speech input.
Email Trail:

Issue R477-4

From Guillaume Berche

Just clarify 4.1.5 with respect to interruption of a chain of queued prompts
Suggested fix: in section 4.1.5, modify the text to: "Users can interrupt a
prompt whose bargein attribute is true, but must wait for completion of a
prompt whose bargein attribute is false. In the case where several prompts are
queued, the bargein attribute of each prompt is honored during the period of
time in which that prompt is playing. If bargein occurs during any prompt in a
sequence, all subsequent prompts are not played **(even those whose bargein
attribute are set to false)**."

Resolution: Accepted

Edit applied.
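A minimal sketch of the clarified behaviour (the field name, prompt text and grammar URI are invented): if the caller barges in during the first prompt, the remaining queued prompts are not played, even the one whose bargein attribute is false.

```xml
<field name="city">
  <prompt bargein="true">Welcome to the city line.</prompt>
  <!-- If bargein occurs during the prompt above, neither of the
       prompts below is played, including the bargein="false" one. -->
  <prompt bargein="false">Please listen carefully.</prompt>
  <prompt bargein="true">Say the name of a city.</prompt>
  <grammar src="city.grxml" type="application/srgs+xml"/>
</field>
```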
Email Trail:

Issue R477-5

From Guillaume Berche

For completeness and convenience, an extract from section 4.1.5 below should
be reproduced or at least mentioned in section 4.1.8. Suggested fix: add the
sentence below before "Before the interpreter exits all ..." "As stated in
section 4.1.5, when the bargein attribute is false, any DTMF input buffered in
a transition state is deleted from the buffer".
Analysis:
  
VBWG: Rejected. We didn't see a clear use case or motivation for this change.

Berche: The motivation for my comment is that information is spread around in
the specification. My feedback is that it is often hard to have a good
understanding of the behavior of the language because general behavior such as
input processing is scattered in different sections. Adding such cross
references would make the specifications easier to read and understand. In the
specific case of this request, the section "4.1.8 Prompt Queueing and Input
Collection" provides a general description of the algorithm and state the
interpreter uses to process input and play prompts. However, it omits the
description of bargein, which impacts both input processing and prompt
queueing.

Resolution: Accepted

Added in Section 4.1.8 a cross-reference to Section 4.1.5.
Email Trail:

Issue R477-6

From Guillaume Berche

Concerning ECMAScript variables holding non-scalar values (such as the field
item variable for a <record> field, or the special _prompt variable as
mentioned in my previous mail):
- what ECMAScript type do they have? Is it indeed an ECMAScript host object
as defined in the ECMAScript specification (or an Array object containing
other objects in the case of the _prompt variable)? If so, what is their
exact list of properties along with their types and attributes (ReadOnly,
DontEnum, DontDelete, Internal)? As a side question, what does the
ECMAScript typeof operator return on these objects?

Concerning ECMAScript special variables (such as <name>$.<shadow_var> in
fields)
- can they be modified by (or as a side effect of) ECMAScript code evaluation
(such as evaluating a guard condition, or an expr attribute)?

Suggested fix: Add a specific section about ECMAScript evaluation. This
section could specify the runtime errors that occur during ECMAScript
evaluation, possible side-effects of ECMAScript evaluation (such as cond
attribute evaluation), and also the type of shadow variables, with the text
below: "Shadow variables are host objects as defined in the ECMAScript
specifications. The properties of these shadow variables are read-only. Any
attempt by some ECMAScript code evaluation (either in a script element or as
a side effect of the evaluation of an expr attribute) to modify those
properties will result in an error.semantic event being thrown."

Resolution: Rejected/Accepted

Many questions here! The properties of ECMAScript variables in VoiceXML are not specified unless necessary. With <record>, we have clarified that its implementation is platform-dependent (so different implementations can have different ECMAScript properties) but that all must be playable by <audio>, submittable by <submit>, and so on as described in the specification. For shadow variables, we have clarified that they are writable, so they can be modified. For runtime errors encountered during the FIA, we have clarified that in the FIA Appendix.
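The clarified <record> behaviour can be sketched as follows (the field name, URL and timing values are invented; the internal structure of the recording variable remains platform-dependent):

```xml
<record name="greeting" beep="true" maxtime="10s">
  <prompt>Record a greeting after the beep.</prompt>
  <filled>
    <!-- Whatever the platform stores in "greeting", it must be
         playable by <audio> and submittable by <submit>. -->
    <audio expr="greeting"/>
    <submit next="http://example.com/upload" namelist="greeting"
            method="post" enctype="multipart/form-data"/>
  </filled>
</record>
```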
Email Trail:

Issue R477-7

From Guillaume Berche

Section 2.2 describes that the _prompt special variable "is the choice's
prompt". The type of this variable is fuzzy, and the specification does not
specify the behavior of <value expr="_prompt"/> in the case where a choice
element contains mixed audio prompts and TTS.

Suggested fix: add the following text to section 2.2 in the enumerate element
section: "This specifier may refer to two special variables: _prompt is the
choice's prompt, and _dtmf is the choice's assigned DTMF sequence. The _prompt
special variable is a host object which has no visible properties and should
only be used within a <value expr="_prompt"> element. If the choice
contained more than one prompt element (such as TTS elements, or a nested
<value> element) then executing the <value expr="_prompt"> would queue
all of the prompt elements and would also execute the nested <value>
element. If the nested <value> element references itself the _prompt
variable, this would lead to an infinite recursive loop, that interpreter may
detect and handle by throwing an error.semantic event. The _dtmf special
variable is of type string and may be used as such by ECMAScript code within
the expr attribute of the <value> element."

Resolution: Rejected

The assumption that the <choice> element can contain SSML is no longer true: in the CR version of the specification, a <choice> element can now only contain CDATA.
Email Trail:

Issue R477-8

From Guillaume Berche

Clarification to FIA with respect to run-time errors:
When the evaluation of a guard condition results in a run-time exception, how
does this modify the FIA? The FIA algorithm in appendix C seems to only
consider exceptions generated during the execution phase and remains fuzzy
about those that occur during previous phases (such as initialization, queuing
of prompts such as <value>, and evaluation of guard conditions).

Suggested fix: modify the FIA so that it states that "any runtime error
occurring during the select phase (e.g. a runtime error evaluating a guard
condition) or the collect phase (e.g. a runtime error at prompt queuing, for
instance during <value> element execution), up to input collection, results
in control being passed directly to the process phase."

Resolution: Accepted

Clarified that FIA handles run-time errors in all phases.
Email Trail:

Issue R477-9

From Guillaume Berche

In the FIA, appendix C, for the collection of active grammars when not modal,
it says that these include "elements up the <subdialog> call chain." This
seems to be in contradiction with the section on <subdialog> which says
each subdialog has a totally separate context from the caller, and
shares/inherits absolutely no elements with it. Suggested fix: Remove the "and
then elements up the <subdialog> call chain." from the FIA description.

Resolution: Accepted

Correction applied.
Email Trail:

Issue R478-2

From Al Gilman

Voice is center, not limit, of domain of application.

[reference: Appendix H]  

The appendix paints the applicability of this technology too narrowly.  Even
in cases where the dialog designer only thinks in a voice dialog context, the
resulting dialogs have been thought through thoroughly and are expected to be,
for example,
- highly usable as transcoded into a text-telephone delivery context
- likewise as transcoded into a Braille environment for those who are both 
deaf and blind.
The idea that this technology is 'final form' should be eliminated and the
language brought more in line with that in section 2.1 of the latest Member
draft of the Speech Recognition Grammar Specification where it touches on this
point.
Analysis:
The current accessibility appendix H is not acceptable to them. They have
offered to provide a re-working of it where voice is center, but not limit, of
domain of application (fits in with the goals of text input/output for
accessibility).

VBWG Response: Accepted. 

Resolution: Accepted

PFWG provided the draft of a re-written Appendix H: Accessibility. This was subsequently edited, reviewed by the PFWG and then incorporated into the CR specification. This appendix also includes additional guidelines for enabling persons with disabilities to access VoiceXML applications.
Email Trail:

Issue R478-3

From Al Gilman

a. Completeness of key-access to function.  Control of the application through
key-presses by way of DTMF catches allows people with unrecognizable speech
access to the application.  Can the format specification enforce complete
functionality in this mode of interaction?  If it can, the group should
consider requiring this.

If not, we should discuss a lower minimum requirement including orientation to
alternate modes of accessing the same operational service through another
channel of communication (like giving an 800 number on your website).

b. Always define a global ENTER user-action.

[reference: 3.1 Grammars]  

The functionality intended here is that there is something the user can do
that serves as an executeImmediate or justDoIt verb.  Comparable to the use of
the ENTER key on desktop keyboards.  This could be 'Yes, please' in an English
speech catch grammar, the hash (#) key in DTMF, or what you like.

This mode of operation is used by people with severe motor limitations in
accessing computer applications.  The same "wait and act" mode of user action
that they use there is adaptive here in the same personal circumstances and
for the same reasons.

http://www.abilityhub.com/switch/

Key-bindings that might make sense to standardize in this specification
include this function and zero (0) for Help.
Analysis:
PFWG prefer that VoiceXML documents have 'DTMF functional completeness': for
all inputs by speech, there is an equivalent DTMF input.

Resolution: Rejected

Specific reasons against this approach:

(a) The application of speech recognition as an input medium is severely limited if there always needs to be a DTMF equivalent grammar. Speech provides a natural, 'wide' interface which is very difficult, if not impossible in mixed-initiative applications, to capture with DTMF input due to the limited token set and syntax of DTMF.

(b) The menu structure of DTMF input can be very different from the menu structure allowed by speech input. It is unclear how these structures can be reconciled by requiring DTMF grammars to parallel speech grammars.

(c) The prompts will need to be different: DTMF input patterns need to be explained and taught to the user, while speech is more intuitive, and complex, explanatory prompts are not required. Providing both simultaneously leads to prompts which can be highly confusing for the end user.

(d) There are alternative mechanisms available today to provide DTMF input in addition to speech input, including (i) providing an alternative DTMF-only dialog within the same VoiceXML document, or (ii) providing an alternative VoiceXML application (for example by means of a separate telephone line).

(e) It is unclear how this requirement would be enforced, and there are grave doubts about whether it would be complied with.

(f) The alternative of using SRGS with text rather than speech input provides a better alternative since it allows a wider input channel than DTMF.

(h) It is unclear how DTMF would address internationalization, which SRGS input addresses.

In summary, the VBWG felt that it was inappropriate to use W3C technology specifications to enforce this type of policy. Guidelines describing best practice for accessibility with respect to Voice Browsers are preferred. It was also unclear whether this policy is within the scope of W3C at all; further clarification and guidance from W3C management is required.
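Alternative (d)(i) above, a DTMF path alongside the speech path, can be sketched as a field that activates a speech grammar and a DTMF grammar in parallel (the prompt text and grammar URIs are invented):

```xml
<field name="payment">
  <prompt>
    Say visa, mastercard, or amex; or press 1, 2, or 3.
  </prompt>
  <grammar mode="voice" src="payment.grxml" type="application/srgs+xml"/>
  <grammar mode="dtmf" src="payment-dtmf.grxml" type="application/srgs+xml"/>
</field>
```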
Email Trail:

Issue R478-4

From Al Gilman

5. Regular, proven and familiar dialog structures.

[reference: 3.1 Grammars]  

The above specific device is a special case of a broader issue.  This has to
do with creating simple, regular navigation structures that are highly usable
and leverage learning across multiple applications that use the same best
practices.

Compare with

 http://www.w3.org/TR/WCAG10/#gl-facilitate-navigation

In addition to the general concept set forth in this guideline, the following
two examples of reference designs which have been developed for delivery
contexts that stress the mnemonic appeal of the dialog flow are worth noting:

Website design for those with severe learning disabilities:

 http://www.learningdisabilities.org.uk/html/content/webdesign.cfm

Note in particular the five global functions in this dialog design.

Navigation modes for the ANSI/NISO X39-86-2002 Digital Talking Book
Standard. Start at

 http://www.loc.gov/nls/niso/

Some preliminary experience with designing VoiceXML applications with this as
the general operational model has been highly encouraging.

6.  Complete safety net.

[reference:
1.3.5 Events
1.5.4 Final Processing

5.2 Event Handling
        5.2.2 Catch
        5.2.4 Catch Element Selection
        5.2.5 Default Catch Elements]

Each element in which an event can occur SHOULD specify catch elements,
including one with a fail-soft or recovery functionality.
             
      Examples:
          <catch event="noinput">
              <reprompt/>
          </catch>

          <catch event="nomatch">
              <audio>
                  I am sorry. I did not understand your command.
                  Please re-enter your key choice.
              </audio>
              <reprompt/>
          </catch>

          <catch event="help">
              Please say visa, mastercard, or amex.
          </catch>

7. Timeouts

[reference:
Appendix D - Timing Properties
4.1.7 Timeout]

People with disabilities sometimes need a little extra time to respond or
complete an input action.  Generous time allowances should be available.
Prompt user that the timeout will expire, give option to extend time.

Making extra time available as an ask-for option may be the most effective way
to a) keep the application accessible to those who need it without b)
impairing the functionality for others through excessive delays.

8.  Human or other option.

[reference:
1.5.2 Executing a Multi-Document Application
5.3.9 EXIT]

Advertise alternate modes through which comparable service is available.  This
may involve transfer to a human operator, text telephone service through
transfer or re-dial, etc. Particularly during final processing as safeguard.

   Examples:
        Specify (or allow for) a "wait" or "help" for barge-in during
        final processing, with an option to <transfer> to a human
        operator.

            "Goodbye" (computer)
            "wait....wait!" (human)
            "Would you like me to repeat your new account number?" (computer)
            "Yes....I didn't get it the first time" (human)
            "The new account number established during this call for
            Mary Jane Jones, is 6652281. Does this answer all your
            questions?" (computer)
            "Yes" (human)
            "If you need to speak with an operator, just say "Operator".
            Would you like to transfer to an operator?" (computer)
            "No" (human)
            "Thank you for using the National Bank's automated voice system.
            Goodbye" (computer)
 
** DON'T LOSE THESE KEY FEATURES

We couldn't resist the following affirmations.  Please take these as a "Not
just yes, but He** yes!" response: These features and design aspects will
prove critical in disability access situations:

9.  Layered help.

[reference:
2.3 Form Items
2.5 Links
3.1.1.3 Grammar Weight
4.1.6 Prompt Selection
5.2 Event Handling
        5.2.2 Catch
        5.2.5 Default Catch Elements
        5.2.6 Event Types
5.3 Executable Content]

This is good.  Thank you.  Get people to use it.  Example:

  Example:
     aMenu[1].items[0].helptext = "If this is the entry you want please 
     press the pound key";
     aMenu[1].items[0].morehelptext = "Press 1 to start over";

10.  Application-scope grammars. 

Best to specify application-scope form grammars in the root of a
multi-document application.

IMPORTANT: It is stated in the document body, as well as in the
"Clarifications" section, why this is a good idea in general. This goes
redoubled for accessibility.

Resolution: Accepted

After consultation with the PFWG, it was agreed that this issue is more concerned with application design guidelines than with the specification itself. In the current specification, Appendix H: Accessibility includes some basic guidelines. The VBWG and PFWG have initiated joint work on expanding these guidelines into a separate 'Voice Design Guidelines' document.
Email Trail:

Issue R495-1

From Guillaume Berche

Precise the execution of catch handlers in section "5.2.2 Catch" Section
"5.2.2 Catch" seems to imply that handlers are called synchronously: "If a
<catch> element contains a <throw> element with the same event, then
there may be an infinite loop:

<catch event="help">
   <throw event="help"/>
</catch>"

Suggested text addition: "The FIA appendix C details the execution after a
catch element is executed (in its definition of the "execute" term)"

Resolution: Rejected

Unclear what the problem is: we don't see any inconsistency between the 5.2.2 text and the FIA.
Email Trail:

Issue R495-2

From Guillaume Berche

Precise the definition of "execution" in the FIA appendix C to executables
from handlers. Suggested text modification: "execute To execute executable
content: either a block, a filled action, or a set of filled actions. If an
event is thrown during execution, the execution of the executable content is
aborted. The appropriate event handler is then executed, and this may cause
control to resume in a form item, in the next iteration of the form's main
loop, or outside of the form. If a computed-directed transition element (such
as <goto>, <link>, <return> or <submit>) is executed, the
transition takes place immediately, and the remaining executable content is
not executed. During the execution of the event handler, the same rules apply
as for the execution of executable content described above (with respect to
execution abortion and transition)."

Resolution: Accepted

We have already clarified the execution of executable content in response to other requests. Note that <link> cannot appear in executable content.
Email Trail:

Issue R495-3

From Guillaume Berche

Precise error handling during document initialization (e.g. in document-level
<script> and <var> elements) Suggested modification: Move the modified
following text from section "5.2.6 Event Types" to section "5.2.2 Catch" (or
to a new section, as suggested in comment #4) "Errors encountered during
document loading, including transport errors (no document found, HTTP status
code 404, and so on) and syntactic errors (no <vxml> element, etc) result
in a badfetch error event raised in the calling document, while errors after
loading (including document initialization) (such as semantic errors during
<script> and <var> initialization), are raised and handled in the
document itself."

I could not understand the rationale behind the following statement in section
"5.2.6 Event Types", near to error.badfetch. "Whether or not variable
initialization is considered part of executing the new document is
platform-dependent." Can someone please explain why this behavior would be
platform-dependent?

Resolution: Accepted

We have added the following text to Section 5.2.6: "Errors encountered during document loading, including transport errors (no document found, HTTP status code 404, and so on) and syntactic errors (no <vxml> element, etc) result in a badfetch error event raised in the calling document. Errors that occur after loading and before entering the initialization phase of the Form Interpretation Algorithm are handled in a platform-specific manner. Errors that occur after entering the FIA initialization phase, such as semantic errors, are raised in the new document. The handling of errors encountered during the loading of the first document in a session is platform-specific." Variable initialization may be platform-dependent since a platform may use a SAX-based document construction technique, where initialization of variables takes place as each statement is reached during document loading, or may use a DOM-based technique, where the whole document is constructed first and then any initialization takes place.
Email Trail:

Issue R495-4

From Guillaume Berche

Precise document initialization. As described above in comment #3, some events
are handled at document initialization. However, since elements are
initialized in document order, events handlers may not yet be active at the
time an event is thrown. Take for instance the usual case of a vxml document
starting with a script element: no document handlers are yet initialized, and
an error in the <script> element would not be handled by defined event
handlers.

Suggested modification: add a specific section concerning document
initialization, similar to the FIA, which specifies the order of element
initialization:

"1.5.0 Document initialization

Document initialization starts once the transport and XML schema validation
has been performed.

As described in section "5.2.2 Catch", errors occurring during this phase are
raised and handled in the document itself. During handling of events, the
variable scope chain may not be complete (there might be no chained dialog
scope yet), but the _event shadow variable is still defined in an anonymous
variable scope"

Each element is initialized in document order including event
handlers. Consequently, it is advised to define document-level handlers first
in the document. ...

Once all elements are initialized, the document execution begins. As described
in section "1.5.1 Execution within One Document", document execution begins at
the first dialog by default. "

Resolution: Accepted

We have clarified in the FIA Appendix the description of initialization:

    foreach ( <var>, <script> and form item, in document order )
        if ( the element is a <var> )
            Declare the variable, initializing it to the value of the
            "expr" attribute, if any, or else to undefined.
        else if ( the element is a <script> )
            Evaluate the contents of the script if inlined or else from
            the location specified by the "src" attribute.
        else if ( the element is a form item )
            Create a variable from the "name" attribute, if any, or else
            generate an internal name. Assign to this variable the value
            of the "expr" attribute, if any, or else undefined.

We have also clarified error handling during FIA execution: "During FIA execution, events may be generated at several points. These events are processed differently depending on which phase is active. Before a form item is selected (i.e. during the Initialization and Select phases), events are generated at the dialog level. The corresponding catch handler is located and executed. If the catch does not result in a transition from the current dialog, FIA execution will terminate. Similarly, events triggered after a form item is selected (i.e. during the Collect and Process phases) are usually generated at the form item level. There is one exception: events triggered by a dialog-level <filled> are generated at the dialog level. The corresponding catch handler is located and executed. If the catch does not result in a transition, the current FIA loop is terminated and the Select phase is reentered." Note that XML Schema validation is NOT compulsory in VoiceXML (see Appendix F - Conformance).
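The initialization order described above can be illustrated with a sketch of a document in which a document-level catch handler covers errors raised while the later <var> and <script> elements are initialized in document order (all names and URIs are invented, and the assumption that such errors are raised in this document follows the clarified text for the post-initialization case):

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <catch event="error">
    <prompt>Initialization failed.</prompt>
    <exit/>
  </catch>
  <!-- Initialized in document order: the variable first, then the
       script; an error thrown here is raised in this document. -->
  <var name="attempts" expr="0"/>
  <script src="init.js"/>
  <form id="main">
    <block><prompt>Hello.</prompt></block>
  </form>
</vxml>
```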
Email Trail:

Issue R495-5

From Guillaume Berche

Refine anonymous variable scope during event handling Section "5.2.2 Catch"
states that "The catch element's anonymous variable scope includes the special
variable _event which contains the name of the event that was thrown." To me,
this implies that the handler is invoked when the FIA is currently running
(that is a form and a form item are active). However, this might not be the
case for events handled during document initialization. Consequently, the
variable scope chain as described in section "5.1.2 Variable Scopes" would not
work, in particular there would no chained dialog scope. Suggested
modification is included in comment #4.

Resolution: Rejected

We don't see the 'implication': the existence of the _event variable does not imply that the FIA is currently running.
Email Trail:

Issue R495-6

From Guillaume Berche

Precise that a <field> item with neither an implicit nor an explicit grammar
should throw an error.semantic event. See if it is possible to refine the
schema to enforce this. Alternative suggested text modification to the end of
section "2.3.1 FIELD": "[...] The use of <option> does not preclude the
simultaneous use of <grammar>. The result would be the match from either
'grammar', not unlike the occurrence of two <grammar> elements in the same
<field> representing a disjunction of choices. However, a field item with
neither an implicit nor an explicit grammar would result in an error.semantic
event being thrown at document initialization time".

Resolution: Rejected

The specification doesn't state or imply that a field without grammars is an error, so we cannot make it more precise.
Email Trail:

Issue R502-1

From Teemu Tingander

The case was this:
        <field name="order">
                <prompt> Make Your Order </prompt>
                <grammar mode="voice" src="order.grxml"
                            type="application/srgs+xml"/>
                <filled>
                        <submit next="someurl" method="???"/>
                </filled>
        </field>

So if the order is filled with a structured object like:
        order: {
            drink: "coke",
            pizza: {
                number: "3",
                size: "large",
                topping: [ "pepperoni", "mushrooms" ]
            }
        }


what is the correct way to create POST request and GET request..

like in GET:
http://someurl?order.drink=coke&order.pizza.number=3
&order.pizza.size=large&order.pizza.topping=pepperoni
&order.pizza.topping=mushrooms


Issues arise with arrays (ordering?); should the elements be numbered, etc.?

The POST request is more complicated to write, so I leave it out here.

Resolution: Accepted

The April 2002 VoiceXML 2.0 specification makes it clear that developers should decompose objects themselves for submission; see Section 5.3.8 (default submission: stringOf on object). Decomposition as you suggest is reasonable, and since you control the recomposition at the other end, you should be able to resolve any issues with arrays, etc., yourself.
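The Section 5.3.8 guidance (decompose the object yourself before submitting) can be sketched in ECMAScript. The function name flatten and the dot-plus-index naming for array elements are invented here for illustration, not mandated by the specification:

```javascript
// Hedged sketch (not from the specification): flatten a nested
// ECMAScript result object into name=value pairs suitable for a
// GET query string. Arrays are numbered by index.
function flatten(obj, prefix, pairs) {
  for (var key in obj) {
    var name = prefix ? prefix + "." + key : key;
    var value = obj[key];
    if (value !== null && typeof value === "object") {
      flatten(value, name, pairs); // recurse into objects and arrays
    } else {
      pairs.push(encodeURIComponent(name) + "=" + encodeURIComponent(value));
    }
  }
  return pairs;
}

var order = {
  drink: "coke",
  pizza: { number: "3", size: "large",
           topping: ["pepperoni", "mushrooms"] }
};
var query = flatten(order, "order", []).join("&");
// "order.drink=coke&order.pizza.number=3&order.pizza.size=large
//  &order.pizza.topping.0=pepperoni&order.pizza.topping.1=mushrooms"
```

The resulting string can then be appended to the URL for GET, or the individual pairs assigned to simple variables and submitted via a namelist; the recomposition on the server mirrors whichever naming convention was chosen.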
Email Trail:

Issue R503-1

From Teemu Tingander

What is the scoping of properties? How many scopes are there, and what is the
top (field?) scope? An expr attribute for <property> would be nice and very
useful while tuning ASR through <property>. Should a property reset back to
what it was in the previous scope, or should <property> have a scope
attribute like scope (universal | document | form | dialog | location), where
"location" specifies how "deeply" it affects?

        
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
        <property name="noicefilter" value="small"/>
        <form id="first">
                <field name="location">
                        <property name="noicefilter" value="large"/>
                        <prompt> Say the location of the person you
                        would like to call. </prompt>
                        <filled>
                                <if expr="location$.noicelevel &gt; 0.3">
                                        <!-- property name="noicefilter"
                                             value="huge"/ -->
                                        <prompt> Please repeat </prompt>
                                        <clear/>
                                <else/>
                                        <goto next="#second"/>
                                </if>
                        </filled>
                </field>
                <!-- second case: how should it work if the filled
                     were here? -->
        </form>
        <form id="second">
                <!-- it would be nice to have that property as "huge"
                     here by default if it is a noisy environment! -->
                <field name="location">
                        <prompt> Say the second location of the person
                        you would like to call. </prompt>
                </field>
                <filled next="#second"/>
        </form>
</vxml>
        
Is the flow the following?
        1 - setProperty( DOC.scope, "noicefilter", "small" )
        2 - setProperty( FIELD.scope, "noicefilter", "large" )
        3 - Say the location of the person you would like to call.
        4(?) - Field gets filled, with a high noise level (just an example
            shadow variable; could be confidence or ...)
        5 - FIA exits field scope, goes to form scope
        6 - property noicefilter resets to small, because that was its value
            in doc scope.
        7 - FIA searches for field location, enters it and does setProperty
            ( FIELD.scope, "noicefilter", "large" )
        8 - gets filled, executes, goes to #second
        9 - property noicefilter resets to small, because that was its value
            in doc scope.
        10 - ....

        Is this what you had in mind?

Resolution: Rejected

The team has already discussed adding an expr attribute to <property> and has previously rejected it. The scope of properties is described at the beginning of Section 6.3 and seems to be consistent with your description.
Email Trail:

Issue R507-1

From Guillaume Berche

Precise the behavior if Reprompt is executed outside of a catch element

Suggested text addition to section "5.3.6 REPROMPT": "If a <reprompt> is
executed outside of a catch element (such as in a block or filled element)
then an "error.semantic" event is thrown."

Resolution: Accepted

It has been clarified that a <reprompt/> outside a catch has no effect (since the FIA performs normal selection and queuing of prompts outside catches).
Email Trail:

Issue R507-2

From Guillaume Berche

Precise which event should be thrown for malformed ECMAScript expressions in
Var, Assign, Script, and ECMAScript expression evaluation (such as the "cond"
attribute, and expr attribute variants).

Suggested text addition to Appendix C, FIA: "During the execution of the FIA,
various ECMAScript expressions are evaluated such as the "cond" attribute of
input or prompt items and the different variants of "expr" attribute. If the
evaluation of such an ECMAScript expression defined in the document results in
an error then an "error.semantic" event is thrown. These events are handled in
the same way as events thrown during execution, as documented at the
beginning of this section."

Resolution: Rejected

The behavior you describe is clearly implied by a number of points where error.semantic is discussed; e.g. Section 5.2.6 states that error.semantic is thrown if an undefined variable is referenced. We have clarified this in '2.1.6.2.1 Select phase': "If an error occurs while checking guard conditions, the event is thrown which skips the collect phase, and is handled in the process phase."
Email Trail:

Issue R507-3

From Guillaume Berche

Precise that the Block's prompt queuing occurs prior to executing the
Block. In the example below, it seems unclear from the specifications whether
the second prompt would be heard because prompts are executable content and
the definition of "execute" states that as soon as a "<goto> is executed,
the transfer takes place immediately, and the remaining executable content is
not executed." However, the collect phase states that appropriate prompts
elements should be selected for the input item (including blocks). 
<block>
        This is my first prompt text
        <goto next="#another_dialog"/>
        This is my second prompt text
</block>

Suggested modifications to the appendix C:
"   else if ( a <block> was chosen )
   {
       Set the block's form item variable to a defined value.

       Execute the block's executable context (except for prompts which were
   previously queued in the select phase).  
   }"
Analysis:
VBWG: This appears to be a confusion. A block is not an input item. A block's
prompts are not collected and queued a la prompt selection in form items. A
block is fully executed in collect phase; in your example, when <goto> is
executed, no further content is executed, so the second prompt is never
executed.


Berche: Sorry about the incorrect wording, I meant that a Block is a form item
and as such it conforms to the appendix C algorithm which states in the
collect phase: "Select the appropriate prompts for the form item." and makes
no exception for the Block element.

> A block is
> fully executed in collect phase; in your example, when <goto> is
> executed,
> no further content is executed, so the second prompt is never executed.
>

I agree with that "A block is fully executed in collect phase" but as
described in the appendix C the collect phase is split into 3 steps:
1) queue prompts for the form item
2) activate grammars
3) execute form item

My feedback was to clarify the steps 1 and 3 with respects to the Block
element. If the Block's prompts are to be considered as executable content,
then I would suggest that appendix C excludes blocks from the sentence
"Select the appropriate prompts for the form item." so that there is no
ambiguity as to whether the Block's prompts are treated in step #1 or step
#3, which leads to different results as illustrated in my comment. As a
block is a form item, it seems a legitimate interpretation of the
specifications to queue its prompts in step #1 and only execute non-prompt
executable content in step #3.

A simple correction to the specification to remove the ambiguity would be to
replace "form item" with "input item" in the following appendix C extract:
   //
   // Collect Phase: [...]
   //
   // Queue up prompts for the form item.

   unless ( the last loop iteration ended with
            a catch that had no <reprompt> )
   {
       Select the appropriate prompts for the form item.

       Queue the selected prompts for play prior to
       the next collect operation.

       Increment the form item's prompt counter.
   }

Resolution: Accepted

Following further analysis, we have made the change you suggest in the FIA appendix extract, replacing 'form items' with 'input items'.
Email Trail:

Issue R511-1

From Guillaume Berche

Incorrect time designation pattern in schema: The time designation pattern
"Duration.datatype" is defined as "\+?[0-9]+(m?s)?" in the schema. However,
this does not include real numbers such as "1.5s" as specified by CSS2 section
"4.3.1 Integers and real numbers"

Suggested modification to the definition of "Duration.datatype" in the
schema:
 <xsd:restriction base="xsd:string">
  <xsd:pattern value="\+?[0-9]+(\.[0-9]+)?(m?s)?" />
  </xsd:restriction>

Resolution: Accepted

Time designation pattern now correctly follows CSS2 model.
Email Trail:

Issue R511-2

From Guillaume Berche

Precise that the Exit expr attribute is an **ECMAScript** expression which may
resolve to a defined variable.

Suggested text modification to section "5.3.9 EXIT":

"expr: An **ECMAScript** expression that is evaluated as the return value
(e.g. "0", "'oops!'", or "field1")."

Resolution: Accepted

Corrected in CR specification.
Email Trail:

Issue R511-3

From Guillaume Berche

Precise which event is thrown if the nextitem or expritem attribute of a Goto
element refers to a non-existing **form item**.

Suggested text modification to section "5.3.7 GOTO": "If the **form item**,
dialog, or document to transition to is not valid (i.e. the **form item**,
dialog or document does not exist), an error.badfetch must be thrown. "

Resolution: Accepted

Corrected in CR specification.
Email Trail:

Issue R511-4

From Guillaume Berche

I also have a question concerning the "Mapping Semantic Interpretation Results
to VoiceXML forms" that I could not answer. When an input item contains a
grammar with a dialog scope, would this grammar be considered as a form-level
grammar (and therefore be semantically equivalent to a grammar element defined
in the form), or would the interpretation of its results be different from a
form-level grammar?  In particular, if this grammar matches, would the other
input items be inspected for match of their slot names on this match?  If such
a grammar is handled as a form-level grammar, I don't quite understand the
benefit for developers to have it as a child of an input item rather than as a
child of the form. Can somebody please point me to the appropriate section in
the specifications which detail this or provide me with details?

Resolution: Accepted

The text is clear in the CR specification, so no change. The distinction between field and form-level grammars is based on where they are defined, whilst scoping determines when they are activated. So there is no direct connection between them: a field grammar only fills its field's variable, while a form-level grammar can potentially fill any field within the form.
Email Trail:

Issue R519-1

From Guillaume Berche

Precise the property scope from which executables within catch elements are
evaluated.

For instance, would a prompt element in a document-level catch element use the
PropertyScope of the document or the active element at the time the event is
handled?

Suggested text addition to section "5.3 Executable Content": "Note that the
property scope in which default values are resolved is the property scope of
the active element at the time the executable content is executed. For
example, a prompt element executed as the result of a document-level catch
element would use the PropertyScope of the active element to resolve the
"timeout" property if no "timeout" attribute was specified in the Prompt
itself"
Analysis:
VBWG: Rejected. We believe that the text is already sufficiently explicit on
this issue - see Section 5.2, last paragraph describing 'as if by copy
semantics'.

Berche: In Section 5.2, last paragraph describing 'as if by copy semantics',
the specification only describes variable resolution and not property
resolution as the extract below illustrates. This was the point of my change
request.  From your answer I understand that my interpretation is right
although the suggested changes are considered extra by the VBWG.

Resolution: Accepted

We have modified the extract from 5.2 so that it is clear that properties, like variables, are resolved relative to the scope where the event is thrown (i.e. NOT where the <catch> is defined).
Email Trail:

Issue R519-2

From Guillaume Berche

Precise that Block elements also have a prompt counter

Block elements may be executed more than once without their prompt counters
being cleared, e.g. when transitioned to again using a goto element.
Consequently, it seems logical that they have a prompt counter like any form
item. In addition, this removes an unnecessary exception statement in the
specs and makes the definition of form items uniform.

Suggested text modification to section "4.1.6 Prompt Selection": "Each form
item and menu has an internal prompt counter that is reset to one each time
the form or menu is entered. Whenever the system uses a prompt, its associated
prompt counter is incremented. This is the mechanism supporting tapered
prompts."
Analysis:
VBWG: Rejected.

Multiple blocks can achieve the same purpose. No clear use case is offered for
this change.

Berche: I understand the prompt counter is a convenience feature to not
require VXML authors to use <if> statements relying on ECMAScript variables
to select prompts to play during their execution. I believe that, for
consistency of the specifications, prompt counters should apply in blocks as
well as in any form item. I don't see any motivation for excluding blocks from
the prompt counter feature and for adding an exception case in the VXML
specifications.  As the prompt counter feature is a convenience, there are
ways to do the same thing with more work from VXML authors. I believe that
removing the exception that Blocks have no internal prompt counters would make
the specifications simpler and more consistent, with additional benefits to
VXML authors.

Resolution: Rejected

We believe that the current approach provides a consistent treatment of prompts inside fields and catches, and that adding a prompt counter to <block> at this stage may do more harm than good.
Email Trail:

Issue R519-3

From Guillaume Berche

Precise which prompt counter to use when handling a runtime error while in the
FIA selection phase?

While in the FIA selection phase, no form item is currently active. However,
if a runtime error occurs (such as during the evaluation of a cond attribute),
prompts may be played by catch elements. Which prompt counter would then be
used?

Suggested text modification to section "4.1.6 Prompt Selection": "Each form
item and menu has an internal prompt counter that is reset to one each time
the form or menu is entered. Whenever the system uses a prompt, its associated
prompt counter is incremented. This is the mechanism supporting tapered
prompts. Note that when a prompt is used while no form item or menu is active,
the current prompt counter value is one. This condition may happen when a
runtime error occurs while the FIA is in the selecting phase (e.g. the cond
expr of a form item generates an ECMAScript expression evaluation error)"

Resolution: Rejected

The prompt counter would default to 1.
Email Trail:

Issue R519-4

From Guillaume Berche

Typo: Invalid cross-reference in "1.4 VoiceXML Elements"

The <block> element is defined in section "2.3.2 BLOCK" and not "2.3.1
FIELD"

Resolution: Accepted

Corrected in CR specification.
Email Trail:

Issue R519-5

From Guillaume Berche

Precise behavior if an unsupported built-in grammar is referenced in a
document

Suggested text addition to section "2.3.1 FIELD": "type: The type of field,
i.e., the name of a builtin grammar type (see Appendix P). Platform support
for builtin grammar types is optional. If the specified built-in type is not
supported by the platform, an error.unsupported.format event will be
thrown. In this case, <grammar> elements can be specified instead."

Resolution: Accepted

The CR specification contains an "error.unsupported.builtin" error type for this purpose; the event's "_message" variable can provide more information, such as the builtin type.
Email Trail:

Issue R519-6

From Guillaume Berche

Precise behavior for unsupported language defined in xml:lang attribute of
<vxml>

Suggested text modification to section "1.5.1 Execution within One Document":
"xml:lang The language identifier for this document as defined in
[RFC3066]. If omitted, the value is a platform-specific default. When an
unsupported language is requested, the platform throws an
error.unsupported.language event which specifies the unsupported language in
its message variable."

Resolution: Rejected

Platform only rejects language at the point it is used in a prompt or grammar in the document.
Email Trail:

Issue R519-7

From Guillaume Berche

Precise the event thrown when an invalid property value is assigned.

The section "6.3 Property" usually specifies the valid values for a
property. However, it does not specify the behavior of the browser if a
invalid value is specified in a property value. Since the schema does not
provide validation support for property values, it seems important to specify
the browser behavior in such a condition (for instance if the bargein property
is assigned to the value "maybe")

Suggested text addition to section "6.3 Property": "If a property element
provides a value that falls outside of the set of valid values specified for
the corresponding property name in this section, an error.badfetch event is
thrown at document initialization."

Resolution: Accepted

If a platform detects that the value of a legal property name is illegal, then it should throw an error.semantic (some platforms may deal with this situation by using an appropriate default value).
Email Trail:

Issue R519-8

From Guillaume Berche

Precise that a field with a built-in type may contain additional nested
grammar elements

I believe it can make much sense in real applications to define a field with
one of the built-in types and have, in addition, other grammars that can
match. One possible example is to complete a platform built-in grammar with
alternative tokens (e.g. for boolean complete with "sure", "of course" and
associate them with the "true" tag value)

Suggested text addition to section "2.3.1 FIELD" (at the end of the type
attribute definition): "When this attribute is defined, use of nested grammar
elements is still legal and can possibly be used to extend the built-in
grammar with application-specific tokens (e.g. for boolean complete with
"sure", "of course" and associate them with the "true" tag value)."

Resolution: Accepted

Corrected in CR specification.
Email Trail:

Issue R519-9

From Guillaume Berche

Precise and uniformize units for time values

The timeout attribute specification of the Prompt element does not specify a
unit, nor does the timeout property. It seems desirable to me to add a
cross-reference to section "6.5 Time Designations" to remove the
ambiguity. The ambiguity is made stronger by the fact that some properties
representing time [intervals] do not conform to section "6.5 Time
Designations": this is the case for the maxstale and maxage.

There are numerous time designations in the specs. Following are two examples
of modifications that would clarify time units.

Suggested text modification to section "4.1.7 Timeout":

"The timeout attribute is a time designation as specified in section "6.5 Time
Designations" which specifies the interval of silence allowed while waiting
for user input after the end of the last prompt."

Suggested text modification to section "6.1.1 Fetching":

fetchtimeout:      The interval to wait (as specified in section "6.5
                   Time Designations") ...
maxage:            Indicates that the document is willing to use content
                   whose age is no greater than the specified time (in the 
                   format specified in section "6.5 Time Designations") ...
maxstale:          Indicates that the document is willing to use content
                   that has exceeded its expiration time (in the format 
                   specified in section "6.5 Time Designations") ...

Resolution: Accepted

Change applied in CR specification. Note that maxage and maxstale are derived from HTTP 1.1 and are clearly integers indicating seconds. We see no reason to coerce these into CSS2 time durations.
Email Trail:

Issue R519-10

From Guillaume Berche

Precise the semantic for the "xml:lang" attribute of Prompt elements

It is not clear how the xml:lang attribute of the Prompt element integrates
with the xml:lang attributes in nested SSML markup such as in paragraph or
sentence.

Suggested text modification to section "4.1 Prompts":
"xml:lang        The language identifier as defined in [RFC3066]. If
                 omitted, it defaults to the value specified in the 
                 document's "xml:lang" attribute. For speech output, 
                 this attribute has the same semantics as the SSML 
                 xml:lang attribute. 
                 Refer to SSML section "2.1.2 "xml:lang" Attribute:
                 Language". For audio output, this attribute is ignored."

Resolution: Rejected

This is already clear -- we don't see any confusion.
Email Trail:

Issue R519-11

From Guillaume Berche

Precise the behavior of queued prompts when a prompt fails to be played.

In the current specs, prompts are queued during the transitioning phase. Then
during the waiting phase, they start being played. It seems unclear how the
browser should react to a prompt which cannot be played (e.g. an unsupported
language, or an audio prompt which cannot be fetched and has no alternative
prompt): an event would be thrown, however the following questions remain:
(a) does the interpreter enter the transitioning phase? (b) do remaining
prompts get played?

I believe that the answer to a) is yes, and answer to b) is no because
otherwise partial audio would be delivered to the end-user, without the
application being able to control it in any way.

Suggested text modification to section "4.1.8 Prompt Queueing and Input
Collection":
- when a prompt fails to be played, the appropriate event is thrown (such as
error.badfetch, error.unsupported.format, or error.unsupported.language) and
the interpreter enters the transitioning phase. The remaining prompts do not
get played. As described in section "4.1.3 Audio Prompting", events thrown as
a result of failed prompts are not designed to support programmatic recovery
by the application.

Resolution: Accepted

Corrected in the CR specification.
Email Trail:

Issue R519-12

From Guillaume Berche

Comment on issue concerning Grammar mode attribute in section "3.1.1.4 Grammar
Element"

I believe that this attribute should be ignored for external grammars. This is
because SRGS defines a default value and, as noted, when a mode value is
provided in the grammar itself, a conflict can occur.

Suggested text modification to section "3.1.1.4 Grammar Element":
"mode         Defines the mode of the contained grammar following the
              modes of the W3C Speech Recognition Grammar Specification 
              [SRGS]. Defined values are "voice" and "dtmf" for DTMF input. 
              If the mode value is in conflict with the mode of the grammar 
              itself, a "badfetch" event is thrown. This attribute is 
              ignored for referenced grammars."

Resolution: Accepted

Already fixed in CR specification due to earlier change requests.
Email Trail:

Issue R519-13

From Guillaume Berche

Enforce consistency between the "xml:base" attributes of application and
document, and precise the precedence order.

Suggested text modification to section "1.5.1 Execution within One
Document":

"xml:base The base URI for this document as defined in [XML-BASE]. As in
             [HTML], a URI which all relative references within the document
             take as their base. If both the root application and the leaf
             document define an "xml:base" attribute and the values for this
             attribute differ, then an error.semantic event is thrown. If
             either the root application or the leaf document define the
             xml:base attribute, then it becomes the base URI for the current
             document. If the xml:base attribute is defined in neither root
             application nor leaf document, then the root and leaf documents
             must be loaded from URIs with the same base, otherwise an
             error.semantic event is thrown."

Rationale: it may be difficult for VXML authors to develop VXML applications
in which the base URI is different for the root than for the leaf. This is
because links defined in the application are active while the leaf is
active. Therefore, relative URIs in root-defined links would probably fail if
activated while the leaf is active. Consequently, not enforcing consistent
base URIs prevents such root documents from using relative URIs, since their leaf
documents might override it.
Analysis:
VBWG: xml:base is by definition a document-oriented concept, not an
application-oriented concept.

Berche: Section "5.2 Event Handling", states that "Similarly, relative URL
references in a catch element are resolved against the active document and not
relative to the document in which they were declared." In addition, section
"5.2.4 Catch Element Selection" states that "Form an ordered list of catches
consisting of all catches in the current scope and all enclosing scopes (form
item, form, document, application root document, interpreter context), ordered
first by scope (starting with the current scope), and then within each scope
by document order." Consequently, a catch element in a root application with a
relative URI may be resolved from the xml:base of a leaf document. If no
consistency is enforced between the application and the document whereas logic
is shared among the two (such as catch elements, or links), this reduces the
benefits of shared logic provided by application documents.

Resolution: Rejected

We are struggling to understand your point. We believe that the use of xml:base is clear in the specification. A root document and a leaf document can have different xml:bases by assigning different values to their attributes. When relative URIs are evaluated, they are evaluated together with the xml:base value of the document which contains the active dialog. For example, take a <link> defined in a root document. If the active dialog is in the leaf document, then the leaf's xml:base would be used; if the active dialog is in the root document, then the root's xml:base would be used. This seems to us consistent and coherent.
Email Trail:

Issue R519-14

From Guillaume Berche

Preventing use of caches for submit requests

I believe that the following sentence in section "5.3.8 SUBMIT" is dangerous
and not consistent with the described intent of the submit element.

"Note that although the URI is always fetched and the resulting document is
transitioned to, some <submit> requests can be satisfied by intermediate
caches. This might happen if the method is "get", the namelist is empty, there
is no query string in the URI, and the application and the origin web server
both allowed the document to be cached."

The submit element is designed to expressly communicate with the origin server
as stated in section "5.3.8 SUBMIT": "The <submit> element is used to
submit information to the origin web server and then transition to the
document sent back in the response."

I therefore believe that <submit> elements should never be satisfied by
intermediate caches. I don't see any real-life case in which this would be
desirable. This would lead to situations where the cached page would be
transitioned to while the URI fetching might fail later on.

A submit request with a get method, an empty namelist, and no query string in
the URI should logically be replaced by a goto. A cached submit request
might also be dangerous for VXML browsers supporting persistent HTTP headers
such as cookies, because the remote server may expect to maintain session
information and may rely on receiving the submit request even if it always
provides the same result page back.

Suggested text modification to section "5.3.8 SUBMIT": "Note that the URI is
always fetched, the resulting document is transitioned to, and no <submit>
requests should ever be satisfied by intermediate caches. VXML authors wishing
such behavior should use a goto element instead."

Resolution: Accepted

We have applied a similar clarification to the wording in this section.
Email Trail:

2.2 Technical Errors

Issue R477-1

From Guillaume Berche

In Section 4.1.8, it seems incorrect to state that "While in the transitioning
state various prompts are queued, [...] by the <prompt> element in field
items" since the queuing of prompt elements in field items is part of the FIA
collect phase (Appendix C), which itself is part of the waiting phase ("the
waiting state is entered in the collect phase of a field item").

Prefered suggested fix: modify comments in Appendix C so that there is a
"prepare" phase in which prompts are queued and grammars are activated. The
"Collect" phase would then only start after the comment "// Execute the form
item."

Then modify section 4.1.8 to the following:

"The waiting and transitioning states are related to the phases of the Form
Interpretation Algorithm as follows:
- the waiting state is entered in the collect phase of an input item, and
- the transitioning state encompasses the process, select and
**prepare** phases"

I believe this additional FIA phase makes the definition of the waiting and
transitioning states clearer.

Alternative fix: modify section 4.1.8 to the following:
The waiting and transitioning states are related to the phases of the Form
Interpretation Algorithm as follows:
- the waiting state is entered in  the collect phase of an input item **at the
point at which the interpreter waits for input**
- the transitioning state encompasses the process and select phases, the
collect phase for control items (such as <block>s), and the collect phase
for input items up until the point at which the interpreter waits for input.

Resolution: Rejected

The queueing of prompts is part of the collect phase of the FIA, but the collect phase is part of BOTH the waiting state and the transition state, per the description in 4.1.8. However, we have clarified in section 4.1.8 of [5] the relationship between entering the waiting state and the phases of the FIA ("the waiting state is eventually entered in the collect phase of an input item (at the point at which the interpreter waits for input)").
Email Trail:

Issue R505-1

From Ray Whitmer

VoiceXML Events as DOM Events

Section 5.2 on event handling claims that "An interpreter may implement
VoiceXML event handling using a DOM 2 event processor".  It is difficult to
see how this is true, and the following sub-issues are examples of why this is
not true.

Resolution: Accepted

We originally believed that a modified DOM2 event processor could implement the VoiceXML event model. However, since it is a 'modified' processor - in order to handle your points [2] and [3] - it is not strictly a DOM2 processor. Hence, in the candidate recommendation version, all references to the DOM2 event model will be removed.
Email Trail:

Issue R505-2

From Ray Whitmer

Handler Order

Later in the document, section 5.2.4 states that the event delivery algorithm
is described as a constrained version of XML Events and DOM 2 event
processing, where the catch events are explicitly ordered by document
order. This makes it impossible to implement VoiceXML event handling using a
normal DOM 2 event processor in any reasonable fashion.

Resolution: Accepted

In the candidate recommendation version, this statement will be removed.
Email Trail:

Issue R505-3

From Ray Whitmer

Canceling on Current Level

Also, section 5.2.4 states that an event handler which handles an event stops
propagation of the event, and implies that other event handlers declared on
the same element will not be called.  While DOM event handling has the ability
to cancel handlers declared on ancestor nodes, all handlers will always still
be called on a single node if any handlers are called on that node regardless
of cancelling that occurs during delivery.

Resolution: Accepted

In the candidate recommendation version, this statement will be removed.
Email Trail:

2.3 Requests for Change to Existing Features

Issue R469-2

From Deborah Dahl

NLSML
It would be useful to understand how NLSML formatted results can be used to
populate VoiceXML field items.  The VoiceXML specification includes a
comprehensive discussion of mapping ASR results in the form of ECMAScript
objects to VoiceXML forms, but says very little about NLSML format.
Priority:  Medium High

Resolution: Rejected

Again, this has been deferred until the next version of VoiceXML. NLSML is not mature as a specification and is currently changing into EMMA under the auspices of the MMWG. When mature, the specification may be reconsidered for the next version of VoiceXML.
Email Trail:

Issue R470-1

From Stefan Hamerich

<subdialog> in mixed initiative dialogue: as written in the last draft and
as well in the DTD <subdialog> is only allowed as child element of
<form>.  Why can't <subdialog> be allowed as child element of <field>
resp. <filled>?  This would be useful for getting, e.g., confirmation for given
values and would allow the processing of different values separately.  At the
moment some more work has to be done to provide this ability with the given
possibilities. We would appreciate it if you would at least consider widening
the group of allowed parent elements of <subdialog>.

Resolution: Rejected

<subdialog> involves collecting user input, and that is not part of executable content (such as <filled>) according to the FIA. As you point out, there are workarounds already available in VoiceXML for confirmation and for processing different values separately. However, this issue may be addressed in the next version of VoiceXML, where one tentative requirement is that the FIA be more flexible and extensible.
Email Trail:

Issue R470-3

From Stefan Hamerich

filled fields in mixed-initiative dialogues: in mixed-initiative dialogues,
which use one grammar for several fields, values which have been set correctly
could be simply overwritten by new utterances from the user.  Sometimes this
behaviour is desired and good to have, but there are situations where we would
wish to deactivate fields, which were filled correctly. Is there any work done
in this field?  At the moment we solve this by adding an extra variable for
each field. But maybe there is a more elegant solution available?

Resolution: Rejected

When to correctly override variables is an application issue. There is a workaround: copy variables into a separate space as soon as they are instantiated, which avoids them being overwritten. The issue may be revisited in the next version of VoiceXML, when we have the opportunity to provide a better separation of presentation from data structure in VoiceXML forms (e.g. XForms) and to provide more detailed control of variable filling.
Email Trail:

Issue R472-1

From Bogdan Blaszczak

Additional control over a start position, speed and volume of audio playback
would be a useful feature in some applications.

Section 6.3.1 has an example of a volume control provided as a
platform-specific property. However, it also correctly states that
"platform-specific properties introduce incompatibilities".

Resolution: Accepted

In section 6.3.1 we have clarified conformance behavior when an interpreter encounters properties it cannot process: it must (rather than should) not throw an error.unsupported.property event and must (rather than should) ignore the property.
Email Trail:

Issue R478-1

From Al Gilman

Text equivalents for all recorded-speech prompts should be required as a
validity condition of the format.  These make the difference between a dialog
where access by text telephone, for example, is readily achievable and where
it quite difficult to achieve.

Rationale:

Text telephones are widely used by people who are Deaf or Hard of Hearing to
access the services that others access by voice telephony.

The dialog design of a voice dialog would work in a text-telephone delivery
context, so long as the dialog elements are available as text.
Analysis:
A number of meetings between the VBWG and PFWG were held to mutually clarify
the different perspectives and assumptions and to clarify the requirements
(http://www.w3.org/2002/09/04-pf-irc). The following proposals were
discussed:

1. VoiceXML 2.0 provides a mechanism for 'text equivalent' of an audio file.

VBWG Response: See response to (3) below.

2. The text equivalence could be expressed as:
(a) a tag such as 'alt' (comparable with HTML approach)
(b) content of the <audio> element itself

PFWG prefer (b) since it provides more flexibility in terms of content (Ruby
was mentioned as one method by which the text equivalence could be expressed;
this isn't compatible with expressing it as an attribute).

VBWG Response: See response to (3) below.


3. If the text equivalent is expressed as <audio> content, it could be
achieved by:

(a) using a separate, dedicated element: e.g. <audio><alt>access
content</alt>normal fallback content</audio>

(b) the content of the <audio> itself (i.e. accessibility content and
fallback text are identical): <audio> accessibility/fallback content
</audio>

PFWG prefer (b) on the grounds that they do not see any justification for a
separation of accessibility content from fallback content. What is our
justification for this?

VBWG Response: There was some agreement that 3. (b) was acceptable. However,
two concerns were raised: (i) there may be differences between fallback and
text equivalence, especially where the fallback is an error message
(i.e. 'cannot find file') which is not a text equivalent, and (ii) it is
unclear whether any such change is feasible -- if it will have no impact on
the current VoiceXML user agent implementations, how will a Voice Browser know
about the end user's needs, how will it detect special end user devices? That
is, if we accept 3. (b) it is unclear precisely what needs to change in the
specification without a substantial investigation into the whole issue of
Accessibility and Voice Browser User Agents. This investigation is already
being planned by members of VBWG and PFWG, and will have an impact on the next
version of VoiceXML.

4. PFWG prefer that the text equivalent in <audio> be obligatory: e.g. a
conformant VoiceXML document must contain a text equivalent.

VBWG Response: Rejected (tentative, discussion was incomplete) 

There were specific concerns about whether mandatory text equivalence is
possible in (i) cases where audio is dynamically generated (e.g. recording a
voice mail message), and (ii) cases where there are differences between spoken
and written languages. There were also some concerns as to whether this was
desirable for accessibility at all, especially where the audio was stylistic
rather than contentful. Furthermore, it is unclear whether this could be
enforced, and, as with (6), whether it was appropriate to use a technology
specification to enforce this type of policy (Voice Browser Guidelines for
best practice in writing accessible applications seemed more appropriate).

Resolution: Accepted

The VoiceXML specification uses the <audio> element provided by SSML 1.0. In the CR version of that specification, a <desc> element is introduced as a child of <audio> to provide a description of non-speech audio. When the audio source of <audio> is not available, or if the processor can detect that text-only output is required, the content of <audio> is rendered including, where appropriate, a <desc> element.
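As a sketch, the SSML 1.0 mechanism looks like this (the audio file name is hypothetical):

```xml
<prompt>
  <audio src="fanfare.wav">
    <!-- Fallback content, rendered when fanfare.wav is unavailable or
         when the processor requires text-only output. -->
    Welcome to the service.
    <!-- Description of the non-speech audio itself. -->
    <desc>Trumpet fanfare</desc>
  </audio>
</prompt>
```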
Email Trail:

Issue R496-1

From Guillaume Berche

Problem with section "1.5.4 Final Processing"

This section states that "While in the final processing state the application
must remain in the transitioning state and may not enter the waiting state (as
described in Section 4.1.8). Thus for example the application should not enter
<field>, <record>, or <transfer> while in the final processing
state. The VoiceXML interpreter must exit if the VoiceXML application attempts
to enter the waiting state while in the final processing state. "

While section "4.1.8 Prompt Queueing and Input Collection" states "Similarly,
asynchronously generated events not related directly to execution of the
transition should also be buffered until the waiting state
(e.g. connection.disconnect.hangup). " However, since a single event triggers
a transition to the transitioning state, those two descriptions
conflict. Imagine the following situation: a remote user sends a series
of DTMF tones and then hangs up. Since the events would be delivered in
sequence, that input would normally trigger a transition to another field,
which then requests input collection. As currently described in section "1.5.4 Final
Processing", this would result in the interpreter exiting, without letting the
application catch the connection.disconnect.hangup event.

Suggested modification to section "1.5.4 Final Processing":

The final processing state is entered when the connection.disconnect.hangup
event is handed to the application. As described in section "4.1.8 Prompt
Queueing and Input Collection", the remote user may be disconnected and DTMF
may be provided from a previous buffer before the application receives the
connection.disconnect.hangup event. During the period of time in which the
remote user is disconnected and final processing state is not yet entered, the
application may queue prompts and request input as for normal processing. The
buffered input will be used and compared against the requested input; only DTMF
grammars can be matched, and timeouts would be shortened.

While in the final processing state the application must remain in the
transitioning state and may not enter the waiting state (as described in
Section 4.1.8). Thus for example the application should not enter <field>,
<record>, or <transfer> while in the final processing state (i.e while
handling the connection.disconnect.hangup event). However, the <submit> tag
is legal. The VoiceXML interpreter must exit if the VoiceXML application
attempts to enter the waiting state while in the final processing state.

Resolution: Rejected

We believe there is some confusion here. The final processing state doesn't occur until the disconnect event occurs, so the problem you have identified should not happen.
Email Trail:

Issue R496-2

From Guillaume Berche

Modify section "5.3.11 DISCONNECT"
Section "5.3.11 DISCONNECT" states that "Causes the interpreter context to
disconnect from the user. As a result, the interpreter context will throw a
connection.disconnect.hangup event, which may be caught to do cleanup
processing, e.g."

I believe it is not a good idea to throw an event in this case, because a
catch clause would not be able to differentiate between a real user hang-up and
some logic in the application that requested a disconnection. The suggested
cleanup phase can easily be done by the application by throwing a custom event,
performing the necessary clean-up in the catch clause, and then using the
<disconnect> element.

Suggested text modification to section "5.3.11 DISCONNECT": "As a result, the
interpreter context will disconnect the remote user and exit the
interpreter. Note that applications that wish to perform tasks
upon disconnection (such as clean-up) may instead throw a custom event and, in
the catch clause, perform the necessary processing prior to invoking the
<disconnect> element."

Resolution: Rejected

The application can always tell the difference between a 'real hangup' and an application generated one, since the developer can always use scripting to indicate that it is application-generated (e.g. set a variable).
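The scripting approach described in the resolution can be sketched as follows (the variable name is hypothetical):

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Flag distinguishing an application-initiated disconnect from a
       real user hang-up. -->
  <var name="appDisconnect" expr="false"/>
  <catch event="connection.disconnect.hangup">
    <if cond="appDisconnect">
      <log>Disconnect was requested by the application.</log>
    <else/>
      <log>The user really hung up.</log>
    </if>
    <exit/>
  </catch>
  <form>
    <block>
      <!-- Set the flag before disconnecting, so the catch handler
           can tell the two cases apart. -->
      <assign name="appDisconnect" expr="true"/>
      <disconnect/>
    </block>
  </form>
</vxml>
```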
Email Trail:

Issue R505-4

From Ray Whitmer

We expect VoiceXML to be combined with other markup such as XHTML, SVG, SSML,
etc. when defining multimodal presentations.  In such cases, ECMAScript
throughout the document should be consistent and interoperable.  In this case,
we would expect content authors to call functions in the global scope throughout
the document, access all parts of the document through the DOM, register event
handlers, etc.

The intertwining of ECMAScript scopes and VoiceXML-based declaration of
variables visible to ECMAScript, as described in section 5.1, is
unusual. Ignoring implementation issues, it seems like it could cause usage
problems. For example, if a script uses DOM to add an event handler, how does
the event handler script get access to the field values it needs to get or set
to respond to the event?  If a script tries to access or modify a field value
through DOM, how does that relate to the in-scope variable?

Resolution: Accepted

We don't expect these problems to arise in VoiceXML 2 since it was never designed for embedding in other execution containers. We are aware that VoiceXML needs to be aligned with W3C best practices in terms of document model, event model, and so on, but doing so in VoiceXML 2 would be too fundamental a change this late in the process. In the next version of the language, which is intended for embedding in other environments, we are committed to addressing these model issues at a fundamental level, and look forward to receiving requirements from, and working on these issues with, the DOM WG in the future.
Email Trail:

2.4 Requests for New Features

Issue R469-1

From Deborah Dahl

VoiceXML Modularization
Modularization of VoiceXML would separate VoiceXML constructs into separate
modules.  This would allow the constructs to be used in a multimodal language
as components that can be embedded in multimodal documents.  
Priority: High

Resolution: Rejected

This issue has been deferred until the next version of VoiceXML. Attempting to introduce it at this stage is problematic since it requires bringing VoiceXML into line with XHTML modularization principles (e.g. no tag should have non-local effects, such as the determination of active grammars), and this may require a fundamental restructuring of parts of the VoiceXML language. For the next version, the VBWG will take this and other MMWG requirements into account from the beginning of the specification process. We encourage the MMWG to become actively involved in the process once it is initiated.
Email Trail:

Issue R469-3

From Deborah Dahl

XML Events
A modularized VoiceXML should support XML Events.  VoiceXML components
embedded in multimodal XML documents would share a multimodal document's DOM
and DOM events.
Priority:  High

Resolution: Rejected

Again this has been deferred until the next version of VoiceXML, where integration with event models, such as DOM Events and XML Events, can be addressed at a fundamental level. Feedback from the DOM WG has indicated that the current VoiceXML event model is not compatible with the current DOM event model.
Email Trail:

Issue R470-2

From Stefan Hamerich

Recognize from file: the <record> element allows the recording of spoken
utterances. With <audio> the resulting files can be played back to the user.
But there is no way to supply an audio file in place of live spoken input
and run recognition on that file for further processing. This could be
especially interesting for off-line processing of dialogues.

Resolution: Rejected

The use case is not fundamental to VoiceXML 2.0, which focuses on real-time interaction with a user. There is a workaround: user input can be recorded and then analysed by an external ASR web service. This is really a batch use case (also applicable to speaker verification, multiple ASR passes, messaging, etc.) which may be considered for the next version of VoiceXML.
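The workaround can be sketched as follows (the recognizer URL is hypothetical):

```xml
<form id="capture">
  <record name="utterance" beep="true" maxtime="20s">
    <prompt>Please speak after the beep.</prompt>
    <filled>
      <!-- Hand the recording to an external ASR web service for
           off-line recognition. -->
      <submit next="http://example.com/asr" method="post"
              enctype="multipart/form-data" namelist="utterance"/>
    </filled>
  </record>
</form>
```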
Email Trail:

Issue R470-4

From Stefan Hamerich

VoiceXML for embedded applications:
VoiceXML is mainly suited to telephony applications. But for
embedded applications it takes too much space because of all the
needed components, such as an HTTP server, an interpreter for CGI
scripts, and the VoiceXML interpreter itself. Is there any work done
in the Voice Browser Group of the W3C in this field?

Resolution: Accepted

There are a wide variety of embedded devices, and VoiceXML interpreters have already been used on some of them, depending on their available resources. Putting the interpreter, media resources and the application all on the device is more problematic (although this is clearly possible on some PDA devices today). We may address modularization of VoiceXML and device profiling in the next version of VoiceXML, and this should facilitate running smaller interpreter profiles.
Email Trail: