VoiceXML 2.1 Candidate Recommendation Disposition of Comments

This document details the responses made by the Voice Browser Working Group to issues raised during the Last Call (beginning 28 July 2004 and ending 1 September 2004) review of Voice Extensible Markup Language (VoiceXML) Version 2.1. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

1. Introduction

This document describes the disposition of comments in relation to Voice Extensible Markup Language (VoiceXML) Version 2.1 (http://www.w3.org/TR/2004/WD-voicexml21-20040728/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.

The full set of issues raised for Voice Extensible Markup Language (VoiceXML) Version 2.1 since July 2004, their resolutions, and in most cases the reasoning behind each resolution are available from http://www.w3.org/Voice/Group/2005/voicexml21-cr.html [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review. It includes issues that were submitted outside the official review period, up to 13 January 2005.

Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.

2. Summary

Item	Commentator	Nature	Disposition
R62-1	Teemu Tingander	Change to Existing Feature	Unknown
R63-1	Teemu Tingander	Change to Existing Feature	Unknown
R64-1	Teemu Tingander	Change to Existing Feature	Unknown
R65-1	Teemu Tingander	Clarification / Typo / Editorial	Unknown
R65-2	Teemu Tingander	Change to Existing Feature	Unknown
R66-1	Robert Keiller	Feature Request	Unknown
R67-1	Robert Keiller	Change to Existing Feature	Unknown
R104-1	Ken Waln	Change to Existing Feature	Accepted
R104-2	Ken Waln	Clarification / Typo / Editorial	Accepted
R104-3	Ken Waln	Clarification / Typo / Editorial	Accepted
R104-4	Ken Waln	Clarification / Typo / Editorial	Accepted
R104-5	Ken Waln	Clarification / Typo / Editorial	Accepted
R104-6	Ken Waln	Change to Existing Feature	Accepted
R104-7	Ken Waln	Feature Request	Accepted
R84-1	Dan Connolly	Clarification / Typo / Editorial	Unknown
R85-1	Dan Connolly	Change to Existing Feature	Accepted
R106-1	Janina Sajka	Clarification / Typo / Editorial	Accepted
R103-1	Tobias Gobel	Clarification / Typo / Editorial	Accepted
R86-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R87-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R88-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R89-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R90-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R91-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R92-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R93-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R94-1	Dominique Hazael-Massieux	Clarification / Typo / Editorial	Accepted
R105-1	James Wilson	Clarification / Typo / Editorial	Unknown
R107-1	Martin Duerst	Clarification / Typo / Editorial	Accepted
R107-2	Martin Duerst	Clarification / Typo / Editorial	Accepted
R107-3	Martin Duerst	Clarification / Typo / Editorial	Accepted
R107-4	Martin Duerst	Change to Existing Feature	Accepted

2.1 Clarifications, Typographical, and Other Editorial

Issue R65-1

From Teemu Tingander:

As a general comment for <data> elements DOM mapping; I don't see why we should add more complex programming capabilities into voicexml and once again make it possible to move the complex application logic into UI side !

Resolution: rejected

The Voice Browser working group performed a rigorous study of commonly implemented features and their associated use cases. Three clear needs arose from that study:

The ability to perform a non-transitional HTTP fetch
The ability to access data from back-end data sources
The ability to manipulate that data using a standard object model

Several existing implementations show that exposing XML data through the W3C DOM is a natural solution to these problems.
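
As an illustration of the three needs listed above, the following minimal sketch fetches an XML document with <data> and reads one value through the DOM. The URL, the variable name quote, and the price element are assumptions made for this example; they do not come from the comment or from the Working Group's response.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="stock">
    <block>
      <!-- Fetch XML without a dialog transition; the URL is hypothetical -->
      <data name="quote" src="http://www.example.com/quote.xml"/>
      <!-- Read the text of the first (assumed) price element via the DOM -->
      <var name="price"
           expr="quote.documentElement.getElementsByTagName('price').item(0).firstChild.data"/>
      <prompt>The current price is <value expr="price"/>.</prompt>
    </block>
  </form>
</vxml>
```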

Email Trail:

Original Comment Teemu Tingander
Receipt Confirmation VBWG (2004-03-31)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R104-2

From Ken Waln:

In section 3, I do not see the value of this construct. The example shows it being used to pass a parameter to the script being included, but since the script functions by definition accept parameters, why pass in a parameter to load different scripts? In general this smacks of self-modifying code a little bit. If you are going to call functions in a script file, the function definitions should be included statically. Maybe another example could convince me. The best example I can come up with would be a set of language specific includes, but I think that can be handled better in other ways as well.

Resolution: accepted

The srcexpr attribute on script achieves parity with other tags, including audio, goto, submit, subdialog, and choice (and now grammar and data), that support the dynamic resolution of the URI that they are to fetch. For all these tags, a developer can use the dynamic URI attribute to configure the host from which the resource is fetched: for example, a development server during the development phase and a production server during the deployment phase. The script srcexpr example has been updated in the forthcoming draft to illustrate this use case.
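
The host-switching use case can be sketched as follows. This is not the example from the specification draft; the variable name scriptHost, the host name, and the script path are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical host; point this at the production server on deployment -->
  <var name="scriptHost" expr="'http://dev.example.com'"/>
  <!-- srcexpr is evaluated to produce the URI of the script to fetch -->
  <script srcexpr="scriptHost + '/scripts/helpers.js'"/>
  <form>
    <block>
      <prompt>Scripts loaded.</prompt>
    </block>
  </form>
</vxml>
```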

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)

Issue R104-3

From Ken Waln:

I agree with comments that the data element seems to encourage designing far too much of an application as client-side script instead of using an n-tier model. I would prefer it not be added. At a minimum it should be optional as it should not be encouraged as an appropriate design pattern.

Resolution: rejected

The data element allows a clean separation of dynamic data from static presentation markup. A benefit of this approach is the ability for multiple applications targeted at one or more modes of interaction to consume data produced via a well-defined URL API. Another benefit is the ability for browsers to cache the presentation markup. Having said this, from the VoiceXML 2.1 specification: "If an implementation does not support DOM, the name attribute must not be set, and any retrieved content must be ignored by the interpreter." This implies that the data element can be used for its non-transitional "send" capabilities (HTTP GET or POST).
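
A minimal sketch of the "send"-only usage, assuming a hypothetical logging URL and field name: because the name attribute is omitted, no DOM is exposed to the application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <field name="rating" type="digits">
      <prompt>Please rate this call from one to five.</prompt>
      <filled>
        <!-- No name attribute: the response is not bound to a variable -->
        <data src="http://www.example.com/log" namelist="rating" method="post"/>
        <prompt>Thank you.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```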

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)
VBWG response addressing optionality of data VBWG (2005-03-01)
Final Disposition for 2.1 Ken Waln (2005-03-06)

Issue R104-4

From Ken Waln:

If the data element is needed, I think the VBWG should avoid defining its own data access protocol like this. The current design encourages too much interdependency between the VoiceXML document and the XML service. Defining a new protocol like this opens up lots of work to be done in the areas of versioning, security, etc. I recommend replacing it with a mechanism to call SOAP web service (if it is left in at all) with clearly specified parameters and return values.

Resolution: rejected

The purpose of the data security model is to assert that browsers should support content sandboxing. In particular, because the data element is effectively a "file open" command for arbitrary XML content at any URL accessible to the browser, it represents a potential security risk.
Without this security model, a browser running inside a corporate firewall would permit an application running on that browser to access internal corporate documents and to potentially submit that data back to another web server.
The working group evaluated a number of solutions that would enable data providers to secure their data against unauthorized access. The working group decided to keep the description of the access-control PI in the spec to inform browser implementors about one possible mechanism for enforcing security on the data.
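
For orientation, a data provider's XML document carrying such a processing instruction might look like the sketch below. The allow attribute and the host pattern are assumptions about the mechanism's syntax and should be checked against the specification; the quote and price elements are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed syntax: allow names the hosts whose applications may read this data -->
<?access-control allow="*.example.com"?>
<quote>
  <price>30.00</price>
</quote>
```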

Email Trail:

Original Comment Ken Waln
In defense of the data security model Brad Porter (2004-10-20) (Internal)
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)
VBWG response addressing optionality of data VBWG (2005-03-01)
Final Disposition for 2.1 Ken Waln (2005-03-06)

Issue R104-5

From Ken Waln:

The access control on the data element does not seem to be very secure. It seems to assume the browser is a trusted entity (since the credentials are fetched along with any sensitive data, the browser already has the sensitive data). I suppose it is trying to protect against malicious VoiceXML in a hosted environment, but that is only one deployment option for a VoiceXML browser. I think any security needs to be removed from the XML and moved into lower levels of the protocol. Perhaps supplying credentials for a web server level validation is enough.

Resolution: rejected

The security model is not designed to enforce a trust relationship between the server and the browser. The server-to-browser trust relationship should be enforced with existing technologies such as SSL certificates, XML-SIG or XML-ENC, trusted VPNs, or exclusive network access.
While the hosted environment is only one deployment option, Web standards should be designed to support the most conservative security environment. In particular, malicious VoiceXML content should not have arbitrary access to any network-accessible XML resource.
Web servers supplying credentials is insufficient because the trust relationship is from application-to-application, not browser-to-server. The browser is the agent entrusted to preserve application-to-application sandboxing.
The working group evaluated a number of solutions that would enable data providers to secure their data against unauthorized access. The working group decided to keep the description of the access-control PI in the spec to inform browser implementors about one possible mechanism for enforcing security on the data.

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)
VBWG response addressing optionality of data VBWG (2005-03-01)
Final Disposition for 2.1 Ken Waln (2005-03-06)

Issue R84-1

From Dan Connolly:

The only reference I can find to HTML4 in the text is "the <script> element allows the specification of a block of client-side scripting language code, and is analogous to the [HTML4] <SCRIPT> element." -- http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-script_expr
That looks informative, to me.
Why is HTML4 listed among the normative references?

Resolution: accepted

The reference to HTML4 was moved to the informative reference section.

Email Trail:

Original Comment Dan Connolly
Acknowledgement of oversight VBWG (2004-07-28)

Issue R106-1

From Janina Sajka:

On behalf of the Protocols and Formats Working Group (WAI) We are concerned that the security provisions specified in Appendix E, "Securing access to <data>" would negatively impact accessibility.
It is reasonable to believe that various agencies and service organizations might create specialized scripts to better meet the interface needs of certain populations of persons with disabilities who cannot directly use a voice-based service without special accomodation. Indeed, we believe such enhanced interfaces could provide access to information and services were it does not exist today. Protecting this opportunity is important.
The mechanism outlined in Appendix E, however, tends to limit access to organizations known to the organization hosting the VoiceXML application. Agencies serving persons with disabilities, however, are likely to be unknown and of lesser commercial impact. It is likely, therefore, that agencies serving persons with disabilities would find it dificult to be listed.
Furthermore, the mechanism specified in Appendix E would require agencies serving persons with disabilities to seek listing with every VoiceXML application host individually. This is burdensome and likely to result in spotty accessibility support at best.
We would suggest the security control provisions be reconsidered to provide for a authenticated access vouched and certified by a third-party trust broker. While such services may not be commonplace today, we believe numerous use case scenarios exist for such services--beyond the current instance.

Resolution: accepted

The security mechanism described in Appendix E was made informative.

Email Trail:

Original Comment Janina Sajka
Acknowledgement and request for clarification VBWG (2004-10-20)
VBWG Official Response VBWG (2005-03-08)
WAI Telecon Minutes re: data security WAI (2005-04-06) (Internal)
Acceptance of response Al Gilman (2005-04-06)

Issue R103-1

From Tobias Gobel:

having studied the WD for VXML 2.1 a bit, I came across a problem concerning the new <mark> element support.
As for the attribute "marktime", the spec says:
"The number of milliseconds that elapsed since the last mark was executed by the SSML processor until barge-in occurred or the end of audio playback occurred. If no mark was executed, this variable is undefined."
Does this mean that if the caller does no barge-in, marktime is the same as the duration of the prompt itself ("...until the end of audio playback occurred")? I wonder why, since this would not allow to check how long it took the caller to react to a prompt. What I want, e.g. in order to be able to draw conclusions from reaction times to caller status (first caller, power user etc.), is to know the time that elapsed since the end of the last prompt (until "timeout" elapses or caller says something).
This could be achieved either by not stopping the timer after the end of audio playback, or by allowing to set an additional marker at the very end of a prompt and check markname and marktime of *this* marker later on (e.g. in the <filled> section). If the latter IS possible, you might want to adapt the spec slightly, pointing to this possibility.

Resolution: deferred

The driving motivation behind this VoiceXML 2.1 feature is to detect when the user barges in on a prompt. As implied by the VoiceXML 2.1 specification, if the caller does not barge in and the only mark is at the beginning of the prompt queue, then the marktime reflects the duration of the prompt. Although there is no feature in VoiceXML 2.0 or VoiceXML 2.1 that allows you to obtain the reaction time of the caller, you can approximate it by setting the timeout property to zero seconds and adding a silent audio file (e.g. silence.wav) to the end of the prompt queue, with a duration equal to the desired timeout. The working group has opted to defer this feature request to a future version of VoiceXML.
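
The workaround can be sketched as follows. The mark named prompt_end is an assumption added for this illustration, combining the workaround with the commenter's suggestion of a mark at the end of the prompt; the field name and the relative path to silence.wav are likewise hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <!-- Zero timeout: noinput is thrown as soon as the prompt queue finishes -->
    <property name="timeout" value="0s"/>
    <field name="answer" type="boolean">
      <prompt>
        Do you want to continue?
        <!-- Assumed addition: a mark placed where the spoken prompt ends -->
        <mark name="prompt_end"/>
        <!-- silence.wav is assumed to last as long as the desired timeout -->
        <audio src="silence.wav"/>
      </prompt>
      <filled>
        <!-- If the mark was reached, marktime counts from the mark to barge-in -->
        <if cond="application.lastresult$.markname == 'prompt_end'">
          <log>Reaction time in ms: <value expr="application.lastresult$.marktime"/></log>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

With this arrangement, speech during the silent audio counts as barge-in, so the marktime value approximates the caller's reaction time after the spoken prompt.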

Email Trail:

Original Comment Tobias Gobel
Internal Discussion VBWG (2005-02-03) (Internal)
VBWG official response to last call issue VBWG (2005-02-04)
Final Disposition for 2.1 Tobias Gobel (2005-02-08)

Issue R86-1

From Dominique Hazael-Massieux:

the conformance section of the document uses terms like 'may', 'must', 'recommended', etc, but without reference to RFC 2119 nor is there any definition of how these should be interpreted; is that on purpose?

Resolution: accepted

Definitions for these terms and a reference to RFC 2119 were added to the 'Status of this Document' section of the 2.1 specification.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Acceptance of disposition Dominique Hazael-Massieux (2005-03-09)

Issue R87-1

From Dominique Hazael-Massieux:

the conformance labels (VoiceXML document, VoiceXML processor) don't make references to the version of VoiceXML; is that intended?

Resolution: accepted

The version ('2.1') was added to the conformance labels.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Acceptance of disposition Dominique Hazael-Massieux (2005-03-09)

Issue R88-1

From Dominique Hazael-Massieux:

it's not obvious from reading voicexml2.0 (nor voicexml2.1) what a voicexml processor should do with a VXML document with a version that it doesn't know; if it should throw an error, I wonder how this relates to the claim that VoiceXML2.1 is backwards compatible with VoiceXML2.0

Resolution: accepted

The following text was added to Appendix C.3: 'When a Conforming VoiceXML 2.1 Processor encounters a non-Conforming VoiceXML 2.0 or 2.1 document, its behavior is undefined.'

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Acceptance of disposition Dominique Hazael-Massieux (2005-03-09)

Issue R89-1

From Dominique Hazael-Massieux:

it's not clear which sections are normative and which are simply informative

Resolution: accepted

The sections of the document in the main body are normative unless otherwise specified. For example, in section 9, "Adding type to <transfer>", we explicitly state: "As specified in 2.3.7 of [VXML2], the <transfer> element is optional, though platforms should support it. Platforms that support <transfer> may support any combination of bridge, blind, or consultation transfer types." Appendices are informative unless otherwise explicitly indicated. For example, Appendix B and Appendix C state: "This section is Normative." In Appendix F.1, the title is "Normative References". Text has been added to the status section of the document to explain this policy.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Requests explanatory text in intro Dominique Hazael-Massieux (2005-03-09)

Issue R90-1

From Dominique Hazael-Massieux:

the notion of XML well-formed document is bound to XML 1.0 in the spec; is there any discussion on accepting also XML 1.1?

Resolution: deferred

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-09)
Acceptance of disposition Dominique Hazael-Massieux (2005-03-09)

Issue R91-1

From Dominique Hazael-Massieux:

the references to XML 1.0 are outdated (latest version is from February 2004)

Resolution: accepted

The reference to XML 1.0 was updated to point to the third edition, published in February 2004.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Requests explanatory text in intro Dominique Hazael-Massieux (2005-03-09)

Issue R92-1

From Dominique Hazael-Massieux:

this may be planned for an more advanced draft, but having a table with all the elements and attributes defined by VoiceXML 2.1 would be great (like in HTML 4.01 [3])

Resolution: accepted

A table of elements was added to the introduction (1.1).

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Requests explanatory text in intro Dominique Hazael-Massieux (2005-03-09)

Issue R93-1

From Dominique Hazael-Massieux:

the example in section 9.3 is not well-formed (missing ending '>' in the root element) [this was found out by extracting the examples from the spec using an XSLT [4]; when the schema/dtd are published, it would be nice to re-use this trick to check that the examples and the formal languages are in sync]

Resolution: accepted

The example code was fixed.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Requests explanatory text in intro Dominique Hazael-Massieux (2005-03-09)

Issue R94-1

From Dominique Hazael-Massieux:

data_sec: is there any reason why this is done in a processing instruction? process instructions aren't very scalable, have an odd place in the XML infoset, among other things... It looks to me like this security mechanism would be better addressed in a different place altogether - e.g. it would be more scalable to have a way to link to a security policy, rather than (or in addition to?) embedding in the document itself.

Resolution: accepted

The use of a processing instruction to enforce security of the data is a lightweight mechanism that is easy for developers to understand and to implement. It is one of several legitimate mechanisms for enforcing security, and the group has decided to make its description in the spec informative.

Email Trail:

Original Comment Dominique Hazael-Massieux
Acknowledgement VBWG (2004-07-21)
VBWG Official Response VBWG (2005-03-08)
Requests explanatory text in intro Dominique Hazael-Massieux (2005-03-09)

Issue R105-1

From James Wilson:

I have a question. Please can you explain why the mechanism defined in "7. Recording User Utterances While Attempting Recognition" returns a binary waveform rather than a URL that points to the waveform? Is it to avoid firewall issues?
This is in the context of MRCP 2 where a mechanism is provided to save waveforms on a recognition by recognition basis (using the save_waveform parameter). The waveforms are saved on the rec server. The MRCP recognition result does not return the binary waveform to the browser, but a URL that points to it. This would appear to be more efficient.

Resolution: rejected

By using the term 'reference', the specification of the utterance recording feature in VoiceXML 2.1, like that of the record feature in section 2.3.6 of VoiceXML 2.0 [4], deliberately leaves the format of the recording variable's value and the physical location of the recording as implementation details of the platform. The only requirements are:

The browser must be able to play the recording to the user and,
The browser must be able to submit the binary recording to a document server as indicated in the following excerpt from [4]:

"Note that how this variable is implemented may vary between platforms (although all platforms must support its behaviour in <audio> and <submit> as described in this specification)."
These requirements help to enforce a separation between the client (the voice browser in this case) and the server, an architectural principle to which the group strongly believes all Web-based specifications should adhere.
[4] http://www.w3.org/TR/2004/REC-voicexml20-20040316/#dml2.3.6

Email Trail:

Original Comment James Wilson
Explanation 1 Dave Burke (2004-10-14)
Concerns about efficiency James Wilson (2004-10-14)
Explanation 2: Security Dave Burke (2004-10-14)
VBWG official response to last call issue VBWG (2005-03-03)
Acceptance of VBWG Resolution James Wilson (2005-03-13)

Issue R107-1

From Martin Duerst:

Abstract: "VoiceXML 2.1 specifies a set of features commonly implemented by Voice Extensible Markup Language platforms. This specification is designed to be fully backwards-compatible with VoiceXML 2.0 [VXML2]."
It is not clear to the reader quickly enough that this specification only describes a diff between VoiceXML 2.1 and VoiceXML 2.0. This should be made much clearer.

Resolution: accepted

A sentence was added to the abstract to indicate that the specification describes only the set of additional features. In addition, a table of the elements that were added or enhanced was added to the introduction.

Email Trail:

Original Comment Martin Duerst
VBWG Minutes VBWG (2004-10-28) (Internal)
VBWG Official Response VBWG (2004-03-11)
Acceptance of VBWG Resolution Martin Duerst (2004-03-12)

Issue R107-2

From Martin Duerst:

Appendix C: "A conforming VoiceXML document is a well-formed [XML] document that requires only the facilities described as mandatory in this specification and in [VXML2]."
Similar confusion as above. Either VoiceXML 2.1 is the diff, or it is the result of additions. But not both.

Resolution: rejected

While the VoiceXML 2.1 specification describes "the diff" - the small set of additional features that have been frequently requested and widely implemented - VoiceXML 2.1 is built on top of the foundation described in the VoiceXML 2.0 specification. A conforming VoiceXML 2.1 document is one that meets the requirements described in both the VoiceXML 2.0 and VoiceXML 2.1 specifications.

Email Trail:

Original Comment Martin Duerst
VBWG Minutes VBWG (2004-10-28) (Internal)
VBWG Official Response VBWG (2004-03-11)
Acceptance of VBWG Resolution Martin Duerst (2004-03-12)

Issue R107-3

From Martin Duerst:

Section 2, street example: In usual Web browsers, for internationalization reasons, usually 'address1', 'address2', are used. Is there such practice for Voice applications? If not, how are addresses in various locations around the world handled? It would be highly desirable if this example were fixed so that it could be used as good practice worldwide. Same for citystate.

Resolution: accepted

The example was modified to ask for a country, city, and street address.

Email Trail:

Original Comment Martin Duerst
VBWG Minutes VBWG (2004-10-28) (Internal)
VBWG Official Response VBWG (2004-03-11)
Clarifies desire for more "international" example Martin Duerst (2004-03-12)
VBWG Official Response VBWG (2004-03-24)

2.2 Technical Errors

None.

2.3 Requests for Change to Existing Features

Issue R62-1

From Teemu Tingander:

Chapter 2. Referencing Grammars Dynamically.
I propose the use of attribute srcexpr in <grammar> element. This will leave the expr attribute to be used to evaluate the "grammar" content from javascript content etc. Especially this is handy when data is introduced !

Resolution: accepted

In VoiceXML 2.1, the expr attribute was changed to srcexpr on <grammar>.
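
A minimal sketch of the renamed attribute in use, assuming a hypothetical region variable and grammar path:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical variable selecting which grammar file to load -->
  <var name="region" expr="'us'"/>
  <form id="address">
    <field name="city">
      <prompt>Which city?</prompt>
      <!-- srcexpr is evaluated to yield the grammar URI; the path is an assumption -->
      <grammar type="application/srgs+xml"
               srcexpr="'/grammars/cities_' + region + '.grxml'"/>
    </field>
  </form>
</vxml>
```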

Email Trail:

Original Comment Teemu Tingander
Receipt Confirmation VBWG (2004-03-31)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R63-1

From Teemu Tingander:

Chapter 3 Referencing Scripts Dynamically'
Once again I propose attribute srcexpr just to make difference between value for element and 'value that evaluaes to attribute value'..

Resolution: accepted

In VoiceXML 2.1, the expr attribute was changed to srcexpr on <script>.

Email Trail:

Original Comment Teemu Tingander
Receipt Confirmation VBWG (2004-03-31)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R64-1

From Teemu Tingander:

Chapter 3 Using <data> to Fetch XML Without Requiring a Dialog Transition
Once again I propose attribute srcexpr. Expr attribute could be used as it is in var. As data is clearly a some kind of extension of <var> element.

Resolution: accepted

In VoiceXML 2.1, the expr attribute was changed to srcexpr on <data>.

Email Trail:

Original Comment Teemu Tingander
Receipt Confirmation VBWG (2004-03-31)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R65-2

From Teemu Tingander:

Using DOM in <data> is far to complex. I suggest of finding some more simplified structure for returned data. We could use a simple pattern like..
<data name="temp" src....
and as returned:
<data>
  <property name="a" expr="1">
  <property name="b" expr="-1">
  <property name="c[0]" expr="'temp'">
  <property name="c[1]" expr="'tester'">
</data>
This could then be mapped into javascript
temp {
  a = 1;
  b = -1;
  c = { [0] = 'temp' [1] = 'tester' }
}
And so on.. its easy to use it in this way.. Somehow this could be made in VXML 2.0 with script element too.. Or even use that same mapping we use in SSML to field values

Resolution: rejected

The W3C DOM is the standard object model used for manipulating arbitrary XML data. DOM Level 1 was published as a full recommendation by the W3C in October 1998; DOM Level 2 in November 2000. The development community has broad implementation and usage experience with the DOM.

Email Trail:

Original Comment Teemu Tingander
Receipt Confirmation VBWG (2004-03-31)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R67-1

From Robert Keiller:

Teemu Tingander raises a good question about the naming of the expr attributes for <script> and <grammar>. (Logically the expr attributes on <audio>, <next> and <submit> should also be srcexpr. <subdialog> already uses srcexpr, but expr in this case is an asignment of the subdialog variable, not a definition of the subdialog fetch.) Even if there is no immediate intention to support dynamic grammars via <grammar expr="..."/> where expr evaluates to an actual grammar, it seems a mistake to close off that possibility in future.

Resolution: accepted

In VoiceXML 2.1, the expr attribute was changed to srcexpr on <data>, <grammar>, and <script>.

Email Trail:

Original Comment Robert Keiller
Receipt Confirmation VBWG (2004-04-20)
VBWG official response to last call issue VBWG (2004-02-28)

Issue R104-1

From Ken Waln:

In Section 2, I agree with the comments on this list that "srcexpr" is a better attribute name, both for consistency and in case someday it is desired to use an expression to be the content of the element rather than the source. I would not advocate adding the "expr" attribute in addition as I'd rather see a cleaner way of handling dynamic grammars than using script to put the entire grammar into a variable. How about allowing <value> in an inline XML grammar (although I realize this is more of an SRGS problem at that point or at least there would be an interaction)?

Resolution: accepted

In VoiceXML 2.1, the expr attribute was changed to srcexpr on <grammar>. The ability to define grammars dynamically in the voice browser has been deferred to a future version of VoiceXML.

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)

Issue R104-6

From Ken Waln:

Section 9 - "consultation" implies that a dialog occurs on the second call leg. If we want to allow that feature, I would also add a <connect> tag to complete the transfer. I think "monitored" or "monitoredblind" might describe it better. An alternative would be to drop this proposal and instead add an "answermode" attribute with values "immediate", "startvoice", "endvoice" etc. There are a lot of variations on single-line transfers and deciding when a call is complete. Far-end answer is not well defined in general, depending on the protocol - our platform currently offer these choices as configuration parameters but sometimes it is necessary to set on a call by call basis (e.g. an international number might behave differently).

Resolution: deferred

Using the type attribute, platform vendors are free to add additional platform-specific transfer types.
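
For reference, a minimal sketch of the type attribute on <transfer>, using the consultation value named in the specification. The destination number and form item name are hypothetical, and any platform-specific type values would be defined by the platform vendor rather than by this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="xfer">
    <!-- type selects the transfer behavior; the destination is hypothetical -->
    <transfer name="result" dest="tel:+15555550100" type="consultation">
      <prompt>Please hold while we connect you.</prompt>
      <filled>
        <log>Transfer outcome: <value expr="result"/></log>
      </filled>
    </transfer>
  </form>
</vxml>
```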

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)

Issue R85-1

From Dan Connolly:

I'm surprised by... "If the XML document specifies an processing instruction, access to the data is allowed based on the following algorithm: ..." -- http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-data-security
Last time a processing instruction was used in a W3C spec, it was allowed only after considerable debate...
"The use of XML processing instructions in this specification should not be taken as a precedent. The W3C does not anticipate recommending the use of processing instructions in any future specification." -- http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/
I suggest using a namespace-qualified element or attribute instead.

Resolution: accepted

The VBWG evaluated a number of mechanisms that would enforce the security of the data retrieved by the <data/> element including domain-based restrictions, HTTP_REFERER, HTTP X-Header, XML security envelope, and XML-ENC. The use of a processing instruction to enforce security of the data is a lightweight mechanism that is straightforward for data providers and platform vendors to understand and to implement. The VBWG considered the specification and practical implementation limitations of processing instructions and determined that these did not interfere with the intended behavior of this mechanism.
Upon further review, the VBWG acknowledged that specifying how security policy and resource sandboxing must be implemented went beyond the scope of the working group and therefore chose not to mandate one particular mechanism. However, because resource sandboxing is an important principle for VoiceXML interpreters in certain deployment contexts, and interoperability among implementations should be encouraged, the group chose to document this mechanism in an independent W3C Note.
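
For readers unfamiliar with the mechanism under discussion, the following sketch shows what a data document carrying the processing instruction might look like. It assumes the <?access-control?> form with an allow attribute listing permitted hosts, as recalled from the Last Call draft; the host pattern, element names, and values are hypothetical, and the exact attribute syntax and matching rules should be taken from the W3C Note rather than from this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed form of the PI: lists the hosts whose VoiceXML documents
     may read this data; *.example.com is a placeholder pattern -->
<?access-control allow="*.example.com"?>
<!-- Hypothetical payload that an application would fetch with <data/> -->
<quote>
  <ticker>XMPL</ticker>
  <last>12.34</last>
</quote>
```

Because the processing instruction sits in the prolog, a data provider can add it to an existing document without changing the document's element structure or schema, which is the "lightweight" property cited in the resolution.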

Email Trail:

Original Comment Dan Connoly
Receipt Confirmation VBWG (2004-07-28)
Explanation of data security model Brad Porter (2004-10-12)
Provider of XML data specifies allowed IP addresses; browser enforces sandboxing Brad Porter (2004-10-12)
Request for reasons not to use PI Brad Porter (2004-10-12)
Clarification of understanding purpose of PI Dave Raggett (2004-10-12)
Reasons not to use PI Dan Connoly (2004-10-12)
namespaced element works too Dan Connoly (2004-10-12)
PI is intended for processor; not programmer Brad Porter (2004-10-12)
PIs aren't bound to namespaces Max Froumentin (2004-10-15)
PIs easily inserted into existing documents Brad Porter (2004-10-15)
Feedback on use of PI for data element Brad Porter (2004-10-20) (Internal)
2.1 Conference Call Minutes Matt Oshry (2004-10-28) (Internal)
VBWG Official Response VBWG (2005-03-10)
Requests to review CR draft Dan Connoly (2005-03-10)
Links to CR draft and explains informative nature of Appendix E Matt Oshry (2005-03-10)
Objects to informative use of PI for security and arg against namespace-qualified el/attr Dan Connoly (2005-03-15)
Request for telecon and further justifies use of PI Brad Porter (2005-03-15)
Explains W3C review process Dan Connoly (2005-03-15)
Another request for telecon Brad Porter (2005-03-16)
Denies telecon request Dan Connoly (2005-03-16)
Request for telecon or proxy Brad Porter (2005-03-16)
Request for review from XML Core Jim Larson (2005-03-16)
Denial of telecon request; redirect to XML Core Dan Connoly (2005-03-16)
XML Core concludes that use of PI is legitimate in this case Paul Grosso (2005-03-17)
Likens PIs to comments Dan Connoly (2005-03-17)
PIs are a valid part of XML. Reiterates legitimacy of use in 2.1 Paul Grosso (2005-03-17)
Request for acceptance or registered objection VBWG (2005-03-29)
Request for formal objection Dan Connoly (2005-03-30)
Acknowledgement of objection VBWG (2005-03-31)
Agree to migrate data security appendix to independent W3C Note VBWG (2005-05-09)
Acceptance of VBWG Resolution Dan Connoly (2005-05-10)

Issue R107-4

From Martin Duerst:

URIs: The XML Schema at http://www.w3.org/TR/2004/WD-voicexml21-20040728/vxml-datatypes.xsd containing the segment:
<xsd:simpleType name="URI.datatype">
  <xsd:annotation>
    <xsd:documentation>URI (RFC2396)</xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:anyURI"/>
</xsd:simpleType>
seems to try to restrict anyURIs used in VXML to URIs only. However, there are two problems with this approach:

This is a very poor way of trying to make this restriction; if the restriction is indeed to be made, an actual pattern should be specified.

Such a restriction would rule out the use of IRIs, which would be a very bad idea with respect to internationalization.

So we request that you:

(possibly) add a restriction that just removes space and a few other ASCII characters allowed in anyURI, but neither in URIs nor in IRIs.

Say clearly in the spec that wherever the term URI is used, this isn't restricted to ASCII only, but follows IRIs.

Resolution: accepted

A note was added to Appendix C.3 and to the documentation for URI.datatype in the schema with reference to RFC 3987 and Character Model for the World Wide Web 1.0: Resource Identifiers. Links to these documents were added to the informative reference appendix.

Email Trail:

Original Comment Martin Duerst
VBWG Minutes VBWG (2004-10-28) (Internal)
Request to submit URI/IRI change as 2.0 errata VBWG (2004-11-01) (Internal)
VBWG Official Response VBWG (2005-03-11)
Request for CR to 2.0 and subsequently to 2.1 Martin Duerst (2005-03-12)
Disparity in Reqs between XML Schema and RFC 3987 Bjoern Hoehrmann (2005-03-16)
i18n response Addison Phillips (2005-04-22)
Minutes of telecon w/i18n discussing IRI VBWG (2005-04-26)
VBWG Official Response Max Froumentin (2005-05-03)
Acceptance of VBWG Resolution Addison Phillips (2005-05-03)

2.4 New Feature Requests

Issue R66-1

From Robert Keiller:

I am slightly disappointed that the support for <mark> does not go further and support client side audio control. application.lastresult$.marktime will support very simple audio control by sending the marktime as a url parameter in an audio request and having the application server apply offsets to the original audio file. However, there are two important cases where this will not work:

TTS prompts

where several audio files have been queued together (putting a mark on every prompt in the queue and restarting the prompt queue from the last mark would be a very awkward solution)

I believe that several voice browsers already support greater functionality via non-standard extensions and it seems a pity that these could not be standardised in VoiceXML 2.1.

Resolution: deferred

Client-side audio control has been deferred to a future version of VoiceXML.

Email Trail:

Original Comment Robert Keiller
Receipt Confirmation VBWG (2004-04-20)
VBWG official response to last call issue VBWG (2005-02-28)

Issue R104-7

From Ken Waln:

Could add more event values for completion: "SIT" (special information tone), "answeringmachine", etc. Are these implicitly allowed as platform specific return values or does it need to be explicit?

Resolution: deferred

Standardization of these return values is beyond the scope of VoiceXML 2.1.

Email Trail:

Original Comment Ken Waln
VBWG official response to last call issue VBWG (2005-02-28)
Response to VBWG official response Ken Waln (2005-02-28)