VoiceXML 2.0 Recommendation Errata

Last updated: 2009/04/22 17:09:42

This document records known errors in the VoiceXML 2.0 Recommendation.

About the VoiceXML 2.0 Recommendation

The VoiceXML 2.0 Recommendation was produced by the W3C Voice Browser Working Group as part of the activity of the W3C Interaction Domain and has been maintained as part of the activity of the W3C Ubiquitous Web Domain.

This document lists known errata and updates to the Recommendation.

Please send general comments about this document to the public mailing list at www-voice@w3.org. The archive for the list is accessible online.


Added text is marked as new, changed text as changed, and removed text as deleted.

Proposed and Pending Errata

Section 1.5.2 Executing a Multi-Document Application: Assertion 81

Assertion 81: "If an application root document specifies an application root element error.semantic is thrown."

The assertion tests the scenario: An error is thrown when a document A transitions to root B where root B is itself referencing another root. There are three possible actions:

The text and test specify that the proper action is to throw error.semantic when a root application page itself specifies an application; however, it is not clear at which level the error should be thrown. According to §5.3.7: "Note that for errors which occur during a dialog or document transition, the scope in which errors are handled is platform specific."

Most implementations tested in the VoiceXML Forum's Platform Certification Program pass the test as written, that is, they throw error.semantic.

The VBWG decided that this should be left as stated: throw error.semantic, but do not specify where the error is thrown.

A change request has been filed for VoiceXML 3.0.

Section FIA Process phase, Assertion 1106

Assertion 1106 states: "When no <reprompt> is used in conjunction with a <goto nextitem> within an error handler, then the target form item's prompt will not be played."

"<reprompt> does not terminate the FIA (the name suggests an action), but rather just sets a flag that affects the treatment of prompts on the subsequent iteration of the FIA."

A possible contradiction exists with §

"The form item's prompt will be played even if it has already been visited. "

Example implementations:

In §5.3.6 "reprompt element"

"..., the FIA does not generally perform the normal selection and queuing of prompts on the next iteration following the execution of a catch element." 

There are two exceptions listed, and assertion 1106 does not match either of those cases. From the pseudo-code in Appendix C:

unless ( the last loop iteration ended with
         a catch that had NO <reprompt>,
         and the active dialog was NOT changed )
    Select the appropriate prompts for an input item or <initial>.
    Queue the selected prompts for play prior to
    the next collect operation.

    Increment an input item's or <initial>'s prompt counter.
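
As an illustration only, the Appendix C condition quoted above can be written as a small Python predicate. The function and parameter names are hypothetical and do not appear in the Recommendation; the sketch encodes only a literal reading of the pseudo-code and does not settle the interpretation question raised by Assertion 1106.

```python
def should_queue_prompts(ended_with_catch: bool,
                         catch_had_reprompt: bool,
                         dialog_changed: bool) -> bool:
    """Return True if the FIA selects and queues prompts on this iteration.

    Literal reading of the Appendix C 'unless' clause; all names here are
    hypothetical and chosen for illustration.
    """
    skip = ended_with_catch and not catch_had_reprompt and not dialog_changed
    return not skip


# A catch that runs <goto nextitem> with no <reprompt>, same dialog:
print(should_queue_prompts(True, False, False))  # False
# The same catch when it also executes <reprompt>:
print(should_queue_prompts(True, True, False))   # True
```

Under this literal reading, the first call corresponds to the situation described in Assertion 1106: the target form item's prompt is not queued.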

There are multiple interpretations of execution behavior; this will be addressed in VoiceXML 3.0.

Section 6.2.1 meta element, Assertion 588

Assertion 588: 

"If the http-equiv attribute is set to "expires" and the content attribute is set to "0", the interpreter MUST not cache the document."

The definition of <meta> references the HTML 4 specification, where §7.4.4 contains the reference to <meta>. While the VoiceXML 2.0 Recommendation is ambiguous, (nearly) all implementations tested in the VoiceXML Forum's Platform Certification Program are consistent.

The HTML 4.0 reference is to be replaced by the HTTP 1.1 specification (RFC 2616). The http-equiv feature is not inherently linked to meta (it isn't really metadata). If an implementation supports meta and http-equiv, it must support them as described.
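
The check described by Assertion 588 could be sketched roughly as follows. This is a minimal Python illustration, not a description of how any platform implements it: the function name forbids_caching and the sample document are hypothetical, and matching "expires" case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def forbids_caching(vxml_source: str) -> bool:
    """Return True if a <meta http-equiv="expires" content="0"> is present.

    Hypothetical helper; the case-insensitive comparison is an assumption.
    """
    root = ET.fromstring(vxml_source)
    for meta in root.iter(f"{{{VXML_NS}}}meta"):
        http_equiv = meta.get("http-equiv", "")
        content = meta.get("content", "")
        if http_equiv.lower() == "expires" and content.strip() == "0":
            return True
    return False


# Hypothetical sample document used only for this illustration.
DOC = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <meta http-equiv="Expires" content="0"/>
  <form id="main"><block>Hello</block></form>
</vxml>"""

print(forbids_caching(DOC))  # True
```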

A change request has been filed for VoiceXML 3.0.

Rejected Proposed Errata


Known errors as of 22 April 2009


E1: Section 1.2.1 Architectural Model, definition of character input

Clarification: The use of "character input" in "user actions (e.g. spoken or character input received, disconnect)" and all other instances in the specification is clarified as "DTMF key input."

Add term to Glossary:

character input

DTMF key input: digits 0-9, *, #, A, B, C, D
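
The glossary definition above amounts to a fixed set of sixteen keys. A minimal Python sketch of a membership check follows; the names DTMF_KEYS and is_dtmf_input are hypothetical, and treating lowercase letters and the empty string as invalid is an assumption made for this illustration.

```python
# The sixteen DTMF keys listed in the glossary entry.
DTMF_KEYS = frozenset("0123456789*#ABCD")


def is_dtmf_input(keys: str) -> bool:
    """Return True if every character of a non-empty string is a DTMF key."""
    return len(keys) > 0 and all(k in DTMF_KEYS for k in keys)


print(is_dtmf_input("12#"))  # True
print(is_dtmf_input("1a"))   # False
```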

E2: Section Controlling the order of field collection, 2nd paragraph


Original text:

The form item's prompt will be played even if it has already been visited.

Replace with:

Although several factors may affect whether or not form item prompts are queued for playing (see Section 5.3.6, for example), whether or not the form item has been previously visited will not affect whether prompts are queued.

E3: Section 5.3.12 script element, charset attribute

The 'charset' attribute serves two functions: it informs the application server which character encoding is desired by the VoiceXML browser and it provides a default value for the character encoding when none is otherwise available.  Scripts, unlike XML documents such as SRGS grammars, do not internally specify a character encoding.

When the protocol of the 'src' attribute allows, best practice dictates that the 'charset' attribute be used in content negotiation.  The HTTP 1.1 protocol, for instance, provides the 'Accept-Charset' request header to inform the server which character encoding is desired.  HTTP 1.1 dictates that if the server is unable to provide this encoding, it SHOULD return an HTTP 406 error message (corresponding to error.badfetch.http.406), though it MAY alternatively provide a response in a different character set.
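
The content negotiation described above might be sketched as follows in Python. This is an illustration under stated assumptions: the helper names build_script_request and badfetch_event and the example URL are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request


def build_script_request(src: str, charset: str) -> urllib.request.Request:
    """Build a request for a script, passing the charset attribute value
    to the server in the Accept-Charset header (hypothetical helper)."""
    return urllib.request.Request(src, headers={"Accept-Charset": charset})


def badfetch_event(http_status: int) -> str:
    """Map an HTTP error status to the corresponding VoiceXML event name."""
    return f"error.badfetch.http.{http_status}"


# Hypothetical script location; nothing is fetched here.
req = build_script_request("http://example.com/scripts/app.js", "utf-8")

# urllib stores header names with only the first letter capitalized.
print(req.get_header("Accept-charset"))  # utf-8
print(badfetch_event(406))               # error.badfetch.http.406
```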

The document returned upon dereferencing the 'src' attribute will often have an associated media type and an associated charset value.  The VoiceXML browser is responsible for deciding whether these are acceptable.  Although not required by the VoiceXML 2.0 specification, it is suggested in accordance with RFC 4329 and the IANA registry that only the 'application/javascript' and 'application/ecmascript' types be accepted by the <script> element.  As for the returned 'charset', priority MUST be given to the value returned in the reply.  There are three options:

E4: inputmodes and activation of grammars

A conflict exists between the use of "will" (the improper term) and "may" in the following sections. In § 3.1.4, "will" is replaced with "may":

§ 3.1.4 Activation of Grammars

"Grammar activation is not affected by the inputmodes property. For instance, if the inputmodes property restricts input to just voice, DTMF grammars may still be activated, but cannot be matched."

§ 6.3.6 Misc. Properties: definition of "inputmodes" property in Table 63.

"…voice-only grammars may be active when the inputmode is restricted to DTMF. Those grammars would not be matched, however, because the voice input modality is not active."

E5: Clarification of (unclear) term "flush"

All uses of the term "flush" as in "Flush the prompt queue" are to be interpreted as "Play all queued prompts to completion."

Original text:

... After the audio queue is flushed, the outgoing call is initiated. ...

Replace with:

... After all queued prompts are played to completion, the outgoing call is initiated. ...

Original text:

... Processing the <disconnect> element will also flush the prompt queue (as described in Section 4.1.8). ...

Replace with:

... Processing the <disconnect> element will also play all queued prompts to completion (as described in Section 4.1.8). ...

E6: Clarification of (inconsistent use of) the term "should"

In Section 4.1.8, the term "should" is replaced with "must" in each of the following cases:

However, DTMF input (including timing information) must be collected and buffered in the transition state. Similarly, asynchronously generated events not related directly to execution of the transition must also be buffered until the waiting state (e.g. connection.disconnect.hangup).

E7: Appendix N, Media Type and File Suffix


Original text:

The W3C Voice Browser Working Group has applied to IETF to register a media type for VoiceXML. The requested media type is application/voicexml+xml.

The W3C Voice Browser Working Group has adopted the convention of using the ".vxml" filename suffix for VoiceXML documents.

Replace with:

The media type for VoiceXML is application/voicexml+xml, as defined by RFC4267 [RFC4267]: W3C Speech Interface Media Types. RFC4267 also assigns the filename suffix '.vxml' for VoiceXML documents.

Add to M.1 Normative References:

[RFC4267] "The W3C Speech Interface Framework Media Types", IETF RFC 4267, 2005.
See http://www.ietf.org/rfc/rfc4267.txt

E8: Appendix O, VoiceXML XML Schema Definition, $ within variable name

In the VoiceXML 2.0 schema, the RestrictedVariableName datatype defined in vxml-datatypes.xsd has been corrected to allow '$' within the variable name:

<xsd:simpleType name="RestrictedVariableName.datatype">
  <xsd:annotation>
    <xsd:documentation>Variable name which doesn't start with "_"
    or number, doesn't end with '$' and doesn't contain ".".
    Additional constraints: must follow ECMAScript variable naming
    conventions; not include ECMAScript reserves words</xsd:documentation>
  </xsd:annotation>
  <!-- base changed from xsd:NMTOKEN to xsd:token, since NMTOKEN cannot contain '$' -->
  <xsd:restriction base="xsd:token">
    <xsd:pattern value="([\p{L}\p{Nl}$]|[\p{L}\p{Nl}$][\p{L}\p{Nl}\p{Nd}\p{Mn}\p{Mc}\p{Pc}$_]*[\p{L}\p{Nl}\p{Nd}\p{Mn}\p{Mc}\p{Pc}_])"/>
  </xsd:restriction>
</xsd:simpleType>
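
To make the effect of the corrected pattern concrete, the following Python sketch checks a name against the same character classes. It is an approximation for illustration, not a schema validator: Python's standard re module does not support \p{...} classes, so the sketch uses Unicode general categories from unicodedata instead, and the function name is hypothetical. It does not check the ECMAScript reserved-word constraint.

```python
import unicodedata

# General categories covered by \p{L} and \p{Nl} in the schema pattern.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Categories additionally allowed after the first character:
# \p{Nd}, \p{Mn}, \p{Mc} and \p{Pc} (the underscore is in Pc).
BODY_CATEGORIES = START_CATEGORIES | {"Nd", "Mn", "Mc", "Pc"}


def is_restricted_variable_name(name: str) -> bool:
    """Approximate the RestrictedVariableName pattern (hypothetical helper)."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: a letter, a letter-number, or '$'.
    if first != "$" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    if not rest:
        return True
    *middle, last = rest
    # Last character: any body character, but never '$'.
    if unicodedata.category(last) not in BODY_CATEGORIES:
        return False
    # Middle characters: body characters or '$'.
    return all(c == "$" or unicodedata.category(c) in BODY_CATEGORIES
               for c in middle)


print(is_restricted_variable_name("total$count"))  # True
print(is_restricted_variable_name("total$"))       # False
print(is_restricted_variable_name("_hidden"))      # False
```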


E9: Replace XML 1.0 with XML 1.1

XML 1.0 is replaced by XML 1.1.

Extensible Markup Language (XML) 1.1 (Second Edition), T. Bray et al. World Wide Web Consortium, 16 August 2006, edited in place 29 September 2006. This version of the XML 1.1 Recommendation is http://www.w3.org/TR/2006/REC-xml11-20060816/. The latest version is available at http://www.w3.org/TR/xml11/.

E10: Replace URL with IRI

URLs are replaced by IRIs.

Internationalized Resource Identifiers (IRIs), M. Duerst and M. Suignard, Editors. IETF, January 2005. This RFC is available at http://www.ietf.org/rfc/rfc3987.txt

E11: Typo fix - Title of Section 4.1.8 and in Section 5.3.6

Replace "queueing" with "queuing"

E12: Typo fix - Section 5.3.10

Original text of the code example (spelling error):

<prompt> Please say Social Securityy number.</prompt>

Replace with:

<prompt> Please say Social Security number.</prompt>

E13: Informative clarification of prompt queuing and <disconnect> handling

Assertion ID   Spec section   Abstract
89             1.5.4          A VoiceXML interpreter may continue executing even after it no longer has a connection to the user.
552            5.3.11         When the interpreter executes a disconnect element, it must drop the call.
553            5.3.11         When the interpreter executes a disconnect element, it must throw a catchable connection.disconnect.hangup event.

Assertion 89 implies the disconnect event would get queued. Assertion 553 implies the disconnect event would be thrown immediately.

To clarify what's described in §5.3.11:

Upon entering the waiting state, prompts cannot be played but queuing is possible. There is a difference between prompts that were queued before the event was thrown and those that are queued afterward. The prompts queued before are played to completion; those queued afterward are not played because the connection no longer exists. Once the interpreter has entered the final processing state, the interpreter cannot go back to the waiting state (though the application can keep throwing events and catching them).

Note: Disconnect only occurs while the interpreter is in the waiting state.
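
The distinction between prompts queued before and after the disconnect can be modelled with a small Python sketch. This is a toy model under stated assumptions, not interpreter behaviour taken from the Recommendation: the Session class and its method names are hypothetical, and "playing" a prompt is represented by moving it to a list.

```python
class Session:
    """Toy model of prompt queuing around <disconnect> (hypothetical)."""

    def __init__(self):
        self.connected = True
        self.queued = []   # prompts waiting to be played
        self.played = []   # prompts the user actually heard
        self.events = []   # events thrown to the application

    def queue_prompt(self, prompt: str) -> None:
        # Queuing always succeeds, even after the connection is gone.
        self.queued.append(prompt)

    def disconnect(self) -> None:
        # Prompts queued before the disconnect are played to completion,
        # then the call is dropped and the hangup event is thrown.
        self.played.extend(self.queued)
        self.queued.clear()
        self.connected = False
        self.events.append("connection.disconnect.hangup")

    def play_queued(self) -> None:
        # Without a connection, queued prompts are dropped unplayed.
        if self.connected:
            self.played.extend(self.queued)
        self.queued.clear()


session = Session()
session.queue_prompt("Goodbye.")
session.disconnect()
session.queue_prompt("Too late.")  # queued after the hangup event
session.play_queued()
print(session.played)  # ['Goodbye.']
```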

Kazuyuki Ashimura - Voice Browser Activity Lead