W3C

XHTML+Voice Profile 1.0

W3C Note 21 December 2001

This Version:
http://www.w3.org/TR/2001/NOTE-xhtml+voice-20011221
Latest Version:
http://www.w3.org/TR/xhtml+voice
Editors:
Jonny Axelsson, Opera Software <jax@opera.no>
Chris Cross, IBM <xcross@us.ibm.com>
Håkon W. Lie, Opera Software <howcome@opera.no>
Gerald McCobb, IBM <mccobb@us.ibm.com>
T. V. Raman, IBM <tvraman@us.ibm.com>
Les Wilson, IBM <lesw@us.ibm.com>

Abstract

Profile XHTML+Voice brings spoken interaction to standard WWW content by integrating a set of mature WWW technologies such as XHTML and XML Events with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, speech grammars, and the ability to attach Voice handlers for responding to specific DOM events, thereby re-using the event model familiar to web developers. Voice interaction features are integrated directly with XHTML and CSS, and can consequently be used directly within XHTML content.

The XHTML+Voice profile is designed for Web clients that support visual and spoken interaction. To this end, this document first re-formulates VoiceXML 2.0 as a collection of modules. These modules, along with Speech Synthesis Markup Language and Speech Recognition Grammar Format, are then integrated with XHTML using XHTML modularization to create the XHTML+Voice profile. Finally, we integrate the result with module XML-Events so that voice handlers can be invoked through a standard DOM2 EventListener interface.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

Note that the language profile described in this specification re-uses W3C working drafts that are likely to change. This integration profile will be updated as needed to use the final stable versions of these specifications.

This document is a submission to the World Wide Web Consortium (see Submission Request, W3C Staff Comment). For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.

This document is a NOTE made available by the W3C for discussion only. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members. No W3C resources were or are allocated to the issues addressed by the NOTE. W3C has had no editorial control over the preparation of this NOTE.

A list of current W3C technical documents can be found at the Technical Reports page.

Table of Contents

1 Introduction
    1.1 Motivation And Applications
    1.2 Design Rationale
2 Voice Modules
    2.1 Modularization Of VoiceXML 2.0
    2.2 Speech And Non-speech Audio Output
    2.3 Speech Dialogs
    2.4 Speech Grammars
    2.5 VoiceXML Event Types
    2.6 VoiceXML Event Handlers
3 Normative Definition Of Profile XHTML+Voice
    3.1 Document Conformance
    3.2 User Agent Conformance
    3.3 XHTML Namespace Integration
    3.4 XHTML+Voice Profile
    3.5 XHTML+Voice Modules
    3.6 Event types for XHTML+Voice
4 Extending Profile XHTML+Voice

Appendices

A Reusable Voice Handlers
B Examples
    B.1 Basic Structure Of XHTML+Voice Documents
    B.2 What You See Is What You Can Say
    B.3 Mixed-initiative Conversational Interface
    B.4 Speech-Enabled Mail Interface
    B.5 Reusable Voice Subdialogs
C DTD
    C.1 xhtml+voice10.dtd
D Schema
E References
    E.1 Normative References
    E.2 Informative References


1 Introduction

This section is informative.

The purpose of XHTML modularization [XHTML Modularization] (as expressed in XHTML 1.1 [XHTML 1.1]) is to serve as the basis for future extended XHTML family document types, and to provide a consistent, forward-looking document type that is cleanly separated from the deprecated, legacy functionality of HTML 4. Thus, the XHTML 1.1 document type is essentially a reformulation of XHTML 1.0 Strict [XHTML 1.0] using XHTML Modules [XHTML Modularization].

Module XML-Events [XML Events] provides XML host languages the ability to uniformly integrate event listeners and associated event handlers with Document Object Model (DOM) Level 2 [DOM2 Events] event interfaces. The result is to provide XHTML-based languages an event syntax that enables an interoperable way of associating behaviors with document-level markup.

VoiceXML 2.0 [VoiceXML 2.0] and the other XML vocabularies making up the W3C speech interface framework have been designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. In this document, we first modularize VoiceXML 2.0 to prepare it for integration into the XHTML family of languages using the XHTML modularization framework. We then integrate the resulting voice modules along with the XML Events module into XHTML by defining an XHTML+Voice profile. This specification describes the VoiceXML modules that are added to XHTML and details the integration issues. The modularization of VoiceXML 2.0 also specifies DOM event types specific to voice interaction for use with the XML Events module. Speech dialogs authored in VoiceXML 2.0 can then be treated as event handlers that add voice-interaction specific behaviors to XHTML documents. The language integration supports all of the modules defined in XHTML Modularization, and adds speech interaction functionality to XHTML elements to enable multimodal applications. The document type defined by the XHTML+Voice profile is XHTML Host language document type conformant. A primary goal is to enable the integration of voice interaction into XHTML Basic for use on thin clients, while scaling up to today's desktop browsers.

1.1 Motivation And Applications

This note outlines how a set of mature WWW technologies including XHTML 1.1 [XHTML 1.1], VoiceXML 2.0 [VoiceXML 2.0], Speech Synthesis Markup Language [SSML 1.0], Speech Recognition Grammar Format [SRGF] and XML-Events [XML Events] can be integrated using XHTML modularization [XHTML Modularization] to bring spoken interaction to the WWW. The design leverages open industry APIs like the W3C DOM to create interoperable web content that can be deployed across a variety of end-user devices. Multiple modes of interaction are synchronized and integrated using the DOM2 Events model [DOM2 Events] and exposed to the content author via XML Events.

Today, WWW applications are authored in XHTML with user interaction created via XHTML form elements. W3C is presently working on XForms [XForms], the next generation of web forms that bring the power of XML to WWW application development. The combination of XHTML and voice described in this specification can leverage the semantic richness of web applications created using XForms, while providing a smooth transition for today's web developers wishing to deploy multimodal applications by adding spoken interaction to present-day web content. Integrating the work of the W3C voice browser working group into mainstream XHTML content has the additional benefit of allowing that content to take advantage of future enhancements in the W3C speech interface framework such as natural language understanding. Thus, we provide a smooth transition path for web developers wishing to deliver increasingly smart user interaction for their WWW applications. At the same time, building on XHTML Basic [XHTML Basic] and XHTML modularization ensures that content developers will be able to deploy their content to a wide variety of end-user clients ranging from mobile phones and small PDAs to desktop browsers.

Using the functionality provided by the voice modules, this profile adds speech interaction functionality to standard user interface controls in XHTML. This provides an easy means of speech-enabling WWW applications by allowing Web developers to add voice interaction to standard WWW content. VoiceXML elements and constructs are included to permit the Web author to easily create spoken interaction for specific parts of a standard WWW application. The integration provides a smooth means for moving from see-only WWW applications to WWW content that supports both visual and spoken interaction. Such combined (multimodal) interaction is crucial for next-generation multimodal devices. By integrating spoken interaction into the present WWW application authoring paradigm, this profile lowers the entry barrier for WWW developers wishing to add voice interaction to the visual WWW.

1.2 Design Rationale

This section provides the design rationale used to decide how we modularize VoiceXML 2.0. The goal is to modularize VoiceXML in a manner that permits the creation of profiles that match different application deployment environments. As an example, PDAs might not wish to include all of the telephony features from VoiceXML 2.0. To reflect the predominantly visual nature of today's WWW, we have chosen to make XHTML the host language; as a consequence, those parts of VoiceXML 2.0 that relate to the VoiceXML document being a stand-alone speech application are dropped from the XHTML+Voice profile.

2 Voice Modules

This section first modularizes VoiceXML 2.0 and then specifies the various voice modules used in the creation of the XHTML+Voice profile.

2.1 Modularization Of VoiceXML 2.0

The files making up the modularization of the VoiceXML 2.0 DTD are available as xhtml+voice-dtd.zip and have been created to ease the process of integrating VoiceXML 2.0 and XHTML. These modules do not change the VoiceXML 2.0 language as specified by the voice browser working group of the W3C. This section gives a high-level overview of each module.

| File | Module | Purpose | Elements | XHTML+Voice |
|---|---|---|---|---|
| voicexml-events-1.mod | Events | Event types dispatched by Voice processor | catch help noinput nomatch error throw | Y |
| voicexml-exec-1.mod | Executable statements | Statements for use in voice handlers | assign clear var log reprompt | Y |
| voicexml-filled-1.mod | Filled | Voice handlers invoked when a slot is filled | filled | Y |
| voicexml-flow-1.mod | Flow control | Flow control constructs from VoiceXML | if else elseif return | Y |
| voicexml-form-1.mod | Dialogs | Encapsulate voice dialogs | form field record subdialog block initial option | Y |
| voicexml-misc-1.mod | Miscellaneous | Non-local transfers in VoiceXML | exit goto link script submit | N |
| voicexml-menu-1.mod | Menus | VoiceXML menus | menu choice enumerate | N |
| voicexml-object-1.mod | Object | Foreign objects for VoiceXML | object | N |
| voicexml-resource-1.mod | Resources | Specifying voice resources | param property | Y |
| voicexml-root-1.mod | Root | VoiceXML stand-alone documents | vxml meta | N |
| voicexml-ssml-1.mod | SSML | Speech and audio output | prompt value audio emphasis voice break prosody say-as phoneme paragraph p sentence s mark | Y |
| voicexml-telephony-1.mod | Telephony | Telephony control | transfer disconnect | N |
| voicexml-grammar-1.mod | SRGF | Speech input constructs from VoiceXML | grammar count example token import item one-of rule ruleref | Y |
| voicexml-attribs-1.mod | Attributes | Common attributes used in VoiceXML | | Y |
| voicexml-datatypes-1.mod | Datatypes | Common datatypes used in VoiceXML | | N |
| voicexml-framework-1.mod | Framework | Creates modular framework for inclusion of other modules | | N |
| voicexml-notations-1.mod | Notations | Defines XML and SGML notations | | N |
| voicexml-qname-1.mod | QNames | Parameters and entities for qualified names (qnames) | | Y |
| voicexml20-model-1.mod | Document Model | Defines content model for VoiceXML elements | | Y |

2.2 Speech And Non-speech Audio Output

Module voicexml-ssml-1.mod defines constructs for producing spoken and non-spoken audio output. These constructs are normatively defined in the SSML specification [SSML 1.0]. These constructs are used to author spoken prompts within voice handlers.
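As an illustration, a spoken prompt inside a voice handler might combine recorded audio with synthesized speech. The following is a minimal sketch, assuming the vxml prefix is bound to the VoiceXML namespace as described in Section 3.3; the form id and the audio file welcome.wav are hypothetical.

```xml
<!-- Sketch only: the id "greeting" and the file welcome.wav are hypothetical. -->
<vxml:form id="greeting">
  <vxml:block>
    <vxml:prompt>
      <!-- Recorded audio; the enclosed text is a spoken fallback
           if the file cannot be played. -->
      <vxml:audio src="welcome.wav">Welcome.</vxml:audio>
      <!-- A pause before the synthesized sentence. -->
      <vxml:break/>
      Please choose a <vxml:emphasis>city</vxml:emphasis> and a hotel.
    </vxml:prompt>
  </vxml:block>
</vxml:form>
```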

2.3 Speech Dialogs

Modules voicexml-exec-1.mod, voicexml-filled-1.mod, voicexml-resource-1.mod, voicexml-flow-1.mod, and voicexml-form-1.mod are used to author handlers that implement speech dialogs.
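A speech dialog built from these modules might look like the following sketch. It assumes the vxml prefix from Section 3.3; the form id, the field name, and the grammar file size.srgf are hypothetical, and the grammar type follows the examples in Appendix B.

```xml
<!-- Sketch only: ids, names, and size.srgf are hypothetical. -->
<vxml:form id="voice_size">
  <!-- Resource module: adjust how long the recognizer waits for input. -->
  <vxml:property name="timeout" value="5s"/>
  <!-- Form module: one field collects one piece of spoken input. -->
  <vxml:field name="field_size">
    <vxml:grammar src="size.srgf" type="application/x-srgf"/>
    <vxml:prompt>What size would you like?</vxml:prompt>
    <!-- Filled module: runs once the field has a value. -->
    <vxml:filled>
      <vxml:prompt>
        You chose <vxml:value expr="field_size"/>.
      </vxml:prompt>
    </vxml:filled>
  </vxml:field>
</vxml:form>
```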

2.4 Speech Grammars

Module voicexml-grammar-1.mod provides constructs for authoring speech grammars. Speech grammars are normatively specified by the speech grammar specification [Speech Grammars].
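For small vocabularies, a grammar can be written inline rather than referenced by URI. The fragment below is a sketch only: the rule name and the list of cities are hypothetical, and the root attribute follows the later Speech Recognition Grammar Specification 1.0, so the draft referenced by this profile may differ in detail.

```xml
<!-- Sketch only: rule id and city names are hypothetical. -->
<vxml:field name="field_city">
  <vxml:grammar root="city">
    <vxml:rule id="city">
      <!-- one-of lists the alternatives the recognizer will accept. -->
      <vxml:one-of>
        <vxml:item>Chicago</vxml:item>
        <vxml:item>Boston</vxml:item>
        <vxml:item>Oslo</vxml:item>
      </vxml:one-of>
    </vxml:rule>
  </vxml:grammar>
  <vxml:prompt>Please choose a city.</vxml:prompt>
</vxml:field>
```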

2.5 VoiceXML Event Types

Module voicexml-events-1.mod declares the event types defined in VoiceXML 2.0. These event types are used in creating event listeners that respond to speech events.
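The sketch below shows how handlers for these event types might be declared inside a field. It assumes the vxml prefix from Section 3.3; the field name and prompt wording are illustrative, and the grammar reference mirrors the examples in Appendix B.

```xml
<!-- Sketch only: field name and wording are illustrative. -->
<vxml:field name="field_city">
  <vxml:grammar src="city.srgf" type="application/x-srgf"/>
  <vxml:prompt>Please choose a city.</vxml:prompt>
  <!-- The user asked for help. -->
  <vxml:help>Say the name of a city, for example Chicago.</vxml:help>
  <!-- The user said nothing: play the prompt again. -->
  <vxml:noinput>
    <vxml:reprompt/>
  </vxml:noinput>
  <!-- The input did not match the grammar. -->
  <vxml:nomatch>Sorry, I did not understand.</vxml:nomatch>
  <!-- catch handles the event types named in its event attribute. -->
  <vxml:catch event="error">An error occurred.</vxml:catch>
</vxml:field>
```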

2.6 VoiceXML Event Handlers

Modules voicexml-filled-1.mod, voicexml-flow-1.mod, voicexml-exec-1.mod, and voicexml-resource-1.mod declare constructs for use within voice handlers. The semantics of these constructs are as defined in the VoiceXML 2.0 specification.
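A voice handler combining these constructs might look like the following sketch. It assumes the vxml prefix from Section 3.3; the field name, the grammar file nights.srgf, and the two-week limit are hypothetical.

```xml
<!-- Sketch only: names, nights.srgf, and the limit are hypothetical. -->
<vxml:field name="field_nights">
  <vxml:grammar src="nights.srgf" type="application/x-srgf"/>
  <vxml:prompt>How many nights will you stay?</vxml:prompt>
  <vxml:filled>
    <!-- Executable statements: declare a local variable. -->
    <vxml:var name="max_nights" expr="14"/>
    <!-- Flow control: validate the recognized value. -->
    <vxml:if cond="field_nights &gt; max_nights">
      <vxml:prompt>Stays are limited to two weeks.</vxml:prompt>
      <!-- Reset the field so the user is asked again. -->
      <vxml:clear namelist="field_nights"/>
    <vxml:else/>
      <vxml:log>Number of nights accepted.</vxml:log>
    </vxml:if>
  </vxml:filled>
</vxml:field>
```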

3 Normative Definition Of Profile XHTML+Voice

This section is normative.

3.1 Document Conformance

A conforming XHTML+Voice document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:

  1. It must validate against the XML Schema provided in this document.

  2. The root element of the document must be html.

  3. The name of the default namespace on the root element must be the XHTML namespace name: http://www.w3.org/1999/xhtml.

  4. If a DOCTYPE declaration is present and includes a public identifier, the DOCTYPE declaration must reference the DTD provided in this document using its Formal Public Identifier. The system identifier may be modified appropriately.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "http://www.w3.org/Voice/Group/2001/xhtml+voice10.dtd">
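A document meeting criteria 2 through 4 might begin as in the following sketch. The system identifier dtd/xhtml+voice10.dtd is a hypothetical local copy of the DTD; criterion 1 still requires the document to validate against the schema.

```xml
<?xml version="1.0"?>
<!-- Criterion 4: the Formal Public Identifier is unchanged; only the
     system identifier (a hypothetical local path) has been modified. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "dtd/xhtml+voice10.dtd">
<!-- Criteria 2 and 3: root element html in the XHTML default namespace. -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Minimal Document</title>
  </head>
  <body>
    <p>Hello.</p>
  </body>
</html>
```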

3.2 User Agent Conformance

The user agent must conform to the "User Agent Conformance" section of the XHTML specification ([XHTML 1.0], section 3.2) and the conformance requirements detailed in the VoiceXML modules ([VoiceXML 2.0]) supported by the integration profile.

The user agent must conform to the following additional user agent rule:

  1. When the user agent claims to support facilities defined within the VoiceXML 2.0 specifications or facilities required by this specification through normative reference, it must do so in ways consistent with the facilities' definition.

3.3 XHTML Namespace Integration

In an XHTML document incorporating the voice functionality defined by the XHTML+Voice profile, the document's default XML namespace is still XHTML. Voice elements are included through an additional VXML namespace declaration:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20">

The name of the unique identifier for the namespace within the document (in this example, vxml) is left to the discretion of the document author.
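For instance, an author could bind the shorter prefix v to the same namespace. This is a sketch only; the prefix v and the form id are arbitrary choices made for illustration.

```xml
<!-- Sketch only: the prefix "v" is an arbitrary author choice. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:v="http://www.w3.org/2001/voicexml20">
  <head>
    <title>Alternative Prefix</title>
    <!-- Voice elements are identified by namespace, not by prefix. -->
    <v:form id="sayHello">
      <v:block>Hello World</v:block>
    </v:form>
  </head>
  <body>
    <p>XHTML elements need no prefix.</p>
  </body>
</html>
```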

3.4 XHTML+Voice Profile

The XHTML functionality in the XHTML+Voice document type is based upon the XHTML modules defined in XHTML Modularization [XHTML Modularization]. The XHTML+Voice profile includes the XHTML modules defined in [XHTML Basic], such as the basic XHTML forms and tables modules. In addition, the XHTML+Voice document type supports the XHTML scripting module, and XML Events as defined by the XML Events module [XML Events]. Finally, elements defined in the imported VoiceXML modules provide the ability to speech-enable XHTML constructs, and the VoiceXML event types and handlers allow the XHTML author to associate voice-interaction specific behaviors. The notation, terms and document conventions used here are borrowed from [XHTML 1.1].

The profile includes the following voice modules:

3.5 XHTML+Voice Modules

XHTML 1.1 is extended with voice modules by creating a new content model based on the XHTML 1.1 content model. The modifications include adding VoiceXML 2.0 with its content model, datatypes, and attributes to XHTML. This section specifies the modules needed to extend XHTML 1.1 with XML vocabularies defined as part of the W3C speech interface framework and create the XHTML+Voice profile.

| File | Module | Purpose |
|---|---|---|
| xhtml+voice-model-1.mod | XHTML+Voice Document Model | Defines content model based on XHTML Basic for elements in XHTML+Voice |
| xhtml+voice-framework-1.mod | Framework | Includes the necessary modules for creating the XHTML+Voice profile |
| xhtml+voice-datatypes-1.mod | Datatypes | Imports VoiceXML datatypes into XHTML |
| xhtml+voice10.dtd | DTD | XHTML+Voice DTD |
| xhtml+voice.cat | Catalog | Catalog fragment for use with profile XHTML+Voice |

3.6 Event types for XHTML+Voice

For a given XML language extended with XML Events, a set of event types must be specified independently of the [XML Events] module. The XML Event types supported by the XHTML+Voice profile include all event types defined for [HTML 4.01] intrinsic events. VoiceXML handler activation is specified by giving an XHTML element one of these event types as its XML event, and an ID reference to the VoiceXML form as its XML event handler. The XHTML+Voice profile also supports VoiceXML 2.0 event types nomatch, noinput, error, and help. An additional event type, filled, is defined to have the same semantics as the VoiceXML element filled. Event filled is generated on the field or form level when a field is set after the prompted input matches the provided grammar.
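The two fragments below sketch handler activation. They assume the vxml and ev prefixes are declared as in the examples of Appendix B; the ids, names, and the grammar file name.srgf are hypothetical.

```xml
<!-- Fragment 1, placed in the head: the voice handler.
     Sketch only: ids, names, and name.srgf are hypothetical. -->
<vxml:form id="voice_name">
  <vxml:field name="field_name">
    <vxml:grammar src="name.srgf" type="application/x-srgf"/>
    <vxml:prompt>Please say your name.</vxml:prompt>
  </vxml:field>
</vxml:form>

<!-- Fragment 2, placed in the body: the onfocus intrinsic event on
     this input activates the handler referenced by its ID. -->
<input name="name" type="text"
       ev:event="onfocus" ev:handler="#voice_name"/>
```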

Profile XHTML+Voice extends the XHTML script element with XML Events. Element script does not generate any events of its own; hence attribute target is required for it to capture an XML event. Element script can target any XHTML or VoiceXML element and can specify any HTML 4.01 intrinsic event or VoiceXML event.
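A script that listens for event filled on a voice form might be written as in the sketch below. The form id voice_name, the field field_name, and the input id name_input are hypothetical, and direct script access to the field variable is assumed by analogy with the examples in Appendix B.

```xml
<!-- Sketch only: voice_name, field_name, and name_input are
     hypothetical; access to field_name from script is an assumption. -->
<script type="text/javascript"
        ev:target="#voice_name" ev:event="filled">
  // Copy the recognized value into the visual form control.
  document.getElementById("name_input").value = field_name;
</script>
```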

The following table shows the correspondence between the XHTML+Voice event types and the XHTML or VoiceXML elements that support them:

| Elements | Event Type |
|---|---|
| XHTML body | onload, onunload |
| Most XHTML elements | onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmouseout, onkeypress, onkeydown, onkeyup |
| VoiceXML form | nomatch, noinput, error, help, filled |
| XHTML elements: a, label, input, select, textarea, button | onfocus, onblur |
| XHTML form | onsubmit, onreset |
| XHTML elements: input, textarea | onselect |
| XHTML elements: input, select, textarea | onchange |

4 Extending Profile XHTML+Voice

This section is normative.

In the future, profile XHTML+Voice may be extended by other W3C recommendations, or by private extensions. For these extensions, the following rules must be obeyed:

Conformant XHTML+Voice user agents should be prepared to handle documents containing extensions that obey these two rules.

A Reusable Voice Handlers

This section is informative.

A VoiceXML form, defined here as an event handler, is more practical if it can be placed in a linked document separate from the XHTML as a reusable component. Reusable components allow easier maintenance, and provide default behaviors that can be used as application building-blocks. VoiceXML includes a subdialog construct and its calling convention is close to what is required for a reusable component. The problem is that the caller must know both the subdialog's parameters and the fields included in the ECMAScript object returned to the caller.

It is not within the scope of this profile to attempt to solve the problem of creating reusable dialog components within VoiceXML; this is the domain of the W3C Voice Working Group. Authoring conventions can, however, be suggested which should work in most cases. A VoiceXML handler can be placed in a separate file and linked from within an XHTML+Voice profile document if:

The appendix includes an example of how a subdialog can be reused by following the above authoring conventions.

B Examples

This section is informative.

B.1 Basic Structure Of XHTML+Voice Documents

        
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "xhtml+voice10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20" xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Skeleton XHTML+Voice Document</title>
<!-- voice handlers -->
    <vxml:form id="sayHello">
      <vxml:block>Hello World</vxml:block>
    </vxml:form>
  </head>
  <body>
    <h1>Skeleton XHTML+Voice Document</h1>
    <p ev:event="onclick" ev:handler="#sayHello">
      This is a sample document that illustrates the markup
      structure of a conformant XHTML+Voice document.
      Notice that the default XML namespace is XHTML --and
      consequently, standard HTML element names do not need
      a namespace prefix.  We can add voice-interaction
      specific elements from the Voice XML 2.0 namespace
      using prefix <code>vxml</code>.  We can attach event
      handlers using prefix <code>ev</code>.  Clicking
      anywhere on this paragraph results in a welcome
      message being spoken on account of attaching a
      <code>vxml:form</code> handler to this paragraph.
    </p>
  </body>
</html>



      

B.2 What You See Is What You Can Say

        
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "xhtml+voice10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20" xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>What You See Is What You Can Say</title>
<!-- first declare the voice handlers. -->
    <vxml:form id="voice_city">
      <vxml:field name="field_city">
        <vxml:grammar src="city.srgf" type="application/x-srgf"/>
        <vxml:prompt id="city_prompt">
          Please choose a city.
        </vxml:prompt>
        <vxml:catch event="help nomatch noinput">
          For example, say Chicago.
        </vxml:catch>
      </vxml:field>
    </vxml:form>
    <vxml:form id="voice_hotel">
      <vxml:field name="field_hotel">
        <vxml:grammar src="hotel.srgf" type="application/x-srgf"/>
        <vxml:prompt id="hotel_prompt">
          Select your hotel
        </vxml:prompt>
        <vxml:catch event="help nomatch noinput">
          For example, say Hilton.
        </vxml:catch>
        <vxml:filled>
          <vxml:prompt>
            You have chosen to stay at the 
            <vxml:value expr="field_hotel"/>.
          </vxml:prompt>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
<!-- done voice handlers. -->
  </head>
  <body>
    <h1>What You See Is What You Can Say</h1>
    <p>This example demonstrates a simple voice-enabled GUI
      hotel picker  that permits the user to provide input
      using traditional GUI input peripherals,
      or speak the same information.
    </p>
    <h2>Hotel Picker</h2>
    <form id="hotel_query" method="post" action="cgi/hotel.pl">
      <p>Select a hotel in a city:</p>
      <input name="city" type="text" ev:event="onfocus" ev:handler="#voice_city"/>
      <input name="hotel" type="text" ev:event="onfocus" ev:handler="#voice_hotel"/>
<!-- Declare xhtml script handlers for setting inputs -->
      <script ev:target="#voice_city" ev:event="vxml:filled">
        city = field_city;
      </script>
      <script ev:target="#voice_hotel" ev:event="vxml:filled">
        hotel = field_hotel;
      </script>
<!-- done xhtml script handlers -->
      <input type="submit" value="Submit"/>
      <input type="reset"/>
    </form>
  </body>
</html>



      

B.3 Mixed-initiative Conversational Interface

        
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "xhtml+voice10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20" xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Mixed Initiative Conversational Interface</title>
<!-- first declare the voice handlers. -->
<!-- VXML form supporting a mixed-initiative grammar -->
    <vxml:form id="voice_city_hotel">
      <vxml:grammar src="city_hotel.srgf" type="application/x-srgf"/>
<!-- Mixed initiative form begins with initial prompt -->
      <vxml:initial name="start">
        <vxml:prompt>
          Please choose a city and hotel where you wish to stay.
        </vxml:prompt>
        <vxml:help>
          Please say the name of a city and a hotel to make
          a reservation.
        </vxml:help>
<!-- If user is silent, reprompt once, then try 
               directed prompts. -->
        <vxml:noinput count="1">
          <vxml:reprompt/>
        </vxml:noinput>
        <vxml:noinput count="2">
          <vxml:reprompt/>
          <vxml:assign name="start" expr="true"/>
        </vxml:noinput>
      </vxml:initial>
      <vxml:field name="field_city">
        <vxml:grammar src="city.srgf" type="application/x-srgf"/>
        <vxml:prompt id="city_prompt">
          Please choose a city.
        </vxml:prompt>
        <vxml:catch event="help nomatch noinput">
          For example, say Chicago.
        </vxml:catch>
        <vxml:filled>
<!-- Use assign to set the xhtml input -->
          <vxml:assign name="document.city" expr="field_city"/>
        </vxml:filled>
      </vxml:field>
      <vxml:field name="field_hotel">
        <vxml:grammar src="hotel.srgf" type="application/x-srgf"/>
        <vxml:prompt id="hotel_prompt">
          Select your hotel
        </vxml:prompt>
        <vxml:catch event="help nomatch noinput">
          For example, say Hilton.
        </vxml:catch>
        <vxml:filled>
          <vxml:prompt>
            You have chosen to stay at the
            <vxml:value expr="field_hotel"/>.
          </vxml:prompt>
          <vxml:assign name="document.hotel" expr="field_hotel"/>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
<!-- done voice handlers -->
  </head>
  <body>
    <h1>Mixed-Initiative Conversational Interface</h1>
    <p>In this example, we demonstrate how the earlier example can
       be easily extended to support mixed-initiative dialog.  By 
       activating a grammar capable of recognizing both cities and
       hotel names for the entire application, the user can specify
       both hotel and city in a single utterance.  Alternatively,
       the user can fill one field at a time.
    </p>
    <h2>Hotel Picker</h2>
    <p>This voice-enabled application lets you pick a 
       city and a hotel.
    </p>
    <form id="xhtml_city_hotel" method="post" action="cgi/hotel.pl">
      <p>Select a hotel in a city:</p>
      <input name="city" type="text" ev:event="onfocus" ev:handler="#voice_city_hotel"/>
      <input name="hotel" type="text"/>
      <input type="submit" value="Submit"/>
      <input type="reset"/>
    </form>
  </body>
</html>



      

B.4 Speech-Enabled Mail Interface

This email message from the W3C voice browser working group archives has been speech-enabled to allow easy browsing of email on hand-held devices.

        
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "xhtml+voice10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20" xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Speech-enabled Email Browser</title>
    <script language="javascript">
      // define array holding command words -&gt; activate-id map.
      //
      //define function that takes a command word,
      // looks it up in the afore-mentioned map,
      // and activates the link.
      //function activate (command) {
      //...
      //
    </script>
    <script ev:target="#command-and-control" ev:event="filled">
      activate(word.value);
    </script>
    <vxml:form id="command-and-control">
<!-- your word is my command. -->
      <vxml:field name="word">
        <vxml:grammar src="mail.srgf"/>
        <vxml:catch event="help nomatch">
          This mail reader is speech-enabled. You can
          perform available actions via speech input.
        </vxml:catch>
      </vxml:field>
    </vxml:form>
  </head>
  <body ev:event="onload" ev:handler="#command-and-control"><h1>W3C Speech Interaction Framework</h1><strong>From:</strong> T. V. Raman
(<a href="mailto:tvraman@us.ibm.com?Subject=Re:%20W3C%20Speech%20Interface%20Framework"><em>tvraman@us.ibm.com</em></a>)<br/><strong>Date:</strong> Sat, Jan 01 2000
    <ul class="noindent"><li><strong>Next message:</strong><a id="__next_message" href="0093.html">
 mxd@cisco.com: &quot;Re: [dialog] &lt;record&gt;'s dest attribute&quot;
          </a></li></ul><ul><li><strong>Previous message:</strong><a id="__prev_message" href="0091.html">
 Harish Varanasi: &quot;RE: [ dialog ] &lt;record&gt;'s dest attribute&quot;
          </a></li><li><strong>Messages sorted by:</strong><a id="__sort_by_date" href="index.html#92">
              [ date ]</a><a id="__sort_by_thread" href="thread.html#92">
              [ thread ]</a><a id="__sort_by_subject" href="subject.html#92">
              [ subject ]</a><a href="author.html#92">[ author ]</a></li><li><strong>Other mail archives:</strong><a id="__more_from_this_list" href="../">
            [ this mailing list ]</a><a id="__ohter_w3c_lists" href="../../">
            [ other W3C mailing lists ]</a></li><li><strong>Mail actions:</strong><a id="__reply_to_this_message" href="mailto:w3c-voice-wg@w3.org">
          [ respond to this message ]</a><a id="__mail_new_topic" href="mailto:w3c-voice-wg@w3.org">
     [ mail a new topic ]</a></li></ul><hr noshade="noshade"/><pre>
Message body was here.
    </pre><hr noshade="noshade"/></body>
</html>



      

B.5 Reusable Voice Subdialogs

A flight query is processed with two reusable voice subdialogs. One subdialog processes both arrival and departure city or airport, the other arrival and departure dates.

        
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+Voice//EN" "xhtml+voice10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/voicexml20" xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Flight Query</title>
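    <!-- The content of a script element that has a src attribute is
         ignored, so each object is created in a separate inline script. -->
    <script src="cityorairport.es"></script>
    <script>
      var objCityOrAirport = new CityOrAirport();
    </script>
    <script src="dateinfo.es"></script>
    <script>
      var objDateInfo = new DateInfo();
    </script>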
    <vxml:form id="voice_city_from">
      <vxml:subdialog name="cityorairport" src="cityorairport.vxml">
        <vxml:param name="paramSubdialogObj" expr="objCityOrAirport"/>
        <vxml:param name="paramPromptQuestion" expr="'What city or airport are you departing from?'"/>
        <vxml:filled>
          <vxml:prompt>
            You are departing from
            <vxml:value expr="cityorairport.returnCityOrAirport"/>.
          </vxml:prompt>
          <vxml:assign name="document.from" expr="cityorairport.returnCityOrAirport"/>
        </vxml:filled>
      </vxml:subdialog>
    </vxml:form>
    <vxml:form id="voice_city_to">
      <vxml:subdialog name="cityorairport" src="cityorairport.vxml">
        <vxml:param name="paramSubdialogObj" expr="objCityOrAirport"/>
        <vxml:param name="paramPromptQuestion" expr="'At what city or airport are you arriving?'"/>
        <vxml:filled>
          <vxml:prompt>
            You are arriving at
            <vxml:value expr="cityorairport.returnCityOrAirport"/>.
          </vxml:prompt>
          <vxml:assign name="document.to" expr="cityorairport.returnCityOrAirport"/>
        </vxml:filled>
      </vxml:subdialog>
    </vxml:form>
    <vxml:form id="voice_date_from">
      <vxml:subdialog name="dateinfo" src="dateinfo.vxml">
        <vxml:param name="paramSubdialogObj" expr="objDateInfo"/>
        <vxml:param name="paramPromptQuestion" expr="'What day, month, and year are you leaving?'"/>
        <vxml:filled>
          <vxml:prompt>
            You are departing on <vxml:value expr="dateinfo.returnDateInfo"/>.
          </vxml:prompt>
          <vxml:assign name="document.fromDate" expr="dateinfo.returnDateInfo"/>
        </vxml:filled>
      </vxml:subdialog>
    </vxml:form>
    <vxml:form id="voice_date_to">
      <vxml:subdialog name="dateinfo" src="dateinfo.vxml">
        <vxml:param name="paramSubdialogObj" expr="objDateInfo"/>
        <vxml:param name="paramPromptQuestion" expr="'What day, month, and year are you arriving?'"/>
        <vxml:filled>
          <vxml:prompt>
            You are arriving on <vxml:value expr="dateinfo.returnDateInfo"/>.
          </vxml:prompt>
          <vxml:assign name="document.toDate" expr="dateinfo.returnDateInfo"/>
        </vxml:filled>
      </vxml:subdialog>
    </vxml:form>
  </head>
  <body>
    <h1>Multimodal Flight Query</h1>
    <form method="post" action="/servlet/flightServlet">
      <table border="0" summary="Leave and return airport, date, and time">
        <tr>
          <td width="15%">
            <label for="from">Leaving From:</label>
          </td>
          <td colspan="2">
            <input type="text" id="from" size="20" ev:event="onclick" ev:handler="#voice_city_from"/>
          </td>
        </tr>
        <tr>
          <td width="15%">
            <label for="to">Arriving At:</label>
          </td>
          <td colspan="2">
            <input type="text" id="to" size="20" ev:event="onclick" ev:handler="#voice_city_to"/>
          </td>
        </tr>
        <tr>
          <td width="15%">
            <label for="fromDate">Travel Date:</label>
          </td>
          <td width="35%">
            <input type="text" id="fromDate" size="20" ev:event="onclick" ev:handler="#voice_date_from"/>
          </td>
          <td width="50%">
            <div class="c1">
              <label>Time of Day:</label>
              <br/>
              <table width="100%" border="0" summary="leave am or pm">
                <tr>
                  <td align="left">
                    <input type="checkbox" id="departam" value="checkbox"/>
                    <label for="departam">am</label>
                  </td>
                  <td align="left">
                    <input type="checkbox" id="departpm" value="checkbox"/>
                    <label for="departpm">pm</label>
                  </td>
                </tr>
              </table>
            </div>
          </td>
        </tr>
        <tr>
          <td width="15%">
            <label for="toDate">Return Date:</label>
          </td>
          <td width="35%">
            <input type="text" id="toDate" size="20" ev:event="onclick" ev:handler="#voice_date_to"/>
          </td>
          <td width="50%">
            <div class="c1">
              <label>Time of Day:</label>
              <br/>
              <table width="100%" border="0" summary="return am or pm">
                <tr>
                  <td align="left">
                    <input type="checkbox" id="departam2" value="checkbox"/>
                    <label for="departam2">am</label>
                  </td>
                  <td align="left">
                    <input type="checkbox" id="departpm2" value="checkbox"/>
                    <label for="departpm2">pm</label>
                  </td>
                </tr>
              </table>
            </div>
          </td>
        </tr>
      </table>
      <br/>
      <table align="center">
        <tr>
          <td align="center" width="80%">
            <input type="submit" value="Submit"/>
          </td>
          <td>
            <input type="reset"/>
          </td>
        </tr>
      </table>
    </form>
  </body>
</html>



      

C DTD

This section defines the DTD used to formally define the XHTML+Voice integration profile. This section is normative.

C.1 xhtml+voice10.dtd

The individual modules that make up the DTD for the xhtml+voice10 profile, along with the top-level driver file, are packaged together and made available with this note; see xhtml+voice-dtd.zip.

D Schema

This section defines the formal XML Schema used to define the XHTML+Voice profile. This section is normative.

The files defining the XHTML+Voice profile are available as a zip archive (xhtml+voice-schema.zip) with this note.

E References

E.1 Normative References

XForms
XForms 1.0, Micah Dubinko, Josef Dietl, Roland Merrick, Dave Raggett, T. V. Raman, Linda Bucsay Welsh, 2001.
XHTML Basic
XHTML Basic, 19 December 2000, Mark Baker, Masayasu Ishikawa, Shinichi Matsui, Peter Stark, Ted Wugofski, Toshihiko Yamakami.
CSS2
Cascading Style Sheets, level 2 (CSS2) Specification, Bert Bos, Håkon Wium Lie, Chris Lilley, Ian Jacobs, 1998. W3C Recommendation available at: http://www.w3.org/TR/REC-CSS2.
DOM2 Events
Document Object Model (DOM) Level 2 Events Specification, Tom Pixley, 2000. W3C Recommendation available at: http://www.w3.org/TR/DOM-Level-2-Events/.
HTML 4.01
HTML 4.01 Specification, Dave Raggett, Arnaud Le Hors, Ian Jacobs, 1999. W3C Recommendation available at: http://www.w3.org/TR/html4/.
RFC 2396
RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, Tim Berners-Lee, et al., 1998. Available at: http://www.ietf.org/rfc/rfc2396.txt.
WML1.3
Wireless Application Protocol Wireless Markup Language Specification Version 1.3, Wireless Application Protocol Forum, Ltd., 2000. Available at: http://www1.wapforum.org/tech/documents/WAP-191-WML-20000219-a.pdf.
XML Events
XML Events - An events syntax for XML, Steven Pemberton, T. V. Raman and Shane P McCarron, 2001. W3C Working Draft available at: http://www.w3.org/TR/xhtml-events.
Speech Grammars
Speech Recognition Grammar Format (Members only), Andrew Hunt and Scott McGlashan, 9th May 2001. Available at: http://www.w3.org/Voice/Group/2001/grammar-spec-20010509.html.
SSML 1.0
Speech Synthesis Markup Language Specification, Mark Walker and Andrew Hunt, 8th August 2000. Available at: http://www.w3.org/TR/speech-synthesis.
SRGF
Speech Recognition Grammar Specification for the W3C Speech Interface Framework, Andrew Hunt (SpeechWorks International) and Scott McGlashan (PipeBeach). Available at: http://www.w3.org/tr/speech-grammar/.
VoiceXML 2.0
Voice Extensible Markup Language (VoiceXML), Scott McGlashan et al. Available at: http://www.w3.org/tr/voicexml20.
XHTML Modularization
Modularization of XHTML, Murray Altheim, Frank Boumphrey, Sam Dooley, Shane McCarron, Sebastian Schnitzenbaumer, Ted Wugofski. Available at: http://www.w3.org/TR/xhtml-modularization/.
XHTML 1.1
XHTML 1.1 - Module-based XHTML, Murray Altheim, Shane McCarron. Available at: http://www.w3.org/TR/xhtml11/.
XLink
XML Linking Language (XLink) Version 1.0, Steve DeRose, Eve Maler, David Orchard, 2000. W3C Proposed Recommendation available at: http://www.w3.org/TR/xlink/.
XML 1.0
Extensible Markup Language (XML) 1.0 (Second Edition), Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, 2000. W3C Recommendation available at: http://www.w3.org/TR/REC-xml.
XML Names
Namespaces in XML, Tim Bray, Dave Hollander, Andrew Layman, 1999. W3C Recommendation available at: http://www.w3.org/TR/REC-xml-names.
XPath 1.0
XML Path Language (XPath) Version 1.0, James Clark, Steve DeRose, 1999. W3C Recommendation available at: http://www.w3.org/TR/xpath.
XSchema-1
XML Schema Part 1: Structures, Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-1/.
XSchema-2
XML Schema Part 2: Datatypes, Paul V. Biron, Ashok Malhotra, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-2/.
XHTML 1.0
XHTML 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4 in XML 1.0, Steven Pemberton, et al., 2000. W3C Recommendation available at: http://www.w3.org/TR/xhtml1.

E.2 Informative References

ECMA 262
ECMA-262: ECMAScript Language Specification, European Computer Manufacturers' Association (ECMA), 1999. Available at: ftp://ftp.ecma.ch/ecma-st/Ecma-262.pdf.
RFC 2141
RFC 2141: URN Syntax, R. Moats, 1997. Available at: http://www.ietf.org/rfc/rfc2141.txt.
XSchema-0
XML Schema Part 0: Primer, David C. Fallside, 2001. W3C Recommendation available at: http://www.w3.org/TR/xmlschema-0/.
XSLT
XSL Transformations (XSLT) Version 1.0, James Clark, 1999. W3C Recommendation available at: http://www.w3.org/TR/xslt.