VoiceXML + GUI Convergence and Architecture Issues

Bandwidth/Bearer Issues

Remote speech processing expensive on bandwidth
Local speech processing expensive on bandwidth (grammar downloads) and on CPU/memory resources
Simultaneous speech + data generally not supported (currently)
How to deal with being out of coverage?
- Are users going to accept degraded recognition?

Input Model Issues

Dialog progression model very different
- Directed vs. undirected input
How to keep models in sync?
Separate data model from presentation models
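
Illustrative sketch only: one way to read "separate data model from presentation models" is a single shared model that notifies per-modality views. The class names (DataModel, GuiView, VoiceView) and the rendering strings are hypothetical, not taken from VoiceXML or any browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataModel:
    """Shared form data, independent of any modality (hypothetical)."""
    values: Dict[str, str] = field(default_factory=dict)
    listeners: List[Callable[[str, str], None]] = field(default_factory=list)

    def set(self, name: str, value: str) -> None:
        # Store the value once, then let every presentation react to it.
        self.values[name] = value
        for listener in self.listeners:
            listener(name, value)


class GuiView:
    """Hypothetical visual presentation: one text line per field."""

    def __init__(self, model: DataModel) -> None:
        self.rendered: Dict[str, str] = {}
        model.listeners.append(self.on_change)

    def on_change(self, name: str, value: str) -> None:
        self.rendered[name] = f"{name}: [{value}]"


class VoiceView:
    """Hypothetical voice presentation: queues a confirmation prompt."""

    def __init__(self, model: DataModel) -> None:
        self.prompts: List[str] = []
        model.listeners.append(self.on_change)

    def on_change(self, name: str, value: str) -> None:
        self.prompts.append(f"You said {value} for {name}.")


model = DataModel()
gui = GuiView(model)
voice = VoiceView(model)
model.set("city", "Paris")  # one update, two presentations
```

Neither view holds its own copy of the field value, so there is only one place where the models could drift apart.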

Other requirements

Branding of sound of speech
Adaptation to device capabilities
- Text on device + speech in network
- All on device
Distributed speech recognition
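
Illustrative sketch only: the adaptation options above could be chosen from a device capability record. The field names, the decision order and the returned labels are assumptions made for this example; the "speech-in-network-only" fallback is not in the list above and is added only so the function always returns something.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical capability record reported by the device."""
    has_display: bool
    can_run_recognizer: bool    # enough CPU/memory for full on-device speech
    can_extract_features: bool  # can run only the recognizer front end


def choose_configuration(caps: DeviceCapabilities) -> str:
    """Pick one of the deployment options (assumed priority order)."""
    if caps.can_run_recognizer:
        return "all-on-device"
    if caps.can_extract_features:
        # Front end on the device, recognition in the network.
        return "distributed-speech-recognition"
    if caps.has_display:
        return "text-on-device+speech-in-network"
    return "speech-in-network-only"  # assumed fallback, not in the list
```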

Inter-modal synchronization

Changes generated by one mode are echoed to other modes
- Needs messaging via push over network
- May need messaging via internal browser events
- Need an event target URI in the data model
Define a data model update message mechanism
- Changes to model data directly generate messages to other modalities
Need to define sync binding semantics (early: at origin; late: in browsers)
Latency of separated browsers – affects user experience
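
Illustrative sketch only: a possible shape for the data model update message. The message fields, the "model:/..." target URI scheme, the JSON wire format and the SyncHub class are all assumptions for this example; no such format is defined above. A change from one modality updates the model and is echoed to every other registered modality, but not back to its origin.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class UpdateMessage:
    target: str  # event target URI in the data model (hypothetical scheme)
    value: str   # new value for that target
    origin: str  # modality that generated the change


class SyncHub:
    """Hypothetical hub that owns the model and echoes changes."""

    def __init__(self) -> None:
        self.model: Dict[str, str] = {}
        self.transports: Dict[str, Callable[[str], None]] = {}

    def register(self, modality: str, send: Callable[[str], None]) -> None:
        # 'send' stands in for a network push or an internal browser event.
        self.transports[modality] = send

    def update(self, msg: UpdateMessage) -> None:
        self.model[msg.target] = msg.value
        wire = json.dumps(asdict(msg), sort_keys=True)
        for modality, send in self.transports.items():
            if modality != msg.origin:  # do not echo back to the sender
                send(wire)


voice_inbox: list = []
gui_inbox: list = []
hub = SyncHub()
hub.register("voice", voice_inbox.append)
hub.register("gui", gui_inbox.append)
hub.update(UpdateMessage("model:/order/city", "Paris", origin="voice"))
```

With separated browsers, each send call is a network round trip, which is where the latency concern above would show up.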

Single/Dual browser solutions

Dual
- Voice browser in network, Text browser in Device
- Needs a push sync mechanism over networks
Single
- Gives tight synchronization
- Could offload speech processing
  - Grammar should be fetched by speech processor and not the client device

Security

How to trust network based voice browser and speech processor?
How to secure sync messages?
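
Illustrative sketch only: one possible partial answer to "how to secure sync messages" is a keyed message authentication code (HMAC-SHA256) over each message, assuming both browsers already share a secret key. The envelope layout and function names are hypothetical. This covers integrity and origin authentication only, not confidentiality, replay protection or key distribution.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_message(payload: dict, key: bytes) -> dict:
    """Wrap a sync message in an envelope carrying an HMAC-SHA256 tag."""
    # Canonical serialization so both sides compute the tag over the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "mac": tag}


def verify_message(envelope: dict, key: bytes) -> Optional[dict]:
    """Return the payload if the tag matches, otherwise None."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    if not hmac.compare_digest(expected, envelope["mac"]):
        return None
    return json.loads(envelope["body"])
```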

Dialog description language

New language based upon VoiceXML with sync?
Keep VoiceXML and XHTML/WML etc.
- Add a sync language?
Event mechanisms already compatible between XHTML and VoiceXML