Remote speech processing is expensive in bandwidth
Local speech processing is expensive in bandwidth (grammar downloads) and in CPU/memory resources
Simultaneous speech + data connections are generally not supported
Currently
How to deal with out-of-coverage conditions
Are users going to accept degraded recognition?
Dialog progression model very different
Directed vs. undirected input
How to keep models in sync
Separate data model from presentation models
Branding of the sound of speech
Adaptation to device capabilities
Text on device + speech in network
All on device
Distributed speech recognition
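The adaptation choice among the three architectures above could be driven by a simple device-capability policy. A hypothetical sketch, where the threshold values and profile fields are illustrative assumptions only:

```python
# Hypothetical policy for adapting the speech architecture to device
# capabilities; thresholds and profile fields are illustrative assumptions.
def choose_architecture(cpu_mips: int, bandwidth_kbps: float,
                        has_local_asr: bool) -> str:
    if has_local_asr and cpu_mips >= 200:
        # Device is powerful enough to run recognition locally
        return "all-on-device"
    if bandwidth_kbps >= 64:
        # Enough bandwidth to ship audio to a network recogniser
        return "text-on-device-speech-in-network"
    # Fall back to DSR: extract features on device, recognise in the network
    return "distributed-speech-recognition"
```

A weak device on a slow link would thus land on distributed speech recognition, which sends only compact acoustic features over the air.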
Changes generated by one mode are echoed to other modes
Needs messaging via push over network
May need messaging via internal browser events
Need an event target URI in the data model
Define a data model update message mechanism
Changes to model data directly generate messages to other modalities
Need to define sync binding semantics (early, at the origin server; late, in the browsers)
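The update mechanism described above can be sketched as a small message hub. This is an assumption about what such a mechanism might look like: the JSON message schema, the `SyncHub` class, and the `model#/...` target-URI style are all hypothetical, not a defined standard:

```python
import json

# Hypothetical data-model update message: an event target URI names the
# changed field, and the source modality is recorded so it is not echoed back.
def make_update_message(target_uri: str, new_value, source_mode: str) -> str:
    """Build a sync message announcing that one modality changed the model."""
    return json.dumps({
        "type": "model-update",
        "target": target_uri,      # e.g. "model#/order/quantity" (illustrative)
        "value": new_value,
        "source": source_mode,     # "voice" or "visual"
    })

class SyncHub:
    """Echo model changes made in one modality to all the other modalities."""
    def __init__(self):
        self.listeners = {}        # mode name -> callback

    def register(self, mode: str, callback) -> None:
        self.listeners[mode] = callback

    def publish(self, message: str) -> None:
        msg = json.loads(message)
        for mode, callback in self.listeners.items():
            if mode != msg["source"]:   # don't echo back to the originator
                callback(msg)
```

In a dual-browser deployment the `publish` step would run over a network push channel; in a single-browser deployment it could be an internal browser event.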
Latency of separated browsers – affects user experience
Dual browser
Voice browser in the network, text browser on the device
Needs a push sync mechanism over the network
Single browser
Gives tight synchronisation
Could offload speech processing
The grammar should be fetched by the speech processor, not by the client device
How to trust a network-based voice browser and speech processor?
How to secure sync messages?
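One conventional way to secure sync messages is to authenticate each one with an HMAC. A minimal sketch, assuming the device and the network voice browser share a secret established out of band (key distribution itself is out of scope here):

```python
import hmac
import hashlib

# Sketch: authenticate each sync message with an HMAC over its bytes,
# assuming a pre-shared secret between device and network voice browser.
def sign(secret: bytes, message: bytes) -> bytes:
    """Compute an authentication tag for a sync message."""
    return hmac.new(secret, message, hashlib.sha256).digest()

def verify(secret: bytes, message: bytes, tag: bytes) -> bool:
    """Check a received message against its tag (constant-time compare)."""
    return hmac.compare_digest(sign(secret, message), tag)
```

This protects integrity and origin authenticity of sync messages; confidentiality would additionally require transport encryption.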
Dialog description language
A new language based upon VoiceXML with sync?
Keep VoiceXML and xHTML/WML etc
Add a sync language?
Event mechanisms are already compatible between xHTML and VoiceXML
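If the two event mechanisms are compatible, a sync layer could reduce to a name-translation bridge between the browsers. A hypothetical sketch: the event-name mapping below is an illustrative assumption, not a defined standard binding:

```python
# Hypothetical bridge forwarding xHTML DOM events to a VoiceXML interpreter.
# The name mapping is an assumption for illustration only.
XHTML_TO_VOICEXML = {
    "change": "filled",    # visual field edited -> voice-side field filled
    "submit": "submit",    # form submission maps directly
}

def forward_event(name: str, send_to_voice_browser) -> bool:
    """Translate an xHTML event name and forward it to the voice browser.

    Returns False when the event has no voice-side counterpart."""
    mapped = XHTML_TO_VOICEXML.get(name)
    if mapped is None:
        return False
    send_to_voice_browser(mapped)
    return True
```

Under this view the "sync language" of the previous point would mostly declare such mappings, rather than define a new event model.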