W3C Multimodal Interaction Working Group

Related Specification Work

The W3C HTML-Speech Incubator Group prepared a report describing a JavaScript API to speech recognition services.
The IETF Speech Services Control (SpeechSC) working group has developed the Media Resource Control Protocol (MRCP) to support distributed speech recognition, speech synthesis and speaker verification services. The protocol takes advantage of W3C's work on the Speech Recognition Grammar Specification (SRGS), the Speech Synthesis Markup Language (SSML), Semantic Interpretation (SI) and the Extensible MultiModal Annotation markup language (EMMA).
ETSI's STQ Aurora project is looking at codecs optimized for distributed speech recognition.
ETSI standard ES 202076 defines a generic spoken command vocabulary for controlling common operations such as calling someone by saying their name, browsing through a voice mail box, adjusting the volume, muting the microphone and changing other device properties. ETSI provides bindings for the vocabulary to a variety of human languages. This suggests the possibility of device-based recognition for common spoken commands together with network-based recognition for other vocabularies.
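
The split between device-based and network-based recognition suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the phrases and command identifiers in LOCAL_COMMANDS are hypothetical and are not taken from the ETSI vocabulary, text strings stand in for audio input, and network_recognizer is a placeholder for whatever remote service (for example, one reached via MRCP) a real system would use.

```python
from typing import Callable, Optional, Tuple

# Hypothetical on-device vocabulary. These phrases and IDs are illustrative
# only and are NOT taken from the ETSI command list.
LOCAL_COMMANDS = {
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
    "mute microphone": "MIC_MUTE",
}


def recognize_locally(utterance: str) -> Optional[str]:
    """Return a command ID if the utterance is in the small local vocabulary."""
    # Lowercase and collapse whitespace so trivial variations still match.
    normalized = " ".join(utterance.lower().split())
    return LOCAL_COMMANDS.get(normalized)


def recognize(
    utterance: str, network_recognizer: Callable[[str], str]
) -> Tuple[str, str]:
    """Try the device vocabulary first; defer everything else to the network."""
    command = recognize_locally(utterance)
    if command is not None:
        return ("device", command)
    # Assumption: the caller supplies a function wrapping a remote recognizer.
    return ("network", network_recognizer(utterance))
```

In such a design, common commands would be handled without a network round trip, while open-ended requests would be passed to the remote recognizer with its larger vocabularies.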

For more details on other organizations, see the Multimodal Interaction Charter.