I am Kazuyuki Ashimura, the W3C Multimodal Interaction Activity Lead. I am interested in multimodal interaction, especially in using a realtime OS to handle utterance timing and speech rate in dialog-based computer interfaces. Human spoken dialog has a specific timing and rhythm, and a dialog-based computer interface would be better if it could reproduce that timing and rhythm. However, ordinary OSs for PCs and smartphones cannot handle precise timing because of multi-task prioritization and interruption, so a realtime OS should be used on user terminals, and the whole framework should also handle timing and rhythm. Perhaps we can use EMMA [1] to keep all the data in the system synchronized.

Please see also the attached figure: speech-juke-box.pdf

[1] EMMA: Extensible MultiModal Annotation markup language, Version 1.1, http://www.w3.org/TR/emma11/
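
As a rough illustration of the synchronization idea, the minimal Python sketch below wraps one recognized utterance in an EMMA interpretation whose emma:start and emma:end attributes carry absolute timestamps in milliseconds, so every component in the framework can align events on a common clock. Only the EMMA element and attribute names come from the EMMA specification; the function name, structure, and values are illustrative assumptions, not a definitive design.

# Minimal sketch: emit an EMMA document holding one timed interpretation.
# emma:start / emma:end are absolute timestamps in milliseconds (per the
# EMMA spec); everything else here is illustrative.
import time
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

def emma_interpretation(text, start_ms, end_ms, interp_id="int1"):
    """Build an EMMA document with one interpretation carrying timing data."""
    root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.1"})
    interp = ET.SubElement(root, f"{{{EMMA_NS}}}interpretation", {
        "id": interp_id,
        f"{{{EMMA_NS}}}start": str(start_ms),   # utterance start time
        f"{{{EMMA_NS}}}end": str(end_ms),       # utterance end time
        f"{{{EMMA_NS}}}medium": "acoustic",
        f"{{{EMMA_NS}}}mode": "voice",
    })
    interp.text = text
    return ET.tostring(root, encoding="unicode")

# Example: an utterance that ended just now and lasted 1.2 seconds.
now_ms = int(time.time() * 1000)
print(emma_interpretation("hello", now_ms - 1200, now_ms))

With such annotations attached to every input and output event, the dialog manager could, in principle, compare timestamps across modalities and schedule system responses to match the rhythm of the conversation.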