I am Kazuyuki Ashimura, the W3C Multimodal Interaction Activity Lead. I am interested in multimodal interaction, especially in using a realtime OS to handle utterance timing and speech rate in dialog-based computer interfaces. Human spoken dialog has a specific timing and rhythm, and a dialog-based computer interface would be better if it could reproduce that timing and rhythm. However, ordinary OSs for PCs and smartphones cannot handle precise timing because of multi-task prioritization and interruption, so a realtime OS should be used on user terminals, and the whole framework should also handle timing and rhythm. Perhaps we can use EMMA [1] to keep all the data in the system synchronized.

Please see also the attached figure: speech-juke-box.pdf

[1] EMMA: Extensible MultiModal Annotation markup language, Version 1.1, http://www.w3.org/TR/emma11/
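
As a rough illustration of the synchronization idea, the minimal Python sketch below wraps one recognized utterance in an EMMA interpretation whose emma:start and emma:end attributes carry absolute timestamps in milliseconds, so every component in the framework can align events on a common clock. Only the EMMA element and attribute names come from the EMMA specification; the function name, structure, and values are illustrative assumptions, not a definitive design.

# Minimal sketch: emit an EMMA document holding one timed interpretation.
# emma:start / emma:end are absolute timestamps in milliseconds (per the
# EMMA spec); everything else here is illustrative.
import time
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

def emma_interpretation(text, start_ms, end_ms, interp_id="int1"):
    """Build an EMMA document with one interpretation carrying timing data."""
    root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.1"})
    interp = ET.SubElement(root, f"{{{EMMA_NS}}}interpretation", {
        "id": interp_id,
        f"{{{EMMA_NS}}}start": str(start_ms),   # utterance start time
        f"{{{EMMA_NS}}}end": str(end_ms),       # utterance end time
        f"{{{EMMA_NS}}}medium": "acoustic",
        f"{{{EMMA_NS}}}mode": "voice",
    })
    interp.text = text
    return ET.tostring(root, encoding="unicode")

# Example: an utterance that ended just now and lasted 1.2 seconds.
now_ms = int(time.time() * 1000)
print(emma_interpretation("hello", now_ms - 1200, now_ms))

With such annotations attached to every input and output event, the dialog manager could, in principle, compare timestamps across modalities and schedule system responses to match the rhythm of the conversation.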