Distributed Multimodality in the Multimodal Architecture
Although mobile device capabilities are increasing rapidly, many applications require the powerful capabilities of servers. For example, large vocabulary speech recognition, natural language processing, and handwriting recognition require significant processing resources. Furthermore, updating grammars, language models, and vocabularies is much easier to do on a server than on millions of devices. For these reasons, many multimodal applications make use of distributed architectures, with different modalities being processed in the cloud. In fact, a paradigm appears to be emerging in which the device is used primarily for capturing media such as audio, images, and ink, while the cloud is used for computationally intensive operations such as speech recognition, natural language processing, handwriting recognition, and more advanced capabilities such as translation and biometric processing.

The Multimodal Interaction Working Group of the World Wide Web Consortium is developing a Multimodal Architecture for supporting distributed, interoperable multimodal applications. The architecture is based on a set of Multimodal Life Cycle Events for communication between components over the Web, and on EMMA (Extensible MultiModal Annotation) for representing user input. This presentation will describe the architecture and discuss how it is particularly suited for developing distributed applications, with examples of specific applications.
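As a concrete illustration, the sketch below shows roughly what an EMMA document carrying a speech recognition result might look like as it is returned from a server-side recognizer to a device. The utterance, confidence value, and application-specific payload (`origin`, `destination`) are invented for illustration; only the `emma:` elements and attributes come from the EMMA specification.

```xml
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- One interpretation of a spoken utterance, annotated with the
       input medium and mode and the recognizer's confidence score. -->
  <emma:interpretation id="interp1"
                       emma:medium="acoustic"
                       emma:mode="voice"
                       emma:confidence="0.85"
                       emma:tokens="flights from boston to denver">
    <!-- Application-specific semantics (hypothetical) -->
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
```

Because the annotations travel with the result, a device can receive interpretations from several remote modality components (speech, handwriting, and so on) and combine them through one common representation of user input.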