Multimodal Interaction Activity
This talk explains what multimodal interaction is, how the W3C is
standardising it, and what has happened since last year. Quite a lot
has happened, since the work had only just started last year.
- Intro: what is it?
- The W3C MMI Framework
- Ongoing Work
- Multimedia for output
- ??? for input
- Multimodal for output and input
- PDAs, mobile phones, car nav systems
- Web access is expected to be the main application, so it makes sense for the W3C to standardise it.
Described in the Use Cases document.
- alternate modalities: driving directions
- simultaneous modalities: travel reservation
a very complex problem...
... that's why there is quite a big working group...
Intro::The MMI Working Group
- 43 Companies
- 79 Participants
- 4 Subgroups
- 7 documents and counting: Requirements, Use Cases, Framework, Ink, EMMA, etc.
- Many dependencies on other groups
The MMI Framework
- Framework doesn't necessarily map to hardware or devices
- Reuse of existing markup: XHTML, CSS, SVG for output, XForms for input
Last year, the WG had just started and concentrated on the framework.
The WG is now working on more specific areas:
- Object Model
- I/O components
- New markup: EMMA, InkXML
- Interaction manager
Ongoing Work::The Object Model
Going down one level, it is necessary to specify
interfaces between components. The whole framework can be seen as a
distributed DOM, or it can be seen as components passing messages
between each other, in the form of markup.
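The message-passing view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Component` class, the `send` helper, and the `message` element with its `from`, `to`, and `event` attributes are all hypothetical names, not anything defined by the MMI framework.

```python
import xml.etree.ElementTree as ET


class Component:
    """A framework component that receives markup messages (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.inbox = []  # parsed messages received so far

    def receive(self, markup):
        # Messages travel as serialized markup and are parsed on arrival.
        self.inbox.append(ET.fromstring(markup))


def send(sender, receiver, event, payload):
    """Serialize an event as markup and deliver it to the receiver."""
    msg = ET.Element(
        "message",  # hypothetical element name
        {"from": sender.name, "to": receiver.name, "event": event},
    )
    msg.text = payload
    markup = ET.tostring(msg, encoding="unicode")
    receiver.receive(markup)
    return markup


voice = Component("voice-input")
manager = Component("interaction-manager")
wire = send(voice, manager, "recognized", "Boston")
print(wire)
```

Because only serialized markup crosses the boundary, the two components could equally live in different processes or on different devices.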
Ongoing Work::The Object Model (cont'd)
Or message passing infrastructure
Ongoing Work::I/O Components
VIO - Requirements
- Defines an interface for voice input and output
- Input: speech recognition, DTMF
- Output: speech synthesis
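As a rough picture of what such an interface might cover, here is a minimal Python sketch. The class and method names (`VoiceIO`, `recognize_speech`, `read_dtmf`, `synthesize`) are assumptions made for illustration; the VIO requirements do not define them. The fake implementation does no real audio processing.

```python
from abc import ABC, abstractmethod


class VoiceIO(ABC):
    """Hypothetical voice component: two input paths and one output path."""

    @abstractmethod
    def recognize_speech(self, audio: bytes) -> str:
        """Input: turn spoken audio into text."""

    @abstractmethod
    def read_dtmf(self, tones: str) -> str:
        """Input: turn keypad tones into a string of digits and symbols."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Output: turn text into audio."""


class FakeVoiceIO(VoiceIO):
    """Stand-in for testing: treats audio as UTF-8 text."""

    def recognize_speech(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def read_dtmf(self, tones: str) -> str:
        # Keep only valid DTMF symbols, dropping separators and noise.
        return "".join(c for c in tones if c in "0123456789*#ABCD")

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


vio = FakeVoiceIO()
print(vio.read_dtmf("1-2-3 #"))
```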
Ink object, GUI object, etc.
- Output: reuse existing markup
- Input: new markup needed...
Markup for pen-based input devices
<channel name="X" type="decimal"/>
<channel name="Y" type="decimal"/>
<channel name="S1" type="boolean" default="F"/>
<channel name="S2" type="boolean" default="F"/>
<trace id="4525BCD">
1125 18432'23'43"7"-8 3-5+7 -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;
</trace>
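To show how a consumer might read the channel declarations above, here is a small Python sketch. The enclosing `traceFormat` element and the `read_channels` helper are assumptions added so that the fragment parses on its own; the sketch does not attempt to decode the trace data.

```python
import xml.etree.ElementTree as ET

# Channel declarations from the slide, wrapped in an assumed root element.
FRAGMENT = """
<traceFormat>
  <channel name="X" type="decimal"/>
  <channel name="Y" type="decimal"/>
  <channel name="S1" type="boolean" default="F"/>
  <channel name="S2" type="boolean" default="F"/>
</traceFormat>
"""


def read_channels(xml_text):
    """Return (name, type, default) for each declared channel, in order."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("name"), c.get("type"), c.get("default"))
        for c in root.iter("channel")
    ]


channels = read_channels(FRAGMENT)
for name, kind, default in channels:
    print(name, kind, default)
```

Channels without a `default` attribute, such as X and Y, come back with `None` in the third position.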
Markup for input annotations
- Input device produces one or more results of a template (XForms)
- These results are annotated by intermediaries (device info,
<!-- time stamps for date in first interpretation -->
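The annotation idea can be sketched as follows: an input result is wrapped once, and each intermediary then adds its own annotation to the wrapper. Everything named here is hypothetical, including the `interpretation` and `destination` elements, the `wrap` and `annotate` helpers, and the attribute names and values; none of it is actual EMMA syntax.

```python
import xml.etree.ElementTree as ET


def wrap(result):
    """Put an input result inside a wrapper that can carry annotations."""
    interp = ET.Element("interpretation")  # hypothetical element name
    interp.append(result)
    return interp


def annotate(interp, key, value):
    """An intermediary adds one annotation without touching the result."""
    interp.set(key, value)
    return interp


# Illustrative result, as an input device might produce it.
result = ET.Element("destination")
result.text = "Boston"

interp = wrap(result)
annotate(interp, "device", "pen")        # added by one intermediary
annotate(interp, "confidence", "0.8")    # added by another
print(ET.tostring(interp, encoding="unicode"))
```

The result itself stays unchanged throughout, so a later consumer can read the annotations or ignore them.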
- The MMI framework is well advanced.
- Parts are beginning to be defined formally in specs.
- Many "holes" left: dynamic configuration, multi-user, multi-device.
- Dependencies: HTML/SVG/XForms/CSS, DI, RDF