simultaneous modalities

Travel reservations over phone/PDA/WAP

III.2 Who you gonna call?

The WG was mostly a spin-off of the Voice Browser WG, after the realisation that voice interaction with computers isn't just for call centres: it can now happen on the device itself. And then why limit ourselves to speech? How about ink, gestures, and many other input modes, concurrent or sequential? How about dynamic modality switching depending on environmental conditions?

III.3 How do they do that?

The Framework

What to standardise?

First, the browser
Second, the Web

Q: But why standardise things that are happening inside the browser?

A1: browsers are plurilithic

A2: you can't detach interaction from application

Deeper into the framework

the boxes don't necessarily map to devices
Reuse of existing markup: XHTML, CSS, SVG for output, SRGS for input

Deeper still: output

Deeper still: input

Work items

Now that we've constructed a framework, what new pieces do we add?

"connecting" things together: the MID
Input modalities: reuse standards for speech/text grammars (SRGS); need grammars for tablets: InkML
Output modalities: reuse SVG, HTML, CSS, SSML, etc.
Transmitting and merging input data: EMMA, Compositing
Interaction Manager
System and Environment (now DPF)
Sessions
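
To make the "transmitting and merging input data" item a little more concrete, here is a rough sketch of what compositing inputs from two modalities could look like. It is only an illustration under stated assumptions: the Interpretation class, the composite function, the slot names and the one-second window are all hypothetical, and nothing here reflects the actual EMMA data model or any WG design.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """Hypothetical record for one recognised input (not the EMMA format)."""
    mode: str          # e.g. "speech" or "ink"
    slots: dict        # partial meaning extracted by that modality
    confidence: float  # recogniser confidence in [0, 1]
    start_ms: int
    end_ms: int


def composite(inputs, max_gap_ms=1000):
    """Group inputs that start within max_gap_ms of the previous group's end,
    and merge their slots, preferring the more confident source per slot."""
    merged = []
    for item in sorted(inputs, key=lambda i: i.start_ms):
        if merged and item.start_ms - merged[-1]["end_ms"] <= max_gap_ms:
            group = merged[-1]
            for key, value in item.slots.items():
                # On a conflict, keep the value from the more confident input.
                if key not in group["slots"] or item.confidence > group["conf"][key]:
                    group["slots"][key] = value
                    group["conf"][key] = item.confidence
            group["modes"].append(item.mode)
            group["end_ms"] = max(group["end_ms"], item.end_ms)
        else:
            # Too far apart in time: start a new, separate interaction.
            merged.append({
                "slots": dict(item.slots),
                "conf": {k: item.confidence for k in item.slots},
                "modes": [item.mode],
                "end_ms": item.end_ms,
            })
    return merged


# Travel reservation: the user says "book a flight" while circling a city
# with the pen, then says something unrelated a few seconds later.
result = composite([
    Interpretation("speech", {"action": "book_flight"}, 0.9, 0, 1200),
    Interpretation("ink", {"destination": "Paris"}, 0.8, 900, 1500),
    Interpretation("speech", {"action": "cancel"}, 0.7, 5000, 5600),
])
```

In this sketch the overlapping speech and ink inputs end up in one merged interpretation, while the later utterance stays separate. A fixed time window is just the simplest possible policy; deciding what belongs together is exactly the kind of question the compositing work item has to answer.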

III.1 Linking things together: the MID

Multimodal Interaction on The Web