From mf@w3.org Thu Aug 12 12:26:10 2004
Date: Tue, 20 Jul 2004 14:10:56 +0100
From: Max Froumentin
To: dsr@w3.org
Subject: minutes mmi workshop tuesday morning

1. Extraction of 3D Scene Structure from a Video for the Generation of 3D Visual and Haptic Representations - K. Moustakas - Informatics and Telematics Institute

Debbie: do you have a use for 2D scene extraction, from just a picture? It seems that you might be able to get the same information.
Konstantinos: no, you need motion to reconstruct the scene. You need to know the objects. There's another company doing it, but their method requires manual intervention.
Konstantinos: there are other applications for education, geometry courses. Also cultural heritage, manipulating ancient objects.
Fabio: do the results obtained cover the costs of computing the model, etc.?
Konstantinos: there is also sound-only feedback. The application can talk to you: "you're near a pharmacy". It might be interesting to have audio only, but it's not as nice.
Bert: you showed a connection by satellite. That introduces delays. How do you deal with them?
Konstantinos: I wasn't involved in this, but basically they minimise information transmission.
Franck: did you run any evaluation tests to know if haptic is more useful than non-haptic, mouse, etc.?
Konstantinos: no comparison.

---------------

2. Adaptive Multimodal Dialogue Management Based on the Information State Update Approach - Kallirroi Georgila, University of Edinburgh, School of Informatics (Invited Talk)

Debbie: are the update rules application dependent?
Kallirroi: sometimes you can't avoid application-specific rules. But in our system the rules make it easier to move to other domains, because we use abstract rules and Prolog predicates.
Philipp: can you compare with state-based approaches, and things like VoiceXML?
Kallirroi: state-based or reinforcement learning?
Philipp: both. How is your approach better?
Kallirroi: in a state-based approach, the designer has to think about all possible states. That's not the case here: you have the rules, even if you don't know whether they're going to be triggered.
Philipp: easier this way?
Kallirroi: it depends on the application. For example, tutoring systems, where you could have a huge number of tests and actions. The idea was to combine dialogue state, as in VoiceXML, with plan-based approaches from AI.
Debbie: responding to Philipp: this approach does have a lot in common with VoiceXML. VoiceXML is not entirely state-based: the Form Interpretation Algorithm does a lot of implicit processing, and the form itself can support all kinds of dialogues. What Kallirroi does has more information on how the dialogue is progressing.
Michael: in VoiceXML there's the fixed FIA. Here you can see the rules and change them. So you have more power, but more ability to create problems.
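
To make the comparison above concrete, here is a minimal sketch of the information-state-update idea, rendered in Python with invented names (the system described in the talk uses abstract rules written as Prolog predicates): each rule has a precondition over the information state and an effect that updates it, so the designer never enumerates an explicit state graph as in a purely state-based design.

    # Minimal sketch of an information-state-update rule engine.
    # Hypothetical names; the real system uses Prolog predicates.
    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class InformationState:
        filled: dict = field(default_factory=dict)   # slots filled so far
        agenda: list = field(default_factory=list)   # pending system moves

    @dataclass
    class UpdateRule:
        name: str
        precondition: Callable  # InformationState -> bool
        effect: Callable        # InformationState -> None

    # Abstract, domain-independent rule: whenever a slot is empty,
    # push a request for it onto the agenda.
    def make_request_rule(slot):
        return UpdateRule(
            name="request_" + slot,
            precondition=lambda s: slot not in s.filled
                                   and ("request", slot) not in s.agenda,
            effect=lambda s: s.agenda.append(("request", slot)),
        )

    def update(state, rules):
        # Apply every rule whose precondition holds; no explicit state graph.
        for rule in rules:
            if rule.precondition(state):
                rule.effect(state)
        return state

    rules = [make_request_rule("origin"), make_request_rule("destination")]
    state = update(InformationState(filled={"origin": "Edinburgh"}), rules)
    print(state.agenda)   # [('request', 'destination')]

Running the example leaves a pending request for the unfilled slot on the agenda, which is the kind of implicit behaviour Debbie compares to VoiceXML's Form Interpretation Algorithm.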
"if is it this kind of terminal, then do..." Stephane: we do 3 kinds of adaptation. User/teminal, interaction to the task of the user. If the user in an expert, we will ask for his login, if he's a novice, we will show some help. Also we adapt to context, like noisy environment. Philipp: is your system a prototype? or in use? Stephane: real application. But app with HIC is a prototype. Stepahen Sire: on the last slide you show bloototh connection. What does it connext to? Stephane: we use bluetooth with mobile phones to experiment on phone display. Eventually we want to use mobile radio with Tetra communuication for policemen, firemen. For now we have a bluetooth prototype where you can use mobile phone, but near the PC. Maria: tried using GPRS?, 3G, why wi-fi? Stephane: security, not given by GPRS: We develop for defence area, no don't want external operator. But otherwise, it could be done as long as data transport is provided. Maria: what kind of data transfer requirements? Stephane: stream of complex data. For now only usability study. We have to secure communications process. ?: if you define for firemen, do you foresee to have some kind of ASR in the device? Stephane: we are working on ASR for the Rafale aircraft. We can use outsource existing systems on PDAs, would then depend on user device features. -------------- Multimodality and Multi-device Interfaces - Fabio Paterno, ISTI-CNR Michael: where do prompts come from? Fabio: comes from the description. The designer designs the structure and has to provide low-level info as well, like prompts. Michael: but having natural language generation is good. For example, if you know in your task model that you're making a request, you could generate the prompts. Cedric: do you perform adaptation based on single characteristics: screen size? Fabio: we have this kind of finer detail, but we generate xhtml mobile, so we try to avoid one transformation for each specific device. ?: different degrees of stability. you may need different markup languages for each component. Fabio: we add one simple transformation. Transformation is in the tool, hidden from high level, designer Filip do you also support multimodality on more than one device? Fabio: not yet, we are really to go on this direction, need to work on coordination. ------- Talking Heads for the Web: What for? - Massimo Zancanaro, ITC-IRST Mary Ellen: I'm working on COMIC. with facial expression recognition. Very similar research and results. We're looking at combining expression from different actors. Massimo: if you want to have an effective range of emotions, we prove: don't go for actors. A nice designer does a more effective job than statistical base. Not all apps require recognition of emotion. We are also thinking of project with autistic person. Mary Ellen: how do you do the lipsync with Festival? Massimo: we use another tool, which does statistical model of phonemes. Dave: are you considering looking at video information for training, put people in real situations? Massimo: lot of debate in this project about how to train effectively whether to use actors or real people. Difficult to model real emotions with actors. actors are trained. We tried to annotate from TV shows. --------------------- Interacting with the Ambience: Multimodal Interaction and Ambient Intelligence - Kai Richter - Zentrum für Graphische Datenverarbeitung Massimo: for the subscribe mechanism, there is an issue with lots of subscriptions, requires plenty of time. 

---------------

4. Multimodality and Multi-device Interfaces - Fabio Paterno, ISTI-CNR

Michael: where do the prompts come from?
Fabio: they come from the description. The designer designs the structure and has to provide low-level information as well, like prompts.
Michael: but having natural language generation would be good. For example, if you know in your task model that you're making a request, you could generate the prompts.
Cedric: do you perform adaptation based on single characteristics, such as screen size?
Fabio: we have this kind of finer detail, but we generate XHTML Mobile, so we try to avoid one transformation for each specific device.
?: different degrees of stability. You may need different markup languages for each component.
Fabio: we add one simple transformation. The transformation is in the tool, hidden from the high-level designer.
Filip: do you also support multimodality on more than one device?
Fabio: not yet; we are ready to go in this direction, but we need to work on coordination.

---------------

5. Talking Heads for the Web: What for? - Massimo Zancanaro, ITC-IRST

Mary Ellen: I'm working on COMIC, with facial expression recognition. Very similar research and results. We're looking at combining expressions from different actors.
Massimo: if you want an effective range of emotions, we showed: don't go for actors. A good designer does a more effective job than a statistical base. Not all applications require recognition of emotion. We are also thinking of a project with autistic people.
Mary Ellen: how do you do the lip sync with Festival?
Massimo: we use another tool, which does a statistical model of phonemes.
Dave: are you considering looking at video information for training, putting people in real situations?
Massimo: there was a lot of debate in this project about how to train effectively, whether to use actors or real people. It's difficult to model real emotions with actors; actors are trained. We tried to annotate from TV shows.

---------------

6. Interacting with the Ambience: Multimodal Interaction and Ambient Intelligence - Kai Richter, Zentrum für Graphische Datenverarbeitung

Massimo: for the subscribe mechanism, there is an issue with lots of subscriptions; it requires plenty of time.
Michael Hellenschmidt: evaluation of a task is done by each component, which can be done in parallel; only the collection of results is not parallel. But it can be done in real time.
Massimo: what do you gain from such a distributed system?
Kai: with hardwired settings, you can't tackle this problem.
Fabio: you mentioned the W3C interaction framework. Did you find it usable?
Hellenschmidt: only one layer. We have components between the input and dialogue components, which push events according to user events and user goals. In our experience, it's enough to have one logical layer which is able to combine information.
Kai: more levels would introduce more complexity.
?: users like to have one device, but you also said to reorganize according to the device. Those are different scenarios.
Kai: not necessarily; one use case is handicapped people. The world has to be accessed through one access point. Non-handicapped people can also use separate devices in the environment, but handicapped people need personalised support.
?: how do you imagine coping with non-compatible devices?
Kai: the world is not standardised, but different resources can be joined in one access point. You don't have to standardise the ATM, but you can give a link to its services, to make gateways. Join things in other layers.
?: you seem to work on one kind of handicap. You won't be able to adapt every device to a large variety of disabilities.
Kai: I disagree; quite a few disabled people can adapt. Joining services is the hard part; industry should solve it.
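
To illustrate the publish/subscribe discussion above, a rough sketch under the assumption of a simple event bus: each subscribed component evaluates an event in parallel, and only the collection of results is sequential. The component and method names are invented and not taken from the system presented.

    # Illustrative publish/subscribe with parallel evaluation, as discussed above.
    # All names are hypothetical; this is not the presented system.
    from concurrent.futures import ThreadPoolExecutor

    class EventBus:
        def __init__(self):
            self.subscribers = []              # components interested in events

        def subscribe(self, component):
            self.subscribers.append(component)

        def publish(self, event):
            # Each component evaluates the event in parallel ...
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(c.evaluate, event) for c in self.subscribers]
                # ... but the results are collected sequentially.
                return [f.result() for f in futures]

    class LampComponent:
        def evaluate(self, event):
            return ("lamp", "on" if event == "user enters room" else "ignore")

    class DisplayComponent:
        def evaluate(self, event):
            return ("display", "greet" if event == "user enters room" else "ignore")

    bus = EventBus()
    bus.subscribe(LampComponent())
    bus.subscribe(DisplayComponent())
    print(bus.publish("user enters room"))   # [('lamp', 'on'), ('display', 'greet')]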