From mf@w3.org Thu Aug 12 12:26:10 2004 Date: Tue, 20 Jul 2004 14:10:56 +0100 From: Max Froumentin To: dsr@w3.org Subject: minutes mmi workshop tuesday morning [ The following text is in the "iso-8859-1" character set. ] [ Your display is set for the "UTF-8" character set. ] [ Some characters may be displayed incorrectly. ] 1. Extraction of 3D Scene Structure from a video for the Generation of 3D Visual and Haptic Representations - K. Moustakas - Informatics and Telematics Institute Debbie: do you have a use for 2D scene extraction from a just a picture? Seems that you might be able to get the same information. Konstantinos: no, you need motion to reconstruct the scene. You need to know object. There's an other company doing it, but the method requires manual intervention. Konstantinos: there are other apps for education, geometry courses. Also culture heritage, manipulating ancient objects. Fabio: do results done cover costs of computing model, etc.? Konstantinos: there is also sound feedback only. Application can talk to you, "you're near a pharmacy". It might be interesting to have audio only, but it's not as nice. Bert: you showed connection by satellite. That introduces delays. How do you deal with it? K: I wasn't involved in this, but basically they minimise information transmission. Franck: did you run make some evaluation tests to know if haptic more useful than non haptic, mouse, etc. K: no comparison. --------------- 2. Adaptive Multimodal Dialogue Management Based on The Information State Update Approach - Kallirroi Georgila, University of Edinburgh - School of Informatics (Invited Talk) Debbie: Are the update rules application dependent? Kallirroi: sometimes you can't avoid app specific rules. But in our system the rules make it easier to move to other domains, because we use abstract rules, and prolog predicates. Philipp: can you compare with state-based approaches, and things like voice XML? Kallirroi: state based or reinforcment learning? Philipp: both. how is your approach better? Kallirroi: in state-based, the designer has to think about all possible states. It's not the case here. You have the rules, even if you don't know if they're going to be triggered. Philipp: easier this way? Kallirroi: depends on the application. For example, tutoring systems, where you could have a huge amount of tests and actions. The idea was to combine the dialog state like VoiceXML with plan-based from IA. Debbie: responding to Phuilipp: this approach does have a lot in comment with VoiceXML. VXML not entirely state based: the Form Interpretation Algorithm in VoiceXML does a lot of implicit processing, and the form itself can support all kinds of dialogues, what Kallirroi does has more information on how the dialogue is progressing. Michael: in VXML there's the fixed FIA. Here you can see the rules and change them. So you have more power, but more ability to create problems. ---------------------- Human-system Interaction Container Paradigm Sebastien Praud, Thales Fabio : interface permits different languages? Sebastien: not, all the interfaces are in java. Fabio: but slides say C++ and other languages Sebastien: because we interface with existing systems, like plugging MSOutlook. Otherwise Java for dedicated devices. Developpers that use our system provide presentation services, required for the system. J: where do you define those services? Stephane: [points at figure]. System is dedicated to THALES because we know our users Michael: is your adaptation like an if statement? "if is it this kind of terminal, then do..." Stephane: we do 3 kinds of adaptation. User/teminal, interaction to the task of the user. If the user in an expert, we will ask for his login, if he's a novice, we will show some help. Also we adapt to context, like noisy environment. Philipp: is your system a prototype? or in use? Stephane: real application. But app with HIC is a prototype. Stepahen Sire: on the last slide you show bloototh connection. What does it connext to? Stephane: we use bluetooth with mobile phones to experiment on phone display. Eventually we want to use mobile radio with Tetra communuication for policemen, firemen. For now we have a bluetooth prototype where you can use mobile phone, but near the PC. Maria: tried using GPRS?, 3G, why wi-fi? Stephane: security, not given by GPRS: We develop for defence area, no don't want external operator. But otherwise, it could be done as long as data transport is provided. Maria: what kind of data transfer requirements? Stephane: stream of complex data. For now only usability study. We have to secure communications process. ?: if you define for firemen, do you foresee to have some kind of ASR in the device? Stephane: we are working on ASR for the Rafale aircraft. We can use outsource existing systems on PDAs, would then depend on user device features. -------------- Multimodality and Multi-device Interfaces - Fabio Paterno, ISTI-CNR Michael: where do prompts come from? Fabio: comes from the description. The designer designs the structure and has to provide low-level info as well, like prompts. Michael: but having natural language generation is good. For example, if you know in your task model that you're making a request, you could generate the prompts. Cedric: do you perform adaptation based on single characteristics: screen size? Fabio: we have this kind of finer detail, but we generate xhtml mobile, so we try to avoid one transformation for each specific device. ?: different degrees of stability. you may need different markup languages for each component. Fabio: we add one simple transformation. Transformation is in the tool, hidden from high level, designer Filip do you also support multimodality on more than one device? Fabio: not yet, we are really to go on this direction, need to work on coordination. ------- Talking Heads for the Web: What for? - Massimo Zancanaro, ITC-IRST Mary Ellen: I'm working on COMIC. with facial expression recognition. Very similar research and results. We're looking at combining expression from different actors. Massimo: if you want to have an effective range of emotions, we prove: don't go for actors. A nice designer does a more effective job than statistical base. Not all apps require recognition of emotion. We are also thinking of project with autistic person. Mary Ellen: how do you do the lipsync with Festival? Massimo: we use another tool, which does statistical model of phonemes. Dave: are you considering looking at video information for training, put people in real situations? Massimo: lot of debate in this project about how to train effectively whether to use actors or real people. Difficult to model real emotions with actors. actors are trained. We tried to annotate from TV shows. --------------------- Interacting with the Ambience: Multimodal Interaction and Ambient Intelligence - Kai Richter - Zentrum für Graphische Datenverarbeitung Massimo: for the subscribe mechanism, there is an issue with lots of subscriptions, requires plenty of time. Michael Hellenschmidt: evaluation of a task is done by each component, which can be done in parallel, only results not parallel. But can be done in real time. Massimo: bargain of such a distributed system? Kai: with hardwired settings, you can't tackle this problem. Fabio: you mentioned W3C interaction framework. Did you find it usable? Hellenschmidt: only one layer. We have components between the input and dialogue components, which push events according to user events and user goals. In our experience, it's enough to have one logical layer which is able to combine information. Kai: more levels would introduce more complexity. ?: users like to have one device, but you also said reorganize according to device. they are different scenarios. Kai: not necessarily, one use-case is handicapped people. The world has to be accessed through one access point. non handicaped people can also use separate devices in the environment, but handicapped people need personalised support. ?: how do you imagine to cope with non compatible devices? The world is not standardised, but joining different resources can be joined in one accesspoint. You don't have to standardise the ATM but you can give a link to its services. to make gateways. Join things in other layers. ?: you seem to work on one kind of handicap. You won't be able to adapt every device to a large variety of disablilties. Kai: I disagree, quite a few disabled people can adapt. Joining services is the hard part, industry should solve it.