5-6 October 2010
Hosted by Telecom ParisTech, Paris, France
1. What do we mean by an emotion name?
2. What did we forget?
3. What is inappropriate/incomplete/redundant?
4. How can the current selection of "recommended" emotion vocabularies be improved?
5. Should <intensity> be a separate element?
6. Is a single confidence value for each emotion sufficient?
7. Currently, modality lists only where the emotion was expressed: face, voice, body, text ...
8. Currently, all scale values are in the range (0,1), including both unipolar and bipolar scales.
9. How can EmotionML be used in HTML5?
EmotionML is a crossover aiming to bring together two worlds: psychologists/researchers at one end, and at the other end businesses trying to make money. EmotionML provides a representation of emotions and related states aimed at empowering technology, not at advancing research on emotion.

Application areas:
- Opinion mining
- Affective monitoring
- Character design and control
- Social robots
- Expressive speech synthesis
- Emotion recognition
- Support for people with disabilities
→ 20-min presentation
→ 40-min presentation
1. Is a single list of emotion categories good enough? EmotionML does allow linkage and dynamics to be described.
2. How does the environment affect the emotion? Emotion is a response to the mindscape, not the landscape, so given the same landscape different emotions can and do exist.
3. Can we develop lists that specifically address everyday emotions?
4. Can we expect emotion categories to solve all problems?
5. More generally, how do components relate to category descriptions?
Framing a satisfying description of emotional coloring is a huge challenge. Emotion splits into "emergent" emotion and "pervasive" emotion. Technology has clear motives to engage with the emotional coloring that shapes people's choices and values and how they feel about things. Mood, stance, and altered states of arousal make up more than 80% of users' states. The things that are easy to describe are rare; the things that predominate are hard to describe. Emotional coloring is fundamental to human dialogue and oral interaction. Tools oriented to emergent emotion do not transfer simply to applications involving emotional coloring. Major efforts have gone into lists of emergent emotions, often hierarchies rather than lists, which can be theory-, usage-, or data-driven. Each approach arrives at a different list, although names perhaps cannot describe each and every emotion. Large category sets are not practical to work with. The temporal profile of an emotional state is important: each emotion has a temporal characteristic in addition to its instantaneous one. How robust are descriptors? Uncertainty is part of the picture because of mixed feelings, unfamiliar feelings, concealment, and poor communication. This leads to active perception rather than passive perception, where linkage and dynamics play a large role in understanding emotions.
Marc Schröder
---------------
* Vocabularies
* Intensity
* Confidence
* Modality
* Neutral Point (scale)
* Relationship with HTML5

Roddy Cowie
------------
* Distinguishing Problem
* Categories (based on data-driven approach)
* Intensity
* Caring
* Expression Tendency
* Components vs. Category Description
* Timing: Temporal Characteristics
* Definiteness
* Uncertainty: Active Perception
* Linkages: Network of Feeling
* Dynamics of Feelings
* Landscape vs. Mindscape
→ 15-min presentation x 2 (=30mins)
→ 60-min discussion including brief summarization of topics discussed during the session
Question: Mental Ingredients as a route to Emotion Markup Language — Isabella Poggi & Francesca D'Errico (Roma Tre University)

Marc: your talk shows that we speak different languages; the challenge: how can we use EmotionML to represent some of the mental ingredients you described?
Marc: at the moment, EmotionML is not so ambitious. It uses a unified approach incorporating information from different theories. Some of the issues Isabella raises could be encompassed directly in the emotion category labels or other EmotionML tags.
Where would the additional information Isabella has provided fit best? In the single emotion term? If this information is background information to understand an emotion, it does not need to be in EmotionML.
Prediction information: part of EmotionML? E.g., when you see somebody angry, you can predict he might bite you. How do we encode this predicted information in EmotionML?
Roddy mentions that Isabella focuses on goals and beliefs, so her primitives are different from those used in EmotionML. So what primitives do we need to do the type of studies Isabella mentions?
Marc: 2 suggestions: code Isabella's mental ingredients with EmotionML, and apply Isabella's analysis to the 17 everyday emotion labels.
Question: The need to represent emotion-related states in EmotionML — Marc Schröder (DFKI)

Cf. Scherer's list of types of affect: emotions, moods, interpersonal stances, preferences/attitudes.
Roddy: emotion-related conditions; study of the most common types of emotion-related states.
Marc: Do dedicated vocabularies exist for these states? E.g., for moods? (Roddy: they can be retrieved easily with some search in the literature.)
Mood: characterized by valence and time scale.
Roddy: Moods tend to flicker between 2 emergent emotions; so the transition features of mood are very important.
Marc: do we have what we need to represent interpersonal stances?
Isabella: distinction between mood and interpersonal stance.
Roddy: EmotionML should stay away from interpersonal stances until we understand them better.
Liz: However, people want to use this term; if we want interoperability, we need to establish a common list of terms. It will be more useful, as people will use stances in their work.
Roddy: stance is linked to attitude.
Marc: action point: include the word "attitude" in the glossary and state that we do not use it, as it is too complex and little is known about it.
Gérard Chollet: why not use factor analysis to find out the dimensions of emotions?
Marc: Which types of emotion-related state should EmotionML support? And which representations for these states?
Roddy: the notion of duration can be phrased with emotion inertia (inertia includes the decay of an emotion-related state); consciously controlled?
Isabella
---------
* Mental Ingredients: Semantic Analysis vs. Conceptual Analysis
* Dimensions of emotions
* Common Ingredients:
  - Pride: dignity/superiority/arrogance
  - social relationships / purpose (function, goal) / admiration
* Definition of vocabulary vs. Markup Language itself
* What kind of primitives is needed?

Marc
-----
* Emotion-related states
  - Types of affect
  - Emotion-related conditions
* Vocabularies for emotion-related states
  - need new definitions? / reuse existing ones?
* Duration of state
* Transition potential, e.g., mood (anger, happiness)
* Interpersonal stance by category and target
* Which to support? (social emotion?)
* Dynamic emotion model: Inertia of emotion
* Social relationship
* Nesting of emotions vs. parallel emotions with time stamp
→ 15-min presentation x 2 (=30mins)
→ 60-min discussion including brief summarization of topics discussed during the session
Gill Windall: Tracking and Influencing Trainee Emotions in a Crisis-Planning Scenario

Pandora project: training people who manage crises at a strategic level.
Need to represent emotion in Pandora:
- trainees' emotional state (individual/group)
- trainees' initial state and emotional predisposition
- emotional change desired / target emotional state
- annotation of media with likely emotional impact
- emotion to be represented by Non-Player Characters

Issues:
- may be useful to indicate sensor type or sensor id (ISSUE-150), or use <reference> for that?
- represent relationships between <emotion> elements: a combined emotion is derived from the individual ones
- represent timings in relation to both absolute and exercise timelines; outside EmotionML?
- timing: "observed at" a given time, rather than start + end
- Emotional predisposition? Desired emotional change (target state, or direction of change?)
- Annotation of media with likely impact on the perceiver; rules for combining the effect of combined media?
- Modality: modality elements may be easier to parse than a space-separated list of values in an attribute?
- Scale: 0..1 or -1..1; continuous rather than discrete? linear rather than logarithmic? taking into account a "tipping point"?

Felix: very useful input. What is most important?
Gill: scales?
Tim: who gives meaning to the scale? Self-report uses a five-point scale.
Felix: a year ago we had a very complex scale model; now we wanted it as simple as possible.
Gill: with real numbers, you get rounding errors.
Felix: discrete scales have the problem of agreeing on the number and labels of the discrete terms.
Tim: maybe it would be important to know the origin of the number, a self-report or a continuous source.
Sarah Jane: danger of not agreeing on the number of discrete values.
Issue of normalising the scale.
Marc: interpretation of the scale depends on the context.
Catherine: maybe the base level helps, the "rest state" of the trainees in Pandora?
Catherine: in MPEG-4, they do not provide max values; only the minimum is specified.
Marc: if we adopted that, would it make interoperability easier or more difficult? About exaggeration, we had decided that it is the expression that is exaggerated for a cartoon character, not the emotion as such. Nobody is arguing in favour of exaggerated emotion values.
Tim: it would just be good to have the option of discrete values.
Sarah Jane: it would make a difference in how to interpret the annotated data.
Resulting requirement: Record labels and number of possible values for discrete scales.

Sarah Jane Delany: Use of Crowdsourcing for Labelling Emotional Speech Assets

Project: emotion detection in natural speech.
High-quality predictions require high-quality recordings and high-quality annotations.
Metadata annotation of recordings based on IMDI.
Annotation: considering "crowdsourcing" (a group performs the task of an expert): Mechanical Turk, reCAPTCHA, "games with a purpose". Using crowdsourcing is a new development in many different research areas. "Good" annotators are those that agree with the consensus rating (Brew et al. 2010). Training a system on annotations by "good" annotators converges much more quickly on high-quality predictions.
Use case: annotate speech assets on scales (incl. activation and evaluation) using crowdsourcing. Pilot test => need clear instructions, an asset-selection strategy, and a suitable level of payment. Did use continuous scales, but that made supervised learning more difficult, so will move towards discrete scales (starting with simple three-point scales).
Tim: we predict distributions rather than choosing a single best value.
Requirement to represent multiple ratings of an asset (a speech clip). Requirement to record the annotator.
=> Add an example into the EmotionML spec: individual ratings in <emma:group>; use emma:derived-from to state that the consolidated rating is based on the group of individual ratings (see the sketch below).
Univ. Greenwich
----------------
* Relationship between trainer and trainee
* Wish list:
  - Sensor
  - Relationships between emotions
  - Single timepoint
  - base level
  - Combine media and markup
  - Scales: various types of scales needed (discrete/non-linear/nominal etc.)

Dublin Institute of Tech.
--------------------------
* Prediction of emotion in natural language
  - acoustic
  - machine learning
* Mood-induced corpus of speech
* Crowd-sourced annotations
* Needs:
  - Assets list
  - Reliability
  - Identify annotators
  - Individual vs. Consolidated
→ 15-min presentation x 3 (=45mins)
→ 45-min discussion including brief summarization of topics discussed during the session
Davide Bonardo (Loquendo)
--------------------------
Isabella Poggi: what about the performative of the speech in the output specification? How do you simulate special voices for "suggest", etc.?
Davide Bonardo: the performative can be used to control the expression of the avatar. The TTS only uses neutral, happy, sad.
Isabella Poggi: but for "suggest", etc., do you have specific voice parameters? There is also something to do with stances: the way you pose yourself toward the other; you tell people to do something but you are uncertain. This is close to stance. Within a performative, often some particular emotion is embedded: if I order you, it means I am angry at you. In my view of ingredients, an emotion may be inside this order. I am anticipating something in my voice. How can they fit together?
Davide Bonardo: Only affects can be described within emotion.
Catherine Pelachaud (in response to a question about the Dialog Manager): the project partner for dialog management is As An Angel (they use APML).
Marc Schröder: For facial animation, the tradition is to describe emotions with the big 6 emotions of Ekman. The Computer Graphics community has used this in generation systems, driven by these 6 basic emotions. It is considered an appropriate description, so a mapping is necessary. We have to distinguish mapping modalities from multimodal systems. Another point: in the voice, you have 6 levels of modalities: there is a conflict! Modalities don't match completely; necessarily, a conversion must be done. There is an issue of mapping between different representations, and it is not realistic to share the same representation any time soon.
Davide Bonardo: Actually there is a gap between the representations. There must be a link between them, but this is not a problem. Each component uses the emotion representation it can understand.
Marc Schröder: each TTS could say it accepts some of the modalities. Different TTS engines will accept different inputs; it cannot be standardized.
Davide Bonardo: There must be a common language.
Marc Schröder: given the same 3 dimensions, one TTS will generate one prosody and another TTS will generate another prosody. It is not yet the time to have a common format; the application chooses the format. EmotionML does not define a mapping between the different representations. OCC categories could be mapped to combinations of appraisal dimensions. FSRE categories could be matched. This has not been done yet. Some mappings are easy, others are impossible without losing information. The EmotionML language could define a format to describe the mapping; it is not done yet.
Felix Burkhardt (Deutsche Telekom)
-----------------------------------
Isabella Poggi (about the slides): what does "critical" mean?
Felix Burkhardt: it means substantial for the application.
Sarah Jane Delany: Which labeled data do you keep?
Felix Burkhardt: I am not sure what to do: do some automatic classification and keep the interesting ones, hoping to find them. In my experience, 10% are really angry.
Sarah Jane Delany: Active learning would be useful.
Felix Burkhardt: Yes, good idea!
Gérard Chollet: there are cultural differences in speech.
Felix Burkhardt: Yes.
Tim Llewellyn: are categories written in specific languages?
Marc Schröder: they must be in English (their names come from the literature, which is in English). They can be translated for users in applications.
Question about levels 1 to 6: no anger, not sure, and 3 levels of anger ... How do you represent that?
Felix Burkhardt: probably with 3 emotions. Anyway, level 2 is very special.
Marc Schröder: so 3 categories, one of them with intensities?
Felix Burkhardt: or linearly mapping the ranges to 0 to 1.
Roddy Cowie: people scale differently. It is very difficult to get uniform scales; it depends on the meanings you attach. This has implications for the meaning of scales in EmotionML.
Marc Schröder: do you suggest using discrete scales only?
Roddy Cowie: No. I mean that anger at level 4 is not necessarily twice as angry as level 2. Correlate people's scores: you will see they use the intervals on the scale differently. People must understand this kind of limitation.
Laurent Ach (Cantoche)
-----------------------
- Living Actor: virtual avatars on websites, serving as guides for users.
- Animation: direct editing of the animation timeline; automatic animation from voice analysis; animation graph with emotion labels; dimensions of emotion.
- Affective Avatars: emotion recognition in the user's voice; animate a character in a Skype plugin.
Emotion comes first in research projects; the results are then used in customer projects. Animation is based on high-level annotations (semantics, emotions). The mapping from emotions to animations is done by artists. Dimensions of emotion are indirectly related via the selection of animations.
Test at: http://www.livingactor.com
Roddy: How well does emotion recognition from text work?
Laurent: The company Lingware does it. In addition, we know a lot of things beforehand. In the second stage of the project, we will use different text, but that is not clear yet.
Tim: How do you choose your moods, where do they come from?
Laurent: It depends on the application; we do not distinguish precisely between emotions, moods, etc. They are high-level annotations. The artists realise these intuitively.
Marc: Dimensions: do they trigger animations via thresholds?
Laurent: No, we have multiple continuous dimensions, and the animations have a number of different properties, so there is a selection algorithm.
Roddy: How is this related to persuasion?
Laurent: The goal is to lead the customer to different parts of the website. The approach is artistic and intuitive; there is no theory behind it.
Roddy: The theory exists. Would it be useful for you to have access to it?
Laurent: Yes. The artists are influenced by the theory.
Roddy: These things are around EmotionML; it is unclear if they should be integrated into EmotionML somehow.
Marc: Any specific requirements for EmotionML?
Laurent: No, what is already inside is fine for us. Actually: timestamps are currently absolute; it would be useful to have a point 0 (= when the web session starts).
Tim: Same for us.
Loquendo
---------
* Vocabulary for TTS
* Interoperability with SSML, etc.

Deutsche Telekom
-----------------
* Technology push vs. market pull
* VoiceXML should include EmotionML
* Global confidence is sufficient

Cantoche
---------
* Emotion mapped to animation by artists
* Time code:
  - currently absolute time
  - would need a custom 0 point (at the start of a Web session); see the sketch below
→ 15-min presentation x 3 (=45mins)
→ 45-min discussion including brief summarization of topics discussed during the session
Takeshi Natsuno (Dwango/Keio University)
-----------------------------------------
Marc: What is the relation with EmotionML?
Takeshi Natsuno: there are 2 versions of the system, an HTML version and a Flash version. Flash is customizable; in HTML we do not have all the tags we have in Flash.
Marc: Is there a need for a specific interface? Right now people can use quick text typing; maybe an additional user interface is needed to add emotion tags.
TN: the audience can feel the 'air' and can add more emotions by adding comments.
Marc: what about using emoticons?
TN: people already use emoticon-like conventions (LOL, applaud).
Marc: you do not hear your friend laughing or see your friend's face.
TN: so we could synchronize with the emotions of your friends; create a greater variety of effects.
Marc: real time is a real problem in live applications; it needs to be fast, so users cannot spend too much time selecting display effects.
Tim: When was the service started?
TN: 3 years ago; 7% of internet traffic is for Nico Nico.
Kaz: What is the link between emotion markup tags and arbitrary text written by people? Emotion-related descriptions/emoticons created by people.
Need for video timestamps (SMIL/Media Fragments) and synchronization.
Relative position of information on the video screen and EmotionML.
Roddy: Nico Nico picks up on a high level (speed, rhythm, number of comments at once), while EmotionML acts on a low level (smile). Emotion theorists may be able to help here.
Roddy: the bond between people is strong, especially when their means of communication is not understood by everyone.
Christian Peter (Fraunhofer IGD/Graz University)
-------------------------------------------------
Tim: where is the ground truth?
Roddy: you can store the trace with EmotionML, but you do not store the data related to the material.
Marc: put such info in a global info tag.
Tim Llewellyn (nViso)
----------------------
Masao: Can you detect when people are bored?
Tim: we are using the big six.
Roddy: there are existing models to detect other emotions (e.g. interest); you could add them to your software.
Isabella: can you detect different types of emotion (e.g. different types of laughs: happy smile, irony smile)?
Tim: important for us is the pattern of the emotion profile.
Isabella: so you trust the emotion that is annotated?
Marc: you predict high-level information from low-level features? Could you predict other emotions rather than just the big six?
Tim: some studies have correlated/validated the correspondence between low-level features and emotions. We need other correspondences for a larger variety of emotions.
Sarah-Jane: it is difficult to do classification for 7 classes in one go (the big six emotions + neutral).
Roddy: do you have analysis techniques to find out different groups of people based on their reactions to ads?
Tim: We would like to, based on their emotional profile.
Tim: We do not analyze the general set of AUs (action units); we look for the features that are relevant to each of the big six emotions.
Roddy: a lot of info is in the head and its movement.
Tim: we are starting to look into that.
Roddy: there is an issue of where features come in within EmotionML.
Sarah: everybody will come with their own features as the most important ones.
Roddy: there may be one level: the channel; feature points are one channel; head pose and posture is another channel; head dynamics another.
Felix: this can also be called modality (cf. Christian Peter).
Roddy: we need an agreed terminology to describe the channel/modality.
Catherine: cluster the channels/modalities into groups (visual/body, voice/acoustic, text, brain, physiological, etc.)
Natsuno
--------
* Relationship between arbitrary text and EmotionML
* Emoticons
* Stronger integration with SMIL and Media Fragment URI
* Relationship between the position of emotion information on the video screen and EmotionML
* Group of emotion effects in NicoNicoDouga
* Comments are a database which could be analyzed/reused

Fraunhofer
-----------
* trace of sensor output (sequential data)
* global info tag / EMMA as container

nViso
------
* different reactions from different groups
* mapping between big six emotions and features
* face shape / action units
* visualization
* channels (modalities), e.g., facial points
* body parts grouping
In this session we will first review the topics discussed during the previous sessions, take a preliminary vote, and decide which topics to discuss in detail. We will then discuss those topics in depth.
Topic | First vote | Second vote
---|---|---
List of channels/modalities | 8 | —
Scales | 7 | —
Vocabularies | 3 | 4
Interoperability | 2 | 1
Dynamics of feelings | 7 | (deferred to EmotionML 2.0)
Emotion-related states | 2 | 0
Timing mechanism | 2 | 3
We also discussed "Scales" during Session 7.
Discussion identified requirements for at least recording the number of discrete categories in a scale, and perhaps the labels used.

Roddy presented work on scales (the Feeltrace scale). He asked people to identify appropriate labels that they found natural and useful; he didn't use discrete categories but provided guidelines to anchor the user. He suggested the use of an anchored continuous scale, divided into thirds with labels at each boundary: none, mild social, strong, uncontrolled are markers/labels that can be used for any emotion. For bipolar scales, they settled on two divisions with the label "neutral" in the middle. Experience with more complex scales is that users don't do anything different. There are cross-cultural differences in the rating of the intensity of an emotion. Is a health warning on cross-cultural issues necessary?

Discussion on the anchored scale raised 2 questions:
- Do we want to normalise the continuous scale using anchors?
- Do we want to use a discrete scale?

The problem of accepting one group's representation was raised. Discussion focused on whether we set up reusable scales and how we recommend vocabularies. Roddy advised that psychologists have collections of standard scales in databases and standard techniques for measurement. The suggestion was to find widely used scales and recommend their use.

** Agreed to set up the labels associated with a scale separately and use them in the vocabularies. This can be done by pointing to the scale using a URI (see the sketch below).
** Agreed to come up with best-practice scales.
** Agreed to add, to the dimensions, appraisals and action tendencies item, identification of the labels used for the scale.
** Agreed that a single value is given on the dimension, which can be either discrete or continuous.

What to do with intensity was raised as an issue, but there was no time for discussion.
We split into two subgroups. The main group discussed the "Vocabulary" topic, and the smaller group, consisting of Roddy and Catherine, discussed the "Channels/Modalities" topic.
davide: mentions PLS as example
sara-jane: example?
marc: let me try
... emotion is a reaction to a certain environment
... social regulation framework
... how to recognize emotion?
... keep tracing clean definitions
... depending on categories and context
... need custom vocabularies
... some vocabularies just make sense in some specific context
... can we agree to multiple vocabularies?
davide: vocabulary depends on services
marc: people look for orientations
... there are no default vocabularies
... FSRE vocabulary covers 4 features
... 24 categories
... we could consider language dependent mappings (for EmotionML 2.0?)
(some discussion on "default" vocabularies)
marc: the question is whether we want to make FSRE vocabulary a default or not
isabella: emotion in deep structure may be the only one that can allow reciprocal information with people
... because not only different languages but also different theories
... besides the effort of mapping some terminology to others
... kind of like Don Quixote
... I think I can go on
... as for now, it could be a good candidate
kaz: one possible workaround is saying "an EmotionML processor SHOULD process FSRE"
marc: it would make sense to think about three major use cases again:
... manual annotation, recognition and generation
(Gill goes to the flipchart and writes down the use cases)
marc: we can't ask all the users to revisit emotion theory and see the background of the vocabularies
sara-jane: is the purpose of EmotionML interoperability?
marc: available category set depends on the context of applications
... the syntax is quite different between reading and writing emotion information
... as soon as we use an actual application, our capability is limited
... so we should not require any specific set or subset of vocabulary
sara-jane: it should be best practice
... should not be mandated
... we should provide it just for people's easy use
[ kaz's note: wonders about sustainability of URIs for vocabulary set references ]
[ marc points out the next WD should discuss that point ]
felix: two questions?
... 1. do we need default?
... 2. rich set/minimum set?
gill: default minimum set?
felix: probably larger set would be safer
catherine: how about "big6"?
marc: there is something which can't be done by the "big6" vocabulary
... we could say "this vocabulary set would fit several different purposes" without defining any "default" set
[ Catherine mentions the initial list of channels/modalities generated by Roddy and herself ]
CONSENSUS about default set: we could say "this vocabulary set would fit several different purposes" without defining any "default" set
won't make any default set mandatory
Effectors: static (pose) & dynamic (movement)
- Gaze: looking at & movement
- Face
- Head
- Torso
- Gesture
- Leg
- Locomotion/position of whole body
- Speech: verbal, vocal & paraverbal

Visceral (controlled by the autonomic nervous system): breathing, skin colour, secretion (sweat, tears), pupil dilation

Central (central nervous system):
- EEG
- fMRI
- reflexes (oculomotor reflex, startle reflex)

Interaction? Turn-taking
Gill: should scale labels consist of pairs of labels and numbers? Yes.
- should there be one intensity for the emotion element?
- Could we treat categories just like dimensions, having an attribute called "value", and allow for several in one emotion?
- should you be able to have "mixed emotions"?
- problem with mapping categories to dimensional space
- is the emotion tag, as being one single statement, then still preserved?
- discussion on the correlation between intensity and confidence
- suggestion: have a value for categories as well as confidence (see the sketch below)
The Call for Participation, the Logistics Information, the Presentation Guideline and the Agenda are also available on the W3C Web server.
Marc Schröder, Catherine Pelachaud, Deborah Dahl and Kazuyuki Ashimura, Workshop Organizing Committee