Emotion Markup Language: Requirements with Priorities

W3C Incubator Group Report 13 May 2008

This version: http://www.w3.org/2005/Incubator/emotion/XGR-requirements-20080513/
Latest version: http://www.w3.org/2005/Incubator/emotion/XGR-requirements/
Previous versions: http://www.w3.org/2005/Incubator/emotion/XGR-requirements-20080429/

Authors: Felix Burkhardt (DTAG); Marc Schröder (DFKI GmbH)

Abstract

This is the simplified Requirements document with priorities for an Emotion Markup Language ("EmotionML") as previewed by the charter of the Emotion Markup Language Incubator Group. The requirements extracted from use cases by the first Emotion Incubator Group, and listed in the group's Final Report, have now been prioritized by the second group and divided into two parts, depending on whether their support is mandatory ("must have") or optional ("should have") for an implementation of the envisaged EmotionML.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of Final Incubator Group Reports is available. See also the W3C technical reports index at http://www.w3.org/TR/.

This is the simplified list of requirements for the envisaged Emotion Markup Language. Twenty-two requirements were identified and divided into two groups, depending on whether we regard them as mandatory or optional.

The content of this document reflects the status at the beginning of the second charter of the Emotion Incubator Group and can be used as a basis for discussing the elements that shall be included in the final incubator draft specification of EmotionML.

This document was developed by the Emotion Markup Language Incubator Group.

Publication of this document by W3C as part of the W3C Incubator Activity indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.

1. Introduction
2. Mandatory Requirements (Must Have)
3. Optional Requirements (Should Have)
4. References

1. Introduction

The following represents the simplified Requirements document with priorities for an Emotion Markup Language ("EmotionML"). Twenty-two requirements were extracted from use cases by the first Emotion Incubator Group, and described in a thematic structure in the group's Final Report. These requirements were prioritized by the second group and divided into two parts, depending on whether they are considered to be mandatory ("must have") or optional ("should have") for an EmotionML specification.

The procedure to prioritize the requirements was as follows. A questionnaire was published, asking respondents which of the requirements they felt were most important and which less important. For each requirement, the respondents could choose one of the following:

must have: The specification must define the feature.
should have: The specification should define the feature, if possible.
nice to have: The specification may optionally define the feature.
future revision: The feature needs additional study before specification.
no need: I don't see the need for this feature in the specification.
no opinion

Ten group members and four individuals from outside the group filled in the questionnaire and gave valuable feedback. In general, the agreement between the respondents was quite high.

In order to reach a common decision, the requirements that 70% of the respondents rated as either "must have" or "should have" were taken to be mandatory; the others were taken as optional. The resulting coarse subdivision was further discussed in the group. In general, the classification was considered appropriate, with minor exceptions:

Many group participants requested that appraisal and action tendencies should be promoted to mandatory status;
There were contradictory opinions about the status of regulation, but it was decided for practical reasons to treat it as optional for the moment;
Whereas any individual global metadata was considered optional, it was considered mandatory to have at least a generic mechanism for specifying global metadata.

This approach divided the requirements into fourteen mandatory and eight optional requirements.
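
The decision rule described above can be sketched in a few lines of Python. This is only an illustration: the function name classify_requirement, the reading of "70%" as an inclusive threshold, and the choice to count "no opinion" answers in the denominator are assumptions, not details taken from the questionnaire. The example responses are invented.

```python
from collections import Counter

# Ratings that count towards mandatory status, per the procedure above.
SUPPORTING = {"must have", "should have"}


def classify_requirement(ratings, threshold=0.7):
    """Return 'mandatory' or 'optional' for one requirement.

    Assumption: the threshold is inclusive and every response,
    including 'no opinion', counts in the denominator.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    support = sum(counts[r] for r in SUPPORTING)
    return "mandatory" if support / len(ratings) >= threshold else "optional"


# Hypothetical responses from fourteen respondents (not real survey data).
example = ["must have"] * 7 + ["should have"] * 3 + ["nice to have"] * 4
print(classify_requirement(example))
```

The group discussion then adjusted the outcome by hand for a few requirements, which a purely numeric rule like this one does not capture.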

Requirements are presented here according to the thematic structure introduced in the 2007 Final Report.

2. Mandatory Requirements (Must Have)

The following requirements are mandatory for a generic emotion markup language.

2.1 Emotion Core

Core 1. Type of emotion-related phenomenon

The emotion markup language must be suitable for representing different types of emotion-related states -- not only emergent emotions, i.e. emotions in the strong sense (such as anger, joy, sadness, fear, etc.), but also moods, stances, affect dispositions, etc. The emotion markup must provide a way of indicating which of these (or similar) types of emotion-related/affective phenomena is being annotated.

Core 2. Emotion categories

The emotion markup must provide a generic mechanism to represent broad and small sets of possible emotion-related states. It must be possible to choose a set of emotion categories (a label set), because different applications need different sets of emotion labels. A flexible mechanism is needed to indicate, for a given document or individual annotation, which set of categories is being used.

A standard emotion markup language should propose one or several "default" set(s) of emotion categories, but leave the option to a user to specify an application-specific set instead. Douglas-Cowie et al. (2006) propose a list of 48 emotion categories that could be used as the "default" set.
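
As a rough illustration of a pluggable label set, the following Python sketch records, for each annotation, which category set it draws from and rejects labels outside that set. The class CategorySet, the function annotate, and both example sets are hypothetical; neither set is a proposed default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategorySet:
    """A named set of emotion category labels (a 'label set')."""
    name: str
    labels: frozenset

    def validate(self, label):
        """Return the label if it belongs to this set, else raise."""
        if label not in self.labels:
            raise ValueError(f"{label!r} is not in category set {self.name!r}")
        return label


# Hypothetical label sets, invented for this sketch.
EVERYDAY = CategorySet("everyday", frozenset({"anger", "joy", "sadness", "fear"}))
CALL_CENTRE = CategorySet("call-centre", frozenset({"irritated", "neutral", "satisfied"}))


def annotate(label, category_set=EVERYDAY):
    """Build an annotation that records which label set it uses."""
    return {"set": category_set.name, "category": category_set.validate(label)}


print(annotate("joy"))
print(annotate("irritated", CALL_CENTRE))
```

Storing the set name with every annotation is one way to let a consumer interpret labels from documents that use different vocabularies.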

Core 3. Emotion dimensions

The emotion markup must provide a generic format for describing emotions in terms of emotion dimensions. In emotion psychology, a small number of 2-4 emotion dimensions is considered to cover the most essential aspects of people's emotion concepts and subjective experience. A dimension is a unipolar or bipolar continuous scale.

As for emotion categories, it is not possible to predefine a normative set of dimensions. Instead, the language should provide a "default" set of dimensions that can be used if there are no specific application constraints, but allow the user to "plug in" a custom set of dimensions if needed. Typical sets of emotion dimensions include "arousal, valence and dominance" (known in the literature by different names, including "evaluation, activation and power"; "pleasure, arousal, dominance"; etc.). Recent evidence suggests there should be a fourth dimension: Fontaine et al. (2007) report consistent results from various cultures where a set of four dimensions is found in user studies: "valence, potency, arousal, and unpredictability".
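
The idea of a default dimension set with a custom set plugged in can be sketched as follows. The numeric ranges (0..1 for unipolar, -1..1 for bipolar scales), the choice of which scales are bipolar, and the names Dimension, make_set and describe are all assumptions made for this example; only the dimension names come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One continuous scale; the numeric ranges are assumptions of this sketch."""
    name: str
    bipolar: bool = False

    def check(self, value):
        """Return the value as a float if it lies on this scale, else raise."""
        low = -1.0 if self.bipolar else 0.0
        if not low <= value <= 1.0:
            raise ValueError(f"{self.name}: {value} outside [{low}, 1.0]")
        return float(value)


def make_set(*dimensions):
    """Index a group of dimensions by name."""
    return {d.name: d for d in dimensions}


# Hypothetical default set; which scales are bipolar is an assumption.
DEFAULT = make_set(
    Dimension("arousal"), Dimension("valence", bipolar=True), Dimension("dominance")
)

# A custom set plugged in instead, using the names from Fontaine et al. (2007).
FOUR_D = make_set(
    *(Dimension(n, bipolar=True)
      for n in ("valence", "potency", "arousal", "unpredictability"))
)


def describe(values, dimension_set=DEFAULT):
    """Validate a mapping of dimension name to value against a dimension set."""
    unknown = set(values) - set(dimension_set)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {name: dimension_set[name].check(v) for name, v in values.items()}


print(describe({"arousal": 0.8, "valence": -0.5}))
print(describe({"unpredictability": 0.3}, FOUR_D))
```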

Core 4. Appraisals related to the emotion

The emotion markup must provide a generic format for describing emotions in terms of the appraisals that cause or accompany them. Appraisal is a core concept in cognitive emotion psychology; cognitive emotion theories describe in detail which appraisals of "things in the world" lead to which emotions. Syntactically, appraisals may be represented as unipolar or bipolar scales.

Appraisals can be described with a common set of intermediate terms between stimuli and response, between organism and environment. For example, the appraisal variables can be linked to different cognitive process levels as in the model of Leventhal and Scherer (1987). As one option, the following set of labels (Scherer et al., 2004) can be used to describe the protagonist's appraisal of the event: novelty, intrinsic pleasantness, relevance, coping potential, compatibility of the situation with standards.

Alternatively, in the "OCC" model of emotions (Ortony, Clore and Collins, 1988), which is implemented in a range of emotional reasoning systems, appraisals are sorted according to the consequences of events for oneself or for others, the actions of others and the perception of objects. A convenient way of specifying the appraisals in the OCC model was proposed by Gebhard et al. (2003), using "appraisal tags" such as "good_likely_future_event".

Core 5. Action tendencies

The emotion markup must provide a possibility to characterise emotions in terms of the action tendencies linked to them.

For example (Frijda, 1986, p. 88, Table 2.1), desire is linked to a tendency to approach, fear is linked to a tendency to avoid, etc.

Activation, as defined by Frijda (1986, pp. 90-94), is the readiness to act according to a specific action tendency. It is a degree, and should be represented by a scale value.

This requirement is not linked to any of the currently envisaged use cases, but has been added in order to cover the theoretically relevant components of emotions better. Action tendencies are potentially very relevant for use cases where emotions play a role in driving behaviour, e.g. in the behaviour planning component of non-player characters in games.
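
A minimal sketch of how an action tendency and its activation might be paired is given below. Only the two pairings quoted above from Frijda (1986) are included; the 0..1 range for activation and the names ActionTendency and tendency_for are assumptions of this example.

```python
from dataclasses import dataclass

# Pairings quoted in the text from Frijda (1986); not a complete table.
TENDENCY_OF = {"desire": "approach", "fear": "avoid"}


@dataclass(frozen=True)
class ActionTendency:
    name: str
    activation: float  # readiness to act; the 0..1 range is an assumption

    def __post_init__(self):
        if not 0.0 <= self.activation <= 1.0:
            raise ValueError("activation must lie between 0 and 1")


def tendency_for(emotion, activation):
    """Look up the tendency linked to an emotion and attach a degree."""
    return ActionTendency(TENDENCY_OF[emotion], activation)


print(tendency_for("fear", 0.8))
```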

Core 6. Multiple and/or complex emotions

The emotion markup must provide a mechanism to represent multiple emotions that are co-occurring in the same experiencer.

There are at least two kinds of meanings for multiple and complex emotions. On the one hand, it should be possible to represent the fact that two emotions are co-occurring in parallel, e.g. when a person is angry about one thing and sad about another, or the fact that the face shows one emotion but the voice expresses another.

The other sense in which multiple emotions can be present is closely linked with regulation: One emotion can be suppressed, and another one may be simulated instead. This second meaning of multiple emotions can only be adequately represented when regulation is also represented in the markup language.
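
The first sense, two emotions co-occurring in one experiencer, might be modelled as shown in this sketch. It covers only the case where face and voice show different emotions; the regulation-related sense is not modelled. The classes Emotion and Experiencer and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Emotion:
    category: str
    modality: Optional[str] = None  # e.g. "face" or "voice"
    target: Optional[str] = None    # what the emotion is about


@dataclass(frozen=True)
class Experiencer:
    name: str
    emotions: tuple  # all emotions co-occurring in this experiencer

    def by_modality(self, modality):
        """Return the categories expressed through one modality."""
        return [e.category for e in self.emotions if e.modality == modality]


# Invented example: the face shows one emotion, the voice another.
anna = Experiencer(
    "anna",
    (Emotion("joy", modality="face"), Emotion("sadness", modality="voice")),
)
print(anna.by_modality("face"))
print(anna.by_modality("voice"))
```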

Core 7. Emotion intensity

The emotion markup must provide an emotion attribute to represent the intensity of an emotion. The intensity is a unipolar scale.

Core 8. Emotion timing

The emotion markup must provide a generic mechanism for temporal scope.

This mechanism should allow for different ways to specify temporal aspects such as i) start-time + end-time, ii) start-time+duration, iii) link to another entity (start 2 seconds before utterance starts and ends with the second noun-phrase...), iv) a sampling mechanism providing values for variables at evenly spaced time intervals.
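
Options i), ii) and iv) can be illustrated with a short Python sketch that normalises both interval forms to a (start, end) pair and pairs sampled values with evenly spaced time stamps. Option iii), linking to another entity, is not covered. The names TemporalScope and sample_times, and the use of seconds as the unit, are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalScope:
    """Either start + end or start + duration, in seconds (assumed unit)."""
    start: float
    end: Optional[float] = None
    duration: Optional[float] = None

    def interval(self):
        """Normalise both forms to a (start, end) pair."""
        if (self.end is None) == (self.duration is None):
            raise ValueError("give exactly one of end or duration")
        end = self.end if self.end is not None else self.start + self.duration
        if end < self.start:
            raise ValueError("end must not precede start")
        return (self.start, end)


def sample_times(start, step, values):
    """Pair values with time stamps at evenly spaced intervals (option iv)."""
    return [(start + i * step, v) for i, v in enumerate(values)]


print(TemporalScope(2.0, end=5.0).interval())
print(TemporalScope(2.0, duration=3.0).interval())
print(sample_times(0.0, 0.5, [0.1, 0.4, 0.9]))
```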

2.2 Meta-information about emotion annotation

Meta 1. Confidence / probability

The emotion markup must provide a representation of the degree of confidence or probability that a certain element of the representation is correct. It must be possible to indicate the confidence for each element of the representation separately: e.g., the confidence that the category is indeed X is independent from the confidence that its intensity or its timing is correctly indicated.
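
One way to keep confidence values independent per element is to attach a confidence to each value rather than to the annotation as a whole, as in this sketch. The class Scored, the 0..1 range, and the example numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    """A value paired with its own confidence (assumed range 0..1)."""
    value: object
    confidence: float

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")


# Invented annotation: each element carries a separate confidence.
annotation = {
    "category": Scored("anger", 0.9),
    "intensity": Scored(0.6, 0.4),
    "start": Scored(12.5, 0.7),
}


def least_certain(annotation):
    """Return the name of the element with the lowest confidence."""
    return min(annotation, key=lambda k: annotation[k].confidence)


print(least_certain(annotation))
```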

Meta 2. Modality

The emotion markup must be able to represent the modalities in which the emotion is reflected, e.g. face, voice, body posture or hand gestures, but also lighting, font shape, etc. The emotion markup must provide a mechanism to represent an open set of values.

2.3 Links to the "rest of the world"

Links 1. Links to media

The emotion markup must be able to refer to external media of various kinds, including videos, pictures, audio files, and nodes in an XML document.

Links 2. Position on a time line in externally linked objects

The emotion markup must provide a mechanism for complementing a link to media with timing information, in order to further specify the scope of the link.

For example, a start time and duration may be used together with a link to a video in order to indicate the onset and offset of an emotionally relevant event.

Links 3. The semantics of links to the "rest of the world"

The emotion markup must provide a mechanism for assigning meaning to those links.

The following initial types of meaning are envisaged:

The experiencer (who "has" the emotion);
The observable behaviour "expressing" the emotion;
The trigger/cause/emotion-eliciting event of the emotion;
The object/target of the emotion (the thing that the emotion is about).

Links to media are relevant for all of the above. For some of them, timing information is also relevant:

observable behaviour;
trigger.

It is currently unclear if a flexible mechanism for assigning meaning to links is needed, or if a fixed set of "slots" is sufficient.
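
The "fixed set of slots" option could look roughly like the following sketch, in which each link carries one of the four envisaged meanings and, optionally, timing information. The enumeration Role, the class Link, the role names, and the example.org URIs are hypothetical; a flexible mechanism would replace the enumeration with an open vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """The four initial meanings envisaged for a link (names are assumptions)."""
    EXPERIENCER = "experiencer"
    BEHAVIOUR = "behaviour"
    TRIGGER = "trigger"
    TARGET = "target"


@dataclass(frozen=True)
class Link:
    uri: str
    role: Role
    start: Optional[float] = None     # seconds into the linked media
    duration: Optional[float] = None  # seconds


# Invented links using placeholder URIs.
links = [
    Link("http://example.org/clip.avi", Role.BEHAVIOUR, start=3.0, duration=1.5),
    Link("http://example.org/people#anna", Role.EXPERIENCER),
]


def links_with_role(links, role):
    """Select the links that carry a given meaning."""
    return [link for link in links if link.role is role]


print(links_with_role(links, Role.BEHAVIOUR))
```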

2.4 Global metadata

Global 0. A generic mechanism to represent global metadata

The emotion markup must provide a generic mechanism for representing global metadata.

In order to facilitate communication between a producer and a consumer of emotional data with respect to application specific information, a generic metadata element must be supported.

Note: this mandatory requirement is a relatively unspecified placeholder (hence the identifier "Global 0") for the various more specific but currently optional requirements for global metadata. As the specific types of metadata are implemented, this generic placeholder becomes irrelevant.

3. Optional Requirements (Should Have)

The following requirements are marked as less important and urgent. If their implementation poses non-trivial problems, they can be left unimplemented in a first draft of the emotion markup language. Nevertheless, they are required for certain use cases, and should be added in future versions of the markup language.

3.1 Emotion Core

Core 9. Emotion regulation

The emotion markup should provide emotion attributes to represent the various kinds of regulation.

Note: there are good reasons for including this in the "must have" requirements, but for reasons of prioritisation it has been kept on the "should have" list for the time being.

According to the process model of emotion regulation described by Gross (2001), emotion may be regulated at five points in the emotion generation process: selection of the situation, modification of the situation, deployment of attention, change of cognition, and modulation of experiential, behavioral or physiological responses. The most basic distinction underlying the concept of regulation of emotion-related behaviour is the distinction of internal vs. external state. The description of the external state is out of scope of the language; it can be covered by referring to other languages such as the Facial Action Coding System (Ekman et al., 2002) or the Behavior Markup Language (Vilhjalmsson et al., 2007).

Other types of regulation-related information can represent whether an emotion is genuinely expressed, felt (inferred), masked (and how well), or simulated, as well as inhibition/masking or excitation/boosting of emotions or their expression.

3.2 Meta-information about emotion annotation

Meta 3. Acting

The emotion markup should provide a mechanism to add special attributes for acted emotions such as perceived naturalness, authenticity, quality, and so on.

3.3 Global Metadata

Global 1. Info on Person(s)

The emotion markup should provide information about the persons involved. Depending on the use case, this would be the labeler(s) (Data Annotation), persons observed (Data Annotation, Emotion Recognition), persons interacted with, or even computer-driven agents such as ECAs (Emotion Generation). While it would be desirable to have common profile entries throughout all use cases, we found that information on the persons involved is very use case specific. While all entries could be provided and possibly used in most use cases, they are of different importance to each.

Examples are:

For Data Annotation: gender, age, language, culture, personality traits, experience as labeler, labeler ID (all required).
For Emotion Recognition: gender, age, culture, personality traits, experience with the subject, e.g. web experience for usability studies (depending on the use case, all or some required).
For Emotion Generation: gender, age, language, culture, education, personality traits (again, use case dependent).

Global 2. Social and communicative environment

The emotion markup should provide global information to specify the genre of the observed social and communicative environment and, more generally, of the situation in which an emotion is considered to happen (e.g. fiction (movies, theater), in-lab recording, induction, human-human, human-computer (real or simulated)), as well as the interactional situation (number of people, relations, link to participants).

Global 3. Purpose of classification

The result of emotion classification is influenced by its purpose. For example, a corpus of speech data for training an ECA might be labelled differently from the same data used as a corpus for training an automatic dialogue system for phone banking applications; or the face data of a computer user might be labelled differently for the purpose of usability evaluation than for guiding a user assistance program. These differences are application specific or at least genre specific. They are also independent of the underlying emotion model.

Global 4. Technical environment

The emotion markup should provide information about the technical environment.

The quality of emotion classification and interpretation, by either humans or machines, depends on the quality and technical parameters of the sensors and media used.

Examples are:

Frame rate, resolution, colour characteristics of video sources;
Dynamic range, type of sound field of microphones;
Type of sensing devices for physiology, movement, or pressure measurements;
Data enhancement algorithms applied by either device or pre-processing steps.

The emotion markup should also be able to hold information on the way in which an emotion classification has been obtained, e.g. by a human observer monitoring a subject directly, via a live stream from a camera, or via a recording; or by a machine, and if so using which algorithms.

3.4 Ontologies of emotion descriptions

Onto 1. Mappings between different emotion representations

It should be possible to map between different emotion representations, i.e. to convert data from one emotion description (categories, dimensions, appraisals, action tendencies) to another.

These different emotion representations are not independent; rather, they describe different aspects of the complex phenomenon of emotion. It is therefore conceptually possible to map from one representation to another in some cases; in other cases, mappings are not fully possible.

Some use cases require mapping between different emotion representations: e.g., from categories to dimensions, from dimensions to coarse categories (a lossy mapping), from appraisals onto dimensions, from categories to appraisals, etc.

Such mappings may either be based on findings from emotion theory or they can be defined in an application-specific way.
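
As an example of an application-specific, lossy mapping from dimensions to coarse categories, the sketch below assigns a (valence, arousal) pair to one of four quadrant labels. The thresholds, the value ranges, the four label names, and the function name to_coarse_category are invented for illustration and are not drawn from emotion theory.

```python
def to_coarse_category(valence, arousal):
    """Map a (valence, arousal) pair onto one of four coarse labels.

    Assumptions: valence lies in -1..1, arousal in 0..1, and the
    quadrant boundaries sit at valence 0 and arousal 0.5.
    """
    polarity = "positive" if valence >= 0 else "negative"
    energy = "active" if arousal >= 0.5 else "passive"
    return f"{polarity}-{energy}"


# Two distinct points collapse onto the same label: the mapping is lossy.
print(to_coarse_category(-0.8, 0.9))
print(to_coarse_category(-0.3, 0.7))
```

Because several points share a label, the original dimension values cannot be recovered from the category alone.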

Onto 2. Relationships between concepts in an emotion description

The concepts in an emotion description are usually not independent, but are related to one another. For example, emotion words may form a hierarchy, as suggested by prototype theories of emotions: Shaver et al. (1987) classified cheerfulness, zest, contentment, pride, optimism, enthrallment and relief as different kinds of joy; irritation, exasperation, rage, disgust, envy and torment as different kinds of anger; etc.

Such structures, be they motivated by emotion theory or by application-specific requirements, may be an important complement to the representations in an Emotion Markup Language. In particular, they would allow for a mapping from a larger set of categories to a smaller set of higher-level categories.
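
The mapping from a larger set of categories to higher-level ones can be sketched with a simple parent table. The table below contains only the groupings quoted above from Shaver et al. (1987) and is not the complete hierarchy; the names KINDS, PARENT and to_higher_level are assumptions of this example.

```python
# Groupings quoted in the text from Shaver et al. (1987); incomplete by design.
KINDS = {
    "joy": ["cheerfulness", "zest", "contentment", "pride",
            "optimism", "enthrallment", "relief"],
    "anger": ["irritation", "exasperation", "rage", "disgust",
              "envy", "torment"],
}

# Invert the table so each specific label points to its higher-level category.
PARENT = {child: parent for parent, children in KINDS.items() for child in children}


def to_higher_level(label):
    """Return the higher-level category, or the label itself if none is known."""
    return PARENT.get(label, label)


print(to_higher_level("rage"))
print(to_higher_level("relief"))
print(to_higher_level("joy"))
```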

4. References

Douglas-Cowie, E., et al. (2006). HUMAINE deliverable D5g: Mid Term Report on Database Exemplar Progress. http://emotion-research.net/deliverables/D5g%20final.pdf

Ekman, P., Friesen, W. V. and Hager, J. C. (2002). Facial Action Coding System. The Manual on CD ROM. Research Nexus division of Network Information Research Corporation.

Fontaine, J., Scherer, K., Roesch, E. and Ellsworth, P. (2007). The world of emotions is not two-dimensional. Psychological Science, 18(12), 1050-1057.

Frijda, N. (1986). The Emotions. Cambridge: Cambridge University Press.

Gebhard, P., Kipp, M., Klesen, M., and Rist, T. (2003). Adding the Emotional Dimension to Scripting Character Dialogues. In: Proceedings of the 4th International Working Conference on Intelligent Virtual Agents (IVA'03), 48-56, Kloster Irsee, Germany.

Gross, J. J. (2001). "Emotion regulation in adulthood: timing is everything." Current Directions in Psychological Science 10(6). http://www-psych.stanford.edu/~psyphy/Pdfs/2001%20Current%20Directions%20in%20Psychological%20Science%20-%20Emo.%20Reg.%20in%20Adulthood%20Timing%20.pdf

Ortony, A., Clore, G. L. and Collins, A. (1988). The cognitive structure of emotions. Cambridge University Press, New York.

Scherer, K. R. et al. (2004). Preliminary plans for exemplars: Theory. HUMAINE deliverable D3c. http://emotion-research.net/deliverables/D3c.pdf

Shaver, P., Schwartz, J., Kirson, D., and O'Connor, C. (1987). Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52:1061-1086.

Vilhjalmsson, H., Cantelmo, N., Cassell, J., Chafai, N. E., Kipp, M., Kopp, S., Mancini, M., Marsella, S., Marshall, A. N., Pelachaud, C., Ruttkay, Z., Thórisson, K. R., van Welbergen, H. and van der Werf, R. J. (2007). The Behavior Markup Language: Recent Developments and Challenges. 7th International Conference on Intelligent Virtual Agents (IVA'07), Paris, France.