W3C

Emotion Markup Language (EmotionML) 1.0

W3C Working Draft 29 July 2010

This version:
http://www.w3.org/TR/2010/WD-emotionml-20100729/
Latest version:
http://www.w3.org/TR/emotionml/
Previous version:
http://www.w3.org/TR/2009/WD-emotionml-20091029/
Editor:
Marc Schröder (DFKI GmbH)
Authors:
(in alphabetic order)
Paolo Baggia (Loquendo, S.p.A.)
Felix Burkhardt (Deutsche Telekom AG)
Alessandro Oltramari (CNR)
Catherine Pelachaud (Telecom ParisTech)
Christian Peter (Fraunhofer Gesellschaft)
Enrico Zovato (Loquendo, S.p.A.)

Abstract

As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the Second Public Working Draft of the Emotion Markup Language 1.0 specification, published on 29 July 2010. It addresses many of the issues raised in the First Public Working Draft of 29 October 2009. Changes from the First Public Working Draft can be found in Appendix A.

This document was developed by the Multimodal Interaction Working Group. The Working Group expects to advance this Working Draft to Recommendation Status.

Please send comments about this document to www-multimodal@w3.org (with public archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Conventions of this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

1 Introduction

Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.

This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.

The report is work in progress. Issue notes are used to describe open questions as well as available choices.

1.1 Reasons for defining an Emotion Markup Language

As for any standard format, the first and main goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.

Use cases for EmotionML can be grouped into three broad types:

  1. Manual annotation of material involving emotionality, such as annotation of videos, of speech recordings, of faces, of texts, etc;
  2. Automatic recognition of emotions from sensors, including physiological sensors, speech recordings, facial expressions, etc., as well as from multi-modal combinations of sensors;
  3. Generation of emotion-related system responses, which may involve reasoning about the emotional implications of events, emotional prosody in synthetic speech, facial expressions and gestures of embodied agents or robots, the choice of music and colors of lighting in a room, etc.

Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modelling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.

Concrete examples of existing technology that could apply EmotionML exist in all three of these areas; the Emotion Incubator Group has listed 39 individual use cases for an EmotionML.

A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to make the same mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made irrespective of whether these states are the most appropriate for the intended application. Crucially, the available alternatives that have been developed in the affective science literature are not sufficiently known, resulting in dead-end situations after the initial steps of work. Careful consideration of the states to study and of the representations for describing them can help avoid such situations.

EmotionML makes scientific concepts of emotions practically applicable. This can help potential users identify suitable representations for their respective applications.

1.2 The challenge of defining a generally usable Emotion Markup Language

Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions in focus. In short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; that emotions in their entirety can be described in terms of categories or a small number of dimensions; that emotions have an intensity; and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.

Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, while allowing users to "plug in" vocabularies that they consider appropriate for their work. A central repository of such vocabularies can serve as a recommended starting point; where that seems inappropriate, users can create their own custom vocabularies.

An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.

For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed; to propose reasonable default options for every choice; and, ultimately, to propose mapping mechanisms where that is possible and meaningful.

1.3 Glossary of terms

Terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.

Action tendency
Emotions have a strong influence on the motivational state of a subject. Emotion theory associates emotions with a small set of so-called action tendencies, e.g. avoidance (related to fear) or rejecting (related to disgust). Action tendencies can be viewed as a link between the outcome of an appraisal process and actual actions.
Affect / Affective state
In the scientific literature, the term "affect" is often used as a general term covering a range of phenomena called "affective states", including emotions, moods, attitudes, etc. Proponents of the term consider it to be more generic than "emotion", in the sense that it covers both acute and long-term, specific and unspecific states. In this report, the term "affect" is avoided so that the scope of the intended markup language is more easily accessible to the non-expert; the term "affective state" is used interchangeably with "emotion-related state".
Appraisal
The term "appraisal" is used in the scientific literature to describe the evaluation process leading to an emotional response. Triggered by an "emotion-eliciting event", an individual carries out an automatic, subjective assessment of the event, in order to determine the relevance of the event to the individual. This assessment is carried out along a number of "appraisal dimensions" such as the novelty, pleasantness or goal conduciveness of the event.
Emotion
In this report, the term "emotion" is used in a very broad sense, covering both intense and weak states, short and long term, with and without event focus. This meaning is intended to reflect the understanding of the term "emotion" by the general public. In the scientific literature on emotion theories, the term "emotion" or "fullblown emotion" refers to intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight". This reading of the term seems inappropriate for the vast majority of human-machine interaction contexts, in which more subtle states dominate; therefore, where this reading is intended, the term "fullblown emotion" is used in this report.
Emotion-related state
A cover term for the broad range of phenomena intended to be covered by this specification. In the scientific literature, several kinds of emotion-related or affective states are distinguished; see Emotions and related states in the final report of the Emotion Incubator Group.
Emotion dimensions
A small number of continuous scales describing the most basic properties of an emotion. Often three dimensions are used: valence (sometimes named pleasure), arousal (or activity/activation), and potency (sometimes called control, power or dominance). However, sometimes two, or more than three dimensions are used.
Fullblown emotion
Intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight".

2 Elements of Emotion Markup

The following sections describe the syntax of the main elements of EmotionML. The specification is not yet complete. Feedback is highly appreciated.

2.1 Document structure

2.1.1 Document root: The <emotionml> element

Annotation <emotionml>
Definition The root element of an EmotionML document.
Children The element MUST contain one or more <emotion> elements. It MAY contain a single <info> element.
Attributes
  • Required:
    • Namespace declaration for EmotionML, see EmotionML namespace.
    • version indicates the version of the specification to be used for the document and MUST have the value "1.0".
  • Optional:
    • category-set, dimension-set, appraisal-set and action-tendency-set indicate default emotion vocabularies used in an EmotionML document. Any of these attributes used at document level makes optional the use of the same attribute in any <emotion> elements within the document; the document-level attribute determines the emotion vocabularies used for any <emotion> elements for which the respective attribute is not locally specified. The attributes are of type xsd:anyURI and MUST point to a definition of an emotion vocabulary as specified in Defining vocabularies for representing emotions.
Occurrence This is the root element -- it cannot occur as a child of any other EmotionML elements.

<emotionml> is the root element of a standalone EmotionML document. It wraps a number of <emotion> elements into a single document. It may contain a single <info> element, providing document-level metadata.

The <emotionml> element MUST define the EmotionML namespace.

Example:

<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
 ...
</emotionml>

or

<em:emotionml version="1.0" xmlns:em="http://www.w3.org/2009/10/emotionml">
 ...
</em:emotionml>

Note: One of the envisaged uses of EmotionML is in the context of other markup languages. In such cases, there will be no <emotionml> root element; instead, <emotion> elements will be used directly in the other markup -- see Examples of possible use with other markup languages.

2.1.2 A single emotion annotation: The <emotion> element

Annotation <emotion>
Definition This element represents a single emotion annotation.
Children All children are optional. However, at least one of <category>, <dimension>, <appraisal>, <action-tendency> MUST occur.
ISSUE-72: should <intensity> be included in this list? Does it make sense to state the intensity of an emotion but not its nature?

If present, the following child elements can occur only once: <category>; <intensity>; <info>.

If present, the following child elements may occur one or more times: <dimension>; <appraisal>; <action-tendency>; <reference>; <modality>.

There are no constraints on the combinations of children that are allowed.

Attributes
  • Required:
    • category-set, dimension-set, appraisal-set and action-tendency-set indicate the emotion vocabularies used in this <emotion>. The attributes are of type xsd:anyURI and MUST point to a definition of an emotion vocabulary as specified in Defining vocabularies for representing emotions. The attributes are required as follows:
      • if the <emotion> element has a child element <category>, the category-set attribute is required and must point to the definition of a category vocabulary;
      • if the <emotion> element has a child element <dimension>, the dimension-set attribute is required and must point to the definition of a dimension vocabulary;
      • if the <emotion> element has a child element <appraisal>, the appraisal-set attribute is required and must point to the definition of an appraisal vocabulary;
      • if the <emotion> element has a child element <action-tendency>, the action-tendency-set attribute is required and must point to the definition of an action tendency vocabulary.

      The attribute values default to the values of any attributes of the same name on an enclosing <emotionml> element.

    • version indicates the version of the specification to be used for the <emotion> and its descendants. Its value defaults to "1.0".
  • Optional:
    • id, a unique identifier for the emotion, of type xsd:ID.
    • start and end denote the absolute starting and ending times at which an emotion or related state happened.
Occurrence as a child of <emotionml>, or in any markup using EmotionML.

The <emotion> element represents an individual emotion annotation. No matter how simple or complex its substructure is, it represents a single statement about the emotional content of some annotated item. Where several statements about the emotion in a certain context are to be made, several <emotion> elements MUST be used. See Examples of emotion annotation for illustrations of this issue.

An <emotion> element MAY have an id attribute, allowing for a unique reference to the individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency>, <intensity> or their children directly. For this reason, these elements do not allow for an id attribute.

Whereas it is possible to use <emotion> elements in a standalone <emotionml> document, a typical use case is expected to be embedding an <emotion> into some other markup -- see Examples of possible use with other markup languages.
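
A minimal sketch of such embedding, assuming a purely hypothetical host language and namespace (actual integration patterns are shown in Examples of possible use with other markup languages):

<host:message xmlns:host="http://www.example.com/host-language"
              xmlns:emo="http://www.w3.org/2009/10/emotionml">
    <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
        <emo:category name="satisfaction"/>
    </emo:emotion>
    Thank you, that worked well!
</host:message>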

2.2 Representations of emotions and related states

2.2.1 The <category> element

Annotation <category>
Definition Description of an emotion or a related state using a single category.
Children None
Attributes
  • Required:
    • name, the name of the category, which must be contained in the set of categories identified in the enclosing <emotion> element's category-set attribute.
  • Optional:
    • confidence, the annotator's confidence that the annotation is correct.
Occurrence A single <category> MAY occur as a child of <emotion>.

<category> describes an emotion or a related state in terms of a single category name, given as the value of the name attribute. The name MUST belong to a clearly-identified set of category names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the category-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.

Examples:

In the following example, the emotion category "satisfaction" is being annotated; it must be contained in the definition of an emotion category vocabulary located at http://www.example.com/emotion/category/everyday-emotions.xml.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <category name="satisfaction"/>
</emotion>

The following is an annotation of an interpersonal stance "distant", which must be defined in the category set at the URI given in the category-set attribute:

<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml">
    <category name="distant"/>
</emotion>

2.2.2 The <dimension> element

Annotation <dimension>
Definition One or more <dimension> elements jointly describe an emotion or a related state according to an emotion dimension vocabulary.
Children Optionally, <dimension> MAY have a <trace> child element.
Attributes
  • Required:
    • name, the name of the dimension, which must be contained in the set of dimensions identified in the enclosing <emotion> element's dimension-set attribute.
  • Optional:
    • value, the (constant) scale value of this dimension.
    • confidence, the annotator's (constant) confidence that the annotation given for this dimension is correct.
Occurrence <dimension> elements occur as children of <emotion>. For any given dimension name in the set, zero or one occurrences are allowed within an <emotion> element.

One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions. The names of the emotion dimensions MUST belong to a clearly-identified set of dimension names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of values that can be used as values of the name attribute is indicated in the dimension-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.

There are no constraints regarding the order of the <dimension> elements within an <emotion> element.

Any given dimension is either unipolar or bipolar; its value attribute MUST contain a Scale value.

A dimension element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively.

Examples:

One of the most widespread sets of emotion dimensions (sometimes used under different names) is the combination of valence, arousal and potency. Assuming that arousal and potency are unipolar scales with typical values between 0 and 1, and valence is a bipolar scale with typical values between -1 and 1, the following example describes a state of rather low arousal, very positive pleasure, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation:

<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml">
    <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal -->
    <dimension name="pleasure" value="0.9"/><!-- very high positive valence -->
    <dimension name="dominance" value="0.8"/><!-- relatively high potency    -->
</emotion>

In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single, bipolar dimension "friendliness".

<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml">
    <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person -->
</emotion>
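
Where a dimension's value changes over time, a <trace> child element (see section 2.5.2) can be used instead of the value attribute. The following sketch reuses the PAD vocabulary assumed above:

<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml">
    <dimension name="arousal">
        <trace freq="10Hz" samples="0.2 0.3 0.4 0.6 0.8 0.8 0.7"/><!-- arousal rising, then leveling off -->
    </dimension>
</emotion>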

2.2.3 The <appraisal> element

Annotation <appraisal>
Definition One or more <appraisal> elements jointly describe an emotion or a related state according to an emotion appraisal vocabulary.
Children Optionally, <appraisal> MAY have a <trace> child element.
Attributes
  • Required:
    • name, the name of the appraisal, which must be contained in the set of appraisals identified in the enclosing <emotion> element's appraisal-set attribute.
  • Optional:
    • value, the (constant) scale value of this appraisal.
    • confidence, the annotator's (constant) confidence that the annotation given for this appraisal is correct.
Occurrence <appraisal> elements occur as children of <emotion>. For any given appraisal name in the set, zero or one occurrences are allowed within an <emotion> element.

One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals. The names of the appraisals MUST belong to a clearly-identified set of appraisal names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of values that can be used as values of the name attribute is indicated in the appraisal-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.

There are no constraints regarding the order of the <appraisal> elements within an <emotion> element.

Any given appraisal is either unipolar or bipolar; its value attribute MUST contain a Scale value.

An appraisal element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively.

Examples:

One of the most widespread sets of emotion appraisals is the set proposed by Klaus Scherer, namely novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Assuming, for example, that novelty is a unipolar scale and intrinsic pleasantness is a bipolar scale, the following example describes a state arising from the evaluation of an unpredicted and quite unpleasant event:

<emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml">
    <appraisal name="novelty" value="0.8"/>
    <appraisal name="intrinsic-pleasantness" value="0.2"/>
</emotion>

In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining a single, bipolar appraisal "likelihood".

<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml">
    <appraisal name="likelihood" value="0.8"/><!-- a very predictable event -->
</emotion>

2.2.4 The <action-tendency> element

Annotation <action-tendency>
Definition One or more <action-tendency> elements jointly describe an emotion or a related state according to an emotion action tendency vocabulary.
Children Optionally, <action-tendency> MAY have a <trace> child element.
Attributes
  • Required:
    • name, the name of the action tendency, which must be contained in the set of action tendencies identified in the enclosing <emotion> element's action-tendency-set attribute.
  • Optional:
    • value, the (constant) scale value of this action tendency.
    • confidence, the annotator's (constant) confidence that the annotation given for this action tendency is correct.
Occurrence <action-tendency> elements occur as children of <emotion>. For any given action tendency name in the set, zero or one occurrences are allowed within an <emotion> element.

One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action-tendencies. The names of the action-tendencies MUST belong to a clearly-identified set of action-tendency names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of values that can be used as values of the name attribute is indicated in the action-tendency-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.

There are no constraints regarding the order of the <action-tendency> elements within an <emotion> element.

Any given action tendency is either unipolar or bipolar; its value attribute MUST contain a Scale value.

An <action-tendency> element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively.

Examples:

One well-known model of action tendencies is that of Nico Frijda. This model uses a number of action tendencies that are low-level, diffuse behaviors from which more concrete actions can be derived. An annotation of someone attempting to attract a person they like by being confident, strong and attentive might look like this:

<emotion action-tendency-set="http://www.example.com/emotion/action/frijda.xml">
    <action-tendency name="approach" value="0.7"/><!-- get close -->
    <action-tendency name="avoid" value="0.0"/>
    <action-tendency name="being-with" value="0.8"/><!-- be happy -->
    <action-tendency name="attending" value="0.7"/><!-- pay attention -->
    <action-tendency name="rejecting" value="0.0"/>
    <action-tendency name="non-attending" value="0.0"/>
    <action-tendency name="agonistic" value="0.0"/>
    <action-tendency name="interrupting" value="0.0"/>
    <action-tendency name="dominating" value="0.7"/><!-- be assertive -->
    <action-tendency name="submitting" value="0.0"/>
</emotion>

In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a robot working in a factory, using a custom set of action tendencies.

<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
    <action-tendency name="charge-battery" value="0.9"/><!-- need to charge battery soon -->
    <action-tendency name="pickup-boxes" value="0.3"/><!-- feeling tired, avoid work -->
</emotion>

2.2.5 The <intensity> element

Annotation <intensity>
Definition Represents the intensity of an emotion.
Children Optionally, an <intensity> element MAY have a <trace> child element.
Attributes
  • Required:
    • (none)
  • Optional:
    • value, the (constant) scale value of the intensity.
    • confidence, the annotator's confidence that the annotation given for this intensity is correct.
Occurrence A single <intensity> element MAY occur as a child of <emotion>.

<intensity> represents the intensity of an emotion. The <intensity> element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of scale values, respectively. <intensity> is a unipolar scale.

A typical use of intensity is in combination with <category>. However, in some emotion models (e.g. Gebhard, 2005), the emotion's intensity can also be used in combination with a position in emotion dimension space, that is in combination with <dimension> elements. Therefore, intensity is specified independently of <category>.

Example:

A weak surprise could accordingly be annotated as follows.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <intensity value="0.2"/>
    <category name="surprise"/>
</emotion>

The fact that intensity is represented by an element makes it possible to add meta-information. For example, it is possible to express a high confidence that the intensity is low, but a low confidence regarding the emotion category, as shown in the last example in the description of confidence.

2.3 Meta-information

2.3.1 The confidence attribute

Annotation confidence
Definition A representation of the degree of confidence or probability that a certain element of the representation is correct.
Occurrence An optional attribute of <category>, <dimension>, <appraisal> and <action-tendency> elements and of <intensity>.

ISSUE-134: Should <emotion> have a confidence attribute?

Confidence MAY be indicated separately for each of the Representations of emotions and related states. For example, the confidence that the <category> is assumed correctly is independent from the confidence that its <intensity> is correctly indicated.

Following the tradition of statistics, confidence is given as a value in the interval from 0 to 1, resembling a probability. This range is more intuitive than, e.g., (logarithmic) score values. In this sense, confidence is a unipolar Scale value.

Examples:

In the following, one simple example is provided for each element that MAY carry a confidence attribute.

The first example indicates a very high confidence that surprise is the emotion to annotate.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <category name="surprise" confidence="0.95"/> 
</emotion>

The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive valence may or may not be correct.

<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml">
    <dimension name="arousal" value="0.8" confidence="0.9"/>
    <dimension name="pleasure" value="0.6" confidence="0.3"/>
</emotion>

Finally, an example for the case of <intensity>: there is high confidence that the emotion has a low intensity.

<emotion>
    <intensity value="0.1" confidence="0.8"/>
</emotion>

Note that an emotion annotation can combine the above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <intensity value="0.1" confidence="0.8"/>
    <category name="boredom" confidence="0.1"/>
</emotion>

2.3.2 The modality attribute

Annotation modality
Definition The modality, or list of modalities, through which the emotion is expressed. An attribute of type xsd:nmtokens which contains a space-delimited set of values from an open set including: {face, voice, body, text, ...}.
Occurrence An optional attribute of <emotion> elements.

The modality attribute describes the modality by which an emotion is expressed, not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The attribute is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.

ISSUE-148: The list of pre-defined values of modality needs to be extended and refined.
ISSUE-150: Sensor type by which a modality was observed cannot be annotated.

With the current representation of modality, it is not possible to indicate the type of sensor through which the given modality was observed. For example, a face may show no emotion with a normal optical camera, but an emotion may be detected from the same face using an infrared camera.

It remains to be considered to what extent an optional annotation should be added to represent this kind of sensor information.

Example:

In the following example the emotion is expressed through the voice.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml" 
         modality="voice">
    <category name="satisfaction"/>
</emotion>

In case of multimodal expression of an emotion, a list of space-separated modalities can be indicated in the modality attribute, as in the following example in which the two values "face" and "voice" are used.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml" 
         modality="face voice">
    <category name="satisfaction"/>
</emotion>

See also the examples in sections 5.1.2 Automatic recognition of emotions, 5.1.3 Generation of emotion-related system behavior and 5.2.3 Use with SMIL.

2.3.3 The <info> element

Annotation <info>
Definition This element can be used to annotate arbitrary metadata.
Children One or more elements in a different namespace than the EmotionML namespace, providing metadata.
Attributes
  • Optional:
    • id, a unique identifier for the info element, of type xsd:ID.
Occurrence A single <info> element MAY occur as a child of the <emotionml> root tag to indicate global metadata, i.e. annotations that are valid for the document scope; furthermore, a single <info> element MAY occur as a child of each <emotion> element to indicate local metadata that is only valid for that <emotion> element.

This element can contain arbitrary XML data in a different namespace (one option could be [RDF] data), either on a document global level or on a local "per annotation element" level.

Examples:

In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
        xmlns:classifiers="http://www.example.com/meta/classify/"
        xmlns:origin="http://www.example.com/meta/local/"
        category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <info>
        <classifiers:classifier classifiers:name="GMM"/>
    </info>

    <emotion>
        <info><origin:localization value="bavarian"/></info>
        <category name="joy"/>
    </emotion>

    <emotion>
        <info><origin:localization value="swabian"/></info>
        <category name="sadness"/>
    </emotion>
</emotionml>

2.4 References and time

2.4.1 The <reference> element

Annotation <reference>
Definition References may be used to relate the emotion annotation to the "rest of the world", more specifically to the emotional expression, the experiencing subject, the trigger, and the target of the emotion.
Children None
Attributes
  • Required:
    • uri, a URI identifying the actual reference target. The URI MAY be extended by a media fragment, as explained in section 2.4.2.2.
  • Optional:
    • role, the type of relation between the emotion and the external item referred to; one of "expressedBy" (default), "experiencedBy", "triggeredBy", "targetedAt".
    • media-type, an attribute of type xsd:string holding the MIME type of the data that the uri attribute points to.
Occurrence Multiple <reference> items MAY occur as children of <emotion>.

A <reference> element provides a link to media as a URI [RFC3986]. The semantics of a reference are described by its role attribute, which MUST have one of four values: "expressedBy" (the referenced item expresses the emotion), "experiencedBy" (the referenced item is the subject experiencing the emotion), "triggeredBy" (the referenced item caused the emotion), or "targetedAt" (the referenced item is what the emotion is directed at).

For resources representing a period of time, start and end time MAY be denoted by use of the media fragments syntax, as explained in section 2.4.2.2.

There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.

Examples:

The following example illustrates references to two different URIs having different roles with respect to the emotion: one reference points to the emotion's expression, e.g. a video clip showing a user expressing the emotion; the other reference points to the trigger that caused the emotion, e.g. another video clip, seen by the person, which elicited the expressed emotion. Note that the media-type attribute can be used to differentiate between different media types such as audio, video, text, etc.

<emotion>
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>
</emotion>

Several references may occur as children of one <emotion> element, even with the same role; for example, the following annotation refers to a portion of a video and to physiological sensor data, both of which expressed the emotion:

<emotion ...>
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/>
</emotion>

It is possible to explicitly indicate the MIME type of the item that the reference refers to:

<emotion ...>
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13"
                media-type="video/mp4"/>
</emotion>

2.4.2 Timestamps

2.4.2.1 Absolute time
Annotation start, end
Definition Attributes to denote the starting and ending absolute times. They are of type xsd:nonNegativeInteger and indicate the number of milliseconds since 1 January 1970 00:00:00 GMT.
Occurrence The attributes MAY occur inside an <emotion> element.

start and end denote the absolute starting and ending times at which an emotion or related state happened. This might be used for example with an "emotional diary" application. These attributes MAY be used with an <emotion> element, and MUST be of type xsd:nonNegativeInteger.

Examples:

In the following example, the emotion category "surprise" is annotated, immediately followed by the category "joy". The start and end attributes determine for each emotion element the absolute beginning and ending times.

<emotion start="1268647200" end="1268647330">
    <category name="surprise"/>
</emotion>
<emotion start="1268647331" end="1268647400">
    <category name="joy"/>
</emotion>

The end value MUST be greater than or equal to the start value.

The ECMAScript Date object's getTime() function is a way to determine the absolute time.
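
For illustration, a minimal ECMAScript sketch (written here as TypeScript) of computing such values; the emotion boundaries are hypothetical:

// Absolute times in milliseconds since 1 January 1970 00:00:00 GMT,
// as required for the start and end attributes.
const end: number = Date.now();      // hypothetical: the episode ends now
const start: number = end - 130000;  // hypothetical: it began 130 seconds earlier
// Serialize into an <emotion> element (namespace handling omitted for brevity).
const markup: string = `<emotion start="${start}" end="${end}">
    <category name="surprise"/>
</emotion>`;
console.log(markup);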

2.4.2.2 Timing in media
Annotation URI fragment: t
Definition Attributes to denote the start and end points of an annotation in a media stream. Allowed values MUST conform to the Media Fragments specification [Media Fragments].
Occurrence The URI fragment MAY occur in the uri attribute of a <reference> element.

Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval, whereas the end time is considered to be the first time point that is not part of the interval. If only a single number is given, it is the begin time.
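
For illustration, assuming a source medium of 30 seconds duration, the following fragment values denote:

t=3,9    the interval [3s, 9s)
t=,9     the interval [0s, 9s)
t=3      the interval [3s, 30s), i.e. from 3 seconds to the end of the medium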

Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.

This MAY be used with a <reference> element.

Examples:

In the following example, the emotion category "joy" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.

<emotion>
    <category name="joy"/>
    <reference uri="myAudio.wav#t=3,9"/>
</emotion>

In the following example, the emotion category "joy" is displayed in a video file called "myVideo.avi" in SMPTE values, resulting in the time interval [120,121.5).

<emotion>
    <category name="joy"/>
    <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/>
</emotion>

A last example states this for a video file in real-world clock time code, as a one-minute interval on 26 July 2009 starting at 11:19:01.

<emotion>
    <category name="joy"/>
    <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/>
</emotion>

2.5 Scale values

Scale values are needed to represent content in <dimension>, <appraisal> and <action-tendency> elements, as well as in <intensity> and confidence.

Representations of scale values can vary along two axes: they can be static (given in a value attribute) or dynamic (given in a <trace> element), and the underlying scale can be unipolar or bipolar.

2.5.1 The value attribute

Annotation value
Definition Representation of a static scale value.
Occurrence An optional attribute of <dimension>, <appraisal> and <action-tendency> elements and of <intensity>; these elements MUST either contain a value attribute or a <trace> element.

The value attribute represents a static scale value of the enclosing element.

Conceptually, each <dimension>, <appraisal> and <action-tendency> element is either unipolar or bipolar. The definition of a set of dimensions, appraisals or action tendencies MUST define, for each item in the set, whether it is unipolar or bipolar.

<intensity> is a unipolar scale.

Legal values: For both unipolar and bipolar scales, legal values are a floating-point value from the interval [0;1].

See also ISSUE-136 on defining a neutral point depending on scale type.

Examples of the value attribute can be found in the context of the <dimension>, <appraisal> and <action-tendency> elements and of <intensity>.

2.5.2 The <trace> element

Annotation <trace>
Definition Representation of the time evolution of a dynamic scale value.
Children None
Attributes
  • Required:
    • freq, a sampling frequency in Hz.
    • samples, a space-separated list of numeric scale values representing the scale value of the enclosing element as it changes over time.
Occurrence An optional child element of <dimension>, <appraisal> and <action-tendency> elements and of <intensity>; these elements MUST either contain a value attribute or a <trace> element.

A <trace> element represents the time course of a numeric scale value. It cannot be used for discrete scale values.

The freq attribute indicates the sampling frequency at which the values listed in the samples attribute are given.

NOTE: The <trace> representation requires a periodic sampling of values. In order to represent values that are sampled aperiodically, separate <emotion> annotations with appropriate timing information and individual value attributes may be used.

Examples:

The following example illustrates the use of a trace to represent an episode of fear during which intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.

<emotion category-set="http://www.example.com/emotion/category/ekman-big-six.xml">
    <category name="fear"/>
    <intensity>
        <trace freq="10Hz" samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/>
    </intensity>
</emotion>

The following example combines a trace of the appraisal "novelty" with a global confidence that the values represent the facts properly. There is a sudden peak of novelty; the annotator is reasonably certain that the annotation is correct:

<emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml">
    <appraisal name="novelty" confidence="0.75">
        <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/>
    </appraisal>
</emotion>
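
Since only the sampling frequency and the raw samples are transmitted, a processor has to derive the time of each sample itself. The following is a minimal sketch in TypeScript, assuming the freq attribute carries an "Hz" suffix as in the examples above:

// Expand a <trace> into (time, value) pairs, relative to the start
// of the annotated interval.
function expandTrace(freq: string, samples: string): Array<{ timeMs: number; value: number }> {
    const hz: number = parseFloat(freq);  // parseFloat ignores the trailing "Hz"
    const periodMs: number = 1000 / hz;   // one sample every 1000/hz milliseconds
    return samples
        .trim()
        .split(/\s+/)
        .map((s, i) => ({ timeMs: i * periodMs, value: parseFloat(s) }));
}

// Example: the first three samples of the fear trace above
// yield 0ms: 0.1, 100ms: 0.1, 200ms: 0.15.
console.log(expandTrace("10Hz", "0.1 0.1 0.15"));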

3 Defining vocabularies for representing emotions

EmotionML markup MUST refer to one or more vocabularies to be used for representing emotion-related states. Due to the lack of agreement in the community, the EmotionML specification does not provide a single default set which would apply if no set is indicated. Instead, the user MUST explicitly state the value set used.

ISSUE-105: How to define the actual vocabularies to use for <category>, <dimension>, <appraisal> and <action-tendency> remains to be specified. A suitable method may be to define an XML format in which these sets can be defined; the requirements that such a format MUST fulfill, and the optional features that it SHOULD allow for, remain to be worked out.

ISSUE-136: In the vocabulary definition for unipolar vs. bipolar scales, is the neutral point on the scale always predictable from the type of scale? Think of some examples.

3.1 Centrally defined default vocabularies

ISSUE-106: The EmotionML specification SHOULD come with a carefully-chosen selection of default vocabularies, representing a suitably broad range of emotion-related states and use cases. Advice from the affective sciences is being sought to obtain a balanced set of default vocabularies.

The following is a preliminary list of emotion vocabularies that can be used with EmotionML. The guiding principle for selecting "recommended" emotion vocabularies has been to select vocabularies that are either commonly used in technological contexts, or represent current emotion models from the scientific literature. Also, given the difficulty of defining mappings between emotion categories, dimensions, appraisals and action tendencies, we have included pairs or groups of vocabularies where these mappings are rather well defined. The selection is necessarily incomplete; many highly relevant emotion models are not listed here. Where other vocabularies are needed, users can write a definition as described in User-defined custom vocabularies.

ISSUE-149: The vocabularies listed in this section are preliminary. The descriptions of both the vocabularies and the individual terms are incomplete.

NOTE: Feedback on the selection of "default" emotion vocabularies in this section is highly appreciated. Please send comments to www-multimodal@w3.org (with public archive).

3.1.1 Emotion category sets

Ekman's "big six" basic emotions

These six terms are proposed by Paul Ekman (Ekman, 1972, p. 251-252) as basic emotions with universal facial expressions -- emotions that are recognized and produced in all human cultures.

Term Description
anger
disgust
fear
happiness
sadness
surprise

Everyday emotion vocabulary

These 17 terms are the result of a study by Cowie et al. (Cowie et al., 1999) investigating emotions that frequently occur in everyday life.

Term Description
affectionate
afraid
amused
angry
bored
confident
content
disappointed
excited
happy
interested
loving
pleased
relaxed
sad
satisfied
worried

OCC categories

The 22 OCC categories are proposed by Ortony, Clore and Collins (Ortony et al., 1988, p. 19) as part of their appraisal model. See also OCC appraisals below.

Term Description
admiration
anger
disappointment
distress
fear
fears-confirmed
gloating
gratification
gratitude
happy-for
hate
hope
joy
love
pity
pride
relief
remorse
reproach
resentment
satisfaction
shame

FSRE categories

The 24 FSRE categories are used in the study by Fontaine, Scherer, Roesch and Ellsworth (Fontaine et al., 2007, p. 1055) investigating the dimensionality of emotion space. See also FSRE dimensions below.

Term Description
anger
anxiety
being hurt
compassion
contempt
contentment
despair
disappointment
disgust
fear
guilt
happiness
hate
interest
irritation
jealousy
joy
love
pleasure
pride
sadness
shame
stress
surprise

Frijda's categories

This category set is included because, according to Nico Frijda's proposal (Frijda, 1986), these categories are directly related to action tendencies. See Frijda's action tendencies, below.

Term Description
anger related to action tendency 'agonistic'
arrogance related to action tendency 'approach'
desire related to action tendency 'approach'
disgust related to action tendency 'rejecting'
enjoyment related to action tendency 'being-with'
fear related to action tendency 'avoidance'
humility related to action tendency 'submitting'
indifference related to action tendency 'nonattending'
interest related to action tendency 'attending'
resignation related to action tendency 'submitting'
shock related to action tendency 'interrupting'
surprise related to action tendency 'interrupting'

3.1.2 Emotion dimension sets

Mehrabian's PAD dimensions

Mehrabian proposed a three-dimensional description of emotion in terms of Pleasure, Arousal, and Dominance (PAD; Mehrabian, 1996, p. 264).

Term Description
pleasure
arousal
dominance

FSRE dimensions

The four emotion dimensions obtained in the study by Fontaine, Scherer, Roesch and Ellsworth (Fontaine et al., 2007, p. 1051 and 1055) investigating the dimensionality of emotion space. See also FSRE categories above.

Term Description
valence also named evaluation or pleasantness
potency also named control
arousal also named activation
unpredictability

3.1.3 Appraisal sets

OCC appraisals

The following appraisals were proposed by Ortony, Clore and Collins (Ortony et al., 1988) in their appraisal model. See also OCC categories above.

Term Description
desirability relevant for event based emotions. (pleased/displeased)
praiseworthiness relevant for attribution emotions. (approving/disapproving)
appealingness relevant for attraction emotions. (liking/disliking)
desirability-for-other related to fortunes of others. Whether the event is desirable for the other.
deservingness related to fortunes of others. Whether the other "deserves" the event.
liking related to fortunes of others. Whether the other is liked or not. These distinguish between: happy-for, pity, gloating (schadenfreude), and resentment.
likelihood relevant for prospect emotions. (hope/fear)
effort relevant for prospect emotions. How much effort the individual invested in the outcome.
realization relevant for prospect emotions. The actual resulting outcome. These distinguish between: relief, disappointment, satisfaction, and fears-confirmed.
strength-of-identification relevant for attribution emotions. How strongly one identifies with the other; this distinguishes whether pride or admiration is felt.
expectation-of-deviation relevant for attribution emotions. Distinguishes whether the other is expected to act in the manner deserving of admiration or reproach. These distinguish between: pride, shame, admiration, reproach.
familiarity relevant for attraction emotions. (love/hate)

Scherer's appraisals

The following list of appraisals was proposed by Klaus Scherer as a sequence of Stimulus Evaluation Checks (SECs) in his Component Process Model of emotion (Scherer, 1984, p. 310; Scherer, 1999, p. 639).

Term Description
Novelty
suddenness
familiarity
predictability
Intrinsic pleasantness
intrinsic-pleasantness
Goal significance
relevance-person Relevance to the concerns of the person him- or herself, e.g. survival, bodily integrity, fulfillment of basic needs, self-esteem
relevance-relationship Relevance to concerns regarding relationships with others, e.g. establishment, continued existence and intactness of relationships, cohesion of social groups
relevance-social-order Relevance to social order, e.g. sense of orderliness, predictability in a social environment including fairness & appropriateness
outcome-probability
consonant-with-expectation
goal-conduciveness
urgency
Coping potential
agent-self The event was caused by the agent him- or herself
agent-other The event was caused by another person
agent-nature The event was caused by chance or by nature
cause-intentional 0: caused by negligence, 1: caused intentionally
control Is the event controllable?
power Power of the agent him- or herself
adjustment-possible Is adjustment possible to the agent's own goals?
Compatibility with standards
norm-compatibility Compatibility with external standards, such as norms or demands of a reference group
self-compatibility Compatibility with internal standards, such as the self ideal or internalized moral code

EMA appraisals

The following list of appraisals was compiled by Gratch and Marsella (Gratch & Marsella, 2004) for their EMA model.

Term Description
relevance
desirability
agency causal attribution -- who caused the event?
blame blame and credit -- part of causal attribution
likelihood
unexpectedness
urgency
ego-involvement
controllability part of coping potential
changeability part of coping potential
power part of coping potential
adaptability part of coping potential

3.1.4 Action tendency sets

Frijda's action tendencies

This set of action tendencies was proposed by Nico Frijda (Frijda, 1986), who also coined the term 'action tendency'. See also Frijda's category set, above.

Term Description
approach aimed towards access and consummatory activity, related to desire
avoidance aimed towards own inaccessibility and protection, related to fear
being-with aimed at contact and interaction, related to enjoyment
attending aimed at identification, related to interest
rejecting aimed at removal of object, related to disgust
nonattending aimed at selecting, related to indifference
agonistic aimed at removal of obstruction and regaining control, related to anger
interrupting aimed at reorientation, related to shock and surprise
dominating aimed at retained control, related to arrogance
submitting aimed at deflecting pressure, related to humility and resignation

3.2 User-defined custom vocabularies

EmotionML markup makes no syntactic difference between referring to centrally-defined default vocabularies and referring to user-defined custom vocabularies. Therefore, one option to define a custom vocabulary is to create a definition XML file in the same way as it is done for the default vocabularies.

ISSUE-107: It may be desirable to embed the definition of custom vocabularies inside an <emotionml> document, e.g. by placing the definition XML element as a child element below the document element <emotionml>.

4 Conformance

4.1 EmotionML namespace

The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.

ISSUE-108: This section is a stub. It will be filled with the proper content in a future working draft.

4.2 Use with other namespaces

The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [XML-NS10] or 1.1 [XML-NS11], depending on the version of XML being used).

ISSUE-109: This section is a stub. It will be filled with the proper content in a future working draft.

4.3 Schema validation and processor validation of EmotionML documents

The EmotionML schema is designed to validate the structural integrity of an EmotionML document or document fragment, but cannot verify whether the emotion descriptors used in the name attribute of <category>, <dimension>, <appraisal> and <action-tendency> are consistent with the vocabularies indicated in the respective category-set, dimension-set, appraisal-set and action-tendency-set attributes.

It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.
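
As an illustration of this processor responsibility, a minimal sketch in TypeScript; since the vocabulary definition format itself is not yet specified (see ISSUE-105), the vocabulary is assumed to have already been retrieved and reduced to a set of legal names:

// Check that every <category> name in a parsed <emotion> is contained in the
// declared category vocabulary; parsing and vocabulary retrieval are assumed
// to have happened elsewhere.
interface EmotionAnnotation {
    categorySet: string;      // value of the category-set attribute
    categoryNames: string[];  // name attributes of the <category> children
}

function checkCategoryNames(
    emotion: EmotionAnnotation,
    vocabularies: Map<string, Set<string>>  // vocabulary URI -> legal names
): string[] {
    const legal = vocabularies.get(emotion.categorySet);
    if (!legal) {
        return [`unknown vocabulary: ${emotion.categorySet}`];
    }
    return emotion.categoryNames
        .filter(name => !legal.has(name))
        .map(name => `name "${name}" is not defined in ${emotion.categorySet}`);
}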

5 Examples (informative)

5.1 Examples of emotion annotation

5.1.1 Manual annotation of emotional material

Annotation of static images

An image is annotated with several emotion categories at the same time, but with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml">
   <info>
      <meta:media-type>image</meta:media-type>
      <meta:media-id>disgust</meta:media-id>
      <meta:media-set>JACFEE-database</meta:media-set>
      <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://www.davidmatsumoto.info/Articles/2004_hall_and_matsumoto.pdf
      </meta:doc>
   </info>

   <emotion>
       <category name="Disgust"/>
       <intensity value="0.82"/>
   </emotion>
   <emotion>
       <category name="Contempt"/>
       <intensity value="0.35"/>
   </emotion>
   <emotion>
       <category name="Anger"/>
       <intensity value="0.12"/>
   </emotion>
   <emotion>
       <category name="Surprise"/>
       <intensity value="0.53"/>
   </emotion>
</emotionml>

Annotation of videos

Example 1: Annotation of a whole video: several emotions are annotated with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/humaine-database-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ed1_4</meta:media-name>
        <meta:media-set>humaine database</meta:media-set>
        <meta:coder-set>JM-AB-UH</meta:coder-set>
    </info>
    <emotion>
        <category name="Amusement"/>
        <intensity value="0.52"/>
    </emotion>
    <emotion>
        <category name="Irritation"/>
        <intensity value="0.63"/>
    </emotion>
    <emotion>
        <category name="Relaxed"/>
        <intensity value="0.02"/>
    </emotion>
    <emotion>
        <category name="Frustration"/>
        <intensity value="0.87"/>
    </emotion>
    <emotion>
        <category name="Calm"/>
        <intensity value="0.21"/>
    </emotion>
    <emotion>
        <category name="Friendliness"/>
        <intensity value="0.28"/>
    </emotion>
</emotionml>

Example 2: Annotation of a video segment, where two emotions are annotated for the same timespan.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/emotv-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ext-03</meta:media-name>
        <meta:media-set>EmoTV</meta:media-set>
        <meta:coder>4</meta:coder>
    </info>

    <emotion>
        <category name="irritation"/>
        <intensity value="0.46"/>
        <reference uri="file:ext03.avi?t=3.24,15.4">
    </emotion>
    <emotion>
        <category name="despair"/>
        <intensity value="0.48"/>
        <reference uri="file:ext03.avi?t=3.24,15.4"/>
    </emotion>
</emotionml>

5.1.2 Automatic recognition of emotions

This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.

It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (the absolute start time is 1006526160000 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.

The next entry of observed emotions occurs about 6 minutes later (the absolute start time is 1006526520000 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a brief episode of anger; for the visual and IR cameras the signal remained below their respective thresholds, so they contribute no entry.

For simplicity, all devices use categorical annotations and the same set of categories. It would obviously be possible, and is even likely, that devices from different manufacturers would provide their data annotated with different emotion vocabularies; a sketch of this variant follows the example below.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
 ...
<emotion start="1006526160" modality="face">
  <!--the first modality detects excitement.
      It is a camera observing the face. An URI to the database
      (a dedicated port at the server) is provided to access the
      video stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/#t=26,98"/>
</emotion>

<emotion start="1006526160" modality="facial-skin-color">
  <!--the second modality detects anger. It is an IR camera
      observing the face. An URI to the database (a dedicated port
      at the server) is provided to access the video stream.-->
  <category name="angry"/>
  <reference uri="http://www.example.com/#t=23,108"/>
</emotion>

<emotion start="1006526160" modality="physiology">
  <!--the third modality detects excitement again. It is a
      wearable device monitoring physiological changes in the
      body. An URI to the database (a dedicated port at the
      server) is provided to access the data stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/#t=19,101"/>
</emotion>

<emotion start="1006526520" modality="physiology">
  <category name="angry"/>
  <reference uri="http://www.example.com/#t=2,6"/>
</emotion>
 ...
</emotionml>
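
If the devices did use different vocabularies, each <emotion> element could carry its own category-set attribute. The following sketch illustrates this; the vendor vocabulary URIs and the category name "aroused" are purely illustrative:

<emotion start="1006526160000" modality="face"
    category-set="http://www.example.com/vendorA/face-emotions.xml">
  <category name="excited"/>
</emotion>

<emotion start="1006526160000" modality="physiology"
    category-set="http://www.example.com/vendorB/physiology-emotions.xml">
  <category name="aroused"/>
</emotion>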

NOTE that the handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions are identified based on their time stamps.

5.1.3 Generation of emotion-related system behavior

The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The robot is in a global state of high arousal, negative pleasure and low dominance, i.e. a negative state of distress paired with some urgency but rather limited power to influence the situation. It has a tendency to seek a recharge and to avoid picking up boxes. However, sensor data reveals an unexpected obstacle on the way to the charging station. This triggers the planning of an expressive frowning behavior. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the robot architecture.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata">
    <info>
        <meta:name>robbie the robot example</meta:name>
    </info>

    <!-- Robot's current global state configuration: negative, active, powerless -->
    <emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml">
        <dimension name="pleasure" value="0.2"/>
        <dimension name="arousal" value="0.8"/>
        <dimension name="dominance" value="0.3"/>
    </emotion>

    <!-- Robot's action tendencies: want to recharge -->
    <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
        <action-tendency name="charge-battery" value="0.9"/>
        <action-tendency name="seek-shelter" value="0.7"/>
        <action-tendency name="pickup-boxes" value="0.1"/>
    </emotion>

    <!-- Appraised value of incoming event: obstacle detected, appraised as novel and unpleasant -->
    <emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml"
             modality="laser-scanner">
        <appraisal name="novelty" value="0.8" confidence="0.4"/>
        <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/>
        <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/>
    </emotion>

    <!-- Robot's planned facial gestures: will frown -->
    <emotion category-set="http://www.example.com/custom/robot-emotions.xml"
        modality="face">
        <category name="frustration"/>
        <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/>
    </emotion>
</emotionml>

5.2 Examples of possible use with other markup languages

One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML, EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.
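
As a minimal illustration of this design, the category name below is carried in an attribute rather than as a text node, so that any character data remains under the control of the host language (the vocabulary URI and the category name are illustrative):

<emo:emotion xmlns:emo="http://www.w3.org/2009/10/emotionml"
        category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
    <emo:category name="joy"/>
</emo:emotion>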

This section illustrates the concept using two existing W3C markup languages: EMMA and SSML.

ISSUE-127: Must EmotionML work with HTML 5? Think of possible use cases.

5.2.1 Use with EMMA

EMMA is designed for representing arbitrary analysis results; one of these could be the user's emotional state. The following example represents the analysis of a non-verbal vocalization; its emotion is described as most probably a low-intensity state, possibly boredom.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.w3.org/2009/10/emotionml">
    <emma:interpretation emma:start="12457990" emma:end="12457995" emma:mode="voice" emma:verbal="false">

        <emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
            <intensity value="0.1" confidence="0.8"/>
            <category name="boredom" confidence="0.1"/>
        </emotion>

    </emma:interpretation>
</emma:emma>

5.2.2 Use with SSML

Two options for using EmotionML with SSML can be illustrated.

First, the latest version of SSML [SSML 1.1] makes it possible to use arbitrary markup belonging to a different namespace anywhere in an SSML document; only SSML processors that support that markup will take it into account. It is therefore possible to insert EmotionML inside, for example, an <s> element representing a sentence; the intended meaning is that the enclosing sentence should be spoken with the given emotion, in this case in a moderately doubtful tone of voice:

<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <s>
        <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
            <emo:category name="doubt"/>
            <emo:intensity value="0.4"/>
        </emo:emotion>

        Do you need help?
    </s>
</speak>

Second, a future version of SSML could explicitly provide for the annotation of paralinguistic information, which could fill the gap between the extralinguistic, speaker-constant settings of the <voice> tag and linguistic elements such as <s>, <emphasis>, <say-as> etc. The following example assumes that a future version of SSML provides a <style> tag for paralinguistic information. The <style> could either embed an <emotion> directly, as follows:

<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <s>
      <style>
        <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
            <emo:category name="doubt"/>
            <emo:intensity value="0.4"/>
        </emo:emotion>

        Do you need help?
      </style>
    </s>
</speak>

Alternatively, the <style> could refer to a previously defined <emotion>, for example:

<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"
            id="somewhatDoubtful">
        <emo:category name="doubt"/>
        <emo:intensity value="0.4"/>
    </emo:emotion>

    <s>
        <style ref="#somewhatDoubtful">
            Do you need help?
        </style>
    </s>
</speak>

5.2.3 Use with SMIL

Using EmotionML for the use case of generating system behavior requires mechanisms for scheduling and surface-form realization which are not part of EmotionML. This use case necessarily relies on other languages to provide that functionality, in line with the aim of EmotionML to serve as a specialized plug-in language.

This example illustrates the idea in terms of a simplified storytelling application. A virtual agent tells a story using voice and facial animation. The expression in face and voice is conveyed to the rendering engine in terms of EmotionML. The engine in this example uses SMIL [SMIL] to define the temporal relations between events; EmotionML is referenced via SMIL's generic <ref> element. In general, it is the engine that knows how to render an emotion through the virtual agent's expressive capabilities. To override this, the second <emotion> contains an explicit request to realize the emotional expression using both the face and voice modalities.

ridinghood.smil:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head> ... </head>
  <body>
    <par duration="8s">
      <img src="file:forest.jpg"/>
      <smileText>The little girl was enjoying the walk in the forest.</smileText>
      <ref src="file:ridinghood.emotionml#emotion1"/>
    </par>
    <par duration="5s">
      <img src="file:wolf.jpg"/>
      <smileText>Suddenly a dark shadow appeared in front of her.</smileText>
      <ref src="file:ridinghood.emotionml#emotion2"/>
    </par>

  </body>
</smil>

ridinghood.emotionml:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.example.com/emotion/category/everyday-emotions.xml"
    appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml">

  <emotion id="emotion1">
    <category name="contentment"/>
    <intensity value="0.7"/>
  </emotion>

  <emotion id="emotion2" modality="face voice">
    <category name="fear"/>
    <intensity value="0.9"/>
    <appraisal name="novelty" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>

Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.

6 References

Cowie et al., 1999
Cowie, R., Douglas-Cowie, E., Apolloni, B., Taylor, J., Romano, A., & Fellenz, W. (1999). What a neural net needs to know about emotion words. In N. Mastorakis (Ed.), Computational Intelligence and Applications (pp. 109-114). World Scientific & Engineering Society Press.
Ekman, 1972
Ekman, P. (1972). Universals and Cultural Differences in Facial Expressions of Emotion. In J. Cole (Ed.), Nebraska Symposium on Motivation (Vol. 19, pp. 207-282). University of Nebraska Press.
EMMA
EMMA: Extensible MultiModal Annotation markup language version 1.0, Michael Johnston, et al., Editors. World Wide Web Consortium, 11 December 2007.
Emotion Incubator Group
W3C Emotion Incubator Group, M. Schröder, E. Zovato, H. Pirker, C. Peter, F. Burkhardt, Editors. Final Report of the Emotion Incubator Group at the World Wide Web Consortium, 10 July 2007.
Emotion Markup Language Incubator Group
Elements of an EmotionML 1.0, M. Schröder, Editor. Final Report of the Emotion Markup Language Incubator Group at the World Wide Web Consortium, 20 November 2008.
EmotionML Requirements
Emotion Markup Language: Requirements with Priorities. F. Burkhardt and M. Schröder. W3C Incubator Group Report, 13 May 2008.
Fontaine et al., 2007
Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The World of Emotions Is Not Two-Dimensional. Psychological Science, 18(12), 1050-1057.
Frijda, 1986
Frijda, N. H. (1986). The Emotions. Cambridge, UK: Cambridge University Press.
Gebhard, 2005
Gebhard, P. (2005). ALMA - A layered model of affect. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-05). Utrecht.
Gratch and Marsella, 2004
Gratch, J., & Marsella, S. (2004). A domain-independent framework for modeling emotion. Cognitive Systems Research, 5(4), 269-306.
Media Fragments URI
Media Fragments URI 1.0, Raphaël Troncy et al., Editors. World Wide Web Consortium, W3C Working Draft 24 June 2010.
Mehrabian, 1996
Mehrabian, A. (1996). Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4), 261-292.
Ortony et al., 1988
Ortony, A., Clore, G. L., & Collins, A. (1988). The Cognitive Structure of Emotion. Cambridge, UK: Cambridge University Press.
RDF
RDF/XML Syntax Specification (Revised), Dave Beckett, Editor. World Wide Web Consortium, W3C Recommendation 10 February 2004.
RelaxNG
RELAX NG Specification, James Clark and Makoto Murata, Editors. OASIS, Committee Specification, 2001.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Editor. IETF RFC 2119, March 1997.
RFC 2326
Real Time Streaming Protocol (RTSP). IETF RFC 2326, April 1998. Available at http://www.ietf.org/rfc/rfc2326.txt.
RFC3986
Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee et al., Editors. IETF RFC 3986, January 2005.
Schematron
Information technology - Document Schema Definition Languages (DSDL) - Part 3: Rule-based validation - Schematron. ISO/IEC 19757-3:2006.
Scherer, 1984
Scherer, K. R. (1984). On the nature and function of emotion: A component process approach. In K. R. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 293-317). Hillsdale, NJ: Erlbaum.
Scherer, 1999
Scherer, K. R. (1999). Appraisal theory. In T. Dalgleish & M. J. Power (Eds.), Handbook of Cognition & Emotion (pp. 637-663). New York: John Wiley.
SMIL
Synchronized Multimedia Integration Language (SMIL) Version 3.0, Dick Bulterman et al., Editors. W3C Recommendation, 1 December 2008.
SMIL Clock Value Syntax
Synchronized Multimedia Integration Language (SMIL) Version 3.0 Time Clock Syntax, S. Mullender et al., Editors. W3C Proposed Recommendation, 6 October 2008.
SMPTE
SMPTE RP 136, Time and Control Codes for 24, 25 or 30 Frame-Per-Second Motion-Picture Systems.
SSML
Speech Synthesis Markup Language (SSML) Version 1.0, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Recommendation, 7 September 2004.
SSML 1.1
Speech Synthesis Markup Language (SSML) Version 1.1, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Proposed Recommendation, 23 February 2010.
XML-NS10
Namespaces in XML 1.0, Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 16 August 2006.
XML-NS11
Namespaces in XML 1.1 (Second Edition), Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 16 August 2006.
XML Schema
XML Schema Part 1: Structures Second Edition, Henry S. Thompson et al., Editors. World Wide Web Consortium, W3C Recommendation, 28 October 2004.
XProc
XProc: An XML Pipeline Language, Norman Walsh et al., Editors. World Wide Web Consortium, Working Draft, 14 August 2008.
XSLT
XSL Transformations (XSLT) Version 1.0, James Clark, Editor. W3C Recommendation. 16 November 1999.
W3C Datetime Note
Date and Time Formats, M. Wolf and C. Wicksteed. W3C Note, 1997.

7 Acknowledgments

The authors wish to acknowledge the contributions by all members of the Emotion Markup Language Incubator Group and the Emotion Incubator Group, in particular the following persons (in alphabetic order):


Appendix A: Changes since previous working draft

This Appendix points out the main changes since the previous working draft of 29 October 2009; for more details, see the diff-marked version of this specification (non-normative).