W3C

Emotion Markup Language (EmotionML) 1.0

W3C Working Draft 7 April 2011

This version:
http://www.w3.org/TR/2011/WD-emotionml-20110407/
Latest version:
http://www.w3.org/TR/emotionml/
Previous version:
http://www.w3.org/TR/2010/WD-emotionml-20100729/
Editor:
Marc Schröder (DFKI GmbH)
Authors:
(in alphabetic order)
Paolo Baggia (Loquendo, S.p.A.)
Felix Burkhardt (Deutsche Telekom AG)
Catherine Pelachaud (Telecom ParisTech)
Christian Peter (Fraunhofer Gesellschaft)
Enrico Zovato (Loquendo, S.p.A.)

Abstract

As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Last Call Working Draft of "Emotion Markup Language 1.0", published on 7 April 2011. The W3C Membership and other interested parties are invited to review the document and send comments to www-multimodal@w3.org (with public archive) until 7 June 2011.

This Last Call Working Draft has addressed all open issues from the previous working draft, as well as the issues which were raised at the W3C workshop on EmotionML. The changes compared to the previous Working Draft of 29 July 2010 are listed in Appendix A.

This document was developed by the Multimodal Interaction Working Group. The Working Group expects to advance this Working Draft to Recommendation Status.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Conventions of this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly.

1 Introduction

This section is informative.

Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.

This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.

1.1 Reasons for defining an Emotion Markup Language

As for any standard format, the first and main goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.

Use cases for EmotionML can be grouped into three broad types:

  1. Manual annotation of material involving emotionality, such as annotation of videos, of speech recordings, of faces, of texts, etc.;
  2. Automatic recognition of emotions from sensors, including physiological sensors, speech recordings, facial expressions, etc., as well as from multi-modal combinations of sensors;
  3. Generation of emotion-related system responses, which may involve reasoning about the emotional implications of events, emotional prosody in synthetic speech, facial expressions and gestures of embodied agents or robots, the choice of music and colors of lighting in a room, etc.

Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modeling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.

Concrete examples of existing technology that could apply EmotionML include:

The Emotion Incubator Group has listed 39 individual use cases for an EmotionML.

A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to repeat mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made regardless of whether these states are the most appropriate for the intended application. Crucially, the alternatives developed in the affective science literature are not sufficiently well known, which leads to dead ends after the initial steps of work. Careful consideration of the states to study and of the representations for describing them can help avoid such situations.

EmotionML makes scientific concepts of emotions practically applicable. This can help potential users to identify the suitable representations for their respective applications.

1.2 The challenge of defining a generally usable Emotion Markup Language

Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. More fundamentally, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused on; in short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.

Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, while allowing users to "plug in" the vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification to provide a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their own custom vocabularies.

An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.

For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.

1.3 Glossary of terms

Terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.

Action tendency
Emotions have a strong influence on the motivational state of a subject. Emotion theory associates emotions with a small set of so-called action tendencies, e.g. avoidance (related to fear), rejecting (related to disgust), etc. Action tendencies can be viewed as a link between the outcome of an appraisal process and actual actions.
Affect / Affective state
In the scientific literature, the term "affect" is often used as a general term covering a range of phenomena called "affective states", including emotions, moods, attitudes, etc. Proponents of the term consider it to be more generic than "emotion", in the sense that it covers both acute and long-term, specific and unspecific states. In this report, the term "affect" is avoided so that the scope of the intended markup language is more easily accessible to the non-expert; the term "affective state" is used interchangeably with "emotion-related state".
Appraisal
The term "appraisal" is used in the scientific literature to describe the evaluation process leading to an emotional response. Triggered by an "emotion-eliciting event", an individual carries out an automatic, subjective assessment of the event, in order to determine the relevance of the event to the individual. This assessment is carried out along a number of "appraisal dimensions" such as the novelty, pleasantness or goal conduciveness of the event.
Attitude
In psychology, "attitude" is related to the global evaluation of an object, such as a person, an object, oneself, etc. Attitude is considered to include an emotional component, as well as cognition and behavior. However, the term "attitude" is sometimes used with slightly different meanings, such as speaking style ("he said it with a certain attitude") or more generally personal lifestyle ("she has quite an attitude"). Because of this ambiguity, this specification avoids the term.
Emotion
In this report, the term "emotion" is used in a very broad sense, covering both intense and weak states, short and long term, with and without event focus. This meaning is intended to reflect the understanding of the term "emotion" by the general public. In the scientific literature on emotion theories, the term "emotion" or "fullblown emotion" refers to intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight". This reading of the term seems inappropriate for the vast majority of human-machine interaction contexts, in which more subtle states dominate; therefore, where this reading is intended, the term "fullblown emotion" is used in this report.
Emotion-related state
A cover term for the broad range of phenomena intended to be covered by this specification. In the scientific literature, several kinds of emotion-related or affective states are distinguished; see Emotions and related states in the final report of the Emotion Incubator Group.
Emotion dimensions
A small number of continuous scales describing the most basic properties of an emotion. Often three dimensions are used: valence (sometimes named pleasure), arousal (or activity/activation), and potency (sometimes called control, power or dominance). However, sometimes two dimensions, or more than three, are used.
Fullblown emotion
Intense states with a strong focus on current events, often in the context of the survival-benefiting function of behavioral responses such as "fight or flight".

2 Elements of Emotion Markup

The following sections describe the syntax of the main elements of EmotionML.

2.1 Document structure

2.1.1 Document root: The <emotionml> element

Annotation <emotionml>
Definition The root element of an EmotionML document.
Children The element MAY contain one or more <emotion> elements. It MAY contain a single <info> element. It MAY contain one or more <vocabulary> elements.
Attributes
  • Required:
    • Namespace declaration for EmotionML, see EmotionML namespace.
    • version indicates the version of the specification to be used for the document and MUST have the value "1.0".
  • Optional:
    • category-set, dimension-set, appraisal-set and action-tendency-set indicate default emotion vocabularies used in an EmotionML document. Any of these attributes used at document level makes optional the use of the same attribute in any <emotion> elements within the document; the document-level attribute determines the emotion vocabularies used for any <emotion> elements for which the respective attribute is not locally specified. The attributes are of type xsd:anyURI and MUST point to a definition of an emotion vocabulary as specified in Defining vocabularies for representing emotions.
Occurrence This is the root element -- it cannot occur as a child of any other EmotionML element.

<emotionml> is the root element of a standalone EmotionML document. It MAY contain a single <info> element, providing document-level metadata.

The <emotionml> element MUST define the EmotionML namespace.

Standalone EmotionML documents usually serve one or both of the following two purposes:

  1. to wrap a number of <emotion> elements into a single document;
  2. to define emotion vocabularies for use with <emotion> annotations in the same or other documents.

Example:

<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
 ...
</emotionml>

or

<em:emotionml version="1.0" xmlns:em="http://www.w3.org/2009/10/emotionml">
 ...
</em:emotionml>

Note: One of the envisaged uses of EmotionML is to be used in the context of other markup languages. In such cases, there will be no <emotionml> root element, but <emotion> elements will be used directly in other markup -- see Examples of possible use with other markup languages.
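
As an illustration of the document-level vocabulary defaults described above, the following sketch declares a default category-set on the <emotionml> element. The first <emotion> relies on that default, while the second specifies its own vocabulary locally. The vocabulary URIs and category names are the ones used in examples elsewhere in this specification; the particular choice of categories is arbitrary and serves only as an illustration.

```xml
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <!-- no local category-set: the document-level default (big6) applies -->
    <emotion>
        <category name="anger"/>
    </emotion>
    <!-- a locally specified category-set is used instead of the document-level default -->
    <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
        <category name="bored"/>
    </emotion>
</emotionml>
```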

2.1.2 A single emotion annotation: The <emotion> element

Annotation <emotion>
Definition This element represents a single emotion annotation.
Children All children are optional. However, at least one of <category>, <dimension>, <appraisal>, <action-tendency> MUST occur.

If present, the following child element can occur only once: <info>.

If present, the following child elements may occur one or more times: <category>; <dimension>; <appraisal>; <action-tendency>; <reference>.

There are no constraints on the combinations of children that are allowed. There are no constraints on the order in which children occur.

Attributes
  • Required:
    • category-set, dimension-set, appraisal-set and action-tendency-set indicate the emotion vocabularies used in this <emotion>. The attributes are of type xsd:anyURI and MUST point to a definition of an emotion vocabulary as specified in Defining vocabularies for representing emotions. The attributes are required as follows:
      • if the <emotion> element has a child element <category>, the category-set attribute is required and must point to the definition of a category vocabulary;
      • if the <emotion> element has a child element <dimension>, the dimension-set attribute is required and must point to the definition of a dimension vocabulary;
      • if the <emotion> element has a child element <appraisal>, the appraisal-set attribute is required and must point to the definition of an appraisal vocabulary;
      • if the <emotion> element has a child element <action-tendency>, the action-tendency-set attribute is required and must point to the definition of an action tendency vocabulary.

      The attribute values default to the values of any attributes of same name on an enclosing <emotionml> element. Any of these attributes used at document level makes optional the use of the same attribute in any <emotion> elements.

  • Optional:
    • version indicates the version of the specification to be used for the <emotion> and its descendants. Its value defaults to "1.0".
Occurrence as a child of <emotionml>, or in any markup using EmotionML.

The <emotion> element represents an individual emotion annotation. No matter how simple or complex its substructure is, it represents a single statement about the emotional content of some annotated item. Where several statements about the emotion in a certain context are to be made, several <emotion> elements MUST be used. See Examples of emotion annotation for illustrations of this issue.

An <emotion> element MAY have an id attribute, allowing for a unique reference to the individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency> or their children directly. For this reason, these elements do not allow for an id attribute.

Whereas it is possible to use <emotion> elements in a standalone <emotionml> document, a typical use case is expected to be embedding an <emotion> into some other markup -- see Examples of possible use with other markup languages.
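
The following sketch illustrates the points made in this section: a single <emotion> may combine different kinds of children in any order, and separate statements about emotion require separate <emotion> elements. The id values "e1" and "e2" are hypothetical identifiers chosen for this illustration; the vocabularies and names are the ones used in examples elsewhere in this specification.

```xml
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
    <!-- one statement: a category and two dimensions describe the same emotion -->
    <emotion id="e1"
             category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
             dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
        <dimension name="arousal" value="0.9"/>
        <category name="anger" value="0.8"/>
        <dimension name="pleasure" value="0.1"/>
    </emotion>
    <!-- a second, independent statement gets its own <emotion> element -->
    <emotion id="e2" category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
        <category name="sadness" value="0.3"/>
    </emotion>
</emotionml>
```

Because the category-set and dimension-set attributes are required only when the corresponding child elements occur, the second <emotion> does not need a dimension-set.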

2.2 Representations of emotions and related states

2.2.1 The <category> element

Annotation <category>
Definition Description of an emotion or a related state using a category.
Children <trace>: A <category> MAY contain either a value attribute or a <trace> element.
Attributes
  • Required:
    • name, the name of the category, which must be contained in the set of categories identified in the enclosing <emotionml> or <emotion> element's category-set attribute.
  • Optional:
    • value: A <category> MAY contain either a value attribute or a <trace> element.
    • confidence, the annotator's confidence that the annotation given for this category is correct.
Occurrence One or more <category> elements MAY occur as a child of <emotion>. For any given category name in the set, zero or one occurrence is allowed within an <emotion> element.

<category> describes an emotion or a related state in terms of a category name, given as the value of the name attribute. The name MUST belong to a clearly-identified set of category names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the category-set attribute of the enclosing <emotion> or <emotionml> element. Different sets can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.

The intensity of an emotion category MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.

Examples:

In the following example, the emotion category "satisfied" is being annotated; it must be contained in the definition of the emotion category vocabulary located at http://www.w3.org/TR/emotion-voc/xml#everyday-categories, which is one of the category vocabularies provided in [Vocabularies for EmotionML].

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="satisfied"/>
</emotion>

The following is an annotation of an interpersonal stance "distant" which must be defined in the custom category set at the URI given in the category-set attribute:

<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml">
    <category name="distant"/>
</emotion>

In the following example, an emotion is described by several categories, each being present with different values of intensity. The category set used is the "big six" set described in [Vocabularies for EmotionML].

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="sadness" value="0.3"/>
    <category name="anger" value="0.8"/>
    <category name="fear" value="0.3"/>
</emotion>

2.2.2 The <dimension> element

Annotation <dimension>
Definition One or more <dimension> elements jointly describe an emotion or a related state according to an emotion dimension vocabulary.
Children <trace>: A <dimension> MUST contain either a value attribute or a <trace> element.
Attributes
  • Required:
    • name, the name of the dimension, which must be contained in the set of dimensions identified in the enclosing <emotionml> or <emotion> element's dimension-set attribute.
    • value: A <dimension> MUST contain either a value attribute or a <trace> element.
  • Optional:
    • confidence, the annotator's confidence that the annotation given for this dimension is correct.
Occurrence <dimension> elements occur as children of <emotion>. For any given dimension name in the set, zero or one occurrence is allowed within an <emotion> element.

One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions. The names of the emotion dimensions MUST belong to a clearly-identified set of dimension names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the dimension-set attribute of the enclosing <emotionml> or <emotion> element. Different sets can be used, depending on the requirements of the use case.

The position on an emotion dimension MUST be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.

Examples:

One of the most widespread sets of emotion dimensions used (sometimes by different names) is the combination of valence, arousal and potency. The following example is a state of rather low arousal, very positive valence, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation. The example uses the Pleasure-Arousal-Dominance (PAD) vocabulary from [Vocabularies for EmotionML]:

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal -->
    <dimension name="pleasure" value="0.9"/><!-- very high positive valence -->
    <dimension name="dominance" value="0.8"/><!-- relatively high potency -->
</emotion>

In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single dimension "friendliness".

<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml">
    <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person -->
</emotion>

The usual way to represent the intensity of an emotion would be the value attribute of a <category>. However, if only the intensity of an emotion is annotated, but not its nature, this can be done by using an "intensity" dimension. Thus, an emotional state's "strength" or "intensity" can be described independently from categorical or dimensional descriptions, as shown by the following example.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <dimension name="intensity" value="0.2"/><!-- not in a strong emotional state -->
</emotion>
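
The examples above all use static values. As a sketch of the alternative, the following annotation gives one dimension as a dynamic trace over time. It assumes that <trace> carries a freq attribute for the sampling rate and a samples attribute holding space-separated Scale values, as defined in the part of this specification that describes the <trace> element; the sample values themselves are invented for illustration.

```xml
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <!-- arousal rising over time: the <trace> child takes the place of the value attribute -->
    <dimension name="arousal">
        <trace freq="10Hz" samples="0.1 0.2 0.4 0.6 0.8"/>
    </dimension>
    <!-- a static value on another dimension of the same emotion -->
    <dimension name="pleasure" value="0.5"/>
</emotion>
```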

2.2.3 The <appraisal> element

Annotation <appraisal>
Definition One or more <appraisal> elements jointly describe an emotion or a related state according to an emotion appraisal vocabulary.
Children <trace>: An <appraisal> MAY contain either a value attribute or a <trace> element.
Attributes
  • Required:
    • name, the name of the appraisal, which must be contained in the set of appraisals identified in the enclosing <emotionml> or <emotion> element's appraisal-set attribute.
  • Optional:
    • value: An <appraisal> MAY contain either a value attribute or a <trace> element.
    • confidence, the annotator's confidence that the annotation given for this appraisal is correct.
Occurrence <appraisal> elements occur as children of <emotion>. For any given appraisal name in the set, zero or one occurrence is allowed within an <emotion> element.

One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals. The names of the appraisals MUST belong to a clearly-identified set of appraisal names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the appraisal-set attribute of the enclosing <emotionml> or <emotion> element. Different sets can be used, depending on the requirements of the use case.

The degree to which an appraisal is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.

Examples:

One of the most widespread sets of emotion appraisals used is the appraisals set proposed by Klaus Scherer, covering aspects of novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Using Scherer's appraisals from [Vocabularies for EmotionML], the following example is a state arising from the evaluation of an unpredicted and quite unpleasant event:

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
    <appraisal name="suddenness" value="0.8"/>
    <appraisal name="intrinsic-pleasantness" value="0.2"/>
</emotion>

In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining the single appraisal "likelihood".

<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml">
    <appraisal name="likelihood" value="0.8"/><!-- a very predictable event -->
</emotion>

2.2.4 The <action-tendency> element

Annotation <action-tendency>
Definition One or more <action-tendency> elements jointly describe an emotion or a related state according to an emotion action tendency vocabulary.
Children <trace>: An <action-tendency> MAY contain either a value attribute or a <trace> element.
Attributes
  • Required:
    • name, the name of the action tendency, which must be contained in the set of action tendencies identified in the enclosing <emotionml> or <emotion> element's action-tendency-set attribute.
  • Optional:
    • value: An <action-tendency> MAY contain either a value attribute or a <trace> element.
    • confidence, the annotator's confidence that the annotation given for this action tendency is correct.
Occurrence <action-tendency> elements occur as children of <emotion>. For any given action tendency name in the set, zero or one occurrence is allowed within an <emotion> element.

One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action tendencies. The names of the action tendencies MUST belong to a clearly-identified set of action tendency names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the action-tendency-set attribute of the enclosing <emotionml> or <emotion> element. Different sets can be used, depending on the requirements of the use case.

The degree to which an action tendency is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.

Examples:

One well known use of action tendencies is by N. Frijda. This model uses a number of action tendencies that are low level, diffuse behaviors from which more concrete actions could be determined. It is provided in [Vocabularies for EmotionML]. An example of someone attempting to attract someone they like by being confident, strong and attentive might look like this:

<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies">
    <action-tendency name="approach" value="0.7"/>   <!-- get close -->
    <action-tendency name="being-with" value="0.8"/> <!-- be happy -->
    <action-tendency name="attending" value="0.7"/>  <!-- pay attention -->
    <action-tendency name="dominating" value="0.7"/> <!-- be assertive -->
</emotion>

In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a robot that works in a factory and uses a custom set of action tendencies, defining example actions for a robot. In the example, the robot's battery is very low, so it needs to get ready to charge its battery and stop its work of picking up boxes.

<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
    <action-tendency name="charge-battery" value="0.9"/> <!-- need to charge battery soon -->
    <action-tendency name="pickup-boxes" value="0.3"/>   <!-- feeling tired, avoid work -->
</emotion>

2.3 Meta-information

2.3.1 The confidence attribute

Annotation confidence
Definition A representation of the degree of confidence or probability that a certain element of the representation is correct.
Occurrence An optional attribute of <category>, <dimension>, <appraisal> and <action-tendency> elements.

Confidence MAY be indicated separately for each of the Representations of emotions and related states. For example, the confidence that a <category> has been identified correctly is independent of the confidence that the position on a dimension is correctly indicated.

Following the tradition of statistics, confidence is given as a value in the interval from 0 to 1, resembling a probability. In this respect, confidence is a Scale value.

Examples:

In the following, one simple example is provided for each element that can carry a confidence attribute.

The first example indicates a very high confidence that surprise is the emotion to annotate.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="surprise" confidence="0.95"/> 
</emotion>

The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive pleasure may or may not be correct.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
    <dimension name="arousal" value="0.8" confidence="0.9"/>
    <dimension name="pleasure" value="0.6" confidence="0.3"/>
</emotion>

Finally, an example for the case of intensity: A high confidence is given that the emotion has a low intensity.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>

Note that a single emotion annotation can combine the above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <category name="bored" confidence="0.1"/>
    <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>
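
As a rough illustration of how a consumer might read these annotations, the following Python sketch collects the confidence values of the last example and checks that each lies in the interval from 0 to 1. It is not part of EmotionML: the function name read_confidences is hypothetical, and an xmlns declaration has been added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"

DOC = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
    <category name="bored" confidence="0.1"/>
    <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>
"""

def read_confidences(emotion):
    """Return {(element, name): confidence} for children carrying a confidence."""
    result = {}
    for child in emotion:
        raw = child.get("confidence")
        if raw is None:
            continue  # the attribute is optional
        confidence = float(raw)
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence {confidence} is outside [0, 1]")
        tag = child.tag.replace(NS, "")  # strip the namespace prefix
        result[(tag, child.get("name"))] = confidence
    return result

confidences = read_confidences(ET.fromstring(DOC.strip()))
```

Each confidence is kept per element, reflecting that the confidence of the category and that of the dimension are independent of each other.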

2.3.2 The expressed-through attribute

Annotation expressed-through
Definition The modality, or list of modalities, through which the emotion is expressed. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {gaze, face, head, torso, gesture, leg, voice, text, locomotion, posture, physiology, ...}.
Occurrence An optional attribute of <emotion> elements.

The expressed-through attribute describes the modality through which an emotion is produced, usually by a human being. It is not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The expressed-through attribute is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.

The list of values provided covers a broad range of modalities through which emotions may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.

The expressed-through attribute is not specific about the sensors used for observing the modality. These can be specified using the <info> element, or by the emma:mode attribute in an enclosing [EMMA] document.

Example:

In the following example the emotion is expressed through the voice.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" 
         expressed-through="voice">
    <category name="satisfied"/>
</emotion>

In case of multimodal expression of an emotion, a list of space-separated modalities can be indicated in the expressed-through attribute, as in the following example in which the two values "face" and "voice" are used.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" 
         expressed-through="face voice">
    <category name="satisfied"/>
</emotion>

See also the examples in sections 5.1.2 Automatic recognition of emotions, 5.1.3 Generation of emotion-related system behavior and 5.2.3 Use with SMIL.
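
A minimal Python sketch of how a processor might read the expressed-through attribute is given below. It splits the space-delimited value into individual modalities and flags values outside the listed ones; since the set is open, such values are permitted and are only reported, not rejected. The helper names modalities and unlisted are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
         expressed-through="face voice">
    <category name="satisfied"/>
</emotion>
"""

# Values named in the definition of expressed-through; the set is open.
LISTED_MODALITIES = {
    "gaze", "face", "head", "torso", "gesture", "leg", "voice",
    "text", "locomotion", "posture", "physiology",
}

def modalities(emotion):
    """Split the space-delimited expressed-through value into a list."""
    return emotion.get("expressed-through", "").split()

def unlisted(values):
    """Return the modalities that are not among the listed values."""
    return [value for value in values if value not in LISTED_MODALITIES]

mods = modalities(ET.fromstring(DOC.strip()))
```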

2.3.3 The <info> element

Annotation <info>
Definition This element can be used to annotate arbitrary metadata.
Children One or more elements in a different namespace than the EmotionML namespace, providing metadata.
Attributes
  • Optional:
    • id, a unique identifier for the info element, of type xsd:ID.
Occurrence A single <info> element MAY occur as a child of the <emotionml> root tag to indicate global metadata, i.e. the annotations are valid for the document scope; furthermore, a single <info> element MAY occur as a child of each <emotion> element to indicate local metadata that is only valid for that <emotion> element.

This element can contain arbitrary XML data in a different namespace (one option could be [RDF] data), either on a document global level or on a local "per annotation element" level.

Several initiatives for standardizing metadata exist, such as [IMDI] and [CLARIN]. Metadata may cover a wide range of information, such as: location description (continent, country, address), content type (e.g., genre, task, modalities), session (title, recording date, group of participants); each participant may be described by her role in the session (e.g. annotator, filmer), her name, her social family role, etc.

Examples:

In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
        xmlns:classifiers="http://www.example.com/meta/classify/"
        xmlns:origin="http://www.example.com/meta/local/"
        category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <info>
        <classifiers:classifier classifiers:name="GMM"/>
    </info>

    <emotion>
        <info><origin:localization value="bavarian"/></info>
        <category name="happiness"/>
    </emotion>

    <emotion>
        <info><origin:localization value="swabian"/></info>
        <category name="sadness"/>
    </emotion>
</emotionml>

The following example uses the IMDI metadata language to represent information about the annotator who produced the emotion annotation in the current document, in a global <info> element.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"  
           xmlns:imdi="http://www.mpi.nl/IMDI/Schema/IMDI">
<info>
  <imdi:Actors>
      <imdi:Actor>
           <imdi:Role>Annotator</imdi:Role>
           <imdi:Name>John</imdi:Name>
           <imdi:FullName>John Smith Junior</imdi:FullName>
           <imdi:Code>JS</imdi:Code>
           <imdi:FamilySocialRole>Teacher</imdi:FamilySocialRole>
          ...
      </imdi:Actor>
  </imdi:Actors>
</info>
 ...
<emotion>...</emotion>
<emotion>...</emotion>
</emotionml> 

The following example illustrates how <info> can be used for annotating information on sensors through which an affective signal has been detected. In the global <info> section, the sensors used in the particular scenario are specified. Apart from their ID, information on the modality observed by this sensor is provided as well as information on the confidence for that sensor. In this example, the modality "posture" is observed by a camera and a chair equipped with pressure sensors. In this scenario, emotion estimates based on camera data are trusted more than those based on chair data. Within the <emotion> elements, <info> is used to specify which sensor has been used to calculate the actual emotion value.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    xmlns:sensors="http://www.example.com/meta/sensors/"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">

  <info>
    <sensors:sensor id="camera1" confidence="0.9" expressed-through="posture"/>
    <sensors:sensor id="chair" confidence="0.3" expressed-through="posture"/>
    ...
  </info>

  <emotion expressed-through="posture">
    <info>
      <sensors:sensor idref="camera1"/>
    </info>
    <category name="angry"/>
  </emotion>

  <emotion expressed-through="posture">
    <info>
      <sensors:sensor idref="chair"/>
    </info>
    <category name="neutral"/>
  </emotion>

</emotionml>

2.4 References and time

2.4.1 The <reference> element

Annotation <reference>
Definition References may be used to relate the emotion annotation to the "rest of the world", more specifically to the emotional expression, the experiencing subject, the trigger, and the target of the emotion.
Children None
Attributes
  • Required:
    • uri, a URI identifying the actual reference target. The URI MAY be extended by a media fragment, as explained in section 2.4.2.4.
  • Optional:
    • role, the type of relation between the emotion and the external item referred to; one of "expressedBy" (default), "experiencedBy", "triggeredBy", "targetedAt".
    • media-type, an attribute of type xsd:string holding the MIME type of the data that the uri attribute points to.
Occurrence Multiple <reference> elements MAY occur as children of <emotion>.

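A <reference> element provides a link to media as a URI [RFC3986]. The semantics of references are described by the role attribute which, if present, MUST have one of four values:

  • "expressedBy" (default): the reference points to the emotional expression;
  • "experiencedBy": the reference points to the subject experiencing the emotion;
  • "triggeredBy": the reference points to the trigger of the emotion;
  • "targetedAt": the reference points to the target of the emotion.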

For reference targets representing a period of time, start and end time MAY be denoted by using the media fragments syntax, as explained in section 2.4.2.4.

The media-type attribute MAY be used to differentiate between different media types such as audio, video, text, etc.

There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.

Examples:

The following example illustrates the reference to two different URIs having a different role with respect to the emotion: one reference points to the emotion's expression, a video clip showing a user expressing the emotion; the other reference points to the trigger that caused the emotion, in this case another video clip that was seen by the person who expressed the emotion.

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>
</emotion>

Several references may follow as children of one <emotion> tag, even having the same role; for example, the following annotation refers to a portion of a video and to physiological sensor data, both of which expressed the emotion:

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/>
    <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/>
</emotion>

It is possible to explicitly indicate the MIME type of the item that the reference refers to:

<emotion ... >
    ...
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/>
</emotion>
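
The following Python sketch shows one way a processor might group the <reference> children of an <emotion> by role, applying the default role "expressedBy" when the attribute is absent. It is an illustration only: the function name references_by_role is hypothetical, and the namespace declaration, category-set and <category> have been added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {"em": "http://www.w3.org/2009/10/emotionml"}
ROLES = ("expressedBy", "experiencedBy", "triggeredBy", "targetedAt")

DOC = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="surprise"/>
    <reference uri="http://www.example.com/data/video/v1.avi?t=2,13"/>
    <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>
</emotion>
"""

def references_by_role(emotion):
    """Return {role: [uri, ...]} for all <reference> children."""
    grouped = {role: [] for role in ROLES}
    for ref in emotion.findall("em:reference", NS):
        uri = ref.get("uri")
        if uri is None:
            raise ValueError("<reference> requires a uri attribute")
        role = ref.get("role", "expressedBy")  # default role
        if role not in grouped:
            raise ValueError(f"unknown role: {role}")
        grouped[role].append(uri)
    return grouped

grouped = references_by_role(ET.fromstring(DOC.strip()))
```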

2.4.2 Timestamps

2.4.2.1 Absolute time
Annotation start, end
Definition Attributes to denote the starting and ending absolute times. They are of type xsd:nonNegativeInteger and indicate the number of milliseconds since 1 January 1970 00:00:00 GMT.
Occurrence The attributes MAY occur inside an <emotion> element.

start and end attributes denote the absolute starting and ending times at which an emotion or related state happened. This might be used for example with an "emotional diary" application.

Examples:

In the following example, the emotion category "surprise" is annotated, immediately followed by the category "happiness". The start and end attributes specify for each emotion element the absolute beginning and ending times.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647200" end="1268647330">
    <category name="surprise"/>
</emotion>
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647331" end="1268647400">
    <category name="happiness"/>
</emotion>

The end value MUST be greater than or equal to the start value.

The ECMAScript Date object's getTime() function is a way to determine the absolute time.
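
In other environments the same value can be derived from a calendar date. The Python sketch below converts between a time-zone-aware datetime and milliseconds since 1 January 1970 00:00:00 GMT using integer arithmetic; the helper names to_epoch_ms and from_epoch_ms, the chosen date and the 90-second length are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(moment):
    """Milliseconds since 1 January 1970 00:00:00 GMT for an aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("an explicit time zone is required")
    return (moment - EPOCH) // timedelta(milliseconds=1)

def from_epoch_ms(milliseconds):
    """Inverse of to_epoch_ms, returning a datetime in UTC."""
    return EPOCH + timedelta(milliseconds=milliseconds)

# 23 November 2001, 14:36:00 UTC
start = to_epoch_ms(datetime(2001, 11, 23, 14, 36, tzinfo=timezone.utc))
end = start + 90_000  # 90 seconds later; end must not precede start
```

The resulting integers can be written directly into the start and end attributes.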

2.4.2.2 Duration
Annotation duration
Definition Attribute of type xsd:nonNegativeInteger, defaulting to zero. It specifies the duration of the event in milliseconds.
Occurrence This attribute MAY occur inside an <emotion> element.

The duration of an input in milliseconds MAY be specified with the duration attribute. The duration attribute MAY be used either in combination with the start or offset-to-start attribute or independently.

A start or offset-to-start attribute together with the duration attribute set to zero MAY be used to indicate a single timestamp on a time axis.

Examples:

In the following example, the start and duration of the emotion category "surprise" are annotated:

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         start="1268647200" duration="130">
    <category name="surprise"/>
</emotion>

2.4.2.3 Relative time
Annotation time-ref-uri
Definition Attribute of type xsd:anyURI indicating the URI used to anchor the relative timestamp.
Occurrence This attribute MAY occur inside an <emotion> element.
Annotation time-ref-anchor-point
Definition Attribute with a value of start or end, defaulting to start. It indicates whether to measure the time from the start or end of the interval designated with time-ref-uri.
Occurrence This attribute MAY occur inside an <emotion> element.
Annotation offset-to-start
Definition Attribute of type xsd:integer, defaulting to zero. It specifies the offset in milliseconds for the start of input from the anchor point designated with time-ref-uri and time-ref-anchor-point.
Occurrence This attribute MAY occur inside an <emotion> element.

Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.

The reference interval is designated with the time-ref-uri attribute. This MAY be combined with the time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the offset-to-start attribute.

The time-ref-uri attribute can point to a custom-defined timestamp or can be, for example, a session identifier.

Examples:

Here is an example where the emotion "surprise" occurs two seconds after the reference time point:

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"
         time-ref-uri="#my_session_id" offset-to-start="2000">
    <category name="surprise"/>
</emotion>
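
The way the three attributes interact can be sketched in Python as follows. The sketch makes two assumptions that go beyond the text above: the reference interval behind time-ref-uri is looked up in a hypothetical table known_intervals (how a URI is resolved is application-specific), and the end of the annotated interval is taken to be its start plus the duration attribute.

```python
def resolve_interval(attrs, known_intervals):
    """Return (start_ms, end_ms) for an <emotion> with relative timing.

    attrs: attributes of the <emotion> element, as a dict of strings.
    known_intervals: hypothetical lookup {uri: (start_ms, end_ms)}.
    """
    ref_start, ref_end = known_intervals[attrs["time-ref-uri"]]
    anchor_point = attrs.get("time-ref-anchor-point", "start")  # default
    if anchor_point == "start":
        anchor = ref_start
    elif anchor_point == "end":
        anchor = ref_end
    else:
        raise ValueError(f"invalid time-ref-anchor-point: {anchor_point}")
    offset = int(attrs.get("offset-to-start", "0"))  # xsd:integer, may be negative
    duration = int(attrs.get("duration", "0"))       # xsd:nonNegativeInteger
    if duration < 0:
        raise ValueError("duration must not be negative")
    start = anchor + offset
    return start, start + duration

# Hypothetical session lasting one minute.
SESSION = {"#my_session_id": (1006526160000, 1006526220000)}

surprise = resolve_interval(
    {"time-ref-uri": "#my_session_id", "offset-to-start": "2000"}, SESSION)
```

With the defaults, the emotion starts two seconds after the start of the session and, as the duration is zero, denotes a single point in time.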

2.4.2.4 Timing in media
Annotation URI fragment: t
Definition Attributes to denote start and endpoint of an annotation in a media stream. Allowed values must conform to the Media Fragments Specification [Media Fragments].
Occurrence The URI fragment MAY occur in the uri attribute of a <reference> element.

Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval whereas the end time is considered to be the first time point that is not part of the interval. If a single number only is given, this is the begin time.

Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.

Examples:

In the following example, the emotion category "happiness" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myAudio.wav#t=3,9"/>
</emotion>

In the following example, the emotion category "happiness" is displayed in a video file called "myVideo.avi" in SMPTE values, resulting in the time interval [120,121.5).

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/>
</emotion>

The last example expresses the interval in real-world clock time: a one-minute interval in a video file, starting on 26 July 2009 at 11:19:01 UTC.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
    <category name="happiness"/>
    <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/>
</emotion>
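
To make the interval semantics concrete, here is a simplified Python sketch that extracts the begin and end times, in seconds, from the t fragment of a reference URI. It is deliberately incomplete and should not be read as a conforming Media Fragments parser: it handles only npt values and the smpte-30 format used above, and the function names are hypothetical.

```python
def _npt_seconds(text):
    """Convert npt such as '9', '3.5' or '0:02:01.5' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def _smpte30_seconds(text):
    """Convert 'hh:mm:ss' or 'hh:mm:ss:ff' at 30 frames per second to seconds."""
    parts = text.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"invalid SMPTE timecode: {text}")
    hours, minutes, seconds = (int(part) for part in parts[:3])
    frames = int(parts[3]) if len(parts) == 4 else 0
    return hours * 3600 + minutes * 60 + seconds + frames / 30

def temporal_fragment(uri):
    """Return (begin, end) in seconds; end is None if omitted.

    Returns None if the URI carries no t fragment.
    """
    fragment = uri.partition("#")[2]
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name != "t":
            continue
        convert = _npt_seconds  # npt is the default format
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        elif value.startswith("smpte-30:"):
            value, convert = value[len("smpte-30:"):], _smpte30_seconds
        elif value.startswith(("smpte", "clock:")):
            raise NotImplementedError("format not covered by this sketch")
        begin_text, _, end_text = value.partition(",")
        begin = convert(begin_text) if begin_text else 0.0  # begin defaults to 0
        end = convert(end_text) if end_text else None       # end of the media
        return begin, end
    return None

audio = temporal_fragment("myAudio.wav#t=3,9")
video = temporal_fragment("myVideo.avi#t=smpte-30:0:02:00,0:02:01:15")
```

For the SMPTE example, 15 frames at 30 frames per second add half a second, which yields the half-open interval [120,121.5) stated above.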

2.5 Scale values

Scale values are needed to represent content in <category>, <dimension>, <appraisal> and <action-tendency> elements, as well as in confidence.

Representations of scale values can be static or dynamic. A static, constant scale value is represented using the value attribute; for dynamic values, their evolution over time is expressed using the <trace> element.

2.5.1 The value attribute

Annotation value
Definition Representation of a static scale value.
Occurrence The <dimension> element MUST contain either a value attribute or a <trace> element; <category>, <appraisal> and <action-tendency> MAY contain either a value attribute or a <trace> element.

The value attribute represents a static scale value of the enclosing element.

Conceptually, a scale can represent concepts that vary from "nothing" to "a lot" (unipolar scales), or concepts that vary between two opposites, from "very negative" to "very positive" (bipolar scales). Both are represented in EmotionML using floating point values from the interval [0;1]. The min and max values of the scale SHOULD be interpreted as the extreme values, for both unipolar and bipolar scales. For example in a <category>, a value="0" SHOULD be interpreted to mean absolutely no emotion (emotionless); a value="1.0" SHOULD be interpreted to mean emotion at maximum intensity (pure uncontrolled emotion). For bipolar scales, such as the valence dimension, a value of 0 represents the most negative possible value, whereas a value of 1 represents the most positive value possible. The neutral middle point of the scale is at 0.5.

Here are several examples for the usage of scales with EmotionML.

<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions">
    <dimension name="arousal" value="0.4"/> <!-- a bit less than average arousal -->
    <dimension name="valence" value="0.6"/> <!-- a bit above average valence -->
</emotion>

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
    <category name="angry" value="0.5"/> <!-- anger at medium intensity -->
</emotion>

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
    <appraisal name="suddenness" value="0.9"/> <!-- appraisal as a very sudden event -->
</emotion>

<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies">
    <action-tendency name="approach" value="0.3"/> <!-- a rather weak tendency to approach -->
</emotion>

Further examples of the value attribute can be found in the context of the <category>, <dimension>, <appraisal> and <action-tendency> elements.
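
For applications that work internally with signed values, the relation between a bipolar scale value and its neutral middle point can be sketched in Python as follows. The mapping to the range from -1 to +1 is an illustrative convention chosen for this sketch, not something EmotionML prescribes, and the function names are hypothetical.

```python
def check_scale(value):
    """Parse a scale value and verify that it lies in the interval [0;1]."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise ValueError(f"scale value {number} is outside [0, 1]")
    return number

def bipolar_to_signed(value):
    """Map a bipolar scale value to [-1, +1], with the neutral 0.5 mapped to 0."""
    return 2.0 * check_scale(value) - 1.0

neutral = bipolar_to_signed("0.5")        # neutral middle point
most_negative = bipolar_to_signed("0")    # most negative possible value
most_positive = bipolar_to_signed("1.0")  # most positive possible value
```

Unipolar scales need no such mapping: 0 already means "nothing" and 1 means "a lot".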

2.5.2 The <trace> element

Annotation <trace>
Definition Representation of the time evolution of a dynamic scale value.
Children None
Attributes
  • Required:
    • freq, a sampling frequency in Hz.
    • samples, a space-separated list of numeric scale values from the interval [0;1] representing the scale value of the enclosing element as it changes over time.
Occurrence The <dimension> element MUST contain either a value attribute or a <trace> element; <category>, <appraisal> and <action-tendency> MAY contain either a value attribute or a <trace> element.

A <trace> element represents the time course of a scale value.

The freq attribute indicates the sampling frequency at which the values listed in the samples attribute are given.

NOTE: The <trace> representation requires a periodic sampling of values. In order to represent values that are sampled aperiodically, separate <emotion> annotations with appropriate timing information and individual value attributes may be used.

Examples:

The following example illustrates the use of a trace to represent an episode of fear during which the emotion's intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.

<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="fear">
    <trace freq="10Hz"
           samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/>
  </category>
</emotion>

The following example combines a trace of the appraisal "suddenness" with a global confidence that the values represent the facts properly. There is a sudden peak of suddenness; the annotator is reasonably certain that the annotation is correct:

<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
  <appraisal name="suddenness" confidence="0.75">
    <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/>
  </appraisal>
</emotion>
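
The relation between freq and samples can be sketched in Python as follows. The sketch assumes that the first sample corresponds to time zero of the trace and that freq is written as a number followed by "Hz", as in the examples above; the function name trace_points is hypothetical.

```python
def trace_points(freq, samples):
    """Return [(time_in_seconds, value), ...] for a <trace> element."""
    text = freq.strip()
    if text.endswith("Hz"):
        text = text[:-2]
    hertz = float(text)
    if hertz <= 0:
        raise ValueError("freq must be positive")
    points = []
    for index, sample in enumerate(samples.split()):
        value = float(sample)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"sample {value} is outside [0, 1]")
        points.append((index / hertz, value))  # one sample every 1/freq seconds
    return points

fear = trace_points(
    "10Hz",
    "0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85",
)
```

Under these assumptions the 16 samples of the fear example are spaced 100 ms apart, with the last sample taken 1.5 seconds after the first.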

3 Defining vocabularies for representing emotions

EmotionML markup MUST refer to one or more vocabularies to be used for representing emotion-related states, as specified in the context of the <emotionml> and <emotion> elements. Due to the lack of agreement in the community, the EmotionML specification does not provide a single default set which should apply if no set is indicated. Instead, the user MUST explicitly state the set of descriptor names used.

The document [Vocabularies for EmotionML] provides a number of emotion vocabularies which are likely to be of general interest. In order to promote interoperability, users SHOULD verify whether one of the vocabularies defined in that document is suitable for their application. If that is not the case, users can define their own custom vocabularies as defined in the present section.

3.1 Mechanism for defining vocabularies

The syntax for defining emotion vocabularies is based on the element <vocabulary> and its child <item>.

3.1.1 The <vocabulary> element

Annotation <vocabulary>
Definition Contains the definition of an emotion vocabulary.
Children A <vocabulary> element MUST contain one or more <item> elements. A <vocabulary> element MAY contain a single <info> element, providing arbitrary metadata about the vocabulary itself.
Attributes
  • Required:
    • type, MUST be one of "category", "dimension", "appraisal" or "action-tendency".
    • id, a unique vocabulary identifier of type xsd:ID.
Occurrence One or more <vocabulary> elements MAY occur as direct children of an <emotionml> element.

Vocabulary definitions, when present, occur as direct children of the document root element <emotionml>. It is possible to refer to a vocabulary defined in the same or in a separate EmotionML document, through URIs specified by the values of the attributes category-set, dimension-set, appraisal-set and action-tendency-set of the <emotion> element.

The value of the type attribute explicitly states whether the vocabulary represents category names, dimension elements, appraisal elements or action tendency elements.

3.1.2 The <item> element

Annotation <item>
Definition Represents the definition of one vocabulary item, associated with a value which can be used in the "name" attribute of <category>, <dimension>, <appraisal> or <action-tendency> (depending on the type of vocabulary being defined).
Children An <item> element MAY contain a single <info> element, providing arbitrary metadata about the vocabulary item.
Attributes
  • Required:
    • name: a name for the item, used to refer to this item. An <item> MUST NOT have the same name as any other <item> within the same <vocabulary>.
Occurrence One or more <item> elements occur as direct children of a <vocabulary> element.

An <item> represents the definition of one vocabulary item. A <vocabulary> MUST contain at least one <item> element.

Examples:

In the following example, three vocabularies are wrapped into a single EmotionML document. Their id attributes are: "big6", "fsre-dimensions" and "frijda-subset". They are used to represent categories, dimensions and action tendencies respectively. The first <emotion> element specifies the emotion vocabularies used through the attributes category-set and action-tendency-set, while the second <emotion> element uses the attribute dimension-set.

<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
   
    <!-- Vocabulary definitions -->
   
    <vocabulary type="category" id="big6">
        <item name="anger"/>
        <item name="disgust"/>
        <item name="fear"/>
        <item name="happiness"/>
        <item name="sadness"/>
        <item name="surprise"/>
    </vocabulary>

    <vocabulary type="dimension" id="fsre-dimensions">
        <item name="valence"/>
        <item name="potency"/>
        <item name="arousal"/>
        <item name="unpredictability"/>
    </vocabulary>

    <vocabulary type="action-tendency" id="frijda-subset">
        <item name="approach"/>
        <item name="avoidance"/>
        <item name="rejecting"/>
    </vocabulary>

    <!-- Emotion elements -->
   
    <emotion category-set="#big6" action-tendency-set="#frijda-subset">
        <category name="fear"/>
        <action-tendency name="approach" value="0.0"/>
        <action-tendency name="avoidance" value="0.9"/>
    </emotion>

    <emotion dimension-set="#fsre-dimensions">
        <dimension name="arousal" value="0.3"/>
    </emotion>

</emotionml>

4 Conformance

4.1 EmotionML namespace

The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.

4.2 Use with other namespaces

The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [XML-NS10] or 1.1 [XML-NS11], depending on the version of XML being used).

4.3 Schema validation and processor validation of EmotionML documents

The EmotionML schema is designed to validate the structural integrity of an EmotionML document or document fragment, but cannot verify whether the emotion descriptors used in the name attribute of <category>, <dimension>, <appraisal> and <action-tendency> are consistent with the vocabularies indicated in the respective category-set, dimension-set, appraisal-set and action-tendency-set attributes.

It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.
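
A minimal Python sketch of such a processor-side check is given below. It only resolves vocabularies defined in the same document and referenced as "#id"; external vocabulary URIs are skipped rather than fetched. Rejecting a vocabulary whose type differs from the element that uses it is an assumption of this sketch, and the function name check_names is hypothetical.

```python
import xml.etree.ElementTree as ET

EM = "http://www.w3.org/2009/10/emotionml"
KINDS = ("category", "dimension", "appraisal", "action-tendency")

DOC = """
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml">
    <vocabulary type="category" id="big6">
        <item name="anger"/>
        <item name="disgust"/>
        <item name="fear"/>
        <item name="happiness"/>
        <item name="sadness"/>
        <item name="surprise"/>
    </vocabulary>
    <emotion category-set="#big6"><category name="fear"/></emotion>
    <emotion category-set="#big6"><category name="boredom"/></emotion>
</emotionml>
"""

def check_names(root):
    """Return a list of messages for descriptor names that fail the check."""
    vocabularies = {}
    for voc in root.findall(f"{{{EM}}}vocabulary"):
        names = {item.get("name") for item in voc.findall(f"{{{EM}}}item")}
        vocabularies["#" + voc.get("id")] = (voc.get("type"), names)

    problems = []
    for emotion in root.findall(f"{{{EM}}}emotion"):
        for kind in KINDS:
            # A set declared on <emotion> takes precedence over the root.
            set_uri = emotion.get(kind + "-set") or root.get(kind + "-set")
            for child in emotion.findall(f"{{{EM}}}{kind}"):
                name = child.get("name")
                if set_uri is None:
                    problems.append(f"<{kind} name={name!r}> has no {kind}-set")
                elif set_uri not in vocabularies:
                    continue  # external vocabulary, not resolved here
                else:
                    voc_type, names = vocabularies[set_uri]
                    if voc_type != kind:
                        problems.append(f"{set_uri} is not a {kind} vocabulary")
                    elif name not in names:
                        problems.append(
                            f"<{kind} name={name!r}> is not an item of {set_uri}")
    return problems

problems = check_names(ET.fromstring(DOC.strip()))
```

In this document the name "boredom" is reported, because it is not an item of the "big6" vocabulary defined locally.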

5 Examples

This section is informative.

5.1 Examples of emotion annotation

5.1.1 Manual annotation of emotional material

Annotation of static images

An image gets annotated with several emotion categories at the same time, but different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml">
   <info>
      <meta:media-type>image</meta:media-type>
      <meta:media-id>disgust</meta:media-id>
      <meta:media-set>JACFEE-database</meta:media-set>
      <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://www.davidmatsumoto.info/Articles/2004_hall_and_matsumoto.pdf
      </meta:doc>
   </info>

   <emotion>
       <category name="Disgust" value="0.82"/>
       <category name="Contempt" value="0.35"/>
       <category name="Anger" value="0.12"/>
       <category name="Surprise" value="0.53"/>
   </emotion>
</emotionml>

Annotation of videos

Example 1: Annotation of a whole video: several emotions are annotated with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/humaine-database-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ed1_4</meta:media-name>
        <meta:media-set>humaine database</meta:media-set>
        <meta:coder-set>JM-AB-UH</meta:coder-set>
    </info>
    <emotion>
        <category name="Amusement" value="0.52"/>
        <category name="Irritation" value="0.63"/>
        <category name="Relaxed" value="0.02"/>
        <category name="Frustration" value="0.87"/>
        <category name="Calm" value="0.21"/>
        <category name="Friendliness" value="0.28"/>
    </emotion>
</emotionml>

Example 2: Annotation of a video segment, where two emotions are annotated for overlapping but not identical timespans.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/emotv-labels.xml">
    <info>
        <meta:media-type>video</meta:media-type>
        <meta:media-name>ext-03</meta:media-name>
        <meta:media-set>EmoTV</meta:media-set>
        <meta:coder>4</meta:coder>
    </info>

    <emotion>
        <category name="irritation" value="0.46"/>
        <reference uri="file:ext03.avi?t=3.24,15.4"/>
    </emotion>
    <emotion>
        <category name="despair" value="0.48"/>
        <reference uri="file:ext03.avi?t=5.15,17.9"/>
    </emotion>
</emotionml>

5.1.2 Automatic recognition of emotions

This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.

It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (absolute start time is 1006526160000 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.

The next entry of observed emotions occurs about 6 minutes later (absolute start time is 1006526520000 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a short glimpse of anger; for the visual and IR cameras it was below their individual thresholds, so they produced no entry.

For simplicity, all devices use categorical annotations and the same set of categories. Obviously it would be possible, and even likely, that different devices from different manufacturers provide their data annotated with different emotion sets.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
 ...
<emotion start="1006526160000" expressed-through="face">
  <!--the first modality detects excitement.
      It is a camera observing the face. A URI to the database
      is provided to access the video stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/facedb#t=26,98"/>
</emotion>

<emotion start="1006526160000" expressed-through="facial-skin-color">
  <!--the second modality detects anger. It is an IR camera
      observing the face. A URI to the database
      is provided to access the video stream.-->
  <category name="angry"/>
  <reference uri="http://www.example.com/skindb#t=23,108"/>
</emotion>

<emotion start="1006526160000" expressed-through="physiology">
  <!--the third modality detects excitement again. It is a
      wearable device monitoring physiological changes in the
      body. A URI to the database
      is provided to access the data stream.-->
  <category name="excited"/>
  <reference uri="http://www.example.com/physiodb#t=19,101"/>
</emotion>

<emotion start="1006526520000" expressed-through="physiology">
  <category name="angry"/>
  <reference uri="http://www.example.com/physiodb2#t=2,6"/>
</emotion>
 ...
</emotionml>

Note that handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions will be determined from the time stamps.

5.1.3 Generation of emotion-related system behavior

Generation of facial expressions in an MPEG-4 face model

The MPEG-4 standard offers 68 parameters, called Facial Animation Parameters (FAPs), to animate a 3D facial model. 66 of these are low-level parameters that act on the facial feature points defining a 3D facial model: they specify how these feature points are displaced, simulating muscular contraction. The other two FAPs, namely FAP1 and FAP2, refer respectively to viseme and expression. FAP2 corresponds to one of the six basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). The expressions associated with the six emotions are defined by textual descriptions [Ostermann, 2002].

In emotion theory, the idea of mixing emotions to create new emotions is disputed. For the purposes of facial expression modeling, however, it is possible to simulate different emotions as linear combinations of the six basic facial expressions. MPEG-4 allows the linear combination of any two of these expressions: emotion_1 * intensity_1 + emotion_2 * intensity_2. For example, [Raouzaiou et al., 2005] found that the expressions of depression and guilt can be obtained by combinations of fear and sadness with different intensities, while the expression of suspicion is obtained by combining anger and disgust.

In EmotionML it is possible to represent the emotional input to an MPEG-4 based facial animation system using multiple <category> elements, for example as follows.

<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <!-- attempt to express suspicion as a combination of anger and disgust -->
  <category name="anger" value="0.5"/>
  <category name="disgust" value="0.3"/>
</emotion>

Generation of robot behavior

The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The robot is in a global state of high arousal, negative pleasure and low dominance, i.e. a negative state of distress paired with some urgency but quite limited power to influence the situation. It has a tendency to seek a recharge and to avoid picking up boxes. However, sensor data reveals an unexpected obstacle on the way to the charging station. This triggers the planning of an expressive behavior, namely frowning. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the robot architecture.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata">
    <info>
        <meta:name>robbie the robot example</meta:name>
    </info>

    <!-- Robot's current global state configuration: negative, active, powerless -->
    <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
        <dimension name="pleasure" value="0.2"/>
        <dimension name="arousal" value="0.8"/>
        <dimension name="dominance" value="0.3"/>
    </emotion>

    <!-- Robot's action tendencies: want to recharge -->
    <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
        <action-tendency name="charge-battery" value="0.9"/>
        <action-tendency name="seek-shelter" value="0.7"/>
        <action-tendency name="pickup-boxes" value="0.1"/>
    </emotion>

    <!-- Appraised value of incoming event: obstacle detected, appraised as novel and unpleasant -->
    <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">
        <appraisal name="suddenness" value="0.8" confidence="0.4"/>
        <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/>
        <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/>
    </emotion>

    <!-- Robot's planned facial gestures: will frown -->
    <emotion category-set="http://www.example.com/custom/robot-emotions.xml"
        expressed-through="face">
        <category name="frustration"/>
        <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/>
    </emotion>
</emotionml>
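
As a rough sketch of how a consumer might read such annotations, the following Python fragment picks the strongest action tendency from an abridged copy of the document above. The selection rule (highest value wins) and the function name strongest_tendency are assumptions for illustration; EmotionML does not prescribe how a robot architecture should act on these values.

```python
import xml.etree.ElementTree as ET

NS = {"emo": "http://www.w3.org/2009/10/emotionml"}

# Abridged copy of the action-tendency annotation from the example above.
DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml">
    <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
        <action-tendency name="charge-battery" value="0.9"/>
        <action-tendency name="seek-shelter" value="0.7"/>
        <action-tendency name="pickup-boxes" value="0.1"/>
    </emotion>
</emotionml>
"""


def strongest_tendency(xml_text):
    """Return (name, value) of the action tendency with the highest value.

    Assumption: "highest value wins" is one possible policy, not a rule
    defined by EmotionML.
    """
    root = ET.fromstring(xml_text)
    tendencies = [
        (el.get("name"), float(el.get("value")))
        for el in root.iterfind(".//emo:action-tendency", NS)
    ]
    return max(tendencies, key=lambda item: item[1])


name, value = strongest_tendency(DOC)
print(name, value)
```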

5.2 Examples of possible use with other markup languages

One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML, EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.

This section illustrates the concept using three existing W3C markup languages: EMMA, SSML, and SMIL.

5.2.1 Use with EMMA

EMMA is made for representing arbitrary analysis results; one of them could be the emotional state. The following example represents an analysis of a non-verbal vocalization; its emotion is described as a low-intensity state, maybe boredom.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.w3.org/2009/10/emotionml">
    <emma:interpretation emma:start="12457990" emma:end="12457995" emma:mode="voice" emma:verbal="false">

        <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <category name="bored" value="0.1" confidence="0.1"/>
        </emotion>

    </emma:interpretation>
</emma:emma>

In the following example, the EMMA <emma:derivation> element is used to represent multiple emotion interpretations associated with audio and video media sources. The first and the third interpretations specify the same emotion category, "content", while the result of the second one is "amused". The consolidated emotion is the result of some processing performed on the interpretations included in the derivation element. In this case it is "content", which is the most frequent category among the available interpretations.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml">
 
    <emma:derivation>

        <emma:interpretation id="text1" emma:start="12457960" emma:end="12457995" emma:mode="voice"
                emma:verbal="true" emma:signal="http://example.com/signals/emo123.wav"  
                emma:process="http://example.com/text_analysis.xml">
            <emma:literal>I feel happy</emma:literal>
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.7" confidence="0.7"/>
            </emotion>
        </emma:interpretation>

        <emma:interpretation id="voice1" emma:start="12457960" emma:end="12457995" emma:mode="voice"
                emma:verbal="false" emma:signal="http://example.com/signals/emo123.wav"
                emma:process="http://example.com/voice_analysis.xml">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="amused" value="0.4" confidence="0.5"/>
            </emotion>
        </emma:interpretation>

        <emma:interpretation id="video1" emma:start="12457980" emma:end="12458000" emma:mode="video"
                emma:verbal="false" emma:signal="http://example.com/signals/emo123.mpg"
                emma:process="http://example.com/video_analysis.xml">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.5" confidence="0.7"/>
            </emotion>
        </emma:interpretation>
  
    </emma:derivation>


    <emma:interpretation id="multimodal1" emma:start="12457960" emma:end="12458000"
            emma:medium="acoustic visual" emma:mode="voice video">
        <emma:derived-from resource="#text1" composite="true"/>
        <emma:derived-from resource="#voice1" composite="true"/>
        <emma:derived-from resource="#video1" composite="true"/>
        <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <category name="content" value="0.6" confidence="0.7"/>
        </emotion>
    </emma:interpretation>

</emma:emma>
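
The consolidation step is left to the application. A minimal sketch of the "most frequent category" rule mentioned above could look as follows in Python; the function name most_frequent_category is hypothetical, the input is an abridged copy of the <emma:derivation> shown above, and how the consolidated value and confidence are obtained is deliberately not modelled here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {
    "emma": "http://www.w3.org/2003/04/emma",
    "emo": "http://www.w3.org/2009/10/emotionml",
}

# Abridged copy of the <emma:derivation> from the example above.
EMMA_DOC = """
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.w3.org/2009/10/emotionml">
    <emma:derivation>
        <emma:interpretation id="text1">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.7" confidence="0.7"/>
            </emotion>
        </emma:interpretation>
        <emma:interpretation id="voice1">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="amused" value="0.4" confidence="0.5"/>
            </emotion>
        </emma:interpretation>
        <emma:interpretation id="video1">
            <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
                <category name="content" value="0.5" confidence="0.7"/>
            </emotion>
        </emma:interpretation>
    </emma:derivation>
</emma:emma>
"""


def most_frequent_category(xml_text):
    """Return the category name that occurs most often in the derivation."""
    root = ET.fromstring(xml_text)
    path = "emma:derivation/emma:interpretation/emo:emotion/emo:category"
    names = [cat.get("name") for cat in root.iterfind(path, NS)]
    # Ties are resolved in favour of the name encountered first.
    return Counter(names).most_common(1)[0][0]


consolidated = most_frequent_category(EMMA_DOC)
print(consolidated)
```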

5.2.2 Use with SSML

Two options for using EmotionML with SSML can be illustrated.

First, it is possible with [SSML 1.1] to use arbitrary markup belonging to a different namespace anywhere in an SSML document; only SSML processors that support the markup would take it into account. Therefore, it is possible to insert EmotionML as a child of, for example, an <s> element representing a sentence; the intended meaning is that the enclosing sentence should be spoken with the given emotion, in this case a moderately worried tone of voice:

<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <s>
        <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <emo:category name="worried" value="0.4"/>
        </emo:emotion>

        Do you need help?
    </s>
</speak>

Second, a future version of SSML could explicitly provide for the annotation of paralinguistic information, which could fill the gap between the extralinguistic, speaker-constant settings of the <voice> tag and the linguistic elements such as <s>, <emphasis>, <say-as> etc. The following example assumes that there is a <style> tag for paralinguistic information in a future version of SSML. The style could embed an <emotion>, as follows:

<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <s>
      <style>
        <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
            <emo:category name="worried" value="0.4"/>
        </emo:emotion>

        Do you need help?
      </style>
    </s>
</speak>

Alternatively, the <style> could refer to a previously defined <emotion>, for example:

<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:emo="http://www.w3.org/2009/10/emotionml"
         xml:lang="en-US">
    <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
                 id="somewhatWorried">
        <emo:category name="worried" value="0.4"/>
    </emo:emotion>

    <s>
        <style ref="#somewhatWorried">
            Do you need help?
        </style>
    </s>
</speak>

5.2.3 Use with SMIL

Using EmotionML for the use case of generating system behavior requires elements of scheduling and surface form realization which are not part of EmotionML. Necessarily, this use case relies on other languages to provide the needed functionality. This is in line with the aim of EmotionML to serve as a specialized plug-in language.

This example illustrates the idea in terms of a simplified version of a storytelling application. A virtual agent tells a story using voice and facial animation. The rendering engine shapes the expression in face and voice according to the EmotionML annotations. The engine in this example uses SMIL [SMIL] for defining the temporal relation between events; EmotionML is used via SMIL's generic <ref> element. In general, it is the engine that decides how to render an emotion using the virtual agent's expressive capabilities. To override this default, the second <emotion> contains an explicit request to realize the emotional expression using both face and voice modalities.

ridinghood.smil:

<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head> ... </head>
  <body>
    <par dur="8s">
      <img src="file:forest.jpg"/>
      <smilText>The little girl was enjoying the walk in the forest.</smilText>
      <ref src="file:ridinghood.emotionml#emotion1"/>
    </par>
    <par dur="5s">
      <img src="file:wolf.jpg"/>
      <smilText>Suddenly a dark shadow appeared in front of her.</smilText>
      <ref src="file:ridinghood.emotionml#emotion2"/>
    </par>

  </body>
</smil>

ridinghood.emotionml:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
    appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals">

  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>

  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
    <appraisal name="suddenness" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>
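
How an engine resolves the <ref> targets is outside the scope of both SMIL and EmotionML. The following Python sketch shows one way it might be done, assuming the engine already holds the referenced EmotionML document in memory; the names resolve_emotion, modalities and DOCUMENTS are hypothetical, and the EmotionML text is an abridged copy of ridinghood.emotionml.

```python
import xml.etree.ElementTree as ET

NS = {"emo": "http://www.w3.org/2009/10/emotionml"}

# Hypothetical in-memory store, keyed by the URI part before "#".
DOCUMENTS = {
    "file:ridinghood.emotionml": """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
    category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion id="emotion1">
    <category name="content" value="0.7"/>
  </emotion>
  <emotion id="emotion2" expressed-through="face voice">
    <category name="afraid" value="0.9"/>
  </emotion>
</emotionml>
""",
}


def resolve_emotion(src, documents):
    """Return the <emotion> element addressed by a value such as 'doc#id'."""
    uri, _, fragment = src.partition("#")
    root = ET.fromstring(documents[uri])
    for emotion in root.iterfind("emo:emotion", NS):
        if emotion.get("id") == fragment:
            return emotion
    raise KeyError(src)


def modalities(emotion):
    """Return the requested modalities; an empty list leaves the choice to the engine."""
    return (emotion.get("expressed-through") or "").split()


first = resolve_emotion("file:ridinghood.emotionml#emotion1", DOCUMENTS)
second = resolve_emotion("file:ridinghood.emotionml#emotion2", DOCUMENTS)
print(modalities(first), modalities(second))
```

An empty modality list for the first emotion corresponds to the default case in which the engine chooses how to render the emotion; the second emotion carries the explicit request for face and voice.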

Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.

6 References

6.1 Normative references

EMMA
EMMA: Extensible MultiModal Annotation markup language version 1.0, Michael Johnston et al., Editors. World Wide Web Consortium, 11 December 2007.
Media Fragments URI
Media Fragments URI 1.0, Raphaël Troncy et al., Editors. World Wide Web Consortium, W3C Working Draft 17 March 2011.
RDF
RDF/XML Syntax Specification (Revised), Dave Beckett, Editor. World Wide Web Consortium, W3C Recommendation 10 February 2004.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Editor. IETF RFC 2119, March 1997.
RFC 2326
Real Time Streaming Protocol (RTSP), H. Schulzrinne et al., Editors. IETF RFC 2326, April 1998.
RFC3986
Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee et al., Editors. IETF RFC 3986, January 2005.
SMIL
Synchronized Multimedia Integration Language (SMIL) Version 3.0, Dick Bulterman et al., Editors. W3C Recommendation, 1 December 2008.
SMPTE
SMPTE RP 136 Time and Control Codes for 24, 25 or 30 Frame-Per-Second Motion-Picture Systems.
SSML
Speech Synthesis Markup Language (SSML) Version 1.0, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Recommendation, 7 September 2004.
SSML 1.1
Speech Synthesis Markup Language (SSML) Version 1.1, Daniel C. Burnett, et al., Editors. World Wide Web Consortium, W3C Recommendation, 7 September 2010.
XML-NS10
Namespaces in XML 1.0, Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 16 August 2006.
XML-NS11
Namespaces in XML 1.1, Tim Bray et al., Editors. World Wide Web Consortium, W3C Recommendation, 2006.
XML Schema
XML Schema Part 1: Structures Second Edition, Henry S. Thompson et al., Editors. World Wide Web Consortium, W3C Recommendation, 2004.

6.2 Informative references

CLARIN
CLARIN Metadata Infrastructure for Language Resources and Technology, Version 5, D. Broeder et al., Editors. Common Language Resources and Technology Infrastructure Report, 4 February 2009.
Emotion Incubator Group
W3C Emotion Incubator Group, M. Schröder, E. Zovato, H. Pirker, C. Peter, F. Burkhardt, Editors. Final Report of the Emotion Incubator Group at the World Wide Web Consortium, 10 July 2007.
Emotion Markup Language Incubator Group
Elements of an EmotionML 1.0, M. Schröder, Editor. Final Report of the Emotion Markup Language Incubator Group at the World Wide Web Consortium, 20 November 2008.
EmotionML Requirements
Emotion Markup Language: Requirements with Priorities. F. Burkhardt and M. Schröder. W3C Incubator Group Report, 13 May 2008.
IMDI
IMDI Editor version 3.2, B. Hellwig and D. van Uytvanck. ISLE Metadata Initiative Report, 19 June 2007.
Ortony et al., 1988
Ortony, A., Clore, G. L., & Collins, A. (1988). The Cognitive Structure of Emotion. Cambridge, UK: Cambridge University Press.
Ostermann, 2002
Ostermann, J. (2002). Face Animation in MPEG-4. In: MPEG-4 Facial Animation - The Standard Implementation and Applications (I.S. Pandzic and R. Forchheimer, eds.), pp. 17-55. England: Wiley.
Raouzaiou et al., 2005
Raouzaiou, A., Spyrou, E., Karpouzis, K. and Kollias, S. (2005). Emotion Synthesis: an Intermediate Expressions’ Generator System in the MPEG-4 Framework. International Workshop VLBV05, 15-16 September 2005, Sardinia, Italy.
Vocabularies for EmotionML
Vocabularies for EmotionML. M. Schröder and C. Pelachaud, Editors. W3C Working Draft, 7 April 2011.

7 Acknowledgments

The authors wish to acknowledge the contributions by all members of the Multimodal Interaction Working Group, the Emotion Markup Language Incubator Group and the Emotion Incubator Group, as well as the participants in the W3C Workshop on EmotionML, in particular the following persons (in alphabetic order):


Appendix A: Changes

This section is informative.

Changes in the current Working Draft

This section summarizes the main changes since the previous working draft of 29 July 2010.

Changes in Working Draft 2 (29 July 2010)