Copyright © 2010 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the Second Public Working Draft of the Emotion Markup Language (EmotionML) 1.0 specification, published on 29 July 2010. It addresses many of the issues raised in the First Public Working Draft of 29 October 2009. Changes from the First Public Working Draft can be found in Appendix A.
This document was developed by the Multimodal Interaction Working Group . The Working Group expects to advance this Working Draft to Recommendation Status.
Please send comments about this document to www-multimodal@w3.org (with public archive ).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [ RFC2119 ].
Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.
This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.
The report is work in progress. Issue notes are used to describe open questions as well as available choices.
As for any standard format, the goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.
Use cases for EmotionML can be grouped into three broad types: manual annotation of emotion-related data; automatic recognition of emotion-related states from user behavior; and generation of emotion-related system behavior.
Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modelling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.
Concrete examples of existing technology that could apply EmotionML include:
The Emotion Incubator Group has listed 39 individual use cases for an EmotionML .
A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to make the same mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made irrespective of whether these states are the most appropriate for the intended application. Crucially, the available alternatives that have been developed in the affective science literature are not sufficiently well known, resulting in dead-end situations after the initial steps of work. Careful consideration of the states to study and of the representations for describing them can help avoid such situations.
EmotionML makes scientific concepts of emotions practically applicable. This can help potential users to identify the suitable representations for their respective applications.
Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused on. In short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity; and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.
Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, but to allow users to "plug in" vocabularies that they consider appropriate for their work. A central repository of such vocabularies can serve as a recommended starting point; where that seems inappropriate, users can create their own custom vocabularies.
An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.
For the reasons outlined here, there is an inevitable tension between flexibility and interoperability, which needs to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed; to propose reasonable default options for every choice; and, ultimately, to propose mapping mechanisms where that is possible and meaningful.
Terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.
The following sections describe the syntax of the main elements of EmotionML. The specification is not yet fully complete. Feedback is highly appreciated.
<emotionml> element

Annotation | <emotionml> |
---|---|
Definition | The root element of an EmotionML document. |
Children | The element MUST contain one or more <emotion> elements. It MAY contain a single <info> element. |
Attributes | |
Occurrence | This is the root element -- it cannot occur as a child of any other EmotionML elements. |
<emotionml> is the root element of a standalone EmotionML document. It wraps a number of <emotion> elements into a single document. It MAY contain a single <info> element, providing document-level metadata.

The <emotionml> element MUST define the EmotionML namespace, and MAY define any other namespaces.
Example:
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"> ... </emotionml>
or
<em:emotionml version="1.0" xmlns:em="http://www.w3.org/2009/10/emotionml"> ... </em:emotionml>
Note: One of the envisaged uses of EmotionML is in the context of other markup languages. In such cases, there will be no <emotionml> root element; instead, <emotion> elements will be used directly in other markup -- see Examples of possible use with other markup languages.
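As a purely illustrative sketch (the host language, its namespace and its element names here are hypothetical, not defined by any specification), an <emotion> element might be embedded in other markup as follows:

<host:message xmlns:host="http://www.example.com/host-language" xmlns:emo="http://www.w3.org/2009/10/emotionml"> <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <emo:category name="satisfaction"/> </emo:emotion> That worked, thanks! </host:message>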
<emotion> element

Annotation | <emotion> |
---|---|
Definition | This element represents a single emotion annotation. |
Children | If present, the following child elements can occur only once: <category>, <intensity>, <info>. If present, the following child elements may occur one or more times: <dimension>, <appraisal>, <action-tendency>, <reference>. There are no constraints on the combinations of children that are allowed. |
Attributes |
|
Occurrence | as a child of <emotionml> , or in any markup
using EmotionML. |
The <emotion>
element represents an
individual emotion annotation. No matter how simple or complex its
substructure is, it represents a single statement about the
emotional content of some annotated item. Where several statements
about the emotion in a certain context are to be made, several
<emotion>
elements MUST be used. See Examples of emotion annotation for illustrations
of this issue.
An <emotion>
element MAY have an id
attribute, allowing for a unique reference to the
individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency>, <intensity> or their children directly. For this reason, these elements do not allow for an id attribute.
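For illustration, an emotion annotation carrying an id (the identifier value "emo1" is arbitrary) might look as follows:

<emotion id="emo1" category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <category name="anger"/> </emotion>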
Whereas it is possible to use <emotion>
elements in a standalone <emotionml>
document, a
typical use case is expected to be embedding an
<emotion>
into some other markup -- see Examples of possible use with other markup
languages .
<category> element

Annotation | <category> |
---|---|
Definition | Description of an emotion or a related state using a single category. |
Children | None |
Attributes |
|
Occurrence | A single <category> MAY occur as a child of
<emotion> . |
<category> describes an emotion or a related state in terms of a single category name, given as the value of the name attribute. The name MUST belong to a clearly-identified set of category names, which MUST be defined according to Defining vocabularies for representing emotions.

The set of legal values of the name attribute is indicated in the category-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.
Examples:
In the following example, the emotion category "satisfaction" is being annotated; it must be contained in an emotion category vocabulary located at http://www.example.com/emotion/category/everyday-emotions.xml.
<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <category name="satisfaction"/> </emotion>
The following is an annotation of an interpersonal stance "distant", which must be defined in the category vocabulary at the URI given in the category-set attribute:
<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml"> <category name="distant"/> </emotion>
<dimension> element

Annotation | <dimension> |
---|---|
Definition | Description of an emotion or a related state in terms of an emotion dimension. |
Children | Optionally, a <trace> child element. |
Attributes | |
Occurrence | One or more <dimension> elements MAY occur as children of <emotion>. |
One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions. The names of the emotion dimensions MUST belong to a clearly-identified set of dimension names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of values that can be used as values of the name attribute is indicated in the dimension-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.
There are no constraints regarding the order of the <dimension> elements within an <emotion> element.
Any given dimension is either unipolar or bipolar; its value attribute MUST contain a Scale value. A <dimension> element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively.
If the <dimension> element has both a confidence attribute and a <trace> child, the <trace> child MUST NOT have a samples-confidence attribute. In other words, it is possible to either give a constant confidence on the <dimension> element or a confidence trace on the <trace> element, but not both.

Examples:
One of the most widespread sets of emotion dimensions used (sometimes by different names) is the combination of valence, arousal and potency. Assuming that arousal and potency are unipolar scales with typical values between 0 and 1, and valence is a bipolar scale with typical values between -1 and 1, the following example is a state of rather low arousal, very positive pleasure, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation:
<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml"> <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal --> <dimension name="pleasure" value="0.9"/><!-- very high positive valence --> <dimension name="dominance" value="0.8"/><!-- relatively high potency --> </emotion>
In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single, bipolar dimension "friendliness".
<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml"> <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person --> </emotion>
<appraisal> element

Annotation | <appraisal> |
---|---|
Definition | Description of an emotion or a related state in terms of an appraisal. |
Children | Optionally, a <trace> child element. |
Attributes | |
Occurrence | One or more <appraisal> elements MAY occur as children of <emotion>. |
One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals. The names of the appraisals MUST belong to a clearly-identified set of appraisal names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of values that can be used as values of the name attribute is indicated in the appraisal-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.
There are no constraints regarding the order of the <appraisal> elements within an <emotion> element.
Any given appraisal is either unipolar or bipolar; its value attribute MUST contain a Scale value. An <appraisal> element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively. If the <appraisal> element has both a confidence attribute and a <trace> child, the <trace> child MUST NOT have a samples-confidence attribute. In other words, it is possible to either give a constant confidence on the <appraisal> element, or a confidence trace on the <trace> element, but not both.

Examples:
One of the most widespread sets of emotion appraisals used is the appraisals set proposed by Klaus Scherer, namely novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Assuming some appraisal variables, say novelty is a unipolar scale, and intrinsic pleasantness is a bipolar scale, the following example is a state arising from the evaluation of an unpredicted and quite unpleasant event:
<emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml"> <appraisal name="novelty" value="0.8"/> <appraisal name="intrinsic-pleasantness" value="0.2"/> </emotion>
In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining a single, bipolar appraisal "likelihood".
<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml"> <appraisal name="likelihood" value="0.8"/><!-- a very predictable event --> </emotion>
<action-tendency> element

Annotation | <action-tendency> |
---|---|
Definition | Description of an emotion or a related state in terms of an action tendency. |
Children | Optionally, a <trace> child element. |
Attributes | |
Occurrence | One or more <action-tendency> elements MAY occur as children of <emotion>. |
One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action tendencies. The names of the action tendencies MUST belong to a clearly-identified set of action-tendency names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of values that can be used as values of the name attribute is indicated in the action-tendency-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.
There are no constraints regarding the order of the <action-tendency> elements within an <emotion> element.
Any given action tendency is either unipolar or bipolar; its value attribute MUST contain a Scale value. An <action-tendency> element MUST either contain a value attribute or a <trace> child element, corresponding to static and dynamic representations of Scale values, respectively. If the <action-tendency> element has both a confidence attribute and a <trace> child, the <trace> child MUST NOT have a samples-confidence attribute. In other words, it is possible to either give a constant confidence on the <action-tendency> element, or a confidence trace on the <trace> element, but not both.

Examples:
One well-known use of action tendencies is by N. Frijda, who generally uses the term "action readiness". This model uses a number of action tendencies that are low-level, diffuse behaviors from which more concrete actions could be determined. An example of someone attempting to attract someone they like by being confident, strong and attentive might look like this:
<emotion action-tendency-set="http://www.example.com/emotion/action/frijda.xml"> <action-tendency name="approach" value="0.7"/><!-- get close --> <action-tendency name="avoid" value="0.0"/> <action-tendency name="being-with" value="0.8"/><!-- be happy --> <action-tendency name="attending" value="0.7"/><!-- pay attention --> <action-tendency name="rejecting" value="0.0"/> <action-tendency name="non-attending" value="0.0"/> <action-tendency name="agonistic" value="0.0"/> <action-tendency name="interrupting" value="0.0"/> <action-tendency name="dominating" value="0.7"/><!-- be assertive --> <action-tendency name="submitting" value="0.0"/> </emotion>
In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a robot that works in a factory and uses a custom set of action tendencies, defining example actions for a robot.
<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml"> <action-tendency name="charge-battery" value="0.9"/><!-- need to charge battery soon --> <action-tendency name="pickup-boxes" value="0.3"/><!-- feeling tired, avoid work --> </emotion>
<intensity> element

Annotation | <intensity> |
---|---|
Definition | Represents the intensity of an emotion. |
Children | Optionally, an <intensity> element MAY have
a <trace> child
element. |
Attributes |
|
Occurrence | One <intensity> item MAY occur as a child of
<emotion> . |
<intensity>
represents the intensity of an
emotion. The <intensity>
element MUST either
contain a value
attribute or a
<trace>
child element, corresponding to static
and dynamic representations of scale values, respectively.
<intensity>
is a unipolar scale.
If the <intensity> element has both
a confidence attribute and a <trace> child, the <trace>
child MUST NOT have a samples-confidence attribute. In other words,
it is possible to either give a constant confidence on the
<intensity> element, or a confidence trace on the
<trace> element, but not both.

A typical use of intensity is in combination with <category>. However, in some emotion models (e.g. Gebhard, 2005), the emotion's intensity can also be used in combination with a position in emotion dimension space, that is, in combination with <dimension> elements. Therefore, intensity is specified independently of <category>.
Example:
A weak surprise could accordingly be annotated as follows.
<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <intensity value="0.2"/> <category name="surprise"/> </emotion>
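As a sketch of the dimensional use of intensity described above (reusing the illustrative PAD dimension set from the <dimension> examples; the combination shown is an assumption rather than a normative pattern), a moderately intense, relaxed and positive state might be annotated as:

<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml"> <intensity value="0.7"/> <dimension name="pleasure" value="0.9"/> <dimension name="arousal" value="0.3"/> </emotion>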
The fact that intensity is represented by an element makes it possible to add meta-information. For example, it is possible to express a high confidence that the intensity is low, but a low confidence regarding the emotion category, as shown in the last example in the description of confidence.
confidence attribute

Annotation | confidence |
---|---|
Definition | A representation of the degree of confidence or probability that a certain element of the representation is correct. |
Occurrence | An optional attribute of <category>, <dimension>, <appraisal>, <action-tendency> and <intensity>. |
Confidence MAY be indicated separately for each of the Representations of emotions and related states .
For example, the confidence that the <category>
is assumed correctly is independent from the confidence that its
<intensity>
is correctly indicated.
Rooted in the tradition of statistics, a confidence is usually given in an interval from 0 to 1, resembling a probability. This is an intuitive range, as opposed to e.g. (logarithmic) score values. In this sense, the confidence is a unipolar Scale value.

Legal values: a floating-point value from the interval [0;1].

Examples:
In the following, one simple example is provided for each element that MAY carry a confidence attribute.

The first example indicates a very high confidence that surprise is the emotion to annotate.

<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <category name="surprise" confidence="0.95"/> </emotion>
The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive valence may or may not be correct.
<emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml"> <dimension name="arousal" value="0.8" confidence="0.9"/> <dimension name="pleasure" value="0.6" confidence="0.3"/> </emotion>
Finally, an example for the case of <intensity>: a high confidence is stated that the emotion has a low intensity.
<emotion> <intensity value="0.1" confidence="0.8"/> </emotion>
Note that, as stated above, an emotion annotation can combine some or all of the above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.
<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <intensity value="0.1" confidence="0.8"/> <category name="boredom" confidence="0.1"/> </emotion>
modality attribute

Annotation | modality |
---|---|
Definition | The modality, or list of modalities, through which the emotion is expressed. |
Occurrence | An optional attribute of <emotion>. |
The modality attribute describes the modality by which an emotion is expressed, not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The modality is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.
With the current representation of modality, it is not possible to indicate the type of sensor through which the given modality was observed. For example, a face may show no emotion with a normal optical camera, but an emotion may be detected from the same face using an infrared camera. It must be considered to what extent an optional annotation should be added which would allow such distinctions to be made.
Example:
In the following example, the emotion is expressed through the voice.
<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml" modality="voice"> <category name="satisfaction"/> </emotion>
In case of multimodal expression of an emotion, a list of space-separated modalities can be indicated in the modality attribute, as in the following example in which the two values "face" and "voice" are used.
<emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml" modality="face voice"> <category name="satisfaction"/> </emotion>
See also the examples in sections 5.1.2 Automatic recognition of emotions, 5.1.3 Generation of emotion-related system behavior, and 5.2.3 Use with SMIL.
<info> element

Annotation | <info> |
---|---|
Definition | This element can be used to annotate arbitrary metadata. |
Children | One or more elements in a different namespace than the EmotionML namespace, providing metadata. |
Attributes | |
Occurrence | A single <info> element MAY occur as a child of <emotionml> and as a child of each <emotion>. |
This element can contain arbitrary XML data in a different namespace (one option could be [ RDF ] data), either on a document global level or on a local "per annotation element" level.
Examples:
In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:classifiers="http://www.example.com/meta/classify/" xmlns:origin="http://www.example.com/meta/local/" category-set="http://www.example.com/emotion/category/everyday-emotions.xml"> <info> <classifiers:classifier classifiers:name="GMM"/> </info> <emotion> <info><origin:localization value="bavarian"/></info> <category name="joy"/> </emotion> <emotion> <info><origin:localization value="swabian"/></info> <category name="sadness"/> </emotion> </emotionml>
<reference> element

Annotation | <reference> |
---|---|
Definition | |
Children | None |
Attributes | |
Occurrence | Multiple <reference> elements MAY occur as children of <emotion>. |
A <reference> element provides a link to media as a URI [RFC3986]. The semantics of references are described by the role attribute, which MUST have one of four values: "expressedBy" (the default, which applies if the role attribute is not explicitly stated), "experiencedBy", "triggeredBy" and "targetedAt".

For resources representing a period of time, start and end times MAY be denoted by use of the media fragments syntax, as explained in section 2.4.2.2.

There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.
Examples:

The following example illustrates the reference to two different URIs having a different role with respect to the emotion: one reference points to the emotion's expression, e.g. a video clip showing a user expressing the emotion; the other reference points to the trigger that caused the emotion, e.g. another video clip that was seen by the person eliciting the expressed emotion. Note that the media-type attribute can be used to differentiate between different media types such as audio, video, text, etc.
<emotion> <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/> </emotion>
Several references may follow as children of one <emotion> tag, even having the same role: for example, the following annotation refers to a portion of a video and to physiological sensor data, both of which expressed the emotion:
<emotion ...> ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/> </emotion>
It is possible to explicitly indicate the MIME type of the item that the reference refers to:
<emotion ...> ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/> </emotion>
start and end attributes

Annotation | start, end |
---|---|
Definition | Attributes to denote the absolute starting and ending times of an emotion. |
Occurrence | Optional attributes of the <emotion> element. |
start and end denote the absolute starting and ending times at which an emotion or related state happened. This might be used for example with an "emotional diary" application. These attributes MAY be used with an <emotion> element, and MUST be of type xsd:nonNegativeInteger.
Examples:
In the following example, the emotion category "surprise" is annotated, immediately followed by the category "joy". The start and end attributes determine for each emotion the absolute beginning and ending times.
<emotion start="1268647200" end="1268647330"> <category name="surprise"/> </emotion> <emotion start="1268647331" end="1268647400"> <category name="joy"/> </emotion>
The end value MUST be greater than or equal to the start value.
The ECMAScript Date object's getTime() function is a way to determine the absolute time.
Timing in media files

Annotation | |
---|---|
Definition | Denoting the start and end points of an annotation in a media stream, in conformance with the media fragments URI syntax. |
Occurrence | The uri attribute of the <reference> element. |
Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval, whereas the end time is considered to be the first time point that is not part of the interval. If a single number only is given, this is the begin time.

Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.

This MAY be used with a <reference> element.
Examples:
In the following example, the emotion category "joy" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.

<emotion> <category name="joy"/> <reference uri="myAudio.wav#t=3,9"/> </emotion>
In the following example, the emotion category "joy" is displayed in a video file called "myVideo.avi" in SMPTE values, resulting in the time interval [120,121.5).
<emotion> <category name="joy"/> <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/> </emotion>
A last example states this in a video file in real-world clock time code, as a 1-minute interval on 26 July 2009 from 11hrs, 19min, 1sec.
<emotion> <category name="joy"/> <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/> </emotion>
Scale values are needed to represent content in <dimension>, <appraisal> and <action-tendency> elements, as well as in <intensity> and confidence. Representations of scale values can vary along the following axes: static scale values are represented using the value attribute; for dynamic values, their evolution over time is expressed using the <trace> element.

value attribute

Annotation | value |
---|---|
Definition | Representation of a static scale value. |
Occurrence | An optional attribute of <dimension>, <appraisal> and <action-tendency> elements and of <intensity>; these elements MUST either contain a value attribute or a <trace> element. |
The value attribute represents a static scale value of the enclosing element.

Conceptually, each <dimension>, <appraisal> and <action-tendency> element is either unipolar or bipolar. The definition of a set of dimensions, appraisals or action tendencies MUST define, for each item in the set, whether it is unipolar or bipolar. <intensity> is a unipolar scale.
Legal values: For both unipolar and bipolar scales, legal values are a floating-point value from the interval [0;1].
Examples of the value attribute can be found in the context of the <dimension>, <appraisal> and <action-tendency> elements and of <intensity>.
<trace> element

Annotation | <trace> |
---|---|
Definition | Representation of the time evolution of a dynamic scale value. |
Children | None |
Attributes | freq, samples, samples-confidence |
Occurrence | An optional child element of <dimension>, <appraisal>, <action-tendency> and <intensity>. |
A <trace>
element represents the time course
of a numeric scale value. It cannot be used for discrete scale
values.
The freq
attribute indicates the sampling frequency
at which the values listed in the samples
attribute
are given.
A <trace> MAY include a trace of the
confidence alongside with the trace of the scale itself, in the
samples-confidence attribute. If present, samples-confidence MUST
use the same sampling frequency as the content scale, as given in
the freq attribute. If the enclosing element contains a (static)
confidence attribute, the <trace> MUST NOT have a
samples-confidence attribute. In other words, it is possible to
indicate either a static or a dynamic confidence for a given scale
value, but not both. NOTE: The <trace>
representation requires a periodic sampling of values. In order to
represent values that are sampled aperiodically, separate
<emotion>
annotations with appropriate timing
information and individual value
attributes may be
used.
Examples:
The following example illustrates the use of a trace to represent an episode of fear during which intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.
<emotion category-set="http://www.example.com/emotion/category/ekman-big-six.xml"> <category name="fear"/> <intensity> <trace freq="10Hz" samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/> </intensity> </emotion>
The following example combines a trace of the appraisal "novelty" with a global confidence that the values represent the facts properly. There is a sudden peak of novelty; the annotator is reasonably certain that the annotation is correct:
<emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml"> <appraisal name="novelty" confidence="0.75"> <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/> </appraisal> </emotion>
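The NOTE above mentions that aperiodically sampled values can be represented as separate <emotion> annotations with timing information and individual value attributes. A minimal sketch of this alternative, assuming the absolute start and end attributes described earlier and purely illustrative timestamps:

<emotionml xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.example.com/emotion/category/ekman-big-six.xml"> <emotion start="1268647200" end="1268647201"> <category name="fear"/> <intensity value="0.1"/> </emotion> <emotion start="1268647213" end="1268647214"> <category name="fear"/> <intensity value="0.85"/> </emotion> </emotionml>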
EmotionML markup MUST refer to one or more vocabularies to be used for representing emotion-related states. Due to the lack of agreement in the community, the EmotionML specification does not provide a single default vocabulary which would apply if no set is indicated. Instead, the user MUST explicitly state the value set used.
ISSUE-105: How to define the actual vocabularies to use for <category>, <dimension>, <appraisal> and <action-tendency> remains to be specified. A suitable method may be to define an XML format in which these sets can be defined. The format for defining a vocabulary MUST fulfill at least the following requirements:
Furthermore, the format SHOULD allow for
ISSUE-106: The EmotionML specification SHOULD come with a carefully-chosen selection of default vocabularies, representing a suitably broad range of emotion-related states and use cases. Advice from the affective sciences is being sought to obtain a balanced set of default vocabularies.
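Purely as an illustration of what such a definition format might look like (the element and attribute names here are hypothetical and not defined by this specification), a category vocabulary could be declared as:

<vocabulary type="category" id="everyday-emotions"> <item name="satisfaction"/> <item name="anger"/> <item name="surprise"/> </vocabulary>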
The following is a preliminary list of emotion vocabularies that can be used with EmotionML. The guiding principle for selecting "recommended" emotion vocabularies has been to select vocabularies that are either commonly used in technological contexts, or represent current emotion models from the scientific literature. Also, given the difficulty of defining mappings between emotion categories, dimensions, appraisals and action tendencies, we have included pairs or groups of vocabularies where these mappings are rather well defined. The selection is necessarily incomplete; many highly relevant emotion models are not listed here. Where they are needed, users can write a definition as described in User-defined custom vocabularies.
NOTE: Feedback on the selection of "default" emotion vocabularies in this section is highly appreciated. Please send comments to www-multimodal@w3.org (with public archive ).
These six terms are proposed by Paul Ekman ( Ekman, 1972 , p. 251-252) as basic emotions with universal facial expressions -- emotions that are recognized and produced in all human cultures.
Term | Description |
---|---|
anger | |
disgust | |
fear | |
happiness | |
sadness | |
surprise | |
These 17 terms are the result of a study by Cowie et al. ( Cowie et al., 1999 ) investigating emotions that frequently occur in everyday life.
Term | Description |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The 22 OCC categories are proposed by Ortony, Clore and Collins ( Ortony et al., 1988 , p. 19) as part of their appraisal model. See also OCC appraisals below.
Term | Description |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The 24 FSRE categories are used in the study by Fontaine, Scherer, Roesch and Ellsworth ( Fontaine et al., 2007 , p. 1055) investigating the dimensionality of emotion space. See also FSRE dimensions below.
Term | Description |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This category set is included because according to Nico Frijda's proposal of action tendencies ( Frijda, 1986 ), these categories are related to action tendencies. See Frijda's action tendencies , below.
Term | Description |
---|---|
|
related to action tendency 'agonistic' |
|
related to action tendency 'approach' |
|
related to action tendency 'approach' |
|
related to action tendency 'rejecting' |
|
related to action tendency 'being-with' |
|
related to action tendency 'avoidance' |
|
related to action tendency 'submitting' |
|
related to action tendency 'nonattending' |
|
related to action tendency 'attending' |
|
related to action tendency 'submitting' |
|
related to action tendency 'interrupting' |
|
related to action tendency 'interrupting' |
Mehrabian proposed a three-dimensional description of emotion in terms of Pleasure, Arousal, and Dominance (PAD; Mehrabian, 1996 , p. 264).
Term | Description |
---|---|
pleasure | |
arousal | |
dominance | |
The four emotion dimensions obtained in the study by Fontaine, Scherer, Roesch and Ellsworth (Fontaine et al., 2007, p. 1051 and 1055) investigating the dimensionality of emotion space. See also FSRE categories above.
Term | Description |
---|---|
valence | also named evaluation or pleasantness |
potency | also named control |
arousal | also named activation |
unpredictability | |
The following appraisals were proposed by Ortony, Clore and Collins ( Ortony et al., 1988 ) in their appraisal model. See also OCC categories above.
Term | Description |
---|---|
|
relevant for event based emotions. (pleased/displeased) |
|
relevant for attribution emotions. (approving/disapproving) |
|
relevant for attraction emotions. (liking/disliking) |
|
related to fortunes of others. Whether the event is desirable for the other. |
|
related to fortunes of others. Whether the other “deserves” the event. |
|
related to fortunes of others. Whether the other is liked or not. These distinguish between: happy-for, pity, gloating (schadenfreude), and resentment. |
|
relevant for prospect emotions. (hope/fear) |
|
relevant for prospect emotions. How much effort the individual invested in the outcome. |
|
relevant for prospect emotions. The actual resulting outcome. These distinguish between: relief, disappointment, satisfaction, and fears-confirmed. |
|
relevant for attribution emotions. The stronger one identifies with the other, that distinguishes between whether pride or admiration is felt. |
|
relevant for attribution emotions. Distinguishes whether the other is expected to act in a manner deserving of admiration or reproach. These distinguish between: pride, shame, admiration, reproach. |
|
relevant for attraction emotions. (love/hate) |
The following list of appraisals was proposed by Klaus Scherer as a sequence of Stimulus Evaluation Checks (SECs) in his Component Process Model of emotion ( Scherer, 1984 , p. 310; Scherer, 1999 , p. 639).
Term | Description |
---|---|
Novelty | |
|
|
|
|
|
|
Intrinsic pleasantness | |
|
|
Goal significance | |
|
Relevance to the concerns of the person him- or herself, e.g. survival, bodily integrity, fulfillment of basic needs, self-esteem |
|
Relevance to concerns regarding relationships with others, e.g. establishment, continued existence and intactness of relationships, cohesion of social groups |
|
Relevance to social order, e.g. sense of orderliness, predictability in a social environment including fairness & appropriateness |
|
|
|
|
|
|
|
|
Coping potential | |
|
The event was caused by the agent him- or herself |
|
The event was caused by another person |
|
The event was caused by chance or by nature |
|
0: caused by negligence, 1: caused intentionally |
|
Is the event controllable? |
|
Power of the agent him- or herself |
|
Is adjustment possible to the agent's own goals? |
Compatibility with standards | |
|
Compatibility with external standards, such as norms or demands of a reference group |
|
Compatibility with internal standards, such as the self ideal or internalized moral code |
The following list of appraisals was compiled by Gratch and Marsella ( Gratch & Marsella, 2004 ) for their EMA model.
Term | Description |
---|---|
|
|
|
|
|
causal attribution -- who caused the event? |
|
blame and credit -- part of causal attribution |
|
|
|
|
|
|
|
|
|
part of coping potential |
|
part of coping potential |
|
part of coping potential |
|
part of coping potential |
This set of action tendencies was proposed by Nico Frijda ( Frijda, 1986 ), who also coined the term 'action tendency'. See also Frijda's category set , above.
Term | Description |
---|---|
approach | aimed towards access and consummatory activity, related to desire |
avoidance | aimed towards own inaccessibility and protection, related to fear |
being-with | aimed at contact and interaction, related to enjoyment |
attending | aimed at identification, related to interest |
rejecting | aimed at removal of object, related to disgust |
nonattending | aimed at selecting, related to indifference |
agonistic | aimed at removal of obstruction and regaining control, related to anger |
interrupting | aimed at reorientation, related to shock and surprise |
dominating | aimed at retained control, related to arrogance |
submitting | aimed at deflecting pressure, related to humility and resignation |
EmotionML markup makes no syntactic difference between referring to centrally-defined default vocabularies and referring to user-defined custom vocabularies. Therefore, one option to define a custom vocabulary is to create a definition XML file in the same way as it is done for the default vocabularies.
ISSUE-107: It may be desirable to embed the definition of custom vocabularies inside an <emotionml> document, e.g. by placing the definition XML element as a child element below the document element <emotionml>.
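Continuing the hypothetical vocabulary format sketched in Defining vocabularies for representing emotions, such an embedded definition might look as follows (again, the vocabulary markup and the fragment reference are illustrative only):

<emotionml xmlns="http://www.w3.org/2009/10/emotionml" category-set="#my-categories"> <vocabulary type="category" id="my-categories"> <item name="calm"/> <item name="stressed"/> </vocabulary> <emotion> <category name="stressed"/> </emotion> </emotionml>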
The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.
The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [ XML-NS10 ] or 1.1 [ XML-NS11 ], depending on the version of XML being used).
The EmotionML schema is designed to validate the structural integrity of an EmotionML document fragment, but cannot verify whether the emotion descriptors used in the name attribute values of <category>, <dimension>, <appraisal> and <action-tendency> are consistent with the vocabularies indicated in the respective category-set, dimension-set, appraisal-set and action-tendency-set attributes. It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.
An image gets annotated with several emotion categories at the same time, but different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml"> <info> <meta:media-type>image</meta:media-type> <meta:media-id>disgust</meta:media-id> <meta:media-set>JACFEE-database</meta:media-set> <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://www.davidmatsumoto.info/Articles/2004_hall_and_matsumoto.pdf</meta:doc> </info> <emotion> <category name="Disgust"/> <intensity value="0.82"/> </emotion> <emotion> <category name="Contempt"/> <intensity value="0.35"/> </emotion> <emotion> <category name="Anger"/> <intensity value="0.12"/> </emotion> <emotion> <category name="Surprise"/> <intensity value="0.53"/> </emotion> </emotionml>
Example 1: Annotation of a whole video: several emotions are annotated with different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/humaine-database-labels.xml">
  <info>
    <meta:media-type>video</meta:media-type>
    <meta:media-name>ed1_4</meta:media-name>
    <meta:media-set>humaine database</meta:media-set>
    <meta:coder-set>JM-AB-UH</meta:coder-set>
  </info>
  <emotion>
    <category name="Amusement"/>
    <intensity value="0.52"/>
  </emotion>
  <emotion>
    <category name="Irritation"/>
    <intensity value="0.63"/>
  </emotion>
  <emotion>
    <category name="Relaxed"/>
    <intensity value="0.02"/>
  </emotion>
  <emotion>
    <category name="Frustration"/>
    <intensity value="0.87"/>
  </emotion>
  <emotion>
    <category name="Calm"/>
    <intensity value="0.21"/>
  </emotion>
  <emotion>
    <category name="Friendliness"/>
    <intensity value="0.28"/>
  </emotion>
</emotionml>
Example 2: Annotation of a video segment, where two emotions are annotated for the same timespan.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/emotv-labels.xml">
  <info>
    <meta:media-type>video</meta:media-type>
    <meta:media-name>ext-03</meta:media-name>
    <meta:media-set>EmoTV</meta:media-set>
    <meta:coder>4</meta:coder>
  </info>
  <emotion>
    <category name="irritation"/>
    <intensity value="0.46"/>
    <reference uri="file:ext03.avi?t=3.24,15.4"/>
  </emotion>
  <emotion>
    <category name="despair"/>
    <intensity value="0.48"/>
    <reference uri="file:ext03.avi?t=3.24,15.4"/>
  </emotion>
</emotionml>
This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.
It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (absolute start time is 1006526160000 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.
The next entry of observed emotions occurs about 6 minutes later (absolute start time is 1006526520000 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a short glimpse of anger; for the visual and IR cameras the signal was below their individual thresholds, so there is no entry from them.
For simplicity, all devices use categorical annotations and the same set of categories. Obviously it would be possible, and even likely, that different devices from different manufacturers provide their data annotated with different emotion sets.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
  ...
  <emotion start="1006526160000" modality="face">
    <!-- The first modality detects excitement. It is a camera observing the
         face. A URI to the database (a dedicated port at the server) is
         provided to access the video stream. -->
    <category name="excited"/>
    <reference uri="http://www.example.com/#t=26,98"/>
  </emotion>
  <emotion start="1006526160000" modality="facial-skin-color">
    <!-- The second modality detects anger. It is an IR camera observing the
         face. A URI to the database (a dedicated port at the server) is
         provided to access the video stream. -->
    <category name="angry"/>
    <reference uri="http://www.example.com/#t=23,108"/>
  </emotion>
  <emotion start="1006526160000" modality="physiology">
    <!-- The third modality detects excitement again. It is a wearable device
         monitoring physiological changes in the body. A URI to the database
         (a dedicated port at the server) is provided to access the data
         stream. -->
    <category name="excited"/>
    <reference uri="http://www.example.com/#t=19,101"/>
  </emotion>
  <emotion start="1006526520000" modality="physiology">
    <category name="angry"/>
    <reference uri="http://www.example.com/#t=2,6"/>
  </emotion>
  ...
</emotionml>
NOTE: the handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions are determined on the basis of their time stamps.
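As a non-normative illustration of this assumption, the following sketch shows two observations that a consumer would treat as parallel occurrences because they share the same start value, even though they come from different devices annotating with different category vocabularies (the vendor-specific vocabulary URIs below are hypothetical):

<emotionml xmlns="http://www.w3.org/2009/10/emotionml">
  <!-- Identical start values: interpreted as parallel occurrences. -->
  <emotion start="1006526160000" modality="face"
           category-set="http://www.example.com/vendorA/emotions.xml">
    <category name="excited"/>
  </emotion>
  <emotion start="1006526160000" modality="physiology"
           category-set="http://www.example.com/vendorB/emotions.xml">
    <category name="aroused"/>
  </emotion>
</emotionml>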
The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The
robot is in a global state of high arousal, negative pleasure and
low dominance, i.e. a negative state of distress paired with some
urgency but quite limited power to influence the situation. It has
a tendency to seek a recharge and to avoid picking up boxes.
However, sensor data reveals an unexpected obstacle on the way to the charging station. This triggers the planning of an expressive frowning behavior. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the robot architecture.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata">
  <info>
    <meta:name>robbie the robot example</meta:name>
  </info>

  <!-- Robot's current global state configuration: negative, active, powerless -->
  <emotion dimension-set="http://www.example.com/emotion/dimension/PAD.xml">
    <dimension name="pleasure" value="0.2"/>
    <dimension name="arousal" value="0.8"/>
    <dimension name="dominance" value="0.3"/>
  </emotion>

  <!-- Robot's action tendencies: wants to recharge -->
  <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml">
    <action-tendency name="charge-battery" value="0.9"/>
    <action-tendency name="seek-shelter" value="0.7"/>
    <action-tendency name="pickup-boxes" value="0.1"/>
  </emotion>

  <!-- Appraised value of incoming event: obstacle detected, appraised as
       novel and unpleasant -->
  <emotion appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml"
           modality="laser-scanner">
    <appraisal name="novelty" value="0.8" confidence="0.4"/>
    <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/>
    <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/>
  </emotion>

  <!-- Robot's planned facial gestures: will frown -->
  <emotion category-set="http://www.example.com/custom/robot-emotions.xml"
           modality="face">
    <category name="frustration"/>
    <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/>
  </emotion>
</emotionml>
One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML , EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.
This section illustrates the concept using two existing W3C markup languages: EMMA and SSML .
EMMA is designed for representing arbitrary analysis results; one such result could be the user's emotional state. The following example represents the analysis of a non-verbal vocalization; the emotion is described as most probably a low-intensity state, possibly boredom.
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns="http://www.w3.org/2009/10/emotionml">
  <emma:interpretation emma:start="12457990" emma:end="12457995"
                       emma:mode="voice" emma:verbal="false">
    <emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
      <intensity value="0.1" confidence="0.8"/>
      <category name="boredom" confidence="0.1"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
Two options for using EmotionML with SSML can be illustrated.
First, it is possible with the latest version of SSML [ SSML 1.1 ] to use arbitrary markup
belonging to a different namespace anywhere in an SSML document;
only SSML processors that support the markup would take it into
account. Therefore, it is possible to insert EmotionML below, for
example, an <s>
element representing a sentence;
the intended meaning is that the enclosing sentence should be
spoken with the given emotion, in this case a moderately doubtful
tone of voice:
<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US">
  <s>
    <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
      <emo:category name="doubt"/>
      <emo:intensity value="0.4"/>
    </emo:emotion>
    Do you need help?
  </s>
</speak>
Second, a future version of SSML could explicitly provide for the annotation of paralinguistic information, which could fill the gap
between the extralinguistic, speaker-constant settings of the
<voice>
tag and the linguistic elements such as
<s>
, <emphasis>
,
<say-as>
etc. The following example assumes that
there is a <style>
tag for paralinguistic
information in a future version of SSML. The style could either
embed an <emotion>
, as follows:
<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US">
  <s>
    <style>
      <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml">
        <emo:category name="doubt"/>
        <emo:intensity value="0.4"/>
      </emo:emotion>
      Do you need help?
    </style>
  </s>
</speak>
Alternatively, the <style>
could refer to a
previously defined <emotion>
, for example:
<?xml version="1.0"?>
<speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US">
  <emo:emotion category-set="http://www.example.com/emotion/category/everyday-emotions.xml"
               id="somewhatDoubtful">
    <emo:category name="doubt"/>
    <emo:intensity value="0.4"/>
  </emo:emotion>
  <s>
    <style ref="#somewhatDoubtful">
      Do you need help?
    </style>
  </s>
</speak>
Using EmotionML for the use case of generating system behavior requires elements of scheduling and surface form realization which are not part of EmotionML. Necessarily, this use case relies on other languages to provide the needed functionality. This is in line with the aim of EmotionML to serve as a specialized plug-in language.
This example illustrates the idea in terms
of a simplified version of a storytelling application. A virtual
agent tells a story using voice and facial animation. The
expression in face and voice is influenced by emotion annotations which are passed to the rendering engine in EmotionML. The engine in this example uses SMIL [
SMIL
] for defining the temporal relation between
events; EmotionML is used via SMIL's generic <ref>
element. In general, it is the engine that knows how to render the emotion using the virtual agent's expressive capabilities. To override this default, the second <emotion>
contains an explicit request to realize the
emotional expression using both face and voice
modalities.
ridinghood.smil:
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <head>
    ...
  </head>
  <body>
    <par dur="8s">
      <img src="file:forest.jpg"/>
      <smilText>The little girl was enjoying the walk in the forest.</smilText>
      <ref src="file:ridinghood.emotionml#emotion1"/>
    </par>
    <par dur="5s">
      <img src="file:wolf.jpg"/>
      <smilText>Suddenly a dark shadow appeared in front of her.</smilText>
      <ref src="file:ridinghood.emotionml#emotion2"/>
    </par>
  </body>
</smil>
ridinghood.emotionml:
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.example.com/emotion/category/everyday-emotions.xml"
           appraisal-set="http://www.example.com/emotion/appraisal/scherer.xml">
  <emotion id="emotion1">
    <category name="contentment"/>
    <intensity value="0.7"/>
  </emotion>
  <emotion id="emotion2" modality="face voice">
    <category name="fear"/>
    <intensity value="0.9"/>
    <appraisal name="novelty" value="0.9"/>
    <appraisal name="intrinsic-pleasantness" value="0.1"/>
  </emotion>
</emotionml>
Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.
The authors wish to acknowledge the contributions by all members of the Emotion Markup Language Incubator Group and the Emotion Incubator Group, in particular the following persons (in alphabetic order):
This Appendix points out the main changes since the previous working draft of 29 October 2009 ; for more details, see the diff-marked version of this specification (non-normative).
Individual emotion descriptor elements were replaced by generic <dimension>, <appraisal> and <action-tendency> elements with a name attribute; the vocabulary in use is now indicated through the category-set, dimension-set, appraisal-set and action-tendency-set attributes.
EmotionML now uses start and end attributes to represent absolute time, and Media Fragment URIs to refer to portions of media files.
An <info> element was added, in synchrony with EMMA.
The <link> element was renamed to <reference> to avoid a name clash with the <link> element in HTML, which has a different scope and syntax.
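As a non-normative illustration of several of these changes in combination, an annotation can now pair absolute start and end times with the renamed <reference> element carrying a Media Fragment URI; all attribute values in the following sketch are purely illustrative:

<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.example.com/custom/emotv-labels.xml"
         start="1006526160000" end="1006526175400">
  <category name="irritation"/>
  <!-- Media Fragment URI addressing seconds 3.24 to 15.4 of the clip -->
  <reference uri="file:ext03.avi#t=3.24,15.4"/>
</emotion>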