Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a "plug-in" language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Last Call Working Draft of "Emotion Markup Language 1.0", published on 7 April 2011. The W3C Membership and other interested parties are invited to review the document and send comments to www-multimodal@w3.org (with public archive) until 7 June 2011.
This Last Call Working Draft has addressed all open issues from the previous working draft, as well as the issues which were raised at the W3C workshop on EmotionML. The changes compared to the previous Working Draft of 29 July 2010 are listed in Appendix A.
This document was developed by the Multimodal Interaction Working Group. The Working Group expects to advance this Working Draft to Recommendation Status.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly.
This section is informative.
Human emotions are increasingly understood to be a crucial aspect in human-machine interactive systems. Especially for non-expert end users, reactions to complex intelligent systems resemble social interactions, involving feelings such as frustration, impatience, or helplessness if things go wrong. Furthermore, technology is increasingly used to observe human-to-human interactions, such as customer frustration monitoring in call center applications. Dealing with these kinds of states in technological systems requires a suitable representation, which should make the concepts and descriptions developed in the affective sciences available for use in technological contexts.
This report specifies Emotion Markup Language (EmotionML) 1.0, a markup language designed to be usable in a broad variety of technological contexts while reflecting concepts from the affective sciences.
As for any standard format, the first and main goal of an EmotionML is twofold: to allow a technological component to represent and process data, and to enable interoperability between different technological components processing the data.
Use cases for EmotionML can be grouped into three broad types:
Interactive systems are likely to involve both analysis and generation of emotion-related behavior; furthermore, systems are likely to benefit from data that was manually annotated, be it as training data or for rule-based modeling. Therefore, it is desirable to propose a single EmotionML that can be used in all three contexts.
Concrete examples of existing technology that could apply EmotionML include:
The Emotion Incubator Group has listed 39 individual use cases for an EmotionML.
A second reason for defining an EmotionML is the observation that ad hoc attempts to deal with emotions and related states often lead people to make the same mistakes that others have made before. The most typical mistake is to model emotions as a small number of intense states such as anger, fear, joy, and sadness; this choice is often made irrespective of the question whether these states are the most appropriate for the intended application. Crucially, the available alternatives that have been developed in the affective science literature are not sufficiently known, resulting in dead-end situations after the initial steps of work. Careful consideration of states to study and of representations for describing them can help avoid such situations.
EmotionML makes scientific concepts of emotions practically applicable. This can help potential users to identify the suitable representations for their respective applications.
Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. More fundamentally, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused on. In short, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.
Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is to define the possible structural elements and their valid child elements and attributes, while allowing users to "plug in" vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification by providing a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their own custom vocabularies.
An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.
For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.
Terms related to emotions are not used consistently, either in common use or in the scientific literature. The following glossary describes the intended meaning of terms in this document.
The following sections describe the syntax of the main elements of EmotionML.
<emotionml> element
Annotation | <emotionml> |
---|---|
Definition | The root element of an EmotionML document. |
Children | The element MAY contain one or more <emotion> elements. It MAY contain a single <info> element. It MAY contain one or more <vocabulary> elements. |
Attributes | |
Occurrence | This is the root element -- it cannot occur as a child of any other EmotionML element. |
<emotionml> is the root element of a standalone EmotionML document. It MAY contain a single <info> element, providing document-level metadata.
The <emotionml> element MUST define the EmotionML namespace.
Standalone EmotionML documents usually serve one or both of the following two purposes:
- <emotion> elements into a single document;
- <emotion> annotations in the same or other documents.
Example:
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"> ... </emotionml>
or
<em:emotionml version="1.0" xmlns:em="http://www.w3.org/2009/10/emotionml"> ... </em:emotionml>
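As an informal illustration (not part of the specification), the following Python sketch uses the standard library's `xml.etree.ElementTree` to show that the two forms above denote the same root element once namespaces are resolved. The helper name `root_tag` is hypothetical, and the elided content ("...") of the examples is left empty.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

# The two example documents, with the elided content left empty.
default_ns_doc = '<emotionml version="1.0" xmlns="%s"></emotionml>' % EMOTIONML_NS
prefixed_doc = '<em:emotionml version="1.0" xmlns:em="%s"></em:emotionml>' % EMOTIONML_NS


def root_tag(xml_text):
    """Return the namespace-qualified tag of the document's root element."""
    return ET.fromstring(xml_text).tag


# ElementTree expands both spellings to the same "{namespace}localname" form.
expected = "{%s}emotionml" % EMOTIONML_NS
print(root_tag(default_ns_doc) == expected)
print(root_tag(prefixed_doc) == expected)
```

A consumer that compares qualified names in this way does not need to care which prefix, if any, the producing component chose.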
Note: One of the envisaged uses of EmotionML is to be used in the context of other markup languages. In such cases, there will be no <emotionml> root element, but <emotion> elements will be used directly in other markup -- see Examples of possible use with other markup languages.
<emotion> element
Annotation | <emotion> |
---|---|
Definition | This element represents a single emotion annotation. |
Children | All children are optional. However, at least one of <category>, <dimension>, <appraisal>, <action-tendency> MUST occur. If present, the following child element can occur only once: If present, the following child elements may occur one or more times: There are no constraints on the combinations of children that are allowed. There are no constraints on the order in which children occur. |
Attributes | |
Occurrence | As a child of <emotionml>, or in any markup using EmotionML. |
The <emotion> element represents an individual emotion annotation. No matter how simple or complex its substructure is, it represents a single statement about the emotional content of some annotated item. Where several statements about the emotion in a certain context are to be made, several <emotion> elements MUST be used. See Examples of emotion annotation for illustrations of this issue.
An <emotion> element MAY have an id attribute, allowing for a unique reference to the individual emotion annotation. Since the <emotion> annotation is an atomic statement about the emotion, it is inappropriate to refer to individual emotion representations such as <category>, <dimension>, <appraisal>, <action-tendency> or their children directly. For this reason, these elements do not allow for an id attribute.
Whereas it is possible to use <emotion> elements in a standalone <emotionml> document, a typical use case is expected to be embedding an <emotion> into some other markup -- see Examples of possible use with other markup languages.
<category> element
Annotation | <category> |
---|---|
Definition | Description of an emotion or a related state using a category. |
Children | <trace>: A <category> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | One or more <category> elements MAY occur as a child of <emotion>. For any given category name in the set, zero or one occurrence is allowed within an <emotion> element. |
<category> describes an emotion or a related state in terms of a category name, given as the value of the name attribute. The name MUST belong to a clearly-identified set of category names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of legal values of the name attribute is indicated in the category-set attribute of the enclosing <emotion> or <emotionml> element. Different sets can be used, depending on the requirements of the use case. In particular, different types of emotion-related / affective states can be annotated by using appropriate value sets.
The intensity of an emotion category MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.
Examples:
In the following example, the emotion category "satisfied" is being annotated; it must be contained in the definition of the emotion category vocabulary located at http://www.w3.org/TR/emotion-voc/xml#everyday-categories, which is one of the category vocabularies provided in [Vocabularies for EmotionML].
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="satisfied"/> </emotion>
The following is an annotation of an interpersonal stance "distant" which must be defined in the custom category set at the URI given in the category-set attribute:
<emotion category-set="http://www.example.com/custom/category/interpersonal-stances.xml"> <category name="distant"/> </emotion>
In the following example, an emotion is described by several categories, each being present with different values of intensity. The category set used is the "big six" set described in [Vocabularies for EmotionML].
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="sadness" value="0.3"/> <category name="anger" value="0.8"/> <category name="fear" value="0.3"/> </emotion>
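To make the occurrence rule concrete, here is a minimal Python sketch of how a consumer might read the categories of the last example. It is an illustration under stated assumptions rather than a reference implementation: the function name `read_categories` is hypothetical, and an EmotionML namespace declaration has been added to the example so that it parses as a standalone fragment.

```python
import xml.etree.ElementTree as ET

NS = {"em": "http://www.w3.org/2009/10/emotionml"}

# The "big six" example, with a namespace declaration added (assumption).
BIG6_EXAMPLE = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <category name="sadness" value="0.3"/>
  <category name="anger" value="0.8"/>
  <category name="fear" value="0.3"/>
</emotion>
"""


def read_categories(emotion):
    """Map each category name to its intensity (None if no value is given)."""
    result = {}
    for cat in emotion.findall("em:category", NS):
        name = cat.get("name")
        if name in result:
            # A given category name may occur at most once per <emotion>.
            raise ValueError("duplicate category: %s" % name)
        value = cat.get("value")
        result[name] = float(value) if value is not None else None
    return result


categories = read_categories(ET.fromstring(BIG6_EXAMPLE))
print(categories)
```

The sketch does not check that the names belong to the vocabulary identified by category-set; doing so would require retrieving and parsing the vocabulary definition.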
<dimension> element
Annotation | <dimension> |
---|---|
Definition | One or more <dimension> elements jointly describe an emotion or a related state according to an emotion dimension vocabulary. |
Children | <trace>: A <dimension> MUST contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <dimension> elements occur as children of <emotion>. For any given dimension name in the set, zero or one occurrence is allowed within an <emotion> element. |
One or more <dimension> elements jointly describe an emotion or a related state in terms of a set of emotion dimensions. The names of the emotion dimensions MUST belong to a clearly-identified set of dimension names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of legal values of the name attribute is indicated in the dimension-set attribute of the enclosing <emotionml> or <emotion> element. Different sets can be used, depending on the requirements of the use case.
The position on an emotion dimension MUST be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.
Examples:
One of the most widespread sets of emotion dimensions used (sometimes by different names) is the combination of valence, arousal and potency. The following example is a state of rather low arousal, very positive valence, and high potency -- in other words, a relaxed, positive state with a feeling of being in control of the situation. The example uses the Pleasure-Arousal-Dominance (PAD) vocabulary from [Vocabularies for EmotionML]:
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="arousal" value="0.3"/><!-- lower-than-average arousal --> <dimension name="pleasure" value="0.9"/><!-- very high positive valence --> <dimension name="dominance" value="0.8"/><!-- relatively high potency --> </emotion>
In some use cases, custom sets of application-specific dimensions will be required. The following example uses a custom set of dimensions, defining a single dimension "friendliness".
<emotion dimension-set="http://www.example.com/custom/dimension/friendliness.xml"> <dimension name="friendliness" value="0.2"/><!-- a pretty unfriendly person --> </emotion>
The usual way to represent the intensity of an emotion would be the value attribute of a <category>. However, if only the intensity of an emotion is annotated, but not its nature, this can be done by using an "intensity" dimension. Thus, an emotional state's "strength" or "intensity" can be described independently from categorical or dimensional descriptions, as shown by the following example.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <dimension name="intensity" value="0.2"/><!-- not in a strong emotional state --> </emotion>
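The constraints on <dimension> stated in this section can be checked mechanically. The Python sketch below is one possible reading, not a normative validator: it interprets "either a value attribute or a <trace> element" as exactly one of the two, and assumes that Scale values lie in the interval from 0 to 1. The function name `check_dimensions` is hypothetical, and a namespace declaration has been added to the PAD example.

```python
import xml.etree.ElementTree as ET

EM = "{http://www.w3.org/2009/10/emotionml}"

# The PAD example from this section, with a namespace declaration added.
PAD_EXAMPLE = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions">
  <dimension name="arousal" value="0.3"/>
  <dimension name="pleasure" value="0.9"/>
  <dimension name="dominance" value="0.8"/>
</emotion>
"""


def check_dimensions(emotion):
    """Return a list of problems found in the <dimension> children."""
    problems = []
    seen = set()
    for dim in emotion.findall(EM + "dimension"):
        name = dim.get("name")
        if name in seen:
            problems.append("dimension %r occurs more than once" % name)
        seen.add(name)
        has_value = dim.get("value") is not None
        has_trace = dim.find(EM + "trace") is not None
        if has_value == has_trace:
            # Neither or both present (assumed reading of "either ... or").
            problems.append("dimension %r needs exactly one of value or <trace>" % name)
        elif has_value and not 0.0 <= float(dim.get("value")) <= 1.0:
            problems.append("dimension %r has a value outside [0, 1]" % name)
    return problems


print(check_dimensions(ET.fromstring(PAD_EXAMPLE)))
```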
<appraisal> element
Annotation | <appraisal> |
---|---|
Definition | One or more <appraisal> elements jointly describe an emotion or a related state according to an emotion appraisal vocabulary. |
Children | <trace>: An <appraisal> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <appraisal> elements occur as children of <emotion>. For any given appraisal name in the set, zero or one occurrence is allowed within an <emotion> element. |
One or more <appraisal> elements jointly describe an emotion or a related state in terms of a set of appraisals. The names of the appraisals MUST belong to a clearly-identified set of appraisal names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of legal values of the name attribute is indicated in the appraisal-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.
The degree to which an appraisal is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.
Examples:
One of the most widespread sets of emotion appraisals used is the appraisals set proposed by Klaus Scherer, covering aspects of novelty, intrinsic pleasantness, goal/need significance, coping potential, and norm/self compatibility. Another very widespread set of emotion appraisals, used in particular in computational models of emotion, is the OCC set of appraisals (Ortony et al., 1988), which includes the consequences of events for oneself or for others, the actions of others and the perception of objects. Using Scherer's appraisals from [Vocabularies for EmotionML], the following example is a state arising from the evaluation of an unpredicted and quite unpleasant event:
<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.8"/> <appraisal name="intrinsic-pleasantness" value="0.2"/> </emotion>
In some use cases, custom sets of application-specific appraisals will be required. The following example uses a custom set of appraisals, defining the single appraisal "likelihood".
<emotion appraisal-set="http://www.example.com/custom/appraisal/likelihood.xml"> <appraisal name="likelihood" value="0.8"/><!-- a very predictable event --> </emotion>
<action-tendency> element
Annotation | <action-tendency> |
---|---|
Definition | One or more <action-tendency> elements jointly describe an emotion or a related state according to an emotion action tendency vocabulary. |
Children | <trace>: An <action-tendency> MAY contain either a value attribute or a <trace> element. |
Attributes | |
Occurrence | <action-tendency> elements occur as children of <emotion>. For any given action tendency name in the set, zero or one occurrence is allowed within an <emotion> element. |
One or more <action-tendency> elements jointly describe an emotion or a related state in terms of a set of action tendencies. The names of the action tendencies MUST belong to a clearly-identified set of action tendency names, which MUST be defined according to Defining vocabularies for representing emotions.
The set of legal values of the name attribute is indicated in the action-tendency-set attribute of the enclosing <emotion> element. Different sets can be used, depending on the requirements of the use case.
The degree to which an action tendency is present MAY be specified as a Scale value, either as a static value in the value attribute, or as a dynamic trace over time using the <trace> element.
Examples:
One well-known model of action tendencies is that of N. Frijda. It uses a number of action tendencies that are low-level, diffuse behaviors from which more concrete actions could be determined. It is provided in [Vocabularies for EmotionML]. An example of someone attempting to attract someone they like by being confident, strong and attentive might look like this:
<emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies"> <action-tendency name="approach" value="0.7"/> <!-- get close --> <action-tendency name="being-with" value="0.8"/> <!-- be happy --> <action-tendency name="attending" value="0.7"/> <!-- pay attention --> <action-tendency name="dominating" value="0.7"/> <!-- be assertive --> </emotion>
In some use cases, custom sets of application-specific action tendencies will be required. The following example shows control values for a robot that works in a factory, using a custom set of action tendencies. In the example, the robot's battery is very low, so it needs to get ready to recharge and to stop its work of picking up boxes.
<emotion action-tendency-set="http://www.example.com/custom/action/robot.xml"> <action-tendency name="charge-battery" value="0.9"/> <!-- need to charge battery soon --> <action-tendency name="pickup-boxes" value="0.3"/> <!-- feeling tired, avoid work --> </emotion>
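For the generation use case, a system would typically produce such markup programmatically. The following Python sketch shows one way this might be done with the standard library; the function name `build_emotion` and the range check (which assumes Scale values lie between 0 and 1) are illustrative assumptions, not requirements taken from this specification.

```python
import xml.etree.ElementTree as ET

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"

# Serialize the EmotionML namespace as the default namespace (no prefix).
ET.register_namespace("", EMOTIONML_NS)


def build_emotion(action_tendency_set, tendencies):
    """Build an <emotion> element with one <action-tendency> per entry."""
    emotion = ET.Element(
        "{%s}emotion" % EMOTIONML_NS,
        {"action-tendency-set": action_tendency_set},
    )
    for name, value in tendencies.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError("value for %r is outside [0, 1]" % name)
        ET.SubElement(
            emotion,
            "{%s}action-tendency" % EMOTIONML_NS,
            {"name": name, "value": str(value)},
        )
    return emotion


# Rebuild the robot example from its name/value pairs.
robot = build_emotion(
    "http://www.example.com/custom/action/robot.xml",
    {"charge-battery": 0.9, "pickup-boxes": 0.3},
)
print(ET.tostring(robot, encoding="unicode"))
```

Because the input is a dictionary keyed by name, the rule that each action tendency name occurs at most once per <emotion> is satisfied by construction.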
confidence attribute
Annotation | confidence |
---|---|
Definition | A representation of the degree of confidence or probability that a certain element of the representation is correct. |
Occurrence | An optional attribute of <category>, <dimension>, <appraisal> and <action-tendency> elements. |
Confidence MAY be indicated separately for each of the Representations of emotions and related states. For example, the confidence that the <category> is assumed correctly is independent from the confidence that the position on a dimension is correctly indicated.
Following the tradition of statistics, a confidence is given as a value in the interval from 0 to 1, resembling a probability. In this respect, the confidence is a Scale value.
Examples:
In the following, one simple example is provided for each element that can carry a confidence attribute.
The first example indicates a very high confidence that surprise is the emotion to annotate.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="surprise" confidence="0.95"/> </emotion>
The next example illustrates using confidence to indicate that the annotation of high arousal is probably correct, but the annotation of slightly positive pleasure may or may not be correct.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="arousal" value="0.8" confidence="0.9"/> <dimension name="pleasure" value="0.6" confidence="0.3"/> </emotion>
Finally, an example for the case of intensity: A high confidence is given that the emotion has a low intensity.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <dimension name="intensity" value="0.1" confidence="0.8"/> </emotion>
Note that an emotion annotation can combine the above, as in the following example: the intensity of the emotion is quite probably low, but if we have to guess, we would say the emotion is boredom.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension"> <category name="bored" confidence="0.1"/> <dimension name="intensity" value="0.1" confidence="0.8"/> </emotion>
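A consumer may want to act only on representations it has reason to trust. The Python sketch below filters the children of the combined example by their confidence attribute. The function name `confident_items` and the threshold of 0.5 are arbitrary choices made for illustration, and a namespace declaration has been added to the example.

```python
import xml.etree.ElementTree as ET

# The combined example above, with a namespace declaration added.
COMBINED_EXAMPLE = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml"
         category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"
         dimension-set="http://www.w3.org/TR/emotion-voc/xml#intensity-dimension">
  <category name="bored" confidence="0.1"/>
  <dimension name="intensity" value="0.1" confidence="0.8"/>
</emotion>
"""


def confident_items(emotion, threshold=0.5):
    """List (tag, name, confidence) for children at or above the threshold."""
    items = []
    for child in emotion:
        conf = child.get("confidence")
        if conf is not None and float(conf) >= threshold:
            local_tag = child.tag.split("}", 1)[-1]  # strip "{namespace}"
            items.append((local_tag, child.get("name"), float(conf)))
    return items


print(confident_items(ET.fromstring(COMBINED_EXAMPLE)))
```

With the threshold used here, only the low-intensity statement is retained; the tentative "bored" category is dropped. Children that carry no confidence attribute are skipped in this sketch, which is itself a design choice rather than something the specification prescribes.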
expressed-through attribute
Annotation | expressed-through |
---|---|
Definition | The modality, or list of modalities, through which the emotion is expressed. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {gaze, face, head, torso, gesture, leg, voice, text, locomotion, posture, physiology, ...}. |
Occurrence | An optional attribute of <emotion> elements. |
The expressed-through attribute describes the modality through which an emotion is produced, usually by a human being. It is not the technical modality by which it was detected, e.g. "face" rather than "camera" and "voice" rather than "microphone". The expressed-through attribute is agnostic about the use case: when detecting emotion, it represents the modality from which the emotion has been detected; when generating emotion-related system behavior, it represents the modality through which the emotion is to be expressed.
The list of values provided covers a broad range of modalities through which emotions may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.
The expressed-through attribute is not specific about the sensors used for observing the modality. These can be specified using the <info> element, or by the emma:mode attribute in an enclosing [EMMA] document.
Example:
In the following example the emotion is expressed through the voice.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" expressed-through="voice"> <category name="satisfied"/> </emotion>
In case of multimodal expression of an emotion, a list of space separated modalities can be indicated in the expressed-through attribute, like in the following example in which the two values "face" and "voice" are used.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" expressed-through="face voice"> <category name="satisfied"/> </emotion>
See also the examples in sections 5.1.2 Automatic recognition of emotions, 5.1.3 Generation of emotion-related system behavior and 5.2.3 Use with SMIL.
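Since expressed-through holds a space-delimited list drawn from an open set, a consumer has to split the attribute and decide how to treat values outside the listed ones. A minimal Python sketch follows; the function names are hypothetical, and the "eyes" value in the usage lines is taken from the fine-grained distinction mentioned in this section.

```python
import xml.etree.ElementTree as ET

# Modalities listed in the attribute definition (the set is open-ended).
LISTED_MODALITIES = {
    "gaze", "face", "head", "torso", "gesture", "leg",
    "voice", "text", "locomotion", "posture", "physiology",
}


def modalities(emotion):
    """Return the modalities named in expressed-through, in document order."""
    # str.split() with no argument splits on any run of whitespace.
    return emotion.get("expressed-through", "").split()


def unlisted_modalities(emotion):
    """Return values that are allowed (open set) but not in the listed set."""
    return [m for m in modalities(emotion) if m not in LISTED_MODALITIES]


multimodal = ET.fromstring('<emotion expressed-through="face voice"/>')
fine_grained = ET.fromstring('<emotion expressed-through="eyes voice"/>')
print(modalities(multimodal))
print(unlisted_modalities(fine_grained))
```

Unlisted values are reported rather than rejected, reflecting that the list is an open set.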
<info> element
Annotation | <info> |
---|---|
Definition | This element can be used to annotate arbitrary metadata. |
Children | One or more elements in a different namespace than the EmotionML namespace, providing metadata. |
Attributes | |
Occurrence | A single <info> element MAY occur as a child of the <emotionml> root tag to indicate global metadata, i.e. the annotations are valid for the document scope; furthermore, a single <info> element MAY occur as a child of each <emotion> element to indicate local metadata that is only valid for that <emotion> element. |
This element can contain arbitrary XML data in a different namespace (one option could be [RDF] data), either on a document global level or on a local "per annotation element" level.
Several metadata standardization initiatives exist, such as [IMDI] and [CLARIN]. Metadata may cover a wide range of information, such as: location description (continent, country, address), content type (e.g., genre, task, modalities), and session (title, recording date, group of participants); each participant may be described by their role in the session (e.g. annotator, filmer), name, social family role, etc.
Examples:
In the following example, the automatic classification for an annotation document was performed by a classifier based on Gaussian Mixture Models (GMM); the speakers of the annotated elements were of different German origins.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:classifiers="http://www.example.com/meta/classify/" xmlns:origin="http://www.example.com/meta/local/" category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <info> <classifiers:classifier classifiers:name="GMM"/> </info> <emotion> <info><origin:localization value="bavarian"/></info> <category name="happiness"/> </emotion> <emotion> <info><origin:localization value="swabian"/></info> <category name="sadness"/> </emotion> </emotionml>
The following example uses the IMDI metadata language to represent information about the annotator who produced the emotion annotation in the current document, in a global <info> element.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:imdi="http://www.mpi.nl/IMDI/Schema/IMDI"> <info> <imdi:Actors> <imdi:Actor> <imdi:Role>Annotator</imdi:Role> <imdi:Name>John</imdi:Name> <imdi:FullName>John Smith Junior</imdi:FullName> <imdi:Code>JS</imdi:Code> <imdi:FamilySocialRole>Teacher</imdi:FamilySocialRole> ... </imdi:Actor> </imdi:Actors> </info> ... <emotion>...</emotion> <emotion>...</emotion> </emotionml>
The following example illustrates how <info> can be used for annotating information on sensors through which an affective signal has been detected. In the global <info> section, the sensors used in the particular scenario are specified. Apart from their ID, information on the modality observed by this sensor is provided as well as information on the confidence for that sensor. In this example, the modality "posture" is observed by a camera and a chair equipped with pressure sensors. For some reason it is decided that emotion estimates based on camera data should be trusted more than those based on chair data. Within the <emotion> elements, <info> is used to specify which sensor has been used to calculate the actual emotion value.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:sensors="http://www.example.com/meta/sensors/" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <info> <sensors:sensor id="camera1" confidence="0.9" expressed-through="posture"/> <sensors:sensor id="chair" confidence="0.3" expressed-through="posture"/> ... </info> <emotion expressed-through="posture"> <info> <sensors:sensor idref="camera1"/> </info> <category name="angry"/> </emotion> <emotion expressed-through="posture"> <info> <sensors:sensor idref="chair"/> </info> <category name="neutral"/> </emotion> </emotionml>
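The sensor example relies on an application-specific convention: local idref values point to sensors declared in the global <info>. The Python sketch below shows how a consumer might resolve those references. The `sensors` namespace and its `id`, `idref` and `confidence` attributes come from the example itself and are not defined by EmotionML; the function name is hypothetical, and the elided content ("...") of the example is omitted.

```python
import xml.etree.ElementTree as ET

NS = {
    "em": "http://www.w3.org/2009/10/emotionml",
    "sensors": "http://www.example.com/meta/sensors/",
}

SENSOR_DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:sensors="http://www.example.com/meta/sensors/"
           category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <info>
    <sensors:sensor id="camera1" confidence="0.9" expressed-through="posture"/>
    <sensors:sensor id="chair" confidence="0.3" expressed-through="posture"/>
  </info>
  <emotion expressed-through="posture">
    <info><sensors:sensor idref="camera1"/></info>
    <category name="angry"/>
  </emotion>
  <emotion expressed-through="posture">
    <info><sensors:sensor idref="chair"/></info>
    <category name="neutral"/>
  </emotion>
</emotionml>
"""


def categories_with_sensor_confidence(root):
    """Pair each emotion's category name with its sensor's declared confidence."""
    # Global <info>: a direct child of the root element.
    sensor_confidence = {
        s.get("id"): float(s.get("confidence"))
        for s in root.findall("em:info/sensors:sensor", NS)
    }
    pairs = []
    for emotion in root.findall("em:emotion", NS):
        # Local <info>: a direct child of this <emotion>.
        sensor_ref = emotion.find("em:info/sensors:sensor", NS)
        category = emotion.find("em:category", NS)
        pairs.append((category.get("name"), sensor_confidence[sensor_ref.get("idref")]))
    return pairs


print(categories_with_sensor_confidence(ET.fromstring(SENSOR_DOC)))
```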
<reference> element
Annotation | <reference> |
---|---|
Definition | References may be used to relate the emotion annotation to the "rest of the world", more specifically to the emotional expression, the experiencing subject, the trigger, and the target of the emotion. |
Children | None |
Attributes | |
Occurrence | Multiple <reference> elements MAY occur as children of <emotion>. |
A <reference> element provides a link to media as a URI [RFC3986]. The semantics of references are described by the role attribute which, if present, MUST have one of four values:
- "expressedBy" indicates that the reference points to observable behavior expressing the emotion. This is the default value if the role attribute is not explicitly stated;
- "experiencedBy" indicates that the reference points to the subject experiencing the emotion;
- "triggeredBy" indicates that the reference points to an emotion-eliciting event that caused an emotion and/or related appraisals;
- "targetedAt" indicates that the reference points to an object towards which an emotion-related action, or action tendency, is directed.
For reference targets representing a period of time, start and end time MAY be denoted by using the media fragments syntax, as explained in section 2.4.2.4.
The media-type attribute MAY be used to differentiate between different media types such as audio, video, text, etc.
There is no restriction regarding the number of <reference> elements that MAY occur as children of <emotion>.
Examples:
The following example illustrates the reference to two different URIs having a different role with respect to the emotion: one reference points to the emotion's expression, a video clip showing a user expressing the emotion; the other reference points to the trigger that caused the emotion, in this case another video clip that was seen by the person who expressed the emotion.
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/> </emotion>
Several references may follow as children of one <emotion> tag, even having the same role; for example, the following annotation refers to a portion of a video and to physiological sensor data, both of which expressed the emotion:
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" role="expressedBy"/> <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/> </emotion>
It is possible to explicitly indicate the MIME type of the item that the reference refers to:
<emotion ... > ... <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/> </emotion>
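The role semantics, including the "expressedBy" default, can be illustrated with a short Python sketch that groups the references of an <emotion> by role. The function name `references_by_role` is hypothetical; the sample fragment combines URIs from the examples above, omits their elided attributes and content, and adds a namespace declaration.

```python
import xml.etree.ElementTree as ET

EM = "{http://www.w3.org/2009/10/emotionml}"
VALID_ROLES = {"expressedBy", "experiencedBy", "triggeredBy", "targetedAt"}

REFERENCE_EXAMPLE = """
<emotion xmlns="http://www.w3.org/2009/10/emotionml">
  <reference uri="http://www.example.com/data/video/v1.avi?t=2,13" media-type="video/mp4"/>
  <reference uri="http://www.example.com/data/physio/ph7.txt" role="expressedBy"/>
  <reference uri="http://www.example.com/events/e12.xml" role="triggeredBy"/>
</emotion>
"""


def references_by_role(emotion):
    """Group reference URIs by role, applying the default role when absent."""
    grouped = {}
    for ref in emotion.findall(EM + "reference"):
        role = ref.get("role", "expressedBy")  # default stated in this section
        if role not in VALID_ROLES:
            raise ValueError("unknown role: %s" % role)
        grouped.setdefault(role, []).append(ref.get("uri"))
    return grouped


for role, uris in references_by_role(ET.fromstring(REFERENCE_EXAMPLE)).items():
    print(role, uris)
```

The first reference carries no role attribute and is therefore grouped under "expressedBy" together with the physiological data.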
Annotation | start, end |
---|---|
Definition | Attributes to denote the starting and ending absolute times. They are of type xsd:nonNegativeInteger and indicate the number of milliseconds since 1 January 1970 00:00:00 GMT. |
Occurrence | The attributes MAY occur inside an <emotion> element. |
The start and end attributes denote the absolute starting and ending times at which an emotion or related state happened. This might be used, for example, in an "emotional diary" application.
Examples:
In the following example, the emotion category "surprise" is annotated, immediately followed by the category "happiness". The start and end attributes specify for each emotion element the absolute beginning and ending times.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647200" end="1268647330"> <category name="surprise"/> </emotion> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647331" end="1268647400"> <category name="happiness"/> </emotion>
The end value MUST be greater than or equal to the start value.
The ECMAScript Date object's getTime() function is a way to determine the absolute time.
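The same value can be obtained in other environments. The following Python sketch is illustrative only: the helper names to_epoch_millis and check_interval are hypothetical and not part of EmotionML. It computes the number of milliseconds since 1 January 1970 00:00:00 GMT and checks the constraint that end is not smaller than start.

```python
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Milliseconds since 1 January 1970 00:00:00 GMT (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = moment - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids floating point rounding of large timestamps.
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000


def check_interval(start: int, end: int) -> None:
    """Raise ValueError if the start/end pair violates the constraints above."""
    if start < 0 or end < 0:
        raise ValueError("start and end are of type xsd:nonNegativeInteger")
    if end < start:
        raise ValueError("end MUST be greater than or equal to start")
```

A caller might use to_epoch_millis(datetime.now(timezone.utc)) to obtain a value for the start attribute.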
Annotation | duration |
---|---|
Definition | Attribute of type xsd:nonNegativeInteger, defaulting to zero. It specifies the duration of the event in milliseconds. |
Occurrence | This attribute MAY occur inside an <emotion> element. |
The duration of an input in milliseconds MAY be specified with the duration attribute. The duration attribute MAY be used either in combination with the start or offset-to-start attribute or independently.
A start or offset-to-start attribute together with the duration attribute set to zero MAY be used to indicate a single timestamp on a time axis.
Examples:
In the following example, the start and duration of the emotion category "surprise" are annotated:
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" start="1268647200" duration="130"> <category name="surprise"/> </emotion>
Annotation | time-ref-uri |
---|---|
Definition | Attribute of type xsd:anyURI indicating the URI used to anchor the relative timestamp. |
Occurrence | This attribute MAY occur inside an <emotion> element. |

Annotation | time-ref-anchor-point |
---|---|
Definition | Attribute with a value of start or end, defaulting to start. It indicates whether to measure the time from the start or end of the interval designated with time-ref-uri. |
Occurrence | This attribute MAY occur inside an <emotion> element. |

Annotation | offset-to-start |
---|---|
Definition | Attribute of type xsd:integer, defaulting to zero. It specifies the offset in milliseconds for the start of input from the anchor point designated with time-ref-uri and time-ref-anchor-point. |
Occurrence | This attribute MAY occur inside an <emotion> element. |
Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.
The reference interval is designated with the time-ref-uri attribute. This MAY be combined with the time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the offset-to-start attribute.
The time-ref-uri attribute can point to a custom-defined timestamp or can be, for example, a session identifier.
Examples:
Here is an example where the emotion "surprise" occurs two seconds after the reference time point:
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6" time-ref-uri="#my_session_id" offset-to-start="2000"> <category name="surprise"/> </emotion>
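The resolution of a relative timestamp can be sketched as follows. This is a minimal illustration under stated assumptions: the function name resolve_start is hypothetical, and the sketch assumes the processor has already determined the start and end of the reference interval (in milliseconds) by dereferencing time-ref-uri, a step that is not shown here.

```python
def resolve_start(anchor_start: int, anchor_end: int,
                  offset_to_start: int = 0, anchor_point: str = "start") -> int:
    """Return the start time implied by a relative timestamp (hypothetical helper).

    anchor_start and anchor_end delimit the interval designated with
    time-ref-uri; the defaults mirror the attribute defaults given above.
    """
    if anchor_point not in ("start", "end"):
        raise ValueError("time-ref-anchor-point must be 'start' or 'end'")
    base = anchor_start if anchor_point == "start" else anchor_end
    # offset-to-start is an xsd:integer, so it may also be negative.
    return base + offset_to_start
```

For a reference interval starting at 10000 ms, an offset-to-start of 2000 and the default anchor point yield a start of 12000 ms.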
Annotation | URI fragment: t |
---|---|
Definition | Attributes to denote the start and end point of an annotation in a media stream. Allowed values must conform to the Media Fragments Specification [Media Fragments]. |
Occurrence | The URI fragment MAY occur in the uri attribute of a <reference> element. |
Temporal clipping is denoted by the name t, and specified as an interval with a begin time and an end time. Either or both may be omitted, with the begin time defaulting to 0 seconds and the end time defaulting to the duration of the source media. The interval is half-open: the begin time is considered part of the interval whereas the end time is considered to be the first time point that is not part of the interval. If a single number only is given, this is the begin time.
Temporal clipping can be specified either as Normal Play Time (npt) [RFC 2326], as SMPTE timecodes [SMPTE], or as real-world clock time (clock) [RFC 2326]. Begin and end times are always specified in the same format. The format is specified by name, followed by a colon (:), with npt: being the default.
Examples:
In the following example, the emotion category "happiness" is displayed in an audio file called "myAudio.wav" from the 3rd to the 9th second.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myAudio.wav#t=3,9"/> </emotion>
In the following example, the emotion category "happiness" is displayed in a video file called "myVideo.avi"; the interval is given as SMPTE timecodes, resulting in the time interval [120,121.5) in seconds.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myVideo.avi#t=smpte-30:0:02:00,0:02:01:15"/> </emotion>
A final example specifies the interval in a video file as real-world clock time: a one-minute interval on 26 July 2009, starting at 11:19:01.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="happiness"/> <reference uri="myVideo.avi#t=clock:2009-07-26T11:19:01Z,2009-07-26T11:20:01Z"/> </emotion>
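To illustrate how a processor might interpret the t fragment, the following Python sketch converts the npt and SMPTE forms used in the examples above into seconds. It is a simplified illustration, not a complete implementation of [Media Fragments]: the function names are hypothetical, only the smpte-25 and smpte-30 prefixes are handled, and clock times and drop-frame timecodes are rejected.

```python
SMPTE_FRAME_RATES = {"smpte-25": 25.0, "smpte-30": 30.0}


def _npt_seconds(text):
    """Convert '9', '3.5' or 'hh:mm:ss' Normal Play Time to seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def _smpte_seconds(text, fps):
    """Convert 'hh:mm:ss' or 'hh:mm:ss:ff' to seconds at the given frame rate."""
    fields = [int(field) for field in text.split(":")]
    hours, minutes, secs = fields[:3]
    frames = fields[3] if len(fields) > 3 else 0
    return hours * 3600 + minutes * 60 + secs + frames / fps


def parse_temporal_fragment(uri):
    """Return (begin, end) in seconds, or None if the URI has no t fragment.

    end is None when it is omitted, i.e. when it defaults to the media duration.
    """
    fragment = uri.partition("#")[2]
    value = None
    for pair in fragment.split("&"):
        name, _, candidate = pair.partition("=")
        if name == "t":
            value = candidate
    if value is None:
        return None
    scheme, sep, rest = value.partition(":")
    if sep and scheme in SMPTE_FRAME_RATES:
        fps = SMPTE_FRAME_RATES[scheme]
        convert = lambda text: _smpte_seconds(text, fps)
        value = rest
    elif sep and scheme == "npt":
        convert = _npt_seconds
        value = rest
    elif sep and not scheme.replace(".", "").isdigit():
        raise ValueError(f"unsupported time format: {scheme}")
    else:
        convert = _npt_seconds  # npt is the default format
    begin, _, end = value.partition(",")
    return (convert(begin) if begin else 0.0, convert(end) if end else None)
```

Applied to the SMPTE example above, the sketch yields the half-open interval from 120 to 121.5 seconds, since 15 frames at 30 frames per second correspond to half a second.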
Scale values are needed to represent content in <category>, <dimension>, <appraisal> and <action-tendency> elements, as well as in confidence.
Representations of scale values can be static or dynamic. A static, constant scale value is represented using the value attribute; for dynamic values, their evolution over time is expressed using the <trace> element.
The value attribute
Annotation | value |
---|---|
Definition | Representation of a static scale value. |
Occurrence | The <dimension> element MUST contain either a value attribute or a <trace> element; <category>, <appraisal> and <action-tendency> MAY contain either a value attribute or a <trace> element. |
The value attribute represents a static scale value of the enclosing element.
Conceptually, a scale can represent concepts that vary from "nothing" to "a lot" (unipolar scales), or concepts that vary between two opposites, from "very negative" to "very positive" (bipolar scales). Both are represented in EmotionML using floating point values from the interval [0;1]. The min and max values of the scale SHOULD be interpreted as the extreme values, for both unipolar and bipolar scales. For example, in a <category>, a value="0" SHOULD be interpreted to mean absolutely no emotion (emotionless); a value="1.0" SHOULD be interpreted to mean emotion at maximum intensity (pure uncontrolled emotion). For bipolar scales, such as the valence dimension, a value of 0 represents the most negative possible value, whereas a value of 1 represents the most positive value possible. The neutral middle point of the scale is at 0.5.
Here are several examples for the usage of scales with EmotionML.
<emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions"> <dimension name="arousal" value="0.4"/> <!-- a bit less than average arousal --> <dimension name="valence" value="0.6"/> <!-- a bit above average valence --> </emotion> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="angry" value="0.5"/> <!-- anger at medium intensity --> </emotion> <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.9"/> <!-- appraisal as a very sudden event --> </emotion> <emotion action-tendency-set="http://www.w3.org/TR/emotion-voc/xml#frijda-action-tendencies"> <action-tendency name="approach" value="0.3"/> <!-- a rather weak tendency to approach --> </emotion>
Further examples of the value attribute can be found in the context of the <category>, <dimension>, <appraisal> and <action-tendency> elements.
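Tools that internally use a different numeric range need to map their values onto the interval [0;1] before writing a value attribute. The following Python sketch shows one straightforward linear mapping; the function name to_emotionml_scale and the choice of a linear mapping are assumptions made for illustration and are not prescribed by this specification.

```python
def to_emotionml_scale(raw: float, low: float, high: float) -> float:
    """Linearly map raw from [low, high] onto [0, 1] (hypothetical helper).

    low is mapped to 0 and high to 1, so the midpoint of a bipolar
    source scale lands on the neutral value 0.5.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    if not low <= raw <= high:
        raise ValueError("raw value lies outside the source scale")
    return (raw - low) / (high - low)
```

For example, a valence estimate of 0.0 on a source scale from -1 to 1 maps to 0.5, the neutral middle point of the scale.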
The <trace> element
Annotation | <trace> |
---|---|
Definition | Representation of the time evolution of a dynamic scale value. |
Children | None |
Attributes |
|
Occurrence | The |
A <trace> element represents the time course of a scale value.
The freq attribute indicates the sampling frequency at which the values listed in the samples attribute are given.
NOTE: The <trace> representation requires a periodic sampling of values. In order to represent values that are sampled aperiodically, separate <emotion> annotations with appropriate timing information and individual value attributes may be used.
Examples:
The following example illustrates the use of a trace to represent an episode of fear during which the emotion's intensity is rising, first gradually, then quickly to a very high value. Values are taken at a sampling frequency of 10 Hz, i.e. one value every 100 ms.
<emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <category name="fear"> <trace freq="10Hz" samples="0.1 0.1 0.15 0.2 0.2 0.25 0.25 0.25 0.3 0.3 0.35 0.5 0.7 0.8 0.85 0.85"/> </category> </emotion>
The following example combines a trace of the appraisal "suddenness" with a global confidence that the values represent the facts properly. There is a sudden peak of suddenness; the annotator is reasonably certain that the annotation is correct:
<emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" confidence="0.75"> <trace freq="10Hz" samples="0.1 0.1 0.1 0.1 0.1 0.7 0.8 0.8 0.8 0.8 0.4 0.2 0.1 0.1 0.1"/> </appraisal> </emotion>
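A consumer of <trace> data typically needs the samples as time-stamped values. The following Python sketch is illustrative only: the function name trace_points is hypothetical, the freq value is assumed to be a number followed by "Hz" as in the examples above, and the first sample is assumed to lie at time zero relative to the start of the trace.

```python
def trace_points(freq: str, samples: str):
    """Return a list of (seconds, value) pairs for a trace (hypothetical helper)."""
    if not freq.endswith("Hz"):
        raise ValueError("expected a frequency such as '10Hz'")
    rate = float(freq[:-2])
    if rate <= 0:
        raise ValueError("sampling frequency must be positive")
    values = [float(token) for token in samples.split()]
    # Sample i is taken i / rate seconds after the first sample.
    return [(index / rate, value) for index, value in enumerate(values)]
```

At 10 Hz, consecutive samples are 100 ms apart, so the sixteen samples of the fear example cover 1.5 seconds from the first to the last sample.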
EmotionML markup MUST refer to one or more vocabularies to be used for representing emotion-related states, as specified in the context of the <emotionml> and <emotion> elements. Due to the lack of agreement in the community, the EmotionML specification does not provide a single default set which should apply if no set is indicated. Instead, the user MUST explicitly state the set of descriptor names used.
The document [Vocabularies for EmotionML] provides a number of emotion vocabularies which are likely to be of general interest. In order to promote interoperability, users SHOULD verify if one of the vocabularies defined in that document is suitable for their application. If that is not the case, users can define their own custom vocabularies as defined in the present section.
The syntax for defining emotion vocabularies is based on the element <vocabulary> and its child <item>.
The <vocabulary> element
Annotation | <vocabulary> |
---|---|
Definition | Contains the definition of an emotion vocabulary. |
Children | A <vocabulary> element MUST contain one or more <item> elements. A <vocabulary> element MAY contain a single <info> element, providing arbitrary metadata about the vocabulary itself. |
Attributes |
|
Occurrence | One or more |
Vocabulary definitions, when present, occur as direct children of the document root element <emotionml>. It is possible to refer to a vocabulary defined in the same or in a separate EmotionML document, through URIs specified by the values of the attributes category-set, dimension-set, appraisal-set and action-tendency-set of the <emotion> element.
The value of the type attribute explicitly states whether the vocabulary represents category names, dimension elements, appraisal elements or action tendency elements.
The <item> element
Annotation | <item> |
---|---|
Definition | Represents the definition of one vocabulary item, associated with a value which can be used in the "name" attribute of <category>, <dimension>, <appraisal> or <action-tendency> (depending on the type of vocabulary being defined). |
Children | An <item> element MAY contain a single <info> element, providing arbitrary metadata about the vocabulary item. |
Attributes |
|
Occurrence | One or more <item> elements occur as direct children of a <vocabulary> element. |
An <item> represents the definition of one vocabulary item. A <vocabulary> MUST contain at least one <item> element.
Examples:
In the following example, three vocabularies are wrapped into a single EmotionML document. Their id attributes are: "big6", "fsre-dimensions" and "frijda-subset". They are used to represent categories, dimensions and action tendencies respectively. The first <emotion> element specifies the emotion vocabularies used through the attributes category-set and action-tendency-set, while the second <emotion> element uses the attribute dimension-set.
<emotionml version="1.0" xmlns="http://www.w3.org/2009/10/emotionml"> <!-- Vocabulary definitions --> <vocabulary type="category" id="big6"> <item name="anger"/> <item name="disgust"/> <item name="fear"/> <item name="happiness"/> <item name="sadness"/> <item name="surprise"/> </vocabulary> <vocabulary type="dimension" id="fsre-dimensions"> <item name="valence"/> <item name="potency"/> <item name="arousal"/> <item name="unpredictability"/> </vocabulary> <vocabulary type="action-tendency" id="frijda-subset"> <item name="approach"/> <item name="avoidance"/> <item name="rejecting"/> </vocabulary> <!-- Emotion elements --> <emotion category-set="#big6" action-tendency-set="#frijda-subset"> <category name="fear"/> <action-tendency name="approach" value="0.0"/> <action-tendency name="avoidance" value="0.9"/> </emotion> <emotion dimension-set="#fsre-dimensions"> <dimension name="arousal" value="0.3"/> </emotion> </emotionml>
The EmotionML namespace is "http://www.w3.org/2009/10/emotionml". All EmotionML elements MUST use this namespace.
The EmotionML namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation (1.0 [XML-NS10] or 1.1 [XML-NS11], depending on the version of XML being used).
The EmotionML schema is designed to validate the structural integrity of an EmotionML document or document fragment, but cannot verify whether the emotion descriptors used in the name attribute of <category>, <dimension>, <appraisal> and <action-tendency> are consistent with the vocabularies indicated in the respective category-set, dimension-set, appraisal-set and action-tendency-set attributes.
It is the responsibility of an EmotionML processor to verify that the use of descriptor names and values is consistent with the vocabulary definition.
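Such a consistency check might look as follows. The Python sketch below is a minimal illustration, not a conforming processor: the function name undefined_names is hypothetical, only vocabularies defined in the same document and referenced as "#id" are checked, references to external vocabulary URIs are skipped, and scale values are not validated.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2009/10/emotionml}"
SET_ATTRIBUTES = {
    "category": "category-set",
    "dimension": "dimension-set",
    "appraisal": "appraisal-set",
    "action-tendency": "action-tendency-set",
}


def undefined_names(document: str):
    """Return (descriptor type, name) pairs not defined in the local vocabulary."""
    root = ET.fromstring(document)
    vocabularies = {}
    for vocab in root.findall(NS + "vocabulary"):
        names = {item.get("name") for item in vocab.findall(NS + "item")}
        vocabularies[vocab.get("id")] = (vocab.get("type"), names)
    problems = []
    for emotion in root.iter(NS + "emotion"):
        for kind, set_attr in SET_ATTRIBUTES.items():
            # The set may be declared on <emotion> or on the root element.
            set_uri = emotion.get(set_attr) or root.get(set_attr)
            if set_uri is None or not set_uri.startswith("#"):
                continue  # missing or external vocabulary: not checked here
            vocab_type, names = vocabularies.get(set_uri[1:], (None, set()))
            for descriptor in emotion.findall(NS + kind):
                name = descriptor.get("name")
                if vocab_type != kind or name not in names:
                    problems.append((kind, name))
    return problems
```

An empty result means that every checked descriptor name was found in a vocabulary of the matching type; it does not say anything about descriptors whose vocabulary is defined elsewhere.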
This section is informative.
An image gets annotated with several emotion categories at the same time, but different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml"> <info> <meta:media-type>image</meta:media-type> <meta:media-id>disgust</meta:media-id> <meta:media-set>JACFEE-database</meta:media-set> <meta:doc>Example adapted from (Hall &amp; Matsumoto 2004) http://www.davidmatsumoto.info/Articles/2004_hall_and_matsumoto.pdf </meta:doc> </info> <emotion> <category name="Disgust" value="0.82"/> <category name="Contempt" value="0.35"/> <category name="Anger" value="0.12"/> <category name="Surprise" value="0.53"/> </emotion> </emotionml>
Example 1: Annotation of a whole video: several emotions are annotated with different intensities.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/humaine-database-labels.xml"> <info> <meta:media-type>video</meta:media-type> <meta:media-name>ed1_4</meta:media-name> <meta:media-set>humaine database</meta:media-set> <meta:coder-set>JM-AB-UH</meta:coder-set> </info> <emotion> <category name="Amusement" value="0.52"/> <category name="Irritation" value="0.63"/> <category name="Relaxed" value="0.02"/> <category name="Frustration" value="0.87"/> <category name="Calm" value="0.21"/> <category name="Friendliness" value="0.28"/> </emotion> </emotionml>
Example 2: Annotation of a video segment, where two emotions are annotated for overlapping but not identical timespans.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata" category-set="http://www.example.com/custom/emotv-labels.xml"> <info> <meta:media-type>video</meta:media-type> <meta:media-name>ext-03</meta:media-name> <meta:media-set>EmoTV</meta:media-set> <meta:coder>4</meta:coder> </info> <emotion> <category name="irritation" value="0.46"/> <reference uri="file:ext03.avi?t=3.24,15.4"/> </emotion> <emotion> <category name="despair" value="0.48"/> <reference uri="file:ext03.avi?t=5.15,17.9"/> </emotion> </emotionml>
This example shows how automatically annotated data from three affective sensor devices might be stored or communicated.
It shows an excerpt of an episode experienced on 23 November 2001 from 14:36 onwards (absolute start time is 1006526160 milliseconds since 1 January 1970 00:00:00 GMT). Each device detects an emotion, but at slightly different times and for different durations.
The next entry of observed emotions occurs about 6 minutes later (absolute start time is 1006526520 milliseconds since 1 January 1970 00:00:00 GMT). Only the physiology sensor has detected a short glimpse of anger; for the visual and IR cameras it was below their individual thresholds, so they produce no entry.
For simplicity, all devices use categorical annotations and the same set of categories. Obviously it would be possible, and even likely, that different devices from different manufacturers provide their data annotated with different emotion sets.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> ... <emotion start="1006526160" expressed-through="face"> <!--the first modality detects excitement. It is a camera observing the face. A URI to the database is provided to access the video stream.--> <category name="excited"/> <reference uri="http://www.example.com/facedb#t=26,98"/> </emotion> <emotion start="1006526160" expressed-through="facial-skin-color"> <!--the second modality detects anger. It is an IR camera observing the face. A URI to the database is provided to access the video stream.--> <category name="angry"/> <reference uri="http://www.example.com/skindb#t=23,108"/> </emotion> <emotion start="1006526160" expressed-through="physiology"> <!--the third modality detects excitement again. It is a wearable device monitoring physiological changes in the body. A URI to the database is provided to access the data stream.--> <category name="excited"/> <reference uri="http://www.example.com/physiodb#t=19,101"/> </emotion> <emotion start="1006526520" expressed-through="physiology"> <category name="angry"/> <reference uri="http://www.example.com/physiodb2#t=2,6"/> </emotion> ... </emotionml>
Note that the handling of complex emotions is not explicitly specified. This example assumes that parallel occurrences of emotions will be determined based on the time stamp.
The MPEG-4 standard offers 68 parameters, called Facial Animation Parameters (FAPs), to animate a 3D facial model. 66 of these are low-level parameters that act on the facial feature points defining a 3D facial model: they specify how these feature points are displaced, and thereby simulate muscular contraction. The other two FAPs, namely FAP1 and FAP2, refer to viseme and expression respectively. FAP2 corresponds to one of the six basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). The expressions associated with the six emotions are defined by textual descriptions [Ostermann, 2002].
In emotion theory, the idea of mixing emotions to create new emotions is disputed. For the purposes of facial expression modeling, however, it is possible to simulate different emotions as linear combinations of the six basic facial expressions. MPEG-4 allows the linear combination of any two of these expressions: emotion_1 * intensity_1 + emotion_2 * intensity_2. For example, [Raouzaiou et al., 2005] found that the expressions of depression and guilt can be obtained by combinations of fear and sadness with different intensities, while the expression of suspicion is obtained by combining anger and disgust.
In EmotionML it is possible to represent the emotional input to an MPEG-4 based facial animation system using multiple <category> elements, for example as follows.
<emotion xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.w3.org/TR/emotion-voc/xml#big6"> <!-- attempt to express suspicion as a combination of anger and disgust --> <category name="anger" value="0.5"/> <category name="disgust" value="0.3"/> </emotion>
The following example describes various aspects of an emotionally competent robot whose battery is nearly empty. The robot is in a global state of high arousal, negative pleasure and low dominance, i.e. a negative state of distress paired with some urgency but quite limited power to influence the situation. It has a tendency to seek a recharge and to avoid picking up boxes. However, sensor data displays an unexpected obstacle on the way to the charging station. This triggers planning of expressive behavior of frowning. The annotations are grouped into a stand-alone EmotionML document here; in the real world, the various aspects would more likely be embedded into different specialized markup in various parts of the Robot architecture.
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" xmlns:meta="http://www.example.com/metadata"> <info> <meta:name>robbie the robot example</meta:name> </info> <!-- Robot's current global state configuration: negative, active, powerless --> <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#pad-dimensions"> <dimension name="pleasure" value="0.2"/> <dimension name="arousal" value="0.8"/> <dimension name="dominance" value="0.3"/> </emotion> <!-- Robot's action tendencies: want to recharge --> <emotion action-tendency-set="http://www.example.com/custom/action/robot.xml"> <action-tendency name="charge-battery" value="0.9"/> <action-tendency name="seek-shelter" value="0.7"/> <action-tendency name="pickup-boxes" value="0.1"/> </emotion> <!-- Appraised value of incoming event: obstacle detected, appraised as novel and unpleasant --> <emotion appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <appraisal name="suddenness" value="0.8" confidence="0.4"/> <appraisal name="intrinsic-pleasantness" value="0.2" confidence="0.8"/> <reference role="triggeredBy" uri="file:scannerdata.xml#obstacle27"/> </emotion> <!-- Robot's planned facial gestures: will frown --> <emotion category-set="http://www.example.com/custom/robot-emotions.xml" expressed-through="face"> <category name="frustration"/> <reference role="expressedBy" uri="file:behavior-repository.xml#frown"/> </emotion> </emotionml>
One intended use of EmotionML is as a plug-in for existing markup languages. For compatibility with text-annotating markup languages such as SSML, EmotionML avoids the use of text nodes. All EmotionML information is encoded in element and attribute structures.
This section illustrates the concept using three existing W3C markup languages: EMMA, SSML, and SMIL.
EMMA is made for representing arbitrary analysis results; one of them could be the emotional state. The following example represents an analysis of a non-verbal vocalization; its emotion is described as a low-intensity state, maybe boredom.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml"> <emma:interpretation emma:start="12457990" emma:end="12457995" emma:mode="voice" emma:verbal="false"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="bored" value="0.1" confidence="0.1"/> </emotion> </emma:interpretation> </emma:emma>
In the following example, the EMMA <emma:derivation> element is used to represent multiple emotion interpretations associated with audio and video media sources. The first and the third interpretations specify the same emotion category, "content", while the result of the second one is "amused". The consolidated emotion is the result of some processing performed on the interpretations included in the derivation element. In this case it is "content", which is the most frequent category within the available interpretations.
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2009/10/emotionml"> <emma:derivation> <emma:interpretation id="text1" emma:start="12457960" emma:end="12457995" emma:mode="voice" emma:verbal="true" emma:signal="http://example.com/signals/emo123.wav" emma:process="http://example.com/text_analysis.xml"> <emma:literal>I feel happy</emma:literal> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.7" confidence="0.7"/> </emotion> </emma:interpretation> <emma:interpretation id="voice1" emma:start="12457960" emma:end="12457995" emma:mode="voice" emma:verbal="false" emma:signal="http://example.com/signals/emo123.wav" emma:process="http://example.com/voice_analysis.xml"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="amused" value="0.4" confidence="0.5"/> </emotion> </emma:interpretation> <emma:interpretation id="video1" emma:start="12457980" emma:end="12458000" emma:mode="video" emma:verbal="false" emma:signal="http://example.com/signals/emo123.mpg" emma:process="http://example.com/video_analysis.xml"> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.5" confidence="0.7"/> </emotion> </emma:interpretation> </emma:derivation> <emma:interpretation id="multimodal1" emma:start="12457960" emma:end="12458000" emma:medium="acoustic visual" emma:mode="voice video"> <emma:derived-from resource="#text1" composite="true"/> <emma:derived-from resource="#voice1" composite="true"/> <emma:derived-from resource="#video1" composite="true"/> <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <category name="content" value="0.6" confidence="0.7"/> </emotion> </emma:interpretation> </emma:emma>
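The consolidation step described above is outside the scope of EmotionML and EMMA. As a purely illustrative sketch, a simple majority vote over the category names could be written as follows; the function name consolidate_category is hypothetical, and ties are resolved in favour of the category seen first, which is an assumption of this sketch.

```python
from collections import Counter


def consolidate_category(names):
    """Return the most frequent category name (hypothetical helper).

    names holds one category name per interpretation, in document order.
    """
    if not names:
        raise ValueError("no interpretations to consolidate")
    counts = Counter(names)
    # most_common keeps first-seen order among equally frequent names.
    return counts.most_common(1)[0][0]
```

For the three interpretations in the example, consolidate_category(["content", "amused", "content"]) selects "content".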
Two options for using EmotionML with SSML can be illustrated.
First, it is possible with [SSML 1.1] to use arbitrary markup belonging to a different namespace anywhere in an SSML document; only SSML processors that support the markup would take it into account. Therefore, it is possible to insert EmotionML below, for example, an <s> element representing a sentence; the intended meaning is that the enclosing sentence should be spoken with the given emotion, in this case a moderately worried tone of voice:
<?xml version="1.0"?> <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US"> <s> <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <emo:category name="worried" value="0.4"/> </emo:emotion> Do you need help? </s> </speak>
Second, a future version of SSML could explicitly provide for the annotation of paralinguistic information, which could fill the gap between the extralinguistic, speaker-constant settings of the <voice> tag and the linguistic elements such as <s>, <emphasis>, <say-as> etc. The following example assumes that there is a <style> tag for paralinguistic information in a future version of SSML. The style could embed an <emotion>, as follows:
<?xml version="1.0"?> <speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US"> <s> <style> <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories"> <emo:category name="worried" value="0.4"/> </emo:emotion> Do you need help? </style> </s> </speak>
Alternatively, the <style> could refer to a previously defined <emotion>, for example:
<?xml version="1.0"?> <speak version="x.y" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="en-US"> <emo:emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" id="somewhatWorried"> <emo:category name="worried" value="0.4"/> </emo:emotion> <s> <style ref="#somewhatWorried"> Do you need help? </style> </s> </speak>
Using EmotionML for the use case of generating system behavior requires elements of scheduling and surface form realization which are not part of EmotionML. Necessarily, this use case relies on other languages to provide the needed functionality. This is in line with the aim of EmotionML to serve as a specialized plug-in language.
This example illustrates the idea in terms of a simplified version of a storytelling application. A virtual agent tells a story using voice and facial animation. The expression in face and voice is determined by the rendering engine on the basis of EmotionML. The engine in this example uses SMIL [SMIL] for defining the temporal relation between events; EmotionML is used via SMIL's generic <ref> element. In general it is the engine which knows how to render the emotion using the virtual agent's expressive capabilities. To override this, the second <emotion> contains an explicit request to realize the emotional expression using both face and voice modalities.
ridinghood.smil:
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0"> <head> ... </head> <body> <par dur="8s"> <img src="file:forest.jpg"/> <smilText>The little girl was enjoying the walk in the forest.</smilText> <ref src="file:ridinghood.emotionml#emotion1"/> </par> <par dur="5s"> <img src="file:wolf.jpg"/> <smilText>Suddenly a dark shadow appeared in front of her.</smilText> <ref src="file:ridinghood.emotionml#emotion2"/> </par> </body> </smil>
ridinghood.emotionml:
<emotionml xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories" appraisal-set="http://www.w3.org/TR/emotion-voc/xml#scherer-appraisals"> <emotion id="emotion1"> <category name="content" value="0.7"/> </emotion> <emotion id="emotion2" expressed-through="face voice"> <category name="afraid" value="0.9"/> <appraisal name="suddenness" value="0.9"/> <appraisal name="intrinsic-pleasantness" value="0.1"/> </emotion> </emotionml>
Similar principles for decoupling emotion markup from the temporal organization of generating system behavior can be applied using other representations, including interactive setups.
The authors wish to acknowledge the contributions by all members of the Multimodal Interaction Working Group, the Emotion Markup Language Incubator Group and the Emotion Incubator Group, as well as the participants in the W3C Workshop on EmotionML, in particular the following persons (in alphabetical order):
This section is informative.
This section summarizes the main changes since the previous working draft of 29 July 2010.
The <category> element was harmonized with the other emotion descriptors to allow a value attribute or a <trace> child element indicating the intensity of that category. Multiple <category> elements are now allowed within a single <emotion> to reflect the possible co-presence of these categories. The <intensity> element was removed since the usual use is now covered by the value attribute in <category>.
The value attribute or the <trace> child element was made optional for <appraisal> and <action-tendency> elements, in order to allow for the possibility to merely represent the fact that a certain appraisal or action tendency is present, irrespective of its intensity.
duration and relative timestamps was added.
<info> element for representing metadata.
emma:mode attribute in [EMMA], the modality attribute was renamed to expressed-through.
<dimension>, <appraisal> and <action-tendency> elements with a name attribute.
start and end attributes to represent absolute time, and Media Fragment URIs to refer to portions of media files.
<info> element, in synchrony with EMMA.
The <link> element was renamed to <reference> to avoid a name clash with the <link> element in HTML, which has a different scope and syntax.