previous   next   contents  

WD-smil-boston-19991115

7. The SMIL Media Object Module

Editors
Philipp Hoschka (ph@w3.org), W3C
Rob Lanphier (robla@real.com), RealNetworks

Table of contents

7.1 Introduction

This Section defines the SMIL media object module. This module contains elements and attributes used to describe media objects. Since these elements and attributes are defined in a module, designers of other markup languages can reuse the SMIL media module when they need to include media objects into their language.

Changes with respect to the media object elements in SMIL 1.0 provide additional functionality that was brought up as Requirements of the Working Group, and those differences are explained in Appendix A.

7.2 The ref, animation, audio, img, video, text and textstream elements

The media object elements allow the inclusion of media objects into a SMIL presentation. Media objects are included by reference (using a URI).

There are two types of media objects: media objects with an intrinsic duration (e.g. video, audio) (also called "continuous media"), and media objects without intrinsic duration (e.g. text, image) (also called "discrete media").

Anchors and links can be attached to visual media objects, i.e. media objects rendered on a visual abstract rendering surface.

When playing back a media object, the player must not derive the exact type of the media object from the name of the media object element. Instead, it must rely solely on other sources about the type, such as type information contained in the "type" attribute, or the type information communicated by a server or the operating system.

Authors, however, should make sure that the group into which of the media object falls (animation, audio, img, video, text or textstream) is reflected in the element name. This is in order to increase the readability of the SMIL document. When in doubt about the group of a media object, authors should use the generic "ref" element.

Element Attributes

Media object elements can have the following attributes:

abstract
A brief description of the content contained in the element.
alt
For user agents that cannot display a particular media-object, this attribute specifies alternate text. It  is strongly recommended that all media object elements have an "alt" attribute with a meaningful description. Authoring tools should ensure that no element can be introduced into a SMIL document without this attribute.

If the content of these attributes is read by a screen-reader, the presentation should be paused while the text is read out, and resumed afterwards.

author
The name of the author of the content contained in the element.
begin
Defined in SMIL Timing Module
clipBegin (clip-begin)
The clipBegin attribute specifies the beginning of a sub-clip of a continuous media object as offset from the start of the media object.
Values in the clipBegin attribute have the following syntax:
Clip-value        ::= [ Metric ] "=" ( Clock-val | Smpte-val ) |
                      "marker" "=" name-val 
Metric            ::= Smpte-type | "npt" 
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Smpte-val         ::= Hours ":" Minutes ":" Seconds 
                      [ ":" Frames [ "." Subframes ]]
Hours             ::= Digit Digit 
                  /* see XML 1.0 for a definition of ´Digit´*/
Minutes           ::= Digit Digit
Seconds           ::= Digit Digit
Frames            ::= Digit Digit
Subframes         ::= Digit Digit
name-val          ::= ([^<&"] | [^<&´])*
                  /* Derived from BNF rule [10] in [XML10] 
                     Whether single or double quotes are 
                     allowed in a name value depends on which
                     type of quotes is used to quote the 
                     clip attribute value */

The value of this attribute consists of a metric specifier, followed by a time value whose syntax and semantics depend on the metric specifier. The following formats are allowed:

SMPTE Timestamp
SMPTE time codes [SMPTE] can be used for frame-level access accuracy. The metric specifier can have the following values:
smpte
smpte-30-drop
These values indicate the use of the "SMPTE 30 drop" format with 29.97 frames per second. The "frames" field in the time value can assume the values 0 through 29. The difference between 30 and 29.97 frames per second is handled by dropping the first two frame indices (values 00 and 01) of every minute, except every tenth minute.
smpte-25
The "frames" field in the time specification can assume the values 0 through 24.

The time value has the format hours:minutes:seconds:frames.subframes. If the frame value is zero, it may be omitted. Subframes are measured in one-hundredth of a frame.
Examples:
clipBegin="smpte=10:12:33:20"

Normal Play Time
Normal Play Time expresses time in terms of SMIL clock values. The metric specifier is "npt", and the syntax of the time value is identical to the syntax of SMIL clock values.
Examples:
clipBegin="npt=123.45s"
clipBegin="npt=12:05:35.3
"
Marker
Used to define a clip using named time points in a media object, rather than using clock values or SMPTE values. The metric specifier is "marker", and the marker value is a string.

Example: Assume that a recorded radio transmission consists of a sequence of songs, which are separated by announcements by a disk jockey. The audio format supports marked time points, and the begin of each song or announcement with number X is marked as songX or djX respectively. To extract the first song using the "marker" metric, the following audio media element can be used:

<audio clipBegin="marker=song1" clipEnd="marker=dj1" />

"clipBegin" may also be expressed as "clip-begin" for compatibility with SMIL 1.0. Software supporting SMIL Boston must be able to handle both "clipBegin" and "clip-begin", whereas software supporting only the SMIL media object module only needs to support "clipBegin". If an element contains both the old and the new version of a clipping attribute, the the attribute that occurs later in the text is ignored.

Example:

<audio src="radio.wav" clip-begin="5s" clipBegin="10s" />

The clip begins at second 5 of the audio, and not at second 10, since the "clipBegin" attribute is ignored. See Changes to SMIL 1.0 Attributes for more discussion on this topic.

clipEnd (clip-end)
The clipEnd attribute specifies the end of a sub-clip of a continuous media object (such as audio, video or another presentation) that should be played. It uses the same attribute value syntax as the clipBegin attribute.
If the value of the "clipEnd" attribute exceeds the duration of the media object, the value is ignored, and the clip end is set equal to the effective end of the media object. "clipEnd" may also be expressed as "clip-end" for compatibility with SMIL 1.0. Software supporting SMIL Boston must be able to handle both "clipEnd" and "clip-end", whereas software supporting only the SMIL media object module only needs to support "clipEnd". If an element contains both the old and the new version of a clipping attribute, the the attribute that occurs later in the text is ignored.

See Changes to SMIL 1.0 Attributes for more discussion on this topic.

copyright
The copyright notice of the content contained in the element.
dur
Defined in the SMIL Timing Module
end
Defined in the SMIL Timing Module
fill
For a definition of the semantics of this attribute, see SMIL Timing Module. The attribute can have the values "remove" and "freeze".
id
This attribute uniquely identifies an element within a document. Its value is an XML identifier.
longdesc
This attribute specifies a link (URI) to a long description of a media object. This description should supplement the short description provided using the alt attribute. When the media object has associated hyperlinked content, this attribute should provide information about the hyperlinked content.

If the content of these attributes is read by a screen-reader, the presentation should be paused while the text is read out, and resumed afterwards.

port
This provides the RTP/RTCP port for a media object transferred via multicast. It is specified as a range, e.g., port="3456-3457" (this is different from "port" in SDP, where the second port is derived by an algorithm). Note: For transports based on UDP in IPv4, the value should be in the range 1024 to 65535 inclusive. For RTP compliance it should start with an even number. For applications where hierarchically encoded streams are being sent to a unicast address, this may be necessary to specify multiple port pairs. Thus, the range of this request may contain greater than two ports. This attribute is only interpreted if the media object is transferred via RTP and without using RTSP.
readIndex
This attribute specifies the position of the current element in the order in which longdesc and alt text are read out by a screen reader for the current document. This value must be a number between 0 and 32767. User agents should ignore leading zeros. The default value is 0.

Elements that contain alt or longdesc attributes are read by a screen reader according to the following rules:

region
This attribute specifies an abstract rendering surface (either visual or acoustic) defined within the layout section of the document. Its value must be an XML identifier. If no rendering surface with this id is defined in the layout section, the values of the formatting properties of this element are determined by the default layout.
rtpformat
This field has the same semantics as the "fmt list" sub-field in an SDP media description. It contains a list of media format payload IDs. For audio and video, these will normally be a media payload type as defined in the RTP Audio/Video Profile (RFC 1890). When a list of payload formats is given, this implies that all of these formats may be used in the session, but the first of these formats is the default format for the session.For media payload types not explicitly defined as static types, the rtpmap element (defined below) may be used to provide a dynamic binding of media encoding to RTP payload type. The encoding names in the RTP AV Profile do not specify a complete set of parameters for decoding the audio encodings (in terms of clock rate and number of audio channels), and so they are not used directly in this field. Instead, the payload type number should be used to specify the format for static payload types and the payload type number along with additional encoding information should be used for dynamically allocated payload types. This attribute is only interpreted if the media object is transferred via RTP.
src
The value of the src attribute is the URI of the media object.
title
This attribute offers advisory information about the element for which it is set. Values of the title attribute may be rendered by user agents in a variety of ways. For instance, visual browsers frequently display the title as a "tool tip" (a short message that appears when the pointing device pauses over an object). It is strongly recommended that all media object elements have a "title" attribute with a meaningful description. Authoring tools should ensure that no element can be introduced into a SMIL document without this attribute.
transport
This attribute has the same syntax and semantics as the "transport" sub-field in a SDP media description. It defines the transport protocol that is used to deliver the media streams. The standard value for this field is "RTP/AVP", but alternate values may be defined by IANA. RTP/AVP is the IETF's Realtime Transport Protocol using the Audio/Video profile carried over UDP. The complete definition of RTP/AVP can be found in [RFC1890]. Only applies if media object is transferred via RTP.

@@ this may be better to derive from the "src" parameter, which could optionally be rtp://___. This would mean that an RTP URL format  would need to be defined.

type
MIME type of the media object referenced by the "src" attribute.
xml:lang
Used to identify the natural or formal language for the element. For a complete description, see the XML specification, section 2.12  [XML10].

xml:lang differs from the system-language test attribute in one important respect. xml:lang provides information about the content's language independent of what implementations do with the information, whereas system-language is a test attribute with specific associated behavior (see system-language in SMIL Content Control Module for details)

Element Content

Media object elements can contain the following elements:

anchor
Defined in Linking Module
area
Defined in Linking Module
par
Defined in Timing Module
seq
Defined in Timing Module
excl
Defined in Timing Module
animate
Defined in Animation Module
set
Defined in Animation Module
animateColor
Defined in Animation Module
animateMotion
Defined in Animation Module
rtpmap
Defined below
 

7.3 The rtpmap element

If the media object is transferred using the RTP protocol, and uses a dynamic payload type, SDP requires the use of the "rtpmap" attribute field. In this specification, this is mapped onto the "rtpmap" element, which is contained in the content of the media object element. If the media object is not transferred using RTP, this element is ignored.

Attributes

payload
The value of this attribute is a payload format type number listed in the parent element's "rtpformat" attribute. This is used to map dynamic payload types onto definitions of specific encoding types and necessary parameters.
encoding
This attribute encodes parameters needed to decode the dynamic payload type. The attribute values have the following syntax:
encoding-val    ::= ( short-encoding | long-encoding )
short-encoding  ::= encoding-name "/" clock-rate 
long-encoding   ::= encoding-name "/" clock-rate "/" encoding-params
encoding-name   ::= name-val
clock-rate      ::= +Digit
encoding-params ::= ??

Legal values for "encoding-name" are payload names defined in [RFC1890], and RTP payload names registered as MIME types [draft-ietf-avt-rtp-mime-00].

For audio streams, "encoding parameters" may specify the number of audio channels. This parameter may be omitted if the number of channels is one provided no additional parameters are needed. For video streams, no encoding parameters are currently specified. Additional parameters may be defined in the future, but codec specific parameters should not be added, but defined as separate rtpmap attributes.

Element Content

"rtpmap" is an empty element

Example

<audio src="rtsp://www.w3.org/foo.rtp" port="49170" 
       transport="RTP/AVP" rtpformat="96,97,98">
  <rtpmap payload="96" encoding="L8/8000" />
  <rtpmap payload="97" encoding="L16/8000" />
  <rtpmap payload="98" encoding="L16/11025/2" />
</audio>

7.4 Support for media player extensions

A media object referenced by a media object element is often rendered by software modules referred to as media players that are separate from the software module providing the synchronization between different media objects in a presentation (referred to as synchronization engine).

Media players generally support varying levels of control, depending on the constraints of the underlying renderer as well as media delivery, streaming etc. This specification defines 4 levels of support, allowing for increasingly tight integration, and broader functionality. The details of the interface will be presented in a separate document.

Level 0
Must allow the synchronization engine to query for duration, and must support cue, start and stop on the player. To support reasonable resynchronization, the media player must provide pause/unpause controls with minimal latency. This is the minimum level of support defined.
Level 1
In addition to all Level 0 support, the media player can detect when sync has been broken, so that a resynchronization event can be fired. A media player that cannot support Level 1 functionality is responsible to maintain proper synchronization in all circumstances, and has no remedy if it cannot (Level 1 support is recommended).
Level 2
In addition to all Level 1 support, the media player supports a tick() method for advancing the timeline in strict sync with the document timeline. This is generally appropriate to animation renderers that are not tightly bound to media delivery constraints.
Level 3
In addition to all Level 2 support, the media player also supports a query interface to provide information about its time-related capabilities. Capabilities include things like canRepeat, canPlayBackwards, canPlayVariable, canHold, etc. This is mostly for future extension of the timing functionality and for optimization of media playback/rendering.

7.4.1 Appendix A: Changes to SMIL 1.0 Attributes

clipBegin, clipEnd, clip-begin, clip-end

With regards to the clipBegin/clip-begin and clipEnd/clip-end elements, SMIL Boston defines the following changes to the syntax defined in SMIL 1.0:

Handling of new clipBegin/clipEnd syntax in SMIL 1.0 software

Using attribute names with hyphens such as "clip-begin" and "clip-end" is problematic when using a scripting language and the DOM to manipulate these attributes. Therefore, this specification adds the attribute names "clipBegin" and "clipEnd" as an equivalent alternative to the SMIL 1.0 "clip-begin" and "clip-end" attributes. The attribute names with hyphens are deprecated.

Authors can use two approaches for writing SMIL Boston presentations that use the new clipping syntax and functionality ("marker", default metric) defined in this specification, but can still can be handled by SMIL 1.0 software. First, authors can use non-hyphenated versions of the new attributes that use the new functionality, and add SMIL 1.0 conformant clipping attributes later in the text.

Example:

<audio src="radio.wav" clipBegin="marker=song1" clipEnd="marker=moderator1" 
       clip-begin="0s" clip-end="3:50" />

SMIL 1.0 players implementing the recommended extensibility rules of SMIL 1.0 [SMIL10] will ignore the clip attributes using the new functionality, since they are not part of SMIL 1.0. SMIL Boston players, in contrast, will ignore the clip attributes using SMIL 1.0 syntax, since they occur later in the text.

The second approach is to use the following steps:

  1. Add a "system-required" test attribute to media object elements using the new functionality. The value of the "system-required" attribute must be the URI of this specification, i.e. @@ http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707
  2. Add an alternative version of the media object element that conforms to SMIL 1.0
  3. Include these two elements in a "switch" element

Example:

<switch>
  <audio src="radio.wav" clipBegin="marker=song1" clipEnd="marker=moderator1"  
   system-required=
     "@@http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707" />
  <audio src="radio.wav" clip-begin="0s" clip-end="3:50" />
</switch>

alt, longdesc

Added the recommendation that if the content of these attributes is read by a screen-reader, the presentation should be paused while the text is read out, and resumed afterwards.

New Accessibility Attributes

readIndex

SDP Attributes

When using SMIL in conjunction with the Real Time Transport Protocol (RTP, [RFC1889]), which is designed for real-time delivery of media streams, a media client is required to have initialization parameters in order to interpret the RTP data. In the typical RTP implementation, these initialization parameters are described in the Session Description Protocol (SDP, [RFC2327]). The SDP description can be delivered in the DESCRIBE portion of the Real Time Streaming Protocol (RTSP, [RFC2326]), or can be delivered as a file via HTTP.

Since SMIL provides a media description language which often references SDP via RTSP and can also reference SDP files via HTTP, a very useful optimization can be realized by merging parameters typically delivered via SDP into the SMIL document. Since retrieving a SMIL document constitutes one round trip, and retrieving the SDP descriptions referenced in the SMIL document constitutes another round trip, merging the media description into the SMIL document itself can save a round trip in a typical media exchange. This round-trip savings can result in a noticeably faster start-up over a slow network link.

This applies particularly well to two primary usage scenarios:

The following attributes were added to SMIL Boston:

port
rtpformat
transport

Example

<audio src="rtsp://www.w3.org/test.rtp" port="49170-49171"
       transport="RTP/AVP" rtpformat="96,97,98" />

In addition to these new attributes, the "rtpmap" element was added to complete the SDP functionality.

7.4.2 Appendix B: Element Content

SMIL 1.0 only allowed "anchor" as a child element of a media element. In addition to "anchor", the following elements are now allowed as children of a SMIL media object:

area, anchor
Defined in Linking Module
par, seq, excl
Defined in Timing Module
rtpmap
Defined above
animate, set, animateColor, animateMotion
Defined in Animation Module

7.4.3 Appendix C: New sections

The rtpmap element

A new section describing the "rtpmap" element is provided which provides functionality needed to use SMIL as a replacement for SDP.

Support for media player extensions

SMIL Boston introduces the concepts of levels of functionality, which are explained in this section.

7.4.4 Appendix D: Backburner

Listed below are the features that haven't been integrated yet, and may not make it into the final version of SMIL Boston:


previous   next   contents