I. The SMIL Media Object Module

Previous version:
http://www.w3.org/AudioVideo/Group/Media/extended-media-object-19990713 (W3C members only)
Philipp Hoschka, W3C (ph@w3.org),
Rob Lanphier (robla@real.com)

Table of Contents

1 Introduction

This section defines the SMIL media object module. This module contains the elements and attributes used to describe media objects. Since these elements and attributes are defined in a module, designers of other markup languages can reuse the SMIL media object module when they need to include media objects in their language.

Changes with respect to the media object elements in SMIL 1.0 include changes required by basing SMIL on XLink [XLINK], and changes that provide additional functionality raised as requirements in the Working Group.

2 The ref, animation, audio, img, video, text and textstream elements

These elements support all attributes defined for media object elements in SMIL 1.0, with the changes and additional attributes described below.

2.1 Changes to SMIL 1.0 Attributes

clipBegin, clipEnd, clip-begin, clip-end

Attribute names containing hyphens, such as "clip-begin" and "clip-end", are problematic when these attributes are manipulated via a scripting language and the DOM. This specification therefore adds the attribute names "clipBegin" and "clipEnd" as equivalent alternatives to the SMIL 1.0 "clip-begin" and "clip-end" attributes. The hyphenated attribute names are deprecated. Software supporting SMIL Boston must be able to handle all four attribute names, whereas software supporting only the SMIL media object module does not have to support the hyphenated names. If an element contains both the old and the new version of a clipping attribute, the attribute that occurs later in the text is ignored.


<audio src="radio.wav" clip-begin="5s" clipBegin="10s" />

The clip begins at second 5 of the audio, not at second 10, since the "clipBegin" attribute occurs later in the text and is therefore ignored.
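The "later occurrence is ignored" rule can be sketched in code. The following is only an illustration (the helper name is mine, not from the specification); note that standard XML parsers do not expose attribute order, so a real implementation would need access to the raw document text:

```python
# Hypothetical sketch of the precedence rule: when both "clip-begin" and
# "clipBegin" occur on an element, the attribute appearing later in the
# text is ignored, so the first occurrence wins.

def effective_clip(attrs_in_document_order, names):
    """attrs_in_document_order: (name, value) pairs as written in the text."""
    for name, value in attrs_in_document_order:
        if name in names:
            return value  # first occurrence wins; any later one is ignored
    return None

attrs = [("src", "radio.wav"), ("clip-begin", "5s"), ("clipBegin", "10s")]
print(effective_clip(attrs, ("clipBegin", "clip-begin")))  # → 5s
```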

The syntax of legal values for these attributes is defined by the following BNF:

Clip-value        ::= [ Metric "=" ] ( Clock-val | Smpte-val ) |
                      "name" "=" name-val 
Metric            ::= Smpte-type | "npt" 
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Smpte-val         ::= Hours ":" Minutes ":" Seconds 
                      [ ":" Frames [ "." Subframes ]]
Hours             ::= Digit Digit 
                  /* see XML 1.0 for a definition of 'Digit' */
Minutes           ::= Digit Digit
Seconds           ::= Digit Digit
Frames            ::= Digit Digit
Subframes         ::= Digit Digit
name-val          ::= ([^<&"] | [^<&'])*
                  /* Derived from BNF rule [10] in [XML] 
                     Whether single or double quotes are 
                     allowed in a name value depends on which
                     type of quotes is used to quote the 
                     clip attribute value */
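As an informal illustration (not normative), the grammar above can be approximated with a regular expression. The Clock-val production is defined by the SMIL timing syntax, so the simplified clock forms recognized below are an assumption:

```python
import re

# Informal validator for the clip-value grammar above. Clock-val comes
# from the SMIL timing syntax; the simplified forms here (full clocks,
# bare counts with optional h/min/s/ms) are an assumption.

SMPTE_TYPE = r"smpte|smpte-30-drop|smpte-25"
SMPTE_VAL  = r"\d{2}:\d{2}:\d{2}(?::\d{2}(?:\.\d{2})?)?"
CLOCK_VAL  = r"(?:\d+:)?\d{1,2}:\d{2}(?:\.\d+)?|\d+(?:\.\d+)?(?:h|min|s|ms)?"
NAME_VAL   = r'[^<&"]*'

CLIP_VALUE = re.compile(
    rf"^(?:(?:{SMPTE_TYPE}|npt)=)?(?:{SMPTE_VAL}|{CLOCK_VAL})$"
    rf"|^name={NAME_VAL}$"
)

for value in ("npt=123.45s", "smpte=00:02:31:05.01", "3:50", "name=song1"):
    print(value, bool(CLIP_VALUE.match(value)))  # all four are accepted
```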

This implies the following changes to the syntax defined in SMIL 1.0:

Handling of new syntax in SMIL 1.0 software

Authors can use two approaches to write SMIL Boston presentations that use the new clipping syntax and functionality ("name", default metric) defined in this specification but can still be handled by SMIL 1.0 software.

First, authors can use the non-hyphenated versions of the new attributes to express the new functionality, and add SMIL 1.0 conformant clipping attributes later in the text.


<audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1" 
       clip-begin="0s" clip-end="3:50" />

SMIL 1.0 players implementing the recommended extensibility rules of SMIL 1.0 [SMIL] will ignore the clip attributes using the new functionality, since they are not part of SMIL 1.0. SMIL Boston players, in contrast, will ignore the clip attributes using SMIL 1.0 syntax, since they occur later in the text.

The second approach is to use the following steps:

  1. Add a "system-required" test attribute to media object elements that use the new functionality. The value of the "system-required" attribute must be the URI of this specification, i.e. @@ http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707
  2. Add an alternative version of the media object element that conforms to SMIL 1.0
  3. Include these two elements in a "switch" element


  <switch>
    <audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1"
       system-required="@@http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707" />
    <audio src="radio.wav" clip-begin="0s" clip-end="3:50" />
  </switch>

alt, longdesc

If the content of these attributes is read by a screen reader, the presentation should be paused while the text is read out, and resumed afterwards.

New Accessibility Attributes

This attribute specifies the position of the current element in the order in which longdesc and alt text are read out by a screen reader for the current document. This value must be a number between 0 and 32767. User agents should ignore leading zeros. The default value is 0.
Elements that contain alt or longdesc attributes are read by a screen reader according to the following rules:

2.2 XLink Attributes

To make SMIL 1.0 media object elements XLink-conformant, the attributes defined in the XLink specification are added as described below.

Note: Due to a limitation in the current XLink draft, only the "src" attribute is treated as an XLink locator; the "longdesc" attribute is treated as a non-XLink linking mechanism (as allowed in Section 8 of the XLink draft). See the Appendix for an XLink-conformant equivalent of SMIL 1.0 elements that contain a "longdesc" attribute.

The value of this attribute is fixed to "auto", i.e. the link is followed automatically.
This attribute does not apply to simple-link media object elements.
This attribute does not apply, since media object elements are not inline links.
This attribute does not apply, since media object elements are not inline links.
Defined in the XLink specification.
The value of this attribute is fixed to "false". SMIL media object elements are out-of-line links, since they do not have any content, and thus do not have a local resource as defined by XLink.
@@ since this is also a "simple link", this seems to be a "one-ended" link as described in Section 4.2 of the XLink draft (description there is not very clear)
@@ could be used to describe the role of the remote resource, i.e. the value of the "src" attribute. Can't think of a use case, so don't think this is needed
This attribute is defined in the XLink specification. Its value is fixed to "embed". The media object behaves in the same way as SMIL 1.0 media objects, i.e. the media object is inserted into the presentation.
Equivalent to the SMIL 1.0 "src" attribute. Remapped via XLink attribute remapping onto the XLink "href" attribute.
Note: Attribute remapping is costly when the document does not contain a DTD definition, because in this case, FIXED attributes need to be included explicitly. This means the author has to use the following syntax to be XLink conformant:
    <audio src="audio.wav" xml:attributes="href src" />
Equivalent to SMIL 1.0 "title" attribute.
xml:link (required)
This attribute is required for an element to be an XLink element. For simple media object elements, its value is fixed to "simple".
@@ same disadvantage for fixed attributes when DTD is missing as with "src" attribute.

2.3 SDP Attributes

When using SMIL in conjunction with the Real Time Transport Protocol (RTP, [RFC1889]), which is designed for real-time delivery of media streams, a media client requires initialization parameters in order to interpret the RTP data. These are typically described in the Session Description Protocol (SDP, [RFC2327]). The SDP description can be delivered in the DESCRIBE portion of the Real Time Streaming Protocol (RTSP, [RFC2326]), or as a file via HTTP.

Since SMIL provides a media description language which often references SDP via RTSP and can also reference SDP files via HTTP, a very useful optimization can be realized by merging parameters typically delivered via SDP into the SMIL document. Since retrieving a SMIL document constitutes one round trip, and retrieving the SDP descriptions referenced in the SMIL document constitutes another round trip, merging the media description into the SMIL document itself can save a round trip in a typical media exchange.  This round-trip savings can result in a noticeably faster start-up over a slow network link.

This applies particularly well to two primary usage scenarios:

(see also "The rtpmap element" below)

SDP-related Attributes

This attribute provides the RTP/RTCP ports for a media object transferred via multicast. It is specified as a range, e.g. port="3456-3457" (this differs from "port" in SDP, where the second port is derived by an algorithm). Note: for transports based on UDP over IPv4, the value should be in the range 1024 to 65535 inclusive. For RTP compliance, it should start with an even number. For applications where hierarchically encoded streams are sent to a unicast address, it may be necessary to specify multiple port pairs, so the range may span more than two ports. This attribute is only interpreted if the media object is transferred via RTP without using RTSP.
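The port-range semantics can be illustrated with a short sketch. The helper is hypothetical, and it enforces the "should"-level constraints from the text as hard errors purely for illustration:

```python
# Hypothetical parser for the "port" attribute, e.g. port="3456-3457".
# The 1024-65535 range and even first port are "should"-level rules in
# the text; raising errors for them here is a choice of this sketch.

def parse_port_range(value):
    first, _, last = value.partition("-")
    first = int(first)
    last = int(last) if last else first + 1  # assume one RTP/RTCP pair
    if not (1024 <= first <= 65535 and first <= last <= 65535):
        raise ValueError("UDP/IPv4 ports should lie in 1024-65535")
    if first % 2 != 0:
        raise ValueError("RTP port should be even (RTCP uses the odd port)")
    return list(range(first, last + 1))

print(parse_port_range("3456-3457"))  # → [3456, 3457]
```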
This attribute has the same semantics as the "fmt list" sub-field of an SDP media description. It contains a list of media format payload IDs. For audio and video, these will normally be media payload types as defined in the RTP Audio/Video Profile [RFC1890]. When a list of payload formats is given, all of these formats may be used in the session, but the first format listed is the default format for the session. For media payload types not explicitly defined as static types, the rtpmap element (defined below) may be used to provide a dynamic binding of media encoding to RTP payload type. The encoding names in the RTP A/V Profile do not specify a complete set of parameters for decoding the audio encodings (in terms of clock rate and number of audio channels), and so they are not used directly in this field. Instead, the payload type number should be used to specify the format for static payload types, and the payload type number along with additional encoding information should be used for dynamically allocated payload types. This attribute is only interpreted if the media object is transferred via RTP.
This attribute has the same syntax and semantics as the "transport" sub-field of an SDP media description. It defines the transport protocol that is used to deliver the media streams. The standard value for this field is "RTP/AVP", but alternate values may be defined by IANA. RTP/AVP is the IETF's Realtime Transport Protocol using the Audio/Video profile carried over UDP. The complete definition of RTP/AVP can be found in [RFC1890]. This attribute is only interpreted if the media object is transferred via RTP.
@@ this may be better to derive from the "src" parameter, which could optionally be rtp://___. This would mean that an RTP URL format  would need to be defined.


<audio src="rtsp://www.w3.org/test.rtp" port="49170-49171"
       transport="RTP/AVP" fmt-list="96,97,98" />
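The rule that the first entry of "fmt-list" is the session default can be sketched as follows (the helper name is mine, not from the specification):

```python
# Hypothetical helper illustrating the "fmt-list" rule: every listed
# payload type may be used in the session; the first one is the default.

def parse_fmt_list(value):
    payload_types = [int(pt) for pt in value.split(",")]
    return payload_types, payload_types[0]  # (allowed formats, default)

allowed, default = parse_fmt_list("96,97,98")
print(allowed, default)  # → [96, 97, 98] 96
```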

Element Content

Media object elements can contain the following elements:

Defined in Linking Module
Defined in Timing Module
Defined below
Defined in Timing Module

3 The rtpmap element

If the media object is transferred using the RTP protocol, and uses a dynamic payload type, SDP requires the use of the "rtpmap" attribute field. In this specification, this is mapped onto the "rtpmap" element, which is contained in the content of the media object element. If the media object is not transferred using RTP, this element is ignored.


The value of this attribute is a payload format type number listed in the parent element's "fmt-list" attribute. This is used to map dynamic payload types onto definitions of specific encoding types and necessary parameters.
This attribute encodes parameters needed to decode the dynamic payload type. The attribute values have the following syntax:
encoding-val    ::= encoding-name "/" clock-rate [ "/" encoding-params ]
encoding-name   ::= name-val
clock-rate      ::= Digit+
encoding-params ::= ??

Legal values for "encoding-name" are payload names defined in [RFC1890], and RTP payload names registered as MIME types [draft-ietf-avt-rtp-mime-00].
For audio streams, "encoding-params" may specify the number of audio channels. This parameter may be omitted if the number of channels is one, provided no additional parameters are needed. For video streams, no encoding parameters are currently specified. Additional parameters may be defined in the future; however, codec-specific parameters should not be added here, but defined as separate rtpmap attributes.
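A sketch of parsing an "encoding" value according to the BNF above, applying the stated default of one audio channel when encoding-params is omitted (the helper name is mine):

```python
# Hypothetical parser for the "encoding" attribute of rtpmap, following
# the BNF above; one audio channel is assumed when encoding-params is
# omitted, as stated in the text.

def parse_encoding(value):
    parts = value.split("/")
    name = parts[0]
    clock_rate = int(parts[1])
    channels = int(parts[2]) if len(parts) > 2 else 1  # default: mono
    return name, clock_rate, channels

print(parse_encoding("L16/11025/2"))  # → ('L16', 11025, 2)
print(parse_encoding("L8/8000"))      # → ('L8', 8000, 1)
```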

Element Content

"rtpmap" is an empty element.


<audio src="rtsp://www.w3.org/foo.rtp" port="49170" 
       transport="RTP/AVP" fmt-list="96,97,98">
  <rtpmap payload="96" encoding="L8/8000" />
  <rtpmap payload="97" encoding="L16/8000" />
  <rtpmap payload="98" encoding="L16/11025/2" />
</audio>

4 Support for media player extensions

A media object referenced by a media object element is often rendered by software modules referred to as media players that are separate from the software module providing the synchronization between different media objects in a presentation (referred to as synchronization engine).

Media players generally support varying levels of control, depending on the constraints of the underlying renderer as well as on media delivery and streaming. This specification defines four levels of support, allowing for increasingly tight integration and broader functionality. The details of the interface will be presented in a separate document.

Level 0
Must allow the synchronization engine to query for duration, and must support cue, start and stop on the player. To support reasonable resynchronization, the media player must provide pause/unpause controls with minimal latency. This is the minimum level of support defined.
Level 1
In addition to all Level 0 support, the media player can detect when sync has been broken, so that a resynchronization event can be fired. A media player that cannot support Level 1 functionality is responsible for maintaining proper synchronization in all circumstances, and has no remedy if it cannot do so (Level 1 support is recommended).
Level 2
In addition to all Level 1 support, the media player supports a tick() method for advancing the timeline in strict sync with the document timeline. This is generally appropriate to animation renderers that are not tightly bound to media delivery constraints.
Level 3
In addition to all Level 2 support, the media player also supports a query interface to provide information about its time-related capabilities. Capabilities include things like canRepeat, canPlayBackwards, canPlayVariable, canHold, etc. This is mostly for future extension of the timing functionality and for optimization of media playback/rendering.
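Since the interface details are deferred to a separate document, the following is only a hypothetical sketch of how the four levels nest, expressed as incrementally richer Python interfaces (all names are mine, not from the specification):

```python
# Hypothetical sketch of the four media-player support levels as a
# hierarchy of abstract interfaces; names are illustrative only.

from abc import ABC, abstractmethod

class Level0Player(ABC):
    """Minimum support: query duration; cue/start/stop; low-latency pause."""
    @abstractmethod
    def duration(self) -> float: ...
    @abstractmethod
    def cue(self, media_time: float) -> None: ...
    @abstractmethod
    def start(self) -> None: ...
    @abstractmethod
    def stop(self) -> None: ...
    @abstractmethod
    def pause(self) -> None: ...
    @abstractmethod
    def unpause(self) -> None: ...

class Level1Player(Level0Player):
    """Adds detection of broken sync so a resynchronization event can fire."""
    @abstractmethod
    def on_sync_broken(self, callback) -> None: ...

class Level2Player(Level1Player):
    """Adds tick() to advance the renderer in strict sync with the document."""
    @abstractmethod
    def tick(self, document_time: float) -> None: ...

class Level3Player(Level2Player):
    """Adds capability queries (canRepeat, canPlayBackwards, canHold, ...)."""
    @abstractmethod
    def capabilities(self) -> dict: ...
```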


5 References

[draft-ietf-avt-rtp-mime-00]
"MIME Type Registration of RTP Payload Formats", Steve Casner and Philipp Hoschka, June 1999. Available at ftp://ftpeng.cisco.com/casner/outgoing/draft-ietf-avt-rtp-mime-00.txt.

[RFC1889]
"RTP: A Transport Protocol for Real-Time Applications", Henning Schulzrinne, Steve Casner, Ron Frederick and Van Jacobson, January 1996. Available at ftp://ftp.isi.edu/in-notes/rfc1889.txt.

[RFC1890]
"RTP Profile for Audio and Video Conferences with Minimal Control", Henning Schulzrinne, January 1996. Available at ftp://ftp.isi.edu/in-notes/rfc1890.txt.

[RFC2326]
"Real Time Streaming Protocol (RTSP)", Henning Schulzrinne, Anup Rao and Rob Lanphier, April 1998. Available at ftp://ftp.isi.edu/in-notes/rfc2326.txt.

[RFC2327]
"SDP: Session Description Protocol", Mark Handley and Van Jacobson, April 1998. Available at ftp://ftp.isi.edu/in-notes/rfc2327.txt.

[SMIL]
"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification", Philipp Hoschka, editor, 15 June 1998. Available at http://www.w3.org/TR/REC-smil.

[XLINK]
"XML Linking Language (XLink) V1.0", Eve Maler and Steve DeRose, editors, 3 March 1998. Available at http://www.w3.org/TR/WD-xlink.

[XML]
"Extensible Markup Language (XML) 1.0", Tim Bray, Jean Paoli and C. M. Sperberg-McQueen, editors, 10 February 1998. Available at http://www.w3.org/TR/REC-xml.