I. The SMIL Media Object Module
-
Previous version:
-
http://www.w3.org/AudioVideo/Group/Media/extended-media-object-19990713
(W3C members only)
-
Editors:
-
Philipp Hoschka, W3C (ph@w3.org),
Rob Lanphier (robla@real.com)
This section defines the SMIL media object module. This module contains the
elements and attributes used to describe media objects. Since these elements and
attributes are defined in a module, designers of other markup languages can
reuse the SMIL media object module when they need to include media objects in
their language.
Changes with respect to the media object elements in SMIL 1.0 include changes
required by basing SMIL on XLink [XLINK], and changes
that provide additional functionality that was brought up as Requirements
in the Working Group.
These elements can contain all attributes defined for media object elements
in SMIL 1.0 with the changes described below, and the additional attributes
described below.
clipBegin, clipEnd, clip-begin, clip-end
Using attribute names with hyphens such as "clip-begin" and "clip-end" is
problematic when using a scripting language and the DOM to manipulate these
attributes. Therefore, this specification adds the attribute names "clipBegin"
and "clipEnd" as an equivalent alternative to the SMIL 1.0 "clip-begin" and
"clip-end" attributes. The attribute names with hyphens are deprecated. Software
supporting SMIL Boston must be able to handle all four attribute names, whereas
software supporting only the SMIL media object module does not have to support
the attribute names with hyphens. If an element contains both the old and
the new version of a clipping attribute, the attribute that occurs later
in the text is ignored.
Example:
<audio src="radio.wav" clip-begin="5s" clipBegin="10s" />
The clip begins at second 5 of the audio, and not at second 10, since the
"clipBegin" attribute is ignored.
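The precedence rule above can be sketched in Python as follows. This is a minimal illustration rather than a normative algorithm: the function name effective_clip_attributes and the representation of an element's attributes as (name, value) pairs in document order are assumptions made for the example.

```python
# Map each accepted attribute name to the clipping property it sets.
CLIP_NAMES = {
    "clipBegin": "begin", "clip-begin": "begin",
    "clipEnd": "end", "clip-end": "end",
}


def effective_clip_attributes(attrs):
    """Return the clip values a SMIL Boston player would honor.

    `attrs` is a list of (name, value) pairs in the order in which the
    attributes occur in the text of the element (a hypothetical
    representation chosen for this sketch).
    """
    result = {}
    for name, value in attrs:
        prop = CLIP_NAMES.get(name)
        if prop is None:
            continue  # not a clipping attribute
        # The first occurrence wins; the variant occurring later is ignored.
        result.setdefault(prop, value)
    return result


example = [("src", "radio.wav"), ("clip-begin", "5s"), ("clipBegin", "10s")]
chosen = effective_clip_attributes(example)
```

For the example element, only the "clip-begin" value of "5s" is retained.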
The syntax of legal values for these attributes is defined by the following
BNF:
Clip-value   ::= [ Metric "=" ] ( Clock-val | Smpte-val ) |
                 "name" "=" name-val
Metric       ::= Smpte-type | "npt"
Smpte-type   ::= "smpte" | "smpte-30-drop" | "smpte-25"
Smpte-val    ::= Hours ":" Minutes ":" Seconds
                 [ ":" Frames [ "." Subframes ]]
Hours        ::= Digit Digit
                 /* see XML 1.0 for a definition of 'Digit' */
Minutes      ::= Digit Digit
Seconds      ::= Digit Digit
Frames       ::= Digit Digit
Subframes    ::= Digit Digit
name-val     ::= ([^<&"] | [^<&'])*
                 /* Derived from BNF rule [10] in [XML].
                    Whether single or double quotes are
                    allowed in a name value depends on which
                    type of quotes is used to quote the
                    clip attribute value */
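The following Python sketch shows one way a clip value could be split into its parts. It assumes, as the examples in this section (such as clipBegin="10s") suggest, that the "=" is omitted together with the metric. Clock-val is defined in SMIL 1.0 and not in this section, so the sketch leaves it unparsed. The function name parse_clip_value and the dictionary layout of the result are hypothetical.

```python
import re

SMPTE_TYPES = ("smpte", "smpte-30-drop", "smpte-25")
METRICS = SMPTE_TYPES + ("npt",)

# Hours ":" Minutes ":" Seconds [ ":" Frames [ "." Subframes ]]
SMPTE_VAL = re.compile(
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(?::([0-9]{2})(?:\.([0-9]{2}))?)?"
)


def parse_clip_value(text):
    """Split a clip attribute value into metric and value parts."""
    if text.startswith("name="):
        return {"kind": "name", "name": text[len("name="):]}
    metric, sep, value = text.partition("=")
    if not sep:
        # No "=" present: the metric was omitted (assumption, see lead-in).
        metric, value = None, text
    elif metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if metric in SMPTE_TYPES:
        match = SMPTE_VAL.fullmatch(value)
        if match is None:
            raise ValueError(f"malformed SMPTE value: {value!r}")
        # Fields: hours, minutes, seconds, frames, subframes (None if absent).
        fields = tuple(None if g is None else int(g) for g in match.groups())
        return {"kind": "smpte", "metric": metric, "fields": fields}
    # Clock-val is defined in SMIL 1.0; it is returned unparsed here.
    return {"kind": "clock", "metric": metric, "value": value}
```

A value without a metric is reported with metric None, since this section does not state which metric is the default.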
This implies the following changes to the syntax defined in SMIL 1.0:
Handling of new syntax in SMIL 1.0 software
Authors can use two approaches for writing SMIL Boston presentations that
use the new clipping syntax and functionality ("name", default metric) defined
in this specification, but can still be handled by SMIL 1.0 software.
First, authors can use non-hyphenated versions of the new attributes that
use the new functionality, and add SMIL 1.0 conformant clipping attributes
later in the text.
Example:
<audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1"
clip-begin="0s" clip-end="3:50" />
SMIL 1.0 players implementing the recommended extensibility rules of SMIL
1.0 [SMIL] will ignore the clip attributes using the
new functionality, since they are not part of SMIL 1.0. SMIL Boston players,
in contrast, will ignore the clip attributes using SMIL 1.0 syntax,
since they occur later in the text.
The second approach is to use the following steps:
-
Add a "system-required" test attribute to media object elements using the
new functionality. The value of the "system-required" attribute must be the
URI of this specification, i.e. @@
http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707
-
Add an alternative version of the media object element that conforms to SMIL
1.0
-
Include these two elements in a "switch" element
Example:
<switch>
<audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1"
system-required=
"@@http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707" />
<audio src="radio.wav" clip-begin="0s" clip-end="3:50" />
</switch>
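The effect of the "switch" approach can be sketched in Python as follows. The sketch assumes the SMIL 1.0 behavior that a "switch" selects its first acceptable child, and that a "system-required" test succeeds only when the player supports the named extension. Because the URI of this specification is still a placeholder, SPEC_URI below is a made-up stand-in, and select_switch_child is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Stand-in for the not yet final URI of this specification (assumption).
SPEC_URI = "http://example.org/media-object-module"

DOC = f"""
<switch>
  <audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1"
         system-required="{SPEC_URI}" />
  <audio src="radio.wav" clip-begin="0s" clip-end="3:50" />
</switch>
"""


def select_switch_child(switch, supported):
    """Return the first child whose system-required test succeeds."""
    for child in switch:
        required = child.get("system-required")
        if required is None or required in supported:
            return child
    return None


switch = ET.fromstring(DOC.strip())
boston_choice = select_switch_child(switch, {SPEC_URI})  # supports the module
smil10_choice = select_switch_child(switch, set())       # SMIL 1.0 only
```

A player that supports the module picks the first "audio" element, while a SMIL 1.0 player falls through to the second one.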
alt, longdesc
If the content of these attributes is read by a screen-reader, the presentation
should be paused while the text is read out, and resumed afterwards.
New Accessibility Attributes
-
readIndex
-
This attribute specifies the position of the current element in the order
in which longdesc and alt text are read out by a screen reader for the current
document. This value must be a number between 0 and 32767. User agents should
ignore leading zeros. The default value is 0.
Elements that contain alt or longdesc attributes are read by a screen reader
according to the following rules:
-
Those elements that assign a positive value to the readIndex attribute are
read out first. Navigation proceeds from the element with the lowest readIndex
value to the element with the highest value. Values need not be sequential,
nor must they begin with any particular value. Elements that have identical
readIndex values should be read out in the order they appear in the character
stream of the document.
-
Those elements that assign it a value of "0" are read out in the order they
appear in the character stream of the document.
-
Elements that are in a "switch" element and have test attributes which evaluate
to "false" are not read out.
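The ordering rules above can be sketched in Python as follows. This is an illustration under stated assumptions: each element is represented by a hypothetical (label, readIndex text, readable) tuple in document order, where "readable" is False for elements excluded by a failed test attribute inside a "switch". The function names are made up for the example.

```python
def parse_read_index(text):
    """Parse a readIndex value; int() already ignores leading zeros."""
    value = int(text)
    if not 0 <= value <= 32767:
        raise ValueError(f"readIndex out of range: {text!r}")
    return value


def read_order(elements):
    """Return element labels in the order a screen reader would use.

    `elements` lists (label, read_index_text, readable) in document order.
    """
    candidates = [
        (label, parse_read_index(index))
        for label, index, readable in elements
        if readable  # elements with failed test attributes are skipped
    ]
    positive = [c for c in candidates if c[1] > 0]
    zero = [c for c in candidates if c[1] == 0]
    # list.sort() is stable, so equal values keep their document order.
    positive.sort(key=lambda c: c[1])
    return [label for label, _ in positive + zero]


order = read_order([
    ("a", "0", True),
    ("b", "2", True),
    ("c", "1", True),
    ("d", "02", True),
    ("e", "1", False),
    ("f", "0", True),
])
```

Here "c" is read first, "b" and "d" share the value 2 and keep their document order, and the zero-valued elements "a" and "f" come last.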
-
To make SMIL 1.0 media object elements XLink-conformant, the attributes
defined in the XLink specification are added as described below.
Note: Due to a limitation in the current XLink draft, only the "src" attribute
is treated as an XLink locator; the "longdesc" attribute is treated as a non-XLink
linking mechanism (as allowed in Section 8 of the XLink draft). See Appendix
for an XLink-conformant equivalent of SMIL 1.0 elements that contain a "longdesc"
attribute.
-
actuate
-
The value of this attribute is fixed to "auto", i.e. the link is followed
automatically.
-
behavior
-
This attribute does not apply to simple-link media object elements.
-
content-role
-
This attribute does not apply, since media object elements are not inline
links.
-
content-title
-
This attribute does not apply, since media object elements are not inline
links.
-
inline
-
Defined in the XLink specification.
The value of this attribute is fixed to "false". SMIL media object elements
are out-of-line links, since they do not have any content, and thus do not
have a local resource as defined by XLink.
@@ since this is also a "simple link", this seems to be a "one-ended" link
as described in Section 4.2 of the XLink draft (description there is not
very clear)
-
role
-
@@ could be used to describe the role of the remote resource, i.e. the value
of the "src" attribute. Can't think of a use case, so don't think this is
needed
-
show
-
This attribute is defined in the XLink specification. Its value is fixed
to "embed". The media object behaves in the same way as SMIL 1.0 media objects,
i.e. the media object is inserted into the presentation.
-
src
-
Equivalent to the SMIL 1.0 "src" attribute. Remapped via XLink attribute
remapping onto the XLink "href" attribute.
Note: Attribute remapping is costly when the document does not contain a
DTD definition, because in this case, FIXED attributes need to be included
explicitly. This means the author has to use the following syntax to be XLink
conformant:
<smil>
<body>
<audio src="audio.wav" xml:attributes="href src" />
</body>
</smil>
-
title
-
Equivalent to SMIL 1.0 "title" attribute.
-
xml:link (required)
-
This attribute is required for an element to be an XLink element. For simple
media object elements, its value is fixed to "simple".
@@ same disadvantage for fixed attributes when DTD is missing as with "src"
attribute.
When using SMIL in conjunction with the Real Time Transport Protocol (RTP,
[RFC1889]), which is designed for real-time
delivery of media streams, a media client is required to have initialization
parameters in order to interpret the RTP data. These are typically described
in the Session Description Protocol (SDP,
[RFC2327]). This can be delivered in the
DESCRIBE portion of the Real Time Streaming Protocol (RTSP,
[RFC2326]), or can be delivered as a file via HTTP.
Since SMIL provides a media description language which often references SDP
via RTSP and can also reference SDP files via HTTP, a very useful optimization
can be realized by merging parameters typically delivered via SDP into the
SMIL document. Since retrieving a SMIL document constitutes one round trip,
and retrieving the SDP descriptions referenced in the SMIL document constitutes
another round trip, merging the media description into the SMIL document
itself can save a round trip in a typical media exchange. This round-trip
savings can result in a noticeably faster start-up over a slow network link.
This applies particularly well to two primary usage scenarios:
-
Pure multicast implementations. This is the traditional IETF model, where the
SDP is sent via some other transport protocol such as SAP or HTTP, or via email.
-
RTSP delivery. In this case, the primary value of the SDP description is
in the description of media headers delivered in the RTSP DESCRIBE phase,
and not in the transport specification. The transport information (such as
port number negotiation and multicast addresses) is handled in RTSP separately
in the SETUP phase.
(see also "The rtpmap element" below)
SDP-related Attributes
-
port
-
This provides the RTP/RTCP port for a media object transferred via multicast.
It is specified as a range, e.g., port="3456-3457" (this is different from
"port" in SDP, where the second port is derived by an algorithm). Note: For
transports based on UDP in IPv4, the value should be in the range 1024 to
65535 inclusive. For RTP compliance it should start with an even number.
For applications where hierarchically encoded streams are being sent to a
unicast address, it may be necessary to specify multiple port pairs.
Thus, the range may contain more than two ports. This
attribute is only interpreted if the media object is transferred via RTP
and without using RTSP.
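The constraints on the "port" value can be sketched in Python as follows. Since the text uses "should" for the numeric range and the even start, the sketch reports those as warnings instead of rejecting the value. Accepting a single port without a range is an assumption made here, and the function names are hypothetical.

```python
def parse_port_range(text):
    """Parse a value such as "3456-3457" into (first, last)."""
    first_text, sep, last_text = text.partition("-")
    first = int(first_text)
    # Assumption: a lone port number is treated as a one-port range.
    last = int(last_text) if sep else first
    if last < first:
        raise ValueError(f"descending port range: {text!r}")
    return first, last


def port_warnings(first, last):
    """List deviations from the recommendations for UDP/IPv4 and RTP."""
    warnings = []
    if first < 1024 or last > 65535:
        warnings.append("ports should be in the range 1024 to 65535")
    if first % 2 != 0:
        warnings.append("the range should start with an even port")
    return warnings


first, last = parse_port_range("3456-3457")
problems = port_warnings(first, last)
```

A range covering more than two ports, such as "3456-3459", parses the same way and would correspond to multiple port pairs.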
-
rtpformat
-
This field has the same semantics as the "fmt list" sub-field in an SDP media
description. It contains a list of media format payload IDs. For audio and
video, these will normally be a media payload type as defined in the RTP
Audio/Video Profile [RFC1890]. When a list of payload formats is given,
this implies that all of these formats may be used in the session, but the
first of these formats is the default format for the session. For media payload
types not explicitly defined as static types, the rtpmap element (defined
below) may be used to provide a dynamic binding of media encoding to RTP
payload type. The encoding names in the RTP AV Profile do not specify a complete
set of parameters for decoding the audio encodings (in terms of clock rate
and number of audio channels), and so they are not used directly in this
field. Instead, the payload type number should be used to specify the format
for static payload types and the payload type number along with additional
encoding information should be used for dynamically allocated payload types.
This attribute is only interpreted if the media object is transferred via
RTP.
-
transport
-
This attribute has the same syntax and semantics as the "transport" sub-field
in an SDP media description. It defines the transport protocol that is used
to deliver the media streams. The standard value for this field is "RTP/AVP",
but alternate values may be defined by IANA. RTP/AVP is the IETF's Realtime
Transport Protocol using the Audio/Video profile carried over UDP. The complete
definition of RTP/AVP can be found in [RFC1890]. This attribute only
applies if the media object is transferred via RTP.
@@ this may be better to derive from the "src" parameter, which could optionally
be rtp://___. This would mean that an RTP URL format would need to
be defined.
Example
<audio src="rtsp://www.w3.org/test.rtp" port="49170-49171"
transport="RTP/AVP" rtpformat="96,97,98" />
Element Content
Media object elements can contain the following elements:
-
anchor
-
Defined in Linking Module
-
par
-
Defined in Timing Module
-
rtpmap
-
Defined below
-
seq
-
Defined in Timing Module
If the media object is transferred using the RTP protocol, and uses a dynamic
payload type, SDP requires the use of the "rtpmap" attribute field. In this
specification, this is mapped onto the "rtpmap" element, which is contained
in the content of the media object element. If the media object is not
transferred using RTP, this element is ignored.
Attributes
-
payload
-
The value of this attribute is a payload format type number listed in the
parent element's "rtpformat" attribute. This is used to map dynamic payload
types onto definitions of specific encoding types and necessary parameters.
-
encoding
-
This attribute encodes parameters needed to decode the dynamic payload type.
The attribute values have the following syntax:
encoding-val    ::= encoding-name "/" clock-rate [ "/" encoding-params ]
encoding-name   ::= name-val
clock-rate      ::= Digit+
encoding-params ::= ??
Legal values for "encoding-name" are payload names defined in [RFC1890], and
RTP payload names registered as MIME types [draft-ietf-avt-rtp-mime-00].
For audio streams, "encoding parameters" may specify the number of audio
channels. This parameter may be omitted if the number of channels is one,
provided no additional parameters are needed. For video streams, no encoding
parameters are currently specified. Additional parameters may be defined
in the future. Codec-specific parameters should not be added here; they should
instead be defined as separate rtpmap attributes.
Element Content
"rtpmap" is an empty element
Example
<audio src="rtsp://www.w3.org/foo.rtp" port="49170"
transport="RTP/AVP" rtpformat="96,97,98">
<rtpmap payload="96" encoding="L8/8000" />
<rtpmap payload="97" encoding="L16/8000" />
<rtpmap payload="98" encoding="L16/11025/2" />
</audio>
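The "encoding" values in the example can be split into their parts with a short Python sketch such as the following. The syntax of the encoding parameters is still open in this draft, so the sketch returns them as an unparsed string, or None when they are omitted. The function name parse_encoding is hypothetical.

```python
def parse_encoding(text):
    """Split an rtpmap "encoding" value into (name, clock_rate, params)."""
    parts = text.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"malformed encoding value: {text!r}")
    name, clock_rate = parts[0], parts[1]
    if not name or not (clock_rate.isascii() and clock_rate.isdigit()):
        raise ValueError(f"malformed encoding value: {text!r}")
    # Encoding parameters are optional and left unparsed (see lead-in).
    params = parts[2] if len(parts) == 3 else None
    return name, int(clock_rate), params


# Payload type numbers and values taken from the example above.
payload_map = {
    96: parse_encoding("L8/8000"),
    97: parse_encoding("L16/8000"),
    98: parse_encoding("L16/11025/2"),
}
```

For audio, a result with params None corresponds to the case where the channel count is one and has been omitted.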
A media object referenced by a media object element is often rendered by
software modules referred to as media players that are separate from the
software module providing the synchronization between different media objects
in a presentation (referred to as synchronization engine).
Media players generally support varying levels of control, depending on the
constraints of the underlying renderer as well as media delivery, streaming
etc. This specification defines four levels of support, allowing for
increasingly tight integration and broader functionality. The details of the interface
will be presented in a separate document.
-
Level 0
-
Must allow the synchronization engine to query for duration, and must support
cue, start and stop on the player. To support reasonable resynchronization,
the media player must provide pause/unpause controls with minimal latency.
This is the minimum level of support defined.
-
Level 1
-
In addition to all Level 0 support, the media player can detect when sync
has been broken, so that a resynchronization event can be fired. A media
player that cannot support Level 1 functionality is responsible for maintaining
proper synchronization in all circumstances, and has no remedy if it cannot
(Level 1 support is recommended).
-
Level 2
-
In addition to all Level 1 support, the media player supports a tick() method
for advancing the timeline in strict sync with the document timeline. This
is generally appropriate to animation renderers that are not tightly bound
to media delivery constraints.
-
Level 3
-
In addition to all Level 2 support, the media player also supports a query
interface to provide information about its time-related capabilities.
Capabilities include things like canRepeat, canPlayBackwards, canPlayVariable,
canHold, etc. This is mostly for future extension of the timing functionality
and for optimization of media playback/rendering.
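The cumulative nature of the four levels can be illustrated with a Python sketch. The actual interface is to be defined in a separate document, so every class and method name below is hypothetical, apart from tick() and the capability names mentioned in the text. The sketch shows only that each level extends the previous one.

```python
from abc import ABC, abstractmethod


class Level0Player(ABC):
    """Minimum support: duration query, cue/start/stop, pause/unpause."""

    @abstractmethod
    def duration(self): ...

    @abstractmethod
    def cue(self): ...

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def pause(self): ...

    @abstractmethod
    def unpause(self): ...


class Level1Player(Level0Player):
    @abstractmethod
    def sync_broken(self):
        """Return True if the player has detected that sync was broken."""


class Level2Player(Level1Player):
    @abstractmethod
    def tick(self, document_time):
        """Advance the media timeline in step with the document timeline."""


class Level3Player(Level2Player):
    @abstractmethod
    def capabilities(self):
        """Return time-related capabilities, e.g. {"canRepeat": True}."""


LEVELS = (Level3Player, Level2Player, Level1Player, Level0Player)


def support_level(player):
    """Return the highest level a player object implements, or None."""
    for level, cls in zip((3, 2, 1, 0), LEVELS):
        if isinstance(player, cls):
            return level
    return None
```

A synchronization engine could use a check like support_level() to decide whether it can rely on resynchronization events or tick-driven playback for a given player.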
References
-
[draft-ietf-avt-rtp-mime-00]
-
"MIME Type Registration of RTP Payload Formats", Steve Casner and Philipp
Hoschka, June 1999.
-
Available at
ftp://ftpeng.cisco.com/casner/outgoing/draft-ietf-avt-rtp-mime-00.txt.
-
[RFC1889]
-
"RTP: A Transport Protocol for Real-Time Applications", Henning Schulzrinne,
Steve Casner, Ron Frederick and Van Jacobson, January 1996. Available at
ftp://ftp.isi.edu/in-notes/rfc1889.txt.
-
[RFC1890]
-
"RTP Profile for Audio and Video Conferences with Minimal Control", Henning
Schulzrinne, January 1996.
Available at
ftp://ftp.isi.edu/in-notes/rfc1890.txt.
-
[RFC2326]
-
"Real Time Streaming Protocol (RTSP)", Henning Schulzrinne, Anup Rao and
Rob Lanphier, April 1998. Available at
ftp://ftp.isi.edu/in-notes/rfc2326.txt.
-
[RFC2327]
-
"SDP: Session Description Protocol", M. Handley, V. Jacobson, April 1998.
Available at
ftp://ftp.isi.edu/in-notes/rfc2327.txt.
-
[SMIL]
-
"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification",
Philipp Hoschka, editor, 15 June 1998. Available at
http://www.w3.org/TR/REC-smil.
-
[XLINK]
-
"XML Linking Language (XLink) V1.0", Eve Maler and Steve DeRose, editors,
3 March 1998. Available at
http://www.w3.org/TR/WD-xlink.
-
[XML]
-
"Extensible Markup Language (XML) 1.0", Tim Bray, Jean Paoli and C. M.
Sperberg-McQueen, editors, 10 February 1998. Available at
http://www.w3.org/TR/REC-xml.