
4. SMIL Content Control

Jeffrey Ayars, RealNetworks
Dick Bulterman, Oratrix


4.1 Introduction

This section defines the SMIL content control module. This module contains elements and attributes that provide for runtime content choices and optimized content delivery. Because these elements and attributes are defined in a module, designers of other markup languages can reuse the functionality of the SMIL content control module when they need to include media content control in their language. Conversely, language designers incorporating other SMIL modules do not need to include the content control module if other content control functionality is already present.

Proposed Extensions to SMIL 1.0 content control functionality include:

4.2 Content Selection

SMIL 1.0 provides a "test-attribute" mechanism to process an element only when certain conditions are true, e.g. when the client has a certain screen size. SMIL 1.0 also provides the "switch" element for expressing that a set of document parts are alternatives, and that the first one fulfilling certain conditions should be chosen. This is useful to express that different language versions of an audio file are available, and to have the client select one of them. SMIL Boston includes these features and extends them with new system test-attributes, as well as with the ability to customize a presentation for an individual viewer through author-defined, user-selected test-attributes.

4.2.1 The <switch> Element

The switch element allows an author to specify a set of alternative elements from which only one acceptable element should be chosen. In SMIL Boston, an element is acceptable if the element is a SMIL Boston element, the media-type can be decoded (if the element declares media), and all of the test-attributes of the element evaluate to "true". When integrating content control into other languages, the language designer must specify what constitutes an "acceptable element."

An element is selected as follows: the player evaluates the elements in the order in which they occur in the switch element. The first acceptable element is selected to the exclusion of all other elements within the switch.

Thus, authors should order the alternatives from the most desirable to the least desirable. Furthermore, authors should place a relatively fail-safe alternative as the last item in the <switch> so that at least one item within the switch is chosen (unless this is explicitly not desired). Implementations should NOT arbitrarily pick an object within a <switch> when test-attributes for all child elements fail.
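The selection rule above can be sketched in Python. This is a minimal, non-normative sketch: the Element class, the KNOWN_ELEMENTS set (an illustrative subset of SMIL element names, not a complete list) and the modelling of test attributes as predicates over their string values are hypothetical, not part of any player's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical: an illustrative subset of element names the player understands.
KNOWN_ELEMENTS = {"par", "seq", "switch", "audio", "video", "img", "text", "textstream"}

# Hypothetical: each test attribute maps to a predicate over its string value.
TestEvaluators = Dict[str, Callable[[str], bool]]


@dataclass
class Element:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    decodable: bool = True  # whether the media type (if any) can be decoded


def is_acceptable(elem: Element, tests: TestEvaluators) -> bool:
    """Known element, decodable media, and every test attribute true."""
    if elem.name not in KNOWN_ELEMENTS or not elem.decodable:
        return False
    for attr, value in elem.attrs.items():
        test = tests.get(attr)
        if test is not None and not test(value):
            return False
    return True  # non-test attributes such as src are ignored here


def select_from_switch(children: List[Element], tests: TestEvaluators) -> Optional[Element]:
    """Return the first acceptable child in document order, or None if all fail."""
    for child in children:
        if is_acceptable(child, tests):
            return child
    return None  # never fall back to an arbitrary child
```

For instance, a player that measures 14400 bit/s could pass `{"systemBitrate": lambda v: 14400 >= int(v)}`; an alternative marked 16000 is then rejected and the next one, marked 8000, is selected.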

Note that some network protocols, e.g. HTTP and RTSP, support content-negotiation, which may be an alternative to using the "switch" element in some cases.


The switch element can have the following attributes:

An XML identifier
This attribute offers advisory information about the element for which it is set. Values of the title attribute may be rendered by user agents in a variety of ways. For instance, visual browsers frequently display the title as a "tool tip" (a short message that appears when the pointing device pauses over an object).

4.2.2 Predefined Test Attributes

This specification defines a list of test attributes that can be added to language elements, as allowed by the language designer. In SMIL 1.0, these elements are synchronization and media elements. Conceptually, these attributes represent Boolean tests. When one of the test attributes specified for an element evaluates to "false", the element carrying this attribute is ignored.

Within the list below, the concept of "user preference" may show up. User preferences are usually set by the playback engine using a preferences dialog box, but this specification does not place any restrictions on how such preferences are communicated from the user to the SMIL player.

This version of SMIL defines the following test attributes. Note that some hyphenated test attribute names from SMIL 1.0 have been deprecated in favor of names using the current SMIL camelCase convention. For these, the deprecated SMIL 1.0 name is shown in parentheses after the preferred name.

systemBitrate (system-bitrate)
This attribute specifies the approximate bandwidth, in bits-per-second, available to the system. The measurement of bandwidth is application specific, meaning that applications may use sophisticated measurement of end-to-end connectivity, or a simple static setting controlled by the user. In the latter case, this could for instance be used to make a choice based on the user's connection to the network. Typical values for modem users would be 14400, 28800, 56000 bit/s, etc. Evaluates to "true" if the available system bitrate is equal to or greater than the given value. Evaluates to "false" if the available system bitrate is less than the given value.
The attribute can assume any integer value greater than 0. If the value exceeds an implementation-defined maximum bandwidth value, the attribute always evaluates to "false".
systemCaptions (system-captions)
This attribute allows authors to distinguish between a redundant text equivalent of the audio portion of the presentation (intended for audiences such as those with hearing disabilities or those learning to read who want or need this information) and text intended for a wide audience. The attribute has the value "on" if the user has indicated a desire to see closed-captioning information, and it has the value "off" if the user has indicated that they don't wish to see such information. Evaluates to "true" if the value is "on", and evaluates to "false" if the value is "off".
systemLanguage (system-language)
The attribute value is a comma-separated list of language names as defined in [RFC1766].

Evaluates to "true" if one of the languages indicated by user preferences exactly equals one of the languages given in the value of this parameter, or if one of the languages indicated by user preferences exactly equals a prefix of one of the languages given in the value of this parameter such that the first tag character following the prefix is "-".

Evaluates to "false" otherwise.

Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix.

The prefix rule simply allows the use of prefix tags if this is the case.

Implementation note: When making the choice of linguistic preference available to the user, implementers should take into account the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users may assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. The user interface for setting user preferences should guide the user to add "en" to get the best matching behavior.

Multiple languages MAY be listed for content that is intended for multiple audiences. For example, a rendition of the "Treaty of Waitangi", presented simultaneously in the original Maori and English versions, would call for:

<audio src="foo.rm" systemLanguage="mi, en"/>
However, just because multiple languages are present within the object on which the systemLanguage test attribute is placed, this does not mean that it is intended for multiple linguistic audiences. An example would be a beginner's language primer, such as "A First Lesson in Latin," which is clearly intended to be used by an English-literate audience. In this case, the systemLanguage test attribute should only include "en".

Authoring note: Authors should realize that if several alternative language objects are enclosed in a "switch", and none of them matches, this may lead to situations such as a video being shown without any audio track. It is thus recommended to include a "catch-all" choice at the end of such a switch which is acceptable in all cases.

systemOverdubOrCaption (system-overdub-or-caption)
This attribute is a setting which determines if users prefer overdubbing or captioning when the option is available. The attribute can have the values "caption" and "overdub". Evaluates to "true" if the user preference matches this attribute value. Evaluates to "false" if they do not match. This test attribute has been deprecated in favor of using systemOverdubOrSubtitle and systemCaptions.
systemRequired (system-required)
This attribute specifies the name of an extension. The extension may be a newly adopted language element or attribute, or may be the namespace prefix or URI for a namespace extension. Evaluates to "true" if the extension is supported by the implementation, otherwise, this evaluates to "false". [NAMESPACES]
systemScreenSize (system-screen-size)
Attribute values have the following syntax:
screen-size-val ::= screen-height"X"screen-width
Each of these is a pixel value, and must be an integer value greater than 0. Evaluates to "true" if the SMIL playback engine is capable of displaying a presentation of the given size. Evaluates to "false" if the SMIL playback engine is only capable of displaying smaller presentations.
systemScreenDepth (system-screen-depth)
This attribute specifies the depth of the screen color palette in bits required for displaying the element. The value must be greater than 0. Typical values are 1, 8, 24, 32, etc. Evaluates to "true" if the SMIL playback engine is capable of displaying images or video with the given color depth. Evaluates to "false" if the SMIL playback engine is only capable of displaying images or video with a smaller color depth.
This attribute specifies whether subtitles or overdub is rendered for people who are watching a presentation where the audio may be in a language in which they are not fluent. This attribute can have two values: "overdub", which selects for substitution of one voice track for another, and "subtitle", which means that the user prefers the display of subtitles.
This test attribute specifies whether or not closed audio descriptions should be rendered. This is intended to provide authors with the ability to support audio descriptions for blind users like systemCaptions provides text captions for deaf users. The attribute has the value "on" if the user has indicated a desire to hear audio descriptions, and it has the value "off" if the user has indicated that they don't wish to hear audio descriptions. Evaluates to "true" if the value is "on", and evaluates to "false" if the value is "off".
TBD (i.e. Streaming/Stored)
TBD (i.e. Selecting embedded information (element in aggregate))
TBD (i.e. Costs of accessing a stream, free or Pay-Per-View)
CDATA that describes a component of the playback system, e.g. user-agent component/feature, number of audio channels, codec, HW mpeg decoder, etc.
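The numeric and on/off test attributes above can be sketched as small evaluation functions. This is a hypothetical sketch: PlayerEnvironment stands in for whatever the player knows about its link, screen and user preferences; systemCaptions is read here as "true when the attribute value matches the user's caption preference"; and "capable of displaying a presentation of the given size" is read as both dimensions fitting. Following the systemScreenSize grammar, the height comes before the "X" and the width after it.

```python
from dataclasses import dataclass


@dataclass
class PlayerEnvironment:
    """Hypothetical snapshot of what the player knows when evaluating tests."""
    bitrate: int        # available bandwidth, bits per second
    max_bitrate: int    # implementation-defined maximum bandwidth value
    captions: bool      # user preference: wants closed captions
    screen_height: int  # pixels
    screen_width: int   # pixels
    screen_depth: int   # color depth in bits


def _positive_int(value: str, name: str) -> int:
    number = int(value)
    if number <= 0:
        raise ValueError(f"{name} must be an integer greater than 0")
    return number


def eval_system_bitrate(value: str, env: PlayerEnvironment) -> bool:
    required = _positive_int(value, "systemBitrate")
    if required > env.max_bitrate:
        return False                        # above the implementation maximum
    return env.bitrate >= required


def eval_system_captions(value: str, env: PlayerEnvironment) -> bool:
    if value not in ("on", "off"):
        raise ValueError('systemCaptions must be "on" or "off"')
    return (value == "on") == env.captions  # read as: value matches the preference


def eval_system_screen_size(value: str, env: PlayerEnvironment) -> bool:
    height_text, sep, width_text = value.partition("X")  # screen-height "X" screen-width
    if not sep:
        raise ValueError('systemScreenSize must look like HEIGHT"X"WIDTH')
    height = _positive_int(height_text, "screen height")
    width = _positive_int(width_text, "screen width")
    return env.screen_height >= height and env.screen_width >= width


def eval_system_screen_depth(value: str, env: PlayerEnvironment) -> bool:
    return env.screen_depth >= _positive_int(value, "systemScreenDepth")
```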


1) Choosing between content with different total bitrates

In a common scenario, implementations may wish to allow for selection via a systemBitrate attribute on elements. The media player evaluates each of the "choices" (elements within the switch) one at a time, looking for an acceptable bitrate given the known characteristics of the link between the media player and media server.

  <par>
    <text .../>
    <switch>
      <par systemBitrate="40000"> ... </par>
      <par systemBitrate="24000"> ... </par>
      <par systemBitrate="10000"> ... </par>
    </switch>
  </par>

2) Choosing between audio resources with different bitrates

The elements within the switch may be any combination of elements. For instance, the alternatives could simply be different audio tracks:

  <switch>
    <audio src="joe-audio-better-quality" systemBitrate="16000" />
    <audio src="joe-audio" systemBitrate="8000" />
  </switch>

3) Choosing between audio resources in different languages

In the following example, an audio resource is available both in French and in English. Based on the user's preferred language, the player can choose one of these audio resources.

  <switch>
    <audio src="joe-audio-french" systemLanguage="fr"/>
    <audio src="joe-audio-english" systemLanguage="en"/>
  </switch>
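The systemLanguage rule (an exact match, or a user preference that is a prefix of an attribute tag and is followed by "-") can be sketched and applied to the switch above. This is a minimal sketch: the function names are hypothetical, and tags are compared case-insensitively on the assumption that RFC 1766 language tags are case-insensitive.

```python
from typing import List, Optional


def system_language_matches(user_prefs: List[str], attr_value: str) -> bool:
    """True if a preferred language equals one of the attribute's tags, or
    equals a prefix of one such that the next character is "-"."""
    tags = [t.strip().lower() for t in attr_value.split(",") if t.strip()]
    for pref in (p.strip().lower() for p in user_prefs):
        for tag in tags:
            if tag == pref or tag.startswith(pref + "-"):
                return True
    return False


# The alternatives of the switch above, in document order.
SWITCH = [("joe-audio-french", "fr"), ("joe-audio-english", "en")]


def choose_audio(user_prefs: List[str]) -> Optional[str]:
    """Document order decides, not the order of the user's preferences."""
    for src, lang in SWITCH:
        if system_language_matches(user_prefs, lang):
            return src
    return None
```

Note that a user who lists only "en-gb" is not matched by systemLanguage="en", which is the situation the implementation note on systemLanguage warns about.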

4) Choosing between content written for different screens

In the following example, the presentation contains alternative parts designed for screens with different resolutions and bit-depths. Depending on the particular characteristics of the screen, the player can choose one of the alternatives.

  <par>
    <text .../>
    <switch>
      <par systemScreenSize="1280X1024" systemScreenDepth="16"> ... </par>
      <par systemScreenSize="640X480" systemScreenDepth="32"> ... </par>
      <par systemScreenSize="640X480" systemScreenDepth="16"> ... </par>
    </switch>
  </par>

5) Distinguishing caption tracks from stock tickers

In the following example, the closed captions are shown only if the user has turned captions on; the stock ticker is always shown.

  <par>
    <audio      src="audio.rm"/>
    <video      src="video.rm"/>
    <textstream src="stockticker.rtx"/>
    <textstream src="closed-caps.rtx" systemCaptions="on"/>
  </par>

6) Choosing the language of overdub and subtitle tracks

In the following example, a French-language movie is available with English, German, and Dutch overdub and subtitle tracks. The following SMIL segment expresses this, and switches on the alternatives that the user prefers.

  <par>
    <switch>
      <audio src="movie-aud-en.rm" systemLanguage="en"/>
      <audio src="movie-aud-de.rm" systemLanguage="de"/>
      <audio src="movie-aud-nl.rm" systemLanguage="nl"/>
      <!-- French for everyone else -->
      <audio src="movie-aud-fr.rm"/>
    </switch>
    <video src="movie-vid.rm"/>
    <switch>
      <textstream src="movie-sub-en.rt" systemLanguage="en"/>
      <textstream src="movie-sub-de.rt" systemLanguage="de"/>
      <textstream src="movie-sub-nl.rt" systemLanguage="nl"/>
      <!-- French captions for those that really want them -->
      <textstream src="movie-caps-fr.rt" systemCaptions="on"/>
    </switch>
  </par>

4.2.3 System Test Attribute In-Line Use

During the development of SMIL 1.0, the issue of content selectability within a presentation received a great deal of attention. Early on, it was decided that a <switch> construct would form the basic selection primitive in the language. A <switch> allows a series of alternatives to be specified for a particular piece of content, one of which is selected by the runtime environment for presentation. An example of how a <switch> might be used to control the alternatives that could accompany a piece of video in a presentation would be:

  <par>
    <video src="anchor.mpg" ... />
    <switch>
      <audio src="dutch.aiff"   systemLanguage="DU" systemCaptions="overdub" ... />
      <audio src="english.aiff" systemLanguage="EN" systemCaptions="overdub" ... />
      <text  src="dutch.html"   systemLanguage="DU" systemCaptions="captions" ... />
      <text  src="english.html" systemLanguage="EN" systemCaptions="captions" ... />
    </switch>
  </par>

This fragment (which is pseudo-SMIL for clarity) says that a video is played in parallel with one of: Dutch audio, English audio, Dutch text, or English text. SMIL does not specify the selection mechanism, only a way of specifying the alternatives. While <switch>-based content control is a powerful mechanism, it comes with two problems. 

First, it restricts the resolution of a <switch> to a single alternative. (If you want Dutch audio and Dutch text, you need to specify a compound <switch> statement, but in so doing, you always get the compound result.) 

Second, and more restrictively, it requires the author to explicitly state all of the possible combinations of input streams during authoring. If the user wanted Dutch audio and English text, this possibility must have been considered at authoring time. 

A solution to both problems is to allow in-line use of System Test Attributes, as given in the following document fragment:

  <par>
    <video src="anchor.mpg" ... />
    <audio src="dutch.aiff"   systemLanguage="DU" systemCaptions="overdub" ... />
    <audio src="english.aiff" systemLanguage="EN" systemCaptions="overdub" ... />
    <text  src="dutch.html"   systemLanguage="DU" systemCaptions="captions" ... />
    <text  src="english.html" systemLanguage="EN" systemCaptions="captions" ... />
  </par>

This example says: a video is accompanied by four other data objects, all of which are (logically) shown in parallel. This is, of course, exactly what happens: all five do run in parallel, but it could be that only the video and one audio stream are actually selected by the user (or a user agent) to be rendered during the presentation. At authoring time you know which logical streams are available, but only at runtime do you know which combination of the potentially available streams actually meets the user's needs. Logically, the alternatives indicated by the in-line construct could be represented as a set of <switch> statements, although the resulting <switch> could become explosive in size. Use of an in-line test mechanism significantly simplifies the specification of adaptive content when many independent alternatives exist.
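The difference in scale can be made concrete with a short sketch. It is hypothetical: the user's selection is modelled as a plain set of src names rather than real test-attribute evaluation. With in-line tests, each of the n optional streams is tested once; a single <switch> offering every combination would need one <par> alternative per subset, 2**n in all.

```python
from itertools import combinations
from typing import List, Set

# The four optional streams of the in-line fragment above; the video has no tests.
VIDEO = "anchor.mpg"
OPTIONAL = ["dutch.aiff", "english.aiff", "dutch.html", "english.html"]


def inline_render(selected: Set[str]) -> List[str]:
    """In-line tests: each optional child is tested on its own, so any
    combination can be rendered without having been listed by the author."""
    return [VIDEO] + [src for src in OPTIONAL if src in selected]


def switch_alternatives(streams: List[str]) -> List[List[str]]:
    """The <par> alternatives one <switch> would need to offer every combination."""
    alternatives = []
    for size in range(len(streams), -1, -1):  # most complete combination first
        for combo in combinations(streams, size):
            alternatives.append([VIDEO, *combo])
    return alternatives
```

Adding two more optional streams raises the switch from 16 to 64 alternatives, while the in-line version only gains two children.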

4.2.4 User Groups

The provision of <switch>-based and in-line system test attributes provides a selection mechanism based on general system attributes. This version of SMIL extends this notion with the definition of user test attributes. User test attributes allow presentation authors to define their own test attributes for use in a specific document. 

The elements used to provide user group functionality are: 

The <user_attributes> element

A section within the SMIL head that contains definitions of each of the user groups. The elements within the section define a collection of author-specified test attributes that can be used in the document. 

The <u_group> element 

An author-defined grouping of related media objects. Groups are defined within the <user_attributes> section of the document head, and they are referenced from media object definitions. 

The <u_group> element supports the following attributes: 

In addition to the <user_attributes> and <u_group> elements, this module provides a u_group attribute that can be applied to content requiring selection.

The u_group attribute

The u_group attribute is evaluated as a test attribute: if it evaluates to "true", the associated element is evaluated; otherwise, the element and its content are skipped.

The following example shows how user groups can be applied within a SMIL document:

  1 <smil>
  2    <head>
  3       <layout>
  4          <!-- define projection regions -->
  5       </layout>
  6       <user_attributes>
  7          <u_group id="nl_aud" u_state="RENDERED" title="Dutch Audio Cap" override="allowed" />
  8          <u_group id="uk_aud" u_state="NOT_RENDERED" title="English Audio Cap" override="allowed" />
  9          <u_group id="nl_txt" u_state="NOT_RENDERED" title="Dutch Text Cap" override="allowed" />
 10          <u_group id="uk_txt" u_state="NOT_RENDERED" title="English Text Cap" override="allowed" />
 11       </user_attributes>
 12    </head>
 13    <body>
 14       ...
 15       <par>
 16          <video src="announcer.rm" region="a"/>
 17          <text src="news_headline.html" region="b"/>
 18          <audio src="story_1_nl.rm" u_group="nl_aud"/>
 19          <audio src="story_1_uk.rm" u_group="uk_aud"/>
 20          <text src="story_1_nl.html" u_group="nl_txt" region="c"/>
 21          <text src="story_1_uk.html" u_group="uk_txt" region="d"/>
 22       </par>
 23       ...
 24    </body>
 25 </smil>

Lines 6 through 11 define the available groups. Each group contains an identifier and a title (which can be used by the user interface agent to label the group), as well as the (optional) initial state definition and override flag. 

In line 7, a <u_group> named "nl_aud" is defined for Dutch audio captions and is initially set to RENDERED. The other groups in this (very simple) example are initially set to NOT_RENDERED.

In lines 15 through 22, a SMIL <par> construct is used to identify a portion of a presentation. In this <par>, a single video (line 16) is accompanied by two audio streams (lines 18 and 19) and two text streams (lines 20 and 21), one each for English and Dutch. The <par> also contains a text object with a headline (line 17). 

The interaction of the user interface and the initial state determines which objects are rendered. Note that the same attributes are used across the entire document, meaning that the user only needs to select his/her content preferences once to control related groups of information. In the example, the user is free to have the video and headline text accompanied by any combination of English and Dutch captions. (Note that if both audio captions are selected, the player will need to determine how they are processed for delivery.) 
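The evaluation of u_group attributes against the <user_attributes> definitions in the example above can be sketched as follows. This is a hypothetical model, not defined by this draft: the defaults for a missing u_state (RENDERED) and override ("allowed"), the rule that only override="allowed" lets the user change a group's state, and treating an undefined group as "false" are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class UserGroup:
    title: str
    rendered: bool  # RENDERED -> True, NOT_RENDERED -> False
    override: str


def build_groups(defs: Dict[str, Dict[str, str]]) -> Dict[str, UserGroup]:
    """Turn <u_group> attribute maps (keyed by id) into group state."""
    return {
        gid: UserGroup(
            title=attrs.get("title", gid),
            # Assumed defaults; the draft does not state them.
            rendered=attrs.get("u_state", "RENDERED") == "RENDERED",
            override=attrs.get("override", "allowed"),
        )
        for gid, attrs in defs.items()
    }


def set_user_choice(groups: Dict[str, UserGroup], gid: str, rendered: bool) -> None:
    """Apply a user interface choice if the group allows overriding (assumption)."""
    group = groups[gid]
    if group.override != "allowed":
        raise PermissionError(f"user may not change group {gid!r}")
    group.rendered = rendered


def eval_u_group(groups: Dict[str, UserGroup], gid: str) -> bool:
    group = groups.get(gid)
    return group is not None and group.rendered  # undefined group -> false (assumption)


def rendered_children(children: List[Tuple[str, Optional[str]]],
                      groups: Dict[str, UserGroup]) -> List[str]:
    """Children are (src, u_group or None); elements without u_group always play."""
    return [src for src, gid in children if gid is None or eval_u_group(groups, gid)]
```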

While this example shows in-line use of user groups, the groups could also be applied as test attributes in a <switch>. Similarly, the system test attributes typically found in a <switch> could also be used in-line as a control attribute on an element along with the u_group attribute.

A previous version of this specification used camelCase for the user group elements and attributes instead of the underscore convention used here. We need to standardize this across the SMIL modules.

4.3 Presentation Priority/Grouping

The following is still under development by the SYMM Working Group. The working group is interested in considering this functionality but the syntax and semantics described here are only preliminary thinking.

Define a means to group collections of objects that share a common policy. A Channel defines a partitioning of elements into groups, where each group has a common set of access policies that control the use of quasi-physical resources:

- priority
- common server
- common access rights / charging model
- local resource use (layout, devices, etc.)

4.4 User-Centered Adaptation

The following is still under development by the SYMM Working Group. The working group is interested in considering this functionality but the syntax and semantics described here are only preliminary thinking.

Focus on the presentation as a collection of content: each of the components may have a different user-level representation, encoding:

At author-time, you know alternatives; at use-time, you select

4.5 Presentation Optimization

4.5.1 The <prefetch> element

This element gives a suggestion or hint to a user-agent that a media resource will be used in the future and that the author would like part or all of the resource fetched ahead of time to make document playback smoother. User-agents can ignore <prefetch> elements, though doing so may cause an interruption in the document playback when the resource is needed. It gives authoring tools or savvy authors the ability to schedule retrieval of resources when they think that there is available bandwidth or time to do it. A <prefetch> element is contained within the body of an XML document, and its scheduling is based on its lexical order unless explicit timing is present.

The <prefetch> element, like media object elements, can have id and src attributes. If SMIL Boston Timing is integrated into the document, the begin, end, dur, clipBegin, and clipEnd attributes are also available. The id and src attributes are the same as for other media objects: id names the element for reference in the document, and src names the resource to be prefetched. When a media object with the same src URL is encountered, the user-agent can use any data it prefetched to begin playback without rebuffering or other interruption. The timing attributes begin, end, and dur constrain the presentation time period for prefetching the element. At the end of the presentation time specified by end or dur, the prefetch operation should stop. The clipBegin and clipEnd attributes identify the part of the src clip to prefetch: if only the last 30 seconds of the clip are being played, we don't want to prefetch it from the beginning. Likewise, if only the middle 30 seconds of the clip are being played, we don't want to prefetch more data than will be played.

The mediaSize, mediaTime, and bandwidth Attributes

In addition to the attributes allowed on Media Object Elements, the following attributes are allowed:

mediaSize : bytes-value | percent-value
Defines how much of the resource to fetch as a function of the file size of the resource. To fetch the entire resource without knowing its size, specify 100%. The default is 100%.
mediaTime : clock-value | percent-value
Defines how much of the resource to fetch as a function of the duration of the resource. To fetch the entire resource without knowing its duration, specify 100%. The default is 100%.
bandwidth : bitrate-value | percent-value
Defines how much network bandwidth the user-agent should use when doing the prefetch. To use all that is available, specify 100%. The default is 100%.

If both mediaSize and mediaTime are specified, mediaSize is used and mediaTime is ignored.

For discrete media (non-time-based media like text/html or image/png), using the mediaTime attribute causes the entire resource to be fetched.
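The precedence rules above can be summarized in a small planning function. This is a sketch under stated assumptions: prefetch_plan and parse_percent are hypothetical names; mediaTime is simplified to either a percent-value or a plain number of seconds (full clock-value parsing is omitted); and when a percentage below 100% is given but the size or duration is unknown, the sketch falls back to fetching the whole resource, which this draft does not specify.

```python
from typing import Optional, Tuple, Union

Plan = Tuple[str, Optional[float]]  # ("all", None), ("bytes", n) or ("seconds", t)


def parse_percent(value: str) -> Optional[float]:
    """Fraction for a percent-value such as "50%"; None if value is not a percentage."""
    if not value.endswith("%"):
        return None
    pct = int(value[:-1])
    if not 0 <= pct <= 100:
        raise ValueError("percent-value must be in the range 0 to 100")
    return pct / 100


def prefetch_plan(media_size: Optional[str] = None,
                  media_time: Optional[Union[str, float]] = None, *,
                  file_size: Optional[int] = None,
                  duration: Optional[float] = None,
                  discrete: bool = False) -> Plan:
    """Decide how much of a resource a <prefetch> should fetch."""
    if media_size is None and media_time is None:
        return ("all", None)                    # both default to 100%
    if media_size is not None:                  # mediaSize wins; mediaTime ignored
        fraction = parse_percent(media_size)
        if fraction is None:
            return ("bytes", int(media_size))   # bytes-value
        if fraction == 1.0 or file_size is None:
            return ("all", None)                # unknown size: fetch all (assumption)
        return ("bytes", int(fraction * file_size))
    if discrete:
        return ("all", None)                    # mediaTime on discrete media: fetch all
    fraction = parse_percent(media_time) if isinstance(media_time, str) else None
    if fraction is None:
        return ("seconds", float(media_time))   # simplified: plain seconds only
    if fraction == 1.0 or duration is None:
        return ("all", None)                    # unknown duration: fetch all (assumption)
    return ("seconds", fraction * duration)
```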

Documents must still play back even when the prefetch elements are ignored, although rebuffering or pauses in presentation of the document may occur.

If a prefetch element is repeated, due to restart or repeat on a parent element, the prefetch operation should occur again. This ensures that appropriately "fresh" data is displayed if, for example, the prefetch is for a banner ad at a URL whose content changes with each request. Note that prefetching data from a URL that changes its content dynamically is dangerous if the entire resource isn't prefetched, as the subsequent request for the remaining data may yield data from a newer resource. A user-agent should respect any appropriate caching directives applied to the content, e.g. no-cache (RFC 822-style) headers in HTTP. More specifically, content marked as non-cachable would have to be refetched each time it was played, whereas content that is cachable could be prefetched once, with the results of the prefetch cached for future use.
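One possible reading of the caching rules above, sketched in Python: a repeated prefetch fetches non-cachable content afresh, while cachable content is fetched once and reused. PrefetchStore and its fetch callback are hypothetical; a real user-agent would derive cachability from the protocol's caching directives.

```python
from typing import Callable, Dict, Tuple

# Hypothetical fetcher: returns (data, cachable), where cachable is False when
# the response carries a caching directive such as HTTP "no-cache".
Fetcher = Callable[[str], Tuple[bytes, bool]]


class PrefetchStore:
    """Hypothetical user-agent store for prefetched resources."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._cached: Dict[str, bytes] = {}    # cachable data, reused freely
        self._one_shot: Dict[str, bytes] = {}  # non-cachable data, used for one play
        self.network_requests = 0

    def _request(self, url: str) -> Tuple[bytes, bool]:
        self.network_requests += 1
        return self._fetch(url)

    def prefetch(self, url: str) -> None:
        """Called each time a <prefetch> element begins, including repeats."""
        if url in self._cached:
            return                             # cachable: the earlier prefetch suffices
        data, cachable = self._request(url)    # non-cachable: fetch fresh data again
        if cachable:
            self._cached[url] = data
        else:
            self._one_shot[url] = data

    def play(self, url: str) -> bytes:
        """Data for a media object whose src matches a prefetched URL."""
        if url in self._cached:
            return self._cached[url]
        if url in self._one_shot:
            return self._one_shot.pop(url)     # consumed by this play
        data, cachable = self._request(url)    # not prefetched: fetch on demand
        if cachable:
            self._cached[url] = data
        return data
```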

If the clipBegin or clipEnd values of the media object differ from those of the prefetch, an implementation can use any prefetched data that applies, but the result may not be optimal.

Attribute value syntax

The bytes-value value has the following syntax:

bytes-value ::= Digit+; any positive number


The percent-value value has the following syntax:

percent-value ::= Digit+ "%"; any positive number in the range 0 to 100


The clock-value value has the following syntax:

Clock-val         ::= ( Hms-val | Smpte-val )
Smpte-val         ::= ( Smpte-type )? Hours ":" Minutes ":" Seconds 
                      ( ":" Frames ( "." Subframes )? )?
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Hms-val           ::= ( "npt=" )? (Full-clock-val | Partial-clock-val 
                      | Timecount-val)
Full-clock-val    ::= Hours ":" Minutes ":" Seconds ("." Fraction)?
Partial-clock-val ::= Minutes ":" Seconds ("." Fraction)?
Timecount-val     ::= Timecount ("." Fraction)? (Metric)?
Metric            ::= "h" | "min" | "s" | "ms"
Hours             ::= DIGIT+; any positive number
Minutes           ::= 2DIGIT; range from 00 to 59
Seconds           ::= 2DIGIT; range from 00 to 59
Frames            ::= 2DIGIT; @@ range?
Subframes         ::= 2DIGIT; @@ range?
Fraction          ::= DIGIT+
Timecount         ::= DIGIT+
DIGIT             ::= [0-9]

For Timecount values, the default metric suffix is "s" (for seconds).


The bitrate-value value specifies a number of bits per second. It has the following syntax:

bitrate-value ::= Digit+; any positive number
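A parser for the Hms-val branch of the clock-value grammar can be sketched with regular expressions. This is a minimal sketch: parse_hms_clock_value is a hypothetical name, results are in seconds, and SMPTE values (Smpte-val) are deliberately not handled, since their frame rates would have to be taken into account.

```python
import re

# Regexes for the three Hms-val forms in the grammar above.
_FULL_CLOCK = re.compile(r"(\d+):([0-5]\d):([0-5]\d)(\.\d+)?")  # Hours:Minutes:Seconds
_PARTIAL_CLOCK = re.compile(r"([0-5]\d):([0-5]\d)(\.\d+)?")     # Minutes:Seconds
_TIMECOUNT = re.compile(r"(\d+)(\.\d+)?(h|min|s|ms)?")           # Timecount + Metric
_METRIC_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def parse_hms_clock_value(text: str) -> float:
    """Return the number of seconds denoted by an Hms-val clock value."""
    value = text.strip()
    if value.startswith("npt="):
        value = value[len("npt="):]
    m = _FULL_CLOCK.fullmatch(value)
    if m:
        hours, minutes, seconds, fraction = m.groups()
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float(fraction or 0)
    m = _PARTIAL_CLOCK.fullmatch(value)
    if m:
        minutes, seconds, fraction = m.groups()
        return int(minutes) * 60 + int(seconds) + float(fraction or 0)
    m = _TIMECOUNT.fullmatch(value)
    if m:
        count, fraction, metric = m.groups()
        # The default metric for Timecount values is "s".
        return (int(count) + float(fraction or 0)) * _METRIC_SECONDS[metric or "s"]
    raise ValueError(f"not an Hms clock value: {text!r}")
```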


1) Prefetch the image so it can be displayed immediately after the video ends:

  <seq>
    <par>
      <prefetch id="endimage" src=""/>
      <text id="interlude" src="" fill="freeze"/>
    </par>
    <video id="main-event" src="rtsp://"/>
    <img src="" fill="freeze"/>
  </seq>

No timing is specified in the above example, so default timing applies. The text is discrete media, so it ends immediately. The prefetch defaults to fetching the entire image at full available bandwidth, and the prefetch element ends when the image has been downloaded. That ends the <par>, and the video begins playing. When the video ends, the image is shown.

2) Prefetch the images for a button so that rollover occurs quickly for the end user:

    <prefetch id="upimage" src=""/> 
    <prefetch id="downimage" src=""/>
    <!-- script will change the graphic on rollover --> 
    <img src=""/> 

4.6 Open Issues

Can prefetch elements be used as timebases for sync? This could be a useful capability to support. We should be able to start a prefetch and not play the content until it completes. This means that prefetch has to have an effective begin and end, depending upon how long it actually takes to get the data. Of course, if prefetching is optional, we need to decide when the begin and end events fire. However, this introduces the problem of how to handle errors. Even though the prefetch may be disallowed or may fail, other things may depend upon the timing of the prefetch element. In this case it is appropriate for the element's timing to continue and fire begin/end events as if the prefetch element ran to completion. Since this is all very complicated, and prefetch is intended to be transparent, one idea is to explicitly prohibit prefetch from being a syncbase. This is not as simple as it sounds: consider a prefetch element in the middle of a <seq>. Maybe the simplest solution is to allow prefetch as a syncbase, and to say that for sync purposes, all prefetch elements always have duration zero and fire begin/end events even if the prefetch itself fails or is not allowed.
