

4. The SMIL Content Control Module

Jeffrey Ayars, RealNetworks
Dick Bulterman, Oratrix


4.1 Introduction

This section defines the SMIL content control module. This module contains elements and attributes that provide for runtime content choices and optimized content delivery. Because these elements and attributes are defined in a module, designers of other markup languages can reuse the functionality of the SMIL content control module when they need to include media content control in their language. Conversely, language designers incorporating other SMIL modules need not include the content control module if equivalent content control functionality is already present.

Proposed extensions to SMIL 1.0 content control functionality are described in the following sections.

4.2 Content Selection

SMIL 1.0 provides a "test-attribute" mechanism to process an element only when certain conditions are true, for example when the client has a certain screen size. SMIL 1.0 also provides the "switch" element for expressing that a set of document parts are alternatives, and that the first one fulfilling certain conditions should be chosen. This is useful, for example, to express that different language versions of an audio file are available and to have the client select one of them. SMIL Boston includes these features and extends them by supporting new system test-attributes. It also lets a presentation be customized to an individual viewer through author-defined, user-selected test-attributes.

4.2.1 The <switch> Element

The switch element allows an author to specify a set of alternative elements from which only one acceptable element should be chosen. In SMIL Boston, an element is acceptable if it is a SMIL Boston element, its media type can be decoded (if the element declares media), and all of its test-attributes evaluate to "true". When integrating content control into other languages, the language designer must specify what constitutes an "acceptable element."

An element is selected as follows: the player evaluates the elements in the order in which they occur in the switch element. The first acceptable element is selected to the exclusion of all other elements within the switch.

Thus, authors should order the alternatives from the most desirable to the least desirable. Furthermore, authors should place a relatively fail-safe alternative as the last item in the <switch> so that at least one item within the switch is chosen (unless this is explicitly not desired). Implementations should NOT arbitrarily pick an object within a <switch> when test-attributes for all child elements fail.
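The selection rule can be sketched in Python as a first-match scan over the children of a switch. This is a minimal illustration and not a reference implementation. The `Element` class, its `recognized` and `decodable` flags, and the `TestTable` mapping from attribute names to predicates are hypothetical stand-ins for whatever a real player uses to represent elements and evaluate test-attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical: maps a test-attribute name to a predicate over its string value.
TestTable = Dict[str, Callable[[str], bool]]


@dataclass
class Element:
    """Hypothetical stand-in for one child element of a <switch>."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    recognized: bool = True  # assumption: True if this is a SMIL Boston element
    decodable: bool = True   # assumption: True if the media type can be decoded


def is_acceptable(el: Element, tests: TestTable) -> bool:
    """Acceptable = recognized, decodable, and every test-attribute is True."""
    if not (el.recognized and el.decodable):
        return False
    for attr, value in el.attrs.items():
        test = tests.get(attr)  # non-test attributes such as src have no predicate
        if test is not None and not test(value):
            return False
    return True


def select_from_switch(children: List[Element], tests: TestTable) -> Optional[Element]:
    """Return the first acceptable child in document order, or None.
    When every child fails, nothing is picked arbitrarily."""
    for child in children:
        if is_acceptable(child, tests):
            return child
    return None


if __name__ == "__main__":
    # Assume a 14400 bit/s link: the 16000 bit/s track fails, the 8000 bit/s one wins.
    tests: TestTable = {"systemBitrate": lambda v: 14400 >= int(v)}
    tracks = [
        Element("audio", {"src": "joe-audio-better-quality", "systemBitrate": "16000"}),
        Element("audio", {"src": "joe-audio", "systemBitrate": "8000"}),
    ]
    chosen = select_from_switch(tracks, tests)
    print(chosen.attrs["src"] if chosen else "nothing selected")
```

Returning `None` instead of falling back to some child matches the rule that implementations should not pick an object arbitrarily. This is why a fail-safe last alternative matters.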

Note that some network protocols, e.g. HTTP and RTSP, support content-negotiation, which may be an alternative to using the "switch" element in some cases.


The switch element can have the following attributes:

id
An XML identifier.
title
This attribute offers advisory information about the element for which it is set. Values of the title attribute may be rendered by user agents in a variety of ways. For instance, visual browsers frequently display the title as a "tool tip" (a short message that appears when the pointing device pauses over an object).

4.2.2 Predefined Test Attributes

This specification defines a list of test attributes that can be added to language elements, as allowed by the language designer. In SMIL Boston, these elements are synchronization and media elements. Conceptually, these attributes represent boolean tests. When one of the test attributes specified for an element evaluates to "false", the element carrying this attribute is ignored.

Within the list below, the concept of "user preference" may show up. User preferences are usually set by the playback engine using a preferences dialog box, but this specification does not place any restrictions on how such preferences are communicated from the user to the SMIL player.

SMIL Boston defines the following test attributes. Note that some hyphenated test attribute names from SMIL 1.0 have been deprecated in favor of names using the SMIL Boston camelCase convention. For these, the deprecated SMIL 1.0 name is shown in parentheses after the SMIL Boston name.

systemBitrate (system-bitrate)
This attribute specifies the approximate bandwidth, in bits per second, available to the system. The measurement of bandwidth is application-specific: applications may use sophisticated measurement of end-to-end connectivity, or a simple static setting controlled by the user. In the latter case, this could, for instance, be used to make a choice based on the user's connection to the network. Typical values for modem users would be 14400, 28800, 56000 bit/s, etc. Evaluates to "true" if the available system bitrate is equal to or greater than the given value. Evaluates to "false" if the available system bitrate is less than the given value.
The attribute can assume any integer value greater than 0. If the value exceeds an implementation-defined maximum bandwidth value, the attribute always evaluates to "false".
systemCaptions (system-captions)
This attribute allows authors to distinguish between a redundant text equivalent of the audio portion of the presentation (intended for audiences such as those with hearing disabilities or those learning to read who want or need this information) and text intended for a wide audience. The attribute has the value "on" if the user has indicated a desire to see closed-captioning information, and it has the value "off" if the user has indicated that they don't wish to see such information. Evaluates to "true" if the value is "on", and evaluates to "false" if the value is "off".
systemLanguage (system-language)
The attribute value is a comma-separated list of language names as defined in [RFC1766].

Evaluates to "true" if one of the languages indicated by user preferences exactly equals one of the languages given in the value of this parameter, or if one of the languages indicated by user preferences exactly equals a prefix of one of the languages given in the value of this parameter such that the first tag character following the prefix is "-".

Evaluates to "false" otherwise.

Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix. The prefix rule simply allows the use of prefix tags if this is the case.

Implementation note: When making the choice of linguistic preference available to the user, implementors should take into account the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users may assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. The user interface for setting user preferences should guide the user to add "en" to get the best matching behavior.

Multiple languages MAY be listed for content that is intended for multiple audiences. For example, a rendition of the "Treaty of Waitangi", presented simultaneously in the original Maori and English versions, would call for:

<audio src="foo.rm" systemLanguage="mi, en"/>
However, the presence of multiple languages within the object on which the systemLanguage test attribute is placed does not by itself mean that the object is intended for multiple linguistic audiences. An example would be a beginner's language primer, such as "A First Lesson in Latin," which is clearly intended for an English-literate audience. In this case, the systemLanguage test attribute should only include "en".

Authoring note: Authors should realize that if several alternative language objects are enclosed in a "switch", and none of them matches, this may lead to situations such as a video being shown without any audio track. It is thus recommended to include a "catch-all" choice at the end of such a switch which is acceptable in all cases.

systemOverdubOrCaption (system-overdub-or-caption)
This attribute is a setting which determines if users prefer overdubbing or captioning when the option is available. The attribute can have the values "caption" and "overdub". Evaluates to "true" if the user preference matches this attribute value. Evaluates to "false" if they do not match. This test attribute has been deprecated in favor of using systemOverdubOrSubtitle and systemCaptions.
systemRequired (system-required)
This attribute specifies the name of an extension. The extension may be a newly adopted language element or attribute, or may be the namespace prefix or URI for a namespace extension. Evaluates to "true" if the extension is supported by the implementation, otherwise, this evaluates to "false". [NAMESPACES]
systemScreenSize (system-screen-size)
Attribute values have the following syntax:
screen-size-val ::= screen-height"X"screen-width
Each of these is a pixel value, and must be an integer value greater than 0. Evaluates to "true" if the SMIL playback engine is capable of displaying a presentation of the given size. Evaluates to "false" if the SMIL playback engine is only capable of displaying a smaller presentation.
systemScreenDepth (system-screen-depth)
This attribute specifies the depth of the screen color palette in bits required for displaying the element. The value must be greater than 0. Typical values are 1, 8, 24, 32 .... Evaluates to "true" if the SMIL playback engine is capable of displaying images or video with the given color depth. Evaluates to "false" if the SMIL playback engine is only capable of displaying images or video with a smaller color depth.
systemOverdubOrSubtitle
This attribute specifies whether subtitles or overdubbing is rendered for people who are watching a presentation where the audio may be in a language in which they are not fluent. This attribute can have two values: "overdub", which selects the substitution of one voice track for another, and "subtitle", which means that the user prefers the display of subtitles.
This test attribute specifies whether or not closed audio descriptions should be rendered. It is intended to let authors support audio descriptions for blind users, just as systemCaptions supports text captions for deaf users. The attribute has the value "on" if the user has indicated a desire to hear audio descriptions, and it has the value "off" if the user has indicated that they don't wish to hear audio descriptions. Evaluates to "true" if the value is "on", and evaluates to "false" if the value is "off".
TBD (i.e. Streaming/Stored)
TBD (i.e. Selecting embedded information (element in aggregate))
TBD (i.e. Costs of accessing a stream, free or Pay-Per-View)
CDATA that describes a component of the playback system, e.g. a user-agent component or feature, the number of audio channels, a codec, or a hardware MPEG decoder.
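The evaluation rules for the most fully specified test attributes can be summarized in a short Python sketch. It is illustrative only, and several choices are assumptions:

- The `Prefs` record and the `eval_test` and `language_matches` helpers are hypothetical.
- Screen size is compared dimension by dimension, in the height-then-width order given by the screen-size-val grammar above.
- systemCaptions is interpreted as "the attribute value matches the user's on/off setting".
- Language tags are compared case-insensitively, as RFC 1766 tags are case-insensitive.

systemRequired and the attributes still marked TBD are not covered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prefs:
    """Hypothetical snapshot of system characteristics and user preferences."""
    bitrate: int                      # available bandwidth in bits per second
    max_bitrate: int                  # implementation-defined maximum bandwidth value
    captions: str = "off"             # user's closed-caption setting: "on" or "off"
    languages: List[str] = field(default_factory=list)  # preferred RFC 1766 tags
    screen_height: int = 0            # pixels
    screen_width: int = 0             # pixels
    screen_depth: int = 0             # color depth in bits
    overdub_or_subtitle: str = "subtitle"  # "overdub" or "subtitle"


def language_matches(user_tag: str, content_tag: str) -> bool:
    """True if the user tag equals the content tag, or is a prefix of it
    followed immediately by '-' (so "en" matches "en-US", not the reverse)."""
    u, c = user_tag.strip().lower(), content_tag.strip().lower()
    return c == u or c.startswith(u + "-")


def eval_test(attr: str, value: str, p: Prefs) -> bool:
    """Evaluate one test attribute; deprecated hyphenated names are accepted too."""
    if attr in ("systemBitrate", "system-bitrate"):
        wanted = int(value)
        # Values above the implementation maximum always evaluate to False.
        return wanted <= p.max_bitrate and p.bitrate >= wanted
    if attr in ("systemCaptions", "system-captions"):
        return value == p.captions
    if attr in ("systemLanguage", "system-language"):
        tags = [t for t in value.split(",") if t.strip()]
        return any(language_matches(u, c) for u in p.languages for c in tags)
    if attr in ("systemScreenSize", "system-screen-size"):
        height, width = (int(n) for n in value.upper().split("X"))
        return p.screen_height >= height and p.screen_width >= width
    if attr in ("systemScreenDepth", "system-screen-depth"):
        return p.screen_depth >= int(value)
    if attr == "systemOverdubOrSubtitle":
        return value == p.overdub_or_subtitle
    return True  # not a test attribute handled here, so it cannot fail
```

Note that `language_matches("en-gb", "en")` is false. This is the behavior the implementation note on systemLanguage warns users about.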


1) Choosing between content with different bitrate

In a common scenario, implementations may wish to allow for selection via a "systemBitrate" attribute on elements. The media player evaluates each of the "choices" (elements within the switch) one at a time, looking for an acceptable bitrate given the known characteristics of the link between the media player and media server.

  <par>
    <text .../>
    <switch>
      <par systemBitrate="40000"> ... </par>
      <par systemBitrate="24000"> ... </par>
      <par systemBitrate="10000"> ... </par>
    </switch>
  </par>

2) Choosing between audio resources with different bitrate

The elements within the switch may be any combination of elements. For instance, the switch could simply specify alternate audio tracks:

  <switch>
    <audio src="joe-audio-better-quality" systemBitrate="16000"/>
    <audio src="joe-audio" systemBitrate="8000"/>
  </switch>

3) Choosing between audio resources in different languages

In the following example, an audio resource is available both in French and in English. Based on the user's preferred language, the player can choose one of these audio resources.

  <switch>
    <audio src="joe-audio-french" systemLanguage="fr"/>
    <audio src="joe-audio-english" systemLanguage="en"/>
  </switch>

4) Choosing between content written for different screens

In the following example, the presentation contains alternative parts designed for screens with different resolutions and bit-depths. Depending on the particular characteristics of the screen, the player can choose one of the alternatives.

  <par>
    <text .../>
    <switch>
      <par systemScreenSize="1280X1024" systemScreenDepth="16"> ... </par>
      <par systemScreenSize="640X480" systemScreenDepth="32"> ... </par>
      <par systemScreenSize="640X480" systemScreenDepth="16"> ... </par>
    </switch>
  </par>

5) Distinguishing caption tracks from stock tickers

In the following example, the closed-caption track is shown only if the user has turned captions on, while the stock ticker is always shown.

  <par>
    <audio      src="audio.rm"/>
    <video      src="video.rm"/>
    <textstream src="stockticker.rtx"/>
    <textstream src="closed-caps.rtx" systemCaptions="on"/>
  </par>

6) Choosing the language of overdub and subtitle tracks

In the following example, a French-language movie is available with English, German, and Dutch overdub and subtitle tracks. The following SMIL segment expresses this, and switches on the alternatives that the user prefers.

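  <par>
    <switch>
      <audio src="movie-aud-en.rm" systemLanguage="en" systemOverdubOrSubtitle="overdub"/>
      <audio src="movie-aud-de.rm" systemLanguage="de" systemOverdubOrSubtitle="overdub"/>
      <audio src="movie-aud-nl.rm" systemLanguage="nl" systemOverdubOrSubtitle="overdub"/>
      <!-- French for everyone else -->
      <audio src="movie-aud-fr.rm"/>
    </switch>
    <video src="movie-vid.rm"/>
    <switch>
      <textstream src="movie-sub-en.rt" systemLanguage="en" systemOverdubOrSubtitle="subtitle"/>
      <textstream src="movie-sub-de.rt" systemLanguage="de" systemOverdubOrSubtitle="subtitle"/>
      <textstream src="movie-sub-nl.rt" systemLanguage="nl" systemOverdubOrSubtitle="subtitle"/>
      <!-- French captions for those that really want them -->
      <textstream src="movie-caps-fr.rt" systemCaptions="on"/>
    </switch>
  </par>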

4.2.3 User Groups

The following is still under development by the SYMM Working Group. The syntax and semantics described here are preliminary and subject to change.

New to SMIL Boston is a mechanism for authors to define a set of test-attributes that enable a presentation to be customized to the needs of an individual viewer. The author defines a set of named states along with their initial values. In the body of the presentation, these states are checked by declaring their IDs as the value of the "uGroup" test attribute. User groups can be used to control content presentation or selection just like the system test attributes described above.

<userAttributes> element
This element introduces a section within the SMIL head that contains definitions of each of the user groups.
<uGroup> element
An author-defined grouping of related media objects.
uState attribute
The default evaluated state of the uGroup.

If the uGroup attribute evaluates to true, the associated element is evaluated; otherwise, the element and its content are skipped. Note that players are free to implement different mechanisms for setting the state of the user groups. Two suggested alternatives are bringing up a dialog box that lets the user choose, and evaluating based on data stored in a configuration file.


  <smil>
    <head>
      <layout> <!-- define projection regions a, b, c & d --> </layout>
      <userAttributes>
        <uGroup id="nl_aud" uState="RENDERED" title="Dutch Audio Cap"/>
        <uGroup id="uk_aud" uState="NOT_RENDERED" title="English Audio Cap"/>
        <uGroup id="nl_txt" uState="NOT_RENDERED" title="Dutch Text Cap"/>
        <uGroup id="uk_txt" uState="NOT_RENDERED" title="English Text Cap"/>
      </userAttributes>
    </head>
    <body>
      <par>
        <video src="announcer.rm" region="a"/>
        <text src="news_headline.html" region="b"/>
        <audio src="story_1_nl.rm" uGroup="nl_aud" region="c"/>
        <audio src="story_1_uk.rm" uGroup="uk_aud" region="d"/>
        <text src="story_1_nl.html" uGroup="nl_txt"/>
        <text src="story_1_uk.html" uGroup="uk_txt"/>
      </par>
    </body>
  </smil>

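In this example, the head declares four user groups for Dutch and English audio and text captions. Only the Dutch audio group is initially in the RENDERED state. The announcer video and the headline text carry no uGroup test, so they are always played. Each story element is played only if its user group is rendered. By default, the player therefore presents the Dutch audio story in region c. The user can select other combinations by changing the group states, for example through a dialog box.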
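Because the uGroup syntax is still preliminary, the following Python sketch only illustrates the evaluation rule stated above. It collects each uGroup's initial uState and lets the player override states (for instance from a dialog box). An element is played only when its group is in the RENDERED state. The helper names `initial_states` and `is_played` are hypothetical. Treating an undeclared group as not rendered is an assumption made for illustration.

```python
from typing import Dict, List

RENDERED = "RENDERED"  # state value taken from the example above


def initial_states(ugroups: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each <uGroup> id to its declared uState (its default evaluated state)."""
    return {g["id"]: g["uState"] for g in ugroups}


def is_played(element: Dict[str, str], states: Dict[str, str]) -> bool:
    """Elements without a uGroup test are always evaluated; otherwise the
    referenced group must currently be RENDERED (undeclared groups are
    treated as not rendered, an assumption of this sketch)."""
    group = element.get("uGroup")
    if group is None:
        return True
    return states.get(group) == RENDERED
```

For example, a player could set `states["uk_txt"] = RENDERED` after the user ticks "English Text Cap" in a preferences dialog, and re-evaluate the body.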

4.3 Presentation Priority/Grouping

The following is still under development by the SYMM Working Group. The syntax and semantics described here are preliminary and subject to change.

The goal is to define a means of grouping collections of objects that share a common policy. A channel partitions elements into groups. Each group has a common set of access policies that control the use of quasi-physical resources:

- priority
- common server
- common access rights / charging model
- local resource use (layout, devices, etc.)

4.4 User-Centered Adaptation

The following is still under development by the SYMM Working Group. The syntax and semantics described here are preliminary and subject to change.

This work focuses on a presentation as a collection of content, in which each component may have a different user-level representation and encoding.

At authoring time, the author knows the alternatives; at use time, one of them is selected.

4.5 Presentation Optimization

4.5.1 The <prefetch> element

The following is still under development by the SYMM Working Group. The syntax and semantics described here are preliminary and subject to change.

This element gives a suggestion, or hint, to a user-agent that a media resource will be used in the future and that the author would like part or all of the resource fetched ahead of time so that the document plays back more smoothly. User-agents can ignore prefetch elements, though doing so may cause an interruption in document playback when the resource is needed. The element gives authoring tools and savvy authors the ability to schedule retrieval of resources when they expect bandwidth or time to be available. A <prefetch> element is contained within the body of an XML document, and its scheduling is based on its lexical order unless explicit timing is present.

The <prefetch> element, like media object elements, can have id and src attributes. If SMIL Boston Timing is integrated into the document, the begin, end, dur, clipBegin, and clipEnd attributes are also available. The id and src attributes are the same as for other media objects: id names the element for reference in the document, and src names the resource to be prefetched. When a media object with the same src URL is encountered, the user-agent can use any data it prefetched to begin playback without rebuffering or other interruption. The timing attributes begin, end, and dur constrain the presentation time period for prefetching the element. At the end of the presentation time specified by end or dur, the prefetch operation should stop. The clipBegin and clipEnd attributes identify the part of the src clip to prefetch. If only the last 30 seconds of the clip are played, there is no need to prefetch it from the beginning. Likewise, if only the middle 30 seconds of the clip are played, no more data should be prefetched than will be played.

The mediaSize, mediaTime, and bandwidth Attributes

In addition to the attributes allowed on Media Object Elements, the following attributes are allowed:

mediaSize : bytes-value | percent-value
Defines how much of the resource to fetch as a function of the file size of the resource. To fetch the entire resource without knowing its size, specify 100%. The default is 100%.
mediaTime : clock-value | percent-value
Defines how much of the resource to fetch as a function of the duration of the resource. To fetch the entire resource without knowing its duration, specify 100%. The default is 100%.
bandwidth : bitrate-value | percent-value
Defines how much network bandwidth the user-agent should use when doing the prefetch. To use all that is available, specify 100%. The default is 100%.

If both mediaSize and mediaTime are specified, mediaSize is used and mediaTime is ignored.

For discrete media (non-time-based media such as text/html or image/png), using the mediaTime attribute causes the entire resource to be fetched.
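The precedence rules for mediaSize and mediaTime can be sketched as a function that returns the number of bytes to prefetch. This is a hypothetical helper and not part of the specification. It makes three assumptions:

- The user-agent already knows the resource's size and, for continuous media, its duration.
- Percentages are represented with a small `Percent` wrapper.
- A mediaTime in seconds maps to a proportional share of the bytes. This only holds for roughly constant-bitrate encodings.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Percent:
    """A percent-value such as "50%", stored as a number from 0 to 100."""
    value: float


def bytes_to_prefetch(
    size_bytes: int,
    duration_s: Optional[float],                          # None for discrete media
    media_size: Optional[Union[int, Percent]] = None,     # bytes or percent
    media_time: Optional[Union[float, Percent]] = None,   # seconds or percent
) -> int:
    """Return how many bytes of the resource to prefetch (hypothetical helper)."""
    if media_size is not None:
        # mediaSize takes precedence; mediaTime is ignored when both are given.
        if isinstance(media_size, Percent):
            return round(size_bytes * media_size.value / 100)
        return min(media_size, size_bytes)
    if media_time is not None:
        if duration_s is None:
            # Discrete media: mediaTime causes the entire resource to be fetched.
            return size_bytes
        if isinstance(media_time, Percent):
            fraction = media_time.value / 100
        else:
            fraction = min(media_time / duration_s, 1.0)
        # Assumption: bytes are spread evenly over the duration (constant bitrate).
        return round(size_bytes * fraction)
    # Neither attribute given: both default to 100%, i.e. the whole resource.
    return size_bytes
```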

Documents must still play back even when the prefetch elements are ignored, although rebuffering or pauses in presentation of the document may occur.

If a prefetch element is repeated because of a restart or repeat on a parent element, the prefetch operation should occur again. This ensures that appropriately "fresh" data is displayed if, for example, the prefetch is for a banner ad at a URL whose content changes with each request. Note that prefetching data from a URL whose content changes dynamically is dangerous if the entire resource isn't prefetched, because the subsequent request for the remaining data may yield data from a newer resource. A user-agent should respect any appropriate caching directives applied to the content, e.g. no-cache directives in HTTP (RFC 822-style) headers. More specifically, content marked as non-cacheable would have to be refetched each time it was played. Cacheable content, by contrast, could be prefetched once and the results cached for future use.

If the clipBegin or clipEnd values in the media object differ from those in the prefetch element, an implementation can use whatever prefetched data applies, but the result may not be optimal.

Attribute value syntax

The bytes-value value has the following syntax:

bytes-value ::= Digit+; any positive number


The percent-value value has the following syntax:


percent-value ::= Digit+ "%"; any integer in the range 0 to 100


The clock-value value has the following syntax:

Clock-val         ::= ( Hms-val | Smpte-val )
Smpte-val         ::= ( Smpte-type )? Hours ":" Minutes ":" Seconds 
                      ( ":" Frames ( "." Subframes )? )?
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Hms-val           ::= ( "npt=" )? (Full-clock-val | Partial-clock-val 
                      | Timecount-val)
Full-clock-val    ::= Hours ":" Minutes ":" Seconds ("." Fraction)?
Partial-clock-val ::= Minutes ":" Seconds ("." Fraction)?
Timecount-val     ::= Timecount ("." Fraction)? (Metric)?
Metric            ::= "h" | "min" | "s" | "ms"
Hours             ::= DIGIT+; any positive number
Minutes           ::= 2DIGIT; range from 00 to 59
Seconds           ::= 2DIGIT; range from 00 to 59
Frames            ::= 2DIGIT; @@ range?
Subframes         ::= 2DIGIT; @@ range?
Fraction          ::= DIGIT+
Timecount         ::= DIGIT+
DIGIT             ::= [0-9]

For Timecount values, the default metric suffix is "s" (for seconds).
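A parser for the Hms-val branch of this grammar fits in a few regular expressions. The Python sketch below illustrates the grammar under stated assumptions:

- It returns seconds as a float.
- It enforces the 00 to 59 ranges for Minutes and Seconds.
- It reads an untyped "hh:mm:ss" value as a Full-clock-val rather than a Smpte-val.
- It rejects SMPTE values, since the Frames and Subframes ranges are still open.

The function name `parse_hms_clock_value` is hypothetical.

```python
import re

# Seconds per Metric suffix; a Timecount with no suffix defaults to "s".
_METRIC = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
_FULL = re.compile(r"(\d+):([0-5]\d):([0-5]\d)(\.\d+)?")   # Full-clock-val
_PARTIAL = re.compile(r"([0-5]\d):([0-5]\d)(\.\d+)?")      # Partial-clock-val
_TIMECOUNT = re.compile(r"(\d+)(\.\d+)?(h|min|s|ms)?")     # Timecount-val


def parse_hms_clock_value(text: str) -> float:
    """Convert an Hms-val to seconds; SMPTE values are rejected."""
    s = text.strip()
    if s.startswith("npt="):
        s = s[len("npt="):]
    m = _FULL.fullmatch(s)
    if m:
        hours, minutes, seconds, frac = m.groups()
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float(frac or 0)
    m = _PARTIAL.fullmatch(s)
    if m:
        minutes, seconds, frac = m.groups()
        return int(minutes) * 60 + int(seconds) + float(frac or 0)
    m = _TIMECOUNT.fullmatch(s)
    if m:
        count, frac, metric = m.groups()
        return (int(count) + float(frac or 0)) * _METRIC[metric or "s"]
    raise ValueError(f"not an Hms clock value: {text!r}")
```

For instance, "npt=02:30.5" parses as a Partial-clock-val, and "2min" parses as a Timecount-val with an explicit metric.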


The bitrate-value value specifies a number of bits per second. It has the following syntax:

bitrate-value ::= Digit+; any positive number
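Resolving the bandwidth attribute means reading either a bitrate-value or a percent-value and turning it into bits per second. In the Python sketch below, the helper names are hypothetical, and the available bandwidth is assumed to be known to the user-agent. An absolute bitrate larger than the available bandwidth is capped at what is available. That cap is an assumption, since the text above does not say how that case should be handled.

```python
import re
from typing import Optional

_PERCENT = re.compile(r"(\d+)%")   # percent-value ::= Digit+ "%"
_BITRATE = re.compile(r"\d+")      # bitrate-value ::= Digit+


def parse_percent(text: str) -> int:
    """Parse a percent-value, enforcing the 0 to 100 range."""
    m = _PERCENT.fullmatch(text.strip())
    if m is None or int(m.group(1)) > 100:
        raise ValueError(f"invalid percent-value: {text!r}")
    return int(m.group(1))


def resolve_bandwidth(attr: Optional[str], available_bps: int) -> int:
    """Bits per second a prefetch should use (hypothetical helper)."""
    if attr is None:
        return available_bps                      # default is 100%
    text = attr.strip()
    if text.endswith("%"):
        return available_bps * parse_percent(text) // 100
    if _BITRATE.fullmatch(text):
        # Assumption: never ask for more than is currently available.
        return min(int(text), available_bps)
    raise ValueError(f"invalid bandwidth value: {attr!r}")
```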


1) Prefetch the image so it can be displayed immediately after the video ends:

  <seq>
    <par>
      <prefetch id="endimage" src=""/>
      <text id="interlude" src="" fill="freeze"/>
    </par>
    <video id="main-event" src="rtsp://"/>
    <img src="" fill="freeze"/>
  </seq>

No timing is specified, so default timing applies in the above example. The text is discrete media, so it ends immediately. The prefetch defaults to fetching the entire image at full available bandwidth, and the prefetch element ends when the image is downloaded. That ends the <par>, and the video begins playing. When the video ends, the image is shown.

2) Prefetch the images for a button so that rollover occurs quickly for the end user:

    <prefetch id="upimage" src=""/> 
    <prefetch id="downimage" src=""/>
    <!-- script will change the graphic on rollover --> 
    <img src=""/> 

4.6 Open Issues

Can prefetch elements be used as timebases for sync? This could be a useful capability to support. We should be able to start a prefetch and not play the content until it completes. This means that prefetch has to have an effective begin and end, depending upon how long it actually takes to get the data. Of course, if prefetching is optional, we need to decide when the begin and end events fire. However, this introduces the problem of how to handle errors. Even though the prefetch may not be allowed or may fail, other things may depend upon the timing of the prefetch element. In this case it is appropriate for the element's timing to continue and to fire begin/end events as if the prefetch element ran to completion. Since this is all very complicated, and prefetch is intended to be transparent, one idea is to explicitly prohibit prefetch from being a syncbase. This is not as simple as it sounds: suppose, for example, that a prefetch element is in the middle of a <seq>. Maybe the simplest solution is to allow prefetch as a syncbase. For sync purposes, all prefetch elements would always have duration zero and would fire begin/end events even if the prefetch itself fails or is not allowed.
