W3C

Synchronized Multimedia Integration Language (SMIL) Boston Specification

W3C Working Draft 20-August-1999

This version:
http://www.w3.org/1999/08/WD-smil-boston-19990820 (as a single HTML file)
Latest version:
http://www.w3.org/TR/smil-boston
Previous version:
http://www.w3.org/1999/08/WD-smil-boston-19990803
Editors:
Jeff Ayars (RealNetworks), Aaron Cohen (Intel), Ken Day (Macromedia), Erik Hodge (RealNetworks), Philipp Hoschka (W3C), Rob Lanphier (RealNetworks), Nabil Layaïda (INRIA), Jacco van Ossenbruggen (CWI), Lloyd Rutledge (CWI), Bridie Saccocio (RealNetworks), Patrick Schmitz (Microsoft), Warner ten Kate (Philips), Ted Wugofski (Gateway), Jin Yu (Compaq)


Abstract

This document specifies the "Boston" version of the Synchronized Multimedia Integration Language (SMIL, pronounced "smile"). SMIL Boston has the following two design goals:

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This document is the first working draft of the specification for the next version of SMIL code-named "Boston". It has been produced as part of the W3C Synchronized Multimedia Activity. The document has been written by the SYMM Working Group (members only). The goals of this group are discussed in the SYMM Working Group charter (members only).

Many parts of the document are still preliminary, and do not constitute full consensus within the Working Group. Also, some of the functionality planned for SMIL Boston is not contained in this draft. Many parts are not yet detailed enough for implementation, and other parts are only suitable for highly experimental implementation work.

At this point, the W3C SYMM WG seeks input from the public on the concepts and directions described in this specification. Please send your comments to www-smil@w3.org. Since it is difficult to anticipate the number of comments that will come in, the WG cannot guarantee an individual response to all comments. However, we will study each comment carefully, and try to be as responsive as time permits.

The only difference between this working draft and the version from August 3, 1999 is that the draft is also provided as a single HTML document.

This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This document is work in progress and does not imply endorsement by the W3C membership.

Short Table of Contents

Full Table of Contents

A. About SMIL Boston

B. Synchronized Multimedia Integration Language (SMIL) Modules

C. SMIL-Boston Animation

D. Content Control Module (detailed specification not yet available)

E. Event Module (detailed specification not yet available)

F. Integration Module  (detailed specification not yet available)

G. Layout Module (detailed specification not yet available)

H. The SMIL Linking Module

I. Media Object Module

J. Metainformation Module (detailed specification not yet available)

K. Structure Module (detailed specification not yet available)

L. SMIL Timing and Synchronization

M. Integrating SMIL Timing into other XML-Based Languages

O. Synchronized Multimedia Integration Language (SMIL) Document Object Model

A. About SMIL Boston

Editors:
Philipp Hoschka, W3C (ph@w3.org)


1 Introduction

This document specifies the "Boston" version of the Synchronized Multimedia Integration Language (SMIL, pronounced "smile"). SMIL Boston has the following two design goals:

SMIL Boston is defined as a set of markup modules, which define the semantics and an XML syntax for certain areas of SMIL functionality. All modules have an associated Document Object Model (DOM).

SMIL Boston deprecates some SMIL 1.0 syntax in favor of more DOM friendly syntax. Most notable is the change from hyphenated attribute names to mixed case (camel case) attribute names, e.g., clipBegin is introduced in favor of clip-begin. The SMIL Boston modules do not contain these SMIL 1.0 attributes so that integration applications are not burdened with supporting them. SMIL document players, those applications that support playback of "application/smil" documents (or <smil></smil> documents (or however we denote SMIL documents vs. integration documents)) must support the SMIL 1.0 attribute names.
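The renaming pattern can be sketched as a small helper. This is only an illustration of the hyphenated-to-mixed-case convention using the clip-begin example from the text; the function name hyphen_to_camel is hypothetical, and the draft does not state that every new attribute name is derived mechanically in this way.

```python
def hyphen_to_camel(name: str) -> str:
    """Convert a hyphenated attribute name to mixed case (camel case)."""
    head, *rest = name.split("-")
    # Capitalize the first letter of every part after the first one.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


print(hyphen_to_camel("clip-begin"))  # clipBegin
```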

This specification is structured as follows: Section B presents the individual modules in more detail, and gives example profiles. Section C defines the animation module. Section D defines control elements such as the switch element. Section E defines the SMIL event model. Section F defines syntax that is only used when SMIL modules are integrated into other XML-based languages. Section G defines the elements that can be used to define the layout of a SMIL presentation. Section H provides for XML linking into SMIL documents. Section I defines elements and attributes for describing media objects. Section J defines the meta element functionality. Section K defines the elements that form the skeleton of a SMIL document (head, body etc.). Section L defines the Timing and Synchronization elements. In particular, this section defines the time model used in SMIL. Section M explains how SMIL timing can be integrated into other XML-based languages.

2 Acknowledgements

This document has been prepared by the Synchronized Multimedia Working Group (WG) of the World Wide Web Consortium. The WG includes the following individuals:

In addition to the working group members, the following people contributed to the SMIL effort: Dan Austin (CNET), Rob Glidden (Web3D), Mark Hakkinen (The Productivity Works), Jonathan Hui (Canon), Rob Lanphier (RealNetworks), Tony Parisi (Web3D), Dave Raggett (W3C).

B. Synchronized Multimedia Integration Language (SMIL) Modules

Previous version (W3C members only):
http://www.w3.org/AudioVideo/Group/Modules/symm-modules-19990719
Editors:
Ted Wugofski <ted.wugofski@otmp.com>,
Patrick Schmitz <pschmitz@microsoft.com>,
Warner ten Kate <tenkate@natlab.research.philips.com>.


Abstract

This is a working draft of a specification of synchronized multimedia integration language (SMIL) modules. These modules may be used to provide multimedia features to other XML based languages, such as the Extensible Hypertext Markup Language (XHTML). To demonstrate how these modules may be used, this specification outlines a set of sample profiles based on common use cases.


1 Introduction

The first W3C Working Group on Synchronized Multimedia (SYMM) developed SMIL, the Synchronized Multimedia Integration Language [SMIL]. This XML-based language [XML] is used to express timing relationships among media elements such as audio and video files. SMIL 1.0 documents describe multimedia presentations that can be played in a SMIL-conformant viewer.

Since the publication of SMIL 1.0, interest in the integration of SMIL concepts with HTML, the Hypertext Markup Language [HTML], and other XML languages, has grown. Likewise, the W3C HTML Working Group is exploring how XHTML, the Extensible Hypertext Markup Language [XHTML], can be integrated with other languages. Both Working Groups are considering modularization as a strategy for integrating their respective functionality with each other and other XML languages.

Modularization is a solution in which a language's functionality is partitioned into sets of semantically-related elements. Profiling is the combination of these feature sets to solve a particular problem. For the purposes of this specification we define:

element
An element is a representation of a semantic feature. An element has one representation in any given syntax.
module
A module is a collection of semantically-related elements.
module family
A module family is a collection of semantically-related modules. Each element is in one and only one module family. Modules in a module family are generally ordered by increasing functionality (each module is generally inclusive of the previous module in the module family).
profile
A profile is a collection of modules particular to an application domain or language. For example, the SMIL profile corresponds to the collection of modules that make up the SMIL language. Likewise, an enhanced television profile would correspond to the collection of modules for media-enhancement of broadcast television. In general, a profile would include only one module from a particular module family.
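The relationship between these terms can be sketched as a small data model. This is an illustrative sketch only: the Module class, the helper function, and the module names "BasicTiming" and "FullTiming" are hypothetical and do not appear in this specification. The helper reports module families from which a profile draws more than one module, which the definition above says a profile would generally avoid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str            # module name
    family: str          # module family the module belongs to
    elements: frozenset  # semantically-related elements in the module


def families_with_several_modules(profile):
    """Return the module families from which a profile draws more than one module."""
    by_family = defaultdict(list)
    for module in profile:
        by_family[module.family].append(module.name)
    return {family: names for family, names in by_family.items() if len(names) > 1}


# Hypothetical modules, invented for this illustration.
basic_timing = Module("BasicTiming", "Timing", frozenset({"par", "seq"}))
full_timing = Module("FullTiming", "Timing", frozenset({"par", "seq", "excl"}))
structure = Module("Structure", "Structure", frozenset({"smil", "head", "body"}))

print(families_with_several_modules([basic_timing, structure]))
print(families_with_several_modules([basic_timing, full_timing, structure]))
```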

SMIL functionality is partitioned into modules based on the following design requirements:

  1. Ensure that a profile may be defined that is completely backward compatible with SMIL 1.0.
  2. Ensure that a module's semantics maintain compatibility with SMIL 1.0 semantics (this includes content and timing).
  3. Specify modules that are isomorphic with other modules based on W3C recommendations.
  4. Specify modules that can complement XHTML modules.
  5. Adopt new W3C recommendations when appropriate and not in conflict with other requirements.
  6. Specify how the modules support the document object model.

The first requirement is that modules are specified such that a collection of modules can be "recombined" in such a way as to be backward compatible with SMIL (it will properly play SMIL conforming content).

The second requirement is that the semantics of SMIL must not change when they are embodied in a module. Fundamentally, this ensures the integrity of the SMIL content and timing models. This is particularly relevant when a different syntax is required to integrate SMIL functionality with other languages.

The third requirement is that modules be isomorphic with other modules from other W3C recommendations. This will assist designers when sharing modules across profiles.

The fourth requirement is that specific attention be paid to providing multimedia functionality to the XHTML language. XHTML is the reformulation of HTML in XML.

The fifth requirement is that the modules should adopt new W3C recommendations when they are appropriate and when they do not conflict with other requirements (such as complementing the XHTML language).

The sixth requirement is to ensure that modules have integrated support for the document object model. This facilitates additional control through scripting and user agents.

These requirements, and the ongoing work by the SYMM Working Group, led to a partitioning of SMIL functionality into nine modules.

2 SMIL Modules

SMIL functionality is partitioned into nine (9) modules:

Each of these modules introduces a set of semantically-related elements, properties, and attributes.

2.1 Animation Module

The Animation Module provides a framework for incorporating animation onto a timeline (a timing model) and a mechanism for composing the effects of multiple animations (a composition model). The Animation Module defines semantics for the animate, set, move, and colorAnim elements.

2.2 Content Control Module

The Content Control Module provides a framework for selecting content based on a set of test attributes. The Content Control Module defines semantics for the switch element.

2.3 Event Module

The Event Module provides a framework for realizing the event model specified in the W3C Document Object Model Level 2. The Event Module defines semantics for the eventhandler and event elements.

2.4 Layout Module

The Layout Module provides a framework for spatial layout of visual components. The Layout Module defines semantics for the layout, root-layout, and region elements.

2.5 Linking Module

The Linking Module provides a framework for relating documents to content, documents and document fragments. The Linking Module defines semantics for the a and anchor elements.

2.6 Media Object Module

The Media Object Module provides a framework for declaring media. The Media Object Module defines semantics for the ref, animation, audio, img, video, text, textstream, xref, xanimation, xaudio, ximg, xvideo, xtext, and xtextstream elements.

2.7 Metainformation Module

The Metainformation Module provides a framework for describing a document, either to inform the human user or to assist in automation. The Metainformation Module defines semantics for the meta element.

2.8 Structure Module

The Structure Module provides a framework for structuring a SMIL document. The Structure Module defines semantics for the smil, head, and body elements.

2.9 Timing and Synchronization Module

The Timing and Synchronization Module provides a framework for describing timing structure, timing control properties, and temporal relationships between elements. The Timing and Synchronization Module defines semantics for par, seq, excl, and choice elements. In addition, this module defines semantics for properties such as begin, beginAfter, beginWith, beginEvent, dur, end, endEvent, endWith, eventRestart, repeat, repeatDur, timeAction, and timeline. These elements and attributes are subject to change.

3 Isomorphism

A requirement for SMIL modularization is that the modules be isomorphic with other modules from other W3C recommendations. Isomorphism will assist designers when sharing modules across profiles.
Table -- Isomorphism between SMIL modules and their corresponding HTML modules.

SMIL module | SMIL elements | HTML module | HTML elements
Animation | animate | - | -
Content Control | switch | - | -
Event | event, eventhandler | Intrinsic Events | onevent
 | | Event | event, eventhandler
Layout | layout, region, root-layout | Stylesheet | style
Linking | a, anchor | Hypertext | a
 | | Link | link
 | | Base | base
 | | Image Map | map, area
Media Object | ref, audio, video, text, img, animation, textstream | Object | object, param
 | | Image | img
 | | Applet | applet, param
Metainformation | meta | Metainformation | meta
Structure | smil, head, body | Structure | html, head, body, title
 | | ??? | div and span
Timing and Synchronization | par, seq | - | -

As can be seen in the table, there are two modules that appear in both SMIL and HTML: Event and Metainformation. Work is underway to define a single module that can be shared by both SMIL and HTML.

4 Multimedia Profiles

A range of profiles may be built using SMIL modules. Four profiles are defined to inform the reader of how profiles may be constructed to solve particular problems:

These example profiles are non-normative.

4.1 Lightweight Presentations Profile

The Lightweight Presentations Profile handles simple presentations, supporting timing of text content. The simplest version of this could be used to sequence stock quotes or headlines on constrained devices such as a palmtop device or a smart phone. This example profile might include the following SMIL modules:

This profile may be based on XHTML modules [XMOD] with the addition of the Timing and Synchronization Module. Transitions might be accomplished using the Animation Module.

4.2 SMIL-Boston Profile

The SMIL-Boston Profile supports the timeline-centric multimedia features found in the SMIL language. This profile might include the following SMIL modules:

4.3 XHTML Presentations Profile

The XHTML Presentations Profile integrates multimedia, XHTML layout, and CSS positioning. This profile might include the following SMIL modules:

This profile would use XHTML modules for structure and layout and SMIL modules for multimedia and timing. The linking functionality may come from the XHTML modules [XMOD] or from the SMIL modules.

4.4 Web Enhanced Media Profile

The Web Enhanced Media Profile supports the integration of multimedia presentations with broadcast or on-demand streaming media. The primary media will often define the main timeline. This profile might include the following SMIL modules:

This profile is similar to the XHTML Presentations Profile with additional support to manage stream events and synchronization of the document's clock to the primary media.

5 Appendices

A Normative References

[SMIL] "Synchronized Multimedia Integration Language (SMIL) 1.0 Specification", P. Hoschka, 15 Jun 98. This is available at http://www.w3.org/TR/REC-smil.

[XLINK]

[XML] "Extensible Markup Language (XML) 1.0", T. Bray, J. Paoli, C. M. Sperberg-McQueen, 10 Feb 98. This is available at http://www.w3.org/TR/REC-xml.

[XPTR]

B Informative References

[HTML] "HTML 4.0 Specification", D. Raggett, A. Le Hors, I. Jacobs, 24 Apr 98. This is available at http://www.w3.org/TR/REC-html40.

[XHTML] "Extensible Hypertext Markup Language (XHTML) 1.0 Specification"

[XMOD] "Modularization of XHTML Working Draft"

C Document Type Definitions

We will probably want to follow the HTML WG's lead on architecting module DTDs and the drivers for combining these DTDs automatically. We might want to consider our schedule in light of the XML Schema schedule.

D Document Object Model Bindings

The modules defined in this WD need to clearly align with the interfaces defined in the SMIL DOM WD and the existing DOM Level 1 and DOM Level 2 interfaces.

C. SMIL Boston Animation

Previous version (W3C members only):
http://www.w3.org/AudioVideo/Group/Animation/symm-animation-990702.html
Editors:
Ken Day (Macromedia),
Patrick Schmitz (Microsoft),
Aaron Cohen (Intel)


Abstract

This is a working draft of a specification of animation functionality for XML documents.  It is part of work in the Synchronized Multimedia Working Group (SYMM) towards a next version of the SMIL language and modules.  It describes an animation framework as well as a set of base XML animation elements, included in SMIL and suitable for integration with other XML documents.



1. Introduction

The first W3C Working Group on Synchronized Multimedia (SYMM) developed SMIL - Synchronized Multimedia Integration Language.  This XML-based language is used to express synchronization relationships among media elements.  SMIL 1.0 documents describe multimedia presentations that can be played in SMIL-conformant viewers.

SMIL 1.0 was focused primarily on linear presentations, and did not include support for animation.  Other working groups (especially Graphics) are exploring animation support for things like vector graphics languages.  As the timing model is at the heart of animation support, it is appropriate for the SYMM working group to define a framework for animation support, and to define a base set of widely applicable animation structures.  This document describes that support.

Where SMIL 1.0 defined a document type and the associated semantics, the next version modularizes the functionality.  The modularization facilitates integration with other languages, and the development of profiles suited to a wider variety of playback environments.  See also "Synchronized Multimedia Modules based upon SMIL 1.0" (W3C members only).  The Animation Module described herein is designed with the same goals in mind, and in particular to satisfy requirements such as those of the Graphics Working Group.

1.1 Overview of support

This document describes a framework for incorporating animation onto a time line and a mechanism for composing the effects of multiple animations.  A set of basic animation elements are also described that can be applied to any XML-based language that supports a Document Object Model. A language in which this module is embedded is referred to as a host language.

Animation is inherently time-based. SMIL animation is defined in terms of the SMIL timing model, and is dependent upon the support described in the SMIL Timing and Synchronization Module.  The capabilities are described by new elements with associated attributes and associated semantics, as well as the SMIL timing attributes. Animation is modeled as a local time line. An animation element is typically a child of the target element, the element that is to be animated.

While this document defines a base set of animation capabilities, it is assumed that host languages will build upon the support to define additional and/or more specialized animation elements.  In order to ensure a consistent model for document authors and runtime implementors, we introduce a framework for integrating animation with the SMIL timing model. Animation only manipulates attributes of the target elements, and so does not require any specific knowledge of the target element semantics.

An overview of the fundamentals of SMIL animation is given in Animation Framework. The syntax of the animation elements and attributes is specified in Animation Syntax. The semantics of animation is specified in Animation Semantics. The normative definition of syntax is entirely contained in Animation Syntax, and the normative definition of precise semantics is entirely contained in Animation Semantics. All other text in this specification is informative. In cases of conflicts, the normative form sections take precedence. Anyone having a detailed question should refer to the Animation Syntax and Animation Semantics, as appropriate.

1.2 Requirements

2. Animation Framework

This section is informative. Readers who need to resolve detailed questions of syntax or semantics should refer to Animation Syntax and Animation Semantics, respectively, which are the only normative forms.

Animation is inherently time-based, changing the values of element attributes over time. The SYMM Working Group defines a generalized model for timing and synchronization that applies to SMIL documents, and is intended to be included in other XML-based host languages. While this document defines a base set of animation elements, it is assumed that other host languages will build upon the support to define additional and/or more specialized elements.  In order to ensure a consistent model for both document authors and runtime implementors, we introduce a framework for integrating animation with the SMIL timing model.

[@@Ed: We intend that this section be a useful discussion of our central animation concepts, using simple examples. Feedback on its usefulness and clarity will be appreciated. In particular, the syntax elements used are not introduced prior to their use. It is hoped that the examples are sufficiently simple that an intuitive understanding of from/to/by will be sufficient. Details are in Section 3, Animation Syntax.]

@@@[Issue] This draft is written in terms of XML attribute animation. However, there is a need to animate DOM attributes which are not exposed as XML attributes. This applies in particular to structured attributes. A mechanism for naming these attributes is needed.

2.1 Basic Animation

Animation is defined as a time-based manipulation of a target element (or more specifically of some attribute of the target element, the target attribute). The definition expresses a function, the animation function, of time from 0 to the simple duration of the animation element. The definition is evaluated as needed over time by the runtime engine, and the resulting values are applied to the target attribute.  The functional representation of the animation's definition is independent of this model, and may be expressed as a sequence of discrete values, a keyframe based function, a spline function, etc. In all cases, the animation exposes this as a function of time.

For example, the following defines the linear animation of a bitmap. The bitmap appears at the top of the region, moves 100 pixels down over 10 seconds, and disappears.

<par>
   <img dur="10s" ...>
      <animate attribute="top" from="0" to="100" dur="10s"/>
   </img>
</par>
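The animation function for this example can be sketched as follows. This is a minimal illustration assuming simple linear interpolation between from and to over the simple duration; the function and parameter names are hypothetical and are not part of the specification.

```python
def animate_value(t, dur, start, end):
    """Linear animation function on the element's local time line, 0 <= t <= dur."""
    if not 0 <= t <= dur:
        raise ValueError("t must lie within the simple duration")
    # Interpolate between the from value (start) and the to value (end).
    return start + (end - start) * (t / dur)


# The example above: top goes from 0 to 100 over 10 seconds.
for t in (0, 5, 10):
    print(t, animate_value(t, dur=10, start=0, end=100))
```

Halfway through the simple duration (t = 5) the sketch yields 50, the midpoint of the motion.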

Animation has a very simple model for time. It just uses the animation element's local time line, with time varying from 0 to the duration.  All other timing functions, including synchronization and time manipulations such as repeat, time scaling, etc. are provided (transparently) by the timing model. This makes it very simple to define animations, and properly modularizes the respective functionality.

Other features of the SMIL Timing and Synchronization module may be used to create more complex animations. For example, an accelerated straight-line motion can be created by applying an acceleration time filter to a straight-line, constant-velocity motion. There are many other examples.

2.2 Additive Animation

It is frequently useful to define animation as a change in an attribute's value. Motion, for example, is often best expressed as an increment, such as moving an image from its initial position to a point 100 pixels down:

<par>
   <img dur="10s" ...>
      <animate attribute="top" by="100"
               dur="10s" additive="true"/>
   </img>
</par>

Many complex animations are best expressed as combinations of simpler animations. A corkscrew path, for example, can be described as a circular motion added to a straight-line motion. Or, as a simpler example, the example immediately above can be slowed to move only 40 pixels over the same period of time by inserting a second additive <animate> which by itself would animate 60 pixels the other direction:

<par>
   <img dur="10s" ...>
      <animate attribute="top" by="100"
               dur="10s" additive="true"/>
      <animate attribute="top" by="-60"
               dur="10s" additive="true"/>
   </img>
</par>

When there are multiple animations active for an element at a given moment, they are said to be composed, and the resulting animation is composite. The active animations are applied to the current underlying value of the target attribute in activation order (first begun is first applied), with later additive animations being applied to the result of the earlier-activated animations. When two animations start at the same moment, the first in lexical order is applied first.

A non-additive animation masks all animations which began before it, until the non-additive animation ends.
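The composition rules above can be sketched as a small function. This is an informal illustration, not a normative algorithm: the tuple layout and the name compose are assumptions made for the example, and each animation is represented only by the current value of its animation function.

```python
def compose(underlying, active):
    """Compose the active animations for one target attribute.

    `active` is a list of (begin, lexical_index, additive, value) tuples,
    where `value` is the current value of that animation's function.
    """
    result = underlying
    # Activation order: first begun is first applied; ties use lexical order.
    for _begin, _index, additive, value in sorted(active, key=lambda a: (a[0], a[1])):
        if additive:
            result = result + value  # build on the result so far
        else:
            result = value           # masks everything that began earlier
    return result


# Two additive animations (by="100" and by="-60") halfway through 10 seconds.
print(compose(0, [(0, 0, True, 50.0), (0, 1, True, -30.0)]))  # 20.0
# A non-additive animation that began later masks the earlier additive one.
print(compose(0, [(0, 0, True, 50.0), (2, 0, False, 70.0)]))  # 70.0
```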

Numeric attributes generally can have additive animations applied, though it may not make sense for some. Types such as strings and booleans, for which addition is not defined, cannot.

As long as the host language defines addition of the target attribute type and the value of the animation function, additive animation is possible. For example, if the language defines date arithmetic, date attributes can have additive animations applied, perhaps as a number of days to be added to the date. Such attributes are said to support composite animation.

2.3 Cumulative Animation

The author may also select whether a repeating animation should repeat the original behavior for each iteration, or whether it should build upon the previous results, accumulating with each iteration. For example, a motion path that describes an arc can repeat by drawing the same arc over and over again, or it can begin each repeat iteration where the last left off, making the animated element bounce across the window. This is called cumulative animation.

Repeating our 100-pixel-move-down example, we can move 1,000 pixels in 100 seconds.

<par>
   <img dur="100s" ...>
      <animate dur="10s" repeat="indefinite" accumulate="true"
               attribute="top" by="100"/>
   </img>
</par>

This example can, of course, be coded as a single 100-second, 1000-pixel motion. With more complex paths, cumulative animation is much more valuable. For example, if one created a motion path for a single sine wave, a repeated sine wave animation could easily be created by cumulatively repeating the single wave.

Typically, authors expect cumulative animations to be additive (as in the example directly above), but this is not required. The following example is not additive. It starts at the absolute position given, 20. It moves down by 10 pixels to 30, then repeats. It is cumulative, so the second iteration starts at 30 and moves down by another 10 to 40. Etc.

<par>
   <img dur="100s" ...>
      <animate dur="10s" repeat="indefinite"
               attribute="top" from="20" by="10"
               additive="false" accumulate="true"/>
   </img>
</par>

Cumulative animations are possible for any attribute which supports animation composition. When the animation is also additive, as composite animations typically are, they compose just as straight additive animations do (using the cumulative value).
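The arithmetic of the two cumulative examples above can be checked with a short sketch. It assumes linear interpolation within each iteration; the function name and parameters are hypothetical. For the additive case, start stands for the underlying value; for the non-additive case, it is the from value.

```python
def cumulative_value(t, dur, by, start=0.0, accumulate=True):
    """Value of a repeating by-animation at local time t."""
    # Number of completed iterations and the position within the current one.
    iteration, local = divmod(t, dur)
    # With accumulate, each iteration starts where the previous one ended.
    offset = iteration * by if accumulate else 0
    return start + offset + by * (local / dur)


# Additive example: 100 pixels every 10 seconds reaches 1000 pixels at 100 seconds.
print(cumulative_value(100, dur=10, by=100))
# Non-additive example: from="20" by="10"; iterations start at 20, 30, 40.
print([cumulative_value(t, dur=10, by=10, start=20) for t in (0, 10, 20)])
```

With accumulate set to False, the same call restarts every iteration from start, which corresponds to repeating the original behavior.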

2.4 Freezing Animations and Holding Values

When an animation element ends, its effect is normally removed from the target. If an animation moves an image and the animation element ends, the image will jump back to its original position. For example:

<par>
   <img dur="20s" ...>
      <animate begin="5s" dur="10s" attribute="top" by="100"/>
   </img>
</par>

The image will appear stationary for 5 seconds (begin="5s" in the <animate>), then move 100 pixels down in 10 seconds (dur="10s", by="100"). At the end of the movement the animation element ends, so its effect ends and the image jumps back to where it started (to the underlying value of the top attribute). The image lasts for 20 seconds, so it will remain at the original position for 5 seconds and then disappear.

The standard timing attribute fill can be used to maintain the value of the animation after the simple duration of the animation element ends:

<par>
   <img dur="20s" ...>
      <animate begin="5s" dur="10s" fill="freeze"
               attribute="top" by="100"/>
   </img>
</par>

The <animate> ends after 10 seconds, but fill="freeze" keeps its final effect active until it is ended by the ending of its parent element, the image.
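The difference between the two examples above can be traced with a small timeline sketch. It assumes linear interpolation and models only the offset added to the underlying top value; the function name, its parameters, and the use of None for "image no longer shown" are assumptions made for illustration.

```python
def top_offset(t, begin=5, dur=10, by=100, parent_dur=20, fill="remove"):
    """Offset added to the underlying top value at parent time t.

    Returns None once the image (the parent element) has ended.
    """
    if not 0 <= t < parent_dur:
        return None
    if t < begin:
        return 0.0                    # animation has not started yet
    local = t - begin
    if local < dur:
        return by * local / dur       # animation in progress
    # Animation element has ended: freeze keeps the final value, otherwise
    # the effect is removed and the underlying value shows again.
    return float(by) if fill == "freeze" else 0.0


for t in (3, 10, 15, 19):
    print(t, top_offset(t), top_offset(t, fill="freeze"))
```

At 15 seconds and later the first example is back at offset 0, while the frozen variant stays at 100 until the image ends at 20 seconds.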

However, it is frequently useful to define an animation as a sequence of additive steps, one building on the other. For example, the author might wish to move an image rapidly for 2 seconds, slowly for another 2, then rapidly for 1, ending 100 pixels down. It is natural to express this as a <seq>, but each element of a <seq> ends before the next begins.

The hold attribute keeps the final effect of an animation applied until the target element itself, here the image, ends:

<par>
   <img dur="100s" ...>
      <seq>
         <animate dur="2s" attribute="top" by="50" hold="true"/>
         <animate dur="2s" attribute="top" by="10" hold="true"/>
         <animate dur="1s" attribute="top" by="40" hold="true"/>
      </seq>
   </img>
</par>

The effect of the held animations is essentially attached to the target to achieve the desired result. In this example, it will have moved 50 pixels after 2 seconds and 60 after 4. At 5 seconds it will reach 100 pixels and stay there. Note that not only does each <animate> end before the image, but the <seq> containing the animation elements also ends (when the last <animate> ends). The effect of the held animations is retained until the image ends.

The difference between hold="true" and fill="freeze" is that hold causes the animation to "stick" to the target element until the target element ends, while the duration of the fill is determined by the parent of the animation element.
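The positions quoted for the held sequence (50 pixels after 2 seconds, 60 after 4, 100 after 5) can be reproduced with a short sketch. It assumes linear interpolation within each step and treats every step as carrying hold="true"; the function name and the (dur, by) pair representation are hypothetical.

```python
def held_seq_offset(t, steps, target_dur):
    """Offset produced at time t by a <seq> of held by-animations.

    `steps` is a list of (dur, by) pairs played one after another.
    Returns None when the target element is not active.
    """
    if not 0 <= t < target_dur:
        return None
    offset, start = 0.0, 0.0
    for dur, by in steps:
        if t >= start + dur:
            offset += by                      # finished step: final effect is held
        else:
            offset += by * (t - start) / dur  # step currently in progress
            break
        start += dur
    return offset


steps = [(2, 50), (2, 10), (1, 40)]
print([held_seq_offset(t, steps, target_dur=100) for t in (2, 4, 5, 50)])
```

The offset remains at 100 from 5 seconds onward, long after the <seq> itself has ended, until the target ends at 100 seconds.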

The above example is equivalent to both of the following examples, but easier to visualize and maintain:

<!-- Equivalent animation using a <seq> -->
<img dur="100s" ...>
   <seq>
      <animate dur="2s" attribute="top" by="50"/>
      <animate dur="2s" attribute="top" values="50 60" additive="true"/>
      <animate dur="1s" attribute="top" values="60 100" additive="true"
               fill="freeze"/>
   </seq>
</img>

<!-- Equivalent animation using a <par> -->
<img dur="100s" ...>
   <par>
      <animate dur="2s" attribute="top" by="50" fill="freeze"/>
      <animate begin="prev.end" dur="2s" attribute="top" by="10" fill="freeze"/>
      <animate begin="prev.end" dur="1s" attribute="top" by="40" fill="freeze"/>
   </par>
</img>

The trick here is that fill="freeze" causes the animation elements to last until the end of the <seq> or <par>, respectively, which in turn lasts until the image ends. With more complex paths, the arithmetic would be impractical and difficult to maintain.

2.5 Nested Animation (?)

@@@Issue If animation elements were allowed to animate the parameters of other animation elements, certain use cases become very easy. For example, a dying oscillation could be created by placing an undamped oscillation animation, then animating the length of the oscillation's path (decreasing it over time). The SYMM WG is uncertain whether the complexity of this feature is worth its benefit.

2.6 Interaction with DOM Manipulations

@@@ Issue We need to define what it means to animate an attribute that has been changed by scripting or by another DOM client while the <animate> is active. This involves some implementation issues. Some alternatives: changing an attribute with script cancels the animation; or changing an attribute simply changes the "initial state" of that attribute, and the animation proceeds as if the attribute had started out with that value.

2.7 Animation Elements as Independent Timelines

By default, the target of an animation element will be the closest ancestor for which the manipulated attribute is defined. However, the target may be any (@@@??) element in the <body> of the document, identified by its element id. [@@@Should be limited to elements which are known when the animation begins, or perhaps to those known when the animation is encountered in the text -- should be similar to other limitations on idrefs. Probably no forward references past the point in document loading at which playback starts.]

An animation element affects its target only if both are active at the same time. The calculation of the target attribute at a given moment in time uses the animation element's timeline (current position on its timeline and simple duration) to compute the new value of the animated attribute of the target.

For example, in the following animation the image repeatedly moves 100 pixels down, from 0 to 100, and jumps back to the top. The 10 second animation begins 5 seconds before the target element. So, the target appears at 50, moves down for 5 seconds to 100, jumps back to the top, and goes into a series of 10-second motions from 0 to 100.

<par>
   <img id="a"         begin="5s" .../>
   <animate target="a" begin="0s" dur="10s" repeat="indefinite"
                       attribute="top" from="0" to="100"/>
</par>

Note that in this example, the animation is running before the target exists, so it cannot be a child of the target. It must explicitly identify the target.

This is very useful for starting part of the way into spline-based paths, as splines are hard to split.
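The timing of the example above can be checked with a small Python sketch. It is illustrative only: the function name top_at and its parameters are assumptions, the animation is assumed to interpolate linearly and repeat indefinitely, and the target is assumed never to end.

```python
def top_at(t, target_begin=5.0, anim_begin=0.0, dur=10.0,
           from_value=0.0, to_value=100.0):
    """Return the animated 'top' value at document time t (seconds),
    or None while the animation or the target is not yet active."""
    if t < target_begin or t < anim_begin:
        return None
    # Position on the animation's own timeline, wrapped by the repeat.
    simple_time = (t - anim_begin) % dur
    return from_value + (to_value - from_value) * simple_time / dur
```

The value is computed from the animation's timeline, not the target's, which is why the image first appears at 50 rather than at 0.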

2.8 Limits on Animation

The definitions in this module could be used to animate any attribute. However, it is expected that host languages will constrain what elements and attributes animation may be applied to. For example, we do not expect that most host languages will support animation of the src attribute of a media element. A host language which included a DOM might limit animation to the attributes which may be modified through the DOM.

Any attribute of any element not specifically excluded from animation by the host language may be animated, as long as the underlying datatype supports discrete values (for discrete animation) or addition (for additive animation).

3. Animation Syntax

This section defines the XML animation elements and attributes. It is the normative form for syntax questions. See Animation Semantics for semantic definitions; all discussion of semantics in this section is informative.

3.1 Common Attributes

All animation elements use the common timing markup described in the SMIL Timing and Synchronization module.  In addition, animation elements share attributes to control composition, and to describe the calculation mechanism.

Common attributes:

additive
Controls whether or not the animation is additive. Possible values are "true" and "false". Default is "false", unless another attribute is used which specifies otherwise. This attribute is ignored if the target attribute does not support addition.
accumulate
Controls whether or not the animation is cumulative. Possible values are "true" and "false". Default is "false". This attribute is ignored if the target attribute does not support addition.
hold
Controls whether or not the effect of the animation when the animation element ends is applied to the target attribute until the target element ends. Possible values are "true" and "false". Default is "false".
attribute
This attribute specifies the attribute to animate.  This is the name of an XML attribute. Any constraints on attributes which may be animated must be specified in the definition of the host language.
target [@@@Issue: should we change this to "element"? Would that be clearer?]
This attribute specifies the element to be animated. The value is the XML identifier of the target element. It defaults to the parent element if no id is specified. If the target does not support the named attribute, the animation element has no effect.
         @@@Issue: The id mechanism is important to support timed sets of animations (e.g. sequences) that introduce parent nodes between the animated element and the animation behavior. XPointer/Xpel syntax may be preferable (since it is a uniform mechanism already supported by future XML processors) to new attributes called target and attribute.

3.2 Animate Element

The <animate> element introduces a generic attribute animation that requires no semantic understanding of the attribute being animated.  It can animate numeric scalars as well as numeric vectors. It can also animate discrete sets of non-numeric attributes.

attributes

calcMode
Specifies the interpolation mode for the animation. This can take any of the following values:
 
"discrete"
This specifies that the animation function will jump from one value to the next without any interpolation.
"linear"
Simple linear interpolation between values is used to calculate the animation function. Treated as "discrete" if the attribute does not support linear interpolation. This is the default calcMode.
"spline"
As for linear, but interpolating from one value in the values list to the next by the amount of elapsed time defined by a cubic Bezier spline. The knots of the spline must be specified in the values attribute.  A values array must be specified. @@@[Issue] Syntax of values attribute to specify splines is TBD. Needs to cover both open & closed paths.

There are two ways to define the animation function: lists of values or from-to-by.

list of values specification of animation function

The basic form is to provide a list of values:

values attribute of the <animate> element:
A space-separated list of one or more values, compatible with the target attribute and interpolation mode, and agreeing with the value syntax of the host language. Vector-valued attributes are supported using the vector syntax of the host language. [@@@Issue, Patrick] Space separation may be a pain for some types of values. I would describe some explicit separator, and would add that parens around the values are legal, and ignored. In particular, consider a list of triplets for something like HSL or x,y,z values.

The values array and calcMode together define the animation function. For discrete animation, the duration is divided into even time periods, one per value. The animation function takes on the values in order, one value for each time period. For linear animation, the duration is divided into n-1 even periods, and the animation function is a linear interpolation between the values at the associated times. Note that a linear animation will be a nicely closed loop if the first value is repeated as the last.
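The discrete and linear cases described above can be sketched as follows. This is an informative Python sketch under stated assumptions: the function name animation_value is hypothetical, values are plain numbers, time is clamped to the simple duration, and spline interpolation is left out because its syntax is still to be defined.

```python
def animation_value(t, dur, values, calc_mode="linear"):
    """Evaluate the animation function at time t within duration dur."""
    n = len(values)
    if n == 0:
        raise ValueError("values must contain at least one value")
    progress = min(max(t / dur, 0.0), 1.0)
    if calc_mode == "discrete":
        # n even periods, one value per period.
        index = min(int(progress * n), n - 1)
        return values[index]
    if calc_mode == "linear":
        if n == 1:
            return values[0]
        # n-1 even periods, interpolating between neighbouring values.
        position = progress * (n - 1)
        index = min(int(position), n - 2)
        fraction = position - index
        return values[index] + (values[index + 1] - values[index]) * fraction
    raise ValueError("unsupported calcMode: " + calc_mode)
```

For instance, a linear animation over the values 0, 100, 0 returns to its starting value at the end of the duration, which is the closed loop mentioned above.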

from/to/by specification of animation function

For convenience, the values for a simple discrete or linear animation may be specified using a from/to/by notation, replacing the values and additive attributes. From is optional in all cases. To or by (but not both) must be specified. If a values attribute or an additive attribute is specified, none of these three attributes may be specified. [@@@Issue] Need to specify behavior in error cases.

from attribute of the <animate> element:
Optional. Specifies the starting value of the animation.  If from is specified, its value must match the to or by type. If the from value is not specified, zero is used for additive animations.
to attribute of the <animate> element:
Specifies the ending value of the animation.  The argument value must match the attribute type. Specifies a non-additive animation with 2 elements in the values array, the value of the from attribute and the value of the to attribute. The from attribute defaults to the base value of the attribute for the animation when to is used.
by attribute of the <animate> element:
Specifies a relative offset value for the animation. The host language must define addition of the by value to the target attribute, and the result must be compatible with the target attribute. Specifies an additive animation with two elements in the values array, from and by. The from attribute defaults to zero when by is used.

Animations expressed using from/to/by are equivalent to the same animation with from and to or by replaced by values. Examples of equivalent <animate> elements:

from/to/by form                     values form
<animate ... by="10"/>              <animate ... values="0 10" additive="true"/>
<animate ... from="5" by="10"/>     <animate ... values="5 15" additive="true"/>
<animate ... from="10" to="20"/>    <animate ... values="10 20" additive="false"/>
<animate ... to="10"/>              <animate ... values="b 10" additive="false"/>,
                                    where b is the base value for the animation.
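The equivalences listed above can be expressed as a small translation function. This Python sketch is informative only: the name to_values_form and the base parameter are assumptions, and raising an error when both or neither of to and by are given is a choice made for the sketch, since the behavior in error cases is not yet specified.

```python
def to_values_form(from_=None, to=None, by=None, base=None):
    """Translate from/to/by into a (values, additive) pair.

    base stands for the base value of the attribute; it is only
    used for the 'to' form when 'from' is omitted.
    """
    if (to is None) == (by is None):
        raise ValueError("exactly one of 'to' or 'by' must be specified")
    if by is not None:
        start = 0 if from_ is None else from_  # from defaults to zero
        return [start, start + by], True
    start = base if from_ is None else from_  # from defaults to the base value
    return [start, to], False
```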


3.3 Set Element

The <set> element is a convenience form of the <animate> element.  It supports all attribute types, including those that cannot reasonably be interpolated and that more sensibly support the semantics of setting a value over the specified duration (e.g. strings and boolean values). The <set> element is non-additive.  While <set> supports the general set of timing attributes, the effect of the "repeat" attribute is just to extend the defined duration. In addition, using fill="freeze" will have largely the same effect as an indefinite duration.

The <set> element takes the "attribute" and "target" attributes from the generic attribute list described above, as well as the following:

to
Specifies the value for the attribute during the duration of the <set> element. The original value of the attribute (before the animation started) is restored when the animation completes (i.e. ends).  The argument value must match the attribute type. This can be used with any attribute, but it is primarily intended for use with String and Boolean attributes.

Formally, <set ... to=z .../> is defined as <animate ... calcMode="discrete" additive="false" values=z .../>.

3.4 Motion Animation

[@@@ Issue] The WG does not agree on the inclusion of this element in SMIL. This would be a very reasonable extension in other host languages, and there is value in a standardized motion animation element. We are interested in feedback from others who are defining potential host languages.

In order to abstract the notion of motion paths across a variety of layout mechanisms, we introduce the <move> element. This takes all the attributes of <animate> described above, as well as two additional attributes:

origin
Specifies the origin of motion for the animation. The default origin for an element is relative to the parent container, and is generally relative to the top left of the parent (although this depends upon the layout model in the document). However, it is often useful to place the origin at the position of the element as it is laid out. This allows for motion relative to the default layout position (e.g. from off screen left to the layout position, specified as from="(-100, 0)" and to="(0, 0)"). This is especially useful for flow-layout models like HTML and CSS.

Legal values are:
parent
The origin is the top left of the parent layout container. This is the default.
layout
The origin is the default layout position of the element being animated, as set by the layout manager.
path
Specifies the curve that describes the attribute value as a function of time. @@@ The path syntax is TBD. Should address the issue of mapping time to control points, so that control points need not be spaced evenly in time.
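The effect of the two origin values can be sketched as follows. This Python sketch is informative only: the function name resolve_position and the layout_position parameter are assumptions, and positions are assumed to be (x, y) pairs measured from the top left of the parent layout container.

```python
def resolve_position(offset, origin="parent", layout_position=(0, 0)):
    """Return the element position relative to the parent's top left.

    offset is the (x, y) value produced by the motion animation;
    layout_position is where the layout manager placed the element.
    """
    if origin == "parent":
        return offset
    if origin == "layout":
        return (layout_position[0] + offset[0],
                layout_position[1] + offset[1])
    raise ValueError("origin must be 'parent' or 'layout'")
```

With origin="layout", an animation from (-100, 0) to (0, 0) ends exactly at the position chosen by the layout manager, wherever that happens to be.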

3.5 Color Animation

[@@@ Issue] SYMM WG would like there to be a standard animation tag for color, e.g. interpreted in the HSL space. However, some of the members do not see this as part of our charter. We are interested in feedback from others who are defining potential host languages.

The following is one such possible definition:

In order to abstract the notion of color animation, we introduce the <colorAnim> element. This takes all the generic attributes described above, supporting string values as well as RGB values for the individual argument values.  The animation of the color is defined to be in HSL space. [@@@ need to explain why & interaction with RGB values -- examples. Might want rgb-space animation for improved performance when it's "good enough" for the author].  This element takes one additional attribute as well:
direction
This specifies the direction to run through the colors, relative to the standard color wheel. If the to and from are the same values and clockwise or cclockwise were specified, the animation will cycle full circle through the color wheel.
Legal values are:
clockwise
Animate colors between the from and to values in the clockwise direction on the color wheel. This is the default.
cclockwise
Animate colors between the from and to values in the counter-clockwise direction on the color wheel.
nohue
Do not animate the hue, but only the saturation and level.  This allows for simple saturation animations, ignoring the hue and ensuring that it does not cycle.

We may need to support extensions to the path specification to allow the direction to be specified between each pair of color values in a path specification.  This would allow for more complex color animations specified as a path.
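One possible reading of the direction attribute is sketched below in Python. It is informative and rests on several assumptions that this draft does not settle: hue is measured in degrees, "clockwise" is taken to mean increasing hue angle, "nohue" is taken to hold the hue of the from value, and the function name hue_at is hypothetical.

```python
def hue_at(progress, from_hue, to_hue, direction="clockwise"):
    """Return the hue in degrees, in [0, 360), at progress 0..1."""
    if direction == "nohue":
        return from_hue % 360
    if direction == "clockwise":
        sweep = (to_hue - from_hue) % 360
        sign = 1
    elif direction == "cclockwise":
        sweep = (from_hue - to_hue) % 360
        sign = -1
    else:
        raise ValueError("unknown direction: " + direction)
    if sweep == 0:
        sweep = 360  # identical from and to: cycle the full wheel
    return (from_hue + sign * progress * sweep) % 360
```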

4. Animation Semantics

@@@ Need a section with precise mathematical definitions of animation semantics

5. Document Object Model Support

@@@Need DOM for animation elements.

Need to mention and point to DOM Core and SMIL DOM specs.  May want to discuss issues which host languages must specify: interaction between animation and DOM manipulations, the mechanism for determining property type, definition of addition.  Discussion of animating DOM attributes that are not exposed as XML attributes may belong here.

Related section: Interaction with DOM Manipulations

6. References

@@@Ed: Need to review
[HTML]
"HTML 4.0 Specification", D. Raggett, A. Le Hors, I. Jacobs, 24 April 1998.

Available at http://www.w3.org/TR/REC-html40.
[SMIL]
"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification W3C Recommendation 15-June-1998 ".

Available at: http://www.w3.org/TR/REC-smil.
[SMIL-DOM]
"SMIL Document Object Model", Nabil Layaïda, Patrick Schmitz, Jin Yu.

Available at http://www.w3.org/AudioVideo/Group/DOM/symm-dom (W3C members only).
[SMIL-MOD]
"Synchronized Multimedia Modules based upon SMIL 1.0", Patrick Schmitz, Ted Wugofski, Warner ten Kate.

Available at http://www.w3.org/TR/NOTE-SYMM-modules.
[SMIL-TIME]
... SMIL timing module, Patrick Schmitz, ...other editors.

Available at http://www.w3.org/@@@.
 
[SVG]
@@@ an appropriate public reference describing the activities of the Graphics WG and/or information about the SVG language.

Available at http://www.w3.org/@@@
[XML]
"Extensible Markup Language (XML) 1.0", T. Bray, J. Paoli, C.M. Sperberg-McQueen, editors, 10 February 1998.

Available at http://www.w3.org/TR/REC-xml

Appendix 1. Collected Issues to be resolved

In no particular order:

H. The SMIL Linking Module

Previous version:
http://www.w3.org/AudioVideo/Group/Linking/extended-linking-19990623 (W3C member only)
Editors:
Lloyd Rutledge, CWI (lloyd.rutledge@cwi.nl),
Philipp Hoschka, W3C (ph@w3.org)


1. Introduction

The SMIL linking module defines the user-initiated hyperlink elements that can be used in a SMIL document. It describes

XPointer [XPTR] allows components of XML documents to be addressed in terms of their placement in the XML structure rather than on their unique identifiers. This allows referencing of any portion of an XML document without having to modify that document. Without XPointer, pointing within a document may require adding unique identifiers to it, or inserting specific elements into the document, such as a named anchor in HTML. XPointers are put within the fragment identifier part of a URI.

XLink (XML Linking Language) [XLINK] defines a set of generic attributes that can be used when defining linking elements in an XML-encoded language. Using these generic XLink attributes has the advantage that users find the same syntactic constructs with the same semantics in many XML-based languages, resulting in a faster learning curve. It also enables generic link processors to process the hyperlinking semantics in XLink documents without understanding the details of the DTD. For example, it allows users of a generic XML browser to follow SMIL links.

Both XLink and XPointer are subject to change. At the time of this document's writing, neither is a full W3C recommendation. This document is based on the public Working Drafts ([XLINK], [XPTR]). It will change when these two formats change.

2. XPointer Support

2.1 Linking into SMIL documents

SMIL 1.0 allowed authors to start playback of a SMIL presentation at a particular element rather than at the beginning by using a URI with a fragment identifier, e.g. "doc#test", where "test" was the value of an element identifier in the SMIL document "doc". This meant that only elements with an "id" attribute could be the target of a link.

The SMIL Linking module defined in this specification allows any element in a SMIL document to be used as the target of a link. SMIL software must fully support the use of XPointers for fragment identifiers in URIs pointing into SMIL documents.

Example:

The following URI selects the 4th par element of an element called "bar":

http://www.w3.org/foo.smil#id("bar").child(4,par)

Note that XPointer only allows navigating in the XML document tree, i.e. it does not actually understand the time structure of a SMIL document.

Error handling

When a link into a SMIL document contains an unresolvable XPointer ("dangling link") because it identifies an element that is not actually part of the document, SMIL software should ignore the XPointer, and start playback from the beginning of the document.

When a link into a SMIL document contains an XPointer which identifies an element that is the content of a "switch" element, SMIL software should interpret this link as going to the parent "switch" element instead. The result of the link traversal is thus to play the "switch" element child that passes the usual switch child selection process.
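The two error-handling rules above can be sketched in Python. This is an informative sketch under simplifying assumptions: only lookup by "id" is modelled rather than full XPointer resolution, "content of a switch" is read as a direct child, and the names resolve_link_target and DOC are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_link_target(root, element_id):
    """Return the element at which playback should start.

    Returns root (play from the beginning) for a dangling link, and
    the parent "switch" when the addressed element is its child.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter():
        if element.get("id") == element_id:
            parent = parents.get(element)
            if parent is not None and parent.tag == "switch":
                return parent
            return element
    return root


DOC = ET.fromstring(
    '<smil><body>'
    '<par id="intro"/>'
    '<switch id="sw"><audio id="en"/><audio id="fr"/></switch>'
    '</body></smil>'
)
```

A link to "fr" therefore resolves to the enclosing switch, and the usual switch child selection decides which audio element is actually played.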

2.2 Use of XPointer in SMIL attributes

The use of XPointer is not restricted to XLink attributes. Any attribute specifying a URI can use an XPointer (unless, of course, prohibited for that attribute's document set).

XPointer can be used in various SMIL attributes which refer to XML components in the same SMIL document or in external XML documents. These include

3. Link Elements

3.1 The a Element

The "a" element has the same syntax and semantics as the SMIL 1.0 "a" element. All SMIL 1.0 attributes can still be used. The following lists attributes that are newly introduced by this specification, and attributes that are extended with respect to SMIL 1.0:

actuate
The value of this XLink attribute is fixed to "user". This indicates that traversal of this link is triggered by an external event, typically by user interaction.
behavior
This XLink attribute controls the temporal behavior of the presentation containing the link when the link is traversed. It can have the following values:
The default value of the "behavior" attribute depends on the value of the "show" attribute.
href
This XLink attribute is equivalent to the SMIL 1.0 "href" attribute.
inline
The value of this attribute is fixed to "true" (since the content of the "a" element is the local resource of the link).
sourceVolume
This attribute sets the volume of audio media objects in the presentation containing the link when the link is followed. Ignored if the presentation does not contain audio media objects. This attribute can have the same values as the "volume" property in CSS2. [CSS2]
destinationVolume
This attribute sets the volume of audio media contained in the remote resource. Ignored if the remote resource does not contain audio media. This attribute can have the same values as the "sourceVolume" attribute.
destinationPlaystate
This attribute controls the temporal behavior of the resource identified by the href attribute when the link is followed. It only applies when this resource is a continuous media object. It can have the same values as the "behavior" attribute.
show
This XLink attribute specifies how to handle the current state of the presentation at the time in which the link is activated. The following values are allowed:
The default value of "show" is "replace".
tabindex
This attribute provides the same functionality as the "tabindex" attribute in HTML 4.0 [HTML4]. It specifies the position of the element in the tabbing order for the current document. The tabbing order defines the order in which elements will receive focus when navigated by the user via the keyboard. At any particular point in time, only elements with an active timeline are taken into account for the tabbing order; inactive elements are ignored.
target
This attribute defines either in which existing display environment the link should be opened (e.g. a SMIL region, an HTML frame or another named window), or triggers opening a new display environment. Its value is the identifier of the display environment. If no currently active display environment has this identifier, a new display environment is opened and assigned the identifier of the target. When a presentation uses different types of display environments (e.g. SMIL regions and HTML frames), the namespace for identifiers is shared between these different types of display environments. For example, one cannot use a "target" attribute with the value "foo" twice in a document, and have it point once to an HTML frame, and then to a SMIL region. If the element has both a "show" attribute and a "target" attribute, the "show" attribute is ignored.
@@ linking into "excl" needs to be resolved
title
This XLink attribute provides human-readable text describing the link. It has the same significance as in SMIL 1.0. It is now also recognized as the XLink title attribute, whose semantics are consistent with that of the "title" attribute in SMIL 1.0.
xml:link
The value of this XLink attribute is fixed to "simple". This establishes the element as being an XLink simple link element. XLink processors will recognize this attribute assignment and know to process this element as an XLink simple link. Because the attribute is fixed in the DTD, it is not assigned in any SMIL document instance, and authors need not be aware of its use.

All XLink attributes not mentioned in the list above are not allowed in SMIL.
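The time-dependent tabbing order described for the "tabindex" attribute can be sketched as follows. This Python sketch is informative only: the dictionary layout and the name tabbing_order are assumptions, an element is assumed to be active from its begin time up to but not including its end time, and the ordering follows the HTML 4.0 rule that positive values come first in increasing order, followed by elements with value 0 in document order.

```python
def tabbing_order(elements, now):
    """Return element names in tabbing order at time `now`.

    elements is a list of dicts in document order with the keys
    'name', 'tabindex', 'begin' and 'end' (times in seconds).
    """
    active = [e for e in elements if e["begin"] <= now < e["end"]]
    positive = [e for e in active if e["tabindex"] > 0]
    zero = [e for e in active if e["tabindex"] == 0]
    # sorted() is stable, so equal tabindex values keep document order.
    ordered = sorted(positive, key=lambda e: e["tabindex"]) + zero
    return [e["name"] for e in ordered]


LINKS = [
    {"name": "a", "tabindex": 2, "begin": 0, "end": 10},
    {"name": "b", "tabindex": 0, "begin": 0, "end": 10},
    {"name": "c", "tabindex": 1, "begin": 5, "end": 10},
    {"name": "d", "tabindex": 2, "begin": 0, "end": 4},
]
```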

Element Content

No changes to SMIL 1.0.

3.2 The area Element

This element extends the syntax and semantics of the HTML 4.0 "area" element with constructs required for timing. The SMIL 1.0 "anchor" element is deprecated in favor of "area".

The  "area" element can have the attributes listed below, with the same syntax and semantics as in HTML 4.0:

The following lists attributes that are newly introduced by this specification, and attributes that are extended with respect to HTML 4.0:

actuate
Defined in Section on "a" element.
begin
Defined in "SMIL Timing and Synchronization" module.
behavior
Defined in Section on "a" element.
inline
The value of this attribute is fixed to "false" ("area" elements are out-of-line links, since they do not have element content, and thus do not have a local resource as defined by XLink).
dur
Defined in "SMIL Timing and Synchronization" module.
end
Defined in "SMIL Timing and Synchronization" module.
sourceVolume
Defined in Section on "a" element.
destinationVolume
Defined in Section on "a" element.
destinationPlaystate
Defined in Section on "a" element.
show
Defined in Section on "a" element.
tabindex
Defined in Section on "a" element.
target
Defined in Section on "a" element.
title
Defined in Section on "a" element.
xml:link
If the "area" element contains an "href" attribute, this attribute must be present and set to "simple" if the "area" element should be interpreted as an XLink. It must not be present if there is no "href" attribute. (Explanation: SMIL uses "anchor" elements without href to e.g. allow jumping into a video. However, an "area" element without href is not an XLink. The disadvantage is that when "area" is used to define a link, it must have an explicit xml:link attribute (see examples below).)

Element Content

An "area" element can contain "seq" and "par" elements for scheduling other "area" elements over time.

seq
Defined in Timing module.
When used in the content of an "area" element, a "seq" element may only contain "seq" and "par" elements, and none of the other elements that can be used when a "seq" element is used outside of an "area" element.
par
Defined in Timing module.
When used in the content of an "area" element, a "par" element may only contain "seq" and "par" elements, and none of the other elements that can be used when a "par" element is used outside of an "area" element.

Examples

1) Decomposing a video into temporal segments

In the following example, the temporal structure of an interview in a newscast (a camera shot of the interviewer asking a question, followed by a shot of the interviewee answering) is exposed by fragmentation:

<smil>
  <body>
    <video src="video" title="Tom Cruise interview 1995" >
      <seq>
        <area dur="20s" title="first question" /> 
        <area dur="50s" title="first answer" />
      </seq>
    </video>
  </body>
</smil>

2) Associating links with spatial segments

In the following example, the screen space taken up by a video clip is split into two sections. A different link is associated with each of these sections.

<smil>
  <body>
    <video src="video" title="Tom Cruise interview 1995" >
      <area shape="rect" coords="5,5,50,50" 
            title="Journalist" href="http://www.cnn.com" xml:link="simple" />
      <area shape="rect" coords="5,60,50,50" 
            title="Tom Cruise" href="http://www.brando.com" xml:link="simple" />
    </video>
  </body>
</smil>

3) Associating links with temporal segments

In the following example, the duration of a video clip is split into two sub-intervals. A different link is associated with each of these sub-intervals.

<smil>
  <body>
    <video src="video" title="Tom Cruise interview 1995" >
      <seq> 
        <area dur="20s" title="first question" 
              href="http://www.cnn.com" xml:link="simple" />
        <area dur="50s" title="first answer" 
              href="http://www.brando.com" xml:link="simple" />
      </seq>
    </video>
  </body>
</smil>

References

[CSS2]
"Cascading Style Sheets, level 2 (CSS2) Specification", Bert Bos, Håkon Wium Lie, Chris Lilley and Ian Jacobs, editors, 12 May 1998. Available at http://www.w3.org/TR/REC-CSS2/.
[HTML4]
"HTML 4.0 Specification", Dave Raggett, Arnaud Le Hors and Ian Jacobs, editors, 18 December 1997, revised 24 April 1998. Available at http://www.w3.org/TR/REC-html40/.
[XPTR]
Eve Maler and Steve DeRose, editors. XML Pointer Language (XPointer) V1.0. ArborText, Inso, and Brown University. Burlington, Seekonk, et al.: World Wide Web Consortium, 1998. Available at http://www.w3.org/TR/WD-xptr.
[XLINK]
Eve Maler and Steve DeRose, editors. XML Linking Language (XLink) V1.0. ArborText, Inso, and Brown University. Burlington, Seekonk, et al.: World Wide Web Consortium, 1998. Available at http://www.w3.org/TR/WD-xlink.

I. The SMIL Media Object Module

Previous version:
http://www.w3.org/AudioVideo/Group/Media/extended-media-object-19990713 (W3C members only)
Editors:
Philipp Hoschka, W3C (ph@w3.org),
Rob Lanphier (robla@real.com)


1 Introduction

This Section defines the SMIL media object module. This module contains elements and attributes for describing media objects. Since these elements and attributes are defined in a module, designers of other markup languages can reuse the SMIL media module when they need to include media objects in their language.

Changes with respect to the media object elements in SMIL 1.0 include changes required by basing SMIL on XLink [XLINK], and changes that provide additional functionality that was brought up as Requirements in the Working Group.

2 The ref, animation, audio, img, video, text and textstream elements

These elements can contain all attributes defined for media object elements in SMIL 1.0 with the changes described below, and the additional attributes described below.

2.1 Changes to SMIL 1.0 Attributes

clipBegin, clipEnd, clip-begin, clip-end

Using attribute names with hyphens such as "clip-begin" and "clip-end" is problematic when using a scripting language and the DOM to manipulate these attributes. Therefore, this specification adds the attribute names "clipBegin" and "clipEnd" as an equivalent alternative to the SMIL 1.0 "clip-begin" and "clip-end" attributes. The attribute names with hyphens are deprecated. Software supporting SMIL Boston must be able to handle all four attribute names, whereas software supporting only the SMIL media object module does not have to support the attribute names with hyphens. If an element contains both the old and the new version of a clipping attribute, the attribute that occurs later in the text is ignored.

Example:

<audio src="radio.wav" clip-begin="5s" clipBegin="10s" />

The clip begins at second 5 of the audio, and not at second 10, since the "clipBegin" attribute is ignored.
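The precedence rule can be sketched in a few lines of Python. This is an informative sketch: the function name effective_clip_begin is hypothetical, and the attributes are assumed to be available as (name, value) pairs in source order, which generic XML parsers do not necessarily expose.

```python
def effective_clip_begin(attributes):
    """Return the clip-begin value that is used, or None.

    attributes is a list of (name, value) pairs in the order in
    which they appear in the start tag.
    """
    for name, value in attributes:
        if name in ("clipBegin", "clip-begin"):
            return value  # the first occurrence wins; later ones are ignored
    return None
```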

The syntax of legal values for these attributes is defined by the following BNF:

Clip-value        ::= [ Metric "=" ] ( Clock-val | Smpte-val ) |
                      "name" "=" name-val
Metric            ::= Smpte-type | "npt"
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Smpte-val         ::= Hours ":" Minutes ":" Seconds
                      [ ":" Frames [ "." Subframes ]]
Hours             ::= Digit Digit
                  /* see XML 1.0 for a definition of 'Digit' */
Minutes           ::= Digit Digit
Seconds           ::= Digit Digit
Frames            ::= Digit Digit
Subframes         ::= Digit Digit
name-val          ::= ([^<&"] | [^<&'])*
                  /* Derived from BNF rule [10] in [XML]
                     Whether single or double quotes are
                     allowed in a name value depends on which
                     type of quotes is used to quote the
                     clip attribute value */
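A reader for clip values along the lines of this grammar might look like the Python sketch below. It is informative only and makes several assumptions: the metric prefix may be omitted, as in the clipBegin="10s" example above; Clock-val is not defined in this section, so any value that does not have the Smpte-val shape is returned unvalidated and labelled "clock"; and the name parse_clip_value and the returned tuple layout are hypothetical.

```python
import re

_METRICS = ("smpte", "smpte-30-drop", "smpte-25", "npt")
# Hours ":" Minutes ":" Seconds [ ":" Frames [ "." Subframes ]]
_SMPTE_VAL = re.compile(r"\d\d:\d\d:\d\d(?::\d\d(?:\.\d\d)?)?")


def parse_clip_value(text):
    """Split a clip value into (kind, metric, value).

    kind is "name", "smpte" or "clock"; metric is None when omitted.
    """
    prefix, sep, rest = text.partition("=")
    if sep and prefix == "name":
        return ("name", None, rest)
    if sep:
        if prefix not in _METRICS:
            raise ValueError("unknown metric: " + prefix)
        metric, value = prefix, rest
    else:
        metric, value = None, text  # no metric given
    kind = "smpte" if _SMPTE_VAL.fullmatch(value) else "clock"
    return (kind, metric, value)
```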

This implies the following changes to the syntax defined in SMIL 1.0:

Handling of new syntax in SMIL 1.0 software

Authors can use two approaches for writing SMIL Boston presentations that use the new clipping syntax and functionality ("name", default metric) defined in this specification, but can still be handled by SMIL 1.0 software.

First, authors can use non-hyphenated versions of the new attributes that use the new functionality, and add SMIL 1.0 conformant clipping attributes later in the text.

Example:

<audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1" 
       clip-begin="0s" clip-end="3:50" />

SMIL 1.0 players implementing the recommended extensibility rules of SMIL 1.0 [SMIL] will ignore the clip attributes using the new functionality, since they are not part of SMIL 1.0. SMIL Boston players, in contrast, will ignore the clip attributes using SMIL 1.0 syntax, since they occur later in the text.

The second approach is to use the following steps:

  1. Add a "system-required" test attribute to media object elements using  the new functionality. The value of the "system-required" attribute must be the URI of this specification, i.e. @@ http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707
  2. Add an alternative version of the media object element that conforms to SMIL 1.0
  3. Include these two elements in a "switch" element

Example:

<switch>
  <audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1"  
   system-required=
     "@@http://www.w3.org/AudioVideo/Group/Media/extended-media-object19990707" />
  <audio src="radio.wav" clip-begin="0s" clip-end="3:50" />
</switch>
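
The selection made by a player in this second approach can be sketched as follows. The sketch relies on the SMIL 1.0 rule that a "switch" selects its first acceptable child. The function name "select_switch_child" and the URI in BOSTON_URI are placeholders chosen for the example and are not defined by this specification.

```python
import xml.etree.ElementTree as ET

# Placeholder URI standing in for the URI of this specification.
BOSTON_URI = "http://example.org/smil-boston-media-object"


def select_switch_child(switch, supported_extensions):
    """Return the first child whose "system-required" test passes.

    A child without the test attribute is always acceptable. Returns
    None when no child is acceptable.
    """
    for child in switch:
        required = child.get("system-required")
        if required is None or required in supported_extensions:
            return child
    return None


switch = ET.fromstring(
    "<switch>"
    '<audio src="radio.wav" clipBegin="name=song1" clipEnd="name=moderator1" '
    f'system-required="{BOSTON_URI}" />'
    '<audio src="radio.wav" clip-begin="0s" clip-end="3:50" />'
    "</switch>"
)

# A player that supports the extension picks the first element;
# a player that does not falls through to the SMIL 1.0 alternative.
boston_choice = select_switch_child(switch, {BOSTON_URI})
smil10_choice = select_switch_child(switch, set())
```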

alt, longdesc

If the content of these attributes is read by a screen reader, the presentation should be paused while the text is read out, and resumed afterwards.

New Accessibility Attributes

readIndex
This attribute specifies the position of the current element in the order in which longdesc and alt text are read out by a screen reader for the current document. This value must be a number between 0 and 32767. User agents should ignore leading zeros. The default value is 0.
Elements that contain alt or longdesc attributes are read by a screen reader according to the following rules:

2.2 XLink Attributes

To make SMIL 1.0 media object elements XLink-conformant, the attributes defined in the XLink specification are added as described below.

Note: Due to a limitation in the current XLink draft, only the "src" attribute is treated as an XLink locator; the "longdesc" attribute is treated as a non-XLink linking mechanism (as allowed in Section 8 of the XLink draft). See Appendix for an XLink-conformant equivalent of SMIL 1.0 elements that contain a "longdesc" attribute.

actuate
The value of this attribute is fixed to "auto", i.e. the link is followed automatically.
behavior
This attribute does not apply to simple-link media object elements.
content-role
This attribute does not apply, since media object elements are not inline links.
content-title
This attribute does not apply, since media object elements are not inline links.
inline
Defined in the XLink specification.
The value of this attribute is fixed to "false". SMIL media object elements are out-of-line links, since they do not have any content, and thus do not have a local resource as defined by XLink.
@@ since this is also a "simple link", this seems to be a "one-ended" link as described in Section 4.2 of the XLink draft (description there is not very clear)
role
@@ could be used to describe the role of the remote resource, i.e. the value of the "src" attribute. Can't think of a use case, so don't think this is needed
show
This attribute is defined in the XLink specification. Its value is fixed to "embed". The media object behaves in the same way as SMIL 1.0 media objects, i.e. the media object is inserted into the presentation.
src
Equivalent to the SMIL 1.0 "src" attribute. Remapped via XLink attribute remapping onto the XLink "href" attribute.
Note: Attribute remapping is costly when the document does not contain a DTD definition, because in this case, FIXED attributes need to be included explicitly. This means the author has to use the following syntax to be XLink conformant:
<smil>
  <body>
    <audio src="audio.wav" xml:attributes="href src" />
  </body>
</smil>
title
Equivalent to SMIL 1.0 "title" attribute.
xml:link (required)
This attribute is required for an element to be an XLink element. For simple media object elements, its value is fixed to "simple".
@@ same disadvantage for fixed attributes when DTD is missing as with "src" attribute.

2.3 SDP Attributes

When SMIL is used in conjunction with the Real Time Transport Protocol (RTP, [RFC1889]), which is designed for real-time delivery of media streams, a media client is required to have initialization parameters in order to interpret the RTP data. These are typically described in the Session Description Protocol (SDP, [RFC2327]). The SDP description can be delivered in the DESCRIBE portion of the Real Time Streaming Protocol (RTSP, [RFC2326]), or can be delivered as a file via HTTP.

Since SMIL provides a media description language which often references SDP via RTSP and can also reference SDP files via HTTP, a very useful optimization can be realized by merging parameters typically delivered via SDP into the SMIL document. Since retrieving a SMIL document constitutes one round trip, and retrieving the SDP descriptions referenced in the SMIL document constitutes another round trip, merging the media description into the SMIL document itself can save a round trip in a typical media exchange.  This round-trip savings can result in a noticeably faster start-up over a slow network link.

This applies particularly well to two primary usage scenarios:

(see also "The rtpmap element" below)

SDP-related Attributes

port
This attribute provides the RTP/RTCP port for a media object transferred via multicast. It is specified as a range, e.g., port="3456-3457" (this is different from "port" in SDP, where the second port is derived by an algorithm). Note: For transports based on UDP in IPv4, the value should be in the range 1024 to 65535 inclusive. For RTP compliance it should start with an even number. For applications where hierarchically encoded streams are being sent to a unicast address, it may be necessary to specify multiple port pairs. Thus, the range may contain more than two ports. This attribute is only interpreted if the media object is transferred via RTP and without using RTSP.
rtpformat
This field has the same semantics as the "fmt list" sub-field in an SDP media description. It contains a list of media format payload IDs. For audio and video, these will normally be a media payload type as defined in the RTP Audio/Video Profile (RFC 1890). When a list of payload formats is given, this implies that all of these formats may be used in the session, but the first of these formats is the default format for the session. For media payload types not explicitly defined as static types, the rtpmap element (defined below) may be used to provide a dynamic binding of media encoding to RTP payload type. The encoding names in the RTP AV Profile do not specify a complete set of parameters for decoding the audio encodings (in terms of clock rate and number of audio channels), and so they are not used directly in this field. Instead, the payload type number should be used to specify the format for static payload types, and the payload type number along with additional encoding information should be used for dynamically allocated payload types. This attribute is only interpreted if the media object is transferred via RTP.
transport
This attribute has the same syntax and semantics as the "transport" sub-field in an SDP media description. It defines the transport protocol that is used to deliver the media streams. The standard value for this field is "RTP/AVP", but alternate values may be defined by IANA. RTP/AVP is the IETF's Realtime Transport Protocol using the Audio/Video profile carried over UDP. The complete definition of RTP/AVP can be found in [RFC1890]. This attribute only applies if the media object is transferred via RTP.
@@ this may be better to derive from the "src" parameter, which could optionally be rtp://___. This would mean that an RTP URL format  would need to be defined.

Example

<audio src="rtsp://www.w3.org/test.rtp" port="49170-49171"
       transport="RTP/AVP" rtpformat="96,97,98" />
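
As an illustration of the constraints stated for the "port" attribute, the following Python sketch parses a port range and reports which of the recommendations it does not meet. The function names are hypothetical, and treating the recommendations as warnings rather than errors is a choice made for this sketch, based on the use of "should" in the text.

```python
def parse_port_range(value):
    """Parse a "port" attribute such as "49170-49171" into (first, last)."""
    first_text, separator, last_text = value.partition("-")
    if not separator:
        raise ValueError("port must be given as a range: %r" % value)
    first, last = int(first_text), int(last_text)
    if first > last:
        raise ValueError("port range is reversed: %r" % value)
    return first, last


def port_range_warnings(first, last):
    """List the recommendations from the text that the range does not meet."""
    warnings = []
    if first < 1024 or last > 65535:
        warnings.append("ports should be in the range 1024 to 65535")
    if first % 2 != 0:
        warnings.append("range should start with an even port number")
    return warnings
```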

Element Content

Media object elements can contain the following elements:

anchor
Defined in Linking Module
par
Defined in Timing Module
rtpmap
Defined below
seq
Defined in Timing Module

3 The rtpmap element

If the media object is transferred using the RTP protocol, and uses a dynamic payload type, SDP requires the use of the "rtpmap" attribute field. In this specification, this is mapped onto the "rtpmap" element, which is contained in the content of the media object element. If the media object is not transferred using RTP, this element is ignored.

Attributes

payload
The value of this attribute is a payload format type number listed in the parent element's "rtpformat" attribute. This is used to map dynamic payload types onto definitions of specific encoding types and necessary parameters.
encoding
This attribute encodes parameters needed to decode the dynamic payload type. The attribute values have the following syntax:
encoding-val    ::= encoding-name "/" clock-rate "/" encoding-params
encoding-name   ::= name-val
clock-rate      ::= Digit+
encoding-params ::= ??

Legal values for "encoding-name" are payload names defined in [RFC1890], and RTP payload names registered as MIME types [draft-ietf-avt-rtp-mime-00].
For audio streams, "encoding parameters" may specify the number of audio channels. This parameter may be omitted if the number of channels is one, provided no additional parameters are needed. For video streams, no encoding parameters are currently specified. Additional parameters may be defined in the future. Codec-specific parameters should not be added as encoding parameters, but should be defined as separate rtpmap attributes.

Element Content

"rtpmap" is an empty element.

Example

<audio src="rtsp://www.w3.org/foo.rtp" port="49170" 
       transport="RTP/AVP" rtpformat="96,97,98">
  <rtpmap payload="96" encoding="L8/8000" />
  <rtpmap payload="97" encoding="L16/8000" />
  <rtpmap payload="98" encoding="L16/11025/2" />
</audio>
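
To show how the "rtpmap" element relates to SDP, the following Python sketch splits an "encoding" value into its parts and produces the corresponding SDP "a=rtpmap" attribute line as defined in [RFC2327]. The function names are hypothetical. The sketch follows the prose and the example above in treating the encoding parameters as optional.

```python
def parse_encoding(value):
    """Split an encoding value into (name, clock_rate, params).

    params is None when the optional third part is omitted,
    e.g. for "L8/8000".
    """
    parts = value.split("/")
    if len(parts) not in (2, 3):
        raise ValueError("malformed encoding value: %r" % value)
    name = parts[0]
    clock_rate = int(parts[1])
    params = parts[2] if len(parts) == 3 else None
    return name, clock_rate, params


def rtpmap_to_sdp(payload, encoding):
    """Build the SDP attribute line for one rtpmap element."""
    name, clock_rate, params = parse_encoding(encoding)
    line = "a=rtpmap:%s %s/%d" % (payload, name, clock_rate)
    if params is not None:
        line += "/" + params
    return line


# The three rtpmap elements from the example above.
sdp_lines = [
    rtpmap_to_sdp("96", "L8/8000"),
    rtpmap_to_sdp("97", "L16/8000"),
    rtpmap_to_sdp("98", "L16/11025/2"),
]
```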

4 Support for media player extensions

A media object referenced by a media object element is often rendered by software modules, referred to as media players, that are separate from the software module that provides the synchronization between different media objects in a presentation (referred to as the synchronization engine).

Media players generally support varying levels of control, depending on the constraints of the underlying renderer as well as media delivery, streaming, etc. This specification defines four levels of support, allowing for increasingly tight integration and broader functionality. The details of the interface will be presented in a separate document.

Level 0
Must allow the synchronization engine to query for duration, and must support cue, start and stop on the player. To support reasonable resynchronization, the media player must provide pause/unpause controls with minimal latency. This is the minimum level of support defined.
Level 1
In addition to all Level 0 support, the media player can detect when sync has been broken, so that a resynchronization event can be fired. A media player that cannot support Level 1 functionality is responsible for maintaining proper synchronization in all circumstances, and has no remedy if it cannot (Level 1 support is recommended).
Level 2
In addition to all Level 1 support, the media player supports a tick() method for advancing the timeline in strict sync with the document timeline. This is generally appropriate to animation renderers that are not tightly bound to media delivery constraints.
Level 3
In addition to all Level 2 support, the media player also supports a query interface to provide information about its time-related capabilities. Capabilities include, for example, canRepeat, canPlayBackwards, canPlayVariable and canHold. This is mostly for future extension of the timing functionality and for optimization of media playback/rendering.
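
Because each level includes all support of the levels below it, a synchronization engine can decide what it may ask of a media player by comparing levels. The following Python sketch illustrates this. Since the details of the interface are left to a separate document, all names in it (SupportLevel, FEATURE_MIN_LEVEL, the feature strings and "supports") are hypothetical.

```python
from enum import IntEnum


class SupportLevel(IntEnum):
    """The four levels of media player support described above."""
    LEVEL_0 = 0
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3


# Lowest level at which each feature is available. The feature names
# are invented for this sketch.
FEATURE_MIN_LEVEL = {
    "query_duration": SupportLevel.LEVEL_0,
    "cue": SupportLevel.LEVEL_0,
    "start": SupportLevel.LEVEL_0,
    "stop": SupportLevel.LEVEL_0,
    "pause": SupportLevel.LEVEL_0,
    "unpause": SupportLevel.LEVEL_0,
    "detect_broken_sync": SupportLevel.LEVEL_1,
    "tick": SupportLevel.LEVEL_2,
    "query_capabilities": SupportLevel.LEVEL_3,
}


def supports(player_level, feature):
    """Return True if a player at player_level offers the feature.

    Levels are cumulative, so a simple comparison is sufficient.
    """
    return player_level >= FEATURE_MIN_LEVEL[feature]
```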

References

[draft-ietf-avt-rtp-mime-00]
"MIME Type Registration of RTP Payload Formats", Steve Casner and Philipp Hoschka, June 1999.
Available at ftp://ftpeng.cisco.com/casner/outgoing/draft-ietf-avt-rtp-mime-00.txt.
[RFC1889]
"RTP: A Transport Protocol for Real-Time Applications", Henning Schulzrinne, Steve Casner, Ron Frederick and Van Jacobson, January 1996. Available at ftp://ftp.isi.edu/in-notes/rfc1889.txt.
[RFC1890]
"RTP Profile for Audio and Video Conferences with Minimal Control", Henning Schulzrinne, January 1996.
Available at ftp://ftp.isi.edu/in-notes/rfc1890.txt.
[RFC2326]
"Real Time Streaming Protocol (RTSP)", Henning Schulzrinne, Anup Rao and Rob Lanphier, April 1998. Available at ftp://ftp.isi.edu/in-notes/rfc2326.txt.
[RFC2327]
"SDP: Session Description Protocol", M. Handley, V. Jacobson, April 1998. Available at ftp://ftp.isi.edu/in-notes/rfc2327.txt.
[SMIL]
"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification", Philipp Hoschka, editor, 15 June 1998. Available at http://www.w3.org/TR/REC-smil.
[XLINK]
"XML Linking Language (XLink) V1.0", Eve Maler and Steve DeRose, editors, 3 March 1998. Available at http://www.w3.org/TR/WD-xlink.
[XML]
"Extensible Markup Language (XML) 1.0", Tim Bray, Jean Paoli and C. M. Sperberg-McQueen, editors, 10 February 1998. Available at http://www.w3.org/TR/REC-xml.

L. SMIL Timing and Synchronization

Previous version: