O. Synchronized Multimedia Integration Language (SMIL) Document Object Model

Previous version:
http://www.w3.org/AudioVideo/Group/DOM/smil-dom-990721.html (W3C members only)
Patrick Schmitz (Microsoft),
Jin Yu (Compaq),
Nabil Layaïda (INRIA)


This is a working draft of a Document Object Model (DOM) specification for synchronized multimedia functionality. It is part of work in the Synchronized Multimedia Working Group (SYMM) towards a next version of the SMIL language and SMIL modules.  Related documents describe the specific application of this SMIL DOM for SMIL documents and for HTML and XML documents that integrate SMIL functionality.  The SMIL DOM builds upon the Core DOM functionality, adding support for timing and synchronization, media integration and other extensions to support synchronized multimedia documents.

1.0 Introduction

The first W3C Working Group on Synchronized Multimedia (SYMM) developed SMIL - Synchronized Multimedia Integration Language. This XML-based language is used to express synchronization relationships among media elements. SMIL 1.0 documents describe multimedia presentations that can be played in SMIL-conformant viewers.

SMIL 1.0 did not define a Document Object Model.  Because SMIL is XML based, the basic functionality defined by the Core DOM is available.  However, just as HTML and CSS have defined DOM interfaces to make it easier to manipulate these document types, there is a need to define a specific DOM interface for SMIL functionality.  The current SYMM charter includes a deliverable for a SMIL-specific DOM to address this need, and this document specifies the SMIL DOM interfaces.

Broadly defined, the SMIL DOM is an Application Programming Interface (API) for SMIL documents and XML/HTML documents that integrate SMIL functionality.  It defines the logical structure of documents and the way a document is accessed and manipulated.  This is described more completely in "What is the Document Object Model".

The SMIL DOM will be based upon the DOM Level 1 Core functionality.  This describes a set of objects and interfaces for accessing and manipulating document objects.  The SMIL DOM will also include the additional event interfaces described in the DOM Level 2 Events specification.  The SMIL DOM extends these interfaces to describe elements, attributes, methods and events specific to SMIL functionality. Note that the SMIL DOM does not include support for DOM Level 2 Namespaces, Stylesheets, CSS, Filters and Iterators, and Model Range specifications.

The SYMM Working Group is also working towards a modularization of SMIL functionality, to better support integration with HTML and XML applications.  Accordingly, the SMIL DOM is defined in terms of the SMIL modules.

2.0 Requirements

The design and specification of the SMIL DOM must meet the following set of requirements.

General requirements:

SMIL specific requirements

It is not yet clear what requirements the modularization of SMIL functionality will place on the SMIL DOM.  While the HTML Working Group is also working on modularization of XHTML, a modularized HTML DOM is yet to be defined.  In addition, there is no general mechanism yet defined for combining DOM modules for a particular profile.

3.0 Core DOM: the SMIL DOM Foundation

The SMIL DOM has as its foundation the Core DOM.  The SMIL DOM includes the support defined in the DOM Level 1 Core API, and the DOM Level 2 Events API.

3.1 DOM Level 1 Core

The DOM Level 1 Core API describes the general functionality needed to manipulate hierarchical document structures, elements and attributes.  The SMIL DOM describes functionality that is associated with or depends upon SMIL elements and attributes.  Where practical, we would like to simply inherit functionality that is already defined in the DOM Level 1 Core.  Nevertheless, we want to present an API that is easy to use, and familiar to script authors that work with the HTML and CSS DOM definitions.

Following the pattern of the HTML DOM, the SMIL DOM follows a naming convention for properties, methods, events, collections and data types.  All names are defined as one or more English words concatenated together to form a single string. The property or method name starts with the initial keyword in lowercase, and each subsequent word starts with a capital letter. For example, a method that converts a time on an element local timeline to global document time might be called "localToGlobalTime".

Properties and methods

In the ECMAScript binding, properties are exposed as properties of a given object. In Java, properties are exposed with get and set methods.

Most of the properties are directly associated with attributes defined in the SMIL syntax.  By the same token, most (or all?) of the attributes defined in the SMIL syntax are reflected as properties in the SMIL DOM.  There are also additional properties in the DOM that present aspects of SMIL semantics (such as the current position on a timeline).
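
The relationship between attributes and properties can be sketched as follows, using TypeScript to stand in for the ECMAScript binding. This is only an illustration under stated assumptions: the classes SketchElement and SketchMediaObject, the property currentLocalTime and the method advance are hypothetical names and are not defined by the SMIL DOM.

```typescript
// Hypothetical sketch: SketchElement stands in for the Core DOM Element and
// models only a flat attribute store.
class SketchElement {
  private attributeValues: { [attributeName: string]: string } = {};

  getAttribute(attributeName: string): string | null {
    const value = this.attributeValues[attributeName];
    return value === undefined ? null : value;
  }

  setAttribute(attributeName: string, value: string): void {
    this.attributeValues[attributeName] = value;
  }
}

// The "src" property reflects the "src" attribute. "currentLocalTime" is an
// assumed name for a property that presents timing state and has no attribute.
class SketchMediaObject extends SketchElement {
  private elapsedSeconds = 0;

  get src(): string {
    const value = this.getAttribute("src");
    return value === null ? "" : value;
  }

  set src(value: string) {
    this.setAttribute("src", value);
  }

  get currentLocalTime(): number {
    return this.elapsedSeconds;
  }

  // Stands in for the timing engine advancing the element timeline.
  advance(seconds: number): void {
    this.elapsedSeconds += seconds;
  }
}

const sketchVideo = new SketchMediaObject();
sketchVideo.src = "intro.mpg"; // writes through to the attribute
sketchVideo.advance(2.5);      // only the timing engine changes local time
```

In a Java binding the same two properties would presumably surface as getSrc(), setSrc() and getCurrentLocalTime() methods.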

The SMIL DOM methods support functionality that is directly associated with SMIL functionality (such as control of an element timeline).

Note that the naming follows the DOM standard for XML, HTML and CSS DOM.  This matches the HTML attribute naming scheme, but is in conflict with the SMIL 1.0 (and CSS) attribute naming conventions (all lowercase, with dashes between words).  Given that the DOM Level 2 CSS API follows the primary DOM naming conventions, I think we should as well.  Although this presents a naming conflict with the SMIL attributes (unless we reconsider attribute naming in the next version of SMIL), it presents a consistent DOM API.
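
The mapping between dashed SMIL 1.0 attribute names and DOM-style property names can be sketched as below. The two helper functions are hypothetical and exist only for illustration; the SMIL DOM does not define them.

```typescript
// Hypothetical helpers: not part of the SMIL DOM.
// Converts a SMIL 1.0 attribute name (all lowercase, words separated by
// dashes) into the DOM naming convention (first word lowercase, each
// subsequent word capitalized).
function attributeToPropertyName(attributeName: string): string {
  return attributeName.replace(/-([a-z])/g, (_match: string, letter: string) =>
    letter.toUpperCase()
  );
}

// Inverse mapping, from a DOM property name back to the dashed form.
function propertyToAttributeName(propertyName: string): string {
  return propertyName.replace(/[A-Z]/g, (letter: string) =>
    "-" + letter.toLowerCase()
  );
}

const skipContentProperty = attributeToPropertyName("skip-content"); // "skipContent"
const zIndexAttribute = propertyToAttributeName("zIndex");           // "z-index"
```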

Constraints on Core interfaces

In some instances, the SMIL DOM defines constraints on the Level 1 Core interfaces.  These are introduced to simplify the SMIL associated runtime engines.  The constraints include:

These constraints are defined in detail below.

This section will need to be reworked once we have a better handle on the approach we take (w.r.t. modality, etc.) and the details of the interfaces.

We probably also want to include notes on the recent discussion of a presentation or runtime object model as distinct from the DOM.

3.2 DOM Level 2 Event Model

One of the goals of DOM Level 2 Event Model is the design of a generic event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event. The SMIL event model includes the definition of a standard set of events for synchronization control and presentation change notifications, a means of defining new events dynamically, and the defined contextual information for these events.

3.2.1 SMIL and DOM events

The DOM Level 2 Events specification currently defines a base Event interface and three broad event classifications:

In HTML documents, elements generally behave in a passive (or sometimes reactive) manner, with most events being user-driven (mouse and keyboard events). In SMIL, all timed elements behave in a more active manner, with many events being content-driven. Events are generated for key points or state on the element timeline (at the beginning, at the end and when the element repeats). Media elements generate additional events associated with the synchronization management of the media itself.

The SMIL DOM makes use of the general UI and mutation events, and also defines new event types, including:

Some runtime platforms will also define new UI events, e.g. associated with a control unit for web-enhanced television (e.g. channel change and simple focus navigation events). In addition, media players within a runtime may also define specific events related to the media player (e.g. low memory).

The SMIL events are grouped into four classifications:

Static SMIL events
This is a group of events that are required for SMIL functionality. Some of the events have more general utility, while others are specific to SMIL modules and associated documents (SMIL documents as well as HTML and XML documents that integrate SMIL modules).
Platform and environment specific events
These events are not defined in the specification, but may be created and raised by the runtime environment, and may be referenced by the SMIL syntax.
Author-defined events
This is a very important class of events that are not specifically defined in the DOM, but that must be supported for some common use-case scenarios.  A common example is that of broadcast or streaming media with embedded triggers.  Currently, a media player exposes these triggers by calling script on the page.  To support purely declarative content, and to support a cleaner model for script integration, we allow elements to raise events associated with these stream triggers.  The events are identified by names defined by the author (e.g. "onBillWaves" or "onScene2").   Declarative syntax can bind to these events, so that some content can begin (or simply appear) when the event is raised. This is very important for things like Enhanced Television profiles, Enhanced DVD profiles, etc.
This functionality is built upon the DOM Level 2 Events specification.
Property mutation events
These are mutation events as defined in the DOM Level 2 Events specification.  These events are raised when a particular property is changed (either externally via the API, or via internal mechanisms).
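
As an illustration of author-defined events, the following sketch shows a media element raising events named by the author when it meets stream triggers, and content binding to one of them. It is not the DOM Level 2 Events API: the class SketchMediaElement and its methods addTriggerListener and raiseStreamTrigger are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch; none of these names are defined by the SMIL DOM.
type TriggerListener = (eventName: string) => void;

class SketchMediaElement {
  private listeners: { [eventName: string]: TriggerListener[] } = {};

  addTriggerListener(eventName: string, listener: TriggerListener): void {
    let registered = this.listeners[eventName];
    if (registered === undefined) {
      registered = [];
      this.listeners[eventName] = registered;
    }
    registered.push(listener);
  }

  // Called by the (imagined) media player when it meets an embedded trigger.
  raiseStreamTrigger(eventName: string): void {
    const registered = this.listeners[eventName] || [];
    registered.forEach((listener) => listener(eventName));
  }
}

const begunCaptions: string[] = [];
const broadcastVideo = new SketchMediaElement();

// Stands in for declarative syntax that begins some content on "onScene2".
broadcastVideo.addTriggerListener("onScene2", () => {
  begunCaptions.push("scene2-caption");
});

broadcastVideo.raiseStreamTrigger("onBillWaves"); // nothing is bound: no effect
broadcastVideo.raiseStreamTrigger("onScene2");    // the caption begins
```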

3.2.2 Event propagation support

In addition to defining the basic event types, the DOM Level 2 Events specification describes event flow and mechanisms to manipulate the event flow, including:

The SMIL DOM defines the behavior of Event capture, bubbling and cancellation in the context of SMIL and SMIL-integrated Documents.

In the HTML DOM, events originate from within the DOM implementation, in response to user interaction (e.g. mouse actions), to document changes or to some runtime state (e.g. document parsing).  The DOM provides methods to register interest in an event, and to control event capture and bubbling.  In particular, events can be handled locally at the target node or centrally at a particular node. This support is included in the SMIL DOM. Thus, for example, synchronization or media events can be handled locally on an element, or re-routed (via the bubbling mechanisms) to a parent element or even the document root. Event registrants can handle events locally or centrally.
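
The difference between local and central handling can be sketched with a minimal bubbling dispatcher. This is a simplified model, not the DOM Level 2 Events interfaces: capture and cancellation are omitted, and the class SketchTimedNode and its methods are hypothetical.

```typescript
// Hypothetical sketch of bubbling only; capture and cancellation are omitted.
type SketchHandler = (eventType: string, targetId: string) => void;

class SketchTimedNode {
  private handlers: { [eventType: string]: SketchHandler[] } = {};

  constructor(public id: string, public parentNode: SketchTimedNode | null) {}

  addHandler(eventType: string, handler: SketchHandler): void {
    let list = this.handlers[eventType];
    if (list === undefined) {
      list = [];
      this.handlers[eventType] = list;
    }
    list.push(handler);
  }

  // Delivers the event at the target first, then at each ancestor in turn.
  dispatch(eventType: string): void {
    let current: SketchTimedNode | null = this;
    while (current !== null) {
      const list = current.handlers[eventType] || [];
      list.forEach((handler) => handler(eventType, this.id));
      current = current.parentNode;
    }
  }
}

const bodyNode = new SketchTimedNode("body", null);
const parNode = new SketchTimedNode("par1", bodyNode);
const videoNode = new SketchTimedNode("video1", parNode);
const handledLog: string[] = [];

// Local handling on the media element itself.
videoNode.addHandler("onBegin", (_type, targetId) => {
  handledLog.push("local:" + targetId);
});
// Central handling at the document body for every descendant.
bodyNode.addHandler("onBegin", (_type, targetId) => {
  handledLog.push("central:" + targetId);
});

videoNode.dispatch("onBegin");
parNode.dispatch("onBegin");
```

Note that this sketch bubbles along the content containment graph. As the note below discusses, timing events might instead need to flow along the timing containment graph.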

Note: It is currently not resolved precisely how event flow (dispatch, bubbling, etc.) will be defined for SMIL timing events.  Especially when the timing containment graph is orthogonal to the content structure (e.g. in XML/SMIL integrated documents), it may make more sense to define timing event flow relative to the timing containment graph, rather than the content containment graph.  This may also cause problems, as different event types will behave in very different ways within the same document.

Note: It is currently not resolved precisely how certain user interface events (e.g. onmouseover, onmouseout) will be defined and will behave for SMIL documents. It may make more sense to define these events relative to the regions and layout model, rather than the timing graph.

4.0 Constraints imposed upon DOM

We have found that the DOM has utility in a number of scenarios, and that these scenarios have differing requirements and constraints.  In particular, we find that editing application scenarios require specific support that the browser or runtime environment typically does not require.  We have identified the following requirements that are directly associated with support for editing application scenarios as distinct from runtime or playback scenarios:

4.1 Document modality

Due to the time-varying behavior of SMIL and SMIL-integrated document types, we need to be able to impose different constraints upon the model depending upon whether the environment is editing or browsing/playing back.  As such, we need to introduce the notion of modality to the DOM (and perhaps more generally to XML documents).  We need a means of defining modes, of associating a mode with a document, and of querying the current document mode.

We are still considering the details, but it has been proposed to specify an active mode that is most commonly associated with browsers, and a non-active or editing mode that would be associated with an editing tool when the author is manipulating the document structure.

4.2 Node locking

Associated with the requirement for modality is a need to represent a lock or read-only qualification on various elements and attributes, dependent upon the current document mode.

For an example that illustrates this need within the SMIL DOM: To simplify runtime engines, we want to disallow certain changes to the timing structure in an active document mode (e.g. to preclude certain structural changes or to make some properties read-only).  However when editing the document, we do not want to impose these restrictions. It is a natural requirement of editing that the document structure and properties be mutable. We would like to represent this explicitly in the DOM specification.
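
A minimal sketch of how modality and locking might interact is given below. Everything in it is an assumption made for illustration: the mode names "active" and "editing" follow the proposal in 4.1, but the classes, method names and the use of a thrown error are hypothetical and are not defined by this specification.

```typescript
// Hypothetical sketch: the class names, method names and error text are assumptions.
type SketchDocumentMode = "active" | "editing";

class SketchModalDocument {
  mode: SketchDocumentMode = "editing";
}

class SketchLockableElement {
  private beginValue = 0;

  constructor(private owner: SketchModalDocument) {}

  getBegin(): number {
    return this.beginValue;
  }

  // The begin time is treated as read-only while the document is active.
  setBegin(seconds: number): void {
    if (this.owner.mode === "active") {
      throw new Error("begin is read-only in active mode");
    }
    this.beginValue = seconds;
  }
}

const modalDocument = new SketchModalDocument();
const lockedVideo = new SketchLockableElement(modalDocument);

lockedVideo.setBegin(5);         // allowed while editing
modalDocument.mode = "active";   // e.g. the presentation starts playing

let rejectedInActiveMode = false;
try {
  lockedVideo.setBegin(10);      // rejected: the node is locked in this mode
} catch (error) {
  rejectedInActiveMode = true;
}
```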

There is currently some precedent for this in HTML browsers.  E.g. within Microsoft Internet Explorer, some element structures (such as tables) cannot be manipulated while they are being parsed.  Also, many script authors implicitly define a "loading" modality by associating script with the document.onLoad event. While this mechanism serves authors well, it nevertheless underscores the need for a generalized model for document modality.

4.3 Grouped, atomic changes

A related requirement to modality support is the need for a simplified transaction model for the DOM. This would allow us to make a set of logically grouped manipulations to the DOM, deferring all mutation events and related notification until the atomic group is completed.  We specifically do not foresee the need for a DBMS-style transaction model that includes rollback and advanced transaction functionality.  We are prepared to specify a simplified model for the atomic changes.  For example, if any error occurs at a step in an atomic change group, the atomicity can be broken at that point.

As an example of our related requirements, we will require support to optimize the propagation of changes to the time-graph modeled by the DOM.  A typical operation when editing a timeline shortens one element of a timeline by trimming material from the beginning of the element.  The associated changes to the DOM require two steps:

Typically, a timing engine will maintain a cache of the global begin and end times for the elements in the timeline.  These caches are updated when a time that they depend on changes.  In the above scenario, if the timeline represents a long sequence of elements, the first change will propagate to the whole chain of time-dependents and recalculate the cache times for all these elements.  The second change will then propagate, recalculating the cache times again, and restoring them to the previous value.  If the two operations could be grouped as an atomic change, deferring the change notice, the cache mechanism will see no effective change to the end time of the original element, and so no cache update will be required.  This can have a significant impact on the performance of an application.

When manipulating the DOM for a timed multimedia presentation, the efficiency and robustness of the model will be greatly enhanced if there is a means of grouping related changes and the resulting event propagation into an atomic change.
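
The effect on the time cache can be sketched as follows. The sketch assumes that the two steps of the trim operation are a later begin followed by a correspondingly shorter duration, so that the end time of the element is unchanged overall. The class SketchSequence, its methods and the atomically grouping call are hypothetical. The sketch does not roll back on error; it simply ends the group, in line with the simplified model described above.

```typescript
// Hypothetical sketch: a sequence whose cached end times are recomputed on
// every change notice, unless the changes are grouped.
interface SketchClip { begin: number; dur: number; }

class SketchSequence {
  cacheUpdates = 0; // number of cached end times rewritten
  private cachedEnds: number[] = [];
  private groupDepth = 0;
  private dirtyIndices: number[] = [];

  constructor(private clips: SketchClip[]) {
    this.propagateFrom(0);
    this.cacheUpdates = 0; // do not count the initial build
  }

  endOf(index: number): number {
    return this.cachedEnds[index];
  }

  setBegin(index: number, seconds: number): void {
    this.clips[index].begin = seconds;
    this.noteChange(index);
  }

  setDur(index: number, seconds: number): void {
    this.clips[index].dur = seconds;
    this.noteChange(index);
  }

  // Runs a set of changes as one group, deferring the change notice.
  atomically(changes: () => void): void {
    this.groupDepth++;
    try {
      changes();
    } finally {
      this.groupDepth--;
      if (this.groupDepth === 0) {
        const pending = this.dirtyIndices.sort((a, b) => a - b);
        this.dirtyIndices = [];
        pending.forEach((index) => this.propagateFrom(index));
      }
    }
  }

  private noteChange(index: number): void {
    if (this.groupDepth === 0) {
      this.propagateFrom(index);
    } else if (this.dirtyIndices.indexOf(index) < 0) {
      this.dirtyIndices.push(index);
    }
  }

  // Recomputes end times from the changed clip onwards and stops as soon as
  // an end time turns out to be unchanged.
  private propagateFrom(index: number): void {
    for (let i = index; i < this.clips.length; i++) {
      const previousEnd = i === 0 ? 0 : this.cachedEnds[i - 1];
      const newEnd = previousEnd + this.clips[i].begin + this.clips[i].dur;
      if (newEnd === this.cachedEnds[i]) {
        return;
      }
      this.cachedEnds[i] = newEnd;
      this.cacheUpdates++;
    }
  }
}

function makeSketchClips(): SketchClip[] {
  return [{ begin: 0, dur: 10 }, { begin: 0, dur: 5 }, { begin: 0, dur: 5 }];
}

// Ungrouped: each step propagates down the whole chain.
const ungroupedSequence = new SketchSequence(makeSketchClips());
ungroupedSequence.setBegin(0, 2);
ungroupedSequence.setDur(0, 8);

// Grouped: the end of the first clip is unchanged, so nothing propagates.
const groupedSequence = new SketchSequence(makeSketchClips());
groupedSequence.atomically(() => {
  groupedSequence.setBegin(0, 2);
  groupedSequence.setDur(0, 8);
});
```

With three clips, the ungrouped version rewrites six cached times and the grouped version rewrites none. Both end with the same cached values.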

5.0 SMIL specific extensions

In all the interfaces below, the details need discussion and review.  Do not assume that the defined types, return values or exceptions described are final.

The IDL interfaces will be moved to specific module documents once they are ready.

5.1 Document Interface

Cover document timing, document locking?, linking modality and any other document level issues. Are there issues with nested SMIL files?

Is it worth talking about different document scenarios, corresponding to differing profiles? E.g. Standalone SMIL, HTML integration, etc.

5.2 SMIL Interfaces

A separate document should describe the integrated DOM associated with SMIL documents, and documents for other document profiles (like HTML and SMIL integrations).

The SMILElement interface is the base for all SMIL element types. It follows the model of the HTMLElement in the HTML DOM, extending the base Element class to denote SMIL-specific elements.

Note that the SMILElement interface overlaps with the HTMLElement interface. In practice, an integrated document profile that includes HTML and SMIL modules will effectively implement both interfaces (see also the DOM documentation discussion of Inheritance vs Flattened Views of the API).

Interface SMILElement

Base interface for all SMIL elements.

interface SMILElement : Element {
        attribute  DOMString            id;
        // etc. This needs attention
};

5.2.1 Structure Elements Interface

This module includes the SMIL, HEAD and BODY elements. These elements are all represented by the core SMIL element interface.

5.2.2 Meta Elements Interface

This module includes the META element.

Interface SMILMetaElement (<meta>)
interface SMILMetaElement : SMILElement {
        attribute  DOMString            content;
        attribute  DOMString            name;
        attribute  DOMString            skipContent;
        // Types may be wrong - review
};


5.2.3 Layout Interfaces

This module includes the LAYOUT, ROOT_LAYOUT and REGION elements, and associated attributes.

Interface SMILLayoutElement (<layout>)

Declares layout type for the document. See the LAYOUT element definition in SMIL 1.0

interface SMILLayoutElement : SMILElement {
        attribute  DOMString            type;
        // Types may be wrong - review
};

Interface SMILRootLayoutElement (<root-layout>)

Declares layout properties for the root element. See the ROOT-LAYOUT element definition in SMIL 1.0

interface SMILRootLayoutElement : SMILElement {
        attribute  DOMString            backgroundColor;
        attribute  long                 height;
        attribute  DOMString            skipContent;
        attribute  DOMString            title;
        attribute  long                 width;
        // Types may be wrong - review
};

Interface SMILRegionElement (<region>)

Controls the position, size and scaling of media object elements. See the REGION element definition in SMIL 1.0

interface SMILRegionElement : SMILElement {
        attribute  DOMString            backgroundColor;
        attribute  DOMString            fit;
        attribute  long                 height;
        attribute  DOMString            skipContent;
        attribute  DOMString            title;
        attribute  DOMString            top;
        attribute  long                 width;
        attribute  long                 zIndex;
        // Types may be wrong - review
};

The layout module also includes the region attribute, used in SMIL layout to associate layout with content elements. This is represented as an individual interface that is supported by content elements in SMIL documents (i.e. in profiles that use SMIL layout).

Interface SMILRegionInterface

Declares rendering surface for an element. See the region attribute definition in SMIL 1.0

interface SMILRegionInterface {
        attribute  SMILRegionElement    region;
};

5.2.4 Timing Interfaces

This module includes the PAR and SEQ elements, and associated attributes.

This will be fleshed out as we work on the timing module. For now, we will define a time leaf interface as a placeholder for media elements. This is just an indication of one possibility - this is subject to discussion and review.

Interface SMILTimeInterface

Declares timing information for timed elements.

interface SMILTimeInterface {
        attribute  InstantType          begin;
        attribute  InstantType          end;
        attribute  DurationType         dur;
        attribute  DOMString            repeat;
        // etc. Types may be wrong - review

        // Presentation methods
        void                            beginElement();
        void                            endElement();
        void                            pauseElement();
        void                            resumeElement();
        void                            seekElement(in InstantType seekTo);
};


begin: The date value of the begin instant of this node, relative to the parent timeline.
end: The date value of the end instant of this node, relative to the parent timeline.
dur: The duration value of this node.

Presentation Methods

beginElement: Causes this element to begin the local timeline (subject to sync constraints).
Return Value: none (void).

endElement: Causes this element to end the local timeline (subject to sync constraints).
Return Value: none (void).

pauseElement: Causes this element to pause the local timeline (subject to sync constraints).
Return Value: none (void).

resumeElement: Causes this element to resume a paused local timeline. If the timeline was not paused, this is a no-op.
Return Value: none (void).

seekElement: Seeks this element to the specified point on the local timeline (subject to sync constraints). If this is a timeline, this must seek the entire timeline (i.e. propagate to all timeChildren).
seekTo: The desired position on the local timeline.
Return Value: none (void).
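
The behavior of the presentation methods can be sketched as a small state machine. This is an illustration only: sync constraints are ignored, times are plain seconds rather than InstantType values, and the class SketchTimedElement, its state names and its beginOffset field are hypothetical. In particular, subtracting the child's begin offset when a seek propagates is an assumption about how local times relate.

```typescript
// Hypothetical sketch: sync constraints are ignored and times are seconds.
type SketchTimeState = "idle" | "playing" | "paused";

class SketchTimedElement {
  state: SketchTimeState = "idle";
  localTime = 0;
  timeChildren: SketchTimedElement[] = [];

  // beginOffset: assumed begin time of this element on its parent timeline.
  constructor(public beginOffset: number = 0) {}

  beginElement(): void {
    this.state = "playing";
    this.localTime = 0;
  }

  endElement(): void {
    this.state = "idle";
  }

  pauseElement(): void {
    if (this.state === "playing") {
      this.state = "paused";
    }
  }

  // No-op unless the element is currently paused.
  resumeElement(): void {
    if (this.state === "paused") {
      this.state = "playing";
    }
  }

  // Seeking a timeline propagates to all of its timed children.
  seekElement(seekTo: number): void {
    this.localTime = seekTo;
    this.timeChildren.forEach((child) => {
      child.seekElement(Math.max(0, seekTo - child.beginOffset));
    });
  }
}

const sketchPar = new SketchTimedElement();
const sketchClip = new SketchTimedElement(3); // begins 3s into the parent
sketchPar.timeChildren.push(sketchClip);

sketchPar.resumeElement(); // not paused, so nothing happens
sketchPar.beginElement();
sketchPar.pauseElement();
sketchPar.resumeElement(); // paused, so play continues
sketchPar.seekElement(10); // the child is seeked to local time 7
```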


Events

onBegin: This event is raised when the element local timeline begins to play. It will be raised each time the element begins (but not on repeats - see the onRepeat event). It may be raised both in the course of normal (i.e. scheduled) timeline play, as well as in the case that the element was begun with the beginElement() method. Note that if an element is not yet ready to play (e.g. if media is not ready), the onBegin event should not be raised until the element timeline actually begins to play and local time begins to advance.

As a composite timeline begins to play, each element will raise an onBegin event as it in turn begins to play. A parent element will raise an onBegin event before any child elements do.

onEnd: This event is raised when the element local timeline ends play. It will be raised when the element ends (NOT on each repeat - see the onRepeat event). It may be raised both in the course of normal (i.e. scheduled) timeline play, as well as in the case that the element was ended with the endElement() method. As a composite timeline ends play, each element will raise an onEnd event as it in turn ends play.

onRepeat: This event is raised when the element local timeline repeats. It will be raised each time the element repeats, after the first iteration.

This event should support an integer attribute to indicate the current repeat iteration.

onPause: This event is raised when the element local timeline is paused. This is only raised when the element pauseElement() method is invoked.

When pausing a timeline, I do not think that all descendants should also raise onPause events. However, it may be useful to have media descendants raise an onPause event. This needs attention.
I think we should consider supporting a "reason" attribute on this event. This would allow authors to disambiguate a pause due to a method call, and a pause forced by the timing engine as part of handling an out-of-sync problem.

onResume: This event is raised when the element local timeline resumes after being paused. This is only raised when the element resumeElement() method is invoked, and only if the element was actually paused. When resuming a timeline, I do not think that all descendants should also raise onResume events. However, it may be useful to have media descendants raise an onResume event. This needs attention.

onOutOfSync: This event is raised when an element timeline falls out of sync (either for internal or external reasons). The default action of the timing model is to attempt to reestablish the synchronization, however the means may be implementation dependent. Depending upon the synchronization rules, this event may propagate up the time graph for each timeline that is affected.

This event is raised when an element that has fallen out of sync has been restored to the proper sync relationship with the parent timeline. This will only be raised after an onOutOfSync event has been raised. Depending upon the synchronization rules, this event may propagate down the time graph, effectively "unwinding" the original onOutOfSync stack.

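
The ordering rules for onBegin, onRepeat and onEnd can be sketched by logging the events a composite timeline would raise. The class SketchTimeContainer is hypothetical, children are played one after another for simplicity, and the numbering of the repeat iteration (the first play counts as iteration 1, so the first onRepeat carries 2) is an assumption, since the attribute is not yet specified.

```typescript
// Hypothetical sketch: records raised events as strings instead of
// dispatching real DOM events.
class SketchTimeContainer {
  constructor(
    public id: string,
    public repeatCount: number,
    public timeChildren: SketchTimeContainer[]
  ) {}

  // Plays the element to completion, appending raised events to the log.
  play(log: string[]): void {
    log.push("onBegin " + this.id); // raised once, before any child begins
    for (let iteration = 1; iteration <= this.repeatCount; iteration++) {
      if (iteration > 1) {
        // Raised on each repeat after the first iteration.
        log.push("onRepeat " + this.id + " iteration=" + iteration);
      }
      this.timeChildren.forEach((child) => child.play(log));
    }
    log.push("onEnd " + this.id); // raised once, not on each repeat
  }
}

const raisedEvents: string[] = [];
const sketchVideoLeaf = new SketchTimeContainer("video1", 1, []);
const sketchRepeatingSeq = new SketchTimeContainer("seq1", 2, [sketchVideoLeaf]);
sketchRepeatingSeq.play(raisedEvents);
```

In this run the parent raises onBegin before its child does, and the child raises onBegin again when the parent repeats, because the child itself begins a second time.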
Interface SMILTimelineInterface

This is a placeholder - subject to change. This represents generic timelines.

interface SMILTimelineInterface : SMILTimeInterface {
        attribute NodeList        timeChildren;
        // Presentation methods
        NodeList                  getActiveChildrenAt(in InstantType instant);
};


timeChildren: A NodeList that contains all timed children of this node. If there are no timed children, this is a NodeList containing no nodes.

Presentation Methods

getActiveChildrenAt: Returns the timed child elements that are active at the specified instant.
instant: The desired position on the local timeline.

Return Value
NodeList: List of timed child-elements active at instant.
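
A minimal sketch of the query is shown below. It assumes that times are seconds on the parent's local timeline and that a child is active on the half-open interval from its begin up to, but not including, its end. The specification does not yet define either point, and the SketchTimedChild type is hypothetical.

```typescript
// Hypothetical sketch: a plain array stands in for the NodeList.
interface SketchTimedChild { id: string; begin: number; end: number; }

function getActiveChildrenAt(
  timeChildren: SketchTimedChild[],
  instant: number
): SketchTimedChild[] {
  return timeChildren.filter(
    (child) => child.begin <= instant && instant < child.end
  );
}

const sketchTimeChildren: SketchTimedChild[] = [
  { id: "video1", begin: 0, end: 10 },
  { id: "caption1", begin: 4, end: 6 },
  { id: "audio1", begin: 8, end: 20 }
];

const activeAtFive = getActiveChildrenAt(sketchTimeChildren, 5);      // video1, caption1
const activeAtTen = getActiveChildrenAt(sketchTimeChildren, 10);      // audio1 only
const activeAtThirty = getActiveChildrenAt(sketchTimeChildren, 30);   // empty list
```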


Interface SMILParElement (<par>)
interface SMILParElement : SMILTimelineInterface, SMILElement {
        attribute DOMString       endsync;
};

Interface SMILSeqElement (<seq>)
interface SMILSeqElement : SMILTimelineInterface, SMILElement {
};


5.2.5 Media Element Interfaces

This module includes the media elements, and associated attributes. They are all currently represented by a single interface, as there are no specific attributes for individual media elements.

Interface SMILMediaInterface

Declares media content.

interface SMILMediaInterface : SMILTimeInterface {
        attribute  DOMString            abstract;
        attribute  DOMString            alt;
        attribute  DOMString            author;
        attribute  ClipTime             clipBegin;
        attribute  ClipTime             clipEnd;
        attribute  DOMString            copyright;
        attribute  DOMString            fill;
        attribute  DOMString            longdesc;
        attribute  DOMString            src;
        attribute  DOMString            title;
        attribute  DOMString            type;
        // Types may be wrong - review
};

Interface SMILRefElement (<ref>)
interface SMILRefElement : SMILMediaInterface, SMILElement {
};
// audio, video, ...

5.2.6 Transition Interfaces

This module will include interfaces associated with transition markup. This is yet to be defined.

5.2.7 Animation Interfaces

This module will include interfaces associated with animation behaviors and markup. This is yet to be defined.

5.2.8 Linking Interfaces

This module includes interfaces for hyperlinking elements.

Interface SMILAElement (<a>)

Declares a hyperlink anchor. See the A element definition in SMIL 1.0.

interface SMILAElement : SMILElement {
        attribute  DOMString            title;
        attribute  DOMString            href;
        attribute  DOMString            show;
        // needs attention from the linking folks
};

5.2.9 Content Control Interfaces

This module includes interfaces for content control markup.

Interface SMILSwitchElement (<switch>)

Defines a block of content control. See the SWITCH element definition in SMIL 1.0

interface SMILSwitchElement : SMILElement {
        attribute  DOMString            title;
        // and...?
};

Interface SMILTestInterface

Defines the test attributes interface. See the Test attributes definition in SMIL 1.0

interface SMILTestInterface {
        attribute  DOMString            systemBitrate;
        attribute  DOMString            systemCaptions;
        attribute  DOMString            systemLanguage;
        attribute  DOMString            systemOverdubOrCaption;
        attribute  DOMString            systemRequired;
        attribute  DOMString            systemScreenSize;
        attribute  DOMString            systemScreenDepth;
        // and...?
};
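
How a runtime might use the test attributes to resolve a switch can be sketched as follows, choosing the first child in document order whose test attributes all pass. The sketch models only systemBitrate and systemLanguage. It uses a number for the bitrate where the interface above has a DOMString, and it simplifies language matching to exact equality. All type and function names are hypothetical.

```typescript
// Hypothetical sketch; only two of the test attributes are modelled.
interface SketchTestAttributes {
  systemBitrate?: number;   // minimum bitrate, in bits per second
  systemLanguage?: string;  // a single language code (simplification)
}

interface SketchSwitchChild extends SketchTestAttributes { id: string; }

interface SketchSystemSettings { bitrate: number; language: string; }

// A test attribute that is absent does not restrict the element.
function passesTests(
  child: SketchTestAttributes,
  settings: SketchSystemSettings
): boolean {
  if (child.systemBitrate !== undefined && settings.bitrate < child.systemBitrate) {
    return false;
  }
  if (child.systemLanguage !== undefined && child.systemLanguage !== settings.language) {
    return false;
  }
  return true;
}

// Returns the first acceptable child in document order, or null if none.
function selectSwitchChild(
  children: SketchSwitchChild[],
  settings: SketchSystemSettings
): SketchSwitchChild | null {
  for (let i = 0; i < children.length; i++) {
    if (passesTests(children[i], settings)) {
      return children[i];
    }
  }
  return null;
}

const sketchSwitchChildren: SketchSwitchChild[] = [
  { id: "video-hi", systemBitrate: 112000 },
  { id: "video-lo", systemBitrate: 28800 },
  { id: "still-image" }
];

const chosenAtModemSpeed = selectSwitchChild(sketchSwitchChildren, {
  bitrate: 56000,
  language: "en"
});
```

Because the first acceptable child wins, authors would list alternatives from most to least demanding, with an unconditional fallback last.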

5.3 Media Player Interfaces

This is NOT a plug-in interface, but rather a simple interface that describes some guaranteed methods that any application plug-in interface must support. This provides a means of standardizing extensions to the timing model, independent of the specific application.
5.3.1 Media Player Level 1 Interface
5.3.2 Media Player Level 2 Interface
5.3.3 Media Player Level 3 Interface

6.0 References

"Document Object Model (DOM) Level 1 Specification"
Available at http://www.w3.org/TR/REC-DOM-Level-1/.

"Document Object Model Events", T. Pixley, C. Wilson
Available at http://www.w3.org/TR/WD-DOM-Level-2/events.html.

"Document Object Model Requirements for Synchronized Multimedia", P. Schmitz.
Available at http://www.w3.org/AudioVideo/Group/DOM/DOM_reqts (W3C members only).

"HTML 4.0 Specification", D. Raggett, A. Le Hors, I. Jacobs, 24 April 1998.
Available at http://www.w3.org/TR/REC-html40.

[ISO/IEC 10646]
ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).

"PICS 1.1 Label Distribution -- Label Syntax and Communication Protocols", 31 October 1996, T. Krauskopf, J. Miller, P. Resnick, W. Trees
Available at http://www.w3.org/TR/REC-PICS-labels-961031.

"Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", N. Freed and N. Borenstein, November 1996.
Available at ftp://ftp.isi.edu/in-notes/rfc2045.txt. Note that this RFC obsoletes RFC1521, RFC1522, and RFC1590.

"Synchronized Multimedia Integration Language (SMIL) 1.0 Specification", W3C Recommendation 15-June-1998.
Available at http://www.w3.org/TR/REC-smil.

"Displaying SMIL Basic Layout with a CSS2 Rendering Engine".
Available at http://www.w3.org/TR/NOTE-CSS-smil.html.

"WAI Accessibility Guidelines: User Agent", W3C Working Draft 3-July-1998.
Available at http://www.w3.org/WAI/UA/WD-WAI-USERAGENT.html.

"Extensible Markup Language (XML) 1.0", T. Bray, J. Paoli, C.M. Sperberg-McQueen, editors, 10 February 1998.
Available at http://www.w3.org/TR/REC-xml.