A Framework for Interactive Television
Based on Internet Standards

George Backer, Tele-Communications Inc.
Jerry Bennington, Cable Television Laboratories, Inc.
Jonathan Boltax, NBC
Kevin Gage, Warner Bros.
Miguel Garcia, CNN Interactive
Mike Richmond, Intel Corporation
Jeff Scherb, Tribune Company
Mark Vickers, Network Computer, Inc.
Scott Watson, Disney
Dan Zigmond, Microsoft Corporation & WebTV Networks, Inc.

 

Introduction

The World-Wide Web has seen such phenomenal growth in recent years in part because its architecture is based on a handful of key protocols, all of which have evolved over time through the open standards process. This process has allowed the Web community to learn from experience, and the underlying protocols have proven sufficiently flexible and extensible to facilitate tremendous technical innovation.

Several companies have been collaborating over the past months on a framework for supporting interactive television content based largely on the existing Internet standards. The goal has been to create a platform that can be supported across all television environments (analog or digital; cable, satellite, or terrestrial broadcast), and which leverages the huge base of tools, media, and know-how that has developed for the Web. Although often termed an "HTML-based" solution, this framework in fact builds on a number of components:

Together, we feel these six elements provide virtually all of the services necessary to design and deliver compelling interactive television content. Only a handful of elements of innovation appear to be required:

We are in the process of designing solutions for each of these four problems, all of which can be blended seamlessly with existing standards. Although these do not necessarily solve every television-related problem, they provide a complete framework for initial deployments and allow for incremental extensions and refinements. This paper examines all ten of these pieces, dividing them into those related to delivery and those related to content design and presentation.

 

IP-based Delivery

The Internet Protocol (IP) provides a standard transport mechanism upon which to build high-level data announcement, delivery, and synchronization protocols. Multicast IP datagrams can be transmitted on bi-directional or uni-directional data links, so higher-level protocols built on IP can be used without change on a wide variety of broadcast networks. Standards for transmitting and receiving IP datagrams over virtually every variety of broadcast television network (both analog and digital) either exist today or are fast emerging from the appropriate bodies. For example, the IPVBI working group of the Internet Engineering Task Force (IETF) is in the process of publishing a specification for the transmission of IP datagrams in the vertical-blanking interval of analog television broadcasts worldwide.

With IP transport in place, we can deliver the two streams of data necessary for the interactive content. The first stream contains the HTML resources along with related meta-data. We think of this as a sort of uni-directional HTTP; like HTTP, it would use MIME-style headers to send attribute/value pairs as meta-data (for example, a "Content-Length:" header to indicate the size of a resource, "Content-Type:" to indicate the media type of the resource, or "Content-Location:" to provide a URL for the resource), as well as sending the resources themselves in binary form (and possibly compressed, as specified by the "Content-Encoding:" header). Using the "multipart/related" media type, several resources could be grouped together in this protocol and sent as a single unit, allowing large sets of resources to be delivered with "all or nothing" semantics.
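As an illustration only, the packaging idea might be sketched in Python with the standard "email" library, which already understands MIME headers and "multipart/related". The function names build_package and unpack_package, the example URLs, and the choice of base64 transfer encoding are assumptions made for this sketch; the actual wire format of the delivery protocol is not defined here.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def build_package(resources):
    """Bundle (url, media_type, payload_bytes) tuples into one multipart/related unit."""
    package = MIMEMultipart("related")
    for url, media_type, payload in resources:
        maintype, subtype = media_type.split("/", 1)
        part = MIMEBase(maintype, subtype)   # sets the Content-Type header
        part.set_payload(payload)
        encoders.encode_base64(part)         # assumption: base64 keeps the sketch text-safe
        part["Content-Location"] = url       # the URL this resource stands for
        package.attach(part)
    return package.as_bytes()


def unpack_package(data):
    """Return {url: (media_type, payload_bytes)} for every part of a received package."""
    message = message_from_bytes(data)
    if message.get_content_type() != "multipart/related":
        raise ValueError("expected a multipart/related package")
    return {
        part["Content-Location"]: (part.get_content_type(), part.get_payload(decode=True))
        for part in message.get_payload()
    }
```

A receiver built this way either obtains every resource in the package or, if parsing fails, none of them, which is one simple reading of the "all or nothing" semantics described above.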

In addition to these HTTP-like properties, the new protocol would provide some degree of forward-error correction (FEC) to recover from lost or damaged data. This is especially important over broadcast television networks, which in some cases can have very high error rates. The FEC algorithm can also be defined to allow reconstruction of the data even when the packets arrive out of order. This would facilitate efficient reception of the complete content even when the client tunes to the stream in the middle of a broadcast.
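No particular FEC algorithm is specified here. Purely to make the idea concrete, the sketch below uses the simplest possible scheme, a single XOR parity packet per group of equal-length blocks, which can repair one lost packet per group. The function names and the (index, payload) packet layout are assumptions of this sketch; a deployed protocol would likely use a stronger code.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(blocks):
    """Return (index, payload) packets: the data blocks plus one XOR parity packet."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    packets = list(enumerate(blocks))
    packets.append((len(blocks), reduce(xor_bytes, blocks)))  # parity goes last
    return packets


def decode_group(received, block_count):
    """Rebuild the data blocks from packets received in any order, at most one missing."""
    by_index = dict(received)
    missing = [i for i in range(block_count + 1) if i not in by_index]
    if len(missing) > 1:
        raise ValueError("more than one packet lost; this simple code cannot recover")
    if missing and missing[0] < block_count:
        # XOR of every surviving packet (data and parity) equals the lost block.
        by_index[missing[0]] = reduce(xor_bytes, by_index.values())
    return [by_index[i] for i in range(block_count)]
```

Because each packet carries its own index, the decoder does not depend on arrival order, which is the property that lets a client joining mid-broadcast still assemble complete content.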

The second stream contains small ECMAScript fragments that trigger actions on specific pages of HTML content on the client. Because scripts can be embedded on the pages themselves, only small fragments of script (often single method calls) need to be transmitted. These script fragments can be packaged in an appropriate syntax that distinguishes them from other multicast packets and that perhaps identifies the page at which they are targeted, along with other appropriate meta-data (such as an expiration date or human-readable name).
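The trigger syntax itself is still to be designed. As a hedged sketch of what such packaging could look like, the Python below wraps a script fragment in MIME-style header lines. The "X-TRIGGER" marker line, the header names "Target", "Expires", and "Name", and the example script call are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

MARKER = "X-TRIGGER"  # hypothetical first line that identifies a trigger packet


@dataclass
class Trigger:
    script: str        # ECMAScript fragment, often a single method call
    target: str        # URL of the page the fragment is aimed at
    expires: datetime  # moment after which the trigger should be ignored
    name: str = ""     # optional human-readable name


def encode_trigger(trigger):
    """Serialize a trigger as a marker line, header lines, a blank line, and the script."""
    lines = [
        MARKER,
        f"Target: {trigger.target}",
        f"Expires: {trigger.expires.isoformat()}",
        f"Name: {trigger.name}",
        "",
        trigger.script,
    ]
    return "\n".join(lines).encode("utf-8")


def decode_trigger(packet):
    """Parse a packet, returning None if it does not start with the trigger marker."""
    text = packet.decode("utf-8", errors="replace")
    head, _, script = text.partition("\n\n")
    first, *header_lines = head.split("\n")
    if first != MARKER:
        return None
    headers = {key: value for key, _, value in
               (line.partition(": ") for line in header_lines)}
    return Trigger(
        script=script,
        target=headers["Target"],
        expires=datetime.fromisoformat(headers["Expires"]),
        name=headers.get("Name", ""),
    )


def applies(trigger, page_url, now):
    """A client would run the script only on the targeted page and before expiry."""
    return trigger.target == page_url and now < trigger.expires
```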

The Session Announcement Protocol (SAP) and the Session Description Protocol (SDP) together notify clients of the multicast IP addresses of specific data streams. Both the data and trigger streams described above can be announced in this way, so that a client need only listen to a well-known announcement port in order to discover all available interactive programming. The SAP/SDP packets can contain fields identifying them as announcements for interactive television programming, as well as standard SDP fields specifying such things as language and time span. New fields to indicate parental guidelines for television content or to distinguish content synchronized to video from other interactive enhancements may also be useful.
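To show roughly what such an announcement could carry, the sketch below assembles an SDP description with one media entry per stream, using only standard SDP line types (v=, o=, s=, t=, a=lang:, m=, c=). The format names "x-itv-resources" and "x-itv-triggers", the addresses, the ports, and the TTL of 16 are placeholders invented for this example, and the SAP packet header that would wrap the description is omitted.

```python
from datetime import datetime, timezone

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def to_ntp(moment):
    """SDP expresses times as seconds since 1900 (NTP timestamps)."""
    return int(moment.timestamp()) + NTP_EPOCH_OFFSET


def build_announcement(title, language, start, stop, origin_host, streams):
    """Build an SDP description; streams is a list of (format, address, port)."""
    lines = [
        "v=0",
        f"o=- {to_ntp(start)} 1 IN IP4 {origin_host}",
        f"s={title}",
        f"t={to_ntp(start)} {to_ntp(stop)}",   # time span of the programme
        f"a=lang:{language}",                  # language of the session
    ]
    for fmt, address, port in streams:
        lines.append(f"m=data {port} udp {fmt}")   # fmt names are hypothetical
        lines.append(f"c=IN IP4 {address}/16")     # multicast address and TTL
    return "\r\n".join(lines) + "\r\n"


def list_streams(sdp):
    """Extract the format, port, and multicast address of each announced stream."""
    streams = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            _media, port, _transport, fmt = line[2:].split()
            streams.append({"format": fmt, "port": int(port)})
        elif line.startswith("c=") and streams:
            streams[-1]["address"] = line[2:].split()[2].split("/")[0]
    return streams
```

A client listening on the announcement port could pass each received description to something like list_streams to learn which multicast groups to join for the resource and trigger streams.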

 

HTML-based Design

HTML and the related standards that comprise most Web content (i.e., image formats, style sheets, etc.) provide a thorough framework for describing the presentation of multimedia content. In keeping with the HTML model of using URIs to reference all multimedia data, we require two new URI schemes: one for television broadcasts and one for other content received over a uni-directional link.

The URI scheme "tv:" has already been proposed in draft form to the IETF for television broadcasts. Although this proposed scheme includes mechanisms for tuning to specific television broadcasts based on station call letters, network name, or channel number, a simpler version might be appropriate initially that defines only the basic URI "tv:" (i.e., with no path specified after the scheme) and refers specifically to the "current" broadcast. Because the HTML content containing this URI is itself received on a specific broadcast channel, the "current" broadcast is generally well defined.
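A minimal sketch of how a client might interpret the simplified scheme follows. The function names are illustrative, and the handling of anything after "tv:" is deliberately left opaque, since the exact syntax of the draft proposal is not reproduced here.

```python
def parse_tv_uri(uri):
    """Return None for the bare "tv:" URI, else the text after the scheme.

    None stands for "the current broadcast"; any other value is an opaque
    identifier (call letters, network name, or channel number) whose precise
    syntax this sketch does not attempt to validate.
    """
    scheme, separator, rest = uri.partition(":")
    if not separator or scheme.lower() != "tv":
        raise ValueError(f"not a tv: URI: {uri!r}")
    return rest or None


def resolve_broadcast(uri, receiving_channel):
    """Map a tv: URI to a broadcast, defaulting to the channel the page arrived on."""
    identifier = parse_tv_uri(uri)
    return receiving_channel if identifier is None else identifier
```

Here receiving_channel is whatever the client uses to name the channel on which the enclosing HTML content was received, which is what makes the bare "tv:" form well defined.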

A URI scheme is needed to refer to content received over a uni-directional link that cannot be retrieved on-demand. This is essentially a location-independent local name for the data. It is guaranteed to refer only to a specific resource if the resource is present locally, but provides no indication of how to obtain the resource if it is not present. One approach to such a scheme is to create a unique namespace for content based on globally unique identifiers (GUIDs), and to use shorter names relative to that namespace. Using the relative URI syntax, this would allow short, human-readable names to be used to refer to content from within other content, while still preventing name collision between interactive programs. When resources are also available via HTTP (or FTP or a similar protocol), a traditional "http:" (or "ftp:", etc.) URI can be used to indicate this.
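The GUID-based approach might look something like the following sketch. The scheme name "x-bcast" is a placeholder invented for this example, since no name has been fixed, and the resolution logic is a simplified form of relative URI handling written by hand rather than a standard library call.

```python
import posixpath
import uuid

SCHEME = "x-bcast"  # hypothetical scheme name, used only in this sketch


def make_namespace(guid=None):
    """Create a base URI whose authority part is a globally unique identifier."""
    return f"{SCHEME}://{guid or uuid.uuid4()}/"


def resolve(base, relative):
    """Resolve a short relative name against a base URI in the same namespace."""
    prefix, _, remainder = base.partition("://")
    authority, _, base_path = remainder.partition("/")
    directory = posixpath.dirname("/" + base_path)
    path = posixpath.normpath(posixpath.join(directory, relative))
    return f"{prefix}://{authority}{path}"


def lookup(local_store, uri):
    """Return the locally held resource, or None; the name says nothing about
    where else the resource could be fetched."""
    return local_store.get(uri)
```

Two programs that both use the short name "logo.gif" resolve it against different GUIDs, so their full names never collide in the client's local store.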

With these two URI schemes in place, television broadcasts and other content received over uni-directional links become first-class objects within the HTML model. For example, other resources (like images or text) can be overlaid on the video using the usual HTML overlay mechanisms. All that is needed, then, to create truly interactive television-based content is a way of manipulating objects within HTML. ECMAScript and the Document Object Model provide this capability. Using these standards (as embodied in products like JScript™ and JavaScript™), embedded television broadcasts can be resized on the fly, for example, or overlays on top of the video can be added, removed, or moved as appropriate.

 

Conclusions

Existing Internet standards provide almost all the pieces necessary to build a powerful, flexible, and extensible framework for interactive television. The few small areas where new specifications are needed are well bounded, and the new pieces can be and are being designed to fit seamlessly with the existing protocols.

We believe that an approach to interactive television based on Internet standards will ultimately prove to be the most fruitful. In addition to leveraging the incredible investment already made in the Web itself, this framework can grow as the underlying technologies grow, naturally supporting new networks, media, and protocols as they are developed.

 

Acknowledgments

Significant contributions to this paper were made by Lee Acton (Microsoft Corporation), Dean Blackketter (WebTV Networks, Inc.), Wayne Carr (Intel Corporation), and David Mott (Network Computer, Inc.).

 

References