This activity statement presents an analysis and summary of the results of a series of meetings organized by W3C in this area (a BOF at the WWW4 conference, Developer's Day at the WWW5 conference, and the workshop on "Real Time Multimedia and the Web"). It also proposes directions for future work. It is part of the W3C activity list.
Since the Web was invented, the target audience and the content of a typical Web site have changed dramatically. Originally, the Web was designed as a means for physicists to share the results of their research. For this purpose, simple text documents with hyperlinks were sufficient. Today, Web sites are typically designed for a large consumer audience. The Web is used for catching up on the latest news from Hollywood, or for selecting the next personal computer to buy.
This sort of content puts the Web into direct competition with traditional print media like journals or brochures. To present content on the Web in the same way as in sophisticated print media, technologies are needed that go beyond the simple hypertext of the original Web design. Many current W3C activities, such as graphics, fonts, stylesheets, layout and HTML, are working towards the goal of bringing the quality of Web publication to the level of today's print media. Many industry analysts predict that the content of next-generation Web sites will go far beyond simulating the features of print media. They expect that Web content will become similar to the content of today's multimedia CD-ROMs or even to today's television programs. In other words, they believe that the Web will be turned into a distribution system for interactive and continuous multimedia content, also referred to here as real-time multimedia content.
Typical CD-ROM products include educational courses, lexica, "virtual" art galleries, or CDs of pop groups which are enhanced by text, images and video. Many CD-ROMs contain sections with continuous multimedia. One example is a "guided tour". In such an application, the screen shows a sequence of different images, text and graphs, which are explained in an audio stream that is played in parallel. Other examples are videos with overlaid text (subtitles or scrolling text) or multiple videos shown in parallel on the screen.
In many respects, continuous multimedia presentations are similar to many television programs. Interestingly enough, the graphic design of some television programs already employs elements of CD-ROM and computer user interfaces - windows pop up in a news broadcast, graphics appear on the screen, or program items are presented in menu form. The much-heralded integration of television and computers thus seems to be already under way, at least on the level of the graphic design of television programs.
Moreover, many television programs are actually multimedia presentations, and could be implemented by using techniques known from multimedia CD-ROMs. Consider the example of a news broadcast: many parts of the television screen contain text, images, graphs or other static elements. The content of some of today's television programs could thus be transmitted by sending static information such as background images as separate files, together with a schedule that determines when the content of these files should be displayed on the screen. Obviously, this requires much less bandwidth than sending the same content as full-motion video. Of course, television programs lack the interactivity known from Web and CD-ROM content. For example, a user cannot simply interrupt a news transmission by clicking on a photograph to search for more information about a person being interviewed.
Making interactive and continuous multimedia content available on the Web requires a solution to the following problems:
Several different communities are currently working independently on the integration of real-time multimedia into the Internet, namely the Web community, the CD-ROM community and the community working on Internet-based audio/video-on-demand. Lacking both a forum for discussion and standardisation and a common reference implementation, there is a danger of a plethora of non-interoperable solutions. Competing solutions developed by the three communities will each reflect the strengths and weaknesses of the community that produced them. An example might be an excellent authoring tool that produces a format that is hard to transmit over the Internet. In such a situation, content providers will hesitate to produce content, and it will be difficult to reach a critical mass of real-time multimedia content on the Internet.
Therefore, cooperation and standardisation appear to be in the best interest of the different communities working on the integration of real-time multimedia into the Internet. They can combine their respective expertise, and come up with a single, coherent solution. W3C has members from all three communities. Many representatives of each community participated in the recent W3C workshop on "Real Time Multimedia and the Web". In the feedback we received after this event, members of all communities reported that they look to W3C as a promising forum for exchanging ideas and for finding consensus on common solutions for integrating real-time multimedia into the Web. To produce interactive and continuous multimedia content, two things need to be added to Web technology: support for creating real-time multimedia content, and support for network transmission of this type of content. In the following, we analyse the strengths and weaknesses of each of the communities in each of these areas, give conclusions, describe current work items and discuss ways in which they can be addressed.
A comparison between CD-ROM products and typical Web sites reveals many similarities. On CD-ROMs, the user navigates through the multimedia content using "point-and-click". On the Web, this sort of interface is achieved by using hyperlinks (URLs), possibly associated with image maps. Image and text media types are of equal importance for both Web and CD-ROM content. One point much criticised by CD-ROM designers is the limited layout control on the Web. However, formats for layout control on the Web are currently emerging (e.g. the "Layout" tag introduced by Netscape, or work done in the W3C Activity on Style Sheets).
Web technology is limited today when it comes to creating continuous multimedia presentations. For these applications, content authors need to express things like "five seconds into the presentation, show image X and keep it on the screen for ten seconds". More generally speaking, there must be a way to describe the synchronisation between the different media that make up a continuous multimedia presentation. On the Web, media synchronisation is currently expressed using a scripting language such as JavaScript or VisualBasic. However, scripting has a set of well-known disadvantages. Script-based content is often hard to produce and maintain. Moreover, it is hard to build search engines and other automated tools for scripting languages. To address these disadvantages, CD-ROM technology uses declarative formats such as Apple's QuickTime as an alternative to well-known scripting languages such as Macromedia's Lingo or Apple's HyperCard. In a declarative format, the "events" in a multimedia presentation are simply expressed using a timeline, instead of writing a script program. Both approaches have succeeded in the CD-ROM marketplace, and are often used side-by-side in a particular multimedia product. Thus, experience from the CD-ROM community suggests that it would be beneficial to adopt a declarative format for expressing media synchronisation on the Web as an alternative to scripting. After all, HTML is a declarative language, and so is the stylesheet language CSS. HTML content could equally well be generated by a scripting language; this is not happening, because doing so would have the same disadvantages as using scripting for expressing media synchronisation.
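The timeline idea can be sketched in a few lines of code. The sketch below is purely illustrative - the tuple-based "format" and the names `timeline` and `visible_at` are our own invention, not part of any existing declarative format - but it shows how a declarative timeline lets a player derive the state of the presentation at any instant, something that is much harder to do with an imperative script.

```python
# Illustrative sketch of a declarative timeline (hypothetical format):
# each entry states when a media item appears and for how long,
# instead of scripting the events imperatively.

timeline = [
    # (start_seconds, duration_seconds, media_item)
    (0,  5,  "intro.txt"),
    (5,  10, "imageX.gif"),   # "five seconds in, show image X for ten seconds"
    (5,  30, "narration.au"), # audio plays in parallel with the image
]

def visible_at(timeline, t):
    """Return the media items active at presentation time t."""
    return [m for (start, dur, m) in timeline if start <= t < start + dur]

# A player can seek to any point and reconstruct the screen contents:
# visible_at(timeline, 7) -> ["imageX.gif", "narration.au"]
```

Because the timeline is plain data, an indexing tool or search engine can inspect it just as easily as a player can, which is exactly the property that script-based content lacks.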
CD-ROM content providers are interested in making their content available on the Internet for several reasons. First, the Internet greatly facilitates updating content. Therefore, more up-to-date content can be offered via the Internet than on a CD-ROM. Moreover, distribution of content via the Internet is cheaper than distributing CD-ROMs. Also, consumers that use CD-ROMs do not need to buy a new end-user appliance for receiving the same content via the Internet: both technologies require the same end-user appliance, namely a personal computer. Lost revenue from CD-ROM sales can be recovered either by using an advertisement-based revenue model, or by emerging technologies for electronic payment (see the W3C Joint Electronic Payment Initiative). One approach that has been followed for providing CD-ROM content on the Internet is to package an existing format so that it can be transmitted over the net. Repackaging a CD-ROM format has the advantage that existing authoring tools can be reused to create content for the Web as well. The only Web technology commonly reused in most of these approaches is URL-based addressing of Web content. One disadvantage of this approach is that most of the CD-ROM formats are proprietary. In contrast, the Web community is already using open formats that can duplicate some of the functionality of CD-ROM formats (image maps, HTML, URLs), and is working on formats that will bring the capabilities of the Web even closer to those of CD-ROM formats (layout control, stylesheets, fonts, scripting). More sophisticated authoring tools for Web content are also emerging. Another advantage of Web content is that it is relatively easy to locate information on a particular subject using search engines. Moreover, the Web allows content to be stored on many different servers distributed over the whole Internet. This facilitates the creation of new content by reusing existing content.
Two of the most successful Web applications - search engines and Web directories - work because today's Web content is reusable.
Taking these factors together with the forthcoming high-speed Internet links, it seems likely that Web technology will assimilate proprietary CD-ROM content in the near future.
In the last two years, applications that allow users to retrieve audio and video content over the Internet have become quite popular (e.g. RealAudio and VDOLive). These applications are currently being extended to also "stream" media types other than audio and video, such as text and images. Examples are Microsoft's NetShow, Netscape's MediaServer and work at Progressive Networks on enhancing their RealAudio product. These applications require a format for synchronising the different media types contained in a presentation, and standard formats for basic data types such as text and images. Work has already started on designing application-specific formats. However, it appears more attractive to reuse existing solutions from the Web community for text, images and layout, and solutions from the CD-ROM community for expressing media synchronisation. This makes it possible to leverage existing authoring tools and existing content.
In the area of support for real-time multimedia content, both the Web community and the community working on Internet-based audio/video-on-demand can profit from the experience gained by the CD-ROM community.
An overlap already exists between the functionality found in Web technology and that found in real-time multimedia technology. Therefore, a reasonable working hypothesis for W3C work in this area is that real-time multimedia will be integrated into the Web by using extensions and additions to the basic Web technologies for producing content, such as HTML, JPEG, GIF and PNG images, image maps and URLs.
Other groups working on real-time multimedia make quite different assumptions. For instance, the Java community uses Java byte-code as the distribution format for real-time multimedia content, and Java code generators for producing this content. DAVIC (the Digital Audio-Visual Council) - a standards body for interactive TV - uses MHEG-5 as its base format, and uses only a subset of HTML for implementing hypertext functionality.
This work item requires an analysis of standards such as MHEG and the synchronisation mechanisms in HyTime. However, given that both technologies have been criticised for being relatively complex, and have so far failed to gain widespread market acceptance, it might turn out that another solution is more appropriate for the Web and its design philosophy. The QuickTime file format appears to be a good starting point, since it is freely available and implemented on a wide range of platforms. Constraint-based approaches are another interesting solution that should be evaluated further.
Addressing Subparts of Audio/Video Files via URLs
In many applications, it is useful to have random access to large audio and video files, in a manner similar to the "edit lists" used in audio/video editing systems. This requires that subparts (or "clips") of audio/video files be addressable. On the Web, such addresses could be constructed by using URLs that describe a time range, e.g. "the clip in audio file F starting at minute 5 and ending at minute 10".
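As an illustration of what such a time-range URL might look like, the sketch below encodes the start and end of a clip as query parameters. The "start"/"end" parameter names and the use of query syntax are assumptions made for the example only; no standard syntax for temporal addressing exists yet.

```python
# Hypothetical clip-addressing syntax: the start and end of the clip
# (in seconds) are carried as query parameters of an ordinary URL.
from urllib.parse import urlparse, parse_qs

def parse_clip_url(url):
    """Extract (path, start_seconds, end_seconds) from a clip URL."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    start = float(params["start"][0])
    end = float(params["end"][0])
    return parsed.path, start, end
```

Under this hypothetical syntax, "http://example.org/sounds/F.au?start=300&end=600" would denote the clip of audio file F starting at minute 5 and ending at minute 10, and a server could use the parsed range to seek directly into the file instead of transmitting it from the beginning.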
This includes, for example, formats for synthesized sound such as MIDI, and formats for sprite-based animation.
The current transport protocol for Web content, HTTP, is not the first choice when it comes to transporting audio and video content, because HTTP delivers content via TCP. For numerous reasons, it is difficult to deliver audio and video content over TCP without degrading the user experience significantly. Given the importance of audio and video for continuous multimedia applications, support for a protocol that has been designed with the transport of audio and video in mind is required.
A current trend on the Internet is "push-based" applications like Pointcast. In these applications, a collection of Web-like pages is sent out on "channels". Rather than retrieving each page explicitly, consumers tune into a channel and receive the whole collection of content without further interaction, similar to a TV program. For this sort of "push" or "broadcast" application, which transmits to large audiences, much experience exists in the part of the community that developed streaming of audio and video content over the Internet. IP multicast can solve many of the network problems occurring in such applications, and is an attractive alternative to establishing and maintaining a separate network of "repeaters" or "reflectors" for each particular broadcast application.
A difference between Web and CD-ROM technology is the medium over which content is transmitted. The bandwidth delivered by a CD-ROM drive is much higher than the average bandwidth in wide parts of the current Internet. Moreover, the Internet is a harsh environment for transmitting real-time data streams. Its "best-effort" service implies that there is no guaranteed bandwidth, that the transmission delay of a packet can vary arbitrarily, and that packets can be lost in transmission.
However, the problem of using the Internet may not be as serious as it seems at first sight. The bandwidth limitation appears solvable. Even today, many CD-ROM applications do not require continuous access to the CD-ROM drive. Moreover, CD-ROM designers are already experienced in designing their content around bandwidth limitations, using techniques such as prefetching of content. Higher Internet bandwidth is becoming available in the form of IP access over television cable networks or satellites. Finally, the community working on streaming of audio and video over the Internet has developed solutions to overcome the problems of variable transmission delay (playout buffers), packet loss (forward error correction) and variable bandwidth (adaptive coding).
Web-based audio/video-on-demand, i.e. access to stored audio and video resources, has become much more practical with the advent of "streaming protocols", which replace the standard Web transport protocol HTTP when audio and video data must be transferred. Many technologies developed in this area can be reused to support the transmission of real-time multimedia over the Internet.
In the area of support for transporting real-time multimedia content over the Internet, both the Web community and the CD-ROM community can profit from the experience gained by the community working on Internet-based audio/video-on-demand.
Given that Web content is primarily transported over the Internet, W3C work on support for transmitting real-time multimedia concentrates on this network. Other groups working on real-time multimedia follow different assumptions. For instance, DAVIC recommendations for delivering interactive TV content are primarily targeted at distribution networks based on ATM and on MPEG system streams distributed via cable and satellite systems.
Moreover, given the difficulties of changing the basic Internet infrastructure, it seems safe not to base solutions on novel techniques such as the bandwidth reservation protocol RSVP or IPv6. An exception is IP multicasting, which seems close to being used in commercial products, as indicated by the fact that both Microsoft's NetShow and Progressive Networks' extensions to RealAudio allow content multicasting, and by the recent announcement of multicast support by the Microsoft Network.
Addressing "Streaming" Audio/Video Resources via URLs
Switching between HTTP and a real-time protocol currently requires one additional network round trip in order to retrieve a "session file" or "metafile". This file is then typically passed to a helper application. The extra round trip for retrieving the description file increases network delay, and thus the response time observed by the user. Moreover, configuring helper applications and MIME types is cumbersome for the end user.
In principle, this problem can be solved by defining a URL scheme for the respective real-time transport protocol (e.g. "rtp://"), analogous to the URL schemes for ftp, news etc. The problem becomes tricky, however, when moving from monomedia streams, such as audio, to multimedia streams. Here, several separate transport connections must be opened in parallel to receive the stream, which is difficult to express using a single URL. If the content is stored on a single server, the "wildcard" approach used in RTSP can be reused. If the content is stored on multiple servers, retrieving a session file can probably not be avoided, unless one wants to allow URL schemes that address several servers in "parallel".
Application Level Framing for Web Data Formats
The IETF has developed RTP (Real-time Transport Protocol) as the standard for carrying data for real-time multimedia applications over the Internet. This protocol is also used to transport audio and video in the H.323 conferencing standard.
One of the design principles behind this protocol is that real-time data should be split into packets in such a way that each packet can be processed independently by the receiver application (application-level framing). This greatly facilitates synchronising real-time multimedia streams in the presence of packet loss on the Internet. As an example, consider transmitting an HTML page that is synchronised with an audio stream. With application-level framing, packet loss in the HTML transmission will not lead to an interruption of the real-time output, but only to a "hole" in the displayed HTML page.
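The effect of application-level framing can be illustrated with a small sketch. The code below is hypothetical (the one-fragment-per-packet granularity and the function names are our own assumptions, not any proposed packetization scheme): each packet carries a self-contained HTML fragment plus its sequence number, so the receiver can place every packet that does arrive, and a lost packet produces only a visible hole rather than stalling the presentation.

```python
# Sketch of application-level framing for an HTML page (hypothetical
# scheme): each packet is independently decodable, so loss of one
# packet leaves a "hole" without blocking the rest of the page.

def packetize(fragments):
    """Pair each self-contained fragment with a sequence number."""
    return list(enumerate(fragments))

def reassemble(packets, total):
    """Rebuild the page from whatever packets arrived; missing
    packets show up as holes instead of stalling the output."""
    page = ["<!-- hole -->"] * total
    for seq, frag in packets:
        page[seq] = frag
    return "".join(page)
```

Contrast this with TCP delivery, where a single lost segment delays everything behind it: here the audio stream and the remaining HTML fragments can continue to be rendered on schedule while the hole is repaired or simply left visible.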
Packetization schemes for HTML that allow displaying HTML using application-level framing techniques can be derived from similar techniques developed for shared editing of SGML documents. W3C has recently received a concrete proposal for solving this problem, which will very probably be turned into a W3C Note. A similar packetization scheme is required for GIF images. For JPEG images, it should be possible to reuse the existing RTP payload format for MJPEG ("moving JPEG").
Potential W3C products in the area of real-time multimedia are:
Many of the ideas contained in this document are based on presentations and discussions at the W3C workshop on "Real Time Multimedia and the Web". Special thanks go to the workshop participants who replied to our request and gave us extensive written feedback on the future directions of W3C work: