MINUTES

World Wide Web Consortium
Workshop
"Real Time Multimedia and the Web"
(RTMW '96)

October 24-25, 1996
INRIA-Sophia Antipolis - France


Table of Contents


Thursday, October 24, 1996

Minutes (unless otherwise noted): Bert Bos, W3C


Welcome-Address,
Statement of Workshop Goals
(9:30-10:00)


Session 1 (10:00-11:00)

Chair: Philipp Hoschka, INRIA-W3C


Internet access to interactive multimedia documents from heterogeneous stations,
Guy Foquet, Alcatel

Q: Is it possible to negotiate a Quality of Service (QoS)?
A: Negotiation is being worked on. Different levels of QoS are available. Negotiation will be transparent for the user.


The Interactive Future;
Robert Woodford and Jonathan Grayson, Macromedia

Speaker: Jonathan Grayson

Q: On what platforms does your software run?
A: PC, Mac, SGI, Sun

Q: What audio compression system do you use?
A: A proprietary scheme, based on asymmetric perceptual compression. It gives almost FM quality over a 14.4K modem.

Q: What is the format for the multiple resolution images?
A: It's called XRes and it's proprietary. It loads parts only on demand.

Q: How about using/creating some standards?
A: Maybe something for discussions over these two days...

Q: Can you combine parts of Macromedia software with parts of something else?
A: We have an open architecture; for example, RealAudio has a plug-in.

Q: Can you start a Shockwave presentation in the middle?
A: Yes, but it requires some in-depth knowledge of the scripting language.

Q: Is the scripting language published?
A: It's called Lingo and there are books about it. It's a text-based programming language.

Q: Can other vendors create programs that interpret Lingo?
A: People can add to Lingo for specific purposes.

Q: What protocols do you use?
A: Only standard HTTP. (In fact, we use Netscape as a back-end.)

Q: Do you use server extensions, such as DSMCC?
A: No.

Q: What is IML? Are there licenses for it?
A: Idealized Machine Level; it's an API. Netscape will incorporate it. It depends on Netscape whether everybody will be able to use it and under what conditions. It contains such things as routines for bitblt for use by Java.


Session 2 (11:30-13:00)

Chair: Patrick Soquet, HAVAS


Maja: MHEG Applications Using JAVA Applets;
Hans Werner Bitzer et al.; GMD

Speaker: Klaus Hofrichter

Q: Is there overlap, or integration, between MHEG and HTML?
A: HTML is normally put inside the MHEG hypertext object. A different mapping is perhaps a subject for a break-out session.

Q: Instead of performing all of the operations on the server, why not do only the operations that are to be billed, and do the rest on the client?
A: We don't yet know which operations will be billed. This is an experimental set-up, partly to investigate exactly that.

Q: Why do you use CORBA?
A: We need a system that works not only on the Internet.


Lessons from Some Real-Time Hypermedia Systems;
John Buford; University of Massachusetts Lowell

Q: Does MHEG5 allow scripting?
A: MHEG3 defines a virtual machine for scripting.

Q: How successful is this scripting?
A: There is an MHEG6 under development now, which uses the Java VM instead of the MHEG3 virtual machine (minus AWT, plus MHEG5)

Q: Does that mean the Java VM will be an ISO standard?
A: Yes, Sun is helping with that.

Q: Which version of HTML do you use for text objects?
A: We can use any registered MIME type.

Q: Why do you use MPEG for all network protocols?
A: MPEG was designed for all networks, not just IP.

Q: MPEG seems to have a large overhead.
A: I don't think so.

Q: The MHEG approach seems to *require* specific authoring tools, since the document format is impossible (binary representation) or hard (text-based representation) to edit manually. If the format were text-based, authoring tools would not be essential. Compare Lingo and HTML.
A: I was never very excited by the ASN.1 notation myself.

Q: What is DAVIC?
A: A consortium of companies doing standards for interactive TV, such as set-top boxes.


Network-Based Multimedia: Experiences from the CMIF Project;
Dick Bulterman; CWI

Q: When there are to be synchronized streams, don't you need to create them together?
A: No, one or both may already exist. Synchronizing is done at a separate level.

Comment from Peter Hoddie (Apple): We used statistical feedback in order to do pre-fetching, like you suggested, and it turned out to help a lot for our CD-ROM based multimedia, even more than we expected.


Session 3 (14:00-15:00)

Chair: Patrick Soquet, HAVAS


Overview of Quicktime;
Peter Hoddie; Apple

Q: Is Quicktime public?
A: Yes, apart from some of the external formats (such as Cinepak) that it supports. Apple doesn't plan to make money selling Quicktime, except maybe from implementations of Quicktime.

Q: Can one convert between an MPEG2 stream and Quicktime?
A: We have a prototype for it.

Q: Can you say something about Quicktime vs HyTime vs MHEG?
A: Quicktime can do everything they do and more. Plus, Quicktime is already working.

Q: Why do we just see Quicktime movies, and no other types of presentations?
A: Because our Windows implementation of Quicktime used to be very bad.

Q: Should Macromedia be afraid?
Comment from Jonathan Grayson (Macromedia): We are talking to Apple about integrating our formats into Quicktime.
Hoddie: They complement each other.

Q: Is there authoring support?
A: Not as good as we would like, but there is a variety of tools just arriving. We are working on tools for interactive Quicktime.

Q: Is there any dynamic adaptation to bandwidth?
A: With interactive Quicktime, that is possible.

Q: Is there support for scripting?
A: Scripting is sometimes necessary, but we try to minimize its use. The support is there.

Comment from the audience: Different authors need different tools. Some can do scripting, others can't.


Issues in Temporal Representations of Multimedia Documents;
Nabil Layaida; INRIA Grenoble

Q: Is there dynamic bandwidth negotiation?
A: The schedule is dynamically adjusted to early or late arrivals, within the given constraints, but there are no alternative tracks.

Q: There apparently is no scripting and there are no numbers either. Interesting!
A: All scheduling can be done by the system, by using a constraint solver.
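To illustrate the constraint-based approach (a minimal sketch with invented names; the talk did not show code), temporal relations between media objects can be expressed as inequalities over start times and resolved by simple relaxation:

    import java.util.*;

    // Sketch: each media object gets a start-time variable; a relation
    // "b starts at least `delay` seconds after a" becomes
    // start[b] >= start[a] + delay. Relaxing all constraints repeatedly
    // (Bellman-Ford style) yields the earliest consistent schedule.
    public class Schedule {
        record Constraint(int a, int b, double delay) {}

        static double[] solve(int n, List<Constraint> cs) {
            double[] start = new double[n];       // initially everything at t=0
            for (int pass = 0; pass < n; pass++)  // n passes suffice if consistent
                for (Constraint c : cs)
                    if (start[c.b()] < start[c.a()] + c.delay())
                        start[c.b()] = start[c.a()] + c.delay();
            return start;
        }

        public static void main(String[] args) {
            // 0: intro audio (10s), 1: video after the audio, 2: caption 3s in
            List<Constraint> cs = List.of(
                new Constraint(0, 1, 10.0),   // video starts when the audio ends
                new Constraint(1, 2, 3.0));   // caption 3s into the video
            System.out.println(Arrays.toString(solve(3, cs)));  // [0.0, 10.0, 13.0]
        }
    }

A real system such as the one presented also handles deadlines and early/late arrivals; the point here is only that the author states relations, not numbers.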

Comment from audience: A declarative format is a bonus, as is a text format, despite its larger size. Text formats allow different tools, and even no tools.

Comment from audience: Scripting also precludes conversion to other formats.


Session 4 (15:30-17:00) - Breakout Sessions


How to integrate active objects into HTML

Chair: Klaus Hofrichter, GMD
Minutes: Roy Platon, RAL

The discussion group identified several areas of interest:

Events

The first discussion centered on 'what is an active object?'. It was agreed that these are objects with feedback mechanisms, so that a browser view can be changed in a more dynamic way than with links.

Some ideas from MHEG could be used to identify OBJECTS, so that the browser can apply applicable methods.

Should objects be allowed to change the browser, e.g. go outside their screen area or create new objects? This question remained unanswered.

Representation

How are objects represented in HTML? Here there seemed to be a general ignorance of the OBJECT tag and its applicability. There was a requirement to represent objects in HTML with a general interface that provides the structuring tools and enough common detail, so that objects are not a 'black hole' to the browser.
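For reference, the HTML OBJECT tag being discussed (then a W3C working draft, later standardized in HTML 4) declares typed, externally handled content with fallback for browsers that cannot render it; the file names below are invented for illustration:

    <OBJECT data="clip.mpeg" type="video/mpeg" width="320" height="240">
      <!-- fallback, rendered only if the browser cannot handle the object -->
      <IMG src="clip-still.gif" alt="A short film clip">
    </OBJECT>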

Modularity and Conformance

Not all browsers/platforms could handle everything in an object. There was a need for profiles and negotiation mechanisms.

An object needs to be able to define a minimum set of requirements to run. This would also provide guidelines for content providers to work to. There was a 'BIG danger' in proprietary solutions. The foundation should be a minimal set of functions providing the basic functionality; more complex functions could be left to Java, Shockwave, etc.

Microsoft has an interesting approach to event handling, but using Visual Basic to control events was not considered an acceptable solution.

Summary

The key features of Active Objects are:

W3C could help in defining the main behaviour for classes of objects.


Architecture and Strategy

Chair: John Buford, University of Massachusetts, Lowell
Minutes (incomplete): Philipp Hoschka, W3C


Screen-dumps from MBone transmission


Are commercial open formats possible?

Marc Kaufman (Adobe): Sure - an open format defines how you compress audio/video, *not* how you implement it. You sell the latter. We nearly lost the font market due to this.

John Buford: Can we stay with a plug-in architecture?

Marc Kaufman (Adobe): Not important. What counts is what goes over the wire. Standards are especially important for content providers: they want to distribute their content to everyone.

?? (Oracle): Is it OK to publish a technology, but patent it?

Marc Kaufman (Adobe): This is a question of how you plan to make revenue. Think of LZW. There are standards bodies that allow patents, but the patents must be licensed at a reasonable fee.

Buford: What do we need to change about URLs, HTML and HTTP?

Can we use HTML? Does it need to be extended?

Buford: A time-based format using schedules - no streaming required - could be done. However, it will be hard to pick a content model. The solution will be neither MHEG nor HyTime, and it will be hard to decide between QuickTime and Macromedia.

Kimmo Djupsjobacka (Nokia): HTML should allow authors to *control* video or Macromedia downloads.

??, (General Instruments): HTML should be evolved slowly. We can only support HTML 1.0 on set-top boxes. URLs should enable *actions* when you click on them, like "buy movie now". This needs a new URL scheme.
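A purely hypothetical example of such an action URL (the scheme name and syntax are invented here for illustration):

    action:buy?item=movie-1234

where clicking would trigger a billing transaction rather than a document fetch.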

Jonathan Grayson (Macromedia): Multimedia is not just a video playing in a window. It needs interactivity as well.

Discussion on how to "gracefully extend" HTML in the face of low-end hardware, and browsers that cannot deal with extended HTML.

??, (General Instruments): Please keep low-end terminals in mind. Keep it simple! Browsing the Web using a terminal should be like using cars of different cost to look at the same scenery.

Jonathan Grayson (Macromedia): We need a very scalable language!

Kaufman (Adobe): We need a resource conditional.


Is there hope for MHEG?

Chair: Dick Bulterman, CWI
Minutes: Philipp Hoschka, W3C (from chair's report to whole group)

W3C should not be the first to use MHEG; interest in MHEG should be collected in an MHEG consortium.


Session 5: Wild Ideas and Strong Opinions (17:15-18:45)

Chair: Chris Lilley, INRIA-W3C


Is bandwidth really a factor limiting real time Web interactivity, or are Web experts the main limitation?,
Kjell Arisland, University of Oslo

Q: You say rightly that we need vector graphics, but you say it cannot be done with mark-up. How about Postscript, isn't that mark-up?
A: I meant, it cannot be done with HTML. You'll need something that draws on a canvas. Maybe "mark-up" is not the right word. You'll need something other than the structure-oriented and text-oriented mark-up that HTML represents.

Q: How about VRML?
A: It's too heavy. We only need 2D.

Comment from the audience: HTML isn't just structure.

Q: Tcl seems to be similar to Å.
A: Yes, but we started before Tcl.

Q: Do you transmit Å source to a client?  
A: At first yes, but we are starting to use the Java VM now.

Q: What is the coordinate system?
A: Pixels; the programmer has to provide for resizing in the program itself.

Q: What does Å look like?
A: The syntax is a bit like C.

Q: Is it true that to create any picture, you need to write a program?
A: Yes.


The Edge Server - A Proposal for Internet Media Servers;
Daryl Rosen; Oracle

Comment from audience: There are better audio tools available now, which deal better with packet loss.

Q: What is the incentive for ISP's to install an Edge server?
A: Added quality for their customers.

Comment from audience: Also, it may mean a lower cost to the ISP itself in terms of the lines it has to rent.

Q: What protocol is used between the plug-in and the Edge server?
A: We don't really care (?)

Q: How does a client find the nearest Edge server?
A: That problem hasn't been solved yet. It may be pre-installed in the software you get from your ISP.

Q: Will a single Edge server be large enough? How does its content get updated? How about copyright and charging for movies?
A: Distribution is via multicast from the movie maker to all Edge servers.

Q: Will content providers have to buy space on an Edge server?
A: No, the Edge server uses a simple LRU algorithm.
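As a sketch of the LRU policy mentioned (not Oracle's implementation; class and parameter names are ours), Java's LinkedHashMap already supports access-ordered eviction:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A tiny LRU cache: accessOrder=true keeps entries in
    // least-recently-used-first order, and the eldest entry is
    // evicted as soon as the capacity is exceeded.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true);  // true = access order, not insertion order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

An Edge server would key such a cache on the movie URL and would have to account for object sizes, but the replacement decision itself is just this.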

Q: How much video is there out there, really?
Answer from audience: Terabytes per day, if you include the video that is not currently in digital form. It will be very difficult to decide what to cache.

Q: Why would an ISP buy an Edge server, instead of just buying more bandwidth?
A: There will never be enough bandwidth.

Q: Why not a direct line from a content provider to the ISP?
A: There are too many producers.

Comment from audience: The bandwidth will be available eventually; pricing is a question of market structure, which means that the cost will go down eventually as well.

Q: How about accounting?
A: The Edge server will record everything.

Q: Why can't you use an ordinary HTTP proxy?
A: The Edge server can deliver a guaranteed QoS. Managing thousands of connections per second is hard; Oracle has developed hardware and software to do that.


Why are Live Media Still Uncommon? Are Browser Developers Blind?;
Heiner Wolf, University of Ulm, Germany

Q: We tried video between all our users, but nobody ever uses it. Live video is not a technical problem, but a social one.
A: For some situations it is better than for others.

Comment from audience: For conferences over a long distance it has more use.

Comment from audience: But in that case there is the time-zone problem.

Q: Audio can't be integrated into a browser window anyway, so why isn't it good enough to use an (existing) external application for that?
A: I don't want people to have to download plug-ins. They should be able to see my stuff immediately when they reach my page.

Q: But there will always be old browsers without built-in audio/video.
A: People need an incentive to upgrade. Java may be (have been) such an incentive, but it may well be the only one.

Comment from audience: Video-phones appear to be of little use, but video-conferencing may have good uses. It really depends on the task. But video on the Internet may not yet be ready for widespread use.

Comment from audience: The problem of multiple formats will work itself out over time.

Comment from audience: A single standard for doing product upgrades would be nice.

Comment from audience: How many plug-ins are there now? Can't we define a single format now?


Friday, October 25, 1996

Minutes (unless otherwise noted): Anselm Baird-Smith, W3C


Session 6: Audio and Video on the Internet (9:30-11:00)

Chair: Jean Bolot, INRIA


Integrating RTP into the World Wide Web;
Randa El-Marakby et al.; University of Lancaster

RTP

TCP suffers from several problems when it comes to transporting real time data:

In contrast RTP offers:

RTP is split into two parts: the RTP data transfer protocol and RTCP, the RTP control protocol.

Q.: Does RTCP use the same transport layer as RTP?
A.: Yes, because RTP consists of both protocols: the RTP data protocol and RTCP. RTP/RTCP is deliberately integrated within the application. RTP is also mainly used above UDP.

Q.: So, why not split the two parts and put RTP over UDP and RTCP over TCP?
A.: But RTCP is for real-time feedback.

Q.: Is it a requirement?
A.: You can use the RTP data protocol alone, but RTCP is helpful for feedback and adaptive applications.

In the adaptive real-time multimedia application we implemented, we handled congestion control by detecting packet loss (with the usual assumption, as in TCP, that packet loss signals congestion).
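A minimal sketch of such loss-driven adaptation (thresholds and names invented here, not the Lancaster code): the sender reads the "fraction lost" field from RTCP receiver reports and adjusts its target rate in a TCP-like additive-increase/multiplicative-decrease fashion.

    // Adapts a sender's target rate from the RTCP "fraction lost" field
    // (an 8-bit value, i.e. packets lost / 256 since the last report):
    // back off multiplicatively under loss, probe additively when clean.
    class RateAdapter {
        private double rateBps;
        private final double minBps, maxBps;

        RateAdapter(double initialBps, double minBps, double maxBps) {
            this.rateBps = initialBps;
            this.minBps = minBps;
            this.maxBps = maxBps;
        }

        void onReceiverReport(int fractionLost) {
            double loss = fractionLost / 256.0;
            if (loss > 0.05)                     // congestion: cut the rate
                rateBps = Math.max(minBps, rateBps * 0.75);
            else if (loss < 0.01)                // clean path: probe upwards
                rateBps = Math.min(maxBps, rateBps + 8_000);
            // between 1% and 5% loss: hold the current rate
        }

        double targetRateBps() { return rateBps; }
    }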

RTP and the Web

A Netscape client and a sender get in touch through a central HTTP server (usually using CGI scripts); then they talk directly to each other through RTP. The user is not aware that Netscape itself is not running RTP. HTTP is used only at start-up.

In our adaptive implementation, the transmission rate adapts to the current state of the network, using the information provided by RTCP. This has the effect of changing the bandwidth used.

Q.: In the multicast case, this can lead to problems if subscribers live in different bandwidth ranges. What happens if one subscriber experiences much more packet loss than the others?
A.: (One solution, suggested by J. Bolot, would be to split the stream into several multicast groups, to which clients subscribe depending on their bandwidth: the more groups you subscribe to, the better the quality you get.)
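A sketch of the suggested layered scheme (group addresses, port and policy invented for illustration): each quality layer is sent to its own multicast group, and a receiver joins or leaves the top layer depending on the loss it observes.

    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Receiver-driven layering: join more groups while loss stays low,
    // drop the top layer when loss indicates congestion. The base layer
    // (layers[0]) is always joined.
    class LayeredReceiver {
        private final MulticastSocket sock;
        private final InetAddress[] layers;
        private int joined = 0;

        LayeredReceiver(String[] groups, int port) throws Exception {
            sock = new MulticastSocket(port);
            layers = new InetAddress[groups.length];
            for (int i = 0; i < groups.length; i++)
                layers[i] = InetAddress.getByName(groups[i]);
            addLayer();  // always receive the base layer
        }

        void addLayer() throws Exception {          // call when loss is low
            if (joined < layers.length) sock.joinGroup(layers[joined++]);
        }

        void dropLayer() throws Exception {         // call when loss is high
            if (joined > 1) sock.leaveGroup(layers[--joined]);
        }
    }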


The Multimedia Intranet;
Cosmo Nicolaou, Nemesys

Goals

TV-quality broadcast on intranets. This is made possible because bandwidth is available (and is not a problem). Explicitly not targeted at the home user.

This domain has suffered from a chicken-and-egg problem: no bandwidth available means no applications to use it; no applications means the bandwidth is not made available.

Architecture

Clients connect directly via ATM to video codecs for receiving real-time video/audio streams. A separate server controls the devices and clients communicate with this server via RPC over IP.


Providing Video Services over Networks without Quality of Service Guarantees;
Stephen Jacobs and Alexandros Eleftheriadis; Columbia University

Speaker: Stephen Jacobs

Guarantees

Dynamic Rate Shaping is a way to transform a video stream to make it fit the available bandwidth.

Available bandwidth and general network data are collected by reimplementing TCP flow control on top of UDP.

DRS is a transformation of an MPEG stream that can be done on the fly by eliminating DCT coefficients.
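The core idea in one function (a sketch, not the Columbia implementation; real DRS works on the entropy-coded MPEG stream and chooses which coefficients to drop so as to minimize distortion): keep only the first k coefficients of each 8x8 block in zig-zag (low-frequency-first) order.

    // Shape one 8x8 DCT block, given its 64 coefficients already in
    // zig-zag order: coefficients beyond the budget k are zeroed, which
    // shrinks the entropy-coded output at the cost of high-frequency detail.
    class RateShaper {
        static int[] shapeBlock(int[] zigzagCoeffs, int k) {
            int[] out = new int[zigzagCoeffs.length];
            System.arraycopy(zigzagCoeffs, 0, out, 0,
                             Math.min(k, zigzagCoeffs.length));
            return out;
        }
    }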

Questions

User-side concerns: is the user willing to accept a degraded video stream?

Henning Schulzrinne: I am conducting human factors experiments to check this.

MPEG stream transformation: some concerns were expressed as to whether the technique is in fact usable (eliminating DCT coefficients might just not be enough).


Session 7: Is Web-based Digital Television possible? (11:30-13:00)

Chair: Philipp Hoschka, INRIA-W3C


mWeb: a framework for distributed presentations using the WWW and the MBone;
Peter Parnes et al.; Lulea University

Goal: multicast presentations for today's needs.

Includes audio/video and some presentation material.

Previous work in this domain has always relied on modifying a browser (in general Mosaic), and the typical scenario is to broadcast URLs every N seconds.
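That "broadcast URLs every N seconds" scheme fits in a few lines (group address, port, URL and interval invented for illustration); listening browsers simply load whatever URL arrives on the group:

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;
    import java.nio.charset.StandardCharsets;

    // Periodically re-announce the URL of the current slide on a
    // multicast group; slaved browsers join the group and follow along.
    public class UrlAnnouncer {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("224.2.0.1");
            byte[] url = "http://www.example.org/slide42.html"
                    .getBytes(StandardCharsets.US_ASCII);
            try (MulticastSocket sock = new MulticastSocket()) {
                while (true) {
                    sock.send(new DatagramPacket(url, url.length, group, 5004));
                    Thread.sleep(5_000);  // re-announce every 5 seconds
                }
            }
        }
    }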

The goal of mWeb is to distribute the presentation material itself (e.g. HTML pages) together with synchronization information. mWeb is built on top of /TMP, which includes:

RTP: a Java implementation of RFC 1889
SRRTP: Scalable Reliable RTP
SRFDP: for the transfer of small files

Request for CCI (Common Client Interface): should W3C take any action to standardize such an API?

HTML is enough for the purpose of distributed presentations (even if it is more limited than, say, PowerPoint).

Something must be done about distributed learning: the needs are there, and no tools cover them yet (as of today).


Applying Real-time Multimedia Conferencing Techniques to the Web; Mark Handley, UCL

History of multimedia on the Internet.

Most documents are available from:

ftp://ftp.isi.edu/confctrl/docs

This includes a number of protocol descriptions (SDP, the Session Description Protocol; SAP, the Session Announcement Protocol; etc.).

Typical usage is to GET a document of a special MIME type that describes the session to join in order to get the actual data.
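Such a session description (what became SDP, MIME type application/sdp) is a short text file; the addresses, ports and names below are illustrative:

    v=0
    o=handley 2890844526 2890842807 IN IP4 128.16.64.4
    s=Workshop relay
    c=IN IP4 224.2.17.12/127
    t=0 0
    m=audio 49170 RTP/AVP 0
    m=video 51372 RTP/AVP 31

The m= lines name the multicast port and RTP payload type for each medium, so a browser can hand the file to its audio/video tools and join the session.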

A recorder or a player can be invited into a session (to record it, or to play multimedia streams to all participants). This can be done through the invitation protocol.

HTML extensions may be needed to handle multimedia (the speaker wasn't aware of the OBJECT tag; it was briefly presented by Vincent Quint).


Opportunities for migration of Internet and Digital TV applications; Jan van der Meer; Philips

Speaker: Jorgen Rosengren

The ability to predict what's going to happen is fundamental for users: end users don't expect their TV to shut down with a "core dump" message (i.e., as Netscape does today).

Security concerns: Applets should be run in a real sandbox (including CPU and memory limitations). Downloading an applet should not cause any damage to the browser itself, whatever the applet does.

The intersection between the digital TV delivery infrastructure and the Internet is not empty. IP frames can be encoded in standard MPEG system streams (kind of strange, but it does work). This will happen.

There is a standard in the TV world for video server control (DSM-CC) based on CORBA (laughs).


Session 8: (14:15-14:45)

Real-Time Streaming Protocol; Rob Lanphier, Progressive Networks


Breakout Sessions (14:45-16:30)


RTSP

Chair and Minutes: Henning Schulzrinne (Columbia University)

Minutes: Philipp Hoschka (derived from a report by Mark Handley)

Henning and Mark presented the need for a protocol like RTSP to fit into the larger picture of both real-time conferencing and Web playback.

It was initially thought that the combination of S(C)IP/HTTP/SDP for initiation and a trimmed RTSP (but capable of both TCP and UDP transport) would fit the requirements pretty well. Then more hypermedia-style scenarios came up, which RTSP in its current form might be able to support, but which that initial viewpoint might have difficulty supporting.

Such scenarios include those provided by more sophisticated frameworks such as MHEG, where, once a presentation has started, different media start and stop as appropriate according to *both* pre-programmed events from the server and user events from the client. This sort of scenario doesn't fit cleanly into a typical VCR-control type protocol.
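For concreteness, a VCR-control exchange in the style RTSP eventually took (server name and values invented; headers simplified):

    C->S: SETUP rtsp://server.example.com/movie RTSP/1.0
          CSeq: 1
          Transport: RTP/AVP;unicast;client_port=5004-5005
    S->C: RTSP/1.0 200 OK
          CSeq: 1
          Session: 1234
    C->S: PLAY rtsp://server.example.com/movie RTSP/1.0
          CSeq: 2
          Session: 1234
          Range: npt=0-
    C->S: PAUSE rtsp://server.example.com/movie RTSP/1.0
          CSeq: 3
          Session: 1234

The hypermedia scenarios above need streams to start and stop on server-side and client-side events, which this request/response pattern does not express directly.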

From a technical point of view, we would seem to have a set of priorities for revision of RTSP and also a few problems.

Priorities:

This last modification is not difficult in itself, but seems to be where the (technical) problems begin. If we do this, RTSP either cannot initiate streams itself, or duplicates S(C)IP/HTTP functionality when it does so. If we don't do this, it becomes difficult to integrate RTSP into live multimedia sessions where it is natural to "invite" the server to participate in an existing session.

Integrating RTSP into SCIP (which has a concept of a session, unlike SIP) is a technical possibility, but probably technically undesirable (we end up with too many cross-dependencies between different scenarios where we want to use these protocols), and almost certainly politically infeasible. It would however provide a more unified solution to this last problem.

Using SIP to initiate new media flows (events really) within a RTSP session is also a possibility, but seems a little ungainly given the session nature of RTSP and the pre-session intention of SIP.

The problem here really is that RTSP needs S(C)IP functionality, but we don't really want to integrate RTSP into S(C)IP, and getting RTSP to utilise S(C)IP as either SIP or SCIP stands right now isn't really very elegant. With time we could probably get this cross-reference between these protocols right, but given the timescales that PN/NSCP seem to need to move in, Mark does not think we can revise SIP/SCIP sufficiently to satisfy the requirement.


Digital Television and the Internet

Chair and Minutes: Philipp Hoschka, W3C

(discussion was hampered by the fact that most participants knowledgeable about digital TV/DAVIC had already left)

Philipp Hoschka (W3C): Why do audio/video over IP *and* a digital television network? Why not use IP directly?

There is an installed base of MPEG-based distribution systems.

You cannot do pay-per-view over the Internet today.

Do you really *want* to do video over IP? The quality is low.

Dick Bulterman (CWI): Both the digital television network and the Internet can co-exist: digital television for high-quality pay-per-view, the Internet for low-quality best effort.

Philipp Hoschka (W3C): Let's try to design an IP-based digital television network!

Marc Kaufman (Adobe): OC-* will never have enough bandwidth to carry a large number of wide-area, continuous, TV-quality video streams.

(??): On the other hand, most traffic will be local. You don't want to watch a TV station local to San Francisco from the South of France.

But isn't this the great thing about the Web, that you can access non-local resources? Plus, there are many expatriates in the South of France who get a satellite dish only to be able to watch their local TV station.

Martin Chapman (IONA): Here is the architecture used by DAVIC's DSM-CC (draws a client-server system). It uses CORBA IDL. The functionality of the protocol is similar to a video recorder (pause, etc.). Can we gateway HTTP into this? The system was designed for a distance of 10 km between client and server; the server stores all the videos a client wants to access.

Internet telephony: only people who don't own facilities are worried about this.

Patrick Soquet (Havas):

Peter Hoddie (Apple):


Session 9 (16:00-17:00)
Decisions on Future Work

Chair and Minutes: Philipp Hoschka, W3C

A number of possible work items were collected for further review by W3C management: