AVWoD - Concept and Realization
for Internet-based Media Integration

Marc Alexa, Norbert Gerfelder, Paul Grimm, Christian Seiler

Fraunhofer Institute for Computer Graphics
Dep. Animation and High-Definition Image Communication
64283 Darmstadt, Germany

Introduction

Owing to the rapidly growing use of computer networks for both commercial and academic purposes, network-based applications are a topic of major interest. Adding video and audio to computer applications and using them in computer networks places specific requirements on quality. In the area of computer-based and distributed applications we are confronted with a large variety of distribution channels and output devices that differ in bandwidth, processing power, display size, etc.

Until now, distributed systems and applications using audio and video have often been separate. Current systems that integrate different media are mostly stand-alone systems using hard disk or CD-ROM, without network distribution of media. Therefore, one goal for the development of distributed network-based services has to be the integration of different media representations, e.g. text, video, audio, graphics and other types of representation for visual, acoustic and tactile communication primitives. These services would enable or improve network-based conferencing, teleteaching, and news distribution, to mention just a few application areas.

The goal of the AVWoD project - Audio-Video-Web-on-Demand - carried out at Fraunhofer IGD is to develop the concepts and to build the base for such distributed applications. AVWoD is a distributed, network-based system which enables the integrated and synchronized transmission and presentation of audio, video, and WWW hypertext documents. Users can select from the available sessions, which are provided by an AVWoD server. A session in the AVWoD context consists of synchronized media data streams sharing a common topic.

The development was based on different prerequisites:

The prerequisites stated above led to the use or adaptation of HTTP, RTP, and UDP/IP as communication protocols, of MBone tools for audio and video presentation, and of Java for user interface design [BeCL 94] [MaBr 94] [SCFJ 96] [SUN 95]. In the following sections the different parts of AVWoD are explained and discussed in detail. Conclusions and prospects are given at the end of the paper.

Requirements and Realization of the AVWoD System

In the development of network-based multimedia applications, bandwidth limitations and their variability have to be considered. To address as many human senses as possible, the user has to be supplied with different kinds of media. Typically these media demand varying amounts of bandwidth. It is therefore necessary that the media used are scalable. Only scalability allows the available bandwidth to be split variably among the different media according to

This concept of scalability is discussed later in more detail in the section "Quality - Media Scalability".
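
As a rough illustration of how scalable media permit a variable split of the available bandwidth, the following Python sketch distributes a bandwidth budget among several media. Each medium first receives its minimum rate, and the remainder is shared in proportion to a weight without exceeding the medium's maximum rate. The function name, the media names, the weights and all rates are hypothetical values chosen for illustration; they are not taken from the AVWoD implementation.

```python
def split_bandwidth(total_kbps, media):
    """Split total_kbps among media.

    media maps a name to (min_kbps, max_kbps, weight). Every medium first
    gets its minimum rate; the remainder is shared in proportion to the
    weights, never exceeding a medium's maximum rate.
    """
    alloc = {name: float(lo) for name, (lo, hi, w) in media.items()}
    remaining = total_kbps - sum(alloc.values())
    if remaining < 0:
        raise ValueError("budget is below the sum of the minimum rates")

    # Media that can still absorb additional bandwidth.
    active = {n for n, (lo, hi, w) in media.items() if hi > lo and w > 0}
    while remaining > 1e-9 and active:
        weight_sum = sum(media[n][2] for n in active)
        saturated = set()
        handed_out = 0.0
        for n in active:
            share = remaining * media[n][2] / weight_sum
            room = media[n][1] - alloc[n]
            give = min(share, room)
            alloc[n] += give
            handed_out += give
            if give >= room - 1e-9:
                saturated.add(n)  # this medium reached its maximum rate
        remaining -= handed_out
        active -= saturated
        if not saturated:
            break  # everything was handed out in this round
    return alloc


# Hypothetical session: (min_kbps, max_kbps, weight) per medium.
session = {
    "video": (64, 1000, 3),
    "audio": (13, 64, 1),
    "slides": (2, 10, 1),
}
allocation = split_bandwidth(128, session)
```

With these assumed values the slides are capped at their maximum rate, and the bandwidth they cannot use is redistributed to video and audio.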

The requirements and the prerequisites lead to a client-server model of processing. Systems based on this model offer the opportunity to use a simple, platform-independent client which takes advantage of widespread and platform-independent tools, in our case receiver, viewer and presentation tools. The server can be developed on dedicated computer systems to offer maximum application performance; platform independence need not be achieved for this particular component.

Furthermore, the use of separate delivery channels and protocols for each medium should be possible, because media and channel separation offer the following advantages:

We defined an open environment to encode any media data at the server side, transmit it to the client and decode the received data. So each medium has its own pair of transmission and presentation tools. "Transmission" includes the appropriate channel encoding of the medium and the transmission of the coded data stream. "Presentation" covers receiving the coded data stream, channel and source decoding, and the presentation of the medium.

Hence, we separated all media-related services from the AVWoD client-server system. The AVWoD server provides an interface through which the transmission tool for each medium is controlled. The presentation tools are started by the client, and their control and synchronization are handled by the AVWoD server and the corresponding transmission tool. The communication between server and client enables the exchange of the following information and action requests:

Figure 1 shows the principle of the communication and data exchange model:

Figure 1: AVWoD communication and data exchange model

Although it might be desirable to support as many media as possible, the first step has to be the support of typically needed media for presentations, lectures, etc.: video, audio and slides. In many applications slides - text, graphics, and images - contain most of the information, but many distributed systems handle slides as video data. This is obviously a waste of bandwidth and furthermore interaction and presentation are restricted. So we decided to send slides as HTML pages which offer an appropriate format and representation for this kind of information, needing only low bandwidth.

As presentation tools for video and audio data we have chosen MBone tools, which are widely used and freely available for many different platforms. Consequently, our current implementation integrates the vic video tool and the vat audio tool as receiver and presentation tools on the client side. We have developed our own set of tools for encoding and transmission of video and audio data.

We are currently using RTP as the protocol for video and audio transmission [SCFJ 96]. RTP is the native protocol of the MBone today and, as an open standard, ensures flexibility and simple interoperability with third-party applications. Although the AVWoD system is not limited to RTP and HTTP as transmission protocols, the use of RTP in particular has a further advantage: many different encodings and formats are already defined for use with RTP, which makes it possible to offer users a choice of encodings tailored to their needs regarding bandwidth or quality.
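
To make the role of RTP more concrete, the following Python sketch decodes the 12-byte fixed RTP header defined in [SCFJ 96]. The payload type field in this header is what allows a receiver to distinguish the different encodings. The function and type names are our own choice for illustration and are not part of the AVWoD tools; CSRC entries and header extensions are not handled.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence_number timestamp ssrc",
)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,             # 2 bits
        padding=bool(b0 & 0x20),     # 1 bit
        extension=bool(b0 & 0x10),   # 1 bit
        csrc_count=b0 & 0x0F,        # 4 bits
        marker=bool(b1 & 0x80),      # 1 bit
        payload_type=b1 & 0x7F,      # 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Example packet built by hand: version 2, payload type 26 (the static
# type assigned to JPEG video in the RTP audio/video profile).
example = struct.pack("!BBHII", 0x80, 26, 1234, 90000, 0xDEADBEEF) + b"payload"
header = parse_rtp_header(example)
```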

At present AVWoD supports nv [Fred 94], CellB [SpHo 96], and M-JPEG as video encodings. All audio encodings supported by the vat tool are implemented (e.g., ITU-T G.711 µ-law, GSM, DVI). Especially in the area of video encoding, the supported formats offer only a limited set of quality levels.

In the following we will discuss the client part of the AVWoD system. The major design and implementation goals for the client were:

These goals can be reached with a Web-based system that uses Web browsers and the Java programming language to build the graphical user interface. The client user interface is designed as a normal Web page using frames and Java-applets. The first frame and the applets it contains are used for controlling the application and for the client-server communication. This frame is the actual AVWoD-client. It remains unchanged across all sessions of the application and across application areas. The second frame is used first to present the available sessions and afterwards to display the synchronized HTML slides; it can be designed for specific application needs, e.g., those of teleteaching, news delivery, etc. This frame is restricted only by its role as a media presenter, at the moment for the presentation of HTML. The first frame - the control frame - is responsible for the time-synchronized presentation of HTML inside the second frame and for starting the external receiving and presentation tools for other media. Note that using the AVWoD-client to control slides violates our design goals. However, this approach offers an optimized and easy way to handle HTML pages.
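
The time-synchronized presentation of HTML slides can be pictured as a lookup from session time to the slide that should currently be visible. The following Python sketch shows one possible way to do this, assuming that each slide carries a start time in seconds relative to the beginning of the session. The class name, the slide file names and the times are hypothetical; the actual AVWoD control frame is implemented as Java-applets and may work differently.

```python
import bisect


class SlideSchedule:
    """Maps session time (seconds) to the HTML slide that should be shown."""

    def __init__(self, entries):
        # entries: iterable of (start_time_seconds, url) pairs
        ordered = sorted(entries)
        self._times = [t for t, _ in ordered]
        self._urls = [u for _, u in ordered]

    def slide_at(self, session_time):
        """Return the URL of the slide active at session_time, or None
        if the first slide has not started yet."""
        i = bisect.bisect_right(self._times, session_time) - 1
        return self._urls[i] if i >= 0 else None


# Hypothetical lecture with three slides.
schedule = SlideSchedule([
    (0, "intro.html"),
    (95, "concept.html"),
    (240, "results.html"),
])
current = schedule.slide_at(100)  # slide shown 100 s into the session
```

A control component could call slide_at with the current media timestamp and load a new page into the presentation frame whenever the returned URL changes.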

Inside the second frame - the selection and visualization frame - session selection is realized by means of Java-applets representing buttons. These buttons send messages to the control frame. Figure 2 shows the dependencies between the Java-applets, the different servers and external presentation tools.

Figure 2: AVWoD dependencies

Quality - Media Scalability

In general, each application or communication has a specific communicative goal [GeMü 94]. To reach this goal, the following factors have to be considered in development and use of applications and media:

Human perception is always integral and complete, and we use all our senses while communicating - consequently our perception is multimedia. Thus, the main goal in the development of AVWoD is not merely the inclusion of more and more monomedia. The factors described above and the different human perception abilities also have to be taken into account.

Only the determination of and adherence to application- and situation-dependent quality criteria, with respect to coding, transmission and representation of information, will lead to an undisturbed, error-minimized, and efficient mode of man-computer interaction and man-to-man communication. Apart from the possibility of changing the medium, e.g. delivering content as video, audio or text, the aspect of quality scaling is of major importance.

Quality scaling of video data can be achieved by varying different media parameters. The adaptive change of the frame rate is only one way to achieve an adequate quality, depending on the basic factors described above, such as the communicative goal.
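
As a minimal sketch of frame-rate scaling, the following Python function selects which frames of a sequence to keep when the frame rate is reduced. It assumes that every frame can be decoded independently (as with intraframe coding), so that frames can simply be dropped, and that both frame rates are positive integers. The function name is hypothetical.

```python
def select_frames(num_frames, source_fps, target_fps):
    """Return the indices of the frames to keep when reducing the frame
    rate from source_fps to target_fps (both positive integers)."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(num_frames))  # nothing to drop
    kept = []
    for i in range(num_frames):
        # Frame i is kept when it is the first source frame that falls
        # into a new output frame interval.
        if (i * target_fps) // source_fps != ((i - 1) * target_fps) // source_fps:
            kept.append(i)
    return kept


# One second of 25 fps video reduced to 10 fps.
kept_indices = select_frames(25, 25, 10)
```

The kept frames are spread as evenly as the ratio of the two frame rates allows, which avoids visible bursts of dropped frames.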

In a first step, different quality levels were achieved by using different formats and encodings. Hence, choosing a specific format means choosing a specific quality level. This is not appropriate, because it wastes storage capacity and requires a large variety of conversion and presentation tools.

Today's video compression standards are mostly based on hybrid coding using the Discrete Cosine Transform (DCT). Examples of such standards are JPEG [ISO 94], MPEG-1 [ISO 93] and MPEG-2 [ISO 96]. One major disadvantage of these compression schemes is the complexity of scaling compressed video streams. Scaling is often achieved by decompressing and decoding the video stream, applying algorithms that work in the spatial domain, and then compressing and encoding the video stream again.

We are currently developing algorithms which can be applied in the frequency domain, so no inverse DCT has to be applied to perform different kinds of scaling [Alex 96]. They enable spatial and color resolution scalability as well as SNR (quantization) scalability. Using these algorithms, scaling operations can be performed on-demand without large decoding and encoding processes within the AVWoD system.
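
To illustrate the general idea of scaling in the frequency domain, the following Python sketch shows two generic, textbook-style operations on a block of DCT coefficients. It is not the algorithm of [Alex 96], which is not described here. Spatial resolution is halved by keeping only the low-frequency quadrant of a block and rescaling it, and SNR is reduced by requantizing the coefficients with a coarser step. The sketch assumes an orthonormal DCT-II and a single uniform quantizer step instead of the quantization tables used by real codecs; all function names are hypothetical. The functions dct2 and idct2 are included only to produce and check test data; the two scaling operations themselves need no inverse DCT.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """2-D DCT of a square pixel block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def halve_in_dct_domain(coeffs):
    """Spatial scaling: keep the low-frequency quadrant of an N x N block.

    The factor 1/2 (1/sqrt(2) per dimension) compensates for the different
    normalization of the N x N and the N/2 x N/2 orthonormal DCT.
    """
    half = len(coeffs) // 2
    return [[coeffs[u][v] / 2.0 for v in range(half)] for u in range(half)]


def requantize(coeffs, step):
    """SNR scaling: round every coefficient to a multiple of step."""
    return [[round(c / step) * step for c in row] for row in coeffs]


# A flat 8 x 8 block of value 100 becomes a flat 4 x 4 block of value 100.
flat = [[100.0] * 8 for _ in range(8)]
small_coeffs = halve_in_dct_domain(dct2(flat))
small_pixels = idct2(small_coeffs)
```

In a transmission tool, the smaller coefficient block would be entropy coded directly, so the cost of a full decode and re-encode cycle is avoided.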

MPEG-2 [ISO 96] offers only the SNR/Spatial profile at High 1440 level for scaling. Because only the Main profile / Main Level is used for distribution, and the 4:2:2 profile [ISO 96a] for contribution, the scalability offered in MPEG-2 is currently of academic interest only.

In our current implementation of the AVWoD transmission tools we use intraframe coding (JPEG). With this, only moderate compression rates can be achieved, but the processing can be performed very fast because no motion compensation is used.

It is necessary to use a video meta-format, which is scalable on-line (while transmitting the data). With the use of the above mentioned scaling algorithms, a DCT-based compression of video data is appropriate as a meta-format for archiving and processing. This would also allow for changing the quality level during transmission, and thus adapt to the available bandwidth and channel limitations.

The quality can be controlled either by the AVWoD server or, transparently to the server, by an automatic controller. Such a controller could use information gathered through the transmission protocol. The control protocol associated with RTP (RTCP) generates the kind of information needed for an automatic adjustment of transmission parameters as described above.
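
One conceivable form of such an automatic controller is sketched below in Python. It reacts to the "fraction lost" field of RTCP receiver reports, which RTP defines as an 8-bit fixed-point number (lost packets divided by expected packets, multiplied by 256). The controller steps down one quality level when the reported loss is high and steps up again when the loss is low. The class name, the list of quality levels and both thresholds are assumptions made for illustration; they do not describe the AVWoD implementation.

```python
class QualityController:
    """Chooses a quality level from RTCP receiver report loss figures."""

    def __init__(self, levels, start_index=0, high_loss=0.05, low_loss=0.01):
        if not levels:
            raise ValueError("at least one quality level is required")
        self.levels = list(levels)  # ordered from lowest to highest quality
        self.index = start_index
        self.high_loss = high_loss  # above this loss ratio: step down
        self.low_loss = low_loss    # below this loss ratio: step up

    def on_receiver_report(self, fraction_lost_byte):
        """Update the level from the 8-bit 'fraction lost' field (0..255)
        of a receiver report and return the level to use from now on."""
        loss = fraction_lost_byte / 256.0
        if loss > self.high_loss and self.index > 0:
            self.index -= 1
        elif loss < self.low_loss and self.index < len(self.levels) - 1:
            self.index += 1
        return self.levels[self.index]


# Hypothetical target bit rates in kbit/s, starting at the highest level.
controller = QualityController([64, 128, 256], start_index=2)
after_loss = controller.on_receiver_report(26)     # about 10 % loss
after_recovery = controller.on_receiver_report(0)  # no loss reported
```

The gap between the two thresholds keeps the controller from oscillating when the reported loss stays in a moderate range.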

Conclusions and Prospect

The development of AVWoD demonstrates a flexible and extensible concept for both media integration and quality scalability. Adaptive quality settings and the feasibility of on-demand quality scaling are the basis of a user-oriented presentation. This is of major importance if we consider communication channels with different bandwidth and Quality-of-Service restrictions.

Using Java enables the development of an integrated, Web-based user interface that is easy to use and whose updates are transparent to the user. One of the next steps is to use Plug-Ins or Java-applets for receiving and presenting audio and video data as well as additional media or representations. SunSoft will publish the Java-media extensions for audio and video integration in 1997 [Java 96]. These extensions allow the development of presentation tools embedded in Web pages. Furthermore, we are working on the integration of telepointer and whiteboard as additional media for information representation.

The AVWoD-system can be contacted using the URL:

http://avwod.igd.fhg.de

References

[Alex 96]
Alexa M., et al., Fast Scaling of DCT-based Images in the Frequency Domain, Fraunhofer IGD, 1996, to be published
[BeCL 94]
Berners-Lee T., Cailliau R., Luotonen A., Nielsen H.F., Secret A., The World-Wide Web, Communications of the ACM, Vol.37 No.8, Aug. 1994, pp. 76-82
[Fred 94]
Frederick R., Experiences with real-time software video compression, Xerox PARC, ftp://parcftp.xerox.com/pub/net-research/nv-paper.ps, July 1994
[GeMü 94]
Gerfelder N., Müller W., Quality Aspects for Computer-Based Video Services, Proceedings SMPTE 1994 European Conference, pp. 44-67
[ISO 93]
ISO/IEC, ISO/IEC 11172-2:1993, Information Technology - Coding of moving pictures and associated audio for storage media at up to about 1,5 Mbit/s - Part 2: Video, 1993
[ISO 94]
ISO/IEC, ISO/IEC 10918-1:1994 / ITU-T Recommendation T.81 (1992), Information Technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines, 1994
[ISO 96]
ISO/IEC, ISO/IEC 13818-2:1996 / ITU-T Recommendation H.262 (1996), Information Technology - Generic coding of moving pictures and associated audio information: Video, 1996
[ISO 96a]
ISO/IEC, ISO/IEC 13818-2:1996/AMD 2 / ITU-T Recommendation H.262 (1996), Information Technology - Generic coding of moving pictures and associated audio information: Video, Amendment 2: 4:2:2 profile, 1996
[Java 96]
JavaSoft, Java API Overview, Sun Microsystems, Mountain View, CA, 1996 http://www.javasoft.com/products/apiOverview.html
[MaBr 94]
Macedonia M., Brutzman D., MBone Provides Audio and Video Across the Internet, IEEE Computer, April 1994, pp. 30-36
[SCFJ 96]
Schulzrinne H., Casner S., Frederick R., Jacobson V., RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Jan. 1996
[SpHo 96]
Speer M., Hoffmann D., RTP Payload Format of Sun's CellB Video Encoding, IETF Internet Draft, July 1996
[SUN 95]
Sun Microsystems, Inc., The Java Language Environment: A White Paper, Sun Microsystems, Mountain View, CA, 1995