Submitted to the World Wide Web Consortium Workshop on "Real Time Multimedia and the Web," September, 1996.
Stephen Jacobs and Alexandros Eleftheriadis
There are many reasons for developing video services over networks without Quality of Service (QoS) guarantees. One reason is that a non-QoS connection will be substantially cheaper for the user. Another is that although networks of the future will provide the option of a QoS connection, today's networks have no such provisions. This position paper presents a novel approach to the problem by acknowledging that a successful solution must draw on both a networking and an image processing perspective. The proposed approach should deliver better visual quality and utilize network resources more fully than current techniques.
The underlying technologies of the Internet are not sufficient to support Quality of Service (QoS) guarantees. The evolution of the base technologies of the World Wide Web could result in any number of possibilities, including ATM backbones, IP switching, or fully deployed ATM networks. Regardless of the specifics, the general consensus is that this network of the future will have QoS and that users will be able to request connections with or without it. Connections that reserve resources will certainly cost more, and users may not always want to pay this extra cost for a particular video service. If a user is watching a major sports event, the extra cost may be worthwhile. Watching home movies from a relative's home page, however, may not warrant it. Thus, some video services may not need QoS. A further motivation for providing good quality video services without QoS is that today QoS is simply not an option. Finally, wireless networks may never have QoS, because a given bandwidth cannot be guaranteed when so many other variables affect the throughput of the wireless link.
In the next two sections we provide overviews of both current work and our proposal on Bandwidth Estimation and Rate Shaping, respectively. In Section 4, we provide some details of our proposal. Section 5 discusses how the proposed architecture scales to other network platforms and Section 6 contains some closing comments.
The technique for developing video services without QoS involves an explicit attempt to avoid network congestion. Clearly, network congestion hurts performance for all users of the network. The goal is to send only as much data as the network can carry at a particular time, which requires an estimate of the bandwidth available in the network. Much work has been done in the past on bandwidth estimation [1,5]. However, providing good quality networked video requires innovative solutions from both the networking and image processing perspectives. Past attempts at video services over non-QoS networks have treated the problem as purely one of networking. This has resulted in crude techniques, such as frame dropping, for forcing high bandwidth video streams through small bit pipes.
Our first attempt at bandwidth estimation will be based on the TCP congestion control algorithm. Although it currently has disadvantages for use with wide area networks and with very high speed networks, it does have several advantages. TCP streams have been shown to work well together, and today's Internet is proof of that. They operate according to a greedy but "socially-minded" algorithm that attempts to get as much bandwidth as possible but backs off substantially during congestion. However, this substantial back-off is problematic for video quality.
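The greedy-but-polite behavior described above can be illustrated with a small sketch. It is a deliberately simplified, TCP-like window (slow start, additive increase, halving on loss) and not the estimator used in the proposed system; the class name `AimdWindow`, the initial threshold of 64 packets, and the floor of 2 packets are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AimdWindow:
    """Simplified TCP-like congestion window, counted in packets (illustrative only)."""

    cwnd: float = 1.0        # current congestion window
    ssthresh: float = 64.0   # slow-start threshold (assumed initial value)

    def on_ack(self) -> None:
        """Grow the window: quickly below ssthresh, slowly above it."""
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start: one packet per ACK
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: about one packet per round trip

    def on_loss(self) -> None:
        """Back off substantially: halve the window (assumed floor of 2 packets)."""
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = self.ssthresh
```

The halving in `on_loss` is the "substantial back-off" referred to above: a single loss cuts the estimate roughly in half, which is why the raw window is a harsh signal to apply directly to video.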
We are currently implementing a solution based on the dual-perspective approach. Namely, we are concerned both with the difficulties of bandwidth estimation and with developing a methodology for shaping compressed MPEG-2 streams to a continuum of possible bandwidths. This technique is called Dynamic Rate Shaping (DRS). In its simplest form, DRS selectively drops from the bit stream the coefficients that are least important in terms of image quality. This gives us the ability to dynamically change the bit rate of a pre-compressed stream. Of course, MPEG carries considerable overhead, so there is a lower bound on the rate of the shaped bit stream. The lower bound is reached once all but the DC coefficients are dropped from every macroblock.
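As a rough illustration of coefficient dropping, the sketch below truncates every block at a common cutoff so that the total fits a bit budget, and never drops the DC coefficient. It is a toy model, not the DRS algorithm itself: the function name `shape_blocks`, the representation of a block as a list of per-coefficient bit costs in zigzag order (index 0 being DC), and the single shared cutoff are all assumptions.

```python
from typing import List


def shape_blocks(blocks: List[List[int]], bit_budget: int) -> List[List[int]]:
    """Truncate blocks to fit bit_budget, always keeping the DC coefficient.

    blocks[i][j] is the (hypothetical) bit cost of the j-th coefficient of
    block i in zigzag order, so higher indices are higher frequencies.
    """
    if not blocks:
        return []
    max_len = max(len(b) for b in blocks)
    # Try the most generous cutoff first, then drop higher frequencies.
    for keep in range(max_len, 0, -1):
        shaped = [b[:keep] for b in blocks]
        if sum(sum(b) for b in shaped) <= bit_budget:
            return shaped
    # Budget is below the DC-only floor: return the floor, the lowest rate reachable.
    return [b[:1] for b in blocks]


if __name__ == "__main__":
    demo = [[8, 5, 3, 2], [8, 4, 2], [8, 6]]
    print(shape_blocks(demo, bit_budget=40))
```

The final fallback mirrors the lower bound mentioned above: once only DC coefficients remain, the rate cannot be reduced further by dropping coefficients.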
The advantage of this technique is that DRS can meet any reasonable bandwidth estimate while maintaining 30 frames per second (fps). Frame dropping gives only a very crude approximation to the bandwidth estimate, since a frame is quite large and is the smallest amount by which the bandwidth can be reduced. This means that in most cases a frame-dropping system will not utilize its bandwidth resources to the maximum extent possible.
The quality that can be reproduced with DRS more closely approximates that of high quality video. The reason is that an effort is always made to maintain 30 fps. Of course, the quality has to degrade somewhat, but it does so in a hierarchical fashion: the higher frequency components, to which the human visual system is less sensitive, are dropped first.
Also, DRS adapts much faster. The unit of rate shaping could be as small as a macroblock. This would mean that as soon as the system receives an estimate of lowered bandwidth, it would begin sending the stream at the lower rate. Each macroblock sent after receiving the estimate would conform to the new estimate, and the shaping would be almost immediate. In practice, this would probably not be taken to such an extreme, because it would likely be disturbing to the user if half of a frame were good quality and the other half low quality.
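One way to avoid changing quality in the middle of a frame is to hold a new estimate until the next frame begins. The sketch below shows that idea only; the class `FrameAlignedRate` and its method names are hypothetical and are not taken from the proposed system.

```python
from typing import Optional


class FrameAlignedRate:
    """Applies a new target rate only at the start of a frame (illustrative sketch)."""

    def __init__(self, initial_bps: float) -> None:
        self.active_bps = initial_bps              # rate currently used for shaping
        self.pending_bps: Optional[float] = None   # newest estimate not yet applied

    def on_estimate(self, bps: float) -> None:
        """Record a new bandwidth estimate; it takes effect at the next frame."""
        self.pending_bps = bps

    def rate_for_macroblock(self, first_in_frame: bool) -> float:
        """Return the rate to shape this macroblock to."""
        if first_in_frame and self.pending_bps is not None:
            self.active_bps = self.pending_bps
            self.pending_bps = None
        return self.active_bps
```

With this policy the adaptation delay is at most one frame time, which at 30 fps is about 33 ms, while every macroblock within a frame is shaped to the same target.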
Another distinct advantage of using DRS is that it decouples the encoder from the system. Some work has been done on providing feedback to the encoder and then adjusting the quantizer based on this feedback. That approach has two disadvantages: it works only with live video, and it requires as many encoders as there are simultaneous video streams.
The proposed system architecture is shown in Figure 1. The encoder stores the video on local storage media. Later, when a stream request is initiated by a user, the stored stream is read piece by piece into a buffer. DRS then shapes the stream into the appropriate bit rate and the data is sent out with the correct rate onto the network using UDP datagrams. The receiver buffers the data before playing, and then begins decoding.
Each packet received generates an upstream acknowledgement which is used to implement the TCP congestion control algorithm. This information is fed into the Bandwidth Estimator Translator to convert congestion window size into a metric that is meaningful to DRS.
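The conversion performed by the Bandwidth Estimator Translator is not specified here. A minimal sketch, assuming the common approximation that a window of `cwnd` packets is delivered once per round-trip time, is shown below. The function names, the use of a measured round-trip time, and the optional exponential smoothing (with an assumed weight of 0.125) are illustrative assumptions rather than part of the proposed design.

```python
def cwnd_to_bitrate(cwnd_packets: float, packet_bytes: int, rtt_seconds: float) -> float:
    """Approximate sustainable bit rate: one window of packets per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return cwnd_packets * packet_bytes * 8 / rtt_seconds


def smooth(previous_bps: float, sample_bps: float, alpha: float = 0.125) -> float:
    """Exponentially weighted average, to soften abrupt swings in the target rate."""
    return (1.0 - alpha) * previous_bps + alpha * sample_bps


if __name__ == "__main__":
    # 10 packets of 1000 bytes per 100 ms round trip.
    print(cwnd_to_bitrate(10, 1000, 0.1))
```

Smoothing trades responsiveness for stability: a heavier weight on past samples hides short back-offs from the shaper, but also delays the reaction to sustained congestion.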
Figure 1. Architecture of proposed system
The Internet is only one application for this system. Any network without QoS is eligible, though the means of determining the available bandwidth will certainly differ. The ATM Available Bit Rate (ATM-ABR) traffic type guarantees a minimum bandwidth and also provides the user with an indication of how much extra bandwidth is available. The information about the extra available bandwidth is obtained via Resource Management Cells that are sent periodically. In this case, the problem of bandwidth estimation is largely solved, though the Bandwidth Estimator Translator is still necessary to interpret the data for DRS.
In addition, the wireless environment is an ideal one for this system because it inherently cannot guarantee QoS. The available bandwidth on a wireless link tends to change as the user changes location. Bandwidth estimation is a difficult problem here because a dropped packet does not necessarily indicate congestion; it could instead be due to errors on the unreliable air interface. If the link layer were made reliable through retransmissions and sequence numbering, the same techniques could be used as in the Internet case.
We have presented a novel approach to the problem of networked video services operating without QoS guarantees and have described how this technique can be useful in Internet, ATM, and wireless environments. The approach takes standard congestion avoidance measures that have been shown to work conservatively and couples them with a breakthrough technology for shaping the video to a broad range of possible bit rates. It is the first approach of its type in that it acknowledges that the problem is one of both networking and image processing.
[1] I. Busse, B. Deffner, and H. Schulzrinne, "Dynamic QoS control of multimedia applications based on RTP," in First International Workshop on High Speed Networks and Open Distributed Platforms, St. Petersburg, Russia, June 1995.
[2] Z. Chen, S. M. Tan, R. H. Campbell, and Y. Li, "Real Time Video and Audio in the World Wide Web," World Wide Web Journal, Volume 1, January 1996.
[3] A. Eleftheriadis and D. Anastassiou, "Meeting Arbitrary QoS Constraints Using Dynamic Rate Shaping of Coded Digital Video," Proceedings, 5th International Workshop on Network and Operating System Support for Digital Audio and Video, Durham, New Hampshire, April 1995, pp. 95-106.
[4] K. M. Khalil and Y. S. Sun, "Performance Considerations for TCP/IP in Wide Area Networks," Proceedings, 19th Conference on Local Computer Networks, 2-5 Oct. 1994, pp. 166-175.
[5] T. Sakatani, "Congestion avoidance for video over IP networks," Multimedia Transport and Teleservices Proceedings, 13-15 Nov. 1994, pp. 256-273.