Multimedia Communication Standards and the WWW

Introduction

There is an emerging market of multimedia communications products that use intranets (especially LANs) as the communications infrastructure. A variety of standard protocols have been defined to enable interworking among these products. The WWW community needs to adopt these standards so that WWW access can interoperate with these products and provide truly ubiquitous multimedia communication across both intranets and the Internet.

Background

An intranet (particularly a LAN with a limited number of router "hops" between terminals) is an attractive basis for real-time multimedia communication products. Over the past 18 months several products have been introduced that allow voice, video and collaborative data sharing over a LAN. One of these is the Multimedia Communications eXchange (MMCX) from Lucent Technologies (http://www.lucent.com/BusinessWorks/bw/mmcx.html). The MMCX server enables users of UNIX workstations (from Sun or HP) to collaborate using voice, video and shared data. Users can see and hear each other and share data applications (via a whiteboard and shared-X) in multiparty conferences of up to six parties. Ordinary telephones attached to the Public Switched Telephone Network (PSTN) can also participate in a conference as audio-only endpoints.

All communication streams to workstations flow over an IP network. Audio to the PSTN is interworked to the LAN through the MMCX server. Quality of service is managed by proper engineering of the IP network and limiting router hops. For such products a local IP network has advantages in bandwidth and delay over a larger network. Additionally, since these products are introduced within an enterprise (typically within an engineering department) commonality of workstations and ability to properly configure the IP network can be arranged more easily than in a broader setting.

Communique! from Insoft (now part of Netscape), Intel Proshare (http://www.intel.com/comm-net/proshare/index.htm) and PictureTel LiveLAN (http://www.picturetel.com) are some of the products in this area. These products have been built on proprietary protocols, and interoperation among them is limited at best.

Communications products require standards to interoperate. Users are not willing to commit to large deployments of these products without assurance that different products can be used together with other communication devices (ordinary telephones, for example) and will not be made obsolete by technical changes or the failure of one particular product line. Properly conceived communications standards address all of these issues.

The lack of standards prevents a communication market from growing and thus limits investment in the technology. It is in the interest of vendors to cooperate in establishing at least a base level of interoperation among products.

Recently, standards for multimedia communications over an IP network have begun to emerge, and products built to them will begin to appear in the next few months. Most companies with products in this area (including all those mentioned above) have committed to supporting these standards. The standards (H.323 and others) have been created by Study Group 15 of the ITU and represent the collective view of many of the companies with experience in this area. The standards take into account the nature of LANs and also interworking with existing wide area collaborative standards (such as H.320 - a multimedia standard for ISDN lines) and ordinary telephones.

H.323

Briefly, the suite of H.323 standards describes how multimedia terminals on a non-guaranteed bandwidth network should interoperate. Key pieces of H.323 are call setup signaling (based on Q.931), media stream representation (H.225.1) and media control (H.245). Call setup can be centralized in a server or handled directly between terminals. Since there was strong participation by organizations familiar with IP LAN conventions, the standards are suitable for IP networks. H.225.1 represents media streams using RTP/RTCP (RFC 1889), as defined by the IETF, for framing and timing. H.245 is a powerful protocol for managing streams among terminals and for representing the capabilities of terminals in a conference. H.245 allows a new media format to be used in a conference by adding only a new code point.
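To make the RTP framing mentioned above concrete, here is a minimal Python sketch that packs and parses the 12-byte fixed RTP header laid out in RFC 1889. The function names are hypothetical and chosen for illustration. The sketch assumes no padding, no header extension and no contributing-source entries, and it says nothing about how any particular H.323 product builds its packets.

```python
import struct

RTP_VERSION = 2  # version number carried in the top two bits of the first byte


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (P=0, X=0, CC=0 assumed)."""
    byte0 = RTP_VERSION << 6                          # V=2, no padding/extension/CSRC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",                                     # network byte order, 12 bytes
        byte0,
        byte1,
        sequence & 0xFFFF,                            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                       # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                            # 32-bit source identifier
    )


def unpack_rtp_header(packet):
    """Parse the fixed header from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example: a header for one 20 ms packet of G.711 mu-law audio
# (payload type 0, 160 samples at 8 kHz), followed by dummy payload bytes.
header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header + bytes(160))
```

The sequence number lets a receiver detect loss and reordering, and the timestamp drives playout timing, which is what "framing and timing" amounts to in practice.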

These protocols are available via ftp from ftp.gctech.co.jp (login itu-t, password sg15!avc). The site has (or did have) working versions of the specifications. Hunt around a bit. There is also a mail reflector for the H.323 implementers group; subscribe by sending mail to majordomo@mailbag.intel.com with subscribe h323implementors <your address or null> in the body of the message. This group discusses detailed implementation issues and does not deal in tutorial information.

H.323 calls for the T.120 suite of protocols to be the basis of collaborative data applications. T.120 includes a protocol for a multiparty shared whiteboard and a multiparty binary file transfer protocol. A multiparty application sharing protocol has recently been presented for incorporation into the T.120 suite by Microsoft, PictureTel and Polycom (ftp://ftp.imtc-files.org, sub-directory imtc-site/t120_napa96, file tshare.zip).

Multimedia Standards and the WWW

The Opportunity

Users of the WWW should be able to work with multimedia communication products developed for an intranet environment. We see this as analogous to a home phone interoperating with a business phone, or to interoperation between PBX systems from different vendors.

Much of the ubiquity of the web and its universal utility comes from standardization around formats and protocols (HTML and HTTP in particular). WWW multimedia protocols should interoperate with the "business" multimedia workstations based on H.323 standards. This would allow a web browser to collaborate in a multimedia session with one or more multimedia workstations, sharing voice, video and data. For example, DataBeam Corporation is testing a product that allows a web browser to participate in a T.120-based whiteboard collaboration (http://www.databeam.com/Products/neT.120/). Similarly, streaming media could be incorporated into a collaboration to serve as a training session or a review of audio or video material.

Integration of H.323 standards into Web components will also enable the development of multimedia communication interfaces based entirely on standard web components. This greatly expands the number of endpoints that could take part in a multimedia collaboration and will allow an "Internet Appliance" to collaborate as well. Imagine being able to retrieve your mail (voice, video and electronic) from an Internet Appliance in a hotel room or a library. A prototype of such a retrieval server has been built for the Intuity Message Manager from Lucent Technologies. Standardization of audio representation and the availability of a variety of codecs would make developing such services easier. A common way of representing terminal multimedia capabilities is also needed.

Voice over the Internet is growing rapidly. Unfortunately, there is little interoperability between products from different vendors. These products will remain "toys" until standards for interoperation are established. In this market, as in any communications market, a base level of interoperation is required. Once this is established, competition based on features, price, etc. is good for the growth of the market.

The Challenge

The Internet is different from a controlled intranet in terms of bandwidth and delay. The media stream formats called for in H.323 may require more bandwidth than many Internet customers can currently get. Nonetheless, the way to handle this is to build on H.323, extending the set of media stream formats with those appropriate to the Internet. The various media streams can be interworked at "media gateways" where specialized conversion resources may be present (the Internet Telephony Gateway from Lucent Technologies is an example of such a media server for audio, interworking GSM, G.723 and G.711 formats). H.245 is rich enough to allow a media server to make good decisions about how to interwork the media. H.245 also allows more capable terminals to "fall back" to a lower bit rate coding to avoid interworking altogether (at the cost of a lower quality multimedia "experience"). Many of the features of H.245 were incorporated based on insight gained from the H.320 protocols, where terminals of various capabilities had to be interworked at a Multipoint Control Unit (MCU).
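The choice between "falling back" to a shared lower bit rate format and interworking at a media gateway can be sketched as a simple set intersection over advertised capabilities. The Python below is an illustration only: the function name, the capability lists and the codec labels are hypothetical, and this is not H.245 message syntax or an H.245 code point table. The bit rates are the nominal rates of each audio format.

```python
# Nominal audio bit rates in bit/s. The keys are labels for this sketch,
# not H.245 code points.
BITRATE = {"G.711": 64000, "GSM": 13000, "G.723.1": 6300}


def choose_audio_format(caps_a, caps_b, max_bitrate=None):
    """Return the highest-rate format both terminals support, or None.

    A result of None means the terminals share no usable format, so the
    stream would have to be converted at a media gateway.
    """
    common = set(caps_a) & set(caps_b)
    if max_bitrate is not None:
        # Drop formats that do not fit the available bandwidth.
        common = {c for c in common if BITRATE[c] <= max_bitrate}
    if not common:
        return None
    return max(common, key=BITRATE.get)


# A LAN workstation and a dial-up terminal (hypothetical capability sets).
workstation = ["G.711", "G.723.1"]
dialup_terminal = ["G.723.1", "GSM"]

direct = choose_audio_format(workstation, dialup_terminal)   # shared format exists
lan_pair = choose_audio_format(workstation, ["G.711", "G.723.1"])
capped = choose_audio_format(workstation, ["G.711", "G.723.1"], max_bitrate=28800)
needs_gateway = choose_audio_format(["G.711"], ["GSM"])      # no shared format
```

Here the workstation falls back to G.723.1 when talking to the dial-up terminal, while the G.711-only and GSM-only pair returns None and would need a gateway to transcode.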

There is much work to be done to improve the Internet as a base for real-time multimedia communication.

· Identification of media stream formats suitable for Internet bandwidths and delay characteristics will allow Internet users to communicate with H.323 users in enterprise settings via real-time multimedia. Careful consideration of the computation and delay needed for stream interworking is required, especially for video.

· Improving the delay characteristics and lowering the bandwidth requirements for RTP/RTCP headers will benefit H.323-based multimedia communications and is independent of the H.323 standards. This, along with a set of low bit rate media stream formats, will enable H.323 to operate over 28.8 kbit/s modems, greatly expanding the reach of real-time multimedia communication.

· Expanding the use of reservation protocols such as RSVP throughout the Internet will improve the quality of interactive multimedia.

· Augmenting WWW standards so that browsers can integrate cleanly with H.323 terminals will greatly increase the possibility of HTML to H.323 terminal interoperation and provide an exciting way to deliver and customize user interfaces for multimedia communications. This group, along with a few others, has particular experience with WWW standards and is uniquely situated to determine "the right way" to do this.

· Standardization of Internet voice products around H.323 will increase the utility of all these products.

· Modifying firewall technology is necessary so that these communication protocols can pass through firewalls in a secure way.
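The header-overhead item in the list above can be made concrete with a little arithmetic. The Python sketch below estimates the IP-level bit rate of an audio stream when each packet carries uncompressed IPv4 (20 bytes), UDP (8 bytes) and fixed RTP (12 bytes) headers. It assumes G.723.1 at 6.3 kbit/s, which produces a 24-byte frame every 30 ms, and it ignores link-layer framing on the modem line. The function name is hypothetical.

```python
IP_HEADER = 20    # bytes, IPv4 without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed header without CSRC entries
HEADERS = IP_HEADER + UDP_HEADER + RTP_HEADER  # 40 bytes per packet


def stream_bitrate(payload_bytes, frame_ms, frames_per_packet=1,
                   header_bytes=HEADERS):
    """Return the IP-level bit rate in bit/s for a packetized audio stream."""
    packet_bytes = header_bytes + payload_bytes * frames_per_packet
    packets_per_second = 1000 / (frame_ms * frames_per_packet)
    return packet_bytes * 8 * packets_per_second


# G.723.1 at 6.3 kbit/s: 24-byte frames, one every 30 ms (assumed codec).
one_frame = stream_bitrate(24, 30)                          # one frame per packet
two_frames = stream_bitrate(24, 30, frames_per_packet=2)    # trades delay for overhead
header_share = HEADERS / (HEADERS + 24)                     # fraction of each packet

print(f"1 frame/packet:  {one_frame / 1000:.1f} kbit/s")
print(f"2 frames/packet: {two_frames / 1000:.1f} kbit/s")
print(f"header share at 1 frame/packet: {header_share:.1%}")
```

With one frame per packet the 6.3 kbit/s codec stream grows to roughly 17.1 kbit/s on the wire, of which 62.5% is headers. Packing two frames per packet brings it down to about 11.7 kbit/s but adds another 30 ms of delay, which is why smaller headers matter for a 28.8 kbit/s modem that must also carry video or data.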

Imagine a world where a student with nothing more than a web-enabled device can visit a school's homework support desk and find help on the most recent assignment. If the published hints are not sufficient, the student can consult via audio with a teacher from the pool of teachers providing consulting that day. The teacher can see the student's face and judge how well the explanation is being understood. The teacher then augments the consultation with a short video clip. An expert can be added to the conversation and a three-way whiteboard session used to further explain the concepts of the homework.

This scenario is within reach if we can focus on implementing and extending real-time multimedia communications standards.